Thanks for responding, Peter.
On 30/09/2013 16:45, Peter Karman wrote:
> fwiw, segments are an implementation detail that most code shouldn't
> need to know anything about.
This is good to know.
> show your code, please.
Doing my best to extract something close enough to the original code.
Lots of Moose and fluff culled.
%%% Schema generation %%%
$schema = Lucy::Plan::Schema->new;
$lucy_str = Lucy::Plan::StringType->new;
# schema generated from Moose MOP
for my $attribute ($record_meta->get_all_attributes) {
    $schema->spec_field( name => $attribute->name, type => $lucy_str );
}
%%% Indexing %%%
$lucy_indexer = Lucy::Index::Indexer->new(
    schema => $schema,
    index  => $path,
    create => 1,
);
#
while ($record = shift) {
    %flattened_record = %{$record};
    # Array of values turned into whitespaced list.
    $flattened_record{accessions} = join ' ', @accessions;
    $lucy_indexer->add_doc( \%flattened_record );
}
# Commit is called every ~100k records, before spinning up another indexer
$lucy_indexer->commit;
%%% Querying %%%
$query = 'accessions:UPI01';
$searcher = Lucy::Search::IndexSearcher->new(
    index => $path,
);
$parser = Search::Query->parser(
    dialect => 'Lucy',
    fields  => $lucy_indexer->get_schema()->all_fields,
);
$search = $parser->parse($query)->as_lucy_query;
$result = $searcher->hits(
    query      => $search,
    num_wanted => 1000,
);
while (my $hit = $result->next) {
    say $hit->{accessions};
}
I've left out the result paging code and some blob data handling that
doesn't affect this issue: the blobs are for later, and the problem is
that I'm not getting any hits on some strings that I can grep from the
.dat file in a segment.
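In case it helps to see the flattening step on its own, here is a rough standalone sketch of it. The flatten_record name and the sample record are made up for illustration; the real records come out of Moose objects, and this only assumes that accessions holds an array reference.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the real code): copy a record hash and
# join its list of accessions into one whitespace-separated string.
sub flatten_record {
    my ($record) = @_;
    my %flattened = %{$record};
    $flattened{accessions} = join ' ', @{ $record->{accessions} };
    return \%flattened;
}

# Made-up sample record, for illustration only.
my $doc = flatten_record(
    { id => 'rec1', accessions => [ 'UPI01', 'UPI02' ] }
);
print "$doc->{accessions}\n";
```

So the whole accessions list reaches add_doc as a single string, in a field declared with the same StringType as every other attribute.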
If need be, I can tar the lot up and put it on Google Drive.
Cheers,
Kieron