On 02/11/2015 08:54, Gerald Richter wrote:
> I'd like to get all distinct values from a field, something which in SQL
> would look like this:
>
>     select distinct fieldname from table
>
> where fieldname is a StringType.
>
> Is this possible with Lucy?

The easiest way is to iterate over the field's lexicon (this uses a
PolyLexiconReader under the hood):

    my $index      = Lucy::Index::IndexReader->open(index => $path_to_index);
    my $lex_reader = $index->obtain('Lucy::Index::LexiconReader');
    # lexicon() returns undef if the field is not indexed or has no values.
    my $lexicon    = $lex_reader->lexicon(field => $field_name);
    my @terms;
    while ($lexicon && $lexicon->next) {
        push(@terms, $lexicon->get_term);
    }

Depending on the size of your index and the number of segments, it might be
more efficient to merge the terms from multiple segments manually:

    my $index       = Lucy::Index::IndexReader->open(index => $path_to_index);
    my $seg_readers = $index->seg_readers;
    my %term_hash;
    for my $seg_reader (@$seg_readers) {
        my $lex_reader = $seg_reader->obtain('Lucy::Index::LexiconReader');
        my $lexicon    = $lex_reader->lexicon(field => $field_name);
        # lexicon() returns undef if this segment has no terms for the field.
        next unless $lexicon;
        while ($lexicon->next) {
            my $term = $lexicon->get_term;
            # Hash keys collapse terms that occur in several segments.
            $term_hash{$term} = undef;
        }
    }
    my @terms = keys(%term_hash);

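One thing to keep in mind with the hash-based merge: keys() returns the terms in no particular order, whereas a single lexicon is iterated in sorted order. Here is a minimal sketch of the merge step with an explicit sort. The hard-coded lists are hypothetical stand-ins for the terms each segment's lexicon would yield, and merge_terms is a made-up helper for illustration, not part of Lucy. I have not checked whether Perl's default sort matches the lexicon's own ordering in every case, so treat it as a way to get a deterministic order only.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the terms read from three segments.
my @segment_terms = (
    [qw(apple banana cherry)],
    [qw(banana date)],
    [qw(apple elderberry)],
);

# Merge several term lists into one list of distinct terms.
sub merge_terms {
    my @lists = @_;
    my %seen;
    for my $list (@lists) {
        # Hash keys collapse duplicates across lists.
        $seen{$_} = undef for @$list;
    }
    # keys() has no defined order, so sort explicitly.
    my @sorted = sort keys %seen;
    return @sorted;
}

my @terms = merge_terms(@segment_terms);
print join(', ', @terms), "\n";
```

If the order of the result does not matter, the sort can simply be dropped.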
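
Note that these examples also work with full-text fields. In that case the
lexicon contains the individual indexed terms (the output of the field's
analyzer) rather than whole field values.
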
Nick