On 02/11/2015 08:54, Gerald Richter wrote:
> I'd like to get all distinct values from a field, something which in SQL
> would look like this:
>
> select distinct fieldname from table
>
> where fieldname is a StringType.
>
> Is this possible with Lucy?

The easiest way (using a PolyLexiconReader under the hood):

    my $index = Lucy::Index::IndexReader->open(index => $path_to_index);
    my $lex_reader = $index->obtain('Lucy::Index::LexiconReader');
    my $lexicon = $lex_reader->lexicon(field => $field_name);
    my @terms;

    while ($lexicon->next) {
        push(@terms, $lexicon->get_term);
    }

Depending on the size of your index and the number of segments, it might be more efficient to iterate over each segment's lexicon yourself and merge the terms manually:

    my $index = Lucy::Index::IndexReader->open(index => $path_to_index);
    my $seg_readers = $index->seg_readers;
    my %term_hash;

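    for my $seg_reader (@$seg_readers) {
        my $lex_reader = $seg_reader->obtain('Lucy::Index::LexiconReader');
        my $lexicon = $lex_reader->lexicon(field => $field_name);

        # lexicon() returns undef if no document in this segment has a
        # value for the field, so skip such segments.
        next unless defined $lexicon;

        while ($lexicon->next) {
            my $term = $lexicon->get_term;
            $term_hash{$term} = undef;
        }
    }

    # keys() returns the terms in no particular order; sort them if needed.
    my @terms = keys(%term_hash);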
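
To show just the merging step in isolation, here is a minimal sketch that does not use Lucy at all. The per-segment term lists are made up for illustration and simply stand in for what each segment's lexicon would yield:

```perl
use strict;
use warnings;

# Hypothetical per-segment term lists (an assumption for this sketch, not
# real Lucy output). Each list stands in for one segment's lexicon.
my @segment_terms = (
    [qw(apple banana cherry)],
    [qw(banana date)],
    [qw(apple elderberry)],
);

# Store every term as a hash key so that duplicates across segments collapse.
my %term_hash;
for my $terms (@segment_terms) {
    $term_hash{$_} = undef for @$terms;
}

# keys() gives no particular order, so sort explicitly if order matters.
my @distinct = sort keys %term_hash;
print join(', ', @distinct), "\n";
```

The hash only needs the keys, which is why the values are left as undef.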

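Note that these examples also work with full text fields. In that case, the lexicon contains the individual terms produced by the field's analyzer rather than whole field values.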

Nick
