Saurabh Vasekar wrote on 6/21/12 3:01 PM:
>
> For queries like "content:jakarta AND content:apache" or
> "+content:apache AND -content:retrieval",
> I compared the search results with other indexing libraries, viz. Ferret and
> Lucene, and they gave the same results.
>
> But for the query "content:\"jakarta apache\"~4", the results shown by Lucene
> and Ferret are accurate, but I am not getting any records with Lucy.
>
Thanks for the full code examples. They were helpful.

It took me a few hours of playing with it to figure out why it wasn't working as
you (and I) expected. Your indexing code is fine. The searching code assumes (as
did I at first) that the terms in a ProximityQuery object would be analyzed
(stemmed). They aren't. Only the QueryParser does the analyzing. When you
construct a Query object manually, you have to do the analysis yourself.

Unfortunately, the core Lucy::Search::QueryParser class doesn't handle the
proximity syntax, since ProximityQuery is an extension to the core.

Fortunately, Search::Query::Parser handles more advanced query syntax than the
core class does. (This is no knock against the Lucy parser; as Marvin and I
have discussed in the past, it is a thankless task to try to create a parser
that is all things to all people.)

I've included example searcher code below, with examples of using a
query parser versus constructing the query objects manually.

use strict;
use warnings;

use Lucy::Search::QueryParser;
use Lucy::Search::IndexSearcher;
use LucyX::Search::ProximityQuery;
use Search::Query;

my $path_to_index = 'lucy_store';

my $searcher = Lucy::Search::IndexSearcher->new( index => $path_to_index, );

# TermQuery built by hand: the term is used as-is, with no analysis.
TERM: {
    my $term_query = Lucy::Search::TermQuery->new(
        field => 'content',
        term  => 'apache',
    );
    my $hits      = $searcher->hits( query => $term_query, );
    my $hit_count = $hits->total_hits;
    while ( my $hit = $hits->next ) {
        my $content = $hit->{content};
        print("Content : $content\n");
        print("\n");
    }
    printf( "TERM Hit Count :$hit_count for query %s\n",
        $term_query->to_string );
}

# Same term via the core QueryParser, which does the analyzing.
TERMPARSED: {
    my $qp = Lucy::Search::QueryParser->new(
        schema => $searcher->get_schema,
        fields => [qw( content )],
    );
    my $term_query = $qp->parse('apache');
    my $hits       = $searcher->hits( query => $term_query, );
    my $hit_count  = $hits->total_hits;
    while ( my $hit = $hits->next ) {
        my $content = $hit->{content};
        print("Content : $content\n");
        print("\n");
    }
    printf( "TERMPARSED Hit Count :$hit_count for query %s\n",
        $term_query->to_string );
}

# ProximityQuery built by hand: the terms are not analyzed (stemmed),
# so they only match if they equal the terms as stored in the index.
PROX: {
    my $proximity_query = LucyX::Search::ProximityQuery->new(
        field  => 'content',
        terms  => [qw( apache jakarta )],
        within => 4,
    );
    my $hits      = $searcher->hits( query => $proximity_query );
    my $hit_count = $hits->total_hits;
    while ( my $hit = $hits->next ) {
        my $content = $hit->{content};
        print("Content : $content\n");
        print("\n");
    }
    printf( "PROX Hit Count :$hit_count for query %s\n",
        $proximity_query->to_string );
}

# Proximity syntax parsed by Search::Query::Parser with the Lucy dialect.
PROXSQP: {
    my $schema      = $searcher->get_schema();
    my $field_names = $schema->all_fields;

    # Hand each field's type and analyzer to the parser.
    my %fieldtypes;
    for my $name (@$field_names) {
        $fieldtypes{$name} = {
            type     => $schema->fetch_type($name),
            analyzer => $schema->fetch_analyzer($name)
        };
    }
    my $qp = Search::Query::Parser->new(
        dialect      => 'Lucy',
        fields       => \%fieldtypes,
        dialect_opts => { default_field => 'content' },   # just for example
    );
    my $proximity_query
        = $qp->parse('content:"apache jakarta"~4')->as_lucy_query;
    my $hits      = $searcher->hits( query => $proximity_query );
    my $hit_count = $hits->total_hits;
    while ( my $hit = $hits->next ) {
        my $content = $hit->{content};
        print("Content : $content\n");
        print("\n");
    }
    printf( "PROXSQP Hit Count :$hit_count for query %s\n",
        $proximity_query->to_string );
}
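
For anyone who wants to keep building the ProximityQuery by hand, here is a minimal sketch of doing the analysis yourself. It is not part of the original code: the in-memory index, the sample document, the PolyAnalyzer setup, and the proximity_hit_count helper are all assumptions made for illustration, and it relies on the analyzer's split() method returning an array ref of token texts. With an English stemming analyzer, the raw terms would be expected to miss and the analyzed terms to match.

```perl
use strict;
use warnings;

use Lucy::Analysis::PolyAnalyzer;
use Lucy::Plan::Schema;
use Lucy::Plan::FullTextType;
use Lucy::Store::RAMFolder;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;
use LucyX::Search::ProximityQuery;

# Hypothetical throwaway index held in memory, so the sketch is self-contained.
my $analyzer = Lucy::Analysis::PolyAnalyzer->new( language => 'en' );
my $schema   = Lucy::Plan::Schema->new;
$schema->spec_field(
    name => 'content',
    type => Lucy::Plan::FullTextType->new( analyzer => $analyzer ),
);
my $folder  = Lucy::Store::RAMFolder->new;
my $indexer = Lucy::Index::Indexer->new(
    index  => $folder,
    schema => $schema,
    create => 1,
);
$indexer->add_doc( { content => 'The Apache Jakarta project' } );
$indexer->commit;

my $searcher = Lucy::Search::IndexSearcher->new( index => $folder );

# Hypothetical helper: count hits for a ProximityQuery over the given terms.
sub proximity_hit_count {
    my ($terms) = @_;
    my $query = LucyX::Search::ProximityQuery->new(
        field  => 'content',
        terms  => $terms,
        within => 4,
    );
    return $searcher->hits( query => $query )->total_hits;
}

# Raw terms, as in the PROX block: passed to the query exactly as typed.
my $raw_terms = [qw( apache jakarta )];

# Analyzed terms: run the same text through the field's own analyzer,
# which case-folds, tokenizes, and stems it the way the indexer did.
my $field_analyzer = $searcher->get_schema->fetch_analyzer('content');
my $analyzed_terms = $field_analyzer->split('apache jakarta');

my $raw_hits      = proximity_hit_count($raw_terms);
my $analyzed_hits = proximity_hit_count($analyzed_terms);

printf( "raw terms (%s): %d hits\n", join( ' ', @$raw_terms ), $raw_hits );
printf( "analyzed terms (%s): %d hits\n",
    join( ' ', @$analyzed_terms ), $analyzed_hits );
```

Fetching the analyzer from the searcher's schema, rather than constructing a new one, keeps the query-time analysis tied to whatever analyzer the field was indexed with.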
--
Peter Karman . http://peknet.com/ . [email protected]