Hi Peter,
Thanks a lot for your help.
Actually I am reading the fields to be indexed from a database and then
creating an index on these fields. I am able to perform all the other
searches but this one.
For queries such as "content:jakarta AND content:apache" or
"+content:apache AND -content:retrieval"
I compared the search results with other indexing libraries, viz.
Ferret, Lucene etc., and they gave the same results.
But for the query "content:\"jakarta apache\"~4" the results shown by Lucene and
Ferret are accurate, whereas I am not getting any records with Lucy.
My code for indexing is -
###############################################################
use strict;
use warnings;
use lib "/root/apache-lucy-0.3.1/perl/lib";    # must precede the Lucy imports so they are loaded from this path
use Redis;
use JSON;
use Lucy::Plan::Schema;
use Lucy::Plan::FullTextType;
use Lucy::Analysis::PolyAnalyzer;
use Lucy::Index::Indexer;
no warnings 'uninitialized';
my $path_to_index = '/lucy_store';
my $schema = Lucy::Plan::Schema->new;
my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
    language => 'en',
);
my $content = Lucy::Plan::FullTextType->new(
    analyzer => $polyanalyzer,
);
my $order = Lucy::Plan::FullTextType->new(
    analyzer => $polyanalyzer,
    sortable => 1,
);
$schema->spec_field( name => 'content', type => $content );
$schema->spec_field( name => 'order',   type => $order );
my $indexer = Lucy::Index::Indexer->new(
    index    => $path_to_index,
    schema   => $schema,
    create   => 1,
    truncate => 1,
);
my $r = Redis->new;
my @records;
my $noOfRecords = 10000;
binmode(STDOUT, ":utf8");
print("Extracting records\n");
retrieve_data($noOfRecords); # Retrieving data from the database
print("Finished Extracting\n");
print("Indexing started\n");
for (my $count = 0; $count < $noOfRecords; $count++)
{
    my $doc = parse_data();
    $indexer->add_doc($doc);
}
$indexer->commit;
print("Finished Indexing\n");
sub retrieve_data
{
    for (my $count = 0; $count < $_[0]; $count++)
    {
        my $value = $r->get("retriveFROMDATA:$count");
        my $decoded_hash = decode_json $value;
        push(@records, $decoded_hash);
    }
}
sub parse_data
{
    my $decoded_hash = shift(@records);
    my %hash_packet;
    while (my ($key, $value) = each %$decoded_hash)
    {
        if ($key eq "packet")
        {
            while (my ($key1, $value1) = each %$value)
            {
                if ($key1 eq "content" || $key1 eq "order")
                {
                    if ($value1)
                    {
                        if ($key1 eq "order")
                        {
                            $hash_packet{$key1} = int($value1);
                        }
                        else
                        {
                            $hash_packet{$key1} = $value1;
                        }
                    }
                    else
                    {
                        if ($key1 eq "order")
                        {
                            $hash_packet{$key1} = -1;
                        }
                        else
                        {
                            $hash_packet{$key1} = "";
                        }
                    }
                }
            }
        }
    }
    return
    {
        content => $hash_packet{"content"},
        order   => $hash_packet{"order"},    # key must match the schema field name 'order'
    };
}
$r->quit;
###############################################################
My code for searching through the indexed documents
###############################################################
use strict;
use warnings;
my $path_to_index = '/lucy_store';
use List::Util qw ( max min );
use POSIX qw ( ceil );
use Encode qw ( decode );
use Lucy::Search::IndexSearcher;
use Lucy::Search::QueryParser;
use Lucy::Search::TermQuery;
use Lucy::Search::SortRule;
use Lucy::Search::SortSpec;
use LucyX::Search::ProximityQuery;
binmode STDOUT, ":encoding(UTF-8)";
my $proximity_query = LucyX::Search::ProximityQuery->new(
    field  => 'content',
    terms  => [ qw( jakarta apache ) ],
    within => 4,
);
my $by_order = Lucy::Search::SortRule->new(
    field   => 'order',
    reverse => 1,
);
my $sort_spec = Lucy::Search::SortSpec->new(
    rules => [
        $by_order,
    ],
);
my $offset = 0;
my $page_size = 10000;
my $searcher = Lucy::Search::IndexSearcher->new(
    index => $path_to_index,
);
my $hits = $searcher->hits(
    query      => $proximity_query,
    offset     => $offset,
    num_wanted => $page_size,
    sort_spec  => $sort_spec,    # when I remove this statement I am not getting any segmentation fault
);
my $hit_count = $hits->total_hits;
while (my $hit = $hits->next)
{
    my $content = $hit->{content};
    my $order   = $hit->{order};
    print("Content : $content\n");
    print("Order :$order\n");
    print("\n");
}
print("Hit Count :$hit_count\n");
print("Program executing till here\n");    # Program is getting executed till here
###############################################################
Also, when I execute my search code I get a segmentation fault. The
statement "Program executing till here" is printed, so the crash occurs
after the last statement of the script. When I remove the sorting
specification I do not get any segmentation fault. I am sorting based on
an integer field.
Thank you.
On Wed, Jun 20, 2012 at 7:17 AM, Peter Karman <[email protected]> wrote:
> On 6/19/12 2:05 PM, Saurabh Vasekar wrote:
>
> [ snipped searching code ]
>
>
>> Although my documents contain contents which have text 'jakarta' and
>> 'apache' I am not getting any results. The interesting thing is that if I
>> specify the following in my proximity query the search returns appropriate
>> results.
>>
>> my $proximity_query = LucyX::Search::ProximityQuery->new(
>> field => 'content',
>> terms => [ qw ( in the ) ],
>> within => 4,
>> );
>>
>> Is my implementation correct?
>>
>>
> your search code looks reasonable. I would suggest a fully self-contained
> example, including example docs and indexing code, to really demonstrate
> the problem. Since we can't see what's in your index, it's difficult to
> help determine if this is a problem in your code or in Lucy.
>
>
> --
> Peter Karman . http://peknet.com/ . [email protected]
>