Hi Peter,

Thanks a lot for your help.

I am reading the fields to be indexed from a database and then
creating an index on these fields. All my other searches work;
only this one fails.

For queries such as "content:jakarta AND content:apache" or
"+content:apache AND -content:retrieval", I compared the search results
with other indexing libraries (Ferret, Lucene, etc.) and they gave the
same results.

But for the query "content:\"jakarta apache\"~4", the results shown by Lucene
and Ferret are accurate, while I do not get any records with Lucy.
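
To make clear what I expect from the ~4 query, here is a minimal sketch in plain Perl (no Lucy involved). It assumes that "within 4" means the second term follows the first at most 4 token positions later; the exact slop rules in Lucene, Ferret and LucyX::Search::ProximityQuery may differ. The helper terms_within and its naive tokenizer (lowercasing, splitting on non-word characters, no stemming) are hypothetical and only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lucy: returns 1 if $first is followed by
# $second at most $within token positions later, 0 otherwise.
sub terms_within {
    my ( $text, $first, $second, $within ) = @_;

    # Naive tokenizer (assumption): lowercase, split on non-word characters.
    my @tokens = grep { length } split /\W+/, lc $text;

    # Collect the positions at which each term occurs.
    my @first_pos  = grep { $tokens[$_] eq $first } 0 .. $#tokens;
    my @second_pos = grep { $tokens[$_] eq $second } 0 .. $#tokens;

    # A match needs one occurrence pair with a gap of 1 .. $within positions.
    for my $i (@first_pos) {
        for my $j (@second_pos) {
            my $gap = $j - $i;
            return 1 if $gap > 0 && $gap <= $within;
        }
    }
    return 0;
}

# The first document should match, the second should not.
print terms_within( 'The Jakarta project at Apache', 'jakarta', 'apache', 4 ), "\n";
print terms_within(
    'Jakarta is a city; much later in this long sentence comes Apache',
    'jakarta', 'apache', 4 ), "\n";
```

Documents like the first one are the ones Lucene and Ferret return for this query and Lucy does not.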

My code for indexing is:

###############################################################
use strict;
use warnings;
use Redis;
use JSON;

use Lucy::Plan::Schema;
use Lucy::Plan::FullTextType;
use Lucy::Analysis::PolyAnalyzer;
use Lucy::Index::Indexer;


use lib "/root/apache-lucy-0.3.1/perl/lib";

no warnings 'uninitialized';

my $path_to_index = '/lucy_store';

my $schema = Lucy::Plan::Schema->new;

my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
language => 'en',
);

my $content = Lucy::Plan::FullTextType->new(
analyzer => $polyanalyzer,
);

my $order = Lucy::Plan::FullTextType->new(
analyzer => $polyanalyzer,
sortable => 1,
);

$schema->spec_field( name => 'content', type => $content);
$schema->spec_field( name => 'order', type => $order);

my $indexer = Lucy::Index::Indexer->new(
index => $path_to_index,
schema => $schema,
create => 1,
truncate => 1,
);

my $r = Redis->new;

my @records;

my $noOfRecords = 10000;

binmode(STDOUT, ":utf8");

print("Extracting records\n");

retrieve_data($noOfRecords);  # Retrieving data from the database

print("Finished Extracting\n");

print("Indexing started\n");

for(my $count = 0; $count < $noOfRecords; $count++)
{
my $doc = parse_data();
$indexer->add_doc($doc);
}

$indexer->commit;

print("Finished Indexing\n");

sub retrieve_data
{
    for (my $count = 0; $count < $_[0]; $count++)
    {
        my $value = $r->get ("retriveFROMDATA:$count");

        my $decoded_hash = decode_json $value;

        push(@records, $decoded_hash);
    }
}


sub parse_data
{
    my $decoded_hash = shift(@records);
    my %hash_packet;

    while( my ($key, $value) = each %$decoded_hash)
    {
        if($key eq "packet")
        {
            while(my ($key1, $value1) = each %$value)
            {
                if($key1 eq "content" || $key1 eq "order")
                {
                    if($value1)
                    {
                        if($key1 eq "order")
                        {
                            $hash_packet{$key1} = int($value1);
                        }
                        else
                        {
                            $hash_packet{$key1} = $value1;
                        }
                    }
                    else
                    {
                        if($key1 eq "order")
                        {
                            $hash_packet{$key1} = -1;
                        }
                        else
                        {
                            $hash_packet{$key1} = "";
                        }
                    }
                }
            }
        }
    }
    return
    {
        content => $hash_packet{"content"},
        published_at => $hash_packet{"order"},
    };
}
$r->quit;
###############################################################



My code for searching through the indexed documents is:


###############################################################
use strict;
use warnings;

my $path_to_index = '/lucy_store';

use List::Util qw ( max min );
use POSIX qw ( ceil );
use Encode qw ( decode );

use Lucy::Search::IndexSearcher;
use Lucy::Search::QueryParser;
use Lucy::Search::TermQuery;
use LucyX::Search::ProximityQuery;


binmode STDOUT, ":encoding(UTF-8)";

my $proximity_query = LucyX::Search::ProximityQuery->new(
field => 'content',
terms => [ qw ( jakarta apache ) ],
within => 4,
);

my $by_order = Lucy::Search::SortRule->new(
field => 'order',
reverse => 1,
);

my $sort_spec = Lucy::Search::SortSpec->new(
rules => [
$by_order,
 ],
);

my $offset = "0";

my $page_size = 10000;

my $searcher = Lucy::Search::IndexSearcher->new(
index => $path_to_index,
);

my $hits = $searcher->hits(
query => $proximity_query,
offset => $offset,
num_wanted => $page_size,
sort_spec => $sort_spec,      # when I remove this line, I do not get any segmentation fault
);


my $hit_count = $hits->total_hits;

while(my $hit = $hits->next)
{
my $content = $hit->{content};
my $order = $hit->{order};

print("Content : $content\n");
print("Order :$order\n");
print("\n");
}

print("Hit Count :$hit_count\n");
print("Program executing till here\n"); # The program runs up to here
###############################################################

Also, when I run my search code I get a segmentation fault. The
statement "Program executing till here" is still printed. When I remove
the sort specification, I do not get any segmentation fault. I am
sorting on an integer field.

Thank you.



On Wed, Jun 20, 2012 at 7:17 AM, Peter Karman <[email protected]> wrote:

> On 6/19/12 2:05 PM, Saurabh Vasekar wrote:
>
> [ snipped searching code ]
>
>
>> Although my documents contain contents which have text 'jakarta' and
>> 'apache' I am not getting any results. The interesting thing is that is I
>> specify the following in my proximity query the search returns appropriate
>> results.
>>
>> my $proximity_query = LucyX::Search::ProximityQuery->new(
>> field =>  'content',
>> terms =>  [ qw ( in the ) ],
>> within =>  4,
>> );
>>
>> Is my implementation correct?
>>
>>
> your search code looks reasonable. I would suggest a fully self-contained
> example, including example docs and indexing code, to really demonstrate
> the problem. Since we can't see what's in your index, it's difficult to
> help determine if this is a problem in your code or in Lucy.
>
>
> --
> Peter Karman  .  http://peknet.com/  .  [email protected]
>
