Since I have the line number and seek position, what I'm doing now is moving forward line by line from the last record I got. I'm also adding an end_point marker, which I search for to decide whether to move forward.

Thanks,
Rajiv Gupta

-----Original Message-----
From: Nick Wellnhofer [mailto:[email protected]]
Sent: Wednesday, November 23, 2016 9:30 PM
To: [email protected]
Subject: Re: [lucy-user] Doc id from hits and remove redundant documents

On 23/11/2016 16:31, Gupta, Rajiv wrote:
> Thanks for your reply Nick.
>
> I wanted to delete the old documents that is why I was trying to get the
> doc_id and use that to delete it. However, that does not help it deleted
> other documents and keep changing the document. I wanted to use delete by
> term but in my doc I don't have any primary key.
>
> I add document like this:
>
> $indexer->add_doc({
>     title    => $mytitle,
>     content  => substr($mybodytext,0,1024),
>     url      => $onlyfilename,
>     urlpath  => $filpath,
>     position => $fileseektostart,
>     linenum  => $filelinenumtostart,
>     jobtype  => $self->{_logfile_hash}{$filetoindex}[5] ,
> });

You can use any field as primary key if the field's value is guaranteed to be
unique for all your documents. But it seems that you index the contents of
files line by line, so "urlpath" isn't unique. Your primary key is probably
the tuple (urlpath, linenum).

If you update all the lines of a file at once, this isn't a problem. You can
simply delete all documents relating to the file with

    $indexer->delete_by_term(
        field => 'urlpath',
        term  => $filepath,
    );

If you only want to update certain lines, you'll have to construct an ANDQuery
for each line and use delete_by_query. For example:

    $indexer->delete_by_query(Lucy::Search::ANDQuery->new(
        children => [
            Lucy::Search::TermQuery->new(
                field => 'urlpath',
                term  => $filepath,
            ),
            Lucy::Search::TermQuery->new(
                field => 'linenum',
                term  => $linenum,
            ),
        ],
    ));

Or maybe use a RangeQuery to delete a contiguous range of lines.

Nick
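
The RangeQuery suggestion at the end of Nick's message is not spelled out there, so here is a minimal sketch of how it might look. It is not from the original thread. It assumes that 'linenum' is defined in the schema as a sortable field and that line numbers are stored zero-padded to a fixed width (six digits here), since RangeQuery compares terms as strings. The helper name line_range_query and the padding width are hypothetical.

```perl
use strict;
use warnings;
use Lucy::Search::ANDQuery;
use Lucy::Search::TermQuery;
use Lucy::Search::RangeQuery;

# Hypothetical helper: build a query matching lines $first..$last
# (inclusive) of a single file.
# Assumption: 'linenum' is a sortable field whose values were indexed
# zero-padded (e.g. "000042"), so string order equals numeric order.
sub line_range_query {
    my ($filepath, $first, $last) = @_;
    return Lucy::Search::ANDQuery->new(
        children => [
            # Restrict the match to one file.
            Lucy::Search::TermQuery->new(
                field => 'urlpath',
                term  => $filepath,
            ),
            # Restrict the match to a contiguous block of lines.
            Lucy::Search::RangeQuery->new(
                field         => 'linenum',
                lower_term    => sprintf('%06d', $first),
                upper_term    => sprintf('%06d', $last),
                include_lower => 1,
                include_upper => 1,
            ),
        ],
    );
}
```

With an existing $indexer, the call would then be along the lines of $indexer->delete_by_query( line_range_query($filepath, 10, 20) ); followed by the usual commit. If line numbers are indexed without padding, "9" sorts after "10" and the range would match the wrong documents.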
