On Thu, Jan 22, 2009 at 2:33 PM, S.Selvam Siva <s.selvams...@gmail.com> wrote:

>
>
> On Thu, Jan 22, 2009 at 7:12 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
>>
>> : What I need is to log the existing urlid and the new urlid (the two
>> : will of course differ) when an .xml file with the same id (the unique
>> : field) is posted.
>> :
>> : I want to do this by modifying the Solr source. Which file do I need to
>> : modify so that I get the above details in the log?
>> :
>> : I tried DirectUpdateHandler2.java (which removes the duplicate
>> : entries), but my efforts were in vain.
>>
>> DirectUpdateHandler2.java (on the trunk) delegates to Lucene-Java's
>> IndexWriter.updateDocument method when you have a uniqueKey and you aren't
>> allowing duplicates -- this method doesn't give you any way to access the
>> old document(s) that had that existing key.
>>
>> The easiest way to make a change like what you are interested in might be
>> an UpdateProcessor that does a lookup/search for the uniqueKey of each
>> document about to be added to see if it already exists.  that's probably
>> about as efficient as you can get, and would be nicely encapsulated.
>>
>> You might also want to take a look at SOLR-799, where some work is being
>> done to create UpdateProcessors that can do "near duplicate" detection...
>>
>> http://wiki.apache.org/solr/Deduplication
>> https://issues.apache.org/jira/browse/SOLR-799
>>
>>
>>
>>
>>
>>
>> -Hoss
>>
>
>

Hi, I added some code to *DirectUpdateHandler2.java's doDeletions()* (Solr
1.2.0) and got the solution I wanted: logging each duplicate post, i.e. the
old and new field values of the duplicated document.


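       Document d1 = searcher.doc(prev);          // existing doc, about to be deleted
       Document d2 = searcher.doc(tdocs.doc());   // new doc with the same unique key
       String oldname = d1.get("name");
       String id1 = d1.get("id");
       String newname = d2.get("name");
       String id2 = d2.get("id");                 // read from the new doc; equals id1 for a duplicate
       // out3 is assumed to be a writer opened elsewhere (not shown);
       // one line per duplicate: id,old name,new name
       out3.write(id1 + "," + oldname + "," + newname + "\n");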

But I don't know whether Solr's performance will be affected by this.
Any comments on the performance impact of the above solution are welcome.
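
To make the idea easier to discuss outside Solr, here is a minimal standalone
sketch of the same "look up the old document by unique key, then log old and
new values" logic. It is only an illustration under assumptions: it uses a
plain in-memory map in place of the index, and the names DuplicateLogSketch,
add and getLog are made up for this example and are not Solr or Lucene APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the index: maps the unique "id" to the stored "name".
public class DuplicateLogSketch {
    private final Map<String, String> nameById = new LinkedHashMap<>();
    private final List<String> log = new ArrayList<>();

    // Adds a document; if the id already exists, logs "id,oldname,newname"
    // before the old entry is overwritten.
    public void add(String id, String name) {
        if (nameById.containsKey(id)) {
            String oldName = nameById.get(id);
            log.add(id + "," + oldName + "," + name);
        }
        nameById.put(id, name);
    }

    // Returns the duplicate log, one entry per overwritten document.
    public List<String> getLog() {
        return log;
    }

    public static void main(String[] args) {
        DuplicateLogSketch index = new DuplicateLogSketch();
        index.add("1", "first name");
        index.add("2", "other");
        index.add("1", "second name");   // duplicate id, gets logged
        for (String line : index.getLog()) {
            System.out.println(line);    // prints: 1,first name,second name
        }
    }
}
```

In this sketch the extra work is one lookup per added document plus one log
line per duplicate, which is the shape of the cost I am asking about above.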
-- 
Yours,
S.Selvam
