Re: [GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-05-05 Thread Osma Suominen

On 05/05/15 16:54, Chris Dollin wrote:


Aside, hope it's useful:

Note that current jena-text doesn't /do/ conjunctive query but has enough
hooks to /enable/ conjunctive query, as is done in our ppd-index code
at https://github.com/epimorphics/ppd-text-index in TextDocProducerBatch.

When it does deletions it uses the IndexWriter's deleteDocuments() method
to brutally remove all the documents associated with the current subject
and then puts back ones that are still in the dataset.


Thanks a lot Chris, this was indeed very useful! So you're already doing 
synchronization in your ppd-index code and handling deletions in an 
appropriate, if somewhat brutal, way. That should make it easier to 
implement synchronization in the regular case, where quads correspond 
exactly to Lucene documents.


-Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi


Re: [GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-05-05 Thread Chris Dollin

On 05/05/2015 02:04 PM, amiara514 wrote:

Github user amiara514 commented on the pull request:

 https://github.com/apache/jena/pull/53#issuecomment-99072949

  Ah, I see. But this still doesn't help for cases where there are small 
differences between literals within the same language, for example singular/plural 
forms that get stemmed by the analyzer, or variations in capitalization.

 It's exactly that!
 So I have pushed the hash solution, which covers all the previous cases.

 For the other issue (with conjunctive queries), maybe deletion has to be 
managed with an updateDocument?
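
For readers following along: the thread does not spell out how the hash in the pull request is computed, so the following is only a minimal sketch of the general idea, under assumptions. It hashes the exact (unanalyzed) parts of a quad so that literals differing only in capitalization or number get different keys, even though the analyzer would stem them to the same tokens. The class name, method name, choice of SHA-256 and the set of hashed parts are all hypothetical, not taken from the patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: not the code from the pull request.
final class QuadHashSketch {

    // Returns a hex SHA-256 digest over the exact strings of a quad.
    // Assumption: graph, subject, predicate, lexical form and language tag
    // are enough to identify one indexed literal.
    static String quadHash(String graph, String subject, String predicate,
                           String lexical, String lang) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String part : new String[] {graph, subject, predicate, lexical, lang}) {
                byte[] bytes = part.getBytes(StandardCharsets.UTF_8);
                // Length prefix keeps the boundaries between parts unambiguous.
                md.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
                md.update(bytes);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "Cats" and "cats" may analyze to the same token but hash differently.
        System.out.println(quadHash("g", "s", "p", "Cats", "en"));
        System.out.println(quadHash("g", "s", "p", "cats", "en"));
    }
}
```

Stored in a field that is not analyzed, such a key would let a deletion target exactly one document instead of everything the analyzed text happens to match.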


Aside, hope it's useful:

Note that current jena-text doesn't /do/ conjunctive query but has enough
hooks to /enable/ conjunctive query, as is done in our ppd-index code
at https://github.com/epimorphics/ppd-text-index in TextDocProducerBatch.

When it does deletions it uses the IndexWriter's deleteDocuments() method
to brutally remove all the documents associated with the current subject
and then puts back ones that are still in the dataset.

:end Aside
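
The delete-then-put-back approach described in the aside can be sketched roughly as below. This is a toy in-memory model, not the ppd-text-index code and not Lucene itself: the map stands in for the index, and deleteAll() stands in for a call such as IndexWriter.deleteDocuments(new Term(field, subject)). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of "remove everything for a subject, then re-add".
final class SubjectReindexSketch {

    // Stand-in for the Lucene index: subject URI -> indexed texts ("documents").
    private final Map<String, List<String>> index = new HashMap<>();

    // Stand-in for adding one document for a subject.
    void add(String subject, String text) {
        index.computeIfAbsent(subject, k -> new ArrayList<>()).add(text);
    }

    // Stand-in for deleting every document whose subject field matches.
    void deleteAll(String subject) {
        index.remove(subject);
    }

    // The "brutal" step: drop all documents for the subject, then put back
    // the ones that are still present in the dataset.
    void resync(String subject, Collection<String> stillInDataset) {
        deleteAll(subject);
        for (String text : stillInDataset) {
            add(subject, text);
        }
    }

    List<String> docsFor(String subject) {
        return index.getOrDefault(subject, Collections.emptyList());
    }

    public static void main(String[] args) {
        SubjectReindexSketch idx = new SubjectReindexSketch();
        idx.add("ex:s", "cat");
        idx.add("ex:s", "dog");
        // The triple carrying "cat" was deleted; only "dog" remains in the dataset.
        idx.resync("ex:s", Collections.singletonList("dog"));
        System.out.println(idx.docsFor("ex:s"));
    }
}
```

The trade-off is that every deletion rewrites all documents for the subject, which is simple and always consistent but does more work than removing a single document.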

Chris

--
If I were you, I would go to the crackpots.   /They Shall Have Stars/

Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
Epimorphics Ltd. is a limited company registered in England (number 7016688)