[Neo] Neo4j + Lucene

Martin Kleppmann Tue, 16 Jun 2009 02:21:21 -0700

Hello all,

I'm planning to add Lucene indexing support to the Scala/REST wrapper  
I announced yesterday. This means venturing into areas where the  
documentation is patchy... I'm looking through Andreas' Ruby library  
as a starting point on how to do things, but I have a few additional  
questions:


- I would like to be able to track modifications to nodes and  
relationships and automatically submit these to the indexer, so that  
code outside doesn't have to worry about indexing. What I'm planning  
to do is to have my own classes implementing the NeoService, Node and  
Relationship interfaces, each delegating to an underlying service/node/ 
relationship but tracking modifications and submitting them to an  
IndexService on transaction commit. Can you see anything wrong with  
this approach? If I implement all of NeoService's getters to return my  
wrapped node/relationship implementations, can I be sure that query  
code (e.g. using traversers) will always return my wrapped  
implementations? (Asked the other way round, is it possible to  
intercept every occurrence of org.neo4j.impl.core.NodeImpl and  
RelationshipImpl being instantiated?)

- What is the thread safety of LuceneIndexService and friends? Would  
it be right to have (a) a single instance and synchronise all threads  
on it; (b) a single instance with multi-threaded access; (c) one  
instance per thread?

- Am I right in my understanding that the Lucene index is disk-backed,  
and thus it should not be necessary to re-index the db after  
restarting the server? Do you find that it is still necessary to do a  
full rebuild of the index occasionally in case it goes out of sync? (I
guess that with the looser isolation modes such as ASYNC_OTHER_TX,
updates to the indexer could be lost on an abrupt server shutdown.)

- What is NeoIndexService (used by the IMDB demo) about? From skimming  
the code it looks like it is represented completely in a Neo4j  
subgraph (arranged in a BTree?). What advantages does this approach  
offer over Lucene? I assume it won't support any of Lucene's more  
advanced features, such as fuzzy matching. Similarly, how do  
SortedTree/Timeline compare to a sorted Lucene index and range queries?
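
To make the delegation idea from the first question concrete, here is a
minimal Java sketch. It is only an illustration under stated assumptions:
SimpleNode, SimpleIndexer, InMemoryNode, ChangeTracker and TrackingNode are
hypothetical stand-ins invented for this example, not the real NeoService,
Node or IndexService interfaces, and commit() is called by hand rather than
hooked into a real transaction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a node interface; NOT the real Neo4j API.
interface SimpleNode {
    long getId();
    Object getProperty(String key);
    void setProperty(String key, Object value);
}

// Hypothetical stand-in for an index service; NOT the real IndexService.
interface SimpleIndexer {
    void index(long nodeId, String key, Object value);
}

// Plain in-memory node playing the role of the underlying implementation.
class InMemoryNode implements SimpleNode {
    private final long id;
    private final Map<String, Object> props = new HashMap<>();
    InMemoryNode(long id) { this.id = id; }
    public long getId() { return id; }
    public Object getProperty(String key) { return props.get(key); }
    public void setProperty(String key, Object value) { props.put(key, value); }
}

// One recorded property modification.
class PendingChange {
    final long nodeId;
    final String key;
    final Object value;
    PendingChange(long nodeId, String key, Object value) {
        this.nodeId = nodeId;
        this.key = key;
        this.value = value;
    }
}

// Buffers changes during a unit of work and forwards them only on commit.
class ChangeTracker {
    private final SimpleIndexer indexer;
    private final List<PendingChange> pending = new ArrayList<>();
    ChangeTracker(SimpleIndexer indexer) { this.indexer = indexer; }

    // Every node handed to outside code should pass through wrap().
    SimpleNode wrap(SimpleNode raw) { return new TrackingNode(raw, this); }

    void record(long nodeId, String key, Object value) {
        pending.add(new PendingChange(nodeId, key, value));
    }

    // Submit buffered changes to the indexer in order, then clear the buffer.
    void commit() {
        for (PendingChange c : pending) {
            indexer.index(c.nodeId, c.key, c.value);
        }
        pending.clear();
    }

    // Discard buffered changes so nothing reaches the indexer.
    void rollback() { pending.clear(); }
}

// Delegates every call to the wrapped node and records writes.
class TrackingNode implements SimpleNode {
    private final SimpleNode delegate;
    private final ChangeTracker tracker;
    TrackingNode(SimpleNode delegate, ChangeTracker tracker) {
        this.delegate = delegate;
        this.tracker = tracker;
    }
    public long getId() { return delegate.getId(); }
    public Object getProperty(String key) { return delegate.getProperty(key); }
    public void setProperty(String key, Object value) {
        delegate.setProperty(key, value);
        tracker.record(delegate.getId(), key, value);
    }
}

class TrackingDemo {
    public static void main(String[] args) {
        ChangeTracker tracker = new ChangeTracker(
            (id, key, value) -> System.out.println("index " + id + " " + key + "=" + value));
        SimpleNode node = tracker.wrap(new InMemoryNode(1L));
        node.setProperty("name", "Keanu"); // buffered, not yet indexed
        tracker.commit();                  // the indexer sees the change here
    }
}
```

The sketch only catches writes made through a wrapped node. Anything that
hands out an unwrapped node (which is what the question about traversers is
getting at) would bypass the tracker entirely.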

I'll probably run into more questions as I go along, but some initial  
feedback would be much appreciated.

Best regards
Martin

