Hi Johan and others
>>I am having a hard time following what the problems really are since the
>>conversation is split up across several threads
My fault, sorry. I was replying to a message posted before I subscribed to the 
list, so I didn't have the original poster's email. 

>>as I understand it you are saying that it is the index lookups that are 
>>taking too long?

In your current implementation, yes. In the indexing implementation I 
provide in that Google Code project, index lookups are not a performance issue.
However, fixing the Lucene indexing issue only reveals that the *database* 
is now the bottleneck: it blows up after 30 million edge inserts. 
That is the issue here.

See the test results here : 
http://code.google.com/p/graphdb-load-tester/wiki/TestResults

>>For example inserting 500M relationships
>>requiring 1B index lookups (one for each node) with an avg index
>>lookup time of 1ms is 11 days worth of index lookup time.
That is why, when Peter asked for help with indexing, I suggested that a 
Bloom filter helps you "know what you don't know" and an LRU cache helps hang 
onto popular nodes. Both are in my implementation and both avoid index reads.
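
To make the idea concrete, here is a minimal sketch of a Bloom filter and an LRU cache sitting in front of a slow index. It is not the code from the graphdb-load-tester project: the class names, the sizes, and the plain dict standing in for the on-disk index are all assumptions made for illustration.

```python
import hashlib
from collections import OrderedDict


class BloomFilter:
    """May report false positives, never false negatives."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, key):
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(key))


class LruCache:
    """Keeps the most recently used keys, evicting the oldest first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)


class NodeIndex:
    """Hypothetical key -> node id index; self.store stands in for the slow index."""

    def __init__(self, cache_size=1000):
        self.store = {}
        self.bloom = BloomFilter()
        self.cache = LruCache(cache_size)
        self.next_id = 0
        self.store_reads = 0  # counts how often the slow index is actually read

    def _lookup(self, key):
        if not self.bloom.might_contain(key):
            return None  # key was definitely never inserted: no read needed
        cached = self.cache.get(key)
        if cached is not None:
            return cached  # popular node: served from memory
        self.store_reads += 1
        return self.store.get(key)

    def get_or_create(self, key):
        node_id = self._lookup(key)
        if node_id is None:
            node_id = self.next_id
            self.next_id += 1
            self.store[key] = node_id
            self.bloom.add(key)
        self.cache.put(key, node_id)
        return node_id
```

In this sketch a key that has never been seen is rejected by the Bloom filter and a recently used key is served from the cache, so only cache misses on existing keys (plus the occasional Bloom false positive) reach the slow index.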

Re your suggestion about avoiding indexes by inserting in batches: I can't see 
how that will help. You can sort the input data by from-node key or by to-node 
key, but the two nodes joined by an edge will not necessarily end up 
conveniently located in the same batch, so you will still need an index service 
to add the edges. As I say, though, this is fixed in my implementation and 
indexing is not the remaining issue; the database is.
I do encourage you to try running it.

Cheers,
Mark



_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user
