Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-03 Thread Massimo Lusetti
On Wed, Feb 2, 2011 at 1:20 PM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote:
> You are doing I/O bound work. More than two threads is most likely just going to add overhead and make things slower!
I'm certainly doing something weird, because the performance of my tests doesn't scale linearly.

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-03 Thread Mattias Persson
2011/2/3 Massimo Lusetti mluse...@gmail.com
> On Wed, Feb 2, 2011 at 1:20 PM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote:
>> You are doing I/O bound work. More than two threads is most likely just going to add overhead and make things slower!
> I'm certainly doing something weird

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-03 Thread Massimo Lusetti
On Thu, Feb 3, 2011 at 11:30 AM, Mattias Persson matt...@neotechnology.com wrote:
> Lucene lookup performance degrades the bigger the index gets. That may be a reason.
I'd be surprised if Lucene couldn't handle an index with 6 or 7 million entries. Are there maybe some logs I could check?
Cheers -- Massimo

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-03 Thread Peter Neubauer
Massimo, just yesterday I tried to import the Germany OpenStreetMap dataset into Neo4j using Lucene indexing. There are around 60M nodes, all of which are indexed in Lucene and then looked up when the Ways (each consisting of a number of nodes) are calculated. Lucene is not fast, but it works on

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-03 Thread Massimo Lusetti
On Thu, Feb 3, 2011 at 2:01 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote:
> Massimo, just yesterday I tried to import the Germany OpenStreetMap dataset into Neo4j using Lucene indexing. There are around 60M nodes, all of which are indexed in Lucene and then looked up when the Ways,

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-03 Thread Mattias Persson
2011/2/3 Massimo Lusetti mluse...@gmail.com
> On Thu, Feb 3, 2011 at 2:01 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote:
>> Massimo, just yesterday I tried to import the Germany OpenStreetMap dataset into Neo4j using Lucene indexing. There are around 60M nodes, all of which are

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-02 Thread Massimo Lusetti
On Tue, Feb 1, 2011 at 10:19 PM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote:
> To get a performance boost out of writes, doing multiple operations in one transaction will give a much bigger gain than multiple threads, though. For your use case, I think two writer threads and a
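Tobias's first point, grouping many writes into one transaction rather than committing per row, can be sketched as below. This is a minimal illustration under stated assumptions, not Neo4j code: `Tx`, `TxFactory` and `InMemoryStore` are hypothetical stand-ins for a real database transaction, and the batch size is an arbitrary parameter you would tune by measuring.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: commit once every batchSize writes instead of once per row.
// Tx, TxFactory and InMemoryStore are hypothetical stand-ins, NOT the Neo4j API.
// No rollback or failure handling is modelled here.
final class BatchedWriter {
    interface Tx {
        void write(String row);
        void commit();
    }

    interface TxFactory {
        Tx begin();
    }

    /** Toy store: rows appear in "committed" only when their transaction commits. */
    static final class InMemoryStore implements TxFactory {
        final List<String> committed = new ArrayList<>();

        @Override
        public Tx begin() {
            List<String> pending = new ArrayList<>();
            return new Tx() {
                @Override public void write(String row) { pending.add(row); }
                @Override public void commit() { committed.addAll(pending); pending.clear(); }
            };
        }
    }

    /** Writes all rows, committing every batchSize rows; returns the number of commits. */
    static int writeAll(List<String> rows, int batchSize, TxFactory factory) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int commits = 0;
        int inBatch = 0;
        Tx tx = null;
        for (String row : rows) {
            if (tx == null) {
                tx = factory.begin(); // open a transaction lazily for the next batch
            }
            tx.write(row);
            if (++inBatch == batchSize) {
                tx.commit();
                commits++;
                tx = null;
                inBatch = 0;
            }
        }
        if (tx != null) {
            tx.commit(); // flush the final, partial batch
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            rows.add("row-" + i);
        }
        InMemoryStore store = new InMemoryStore();
        int commits = writeAll(rows, 10, store);
        System.out.println(commits + " commits, " + store.committed.size() + " rows");
    }
}
```

With 25 rows and a batch size of 10, this performs 3 commits (10 + 10 + 5 rows) instead of 25; the per-commit cost is what batching amortizes.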

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-02 Thread Tobias Ivarsson
More threads != faster.
You are doing I/O bound work. More than two threads is most likely just going to add overhead and make things slower!
Also, I'm wondering: what does "crunch" mean in this context? Is it the write operations we have been talking about, or is it some other operation? I'm
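Capping the number of writer threads, as suggested above for I/O-bound work, might look like the following JDK-only sketch. `BoundedWriters`, `writeAll` and `WRITER_THREADS` are hypothetical names, the actual write is passed in as a `Consumer`, and two threads is a starting point to measure against rather than a guaranteed optimum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Sketch: use a small fixed pool of writers instead of one thread per task (e.g. 20).
final class BoundedWriters {
    static final int WRITER_THREADS = 2; // hypothetical default; tune by measuring

    /** Splits rows into one contiguous slice per writer and waits for every slice to finish. */
    static long writeAll(List<String> rows, int writers, Consumer<String> write) {
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        AtomicLong written = new AtomicLong();
        List<Future<?>> futures = new ArrayList<>();
        int slice = (rows.size() + writers - 1) / writers; // ceiling division
        try {
            for (int start = 0; start < rows.size(); start += slice) {
                List<String> part = rows.subList(start, Math.min(start + slice, rows.size()));
                futures.add(pool.submit(() -> {
                    for (String row : part) {
                        write.accept(row);
                        written.incrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows any failure from a writer
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return written.get();
    }
}
```

Raising `writers` is then a one-argument change, which makes it easy to benchmark 1, 2 and 4 writers against the same data before settling on a number.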

[Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-01 Thread Massimo Lusetti
Hi everyone, I'm new to Neo4j and I'm gaining some experience with it. I have a fairly big table (in my current DB) with somewhat more than 220 million rows. I want to put it in a graph DB, for instance Neo4j, and graph it to compute some statistics on it. Every row will be a node in my

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-01 Thread Tobias Ivarsson
Since you are checking for existence before inserting, the conflict you are getting is strange. Are you running multiple insertion threads?
-Tobias
On Tue, Feb 1, 2011 at 6:19 PM, Massimo Lusetti mluse...@gmail.com wrote:
> Hi everyone, I'm new to Neo4j and I'm gaining some experience with it. I have

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-01 Thread Peter Neubauer
Also, have you been running this insert multiple times without cleaning up the database between runs?
Cheers,
/peter neubauer

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-01 Thread Michael Hunger
Hmm, MD5 does not guarantee unique hashes, so you might get the same hash for different byte arrays. Can you output the MD5 of each of the logRows that are returned by the index?
Michael
On 01.02.2011 at 18:19, Massimo Lusetti wrote:
> Hi everyone, I'm new to Neo4j and I'm making
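To act on Michael's suggestion, each row's digest can be printed in hex with the JDK's `MessageDigest`. The class name, method name and sample rows are hypothetical. For non-adversarial data an accidental MD5 collision is extremely unlikely, so printing each digest next to its row contents quickly shows whether the index holds genuinely different rows under one hash or the same row twice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: prints the MD5 of each row so rows sharing a hash can be compared.
final class Md5Hex {
    private Md5Hex() {}

    /** Returns the lowercase hexadecimal MD5 digest of the given bytes. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff)); // two hex chars per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical log rows; in the thread these would be the rows returned by the index.
        String[] rows = {"2011-02-01 18:19 GET /index.html", "2011-02-01 18:20 GET /index.html"};
        for (String row : rows) {
            System.out.println(md5Hex(row.getBytes(StandardCharsets.UTF_8)) + "  " + row);
        }
    }
}
```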

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-01 Thread Mattias Persson
Seems a little weird; the commit rate won't affect the end result, just performance (more operations per commit means faster performance). Your code seems correct for single-threaded use, btw.
On Tuesday, 1 February 2011, Michael Hunger michael.hun...@neotechnology.com wrote:
> Hmm MD5 is not a unique

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-01 Thread Massimo Lusetti
On Tue, Feb 1, 2011 at 6:36 PM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote:
> Since you are checking for existence before inserting, the conflict you are getting is strange. Are you running multiple insertion threads?
Yep, I have 20 concurrent threads doing the job. I forgot about

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-01 Thread Massimo Lusetti
On Tue, Feb 1, 2011 at 6:43 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote:
> Also, have you been running this insert multiple times without cleaning up the database between runs?
Nope, for the tests I wipe (rm -rf) the DB directory on every run.
Cheers -- Massimo http://meridio.blogspot.com

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-01 Thread Massimo Lusetti
On Tue, Feb 1, 2011 at 8:02 PM, Mattias Persson matt...@neotechnology.com wrote:
> Seems a little weird; the commit rate won't affect the end result, just performance (more operations per commit means faster performance). Your code seems correct for single-threaded use, btw.
Does it mean that I

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-01 Thread Tobias Ivarsson
No, it means that you have to synchronize the threads so that they don't insert the same data concurrently. Perhaps a ConcurrentHashMap<MD5,token>, where you would putIfAbsent(md5, new Object()) when you start working on a new hash. If the token Object you get back is not the same as the one you put in, you
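Tobias's claim-token idea can be sketched with the JDK's `ConcurrentHashMap`. One detail differs from the wording above: `putIfAbsent` returns `null` when the key was absent (meaning our token was stored) and the existing value otherwise, so the sketch tests for `null` rather than comparing tokens. `HashClaims`, `tryClaim` and `release` are hypothetical names, and because the message is cut off before saying what a thread should do when the claim fails, the sketch simply reports failure to the caller.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the claim-token idea: a thread may only work on a hash it managed to claim.
final class HashClaims {
    private final ConcurrentMap<String, Object> inFlight = new ConcurrentHashMap<>();

    /**
     * Tries to claim the given MD5 hash. Returns the token to pass to release(),
     * or null if another thread already holds the claim.
     */
    Object tryClaim(String md5) {
        Object token = new Object();
        // putIfAbsent returns null if the key was absent (our token is now stored),
        // otherwise the token some other thread stored earlier.
        Object previous = inFlight.putIfAbsent(md5, token);
        return previous == null ? token : null;
    }

    /** Releases a claim; the two-argument remove only succeeds for the owner's token. */
    void release(String md5, Object token) {
        inFlight.remove(md5, token);
    }
}
```

A worker would call `tryClaim(md5)`, do its insert only if it got a non-null token, and call `release` in a `finally` block so a failed insert does not leave the hash claimed forever.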

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-01 Thread Michael Hunger
What about batch insertion of the nodes and indexing them after the fact? And I agree with Tobias that a CHM (ConcurrentHashMap) would be a better claim-checking mechanism than the index for that. Both the index entries and the inserted nodes will only be visible to other threads after the commit (ACID,

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-01 Thread Tobias Ivarsson
That is correct: the Isolation in ACID says that data isn't visible to other threads until after commit. The CHM should not replace the index check, though. Since you want to limit the number of items in the CHM, you only want it to reflect the elements currently being worked on; the index check
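Combining the two checks discussed above: because uncommitted writes are invisible to other threads, the index check alone cannot stop two threads from inserting the same row, so the CHM covers the in-flight window and is emptied after commit, while the index covers rows committed earlier. The sketch models this with plain JDK collections; `DedupWriter`, `insertIfNew` and `runDemo` are hypothetical, `committedIndex` is a stand-in for the Lucene index, and adding to it stands in for the commit, so none of this is Neo4j API. It also assumes that skipping a hash another thread is already working on is acceptable; if that thread's transaction could fail, you would retry rather than skip.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: the CHM holds only in-flight hashes; committedIndex (a hypothetical stand-in
// for the Lucene index, NOT the Neo4j API) records hashes written by earlier commits.
final class DedupWriter {
    private final ConcurrentMap<String, Boolean> inFlight = new ConcurrentHashMap<>();
    private final Set<String> committedIndex = ConcurrentHashMap.newKeySet();
    final AtomicInteger inserts = new AtomicInteger();

    /** Returns true if this call inserted the row, false if it was skipped. */
    boolean insertIfNew(String md5) {
        if (inFlight.putIfAbsent(md5, Boolean.TRUE) != null) {
            return false; // another thread is working on this hash right now
        }
        try {
            if (committedIndex.contains(md5)) {
                return false; // already written by an earlier, committed transaction
            }
            inserts.incrementAndGet(); // stand-in for "create node + add to index"
            committedIndex.add(md5);   // stand-in for commit: now visible to other threads
            return true;
        } finally {
            inFlight.remove(md5);      // keep the CHM small: drop the claim after commit
        }
    }

    /** Runs `threads` workers that all try to insert the same `keys` hashes. */
    static DedupWriter runDemo(int threads, int keys) {
        DedupWriter w = new DedupWriter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int k = 0; k < keys; k++) {
                    w.insertIfNew("hash-" + k);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return w;
    }

    public static void main(String[] args) {
        DedupWriter w = runDemo(20, 10_000);
        System.out.println("inserts: " + w.inserts.get()); // one insert per distinct hash
    }
}
```

Even with 20 threads racing over the same hashes, each hash is inserted once: a thread either fails to claim it, or claims it after the previous owner has already recorded it in `committedIndex`.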

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-01 Thread Massimo Lusetti
On Tue, Feb 1, 2011 at 10:19 PM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote:
> No, it means that you have to synchronize the threads so that they don't insert the same data concurrently.
That would be a typical issue, but I'm sure my rows are not duplicated since they come from the (old)

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-01 Thread Massimo Lusetti
On Tue, Feb 1, 2011 at 10:25 PM, Michael Hunger michael.hun...@neotechnology.com wrote:
> What about batch insertion of the nodes and indexing them after the fact?
The data to be entered will change values in other nodes (statistics), so I absolutely need to be sure not to insert data twice and

Re: [Neo4j] Lucene index commit rate and NoSuchElementException

2011-02-01 Thread Massimo Lusetti
On Tue, Feb 1, 2011 at 10:50 PM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote:
> That is correct: the Isolation in ACID says that data isn't visible to other threads until after commit. The CHM should not replace the index check, though. Since you want to limit the number of items in