On Wed, Feb 2, 2011 at 1:20 PM, Tobias Ivarsson
tobias.ivars...@neotechnology.com wrote:
You are doing I/O bound work. More than two threads will most likely just
add overhead and make things slower!
I'm certainly doing something weird, because the performance of my tests
isn't linear.
2011/2/3 Massimo Lusetti mluse...@gmail.com
On Wed, Feb 2, 2011 at 1:20 PM, Tobias Ivarsson
tobias.ivars...@neotechnology.com wrote:
You are doing I/O bound work. More than two threads will most likely just
add overhead and make things slower!
I'm certainly doing something weird
On Thu, Feb 3, 2011 at 11:30 AM, Mattias Persson
matt...@neotechnology.com wrote:
Lucene lookup performance degrades the bigger the index gets. That may be a
reason.
I doubt Lucene is unable to handle an index with 6-7 million
entries. Are there any logs around?
Cheers
--
Massimo
Massimo,
Just yesterday I tried to import the Germany OpenStreetMap dataset
into Neo4j using Lucene indexing. There are around 60M nodes that all
are indexed into Lucene, and then looked up when the Ways, consisting
of a number of nodes each, are calculated. Lucene is not fast, but it
works on
On Thu, Feb 3, 2011 at 2:01 PM, Peter Neubauer
peter.neuba...@neotechnology.com wrote:
Massimo,
Just yesterday I tried to import the Germany OpenStreetMap dataset
into Neo4j using Lucene indexing. There are around 60M nodes that all
are indexed into Lucene, and then looked up when the Ways,
2011/2/3 Massimo Lusetti mluse...@gmail.com
On Thu, Feb 3, 2011 at 2:01 PM, Peter Neubauer
peter.neuba...@neotechnology.com wrote:
Massimo,
Just yesterday I tried to import the Germany OpenStreetMap dataset
into Neo4j using Lucene indexing. There are around 60M nodes that all
are
On Tue, Feb 1, 2011 at 10:19 PM, Tobias Ivarsson
tobias.ivars...@neotechnology.com wrote:
For getting a performance boost out of writes, doing multiple operations in
one transaction will give a much bigger gain than multiple threads though.
For your use case, I think two writer threads and a
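
The advice about grouping writes can be illustrated with a minimal sketch. It assumes a hypothetical `TxStore` interface standing in for a transactional store; it is not the Neo4j API, and `BatchWriter`, `CountingStore`, and the batch sizes are illustrative names and values only. The point is just the loop shape: one commit per batch instead of one commit per row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a transactional store; NOT the Neo4j API.
interface TxStore {
    void begin();
    void write(String row);
    void commit();
}

// In-memory store that only counts calls, so the effect of batching is visible.
final class CountingStore implements TxStore {
    int writes = 0;
    int commits = 0;
    public void begin() {}
    public void write(String row) { writes++; }
    public void commit() { commits++; }
}

final class BatchWriter {
    // Writes every row, committing once per batchSize rows (plus once for a partial tail).
    static void writeAll(TxStore store, List<String> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int inBatch = 0;
        for (String row : rows) {
            if (inBatch == 0) {
                store.begin();          // open a transaction for this batch
            }
            store.write(row);
            inBatch++;
            if (inBatch == batchSize) { // batch is full: commit and start over
                store.commit();
                inBatch = 0;
            }
        }
        if (inBatch > 0) {
            store.commit();             // commit the remaining partial batch
        }
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add("row-" + i);
        }
        CountingStore perRow = new CountingStore();
        writeAll(perRow, rows, 1);
        CountingStore batched = new CountingStore();
        writeAll(batched, rows, 4);
        System.out.println(perRow.commits + " commits vs " + batched.commits); // 10 commits vs 3
    }
}
```

With a real store, each commit typically forces a flush to disk, so fewer and larger commits reduce that cost; a suitable batch size depends on available memory and has to be measured.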
More threads != faster
You are doing I/O bound work. More than two threads will most likely just
add overhead and make things slower!
Also, I'm wondering, what does crunch mean in this context? Is it the
write operations we have been talking about, or is it some other operation?
I'm
Hi everyone,
I'm new to neo4j and I'm gaining experience with it. I have a fairly
big table (in my current db) consisting of more than 220
million rows.
I want to put that in a graph db, for instance neo4j, and graph it to
compute some statistics on it. Every row will be a node in my
Since you are checking for existence before inserting, the conflict you are
getting is strange. Are you running multiple insertion threads?
-Tobias
On Tue, Feb 1, 2011 at 6:19 PM, Massimo Lusetti mluse...@gmail.com wrote:
Hi everyone,
I'm new to neo4j and I'm gaining experience with it. I have
Also,
have you been running this insert multiple times without cleaning up
the database between runs?
Cheers,
/peter neubauer
GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter
Hmm, MD5 is not collision-free, so it might be that you get the same
hash for different byte arrays.
Can you output the MD5 of the multiple logRows that are returned by the index?
Michael
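
One way to produce the requested output is sketched below, using the JDK's `java.security.MessageDigest`. The class name `Md5Hex` and the sample input are illustrative assumptions; how the original code obtains the bytes of a logRow is not shown in the thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Hex {
    // Returns the MD5 digest of the given bytes as a 32-character lowercase hex string.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on standard JVMs, so this is not expected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input standing in for the bytes of one log row.
        byte[] row = "example log row".getBytes(StandardCharsets.UTF_8);
        System.out.println(md5Hex(row));
    }
}
```

Printing the hash next to the raw bytes of each returned row shows whether the rows are genuinely different inputs with the same hash or the same row stored more than once.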
On 01.02.2011 at 18:19, Massimo Lusetti wrote:
Hi everyone,
I'm new to neo4j and I'm gaining
Seems a little weird; the commit rate won't affect the end result,
just performance (more operations per commit means faster
performance). Your code seems correct for single threaded use btw.
On Tuesday, 1 February 2011, Michael Hunger
michael.hun...@neotechnology.com wrote:
Hmm MD5 is not a unique
On Tue, Feb 1, 2011 at 6:36 PM, Tobias Ivarsson
tobias.ivars...@neotechnology.com wrote:
Since you are checking for existence before inserting, the conflict you are
getting is strange. Are you running multiple insertion threads?
Yep, I have 20 concurrent threads doing the job. I forgot about
On Tue, Feb 1, 2011 at 6:43 PM, Peter Neubauer
peter.neuba...@neotechnology.com wrote:
Also,
have you been running this insert multiple times without cleaning up
the database between runs?
Nope, for the tests I wipe (rm -rf) the db dir before every run.
Cheers
--
Massimo
http://meridio.blogspot.com
On Tue, Feb 1, 2011 at 8:02 PM, Mattias Persson
matt...@neotechnology.com wrote:
Seems a little weird; the commit rate won't affect the end result,
just performance (more operations per commit means faster
performance). Your code seems correct for single threaded use btw.
Does it mean that I
No, it means that you have to synchronize the threads so that they don't
insert the same data concurrently.
Perhaps a ConcurrentHashMap<MD5, token> where you would putIfAbsent(md5, new
Object()) when you start working on a new hash. If the token Object you get
back is not the same as you put in, you
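
The claim step suggested here could look roughly like the sketch below. `ClaimRegistry` and `tryClaim` are illustrative names, and the hash is represented as a hex `String` by assumption. One detail worth noting: with the JDK's `ConcurrentHashMap`, `putIfAbsent` returns `null` when the key was absent, and the previously stored value otherwise, so the winning thread is the one that gets `null` back.

```java
import java.util.concurrent.ConcurrentHashMap;

final class ClaimRegistry {
    // Maps an MD5 (as a hex string, by assumption) to an opaque claim token.
    private final ConcurrentHashMap<String, Object> claims = new ConcurrentHashMap<>();

    // Returns true only for the first caller to claim this hash.
    boolean tryClaim(String md5) {
        // putIfAbsent is atomic: exactly one thread sees null for a given key.
        return claims.putIfAbsent(md5, new Object()) == null;
    }

    public static void main(String[] args) {
        ClaimRegistry registry = new ClaimRegistry();
        System.out.println(registry.tryClaim("900150983cd24fb0d6963f7d28e17f72")); // true
        System.out.println(registry.tryClaim("900150983cd24fb0d6963f7d28e17f72")); // false
    }
}
```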
What about batch insertion of the nodes and indexing them after the fact?
And I agree with Tobias that a CHM should be a better claim checking algorithm
than using indexing for that. The index as well as the insertion of the nodes
will only be visible to other threads after the commit (ACID,
That is correct, the Isolation of ACID says that data isn't visible to other
threads until after commit.
The CHM should not replace the index check though, since you want to limit
the number of items in the CHM, you only want this to reflect the elements
currently being worked on, the index check
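
A minimal sketch of that combination, under stated assumptions: the committed index is replaced by an in-memory set (the thread uses a Lucene-backed Neo4j index, whose API is not reproduced here), and `DedupInserter` and `insertIfNew` are hypothetical names. The in-flight set only holds hashes that a thread is working on right now, and each entry is removed once the work is committed, so the set stays small.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class DedupInserter {
    // Stand-in for the committed, persistent index (assumption: an in-memory set).
    private final Set<String> committed = ConcurrentHashMap.newKeySet();
    // Hashes currently being worked on; bounded by the number of active threads.
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    // Returns true if this call inserted the hash, false if it was a duplicate.
    boolean insertIfNew(String md5) {
        if (!inFlight.add(md5)) {
            return false;               // another thread is inserting this hash right now
        }
        try {
            if (committed.contains(md5)) {
                return false;           // inserted and committed earlier
            }
            committed.add(md5);         // stands in for: create node, index it, commit
            return true;
        } finally {
            inFlight.remove(md5);       // release the claim only after the commit
        }
    }

    int inFlightSize() {
        return inFlight.size();
    }

    public static void main(String[] args) {
        DedupInserter inserter = new DedupInserter();
        System.out.println(inserter.insertIfNew("abc")); // true
        System.out.println(inserter.insertIfNew("abc")); // false
        System.out.println(inserter.inFlightSize());     // 0
    }
}
```

Releasing the claim only after the commit matters: a thread that claims the same hash later is then guaranteed to see it in the committed check. In this sketch a thread that loses the claim simply skips the row; if the winning thread's transaction could fail, the loser would need to retry instead.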
On Tue, Feb 1, 2011 at 10:19 PM, Tobias Ivarsson
tobias.ivars...@neotechnology.com wrote:
No, it means that you have to synchronize the threads so that they don't
insert the same data concurrently.
That would be a typical issue, but I'm sure my rows are not duplicated since
they come from the (old)
On Tue, Feb 1, 2011 at 10:25 PM, Michael Hunger
michael.hun...@neotechnology.com wrote:
What about batch insertion of the nodes and indexing them after the fact?
The data to be entered will change values in other nodes (statistics)
so I absolutely need to be sure to not insert data twice and
On Tue, Feb 1, 2011 at 10:50 PM, Tobias Ivarsson
tobias.ivars...@neotechnology.com wrote:
That is correct, the Isolation of ACID says that data isn't visible to other
threads until after commit.
The CHM should not replace the index check though, since you want to limit
the number of items in