Rick, I wrote a few tests trying to reproduce the slowdown with larger batch sizes but could not. Larger batch sizes result in stable throughput, while smaller batch sizes spend proportionally more time flushing to disk (creating and deleting relationships the way you describe).
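Below is a minimal sketch of that kind of reproduction test against the Neo4j 1.x embedded API. The store path, relationship type name, and batch sizes are illustrative assumptions, not the exact test Johan ran; each "insert" deletes one existing relationship and creates two new ones, mimicking the churn Rick describes.

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class BatchInsertTiming
{
    private static final RelationshipType NEXT = DynamicRelationshipType.withName( "NEXT" );

    public static void main( String[] args )
    {
        GraphDatabaseService db = new EmbeddedGraphDatabase( "target/batch-test-db" );
        try
        {
            for ( int batchSize : new int[] { 100, 1000, 5000 } )
            {
                long start = System.currentTimeMillis();
                insertBatch( db, batchSize );
                long elapsed = System.currentTimeMillis() - start;
                System.out.println( batchSize + " inserts: " + elapsed + " ms ("
                        + ( (double) elapsed / batchSize ) + " ms/insert)" );
            }
        }
        finally
        {
            db.shutdown();
        }
    }

    // Builds a linked list head->tail, then repeatedly inserts a new node right
    // after the head: the old head->next relationship is deleted and two new
    // relationships are created, all inside one transaction.
    private static void insertBatch( GraphDatabaseService db, int batchSize )
    {
        Transaction tx = db.beginTx();
        try
        {
            Node head = db.createNode();
            Node tail = db.createNode();
            head.createRelationshipTo( tail, NEXT );
            for ( int i = 0; i < batchSize; i++ )
            {
                Relationship oldLink = head.getSingleRelationship( NEXT, Direction.OUTGOING );
                Node oldNext = oldLink.getEndNode();
                oldLink.delete();
                Node item = db.createNode();
                head.createRelationshipTo( item, NEXT );
                item.createRelationshipTo( oldNext, NEXT );
            }
            tx.success();
        }
        finally
        {
            tx.finish();
        }
    }
}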
Could you provide a test case for this that triggers the problem?

-Johan

On Tue, Mar 22, 2011 at 12:53 PM, Rick Bullotta <[email protected]> wrote:
> Hi, Johan.
>
> I've allocated 500M to the relationship store, so that's probably not the
> limitation (the current relationship store size on disk is about 100M).
>
> My thought is that we are manipulating a lot of relationships
> (adding/deleting) within the transaction, and in fact, some (many) of the
> relationships that are added during the transaction are deleted during the
> same transaction and never actually saved. The scenario is the creation of
> an ordered linked list using nodes/relationships, and as each new item is
> "inserted", there are potentially 2-3 relationships that will be
> destroyed/created. In fact, if 5000 items are inserted, only 5002
> relationships will ultimately be saved, although 15000+ will have been
> created in total, with 10000 of them being deleted. I'm not sure how to
> optimize that much further, though I'll look into it. I was considering
> using the Lucene index, but it does not have an obvious way to allow us to
> traverse from both the beginning and the end of the "index".
>
> Best,
>
> Rick
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Johan Svensson
> Sent: Tuesday, March 22, 2011 5:56 AM
> To: Neo4j user discussions
> Subject: Re: [Neo4j] Possible performance regression issue?
>
> Could you start by verifying that it is not GC related? Turn on verbose GC
> and see whether larger transactions trigger GC pauses.
>
> Another possible cause is that the relationship store file has grown, so
> the configuration needs to be tweaked. The OS may be flushing pages to
> disk when it should not. There is a guide on how to investigate and tweak
> that when running on Linux:
> http://wiki.neo4j.org/content/Linux_Performance_Guide
>
> This could also be an issue with the setup of the persistence windows
> when not using memory-mapped buffers. I remember those settings got
> tweaked some after the 1.1 release. We could try making some changes
> there, but it would be better to do some profiling first.
>
> Regards,
> Johan
>
> On Mon, Mar 21, 2011 at 11:07 PM, Rick Bullotta
> <[email protected]> wrote:
>> Here's a quick summary of what we're encountering:
>>
>> We are inserting large numbers of activity stream entries on a nearly
>> constant basis. To optimize transactions, we queue these up and have a
>> single scheduled task that reads the entries from the queue and persists
>> them to Neo. Within these transactions, it's possible that a very large
>> number of relationships will be created and deleted (sometimes created
>> and deleted all within the same transaction, since we are managing
>> something similar to an index). I've noticed that the time required to
>> handle the inserts (not just the total, but the time per insert) degrades
>> DRAMATICALLY if there are more than a few hundred entries to write. It is
>> very fast if there are < 100 entries in the batch, but very slow if there
>> are > 1000. With Neo 1.1, we did not notice this behavior. We have tried
>> Neo 1.2 and 1.3, and both seem to exhibit this behavior.
>>
>> Can anyone provide any insight into possible causes/fixes?
>>
>> Thanks,
>>
>> Rick
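For completeness on the GC and memory-mapping suggestions above: verbose GC can be enabled with standard JVM flags (e.g. -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps), and the store mapping Johan refers to is controlled by configuration keys such as neostore.relationshipstore.db.mapped_memory. The sketch below shows one way to pass such settings to an embedded 1.x database; the path and sizes are illustrative assumptions, not tuning recommendations.

import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class MappedMemoryConfig
{
    public static void main( String[] args )
    {
        // Memory-mapped window sizes for the store files; values are illustrative only.
        Map<String, String> config = new HashMap<String, String>();
        config.put( "neostore.relationshipstore.db.mapped_memory", "500M" );
        config.put( "neostore.nodestore.db.mapped_memory", "100M" );
        config.put( "neostore.propertystore.db.mapped_memory", "100M" );

        GraphDatabaseService db = new EmbeddedGraphDatabase( "target/tuned-db", config );
        try
        {
            // ... run the workload under test, with -verbose:gc on the JVM
            // to see whether large transactions coincide with long GC pauses ...
        }
        finally
        {
            db.shutdown();
        }
    }
}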

