Rick,

I wrote a few tests trying to reproduce the slowdown with larger batch
sizes but could not. Larger batch sizes result in stable throughput,
while smaller batch sizes spend more time flushing to disk (creating
and deleting relationships the way you describe).
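
For reference, this is roughly the shape of the test I ran (a minimal
sketch against the embedded API; the store path, batch sizes and
relationship type are arbitrary):

    import org.neo4j.graphdb.*;
    import org.neo4j.kernel.EmbeddedGraphDatabase;

    public class BatchSizeTest
    {
        private static final RelationshipType REL =
            DynamicRelationshipType.withName( "REL" );

        public static void main( String[] args )
        {
            GraphDatabaseService db =
                new EmbeddedGraphDatabase( "target/batch-test-db" );
            try
            {
                for ( int batchSize : new int[] { 100, 1000, 5000 } )
                {
                    long start = System.currentTimeMillis();
                    Transaction tx = db.beginTx();
                    try
                    {
                        Node prev = db.createNode();
                        for ( int i = 0; i < batchSize; i++ )
                        {
                            // create a relationship and delete it again, then
                            // create the one that is kept, similar to the
                            // insert pattern you describe
                            Node node = db.createNode();
                            Relationship temp = prev.createRelationshipTo( node, REL );
                            temp.delete();
                            prev.createRelationshipTo( node, REL );
                            prev = node;
                        }
                        tx.success();
                    }
                    finally
                    {
                        tx.finish();
                    }
                    System.out.println( batchSize + " inserts took "
                        + ( System.currentTimeMillis() - start ) + " ms" );
                }
            }
            finally
            {
                db.shutdown();
            }
        }
    }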

Could you provide a test case for this that triggers the problem?

-Johan

On Tue, Mar 22, 2011 at 12:53 PM, Rick Bullotta
<[email protected]> wrote:
> Hi, Johan.
>
> I've allocated 500M to the relationship store, so that's probably not the 
> limitation (the current relationship store size on disk is about 100M).
>
> My thought is that we are manipulating a lot of relationships 
> (adding/deleting) within the transaction, and in fact, some (many) of the 
> relationships that are added during the transaction are deleted during the 
> same transaction and never actually saved.  The scenario is the creation of 
> an ordered linked list using nodes/relationships, and as each new item is 
> "inserted", there are potentially 2-3 relationships that will be 
> destroyed/created. In fact, if 5000 items are inserted, only 5002
> relationships will ultimately be saved, although 15000+ will have been
> created in total, with 10000 of them deleted.  I'm not sure how to
> optimize that much further, though I'll look into it.  I was considering
> using the Lucene index, but it does not have an obvious way to allow us to 
> traverse from both the beginning and the end of the "index".
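>
> Roughly, the insert path looks like this (just a sketch; the NEXT
> relationship type and the insertAfter name here are placeholders for
> what we actually do):
>
>     import org.neo4j.graphdb.*;
>
>     public class OrderedList
>     {
>         private static final RelationshipType NEXT =
>             DynamicRelationshipType.withName( "NEXT" );
>
>         // Insert newNode directly after prevNode in the linked list.
>         // The existing prev->next relationship is deleted and two new
>         // ones are created, so repeated inserts churn relationships.
>         public static void insertAfter( Node prevNode, Node newNode )
>         {
>             Relationship oldNext =
>                 prevNode.getSingleRelationship( NEXT, Direction.OUTGOING );
>             if ( oldNext != null )
>             {
>                 Node nextNode = oldNext.getEndNode();
>                 oldNext.delete();
>                 newNode.createRelationshipTo( nextNode, NEXT );
>             }
>             prevNode.createRelationshipTo( newNode, NEXT );
>         }
>     }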
>
> Best,
>
> Rick
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On 
> Behalf Of Johan Svensson
> Sent: Tuesday, March 22, 2011 5:56 AM
> To: Neo4j user discussions
> Subject: Re: [Neo4j] Possible performance regression issue?
>
> Could you start by verifying that it is not GC related? Turn on verbose
> GC and see whether larger transactions trigger long GC pauses.
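>
> For example, something along these lines in the JVM options (standard
> HotSpot flags; adjust the log path to suit):
>
>     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log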
>
> Another possible cause could be that the relationship store file has
> grown, so the configuration needs to be tweaked. The OS may be flushing
> pages to disk when it should not. There is a guide on how to investigate
> and tune this when running on Linux:
> http://wiki.neo4j.org/content/Linux_Performance_Guide
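>
> As a rough starting point, the mapped memory settings in
> neo4j.properties look something like this (the exact values depend on
> your store file sizes and available RAM; these numbers are only an
> example):
>
>     neostore.nodestore.db.mapped_memory=100M
>     neostore.relationshipstore.db.mapped_memory=500M
>     neostore.propertystore.db.mapped_memory=100M
>     neostore.propertystore.db.strings.mapped_memory=100M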
>
> This could also be an issue with how the persistence windows are set up
> when not using memory-mapped buffers. I remember those settings were
> tweaked somewhat after the 1.1 release. We could try making some changes
> there, but it would be better to do some profiling first.
>
> Regards,
> Johan
>
> On Mon, Mar 21, 2011 at 11:07 PM, Rick Bullotta
> <[email protected]> wrote:
>> Here's the quick summary of what we're encountering:
>>
>> We are inserting large numbers of activity stream entries on a nearly
>> constant basis.  To optimize transaction handling, we queue these up and
>> have a single scheduled task that reads the entries from the queue and
>> persists them to Neo.  Within these transactions, it's possible that a
>> very large number of relationships will be created and deleted (sometimes
>> created and deleted entirely within the same transaction, since we are
>> managing something similar to an index).  I've noticed that the time
>> required to handle the inserts (not just the total, but the time per
>> insert) degrades DRAMATICALLY if there are more than a few hundred
>> entries to write.  It is very fast if there are < 100 entries in the
>> batch, but very slow if there are > 1000.  With Neo 1.1, we did not
>> notice this behavior.  We have tried Neo 1.2 and 1.3, and both seem to
>> exhibit it.
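>>
>> For context, the write path is essentially this pattern (a simplified,
>> self-contained sketch; the store path, flush interval and property
>> names are just illustrative):
>>
>>     import org.neo4j.graphdb.*;
>>     import org.neo4j.kernel.EmbeddedGraphDatabase;
>>     import java.util.*;
>>     import java.util.concurrent.*;
>>
>>     public class ActivityStreamWriter
>>     {
>>         private final GraphDatabaseService db =
>>             new EmbeddedGraphDatabase( "activity-db" );
>>         private final BlockingQueue<String> queue =
>>             new LinkedBlockingQueue<String>();
>>         private final ScheduledExecutorService scheduler =
>>             Executors.newSingleThreadScheduledExecutor();
>>
>>         public void start()
>>         {
>>             // a single scheduled task drains the queue and persists the
>>             // whole batch in one transaction
>>             scheduler.scheduleWithFixedDelay( new Runnable()
>>             {
>>                 public void run()
>>                 {
>>                     List<String> batch = new ArrayList<String>();
>>                     queue.drainTo( batch );
>>                     if ( batch.isEmpty() )
>>                     {
>>                         return;
>>                     }
>>                     Transaction tx = db.beginTx();
>>                     try
>>                     {
>>                         for ( String entry : batch )
>>                         {
>>                             Node node = db.createNode();
>>                             node.setProperty( "entry", entry );
>>                             // the real code also splices the node into an
>>                             // ordered list, creating and deleting
>>                             // relationships along the way
>>                         }
>>                         tx.success();
>>                     }
>>                     finally
>>                     {
>>                         tx.finish();
>>                     }
>>                 }
>>             }, 1, 1, TimeUnit.SECONDS );
>>         }
>>
>>         public void enqueue( String entry )
>>         {
>>             queue.add( entry );
>>         }
>>     }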
>>
>> Can anyone provide any insight into possible causes/fixes?
>>
>> Thanks,
>>
>> Rick