Niels, could you perhaps write up a blog post detailing the usage (also for your own scenario), and how that solution compares to the naive supernode with just millions of relationships?
Also, I'd like to see a performance comparison of both approaches. Thanks so much for your work.

Michael

On 07.07.2011 at 22:24, Niels Hoogeveen wrote:
>
> I am glad to see a solution will be provided at the core level.
> Today, I pushed IndexedRelationships and IndexedRelationshipExpander to Git, see:
> https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/indexedrelationship
> This provides a solution to the issue, but is certainly not as fast as a solution in core would be.
> However, it does solve my issues and, as a bonus, indexed relationships can be traversed in sorted order. This is especially pleasant, since I usually want to know only the recent additions of dense relationships.
>
> Niels
>
>> Date: Thu, 7 Jul 2011 21:37:26 +0200
>> From: matt...@neotechnology.com
>> To: user@lists.neo4j.org
>> Subject: Re: [Neo4j] Performance issue on nodes with lots of relationships
>>
>> 2011/7/7 Agelos Pikoulas <agelos.pikou...@gmail.com>
>>
>>> I think it's the same problem pattern that's been in discussion lately with dense nodes or supernodes (check http://lists.neo4j.org/pipermail/user/2011-July/009832.html).
>>>
>>> Michael Hunger has provided a quick solution to visiting the *few* RelationshipTypes on a node that has *millions* of others, utilizing a RelationshipExpander with an Index (check http://paste.pocoo.org/show/traM5oY1ng7dRQAaf1oV/).
>>>
>>> Ideally this would be abstracted & implemented in the core distribution, so that all APIs (including Cypher & Tinkerpop Pipes/Gremlin) can use it efficiently...
>>
>> Yes, I'm positive that something will be done on a core level to make getting relationships of a specific type fast, regardless of the total number of relationships. In the foreseeable future, hopefully.
>>
>>> Agelos
>>>
>>> On Thu, Jul 7, 2011 at 3:16 PM, Andrew White <li...@andrewewhite.net> wrote:
>>>
>>>> I use the shell as-is, but messages.log is reporting...
>>>>
>>>> Physical mem: 3962MB, Heap size: 881MB
>>>>
>>>> My point is that if you ignore caching altogether, why did one run take 17x longer with only 2.4x more data? Considering this is a rather iterative algorithm, I don't see why you would even read a node or relationship more than once, and thus a cache shouldn't matter at all.
>>>>
>>>> In this particular case, I can't imagine it taking 9+ minutes to read a mere 3.4M nodes (that's only 6k nodes per sec). Perhaps this is just an artifact of Cypher, in which it builds the full set of `r` rows before applying `count`, rather than making `count` accept an iterable stream.
>>>>
>>>> Andrew
>>>>
>>>> On 07/06/2011 11:33 PM, David Montag wrote:
>>>>> Hi Andrew,
>>>>>
>>>>> How big is your configured Java heap? It could be that all the nodes and relationships don't fit into the cache.
>>>>>
>>>>> David
>>>>>
>>>>> On Wed, Jul 6, 2011 at 8:03 PM, Andrew White <li...@andrewewhite.net> wrote:
>>>>>
>>>>>> Here are some interesting stats to consider. First, I split my nodes into two groups: one node with 1.4M children and the other with 3.4M children. While I do see some cache warm-up improvements, the traversal doesn't seem to scale linearly; i.e. the larger super-node has 2.4x more children but takes 17x longer to traverse.
>>>>>>
>>>>>> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
>>>>>> +----------+
>>>>>> | count(r) |
>>>>>> +----------+
>>>>>> | 1468486  |
>>>>>> +----------+
>>>>>> 1 rows, 25724 ms
>>>>>> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
>>>>>> +----------+
>>>>>> | count(r) |
>>>>>> +----------+
>>>>>> | 1468486  |
>>>>>> +----------+
>>>>>> 1 rows, 19763 ms
>>>>>>
>>>>>> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
>>>>>> +----------+
>>>>>> | count(r) |
>>>>>> +----------+
>>>>>> | 3472174  |
>>>>>> +----------+
>>>>>> 1 rows, 565448 ms
>>>>>> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
>>>>>> +----------+
>>>>>> | count(r) |
>>>>>> +----------+
>>>>>> | 3472174  |
>>>>>> +----------+
>>>>>> 1 rows, 337975 ms
>>>>>>
>>>>>> Any ideas on this?
>>>>>> Andrew
>>>>>>
>>>>>> On 07/06/2011 09:55 AM, Peter Neubauer wrote:
>>>>>>> Andrew,
>>>>>>> if you upgrade to 1.4.M06, your shell should be able to use Cypher to count the relationships of a node without returning them:
>>>>>>>
>>>>>>> start n=(1) match (n)-[r]-(x) return count(r)
>>>>>>>
>>>>>>> and try that several times to see if cold caches are initially slowing things down, or something along these lines.
>>>>>>>
>>>>>>> In the shell's `ls` and in Neoclipse, the output and visualization will be slow for that amount of data.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> /peter neubauer
>>>>>>>
>>>>>>> GTalk: neubauer.peter
>>>>>>> Skype: peter.neubauer
>>>>>>> Phone: +46 704 106975
>>>>>>> LinkedIn: http://www.linkedin.com/in/neubauer
>>>>>>> Twitter: http://twitter.com/peterneubauer
>>>>>>>
>>>>>>> http://www.neo4j.org - Your high performance graph database.
>>>>>>> http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
>>>>>>> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
>>>>>>>
>>>>>>> On Wed, Jul 6, 2011 at 4:15 PM, Andrew White <li...@andrewewhite.net> wrote:
>>>>>>>> I have a graph with roughly 10M nodes. Some of these nodes are highly connected to other nodes. For example, I may have a single node with 1M+ relationships. A good analogy is a population that has a "lives-in" relationship to a state. Now the problem...
>>>>>>>>
>>>>>>>> Both Neoclipse and neo4j-shell are terribly slow when working with these nodes. In the shell I would expect a `cd <node-id>` to be very fast, much like selecting via a rowid in a standard DB. Instead, I usually see a delay of several seconds. Doing an `ls` takes so long that I usually have to just kill the process. In fact, `ls` never outputs anything, which is odd since I would expect it to "stream" the output as it found it. I have very similar performance issues with Neoclipse.
>>>>>>>>
>>>>>>>> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM. Disclaimer: I am new to Neo4j.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Andrew
>>
>> --
>> Mattias Persson, [matt...@neotechnology.com]
>> Hacker, Neo Technology
>> www.neotechnology.com

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
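The indexed-relationship idea discussed in the thread (Michael's RelationshipExpander paste and Niels's graph-collections code) can be sketched in plain Java. This is a minimal sketch of the technique only, with illustrative names; it does not use the actual Neo4j or graph-collections API. The point is the access pattern: a naive node stores all relationships in one chain, so finding the few relationships of a sparse type among millions of a dense type means scanning everything, while an indexed approach buckets relationships by type so expansion touches only the matching bucket.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only -- these class and field names are NOT the
// graph-collections API. A naive node keeps all relationships in one
// list, so finding the few KNOWS relationships among a million
// LIVES_IN ones means scanning everything. An "indexed relationship"
// approach buckets relationships by type (in Neo4j's case via index
// nodes), so expansion reads only the matching bucket.
public class IndexedExpansionSketch {
    record Rel(String type, long targetId) {}

    /** Builds one dense node both ways and counts KNOWS via each. */
    static long[] demo() {
        List<Rel> naive = new ArrayList<>();
        Map<String, List<Rel>> indexed = new HashMap<>();
        for (long i = 0; i < 1_000_000; i++) {           // the dense type
            Rel r = new Rel("LIVES_IN", i);
            naive.add(r);
            indexed.computeIfAbsent("LIVES_IN", k -> new ArrayList<>()).add(r);
        }
        for (long id : new long[] {7, 42}) {             // the sparse type
            Rel r = new Rel("KNOWS", id);
            naive.add(r);
            indexed.computeIfAbsent("KNOWS", k -> new ArrayList<>()).add(r);
        }
        // Naive expansion: filter all 1,000,002 relationships.
        long viaScan = naive.stream().filter(r -> r.type().equals("KNOWS")).count();
        // Indexed expansion: read only the 2-entry KNOWS bucket.
        long viaIndex = indexed.getOrDefault("KNOWS", List.of()).size();
        return new long[] {viaScan, viaIndex};
    }

    public static void main(String[] args) {
        long[] counts = demo();
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```

Both paths find the same two relationships; the difference is that the indexed path does work proportional to the bucket size rather than to the node's total degree, which is why it helps exactly in the supernode case the thread describes.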
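The ratios Andrew cites can be checked directly against the shell timings in the thread, taking the warm second runs for the time ratio and the cold first run for the throughput figure:

```java
// Sanity-check of the numbers quoted in the thread: the second node has
// ~2.4x the relationships of the first, yet its warm-cache run takes
// ~17x longer, and the cold run reads only ~6k relationships per second.
public class ScalingCheck {
    static double dataRatio()      { return 3_472_174.0 / 1_468_486.0; } // rel counts
    static double warmTimeRatio()  { return 337_975.0 / 19_763.0; }      // warm run ms
    static double coldThroughput() { return 3_472_174.0 / 565.448; }     // rels/sec, cold

    public static void main(String[] args) {
        System.out.printf("data: %.2fx, warm time: %.1fx, cold: %.0f rels/sec%n",
                dataRatio(), warmTimeRatio(), coldThroughput());
    }
}
```

So the arithmetic supports Andrew's point: 2.4x the data costs roughly 17x the time, which is the non-linear behavior one would expect if every expansion over the dense node pays for its total degree.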