Re: [Neo4j] Performance issue on nodes with lots of relationships

Agelos Pikoulas Thu, 07 Jul 2011 06:51:59 -0700

I think it's the same problem pattern that has been discussed lately with
dense nodes, or supernodes (see
http://lists.neo4j.org/pipermail/user/2011-July/009832.html).
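For reference, the numbers from Andrew's shell runs quoted below can be checked with a few lines of Java. This is only arithmetic on the figures in the thread. The class and method names (ScalingCheck, ratio, perSecond) are made up for illustration.

```java
// Back-of-the-envelope check of the timings quoted in this thread.
// Names (ScalingCheck, ratio, perSecond) are hypothetical, for illustration only.
public class ScalingCheck {
    // count(r) results from the two Cypher runs.
    static final long SMALL_RELS = 1_468_486L;
    static final long LARGE_RELS = 3_472_174L;

    // How many times larger a is than b.
    static double ratio(double a, double b) {
        return a / b;
    }

    // Relationships counted per second for one run that took `millis` ms.
    static double perSecond(long rels, long millis) {
        return rels / (millis / 1000.0);
    }

    public static void main(String[] args) {
        System.out.printf("data ratio:        %.2fx%n", ratio(LARGE_RELS, SMALL_RELS));
        System.out.printf("first-run ratio:   %.1fx%n", ratio(565_448, 25_724));
        System.out.printf("second-run ratio:  %.1fx%n", ratio(337_975, 19_763));
        System.out.printf("small node, 2nd:   %.0f rels/s%n", perSecond(SMALL_RELS, 19_763));
        System.out.printf("large node, 2nd:   %.0f rels/s%n", perSecond(LARGE_RELS, 337_975));
        System.out.printf("large node, 1st:   %.0f rels/s%n", perSecond(LARGE_RELS, 565_448));
    }
}
```

With these figures the data grows about 2.36x, while the run time grows about 17.1x on the second runs (about 22x on the first runs). Throughput therefore drops from roughly 74k relationships per second on the smaller node to roughly 10k on the larger one, and to about 6.1k for the first run on the larger node. That superlinear slowdown is consistent with a dense-node problem rather than plain data volume.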


Michael Hunger has provided a quick solution for visiting the *few*
relationships of a given RelationshipType on a node that has *millions* of
others, using a RelationshipExpander backed by an index (see
http://paste.pocoo.org/show/traM5oY1ng7dRQAaf1oV/)
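The paste itself is not reproduced here. As a rough illustration of why an index helps, here is a minimal sketch in plain Java, assuming a toy in-memory model. DenseNodeSketch, DenseNode, Rel, build, expandByScan and expandByIndex are hypothetical names made up for this example; they are not the Neo4j API and not Michael's code. The sketch contrasts walking every relationship on a dense node with looking up only the relationships of the requested type in a per-type index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a dense node; NOT the Neo4j API and NOT the code in the paste.
// All names here are hypothetical. Needs Java 16+ for the record.
public class DenseNodeSketch {
    // Hypothetical relationship: id, type name, and the node at the other end.
    record Rel(long id, String type, long otherNode) {}

    static final class DenseNode {
        // Flat list, standing in for a node's full relationship chain.
        private final List<Rel> all = new ArrayList<>();
        // Secondary index: relationship type -> relationships of that type.
        private final Map<String, List<Rel>> byType = new HashMap<>();
        // Relationships touched by the most recent expansion.
        int lastVisited;

        // Every write updates both the flat list and the per-type index.
        void add(Rel r) {
            all.add(r);
            byType.computeIfAbsent(r.type(), t -> new ArrayList<>()).add(r);
        }

        // Naive expansion: walks every relationship, O(total degree).
        List<Rel> expandByScan(String type) {
            List<Rel> out = new ArrayList<>();
            lastVisited = 0;
            for (Rel r : all) {
                lastVisited++;
                if (r.type().equals(type)) {
                    out.add(r);
                }
            }
            return out;
        }

        // Index-backed expansion: touches only relationships of the requested type.
        List<Rel> expandByIndex(String type) {
            List<Rel> out = byType.getOrDefault(type, List.of());
            lastVisited = out.size();
            return out;
        }
    }

    // Builds a node with `common` LIVES_IN and `rare` KNOWS relationships.
    static DenseNode build(int common, int rare) {
        DenseNode n = new DenseNode();
        long id = 0;
        for (int i = 0; i < common; i++) n.add(new Rel(id++, "LIVES_IN", i));
        for (int i = 0; i < rare; i++) n.add(new Rel(id++, "KNOWS", i));
        return n;
    }

    public static void main(String[] args) {
        DenseNode state = build(1_000_000, 3);
        int found = state.expandByScan("KNOWS").size();
        System.out.println("scan:  found " + found + ", visited " + state.lastVisited);
        found = state.expandByIndex("KNOWS").size();
        System.out.println("index: found " + found + ", visited " + state.lastVisited);
    }
}
```

Both expansions find the same 3 KNOWS relationships, but the scan touches all 1,000,003 relationships while the index lookup touches only those 3. In this sketch add() maintains both structures, so writes pay a little extra to make reads of rare types cheap.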

Ideally this would be abstracted and implemented in the core distribution so
that all APIs (including Cypher and TinkerPop Pipes/Gremlin) can use it
efficiently...

Agelos

On Thu, Jul 7, 2011 at 3:16 PM, Andrew White <[email protected]> wrote:

> I use the shell as-is, but the messages.log is reporting...
>
>     Physical mem: 3962MB, Heap size: 881MB
>
> My point is that if you ignore caching altogether, why did one run take
> 17x longer with only 2.4x more data? Since this is a fairly iterative
> algorithm, I don't see why you would read a node or relationship more
> than once, so a cache shouldn't matter at all.
>
> In this particular case, I can't imagine taking 9+ minutes to read a
> mere 3.4M nodes (that's only about 6k nodes per second). Perhaps this is
> just an artifact of Cypher building the full set of `r` values before
> applying `count`, rather than having count consume an iterable stream.
>
> Andrew
>
> On 07/06/2011 11:33 PM, David Montag wrote:
> > Hi Andrew,
> >
> > How big is your configured Java heap? It could be that all the nodes and
> > relationships don't fit into the cache.
> >
> > David
> >
> > On Wed, Jul 6, 2011 at 8:03 PM, Andrew White <[email protected]> wrote:
> >
> >> Here are some interesting stats to consider. First, I split my nodes into
> >> two groups: one node with 1.4M children and the other with 3.4M
> >> children. While I do see some cache warm-up improvement, the
> >> traversal doesn't seem to scale linearly; i.e., the larger super-node has
> >> 2.4x more children but takes 17x longer to traverse.
> >>
> >> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
> >> +----------+
> >> | count(r) |
> >> +----------+
> >> | 1468486  |
> >> +----------+
> >> 1 rows, 25724 ms
> >> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
> >> +----------+
> >> | count(r) |
> >> +----------+
> >> | 1468486  |
> >> +----------+
> >> 1 rows, 19763 ms
> >>
> >> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
> >> +----------+
> >> | count(r) |
> >> +----------+
> >> | 3472174  |
> >> +----------+
> >> 1 rows, 565448 ms
> >> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
> >> +----------+
> >> | count(r) |
> >> +----------+
> >> | 3472174  |
> >> +----------+
> >> 1 rows, 337975 ms
> >>
> >> Any ideas on this?
> >> Andrew
> >>
> >> On 07/06/2011 09:55 AM, Peter Neubauer wrote:
> >>> Andrew,
> >>> if you upgrade to 1.4.M06, your shell should be able to run Cypher, so
> >>> you can count the relationships of a node instead of returning them:
> >>>
> >>> start n=(1) match (n)-[r]-(x) return count(r)
> >>>
> >>> or something along those lines, and run it several times to see
> >>> whether cold caches are slowing things down initially. With `ls` in the
> >>> shell and in Neoclipse, the output and visualization will be slow for
> >>> that amount of data.
> >>>
> >>> Cheers,
> >>>
> >>> /peter neubauer
> >>>
> >>> On Wed, Jul 6, 2011 at 4:15 PM, Andrew White <[email protected]> wrote:
> >>>> I have a graph with roughly 10M nodes. Some of these nodes are highly
> >>>> connected to other nodes; for example, a single node may have 1M+
> >>>> relationships. A good analogy is a population with a "lives-in"
> >>>> relationship to a state. Now the problem...
> >>>>
> >>>> Both Neoclipse and neo4j-shell are terribly slow when working with these
> >>>> nodes. In the shell I would expect a `cd <node-id>` to be very fast,
> >>>> much like selecting by rowid in a standard DB. Instead, I usually see
> >>>> a delay of several seconds. Doing an `ls` takes so long that I usually
> >>>> have to just kill the process. In fact, `ls` never outputs anything,
> >>>> which is odd since I would expect it to "stream" the output as it finds
> >>>> it. I have very similar performance issues with Neoclipse.
> >>>>
> >>>> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
> >>>> Disclaimer: I am new to Neo4j.
> >>>>
> >>>> Thanks,
> >>>> Andrew
> >>>> _______________________________________________
> >>>> Neo4j mailing list
> >>>> [email protected]
> >>>> https://lists.neo4j.org/mailman/listinfo/user
> >>>>
> >>>
> >>
> >
> >
>
>
