My problem pattern is exactly the same as Niels's:

A dense node has millions of relationships of a certain direction and type,
and only a few (sparse) relationships of a different direction and type.
Traversals usually follow only those sparse relationships on the dense
nodes.

Now, even when traversing only these sparse relationships, Neo4j becomes
extremely slow, and the slowdown is clearly non-linear (in big-O terms).

Some tests I ran (email me if you want the code) show that even the number
of such dense nodes in the database greatly influences the results.

I just reported to Michael the results of runs with the latest M05 snapshot,
which are not very positive...
I have suggested (automatic) indexing of relationships by type/direction,
to be used by the traversal frameworks,
but I'm no graph-db engine expert :-(
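
To make the indexing idea concrete, here is a minimal Python sketch. It is not Neo4j code: `Direction`, `FlatNode`, `GroupedNode`, `build` and the parent names "Thing", "Entity", "Concept" are all hypothetical and only model the problem. When a node keeps every relationship in one list, fetching the few "SUB_CLASS_OF" relationships means scanning all the "HAS_INSTANCE" ones too. Grouping relationships by (type, direction) makes that lookup depend only on the size of the requested group.

```python
from collections import defaultdict
from enum import Enum


class Direction(Enum):
    OUTGOING = "outgoing"
    INCOMING = "incoming"


class FlatNode:
    """Hypothetical node that keeps all relationships in one list."""

    def __init__(self):
        self.rels = []  # entries: (rel_type, direction, other_node_id)

    def add(self, rel_type, direction, other):
        self.rels.append((rel_type, direction, other))

    def get(self, rel_type, direction):
        # Scans every relationship: cost grows with the node's total degree.
        return [o for t, d, o in self.rels if t == rel_type and d == direction]


class GroupedNode:
    """Hypothetical node that indexes relationships by (type, direction)."""

    def __init__(self):
        self.groups = defaultdict(list)

    def add(self, rel_type, direction, other):
        self.groups[(rel_type, direction)].append(other)

    def get(self, rel_type, direction):
        # Only touches the requested group; other types are never scanned.
        # dict.get does not create an empty entry for missing keys.
        return list(self.groups.get((rel_type, direction), []))


def build(node_cls, n_instances=100_000):
    """Model a dense node: many HAS_INSTANCE, three SUB_CLASS_OF."""
    node = node_cls()
    for i in range(n_instances):
        node.add("HAS_INSTANCE", Direction.OUTGOING, i)
    for parent in ("Thing", "Entity", "Concept"):  # hypothetical parents
        node.add("SUB_CLASS_OF", Direction.OUTGOING, parent)
    return node


if __name__ == "__main__":
    import timeit

    for cls in (FlatNode, GroupedNode):
        node = build(cls)
        t = timeit.timeit(
            lambda: node.get("SUB_CLASS_OF", Direction.OUTGOING), number=10
        )
        print(cls.__name__, node.get("SUB_CLASS_OF", Direction.OUTGOING), f"{t:.4f}s")
```

Both variants return the same relationships; only the grouped one avoids touching the dense "HAS_INSTANCE" group. How a storage engine would maintain such groups on disk is a separate question this sketch does not address.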

A'


> Date: Wed, 29 Jun 2011 18:19:10 +0200
> From: Niels Hoogeveen <[email protected]>
> Subject: Re: [Neo4j] traversing densely populated nodes
> To: <[email protected]>
>
> Michael,
>
> The issue I am referring to does not pertain to traversing many relationships
> at once, but to the impact that many relationships of one type have on
> relationships of another type on the same node.
>
> Example:
>
> A topic class has 2 million outgoing relationships of type "HAS_INSTANCE"
> and 3 outgoing relationships of type "SUB_CLASS_OF".
>
> Fetching the 3 relationships of type "SUB_CLASS_OF" takes very long,
> presumably due to the presence of the 2 million other relationships.
>
> I never need to fetch the "HAS_INSTANCE" relationships from the topic
> node; that relationship is always traversed from the other direction.
>
> I do want to know the class of a topic instance, leading to the topic
> class, but I never really need to traverse all topic instances from the
> topic class (at least not directly... I do want to know the most recent
> addition, and that's what I use the timeline index for).
>
> Niels
>
> > From: [email protected]
> > Date: Wed, 29 Jun 2011 17:50:08 +0200
> > To: [email protected]
> > Subject: Re: [Neo4j] traversing densely populated nodes
> >
> > I think this is the same problem that Angelos is facing; we are currently
> > evaluating options to improve the performance on those highly connected
> > supernodes.
> >
> > A traditional option is really to split them into groups, or even to
> > shard their relationships into a second layer.
> >
> > We're looking into storage improvements as well as changes to how that
> > many relationships are retrieved at once.
> >
> > Cheers
> >
> > Michael
>
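
Michael's "second layer" option can be sketched as follows. This is a hypothetical in-memory model in Python, not the Neo4j API: `Node`, `add_instance`, `instances`, the `HAS_BUCKET` relationship type and `BUCKET_SIZE` are all made up for illustration. Instead of linking every instance directly to the class node, instances hang off intermediate bucket nodes holding at most `BUCKET_SIZE` relationships each, so with 2 million instances and a cap of 1000 the class node would carry about 2000 bucket relationships instead of 2 million.

```python
class Node:
    """Hypothetical in-memory node with outgoing relationships only."""

    def __init__(self, name):
        self.name = name
        self.rels = []  # entries: (rel_type, other Node)

    def relate(self, rel_type, other):
        self.rels.append((rel_type, other))

    def out(self, rel_type):
        # Linear scan, like a node without per-type grouping.
        return [o for t, o in self.rels if t == rel_type]


BUCKET_SIZE = 1000  # hypothetical cap on instances per bucket node


def add_instance(class_node, instance):
    """Attach an instance through a bucket node instead of directly."""
    buckets = class_node.out("HAS_BUCKET")
    if not buckets or len(buckets[-1].rels) >= BUCKET_SIZE:
        # Last bucket is full (or none exists yet): open a new one.
        bucket = Node(f"{class_node.name}-bucket-{len(buckets)}")
        class_node.relate("HAS_BUCKET", bucket)
        buckets.append(bucket)
    buckets[-1].relate("HAS_INSTANCE", instance)


def instances(class_node):
    """Walk the second layer to enumerate all instances."""
    for bucket in class_node.out("HAS_BUCKET"):
        yield from bucket.out("HAS_INSTANCE")


if __name__ == "__main__":
    topic_class = Node("Topic")
    for parent in ("Thing", "Entity", "Concept"):  # hypothetical parents
        topic_class.relate("SUB_CLASS_OF", Node(parent))
    for i in range(10_500):
        add_instance(topic_class, Node(f"topic-{i}"))
    # The class node now holds 11 buckets plus 3 SUB_CLASS_OF relationships.
    print(len(topic_class.rels), [n.name for n in topic_class.out("SUB_CLASS_OF")])
```

The trade-off is an extra hop: going from an instance to its class (the direction Niels actually traverses) now passes through a bucket node, and the application has to maintain the buckets itself.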
_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user
