Dear all
/**** I'm sorry if I'm not using the user@lists properly; I am indeed lost :-(
Neo4j would be so much better as a forum or a stackOverNeo4J :-) ***/
Allow me to say that a practical limit of around 50K relationships per node
is not workable for real-world modern social network apps.
What if there are simply a couple of million "Person" nodes that may "LIKE"
the "Movie" nodes?
And what if I have a few million Movies and many millions of Persons?
It is typical for a "Movie" to have a few hundred thousand ratings/votes.
And imagine if I also have Song, Book & Product nodes!
I think this issue is *MAJOR* and needs to be raised as a high priority
with the Neo4j team.
The proxy solution sounds wonderful, but it can be quite a hassle if it's not
properly encapsulated & transparent.
I think all Traversals would become quite hacky, & I can't even think what
would happen to object mapping etc.
I imagine it COULD be part of an upcoming version of the new & amazing
Spring Data Graph framework (check it out!),
where a simple annotation such as @NodeWithProxy, along with information on
which *RelationshipTypes / Directions* should go to the real or the proxy
Node, could do all of the proxy magic!!!
But the *RelationshipType/Direction indexing* I proposed could, I dare say,
be a more generic and cleaner idea, and also a quicker hack!
All we need is a method TraversalDescription.*index("myIndex");* where we
can declare which "index" should be used for looking up
the (few) RelationshipTypes/Directions among the millions on the Node.
The best thing is that we have already declared those on
TraversalDescription.*relationships*(MyRelationshipType.hasPart,
Direction.OUTGOING).
The *Traversal* would then follow (only) those found in the index! Bingo!!!!
We could also have a *.followIndexedOnly(false)* and even
*recreateFollowedIndexes(true)* to save us next time!
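
To make the proposal above concrete, here is a minimal in-memory sketch in plain Java, not Neo4j code. It assumes relationships are bucketed per node by a "TYPE/DIRECTION" key, so following the few HASPART edges never touches the many CONTAINS ones. Every name in it (TypedRelIndexSketch, add, scan, lookup) is hypothetical and only illustrates the idea; it says nothing about how Neo4j stores relationships internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of per-node relationship indexing; none of this is Neo4j API.
class TypedRelIndexSketch {
    enum Dir { OUTGOING, INCOMING }

    static final class Rel {
        final String type;
        final long from;
        final long to;
        Rel(String type, long from, long to) {
            this.type = type;
            this.from = from;
            this.to = to;
        }
    }

    // Flat list: nodeId -> every relationship touching that node.
    private final Map<Long, List<Rel>> all = new HashMap<>();
    // Index: nodeId -> "TYPE/DIRECTION" -> relationships in that bucket.
    private final Map<Long, Map<String, List<Rel>>> index = new HashMap<>();

    // Registers a relationship in both structures (self-loops are not handled).
    void add(String type, long from, long to) {
        Rel r = new Rel(type, from, to);
        all.computeIfAbsent(from, k -> new ArrayList<>()).add(r);
        all.computeIfAbsent(to, k -> new ArrayList<>()).add(r);
        bucket(from, type, Dir.OUTGOING).add(r);
        bucket(to, type, Dir.INCOMING).add(r);
    }

    private List<Rel> bucket(long node, String type, Dir dir) {
        return index.computeIfAbsent(node, k -> new HashMap<>())
                    .computeIfAbsent(type + "/" + dir, k -> new ArrayList<>());
    }

    // Unindexed lookup: cost grows with ALL relationships on the node.
    List<Rel> scan(long node, String type, Dir dir) {
        List<Rel> out = new ArrayList<>();
        for (Rel r : all.getOrDefault(node, Collections.emptyList())) {
            boolean outgoing = r.from == node;
            if (r.type.equals(type) && (dir == Dir.OUTGOING) == outgoing) {
                out.add(r);
            }
        }
        return out;
    }

    // Indexed lookup: goes straight to the matching bucket.
    List<Rel> lookup(long node, String type, Dir dir) {
        Map<String, List<Rel>> byKey = index.get(node);
        if (byKey == null) {
            return Collections.emptyList();
        }
        return byKey.getOrDefault(type + "/" + dir, Collections.emptyList());
    }

    public static void main(String[] args) {
        TypedRelIndexSketch g = new TypedRelIndexSketch();
        g.add("HASPART", 1L, 2L);
        for (long c = 100; c < 100_100; c++) {
            g.add("CONTAINS", c, 1L); // 100,000 containers pointing at Part 1
        }
        System.out.println("HASPART out of Part 1: "
                + g.lookup(1L, "HASPART", Dir.OUTGOING).size());
    }
}
```

Both scan and lookup return the same relationships; the difference is that scan visits every edge on the node while lookup only reads one bucket. The price is the extra bookkeeping in add and the memory for the second map.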
In any case, something must be implemented!
Without being an expert on Neo4j, I think a lot of indexing
optimization is still needed!
Michael, what do you think? Could you please see that this gets raised with
the team, and share their views?
Agelos
Date: Wed, 15 Jun 2011 17:57:55 +0200
From: "Balazs E. Pataki" <[email protected]>
Subject: Re: [Neo4j] Slow Traversals on Nodes with too many
Relationships
To: Neo4j user discussions <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi,
when we started to evaluate neo4j we made some measurements, and for us
50.000 seemed to be a magic number: that many relationships and
properties on one node seemed to be a limit which, once reached, makes
things slow. But we didn't actually need that many relationships/properties
in our case, so we could live with it, or could make workarounds (e.g.
storing things in properties and doing indexed lookups instead of using
relationships).
An automatic indexed lookup on relationship types and directions would
be awesome, definitely.
Regards,
---
balazs
Date: Wed, 15 Jun 2011 23:19:32 +0800
From: Craig Taverner <[email protected]>
Subject: Re: [Neo4j] Slow Traversals on Nodes with too many
Relationships
To: Neo4j user discussions <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1
Could this also be related to the possibility that in order to determine
relationship type and direction, the relationships need to be loaded from
disk? If so, then having a large number of relationships on the same node
would decrease performance, if the number was large enough to affect the
disk io caching.
If this is the case, perhaps adding a proxy node for the incoming
relationships would work around the problem? Of course, then you have doubled
the number of part nodes (two for each part: the part itself and its
containers proxy).
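
The proxy idea above can be sketched with a toy graph in plain Java. This is an illustration under assumptions, not Neo4j code: each node keeps one flat edge list that must be scanned in full to filter by type and direction, the part chain has no cycles, and the names ProxyNodeSketch, PROXY_FOR, buildChain and edgesScannedFollowing are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: every node holds one flat list of all edges touching it.
class ProxyNodeSketch {
    static final class Node {
        final String name;
        final List<Edge> edges = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    static final class Edge {
        final String type;
        final Node from;
        final Node to;
        Edge(String type, Node from, Node to) {
            this.type = type;
            this.from = from;
            this.to = to;
        }
    }

    static void connect(Node from, String type, Node to) {
        Edge e = new Edge(type, from, to);
        from.edges.add(e);
        to.edges.add(e);
    }

    // Hypothetical helper: a second node that absorbs a part's incoming CONTAINS edges.
    static Node proxyFor(Node part) {
        Node proxy = new Node(part.name + "-containers");
        connect(proxy, "PROXY_FOR", part);
        return proxy;
    }

    // Builds Part1 --HASPART--> Part2, with every container pointing at both
    // parts directly or, if useProxy is set, at their proxies instead.
    static Node buildChain(int containers, boolean useProxy) {
        Node p1 = new Node("Part1");
        Node p2 = new Node("Part2");
        connect(p1, "HASPART", p2);
        Node target1 = useProxy ? proxyFor(p1) : p1;
        Node target2 = useProxy ? proxyFor(p2) : p2;
        for (int i = 0; i < containers; i++) {
            Node c = new Node("Container" + i);
            connect(c, "CONTAINS", target1);
            connect(c, "CONTAINS", target2);
        }
        return p1;
    }

    // Follows outgoing edges of the given type along an acyclic chain and
    // counts how many edges had to be examined on the way.
    static int edgesScannedFollowing(Node start, String type) {
        int scanned = 0;
        Node current = start;
        while (current != null) {
            Node next = null;
            for (Edge e : current.edges) {
                scanned++;
                if (next == null && e.from == current && e.type.equals(type)) {
                    next = e.to;
                }
            }
            current = next;
        }
        return scanned;
    }

    public static void main(String[] args) {
        System.out.println("direct: "
                + edgesScannedFollowing(buildChain(100_000, false), "HASPART"));
        System.out.println("proxy:  "
                + edgesScannedFollowing(buildChain(100_000, true), "HASPART"));
    }
}
```

With proxies, the HASPART walk only ever sees each part's own handful of edges; the trade-off, as noted above, is a second node per part and one extra hop whenever the containers of a part are needed.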
Date: Wed, 15 Jun 2011 18:40:05 +0300
From: Agelos Pikoulas <[email protected]>
Subject: Re: [Neo4j] Slow Traversals on Nodes with too many
Relationships
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1
I have to respectfully agree with Rick Bullotta.
I suspected that the big-O is not linear for this case.
To verify, I quadrupled the Container nodes (to 400.000) and added their
Relationships, and it is now *unbelievably* slow:
it does not take just 4x longer, it takes more than 30-40 seconds for each
next(). Bear in mind that with 100K nodes it took ~2 secs for each next()!!!
And to make matters worse, the subsequent runs weren't fast either;
they actually took more time than the first
(1st TotalTraversalTime = 389936 ms, 2nd TotalTraversalTime = 443948 ms).
The whole setup is running on
Eclipse 3.6, with -Xmx512m on JavaVM,
Windows2003 VMWare machine with 4GB, running on a fast 2nd gen SSD (OCZ
Vertex 2). The neo4J data resides on this SSD.
The 100.000 nodes data files were ~250MB, the 400.000 one is ~1GB.
I wonder what would happen if the Container nodes were a few million (which
will be my case) - it will run forever.
Could you please look into my suggestion, i.e. "Using a 'smart' behind
the scenes Indexing on both *RelationshipType* and *Direction* that
Traversals actually use to boost things up"?
On another topic, how does one use this mailing list? I use it through
Gmail and I am utterly lost. Is there a better client/UI to actually
post/reply into threads?
------------------------------
Message: 1
Date: Wed, 15 Jun 2011 07:27:26 -0700
From: Rick Bullotta <[email protected]>
Subject: Re: [Neo4j] Slow Traversals on Nodes with too many
Relationships
To: Neo4j user discussions <[email protected]>
Message-ID: <09df3402c845ec489a3323a06208f20d0a9d4...@p3pw5ex1mb14.ex1.secureserver.net>
Content-Type: text/plain; charset="us-ascii"
I would respectfully disagree that it doesn't necessarily represent
production usage, since in some cases, each query/traversal will be unique
and isolated to a part of a subgraph, so in some cases, a "cold" query may
be the norm....
-----Original Message-----
From: [email protected] [mailto:[email protected]] On
Behalf Of Michael Hunger
Sent: Wednesday, June 15, 2011 10:25 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Slow Traversals on Nodes with too many Relationships
That is rather a case of warming up your caches.
Determining the traversal speed from the first run is not a good benchmark
as it doesn't represent production usage :)
The same (warming up) is true for all kinds of benchmarks (except for
startup performance benchmarks).
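
Since both cold and warm timings matter in this thread, one way to settle it is to record them separately rather than quote a single number. The following is a rough sketch in plain Java; the class and method names are hypothetical, and placeholderWork merely stands in for the real traversal. For serious measurements a benchmark harness is a better choice than hand-rolled timing.

```java
import java.util.Arrays;

// Hypothetical timing helper: keeps the cold first run and the warm runs apart.
class WarmupTimingSketch {
    // Written to so the JIT cannot discard the placeholder workload.
    static volatile long sink;

    // Runs the workload `runs` times and records each duration in nanoseconds.
    static long[] timeRuns(Runnable work, int runs) {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            work.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    // Mean of the runs that remain after skipping the first `warmupRuns` entries
    // (NaN if nothing remains).
    static double meanAfterWarmup(long[] nanos, int warmupRuns) {
        return Arrays.stream(nanos).skip(warmupRuns).average().orElse(Double.NaN);
    }

    // Stand-in for the traversal being measured.
    static void placeholderWork() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i;
        }
        sink = sum;
    }

    public static void main(String[] args) {
        long[] t = timeRuns(WarmupTimingSketch::placeholderWork, 6);
        System.out.println("cold first run (ns): " + t[0]);
        System.out.println("mean of warm runs (ns): " + meanAfterWarmup(t, 1));
    }
}
```

Reporting both figures lets each reader pick the one that matches their workload: the warm mean for repeated queries over the same subgraph, the cold run for one-off queries.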
Cheers
Michael
On 15.06.2011 at 14:48, Agelos Pikoulas wrote:
> I have a few "Part" nodes related to each other via "HASPART"
> relationships/edges
> (e.g. Part1---HASPART--->Part2---HASPART--->Part3 etc.).
> TraversalDescription works fine, following each Part's outgoing HASPART
> relationship.
>
> Then I add a large number (say 100.000) of "Container" Nodes, where each
> "Container" has a "CONTAINS" relation to almost *every* "Part" node.
> Hence each Part node now has 100.000 incoming CONTAINS relationships from
> Container nodes,
> but only a few outgoing HASPART relationships to other Part nodes.
>
> Now my previous TraversalDescription runs extremely slowly (several seconds
> inside each Iterator<Path>.next() call).
> Note that I do define relationships(RT.HASPART, Direction.OUTGOING) on the
> TraversalDescription,
> but it seems it's not used by neo4j as a hint. Note that on a subsequent run
> of the same Traversal, it's very quick indeed.
>
> Is there any way to use Indexing on relationships for such a scenario, to
> boost things up?
>
> Ideally, the Traversal framework could use automatic/declarative indexing on
> Node Relationship types and/or direction to perform such traversals quicker.
>
> Regards
> _______________________________________________
> Neo4j mailing list
> [email protected]
> https://lists.neo4j.org/mailman/listinfo/user