Re: [Neo4j] How to traverse by the number of relationships between nodes?

2010-07-09 Thread Max De Marzi Jr.
Can you expand on this a bit... as to what the graph internals are doing:

Option 1:
You have "colored" relationships (RED, BLUE, GREEN, etc to 10k colors).
>From a random node, you traverse the graph finding all nodes that it is
connected to via the PURPLE or FUSIA relationship.

vs

Option 2:
You have a COLOR relationship with a name property that contains the actual
color name.
>From a random node, you traverse the graph finding all nodes that it is
connected to via the COLOR relationship containing the name "PURPLE" or
"FUSIA" in the relationship.


For some reason I thought it was more expensive (in terms of traversal time)
to look up a property on a relationship than to simply pass a named
relationship type.


On Fri, Jul 9, 2010 at 8:45 AM, Johan Svensson wrote:

> Hi,
>
> I would not recommend to use large amounts of different (dynamically
> created) relationship types. It is better to use well defined
> relationship types with an additional property on the relationship
> whenever needed. The limit is actually not 64k but 2^31, but having
> large amounts of relationship types like 10k-100k+ will reduce
> performance and consume a lot of memory.
>
> Regards,
> Johan
>
> On Thu, Jul 8, 2010 at 4:13 PM, Max De Marzi Jr. 
> wrote:
> > Can somebody verify the max number of relationship types? If it is 64k,
> is
> > there a way to increase it without significant effort?
> >
> >
> >>  I believe you can have something like 64k
> >> relationship types, so using the relationship type for the route name is
> >> possible.
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] How to traverse by the number of relationships between nodes?

2010-07-09 Thread Johan Svensson
Hi,

I would not recommend to use large amounts of different (dynamically
created) relationship types. It is better to use well defined
relationship types with an additional property on the relationship
whenever needed. The limit is actually not 64k but 2^31, but having
large amounts of relationship types like 10k-100k+ will reduce
performance and consume a lot of memory.

Regards,
Johan

On Thu, Jul 8, 2010 at 4:13 PM, Max De Marzi Jr.  wrote:
> Can somebody verify the max number of relationship types? If it is 64k, is
> there a way to increase it without significant effort?
>
>
>>  I believe you can have something like 64k
>> relationship types, so using the relationship type for the route name is
>> possible.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] How to traverse by the number of relationships between nodes?

2010-07-09 Thread Tim Jones
Hi Craig,

That's great, thanks a lot. I'll give it a go.

Cheers, 
Tim



- Original Message 
From: Craig Taverner 
To: Neo4j user discussions 
Sent: Thu, July 8, 2010 8:49:38 PM
Subject: Re: [Neo4j] How to traverse by the number of relationships between 
nodes?

Hi Tim,

It is exactly the same approach, but instead of building the route cache
while loading the graph, just do it on a second pass which traverses the
graph. If the graph is structured like you describe, then write a traverser
that visits each Visit node once, and for each Visit node iterate over the
Page relationships, creating an Array of relationships. Sort the array by
your visit order property and you have the route cache. Step backwards
through the route creating the route relationships as described before.

Cheers, Craig

On Thu, Jul 8, 2010 at 5:33 PM, Tim Jones  wrote:

> Hi Craig, thanks for your answer.
>
> What's your approach that would allow me to specify the destination node at
> analysis time? I'd like to retain the flexibility to do this too.
>
> Thanks
> Tim
>
>
>
> - Original Message 
> From: Craig Taverner 
> To: Neo4j user discussions 
> Sent: Thu, July 8, 2010 12:09:28 PM
> Subject: Re: [Neo4j] How to traverse by the number of relationships between
> nodes?
>
> Even without the new traversal framework, the returnable evaluator has
> access to the current node being evaluated and can investigate it's
> relationships (and even run another traverser). I'm not sure if nested
> traversing is a good idea, but I certainly have used methods like
> getRelationships inside an evaluator with no problems.
>
> As for the main goal, I think there are many ways to skin a cat. For
> performance reasons I would always look for the way that embeds the final
> result in the graph structure itself, so you don't need complex traversals
> to get your answer. So in your case you want the 10 most popular routes, I
> guess what you are looking for are relationships between pages that define
> a
> route and a popularity score. So the final answer would be found by simply
> sorting these relationships to the destination page by popularity. No
> traversal required :-)
>
> Your current structure is a good match for the incoming data, but requires
> lots of traversing to determine the main answer you are after. So I would
> vote for adding a new structure that includes the answer. I think I have an
> idea that can be done during load if you know in advance the destination
> node you want to analyse, as well as after load (second pass) if you want
> to
> specify the destination node only at analysis time. I'll describe the
> 'during load' approach.
>
> Load the apache log data, optionally building the structure you do now, but
> also identifying all start points and routes to the destination. This can
> be
> achieved by an in memory cache for each user session (visit) of the route
> from the entry point, appended to as each new page is visited (just an
> ArrayList of page Nodes, growing page-by-page), and when the destination
> Page is reached, create a unique identifier for that route (eg. a string of
> all node-ids in the route, or the hashcode of that). Then step back along
> all nodes in the route, adding relations with
> DynamicRelationshipType.withName("ROUTE-"+routeName) and property count=1,
> and if the relationship already exists for that name, increment the count.
>
> You can even load later apache logs to this and it will continue to
> incremement the route counters nicely. And to reset the counters, just
> delete all those route relationships.
>
> Now the final answer for your query is only to iterate over all incoming
> relationships to the destination page, and if the relationship type name
> starts with 'ROUTE-' add to an ArrayList of relationships, and then sort
> that list by the counter property. This should be almost instantaneous
> result :-)
>
> Of course, this algorithm assumes that the total number of possible routes
> is not unreasonably high. I believe you can have something like 64k
> relationship types, so using the relationship type for the route name is
> possible. If you are uncomfortable with that, just use a static type like
> 'ROUTE', and put the relationship name in a relationship property. That
> slightly increases the complexity of the test for the route during creation
> and slightly decreases the complecity of the test for the route during the
> final scoring. For this example, the performance difference is
> insignificant.
>
> Cheers, Craig
>
>
> On Thu, Jul 8, 2010 at 10:57 AM, Anders Nawroth  >wrote:
>
> > Hi Tim!
> >
> > Maybe you can use the new t

Re: [Neo4j] How to traverse by the number of relationships between nodes?

2010-07-08 Thread Craig Taverner
Hi Tim,

It is exactly the same approach, but instead of building the route cache
while loading the graph, just do it on a second pass which traverses the
graph. If the graph is structured like you describe, then write a traverser
that visits each Visit node once, and for each Visit node iterate over the
Page relationships, creating an Array of relationships. Sort the array by
your visit order property and you have the route cache. Step backwards
through the route creating the route relationships as described before.

Cheers, Craig

On Thu, Jul 8, 2010 at 5:33 PM, Tim Jones  wrote:

> Hi Craig, thanks for your answer.
>
> What's your approach that would allow me to specify the destination node at
> analysis time? I'd like to retain the flexibility to do this too.
>
> Thanks
> Tim
>
>
>
> - Original Message 
> From: Craig Taverner 
> To: Neo4j user discussions 
> Sent: Thu, July 8, 2010 12:09:28 PM
> Subject: Re: [Neo4j] How to traverse by the number of relationships between
> nodes?
>
> Even without the new traversal framework, the returnable evaluator has
> access to the current node being evaluated and can investigate it's
> relationships (and even run another traverser). I'm not sure if nested
> traversing is a good idea, but I certainly have used methods like
> getRelationships inside an evaluator with no problems.
>
> As for the main goal, I think there are many ways to skin a cat. For
> performance reasons I would always look for the way that embeds the final
> result in the graph structure itself, so you don't need complex traversals
> to get your answer. So in your case you want the 10 most popular routes, I
> guess what you are looking for are relationships between pages that define
> a
> route and a popularity score. So the final answer would be found by simply
> sorting these relationships to the destination page by popularity. No
> traversal required :-)
>
> Your current structure is a good match for the incoming data, but requires
> lots of traversing to determine the main answer you are after. So I would
> vote for adding a new structure that includes the answer. I think I have an
> idea that can be done during load if you know in advance the destination
> node you want to analyse, as well as after load (second pass) if you want
> to
> specify the destination node only at analysis time. I'll describe the
> 'during load' approach.
>
> Load the apache log data, optionally building the structure you do now, but
> also identifying all start points and routes to the destination. This can
> be
> achieved by an in memory cache for each user session (visit) of the route
> from the entry point, appended to as each new page is visited (just an
> ArrayList of page Nodes, growing page-by-page), and when the destination
> Page is reached, create a unique identifier for that route (eg. a string of
> all node-ids in the route, or the hashcode of that). Then step back along
> all nodes in the route, adding relations with
> DynamicRelationshipType.withName("ROUTE-"+routeName) and property count=1,
> and if the relationship already exists for that name, increment the count.
>
> You can even load later apache logs to this and it will continue to
> incremement the route counters nicely. And to reset the counters, just
> delete all those route relationships.
>
> Now the final answer for your query is only to iterate over all incoming
> relationships to the destination page, and if the relationship type name
> starts with 'ROUTE-' add to an ArrayList of relationships, and then sort
> that list by the counter property. This should be almost instantaneous
> result :-)
>
> Of course, this algorithm assumes that the total number of possible routes
> is not unreasonably high. I believe you can have something like 64k
> relationship types, so using the relationship type for the route name is
> possible. If you are uncomfortable with that, just use a static type like
> 'ROUTE', and put the relationship name in a relationship property. That
> slightly increases the complexity of the test for the route during creation
> and slightly decreases the complecity of the test for the route during the
> final scoring. For this example, the performance difference is
> insignificant.
>
> Cheers, Craig
>
>
> On Thu, Jul 8, 2010 at 10:57 AM, Anders Nawroth  >wrote:
>
> > Hi Tim!
> >
> > Maybe you can use the new traversal framework, this interface comes to
> > mind:
> >
> >
> http://components.neo4j.org/neo4j-kernel/apidocs/org/neo4j/graphdb/traversal/SourceSelector.html
> >l
> >
> > Regarding the number of relationships, it could be a good idea to store
> &

Re: [Neo4j] How to traverse by the number of relationships between nodes?

2010-07-08 Thread Tim Jones
Hi Craig, thanks for your answer.

What's your approach that would allow me to specify the destination node at 
analysis time? I'd like to retain the flexibility to do this too.

Thanks
Tim



- Original Message 
From: Craig Taverner 
To: Neo4j user discussions 
Sent: Thu, July 8, 2010 12:09:28 PM
Subject: Re: [Neo4j] How to traverse by the number of relationships between 
nodes?

Even without the new traversal framework, the returnable evaluator has
access to the current node being evaluated and can investigate it's
relationships (and even run another traverser). I'm not sure if nested
traversing is a good idea, but I certainly have used methods like
getRelationships inside an evaluator with no problems.

As for the main goal, I think there are many ways to skin a cat. For
performance reasons I would always look for the way that embeds the final
result in the graph structure itself, so you don't need complex traversals
to get your answer. So in your case you want the 10 most popular routes, I
guess what you are looking for are relationships between pages that define a
route and a popularity score. So the final answer would be found by simply
sorting these relationships to the destination page by popularity. No
traversal required :-)

Your current structure is a good match for the incoming data, but requires
lots of traversing to determine the main answer you are after. So I would
vote for adding a new structure that includes the answer. I think I have an
idea that can be done during load if you know in advance the destination
node you want to analyse, as well as after load (second pass) if you want to
specify the destination node only at analysis time. I'll describe the
'during load' approach.

Load the apache log data, optionally building the structure you do now, but
also identifying all start points and routes to the destination. This can be
achieved by an in memory cache for each user session (visit) of the route
from the entry point, appended to as each new page is visited (just an
ArrayList of page Nodes, growing page-by-page), and when the destination
Page is reached, create a unique identifier for that route (eg. a string of
all node-ids in the route, or the hashcode of that). Then step back along
all nodes in the route, adding relations with
DynamicRelationshipType.withName("ROUTE-"+routeName) and property count=1,
and if the relationship already exists for that name, increment the count.

You can even load later apache logs to this and it will continue to
incremement the route counters nicely. And to reset the counters, just
delete all those route relationships.

Now the final answer for your query is only to iterate over all incoming
relationships to the destination page, and if the relationship type name
starts with 'ROUTE-' add to an ArrayList of relationships, and then sort
that list by the counter property. This should be almost instantaneous
result :-)

Of course, this algorithm assumes that the total number of possible routes
is not unreasonably high. I believe you can have something like 64k
relationship types, so using the relationship type for the route name is
possible. If you are uncomfortable with that, just use a static type like
'ROUTE', and put the relationship name in a relationship property. That
slightly increases the complexity of the test for the route during creation
and slightly decreases the complecity of the test for the route during the
final scoring. For this example, the performance difference is
insignificant.

Cheers, Craig


On Thu, Jul 8, 2010 at 10:57 AM, Anders Nawroth wrote:

> Hi Tim!
>
> Maybe you can use the new traversal framework, this interface comes to
> mind:
>
>http://components.neo4j.org/neo4j-kernel/apidocs/org/neo4j/graphdb/traversal/SourceSelector.html
>l
>
> Regarding the number of relationships, it could be a good idea to store
> it as a property on the node.
>
> /anders
>
> > Is there any way I can write a ReturnableEvaluator that can examine the
> > collection of nodes related to the current node? Is this even the correct
> > approach?
> >
> > I actually want to be able to return the 10 most popular routes to the
> > registration page. For the most popular, I can use the above algorithm,
> but for
> > the second it's going to be more tricky.
> >
> > Would I be able to search for all 10 routes in a single pass, or should I
> > perform multiple passes?
> >
> > Any help  would be really appreciated since I'm not really sure how to
> approach
> > this.
> >
> > Thanks,
> > Tim
> >
> >
> >
> >
> > ___
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
> ___

Re: [Neo4j] How to traverse by the number of relationships between nodes?

2010-07-08 Thread Max De Marzi Jr.
Can somebody verify the max number of relationship types? If it is 64k, is
there a way to increase it without significant effort?


>  I believe you can have something like 64k
> relationship types, so using the relationship type for the route name is
> possible.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] How to traverse by the number of relationships between nodes?

2010-07-08 Thread Craig Taverner
Even without the new traversal framework, the returnable evaluator has
access to the current node being evaluated and can investigate it's
relationships (and even run another traverser). I'm not sure if nested
traversing is a good idea, but I certainly have used methods like
getRelationships inside an evaluator with no problems.

As for the main goal, I think there are many ways to skin a cat. For
performance reasons I would always look for the way that embeds the final
result in the graph structure itself, so you don't need complex traversals
to get your answer. So in your case you want the 10 most popular routes, I
guess what you are looking for are relationships between pages that define a
route and a popularity score. So the final answer would be found by simply
sorting these relationships to the destination page by popularity. No
traversal required :-)

Your current structure is a good match for the incoming data, but requires
lots of traversing to determine the main answer you are after. So I would
vote for adding a new structure that includes the answer. I think I have an
idea that can be done during load if you know in advance the destination
node you want to analyse, as well as after load (second pass) if you want to
specify the destination node only at analysis time. I'll describe the
'during load' approach.

Load the apache log data, optionally building the structure you do now, but
also identifying all start points and routes to the destination. This can be
achieved by an in memory cache for each user session (visit) of the route
from the entry point, appended to as each new page is visited (just an
ArrayList of page Nodes, growing page-by-page), and when the destination
Page is reached, create a unique identifier for that route (eg. a string of
all node-ids in the route, or the hashcode of that). Then step back along
all nodes in the route, adding relations with
DynamicRelationshipType.withName("ROUTE-"+routeName) and property count=1,
and if the relationship already exists for that name, increment the count.

You can even load later apache logs to this and it will continue to
incremement the route counters nicely. And to reset the counters, just
delete all those route relationships.

Now the final answer for your query is only to iterate over all incoming
relationships to the destination page, and if the relationship type name
starts with 'ROUTE-' add to an ArrayList of relationships, and then sort
that list by the counter property. This should be almost instantaneous
result :-)

Of course, this algorithm assumes that the total number of possible routes
is not unreasonably high. I believe you can have something like 64k
relationship types, so using the relationship type for the route name is
possible. If you are uncomfortable with that, just use a static type like
'ROUTE', and put the relationship name in a relationship property. That
slightly increases the complexity of the test for the route during creation
and slightly decreases the complecity of the test for the route during the
final scoring. For this example, the performance difference is
insignificant.

Cheers, Craig


On Thu, Jul 8, 2010 at 10:57 AM, Anders Nawroth wrote:

> Hi Tim!
>
> Maybe you can use the new traversal framework, this interface comes to
> mind:
>
> http://components.neo4j.org/neo4j-kernel/apidocs/org/neo4j/graphdb/traversal/SourceSelector.html
>
> Regarding the number of relationships, it could be a good idea to store
> it as a property on the node.
>
> /anders
>
> > Is there any way I can write a ReturnableEvaluator that can examine the
> > collection of nodes related to the current node? Is this even the correct
> > approach?
> >
> > I actually want to be able to return the 10 most popular routes to the
> > registration page. For the most popular, I can use the above algorithm,
> but for
> > the second it's going to be more tricky.
> >
> > Would I be able to search for all 10 routes in a single pass, or should I
> > perform multiple passes?
> >
> > Any help  would be really appreciated since I'm not really sure how to
> approach
> > this.
> >
> > Thanks,
> > Tim
> >
> >
> >
> >
> > ___
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] How to traverse by the number of relationships between nodes?

2010-07-08 Thread Anders Nawroth
Hi Tim!

Maybe you can use the new traversal framework, this interface comes to mind:
http://components.neo4j.org/neo4j-kernel/apidocs/org/neo4j/graphdb/traversal/SourceSelector.html

Regarding the number of relationships, it could be a good idea to store 
it as a property on the node.

/anders

> Is there any way I can write a ReturnableEvaluator that can examine the
> collection of nodes related to the current node? Is this even the correct
> approach?
>
> I actually want to be able to return the 10 most popular routes to the
> registration page. For the most popular, I can use the above algorithm, but 
> for
> the second it's going to be more tricky.
>
> Would I be able to search for all 10 routes in a single pass, or should I
> perform multiple passes?
>
> Any help  would be really appreciated since I'm not really sure how to 
> approach
> this.
>
> Thanks,
> Tim
>
>
>
>
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo4j] How to traverse by the number of relationships between nodes?

2010-07-07 Thread Logo Bogo
Hi,

I'm trying to use neo4j to work out what the most popular routes are through 
our 

web site that result in users registering.

I have Visitor nodes related with multiple Visit nodes. Each Visit is then 
related with multiple Page nodes by a relationship VISITED_STEP which indicates 
when the Visitor went to that Page in their visit, e.g. step 0 = the first page 
visited, 1 = the next, etc.

The algorithm I need will let me start at the registration page, and then 
traverse outwards. The next node it should visit would be the one with the 
greatest total number of incoming VISIT_STEP relationships. This node will be 
added to a data structure for later retrieval. From there, it should move on to 
the next node that has the greatest total number of incoming VISIT_STEP 
relationships, and again add the node to a data structure.

Is there any way I can write a ReturnableEvaluator that can examine the 
collection of nodes related to the current node? Is this even the correct 
approach?

I actually want to be able to return the 10 most popular routes to the 
registration page. For the most popular, I can use the above algorithm, but for 
the second it's going to be more tricky.

Would I be able to search for all 10 routes in a single pass, or should I 
perform multiple passes?

Any help  would be really appreciated since I'm not really sure how to approach 
this.

Thanks,
Tim



  
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user