Re: [Neo4j] How to traverse by the number of relationships between nodes?
Can you expand on this a bit... as to what the graph internals are doing: Option 1: You have "colored" relationships (RED, BLUE, GREEN, etc to 10k colors). >From a random node, you traverse the graph finding all nodes that it is connected to via the PURPLE or FUSIA relationship. vs Option 2: You have a COLOR relationship with a name property that contains the actual color name. >From a random node, you traverse the graph finding all nodes that it is connected to via the COLOR relationship containing the name "PURPLE" or "FUSIA" in the relationship. For some reason I thought it was more expensive (in terms of traversal time) to look up a property on a relationship than to simply pass a named relationship type. On Fri, Jul 9, 2010 at 8:45 AM, Johan Svensson wrote: > Hi, > > I would not recommend to use large amounts of different (dynamically > created) relationship types. It is better to use well defined > relationship types with an additional property on the relationship > whenever needed. The limit is actually not 64k but 2^31, but having > large amounts of relationship types like 10k-100k+ will reduce > performance and consume a lot of memory. > > Regards, > Johan > > On Thu, Jul 8, 2010 at 4:13 PM, Max De Marzi Jr. > wrote: > > Can somebody verify the max number of relationship types? If it is 64k, > is > > there a way to increase it without significant effort? > > > > > >> I believe you can have something like 64k > >> relationship types, so using the relationship type for the route name is > >> possible. > ___ > Neo4j mailing list > User@lists.neo4j.org > https://lists.neo4j.org/mailman/listinfo/user > ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] How to traverse by the number of relationships between nodes?
Hi, I would not recommend to use large amounts of different (dynamically created) relationship types. It is better to use well defined relationship types with an additional property on the relationship whenever needed. The limit is actually not 64k but 2^31, but having large amounts of relationship types like 10k-100k+ will reduce performance and consume a lot of memory. Regards, Johan On Thu, Jul 8, 2010 at 4:13 PM, Max De Marzi Jr. wrote: > Can somebody verify the max number of relationship types? If it is 64k, is > there a way to increase it without significant effort? > > >> I believe you can have something like 64k >> relationship types, so using the relationship type for the route name is >> possible. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] How to traverse by the number of relationships between nodes?
Hi Craig, That's great, thanks a lot. I'll give it a go. Cheers, Tim - Original Message From: Craig Taverner To: Neo4j user discussions Sent: Thu, July 8, 2010 8:49:38 PM Subject: Re: [Neo4j] How to traverse by the number of relationships between nodes? Hi Tim, It is exactly the same approach, but instead of building the route cache while loading the graph, just do it on a second pass which traverses the graph. If the graph is structured like you describe, then write a traverser that visits each Visit node once, and for each Visit node iterate over the Page relationships, creating an Array of relationships. Sort the array by your visit order property and you have the route cache. Step backwards through the route creating the route relationships as described before. Cheers, Craig On Thu, Jul 8, 2010 at 5:33 PM, Tim Jones wrote: > Hi Craig, thanks for your answer. > > What's your approach that would allow me to specify the destination node at > analysis time? I'd like to retain the flexibility to do this too. > > Thanks > Tim > > > > - Original Message > From: Craig Taverner > To: Neo4j user discussions > Sent: Thu, July 8, 2010 12:09:28 PM > Subject: Re: [Neo4j] How to traverse by the number of relationships between > nodes? > > Even without the new traversal framework, the returnable evaluator has > access to the current node being evaluated and can investigate it's > relationships (and even run another traverser). I'm not sure if nested > traversing is a good idea, but I certainly have used methods like > getRelationships inside an evaluator with no problems. > > As for the main goal, I think there are many ways to skin a cat. For > performance reasons I would always look for the way that embeds the final > result in the graph structure itself, so you don't need complex traversals > to get your answer. So in your case you want the 10 most popular routes, I > guess what you are looking for are relationships between pages that define > a > route and a popularity score. So the final answer would be found by simply > sorting these relationships to the destination page by popularity. No > traversal required :-) > > Your current structure is a good match for the incoming data, but requires > lots of traversing to determine the main answer you are after. So I would > vote for adding a new structure that includes the answer. I think I have an > idea that can be done during load if you know in advance the destination > node you want to analyse, as well as after load (second pass) if you want > to > specify the destination node only at analysis time. I'll describe the > 'during load' approach. > > Load the apache log data, optionally building the structure you do now, but > also identifying all start points and routes to the destination. This can > be > achieved by an in memory cache for each user session (visit) of the route > from the entry point, appended to as each new page is visited (just an > ArrayList of page Nodes, growing page-by-page), and when the destination > Page is reached, create a unique identifier for that route (eg. a string of > all node-ids in the route, or the hashcode of that). Then step back along > all nodes in the route, adding relations with > DynamicRelationshipType.withName("ROUTE-"+routeName) and property count=1, > and if the relationship already exists for that name, increment the count. > > You can even load later apache logs to this and it will continue to > incremement the route counters nicely. And to reset the counters, just > delete all those route relationships. > > Now the final answer for your query is only to iterate over all incoming > relationships to the destination page, and if the relationship type name > starts with 'ROUTE-' add to an ArrayList of relationships, and then sort > that list by the counter property. This should be almost instantaneous > result :-) > > Of course, this algorithm assumes that the total number of possible routes > is not unreasonably high. I believe you can have something like 64k > relationship types, so using the relationship type for the route name is > possible. If you are uncomfortable with that, just use a static type like > 'ROUTE', and put the relationship name in a relationship property. That > slightly increases the complexity of the test for the route during creation > and slightly decreases the complecity of the test for the route during the > final scoring. For this example, the performance difference is > insignificant. > > Cheers, Craig > > > On Thu, Jul 8, 2010 at 10:57 AM, Anders Nawroth >wrote: > > > Hi Tim! > > > > Maybe you can use the new t
Re: [Neo4j] How to traverse by the number of relationships between nodes?
Hi Tim, It is exactly the same approach, but instead of building the route cache while loading the graph, just do it on a second pass which traverses the graph. If the graph is structured like you describe, then write a traverser that visits each Visit node once, and for each Visit node iterate over the Page relationships, creating an Array of relationships. Sort the array by your visit order property and you have the route cache. Step backwards through the route creating the route relationships as described before. Cheers, Craig On Thu, Jul 8, 2010 at 5:33 PM, Tim Jones wrote: > Hi Craig, thanks for your answer. > > What's your approach that would allow me to specify the destination node at > analysis time? I'd like to retain the flexibility to do this too. > > Thanks > Tim > > > > - Original Message > From: Craig Taverner > To: Neo4j user discussions > Sent: Thu, July 8, 2010 12:09:28 PM > Subject: Re: [Neo4j] How to traverse by the number of relationships between > nodes? > > Even without the new traversal framework, the returnable evaluator has > access to the current node being evaluated and can investigate it's > relationships (and even run another traverser). I'm not sure if nested > traversing is a good idea, but I certainly have used methods like > getRelationships inside an evaluator with no problems. > > As for the main goal, I think there are many ways to skin a cat. For > performance reasons I would always look for the way that embeds the final > result in the graph structure itself, so you don't need complex traversals > to get your answer. So in your case you want the 10 most popular routes, I > guess what you are looking for are relationships between pages that define > a > route and a popularity score. So the final answer would be found by simply > sorting these relationships to the destination page by popularity. No > traversal required :-) > > Your current structure is a good match for the incoming data, but requires > lots of traversing to determine the main answer you are after. So I would > vote for adding a new structure that includes the answer. I think I have an > idea that can be done during load if you know in advance the destination > node you want to analyse, as well as after load (second pass) if you want > to > specify the destination node only at analysis time. I'll describe the > 'during load' approach. > > Load the apache log data, optionally building the structure you do now, but > also identifying all start points and routes to the destination. This can > be > achieved by an in memory cache for each user session (visit) of the route > from the entry point, appended to as each new page is visited (just an > ArrayList of page Nodes, growing page-by-page), and when the destination > Page is reached, create a unique identifier for that route (eg. a string of > all node-ids in the route, or the hashcode of that). Then step back along > all nodes in the route, adding relations with > DynamicRelationshipType.withName("ROUTE-"+routeName) and property count=1, > and if the relationship already exists for that name, increment the count. > > You can even load later apache logs to this and it will continue to > incremement the route counters nicely. And to reset the counters, just > delete all those route relationships. > > Now the final answer for your query is only to iterate over all incoming > relationships to the destination page, and if the relationship type name > starts with 'ROUTE-' add to an ArrayList of relationships, and then sort > that list by the counter property. This should be almost instantaneous > result :-) > > Of course, this algorithm assumes that the total number of possible routes > is not unreasonably high. I believe you can have something like 64k > relationship types, so using the relationship type for the route name is > possible. If you are uncomfortable with that, just use a static type like > 'ROUTE', and put the relationship name in a relationship property. That > slightly increases the complexity of the test for the route during creation > and slightly decreases the complecity of the test for the route during the > final scoring. For this example, the performance difference is > insignificant. > > Cheers, Craig > > > On Thu, Jul 8, 2010 at 10:57 AM, Anders Nawroth >wrote: > > > Hi Tim! > > > > Maybe you can use the new traversal framework, this interface comes to > > mind: > > > > > http://components.neo4j.org/neo4j-kernel/apidocs/org/neo4j/graphdb/traversal/SourceSelector.html > >l > > > > Regarding the number of relationships, it could be a good idea to store > &
Re: [Neo4j] How to traverse by the number of relationships between nodes?
Hi Craig, thanks for your answer. What's your approach that would allow me to specify the destination node at analysis time? I'd like to retain the flexibility to do this too. Thanks Tim - Original Message From: Craig Taverner To: Neo4j user discussions Sent: Thu, July 8, 2010 12:09:28 PM Subject: Re: [Neo4j] How to traverse by the number of relationships between nodes? Even without the new traversal framework, the returnable evaluator has access to the current node being evaluated and can investigate it's relationships (and even run another traverser). I'm not sure if nested traversing is a good idea, but I certainly have used methods like getRelationships inside an evaluator with no problems. As for the main goal, I think there are many ways to skin a cat. For performance reasons I would always look for the way that embeds the final result in the graph structure itself, so you don't need complex traversals to get your answer. So in your case you want the 10 most popular routes, I guess what you are looking for are relationships between pages that define a route and a popularity score. So the final answer would be found by simply sorting these relationships to the destination page by popularity. No traversal required :-) Your current structure is a good match for the incoming data, but requires lots of traversing to determine the main answer you are after. So I would vote for adding a new structure that includes the answer. I think I have an idea that can be done during load if you know in advance the destination node you want to analyse, as well as after load (second pass) if you want to specify the destination node only at analysis time. I'll describe the 'during load' approach. Load the apache log data, optionally building the structure you do now, but also identifying all start points and routes to the destination. This can be achieved by an in memory cache for each user session (visit) of the route from the entry point, appended to as each new page is visited (just an ArrayList of page Nodes, growing page-by-page), and when the destination Page is reached, create a unique identifier for that route (eg. a string of all node-ids in the route, or the hashcode of that). Then step back along all nodes in the route, adding relations with DynamicRelationshipType.withName("ROUTE-"+routeName) and property count=1, and if the relationship already exists for that name, increment the count. You can even load later apache logs to this and it will continue to incremement the route counters nicely. And to reset the counters, just delete all those route relationships. Now the final answer for your query is only to iterate over all incoming relationships to the destination page, and if the relationship type name starts with 'ROUTE-' add to an ArrayList of relationships, and then sort that list by the counter property. This should be almost instantaneous result :-) Of course, this algorithm assumes that the total number of possible routes is not unreasonably high. I believe you can have something like 64k relationship types, so using the relationship type for the route name is possible. If you are uncomfortable with that, just use a static type like 'ROUTE', and put the relationship name in a relationship property. That slightly increases the complexity of the test for the route during creation and slightly decreases the complecity of the test for the route during the final scoring. For this example, the performance difference is insignificant. Cheers, Craig On Thu, Jul 8, 2010 at 10:57 AM, Anders Nawroth wrote: > Hi Tim! > > Maybe you can use the new traversal framework, this interface comes to > mind: > >http://components.neo4j.org/neo4j-kernel/apidocs/org/neo4j/graphdb/traversal/SourceSelector.html >l > > Regarding the number of relationships, it could be a good idea to store > it as a property on the node. > > /anders > > > Is there any way I can write a ReturnableEvaluator that can examine the > > collection of nodes related to the current node? Is this even the correct > > approach? > > > > I actually want to be able to return the 10 most popular routes to the > > registration page. For the most popular, I can use the above algorithm, > but for > > the second it's going to be more tricky. > > > > Would I be able to search for all 10 routes in a single pass, or should I > > perform multiple passes? > > > > Any help would be really appreciated since I'm not really sure how to > approach > > this. > > > > Thanks, > > Tim > > > > > > > > > > ___ > > Neo4j mailing list > > User@lists.neo4j.org > > https://lists.neo4j.org/mailman/listinfo/user > ___
Re: [Neo4j] How to traverse by the number of relationships between nodes?
Can somebody verify the max number of relationship types? If it is 64k, is there a way to increase it without significant effort? > I believe you can have something like 64k > relationship types, so using the relationship type for the route name is > possible. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] How to traverse by the number of relationships between nodes?
Even without the new traversal framework, the returnable evaluator has access to the current node being evaluated and can investigate it's relationships (and even run another traverser). I'm not sure if nested traversing is a good idea, but I certainly have used methods like getRelationships inside an evaluator with no problems. As for the main goal, I think there are many ways to skin a cat. For performance reasons I would always look for the way that embeds the final result in the graph structure itself, so you don't need complex traversals to get your answer. So in your case you want the 10 most popular routes, I guess what you are looking for are relationships between pages that define a route and a popularity score. So the final answer would be found by simply sorting these relationships to the destination page by popularity. No traversal required :-) Your current structure is a good match for the incoming data, but requires lots of traversing to determine the main answer you are after. So I would vote for adding a new structure that includes the answer. I think I have an idea that can be done during load if you know in advance the destination node you want to analyse, as well as after load (second pass) if you want to specify the destination node only at analysis time. I'll describe the 'during load' approach. Load the apache log data, optionally building the structure you do now, but also identifying all start points and routes to the destination. This can be achieved by an in memory cache for each user session (visit) of the route from the entry point, appended to as each new page is visited (just an ArrayList of page Nodes, growing page-by-page), and when the destination Page is reached, create a unique identifier for that route (eg. a string of all node-ids in the route, or the hashcode of that). Then step back along all nodes in the route, adding relations with DynamicRelationshipType.withName("ROUTE-"+routeName) and property count=1, and if the relationship already exists for that name, increment the count. You can even load later apache logs to this and it will continue to incremement the route counters nicely. And to reset the counters, just delete all those route relationships. Now the final answer for your query is only to iterate over all incoming relationships to the destination page, and if the relationship type name starts with 'ROUTE-' add to an ArrayList of relationships, and then sort that list by the counter property. This should be almost instantaneous result :-) Of course, this algorithm assumes that the total number of possible routes is not unreasonably high. I believe you can have something like 64k relationship types, so using the relationship type for the route name is possible. If you are uncomfortable with that, just use a static type like 'ROUTE', and put the relationship name in a relationship property. That slightly increases the complexity of the test for the route during creation and slightly decreases the complecity of the test for the route during the final scoring. For this example, the performance difference is insignificant. Cheers, Craig On Thu, Jul 8, 2010 at 10:57 AM, Anders Nawroth wrote: > Hi Tim! > > Maybe you can use the new traversal framework, this interface comes to > mind: > > http://components.neo4j.org/neo4j-kernel/apidocs/org/neo4j/graphdb/traversal/SourceSelector.html > > Regarding the number of relationships, it could be a good idea to store > it as a property on the node. > > /anders > > > Is there any way I can write a ReturnableEvaluator that can examine the > > collection of nodes related to the current node? Is this even the correct > > approach? > > > > I actually want to be able to return the 10 most popular routes to the > > registration page. For the most popular, I can use the above algorithm, > but for > > the second it's going to be more tricky. > > > > Would I be able to search for all 10 routes in a single pass, or should I > > perform multiple passes? > > > > Any help would be really appreciated since I'm not really sure how to > approach > > this. > > > > Thanks, > > Tim > > > > > > > > > > ___ > > Neo4j mailing list > > User@lists.neo4j.org > > https://lists.neo4j.org/mailman/listinfo/user > ___ > Neo4j mailing list > User@lists.neo4j.org > https://lists.neo4j.org/mailman/listinfo/user > ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] How to traverse by the number of relationships between nodes?
Hi Tim! Maybe you can use the new traversal framework, this interface comes to mind: http://components.neo4j.org/neo4j-kernel/apidocs/org/neo4j/graphdb/traversal/SourceSelector.html Regarding the number of relationships, it could be a good idea to store it as a property on the node. /anders > Is there any way I can write a ReturnableEvaluator that can examine the > collection of nodes related to the current node? Is this even the correct > approach? > > I actually want to be able to return the 10 most popular routes to the > registration page. For the most popular, I can use the above algorithm, but > for > the second it's going to be more tricky. > > Would I be able to search for all 10 routes in a single pass, or should I > perform multiple passes? > > Any help would be really appreciated since I'm not really sure how to > approach > this. > > Thanks, > Tim > > > > > ___ > Neo4j mailing list > User@lists.neo4j.org > https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
[Neo4j] How to traverse by the number of relationships between nodes?
Hi, I'm trying to use neo4j to work out what the most popular routes are through our web site that result in users registering. I have Visitor nodes related with multiple Visit nodes. Each Visit is then related with multiple Page nodes by a relationship VISITED_STEP which indicates when the Visitor went to that Page in their visit, e.g. step 0 = the first page visited, 1 = the next, etc. The algorithm I need will let me start at the registration page, and then traverse outwards. The next node it should visit would be the one with the greatest total number of incoming VISIT_STEP relationships. This node will be added to a data structure for later retrieval. From there, it should move on to the next node that has the greatest total number of incoming VISIT_STEP relationships, and again add the node to a data structure. Is there any way I can write a ReturnableEvaluator that can examine the collection of nodes related to the current node? Is this even the correct approach? I actually want to be able to return the 10 most popular routes to the registration page. For the most popular, I can use the above algorithm, but for the second it's going to be more tricky. Would I be able to search for all 10 routes in a single pass, or should I perform multiple passes? Any help would be really appreciated since I'm not really sure how to approach this. Thanks, Tim ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user