Even without the new traversal framework, the returnable evaluator has
access to the current node being evaluated and can investigate its
relationships (and even run another traverser). I'm not sure if nested
traversing is a good idea, but I certainly have used methods like
getRelationships inside an evaluator with no problems.

As for the main goal, I think there are many ways to skin a cat. For
performance reasons I would always look for the way that embeds the final
result in the graph structure itself, so you don't need complex traversals
to get your answer. In your case you want the 10 most popular routes, so I
guess what you are looking for are relationships between pages that define a
route and carry a popularity score. The final answer would then be found by
simply sorting the relationships into the destination page by popularity. No
traversal required :-)

Your current structure is a good match for the incoming data, but requires
lots of traversing to determine the main answer you are after. So I would
vote for adding a new structure that includes the answer. I think I have an
idea that can be done during load if you know in advance the destination
node you want to analyse, as well as after load (second pass) if you want to
specify the destination node only at analysis time. I'll describe the
'during load' approach.

Load the apache log data, optionally building the structure you do now, but
also identifying all start points and routes to the destination. This can be
achieved by an in memory cache for each user session (visit) of the route
from the entry point, appended to as each new page is visited (just an
ArrayList of page Nodes, growing page-by-page), and when the destination
Page is reached, create a unique identifier for that route (eg. a string of
all node-ids in the route, or the hashcode of that). Then step back along
all nodes in the route, adding relations with
DynamicRelationshipType.withName("ROUTE-"+routeName) and property count=1,
and if the relationship already exists for that name, increment the count.
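
A minimal sketch of that bookkeeping, so the steps are concrete. To keep it
runnable without Neo4j, plain maps stand in for the graph: each map key plays
the role of one "ROUTE-"+routeName relationship between two pages, and the
value is its count property. The class RouteRecorder, the method visit and
the key layout are hypothetical names for illustration only, and clearing the
session cache once the destination is reached is my assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the loader; plain maps replace the Neo4j graph.
class RouteRecorder {
    // Per-session cache: the pages visited so far, in visit order.
    private final Map<String, List<Long>> sessionRoutes = new HashMap<>();
    // Stand-in for route relationships: "ROUTE-<name>|<from>-><to>" -> count.
    final Map<String, Integer> routeCounts = new HashMap<>();
    private final long destinationId;

    RouteRecorder(long destinationId) {
        this.destinationId = destinationId;
    }

    // Call once per log line, in time order within each session.
    void visit(String sessionId, long pageId) {
        List<Long> route = sessionRoutes.computeIfAbsent(sessionId, k -> new ArrayList<>());
        route.add(pageId);
        if (pageId != destinationId) {
            return;
        }
        String name = routeName(route);
        // Step back along the route, creating each relationship with count=1
        // or incrementing the count if it already exists.
        for (int i = route.size() - 1; i > 0; i--) {
            String key = "ROUTE-" + name + "|" + route.get(i - 1) + "->" + route.get(i);
            routeCounts.merge(key, 1, Integer::sum);
        }
        // Assumption: a session starts a fresh route after reaching the destination.
        sessionRoutes.remove(sessionId);
    }

    // Unique identifier for a route: the node ids joined in visit order.
    static String routeName(List<Long> route) {
        StringBuilder sb = new StringBuilder();
        for (Long id : route) {
            if (sb.length() > 0) {
                sb.append('-');
            }
            sb.append(id);
        }
        return sb.toString();
    }
}
```

In a real loader the merge call would become a lookup of the existing
relationship of that type between the two page nodes, followed by either a
create or a property update.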

You can even load later apache logs into this and it will continue to
increment the route counters nicely. And to reset the counters, just
delete all those route relationships.

Now the final answer for your query is only to iterate over all incoming
relationships to the destination page, and if the relationship type name
starts with 'ROUTE-' add to an ArrayList of relationships, and then sort
that list by the counter property. This should give an almost instantaneous
result :-)
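
As a rough sketch of that final step, again in plain Java so it runs on its
own: the Rel class below is a hypothetical stand-in for an incoming
relationship on the destination page, holding just the type name and the
count property. With Neo4j itself the list would come from something like
getRelationships(Direction.INCOMING) on the destination node.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TopRoutes {
    // Hypothetical stand-in for an incoming relationship on the destination page.
    static final class Rel {
        final String typeName;
        final int count;

        Rel(String typeName, int count) {
            this.typeName = typeName;
            this.count = count;
        }
    }

    // Keep only the ROUTE- relationships, sort by count (highest first),
    // and return at most 'limit' of them.
    static List<Rel> topRoutes(List<Rel> incoming, int limit) {
        List<Rel> routes = new ArrayList<>();
        for (Rel rel : incoming) {
            if (rel.typeName.startsWith("ROUTE-")) {
                routes.add(rel);
            }
        }
        routes.sort(Comparator.comparingInt((Rel r) -> r.count).reversed());
        return new ArrayList<>(routes.subList(0, Math.min(limit, routes.size())));
    }
}
```

Calling topRoutes(incoming, 10) would then give the 10 most popular routes
into the destination page.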

Of course, this algorithm assumes that the total number of possible routes
is not unreasonably high. I believe you can have something like 64k
relationship types, so using the relationship type for the route name is
possible. If you are uncomfortable with that, just use a static type like
'ROUTE', and put the route name in a relationship property. That
slightly increases the complexity of the test for the route during creation
and slightly decreases the complexity of the test for the route during the
final scoring. For this example, the performance difference is
insignificant.
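
To illustrate the static-type variant, here is a small sketch of the
creation-time test, with the same caveat that plain Java objects stand in for
relationships. The names StaticTypeRoutes, Rel and recordStep are
hypothetical; the list passed in represents the relationships that already
exist between one pair of pages.

```java
import java.util.ArrayList;
import java.util.List;

class StaticTypeRoutes {
    // Hypothetical stand-in for a relationship with a fixed type and two properties.
    static final class Rel {
        final String type;
        final String routeName; // property holding the route name
        int count;              // property holding the popularity counter

        Rel(String type, String routeName, int count) {
            this.type = type;
            this.routeName = routeName;
            this.count = count;
        }
    }

    // With one static type, finding the existing relationship means comparing
    // the route-name property instead of the relationship type name.
    static Rel recordStep(List<Rel> existing, String routeName) {
        for (Rel rel : existing) {
            if (rel.type.equals("ROUTE") && rel.routeName.equals(routeName)) {
                rel.count++;
                return rel;
            }
        }
        Rel created = new Rel("ROUTE", routeName, 1);
        existing.add(created);
        return created;
    }
}
```

The scoring step then only needs an equality check on the type, rather than
a startsWith test on the type name.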

Cheers, Craig


On Thu, Jul 8, 2010 at 10:57 AM, Anders Nawroth <and...@neotechnology.com> wrote:

> Hi Tim!
>
> Maybe you can use the new traversal framework, this interface comes to
> mind:
>
> http://components.neo4j.org/neo4j-kernel/apidocs/org/neo4j/graphdb/traversal/SourceSelector.html
>
> Regarding the number of relationships, it could be a good idea to store
> it as a property on the node.
>
> /anders
>
> > Is there any way I can write a ReturnableEvaluator that can examine the
> > collection of nodes related to the current node? Is this even the correct
> > approach?
> >
> > I actually want to be able to return the 10 most popular routes to the
> > registration page. For the most popular, I can use the above algorithm,
> > but for the second it's going to be more tricky.
> >
> > Would I be able to search for all 10 routes in a single pass, or should I
> > perform multiple passes?
> >
> > Any help would be really appreciated since I'm not really sure how to
> > approach this.
> >
> > Thanks,
> > Tim
> >
> >
> >
> >
> > _______________________________________________
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
