Hi Alberto,

On Wed, Jul 28, 2010 at 5:02 PM, Alberto Perdomo <[email protected]> wrote:
> Hi David,
>
> > But then you need to store the result. You can store these metrics as
> > relationships in neo4j, and then just update them for each user when
> > you recompute. You can find the user nodes via indexing. Maybe it's
> > acceptable that some metrics are out of date, so you can just
> > background process them continuously.
>
> I already have background processes that go through all users and
> calculate new pairs. But then in order to do that I do need to
> exclude the pairs I already have... because it would be silly and as
> the relationship density grows the probability of calculating a pair
> again would be higher and higher...
> Would I be able to do that kind of query using indexing?

From your description it sounds like the factors that influence the metric
don't change, so a single calculation per pair is enough. In this case, you
could just determine the pairs in some way and then do the computation,
storing the result as a relationship in Neo4j. You can do it all in one go,
nothing fancy. You would of course have to compute the metric to N peers for
each new user.

In other scenarios, the factors that influence the metric might change over
time, e.g. a user's city or favorite movie. Then you actually need to keep
recomputing the metric between existing users, and yes, then you probably
want some scheme to make sure that you don't starve some users. You might for
example want to prioritize the most active users first. Again, I don't know
if this applies to your case though.

As for the indexing, I'm not sure how you would use it here. Like, what kind
of querying were you picturing?

> > Depending on your scenario, if your users know each other, it might be
> > interesting to start computing in a foaf style order (breadth first).
> > Remember, the power is in the relationships. Isolated nodes are not
> > interesting.
>
> You mean I look first for possible pairs with users that are friends
> of friends instead of randomly? We are also interested in storing
> friendship relationships so that sounds interesting.
> That would be a different type of query: Traverse the graph from node
> A to nodes which are friends of friends of A and have no match
> relationship with A. I guess that is not difficult to implement using
> Neo4j?

Exactly, so you might want to start with the most relevant other people, i.e.
people you can realistically meet IRL via friends. Don't know if that's
relevant to your application though.

Neo4j would be a perfect fit for storing friendship relationships between
users. It opens up all kinds of interesting data mining possibilities. The
FOAF query would be easy to write using the Neo4j APIs, or some other tool
such as Gremlin on top of Neo4j.

So you could combine the friendship relationships with your processing step:
prioritize active users, and start by checking the people close to them in
their social network. Again, if it's relevant.

And, as Mattias suggested, if you can leverage friendship relationships
between users, you might be able to calculate your metric on the fly, given
that you limit the search to the user's extended social network. Of course,
if you go deep enough, you might reach all users this way too.

> Thanks for your input David!

Glad to be of service. Ask as much as you like! We're all learning here :)

> _______________________________________________
> Neo4j mailing list
> [email protected]
> https://lists.neo4j.org/mailman/listinfo/user
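
P.S. The FOAF query discussed above (walk breadth first from user A and keep
only people who have no match relationship with A yet) can be sketched
roughly as follows. This is plain Python over in-memory dictionaries, not the
Neo4j API. The names unmatched_nearby, friends and matched are made up for
illustration, and the sketch assumes a match is stored once per pair
regardless of direction. In a real setup the dictionary lookups would be
relationship traversals in Neo4j.

```python
from collections import deque


def unmatched_nearby(friends, matched, start, max_depth=2):
    """Breadth-first walk over friendships from `start`.

    Returns (user, depth) pairs, nearest first, for every user within
    `max_depth` hops that has no match relationship with `start` yet.
    `friends` maps a user to a set of friends (hypothetical layout);
    `matched` is a set of frozenset pairs already computed.
    """
    seen = {start}
    queue = deque([(start, 0)])
    result = []
    while queue:
        user, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        # sorted() only makes the output order deterministic
        for friend in sorted(friends.get(user, ())):
            if friend in seen:
                continue
            seen.add(friend)
            queue.append((friend, depth + 1))
            # Skip pairs whose metric has already been stored.
            if frozenset((start, friend)) not in matched:
                result.append((friend, depth + 1))
    return result


friends = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D", "E"},
    "D": {"B", "C", "F"},
    "E": {"C"},
    "F": {"D"},
}
matched = {frozenset(("A", "D"))}  # metric already computed for A and D

print(unmatched_nearby(friends, matched, "A"))
```

Here D is skipped because the A-D pair already exists, and F is not reached
because it is three hops away. Raising max_depth widens the search toward the
whole network, which is the "go deep enough" case mentioned above.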

