Hi Craig,

On Thu, Jul 29, 2010 at 12:14 PM, Craig Taverner <[email protected]> wrote:
> I think leveraging existing relationships is obviously valuable, but I
> thought I'd throw in an idea for doing the original suggestion, pure random
> search:

Sounds interesting. I think the way to go is to combine the existing
relationships (favorites, etc.) with the pseudo-random approach.

> Reword the original problem to instead of looking for a set of random
> potential matches for every node, rather looking for new random
> relationships. What I mean is, find both A and B randomly. This can be done
> at high performance by simply generating a random number between 0 and the
> maximum node ID. Assuming most nodes are people, you will be able to
> generate a sample set of random people almost instantly (need to trim the
> set to real people nodes of course, removing invalid nodes and non-people
> nodes, hence the word 'almost').

The sample needs to be random, but subject to certain constraints such as
age, gender, etc. For instance, calculate the score against n users who are
female or male and within an age range of ...

> The sample set can be some pre-defined size, eg. 100 nodes. Then compute all
> node-node relationships between the nodes in this set (up to 10k
> relationships) with the following rules:
>
> - Ignore if a relationship already exists
> - Possibly limit to only 10 relationships per node (your suggestion
>   above)

Do you mean a limit within this run, or at all times? The first is OK, the
second is not. I guess you mean stopping once I have already computed 10 new
relationships, right?

> - If the total number of pre-existing relationships are high (or
>   relationships per node are high), invoke a trimming algorithm, for example
>   removing relationships of low weight, since you care less about them

This might be useful for keeping the density low, but I have to look into
it: the way it works is that people see suggestions of other users who are a
good match, so I probably can't make those suggestions disappear suddenly.

>
> This idea can be run continuously in the background thread, and if the
> trimmer works well, will allow the total graph size to reach some stable
> state with time.
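
To make sure we mean the same thing, here is a rough Python sketch of one
sampling run as described so far. It is only an illustration under stated
assumptions, not Neo4j API code: a plain dict stands in for the node store,
and the names `sample_people`, `accept`, `score`, `suggest` and `trim`, as
well as the "shared interests" scoring, are hypothetical. The per-node limit
applies to new relationships within a single run only.

```python
import random
from itertools import combinations


def sample_people(nodes, max_id, size, accept, rng):
    """Draw random IDs in [0, max_id]; keep only existing nodes that pass `accept`."""
    picked = set()
    attempts = 0
    # Cap the attempts so a very strict filter cannot loop forever.
    while len(picked) < size and attempts < size * 50:
        attempts += 1
        node_id = rng.randint(0, max_id)
        person = nodes.get(node_id)  # None means an invalid (missing) node
        if person is not None and accept(person):
            picked.add(node_id)
    return sorted(picked)


def score(a, b):
    """Hypothetical match score: the number of shared interests."""
    return len(a["interests"] & b["interests"])


def suggest(nodes, existing, sample, per_node_limit=10):
    """Score each pair in the sample once and return the new relationships.

    Pairs already in `existing` are ignored, and each node receives at most
    `per_node_limit` new relationships in this run.
    """
    new_count = {n: 0 for n in sample}
    created = {}
    for a, b in combinations(sample, 2):  # sample is sorted, so a < b
        if (a, b) in existing:
            continue
        if new_count[a] >= per_node_limit or new_count[b] >= per_node_limit:
            continue
        weight = score(nodes[a], nodes[b])
        if weight > 0:
            created[(a, b)] = weight
            new_count[a] += 1
            new_count[b] += 1
    return created


def trim(relationships, min_weight):
    """Keep only relationships whose weight is at least `min_weight`."""
    return {pair: w for pair, w in relationships.items() if w >= min_weight}
```

A run would call `sample_people` with a filter such as "female, aged 25 to
35", pass the result to `suggest`, and store what comes back. `trim` is kept
separate so it can be skipped for suggestions users have already seen.
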
>
> Then you can add features like when a node's properties are changed, delete
> all those relationships to the node and pass the id into the background
> process for immediate inclusion in the sample set (bypass the random
> sampling for new or edited nodes, so they get some relationships
> immediately).

_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user

