> Mmm we cannot limit the number of relationships in this app. One the > most important features > is that we'll keep looking for good matches for the user. >
Since the trimming algorithm will entirely delete low scoring relationships, the search algorithm will also keep finding them again, and if something has changed (some data, or the scoring algorithm), then the relationships might score higher on a later round and not get trimmed (something else might be trimmed instead). Sure we can avoid storing low value relationships to avoid having an > almost complete graph. > We'll also limit the calculation of the score to users that match some > factors, like age, geographical region, language, etc. In this way the > graph might get dense but it won't really grow into a complete graph. > I think your factors for limitation would be part of the score. For example, large age differences would lower the score, large geographic separation also. Instead of making these hard constraints, just build them into the score. This allows you to adjust the score calculation as you decide how better to model the 'compatibility' and the graph can adjust accordingly. I would think also that whenever a new score formula is introduced, you would go through the graph and recalculate the scores of existing relationships, trimming new low scoring ones, before continuing with the ongoing search/trim cycle. This way a new scoring method would reflect very quickly in the graph without affecting runtime capabilities. Peoples relationships would not disappear only to appear at some unknown future time, they would just start adjusting in order as the new score took effect. For example, if you previously did not take into account age, a user might see high scoring partners on their list. When you add age into the formula, that users list would re-order, with some previous matches possibly even being trimmed right off the list if the age difference is high enough. Very dynamic :-) We can choose not to store relationships with a score under N. In this > we could end up calculating a possible match that we have already > discarded. This would mean sacrificing CPU in favor of a less dense > graph, but I think it might make sense. > Yes. The wasted CPU cycles would be most noticed once the graph is stable, and the algorithm is running in circles re-calculating the same stuff. That is a minor issue I think, but can be optimized for later. The main thing would be to be sure that when new data comes in the graph adjusts accordingly. So one optimization is to trigger recalculation cycles based on the rate of data input (new users, or users editing their profiles). I think that these edits will also re-prioritise the queue, since a new user should get to the front of the queue to get some initial matches, before an existing users gets his existing matches refined (possibly only marginally refined). _______________________________________________ Neo4j mailing list [email protected] https://lists.neo4j.org/mailman/listinfo/user

