Hi everyone,

I plan to use an SQL database for the app alongside the graph database.

I have users that I would store as nodes in the graph, in addition to
storing them in SQL. On those nodes I store attributes such as
male/female, age or date of birth, etc.
I would have one kind of relationship for friendship, which doesn't
present any problem, and I would run the standard queries that
neo4jr-social provides (e.g. friend suggestions, degrees of
separation, friends in common, ...).

We want to measure the compatibility (taste match, or whatever you
want to call it) between users in the background, i.e. how much two
users have in common. This is done in Ruby. The result is an integer
between 0 and 100. BTW, this value is symmetric, so it could be
modelled as a bidirectional relationship.
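
Because the value is symmetric, each pair only needs to be stored
once. A minimal Ruby sketch of that idea follows; pair_key and the
in-memory hash are hypothetical stand-ins for illustration, not part
of neo4jr-social or any other library:

```ruby
# Hypothetical helper: normalise a pair of user ids so that (a, b) and
# (b, a) map to the same key, since the match value is symmetric.
def pair_key(user_a_id, user_b_id)
  [user_a_id, user_b_id].minmax
end

# In-memory stand-in for whatever store ends up holding the scores.
scores = {}
scores[pair_key(42, 7)] = 83

# The lookup finds the same entry whichever order the ids come in.
puts scores[pair_key(7, 42)]
```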

Let's say I have 10k users and for every user I calculate the match
between them and 10 other users each day.
If I store all the results I calculate, I potentially add up to 100k
relationships every day / 3M relationships every month. If I store
this in SQL it can turn into a bottleneck very fast: the table will
soon grow too big and the queries will get slower and slower.
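
For reference, the arithmetic behind those numbers can be written out
as a rough estimate in Ruby. The 30-day month and the helper name are
assumptions made for illustration:

```ruby
# Back-of-the-envelope growth estimate using the numbers from this post.
# Upper bound: assumes every calculated match is stored as a new row/edge.
def new_relationships(users:, matches_per_user:, days:)
  users * matches_per_user * days
end

per_day   = new_relationships(users: 10_000, matches_per_user: 10, days: 1)
per_month = new_relationships(users: 10_000, matches_per_user: 10, days: 30)
at_100k   = new_relationships(users: 100_000, matches_per_user: 10, days: 30)

puts per_day    # 100000
puts per_month  # 3000000
puts at_100k    # 30000000
```

With 100k users and the same 10 matches per user, the monthly figure
grows to 30M.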

That's when I started thinking about storing those relationships in
Neo4j, because it's meant to handle a very large number of nodes and
relationships efficiently. I can model the match as a relationship
and either store the value as a property on the relationship or
encode it in the relationship name, e.g. 'match_high',
'match_medium', 'match_low'.
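
The second option could look roughly like the following Ruby sketch.
The cut-off values 70 and 40 and the method name are assumptions for
illustration only; the real thresholds would need tuning:

```ruby
# Map a 0..100 match value to one of the three relationship names.
# The default cut-offs (70 and 40) are hypothetical.
def match_relationship_name(score, high: 70, medium: 40)
  unless (0..100).cover?(score)
    raise ArgumentError, "score must be between 0 and 100"
  end

  if score >= high
    "match_high"
  elsif score >= medium
    "match_medium"
  else
    "match_low"
  end
end

puts match_relationship_name(85)
puts match_relationship_name(55)
puts match_relationship_name(10)
```

Encoding the bucket in the name loses the exact value, so ordering
users by match strength within a bucket would no longer be possible.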

Now back to step 1: selecting the users I'll be calculating new
relationships with. They must match certain criteria, e.g.
female/male, similar age, etc., and the selection could be
pseudo-random.
If you think in SQL, the first step is to query for all users that
match the criteria and don't yet have a match relationship with user A.

Then yesterday, looking at the Neo4j docs, I got the impression that
this kind of query cannot be done there. I could select all the users
that match the criteria from SQL, then query all the relationships for
A from Neo4j, subtract those from the array of valid users and
randomly pick n users. Because n is a low value, perhaps 10, this
looks to me like a very inefficient way of doing it. It will also be
fast at the beginning, but it will get slower as the relationship
density grows over time...
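
In Ruby, the subtract-and-pick step described above might look like
this rough sketch. The two id lists are assumed to come from the SQL
criteria query and from A's existing relationships in Neo4j; the
method name and sample data are made up for illustration:

```ruby
require "set"

# candidate_ids: ids returned by the SQL criteria query (assumed).
# matched_ids:   ids A already has a match relationship with (assumed
#                to have been read from Neo4j).
def pick_new_partners(candidate_ids, matched_ids, user_id, n)
  # A Set makes each exclusion check cheap; A is excluded as well.
  excluded = Set.new(matched_ids) << user_id
  candidate_ids.reject { |id| excluded.include?(id) }.sample(n)
end

candidates = (1..20).to_a
matched    = [2, 3, 5, 8]
picked     = pick_new_partners(candidates, matched, 1, 10)

puts picked.inspect
```

The cost is dominated by loading both id lists, which is exactly the
part that grows as users and relationships accumulate.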

Maybe I should consider a different strategy. I've also been
considering storing only high or otherwise interesting values, but it
would be more interesting to have the top n users for A ordered by
relationship value. If I go ahead with storing only those values, I
could just store them in SQL.
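
Once A's match values are loaded, the "top n users for A" part is
simple in Ruby. A minimal sketch, assuming the values are available as
a hash of user id to match value (the hash and method name are
hypothetical):

```ruby
# scores_for_a: hypothetical hash of other user's id => match value.
# Returns the n best [id, score] pairs, highest score first.
def top_matches(scores_for_a, n)
  scores_for_a.max_by(n) { |_id, score| score }
end

scores_for_a = { 11 => 42, 12 => 97, 13 => 65, 14 => 88 }

top_matches(scores_for_a, 3).each do |id, score|
  puts "#{id}: #{score}"
end
```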

This is not what we're striving for, but if I don't find a better way
I guess we'll have to live with it. Also, whatever solution I find
should scale easily; it should still work with, for instance, 100k
users.

Any thoughts or comments?
What would you recommend?

Thanks for your help, guys!
Alberto.
_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user
