Hi,
With Gremlin (and I believe Cypher), you can do these operations lazily and
thus be more memory efficient. Moreover, I don't know how much speedup you
will get from parallelism, but I suspect the overhead of threads is going to
slow down your query (as I've seen with 'small queries' using parallel
branches). With large queries (touching millions of things), parallelism starts
to show benefits.
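To make "lazily" concrete, here is a minimal sketch in plain Python (not Gremlin). The names followers_of, i_follow, and iter_followers are made up for illustration and the data is a toy example, not anything from your graph. The eager version builds the whole list of shared followers before counting it; the lazy version pulls one follower at a time and keeps only a running total.

```python
# Hypothetical toy data: who follows whom (not from the original thread).
followers_of = {
    "bob": ["alice", "carol", "dave", "erin"],
}
i_follow = {"alice", "carol", "zed"}


def iter_followers(person):
    """Yield followers one at a time, the way a lazy traversal would."""
    for follower in followers_of.get(person, []):
        yield follower


def shared_count_eager(person):
    # Materializes the full list of shared followers, then counts it.
    shared = [f for f in iter_followers(person) if f in i_follow]
    return len(shared)


def shared_count_lazy(person):
    # Streams over followers; only the running total is held in memory.
    return sum(1 for f in iter_followers(person) if f in i_follow)
```

Both functions return the same number; the difference is only in how much is held in memory while counting.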
In Gremlin, your query can be written like this, assuming vertex 1 is "I":
m = [:]; x = [] as Set
g.v(1).out('follows').aggregate(x).sideEffect{m[it] = it.in('follows').retain(x).count()} >> -1
m will have as keys the vertices that you follow, and as values the number of
people you follow who also follow that vertex. More specifically, each entry
gives an answer analogous to "vertex 2 is followed by 4 people that you
follow."
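If it helps to see the same logic outside of Gremlin, here is a rough plain-Python sketch of what the query computes. The toy graph and the helper names (out_follows, in_follows, shared_follow_counts) are assumptions made up for this example; x and m play the same roles as in the Gremlin line.

```python
# Hypothetical toy graph: each key follows the vertices in its list.
follows = {
    "me": ["a", "b", "c", "d", "v2"],
    "a": ["v2"],
    "b": ["v2"],
    "c": ["v2", "a"],
    "d": ["v2"],
    "v2": [],
    "stranger": ["v2", "a"],
}


def out_follows(graph, v):
    """Vertices that v follows (like out('follows'))."""
    return graph.get(v, [])


def in_follows(graph, v):
    """Lazily yield vertices that follow v (like in('follows'))."""
    return (u for u, targets in graph.items() if v in targets)


def shared_follow_counts(graph, me):
    # x: everyone I follow, collected up front (like aggregate(x)).
    x = set(out_follows(graph, me))
    m = {}
    for v in x:
        # Keep only followers of v that are in x, then count them
        # (like retain(x).count()).
        m[v] = sum(1 for u in in_follows(graph, v) if u in x)
    return m


m = shared_follow_counts(follows, "me")
```

In this toy graph, m["v2"] is 4, which reads as "v2 is followed by 4 people that you follow." The stranger also follows v2 but is not counted, because the stranger is not in x.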
If you use Gremlin 1.2 (which I don't think Neo4j has released with their
server yet), there is a more concise representation that I can show you. Also,
Gremlin 1.3-SNAPSHOT will make this query ~twice as fast, but it's not released
yet :(. I can show you a trick to make it twice as fast if you are interested.
HTH, // Peter taught me what HTH means. It's a good salutation. I would
previously have done "Thanks, Marko" but that doesn't really make much sense.
Marko.
http://markorodriguez.com
On Aug 24, 2011, at 3:14 PM, Aseem Kishore wrote:
> Hi guys,
>
> We're building a social network which has an asymmetrical follower model
> like Twitter's: users "follow" each other.
>
> We have various views where we show a list of people. This could be e.g. all
> people in the network, or it might be some user's followers, or it might be
> a list of people that share interests, etc.
>
> In these views, it's easy to show how many followers each person has. But we
> also want to show a message like "Followed by 4 people you follow" next to
> each person. This helps show the trustworthiness/relevance of each person.
>
> We implemented that by logic like this:
>
> 1. Fetch the list of people that *I* follow.
> 2. Given the list of people we want to show, for each person in parallel...
> 3. ...Fetch the list of people that follow *that* person...
> 4. ...And compare this list with the list of people that I follow.
>
> Each "fetch" is a traverse (breadth first, max depth 1). This requires O(n)
> traverses, where "n" is the number of people we're showing in this view.
>
> (Assume that, generally, the number of people we're showing is smaller than
> the number of people I potentially follow, but the logic could be reversed
> if this is not the case: for each person I follow, fetch the list of people
> that *they* follow.)
>
> I wanted to do a sanity check: is this the best way of answering this
> question? Or is there a better way, e.g. via a single traverse somehow, or
> via a Cypher or Gremlin query?
>
> Thanks much!
>
> Aseem
> _______________________________________________
> Neo4j mailing list
> [email protected]
> https://lists.neo4j.org/mailman/listinfo/user