You might also want to take a look at the videos from the Twitter presentations 
from QCon London.

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
Behalf Of Aseem Kishore
Sent: Monday, November 21, 2011 4:43 PM
To: Neo4j user discussions
Subject: Re: [Neo4j] Activity Streams and Twitter Sample App

This is very interesting -- thanks Peter for the link, and thanks
maxdemarzi for starting this conversation.

In our social network -- which has extremely little load, it's just in beta
-- we currently use basically (a), and it works just fine. We use Cypher to
do the sorting/trimming on the server, e.g. roughly this query:

START user=(123)
MATCH (user) -[:FOLLOWS]-> (friend) -[:EVENT]-> (event)
RETURN event
ORDER BY event.timestamp DESC
LIMIT 20

This obviously will not scale, since Neo4j has to read every single
potential event from disk and process it in memory.

We too have been thinking about this in the back of our minds, and the
obvious solution in our minds is indeed to use a linked list timeline for
events, and have a custom traverser that "merge sorts" in realtime.

Since this'd be a linked list where you want the newest events at the head
of the list, there will indeed be a small write-time cost of having to
update two relationships whenever a new event is created (i.e. inserting it
at the head of the list).

Rene mentions this approach as the first ("baseline") approach he tried.
It's at 3:08 in the video in that post.

He decides that approach isn't good enough, however, because "merge
sorting" depends on the number of friends you have (# users you follow),
which he wants to avoid at read-time.

IMHO however, that's not worth optimizing for. It's rare that a user
follows a very high number of other people. E.g. it's usually around let's
say 100-1000 people.

The flip side -- the # of *followers* -- can be very high. And
interestingly, Rene's solution suffers from *that* at write-time. E.g. a
new event has to update every *followers'* Graphity index.

I'll provide Rene this feedback, but FWIW, I have the feeling that a simple
timeline will easily be good enough, in fact optimal, for a social network
on Neo4j:

- O(1) writes
- O(d) reads in space performance
- O(d log d) reads in time performance

Where d is the out-degree -- # of users you follow -- which is generally
small.

Cheers,
Aseem


On Mon, Nov 21, 2011 at 6:07 AM, Peter Neubauer <
peter.neuba...@neotechnology.com> wrote:

> You might even be interested in Rene Pickards work on a full solution
> (albeit with some write-time overhead), see
>
> http://www.rene-pickhardt.de/graphity-an-efficient-graph-model-for-retrieving-the-top-k-news-feeds-for-users-in-social-networks/
>
> Cheers,
>
> /peter neubauer
>
> GTalk:      neubauer.peter
> Skype       peter.neubauer
> Phone       +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter      http://twitter.com/peterneubauer
>
> http://www.neo4j.org              - NOSQL for the Enterprise.
> http://startupbootcamp.org/    - Ă–resund - Innovation happens HERE.
>
>
>
> On Thu, Nov 3, 2011 at 10:36 PM, maxdemarzi <maxdema...@gmail.com> wrote:
> > I had not considered imperfect solutions, and in some activity stream
> > scenarios a sampling of the last few messages could work.  The sample
> would
> > have to be taken from all Person nodes because if we sample from the
> Tweets
> > in general and we encounter a "chatty" person node early on, it would
> take
> > up all the sample space.
> >
> > Person1 -> Person 2 -> 10 tweets
> > Person1 -> Person3 -> 1000 tweets  = Sample Size reached, traversal stops
> > Person1 -> Person4 -> 10 tweets
> > Person1 -> PersonX -> 10 tweets
> >
> > Person3 would prevent Person4 to PersonX's tweets from ever making it to
> the
> > sample.
> >
> > Some applications in domains like Financials or Network Monitoring may
> > require the last known status and sampling might not be acceptable.
> >
> > --
> > View this message in context:
> http://neo4j-community-discussions.438527.n3.nabble.com/Activity-Streams-and-Twitter-Sample-App-tp3477669p3478477.html
> > Sent from the Neo4j Community Discussions mailing list archive at
> Nabble.com.
> > _______________________________________________
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
> >
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

Reply via email to