[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph

2009-09-08 Thread John Kalucki

Eventually all requests will be handled by the third system, and the
second system will be removed from production. I don't know how this
will all play out. I'll see about getting Someone Who Knows to Do
Something.

-John


On Sep 7, 11:03 pm, PJB  wrote:
> John:
>
> Will the "third system" be used if, e.g., the user has 1000 friends
> and we request friends/ids WITHOUT pagination?  Or must we include
> pagination arguments even if <5000 to use the third system?
>
> PJB
>
> On Sep 7, 9:52 pm, John Kalucki  wrote:
>
> > I don't know all the details, but my general understanding is that
> > these bulk followers calls have been heavily returning 503s for quite
> > some time now, and this is long established, but bad, behavior. These
> > bulk calls are hard to support and they need to be moved over to some
> > form of practical pagination scheme. Ideally, we'd offer a stream of
> > social graph deltas on the Streaming API and this polling business
> > could be tightly restricted.
>
> > Bluntly, until further back-end work is in place, we can return 5k
> > followers reliably from the third system, or we can attempt to return
> > large result sets, but often throw 503s -- really, timeouts, from the
> > second system. We cannot return bulk operations, or use row-based
> > cursors, from the third system.
>
> > Scraping the social graph is certainly valuable in some cases, but
> > generally it's a low value proposition for users, and scraping is
> > often used to support abusive behavior.
>
> > -John Kalucki
> > http://twitter.com/jkalucki
> > Services, Twitter Inc.
>
> > On Sep 7, 9:27 pm, "David W."  wrote:
>
> > > Hi John,
>
> > > On Sep 6, 3:59 pm, John Kalucki  wrote:
>
> > > > resources. There is minor pagination jitter in one case and a certain
> > > > class of row-count-based queries have to be deprecated (or limited)
> > > > and replaced with cursor-based queries to be practical. For now, we're
> > > > sending the row-count queries back to the second system, which
> > > > is otherwise idle, but isn't consistent with the first or third
> > > > system.
>
> > > I am getting several emails per day at the moment from users telling
> > > me my app's results are wrong. The application currently asks for the
> > > entire follower/following ID list at once, using /followers/ids and
> > > /friends/ids. Does this count as a "row-count-query"?
>
> > > David


[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph

2009-09-07 Thread PJB



John:

Will the "third system" be used if, e.g., the user has 1000 friends
and we request friends/ids WITHOUT pagination?  Or must we include
pagination arguments even if <5000 to use the third system?

PJB

On Sep 7, 9:52 pm, John Kalucki  wrote:
> I don't know all the details, but my general understanding is that
> these bulk followers calls have been heavily returning 503s for quite
> some time now, and this is long established, but bad, behavior. These
> bulk calls are hard to support and they need to be moved over to some
> form of practical pagination scheme. Ideally, we'd offer a stream of
> social graph deltas on the Streaming API and this polling business
> could be tightly restricted.
>
> Bluntly, until further back-end work is in place, we can return 5k
> followers reliably from the third system, or we can attempt to return
> large result sets, but often throw 503s -- really, timeouts, from the
> second system. We cannot return bulk operations, or use row-based
> cursors, from the third system.
>
> Scraping the social graph is certainly valuable in some cases, but
> generally it's a low value proposition for users, and scraping is
> often used to support abusive behavior.
>
> -John Kalucki
> http://twitter.com/jkalucki
> Services, Twitter Inc.
>
> On Sep 7, 9:27 pm, "David W."  wrote:
>
> > Hi John,
>
> > On Sep 6, 3:59 pm, John Kalucki  wrote:
>
> > > resources. There is minor pagination jitter in one case and a certain
> > > class of row-count-based queries have to be deprecated (or limited)
> > > and replaced with cursor-based queries to be practical. For now, we're
> > > sending the row-count queries back to the second system, which
> > > is otherwise idle, but isn't consistent with the first or third
> > > system.
>
> > I am getting several emails per day at the moment from users telling
> > me my app's results are wrong. The application currently asks for the
> > entire follower/following ID list at once, using /followers/ids and
> > /friends/ids. Does this count as a "row-count-query"?
>
> > David
>
>


[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph

2009-09-07 Thread John Kalucki

I don't know all the details, but my general understanding is that
these bulk followers calls have been heavily returning 503s for quite
some time now, and this is long established, but bad, behavior. These
bulk calls are hard to support and they need to be moved over to some
form of practical pagination scheme. Ideally, we'd offer a stream of
social graph deltas on the Streaming API and this polling business
could be tightly restricted.

Bluntly, until further back-end work is in place, we can return 5k
followers reliably from the third system, or we can attempt to return
large result sets, but often throw 503s -- really, timeouts, from the
second system. We cannot return bulk operations, or use row-based
cursors, from the third system.

Scraping the social graph is certainly valuable in some cases, but
generally it's a low value proposition for users, and scraping is
often used to support abusive behavior.

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.


On Sep 7, 9:27 pm, "David W."  wrote:
> Hi John,
>
> On Sep 6, 3:59 pm, John Kalucki  wrote:
>
> > resources. There is minor pagination jitter in one case and a certain
> > class of row-count-based queries have to be deprecated (or limited)
> > and replaced with cursor-based queries to be practical. For now, we're
> > sending the row-count queries back to the second system, which
> > is otherwise idle, but isn't consistent with the first or third
> > system.
>
> I am getting several emails per day at the moment from users telling
> me my app's results are wrong. The application currently asks for the
> entire follower/following ID list at once, using /followers/ids and
> /friends/ids. Does this count as a "row-count-query"?
>
> David
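
The advice above (fetch in pages of bounded size and expect 503s on large requests) can be sketched as a small client loop. This is only an illustration under stated assumptions: `fetch_page` is a hypothetical callable supplied by the caller, and the response shape (`ids`, `next_cursor`), the `-1` start cursor, and the `0` end cursor are assumptions made for this sketch, not details given in the thread.

```python
import time
from typing import Callable, Dict, Iterator, Tuple

# Hypothetical page fetcher: takes a cursor, returns (HTTP status, parsed body).
# The body shape {"ids": [...], "next_cursor": int} is an assumption.
FetchPage = Callable[[int], Tuple[int, Dict]]


def iter_follower_ids(fetch_page: FetchPage,
                      max_retries: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Iterator[int]:
    """Yield follower IDs one page at a time, retrying with backoff on 503."""
    cursor = -1  # assumed sentinel for "first page"
    while cursor != 0:  # assumed sentinel for "no more pages"
        for attempt in range(max_retries + 1):
            status, body = fetch_page(cursor)
            if status == 200:
                break
            # Only 503 is treated as retryable; anything else fails at once.
            if status != 503 or attempt == max_retries:
                raise RuntimeError(f"giving up at cursor {cursor}: HTTP {status}")
            sleep(base_delay * (2 ** attempt))  # exponential backoff
        yield from body["ids"]
        cursor = body["next_cursor"]
```

Passing the fetcher and the sleep function in as arguments keeps the loop independent of any particular HTTP library and makes it easy to exercise with a stub.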


[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph

2009-09-07 Thread David W.

I might add that, as ever, a message on status.twitter mentioning this
would really go a long way.


David.

On Sep 8, 5:27 am, "David W."  wrote:
> Hi John,
>
> On Sep 6, 3:59 pm, John Kalucki  wrote:
>
> > resources. There is minor pagination jitter in one case and a certain
> > class of row-count-based queries have to be deprecated (or limited)
> > and replaced with cursor-based queries to be practical. For now, we're
> > sending the row-count queries back to the second system, which
> > is otherwise idle, but isn't consistent with the first or third
> > system.
>
> I am getting several emails per day at the moment from users telling
> me my app's results are wrong. The application currently asks for the
> entire follower/following ID list at once, using /followers/ids and
> /friends/ids. Does this count as a "row-count-query"?
>
> David


[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph

2009-09-07 Thread David W.

Hi John,

On Sep 6, 3:59 pm, John Kalucki  wrote:

> resources. There is minor pagination jitter in one case and a certain
> class of row-count-based queries have to be deprecated (or limited)
> and replaced with cursor-based queries to be practical. For now, we're
> sending the row-count queries back to the second system, which
> is otherwise idle, but isn't consistent with the first or third
> system.

I am getting several emails per day at the moment from users telling
me my app's results are wrong. The application currently asks for the
entire follower/following ID list at once, using /followers/ids and
/friends/ids. Does this count as a "row-count-query"?


David


[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph

2009-09-06 Thread Dewald Pretorius

John,

Thanks for the background info. "Row count queries" means to me the
summary friends and followers numbers displayed on the Twitter web
pages, and returned on the user profile via the API, correct? So, if I
am understanding you correctly, then the friends and followers that
we're getting back from the social graph methods are pulled from the
third store, and doing a count() on the returned JSON array gives one
the actual valid numbers of current friends and followers. (Not that
users would ever believe us. LOL. They believe what they see on the
Twitter web pages.)

Anyway, I cannot imagine the challenges you must face with your
explosive growth. It would be interesting if, one day, one of your
engineers could give an overview of your technical architecture.
Facebook has done that (I remember the one regarding their image
serving) and it was very fascinating.

I would appreciate it if you could fix the 10+ second delay issue on
Tuesday or Wednesday. It's not a major "train smash" issue; it is just
slowing down my scripts to a great extent. They are battling to keep
up with the workload when they are slowed down like that.

Dewald

On Sep 6, 11:59 am, John Kalucki  wrote:
> I can't speak to the policy issues, but I'll share a few things about
> social graph backing stores.
>
> To put it politely, the social graph grows quickly. Projecting the
> growth out just 3 or 6 months causes most engineers to do a
> spit-take.
>
> We have three online (user-visible) ways of storing the social graph.
> One is considered canonical, but it is useless for online queries. The
> second used to handle all queries. This store began to suffer from
> correctness and internal inconsistency problems as this store was
> pushed well beyond its capabilities. We recognized this issue long
> before the issues became critical, allocated significant resources,
> and built a third store. This store is correct (eventually
> consistent), internally consistent, fast, efficient, very scalable,
> and we're very happy with it.
>
> As the second system was slagged into uselessness, we had to cut over
> the majority of the site to the third system when the third reached a
> good, but not totally perfect, state. As we cut over, all sorts of
> problems, bugs and issues were eliminated. Hope was restored, flowers
> bloomed, etc. Yet, the third store has two minor user-visible flaws
> that we are fixing. Note that working on a large critical production
> data store with heavy read and write volume takes time, care and
> resources. There is minor pagination jitter in one case and a certain
> class of row-count-based queries have to be deprecated (or limited)
> and replaced with cursor-based queries to be practical. For now, we're
> sending the row-count queries back to the second system, which
> is otherwise idle, but isn't consistent with the first or third
> system.
>
> We also have follower and following counts memoized in two ways that I
> know about, and there's probably at least one more way that I don't
> know about.
>
> Experienced hands can intuit the trade-offs and well-agonized choices
> that were made when we were well behind a steep growth curve on the
> social graph.
>
> These are the cards.
>
> -John Kalucki
> http://twitter.com/jkalucki
> Services, Twitter Inc.
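
Dewald's idea of doing a count() on the returned JSON array and comparing it with the number shown on the profile can be sketched as follows. The function names are hypothetical, and whether the array length is the more trustworthy figure is his reading of John's message, which the thread does not confirm.

```python
import json


def count_ids(ids_json: str) -> int:
    """Count the IDs in a JSON array, e.g. the body of a friends/ids response."""
    ids = json.loads(ids_json)
    if not isinstance(ids, list):
        raise ValueError("expected a JSON array of IDs")
    return len(ids)


def count_drift(ids_json: str, profile_count: int) -> int:
    """Return the profile's memoized count minus the number of IDs listed.

    A non-zero result only shows that the two sources disagree; it does
    not say which of them is correct.
    """
    return profile_count - count_ids(ids_json)
```

For example, `count_drift("[11, 12, 13]", 5)` reports a difference of 2 between a profile count of 5 and a list of three IDs.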


[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph

2009-09-06 Thread Nick Arnett
On Sun, Sep 6, 2009 at 1:52 PM, Jesse Stay  wrote:

> I don't understand how asking to release features earlier in the week is
> asking a lot?  What does that have to do with scaling social graphs?


I was referring to a beta environment.

Nick


[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph

2009-09-06 Thread Jesse Stay
I don't understand how asking to release features earlier in the week is
asking a lot?  What does that have to do with scaling social graphs?
Jesse

On Sun, Sep 6, 2009 at 2:49 PM, Nick Arnett  wrote:

>
>
> On Sun, Sep 6, 2009 at 11:18 AM, Jesse Stay  wrote:
>
>> Thanks John.  I appreciate the various ways of accessing this data, but
>> when you guys make updates to any of these, can you either do it in a beta
>> environment we can test in first, or earlier in the week?  Where there are
>> very few Twitter engineers monitoring these lists during the weekends, and
>> we ourselves often have other plans, this really makes for an interesting
>> weekend for all of us when changes go into production that break code.  It
>> happens, but it would be nice to have this earlier in the week, or in a beta
>> environment we can test in.
>
>
>
> I think that's probably asking a lot of a company trying to grow as fast as
> Twitter.  Graphs are very hard to scale.  Ask anybody who has tried.
>
> Now if the graph weren't dependent on a centralized system
>
> Nick
>
>


[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph

2009-09-06 Thread Nick Arnett
On Sun, Sep 6, 2009 at 11:18 AM, Jesse Stay  wrote:

> Thanks John.  I appreciate the various ways of accessing this data, but
> when you guys make updates to any of these, can you either do it in a beta
> environment we can test in first, or earlier in the week?  Where there are
> very few Twitter engineers monitoring these lists during the weekends, and
> we ourselves often have other plans, this really makes for an interesting
> weekend for all of us when changes go into production that break code.  It
> happens, but it would be nice to have this earlier in the week, or in a beta
> environment we can test in.



I think that's probably asking a lot of a company trying to grow as fast as
Twitter.  Graphs are very hard to scale.  Ask anybody who has tried.

Now if the graph weren't dependent on a centralized system

Nick


[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph

2009-09-06 Thread PJB


"For now, we're sending the row-count-queries queries back to the
second system, which is otherwise idle, but isn't consistent with the
first or third system. "

Can you help us better understand what queries you're talking about?
Do you mean, e.g., that any queries that call for *ALL* friends/ids
without pagination will use the second inconsistent system?  And so
the recommended solution would be for us to change our queries to use
pagination... if we want accurate data?


[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph

2009-09-06 Thread Jesse Stay
Thanks John.  I appreciate the various ways of accessing this data, but when
you guys make updates to any of these, can you either do it in a beta
environment we can test in first, or earlier in the week?  Where there are
very few Twitter engineers monitoring these lists during the weekends, and
we ourselves often have other plans, this really makes for an interesting
weekend for all of us when changes go into production that break code.  It
happens, but it would be nice to have this earlier in the week, or in a beta
environment we can test in.
Also, when things like this do happen, is there a way you can lift following
limits for specific users so we can correct the wrong with our customers?

Thanks,

Jesse
On Sun, Sep 6, 2009 at 8:59 AM, John Kalucki  wrote:

>
> I can't speak to the policy issues, but I'll share a few things about
> social graph backing stores.
>
> To put it politely, the social graph grows quickly. Projecting the
> growth out just 3 or 6 months causes most engineers to do a
> spit-take.
>
> We have three online (user-visible) ways of storing the social graph.
> One is considered canonical, but it is useless for online queries. The
> second used to handle all queries. This store began to suffer from
> correctness and internal inconsistency problems as this store was
> pushed well beyond its capabilities. We recognized this issue long
> before the issues became critical, allocated significant resources,
> and built a third store. This store is correct (eventually
> consistent), internally consistent, fast, efficient, very scalable,
> and we're very happy with it.
>
> As the second system was slagged into uselessness, we had to cut over
> the majority of the site to the third system when the third reached a
> good, but not totally perfect, state. As we cut over, all sorts of
> problems, bugs and issues were eliminated. Hope was restored, flowers
> bloomed, etc. Yet, the third store has two minor user-visible flaws
> that we are fixing. Note that working on a large critical production
> data store with heavy read and write volume takes time, care and
> resources. There is minor pagination jitter in one case and a certain
> class of row-count-based queries have to be deprecated (or limited)
> and replaced with cursor-based queries to be practical. For now, we're
> sending the row-count queries back to the second system, which
> is otherwise idle, but isn't consistent with the first or third
> system.
>
> We also have follower and following counts memoized in two ways that I
> know about, and there's probably at least one more way that I don't
> know about.
>
> Experienced hands can intuit the trade-offs and well-agonized choices
> that were made when we were well behind a steep growth curve on the
> social graph.
>
> These are the cards.
>
> -John Kalucki
> http://twitter.com/jkalucki
> Services, Twitter Inc.
>
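
As general background to the row-count versus cursor distinction John draws, here is a toy comparison of the two paging styles over a list that changes between requests. It is not a description of Twitter's stores: the in-memory list, the function names, and the use of descending IDs as the sort key are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def offset_page(rows: List[int], offset: int, limit: int) -> List[int]:
    """Row-count paging: skip `offset` rows, then return up to `limit` rows."""
    return rows[offset:offset + limit]


def cursor_page(rows: List[int], after: Optional[int],
                limit: int) -> Tuple[List[int], Optional[int]]:
    """Cursor paging over rows sorted newest first (descending ID).

    Returns the rows with an ID below `after`, plus the cursor for the
    next page, or None when nothing is left.
    """
    eligible = [r for r in rows if after is None or r < after]
    page = eligible[:limit]
    next_cursor = page[-1] if len(eligible) > limit else None
    return page, next_cursor


rows = [50, 40, 30, 20, 10]          # newest follower first
first_offset = offset_page(rows, 0, 2)            # [50, 40]
first_cursor, cur = cursor_page(rows, None, 2)    # [50, 40], cursor 40

rows.insert(0, 60)                   # a new follower arrives between requests

second_offset = offset_page(rows, 2, 2)           # [40, 30]: 40 is repeated
second_cursor, cur = cursor_page(rows, cur, 2)    # [30, 20]: no repeat
```

The offset version counts rows from the top of a list that has since shifted, so it returns 40 twice; the cursor version resumes from the last ID it saw and is unaffected by the insert.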