Matt - I'll verify that this is the issue (I assume I should see new
results on page 1 AND page 2 - otherwise there is something else
going on).
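
For anyone following along, here is a minimal sketch of the manual stop Matt describes in the quoted message below: page through the results and stop as soon as a page contains an id at or below the original since_id. It is not our production code. The fetch_page callable, the dict-shaped tweets with an "id" key, the newest-first ordering, and the max_pages cap are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def collect_new_tweets(
    fetch_page: Callable[[int], List[Dict]],
    since_id: int,
    max_pages: int = 15,  # hypothetical safety cap, not an API-documented limit
) -> List[Dict]:
    """Collect tweets newer than since_id, stopping pagination manually.

    fetch_page is a hypothetical callable that returns the list of results
    for a given page number (assumed newest first), or an empty list when
    there are no more pages.
    """
    new_tweets: List[Dict] = []
    for page in range(1, max_pages + 1):
        results = fetch_page(page)
        if not results:
            break  # no more pages

        reached_old = False
        for tweet in results:
            if tweet["id"] <= since_id:
                # Already seen on a previous poll: skip it and remember
                # that later pages can only contain older tweets.
                reached_old = True
                continue
            new_tweets.append(tweet)

        if reached_old:
            break  # do not request the next page
    return new_tweets
```

With this in place the poller would ignore the next page link once an old id shows up, instead of following it until the pages run out.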

Brian

On May 15, 8:33 am, Matt Sanford <m...@twitter.com> wrote:
> Hi Brian,
>
>      My guess is that this is the same since_id/max_id pagination  
> confusion we have always had. If you look at the next_page URL in our  
> API you'll notice that it does not contain the since_id. If you are  
> searching with since_id and requesting multiple pages you need to  
> manually stop pagination once you find an id lower than your original  
> since_id. I know this is a pain but there is a large performance gain  
> in it on our back end. There was an update a few weeks ago [1] where I  
> talked about this and a warning message (twitter:warning in atom and  
> "warning" in JSON) was added to alert you to the fact it had been  
> removed. Does that sound like the cause of your issue?
>
> Thanks;
>   – Matt Sanford / @mzsanford
>       Twitter Dev
>
> [1] - http://groups.google.com/group/twitter-development-talk/browse_frm/th...
>
> On May 15, 2009, at 7:50 AM, briantroy wrote:
>
>
>
> > I've noticed this before but always tried to deal with it as a bug on
> > my side. It is, however, now clear to me that from time to time
> > Twitter Search API seems to ignore the since_id.
>
> > We track FollowFriday by polling Twitter Search every so often (the
> > process is throttled from 10 seconds to 180 seconds depending on how
> > many results we get). This works great 90% of the time. But on high
> > volume days (Fridays) I've noticed we get a lot of multi-page
> > responses causing us to make far too many requests to the Twitter API
> > (900/hour).
> > When attempting to figure out why we are making so many requests I
> > uncovered something very interesting. When we get a "tweet" we store
> > it in our database. That database has a unique index on the customer
> > id/Tweet Id. When we get multi-page responses from Twitter and iterate
> > through each page the VAST MAJORITY of the Tweets violate this unique
> > index. What does this mean? That we already have that tweet.
> > Today, I turned on some additional debugging and saw that the tweets
> > we were getting from Twitter Search were, in fact, prior to the
> > since_id we sent.
>
> > This is causing us to POUND the API servers unnecessarily. There is,
> > however, really nothing I can do about it on my end.
>
> > Here is a snip of the log showing the failed inserts and the ID we are
> > working with. The last line shows you both the old max id and the new
> > max id (after processing the tweets). As you can see every tweet
> > violates the unique constraint (27 is the customer id). You can also
> > see that we've called the API for this one search 1016 times this
> > hour... which is WAY, WAY too much (16.9 times per minute):
>
> > NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
> > entry '27-1806522797' for key 2
> > SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
> > from_user_id, from_user, iso_language_code, profile_image_url,
> > created_at, bulk_svc_id) values('#<b>followfriday</b> edubloggers
> > @CoolCatTeacher @dwarlick @ewanmcintosh @willrich45 @larryferlazzo
> > @suewaters',1806522797, 0, '', 192010, 'WeAreTeachers', 'en', 'http://
> > s3.amazonaws.com/twitter_production/profile_images/52716611/
> > Picture_2_normal.png', 'Fri, 15 May 2009 14:41:51 +0000', 27)
> > NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
> > entry '27-1806522766' for key 2
> > SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
> > from_user_id, from_user, iso_language_code, profile_image_url,
> > created_at, bulk_svc_id) values('thx for the #<b>followfriday</b>
> > love, @brokesocialite &amp; @silveroaklimo.  Also thx to @diamondemory
> > &amp; @bmichelle for the RTs of FF',1806522766, 0, '', 1149953,
> > 'lmdupont', 'en', 'http://s3.amazonaws.com/twitter_production/
> > profile_images/188591402/lisaann_normal.jpg', 'Fri, 15 May 2009
> > 14:41:51 +0000', 27)
> > NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
> > entry '27-1806522760' for key 2
> > SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
> > from_user_id, from_user, iso_language_code, profile_image_url,
> > created_at, bulk_svc_id) values('Thx! RT @dpbkmb: #<b>followfriday</b>
> > @ifeelgod @americandream09 @DailyHappenings @MrMilestone @emgtay
> > @Nurul54 @mexiabill @naturallyknotty',1806522760, 0, '', 1303322,
> > 'borgellaj', 'en', 'http://s3.amazonaws.com/twitter_production/
> > profile_images/58399480/img017_normal.jpg', 'Fri, 15 May 2009 14:41:51
> > +0000', 27)
> > NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
> > entry '27-1806522759' for key 2
> > SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
> > from_user_id, from_user, iso_language_code, profile_image_url,
> > created_at, bulk_svc_id) values('Morning my tweets!!! <b>follow
> > friday</b>! Dnt forget to RT me in need of followers LOL!',1806522759,
> > 0, '', 11790458, 'Dae_Marie', 'en', 'http://s3.amazonaws.com/
> > twitter_production/profile_images/199283178/dae_babyyyy_normal.jpg',
> > 'Fri, 15 May 2009 14:41:50 +0000', 27)
> > NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
> > entry '27-1806522752' for key 2
> > SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
> > from_user_id, from_user, iso_language_code, profile_image_url,
> > created_at, bulk_svc_id) values('<b>#ff</b> #<b>followfriday</b>
> > @dirtyert (he\'s started with scrap metal stories) and @soufron if you
> > speak French',1806522752, 0, '', 1704, 'vagredajr', 'en', 'http://
> > s3.amazonaws.com/twitter_production/profile_images/155241633/
> > _agreda_normal.jpg', 'Fri, 15 May 2009 14:41:50 +0000', 27)
> > NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate
> > entry '27-1806522729' for key 2
> > SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user,
> > from_user_id, from_user, iso_language_code, profile_image_url,
> > created_at, bulk_svc_id) values('#<b>followfriday</b> @hootsuite
> > @FitnessMagazine @packagingdiva @MobileLifeToday',1806522729, 0, '',
> > 11893419, 'ServiceFoods', 'it', 'http://s3.amazonaws.com/
> > twitter_production/profile_images/141678280/SF-shrunken_normal.bmp',
> > 'Fri, 15 May 2009 14:41:50 +0000', 27)
> > Updating number for api hits for hour: 10 to: 1016
> > DEBUG: 10:45:37 AM on Fri May 15th Checking for next page... **?
> > page=11&max_id=1806554381&rpp=100&q=followfriday+OR+%22follow+friday
> > %22+OR+%23ff+OR+fastfollowfive**
> > DEBUG: 10:45:37 AM on Fri May 15th There is another page for this
> > search... doing the next page now...
> > DEBUG: 10:45:37 AM on Fri May 15th Old max: 1806554381 New max:
> > 1806554381
>
> > I'd love to help you track this down... I think it has something to do
> > with high volume search queries (and perhaps the API server not all
> > having the same index of the tweets at the same time). Regardless -
> > this will cause undue load on the API servers... and serves NO
> > purpose for me...
>
> > Brian Roy
>
> > President and Founder - justSignal
