[twitter-dev] Re: Search API - 403 bursts and (maybe) a caching issue.

2009-10-27 Thread Marc W


A number of people are seeing similar things, especially if you
specify a since_id:

http://groups.google.com/group/twitter-development-talk/browse_thread/thread/e6289b6439c1d26d/e367ca8af09d28d5?lnk=gst&q=searches+returning+no+tweets&pli=1

My current (extremely bad) solution is to just fire hose it: get all
the tweets every time, and then filter out those I've seen before by
id.  Gaaak!
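
For anyone wanting to try the same workaround, here is a minimal sketch of the "fetch everything, filter by id" step. The function name, the dict-shaped tweets and the sample ids are made up for illustration; nothing here calls the real API.

```python
def filter_unseen(tweets, seen_ids):
    """Return only tweets whose id has not been seen, recording them as seen.

    `tweets` is assumed to be a list of dicts with an "id" key;
    `seen_ids` is a set that persists between fetches.
    """
    fresh = []
    for tweet in tweets:
        tweet_id = tweet["id"]
        if tweet_id not in seen_ids:
            seen_ids.add(tweet_id)
            fresh.append(tweet)
    return fresh


# Two overlapping fetches with made-up ids: the second fetch repeats 102.
seen = set()
first = filter_unseen([{"id": 101}, {"id": 102}], seen)
second = filter_unseen([{"id": 102}, {"id": 103}], seen)
```

The obvious cost is that every poll downloads the full result page, which is why this is a stopgap and not a fix.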

I'll see if opening a support ticket or whatever helps with this
instead.

Mark.


On Oct 27, 6:28 am, briantroy brian.cosin...@gmail.com wrote:
 This is happening RIGHT NOW for the following:

 1) Go to search.twitter.com and enter tweetsforboobs OR
 tweetforboobs as the search.

 2) Go to http://tweetsforboobs.org and see the twitter feed on the
 left.

 Notice that the last tweet from 2 hours ago (VerticalMeasures) is not
 in the twitter feed on tweetsforboobs.org. Also note the ID of the
 tweet - from VerticalMeasures that is missing from tweetsforboobs.org:
 5181937429

 Now here is the log file of the Twitter API call:

 DEBUG: 06:18:01 PM on Mon October 26th Doing CURL fetch with User
 Agent: justsignal/1.0 (+http://justsignal.com) and 
 RFERER:http://justsignal.com/widgets/20ab5e90bf116397d6fb84ca80321928/widget...
 DEBUG: 06:18:01 PM on Mon October 26th Twitter responded with 200 HTTP
 Status Code.
 DEBUG: 06:18:01 PM on Mon October 26th MaxID: 5182676703
 DEBUG: 06:18:01 PM on Mon October 26th There are: 0 results in this
 fetch.
 Updating number for api hits for hour: 18 to: 6
 THROTTLE-69: 06:18:01 PM on Mon October 26th Slowing collection...
 Avg: 0 returning delay: 180
 DEBUG: 06:18:01 PM on Mon October 26th Checking for next page... 
 DEBUG: 06:18:01 PM on Mon October 26th There is NOT another page of
 results...
 DEBUG: 06:18:01 PM on Mon October 26th Old max: 5182676703 New max:
 5182676703
 DEBUG: 06:18:01 PM on Mon October 26th Old max: 5182676703 New max:
 5182676703

 Note that our max id is already greater than the last tweet ID from
 VerticalMeasures, yet we never got that tweet. The id from the log
 snippet (5182676703) is NOT in our database (we never got it). It does
 not match the tweet ID before VerticalMeasures: 5180513610

 Somehow the API is returning a new (and bigger) max id on 200
 responses with no tweets in them OR on 403 (those are the only two
 http codes in the log for today). Either way, that shouldn't be
 happening.
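
One possible client-side guard for the behaviour described above, sketched under the assumption that the client currently trusts the max id reported with the response: advance the cursor only to the highest id of a tweet that was actually received. The function name and the dict-shaped results are hypothetical.

```python
def next_since_id(current_since_id, results):
    """Advance the cursor only to the highest tweet id actually received.

    `results` is assumed to be a list of dicts with an "id" key. An empty
    fetch leaves the cursor untouched, whatever max id the response reports.
    """
    ids = [tweet["id"] for tweet in results]
    if not ids:
        return current_since_id
    return max(current_since_id, max(ids))


# With the ids from the log: an empty 200 response keeps the old cursor.
unchanged = next_since_id(5180513610, [])
advanced = next_since_id(5180513610, [{"id": 5181937429}])
```

This does not fix whatever the server is doing. It only keeps an empty response from moving since_id past tweets that were never delivered.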

 Brian Roy
 justSignal

 On Oct 26, 12:47 pm, briantroy brian.cosin...@gmail.com wrote:



  Everything below ONLY PERTAINS TO THE SEARCH API:

  1) Since late last week I've noticed a significant number of 403
  errors (403 Error from JSON: since_id too recent, poll less
  frequently). These usually indicate I'm hitting a server with an
  older view of the search index - since it thinks the ID I sent in
  since_id is newer than the newest it has. These trouble me because
  when I get a 200 after the 403 sometimes I get everything back to my
  since_id, sometimes I don't. It appears some indexes have gaps until
  they catch up.

  QUESTION: Are there any ongoing search indexing issues that you are
  aware of?

  2) Since late last week I've noticed that some search API requests
  appear to get stuck returning an empty json result (no new tweets).
  This can go on for HOURS (today one got stuck like this for 12 hours).
  When I restart my process sometimes this clears up (I get the backlog)
  - other times it does not (I continue to get 0 tweets in the json).
  All of the requests return HTTP 200 and valid json.

  QUESTION: Are there any ongoing caching issues with the search API?

  These issues are new in the last 7 days (since about last Thursday).
  My IP is whitelisted. I'm sending both a valid user agent and referrer
  header. My processes are throttled by the volume of tweets they
  receive. I've made no changes to my processing since late September.
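
For the 403 case in point 1 ("since_id too recent, poll less frequently"), a rough sketch of one way to pick the next polling delay: keep the same since_id and wait longer after each 403, then drop back to the normal interval on any other status. The function name and the 30 and 600 second values are assumptions, not anything Twitter documents.

```python
def next_delay(status, current_delay, base_delay=30, max_delay=600):
    """Pick the wait (in seconds) before the next poll from the last HTTP status.

    On 403 the delay doubles, capped at max_delay; the caller is expected
    to retry with the same since_id. Any other status resets the delay.
    """
    if status == 403:
        return min(current_delay * 2, max_delay)
    return base_delay


after_first_403 = next_delay(403, 30)
after_many_403s = next_delay(403, 400)
after_success = next_delay(200, 240)
```

This only addresses the polling rate. It says nothing about whether the 200 that follows a 403 returns everything back to the since_id.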

  Any assistance would be appreciated. My users are comparing what they
  see from my service to search.twitter.com and telling me we are
  broken.

  Regards,

  Brian Roy
  justSignal


[twitter-dev] Re: Search API - 403 bursts and (maybe) a caching issue.

2009-10-27 Thread Marc W

It looks as though it depends on the exact nature of the query.

The following always return up to date results, even with a since_id
(I haven't included those since_ids here)

http://search.twitter.com/search.json?q=hong+kong+OR+kowloon&rpp=100
http://search.twitter.com/search.json?q=%23iphone&rpp=100

but the following will just return 200 OK with no results:

http://search.twitter.com/search.json?q=from%3ADavidFeng+OR+from%3ABeijingWithKids+OR+from%3ABlueJDMBA+OR+from%3Akaiserkuo+OR+from%3Acharlieflint+OR+from%3Aourmaninsh&rpp=100
http://search.twitter.com/search.json?q=%23beijing+OR+%E5%8C%97%E4%BA%AC+OR+beijing+OR+%E5%8D%97%E7%BD%97%E6%95%85%E4%B9%A1+OR+nanluoguxiang+OR+%E4%B8%89%E9%87%8C%E5%B1%AF+OR+%E4%BA%94%E9%81%93%E5%8F%A3+OR+%E6%9C%9B%E4%BA%AC+OR+wudaokou+OR+sanlitun&rpp=100

Interestingly, if you change the first query above to:

http://search.twitter.com/search.json?q=hong+kong+OR+kowloon+OR+tsim+tsa+shui&rpp=100

then the results STOP coming if there is a since_id.
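
To compare the same query with and without a since_id in a repeatable way, something like the following can build the URLs. The endpoint and the q, rpp and since_id parameters are the ones used in this thread; the function name is made up, and this only builds the string without sending a request.

```python
from urllib.parse import urlencode


def build_search_url(query, rpp=100, since_id=None):
    """Build a search.json URL; since_id is added only when given."""
    params = {"q": query, "rpp": rpp}
    if since_id is not None:
        params["since_id"] = since_id
    # urlencode escapes spaces as "+" and "#" as "%23", as in the URLs above.
    return "http://search.twitter.com/search.json?" + urlencode(params)


without_cursor = build_search_url("hong kong OR kowloon")
with_cursor = build_search_url("#iphone", since_id=123)
```

Fetching both variants for each query and diffing the returned ids would show which queries lose results once since_id is set.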


I've filed a support ticket with Twitter (623447) with this info,
and hopefully we'll see some progress on it.

Mark.



[twitter-dev] Lots of "Couldn't find Status with ID" messages

2009-10-14 Thread Marc W

Hello!

We regularly fetch feeds from the Twitter API and then use the max id
as the since_id for subsequent calls.

The problem is that in the last two days, we suddenly are getting lots
of

Couldn't find Status with ID=XYZ

messages.  We weren't seeing these before, but now we get them multiple
times a day.

Is there a correct way to avoid these messages or deal with them?
We're playing with a number of ideas, ranging from incrementing the
since_id until it works to just getting the full feed again and
discarding what we've already seen.
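
A rough sketch of the second idea (get the full feed again and discard), assuming tweet ids only ever increase. `StatusNotFound`, `fetch_new` and `fake_fetch` are hypothetical names; the real client would raise its own error when it sees the "Couldn't find Status" message.

```python
class StatusNotFound(Exception):
    """Hypothetical error for a "Couldn't find Status with ID=..." reply."""


def fetch_new(fetch, since_id):
    """Fetch tweets newer than since_id, refetching the full feed on failure.

    `fetch` is any callable taking a since_id (or None for the full feed)
    and returning a list of dicts with an "id" key.
    """
    try:
        tweets = fetch(since_id)
    except StatusNotFound:
        # The since_id status is unknown to the server: get everything.
        tweets = fetch(None)
    # Discard anything at or below the cursor (assumes increasing ids).
    return [t for t in tweets if since_id is None or t["id"] > since_id]


def fake_fetch(since_id):
    """Stand-in for the API call: fails for id 102, else returns the feed."""
    if since_id == 102:
        raise StatusNotFound("Couldn't find Status with ID=102")
    return [{"id": 101}, {"id": 102}, {"id": 103}]


recovered = fetch_new(fake_fetch, 102)
```

Compared with incrementing the since_id until a call succeeds, this costs one extra request per failure instead of an unknown number of retries.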

Am I doing something incorrectly?


Thanks,
mark.