[twitter-dev] Streaming API time drifting problem and possible solutions
Hi everyone,

I have a program calling the statuses/sample method of a garden hose of the Streaming API, and I am running into the following problem: the timestamps of the tweets I download steadily drift behind real time. The drift keeps increasing until it reaches around 25 minutes; then the request times out, and I sleep for 5 seconds and reset the connection. Resetting the connection also resets the drift to 0.

One workaround I have now is to proactively reset the connection more frequently; e.g., if I reconnect every minute, the drift will be at most 1 minute. But I am not sure whether this is allowed by the API.

Could anyone tell me whether you have seen the same problem, or whether I am using the API in the wrong way? And is it OK to reset the connection every minute?

I am using Tweepy (http://github.com/joshthecoder/tweepy) as the library for accessing the Streaming API.

Thanks a lot!
-Larry
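For anyone who wants to put a number on the drift described here, a minimal sketch follows. It assumes each tweet carries a `created_at` string in the form `Thu Jul 08 15:31:00 +0000 2010`; the function name `drift_seconds` and the format constant are illustrative and not part of Tweepy or the Streaming API.

```python
from datetime import datetime, timezone

# Assumed layout of the tweet's created_at field, e.g.
# "Thu Jul 08 15:31:00 +0000 2010".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def drift_seconds(created_at, now=None):
    """Return how many seconds the tweet timestamp lags behind `now`.

    `now` defaults to the current UTC time; passing it explicitly makes
    the function easy to check with fixed values.
    """
    tweet_time = datetime.strptime(created_at, CREATED_AT_FORMAT)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - tweet_time).total_seconds()
```

Logging this value for every tweet as it is handled should show whether the lag grows steadily over the life of one connection, which is the pattern described in the message.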
Re: [twitter-dev] Streaming API time drifting problem and possible solutions
Absolutely do not reset the connection and reconnect. Connections should be long-lived on the Streaming API. This is almost certainly a problem with the read throughput of your client or, less likely, with bandwidth from your system. Run curl(1) from the same system and grep for the date field. It will almost certainly not fall behind.

-John Kalucki
http://twitter.com/jkalucki
Infrastructure, Twitter Inc.

(In reply to Larry Zhang's message of Thu, Jul 8, 2010, 8:31 AM.)
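The curl-and-grep check can also be approximated in Python when grep is not convenient. The sketch below is an assumption-laden stand-in, not John's exact procedure: it pulls every `created_at` value out of a raw line with a regular expression and skips JSON decoding entirely, so any remaining lag cannot be blamed on the parser. The names `created_at_values` and `print_dates` are hypothetical.

```python
import re

# Matches "created_at":"<value>" in a raw JSON line without decoding it.
CREATED_AT_RE = re.compile(r'"created_at"\s*:\s*"([^"]+)"')


def created_at_values(raw_line):
    """Return every created_at string found in one raw stream line.

    A tweet may contain more than one (for example the tweet's own and
    the embedded user's), so all matches are returned in order.
    """
    return CREATED_AT_RE.findall(raw_line)


def print_dates(lines):
    """Print the date fields of each line from any iterable of strings."""
    for line in lines:
        for stamp in created_at_values(line):
            print(stamp)
```

One way to use it is to pipe the curl output into a script that calls `print_dates(sys.stdin)` and compare the printed dates with the wall clock.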
Re: [twitter-dev] Streaming API time drifting problem and possible solutions
Larry,

have you decoupled the processing code from tweepy's StreamListener, for example using a Queue.Queue or some message queue server?

Pascal

(In reply to Larry Zhang's message of Jul 8, 2010, 17:31.)
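A minimal sketch of the decoupling suggested here, written for Python 3, where the `Queue` module is named `queue`. `QueueingListener` is a hypothetical stand-in for a stream listener rather than Tweepy's actual class, and `worker` and `run_demo` are illustrative names. The idea is that the callback on the reading thread only enqueues the raw data, while a separate thread does the decoding and processing.

```python
import json
import queue  # this module is called Queue in Python 2
import threading


class QueueingListener:
    """Hypothetical listener: its callback only hands raw data to a queue."""

    def __init__(self, work_queue):
        self.work_queue = work_queue

    def on_data(self, raw_data):
        # Keep the reading thread fast: no parsing or storage here.
        self.work_queue.put(raw_data)
        return True


def worker(work_queue, handle):
    """Decode and process queued items until a None sentinel arrives."""
    while True:
        raw = work_queue.get()
        try:
            if raw is None:
                break
            handle(json.loads(raw))
        finally:
            work_queue.task_done()


def run_demo(raw_messages):
    """Feed messages through the listener and return the processed results."""
    work_queue = queue.Queue()
    results = []
    thread = threading.Thread(target=worker, args=(work_queue, results.append))
    thread.start()
    listener = QueueingListener(work_queue)
    for raw in raw_messages:
        listener.on_data(raw)
    work_queue.put(None)  # sentinel: tell the worker to stop
    thread.join()
    return results
```

With an unbounded queue, slow processing shows up as a growing queue in memory instead of a stalled socket read, so the queue size is worth watching.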
Re: [twitter-dev] Streaming API time drifting problem and possible solutions
Larry,

moreover, I assume you checked I/O and CPU load. But even if that's not the issue, you should absolutely check whether you have simplejson with the C extension installed. The version included with Python is 1.9, which is decidedly slower than the new 2.x branch. You might see JSON decoding load drop by 50% or more.

Pascal

(In reply to Larry Zhang's message of Jul 8, 2010, 17:31.)
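One way to run the check described here is sketched below. It assumes that simplejson's compiled extension is importable as `simplejson._speedups`, and it is written so that it still runs when simplejson is not installed at all. The function name `json_backend_report` is hypothetical.

```python
import importlib


def json_backend_report():
    """Report whether simplejson and its C speedups can be imported."""
    report = {
        "simplejson_installed": False,
        "simplejson_version": None,
        "c_speedups": False,
    }
    try:
        import simplejson
    except ImportError:
        return report  # only the standard library json module is available

    report["simplejson_installed"] = True
    report["simplejson_version"] = getattr(simplejson, "__version__", None)
    try:
        # Assumption: the compiled extension lives in simplejson._speedups.
        importlib.import_module("simplejson._speedups")
        report["c_speedups"] = True
    except ImportError:
        pass  # pure-Python fallback is in use
    return report
```

If `c_speedups` comes back `False`, reinstalling simplejson with a working compiler available is the usual next step, followed by re-measuring the decoding load.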