[twitter-dev] Streaming API time drifting problem and possible solutions

2010-07-08 Thread Larry Zhang
Hi everyone,

I have a program that calls the statuses/sample method (garden hose)
of the Streaming API, and I am seeing the following problem: the
timestamps of the tweets I download steadily drift behind real time.
The drift keeps growing until it reaches around 25 minutes, at which
point the request times out; I then sleep for 5 seconds and reset the
connection. The drift also resets to 0 when the connection is reset.

One workaround I have now is to proactively reset the connection
more frequently; e.g., if I reconnect every minute, the drift will
be at most 1 minute. But I am not sure whether this is allowed by
the API.

So could anyone tell me whether you have seen the same problem, or
whether I am using the API in the wrong way? And is it OK to reset
the connection every minute?

I am using Tweepy (http://github.com/joshthecoder/tweepy) as the
library for accessing the Streaming API.

Thanks a lot!
-Larry


Re: [twitter-dev] Streaming API time drifting problem and possible solutions

2010-07-08 Thread John Kalucki
Absolutely do not reset the connection and reconnect. Connections should be
long-lived on the Streaming API.

This is almost certainly a problem with the read throughput of your client,
or, less likely, with bandwidth from your system. Run curl(1) from the same
system and grep for the date field. It will almost certainly not fall
behind.
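
To put a number on the drift rather than eyeballing the date field, something like the following could be fed the raw stream output line by line. This is only a sketch: it assumes each non-empty line is one JSON status carrying a created_at field in the form "Thu Jul 08 15:31:00 +0000 2010", and the name lag_seconds is made up for illustration.

```python
import json
from datetime import datetime, timezone

# Assumed layout of the "created_at" field,
# e.g. "Thu Jul 08 15:31:00 +0000 2010".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def lag_seconds(line, now=None):
    """Return how far (in seconds) a streamed status lags behind `now`.

    Returns None for blank lines, unparsable lines, and messages that
    carry no created_at field.
    """
    line = line.strip()
    if not line:
        return None  # blank line, nothing to measure
    try:
        status = json.loads(line)
    except ValueError:
        return None  # not valid JSON
    if not isinstance(status, dict):
        return None
    created_at = status.get("created_at")
    if created_at is None:
        return None  # message without a timestamp
    created = datetime.strptime(created_at, CREATED_AT_FORMAT)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - created).total_seconds()
```

Calling lag_seconds on every line read from sys.stdin while piping curl into the script would show whether the lag stays flat or grows. If it stays flat under curl but grows in the original program, that points at the client's read path rather than the network.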

-John Kalucki
http://twitter.com/jkalucki
Infrastructure, Twitter Inc.







Re: [twitter-dev] Streaming API time drifting problem and possible solutions

2010-07-08 Thread Pascal Jürgens
Larry,

Have you decoupled the processing code from tweepy's StreamListener, for
example using a Queue.Queue or some message queue server?
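
A minimal sketch of that decoupling, assuming Python 3 (where the Queue module is named queue). The function names enqueue, start_worker and stop_worker are hypothetical, and the code deliberately does not touch Tweepy: the idea is that whatever callback the stream listener invokes for each raw message would do nothing except call enqueue, so the socket is read as fast as data arrives.

```python
import json
import queue
import threading

raw_lines = queue.Queue()  # unbounded buffer between reader and worker
_STOP = object()           # sentinel that tells the worker to exit


def enqueue(raw_line):
    """Call this from the stream-reading thread; it does no parsing."""
    raw_lines.put(raw_line)


def _worker(handle):
    # Parse and process messages off the reading thread.
    while True:
        raw_line = raw_lines.get()
        if raw_line is _STOP:
            break
        handle(json.loads(raw_line))


def start_worker(handle):
    """Start a background thread that passes each decoded status to `handle`."""
    thread = threading.Thread(target=_worker, args=(handle,), daemon=True)
    thread.start()
    return thread


def stop_worker(thread):
    """Ask the worker to finish the queued items and wait for it."""
    raw_lines.put(_STOP)
    thread.join()
```

Note that the queue here is unbounded: if processing is slower than the stream on average, the backlog moves from the socket into memory instead of disappearing. Watching raw_lines.qsize() over time is one way to see whether the worker keeps up.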

Pascal




Re: [twitter-dev] Streaming API time drifting problem and possible solutions

2010-07-08 Thread Pascal Jürgens
Larry,

Moreover, I assume you checked I/O and CPU load. But even if that's not the
issue, you should absolutely check whether you have simplejson with its C
extension installed. The version included with Python is 1.9, which is
decidedly slower than the new 2.x branch. You might see JSON decoding load
drop by 50% or more.
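
One rough way to check this is to see which JSON module actually gets imported, whether its C scanner loaded, and how long decoding a sample status takes. The sketch below makes a few assumptions: it falls back to the standard-library json module when simplejson is not installed, it relies on the decoder submodule exposing c_scanstring (set to None when the C extension is missing), and SAMPLE is a made-up status, not real data.

```python
import timeit

try:
    import simplejson as json_impl  # third-party; may not be installed
except ImportError:
    import json as json_impl        # standard-library fallback

# Made-up status used only for timing.
SAMPLE = (
    '{"created_at": "Thu Jul 08 15:31:00 +0000 2010", "id": 1, '
    '"text": "hello", "user": {"screen_name": "example"}}'
)


def has_c_scanner(module=json_impl):
    """Return True if the module's C string scanner appears to be loaded."""
    return getattr(module.decoder, "c_scanstring", None) is not None


def decode_seconds(loads, sample=SAMPLE, number=10000):
    """Return total seconds spent decoding `sample` `number` times."""
    return timeit.timeit(lambda: loads(sample), number=number)
```

Printing json_impl.__name__, has_c_scanner() and decode_seconds(json_impl.loads) on the machine that runs the stream client gives a quick baseline. The absolute figures depend on the machine, so they are only useful for comparing one setup against another.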


Pascal

