I'm consuming the Streaming API using the filter method (tracking some
user ids).  I've noticed that I'm getting an extra, undocumented, line
before each length delimiter.

I connect and get the following coming down the pipe:

{{{

HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.17)

5DE
1496
{"coordinates":null, ... snip ..., "id":10487365330}

A52
2636
{"coordinates":null, ...snip ..., "id":10487377907}

592
1420
{"coordinates":null, ... snip ..., "id":10487298462}


}}}

Now, the Streaming API docs say, "Statuses are represented by a
length, in bytes, a newline, and the status text that is exactly
length bytes. Note that "keep-alive" newlines may be inserted before
each length."

This suggests the following read loop code (based on and equivalent to
the way tweepy's consumer is implemented):

{{{

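# Read one line, a byte at a time, up to the newline that ends it
length = ''
while True:
    c = s.recv(1)
    if c == '\n':
        break
    length += c
# A keep-alive newline leaves an empty string here, which isdigit() rejects
length = length.strip()
if length.isdigit():
    length = int(length)
    # Note: recv() may return fewer than `length` bytes
    status_data = s.recv(length)
    # do something with the data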

}}}

However, if you look at the third status from above, you can see that
the extra line can sometimes consist entirely of digits, in that case
``592``.  The loop then treats it as a length, which fairly effectively
borks the consumer.
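One thing I notice while poking at the numbers: the response headers say
``Transfer-Encoding: chunked``, and if I read each extra line as
hexadecimal it comes out exactly 6 larger than the decimal length below
it.  That would match the four length digits plus a two-byte CRLF, if a
chunk covers the length line and the status together.  I don't know
whether that is the real explanation, so treat this as a guess.  A quick
check of the arithmetic, using the three pairs from the capture above
(``framing_overhead`` is just a name I made up):

```python
# (extra line, length line) pairs copied from the capture above
samples = [("5DE", "1496"), ("A52", "2636"), ("592", "1420")]


def framing_overhead(extra_line, length_line):
    """Hypothetical helper: extra line read as hex, minus the decimal length."""
    return int(extra_line, 16) - int(length_line)


for extra, length in samples:
    # Under the chunk-size guess, this is the size of the length line itself
    print(extra, length, framing_overhead(extra, length))
```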

Now, I can hack that read loop in quite a few ways to accommodate this
extra data coming down the pipe.  The question is, what's the best way
to do so?  Is this something I can rely on, i.e. can I always expect a
line above the length delimiter?  Will it always have three chars?  Are
statuses always longer than 1000 bytes?
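One such hack, assuming the extra line is in fact the hexadecimal
chunk-size line of the ``Transfer-Encoding: chunked`` framing seen in the
response headers (that is my assumption, the Streaming API docs don't say
so): strip the chunk framing first, then parse the length-delimited
statuses out of what is left.  A rough sketch, where ``iter_chunks`` and
``iter_statuses`` are names I made up and ``stream`` is any file-like
object opened for reading bytes:

```python
import io


def iter_chunks(stream):
    """Yield the payload of each HTTP/1.1 chunk (assumes chunked framing)."""
    while True:
        size_line = stream.readline()
        if not size_line:
            return                          # stream closed
        size_line = size_line.split(b";")[0].strip()  # drop chunk extensions
        if not size_line:
            continue                        # stray blank line
        size = int(size_line, 16)           # chunk sizes are hexadecimal
        if size == 0:
            return                          # terminating chunk
        yield stream.read(size)
        stream.readline()                   # CRLF that closes the chunk


def iter_statuses(stream):
    """Yield each status body from the de-chunked, length-delimited stream."""
    buf = b""
    for chunk in iter_chunks(stream):
        buf += chunk
        while True:
            head, sep, rest = buf.partition(b"\n")
            if not sep:
                break                       # length line not complete yet
            if not head.strip():
                buf = rest                  # keep-alive newline, skip it
                continue
            length = int(head.strip())      # status lengths are decimal
            if len(rest) < length:
                break                       # wait for the rest of the status
            yield rest[:length]
            buf = rest[length:]
```

With a real connection, ``s.makefile("rb")`` should give a suitable
``stream``.  The buffer in ``iter_statuses`` is there because I don't
know whether a status is guaranteed to fit inside a single chunk.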

I'm also wondering whether this has always been the case, or whether
there are broken consumers out there missing tweets.

Thanks,

James.
