You appear to be looking at the raw HTTP chunked transfer-encoded stream.
The documentation assumes that you are using an HTTP client, not the raw
TCP stream. If you are reading the raw TCP stream, you can try to play
games and treat the chunk boundaries as message delimiters, but there are
no guarantees that the chunks will always align with the payload.
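
For what it is worth, the "extra" lines in the quoted output are consistent
with hexadecimal chunk sizes: 0x5DE is 1502, which is 1496 plus the six
bytes of the "1496" length line and its CRLF. If you do stay on the raw
socket, one approach is to strip the chunk framing first and only then
parse the length-delimited statuses. The following is a minimal sketch, not
a tested consumer. The names iter_chunked and iter_statuses are made up for
illustration, and it assumes a binary file-like object (for example from
s.makefile("rb")) that is already positioned just past the response headers.

```python
import io


def iter_chunked(stream):
    """Yield the payload of each HTTP/1.1 chunk read from `stream`."""
    while True:
        size_line = stream.readline()
        if not size_line:
            return  # connection closed
        # The chunk size is hexadecimal; ignore any ";extension" suffix.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            return  # zero-length chunk marks the end of the body
        data = stream.read(size)
        if len(data) < size:
            raise EOFError("stream ended in the middle of a chunk")
        stream.readline()  # consume the CRLF that terminates the chunk
        yield data


def iter_statuses(stream):
    """Yield length-delimited statuses from the de-chunked byte stream."""
    buf = b""
    for chunk in iter_chunked(stream):
        buf += chunk
        while True:
            buf = buf.lstrip(b"\r\n")  # skip keep-alive newlines
            head, sep, rest = buf.partition(b"\n")
            if not sep:
                break  # the length line is not complete yet
            length = int(head.strip())
            if len(rest) < length:
                break  # the status continues in a later chunk
            yield rest[:length]
            buf = rest[length:]
```

Because the statuses are reassembled from a buffer, this does not depend on
a chunk carrying exactly one status. An ordinary HTTP client library does
the de-chunking step for you, which is the simpler route.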

-John Kalucki
http://twitter.com/jkalucki
Infrastructure, Twitter Inc.


On Sun, Mar 14, 2010 at 3:43 PM, thruflo <thru...@googlemail.com> wrote:

> I'm consuming the Streaming API using the filter method (tracking some
> user ids).  I've noticed that I'm getting an extra, undocumented, line
> before each length delimiter.
>
> I connect and get the following coming down the pipe:
>
> {{{
>
> HTTP/1.1 200 OK
> Content-Type: application/json
> Transfer-Encoding: chunked
> Server: Jetty(6.1.17)
>
> 5DE
> 1496
> {"coordinates":null, ... snip ..., "id":10487365330}
>
> A52
> 2636
> {"coordinates":null, ...snip ..., "id":10487377907}
>
> 592
> 1420
> {"coordinates":null, ... snip ..., "id":10487298462}
>
>
> }}}
>
> Now, the Streaming API docs say, "Statuses are represented by a
> length, in bytes, a newline, and the status text that is exactly
> length bytes. Note that "keep-alive" newlines may be inserted before
> each length."
>
> This suggests the following read loop code (based on and equivalent to
> the way tweepy's consumer is implemented):
>
> {{{
>
> length = ''
> while True:
>    c = s.recv(1)
>    if c == '\n':
>        break
>    length += c
> length = length.strip()
> if length.isdigit():
>    length = int(length)
>    status_data = s.recv(length)
>    # do something with the data
>
> }}}
>
> However, if you look at the third status data from above, you see that
> the extra line can sometimes consist only of digits, in that case
> ``592``.  Which fairly effectively borks the consumer.
>
> Now, I can hack that read loop in quite a few ways to accommodate this
> extra data coming down the pipe.  Question is, what's the best way to
> do so?  Is this something I can rely on, e.g.: I can look for a line
> above the length delimiter?  Will it always have three chars?  Do
> statuses always have > 1000 bytes?
>
> Plus I'm wondering whether this has always been the case, or if there
> are broken consumers missing tweets out there?
>
> Thanks,
>
> James.
>
