[twitter-dev] Re: Streaming API's XML format

John Kalucki Wed, 13 May 2009 09:31:27 -0700

The delimited=length option causes the length, in bytes, of the next
status to be placed in the stream on its own line, immediately before
that status. You should see:


...
</status>

1704
<?xml version="1.0" encoding="UTF-8"?>
<status>
...

Whereas, without the parameter, you see:
...
</status>

<?xml version="1.0" encoding="UTF-8"?>
<status>
...

The length allows you to process the stream deterministically, with no
error-prone pre-parsing and with the minimum number of reads. It
also ensures that you call your parser exactly once per status.
There's some pseudocode in the API docs that shows how this might
work:

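  while (true) {
    do {
      // skip the blank "keep-alive" lines that may precede a length
      lengthBytes = readline()
    } while (lengthBytes.length < 1)
    // read exactly that many bytes and parse them as one status
    parseMarkup(read(parseInt(lengthBytes)))
  }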
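
For anyone who wants something closer to runnable, here is a rough Python
sketch of the same loop. It is only an illustration, not code from the API
docs: the function name iter_statuses is made up, the stream is assumed to be
any binary file-like object (for example a socket wrapped with makefile("rb")),
and each status is assumed to be a complete XML document of exactly the
advertised number of bytes.

```python
import io
import xml.etree.ElementTree as ET


def iter_statuses(stream):
    """Yield one parsed <status> element per length-delimited record.

    `stream` is assumed to be a binary file-like object that supports
    readline() and read(); the name and signature are hypothetical.
    """
    while True:
        line = stream.readline()
        if not line:                 # empty read: the stream has ended
            return
        line = line.strip()
        if not line:                 # blank "keep-alive" line, skip it
            continue
        length = int(line.decode("ascii"))   # byte count of the next status
        body = b""
        while len(body) < length:    # a read may return fewer bytes than asked
            chunk = stream.read(length - len(body))
            if not chunk:            # stream ended in the middle of a status
                return
            body += chunk
        yield ET.fromstring(body)    # parser is called exactly once per status


if __name__ == "__main__":
    # Made-up sample data standing in for the real stream.
    doc = b'<?xml version="1.0" encoding="UTF-8"?>\n<status><id>1</id></status>\n'
    framed = b"\n" + str(len(doc)).encode("ascii") + b"\n" + doc
    for status in iter_statuses(io.BytesIO(framed)):
        print(status.findtext("id"))
```

The inner loop is there because a single read on a network stream is not
guaranteed to return the full length in one call.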

-John


On May 13, 9:00 am, Ianiv Schweber <ian...@gmail.com> wrote:
> I'm trying the delimited parameter but I don't see any difference in the
> feed. I find its description in the docs a little confusing:
>
> "Indicates that statuses should be delimited in the stream. Statuses are
> represented by a length, in bytes, a newline, and the status text that is
> exactly length bytes. Note that "keep-alive" newlines may be inserted before
> each length. Values: length in bytes (integer)"
>
> Makes it sound like I have to use
>
> http://stream.twitter.com/spritzer.xml?delimiter=length
>
> But what is length in this case? No matter what I use, the feed always looks
> the same.
>
> In terms of efficiency, it would seem to me that letting the parser read
> directly from one never ending XML document using a callback every time it
> parses a status is better than having to wrap the parser inside another one
> that splits the feed into documents then parsing each document individually.
>
> Ianiv Schweber
> ia...@blogaholics.ca
> Skype: ianivs
> Public Key:http://www.blogaholics.ca/ianivpubkey.asc
>
> On Wed, May 13, 2009 at 6:24 AM, John Kalucki <jkalu...@gmail.com> wrote:
>
> > Ianiv,
>
> > I'm glad you are giving the XML feed a close examination. Nearly all
> > consumers so far have been on the JSON feed.
>
> > The statuses begin their life upstream from Hosebird as these fully-
> > formed documents. Hosebird doesn't alter their contents, as it's
> > conceptualized as middleware and not a data-changing application. The
> > XML and JSON feeds are pretty symmetrical.
>
> > I've found it easiest to parse the markup by using the delimited
> > parameter to read an entire status at once, and then pass that on to
> > the XML parser. If it is considerably more efficient to do something
> > else, perhaps we can make a change.
>
> > -John
>
> > On May 12, 9:38 pm, Ianiv  Schweber <ian...@gmail.com> wrote:
> > > Currently the spritzer stream looks like:
>
> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <status> .... </status>
> > > \n
> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <status> .... </status>
> > > \n
> > > ...
>
> > > I'm wondering why it contains a stream of XML documents instead of
> > > just one never ending document with the same format as the public
> > > timeline:
>
> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <statuses>
> > > <status>...</status>
> > > <status>...</status>
> > > <status>...</status>
> > > ...
>
> > > To me it seems like it would be a lot easier just passing a stream
> > > like this to a parser. Instead, with the current stream of documents
> > > one has to look out for new prologs, split the stream at that point,
> > > parse that doc, reset the parser and continue. Not an insurmountable
> > > problem, but it seems like a lot of extra work that shouldn't really
> > > be needed.
>
> > > Just wondering what everyone else thinks.
>
> > > Ianiv Schweber
