Re: If you want Fat Pings just use Atom!

2005-09-04 Thread A. Pagaltzis

* Henri Sivonen [EMAIL PROTECTED] [2005-08-23 08:45]:
 On Aug 23, 2005, at 07:27, A. Pagaltzis wrote:
  I still dislike the idea of a “virtual closing tag” – it
  ain’t 1995 anymore, after all.
 
 You seem to be thinking that an XML parser needs to consider
 the whole document before reporting data to the app.

No, I’m not. I’m just thinking that an incomplete document can’t
be a well-formed one; I don’t think anyone here will want to deny
that.

I admit that I was *initially* considering only DOM-based
processing, for which well-formedness is a prerequisite.

However I never disputed this:

 There's nothing in the XML spec requiring the app to throw away
 the data structures it has already built when the parser
 reports the error.

My discomfort about using ill-formed documents persists, though,
even if Tim Bray’s argument has convinced me that they are the
best practical choice in this context.

The spec does REQUIRE that Atom Documents be XML, and casting
well-formedness aside seems like a rather liberal interpretation
of this demand.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Re: If you want Fat Pings just use Atom!

2005-08-24 Thread Martin Duerst


At 22:58 05/08/23, Antone Roundy wrote:

On Monday, August 22, 2005, at 09:54  PM, A. Pagaltzis wrote:
 For this application, I would do just
 that, in which case, as a bonus, non-UTF-8 streams would get to
 avoid resending the XML preamble over and over and over.

Of course, if you do that, you won't be able to keep signatures for 
entries originally published in an encoding other than the one you've chosen.


Wrong, unfortunately. XML Signature requires transcoding
to UTF-8 before signing, exactly to protect against such
problems.

If one were to want to signal an encoding change mid-stream, how might 
that work with what's been proposed thus far?


You can't change an encoding midstream in an XML entity.
You can use different encodings for different external
entities, though.

Regards,   Martin. 



RE: If you want Fat Pings just use Atom!

2005-08-23 Thread Bob Wyman

Bill de hÓra wrote:
 the problem is managing the stream buffer off the wire for a
 protocol model that has no underlying concept of an octet frame.
 I've written enough XMPP code to understand why the BEEP/MIME crowd
 might frown at it
Framing is, in fact, an exceptionally important issue. Fortunately,
HTTP offers us some framing capability in the form of chunked delivery. This
is much more lightweight than what BEEP provides, since HTTP assumes TCP/IP
as a transport layer while BEEP did not.
The HTTP chunked delivery method would be vastly superior to the
suggestions for doing things like including form-feeds or sequences of nulls
as entry boundary markers. If you accept a simple rule that says that you
will insert HTTP chunk length markers between each entry sent in a
never-ending Atom file, you get something like the feed I show below.
Simply strip out the chunk length data prior to stuffing data into your XML
parser. If an entry appears to continue beyond a chunk boundary, discard
that entry and continue by reading the next chunk.

See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1
for more information on this method. Note that RFC 2616 says: "All HTTP/1.1
applications MUST be able to receive and decode the 'chunked'
transfer-coding, ..."
Note: the chunk lengths are not correct in the following example.

GET /never-ending-feed.xml HTTP/1.1

HTTP/1.1 200 OK
Date: Fri Apr 8 17:41:11 2005
Server: FeedMesh/0.1
Connection: close
Transfer-Encoding: chunked
Content-Type: application/xml; charset=utf-8

ab
<?xml version="1.0" encoding="utf-8"?>
<feed ...>
...

a8
<entry>
...
</entry>

93
<entry>
...
</entry>

And so forth until finally you get a </feed>, the connection closes, or you
close the connection.

This is simple, requires no new specifications and provides for robust error
recovery in that broken entries can be easily detected and discarded.
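
A minimal sketch of a consumer for this scheme, in Python with illustrative
names (read_chunks and entries_from_stream are hypothetical, not an existing
API): it reads the raw chunk framing off a socket file object, since an
ordinary HTTP client library would decode the chunked markers away before
they could serve as entry boundaries.

import xml.etree.ElementTree as ET

def read_chunks(sock_file):
    # Yield the payload of each HTTP/1.1 chunk until a zero-size chunk
    # or the end of the stream.
    while True:
        size_line = sock_file.readline().strip()
        if not size_line:
            return
        size = int(size_line.split(b";")[0], 16)    # chunk-size [; extensions]
        if size == 0:
            return
        payload = sock_file.read(size)
        sock_file.read(2)                           # discard trailing CRLF
        yield payload

def entries_from_stream(sock_file):
    # Treat each chunk as one candidate entry, per the rule above; the feed
    # preamble and any broken entry simply fail to parse and are discarded.
    for chunk in read_chunks(sock_file):
        try:
            elem = ET.fromstring(chunk)
        except ET.ParseError:
            continue
        if elem.tag.endswith("entry"):
            yield elem

After sending the GET request, something like
entries_from_stream(sock.makefile("rb")) would then yield one parsed entry
per chunk.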

bob wyman






Re: If you want Fat Pings just use Atom!

2005-08-23 Thread Henri Sivonen


On Aug 23, 2005, at 07:27, A. Pagaltzis wrote:


I still dislike the idea of a “virtual closing tag” – it ain’t
1995 anymore, after all.


You seem to be thinking that an XML parser needs to consider the whole 
document before reporting data to the app. This is not the case.


If you have a document that lacks the end tag, the fatal error is hit 
when the data stream ends. All the subtrees rooted at children of the 
document element will be just fine. Try it with any streaming SAX 
parser.


In this case, the data stream ends only if either party decides to 
close the connection. The parser will never know how long the document 
would have been. From the parser's point of view it looks like a broken 
stream while reading a finite doc. There's nothing in the XML spec 
requiring the app to throw away the data structures it has already 
built when the parser reports the error.
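
As a minimal sketch of that behaviour (Python's xml.sax, illustrative handler
names, not a definitive implementation): the handler processes each complete
entry subtree as its end tag is reported, and simply swallows the fatal error
that arrives when the unterminated stream breaks off.

import xml.sax

class EntryHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.entries_seen = 0

    def startElement(self, name, attrs):
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1
        # Each entry end tag directly under the open feed element closes a
        # complete, usable subtree.
        if self.depth == 1 and name.rsplit(":", 1)[-1] == "entry":
            self.entries_seen += 1

def consume(stream):
    handler = EntryHandler()
    try:
        xml.sax.parse(stream, handler)    # raises when the stream breaks off
    except xml.sax.SAXParseException:
        pass                              # nothing already reported is lost
    return handler.entries_seen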


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/



Re: If you want Fat Pings just use Atom!

2005-08-23 Thread Bill de hÓra

Tim Bray wrote:
 
 On Aug 22, 2005, at 7:26 AM, Joe Gregorio wrote:
 
 Essentially, LiveJournal is making this data available to anybody who
 wishes to access it, without any need to register or to invent a 
 unique API.


 Ahh, I had thought this was a more dedicated ping traffic stream. The
 never ending Atom document makes much more sense now.
 
 
 It's got another advantage.  You connect and ask for the feed.  You get
 
 <feed xmlns="http://www.w3.org/2005/Atom">
  ... goes on forever 
 
 and none of the entry documents need to redeclare the Atom namespace, 
 which saves quite a few bytes after the first hundred thousand or so 
 entries. -Tim

Any value in using this to exercise the PaceBatch? If it dealt with the
use-case in terms of bulk transfer that would make it more compelling imo.

cheers
Bill



Re: If you want Fat Pings just use Atom!

2005-08-23 Thread Bjoern Hoehrmann

* Tim Bray wrote:
 That's a bit misleading, a fatal error just means that the XML
 processor must report the error to the application and that the
 processor is not required by the XML specification to continue
 processing; doing so is however an optional feature and further
 processing would be implementation-defined. So this scenario is
 unconstrained by the XML specifications.

No.  See, http://www.w3.org/TR/REC-xml/#sec-terminology, under fatal  
error. -Tim

Yes, exactly what I wrote...
-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



Re: If you want Fat Pings just use Atom!

2005-08-23 Thread Antone Roundy


On Monday, August 22, 2005, at 09:54  PM, A. Pagaltzis wrote:

* Martin Duerst [EMAIL PROTECTED] [2005-08-23 05:10]:

Well, modulo character encoding issues, that is. An FF will
look differently in UTF-16 than in ASCII-based encodings.

Depends on whether you specify a single encoding for all entries
at the HTTP level or not. For this application, I would do just
that, in which case, as a bonus, non-UTF-8 streams would get to
avoid resending the XML preamble over and over and over.


Of course, if you do that, you won't be able to keep signatures for 
entries originally published in an encoding other than the one you've 
chosen.


If one were to want to signal an encoding change mid-stream, how might 
that work with what's been proposed thus far?




Re: If you want Fat Pings just use Atom!

2005-08-23 Thread Walter Underwood

--On August 23, 2005 9:40:44 AM +0300 Henri Sivonen [EMAIL PROTECTED] wrote:

 There's nothing in the XML spec requiring the app to throw away the data
 structures it has already built when the parser reports the error.

There is also nothing requiring it. It is optional. The only 
required behavior is to report the error and stop creating parsed
information. Otherwise, results are undefined according to the spec.

The spec does require that normal processing stop at the error.
The parser can make data past the error available, but it must not
continue to pass character data and information about the document's
logical structure to the application in the normal way.

This still feels like a hack to me. An unterminated document is 
not well-formed, and is not XML or Atom. Doing this should require
another RFC that says, "we didn't really mean that it had to be XML."

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: If you want Fat Pings just use Atom!

2005-08-23 Thread Graham


On 23 Aug 2005, at 2:57 pm, Bjoern Hoehrmann wrote:


No.  See, http://www.w3.org/TR/REC-xml/#sec-terminology, under fatal
error. -Tim



Yes, exactly what I wrote...


the XML processor must report the error to the application and that the
processor is not required by the XML specification to continue
processing; doing so is however an optional feature and further
processing would be implementation-defined

vs

Once a fatal error is detected, however, the processor must not  
continue normal processing (i.e., it must not continue to pass  
character data and information about the document's logical structure  
to the application in the normal way).


Graham



Re: If you want Fat Pings just use Atom!

2005-08-23 Thread Bjoern Hoehrmann

* Graham wrote:
the XML processor must report the error to the application and that the
processor is not required by the XML specification to continue
processing; doing so is however an optional feature and further
processing would be implementation-defined

vs

Once a fatal error is detected, however, the processor must not  
continue normal processing (i.e., it must not continue to pass  
character data and information about the document's logical structure  
to the application in the normal way).

Yes, the normal way could be, for example, to have the ill-formed flag on
each event not set, and an abnormal way would be to have the flag set.
In particular, the processor MAY make unprocessed data from the
document (with intermingled character data and markup) available to
the application and there is no constraint on what the application
may or must not do with such data.
-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



Re: If you want Fat Pings just use Atom!

2005-08-23 Thread Tim Bray


On Aug 23, 2005, at 6:57 AM, Bjoern Hoehrmann wrote:


That's a bit misleading, a fatal error just means that the XML
processor must report the error to the application and that the
processor is not required by the XML specification to continue
processing; doing so is however an optional feature...



No.  See, http://www.w3.org/TR/REC-xml/#sec-terminology, under fatal
error. -Tim



Yes, exactly what I wrote...


No.  The specification clearly *forbids* proceeding with normal  
processing.  If you get a busted element in a feed, you'd better  
close that connection and open another one. -Tim




Re: If you want Fat Pings just use Atom!

2005-08-23 Thread Tim Bray


On Aug 22, 2005, at 9:27 PM, A. Pagaltzis wrote:


It's got another advantage.  You connect and ask for the feed.
You get

<feed xmlns="http://www.w3.org/2005/Atom">
 ... goes on forever 

and none of the entry documents need to redeclare the Atom
namespace, which saves quite a few bytes after the first
hundred thousand or so entries. -Tim


That’s the first really solid pro-single-doc argument I see…


There's another.  You don't have to create a new XML parser for each  
of the entries.  In most programming environments, that's a time- 
saver. -Tim





Re: If you want Fat Pings just use Atom!

2005-08-23 Thread Tim Bray


On Aug 22, 2005, at 9:56 PM, Bjoern Hoehrmann wrote:


If you encounter a busted tag on the Nth entry, per the XML spec
that's a fatal error and you can't process any more.


That's a bit misleading, a fatal error just means that the XML
processor must report the error to the application and that the
processor is not required by the XML specification to continue
processing; doing so is however an optional feature and further
processing would be implementation-defined. So this scenario is
unconstrained by the XML specifications.


No.  See, http://www.w3.org/TR/REC-xml/#sec-terminology, under fatal  
error. -Tim




Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Bill de hÓra

Bob Wyman wrote:
 Aristotle Pagaltzis wrote:
 
I wonder how you would make sure that the document is
well-formed. Since the stream never actually ends and there
is no way for a client to signal an intent to close the connection,
the <feed> at the top would never actually be accompanied by a
</feed> at the bottom.
 
   This is a problem which has become well understood in the use and
 implementation of the XMPP/Jabber protocols which are based on streaming
 XML. 

It is: every XMPP server has a hack to deal with it ;)


 Basically, what you do is consider the open tag to have a virtual
 closure and use it primarily as a carrier of stream metadata. In XMPP
 terminology, your code works at picking stanzas out of the stream that can
 be parsed successfully or unsuccessfully on their own. In an Atom stream,
 the processor would consider each atom:entry to be a parseable atomic unit.

How XMPP does that opening stanza thing and then proceeds to not close
it is not a design to be emulated (imvho). The opening stanza should
arrive and be closed.


If you accept that the stream can never be a complete
well-formed document, is there any reason not to simply send a
stream of concatenated Atom Entry Documents?
That would seem like the absolute simplest solution.
 
   You could certainly do that, however, you will inevitably want to
 pass across some stream oriented metadata and you'll eventually realize that
 much of it is stuff that you can map into an Atom Feed. (i.e. created
 date, unique stream id, stream title, etc.). Since we're all in the process
 of learning how to deal with atom:feed elements anyway, why not just reuse
 what we've got instead of inventing something new.

In the Atom case, I don't see why you need to keep the atom:feed open.
Just send it once and then send entries in the raw.


   A rather nice side effect of forming the stream as an atom feed is
 the simple fact that a log of the stream can be written to disk as a
 well-formed Atom file. Thus, the same tools that you usually use to parse
 Atom files can be used to parse the log of the stream. It is nice to be able
 to reuse tools in this way... (Note: At PubSub, the atom files that we serve
 to people are, in essence, just slightly stripped logs of the proto-Atom
 over XMPP streams that they would have received if they had been listening
 with that protocol. In our clients we can use the same parser for the stream
 as we do for atom files. It works out nicely and elegantly.)

For high-load scenarios using Atom/XMPP, you can use the log you're talking
about as a poor man's message queue to decouple the Atom processing from the
XMPP stream handling.

cheers
Bill



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Joe Gregorio

On 8/22/05, Sam Ruby [EMAIL PROTECTED] wrote:
 Joe Gregorio wrote:
  Why not POST the Atom Entry, ala the Atom Publishing Protocol?
 
 Essentially, LiveJournal is making this data available to anybody who
 wishes to access it, without any need to register or to invent a unique API.

Ahh, I had thought this was a more dedicated ping traffic stream. The 
never ending Atom document makes much more sense now.

  Thanks,
   -joe

-- 
Joe Gregorio    http://bitworking.org



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Walter Underwood

--On August 22, 2005 12:36:17 AM -0400 Sam Ruby [EMAIL PROTECTED] wrote:

 With a HTTP client library and SAX, the absolute simplest solution is
 what Bob is describing: a single document that never completes.

Except that an endless document can't be legal XML, because XML requires
the root element to balance. An endless document never closes it. So, the
endless document cannot be legal Atom. Worse, there is no chance for error
recovery. One error, and the rest of the stream might not be parsable.

So, it is simple, but busted.

The standard trick here is to use a sequence of small docs, separated
by ASCII form-feed characters. That character is not legal within an
XML document, so it allows the stream to resynchronize on that character.
Besides, form-feed actually has almost the right semantics -- start a
new page.
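
A minimal sketch of a reader for this trick (Python, illustrative names, a
sketch rather than the implementation): split the byte stream on U+000C and
hand each piece to an ordinary XML parser, resynchronizing at the next form
feed whenever a piece is broken.

import xml.etree.ElementTree as ET

FF = b"\x0c"

def ff_separated_documents(stream, bufsize=8192):
    # Yield each form-feed-delimited blob from a byte stream.
    buf = b""
    while True:
        data = stream.read(bufsize)
        if not data:
            break
        buf += data
        while FF in buf:
            doc, buf = buf.split(FF, 1)
            if doc.strip():
                yield doc
    if buf.strip():
        yield buf          # trailing, possibly incomplete, document

def parse_stream(stream):
    for doc in ff_separated_documents(stream):
        try:
            yield ET.fromstring(doc)
        except ET.ParseError:
            continue       # broken document: resync at the next form feed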

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread A. Pagaltzis

* Walter Underwood [EMAIL PROTECTED] [2005-08-22 18:35]:
 The standard trick here is to use a sequence of small docs,
 separated by ASCII form-feed characters. That character is not
 legal within an XML document, so it allows the stream to
 resynchronize on that character.

Ooh! I like this – simple and very clever.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Bill de hÓra

Walter Underwood wrote:

 The standard trick here is to use a sequence of small docs, separated
 by ASCII form-feed characters. That character is not legal within an
 XML document, so it allows the stream to resyncronize on that character.
 Besides, form-feed actually has almost the right semantics -- start a
 new page.

In XMPP, you can reset on seeing </atom:entry>, assuming you don't have
atom:feed left hanging. You probably won't need atom:feed there anyway -
feeds are very much artefacts of going over HTTP.

cheers
Bill




Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Joe Gregorio

On 8/22/05, James M Snell [EMAIL PROTECTED] wrote:
 +1.. this seems a very elegant solution.

+1.

Indeed, both solutions, the never-ending feed and the FF-separated
entries, have their advantages. The FF-separated stream
has the advantage of being able to synchronize mid-stream. The 
never-ending feed has the advantage that you only initialize
a SAX parser instance once. 

Interestingly enough the FF separated entries method would also work 
when storing a large quantity of entries in a single flat file where
appending an entry needs to be fast.

   -joe

-- 
Joe Gregorio    http://bitworking.org



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Mark Nottingham


Yep; an existence proof is server push, which is very similar (but  
not XML-based):

  http://wp.netscape.com/assist/net_sites/pushpull.html


On 21/08/2005, at 9:36 PM, Sam Ruby wrote:



A. Pagaltzis wrote:


* Bob Wyman [EMAIL PROTECTED] [2005-08-22 01:05]:



What do you think? Is there any conceptual problem with
streaming basic Atom over TCP/IP, HTTP continuous sessions
(probably using chunked content) etc.?



I wonder how you would make sure that the document is
well-formed. Since the stream never actually ends and there is no
way for a client to signal an intent to close the connection, the
<feed> at the top would never actually be accompanied by a
</feed> at the bottom.

If you accept that the stream can never be a complete well-formed
document, is there any reason not to simply send a stream of
concatenated Atom Entry Documents?

That would seem like the absolute simplest solution.



I think the keyword in the above is "complete".

SAX is a popular API for dealing with streaming XML (and there are a
number of pull parsing APIs too).  It makes individual elements
available to your application as they are read.  If at any point, the
SAX parser determines that your feed is not well formed, it throws an
error at that point.

With a HTTP client library and SAX, the absolute simplest solution is
what Bob is describing: a single document that never completes.

Note that if your application were to discard all the data it receives
before it encounters the first entry, the stream from there on out would
be identical.

- Sam Ruby






--
Mark Nottingham   Principal Technologist
Office of the CTO   BEA Systems



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Mark Nottingham


Just as a data point, this should become less of a problem as event-loop
based HTTP implementations become more popular; with them, the number of
connections you can hold open is only practically limited by available
memory (to keep fairly small amounts of connection-specific state). This
technique can allow tens to hundreds of thousands of concurrent connections,
leading to multi-hour HTTP connections (if both sides want them).



On 21/08/2005, at 8:08 PM, Bob Wyman wrote:


The
problem is that HTTP connections, given the current infrastructure and
standard components, are very hard to keep open permanently or for a very
long period of time. One is often considered lucky if you can keep an HTTP
connection open for 5 minutes without having to re-initialize...



--
Mark Nottingham   Principal Technologist
Office of the CTO   BEA Systems



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Walter Underwood

--On August 22, 2005 2:01:45 PM -0400 Joe Gregorio [EMAIL PROTECTED] wrote:

 Interestingly enough the FF separated entries method would also work 
 when storing a large quantity of entries in a single flat file where
 appending an entry needs to be fast.

The original application was logfiles in XML.

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Bill de hÓra

A. Pagaltzis wrote:
 * Bill de hÓra [EMAIL PROTECTED] [2005-08-22 19:00]:
 
In XMPP, you can reset on seeing </atom:entry>,
 
 
 Really?

Really. And I've even seen the two test cases below...


 
 <![CDATA[
 </entry>
 ]]>
 
 Or maybe
 
 <![CDATA[
 <![CDATA[
 </entry>
 ]]>

... which are cute. I forget I didn't have an opening tag, or a buffer
for the entry. Oh wait, I do ;)


 These are probably the only exceptions (I might be missing some,
 though), but they’re enough to demonstrate that you will need to
 write a parser, even if only a relatively simple one.

You already have an XML parser, that's not the problem; the problem is
managing the stream buffer off the wire for a protocol model that has no
underlying concept of an octet frame. I've written enough XMPP code to
understand why the BEEP/MIME crowd might frown at it - manipulating
infosets right on top of sockets makes for funky code and isn't my notion
of what bits on the wire should be ;)

[Incidentally this is a non-problem for APP because we're piggy backing
on HTTP octets...]


 Using a character which is illegal in XML and can never be part
 of a well-formed document as a separator is a clever way to avoid
 having to do *any* parsing *whatsoever*. You just scan the stream
 for the character and start over when you see it, end of story.
 No need to keep state or look for patterns or anything else.

I see all the +1s, but don't understand why reinventing multi-part MIME
with formfeeds as a special case for Atom is more attractive than an
infinite list of entries whose closing atom:feed tag never arrives.
Still, I think this discussion is valuable: it speaks volumes on the use
of XML for wire protocols, especially for the single document school of
thought.

cheers
Bill



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Graham


Just out of interest, how well does any of this work with caches and  
proxies?


Graham



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Tim Bray


On Aug 22, 2005, at 7:26 AM, Joe Gregorio wrote:


Essentially, LiveJournal is making this data available to anybody who
wishes to access it, without any need to register or to invent a  
unique API.


Ahh, I had thought this was a more dedicated ping traffic stream. The
never ending Atom document makes much more sense now.


It's got another advantage.  You connect and ask for the feed.  You get

<feed xmlns="http://www.w3.org/2005/Atom">
 ... goes on forever 

and none of the entry documents need to redeclare the Atom namespace,  
which saves quite a few bytes after the first hundred thousand or so  
entries. -Tim




Re: If you want Fat Pings just use Atom!

2005-08-22 Thread James M Snell


Justin Fletcher wrote:

I'm a little confused by all this discussion of never-ending XML 
documents, mainly because my understanding is that without the 
well-formedness checks the content might as well be free form, and the 
elements within the document may rely on parts that have 'yet to arrive'.


Taking as an example the atom:author element, with the above example 
of a never-ending document any atom:entry elements which exist would 
be quite valid in containing no atom:author element because they're 
not required to have one if the atom:feed element contains such an 
author. And because the feed has not yet finished the reading 
application cannot know that the document is invalid (or not - the 
atom:author element may arrive at some point in the future).


If an atom:feed contains an author element, it is required by the spec 
to appear before any atom:entry elements in the atom:feed. 
(http://www.atompub.org/2005/08/17/draft-ietf-atompub-format-11.html#rfc.section.4.1.1) 
so this isn't a problem.  Further, the spec really does not define any 
metadata that would be dependent on parts yet-to-arrive.  There could be 
a challenge with same-document links that use fragment identifiers, but 
that's about it.


- James



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Joe Gregorio

On 8/22/05, Justin Fletcher [EMAIL PROTECTED] wrote:
 
 On Mon, 22 Aug 2005, Tim Bray wrote:
 
  On Aug 22, 2005, at 7:26 AM, Joe Gregorio wrote:
 
  Essentially, LiveJournal is making this data available to anybody who
  wishes to access it, without any need to register or to invent a unique
  API.
 
  Ahh, I had thought this was a more dedicated ping traffic stream. The
  never ending Atom document makes much more sense now.
 
  It's got another advantage.  You connect and ask for the feed.  You get
 
  <feed xmlns="http://www.w3.org/2005/Atom">
  ... goes on forever 
 
  and none of the entry documents need to redeclare the Atom namespace, which
  saves quite a few bytes after the first hundred thousand or so entries. -Tim
 
 I'm a little confused by all this discussion of never-ending XML
 documents, mainly because my understanding is that without the
 well-formedness checks the content might as well be free form, and the
 elements within the document may rely on parts that have 'yet to arrive'.

Not at all:

The atom:feed element is the document (i.e., top-level) element
of an Atom Feed Document, acting as a container for metadata and data
associated with the feed. Its element children consist of metadata
elements *followed by* zero or more atom:entry child elements.

http://atompub.org/2005/07/11/draft-ietf-atompub-format-10.html#rfc.section.4.1.1

   -joe

-- 
Joe Gregorio    http://bitworking.org



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Martin Duerst


At 02:15 05/08/23, A. Pagaltzis wrote:

Using a character which is illegal in XML and can never be part
of a well-formed document as a separator is a clever way to avoid
having to do *any* parsing *whatsoever*. You just scan the stream
for the character and start over when you see it, end of story.
No need to keep state or look for patterns or anything else.

Well, modulo character encoding issues, that is. An FF will
look different in UTF-16 than in ASCII-based encodings.

Regards,   Martin. 



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Walter Underwood

--On August 23, 2005 12:01:11 PM +0900 Martin Duerst [EMAIL PROTECTED] wrote:
 
 Well, modulo character encoding issues, that is. An FF will
 look differently in UTF-16 than in ASCII-based encodings.

Fine. Use two NULs. That is either one illegal UTF-16 (BE or LE) character
or two illegal characters in ASCII or UTF-8.
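
To make the byte-level point concrete (a quick illustration in Python):

# A form feed is one byte in UTF-8 but two bytes (one of them NUL) in UTF-16:
"\x0c".encode("utf-8")       # b'\x0c'
"\x0c".encode("utf-16-le")   # b'\x0c\x00'
"\x0c".encode("utf-16-be")   # b'\x00\x0c'
# Two NUL bytes decode to one U+0000 in UTF-16 (either byte order) or two
# U+0000s in ASCII/UTF-8 -- illegal in XML either way -- so b"\x00\x00" works
# as a separator regardless of the per-document encoding.
DELIMITER = b"\x00\x00"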

Of course, a transport level multi-payload system would be preferred.

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread A. Pagaltzis

* Bill de hÓra [EMAIL PROTECTED] [2005-08-22 21:45]:
 ... which are cute. I forget I didn't have an opening tag, or a
 buffer for the entry. Oh wait, I do ;)

Sure. I didn’t say it was impossible. I was just saying that you
have to do more than scan the stream for the sequence “</entry>”.

 You already have an XML parser, that's not the problem;

The point is that if you get broken XML at some point, the parser
may be left in a state where it never closes an entry. Using an
illegal-in-XML character allows resynching without reconnecting
to the stream.

 [Incidentally this is a non-problem for APP because we're piggy
 backing on HTTP octets...]

Which Fat Pings are doing as well…

 I see all the +1s, but don't understand why reinventing
 multi-part MIME with formfeeds as a special case for Atom is
 more attractive that an infinite list of entries whose closing
 atom:feed tag never arrives.

Purely for the resynchronization aspect. If you believe that it’s
not an issue, then sure, there’s a lot less difference between
the two options.

* Martin Duerst [EMAIL PROTECTED] [2005-08-23 05:10]:
 Well, modulo character encoding issues, that is. An FF will
 look differently in UTF-16 than in ASCII-based encodings.

Depends on whether you specify a single encoding for all entries
at the HTTP level or not. For this application, I would do just
that, in which case, as a bonus, non-UTF-8 streams would get to
avoid resending the XML preamble over and over and over.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread A. Pagaltzis

* Tim Bray [EMAIL PROTECTED] [2005-08-23 02:35]:
 It's got another advantage.  You connect and ask for the feed.
 You get
 
  <feed xmlns="http://www.w3.org/2005/Atom">
  ... goes on forever 
 
 and none of the entry documents need to redeclare the Atom
 namespace, which saves quite a few bytes after the first
 hundred thousand or so entries. -Tim

Hmm.

That’s the first really solid pro-single-doc argument I see…

And one I don’t think can be argued about, /realistically/.

• The ^L separation makes a lot of sense for logfiles, but less
  so for TCP-carried connections which ensure a basic level of
  data integrity. (With logfiles on disk all bets are off.)

• “Just discard the last incomplete document” logic can be
  implemented without requiring separate documents by processing
  and flushing the stack so far whenever an atom:entry
  start-element event is seen.

I still dislike the idea of a “virtual closing tag” – it ain’t
1995 anymore, after all. And I still think separate documents
would be cleaner and would require a bit less special case logic
to process.

But this *is* an application where saving bytes is a serious
enough concern, and I suppose that so long as the only thing
missing is exactly one atom:feed end-element event, the special
case is simple enough that I guess it’s acceptable.

Ugh. Reality.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Justin Fletcher


On Mon, 22 Aug 2005, James M Snell wrote:


Justin Fletcher wrote:

I'm a little confused by all this discussion of never-ending XML documents, 
mainly because my understanding is that without the well-formedness checks 
the content might as well be free form, and the elements within the 
document may rely on parts that have 'yet to arrive'.


Taking as an example the atom:author element, with the above example of a 
never-ending document any atom:entry elements which exist would be quite 
valid in containing no atom:author element because they're not required to 
have one if the atom:feed element contains such an author. And because the 
feed has not yet finished the reading application cannot know that the 
document is invalid (or not - the atom:author element may arrive at some 
point in the future).


If an atom:feed contains an author element, it is required by the spec to 
appear before any atom:entry elements in the atom:feed. 
(http://www.atompub.org/2005/08/17/draft-ietf-atompub-format-11.html#rfc.section.4.1.1) 
so this isn't a problem.  Further, the spec really does not define any 
metadata that would be dependent on parts yet-to-arrive.  There could be a 
challenge with same-document links that use fragment identifiers, but that's 
about it.


Ah; I misread that in the specification. Thanks. It's just the lack of 
well-formedness that is an issue in my head then.


--
Gerph http://gerph.org/
... Things get better second time around.



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Bjoern Hoehrmann

* Tim Bray wrote:
If you encounter a busted tag on the Nth entry, per the XML spec  
that's a fatal error and you can't process any more.

That's a bit misleading, a fatal error just means that the XML
processor must report the error to the application and that the
processor is not required by the XML specification to continue
processing; doing so is however an optional feature and further
processing would be implementation-defined. So this scenario is
unconstrained by the XML specifications.
-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



RE: If you want Fat Pings just use Atom!

2005-08-21 Thread Bob Wyman

Joe Gregorio wrote:
 Why not POST the Atom Entry, ala the Atom Publishing Protocol?
This would be an excellent idea if what we were talking about was a
low volume site. However, a site like LiveJournal generates hundreds of
updates per minute. Right now, on a Sunday evening, they are updating at the
rate of 349 entries per minute. During peak periods, they generate much more
traffic. Generating 349 POST messages per minute to perhaps 10 or 15
different services means that they would be pumping out thousands of these
things per minute. It just isn't reasonable.
Using an open TCP/IP socket to carry a stream of Atom Entries
results in much greater efficiencies with much reduced bandwidth and
processing requirements. 
At PubSub, we've been experimentally providing Fat Ping versions
of our FeedMesh feeds to a small group of testers. We publish messages at a
rate much higher than LiveJournal does -- since we publish all of
LiveJournal's content plus everyone else's. We couldn't even consider Fat
Pings if we had to create and tear down a TCP/IP-HTTP session to post each
individual entry.
There are many situations in which HTTP would work fine for Fat
Pings. However, for high-volume sites, it just isn't reasonable. The key, to
me, is that we establish the expectation that the Atom format is adequate to
the task (whatever the transport) and leave the transport selection as a
context dependent decision. Thus, some server/client pairs would exchange
streams of Atom entries using the POST based Atom Publishing Protocol while
others would exchange essentially the same streams using a more efficient
transport mechanism such as streaming raw sockets or even Atom over XMPP.

bob wyman




RE: If you want Fat Pings just use Atom!

2005-08-21 Thread Bob Wyman

Aristotle Pagaltzis wrote:
 I wonder how you would make sure that the document is
 well-formed. Since the stream never actually ends and there
 is no way for a client to signal an intent to close the connection,
 the <feed> at the top would never actually be accompanied by a
 </feed> at the bottom.
This is a problem which has become well understood in the use and
implementation of the XMPP/Jabber protocols which are based on streaming
XML. Basically, what you do is consider the open tag to have a virtual
closure and use it primarily as a carrier of stream metadata. In XMPP
terminology, your code works at picking stanzas out of the stream that can
be parsed successfully or unsuccessfully on their own. In an Atom stream,
the processor would consider each atom:entry to be a parseable atomic unit.
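
A minimal sketch of that processing model (Python's incremental pull parser,
illustrative names; an approximation, not PubSub's actual code): the open
feed start tag is consumed as stream metadata, and each atom:entry is handed
to the application as soon as its end tag arrives, so the closing feed tag
that never comes is irrelevant.

import xml.etree.ElementTree as ET

ATOM_ENTRY = "{http://www.w3.org/2005/Atom}entry"

def stream_entries(chunks):
    # chunks: an iterable of byte strings read off the open connection.
    parser = ET.XMLPullParser(events=("start", "end"))
    feed = None
    for data in chunks:
        parser.feed(data)
        for event, elem in parser.read_events():
            if event == "start" and feed is None:
                feed = elem                 # the virtually closed feed element
            elif event == "end" and elem.tag == ATOM_ENTRY:
                yield elem                  # a complete, parseable stanza
                if feed is not None:
                    feed.remove(elem)       # keep memory bounded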

 If you accept that the stream can never be a complete
 well-formed document, is there any reason not to simply send a
 stream of concatenated Atom Entry Documents?
 That would seem like the absolute simplest solution.
You could certainly do that, however, you will inevitably want to
pass across some stream oriented metadata and you'll eventually realize that
much of it is stuff that you can map into an Atom Feed. (i.e. created
date, unique stream id, stream title, etc.). Since we're all in the process
of learning how to deal with atom:feed elements anyway, why not just reuse
what we've got instead of inventing something new.
A rather nice side effect of forming the stream as an atom feed is
the simple fact that a log of the stream can be written to disk as a
well-formed Atom file. Thus, the same tools that you usually use to parse
Atom files can be used to parse the log of the stream. It is nice to be able
to reuse tools in this way... (Note: At PubSub, the atom files that we serve
to people are, in essence, just slightly stripped logs of the proto-Atom
over XMPP streams that they would have received if they had been listening
with that protocol. In our clients we can use the same parser for the stream
as we do for atom files. It works out nicely and elegantly.)

bob wyman




Re: If you want Fat Pings just use Atom!

2005-08-21 Thread Joe Gregorio

On 8/21/05, Bob Wyman [EMAIL PROTECTED] wrote:
 Joe Gregorio wrote:
  Why not POST the Atom Entry, ala the Atom Publishing Protocol?
 This would be an excellent idea if what we were talking about was a
 low volume site. However, a site like LiveJournal generates hundreds of
 updates per minute. Right now, on a Sunday evening, they are updating at the
 rate of 349 entries per minute. During peak periods, they generate much more
 traffic. Generating 349 POST messages per minute to perhaps 10 or 15
 different services means that they would be pumping out thousands of these
 things per minute. It just isn't reasonable.
 Using an open TCP/IP socket to carry a stream of Atom Entries
 results in much greater efficiencies with much reduced bandwidth and
 processing requirements.

Why can't you keep that socket open? That is the default 
behavior for HTTP 1.1.

   -joe

-- 
Joe Gregorio    http://bitworking.org



Re: If you want Fat Pings just use Atom!

2005-08-21 Thread James M Snell


Bob Wyman wrote:


Joe Gregorio wrote:
 


Why not POST the Atom Entry, ala the Atom Publishing Protocol?
   


This would be an excellent idea if what we were talking about was a
low volume site. However, a site like LiveJournal generates hundreds of
updates per minute. Right now, on a Sunday evening, they are updating at the
rate of 349 entries per minute. During peak periods, they generate much more
traffic. Generating 349 POST messages per minute to perhaps 10 or 15
different services means that they would be pumping out thousands of these
things per minute. It just isn't reasonable.
Using an open TCP/IP socket to carry a stream of Atom Entries
results in much greater efficiencies with much reduced bandwidth and
processing requirements. 
	At PubSub, we've been experimentally providing Fat Ping versions

of our FeedMesh feeds to a small group of testers. We publish messages at a
rate much higher than LiveJournal does -- since we publish all of
LiveJournal's content plus everyone else's. We couldn't even consider Fat
Pings if we had to create and tear down a TCP/IP-HTTP session to post each
individual entry.
There are many situations in which HTTP would work fine for Fat
Pings. However, for high-volume sites, it just isn't reasonable. The key, to
me, is that we establish the expectation that the Atom format is adequate to
the task (whatever the transport) and leave the transport selection as a
context dependent decision. Thus, some server/client pairs would exchange
streams of Atom entries using the POST based Atom Publishing Protocol while
others would exchange essentially the same streams using a more efficient
transport mechanism such as streaming raw sockets or even Atom over XMPP.

 

First off, as a general FYI, take a look at PaceSimpleNotify... the 
current version uses basic HTTP POSTs to send one or more individual 
atom:entry's to a remote endpoint.  I'm hoping that the folks on the 
protocol list will pick this up in discussion in the near future as it 
is something that I definitely want to see incorporated. 

Secondly, I believe that the format is more than adequate to support 
this kind of mechanism.  I do not believe that Brad's atomStream 
container is necessary.  Either just stream a bunch of atom:feed or 
atom:entry elements directly over an open TCP/IP or a persistent 
(keep-alive) HTTP connection.  By no means would I ever suggest a new 
HTTP connection for each ping.


- James



Re: If you want Fat Pings just use Atom!

2005-08-21 Thread 'A. Pagaltzis'

* Bob Wyman [EMAIL PROTECTED] [2005-08-22 04:00]:
 Basically, what you do is consider the open tag to have a
 virtual closure and use it primarily as a carrier of stream
 metadata.

Shades of SGML…

 You could certainly do that, however, you will inevitably want
 to pass across some stream oriented metadata and you'll
 eventually realize that much of it is stuff that you can map
 into an Atom Feed

OT1H, you could put this data in the stream as an empty but
complete Atom Feed Document served as the first complete entity
in the feed –

 A rather nice side effect of forming the stream as an atom feed
 is the simple fact that a log of the stream can be written to
 disk as a well-formed Atom file.

– but OTOH this is a pretty good point.

Of course, the question is whether it is really any more work to
receive an empty Atom Feed Document + X * Atom Entry Documents
and to insert the Entry Documents into the Feed Document for
storage.

Note that in the case of prepending an empty Atom Feed Document,
all fully received Documents are well-formed entities of their
own, so you don’t need a recovering XML parser that can implement
the “virtual closing element” semantic – all entities can be
processed with any run-of-the-mill XML parser.
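
A minimal sketch of that arrangement (Python, illustrative names; only the
parse-and-store logic, with the framing of the individual documents left
abstract): the first complete document is the empty feed carrying the stream
metadata, and writing a log to disk just means appending the subsequent Entry
Documents back into it.

import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)

def log_stream(documents, out_path):
    # documents: an iterator of byte strings, each one a complete,
    # well-formed document (however they were framed on the wire).
    docs = iter(documents)
    feed = ET.fromstring(next(docs))        # the empty Atom Feed Document
    for doc in docs:
        feed.append(ET.fromstring(doc))     # each Entry Document stands alone
    ET.ElementTree(feed).write(out_path, encoding="utf-8",
                               xml_declaration=True)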

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



RE: If you want Fat Pings just use Atom!

2005-08-21 Thread Bob Wyman

Joe Gregorio wrote:
 Why can't you keep that socket open, that is the default
 behavior for HTTP 1.1.
In some applications, HTTP 1.1 will work just fine. However, HTTP
doesn't add much to the high volume case. It also costs a great deal. For
instance, every POST requires a response. This means that you're moving from
a pure streaming case to an endless sequence of application level ACK/NAKs
that are simply replicating what TCP/IP already does for you. Also, the HTTP
headers that would be required simply don't contribute anything useful. The
bandwidth overhead of the additional headers as well as the bandwidth,
processing and timing problems related to generating responses begins to
look pretty nasty when you're moving at hundreds of items per minute or
second...
One really good reason for using HTTP would be to exploit the
existing HTTP infrastructure including proxies, caches, application-level
firewalls, etc. However, I'm aware of no such infrastructure components that
are designed to handle well permanently open high-bandwidth connections. The
HTTP infrastructure is optimized around the normal uses of HTTP. This isn't
normal. 
One of the really irritating things about the current HTTP
infrastructure is that it is very fragile. This is a problem that has
caused unlimited headaches for the folk trying to do notification over
HTTP (mod-pubsub, KnowNow, various HTTP-based IM/chat systems, etc.). The
problem is that HTTP connections, given the current infrastructure and
standard components, are very hard to keep open permanently or for a very
long period of time. One is often considered lucky if you can keep an HTTP
connection open for 5 minutes without having to re-initialize... Of course,
during the period between when your connection breaks and when you get it
re-established, you're losing packets. That means that you have to have a
much more robust mechanism for recovering lost messages and that means
increased complexity, network traffic, etc. The added complexity and trouble
can be justified in some cases; however, not in all cases.
HTTP is great in some cases but not all. That's why the IETF has
defined BEEP, XMPP, SIP, SIMPLE, etc. in addition to HTTP. One protocol
model simply can't suit all needs at all times and in all contexts.
Whatever... The point here is that Atom already has defined all that
appears to be needed in order to address the Fat Ping requirement whether
you prefer individual HTTP POSTs, POSTs over HTTP 1.1 connections, XMPP, or
raw open TCP/IP sockets. That is a good thing.

bob wyman




RE: If you want Fat Pings just use Atom!

2005-08-21 Thread Bob Wyman

Aristotle Pagaltzis wrote:
 Shades of SGML.
No! No! Not that! :-)

He continues with:
 ... many good points 

Basically, there are many really easy ways that one can handle
streams of Atom entries. You could prepend an empty feed to the head of the
stream, you could use virtual end-tags, you could just send entries and
rely on the receiver to wrap them up as required, etc... But, since all of
these are really easy and none of them really gets in the way of anything
rational that I can imagine someone wanting to do, why not just default to
doing it the way it is defined in the Atom spec? In that way, we don't have
to create one more context-dependent distinction between formats. Complexity
is reduced and we can avoid having to read yet-another-specification that
looks very, very much like hundreds we've read before. If Atom provides all
we need, let's not do something else unless there is a *very* good argument
to do so.

bob wyman




Re: If you want Fat Pings just use Atom!

2005-08-21 Thread 'A. Pagaltzis'

* Bob Wyman [EMAIL PROTECTED] [2005-08-22 05:25]:
 If Atom provides all we need, lets not do something else unless
 there is a *very* good argument to do so.

I’m not inventing anything. Atom Entry Documents are part of the
spec and Atom Feed Documents may legally be empty.

And a consumer of a stream according to your proposition does not
get away without implementing some special ability to be able to
interpret it correctly anyway – you are using “plain Atom” at the
cost of requiring a specifically abled XML parser.

My proposition requires a trivial amount of extra semantics
implemented at the application logic level; yours requires extra
semantics at the protocol logic level (the protocol being XML).

I think doing this in application logic is so cheap that reaching
for the protocol logic is unwarranted.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/



Re: If you want Fat Pings just use Atom!

2005-08-21 Thread James M Snell


Bob Wyman wrote:


Basically, there are many really easy ways that one can handle
streams of Atom entries. You could prepend an empty feed to the head of the
stream, you could use virtual end-tags, you could just send entries and
rely on the receiver to wrap them up as required, etc... But, since all of
these are really easy and none of them really gets in the way of anything
rational that I can imagine someone wanting to do, why not just default to
doing it the way it is defined in the Atom spec? In that way, we don't have
to create one more context-dependent distinction between formats. Complexity
is reduced and we can avoid having to read yet-another-specification that
looks very, very much like hundreds we've read before. If Atom provides all
we need, lets not do something else unless there is a *very* good argument
to do so.

bob wyman


 

+1. The basic format gives us everything we need to enable this.  Even 
looking back over my PaceSimpleNotify proposal.. in which I introduce a 
notification element used to identify the action that has occurred on 
the element (e.g. create, update or delete), I can see that there really 
is no need to have that element. 


- James



Re: If you want Fat Pings just use Atom!

2005-08-21 Thread Sam Ruby

A. Pagaltzis wrote:
 * Bob Wyman [EMAIL PROTECTED] [2005-08-22 01:05]:
 
What do you think? Is there any conceptual problem with
streaming basic Atom over TCP/IP, HTTP continuous sessions
(probably using chunked content) etc.?
 
 I wonder how you would make sure that the document is
 well-formed. Since the stream never actually ends and there is no
 way for a client to signal an intent to close the connection, the
 <feed> at the top would never actually be accompanied by a
 </feed> at the bottom.
 
 If you accept that the stream can never be a complete well-formed
 document, is there any reason not to simply send a stream of
 concatenated Atom Entry Documents?
 
 That would seem like the absolute simplest solution.

I think the keyword in the above is "complete".

SAX is a popular API for dealing with streaming XML (and there are a
number of pull parsing APIs too).  It makes individual elements
available to your application as they are read.  If at any point, the
SAX parser determines that your feed is not well formed, it throws an
error at that point.

With a HTTP client library and SAX, the absolute simplest solution is
what Bob is describing: a single document that never completes.

Note that if your application were to discard all the data it receives
before it encounters the first entry, the stream from there on out would
be identical.

- Sam Ruby



Re: If you want Fat Pings just use Atom!

2005-08-21 Thread Sam Ruby

Joe Gregorio wrote:
 Why not POST the Atom Entry, ala the Atom Publishing Protocol?

Essentially, LiveJournal is making this data available to anybody who
wishes to access it, without any need to register or to invent a unique API.

I can, and have, accessed the LiveJournal stream from behind both a
firewall and a NAT device.  Doing so requires the client to initiate the
request.  Therefore, if you really wanted to turn this around, the
client would need to initiate a POST, and the server would need to
return the Fat Pings as the response.

I talked to Brad - in fact, I had independently made the same suggestion
that Bob did.  Brad indicated that if there were clients with different
requirements, he was amenable to accommodating each - endpoints are cheap.

- Sam Ruby



Re: If you want Fat Pings just use Atom!

2005-08-21 Thread A. Pagaltzis

* Sam Ruby [EMAIL PROTECTED] [2005-08-22 06:45]:
 SAX is a popular API for dealing with streaming XML (and there
 are a number of pull parsing APIs too).

Of course – using a DOM parser is impossible with either approach
anyway.

 With a HTTP client library and SAX, the absolute simplest
 solution is what Bob is describing: a single document that
 never completes.

That can be argued both ways, I think. The important point is
that the connection can and almost certainly will be closed in
the middle of an entry document.

With a single, endless document, the application will have to
backtrack, discarding events until it has thrown away the last
seen start-element event for the incomplete entry, then close the
feed element.

With a series of concatenated complete documents, it can simply
discard everything that belongs to the current incomplete Entry
Document. There are implicit checkpoints in the stream.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/