Re: If you want Fat Pings just use Atom!
* Henri Sivonen [EMAIL PROTECTED] [2005-08-23 08:45]: On Aug 23, 2005, at 07:27, A. Pagaltzis wrote: I still dislike the idea of a “virtual closing tag” – it ain’t 1995 anymore, after all. You seem to be thinking that an XML parser needs to consider the whole document before reporting data to the app. No, I’m not. I’m just thinking that an incomplete document can’t be a well-formed one; I don’t think anyone here will want to deny that. I admit that I was *initially* considering only DOM-based processing, for which well-formedness is a prerequisite. However, I never disputed this: There's nothing in the XML spec requiring the app to throw away the data structures it has already built when the parser reports the error. My discomfort about using ill-formed documents persists, though, even if Tim Bray’s argument has convinced me that they are the best practical choice in this context. The spec does REQUIRE that Atom Documents be XML, and casting well-formedness aside seems like a rather liberal interpretation of this demand. Regards, -- Aristotle Pagaltzis // http://plasmasturm.org/
Re: If you want Fat Pings just use Atom!
At 22:58 05/08/23, Antone Roundy wrote: On Monday, August 22, 2005, at 09:54 PM, A. Pagaltzis wrote: For this application, I would do just that, in which case, as a bonus, non-UTF-8 streams would get to avoid resending the XML preamble over and over and over. Of course, if you do that, you won't be able to keep signatures for entries originally published in an encoding other than the one you've chosen. Wrong, unfortunately. XML Signature requires transcoding to UTF-8 before signing, exactly to protect against such problems. If one were to want to signal an encoding change mid-stream, how might that work with what's been proposed thus far? You can't change an encoding midstream in an XML entity. You can use different encodings for different external entities, though. Regards, Martin.
RE: If you want Fat Pings just use Atom!
Bill de hÓra wrote: the problem is managing the stream buffer off the wire for a protocol model that has no underlying concept of an octet frame. I've written enough XMPP code to understand why the BEEP/MIME crowd might frown at it Framing is, in fact, an exceptionally important issue. Fortunately, HTTP offers us some framing capability in the form of chunked delivery. This is much more lightweight than what BEEP provides, since HTTP assumes TCP/IP as a transport layer while BEEP did not. The HTTP chunked delivery method would be vastly superior to the suggestions for doing things like including form-feeds or sequences of nulls as entry boundary markers. If you accept a simple rule that says that you will insert HTTP chunk length markers between each entry sent in a never-ending Atom file, you get something like the feed I show below. Simply strip out the chunk length data prior to stuffing data into your XML parser. If an entry appears to continue beyond a chunk boundary, discard that entry and continue by reading the next chunk. See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1 for more information on this method. Note that RFC2616 says: All HTTP/1.1 applications MUST be able to receive and decode the chunked transfer-coding,... Note: the chunk lengths are not correct in the following example.

GET /never-ending-feed.xml HTTP/1.1

HTTP/1.1 200 OK
Date: Fri Apr 8 17:41:11 2005
Server: FeedMesh/0.1
Connection: close
Transfer-Encoding: chunked
Content-Type: application/xml; charset=utf-8

ab
<?xml version="1.0" encoding="utf-8"?>
<feed> ... ...
a8
<entry> ... </entry>
93
<entry> ... </entry>

And so forth until finally you get a </feed>, the connection closes, or you close the connection. This is simple, requires no new specifications and provides for robust error recovery in that broken entries can be easily detected and discarded. bob wyman
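Bob's decoding rule (strip the chunk-length markers, then hand the payload to your XML parser) can be sketched in a few lines. This is an illustrative fragment, not PubSub's code; `iter_chunks` and the sample body are invented here, and a real decoder would also handle trailers and incremental input:

```python
# Decode an HTTP/1.1 chunked body held in memory: hex size line, CRLF,
# payload, CRLF, repeated until a zero-size chunk. Illustrative only.
def iter_chunks(raw: bytes):
    """Yield the payload of each chunk in a chunked-encoded byte string."""
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        size = int(raw[pos:eol].split(b";")[0], 16)  # ignore chunk extensions
        if size == 0:
            return  # last-chunk marker
        start = eol + 2
        yield raw[start:start + size]
        pos = start + size + 2  # skip payload plus its trailing CRLF

body = b"5\r\n<entr\r\n3\r\ny/>\r\n0\r\n\r\n"
payload = b"".join(iter_chunks(body))  # the XML the parser should see
```

Note that an entry split across a chunk boundary simply reassembles when the payloads are concatenated; Bob's discard-on-boundary rule only matters when a chunk arrives damaged.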
Re: If you want Fat Pings just use Atom!
On Aug 23, 2005, at 07:27, A. Pagaltzis wrote: I still dislike the idea of a “virtual closing tag” – it ain’t 1995 anymore, after all. You seem to be thinking that an XML parser needs to consider the whole document before reporting data to the app. This is not the case. If you have a document that lacks the end tag, the fatal error is hit when the data stream ends. All the subtrees rooted at children of the document element will be just fine. Try it with any streaming SAX parser. In this case, the data stream ends only if either party decides to close the connection. The parser will never know how long the document would have been. From the parser's point of view it looks like a broken stream while reading a finite doc. There's nothing in the XML spec requiring the app to throw away the data structures it has already built when the parser reports the error. -- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
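Henri's claim is easy to check with any incremental SAX parser. The sketch below (illustrative names, not anyone's production code) shows that every complete child of the document element is delivered before the missing end tag can cause a fatal error:

```python
# An incremental SAX parse of an unterminated document: both entries are
# reported to the application; the fatal error only surfaces when the
# stream is declared finished.
import xml.sax

class EntryCounter(xml.sax.handler.ContentHandler):
    def __init__(self):
        self.entries = 0

    def endElement(self, name):
        if name == "entry":
            self.entries += 1

handler = EntryCounter()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# Feed an unterminated document: <feed> is never closed.
parser.feed('<feed xmlns="http://www.w3.org/2005/Atom">')
parser.feed("<entry><title>one</title></entry>")
parser.feed("<entry><title>two</title></entry>")

# Both entries have already been delivered to the application. The
# fatal error arrives only if we signal end of stream:
try:
    parser.close()
except xml.sax.SAXParseException:
    pass  # the app keeps the data structures it has already built
```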
Re: If you want Fat Pings just use Atom!
Tim Bray wrote: On Aug 22, 2005, at 7:26 AM, Joe Gregorio wrote: Essentially, LiveJournal is making this data available to anybody who wishes to access it, without any need to register or to invent a unique API. Ahh, I had thought this was a more dedicated ping traffic stream. The never ending Atom document makes much more sense now. It's got another advantage. You connect and ask for the feed. You get <feed xmlns="http://www.w3.org/2005/Atom"> ... goes on forever and none of the entry documents need to redeclare the Atom namespace, which saves quite a few bytes after the first hundred thousand or so entries. -Tim Any value in using this to exercise the PaceBatch? If it dealt with the use-case in terms of bulk transfer that would make it more compelling imo. cheers Bill
Re: If you want Fat Pings just use Atom!
* Tim Bray wrote: That's a bit misleading, a fatal error just means that the XML processor must report the error to the application and that the processor is not required by the XML specification to continue processing; doing so is however an optional feature and further processing would be implementation-defined. So this scenario is unconstrained by the XML specifications. No. See, http://www.w3.org/TR/REC-xml/#sec-terminology, under fatal error. -Tim Yes, exactly what I wrote... -- Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de 68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Re: If you want Fat Pings just use Atom!
On Monday, August 22, 2005, at 09:54 PM, A. Pagaltzis wrote: * Martin Duerst [EMAIL PROTECTED] [2005-08-23 05:10]: Well, modulo character encoding issues, that is. An FF will look different in UTF-16 than in ASCII-based encodings. Depends on whether you specify a single encoding for all entries at the HTTP level or not. For this application, I would do just that, in which case, as a bonus, non-UTF-8 streams would get to avoid resending the XML preamble over and over and over. Of course, if you do that, you won't be able to keep signatures for entries originally published in an encoding other than the one you've chosen. If one were to want to signal an encoding change mid-stream, how might that work with what's been proposed thus far?
Re: If you want Fat Pings just use Atom!
--On August 23, 2005 9:40:44 AM +0300 Henri Sivonen [EMAIL PROTECTED] wrote: There's nothing in the XML spec requiring the app to throw away the data structures it has already built when the parser reports the error. There is also nothing requiring it. It is optional. The only required behavior is to report the error and stop creating parsed information. Otherwise, results are undefined according to the spec. The spec does require that normal processing stop at the error. The parser can make data past the error available, but it must not continue to pass character data and information about the document's logical structure to the application in the normal way. This still feels like a hack to me. An unterminated document is not well-formed, and is not XML or Atom. Doing this should require another RFC that says, we didn't really mean that it had to be XML. wunder -- Walter Underwood Principal Software Architect, Verity
Re: If you want Fat Pings just use Atom!
On 23 Aug 2005, at 2:57 pm, Bjoern Hoehrmann wrote: No. See, http://www.w3.org/TR/REC-xml/#sec-terminology, under fatal error. -Tim Yes, exactly what I wrote... the XML processor must report the error to the application and that the processor is not required by the XML specification to continue processing; doing so is however an optional feature and further processing would be implementation-defined vs Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way). Graham
Re: If you want Fat Pings just use Atom!
* Graham wrote: the XML processor must report the error to the application and that the processor is not required by the XML specification to continue processing; doing so is however an optional feature and further processing would be implementation-defined vs Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way). Yes, the normal way could be, for example, to have an ill-formed flag on each event not set, and a non-normal way would be to have the flag set. In particular, the processor MAY make unprocessed data from the document (with intermingled character data and markup) available to the application and there is no constraint on what the application may or must not do with such data. -- Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de 68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Re: If you want Fat Pings just use Atom!
On Aug 23, 2005, at 6:57 AM, Bjoern Hoehrmann wrote: That's a bit misleading, a fatal error just means that the XML processor must report the error to the application and that the processor is not required by the XML specification to continue processing; doing so is however an optional feature... No. See, http://www.w3.org/TR/REC-xml/#sec-terminology, under fatal error. -Tim Yes, exactly what I wrote... No. The specification clearly *forbids* proceeding with normal processing. If you get a busted element in a feed, you'd better close that connection and open another one. -Tim
Re: If you want Fat Pings just use Atom!
On Aug 22, 2005, at 9:27 PM, A. Pagaltzis wrote: It's got another advantage. You connect and ask for the feed. You get <feed xmlns="http://www.w3.org/2005/Atom"> ... goes on forever and none of the entry documents need to redeclare the Atom namespace, which saves quite a few bytes after the first hundred thousand or so entries. -Tim That’s the first really solid pro-single-doc argument I see… There's another. You don't have to create a new XML parser for each of the entries. In most programming environments, that's a time-saver. -Tim
Re: If you want Fat Pings just use Atom!
On Aug 22, 2005, at 9:56 PM, Bjoern Hoehrmann wrote: If you encounter a busted tag on the Nth entry, per the XML spec that's a fatal error and you can't process any more. That's a bit misleading, a fatal error just means that the XML processor must report the error to the application and that the processor is not required by the XML specification to continue processing; doing so is however an optional feature and further processing would be implementation-defined. So this scenario is unconstrained by the XML specifications. No. See, http://www.w3.org/TR/REC-xml/#sec-terminology, under fatal error. -Tim
Re: If you want Fat Pings just use Atom!
Bob Wyman wrote: Aristotle Pagaltzis wrote: I wonder how you would make sure that the document is well-formed. Since the stream never actually ends and there is no way for a client to signal an intent to close the connection, the <feed> at the top would never actually be accompanied by a </feed> at the bottom. This is a problem which has become well understood in the use and implementation of the XMPP/Jabber protocols which are based on streaming XML. It is: every XMPP server has a hack to deal with it ;) Basically, what you do is consider the open tag to have a virtual closure and use it primarily as a carrier of stream metadata. In XMPP terminology, your code works at picking stanzas out of the stream that can be parsed successfully or unsuccessfully on their own. In an Atom stream, the processor would consider each atom:entry to be a parseable atomic unit. How XMPP does that opening stanza thing and then proceeds to not close it is not a design to be emulated (imvho). The opening stanza should arrive and be closed. If you accept that the stream can never be a complete well-formed document, is there any reason not to simply send a stream of concatenated Atom Entry Documents? That would seem like the absolute simplest solution. You could certainly do that, however, you will inevitably want to pass across some stream-oriented metadata and you'll eventually realize that much of it is stuff that you can map into an Atom Feed (i.e. created date, unique stream id, stream title, etc.). Since we're all in the process of learning how to deal with atom:feed elements anyway, why not just reuse what we've got instead of inventing something new? In the Atom case, I don't see why you need to keep the atom:feed open. Just send it once and then send entries in the raw. A rather nice side effect of forming the stream as an atom feed is the simple fact that a log of the stream can be written to disk as a well-formed Atom file.
Thus, the same tools that you usually use to parse Atom files can be used to parse the log of the stream. It is nice to be able to reuse tools in this way... (Note: At PubSub, the atom files that we serve to people are, in essence, just slightly stripped logs of the proto-Atom over XMPP streams that they would have received if they had been listening with that protocol. In our clients we can use the same parser for the stream as we do for atom files. It works out nicely and elegantly.) For high load scenarios using Atom/XMPP to decouple the atom processing from xmpp stream handling you can use the log you're talking about as a poor-man's message queue. cheers Bill
Re: If you want Fat Pings just use Atom!
On 8/22/05, Sam Ruby [EMAIL PROTECTED] wrote: Joe Gregorio wrote: Why not POST the Atom Entry, ala the Atom Publishing Protocol? Essentially, LiveJournal is making this data available to anybody who wishes to access it, without any need to register or to invent a unique API. Ahh, I had thought this was a more dedicated ping traffic stream. The never ending Atom document makes much more sense now. Thanks, -joe -- Joe Gregorio http://bitworking.org
Re: If you want Fat Pings just use Atom!
--On August 22, 2005 12:36:17 AM -0400 Sam Ruby [EMAIL PROTECTED] wrote: With a HTTP client library and SAX, the absolute simplest solution is what Bob is describing: a single document that never completes. Except that an endless document can't be legal XML, because XML requires the root element to balance. An endless document never closes it. So, the endless document cannot be legal Atom. Worse, there is no chance for error recovery. One error, and the rest of the stream might not be parsable. So, it is simple, but busted. The standard trick here is to use a sequence of small docs, separated by ASCII form-feed characters. That character is not legal within an XML document, so it allows the stream to resynchronize on that character. Besides, form-feed actually has almost the right semantics -- start a new page. wunder -- Walter Underwood Principal Software Architect, Verity
Re: If you want Fat Pings just use Atom!
* Walter Underwood [EMAIL PROTECTED] [2005-08-22 18:35]: The standard trick here is to use a sequence of small docs, separated by ASCII form-feed characters. That character is not legal within an XML document, so it allows the stream to resynchronize on that character. Ooh! I like this – simple and very clever. Regards, -- Aristotle Pagaltzis // http://plasmasturm.org/
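Walter's trick can be sketched in a few lines (hypothetical code, using ElementTree for brevity): split the stream on the form feed, U+000C, and parse each fragment independently, so one damaged document doesn't poison the documents after it:

```python
# Split a form-feed-separated stream into independently parsed docs;
# a fragment that fails to parse is simply skipped. Illustrative only.
import xml.etree.ElementTree as ET

def parse_ff_stream(data: str):
    """Parse each form-feed-separated document; skip any that fail."""
    docs = []
    for fragment in data.split("\f"):
        fragment = fragment.strip()
        if not fragment:
            continue
        try:
            docs.append(ET.fromstring(fragment))
        except ET.ParseError:
            continue  # resynchronize at the next form feed
    return docs

stream = (
    "<entry><title>ok</title></entry>\f"
    "<entry><title>busted\f"               # damaged mid-transmission
    "<entry><title>fine</title></entry>"
)
good = parse_ff_stream(stream)  # the damaged middle document is dropped
```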
Re: If you want Fat Pings just use Atom!
Walter Underwood wrote: The standard trick here is to use a sequence of small docs, separated by ASCII form-feed characters. That character is not legal within an XML document, so it allows the stream to resynchronize on that character. Besides, form-feed actually has almost the right semantics -- start a new page. In XMPP, you can reset on seeing </atom:entry>, assuming you don't have atom:feed left hanging. You probably won't need atom:feed there anyway - feeds are very much artefacts of going over HTTP. cheers Bill
Re: If you want Fat Pings just use Atom!
On 8/22/05, James M Snell [EMAIL PROTECTED] wrote: +1.. this seems a very elegant solution. +1. Indeed, both solutions, the never-ending feed and the FF-separated entries, have their advantages. The FF separated stream has the advantage of being able to synchronize mid-stream. The never ending feed has the advantage that you initialize a SAX parser instance just once. Interestingly enough the FF separated entries method would also work when storing a large quantity of entries in a single flat file where appending an entry needs to be fast. -joe -- Joe Gregorio http://bitworking.org
Re: If you want Fat Pings just use Atom!
Yep; an existence proof is server push, which is very similar (but not XML-based); http://wp.netscape.com/assist/net_sites/pushpull.html On 21/08/2005, at 9:36 PM, Sam Ruby wrote: A. Pagaltzis wrote: * Bob Wyman [EMAIL PROTECTED] [2005-08-22 01:05]: What do you think? Is there any conceptual problem with streaming basic Atom over TCP/IP, HTTP continuous sessions (probably using chunked content) etc.? I wonder how you would make sure that the document is well-formed. Since the stream never actually ends and there is no way for a client to signal an intent to close the connection, the <feed> at the top would never actually be accompanied by a </feed> at the bottom. If you accept that the stream can never be a complete well-formed document, is there any reason not to simply send a stream of concatenated Atom Entry Documents? That would seem like the absolute simplest solution. I think the keyword in the above is complete. SAX is a popular API for dealing with streaming XML (and there are a number of pull parsing APIs too). It makes individual elements available to your application as they are read. If at any point, the SAX parser determines that your feed is not well formed, it throws an error at that point. With a HTTP client library and SAX, the absolute simplest solution is what Bob is describing: a single document that never completes. Note that if your application were to discard all the data it receives before it encounters the first entry, the stream from there on out would be identical. - Sam Ruby -- Mark Nottingham Principal Technologist Office of the CTO BEA Systems
Re: If you want Fat Pings just use Atom!
Just as a data point, this should become less of a problem as event-loop based HTTP implementations become more popular; with them, the number of connections you can hold open is only practically limited by available memory (to keep fairly small amounts of connection-specific state). This technique can allow tens to hundreds of thousands of concurrent connections, leading to multi-hour HTTP connections (if both sides want them). On 21/08/2005, at 8:08 PM, Bob Wyman wrote: The problem is that HTTP connections, given the current infrastructure and standard components, are very hard to keep open permanently or for a very long period of time. One is often considered lucky if you can keep an HTTP connection open for 5 minutes without having to re-initialize... -- Mark Nottingham Principal Technologist Office of the CTO BEA Systems
Re: If you want Fat Pings just use Atom!
--On August 22, 2005 2:01:45 PM -0400 Joe Gregorio [EMAIL PROTECTED] wrote: Interestingly enough the FF separated entries method would also work when storing a large quantity of entries in a single flat file where appending an entry needs to be fast. The original application was logfiles in XML. wunder -- Walter Underwood Principal Software Architect, Verity
Re: If you want Fat Pings just use Atom!
A. Pagaltzis wrote: * Bill de hÓra [EMAIL PROTECTED] [2005-08-22 19:00]: In XMPP, you can reset on seeing </atom:entry>, Really? Really. And I've even seen the two test cases below... <![CDATA[ </entry> ]]> Or maybe <![CDATA[ <![CDATA[ </entry> ]]> ... which are cute. I forget I didn't have an opening tag, or a buffer for the entry. Oh wait, I do ;) These are probably the only exceptions (I might be missing some, though), but they’re enough to demonstrate that you will need to write a parser, even if only a relatively simple one. You already have an XML parser, that's not the problem; the problem is managing the stream buffer off the wire for a protocol model that has no underlying concept of an octet frame. I've written enough XMPP code to understand why the BEEP/MIME crowd might frown at it - manipulating infosets right on top of sockets makes for funky code and isn't my notion of what bits on the wire should be ;) [Incidentally this is a non-problem for APP because we're piggy-backing on HTTP octets...] Using a character which is illegal in XML and can never be part of a well-formed document as a separator is a clever way to avoid having to do *any* parsing *whatsoever*. You just scan the stream for the character and start over when you see it, end of story. No need to keep state or look for patterns or anything else. I see all the +1s, but don't understand why reinventing multi-part MIME with formfeeds as a special case for Atom is more attractive than an infinite list of entries whose closing atom:feed tag never arrives. Still, I think this discussion is valuable: it speaks volumes on the use of XML for wire protocols, especially for the single document school of thought. cheers Bill
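Bill's CDATA cases are easy to reproduce. The sketch below (illustrative only) shows a byte-level scan for "</entry>" finding a false entry boundary inside a CDATA section, where a real XML parser correctly sees only character data:

```python
# The literal text "</entry>" inside a CDATA section is character data,
# not markup, so a naive scan misidentifies the entry boundary.
import xml.etree.ElementTree as ET

doc = "<entry><content><![CDATA[ </entry> ]]></content></entry>"

# Naive scan: the first "</entry>" it finds sits inside the CDATA
# section, well before the element actually ends.
naive_end = doc.find("</entry>")
real_end = doc.rindex("</entry>")

# A real parse keeps the CDATA text intact as character data.
content = ET.fromstring(doc).find("content").text  # " </entry> "
```

This is why the form-feed separator is attractive: U+000C can never appear in a well-formed document at all, not even inside CDATA, so scanning for it needs no state.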
Re: If you want Fat Pings just use Atom!
Just out of interest, how well does any of this work with caches and proxies? Graham
Re: If you want Fat Pings just use Atom!
On Aug 22, 2005, at 7:26 AM, Joe Gregorio wrote: Essentially, LiveJournal is making this data available to anybody who wishes to access it, without any need to register or to invent a unique API. Ahh, I had thought this was a more dedicated ping traffic stream. The never ending Atom document makes much more sense now. It's got another advantage. You connect and ask for the feed. You get <feed xmlns="http://www.w3.org/2005/Atom"> ... goes on forever and none of the entry documents need to redeclare the Atom namespace, which saves quite a few bytes after the first hundred thousand or so entries. -Tim
Re: If you want Fat Pings just use Atom!
Justin Fletcher wrote: I'm a little confused by all this discussion of never-ending XML documents, mainly because my understanding is that without the well-formedness checks the content might as well be free form, and the elements within the document may rely on parts that have 'yet to arrive'. Taking as an example the atom:author element, with the above example of a never-ending document any atom:entry elements which exist would be quite valid in containing no atom:author element because they're not required to have one if the atom:feed element contains such an author. And because the feed has not yet finished the reading application cannot know that the document is invalid (or not - the atom:author element may arrive at some point in the future). If an atom:feed contains an author element, it is required by the spec to appear before any atom:entry elements in the atom:feed. (http://www.atompub.org/2005/08/17/draft-ietf-atompub-format-11.html#rfc.section.4.1.1) so this isn't a problem. Further, the spec really does not define any metadata that would be dependent on parts yet-to-arrive. There could be a challenge with same-document links that use fragment identifiers, but that's about it. - James
Re: If you want Fat Pings just use Atom!
On 8/22/05, Justin Fletcher [EMAIL PROTECTED] wrote: On Mon, 22 Aug 2005, Tim Bray wrote: On Aug 22, 2005, at 7:26 AM, Joe Gregorio wrote: Essentially, LiveJournal is making this data available to anybody who wishes to access it, without any need to register or to invent a unique API. Ahh, I had thought this was a more dedicated ping traffic stream. The never ending Atom document makes much more sense now. It's got another advantage. You connect and ask for the feed. You get <feed xmlns="http://www.w3.org/2005/Atom"> ... goes on forever and none of the entry documents need to redeclare the Atom namespace, which saves quite a few bytes after the first hundred thousand or so entries. -Tim I'm a little confused by all this discussion of never-ending XML documents, mainly because my understanding is that without the well-formedness checks the content might as well be free form, and the elements within the document may rely on parts that have 'yet to arrive'. Not at all: The atom:feed element is the document (i.e., top-level) element of an Atom Feed Document, acting as a container for metadata and data associated with the feed. Its element children consist of metadata elements *followed by* zero or more atom:entry child elements. http://atompub.org/2005/07/11/draft-ietf-atompub-format-10.html#rfc.section.4.1.1 -joe -- Joe Gregorio http://bitworking.org
Re: If you want Fat Pings just use Atom!
At 02:15 05/08/23, A. Pagaltzis wrote: Using a character which is illegal in XML and can never be part of a well-formed document as a separator is a clever way to avoid having to do *any* parsing *whatsoever*. You just scan the stream for the character and start over when you see it, end of story. No need to keep state or look for patterns or anything else. Well, modulo character encoding issues, that is. An FF will look different in UTF-16 than in ASCII-based encodings. Regards, Martin.
Re: If you want Fat Pings just use Atom!
--On August 23, 2005 12:01:11 PM +0900 Martin Duerst [EMAIL PROTECTED] wrote: Well, modulo character encoding issues, that is. An FF will look different in UTF-16 than in ASCII-based encodings. Fine. Use two NULs. That is either one illegal UTF-16 (BE or LE) character or two illegal characters in ASCII or UTF-8. Of course, a transport level multi-payload system would be preferred. wunder -- Walter Underwood Principal Software Architect, Verity
Re: If you want Fat Pings just use Atom!
* Bill de hÓra [EMAIL PROTECTED] [2005-08-22 21:45]: ... which are cute. I forget I didn't have an opening tag, or a buffer for the entry. Oh wait, I do ;) Sure. I didn’t say it was impossible. I was just saying that you have to do more than scan the stream for the sequence “</entry>”. You already have an XML parser, that's not the problem; The point is that if you get broken XML at some point, the parser may be left in a state where it never closes an entry. Using an illegal-in-XML character allows resynching without reconnecting to the stream. [Incidentally this is a non-problem for APP because we're piggy backing on HTTP octets...] Which Fat Pings are doing as well… I see all the +1s, but don't understand why reinventing multi-part MIME with formfeeds as a special case for Atom is more attractive than an infinite list of entries whose closing atom:feed tag never arrives. Purely for the resynchronization aspect. If you believe that it’s not an issue, then sure, there’s a lot less difference between the two options. * Martin Duerst [EMAIL PROTECTED] [2005-08-23 05:10]: Well, modulo character encoding issues, that is. An FF will look different in UTF-16 than in ASCII-based encodings. Depends on whether you specify a single encoding for all entries at the HTTP level or not. For this application, I would do just that, in which case, as a bonus, non-UTF-8 streams would get to avoid resending the XML preamble over and over and over. Regards, -- Aristotle Pagaltzis // http://plasmasturm.org/
Re: If you want Fat Pings just use Atom!
* Tim Bray [EMAIL PROTECTED] [2005-08-23 02:35]: It's got another advantage. You connect and ask for the feed. You get <feed xmlns="http://www.w3.org/2005/Atom"> ... goes on forever and none of the entry documents need to redeclare the Atom namespace, which saves quite a few bytes after the first hundred thousand or so entries. -Tim Hmm. That’s the first really solid pro-single-doc argument I see… And one I don’t think can be argued about, /realistically/. • The ^L separation makes a lot of sense for logfiles, but less so for TCP-carried connections which ensure a basic level of data integrity. (With logfiles on disk all bets are off.) • “Just discard the last incomplete document” logic can be implemented without requiring separate documents by processing and flushing the stack so far whenever an atom:entry start-element event is seen. I still dislike the idea of a “virtual closing tag” – it ain’t 1995 anymore, after all. And I still think separate documents would be cleaner and would require a bit less special case logic to process. But this *is* an application where saving bytes is a serious enough concern, and I suppose that so long as the only thing missing is exactly one atom:feed end-element event, the special case is simple enough that I guess it’s acceptable. Ugh. Reality. Regards, -- Aristotle Pagaltzis // http://plasmasturm.org/
Re: If you want Fat Pings just use Atom!
On Mon, 22 Aug 2005, James M Snell wrote: Justin Fletcher wrote: I'm a little confused by all this discussion of never-ending XML documents, mainly because my understanding is that without the well-formedness checks the content might as well be free form, and the elements within the document may rely on parts that have 'yet to arrive'. Taking as an example the atom:author element, with the above example of a never-ending document any atom:entry elements which exist would be quite valid in containing no atom:author element because they're not required to have one if the atom:feed element contains such an author. And because the feed has not yet finished the reading application cannot know that the document is invalid (or not - the atom:author element may arrive at some point in the future). If an atom:feed contains an author element, it is required by the spec to appear before any atom:entry elements in the atom:feed. (http://www.atompub.org/2005/08/17/draft-ietf-atompub-format-11.html#rfc.section.4.1.1) so this isn't a problem. Further, the spec really does not define any metadata that would be dependent on parts yet-to-arrive. There could be a challenge with same-document links that use fragment identifiers, but that's about it. Ah; I misread that in the specification. Thanks. It's just the lack of well-formedness that is an issue in my head then. -- Gerph http://gerph.org/ ... Things get better second time around.
Re: If you want Fat Pings just use Atom!
* Tim Bray wrote: If you encounter a busted tag on the Nth entry, per the XML spec that's a fatal error and you can't process any more. That's a bit misleading, a fatal error just means that the XML processor must report the error to the application and that the processor is not required by the XML specification to continue processing; doing so is however an optional feature and further processing would be implementation-defined. So this scenario is unconstrained by the XML specifications. -- Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de 68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
RE: If you want Fat Pings just use Atom!
Joe Gregorio wrote: Why not POST the Atom Entry, ala the Atom Publishing Protocol? This would be an excellent idea if what we were talking about was a low volume site. However, a site like LiveJournal generates hundreds of updates per minute. Right now, on a Sunday evening, they are updating at the rate of 349 entries per minute. During peak periods, they generate much more traffic. Generating 349 POST messages per minute to perhaps 10 or 15 different services means that they would be pumping out thousands of these things per minute. It just isn't reasonable. Using an open TCP/IP socket to carry a stream of Atom Entries results in much greater efficiencies with much reduced bandwidth and processing requirements. At PubSub, we've been experimentally providing Fat Ping versions of our FeedMesh feeds to a small group of testers. We publish messages at a rate much higher than LiveJournal does -- since we publish all of LiveJournal's content plus everyone else's. We couldn't even consider Fat Pings if we had to create and tear down a TCP/IP-HTTP session to post each individual entry. There are many situations in which HTTP would work fine for Fat Pings. However, for high-volume sites, it just isn't reasonable. The key, to me, is that we establish the expectation that the Atom format is adequate to the task (whatever the transport) and leave the transport selection as a context dependent decision. Thus, some server/client pairs would exchange streams of Atom entries using the POST based Atom Publishing Protocol while others would exchange essentially the same streams using a more efficient transport mechanism such as streaming raw sockets or even Atom over XMPP. bob wyman
RE: If you want Fat Pings just use Atom!
Aristotle Pagaltzis wrote: I wonder how you would make sure that the document is well-formed. Since the stream never actually ends and there is no way for a client to signal an intent to close the connection, the feed at the top would never actually be accompanied by a /feed at the bottom. This is a problem which has become well understood in the use and implementation of the XMPP/Jabber protocols, which are based on streaming XML. Basically, what you do is consider the open tag to have a virtual closure and use it primarily as a carrier of stream metadata. In XMPP terminology, your code picks stanzas out of the stream, each of which can be parsed (successfully or not) on its own. In an Atom stream, the processor would consider each atom:entry to be a parseable atomic unit. If you accept that the stream can never be a complete well-formed document, is there any reason not to simply send a stream of concatenated Atom Entry Documents? That would seem like the absolute simplest solution. You could certainly do that; however, you will inevitably want to pass across some stream-oriented metadata, and you'll eventually realize that much of it is stuff that you can map into an Atom Feed (e.g. created date, unique stream id, stream title, etc.). Since we're all in the process of learning how to deal with atom:feed elements anyway, why not just reuse what we've got instead of inventing something new? A rather nice side effect of forming the stream as an Atom feed is the simple fact that a log of the stream can be written to disk as a well-formed Atom file. Thus, the same tools that you usually use to parse Atom files can be used to parse the log of the stream. It is nice to be able to reuse tools in this way... (Note: At PubSub, the Atom files that we serve to people are, in essence, just slightly stripped logs of the proto-Atom over XMPP streams that they would have received if they had been listening with that protocol.
In our clients we can use the same parser for the stream as we do for atom files. It works out nicely and elegantly.) bob wyman
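[Editor's note: Bob's log-reuse point can be illustrated with a short sketch. This is hypothetical example data in Python, not PubSub's actual code: on the wire, the feed element is opened but never closed, and appending the "virtual" close tag at the moment the log is cut yields a file that any ordinary Atom tool can parse.]

```python
# Sketch of the logging trick described above (hypothetical stream data,
# not PubSub's implementation). On the wire the <feed> element is never
# closed; a log of the stream becomes a well-formed Atom file simply by
# appending the "virtual" close tag when the log is cut.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

stream_so_far = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<title>example stream</title>'
    '<entry><id>urn:example:e1</id></entry>'
    '<entry><id>urn:example:e2</id></entry>'
)   # the connection is still open, so no </feed> has arrived

log = stream_so_far + "</feed>"   # close the virtual tag at log-cut time
feed = ET.fromstring(log)         # any run-of-the-mill XML parser now works
ids = [e.findtext(ATOM + "id") for e in feed.findall(ATOM + "entry")]
```

The same parse step works on a live log file, which is exactly the tool-reuse Bob describes.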
Re: If you want Fat Pings just use Atom!
On 8/21/05, Bob Wyman [EMAIL PROTECTED] wrote: Joe Gregorio wrote: Why not POST the Atom Entry, ala the Atom Publishing Protocol? This would be an excellent idea if what we were talking about was a low-volume site. However, a site like LiveJournal generates hundreds of updates per minute. Right now, on a Sunday evening, they are updating at the rate of 349 entries per minute. During peak periods, they generate much more traffic. Generating 349 POST messages per minute to perhaps 10 or 15 different services means that they would be pumping out thousands of these things per minute. It just isn't reasonable. Using an open TCP/IP socket to carry a stream of Atom Entries results in much greater efficiencies with much reduced bandwidth and processing requirements. Why can't you keep that socket open? That is the default behavior for HTTP 1.1. -joe -- Joe Gregorio http://bitworking.org
Re: If you want Fat Pings just use Atom!
Bob Wyman wrote: Joe Gregorio wrote: Why not POST the Atom Entry, ala the Atom Publishing Protocol? This would be an excellent idea if what we were talking about was a low-volume site. However, a site like LiveJournal generates hundreds of updates per minute. Right now, on a Sunday evening, they are updating at the rate of 349 entries per minute. During peak periods, they generate much more traffic. Generating 349 POST messages per minute to perhaps 10 or 15 different services means that they would be pumping out thousands of these things per minute. It just isn't reasonable. Using an open TCP/IP socket to carry a stream of Atom Entries results in much greater efficiencies with much reduced bandwidth and processing requirements. At PubSub, we've been experimentally providing Fat Ping versions of our FeedMesh feeds to a small group of testers. We publish messages at a rate much higher than LiveJournal does -- since we publish all of LiveJournal's content plus everyone else's. We couldn't even consider Fat Pings if we had to create and tear down a TCP/IP-HTTP session to post each individual entry. There are many situations in which HTTP would work fine for Fat Pings. However, for high-volume sites, it just isn't reasonable. The key, to me, is that we establish the expectation that the Atom format is adequate to the task (whatever the transport) and leave the transport selection as a context-dependent decision. Thus, some server/client pairs would exchange streams of Atom entries using the POST-based Atom Publishing Protocol while others would exchange essentially the same streams using a more efficient transport mechanism such as streaming raw sockets or even Atom over XMPP. First off, as a general FYI, take a look at PaceSimpleNotify... the current version uses basic HTTP POSTs to send one or more individual atom:entry elements to a remote endpoint.
I'm hoping that the folks on the protocol list will pick this up in discussion in the near future, as it is something that I definitely want to see incorporated. Secondly, I believe that the format is more than adequate to support this kind of mechanism. I do not believe that Brad's atomStream container is necessary. Just stream a bunch of atom:feed or atom:entry elements directly over an open TCP/IP socket or a persistent (keep-alive) HTTP connection. By no means would I ever suggest a new HTTP connection for each ping. - James
Re: If you want Fat Pings just use Atom!
* Bob Wyman [EMAIL PROTECTED] [2005-08-22 04:00]: Basically, what you do is consider the open tag to have a virtual closure and use it primarily as a carrier of stream metadata. Shades of SGML… You could certainly do that, however, you will inevitably want to pass across some stream oriented metadata and you'll eventually realize that much of it is stuff that you can map into an Atom Feed. OT1H, you could put this data in the stream as an empty but complete Atom Feed Document served as the first complete entity in the feed – A rather nice side effect of forming the stream as an atom feed is the simple fact that a log of the stream can be written to disk as a well-formed Atom file. – but OTOH this is a pretty good point. Of course, the question is whether it is really any more work to receive an empty Atom Feed Document + X * Atom Entry Documents and to insert the Entry Documents into the Feed Document for storage. Note that in the case of prepending an empty Atom Feed Document, all fully received Documents are well-formed entities of their own, so you don’t need a recovering XML parser that can implement the “virtual closing element” semantic – all entities can be processed with any run-of-the-mill XML parser. Regards, -- Aristotle Pagaltzis // http://plasmasturm.org/
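[Editor's note: a minimal sketch of the scheme Aristotle proposes, assuming the transport delivers one complete XML document per frame (the framing mechanism itself is not specified in the thread; the frame contents here are hypothetical): an empty Atom Feed Document first, carrying the stream metadata, then one Atom Entry Document per update. Every frame is well-formed on its own, so a stock parser suffices, and inserting the entries back into the feed for storage is a one-line operation.]

```python
# Sketch of the "empty Feed Document + concatenated Entry Documents"
# scheme (hypothetical frames; assumes the transport frames each document).
# Each frame parses with an ordinary, non-recovering XML parser.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

frames = [
    # frame 0: the empty Atom Feed Document carrying stream metadata
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<title>example stream</title><id>urn:example:stream</id>'
    '<updated>2005-08-22T00:00:00Z</updated></feed>',
    # subsequent frames: complete Atom Entry Documents
    '<entry xmlns="http://www.w3.org/2005/Atom"><id>urn:example:e1</id></entry>',
    '<entry xmlns="http://www.w3.org/2005/Atom"><id>urn:example:e2</id></entry>',
]

feed = ET.fromstring(frames[0])      # stream metadata, parsed normally
for doc in frames[1:]:
    feed.append(ET.fromstring(doc))  # each Entry Document parses on its own

# The accumulated tree is a well-formed Atom feed, ready to log to disk.
ids = [e.findtext(ATOM + "id") for e in feed.findall(ATOM + "entry")]
```

A connection cut mid-frame costs only the one incomplete document; everything already accumulated remains well-formed.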
RE: If you want Fat Pings just use Atom!
Joe Gregorio wrote: Why can't you keep that socket open? That is the default behavior for HTTP 1.1. In some applications, HTTP 1.1 will work just fine. However, HTTP doesn't add much to the high-volume case. It also costs a great deal. For instance, every POST requires a response. This means that you're moving from a pure streaming case to an endless sequence of application-level ACK/NAKs that are simply replicating what TCP/IP already does for you. Also, the HTTP headers that would be required simply don't contribute anything useful. The bandwidth overhead of the additional headers, as well as the bandwidth, processing and timing problems related to generating responses, begins to look pretty nasty when you're moving at hundreds of items per minute or second... One really good reason for using HTTP would be to exploit the existing HTTP infrastructure including proxies, caches, application-level firewalls, etc. However, I'm aware of no such infrastructure components that are designed to handle permanently open, high-bandwidth connections well. The HTTP infrastructure is optimized around the normal uses of HTTP. This isn't normal. One of the really irritating things about the current HTTP infrastructure is that it is very fragile. This is a problem that has caused unlimited headaches for the folks trying to do notification over HTTP (mod-pubsub, KnowNow, various HTTP-based IM/chat systems, etc.). The problem is that HTTP connections, given the current infrastructure and standard components, are very hard to keep open permanently or for a very long period of time. You're often considered lucky if you can keep an HTTP connection open for 5 minutes without having to re-initialize... Of course, during the period between when your connection breaks and when you get it re-established, you're losing packets. That means that you have to have a much more robust mechanism for recovering lost messages, and that means increased complexity, network traffic, etc.
The added complexity and trouble can be justified in some cases; however, not in all cases. HTTP is great in some cases but not all. That's why the IETF has defined BEEP, XMPP, SIP, SIMPLE, etc. in addition to HTTP. One protocol model simply can't suit all needs at all times and in all contexts. Whatever... The point here is that Atom already has defined all that appears to be needed in order to address the Fat Ping requirement whether you prefer individual HTTP POSTs, POSTs over HTTP 1.1 connections, XMPP, or raw open TCP/IP sockets. That is a good thing. bob wyman
RE: If you want Fat Pings just use Atom!
Aristotle Pagaltzis wrote: Shades of SGML. No! No! Not that! :-) He continues with: ... many good points Basically, there are many really easy ways that one can handle streams of Atom entries. You could prepend an empty feed to the head of the stream, you could use virtual end-tags, you could just send entries and rely on the receiver to wrap them up as required, etc... But, since all of these are really easy and none of them really gets in the way of anything rational that I can imagine someone wanting to do, why not just default to doing it the way it is defined in the Atom spec? In that way, we don't have to create one more context-dependent distinction between formats. Complexity is reduced and we can avoid having to read yet-another-specification that looks very, very much like hundreds we've read before. If Atom provides all we need, let's not do something else unless there is a *very* good argument to do so. bob wyman
Re: If you want Fat Pings just use Atom!
* Bob Wyman [EMAIL PROTECTED] [2005-08-22 05:25]: If Atom provides all we need, let's not do something else unless there is a *very* good argument to do so. I’m not inventing anything. Atom Entry Documents are part of the spec and Atom Feed Documents may legally be empty. And a consumer of a stream according to your proposition does not get away without implementing some special ability to interpret it correctly anyway – you are using “plain Atom” at the cost of requiring a specially capable XML parser. My proposition requires a trivial amount of extra semantics implemented at the application logic level; yours requires extra semantics at the protocol logic level (the protocol being XML). I think doing this in application logic is so cheap that reaching for the protocol logic is unwarranted. Regards, -- Aristotle Pagaltzis // http://plasmasturm.org/
Re: If you want Fat Pings just use Atom!
Bob Wyman wrote: Basically, there are many really easy ways that one can handle streams of Atom entries. You could prepend an empty feed to the head of the stream, you could use virtual end-tags, you could just send entries and rely on the receiver to wrap them up as required, etc... But, since all of these are really easy and none of them really gets in the way of anything rational that I can imagine someone wanting to do, why not just default to doing it the way it is defined in the Atom spec? In that way, we don't have to create one more context-dependent distinction between formats. Complexity is reduced and we can avoid having to read yet-another-specification that looks very, very much like hundreds we've read before. If Atom provides all we need, let's not do something else unless there is a *very* good argument to do so. bob wyman +1. The basic format gives us everything we need to enable this. Even looking back over my PaceSimpleNotify proposal, in which I introduce a notification element used to identify the action that has occurred on the element (e.g. create, update or delete), I can see that there really is no need to have that element. - James
Re: If you want Fat Pings just use Atom!
A. Pagaltzis wrote: * Bob Wyman [EMAIL PROTECTED] [2005-08-22 01:05]: What do you think? Is there any conceptual problem with streaming basic Atom over TCP/IP, HTTP continuous sessions (probably using chunked content) etc.? I wonder how you would make sure that the document is well-formed. Since the stream never actually ends and there is no way for a client to signal an intent to close the connection, the feed at the top would never actually be accompanied by a /feed at the bottom. If you accept that the stream can never be a complete well-formed document, is there any reason not to simply send a stream of concatenated Atom Entry Documents? That would seem like the absolute simplest solution. I think the keyword in the above is "complete". SAX is a popular API for dealing with streaming XML (and there are a number of pull parsing APIs too). It makes individual elements available to your application as they are read. If at any point the SAX parser determines that your feed is not well-formed, it throws an error at that point. With an HTTP client library and SAX, the absolute simplest solution is what Bob is describing: a single document that never completes. Note that if your application were to discard all the data it receives before it encounters the first entry, the stream from there on out would be identical. - Sam Ruby
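[Editor's note: Sam's SAX point can be sketched as follows, using Python's xml.sax incremental parser with hypothetical stream chunks. The handler is simplified for illustration: it ignores attributes and namespace prefixes inside entries. Each completed entry is delivered as soon as its end tag is seen; a connection cut mid-entry simply means the partial entry is never reported, with no backtracking needed.]

```python
# Sketch of the SAX approach described above: feed an endless <feed>
# document to an incremental parser, and hand each completed <entry> to
# the application. Attributes and namespaces inside entries are ignored
# for brevity.
import xml.sax
from xml.sax.saxutils import escape

class EntryHandler(xml.sax.ContentHandler):
    def __init__(self, on_entry):
        self.on_entry = on_entry
        self.depth = 0   # element nesting depth inside the current entry
        self.buf = []    # re-serialized text of the current entry

    def startElement(self, name, attrs):
        if name == "entry" or self.depth:
            self.depth += 1
            self.buf.append("<%s>" % name)

    def characters(self, content):
        if self.depth:
            self.buf.append(escape(content))

    def endElement(self, name):
        if self.depth:
            self.buf.append("</%s>" % name)
            self.depth -= 1
            if self.depth == 0:      # entry complete: report it upstream
                self.on_entry("".join(self.buf))
                self.buf = []

entries = []
parser = xml.sax.make_parser()
parser.setContentHandler(EntryHandler(entries.append))
# In real use the chunks come off the socket; simulated here, with the
# stream cut off in the middle of the second entry.
for chunk in ('<feed xmlns="http://www.w3.org/2005/Atom">',
              '<entry><id>urn:example:e1</id></entry>',
              '<entry><id>urn:example:e2</id>'):
    parser.feed(chunk)
# Only the complete entry was delivered; the partial one is never reported.
```

Note that the document's missing end tag never triggers an error, because close() is never called on the parser: the feed simply never completes, exactly as Sam describes.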
Re: If you want Fat Pings just use Atom!
Joe Gregorio wrote: Why not POST the Atom Entry, ala the Atom Publishing Protocol? Essentially, LiveJournal is making this data available to anybody who wishes to access it, without any need to register or to invent a unique API. I can, and have, accessed the LiveJournal stream from behind both a firewall and a NAT device. Doing so requires the client to initiate the request. Therefore, if you really wanted to turn this around, the client would need to initiate a POST, and the server would need to return the Fat Pings as the response. I talked to Brad - in fact, I had independently made the same suggestion that Bob did. Brad indicated that if there were clients with different requirements, he was amenable to accommodating each - endpoints are cheap. - Sam Ruby
Re: If you want Fat Pings just use Atom!
* Sam Ruby [EMAIL PROTECTED] [2005-08-22 06:45]: SAX is a popular API for dealing with streaming XML (and there are a number of pull parsing APIs too). Of course – using a DOM parser is impossible with either approach anyway. With a HTTP client library and SAX, the absolute simplest solution is what Bob is describing: a single document that never completes. That can be argued both ways, I think. The important point is that the connection can and almost certainly will be closed in the middle of an entry document. With a single, endless document, the application will have to backtrack, discarding events until it has thrown away the last seen start-element event for the incomplete entry, then close the feed element. With a series of concatenated complete documents, it can simply discard everything that belongs to the current incomplete Entry Document. There are implicit checkpoints in the stream. Regards, -- Aristotle Pagaltzis // http://plasmasturm.org/