Re: PaceArchiveDocument posted
On 7 Feb 2005, at 18:29, Antone Roundy wrote: The latter seems likely to be supported by the WG, but the former does not. I'd rather have an archive document type, and not repeat entries in "normal" feeds. I don't think the historical sliding window view forces you at all to duplicate the entries in your feed. The spec allows you to remove all the old versions if you wish. After all, the present time is just one element in the sequence of history. People who only want to live in the present don't negate history. They just don't remember it. Henry Story
Re: PaceArchiveDocument posted
I think that the complexity of this proposal is proof of its failure. If you look at a Feed document as simply a sliding window view into the historical state of entries instead of a sliding window view into the current state of entries (though, as I have shown, these can overlap), then you have your archive document already. HELLO GUYS/GALS, YOU ARE THERE AT THE FINISH LINE. IT ALL WORKS! One of the arguments against the sliding window view into the historical state of entries is that it was too complicated. But clearly, not going that way is making things WAY MORE COMPLICATED. So before proceeding any further, it may now be worth comparing the complexity of both proposals in detail. My guess is that the historical one is just a little surprising, but that is all. Henry Story
Re: PaceArchiveDocument posted
Robert Sayre wrote: Walter Underwood wrote: I agree, but I would put it another way. The charter requires support for archives, but we don't have a clear model for those. Without a model, we can't spec syntax. We have feed documents. A series of feed documents makes an archive. I don't see why we need atom:archive after all. Robert Sayre +1 [1]. Specifically, I think that we can fulfill the charter's 'support for archives' requirement [2] with the existing syntax plus the facilities provided by the protocol. How an archive is stored is something I'd be happy leaving to 'archiver' implementations in much the same way we leave such details to 'aggregator' implementations. I can certainly see value in standardizing document formats for archivers to allow interoperability; but I don't think the benefit is worth the cost, and I don't think the charter requires it. -John [1] Though I actually disagree that just a series of feed documents makes an archive -- what about non-entry resources? Where are my cat pictures? [2] At least for the types of archives I think are important.
Re: PaceArchiveDocument posted
On Monday, February 7, 2005, at 10:26 AM, Bob Wyman wrote: Antone Roundy wrote: foo:bar/a ... ...where @revision is a number whose only requirement is that the number for a later revision be greater than the number for an earlier revision, but skipping numbers is allowed. Providing an explicit revision number exclusively in atom:archives has both advantages and costs. If we assume that revision numbers start at 0 or 1 and increase monotonically, then revision numbers get us: 1. The ability to name or explicitly identify different versions of an entry. 2. The ability to determine the order in which entries were written -- independent of document order. 3. The ability to detect "missing" entry versions. The cost of the three benefits above is, of course, the increased complexity that comes from needing to maintain the version number associated with an entry. Only if the version number for a particular Entry Representation needs to remain constant. I'd be fine with it either way--simplicity and only #2 (and limited #1--you can do it within the context of a particular instance of the archive, but can't be sure it won't change when you download the archive again) vs. storing the version number and getting all 3. The first two benefits of version numbers can be had without a requirement for maintaining any state if we make the version number a DateTime. Of course, you'd have to store the DateTime. It's more likely that people are already doing that, so for those that are, this would be preferable. Those who aren't are likely not storing the historical states of the entry at all. Is a revision attribute such a "bad thing" that it is really necessary to increase the complexity of the system by requiring that it be stored and maintained external to the feed document itself? Wouldn't it be easier to just allow sites that archive to include the revision number in their feed documents? I'd have no problem with allowing that. 
Given the two arguments above, it would seem that atom:modified (must be updated on the change of any byte in an entry) would provide all of the benefits that appear to be desired with the exception of "missing" entry detection. True. Rather than going back into the whole discussion of dates, we could let people who want benefits #1 and #2 store dc:modified (since atom:modified doesn't currently exist), and those who don't do that can invent their own extension like @revision. So archive documents as defined in Atom wouldn't include either, and the market would show us which would prevail. I'm feeling a little apathetic about exactly how we do it at the moment.
Re: PaceArchiveDocument posted
Yuck. I don't like the granularity of that at all. I can see checking in individual entries, but not a single feed with every entry. What if I'm just changing the value of a single title attribute? Should I have to regenerate the entire feed and check in the entire feed just to update the archive for one minor edit? That would be kind of like taking all the source code for a given project, zipping it up into a single zip file and checking that into subversion. Sure, it would work, but dang that's nasty (not that it's not being done). Better to archive individual entries. Sure, archiving of feeds can be done also, but archiving feeds is separate from archiving entries. They are separate documents, treat them separately. - James M Snell Robert Sayre wrote: Antone Roundy wrote: If Atom Documents are not allowed to contain multiple instances of a particular resource, then archiving the states of an entry would require the feed to be split into more, smaller feed documents any time an entry is edited without many other entries being published in between edits. For example, if you published an entry, decided to revise a paragraph and published again, found a misspelling and published again, and then fixed and published another misspelling, you'd need four documents, two of which would only contain one entry. Doable, yes. But a little ugly. No, I'm saying I would regenerate a feed with every entry in it every time I make a change. Then, I would check it into Subversion. Robert Sayre
Re: PaceArchiveDocument posted
Collecting a bunch of recent discussion into one document, how about these for a set of terms and their meanings:
* Entry: An abstract term describing a unit of content and metadata associated with it.
* Entry Representation: A representation of a particular state of a particular entry.
* Entry Document: A document, whose document element is , which contains a single Entry Representation.
* Feed: An abstract term describing a stream of Entry Representations.
* Feed Document: No such thing--replaced by Collection Documents and Archive Documents.
* Collection: Entry Representations of the current states of the Entries from a Feed.
* Collection Document: A document, whose document element is , which contains a Collection or a portion of a Collection.
* Archive: Entry Representations of the historical states of the Entries from a Feed.
* Archive Document: A document, whose document element is , which contains an Archive or a portion of an Archive. Archive documents may contain multiple Entry Representations of the same Entry.
Publishers may choose to publish only Collection Documents, only Archive Documents, only Entry Documents, or any combination of these. So, for example, a publisher who does not track the historical states of their Entries might publish only a Collection Document. A publisher who DOES track the historical states of their Entries might publish only an Archive Document, or both Collection and Archive Documents, or only a Collection Document. Processing an Archive Document would be slightly more complicated than processing a Collection Document for clients that don't track the history of a Feed--for example, scripts that simply display the current contents of a Document on a website, because they might need to choose one from among multiple Entry Representations of the same Entry for display or other processing. On Monday, February 7, 2005, at 10:13 AM, James M Snell wrote: +1. 
We need to at least discuss the model a bit more before agreeing to a syntax. As with all things, there are many different ways we can do this -- a new top-level element, the @profile attribute Mark and I have been pitching, etc. -- but unless we identify the general requirements and a general model that we're shooting for, whatever we scrape together here at the last minute is not going to be completely adequate. - James M Snell Walter Underwood wrote: I agree, but I would put it another way. The charter requires support for archives, but we don't have a clear model for those. Without a model, we can't spec syntax. So, it is not possible for the current doc to fulfill the charter, and this document is not ready for last call. wunder --On February 6, 2005 2:00:20 AM -0500 Bob Wyman <[EMAIL PROTECTED]> wrote: -1. The use cases for archiving have not been well defined or well discussed on this list. It is, I believe, inappropriate and unwise to try to rush through something this major at the last moment before a pending Last Call. bob wyman -- Walter Underwood Principal Architect, Verity
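Antone's note that clients which don't track history "might need to choose one from among multiple Entry Representations of the same Entry" is easy to make concrete. The sketch below collapses an archive into its current state by keeping, for each atom:id, the representation with the newest atom:version date. It assumes the final Atom 1.0 namespace URI, an `archive` document element, and an `atom:version` child element; those last two names are only proposals from this thread, and `current_representations` is a hypothetical helper, not part of any spec.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumption: the final Atom 1.0 namespace; the 2005 drafts used other URIs.
ATOM = "http://www.w3.org/2005/Atom"
NS = {"atom": ATOM}


def parse_date(text):
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def current_representations(archive_xml):
    """Map each atom:id to its newest Entry Representation in an archive.

    Assumes every entry carries one atom:version date child (hypothetical;
    it was only a proposal in this thread).
    """
    root = ET.fromstring(archive_xml)
    newest = {}
    for entry in root.findall("atom:entry", NS):
        entry_id = entry.findtext("atom:id", namespaces=NS).strip()
        version = parse_date(entry.findtext("atom:version", namespaces=NS))
        kept = newest.get(entry_id)
        if kept is None or version > kept[0]:
            newest[entry_id] = (version, entry)
    return {entry_id: entry for entry_id, (_, entry) in newest.items()}


SAMPLE = f"""<archive xmlns="{ATOM}">
  <entry><id>foo:bar/a</id><version>2005-02-07T10:00:00Z</version><title>First</title></entry>
  <entry><id>foo:bar/a</id><version>2005-02-07T11:00:00Z</version><title>Fixed typo</title></entry>
  <entry><id>foo:bar/b</id><version>2005-02-07T09:00:00Z</version><title>Other</title></entry>
</archive>"""

current = current_representations(SAMPLE)
for entry_id, entry in current.items():
    print(entry_id, entry.findtext("atom:title", namespaces=NS))
# foo:bar/a Fixed typo
# foo:bar/b Other
```

The result is effectively a Collection in Antone's terms, which is why a display-only script could treat an Archive Document this way at the cost of one extra pass.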
Re: PaceArchiveDocument posted
James M Snell wrote: +1. We need to at least discuss the model a bit more before agreeing to a syntax. As with all things, there are many different ways we can do this -- a new top-level element, the @profile attribute Mark and I have been pitching, etc. -- but unless we identify the general requirements and a general model that we're shooting for, whatever we scrape together here at the last minute is not going to be completely adequate. The word "archive" shouldn't have been in the charter. Since no one even bothered to define the term over the past 8 months, I will do it now. Archiving -- It must be possible to serialize a complete collection of entries using the Atom Format. Robert Sayre
Re: PaceArchiveDocument posted
Antone Roundy wrote: If Atom Documents are not allowed to contain multiple instances of a particular resource, then archiving the states of an entry would require the feed to be split into more, smaller feed documents any time an entry is edited without many other entries being published in between edits. For example, if you published an entry, decided to revise a paragraph and published again, found a misspelling and published again, and then fixed and published another misspelling, you'd need four documents, two of which would only contain one entry. Doable, yes. But a little ugly. No, I'm saying I would regenerate a feed with every entry in it every time I make a change. Then, I would check it into Subversion. Robert Sayre
Re: PaceArchiveDocument posted
Hmm... ok, at this point we have a point of disagreement. I see archiving individual entries as being more important than (or at least as important as) archiving feeds. Example: my weblog is a collection of entries, not a collection of feeds. The feed published by my weblog is just a snapshot of a given point in time, designed to allow others to read my entries without visiting my site. When I archive, what I want to archive are the entries, not the feed. An <archive> wrapping <feed> elements does not make any sense to me. An <archive> wrapping <entry> elements does. (Ignore the angle brackets for now; I'm not trying to promote the idea of the archive element, I'm just illustrating my point.) Now, if I had my way, I would *probably* spec this out as:
* Archives work fundamentally on the entry level.
* An archived entry consists of all versions of the entry.
* Feeds are only archived as they relate to archived entries. For example, a given entry may have an associated comments feed, or some form of nested discussion feed, etc. Such feeds are logically considered part of the metadata of the entry.
- James M Snell Robert Sayre wrote: Walter Underwood wrote: I agree, but I would put it another way. The charter requires support for archives, but we don't have a clear model for those. Without a model, we can't spec syntax. We have feed documents. A series of feed documents makes an archive. I don't see why we need atom:archive after all. Robert Sayre
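James's entry-level model can be sketched as a small data structure: one archived entry holds every version of itself, and any feeds tied to it (such as a comments feed) ride along as metadata. This is only an illustration under stated assumptions; `EntryVersion`, `ArchivedEntry`, the field names, and the `"comments"` key are all hypothetical, not anything proposed on the list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class EntryVersion:
    updated: datetime  # when this version was published
    title: str
    content: str


@dataclass
class ArchivedEntry:
    """An archived entry: all of its versions, plus feeds tied to it."""
    entry_id: str
    versions: list = field(default_factory=list)
    # Feeds treated as entry metadata, e.g. a comments feed (hypothetical key).
    related_feeds: dict = field(default_factory=dict)

    def add_version(self, version):
        self.versions.append(version)
        self.versions.sort(key=lambda v: v.updated)  # keep oldest-first order

    def latest(self):
        return self.versions[-1]


utc = timezone.utc
post = ArchivedEntry("tag:example.org,2005:a")
post.add_version(EntryVersion(datetime(2005, 2, 7, 11, tzinfo=utc), "Fixed typo", "..."))
post.add_version(EntryVersion(datetime(2005, 2, 7, 10, tzinfo=utc), "First", "..."))
post.related_feeds["comments"] = "http://example.org/a/comments.atom"
print(post.latest().title)  # Fixed typo
print(len(post.versions))   # 2
```

Note that the containing feed never appears in this model; only entries and their attached feeds are archived, which is the point of disagreement with the "series of feed documents" view.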
RE: PaceArchiveDocument posted
Antone Roundy wrote: foo:bar/a ... ...where @revision is a number whose only requirement is that the number for a later revision be greater than the number for an earlier revision, but skipping numbers is allowed. Providing an explicit revision number exclusively in atom:archives has both advantages and costs. If we assume that revision numbers start at 0 or 1 and increase monotonically, then revision numbers get us: 1. The ability to name or explicitly identify different versions of an entry. 2. The ability to determine the order in which entries were written -- independent of document order. 3. The ability to detect "missing" entry versions. The cost of the three benefits above is, of course, the increased complexity that comes from needing to maintain the version number associated with an entry. Given that "revision" is an attribute which is not to be stored in a normal feed document, this means that the feed document itself cannot be used as the primary entry storage mechanism -- as is the case in some systems today. In order to maintain revision numbers, a site that provided an archive would have to have storage/memory external to the feed document. The feed document could not be considered a complete representation of even the current state of the feed since part of the state of the feed (the revision numbers of the current entries) would be stored externally to the feed. The first two benefits of version numbers can be had without a requirement for maintaining any state if we make the version number a DateTime. The requirement for saving state external to the feed document could be removed by simply permitting the revision number to appear in the feed document. Is the third benefit -- detection of missing entries -- worth the cost of requiring that state be maintained? Is it worth it in light of the fact that a "stateless" alternative exists that provides all the other benefits? 
Is a revision attribute such a "bad thing" that it is really necessary to increase the complexity of the system by requiring that it be stored and maintained external to the feed document itself? Wouldn't it be easier to just allow sites that archive to include the revision number in their feed documents? Given the two arguments above, it would seem that atom:modified (must be updated on the change of any byte in an entry) would provide all of the benefits that appear to be desired with the exception of "missing" entry detection. Of course, if we went a step further and said that the unique identifier for an entry was the concatenation of atom:id + (atom:revision OR atom:modified), then we would no longer require the archive document type at all... But, I won't go there... bob wyman
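Benefit 3 is the one that separates revision numbers from DateTimes: timestamps can be sorted (benefits 1 and 2), but they give no "next expected value", so a gap is only visible when revisions are consecutive integers. A minimal sketch, assuming revisions start at 1 and increase by exactly 1 (which Antone's proposal, allowing skipped numbers, would not guarantee); `missing_revisions` is a hypothetical helper.

```python
from collections import defaultdict


def missing_revisions(representations, first=1):
    """Find skipped revision numbers per atom:id.

    Assumes revisions start at `first` and go up by exactly 1, which is the
    assumption behind benefit 3. A gap after the highest revision seen (a
    newer version that never arrived) cannot be detected this way.
    """
    seen = defaultdict(set)
    for entry_id, revision in representations:
        seen[entry_id].add(revision)
    gaps = {}
    for entry_id, revisions in seen.items():
        missing = sorted(set(range(first, max(revisions) + 1)) - revisions)
        if missing:
            gaps[entry_id] = missing
    return gaps


archive = [("foo:bar/a", 1), ("foo:bar/a", 3), ("foo:bar/b", 1), ("foo:bar/b", 2)]
print(missing_revisions(archive))  # {'foo:bar/a': [2]}
```

If skipping numbers is allowed, the same data only supports ordering, which is exactly what a stored DateTime already provides.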
Re: PaceArchiveDocument posted
On Monday, February 7, 2005, at 10:06 AM, Robert Sayre wrote: Walter Underwood wrote: I agree, but I would put it another way. The charter requires support for archives, but we don't have a clear model for those. Without a model, we can't spec syntax. We have feed documents. A series of feed documents makes an archive. I don't see why we need atom:archive after all. If Atom Documents are not allowed to contain multiple instances of a particular resource, then archiving the states of an entry would require the feed to be split into more, smaller feed documents any time an entry is edited without many other entries being published in between edits. For example, if you published an entry, decided to revise a paragraph and published again, found a misspelling and published again, and then fixed and published another misspelling, you'd need four documents, two of which would only contain one entry. Doable, yes. But a little ugly. If, on the other hand, a feed document is a sliding window into the historical states of entries (and thus, must allow multiple instances of particular entries), and if we don't want to archive the state of the feed, then we don't need a separate archive document type. The latter seems likely to be supported by the WG, but the former does not. I'd rather have an archive document type, and not repeat entries in "normal" feeds.
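The "four documents" arithmetic can be checked with a simple greedy split: walk the publication log in order and start a new document whenever the next atom:id is already in the current one. This is a sketch of one way to honor a one-instance-per-document rule, not a proposed algorithm; `split_without_repeats` is hypothetical, and the input below is just one publication sequence consistent with Antone's example.

```python
def split_without_repeats(publications):
    """Greedily split a publication log into documents with unique atom:ids.

    `publications` lists atom:id values in publication order. A new document
    starts whenever the next id is already present in the current one.
    """
    documents, current, ids = [], [], set()
    for entry_id in publications:
        if entry_id in ids:
            documents.append(current)
            current, ids = [], set()
        current.append(entry_id)
        ids.add(entry_id)
    if current:
        documents.append(current)
    return documents


# Entry "a" is published, then revised three times, with "b" and "c" in between.
print(split_without_repeats(["a", "b", "a", "a", "c", "a"]))
# [['a', 'b'], ['a'], ['a', 'c'], ['a']]
```

Four documents, two of them holding a single entry: the fragmentation Antone calls "doable, but a little ugly".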
Re: PaceArchiveDocument posted
+1. We need to at least discuss the model a bit more before agreeing to a syntax. As with all things, there are many different ways we can do this -- a new top-level element, the @profile attribute Mark and I have been pitching, etc. -- but unless we identify the general requirements and a general model that we're shooting for, whatever we scrape together here at the last minute is not going to be completely adequate. - James M Snell Walter Underwood wrote: I agree, but I would put it another way. The charter requires support for archives, but we don't have a clear model for those. Without a model, we can't spec syntax. So, it is not possible for the current doc to fulfill the charter, and this document is not ready for last call. wunder --On February 6, 2005 2:00:20 AM -0500 Bob Wyman <[EMAIL PROTECTED]> wrote: -1. The use cases for archiving have not been well defined or well discussed on this list. It is, I believe, inappropriate and unwise to try to rush through something this major at the last moment before a pending Last Call. bob wyman -- Walter Underwood Principal Architect, Verity
Re: PaceArchiveDocument posted
Walter Underwood wrote: I agree, but I would put it another way. The charter requires support for archives, but we don't have a clear model for those. Without a model, we can't spec syntax. We have feed documents. A series of feed documents makes an archive. I don't see why we need atom:archive after all. Robert Sayre
RE: PaceArchiveDocument posted
I agree, but I would put it another way. The charter requires support for archives, but we don't have a clear model for those. Without a model, we can't spec syntax. So, it is not possible for the current doc to fulfill the charter, and this document is not ready for last call. wunder --On February 6, 2005 2:00:20 AM -0500 Bob Wyman <[EMAIL PROTECTED]> wrote: -1. The use cases for archiving have not been well defined or well discussed on this list. It is, I believe, inappropriate and unwise to try to rush through something this major at the last moment before a pending Last Call. bob wyman -- Walter Underwood Principal Architect, Verity
Re: PaceArchiveDocument posted
On Sunday, February 6, 2005, at 09:35 AM, Sam Ruby wrote: Graham wrote: On 6 Feb 2005, at 3:39 pm, Sam Ruby wrote: If you produce feeds that contain multiple entries with the same id, there will be people who misunderstand such documents. I do believe that there needs to be some way to say "this is not a feed, but an archive". Solution: A new atom:archive top level element with normal head and entry children. Difference is multiple instances of entries are allowed with the same id. Each is distinguished by a new atom:version date, that has no semantics other than determining which instance is the newest. Meanwhile, atom:version and multiple instances are explicitly banned from atom:feed. ? Works for me. Presumably, atom:entry elements can have at most one atom:version child element. But are atom:version elements required in atom:archives? Okay, so we are punting on defining a way to archive the changing state of feed/head. Making something like the above semi-concrete: ... foo:bar/a ... foo:bar/a ... foo:bar/a ... ... ...where @revision is a number whose only requirement is that the number for a later revision be greater than the number for an earlier revision, but skipping numbers is allowed. @revision could go in instead of , or could be . We're at the deadline for paces. Someone who wants something like this should write one today.
Re: PaceArchiveDocument posted
Antone Roundy wrote: I'd rather have held off while we discussed further, but as the deadline is approaching, here it is. Abstract Creates a new option for the document element, , which can contain multiple feeds or instances of the same feed, in order to archive the states of a feed or feeds and the states of the entries published while the feed was in each of those states. Specifies that multiple instances of a resource with the same atom:id is illegal in Feed Documents, Entry Documents, and if PaceAggregationDocument2 is adopted, Aggregation Documents, but is legal in Archive Documents. Rationale 1. Our charter speaks of creating an archive format. 2. If we wish to be able to archive multiple revisions of an entry or the contents of a feed's head in a single document, we must either specify that the atom:id of a resource be repeatable within a document intended as an archive, or that we invent some other method of identifying multiple instances of the same entry or feed metadata. Multiple instances within an archive type document would be simpler. 3. Multiple versions of an entry or feed in a non-archive document are unprecedented in syndication formats. In spite of the fact that changing feed metadata after an entry is published breaks the connection between the state of the feed metadata at the time of publishing the entry and the entry, this is how feeds have always worked, so no exception to the one-feed-instance-per-document rule need be made for Aggregation Documents--that is a special case reserved only for archiving. See http://www.intertwingly.net/wiki/pie/PaceArchiveDocument for more. I'm +0 on repeated instances for archives, if it's sufficient for some use case and doesn't delay the draft. But, given that I don't know what the use case for this is, I'm concerned that this is going to be the start of a slew of issues. 
Some people may be expecting to be able to use the "archive format" to store the entire state of their blog/wiki/whatever, at least as a snapshot. But all the current proposals ignore the crucial issue of how I archive the state of my cat pictures. Seriously -- how can I create a self-contained document that includes both entry and non-entry resources which point at each other? [1]. To be clear, I think this is not something we should tackle at this point, but I fear that the word "archive" may conjure up an image of something you could use for, say, backups. Second, if the intent of tracking state is to be able to go back to a prior version of the feed: This ignores the issues of intra-feed links, for example those of comment feed(s). For example, let's say I get an archive of the history of a (comment) feed. Each entry in this feed will have a URI for the parent entry. But, the parent entry itself can change over time -- the URI is for an entry, not an entry instance. For general URIs, this is just something to live with; but one might reasonably expect an archive format to be able to archive an entry plus its comments together, and if it claims to archive historical changes, to be able to get to a 'state of the site' on demand. Finally, how does one obtain an archive document? A new endpoint? -John Panzer http://journals.aol.com/panzerjohn/abstractioneer [1] This might be a good use for multipart/related and cid:. But that's a much bigger kettle of fish.
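John's comment-feed problem (the parent URI names an entry, not an entry instance) amounts to asking which version of the parent was current when a comment was posted. Given a timestamped history per entry, that lookup is a binary search. This is a sketch under assumptions: the history is held as sorted (timestamp, version) pairs, and `state_as_of` is a hypothetical helper; nothing in the proposals defines this.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def state_as_of(history, when):
    """Return the version in effect at `when`, or None if none existed yet.

    `history` is a list of (timestamp, version) pairs sorted by timestamp.
    """
    timestamps = [t for t, _ in history]
    i = bisect_right(timestamps, when)
    return history[i - 1][1] if i else None


utc = timezone.utc
parent_history = [
    (datetime(2005, 2, 7, 10, tzinfo=utc), "original post"),
    (datetime(2005, 2, 7, 12, tzinfo=utc), "edited post"),
]
comment_posted = datetime(2005, 2, 7, 11, tzinfo=utc)
print(state_as_of(parent_history, comment_posted))  # original post
```

Applying the same lookup to every entry at one instant would give the "state of the site" John mentions, but only for entries; it does nothing for the non-entry resources (the cat pictures) he raises.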
Re: PaceArchiveDocument posted
Graham wrote: On 6 Feb 2005, at 3:39 pm, Sam Ruby wrote: If you produce feeds that contain multiple entries with the same id, there will be people who misunderstand such documents. I do believe that there needs to be some way to say "this is not a feed, but an archive". Solution: A new atom:archive top level element with normal head and entry children. Difference is multiple instances of entries are allowed with the same id. Each is distinguished by a new atom:version date, that has no semantics other than determining which instance is the newest. Meanwhile, atom:version and multiple instances are explicitly banned from atom:feed. ? Works for me. Presumably, atom:entry elements can have at most one atom:version child element. But are atom:version elements required in atom:archives? - Sam Ruby
Re: PaceArchiveDocument posted
Graham wrote: On 6 Feb 2005, at 3:39 pm, Sam Ruby wrote: If you produce feeds that contain multiple entries with the same id, there will be people who misunderstand such documents. I do believe that there needs to be some way to say "this is not a feed, but an archive". Solution: A new atom:archive top level element with normal head and entry children. Difference is multiple instances of entries are allowed with the same id. Each is distinguished by a new atom:version date, that has no semantics other than determining which instance is the newest. Meanwhile, atom:version and multiple instances are explicitly banned from atom:feed. An uncomfortable thought (for me). Perhaps this is a use case for @profile. If Mark is following this, he might be able to comment on that. Mark? cheers Bill
Re: PaceArchiveDocument posted
On 6 Feb 2005, at 3:39 pm, Sam Ruby wrote: If you produce feeds that contain multiple entries with the same id, there will be people who misunderstand such documents. I do believe that there needs to be some way to say "this is not a feed, but an archive". Solution: A new atom:archive top level element with normal head and entry children. Difference is multiple instances of entries are allowed with the same id. Each is distinguished by a new atom:version date, that has no semantics other than determining which instance is the newest. Meanwhile, atom:version and multiple instances are explicitly banned from atom:feed. ? Graham
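Graham's rules are mechanical enough to check. The sketch below reports violations: in a feed, repeated atom:id values and any atom:version are errors; in an archive, each entry may carry at most one atom:version (Sam's reading). Sam's open question of whether atom:version is required is answered here by assumption: it is required only when an id repeats. The namespace is the final Atom 1.0 URI, `archive` and `version` are only the names proposed in this thread, and `check_document` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: the final Atom 1.0 namespace; "archive" and "version" are
# only the proposed (hypothetical) names from this thread.
ATOM = "http://www.w3.org/2005/Atom"
NS = {"atom": ATOM}


def check_document(xml_text):
    """Return rule violations under Graham's proposed feed/archive split."""
    root = ET.fromstring(xml_text)
    kind = root.tag.split("}")[-1]  # "feed" or the proposed "archive"
    entries = root.findall("atom:entry", NS)
    ids = [e.findtext("atom:id", default="", namespaces=NS).strip() for e in entries]
    counts = Counter(ids)
    problems = []
    if kind == "feed":
        problems += [f"duplicate atom:id {i}" for i, n in counts.items() if n > 1]
        if any(e.find("atom:version", NS) is not None for e in entries):
            problems.append("atom:version is not allowed in atom:feed")
    elif kind == "archive":
        for entry_id, e in zip(ids, entries):
            versions = e.findall("atom:version", NS)
            if len(versions) > 1:
                problems.append(f"{entry_id} has more than one atom:version")
            elif not versions and counts[entry_id] > 1:
                problems.append(f"repeated {entry_id} lacks atom:version")
    return problems


FEED = (f'<feed xmlns="{ATOM}"><entry><id>foo:bar/a</id></entry>'
        '<entry><id>foo:bar/a</id></entry></feed>')
ARCHIVE = (f'<archive xmlns="{ATOM}">'
           '<entry><id>foo:bar/a</id><version>2005-02-06T10:00:00Z</version></entry>'
           '<entry><id>foo:bar/a</id><version>2005-02-06T11:00:00Z</version></entry>'
           '</archive>')
print(check_document(FEED))     # ['duplicate atom:id foo:bar/a']
print(check_document(ARCHIVE))  # []
```

Because the checks key off the document element alone, a consumer can tell "this is not a feed, but an archive" before interpreting repeated ids, which is the misunderstanding Sam wants to prevent.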
Re: PaceArchiveDocument posted
Henry Story wrote: On 6 Feb 2005, at 08:00, Bob Wyman wrote: -1. The use cases for archiving have not been well defined or well discussed on this list. It is, I believe, inappropriate and unwise to try to rush through something this major at the last moment before a pending Last Call. I agree. Very serious -1 for me. I think the versioning element of id works very well. I am using it in my BlogEd model currently. I think the feed works very well as an archiving format already. No need to change anything that works well. Furthermore I think one should prove that harm is being done by the id as version feature (which would be difficult to show, because I have implemented this, Bob has implemented it, and neither of us have come across a problem). We already have a problem - Bob wanting updated to mean modified. Pardon me for observing that the two individuals mentioned are above average in intelligence, and actually read specifications - an unusual trait. If you produce feeds that contain multiple entries with the same id, there will be people who misunderstand such documents. I do believe that there needs to be some way to say "this is not a feed, but an archive". - Sam Ruby
Re: PaceArchiveDocument posted
On 6 Feb 2005, at 08:00, Bob Wyman wrote: -1. The use cases for archiving have not been well defined or well discussed on this list. It is, I believe, inappropriate and unwise to try to rush through something this major at the last moment before a pending Last Call. I agree. Very serious -1 for me. I think the versioning element of id works very well. I am using it in my BlogEd model currently. I think the feed works very well as an archiving format already. No need to change anything that works well. Furthermore I think one should prove that harm is being done by the id as version feature (which would be difficult to show, because I have implemented this, Bob has implemented it, and neither of us have come across a problem). Henry bob wyman
RE: PaceArchiveDocument posted
-1. The use cases for archiving have not been well defined or well discussed on this list. It is, I believe, inappropriate and unwise to try to rush through something this major at the last moment before a pending Last Call. bob wyman
Re: PaceArchiveDocument posted
Hmm... I'm sorry, but this just seems weird to me. ... id:version1 id:version2 id:version3 What is the point of having the feed elements in there at all? If entries are indeed able to stand on their own, why not just go ahead and get rid of the containing feed element altogether? I mean, it is the entries that are being archived, not the feeds that just happened to contain them at some moment in time, right? ... id:version1 id:version2 id:version3 I guess I just don't see the point of archiving the feed. - James M Snell Antone Roundy wrote: I'd rather have held off while we discussed further, but as the deadline is approaching, here it is. Abstract Creates a new option for the document element, , which can contain multiple feeds or instances of the same feed, in order to archive the states of a feed or feeds and the states of the entries published while the feed was in each of those states. Specifies that multiple instances of a resource with the same atom:id is illegal in Feed Documents, Entry Documents, and if PaceAggregationDocument2 is adopted, Aggregation Documents, but is legal in Archive Documents. Rationale 1. Our charter speaks of creating an archive format. 2. If we wish to be able to archive multiple revisions of an entry or the contents of a feed's head in a single document, we must either specify that the atom:id of a resource be repeatable within a document intended as an archive, or that we invent some other method of identifying multiple instances of the same entry or feed metadata. Multiple instances within an archive type document would be simpler. 3. Multiple versions of an entry or feed in a non-archive document are unprecedented in syndication formats. 
In spite of the fact that changing feed metadata after an entry is published breaks the connection between the state of the feed metadata at the time of publishing the entry and the entry, this is how feeds have always worked, so no exception to the one-feed-instance-per-document rule need be made for Aggregation Documents--that is a special case reserved only for archiving. See http://www.intertwingly.net/wiki/pie/PaceArchiveDocument for more.