RE: PaceArchiveDocument posted

2005-02-07 Thread Walter Underwood

I agree, but I would put it another way. The charter requires support
for archives, but we don't have a clear model for those. Without a
model, we can't spec syntax.

So, it is not possible for the current doc to fulfill the charter, and
this document is not ready for last call.

wunder

--On February 6, 2005 2:00:20 AM -0500 Bob Wyman [EMAIL PROTECTED] wrote:

 
 -1.
   The use cases for archiving have not been well defined or well
 discussed on this list. It is, I believe, inappropriate and unwise to try to
 rush through something this major at the last moment before a pending Last
 Call.
 
   bob wyman
 
 
 



--
Walter Underwood
Principal Architect, Verity



Re: PaceArchiveDocument posted

2005-02-07 Thread Robert Sayre
Walter Underwood wrote:
I agree, but I would put it another way. The charter requires support
for archives, but we don't have a clear model for those. Without a
model, we can't spec syntax.
We have feed documents. A series of feed documents makes an archive. I 
don't see why we need atom:archive after all.

Robert Sayre


Re: PaceArchiveDocument posted

2005-02-07 Thread James M Snell
+1.  We need to at least discuss the model a bit more before agreeing to
a syntax.  As with all things, there are many different ways we can do
this -- a new top-level element, the @profile attribute Mark and I have
been pitching, etc. -- but unless we identify the general requirements
and a general model that we're shooting for, whatever we scrape together
here at the last minute is not going to be completely adequate.
- James M Snell
Walter Underwood wrote:
I agree, but I would put it another way. The charter requires support
for archives, but we don't have a clear model for those. Without a
model, we can't spec syntax.
So, it is not possible for the current doc to fulfill the charter, and
this document is not ready for last call.
wunder
--On February 6, 2005 2:00:20 AM -0500 Bob Wyman [EMAIL PROTECTED] wrote:

-1.
The use cases for archiving have not been well defined or well
discussed on this list. It is, I believe, inappropriate and unwise to try to
rush through something this major at the last moment before a pending Last
Call.
bob wyman



--
Walter Underwood
Principal Architect, Verity




Re: PaceArchiveDocument posted

2005-02-07 Thread Antone Roundy
On Monday, February 7, 2005, at 10:06  AM, Robert Sayre wrote:
Walter Underwood wrote:
I agree, but I would put it another way. The charter requires support
for archives, but we don't have a clear model for those. Without a
model, we can't spec syntax.
We have feed documents. A series of feed documents makes an archive. I 
don't see why we need atom:archive after all.

If Atom Documents are not allowed to contain multiple instances of a 
particular resource, then archiving the states of an entry would 
require the feed to be split into more, smaller feed documents any time 
an entry is edited without many other entries being published in 
between edits.  For example, if you published an entry, decided to 
revise a paragraph and published again, found a misspelling and 
published again, and then fixed and published another misspelling, 
you'd need four documents, two of which would only contain one entry.  
Doable, yes.  But a little ugly.
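
The splitting described above can be sketched in a few lines. This is only an illustration, assuming that each published state is a hypothetical (entry id, revision) pair and that a document is closed as soon as it would otherwise repeat an id; the function name is made up for the example and is not part of any Atom draft.

```python
def split_into_documents(states):
    """Greedily pack entry states, given in publication order, into
    documents so that no document contains the same entry id twice.

    `states` is a list of (entry_id, revision) pairs. Both the pair
    layout and this function are assumptions made for illustration.
    """
    documents = []
    current = []
    seen_ids = set()
    for entry_id, revision in states:
        if entry_id in seen_ids:
            # Adding this state would repeat an id, so close the
            # current document and start a new one.
            documents.append(current)
            current = []
            seen_ids = set()
        current.append((entry_id, revision))
        seen_ids.add(entry_id)
    if current:
        documents.append(current)
    return documents


# One entry published and then revised three times, with one
# unrelated entry before it and one after it.
history = [
    ("other", 1),
    ("post", 1), ("post", 2), ("post", 3), ("post", 4),
    ("later", 1),
]
for number, document in enumerate(split_into_documents(history), start=1):
    print(number, document)
```

With this input the four states of "post" land in four separate documents, and the two middle documents hold nothing but a single state of "post".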

If, on the other hand, a feed document is a sliding window into the 
historical states of entries (and thus must allow multiple instances 
of particular entries), and if we don't want to archive the state of 
the feed, then we don't need a separate archive document type.  The 
latter seems likely to be supported by the WG, but the former does not. 
 I'd rather have an archive document type, and not repeat entries in 
normal feeds.



RE: PaceArchiveDocument posted

2005-02-07 Thread Bob Wyman

Antone Roundy wrote:
   <entry>
   <id revision="3">foo:bar/a</id>
   ...
   </entry>
 ...where @revision is a number whose only requirement is that the 
 number for a later revision be greater than the number for an
 earlier revision, but skipping numbers is allowed.
Providing an explicit revision number exclusively in atom:archives
has both advantages and costs.
If we assume that revision numbers start at 0 or 1 and increase
monotonically, then revision numbers get us:
1. The ability to name or explicitly identify different versions of
an entry.
2. The ability to determine the order in which entries were written
-- independent of document order.
3. The ability to detect missing entry versions.
The cost of the three benefits above is, of course, the increased
complexity that comes from needing to maintain the version number associated
with an entry. Given that revision is an attribute which is not to be
stored in a normal feed document, this means that the feed document itself
cannot be used as the primary entry storage mechanism -- as is the case in
some systems today. In order to maintain revision numbers, a site that
provided an archive would have to have storage/memory external to the feed
document. The feed document could not be considered a complete
representation of even the current state of the feed since part of the
state of the feed (the revision numbers of the current entries) would be
stored externally to the feed.
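
Benefits 2 and 3 from the list above can be illustrated with a short sketch. It assumes revision numbers start at 1 and increase by exactly one per revision; if skipping numbers is allowed, as in the quoted proposal, the gap check tells you nothing. The function name is hypothetical.

```python
def order_and_find_gaps(revisions, first=1):
    """Return (ordered, missing) for the revision numbers seen for one entry.

    `ordered` is the set of revisions in write order, independent of the
    order in which they appeared in the document. `missing` lists the
    revision numbers that were never seen. The assumption that numbering
    starts at `first` and has no intentional gaps is ours.
    """
    ordered = sorted(set(revisions))
    if not ordered:
        return [], []
    expected = set(range(first, ordered[-1] + 1))
    missing = sorted(expected - set(ordered))
    return ordered, missing


# Revisions 1, 2 and 4 arrived, in arbitrary document order.
ordered, missing = order_and_find_gaps([4, 1, 2])
print(ordered)
print(missing)
```

Note that the publisher still has to remember the last number it handed out for each entry, which is exactly the external state discussed above.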

The first two benefits of version numbers can be had without a
requirement for maintaining any state if we make the version number a
DateTime. The requirement for saving state external to the feed document
could be removed by simply permitting the revision number to appear in the
feed document.
Is the third benefit -- detection of missing entries -- worth the
cost of requiring that state be maintained? Is it worth it in light of the
fact that a stateless alternative exists that provides all the other
benefits?
Is a revision attribute such a bad thing that it is really
necessary to increase the complexity of the system by requiring that it be
stored and maintained external to the feed document itself? Wouldn't it be
easier to just allow sites that archive to include the revision number in
their feed documents?
Given the two arguments above, it would seem that atom:modified
(must be updated on the change of any byte in an entry) would provide all of
the benefits that appear to be desired with the exception of missing entry
detection. 
Of course, if we went a step further and said that the unique
identifier for an entry was the concatenation of atom:id + (atom:revision OR
atom:modified), then we would no longer require the archive document type at
all...
But, I won't go there... 
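
For illustration only, the stateless alternative can be sketched as follows: each version of an entry is keyed by its id plus a modification time. The dictionary layout and the field names "id" and "modified" are assumptions made for the example, not an Atom data model.

```python
from datetime import datetime, timezone


def version_key(entry):
    """Identify one version of an entry by (id, modified)."""
    return (entry["id"], entry["modified"])


def index_versions(entries):
    """Map each (id, modified) key to the entry version it names."""
    return {version_key(entry): entry for entry in entries}


def versions_in_order(index, entry_id):
    """Return all known versions of one entry, oldest first."""
    stamps = sorted(m for (i, m) in index if i == entry_id)
    return [index[(entry_id, m)] for m in stamps]


versions = [
    {"id": "foo:bar/a", "title": "Fixed typo",
     "modified": datetime(2005, 2, 7, 12, 0, tzinfo=timezone.utc)},
    {"id": "foo:bar/a", "title": "First draft",
     "modified": datetime(2005, 2, 7, 9, 0, tzinfo=timezone.utc)},
]
index = index_versions(versions)
for version in versions_in_order(index, "foo:bar/a"):
    print(version["modified"].isoformat(), version["title"])
```

This gives naming and ordering of versions without any counter to maintain, but nothing in a timestamp reveals whether a version in between is missing.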

bob wyman




Re: PaceArchiveDocument posted

2005-02-07 Thread James M Snell
Hmm... ok, at this point we have a point of disagreement.  I see 
archiving individual entries as being more important than (or at least 
as important as) archiving feeds.

Example: my weblog is a collection of entries, not a collection of 
feeds.  The feed published by my weblog is just a snapshot of a given 
point in time designed to allow others to read my entries without 
visiting my site.  When I archive, what I want to archive are the 
entries, not the feed.  <archive><feed /><feed /></archive> does not 
make any sense to me.  <archive><entry /><entry /></archive> does. 
(ignore the angle brackets for now, I'm not trying to promote the idea 
of the archive element, I'm just illustrating my point).

Now, if I had my way, I would *probably* spec this out as:
* Archives work fundamentally on the entry level.
* An archived entry consists of all versions of the entry.
* Feeds are only archived as they relate to archived entries. For 
example, if a given entry has an associated comments feed, or some form 
of nested discussion feed, etc.  Such feeds are logically considered 
part of the metadata of the entry.

- James M Snell
Robert Sayre wrote:
Walter Underwood wrote:
I agree, but I would put it another way. The charter requires support
for archives, but we don't have a clear model for those. Without a
model, we can't spec syntax.
We have feed documents. A series of feed documents makes an archive. I 
don't see why we need atom:archive after all.

Robert Sayre




Re: PaceArchiveDocument posted

2005-02-07 Thread Robert Sayre
James M Snell wrote:
+1.  We need to at least discuss the model a bit more before agreeing to
a syntax.  As with all things, there are many different ways we can do
this -- a new top-level element, the @profile attribute Mark and I have
been pitching, etc. -- but unless we identify the general requirements
and a general model that we're shooting for, whatever we scrape together
here at the last minute is not going to be completely adequate.
The word archive shouldn't have been in the charter. Since no one even 
bothered to define the term over the past 8 months, I will do it now.

Archiving -- It must be possible to serialize a complete collection of 
entries using the Atom Format.

Robert Sayre


Re: PaceArchiveDocument posted

2005-02-07 Thread Antone Roundy
Collecting a bunch of recent discussion into one document, how about 
these for a set of terms and their meanings:

* Entry: An abstract term describing a unit of content and metadata 
associated with it.
* Entry Representation: A representation of a particular state of a 
particular entry.
* Entry Document: A document, whose document element is entry, which 
contains a single Entry Representation.
* Feed: An abstract term describing a stream of Entry Representations.
* Feed Document: No such thing--replaced by Collection Documents and 
Archive Documents.
* Collection: Entry Representations of the current states of the 
Entries from a Feed.
* Collection Document: A document, whose document element is 
collection, which contains a Collection or a portion of a Collection.
* Archive: Entry Representations of the historical states of the 
Entries from a Feed.
* Archive Document: A document, whose document element is archive, 
which contains an Archive or a portion of an Archive.  Archive 
documents may contain multiple Entry Representations of the same Entry.

Publishers may choose to publish only Collection Documents, only 
Archive Documents, only Entry Documents, or any combination of these.  
So, for example, a publisher who does not track the historical states 
of their Entries might publish only a Collection Document.  A publisher 
who DOES track the historical states of their Entries might publish 
only an Archive Document, or both Collection and Archive Documents, or 
only a Collection Document.

Processing an Archive Document would be slightly more complicated than 
processing a Collection Document for clients that don't track the 
history of a Feed--for example, scripts that simply display the current 
contents of a Document on a website, because they might need to choose 
one from among multiple Entry Representations of the same Entry for 
display or other processing.
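
The extra step such a client would need can be sketched briefly. This assumes each Entry Representation carries some comparable ordering value, called "order" here; that field is hypothetical and could be a revision number or a modification time, since the terms above do not say how representations are ordered.

```python
def current_representations(archive):
    """Pick the latest Entry Representation for each Entry in an archive.

    `archive` is a list of dicts with an "id" and a comparable "order"
    value. Both field names are assumptions made for illustration.
    """
    latest = {}
    for representation in archive:
        kept = latest.get(representation["id"])
        if kept is None or representation["order"] > kept["order"]:
            latest[representation["id"]] = representation
    return list(latest.values())


archive = [
    {"id": "a", "order": 1, "title": "draft"},
    {"id": "b", "order": 2, "title": "only version"},
    {"id": "a", "order": 3, "title": "final"},
]
for representation in current_representations(archive):
    print(representation["id"], representation["title"])
```

In effect this derives a Collection from an Archive, which is why a publisher that tracks history could offer either document type from the same stored data.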

On Monday, February 7, 2005, at 10:13  AM, James M Snell wrote:
+1.  We need to at least discuss the model a bit more before agreeing 
to
a syntax.  As with all things, there are many different ways we can do
this -- a new top-level element, the @profile attribute Mark and I 
have
been pitching, etc. -- but unless we identify the general requirements
and a general model that we're shooting for, whatever we scrape together
here at the last minute is not going to be completely adequate.

- James M Snell
Walter Underwood wrote:
I agree, but I would put it another way. The charter requires support
for archives, but we don't have a clear model for those. Without a
model, we can't spec syntax.
So, it is not possible for the current doc to fulfill the charter, and
this document is not ready for last call.
wunder
--On February 6, 2005 2:00:20 AM -0500 Bob Wyman [EMAIL PROTECTED] wrote:
-1.
	The use cases for archiving have not been well defined or well
discussed on this list. It is, I believe, inappropriate and unwise 
to try to
rush through something this major at the last moment before a 
pending Last
Call.

bob wyman

--
Walter Underwood
Principal Architect, Verity




Re: PaceArchiveDocument posted

2005-02-07 Thread James M Snell
Yuck. I don't like the granularity of that at all.  I can see checking 
in individual entries, but not a single feed with every entry. What if 
I'm just changing the value of a single title attribute? Should I have 
to regenerate the entire feed and check in the entire feed just to 
update the archive for one minor edit?  That would be kind of like 
taking all the source code for a given project, zipping it up into a 
single zip file and checking that into subversion.  Sure, it would work, 
but dang that's nasty (not that it's not being done).  Better to archive 
individual entries.  Sure, archiving of feeds can be done also, but 
archiving feeds is separate from archiving entries.  They are separate 
documents, treat them separately.

- James M Snell
Robert Sayre wrote:
Antone Roundy wrote:

If Atom Documents are not allowed to contain multiple instances of a 
particular resource, then archiving the states of an entry would 
require the feed to be split into more, smaller feed documents any 
time an entry is edited without many other entries being published in 
between edits.  For example, if you published an entry, decided to 
revise a paragraph and published again, found a misspelling and 
published again, and then fixed and published another misspelling, 
you'd need four documents, two of which would only contain one entry.  
Doable, yes.  But a little ugly.

No, I'm saying I would regenerate a feed with every entry in it every 
time I make a change. Then, I would check it into Subversion.

Robert Sayre




Re: PaceArchiveDocument posted

2005-02-07 Thread Antone Roundy
On Monday, February 7, 2005, at 10:26  AM, Bob Wyman wrote:
Antone Roundy wrote:
<entry>
<id revision="3">foo:bar/a</id>
...
</entry>
...where @revision is a number whose only requirement is that the
number for a later revision be greater than the number for an
earlier revision, but skipping numbers is allowed.
	Providing an explicit revision number exclusively in atom:archives
has both advantages and costs.
	If we assume that revision numbers start at 0 or 1 and increase
monotonically, then revision numbers get us:
	1. The ability to name or explicitly identify different versions of
an entry.
	2. The ability to determine the order in which entries were written
-- independent of document order.
	3. The ability to detect missing entry versions.
	The cost of the three benefits above is, of course, the increased
complexity that comes from needing to maintain the version number 
associated
with an entry.
Only if the version number for a particular Entry Representation needs 
to remain constant.  I'd be fine with it either way--simplicity and 
only #2 (and limited #1--you can do it within the context of a 
particular instance of the archive, but can't be sure it won't change 
when you download the archive again) vs. storing the version number and 
getting all 3.

The first two benefits of version numbers can be had without a
requirement for maintaining any state if we make the version number a
DateTime.
Of course, you'd have to store the DateTime.  It's more likely that 
people are already doing that, so for those that are, this would be 
preferable.  Those who aren't are likely not storing the historical 
states of the entry at all.

	Is a revision attribute such a bad thing that it is really
necessary to increase the complexity of the system by requiring that 
it be
stored and maintained external to the feed document itself? Wouldn't 
it be
easier to just allow sites that archive to include the revision number 
in
their feed documents?
I'd have no problem with allowing that.
	Given the two arguments above, it would seem that atom:modified
(must be updated on the change of any byte in an entry) would provide 
all of
the benefits that appear to be desired with the exception of missing 
entry
detection.
True.  Rather than going back into the whole discussion of dates, we 
could let people who want benefits #1 and #2 store dc:modified (since 
atom:modified doesn't currently exist), and those who don't want to do 
that can invent their own extension, like @revision.  So archive 
documents as defined in Atom wouldn't include either, and the market 
would show us which would prevail.

I'm feeling a little apathetic about exactly how we do it at the moment.


Re: PaceArchiveDocument posted

2005-02-07 Thread Henry Story
I think that the complexity of this proposal is proof of its failure.
If you look at a Feed document as simply a sliding window view into
the historical state of entries instead of a sliding window view into the
current state of entries (though as I have shown these can overlap),
then you have your archive document already.
HELLO GUYS/GALS YOU ARE THERE AT THE FINISH LINE. IT ALL WORKS!
One of the arguments against the sliding window view into the historical
state of entries is that it was too complicated. But clearly not going
that way is making things WAY MORE COMPLICATED.
So before proceeding any further it may be worth now comparing the
complexity of both proposals in detail. My guess is that the historical 
one is just a little surprising, but that is all.

Henry Story


Re: PaceArchiveDocument posted

2005-02-07 Thread Henry Story
On 7 Feb 2005, at 18:29, Antone Roundy wrote:
The latter seems likely to be supported by the WG, but the former does 
not.  I'd rather have an archive document type, and not repeat entries 
in normal feeds.
I don't think the historical sliding window view forces you at all
to duplicate the entries in your feed. The spec allows you to remove 
all the old versions if you wish. After all, the present time is just
one element in the sequence of history.

People who only want to live in the present don't negate history. They
just don't remember it.
Henry Story


Re: PaceArchiveDocument posted

2005-02-06 Thread Sam Ruby
Henry Story wrote:
On 6 Feb 2005, at 08:00, Bob Wyman wrote:
-1.
The use cases for archiving have not been well defined or well
discussed on this list. It is, I believe, inappropriate and unwise to 
try to
rush through something this major at the last moment before a pending 
Last
Call.
I agree. Very serious -1 for me.
I think the versioning element of id works very well. I am using it
in my BlogEd model currently. I think the feed works very well as
an archiving format already. No need to change anything that works well.
Furthermore I think one should prove that harm is being done by the
id as version feature (which would be difficult to show, because I have 
implemented this, Bob has implemented it, and neither of us has
come across a problem).
We already have a problem - Bob wanting updated to mean modified.
Pardon me for observing that the two individuals mentioned are above 
average in intelligence, and actually read specifications - an unusual 
trait.

If you produce feeds that contain multiple entries with the same id, 
there will be people who misunderstand such documents.

I do believe that there needs to be some way to say this is not a feed, 
but an archive.

- Sam Ruby



Re: PaceArchiveDocument posted

2005-02-05 Thread James M Snell
Hmm.. I'm sorry but this just seems weird to me.
<archive>
  <head>...</head>
  <feed>
    <entry>
      <id>id:version1</id>
    </entry>
  </feed>
  <feed>
    <entry>
      <id>id:version2</id>
    </entry>
  </feed>
  <feed>
    <entry>
      <id>id:version3</id>
    </entry>
  </feed>
</archive>
What is the point of having the feed elements in there at all?  If
entries are indeed able to stand on their own, why not just go ahead and
get rid of the containing feed element altogether?  I mean, it is the
entries that are being archived, not the feeds that just happened to
contain them at some moment in time right?
<archive>
  <head>...</head>
  <entry>
    <id>id:version1</id>
  </entry>
  <entry>
    <id>id:version2</id>
  </entry>
  <entry>
    <id>id:version3</id>
  </entry>
</archive>
I guess I just don't see the point of archiving the feed.
- James M Snell
Antone Roundy wrote:
I'd rather have held off while we discussed further, but as the deadline 
is approaching, here it is.

Abstract
Creates a new option for the document element, archive, which can 
contain multiple feeds or instances of the same feed, in order to 
archive the states of a feed or feeds and the states of the entries 
published while the feed was in each of those states. Specifies that 
multiple instances of a resource with the same atom:id are illegal in 
Feed Documents, Entry Documents, and, if PaceAggregationDocument2 is 
adopted, Aggregation Documents, but are legal in Archive Documents.

Rationale
   1. Our charter speaks of creating an archive format.
   2. If we wish to be able to archive multiple revisions of an entry or 
the contents of a feed's head in a single document, we must either 
specify that the atom:id of a resource be repeatable within a document 
intended as an archive, or invent some other method of 
identifying multiple instances of the same entry or feed metadata. 
Multiple instances within an archive type document would be simpler.
   3.  Multiple versions of an entry or feed in a non-archive document 
is unprecedented in syndication formats. In spite of the fact that 
changing feed metadata after an entry is published breaks the connection 
between the state of the feed metadata at the time of publishing the 
entry and the entry, this is how feeds have always worked, so no 
exception to the one-feed-instance-per-document rule need be made for 
Aggregation Documents--that is a special case reserved only for archiving.

See http://www.intertwingly.net/wiki/pie/PaceArchiveDocument for more.




RE: PaceArchiveDocument posted

2005-02-05 Thread Bob Wyman

-1.
The use cases for archiving have not been well defined or well
discussed on this list. It is, I believe, inappropriate and unwise to try to
rush through something this major at the last moment before a pending Last
Call.

bob wyman