Re: PaceAllowDuplicateIDs
Thomas Broyer wrote:
> David Powell wrote:
>> I'm in favour of allowing duplicate ids. This only seems to be a partial solution though:
>>> Their atom:updated timestamps SHOULD be different
>> But what if they are not? What if I want to represent an archive of a feed - maybe mine, maybe someone else's - but the atom:updated dates are the same in two or more entries? I thought it was up to the publisher to decide whether to rev atom:updated.
> If you don't update atom:updated (e.g. it's not a significant update, fixing typos, etc.), one could (I would) assume you don't want to archive the previous entry state.

Archiving was just an example, but the Publisher and the Archiver are different entities. It is up to the Publisher to decide what they consider to be a significant update, and it is up to the Archiver to decide what they want to archive.

> There are very few chances that you significantly update an entry within the same second

I agree, but this proposal distorts the intended meaning of atom:updated, and I think that this risks atom:updated becoming an unreliable indicator of the newness of entries, which is a shame, because it is a useful feature.

-- Dave
Re: PaceAllowDuplicateIDs
Don't you have the same problem with atom:modified? What if the publisher does not update the atom:modified element?

I suppose that if you are making an archive of atom entries and believe that the author has made a mistake with the atom:updated field, you can of course try to correct the mistake artificially in your own feed by increasing the time precision. So if the original entries both claim to have been updated at 12:45 you could have one of them be modified at 12:45:30 and the other at 12:45:31. Just a thought.

Henry Story

On 12 May 2005, at 01:40, David Powell wrote:
> I'm in favour of allowing duplicate ids. This only seems to be a partial solution though:
>> Their atom:updated timestamps SHOULD be different
> But what if they are not? What if I want to represent an archive of a feed - maybe mine, maybe someone else's - but the atom:updated dates are the same in two or more entries? I thought it was up to the publisher to decide whether to rev atom:updated.
>
> I was always concerned that the existence of atom:updated without atom:modified would cause the meaning of atom:updated as an alert to be diluted to being equivalent to atom:modified. This proposal would encourage that. It would mean, if you don't update atom:updated, then your entry instances are second class. The restriction forces services that proxy or re-aggregate feeds to drop entries on the floor, just because the user has chosen not to update atom:updated. atom:updated encourages aggregators to make loud noises when they see a change; anything that encourages atom:updated to be changed just for the sake of it is going to be very annoying to users and make atom:updated useless as an alert flag.
>
> I'm in favour of duplicate ids, but unfortunately, the only way I can see them working is if we have atom:modified. I hate to bring this up, especially now, but it would solve some problems, and it is cheap to implement: Is anyone still opposed to atom:modified?
>
> -- Dave
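[Editor's note] Henry's precision trick above, nudging identical atom:updated values apart by adding artificial offsets, can be sketched as follows. This is a hypothetical archiver-side fix, not anything the Atom draft defines; the function name and the millisecond step size are my own choices.

```python
from datetime import datetime, timedelta

def disambiguate_updated(timestamps):
    """Nudge duplicate timestamps apart so each archived version of an
    entry gets a distinct atom:updated value.  Input and output are
    RFC 3339-style strings without UTC offsets, for simplicity."""
    seen = {}   # timestamp -> how many times it has already been emitted
    out = []
    for ts in timestamps:
        n = seen.get(ts, 0)
        seen[ts] = n + 1
        # the n-th duplicate is pushed n milliseconds into the future
        dt = datetime.fromisoformat(ts) + timedelta(milliseconds=n)
        out.append(dt.isoformat())
    return out
```

Two entries that both claim 12:45:00 would come out one millisecond apart, which keeps them distinct while staying close to the truth.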
Re: PaceAllowDuplicateIDs
David Powell wrote:
> I'm in favour of allowing duplicate ids. This only seems to be a partial solution though:
>> Their atom:updated timestamps SHOULD be different
> But what if they are not? What if I want to represent an archive of a feed - maybe mine, maybe someone else's - but the atom:updated dates are the same in two or more entries? I thought it was up to the publisher to decide whether to rev atom:updated.

If you don't update atom:updated (e.g. it's not a significant update, fixing typos, etc.), one could (I would) assume you don't want to archive the previous entry state. There are very few chances that you significantly update an entry within the same second (that's the only way to get the same atom:updated value if the time-secfrac is not provided), so you shouldn't have to track versions of an entry with the same atom:updated values. So you can have a MUST instead of your SHOULD.

-- Thomas Broyer
Re: PaceAllowDuplicateIDs
On 12/5/05 5:26 PM, Henry Story [EMAIL PROTECTED] wrote:
> Don't you have the same problem with atom:modified? What if the publisher does not update the atom:modified entry?

Theoretically, the publisher doesn't get that choice. To not update atom:modified would be an error. There remains the precision issue though.

e.
Re: PaceAllowDuplicateIDs alteration
Friday, May 6, 2005, 3:52:19 PM, Eric Scheid wrote:
> On 7/5/05 12:09 AM, Graham [EMAIL PROTECTED] wrote:
>> If an Atom Feed Document contains multiple entries with the same atom:id, software MUST treat them as multiple versions of the same entry
>> I don't think this changes the technical meaning of the proposal, but does make it very explicit.
> +1, with one minor amendment: s/versions/instantiations/ -- the spec uses that word elsewhere, and 'versions' might suggest a media adaptation, language variant, etc.

How about revisions? 'versions' could sound too broad: it could imply that the entry could change for all sorts of reasons, conneg etc. 'instantiations'/'instances' could sound like they could be identical copies of the same information. 'revisions' suggests something that changes over time, which is closer to what we mean, I think?

-- Dave
RE: PaceAllowDuplicateIDs
At 00:12 05/05/07, Bob Wyman wrote:
> Right. We have abstract feeds and entries and we have concrete feeds and entries. The abstract feed is the actual stream of entries and updates to entries as they are created over time. Feed documents are concrete snapshots of this stream or abstract feed of entries. An abstract entry is made concrete in entry documents or entry elements. An abstract entry may change over time and may have one or more concrete instantiations. Some applications are only interested in being exposed to those concrete entries that reflect the current or most recent state of the abstract entries -- these apps would prefer to see no duplicate ids in concrete feed documents even though these duplicates *will* occur in the abstract feed. Other applications will require visibility to the entire stream of changes to abstract entries -- these applications will wish to see concrete feeds that may contain multiple, differing concrete instantiations of abstract entries. i.e. they will want the concrete feed to be an accurate representation of the abstract feed.

Two needs, two views... You say 'some applications' and 'other applications' as if they were on the same footing. In my view, the 'some applications' (only interested in the latest version) should be the usual case, and the 'other applications' (interested in more than one version) should be the exception.

Mapping that back to the origin: applications generating feeds that in one way or another rely on the user getting more than one, or more than the latest, version of their entries have made a design error; they have taken the wrong thing for the 'entry'. If they think that they have two different kinds of audiences, interested in two different things, they should publish two feeds.
Some people claim that we need a definition for 'entry' to finish this discussion, but once we confirm that a feed can only contain one version of an entry with the same ID, the definition of entry is as clear as we need it to be. This is just the same as for Web pages. If somebody puts up a Web page for the current weather, there is nothing in HTTP that will help me get the past versions of this page. If the publisher thinks that people may be interested in past weather info, they will make up separate pages.

If we think that it would be valuable to be able to correlate the entries in both feeds, we should define an extension for that, not mess around with the basic model. An extension would be rather easy; we only need two rel values for links in entries. One rel value could be called permaentry, the other could be called updatingentry. Maybe a third called updatingfeed, if there is an updating feed for a single changing entry. I'm sure there are better names.

The main use I see for documents with multiple entries with the same ID is archives. Everything else can be handled by the creator doing the right thing, or by an intermediary offering a new feed with versions of the entry (no guarantee to have all of them, in that case). Even archives could be handled that way if really needed, but it's difficult to imagine that everybody will publish an archive feed. We can easily define an archive top level element, and that problem is solved.

For aggregators, wanting to forward two or more entries with the same ID for me means that they are simply not doing their job. Aggregators should aggregate, not just function as GIGO (garbage in, garbage out) processors.

So it should be clear that I'm -1 on PaceAllowDuplicateIDs.

Regards, Martin.
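[Editor's note] Martin's sketch of an extension could look something like the fragment below. The rel values are his own placeholder names (he says himself there are probably better ones), and the URIs are invented for illustration; nothing here is part of the Atom format.

```xml
<!-- Hypothetical: an entry in a "latest version only" feed pointing
     both at a permanent record of this revision and at a feed of the
     entry's revisions.  All rel values and URIs are illustrative. -->
<entry>
  <id>tag:example.org,2005:quake-42</id>
  <title>Earthquake report</title>
  <updated>2005-05-06T12:00:00Z</updated>
  <link rel="permaentry" href="http://example.org/quakes/42/v3.atom"/>
  <link rel="updatingfeed" href="http://example.org/quakes/42/revisions.atom"/>
</entry>
```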
Re: PaceAllowDuplicateIDs
that maintains the past state present. Those will be cool extensions. Why disallow those extensions by ruling now that feeds can't have multiple entries with the same id?

> The main use I see for documents with multiple entries with the same ID is archives. Everything else can be handled by the creator doing the right thing, or by an intermediary offering a new feed with versions of the entry (no guarantee to have all of them, in that case). Even archives could be handled that way if really needed, but it's difficult to imagine that everybody will publish an archive feed. We can easily define an archive top level element, and that problem is solved.

Yes, but what is the real advantage of defining an archive top level element that will never make it into this spec, when we can have all that we need by being flexible? In any case, by admitting that a top level archive element could exist, you are admitting that there is not really any problem with grouping multiple entry versions together.

> For aggregators, wanting to forward two or more entries with the same ID for me means that they are simply not doing their job. Aggregators should aggregate, not just function as GIGO (garbage in, garbage out) processors.

I wonder why we need to specify what a good aggregator is and what it is not. Let the applications and the market decide. We just need to make certain types of communication possible.

> So it should be clear that I'm -1 on PaceAllowDuplicateIDs.

Should I be -1 on UTF-8 and internationalization because most people I know are English, French or German, so that ISO Latin-x is good enough for me?

> Regards, Martin.
Re: PaceAllowDuplicateIDs
> I have no good answer to that until I know what an id stands for. The answer an entry isn't sufficient.

Bill: Semi-random thoughts...

* An atom:id is a globally unique name for a specific database query.
* There is no stream of instances over time. There's just old data that's out of sync with that query.
* If duplicate ids are allowed, the world won't come to an end. I'm almost sure. I think.

-- Roger Benningfield
Re: PaceAllowDuplicateIDs alteration
On 7/5/05 3:53 AM, Tim Bray [EMAIL PROTECTED] wrote:
> On May 6, 2005, at 8:49 AM, Eric Scheid wrote:
>> Are they still the same entry if they have different source elements that identify their source as being different feeds? I don't see why. I subscribe to a Local News feed, a National News feed, and a Science News feed. All from the same publisher. The same story may appear in one, two, or three of those feeds. I don't believe each of those feeds would have the same feed/source values.
> Right but the story's atom:entry would have the same atom:id in each of those feeds, right? So they are the same entry, right? -Tim

typo: I don't see why not. d'oh!

(Tim: yes)

e.
Re: PaceAllowDuplicateIDs
Robert Sayre wrote:
> I'm much more sympathetic to the aggregate feed problem of multiple IDs. People advocating this type of thing seem to think the default action should be grouping, so they want to use the same ID. I think that's a bad idea, and there are plenty of other ways to indicate the fundamental sameness of entries. For example, NewsML URNs have a NewsItemID and a RevisionID, which would allow smart aggregators to group the entries without violating Atom's constraint.

Then you have two ways of indicating fundamental sameness of entries: one for when the same entry appears multiple times in a feed, and one for everything else. Back to basics then. Does anyone remember why having the same id in a feed is a bad idea?

cheers
Bill
Re: PaceAllowDuplicateIDs
On 6 May 2005, at 2:10 pm, Dave Johnson wrote:
> Yes, I think both of my arguments fail to hold and I no longer have a real objection to duplicates. Allowing duplicates gives feed producers the ability to model events or other objects (versioned documents in a wiki) as they wish. Like you, I wonder: Does anyone remember why having the same id in a feed is a bad idea?

Because instead of a fixed model where a feed is a stream of entries each with their own id, it is now a stream of entries each of which does not have its own id, but shares it with similar entries. This is bullshit.

Graham
Re: PaceAllowDuplicateIDs
Unique IDs allow clients to determine the state of the feed. If entry ids are not unique, then we still need some other way to determine the unique state of the feed. If we allow duplicate IDs but *require* something else to be different (e.g. update time), then we can still determine the unique state of a feed and repeated IDs are OK. We would need to properly document which elements make an entry unique in the event of a duplicated ID.

Brett.

On 6 May 2005, at 2:10 pm, Dave Johnson wrote:
> Yes, I think both of my arguments fail to hold and I no longer have a real objection to duplicates. Allowing duplicates gives feed producers the ability to model events or other objects (versioned documents in a wiki) as they wish. Like you, I wonder: Does anyone remember why having the same id in a feed is a bad idea?
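[Editor's note] Brett's point, that (atom:id, atom:updated) together can act as the uniqueness key, implies a simple client-side rule: keep the newest version per id. A minimal sketch, assuming entries are already parsed into plain dicts (a stand-in for atom:entry elements, not a real Atom API) and that timestamps share an offset so they compare lexicographically:

```python
def latest_versions(entries):
    """Collapse duplicate-id entries down to the most recent instance,
    treating (id, updated) as the uniqueness key.  `entries` is a list
    of dicts standing in for parsed atom:entry elements."""
    latest = {}
    for e in entries:
        cur = latest.get(e["id"])
        # keep this instance if it is the first, or newer than the kept one
        if cur is None or e["updated"] > cur["updated"]:
            latest[e["id"]] = e
    return list(latest.values())
```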
PaceAllowDuplicateIDs alteration
As the WG may have noticed, I have some serious problems with the Pace. One small change would eliminate about 75% of them.

Replace the line:
> If an Atom Feed Document contains multiple entries with the same atom:id, software MAY choose to display all of them or some subset of them
with:
> If an Atom Feed Document contains multiple entries with the same atom:id, software MUST treat them as multiple versions of the same entry

I don't think this changes the technical meaning of the proposal, but it does make it very explicit. Would anyone object to this change?

Graham
RE: PaceAllowDuplicateIDs
Graham wrote:
>> Does anyone remember why having the same id in a feed is a bad idea?
> Because instead of a fixed model where a feed is a stream of entries each with their own id, it is now a stream of entries each of which does not have its own id, but shares it with similar entries. This is bullshit.

I completely disagree on this. I think the problem here is people focusing too much on characteristics of the feed when the real issue here is Entries. Like I've said in the past: It's about the Entries, Stupid! (don't take offense...)

As long as we allow entries to be updated, it is inevitable that the stream of entries that is created over time will contain instances of entries that share common atom:id values. The only question here is whether or not we're willing to allow a feed document to *accurately* represent the stream of entries -- as they were created -- or whether we insist that the feed document censor the history of the stream by removing old instances of updated entries before allowing updates to be inserted.

The reality is that no matter which decision we make in this case, any useful aggregator must have code to deal with multiple instances of entries that share the same atom:id. This is the case since even if we don't permit duplicate IDs in a single instance of a feed document, we would still permit duplicate IDs *over time*. Because duplicate ids appear, over time, whenever you update an entry, the aggregator has to have all the logic needed to handle them in the *stream* of entries that it reads -- over time.

This issue only becomes interesting if we try to provide special rules for the handling of data within a single instance of a feed document. The reality is, however, that any aggregator that actually pays attention to these special-case rules is going to either get more complex (since it can't simply treat everything as a stream of entries) or it will get confused (since folk will intentionally or unintentionally create duplicate ids).

This ban on duplicate ids provides no benefit for aggregators, it makes feed producers more complex, it tempts aggregator or client writers to do dangerous things, it forces deletion of data that is useful to some people for some applications, it puts too much emphasis on feeds when we should be working on entries, etc... It is a really bad thing to do.

bob wyman
Re: PaceAllowDuplicateIDs alteration
What determines a version? If we have multiple entries with identical information, these are copies, not versions. The real issue is feed state. Given there are identical IDs, can we determine if the entries are identical or different? The end client can then deal with it any way it wants, e.g.:

- Keep only the most recent.
- Display them all.
- Combine them all into a single entry with a history.
- Show changes.
- Drop duplicates.
- etc.

Sorry I don't have an answer here, just an observation of the exact problem.

Brett

Graham wrote:
> If an Atom Feed Document contains multiple entries with the same atom:id, software MUST treat them as multiple versions of the same entry
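[Editor's note] All of Brett's client strategies start from the same operation: grouping the instantiations by id and ordering them in time. A sketch of that grouping step, again over hypothetical parsed-entry dicts rather than any real Atom library:

```python
from collections import defaultdict

def version_history(entries):
    """Group entries by atom:id and sort each group oldest-first, so a
    client can then keep the newest, show all, diff neighbours, etc."""
    history = defaultdict(list)
    for e in entries:
        history[e["id"]].append(e)
    for versions in history.values():
        # assumes comparable atom:updated strings (same UTC offset)
        versions.sort(key=lambda e: e["updated"])
    return dict(history)
```

"Keep only the most recent" is then `history[some_id][-1]`, "show changes" is a diff of adjacent list items, and so on.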
Re: PaceAllowDuplicateIDs alteration
On May 6, 2005, at 7:09 AM, Graham wrote:
> Replace the line:
>> If an Atom Feed Document contains multiple entries with the same atom:id, software MAY choose to display all of them or some subset of them
> with:
>> If an Atom Feed Document contains multiple entries with the same atom:id, software MUST treat them as multiple versions of the same entry

Hmmm; the Pace already says: If multiple atom:entry elements with the same atom:id value appear in an Atom Feed document, they represent the same entry. So what you want is almost there.

> I don't think this changes the technical meaning of the proposal, but does make it very explicit.

My problem is that when you say software MUST, I think you should follow up with something specific and testable. This assertion feels fairly vague and leaves room for lots of argument. Maybe a first step to tightening it up would be to provide some specific examples of software behaviors that would be forbidden/allowed by this MUST-clause.

-Tim [who has yet to express a +1 or any other final opinion about this Pace]
Re: PaceAllowDuplicateIDs
On 6/5/05 11:37 PM, Graham [EMAIL PROTECTED] wrote:
> Because instead of a fixed model where a feed is a stream of entries each with their own id, it is now a stream of entries each of which does not have its own id, but shares it with similar entries. This is bullshit.

See the spec: [...] Put another way, an atom:id element pertains to all instantiations of a particular Atom entry or feed; revisions retain the same content in their atom:id elements. [...]

This clearly implies that a feed is a stream of *instantiations* of an entry. Put another way, the map is not the territory; the entry is not the 'entry'.

e.
Re: PaceAllowDuplicateIDs alteration
On 7/5/05 12:09 AM, Graham [EMAIL PROTECTED] wrote:
> If an Atom Feed Document contains multiple entries with the same atom:id, software MUST treat them as multiple versions of the same entry
> I don't think this changes the technical meaning of the proposal, but does make it very explicit.

+1, with one minor amendment: s/versions/instantiations/ -- the spec uses that word elsewhere, and 'versions' might suggest a media adaptation, language variant, etc.

e.
Re: PaceAllowDuplicateIDs
Graham wrote:
> On 6 May 2005, at 2:10 pm, Dave Johnson wrote:
>> Yes, I think both of my arguments fail to hold and I no longer have a real objection to duplicates. Allowing duplicates gives feed producers the ability to model events or other objects (versioned documents in a wiki) as they wish. Like you, I wonder: Does anyone remember why having the same id in a feed is a bad idea?
> Because instead of a fixed model where a feed is a stream of entries each with their own id, it is now a stream of entries each of which does not have its own id, but shares it with similar entries. This is bullshit.

No, it's not, not yet. You can't reasonably call bullshit on this either way until we know what's being identified. When I boil it down, this is what I get: the technical problem we have is that we can't distinguish between a buggy feed with the same ids and an aggregate feed with the same ids under the current spec. I have no good answer to that until I know what an id stands for. The answer an entry isn't sufficient.

cheers
Bill
Re: PaceAllowDuplicateIDs alteration
On 7/5/05 1:16 AM, Bob Wyman [EMAIL PROTECTED] wrote:
> Graham wrote:
>> If an Atom Feed Document contains multiple entries with the same atom:id, software MUST treat them as multiple versions of the same entry
> Are they still the same entry if they have different source elements that identify their source as being different feeds?

I don't see why. I subscribe to a Local News feed, a National News feed, and a Science News feed. All from the same publisher. The same story may appear in one, two, or three of those feeds. I don't believe each of those feeds would have the same feed/source values.

e.
Re: PaceAllowDuplicateIDs alteration
On 5/6/05, Bob Wyman [EMAIL PROTECTED] wrote:
> Graham wrote:
>> If an Atom Feed Document contains multiple entries with the same atom:id, software MUST treat them as multiple versions of the same entry
> Are they still the same entry if they have different source elements that identify their source as being different feeds?

I would say yes. Entry IDs are globally unique. If they aren't the same entry, it's a collision, right?

Robert Sayre
Re: PaceAllowDuplicateIDs alteration
On May 6, 2005, at 8:49 AM, Eric Scheid wrote:
>> Are they still the same entry if they have different source elements that identify their source as being different feeds?
> I don't see why. I subscribe to a Local News feed, a National News feed, and a Science News feed. All from the same publisher. The same story may appear in one, two, or three of those feeds. I don't believe each of those feeds would have the same feed/source values.

Right but the story's atom:entry would have the same atom:id in each of those feeds, right? So they are the same entry, right?

-Tim
Re: PaceAllowDuplicateIDs alteration
On 6 May 2005, at 4:16 pm, Bob Wyman wrote:
> Graham wrote:
>> If an Atom Feed Document contains multiple entries with the same atom:id, software MUST treat them as multiple versions of the same entry
> Are they still the same entry if they have different source elements that identify their source as being different feeds?

Why wouldn't they be? It would mean Here's how the entry looked when it was published in feed A and Here's how the entry looked when it was in feed B. But if the publisher has assigned them the same ID, that's a fairly clear expression that they're versions of the same thing. (Obviously there's the danger of spoofing, but that's a general problem with IDs and not something that needs to be noted in every sentence.)

Graham
Re: PaceAllowDuplicateIDs alteration
On Friday, May 6, 2005, at 09:16 AM, Bob Wyman wrote:
> Graham wrote:
>> If an Atom Feed Document contains multiple entries with the same atom:id, software MUST treat them as multiple versions of the same entry
> Are they still the same entry if they have different source elements that identify their source as being different feeds?

In a perfect world with no malicious, undereducated, misinformed, intellectually challenged or other people who don't mint ids appropriately, yes, they're the same entry. In the real world, I have no idea. A human looking at them could probably determine whether they're the same if they're different enough, but if they're substantially similar, then even a human wouldn't necessarily be able to determine whether they're the same or whether one is a malicious alteration. There's no automated way to decide (unless their contents are identical).

Authors of consuming applications will have to decide whether or not to obey the commandment from the spec (if adopted) to treat them as being the same, or whether to give their users the option of making that decision. Specifying that publishers who publish the same entry in multiple feeds MUST choose one to be the original source and express the rest as aggregated entries from that feed would make it much easier to justify treating them as different entries if they claimed to originate in different feeds.
Re: PaceAllowDuplicateIDs
Tonight something incredible happened to me. You won't believe it. I was walking back from the pub when I got snapped up by a passing space ship full of hyper advanced aliens. They did various experiments on me, and cloned me 1000 times. It is terrible. I just don't know what to do. I suppose that means I am +1000 on this now. :-) That's consensus, I am sure.

Henry
http://bblfish.net/blog/

On 5 May 2005, at 06:02, Tim Bray wrote:
> co-chair-hat status=OFF
> http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs
>
> This Pace was motivated by a talk I had with Bob Wyman today about the problems the synthofeed-generator community has. Summary:
> 1. There are multiple plausible use-cases for feeds with duplicate IDs
> 2. Pro and Contra
> 3. Alternate Paces
> 4. Details about this Pace
>
> 1. Use-Cases
>
> Here's a stream of stock-market quotes.
>
> <feed>
>   <title>My Portfolio</title>
>   <entry>
>     <title>MSFT</title>
>     <updated>2005-05-03T10:00:00-05:00</updated>
>     <content>Bid: 25.20 Ask: 25.50 Last: 25.20</content>
>   </entry>
>   <entry>
>     <title>MSFT</title>
>     <updated>2005-05-03T11:00:00-05:00</updated>
>     <content>Bid: 25.15 Ask: 25.25 Last: 25.20</content>
>   </entry>
>   <entry>
>     <title>MSFT</title>
>     <updated>2005-05-03T12:00:00-05:00</updated>
>     <content>Bid: 25.10 Ask: 25.15 Last: 25.10</content>
>   </entry>
> </feed>
>
> You could also imagine a stream of weather readings. Bob's actual here-and-now today use-case from PubSub is earthquakes: an entry describes an earthquake and they keep re-issuing it as new info about strength/location comes in. Some people only care about the most recent version of the entry, others might want to see all of them. Basically, each atom:entry element describes the same Entry, only at a different point in time. You could argue that in some cases, these are representations of the Web resources identified by the atom:id URI, but I don't think we need to say that explicitly. Yes, you could think of alternate ways of representing stock quotes or any of the other use-cases, but this is simple and direct and idiomatic.
>
> 2. Pro and Contra
>
> Given that I issued the consensus call rejecting the last attempt to do this, which was PaceRepeatIdInDocument, I felt nervous about revisiting the issue. So I went and reviewed the discussion around that one, which I extracted and placed at http://www.tbray.org/tmp/RepeatID.txt for the WG's convenience. Reviewing that discussion, I'm actually not impressed. There were a few -1's but very few actual technical arguments about why this shouldn't be done. The most common was Software will screw this up. On reflection, I don't believe that. You have a bunch of Entries, some of them have the same ID and are distinguished by datestamp. Some software will show the latest, some will show all of them, the good software will allow switching back and forth. Doesn't seem like rocket science to me.
>
> So here's how I see it: there are plausible use cases for doing this, and one of the leading really large-scale implementors in the space (PubSub) wants to do this right now. Bob's been making strong claims about not being able to use Atom if this restriction remains in place. I believe strongly that if there's something that implementors want to do, standards shouldn't get in the way unless there's real interoperability damage. I'm certainly prepared to believe that this could cause interoperability damage, but to date I haven't seen any convincing arguments that it will. I think that if we nonetheless forbid it, people who want to do this will (a) use RSS instead of Atom, (b) cook up horrible kludges, or (c) ignore us and just do it. So my best estimate is that the cost of allowing dupes is probably much lower than the cost of forbidding them.
>
> Finally, our charter does say that we're also supposed to specify how you'd go about archiving feeds, and AllowDuplicateIDs makes this trivial. I looked around and failed to find how we claimed we were going to do that while still forbidding duplicates, but it's possible I missed that.
>
> 3. Alternate Paces
>
> I didn't want to just revive PaceRepeatIdInDocument, because it used the word version in what I thought was kind of a sloppy way, and because it wasn't current against format-08. I don't like either PaceDuplicateIDWithSource or ...WithSource2; they are complicated and don't really meet PubSub's needs anyhow. So I'm strongly -1 on both of those. Yes, that means that if this Pace fails, we'll allow no duplicates at all. I prefer either dupes OK or no dupes to dupes OK in the following circumstances; cleaner.
>
> 4. Details
>
> Section 4.1.2 of format-08 says that atom:entry represents an individual entry. The Pace says that if you have dupes, they represent the same entry, which I think is consistent with both the letter and spirit of 4.1.2. The Pace discourages duplicate timestamps without resorting to MUST language, because accidents can happen; this allows software to throw
Re: PaceAllowDuplicateIDs
On 5 May 2005, at 5:02 am, Tim Bray wrote:
> <feed>
>   <title>My Portfolio</title>
>   <entry>
>     <title>MSFT</title>
>     <updated>2005-05-03T10:00:00-05:00</updated>
>     <content>Bid: 25.20 Ask: 25.50 Last: 25.20</content>
>   </entry>
>   <entry>
>     <title>MSFT</title>
>     <updated>2005-05-03T11:00:00-05:00</updated>
>     <content>Bid: 25.15 Ask: 25.25 Last: 25.20</content>
>   </entry>
>   <entry>
>     <title>MSFT</title>
>     <updated>2005-05-03T12:00:00-05:00</updated>
>     <content>Bid: 25.10 Ask: 25.15 Last: 25.10</content>
>   </entry>
> </feed>

Tim, model this as a blog first. Is it:
a) One entry that's being updated?
b) Hourly new postings with the latest price?

See, I think it's b). Which under any sensible circumstance would count as new entries, and therefore get new ids. You're trying to use atom:id as a category system here. Let's say I post a new picture of my cat every day. Should all my blog entries have the same id?

Technical problems: The problem with multiple ids is that we don't have a date element that provides a definitive answer to the question, What is the current version?, which 99% of the time is all an aggregator needs. For example, what happens if I retract an update to an entry, and presumably roll back atom:updated? The new version stays? If so, the spec of atom:updated needs changing.

I see you have the constraint Their atom:updated timestamps SHOULD be different, and processing software SHOULD regard entries with duplicate atom:id and atom:updated values as evidence of an error in the feed generation. Does this apply temporally as well as spatially? For example, if the content changes the second time I load something, but the atom:updated doesn't, is that an error? Again, atom:updated falls short for this purpose.

Finally, at PubSub, what happens when they download an entry from one feed, then the user edits it but doesn't modify atom:updated, then they download the new entry from a second feed associated with the site? Different content, identical atom:ids, identical atom:updated = invalid feed. They're not in any better position than they were before. This doesn't even solve the problem it's meant to.

> If an Atom Feed Document contains multiple entries with the same atom:id, software MAY choose to display all of them or some subset of them

What does this even mean, other than atom:id is meaningless, ignore it?

> I looked around and failed to find how we claimed we were going to do that while still forbidding duplicates, but it's possible I missed that.

Duplicate ids is a constraint of the atom:feed element. Use a different top level element, atom:archive, for archives.

Graham
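[Editor's note] The error condition Graham pokes at, the same atom:id and the same atom:updated but different content, is at least mechanically detectable. A sketch of the check the Pace's SHOULD-clause would imply, over hypothetical parsed-entry dicts rather than any real feed parser:

```python
def find_suspect_duplicates(entries):
    """Return (id, updated) keys that appear more than once with
    differing content -- the "evidence of an error in the feed
    generation" case described by the Pace."""
    seen = {}       # (id, updated) -> content of last instance seen
    suspects = []
    for e in entries:
        key = (e["id"], e["updated"])
        if key in seen and seen[key] != e["content"] and key not in suspects:
            suspects.append(key)
        seen[key] = e["content"]
    return suspects
```

Note that, as Graham says, this only catches the error within one document; the temporal case (content changed between two fetches without a new atom:updated) needs the client to remember the previous fetch.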
Re: http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs
On 5 May 2005, at 2:20 am, Bob Wyman wrote: Basically, you don't have to update atom:updated unless you think it makes sense OR you are publishing to a feed that already has an entry with the same atom:id as the atom:id of the entry you are currently publishing. Or someone downstream is publishing one, of course. Which means you must always change atom:updated, just in case. No harm done, Tim. None at all. Graham
Re: PaceAllowDuplicateIDs
On 5 May 2005, at 11:36 am, Henry Story wrote: Tim, model this as a blog first. Is it: a) One entry that's being updated? b) Hourly new postings with the latest price? Given that the ids are the same it is now clear that we have situation a) I said first, before we decide what ids we should use. If created as a blog, Tim's stock quotes would make most sense (to me) posted as hourly new entries. Ergo, they should each have different ids. Again, atom:id is not a category system. Either the change they made is significant or it is not. If it is a significant change then by not changing the atom:updated field the user will have done something other than what he thought he was doing. For by not changing the date he is allowing receiving software to decide by themselves whether they wish to keep or drop the change. If it is not a significant change, then the receiving software won't be doing anything problematic by either dropping the later version received or keeping it. atom:updated is used by the publisher to show what they consider a significant change. The user, on the other hand, wants to see the latest version, reliably, even if the publisher disagrees that the change was significant. This is the core problem with Tim's proposal. There is no way to create an aggregator that works in the way the user expects. I am lying down in the road here, as Tim would say. Well that seems like a very complicated way of solving a problem where allowing entries with duplicate ids in a feed document from the start would be much simpler. If you are going to allow archive feeds to keep duplicates then why not just allow feeds and be done with it? Because feeds are feeds and archives are archives? They have different audiences and different uses and different requirements. Graham
Re: PaceAllowDuplicateIDs
On 5/5/05, Eric Scheid [EMAIL PROTECTED] wrote: Because feeds are feeds and archives are archives? They have different audiences and different uses and different requirements. And what about the use case of a wiki's RecentChanges log? Each entry refers to a specific page, and there may be multiple such entries for each page as it gets rapidly edited ... and wiki folks have found it important to be able to monitor all change events. I'm much more sympathetic to the aggregate feed problem of multiple IDs. People advocating this type of thing seem to think the default action should be grouping, so they want to use the same ID. I think that's a bad idea, and there are plenty of other ways to indicate the fundamental sameness of entries. For example, NewsML URNs have a NewsItemID and a RevisionID, which would allow smart aggregators to group the entries without violating Atom's constraint. Robert Sayre
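Sayre's NewsML-style alternative can be sketched: every entry keeps a unique atom:id, while a separate extension value (the key item_key below is hypothetical, standing in for a NewsItemID-like identifier) carries the shared identity a smart aggregator can group on, without ever duplicating atom:id:

```python
from collections import defaultdict

# Each entry keeps a unique atom:id; the hypothetical "item_key" value
# plays the role of NewsML's NewsItemID, shared across revisions.
def group_by_item(entries):
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["item_key"]].append(entry["id"])
    return dict(groups)

entries = [
    {"id": "urn:example:quake-42:rev1", "item_key": "urn:example:quake-42"},
    {"id": "urn:example:quake-42:rev2", "item_key": "urn:example:quake-42"},
    {"id": "urn:example:quake-99:rev1", "item_key": "urn:example:quake-99"},
]
print(group_by_item(entries))
# Two revisions of quake-42 group together; each still has a unique id.
```

The design point: grouping becomes an aggregator feature driven by an extension, and Atom's uniqueness constraint on atom:id is untouched.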
Re: PaceAllowDuplicateIDs
On 5 May 2005, at 2:26 pm, Eric Scheid wrote: perhaps we needed atom:modified after all :-( Yes we do, if we want to go down this route. I suggest appending the current time (or for old versions, the last time that version was current) at the source. And what about the use case of a wiki's RecentChanges log? Each entry refers to a specific page, and there may be multiple such entries for each page as it gets rapidly edited ... and wiki folks have found it important to be able to monitor all change events. Each log entry is an entry in itself, with its own id. That seems a far better functional parallel to the basic blog feed. As with the share price example, the topic of the entry (the company, or the wiki page) is far more analogous to a category that the entry belongs to, than to its identity. Everyone stop trying to use ids as a category system. Graham
Re: PaceAllowDuplicateIDs
On 5 May 2005, at 15:55, Graham wrote: On 5 May 2005, at 2:26 pm, Eric Scheid wrote: perhaps we needed atom:modified after all :-( Yes we do, if we want to go down this route. I suggest appending the current time (or for old versions, the last time that version was current) at the source. Sorry I don't understand why we need atom:modified. And what about the use case of a wiki's RecentChanges log? Each entry refers to a specific page, and there may be multiple such entries for each page as it gets rapidly edited ... and wiki folks have found it important to be able to monitor all change events. Each log entry is an entry in itself, with its own id. That seems a far better functional parallel to the basic blog feed. As I explained in my lengthy reply to your lengthy post, I think one should be able to do either. Each way has its advantages and disadvantages. Let the publisher decide which mechanism to use. As with the share price example, the topic of the entry (the company, or the wiki page) is far more analogous to a category that the entry belongs to, than to its identity. Again, let the publisher choose what the identity criteria of his objects are. Some will stick, some will not. But it is not up to us to decide for our users. Since it does not cause any interoperability issues, what's the problem? Everyone stop trying to use ids as a category system. I don't think that one would be using ids as a category system. If you go to http://google.com you get today's front page. Tomorrow you get tomorrow's front page. What's the problem? Is http://google.com a hidden category system? Henry Story http://bblfish.net/blog/
Re: PaceAllowDuplicateIDs
On 5/5/05, Eric Scheid [EMAIL PROTECTED] wrote: On 5/5/05 11:55 PM, Graham [EMAIL PROTECTED] wrote: Each log entry is an entry in itself, with its own id. Sorry, that makes as much sense as changing the id for a blog entry if that blog entry is updated. Graham's got it exactly right. The functional parallel is wiki-page = blog-entry, and if a blog-entry is updated then that is reflected in the feed as an updated entry - with the same id. That's right, that is the functional parallel. No software I know of shows both revisions of the entry in the feed when it's updated. If you are syndicating wiki changes, part of each entry is the diff and revision id--each revision is a unique thing. Another analogous use case would be a feed watching a certain file in CVS. Every entry would be about the same file, but each would have its own atom:id. Once again, there remains a downstream problem for PubSub, etc. Robert Sayre
Re: PaceAllowDuplicateIDs
On 5 May 2005, at 3:32 pm, Henry Story wrote: As I explained in my lengthy reply to your lengthy post, I think one should be able to do either. Each way has its advantages and disadvantages. Let the publisher decide which mechanism to use. Well please flag it so that I can provide a consistent user interface to people's whims? Since it does not cause any interoperability issues, what's the problem? I have to come up with a new way to recognise and interpret such feeds where an entry (as defined by its id) isn't an entry but a feed of different entries. I don't think that one would be using ids as a category system. If you go to http://google.com you get todays front page. Tomorrow you get tomorrows front page. What's the problem? Is http://google.com a hidden category system? Charter: Atom defines a feed format for representing resources such as Weblogs, online journals, Wikis, and similar content Atom is not a replacement for HTTP. Google.com is a web page, not similar content. It's not relevant here. Graham
Re: PaceAllowDuplicateIDs
On 5 May 2005, at 3:23 pm, Eric Scheid wrote: And what about the use case of a wiki's RecentChanges log? Each entry refers to a specific page, and there may be multiple such entries for each page as it gets rapidly edited ... and wiki folks have found it important to be able to monitor all change events. Each log entry is an entry in itself, with its own id. Sorry, that makes as much sense as changing the id for a blog entry if that blog entry is updated. Have a look here: http://en.wikipedia.org/w/index.php?title=Main_Page&action=history There you have a reverse chrono list, each with an author, date, and summary. Looks an awful lot like each one is an entry to me. Graham
Re: PaceAllowDuplicateIDs
On 6/5/05 12:32 AM, Henry Story [EMAIL PROTECTED] wrote: Sorry I don't understand why we need atom:modified. Graham suggested it : a reliable way for an aggregator to discern the latest version of an entry. atom:updated is used by the publisher to show what they consider a significant change. The user, on the other hand, wants to see the latest version, reliably, even if the publisher disagrees that the change was significant. This is the core problem with Tim's proposal. There is no way to create an aggregator that works in the way the user expects. ... no way, that is, unless we have atom:modified making each same-id entry distinct, and not just distinct but also time-ordered and time-distanced (an advantage over just using something similar to the NewsML RevisionID mechanism). e.
Re: PaceAllowDuplicateIDs
On 6/5/05 12:45 AM, Graham [EMAIL PROTECTED] wrote: Have a look here: http://en.wikipedia.org/w/index.php?title=Main_Page&action=history There you have a reverse chrono list, each with an author, date, and summary. Looks an awful lot like each one is an entry to me. and it looks to me like a stream of meta-data concerning the one entry, not distinct and separable entries like you'd find in your everyday blog. Henry has the right idea -- the spec should allow both kinds, rather than trying to shoe-horn everything into the one viewpoint of what is an entry. e.
Re: PaceAllowDuplicateIDs
On 5 May 2005, at 16:38, Graham wrote: On 5 May 2005, at 3:32 pm, Henry Story wrote: As I explained in my lengthy reply to your lengthy post, I think one should be able to do either. Each way has its advantages and disadvantages. Let the publisher decide which mechanism to use. Well please flag it so that I can provide a consistent user interface to people's whims? What is the problem with the user interface that you have exactly? I have pointed you to BlogEd that keeps a history of all the changes to an entry. Try it out: http://blogs.sun.com/roller/page/bblfish/ It's open source, so you can also copy the code. If you don't want to keep a history of the entries all you need to do is drop all but the latest entry with the same id. There is nothing more to it. Just show the user the last one you came across. Since it does not cause any interoperability issues, what's the problem? I have to come up with a new way to recognise and interpret such feeds where an entry (as defined by its id) isn't an entry but a feed of different entries. No you don't. Just drop the old ones, if you don't care about the history. Really simple. As Tim Bray's text says [[ software MAY choose to display all of them or some subset of them ]] So just drop the older versions. I don't think that one would be using ids as a category system. If you go to http://google.com you get today's front page. Tomorrow you get tomorrow's front page. What's the problem? Is http://google.com a hidden category system? Charter: Atom defines a feed format for representing resources such as Weblogs, online journals, Wikis, and similar content yes, and it must also allow the representation of [[ * a complete archive of all entries in a feed ]] This proposal permits this, and it does not harm anyone else. Atom is not a replacement for HTTP. Google.com is a web page, not similar content. It's not relevant here. I don't know where you get the idea that I said Atom is a replacement for HTTP. 
Take a breath perhaps and relax before you answer. Henry Story
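Henry's "just drop all but the latest" rule really is as small as he claims. A sketch, assuming entries appear in the feed document in publication order, so the last occurrence of an id is the newest (the urn: ids are invented for illustration):

```python
def latest_only(entries):
    # Keep only the last entry seen for each atom:id; a dict keeps
    # first-seen order of the ids (Python 3.7+), so the feed's overall
    # ordering is preserved while duplicates collapse.
    latest = {}
    for entry_id, content in entries:
        latest[entry_id] = content
    return latest

feed = [
    ("urn:example:page", "version 1"),
    ("urn:example:other", "unrelated"),
    ("urn:example:page", "version 2"),
]
print(latest_only(feed))
# The duplicated id collapses to "version 2"; the other entry survives.
```

An aggregator that wants history instead simply skips this collapse and stores every (id, version) pair it sees.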
Re: PaceAllowDuplicateIDs
I'm -1 on PaceAllowDuplicateIDs Reasons: 1) We're supposed to be standardizing current practice not inventing new things. Current best practice is to have unique IDs and current software (e.g. Javablogs.com) is predicated on this practice. I know, this practice is not followed widely enough, but that is another matter. 2) I think it is *much* more useful to think of an Atom Entry as an event that occurred at a specific time. Typically, an event is the publication of an article or blog entry on the web. For example:

event: CNET published article
subject: CNET
object: article

But it could also represent other events:

event: delivery van delivers package
subject: delivery van
object: package

event: alarm system sends warning
subject: alarm system
object: warning

event: server sends load warning
subject: server
object: load warning

If you think of Atom Entries as events, then it makes sense to consider the Atom Entry ID to be the ID of the event, not the ID of the subject or object of the event. Events are unique (you can't have more than one version of an event) and can be assigned GUIDs, and therefore you cannot have more than one entry with the same ID. In the case of earthquake data, each new data report is a new event:

event: agency reports earthquake data
subject: agency
object: earthquake data

The ID is the ID of the data reported event, not the ID of the earthquake. We don't know what subjects and objects people are going to use in the future, so we can't specify Atom elements or IDs for subjects and objects -- that's what extensions are for. If you want to create a feed to syndicate information about earthquakes, then you introduce an extension for uniquely identifying earthquakes. 
- Dave On May 5, 2005, at 12:02 AM, Tim Bray wrote: <co-chair-hat status="OFF"> http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs This Pace was motivated by a talk I had with Bob Wyman today about the problems the synthofeed-generator community has. Summary: 1. There are multiple plausible use-cases for feeds with duplicate IDs 2. Pro and Contra 3. Alternate Paces 4. Details about this Pace 1. Use-Cases Here's a stream of stock-market quotes.

<feed>
  <title>My Portfolio</title>
  <entry>
    <title>MSFT</title>
    <updated>2005-05-03T10:00:00-05:00</updated>
    <content>Bid: 25.20 Ask: 25.50 Last: 25.20</content>
  </entry>
  <entry>
    <title>MSFT</title>
    <updated>2005-05-03T11:00:00-05:00</updated>
    <content>Bid: 25.15 Ask: 25.25 Last: 25.20</content>
  </entry>
  <entry>
    <title>MSFT</title>
    <updated>2005-05-03T12:00:00-05:00</updated>
    <content>Bid: 25.10 Ask: 25.15 Last: 25.10</content>
  </entry>
</feed>

You could also imagine a stream of weather readings. Bob's actual here-and-now today use-case from PubSub is earthquakes, an entry describes an earthquake and they keep re-issuing it as new info about strength/location comes in. Some people only care about the most recent version of the entry, others might want to see all of them. Basically, each atom:entry element describes the same Entry, only at a different point in time. You could argue that in some cases, these are representations of the Web resources identified by the atom:id URI, but I don't think we need to say that explicitly. Yes, you could think of alternate ways of representing stock quotes or any of the other use-cases but this is simple and direct and idiomatic. 2. Pro and Contra Given that I issued the consensus call rejecting the last attempt to do this, which was PaceRepeatIdInDocument, I felt nervous about revisiting the issue. So I went and reviewed the discussion around that one, which I extracted and placed at http://www.tbray.org/tmp/RepeatID.txt for the WG's convenience. Reviewing that discussion, I'm actually not impressed. 
There were a few -1's but very few actual technical arguments about why this shouldn't be done. The most common was Software will screw this up. On reflection, I don't believe that. You have a bunch of Entries, some of them have the same ID and are distinguished by datestamp. Some software will show the latest, some will show all of them, the good software will allow switching back and forth. Doesn't seem like rocket science to me. So here's how I see it: there are plausible use cases for doing this, and one of the leading really large-scale implementors in the space (PubSub) wants to do this right now. Bob's been making strong claims about not being able to use Atom if this restriction remains in place. I believe strongly that if there's something that implementors want to do, standards shouldn't get in the way unless there's real interoperability damage. I'm certainly prepared to believe that this could cause interoperability damage, but to date I haven't seen any convincing arguments that it will. I think that if we nonetheless forbid it, people who want to do this will (a) use RSS instead of Atom, (b) cook
Re: PaceAllowDuplicateIDs
On Thursday, May 5, 2005, at 09:15 AM, Eric Scheid wrote: Have a look here: http://en.wikipedia.org/w/index.php?title=Main_Page&action=history There you have a reverse chrono list, each with an author, date, and summary. Looks an awful lot like each one is an entry to me. and looks to me like a stream of meta-data concerning the one entry to me. and not distinct and separable entries like you'd find in your every day blog. Henry has the right idea -- the spec should allow both kinds, rather than trying to shoe-horn everything into the one viewpoint of what is an entry. +1 -- allow the publisher to decide which model fits their intent.
Re: PaceAllowDuplicateIDs
Immediately after sending this message, I had a rush of second thoughts. My point #2 is not very well thought out. I think it applies for things like earthquake data, but when Atom feeds represent blog entries or articles (in an archive or an Atom Protocol feed) the ID represents the article not an event in the blog entry's life. So, you can discount my second reason against the pace. - Dave On May 5, 2005, at 11:27 AM, David M Johnson wrote: I'm -1 on PaceAllowDuplicateIDs Reasons: 1) We're supposed to be standardizing current practice not inventing new things. Current best practice is to have unique IDs and current software (e.g. Javablogs.com) is predicated on this practice. I know, this practice is not followed widely enough, but that is another matter. 2) I think it is *much* more useful to think of an Atom Entry as an event that occurred at a specific time. Typically, an event is the publication of an article or blog entry on the web. For example: event: CNET published article subject: CNET object: article But an event it could also represent other events. event: delivery van delivers package subject: delivery van object: package event: alarm system sends warning subject: alarm system object: warning event: server sends load warning subject: server object: load warning If you think of Atom Entries as events, then it makes sense to consider the Atom Entry ID to be the ID of the event, not the ID of the subject or object of the event. Events are unique (you can't have more than one version of an event) and can be assigned GUIDs and therefore you cannot have more than one entry with the same ID. In the case of earthquake data, each new data report is a new event. event: agency reports earthquake data subject: agency object: earthquake data The ID is the ID of the data reported event not the ID of the earthquake. 
We don't know what subjects and objects people are going to use in the future, so we can't specify Atom elements or IDs for subjects and objects -- that's what extensions are for. If you want to create a feed to syndicate information about earthquakes, then you introduce an extension for uniquely identifying earthquakes. - Dave
Re: PaceAllowDuplicateIDs
On 5 May 2005, at 4:22 pm, Henry Story wrote: If you don't want to keep a history of the entries all you need to do is drop all but the latest entry with the same id. There is nothing more to it. Just show the user the last one you came across. But, if we follow Eric's model of how a wiki changelog should be defined, I'll be missing entries in the log, because several different entries have the same id. Ergo, the user interface and data model for the new type of feed this proposal permits are very different. This proposal permits this, and it does not harm anyone else. It harms everyone, by allowing a second, unrelated data model in Atom feeds. They may not be posting today, but I assure you, when other aggregator authors get the first user complaints about how Eric's wiki log displays incompletely in their program, they'll forgive Dave Winer everything. Graham
Re: PaceAllowDuplicateIDs
Hi Dave, nice to see you participate here. I understand your points, and I myself thought the way you did for a while. [Oops, I see now that you have retracted your point. Oh well. I had already started writing the following] On 5 May 2005, at 17:27, David M Johnson wrote: I'm -1 on PaceAllowDuplicateIDs Please consider the following points before you vote. Reasons: 1) We're supposed to be standardizing current practice not inventing new things. Current best practice is to have unique IDs and current software (e.g. Javablogs.com) is predicated on this practice. I know, this practice is not followed widely enough, but that is another matter. Atom is standardizing current practice, but it is also adding some features. For example, namespaces and ids. The Atom charter also requires us to allow archives [[ * a complete archive of all entries in a feed ]] Graham himself thinks that archives are possible, since he supports the use of an archive head element. 2) I think it is *much* more useful to think of an Atom Entry as an event that occurred at a specific time. Typically, an event is the publication of an article or blog entry on the web. For example: event: CNET published article subject: CNET object: article But it could also represent other events: event: delivery van delivers package subject: delivery van object: package event: alarm system sends warning subject: alarm system object: warning event: server sends load warning subject: server object: load warning If you think of Atom Entries as events, then it makes sense to consider the Atom Entry ID to be the ID of the event, not the ID of the subject or object of the event. You are right. There are two types of objects that we need to think about: A- the event/state of a resource at a particular time B- the thing that makes these different states the state of the same thing Clearly we need (B) or else all the talk about an entry changing over time (atom:updated) would not make sense. 
So let us start off, as I did a long time ago, by thinking that the id of an entry uniquely identifies the event/state of the entry. For every id there can be one and only one <entry>...</entry> representation. That id is that representation. It is, if you wish, the name of a state of something else... and that would be? I think it is clear that one of the roles of the id is to make it possible for an entry to be moved from one web site to another, so that if your blog service provider lets you down, you can still refer to the entry even when you have moved it to an alternate location. Graham has made such a point quite often. Entries, it has often been said, can change, but the id remains the same. I think this is clearly the consensus on this list. So the id URI is what identifies the different <entry>...</entry> representations as being representations of the same thing. Events are unique (you can't have more than one version of an event) and can be assigned GUIDs and therefore you cannot have more than one entry with the same ID. yes. But I don't think that this is the consensus in this group. The good thing is that you can achieve the same identification of a state through the combination of the id and the modification time. [here I noticed that you had changed your mind, anyway. I think I had exactly the same thought as you did when I first started thinking about this. ] In the case of earthquake data, each new data report is a new event. event: agency reports earthquake data subject: agency object: earthquake data The ID is the ID of the data reported event, not the ID of the earthquake. We don't know what subjects and objects people are going to use in the future, so we can't specify Atom elements or IDs for subjects and objects -- that's what extensions are for. If you want to create a feed to syndicate information about earthquakes, then you introduce an extension for uniquely identifying earthquakes. - Dave
Re: PaceAllowDuplicateIDs
On 5 May 2005, at 17:53, Graham wrote: On 5 May 2005, at 4:22 pm, Henry Story wrote: If you don't want to keep a history of the entries all you need to do is drop all but the latest entry with the same id. There is nothing more to it. Just show the user the last one you came across. But, if we follow Eric's model of how a wiki changelog should be defined, I'll be missing entries in the log, because several different entries have the same id. Ergo, the user interface and data model for the new type of feed this proposal permits is very different. If your tool (Is it Shrook2? [1]) only shows people the latest version available to you of an entry, then by showing them only the latest version, Shrook2 will be giving the user what he is expecting. When your news reader currently reads feeds on the internet what does it do with changed entries? Either it keeps the older version around, for the user to browse, or it does not. If your users don't mind you throwing away the older versions of an entry, then they won't mind you throwing away the older versions of the above entries either. There is no difference in the behavior between allowing changed entries across feed documents and changed entries inside a feed document. People who place two entries with the same id inside a feed document should be aware that tools like yours will have the behavior they do, and that this is ok. Other people may be interested in looking at things historically. They will get a historical viewer and be happy with it. I think the current proposal is good exactly because it allows the wiki people to express what they want to express correctly. Namely how their wiki entry is changing over time. This proposal permits this, and it does not harm anyone else. It harms everyone, by allowing a second, unrelated data model in Atom feeds. 
They may not be posting today, but I assure you, when other aggregator authors get the first user complaints about how Eric's wiki log displays incompletely in their program, they'll forgive Dave Winer everything. Again, has anyone yet complained to you that you have not kept a historical and browse-able track record of how the entries Shrook2 is looking at have changed over time? Clearly they could, as you sometimes let them know that an entry that they already have read has been updated. They could ask you what the changes were, no? How it changed, etc. If your users don't care that much about the history of an entry, then you can dump all but the latest entry. Or you could just keep the last two entries, so that you can show them a diff. Henry Story http://bblfish.net/blog/ [1] http://www.fondantfancies.com/apps/shrook/
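Henry's closing suggestion (keep just the last two versions, so the reader can be shown what changed) is cheap with the standard library. A sketch using Python's difflib; the earthquake text is invented for illustration:

```python
import difflib

def show_change(old, new):
    # Unified diff between the previous and current version of an entry's
    # content; an aggregator only needs to retain the last two versions.
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="previous", tofile="current",
    ))

old = "The quake measured 6.1.\nNo damage reported.\n"
new = "The quake measured 6.4.\nNo damage reported.\n"
print(show_change(old, new))
```

For HTML content one would diff the rendered text rather than the markup, but the two-version storage idea is the same.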
Re: PaceAllowDuplicateIDs
On 6/5/05 1:53 AM, Graham [EMAIL PROTECTED] wrote: This proposal permits this, and it does not harm anyone else. It harms everyone, by allowing a second, unrelated data model in Atom feeds. They may not be posting today, but I assure you, when other aggregator authors get the first user complaints about how Eric's wiki log displays incompletely in their program, they'll forgive Dave Winer everything. Many wikis offer options in displaying their change log with either most recent changes only, or all changes. Both models are commonly supported because some people want to see notifications of all changes, while others just want to see the most recent change. That is part of wiki culture, all the way back to Ward's wiki. It wouldn't be surprising to find the same options made available for wiki logs in RSS. Hey, here's one right now http://www.intertwingly.net/wiki/pie/RecentChanges?action=rss_rc Apparently, if you add a unique=1 URL parameter you get a list of changes where page names are unique, i.e. where only the latest change of each page is reflected. e.
Re: PaceAllowDuplicateIDs
On Thursday, May 5, 2005, at 08:44 AM, Antone Roundy wrote: If we accept this Pace, are we going to do anything to address the DOS issue for aggregated feeds? Bob, if I may direct a few questions to you, since you have the most experience with this issue: if PaceAllowDuplicateIDs is adopted, how would you anticipate that PubSub would go about handling entries with the same atom:id coming from different feeds? What if each appears to be claiming to be the original feed for the entry? What if both are getting aggregated into the same feed, but your system doesn't think they're really the same entry? I'm in favor of the Pace, as far as it goes, but was surprised to see that it doesn't talk about these issues, given that it was motivated by a conversation with you.
Re: PaceAllowDuplicateIDs
On 5 May 2005, at 5:38 pm, Eric Scheid wrote: Many wiki's offer options in displaying their change log with either most recent changes only, or all changes. Both models are commonly supported because some people want to see notifications of all changes, while others just want to see the most recent change. That is part of wiki culture, all the way back to ward's wiki. OK that makes sense. I still think it's the wrong way to model a change log as a feed. My other two criticisms still stand: atom:updated is used by the publisher to show what they consider a significant change. The user, on the other hand, probably wants to see the latest version, reliably, even if the publisher disagrees that the change was significant. This is the core problem with Tim's proposal. There is no way to create an aggregator that works in the way the user expects. Finally, at pubsub, what happens when they download an entry from one feed, then the user edits it, but doesn't modify atom:updated, then they download the new entry from a second feed associated with the site? Different content, identical atom:ids, identical atom:updated = Invalid feed. They're not in any better position than they were before. This doesn't even solve the problem it's meant to. Basically, atom:updated doesn't properly differentiate versions, and the way atom:updated is being used by the proposal doesn't gel with the actual spec of the element. Graham
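The invalid-feed case Graham keeps returning to can be detected mechanically. A sketch of the Pace's rule (duplicate atom:id plus duplicate atom:updated is evidence of a feed-generation error), applied to his two-feed scenario with invented ids:

```python
def find_generation_errors(entries):
    # Per the Pace, entries sharing both atom:id and atom:updated are
    # "evidence of an error in the feed generation" -- worst when their
    # content differs, as when an entry was silently edited upstream.
    seen = {}
    errors = []
    for entry_id, updated, content in entries:
        key = (entry_id, updated)
        if key in seen and seen[key] != content:
            errors.append(key)
        seen[key] = content
    return errors

merged = [
    ("urn:example:post-1", "2005-05-03T10:00:00Z", "original text"),
    # The same entry fetched from a second feed after a silent edit:
    ("urn:example:post-1", "2005-05-03T10:00:00Z", "edited text"),
]
print(find_generation_errors(merged))
```

Detecting the error is easy; Graham's point is that the detector has nothing useful to do next, because neither timestamp tells it which content is current.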
Re: PaceAllowDuplicateIDs
Graham wrote: On 5 May 2005, at 5:38 pm, Eric Scheid wrote: Many wikis offer options in displaying their change log with either most recent changes only, or all changes. Both models are commonly supported because some people want to see notifications of all changes, while others just want to see the most recent change. That is part of wiki culture, all the way back to Ward's wiki. OK, that makes sense. I still think it's the wrong way to model a change log as a feed. My other two criticisms still stand: atom:updated is used by the publisher to show what they consider a significant change. The user, on the other hand, probably wants to see the latest version, reliably, even if the publisher disagrees that the change was significant. This is the core problem with Tim's proposal. There is no way to create an aggregator that works in the way the user expects. Just a thought: On the other hand, perhaps this is an opportunity to operationally define significant change: a change which results in a new version being exposed on one's feed. If you think your users would care about seeing the change, then change the atom:updated field and 'republish' by adding to the feed. If not, just change your content and don't republish. Examples of this might include: fixing irrelevant typos, changing character set encodings, changing formatting to match a new style guide. -John
http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs
+1 with a comment: If this Pace is accepted (and I hope it will be) the issue of Duplicate IDs should probably be dealt with in Mark's Implementation Guide.[1] Atom supports the publishing of newer versions of an entry which use the same atom:id as earlier versions of the same entry. It is not required that atom:updated be modified when a newer version is written. If PaceAllowDuplicateIDs is accepted, it will be permitted to have multiple entries with the same atom:id in a single feed. However, the Pace language says processors SHOULD regard as feed generation errors any entries which duplicate both the atom:id and atom:updated of another entry in the same feed. Thus, feed authors who wish to publish feeds with duplicate atom:ids should ensure that any entry which duplicates an entry already in the feed has a different value for atom:updated. This constraint is not a requirement of the language, but it is a clear derivative of it. Basically, you don't have to update atom:updated unless you think it makes sense OR you are publishing to a feed that already has an entry with the same atom:id as the atom:id of the entry you are currently publishing. bob wyman [1] http://diveintomark.org/rfc/draft-ietf-atompub-impl-guide-00.html
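Bob's derived constraint is a simple publisher-side check. A minimal sketch, assuming a feed modeled as a list of dicts (the helper name and fields are hypothetical, not from any Atom tooling): before appending a new version of an entry, verify that no existing entry shares both its atom:id and its atom:updated.

```python
# Sketch of Bob's derived publishing rule: reusing an atom:id is fine,
# but an entry duplicating BOTH the atom:id and atom:updated of an entry
# already in the feed would be a feed generation error under the Pace.

def can_publish(feed_entries, new_entry):
    """True unless the feed already holds an entry with the same
    atom:id AND the same atom:updated as new_entry."""
    return not any(e["id"] == new_entry["id"] and
                   e["updated"] == new_entry["updated"]
                   for e in feed_entries)

feed = [{"id": "urn:example:quake42",
         "updated": "2005-05-03T10:00:00-05:00"}]

# Same id, later timestamp: a newer version, fine under the Pace.
assert can_publish(feed, {"id": "urn:example:quake42",
                          "updated": "2005-05-03T11:00:00-05:00"})

# Same id AND same updated: the publisher should rev atom:updated first.
assert not can_publish(feed, {"id": "urn:example:quake42",
                              "updated": "2005-05-03T10:00:00-05:00"})
```

This matches Bob's summary: atom:updated only becomes obligatory to bump at the moment a same-id entry would collide with one already in the feed.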
Re: http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs
On May 4, 2005, at 6:20 PM, Bob Wyman wrote: +1 with a comment: If this Pace is accepted (and I hope it will be) the issue of Duplicate IDs should probably be dealt with in Marks Implementation Guide.[1] Er, I had planned to refine this a bit and then announce it to the group with some explanations and some other background research I did; so how about I promise to do that later this evening; and please consider waiting for that before you all pile in, pro or contra. -Tim
PaceAllowDuplicateIDs
co-chair-hat status=OFF

http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs

This Pace was motivated by a talk I had with Bob Wyman today about the problems the synthofeed-generator community has.

Summary:
1. There are multiple plausible use-cases for feeds with duplicate IDs
2. Pro and Contra
3. Alternate Paces
4. Details about this Pace

1. Use-Cases

Here's a stream of stock-market quotes:

<feed>
  <title>My Portfolio</title>
  <entry>
    <title>MSFT</title>
    <updated>2005-05-03T10:00:00-05:00</updated>
    <content>Bid: 25.20 Ask: 25.50 Last: 25.20</content>
  </entry>
  <entry>
    <title>MSFT</title>
    <updated>2005-05-03T11:00:00-05:00</updated>
    <content>Bid: 25.15 Ask: 25.25 Last: 25.20</content>
  </entry>
  <entry>
    <title>MSFT</title>
    <updated>2005-05-03T12:00:00-05:00</updated>
    <content>Bid: 25.10 Ask: 25.15 Last: 25.10</content>
  </entry>
</feed>

You could also imagine a stream of weather readings. Bob's actual here-and-now-today use-case from PubSub is earthquakes: an entry describes an earthquake and they keep re-issuing it as new info about strength/location comes in. Some people only care about the most recent version of the entry; others might want to see all of them. Basically, each atom:entry element describes the same Entry, only at a different point in time. You could argue that in some cases these are representations of the Web resources identified by the atom:id URI, but I don't think we need to say that explicitly. Yes, you could think of alternate ways of representing stock quotes or any of the other use-cases, but this is simple and direct and idiomatic.

2. Pro and Contra

Given that I issued the consensus call rejecting the last attempt to do this, which was PaceRepeatIdInDocument, I felt nervous about revisiting the issue. So I went and reviewed the discussion around that one, which I extracted and placed at http://www.tbray.org/tmp/RepeatID.txt for the WG's convenience. Reviewing that discussion, I'm actually not impressed.
There were a few -1s, but very few actual technical arguments about why this shouldn't be done. The most common was "Software will screw this up." On reflection, I don't believe that. You have a bunch of Entries; some of them have the same ID and are distinguished by datestamp. Some software will show the latest, some will show all of them, and the good software will allow switching back and forth. Doesn't seem like rocket science to me.

So here's how I see it: there are plausible use cases for doing this, and one of the leading really large-scale implementors in the space (PubSub) wants to do this right now. Bob's been making strong claims about not being able to use Atom if this restriction remains in place. I believe strongly that if there's something that implementors want to do, standards shouldn't get in the way unless there's real interoperability damage. I'm certainly prepared to believe that this could cause interoperability damage, but to date I haven't seen any convincing arguments that it will. I think that if we nonetheless forbid it, people who want to do this will (a) use RSS instead of Atom, (b) cook up horrible kludges, or (c) ignore us and just do it. So my best estimate is that the cost of allowing dupes is probably much lower than the cost of forbidding them. Finally, our charter does say that we're also supposed to specify how you'd go about archiving feeds, and AllowDuplicateIDs makes this trivial. I looked around and failed to find how we claimed we were going to do that while still forbidding duplicates, but it's possible I missed that.

3. Alternate Paces

I didn't want to just revive PaceRepeatIdInDocument, because it used the word "version" in what I thought was kind of a sloppy way, and because it wasn't current against format-08. I don't like either PaceDuplicateIDWithSource or ...WithSource2; they are complicated and don't really meet PubSub's needs anyhow. So I'm strongly -1 on both of those.
Yes, that means that if this Pace fails, we'll allow no duplicates at all. I prefer either "dupes OK" or "no dupes" to "dupes OK in the following circumstances"; cleaner.

4. Details

Section 4.1.2 of format-08 says that atom:entry represents an individual entry. The Pace says that if you have dupes, they represent the same entry, which I think is consistent with both the letter and spirit of 4.1.2. The Pace discourages duplicate timestamps without resorting to MUST language, because accidents can happen; this allows software to throw such entries on the floor while positively encouraging noisy complaining. On the other hand, if the WG wanted either to insist on a MUST here or remove the discouragement altogether, I could live with that. Finally, it makes it clear that if there are entries with duplicate atom:id, software is free to display all or a subset, and it calls out the likely common case where you discard all but the most recent. If I were Brent Simmons or equivalent, I'd be coding up a button where you
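The "display all or a subset" behavior the Pace permits is easy to sketch. Below is a minimal, hypothetical illustration (plain dicts, illustrative names, no Atom library) of the likely common case: collapsing entries that share an atom:id and keeping only the one with the newest atom:updated. It relies on RFC 3339 timestamps with the same UTC offset comparing correctly as strings, which holds for the stock-quote example in the Pace.

```python
# Sketch of the two display modes the Pace allows: "show all versions"
# is just the raw entry list; "show latest only" collapses duplicate
# atom:ids, keeping the entry with the greatest atom:updated.

def latest_only(entries):
    """Collapse entries sharing an atom:id, keeping the newest version.
    Assumes timestamps share an offset, so string comparison suffices."""
    newest = {}
    for entry in entries:
        current = newest.get(entry["id"])
        if current is None or entry["updated"] > current["updated"]:
            newest[entry["id"]] = entry
    return list(newest.values())

quotes = [
    {"id": "urn:example:msft", "updated": "2005-05-03T10:00:00-05:00",
     "content": "Bid: 25.20 Ask: 25.50 Last: 25.20"},
    {"id": "urn:example:msft", "updated": "2005-05-03T11:00:00-05:00",
     "content": "Bid: 25.15 Ask: 25.25 Last: 25.20"},
    {"id": "urn:example:msft", "updated": "2005-05-03T12:00:00-05:00",
     "content": "Bid: 25.10 Ask: 25.15 Last: 25.10"},
]

assert len(quotes) == 3                # "show all" mode: every version
collapsed = latest_only(quotes)        # "latest only" mode
assert len(collapsed) == 1
assert collapsed[0]["updated"] == "2005-05-03T12:00:00-05:00"
```

A reader's "show all / latest only" button, of the kind Tim imagines, would just toggle between the raw list and the collapsed one.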