Re: PaceAllowDuplicateIDs

2005-05-15 Thread David Powell


Thomas Broyer wrote:

 David Powell wrote:
 I'm in favour of allowing duplicate ids. This only seems to be a
 partial solution though:
 
   Their atom:updated timestamps SHOULD be different
 
 But what if they are not? What if I want to represent an archive of a
 feed - maybe mine, maybe someone else's - but the atom:updated dates
 are the same in two or more entries? I thought it was up to the
 publisher to decide whether to rev atom:updated.

 If you don't update atom:updated (e.g. it's not a significant update,
 fixing typos, etc.), one could (I would) assume you don't want to 
 archive the previous entry state.

Archiving was just an example, but the Publisher and the Archiver are
different entities. It is up to the Publisher to decide what they
consider to be a significant update, and it is up to the Archiver to
decide what they want to archive.

 There is very little chance that you significantly update an entry
 within the same second

I agree, but this proposal distorts the intended meaning of
atom:updated, and I think that this risks atom:updated becoming an
unreliable indicator for the newness of entries, which is a shame,
because it is a useful feature.

-- 
Dave



Re: PaceAllowDuplicateIDs

2005-05-12 Thread Henry Story
Don't you have the same problem with atom:modified? What if the
publisher does not update the atom:modified element?
I suppose that if you are making an archive of Atom entries and believe
that the author has made a mistake with the atom:updated field, you can
of course try to correct the mistake artificially in your own feed by
increasing the time precision. So if the original entries both claim to
have been updated at 12:45, you could have one of them be modified at
12:45:30 and the other at 12:45:31.
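A minimal sketch of that disambiguation trick in Python; the entry representation and function name are illustrative, not from any Atom library, and a real archiver would serialize the results back to RFC 3339 with a time-secfrac part:

```python
from datetime import datetime, timedelta

def disambiguate(updated_values):
    """Nudge colliding atom:updated timestamps apart by adding
    fractional-second precision, so each archived instance of an
    entry ends up with a distinct value."""
    seen = {}
    out = []
    for ts in updated_values:
        n = seen.get(ts, 0)  # how many times we've already seen this instant
        seen[ts] = n + 1
        # first occurrence is untouched; later ones gain n microseconds
        out.append(ts + timedelta(microseconds=n))
    return out
```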

Just a thought.
Henry Story
On 12 May 2005, at 01:40, David Powell wrote:
I'm in favour of allowing duplicate ids. This only seems to be a
partial solution though:
  Their atom:updated timestamps SHOULD be different
But what if they are not? What if I want to represent an archive of a
feed - maybe mine, maybe someone else's - but the atom:updated dates
are the same in two or more entries? I thought it was up to the
publisher to decide whether to rev atom:updated.
I was always concerned that the existence of atom:updated without
atom:modified would cause the meaning of atom:updated as an alert to
be diluted to being equivalent to atom:modified. This proposal would
encourage that. It would mean, if you don't update atom:updated, then
your entry instances are second class.
The restriction forces services that proxy, or re-aggregate feeds to
drop entries on the floor, just because the user has chosen not to
update atom:updated.
atom:updated encourages aggregators to make loud noises when they see
a change; anything that encourages atom:updated to be changed just for
the sake of it is going to be very annoying to users and make
atom:updated useless as an alert flag.
I'm in favour of duplicate ids, but unfortunately, the only way I can
see them working is if we have atom:modified. I hate to bring this up,
especially now, but it would solve some problems, and it is cheap to
implement:
Is anyone still opposed to atom:modified?
--
Dave



Re: PaceAllowDuplicateIDs

2005-05-12 Thread Thomas Broyer
David Powell wrote:
I'm in favour of allowing duplicate ids. This only seems to be a
partial solution though:
  Their atom:updated timestamps SHOULD be different
But what if they are not? What if I want to represent an archive of a
feed - maybe mine, maybe someone else's - but the atom:updated dates
are the same in two or more entries? I thought it was up to the
publisher to decide whether to rev atom:updated.
If you don't update atom:updated (e.g. it's not a significant update, 
fixing typos, etc.), one could (I would) assume you don't want to 
archive the previous entry state.

There is very little chance that you significantly update an entry 
within the same second (that's the only way to get the same atom:updated 
value if the time-secfrac is not provided), so you shouldn't have to 
track versions of an entry with identical atom:updated values.

So you can have a MUST instead of your SHOULD.
--
Thomas Broyer


Re: PaceAllowDuplicateIDs

2005-05-12 Thread Eric Scheid

On 12/5/05 5:26 PM, Henry Story [EMAIL PROTECTED] wrote:

 Don't you have the same problem with atom:modified? What if the
 publisher does not update the atom:modified entry?

Theoretically, the publisher doesn't get that choice. To not update
atom:modified would be an error.

There remains the precision issue though.

e.



Re: PaceAllowDuplicateIDs alteration

2005-05-12 Thread David Powell


Friday, May 6, 2005, 3:52:19 PM, Eric Scheid wrote:

 On 7/5/05 12:09 AM, Graham [EMAIL PROTECTED] wrote:

 If an Atom Feed Document contains multiple entries with the same
 atom:id, software MUST treat them as multiple versions of the same
 entry
 
 I don't think this changes the technical meaning of the proposal, but
 does make it very explicit.

 +1, with one minor amendment.

 s/versions/instantiations/

 the spec uses that word elsewhere, and 'versions' might suggest a media
 adaptation, language variant, etc.

How about 'revisions'?

'Versions' could sound too broad: it could imply that the entry could
change for all sorts of reasons, conneg etc.

'Instantiations'/'instances' could sound like they could be identical
copies of the same information.

'Revisions' suggests something that changes over time, which is closer
to what we mean, I think.

-- 
Dave



RE: PaceAllowDuplicateIDs

2005-05-08 Thread Martin Duerst
At 00:12 05/05/07, Bob Wyman wrote:
Right. We have abstract feeds and entries and we have concrete feeds
and entries. The abstract feed is the actual stream of entries and updates
to entries as they are created over time. Feed documents are concrete
snapshots of this stream or abstract feed of entries. An abstract entry is
made concrete in entry documents or entry elements. An abstract entry may
change over time and may have one or more concrete instantiations.
Some applications are only interested in being exposed to those
concrete entries that reflect the current or most recent state of the
abstract entries -- these apps would prefer to see no duplicate ids in
concrete feed documents even though these duplicates *will* occur in the
abstract feed. Other applications will require visibility to the entire
stream of changes to abstract entries -- these applications will wish to see
concrete feeds that may contain multiple, differing concrete instantiations
of abstract entries. i.e. they will want the concrete feed to be an accurate
representation of the abstract feed. Two needs, two views...
You say 'some applications' and 'other applications', as if they were on
the same footing. In my view, the 'some applications' (only interested
in the latest version) should be the usual case, and the 'other
applications' (interested in more than one version) should be the
exception.
Mapping that back to the origin, applications generating feeds that in
one way or another rely on the user getting more than one, or more than
the latest, versions of their entries have made a design error: they
have taken the wrong thing for the 'entry'. If they think that they
have two different kinds of audiences, interested in two different
things, they should publish two feeds. Some people claim that we
need a definition for 'entry' to finish this discussion, but once
we confirm that a feed can only contain one version of an entry with
the same ID, the definition of entry is as clear as we need it to be.
This is just the same as for Web pages. If somebody puts up a Web page
for the current weather, there is nothing in HTTP that will help me
get the past versions of this page. If the publisher thinks that
people may be interested in past weather info, they will make up
separate pages. If we think that it would be valuable to be able
to correlate the entries in both feeds, we should define an
extension for that, not mess around with the basic model.
An extension would be rather easy, we only need two rel values
for links in entries. One rel value could be called permaentry,
the other could be called updatingentry. Maybe a third called
updatingfeed, if there is an updating feed for a single changing
entry. I'm sure there are better names.
The main use I see for documents with multiple entries with the same
ID is archives. Everything else can be handled by the creator doing
the right thing, or by an intermediary offering a new feed with versions
of the entry (no guarantee to have all of them, in that case). Even
archives could be handled that way if really needed, but it's difficult
to imagine that everybody will publish an archive feed. We can easily
define an archive top-level element, and that problem is solved.
For aggregators, wanting to forward two or more entries with the
same ID for me means that they are simply not doing their job.
Aggregators should aggregate, not just function as GIGO (garbage
in, garbage out) processors.
So it should be clear that I'm -1 on PaceAllowDuplicateIDs.
Regards, Martin.



Re: PaceAllowDuplicateIDs

2005-05-08 Thread Henry Story
[...] that
maintains the past state present. Those will be cool extensions.
Why disallow those extensions by ruling now that feeds can't have
multiple entries with the same id?
The main use I see for documents with multiple entries with the same
ID is archives. Everything else can be handled by the creator doing
the right thing, or by an intermediary offering a new feed with versions
of the entry (no guarantee to have all of them, in that case). Even
archives could be handled that way if really needed, but it's difficult
to imagine that everybody will publish an archive feed. We can easily
define an archive top-level element, and that problem is solved.
Yes, but what is the real advantage of defining an archive top-level
element that will never make it into this spec, when we can have all
that we need by being flexible? In any case, by admitting that a
top-level archive element could exist, you are admitting that there is
not really any problem with grouping multiple entry versions together.

For aggregators, wanting to forward two or more entries with the
same ID for me means that they are simply not doing their job.
Aggregators should aggregate, not just function as GIGO (garbage
in, garbage out) processors.
I wonder why we need to specify what a good aggregator is and what
it is not. Let the applications and the market decide. We just need
to make certain types of communication possible.
So it should be clear that I'm -1 on PaceAllowDuplicateIDs.
Should I be -1 on UTF-8 and internationalization because most people
I know are English, French, or German, and so ISO Latin-x is good
enough for me?
Regards, Martin.



Re: PaceAllowDuplicateIDs

2005-05-07 Thread Roger B.

 I have no good answer to that until I know what an id stands for.
 The answer 'an entry' isn't sufficient.

Bill: Semi-random thoughts...

* An atom:id is a globally unique name for a specific database query.

* There is no stream of instances over time. There's just old data
that's out of sync with that query.

* If duplicate ids are allowed, the world won't come to an end. I'm
almost sure. I think.

--
Roger Benningfield



Re: PaceAllowDuplicateIDs alteration

2005-05-07 Thread Eric Scheid

On 7/5/05 3:53 AM, Tim Bray [EMAIL PROTECTED] wrote:

 On May 6, 2005, at 8:49 AM, Eric Scheid wrote:
 
 Are they still the same entry if they have different source elements that
 identify their source as being different feeds?
 
 I don't see why. I subscribe to a Local News feed, a National News feed, and
 a Science News feed. All from the same publisher. The same story may appear
 in one, two, or three of those feeds. I don't believe each of those feeds
 would have the same feed/source values.
 
 Right but the story's atom:entry would have the same atom:id in each of
 those feeds, right?  So they are the same entry, right? -Tim
 

typo: I don't see why not.

d'oh!

(Tim: yes)

e.



Re: PaceAllowDuplicateIDs

2005-05-06 Thread Bill de hÓra
Robert Sayre wrote:
I'm much more sympathetic to the aggregate feed problem of multiple
IDs. People advocating this type of thing seem to think the default
action should be grouping, so they want to use the same ID. I think
that's a bad idea, and there are plenty of other ways to indicate the
fundamental sameness of entries. For example, NewsML URNs have a
NewsItemID and a RevisionID, which would allow smart aggregators to
group the entries without violating Atom's constraint.
Then you have two ways of indicating fundamental sameness of entries, 
one for when the same entry appears multiple times in a feed, and one 
for everything else.

Back to basics then. Does anyone remember why having the same id in a 
feed is a bad idea?

cheers
Bill


Re: PaceAllowDuplicateIDs

2005-05-06 Thread Graham
On 6 May 2005, at 2:10 pm, Dave Johnson wrote:
Yes, I think both of my arguments fail to hold and I no longer have
a real objection to duplicates. Allowing duplicates gives feed
producers the ability to model events or other objects (versioned
documents in a wiki) as they wish. Like you, I wonder: Does anyone
remember why having the same id in a feed is a bad idea?
Because instead of a fixed model where a feed is a stream of entries
each with their own id, it is now a stream of entries each of which
does not have its own id, but shares it with similar entries. This is
bullshit.

Graham


Re: PaceAllowDuplicateIDs

2005-05-06 Thread Brett Lindsley

Unique IDs allow clients to determine the state of the feed. If entry
ids are not unique, then we still need some other way to determine the
unique state of the feed. If we allow duplicate IDs but *require*
something else to be different (e.g. update time), then we can still
determine the unique state of a feed and repeated IDs are OK. We would
need to properly document which elements make an entry unique in the
event of a duplicated ID.
Brett.
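This notion of feed state can be sketched as keying each entry instance on the (atom:id, atom:updated) pair; the dict representation and function name below are hypothetical, just to make the idea concrete:

```python
def feed_state(entries):
    """Build the set of (atom:id, atom:updated) pairs that, under this
    scheme, uniquely identifies each entry instance in a feed document.
    Two instances that collide on both values are indistinguishable,
    which this scheme treats as a feed-generation error."""
    state = set()
    for entry in entries:
        key = (entry['id'], entry['updated'])
        if key in state:
            raise ValueError('indistinguishable duplicate: %r' % (key,))
        state.add(key)
    return state
```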

On 6 May 2005, at 2:10 pm, Dave Johnson wrote:
Yes, I think both of my arguments fail to hold and I no longer have
a real objection to duplicates. Allowing duplicates gives feed
producers the ability to model events or other objects (versioned
documents in a wiki) as they wish. Like you, I wonder: Does anyone
remember why having the same id in a feed is a bad idea?





PaceAllowDuplicateIDs alteration

2005-05-06 Thread Graham
As the WG may have noticed, I have some serious problems with the  
Pace. One small change would eliminate about 75% of them:

Replace the line:
If an Atom Feed Document contains multiple entries with the same  
atom:id, software MAY choose to display all of them or some subset of  
them

with:
If an Atom Feed Document contains multiple entries with the same  
atom:id, software MUST treat them as multiple versions of the same  
entry

I don't think this changes the technical meaning of the proposal, but  
does make it very explicit.

Would anyone object to this change?
Graham


RE: PaceAllowDuplicateIDs

2005-05-06 Thread Bob Wyman

Graham wrote:
Does anyone remember why having the same id in a feed is a bad idea?
 Because instead of a fixed model where a feed is a stream of
 entries each with their own id, it is now a stream of entries each
 of which does not have its own id, but shares it with similar
 entries. This is bullshit.
I completely disagree on this.
I think the problem here is people focusing too much on
characteristics of the feed when the real issue here is Entries. Like I've
said in the past, It's about the Entries, Stupid! (don't take offense...)
As long as we allow entries to be updated, it is inevitable that the
stream of entries that is created over time will contain instances of
entries that share common atom:id values. 
The only question here is whether or not we're willing to allow a
feed document to *accurately* represent the stream of entries -- as they
were created -- or whether we insist that the feed document censor the
history of the stream by removing old instances of updated entries before
allowing updates to be inserted.
The reality is that no matter which decision we make in this case,
any useful aggregator must have code to deal with multiple instances of
entries that share the same atom:id. This is the case since even if we don't
permit duplicate IDs in a single instance of a feed document, we would still
permit duplicate IDs *over time*. Because duplicate ids appear, over time,
whenever you update an entry, the aggregator has to have all the logic
needed to handle them in the *stream* of entries that it reads -- over time.
This issue only becomes interesting if we try to provide special
rules for the handling of data within a single instance of a feed document.
The reality is, however, that any aggregator that actually pays attention to
these special case rules is going to either get more complex (since it can't
simply treat everything as a stream of entries) or it will get confused
(since folk will intentionally or unintentionally create duplicate ids).
This ban on duplicate ids provides no benefit for aggregators; it
makes feed producers more complex; it tempts aggregator or client writers to
do dangerous things; it forces deletion of data that is useful to some
people for some applications; it puts too much emphasis on feeds when we
should be working on entries; etc. It is a really bad thing to do.

bob wyman




Re: PaceAllowDuplicateIDs alteration

2005-05-06 Thread Brett Lindsley

What determines a version? If we have multiple entries with identical
information, these are copies, not versions.

The real issue is feed state. Given there are identical IDs, can we
determine if the entries are identical or different? The end client can
then deal with it any way it wants (e.g.):
- Keep only the most recent.
- Display them all.
- Combine them all into a single entry with a history.
- Show changes.
- Drop duplicates.
- etc.

Sorry I don't have an answer here, just an observation of the exact problem.
Brett

Graham wrote:

If an Atom Feed Document contains multiple entries with the same  
atom:id, software MUST treat them as multiple versions of the same  
entry





Re: PaceAllowDuplicateIDs alteration

2005-05-06 Thread Tim Bray
On May 6, 2005, at 7:09 AM, Graham wrote:
Replace the line:
If an Atom Feed Document contains multiple entries with the same 
atom:id, software MAY choose to display all of them or some subset of 
them

with:
If an Atom Feed Document contains multiple entries with the same 
atom:id, software MUST treat them as multiple versions of the same 
entry
Hmmm; the Pace already says "If multiple atom:entry elements with the
same atom:id value appear in an Atom Feed document, they represent the
same entry." So what you want is almost there.

I don't think this changes the technical meaning of the proposal, but 
does make it very explicit.
My problem is that when you say software MUST, I think you should 
follow up with something specific and testable.  This assertion feels 
fairly vague and leaves room for lots of argument.  Maybe a first step 
to tightening it up would be to provide some specific examples of 
software behaviors that would be forbidden/allowed by this MUST-clause.

-Tim [who has yet to express a +1 or any other final opinion about this 
Pace]



Re: PaceAllowDuplicateIDs

2005-05-06 Thread Eric Scheid

On 6/5/05 11:37 PM, Graham [EMAIL PROTECTED] wrote:

 Because instead of a fixed model where a feed is a stream of entries
 each with their own id, it is now a stream of entries each of which
 does not have its own id, but shares it with similar entries. This is
 bullshit.

See the spec:

[...] Put another way, an atom:id element pertains to all
instantiations of a particular Atom entry or feed; revisions
retain the same content in their atom:id elements. [...]

This clearly implies that a feed is a stream of *instantiations* of an
entry.

Put another way, the map is not the territory, the entry is not the
'entry'.

e.



Re: PaceAllowDuplicateIDs alteration

2005-05-06 Thread Eric Scheid

On 7/5/05 12:09 AM, Graham [EMAIL PROTECTED] wrote:

 If an Atom Feed Document contains multiple entries with the same
 atom:id, software MUST treat them as multiple versions of the same
 entry
 
 I don't think this changes the technical meaning of the proposal, but
 does make it very explicit.

+1, with one minor amendment.

s/versions/instantiations/

the spec uses that word elsewhere, and 'versions' might suggest a media
adaptation, language variant, etc.

e.



Re: PaceAllowDuplicateIDs

2005-05-06 Thread Bill de hÓra
Graham wrote:
On 6 May 2005, at 2:10 pm, Dave Johnson wrote:
Yes, I think both of my arguments fail to hold and I no longer have a
real objection to duplicates. Allowing duplicates gives feed producers
the ability to model events or other objects (versioned documents in a
wiki) as they wish. Like you, I wonder: Does anyone remember why having
the same id in a feed is a bad idea?

Because instead of a fixed model where a feed is a stream of entries
each with their own id, it is now a stream of entries each of which
does not have its own id, but shares it with similar entries. This is
bullshit.
No, it's not, not yet. You can't reasonably call bullshit on this either
way until we know what's being identified. When I boil it down, this is
what I get:

the technical problem we have is that we can't distinguish between a
buggy feed with the same ids and an aggregate feed with the same ids
under the current spec

I have no good answer to that until I know what an id stands for.
The answer 'an entry' isn't sufficient.

cheers
Bill


Re: PaceAllowDuplicateIDs alteration

2005-05-06 Thread Eric Scheid

On 7/5/05 1:16 AM, Bob Wyman [EMAIL PROTECTED] wrote:

 Graham wrote:
 If an Atom Feed Document contains multiple entries with the
 same atom:id, software MUST treat them as multiple versions of
 the same entry
 Are they still the same entry if they have different source elements
 that identify their source as being different feeds?

I don't see why. I subscribe to a Local News feed, a National News feed, and
a Science News feed. All from the same publisher. The same story may appear
in one, two, or three of those feeds. I don't believe each of those feeds
would have the same feed/source values.

e.



Re: PaceAllowDuplicateIDs alteration

2005-05-06 Thread Robert Sayre

On 5/6/05, Bob Wyman [EMAIL PROTECTED] wrote:
 
 Graham wrote:
  If an Atom Feed Document contains multiple entries with the
  same atom:id, software MUST treat them as multiple versions of
  the same entry
 Are they still the same entry if they have different source elements
 that identify their source as being different feeds?

I would say yes. Entry IDs are globally unique. If they aren't the
same entry, it's a collision, right?

Robert Sayre



Re: PaceAllowDuplicateIDs alteration

2005-05-06 Thread Tim Bray
On May 6, 2005, at 8:49 AM, Eric Scheid wrote:
Are they still the same entry if they have different source elements
that identify their source as being different feeds?
I don't see why. I subscribe to a Local News feed, a National News feed,
and a Science News feed. All from the same publisher. The same story may
appear in one, two, or three of those feeds. I don't believe each of
those feeds would have the same feed/source values.
Right, but the story's atom:entry would have the same atom:id in
each of those feeds, right? So they are the same entry, right? -Tim



Re: PaceAllowDuplicateIDs alteration

2005-05-06 Thread Graham
On 6 May 2005, at 4:16 pm, Bob Wyman wrote:
Graham wrote:
If an Atom Feed Document contains multiple entries with the
same atom:id, software MUST treat them as multiple versions of
the same entry
Are they still the same entry if they have different source elements
that identify their source as being different feeds?
Why wouldn't they be? It would mean "Here's how the entry looked when
it was published in feed A" and "Here's how the entry looked when it
was in feed B". But if the publisher has assigned them the same ID,
that's a fairly clear expression that they're versions of the same
thing.

(obviously there's the danger of spoofing, but that's a general  
problem with IDs and not something that needs to be noted in every  
sentence)

Graham


Re: PaceAllowDuplicateIDs alteration

2005-05-06 Thread Antone Roundy
On Friday, May 6, 2005, at 09:16  AM, Bob Wyman wrote:
Graham wrote:
If an Atom Feed Document contains multiple entries with the
same atom:id, software MUST treat them as multiple versions of
the same entry
Are they still the same entry if they have different source elements
that identify their source as being different feeds?
In a perfect world with no malicious, undereducated, misinformed, 
intellectually challenged or other people who don't mint ids 
appropriately, yes, they're the same entry. In the real world, I have 
no idea. A human looking at them could probably determine whether 
they're the same if they're different enough, but if they're 
substantially similar, then even a human wouldn't necessarily be able 
to determine whether they're the same or whether one is a malicious 
alteration. There's no automated way to decide (unless their contents 
are identical).

Authors of consuming applications will have to decide whether or not to 
obey the commandment from the spec (if adopted) to treat them as being 
the same, or whether to give their users the option of making that 
decision.  Specifying that publishers who publish the same entry in 
multiple feeds MUST choose one to be the original source and express 
the rest as aggregated entries from that feed would make it much easier 
to justify treating them as different entries if they claimed to 
originate in different feeds.



Re: PaceAllowDuplicateIDs

2005-05-05 Thread Henry Story
Tonight something incredible happened to me. You won't believe it. I
was walking back from the pub when I got snapped up by a passing space
ship full of hyper-advanced aliens. They did various experiments on me,
and cloned me 1000 times. It is terrible. I just don't know what to do.

I suppose that means I am +1000 on this now.
:-) That's consensus, I am sure.
Henry
http://bblfish.net/blog/
On 5 May 2005, at 06:02, Tim Bray wrote:
co-chair-hat status=OFF
http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs
This Pace was motivated by a talk I had with Bob Wyman today about  
the problems the synthofeed-generator community has.

Summary:
1. There are multiple plausible use-cases for feeds with duplicate IDs
2. Pro and Contra
3. Alternate Paces
4. Details about this Pace
1. Use-Cases
Here's a stream of stock-market quotes.
feedtitleMy Portfolio/title
 
 entrytitleMSFT/title
  updated2005-05-03T10:00:00-05:00/updated
  contentBid: 25.20 Ask: 25.50 Last: 25.20/content/item
  /entry
 entrytitleMSFT/title
  updated2005-05-03T11:00:00-05:00/updated
  contentBid: 25.15 Ask: 25.25 Last: 25.20/content/item
  /entry
 entrytitleMSFT/title
  updated2005-05-03T12:00:00-05:00/updated
  contentBid: 25.10 Ask: 25.15 Last: 25.10/content/item
  /entry
/feed
You could also imagine a stream of weather readings. Bob's actual
here-and-now use-case from PubSub is earthquakes: an entry
describes an earthquake and they keep re-issuing it as new info
about strength/location comes in.

Some people only care about the most recent version of the entry,  
others might want to see all of them.  Basically, each atom:entry  
element describes the same Entry, only at a different point in time.
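The "most recent version" view could be sketched roughly as follows, assuming entries have already been parsed into plain dicts (the field names and function name are illustrative, not any particular library's API):

```python
def latest_versions(entries):
    """Collapse multiple instances sharing an atom:id down to the most
    recent one, using atom:updated to order them. Entries are plain
    dicts with 'id', 'updated' and 'content' keys; RFC 3339 timestamps
    with the same UTC offset compare correctly as strings."""
    latest = {}
    for entry in entries:
        current = latest.get(entry['id'])
        if current is None or entry['updated'] > current['updated']:
            latest[entry['id']] = entry
    return list(latest.values())
```

Software that wants to show every instance would simply skip this step and display the stream as-is.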

You could argue that in some cases, these are representations of  
the Web resources identified by the atom:id URI, but I don't think  
we need to say that explicitly.

Yes, you could think of alternate ways of representing stock quotes  
or any of the other use-cases but this is simple and direct and  
idiomatic.

2. Pro and Contra
Given that I issued the consensus call rejecting the last attempt
to do this, which was PaceRepeatIdInDocument, I felt nervous about
revisiting the issue. So I went and reviewed the discussion around
that one, which I extracted and placed at
http://www.tbray.org/tmp/RepeatID.txt for the WG's convenience.

Reviewing that discussion, I'm actually not impressed. There were
a few -1's but very few actual technical arguments about why this
shouldn't be done. The most common was "Software will screw this
up." On reflection, I don't believe that. You have a bunch of
Entries, some of them have the same ID and are distinguished by
datestamp. Some software will show the latest, some will show all
of them, the good software will allow switching back and forth.
Doesn't seem like rocket science to me.

So here's how I see it: there are plausible use cases for doing  
this, and one of the leading really large-scale implementors in the  
space (PubSub) wants to do this right now.  Bob's been making  
strong claims about not being able to use Atom if this restriction  
remains in place.

I believe strongly that if there's something that implementors want  
to do, standards shouldn't get in the way unless there's real  
interoperability damage.  I'm certainly prepared to believe that  
this could cause interoperability damage, but to date I haven't  
seen any convincing arguments that it will.  I think that if we  
nonetheless forbid it, people who want to do this will (a) use RSS  
instead of Atom, (b) cook up horrible kludges, or (c) ignore us and  
just do it.

So my best estimate is that the cost of allowing dupes is probably  
much lower than the cost of forbidding them.

Finally, our charter does say that we're also supposed to specify  
how you'd go about archiving feeds, and AllowDuplicateIDs makes  
this trivial.  I looked around and failed to find how we claimed we  
were going to do that while still forbidding duplicates, but it's  
possible I missed that.

3. Alternate Paces
I didn't want to just revive PaceRepeatIdInDocument, because it
used the word "version" in what I thought was kind of a sloppy way,
and because it wasn't current against format-08. I don't like
either PaceDuplicateIDWithSource or ...WithSource2; they are
complicated and don't really meet PubSub's needs anyhow. So I'm
strongly -1 on both of those. Yes, that means that if this Pace
fails, we'll allow no duplicates at all. I prefer either "dupes
OK" or "no dupes" to "dupes OK in the following circumstances";
cleaner.

4. Details
Section 4.1.2 of format-08 says that atom:entry represents an  
individual entry.  The Pace says that if you have dupes, they  
represent the same entry, which I think is consistent with both  
the letter and spirit of 4.1.2.

The Pace discourages duplicate timestamps without resorting to MUST  
language, because accidents can happen; this allows software to  
throw

Re: PaceAllowDuplicateIDs

2005-05-05 Thread Graham
On 5 May 2005, at 5:02 am, Tim Bray wrote:
<feed><title>My Portfolio</title>
 <entry><title>MSFT</title>
  <updated>2005-05-03T10:00:00-05:00</updated>
  <content>Bid: 25.20 Ask: 25.50 Last: 25.20</content>
 </entry>
 <entry><title>MSFT</title>
  <updated>2005-05-03T11:00:00-05:00</updated>
  <content>Bid: 25.15 Ask: 25.25 Last: 25.20</content>
 </entry>
 <entry><title>MSFT</title>
  <updated>2005-05-03T12:00:00-05:00</updated>
  <content>Bid: 25.10 Ask: 25.15 Last: 25.10</content>
 </entry>
</feed>
Tim, model this as a blog first. Is it:
a) One entry that's being updated?
b) Hourly new postings with the latest price?
See, I think it's b). Which under any sensible circumstance would
count as new entries, and therefore get new ids. You're trying to use
atom:id as a category system here. Let's say I post a new picture of
my cat every day. Should all my blog entries have the same id?

Technical problems:
The problem with multiple ids is that we don't have a date element that
provides a definitive answer to the question, "What is the current
version?", which 99% of the time is all an aggregator needs. For
example, what happens if I retract an update to an entry, and
presumably roll back atom:updated? The new version stays? If so, the
spec of atom:updated needs changing.

I see you have the constraint "Their atom:updated timestamps SHOULD  
be different, and processing software SHOULD regard entries with  
duplicate atom:id and atom:updated values as evidence of an error in  
the feed generation." Does this apply temporally as well as  
spatially? For example, if the content changes the second time I load  
something, but the atom:updated doesn't, is that an error?

Again, atom:updated falls short for this purpose.
Finally, at pubsub, what happens when they download an entry from one  
feed, then the user edits it, but doesn't modify atom:updated, then  
they download the new entry from a second feed associated with the  
site. Different content, identical atom:ids, identical atom:updated  
= Invalid feed. They're not in any better position than they were  
before. This doesn't even solve the problem it's meant to.
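The detection rule under discussion can at least be stated mechanically within a single document. A minimal sketch in Python, assuming the entries have already been parsed into dicts (the field names are stand-ins for the atom:id and atom:updated elements, not a real parser API):

```python
from collections import Counter

def duplicate_states(entries):
    """Return (id, updated) pairs that occur more than once.

    Per the Pace, entries sharing both atom:id and atom:updated
    are to be treated as evidence of an error in feed generation.
    """
    counts = Counter((e["id"], e["updated"]) for e in entries)
    return sorted(key for key, n in counts.items() if n > 1)

feed = [
    {"id": "urn:x-example:msft", "updated": "2005-05-03T10:00:00-05:00"},
    {"id": "urn:x-example:msft", "updated": "2005-05-03T11:00:00-05:00"},
    {"id": "urn:x-example:msft", "updated": "2005-05-03T11:00:00-05:00"},
]
```

Graham's objection still stands: this only works "spatially" within one feed document; catching the same collision across two documents fetched at different times would require the aggregator to keep state.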

"If an Atom Feed Document contains multiple entries with the same  
atom:id, software MAY choose to display all of them or some subset of  
them"

What does this even mean, other than "atom:id is meaningless, ignore  
it"?

I looked around and failed to find how we claimed we were going to  
do that while still forbidding duplicates, but it's possible I  
missed that.
Duplicate ids is a constraint of the atom:feed element. Use a  
different top level element, atom:archive, for archives.

Graham


Re: http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs

2005-05-05 Thread Graham
On 5 May 2005, at 2:20 am, Bob Wyman wrote:
Basically, you don't have to update atom:updated unless you think  
it makes sense OR you are publishing to a feed that already has an  
entry with the same atom:id as the atom:id of the entry you are  
currently publishing.
Or someone downstream is publishing one, of course. Which means you  
must always change atom:updated, just in case. No harm done, Tim.  
None at all.

Graham



Re: PaceAllowDuplicateIDs

2005-05-05 Thread Graham

On 5 May 2005, at 11:36 am, Henry Story wrote:
Tim, model this as a blog first. Is it:
a) One entry that's being updated?
b) Hourly new postings with the latest price?
Given that the ids are the same it is now clear that we have  
situation a)
I said first, before we decide what ids we should use. If created  
as a blog, Tim's stock quotes would make most sense (to me) posted as  
hourly new entries. Ergo, they should each have different ids. Again,  
atom:id is not a category system.

Either the change they made is significant or it is not. If it is  
a significant change then
by not changing the atom:updated field the user will have done  
something other than what he thought
he was doing. For by not changing the date he is allowing receiving  
software to decide by themselves
whether they wish to keep or drop the change. If it is not a  
significant change, then the receiving
software won't be doing anything problematic by either dropping the  
later version received or keeping it.
atom:updated is used by the publisher to show what they consider a  
significant change. The user, on the other hand, wants to see the  
latest version, reliably, even if the publisher disagrees that the  
change was significant. This is the core problem with Tim's proposal.  
There is no way to create an aggregator that works in the way the  
user expects.

I am lying down in the road here, as Tim would say.
Well that seems like a very complicated way of solving a problem  
where allowing entries with duplicate ids in a feed document from  
the start would be much simpler. If you are going to
allow archive feeds to keep duplicates then why not just allow   
feeds and be done with it?
Because feeds are feeds and archives are archives? They have  
different audiences and different uses and different requirements.

Graham


Re: PaceAllowDuplicateIDs

2005-05-05 Thread Robert Sayre

On 5/5/05, Eric Scheid [EMAIL PROTECTED] wrote:

  Because feeds are feeds and archives are archives? They have
  different audiences and different uses and different requirements.
 
 And what about the use case of a wiki's RecentChanges log? Each entry refers
 to a specific page, and there may be multiple such entries for each page as
 it gets rapidly edited ... and wiki folks have found it important to be able
 to monitor all change events.

I'm much more sympathetic to the aggregate feed problem of multiple
IDs. People advocating this type of thing seem to think the default
action should be grouping, so they want to use the same ID. I think
that's a bad idea, and there are plenty of other ways to indicate the
fundamental sameness of entries. For example, NewsML URNs have a
NewsItemID and a RevisionID, which would allow smart aggregators to
group the entries without violating Atom's constraint.
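Robert's NewsML-style alternative (a unique atom:id per entry, plus an extension carrying the sameness) could be sketched like this; `item_id` and `revision` are hypothetical stand-ins for such an extension, not real NewsML or Atom element names:

```python
from collections import defaultdict

def group_by_item(entries):
    """Group entries that describe the same underlying item.

    Each entry keeps its own unique atom:id; a NewsML-style
    extension supplies item_id (the sameness) and revision
    (the ordering), so a smart aggregator can group revisions
    without any duplicate atom:id values in the feed.
    """
    groups = defaultdict(list)
    for e in entries:
        groups[e["item_id"]].append(e)
    for revs in groups.values():
        revs.sort(key=lambda e: e["revision"])
    return dict(groups)

entries = [
    {"id": "urn:x-example:e2", "item_id": "quake-42", "revision": 2},
    {"id": "urn:x-example:e1", "item_id": "quake-42", "revision": 1},
]
grouped = group_by_item(entries)
```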

Robert Sayre



Re: PaceAllowDuplicateIDs

2005-05-05 Thread Graham

On 5 May 2005, at 2:26 pm, Eric Scheid wrote:
perhaps we needed atom:modified after all :-(
Yes we do, if we want to go down this route. I suggest appending the  
current time (or for old versions, the last time that version was  
current) at the source.

And what about the use case of a wiki's RecentChanges log? Each  
entry refers
to a specific page, and there may be multiple such entries for each  
page as
it gets rapidly edited ... and wiki folks have found it important  
to be able
to monitor all change events.
Each log entry is an entry in itself, with its own id. That seems a  
far better functional parallel to the basic blog feed. As with the  
share price example, the topic of the entry (the company, or the wiki  
page) is far more analogous to a category that the entry belongs to,  
than to its identity. Everyone stop trying to use ids as a  
category system.

Graham


Re: PaceAllowDuplicateIDs

2005-05-05 Thread Henry Story

On 5 May 2005, at 15:55, Graham wrote:
On 5 May 2005, at 2:26 pm, Eric Scheid wrote:

perhaps we needed atom:modified after all :-(
Yes we do, if we want to go down this route. I suggest appending  
the current time (or for old versions, the last time that version  
was current) at the source.
Sorry I don't understand why we need atom:modified.

And what about the use case of a wiki's RecentChanges log? Each  
entry refers
to a specific page, and there may be multiple such entries for  
each page as
it gets rapidly edited ... and wiki folks have found it important  
to be able
to monitor all change events.

Each log entry is an entry in itself, with its own id. That seems a  
far better functional parallel to the basic blog feed.
As I explained in my lengthy reply to your lengthy post, I think one  
should be able to do either.
Each way has its advantages and disadvantages. Let the publisher  
decide which mechanism to use.

As with the share price example, the topic of the entry (the  
company, or the wiki page) is far more analogous to a category that  
the entry belongs to, than to its identity.
Again let the publisher choose what the identity criterion of his  
objects are. Some will stick
some will not. But it is not up to us to decide for our users.

Since it does not cause any interoperability issues, what's the problem?
Everyone stop trying to use ids as a category system.
I don't think that one would be using ids as a category system. If  
you go to
http://google.com you get today's front page. Tomorrow you get  
tomorrow's front page.
What's the problem? Is http://google.com a hidden category system?

Henry Story
http://bblfish.net/blog/


Re: PaceAllowDuplicateIDs

2005-05-05 Thread Robert Sayre

On 5/5/05, Eric Scheid [EMAIL PROTECTED] wrote:
 
 On 5/5/05 11:55 PM, Graham [EMAIL PROTECTED] wrote:
 
  Each log entry is an entry in itself, with its own id.
 
 Sorry, that makes as much sense as changing the id for a blog entry if that
 blog entry is updated.

Graham's got it exactly right. 
 
 The functional parallel is wiki-page = blog-entry, and if a blog-entry is
 updated then that is reflected in the feed as an updated entry - with the
 same id.

That's right, that is the functional parallel. No software I know of 
shows both revisions of the entry in the feed when it's updated. If
you are syndicating wiki changes, part of each entry is the diff and
revision id--each revision is a unique thing.

Another analogous use case would be a feed watching a certain file in
CVS. Every entry would be about the same file, but each would have its
own atom:id.

Once again, there remains a downstream problem for PubSub, etc. 

Robert Sayre



Re: PaceAllowDuplicateIDs

2005-05-05 Thread Graham

On 5 May 2005, at 3:32 pm, Henry Story wrote:
As I explained in my lengthy reply to your lengthy post, I think  
one should be able to do either.
Each way has its advantages and disadvantages. Let the publisher  
decide which mechanism to use.
Well please flag it so that I can provide a consistent user interface  
to people's whims?

Since it does not cause any interoperability issues, what's the  
problem?
I have to come up with a new way to recognise and interpret such  
feeds where an entry (as defined by its id) isn't an entry but a feed  
of different entries.

I don't think that one would be using ids as a category system. If  
you go to
http://google.com you get today's front page. Tomorrow you get  
tomorrow's front page.
What's the problem? Is http://google.com a hidden category system?
Charter: Atom defines a feed format for representing resources such  
as Weblogs, online journals, Wikis,
and similar content

Atom is not a replacement for HTTP. Google.com is a web page, not  
similar content. It's not relevant here.

Graham


Re: PaceAllowDuplicateIDs

2005-05-05 Thread Graham

On 5 May 2005, at 3:23 pm, Eric Scheid wrote:
And what about the use case of a wiki's RecentChanges log? Each  
entry refers
to a specific page, and there may be multiple such entries for  
each  page as
it gets rapidly edited ... and wiki folks have found it important  
to be able
to monitor all change events.


Each log entry is an entry in itself, with its own id.
Sorry, that makes as much sense as changing the id for a blog entry  
if that
blog entry is updated.
Have a look here:
http://en.wikipedia.org/w/index.php?title=Main_Page&action=history
There you have a reverse chrono list, each with an author, date, and  
summary. Looks an awful lot like each one is an entry to me.

Graham


Re: PaceAllowDuplicateIDs

2005-05-05 Thread Eric Scheid

On 6/5/05 12:32 AM, Henry Story [EMAIL PROTECTED] wrote:

 Sorry I don't understand why we need atom:modified.

Graham suggested it : a reliable way for an aggregator to discern the latest
version of an entry.

 atom:updated is used by the publisher to show what they consider a
 significant change. The user, on the other hand, wants to see the
 latest version, reliably, even if the publisher disagrees that the
 change was significant. This is the core problem with Tim's proposal.
 There is no way to create an aggregator that works in the way the
 user expects.

... no way, that is, unless we have atom:modified making each same-id entry
distinct, and not just distinct but also time-ordered and time-distanced (an
advantage over just using something similar to the NewsML RevisionID
mechanism).
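A hypothetical atom:modified element bumped on every change, as Eric describes, would give an aggregator a total order over same-id entries. A sketch of the "show me the latest" behaviour (RFC 3339 timestamps that share one UTC offset compare correctly as plain strings):

```python
def latest_version(versions):
    """Pick the version a 'show me the latest' aggregator displays.

    versions: dicts for entries sharing one atom:id, each carrying
    a hypothetical atom:modified value that changes on *every*
    edit, unlike atom:updated, which only marks changes the
    publisher considers significant.
    """
    return max(versions, key=lambda v: v["modified"])

history = [
    {"modified": "2005-05-05T12:45:30Z", "content": "draft"},
    {"modified": "2005-05-05T12:45:31Z", "content": "typo fixed"},
]
```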

e.



Re: PaceAllowDuplicateIDs

2005-05-05 Thread Eric Scheid

On 6/5/05 12:45 AM, Graham [EMAIL PROTECTED] wrote:

 Have a look here:
 http://en.wikipedia.org/w/index.php?title=Main_Page&action=history
 
 There you have a reverse chrono list, each with an author, date, and
 summary. Looks an awful lot like each one is an entry to me.

and looks to me like a stream of meta-data concerning the one entry to me.

and not distinct and separable entries like you'd find in your every day
blog.

Henry has the right idea -- the spec should allow both kinds, rather than
trying to shoe-horn everything into the one viewpoint of what is an entry.

e.



Re: PaceAllowDuplicateIDs

2005-05-05 Thread Henry Story

On 5 May 2005, at 16:38, Graham wrote:
On 5 May 2005, at 3:32 pm, Henry Story wrote:
As I explained in my lengthy reply to your lengthy post, I think  
one should be able to do either.
Each way has its advantages and disadvantages. Let the publisher  
decide which mechanism to use.
Well please flag it so that I can provide a consistent user  
interface to people's whims?
What is the problem with the user interface that you have exactly? I  
have pointed you to
BlogEd that keeps a history of all the changes to an entry. Try it out:
http://blogs.sun.com/roller/page/bblfish/
It's open source, so you can also copy the code.

If you don't want to keep a history of the entries all you need to do  
is drop all but the
latest entry with the same id. There is nothing more to it. Just show  
the user the last
one you came across.

Since it does not cause any interoperability issues, what's the  
problem?
I have to come up with a new way to recognise and interpret such  
feeds where an entry (as defined by its id) isn't an entry but a  
feed of different entries.
No you don't. Just drop the old ones, if you don't care about the  
history. Really simple.
As Tim Bray's text says

 [[
   software MAY choose to display all of them or some subset of them
 ]]
So just drop the older versions.
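The drop-all-but-the-latest behaviour Henry describes is a one-pass dedup. A sketch, assuming later document order wins ties when atom:updated values are equal (the dict field names stand in for the atom elements):

```python
def drop_older_duplicates(entries):
    """Keep only the newest occurrence of each atom:id.

    A later entry with an equal or newer atom:updated replaces
    an earlier one, matching 'just show the user the last one
    you came across'. Feed order of surviving ids is preserved.
    """
    latest = {}
    for e in entries:
        kept = latest.get(e["id"])
        if kept is None or e["updated"] >= kept["updated"]:
            latest[e["id"]] = e
    return list(latest.values())

feed = [
    {"id": "a", "updated": "2005-05-03T10:00:00Z", "content": "old"},
    {"id": "a", "updated": "2005-05-03T11:00:00Z", "content": "new"},
    {"id": "b", "updated": "2005-05-03T10:30:00Z", "content": "only"},
]
```

A historical viewer would simply skip this step and keep every state.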
I don't think that one would be using ids as a category system. If  
you go to
http://google.com you get today's front page. Tomorrow you get  
tomorrow's front page.
What's the problem? Is http://google.com a hidden category system?
Charter: Atom defines a feed format for representing resources  
such as Weblogs, online journals, Wikis,
and similar content
yes, and it must also allow the representation of
[[
   * a complete archive of all entries in a feed
]]
This proposal permits this, and it does not harm anyone else.
Atom is not a replacement for HTTP. Google.com is a web page, not  
similar content. It's not relevant here.
I don't know where you get the idea that I said atom is a replacement  
for HTTP. Take a breath
perhaps and relax before you answer.

Graham



Re: PaceAllowDuplicateIDs

2005-05-05 Thread David M Johnson
I'm -1 on PaceAllowDuplicateIDs
Reasons:
1) We're supposed to be standardizing current practice not inventing 
new things. Current best practice is to have unique IDs and current 
software (e.g. Javablogs.com) is predicated on this practice. I know, 
this practice is not followed widely enough, but that is another 
matter.

2) I think it is *much* more useful to think of an Atom Entry as an 
event that occurred at a specific time. Typically, an event is the 
publication of an article or blog entry on the web. For example:

   event: CNET published article
   subject: CNET
   object: article
But an entry could also represent other events.
   event: delivery van delivers package
   subject: delivery van
   object: package
   event: alarm system sends warning
   subject: alarm system
   object: warning
   event: server sends load warning
   subject: server
   object: load warning
If you think of Atom Entries as events, then it makes sense to consider 
the Atom Entry ID to be the ID of the event, not the ID of the subject 
or object of the event. Events are unique (you can't have more than one 
version of an event) and can be assigned GUIDs and therefore you cannot 
have more than one entry with the same ID.

In the case of earthquake data, each new data report is a new event.
   event: agency reports earthquake data
   subject: agency
   object: earthquake data
The ID is the ID of the data reported event not the ID of the 
earthquake.

We don't know what subjects and objects people are going to use in the 
future, so we can't specify Atom elements or IDs for subjects and 
objects -- that's what extensions are for. If you want to create a feed 
to syndicate information about earthquakes, then you introduce an 
extension for uniquely identifying earthquakes. The same goes for 
any other subjects and objects.

- Dave

On May 5, 2005, at 12:02 AM, Tim Bray wrote:
<co-chair-hat status="OFF">
http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs
This Pace was motivated by a talk I had with Bob Wyman today about the 
problems the synthofeed-generator community has.

Summary:
1. There are multiple plausible use-cases for feeds with duplicate IDs
2. Pro and Contra
3. Alternate Paces
4. Details about this Pace
1. Use-Cases
Here's a stream of stock-market quotes.
<feed><title>My Portfolio</title>
 
 <entry><title>MSFT</title>
  <updated>2005-05-03T10:00:00-05:00</updated>
  <content>Bid: 25.20 Ask: 25.50 Last: 25.20</content>
  </entry>
 <entry><title>MSFT</title>
  <updated>2005-05-03T11:00:00-05:00</updated>
  <content>Bid: 25.15 Ask: 25.25 Last: 25.20</content>
  </entry>
 <entry><title>MSFT</title>
  <updated>2005-05-03T12:00:00-05:00</updated>
  <content>Bid: 25.10 Ask: 25.15 Last: 25.10</content>
  </entry>
</feed>
You could also imagine a stream of weather readings.  Bob's actual 
here-and-now today use-case from PubSub is earthquakes, an entry 
describes an earthquake and they keep re-issuing it as new info about 
strength/location comes in.

Some people only care about the most recent version of the entry, 
others might want to see all of them.  Basically, each atom:entry 
element describes the same Entry, only at a different point in time.

You could argue that in some cases, these are representations of the 
Web resources identified by the atom:id URI, but I don't think we need 
to say that explicitly.

Yes, you could think of alternate ways of representing stock quotes or 
any of the other use-cases but this is simple and direct and 
idiomatic.

2. Pro and Contra
Given that I issued the consensus call rejecting the last attempt to 
do this, which was  PaceRepeatIdInDocument, I felt nervous about 
revisiting the issue.  So I went and reviewed the discussion around 
that one, which I extracted and placed at 
http://www.tbray.org/tmp/RepeatID.txt for the WG's convenience.

Reviewing that discussion, I'm actually not impressed.  There were a 
few -1's but very few actual technical arguments about why this 
shouldn't be done.  The most common was "Software will screw this up." 
 On reflection, I don't believe that.  You have a bunch of Entries, 
some of them have the same ID and are distinguished by datestamp.  
Some software will show the latest, some will show all of them, the 
good software will allow switching back and forth.  Doesn't seem like 
rocket science to me.

So here's how I see it: there are plausible use cases for doing this, 
and one of the leading really large-scale implementors in the space 
(PubSub) wants to do this right now.  Bob's been making strong claims 
about not being able to use Atom if this restriction remains in place.

I believe strongly that if there's something that implementors want to 
do, standards shouldn't get in the way unless there's real 
interoperability damage.  I'm certainly prepared to believe that this 
could cause interoperability damage, but to date I haven't seen any 
convincing arguments that it will.  I think that if we nonetheless 
forbid it, people who want to do this will (a) use RSS instead of 
Atom, (b) cook

Re: PaceAllowDuplicateIDs

2005-05-05 Thread Antone Roundy
On Thursday, May 5, 2005, at 09:15  AM, Eric Scheid wrote:
Have a look here:
http://en.wikipedia.org/w/index.php?title=Main_Page&action=history
There you have a reverse chrono list, each with an author, date, and
summary. Looks an awful lot like each one is an entry to me.
and looks to me like a stream of meta-data concerning the one entry to 
me.

and not distinct and separable entries like you'd find in your every 
day
blog.

Henry has the right idea -- the spec should allow both kinds, rather 
than
trying to shoe-horn everything into the one viewpoint of what is an 
entry.

+1 -- allow the publisher to decide which model fits their intent.


Re: PaceAllowDuplicateIDs

2005-05-05 Thread Dave Johnson
Immediately after sending this message, I had a rush  of second 
thoughts.

My point #2 is not very well thought out. I think it applies for things 
like earthquake data, but when Atom feeds represent blog entries or 
articles (in an archive or an Atom Protocol feed) the  ID represents 
the article not an event in the blog entry's life.  So, you can 
discount my second reason against the pace.

- Dave

On May 5, 2005, at 11:27 AM, David M Johnson wrote:
I'm -1 on PaceAllowDuplicateIDs
Reasons:
1) We're supposed to be standardizing current practice not inventing 
new things. Current best practice is to have unique IDs and current 
software (e.g. Javablogs.com) is predicated on this practice. I know, 
this practice is not followed widely enough, but that is another 
matter.

2) I think it is *much* more useful to think of an Atom Entry as an 
event that occurred at a specific time. Typically, an event is the 
publication of an article or blog entry on the web. For example:

   event: CNET published article
   subject: CNET
   object: article
But an entry could also represent other events.
   event: delivery van delivers package
   subject: delivery van
   object: package
   event: alarm system sends warning
   subject: alarm system
   object: warning
   event: server sends load warning
   subject: server
   object: load warning
If you think of Atom Entries as events, then it makes sense to 
consider the Atom Entry ID to be the ID of the event, not the ID of 
the subject or object of the event. Events are unique (you can't have 
more than one version of an event) and can be assigned GUIDs and 
therefore you cannot have more than one entry with the same ID.

In the case of earthquake data, each new data report is a new event.
   event: agency reports earthquake data
   subject: agency
   object: earthquake data
The ID is the ID of the data reported event not the ID of the 
earthquake.

We don't know what subjects and objects people are going to use in the 
future, so we can't specify Atom elements or IDs for subjects and 
objects -- that's what extensions are for. If you want to create a 
feed to syndicate information about earthquakes, then you introduce an 
extension for uniquely identifying earthquakes. The same goes for 
any other subjects and objects.

- Dave

On May 5, 2005, at 12:02 AM, Tim Bray wrote:
<co-chair-hat status="OFF">
http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs
This Pace was motivated by a talk I had with Bob Wyman today about 
the problems the synthofeed-generator community has.

Summary:
1. There are multiple plausible use-cases for feeds with duplicate IDs
2. Pro and Contra
3. Alternate Paces
4. Details about this Pace
1. Use-Cases
Here's a stream of stock-market quotes.
<feed><title>My Portfolio</title>
 
 <entry><title>MSFT</title>
  <updated>2005-05-03T10:00:00-05:00</updated>
  <content>Bid: 25.20 Ask: 25.50 Last: 25.20</content>
  </entry>
 <entry><title>MSFT</title>
  <updated>2005-05-03T11:00:00-05:00</updated>
  <content>Bid: 25.15 Ask: 25.25 Last: 25.20</content>
  </entry>
 <entry><title>MSFT</title>
  <updated>2005-05-03T12:00:00-05:00</updated>
  <content>Bid: 25.10 Ask: 25.15 Last: 25.10</content>
  </entry>
</feed>
You could also imagine a stream of weather readings.  Bob's actual 
here-and-now today use-case from PubSub is earthquakes, an entry 
describes an earthquake and they keep re-issuing it as new info about 
strength/location comes in.

Some people only care about the most recent version of the entry, 
others might want to see all of them.  Basically, each atom:entry 
element describes the same Entry, only at a different point in time.

You could argue that in some cases, these are representations of the 
Web resources identified by the atom:id URI, but I don't think we 
need to say that explicitly.

Yes, you could think of alternate ways of representing stock quotes 
or any of the other use-cases but this is simple and direct and 
idiomatic.

2. Pro and Contra
Given that I issued the consensus call rejecting the last attempt to 
do this, which was  PaceRepeatIdInDocument, I felt nervous about 
revisiting the issue.  So I went and reviewed the discussion around 
that one, which I extracted and placed at 
http://www.tbray.org/tmp/RepeatID.txt for the WG's convenience.

Reviewing that discussion, I'm actually not impressed.  There were a 
few -1's but very few actual technical arguments about why this 
shouldn't be done.  The most common was "Software will screw this 
up."  On reflection, I don't believe that.  You have a bunch of 
Entries, some of them have the same ID and are distinguished by 
datestamp.  Some software will show the latest, some will show all of 
them, the good software will allow switching back and forth.  Doesn't 
seem like rocket science to me.

So here's how I see it: there are plausible use cases for doing this, 
and one of the leading really large-scale implementors in the space 
(PubSub) wants to do this right now.  Bob's been making strong claims 
about not being able to use Atom

Re: PaceAllowDuplicateIDs

2005-05-05 Thread Graham
On 5 May 2005, at 4:22 pm, Henry Story wrote:
If you don't want to keep a history of the entries all you need to  
do is drop all but the
latest entry with the same id. There is nothing more to it. Just  
show the user the last
one you came across.
But, if we follow Eric's model of how a wiki changelog should be  
defined, I'll be missing entries in the log, because several  
different entries have the same id. Ergo, the user interface and data  
model for the new type of feed this proposal permits is very different.

This proposal permits this, and it does not harm anyone else.
It harms everyone, by allowing a second, unrelated data model in Atom  
feeds. They may not be posting today, but I assure you, when other  
aggregator authors get the first user complaints about how Eric's  
wiki log displays incompletely in their program, they'll forgive Dave  
Winer everything.

Graham


Re: PaceAllowDuplicateIDs

2005-05-05 Thread Henry Story
Hi Dave,
 nice to see you participate here. I understand your points, and  
I myself thought the
way you did for a while.

[Oops, I see now that you have retracted your point. Oh well. I had  
already started writing
the following]

On 5 May 2005, at 17:27, David M Johnson wrote:
I'm -1 on PaceAllowDuplicateIDs
Please consider the following points before you vote.
Reasons:
1) We're supposed to be standardizing current practice not  
inventing new things. Current best practice is to have unique IDs  
and current software (e.g. Javablogs.com) is predicated on this  
practice. I know, this practice is not followed widely enough, but  
that is another matter.
Atom is standardizing current practice, but it is also adding some  
features. For example, namespaces and ids. The atom charter also 
requires us to allow archives

[[
  * a complete archive of all entries in a feed
 ]]
Graham himself thinks that archives are possible, since he supports  
the use of an
archive head element.

2) I think it is *much* more useful to think of an Atom Entry as an  
event that occurred at a specific time. Typically, an event is the  
publication of an article or blog entry on the web. For example:

   event: CNET published article
   subject: CNET
   object: article
But an entry could also represent other events.
   event: delivery van delivers package
   subject: delivery van
   object: package
   event: alarm system sends warning
   subject: alarm system
   object: warning
   event: server sends load warning
   subject: server
   object: load warning
If you think of Atom Entries as events, then it makes sense to  
consider the Atom Entry ID to be the ID of the event, not the ID of  
the subject or object of the event.
You are right. There are two types of objects that we need to think  
about:
   A- the event/state of a resource at a particular time
   B- the thing that makes these different states the state of the  
same thing

Clearly we need (B) or else all the talk about an entry changing over  
time (atom:updated)
would not make sense.

So let us start off, as I did a long time ago, by thinking that the  
id of an entry
uniquely identifies the event/state of the entry. For every id there  
can be one and
only one <entry>...</entry> representation. That id is that  
representation. It is, if you
wish, the name of a state of something else... and that would be?

I think it is clear that one of the roles of the id is to make it  
possible for an
entry to be moved from one web site to another, so that if your blog  
service provider
lets you down, you can still refer to the entry even when you have  
moved it to a
different alternate position. Graham has made such a point quite  
often. Entries it
has often been said can change, but the id remains the same. I think  
this is clearly
the consensus on this list. So the id URI is what identifies the  
different
<entry>...</entry> representations as being representations of the  
same thing.


Events are unique (you can't have more than one version of an  
event) and can be assigned GUIDs and therefore you cannot have more  
than one entry with the same ID.
yes. But I don't think that this is the consensus on this group. The  
good thing is that
you can achieve the same identification of a state through the  
combination of the id and the
modification time.
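Henry's distinction, that the id names the thing while the (id, updated) combination names one of its states, can be made concrete; the class and field names here are illustrative only, not spec elements:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EntryState:
    """One state of an entry.

    Two states with equal ids but different updated values are
    different states of the same underlying thing; the id alone
    is what survives edits and moves between feeds.
    """
    id: str       # atom:id, shared by all states of one entry
    updated: str  # atom:updated, distinguishing the states

s1 = EntryState("urn:x-example:quake-42", "2005-05-03T10:00:00Z")
s2 = EntryState("urn:x-example:quake-42", "2005-05-03T11:00:00Z")
```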

[here I noticed that you had changed your mind, anyway. I think I had  
exactly the same
thought as you did when I first started thinking about this. ]


In the case of earthquake data, each new data report is a new event.
   event: agency reports earthquake data
   subject: agency
   object: earthquake data
The ID is the ID of the data reported event not the ID of the  
earthquake.

We don't know what subjects and objects people are going to use in  
the future, so we can't specify Atom elements or IDs for subjects  
and objects -- that's what extensions are for. If you want to  
create a feed to syndicate information about earthquakes, then you  
introduce an extension for uniquely identifying earthquakes. The  
same goes for earthquakes.

- Dave




Re: PaceAllowDuplicateIDs

2005-05-05 Thread Henry Story

On 5 May 2005, at 17:53, Graham wrote:
On 5 May 2005, at 4:22 pm, Henry Story wrote:
If you don't want to keep a history of the entries all you need to  
do is drop all but the
latest entry with the same id. There is nothing more to it. Just  
show the user the last
one you came across.

But, if we follow Eric's model of how a wiki changelog should be  
defined, I'll be missing entries in the log, because several  
different entries have the same id. Ergo, the user interface and  
data model for the new type of feed this proposal permits is very  
different.
If your tool (Is it Shrook2? [1]) only shows people the latest  
version available to you of
an entry, then by showing them only the latest version, Shrook2 will  
be giving the user what
he is expecting.

When your news reader currently reads feeds on the internet what does  
it do
with changed entries? Either it keeps the older version around, for  
the user to browse, or it
does not. If your users don't mind you throwing away the older  
versions of an entry, then
they won't mind you throwing away the older versions of the above  
entries either. There is no
difference in the behavior between allowing changed entries across  
feed documents and changed
entries inside a feed document. People who place two entries with the  
same id inside a feed
document should be aware that tools like yours will have the behavior  
they do, and that this is
ok.

Other people may be interested in looking at things historically.  
They will get a historical
viewer and be happy with it.

I think the current proposal is good exactly because it allows the  
wiki people to express
what they want to express correctly. Namely how their wiki entry is  
changing over time.

This proposal permits this, and it does not harm anyone else.
It harms everyone, by allowing a second, unrelated data model in  
Atom feeds. They may not be posting today, but I assure you, when  
other aggregator authors get the first user complaints about how  
Eric's wiki log displays incompletely in their program, they'll  
forgive Dave Winer everything.
Again, has anyone yet complained to you that you have not kept a  
historical and browse-able track
record of how the entries Shrook2  is looking at have changed over  
time? Clearly they could,
as you sometimes let them know that an entry that they already have  
read has been updated. They
could ask you what the changes were, no? How it changed, etc.

If your users don't care that much about the history of an entry,  
then you can dump all but the
latest entry. Or you could just keep the last two entries, so that  
you can show them a diff.

Graham
HJStory
http://bblfish.net/blog/
[1] http://www.fondantfancies.com/apps/shrook/



Re: PaceAllowDuplicateIDs

2005-05-05 Thread Eric Scheid

On 6/5/05 1:53 AM, Graham [EMAIL PROTECTED] wrote:

 This proposal permits this, and it does not harm anyone else.
 
 It harms everyone, by allowing a second, unrelated data model in Atom
 feeds. They may not be posting today, but I assure you, when other
 aggregator authors get the first user complaints about how Eric's
 wiki log displays incompletely in their program, they'll forgive Dave
 Winer everything.

Many wikis offer options in displaying their change log with either most
recent changes only, or all changes. Both models are commonly supported
because some people want to see notifications of all changes, while others
just want to see the most recent change. That is part of wiki culture, all
the way back to ward's wiki.

It wouldn't be surprising to find the same options made available for wiki
logs in rss. Hey, here's one right now

http://www.intertwingly.net/wiki/pie/RecentChanges?action=rss_rc

Apparently, if you add a unique=1 URL parameter you get a list of changes
where page names are unique, i.e. where only the latest change of each page
is reflected.

e.



Re: PaceAllowDuplicateIDs

2005-05-05 Thread Antone Roundy
On Thursday, May 5, 2005, at 08:44  AM, Antone Roundy wrote:
If we accept this Pace, are we going to do anything to address the DOS 
issue for aggregated feeds?
Bob, if I may direct a few question to you, since you have the most 
experience with this issue: if PaceAllowDuplicateIDs is adopted, how 
would you anticipate that PubSub would go about handling entries with 
the same atom:id coming from different feeds?  What if each appears to 
be claiming to be the original feed for the entry?  What if both are 
getting aggregated into the same feed, but your system doesn't think 
they're really the same entry?

I'm in favor of the Pace, as far as it goes, but was surprised to see 
that it doesn't talk about these issues, given that it was motivated by 
a conversation with you.



Re: PaceAllowDuplicateIDs

2005-05-05 Thread Graham
On 5 May 2005, at 5:38 pm, Eric Scheid wrote:
Many wiki's offer options in displaying their change log with  
either most
recent changes only, or all changes. Both models are commonly  
supported
because some people want to see notifications of all changes, while  
others
just want to see the most recent change. That is part of wiki  
culture, all
the way back to ward's wiki.
OK that makes sense. I still think it's the wrong way to model a  
change log as a feed.

My other two criticisms still stand:
atom:updated is used by the publisher to show what they consider a  
significant change. The user, on the other hand, probably wants to  
see the latest version, reliably, even if the publisher disagrees  
that the change was significant. This is the core problem with Tim's  
proposal. There is no way to create an aggregator that works in the  
way the user expects.

Finally, at pubsub, what happens when they download an entry from  
one feed, then the user edits it, but doesn't modify atom:updated,  
then they download the new entry from a second feed associated with  
the site? Different content, identical atom:ids, identical  
atom:updated = Invalid feed. They're not in any better position than  
they were before. This doesn't even solve the problem it's meant to.

Basically, atom:updated doesn't properly differentiate versions, and  
the way atom:updated is being used by the proposal doesn't gel with  
the actual spec of the element.
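[Editorial note: Graham's collision case can be sketched concretely. The fragment below is illustrative only — the entry tuples and identifiers are assumptions, not anything from the Pace or from PubSub's code — but it shows why (atom:id, atom:updated) alone cannot distinguish the two versions:]

```python
# Two entries that share both atom:id and atom:updated but carry
# different content are indistinguishable under the Pace's rule;
# a processor can only flag them as feed generation errors.
def find_collisions(entries):
    """Return (atom:id, atom:updated) keys that recur with differing content."""
    seen = {}
    collisions = []
    for entry_id, updated, content in entries:
        key = (entry_id, updated)
        if key in seen and seen[key] != content:
            collisions.append(key)
        seen[key] = content
    return collisions

# Hypothetical example: an entry silently edited without touching atom:updated.
versions = [
    ("tag:example.org,2005:entry-1", "2005-05-05T12:00:00Z", "original text"),
    ("tag:example.org,2005:entry-1", "2005-05-05T12:00:00Z", "silently edited text"),
]
```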

Graham


Re: PaceAllowDuplicateIDs

2005-05-05 Thread John Panzer
Graham wrote:
On 5 May 2005, at 5:38 pm, Eric Scheid wrote:
Many wiki's offer options in displaying their change log with  either 
most
recent changes only, or all changes. Both models are commonly  supported
because some people want to see notifications of all changes, while  
others
just want to see the most recent change. That is part of wiki  
culture, all
the way back to ward's wiki.

OK that makes sense. I still think it's the wrong way to model a  
change log as a feed.

My other two criticisms still stand:
atom:updated is used by the publisher to show what they consider a  
significant change. The user, on the other hand, probably wants to  
see the latest version, reliably, even if the publisher disagrees  
that the change was significant. This is the core problem with Tim's  
proposal. There is no way to create an aggregator that works in the  
way the user expects.
Just a thought:  On the other hand, perhaps this is an opportunity to 
operationally define significant change:  A change which results in a 
new version being exposed on one's feed.  If you think your users would 
care about seeing the change, then change the atom:updated field and 
'republish' by adding to the feed.  If not, just change your content and 
don't republish.

Examples of this might include:  Fixing irrelevant typos.  Changing 
character set encodings.  Changing formatting to match a new style guide.

-John


http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs

2005-05-04 Thread Bob Wyman

+1 with a comment:
If this
Pace is accepted (and I hope it will be) the issue of Duplicate IDs should probably
be dealt with in Mark's Implementation Guide.[1] 
Atom
supports the publishing of newer versions of an entry which use the
same atom:id as earlier versions of the same entry. It is not required that
atom:updated be modified when a newer version is written. 

If the
PaceAllowDuplicateIDs is accepted, it will be permitted to have multiple
entries with the same atom:id in a single feed. However, the Pace language says
processors SHOULD regard as feed generation errors any entries
which duplicate both the atom:id and atom:updated of another entry in the same
feed. Thus, feed authors who wish to publish feeds with duplicate atom:ids
should ensure that any entry which duplicates an entry already in the feed has
a different value for atom:updated. This constraint is not a requirement of the
language, but it is a clear derivative of it.

Basically,
you don't have to update atom:updated unless you think it makes sense OR
you are publishing to a feed that already has an entry with the same atom:id as
the atom:id of the entry you are currently publishing.
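[Editorial note: the derived constraint Bob describes can be enforced mechanically on the publishing side. The sketch below uses an assumed data model — a list of (atom:id, atom:updated) pairs — and is not PubSub's actual code:]

```python
# Per the Pace, a duplicate atom:id is allowed, but an entry that
# duplicates both atom:id and atom:updated of an entry already in
# the feed should be given a new timestamp before being appended.
def can_append(feed_entries, new_id, new_updated):
    """feed_entries is a list of (atom:id, atom:updated) pairs."""
    return all((eid, upd) != (new_id, new_updated)
               for eid, upd in feed_entries)

# Hypothetical feed with one published entry.
feed = [("tag:example.org,2005:e1", "2005-05-04T18:00:00Z")]
```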
 bob wyman
[1] http://diveintomark.org/rfc/draft-ietf-atompub-impl-guide-00.html

Re: http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs

2005-05-04 Thread Tim Bray

On May 4, 2005, at 6:20 PM, Bob Wyman wrote:
+1 with a comment:

If this Pace is accepted (and I hope it will be) the issue of 
Duplicate IDs should probably be dealt with in Mark's Implementation 
Guide.[1]
Er, I had planned to refine this a bit and then announce it to the 
group with some explanations and some other background research I did; 
so how about I promise to do that later this evening; and please 
consider waiting for that before you all pile in, pro or contra. -Tim



PaceAllowDuplicateIDs

2005-05-04 Thread Tim Bray
<co-chair-hat status="OFF">
http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs
This Pace was motivated by a talk I had with Bob Wyman today about the 
problems the synthofeed-generator community has.

Summary:
1. There are multiple plausible use-cases for feeds with duplicate IDs
2. Pro and Contra
3. Alternate Paces
4. Details about this Pace
1. Use-Cases
Here's a stream of stock-market quotes.
<feed><title>My Portfolio</title>

 <entry><title>MSFT</title>
  <updated>2005-05-03T10:00:00-05:00</updated>
  <content>Bid: 25.20 Ask: 25.50 Last: 25.20</content>
 </entry>
 <entry><title>MSFT</title>
  <updated>2005-05-03T11:00:00-05:00</updated>
  <content>Bid: 25.15 Ask: 25.25 Last: 25.20</content>
 </entry>
 <entry><title>MSFT</title>
  <updated>2005-05-03T12:00:00-05:00</updated>
  <content>Bid: 25.10 Ask: 25.15 Last: 25.10</content>
 </entry>
</feed>
You could also imagine a stream of weather readings.  Bob's actual 
here-and-now today use-case from PubSub is earthquakes, an entry 
describes an earthquake and they keep re-issuing it as new info about 
strength/location comes in.

Some people only care about the most recent version of the entry, 
others might want to see all of them.  Basically, each atom:entry 
element describes the same Entry, only at a different point in time.

You could argue that in some cases, these are representations of the 
Web resources identified by the atom:id URI, but I don't think we need 
to say that explicitly.

Yes, you could think of alternate ways of representing stock quotes or 
any of the other use-cases but this is simple and direct and idiomatic.

2. Pro and Contra
Given that I issued the consensus call rejecting the last attempt to do 
this, which was  PaceRepeatIdInDocument, I felt nervous about 
revisiting the issue.  So I went and reviewed the discussion around 
that one, which I extracted and placed at 
http://www.tbray.org/tmp/RepeatID.txt for the WG's convenience.

Reviewing that discussion, I'm actually not impressed.  There were a 
few -1's but very few actual technical arguments about why this 
shouldn't be done.  The most common was "Software will screw this up."  
On reflection, I don't believe that.  You have a bunch of Entries, some 
of them have the same ID and are distinguished by datestamp.  Some 
software will show the latest, some will show all of them, the good 
software will allow switching back and forth.  Doesn't seem like rocket 
science to me.
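[Editorial note: the "show the latest" behaviour Tim describes reduces to a few lines of code. This is a minimal sketch under an assumed data model — tuples of (atom:id, atom:updated, content) — not anything from the Pace itself:]

```python
# Collapse duplicate atom:ids, keeping only the most recent entry.
def latest_only(entries):
    latest = {}
    for entry_id, updated, content in entries:
        # ISO 8601 timestamps with a common offset sort correctly as strings.
        if entry_id not in latest or updated > latest[entry_id][0]:
            latest[entry_id] = (updated, content)
    return latest

# The stock-quote feed above, with a hypothetical URN as the shared atom:id.
quotes = [
    ("urn:example:quote:MSFT", "2005-05-03T10:00:00-05:00", "Last: 25.20"),
    ("urn:example:quote:MSFT", "2005-05-03T11:00:00-05:00", "Last: 25.20"),
    ("urn:example:quote:MSFT", "2005-05-03T12:00:00-05:00", "Last: 25.10"),
]
```

"Show all of them" is then just the original list, and a good reader can toggle between the two views.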

So here's how I see it: there are plausible use cases for doing this, 
and one of the leading really large-scale implementors in the space 
(PubSub) wants to do this right now.  Bob's been making strong claims 
about not being able to use Atom if this restriction remains in place.

I believe strongly that if there's something that implementors want to 
do, standards shouldn't get in the way unless there's real 
interoperability damage.  I'm certainly prepared to believe that this 
could cause interoperability damage, but to date I haven't seen any 
convincing arguments that it will.  I think that if we nonetheless 
forbid it, people who want to do this will (a) use RSS instead of Atom, 
(b) cook up horrible kludges, or (c) ignore us and just do it.

So my best estimate is that the cost of allowing dupes is probably much 
lower than the cost of forbidding them.

Finally, our charter does say that we're also supposed to specify how 
you'd go about archiving feeds, and AllowDuplicateIDs makes this 
trivial.  I looked around and failed to find how we claimed we were 
going to do that while still forbidding duplicates, but it's possible I 
missed that.

3. Alternate Paces
I didn't want to just revive PaceRepeatIdInDocument, because it used 
the word version in what I thought was kind of a sloppy way, and 
because it wasn't current against format-08.  I don't like either 
PaceDuplicateIDWithSource or ...WithSource2, they are complicated and 
don't really meet PubSub's needs anyhow.  So I'm strongly -1 on both of 
those.  Yes, that means that if this Pace fails, we'll allow no 
duplicates at all.  I prefer either "dupes OK" or "no dupes" to "dupes 
OK in the following circumstances"; cleaner.

4. Details
Section 4.1.2 of format-08 says that atom:entry "represents an 
individual entry."  The Pace says that if you have dupes, they 
represent the same entry, which I think is consistent with both the 
letter and spirit of 4.1.2.

The Pace discourages duplicate timestamps without resorting to MUST 
language, because accidents can happen; this allows software to throw 
such entries on the floor while positively encouraging noisy 
complaining.  On the other hand, if the WG wanted either to insist on a 
MUST here or remove the discouragement altogether I could live with 
that.

Finally, it makes it clear that if there are entries with duplicate 
atom:id, software is free to display all or a subset, and calls out the 
likely common case where you discard all but the most recent.  If I 
were Brent Simmons or equivalent, I'd be coding up a button where you