Re: IBM IPR Disclosure

2007-02-24 Thread Bob Wyman

On 2/15/07, James M Snell [EMAIL PROTECTED] wrote:

IBM has agreed to a blanket commitment to Royalty Free terms
for any IPR that reads on the Standards Track specifications
produced by the atompub Working Group. ...
We think the Atom Syndication Format and the Atom Publishing
Protocol are really important.


Thank you.

bob wyman


Re: Quoting type parameter value allowed? - Was: I-D ACTION:draft-ietf-atompub-typeparam-00.txt

2007-01-19 Thread Bob Wyman

On 1/19/07, Andreas Sewe [EMAIL PROTECTED] wrote:

So, it looks like that quoting the type parameter's values is no
longer allowed;

Are the quotes part of the parameter value? Or, are quotes merely delimiters
of the value? If RFC2045 is read to indicate that the quotes are delimiters,
then it would not be in conflict with RFC4288 since in both cases, feed
would be interpreted as being the value 'feed'...
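
A minimal sketch of the "quotes are merely delimiters" reading, using the parameter parsing in Python's standard email.message module. The helper name type_param is hypothetical and only for illustration; nothing here is taken from either RFC's text.

```python
from email.message import Message


def type_param(content_type):
    """Return the value of the `type` parameter, or None if it is absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_param() unquotes by default, i.e. it treats the quotes as
    # delimiters rather than as part of the parameter value.
    return msg.get_param("type")


quoted = type_param('application/atom+xml;type="feed"')
bare = type_param("application/atom+xml;type=feed")
```

Under this reading both spellings yield the same value, so a processor comparing against 'feed' does not need to care which form the sender used.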

bob wyman


Re: Inheritance of license grants by entries in a feed

2007-01-14 Thread Bob Wyman

On 1/14/07, David Powell [EMAIL PROTECTED] wrote:

You can't just say that the license extension inherits and
expect every implementation out there to implement that.
You'd need an Atom 2.0 to do that: either support for
must-understand (which was rejected from Atom 1.0),
or a special feed document extension container.


An implementation should only do things based on the license extension if it
understands what the license extension means. Since the draft now has
carefully written words to ensure that license extensions only grant
additional rights and do not restrict default rights, the worst case
situation is that an implementation that doesn't understand the license
extension inheritance will simply treat entries as though they only had
normal, copyright-defined rights associated with them. i.e. You would get
fair use, implied right to syndicate, right to read, right to make
facilitative copies, etc. but you wouldn't realize that you also get
whatever extra rights were granted by the license. This is, I think, a
reasonable fall-back. Of course, implementations that do understand that
feed-level licenses are inherited will be able to manage rights just a bit
better. This is a good thing.

A failure to properly implement license inheritance tends to limit what the
*reader* believes they can do with entries, but it doesn't do any harm to
the owner of the intellectual property in the entries since no one can
believe that they have rights not granted. The worst that can happen is that
readers don't know all the rights they have. This is acceptable, in my
opinion.
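
The argument can be sketched as set arithmetic. The right names and the function believed_rights below are hypothetical labels, not anything defined by the draft; the only assumption carried over from the text is that licenses add rights and never remove them.

```python
# Hypothetical labels for the default, copyright-defined rights named above.
DEFAULT_RIGHTS = frozenset(
    {"fair-use", "syndicate", "read", "facilitative-copies"}
)


def believed_rights(entry_grants, feed_grants, understands_inheritance):
    """Return the set of rights a reader believes it holds for one entry.

    License grants only add to DEFAULT_RIGHTS; nothing here can remove one.
    """
    rights = set(DEFAULT_RIGHTS) | set(entry_grants)
    # Only an inheritance-aware reader picks up feed-level grants, and only
    # when the entry carries no license of its own.
    if understands_inheritance and not entry_grants:
        rights |= set(feed_grants)
    return rights


feed_grants = {"commercial-reuse"}  # hypothetical extra grant on the feed
naive = believed_rights(set(), feed_grants, understands_inheritance=False)
aware = believed_rights(set(), feed_grants, understands_inheritance=True)
```

The naive reader ends up with a subset of what the aware reader sees, which is the "readers don't know all the rights they have" outcome and nothing worse.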

bob wyman


Re: Inheritance of license grants by entries in a feed

2007-01-14 Thread Bob Wyman

On 1/14/07, David Powell [EMAIL PROTECTED] wrote:

Atom doesn't describe the processing model of Atom
documents explicitly enough for me to infer much about
the semantics of atom:source. ...
Needing to [use atom:source] is a good sign that you
are abusing feed elements to carry entry metadata
though.


There are quite a few very common, non-abusive reasons for using
atom:source. For instance, the RFC clearly discusses the case where an entry
is copied from one feed document into another and needs to maintain its
association with the feed metadata of the source feed. There is also the
question of signatures.

In any case, I read the Atom spec as clearly intending that an entry with an
atom:source element can be semantically equivalent to a single entry feed
document whose feed meta-data is equivalent to that contained in the entry's
atom:source. If this isn't what appears to be written, then I suggest that
it is a case of non-optimal drafting and the history of this group should be
consulted to clarify the intent. I explained why entries with source needed
to be equivalent to single entry feeds when I made the original proposal for
atom:source at the first Atom community meeting at Sun in June of 2004 and I
made it continuously throughout the process of drafting the RFC. This is
also one of the many reasons why Atom assigns no significance to the order
of atom:entry elements within the feed. The meaning of an entry derives
only from data which is either encoded within it or which is recorded as
part of the feed metadata associated with the entry. That association is
either by containment within a feed document or, more strongly, by
encapsulating the feed metadata within the entry. This equivalence property
is essential in order to make aggregated/synthetic feeds work and it is
necessary to make licensing work properly. (Yes, there were some of us
thinking about licensing long before James made his proposal...) Thus, the
processing model for an entry with an atom:source is just as precisely
described as the processing model for a single entry feed document...
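
As a rough illustration of that equivalence, the sketch below rebuilds a single-entry feed document from an entry that carries atom:source, using Python's xml.etree.ElementTree. The function name as_single_entry_feed and the sample ids are made up for the example; this is one possible reading of the intent, not text from the RFC.

```python
import copy
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def as_single_entry_feed(entry):
    """Build an atom:feed whose metadata is the entry's atom:source content."""
    source = entry.find(ATOM + "source")
    if source is None:
        raise ValueError("entry has no atom:source; feed metadata is unknown")
    feed = ET.Element(ATOM + "feed", dict(source.attrib))
    for child in source:
        feed.append(copy.deepcopy(child))  # feed-level metadata
    feed.append(copy.deepcopy(entry))      # the entry itself, left unmodified
    return feed


ENTRY = ET.fromstring("""<entry xmlns="http://www.w3.org/2005/Atom">
  <id>urn:example:entry-1</id>
  <title>Hello</title>
  <updated>2007-01-14T00:00:00Z</updated>
  <source>
    <id>urn:example:feed</id>
    <title>Example Feed</title>
    <updated>2007-01-14T00:00:00Z</updated>
  </source>
</entry>""")

feed = as_single_entry_feed(ENTRY)
```

The entry is copied without changes (its atom:source stays in place), so anything that depends on the entry's exact content, such as a signature, is not disturbed by the wrapping.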

bob wyman


Re: Inheritance of license grants by entries in a feed

2007-01-14 Thread Bob Wyman

On 1/14/07, David Powell [EMAIL PROTECTED] wrote:

I agree that it is important to distinguish between feeds
and feed documents, and this is why I think that feed
level inheritance of licenses should be dropped as it is
incompatible with Atom.

Inheritance can't be incompatible with Atom since Atom defines it.
I do agree with you, however, if you argue that Atom would have been cleaner
without inheritance. Without inheritance, feed level meta-data would only
apply to the collection which contains entries and not to the entries
themselves. Without inheritance, we wouldn't need atom:source -- we would
have only needed atom:provenance (a simple link to an entry's origin feed
similar to the source element in RSS. Note: Synthetic feed producers still
would have wanted atom:source as a convenient way to reduce the need to
repeatedly fetch feed documents to get atom:title values.) However, folk
really wanted to keep inheritance of the feed metadata and so we ended up
having to define something more complex.

bob wyman


Re: Fwd: Atom format interpretation question

2007-01-05 Thread Bob Wyman

On 1/4/07, James M Snell [EMAIL PROTECTED] wrote:

If the NewsML folks want to be able to use a proper
mediatype to identify their stuff AND treat it as XML,
they should come up with an appropriate media type
registration (e.g. application/newsml+xml, etc).


Did the +xml convention ever get formalized in some RFC? I know we all
*think* that tacking +xml onto the end of something means that it is some
use of XML, however, if I remember correctly, this little bit of syntax has
never actually been formalized... Or have I missed something? Is there an
RFC that defines what +xml means?

bob wyman


Re: I-D ACTION:draft-ietf-atompub-typeparam-00.txt

2007-01-02 Thread Bob Wyman


This document looks good on an initial quick read -- with one possible
exception. It says:


Atom processors that do recognize the parameter SHOULD
detect and report inconsistencies between the parameter's
value and the actual type of the document's root element.


This would seem to be creating a directive concerning behavior which
is not directly related to interoperation between systems. (I'm
assuming that the destination of the reports is the user of the
application, a log file, or something like that.) Thus, it seems to me
that it might be inappropriate to use the SHOULD word since IETF apps
are supposed to be focused on interoperation and are supposed to avoid
constraining application behavior unnecessarily. May I suggest that
you rewrite this sentence in a manner similar to that below:

It is strongly recommended that Atom processors that do recognize the
parameter detect and report 

bob wyman



Re: base within HTML content

2007-01-01 Thread Bob Wyman

On 1/1/07, Geoffrey Sneddon [EMAIL PROTECTED] wrote:

Why, may I ask, MUST (under the RFC 2119 definition) HTML
content be a fragment ("HTML markup within SHOULD be such
that it could validly appear directly within an HTML DIV
element, after unescaping." - note the word SHOULD, not
MUST, implying that you can have a full HTML document within)?


What would you do if you wanted to display a feed of 10 entries in
newspaper style (i.e. all entries in a single HTML page) yet each of the
entries had a different BASE defined? It wouldn't do you much good to move
all the base elements to the HEAD of the DOM tree -- you'd just end up with
a mess. If you want a local base, then use xml:base. That's what it is for.

The same problem exists for other page-global stuff. For instance, XHTML
modularization is useless if you're creating Atom entries since that stuff
relies on elements in HEAD but, an Atom entry ain't got no head.

Remember as well that not all of the entries in a feed document need be
created by the same person. For instance, with aggregated or synthetic
feeds, you end up with entries written by many different authors who have no
chance of negotiating how they will divide the global resources that might
be used to display their entries. Because some entries may be signed, you
can't simply say something like "just rewrite the entries" -- that would
break the signatures.

It is good that Atom entries should be fragments. That increases to a great
degree the variety of environments in which Atom entries are useful. If you
feel constrained by this, I would suggest that you push on those who define
HTML and get them to provide mechanisms for allowing fragment-local
expression of things that at this time can only be expressed as page-global.
(Yes, I realize this will take some time.)

bob wyman


Re: Inheritance of license grants by entries in a feed

2006-12-18 Thread Bob Wyman

On 12/17/06, David Powell [EMAIL PROTECTED] wrote:


What you can do however, is to specify that feed licenses apply to the
feed, and inherit to the entries in the feed. ... It
means that the license applies to all entries in that feed, not just
ones in that specific feed document. This is probably reasonable
behaviour for licenses anyway.



Particularly in the case of licenses, it is very important to distinguish
between the feed or stream of all entries (past, present and future)
associated with a feed id and the actual feed documents that encapsulate
subsets of that stream. Atom provides no mechanism for associating meta-data
with feeds. Atom only supports associating meta-data with Feed Documents.
Data in one feed document does not apply to entries found in another feed
document -- or to entries that stand-alone. Feed meta-data found in one feed
document does not override, complement or invalidate feed meta-data found in
other feed documents. This is one of the many reasons we have atom:source --
so that we can bind specific feed meta-data to an entry no matter what
context in which that entry might appear or when it might be read.

If we had a case where data in one feed document overrides data in other
feed documents, we'd have a mess. Some of the questions that we'd have to
answer are:

  - Elements like atom:author, atom:contributor and atom:rights can and
  do change over time -- sometimes frequently. If such a change occurs, does
  it mean that we've implied a change to all previous entries published in all
  previously published feed documents? This rule would tend to force us to
  create new feeds (i.e. new feed ids) whenever authors, contributors,
  or rights change. This would make a mess for aggregators and feed readers
  who have enough trouble keeping up with changes in the syndisphere...
  - If data is present in one feed doc but not another later document,
  does the absence of the data in the later document override the previous
  document or do we combine what we know from both documents? (i.e. if
  the earlier Feed document had an atom:contributor field but a later Feed
  Document does not, does this mean that we wipe out knowledge of the
  contributor who might have been essential to creating some of the earlier
  entries? (That's kind of heartless -- it would be a high tech version of
  "What have you done for me lately?"...) Or, do we improperly maintain the
  old contributor as a contributor of new entries --potentially long after the
  contributor has died?)
  - How do we handle mistakes? For instance, if after publishing several
  thousand feed documents in sequence, I might publish one that accidentally
  grants all rights to everyone when I really meant to grant only
  non-commercial rights. Does the new badly coded feed document force all of
  the thousands of entries I've been working on over time into the public
  domain even if that wasn't what I intended?
  - How do I repair the mistake discussed immediately above? If I
  publish a new feed document with a license grant for non-commercial use,
  does that then apply to all previously published entries -- including those
  that were accidentally published with over-generous rights? Does this mean
  that I can use a license in one feed document to *restrict* or rescind
  rights granted by another feed document? (This would be a very bad
  thing...)
  - If I have an aggregator that picks up some content which is licensed
  for general use today, how can I be sure that I can still use the content
  tomorrow? If the content of a Feed Document applies retroactively, it would
  seem that I have to re-fetch the feed every time I use content from the feed
  so that I can check the metadata. This doesn't seem to make sense. If I were
  sued by someone, could I use the argument: "But, I didn't read the new, more
  restrictive Feed Documents!" Is ignorance an excuse?

I could go on...But, I hope the case is made. Feed Documents only describe
themselves and the entries they contain. They do not describe the feed.


if you store a feed in an implementation such as
Microsoft's Feed Engine, only a single set of feed
extensions will be associated with the feed.


While it is important to be aware of the inadequacies (as well as the
strengths) of implementations by companies with significant market power, I
don't think that we can simply delegate the standards writing process to
such companies or modify standards to cover up their bugs. The fact that
Microsoft or any other company has done the wrong thing should not, in
itself, be sufficient to dictate the development of standards. Hopefully,
they will eventually see the error in their ways and correct them.

bob wyman


Re: AD Evaluation of draft-ietf-atompub-protocol-11

2006-12-17 Thread Bob Wyman

On 12/16/06, A. Pagaltzis [EMAIL PROTECTED] wrote:


Extending the Atom envelope is a strategy of last resort.



+1

It is important to remember that not all processors of Atom data will know
what to do with unexpected metadata in the envelope. Thus, unexpected
envelope fields will often simply be stripped off and thrown to the bit
bucket. If you want data to stay with your content, it is best to put it
in the <content/>... Sometimes, it may be appropriate to extend the
envelope, however, one should not do so without a really compelling case.

Envelope extensions typically require fetching time or database structure
modifications in consuming applications if those extensions are to be
supported. This is because many feed consumers have distinct fields in their
databases or internal structures for each of the envelope elements and then
just have a single field for content. Also, the code for manipulating
envelope fields is usually distinctly different from the code used to
manipulate and process <content/>. So, if you create a new envelope field,
you require a great deal of code to be modified for that field to be
supported. On the other hand, if something can be slipped into <content/>
you'll see it being stored immediately and have the opportunity for
downstream consumers (display routines, etc.) to provide support for the
additional data. (For instance, you might write a GreaseMonkey script to do
interesting things with stuff encoded in <content/> even though the
backend of the application knows nothing about it.)

My personal feeling is that many of the proposals (but not all) for envelope
extensions are derived from what I consider to be unfortunate precedent set
in the RSS world where all sorts of random stuff has been pushed into the
envelope since in RSS the <description/> field is so under-specified that it
isn't really possible to think of it as something which can be structured.
Fortunately, the field has moved forward since legacy RSS was defined and
we've got better methods that can be used with Atom. There are undoubtedly
still things that might go in the envelope, but not as many as some folk
might think.

bob wyman


Inheritance of license grants by entries in a feed

2006-12-16 Thread Bob Wyman

In general, I think the latest version of James Snell's license ID [1] is
much better than earlier versions. I am particularly pleased that this draft
only speaks of license grants. I remain, as always, opposed to anything
that would encourage people to attempt to restrict the implied license to
syndicate. I do, however, have a few small issues. The text on inheritance
is, I think, almost correct in this draft however, as written it seems to
create a risk of the incorrect granting of rights as well as unfortunate
loss or decay of grants when entries are copied between feeds.

The current draft states: (focus on the underlined bits. The first
underlined sentence is too restrictive, the second too inclusive.)

2.3. Inherited Licenses

The license on a feed MAY be inherited by entries. Generally, a more
specific license overrides the less specific license. More specifically,
if an entry has any license link relations at all, including the
undefined license, it does not inherit the license of
the feed. If an entry has no license link relations, it does inherit the
license of its parent feed(s). Since an entry may appear in multiple
feeds, it may inherit multiple licenses. This is equivalent to stating
multiple licenses on the entry itself.



I am concerned that some readers who are not intimately familiar with
RFC4287 may not understand that entries which contain atom:source elements
do NOT inherit feed metadata from the feeds in which they are found. The
text of the current draft seems to override this constraint on inheritance.
Thus, I propose the following new wording for the third and fourth sentences
in the first paragraph of section 2.3 (the ones quoted and underlined
above):

More specifically, if an entry has any license link relations at all,
including the undefined license, [or, if the entry contains an
atom:source element,] it does not inherit the license of the feed. If an
entry has no license link relations[, and contains no atom:source
element,] it does inherit the license of its parent feed(s).



Additionally, I believe that this draft should align with the handling of
atom:rights defined in section 4.2.11 of RFC4287 by adding the following
text at some appropriate location:

If an atom:entry which does not contain an atom:source is copied from one
feed into another feed then if the feed into which it is copied contains a
license, an atom:source element SHOULD be added to the copied entry. If a
source feed contains a license, that license SHOULD be preserved in an
atom:source element added to any entries copied from the source feed which
do not already contain atom:source elements.



The first constraint is necessary to ensure that the act of copying entries
does not result in rights being granted by the copyist even though those
rights were not granted by the entry's author. The second constraint
helps to prevent the loss or decay of rights as things are copied from
feeds with licenses that grant rights into feeds that contain no or lesser
grants.
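
A sketch of what a copyist following both constraints might do, again with xml.etree.ElementTree. The helper names and the example feeds are invented for illustration; a real aggregator would also have to deal with signatures, xml:base and similar details that are ignored here.

```python
import copy
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def has_license(feed):
    """True if the feed carries at least one atom:link rel="license"."""
    return any(
        link.get("rel") == "license" for link in feed.findall(ATOM + "link")
    )


def copy_entry(entry, source_feed, target_feed):
    """Copy entry into target_feed, adding atom:source when a license is
    present on either side and the entry does not already have one."""
    copied = copy.deepcopy(entry)
    needs_source = copied.find(ATOM + "source") is None and (
        has_license(source_feed) or has_license(target_feed)
    )
    if needs_source:
        source = ET.SubElement(copied, ATOM + "source")
        for child in source_feed:
            if child.tag != ATOM + "entry":  # feed metadata only
                source.append(copy.deepcopy(child))
    target_feed.append(copied)
    return copied


SRC = ET.fromstring("""<feed xmlns="http://www.w3.org/2005/Atom">
  <id>urn:example:src</id>
  <link rel="license" href="http://example.org/generous"/>
  <entry><id>urn:example:e1</id></entry>
</feed>""")
DST = ET.fromstring("""<feed xmlns="http://www.w3.org/2005/Atom">
  <id>urn:example:dst</id>
</feed>""")

copied = copy_entry(SRC.find(ATOM + "entry"), SRC, DST)
preserved = [
    link.get("href")
    for link in copied.find(ATOM + "source").findall(ATOM + "link")
]
```

The grant made by the source feed travels with the copied entry, and the entry can no longer pick up any license that the destination feed might add later.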

I realize that clarifying these constraints on inheritance allows for at
least one odd result. That is, I might have a feed which contains entries
whose atom:source elements declare license grants that differ greatly from
what is seen in the feed's metadata even though all those entries claim the
enclosing feed as their source. This actually makes a good bit more sense
than it might seem to at first glance. The reason for this is that the
rights granted for entries added to a feed can change over time even though
changes to the feed's default rights may not impact previously created
entries. Thus, a feed might have granted liberal rights when an entry was
first created but might not offer the same grants when the entry was
updated. The author should be able to maintain with the entry the rights
that were originally granted (or not granted) rather than being forced to
update the rights in order to do something as simple as a spelling
correction. (Yes, I realize that the author could, in some cases, simply
attach the old rights to the updated entry rather than using an atom:source
which contains the same information. However, this can get messy in some
situations and causes us to lose some information about the source of the
license grants -- it may be useful in some cases to distinguish between
licenses granted in feed metadata and those granted in entry metadata.
Forcing attachment of licenses to entries would also require using the
undefined license in more cases than is desirable.)

I've got a few other comments -- destined for other messages. Nonetheless,
this draft is looking much better than earlier drafts.

bob wyman

[1]
http://www.ietf.org/internet-drafts/draft-snell-atompub-feed-license-10.txt


License Draft: Tortured text re obligations...

2006-12-16 Thread Bob Wyman

There is, I think, a bit of tortured text in James Snell's otherwise useful
License ID[1].

1.3. Terminology
...
The term license refers to a potentially machine-readable description of
explicit rights, and associated obligations, that have been granted to
consumers of an Atom feed or entry.

The problem is the underlined clause... One can't grant an obligation.
(When you have a conjunction, you should be able to scan the sentence with
only one element of the conjunction without losing meaning...) As written,
the sentence can be read by nitpicking lawyers as: "The term 'license'
refers to obligations that have been granted..." Clearly, this isn't the
intent. Thus, I propose the following rewording:

The term license refers to a potentially machine-readable description of
explicit rights that have been granted to consumers of an Atom feed or
entry. Rights granted by a license may be associated with obligations which
must be assumed by those exercising those rights.

I realize that this is a bit more wordy than the existing text, however, I
think it better preserves the author's intent. Also, it has the nice
attribute of limiting the discussion of obligations to the scope of rights
granted by the licenses -- not rights that might exist in the absence of the
license. Nothing we do should encourage people to use in-feed or in-entry
data to restrict rights which exist independent of an explicit license
grant. Such rights may include fair-use rights, the right to create backups,
the implied right to syndicate, etc. As with Creative Commons licenses, I
believe our goal here should be to provide mechanisms to expand the rights
granted -- not to restrict them.

bob wyman

[1]
http://www.ietf.org/internet-drafts/draft-snell-atompub-feed-license-10.txt


Re: Atom Entry docs

2006-12-15 Thread Bob Wyman

On 12/15/06, Hugh Winkler [EMAIL PROTECTED] wrote:


It's telling that James felt it natural to choose the name type for
the parameter. Because it really is naming a new type of document.



What would be better than type? Might root work better?

It seems to me that application/atom+xml;type=entry describes an Atom
document whose root element is <entry/>. The type of the document is
atom but it is a kind or type of atom document that has an <entry/>
element as its root. Unfortunately, type is being used to mean two
completely different things in this context.

Would you be happier if the proposal was for the following?

   application/atom+xml;root=entry
   application/atom+xml;root=feed

One argument for using root is that it might be a usage that would be
useful with other mediatypes which have more than one possible root element.
Also, using root as the parameter name would ensure that folk don't get
confused into thinking that there is any kind of subtyping going on here --
specifying ;type=root is simply providing meta-data which describes a
constrained use of the general atom type -- it is no different from doing
something like saying: "I won't accept any feeds that don't have <icon/>
elements." or, "This feed contains no more than 256 entry elements." If one
is being exceptionally formal or overly pedantic, I can see how you might
argue that a feed constrained to fewer than 257 entries is somehow a
sub-type or sub-class of the more general atom type. But, since every
distinct instance of the atom type can be described in similar manners, it
would mean that every atom instance is a subtype. In some contexts, this
observation might be useful. I don't think, however, that such precision is
useful in the realm for which we normally are designing Atom...

bob wyman


Re: Atom Entry docs

2006-12-13 Thread Bob Wyman

There is, I think, a compromise position here which will avoid breaking
those existing implementations which follow the existing RFCs.

1) Define ;type=feed and ;type=entry as optional parameters. (i.e. get them
defined, registered, and ready to use.)
2) Leave RFC4287 unchanged. i.e. do NOT re-define application/atom+xml
3) New specifications MAY require that ;type=entry be used. (Note: Just
because ;type=entry is used DOES NOT imply that ;type=feed must also be
used)

Thus, APP would accept application/atom+xml when looking for a feed but
might insist that entries be explicitly identified with a disambiguating
type parameter. Thus, no code which currently uses application/atom+xml to
designate a feed would be broken. Additionally, any code which is properly
built and thus ignores unknown types will not be hurt when it sees
application/atom+xml;type=entry since it will ignore the type parameter
and dig around inside the data to figure out if it is feed or entry. The
only code which will be hurt is some potential code that does not follow the
existing RFCs for Atom or mime types. It is, I think, OK to occasionally
break code that doesn't follow the specs.
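
A sketch of the kind of consumer this compromise assumes: it honours the optional type parameter when it recognizes the value and otherwise falls back to looking at the root element. The function name atom_kind is hypothetical; the parsing relies on Python's standard email.message and xml.etree.ElementTree modules.

```python
import io
import xml.etree.ElementTree as ET
from email.message import Message

ATOM = "{http://www.w3.org/2005/Atom}"


def atom_kind(content_type, body):
    """Return "feed", "entry", or None for a response's header and body."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "application/atom+xml":
        return None
    declared = (msg.get_param("type") or "").lower()
    if declared in ("feed", "entry"):
        return declared
    # Parameter absent or not understood: dig around inside the data.
    # The first "start" event from iterparse is the root element.
    for _event, root in ET.iterparse(io.BytesIO(body), events=("start",)):
        if root.tag == ATOM + "feed":
            return "feed"
        if root.tag == ATOM + "entry":
            return "entry"
        return None
    return None


FEED_DOC = b'<feed xmlns="http://www.w3.org/2005/Atom"/>'
ENTRY_DOC = b'<entry xmlns="http://www.w3.org/2005/Atom"/>'
```

Code written this way keeps working whether or not a sender ever adopts ;type=entry, which is the point of leaving the existing media type definition alone.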

Whatever the technical arguments may be, I believe it is important from a
political point of view that we do not change the definition of things
defined in Atom. I am all for extending Atom, but not for changing Atom. We
must not change the existing specification unless there is some really
serious harm being done. If we do, we risk losing the trust of at least some
members of the community that we've built these last few years... Folk will
remember that one of the advantages that is claimed for RSS is that it has
been declared to be eternally free from modification. While I personally
believe that that is silly, the proponents of RSS do have a point when they
speak of the value of stable specs. If we allow the Atom spec to be
*changed* so soon after it was accepted and we don't have a really, really
good reason for doing it, we will simply have proven the often made claim
that standards groups simply can't be trusted with important specifications.
We will be encouraging more of the kind of standards making that resulted
in the mess that is RSS...

bob wyman

PS: Since Kyle points out that GData, a Google product, is potentially
impacted by the results of this discussion, I should state that I currently
work for Google -- although I am not currently assigned to any product or
project that has a direct interest in the definition of Atom, APP, etc... My
participation in this discussion, at this time, is driven purely by personal
interest.


Re: PaceEntryMediatype

2006-12-10 Thread Bob Wyman

On 12/10/06, Eric Scheid [EMAIL PROTECTED] wrote:

The only danger [of defining a new media type] is if someone has
implemented APP per the moving target which is Draft-[n++] ... they should
revise their test implementations as the draft updates, and certainly
update once it reaches RFC status, so no sympathies there.


The impact here is not just limited to APP implementations. If a new media
type is defined, it will undoubtedly appear in other contexts as well. Given
the current definition of the atom syntax, it is perfectly reasonable for an
aggregator to treat a single entry as the semantic equivalent of a
single-entry feed. If a new media type is defined, such an application would
end up having to be modified. That's not right... APP is not the only
context within which Atom is used.

bob wyman


Re: Fwd: PaceEntryMediatype

2006-12-09 Thread Bob Wyman

On 12/8/06, James M Snell [EMAIL PROTECTED] wrote:


I'm fine with the type parameter approach so long as it is effective.
By effective I mean: Will existing implementations actually take the
time to update their behavior to properly handle the optional type
parameter.



It would be useful to define better what is meant by properly handle the
optional type parameter. Those that don't understand the parameter should
simply continue to operate on the current assumption that they can't really
be sure if they are reading a feed or an entry until they read the first few
bytes. Those that do understand the meaning of the optional parameter will
be writing code in the future and we can hope that if they become aware of
the type parameter and decide to care about it, they will have sufficient
awareness to do whatever they do in a proper manner.

The only case where I can see a problem would be those folk who match
against the existing media type as an opaque string and don't have any code
to handle optional type parameters. Such sloppy code would be broken by the
use of the optional type parameters since the presence of the parameter
would break the simple string matches used by these coders. However, I must
admit that I don't have much sympathy for such folk. Making basic design
decisions to address the concerns of these sloppy folk is something like the
old prejudice against using XML attributes since it tended to make it harder
to create sloppy, regex based parsers... In any case, the alternative
proposal, create a new media type for entries, would tend to confuse people
who have their code written properly today --- those whose code understands
that the existing atom mediatype can be used for both a feed and an entry.
What we would be doing by creating a new media type is break the code of the
folk who paid attention to the spec in order to preserve the code of those
who didn't read the spec (or those who refused to see Atom as anything other
than some twisted form of RSS...) This doesn't make sense to me. We should
use the type parameter if anything is changed here.
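
To make the "opaque string" problem concrete, here is a small contrast between the two styles of matching. Both function names are made up for the example, and the tolerant version is only one simple way to ignore parameters.

```python
ATOM_MEDIA_TYPE = "application/atom+xml"


def naive_is_atom(content_type):
    """Opaque string comparison: breaks as soon as any parameter appears."""
    return content_type == ATOM_MEDIA_TYPE


def is_atom(content_type):
    """Compare only the type/subtype, ignoring parameters, case and spaces."""
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence == ATOM_MEDIA_TYPE


header = "application/atom+xml;type=entry"
naive_result = naive_is_atom(header)
tolerant_result = is_atom(header)
```

The naive check already fails today for something as ordinary as a charset parameter, so the optional type parameter does not introduce a new class of breakage for such code.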

bob wyman


Re: rss reader

2006-12-09 Thread Bob Wyman

On 12/9/06, Greger [EMAIL PROTECTED] wrote:




hi
I have made a prototype rss reader. All is good. just wondering if anyone
would be interested in getting down to create a C++ library for in
particular
atom, but also the other used rss feed types.
anyone working on this kinds of things?



If you're going to be building a C++ library for syndication, you might want
to look at the Microsoft RSS Platform for some inspiration. See:
http://msdn.microsoft.com/XML/rss/default.aspx

I believe that Microsoft's is currently the most comprehensive platform for
handling syndication feeds. But, there is much that can be done to improve
on what they've done.

bob wyman


Fwd: PaceEntryMediatype

2006-12-08 Thread Bob Wyman

On 12/5/06, James M Snell [EMAIL PROTECTED] wrote:

Mark Baker wrote:
It's just an entry without a feed.  You'd use the same code
path to process that entry whether it were found in an entry
or feed document, right?

Not necessarily... The majority of applications that most
frequently handle Atom Feed Documents have no idea how
to deal with Atom Entry Documents and I would wager that
most applications that understand how to process Atom Entry
Documents and Atom Feed Documents typically don't fall into
the same category as most feed readers.

   What you seem to be implying is that the majority of applications that
process Atom Feed documents are not, in fact, supporting extremely important
parts of the atom specification. I believe that any properly constructed
Atom Feed parser will contain all the code needed to parse the most complex
Atom Entry document. And, an entry document with an atom:source is
semantically equivalent to an atom:feed with a single entry...  The problem
here is that people insist on building Atom parsers that aren't capable of
handling more semantics than legacy RSS. What we should be doing is
encouraging people to exploit Atom and use its features -- atom:source among
others -- that aren't supported by RSS.
For a parser that properly handles the case of an atom:entry appearing
within atom:feed, it should be trivially simple code to recognize and
handle an entry without a feed wrapper. I think there are even cases where
this makes sense -- and you would even want to subscribe to such a thing:
   Consider a feed that communicates current weather or current stock
price, etc. We wouldn't be surprised if such a feed never contained more
than a single entry. We also wouldn't be surprised if the publisher of this
single entry feed decided that he wanted to sign the entry in this
single-entry-feed and was thus forced to insert all of the feed data into
the entry's atom:source. Of course, once you've got a single-entry Atom feed
which contains a signed entry, you have all the feed data duplicated -- so,
it wouldn't be surprising to see authors of such feeds argue that they
shouldn't be forced to waste bits on duplicated feed data when an atom entry
document provides exactly what they need.
   In any case, while it appears reasonable (and sometimes efficient) for
people to subscribe to Entry documents, I don't think we should do anything
disruptive unless someone can establish actual harm being caused by the
current state of affairs.
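
The point above about one code path serving both document types can be sketched roughly as follows. This is only an illustration, not anyone's actual parser: the function names entries_of and entry_titles are made up for the example, and it assumes plain Atom 1.0 documents in the standard namespace.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def entries_of(document_text):
    """Return the entry elements of an Atom Feed or Atom Entry document."""
    root = ET.fromstring(document_text)
    if root.tag == ATOM_NS + "feed":
        # Feed document: the entries are direct children of atom:feed.
        return root.findall(ATOM_NS + "entry")
    if root.tag == ATOM_NS + "entry":
        # Entry document: treat it as a one-entry list, no feed wrapper needed.
        return [root]
    raise ValueError("not an Atom Feed or Entry document: " + root.tag)

def entry_titles(document_text):
    """The same per-entry code runs regardless of which document type arrived."""
    return [entry.findtext(ATOM_NS + "title") for entry in entries_of(document_text)]
```

Everything after the root-element check is shared, which is the sense in which handling a bare entry costs a feed parser very little.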

bob wyman


RE: atom license extension (Re: [cc-tab] *important* heads up)

2006-09-07 Thread Bob Wyman

John Panzer asks of Karl Dubost:
 (Let's say that  Doc Searls somehow discovers a license that
 would deny sploggers more than implied rights to his content
 while allowing liberal use for others[1], and deploys it.
  Are you saying that all of his readers' feed software would
 have to drop his feed content until they're upgraded to
 understand the license?) [1] http://doc.weblogs.com/2006/08/28
I think John's question can be (aggressively) rephrased as: Can Doc
Searls, by inserting a license in his feed, 'poison' the entire syndication
system that we've built over the last few years? (i.e. Can he do things
that make it unsafe or illegal for people to do things which the syndication
system was intentionally built to permit and which he knew were being done
before he willingly inserted his content into the syndication network?) I
don't think so.
As argued in other messages, I strongly believe that we should not
do anything that hinders or conflicts with the establishment or recognition
of a limited implied license to syndicate content which is formatted using
RSS/Atom and is made openly available on the network. (An interesting
question, of course, would be: What does it mean to 'syndicate'?)
In any case, there is a general problem of proper notice here. As
mentioned before, there is nothing special about an optional IETF protocol
extension. This subject of inserting licenses in content should be discussed
in a general sense -- not limited to this specific protocol extension. 
A vital question to ask is: What is proper notice of the presence of
a license? No IETF standard has the force of law. Readers are not obligated
to understand or even take note of the license links. Thus, no one using it
should be able to have any expectation that readers will take note of it any
more than they would of many other possible means of inserting licenses or
references to them in content. Publishers and consumers should both be
working on the assumption that normal copyright exists (i.e *all rights
reserved*) except where there are fair use privileges of implied licenses
that weaken the *all rights* default.)
If we were to allow or encourage any one mechanism to associate
restrictive licenses with content, we establish a precedent that would allow
or encourage others as well. Any other standards group or informal
collection of one or more persons could decide to define a new mechanism --
just like the IETF did. At that point, no reader could safely consume
content since no matter how many mechanisms they supported there might be
some others that they didn't know about. The issue here is about proper
notice... How can we obligate folk to respect licenses that they have no
means of discovering?
We should also ask: At what point does a restrictive license become
operative? Imagine that I decided that reading (copying) of my feeds by
commercial organizations was to be prohibited. Could I bar such copying by
putting a license in the content itself? Of course, if I did, that means
that in order to discover that copying was not permitted the reader would
have to actually do the thing which is prohibited. Clearly, even if there
was some way to put effective restrictive licenses in content, there would
have to remain some implied license exceptions to the *all rights
reserved* provision of copyright.
We are all best served by an assumption that copyright leaves all
rights reserved to the publisher and that only fair use, limited implied
license to syndicate, and explicit license grants (like CC) limit the
totality of those rights. With this in mind it might be best to change from
a license link to a rights-grant link... In other words, frame this link
type as something which can *only* be used to broaden rights, not restrict
them.

bob wyman





RE: atom license extension (Re: [cc-tab] *important* heads up)

2006-09-06 Thread Bob Wyman
 suggestion for this sentence is that it might be less
strongly worded. Given that the law in this area is not settled, it might
make sense not to say "Nor can a license... restrict..." Rather, it might be
more accurate to say something like: "It is believed that a license ...
cannot restrict"

My apologies for such a long message...

bob wyman




RE: atom license extension (Re: [cc-tab] *important* heads up)

2006-09-06 Thread Bob Wyman

Thomas Roessler wrote:
 It's fine to point out the lack of an enforceable binding on a
 technical level, but I don't think this spec is the place to
 discuss the legal implications that this might have.
If the spec does not make statements concerning the intended legal
implications of a feature which clearly addresses legal issues, the result
will almost inevitably be wide-spread misunderstanding of the implications
of using the feature. The mere act of going to the trouble of specifying the
license link indicates that the authors expect that there will be some
implication of having used the feature. The question that many readers will
have is: What are the intended implications? Leaving the answer to guess
work is not useful, I think.
Given the unsettled and potentially dynamic state of the law in this
area, I certainly agree that the spec should not make pronouncements
concerning what the law is in this case. But I don't see any valid argument
against making statements of intent that may, or may not, be in conflict
with the law as it is or may one day be.
The authors of the specification have, I think, not only good reason
to state their intention but an obligation to do so. Warning implementers
that the use of the license link may not, in at least some situations and in
some legal systems, create a legally enforceable binding is the right thing
to do.

bob wyman




RE: atom license extension (Re: [cc-tab] *important* heads up)

2006-09-06 Thread Bob Wyman

Wendy Seltzer wrote:
 The concern about limiting implied licenses is important...
 If the rfc encourages people to add licenses, it opens up
 the possibility that their explicit terms will contradict
 and override what has previously been implied.
This is precisely why I have normally argued against adding rights
and license mechanisms to Atom and other formats. Unfortunately, it has
been a losing battle (Atom has <rights/>) so, I'm now trying the tack of
attempting to get explanatory text and weakness in the language in order to
mitigate some of the damage that might be caused.
Oddly, I think part of the push for these dangerous licensing
mechanisms is the result of success of Creative Commons. We may be seeing
that a movement intended to expand rights will indirectly create a situation
where rights are more easily restricted. People really like the CC mechanism
for granting rights and as a result want cleaner and better understood means
for associating Creative Commons licenses with their content. Unfortunately,
an unintended consequence of satisfying this desire to publish CC licenses
might be that it becomes easier and more common for folk to publish
restrictive licenses.
Readers of this thread might be interested to see that Denise Howell
has been discussing very similar issues on her new Logarithms blog.[1][2]
I've put some comments in there and have also responded at length concerning
what I, as a non-lawyer, consider some of the implied licenses that attach
to RSS/Atom syndicated content.[3]

bob wyman

[1] http://blogs.zdnet.com/Howell/?p=17
[2] http://blogs.zdnet.com/Howell/?p=18
[3] http://www.wyman.us/main/2006/09/magazine_or_mus.html




RE: atom license extension (Re: [cc-tab] *important* heads up)

2006-09-06 Thread Bob Wyman

Antone Roundy wrote:
 With respect to the issue of aggregate feeds, I had thought
 that the existence of an atom:source element at the entry
 level blocked any inheritance of the feed metadata, but
 looking at RFC 4287, I don't see that explicitly stated.
It's not explicit, but it is implicit. The <source/> element
preserves the entry's feed metadata. Thus, to find the feed metadata
associated with an entry which has an atom:source, you should look to the
preserved data in the atom:source element (or the source feed itself...) --
you should NOT look to the metadata of the feed within which you found the
entry.
Atom:source says, essentially: This entry is not of this feed. It
is foreign and should be interpreted as such. Thus, the feed metadata of
the containing feed should never be allowed to leak into the interpretation
of an entry which contains an atom:source. To do so would make syndication,
aggregation, etc. a complete mess.
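
A rough sketch of that lookup rule, for illustration only. The helper names feed_metadata_for and feed_title_for are invented here, and the code assumes the entry and its containing feed have already been parsed with ElementTree.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def feed_metadata_for(entry, containing_feed):
    """Pick the element that holds the feed-level metadata for this entry."""
    source = entry.find(ATOM_NS + "source")
    # An entry carrying atom:source is foreign to the feed it was found in,
    # so the containing feed's metadata must not leak into it.
    return source if source is not None else containing_feed

def feed_title_for(entry, containing_feed):
    """Example use: the title of the feed the entry really belongs to."""
    return feed_metadata_for(entry, containing_feed).findtext(ATOM_NS + "title")
```

Note the explicit "is not None" test: an ElementTree element with no children is falsy, so a plain truth test on the atom:source element would be a bug.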

bob wyman




RE: Finally Atom: Blogger is here

2006-08-21 Thread Bob Wyman

Aristotle Pagaltzis wrote:
 [Now that Blogger supports both RSS 2.0 and Atom 1.0]
 That makes what, another few dozen million Atom 1.0 feeds?
Yes, many, many more than before. But also many more legacy RSS 2.0
feeds. Which leads to the inevitable rhetorical question: Why the heck do
people keep insisting that the industry continue to support new deployments
of RSS 2.0? This is just silliness.

bob wyman



RE: Atom license link last call

2006-08-21 Thread Bob Wyman

James Snell wrote:
 [1] The relationship [between license and atom:rights] is
 subtle, but important ...
 [2] I specifically wanted to differentiate the two. ...
 [3] The two serve different, but related, purposes.  The
 two should not contradict each other.  If they do,
 consumers must go back to the content publisher to
 resolve the problem.
Given the subtle differences, the claimed importance of the
differences, and their supposed utility, I would strongly suggest that these
points should be clearly stated in the ID itself. It is highly unlikely that
readers of an eventual RFC are going to universally come here and read the
illuminating messages in the mailing list archive. Thus, the subtle
distinctions that you see are highly likely to be lost once the RFC is
published -- unless you document them. Also, it is more likely that
reviewers will be able to make more informed judgments if these distinctions
are clearly documented in the ID text.

bob wyman




RE: Atom license link last call

2006-08-18 Thread Bob Wyman

James,
My apologies if these questions and comments have been dealt with
before:
* What is the expected or intended relationship between data carried
in the atom:rights element and data pointed to by the license relationship?
* Why did you choose the word license when Atom itself uses the
word rights for a very similar (if not identical) concept?
* If the intent of the license link is to provide a mechanism to
support out of line rights elements, then did you consider doing something
similar to the handling of out-of-line atom:content via a src attribute?
For example: Does the license link do anything that would not be
accomplished by adding support for rights elements in the following form:
<rights src="http://..."/>

* If a feed reader discovers both atom:rights and a license link in
a single entry or feed, is there any concept of precedence between the two?
For instance, if the text of the license is more or less restrictive than
what is in the atom:rights element, what should the reader assume about the
rights that are granted?

bob wyman




RE: Fyi, Apache project proposal

2006-05-28 Thread Bob Wyman

James M Snell mentioned his Apache Project...

It would be *very* nice if you could see your way to implementing
RFC3229+feed[1] support in your implementation. As I think you know, the
use of this mechanism results in massive reductions in the bandwidth and
client-side processing required in fetching updates to Atom feeds. Also,
Microsoft will be supporting RFC3229+feed in their browsers[2], thus, we can
anticipate that support for fetching delta-feeds will soon be considered
expected. The only issue with Apache is that Apache *still* does not
support the 226 response code...
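
For anyone who has not looked at the mechanism, here is a minimal client-side sketch. It assumes the RFC3229+feed convention of sending "A-IM: feed" together with the last ETag seen; the function names and the returned labels are made up for the illustration, and no network code is included.

```python
def delta_request_headers(etag=None):
    """Headers for a delta-capable fetch of an Atom feed."""
    # A-IM (RFC 3229) tells the server which instance manipulations we accept.
    headers = {"A-IM": "feed"}
    if etag is not None:
        # The ETag from the previous fetch identifies what we already have.
        headers["If-None-Match"] = etag
    return headers

def classify_response(status):
    """Map the HTTP status code to what the response body contains."""
    if status == 226:
        return "delta"      # 226 IM Used: only entries newer than our ETag
    if status == 304:
        return "unchanged"  # nothing new since the last fetch
    if status == 200:
        return "full"       # server ignored A-IM and sent the whole feed
    return "error"
```

A server that cannot emit 226 can still answer 200 with the full feed, so clients written this way degrade gracefully.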

bob wyman

[1] http://bobwyman.pubsub.com/main/2004/09/using_rfc3229_w.html
http://bobwyman.pubsub.com/main/2004/09/implementations.html
http://www.intertwingly.net/blog/2004/09/15/Syndication-with-RFC3229
[2] http://bobwyman.pubsub.com/main/2006/04/microsoft_to_su.html




RE: atom:updated handling

2006-02-18 Thread Bob Wyman

Phil Ringnalda wrote:
 Patches that will make that more clear are welcome.
The warning message that Phil points to says in part: (at:
http://feedvalidator.org/docs/warning/DuplicateUpdated.html) 

For example, it would be generally inappropriate for a publishing
 system to apply the same timestamp to several entries which were
 published during the course of a single day.

Of course, this leads one to wonder if it might be appropriate to apply the
same timestamp to several entries if they were published during the course
of multiple days...

It would make a great deal more sense to say something like: It would not
be appropriate to apply the same timestamp to several entries unless they
were published simultaneously.
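
As a rough illustration of what such a check looks for (this is not the Feed Validator's code; the function name is hypothetical and timestamps are compared as raw strings, so equal instants written with different offsets would be missed):

```python
import xml.etree.ElementTree as ET
from collections import Counter

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def duplicated_timestamps(feed_text):
    """Return the atom:updated values shared by more than one entry."""
    feed = ET.fromstring(feed_text)
    stamps = [entry.findtext(ATOM_NS + "updated")
              for entry in feed.findall(ATOM_NS + "entry")]
    # Count each timestamp, skipping entries that lack atom:updated.
    counts = Counter(stamp for stamp in stamps if stamp is not None)
    return sorted(stamp for stamp, n in counts.items() if n > 1)
```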

bob wyman





Structured Publishing -- Joe Reger shows the way...

2005-09-08 Thread Bob Wyman

I've written a blog post pointing to a wonderful demo
of tools for doing structured publishing in blogs that Joe Reger has put
together. Given that Atom has built-in support for handling much more than just
the text/HTML that RSS is limited to, I think this should be interesting to the
Atom community.



http://bobwyman.pubsub.com/main/2005/09/joe_reger_shows.html



What can we do with Atom to make the vision of
Structured/Semantic publishing more real?



 bob wyman


The benefits of Lists are Entries rather than Lists are Feeds

2005-08-31 Thread Bob Wyman

Folks, I hate to be insistent, however, I think that in the mail below I
offered some pretty compelling reasons why lists should be entries rather
than turning feeds into lists. Could someone please comment on this? Is
there some point that I'm completely missing? What is wrong with my
suggestion that lists-are-entries is much more useful than the alternative?

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Bob Wyman
Sent: Tuesday, August 30, 2005 5:10 PM
To: 'Mark Nottingham'
Cc: atom-syntax@imc.org
Subject: RE: Top 10 and other lists should be entries, not feeds.

Mark Nottingham wrote:
Are you saying that when/if Netflix switches over to Atom, they
 shouldn't use it for the Queue?
No. I'm saying that if Netflix switches over to Atom, what they
should do is insert the Queue information, as a list, into a single entry
within the feed. 
This will not only preserve the nature of Atom feeds as feeds but
also allow NetFlix a number of new and potentially interesting opportunities
for providing data to customers. Most important among these will be the
ability to include multiple lists in the feed (i.e. in addition to the
Queue, they could also include their Top 10 list as well as a set of
recommendations based on user experience. They might even include a list
of 10 most recent transactions on your account) Each list would be a
distinct entry. To make life easier on aggregators, each entry type should
probably use the same atom:id across versions. This allows the aggregators
to discard earlier, now out of date entries.
NetFlix would also be able to intermix information such as the
Queue List with non-list entries. For instance, they might have a Message
from NetFlix that they want to include in the feed or, they might include a
series of movie reviews that were carefully selected for the specific user.
Basically, by using entries for lists instead of converting the
entire feed into a list, NetFlix is able to offer a much richer and much
more satisfying experience to their users.
The ability of Atom to carry both lists and non-lists as entries
means that Atom is able to offer a much more flexible and powerful mechanism
to NetFlix than can be had from the less-capable RSS V2.0 solution. I think
that if I were NetFlix, I would want to have the opportunity to experiment
with and find ways to exploit this powerful capability. The richer the
opportunity for communications between NetFlix and their customers, the
greater the opportunity they have to generate revenues.
The alternative to using entries rather than feeds would be creating
multiple feeds per user. That strikes me as a solution which is ugly on its
face and unquestionably increases the complexity of the system for both
NetFlix and its customers. The list-in-entry solution is much more elegant
and much more powerful.

bob wyman






RE: Top 10 and other lists should be entries, not feeds.

2005-08-30 Thread Bob Wyman

Mark Nottingham wrote:
Are you saying that when/if Netflix switches over to Atom, they
 shouldn't use it for the Queue?
No. I'm saying that if Netflix switches over to Atom, what they
should do is insert the Queue information, as a list, into a single entry
within the feed. 
This will not only preserve the nature of Atom feeds as feeds but
also allow NetFlix a number of new and potentially interesting opportunities
for providing data to customers. Most important among these will be the
ability to include multiple lists in the feed (i.e. in addition to the
Queue, they could also include their Top 10 list as well as a set of
recommendations based on user experience. They might even include a list
of 10 most recent transactions on your account) Each list would be a
distinct entry. To make life easier on aggregators, each entry type should
probably use the same atom:id across versions. This allows the aggregators
to discard earlier, now out of date entries.
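
A sketch of what reusing the atom:id buys the aggregator, under some loud assumptions: entries are modelled as (atom_id, updated, payload) tuples, the function name is invented, and all atom:updated values are in the same UTC "Z" form so that string comparison orders them correctly.

```python
def latest_versions(entries):
    """Keep only the newest payload seen for each atom:id."""
    latest = {}
    for atom_id, updated, payload in entries:
        current = latest.get(atom_id)
        # A later atom:updated for the same atom:id replaces the older version.
        if current is None or updated > current[0]:
            latest[atom_id] = (updated, payload)
    return {atom_id: payload for atom_id, (_, payload) in latest.items()}
```

With this, a new copy of the Queue entry simply displaces the old one, while the Top 10 entry and any ordinary message entries are left alone.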
NetFlix would also be able to intermix information such as the
Queue List with non-list entries. For instance, they might have a Message
from NetFlix that they want to include in the feed or, they might include a
series of movie reviews that were carefully selected for the specific user.
Basically, by using entries for lists instead of converting the
entire feed into a list, NetFlix is able to offer a much richer and much
more satisfying experience to their users.
The ability of Atom to carry both lists and non-lists as entries
means that Atom is able to offer a much more flexible and powerful mechanism
to NetFlix than can be had from the less-capable RSS V2.0 solution. I think
that if I were NetFlix, I would want to have the opportunity to experiment
with and find ways to exploit this powerful capability. The richer the
opportunity for communications between NetFlix and their customers, the
greater the opportunity they have to generate revenues.
The alternative to using entries rather than feeds would be creating
multiple feeds per user. That strikes me as a solution which is ugly on its
face and unquestionably increases the complexity of the system for both
NetFlix and its customers. The list-in-entry solution is much more elegant
and much more powerful.

bob wyman




Top 10 and other lists should be entries, not feeds.

2005-08-29 Thread Bob Wyman

I'm sorry, but I can't
go on without complaining. Microsoft has proposed extensions which turn
RSS V2.0 feeds into lists and we've got folk who are proposing much the
same for Atom (i.e. stateful, incremental or partitioned feeds). I think
they are wrong. Feeds aren't lists and Lists aren't feeds. It seems
to me that if you want a Top 10 list, then you should simply
create an entry that provides your Top 10. Then, insert that entry in your feed
so that the rest of us can read it. If you update the list, then just replace
the entry in your feed. If you create a new list (Top 34?) then insert that in
the feed along with the Top 10 list.

What is the problem? Why don't
folk see that lists are the stuff of entries -- not feeds? Remember, "It's
about the entries, Stupid!"

I think the reason we've got
this pull to turn feeds into Lists is simply because we don't have a
commonly accepted list schema. So, the idea is to repurpose what
we've got. Folk are too scared or tired to try to get a new thing defined
and through the process, so they figure that they will just overload the
definition of something that already exists. I think that's wrong. If we want
Lists then we should define lists and not muck about with Atom.
If everyone is too tired to do the job properly and define a real list as a
well defined schema for something that can be the payload of a content element,
then why not just use OPML as the list format?



What is a search engine or a
matching engine supposed to return as a result if it finds a match for a user
query in an entry that comes from a list-feed? Should it return the entire feed
or should it return just the entry/item that contained the stuff in the user's
query? What should an aggregating intermediary like PubSub do when it finds a
match in an element of a list-feed? Is there some way to return an entire feed
without building a feed of feeds? Given that no existing aggregator supports
feeds as entries, how can an intermediary aggregator/filter return something the
client will understand?

You might say that the
search/matching engine should only present the matching entry in its results.
But, if you do that what happens is that you lose the important semantic data
that comes from knowing the position the matched entry had in the original
list-feed. There is no way to preserve that order-dependence information without
private extensions at present.

I'm sorry but I simply can't
see that it makes sense to encourage folk to break important rules of Atom by
redefining feeds to be lists. If we want lists we should define
what they look like and put them in entries. Keep your hands off the feeds.
Feeds aren't lists -- they are feeds.



bob wyman


RE: Don't Aggregrate Me

2005-08-26 Thread Bob Wyman

Roger Benningfield wrote:
 However, if I put something like:
 User-agent: PubSub
 Disallow: /
 ...in my robots.txt and you ignore it, then you very much
 belong on the Bad List.
I don't think so. The reason is that I believe that robots.txt has
nothing to do with any service I provide or process that we run. Thus, I
can't imagine why I would even look in the file. Remember, PubSub never does
anything that a desktop client doesn't do. We only look at feeds that have
pinged us or that someone has explicitly loaded into our system using
add-feed. We NEVER crawl. We're not a robot and thus I can't see why we
would even look at robots.txt. Does your browser look at robots.txt before
fetching a page? Does your desktop aggregator look at it before fetching a
feed? I don't think so! But, should a crawler like Google, Yahoo! or
Technorati respect robots.txt? YES!
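
For reference, this is roughly what honoring robots.txt amounts to for software that does consult it. The sketch uses Python's standard urllib.robotparser; the may_fetch helper and the example.org URL are made up, and the rules are the ones Roger quoted.

```python
from urllib.robotparser import RobotFileParser

def may_fetch(robots_txt, user_agent, url):
    """Apply the given robots.txt text to one user agent and one URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# The rules from the quoted message: bar the agent named PubSub from everything.
RULES = "User-agent: PubSub\nDisallow: /\n"
```

Under these rules may_fetch(RULES, "PubSub", "http://example.org/feed.atom") is False, while an agent with a different name is still allowed. Whether a non-crawling service ought to run such a check at all is, of course, the question being argued.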

bob wyman





RE: Don't Aggregrate Me

2005-08-26 Thread Bob Wyman

Antone Roundy wrote:
 I'm with Bob on this.  If a person publishes a feed without limiting
 access to it, they either don't know what they're doing, or they're
 EXPECTING it to be polled on a regular basis.  As long as PubSub
 doesn't poll too fast, the publisher is getting exactly what they
 should be expecting.

Because PubSub aggregates content for thousands of others, it
removes significant bandwidth load from publishers' sites. We only read a
feed from a site in response to an explicit ping from that site or, for
those sites that don't ping, we poll them on a scheduled basis. In fact, we
read scheduled, non-pinging feeds less frequently than most desktop systems
would. No one can claim that we do anything but reduce the load on
publishers' systems. It should also be noted that we support gzip
compression, RFC3229+Feed, conditional-gets, etc. and thus do all the things
necessary to reduce our load on publishers' sites in the event that we
actually do fetch data from them. This is a good thing and not something
that robots.txt was intended to prevent.

bob wyman

 



RE: Don't Aggregrate Me

2005-08-26 Thread Bob Wyman

Mark Pilgrim wrote (among other things):
 (And before you say but my aggregator is nothing but a podcast
 client, and the feeds are nothing but links to enclosures, so
 it's obvious that the publisher wanted me to download them -- WRONG!
I agree with just about everything that Mark wrote in his post.
However, I'm finding it very difficult to accept this bit about enclosures
(podcasts.) It seems to me that the very name enclosure implies that the
resources pointed to are to be considered part and parcel of the original
entry. In fact, I think one might even argue that if you *didn't* download
the enclosed items that you had created a derivative work that didn't
represent the item that was intended to be syndicated...
Others have pointed out the problem with links to images,
stylesheets, CSS files, etc. And, what about the numerous proposals for
linking one feed to another? What about the remote content pointed to by a
src attribute in an atom:content element? Should PubSub be able to read that
remote content when indexing and/or matching the entry? 
It strikes me that not all URIs are created equally and not
everything that looks like crawling is really crawling. I am firm in
believing that URIs in <a/> tags are the stuff of crawlers but the URIs in
<link/> tags, enclosures, media-rss objects, <img/> tags, etc. seem to be
qualitatively different. I think fetching URIs found in <link/> tags,
<img/> tags and enclosures isn't crawling... Or... Is there something I'm
missing here?

bob wyman



RE: Don't Aggregrate Me

2005-08-26 Thread Bob Wyman

Roger Benningfield wrote:
 We've got a mechanism that allows any user with his own domain
 and a text editor to tell us whether or not he wants us messing with
 his stuff. I think it's foolish to ignore that.
The problem is that we have *many* such mechanisms. Robots.txt is
only one. Others have been mentioned on this list in the past. Others are
buried in obscure posts that you really have to dig to find. How do we
decide which mechanisms to use? Also, since I don't think robots.txt was
intended to be used for services like the aggregators we're discussing, I
believe that for us to encourage people to use it in the way you suggest
would be an abuse of the robots.txt system.

 Bob: What about FeedMesh? If I ping blo.gs, they pass that ping
 along to you, and PubSub fetches my feed, then PubSub is doing
 something a desktop client doesn't do.
Wrong. Some desktop clients *do* work like FeedMesh. Consider the
Shrook distributed checking system[1]. FeedMesh and PubSub work very much
like Shrook's desktop clients do. In the Shrook system, all the desktop
clients report back updates that they have found to a central service that
then distributes the update info to other clients. The result is that the
amount of polling that goes on is drastically reduced and the freshness of
data is increased since every client benefits from the polling of all other
clients. Although no single client might poll a site more frequently than
once an hour, if you have 60 Shrook clients each polling once an hour, each
client is getting the effect of polling every minute... The Shrook model is
basically the same as the FeedMesh model except that in FeedMesh you
typically ask for info on ALL sites whereas in Shrook, you typically only
get updates for a smaller, enumerated set of feeds. However, the number of
feeds you monitor does not change the basic nature of the distributed
checking system. Shrook and FeedMesh are, as far as I'm concerned, largely
indistinguishable in this area. (There are some detail differences of
course. For instance, Shrook worries about client privacy issues that aren't
relevant in the FeedMesh case.)
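
The arithmetic behind the 60-client example can be written out as follows. This is only a back-of-the-envelope sketch: the function names are invented, and it assumes the clients' polls are evenly staggered and that every update found is shared with all clients.

```python
def effective_interval_minutes(clients, per_client_interval_minutes):
    """Freshness each client sees when all clients share what they find."""
    # Evenly staggered polls divide the interval among the clients.
    return per_client_interval_minutes / clients

def hourly_requests_saved(clients, per_client_interval_minutes):
    """Requests per hour the publisher avoids for the same freshness."""
    shared = clients * 60 / per_client_interval_minutes
    shared_freshness = effective_interval_minutes(clients, per_client_interval_minutes)
    # Without sharing, every client would need to poll at the shared freshness.
    unshared = clients * 60 / shared_freshness
    return unshared - shared
```

For 60 clients polling hourly, the effective interval is one minute and the publisher sees 60 requests an hour instead of the 3600 that independent once-a-minute polling would generate.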

Remember, PubSub only deals with data from Pings and from sites that
have been manually added to our system. We don't do any web scraping and we
don't follow links to find other blogs. Also, we filter out of our system
feeds that originate with services that are known to scrape web pages and
inject data that was not intended by the original publisher to appear in
feeds. (Often, people try to get around partial feeds by filling in the
missing bits by scraping from blogs' websites.) Thus, we filter out any feed
that comes from a service like Technorati since they scrape blogs and inject
scraped content into feeds without the explicit approval or consent of the
publishers of the sites they scraped. 

bob wyman

[1] http://www.fondantfancies.com/apps/shrook/distfaq.php




RE: Don't Aggregrate Me

2005-08-26 Thread Bob Wyman

Karl Dubost wrote:
 - How one who has previously submitted a feed URL remove it from
 the index? (Change of opinions)
If you are the publisher of a feed and you don't want us to monitor
your content, complain to us and we'll filter you out. Folk do this every
once in a while. Send us an email using the contact information on our site.
(Sorry I don't want to put an email address in a mailing list post... We get
enough spam already.) 

 - How someone who's not mastering the ping (built-in in the
 service, the software) but doesn't want his/her feed being indexed by
 the service.
Providers of hosted blogging solutions or of stand-alone systems
should feel a responsibility to do a better job of educating their users as
to the impact of configuration options (or the lack of options.) There are
many blogging systems that don't support pings and others which normally
provide pings but allow users to turn them off. Some systems, like
LiveJournal even allow you to have a blog but mark it private so that only
your friends can read it and pings aren't generated. What might not be
happening as well as it could is the process by which service or software
providers are educating their users. Services should work harder to educate
their users.

bob wyman




RE: Don't Aggregrate Me

2005-08-26 Thread Bob Wyman

Karl Dubost points out that it is hard to figure out what email address to
send messages to if you want to de-list from PubSub...:
Karl, Please, accept my apologies for this. I could have sworn we
had the policy prominently displayed on the site. I know we used to have it
there. This must have been lost when we did a site redesign last November!
I'm really surprised that it has taken this long to notice that it is gone.
I'll see that we get it back up.

 You see educating users is not obvious it seems ;) No offense, it
 just shows that it is not an easy accessible information. And
 there's a need to educate Services too.
Point taken. I'll get it fixed. It's a weekend now. Give me a few
days... I'm not sure, but I think it makes sense to put this on the
add-feed page at: http://www.pubsub.com/add_feed.php . Do you agree?

 Scenario:
 I take the freedom to add his feed URL to the service and/or to ping
 the service because I want to know when this guy talk about me the
 next time. Well the problem is that this guy doesn't want to be indexed
 by these services. How does he block the service?
Yes, forged pings or unauthorized third-party pings are a real
issue. Unfortunately, the current design of the pinging system gives us
absolutely no means to determine if a ping is authorized by the publisher.
This is one of many, many issues that I hope that this Working Group will be
willing to take up once it gets the protocol worked out and has time to
think about these issues.
I argued last year that we should develop a blogging or syndication
architecture document in much the same way that the TAG documented the web
architecture and in the way that most decent standards groups usually
produce some sort of reference architecture document. There are many pieces
of the syndication infrastructure that are being ignored or otherwise not
being given enough attention. Pinging is one of them.
Some solutions, like requiring that pings be signed, would work
from a technical point of view, but are probably not practical except in
some limited cases. (e.g. Signatures may make sense as a way to enable Fat
Pings from small or personal blog sites. In that case, the benefit of the
Fat Ping might override the cost and complexity of generating the
signature.) Some have also proposed the equivalent of a do-not-call list
that folk could register with. We might also set up something like FeedMesh
where service providers shared updates concerning which bloggers had asked
to be filtered out. (That means you would only have to notify one service to
get pulled from them all -- a real benefit to users.) Or, we could define
extensions to Atom to express these things... There are many options.
Today, we do the best we can with what we have. Hopefully, we'll all
maintain enough interest in these issues to continue the process of working
them out.

bob wyman




RE: Don't Aggregrate Me

2005-08-25 Thread Bob Wyman

James M Snell wrote:
 Does the following work?
 <feed>
  ...
  <x:aggregate>no</x:aggregate>
 </feed>
I think it is important to recognize that there are at least two
kinds of aggregator. The most common is the desktop end-point aggregator
that consumes feeds from various sources and then presents or processes them
locally. The second kind of aggregator would be something like PubSub -- a
channel intermediary that serves as an aggregating (and potentially caching)
router that forwards messages on toward end-point aggregators.
Your syntax seems only focused on the end-point aggregators. Without
clarifying the expected behavior of intermediary aggregators, your proposal
would tend to cause some significant confusion in the system. Should PubSub
aggregate and/or route entries that come from feeds marked no-aggregate?
If not, why not? From the publisher's point of view, an intermediary
aggregator like PubSub should be indistinguishable from the channel itself.

bob wyman





RE: Don't Aggregrate Me

2005-08-25 Thread Bob Wyman

Karl Dubost wrote:
 One of my reasons which worries me more and more, is that some
 aggregators, bots do not respect the Creative Common license (or
 at least the way I understand it).
Your understanding of Creative Commons is apparently a bit
non-optimal -- even though many people seem to believe as you do.
The reality is that a Creative Commons license cannot be used to
restrict access to data. It can only be used to relax constraints that might
otherwise exist. A Creative Commons license that says "no commercial use" is
not prohibiting commercial use; rather, it is saying that the license does
not grant commercial use. (The distinction between "prohibiting" use and
"not granting" a right to use is very important.) A "no commercial use" CC
license merely says that other constraints (i.e. copyright, etc.) continue
to have force. Thus, if copyright applies to the content, and one has a
non-commercial use CC license on that content, one would assume that the
copyright restrictions which would tend to limit commercial use would still
apply.
It is important to re-iterate that a CC License only *grants*
rights; it does not restrict, deny, or constrain them in any way. Thus, you
can't say: "The aggregator failed to respect the CC non-commercial use
attribute." You must say: "The aggregator failed to respect the copyright."

bob wyman




RE: Don't Aggregrate Me

2005-08-25 Thread Bob Wyman

Antone Roundy wrote:
 How could this all be related to aggregators that accept feed URL
 submissions?

My impression has always been that robots.txt was intended to stop
robots that crawl a site (i.e. they read one page, extract the URLs from it
and then read those pages). I don't believe robots.txt is intended to stop
processes that simply fetch one or more specific URLs with known names.

At PubSub we *never* crawl to discover feed URLs. The only feeds
we know about are:
1. Feeds that have announced their presence with a ping
2. Feeds that have been announced to us via a FeedMesh message.
3. Feeds that have been manually submitted to us via our add-feed
page.
We don't crawl.

I do not think we qualify as a robot in the sense that is relevant
to robots.txt. It would appear that Walter Underwood of Verity would agree
with me since he says in his recent post that: "I would call desktop clients
'clients' not 'robots'. The distinction is how they add feeds to the polling
list. Clients add them because of human decisions. Robots discover them
mechanically and add them." If Walter is correct, then he must agree with me
that robots.txt does not apply to PubSub! (And we should not be on his
"bad" list. Walter? Please take us off the list...)

bob wyman




RE: If you want Fat Pings just use Atom!

2005-08-23 Thread Bob Wyman

Bill de hÓra wrote:
 the problem is managing the stream buffer off the wire for a
 protocol model that has no underlying concept of an octet frame.
 I've written enough XMPP code to understand why the BEEP/MIME crowd
 might frown at it
Framing is, in fact, an exceptionally important issue. Fortunately,
HTTP offers us some framing capability in the form of chunked delivery. This
is much more lightweight than what BEEP provides since HTTP assumes TCP/IP
as a transport layer while BEEP did not.
The HTTP chunked delivery method would be vastly superior to the
suggestions for doing things like including form-feeds or sequences of nulls
as entry boundary markers. If you accept a simple rule that says that you
will insert HTTP chunk length markers between each entry sent in a
never-ending Atom file, you get something like the feed I show below.
Simply strip out the chunk length data prior to stuffing data into your XML
parser. If an entry appears to continue beyond a chunk boundary, discard
that entry and continue by reading the next chunk.

See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1
for more information on this method. Note that RFC2616 says: "All HTTP/1.1
applications MUST be able to receive and decode the 'chunked'
transfer-coding,..."
Note: the chunk lengths are not correct in the following example.

GET /never-ending-feed.xml HTTP/1.1

HTTP/1.1 200 OK
Date: Fri Apr 8 17:41:11 2005
Server: FeedMesh/0.1
Connection: close
Transfer-Encoding: chunked
Content-Type: application/xml; charset=utf-8

ab
<?xml version="1.0" encoding="utf-8"?>
<feed ...>
...

a8
<entry>
...
</entry>

93
<entry>
...
</entry>

And so forth until finally you get a </feed>, the connection closes, or you
close the connection.

This is simple, requires no new specifications and provides for robust error
recovery in that broken entries can be easily detected and discarded.
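
The receiving side of the rule above can be sketched in Python. This is only
an illustration under stated assumptions, not PubSub or FeedMesh code: the
names iter_chunks and iter_entries are made up for the example, the sender is
assumed to put the feed start tag in the first chunk and exactly one entry in
each later chunk, and each entry is parsed on its own, so namespace
declarations made on the feed element are not visible to it.

```python
import io
import xml.etree.ElementTree as ET


def iter_chunks(stream):
    """Yield the payload of each HTTP/1.1 chunk read from a binary stream."""
    while True:
        size_line = stream.readline()
        if not size_line:
            return  # connection closed
        # The chunk size is hex; anything after ';' is a chunk extension.
        size_field = size_line.split(b";", 1)[0].strip()
        if not size_field:
            continue  # tolerate a stray blank line
        size = int(size_field, 16)
        if size == 0:
            return  # last-chunk marker
        payload = stream.read(size)
        if len(payload) < size:
            return  # stream ended in the middle of a chunk
        stream.readline()  # consume the CRLF that follows the chunk data
        yield payload


def iter_entries(stream):
    """Yield one parsed entry per chunk, discarding chunks that do not parse."""
    chunks = iter_chunks(stream)
    next(chunks, None)  # first chunk: XML declaration and feed start tag
    for payload in chunks:
        try:
            yield ET.fromstring(payload)
        except ET.ParseError:
            continue  # entry ran past its chunk boundary: drop it, move on


def _chunk(data):
    """Helper for the demo: frame bytes as a single HTTP chunk."""
    return b"%x\r\n" % len(data) + data + b"\r\n"


demo_body = (
    _chunk(b'<?xml version="1.0" encoding="utf-8"?>\n<feed>\n')
    + _chunk(b"<entry><title>one</title></entry>\n")
    + _chunk(b"<entry><title>brok")  # broken entry
    + _chunk(b"<entry><title>two</title></entry>\n")
    + b"0\r\n\r\n"
)
demo_titles = [e.findtext("title") for e in iter_entries(io.BytesIO(demo_body))]
print(demo_titles)
```

The broken third chunk is skipped without disturbing the entries around it,
which is the error-recovery property claimed above.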

bob wyman






RE: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-22 Thread Bob Wyman

James M Snell wrote:
 Second note to self: After thinking about this a bit more, I would
 also need a way of specifying a null license (e.g. the lack of a license).

 For instance, what if an entry that does not contain a license is
 aggregated into a feed that has a license.  The original
 lack-of-license still applies to that entry regardless of what is
 specified on the feed  level.  Golly Bob, you're right, this is
 rather messy ain't it. Hmm...

My apologies for not having more clearly pointed this out in my
original message. The problem is exacerbated for folk like us at PubSub
since we would feel completely comfortable in claiming copyright over the
collection of entries that we pass along to our subscribers, however,
there is *no way* that we could even hint at claiming copyright over the
individual entries themselves. If statements made at the feed level are
inherited by or in the scope of the entries, then we would not be able
to assert a copyright claim at the feed level since it would leak down to
the entries. 
Of course, since we at PubSub will virtually always ensure that any
entry we publish has an atom:source element, one could argue that we don't
have to worry about this scope leakage. But, we're
a special case in this regard. The general issue of scope exists in cases
where the atom:source element is not present.

bob wyman




RE: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread Bob Wyman

Paul Hoffman asked:
 Does an informative extension that appears at the feed level
 (as compared to in entries) indicate:
 a) this information pertains to each entry
 b) this information pertains to the feed itself
 c) this information pertains to each entry and to the feed itself
 d) completely unknown unless specified in the extension definition

I believe the correct answer is e:

  e) Unless otherwise specified, this information pertains to the feed only.

bob wyman




If you want Fat Pings just use Atom!

2005-08-21 Thread Bob Wyman
 as claimed by a ping, we would have to be able to trust the
pinger. Normally, creating such trust relationships is very expensive. However,
given that the vast majority of posts are made on the large services, we can
drastically increase the efficiency of the overall system by having just a few
of these hosters/publishers who are permitted the privilege of publishing Fat
Pings. It is my hope that in the future we'll be able to rely on Atom's
support for Digital Signatures to expand drastically the number of publishers
who could be trusted to publish Fat Pings.



Brad Proposes:



<?xml version='1.0' encoding='utf-8' ?>
<atomStream>
<time>1124247941</time>
<feed xmlns='http://www.w3.org/2005/Atom'>
<title type='text'>some journal title</title>
<link href='http://www.livejournal.com/users/username/' />
<author><name>some name</name></author>
<entry>
<title>some entry title</title>
<link href='http://www.livejournal.com/users/username/12345.html' />
<content type='html'>
content
</content>
</entry>
</feed>



I believe that the sample feed above would be better
represented as a simple Atom feed which contains entries having
source elements. Note: My sample is a bit bigger than Brad's since I've
included various bits that are required in Atom but that Brad's proposal
omits. He readily admits in his postings that he has not yet gone to the effort
of ensuring that he is issuing compliant data.

I propose the following as an equivalent to Brad's
sample:



<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>LiveJournal Aggregate Feed</title>
  <link href=""/>
  <updated>2005-08-21T16:30:02Z</updated>
  <author><name>Brad</name></author>
  <id>tag:livejournal.org,2005:aggregatefeed-1</id>

  <entry xmlns='http://www.w3.org/2005/Atom'>
    <source>
      <title type='text'>Example Feed</title>
      <link href=''/>
      <link rel='self' type='application/atom+xml' href=''/>
      <id>tag:livejournal.org,2005:feed-username</id>
      <updated>2005-08-21T16:30:02Z</updated>
      <author><name>John Doe</name></author>
    </source>
    <title>some entry title</title>
    <link rel='alternate' type='text/html' href=''/>
    <id>tag:livejournal.org,2003:entry-username-32397</id>
    <published>2005-08-21T16:30:02Z</published>
    <updated>2005-08-21T16:30:02Z</updated>
    <content type="html">
      This is some &lt;b&gt;content&lt;/b&gt;.
    </content>
  </entry>

  . . .

</feed>



 What do you think? Is there any conceptual
problem with streaming basic Atom over TCP/IP, HTTP continuous sessions (probably
using chunked content) etc.? Is there any really good reason not just to use
Atom as defined?



 bob wyman










RE: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread Bob Wyman

Paul Hoffman wrote: 
 The crux of the question is: what happens when an extension that
 does not specify the scope appears at the feed level?
Robert Sayre asked:
 I'm not sure why this question is interesting. What sort of
 application would need to know?
I ask:
What should an aggregate feed generator like PubSub do when it finds
an entry in a feed that contains unscoped extensions as children of the
feed? 
* Would you expect us to include these extension elements in an
atom:source element if we use the entry in one of our feeds?
* Should we include in the source elements we generate even things
that we don't understand?
* What should we do if the entry already has a source element but
that source element doesn't include the extension elements? Should we
publish the source element as we find it? Or, should we modify the source
element to include the extensions? (assuming there are no signatures...)


bob wyman




RE: If you want Fat Pings just use Atom!

2005-08-21 Thread Bob Wyman

Joe Gregorio wrote:
 Why not POST the Atom Entry, ala the Atom Publishing Protocol?
This would be an excellent idea if what we were talking about was a
low volume site. However, a site like LiveJournal generates hundreds of
updates per minute. Right now, on a Sunday evening, they are updating at the
rate of 349 entries per minute. During peak periods, they generate much more
traffic. Generating 349 POST messages per minute to perhaps 10 or 15
different services means that they would be pumping out thousands of these
things per minute. It just isn't reasonable.
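
The arithmetic behind "thousands per minute" can be spelled out. This is a
back-of-the-envelope sketch using only the figures quoted above; the function
name posts_per_minute is invented for the example, and it assumes one POST
per entry per receiving service with no batching.

```python
ENTRIES_PER_MINUTE = 349      # LiveJournal rate quoted in the message
SERVICE_COUNTS = (10, 15)     # "perhaps 10 or 15 different services"


def posts_per_minute(entries_per_minute, services):
    """One POST per entry per receiving service."""
    return entries_per_minute * services


low, high = (posts_per_minute(ENTRIES_PER_MINUTE, n) for n in SERVICE_COUNTS)
print(low, high)  # 3490 5235
```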
Using an open TCP/IP socket to carry a stream of Atom Entries
results in much greater efficiencies with much reduced bandwidth and
processing requirements. 
At PubSub, we've been experimentally providing Fat Ping versions
of our FeedMesh feeds to a small group of testers. We publish messages at a
rate much higher than LiveJournal does -- since we publish all of
LiveJournal's content plus everyone else's. We couldn't even consider Fat
Pings if we had to create and tear down a TCP/IP-HTTP session to post each
individual entry.
There are many situations in which HTTP would work fine for Fat
Pings. However, for high-volume sites, it just isn't reasonable. The key, to
me, is that we establish the expectation that the Atom format is adequate to
the task (whatever the transport) and leave the transport selection as a
context dependent decision. Thus, some server/client pairs would exchange
streams of Atom entries using the POST based Atom Publishing Protocol while
others would exchange essentially the same streams using a more efficient
transport mechanism such as streaming raw sockets or even Atom over XMPP.

bob wyman




RE: If you want Fat Pings just use Atom!

2005-08-21 Thread Bob Wyman

Aristotle Pagaltzis wrote:
 I wonder how you would make sure that the document is
 well-formed. Since the stream never actually ends and there
 is no way for a client to signal an intent to close the connection,
 the <feed> at the top would never actually be accompanied by a
 </feed> at the bottom.
This is a problem which has become well understood in the use and
implementation of the XMPP/Jabber protocols which are based on streaming
XML. Basically, what you do is consider the open tag to have a virtual
closure and use it primarily as a carrier of stream metadata. In XMPP
terminology, your code works at picking stanzas out of the stream that can
be parsed successfully or unsuccessfully on their own. In an Atom stream,
the processor would consider each atom:entry to be a parseable atomic unit.
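
A minimal sketch of that stanza-style processing, assuming Python's standard
xml.etree.ElementTree.XMLPullParser is an acceptable stand-in for whatever
streaming parser a real client would use. The class name AtomStreamReader and
its feed method are made up for this example. The parser is handed bytes as
they arrive, the feed start tag is kept only as a carrier of stream metadata,
and each atom:entry is released as soon as its end tag is seen, so the closing
feed tag is never needed.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


class AtomStreamReader:
    """Pick complete atom:entry elements out of a never-ending atom:feed."""

    def __init__(self):
        self._parser = ET.XMLPullParser(events=("start", "end"))
        self._feed = None  # the open feed element, once its start tag is seen

    def feed(self, data):
        """Push bytes from the wire; return the entries completed by them."""
        self._parser.feed(data)
        entries = []
        for event, elem in self._parser.read_events():
            if event == "start" and elem.tag == ATOM + "feed":
                self._feed = elem
            elif event == "end" and elem.tag == ATOM + "entry":
                entries.append(elem)
                if self._feed is not None:
                    # Detach the finished entry so that the in-memory feed
                    # does not grow without bound on a long-lived stream.
                    self._feed.remove(elem)
        return entries


reader = AtomStreamReader()
first = reader.feed(b'<feed xmlns="http://www.w3.org/2005/Atom"><title>s</title>')
second = reader.feed(b"<entry><title>first</title>")
third = reader.feed(b"</entry><entry><title>second</title></entry>")
print([e.findtext(ATOM + "title") for e in third])
```

Nothing is returned until an entry's end tag has arrived, which is why the
first two calls yield empty lists.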

 If you accept that the stream can never be a complete
 well-formed document, is there any reason not to simply send a
 stream of concatenated Atom Entry Documents?
 That would seem like the absolute simplest solution.
You could certainly do that, however, you will inevitably want to
pass across some stream oriented metadata and you'll eventually realize that
much of it is stuff that you can map into an Atom Feed. (i.e. created
date, unique stream id, stream title, etc.). Since we're all in the process
of learning how to deal with atom:feed elements anyway, why not just reuse
what we've got instead of inventing something new?
A rather nice side effect of forming the stream as an atom feed is
the simple fact that a log of the stream can be written to disk as a
well-formed Atom file. Thus, the same tools that you usually use to parse
Atom files can be used to parse the log of the stream. It is nice to be able
to reuse tools in this way... (Note: At PubSub, the atom files that we serve
to people are, in essence, just slightly stripped logs of the proto-Atom
over XMPP streams that they would have received if they had been listening
with that protocol. In our clients we can use the same parser for the stream
as we do for atom files. It works out nicely and elegantly.)

bob wyman




RE: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread Bob Wyman

Aristotle Pagaltzis wrote:
 That issue is inheritance.
Let me give an example of problematic inheritance...
Some have suggested that there be a License that you can associate
with Atom feeds and entries. However, scoping becomes very important in this
case because of some peculiarities of the legal system.
One can copyright an individual thing and one can copyright a
collection of things. A claim of copyright in a collection is not, however,
necessarily a claim of copyright over the elements of the collection.
Similarly, a claim of copyright over an element of the collection doesn't
reduce any claim of copyright in the collection itself.
If we assume inheritance from feed elements, then without further
specification, it isn't possible to claim copyright in the collection that
is the feed without claiming copyright in its individual parts. What you'd
have to do is create two distinct types of claim (one for collection and one
for item). That's messy.
I'm sure that copyright and licenses aren't the only problematic
contexts here.

bob wyman




RE: If you want Fat Pings just use Atom!

2005-08-21 Thread Bob Wyman

Joe Gregorio wrote:
 Why can't you keep that socket open, that is the default
 behavior for HTTP 1.1.
In some applications, HTTP 1.1 will work just fine. However, HTTP
doesn't add much to the high volume case. It also costs a great deal. For
instance, every POST requires a response. This means that you're moving from
a pure streaming case to an endless sequence of application level ACK/NAKs
that are simply replicating what TCP/IP already does for you. Also, the HTTP
headers that would be required simply don't contribute anything useful. The
bandwidth overhead of the additional headers as well as the bandwidth,
processing and timing problems related to generating responses begins to
look pretty nasty when you're moving at hundreds of items per minute or
second...
One really good reason for using HTTP would be to exploit the
existing HTTP infrastructure including proxies, caches, application-level
firewalls, etc. However, I'm aware of no such infrastructure components that
are designed to handle permanently open, high-bandwidth connections well. The
HTTP infrastructure is optimized around the normal uses of HTTP. This isn't
normal. 
One of the really irritating things about the current HTTP
infrastructure is that it is very fragile. This is a problem that has
caused unlimited headaches for the folk trying to do notification over
HTTP (mod-pubsub, KnowNow, various HTTP-based IM/chat systems, etc.). The
problem is that HTTP connections, given the current infrastructure and
standard components, are very hard to keep open permanently or for a very
long period of time. One is often considered lucky if you can keep an HTTP
connection open for 5 minutes without having to re-initialize... Of course,
during the period between when your connection breaks and when you get it
re-established, you're losing packets. That means that you have to have a
much more robust mechanism for recovering lost messages and that means
increased complexity, network traffic, etc. The added complexity and trouble
can be justified in some cases; however, not in all cases.
HTTP is great in some cases but not all. That's why the IETF has
defined BEEP, XMPP, SIP, SIMPLE, etc. in addition to HTTP. One protocol
model simply can't suit all needs at all times and in all contexts.
Whatever... The point here is that Atom already has defined all that
appears to be needed in order to address the Fat Ping requirement whether
you prefer individual HTTP POSTs, POSTs over HTTP 1.1 connections, XMPP, or
raw open TCP/IP sockets. That is a good thing.

bob wyman




RE: If you want Fat Pings just use Atom!

2005-08-21 Thread Bob Wyman

Aristotle Pagaltzis wrote:
 Shades of SGML.
No! No! Not that! :-)

He continues with:
 ... many good points 

Basically, there are many really easy ways that one can handle
streams of Atom entries. You could prepend an empty feed to the head of the
stream, you could use virtual end-tags, you could just send entries and
rely on the receiver to wrap them up as required, etc... But, since all of
these are really easy and none of them really gets in the way of anything
rational that I can imagine someone wanting to do, why not just default to
doing it the way it is defined in the Atom spec? In that way, we don't have
to create one more context-dependent distinction between formats. Complexity
is reduced and we can avoid having to read yet-another-specification that
looks very, very much like hundreds we've read before. If Atom provides all
we need, let's not do something else unless there is a *very* good argument
to do so.

bob wyman




RE: Protocol Action: 'The Atom Syndication Format' to Proposed Standard

2005-08-17 Thread Bob Wyman

This is excellent news! Finally, we have an openly and formally defined
standard for syndication. Wonderful!

bob wyman



HTTP Accept Headers for Atom V1.0?

2005-07-15 Thread Bob Wyman








What would the HTTP Accept Headers for Atom V1.0 look like?
i.e. if I want to tell the server that I want Atom V1.0 but do not want Atom
0.3?



 bob wyman












Re: Major backtracking on canonicalization

2005-07-06 Thread Bob Wyman


Paul Hoffman wrote:

Now that I understand this better, I believe that our text should read:


Thank you for catching this. You've saved us major pain!

   bob wyman



RE: Roll-up of proposed changes to atompub-format section 5

2005-07-05 Thread Bob Wyman

Paul Hoffman wrote:
 I'm with Tim on the -1. Bob's suggestion and explanation make
 good sense for the implementer's guide, but not for the base spec.
 There is not an interoperability issue that I can see for entries
 without sources being signed.

Could we at least put in a sentence that states that including a
source element in signed entries is recommended? The implementer's guide
would then expand on that with more detail, discussion, etc. Note: I am not
suggesting use of the "should" word, although I would like it.
We can debate what it means to have an interoperability issue;
however, my personal feeling is that if systems are forced to break and
discard signatures in order to perform usual and customary processing on
entries, that falls very close to the realm of interoperability if not within
it. Deferring this issue until the implementer's guide is written is likely
to defer it beyond the point at which common practice is established. The
result is likely to be that intermediaries and aggregators end up discarding
most signatures that appear in source feeds.

bob wyman




RE: Roll-up of proposed changes to atompub-format section 5

2005-07-05 Thread Bob Wyman

Tim Bray wrote:
 If I want to sign an entry and also want to make it available
 for aggregation then yes, I'd better put in an atom:source.  But 
 this is inherent in the basic definition of digsig; not something
 we need to call out.   -Tim
Certainly, the chain of reasoning is as clear and logical as you
describe. However, it is also very clear that this is precisely the sort of
multi-step chain of reasoning that is often overlooked by even the most
earnest of implementers. We have many, many indications that significant
numbers of RSS/Atom implementers do not, in fact, think much beyond what it
takes to get their content into a file. Even the best implementers, and
valued participants in this working group, have regularly proved that they
don't remember to think out all the systemic issues of syndication. Perhaps
it is because there are so few of us that act as intermediaries... The
issues are not well understood by those who don't serve this function.
Forgive me for suggesting that we call out the obvious. However,
this particular bit of obviousness is not very obvious. In fact, it is
probably not obvious to most folk until *after* it has been called out. We
will help matters greatly by at least providing a recommendation that source
elements be inserted in signed entries...

bob wyman






RE: Roll-up of proposed changes to atompub-format section 5

2005-07-05 Thread Bob Wyman

Antone Roundy wrote:
 When signing individual entries that do not contain an 
 atom:source element, be aware that aggregators inserting an 
 atom:source element will be unable to retain the signature. For this 
 reason, publishers might consider including an atom:source element in
 all individually signed entries.


+1

bob wyman





RE: Roll-up of proposed changes to atompub-format section 5

2005-07-05 Thread Bob Wyman

Tim Bray wrote:
 Still -1, despite Bob's arguments, at least in part because we have
 no idea what kind of applications are going to be using signed
 entries and we shouldn't try to micromanage a future we don't
 understand. -Tim
We *DO* know that PubSub will support signed entries in the
future... And, we know that PubSub and any service like it will be forced to
discard signatures on any signed entries that do not have source elements in
them. Given that at least some would consider it likely that such services
will not only remain popular but grow in popularity in the future, why is it
such a terrible thing to provide an optional recommendation that people
address the needs of these services?
I find it hard to imagine what harm could be done by providing this
recommendation. Any application written in the future is already forced to
handle entries with source elements since these elements are permitted by
the Atom specification as it stands now. Thus, simply recommending that
people do what they are already permitted to do just doesn't seem to
threaten harm to unspecified future applications -- yet, it would clearly
accomplish some good in the case of the known applications.

What is the utility of signed entries if not to facilitate the
copying of entries between feeds? Why sign individual elements unless they
are likely to be removed from their original context? If entries are not to
be copied, then feed signatures are all that is necessary and would result
in smaller, more bandwidth-efficient feeds.

bob wyman




RE: Roll-up of proposed changes to atompub-format section 5

2005-07-05 Thread Bob Wyman

Paul Hoffman wrote:
 Timing. If we change text other than because of an IESG note,
 there is a strong chance we will have to delay being finalized by
 two weeks, possibly more.
I am aware of the issues with timing and I believe I am just as
concerned as you are with these issues. I was rather stunned to be at
Gnomedex recently and hear it said that after all the effort we've put into
Atom we still have nothing to show for it. An approved RFC would make such
statements much less acceptable... 
However, I think that this can be positioned as part of the response
to the IESG comments concerning canonicalization since including source
elements in signed entries will tend to cause those entries to be more
canonical or consistent in form. Also, given that the addition is merely a
recommendation and is thus non-normative, it shouldn't raise any review
issues.

Please remember that this isn't an issue that I just pulled out of
the hat at the last moment. I first brought this up long ago -- long before
last call... The problem, as has often been the case with the issues I
raise, is that there aren't many people who seem to be terribly aware of or
concerned with the aggregation issues even though we've got reasonable
representation from those who build feed generators and clients. I'm trying
hard to do the right thing for Atom and really wish that other
intermediaries, search engines, etc. would participate more but for whatever
reason, most have chosen to remain silent on these issues... 

bob wyman




RE: Roll-up of proposed changes to atompub-format section 5

2005-07-05 Thread Bob Wyman

How about a compromise on the source insertion thing...

Paul Hoffman's proposed text for the first paragraph in Section 5 starts off
with a set of examples of why one would want to sign or encrypt atom entries
or feeds. (Discount coupons, bank statements, etc.) These examples were
requested by the IESG. In my opinion, none of the examples really speaks to
current uses of Atom. Thus, I would suggest that we either replace one of
the existing examples or add a new one with wording something like:

A publisher might digitally sign an entry, which included
an atom:source element, in order to ensure that verifiable
attribution for the entry was available if that entry was
copied into another feed or distributed via some other means.

I believe this improves the existing proposed text by providing a much more
immediately probable example than those currently listed. Additionally, by
alluding to the issue of including the source element it may at least tend
to cause implementers to consider the wisdom of including source elements in
signed entries. Finally, since the provision of examples is something that
was explicitly requested as part of the IESG review, this should not cause
any delay beyond those that are already inevitable.

bob wyman




RE: Roll-up of proposed changes to atompub-format section 5

2005-07-05 Thread Bob Wyman

Paul Hoffman wrote:
 Intermediaries such as aggregators may need to add an 
 atom:source element to an entry that does not contain its own
 atom:source element. If such an entry was signed, the addition
 will break the signature. Thus, a publisher of individually-signed
 entries should strongly consider adding an atom:source element to
 those entries before signing them.

It looks good to me. Thanks!

bob wyman




Re: Clearing a discuss vote on the Atom format

2005-07-04 Thread Bob Wyman


James M Snell wrote:

b. recommended inclusion of a source element in signed entries.

   +1

   bob wyman




Re: Roll-up of proposed changes to atompub-format section 5

2005-07-04 Thread Bob Wyman


   I believe it would be very useful to specify that signed entries should
include a source element. This can/should be considered part of entry
canonicalization.
   The reason I suggest this is that signed entries are only really useful 
when extracted from their original source feeds. If entries are only read 
from their source feeds, then it is probably best for publishers to sign the 
feed, not the individual entries. (Note: It is my hope that feed publishers 
will anticipate that their entries will be extracted from the source feeds 
and will thus sign the individual entries rather than the feeds... i.e. 
Publishers should anticipate that intermediaries like PubSub and various 
other search/discovery services will aggregate their entries and republish 
them in non-source feeds.)
   When an entry is removed from its source, it SHOULD have a source 
element inserted if one is not already present. However, if a republisher 
inserts a source element into a signed entry that would break the signature. 
Thus, it seems reasonable that we should strongly encourage those who sign 
entries to anticipate the needs of subsequent processors by inserting the 
source elements in the original signed entries. By inserting the source 
elements, the requirement for others to break the signature will be 
drastically reduced. If an entry is signed, yet contains no source element, 
much of the utility of the signature (allowing verification of the original 
publisher) is eliminated.
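
A minimal sketch of the argument above, in Python. It assumes a SHA-256 digest over the serialized entry as a stand-in for a real XML signature, and the final Atom 1.0 namespace URI; the entry contents and the add_source helper are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"  # assumed namespace URI
ET.register_namespace("", ATOM)

def digest(entry):
    # Stand-in for a signature: a hash of the serialized entry bytes.
    return hashlib.sha256(ET.tostring(entry)).hexdigest()

def add_source(entry, feed_id, feed_title):
    # Insert an atom:source element unless the entry already carries one.
    if entry.find(f"{{{ATOM}}}source") is None:
        source = ET.SubElement(entry, f"{{{ATOM}}}source")
        ET.SubElement(source, f"{{{ATOM}}}id").text = feed_id
        ET.SubElement(source, f"{{{ATOM}}}title").text = feed_title
    return entry

def make_entry():
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = "tag:example.org,2005:entry-1"
    ET.SubElement(entry, f"{{{ATOM}}}title").text = "Hello"
    return entry

# Case 1: the publisher signs an entry without a source element. The
# republisher has to add one, which changes the signed bytes.
bare = make_entry()
signed_digest = digest(bare)
add_source(bare, "tag:example.org,2005:feed", "Example Feed")
broken = digest(bare) != signed_digest

# Case 2: the publisher adds the source element before signing. The
# republisher's add_source call is a no-op, so the digest is unchanged.
complete = add_source(make_entry(), "tag:example.org,2005:feed", "Example Feed")
signed_digest2 = digest(complete)
add_source(complete, "tag:other.example,2005:aggregate", "Aggregated")
intact = digest(complete) == signed_digest2
```

In this sketch `broken` and `intact` both come out true: only the entry that already carried its source element survives republication with its digest unchanged.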


   bob wyman




Re: More on Atom XML signatures and encryption

2005-06-30 Thread Bob Wyman


Paul Hoffman wrote:
Same as above.  Even though it is included-by-reference, the referenced 
content is still a part of the message.

No, it isn't. The reference is part of the message.

+1
   The signature should only cover the bits that are actually in the 
element (feed or entry) that is signed. Referenced data may be under 
different administrative control, may change independently of the signed 
element, etc.


   bob wyman




RE: More on Atom XML signatures and encryption

2005-06-22 Thread Bob Wyman

James M Snell wrote:
 I am becoming increasingly convinced that a c14n algorithm is
 the *only* way to accomplish the goal here.
The need for C14N should never have been questioned. Where there are
signatures, there *must* be C14N (Canonicalization). In the absence of
explicitly defined C14N rules, the C14N algorithm is simply: "Leave it as it
is!" -- but that is rarely useful and is certainly not useful in the case of
Atom.
The only interesting question is "What is the C14N process for
Atom?" The question "Is C14N required?" is rhetorical at best. The answer
is "Yes."

 The algorithm would recast the entry being signed as a standalone entity
 with all appropriate namespace declarations, etc.
Precisely. It is also exceptionally important to ensure that a
source element be included in any signed entry in order to ensure that the
signed entry can be copied to other feeds without breaking the signature or
changing the semantics of the entry by allowing feed metadata from the
non-source feed to bleed into the entry.

bob wyman




Re: More on Atom XML signatures and encryption

2005-06-21 Thread Bob Wyman


James M Snell wrote:
the ability to omit the author element from a contained entry / if the 
containing feed has an author...
   Signed entries should include a source element and that source element 
should contain any of the feed level elements that the entry depends on. 
This is one of the reasons that source elements exist. The use of source 
elements drastically simplifies this part of the canonicalization process.
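
One way this could look in practice, sketched in Python. The list of feed-level elements copied into the source element (INHERITED) is an illustrative assumption, not something the thread specifies, and the sample feed is made up.

```python
import copy
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"  # assumed namespace URI
# Feed-level elements copied into atom:source (an illustrative choice).
INHERITED = ("id", "title", "updated", "author", "rights")

def q(name):
    # Qualify a local name with the Atom namespace.
    return f"{{{ATOM}}}{name}"

def entry_with_source(feed, entry):
    """Return a copy of entry whose atom:source carries the feed metadata."""
    entry = copy.deepcopy(entry)
    if entry.find(q("source")) is not None:
        return entry  # never touch an existing source element
    source = ET.SubElement(entry, q("source"))
    for name in INHERITED:
        for element in feed.findall(q(name)):
            source.append(copy.deepcopy(element))
    return entry

feed = ET.fromstring(
    f'<feed xmlns="{ATOM}">'
    "<id>tag:example.org,2005:feed</id>"
    "<title>Example Feed</title>"
    "<author><name>Alice</name></author>"
    "<entry><id>tag:example.org,2005:e1</id><title>No author here</title></entry>"
    "</feed>"
)
standalone = entry_with_source(feed, feed.find(q("entry")))
author = standalone.find(q("source")).find(q("author")).find(q("name")).text
```

The entry omits its own author, but the standalone copy still names "Alice" through its source element, so nothing from a later, non-source feed can bleed into it.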


   bob wyman




Re: More on Atom XML signatures and encryption

2005-06-20 Thread Bob Wyman


James M Snell wrote:
Question: should we only allow signing of the entire document or are there 
valid use cases for allowing each individual entry in the feed to be 
individually signed?
   We definitely need to be able to sign each entry. This is necessary so 
that we can pass signed content in aggregated feeds. The mere act of 
aggregation should not force a signature to be removed from an item. (Note: 
Signed entries really *must* include source elements. Otherwise, aggregators 
will be forced to strip off the signatures in order to insert the source 
elements.)


   bob wyman




Re: Polling Sucks! (was RE: Atom feed synchronization)

2005-06-18 Thread Bob Wyman


James M Snell wrote:

If I understand Bob's solution correctly, it goes something like:
1) wake up
2) scratch whatever you need to scratch
3) turn on computer, launch feed reader
4) feed reader does some RFC3229+feed magic to catch up on what happened 
during the night
5) feed reader opens a XMPP connection to receive the active stream of new 
entries
   This is precisely what I was describing and it is what we implement in 
the PubSub Sidebar clients. This hybrid combination gives you the best of 
both worlds. The result is the lowest possible bandwidth consumption as well 
as the lowest latency in delivering content to clients.
   The Push+Pull approach is particularly well suited to the kind of high 
volume application that James Snell describes -- particularly if the server 
has a large number of readers. While I've previously pointed out the benefit 
to the network (efficient utilization of bandwidth) and to clients (low 
latency), it is important to point out that the Push model offers real 
benefits for the server as well. In extremely high volume applications, it 
is important that the server be able to control and smooth load. Server 
based load control is most easily accomplished with a Push system. In a Pull 
based system, load is almost totally dependent on client-driven scheduling 
and thus load tends to be very bursty. Bursty load is the worst possible 
thing to have in a network-based system. In Push based systems, the server 
is able to eliminate load bursts by spreading delivery of entries over 
time -- without worrying about the need to service bursty client requests 
within the window of their request time-out limits.
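
As a rough illustration of server-side smoothing (a sketch only; the function name and the even-spacing policy are assumptions, not a description of any actual PubSub code), a push server could assign pending entries evenly spaced send times:

```python
def spread_deliveries(entry_ids, window_seconds, start=0.0):
    """Assign each pending entry an evenly spaced send time within the window."""
    if not entry_ids:
        return []
    step = window_seconds / len(entry_ids)
    return [(start + i * step, entry_id) for i, entry_id in enumerate(entry_ids)]

# Four entries that arrived in a burst are sent over a two-second window.
schedule = spread_deliveries(["e1", "e2", "e3", "e4"], window_seconds=2.0)
```

With a Pull model the server has no such freedom, because each request has to be answered before the client's time-out expires.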
   Even though there are all sorts of advantages to using Push-based and 
hybrid Push+Pull systems, the reality is that only a tiny percentage of all 
the millions of servers that support Atom feeds will have sufficient traffic 
or readership to benefit from these methods. As Joe Gregorio suggests in his 
recent note: 99.99% of all syndication is done via HTTP and this will 
probably remain the case in terms of a raw census of servers. However, it is 
also clear we are seeing significant growth in the use of feed aggregators 
like PubSub, FeedBurner and the other blog search and monitoring services. 
Also, we are seeing an increase in the use of feed-readers on mobile devices 
which require that feeds be consolidated and fed through proxies in order to 
reduce the amount of polling and other processing done by those mobile 
devices. As the use of these services increases, it will make sense for 
client developers to implement client-based support for Push+Pull and thus 
provide to their users the benefits of reduced bandwidth, reduced session 
management, and reduced latency. Broad client-based support makes sense even 
if similarly broad server-based support does not.


   bob wyman




Re: Polling Sucks! (was RE: Atom feed synchronization)

2005-06-18 Thread Bob Wyman


Sam Ruby wrote:
P.S.  Why is this on atom-syntax?  Is there a concrete proposal we are 
talking about here?  Is there likely to be?

   Because James Snell asked a question?.. But, more seriously:
   I intend to write an Internet draft for RFC3229+feed and hope that I'll 
be able to get the working group to consider it. Given the implementation 
history, we certainly meet the IETF tradition of having more than three 
independent implementations as well as considerable experience in field use. 
Also, the Atom over XMPP Internet Draft is something that I think the 
Working Group should consider once the issues related to the syntax and 
protocol specs are dealt with.
   In any case, I think it is traditional for IETF mailing lists to provide 
a forum for discussion of potential use of the protocols that they define in 
addition to providing a forum for the work of defining the language of the 
specifications themselves. It is only by developing a common understanding 
of the various use cases that we can understand how the future work, if any, 
of the working group should be defined.


   bob wyman




Polling Sucks! (was RE: Atom feed synchronization)

2005-06-17 Thread Bob Wyman

Henry Story wrote:
 The best solution is just to add a link types to the atom syntax:
 a link to the previous feed document that points to the next bunch of
 entries. IE. do what web sites do. If you can't find your answer on
 the first page, go look at the next page.
 How do you know when to stop? If the pages are ordered  
 chronologically, the client will know to stop when he has come to a page
 with entries with update times before the date he last looked.

This is *not* simpler than taking a push feed using Atom over XMPP.
For a push feed, all you do is:
1. Open a socket
2. Send a login XML Stanza
3. Process the stanzas as they arrive.

For your solution, you need to:
1. Poll the feed to get a pointer to the first link. (each poll
will cost you a TCP/IP connection).
2. If you got a new first link then go to step 5
3. Wait some period of time (the polling interval)
4. GoTo Step 1
5. Open a new TCP/IP socket to get the next link
6. Form and send an HTTP request for the next entry
7. Catch the response from the server
8. Parse the response to determine if its time stamp is something
you've already seen.
9. If you haven't seen the current entry before, then go to step 5
10. Go to step 1 to start over.
(Note: I've eliminated and compressed a few steps to avoid more
typing... An actual implementation would be more complex than I describe
above.)

Your solution is more complex and generates much more network
traffic (i.e. because of polling the feed, repeatedly opening new TCP/IP
connections with all the traditional slow start overhead, and requesting
each next link). Additionally, you end up with increased latency since the
age of any entry you discover will be, on average, half your polling
interval plus some latency introduced by link following. (Yes, you could
rely on continuous connections and thus remove the overhead of creating so
many TCP/IP connections, however, at that point, you might as well have a
continuous push socket open...)
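
The latency point can be checked with a little arithmetic. The sketch below (Python; the publishing rate and the 10-minute polling interval are made-up figures) computes how long entries wait, on average, before the next poll discovers them:

```python
def mean_discovery_delay(publish_times, poll_interval):
    """Average wait between an entry being published and the next poll."""
    delays = []
    for t in publish_times:
        # The next poll is at the first multiple of poll_interval >= t.
        polls_needed = -(-t // poll_interval)  # ceiling division
        delays.append(polls_needed * poll_interval - t)
    return sum(delays) / len(delays)

# One entry published per second for an hour, polled every 10 minutes.
average = mean_discovery_delay(range(3600), poll_interval=600)
```

Here `average` is 299.5 seconds, about half the 600-second polling interval, and that is before any extra delay from following links.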
The push solution conserves network bandwidth, delivers data with
much less latency and is simpler to implement. 
Polling sucks! (that was a pun...)

bob wyman




RE: Polling Sucks! (was RE: Atom feed synchronization)

2005-06-17 Thread Bob Wyman

Antone Roundy wrote:
 XMPP:
 5. If the feed had entries that were old and not updated, go to step 7
 6. If the feed has a first or next or whatever link, go to step 1 using
    that link
 7. Open a socket
 8. Send login XML stanza 
I am assuming that if you are pushing entries via Atom over XMPP,
you would only push new and updated entries. Thus, a client shouldn't need
to check for old and not updated entries. Also, I'm assuming that since
you are pushing entries, you wouldn't be inserting first or next links
that needed to be followed. The client would get all of its entries from the
XMPP stream.

 XMPP could achieve parity in getting feed changes that occurred while
 offline, at the expense of implementation complexity parity, by
 polling the feed once upon startup.
My assumption is that any well-built XMPP feed reader will, in fact,
also be able to read Atom files via HTTP. This is what we do at PubSub and
Gush does the same. I think Bill's app also does this. 
The original question can, I think, be summarized as: How
does one best keep up with a high-volume Atom publisher? My point was that
the first and next links don't make things any easier. They just force
the client to do a great deal of work to discover what the server already
knows -- which entries have been updated. The first and next links
approach just makes the process of working with feed files more complex as
well as more bandwidth intensive. XMPP support is a much better solution for
keeping up with changes while connected.
Let's keep Atom as it is now -- without the first and next tags
and encourage folk who need to keep up with high volume streams to use Atom
over XMPP. Lowered bandwidth utilization, reduced latency and simplicity are
good things.

bob wyman




RE: Polling Sucks! (was RE: Atom feed synchronization)

2005-06-17 Thread Bob Wyman

Joe Gregorio wrote:
 The one thing missing from the analysis is the overhead, and
 practicality, of switching protocols (HTTP to XMPP).
I'm not aware of anything that might be called overhead. What our
clients do is, upon startup, connect to XMPP and request the list of Atom
files that they are monitoring. They then immediately fetch those files to
establish their start-of-session state. From that point on, they only listen
to XMPP since anything that would be written to the Atom files is also
written to XMPP. HTTP is only used on start-up. It's a pretty clean process.
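
The start-up sequence described above might be sketched like this in Python. No real HTTP or XMPP library is used; the two lists are hypothetical stand-ins for the start-up fetch and the push stream, and de-duplicating on (id, updated) is an assumption about how a client would avoid showing the same entry twice.

```python
def hybrid_session(catch_up_entries, pushed_entries):
    """Yield each entry once: first the HTTP catch-up, then the push stream."""
    seen = set()
    for stream in (catch_up_entries, pushed_entries):
        for entry in stream:
            key = (entry["id"], entry["updated"])
            if key not in seen:
                seen.add(key)
                yield entry

# Hypothetical data: the start-up fetch and what then arrives over XMPP.
fetched = [{"id": "e1", "updated": "2005-06-17T08:00:00Z"},
           {"id": "e2", "updated": "2005-06-17T08:05:00Z"}]
pushed = [{"id": "e2", "updated": "2005-06-17T08:05:00Z"},  # already fetched
          {"id": "e3", "updated": "2005-06-17T08:10:00Z"}]
delivered = [e["id"] for e in hybrid_session(fetched, pushed)]
```

The entry that appears in both the file and the stream is delivered once, so the hand-over from HTTP to XMPP is invisible to the reader.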

 Let's keep Atom as it is now and explain to folks who need to keep up with 
 high volume streams the two options they have, either streaming over
 XMPP or next links.
Where are these next links defined? I don't see them in the Atom
Internet Draft. The word next doesn't even appear in the ID... If they
aren't there, how can you call them Atom as it is now? I thought Henry
Story was proposing these as extensions.

bob wyman



Re: Atom feed synchronization

2005-06-16 Thread Bob Wyman


James M Snell wrote:
Nice. I had pulled out of the Atom discussions to work on another project 
back when this was being discussed and missed it. Quick question tho.. in 
your initial post on the concept you state It is my intention to create a 
Internet Draft describing the ideas here
   I do intend to write an Internet Draft, but I had been waiting to get 
some field experience before doing so. At this point, I guess I'm pretty 
sure it works as described. Many people have implemented RFC3229+feed and so 
far I've heard of no issues with it other than some folk who object on 
principle to RFC3229 itself. Other than waiting for field experience, the 
thing that has held me up is waiting for the working group to get far enough 
along with Atom so that I could propose that the Internet Draft be taken up 
here. Now that Atom V1.0 is almost in the can, I guess it is time to get the 
thing written...
   BTW: I think the best way to implement the application you describe is 
probably via a combination of Push and Pull. If you're updating as rapidly 
as you say you are, then it would make sense to push the updates to the 
client using something like Atom over XMPP[1]. You would, however, still 
generate Atom files and serve them using RFC3229+feed. The Atom files would 
be used by clients to catch up on missed messages when they initially 
connect or reconnect to the push stream after having been off-line for some 
time.
   The hybrid Push+Pull process described above is what we implement at 
PubSub for every subscription. We currently have this implemented in our 
PubSub Sidebars for IE and Firefox[2]. Also, the Gush reader from 2entwine[3] 
implements this hybrid Push+Pull approach when reading our feeds.
   Push+Pull with Atom and Atom over XMPP gives you the best of both 
worlds. You get very efficient and low latency publishing of new entries to 
clients as well as efficient downloading of catchup files. What more could 
you want? :-)


   bob wyman

[1] http://www.xmpp.org/drafts/draft-saintandre-atompub-notify-02.html
[2] http://www.pubsub.com/downloads.php
[3] http://www.2entwine.com/features/pubsub.html





RE: Google Sitemaps: Yet another RSS or site-metadata format and Atom competitor

2005-06-07 Thread Bob Wyman

Greg Stein wrote:
 It was not published to muddy the waters. That implies a specific
 intent which was *definitely* not present.
Please accept my apologies for what was poor writing. I can see how
you read my sentence as implying intent to muddy. It wasn't my intent,
however, to imply that. I should have written: publishing this new format
*and* muddying the waters. My intent was to say that publishing the format
has the effect of muddying the waters. I wasn't trying to say that Google
was intentionally doing this.

 proprietary connotes closed.
I'm using the older definition of proprietary which means simply
not standard. I see nothing wrong with saying open and proprietary
format... I don't think one implies the exclusion of the other.

 How about this: you have a web site with 10 *million* URLs on it. 
 What format are you going to use? Is Atom appropriate at that scale?
No. I don't think Atom would work well with 10 million URLs. At
least not as currently defined. I do think, however, that it would have been
useful to try to at least have a conversation about defining some subset of
Atom that would address the need. I think a result could have ended up
looking much like the Sitemap format but offered a smoother migration path
from Atom as we know it to the more terse format and the reverse.

Please understand that I think that on-the-whole, the efforts by
Google to popularize the Sitemap process and syndication by non-blogs is
absolutely wonderful! I'm only grumbling about the formats... 

bob wyman



RE: Google Sitemaps: Yet another RSS or site-metadata format and Atom competitor

2005-06-03 Thread Bob Wyman

Arve Bersvendsen wrote:
 Actually, Google Sitemaps is already compatible with [Atom].
Yes, I've had a number of folk send me mail pointing to the FAQ that
I did not read as closely as I should. In fact, it is great that Google is
willing to accept Atom 0.3 files instead of just their Sitemap format.
Hopefully, this move will encourage many traditional web sites to start
producing Atom files to serve the role of Rich Site Summaries and allow us
to expand our feed services to cover non-blogs as well. 
I do still think it unfortunate that Google felt compelled to invent
yet-another-format for Sitemaps. Because of Google's support for the Sitemap
format, it is inevitable that aggregation services are going to have to
start reading Sitemaps in addition to the billions of flavors of legacy RSS
and Atom files. Nonetheless, given the importance of any format that Google
supports, should we be considering providing support in Atom for the
priority element or some easily mapped equivalent? Also, I wonder what it
will take to get Google to say that they'll support Atom 1.0? It seems
obvious that they would, but it would be nice to know when they expect to do
it...
The Sitemap Index files are proper and useful additions to what we
have now in the blogosphere and we should look carefully at trying to
figure out how to incorporate this idea into what we do for Atom. Perhaps
something along the lines of Sitemap indexes could be incorporated into the
autodiscovery specification?

bob wyman



RE: Google Sitemaps: Yet another RSS or site-metadata format and Atom competitor

2005-06-03 Thread Bob Wyman

Graham wrote:
 I don't see how a highly specialized format for a particular task is
 a competitor to or even compatible with what Atom does.
The highly specialized task which is performed using the Sitemap
format is providing lists of changed web pages on sites. This is precisely
the function that is performed in many applications of Atom. The only
difference between the target of Sitemap and Atom is that Sitemap works with
web sites that are not blogs and Atom is usually used with web sites that
are blogs. However, the differences between these two kinds of site are
virtually non-existent. 
Atom doesn't need to be a jack of all trades to handle the job
that Sitemaps handle. It is already quite capable of doing the job.  And, as
James Snell points out in an earlier message, collection documents would
handle well the job of providing Sitemap indexes. 
It seems quite clear that the Sitemap and Sitemap index formats have
little to offer that isn't already provided by Atom. This obviously leads to
the question of why Google went to the trouble of defining these formats. It
would be real nice if someone from Google could provide a touch of
explanation...

bob wyman




ByLines, NewsML and interop with other syndication formats

2005-05-25 Thread Bob Wyman








I've spent an interesting day in Amsterdam at the IPTC News Summit and had
a chance to talk about standards convergence issues with various folk in
the IPTC (owners of NewsML, NITF, EventsML, SportsML, ANPA, etc.). These
folk seem sincerely interested in getting some better worked out
compatibility between things like NewsML and Atom.

I'd like to suggest that we explicitly invite the IPTC folk to propose a
set of Atom extensions (that would include ByLine) with the intention that
these extensions would incorporate their detailed knowledge of the
publishing world and facilitate the interchange or translation of documents
between NewsML, NITF, etc. formats and Atom.



 bob wyman










RE: Compulsory feed ID?

2005-05-23 Thread Bob Wyman

Antone Roundy wrote re the issue of DOS attacks:
 I've been a bit surprised that you [Bob Wyman] haven't 
 been more active in taking the lead on pushing the conversation
 forward and ensuring that threads addressing the issue don't die
 out, given the strength of your comments on the issue in the past
 and the obvious significance to your business. ... Perhaps 
 you, who are probably in a better position than any of us to speak
 from experience on how to deal with this, could refresh our memories
 of specifically what you think the best solution is.
Yes, this issue is very important to us at PubSub and should be very
important to others as well. However, as I've learned from other recent
discussions, my viewpoint is not commonly held in this Working Group. Thus,
what I've been trying to do is pick carefully the issues that I work on. For
instance, I've put a great deal of effort into multiple ids since that
allows us the freedom to either work out proprietary solutions to the DOS
problem on our own or allows us to punt the problem forward to the
end-users' aggregators if we can't come up with a decent solution. 
Clearly, the best solution here would be for folk to use signatures.
But, that is going to take either a great deal of work to get adopted or
something really creative (and simple)... The history of attempts to get
signatures used does not make pleasant reading... We are putting effort into
working out methods to make signatures more acceptable to the community and
I hope to have some proposals soon... If we are successful (wish us luck!) that
will at least provide a solution for some people...
Basically, it doesn't make sense for me to keep demanding that
people deal with issues that they clearly don't want to address. I've been
mentioning the DOS problem for months now and getting nowhere. So, the
reason I'm not pushing harder is that it is clear that implementable
work-arounds will be more useful than never agreed-to solutions...

bob wyman




How is Atom superior to RSS?

2005-05-22 Thread Bob Wyman








I'll be making a presentation on Tuesday which will include a slide on how
Atom improves on RSS. If you have any thoughts on this subject, I would
appreciate hearing them.



bob wyman










RE: How is Atom superior to RSS?

2005-05-22 Thread Bob Wyman

This has been an experiment...
I've got lots of thoughts on why Atom is an improvement over RSS but
I am constantly amazed that people are able to continue making the claim
that Atom offers little that RSS doesn't already support. Certainly, Winer
and the Microsoft crowd make that claim regularly. I've often wondered why
people don't see the really important differences between these two. To a
certain extent, the answer comes in the replies I've received to my posting.
i.e. Not even those most familiar with Atom can present a decent list of
clear advantages -- even though they undoubtedly know them.

Yes, we all know the advantages of requiring unique atom:id values,
writing less ambiguous documentation, etc. However, I wonder why advances
like the following don't get more recognition (note: this is not a complete
list.)
1. Explicit support for xml:lang rather than the silly <language/>
tag of RSS V2.0.
2. Explicit support, in the core, for digital signatures and
encryption.
3. Atom Entry documents. Thus, support for the protocol as well as
for push delivery of Atom feeds via Atom over XMPP and other such protocols.
(i.e. Atom is designed to enable a push future rather than only working in
the legacy pull-only world of RSS)
4. Atom:source elements which provide robust support, in the core,
for attribution on entries that have been copied from one feed to another
and for preservation of important feed metadata in copied entries. Atom's
source element makes it a superior format for delivering search results, for
constructing feeds which aggregate entries from multiple sources, and for
push applications.
5. Support for XML content types rather than being limited to RSS's
HTML content type.
6. Explicit support for remote content. 

We all worked hard in getting these new capabilities and others like
them into Atom and properly defined. Why aren't these things given more
press and attention? They are significant improvements over RSS that will
have profound impact on our ability to build better applications for our
users.

bob wyman




RE: A different question about atom:author and inheritance

2005-05-22 Thread Bob Wyman

Tim Bray wrote:
The intent seems pretty clear; entry-level overrides source-level 
 overrides feed-level, but it seems like we should say that.
 Anybody think this is anything more than an editorial change? -Tim
I believe that this three-level chain of inheritance has always been
what we've intended. There was, however, a great deal of discussion at one
point about how to actually write the words. Thus, I agree that it is
largely an editorial change; however, you might expect some controversy over
particular word choices. Give it a shot and let's see how folk respond.
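
The three-level chain can be written down in a few lines of Python. This is only a sketch of the intended precedence; the function name and the use of plain strings for authors are illustrative assumptions.

```python
def effective_author(entry_author=None, source_author=None, feed_author=None):
    """Entry-level overrides source-level, which overrides feed-level."""
    for author in (entry_author, source_author, feed_author):
        if author is not None:
            return author
    return None  # no author available at any level

# An entry copied from another feed: its source names the original author,
# so the author of the containing feed does not apply.
copied = effective_author(source_author="Original Author",
                          feed_author="Aggregator")
```

Here `copied` is "Original Author"; the feed-level author would only be used if neither the entry nor its source element named one.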

Note: There is more to authorship than just the inheritance issue. I
think it also makes sense that a feed-level author should be considered to
be the author of the collection of items which is the feed. This authorship
is independent of authorship over any particular entry within the feed. Even
if the feed contains no items authored by the feed-level author, the
feed-level author is still author of the collection. This distinction would
be useful in describing linkblogs, and a variety of other feed types that
are composed of entries collected from other feeds or multiple authors.

bob wyman





RE: Refresher on Updated/Modified

2005-05-21 Thread Bob Wyman

Graham wrote:
 What if someone (either the publisher or someone downstream)
 wants to store a history of every revision in an archive 
 feed?
To this, Tim Bray answered:
 I don't see why, if you wanted that kind of archive, you couldn't
 use atom:updated for every little change in the archived version
 but atom:updated only for the ones you cared about in the published
 version.  In which case the archived version would be a superset
 of the published version.  I see nothing wrong with that. -Tim

Of course, the objections to Tim's position are obvious:
1. The case of someone downstream was ignored in the answer. Tim
only addresses the issue of what the publisher might do. 
2. Given Tim's solution to the problem, downstream readers would
be incapable of maintaining an accurate archive since only the publisher's
unpublished archive would have atom:updated values that change on each
modification.
3. The archive that Tim describes would not actually be a useful
archive for many purposes since it would not be an accurate description of
the sequence of entries written to the feed. For instance, such an archive
would not satisfy legal rules for logging data in financial applications
since such an archive could not be used to determine the value of the
atom:updated value in entries that had actually been published.

This whole argument is silly. Atom:modified is needed. It should be
provided. Nobody has given a decent argument against it. If you insist on
objecting to it then let the darn thing be optional -- but instead of trying
to impose your personal vision on the process, just let the rest of us get
on with doing the work we need to do in the way we know we have to do it.

bob wyman




RE: multiple ids

2005-05-21 Thread Bob Wyman

Tim Bray directs the editors to insert the following words:
 If multiple atom:entry elements with the same atom:id value appear in
 an Atom Feed document, they describe the same entry and Atom Processors
 MUST treat them as such.

It is a long standing and valued tradition in the IETF that
Standards Track RFC's MUST NOT impose constraints on applications unless
such constraints relate to issues of interoperability. Thus, while it is
entirely appropriate for the specification to state that multiple
atom:entry elements with the same atom:id ... describe the same entry it is
NOT appropriate to state how Atom Processors must treat such elements. The
text should read simply:

If multiple atom:entry elements with the same atom:id value appear in
 an Atom Feed document, they describe the same entry.

The appropriate handling of multiple instances of the same entry
is a matter which is solely up to the discretion of Atom Processors
since variances in such handling do not impact interoperability. One can
imagine that various developers will make different decisions in duplicate
handling policies. Some processors might even allow their end-users to
decide the handling policies. By making such decisions, developers will
either enhance or detract from the utility of the overall solutions they
develop -- but, it is not up to the IETF to direct what decisions should be
made in this case.
Restating this in a manner perhaps more friendly to those who
declare themselves as bits on the wire people: The specification should
specify the meaning of the bits on the wire -- not what one does with the
bits after receiving them.
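
To make the distinction concrete, here is a small Python sketch. Grouping by atom:id reflects what the bits mean; latest_only is merely one hypothetical policy a processor might choose. It assumes entries are plain dicts and that atom:updated values are UTC timestamps in one fixed format, so that string comparison matches time order.

```python
from collections import defaultdict

def group_by_id(entries):
    """Entries sharing an atom:id describe the same entry."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["id"]].append(entry)
    return dict(groups)

def latest_only(entries):
    """One possible policy: keep the instance with the greatest atom:updated."""
    return {entry_id: max(instances, key=lambda e: e["updated"])
            for entry_id, instances in group_by_id(entries).items()}

feed_entries = [
    {"id": "a", "updated": "2005-05-20T10:00:00Z", "title": "first draft"},
    {"id": "b", "updated": "2005-05-20T11:00:00Z", "title": "other entry"},
    {"id": "a", "updated": "2005-05-21T09:00:00Z", "title": "revised"},
]
groups = group_by_id(feed_entries)
kept = latest_only(feed_entries)
```

A different processor could just as well keep every instance, or let the end-user decide; all of them would agree on `groups`.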

bob wyman




RE: Compulsory feed ID?

2005-05-21 Thread Bob Wyman

Tim Bray wrote:
 I think the WG basically decided to punt on the DOS scenario. -Tim
I believe you are correct in describing the WG's unfortunate
disposition towards this issue. (Naturally, I object...) In any case, given
that a significant DOS attack has been identified -- yet not addressed -- I
think it would be both wise and appropriate to provide text in a Security
Concerns section that describes the vulnerability of systems that rely on
Atom documents to this particular attack.

bob wyman





atom:modified indicates temporal ORDER not version....

2005-05-21 Thread Bob Wyman

Robert Sayre wrote:
Versioning problems aren't solved by timestamps.
I don't understand why this version issue keeps coming up. It
should be apparent to everyone that there is NO relationship between
timestamp and version. Timestamps have only two functions:
1. Different timestamps indicate different instances of an entry.
2. Timestamps allow assumptions concerning temporal ORDER or
sequence to be made.

It is totally reasonable for me to develop a V1.0 followed by a
V2.0 which is then followed by a V1.1 -- if I have a reasonably rich
versioning scheme. If I then order-by-version, I would have the ordered set
V1.0, V1.1, V2.0. However, if I order-by-time, I have the ordered set
V1.0, V2.0, V1.1. 
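
The same example as a short Python sketch. The timestamps are invented, and the "modified" key simply stands for the proposed atom:modified value.

```python
instances = [
    {"version": (1, 0), "modified": "2005-05-01T00:00:00Z"},
    {"version": (2, 0), "modified": "2005-05-10T00:00:00Z"},
    {"version": (1, 1), "modified": "2005-05-20T00:00:00Z"},  # written last
]

# Ordering by version and ordering by time give different sequences.
by_version = [i["version"] for i in sorted(instances, key=lambda i: i["version"])]
by_time = [i["version"] for i in sorted(instances, key=lambda i: i["modified"])]
```

Sorting by version yields 1.0, 1.1, 2.0, while sorting by timestamp yields 1.0, 2.0, 1.1.
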
I believe that atom:modified is intended only to permit
order-by-time. Certainly, that is all that is needed to address the vast
majority of use cases for which atom:modified has been declared useful.
Atom:modified is intended to allow processors to compare two non-identical
instances of a single entry and determine the order in which they should be
considered to have been created. Atom:modified allows us to say "This entry
is considered to have been created after that entry." Atom:modified does not
permit us to make any statements concerning versions or variants.
It is possible that some confusion is being introduced here since
one may notice that if a very simple, single-line-of-descent versioning
scheme is used, the time-ordered sequence of instances will be identical to
a version-ordered set of instances. However, this is merely an anecdotal
observation that applies to only one of many possible classes of versioning
policy. There is no general correlation between temporal order and version
order that applies across all versioning policies. Any correlation between
temporal-order and version-order should generally be considered coincidental
and not interesting.
Knowledge of temporal order is extremely useful information that can
be usefully exploited by Atom Processors to deliver a number of capabilities
demanded by a variety of users. However, the current definition of Atom
(since it doesn't support atom:modified) does not permit general temporal
ordering of entries even when they are all published in a single feed. At
best, the current definition only allows us to temporally order sets of
entries that have the same atom:updated value. However, we have no means of
determining sequence or temporal ordering of the elements of a set whose
members share the same atom:updated value. This inability to order elements
of such sets is a significant weakness in Atom in that it introduces
ambiguity.
Atom should support atom:modified to permit the temporal-ordering of
members of sets that share the same atom:id and atom:updated values. This
has nothing to do with versioning.

bob wyman




RE: atom:modified indicates temporal ORDER not version....

2005-05-21 Thread Bob Wyman

Robert Sayre wrote:
 What does atom:id have to do with temporal ordering?
Absolutely nothing.
Atom:id is used to identify sets of entry instances which, according
to the Atom specification, should be considered the same entry. Sets
composed of instances of the same entry can then be divided into subsets
that share a common atom:updated value. After such a division into subsets,
some of the subsets may contain multiple elements which cannot be temporally
ordered given the current Atom spec draft. atom:modified provides a means to
temporally order the elements of sets which contain multiple elements that
share common atom:id and atom:updated values.
I believe this was communicated when I wrote:

Atom should support atom:modified to permit the temporal-ordering of
members of sets that share the same atom:id and atom:updated values.
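
A minimal sketch of that division into subsets, assuming entry instances are held as plain dictionaries. The "modified" key stands in for the proposed atom:modified element, which is not part of the current draft, and the sample data is hypothetical. Timestamps are assumed to be UTC strings in one fixed format so that string comparison matches temporal order.

```python
from collections import defaultdict

# Hypothetical entry instances that share one atom:id and one atom:updated.
instances = [
    {"id": "tag:example.org,2005:entry-1", "updated": "2005-05-20T10:00:00Z",
     "modified": "2005-05-20T10:09:00Z", "title": "Second typo fix"},
    {"id": "tag:example.org,2005:entry-1", "updated": "2005-05-20T10:00:00Z",
     "modified": "2005-05-20T10:00:00Z", "title": "Original"},
    {"id": "tag:example.org,2005:entry-1", "updated": "2005-05-20T10:00:00Z",
     "modified": "2005-05-20T10:04:00Z", "title": "First typo fix"},
]


def order_instances(entries):
    """Group instances by (atom:id, atom:updated), then order each group by 'modified'."""
    groups = defaultdict(list)
    for inst in entries:
        groups[(inst["id"], inst["updated"])].append(inst)
    for members in groups.values():
        # Without a 'modified' value there would be nothing to sort on here.
        members.sort(key=lambda inst: inst["modified"])
    return dict(groups)


ordered = order_instances(instances)
key = ("tag:example.org,2005:entry-1", "2005-05-20T10:00:00Z")
titles = [inst["title"] for inst in ordered[key]]
print(titles)
```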

bob wyman




RE: atom:modified indicates temporal ORDER not version....

2005-05-21 Thread Bob Wyman

I wrote:
 I believe this was communicated when I wrote:
 Atom should support atom:modified to permit the temporal-ordering of
 members of sets that share the same atom:id and atom:updated values.

Robert Sayre wrote:
 No, that's not what you communicated. How can I temporally order atom
 entries with different IDs but the same atom:updated value? atom:id
 and atom:modified are completely unrelated.
 I don't know what the problem is, but the answer is atom:modified!

Robert, it is clear that your disdain for the current discussions
has driven you to the point where you are no longer even reading the posts
to which you respond. This is not productive.
I have said *nothing* about the temporal ordering of atom entries
with different IDs. I have only written about the problem of providing
temporal ordering of atom entries that share the same atom:id and
atom:updated values. 
I repeat (with a few added words to make it even more clear):

Atom should support atom:modified to permit the temporal-ordering of
 members of sets whose members share the same atom:id and atom:updated
 values.

bob wyman




RE: Fetch me an author. Now, fetch me another author.

2005-05-21 Thread Bob Wyman

Robert Sayre wrote:
 atom:modified cannot be operationally distinguished from atom:updated.
 Obviously, if people start shipping feeds with the same id and
 atom:updated figure, it will be needed. There's no reason to
 standardize it, though. We don't know how that would work.
The definition of atom:updated was explicitly and intentionally
crafted to permit the creation of multiple non-identical entries that shared
common atom:id and atom:updated values. Clearly, it was the intention of the
Working Group to permit this, otherwise the definition of atom:updated would
not be as it is. Thus, it is ridiculous to try to suggest that feeds with
the same id and atom:updated are somehow unanticipated or not-understood.
If such feeds are so far outside the ken of what the working group intends,
then atom:updated should never have been defined as it is.
Additionally, atom:modified is clearly distinguished from
atom:updated *by definition!* Atom:modified indicates the last time an
entry was modified. Atom:updated indicates the last time it was modified in
a way that the publisher considered significant. This is a very clear
distinction.

bob wyman



RE: Fetch me an author. Now, fetch me another author.

2005-05-21 Thread Bob Wyman

Robert Sayre wrote:
 Here's the last time this discussion happened:
 http://www.imc.org/atom-syntax/mail-archive/msg13276.html
Tim's point in the referenced mail supported the current definition
of atom:updated which provides a means for publishers to express their own
subjective opinions of what is a significant change to an entry. However,
the solution of one problem does not eliminate the second problem. The
second problem is that readers (not publishers) need to be able to
distinguish and temporally order entries that have been written by
publishers. Because the publishers CANNOT know the detailed needs of all
their readers, publishers' subjective input cannot be held to be useful.
Objective metrics which can be clearly understood by both publishers and
readers must be used. In this case, the best objective measure to use is to
say that the change of one or more bits in the encoding or representation of
an entry should result in a new atom:modified value.

* Atom:updated addresses needs of publishers
* Atom:modified addresses needs of readers

Both sets of needs, that of publishers as well as readers, must be
addressed and dealt with by the Atom format. Atom:updated only addresses the
needs of publishers.
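
One way a publishing tool might apply the "any changed bit" rule is sketched below. This is an assumption-laden illustration, not a proposal for spec text: it assumes the entry is serialized to bytes with the modified timestamp itself left out, and the function names are hypothetical.

```python
import hashlib


def fingerprint(entry_bytes: bytes) -> str:
    """Digest of the serialized entry (assumed to exclude the modified value itself)."""
    return hashlib.sha256(entry_bytes).hexdigest()


def next_modified(previous_bytes, previous_modified, current_bytes, now):
    """Keep the old timestamp if no bit changed; otherwise stamp the entry with 'now'."""
    if fingerprint(previous_bytes) == fingerprint(current_bytes):
        return previous_modified
    return now


unchanged = next_modified(b"<entry>a</entry>", "2005-05-21T09:00:00Z",
                          b"<entry>a</entry>", "2005-05-21T12:00:00Z")
changed = next_modified(b"<entry>a</entry>", "2005-05-21T09:00:00Z",
                        b"<entry>b</entry>", "2005-05-21T12:00:00Z")
print(unchanged, changed)
```

Nothing subjective enters into this decision, so both the publisher and the reader can predict the result.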

bob wyman




RE: atom:modified indicates temporal ORDER not version....

2005-05-21 Thread Bob Wyman

Robert Sayre wrote:
 Temporal order of what? They are all the same entry, so what is it
 you are temporally ordering?
We are discussing the temporal ordering of multiple non-identical
*instances* of a single Atom entry. It is common in the realm of software
engineering to deal with this concept of instances. Things are often
considered to be simultaneously different and the same. (I am who I am
today -- as I was when I was a child, nonetheless, I am very different today
than I was when I was a child. The instance of me today differs from the
instance of me that you might have come across many years ago.) But, perhaps
this concept is too abstract for some readers...

 Why is this a new problem that only arises when we allow multiple
 IDs in the same feed?
I have been pointing out these issues since long before the issue of
multiple IDs (multiple instances) recently regained attention. The issue
exists even without duplicate id support but is particularly critical once
we support multiple instances of an entry in a single feed document.
In the absence of duplicate id support, a reader can infer the
temporal order of entries by simply noticing the order in which the entry
instances were read from a feed document. (If duplicate ids are prohibited,
then if you have read two entry instances which share a common atom:id, they
must have been read from different instances of feeds and at different
times. Thus, you can infer in some cases that the temporal ordering of the
entry instances approximates the temporal ordering of the read operations
which retrieved the entry instances. ) 
However, if you permit multiple instances of an entry in a single
feed document then it is possible that you will read multiple entries whose
temporal order cannot be inferred. (Note: Order of appearance in a feed does
not imply any inter-entry order and thus cannot be used to infer or discover
the temporal ordering of entries.)
Thus, this issue *is* related to the multiple ID issue in that the
problem is exacerbated by permitting multiple instances of a single entry in
a single feed document. Whether or not it is relevant in other contexts is
largely irrelevant since it appears that addressing the issue in one context
will resolve it in other contexts as well.

bob wyman





RE: Refresher on Updated/Modified

2005-05-21 Thread Bob Wyman

Tim Bray wrote:
 for archiving purposes I consider all changes no matter how small
 significant, and thus preserve them all with different values of
 atom:updated.  For publication to the web, I have a different
 criterion as to what is significant.  I fail to see any problem
 in the archive being a superset of the feed.
The problem is that such an archive would not accurately reflect
what you actually published to the web. Thus, for many applications, you
would also have to keep a distinct log of what you published. Using your
archive you wouldn't be able to meet various legal requirements that apply
to a number of businesses which require that you be able to show what you
published.
The problems get worse if you include signatures in your entries.
Using your archiving method, the signatures on your archived entries would
be different from the signatures on the entries you published.
The archive method you describe does not produce a superset of
what you published; it is a different set of data from that which you
published. This is not necessary.

bob wyman




RE: Refresher on Updated/Modified

2005-05-21 Thread Bob Wyman

Tim Bray wrote:
 I regularly make minor changes to the trailing part of long
 entries and decline to refresh the feed or the atom:updated date, 
 specifically because I do not want each of the ten thousand or
 so newsreaders who fetch my feed to go and re-get the entry
 because I fixed a typo in paragraph 11.
It seems like you are concerned that people who see a change in your
feed will re-fetch the HTML? If this is your concern, then do as you do now
and don't refresh the feed unless you have a change that warrants an update
to atom:updated. This is totally up to you and support for atom:modified
wouldn't change that. There is no requirement that your feed change whenever
you modify your posts. Thus, there is nothing that stops you from pursuing
this policy. You are essentially arguing that the standard should force
everyone to have a blog that works in the manner that your blog works. That
is not reasonable. To argue that the standard should make it possible for
you to do things the way you want is quite reasonable. But, you should give
to others the same consideration you apparently demand from them.

bob wyman




RE: atom:modified (was Re: Fetch me an author. Now, fetch me another author.)

2005-05-21 Thread Bob Wyman

Antone Roundy wrote:
 Unless the need for this can be shown, and it can be shown that
 an extension can't take care of it, I'm -1 on atom:modified.
The need is simple and I've stated it dozens of times... Given two
non-identical entries that share the same atom:id and the same atom:updated,
I need to know which of them is to be presented to the user. The current
specification doesn't allow me to do anything other than make a random
choice. This is not reasonable. Atom:modified would provide the data needed
to determine which was the most recently produced of the two entries. That
most recently produced entry is the one that is most often desired by users.
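
The choice I am describing can be sketched as follows. The dictionaries and the "modified" key are hypothetical stand-ins for parsed entries carrying the proposed atom:modified; returning None marks the case where a reader today has nothing better than a guess.

```python
def choose_instance(a, b):
    """Return the later of two instances sharing atom:id and atom:updated.

    Returns None when the data gives no basis for a choice.
    """
    if a["id"] != b["id"] or a["updated"] != b["updated"]:
        raise ValueError("instances must share atom:id and atom:updated")
    a_mod, b_mod = a.get("modified"), b.get("modified")
    if a_mod is None or b_mod is None or a_mod == b_mod:
        return None  # undecidable: this is where a reader must pick at random
    # Timestamps are assumed to be UTC strings in one fixed format.
    return a if a_mod > b_mod else b


first = {"id": "tag:example.org,2005:1", "updated": "2005-05-21T08:00:00Z",
         "modified": "2005-05-21T08:00:00Z"}
second = {"id": "tag:example.org,2005:1", "updated": "2005-05-21T08:00:00Z",
          "modified": "2005-05-21T08:30:00Z"}
no_modified = {"id": "tag:example.org,2005:1", "updated": "2005-05-21T08:00:00Z"}

print(choose_instance(first, second) is second)
print(choose_instance(first, no_modified))
```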
On extensions... Virtually anything can be done in extensions. If
nothing should be in the core except those things that can be defined by
extensions, then nothing would be in the core. It is inevitable that
extensions will not be as broadly implemented as elements of the core. The
practical implication of forcing something to be an extension is to ensure
that it is never broadly implemented.

bob wyman





RE: Refresher on Updated/Modified

2005-05-21 Thread Bob Wyman

Tim Bray wrote:
 As a matter of policy, my feed contains the most recent 20  
 posts.  However, if one of those posts is a long post and only the  
 summary is provided, when I make a change, I make a conscious  
 decision whether it's sufficient that I want newsreaders to re-fetch  
 it, and if so I change the datestamp, otherwise not.
Finally, your true motivations appear. It is now clear that you're
not talking about Atom itself. Rather, you are trying to regulate the
behavior of Atom Processors who will be using Atom feeds somewhat like
ping feeds that tell them which entries to fetch. Your concerns are easily
addressed by providing text in the specification that makes clear what you
want.

I therefore propose the following text:

To the discussion of both atom:updated and atom:modified add:
[Non-Normative: In order to preserve network bandwidth and reduce the load
on hosts of resources linked to Atom feeds or entries, Atom Processors which
fetch the contents of alternate links are advised that they should not
re-fetch such contents unless atom:updated changes.]

To the discussion of atom:modified add this normative text:
The value of atom:modified MUST only be changed when some other element
(including atom:updated) of the same Atom entry has changed. Changes which
are limited to resources linked to the Atom Entry MUST NOT trigger changes
to atom:modified.

 Since I'm a good citizen about specs, I would do this wasteful thing.
If the spec were written as I have proposed above, then you -- as a
good citizen -- would never re-fetch the alternate linked resources unless
atom:updated changed.
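
A rough sketch of how an Atom Processor might follow the proposed text, assuming it caches the last-seen timestamps per entry. The function name and dictionary keys are hypothetical, and "modified" again stands in for the proposed atom:modified.

```python
def plan_actions(cached, incoming):
    """Decide what to do with an incoming entry instance.

    'cached' and 'incoming' are dicts with 'updated' and 'modified' strings;
    'cached' is None when the entry has not been seen before.
    """
    if cached is None:
        return {"store_entry": True, "refetch_alternate": True}
    return {
        # Any change to the entry itself is worth recording.
        "store_entry": incoming["modified"] != cached["modified"],
        # The linked resource is only re-fetched when atom:updated changes.
        "refetch_alternate": incoming["updated"] != cached["updated"],
    }


seen = {"updated": "2005-05-20T10:00:00Z", "modified": "2005-05-20T10:00:00Z"}
typo_fix = {"updated": "2005-05-20T10:00:00Z", "modified": "2005-05-21T09:00:00Z"}
real_update = {"updated": "2005-05-21T11:00:00Z", "modified": "2005-05-21T11:00:00Z"}

print(plan_actions(seen, typo_fix))
print(plan_actions(seen, real_update))
```

In this sketch a typo fix is recorded by the processor but causes no extra load on the host of the alternate resource.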

The difference here is that my comments have been solely focused on
the contents of the Atom feed -- which is all that PubSub is concerned with.
Nonetheless, the proposed texts should resolve your issues while allowing
PubSub to do its job.

bob wyman




Microsoft to support Atom in any aggregator they produce

2005-05-20 Thread Bob Wyman

FYI: 
Robert Scoble, a Microsoft employee/insider very familiar with Microsoft's
plans for syndication, declares in comments on his blog that "we are
supporting Atom in any aggregator we produce." Microsoft's example in
supporting Atom should be followed by all other aggregator developers in the
future and Microsoft should be commended for supporting the adoption of
openly defined standards for syndication.


For more info (and some heated comments...) see:

 http://bobwyman.pubsub.com/main/2005/05/microsoft_to_su.html

bob wyman




Which is the preferred feed?

2005-05-09 Thread Bob Wyman

Some sites are beginning to serve
their feeds via intermediaries like FeedBurner. They are doing this, in part,
to make it easier for them to get better statistics on their use of the feeds,
to off-load bandwidth requirements, or to take advantage of the advertising insertion
and management programs of the intermediaries.

However, many of today's
intermediaries require that program participants manage a base
feed on their own sites that is later copied to the intermediary. This is the
approach taken by FeedBurner among others. Whether or not the intermediaries
require that a feed be maintained on the site, this is usually required if only
because there will be people who are reading the feed and there is no means to
notify them, within the feed, that a new preferred source of the
feeds is available.

For instance, the Typepad site
blog.deeje.tv has two feeds generated by Typepad:



http://blog.deeje.tv/musings/atom.xml

http://blog.deeje.tv/musings/index.rdf



and it has a feed generated by
FeedBurner:



http://feeds.feedburner.com/deeje/musings



Now, my assumption is that the owner
of blog.deeje.tv probably would prefer that people read his FeedBurner feed
rather than the TypePad feeds. Evidence of this can be seen in that the
autodiscovery links on the page point to the FeedBurner feeds. However, while
the links currently point to FeedBurner, they have not always pointed there.
At some point in the past, the owner of this blog decided to prefer the
FeedBurner service over Typepad for feed services. At some point in the future,
the same owner might wish to drop the FeedBurner service in favor of some other
service, or perhaps just go back to the normal Typepad feeds.

The problem, of course, is that
there is no existing mechanism by which these changes in preferred feeds can be
indicated in either an Atom or RSS file. The result is that any software system
that started reading the Atom or RDF feeds provided by Typepad before this blog
started using FeedBurner will continue to read the Typepad feeds in the future.
Similarly, any system currently reading the FeedBurner feeds is likely to
continue reading those feeds in the future.

One could argue that feed reading software
should, on some regular schedule, re-scan the alternate site for
a feed to see if the autodiscovery links have changed. However, this is a pretty
crude solution. It would be much, much better to allow a feed to contain
data that explicitly identifies a preferred alternative source.
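
For comparison, the crude re-scan approach might look roughly like the sketch below. It uses Python's standard html.parser; the set of feed media types and the function names are assumptions made for the example, and fetching the page itself is left out.

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/atom+xml", "application/rss+xml"}  # assumed feed media types


class FeedLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate"> elements that point to feeds."""

    def __init__(self):
        super().__init__()
        self.feed_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        media_type = (attributes.get("type") or "").lower()
        if "alternate" in rel_values and media_type in FEED_TYPES and attributes.get("href"):
            self.feed_urls.append(attributes["href"])


def discover_feeds(html_text):
    finder = FeedLinkFinder()
    finder.feed(html_text)
    return finder.feed_urls


def preferred_feed_changed(subscribed_url, html_text):
    """True when the page advertises feeds and the subscribed one is not among them."""
    feeds = discover_feeds(html_text)
    return bool(feeds) and subscribed_url not in feeds


page = ('<html><head><link rel="alternate" type="application/atom+xml" '
        'href="http://feeds.feedburner.com/deeje/musings"></head><body></body></html>')
print(discover_feeds(page))
print(preferred_feed_changed("http://blog.deeje.tv/musings/atom.xml", page))
```

Every reader would have to run something like this on a schedule against every site it follows, which is why I call it crude.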

Supporting a means to identify a preferred
alternative source would greatly improve the mobility of feeds across the
network and would avoid the current problem of potentially pinning someone down
to a feed delivery service simply because of historical accident. If I want to
move my feeds from Typepad to FeedBurner, I should be able to without having to
worry about leaving behind everyone who had ever started reading my Typepad
feeds. Similarly, if I later decide that I want to move off FeedBurner, there
should be a way to point people to the location of my preferred feeds.



bob wyman




RE: Which is the preferred feed?

2005-05-09 Thread Bob Wyman

Anne van Kesteren wrote:
 Sites could also use a HTTP 302 link on their own site that points
 to FeedBurner in the end. When FeedBurner dies or when they no longer
 have desire to use the service, they switch the location of the 
 temporary redirect and all is fine.
While 302 is an obvious technical solution, it just doesn't do the
job. HTTP's 302 is just a bit too absolute...
For instance, if I'm trying to push people from my Atom 0.3 feed to
my Atom V1.0 feed, it is likely that there will be many readers who don't
know how to process Atom V1.0 correctly -- at least initially. They should
be free to fallback to the Atom 0.3 feed until they learn the new format.
Similarly, if I have readers who use one of the Mac-based readers that only
read RSS, it becomes problematic to force them to read my Atom V1.0 feeds...
It should also be noted that the ability to change HTTP response
codes is not something which is typically provided to many bloggers today.
I'm aware of no blog hosting services that allow for customer-requested
302 status values. Even on the one-user systems, I think it's pretty hard
for normal folk to figure out how to make the modifications needed to return
a 302 for some files.
We should also realize that business issues are likely to make it
difficult for people to use 302-based solutions. For instance, a site that
provided intermediary feed serving might not wish to make it easy for people
to migrate their feeds away from their service. They might *like* the idea
that switching costs are very high... Thus, they might simply refuse (on
some technical grounds) to allow users who are moving to a new service to
get 302-forwarding on their feeds. On the other hand, if the Atom format
itself contained a means of redirecting to preferred feeds, and if the spec
said that such data MUST NOT be removed when a feed is copied, etc., then
one could essentially force vendors to support feed mobility. (Yes, there
would be loopholes.)
Normally, I wouldn't argue for replicating an HTTP feature inside
the feeds, however, I think that what I'm talking about here is not really
what 302 was intended to provide. In any case, this may be looked at as a
layering issue. 302 provides hard redirection at the HTTP level; a
preferred feed indicator provides soft redirection at the application
level. Implementation of similar services in multiple layers of the stack is
a reasonable thing to do as long as the semantics vary at least slightly
between the layers and the reasons for the variances are related to the
nature of the layers.
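
The difference in semantics can be sketched like this. To be clear, no such element exists in Atom today; the "preferred hint" dictionary, its keys, and the format labels are all hypothetical and only illustrate what soft redirection at the application level might mean for a reader.

```python
def choose_feed(current_url, preferred_hint, supported_formats):
    """Follow a soft, in-feed redirect only if the reader can parse the target.

    'preferred_hint' is a hypothetical {'url': ..., 'format': ...} dict taken
    from the feed, or None when the feed carries no hint.
    """
    if preferred_hint and preferred_hint["format"] in supported_formats:
        return preferred_hint["url"]
    # Unlike an HTTP 302, the reader is free to stay where it is.
    return current_url


hint = {"url": "http://example.org/atom10.xml", "format": "atom-1.0"}

old_reader = choose_feed("http://example.org/atom03.xml", hint, {"atom-0.3"})
new_reader = choose_feed("http://example.org/atom03.xml", hint, {"atom-0.3", "atom-1.0"})
print(old_reader)
print(new_reader)
```

A reader that only understands Atom 0.3 keeps its old subscription, while one that has learned Atom 1.0 moves to the preferred feed.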

bob wyman




RE: PaceAllowDuplicateIDs

2005-05-06 Thread Bob Wyman

Graham wrote:
Does anyone remember why having the same id in a feed is a bad idea?
 Because instead of a fixed model where a feed is a stream of
 entries each with their own id, it is now a stream of entries each
 of which does not have its own id, but shares it with similar
 entries. This is bullshit.
I completely disagree on this.
I think the problem here is people focusing too much on
characteristics of the feed when the real issue here is Entries. Like I've
said in the past, "It's about the Entries, Stupid!" (don't take offense...)
As long as we allow entries to be updated, it is inevitable that the
stream of entries that is created over time will contain instances of
entries that share common atom:id values. 
The only question here is whether or not we're willing to allow a
feed document to *accurately* represent the stream of entries -- as they
were created -- or whether we insist that the feed document censor the
history of the stream by removing old instances of updated entries before
allowing updates to be inserted.
The reality is that no matter which decision we make in this case,
any useful aggregator must have code to deal with multiple instances of
entries that share the same atom:id. This is the case since even if we don't
permit duplicate IDs in a single instance of a feed document, we would still
permit duplicate IDs *over time*. Because duplicate ids appear, over time,
whenever you update an entry, the aggregator has to have all the logic
needed to handle them in the *stream* of entries that it reads -- over time.
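
A minimal sketch of what I mean, assuming entries are plain dictionaries and the aggregator keeps every distinct instance it has seen per atom:id. The function and the sample entries are hypothetical.

```python
from collections import defaultdict


def consume(stream, store=None):
    """Add each entry instance from a stream to a per-id history, skipping exact repeats."""
    store = defaultdict(list) if store is None else store
    for entry in stream:
        history = store[entry["id"]]
        if entry not in history:
            history.append(entry)
    return store


original = {"id": "tag:example.org,2005:1", "updated": "2005-05-06T10:00:00Z", "content": "v1"}
revised = {"id": "tag:example.org,2005:1", "updated": "2005-05-06T10:00:00Z", "content": "v2"}

# Both instances arrive in a single feed document.
single_document = consume([original, revised])

# The same two instances arrive in two successive polls of the feed.
two_polls = consume([revised], consume([original]))

print(dict(single_document) == dict(two_polls))
```

The same code path handles both cases, so a ban on duplicate ids within one feed document removes nothing from what the aggregator has to implement.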
This issue only becomes interesting if we try to provide special
rules for the handling of data within a single instance of a feed document.
The reality is, however, that any aggregator that actually pays attention to
these special case rules is going to either get more complex (since it can't
simply treat everything as a stream of entries) or it will get confused
(since folk will intentionally or unintentionally create duplicate ids).
This ban on duplicate ids provides no benefit for aggregators, it
makes feed producers more complex, it tempts aggregator or client writers to
do dangerous things, it forces deletion of data that is useful to some
people for some applications, it puts too much emphasis on feeds when we
should be working on entries, etc... It is a really bad thing to do.

bob wyman




RE: entry definition

2005-05-06 Thread Bob Wyman

Henry Story wrote:
 An Atom Entry is a resource (identified by atom:id) whose
 representations (atom:entry) describe the state of a web resource
 at a time (the link alternate).

I think that if this is not 100% correct then it is at least very
close to whatever correct actually is. 

bob wyman




RE: Autodiscovery

2005-05-06 Thread Bob Wyman

Sjoerd Visscher wrote:
 [HTML 4.01 says:] This attribute describes the relationship from
 the current document to the anchor specified by the href attribute.
 The value of this attribute is a space-separated list of link types.
But, if you copy HTML from one document to another, or you construct
an HTML document from parts, you risk carrying <a> tags with rel attributes
from one document to another. If I quote some HTML in a new HTML document
and the quoted HTML includes rel=alternate in an <a> tag, are we really
saying that the presence of rel=alternate in the quoted text establishes a
relation of the new HTML document as a whole?
Personally, I think there is a serious scoping problem here. We've
got attributes of separable components of a page establishing metadata for
the page as a whole. Not good.

bob wyman



