Re: I-D ACTION:draft-nottingham-atompub-feed-history-00.txt

2005-06-30 Thread Thomas Broyer


Antone Roundy wrote:
 Getting back to how to use static documents for a chain of instances,
 that could easily be done as follows. The following assumes that the
 current feed document and the archive documents will each contain 15
 entries.

 The first 15 instances of the feed document do not contain a prev
 link (assuming one entry is added each time).

 When the 16th entry is added, a static document is created containing
 the first 15 entries, and a prev link pointing to it is added to the
 current feed document. This link remains unchanged until the 31st entry
 is added.

 When the 31st entry is added, another static document is created
 containing the 16th through 30th entries. It has a prev link pointing
 to the first static document. The current feed document's prev link is
 updated to point to the second static document, and it continues to
 point to the second static document until the 46th entry is added.

 When the 46th entry is added, a third static document is created
 containing the 31st through 45th entries, etc.
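
The rollover scheme described above can be sketched roughly as follows. This is only an illustration, not something taken from the draft: the names `Publisher`, `Document`, `PAGE_SIZE` and the `archive-N` URIs are assumptions made for the example, and entries are represented simply by their sequence numbers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PAGE_SIZE = 15  # entries per document, as assumed in the description above


@dataclass
class Document:
    entries: List[int]
    prev: Optional[str] = None  # URI of the previous archive document, if any


@dataclass
class Publisher:
    archives: Dict[str, Document] = field(default_factory=dict)  # static documents
    live: Document = field(default_factory=lambda: Document(entries=[]))
    count: int = 0

    def add_entry(self) -> None:
        self.count += 1
        n = self.count
        # On the 16th, 31st, 46th... entry, freeze the previous 15 entries
        # into a new static document and repoint the live feed's prev link.
        if n > PAGE_SIZE and (n - 1) % PAGE_SIZE == 0:
            uri = f"archive-{(n - 1) // PAGE_SIZE}"  # hypothetical URI scheme
            self.archives[uri] = Document(
                entries=list(range(n - PAGE_SIZE, n)),
                prev=self.live.prev,  # chain to the older static document
            )
            self.live.prev = uri
        # The live feed always shows the most recent PAGE_SIZE entries.
        self.live.entries = list(range(max(1, n - PAGE_SIZE + 1), n + 1))
```

Once written, a static document is never touched again; only the live feed's prev link moves.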

However, there should then be a "this" link in the live feed, otherwise
I'll have to retrieve (as a reader/aggregator) the prev feed every 15
entries:

Say I retrieved the feed when it was 15 entries long. When the 16th entry
is added and the first static document created, a prev link is added to
the live feed, pointing to a document I never retrieved, so I guess I might
have missed entries and retrieve it. I end up retrieving again the 15
entries I already know of.
When the 31st entry is added, the feed's prev link is changed to
reference the new 16th-to-30th archive feed. This is a URI I never
dereferenced, so I guess I might have missed some entries and then
dereference the URI and retrieve the archive feed. If I had retrieved the
feed when it was 30 entries long, I end up retrieving again the 16th to
30th entries I already know of.

One could argue that I don't need to retrieve the archive feed as the live
feed already contains 14 entries (2nd to 15th, or 17th to 30th) I already
retrieved, using atom:updated and atom:id to notice them.
Well, nothing precludes an entry from being pushed to the front even if its
atom:updated hasn't changed, so the entry following such a pushed-to-front
entry could be one I never saw and I might have missed it.
And actually, this doesn't otherwise change the problem, which would still
arise if I retrieve the live feed when, say, it was 15 entries long and
again 15 entries later: I never saw the prev archive feed or any of the 15
entries in the live feed (so I can't conclude anything based on
atom:id+atom:updated), I then retrieve the prev-linked archive feed and
end up retrieving 15 entries I already know of, because it happens that I
actually didn't miss any entry between my two live feed retrievals...

So we need a means to either identify the *next* prev link (a "this" or
"permalink" link in the live feed (no need to have one in archive
feeds, as already said on the list), which means it must be predictable),
or something to tell us we didn't miss entries, such as the atom:updated
of the prev-linked archive feed (is atom:updated enough?).

We'll end up with the live feed being either:
<feed xmlns="..." xmlns:fs="...">
  <link rel="archive" href="http://example.com/2005/05/" />
  <!-- I didn't use a link construct as the document does not exist yet -->
  <fs:predicted-archive-uri>
http://example.com/2005/06/
  </fs:predicted-archive-uri>
  ...
</feed>

or

<feed xmlns="..." xmlns:fs="...">
  <!-- I used an extension attribute, even if it's not clearly
   defined by the Atom Syndication Format -->
  <link rel="prev" href="http://example.com/2005/05/"
fs:updated="2005-05-31T23:59:59" />
  ...
</feed>

One advantage of the latter is that you don't rely on URIs as identifiers
for the feed archive documents and they can be moved/split/merged without
readers and aggregators being then implicitly told to retrieve back the
whole archives (if you change URIs, they'll think they missed entries...).
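
A reader's side of the second variant could look roughly like the sketch below. It is a minimal illustration under stated assumptions: `should_fetch_archive` and `last_poll` are hypothetical names, `fs:updated` is the extension attribute from the example above, timestamps are naive ISO 8601 strings, and the reader is assumed to have followed the prev chain on every earlier poll. Whether atom:updated alone is really enough remains the open question raised above.

```python
from datetime import datetime
from typing import Optional


def should_fetch_archive(archive_updated: str, last_poll: Optional[datetime]) -> bool:
    """Decide whether the prev-linked archive may hold entries we missed.

    archive_updated: value of the (hypothetical) fs:updated attribute.
    last_poll: when we last retrieved the live feed, or None if never.
    """
    if last_poll is None:
        # First contact with this feed: everything is new to us.
        return True
    # If the archive changed after our last poll, we may have missed entries.
    return datetime.fromisoformat(archive_updated) > last_poll
```

Because the decision depends on a timestamp rather than on the archive's URI, moving or merging archive documents would not by itself trigger a full re-download.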

-- 
Thomas Broyer




Re: More on Atom XML signatures and encryption

2005-06-30 Thread James M Snell


Paul Hoffman wrote:


At 12:47 PM -0700 6/29/05, James M Snell wrote:

1. After going through a bunch of potential XML encryption use cases, 
it really doesn't seem to make any sense at all to use XML Encryption 
below the document element level.  The I-D will not cover anything 
about encryption of Atom documents as there are really no special 
considerations that are specific to Atom.



Good.

2. The I-D will allow a KeyInfo element to be included as a child of the 
atom:feed, atom:entry and atom:source elements.  These will be used 
to identify the signing key. (e.g. the KeyInfo in the Signature can 
reference another KeyInfo contained elsewhere in the Feed).



This is OK from a security standpoint, but why have it? Why not always 
have the signature contain all the validating information?


You know, if you had asked me this when I wrote this requirement down in 
my notes three days ago I would have been able to give you the answer.  
The fact that I'm staring at my screen trying to recall what that answer 
is indicates that it's not a very good one ;-) ... You're right, there 
really is no need to separate the keyinfo from the signature in this 
situation.


3. When signing complete Atom documents (atom:feed and top level 
atom:entry), Inclusive Canonicalization with no pre-c14n 
normalization is required.



There seem to be many more interoperability issues with Inclusive 
Canonicalization than with Exclusive. What is your reasoning here?



Two reasons:
a. No need to re-envelope things at the document level
b. Ignorance on my part as to what all the interoperability issues are.  
Can you elaborate or point me to some relevant discussions?


4. The signature should cover the signing key. (e.g. if a x509 cert 
stored externally from the feed is used, the Signature should 
reference and cover that x509 cert).  Failing to do so opens up a 
security risk.



Please explain the security risk. I probably disagree with this 
requirement, but want to hear your risk analysis.


This is mostly tied to #2 above and comes from a lesson learned from 
WS-Security. Specifically section 13.2.4 of 
http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-soap-message-security-1.0.pdf


   Implementers should be aware of the possibility of a token
   substitution attack. In any situation where a digital signature is
   verified by reference to a token provided in the message, which
   specifies the key, it may be possible for an unscrupulous producer
   to later claim that a different token, containing the same key, but
   different information was intended.

If we don't verify-by-reference to a key contained elsewhere in the feed 
(or other location), this no longer becomes an issue.


5. When signing individual atom:entry elements within a feed, 
Exclusive Canonicalization MUST be used.  If a separate KeyInfo is 
used to identify the signing key, it MUST be contained as either a 
child of the entry or source elements.  A source element SHOULD be 
included in the entry.



Why is this different than #3?

These entries are subject to re-enveloping in a way that document level 
elements are not. It is possible to use ex-c14n throughout so that the 
behavior is consistent. The KeyInfo statement relates to #2 and thus 
becomes irrelevant.


6. If an entry contains any enclosure links, the digital signature 
SHOULD cover the referenced resources.  Enclosure links that are not 
covered are considered untrusted and pose a potential security risk



Fully disagree. We are signing the bits in the document, not the 
outside. There is no security risk, those items are simply unsigned.


I tend to consider enclosures to be part of the document, even if they 
are included by reference.  As a potential consumer of an enclosure I 
want to know whether or not the referenced enclosure can be trusted.  Is 
it acceptable to change the SHOULD to a MAY with a caveat outlining the 
security risk?


7. If an entry contains a content element that uses @src, the digital 
signature MUST cover the referenced resource.



Fully disagree.

Same as above.  Even though it is included-by-reference, the referenced 
content is still a part of the message. 

8. Aggregators and Intermediaries MUST NOT alter/augment the content 
of digitally signed entry elements.



Also disagree, but for a different reason. Aggregators and 
intermediaries should be free to diddle bits if they strip the 
signatures that they have broken.


Ok, my fault. I wasn't clear.  Reword to "Aggregators and Intermediaries 
MUST NOT alter/augment the content of digitally signed entry elements 
unless they strip the Signature from the entry".


9. In addition to serving as a message authenticator, the Signature 
may be used by implementations to assert that potentially 
untrustworthy content within a feed can be trusted (e.g. binary 
enclosures, scripts, etc)



How will you assert that?

Not so much a normative assertion.  More of a if you know who produced 
this feed/entry and 

Re: I-D ACTION:draft-nottingham-atompub-feed-history-00.txt

2005-06-30 Thread Mark Nottingham


Hi James,

On 29/06/2005, at 10:09 AM, James M Snell wrote:


1. This appears to be aimed at solving the same problem as Bob
Wyman's RFC3229+feed proposal
[http://bobwyman.pubsub.com/main/2004/09/using_rfc3229_w.html].  Do you
have any empirical data similar to what Bob provides @
http://bobwyman.pubsub.com/main/2004/10/massive_bandwid.html that would
indicate that your approach is a better solution to this problem?  These
are actually not mutually exclusive solutions, they're just different and
could be used for different scenarios -- e.g. Bob's tends to make a lot of
sense for blog dashboard feeds like what we use within IBM to show all
post and commenting activity within our internal blogs server while your
mechanism would work rather well for things like Top Ten lists, etc.  I
would just like to see a bit of a compare/contrast on the two approaches.


It's orthogonal to RFC3229. The problem I'm solving is how to  
reconstruct the *entire* state of the logical feed, not just one  
partial representation of it; although RFC3229 could be used to do  
that, it would require feed authors to post the entire content of  
their feed (potentially, many megabytes). This would incur a huge  
load, because any clients that don't support RFC3229 would have to  
GET the entire feed, leading to severe bandwidth problems.


To give a concrete example, Dave Winer would have to post one RSS  
file containing every entry he's made in Scripting News for the past  
10+ years to use RFC3229 to meet the same goal; with this proposal,  
he'd just have to add a 'prev' to each archived feed (assuming he has  
archives around, which if he doesn't, I imagine he could reconstruct).
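
Reconstructing the whole logical feed by walking 'prev' links could be sketched as below. This is an illustration only, not text from the draft: `reconstruct`, `fetch` and `max_hops` are hypothetical names, entries are reduced to their ids, and `fetch` stands in for whatever retrieves and parses a feed document. The visited-URI set is one possible guard against the circular references raised elsewhere in this thread.

```python
from typing import Callable, List, Optional, Set, Tuple

# fetch(uri) returns (entry ids in that document, URI of its prev link or None)
Fetch = Callable[[str], Tuple[List[str], Optional[str]]]


def reconstruct(live_uri: str, fetch: Fetch, max_hops: int = 1000) -> List[str]:
    """Collect every entry id reachable from the live feed, newest first."""
    seen_uris: Set[str] = set()
    seen_ids: Set[str] = set()
    entries: List[str] = []
    uri: Optional[str] = live_uri
    # Stop at the end of the chain, on a repeated URI (a loop), or after
    # max_hops documents, whichever comes first.
    while uri is not None and uri not in seen_uris and len(seen_uris) < max_hops:
        seen_uris.add(uri)
        page, prev = fetch(uri)
        for entry_id in page:
            if entry_id not in seen_ids:  # skip entries repeated across documents
                seen_ids.add(entry_id)
                entries.append(entry_id)
        uri = prev
    return entries
```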


2. Is the feed state mechanism a way of paging through the current  
contents of a collection or a snapshot-in-time view of a feed?   
That is...


   is it

   A) Collection has a bunch of entries.  Each feed representation has
      15 entries and the prev link acts like a paging mechanism similar
      to what we currently see used in search results.  Deleting the
      first ten entries out of the collection would cause all of the
      entries in the feed to shift backwards in the feeds

   B) Each prev link is representative of how the feed looked at a
      given point in time.  E.g. the feed as it would have appeared at
      a given hour of a given day

   If it's A, then Bob's RFC3229+feed solution seems much more  
efficient. (see #1)


   If it's B, then I'm wondering why you don't just use an ETag  
based approach, e.g.


  <fs:Stateful>1</fs:Stateful>
  <fs:prev>{ETag}</fs:prev>

   This would allow clients to only ever have to deal with a single  
URI for a feed and use conditional-gets with ETag to differentiate  
which snapshot of the feed they want to get and would likely make  
it easier to remediate potential recursive reference attacks, (e.g.  
feed A references feed B which references feed C which is a blind  
redirect to Feed A).


This proposal doesn't handle deletion or other aspects of identity in  
feeds; I tried to introduce language like that earlier in Atom  
itself, but we failed to gain consensus around it.


How does an ETag help you locate a previous feed to reconstruct  
state? Even if it could, I'm not sure about intermingling HTTP protocol  
details with application semantics; although there's nothing to  
prevent this theoretically, in many implementations, it might be  
problematic to predict what the ETag is.



3. Microsoft's RSS Lists spec uses <cf:treatAs /> to attach
behavioral semantics to a feed.  This proposal uses <fs:Stateful />
to attach behavioral semantics.  It would be nice if we could come
up with a relatively simple and standardizable way of attaching
behavioral semantics.  For example, a standardized <treatAs />
element:


   <atomex:treatAs>stateful</atomex:treatAs>

   The value of the treatAs element would be a list of tokens with
defined semantics.  Each token SHOULD be registered with IANA.
Unknown tokens would be ignored.  Incompatible tokens would be
ignored with "first-in-the-list takes precedence" semantics. For
example:


   <atomex:treatAs>stateful list</atomex:treatAs>

   Indicates that the feed should be treated as a list whose past  
states can be queried using the kind of mechanism you've defined.
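
The token-processing rules proposed here could be sketched as follows. This is a rough illustration rather than anything defined by a spec: `effective_tokens` is a hypothetical helper, and both the set of known tokens and the incompatibility pairs are supplied by the caller (in the proposal they would come from the registry).

```python
from typing import FrozenSet, Iterable, List, Set


def effective_tokens(
    value: str,
    known: Set[str],
    incompatible: Iterable[FrozenSet[str]],
) -> List[str]:
    """Return the tokens of a treatAs value that a consumer would honour."""
    conflicts = [frozenset(pair) for pair in incompatible]
    accepted: List[str] = []
    for token in value.split():
        # Unknown tokens are ignored; duplicates add nothing.
        if token not in known or token in accepted:
            continue
        # First-in-the-list takes precedence: drop a token that clashes
        # with one accepted earlier.
        clash = any(
            token in pair and any(a in pair for a in accepted)
            for pair in conflicts
        )
        if not clash:
            accepted.append(token)
    return accepted
```

For the value "stateful list", with both tokens known and compatible, the result is simply both tokens in order.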


That seems like an awfully heavyweight solution. What does defining  
the container and an IANA registry add?



--
Mark Nottingham http://www.mnot.net/



Re: I-D ACTION:draft-nottingham-atompub-feed-history-00.txt

2005-06-30 Thread James M Snell


Mark Nottingham wrote:


Hi James,

On 29/06/2005, at 10:09 AM, James M Snell wrote:



1. This appears to be aimed at solving the same problem as Bob
Wyman's RFC3229+feed proposal
[http://bobwyman.pubsub.com/main/2004/09/using_rfc3229_w.html].  Do you
have any empirical data similar to what Bob provides @
http://bobwyman.pubsub.com/main/2004/10/massive_bandwid.html that would
indicate that your approach is a better solution to this problem?  These
are actually not mutually exclusive solutions, they're just different and
could be used for different scenarios -- e.g. Bob's tends to make a lot of
sense for blog dashboard feeds like what we use within IBM to show all
post and commenting activity within our internal blogs server while your
mechanism would work rather well for things like Top Ten lists, etc.  I
would just like to see a bit of a compare/contrast on the two approaches.



It's orthogonal to RFC3229. The problem I'm solving is how to  
reconstruct the *entire* state of the logical feed, not just one  
partial representation of it; although RFC3229 could be used to do  
that, it would require feed authors to post the entire content of  
their feed (potentially, many megabytes). This would incur a huge  
load, because any clients that don't support RFC3229 would have to  
GET the entire feed, leading to severe bandwidth problems.


To give a concrete example, Dave Winer would have to post one RSS  
file containing every entry he's made in Scripting News for the past  
10+ years to use RFC3229 to meet the same goal; with this proposal,  
he'd just have to add a 'prev' to each archived feed (assuming he has  
archives around, which if he doesn't, I imagine he could reconstruct).


At times we do get spoiled by the ability to dynamically generate 
responses don't we ;-) You're obviously correct when it comes to 
statically generated content - RFC3229+feed does not provide a workable 
solution in that case.


2. Is the feed state mechanism a way of paging through the current  
contents of a collection or a snapshot-in-time view of a feed?   That 
is...


   is it

   A) Collection has a bunch of entries.  Each feed representation has
      15 entries and the prev link acts like a paging mechanism similar
      to what we currently see used in search results.  Deleting the
      first ten entries out of the collection would cause all of the
      entries in the feed to shift backwards in the feeds

   B) Each prev link is representative of how the feed looked at a
      given point in time.  E.g. the feed as it would have appeared at
      a given hour of a given day

   If it's A, then Bob's RFC3229+feed solution seems much more  
efficient. (see #1)


   If it's B, then I'm wondering why you don't just use an ETag  
based approach, e.g.


  <fs:Stateful>1</fs:Stateful>
  <fs:prev>{ETag}</fs:prev>

   This would allow clients to only ever have to deal with a single  
URI for a feed and use conditional-gets with ETag to differentiate  
which snapshot of the feed they want to get and would likely make  it 
easier to remediate potential recursive reference attacks, (e.g.  
feed A references feed B which references feed C which is a blind  
redirect to Feed A).



This proposal doesn't handle deletion or other aspects of identity in  
feeds; I tried to introduce language like that earlier in Atom  
itself, but we failed to gain consensus around it.


How does an ETag help you locate a previous feed to reconstruct  
state? Even if it could, I'm not sure about intermingling HTTP protocol  
details with application semantics; although there's nothing to  
prevent this theoretically, in many implementations, it might be  
problematic to predict what the ETag is.


It's not so much using ETag to reconstruct state as it is to 
access previous views of the feed.  Btw, I threw this out for 
discussion's sake and not because I think it's the right solution.  I'm 
not particularly in love with it myself.




3. Microsoft's RSS Lists spec uses <cf:treatAs /> to attach
behavioral semantics to a feed.  This proposal uses <fs:Stateful />
to attach behavioral semantics.  It would be nice if we could come
up with a relatively simple and standardizable way of attaching
behavioral semantics.  For example, a standardized <treatAs /> element:


   <atomex:treatAs>stateful</atomex:treatAs>

   The value of the treatAs element would be a list of tokens with
defined semantics.  Each token SHOULD be registered with IANA.
Unknown tokens would be ignored.  Incompatible tokens would be
ignored with "first-in-the-list takes precedence" semantics. For
example:


   <atomex:treatAs>stateful list</atomex:treatAs>

   Indicates that the feed should be treated as a list whose past  
states can be queried using the kind of mechanism you've defined.



That seems like an awfully heavyweight solution. What does defining  
the container and an IANA registry add?


The value 

Re: I-D ACTION:draft-nottingham-atompub-feed-history-00.txt

2005-06-30 Thread Mark Nottingham



On 30/06/2005, at 1:41 PM, James M Snell wrote:

The value is that I would really like to see a common and  
consistent way of attaching behavioral semantics to the feed rather  
than each individual vendor / spec defining their own app and impl  
specific methods.  It could be done without IANA support, of  
course, but it's just annoying to see relatively similar tasks done  
in completely different ways.


I totally agree that we should have neutral, non-vendor-specific  
semantics defined. I just don't see how having this container  
defined, along with the IANA registry, helps; if it was the intent of  
the WG to forbid all vendor-specific mechanisms, we should have  
disallowed all extensions except for those that are in an IANA  
registry (for example).


That's an extreme, of course, but it points out that Atom -- and RSS,  
for that matter -- is still in the period of its lifetime where  
vendors and individuals have to experiment to figure out what's  
valuable, and let the market sort out what becomes commonly deployed.  
It's not pretty, but it works pretty well in the long run.


Cheers,


--
Mark Nottingham http://www.mnot.net/



Re: More on Atom XML signatures and encryption

2005-06-30 Thread Paul Hoffman


At 3:16 PM -0600 6/30/05, Antone Roundy wrote:

On Thursday, June 30, 2005, at 12:58  PM, James M Snell wrote:
6. If an entry contains any enclosure links, the digital 
signature SHOULD cover the referenced resources.  Enclosure links 
that are not covered are considered untrusted and pose a 
potential security risk


Fully disagree. We are signing the bits in the document, not the 
outside. There is no security risk, those items are simply unsigned.


I tend to consider enclosures to be part of the document, even if 
they are included by reference.  As a potential consumer of an 
enclosure I want to know whether or not the referenced enclosure 
can be trusted.  Is it acceptable to change the SHOULD to a MAY with 
a caveat outlining the security risk?


Perhaps a good approach would be for the signed entry to contain a 
separate signature for the enclosure--so the entry's signature would 
cover the bits in the enclosure's signature, but not the bits in the 
enclosure itself.  That way, the signature for the entry could be 
verified without having to fetch the enclosure.


Where would that signature go?  Did we decide that link doesn't 
have to be empty?  If so, that might be a good place...but then I 
don't have any experience with signed XML, so I don't know whether 
there would be technical difficulties with putting it in any 
particular place.


This is possible. It translates to "I say that the bits gotten from 
here have a hash of value". If the hash doesn't match, you can't 
assume anything about the bits; if it does, the other semantic data 
in the message can apply to them ("...and it is a picture of me", 
"...and it is a program that will delete your data"...).
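
The check described here amounts to comparing a digest of the fetched bits with the value carried inside the signed entry. A minimal sketch, assuming SHA-256 and a hex-encoded digest (neither is specified in this thread), with `enclosure_matches` as a hypothetical helper name:

```python
import hashlib
import hmac


def enclosure_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if the fetched enclosure bytes hash to the signed value.

    A mismatch means nothing in the signed entry can be assumed to apply
    to these bytes.
    """
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_hex.lower())
```

With this arrangement the entry's own signature can be verified without fetching the enclosure; the digest is only checked if and when the enclosure is actually retrieved.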


--Paul Hoffman, Director
--Internet Mail Consortium



Re: More on Atom XML signatures and encryption

2005-06-30 Thread Paul Hoffman


At 11:58 AM -0700 6/30/05, James M Snell wrote:
3. When signing complete Atom documents (atom:feed and top level 
atom:entry), Inclusive Canonicalization with no pre-c14n 
normalization is required.



There seem to be many more interoperability issues with Inclusive 
Canonicalization than with Exclusive. What is your reasoning here?



Two reasons:
a. No need to re-envelope things at the document level


There is no reason to do that with Canonical XML.

b. Ignorance on my part as to what all the interoperability issues 
are.  Can you elaborate or point me to some relevant discussions?


The description of how to pull things down from the outside info is 
well-defined for Canonical XML, and Canonical XML is required for 
XMLDigSig, so folks have worked harder on it than Inclusive.


4. The signature should cover the signing key. (e.g. if a x509 
cert stored externally from the feed is used, the Signature should 
reference and cover that x509 cert).  Failing to do so opens up a 
security risk.



Please explain the security risk. I probably disagree with this 
requirement, but want to hear your risk analysis.


This is mostly tied to #2 above and comes from a lesson learned from 
WS-Security. Specifically section 13.2.4 of 
http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-soap-message-security-1.0.pdf


   Implementers should be aware of the possibility of a token
   substitution attack. In any situation where a digital signature is
   verified by reference to a token provided in the message, which
   specifies the key, it may be possible for an unscrupulous producer
   to later claim that a different token, containing the same key, but
   different information was intended.

If we don't verify-by-reference to a key contained elsewhere in the 
feed (or other location), this no longer becomes an issue.


We have no intention of doing HMACs, so I believe that this falls 
out. I have added words about that in a different message I just sent.


5. When signing individual atom:entry elements within a feed, 
Exclusive Canonicalization MUST be used.  If a separate KeyInfo is 
used to identify the signing key, it MUST be contained as either a 
child of the entry or source elements.  A source element SHOULD be 
included in the entry.



Why is this different than #3?

These entries are subject to re-enveloping in a way that document 
level elements are not. It is possible to use ex-c14n throughout so 
that the behavior is consistent. The KeyInfo statement relates to #2 
and thus becomes irrelevant.


Consistency will probably lead to more interoperability, particularly 
in an area as tricky as canonicalization.


6. If an entry contains any enclosure links, the digital 
signature SHOULD cover the referenced resources.  Enclosure links 
that are not covered are considered untrusted and pose a potential 
security risk



Fully disagree. We are signing the bits in the document, not the 
outside. There is no security risk, those items are simply unsigned.


I tend to consider enclosures to be part of the document, even if 
they are included by reference.  As a potential consumer of an 
enclosure I want to know whether or not the referenced enclosure can 
be trusted.  Is it acceptable to change the SHOULD to a MAY with a 
caveat outlining the security risk?


You have to define exactly what is covered by the signature. No 
SHOULDs, no MAYs. So you either have to define exactly how to bring 
in referenced data (and do you follow links in that data, and links 
in links...?), or you say it's just the bits you see here. As 
another example, how would you sign an entry that points to a page 
that is known to change all the time because it shows the current 
date? Or a hit counter?


There is no security risk if you state exactly what is signed. You 
should point out that the referenced material can change and is not 
covered by the signature.


7. If an entry contains a content element that uses @src, the 
digital signature MUST cover the referenced resource.



Fully disagree.

Same as above.  Even though it is included-by-reference, the 
referenced content is still a part of the message.


No, it isn't. The reference is part of the message.

8. Aggregators and Intermediaries MUST NOT alter/augment the 
content of digitally signed entry elements.



Also disagree, but for a different reason. Aggregators and 
intermediaries should be free to diddle bits if they strip the 
signatures that they have broken.


Ok, my fault. I wasn't clear.  Reword to "Aggregators and 
Intermediaries MUST NOT alter/augment the content of digitally 
signed entry elements unless they strip the Signature from the entry".


That works for me. You might also consider adding "and they are 
allowed to add their own signatures in the place of stripped 
signatures".
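
The reworded rule could be followed by an intermediary along these lines. This is a sketch under assumptions: the Signature is taken to be a direct child of the entry in the XML-DSig namespace, the Atom namespace URI shown is the one from the final format and may differ from the drafts current in this thread, and `strip_signatures` and `retitle` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"        # assumed Atom namespace
DSIG = "http://www.w3.org/2000/09/xmldsig#"  # XML-DSig namespace


def strip_signatures(entry: ET.Element) -> int:
    """Remove ds:Signature children of the entry; return how many were removed."""
    signatures = entry.findall(f"{{{DSIG}}}Signature")
    for sig in signatures:
        entry.remove(sig)
    return len(signatures)


def retitle(entry: ET.Element, new_title: str) -> None:
    """Example alteration: strip the signature first, then change the title."""
    strip_signatures(entry)  # the edit below would invalidate it anyway
    title = entry.find(f"{{{ATOM}}}title")
    if title is not None:
        title.text = new_title
```

An intermediary that wants to vouch for its altered version would add its own signature afterwards; that step is not shown.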


9. In addition to serving as a message authenticator, the 
Signature may be used by implementations to assert that 
potentially untrustworthy content within a feed can be trusted 

Re: More on Atom XML signatures and encryption

2005-06-30 Thread Bob Wyman


Paul Hoffman wrote:
Same as above.  Even though it is included-by-reference, the referenced 
content is still a part of the message.

No, it isn't. The reference is part of the message.

+1
   The signature should only cover the bits that are actually in the 
element (feed or entry) that is signed. Referenced data may be under 
different administrative control, may change independently of the signed 
element, etc.


   bob wyman




Re: More on Atom XML signatures and encryption

2005-06-30 Thread James M Snell


Ok, this is fine.  I'll back this out of the draft.

Bob Wyman wrote:


Paul Hoffman wrote:

Same as above.  Even though it is included-by-reference, the 
referenced content is still a part of the message.


No, it isn't. The reference is part of the message.


+1
   The signature should only cover the bits that are actually in the 
element (feed or entry) that is signed. Referenced data may be under 
different administrative control, may change independently of the 
signed element, etc.


   bob wyman