Re: I-D ACTION:draft-nottingham-atompub-feed-history-00.txt
Antone Roundy wrote:

Getting back to how to use static documents for a chain of instances, that could easily be done as follows. The following assumes that the current feed document and the archive documents will each contain 15 entries. The first 15 instances of the feed document do not contain a prev link (assuming one entry is added each time). When the 16th entry is added, a static document is created containing the first 15 entries, and a prev link pointing to it is added to the current feed document. This link remains unchanged until the 31st entry is added. When the 31st entry is added, another static document is created containing the 16th through 30th entries. It has a prev link pointing to the first static document. The current feed document's prev link is updated to point to the second static document, and it continues to point to the second static document until the 46th entry is added. When the 46th entry is added, a third static document is created containing the 31st through 45th entries, etc.

However, there should then be a "this" link in the live feed; otherwise I'll have to retrieve (as a reader/aggregator) the prev feed every 15 entries:

Say I retrieved the feed when it was 15 entries long. When the 16th entry is added and the first static document created, a prev link is added to the live feed, pointing to a document I never retrieved, so I guess I might have missed entries and retrieve it. I end up retrieving again the 15 entries I already know of. When the 31st entry is added, the feed's prev link is changed to reference the new 16th-to-30th archive feed. This is a URI I never dereferenced, so I guess I might have missed some entries, dereference the URI, and retrieve the archive feed. If I had retrieved the feed when it was 30 entries long, I end up retrieving again the 16th to 30th entries I already know of.
One could argue that I don't need to retrieve the archive feed, as the live feed already contains 14 entries (2nd to 15th, or 17th to 30th) I already retrieved, using atom:updated and atom:id to notice them. Well, nothing precludes an entry from being pushed to the front even if its atom:updated hasn't changed, so the entry following such a pushed-to-front entry could be one I never saw, and I might have missed it. And actually, this doesn't otherwise change the problem, which would still arise if I retrieve the live feed when, say, it was 15 entries long and again 15 entries later: I never saw the prev archive feed or any of the 15 entries in the live feed (so I can't conclude anything based on atom:id+atom:updated), I then retrieve the prev-linked archive feed and end up retrieving 15 entries I already know of, because it happens that I actually didn't miss any entry between my two live feed retrievals...

So we need a means to either identify the *next* prev link (a "this" or "permalink" link in the live feed (no need to have one in archive feeds, as already said on the list), which means it must be predictable), or something to tell us we didn't miss entries, such as the atom:updated of the prev-linked archive feed (is atom:updated enough?).

We'll end up with the live feed being either:

  <feed xmlns="..." xmlns:fs="...">
    <link rel="archive" href="http://example.com/2005/05/" />
    <!-- (I didn't use a link construct as the document does not yet exist) -->
    <fs:predicted-archive-uri>http://example.com/2005/06/</fs:predicted-archive-uri>
    ...
  </feed>

or

  <feed xmlns="..." xmlns:fs="...">
    <!-- I used an extension attribute, even if it's not clearly defined by the Atom Syndication Format -->
    <link rel="prev" href="http://example.com/2005/05/" fs:updated="2005-05-31T23:59:59" />
    ...
  </feed>

One advantage of the latter is that you don't rely on URIs as identifiers for the feed archive documents, and they can be moved/split/merged without readers and aggregators then being implicitly told to retrieve the whole archives again (if you change URIs, they'll think they missed entries...).

--
Thomas Broyer
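The archive layout quoted in this message, and the fs:updated check proposed in reply, can be sketched roughly as follows. This is only an illustration under stated assumptions: a page size of 15 with one entry added at a time (as in the quoted example), hypothetical function names (`archive_pages`, `needs_archive_fetch`), and a reader that records the time of its last complete synchronisation. The `fs:updated` attribute is just the idea floated above, not a defined extension.

```python
from datetime import datetime
from typing import List, Optional, Tuple

PAGE_SIZE = 15  # assumed, as in the quoted example


def archive_pages(entry_count: int, page_size: int = PAGE_SIZE) -> List[Tuple[int, int]]:
    """Return (first, last) entry numbers of each static archive document.

    An archive is only created once the entry *after* a full page is added,
    so 15 entries give no archive and 16 entries give one (entries 1..15).
    """
    full_pages = max(entry_count - 1, 0) // page_size
    return [(i * page_size + 1, (i + 1) * page_size) for i in range(full_pages)]


def needs_archive_fetch(prev_updated: str, last_sync: Optional[str]) -> bool:
    """Decide whether the prev-linked archive has to be retrieved.

    prev_updated: value of the hypothetical fs:updated attribute on the prev link.
    last_sync:    when this reader last had the complete feed state, or None.
    Assumption: if the archive has not changed since the last complete sync,
    everything in it was already seen while it was still in the live feed.
    """
    if last_sync is None:
        return True
    return datetime.fromisoformat(prev_updated) > datetime.fromisoformat(last_sync)
```

With this check, a reader that synchronised after the archive's fs:updated time can skip the archive even when the prev URI is new to it, which is the redundant retrieval the message is trying to avoid.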
Re: More on Atom XML signatures and encryption
Paul Hoffman wrote:

At 12:47 PM -0700 6/29/05, James M Snell wrote:

1. After going through a bunch of potential XML encryption use cases, it really doesn't seem to make any sense at all to use XML Encryption below the document element level. The I-D will not cover anything about encryption of Atom documents as there are really no special considerations that are specific to Atom.

Good.

2. The I-D will allow a KeyInfo element to be included as a child of the atom:feed, atom:entry and atom:source elements. These will be used to identify the signing key. (e.g. the KeyInfo in the Signature can reference another KeyInfo contained elsewhere in the Feed).

This is OK from a security standpoint, but why have it? Why not always have the signature contain all the validating information?

You know, if you had asked me this when I wrote this requirement down in my notes three days ago I would have been able to give you the answer. The fact that I'm staring at my screen trying to recall what that answer is indicates that it's not a very good one ;-) ... You're right, there really is no need to separate the keyinfo from the signature in this situation.

3. When signing complete Atom documents (atom:feed and top level atom:entry), Inclusive Canonicalization with no pre-c14n normalization is required.

There seem to be many more interoperability issues with Inclusive Canonicalization than with Exclusive. What is your reasoning here?

Two reasons:
a. No need to re-envelope things at the document level
b. Ignorance on my part as to what all the interoperability issues are. Can you elaborate or point me to some relevant discussions?

4. The signature should cover the signing key. (e.g. if a x509 cert stored externally from the feed is used, the Signature should reference and cover that x509 cert). Failing to do so opens up a security risk.

Please explain the security risk. I probably disagree with this requirement, but want to hear your risk analysis.
This is mostly tied to #2 above and comes from a lesson learned from WS-Security. Specifically section 13.2.4 of http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-soap-message-security-1.0.pdf

"Implementers should be aware of the possibility of a token substitution attack. In any situation where a digital signature is verified by reference to a token provided in the message, which specifies the key, it may be possible for an unscrupulous producer to later claim that a different token, containing the same key, but different information was intended."

If we don't verify-by-reference to a key contained elsewhere in the feed (or other location), this no longer becomes an issue.

5. When signing individual atom:entry elements within a feed, Exclusive Canonicalization MUST be used. If a separate KeyInfo is used to identify the signing key, it MUST be contained as either a child of the entry or source elements. A source element SHOULD be included in the entry.

Why is this different than #3?

These entries are subject to re-enveloping in a way that document level elements are not. It is possible to use ex-c14n throughout so that the behavior is consistent. The KeyInfo statement relates to #2 and thus becomes irrelevant.

6. If an entry contains any enclosure links, the digital signature SHOULD cover the referenced resources. Enclosure links that are not covered are considered untrusted and pose a potential security risk

Fully disagree. We are signing the bits in the document, not the outside. There is security risk, those items are simply unsigned.

I tend to consider enclosures to be part of the document, even if they are included by reference. As a potential consumer of an enclosure I want to know whether or not the referenced enclosure can be trusted. Is it acceptable to change the SHOULD to a MAY with a caveat outlining the security risk?

7. If an entry contains a content element that uses @src, the digital signature MUST cover the referenced resource.

Fully disagree.
Same as above. Even though it is included-by-reference, the referenced content is still a part of the message.

8. Aggregators and Intermediaries MUST NOT alter/augment the content of digitally signed entry elements.

Also disagree, but for a different reason. Aggregators and intermediaries should be free to diddle bits if they strip the signatures that they have broken.

Ok, my fault. I wasn't clear. Reword to: "Aggregators and Intermediaries MUST NOT alter/augment the content of digitally signed entry elements unless they strip the Signature from the entry"

9. In addition to serving as a message authenticator, the Signature may be used by implementations to assert that potentially untrustworthy content within a feed can be trusted (e.g. binary enclosures, scripts, etc)

How will you assert that?

Not so much a normative assertion. More of a if you know who produced this feed/entry and
Re: I-D ACTION:draft-nottingham-atompub-feed-history-00.txt
Hi James,

On 29/06/2005, at 10:09 AM, James M Snell wrote:

1. This appears to be aimed at solving the same problem as Bob Wyman's RFC3229+feed proposal [http://bobwyman.pubsub.com/main/2004/09/using_rfc3229_w.html]. Do you have any empirical data similar to what Bob provides @ http://bobwyman.pubsub.com/main/2004/10/massive_bandwid.html that would indicate that your approach is a better solution to this problem? These are actually not mutually exclusive solutions, they're just different and could be used for different scenarios -- e.g. Bob's tends to make a lot of sense for blog dashboard feeds like what we use within IBM to show all post and commenting activity within our internal blogs server, while your mechanism would work rather well for things like Top Ten lists, etc. I would just like to see a bit of a compare/contrast on the two approaches.

It's orthogonal to RFC3229. The problem I'm solving is how to reconstruct the *entire* state of the logical feed, not just one partial representation of it; although RFC3229 could be used to do that, it would require feed authors to post the entire content of their feed (potentially, many megabytes). This would incur a huge load, because any clients that don't support RFC3229 would have to GET the entire feed, leading to severe bandwidth problems. To give a concrete example, Dave Winer would have to post one RSS file containing every entry he's made in Scripting News for the past 10+ years to use RFC3229 to meet the same goal; with this proposal, he'd just have to add a 'prev' to each archived feed (assuming he has archives around, which if he doesn't, I imagine he could reconstruct).

2. Is the feed state mechanism a way of paging through the current contents of a collection or a snapshot-in-time view of a feed? That is... is it

A) Collection has a bunch of entries. Each feed representation has 15 entries and the prev link acts like a paging mechanism similar to what we currently see used in search results.
Deleting the first ten entries out of the collection would cause all of the entries in the feed to shift backwards in the feeds

B) Each prev link is representative of how the feed looked at a given point in time. E.g. the feed as it would have appeared at a given hour of a given day

If it's A, then Bob's RFC3229+feed solution seems much more efficient. (see #1) If it's B, then I'm wondering why you don't just use an ETag based approach, e.g.

  <fs:Stateful>1</fs:Stateful>
  <fs:prev>{ETag}</fs:prev>

This would allow clients to only ever have to deal with a single URI for a feed and use conditional-gets with ETag to differentiate which snapshot of the feed they want to get and would likely make it easier to remediate potential recursive reference attacks, (e.g. feed A references feed B which references feed C which is a blind redirect to Feed A).

This proposal doesn't handle deletion or other aspects of identity in feeds; I tried to introduce language like that earlier in Atom itself, but we failed to gain consensus around it. How does an ETag help you locate a previous feed to reconstruct state? Even if it could, I'm not sure about intermingling HTTP protocol details with application semantics; although there's nothing to prevent this theoretically, in many implementations, it might be problematic to predict what the ETag is.

3. Microsoft's RSS Lists spec uses <cf:treatAs /> to attach behavioral semantics to a feed. This proposal uses <fs:Stateful /> to attach behavioral semantics. It would be nice if we could come up with a relatively simple and standardizable way of attaching behavioral semantics. For example, a standardized <treatAs /> element:

  <atomex:treatAs>stateful</atomex:treatAs>

The value of the treatAs element would be a list of tokens with defined semantics. Each token SHOULD be registered with IANA. Unknown tokens would be ignored. Incompatible tokens would be ignored with first-in-the-list takes precedence semantics.
For example:

  <atomex:treatAs>stateful list</atomex:treatAs>

Indicates that the feed should be treated as a list whose past states can be queried using the kind of mechanism you've defined.

That seems like an awfully heavyweight solution. What does defining the container and an IANA registry add?

--
Mark Nottingham http://www.mnot.net/
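Reconstructing the entire logical feed by following prev links, while guarding against the recursive reference loop mentioned above (feed A to B to C and back to A), might look roughly like this. It is a sketch, not the draft's algorithm: the `fetch` callable, the (entry ids, prev URI) document model, and the `max_docs` limit are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Set, Tuple

# A fetched document is modelled as (entry ids newest-first, prev URI or None).
Document = Tuple[List[str], Optional[str]]


def reconstruct_feed(
    start_uri: str,
    fetch: Callable[[str], Document],
    max_docs: int = 1000,  # arbitrary safety limit, an assumption
) -> List[str]:
    """Walk the prev chain from the live feed and collect each entry id once."""
    seen_uris: Set[str] = set()
    seen_entries: Set[str] = set()
    entries: List[str] = []
    uri: Optional[str] = start_uri
    while uri is not None:
        if uri in seen_uris:
            # e.g. feed A -> feed B -> feed C -> redirect back to feed A
            raise ValueError(f"prev chain loops back to {uri}")
        if len(seen_uris) >= max_docs:
            raise ValueError("prev chain exceeds the configured limit")
        seen_uris.add(uri)
        doc_entries, prev = fetch(uri)
        for entry_id in doc_entries:
            if entry_id not in seen_entries:  # the same entry may appear in two documents
                seen_entries.add(entry_id)
                entries.append(entry_id)
        uri = prev
    return entries
```

A client that has already stored some archive documents could stop walking as soon as it reaches a URI it knows, rather than following the chain to its end.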
Re: I-D ACTION:draft-nottingham-atompub-feed-history-00.txt
Mark Nottingham wrote:

Hi James,

On 29/06/2005, at 10:09 AM, James M Snell wrote:

1. This appears to be aimed at solving the same problem as Bob Wyman's RFC3229+feed proposal [http://bobwyman.pubsub.com/main/2004/09/using_rfc3229_w.html]. Do you have any empirical data similar to what Bob provides @ http://bobwyman.pubsub.com/main/2004/10/massive_bandwid.html that would indicate that your approach is a better solution to this problem? These are actually not mutually exclusive solutions, they're just different and could be used for different scenarios -- e.g. Bob's tends to make a lot of sense for blog dashboard feeds like what we use within IBM to show all post and commenting activity within our internal blogs server, while your mechanism would work rather well for things like Top Ten lists, etc. I would just like to see a bit of a compare/contrast on the two approaches.

It's orthogonal to RFC3229. The problem I'm solving is how to reconstruct the *entire* state of the logical feed, not just one partial representation of it; although RFC3229 could be used to do that, it would require feed authors to post the entire content of their feed (potentially, many megabytes). This would incur a huge load, because any clients that don't support RFC3229 would have to GET the entire feed, leading to severe bandwidth problems. To give a concrete example, Dave Winer would have to post one RSS file containing every entry he's made in Scripting News for the past 10+ years to use RFC3229 to meet the same goal; with this proposal, he'd just have to add a 'prev' to each archived feed (assuming he has archives around, which if he doesn't, I imagine he could reconstruct).

At times we do get spoiled by the ability to dynamically generate responses, don't we ;-) You're obviously correct when it comes to statically generated content - RFC3229+feed does not provide a workable solution in that case.

2.
Is the feed state mechanism a way of paging through the current contents of a collection or a snapshot-in-time view of a feed? That is... is it

A) Collection has a bunch of entries. Each feed representation has 15 entries and the prev link acts like a paging mechanism similar to what we currently see used in search results. Deleting the first ten entries out of the collection would cause all of the entries in the feed to shift backwards in the feeds

B) Each prev link is representative of how the feed looked at a given point in time. E.g. the feed as it would have appeared at a given hour of a given day

If it's A, then Bob's RFC3229+feed solution seems much more efficient. (see #1) If it's B, then I'm wondering why you don't just use an ETag based approach, e.g.

  <fs:Stateful>1</fs:Stateful>
  <fs:prev>{ETag}</fs:prev>

This would allow clients to only ever have to deal with a single URI for a feed and use conditional-gets with ETag to differentiate which snapshot of the feed they want to get and would likely make it easier to remediate potential recursive reference attacks, (e.g. feed A references feed B which references feed C which is a blind redirect to Feed A).

This proposal doesn't handle deletion or other aspects of identity in feeds; I tried to introduce language like that earlier in Atom itself, but we failed to gain consensus around it. How does an ETag help you locate a previous feed to reconstruct state? Even if it could, I'm not sure about intermingling HTTP protocol details with application semantics; although there's nothing to prevent this theoretically, in many implementations, it might be problematic to predict what the ETag is.

It's not so much using ETag to reconstruct state as it is to access previous views of the feed. Btw, I threw this out for discussion's sake and not because I think it's the right solution. I'm not particularly in love with it myself.

3. Microsoft's RSS Lists spec uses <cf:treatAs /> to attach behavioral semantics to a feed.
This proposal uses <fs:Stateful /> to attach behavioral semantics. It would be nice if we could come up with a relatively simple and standardizable way of attaching behavioral semantics. For example, a standardized <treatAs /> element:

  <atomex:treatAs>stateful</atomex:treatAs>

The value of the treatAs element would be a list of tokens with defined semantics. Each token SHOULD be registered with IANA. Unknown tokens would be ignored. Incompatible tokens would be ignored with first-in-the-list takes precedence semantics. For example:

  <atomex:treatAs>stateful list</atomex:treatAs>

Indicates that the feed should be treated as a list whose past states can be queried using the kind of mechanism you've defined.

That seems like an awfully heavyweight solution. What does defining the container and an IANA registry add?

The value
Re: I-D ACTION:draft-nottingham-atompub-feed-history-00.txt
On 30/06/2005, at 1:41 PM, James M Snell wrote: The value is that I would really like to see a common and consistent way of attaching behavioral semantics to the feed rather than each individual vendor / spec defining their own app and impl specific methods. It could be done without IANA support, of course, but it's just annoying to see relatively similar tasks done in completely different ways. I totally agree that we should have neutral, non-vendor-specific semantics defined. I just don't see how having this container defined, along with the IANA registry, helps; if it was the intent of the WG to forbid all vendor-specific mechanisms, we should have disallowed all extensions except for those that are in an IANA registry (for example). That's an extreme, of course, but it points out that Atom -- and RSS, for that matter -- is still in the period of its lifetime where vendors and individuals have to experiment to figure out what's valuable, and let the market sort out what becomes commonly deployed. It's not pretty, but it works pretty well in the long run. Cheers, -- Mark Nottingham http://www.mnot.net/
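For reference, the treatAs token rules proposed earlier in this thread (a whitespace-separated token list, unknown tokens ignored, incompatible tokens resolved by first-in-the-list precedence) could be sketched as below. The token registry and the incompatibility table are invented for illustration; in particular "stateless" is a made-up token, since the thread only mentions "stateful" and "list".

```python
from typing import FrozenSet, List, Set

# Hypothetical registry; only "stateful" and "list" appear in the thread.
KNOWN_TOKENS: Set[str] = {"stateful", "list", "stateless"}
# Hypothetical incompatibility table.
INCOMPATIBLE: Set[FrozenSet[str]] = {frozenset({"stateful", "stateless"})}


def effective_tokens(value: str) -> List[str]:
    """Apply the proposed treatAs rules to the element's text value."""
    accepted: List[str] = []
    for token in value.split():
        if token not in KNOWN_TOKENS:
            continue  # unknown tokens are ignored
        if token in accepted:
            continue  # a repeated token adds nothing
        clashes = any(frozenset({token, earlier}) in INCOMPATIBLE for earlier in accepted)
        if clashes:
            continue  # an earlier token in the list takes precedence
        accepted.append(token)
    return accepted
```

The sketch needs a fixed table of known tokens and conflicts to work at all, which is roughly the role the proposed registry would play.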
Re: More on Atom XML signatures and encryption
At 3:16 PM -0600 6/30/05, Antone Roundy wrote: On Thursday, June 30, 2005, at 12:58 PM, James M Snell wrote: 6. If an entry contains any enclosure links, the digital signature SHOULD cover the referenced resources. Enclosure links that are not covered are considered untrusted and pose a potential security risk Fully disagree. We are signing the bits in the document, not the outside. There is security risk, those items are simply unsigned. I tend to consider enclosures to be part of the document, even if they are included by reference. As a potential consumer of an enclosure I want to know whether or not the referenced enclosure can be trusted. Is it accepted to change the SHOULD to a MAY with a caveat outlining the security risk? Perhaps a good approach would be for the signed entry to contain a separate signature for the enclosure--so the entry's signature would cover the bits in the enclosure's signature, but not the bits in the enclosure itself. That way, the signature for the entry could be verified without having to fetch the enclosure. Where would that signature go? Did we decide that link doesn't have to be empty? If so, that might be a good place...but then I don't have any experience with signed XML, so I don't know whether there would be technical difficulties with putting it in any particular place. This is possible. It translates to I say that the bits gotten from here have a hash of value. If the hash doesn't match, you can't assume anything about the bits; if it does, the other semantic data in the message can apply to them (...and it is a picture of me, ...and it is a program that will delete your data...). --Paul Hoffman, Director --Internet Mail Consortium
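The "hash of the bits" idea in this message can be illustrated with a short check. This is a sketch under assumptions: SHA-256 is chosen only for illustration (the thread does not name a digest algorithm), the function name is hypothetical, and how the digest is carried inside the signed entry is left open, as it is in the discussion.

```python
import hashlib


def enclosure_matches(enclosure_bytes: bytes, signed_digest_hex: str) -> bool:
    """Compare fetched enclosure bytes against a digest taken from the signed entry.

    The entry's signature covers the digest value, not the enclosure itself,
    so the entry can be verified without fetching the enclosure.
    """
    actual = hashlib.sha256(enclosure_bytes).hexdigest()
    return actual == signed_digest_hex.strip().lower()
```

If the digests differ, nothing can be assumed about the fetched bits; if they match, the other statements in the signed entry can be taken to apply to them.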
Re: More on Atom XML signatures and encryption
At 11:58 AM -0700 6/30/05, James M Snell wrote:

3. When signing complete Atom documents (atom:feed and top level atom:entry), Inclusive Canonicalization with no pre-c14n normalization is required.

There seem to be many more interoperability issues with Inclusive Canonicalization than with Exclusive. What is your reasoning here?

Two reasons:
a. No need to re-envelope things at the document level

There is no reason to do that with Canonical XML.

b. Ignorance on my part as to what all the interoperability issues are. Can you elaborate or point me to some relevant discussions?

The description of how to pull things down from the outside info is well-defined for Canonical XML, and Canonical XML is required for XMLDigSig, so folks have worked harder on it than Inclusive.

4. The signature should cover the signing key. (e.g. if a x509 cert stored externally from the feed is used, the Signature should reference and cover that x509 cert). Failing to do so opens up a security risk.

Please explain the security risk. I probably disagree with this requirement, but want to hear your risk analysis.

This is mostly tied to #2 above and comes from a lesson learned from WS-Security. Specifically section 13.2.4 of http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-soap-message-security-1.0.pdf

"Implementers should be aware of the possibility of a token substitution attack. In any situation where a digital signature is verified by reference to a token provided in the message, which specifies the key, it may be possible for an unscrupulous producer to later claim that a different token, containing the same key, but different information was intended."

If we don't verify-by-reference to a key contained elsewhere in the feed (or other location), this no longer becomes an issue.

We have no intention of doing HMACs, so I believe that this falls out. I have added words about that in a different message I just sent.

5.
When signing individual atom:entry elements within a feed, Exclusive Canonicalization MUST be used. If a separate KeyInfo is used to identify the signing key, it MUST be contained as either a child of the entry or source elements. A source element SHOULD be included in the entry.

Why is this different than #3?

These entries are subject to re-enveloping in a way that document level elements are not. It is possible to use ex-c14n throughout so that the behavior is consistent. The KeyInfo statement relates to #2 and thus becomes irrelevant.

Consistency will probably lead to more interoperability, particularly in an area as tricky as canonicalization.

6. If an entry contains any enclosure links, the digital signature SHOULD cover the referenced resources. Enclosure links that are not covered are considered untrusted and pose a potential security risk

Fully disagree. We are signing the bits in the document, not the outside. There is security risk, those items are simply unsigned.

I tend to consider enclosures to be part of the document, even if they are included by reference. As a potential consumer of an enclosure I want to know whether or not the referenced enclosure can be trusted. Is it acceptable to change the SHOULD to a MAY with a caveat outlining the security risk?

You have to define exactly what is covered by the signature. No SHOULDs, no MAYs. So you either have to define exactly how to bring in referenced data (and do you follow links in that data, and links in links...?), or you say it's just the bits you see here. As another example, how would you sign an entry that points to a page that is known to change all the time because it shows the current date? Or a hit counter?

There is no security risk if you state exactly what is signed. You should point out that the referenced material can change and is not covered by the signature.

7. If an entry contains a content element that uses @src, the digital signature MUST cover the referenced resource.

Fully disagree.
Same as above. Even though it is included-by-reference, the referenced content is still a part of the message.

No, it isn't. The reference is part of the message.

8. Aggregators and Intermediaries MUST NOT alter/augment the content of digitally signed entry elements.

Also disagree, but for a different reason. Aggregators and intermediaries should be free to diddle bits if they strip the signatures that they have broken.

Ok, my fault. I wasn't clear. Reword to: "Aggregators and Intermediaries MUST NOT alter/augment the content of digitally signed entry elements unless they strip the Signature from the entry"

That works for me. You might also consider adding "and they are allowed to add their own signatures in the place of stripped signatures".

9. In addition to serving as a message authenticator, the Signature may be used by implementations to assert that potentially untrustworthy content within a feed can be trusted
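The reworded rule in point 8 (an intermediary may alter a signed entry only if it strips the signature it has broken) can be sketched with the Python standard library. Assumptions: the signature is an enveloped ds:Signature that is a direct child of the entry, the Atom namespace shown is the one an implementation would use, and the helper names are hypothetical. No signature validation or re-signing is attempted here.

```python
import xml.etree.ElementTree as ET

DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"  # XML Signature namespace
ATOM_NS = "http://www.w3.org/2005/Atom"         # assumed Atom namespace


def strip_signatures(entry: ET.Element) -> int:
    """Remove ds:Signature children of the entry; return how many were removed."""
    signatures = entry.findall(f"{{{DSIG_NS}}}Signature")
    for signature in signatures:
        entry.remove(signature)
    return len(signatures)


def retitle_entry(entry: ET.Element, new_title: str) -> None:
    """Example alteration: strip the now-invalid signature, then change the title."""
    strip_signatures(entry)
    title = entry.find(f"{{{ATOM_NS}}}title")
    if title is not None:
        title.text = new_title
```

An intermediary that wants to vouch for its altered version would add its own signature afterwards, as suggested in the reply above; that step is outside this sketch.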
Re: More on Atom XML signatures and encryption
Paul Hoffman wrote: Same as above. Even though it is included-by-reference, the referenced content is still a part of the message. No, it isn't. The reference is part of the message. +1 The signature should only cover the bits that are actually in the element (feed or entry) that is signed. Referenced data may be under different administrative control, may change independently of the signed element, etc. bob wyman
Re: More on Atom XML signatures and encryption
Ok, this is fine. I'll back this out of the draft. Bob Wyman wrote: Paul Hoffman wrote: Same as above. Even though it is included-by-reference, the referenced content is still a part of the message. No, it isn't. The reference is part of the message. +1 The signature should only cover the bits that are actually in the element (feed or entry) that is signed. Referenced data may be under different administrative control, may change independently of the signed element, etc. bob wyman