Re: Structured Publishing -- Joe Reger shows the way...
On Sep 21, 2005, at 11:36 AM, Danny Ayers wrote: On 9/12/05, Bob Wyman <[EMAIL PROTECTED]> wrote: I believe it doesn't make sense for us to add data-carrying elements to Atom other than atom:content or atom:summary. Atom provides a definition of a collection of entries and it provides the entry format. Frankly, it should stop there. The data payload should be carried in the content element. I believe the ability to include data outside the content is likely to be useful, and may even be essential in some republishing scenarios where additional metadata about the payload is required. But that's not to say the transport-only view of Atom doesn't offer big advantages in the Structured Blogging kind of scenario where the data can be neatly packaged, relatively opaquely to the rest of the entry data. (Atom as SOAP lite?) I agree with Bob rather than Danny, except that I'd advocate making the metadata part of the XHTML content. Using Atom as a rich envelope in this way combines very well with the Microformat approach of retaining structure in XHTML. For the example of lists of information, given earlier, the XOXO microformat is ideal, as it can degrade gracefully for all viewers. Microformat aware viewers can extract the structure, HTML viewers can display it in a clear human readable form, and even plain-text viewers (assuming they have enough nous to strip stuff between <>) will have the core content. http://microformats.org http://microformats.org/wiki/xoxo
Re: FYI: Updated Index draft
On 14/09/2005, at 1:06 PM, David Powell wrote:

How will this interact with the sliding-window/feed-history interpretation of feeds? The natural order assigned by this extension seems incompatible with the date order implied by two feed documents polled over some period of time. What should be the order of a merged feed history such as this:

Poll 1: feed(e1, e2, e3)
Poll 2: feed(e3, e1, e5) - where, perhaps, 3 and 1 have been updated.

How do you combine entries sorted by their natural order, with the time-ordered feed history?

There'd need to be an algorithm described for combining the feed documents; e.g., see the _combine() method in http://www.mnot.net/rss/history/feed_history.py. In practice, most/all(?) popular aggregators do this now (feed history + natural order); the only change is that the algorithm would be documented and well-understood (which IMO would be a vast improvement, *if* we can agree on one... or more).

With the rank approach, you'd probably need to say that the ranks were valid within the scope of a single feed document, and then describe the relations between ranks in different feed documents. Not sure that's as interesting.

-- Mark Nottingham http://www.mnot.net/
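To make the merge question concrete, here is a minimal Python sketch of one possible combining rule. It is not the _combine() method from feed_history.py; the function name `combine` and the rule itself (the newest poll keeps its document order and goes first, older entries missing from that poll follow) are assumptions for illustration only.

```python
def combine(history, poll):
    """Merge a newly polled feed document into the stored history.

    Assumed rule (hypothetical, not taken from feed_history.py):
    entries of the newest poll keep their document order and come
    first; entries seen only in earlier polls follow in their
    previous order.
    """
    seen = set(poll)
    return list(poll) + [entry for entry in history if entry not in seen]


# The two polls from the question above.
history = []
for poll in (["e1", "e2", "e3"], ["e3", "e1", "e5"]):
    history = combine(history, poll)

print(history)
```

Under this rule e2, which dropped out of the second poll, ends up last. Other rules are just as defensible, which is why the algorithm would need to be written down.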
Re: FYI: Updated Index draft
On 14/09/2005, at 1:06 PM, David Powell wrote: I'm probably on my own, but I expected Atom's statement that "This specification assigns no significance to the order of atom:entry elements within the feed" was non-negotiable and couldn't be changed by extensions. This seems more like potential Atom 1.1 material to me - it doesn't seem to layer on top of the Atom framework so much as slightly rewrite part of it. Strictly read, this doesn't preclude other specifications / extensions from adding semantics to the ordering of entries -- it only says that *this* spec doesn't assign any meaning to it. That was the intent as I recall it. Eg - An Atom library or server that doesn't know about this extension is free to not preserve the entry order, and yet to retain the element, even though this will have corrupted the data. That is indeed a problem. Probably the easiest way to fix this would be in errata, by adding a statement like "Some feeds may implicitly or explicitly (through extensions) have meaning assigned to the ordering of entries, so intermediaries SHOULD NOT reorder them." I think that as implemented, this extension wouldn't be safe to deploy without must-understand extensions, which Atom 1.0 doesn't support. That would be another way to go, but people didn't want mU. Cheers, -- Mark Nottingham http://www.mnot.net/
Re: FYI: Updated Index draft
On Thursday, September 22, 2005, at 10:20 AM, James M Snell wrote:

Antone Roundy wrote:

I was thinking yesterday of suggesting that feed/id be used the way you're using i:domain. Which is better is probably a matter of whether ranking domains that span multiple feeds will be useful or not. In the movie ratings use case presented below, perhaps rather than a fivestar scheme and netflix and amazon domains, it might make more sense to do this:

Using atom:id as the ranking domain would limit the ranking to a single feed, which is useful, but does not cover the full range of cases.

...

Yes, there are two special cases here:
1. Lack of an i:domain
2. i:domain value that is a same document reference

I think a ranking without a domain is pretty much useless--or at least likely to lead to problems downstream--so that case doesn't need to be covered. More on that below.

...
Feed1 #
  A 50 20
  B 25 40
Feed2 #
  C 50 30
  D 25 10

In this example, the domainless rankings were added when the XHTML document was created, right? So the XHTML document is essentially an aggregate feed, just not in Atom format. Would it not make as much or more sense to mint an ID for the document (call it the ID of a "virtual Atom Feed Document" if you don't actually create an aggregate feed) and use it to scope those i:rank elements? If, somehow, someone were to pull the atom:feeds out of the XHTML document (if atom:feed getting embedded into xhtml:body is going to happen, then is not atom:feed getting extracted from xhtml:body also likely?) and aggregate them with other feeds with domainless i:rank elements, the scopes of those elements would get mixed.

* Since the urn:(netflix|amazon).com/reviews schemes are feed independent, it is not necessary to indicate a feed (or "domain") in this case.
* For a feed-specific scheme, like natural order, the feed ID would be included like this (so that if these entries were aggregated, it would be clear that the i:order elements were relevant to the source feed, not the aggregate feed): The goal of @scheme is to identify the type of ranking to apply while the goal of @domain is to identify the scope of the ranking. I do not believe that it is a good idea to conflate the two. Okay, I've come to agree with that while writing and editing this message. Note however that "fivestar" also indicates multiple things: 1) Higher numbers are "better" 2) The range is 0 to 5 (BTW, if this is limited to integers, how will you handle things like 3.5 stars, which are common in that type of rating system? Maybe decimal values need to be allowed.) 3) Hint: you might want to display the value as stars #1 is the only one needed for sorting of entries. #2 would be useful if the feed reader wanted to display some sort of graphical element to indicate the ranking. #3 might be slightly useful, but except for the most popular schemes, would probably be ignored. Perhaps all of these should be separated, a la: ... 3 ...where @domain is the feed/id of the feed if there's just one feed in scope, or a value that won't be duplicated by any feed/id otherwise (if one can mint a unique feed id, surely one can also mint a unique id that won't be used for a feed). I'd suggest that i:ranking-scheme/@domain either default to the containing feed/id (or the one from atom:source, if it exists) or be required, i:rank/@domain be required, @order default to ascending, @min-value default to 0, and the rest of the attributes be optional with no defaults.
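The defaults suggested above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `RankingScheme`, the function `sort_entries`, and the choice to drop values below @min-value are hypothetical. Only @domain, @order (default ascending) and @min-value (default 0) come from the message. Decimal is used so that values such as 3.5 stars survive.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class RankingScheme:
    """Hypothetical model of the proposed ranking-scheme attributes."""
    domain: str                       # scope of the ranking (required here)
    order: str = "ascending"          # proposed default
    min_value: Decimal = Decimal("0")  # proposed default


def sort_entries(ranks, scheme):
    """Return entry names ordered according to the scheme.

    Assumption: ranks below min_value are treated as invalid and dropped.
    """
    valid = {name: Decimal(value) for name, value in ranks.items()
             if Decimal(value) >= scheme.min_value}
    return sorted(valid, key=valid.get,
                  reverse=(scheme.order == "descending"))


scheme = RankingScheme(domain="urn:my_reviews", order="descending")
ranks = {"Movie A": "3.5", "Movie B": "2", "Movie C": "-1"}
print(sort_entries(ranks, scheme))
```

Only the order flag is needed for sorting; the range and display hints would matter to a reader that wants to draw stars.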
Re: FYI: Updated Index draft
Antone Roundy wrote:

On Wednesday, September 21, 2005, at 11:43 PM, James M Snell wrote:

{domain}

I was thinking yesterday of suggesting that feed/id be used the way you're using i:domain. Which is better is probably a matter of whether ranking domains that span multiple feeds will be useful or not. In the movie ratings use case presented below, perhaps rather than a fivestar scheme and netflix and amazon domains, it might make more sense to do this:

Using atom:id as the ranking domain would limit the ranking to a single feed, which is useful, but does not cover the full range of cases. Later on in your note, you say:

If sticking with i:domain, I'd recommend that you recommend that in cases where a ranking domain does not span multiple feeds, the feed/id value be used for the value of i:domain, and that in all cases, the same care be taken to (attempt to) ensure that i:domain's value is unique to what is intended to be a particular domain.

Yes, there are two special cases here:
1. Lack of an i:domain
2. i:domain value that is a same document reference

In the first case, I had imagined a "Default Ranking Domain" that is identified by the feed atom:id element, just as you suggest. In the second case, I had imagined a "Document Ranking Domain" that is identified by the document containing the feed. There is a subtle difference between these two. Consider the following (somewhat contrived) example:

...
Feed1 #
  A 50 20
  B 25 40
Feed2 #
  C 50 30
  D 25 10

The two embedded atom:feed elements specify two ranking domains: the Default Ranking Domain and a Document Ranking Domain. The Default Ranking Domain is scoped to the individual atom:feed and is identified by the value of the atom:id. The Document Ranking Domain is scoped to the containing document.
The Default Ranking Domain ranks may only be used to order the entries within the containing atom:feed:

sort_ascending ( Feed1 ) = B, A
sort_ascending ( Feed2 ) = D, C

The Document Ranking Domain ranks may be used to order all entries appearing within the document:

sort_ascending ( Document ) = D, A, C, B

In an Atom Feed Document, the Default Ranking Domain and the Document Ranking Domain happen to be identical.

urn:my_reviews descending descending
  Movie A 3 4
  Movie B 2 1

Notes:

* The i:order element tells the user agent whether higher or lower numbers are considered "better", "higher priority", "first", or whatever. In these cases, higher numbers are better, so would typically be shown first, so they're considered "descending" schemes.

Hmm.. I wanted to get away from doing this kind of thing.

* i:order/@label indicates a human readable label for the scheme, and could be optional.
* Since the urn:(netflix|amazon).com/reviews schemes are feed independent, it is not necessary to indicate a feed (or "domain") in this case.
* For a feed-specific scheme, like natural order, the feed ID would be included like this (so that if these entries were aggregated, it would be clear that the i:order elements were relevant to the source feed, not the aggregate feed):

The goal of @scheme is to identify the type of ranking to apply while the goal of @domain is to identify the scope of the ranking. I do not believe that it is a good idea to conflate the two.

- James
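The difference between the two ranking domains can be checked with a short Python sketch. It assumes that, in the example, the first number on each entry is its rank in the Default Ranking Domain and the second its rank in the Document Ranking Domain; that reading is inferred from the sort results quoted in the message, and the tuple layout and the `sort_ascending` helper are illustrative only.

```python
# (feed, entry, default-domain rank, document-domain rank)
# Assumption: column meaning inferred from the sort results in the message.
ENTRIES = [
    ("Feed1", "A", 50, 20),
    ("Feed1", "B", 25, 40),
    ("Feed2", "C", 50, 30),
    ("Feed2", "D", 25, 10),
]


def sort_ascending(scope):
    """Order entries within one feed, or across the whole document."""
    if scope == "Document":
        # Document Ranking Domain: spans every embedded feed.
        rows = sorted(ENTRIES, key=lambda row: row[3])
    else:
        # Default Ranking Domain: only entries of the named feed.
        rows = sorted((row for row in ENTRIES if row[0] == scope),
                      key=lambda row: row[2])
    return [row[1] for row in rows]


for scope in ("Feed1", "Feed2", "Document"):
    print(scope, sort_ascending(scope))
```

The default-domain ranks are never compared across feeds, which is the point of scoping them to the feed's atom:id.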
Re: FYI: Updated Index draft
James Holderness wrote: James M Snell wrote: This could all get rather complicated very quickly. My primary objective is to address known use cases for ordered feeds (my netflix queue feed[1] for example), most of which are structured as complete datasets that are non-incremental in nature. I realise that this sort of thing sounds like a good idea from a content provider's point of view, but as an aggregator developer, this is probably the last thing I would want to support. A feed that is not incremental is not a feed IMHO. There are just too many special case complications that an aggregator developer has to deal with that have nothing to do with regular, honest-to-goodness feeds. I do believe this falls under the Not-All-Feeds-Should-Be-Aggregated Category. That said, however, I think the concept of Feed-As-List is one that generally has a lot of support. 1. It helps us to scope the relevance of an i:rank element within an entry. For instance, if an entry with an i:rank in the urn:foo domain is aggregated into a synthetic feed that either a) does not specify a ranking domain or b) specifies a different ranking domain, consumers can safely ignore the urn:foo i:rank. This kind of makes sense, but I'm not convinced it's necessary. If the feed has various ranks on which it can be sorted, I'd rather leave the decision on which one to use to the user. If, for whatever reason, those alternate domains are no longer applicable and the feed absolutely has to force the use of a particular domain, wouldn't it make more sense to filter out all those unused ranks rather than making the user download them? i:domain is not used as a key of determining which rankings to use; it's a key that is used to correlate rankings. Regarding filtering, we should not rely on aggregators filtering out unused ranks. Consider the case of digitally signed entries; filtering out a rank covered by the digital signature would invalidate the signature. 2. 
It helps us to correlate ranks that span multiple feed documents. For instance, two separate feed documents may specify the same ranking domain.

This I like.

By the description given, it sounds as if the BBC ranking is more a ranking of relative importance than a ranking of natural order. That is, Top Story A has a higher importance than Top Story B, etc. If that is the case, a "priority" or "importance" ranking scheme can be used in conjunction with the atom:updated element.

This almost works. As an aggregator, what I would want to do is automatically sort with the date as the primary key and the priority as the secondary key. That way, today's high-priority items would appear at the top of the list, and yesterday's would follow on afterwards. Any of yesterday's items that were still of some importance today would need to have their atom:updated element set to today and their priority adjusted as appropriate. There are a couple of problems though. The atom:updated element has to be identical for all items on a particular day. Also the atom:updated element can't be changed when an actual update occurs (say a spelling correction, or an update on a story) without breaking the ordering. The problem is we're abusing the atom:updated element so as to use it for something that it's not.

The updated elements would not need to be identical. Aggregators can easily determine whether or not entries with different updated values occurred on the same day / same hour / etc. In other words, I could sort by Day+Priority, Hour+Priority, Minute+Priority, whatever, without any difficulty. There is no abuse of atom:updated here.

It would be better if we could add an extra attribute to your rank tag that specified what date the rank applied to. For someone like the BBC that reprioritizes feeds on a daily basis they'd set this attribute to something like say midnight for the date on which the ranking applies.
If you have an item from a previous day that is still important today, it would keep its original atom:updated value, but the rank-date would be set to today. I'll give this some thought, but my initial gut reaction is that it is not necessary. Let me see if I can convince myself otherwise ;-) Regards James Thanks for the input! - James
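The Day+Priority sort mentioned in the reply can be sketched in Python as below. The entry dictionaries, the `day_priority_key` helper, and the convention that a lower priority number means more important are assumptions for illustration; the only idea taken from the message is bucketing atom:updated by day before comparing priorities.

```python
from datetime import datetime


def day_priority_key(entry):
    """Sort key: newest day first, then priority within the day.

    Assumption: a lower priority number means a more important item.
    """
    day = entry["updated"].date()
    # Negating the ordinal puts the most recent day first.
    return (-day.toordinal(), entry["priority"])


entries = [
    {"id": "a", "updated": datetime(2005, 9, 21, 8, 0), "priority": 1},
    {"id": "b", "updated": datetime(2005, 9, 22, 9, 30), "priority": 2},
    {"id": "c", "updated": datetime(2005, 9, 22, 17, 45), "priority": 1},
]

ordered = [entry["id"] for entry in sorted(entries, key=day_priority_key)]
print(ordered)
```

Entries b and c carry different atom:updated values but land in the same day bucket, so their priorities decide the order. Swapping `.date()` for an hour or minute truncation would give the Hour+Priority or Minute+Priority variants.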
Re: FYI: Updated Index draft
James M Snell wrote: This could all get rather complicated very quickly. My primary objective is to address known use cases for ordered feeds (my netflix queue feed[1] for example), most of which are structured as complete datasets that are non-incremental in nature. I realise that this sort of thing sounds like a good idea from a content provider's point of view, but as an aggregator developer, this is probably the last thing I would want to support. A feed that is not incremental is not a feed IMHO. There are just too many special case complications that an aggregator developer has to deal with that have nothing to do with regular, honest-to-goodness feeds. Are you supposed to automatically delete old items? With or without the users' consent? Do you archive old items in some way? How do you handle the aggregation of items from multiple non-incremental feeds into a single feed? How do you handle the aggregation of items from multiple feeds some of which are incremental and some of which are complete datasets? How do you handle filtering that results in a subset of items from what is supposed to be a complete dataset? That said, I suspect I'm fighting a losing battle, and I do like this proposal as it applies to ranking of feeds in general. 1. It helps us to scope the relevance of an i:rank element within an entry. For instance, if an entry with an i:rank in the urn:foo domain is aggregated into a synthetic feed that either a) does not specify a ranking domain or b) specifies a different ranking domain, consumers can safely ignore the urn:foo i:rank. This kind of makes sense, but I'm not convinced it's necessary. If the feed has various ranks on which it can be sorted, I'd rather leave the decision on which one to use to the user. 
If, for whatever reason, those alternate domains are no longer applicable and the feed absolutely has to force the use of a particular domain, wouldn't it make more sense to filter out all those unused ranks rather than making the user download them?

2. It helps us to correlate ranks that span multiple feed documents. For instance, two separate feed documents may specify the same ranking domain.

This I like.

By the description given, it sounds as if the BBC ranking is more a ranking of relative importance than a ranking of natural order. That is, Top Story A has a higher importance than Top Story B, etc. If that is the case, a "priority" or "importance" ranking scheme can be used in conjunction with the atom:updated element.

This almost works. As an aggregator, what I would want to do is automatically sort with the date as the primary key and the priority as the secondary key. That way, today's high-priority items would appear at the top of the list, and yesterday's would follow on afterwards. Any of yesterday's items that were still of some importance today would need to have their atom:updated element set to today and their priority adjusted as appropriate. There are a couple of problems though. The atom:updated element has to be identical for all items on a particular day. Also the atom:updated element can't be changed when an actual update occurs (say a spelling correction, or an update on a story) without breaking the ordering. The problem is we're abusing the atom:updated element so as to use it for something that it's not.

It would be better if we could add an extra attribute to your rank tag that specified what date the rank applied to. For someone like the BBC that reprioritizes feeds on a daily basis they'd set this attribute to something like say midnight for the date on which the ranking applies. If you have an item from a previous day that is still important today, it would keep its original atom:updated value, but the rank-date would be set to today.
An aggregator supporting this extension could then sort on the rank-date as the primary key (descending) and the rank value itself as the secondary key. For feeds that don't change their priorities over time you can just leave this attribute out and the aggregator can sort on the rank value alone. I don't think it overly complicates the interface, but it does add significant value IMO. Regards James
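The rank-date proposal can be sketched in Python as follows. The `rank_date` key, the `rank_key` helper, the sample entries, and the convention that a lower rank value comes first are all assumptions for illustration; the message only specifies rank-date as the primary key (descending), the rank value as the secondary key, and a fallback to the rank value alone when the attribute is absent.

```python
from datetime import date


def rank_key(entry):
    """Sort key: rank-date descending, then rank value.

    Entries without a rank-date share one bucket (date.min), so feeds
    that never set the attribute are ordered by rank value alone.
    Assumption: a lower rank value is shown first.
    """
    rank_date = entry.get("rank_date") or date.min
    return (-rank_date.toordinal(), entry["rank"])


entries = [
    # Published earlier, still important today: rank-date moved forward,
    # atom:updated left untouched.
    {"id": "old-but-important", "updated": date(2005, 9, 20),
     "rank_date": date(2005, 9, 22), "rank": 2},
    {"id": "today-top", "updated": date(2005, 9, 22),
     "rank_date": date(2005, 9, 22), "rank": 1},
    {"id": "yesterday", "updated": date(2005, 9, 21),
     "rank_date": date(2005, 9, 21), "rank": 1},
]

ordered = [entry["id"] for entry in sorted(entries, key=rank_key)]
print(ordered)
```

Because atom:updated plays no part in the key, a later spelling correction would not disturb the ordering.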
Re: FYI: Updated Index draft
On Wednesday, September 21, 2005, at 11:43 PM, James M Snell wrote:

{domain}

I was thinking yesterday of suggesting that feed/id be used the way you're using i:domain. Which is better is probably a matter of whether ranking domains that span multiple feeds will be useful or not. In the movie ratings use case presented below, perhaps rather than a fivestar scheme and netflix and amazon domains, it might make more sense to do this:

urn:my_reviews descending descending
  Movie A 3 4
  Movie B 2 1

Notes:

* The i:order element tells the user agent whether higher or lower numbers are considered "better", "higher priority", "first", or whatever. In these cases, higher numbers are better, so would typically be shown first, so they're considered "descending" schemes.
* i:order/@label indicates a human readable label for the scheme, and could be optional.
* Since the urn:(netflix|amazon).com/reviews schemes are feed independent, it is not necessary to indicate a feed (or "domain") in this case.
* For a feed-specific scheme, like natural order, the feed ID would be included like this (so that if these entries were aggregated, it would be clear that the i:order elements were relevant to the source feed, not the aggregate feed):

urn:my_feed ascending
  urn:my_feed/a 1
  urn:my_feed/b 2

If sticking with i:domain, I'd recommend that you recommend that in cases where a ranking domain does not span multiple feeds, the feed/id value be used for the value of i:domain, and that in all cases, the same care be taken to (attempt to) ensure that i:domain's value is unique to what is intended to be a particular domain.