Re: FYI: Updated Index draft
I'm sorry for the long delay in replying, but I've been swamped with work lately.

James M Snell wrote:

>> 1. It helps us to scope the relevance of an i:rank element within an entry. For instance, if an entry with an i:rank in the urn:foo domain is aggregated into a synthetic feed that either a) does not specify a ranking domain or b) specifies a different ranking domain, consumers can safely ignore the urn:foo i:rank.

> i:domain is not used as a key of determining which rankings to use; it's a key that is used to correlate rankings.

The correlating part I get - I think that's a great idea. It's part 1 (quoted above) that I don't understand. You explicitly say a consumer can safely ignore the rank based on what domain it sees in a feed. How is that not a determination of which rankings to use?

> Regarding filtering, we should not rely on aggregators filtering out unused ranks. Consider the case of digitally signed entries; filtering out a rank covered by the digital signature would invalidate the signature.

I haven't been following the digital signature proposals, but I would have expected they would attempt to sign a particular content entry rather than trying to include all the metadata that went with it. Is it really safe to assume that an item aggregated into a synthetic feed or one that has passed through a caching/forwarding system will not have had metadata tags added, removed or reordered? I guess this is off-topic though, and I see your point.

>> There are a couple of problems though. The atom:updated element has to be identical for all items on a particular day. Also the atom:updated element can't be changed when an actual update occurs (say a spelling correction, or an update on a story) without breaking the ordering. The problem is we're abusing the atom:updated element so as to use it for something that it's not.

> The updated elements would not need to be identical. Aggregators can easily determine whether or not entries with different updated values occurred on the same day / same hour / etc. In other words, I could sort by Day+Priority, Hour+Priority, Minute+Priority, whatever, without any difficulty. There is no abuse of atom:updated here.

The problem is knowing what to sort on. Unless you provide that information somewhere in the feed there's no way for the aggregator to perform the sort automatically. Say BBC updates its ranks every 6 hours, how is the aggregator to know that it should sort by halfday+priority? Or maybe it updates at 8am and 2pm every day - how would an aggregator deal with that?

With my proposal they would set rank-date to 8am (of the current day) for every item updated between 8am and 2pm. And items updated from 2pm to 8am the following day would have a rank-date of 2pm. The aggregator sorts on rankdate+priority and it all just lines up automatically. No guessing required.

> I'll give this some thought, but my initial gut reaction is that it is not necessary. Let me see if I can convince myself otherwise ;-)

You could be right. It's not exactly a critical need. I just thought it wouldn't harm at least having it there as an option for those people that might want that level of control.

Regards
James
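As a rough illustration of the 8am/2pm case, a publisher could derive the rank-date from an item's update time along these lines. This is only a sketch: the function name rank_date and the boundaries parameter are invented here, and the draft does not define a rank-date attribute.

```python
from datetime import datetime, time, timedelta

def rank_date(updated, boundaries=(time(8, 0), time(14, 0))):
    """Return the start of the ranking window that contains `updated`.

    `boundaries` holds the (hypothetical) times of day at which the
    publisher re-ranks its feed; 8am and 2pm mirror the example above.
    """
    day = updated.date()
    # Window starts on the same day that are at or before the update time.
    starts = [datetime.combine(day, b) for b in boundaries]
    earlier = [s for s in starts if s <= updated]
    if earlier:
        return max(earlier)
    # Before the first boundary of the day: still in yesterday's last window.
    return datetime.combine(day - timedelta(days=1), max(boundaries))

morning = rank_date(datetime(2005, 9, 22, 9, 30))    # 8am window, same day
overnight = rank_date(datetime(2005, 9, 22, 3, 0))   # 2pm window, previous day
```

An aggregator would never need to run this itself; it would only read the resulting value and sort on it.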
Re: FYI: Updated Index draft
Antone Roundy wrote: I think a ranking without a domain is pretty much useless--or at least likely to lead to problems downstream--so that case doesn't need to be covered. More on that below. Agreed. ... Feed1 # A 50 20 B 25 40 Feed2 # C 50 30 D 25 10 In this example, the domainless rankings were added when the XHTML document was created, right? So the XHTML document is essentially an aggregate feed, just not in Atom format. Would it not make as much or more sense to mint an ID for the document (call it the ID of a "virtual Atom Feed Document" if you don't actually create an aggregate feed) and use it to scope those i:rank elements? If, somehow, someone were to pull the atom:feeds out of the XHTML document (if atom:feed getting embedded into xhtml:body is going to happen, then is not atom:feed getting extracted from xhtml:body also likely?) and aggregate them with other feeds with domainless i:rank elements, the scopes of those elements would get mixed. Yes, but we cannot reliably dictate that containing documents must contain atom:id elements simply because we have no control over the definitions of those containing documents. And yes, if someone pulls the feed out of the XHTML and uses it somewhere else, any ranks in the document scope will be affected. I do not believe that this is a deal-breaker, however.. it's just something that folks using the ranking mechanism need to be aware of so that they can make the appropriate decisions about how and when to properly use the document ranking domain versus a domain that explicitly scoped to a given ID. * Since the urn:(netflix|amazon).com/reviews schemes are feed independent, it is not necessary to indicate a feed (or "domain") in this case. 
* For a feed-specific scheme, like natural order, the feed ID would be included like this (so that if these entries were aggregated, it would be clear that the i:order elements were relevant to the source feed, not the aggregate feed): The goal of @scheme is to identify the type of ranking to apply while the goal of @domain is to identify the scope of the ranking. I do not believe that it is a good idea to conflate the two. Okay, I've come to agree with that while writing and editing this message. Note however that "fivestar" also indicates multiple things: 1) Higher numbers are "better" 2) The range is 0 to 5 (BTW, if this is limited to integers, how will you handle things like 3.5 stars, which are common in that type of rating system? Maybe decimal values need to be allowed.) 3) Hint: you might want to display the value as stars #1 is the only one needed for sorting of entries. #2 would be useful if the feed reader wanted to display some sort of graphical element to indicate the ranking. #3 might be slightly useful, but except for the most popular schemes, would probably be ignored. Perhaps all of these should be separated, a la: Minutes before I received this note I had a similar thought that a scheme definition could be useful -- although that get's us quite close to the territory of the RSS simple list extensions (not that it is a bad thing). The symbol attribute is a bit strange. I'd rather let the application determine how it wants to display the rank. The label, order, min and max values and domain attribute are fine. And yes, regarding #2, allowing decimal values would likely be a good idea... doing so would also allow us to do ratings that are based on a 0-1 fractional scheme (e.g. percentages, etc). Negative values should also be allowed. ... 
3 ...where @domain is the feed/id of the feed if there's just one feed in scope, or a value that won't be duplicated by any feed/id otherwise (if one can mint a unique feed id, surely one can also mint a unique id that won't be used for a feed). I'd suggest that i:ranking-scheme/@domain either default to the containing feed/id (or the one from atom:source, if it exists) or be required, i:rank/@domain be required, @order default to ascending, @min-value default to 0, and the rest of the attributes be optional with no defaults. I'm liking these suggestions... The i:ranking-scheme element would appear within the atom:feed. If the @domain attribute is missing, the domain is automatically mapped to the id of the feed. If the @domain attribute is a same document reference, the domain is mapped to the document scope. http://www.example.com"; xml:base="http://www.example.com"; /> The meaning of the @order attribute needs to be clearly articulated. It is NOT an indicator of how applications should display the elements rather an indicator of how to interpret the rank values (e.g. highest number is most significant, lowest
Re: FYI: Updated Index draft
On 14/09/2005, at 1:06 PM, David Powell wrote:

> How will this interact with the sliding-window/feed-history interpretation of feeds? The natural order assigned by this extension seems incompatible with the date order that would be implied by two feed documents, polled over some period of time. What should be the order of a merged feed history such as this:
>
> Poll 1: feed(e1, e2, e3)
> Poll 2: feed(e3, e1, e5) - where, perhaps, 3 and 1 have been updated.
>
> How do you combine entries sorted by their natural order, with the time-ordered feed history?

There'd need to be an algorithm described for combining the feed documents; e.g., see the _combine() method in http://www.mnot.net/rss/history/feed_history.py. In practice, most/all(?) popular aggregators do this now (feed history + natural order); the only change is that the algorithm would be documented and well-understood (which IMO would be a vast improvement, *if* we can agree on one... or more).

With the rank approach, you'd probably need to say that the ranks were valid within the scope of a single feed document, and then describe the relations between ranks in different feed documents. Not sure that's as interesting.

--
Mark Nottingham http://www.mnot.net/
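To make the question concrete, here is one possible combining rule, sketched in Python. It is not the _combine() method from feed_history.py and not an agreed algorithm; it simply assumes the newest poll's document order wins and that older entries missing from it are appended afterwards. The function name combine is made up for this sketch.

```python
def combine(polls):
    """Merge feed documents (oldest poll first) into one ordered list.

    Assumed rule, for illustration only: entries keep the position they
    have in the most recent poll that contains them, and entries seen
    only in older polls follow in their original order.
    """
    merged = []
    seen = set()
    # Walk from the newest poll back to the oldest.
    for entries in reversed(polls):
        for entry_id in entries:
            if entry_id not in seen:
                seen.add(entry_id)
                merged.append(entry_id)
    return merged

history = combine([["e1", "e2", "e3"], ["e3", "e1", "e5"]])
print(history)  # ['e3', 'e1', 'e5', 'e2']
```

Other rules are just as defensible, which is why the algorithm would need to be written down somewhere.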
Re: FYI: Updated Index draft
On 14/09/2005, at 1:06 PM, David Powell wrote:

> I'm probably on my own, but I expected Atom's statement that "This specification assigns no significance to the order of atom:entry elements within the feed" was non-negotiable and couldn't be changed by extensions. This seems more like potential Atom 1.1 material to me - it doesn't seem to layer on top of the Atom framework so much as slightly rewrite part of it.

Strictly read, this doesn't preclude other specifications / extensions from adding semantics to the ordering of entries -- it only says that *this* spec doesn't assign any meaning to it. That was the intent as I recall it.

> Eg - An Atom library or server that doesn't know about this extension is free to not preserve the entry order, and yet to retain the element, even though this will have corrupted the data.

That is indeed a problem. Probably the easiest way to fix this would be in errata, by adding a statement like "Some feeds may implicitly or explicitly (through extensions) have meaning assigned to the ordering of entries, so intermediaries SHOULD NOT reorder them."

> I think that as implemented, this extension wouldn't be safe to deploy without must-understand extensions, which Atom 1.0 doesn't support.

That would be another way to go, but people didn't want mU.

Cheers,

--
Mark Nottingham http://www.mnot.net/
Re: FYI: Updated Index draft
On Thursday, September 22, 2005, at 10:20 AM, James M Snell wrote: Antone Roundy wrote: I was thinking yesterday of suggesting that feed/id be used the way you're using i:domain. Which is better is probably a matter of whether ranking domains that span multiple feeds will be useful or not. In the movie ratings use case presented below, perhaps rather than a fivestarts scheme and netflix and amazon domains, it might make more sense to do this: Using atom:id as the ranking domain would limit the ranking to a single feed which is useful, but does not cover the full range of cases. ... Yes, there are two special cases here: 1. Lack of a i:domain 2. i:domain value that is a same document reference I think a ranking without a domain is pretty much useless--or at least likely to lead to problems downstream--so that case doesn't need to be covered. More on that below. ... Feed1 # A 50 20 B 25 40 Feed2 # C 50 30 D 25 10 In this example, the domainless rankings were added when the XHTML document was created, right? So the XHTML document is essentially an aggregate feed, just not in Atom format. Would it not make as much or more sense to mint an ID for the document (call it the ID of a "virtual Atom Feed Document" if you don't actually create an aggregate feed) and use it to scope those i:rank elements? If, somehow, someone were to pull the atom:feeds out of the XHTML document (if atom:feed getting embedded into xhtml:body is going to happen, then is not atom:feed getting extracted from xhtml:body also likely?) and aggregate them with other feeds with domainless i:rank elements, the scopes of those elements would get mixed. * Since the urn:(netflix|amazon).com/reviews schemes are feed independent, it is not necessary to indicate a feed (or "domain") in this case. 
* For a feed-specific scheme, like natural order, the feed ID would be included like this (so that if these entries were aggregated, it would be clear that the i:order elements were relevant to the source feed, not the aggregate feed): The goal of @scheme is to identify the type of ranking to apply while the goal of @domain is to identify the scope of the ranking. I do not believe that it is a good idea to conflate the two. Okay, I've come to agree with that while writing and editing this message. Note however that "fivestar" also indicates multiple things: 1) Higher numbers are "better" 2) The range is 0 to 5 (BTW, if this is limited to integers, how will you handle things like 3.5 stars, which are common in that type of rating system? Maybe decimal values need to be allowed.) 3) Hint: you might want to display the value as stars #1 is the only one needed for sorting of entries. #2 would be useful if the feed reader wanted to display some sort of graphical element to indicate the ranking. #3 might be slightly useful, but except for the most popular schemes, would probably be ignored. Perhaps all of these should be separated, a la: ... 3 ...where @domain is the feed/id of the feed if there's just one feed in scope, or a value that won't be duplicated by any feed/id otherwise (if one can mint a unique feed id, surely one can also mint a unique id that won't be used for a feed). I'd suggest that i:ranking-scheme/@domain either default to the containing feed/id (or the one from atom:source, if it exists) or be required, i:rank/@domain be required, @order default to ascending, @min-value default to 0, and the rest of the attributes be optional with no defaults.
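The defaulting rules suggested above could be sketched as follows. This takes the "default to the containing feed/id" branch rather than the "be required" branch. The attribute names order and min-value come from the message; max-value and label, the class name RankingScheme and the helper resolve_scheme are assumptions made for this sketch, since the original markup did not survive.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RankingScheme:
    domain: str
    order: str = "ascending"
    min_value: float = 0.0
    max_value: Optional[float] = None   # optional, no default
    label: Optional[str] = None         # optional, no default

def resolve_scheme(attrs, feed_id, source_id=None):
    """Apply the suggested defaults to an i:ranking-scheme's attributes.

    `attrs` is a plain dict of attribute name to string value.
    """
    # Explicit @domain wins; otherwise atom:source's id, then the feed's id.
    domain = attrs.get("domain") or source_id or feed_id
    return RankingScheme(
        domain=domain,
        order=attrs.get("order", "ascending"),
        min_value=float(attrs.get("min-value", 0)),
        max_value=float(attrs["max-value"]) if "max-value" in attrs else None,
        label=attrs.get("label"),
    )

scheme = resolve_scheme({"order": "descending", "max-value": "5"}, "urn:my_feed")
```

Allowing float values here follows the 3.5-stars remark; an integer-only scheme would parse with int instead.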
Re: FYI: Updated Index draft
Antone Roundy wrote: On Wednesday, September 21, 2005, at 11:43 PM, James M Snell wrote: {domain} I was thinking yesterday of suggesting that feed/id be used the way you're using i:domain. Which is better is probably a matter of whether ranking domains that span multiple feeds will be useful or not. In the movie ratings use case presented below, perhaps rather than a fivestarts scheme and netflix and amazon domains, it might make more sense to do this: Using atom:id as the ranking domain would limit the ranking to a single feed which is useful, but does not cover the full range of cases. Later on in your note, you say: If sticking with i:domain, I'd recommend that you recommend that in cases where a ranking domain does not span multiple feeds, the feed/id value be used for the value of i:domain, and that in all cases, the same care be taken to (attempt to) ensure that i:domain's value is unique to what is intended to be a particular domain. Yes, there are two special cases here: 1. Lack of a i:domain 2. i:domain value that is a same document reference In the first case, I had imagined a "Default Ranking Domain" that is identified by the feed atom:id element, just as you suggest. In the second case, I had imagined a "Document Ranking Domain" that is identified by the document containing the feed. There is a subtle difference between these two. Consider the following (somewhat contrived) example: ... Feed1 # A 50 20 B 25 40 Feed2 # C 50 30 D 25 10 The two embedded atom:feed elements specify two ranking domains: The Default Ranking Domain and a Document Ranking Domain. The Default Ranking Domain is scoped to the individual atom:feed as is identified by the value of the atom:id. the Document Ranking Domain is scoped to the containing document. 
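The two domains in this example can be sketched in a few lines of Python. The markup of the example was lost, so reading the first number of each entry as its Default Ranking Domain rank and the second as its Document Ranking Domain rank is an assumption, as are the dictionary layout and the helper name.

```python
# Entry name -> rank in each domain (assumed reading of "A 50 20" etc.).
feeds = {
    "Feed1": {"A": {"default": 50, "document": 20},
              "B": {"default": 25, "document": 40}},
    "Feed2": {"C": {"default": 50, "document": 30},
              "D": {"default": 25, "document": 10}},
}

def sort_ascending(entries, domain):
    """Order entry names by their rank in the given domain, lowest first."""
    return sorted(entries, key=lambda name: entries[name][domain])

# Default-domain ranks only order entries inside their own feed.
feed1_order = sort_ascending(feeds["Feed1"], "default")
feed2_order = sort_ascending(feeds["Feed2"], "default")

# Document-domain ranks order every entry in the containing document.
all_entries = {name: ranks
               for feed in feeds.values()
               for name, ranks in feed.items()}
document_order = sort_ascending(all_entries, "document")

print(feed1_order, feed2_order, document_order)
```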
The Default Ranking Domain ranks may only be used to order the entries within the containing atom:feed: sort_ascending ( Feed1 ) = B, A sort_ascending ( Feed2 ) = D, C The Document Ranking Domain ranks may be used to order all entries appearing within the document sort_ascending ( Document ) = D, A, C, B In an Atom Feed Document, the Default Ranking Domain and the Document Ranking Domain happen to be identical. urn:my_reviews descending descending Movie A 3 4 Movie B 2 1 Notes: * The i:order element tells the user agent whether higher or lower numbers are considered "better", "higher priority", "first", or whatever. In these cases, higher numbers are better, so would typicially be shown first, so they're considered a "descending" schemes. Hmm.. I wanted to get away from doing this kind of thing. * i:order/@label indicates a human readable label for the scheme, and could be optional. * Since the urn:(netflix|amazon).com/reviews schemes are feed independent, it is not necessary to indicate a feed (or "domain") in this case. * For a feed-specific scheme, like natural order, the feed ID would be included like this (so that if these entries were aggregated, it would be clear that the i:order elements were relevant to the source feed, not the aggregate feed): The goal of @scheme is to identify the type of ranking to apply while the goal of @domain is to identify the scope of the ranking. I do not believe that it is a good idea to conflate the two. - James
Re: FYI: Updated Index draft
James Holderness wrote:

> James M Snell wrote:
>> This could all get rather complicated very quickly. My primary objective is to address known use cases for ordered feeds (my netflix queue feed[1] for example), most of which are structured as complete datasets that are non-incremental in nature.
>
> I realise that this sort of thing sounds like a good idea from a content provider's point of view, but as an aggregator developer, this is probably the last thing I would want to support. A feed that is not incremental is not a feed IMHO. There are just too many special case complications that an aggregator developer has to deal with that have nothing to do with regular, honest-to-goodness feeds.

I do believe this falls under the Not-All-Feeds-Should-Be-Aggregated Category. That said, however, I think the concept of Feed-As-List is one that generally has a lot of support.

>> 1. It helps us to scope the relevance of an i:rank element within an entry. For instance, if an entry with an i:rank in the urn:foo domain is aggregated into a synthetic feed that either a) does not specify a ranking domain or b) specifies a different ranking domain, consumers can safely ignore the urn:foo i:rank.
>
> This kind of makes sense, but I'm not convinced it's necessary. If the feed has various ranks on which it can be sorted, I'd rather leave the decision on which one to use to the user. If, for whatever reason, those alternate domains are no longer applicable and the feed absolutely has to force the use of a particular domain, wouldn't it make more sense to filter out all those unused ranks rather than making the user download them?

i:domain is not used as a key of determining which rankings to use; it's a key that is used to correlate rankings.

Regarding filtering, we should not rely on aggregators filtering out unused ranks. Consider the case of digitally signed entries; filtering out a rank covered by the digital signature would invalidate the signature.

>> 2. It helps us to correlate ranks that span multiple feed documents. For instance, two separate feed documents may specify the same ranking domain.
>
> This I like.

>> By the description given, it sounds as if the BBC ranking is more a ranking of relative importance than a ranking of natural order. That is, Top Story A has a higher importance than Top Story B, etc. If that is the case, a "priority" or "importance" ranking scheme can be used in conjunction with the atom:updated element.
>
> This almost works. As an aggregator, what I would want to do is automatically sort with the date as the primary key and the priority as the secondary key. That way, today's high-priority items would appear at the top of the list, and yesterday's would follow on afterwards. Any of yesterday's items that were still of some importance today would need to have their atom:updated element set to today and their priority adjusted as appropriate.
>
> There are a couple of problems though. The atom:updated element has to be identical for all items on a particular day. Also the atom:updated element can't be changed when an actual update occurs (say a spelling correction, or an update on a story) without breaking the ordering. The problem is we're abusing the atom:updated element so as to use it for something that it's not.

The updated elements would not need to be identical. Aggregators can easily determine whether or not entries with different updated values occurred on the same day / same hour / etc. In other words, I could sort by Day+Priority, Hour+Priority, Minute+Priority, whatever, without any difficulty. There is no abuse of atom:updated here.

> It would be better if we could add an extra attribute to your rank tag that specified what date the rank applied to. For someone like the BBC that reprioritizes feeds on a daily basis they'd set this attribute to something like say midnight for the date on which the ranking applies. If you have an item from a previous day that is still important today, it would keep its original atom:updated value, but the rank-date would be set to today.

I'll give this some thought, but my initial gut reaction is that it is not necessary. Let me see if I can convince myself otherwise ;-)

> Regards
> James

Thanks for the input!

- James
Re: FYI: Updated Index draft
James M Snell wrote:

> This could all get rather complicated very quickly. My primary objective is to address known use cases for ordered feeds (my netflix queue feed[1] for example), most of which are structured as complete datasets that are non-incremental in nature.

I realise that this sort of thing sounds like a good idea from a content provider's point of view, but as an aggregator developer, this is probably the last thing I would want to support. A feed that is not incremental is not a feed IMHO. There are just too many special case complications that an aggregator developer has to deal with that have nothing to do with regular, honest-to-goodness feeds. Are you supposed to automatically delete old items? With or without the users' consent? Do you archive old items in some way? How do you handle the aggregation of items from multiple non-incremental feeds into a single feed? How do you handle the aggregation of items from multiple feeds some of which are incremental and some of which are complete datasets? How do you handle filtering that results in a subset of items from what is supposed to be a complete dataset?

That said, I suspect I'm fighting a losing battle, and I do like this proposal as it applies to ranking of feeds in general.

> 1. It helps us to scope the relevance of an i:rank element within an entry. For instance, if an entry with an i:rank in the urn:foo domain is aggregated into a synthetic feed that either a) does not specify a ranking domain or b) specifies a different ranking domain, consumers can safely ignore the urn:foo i:rank.

This kind of makes sense, but I'm not convinced it's necessary. If the feed has various ranks on which it can be sorted, I'd rather leave the decision on which one to use to the user. If, for whatever reason, those alternate domains are no longer applicable and the feed absolutely has to force the use of a particular domain, wouldn't it make more sense to filter out all those unused ranks rather than making the user download them?

> 2. It helps us to correlate ranks that span multiple feed documents. For instance, two separate feed documents may specify the same ranking domain.

This I like.

> By the description given, it sounds as if the BBC ranking is more a ranking of relative importance than a ranking of natural order. That is, Top Story A has a higher importance than Top Story B, etc. If that is the case, a "priority" or "importance" ranking scheme can be used in conjunction with the atom:updated element.

This almost works. As an aggregator, what I would want to do is automatically sort with the date as the primary key and the priority as the secondary key. That way, today's high-priority items would appear at the top of the list, and yesterday's would follow on afterwards. Any of yesterday's items that were still of some importance today would need to have their atom:updated element set to today and their priority adjusted as appropriate.

There are a couple of problems though. The atom:updated element has to be identical for all items on a particular day. Also the atom:updated element can't be changed when an actual update occurs (say a spelling correction, or an update on a story) without breaking the ordering. The problem is we're abusing the atom:updated element so as to use it for something that it's not.

It would be better if we could add an extra attribute to your rank tag that specified what date the rank applied to. For someone like the BBC that reprioritizes feeds on a daily basis they'd set this attribute to something like say midnight for the date on which the ranking applies. If you have an item from a previous day that is still important today, it would keep its original atom:updated value, but the rank-date would be set to today. An aggregator supporting this extension could then sort on the rank-date as the primary key (descending) and the rank value itself as the secondary key. For feeds that don't change their priorities over time you can just leave this attribute out and the aggregator can sort on the rank value alone.

I don't think it overly complicates the interface, but it does add significant value IMO.

Regards
James
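On the aggregator side, the two-key sort might look like the sketch below. The field names rank_date and rank are hypothetical, and treating a higher rank value as more important (as with the 90 and 80 priorities in the BBC example) is an assumption; a scheme where lower numbers come first would flip the secondary key.

```python
from datetime import date

def sort_entries(entries):
    """Newest rank-date first; within one rank-date, highest rank first."""
    def key(entry):
        # Entries without a rank-date sort after dated ones, by rank alone.
        return (entry.get("rank_date") or date.min, entry["rank"])
    return sorted(entries, key=key, reverse=True)

entries = [
    # Yesterday's story that is still important today: atom:updated is
    # untouched, only the rank-date moves forward.
    {"id": "still-important", "updated": date(2005, 9, 21),
     "rank_date": date(2005, 9, 22), "rank": 80},
    {"id": "today-top", "updated": date(2005, 9, 22),
     "rank_date": date(2005, 9, 22), "rank": 90},
    {"id": "yesterday-top", "updated": date(2005, 9, 21),
     "rank_date": date(2005, 9, 21), "rank": 90},
]
order = [e["id"] for e in sort_entries(entries)]
print(order)  # ['today-top', 'still-important', 'yesterday-top']
```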
Re: FYI: Updated Index draft
On Wednesday, September 21, 2005, at 11:43 PM, James M Snell wrote:

> {domain}

I was thinking yesterday of suggesting that feed/id be used the way you're using i:domain. Which is better is probably a matter of whether ranking domains that span multiple feeds will be useful or not. In the movie ratings use case presented below, perhaps rather than a fivestar scheme and netflix and amazon domains, it might make more sense to do this:

urn:my_reviews descending descending Movie A 3 4 Movie B 2 1

Notes:

* The i:order element tells the user agent whether higher or lower numbers are considered "better", "higher priority", "first", or whatever. In these cases, higher numbers are better, so would typically be shown first, so they're considered "descending" schemes.
* i:order/@label indicates a human readable label for the scheme, and could be optional.
* Since the urn:(netflix|amazon).com/reviews schemes are feed independent, it is not necessary to indicate a feed (or "domain") in this case.
* For a feed-specific scheme, like natural order, the feed ID would be included like this (so that if these entries were aggregated, it would be clear that the i:order elements were relevant to the source feed, not the aggregate feed):

urn:my_feed ascending urn:my_feed/a 1 urn:my_feed/b 2

If sticking with i:domain, I'd recommend that you recommend that in cases where a ranking domain does not span multiple feeds, the feed/id value be used for the value of i:domain, and that in all cases, the same care be taken to (attempt to) ensure that i:domain's value is unique to what is intended to be a particular domain.
Re: FYI: Updated Index draft
This could all get rather complicated very quickly. My primary objective is to address known use cases for ordered feeds (my netflix queue feed[1] for example), most of which are structured as complete datasets that are non-incremental in nature. I'm not convinced that I necessarily want to try to solve all of the potential problem cases that could arise with ordered feeds that span a collection of historical feeds, etc. Also, I am not wishing to duplicate what Microsoft has done with their simple list extensions. So with that in mind, I still wish to try and address the issues that have been raised so here's what I have so far:

[1] http://rss.netflix.com/QueueRSS?id=P5365369447081104293883231608616881

{domain} {nonNegativeInteger}

I drop the feed level i:ranking element and introduce a new i:domain element that identifies a "ranking domain" that this feed is a part of. The i:rank element is used to specify the nonNegativeInteger rank for the given {scheme} for the containing element. The {domain} attribute is used to scope the i:rank to a specific ranking domain -- for instance, the priority ranking is only relevant if the entry is contained in a feed with a corresponding i:domain element.

The lack of an i:domain element indicates the "default ranking domain". Any i:rank elements that do not specify a domain attribute are considered to be part of the default ranking domain. For instance, in the following example, only the first i:rank is relevant within the given feed. Neither of the urn:bar i:rank elements is relevant within this particular feed example.

tag:example.com,2005:/feed urn:foo tag:example.com,2005:/feed/1 1 2 tag:example.com,2005:/feed/2 2 1

Domain urn:foo ranking: tag:example.com,2005:/feed/1 then tag:example.com,2005:/feed/2
Domain urn:bar ranking: tag:example.com,2005:/feed/2 then tag:example.com,2005:/feed/1

The domain element serves multiple purposes.

1. It helps us to scope the relevance of an i:rank element within an entry.
For instance, if an entry with an i:rank in the urn:foo domain is aggregated into a synthetic feed that either a) does not specify a ranking domain or b) specifies a different ranking domain, consumers can safely ignore the urn:foo i:rank. 2. It helps us to correlate ranks that span multiple feed documents. For instance, two separate feed documents may specify the same ranking domain. No-Rank Entries: no-rank entries are marked by the absence of an i:rank element corresponding to a given scheme. For instance, in the following example, entry "C" is a No-Rank Entry in the Index scheme, but is ranked in the Priority Scheme. A 1 10 B 2 50 C 20 Re: Eric's Question: "How does this help (eg) bbc.co.uk order their news items in some sensible manner?" By the description given, it sounds as if the BBC ranking is more a ranking of relative importance than a ranking of natural order. That is, Top Story A has a higher importance that Top Story B, etc. If that is the case, a "priority" or "importance" ranking scheme can be used in conjunction with the atom:updated element. top-story-A 2005-12-12T12:00:00Z 90 top-story-B 2005-12-12T12:00:00Z 80 top-story-C 2005-12-11T12:00:00Z 90 top-story-D 2005-12-11T12:00:00Z 80 In this example, top-story-A is ranked as the highest priority entry on Dec, 12, 2005 while top-story-C is ranked as the highest priority entry on Dec, 11, 2005. Re: Eric's Question: "What happens when entries "fall off the bottom" ... do their rankings expire?" It will be entirely dependent on the scheme. In a priority ranking scheme (measuring the relative importance of an entry), having an entry "fall off the bottom" would have no effect on the overall ordering/ranking of the feed. In a natural order ranking scheme (indexed position), having an entry "fall off the bottom" would likely mean that the entry is no longer a part of the ordered list or is no longer relevant to the rankings. Re: Thomas Broyer's suggestions: >1. 
get rid of your i:rank, users will use any extension element instead (no more > registry and you can still define "standard" priority and index extensions) I considered this but a single extensible rank element fits most of the simple use cases for this rather well. That said, allowing for specific ranking elements would be helpful, so how about a bit of a compromise? rankingCommonAttributes = attribute i:scheme { IRI }, attribute i:domain { IRI }? rankingConstruct = rankingCommonAttributes integerRank = element i:rank { rankingConstruct (nonNegativeInteger) } With this approach, i:rank is defined as the standard nonNegativeInteger ranking element. If I so desired, I could easily define new rankingConstructs however, for instance: importanceRank = element x:importance { rankingConstruct ('critical' | 'high' | 'medium' | 'low' | 'info'|) } high >2. get rid of your @order attribute: users should be able to choose in which
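The domain scoping described earlier in this message (the urn:foo / urn:bar example) can be sketched as follows. The dictionary layout and the function names are invented for illustration, the scheme attribute is left out for brevity, and lower rank values are assumed to come first, which matches the orderings stated for that example.

```python
def relevant_ranks(entry_ranks, feed_domain=None):
    """Keep only the ranks scoped to the feed's ranking domain.

    A rank with no domain and a feed with no i:domain both map to None,
    which stands in for the default ranking domain.
    """
    return [r for r in entry_ranks if r.get("domain") == feed_domain]

def order_entries(entries, feed_domain=None):
    """Order entry ids by their rank in the given domain, lowest first.

    Entries with no rank in that domain are left out of the ordering.
    """
    ranked = []
    for entry_id, ranks in entries.items():
        for rank in relevant_ranks(ranks, feed_domain):
            ranked.append((rank["value"], entry_id))
    return [entry_id for _, entry_id in sorted(ranked)]

entries = {
    "tag:example.com,2005:/feed/1": [{"domain": "urn:foo", "value": 1},
                                     {"domain": "urn:bar", "value": 2}],
    "tag:example.com,2005:/feed/2": [{"domain": "urn:foo", "value": 2},
                                     {"domain": "urn:bar", "value": 1}],
}
foo_order = order_entries(entries, "urn:foo")  # /feed/1 then /feed/2
bar_order = order_entries(entries, "urn:bar")  # /feed/2 then /feed/1
```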
Re: FYI: Updated Index draft
I had considered something along those lines, but it seemed to me to be a bit vague. I suspect it would produce adequate results in the majority of cases, but I'd prefer something that gave the content provider finer control. I like the idea of being able to say exactly where in a feed an item should be positioned. Then again I'm not a content provider so maybe that's not the sort of thing they're looking for. Eric Scheid wrote: thinking more ... I think the way to handle this is that the client application could weight the ranking with the age of the item, and thus a rank#1 item would appear near the top of the list, and then slowly drop away. You also get to know what the original ranking for an item is.
Re: FYI: Updated Index draft
On 21/9/05 9:35 PM, "James Holderness" <[EMAIL PROTECTED]> wrote: > Marking entries as having no rank sounds like a nice idea, but I don't think > it's feasible in the long run. thinking more ... I think the way to handle this is that the client application could weight the ranking with the age of the item, and thus a rank#1 item would appear near the top of the list, and then slowly drop away. You also get to know what the original ranking for an item is. e.
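A minimal sketch of the age-weighting idea in this message. The message does not say how rank and age should be combined, so the linear penalty below, the function name effective_rank, the decay_per_day parameter, and the sample dates are all assumptions made purely for illustration. Lower values sort first, so a rank #1 item starts near the top and drifts down as it ages, while its original rank stays available unchanged.

```python
from datetime import datetime, timezone

def effective_rank(rank, updated, now, decay_per_day=1.0):
    """Return the published rank plus a penalty that grows with age.

    The linear formula is an assumption; the thread only says the
    client "could weight the ranking with the age of the item".
    """
    age_days = (now - updated).total_seconds() / 86400
    return rank + decay_per_day * age_days

# Hypothetical data: (name, published rank, updated timestamp).
now = datetime(2005, 9, 22, 12, 0, tzinfo=timezone.utc)
items = [
    ("old-top-story", 1, datetime(2005, 9, 19, 12, 0, tzinfo=timezone.utc)),
    ("new-story", 2, datetime(2005, 9, 22, 12, 0, tzinfo=timezone.utc)),
]

# Sort by the age-weighted value; the original rank is still in each tuple.
ordered = sorted(items, key=lambda it: effective_rank(it[1], it[2], now))
names = [name for name, _, _ in ordered]
print(names)  # ['new-story', 'old-top-story']
```

With these numbers the three-day-old rank #1 story scores 4.0 and the fresh rank #2 story scores 2.0, so the newer story is listed first without the publisher having to re-rank anything.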
Re: FYI: Updated Index draft
Marking entries as having no rank sounds like a nice idea, but I don't think it's feasible in the long run. In order to erase ranking effectively from previous entries, the content provider potentially needs to double their feed size. And if a user misses out on a "rank update" they could end up with news items from the distant past sitting at the top of their ranking forever. Admittedly you already have the problem of losing items that have fallen off the bottom of a feed, but at least the feed remains readable. With a corrupted ranking system, the ranking effectively becomes useless. One possible solution may be the use of a rank-offset tag. Let's say you have three items A, B and C with A being the most important (rank 1) and C being the least important (rank 3). You start with a rank-offset of 0 and your feed looks like this: A:1 B:2 C:3 (rank-offset = 0) Now say you want to add a new item D that falls between A and B, but you still only want to include 3 items in your feed. You increment the rank-offset to 1 and reconstruct the feed with new ranks which now look like this: A:1 D:2 B:3 (rank-offset = 1) When the client receives that feed, it automatically subtracts the rank-offset from each item's rank value before adding them to its database. So internally its list of items now looks like this: A:0 D:1 B:2 C:3 A's rank has been updated. D has been inserted. B and C remain unchanged. From the client's point of view the ranking numbers will start going negative almost immediately, but as long as you treat the lowest (signed) value as having the highest priority it shouldn't be a problem. And the rankings that actually appear in the feed are always nice small positive integers. The rank-offset will get large over time (and I don't see how it can be reset), but that's just one tag. Eric Scheid wrote: The only way out of this conundrum is that bbc.co.uk will have to update the original #1 and #2 stories and re-rank them as much lower. 
If they re-rank them as #46 and #47 then they will need to re-rank any previous entry at those ranks to lower positions, and similarly for any other entries with ranks which get pushed down. Eventually the entire history of the feed needs to be re-ranked. Unless entries can be marked as having no rank. Can they?
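The rank-offset proposal in this message can be sketched as follows. This is only an illustration under assumptions: the function name apply_feed and the dictionary-based store are hypothetical, and the feed is reduced to a mapping from item id to published rank. The client subtracts the feed's rank-offset from every published rank before storing it, and treats the lowest signed value as the highest priority.

```python
def apply_feed(store, feed_items, rank_offset):
    """Merge one poll of the feed into the client's store.

    store maps item id -> internal rank (lowest value = most important).
    feed_items maps item id -> rank as published in the feed.
    Items absent from this poll keep whatever rank they already had.
    """
    for item_id, rank in feed_items.items():
        store[item_id] = rank - rank_offset
    # Present items from lowest (most important) to highest internal rank.
    return sorted(store, key=store.get)

store = {}
apply_feed(store, {"A": 1, "B": 2, "C": 3}, rank_offset=0)
order = apply_feed(store, {"A": 1, "D": 2, "B": 3}, rank_offset=1)
print(store)  # {'A': 0, 'B': 2, 'C': 3, 'D': 1}
print(order)  # ['A', 'D', 'B', 'C']
```

This reproduces the A:0 D:1 B:2 C:3 state from the example: C has fallen off the feed but keeps its old internal rank, and D slots in between A and B.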
Re: FYI: Updated Index draft
On 21/9/05 1:05 PM, "James M Snell" <[EMAIL PROTECTED]> wrote: > The ranking is part of the entry metadata. If an entry falls off the > feed, there is no effect on the ranking metadata. With partial feed > retrieval, ordering could be performed over the entire set of entries. How does this help (eg) bbc.co.uk order their news items in some sensible manner? Today, they have a couple of important stories, they indicate those entries are rank #1, #2. Tomorrow, they have more news, but not more important than yesterday's big news. The day after they have a new big story, it should be rank #1. The #1 and #2 stories from two days ago have fallen off the bottom of the feed. The only way out of this conundrum is that bbc.co.uk will have to update the original #1 and #2 stories and re-rank them as much lower. If they re-rank them as #46 and #47 then they will need to re-rank any previous entry at those ranks to lower positions, and similarly for any other entries with ranks which get pushed down. Eventually the entire history of the feed needs to be re-ranked. Unless entries can be marked as having no rank. Can they? e.
Re: FYI: Updated Index draft
Eric Scheid wrote: On 21/9/05 5:18 AM, "James M Snell" <[EMAIL PROTECTED]> wrote: For instance ... 10 ... 5 What happens when entries "fall off the bottom" ... do their rankings expire? How does that work with the diff+Feed method of partial feed retrieval? e. The ranking is part of the entry metadata. If an entry falls off the feed, there is no effect on the ranking metadata. With partial feed retrieval, ordering could be performed over the entire set of entries. That is: feed 1 contains A (rank 1) and B (rank 3); feed 2 contains C (rank 2) and D (rank 4). The order for feed 1 is: A, B. The order for feed 2 is: C, D. Full reconstructed order: A, C, B, D. - James
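A small sketch of the reconstruction described above, assuming the ranks A=1, B=3 in the first feed document and C=2, D=4 in the second (the split that matches the per-feed orders given in the message). The function name reconstruct_order and the dictionary representation are hypothetical. Because the rank travels with each entry, the client can simply pool the entries from every document it has retrieved and sort once.

```python
def reconstruct_order(*feeds):
    """Pool entries from several feed documents and sort by rank.

    Each feed is a mapping of entry id -> rank. If the same entry
    appears in more than one document, the later document wins.
    """
    merged = {}
    for feed in feeds:
        merged.update(feed)
    return sorted(merged, key=merged.get)

feed1 = {"A": 1, "B": 3}
feed2 = {"C": 2, "D": 4}

print(reconstruct_order(feed1))         # ['A', 'B']
print(reconstruct_order(feed2))         # ['C', 'D']
print(reconstruct_order(feed1, feed2))  # ['A', 'C', 'B', 'D']
```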
Re: FYI: Updated Index draft
On 21/9/05 5:18 AM, "James M Snell" <[EMAIL PROTECTED]> wrote: > For instance > > > ... > 10 > > > ... > 5 > > What happens when entries "fall off the bottom" ... do their rankings expire? How does that work with the diff+Feed method of partial feed retrieval? e.
Re: FYI: Updated Index draft
James M Snell wrote: Complete example ... priority index order="descending">http://www.example.com/ranking/foo ... C 10 3 http://www.example.com/ranking/foo">30 […] Thoughts? It looks more and more like Microsoft's RSS simple list extension [1], and I think they had the good approach (define sorts on the feed metadata, based on extension element values at the entry level) but a bad technical solution (use the extension element in a different context: when in cf:sort, it has a non-namespaced data-type attribute and its content is a "label" string, while in an entry it might not have attributes and its value should be of type specified by the @data-type attribute seen before). Suggestions: 1. get rid of your i:rank, users will use any extension element instead (no more registry and you can still define "standard" priority and index extensions) 2. get rid of your @order attribute: users should be able to choose in which order they want their entries: best-ranked to least-ranked "top to bottom" or "bottom to top". It's the responsibility of the producer to provide labels and values that will be well-understood by users (e.g. not saying "stars" and ranking from 1 (best rank) down to 5: "stars" implies "number of stars", so "sort by stars in ascending order" implies "the higher the value, the better it is", which is not what's behind 1=best-rank…) 3. make the content of i:ranking a user-understandable label 4. (optional) add a data-type attribute to i:ranking (maybe rename that one to something related to sorting, not ranking) 5. use @namespace and @localname attributes on i:ranking to describe the element in entries the sort applies to (using those attributes avoids QNames in attribute values, which don't work well with prefix changes) http://example.com/user-review" localname="stars">User-reviews stars … http://example.com/user-review">1 … This, however, doesn't match "index" in the draft title any more. 
What could be even better, though a lot less "simple" (and not feasible, see below), would be to use XPath or XPointer (XPointer has the advantage that you define namespace prefix bindings "inside" it, using the xmlns() XPointer scheme). That way, you could use any element/subelement and/or attribute as the value holder for the sort. This would however require an XPath/XPointer engine, as well as storing the XML DOM, or mapping XPath/XPointer to your internal feed representation; this is not feasible. [1] http://msdn.microsoft.com/windowsvista/building/rss/simplefeedextensions/ -- Thomas Broyer
Re: FYI: Updated Index draft
Eric Scheid wrote: On 15/9/05 6:06 AM, "David Powell" <[EMAIL PROTECTED]> wrote: Eg - An Atom library or server that doesn't know about this extension is free to not preserve the entry order, and yet to retain the element, even though this will have corrupted the data. very good point. Indeed. And it is a point that signals a show stopper for the approach taken in the draft. As an alternative, I'm considering an approach that places order metadata within the entry. For instance ... 10 ... 5 ... 15 The i:rank element is a non-negative integer. Consumers of the feed may use the rank as a key for sorting entries. Because different rankings may be relevant in different domains, the i:rank element will support a scheme attribute whose value is either a name or an IRI identifying a ranking scheme. The built-in schemes are "priority" and "index". "priority" is used to rank the relative importance of the entry (the higher the value, the higher the importance). "index" is used to specify a natural order for entries. New scheme names can be standardized through IANA registration. IRI values can be used to identify non-standardized schemes. If the scheme attribute is missing, the value is assumed to be "index". For instance ... 10 1 http://www.example.com/ranking/foo">100 ... 5 2 http://www.example.com/ranking/foo">50 ... 0 3 http://www.example.com/ranking/foo">30 On the feed level, metadata can be specified to indicate which ranking schemes are intended to be applied to the entries in the feed. These are generally informative and do not rely on the actual order of the entries. For instance ... priority index order="descending">http://www.example.com/ranking/foo The value of the i:ranking element is the name or IRI of a ranking scheme. The default attribute (value 'yes' or 'no') is used to indicate which ranking scheme should be considered the default. Only one i:ranking with @default="yes" is allowed within a feed. 
The order attribute (value 'ascending' or 'descending') is used to indicate the default sort order for that ranking scheme. If a feed contains a particular i:ranking scheme, its entries SHOULD contain corresponding i:rank elements. Entry elements that do not have i:rank elements for a particular scheme must be presented at the end of the presentation list (see the example below). Complete example ... priority index order="descending">http://www.example.com/ranking/foo ... C 10 3 http://www.example.com/ranking/foo">30 ... A 5 1 http://www.example.com/ranking/foo">50 ... B 0 2 http://www.example.com/ranking/foo">100 ... D Ordered descending by priority: C, A, B, D Ordered ascending by index: A, B, C, D Ordered descending by http://www.example.com/ranking/foo: B, A, C, D Thoughts? - James
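The three orderings listed at the end of this message can be checked with a short sketch. The entry values are taken from the complete example (C: priority 10, index 3, foo 30; A: 5, 1, 50; B: 0, 2, 100; D: no ranks); the dictionary layout and the function name order_entries are assumptions for illustration, not part of the draft. Entries that lack a rank in the chosen scheme are placed at the end regardless of sort direction, as the text requires.

```python
FOO = "http://www.example.com/ranking/foo"

# Entries in document order, each with its ranks keyed by scheme.
entries = [
    {"id": "C", "ranks": {"priority": 10, "index": 3, FOO: 30}},
    {"id": "A", "ranks": {"priority": 5, "index": 1, FOO: 50}},
    {"id": "B", "ranks": {"priority": 0, "index": 2, FOO: 100}},
    {"id": "D", "ranks": {}},  # no i:rank in any scheme
]

def order_entries(entries, scheme, descending=False):
    """Sort by one ranking scheme; unranked entries always go last."""
    ranked = [e for e in entries if scheme in e["ranks"]]
    unranked = [e for e in entries if scheme not in e["ranks"]]
    ranked.sort(key=lambda e: e["ranks"][scheme], reverse=descending)
    return [e["id"] for e in ranked + unranked]

print(order_entries(entries, "priority", descending=True))  # ['C', 'A', 'B', 'D']
print(order_entries(entries, "index"))                      # ['A', 'B', 'C', 'D']
print(order_entries(entries, FOO, descending=True))         # ['B', 'A', 'C', 'D']
```

Splitting ranked from unranked entries before sorting keeps D last in both ascending and descending order, which a single sort with a sentinel value would not do.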
Re: FYI: Updated Index draft
On 15/9/05 6:06 AM, "David Powell" <[EMAIL PROTECTED]> wrote: > Eg - An Atom library or server that doesn't know about this extension > is free to not preserve the entry order, and yet to retain the > element, even though this will have corrupted the data. very good point. e.
Re: FYI: Updated Index draft
David, Excellent comments. David Powell wrote: How will this interact with the sliding-window/feed-history interpretation of feeds? The natural order assigned by this extension seems incompatible with the implied date order that would be implied by two feed documents, polled over some period of time. What should be the order of a merged feed history such as this: Poll 1: feed(e1, e2, e3) Poll 2: feed(e3, e1, e5) - where, perhaps, 3 and 1 have been updated. How do you combine entries sorted by their natural order, with the time-ordered feed history? Natural ordering and time-ordering are, by their very nature, opposing views -- unless of course, the natural ordering and time-ordering just happen to coincide with one another (by chance or design). Using the terminology from Mark Nottingham's Feed History extension, naturally ordered feeds are likely to also be non-incremental feeds. For instance, my NetFlix.com Queue Feed is clearly intended to be an ordered, non-incremental feed. The feed presents its entire state. There is no history. The ordering of the items in the feed is significant. I believe that it is safe to assert that ordered feeds should be presumed to be non-incremental. I will hold off on making that a normative assertion, however, simply because there is no evidence that natural ordering *can't* be preserved across multiple feeds. How will this interact with entry documents, eg over pubsub. What about Atom Protocol - I can't imagine how I would publish a feed with a given natural order. For something like the BBC feeds, some sort of arbitrary "score" field might be more interoperable with both entry documents, Atom protocol, and feed history. This is definitely something I've been thinking about -- that is, how to use the Atom protocol to edit an ordered collection. Without the introduction of a specific metadata field within the entry itself, the only potential option is a pub:control parameter that specifies the ordering index for the entry. 
At this time I simply do not yet know if that is the right approach. I'll need to experiment a bit more. I'm probably on my own, but I expected Atom's statement that "This specification assigns no significance to the order of atom:entry elements within the feed" was non-negotiable and couldn't be changed by extensions. This seems more like potential Atom 1.1 material to me - it doesn't seem to layer on top of the Atom framework so much as slightly rewrite part of it. As far as feed processing is concerned I agree that the ordering of atom:entry elements is not significant, even if the feed does contain the fi:ordered element. The ordered extension is a flag that helps applications interpret the intention of the feed. For example, there is a clear distinction of intent between my weblogs feed and my NetFlix Queue feed. While both can be treated the same under the covers, having some sort of clue as to how the two should be presented to the user is helpful. Eg - An Atom library or server that doesn't know about this extension is free to not preserve the entry order, and yet to retain the element, even though this will have corrupted the data. Agreed that this is a valid issue. Let me stew over this one. I think that as implemented, this extension wouldn't be safe to deploy without must-understand extensions, which Atom 1.0 doesn't support. Ordered feeds are a useful problem though. Indexes or scores on entries might work better with entry documents, the protocol, and with the Atom extension framework, but it still isn't clear how they would interact with the sliding window. Nor is it clear how it could work in aggregation scenarios. e.g. what happens if the entry contains an index and is aggregated into a feed that has another entry with a conflicting index? A couple more minor points: I'm not sure whether the descending/ascending attribute is necessary? 
Given that the extension just presents a natural order (by some unnamed ordering), why would anyone go to the trouble of presenting the entries in reversed order, and then label them as descending; why not just present them in ascending order to begin with? Agreed. I actually had the same thought the other day and had a "Boy, that was silly" head-slap moment. Would it be useful for the extension to allow the natural ordering to be named? - so if the ordering is by "Importance", or "Order of real-life events", or something else, then it could be labelled with a URI and/or label, so that people don't have to guess the significance of the natural order. Interesting thought. Correct me if I'm wrong, but this would look something like: http://www.example.com/ordered/by/priority With each entry having something like a corresponding priority element (just an example) 70 Or http://www.example.com/ordered/by/position Or ... whatever else The bottom line would be that the URI value of the ordered element would indicate
Re: FYI: Updated Index draft
Monday, September 12, 2005, 5:55:20 PM, James M Snell wrote: > I've updated the draft that defines an extension that can be used to > indicate that the order of entries within a Feed should be considered > significant. How will this interact with the sliding-window/feed-history interpretation of feeds? The natural order assigned by this extension seems incompatible with the implied date order that would be implied by two feed documents, polled over some period of time. What should be the order of a merged feed history such as this: Poll 1: feed(e1, e2, e3) Poll 2: feed(e3, e1, e5) - where, perhaps, 3 and 1 have been updated. How do you combine entries sorted by their natural order, with the time-ordered feed history? How will this interact with entry documents, eg over pubsub. What about Atom Protocol - I can't imagine how I would publish a feed with a given natural order. For something like the BBC feeds, some sort of arbitrary "score" field might be more interoperable with both entry documents, Atom protocol, and feed history. I'm probably on my own, but I expected Atom's statement that "This specification assigns no significance to the order of atom:entry elements within the feed" was non-negotiable and couldn't be changed by extensions. This seems more like potential Atom 1.1 material to me - it doesn't seem to layer on top of the Atom framework so much as slightly rewrite part of it. Eg - An Atom library or server that doesn't know about this extension is free to not preserve the entry order, and yet to retain the element, even though this will have corrupted the data. I think that as implemented, this extension wouldn't be safe to deploy without must-understand extensions, which Atom 1.0 doesn't support. Ordered feeds are a useful problem though. Indexes or scores on entries might work better with entry documents, the protocol, and with the Atom extension framework, but it still isn't clear how they would interact with the sliding window. 
A couple more minor points: I'm not sure whether the descending/ascending attribute is necessary? Given that the extension just presents a natural order (by some unnamed ordering), why would anyone go to the trouble of presenting the entries in reversed order, and then label them as descending; why not just present them in ascending order to begin with? Would it be useful for the extension to allow the natural ordering to be named? - so if the ordering is by "Importance", or "Order of real-life events", or something else, then it could be labelled with a URI and/or label, so that people don't have to guess the significance of the natural order. -- Dave
FYI: Updated Index draft
I've updated the draft that defines an extension that can be used to indicate that the order of entries within a Feed should be considered significant. http://www.ietf.org/internet-drafts/draft-snell-atompub-feed-index-02.txt Example, http://www.w3.org/2005/Atom" xmlns:fi="http://purl.org/syndication/index/1.0"> ... tag:entry:1 ... tag:entry:2 ... tag:entry:3 ... The fi:ordered element indicates that the order of the entries as presented in the feed should be considered to be significant. The @sort attribute indicates the default sort order for those entries. A value of "descending" indicates that the entries should be presented last-to-first. A value of "ascending" indicates that the entries should be presented first-to-last. - James