Re: FYI: Updated Index draft

2005-09-27 Thread James Holderness


I'm sorry for the long delay in replying, but I've been swamped with work 
lately.


James M Snell wrote:
1. It helps us to scope the relevance of an i:rank element within an 
entry. For instance, if an entry with an i:rank in the urn:foo domain is 
aggregated into a synthetic feed that either a) does not specify a 
ranking domain or b) specifies a different ranking domain, consumers can 
safely ignore the urn:foo i:rank.


i:domain is not used as a key for determining which rankings to use; it's a 
key that is used to correlate rankings.


The correlating part I get - I think that's a great idea. It's part 1 
(quoted above) that I don't understand. You explicitly say a consumer can 
safely ignore the rank based on what domain it sees in a feed. How is that 
not a determination of which rankings to use?


Regarding filtering, we should not rely on aggregators filtering out 
unused ranks.  Consider the case of digitally signed entries; filtering 
out a rank covered by the digital signature would invalidate the 
signature.


I haven't been following the digital signature proposals, but I would have 
expected they would attempt to sign a particular content entry rather than 
trying to include all the metadata that went with it. Is it really safe to 
assume that an item aggregated into a synthetic feed or one that has passed 
through a caching/forwarding system will not have had metadata tags added, 
removed or reordered? I guess this is off-topic though, and I see your point.


There are a couple of problems though. The atom:updated element has to be 
identical for all items on a particular day. Also the atom:updated 
element can't be changed when an actual update occurs (say a spelling 
correction, or an update on a story) without breaking the ordering. The 
problem is we're abusing the atom:updated element so as to use it for 
something that it's not.


The updated elements would not need to be identical.  Aggregators can 
easily determine whether or not entries with different updated values 
occurred on the same day / same hour / etc.  In other words, I could sort 
by Day+Priority, Hour+Priority, Minute+Priority, whatever, without any 
difficulty.  There is no abuse of atom:updated here.


The problem is knowing what to sort on. Unless you provide that information 
somewhere in the feed there's no way for the aggregator to perform the sort 
automatically. Say BBC updates its ranks every 6 hours, how is the 
aggregator to know that it should sort by halfday+priority? Or maybe it 
updates at 8am and 2pm every day - how would an aggregator deal with that?


With my proposal they would set rank-date to 8am (of the current day) for 
every item updated between 8am and 2pm. And items updated from 2pm to 8am 
the following day would have a rank-date of 2pm. The aggregator sorts on 
rankdate+priority and it all just lines up automatically. No guessing 
required.
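
A rough sketch of how a publisher might derive the rank-date described above. The 8am/2pm schedule comes from the example; the function name rank_date and the use of naive local datetimes are assumptions made only for illustration.

```python
from datetime import datetime, time, timedelta

# Hypothetical publisher schedule: ranks are refreshed at 08:00 and 14:00.
RANK_UPDATE_TIMES = [time(8, 0), time(14, 0)]


def rank_date(updated, schedule=RANK_UPDATE_TIMES):
    """Return the start of the ranking window that contains `updated`."""
    # Walk the schedule from latest to earliest and pick the first
    # refresh time that is not after the item's update time.
    for refresh in sorted(schedule, reverse=True):
        if updated.time() >= refresh:
            return datetime.combine(updated.date(), refresh)
    # Updated before the first refresh of the day: the window began at
    # the last refresh of the previous day.
    previous_day = updated.date() - timedelta(days=1)
    return datetime.combine(previous_day, max(schedule))


morning = rank_date(datetime(2005, 9, 27, 9, 30))    # 08:00 window, same day
afternoon = rank_date(datetime(2005, 9, 27, 15, 0))  # 14:00 window, same day
overnight = rank_date(datetime(2005, 9, 27, 3, 0))   # 14:00 window, previous day
```

Every item in the same window gets an identical rank-date, so an aggregator can group on it without knowing the publisher's schedule.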


I'll give this some thought, but my initial gut reaction is that it is not 
necessary.  Let me see if I can convince myself otherwise ;-)


You could be right. It's not exactly a critical need. I just thought it 
wouldn't hurt to at least have it there as an option for those people who 
might want that level of control.


Regards
James



Re: FYI: Updated Index draft

2005-09-23 Thread James M Snell


Antone Roundy wrote:



I think a ranking without a domain is pretty much useless--or at least 
likely to lead to problems downstream--so that case doesn't need to be 
covered.  More on that below.



Agreed.


 
   ...
   
 
   Feed1
   # 
   
 A
 50
 20
   
   
 B
 25
 40
   
 
 
   Feed2
   # 
   
 C
 50
 30
   
   
 D
 25
 10
   
 
   
 



In this example, the domainless rankings were added when the XHTML 
document was created, right?  So the XHTML document is essentially an 
aggregate feed, just not in Atom format.  Would it not make as much or 
more sense to mint an ID for the document (call it the ID of a 
"virtual Atom Feed Document" if you don't actually create an aggregate 
feed) and use it to scope those i:rank elements?  If, somehow, someone 
were to pull the atom:feeds out of the XHTML document (if atom:feed 
getting embedded into xhtml:body is going to happen, then is not 
atom:feed getting extracted from xhtml:body also likely?) and 
aggregate them with other feeds with domainless i:rank elements, the 
scopes of those elements would get mixed.


Yes, but we cannot reliably dictate that containing documents must 
contain atom:id elements simply because we have no control over the 
definitions of those containing documents.  And yes, if someone pulls 
the feed out of the XHTML and uses it somewhere else, any ranks in the 
document scope will be affected.  I do not believe that this is a 
deal-breaker, however; it's just something that folks using the ranking 
mechanism need to be aware of so that they can make the appropriate 
decisions about how and when to properly use the document ranking domain 
versus a domain that is explicitly scoped to a given ID.


* Since the urn:(netflix|amazon).com/reviews schemes are feed 
independent, it is not necessary to indicate a feed (or "domain") in 
this case.
* For a feed-specific scheme, like natural order, the feed ID would 
be included like this (so that if these entries were aggregated, it 
would be clear that the i:order elements were relevant to the source 
feed, not the aggregate feed):


The goal of @scheme is to identify the type of ranking to apply while 
the goal of @domain is to identify the scope of the ranking.  I do 
not believe that it is a good idea to conflate the two.



Okay, I've come to agree with that while writing and editing this 
message.  Note however that "fivestar" also indicates multiple things:


1) Higher numbers are "better"
2) The range is 0 to 5 (BTW, if this is limited to integers, how will 
you handle things like 3.5 stars, which are common in that type of 
rating system? Maybe decimal values need to be allowed.)

3) Hint: you might want to display the value as stars

#1 is the only one needed for sorting of entries. #2 would be useful 
if the feed reader wanted to display some sort of graphical element to 
indicate the ranking. #3 might be slightly useful, but except for the 
most popular schemes, would probably be ignored. Perhaps all of these 
should be separated, a la:





Minutes before I received this note I had a similar thought that a 
scheme definition could be useful -- although that gets us quite close 
to the territory of the RSS simple list extensions (not that it is a bad 
thing).  The symbol attribute is a bit strange.  I'd rather let the 
application determine how it wants to display the rank.  The label, 
order, min and max values and domain attribute are fine.  And yes, 
regarding #2, allowing decimal values would likely be a good idea... 
doing so would also allow us to do ratings that are based on a 0-1 
fractional scheme (e.g. percentages, etc).  Negative values should also 
be allowed.



...

3

...where @domain is the feed/id of the feed if there's just one feed 
in scope, or a value that won't be duplicated by any feed/id otherwise 
(if one can mint a unique feed id, surely one can also mint a unique 
id that won't be used for a feed).


I'd suggest that i:ranking-scheme/@domain either default to the 
containing feed/id (or the one from atom:source, if it exists) or be 
required, i:rank/@domain be required, @order default to ascending, 
@min-value default to 0, and the rest of the attributes be optional 
with no defaults.




I'm liking these suggestions...

The i:ranking-scheme element would appear within the atom:feed.  If the 
@domain attribute is missing, the domain is automatically mapped to the 
id of the feed. If the @domain attribute is a same document reference, 
the domain is mapped to the document scope.
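
The two mapping rules above could be sketched roughly as follows. The function name, the document_uri parameter, and the simplified test for a same-document reference (an empty string or a bare fragment such as "#") are assumptions, not part of the draft.

```python
def resolve_domain(domain_attr, feed_id, document_uri):
    """Map an i:ranking-scheme/@domain value to an effective ranking domain.

    domain_attr  -- the raw attribute value, or None if the attribute is absent
    feed_id      -- the atom:id of the containing feed
    document_uri -- hypothetical identifier for the containing document
    """
    if domain_attr is None:
        # Missing attribute: the domain is the id of the feed.
        return feed_id
    if domain_attr == "" or domain_attr.startswith("#"):
        # Simplified same-document check: empty or fragment-only reference.
        return document_uri
    # Anything else is taken as an explicit domain identifier.
    return domain_attr


default_scope = resolve_domain(None, "urn:my_feed", "http://www.example.com/page")
document_scope = resolve_domain("#", "urn:my_feed", "http://www.example.com/page")
explicit_scope = resolve_domain("urn:foo", "urn:my_feed", "http://www.example.com/page")
```

A full implementation would resolve the reference against xml:base before deciding whether it points at the same document.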


 
 

 
 http://www.example.com"; 
xml:base="http://www.example.com"; />


The meaning of the @order attribute needs to be clearly articulated.  It 
is NOT an indicator of how applications should display the elements but 
rather an indicator of how to interpret the rank values (e.g. highest 
number is most significant, lowest

Re: FYI: Updated Index draft

2005-09-22 Thread Mark Nottingham



On 14/09/2005, at 1:06 PM, David Powell wrote:


How will this interact with the sliding-window/feed-history
interpretation of feeds? The natural order assigned by this extension
seems incompatible with the date order that would be implied
by two feed documents, polled over some period of time.

What should be the order of a merged feed history such as this:

Poll 1:
feed(e1, e2, e3)

Poll 2:
feed(e3, e1, e5)

- where, perhaps, 3 and 1 have been updated. How do you combine
entries sorted by their natural order, with the time-ordered feed
history?


There'd need to be an algorithm described for combining the feed 
documents; e.g., see the _combine() method in 
http://www.mnot.net/rss/history/feed_history.py. In practice, most/all(?) 
popular aggregators do this now (feed history + natural order); the only 
change is that the algorithm would be documented and well-understood 
(which IMO would be a vast improvement, *if* we can agree on one... or more).
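
To make the question concrete, here is one possible combining rule, written as a sketch. It is not the algorithm from feed_history.py; it simply assumes that the newest document's order wins and that entries seen only in older documents are appended afterwards. The function name combine is hypothetical.

```python
def combine(polls):
    """Merge successive feed documents into one ordered history.

    polls -- list of feed documents, oldest first; each document is a
             list of entry ids in document order.
    """
    seen = set()
    merged = []
    # Walk from the newest document to the oldest so that the most
    # recent position of an entry is the one that is kept.
    for poll in reversed(polls):
        for entry_id in poll:
            if entry_id not in seen:
                seen.add(entry_id)
                merged.append(entry_id)
    return merged


poll_1 = ["e1", "e2", "e3"]
poll_2 = ["e3", "e1", "e5"]
history = combine([poll_1, poll_2])  # ['e3', 'e1', 'e5', 'e2']
```

Other rules are equally defensible (for example, interleaving by atom:updated), which is why a documented algorithm would matter.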


With the rank approach, you'd probably need to say that the ranks  
were valid within the scope of a single feed document, and then  
describe the relations between ranks in different feed documents. Not  
sure that's as interesting.


--
Mark Nottingham http://www.mnot.net/



Re: FYI: Updated Index draft

2005-09-22 Thread Mark Nottingham



On 14/09/2005, at 1:06 PM, David Powell wrote:


I'm probably on my own, but I expected Atom's statement that "This
specification assigns no significance to the order of atom:entry
elements within the feed" was non-negotiable and couldn't be changed
by extensions. This seems more like potential Atom 1.1 material to me
- it doesn't seem to layer on top of the Atom framework so much as
slightly rewrite part of it.


Strictly read, this doesn't preclude other specifications /  
extensions from adding semantics to the ordering of entries -- it  
only says that *this* spec doesn't assign any meaning to it. That was  
the intent as I recall it.



Eg - An Atom library or server that doesn't know about this extension
is free to not preserve the entry order, and yet to retain the
 element, even though this will have corrupted the data.


That is indeed a problem. Probably the easiest way to fix this would  
be in errata, by adding a statement like "Some feeds may implicitly  
or explicitly (through extensions) have meaning assigned to the  
ordering of entries, so intermediaries SHOULD NOT reorder them."



I think that as implemented, this extension wouldn't be safe to deploy
without must-understand extensions, which Atom 1.0 doesn't support.


That would be another way to go, but people didn't want mU.

Cheers,

--
Mark Nottingham http://www.mnot.net/



Re: FYI: Updated Index draft

2005-09-22 Thread Antone Roundy


On Thursday, September 22, 2005, at 10:20  AM, James M Snell wrote:

Antone Roundy wrote:
I was thinking yesterday of suggesting that feed/id be used the way 
you're using i:domain. Which is better is probably a matter of 
whether ranking domains that span multiple feeds will be useful or 
not. In the movie ratings use case presented below, perhaps rather 
than a fivestar scheme and netflix and amazon domains, it might 
make more sense to do this:


Using atom:id as the ranking domain would limit the ranking to a 
single feed which is useful, but does not cover the full range of 
cases.

...

Yes, there are two special cases here:

1. Lack of an i:domain
2. i:domain value that is a same document reference


I think a ranking without a domain is pretty much useless--or at least 
likely to lead to problems downstream--so that case doesn't need to be 
covered.  More on that below.



 
   ...
   
 
   Feed1
   # 
   
 A
 50
 20
   
   
 B
 25
 40
   
 
 
   Feed2
   # 
   
 C
 50
 30
   
   
 D
 25
 10
   
 
   
 


In this example, the domainless rankings were added when the XHTML 
document was created, right?  So the XHTML document is essentially an 
aggregate feed, just not in Atom format.  Would it not make as much or 
more sense to mint an ID for the document (call it the ID of a "virtual 
Atom Feed Document" if you don't actually create an aggregate feed) and 
use it to scope those i:rank elements?  If, somehow, someone were to 
pull the atom:feeds out of the XHTML document (if atom:feed getting 
embedded into xhtml:body is going to happen, then is not atom:feed 
getting extracted from xhtml:body also likely?) and aggregate them with 
other feeds with domainless i:rank elements, the scopes of those 
elements would get mixed.


* Since the urn:(netflix|amazon).com/reviews schemes are feed 
independent, it is not necessary to indicate a feed (or "domain") in 
this case.
* For a feed-specific scheme, like natural order, the feed ID would 
be included like this (so that if these entries were aggregated, it 
would be clear that the i:order elements were relevant to the source 
feed, not the aggregate feed):


The goal of @scheme is to identify the type of ranking to apply while 
the goal of @domain is to identify the scope of the ranking.  I do not 
believe that it is a good idea to conflate the two.


Okay, I've come to agree with that while writing and editing this 
message.  Note however that "fivestar" also indicates multiple things:


1) Higher numbers are "better"
2) The range is 0 to 5 (BTW, if this is limited to integers, how will 
you handle things like 3.5 stars, which are common in that type of 
rating system? Maybe decimal values need to be allowed.)

3) Hint: you might want to display the value as stars

#1 is the only one needed for sorting of entries. #2 would be useful if 
the feed reader wanted to display some sort of graphical element to 
indicate the ranking. #3 might be slightly useful, but except for the 
most popular schemes, would probably be ignored. Perhaps all of these 
should be separated, a la:



...

3

...where @domain is the feed/id of the feed if there's just one feed in 
scope, or a value that won't be duplicated by any feed/id otherwise (if 
one can mint a unique feed id, surely one can also mint a unique id 
that won't be used for a feed).


I'd suggest that i:ranking-scheme/@domain either default to the 
containing feed/id (or the one from atom:source, if it exists) or be 
required, i:rank/@domain be required, @order default to ascending, 
@min-value default to 0, and the rest of the attributes be optional 
with no defaults.
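
As a sketch only, the suggested defaults could be modelled like this. The class and field names are hypothetical stand-ins for the attributes discussed above, and Decimal is used on the assumption that fractional ranks such as 3.5 stars are allowed.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class RankingScheme:
    domain: str                           # required in this sketch
    order: str = "ascending"              # suggested default
    min_value: Decimal = Decimal("0")     # suggested default
    max_value: Optional[Decimal] = None   # optional, no default
    label: Optional[str] = None           # optional, no default

    def accepts(self, value):
        """Return True if `value` lies within the scheme's declared range."""
        if value < self.min_value:
            return False
        return self.max_value is None or value <= self.max_value


five_star = RankingScheme(domain="urn:my_feed", max_value=Decimal("5"))
half_star_ok = five_star.accepts(Decimal("3.5"))   # True
too_high = five_star.accepts(Decimal("6"))         # False
```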




Re: FYI: Updated Index draft

2005-09-22 Thread James M Snell


Antone Roundy wrote:



On Wednesday, September 21, 2005, at 11:43  PM, James M Snell wrote:



{domain}


I was thinking yesterday of suggesting that feed/id be used the way 
you're using i:domain. Which is better is probably a matter of whether 
ranking domains that span multiple feeds will be useful or not. In the 
movie ratings use case presented below, perhaps rather than a 
fivestar scheme and netflix and amazon domains, it might make more 
sense to do this:


Using atom:id as the ranking domain would limit the ranking to a single 
feed which is useful, but does not cover the full range of cases.


Later on in your note, you say:

If sticking with i:domain, I'd recommend that you recommend that in 
cases where a ranking domain does not span multiple feeds, the feed/id 
value be used for the value of i:domain, and that in all cases, the 
same care be taken to (attempt to) ensure that i:domain's value is 
unique to what is intended to be a particular domain.




Yes, there are two special cases here:

1. Lack of an i:domain
2. i:domain value that is a same document reference

In the first case, I had imagined a "Default Ranking Domain" that is 
identified by the feed atom:id element, just as you suggest.
In the second case, I had imagined a "Document Ranking Domain" that is 
identified by the document containing the feed. 

There is a subtle difference between these two.  Consider the following 
(somewhat contrived) example:


 
   ...
   
 
   Feed1
   # 
   
 A
 50
 20
   
   
 B
 25
 40
   
 
 
   Feed2
   # 
   
 C
 50
 30
   
   
 D
 25
 10
   
 
   
 

The two embedded atom:feed elements specify two ranking domains: The 
Default Ranking Domain and a Document Ranking Domain.  The Default 
Ranking Domain is scoped to the individual atom:feed and is identified by 
the value of the atom:id.  The Document Ranking Domain is scoped to the 
containing document.


The Default Ranking Domain ranks may only be used to order the entries 
within the containing atom:feed: 
 sort_ascending ( Feed1 ) = B, A

 sort_ascending ( Feed2 ) = D, C

The Document Ranking Domain ranks may be used to order all entries 
appearing within the document

 sort_ascending ( Document ) = D, A, C, B

In an Atom Feed Document, the Default Ranking Domain and the Document 
Ranking Domain happen to be identical.
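
The difference between the two domains can be sketched with the rank values from the example. The markup did not survive in this copy, so the assignment of each entry's first number to the default domain and its second number to the document domain is inferred from the sort results quoted above; the data layout and function names are hypothetical.

```python
# Rank values per entry, keyed by feed. "default" is scoped to the feed,
# "document" is scoped to the containing document.
feeds = {
    "Feed1": {"A": {"default": 50, "document": 20},
              "B": {"default": 25, "document": 40}},
    "Feed2": {"C": {"default": 50, "document": 30},
              "D": {"default": 25, "document": 10}},
}


def sort_feed(feed_id):
    """Order one feed's entries by their Default Ranking Domain rank."""
    entries = feeds[feed_id]
    return sorted(entries, key=lambda eid: entries[eid]["default"])


def sort_document():
    """Order every entry in the document by its Document Ranking Domain rank."""
    merged = {}
    for entries in feeds.values():
        merged.update(entries)
    return sorted(merged, key=lambda eid: merged[eid]["document"])


feed1_order = sort_feed("Feed1")      # ['B', 'A']
feed2_order = sort_feed("Feed2")      # ['D', 'C']
document_order = sort_document()      # ['D', 'A', 'C', 'B']
```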




urn:my_reviews
descending
descending


Movie A
3
4


Movie B
2
1



Notes:
* The i:order element tells the user agent whether higher or lower 
numbers are considered "better", "higher priority", "first", or 
whatever. In these cases, higher numbers are better, so would 
typically be shown first, so they're considered "descending" schemes.


Hmm.. I wanted to get away from doing this kind of thing.

* i:order/@label indicates a human readable label for the scheme, and 
could be optional.
* Since the urn:(netflix|amazon).com/reviews schemes are feed 
independent, it is not necessary to indicate a feed (or "domain") in 
this case.
* For a feed-specific scheme, like natural order, the feed ID would be 
included like this (so that if these entries were aggregated, it would 
be clear that the i:order elements were relevant to the source feed, 
not the aggregate feed):


The goal of @scheme is to identify the type of ranking to apply while 
the goal of @domain is to identify the scope of the ranking.  I do not 
believe that it is a good idea to conflate the two.


- James



Re: FYI: Updated Index draft

2005-09-22 Thread James M Snell


James Holderness wrote:



James M Snell wrote:

This could all get rather complicated very quickly. My primary 
objective is to address known use cases for ordered feeds (my netflix 
queue feed[1] for example), most of which are structured as complete 
datasets that are non-incremental in nature.



I realise that this sort of thing sounds like a good idea from a 
content provider's point of view, but as an aggregator developer, this 
is probably the last thing I would want to support. A feed that is not 
incremental is not a feed IMHO. There are just too many special case 
complications that an aggregator developer has to deal with that have 
nothing to do with regular, honest-to-goodness feeds.


I do believe this falls under the Not-All-Feeds-Should-Be-Aggregated 
Category.  That said, however, I think the concept of Feed-As-List is 
one that generally has a lot of support.


1. It helps us to scope the relevance of an i:rank element within an 
entry. For instance, if an entry with an i:rank in the urn:foo domain 
is aggregated into a synthetic feed that either a) does not specify a 
ranking domain or b) specifies a different ranking domain, consumers 
can safely ignore the urn:foo i:rank.


This kind of makes sense, but I'm not convinced it's necessary. If the 
feed has various ranks on which it can be sorted, I'd rather leave the 
decision on which one to use to the user. If, for whatever reason, 
those alternate domains are no longer applicable and the feed 
absolutely has to force the use of a particular domain, wouldn't it 
make more sense to filter out all those unused ranks rather than 
making the user download them?


i:domain is not used as a key for determining which rankings to use; it's 
a key that is used to correlate rankings.  Regarding filtering, we 
should not rely on aggregators filtering out unused ranks.  Consider the 
case of digitally signed entries; filtering out a rank covered by the 
digital signature would invalidate the signature.


2. It helps us to correlate ranks that span multiple feed documents. 
For instance, two separate feed documents may specify the same 
ranking domain.



This I like.

By the description given, it sounds as if the BBC ranking is more a 
ranking of relative importance than a ranking of natural order. That 
is, Top Story A has a higher importance than Top Story B, etc. If 
that is the case, a "priority" or "importance" ranking scheme can be 
used in conjunction with the atom:updated element.



This almost works. As an aggregator, what I would want to do is 
automatically sort with the date as the primary key and the priority 
as the secondary key. That way, today's high-priority items would 
appear at the top of the list, and yesterday's would follow on 
afterwards. Any of yesterday's items that were still of some 
importance today would need to have their atom:updated element set to 
today and their priority adjusted as appropriate.


There are a couple of problems though. The atom:updated element has to 
be identical for all items on a particular day. Also the atom:updated 
element can't be changed when an actual update occurs (say a spelling 
correction, or an update on a story) without breaking the ordering. 
The problem is we're abusing the atom:updated element so as to use it 
for something that it's not.


The updated elements would not need to be identical.  Aggregators can 
easily determine whether or not entries with different updated values 
occurred on the same day / same hour / etc.  In other words, I could sort 
by Day+Priority, Hour+Priority, Minute+Priority, whatever, without any 
difficulty.  There is no abuse of atom:updated here.


It would be better if we could add an extra attribute to your rank tag 
that specified what date the rank applied to. For someone like the BBC 
that reprioritizes feeds on a daily basis they'd set this attribute to 
something like say midnight for the date on which the ranking applies. 
If you have an item from a previous day that is still important today, 
it would keep its original atom:updated value, but the rank-date would 
be set to today.


I'll give this some thought, but my initial gut reaction is that it is 
not necessary.  Let me see if I can convince myself otherwise ;-)




Regards
James


Thanks for the input!

- James




Re: FYI: Updated Index draft

2005-09-22 Thread James Holderness


James M Snell wrote:
This could all get rather complicated very quickly. My primary objective 
is to address known use cases for ordered feeds (my netflix queue feed[1] 
for example), most of which are structured as complete datasets that are 
non-incremental in nature.


I realise that this sort of thing sounds like a good idea from a content 
provider's point of view, but as an aggregator developer, this is probably 
the last thing I would want to support. A feed that is not incremental is 
not a feed IMHO. There are just too many special case complications that an 
aggregator developer has to deal with that have nothing to do with regular, 
honest-to-goodness feeds.


Are you supposed to automatically delete old items? With or without the 
users' consent? Do you archive old items in some way? How do you handle the 
aggregation of items from multiple non-incremental feeds into a single feed? 
How do you handle the aggregation of items from multiple feeds some of which 
are incremental and some of which are complete datasets? How do you handle 
filtering that results in a subset of items from what is supposed to be a 
complete dataset?


That said, I suspect I'm fighting a losing battle, and I do like this 
proposal as it applies to ranking of feeds in general.


1. It helps us to scope the relevance of an i:rank element within an 
entry. For instance, if an entry with an i:rank in the urn:foo domain is 
aggregated into a synthetic feed that either a) does not specify a ranking 
domain or b) specifies a different ranking domain, consumers can safely 
ignore the urn:foo i:rank.


This kind of makes sense, but I'm not convinced it's necessary. If the feed 
has various ranks on which it can be sorted, I'd rather leave the decision 
on which one to use to the user. If, for whatever reason, those alternate 
domains are no longer applicable and the feed absolutely has to force the 
use of a particular domain, wouldn't it make more sense to filter out all 
those unused ranks rather than making the user download them?


2. It helps us to correlate ranks that span multiple feed documents. For 
instance, two separate feed documents may specify the same ranking domain.


This I like.

By the description given, it sounds as if the BBC ranking is more a 
ranking of relative importance than a ranking of natural order. That is, 
Top Story A has a higher importance than Top Story B, etc. If that is the 
case, a "priority" or "importance" ranking scheme can be used in 
conjunction with the atom:updated element.


This almost works. As an aggregator, what I would want to do is 
automatically sort with the date as the primary key and the priority as the 
secondary key. That way, today's high-priority items would appear at the top 
of the list, and yesterday's would follow on afterwards. Any of yesterday's 
items that were still of some importance today would need to have their 
atom:updated element set to today and their priority adjusted as 
appropriate.


There are a couple of problems though. The atom:updated element has to be 
identical for all items on a particular day. Also the atom:updated element 
can't be changed when an actual update occurs (say a spelling correction, or 
an update on a story) without breaking the ordering. The problem is we're 
abusing the atom:updated element so as to use it for something that it's 
not.


It would be better if we could add an extra attribute to your rank tag that 
specified what date the rank applied to. For someone like the BBC that 
reprioritizes feeds on a daily basis they'd set this attribute to something 
like say midnight for the date on which the ranking applies. If you have an 
item from a previous day that is still important today, it would keep its 
original atom:updated value, but the rank-date would be set to today.


An aggregator supporting this extension could then sort on the rank-date as 
the primary key (descending) and the rank value itself as the secondary key. 
For feeds that don't change their priorities over time you can just leave 
this attribute out and the aggregator can sort on the rank value alone. I 
don't think it overly complicates the interface, but it does add significant 
value IMO.
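
On the aggregator side, the two-key sort might look something like the sketch below. The dictionary keys are hypothetical, and treating a higher rank value as "better" is an assumption; a real scheme would have to say which direction applies.

```python
from datetime import datetime


def sort_ranked(entries):
    """Sort by rank-date (newest first), then by rank value (highest first).

    Entries without a rank-date fall back to datetime.min, so a feed that
    omits the attribute everywhere is ordered by rank value alone.
    """
    return sorted(
        entries,
        key=lambda e: (e.get("rank_date") or datetime.min, e["rank"]),
        reverse=True,
    )


entries = [
    # Older story that is still important today: rank-date moved to today.
    {"id": "old-but-important", "rank_date": datetime(2005, 9, 22), "rank": 70},
    {"id": "yesterday-top", "rank_date": datetime(2005, 9, 21), "rank": 95},
    {"id": "today-top", "rank_date": datetime(2005, 9, 22), "rank": 90},
]
ordered_ids = [e["id"] for e in sort_ranked(entries)]
```

Note that atom:updated never enters the sort key, so a spelling correction would not disturb the ordering.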


Regards
James



Re: FYI: Updated Index draft

2005-09-22 Thread Antone Roundy


On Wednesday, September 21, 2005, at 11:43  PM, James M Snell wrote:


{domain}
I was thinking yesterday of suggesting that feed/id be used the way 
you're using i:domain. Which is better is probably a matter of whether 
ranking domains that span multiple feeds will be useful or not. In the 
movie ratings use case presented below, perhaps rather than a 
fivestar scheme and netflix and amazon domains, it might make more 
sense to do this:



urn:my_reviews
descending
descending


Movie A
3
4


Movie B
2
1



Notes:
* The i:order element tells the user agent whether higher or lower 
numbers are considered "better", "higher priority", "first", or 
whatever. In these cases, higher numbers are better, so would 
typically be shown first, so they're considered "descending" schemes.
* i:order/@label indicates a human readable label for the scheme, and 
could be optional.
* Since the urn:(netflix|amazon).com/reviews schemes are feed 
independent, it is not necessary to indicate a feed (or "domain") in 
this case.
* For a feed-specific scheme, like natural order, the feed ID would be 
included like this (so that if these entries were aggregated, it would 
be clear that the i:order elements were relevant to the source feed, 
not the aggregate feed):



urn:my_feed
ascending

urn:my_feed/a
1


urn:my_feed/b
2



If sticking with i:domain, I'd recommend that you recommend that in 
cases where a ranking domain does not span multiple feeds, the feed/id 
value be used for the value of i:domain, and that in all cases, the 
same care be taken to (attempt to) ensure that i:domain's value is 
unique to what is intended to be a particular domain.




Re: FYI: Updated Index draft

2005-09-21 Thread James M Snell


This could all get rather complicated very quickly. My primary objective 
is to address known use cases for ordered feeds (my netflix queue 
feed[1] for example), most of which are structured as complete datasets 
that are non-incremental in nature. I'm not convinced that I necessarily 
want to try to solve all of the potential problem cases that could arise 
with ordered feeds that span a collection of multiple historical feeds, 
etc. Also, I am not wishing to duplicate what Microsoft has done with 
their simple list extensions. So with that in mind, I still wish to try 
and address the issues that have been raised so here's what I have so far:


[1] http://rss.netflix.com/QueueRSS?id=P5365369447081104293883231608616881


{domain}

{nonNegativeInteger}



I drop the feed level i:ranking element and introduce a new i:domain 
element that identifies a "ranking domain" that this feed is a part of.


The i:rank element is used to specify the nonNegativeInteger rank for 
the given {scheme} for the containing element. The {domain} attribute is 
used to scope the i:rank to a specific ranking domain -- for instance, 
the priority ranking is only relevant if the entry is contained in a 
feed with a corresponding i:domain element.


The lack of an i:domain element indicates the "default ranking domain". 
Any i:rank elements that do not specify a domain attribute are 
considered to be part of the default ranking domain.


For instance, in the following example, only the first i:rank is 
relevant within the given feed. Neither of the urn:bar i:rank elements 
is relevant within this particular feed example.


tag:example.com,2005:/feed
urn:foo

tag:example.com,2005:/feed/1
1
2


tag:example.com,2005:/feed/2
2
1



Domain urn:foo ranking: tag:example.com,2005:/feed/1 then 
tag:example.com,2005:/feed/2
Domain urn:bar ranking: tag:example.com,2005:/feed/2 then 
tag:example.com,2005:/feed/1
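
A small sketch of how a consumer might apply the domain scoping in this example. The ids, domains, and rank values are taken from the example; the dictionary layout, the one-rank-per-domain simplification (the scheme is ignored here), and the function name are assumptions.

```python
feed = {
    "id": "tag:example.com,2005:/feed",
    "domain": "urn:foo",
    "entries": [
        {"id": "tag:example.com,2005:/feed/1",
         "ranks": {"urn:foo": 1, "urn:bar": 2}},
        {"id": "tag:example.com,2005:/feed/2",
         "ranks": {"urn:foo": 2, "urn:bar": 1}},
    ],
}


def order_by_domain(entries, domain):
    """Return entry ids ordered by their rank in `domain`, lowest first.

    Entries carrying no rank for that domain are left out, and ranks in
    other domains are ignored.
    """
    ranked = [e for e in entries if domain in e["ranks"]]
    ranked.sort(key=lambda e: e["ranks"][domain])
    return [e["id"] for e in ranked]


# Within this feed only the urn:foo ranks are relevant.
in_feed_order = order_by_domain(feed["entries"], feed["domain"])
# The urn:bar ranks only matter in a feed that declares urn:bar.
bar_order = order_by_domain(feed["entries"], "urn:bar")
```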


The domain element serves multiple purposes.

1. It helps us to scope the relevance of an i:rank element within an 
entry. For instance, if an entry with an i:rank in the urn:foo domain is 
aggregated into a synthetic feed that either a) does not specify a 
ranking domain or b) specifies a different ranking domain, consumers can 
safely ignore the urn:foo i:rank.


2. It helps us to correlate ranks that span multiple feed documents. For 
instance, two separate feed documents may specify the same ranking domain.


No-Rank Entries: no-rank entries are marked by the absence of an i:rank 
element corresponding to a given scheme. For instance, in the following 
example, entry "C" is a No-Rank Entry in the Index scheme, but is ranked 
in the Priority Scheme.




A
1
10


B
2
50


C
20
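
One way a consumer might treat No-Rank Entries, as a sketch: the entry names and values follow the example (with the first number assumed to be the index rank and the second the priority rank), while the scheme keys and the function name are hypothetical. What to do with the unranked entries is left to the application.

```python
entries = {
    "A": {"index": 1, "priority": 10},
    "B": {"index": 2, "priority": 50},
    "C": {"priority": 20},            # no rank in the index scheme
}


def split_by_scheme(entries, scheme):
    """Return (ranked ids in ascending rank order, unranked ids)."""
    ranked = sorted(
        (eid for eid, ranks in entries.items() if scheme in ranks),
        key=lambda eid: entries[eid][scheme],
    )
    unranked = [eid for eid, ranks in entries.items() if scheme not in ranks]
    return ranked, unranked


index_ranked, index_unranked = split_by_scheme(entries, "index")
priority_ranked, priority_unranked = split_by_scheme(entries, "priority")
```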



Re: Eric's Question: "How does this help (eg) bbc.co.uk order their news 
items in some sensible manner?"


By the description given, it sounds as if the BBC ranking is more a 
ranking of relative importance than a ranking of natural order. That is, 
Top Story A has a higher importance than Top Story B, etc. If that is 
the case, a "priority" or "importance" ranking scheme can be used in 
conjunction with the atom:updated element.




top-story-A
2005-12-12T12:00:00Z
90


top-story-B
2005-12-12T12:00:00Z
80


top-story-C
2005-12-11T12:00:00Z
90


top-story-D
2005-12-11T12:00:00Z
80



In this example, top-story-A is ranked as the highest priority entry on 
Dec 12, 2005, while top-story-C is ranked as the highest priority entry 
on Dec 11, 2005.
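
One way a consumer might combine the two values is sketched below in Python. It assumes entries are grouped by the calendar day of atom:updated (newest day first) and then ordered by priority within each day (highest first). The grouping rule, the tuple layout and the function name are assumptions for illustration only.

```python
from datetime import datetime

# (id, atom:updated, priority) taken from the example above, deliberately shuffled.
entries = [
    ("top-story-C", "2005-12-11T12:00:00Z", 90),
    ("top-story-B", "2005-12-12T12:00:00Z", 80),
    ("top-story-D", "2005-12-11T12:00:00Z", 80),
    ("top-story-A", "2005-12-12T12:00:00Z", 90),
]


def display_order(entries):
    """Sort by day of atom:updated (newest first), then by priority (highest first)."""
    def key(entry):
        _, updated, priority = entry
        day = datetime.strptime(updated, "%Y-%m-%dT%H:%M:%SZ").date()
        return (day, priority)

    return [entry[0] for entry in sorted(entries, key=key, reverse=True)]


print(display_order(entries))
```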


Re: Eric's Question: "What happens when entries "fall off the bottom" 
... do their rankings expire?"


It will be entirely dependent on the scheme. In a priority ranking 
scheme (measuring the relative importance of an entry), having an entry 
"fall off the bottom" would have no effect on the overall 
ordering/ranking of the feed. In a natural order ranking scheme (indexed 
position), having an entry "fall off the bottom" would likely mean that 
the entry is no longer a part of the ordered list or is no longer 
relevant to the rankings.


Re: Thomas Broyer's suggestions:

>1. get rid of your i:rank, users will use any extension element instead
> (no more registry and you can still define "standard" priority and index
> extensions)


I considered this but a single extensible rank element fits most of the 
simple use cases for this rather well. That said, allowing for specific 
ranking elements would be helpful, so how about a bit of a compromise?


rankingCommonAttributes =
attribute i:scheme { IRI },
attribute i:domain { IRI }?

rankingConstruct =
rankingCommonAttributes

integerRank = element i:rank {
rankingConstruct (nonNegativeInteger)
}

With this approach, i:rank is defined as the standard nonNegativeInteger 
ranking element. If I so desired, however, I could easily define new 
rankingConstructs. For instance:


importanceRank = element x:importance {
rankingConstruct ('critical' | 'high' | 'medium' | 'low' | 'info')
}

high


>2. get rid of your @order attribute: users should be able to choose in 
which

Re: FYI: Updated Index draft

2005-09-21 Thread James Holderness


I had considered something along those lines, but it seemed to me to be a 
bit vague. I suspect it would produce adequate results in the majority of 
cases, but I'd prefer something that gave the content provider finer 
control. I like the idea of being able to say exactly where in a feed an 
item should be positioned. Then again I'm not a content provider so maybe 
that's not the sort of thing they're looking for.


Eric Scheid wrote:

thinking more ... I think the way to handle this is that the client
application could weight the ranking with the age of the item, and thus a
rank#1 item would appear near the top of the list, and then slowly drop
away.

You also get to know what the original ranking for an item is.




Re: FYI: Updated Index draft

2005-09-21 Thread Eric Scheid

On 21/9/05 9:35 PM, "James Holderness" <[EMAIL PROTECTED]> wrote:

> Marking entries as having no rank sounds like a nice idea, but I don't think
> it's feasible in the long run.

thinking more ... I think the way to handle this is that the client
application could weight the ranking with the age of the item, and thus a
rank#1 item would appear near the top of the list, and then slowly drop
away.

You also get to know what the original ranking for an item is.
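
A rough Python sketch of what such age weighting could look like on the client side. The linear formula, the decay_per_day parameter and the sample items are all assumptions made up for illustration; nothing in the draft specifies them.

```python
def effective_rank(rank: int, age_days: float, decay_per_day: float = 1.0) -> float:
    """Push an item down the list as it ages; lower values sort first."""
    return rank + age_days * decay_per_day


# (id, published rank, age in days) -- hypothetical sample data.
items = [
    ("old-top", 1, 5.0),
    ("new-second", 2, 0.0),
    ("new-fifth", 5, 0.5),
]

ordered = sorted(items, key=lambda item: effective_rank(item[1], item[2]))
print([item_id for item_id, _, _ in ordered])
```

The published rank is left untouched, so the client still knows what the original ranking for each item was.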

e.



Re: FYI: Updated Index draft

2005-09-21 Thread James Holderness


Marking entries as having no rank sounds like a nice idea, but I don't think 
it's feasible in the long run. In order to erase ranking effectively from 
previous entries, the content provider needs to double their feed size 
potentially. And if a user misses out on a "rank update" they could end up 
with news items from the distant past sitting at the top of their ranking 
forever. Admittedly you already have the problem of losing items that have 
fallen off the bottom of a feed, but at least the feed remains readable. 
With a corrupted ranking system, the ranking effectively becomes useless.


One possible solution may be the use of a rank-offset tag. Let's say you 
have three items A, B and C with A being the most important (rank 1) and C 
being the least important (rank 3). You start with a rank-offset of 0 and 
your feed looks like this:


A:1 B:2 C:3 (rank-offset = 0)

Now say you want to add a new item D that falls between A and B, but you 
still only want to include 3 items in your feed. You increment the 
rank-offset to 1 and reconstruct the feed with new ranks, which now look 
like this:


A:1 D:2 B:3 (rank-offset = 1)

When the client receives that feed, it automatically subtracts the 
rank-offset from each item's rank value before adding them to its database. 
So internally its list of items now looks like this:


A:0 D:1 B:2 C:3

A's rank has been updated. D has been inserted. B and C remain unchanged.

From the client's point of view the ranking numbers will start going 
negative almost immediately, but as long as you treat the lowest (signed) 
value as having the highest priority it shouldn't be a problem. And the 
rankings that actually appear in the feed are always nice small positive 
integers. The rank-offset will get large over time (and I don't see how it 
can be reset), but that's just one tag.
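
The client-side bookkeeping described above can be sketched in Python as follows. The dictionary store and the function name are assumptions for illustration; the arithmetic is the one from the A/B/C/D example.

```python
def apply_feed(store: dict, feed_ranks: dict, rank_offset: int) -> dict:
    """Subtract the feed's rank-offset from each rank and merge into the store."""
    for item, rank in feed_ranks.items():
        store[item] = rank - rank_offset
    return store


store = {}
apply_feed(store, {"A": 1, "B": 2, "C": 3}, rank_offset=0)  # first poll
apply_feed(store, {"A": 1, "D": 2, "B": 3}, rank_offset=1)  # second poll

# The lowest (signed) value has the highest priority.
ordered = sorted(store, key=lambda item: store[item])
print(store)
print(ordered)
```

C is no longer in the second feed document, yet it keeps its stored value of 3 and still sorts after B.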


Eric Scheid wrote:
The only way out of this conundrum is that bbc.co.uk will have to update the
original #1 and #2 stories and re-rank them as much lower. If they re-rank
them as #46 and #47 then they will need to re-rank any previous entry at
those ranks to lower positions, and similarly for any other entries with
ranks which get pushed down. Eventually the entire history of the feed needs
to be re-ranked.

Unless entries can be marked as having no rank. Can they?




Re: FYI: Updated Index draft

2005-09-21 Thread Eric Scheid

On 21/9/05 1:05 PM, "James M Snell" <[EMAIL PROTECTED]> wrote:

> The ranking is part of the entry metadata.  If an entry falls off the
> feed, there is no effect on the ranking metadata.  With partial feed
> retrieval, ordering could be performed over the entire set of entries.

How does this help (eg) bbc.co.uk order their news items in some sensible
manner?

Today, they have a couple of important stories, they indicate those entries
are rank #1, #2. Tomorrow, they have more news, but not more important than
yesterday's big news. The day after they have a new big story, it should be
rank #1. The #1 and #2 stories from two days ago have fallen off the bottom
of the feed. 

The only way out of this conundrum is that bbc.co.uk will have to update the
original #1 and #2 stories and re-rank them as much lower. If they re-rank
them as #46 and #47 then they will need to re-rank any previous entry at
those ranks to lower positions, and similarly for any other entries with
ranks which get pushed down. Eventually the entire history of the feed needs
to be re-ranked.

Unless entries can be marked as having no rank. Can they?

e.



Re: FYI: Updated Index draft

2005-09-20 Thread James M Snell


Eric Scheid wrote:


On 21/9/05 5:18 AM, "James M Snell" <[EMAIL PROTECTED]> wrote:

 


For instance

 
   ...
   10
 
 
   ...
   5
 

   



What happens when entries "fall off the bottom" ... do their rankings
expire? How does that work with the diff+Feed method of partial feed
retrieval?

e.

 

The ranking is part of the entry metadata.  If an entry falls off the 
feed, there is no effect on the ranking metadata.  With partial feed 
retrieval, ordering could be performed over the entire set of entries.


That is:
 
 
   feed 2
   
 A
 1
   
   
 B
 3
   
 


 
   
 C
 2
   
   
 D
 4
   
 

The order for feed 1 is: A, B
The order for feed 2 is: C, D
Full Reconstructed order: A, C, B, D
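
The same reconstruction as a small Python sketch. The dictionaries stand in for the rank metadata carried by each entry; their names and layout are assumptions for illustration.

```python
# Rank metadata carried by the entries of each feed document.
feed_1 = {"A": 1, "B": 3}
feed_2 = {"C": 2, "D": 4}


def order(ranks: dict) -> list:
    """Order entry names by ascending rank."""
    return sorted(ranks, key=ranks.get)


# Because the rank travels with the entry, merging is just a union.
merged = {**feed_1, **feed_2}

print(order(feed_1))
print(order(feed_2))
print(order(merged))
```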

- James



Re: FYI: Updated Index draft

2005-09-20 Thread Eric Scheid

On 21/9/05 5:18 AM, "James M Snell" <[EMAIL PROTECTED]> wrote:

> For instance
> 
>   
> ...
> 10
>   
>   
> ...
> 5
>   
> 

What happens when entries "fall off the bottom" ... do their rankings
expire? How does that work with the diff+Feed method of partial feed
retrieval?

e.



Re: FYI: Updated Index draft

2005-09-20 Thread Thomas Broyer


James M Snell wrote:

Complete example

...
priority
index
order="descending">http://www.example.com/ranking/foo


...
C
10
3
http://www.example.com/ranking/foo";>30



[…]

Thoughts?
It looks more and more like Microsoft's RSS simple list extension [1], 
and I think they had the good approach (define sorts on the feed 
metadata, based on extension element values at the entry level) but a 
bad technical solution (use the extension element in a different 
context: when in cf:sort, it has a non-namespaced data-type attribute 
and its content is a "label" string, while in an entry it might not have 
attributes and its value should be of type specified by the @data-type 
attribute seen before).


Suggestions:
1. get rid of your i:rank, users will use any extension element instead 
(no more registry and you can still define "standard" priority and index 
extensions)
2. get rid of your @order attribute: users should be able to choose in 
which order they want their entries: best-ranked to least-ranked "top to 
bottom" or "bottom to top". It's the responsibility of the producer to 
provide labels and values that will be well understood by users (e.g. 
not saying "stars" and ranking from 1 (best rank) down to 5: "stars" 
implies "number of stars", so "sort by stars in ascending order" implies 
"the higher the value, the better it is", which is not what's behind 
1=best-rank…)
3. make the content of i:ranking a user-understandable label
4. (optional) add a data-type attribute to i:ranking (maybe rename that 
one to something related to sorting, not ranking)
5. use @namespace and @localname attributes on i:ranking to describe the 
element in entries the sort applies to (using those attributes avoids 
QNames in attribute values, which don't work well with prefix changes)



http://example.com/user-review"; 
localname="stars">User-reviews stars

…

http://example.com/user-review";>1
…

This, however, doesn't match "index" in the draft title any more.

What could be even better, though a lot less "simple" (and not feasible, 
see below), would be to use XPath or XPointer (XPointer has the 
advantage that you define namespace prefix bindings "inside" it, using 
the xmlns() XPointer scheme). That way, you could use any 
element/subelement and/or attribute as the value holder for the sort. 
This would, however, require an XPath/XPointer engine, as well as storing 
the XML DOM, or mapping XPath/XPointer to your internal feed 
representation; this is not feasible.


[1] 
http://msdn.microsoft.com/windowsvista/building/rss/simplefeedextensions/


--
Thomas Broyer




Re: FYI: Updated Index draft

2005-09-20 Thread James M Snell


Eric Scheid wrote:


On 15/9/05 6:06 AM, "David Powell" <[EMAIL PROTECTED]> wrote:

 


Eg - An Atom library or server that doesn't know about this extension
is free to not preserve the entry order, and yet to retain the
fi:ordered element, even though this will have corrupted the data.
   



very good point.

 

Indeed.  And it is a point that signals a show stopper for the approach 
taken in the draft.  As an alternative, I'm considering an approach that 
places order metadata within the entry.


For instance
 
   
 ...
 10
   
   
 ...
 5
   
   
 ...
 15
   
 

The i:rank element is a non-negative integer.  Consumers of the feed may 
use the rank as a key for sorting entries.  Because different rankings 
may be relevant in different domains, the i:rank element will support a 
scheme attribute whose value is either a name or an IRI identifying a 
ranking scheme.  The built-in schemes are "priority" and "index".  
"priority" is used to rank the relative importance of the entry (the 
higher the value, the higher the importance).  "index" is used to 
specify a natural order for entries.  New scheme names can be 
standardized through IANA registration.  IRI values can be used to 
identify non-standardized schemes.  If the scheme attribute is missing, 
the value is assumed to be "index".


For instance
 
   
 ...
 10
 1
 http://www.example.com/ranking/foo";>100
   
   
 ...
 5
 2
 http://www.example.com/ranking/foo";>50
   
   
 ...
 0
 3
 http://www.example.com/ranking/foo";>30
   
 

On the feed level, metadata can be specified to indicate which ranking 
schemes are intended to be applied to the entries in the feed.  These 
are generally informative and do not rely on the actual order of the entries.


For instance
 
   ...
   priority
   index
   order="descending">http://www.example.com/ranking/foo

 

The value of the i:ranking element is the name or IRI of a ranking 
scheme.  The default attribute (value 'yes' or 'no') is used to indicate 
which ranking scheme should be considered the default.  Only one 
i:ranking with @default="yes" is allowed within a feed. The order 
attribute (value 'ascending' or 'descending') is used to indicate the 
default sort order for that ranking scheme. 

If a feed contains a particular i:ranking scheme, its entries SHOULD 
contain corresponding i:rank elements. Entry elements that do not have 
i:rank elements for a particular scheme must be presented at the end of 
the presentation list (see the example below).


Complete example
 
   ...
   priority
   index
   order="descending">http://www.example.com/ranking/foo

   
 ...
 C
 10
 3
 http://www.example.com/ranking/foo";>30
   
   
 ...
 A
 5
 1
 http://www.example.com/ranking/foo";>50
   
   
 ...
 B
 0
 2
 http://www.example.com/ranking/foo";>100
   
   
 ...
 D
   
 

Ordered descending by priority: C, A, B, D
Ordered ascending by index: A, B, C, D
Ordered descending by http://www.example.com/ranking/foo: B, A, C, D
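
The three orderings can be reproduced with a short Python sketch. The dictionary layout and function name are assumptions for illustration; the rank values are the ones from the complete example, and entries without a rank for the chosen scheme are placed at the end, as described above.

```python
FOO = "http://www.example.com/ranking/foo"

# Entries in feed order; D carries no i:rank at all.
entries = {
    "C": {"priority": 10, "index": 3, FOO: 30},
    "A": {"priority": 5, "index": 1, FOO: 50},
    "B": {"priority": 0, "index": 2, FOO: 100},
    "D": {},
}


def sort_by_scheme(entries: dict, scheme: str, descending: bool = False) -> list:
    """Sort ranked entries by the given scheme; unranked entries go last."""
    ranked = [(ranks[scheme], name) for name, ranks in entries.items() if scheme in ranks]
    unranked = [name for name, ranks in entries.items() if scheme not in ranks]
    ranked.sort(key=lambda pair: pair[0], reverse=descending)
    return [name for _, name in ranked] + unranked


print(sort_by_scheme(entries, "priority", descending=True))
print(sort_by_scheme(entries, "index"))
print(sort_by_scheme(entries, FOO, descending=True))
```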

Thoughts?

- James



Re: FYI: Updated Index draft

2005-09-15 Thread Eric Scheid

On 15/9/05 6:06 AM, "David Powell" <[EMAIL PROTECTED]> wrote:

> Eg - An Atom library or server that doesn't know about this extension
> is free to not preserve the entry order, and yet to retain the
> fi:ordered element, even though this will have corrupted the data.

very good point.

e.



Re: FYI: Updated Index draft

2005-09-14 Thread James M Snell


David,

Excellent comments. 


David Powell wrote:


How will this interact with the sliding-window/feed-history
interpretation of feeds? The natural order assigned by this extension
seems incompatible with the implied date order that would be implied
by two feed documents, polled over some period of time.

What should be the order of a merged feed history such as this:

Poll 1:
feed(e1, e2, e3)

Poll 2:
feed(e3, e1, e5)

- where, perhaps, 3 and 1 have been updated. How do you combine
entries sorted by their natural order, with the time-ordered feed
history?

 

Natural ordering and time-ordering are, by their very nature, opposing 
views -- unless, of course, the natural ordering and time-ordering just 
happen to coincide with one another (by chance or design).

Using the terminology from Mark Nottingham's Feed History extension, 
naturally ordered feeds are likely to also be non-incremental feeds.  
For instance, my NetFlix.com Queue Feed is clearly intended to be an 
ordered, non-incremental feed.  The feed presents its entire state.  
There is no history.  The ordering of the items in the feed is 
significant.

I believe that it is safe to assert that ordered feeds should be 
presumed to be non-incremental.  I will hold off on making that a 
normative assertion, however, simply because there is no evidence 
that natural ordering *can't* be preserved across multiple feeds.



How will this interact with entry documents, eg over pubsub.

What about Atom Protocol - I can't imagine how I would publish a feed
with a given natural order. For something like the BBC feeds, some
sort of arbitrary "score" field might be more interoperable with both
entry documents, Atom protocol, and feed history.

 

This is definitely something I've been thinking about -- that is, how to 
use the Atom protocol to edit an ordered collection.  Without the 
introduction of a specific metadata field within the entry itself, the 
only potential option is a pub:control parameter that specifies the 
ordering index for the entry.  At this time I simply do not yet know if 
that is the right approach. I'll need to experiment a bit more.



I'm probably on my own, but I expected Atom's statement that "This
specification assigns no significance to the order of atom:entry
elements within the feed" was non-negotiable and couldn't be changed
by extensions. This seems more like potential Atom 1.1 material to me
- it doesn't seem to layer on top of the Atom framework so much as
slightly rewrite part of it.

 

As far as feed processing is concerned I agree that the ordering of 
atom:entry elements is not significant, even if the feed does contain 
fi:ordered.  The ordered extension is a flag that helps applications 
interpret the intention of the feed.  For example, there is a clear 
distinction of intent between my weblogs feed and my NetFlix Queue 
feed.  While both can be treated the same under the covers, having some 
sort of clue as to how the two should be presented to the user is helpful.



Eg - An Atom library or server that doesn't know about this extension
is free to not preserve the entry order, and yet to retain the
fi:ordered element, even though this will have corrupted the data.

 


Agreed that this is a valid issue. Let me stew over this one.


I think that as implemented, this extension wouldn't be safe to deploy
without must-understand extensions, which Atom 1.0 doesn't support.


Ordered feeds are a useful problem though. Indexes or scores on
entries might work better with entry documents, the protocol, and with
the Atom extension framework, but it still isn't clear how they would
interact with the sliding window.

 

Nor is it clear how it could work in aggregation scenarios.  e.g. what 
happens if the entry contains an index and is aggregated into a feed 
that has another entry with a conflicting index?



A couple more minor points:

I'm not sure whether the descending/ascending attribute is necessary?
Given that the extension just presents a natural order (by some
unnamed ordering), why would anyone go to the trouble of presenting
the entries in reversed order, and then label them as descending; why
not just present them in ascending order to begin with?

 

Agreed. I actually had the same thought the other day and had a "Boy, 
that was silly" head-slap moment.



Would it be useful for the extension to allow the natural ordering to
be named? - so if the ordering is by "Importance", or "Order of
real-life events", or something else, then it could be labelled with a
URI and/or label, so that people don't have to guess the significance
of the natural order.

 

Interesting thought.  Correct me if I'm wrong, but this would look 
something like:


 http://www.example.com/ordered/by/priority

With each entry having something like a corresponding priority element 
(just an example)


 70

Or

 http://www.example.com/ordered/by/position

Or ... whatever else

The bottom line would be that the URI value of the ordered element would 
indicate 

Re: FYI: Updated Index draft

2005-09-14 Thread David Powell


Monday, September 12, 2005, 5:55:20 PM, James M Snell wrote:

> I've updated the draft that defines an extension that can be used to 
> indicate that the order of entries within a Feed should be considered 
> significant.

How will this interact with the sliding-window/feed-history
interpretation of feeds? The natural order assigned by this extension
seems incompatible with the implied date order that would be implied
by two feed documents, polled over some period of time.

What should be the order of a merged feed history such as this:

Poll 1:
feed(e1, e2, e3)

Poll 2:
feed(e3, e1, e5)

- where, perhaps, 3 and 1 have been updated. How do you combine
entries sorted by their natural order, with the time-ordered feed
history?


How will this interact with entry documents, eg over pubsub.

What about Atom Protocol - I can't imagine how I would publish a feed
with a given natural order. For something like the BBC feeds, some
sort of arbitrary "score" field might be more interoperable with both
entry documents, Atom protocol, and feed history.


I'm probably on my own, but I expected Atom's statement that "This
specification assigns no significance to the order of atom:entry
elements within the feed" was non-negotiable and couldn't be changed
by extensions. This seems more like potential Atom 1.1 material to me
- it doesn't seem to layer on top of the Atom framework so much as
slightly rewrite part of it.

Eg - An Atom library or server that doesn't know about this extension
is free to not preserve the entry order, and yet to retain the
fi:ordered element, even though this will have corrupted the data.

I think that as implemented, this extension wouldn't be safe to deploy
without must-understand extensions, which Atom 1.0 doesn't support.


Ordered feeds are a useful problem though. Indexes or scores on
entries might work better with entry documents, the protocol, and with
the Atom extension framework, but it still isn't clear how they would
interact with the sliding window.


A couple more minor points:

I'm not sure whether the descending/ascending attribute is necessary?
Given that the extension just presents a natural order (by some
unnamed ordering), why would anyone go to the trouble of presenting
the entries in reversed order, and then label them as descending; why
not just present them in ascending order to begin with?

Would it be useful for the extension to allow the natural ordering to
be named? - so if the ordering is by "Importance", or "Order of
real-life events", or something else, then it could be labelled with a
URI and/or label, so that people don't have to guess the significance
of the natural order.

-- 
Dave



FYI: Updated Index draft

2005-09-12 Thread James M Snell


I've updated the draft that defines an extension that can be used to 
indicate that the order of entries within a Feed should be considered 
significant.


http://www.ietf.org/internet-drafts/draft-snell-atompub-feed-index-02.txt

Example,

http://www.w3.org/2005/Atom";
 xmlns:fi="http://purl.org/syndication/index/1.0";>
 ...
 
 
   tag:entry:1
   ...
 
 
   tag:entry:2
   ...
 
 
   tag:entry:3
   ...
 



The fi:ordered element indicates that the order of the entries as 
presented in the feed should be considered to be significant.  The @sort 
attribute indicates the default sort order for those entries.  A value 
of "descending" indicates that the entries should be presented 
last-to-first.  A value of "ascending" indicates that the entries should 
be presented first-to-last.



- James


