Re: Structured Publishing -- Joe Reger shows the way...

2005-09-22 Thread Kevin Marks



On Sep 21, 2005, at 11:36 AM, Danny Ayers wrote:

On 9/12/05, Bob Wyman <[EMAIL PROTECTED]> wrote:

I believe it doesn't make sense for us to add data-carrying elements
to Atom other than atom:content or atom:summary. Atom provides a
definition of a collection of entries and it provides the entry format.
Frankly, it should stop there. The data payload should be carried in the
content element.


I believe the ability to include data outside the content is likely to
be useful, and may even be essential in some republishing scenarios
where additional metadata about the payload is required. But that's
not to say the transport-only view of Atom doesn't offer big
advantages in the Structured Blogging kind of scenario where the data
can be neatly packaged, relatively opaquely to the rest of the entry
data. (Atom as SOAP lite?)


I agree with Bob rather than Danny, except that I'd advocate making the 
metadata part of the XHTML content. Using Atom as a rich envelope in 
this way combines very well with the Microformat approach of retaining 
structure in XHTML.


For the example of lists of information, given earlier, the XOXO 
microformat is ideal, as it can degrade gracefully for all viewers. 
Microformat aware viewers can extract the structure, HTML viewers can 
display it in a clear human readable form, and even plain-text viewers 
(assuming they have enough nous to strip stuff between <>) will have 
the core content.


http://microformats.org
http://microformats.org/wiki/xoxo



Re: FYI: Updated Index draft

2005-09-22 Thread Mark Nottingham



On 14/09/2005, at 1:06 PM, David Powell wrote:


How will this interact with the sliding-window/feed-history
interpretation of feeds? The natural order assigned by this extension
seems incompatible with the date order implied by two feed documents
polled over some period of time.

What should be the order of a merged feed history such as this:

Poll 1:
feed(e1, e2, e3)

Poll 2:
feed(e3, e1, e5)

- where, perhaps, 3 and 1 have been updated. How do you combine
entries sorted by their natural order, with the time-ordered feed
history?


There'd need to be an algorithm described for combining the feed
documents; e.g., see the _combine() method in
http://www.mnot.net/rss/history/feed_history.py. In practice, most/all(?)
popular aggregators do this now (feed history + natural order); the only
change is that the algorithm would be documented and well-understood
(which IMO would be a vast improvement, *if* we can agree on one... or more).


With the rank approach, you'd probably need to say that the ranks  
were valid within the scope of a single feed document, and then  
describe the relations between ranks in different feed documents. Not  
sure that's as interesting.
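
One possible merge policy for David's two-poll example can be sketched as follows. This is only an illustration under my own assumptions, not the _combine() method from feed_history.py: the function name combine_polls is hypothetical, entries are represented by bare IDs, and the policy (entries in the latest poll come first in that poll's order, entries known only from earlier polls follow in their stored order) is one choice among several.

```python
def combine_polls(history, latest):
    """Merge a newly polled feed document into the stored history.

    Assumed policy (not taken from any spec): entries present in the
    latest poll come first, in the order the latest document lists them;
    entries known only from earlier polls follow in their stored order.
    """
    seen = set(latest)
    older = [entry_id for entry_id in history if entry_id not in seen]
    return list(latest) + older


history = []
for poll in (["e1", "e2", "e3"], ["e3", "e1", "e5"]):
    history = combine_polls(history, poll)

print(history)  # ['e3', 'e1', 'e5', 'e2']
```

Under this policy e2, which dropped out of the second poll, is kept but sorts after everything the publisher currently lists. A different policy could just as well interleave by atom:updated.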


--
Mark Nottingham http://www.mnot.net/



Re: FYI: Updated Index draft

2005-09-22 Thread Mark Nottingham



On 14/09/2005, at 1:06 PM, David Powell wrote:


I'm probably on my own, but I expected Atom's statement that "This
specification assigns no significance to the order of atom:entry
elements within the feed" was non-negotiable and couldn't be changed
by extensions. This seems more like potential Atom 1.1 material to me
- it doesn't seem to layer on top of the Atom framework so much as
slightly rewrite part of it.


Strictly read, this doesn't preclude other specifications /  
extensions from adding semantics to the ordering of entries -- it  
only says that *this* spec doesn't assign any meaning to it. That was  
the intent as I recall it.



Eg - An Atom library or server that doesn't know about this extension
is free to not preserve the entry order, and yet to retain the
 element, even though this will have corrupted the data.


That is indeed a problem. Probably the easiest way to fix this would  
be in errata, by adding a statement like "Some feeds may implicitly  
or explicitly (through extensions) have meaning assigned to the  
ordering of entries, so intermediaries SHOULD NOT reorder them."



I think that as implemented, this extension wouldn't be safe to deploy
without must-understand extensions, which Atom 1.0 doesn't support.


That would be another way to go, but people didn't want mU.

Cheers,

--
Mark Nottingham http://www.mnot.net/



Re: FYI: Updated Index draft

2005-09-22 Thread Antone Roundy


On Thursday, September 22, 2005, at 10:20  AM, James M Snell wrote:

Antone Roundy wrote:
I was thinking yesterday of suggesting that feed/id be used the way 
you're using i:domain. Which is better is probably a matter of 
whether ranking domains that span multiple feeds will be useful or 
not. In the movie ratings use case presented below, perhaps rather 
than a fivestar scheme and netflix and amazon domains, it might 
make more sense to do this:


Using atom:id as the ranking domain would limit the ranking to a 
single feed which is useful, but does not cover the full range of 
cases.

...

Yes, there are two special cases here:

1. Lack of a i:domain
2. i:domain value that is a same document reference


I think a ranking without a domain is pretty much useless--or at least 
likely to lead to problems downstream--so that case doesn't need to be 
covered.  More on that below.



 
   ...
   
 
   Feed1
   # 
   
 A
 50
 20
   
   
 B
 25
 40
   
 
 
   Feed2
   # 
   
 C
 50
 30
   
   
 D
 25
 10
   
 
   
 


In this example, the domainless rankings were added when the XHTML 
document was created, right?  So the XHTML document is essentially an 
aggregate feed, just not in Atom format.  Would it not make as much or 
more sense to mint an ID for the document (call it the ID of a "virtual 
Atom Feed Document" if you don't actually create an aggregate feed) and 
use it to scope those i:rank elements?  If, somehow, someone were to 
pull the atom:feeds out of the XHTML document (if atom:feed getting 
embedded into xhtml:body is going to happen, then is not atom:feed 
getting extracted from xhtml:body also likely?) and aggregate them with 
other feeds with domainless i:rank elements, the scopes of those 
elements would get mixed.


* Since the urn:(netflix|amazon).com/reviews schemes are feed 
independent, it is not necessary to indicate a feed (or "domain") in 
this case.
* For a feed-specific scheme, like natural order, the feed ID would 
be included like this (so that if these entries were aggregated, it 
would be clear that the i:order elements were relevant to the source 
feed, not the aggregate feed):


The goal of @scheme is to identify the type of ranking to apply while 
the goal of @domain is to identify the scope of the ranking.  I do not 
believe that it is a good idea to conflate the two.


Okay, I've come to agree with that while writing and editing this 
message.  Note however that "fivestar" also indicates multiple things:


1) Higher numbers are "better"
2) The range is 0 to 5 (BTW, if this is limited to integers, how will 
you handle things like 3.5 stars, which are common in that type of 
rating system? Maybe decimal values need to be allowed.)

3) Hint: you might want to display the value as stars

#1 is the only one needed for sorting of entries. #2 would be useful if 
the feed reader wanted to display some sort of graphical element to 
indicate the ranking. #3 might be slightly useful, but except for the 
most popular schemes, would probably be ignored. Perhaps all of these 
should be separated, a la:



...

3

...where @domain is the feed/id of the feed if there's just one feed in 
scope, or a value that won't be duplicated by any feed/id otherwise (if 
one can mint a unique feed id, surely one can also mint a unique id 
that won't be used for a feed).


I'd suggest that i:ranking-scheme/@domain either default to the 
containing feed/id (or the one from atom:source, if it exists) or be 
required, i:rank/@domain be required, @order default to ascending, 
@min-value default to 0, and the rest of the attributes be optional 
with no defaults.
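
The defaulting rules suggested above could be applied roughly like this. It is a sketch under assumptions: resolve_scheme is a hypothetical helper, attributes are modelled as a plain dict of strings, and only the three defaults named in the text (@domain, @order, @min-value) are filled in; any other attribute is passed through untouched.

```python
def resolve_scheme(attrs, feed_id, source_id=None):
    """Fill in the suggested defaults for an i:ranking-scheme.

    attrs     -- attributes actually present on the element
    feed_id   -- feed/id of the containing feed
    source_id -- feed/id taken from atom:source, if the entry has one
    """
    resolved = dict(attrs)
    # @domain: the atom:source feed id wins over the containing feed id.
    resolved.setdefault("domain", source_id if source_id is not None else feed_id)
    # @order defaults to ascending, @min-value to 0.
    resolved.setdefault("order", "ascending")
    resolved.setdefault("min-value", "0")
    return resolved


print(resolve_scheme({"order": "descending"}, "urn:my_feed"))
```

i:rank/@domain is deliberately not defaulted here, since the suggestion is that it be required.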




Re: FYI: Updated Index draft

2005-09-22 Thread James M Snell


Antone Roundy wrote:



On Wednesday, September 21, 2005, at 11:43  PM, James M Snell wrote:



{domain}


I was thinking yesterday of suggesting that feed/id be used the way 
you're using i:domain. Which is better is probably a matter of whether 
ranking domains that span multiple feeds will be useful or not. In the 
movie ratings use case presented below, perhaps rather than a 
fivestar scheme and netflix and amazon domains, it might make more 
sense to do this:


Using atom:id as the ranking domain would limit the ranking to a single 
feed which is useful, but does not cover the full range of cases.


Later on in your note, you say:

If sticking with i:domain, I'd recommend that you recommend that in 
cases where a ranking domain does not span multiple feeds, the feed/id 
value be used for the value of i:domain, and that in all cases, the 
same care be taken to (attempt to) ensure that i:domain's value is 
unique to what is intended to be a particular domain.




Yes, there are two special cases here:

1. Lack of a i:domain
2. i:domain value that is a same document reference

In the first case, I had imagined a "Default Ranking Domain" that is 
identified by the feed atom:id element, just as you suggest.
In the second case, I had imagined a "Document Ranking Domain" that is 
identified by the document containing the feed. 

There is a subtle difference between these two.  Consider the following 
(somewhat contrived) example:


 
   ...
   
 
   Feed1
   # 
   
 A
 50
 20
   
   
 B
 25
 40
   
 
 
   Feed2
   # 
   
 C
 50
 30
   
   
 D
 25
 10
   
 
   
 

The two embedded atom:feed elements specify two ranking domains: The 
Default Ranking Domain and a Document Ranking Domain.  The Default 
Ranking Domain is scoped to the individual atom:feed and is identified by 
the value of the atom:id.  The Document Ranking Domain is scoped to the 
containing document.


The Default Ranking Domain ranks may only be used to order the entries 
within the containing atom:feed: 
 sort_ascending ( Feed1 ) = B, A

 sort_ascending ( Feed2 ) = D, C

The Document Ranking Domain ranks may be used to order all entries 
appearing within the document

 sort_ascending ( Document ) = D, A, C, B

In an Atom Feed Document, the Default Ranking Domain and the Document 
Ranking Domain happen to be identical.
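
The three orderings above can be reproduced with a short sketch. It assumes that the first number under each entry in the example is its rank in the feed's Default Ranking Domain and the second is its rank in the Document Ranking Domain, which is the reading consistent with the sort results given; the tuple layout and function names are illustrative only.

```python
# (feed, entry title, default-domain rank, document-domain rank)
ENTRIES = [
    ("Feed1", "A", 50, 20),
    ("Feed1", "B", 25, 40),
    ("Feed2", "C", 50, 30),
    ("Feed2", "D", 25, 10),
]


def sort_default(feed):
    """Order one feed's entries by their Default Ranking Domain rank."""
    in_feed = [e for e in ENTRIES if e[0] == feed]
    return [e[1] for e in sorted(in_feed, key=lambda e: e[2])]


def sort_document():
    """Order every entry in the document by its Document Ranking Domain rank."""
    return [e[1] for e in sorted(ENTRIES, key=lambda e: e[3])]


print(sort_default("Feed1"))  # ['B', 'A']
print(sort_default("Feed2"))  # ['D', 'C']
print(sort_document())        # ['D', 'A', 'C', 'B']
```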




urn:my_reviews
descending
descending


Movie A
3
4


Movie B
2
1



Notes:
* The i:order element tells the user agent whether higher or lower 
numbers are considered "better", "higher priority", "first", or 
whatever. In these cases, higher numbers are better, so would 
typically be shown first, so they're considered "descending" schemes.


Hmm.. I wanted to get away from doing this kind of thing.

* i:order/@label indicates a human readable label for the scheme, and 
could be optional.
* Since the urn:(netflix|amazon).com/reviews schemes are feed 
independent, it is not necessary to indicate a feed (or "domain") in 
this case.
* For a feed-specific scheme, like natural order, the feed ID would be 
included like this (so that if these entries were aggregated, it would 
be clear that the i:order elements were relevant to the source feed, 
not the aggregate feed):


The goal of @scheme is to identify the type of ranking to apply while 
the goal of @domain is to identify the scope of the ranking.  I do not 
believe that it is a good idea to conflate the two.


- James



Re: FYI: Updated Index draft

2005-09-22 Thread James M Snell


James Holderness wrote:



James M Snell wrote:

This could all get rather complicated very quickly. My primary 
objective is to address known use cases for ordered feeds (my netflix 
queue feed[1] for example), most of which are structured as complete 
datasets that are non-incremental in nature.



I realise that this sort of thing sounds like a good idea from a 
content provider's point of view, but as an aggregator developer, this 
is probably the last thing I would want to support. A feed that is not 
incremental is not a feed IMHO. There are just too many special case 
complications that an aggregator developer has to deal with that have 
nothing to do with regular, honest-to-goodness feeds.


I do believe this falls under the Not-All-Feeds-Should-Be-Aggregated 
Category.  That said, however, I think the concept of Feed-As-List is 
one that generally has a lot of support.


1. It helps us to scope the relevance of an i:rank element within an 
entry. For instance, if an entry with an i:rank in the urn:foo domain 
is aggregated into a synthetic feed that either a) does not specify a 
ranking domain or b) specifies a different ranking domain, consumers 
can safely ignore the urn:foo i:rank.


This kind of makes sense, but I'm not convinced it's necessary. If the 
feed has various ranks on which it can be sorted, I'd rather leave the 
decision on which one to use to the user. If, for whatever reason, 
those alternate domains are no longer applicable and the feed 
absolutely has to force the use of a particular domain, wouldn't it 
make more sense to filter out all those unused ranks rather than 
making the user download them?


i:domain is not used as a key for determining which rankings to use; it's 
a key that is used to correlate rankings.  Regarding filtering, we 
should not rely on aggregators filtering out unused ranks.  Consider the 
case of digitally signed entries; filtering out a rank covered by the 
digital signature would invalidate the signature.


2. It helps us to correlate ranks that span multiple feed documents. 
For instance, two separate feed documents may specify the same 
ranking domain.



This I like.

By the description given, it sounds as if the BBC ranking is more a 
ranking of relative importance than a ranking of natural order. That 
is, Top Story A has a higher importance than Top Story B, etc. If 
that is the case, a "priority" or "importance" ranking scheme can be 
used in conjunction with the atom:updated element.



This almost works. As an aggregator, what I would want to do is 
automatically sort with the date as the primary key and the priority 
as the secondary key. That way, today's high-priority items would 
appear at the top of the list, and yesterday's would follow on 
afterwards. Any of yesterday's items that were still of some 
importance today would need to have their atom:updated element set to 
today and their priority adjusted as appropriate.


There are a couple of problems though. The atom:updated element has to 
be identical for all items on a particular day. Also the atom:updated 
element can't be changed when an actual update occurs (say a spelling 
correction, or an update on a story) without breaking the ordering. 
The problem is we're abusing the atom:updated element so as to use it 
for something that it's not.


The updated elements would not need to be identical.  Aggregators can 
easily determine whether or not entries with different updated values 
occurred on the same day / same hour / etc.  In other words, I could sort 
by Day+Priority, Hour+Priority, Minute+Priority, whatever, without any 
difficulty.  There is no abuse of atom:updated here.
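
As a rough sketch of what Day+Priority or Hour+Priority sorting could look like in an aggregator: the tuple layout, the sample entries, and the convention that a lower priority number means more important are all assumptions made for illustration, and the datetimes are naive (no timezone handling).

```python
from datetime import datetime

# Truncate atom:updated to the chosen bucket size.
TRUNCATE = {
    "day": lambda t: t.replace(hour=0, minute=0, second=0, microsecond=0),
    "hour": lambda t: t.replace(minute=0, second=0, microsecond=0),
    "minute": lambda t: t.replace(second=0, microsecond=0),
}

# (title, atom:updated, priority) -- hypothetical entries
entries = [
    ("A", datetime(2005, 9, 22, 17, 5), 2),
    ("B", datetime(2005, 9, 22, 17, 40), 1),
    ("C", datetime(2005, 9, 21, 23, 5), 1),
    ("D", datetime(2005, 9, 22, 9, 0), 1),
]


def sort_by_bucket_then_priority(items, granularity="day"):
    """Newest time bucket first; within a bucket, ascending priority."""
    bucket = TRUNCATE[granularity]
    # Python's sort is stable, so sort on the secondary key first.
    by_priority = sorted(items, key=lambda e: e[2])
    return sorted(by_priority, key=lambda e: bucket(e[1]), reverse=True)


print([e[0] for e in sort_by_bucket_then_priority(entries, "day")])
# ['B', 'D', 'A', 'C']
print([e[0] for e in sort_by_bucket_then_priority(entries, "hour")])
# ['B', 'A', 'D', 'C']
```

The entries carry four distinct atom:updated values, yet they still group cleanly by day or by hour.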


It would be better if we could add an extra attribute to your rank tag 
that specified what date the rank applied to. For someone like the BBC 
that reprioritizes feeds on a daily basis they'd set this attribute to 
something like say midnight for the date on which the ranking applies. 
If you have an item from a previous day that is still important today, 
it would keep its original atom:updated value, but the rank-date would 
be set to today.


I'll give this some thought, but my initial gut reaction is that it is 
not necessary.  Let me see if I can convince myself otherwise ;-)




Regards
James


Thanks for the input!

- James




Re: FYI: Updated Index draft

2005-09-22 Thread James Holderness


James M Snell wrote:
This could all get rather complicated very quickly. My primary objective 
is to address known use cases for ordered feeds (my netflix queue feed[1] 
for example), most of which are structured as complete datasets that are 
non-incremental in nature.


I realise that this sort of thing sounds like a good idea from a content 
provider's point of view, but as an aggregator developer, this is probably 
the last thing I would want to support. A feed that is not incremental is 
not a feed IMHO. There are just too many special case complications that an 
aggregator developer has to deal with that have nothing to do with regular, 
honest-to-goodness feeds.


Are you supposed to automatically delete old items? With or without the 
users' consent? Do you archive old items in some way? How do you handle the 
aggregation of items from multiple non-incremental feeds into a single feed? 
How do you handle the aggregation of items from multiple feeds some of which 
are incremental and some of which are complete datasets? How do you handle 
filtering that results in a subset of items from what is supposed to be a 
complete dataset?


That said, I suspect I'm fighting a losing battle, and I do like this 
proposal as it applies to ranking of feeds in general.


1. It helps us to scope the relevance of an i:rank element within an 
entry. For instance, if an entry with an i:rank in the urn:foo domain is 
aggregated into a synthetic feed that either a) does not specify a ranking 
domain or b) specifies a different ranking domain, consumers can safely 
ignore the urn:foo i:rank.


This kind of makes sense, but I'm not convinced it's necessary. If the feed 
has various ranks on which it can be sorted, I'd rather leave the decision 
on which one to use to the user. If, for whatever reason, those alternate 
domains are no longer applicable and the feed absolutely has to force the 
use of a particular domain, wouldn't it make more sense to filter out all 
those unused ranks rather than making the user download them?


2. It helps us to correlate ranks that span multiple feed documents. For 
instance, two separate feed documents may specify the same ranking domain.


This I like.

By the description given, it sounds as if the BBC ranking is more a 
ranking of relative importance than a ranking of natural order. That is, 
Top Story A has a higher importance than Top Story B, etc. If that is the 
case, a "priority" or "importance" ranking scheme can be used in 
conjunction with the atom:updated element.


This almost works. As an aggregator, what I would want to do is 
automatically sort with the date as the primary key and the priority as the 
secondary key. That way, today's high-priority items would appear at the top 
of the list, and yesterday's would follow on afterwards. Any of yesterday's 
items that were still of some importance today would need to have their 
atom:updated element set to today and their priority adjusted as 
appropriate.


There are a couple of problems though. The atom:updated element has to be 
identical for all items on a particular day. Also the atom:updated element 
can't be changed when an actual update occurs (say a spelling correction, or 
an update on a story) without breaking the ordering. The problem is we're 
abusing the atom:updated element so as to use it for something that it's 
not.


It would be better if we could add an extra attribute to your rank tag that 
specified what date the rank applied to. For someone like the BBC that 
reprioritizes feeds on a daily basis they'd set this attribute to something 
like say midnight for the date on which the ranking applies. If you have an 
item from a previous day that is still important today, it would keep its 
original atom:updated value, but the rank-date would be set to today.


An aggregator supporting this extension could then sort on the rank-date as 
the primary key (descending) and the rank value itself as the secondary key. 
For feeds that don't change their priorities over time you can just leave 
this attribute out and the aggregator can sort on the rank value alone. I 
don't think it overly complicates the interface, but it does add significant 
value IMO.
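
To make the proposed sort concrete, here is a minimal sketch. The attribute does not exist in the draft, so the name rank_date, the tuple layout, and the sample items are hypothetical; it also assumes that a lower rank value sorts first.

```python
from datetime import date

# (title, rank value, rank-date or None) -- hypothetical items
items = [
    ("Old but still important", 2, date(2005, 9, 22)),
    ("Today's lead", 1, date(2005, 9, 22)),
    ("Yesterday's lead", 1, date(2005, 9, 21)),
]


def sort_ranked(entries):
    """rank-date descending as primary key, rank value ascending as secondary.

    Entries without a rank-date all fall into one bucket, so a feed that
    omits the attribute is ordered by rank value alone.
    """
    def key(entry):
        _, rank, rank_date = entry
        day = rank_date.toordinal() if rank_date is not None else 0
        return (-day, rank)

    return sorted(entries, key=key)


print([title for title, _, _ in sort_ranked(items)])
# ["Today's lead", 'Old but still important', "Yesterday's lead"]
```

The older item keeps whatever atom:updated it had; only its rank-date moves it into today's group.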


Regards
James



Re: FYI: Updated Index draft

2005-09-22 Thread Antone Roundy


On Wednesday, September 21, 2005, at 11:43  PM, James M Snell wrote:


{domain}
I was thinking yesterday of suggesting that feed/id be used the way 
you're using i:domain. Which is better is probably a matter of whether 
ranking domains that span multiple feeds will be useful or not. In the 
movie ratings use case presented below, perhaps rather than a 
fivestar scheme and netflix and amazon domains, it might make more 
sense to do this:



urn:my_reviews
descending
descending


Movie A
3
4


Movie B
2
1



Notes:
* The i:order element tells the user agent whether higher or lower 
numbers are considered "better", "higher priority", "first", or 
whatever. In these cases, higher numbers are better, so would 
typically be shown first, so they're considered "descending" schemes.
* i:order/@label indicates a human readable label for the scheme, and 
could be optional.
* Since the urn:(netflix|amazon).com/reviews schemes are feed 
independent, it is not necessary to indicate a feed (or "domain") in 
this case.
* For a feed-specific scheme, like natural order, the feed ID would be 
included like this (so that if these entries were aggregated, it would 
be clear that the i:order elements were relevant to the source feed, 
not the aggregate feed):



urn:my_feed
ascending

urn:my_feed/a
1


urn:my_feed/b
2



If sticking with i:domain, I'd recommend that you recommend that in 
cases where a ranking domain does not span multiple feeds, the feed/id 
value be used for the value of i:domain, and that in all cases, the 
same care be taken to (attempt to) ensure that i:domain's value is 
unique to what is intended to be a particular domain.