Re: Paging, Feed History, etc.
2006/6/8, James Holderness [EMAIL PROTECTED]:
> Mark Nottingham wrote:
>> Are you talking about using ETag HTTP response headers, If-Match
>> request headers, and 304 Not Modified response status codes? That's
>> a gross misapplication of those mechanisms if so, and this will
>> break intermediaries along the path.
>
> For the first page I'm talking about an Etag (or Last-Modified) HTTP
> response header and If-None-Match (or If-Modified-Since) request
> headers for the retrievals a month later.

What you described is RFC3229 w/ feeds [1], but you failed to include
the new request and response headers and the specific status code,
which are necessary because you're changing the behaviour of
If-None-Match and 304 (Not Modified) as defined in HTTP/1.1.

> For page two onwards the state information (date, query and page
> number) comes from the link urls returned by the first page.

That means you need to keep entry revisions as well, so that if an
entry is updated while a client is navigating the paged result set, it
is sent the old revision (corresponding to the date parameter).

>> Even if it's cast as a query parameter in the URI (for example), it
>> requires query support on the server side, a concept of discovered
>> time (as you point out), and places constraints on the ordering of
>> the feed.
>
> The ordering is not necessarily important. As long as the server can
> filter out entries that don't match a specific time criteria it can
> return those entries in any order.

Yes, ordering is not important. If ranking is necessary, then use the
Feed Rank extension (but that means that potentially a great number of
entries will be sent back as modified in 226 (IM Used) responses just
because their ranking has changed).

>> Are you proposing this instead of the mechanism currently described
>> in FH? Alongside it?
>
> What I'm proposing would work with the FH as currently specified as
> long as the client supported ETag or Last-Modified as well. For me
> that means no change at all.

You're trying to change HTTP/1.1 behaviour wrt the If-None-Match
request-header field and the 304 (Not Modified) status code, so you
need to implement RFC3229 w/ feeds (which means dealing with some new
headers and a new status code).

As I already said, I highly suggest not using paging for 226 (IM Used)
responses, and rather falling back to standard GET in case there are
too many changes (i.e. behaving the same way as servers that don't
support RFC3229 w/ feeds).

My main concern is that RFC3229 w/ feeds is being deployed more and
more widely and is still not even an I-D (or I missed something).
Maybe FH could be the place to spec it, as another optimization
algorithm…

[1] http://bobwyman.pubsub.com/main/2004/09/using_rfc3229_w.html

--
Thomas Broyer
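[A sketch of the client-side exchange being discussed, assuming the headers and status codes named in RFC 3229 and Bob Wyman's "RFC3229 w/ feeds" write-up [1]; the function names and return strings are purely illustrative, not from any spec.]

```python
# Client side of a conditional feed GET using the "feed" instance
# manipulation: the client sends "A-IM: feed" alongside the usual
# If-None-Match, and a supporting server may answer 226 (IM Used)
# with only the new/changed entries since the given ETag.

def conditional_feed_headers(etag):
    """Headers for a conditional feed GET requesting an entry delta."""
    return {
        "A-IM": "feed",         # accept the "feed" instance-manipulation
        "If-None-Match": etag,  # ETag saved from the previous retrieval
    }

def interpret_status(status):
    """What a client should do with each possible response status."""
    if status == 226:  # IM Used: body holds only new/changed entries
        return "merge delta into local copy"
    if status == 304:  # Not Modified: nothing new at all
        return "keep local copy"
    if status == 200:  # plain HTTP/1.1 fallback: full feed document
        return "replace local copy"
    return "error"
```

The point of the extra headers is exactly Thomas's: a plain HTTP/1.1 intermediary that sees only If-None-Match must never be handed delta semantics on a 304.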
Re: Paging, Feed History, etc.
Thomas Broyer wrote:
> What you described is RFC3229 w/ feeds [1], but you failed to include
> the new request and response headers and the specific status code,
> which are necessary because you're changing the behaviour of
> If-None-Match and 304 (Not Modified) as defined in HTTP/1.1.

Yep. Sorry, forgot to mention that.

> That means you need to keep entry revisions as well, so that if an
> entry is updated while a client is navigating the paged result set,
> it is sent the old revision (corresponding to the date parameter).

Why? If an entry has been revised, either don't send it (they'll get
it the next time they refresh), or send it anyway (they'll just get it
again the next time they refresh). Is that such a big deal? Or am I
missing something?

> Yes, ordering is not important. If ranking is necessary, then use the
> Feed Rank extension (but that means that potentially a great number
> of entries will be sent back as modified in 226 (IM Used) responses
> just because their ranking has changed)

I would have thought IM only applied to the first page. All subsequent
pages have a specific query that includes the query, page and time.
You're not sending back a partial result in that case.

>> What I'm proposing would work with the FH as currently specified as
>> long as the client supported ETag or Last-Modified as well. For me
>> that means no change at all.
>
> You're trying to change HTTP/1.1 behaviour wrt the If-None-Match
> request-header field and the 304 (Not Modified) status code, so you
> need to implement RFC3229 w/ feeds (which means dealing with some new
> headers and a new status code).

No change at all for *me*. As in my client. I already support FH. I
already support Etags. I already support 3229.

> As I already said, I highly suggest not using paging for 226 (IM
> Used) responses and rather fall back to standard GET in case there
> are too many changes (i.e. behaving the same way as servers that
> don't support RFC3229 w/ feeds).

I don't get why this is a problem, but if you don't like it, don't use
it. All I'm saying is, if you're a search engine and you want to
create subscribable paged results, this is a method that you can use
right now, and it will work with at least one existing FH capable
client (I suspect others too). The other proposal on the table is to
change all your link names. Arguably a much better proposal than what
I'm offering - it certainly seems to have got a lot of +1s - but it
will work with precisely no one.

Regards
James
Re: Paging, Feed History, etc.
2006/6/8, James Holderness [EMAIL PROTECTED]:
> Thomas Broyer wrote:
>> That means you need to keep entry revisions as well, so that if an
>> entry is updated while a client is navigating the paged result set,
>> it is sent the old revision (corresponding to the date parameter).
>
> Why? If an entry has been revised either don't send it (they'll get
> it the next time they refresh), or send it anyway (they'll just get
> it again the next time they refresh). Is that such a big deal? Or am
> I missing something?

Sorry, I thought you wanted search engines to produce snapshots...

(Side note: in that case, is there still a need to pass a date
parameter to the following pages? And if pages are kind of live, isn't
there a risk of data loss? I mean, this is the Web, so you'll end up
doing a request for each page, just returning different chunks of the
result set. If an entry changes between the request for the first page
and the retrieval of a following page, your request might put it
somewhere else in the result set, changing the ordering of entries
based on updated time stamps, discovery date, ranks or whatever, so
your chunks would be different than if the entry hadn't changed; an
entry that has not yet been retrieved might end up in an
already-retrieved chunk by page number, hence the client missing an
entry. I think this is Mark's concern: this might be an acceptable
behaviour in some cases, but not all.)

>> You're trying to change HTTP/1.1 behaviour wrt the If-None-Match
>> request-header field and the 304 (Not Modified) status code, so you
>> need to implement RFC3229 w/ feeds (which means dealing with some
>> new headers and a new status code).
>
> No change at all for *me*. As in my client. I already support FH. I
> already support Etags. I already support 3229.

OK, I thought you only supported ETags as defined by HTTP/1.1, for
efficient caching and bandwidth saving.

>> As I already said, I highly suggest not using paging for 226 (IM
>> Used) responses and rather fall back to standard GET in case there
>> are too many changes (i.e. behaving the same way as servers that
>> don't support RFC3229 w/ feeds).
>
> I don't get why this is a problem, but if you don't like it don't use
> it.

Yep, sorry, this is not a problem.

> All I'm saying is, if you're a search engine and you want to create
> subscribable paged results, this is a method that you can use right
> now, and it will work with at least one existing FH capable client (I
> suspect others too).

So we agree ;-) Could you read my recent mails in this thread and
confirm that it's the case?

> The other proposal on the table is to change all your link names.
> Arguably a much better proposal than what I'm offering - it certainly
> seems to have got a lot of +1s - but it will work with precisely no
> one.

So now there are two -1s, right? ;-)

--
Thomas Broyer
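[The server-side fallback Thomas keeps recommending can be sketched as a simple decision: if too many entries changed since the client's ETag, skip the delta entirely and answer 200 with the normal feed, exactly as a server with no RFC3229 support would. The function name and the page-size threshold below are made up for illustration.]

```python
# Server side: choose the response to a GET carrying "A-IM: feed"
# plus If-None-Match. Rather than paging a 226 delta, degrade to a
# plain full-feed 200 when the delta would be too large.

PAGE_SIZE = 25  # hypothetical cap on how many entries fit in one response

def respond(changed_entries, current_feed_entries):
    """Return (status, body) for a conditional feed GET."""
    if not changed_entries:
        return 304, []                    # nothing changed at all
    if len(changed_entries) > PAGE_SIZE:
        return 200, current_feed_entries  # too many changes: full feed
    return 226, changed_entries           # small delta: IM Used
```

This keeps 226 responses unpaged, which sidesteps the chunk-drift problem described in the side note above.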
Re: Copyright, licensing, and feeds
* Karl Dubost [EMAIL PROTECTED] [2006-06-08 04:30]:
> Which will not remove abuse :)

Well, will anything short of not publishing your content?

I think the point of such an effort is to make life easier for third
parties who want to respect your wishes, not to make it harder for
third parties who are intent on violating them.

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
Re: when should two entries have the same id?
James M Snell wrote:
> That's not quite accurate. Two entries with the same atom:id may
> appear within the same atom:feed only if they have different
> atom:updated elements. The spec is silent on whether or not two
> entries existing in *separate documents* may have identical atom:id
> and atom:updated values.

They're ids, not guids. Certainly I would expect that there'll be some
accidental conflicts. For instance one site might number its posts
post1, post2, post3, ...; and a different, unrelated site might do the
same.

--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/
Re: when should two entries have the same id?
On 8 Jun 2006, at 14:44, Elliotte Harold wrote:
> James M Snell wrote:
>> That's not quite accurate. Two entries with the same atom:id may
>> appear within the same atom:feed only if they have different
>> atom:updated elements. The spec is silent on whether or not two
>> entries existing in *separate documents* may have identical atom:id
>> and atom:updated values.
>
> They're ids, not guids. Certainly I would expect that there'll be
> some accidental conflicts. For instance one site might number its
> posts post1, post2, post3, ...; and a different, unrelated site might
> do the same.

No, they are guids. The datatype for an id is an IRI, which is a
generalisation of a URI. IRIs are constructed in such a way that it
should be easy to construct universally unique ones without ever
having name clashes. If name clashes there are, this will either be
due to incompetence or to malevolence.

Henry
Re: when should two entries have the same id?
Elliotte Harold wrote:
> James M Snell wrote:
>> That's not quite accurate. Two entries with the same atom:id may
>> appear within the same atom:feed only if they have different
>> atom:updated elements. The spec is silent on whether or not two
>> entries existing in *separate documents* may have identical atom:id
>> and atom:updated values.
>
> They're ids, not guids. Certainly I would expect that there'll be
> some accidental conflicts. For instance one site might number its
> posts post1, post2, post3, ...; and a different, unrelated site might
> do the same.

Sorry? That would be a bug. They *are* supposed to be globally unique.
See: http://greenbytes.de/tech/webdav/rfc4287.html#rfc.section.4.2.6

Best regards, Julian
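[One common recipe for the globally unique IRIs that atom:id requires is the tag: URI scheme (RFC 4151), which scopes an identifier under a domain you owned on a given date, so two sites that both call a post "post1" still can't collide. A minimal sketch, with hypothetical example domains:]

```python
# Mint a tag: URI (RFC 4151) suitable for use as an atom:id.
# Format: tag:<authority>,<date>:<specific>

def tag_uri(domain, date, specific):
    """Build a tag: URI, e.g. tag:example.com,2006-06-08:post1"""
    return "tag:%s,%s:%s" % (domain, date, specific)

# Two unrelated sites using the same local post name:
a = tag_uri("example.com", "2006-06-08", "post1")
b = tag_uri("example.org", "2006-06-08", "post1")
assert a != b  # same local name, no clash
```

The date component also lets an authority that changes hands keep old ids unambiguous, since each owner mints under dates they actually held the domain.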
Re: Copyright, licensing, and feeds
Very well stated, Aristotle!

On 6/8/06, A. Pagaltzis [EMAIL PROTECTED] wrote:
> * Karl Dubost [EMAIL PROTECTED] [2006-06-08 04:30]:
>> Which will not remove abuse :)
>
> Well, will anything short of not publishing your content?
>
> I think the point of such an effort is to make life easier for third
> parties who want to respect your wishes, not to make it harder for
> third parties who are intent on violating them.

--
M:D/
M. David Peterson
http://www.xsltblog.com/
RFC3229 w/ feeds [was: Paging, Feed History, etc.]
On 2006/06/07, at 11:40 PM, Thomas Broyer wrote:
> My main concern is that RFC3229 w/ feeds is being deployed more and
> more widely and is still not even an I-D (or I missed something).

I have that concern as well.

I am also concerned that RFC3229 is an extension of HTTP, but some
implementers are acting as if it changes the semantics of
already-defined parts of HTTP. For example, a delta must be a subset
of the current representation that is returned to a GET; if you GET
the feed, it has to return all of the entries that you could retrieve
by using delta.

I have a feeling that many people are treating it as a dynamic query
mechanism that's capable of retrieving any entry that's ever been in
the feed, while still only returning the last n entries to a plain
GET. If so, they're breaking HTTP, breaking delta, and should use
something else. Is this the case, or am I (happily) mistaken?

--
Mark Nottingham
http://www.mnot.net/
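[Mark's constraint can be restated as a checkable invariant: every entry a server returns in a delta must also appear in the representation a plain GET would return. A server that uses deltas to serve arbitrary archived entries violates it. A sketch, modelling entries as hypothetical (id, updated) pairs:]

```python
# Validate the RFC 3229 subset requirement: the delta instance must be
# derivable from (i.e. contained in) the current full representation.

def delta_is_valid(delta_entries, full_get_entries):
    """True iff every entry in the delta also appears in a plain GET."""
    return set(delta_entries) <= set(full_get_entries)

# Current feed: only the last two entries are in the plain GET.
full = {("tag:example.com,2006:p9", "2006-06-08"),
        ("tag:example.com,2006:p8", "2006-06-07")}

ok_delta = {("tag:example.com,2006:p9", "2006-06-08")}
bad_delta = {("tag:example.com,2006:p1", "2006-01-01")}  # archived entry

assert delta_is_valid(ok_delta, full)
assert not delta_is_valid(bad_delta, full)  # the "dynamic query" abuse
```

The second case is exactly the pattern Mark worries about: an entry reachable via delta that a plain GET could never return.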
Re: Paging, Feed History, etc.
Thomas Broyer wrote:
> Could you read my recent mails in this thread and confirm that it's
> the case?

I'm sorry, but I can no longer participate in this discussion. I hope
everything works out ok.

Regards
James
Re: Copyright, licensing, and feeds
On 2006-06-08 at 19:40, A. Pagaltzis wrote:
> * Karl Dubost [EMAIL PROTECTED] [2006-06-08 04:30]:
>> Which will not remove abuse :)
>
> Well, will anything short of not publishing your content?
>
> I think the point of such an effort is to make life easier for third
> parties who want to respect your wishes, not to make it harder for
> third parties who are intent on violating them.

Agreed. And that's why my message (which was really badly written -
fatigue) separated the issues. It's a very important issue, and I
really believe a clear spec, framework or, let's say, technical
solution would improve the field. Definitely. I would love to see that
happen as soon as possible.

It's a mix of social and technical issues. Finding interoperable
solutions would help to soften the social issues and frustrations. So
again, +1 a thousand times ;)

--
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
QA Weblog - http://www.w3.org/QA/
*** Be Strict To Be Cool ***