draft-snell-atompub-feed-nofollow-00.txt [was: Re: Don't Aggregrate Me]

2005-08-30 Thread James M Snell
This HAS NOT yet been submitted. I'm offering it up for discussion first. http://www.snellspace.com/public/draft-snell-atompub-feed-nofollow-00.txt defines x:follow="yes|no", x:index="yes|no", and x:archive="yes|no" attributes. - James
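A rough sketch of how such attributes might sit in a feed, assuming an extension namespace bound to the x: prefix (the namespace URI, element placement, and file names below are guesses for illustration, not taken from the draft):

    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:x="http://example.org/ns/feed-robots">  <!-- namespace URI assumed -->
      <title>Software Update Feed</title>
      <entry x:index="no" x:archive="no">
        <title>Build 1234</title>
        <link rel="enclosure" href="http://www.example.com/update.bin"
              x:follow="no"/>
      </entry>
    </feed>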

Re: Don't Aggregrate Me

2005-08-29 Thread James M Snell
Walter Underwood wrote: --On August 30, 2005 11:39:04 AM +1000 Eric Scheid <[EMAIL PROTECTED]> wrote: Someone wrote up "A Robots Processing Instruction for XML Documents" http://atrus.org/writings/technical/robots_pi/spec-199912__/ That's a PI though, and I have no idea how well supporte

Re: Don't Aggregrate Me

2005-08-29 Thread Joe Gregorio
On 8/29/05, Walter Underwood <[EMAIL PROTECTED]> wrote: > That was me. I think it makes perfect sense as a PI. But I think reuse > via namespaces is oversold. For example, we didn't even try to use > Dublin Core tags in Atom. Speak for yourself :) http://bitworking.org/news/Not_Invented_He

Re: Don't Aggregrate Me

2005-08-29 Thread Walter Underwood
--On August 29, 2005 7:05:09 PM -0700 James M Snell <[EMAIL PROTECTED]> wrote: > x:index="no|yes" doesn't seem to make a lot of sense in this case. It makes just as much sense as it does for HTML files. Maybe it is a whole group of Atom test cases. Maybe it is a feed of reboot times for the ser

Re: Don't Aggregrate Me

2005-08-29 Thread Walter Underwood
--On August 30, 2005 11:39:04 AM +1000 Eric Scheid <[EMAIL PROTECTED]> wrote: > > Someone wrote up "A Robots Processing Instruction for XML Documents" > http://atrus.org/writings/technical/robots_pi/spec-199912__/ > That's a PI though, and I have no idea how well supported they are. I'd > pref
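For anyone unfamiliar with the idea, a robots processing instruction is a hint embedded in the XML prolog rather than kept in a separate file. The exact PI target and pseudo-attributes are defined by the linked spec; the general shape is roughly this (the names here are illustrative, not quoted from that document):

    <?xml version="1.0" encoding="UTF-8"?>
    <?robots index="no" follow="no"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
      ...
    </feed>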

Re: Don't Aggregrate Me

2005-08-29 Thread Eric Scheid
On 30/8/05 12:05 PM, "James M Snell" <[EMAIL PROTECTED]> wrote: > That's kinda where I was going with x:follow="no|yes". An > x:archive="no|yes" would also make some sense but could also be handled > with HTTP caching (e.g. set the referenced content to expire > immediately). x:index="no|yes" d

Re: Don't Aggregrate Me

2005-08-29 Thread James M Snell
Eric Scheid wrote: On 30/8/05 11:19 AM, "James M Snell" <[EMAIL PROTECTED]> wrote: http://www.example.com/enclosure.mp3"; x:follow="no" /> http://www.example.com/enclosure.mp3"; x:follow="yes" /> http://www.example.com/enclosure.mp3"; x:follow="no" /> http://www.example.com/enclosure.mp3"

Re: Don't Aggregrate Me

2005-08-29 Thread Eric Scheid
On 30/8/05 11:19 AM, "James M Snell" <[EMAIL PROTECTED]> wrote: > http://www.example.com/enclosure.mp3"; > x:follow="no" /> > http://www.example.com/enclosure.mp3"; > x:follow="yes" /> > > http://www.example.com/enclosure.mp3"; x:follow="no" /> > http://www.example.com/enclosure.mp3"; x:follow="

Re: Don't Aggregrate Me

2005-08-29 Thread James M Snell
http://www.example.com/enclosure.mp3"; x:follow="no" /> http://www.example.com/enclosure.mp3"; x:follow="yes" /> http://www.example.com/enclosure.mp3"; x:follow="no" /> http://www.example.com/enclosure.mp3"; x:follow="yes" /> ??? - James A. Pagaltzis wrote: * Antone Roundy <[EMAIL PROTEC

Re: Don't Aggregrate Me

2005-08-29 Thread Karl Dubost
Le 05-08-26 à 18:59, Bob Wyman a écrit : Karl, Please, accept my apologies for this. I could have sworn we had the policy prominently displayed on the site. I know we used to have it there. This must have been lost when we did a site redesign last November! I'm really surprised that it

Re: Don't Aggregrate Me

2005-08-29 Thread A. Pagaltzis
* Eric Scheid <[EMAIL PROTECTED]> [2005-08-29 19:55]: > @xlink:type is used to describe the architecture of the links > involved, whether they be 'simple' (from here to somewhere) or > more complicated arrangements (that thing there is linked to > that other thing there, but only one way, but also

Re: Don't Aggregrate Me

2005-08-29 Thread A. Pagaltzis
* Mark Pilgrim <[EMAIL PROTECTED]> [2005-08-29 18:20]: > On 8/26/05, Graham <[EMAIL PROTECTED]> wrote: > > So you're saying browsers should check robots.txt before > > downloading images? > > It's sad that such an inane dodge would even garner any > attention at all, much less require a response.

Re: Don't Aggregrate Me

2005-08-29 Thread Walter Underwood
--On Monday, August 29, 2005 10:39:33 AM -0600 Antone Roundy <[EMAIL PROTECTED]> wrote: As has been suggested, to "inline images", we need to add frame documents, stylesheets, Java applets, external JavaScript code, objects such as Flash files, etc., etc., etc. The question is, with respect t

Re: Don't Aggregrate Me

2005-08-29 Thread A. Pagaltzis
* Antone Roundy <[EMAIL PROTECTED]> [2005-08-29 19:00]: > More robust would be: >default="false" /> > ...enabling extension elements to be named in @target without > requiring a list of @target values to be maintained anywhere. Is it wise to require either XPath support in consumers or to

Re: Don't Aggregrate Me

2005-08-29 Thread Antone Roundy
On Monday, August 29, 2005, at 10:39 AM, Antone Roundy wrote: More robust would be: ...enabling extension elements to be named in @target without requiring a list of @target values to be maintained anywhere.

Re: Don't Aggregrate Me

2005-08-29 Thread Antone Roundy
On Monday, August 29, 2005, at 10:12 AM, Mark Pilgrim wrote: On 8/26/05, Graham <[EMAIL PROTECTED]> wrote: (And before you say "but my aggregator is nothing but a podcast client, and the feeds are nothing but links to enclosures, so it's obvious that the publisher wanted me to download them" -

Re: Don't Aggregrate Me

2005-08-29 Thread Mark Pilgrim
On 8/26/05, Graham <[EMAIL PROTECTED]> wrote: > > (And before you say "but my aggregator is nothing but a podcast > > client, and the feeds are nothing but links to enclosures, so it's > > obvious that the publisher wanted me to download them" -- WRONG! The > > publisher might want that, or they

Re: Don't Aggregrate Me

2005-08-26 Thread Eric Scheid
On 27/8/05 6:40 AM, "Bob Wyman" <[EMAIL PROTECTED]> wrote: > I think "crawling" URI's found in tags, > tags and enclosures isn't crawling... Or... Is there something I'm > missing here? crawling tags isn't a huge problem because it doesn't lead to a recursive situation. Same with stylesheets

RE: Don't Aggregrate Me

2005-08-26 Thread Bob Wyman
Karl Dubost points out that it is hard to figure out what email address to send messages to if you want to "de-list" from PubSub...: Karl, Please, accept my apologies for this. I could have sworn we had the policy prominently displayed on the site. I know we used to have it there. This mus

Re: Don't Aggregrate Me

2005-08-26 Thread Karl Dubost
Le 05-08-26 à 17:53, Bob Wyman a écrit : Karl Dubost wrote: - How one who has previously submitted a feed URL remove it from the index? (Change of opinions) If you are the publisher of a feed and you don't want us to monitor your content, complain to us and we'll filter you out. Folk do

RE: Don't Aggregrate Me

2005-08-26 Thread Bob Wyman
Karl Dubost wrote: > - How one who has previously submitted a feed URL remove it from > the index? (Change of opinions) If you are the publisher of a feed and you don't want us to monitor your content, complain to us and we'll filter you out. Folk do this every once in a while. Send us an

RE: Don't Aggregrate Me

2005-08-26 Thread Bob Wyman
Roger Benningfield wrote: > We've got a mechanism that allows any user with his own domain > and a text editor to tell us whether or not he wants us messing with > his stuff. I think it's foolish to ignore that. The problem is that we have *many* such mechanisms. Robots.txt is only one. Ot

Re: Don't Aggregrate Me

2005-08-26 Thread A. Pagaltzis
* Bob Wyman <[EMAIL PROTECTED]> [2005-08-26 22:50]: > It strikes me that not all URIs are created equally and not > everything that looks like crawling is really "crawling." @xlink:type? Regards, -- Aristotle Pagaltzis //

Re: Don't Aggregrate Me

2005-08-26 Thread Roger B.
> Remember, PubSub never does > anything that a desktop client doesn't do. Bob: What about FeedMesh? If I ping blo.gs, they pass that ping along to you, and PubSub fetches my feed, then PubSub is doing something a desktop client doesn't do. It's following a link found in one place and retrieving/

RE: Don't Aggregrate Me

2005-08-26 Thread Bob Wyman
Mark Pilgrim wrote (among other things): > (And before you say "but my aggregator is nothing but a podcast > client, and the feeds are nothing but links to enclosures, so > it's obvious that the publisher wanted me to download them" -- WRONG! I agree with just about everything that Mark wr

Re: Don't Aggregrate Me

2005-08-26 Thread Karl Dubost
Le 05-08-25 à 18:51, Bob Wyman a écrit : At PubSub we *never* "crawl" to discover feed URLs. The only feeds we know about are: 1. Feeds that have announced their presence with a ping 2. Feeds that have been announced to us via a FeedMesh message. 3. Feeds that have been manually

Re: Don't Aggregrate Me

2005-08-26 Thread James M Snell
Graham wrote: (And before you say "but my aggregator is nothing but a podcast client, and the feeds are nothing but links to enclosures, so it's obvious that the publisher wanted me to download them" -- WRONG! The publisher might want that, or they might not ... So you're saying browsers

Re: Don't Aggregrate Me

2005-08-26 Thread Graham
On 26 Aug 2005, at 7:46 pm, Mark Pilgrim wrote: 2. If a user gives a feed URL to a program *and then the program finds all the URLs in that feed and requests them too*, the program needs to support robots.txt exclusions for all the URLs other than the original URL it was given. ... (And befo

Re: Don't Aggregrate Me

2005-08-26 Thread Walter Underwood
--On August 26, 2005 9:51:10 AM -0700 James M Snell <[EMAIL PROTECTED]> wrote: > Add a new link rel="readers" whose href points to a robots.txt-like file that > either allows or disallows the aggregator for specific URI's and establishes > polling rate preferences > > User-agent: {aggregator-u
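As described, the feed would advertise the policy file with something like the following (the file name is hypothetical):

    <link rel="readers" href="http://www.example.com/readers.txt"/>

The file itself would follow the robots.txt pattern. The User-agent and Disallow lines below mirror the quoted proposal; the Crawl-delay line is just one possible way to express the "polling rate preferences" part and is not taken from the message:

    User-agent: ExampleAggregator
    Disallow: /private/
    Crawl-delay: 3600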

Re: Don't Aggregrate Me

2005-08-26 Thread Mark Pilgrim
On 8/25/05, Roger B. <[EMAIL PROTECTED]> wrote: > > Mhh. I have not looked into this. But is not every desktop aggregator > > a robot? > > Henry: Depends on who you ask. (See the Newsmonster debates from a > couple years ago.) As I am the one who kicked off the Newsmonster debates a couple years

Re: Don't Aggregrate Me

2005-08-26 Thread James M Snell
Ok, so this discussion has definitely been interesting... let's see if we can turn it into something actionable. 1. Desktop aggregators and services like pubsub really do not fall into the same category as robots/crawlers and therefore should not necessarily be paying attention to robots.txt

RE: Don't Aggregrate Me

2005-08-26 Thread Bob Wyman
Antone Roundy wrote: > I'm with Bob on this. If a person publishes a feed without limiting > access to it, they either don't know what they're doing, or they're > EXPECTING it to be polled on a regular basis. As long as PubSub > doesn't poll too fast, the publisher is getting exactly what they >

Re: Don't Aggregrate Me

2005-08-26 Thread Walter Underwood
I'm adding robots@mccmedia.com to this discussion. That is the classic list for robots.txt discussion. Robots list: this is a discussion about the interactions of /robots.txt and clients or robots that fetch RSS feeds. "Atom" is a new format in the RSS family. --On August 26, 2005 8:39:59 PM +100

Re: Don't Aggregrate Me

2005-08-26 Thread Walter Underwood
There are no wildcards in /robots.txt, only path prefixes and user-agent names. There is one special user-agent, "*", which means "all". I can't think of any good reason to always ignore the disallows for *. I guess it is OK to implement the parts of a spec that you want. Just don't answer "yes"
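For reference, the format being described uses per-agent groups, prefix-matched paths, and "*" as the only special user-agent token:

    User-agent: *
    Disallow: /private/    # any URL whose path starts with /private/

    User-agent: ExampleBot # hypothetical agent name
    Disallow: /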

Re: Don't Aggregrate Me

2005-08-26 Thread A. Pagaltzis
* Bob Wyman <[EMAIL PROTECTED]> [2005-08-26 01:00]: > My impression has always been that robots.txt was intended to > stop robots that crawl a site (i.e. they read one page, extract > the URLs from it and then read those pages). I don't believe > robots.txt is intended to stop processes that simpl

Re: Don't Aggregrate Me

2005-08-26 Thread Antone Roundy
On Friday, August 26, 2005, at 04:39 AM, Eric Scheid wrote: On 26/8/05 3:55 PM, "Bob Wyman" <[EMAIL PROTECTED]> wrote: Remember, PubSub never does anything that a desktop client doesn't do. Periodic re-fetching is a robotic behaviour, common to both desktop aggregators and server based aggre

Re: Don't Aggregrate Me

2005-08-26 Thread Eric Scheid
On 26/8/05 3:55 PM, "Bob Wyman" <[EMAIL PROTECTED]> wrote: > Remember, PubSub never does > anything that a desktop client doesn't do. Periodic re-fetching is a robotic behaviour, common to both desktop aggregators and server based aggregators. Robots.txt was established to minimise harm caused b

RE: Don't Aggregrate Me

2005-08-25 Thread Bob Wyman
Roger Benningfield wrote: > However, if I put something like: > User-agent: PubSub > Disallow: / > ...in my robots.txt and you ignore it, then you very much > belong on the Bad List. I don't think so. The reason is that I believe that robots.txt has nothing to do with any service I provide

Re: Don't Aggregrate Me

2005-08-25 Thread Roger B.
Bob: It's one thing to ignore a wildcard rule in robots.txt. I don't think it's a good idea, but I can at least see a valid argument for it. However, if I put something like: User-agent: PubSub Disallow: / ...in my robots.txt and you ignore it, then you very much belong on the Bad List. -- Roger
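Laid out as it would sit in the file, the rule Roger describes is simply:

    User-agent: PubSub
    Disallow: /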

RE: Don't Aggregrate Me

2005-08-25 Thread Bob Wyman
Antone Roundy wrote: > How could this all be related to aggregators that accept feed URL > submissions? My impression has always been that robots.txt was intended to stop robots that crawl a site (i.e. they read one page, extract the URLs from it and then read those pages). I don't believ

Re: Don't Aggregrate Me

2005-08-25 Thread Antone Roundy
On Thursday, August 25, 2005, at 03:12 PM, Walter Underwood wrote: I would call desktop clients "clients" not "robots". The distinction is how they add feeds to the polling list. Clients add them because of human decisions. Robots discover them mechanically and add them. So, clients should act

Re: Don't Aggregrate Me

2005-08-25 Thread Henry Story
Yes, I see how one is meant to look at it. But I can imagine desktop aggregators becoming more independent when searching for information... Perhaps at that point they should start reading robots.txt... Henry On 25 Aug 2005, at 23:12, Walter Underwood wrote: I would call desktop clients

Re: Don't Aggregrate Me

2005-08-25 Thread James M Snell
Walter Underwood wrote: --On August 25, 2005 3:43:03 PM -0400 Karl Dubost <[EMAIL PROTECTED]> wrote: Le 05-08-25 à 12:51, Walter Underwood a écrit : /robots.txt is one approach. Wouldn't hurt to have a recommendation for whether Atom clients honor that. Not many honor it.

Re: Don't Aggregrate Me

2005-08-25 Thread Roger B.
> Mhh. I have not looked into this. But is not every desktop aggregator > a robot? Henry: Depends on who you ask. (See the Newsmonster debates from a couple years ago.) Right now, I obey all wildcard and/or my-user-agent-specific directives I find in robots.txt. If I were writing a desktop app,

Re: Don't Aggregrate Me

2005-08-25 Thread Walter Underwood
I would call desktop clients "clients" not "robots". The distinction is how they add feeds to the polling list. Clients add them because of human decisions. Robots discover them mechanically and add them. So, clients should act like browsers, and ignore robots.txt. Robots.txt is not very widely

Re: Don't Aggregrate Me

2005-08-25 Thread Walter Underwood
--On August 25, 2005 3:43:03 PM -0400 Karl Dubost <[EMAIL PROTECTED]> wrote: > Le 05-08-25 à 12:51, Walter Underwood a écrit : >> /robots.txt is one approach. Wouldn't hurt to have a recommendation >> for whether Atom clients honor that. > > Not many honor it. I'm not surprised. There seems to b

Re: Don't Aggregrate Me

2005-08-25 Thread Henry Story
Mhh. I have not looked into this. But is not every desktop aggregator a robot? Henry On 25 Aug 2005, at 22:18, James M Snell wrote: At the very least, aggregators should respect robots.txt. Doing so would allow publishers to restrict who is allowed to pull their feed. - James

Re: Don't Aggregrate Me

2005-08-25 Thread James M Snell
Bob Wyman wrote: Karl Dubost wrote: One of my reasons which worries me more and more, is that some aggregators, bots do not respect the Creative Common license (or at least the way I understand it). Your understanding of Creative Commons is apparently a bit non-optimal -- even

Re: Don't Aggregrate Me

2005-08-25 Thread Karl Dubost
Bob, Thanks for the explanation. Much appreciated. Le 05-08-25 à 15:59, Bob Wyman a écrit : Karl Dubost wrote: One of my reasons which worries me more and more, is that some aggregators, bots do not respect the Creative Common license (or at least the way I understand it). It is importa

RE: Don't Aggregrate Me

2005-08-25 Thread Bob Wyman
Karl Dubost wrote: > One of my reasons which worries me more and more, is that some > aggregators, bots do not respect the Creative Common license (or > at least the way I understand it). Your understanding of Creative Commons is apparently a bit non-optimal -- even though many people seem

Re: Don't Aggregrate Me

2005-08-25 Thread Karl Dubost
Le 05-08-25 à 12:51, Walter Underwood a écrit : /robots.txt is one approach. Wouldn't hurt to have a recommendation for whether Atom clients honor that. Not many honor it. A while ago I had this list from http://varchars.com/blog/node/view/59 The Good BlogPulse NITLE Blog Spider

Re: Don't Aggregrate Me

2005-08-25 Thread Karl Dubost
Le 05-08-25 à 06:44, James Aylett a écrit : I like the use case, but I don't see why you would want to disallow aggregators to pull the feed. You might want it for many reasons. One of my reasons which worries me more and more, is that some aggregators, bots do not respect the Creative Co

Re: Don't Aggregrate Me

2005-08-25 Thread Antone Roundy
I can see reasonable uses for this, like marking a feed of local disk errors as not of general interest. "This is not published data" - Security by obscurity^H^H^H^H^H^H^H^H^H saying "please" - < http://www-cs-faculty.stanford.edu/~knuth/> (see the second lin

Re: Don't Aggregrate Me

2005-08-25 Thread Mark Nottingham
It works in both Safari and Firefox; it's just that that particular data: URI is a 1x1 blank gif ;) On 25/08/2005, at 9:37 AM, Henry Story wrote: On 25 Aug 2005, at 17:06, A. Pagaltzis wrote: * Henry Story <[EMAIL PROTECTED]> [2005-08-25 16:55]: Do we put base64 encoded stuff in ht

Re: Don't Aggregrate Me

2005-08-25 Thread A. Pagaltzis
* Henry Story <[EMAIL PROTECTED]> [2005-08-25 18:40]: > And it does not give me anything very interesting when I look at > it in either Safari or Firefox. Of course not – it’s the infamous transparent single-pixel GIF. :-) Regards, -- Aristotle Pagaltzis //

RE: Don't Aggregrate Me

2005-08-25 Thread Paul Hoffman
At 10:22 AM -0400 8/25/05, Bob Wyman wrote: James M Snell wrote: Does the following work? ... no I think it is important to recognize that there are at least two kinds of aggregator. The most common is the desktop "end-point" aggregator that consumes feeds from various sources

Re: Don't Aggregrate Me

2005-08-25 Thread Walter Underwood
I can see reasonable uses for this, like marking a feed of local disk errors as not of general interest. I would not be surprised to see RSS/Atom catch on for system monitoring. Search engines see this all the time -- just because it is HTML doesn't mean it is the primary content on the site. Log

Re: Don't Aggregrate Me

2005-08-25 Thread Henry Story
On 25 Aug 2005, at 17:06, A. Pagaltzis wrote: * Henry Story <[EMAIL PROTECTED]> [2005-08-25 16:55]: Do we put base64 encoded stuff in html? No: that is why there are things like !!! That really does exist?! Yes: http://www.ietf.org/rfc/rfc2397.txt But apparently only for very shor
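For reference, RFC 2397 "data" URLs inline the payload in the URI itself. The general form, followed by a small example whose decoded payload is just the text "Hello, World":

    data:[<mediatype>][;base64],<data>

    data:text/plain;base64,SGVsbG8sIFdvcmxk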

Re: Don't Aggregrate Me

2005-08-25 Thread Antone Roundy
On Thursday, August 25, 2005, at 08:16 AM, James M Snell wrote: Good points but it's more than just the handling of human-readable content. That's one use case but there are others. Consider, for example, if I was producing a feed that contained javascript and CSS styles that would other

Re: Don't Aggregrate Me

2005-08-25 Thread Antone Roundy
On Thursday, August 25, 2005, at 12:25 AM, James M Snell wrote: Up to this point, the vast majority of use cases for Atom feeds is the traditional syndicated content case. A bunch of content updates that are designed to be distributed and aggregated within Feed readers or online aggregators,

Re: Don't Aggregrate Me

2005-08-25 Thread James M Snell
A. Pagaltzis wrote: * James M Snell <[EMAIL PROTECTED]> [2005-08-25 16:20]: I dunno, I'm just kinda scratching my head on this wondering if there is any actual need here. My instincts are telling me no, but... Seems to me that your instincts are right. :-) I’m not sure why, in the

Re: Don't Aggregrate Me

2005-08-25 Thread A. Pagaltzis
* Henry Story <[EMAIL PROTECTED]> [2005-08-25 16:55]: > Do we put base64 encoded stuff in html? No: that is why there > are things like > :-) Regards, -- Aristotle Pagaltzis //

Re: Don't Aggregrate Me

2005-08-25 Thread A. Pagaltzis
* James M Snell <[EMAIL PROTECTED]> [2005-08-25 16:20]: > I dunno, I'm just kinda scratching my head on this wondering if > there is any actual need here. My instincts are telling me no, > but... Seems to me that your instincts are right. :-) I’m not sure why, in the scenarios you describe, it

Re: Don't Aggregrate Me

2005-08-25 Thread Henry Story
On 25 Aug 2005, at 15:45, Joe Gregorio wrote: On 8/25/05, James M Snell <[EMAIL PROTECTED]> wrote: Up to this point, the vast majority of use cases for Atom feeds is the traditional syndicated content case. A bunch of content updates that are designed to be distributed and aggregated wi

RE: Don't Aggregrate Me

2005-08-25 Thread Bob Wyman
James M Snell wrote: > Does the following work? > > ... > no > I think it is important to recognize that there are at least two kinds of aggregator. The most common is the desktop "end-point" aggregator that consumes feeds from various sources and then presents or processes them locall

Re: Don't Aggregrate Me

2005-08-25 Thread James M Snell
A. Pagaltzis wrote: * James M Snell <[EMAIL PROTECTED]> [2005-08-25 08:35]: I don't really want aggregators pulling and indexing that feed and attempting to display it within a traditional feed reader. Why, though? There’s no reason aggregators couldn’t at some point become more cap

Re: Don't Aggregrate Me

2005-08-25 Thread Joe Gregorio
On 8/25/05, James M Snell <[EMAIL PROTECTED]> wrote: > > Up to this point, the vast majority of use cases for Atom feeds is the > traditional syndicated content case. A bunch of content updates that > are designed to be distributed and aggregated within Feed readers or > online aggregators, etc.

Re: Don't Aggregrate Me

2005-08-25 Thread A. Pagaltzis
* James M Snell <[EMAIL PROTECTED]> [2005-08-25 08:35]: > I don't really want aggregators pulling and indexing that feed > and attempting to display it within a traditional feed reader. Why, though? There’s no reason aggregators couldn’t at some point become more capable of doing something usefu

Re: Don't Aggregrate Me

2005-08-25 Thread James Aylett
On Wed, Aug 24, 2005 at 11:25:12PM -0700, James M Snell wrote: > For example, suppose I build an application that depends on an Atom feed > containing binary content (e.g. a software update feed). I don't really > want aggregators pulling and indexing that feed and attempting to > display it