
Re: Atom feed refresh rates

2005-05-06 Thread Walter Underwood

--On May 5, 2005 10:53:48 AM -0700 John Panzer <[EMAIL PROTECTED]> wrote:
> 
> I assume an HTTP Expires header for Atom content will work and play well with
> caches such as the Google Accelerator (http://webaccelerator.google.com/). 
> I'd also guess that a syntax-level tag won't.  Is this important? 

The syntax-level tag is useful inside a client program with a cache.
It can reduce the number of requests at the source, rather than 
reducing them in the middle of the network at an HTTP cache.

There is extra benefit from putting that info into the HTTP headers,
because the HTTP cache is shared between multiple clients. The source
webserver sees one GET per HTTP cache instead of one GET per Atom client.

The syntax-level tag also provides a way for the feed author to specify the
info without depending on webserver-specific controls. It does depend on
some extra bit of software to take that info and put it in the HTTP
Expires or Cache-control headers.
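The "extra bit of software" described above could be quite small. Below is a minimal sketch, assuming the feed-level value is a time-to-live in minutes; the function name caching_headers is hypothetical and not part of any feed tool mentioned in this thread.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def caching_headers(ttl_minutes, now=None):
    """Translate a feed-level TTL (in minutes) into HTTP response headers."""
    # Allow the caller to inject "now" so the result is reproducible.
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(minutes=ttl_minutes)
    return {
        # HTTP dates use the RFC 1123 format with a literal "GMT" zone.
        "Expires": format_datetime(expires, usegmt=True),
        # max-age is expressed in seconds.
        "Cache-Control": f"max-age={ttl_minutes * 60}",
    }


headers = caching_headers(180, datetime(2005, 5, 5, 13, 0, tzinfo=timezone.utc))
```

Both headers are emitted here because the thread mentions both; a real deployment might choose only one of them.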

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Atom on portable wireless device (was: RE: Atom feed refresh rates)

2005-05-06 Thread Janne Jalkanen


> You've written on your blog that you want to see more "304"
> responses. Well, I would suggest that what you *really* should want is more
> "226" responses -- 226 is the success code for an RFC3229+feed GET
> operation.

I so agree.  226 support would be highly commendable for
everyone who wants to serve feeds for mobiles...  But considering the
status these days, even 304 would be good. *sigh*

Though, the real solution probably lies in notification protocols,
such as SIP.  Reduce the need for polling, do a proper SIP
subscribe-notify...  Of course, these are not solutions for current
devices.

/Janne

Re: Atom feed refresh rates

2005-05-05 Thread A. Pagaltzis


* Lance Lavandowska <[EMAIL PROTECTED]> [2005-05-04 19:00]:
> In the toy aggregator I wrote I played with a scheduler that
> tried to throttle itself based on the feed's response.

I believe that is the right way to do this, though your algorithm
is a little too simple IMHO. A better approach for Atom consumers
in absence of applicable HTTP headers would be to intelligently
calculate an average update interval based on atom:published /
atom:updated / etc.
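One way to read "calculate an average update interval" is the mean gap between consecutive entry timestamps. The sketch below assumes the atom:updated values have already been extracted as ISO 8601 strings with an explicit offset; average_update_interval is a hypothetical name, and a real aggregator would probably weight recent entries more heavily.

```python
from datetime import datetime, timedelta


def average_update_interval(timestamps):
    """Mean gap between consecutive entry timestamps, or None if fewer than two."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    if len(times) < 2:
        return None
    # The mean of the consecutive gaps equals the total span divided by the gap count.
    return (times[-1] - times[0]) / (len(times) - 1)


interval = average_update_interval([
    "2005-05-05T10:00:00+00:00",
    "2005-05-05T12:00:00+00:00",
    "2005-05-05T16:00:00+00:00",
])
```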

[ Of course, for RSS feeds without pubDate and any applicable
HTTP headers, the algorithm is a *lot* more complicated.
Something like exponential backoff with a low radix reducing (not
resetting) the backoff depending on the number of new items you
got would keep the update interval close to an ideal. ]
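The backoff described in the bracketed note can be sketched as follows. The radix of 1.5 and the floor and ceiling values are assumptions chosen for illustration, not figures from this thread, and next_interval is a hypothetical name.

```python
def next_interval(current, new_items, base=1.5, floor=600, ceiling=86400):
    """Return the next polling interval in seconds.

    No new items: back off by multiplying the interval by a low radix.
    New items: reduce the interval (rather than resetting it) by one
    radix step per new item seen.
    """
    if new_items == 0:
        proposed = current * base
    else:
        proposed = current / (base ** new_items)
    # Keep the result inside sane bounds.
    return max(floor, min(ceiling, proposed))
```

Dividing by the radix once per new item means a busy feed pulls the interval down quickly, while a single new item only undoes one backoff step.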

Furthermore I'd suggest not giving users any option to manually
change intervals - only a way to force a refresh immediately.
This is a usability win. When I started using an aggregator, I
never had any idea what value to realistically supply when the
aggregator asked me how often I wanted a feed to be refreshed.
How is my grandfather supposed to make an intelligent decision?
And why should he? The software has all the information it needs
to refresh about as often as it can expect to find new content.

This entire issue is just a matter of lazy aggregator
implementors, IMO.

Regards,
-- 
Aristotle

Re: Atom feed refresh rates

2005-05-05 Thread Mark Pilgrim

On 5/5/05, John Panzer <[EMAIL PROTECTED]> wrote:
> I assume an HTTP Expires header for Atom content will work and play well
> with caches such as the Google Accelerator
> (http://webaccelerator.google.com/).  I'd also guess that a syntax-level
> tag won't.  Is this important?

Yes, and yes.  This is exactly the sort of software that we're talking
about when we say that HTTP's native caching mechanism is widely
supported.  All the proxies in the world (which is what Google's Web
Accelerator is, except it runs on your own machine and listens on port
9100) are able to reduce network traffic and therefore make the end
user's experience faster because they understand and respect the HTTP
caching mechanism.  (Google Web Accelerator does other things too,
like proxying requests through Google's servers.  And what are those
servers running?  Another caching HTTP proxy.)  Many ISPs do this at
the ISP level, both to reduce their own upstream bandwidth costs and
to make their end users happier.  Many corporations do this as well (I
would bet good money that IBM does it).  At one time, I even had Squid
installed on my home network to do this. 

HTTP caching works.

> The HTML solution for people who could not implement Expires: seems to
> be META tags with in theory equivalent information.  Though in practice
> the whole thing is a mess, this seems like a conceptually simple
> workaround.  Is there something obviously wrong with it?

Other than being a God-awful mess?  No, there's nothing wrong with it. ;)

-- 
Cheers,
-Mark

Re: Atom feed refresh rates

2005-05-05 Thread John Panzer

I assume an HTTP Expires header for Atom content will work and play well 
with caches such as the Google Accelerator 
(http://webaccelerator.google.com/).  I'd also guess that a syntax-level 
tag won't.  Is this important? 

The HTML solution for people who could not implement Expires: seems to 
be META tags with in theory equivalent information.  Though in practice 
the whole thing is a mess, this seems like a conceptually simple 
workaround.  Is there something obviously wrong with it?

-John

Re: Atom feed refresh rates

2005-05-05 Thread Mark Pilgrim


On 5/5/05, Dan Brickley <[EMAIL PROTECTED]> wrote:
> [googles a bit] OK it looks like Gnutella also uses HTTP for the file
> download part of its protocol, fwiw. (including Range: header)
> http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf

You mean RSS's  element is even more useless than I thought?  I
didn't think that was possible.

-- 
Cheers,
-Mark

Re: Atom feed refresh rates

2005-05-05 Thread Dan Brickley

* Henri Sivonen <[EMAIL PROTECTED]> [2005-05-05 18:35+0300]
> 
> On May 5, 2005, at 16:24, Walter Underwood wrote:
> 
> >--On May 5, 2005 8:07:15 AM -0500 Mark Pilgrim <[EMAIL PROTECTED]> 
> >wrote:
> >>
> >>Not to be flippant, but we have one that's widely available.  It's
> >>called the Expires header.
> >
> >You need the information outside of HTTP. To quote from the RSS spec
> >for ttl:
> >
> >  This makes it possible for RSS sources to be managed by a 
> >file-sharing
> >  network such as Gnutella.
> >
> >Caching information is about knowing when your client cache is stale,
> >regardless of how you got the feed.
> 
> Virtually everyone with IP connectivity can do HTTP, and HTTP has the 
> Expires header. If this feature is important to you, why would you 
> switch to a transfer protocol that doesn't have the feature? (I am not 
> claiming anything about the actual Gnutella features.) To me, the "what 
> if the feed is not over HTTP" argumentation seems theoretical 
> over-generalization.

+1 

FWIW various P2P/filesharing protocols use HTTP, eg. Kazaa and others make 
use of HTTP's ability to request a byte range, handy if you're
requesting chunks of the same file from different servers. Those who
care to have HTTP header semantics show up in other environments can 
do various things (eg. reflect into an XML namespace). But it doesn't 
seem to me to be core business of the AtomPub WG to do this work...

[googles a bit] OK it looks like Gnutella also uses HTTP for the file 
download part of its protocol, fwiw. (including Range: header)
http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf

Dan

Re: Atom feed refresh rates

2005-05-05 Thread Tim Bray


Warning: we are into the end-game.  What really counts is the set of 
outstanding Paces.  When Paul and I are going through the list to 
figure out consensus calls, emails that don't have a Pace in the 
Subject line are apt to get ignored.

So I'm not sure this endless thread entitled "feed refresh rates" is 
doing anyone any good unless it can coalesce around a Pace. -Tim

Re: Atom feed refresh rates

2005-05-05 Thread Henri Sivonen

On May 5, 2005, at 16:24, Walter Underwood wrote:

> --On May 5, 2005 8:07:15 AM -0500 Mark Pilgrim <[EMAIL PROTECTED]>
> wrote:
>>
>> Not to be flippant, but we have one that's widely available.  It's
>> called the Expires header.
>
> You need the information outside of HTTP. To quote from the RSS spec
> for ttl:
>
>   This makes it possible for RSS sources to be managed by a
>   file-sharing network such as Gnutella.
>
> Caching information is about knowing when your client cache is stale,
> regardless of how you got the feed.

Virtually everyone with IP connectivity can do HTTP, and HTTP has the
Expires header. If this feature is important to you, why would you
switch to a transfer protocol that doesn't have the feature? (I am not
claiming anything about the actual Gnutella features.) To me, the "what
if the feed is not over HTTP" argumentation seems theoretical
over-generalization.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/

Re: Atom feed refresh rates

2005-05-05 Thread James Aylett

On Thu, May 05, 2005 at 08:07:15AM -0500, Mark Pilgrim wrote:

> Not to be flippant, but we have one that's widely available.  It's
> called the Expires header.  I spoke with Roy Fielding at Apachecon
> 2003 and asked him this exact question: "If I set an Expires header on
> a feed of now + 3 hours, does that mean that I don't want the client
> to fetch the feed again for at least 3 hours?"  And he said yes,
> that's exactly what it means.

I think the problem here may be that the HTTP/1.1 spec gives the
impression that the Expires header is not designed to affect end
clients (user agents).

For instance, from 13.2.1 ("Server-Specified Expiration"), we get the
sentence:

"The expiration mechanism applies only to responses taken from a cache
and not to first-hand responses forwarded immediately to the
requesting client."

Now many clients themselves contain caches, but this distinction may
still be the source of some confusion, especially as the number of
people who know about the distinction (by having written a user agent)
compared to the number who are affected by it (by writing server
components) is pretty small.
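For a client that keeps its own cache, the check implied here is small. The following is a sketch only, assuming the client stored the Expires header alongside the cached feed; is_fresh is a hypothetical helper name, and it ignores Cache-Control, which a full implementation would consult first.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header, now=None):
    """Return True if a cached response may be reused without refetching."""
    now = now or datetime.now(timezone.utc)
    try:
        expires = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        # Unparseable values such as "0" are treated as already expired.
        return False
    if expires.tzinfo is None:
        # Dates without a usable zone are assumed to be UTC.
        expires = expires.replace(tzinfo=timezone.utc)
    return now < expires
```

Treating an unparseable header as "already expired" errs on the side of refetching rather than serving stale content.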

James

-- 
/--\
  James Aylett  xapian.org
  [EMAIL PROTECTED]   uncertaintydivision.org

Re: Atom feed refresh rates

2005-05-05 Thread Mark Pilgrim

On 5/5/05, Walter Underwood <[EMAIL PROTECTED]> wrote:
> You need the information outside of HTTP. To quote from the RSS spec
> for ttl:
> 
>   This makes it possible for RSS sources to be managed by a file-sharing
>   network such as Gnutella.

Ignoring, for the moment, that this is a horrible idea and no one
supports it, Gnutella has its own caching and time-to-live mechanisms
that the RSS spec is ignoring.

-- 
Cheers,
-Mark

RE: Atom feed refresh rates

2005-05-05 Thread Andy Henderson


>>>You seem to want the ttl element so that you have the publisher's
permission to check less often. Why not just do so anyway if it causes so
many problems? If that degrades the user experience too much, you're free to
check more often. How is the ttl element useful to you?<<<

I allow anyone to specify any refresh interval higher than the greater of
ttl or 60 minutes.

The ttl allows me to extend the minimum refresh interval beyond 60 minutes.
'MSDN just published' at http://msdn.microsoft.com/rss.xml includes
a ttl of 1440.  I therefore set the refresh interval to 1 day when the
feed is added and I do not allow people to specify a lower refresh interval.

If the ttl tag simply described the minimum refresh interval, I would also
use it to allow people to specify refresh intervals less than 60 minutes
knowing that was acceptable to the feed provider.  Unfortunately, the
genesis of the ttl tag means that lower ttl values are unreliable.  The BBC,
for example, specifies a ttl of 5 which I'm sure refers to that tag's
original use, not a minimum refresh interval.
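The policy described in this message (never refresh more often than the greater of ttl or 60 minutes) can be sketched as below. The function and parameter names are hypothetical; only the 60-minute floor and the example ttl values come from the message.

```python
def effective_interval(requested_minutes, ttl_minutes=None, floor_minutes=60):
    """Clamp a user's requested refresh interval to the allowed minimum."""
    # The minimum is the greater of the feed's ttl (if any) and the fixed floor.
    minimum = max(ttl_minutes or 0, floor_minutes)
    return max(requested_minutes, minimum)
```

With this rule a ttl of 5, like the BBC value mentioned above, has no effect: the 60-minute floor still applies.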

Andy

Re: Atom feed refresh rates

2005-05-05 Thread Walter Underwood

--On May 5, 2005 8:07:15 AM -0500 Mark Pilgrim <[EMAIL PROTECTED]> wrote:
>
> Not to be flippant, but we have one that's widely available.  It's
> called the Expires header. 

You need the information outside of HTTP. To quote from the RSS spec
for ttl:

  This makes it possible for RSS sources to be managed by a file-sharing 
  network such as Gnutella. 

Caching information is about knowing when your client cache is stale,
regardless of how you got the feed.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Atom feed refresh rates

2005-05-05 Thread Graham

On 5 May 2005, at 1:54 pm, Andy Henderson wrote:
> I've no way to demonstrate aggregator use except my own - even though I
> support a tiny community I do observe and enforce the ttl tag.

You seem to want the ttl element so that you have the publisher's
permission to check less often. Why not just do so anyway if it
causes so many problems? If that degrades the user experience too
much, you're free to check more often. How is the ttl element useful
to you?

Graham

RE: Atom feed refresh rates

2005-05-05 Thread Walter Underwood

--On May 5, 2005 8:15:10 AM +0100 Andy Henderson <[EMAIL PROTECTED]> wrote:
>
> There is no RSS2 feature I can see that allows feed providers to tell
> aggregators the minimum refresh period.  There's the ttl tag.  That was, I
> believe, introduced for a different purpose and determines the Maximum time
> a feed should be cached in a certain situation. 

We need both a ttl (max-age) and expires. One or the other is appropriate
for different publishing needs. We also need to specify what you do with
those values, or you end up with a mess, like the RSS2 ttl meaning reversing
over an undocumented value (Yikes!).

> What has yet to be tried is a specific tag in the core feed standard that
> promotes and determines good behaviour for aggregators refreshing their
> feeds.  Even if it were to prove only a limited benefit, it would still be a
> benefit.

It has been tried several ways, originally in robots.txt extensions and
also in RSS. It doesn't work. The model is not rich enough for publishers
or for spiders/aggregators.

Max-age/expires is already designed and proven. By page count, 20% of the
HTTP 1.1 spec is about caching. If we want to write a new caching/scheduling
approach, we can expect it to be a 20 page spec, plus an additional 10
pages on how to work with the HTTP model.

See the Notes section here for details on when to use max-age or expires,
and on the problems with calendar-based schemes.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Atom feed refresh rates

2005-05-05 Thread Mark Pilgrim

On 5/5/05, Andy Henderson <[EMAIL PROTECTED]> wrote:
> convincing the WG, I would simply point out that a mechanism widely
> available to, and understood by, feed providers and aggregators cannot do
> harm and has the potential to do a great deal of good.

Not to be flippant, but we have one that's widely available.  It's
called the Expires header.  I spoke with Roy Fielding at Apachecon
2003 and asked him this exact question: "If I set an Expires header on
a feed of now + 3 hours, does that mean that I don't want the client
to fetch the feed again for at least 3 hours?"  And he said yes,
that's exactly what it means.

I sympathize with your dilemma that you have no control over your HTTP
headers, but... wait, no I don't sympathize.  At all.

-- 
Cheers,
-Mark

Re: Atom feed refresh rates

2005-05-05 Thread Julian Reschke

Andy Henderson wrote:
> >>>You can do that with the "Expires" response header. Every time the
> resource is requested, serve it with a value of "now +
> minimumrefreshinterval".<<<
>
> Ah.  I see what you mean.  Thank you.
>
> The problem is that when you say "You can do that now with the "Expires"
> response header" - I can't.  It's a theoretical capability I have, but not a
> practical one.
>
> I am directly responsible for three feeds.  One is a feed associated with my
> aggregator.  It's a simple xml file stored on a shared server along with the
> rest of my web site.  I have no access to any HTTP headers.  When
> aggregators access my feed, no code of mine runs - the transaction is
> handled by the server alone.  The other two are feeds generated on the fly
> from a back-end database; again they are running on a shared server and,
> again, my development tool (IBM's Domino) gives me no access to set the
> Expires header.
>
> I just used Microsoft's Fiddler tool to check all the feeds I subscribe to
> (not a scientific sample, I admit, but it's a pretty broad mix and includes
> blog sites and blogging tools) and just two provide Expires headers.  One is
> the BBC, the other is Wired.  Both set Expires to expire immediately.  I'm
> guessing they have good reason to do that.  I re-subscribed to Slashdot
> (which has implemented draconian bandwidth throttling measures) and it
> doesn't use Expires headers.
>
> So, Expires is a measure that I could use in theory but is not available to
> me either directly or, apparently, via third party blogging sites/tools.
> When I look at best practice, I find Expires is either not used or is used
> in a different way.
>
> Both from a provider viewpoint and from an aggregator viewpoint, Expires
> does not seem a practical option.

Well, we're designing a feed format here. When this feed is served through
HTTP, (re-)using the caching features of HTTP will ensure that any
standard HTTP client will take advantage of it. For instance, if you use
an HTTP client component that maintains its own cache, it will
automatically do the right thing. Also, when you're accessing the feed
through an HTTP proxy, you will get copies from the proxy's cache when
available.

I just checked and Apache allows you to set the "Expires" header through
"mod_expires". Lotus Domino seems to do it through a thing called
"Web Site Rule". I'm sure you can do it with other packages as well.

On the other hand, of the feeds you checked, how many did actually
implement the corresponding RSS feature?

If you can demonstrate that lots of feeds use this feature, and that
aggregators indeed pay attention to it, you may be able to convince the
WG that Atom needs this to achieve feature parity.

Best regards, Julian

RE: Atom feed refresh rates

2005-05-05 Thread Andy Henderson


>>>we're designing a feed format here. When this feed is served through
HTTP, (re-)using the caching features of HTTP will ensure that any standard
HTTP client will take advantage of it. For instance, if you use an HTTP
client component that maintains its own cache, it will automatically do the
right thing. Also, when you're accessing the feed through an HTTP proxy, you
will get copies from the proxy's cache when available.<<<

Understood.  My issue is that creating the headers is outside the capability
of many (most?) feed providers.

>>>I just checked and Apache allows you to set the "Expires" header through
"mod_expires".<<<

I'm sure you're right, but it would mean little to most feed providers
(including me).

>>>Lotus Domino seems to do it through a thing called "Web Site Rule".<<<

DWA is a sub-feature of the mail client so not helpful, I'm afraid.  Thanks
for looking into it, though.

>>>I'm sure you can do it with other packages as well.<<<

If it was available at the blogging package level, I'm sure people that use
those packages would use the feature.  The fact few feeds seem to use the
Expires header, and those that do use it to immediately expire, seems to
indicate an issue (proxy caching?).

>>>On the other hand, of the feeds you checked, how many did actually
implement the corresponding RSS feature?<<<

Out of 33 RSS.9/2 feeds, 16 have ttl tags.  Syndic8 reports that 16,840 RSS
feeds use it.  It says that's 7% of the total.  The actual percentage is
better than that because I believe ttl is RSS2 only; it's certainly not
RSS1.

That's pretty high considering that ttl is flawed because it was not
originally designed to communicate minimum refresh intervals.

>>>If you can demonstrate that lots of feeds use this feature, and that
aggregators indeed pay attention to it, you may be able to convince the WG
that Atom needs this to achieve feature parity.<<<

I've no way to demonstrate aggregator use except my own - even though I
support a tiny community I do observe and enforce the ttl tag.  I'm sure
that if there were a clearly-defined tag supported by Mark's implementation
document, usage would significantly improve over RSS2 ttl levels.  As for
convincing the WG, I would simply point out that a mechanism widely
available to, and understood by, feed providers and aggregators cannot do
harm and has the potential to do a great deal of good.  It seems to me to be
a useful opportunity to demonstrate a clear improvement over both RSS1 and
RSS2.

Andy

RE: Atom feed refresh rates

2005-05-05 Thread Andy Henderson


>>>You can do that with the "Expires" response header. Every time the 
resource is requested, serve it with a value of "now +
minimumrefreshinterval".<<<

Ah.  I see what you mean.  Thank you.

The problem is that when you say "You can do that now with the "Expires"
response header" - I can't.  It's a theoretical capability I have, but not a
practical one.

I am directly responsible for three feeds.  One is a feed associated with my
aggregator.  It's a simple xml file stored on a shared server along with the
rest of my web site.  I have no access to any HTTP headers.  When
aggregators access my feed, no code of mine runs - the transaction is
handled by the server alone.  The other two are feeds generated on the fly
from a back-end database; again they are running on a shared server and,
again, my development tool (IBM's Domino) gives me no access to set the
Expires header.

I just used Microsoft's Fiddler tool to check all the feeds I subscribe to
(not a scientific sample, I admit, but it's a pretty broad mix and includes
blog sites and blogging tools) and just two provide Expires headers.  One is
the BBC, the other is Wired.  Both set Expires to expire immediately.  I'm
guessing they have good reason to do that.  I re-subscribed to Slashdot
(which has implemented draconian bandwidth throttling measures) and it
doesn't use Expires headers.

So, Expires is a measure that I could use in theory but is not available to
me either directly or, apparently, via third party blogging sites/tools.
When I look at best practice, I find Expires is either not used or is used
in a different way.

Both from a provider viewpoint and from an aggregator viewpoint, Expires
does not seem a practical option.

Andy

RE: Atom feed refresh rates

2005-05-05 Thread Andy Henderson


 >>>Actually, as I recall, last time this came up (proposed by Walter 
>>>Underwood), someone pointed out accurately that RSS2 has had this 
>>>functionality for a long time and that nobody ever really implemented 
>>>it; thus there was a strong vote from experience against such a 
>>>feature. -Tim<<<

There is no RSS2 feature I can see that allows feed providers to tell
aggregators the minimum refresh period.  There's the ttl tag.  That was, I
believe, introduced for a different purpose and determines the Maximum time
a feed should be cached in a certain situation.  The tag's usage has morphed
over time and, if more than 60 minutes, I assume it's a Minimum time; but
it's not surprising if feed providers are wary of using the tag in this way.

What has yet to be tried is a specific tag in the core feed standard that
promotes and determines good behaviour for aggregators refreshing their
feeds.  Even if it were to prove only a limited benefit, it would still be a
benefit.

Andy

Atom on portable wireless device (was: RE: Atom feed refresh rates)

2005-05-04 Thread Bob Wyman

Chris DeSalvo wrote:
> As the author of an aggregator app for a portable wireless device I 
> can tell you that this is a serious problem for this class of products.
You didn't list support for RFC3229+feed[1,2] as one of the things
you are doing. This would help you drastically reduce the bandwidth needed
when you find a feed that actually has new content. If you use RFC3229+feed
to pull a feed, then you will only get the new entries in the feed -- not
ones that you've copied over before. It's one step beyond If-None-Match,
etc.

But, the real problem with your approach is that you have apparently
coded the device so that it goes out and polls large numbers of feeds. This
doesn't make sense. For a portable wireless device with limited bandwidth
and limited connectivity, you should be accessing feeds via an intermediary
"proxy" that gathers up all your updates into a *single* feed. That feed
should be served using RFC3229+feed to ensure that you only copy from it the
updated entries since you last pulled from it. Of course, it would also make
sense to support compression on the results. There is no more efficient
mechanism for polling for feeds from the kind of device you describe.

You say that you're reading about 20MB per day but you're only able
to harvest 2MB of "fresh" data from it? This 1/10 harvesting yield is
actually pretty normal when polling RSS/Atom feeds served without
RFC3229+feed. If you used RFC3229+feed, you would find that your yield would
start to approach 100% rather than the 10% you are at now. Additionally,
given the efficiencies here, you would be able to increase your polling
frequency almost arbitrarily without significantly increasing the bandwidth
consumption of your system. Thus, you could cut latency below the average of
30 minutes which is implied by a polling frequency of 1 hour.

You've written on your blog that you want to see more "304"
responses. Well, I would suggest that what you *really* should want is more
"226" responses -- 226 is the success code for an RFC3229+feed GET
operation.
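A client-side sketch of the request described above: RFC 3229 defines the A-IM request header and the 226 (IM Used) status, and the "feed" manipulation name comes from the RFC3229+feed proposal referenced in [1,2]. The helper names and the returned labels are hypothetical.

```python
def delta_request_headers(etag=None):
    """Build request headers asking the server for only the new entries."""
    headers = {
        "A-IM": "feed",             # accept the "feed" instance manipulation
        "Accept-Encoding": "gzip",  # compression, as also suggested above
    }
    if etag:
        # The ETag from the previous fetch tells the server what we already have.
        headers["If-None-Match"] = etag
    return headers


def interpret_status(status):
    """Classify the response to a delta-enabled GET."""
    if status == 226:
        return "delta"      # body holds only entries newer than our copy
    if status == 304:
        return "unchanged"  # nothing new; keep the cached copy
    if status == 200:
        return "full"       # server ignored A-IM and sent the whole feed
    return "error"
```

A server that does not understand A-IM simply answers 200 or 304, so sending the header should be safe against servers without delta support.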

bob wyman

[1] http://bobwyman.pubsub.com/main/2004/10/massive_bandwid.html
[2] http://bobwyman.pubsub.com/main/2004/09/using_rfc3229_w.html

 Original Message ==
In my app I've implemented every trick in the book to try and reduce the 
amount of data that I have to pull through the radio and parse.  I use 
If-None-Match and If-Modified-Since headers in my requests, I support 
compression, I respect caching hints from the servers.  It doesn't help 
in all cases.  I have 112 feeds loaded up in my aggregator and only 74 of 
the servers hosting those feeds ever return a 304.  The rest give me a 200 
and gladly hand me everything regardless of whether it has changed or 
not.  17 of the servers don't bother supplying an ETag header.

My feed list amounts to about 20 MB of data per day when polling once 
per hour.  That is a lot of air time for a small radio, and a lot of time 
spent grinding in an XML parser for a small CPU.  This is especially 
upsetting because by my measurements only about 2 MB of data is fresh 
for any given day.  The main hit is in battery life - the above stats 
can trivially knock HOURS off of the life of a small battery.

I've written extensively about this problem here:

with a real-world example studied here:

So, I guess I'd like to see an optional update-frequency hint element.

Thanks,
Chris

Re: Atom feed refresh rates

2005-05-04 Thread Robert Sayre

On 5/4/05, Chris DeSalvo <[EMAIL PROTECTED]> wrote:

> 
> If the feed provided a hint for a reasonable polling frequency, it
> would be a plus for limited-resource devices.  I hate to suggest that
> the format be changed as a prophylactic measure against bad-citizen
> servers, but that is the problem that I have to solve for my platform
> and applications. 

No one is denying the existence of the problem you're describing.
However, this WG has consistently decided that an optional XML
element of the kind you're describing wouldn't solve the problem.
Essentially, we'd be trading one evangelism problem for another.

Robert Sayre

Re: Atom feed refresh rates

2005-05-04 Thread Lance Lavandowska

On 5/4/05, Roger B. <[EMAIL PROTECTED]> wrote:
> 
> That's not to say that there's something necessarily wrong with an
> aggregator that allows users to pull feeds every five minutes. If

In the toy aggregator I wrote I played with a scheduler that tried to
throttle itself based on the feed's response.  That is to say it
started polling every ten minutes.  If the feed returned a 304 (or the
corresponding ETag "I haven't changed") then it extended that to every
20.  Then 30...  The problem I had was deciding what the maximum
should be (1 hour? 2? 24?).  Upon getting a 'fresh' feed it reset the
interval to 10 minutes and started over again.
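The scheduler described above can be sketched as follows. The 10-minute step comes from the message; the 120-minute maximum is an arbitrary assumption, since the message leaves that question open, and the class name is hypothetical.

```python
class LinearBackoff:
    """Polling interval that grows by a fixed step while a feed is unchanged."""

    def __init__(self, step=10, maximum=120):
        self.step = step          # minutes added after each unchanged poll
        self.maximum = maximum    # assumed cap; the message leaves this open
        self.interval = step      # start by polling every `step` minutes

    def record(self, changed):
        """Update and return the interval (minutes) after one poll."""
        if changed:
            # A fresh feed resets the schedule to the starting interval.
            self.interval = self.step
        else:
            self.interval = min(self.interval + self.step, self.maximum)
        return self.interval
```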

I'm certain I got this idea from someone else, but don't recall who
originated the idea.

Lance Lavandowska

Re: Atom feed refresh rates

2005-05-04 Thread Chris DeSalvo

I do not disagree.  I just wanted to get my $0.02 in for completeness.  
I'm happy as a clam with atom as it is now.

-chris
On May 4, 2005, at 12:52 PM, Robert Sayre wrote:
> No one is denying the existence of the problem you're describing.
> However, this WG has consistently decided that an optional XML
> element of the kind you're describing wouldn't solve the problem.
> Essentially, we'd be trading one evangelism problem for another.

Re: Atom feed refresh rates

2005-05-04 Thread Chris DeSalvo

On May 4, 2005, at 11:35 AM, Graham wrote:
> On 4 May 2005, at 7:11 pm, Chris DeSalvo wrote:
>> My feed list amounts to about 20 MB of data per day when polling once
>> per hour.  That is a lot of air time for a small radio, and a lot of
>> time spent grinding in an XML parser for a small CPU.  This is
>> especially upsetting because by my measurements only about 2 MB of
>> data is fresh for any given day.  The main hit is in battery life –
>> the above stats can trivially knock HOURS off of the life of a small
>> battery.
>
> So you're saying the first smartphone aggregator that uses a gateway
> server to move the heavy lifting off of the device is going to clean
> up the market. What's this got to do with Atom?
>
>> So, I guess I'd like to see an optional update-frequency hint element.
>
> Why?

If the feed provided a hint for a reasonable polling frequency, it 
would be a plus for limited-resource devices.  I hate to suggest that 
the format be changed as a prophylactic measure against bad-citizen 
servers, but that is the problem that I have to solve for my platform 
and applications.  In case anyone cares, this is for the T-Mobile 
Sidekick.  I work at Danger, Inc, the developer of the OS and hardware. 
I work on the OS and applications.

-chris
p.s.  And yes, someone providing a good gateway, with a snazzy push 
protocol would make my life a lot easier.

Re: Atom feed refresh rates

2005-05-04 Thread Graham

On 4 May 2005, at 7:11 pm, Chris DeSalvo wrote:
> My feed list amounts to about 20 MB of data per day when polling
> once per hour.  That is a lot of air time for a small radio, and a
> lot of time spent grinding in an XML parser for a small CPU.  This is
> especially upsetting because by my measurements only about 2 MB of
> data is fresh for any given day.  The main hit is in battery life –
> the above stats can trivially knock HOURS off of the life of a
> small battery.

So you're saying the first smartphone aggregator that uses a gateway
server to move the heavy lifting off of the device is going to clean
up the market. What's this got to do with Atom?

> So, I guess I'd like to see an optional update-frequency hint element.

Why?

Graham

Re: Atom feed refresh rates

2005-05-04 Thread Chris DeSalvo

On May 4, 2005, at 3:44 AM, Brett Lindsley wrote:
Andy, I recall bringing up the same issue with respect to portable 
devices. My angle
was that firing up the transmitter, making a network connection and 
connecting to
the server is still an expensive operation in time and power (for a 
portable
device) - even if the server returns nothing .  There is no reason to 
check feeds
that are not being updated, but then, there currently is no way to 
know this.
As the author of an aggregator app for a portable wireless device I can 
tell you that this is a serious problem for this class of products.  In 
my app I've implemented every trick in the book to try and reduce the 
amount of data that I have to pull through the radio and parse.  I use 
If-None-Match and If-Modified-Since headers in my requests, I support 
compression, I respect caching hints from the servers.  It doesn't help 
in all cases.  I have 112 feeds loaded up in my aggregator and only 74 of the 
servers hosting those feeds ever return a 304.  The rest give me a 200 
and gladly hand me everything regardless of whether it has changed or 
not.  17 of the servers don't bother supplying an ETag header.
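
A minimal sketch of the conditional GET logic described above, in Python. The `CachedFeed` type and both function names are hypothetical, and the response headers are assumed to arrive as a plain dict with canonical capitalisation. The byte comparison is an assumed workaround for servers that always answer 200: it cannot save radio time, but it can skip the XML parse.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedFeed:
    """Validators and body remembered from the last fetch (hypothetical type)."""
    etag: Optional[str] = None
    last_modified: Optional[str] = None
    body: bytes = b""


def conditional_headers(cached: CachedFeed) -> dict:
    """Build request headers that let the server answer 304 Not Modified."""
    headers = {"Accept-Encoding": "gzip"}
    if cached.etag:
        headers["If-None-Match"] = cached.etag
    if cached.last_modified:
        headers["If-Modified-Since"] = cached.last_modified
    return headers


def apply_response(cached: CachedFeed, status: int, headers: dict, body: bytes) -> bool:
    """Update the cache; return True only if the feed must be parsed again."""
    if status == 304:
        return False  # nothing changed, nothing to parse
    if status == 200:
        cached.etag = headers.get("ETag")
        cached.last_modified = headers.get("Last-Modified")
        changed = body != cached.body  # catches servers that never send 304
        cached.body = body
        return changed
    return False  # other statuses: keep the old copy (assumption)
```

A server that ignores the validators still costs the full download, which is the case the message complains about; the sketch only avoids the parsing cost there.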

My feed list amounts to about 20 MB of data per day when polling once 
per hour.  That is a lot of air time for a small radio, and a lot of time 
spent grinding in an XML parser for a small CPU.  This is especially 
upsetting because by my measurements only about 2 MB of data is fresh 
for any given day.  The main hit is in battery life – the above stats 
can trivially knock HOURS off of the life of a small battery.
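
The figures above can be turned into a rough per-request cost. This is only a back-of-the-envelope sketch in Python: it assumes the 112 feeds mentioned earlier are the whole feed list, that every feed is polled 24 times a day, and that 1 MB is 1024 KB.

```python
FEEDS = 112            # feeds in the aggregator (from the message)
POLLS_PER_DAY = 24     # hourly polling
TOTAL_MB = 20.0        # data pulled per day
FRESH_MB = 2.0         # data that was actually new

requests_per_day = FEEDS * POLLS_PER_DAY
wasted_fraction = (TOTAL_MB - FRESH_MB) / TOTAL_MB
avg_kb_per_request = TOTAL_MB * 1024 / requests_per_day

print(f"{requests_per_day} requests/day")
print(f"{wasted_fraction:.0%} of the transfer is redundant")
print(f"about {avg_kb_per_request:.1f} KB per request on average")
```

Under those assumptions, roughly nine tenths of the daily transfer carries nothing new.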

I've written extensively about this problem here:

with a real-world example studied here:

So, I guess I'd like to see an optional update-frequency hint element.
Thanks,
Chris

Re: Atom feed refresh rates

2005-05-04 Thread Roger B.


> This is a myth perpetuated by cheapskate bloggers. There's no
> technical reason for it beyond "I bought a lousy hosting package".

Graham: I disagree. In a time where referrer and trackback spam agents
are hammering servers everywhere, it's quite reasonable for aggregator
developers to exhibit restraint and not add to the burden that the
blogosphere has unintentionally created.

That's not to say that there's something necessarily wrong with an
aggregator that allows users to pull feeds every five minutes. If
you're building something for people who are going to be subscribing
to Gmail feeds or referrer logs (I'm subscribed to both in
Newzcrawler), then you have to cater to those needs. The most anyone
can ask is that you provide reasonable defaults and leave it at that.

But I've got my own code set to limit refreshes to an hour or more,
and don't foresee changing it. It's the right thing for *me* to do.

--
Roger Benningfield

Re: Atom feed refresh rates

2005-05-04 Thread Tim Bray

On May 4, 2005, at 7:44 AM, Graham wrote:
> A quick look at that site turned up only one other site actually
> complaining, MSDN, and they changed their minds:

Actually, as I recall, last time this came up (proposed by Walter 
Underwood), someone pointed out accurately that RSS2 has had this 
functionality for a long time and that nobody ever really implemented 
it; thus there was a strong vote from experience against such a 
feature. -Tim

Re: Atom feed refresh rates

2005-05-04 Thread Eric Scheid


On 5/5/05 12:44 AM, "Graham" <[EMAIL PROTECTED]> wrote:

> uses 3GB a day, or about $1.20 at current prices.

only in some parts of the world.

over here I'm paying 13.2 cents per K and reading from a recent bill
2,982.61 Kbytes cost me $393.79 AUD.

e.

Re: Atom feed refresh rates

2005-05-04 Thread Robert Sayre

On 5/4/05, Walter Underwood <[EMAIL PROTECTED]> wrote:
> 
> PaceCaching uses the HTTP model for Atom, whether Atom is used over HTTP
> or some other protocol.
> 
> PaceCaching was rejected by the editors because it was too late (two months
> ago) and non-core.

In this WG, the editors don't reject proposals or schedule issues.
Those tasks fall to the chairs and secretary, respectively.

Robert Sayre

Re: Atom feed refresh rates

2005-05-04 Thread Graham

On 4 May 2005, at 2:11 pm, Andy Henderson wrote:
>> But it's the only example I know of, of a large scale site that's
>> complained. Note they've never (as far as I know) said that it was
>> costing them money or causing them other problems, they've just
>> arbitrarily complained about "poorly written" programs and decreed how
>> often people may check their feed. There's no evidence of a technical
>> problem here, just a bunch of uptight twunts telling other people how
>> to write their software.
>
> I'm no expert on the provider side, but I believe it to be a wider
> issue. There was some discussion at http://regsched.bookinfo.info/ and
> some attempts to implement bandwidth throttling but it petered out
> because, it seemed to me, the problem was very difficult to solve from
> the provider's end.

A quick look at that site turned up only one other site actually  
complaining, MSDN, and they changed their minds:

Scoble's post was widely debated by bloggers before being corrected  
by Microsoft's Sara Williams. "In a nutshell, our RSS traffic is  
negligible compared to all the traffic generated by Windows Update,  
MSN, downloads, and the rest of microsoft.com," wrote Williams. "We  
were motivated to reduce the size of the blogs.msdn.com home page  
primarily for operational efficiency's sake."

According to that site, Boing Boing's RSS feed (one of the most  
subscribed to) uses 3GB a day, or about $1.20 at current prices.  
Bandwidth is only going to get cheaper, and designing a system that  
might save a few gigs is going to be about as relevant in the future  
as the LOWSRC IMG tag. I think the forward looking thing to do would  
be for Atom to ignore this issue.

Graham

Re: Atom feed refresh rates

2005-05-04 Thread Walter Underwood


PaceCaching uses the HTTP model for Atom, whether Atom is used over HTTP
or some other protocol.

PaceCaching was rejected by the editors because it was too late (two months
ago) and non-core. I think that: a) it is never too late to get it right,
and b) scalability is core.

The PACE describes why refresh rates do not solve the problem adequately.

wunder

--On May 4, 2005 5:44:18 AM -0500 Brett Lindsley <[EMAIL PROTECTED]> wrote:

> 
> 
> Andy, I recall bringing up the same issue with respect to portable devices. 
> My angle
> was that firing up the transmitter, making a network connection and 
> connecting to
> the server is still an expensive operation in time and power (for a portable
> device) - even if the server returns nothing .  There is no reason to check 
> feeds
> that are not being updated, but then, there currently is no way to know this.
> 
> I recall there was a proposal on cache control. That seemed like a good 
> direction,
> but I don't recall it being discussed. As you indicated, if the feed had some
> element that indicated it won't be updated (for example) for another day (e.g.
> a "daily news summary"), then the end client would need to only check once
> a day.
> 
> Brett Lindsley, Motorola Labs
> 
> Andy Henderson wrote:
> 
>> If I'm asking this in the wrong place, sorry; please redirect me if you can.
>> 
>> I am the author of an Aggregator and I'm looking for advice on refresh
>> rates.  There was some discussion in this group back in June about a
>> possible 'Refresh rate' element.  That seems to have been dismissed in
>> favour of bandwidth throttling techniques, notably etag, last-modified and
>> compression.  I already support all these plus some additional ones.  I am
>> uncomfortable, though, with the implication that refresh rates don't matter
>> and should be left to the end-user to decide.
>> 
>> I am adding Atom support to my Agg.  For RSS feeds, I have used the ttl and
>> sy:updatePeriod / sy:updateFrequency elements to  allow feed providers to
>> limit refresh rates.  I have, in any case, imposed a minimum refresh rate of
>> one hour - because that seemed the decent thing to do.  However, I'm coming
>> under pressure to reduce that minimum limit for feeds that are clearly
>> designed for shorter refresh periods - such as the Gmail Atom feeds.  I'm
>> reluctant to implement a free-for-all so I'm looking for guidance on how I
>> should tackle this issue.
>> 
>> Andy Henderson
>> Constructive IT Advice
>> 
>>  
>> 
> 
> 
> 



--
Walter Underwood
Principal Architect, Verity

Re: Atom feed refresh rates

2005-05-04 Thread Henry Story


On 4 May 2005, at 14:55, Mark Pilgrim wrote:
> On 5/4/05, Graham <[EMAIL PROTECTED]> wrote:
>> On 4 May 2005, at 12:29 pm, Julian Reschke wrote:
>>> Isn't this what the HTTP "Expires" header is for
>> +1
>
> +1

+1

> Also, that belongs in an implementation guide, not a format spec.
> Like this one:
>
> http://diveintomark.org/rfc/draft-ietf-atompub-impl-guide-00.html
>
> I am soliciting co-editors.
>
> --
> Cheers,
> -Mark

Re: Atom feed refresh rates

2005-05-04 Thread Graham


On 4 May 2005, at 1:51 pm, Andy Henderson wrote:
>> This is a myth perpetuated by cheapskate bloggers. There's no technical
>> reason for it beyond "I bought a lousy hosting package".
>
> I don't think so.  Slashdot is just one example of a feed that has
> introduced bandwidth throttling in response to aggregators polling too
> frequently.

But it's the only example I know of, of a large scale site that's  
complained. Note they've never (as far as I know) said that it was  
costing them money or causing them other problems, they've just  
arbitrarily complained about "poorly written" programs and decreed  
how often people may check their feed. There's no evidence of a  
technical problem here, just a bunch of uptight twunts telling other  
people how to write their software.

Graham

Re: Atom feed refresh rates

2005-05-04 Thread Mark Pilgrim

On 5/4/05, Graham <[EMAIL PROTECTED]> wrote:
> On 4 May 2005, at 12:29 pm, Julian Reschke wrote:
> > Isn't this what the HTTP "Expires" header is for
> +1

+1

Also, that belongs in an implementation guide, not a format spec. 
Like this one:

http://diveintomark.org/rfc/draft-ietf-atompub-impl-guide-00.html

I am soliciting co-editors.

-- 
Cheers,
-Mark

Re: Atom feed refresh rates

2005-05-04 Thread Graham

On 4 May 2005, at 11:44 am, Brett Lindsley wrote:
> There is no reason to check feeds that are not being updated, but
> then, there currently is no way to know this.

plug plug: http://www.fondantfancies.com/apps/shrook/distfaq.php

> As you indicated, if the feed had some element that indicated it won't
> be updated (for example) for another day (e.g. a "daily news summary"),
> then the end client would need to only check once a day.

Please don't confuse bandwidth (number of posts per day) with latency  
(checking rate). They're largely unrelated. You could only check once  
per day if the daily summary appeared at an exact, known time, was  
never late, and was never updated later.

Graham

Re: Atom feed refresh rates

2005-05-04 Thread Julian Reschke

Andy Henderson wrote:
>> Isn't this what the HTTP "Expires" header is for
>> ()?
>
> I don't think this helps a lot with my original issue because in many cases
> a feed's updater will either not know when they will next update the feed,
> or will be updating the feed frequently throughout the day.

If they don't know that, how can the previous response you got help you 
in determining when to poll next?

Best regards,
Julian

Re: Atom feed refresh rates

2005-05-04 Thread Eric Scheid

On 4/5/05 8:44 PM, "Brett Lindsley" <[EMAIL PROTECTED]> wrote:

> As you indicated, if the feed had some element that indicated it won't be
> updated (for example) for another day (e.g. a "daily news summary"), then the
> end client would need to only check once a day.

aggregators could also take note of the suggested refresh time at the time
of subscription, setting the refresh rate as appropriate as a default.

for the common case of an auto-discovery feed subscription, this would be
quite handy.

if users want to override that default then they can.

e.

Re: Atom feed refresh rates

2005-05-04 Thread Graham

On 4 May 2005, at 9:10 am, Andy Henderson wrote:
> I am adding Atom support to my Agg.  For RSS feeds, I have used the ttl
> and sy:updatePeriod / sy:updateFrequency elements to allow feed
> providers to limit refresh rates.

Why?

> I have, in any case, imposed a minimum refresh rate of one hour -
> because that seemed the decent thing to do.

This is a myth perpetuated by cheapskate bloggers. There's no
technical reason for it beyond "I bought a lousy hosting package".

> However, I'm coming under pressure to reduce that minimum limit for
> feeds that are clearly designed for shorter refresh periods - such as
> the Gmail Atom feeds.  I'm reluctant to implement a free-for-all so I'm
> looking for guidance on how I should tackle this issue.

Keep the global setting for all feeds limited to 60 (or 30) minutes,  
but allow the setting for individual feeds to be set lower.
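
One way to read the suggestion above, as a small Python sketch. The function name and the five-minute absolute floor are assumptions added for illustration; only the 60-minute global limit and the idea of a lower per-feed setting come from the message.

```python
from typing import Optional

GLOBAL_MIN_MINUTES = 60      # limit on the setting that applies to all feeds
ABSOLUTE_FLOOR_MINUTES = 5   # hypothetical lower bound for per-feed overrides


def effective_interval(global_minutes: int,
                       per_feed_minutes: Optional[int] = None) -> int:
    """Return the polling interval, in minutes, for one feed."""
    # The global setting can never go below the global minimum.
    global_interval = max(global_minutes, GLOBAL_MIN_MINUTES)
    if per_feed_minutes is None:
        return global_interval
    # A per-feed override may be shorter, down to the absolute floor.
    return max(per_feed_minutes, ABSOLUTE_FLOOR_MINUTES)


print(effective_interval(15))       # global setting is clamped up
print(effective_interval(60, 10))   # e.g. a Gmail feed set individually
```

The point of the split is that a user has to opt in feed by feed, so a short interval never applies to the whole subscription list by accident.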

Graham

Re: Atom feed refresh rates

2005-05-04 Thread Graham

On 4 May 2005, at 12:29 pm, Julian Reschke wrote:
> Isn't this what the HTTP "Expires" header is for ()?

+1
Graham

RE: Atom feed refresh rates

2005-05-04 Thread Andy Henderson


>>>Isn't this what the HTTP "Expires" header is for
()?<<<

I don't think this helps a lot with my original issue because in many cases
a feed's updater will either not know when they will next update the feed,
or will be updating the feed frequently throughout the day.

Andy

Re: Atom feed refresh rates

2005-05-04 Thread Brett Lindsley

In reviewing the protocol spec (and the basic protocol spec), there is
no mention of recommended HTTP headers. There are examples in the basic
protocol spec that show ETag and Last-Modified but not Expires. Maybe
there should be a section in the protocol spec showing "recommended
headers" (a SHOULD) for use with Atom feeds. This would encourage the
use of these three headers.
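
As an illustration of the three headers suggested above, a server-side sketch in Python. The function name, the choice of a SHA-1 content hash as the ETag, and the `ttl_seconds` parameter are assumptions, not anything taken from the protocol spec.

```python
import hashlib
from email.utils import formatdate


def feed_headers(body: bytes, modified_epoch: float,
                 now_epoch: float, ttl_seconds: int) -> dict:
    """Build ETag, Last-Modified and Expires headers for a feed response."""
    return {
        # Strong validator derived from the content (assumed scheme).
        "ETag": '"' + hashlib.sha1(body).hexdigest() + '"',
        # HTTP dates are always expressed in GMT.
        "Last-Modified": formatdate(modified_epoch, usegmt=True),
        # Tells clients and shared caches not to ask again before this time.
        "Expires": formatdate(now_epoch + ttl_seconds, usegmt=True),
    }


if __name__ == "__main__":
    import time
    now = time.time()
    for name, value in feed_headers(b"<feed/>", now - 600, now, 3600).items():
        print(f"{name}: {value}")
```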

Brett Lindsley, Motorola Labs.

Julian Reschke wrote:
> Brett Lindsley wrote:
>
>> Andy, I recall bringing up the same issue with respect to portable
>> devices. My angle was that firing up the transmitter, making a network
>> connection and connecting to the server is still an expensive operation
>> in time and power (for a portable device) - even if the server returns
>> nothing.  There is no reason to check feeds that are not being updated,
>> but then, there currently is no way to know this.
>>
>> I recall there was a proposal on cache control. That seemed like a good
>> direction, but I don't recall it being discussed. As you indicated, if
>> the feed had some element that indicated it won't be updated (for
>> example) for another day (e.g. a "daily news summary"), then the end
>> client would need to only check once a day.
>>
>> Brett Lindsley, Motorola Labs
>
> Isn't this what the HTTP "Expires" header is for
> ()?
>
> Best regards, Julian

Re: Atom feed refresh rates

2005-05-04 Thread Julian Reschke

Brett Lindsley wrote:

> Andy, I recall bringing up the same issue with respect to portable
> devices. My angle was that firing up the transmitter, making a network
> connection and connecting to the server is still an expensive operation
> in time and power (for a portable device) - even if the server returns
> nothing.  There is no reason to check feeds that are not being updated,
> but then, there currently is no way to know this.
>
> I recall there was a proposal on cache control. That seemed like a good
> direction, but I don't recall it being discussed. As you indicated, if
> the feed had some element that indicated it won't be updated (for
> example) for another day (e.g. a "daily news summary"), then the end
> client would need to only check once a day.
>
> Brett Lindsley, Motorola Labs

Isn't this what the HTTP "Expires" header is for 
()?

Best regards, Julian
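
A client-side sketch, in Python, of how an aggregator might turn the Expires header into a next-poll time. The function name and the one-hour fallback are assumptions; the headers are assumed to be a plain dict, and `now` must be a timezone-aware datetime. `Cache-Control: max-age` is checked first because HTTP/1.1 gives it precedence over Expires.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def next_poll(headers: dict, now: datetime,
              default: timedelta = timedelta(hours=1)) -> datetime:
    """Return the earliest time at which the feed should be fetched again."""
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        return now + timedelta(seconds=int(match.group(1)))

    expires = headers.get("Expires")
    if expires:
        try:
            when = parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return now + default  # malformed date such as "0"
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        if when > now:
            return when

    # No usable hint, or the response was already stale: use the default.
    return now + default
```

This only helps when the publisher actually knows how long the response stays fresh, which is the objection raised elsewhere in the thread.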

Re: Atom feed refresh rates

2005-05-04 Thread Brett Lindsley


Andy, I recall bringing up the same issue with respect to portable
devices. My angle was that firing up the transmitter, making a network
connection and connecting to the server is still an expensive operation
in time and power (for a portable device) - even if the server returns
nothing.  There is no reason to check feeds that are not being updated,
but then, there currently is no way to know this.

I recall there was a proposal on cache control. That seemed like a good
direction, but I don't recall it being discussed. As you indicated, if
the feed had some element that indicated it won't be updated (for
example) for another day (e.g. a "daily news summary"), then the end
client would need to only check once a day.

Brett Lindsley, Motorola Labs

Andy Henderson wrote:
> If I'm asking this in the wrong place, sorry; please redirect me if you can.
>
> I am the author of an Aggregator and I'm looking for advice on refresh
> rates.  There was some discussion in this group back in June about a
> possible 'Refresh rate' element.  That seems to have been dismissed in
> favour of bandwidth throttling techniques, notably etag, last-modified and
> compression.  I already support all these plus some additional ones.  I am
> uncomfortable, though, with the implication that refresh rates don't matter
> and should be left to the end-user to decide.
>
> I am adding Atom support to my Agg.  For RSS feeds, I have used the ttl and
> sy:updatePeriod / sy:updateFrequency elements to allow feed providers to
> limit refresh rates.  I have, in any case, imposed a minimum refresh rate of
> one hour - because that seemed the decent thing to do.  However, I'm coming
> under pressure to reduce that minimum limit for feeds that are clearly
> designed for shorter refresh periods - such as the Gmail Atom feeds.  I'm
> reluctant to implement a free-for-all so I'm looking for guidance on how I
> should tackle this issue.
>
> Andy Henderson
> Constructive IT Advice
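
For readers unfamiliar with the RSS hints mentioned above, a rough Python sketch of reading `ttl` and `sy:updatePeriod` / `sy:updateFrequency` and applying a one-hour minimum. The function names are hypothetical, the sketch assumes an RSS 2.0 layout (`rss/channel`), giving `ttl` priority over the syndication elements is an assumption, and the "monthly" and "yearly" lengths are approximations.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace of the RSS 1.0 syndication module.
SY = "{http://purl.org/rss/1.0/modules/syndication/}"

# Period lengths in minutes; "monthly" and "yearly" are approximate.
PERIOD_MINUTES = {
    "hourly": 60, "daily": 1440, "weekly": 10080,
    "monthly": 43200, "yearly": 525600,
}


def hinted_interval_minutes(rss_xml: str) -> Optional[float]:
    """Return the publisher's suggested interval in minutes, or None."""
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        return None
    # <ttl> is a number of minutes in RSS 2.0.
    ttl = (channel.findtext("ttl") or "").strip()
    if ttl.isdigit() and int(ttl) > 0:
        return float(ttl)
    # Otherwise: updateFrequency updates per updatePeriod.
    period = (channel.findtext(SY + "updatePeriod") or "").strip().lower()
    if period not in PERIOD_MINUTES:
        return None
    freq = (channel.findtext(SY + "updateFrequency") or "1").strip()
    frequency = int(freq) if freq.isdigit() and int(freq) > 0 else 1
    return PERIOD_MINUTES[period] / frequency


def poll_interval_minutes(rss_xml: str, minimum: float = 60.0) -> float:
    """Apply the aggregator's own minimum on top of the publisher's hint."""
    hint = hinted_interval_minutes(rss_xml)
    return minimum if hint is None else max(hint, minimum)
```

With the minimum in place a publisher can only slow polling down, never speed it up, which is exactly the limitation the message is asking about for feeds such as Gmail's.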
