* David Cantrell <[EMAIL PROTECTED]> [2006-08-23 09:35]:
> You probably want http://use.perl.org/~username/journal/rss

Actually, <http://use.perl.org/~username/journal/atom> – as well
as <http://use.perl.org/search.pl?op=journals;content_type=atom>.
The scraper pulls the latter and then iterates over its entries,
pulling each individual journal feed to pluck out the fulltext
and transplant it, so I end up with the same feed as the latter,
only with fulltext included.
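In case it helps, here is a rough sketch of that flow in Python
(the actual scraper is not shown here; function names, and the
Atom 1.0 namespace, are my assumptions):

```python
# Sketch: parse the hub feed, yield each entry's alternate link,
# which the scraper would then fetch to recover the full content.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # assumes Atom 1.0

def entry_links(hub_feed_xml):
    """Yield the alternate link of every entry in an Atom document."""
    root = ET.fromstring(hub_feed_xml)
    for entry in root.iter(ATOM + "entry"):
        for link in entry.iter(ATOM + "link"):
            # Atom defaults rel to "alternate" when it is omitted
            if link.get("rel", "alternate") == "alternate":
                yield link.get("href")
                break
```

One would then fetch each yielded URL's journal feed and swap its
content element into the hub entry.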

Actually, it’s not necessary to poll every hour. I used to do
that because I originally didn’t know about that hub feed and
scraped a different one:
<http://use.perl.org/journal.pl?op=top;content_type=atom>. That
one lists only journals, not individual entries, so you have to
poll very frequently to avoid missing posts.

Additionally, the next version of my scraper will keep a cache so
it can send the appropriate `If-Modified-Since` header on each
request. For feeds I’ve already seen, the entire request/response
cycle then takes all of 300 bytes or so instead of pulling down
the whole 50KB shebang every time for no reason at all.
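The idea is plain HTTP conditional GET; a minimal sketch in
Python (names and structure are mine, not the scraper’s – the
cache here is just an in-memory dict keyed by feed URL):

```python
# Sketch: remember each feed's Last-Modified value and replay it as
# If-Modified-Since; a 304 response means the feed hasn't changed.
import urllib.request
import urllib.error

last_modified = {}  # feed URL -> Last-Modified header from previous fetch

def fetch_feed(url):
    req = urllib.request.Request(url)
    if url in last_modified:
        req.add_header("If-Modified-Since", last_modified[url])
    try:
        with urllib.request.urlopen(req) as resp:
            lm = resp.headers.get("Last-Modified")
            if lm:
                last_modified[url] = lm  # remember for the next poll
            return resp.read()           # feed changed (or first fetch)
    except urllib.error.HTTPError as e:
        if e.code == 304:
            return None  # unchanged since last poll; nothing to parse
        raise
```

With a 304, only headers cross the wire, which is where the
“300 bytes instead of 50KB” saving comes from.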

(Yeah, I was really rude. I am embarrassed in retrospect.)

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
