Re: Scraping news feeds?
I run http://10.am and do this on a largish scale.

For aggregating RSS feeds I use RSSLite [1] rather than XML::RSS. RSSLite avoids using expat and is a little naughty in parsing XML that would make expat barf (a lot of RSS feeds unfortunately contain bad XML).

For actual scraping of sites I basically use meaty regexps or HTML::Parser.

10.am also supplies feeds [2] in RSS if you want to use them. I hope to Open Source 10.am in the near future when I sort out some contractual obligations.

mallum

[1] http://industrial-linux.org/RSSLite/
[2] http://10.am/docs/feeds.htm (eg http://10.am/Development/Perl-rss )

on Wed, Mar 07, 2001 at 04:36:56PM +, Dave Hodgkinson wrote:
What's the best way to scrape a variety of news headlines from various sites? Sort of a moreover for the intranet...

-- Dave Hodgkinson, http://www.hodgkinson.org Editor-in-chief, The Highway Star http://www.deep-purple.com Apache, mod_perl, MySQL, Sybase hired gun for, well, hire -
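As an illustration of the lenient, regexp-based approach described above, here is a minimal sketch in Python (swapped in for Perl here; this is not RSSLite's actual code, and the names `lenient_items` and `_field` are made up for the example). It assumes a feed that uses plain `<item>`, `<title>` and `<link>` tags and pulls them out without ever running an XML parser, so a bare `&` in a headline does not stop it.

```python
import re
from html import unescape

# Hypothetical helper names; this only illustrates the idea of
# tolerant extraction, not any real module's behaviour.
ITEM_RE = re.compile(r"<item\b[^>]*>(.*?)</item>", re.IGNORECASE | re.DOTALL)


def _field(block, tag):
    """Return the text of the first <tag>...</tag> in block, or None."""
    m = re.search(rf"<{tag}\b[^>]*>(.*?)</{tag}>", block,
                  re.IGNORECASE | re.DOTALL)
    if not m:
        return None
    text = m.group(1).strip()
    # CDATA sections are returned as-is, without entity decoding.
    if text.startswith("<![CDATA[") and text.endswith("]]>"):
        return text[len("<![CDATA["):-len("]]>")].strip()
    # Decode entities such as &amp;; a bare "& " is left untouched.
    return unescape(text)


def lenient_items(feed_text):
    """Extract title/link pairs from a feed, well-formed or not."""
    return [
        {"title": _field(block, "title"), "link": _field(block, "link")}
        for block in ITEM_RE.findall(feed_text)
    ]
```

The trade-off is the one raised elsewhere in the thread: this accepts feeds a real parser would reject, but it will also happily return nonsense for input that is broken in ways the patterns do not anticipate.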
Re: Scraping news feeds?
On Fri, Mar 09, 2001 at 12:50:17PM -0500, mallum wrote: I run http://10.am and do this on a largish scale. Mallum - greetings! I didn't know you were on this list - then again, I am crap at keeping up anyway... dj
Re: Scraping news feeds?
At Fri, 9 Mar 2001 12:00:19 +, Michael Stevens [EMAIL PROTECTED] wrote:
On Fri, Mar 09, 2001 at 12:50:17PM -0500, mallum wrote:
For aggregating RSS feeds I use RSSLite [1] rather than XML::RSS. RSSLite avoids using expat and is a little naughty in parsing XML that would make expat barf (a lot of RSS feeds unfortunately contain bad XML).

That way lies madness.

Which is what I said - but more succinct :)
Re: Scraping news feeds?
On Wed, Mar 07, 2001 at 04:36:56PM +, Dave Hodgkinson wrote: What's the best way to scrape a variety of news headlines from various sites? Sort of a moreover for the intranet... Probably using RSS (XML file format) and XML::RSS (which includes a nice scraper tool). -Dom
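For comparison, a rough sketch of the strict, parser-based route, written in Python rather than Perl and using the standard library's expat-backed `xml.etree.ElementTree` instead of XML::RSS (so this is not XML::RSS's API; `strict_items` and `_local` are names invented for the example). It assumes each headline is an `<item>` element with `<title>` and `<link>` children, and it matches on local tag names so that namespaced RSS 1.0 feeds are handled too.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def strict_items(feed_text):
    """Parse a feed with a real XML parser and return title/link pairs.

    Raises ET.ParseError if the feed is not well-formed XML.
    """
    root = ET.fromstring(feed_text)
    items = []
    for el in root.iter():
        if _local(el.tag) != "item":
            continue
        # Map each child's local tag name to its stripped text.
        fields = {_local(child.tag): (child.text or "").strip()
                  for child in el}
        items.append({"title": fields.get("title"),
                      "link": fields.get("link")})
    return items
```

A feed with an unescaped `&` in a headline makes `strict_items` raise `ET.ParseError`, which is the failure mode that pushes people towards more forgiving parsers.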
Re: Scraping news feeds?
At 12:54 07/03/2001 -0500, Dave Cross wrote: [snip] Chapter 10 isn't it Dave? Section 10.4 to be precise. "Specialized parsers - XML::RSS" :) You've got a bit further since last Thursday then! Yep, but not quite that far! Also been reading Rebel Code which has a nice bit about Perl in it. Simon.