Re: Scraping news feeds?

2001-03-09 Thread mallum



I run http://10.am and do this on a largish scale.

For aggregating RSS feeds I use RSSLite [1] rather than XML::RSS. RSSLite
avoids using expat and is a little naughty in parsing XML that would make
expat barf (a lot of RSS feeds unfortunately contain bad XML).
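
A minimal sketch of the lenient approach described above. It is written in Python purely as a self-contained illustration and is not RSSLite's code. The names lenient_items, field and strict_ok and the sample feed are hypothetical. It assumes plain <item>, <title> and <link> tags with no namespaces or CDATA.

```python
import re
import xml.etree.ElementTree as ET  # expat-backed strict parser, for contrast
from html import unescape

# Match each <item>...</item> block without requiring the rest of the
# document to be well-formed XML.
ITEM_RE = re.compile(r"<item\b[^>]*>(.*?)</item>", re.DOTALL | re.IGNORECASE)


def field(block, name):
    """Return the text of the first <name> element in block, or ''."""
    m = re.search(r"<%s\b[^>]*>(.*?)</%s>" % (name, name), block,
                  re.DOTALL | re.IGNORECASE)
    return unescape(m.group(1).strip()) if m else ""


def lenient_items(feed_text):
    """Return a list of (title, link) pairs, tolerating broken XML."""
    return [(field(b, "title"), field(b, "link"))
            for b in ITEM_RE.findall(feed_text)]


def strict_ok(feed_text):
    """Return True if a strict XML parser accepts the feed."""
    try:
        ET.fromstring(feed_text)
        return True
    except ET.ParseError:
        return False


# Made-up sample feed containing a bare "&", which is not well-formed XML.
BAD_FEED = """<rss version="0.91"><channel>
<title>Example</title>
<item><title>Perl & XML news</title><link>http://example.com/1</link></item>
<item><title>Second story</title><link>http://example.com/2</link></item>
</channel></rss>"""

if __name__ == "__main__":
    print("strict parser accepts feed:", strict_ok(BAD_FEED))
    for title, link in lenient_items(BAD_FEED):
        print(title, "->", link)
```

The trade-off is the one raised elsewhere in this thread: the regexps keep working on feeds a strict parser rejects, but they will also silently accept input that is not really RSS.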

For actual scraping of sites I basically use meaty regexps or HTML::Parser.
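
As a rough illustration of the event-driven parser style mentioned above, here is a sketch using Python's standard html.parser module as a stand-in for HTML::Parser. The class name HeadlineParser, the helper scrape, the "headline" CSS class and the sample page are all assumptions made for the example. A real site would need its own rule for picking out headline links.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect (text, href) for <a> tags carrying a given CSS class."""

    def __init__(self, css_class="headline"):
        super().__init__()
        self.css_class = css_class
        self.headlines = []
        self._capturing = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._capturing = True
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> is collected as well.
        if self._capturing:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._capturing:
            self.headlines.append(("".join(self._text).strip(), self._href))
            self._capturing = False


def scrape(html_text):
    """Return the list of (headline, href) pairs found in html_text."""
    parser = HeadlineParser()
    parser.feed(html_text)
    parser.close()
    return parser.headlines


# Made-up sample page; note the unclosed <p>, which the parser tolerates.
PAGE = """<html><body>
<h1>News</h1>
<a class="headline" href="/story/1">First story</a>
<a href="/about">About us</a>
<p><a class="headline top" href="/story/2">Second <b>big</b> story</a>
</body></html>"""

if __name__ == "__main__":
    for text, href in scrape(PAGE):
        print(text, "->", href)
```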

10.am also supplies feeds [2] in RSS if you want to use them.

I hope to Open Source 10.am in the near future when I sort out some
contractual obligations.

mallum

[1] http://industrial-linux.org/RSSLite/
[2] http://10.am/docs/feeds.htm (eg http://10.am/Development/Perl-rss )

on Wed, Mar 07, 2001 at 04:36:56PM +, Dave Hodgkinson wrote:
 
> What's the best way to scrape a variety of news headlines from various
> sites? Sort of a moreover for the intranet...
>
>
> -- 
> Dave Hodgkinson, http://www.hodgkinson.org
> Editor-in-chief, The Highway Star   http://www.deep-purple.com
>   Apache, mod_perl, MySQL, Sybase hired gun for, well, hire
>   -
 



Re: Scraping news feeds?

2001-03-09 Thread DJ Adams

On Fri, Mar 09, 2001 at 12:50:17PM -0500, mallum wrote:
 
 
> I run http://10.am and do this on a largish scale.

Mallum - greetings! 

I didn't know you were on this list - then again, I am crap at keeping
up anyway...

dj



Re: Scraping news feeds?

2001-03-09 Thread Dave Cross

At Fri, 9 Mar 2001 12:00:19 +, Michael Stevens wrote:
> On Fri, Mar 09, 2001 at 12:50:17PM -0500, mallum wrote:
> > For aggregating RSS feeds I use RSSLite [1] rather than XML::RSS.
> > RSSLite avoids using expat and is a little naughty in parsing XML
> > that would make expat barf (a lot of RSS feeds unfortunately contain
> > bad XML).
>
> That way lies madness.

Which is what I said - but more succinct :)



Re: Scraping news feeds?

2001-03-07 Thread Dominic Mitchell

On Wed, Mar 07, 2001 at 04:36:56PM +, Dave Hodgkinson wrote:
> What's the best way to scrape a variety of news headlines from various
> sites? Sort of a moreover for the intranet...

Probably using RSS (XML file format) and XML::RSS (which includes a nice
scraper tool).
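
For a well-formed feed, the strict-parser route suggested here can be sketched as follows. This uses Python's xml.etree.ElementTree as a stand-in and does not show the XML::RSS API. The function name headlines and the sample feed are hypothetical. It assumes an un-namespaced RSS 0.91-style document; RDF-based RSS 1.0 feeds use namespaces and would need namespace-qualified tag names.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample feed.
FEED = """<?xml version="1.0"?>
<rss version="0.91"><channel>
<title>Example channel</title>
<item><title>First story</title><link>http://example.com/1</link></item>
<item><title>Second story</title><link>http://example.com/2</link></item>
</channel></rss>"""


def headlines(feed_text):
    """Return (title, link) pairs; raises ET.ParseError on malformed XML."""
    root = ET.fromstring(feed_text)
    return [(item.findtext("title", default=""),
             item.findtext("link", default=""))
            for item in root.iter("item")]


if __name__ == "__main__":
    for title, link in headlines(FEED):
        print(title, "->", link)
```

A strict parser like this fails loudly on a broken feed rather than guessing, so the caller has to decide whether to skip that feed or fall back to something more forgiving.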

-Dom



Re: Scraping news feeds?

2001-03-07 Thread Simon Wilcox

At 12:54 07/03/2001 -0500, Dave Cross wrote:

[snip]

 
  Chapter 10 isn't it Dave ?

Section 10.4 to be precise. "Specialized parsers - XML::RSS" :)

You've got a bit further since last Thursday then!

Yep, but not quite that far !

Also been reading Rebel Code which has a nice bit about Perl in it.

Simon.