Hi RSSers and sitescoopers -- I developed Sitescooper a few years back ( http://sitescooper.org/ ), which scrapes news sites, blogs etc. and renders them down to Palm-format output. I haven't been using it much myself recently -- I've been getting more into RSS and reading updates by mail (using rss2mail), instead of syncing them to my Palm and reading them there.
Recently, I've been running into blogs without decent RSS feeds (i.e. short or missing descriptions or content:encoded parts). As a result, it occurred to me that Sitescooper could do with an RSS output mode, which would deal with (a) crappy RSS 0.91-style feeds, and (b) the sites that don't have RSS output at all (although that's stepping on NewsIsFree's toes a little ;).

It's also a handy way to scrape into RSS, given that Sitescooper:

 - (a) has lots of site descriptions which should mostly work (although a few are suffering bit rot now),
 - (b) uses the .site file format -- a simple format for rules on how to scrape "stories" from news sites effectively,
 - (c) has good caching mechanisms, and
 - (d) has pretty good support for weirdness like HTTP redirects and authentication.

So anyway -- after some hacking, the CVS version of Sitescooper now supports scraping into RSS 2.0. Some fruits of this can be seen at http://sitescooper.org/rss/ . Each .xml is accompanied by the relevant .site file.

I don't think it's quite ready for a release just yet, but I thought I'd let you all know about it and get some feedback ;)

--j.

_______________________________________________
Sitescooper-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/sitescooper-talk