Re: RSS link extractor

Doğacan Güney Wed, 18 Jul 2007 23:02:09 -0700

Hi,

On 7/19/07, Brian Whitman <[EMAIL PROTECTED]> wrote:

> Has anyone written a tool or used a plugin that lets you pull out the
> RSS url from a crawled HTML page? (the link rel=alternate)
>
> I know about the parse-rss plugin, but that seems to work on an
> already discovered rss link. I want to look at my already crawled
> crawldb and pull out all the rss URLs.


I haven't tested it, and I don't know whether it compiles against the
latest trunk, but NUTCH-412 has a patch that extracts RSS links. If you
try it out, please send your feedback. This seems to be useful to some,
so if it works, we can commit it.
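
For reference, the idea of pulling feed URLs out of link rel=alternate
tags can be sketched in a few lines. This is only a standalone
illustration in Python, not the NUTCH-412 patch and not Nutch code. The
names FeedLinkParser and extract_feed_links are made up for this example,
and it assumes you already have the page HTML and its URL as strings.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types treated as feeds here (an assumption; extend as needed).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects href values of <link rel="alternate"> tags that point to feeds."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        mime = a.get("type", "").strip().lower()
        href = a.get("href", "").strip()
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.found.append(urljoin(self.base_url, href))


def extract_feed_links(html, base_url):
    """Hypothetical helper: return absolute feed URLs found in one HTML page."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.found
```

Calling extract_feed_links(page_html, page_url) on each fetched page gives
a list of absolute feed URLs. How you get the page content out of your
crawl is a separate question that this sketch does not cover.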

--
Doğacan Güney
