[Nutch-general] Extracting multiple entries from a single URL

Ragy Eleish Thu, 23 Feb 2006 10:15:02 -0800

Hi,

I need to get multiple search-result entries from a single URL. For
example, I want to index the photo captions on
http://racer007.albumpost.com/montreal without having to navigate to each
picture's page, because sometimes there is no individual picture page.


I did it by writing an HTMLParserFilter, modifying ParseData and Fetcher,
and then disabling the duplicate-cleanup code in the CrawlerTool. I did this in
Nutch 0.7.1. Is there a better way of doing this?
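The core idea (one fetched page in, several indexable entries out) can be sketched independently of Nutch. This is plain Java, not Nutch plugin code, and it does not use any Nutch API. It assumes hypothetical markup where each caption sits in `<div class="caption">...</div>`, and it gives every entry a hypothetical per-entry URL (`#photo-N` fragment) so the entries can be told apart downstream. Whether such a fragment would survive Nutch's URL normalization and duplicate removal is not verified here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptionSplitter {

    /** One searchable entry derived from a single fetched page. */
    public static final class Entry {
        final String url;     // synthetic per-entry URL: page URL plus a fragment
        final String caption; // the text to index for this entry

        Entry(String url, String caption) {
            this.url = url;
            this.caption = caption;
        }
    }

    // Hypothetical markup assumption: each caption is wrapped in <div class="caption">.
    private static final Pattern CAPTION = Pattern.compile(
        "<div class=\"caption\">(.*?)</div>",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Splits one page into one entry per non-empty caption, each with a distinct URL. */
    public static List<Entry> split(String pageUrl, String html) {
        List<Entry> entries = new ArrayList<>();
        Matcher m = CAPTION.matcher(html);
        int n = 0;
        while (m.find()) {
            String text = m.group(1).trim();
            if (text.isEmpty()) {
                continue; // skip blank captions; they would be useless search hits
            }
            n++;
            // Hypothetical scheme: a distinct fragment per entry so a check keyed
            // on the page URL alone does not collapse the entries into one.
            entries.add(new Entry(pageUrl + "#photo-" + n, text));
        }
        return entries;
    }

    public static void main(String[] args) {
        String html = "<div class=\"caption\">Start line</div>"
                    + "<div class=\"caption\">  </div>"
                    + "<div class=\"caption\">Pit lane</div>";
        for (Entry e : split("http://racer007.albumpost.com/montreal", html)) {
            System.out.println(e.url + " -> " + e.caption);
        }
    }
}
```

In a real crawler, this splitting step would sit where the parse output is produced, and the duplicate-removal step would need to key on the per-entry URL rather than on the page URL alone.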

Regards

--Ragy
