Does anyone know of a way to crawl a website while ignoring headers and 
footers, or to include just the main content of a page, for example only the 
content inside a <div class="main">?
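
To make the goal concrete, here is a rough sketch in plain Python (not Nutch, 
and not the plugin mentioned below) of the behaviour I am after. It uses only 
the standard library's html.parser. The names MainContentExtractor and 
extract_main are made up for illustration, and it assumes the HTML is 
well nested, with every non-void tag explicitly closed.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MainContentExtractor(HTMLParser):
    """Collects text inside <div class="main"> and skips everything else.

    Hypothetical helper for illustration only; assumes well-nested HTML.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting depth inside the target div; 0 means outside
        self.chunks = []  # text fragments found inside the target div

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            # Already inside the target div: track nested elements.
            if tag not in VOID_TAGS:
                self.depth += 1
        elif tag == "div":
            # The class attribute may be missing or hold several class names.
            classes = (dict(attrs).get("class") or "").split()
            if "main" in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_main(html):
    """Return the text inside <div class="main">, joined by single spaces."""
    parser = MainContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

For a page with a header div, a <div class="main"> and a footer div, 
extract_main(page_html) would return only the text from the main div. What I 
am looking for is the equivalent of this at parse or index time in Nutch.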

I have tried using https://github.com/BayanGroup/nutch-custom-search in Nutch 
1.9 but I can't get it to work.

Any ideas greatly appreciated.

Thanks

Regards 

Mark Wilson
[email protected]


