Using Nutch as a crawler for Solr.

I've been digging through the nutch-user archives and have seen some people discussing how to ignore menu items and other unnecessary div areas such as common footers, but I haven't come across a full answer yet.

Is there a way to define a div by id that Nutch will strip out before tossing the content into Solr?

