Hi.
I am crawling a web database for *.html, *.pdf and *.doc documents with a given topN. I want Nutch to fetch all of the HTML documents first, then the PDF documents, and finally the DOC documents, because HTML is more important to me than PDF, and PDF more important than DOC. Is there a way to make Nutch follow such rules (maybe with a scoring algorithm)?
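
To make the ordering I have in mind concrete, here is a minimal sketch in plain Python. It is not Nutch code and does not use any Nutch API; the names PRIORITY, type_rank and select_top_n are made up for illustration, and I am assuming the document type can be read from the URL's file extension.

```python
import os
from urllib.parse import urlparse

# Hypothetical priority table: a lower number means "fetch earlier".
PRIORITY = {".html": 0, ".pdf": 1, ".doc": 2}


def type_rank(url):
    """Return the priority rank of a URL based on its file extension."""
    path = urlparse(url).path                # ignore query string and fragment
    ext = os.path.splitext(path)[1].lower()  # e.g. ".PDF" becomes ".pdf"
    # Unknown or missing extensions sort after all known types.
    return PRIORITY.get(ext, len(PRIORITY))


def select_top_n(urls, top_n):
    """Pick the top_n URLs, HTML first, then PDF, then DOC."""
    # sorted() is stable, so URLs of the same type keep their original order.
    return sorted(urls, key=type_rank)[:top_n]
```

With a topN of 3 and a candidate list holding one DOC, one PDF and two HTML URLs, this would pick both HTML URLs and the PDF and leave the DOC for a later round. What I am looking for is a way to get the same effect inside Nutch when it generates the fetch list.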

Regards
Stefan

--
Stefan Scheffler
Avantgarde Labs GbR
Löbauer Straße 19, 01099 Dresden
Telefon: + 49 (0) 351 21590834
Email: [email protected]
