Hi,

I am crawling a web database for *.html, *.pdf and *.doc documents with a given topN. I want Nutch to fetch all of the HTML documents first, then the PDF documents, and the DOC documents last, because HTML is more important to me than PDF, and PDF is more important than DOC. Is there a way to make Nutch follow such rules (maybe with a scoring algorithm)?
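
To make the ordering I have in mind concrete, here is a minimal sketch in plain Java. It does not use any Nutch API: the class name FetchPriority, the methods scoreFor and topN, and the score values 3/2/1 are all made up for illustration, and I am only assuming that something similar could be expressed through Nutch's scoring.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class FetchPriority {

    // Hypothetical scores: a higher value means "fetch earlier".
    static float scoreFor(String url) {
        String path = url.toLowerCase(Locale.ROOT);
        // Ignore a query string when looking at the extension.
        int q = path.indexOf('?');
        if (q >= 0) {
            path = path.substring(0, q);
        }
        if (path.endsWith(".html")) return 3.0f;
        if (path.endsWith(".pdf")) return 2.0f;
        if (path.endsWith(".doc")) return 1.0f;
        return 0.0f; // everything else comes last
    }

    // Sort by descending score and keep only the first topN URLs.
    // List.sort is stable, so URLs with equal scores keep their input order.
    static List<String> topN(List<String> urls, int topN) {
        List<String> sorted = new ArrayList<>(urls);
        sorted.sort(Comparator.comparingDouble((String u) -> scoreFor(u)).reversed());
        return sorted.subList(0, Math.min(topN, sorted.size()));
    }
}
```

With a topN of 3 and the candidates a.doc, b.pdf, c.html and d.html, I would expect the selection to be c.html, d.html, b.pdf, so the DOC file is only fetched once no HTML or PDF candidates are left.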
Regards,
Stefan

--
Stefan Scheffler
Avantgarde Labs GbR
Löbauer Straße 19, 01099 Dresden
Phone: + 49 (0) 351 21590834
Email: [email protected]

