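Hi - There's nothing like that yet. What you can do is run the generate step with a custom URL filter that allows only HTML files, and use your standard URL filter for the other steps. That way only HTML URLs end up in the fetch list until you switch the generate filter to allow PDF, and later DOC.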
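
As a rough sketch, assuming you use the regex URL filter plugin, a separate rules file for the generate step could look like the one below. The file name regex-urlfilter-html.txt is only an example, and the pattern assumes your HTML URLs really end in .html or .htm (URLs with query strings or without an extension would be rejected by it).

```
# regex-urlfilter-html.txt (hypothetical file name)
# Rules are checked top to bottom; the first matching rule decides.

# accept URLs ending in .html or .htm, case-insensitive
+(?i)\.html?$

# reject everything else
-.
```

The regex filter reads its rules from the file named by the urlfilter.regex.file property, so pointing that property at this file for the generate run only (and leaving your normal file in place for the other steps) should give the effect described above. How you override the property per step depends on your setup, so treat that part as an assumption to verify.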

-----Original message-----
> From:Stefan Scheffler <[email protected]>
> Sent: Tue 02-Oct-2012 09:24
> To: [email protected]
> Subject: priorised/scored fetching
>
> Hi.
> I crawl a webdatabase for *.html, *.pdf and *.doc documents, with a
> given topN. I want nutch to fetch first all of the html documents, then
> pdf and at last doc, because html is more important than pdf and so on.
> Is there a way to make nutch follow such rules (maybe with a scoring
> algorithm)?
>
> Regards
> Stefan
>
> --
> Stefan Scheffler
> Avantgarde Labs GbR
> Löbauer Straße 19, 01099 Dresden
> Telefon: + 49 (0) 351 21590834
> Email: [email protected]

