Re: priorised/scored fetching

Stefan Scheffler Tue, 02 Oct 2012 01:36:31 -0700

Ah, OK. Thank you.
That sounds like what I had in mind :)

Regards
Stefan


On 02.10.2012 10:34, Julien Nioche wrote:

You should be able to do that with a custom scoring filter that assigns a score
based on the MIME type.
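
As a rough illustration of the scoring idea above, here is a minimal plain-Java sketch. It does not use Nutch's plugin API; the class name ExtensionPriority, the method names and the weight values are all hypothetical, and it assumes the URL's file extension can stand in for the MIME type, since the real MIME type is only known after a document has been fetched.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Illustrative only: this is NOT Nutch's scoring filter API.
// Class name, method names and weights are assumptions made for this sketch.
class ExtensionPriority {

    // Higher score means "fetch earlier". The concrete values are arbitrary.
    static float score(String url) {
        String u = url.toLowerCase(Locale.ROOT);
        // Drop query string and fragment before looking at the extension.
        int cut = u.indexOf('?');
        if (cut >= 0) u = u.substring(0, cut);
        cut = u.indexOf('#');
        if (cut >= 0) u = u.substring(0, cut);

        if (u.endsWith(".html") || u.endsWith(".htm")) return 3.0f;
        if (u.endsWith(".pdf")) return 2.0f;
        if (u.endsWith(".doc")) return 1.0f;
        return 0.0f; // unknown types go last
    }

    // Sort by descending score and keep at most topN URLs.
    // List.sort is stable, so URLs with equal scores keep their input order.
    static List<String> selectTopN(List<String> urls, int topN) {
        List<String> sorted = new ArrayList<>(urls);
        sorted.sort(Comparator.comparingDouble((String u) -> score(u)).reversed());
        return sorted.subList(0, Math.min(topN, sorted.size()));
    }
}
```

Calling ExtensionPriority.selectTopN(urls, topN) on a mixed list returns the HTML URLs first, then PDF, then DOC, cut off at topN. A real scoring filter would have to apply the same kind of weighting inside Nutch's own scoring hooks.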

On 2 October 2012 08:28, Markus Jelsma <[email protected]> wrote:

Hi - There's nothing like that yet. What you can do is run a custom URL
filter for the generate step that allows only HTML files, and use your
standard URL filter for the other steps.
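
To make the filtering idea above concrete, here is a small plain-Java sketch of an "HTML only" check. It is not wired into Nutch: the class name HtmlOnlyFilter and the regular expression are assumptions for illustration. It only mirrors the convention of Nutch URL filters of returning the URL when it is accepted and null when it is rejected.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative only: a standalone class, not a Nutch plugin.
// The class name and the pattern are assumptions made for this sketch.
class HtmlOnlyFilter {

    // Matches URLs whose path ends in .htm or .html,
    // optionally followed by a query string or fragment.
    private static final Pattern HTML = Pattern.compile("\\.html?([?#].*)?$");

    // Returns the URL unchanged if it looks like an HTML document, else null.
    static String filter(String url) {
        return HTML.matcher(url.toLowerCase(Locale.ROOT)).find() ? url : null;
    }
}
```

Note that a purely extension-based check like this would also reject HTML pages served without an .html or .htm suffix, so the pattern would need adjusting for such sites.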



-----Original message-----

From: Stefan Scheffler <[email protected]>
Sent: Tue 02-Oct-2012 09:24
To: [email protected]
Subject: priorised/scored fetching

Hi.
I crawl a web database for *.html, *.pdf and *.doc documents, with a
given topN. I want Nutch to fetch all of the HTML documents first, then
the PDFs and finally the DOC files, because HTML is more important than
PDF and so on.
Is there a way to make Nutch follow such rules (maybe with a scoring
algorithm)?

Regards
Stefan

--
Stefan Scheffler
Avantgarde Labs GbR
Löbauer Straße 19, 01099 Dresden
Phone: +49 (0) 351 21590834
Email: [email protected]



