Thanks Markus, I will try implementing a scoring filter.
----- Original message -----
From: "Markus Jelsma" <[email protected]>
To: [email protected]
Sent: Friday, 6 February 2015 16:37:36
Subject: [MASSMAIL]RE: how to crawl image first on every round of nutch?

Implement a ScoringFilter, specifically its generatorSortValue() method, and emit a high float for image MIME types.

-----Original message-----
> From: Eyeris RodrIguez Rueda <[email protected]>
> Sent: Friday 6th February 2015 19:54
> To: [email protected]
> Subject: how to crawl image first on every round of nutch?
>
> Hi all.
>
> I want to use Nutch to crawl images, but my problem is how to fetch
> images first from the crawldb on every round of the crawl.
> I was reading about the AdaptiveFetchSchedule by MIME type option, but I'm not
> sure it solves my problem, because it only works after Nutch has crawled
> the link at least once and extracted its metadata.
>
> In my case, I crawl page A and discover 5 links to images; I want to fetch
> those images in the next round, before other types of documents in the
> crawldb.
> Is there any way to prioritize images on every round of the crawl?
>
> I'm using Nutch 1.9 and Solr 4.10 in local mode.
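Markus's suggestion amounts to returning a larger sort value for images from the scoring hook the Generator consults when it selects URLs for the next segment (candidates are ordered by that value, highest first). The standalone Java sketch below models only that logic, so it runs without Nutch on the classpath. The class name ImageFirstSortValue, the IMAGE_BOOST value and the extension list are hypothetical. Because an unfetched URL has no known MIME type yet (the limitation raised in the question), the sketch swaps in a simpler heuristic and guesses from the URL's file extension. In a real plugin the same check would sit inside generatorSortValue(Text url, CrawlDatum datum, float initSort) of a class implementing org.apache.nutch.scoring.ScoringFilter.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Standalone sketch of the sort-value logic a Nutch 1.x ScoringFilter
 * could apply in generatorSortValue(). Class name, boost and extension
 * list are hypothetical, chosen only for illustration.
 */
public class ImageFirstSortValue {

    // Hypothetical boost added to the sort value of URLs that look like images.
    static final float IMAGE_BOOST = 1000.0f;

    // Hypothetical extension list: unfetched URLs have no Content-Type yet,
    // so the file extension is used as a stand-in for the MIME type.
    static final Set<String> IMAGE_EXTENSIONS =
            Set.of("jpg", "jpeg", "png", "gif", "bmp", "webp", "svg");

    /** Returns true if the last path segment ends in a known image extension. */
    static boolean looksLikeImage(String url) {
        String path = url;
        // Strip the fragment and query string before inspecting the extension.
        int cut = path.indexOf('#');
        if (cut >= 0) path = path.substring(0, cut);
        cut = path.indexOf('?');
        if (cut >= 0) path = path.substring(0, cut);
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot <= slash) return false; // no extension in the last path segment
        String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return IMAGE_EXTENSIONS.contains(ext);
    }

    /**
     * Mirrors the idea of generatorSortValue(url, datum, initSort):
     * image URLs get initSort plus a large boost, everything else keeps initSort.
     */
    static float generatorSortValue(String url, float initSort) {
        return looksLikeImage(url) ? initSort + IMAGE_BOOST : initSort;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://example.com/index.html",
            "http://example.com/img/photo.JPG",
            "http://example.com/logo.png?v=2",
            "http://example.com/about"
        };
        for (String u : urls) {
            System.out.println(generatorSortValue(u, 1.0f) + "\t" + u);
        }
    }
}
```

Adding the boost to initSort, rather than replacing it, keeps the existing score order among images and among non-images. Scoring filters are chained (each filter's return value becomes the next filter's initSort), so the plugin's position in scoring.filter.order should be chosen with that in mind.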

