Hi

We do something similar using a parse filter plugin and a custom scheduler. The 
parse filter plugin contains an SVM classifier that gives a high score to hub 
pages, or pages we consider unimportant: pages with no content, overviews, 
lists, etc. This score is passed back to the CrawlDatum, and the scheduler uses 
it as one of the inputs when adjusting the fetch time.
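
To make the scheduling step concrete, here is a minimal sketch of how a hub 
score could be blended into a fetch interval. It is not our actual plugin code 
and it uses no Nutch classes: the class name HubScoreSchedule, the method 
adjustedIntervalSeconds, the weight parameter and the linear formula are all 
assumptions made for illustration. It also assumes that a higher hub score 
should shorten the interval, as in the original question where category pages 
are refetched more often.

```java
/**
 * Hypothetical helper, not part of Nutch: blends a hub score into a fetch
 * interval. All names and the linear formula are illustrative assumptions.
 */
final class HubScoreSchedule {

    private HubScoreSchedule() {
        // static helper only
    }

    /**
     * @param defaultSeconds interval used for a page with hub score 0
     * @param minSeconds     shortest interval, reached at score 1 and weight 1
     * @param hubScore       classifier output, assumed to lie in [0, 1]
     * @param weight         how strongly the score influences the result, in [0, 1]
     * @return adjusted fetch interval in seconds
     */
    static long adjustedIntervalSeconds(long defaultSeconds, long minSeconds,
                                        double hubScore, double weight) {
        if (minSeconds < 0 || minSeconds > defaultSeconds) {
            throw new IllegalArgumentException("need 0 <= minSeconds <= defaultSeconds");
        }
        // The negated range checks also reject NaN.
        if (!(hubScore >= 0.0 && hubScore <= 1.0)) {
            throw new IllegalArgumentException("hubScore must be in [0, 1]");
        }
        if (!(weight >= 0.0 && weight <= 1.0)) {
            throw new IllegalArgumentException("weight must be in [0, 1]");
        }
        // Linear interpolation between the default and the minimum interval.
        // weight < 1 means the score only partially determines the result.
        double effective = hubScore * weight;
        return Math.round(defaultSeconds - effective * (defaultSeconds - minSeconds));
    }
}
```

In a real setup the returned value would be applied by a custom scheduler when 
it sets the next fetch time; how the score is stored and read back depends on 
the Nutch version in use.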

Markus
 
-----Original message-----
> From: Jorge Luis Betancourt González <[email protected]>
> Sent: Tuesday 18th February 2014 0:48
> To: [email protected]
> Subject: Re: Setting different fetch interval for some pages
> 
> If I remember correctly, there was a patch on the list to accomplish this by 
> specifying the fetch interval in the seed file. It could also serve as a 
> base for implementing a custom plugin for your specific use case. 
> 
> ----- Original Message -----
> From: "Mateusz Zakarczemny" <[email protected]>
> To: [email protected]
> Sent: Monday, February 17, 2014 10:14:14 AM
> Subject: Setting different fetch interval for some pages
> 
> Hi,
> 
> I'm going to crawl a set of news sites. Pages on those sites can be
> divided into two types: category pages and article pages. I would like to
> fetch category pages more frequently than article pages. The list of
> categories is fairly fixed, so I could mark them manually.
> 
> I know I could get similar behaviour using AdaptiveFetchSchedule, but it
> requires some time to adjust the fetch time. This doesn't satisfy me because
> before the fetch I already know how often pages should be recrawled.
> 
> I wonder if it is possible in Nutch to set different fetch intervals for
> sites. I know that I could extend AbstractFetchSchedule and implement this
> behaviour manually. This would require adding an extra field to the WebPage
> object that indicates what type of page we are dealing with. Is it possible
> to add such a field to the WebPage object? Maybe there is another approach?
> 
> Regards,
> Mateusz
> 
