Hi Talat,
We are trying to monitor more than 3,000 news sites, so we need to re-crawl their
main pages and store newly added links.
In this process we only need to crawl the first two depths (the main page and its
links) of each site, and the final crawl result must not contain any duplicate
URLs.
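For what it's worth, a depth-2 re-crawl like the one described above can be sketched with the Nutch 2.x command-line jobs. This is a minimal sketch, not a tested script: NUTCH_HOME and the seed directory are assumptions you would adapt to your install, and `-all` tells fetch/parse/updatedb to process every generated-but-unprocessed batch.

```shell
#!/bin/sh
# Sketch of a depth-2 re-crawl loop for Nutch 2.x.
# Assumptions: NUTCH_HOME points at your Nutch 2.x install, and the
# "urls" directory holds a seed file listing the ~3000 main pages.
NUTCH_HOME=/opt/apache-nutch-2.x
SEEDS=urls

# Inject the seed URLs; URLs already in the webtable are simply updated.
"$NUTCH_HOME/bin/nutch" inject "$SEEDS"

# Two rounds = the main pages (depth 1) plus their outlinks (depth 2).
for depth in 1 2; do
  "$NUTCH_HOME/bin/nutch" generate -topN 50000
  "$NUTCH_HOME/bin/nutch" fetch -all
  "$NUTCH_HOME/bin/nutch" parse -all
  "$NUTCH_HOME/bin/nutch" updatedb -all
done
```

Note on duplicates: in Nutch 2.x the HBase webtable is keyed by the (reversed) URL, so storing the same URL twice overwrites the existing row rather than creating a duplicate, which should cover the "no duplicated URL" requirement at the storage layer.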
Regards, 
Ali

On Monday, May 26, 2014 8:17:26 AM, Talat Uyarer <[email protected]> wrote:
Hi Ali,

Can you explain to us what your expectation is about re-crawling? Do you want
to set the next fetch time, or do you want to re-run your crawler?

Talat
On 24 May 2014 12:14, "Ali rahmani" <[email protected]> wrote:


> Dear Guys,
> We are working on a search engine, and we have to use version 2.x (due to
> its ability to connect to HBase). We tried tens of re-crawling scripts, but
> none of them works. Is there any re-crawling script for Nutch 2.x?
> We also added "db.fetch.interval.default" to the "nutch-site.xml" file, but
> it does not have any positive effect.
> Regards,
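For reference, the property mentioned in the quoted message is set in conf/nutch-site.xml like this (the value below is only an example; Nutch reads it in seconds, and the shipped default is 2592000, i.e. 30 days):

```xml
<property>
  <name>db.fetch.interval.default</name>
  <!-- Example: 86400 seconds = re-fetch pages roughly once a day. -->
  <value>86400</value>
</property>
```

Note that this property only controls when a page becomes *eligible* for re-fetching; it has no visible effect unless you also re-run the generate/fetch/parse/updatedb cycle afterwards, which may explain why setting it alone appeared to do nothing.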
