Re: Whether Nutch AdaptiveFetchSchedule can do recrawling automatically?

Lewis John Mcgibbney Thu, 18 Apr 2013 10:42:37 -0700

Hi Senthilkumar,
In short, search for "recrawl" on the Nutch wiki to find an external blog
post on recrawling with Nutch. If you have anything to add to the post,
contact the author. If, on the other hand, you need clarification on
anything, ping us here.

HTH

Lewis
On Thursday, April 18, 2013, mesenthil1 <[email protected]> wrote:
> Hi,
>
> Can someone please explain how the following scenario works?
>
> I need to crawl a site with 50K URLs. This site is dynamic and is
> updated frequently. Assuming it takes 2 days to crawl the site
> completely, can we have some configuration (fetch schedule or
> something else) so that once a crawl cycle is complete, the next crawl
> cycle starts automatically after two days to find the new URLs? If this
> feature is not available, should we control the repeated crawling of
> the site manually through some sort of scripting?
>
> Actually, we will have more than 50 sites to crawl separately. If we
> need to maintain re-crawling of each site, should we have 50 separate
> scripts to handle them? Please let us know if anyone has faced this
> situation.
>
> Thanks,
> Senthil
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Whether-Nutch-AdaptiveFetchSchedule-can-do-recrawling-automatically-tp4056979p4057036.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>

-- 
*Lewis*
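The question about re-crawling many sites on a schedule can be handled by a single driver script rather than 50 separate ones. As far as we understand it, AdaptiveFetchSchedule only adjusts how soon individual URLs become due for fetching again (their fetch interval in the CrawlDb). It does not start a crawl on its own, so some external trigger (cron or a script) is still needed. What follows is a minimal Python sketch under stated assumptions: `./recrawl_site.sh` is a hypothetical per-site wrapper that runs one complete Nutch crawl cycle, the two-day `RECRAWL_INTERVAL` is taken from the question, and the function names are invented for illustration.

```python
"""Minimal sketch: one driver that re-crawls many sites on a fixed interval.

Assumptions (hypothetical, not part of Nutch): each site has its own crawl
directory, and a wrapper script `./recrawl_site.sh <site>` runs one full
crawl cycle for that site.
"""
import subprocess
import time
from datetime import datetime, timedelta

# Hypothetical interval: start the next cycle two days after the last one finished.
RECRAWL_INTERVAL = timedelta(days=2)


def sites_due(sites, last_finished, now, interval=RECRAWL_INTERVAL):
    """Return the sites whose last completed crawl is at least `interval` old.

    Sites missing from `last_finished` have never been crawled, so they are due.
    """
    due = []
    for site in sites:
        finished = last_finished.get(site)
        if finished is None or now - finished >= interval:
            due.append(site)
    return due


def crawl_site(site):
    """Run one crawl cycle for `site` via the hypothetical wrapper script."""
    # check=True raises CalledProcessError if the wrapper exits non-zero.
    subprocess.run(["./recrawl_site.sh", site], check=True)


def run_forever(sites, poll_seconds=3600):
    """Poll periodically and crawl every due site, one site at a time."""
    last_finished = {}  # site -> time its last crawl completed (in memory only)
    while True:
        for site in sites_due(sites, last_finished, datetime.now()):
            crawl_site(site)
            # Record the end time so the interval counts from completion.
            last_finished[site] = datetime.now()
        time.sleep(poll_seconds)
```

To use it, call `run_forever([...])` with the list of site URLs from one long-running process, or drop the loop and call `sites_due` from a cron job that persists `last_finished` to disk. Crawling sites sequentially keeps the load predictable; running them in parallel would need separate crawl directories per site and a concurrency limit.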
