Hello Lewis,
We do have some weird and complicated rules, but they should not time out after
450 seconds, i.e. keep the JVM busy for that long. We haven't fully investigated
yet, so it is possible that some sitemap entries are very long and complicated.
But 450 seconds is very odd.
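For what it's worth, one way a single "complicated rule" can keep the JVM busy for minutes is catastrophic regex backtracking. The snippet below is purely illustrative (the pattern and input are invented, not taken from our actual filter rules): a nested quantifier applied to a long, almost-matching string makes the matcher's runtime roughly double with every extra input character, which is enough to push a map task past its timeout.

import java.util.regex.Pattern;

public class CatastrophicBacktrackingDemo {
  public static void main(String[] args) {
    // Input of 'a' characters only; the pattern below also requires a trailing
    // 'b', so the match must fail -- but only after the engine has tried an
    // exponential number of ways to split the run of 'a's.
    StringBuilder almostMatching = new StringBuilder();
    for (int i = 0; i < 30; i++) {   // runtime roughly doubles per extra character
      almostMatching.append('a');
    }
    Pattern evil = Pattern.compile("(a+)+b"); // nested quantifier: the classic case
    long start = System.currentTimeMillis();
    boolean matched = evil.matcher(almostMatching).matches();
    System.out.println("matched=" + matched + " after "
        + (System.currentTimeMillis() - start) + " ms");
  }
}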
Hi Sebastian and Lewis,
I did a build on another machine and diffed the runtime logs, which made the
issue pretty clear: yes, the build was not proper. Got it resolved.
Happy crawling.
Regards,
GoViNd
On Mon, Jan 15, 2018 at 2:04 AM, Sebastian Nagel wrote:
> Hi Govind,
I'll fix NUTCH-2466 this afternoon.
-Original message-
> From: Sebastian Nagel
> Sent: Wednesday 17th January 2018 14:09
> To: user@nutch.apache.org
> Subject: Re: SitemapProcessor destroyed our CrawlDB
>
> It was finally Omkar who brought NUTCH-2442 forward.
Ah thanks!
I knew you'd fixed some of these; now I know my patch for NUTCH-2466 silently
removes your commit!
My bad, thanks!
Markus
-Original message-
> From: Sebastian Nagel
> Sent: Wednesday 17th January 2018 13:32
> To: user@nutch.apache.org
> Subject:
It was finally Omkar who brought NUTCH-2442 forward.
Time to review the patch of NUTCH-2466!
On 01/17/2018 01:53 PM, Markus Jelsma wrote:
> Ah thanks!
>
> I knew you'd fixed some of these; now I know my patch for NUTCH-2466 silently
> removes your commit!
>
> My bad, thanks!
> Markus
Hello,
We noticed some abnormalities in our crawl cycle, caused by a sudden reduction
in our CrawlDB's size. The SitemapProcessor ran, failed (it timed out, see below),
and left us with a decimated CrawlDB.
This is odd because of this catch block:
} catch (Exception e) {
  // on failure only the temporary output is removed; the CrawlDB itself is untouched
  if (fs.exists(tempCrawlDb))
    fs.delete(tempCrawlDb, true);
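For reference, here is a rough sketch of the flow we assumed around that code (illustrative names like crawlDb and tempCrawlDb, simplified Hadoop calls; not the literal SitemapProcessor source): the job writes only into a temporary directory, the live CrawlDB is swapped in only after the job succeeds, and on any exception just the temporary output is deleted.

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class SafeCrawlDbUpdateSketch {
  // Simplified sketch of the expected "write to temp, swap on success,
  // clean up on failure" pattern; not the actual SitemapProcessor code.
  public static void updateCrawlDb(Job job, Path crawlDb, Path tempCrawlDb)
      throws Exception {
    FileSystem fs = FileSystem.get(job.getConfiguration());
    try {
      // The MapReduce job writes the updated data into tempCrawlDb only.
      if (!job.waitForCompletion(true)) {
        throw new RuntimeException("sitemap job failed");
      }
      // Swap in the new data only after success, keeping the previous db as a backup.
      Path oldCrawlDb = new Path(crawlDb.getParent(), "crawldb_old");
      if (fs.exists(oldCrawlDb)) {
        fs.delete(oldCrawlDb, true);
      }
      if (fs.exists(crawlDb)) {
        fs.rename(crawlDb, oldCrawlDb);
      }
      fs.rename(tempCrawlDb, crawlDb);
    } catch (Exception e) {
      // On failure (including a task timeout) only the temporary output is
      // removed; the live CrawlDB has not been touched before the swap above.
      if (fs.exists(tempCrawlDb)) {
        fs.delete(tempCrawlDb, true);
      }
      throw e;
    }
  }
}

If the swap only happens after success, a failed or timed-out run cannot shrink the CrawlDB, which is exactly why the result above surprised us.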