Hi Lewis,
I think I have narrowed the problem down to the SelectorEntryComparator class
nested in GeneratorJob. In the debugger, during the crash, I noticed a single
instance of SelectorEntryComparator shared across multiple reducer tasks. The
class extends org.apache.hadoop.io.WritableComparator, which has a few members
that are not protected against concurrent access. At some point multiple
threads may access those members in the WritableComparator.compare call. I
modified SelectorEntryComparator and that seems to have solved the problem, but
I am not sure whether the change is appropriate and/or sufficient (does it
cover GENERATE only?).
Original code:
============================
public static class SelectorEntryComparator extends WritableComparator {
  public SelectorEntryComparator() {
    super(SelectorEntry.class, true);
  }
}
Modified code:
============================
public static class SelectorEntryComparator extends WritableComparator {
  public SelectorEntryComparator() {
    super(SelectorEntry.class, true);
  }

  // Serialize calls so threads sharing this instance cannot interleave
  // inside the inherited compare.
  @Override
  public synchronized int compare(byte[] b1, int s1, int l1,
                                  byte[] b2, int s2, int l2) {
    return super.compare(b1, s1, l1, b2, s2, l2);
  }
}
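
For illustration, here is a minimal self-contained sketch of the kind of race
described above. It does not use Hadoop or Nutch. SharedStateComparator is a
hypothetical stand-in that, like the suspected behaviour of
WritableComparator.compare, decodes both keys into instance fields before
comparing them. SynchronizedComparator mirrors the modified code, and
PerThreadComparator is one possible alternative (an assumption, not something
tested against Nutch) that gives each thread its own instance. All class and
method names here are made up for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in: decodes keys into shared fields, so it is not thread-safe. */
class SharedStateComparator implements Comparator<byte[]> {
    // Shared mutable state, reused on every call.
    private int key1;
    private int key2;

    @Override
    public int compare(byte[] b1, byte[] b2) {
        key1 = ByteBuffer.wrap(b1).getInt();
        key2 = ByteBuffer.wrap(b2).getInt();
        // Another thread may have overwritten key1/key2 by this point.
        return Integer.compare(key1, key2);
    }
}

/** Same idea as the modified code: one shared instance, calls serialized. */
class SynchronizedComparator extends SharedStateComparator {
    @Override
    public synchronized int compare(byte[] b1, byte[] b2) {
        return super.compare(b1, b2);
    }
}

/** Alternative sketch: each thread lazily gets a private comparator, no locking. */
class PerThreadComparator implements Comparator<byte[]> {
    private final ThreadLocal<SharedStateComparator> local =
            ThreadLocal.withInitial(SharedStateComparator::new);

    @Override
    public int compare(byte[] b1, byte[] b2) {
        return local.get().compare(b1, b2);
    }
}

final class ComparatorRaceDemo {
    /** Encodes an int as 4 big-endian bytes. */
    static byte[] encode(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    /** Runs concurrent comparisons with known answers and counts wrong results. */
    static int countMismatches(Comparator<byte[]> cmp, int threads, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final boolean ascending = (t % 2 == 0);
                final byte[] low = encode(t);
                final byte[] high = encode(t + 1);
                results.add(pool.submit(() -> {
                    int bad = 0;
                    for (int i = 0; i < iterations; i++) {
                        int r = ascending ? cmp.compare(low, high) : cmp.compare(high, low);
                        int expected = ascending ? -1 : 1;
                        if (Integer.signum(r) != expected) {
                            bad++;
                        }
                    }
                    return bad;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling ComparatorRaceDemo.countMismatches(new SharedStateComparator(), 4,
20000) may report wrong results on some runs, since the race is timing
dependent. The synchronized and per-thread variants should always report zero.
The synchronized version is the smaller change but serializes all comparisons
on the shared instance. The per-thread version avoids that contention at the
cost of one comparator per thread.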
Regards,
Vyacheslav Pascarel
-----Original Message-----
From: lewis john mcgibbney [mailto:[email protected]]
Sent: Wednesday, June 21, 2017 1:41 PM
To: [email protected]
Subject: [EXTERNAL] - Re: ERROR: Cannot run job worker!
Hi Vyacheslav,
Which version of Nutch are you using? 2.x?
lewis
On Wed, Jun 21, 2017 at 10:32 AM, <[email protected]> wrote:
>
>
> From: Vyacheslav Pascarel <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc:
> Bcc:
> Date: Wed, 21 Jun 2017 17:32:15 +0000
> Subject: ERROR: Cannot run job worker!
> Hello,
>
> I am writing an application that performs web site crawling using
> Nutch REST services. The application:
>
>
>