RE: [EXTERNAL] - Re: ERROR: Cannot run job worker!

2017-06-26 Thread Vyacheslav Pascarel
Done - NUTCH-2395

https://issues.apache.org/jira/browse/NUTCH-2395

Regards,

Vyacheslav Pascarel


-----Original Message-----
From: lewis john mcgibbney [mailto:lewi...@apache.org] 
Sent: Saturday, June 24, 2017 2:27 PM
To: user@nutch.apache.org
Subject: [EXTERNAL] - Re: ERROR: Cannot run job worker!

Hi Vyacheslav,
Thanks for the update, can you please open a ticket at
https://issues.apache.org/jira/projects/NUTCH
If you are able to submit a pull request at https://github.com/apache/nutch/,
it would be appreciated.
Lewis

On Sat, Jun 24, 2017 at 9:36 AM, <user-digest-h...@nutch.apache.org> wrote:

>
> From: Vyacheslav Pascarel <vpasc...@opentext.com>
> To: "user@nutch.apache.org" <user@nutch.apache.org>
> Cc:
> Bcc:
> Date: Fri, 23 Jun 2017 13:07:39 +
> Subject: RE: [EXTERNAL] - Re: ERROR: Cannot run job worker!
> Hi Lewis,
>
> I think I narrowed the problem down to the SelectorEntryComparator class 
> nested in GeneratorJob. In the debugger, during the crash, I noticed a 
> single instance of SelectorEntryComparator shared across multiple reducer 
> tasks. The class inherits from 
> org.apache.hadoop.io.WritableComparator, which has a few members 
> unprotected against concurrent use; at some point multiple threads may 
> access those members in a WritableComparator.compare call. I modified 
> SelectorEntryComparator and it seems to have solved the problem, but I am 
> not sure whether the change is appropriate and/or sufficient (does it 
> cover GENERATE only?)
>
> Original code:
> 
>
>   public static class SelectorEntryComparator extends WritableComparator {
> public SelectorEntryComparator() {
>   super(SelectorEntry.class, true);
> }
>   }
>
> Modified code:
> 
>   public static class SelectorEntryComparator extends WritableComparator {
> public SelectorEntryComparator() {
>   super(SelectorEntry.class, true);
> }
>
> @Override
> public synchronized int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
>   return super.compare(b1, s1, l1, b2, s2, l2);
> }
>   }
>
>


Re: ERROR: Cannot run job worker!

2017-06-24 Thread lewis john mcgibbney
Hi Vyacheslav,
Thanks for the update, can you please open a ticket at
https://issues.apache.org/jira/projects/NUTCH
If you are able to submit a pull request at https://github.com/apache/nutch/,
it would be appreciated.
Lewis

On Sat, Jun 24, 2017 at 9:36 AM, <user-digest-h...@nutch.apache.org> wrote:

>
> From: Vyacheslav Pascarel <vpasc...@opentext.com>
> To: "user@nutch.apache.org" <user@nutch.apache.org>
> Cc:
> Bcc:
> Date: Fri, 23 Jun 2017 13:07:39 +
> Subject: RE: [EXTERNAL] - Re: ERROR: Cannot run job worker!
> Hi Lewis,
>
> I think I narrowed the problem down to the SelectorEntryComparator class
> nested in GeneratorJob. In the debugger, during the crash, I noticed a
> single instance of SelectorEntryComparator shared across multiple reducer
> tasks. The class inherits from org.apache.hadoop.io.WritableComparator,
> which has a few members unprotected against concurrent use; at some point
> multiple threads may access those members in a WritableComparator.compare
> call. I modified SelectorEntryComparator and it seems to have solved the
> problem, but I am not sure whether the change is appropriate and/or
> sufficient (does it cover GENERATE only?)
>
> Original code:
> 
>
>   public static class SelectorEntryComparator extends WritableComparator {
> public SelectorEntryComparator() {
>   super(SelectorEntry.class, true);
> }
>   }
>
> Modified code:
> 
>   public static class SelectorEntryComparator extends WritableComparator {
> public SelectorEntryComparator() {
>   super(SelectorEntry.class, true);
> }
>
> @Override
> public synchronized int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
>   return super.compare(b1, s1, l1, b2, s2, l2);
> }
>   }
>
>


RE: [EXTERNAL] - Re: ERROR: Cannot run job worker!

2017-06-23 Thread Vyacheslav Pascarel
Hi Lewis,

I think I narrowed the problem down to the SelectorEntryComparator class nested 
in GeneratorJob. In the debugger, during the crash, I noticed a single instance 
of SelectorEntryComparator shared across multiple reducer tasks. The class 
inherits from org.apache.hadoop.io.WritableComparator, which has a few members 
unprotected against concurrent use; at some point multiple threads may access 
those members in a WritableComparator.compare call. I modified 
SelectorEntryComparator and it seems to have solved the problem, but I am not 
sure whether the change is appropriate and/or sufficient (does it cover 
GENERATE only?)

Original code:


  public static class SelectorEntryComparator extends WritableComparator {
public SelectorEntryComparator() {
  super(SelectorEntry.class, true);
}
  }

Modified code:

  public static class SelectorEntryComparator extends WritableComparator {
public SelectorEntryComparator() {
  super(SelectorEntry.class, true);
}

    @Override
    public synchronized int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
      return super.compare(b1, s1, l1, b2, s2, l2);
    }
  }
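For illustration, here is a self-contained sketch (plain Java, no Hadoop dependency; the class and buffer names are hypothetical stand-ins, not the real WritableComparator internals) of the failure mode: a comparator that reuses shared scratch buffers gives inconsistent answers when one instance is shared across threads, and synchronizing compare, as in the change above, restores consistency:

```java
import java.util.concurrent.*;

// Minimal stand-in for a comparator that, like WritableComparator, reuses
// shared mutable buffers across compare() calls. Sharing one instance
// between threads is what makes the unsynchronized version unsafe.
class SharedBufferComparator {
    private final byte[] key1 = new byte[8];  // shared scratch buffers
    private final byte[] key2 = new byte[8];

    // Synchronizing here mirrors the fix applied to SelectorEntryComparator.
    public synchronized int compare(byte[] a, byte[] b) {
        System.arraycopy(a, 0, key1, 0, a.length);
        System.arraycopy(b, 0, key2, 0, b.length);
        // Without the lock, another thread could overwrite key1/key2 between
        // the copies and this loop, producing an inconsistent result.
        for (int i = 0; i < key1.length; i++) {
            if (key1[i] != key2[i]) return key1[i] - key2[i];
        }
        return 0;
    }
}

public class ComparatorRace {
    public static void main(String[] args) throws Exception {
        SharedBufferComparator cmp = new SharedBufferComparator();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Future<?>[] futures = new Future<?>[4];
        for (int t = 0; t < 4; t++) {
            final byte[] val = new byte[8];
            val[0] = (byte) t;
            // Each task compares a value with itself; the result must be 0.
            futures[t] = pool.submit(() -> {
                for (int i = 0; i < 100_000; i++) {
                    if (cmp.compare(val, val) != 0) {
                        throw new IllegalStateException("race detected");
                    }
                }
            });
        }
        for (Future<?> f : futures) f.get();  // propagates any failure
        pool.shutdown();
        System.out.println("all comparisons consistent");
    }
}
```

With the `synchronized` keyword removed from `compare`, the same program can intermittently throw "race detected", which is analogous to the EOFException seen when WritableComparator's shared buffers are corrupted mid-compare.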

Regards,

Vyacheslav Pascarel


-----Original Message-----
From: lewis john mcgibbney [mailto:lewi...@apache.org] 
Sent: Wednesday, June 21, 2017 1:41 PM
To: user@nutch.apache.org
Subject: [EXTERNAL] - Re: ERROR: Cannot run job worker!

Hi Vyacheslav,

Which version of Nutch are you using? 2.x?
lewis

On Wed, Jun 21, 2017 at 10:32 AM, <user-digest-h...@nutch.apache.org> wrote:

>
>
> From: Vyacheslav Pascarel <vpasc...@opentext.com>
> To: "user@nutch.apache.org" <user@nutch.apache.org>
> Cc:
> Bcc:
> Date: Wed, 21 Jun 2017 17:32:15 +
> Subject: ERROR: Cannot run job worker!
> Hello,
>
> I am writing an application that performs web site crawling using 
> Nutch REST services. The application:
>
>
>


RE: [EXTERNAL] - Re: ERROR: Cannot run job worker!

2017-06-21 Thread Vyacheslav Pascarel
2.3.1

Regards,

Vyacheslav Pascarel


-----Original Message-----
From: lewis john mcgibbney [mailto:lewi...@apache.org] 
Sent: Wednesday, June 21, 2017 1:41 PM
To: user@nutch.apache.org
Subject: [EXTERNAL] - Re: ERROR: Cannot run job worker!

Hi Vyacheslav,

Which version of Nutch are you using? 2.x?
lewis

On Wed, Jun 21, 2017 at 10:32 AM, <user-digest-h...@nutch.apache.org> wrote:

>
>
> From: Vyacheslav Pascarel <vpasc...@opentext.com>
> To: "user@nutch.apache.org" <user@nutch.apache.org>
> Cc:
> Bcc:
> Date: Wed, 21 Jun 2017 17:32:15 +
> Subject: ERROR: Cannot run job worker!
> Hello,
>
> I am writing an application that performs web site crawling using 
> Nutch REST services. The application:
>
>
>


Re: ERROR: Cannot run job worker!

2017-06-21 Thread lewis john mcgibbney
Hi Vyacheslav,

Which version of Nutch are you using? 2.x?
lewis

On Wed, Jun 21, 2017 at 10:32 AM, <user-digest-h...@nutch.apache.org> wrote:

>
>
> From: Vyacheslav Pascarel <vpasc...@opentext.com>
> To: "user@nutch.apache.org" <user@nutch.apache.org>
> Cc:
> Bcc:
> Date: Wed, 21 Jun 2017 17:32:15 +
> Subject: ERROR: Cannot run job worker!
> Hello,
>
> I am writing an application that performs web site crawling using Nutch
> REST services. The application:
>
>
>


ERROR: Cannot run job worker!

2017-06-21 Thread Vyacheslav Pascarel
Hello,

I am writing an application that performs web site crawling using Nutch REST 
services. The application:


1.   Injects seed URLs once

2.   Repeats the GENERATE/FETCH/PARSE/UPDATEDB sequence a requested number of 
times to emulate continuous crawling (each step in the sequence is executed 
upon successful completion of the previous step, then the sequence repeats)
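The loop above can be sketched as follows. This is only an outline: the endpoint path and JSON shape are assumptions based on the Nutch 2.x REST job API, `buildJobPayload` is a hypothetical helper, and the actual HTTP POST to /job/create plus polling of the job state until SUCCEEDED are omitted:

```java
// Sketch of the inject-once / generate-fetch-parse-updatedb loop driven
// through Nutch's REST job API. Only the request bodies are built here;
// real code would POST each one and wait for the previous job to succeed.
public class CrawlLoop {

    // Build the JSON body for a job-creation request (hypothetical helper).
    static String buildJobPayload(String type, String confId, String crawlId) {
        return "{\"type\":\"" + type + "\",\"confId\":\"" + confId
                + "\",\"crawlId\":\"" + crawlId + "\"}";
    }

    public static void main(String[] args) {
        String confId = "default";
        String crawlId = "parallel_0";
        String[] steps = {"GENERATE", "FETCH", "PARSE", "UPDATEDB"};
        int rounds = 3;  // requested number of crawl iterations

        // INJECT runs once; each later step starts only after the previous
        // step's job reports success.
        System.out.println(buildJobPayload("INJECT", confId, crawlId));
        for (int round = 0; round < rounds; round++) {
            for (String step : steps) {
                System.out.println(buildJobPayload(step, confId, crawlId));
            }
        }
    }
}
```

Running several instances of this loop concurrently, each with its own crawlId, reproduces the parallel-crawl setup described below.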

The application can also run multiple crawls with different crawl IDs at the 
same time. That seems to put stress on Nutch, and it starts to fail with the 
following error: "Cannot run job worker!". Parallel crawling appears to start 
normally, with the corresponding Nutch jobs finishing as expected, but 
eventually they start to break.

Here are some details on the crawling setup (these settings work fine for 
non-parallel crawls):


-  Seed URL: http://www.cnn.com

-  Regex URL filters: "-^.{1000,}$" and "+." (1. exclude very long URLs; 
2. include the rest)

-  fetcher.threads.fetch in nutch-site.xml: 2 (a smaller value seems to 
reproduce the problem faster; a value of 100 takes longer)

-  Number of parallel crawls: 7

Here is an example of a failed job status (in this case the GENERATE step 
failed, but I have seen PARSE fail with the same error in other test runs):

{
"id" : "parallel_0-65ff2f1b-382e-4eb2-a813-a0370b84d5b6-GENERATE-1961495833",
"type" : "GENERATE",
"confId" : "65ff2f1b-382e-4eb2-a813-a0370b84d5b6",
"args" : { "topN" : "100" },
"result" : null,
"state" : "FAILED",
"msg" : "ERROR: java.lang.RuntimeException: job failed: name=[parallel_0]generate: 1498059912-1448058551, jobid=job_local1142434549_0036",
"crawlId" : "parallel_0"
}

Lines from hadoop.log:

2017-06-21 11:45:13,021 WARN  mapred.LocalJobRunner - job_local1142434549_0036
java.lang.Exception: java.lang.RuntimeException: java.io.EOFException
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.RuntimeException: java.io.EOFException
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:164)
        at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:158)
        at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
        at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:197)
        at org.apache.hadoop.io.Text.readString(Text.java:466)
        at org.apache.hadoop.io.Text.readString(Text.java:457)
        at org.apache.nutch.crawl.GeneratorJob$SelectorEntry.readFields(GeneratorJob.java:92)
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
        ... 12 more
2017-06-21 11:45:13,058 WARN  mapred.LocalJobRunner - job_local1976432650_0038
java.lang.Exception: java.lang.RuntimeException: java.io.EOFException
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.io.EOFException
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:164)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1245)
        at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:99)
        at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:126)
        at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1575)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuf