I am using Nutch 2.2.1 together with HBase 0.90.6, and everything works as expected except that the parse process randomly hangs.
There are no error or warning messages in the Hadoop, Nutch, or ZooKeeper logs. When I take a thread dump, I get the following output:
"pool-1-thread-1-SendThread(localhost:2181)" daemon prio=10 tid=0x00007fcbf0204800 nid=0x556c runnable [0x00007fcbeeeed000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:257)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000c1931910> (a sun.nio.ch.Util$2)
- locked <0x00000000c1931920> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c19318c8> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1066)
"pool-1-thread-1-EventThread" daemon prio=10 tid=0x00007fcbf041d800 nid=0x556b waiting on condition [0x00007fcbeefee000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c19319a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:477)
"pool-1-thread-1-SendThread(localhost:2181)" daemon prio=10 tid=0x00007fcbf041c800 nid=0x556a runnable [0x00007fcbef0ef000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:257)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000c1931b60> (a sun.nio.ch.Util$2)
- locked <0x00000000c1931b70> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c1931b18> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1066)
"pool-1-thread-1-EventThread" daemon prio=10 tid=0x00000000012c4000 nid=0x5569 waiting on condition [0x00007fcbef1f0000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c1931f10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:477)
"pool-1-thread-1-SendThread(localhost:2181)" daemon prio=10 tid=0x00000000012c2000 nid=0x5568 runnable [0x00007fcbef2f1000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:257)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000c19e3c28> (a sun.nio.ch.Util$2)
- locked <0x00000000c19e3c18> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c19e3890> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1066)
"pool-1-thread-1-EventThread" daemon prio=10 tid=0x00000000007b1000 nid=0x5567 waiting on condition [0x00007fcbef3f2000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c19e3f78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:477)
"pool-1-thread-1-SendThread(localhost:2181)" daemon prio=10 tid=0x00000000007b0800 nid=0x5566 runnable [0x00007fcbef4f3000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:257)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000c1a44ed8> (a sun.nio.ch.Util$2)
- locked <0x00000000c1a44ee8> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c1a44e90> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1066)
"pool-1-thread-1" prio=10 tid=0x0000000001113800 nid=0x555d runnable [0x00007fcbef6f5000]
java.lang.Thread.State: RUNNABLE
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3694)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4158)
at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
at java.util.regex.Pattern$Start.match(Pattern.java:3408)
at java.util.regex.Matcher.search(Matcher.java:1199)
at java.util.regex.Matcher.find(Matcher.java:592)
at org.apache.nutch.urlfilter.regex.RegexURLFilter$Rule.match(RegexURLFilter.java:100)
at org.apache.nutch.urlfilter.api.RegexURLFilterBase.filter(RegexURLFilterBase.java:129)
at org.apache.nutch.net.URLFilters.filter(URLFilters.java:88)
at org.apache.nutch.parse.ParseUtil.process(ParseUtil.java:257)
at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:131)
at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:78)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
"Thread-3" prio=10 tid=0x000000000141a800 nid=0x555c waiting on condition [0x00007fcbef7f6000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c1a5ef00> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1468)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:341)
"main-EventThread" daemon prio=10 tid=0x00007fcbf905c800 nid=0x5559 waiting on condition [0x00007fcbef8f7000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c1ae3870> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:477)
"main-SendThread(localhost:2181)" daemon prio=10 tid=0x00007fcbf905b800 nid=0x5558 runnable [0x00007fcbef9f8000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:257)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000c1b491d0> (a sun.nio.ch.Util$2)
- locked <0x00000000c1b491e0> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c1b49188> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1066)
"main-EventThread" daemon prio=10 tid=0x00007fcbf8576000 nid=0x5557 waiting on condition [0x00007fcbefaf9000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c1b4d118> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:477)
"main-SendThread(localhost:2181)" daemon prio=10 tid=0x00007fcbf8574800 nid=0x5556 runnable [0x00007fcbefbfa000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:257)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000c1b551d0> (a sun.nio.ch.Util$2)
- locked <0x00000000c1b551e0> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c1b55188> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1066)
"main-EventThread" daemon prio=10 tid=0x00007fcbf82fd800 nid=0x5555 waiting on condition [0x00007fcbefcfb000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c1b4d218> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:477)
"main-SendThread(localhost:2181)" daemon prio=10 tid=0x00007fcbf8d47000 nid=0x5554 runnable [0x00007fcbefdfc000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:257)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000c1b4d3d0> (a sun.nio.ch.Util$2)
- locked <0x00000000c1b4d3e0> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c1b4d388> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1066)
"main-EventThread" daemon prio=10 tid=0x0000000001149800 nid=0x5550 waiting on condition [0x00007fcbf41c6000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c1b5d118> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:477)
"main-SendThread(localhost:2181)" daemon prio=10 tid=0x0000000001167800 nid=0x554f runnable [0x00007fcbf42c7000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:257)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000c1b4d520> (a sun.nio.ch.Util$2)
- locked <0x00000000c1b4d530> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c1b4d4d8> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1066)
"Service Thread" daemon prio=10 tid=0x00007fcbf8004800 nid=0x5547 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" daemon prio=10 tid=0x00007fcbf8001800 nid=0x5546 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" daemon prio=10 tid=0x00000000006b3800 nid=0x5545 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x00000000006b1800 nid=0x5544 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x0000000000691800 nid=0x5543 in Object.wait() [0x00007fcbf582b000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
- locked <0x00000000c1869120> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
"Reference Handler" daemon prio=10 tid=0x0000000000688000 nid=0x5542 in Object.wait() [0x00007fcbf592c000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked <0x00000000c1868ba8> (a java.lang.ref.Reference$Lock)
"main" prio=10 tid=0x0000000000612800 nid=0x5533 waiting on condition [0x00007fcc05311000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1387)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:583)
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:50)
at org.apache.nutch.parse.ParserJob.run(ParserJob.java:253)
at org.apache.nutch.parse.ParserJob.parse(ParserJob.java:261)
at org.apache.nutch.parse.ParserJob.run(ParserJob.java:304)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.parse.ParserJob.main(ParserJob.java:308)
"VM Thread" prio=10 tid=0x0000000000685800 nid=0x5541 runnable
"GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000000628800 nid=0x5534 runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x000000000062a000 nid=0x5535 runnable
"GC task thread#2 (ParallelGC)" prio=10 tid=0x000000000062c000 nid=0x5536 runnable
"GC task thread#3 (ParallelGC)" prio=10 tid=0x000000000062e000 nid=0x5537 runnable
"GC task thread#4 (ParallelGC)" prio=10 tid=0x0000000000630000 nid=0x5538 runnable
"GC task thread#5 (ParallelGC)" prio=10 tid=0x0000000000631800 nid=0x5539 runnable
"GC task thread#6 (ParallelGC)" prio=10 tid=0x0000000000633800 nid=0x553a runnable
"GC task thread#7 (ParallelGC)" prio=10 tid=0x0000000000635800 nid=0x553b runnable
"GC task thread#8 (ParallelGC)" prio=10 tid=0x0000000000637000 nid=0x553c runnable
"GC task thread#9 (ParallelGC)" prio=10 tid=0x0000000000639000 nid=0x553d runnable
"GC task thread#10 (ParallelGC)" prio=10 tid=0x000000000063b000 nid=0x553e runnable
"GC task thread#11 (ParallelGC)" prio=10 tid=0x000000000063d000 nid=0x553f runnable
"GC task thread#12 (ParallelGC)" prio=10 tid=0x000000000063e800 nid=0x5540 runnable
"VM Periodic Task Thread" prio=10 tid=0x00000000006e2000 nid=0x5548 waiting on condition
JNI global references: 267
Heap
PSYoungGen total 148480K, used 130347K [0x00000000eb280000, 0x00000000fb200000, 0x0000000100000000)
eden space 146944K, 87% used [0x00000000eb280000,0x00000000f3058958,0x00000000f4200000)
from space 1536K, 96% used [0x00000000fa900000,0x00000000faa72690,0x00000000faa80000)
to space 5120K, 0% used [0x00000000fad00000,0x00000000fad00000,0x00000000fb200000)
ParOldGen total 559104K, used 270807K [0x00000000c1800000, 0x00000000e3a00000, 0x00000000eb280000)
object space 559104K, 48% used [0x00000000c1800000,0x00000000d2075e78,0x00000000e3a00000)
PSPermGen total 22016K, used 22008K [0x00000000bc600000, 0x00000000bdb80000, 0x00000000c1800000)
object space 22016K, 99% used [0x00000000bc600000,0x00000000bdb7e268,0x00000000bdb80000)
==============================
Any help is welcome. If I should post any configuration values, please let me know which ones. I have exactly the same problem on two different machines (SUSE and Ubuntu). On both machines, the Java process sits at 100% CPU while parsing is stalled.
Thank you