Hi everyone, I am using Hadoop-0.2.0 and Nutch-0.8, and I am currently trying to complete a depth-1 crawl using DFS and MapReduce. However, after the fetch step, I encounter the JVM crash below on one or more task trackers during the parse step. It makes no difference whether I use only the default parsers or also enable the additional ones (PDF, Excel, etc.). My task trackers run on 64-bit AMD X2 machines, and my JVM version is 1.5.0_06.
Has anyone faced such a problem at the parse stage? Or how do you think I can find the cause of this JVM crash? The error report is:

060530 144113 task_0007_m_000010_0 Using Signature impl: org.apache.nutch.crawl.MD5Signature
060530 144113 task_0007_m_000010_0 5.0391704E-6%/crawl/segments/20060521171305/content/part-00004/data:0+12303612
060530 144114 task_0007_m_000010_0 Using URL normalizer: org.apache.nutch.net.BasicUrlNormalizer
060530 144114 task_0007_m_000007_0 0.084114%/crawl/segments/20060521171305/content/part-00011/data:0+12493176
060530 144115 task_0007_m_000007_0 0.09551566%/crawl/segments/20060521171305/content/part-00011/data:0+12493176
060530 144115 task_0007_m_000007_0 #
060530 144115 task_0007_m_000007_0 # An unexpected error has been detected by HotSpot Virtual Machine:
060530 144115 task_0007_m_000007_0 #
060530 144115 task_0007_m_000007_0 # SIGSEGV (0xb) at pc=0x0000003d1d247c10, pid=25093, tid=182894086496
060530 144115 task_0007_m_000007_0 #
060530 144115 task_0007_m_000007_0 # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.5.0_06-b05 mixed mode)
060530 144115 task_0007_m_000007_0 # Problematic frame:
060530 144115 task_0007_m_000007_0 # C [libc.so.6+0x47c10] printf_size+0x740
060530 144115 task_0007_m_000007_0 #
060530 144115 task_0007_m_000007_0 # An error report file with more information is saved as hs_err_pid25093.log
060530 144115 task_0007_m_000007_0 #
060530 144115 task_0007_m_000007_0 # If you would like to submit a bug report, please visit:
060530 144115 task_0007_m_000007_0 # http://java.sun.com/webapps/bugreport/crash.jsp
060530 144115 task_0007_m_000007_0 #
060530 144115 Server connection on port 51950 from 192.168.15.61: exiting
060530 144115 task_0007_m_000007_0 Child Error
java.io.IOException: Task process exit with nonzero status of 134.
    at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:242)
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:145)

Thank you very much.
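
P.S. In case it helps anyone triaging similar output, here is a rough Python sketch that pulls the crashed task ID, the signal, the problematic frame, and the hs_err file name out of task tracker log lines. It is not part of Nutch or Hadoop: the function name summarize_crashes is made up for illustration, and the line layout ("date time task-id message") is only assumed from the log above.

```python
import re

# Assumed layout, taken from the log above: "<date> <time> <task id> <message>".
LINE_RE = re.compile(r"^(\d{6}) (\d{6}) (task_\S+) (.*)$")
# Matches the HotSpot signal line, e.g. "SIGSEGV (0xb) at pc=0x..., pid=25093, ...".
SIGNAL_RE = re.compile(r"(SIG\w+) \(0x[0-9a-f]+\) at pc=(0x[0-9a-f]+), pid=(\d+)")


def summarize_crashes(lines):
    """Collect HotSpot crash details per task ID (hypothetical helper)."""
    crashes = {}
    expecting_frame = set()  # tasks whose next non-empty '#' line is the frame
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # lines without a task ID, stack trace lines, etc.
        task, message = match.group(3), match.group(4)
        if not message.startswith("#"):
            continue  # only the '#'-prefixed lines belong to the crash banner
        text = message.lstrip("# ").strip()
        if not text:
            continue
        info = crashes.setdefault(task, {})
        if task in expecting_frame:
            info["frame"] = text
            expecting_frame.discard(task)
            continue
        signal = SIGNAL_RE.match(text)
        if signal:
            info["signal"] = signal.group(1)
            info["pc"] = signal.group(2)
            info["pid"] = int(signal.group(3))
        elif text == "Problematic frame:":
            expecting_frame.add(task)
        elif "saved as" in text:
            info["report"] = text.rsplit(" ", 1)[-1]
    # Keep only tasks for which a signal line was actually seen.
    return {task: info for task, info in crashes.items() if "signal" in info}
```

Feeding it the lines of a task tracker log (for example, from open(path) on whatever file holds that output) gives one entry per crashed task, which makes it easier to see whether the crashes always hit the same frame and which hs_err file to open on which machine.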
