Hi,
I'm getting some crawl errors after upgrading from Nutch-1.0 to Nutch-1.2.
Are Nutch-1.0 files (index, segments, etc.) fully compatible with version
1.2?
When I try to continue the crawl on Nutch-1.2, I get the following error:
2011-01-17 10:25:36,361 WARN mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 5 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 10 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 13 more
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.lang.String.substring(String.java:1939)
    at java.lang.String.substring(String.java:1904)
    at org.apache.nutch.analysis.lang.NGramProfile.load(NGramProfile.java:348)
    at org.apache.nutch.analysis.lang.LanguageIdentifier.<init>(LanguageIdentifier.java:139)
    at org.apache.nutch.analysis.lang.LanguageIndexingFilter.setConf(LanguageIndexingFilter.java:105)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162)
    at org.apache.nutch.indexer.IndexingFilters.<init>(IndexingFilters.java:69)
    at org.apache.nutch.indexer.IndexerMapReduce.configure(IndexerMapReduce.java:61)
    ... 18 more
Any ideas?
Thanks
Patricio