I am using a Nutch nightly from a few days ago, with
mapred.speculative.execution turned off, etc.

In a re-parse of already fetched content I am getting a crash halfway  
through.

There are two Java error dumps, one after another:

The first is a java.io.IOException: config(config) coming from
org.apache.hadoop.conf.Configuration (I am using Hadoop 0.9.1). The
only thing in my Hadoop config is the aforementioned
mapred.speculative.execution being set to false.

The second is a java.lang.NullPointerException in  
PluginManifestParser.java. The line it claims is crashing is:
for (File oneSubFolder : directory.listFiles()) {
(where directory is the "looking in:" directory that the  
PluginRepository reports)
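
For context, here is a minimal sketch of how that line can throw. Per the Javadoc, File.listFiles() returns null (not an empty array) when the path does not denote a directory or when an I/O error occurs, and the for-each loop then dereferences that null. The class name, the helper method, and the path below are hypothetical and are not taken from the Nutch source; this only illustrates the failure mode and one possible guard.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone illustration; not Nutch code.
class ListFilesGuard {

    // Returns the names of the immediate sub-folders of `directory`,
    // or an empty list if the directory cannot be listed.
    static List<String> subFolderNames(File directory) {
        List<String> names = new ArrayList<String>();
        // listFiles() returns null if `directory` is not a directory
        // or if an I/O error occurs while listing it.
        File[] children = directory.listFiles();
        if (children == null) {
            return names; // guard: avoids the NullPointerException
        }
        for (File oneSubFolder : children) {
            if (oneSubFolder.isDirectory()) {
                names.add(oneSubFolder.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed non-existent path, used only to trigger the null return.
        File missing = new File("/no/such/plugins-dir");
        System.out.println("listFiles() is null: " + (missing.listFiles() == null));
        System.out.println("guarded result: " + subFolderNames(missing));
    }
}
```

An unguarded `for (File f : missing.listFiles())` on the same path would throw a NullPointerException at the loop header, which matches the shape of the trace below.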

Of note: if I turn off my parse-mp3 plugin, the crash does not
happen. However, it is not crashing on any mp3 URL. The last URL it
read is a bog-standard HTML page with no mp3s on it or near it. Is
there some effect a bad plugin could have on later parses?


2007-01-10 14:12:08,399 DEBUG parse.html - found 16 outlinks in http://{URL}
2007-01-10 14:12:08,405 INFO  mapred.LocalJobRunner - /array/nutch-nightly/crawl/segments/20070110022429/content/part-00000/data:436207616+33554432
2007-01-10 14:12:08,406 DEBUG conf.Configuration - java.io.IOException: config(config)
         at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:102)
         at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:84)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:104)

2007-01-10 14:12:08,408 INFO  conf.Configuration - parsing jar:file:/array/nutch-nightly/lib/hadoop-0.9.1.jar!/hadoop-default.xml
2007-01-10 14:12:08,413 INFO  conf.Configuration - parsing jar:file:/array/nutch-nightly/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-10 14:12:08,415 INFO  conf.Configuration - parsing /tmp/hadoop-bwhitman/mapred/local/localRunner/job_i8aofb.xml
2007-01-10 14:12:08,421 INFO  conf.Configuration - parsing jar:file:/array/nutch-nightly/lib/hadoop-0.9.1.jar!/mapred-default.xml
2007-01-10 14:12:08,426 INFO  mapred.MapTask - opened part-0.out
2007-01-10 14:12:08,427 INFO  plugin.PluginRepository - Plugins: looking in: /array/nutch-nightly/plugins
2007-01-10 14:12:08,428 WARN  mapred.LocalJobRunner - job_i8aofb
java.lang.NullPointerException
         at org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:87)
         at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:71)
         at org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:95)
         at org.apache.nutch.scoring.ScoringFilters.<init>(ScoringFilters.java:59)
         at org.apache.nutch.parse.ParseSegment.configure(ParseSegment.java:55)
         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:50)
         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:70)
         at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:50)
         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:70)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:211)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:109)




