On 8/9/07, Kai_testing Middleton <[EMAIL PROTECTED]> wrote:
> Hmm:
>
> $ bin/nutch inject crawl/crawldb /usr/tmp2/urls.txt
> Injector: starting
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: /usr/tmp2/urls.txt
> Injector: Converting injected urls to crawl db entries.
> Injector: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:166)
>         at org.apache.nutch.crawl.Injector.run(Injector.java:196)
>         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>         at org.apache.nutch.crawl.Injector.main(Injector.java:186)
>
> $ cat /var/tmp/nutch-trunk/hadoop.log
> 2007-08-09 10:23:23,504 INFO crawl.Injector - Injector: starting
> 2007-08-09 10:23:23,505 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
> 2007-08-09 10:23:23,505 INFO crawl.Injector - Injector: urlDir: /usr/tmp2/urls.txt
> 2007-08-09 10:23:23,976 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
> 2007-08-09 10:23:25,035 INFO plugin.PluginRepository - Plugins: looking in: /usr/tmp2/nutch_trunk/plugins
> 2007-08-09 10:23:25,038 WARN mapred.LocalJobRunner - job_48xttw
> java.lang.NullPointerException
>         at org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:87)
>         at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:71)
>         at org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:95)
>         at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:116)
>         at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:59)
>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
>         at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)
> 2007-08-09 10:23:25,946 FATAL crawl.Injector - Injector: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:166)
>         at org.apache.nutch.crawl.Injector.run(Injector.java:196)
>         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>         at org.apache.nutch.crawl.Injector.main(Injector.java:186)
>
> So, Doğacan, I tried the steps you suggested:
> 1) Try doing an "ant clean; ant" and try again.
> 2) Check if your classpath is clean.
> 3) Try running the inject command by itself: bin/nutch inject <crawldb> <urldir>
>
> I did "ant clean; ant". In fact, I even tried "svn up -r HEAD" yesterday. My
> CLASSPATH is not set (it's empty) - do I need it to be set? JAVA_HOME is set
> properly.
> NUTCH_HOME is set to /usr/tmp2/nutch_trunk as appropriate, and that's
> where I ran the above inject command from. No crawl directory gets
> created, though now I'm seeing hadoop.log in the correct place, as we see
> above. Using df I see I have plenty of disk space.
>
> Any other ideas? Should I add some logging code and rebuild? Maybe I'll
> try this with a stock 0.9 of nutch and see what happens.
Can you check your plugin.folders setting? You probably have a path there
which doesn't exist. (Note that having value="plugins,some_non_existing_folder"
would not work either: all of the paths listed there have to be correct.)

> --Kai Middleton

--
Doğacan Güney
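For reference, the plugin.folders property can be overridden in conf/nutch-site.xml (it defaults to "plugins" in conf/nutch-default.xml). A minimal sketch of a correct override, assuming the standard layout where the plugins directory sits directly under the Nutch install, might look like:

```xml
<!-- conf/nutch-site.xml: every directory listed in plugin.folders must exist,
     otherwise PluginManifestParser can fail with a NullPointerException -->
<property>
  <name>plugin.folders</name>
  <value>plugins</value>
  <description>Comma-separated list of directories where Nutch looks for
  plugins. Each listed path must point to an existing directory.</description>
</property>
```

If the value lists several directories, removing (or creating) any entry that does not exist on disk should make the NPE go away.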
