Hmm:
$ bin/nutch inject crawl/crawldb /usr/tmp2/urls.txt
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: /usr/tmp2/urls.txt
Injector: Converting injected urls to crawl db entries.
Injector: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
at org.apache.nutch.crawl.Injector.inject(Injector.java:166)
at org.apache.nutch.crawl.Injector.run(Injector.java:196)
at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
at org.apache.nutch.crawl.Injector.main(Injector.java:186)
$ cat /var/tmp/nutch-trunk/hadoop.log
2007-08-09 10:23:23,504 INFO crawl.Injector - Injector: starting
2007-08-09 10:23:23,505 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
2007-08-09 10:23:23,505 INFO crawl.Injector - Injector: urlDir: /usr/tmp2/urls.txt
2007-08-09 10:23:23,976 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2007-08-09 10:23:25,035 INFO plugin.PluginRepository - Plugins: looking in: /usr/tmp2/nutch_trunk/plugins
2007-08-09 10:23:25,038 WARN mapred.LocalJobRunner - job_48xttw
java.lang.NullPointerException
at org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:87)
at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:71)
at org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:95)
at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:116)
at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:59)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)
2007-08-09 10:23:25,946 FATAL crawl.Injector - Injector: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
at org.apache.nutch.crawl.Injector.inject(Injector.java:166)
at org.apache.nutch.crawl.Injector.run(Injector.java:196)
at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
at org.apache.nutch.crawl.Injector.main(Injector.java:186)
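If I'm reading the trace right, the NullPointerException originates in PluginManifestParser.parsePluginFolder. My guess (unverified) is that File.listFiles() returned null because the plugin directory is missing or unreadable, since that's the usual way a folder scan produces an NPE. A quick sanity check I could script, using the path from the log above (the helper function name is my own):

```shell
# Hypothetical diagnostic: check the directory PluginRepository said it
# was looking in. A missing or unreadable directory makes Java's
# File.listFiles() return null, which could explain the NPE at
# PluginManifestParser.java:87.
check_plugins_dir() {
  dir="$1"
  if [ -d "$dir" ] && [ -r "$dir" ]; then
    echo "OK: $dir exists and is readable"
  else
    echo "PROBLEM: $dir is missing or unreadable"
  fi
}

check_plugins_dir /usr/tmp2/nutch_trunk/plugins
```

It would also be worth checking what plugin.folders is set to in conf/nutch-default.xml or conf/nutch-site.xml, since that controls where the repository looks.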
So, Doğacan, I tried the steps you suggested:
1) Try doing an "ant clean;ant" and try again.
2) Check if your classpath is clean
3) Try to run the inject command by itself: bin/nutch inject <crawldb> <urldir>
I did "ant clean; ant"; in fact I even ran "svn up -r HEAD" yesterday. My
CLASSPATH is not set (it's empty) - do I need it to be? JAVA_HOME is set
properly, and NUTCH_HOME is set to /usr/tmp2/nutch_trunk, which is where I
ran the inject command above. No crawl directory gets created, though
hadoop.log now shows up in the correct place, as shown above. df shows I
have plenty of disk space.
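One thing I haven't ruled out (just a guess from re-reading the tutorial): inject is normally given a directory of seed files rather than a single file, so I might try wrapping my urls.txt in a directory:

```shell
# Untested sketch: put the seed list in a directory, as the tutorial
# does, and inject the directory instead of the file.
if [ -f /usr/tmp2/urls.txt ]; then
  mkdir -p urls
  cp /usr/tmp2/urls.txt urls/
  bin/nutch inject crawl/crawldb urls
fi
```

I don't know whether passing a file versus a directory matters to the LocalJobRunner, but it's cheap to rule out.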
Any other ideas? Should I add some logging code and rebuild? Maybe I'll try
this with a stock Nutch 0.9 and see what happens.
--Kai Middleton