Hi, I am trying to use nutch-0.8-dev and I have a problem with the crawl run. I did a checkout from SVN and built a fresh package (ant package; the build completed without errors). Then I installed Nutch on Linux, made only minor changes to the nutch-site.xml file (enabled some plugins and increased several constants), prepared a file with URLs, and started bin/nutch crawl.
This worked with nutch-0.7.x, but with nutch-0.8-dev I get the following exception in the log file:

    051220 204248 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-default.xml
    051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/crawl-tool.xml
    051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/mapred-default.xml
    051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-site.xml
    051220 204249 crawl started in: ./crawl.test
    051220 204249 rootUrlDir = urls
    051220 204249 threads = 10
    051220 204249 depth = 6
    051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-default.xml
    051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/crawl-tool.xml
    051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-site.xml
    051220 204249 Injector: starting
    051220 204249 Injector: crawlDb: ./crawl.test/crawldb
    051220 204249 Injector: urlDir: urls
    051220 204249 Injector: Converting injected urls to crawl db entries.
    051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-default.xml
    051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/crawl-tool.xml
    051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/mapred-default.xml
    051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/mapred-default.xml
    051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-site.xml
    051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-default.xml
    051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/mapred-default.xml
    051220 204249 parsing /home/lukas/nutch/mapred/local/localRunner/job_4zwds6.xml
    051220 204249 parsing file:/home/lukas/nutch/nutch-0.8-dev/conf/nutch-site.xml
    java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , /home/lukas/nutch/mapred/local/localRunner/job_4zwds6.xml , nutch-site.xml
            at org.apache.nutch.mapred.InputFormatBase.listFiles(InputFormatBase.java:85)
            at org.apache.nutch.mapred.InputFormatBase.getSplits(InputFormatBase.java:95)
            at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:63)
    051220 204249 Running job: job_4zwds6
    Exception in thread "main" java.io.IOException: Job failed!
            at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
            at org.apache.nutch.crawl.Injector.inject(Injector.java:102)
            at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)

It seems the problem is that Nutch cannot find the mapred.input.subdir setting in any of the config files. I found that the mapred.input.dir property is defined in the config for this particular job (job_4zwds6.xml), with its value equal to the name of my urls file, but I don't understand where I should define the mapred.input.subdir property and what value to assign to it (if it needs to be defined manually at all; note that mapred.input.dir seems to be configured automatically). Does anybody know the answer?

P.S. The line numbers for InputFormatBase.java in the exception trace above (85, 95) may be slightly off: I inserted some extra LOG.debug() calls there while searching for the root cause and then removed them again, but I may have left a few extra empty lines behind.

Thanks,
Lukas

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
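The post has no reply attached, so the following is only an illustration under stated assumptions and is not Nutch's actual InputFormatBase code. As the property name and the exception text ("No input directories specified") suggest, mapred.input.dir is assumed here to name input *directories* whose contents are read. If it points at a plain file (the log shows rootUrlDir = urls, and job_4zwds6.xml sets mapred.input.dir to the name of the urls file), a directory listing can come back empty. The hypothetical class InputListingSketch and its method listInputs show that effect using java.io.File.listFiles(), which returns null when the path is not a directory.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, NOT Nutch's InputFormatBase: collects the entries of
 * each configured input directory, as a MapReduce-style input format might
 * do before computing splits.
 */
public class InputListingSketch {

    // Lists the children of every input path. File.listFiles() returns null
    // when the path is not a directory, so a plain seed file such as "urls"
    // contributes nothing to the result.
    static List<File> listInputs(File[] inputDirs) throws IOException {
        List<File> result = new ArrayList<>();
        for (File dir : inputDirs) {
            File[] children = dir.listFiles();
            if (children != null) {
                for (File child : children) {
                    result.add(child);
                }
            }
        }
        if (result.isEmpty()) {
            throw new IOException("No input files found under the given directories");
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        File tmp = Files.createTempDirectory("sketch").toFile();

        // Case 1: "urls" is a plain file holding the seed list.
        File urlsFile = new File(tmp, "urls");
        Files.write(urlsFile.toPath(), "http://example.com/\n".getBytes());
        try {
            listInputs(new File[] { urlsFile });
            System.out.println("file: ok");
        } catch (IOException e) {
            System.out.println("file: " + e.getMessage());
        }

        // Case 2: "urlsdir" is a directory that contains the seed file.
        File urlsDir = new File(tmp, "urlsdir");
        urlsDir.mkdir();
        Files.write(new File(urlsDir, "seeds.txt").toPath(), "http://example.com/\n".getBytes());
        System.out.println("dir: " + listInputs(new File[] { urlsDir }).size() + " file(s)");
    }
}
```

Running main prints "file: No input files found under the given directories" for the plain-file case and "dir: 1 file(s)" for the directory case. Whether Nutch 0.8-dev behaves the same way depends on its real implementation, which this sketch does not reproduce.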