[ http://issues.apache.org/jira/browse/NUTCH-175?page=comments#action_12412644 ]
Stefan Neufeind commented on NUTCH-175:
---------------------------------------

My bad, I didn't pay close attention when moving from 0.7 to 0.8. But I'd like to stress in this bug entry that "urls" in the example call to "nutch crawl" is no longer a file but a directory containing files with URLs in them. RTFM, and now it works :-)

> No input directories specified in: while crawling in nightly build from the 14.1.2006: sh ./nutch crawl urllist.txt -dir tmpdir
> ------------------------------------------------------------------------------------------------------------------------------
>
>          Key: NUTCH-175
>          URL: http://issues.apache.org/jira/browse/NUTCH-175
>      Project: Nutch
>         Type: Bug
>  Environment: SUSE Linux 9.3
>     Reporter: Matthias Günter
>     Priority: Trivial
>
> [EMAIL PROTECTED]:~/workspace/lucene/nutch-nightly/bin> sh ./nutch crawl urllist.txt -dir tmpdir
> 060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/nutch-default.xml
> 060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/crawl-tool.xml
> 060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/mapred-default.xml
> 060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/nutch-site.xml
> 060114 205612 crawl started in: tmpdir
> 060114 205612 rootUrlDir = urllist.txt
> 060114 205612 threads = 10
> 060114 205612 depth = 5
> 060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/nutch-default.xml
> 060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/crawl-tool.xml
> 060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/nutch-site.xml
> 060114 205612 Injector: starting
> 060114 205612 Injector: crawlDb: tmpdir/crawldb
> 060114 205612 Injector: urlDir: urllist.txt
> 060114 205612 Injector: Converting injected urls to crawl db entries.
> 060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/nutch-default.xml
> 060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/crawl-tool.xml
> 060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/mapred-default.xml
> 060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/mapred-default.xml
> 060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/nutch-site.xml
> 060114 205612 Running job: job_n0o7ps
> 060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/nutch-default.xml
> 060114 205613 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/mapred-default.xml
> 060114 205613 parsing /tmp/nutch/mapred/local/localRunner/job_n0o7ps.xml
> 060114 205613 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/nutch-site.xml
> java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , /tmp/nutch/mapred/local/localRunner/job_n0o7ps.xml , nutch-site.xml
>         at org.apache.nutch.mapred.InputFormatBase.listFiles(InputFormatBase.java:85)
>         at org.apache.nutch.mapred.InputFormatBase.getSplits(InputFormatBase.java:95)
>         at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:63)
> 060114 205613 map 0%
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:102)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
> urllist.txt contains
> http://www.mentor.ch
> PS: Is there a committer or developer (near Switzerland) who can support (paid support) with a mixed index for intranet, some internet sites and scanning of local drives (P:\ , S:\ etc)

--
This message is automatically generated by JIRA.
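For anyone hitting the same "No input directories specified" error, the fix described in the comment can be sketched roughly as follows. This is only an illustration: the directory name `urls` and the file name `seed.txt` are made-up names, and one URL per line is assumed for the seed file. The URL is the one from the reporter's urllist.txt.

```shell
# Hypothetical names: "urls" (seed directory) and "seed.txt" (seed file).
# Since 0.8 the first argument to "nutch crawl" is a directory, not a file.
mkdir -p urls

# Put the seed URL(s) into a file inside that directory (assumed: one per line).
printf '%s\n' 'http://www.mentor.ch' > urls/seed.txt

# Show what the crawler would be pointed at.
ls urls
cat urls/seed.txt

# Then pass the directory instead of the file, reusing the call from the report:
#   sh ./nutch crawl urls -dir tmpdir
```

The crawl call itself is left as a comment because it needs a Nutch checkout to run; the flags shown are only the ones that appear in the original report.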