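Create a file called urls in the working directory (the one you run bin/nutch from) containing the URL of your site. The first argument to the crawl command is the path to that file (the log shows rootUrlFile = urls), and the FileNotFoundException means it does not exist yet.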
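
A minimal sketch of that step, assuming you are in the directory you start bin/nutch from (the log suggests /home/franz/n-test, but that is a guess from the paths shown). The trailing slash on the URL is deliberate, because the filter pattern quoted below ends in a "/":

```shell
# Write the seed URL, one per line, into a plain text file named "urls".
# Assumption: the current directory is the one bin/nutch is run from.
printf '%s\n' 'http://www.armin-knab-gymnasium.de/' > urls

# Check what was written.
cat urls
```

After that, the same command as before (bin/nutch crawl urls -dir akg -depth 5 >& crawl.log) should get past the injector. You may need to remove the partly created akg directory first if the crawl tool complains that it already exists.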
On 4/22/06, Markus Franz <[EMAIL PROTECTED]> wrote:
>
> Hello!
>
> I'm quite new to Nutch and I find it very interesting. I intended to use
> Nutch 0.7.2 for crawling my own website,
> http://www.armin-knab-gymnasium.de
>
> I did everything like it was explained in the tutorial: First I changed
> the regular expression in conf/crawl-urlfilter.txt into:
> +^http://([a-z0-9]*\.)*armin-knab-gymnasium.de/
>
> Then I started:
> bin/nutch crawl urls -dir akg -depth 5 >& crawl.log
>
> The output written to the crawl.log was:
>
> 060422 133349 parsing file:/home/franz/n-test/conf/nutch-default.xml
> 060422 133349 parsing file:/home/franz/n-test/conf/crawl-tool.xml
> 060422 133349 parsing file:/home/franz/n-test/conf/nutch-site.xml
> 060422 133349 No FS indicated, using default:local
> 060422 133349 crawl started in: akg
> 060422 133349 rootUrlFile = urls
> 060422 133349 threads = 10
> 060422 133349 depth = 5
> 060422 133350 Created webdb at LocalFS,/home/franz/n-test/akg/db
> Exception in thread "main" java.io.FileNotFoundException: urls (No such file or directory)
>         at java.io.FileInputStream.open(Native Method)
>         at java.io.FileInputStream.<init>(FileInputStream.java:106)
>         at java.io.FileReader.<init>(FileReader.java:55)
>         at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:372)
>         at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
>         at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
>
> Does anybody know what I am doing wrong?
>
> Regards
> Markus
>
> --
> Danziger Weg 2
> 97350 Mainbernheim
> Germany
> --
> +491626077635
> [EMAIL PROTECTED]
> --
