I've narrowed down the problem: my command line is not being parsed correctly. You'll notice the FileNotFoundException names "urls.txt -dir crawl.test -d 2" as the file rather than just urls.txt. Also, even though I specified a depth of 2, the log reports depth = 5. If I just run "bin/nutch crawl urls.txt" it works, but without the parameters I want. Perhaps it is a problem with the shell. I also got the error "IFS: cannot unset", so I commented that line out.
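For what it's worth, here is a minimal sketch of one way a shell can produce this symptom. It is only a guess: it assumes the script leaves IFS set to an empty value once the unset is skipped, which I have not confirmed against bin/nutch. The count_args helper and the args variable are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# The same parameters as on my command line, held in one string.
args="urls.txt -dir crawl.test -d 2"

# Default IFS (space, tab, newline): the unquoted expansion is split into words.
default_count=$(IFS=' 	
'; count_args $args)

# Empty IFS: field splitting is disabled, so the string stays a single argument.
empty_count=$(IFS=; count_args $args)

echo "default IFS: $default_count argument(s)"
echo "empty IFS:   $empty_count argument(s)"
```

With an empty IFS the whole string arrives as one argument, which would match the rootUrlFile value in the log and would also explain why the depth stays at its default.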
--- Stefan Groschupf <[EMAIL PROTECTED]> wrote:

> Try moving urls.txt into a folder called urls and
> provide the folder instead of the text file itself.
> Does this help?
> Stefan
>
> On 12.01.2006 at 21:29, Mike Markzon wrote:
>
> > I've tried 0.7 and the nightly build of 0.8. Neither
> > is working for me. I'm just trying to follow the
> > tutorials. Here's what I'm getting with 0.7 when I
> > try to crawl (FileNotFoundException).
> >
> > $ ls urls.txt
> > urls.txt
> > $ bin/nutch crawl urls.txt -dir crawl.test -d 2
> > 060112 122459 parsing file:/apps/user/vignette/nutch-0.7/conf/nutch-default.xml
> > 060112 122459 parsing file:/apps/user/vignette/nutch-0.7/conf/crawl-tool.xml
> > 060112 122459 parsing file:/apps/user/vignette/nutch-0.7/conf/nutch-site.xml
> > 060112 122459 No FS indicated, using default:local
> > 060112 122459 crawl started in: crawl-20060112122459
> > 060112 122459 rootUrlFile = urls.txt -dir crawl.test -d 2
> > 060112 122459 threads = 10
> > 060112 122459 depth = 5
> > 060112 122459 Created webdb at LocalFS,/apps/user/vignette/nutch-0.7/crawl-20060112122459/db
> > Exception in thread "main" java.io.FileNotFoundException: urls.txt -dir crawl.test -d 2 (No such file or directory)
> >         at java.io.FileInputStream.open(Native Method)
> >         at java.io.FileInputStream.<init>(FileInputStream.java:106)
> >         at java.io.FileReader.<init>(FileReader.java:55)
> >         at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:372)
> >         at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
> >         at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
> > $
> >
> > If I follow the tutorial at
> > http://wiki.media-style.com/display/nutchDocu/Home
> > every time I execute a command I get a Usage statement
> > and the command doesn't do anything.
> >
> > $ bin/nutch admin db/ -create
> > Usage: java org.apache.nutch.tools.WebDBAdminTool (-local | -ndfs <namenode:port>) db [-create] [-textdump dumpPrefix] [-scoredump] [-top k]
> >
> > Any ideas? Thanks! Also thanks to those who answered
> > my first question about using a server besides Tomcat.
> > -Mike
>
> ---------------------------------------------------------------
> company: http://www.media-style.com
> forum: http://www.text-mining.org
> blog: http://www.find23.net