I've narrowed down the problem: bin/nutch is not parsing my command line correctly. Notice that the FileNotFoundException names "urls.txt -dir crawl.test -d 2" rather than just urls.txt, so the whole argument string is being treated as a single file name. Also, even though I specified a depth of 2, it reports depth = 5. If I just run "bin/nutch crawl urls.txt" it works, but without the parameters I want. Perhaps it is a problem with the shell. I also got the error "IFS: cannot unset" earlier, so I had commented that line out.
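A minimal sketch of what I suspect is happening, assuming the script sets IFS to an empty value somewhere before the "unset IFS" line I commented out (I have not confirmed this in bin/nutch itself). The names count_args, args and old_ifs are made up for the illustration. With an empty IFS the shell does no word splitting on an unquoted expansion, so the whole command line can arrive as one argument:

```shell
#!/bin/sh
# Hypothetical demo, not taken from bin/nutch.

# Print the number of arguments received.
count_args() {
  echo "$#"
}

args="urls.txt -dir crawl.test -d 2"

old_ifs=$IFS        # remember the current separators (normally space, tab, newline)

count_args $args    # default IFS: the string is split into 5 words

IFS=                # empty IFS: field splitting is switched off
count_args $args    # the whole string arrives as 1 argument

IFS=$old_ifs        # restore by assignment, for shells that refuse "unset IFS"
count_args $args    # split into 5 words again
```

If that is the cause, restoring IFS by assignment instead of just commenting out the unset might let the arguments through as separate words again.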
--- Stefan Groschupf <[EMAIL PROTECTED]> wrote:

> Try to move the urls.txt into a folder called urls and provide the
> folder instead of the text file itself.
> Does this help?
> Stefan
>
> On 12.01.2006 at 21:29, Mike Markzon wrote:
>
> > I've tried 0.7 and the nightly build of 0.8. Neither
> > is working for me. I'm just trying to follow the
> > tutorials. Here's what I'm getting with 0.7 when I
> > try to crawl (FileNotFoundException).
> >
> > $ ls urls.txt
> > urls.txt
> > $ bin/nutch crawl urls.txt -dir crawl.test -d 2
> > 060112 122459 parsing file:/apps/user/vignette/nutch-0.7/conf/nutch-default.xml
> > 060112 122459 parsing file:/apps/user/vignette/nutch-0.7/conf/crawl-tool.xml
> > 060112 122459 parsing file:/apps/user/vignette/nutch-0.7/conf/nutch-site.xml
> > 060112 122459 No FS indicated, using default:local
> > 060112 122459 crawl started in: crawl-20060112122459
> > 060112 122459 rootUrlFile = urls.txt -dir crawl.test -d 2
> > 060112 122459 threads = 10
> > 060112 122459 depth = 5
> > 060112 122459 Created webdb at LocalFS,/apps/user/vignette/nutch-0.7/crawl-20060112122459/db
> > Exception in thread "main" java.io.FileNotFoundException: urls.txt -dir crawl.test -d 2 (No such file or directory)
> >     at java.io.FileInputStream.open(Native Method)
> >     at java.io.FileInputStream.<init>(FileInputStream.java:106)
> >     at java.io.FileReader.<init>(FileReader.java:55)
> >     at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:372)
> >     at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
> >     at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
> > $
> >
> > If I follow the tutorial at
> > http://wiki.media-style.com/display/nutchDocu/Home
> > every time I execute a command I get a Usage statement
> > and the command doesn't do anything.
> > $ bin/nutch admin db/ -create
> > Usage: java org.apache.nutch.tools.WebDBAdminTool (-local | -ndfs <namenode:port>) db [-create] [-textdump dumpPrefix] [-scoredump] [-top k]
> >
> > Any ideas? Thanks! Also thanks to those who answered
> > my first question about using a server besides Tomcat.
> > -Mike
>
> ---------------------------------------------------------------
> company: http://www.media-style.com
> forum: http://www.text-mining.org
> blog: http://www.find23.net

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
