I hope someone can help me with a problem I'm having crawling on Solaris.
The same script works fine on Windows using Cygwin, but I need to run it
on Solaris.
This works fine:
# bin/nutch crawl urls.txt
It creates a directory named something like crawl-20060418105008, as
expected, and builds a working index.
However, if I add any parameters after the root_url_file parameter, I get
the output below, and I'm really stumped. The following command does not
create a directory named FOO; instead it creates a directory named something
like crawl-20060418105500, so it apparently ignores the -dir FOO parameter.
Looking at the output, it seems to be taking "urls.txt -dir FOO" as the name
of the urls file rather than interpreting "-dir FOO" at all. See the line
"rootUrlFile = urls.txt -dir FOO"; I think it should just be
"rootUrlFile = urls.txt".
# bin/nutch crawl urls.txt -dir FOO
060418 105308 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/nutch-default.xml
060418 105308 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/crawl-tool.xml
060418 105308 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/nutch-site.xml
060418 105308 No FS indicated, using default:local
060418 105308 crawl started in: crawl-20060418105308
060418 105308 rootUrlFile = urls.txt -dir FOO
060418 105308 threads = 10
060418 105308 depth = 5
060418 105310 Created webdb at LocalFS,/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/crawl-20060418105308/db
Exception in thread "main" java.io.FileNotFoundException: urls.txt -dir FOO (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at java.io.FileReader.<init>(FileReader.java:55)
        at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:372)
        at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
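
The "rootUrlFile = urls.txt -dir FOO" line and the FileNotFoundException both
suggest that the three arguments arrive as a single word. I have not confirmed
where that happens, so the following is only a minimal sketch of how such
joining can occur in a shell wrapper, not a description of what bin/nutch
actually does. The function names count_args, forward_separately and
forward_joined are hypothetical and made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments were received.
count_args() {
    echo "$#"
}

# Hypothetical wrapper that forwards its arguments unchanged.
# "$@" keeps every original argument as a separate word.
forward_separately() {
    count_args "$@"
}

# Hypothetical wrapper that joins its arguments.
# "$*" expands to one word built from all arguments, separated by
# the first character of IFS (a space by default).
forward_joined() {
    count_args "$*"
}

forward_separately urls.txt -dir FOO   # prints 3
forward_joined urls.txt -dir FOO       # prints 1
```

If the launcher script or the shell running it behaves like forward_joined,
the Java side would receive "urls.txt -dir FOO" as one argument, which would
match the output above. Running the script under a different shell on the
Solaris machine may be one way to test that assumption.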