Hi, I was able to crawl, index, and search a couple of sites using the "intranet crawl" instructions in the tutorial. I am now working through the tutorial's whole-web crawl instructions, and I only got a few steps in before hitting an error the first time I called bin/nutch fetch.
(Note: the file urlsWW, used in the inject statement below, contains only one URL for testing purposes, so it currently reads: http://www.democracynow.org)

Here is what happened:

[EMAIL PROTECTED] /usr/local/nutch-0.6 $ mkdir db2
[EMAIL PROTECTED] /usr/local/nutch-0.6 $ mkdir segments2
[EMAIL PROTECTED] /usr/local/nutch-0.6 $ bin/nutch admin db2 -create
050705 234131 No NutchFileSystem indicated, so defaulting to local fs.
050705 234131 loading file:/C:/cygwin/usr/local/nutch-0.6/conf/nutch-default.xm
050705 234132 loading file:/C:/cygwin/usr/local/nutch-0.6/conf/nutch-site.xml
050705 234132 Created webdb at LocalFS,db2
[EMAIL PROTECTED] /usr/local/nutch-0.6 $ bin/nutch inject db2 -urlfile urlsWW
050705 234332 loading file:/C:/cygwin/usr/local/nutch-0.6/conf/nutch-default.xm
050705 234333 loading file:/C:/cygwin/usr/local/nutch-0.6/conf/nutch-site.xml
050705 234333 No NutchFileSystem indicated, so defaulting to local fs.
050705 234333 Starting URL processing
050705 234333 Using URL filter: net.nutch.net.RegexURLFilter
050705 234333 found resource regex-urlfilter.txt at file:/C:/cygwin/usr/local/n tch-0.6/conf/regex-urlfilter.txt
050705 234333 Using URL normalizer: net.nutch.net.BasicUrlNormalizer
050705 234333 Added 1 pages
050705 234333 Processing pagesByURL: Sorted 1 instructions in 0.0 seconds.
050705 234333 Processing pagesByURL: Sorted Infinity instructions/second
050705 234333 Processing pagesByURL: Merged to new DB containing 1 records in 0 0 seconds
050705 234333 Processing pagesByURL: Merged Infinity records/second
050705 234333 Processing pagesByMD5: Sorted 1 instructions in 0.0 seconds.
050705 234333 Processing pagesByMD5: Sorted Infinity instructions/second
050705 234333 Processing pagesByMD5: Merged to new DB containing 1 records in 0 0 seconds
050705 234333 Processing pagesByMD5: Merged Infinity records/second
050705 234333 Processing linksByMD5: Copied file (0 bytes) in 0.015 secs.
050705 234333 Processing linksByURL: Copied file (0 bytes) in 0.016 secs.
[EMAIL PROTECTED] /usr/local/nutch-0.6 $ bin/nutch generate db2 segments2
050705 234455 No NutchFileSystem indicated, so defaulting to local fs.
050705 234455 FetchListTool started
050705 234455 loading file:/C:/cygwin/usr/local/nutch-0.6/conf/nutch-default.xm
050705 234455 loading file:/C:/cygwin/usr/local/nutch-0.6/conf/nutch-site.xml
050705 234456 Processing pagesByURL: Sorted 1 instructions in 0.015 seconds.
050705 234456 Processing pagesByURL: Sorted 66.66666666666667instructions/seco d
050705 234456 Processing pagesByURL: Merged to new DB containing 1 records in 0 0 seconds
050705 234456 Processing pagesByURL: Merged Infinity records/second
050705 234456 Processing pagesByMD5: Sorted 1 instructions in 0.0 seconds.
050705 234456 Processing pagesByMD5: Sorted Infinity instructions/second
050705 234456 Processing pagesByMD5: Merged to new DB containing 1 records in 0 0 seconds
050705 234456 Processing pagesByMD5: Merged Infinity records/second
050705 234456 Processing linksByMD5: Copied file (0 bytes) in 0.016 secs.
050705 234456 Processing linksByURL: Copied file (0 bytes) in 0.015 secs.
050705 234456 Processing segments2\20050705234455\fetchlist.unsorted: Sorted 1 ntries in 0.0 seconds.
050705 234456 Processing segments2\20050705234455\fetchlist.unsorted: Sorted In inity entries/second
050705 234456 Overall processing: Sorted 1 entries in 0.0 seconds.
050705 234456 Overall processing: Sorted 0.0 entries/second
050705 234456 FetchListTool completed
[EMAIL PROTECTED] /usr/local/nutch-0.6 $ s1='ls -d segments/2* | tail -1'
[EMAIL PROTECTED] /usr/local/nutch-0.6 $ echo $s1
ls -d segments/20050701222333 | tail -1
[EMAIL PROTECTED] /usr/local/nutch-0.6 $ bin/nutch fetch $s1
050705 234611 loading file:/C:/cygwin/usr/local/nutch-0.6/conf/nutch-default.xm
050705 234612 loading file:/C:/cygwin/usr/local/nutch-0.6/conf/nutch-site.xml
050705 234612 No NutchFileSystem indicated, so defaulting to local fs.
Exception in thread "main" java.io.IOException: File does not exist
        at net.nutch.fs.LocalFileSystem.open(LocalFileSystem.java:77)
        at net.nutch.io.SequenceFile$Reader.<init>(SequenceFile.java:143)
        at net.nutch.io.SequenceFile$Reader.<init>(SequenceFile.java:136)
        at net.nutch.io.MapFile$Reader.<init>(MapFile.java:171)
        at net.nutch.io.MapFile$Reader.<init>(MapFile.java:160)
        at net.nutch.io.ArrayFile$Reader.<init>(ArrayFile.java:37)
        at net.nutch.fetcher.Fetcher.<init>(Fetcher.java:235)
        at net.nutch.fetcher.Fetcher.main(Fetcher.java:413)
[EMAIL PROTECTED] /usr/local/nutch-0.6 $

Any suggestions are much appreciated,
Bryan
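
A possible cause, offered as an unconfirmed sketch rather than a diagnosis: the echo output above shows that s1 holds the text of the ls command, not a segment path, which is what single quotes do in the shell. It also points at segments/ (an older directory, 20050701222333), while generate wrote to segments2\20050705234455. The snippet below only illustrates the quoting difference. The scratch directory and the variable names quoted and substituted are made up for the example, and it assumes the intended command was a command substitution (backticks, or the equivalent $(...) form used here).

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for the real segments2/ directory.
work=$(mktemp -d)
mkdir -p "$work/segments2/20050705234455"
cd "$work" || exit 1

# Single quotes: the variable holds the literal command text; nothing is run.
quoted='ls -d segments2/2* | tail -1'

# Command substitution: the pipeline runs and its output is stored.
# $(...) is equivalent to wrapping the pipeline in backticks.
substituted=$(ls -d segments2/2* | tail -1)

echo "single quotes: $quoted"
echo "substitution:  $substituted"
```

If this is what happened, passing the substituted value (a path under segments2) to bin/nutch fetch would be the thing to try; whether that resolves the IOException is not verified here.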
