Dear Bryan,

I think you have a problem with this line:
s1='ls -d segments/2* | tail -1'
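
Here is why that line misbehaves. Inside single quotes the shell stores the text of the command without running it, so s1 ends up holding the pipeline itself. When $s1 is later used unquoted, the shell splits it into words and expands segments/2* as a filename pattern. That is why "echo $s1" printed a real directory name in the middle of the command text. For the same reason, "bin/nutch fetch $s1" hands ls, -d, the directory, |, tail and -1 to bin/nutch as separate arguments. A | that comes out of a variable is plain text, not a pipe, so the fetcher is presumably not looking at the segment directory at all. The sketch below reproduces this in a scratch directory. The directory name is only an example copied from your output:

```shell
#!/bin/sh
# Scratch directory; the segment name is an example taken from the transcript.
cd "$(mktemp -d)" || exit 1
mkdir -p segments/20050701222333

# Single quotes: s1 holds the literal text of the command; nothing is run.
s1='ls -d segments/2* | tail -1'
printf '%s\n' "$s1"          # prints: ls -d segments/2* | tail -1

# Unquoted, $s1 is split into words and segments/2* is glob-expanded.
echo $s1                     # prints: ls -d segments/20050701222333 | tail -1

# The same splitting happens in "bin/nutch fetch $s1": the command would get
# these words as separate arguments, and the "|" is just text, not a pipe.
set -- $s1
echo "$# arguments, first one: $1"   # prints: 6 arguments, first one: ls
```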

The correct command:

s1=`ls -d segments/2* | tail -1`
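The ' (single quote) is not the same as ` (backtick). Text in single quotes is stored literally, while a command in backticks is run and its output is stored.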
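
Backticks, or the equivalent POSIX form $(...), run the command and store its output in the variable. There is one more thing to check. In your transcript, "bin/nutch generate db2 segments2" wrote its segment under segments2 (the log shows segments2\20050705234455\fetchlist.unsorted). The tutorial command, however, still looks in segments, where your echo output shows an older segment, segments/20050701222333. So you probably want segments2/2* here. The sketch below uses a scratch directory that mimics those two directories:

```shell
#!/bin/sh
# Scratch directory standing in for /usr/local/nutch-0.6; the segment
# names are examples taken from the transcript.
cd "$(mktemp -d)" || exit 1
mkdir -p segments/20050701222333 segments2/20050705234455

# $(...) runs the pipeline and stores its output, just like backticks.
s1=$(ls -d segments2/2* | tail -1)
echo "$s1"                   # prints: segments2/20050705234455

# The tutorial's path would pick up the newest entry under segments instead:
old=`ls -d segments/2* | tail -1`
echo "$old"                  # prints: segments/20050701222333

# In the real installation you would then run:
# bin/nutch fetch "$s1"
```

Quoting "$s1" in the fetch command keeps the shell from splitting or re-expanding the value.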

Regards,
   Ferenc

Bryan Woliner wrote:

Hi,

I was able to crawl, index, and search a couple of sites using the "intranet crawl" instructions in the tutorial. I am now working through the whole-web crawl instructions, but I only got a few steps in before I hit an error the first time I called bin/nutch fetch.

(Note: for testing purposes, the file urlsWW, used in the inject command below, contains only one URL: http://www.democracynow.org)

Here is what happened:
[EMAIL PROTECTED] /usr/local/nutch-0.6
$ mkdir db2

[EMAIL PROTECTED] /usr/local/nutch-0.6
$ mkdir segments2

[EMAIL PROTECTED] /usr/local/nutch-0.6
$ bin/nutch admin db2 -create
050705 234131 No NutchFileSystem indicated, so defaulting to local fs.
050705 234131 loading file:/C:/cygwin/usr/local/nutch-0.6/conf/nutch-default.xml
050705 234132 loading file:/C:/cygwin/usr/local/nutch-0.6/conf/nutch-site.xml
050705 234132 Created webdb at LocalFS,db2

[EMAIL PROTECTED] /usr/local/nutch-0.6
$ bin/nutch inject db2 -urlfile urlsWW
050705 234332 loading file:/C:/cygwin/usr/local/nutch-0.6/conf/nutch-default.xml
050705 234333 loading file:/C:/cygwin/usr/local/nutch-0.6/conf/nutch-site.xml
050705 234333 No NutchFileSystem indicated, so defaulting to local fs.
050705 234333 Starting URL processing
050705 234333 Using URL filter: net.nutch.net.RegexURLFilter
050705 234333 found resource regex-urlfilter.txt at file:/C:/cygwin/usr/local/nutch-0.6/conf/regex-urlfilter.txt
050705 234333 Using URL normalizer: net.nutch.net.BasicUrlNormalizer
050705 234333 Added 1 pages
050705 234333 Processing pagesByURL: Sorted 1 instructions in 0.0 seconds.
050705 234333 Processing pagesByURL: Sorted Infinity instructions/second
050705 234333 Processing pagesByURL: Merged to new DB containing 1 records in 0.0 seconds
050705 234333 Processing pagesByURL: Merged Infinity records/second
050705 234333 Processing pagesByMD5: Sorted 1 instructions in 0.0 seconds.
050705 234333 Processing pagesByMD5: Sorted Infinity instructions/second
050705 234333 Processing pagesByMD5: Merged to new DB containing 1 records in 0.0 seconds
050705 234333 Processing pagesByMD5: Merged Infinity records/second
050705 234333 Processing linksByMD5: Copied file (0 bytes) in 0.015 secs.
050705 234333 Processing linksByURL: Copied file (0 bytes) in 0.016 secs.

[EMAIL PROTECTED] /usr/local/nutch-0.6
$ bin/nutch generate db2 segments2
050705 234455 No NutchFileSystem indicated, so defaulting to local fs.
050705 234455 FetchListTool started
050705 234455 loading file:/C:/cygwin/usr/local/nutch-0.6/conf/nutch-default.xml
050705 234455 loading file:/C:/cygwin/usr/local/nutch-0.6/conf/nutch-site.xml
050705 234456 Processing pagesByURL: Sorted 1 instructions in 0.015 seconds.
050705 234456 Processing pagesByURL: Sorted 66.66666666666667 instructions/second
050705 234456 Processing pagesByURL: Merged to new DB containing 1 records in 0.0 seconds
050705 234456 Processing pagesByURL: Merged Infinity records/second
050705 234456 Processing pagesByMD5: Sorted 1 instructions in 0.0 seconds.
050705 234456 Processing pagesByMD5: Sorted Infinity instructions/second
050705 234456 Processing pagesByMD5: Merged to new DB containing 1 records in 0.0 seconds
050705 234456 Processing pagesByMD5: Merged Infinity records/second
050705 234456 Processing linksByMD5: Copied file (0 bytes) in 0.016 secs.
050705 234456 Processing linksByURL: Copied file (0 bytes) in 0.015 secs.
050705 234456 Processing segments2\20050705234455\fetchlist.unsorted: Sorted 1 entries in 0.0 seconds.
050705 234456 Processing segments2\20050705234455\fetchlist.unsorted: Sorted Infinity entries/second
050705 234456 Overall processing: Sorted 1 entries in 0.0 seconds.
050705 234456 Overall processing: Sorted 0.0 entries/second
050705 234456 FetchListTool completed

[EMAIL PROTECTED] /usr/local/nutch-0.6
$ s1='ls -d segments/2* | tail -1'

[EMAIL PROTECTED] /usr/local/nutch-0.6
$ echo $s1
ls -d segments/20050701222333 | tail -1

[EMAIL PROTECTED] /usr/local/nutch-0.6
$ bin/nutch fetch $s1
050705 234611 loading file:/C:/cygwin/usr/local/nutch-0.6/conf/nutch-default.xml
050705 234612 loading file:/C:/cygwin/usr/local/nutch-0.6/conf/nutch-site.xml
050705 234612 No NutchFileSystem indicated, so defaulting to local fs.
Exception in thread "main" java.io.IOException: File does not exist
at net.nutch.fs.LocalFileSystem.open(LocalFileSystem.java:77)
at net.nutch.io.SequenceFile$Reader.<init>(SequenceFile.java:143)
at net.nutch.io.SequenceFile$Reader.<init>(SequenceFile.java:136)
at net.nutch.io.MapFile$Reader.<init>(MapFile.java:171)
at net.nutch.io.MapFile$Reader.<init>(MapFile.java:160)
at net.nutch.io.ArrayFile$Reader.<init>(ArrayFile.java:37)
at net.nutch.fetcher.Fetcher.<init>(Fetcher.java:235)
at net.nutch.fetcher.Fetcher.main(Fetcher.java:413)

[EMAIL PROTECTED] /usr/local/nutch-0.6
$
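
A cheap way to catch this kind of mistake before Java does is to check that the variable really names a directory before calling bin/nutch fetch. Here is a minimal sketch. Note that require_segment is a hypothetical helper, not part of Nutch, and it only checks that the directory exists, not that it contains a valid fetchlist:

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): succeed only if the argument
# names an existing directory; otherwise print a message to stderr.
require_segment() {
  if [ -d "$1" ]; then
    return 0
  fi
  echo "not a segment directory: $1" >&2
  return 1
}

# Scratch directory with one example segment from the transcript.
cd "$(mktemp -d)" || exit 1
mkdir -p segments2/20050705234455

bad='ls -d segments2/2* | tail -1'     # the single-quoted mistake
good=$(ls -d segments2/2* | tail -1)   # command substitution

require_segment "$bad" || echo "skipping fetch"
require_segment "$good" && echo "would run: bin/nutch fetch $good"
```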

Any suggestions are much appreciated,
Bryan

