Re: error crawling

Christopher Gross Fri, 24 May 2013 09:43:39 -0700

Right.  "runbot" is the old one.  Nutch no longer ships a script like
that.  After some digging on the web I found an alternative.


I took this script:
http://svn.apache.org/repos/asf/nutch/branches/2.x/src/bin/crawl

I made small changes: rather than passing in args, I hard-coded them (to
make it easier to run via cron), and since my user doesn't have the right
entries set up in the PATH, I added an environment loader.  I also
commented out the dedup line since it doesn't work.

From that file:

# initial injection
$bin/nutch inject $SEEDDIR -crawlId $CRAWL_ID
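
A minimal sketch of the kind of cron-friendly wrapper described above, for
illustration only.  The values "urls" and "mycrawl", the /opt/nutch default,
the ~/.nutch_env file name, and the inject_cmd helper are all hypothetical
and do not come from the original script.  The sketch only prints the inject
command instead of running it.

```shell
#!/bin/sh
# Hard-coded settings instead of command-line args (hypothetical values).
SEEDDIR="urls"
CRAWL_ID="mycrawl"
NUTCH_HOME="${NUTCH_HOME:-/opt/nutch}"     # assumed install location
ENV_FILE="${ENV_FILE:-$HOME/.nutch_env}"   # assumed environment loader file

# cron starts with a minimal environment, so load JAVA_HOME, PATH, etc.
# from a separate file when it exists.
if [ -f "$ENV_FILE" ]; then
    . "$ENV_FILE"
fi

bin="$NUTCH_HOME/bin"

# Print the inject command for a seed dir ($1) and an optional crawl id ($2).
# An empty crawl id drops the -crawlId option entirely.
inject_cmd() {
    if [ -n "$2" ]; then
        echo "$bin/nutch inject $1 -crawlId $2"
    else
        echo "$bin/nutch inject $1"
    fi
}

# initial injection (printed, not executed, in this sketch)
inject_cmd "$SEEDDIR" "$CRAWL_ID"
```

Printing the command first makes it easy to check what cron would run
before replacing the echo with the real call.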

Even after taking out the CRAWL_ID part I still get the crawl_webpage error
message, so I'm still not able to crawl correctly.  I still cannot find
documentation saying what I need to do to make the Keyclass and nameclass
match correctly.  That's the question I'm trying to get answered.  I tried
hacking at it a bit but things got uglier, so I'm looking here for
guidance.
