Re: error crawling

alxsss Fri, 24 May 2013 11:51:45 -0700

Can you send the script? Also, are you running it in deploy or local mode?

-----Original Message-----
From: Christopher Gross <[email protected]>
To: user <[email protected]>
Sent: Fri, May 24, 2013 9:43 am
Subject: Re: error crawling

Right.  "runbot" is the old one.  Nutch no longer ships a script like that.
After some digging on the web I found a replacement.

I took this script.
http://svn.apache.org/repos/asf/nutch/branches/2.x/src/bin/crawl

I made small changes -- rather than passing in args I hard coded them (to
make it easier to run via cron), and since my user doesn't have the right
stuff set up in the PATH, I have an environment loader.  I also commented
out the dedup line since it doesn't work.

From that file:

# initial injection
$bin/nutch inject $SEEDDIR -crawlId $CRAWL_ID
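
For readers following along, the changes described above (hard-coded
arguments plus an environment loader for cron) might look roughly like the
sketch below. It is not the actual modified script: the env file name, the
install path, and the seed and crawl ID values are all hypothetical
placeholders, and only the inject line comes from the quoted script. It
defaults to a dry run that prints the command instead of invoking Nutch.

```shell
#!/bin/bash
# Sketch of a cron-friendly wrapper around the inject step.
# Every path and value here is a placeholder, not taken from the thread.

# Environment loader: cron starts with a minimal environment, so source a
# file that exports PATH, JAVA_HOME, etc. (hypothetical file name).
ENV_FILE="${ENV_FILE:-$HOME/nutch-env.sh}"
if [ -f "$ENV_FILE" ]; then
  . "$ENV_FILE"
fi

# Hard-coded settings in place of command-line arguments.
SEEDDIR="${SEEDDIR:-urls}"                        # placeholder seed directory
CRAWL_ID="${CRAWL_ID:-mycrawl}"                   # placeholder crawl ID
bin="${NUTCH_BIN:-/opt/nutch/runtime/local/bin}"  # placeholder install path

# Print the inject command exactly as the quoted script would run it.
inject_cmd() {
  echo "$bin/nutch inject $SEEDDIR -crawlId $CRAWL_ID"
}

# Dry run by default; set DRY_RUN=0 (e.g. in the crontab entry) to run it.
if [ "${DRY_RUN:-1}" = "1" ]; then
  inject_cmd
else
  "$bin/nutch" inject "$SEEDDIR" -crawlId "$CRAWL_ID"
fi
```

Running it with DRY_RUN left at its default is a quick way to confirm which
nutch binary and crawl ID a cron job would actually use.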

Even after taking out the CRAWL_ID part I still get the crawl_webpage error
message, so I'm still not able to crawl correctly.  I still cannot find
documentation saying what I need to do to make the Keyclass and nameclass
match.  That's the question I'm trying to get answered.  I tried hacking at
it a bit, but things got uglier, so I'm looking here for guidance.
