Re: ERROR when recrawling... can ANYONE help?

Honda-Search Administrator Fri, 23 Jun 2006 11:57:46 -0700

20060619230003 does exist, but it does not have an index directory...


Here are the subdirectories it has:

content/  fetcher/  fetchlist/  parse_data/  parse_text/

This also seems to be the case for 8 other segment directories.

The rest of the segment directories (excluding the 9 above) have this file structure:

content/ fetchlist/ index.done parse_text/ fetcher/ index/ parse_data/

What would cause these segments to act this way? Is there a sure way to fix it? Can I prevent it from happening again?
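
For anyone wanting to find the affected segments quickly, here is a minimal sketch, not something that ships with Nutch. The function name list_unindexed_segments is made up for illustration, and it assumes an indexed segment always has an index/ subdirectory, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): print every segment directory
# under SEGMENTS_DIR that has no index/ subdirectory.
# Usage: list_unindexed_segments SEGMENTS_DIR
list_unindexed_segments() {
    segments_dir=$1
    for segment in "$segments_dir"/*/; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -d "$segment" ] || continue
        # Drop the trailing slash left by the glob pattern.
        segment=${segment%/}
        if [ ! -d "$segment/index" ]; then
            printf '%s\n' "$segment"
        fi
    done
}

# Example, using the segments path mentioned in this thread:
#   list_unindexed_segments /home/honda/nutch-0.7.2/crawl/segments
```

The function only reports directories and changes nothing. Whether running the tutorial's index step on each reported segment is enough to repair them would still need checking against the Nutch version in use.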


Matt

----- Original Message -----
From: "TDLN" <[EMAIL PROTECTED]>
To: <[email protected]>; "Honda-Search Administrator" <[EMAIL PROTECTED]>
Sent: Friday, June 23, 2006 11:28 AM
Subject: Re: ERROR when recrawling... can ANYONE help?

Does /home/honda/nutch-0.7.2/crawl/segments/20060619230003/index exist at all?


Can you confirm that all segments contain an index directory?

Rgrds, Thomas


On 6/23/06, Honda-Search Administrator <[EMAIL PROTECTED]> wrote:

To recrawl I use the command:

/home/honda/nutch-0.7.2/recrawl.sh /home/honda/nutch-0.7.2/crawl 1 2

"crawl" is the name of my database directory.

The script "recrawl.sh" is the standard one that comes in the package. I'm
pretty sure it's the same for everyone, but I've included a link to the
recrawl.sh script I'm using:

http://www.honda-search.com/script.html

As you can see I'm crawling with a depth of 1, which is intentional. I only
desire to recrawl the specific pages injected each night. I'm wondering if
the 'adddays' parameter is messing me up.

Matt

----- Original Message -----
From: "TDLN" <[EMAIL PROTECTED]>
To: <[email protected]>; "Honda-Search Administrator"
<[EMAIL PROTECTED]>
Sent: Friday, June 23, 2006 10:46 AM
Subject: Re: ERROR when recrawling... can ANYONE help?


> Please specify what exact sequence of commands you are using.
>
> For incremental crawling best to follow the "whole web" style process
> as outlined in the tutorial. The one stop crawl command cannot be used
> effectively for that.
>
> HTH Thomas
>
