I had this same problem: we had gathered about 90% of a 1.5M-page fetch, only to have the system crash at the reduce phase. We now run cycles of about 50k pages at a time to minimize loss.
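Roughly, one such bounded cycle is just the usual generate/fetch/updatedb loop with a -topN cap. A sketch only; the crawl/ paths and the 50000 value are placeholders, so adjust them for your own layout and Nutch 0.8 install:

    # one bounded crawl cycle (sketch; assumes the Nutch 0.8 command-line tools)
    bin/nutch generate crawl/crawldb crawl/segments -topN 50000
    # pick up the segment that generate just created
    segment=`ls -d crawl/segments/* | tail -1`
    bin/nutch fetch $segment
    bin/nutch updatedb crawl/crawldb $segment

If a fetch dies partway through, you only lose that one small segment instead of the whole crawl.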
-Charlie

On 2/26/07, Mathijs Homminga <[EMAIL PROTECTED]> wrote:
:( I read something about creating a 'fetcher.done' file which can do some
magic. Could that help us out?

Mathijs

rubdabadub wrote:
> Hi:
>
> I'm sorry to say that you need to fetch again, i.e. your last segment.
> I know the feeling :-( AFAIK there is no way in 0.8 to restart a failed
> crawl. I have found that generating small fetch lists and merging all
> the segments later is the only way to avoid this situation.
>
> Regards
>
> On 2/25/07, Mathijs Homminga <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> While fetching a segment with 4M documents, we ran out of disk space.
>> We guess that the fetcher has fetched (and parsed) about 80 percent of
>> the documents, so it would be great if we could continue our crawl
>> somehow.
>>
>> The segment directory does not contain a crawl_fetch subdirectory yet,
>> but we do have a /tmp/hadoop/mapred/ directory (on the local FS).
>>
>> Is there some way we can use the data in that temporary mapred directory
>> to create the crawl_fetch data in order to continue our crawl?
>>
>> Thanks!
>> Mathijs
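rubdabadub's suggestion above (small fetch lists, merged into one segment afterwards) might look something like the following. This is a sketch only: it assumes the Nutch 0.8 mergesegs tool and the same crawl/ layout as above, so check the command against your version before relying on it:

    # merge the small per-cycle segments into one (sketch; assumes Nutch 0.8 mergesegs)
    bin/nutch mergesegs crawl/segments_merged -dir crawl/segments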
