Thanks Markus,

Yes, we've already changed hadoop.tmp.dir and there is plenty of free space.
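
For reference, such an override typically lives in conf/nutch-site.xml; a
minimal sketch, with the path purely illustrative:

  <!-- point Hadoop's local scratch space away from /tmp -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/nutch/tmp</value>
  </property>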

All the best,
Igor

On Thu, May 10, 2012 at 10:35 AM, Markus Jelsma
<[email protected]> wrote:

> Plenty of disk space does not mean you have enough room in your
> hadoop.tmp.dir, which is /tmp by default.
>
>
> On Thu, 10 May 2012 10:26:00 +0200, Igor Salma <[email protected]>
> wrote:
>
>> Hi Adriana, Sebastian,
>>
>> We have been struggling with this for days - the crawl runs for a few days
>> and then breaks with the same exception. At first it seemed that Adriana
>> was right and that we had a disk space problem, but the last two failures
>> occurred with 9 GB still left on disk. We have also moved to
>> hadoop-core-1.0.2.jar. One more thing - it seems that it always fails on
>> job_local_0015 (not 100% sure, though):
>>
>> 2012-05-09 15:55:35,534 WARN  mapred.LocalJobRunner - job_local_0015
>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any
>> valid local directory for output/file.out
>>        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
>>        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
>>        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
>>        at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
>>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
>>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
>>        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>>
>> Do you know what this could mean?
>>
>> @Sebastian: we are running only one instance of Nutch.
>>
>> We're talking about roughly 300,000 - 400,000 documents. Should we start
>> considering crawling in parallel?
>>
>> Thanks in advance.
>>
>> All the best,
>> Igor
>>
>>
>>
>> On Tue, May 1, 2012 at 11:15 PM, Sebastian Nagel
>> <[email protected]> wrote:
>>
>>> Hi Igor,
>>>
>>> No disk space on /tmp is one possible reason.
>>>
>>> The other is:
>>> > (working in local mode).
>>> Are you running multiple instances of Nutch in parallel?
>>> If yes, these instances must use disjoint temp directories
>>> (hadoop.tmp.dir). There are multiple posts on this list
>>> about this topic.
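>>>
>>> For example, each instance could get its own copy of conf/ with a distinct
>>> value for that property (paths purely illustrative):
>>>
>>>   <property>
>>>     <name>hadoop.tmp.dir</name>
>>>     <value>/data/nutch/instance1/tmp</value>  <!-- instance2 points at its own directory -->
>>>   </property>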
>>>
>>> Sebastian
>>>
>>>
>>> On 04/30/2012 03:33 PM, Adriana Farina wrote:
>>>
>>>> Hello!
>>>>
>>>> I had the same kind of problem. In my case it was caused by one of the
>>>> nodes of my cluster having full memory, so to solve the problem I simply
>>>> freed up memory on that node. Check whether all of the nodes of your
>>>> cluster have free memory.
>>>>
>>>> As for the second error, it seems you're missing some library: try
>>>> adding
>>>> it to hadoop.
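>>>>
>>>> For instance, with Nutch's Ivy-based build you could declare it in
>>>> ivy/ivy.xml and rebuild (the version here is only a guess - check which
>>>> commons-configuration your Hadoop jar expects):
>>>>
>>>>   <dependency org="commons-configuration" name="commons-configuration"
>>>>               rev="1.6" conf="*->default"/>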
>>>>
>>>>
>>>> Sent from iPhone
>>>>
>>>> On 30 Apr 2012, at 15:15, Igor Salma <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We're having trouble with Nutch when trying to crawl. Nutch version 1.4,
>>>>> Hadoop 0.20.2 (working in local mode). After 2 days of crawling we get:
>>>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>>>>> taskTracker/jobcache/job_local_0015/attempt_local_0015_m_000000_0/output/spill0.out
>>>>> in any of the configured local directories
>>>>>   at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
>>>>>   at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
>>>>>   at org.apache.hadoop.mapred.MapOutputFile.getSpillFile(MapOutputFile.java:94)
>>>>>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1443)
>>>>>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
>>>>>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
>>>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>>
>>>>> We've looked at the mailing list archives but I'm not sure whether this
>>>>> exact issue is mentioned. We tried upgrading to hadoop-core-0.20.203.0.jar,
>>>>> but then this is thrown:
>>>>>
>>>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>>>> org/apache/commons/configuration/Configuration
>>>>>
>>>>>
>>>>> Can someone please shed some light on this?
>>>>>
>>>>> Thanks.
>>>>> Igor
>>>>>
>>>>>
>>>>
>>>
> --
> Markus Jelsma - CTO - Openindex
>
