Plenty of disk space does not mean you have enough room in your
hadoop.tmp.dir, which is /tmp by default.
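If /tmp sits on a small partition, check how much room is actually free
there (e.g. df -h /tmp) and, if needed, point hadoop.tmp.dir at a larger
location. A minimal sketch for conf/core-site.xml or nutch-site.xml - the
property name is standard, the path is only a placeholder:

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop-tmp</value>
  </property>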
On Thu, 10 May 2012 10:26:00 +0200, Igor Salma <[email protected]> wrote:
Hi, Adriana, Sebastian,
We have been struggling with this for days - the crawl runs for a few
days and then breaks with the same exception. At first it seemed that
Adriana was right and that we had a problem with disk space, but the
last two failures occurred with 9 GB still left on the disk. We have
also moved to hadoop-core-1.0.2.jar. One more thing - it seems that it
always fails on job_local_0015 (not 100% sure, though):
2012-05-09 15:55:35,534 WARN mapred.LocalJobRunner - job_local_0015
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
        at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Do you know what this could mean?
@Sebastian: we are running only one instance of Nutch
We're talking about ~300,000 - 400,000 documents. Should we start
considering crawling in parallel?
Thanks in advance.
All the best,
Igor
On Tue, May 1, 2012 at 11:15 PM, Sebastian Nagel <[email protected]> wrote:
Hi Igor,
no disk space on /tmp is one possible reason.
The other is:
> (working in local mode).
Are you running multiple instances of Nutch in parallel?
If yes, these instances must use disjoint temp directories
(hadoop.tmp.dir). There are multiple posts on this list
about this topic.
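For example (the directory names below are only illustrative), each
instance can be started with its own conf directory containing a
different value:

  instance 1:  <name>hadoop.tmp.dir</name> <value>/data/nutch1/tmp</value>
  instance 2:  <name>hadoop.tmp.dir</name> <value>/data/nutch2/tmp</value>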
Sebastian
On 04/30/2012 03:33 PM, Adriana Farina wrote:
Hello!
I had the same kind of problem. In my case it was caused by one of the
nodes of my cluster having its memory full, so to solve the problem I
simply freed up memory on that node. Check whether all of the nodes of
your cluster have free memory.
As for the second error, it seems you're missing some library: try
adding it to Hadoop.
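The missing class (org/apache/commons/configuration/Configuration) comes
from Apache Commons Configuration, which hadoop-core-0.20.203.0 depends
on but Nutch 1.4 does not ship. If you stay on that Hadoop version,
something like the following should work - the jar version and the path
are only examples, adjust them to your deployment:

  cp commons-configuration-1.6.jar $NUTCH_HOME/runtime/local/lib/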
Sent from my iPhone
On 30 Apr 2012, at 15:15, Igor Salma <[email protected]> wrote:
Hi to all,
We're having trouble with Nutch when trying to crawl. Nutch version 1.4,
Hadoop 0.20.2 (working in local mode). After 2 days of crawling we got:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_local_0015/attempt_local_0015_m_000000_0/output/spill0.out
in any of the configured local directories
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
        at org.apache.hadoop.mapred.MapOutputFile.getSpillFile(MapOutputFile.java:94)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1443)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
We've looked at the mailing list archives but I'm not sure whether this
exact issue is mentioned. We tried to upgrade to
hadoop-core-0.20.203.0.jar, but then this is thrown:

Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/commons/configuration/Configuration

Can someone please shed some light on this?
Thanks.
Igor
--
Markus Jelsma - CTO - Openindex