In that case I'm not sure 9GB is enough for 400,000 documents. It is
almost certainly not enough if you store the content in the segments
(the default).
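If you don't need the raw content later (e.g. for re-parsing), you can
disable storing it in the segments. A minimal sketch for conf/nutch-site.xml,
assuming the Nutch 1.x fetcher.store.content property:

  <!-- do not store raw page content in the segments; saves a lot of disk -->
  <property>
    <name>fetcher.store.content</name>
    <value>false</value>
  </property>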
On Thu, 10 May 2012 10:43:14 +0200, Igor Salma <[email protected]>
wrote:
Thanks Markus,
Yes, we've already changed hadoop.tmp.dir and there is plenty of free
space.
All the best,
Igor
On Thu, May 10, 2012 at 10:35 AM, Markus Jelsma wrote:
Plenty of disk space does not mean you have enough room in your
hadoop.tmp.dir, which is /tmp by default.
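If /tmp is too small, you can point it at a larger partition; a minimal
sketch for conf/core-site.xml (or nutch-site.xml), where the path is only
an example:

  <!-- example only: any directory on a partition with enough free space -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop-tmp</value>
  </property>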
On Thu, 10 May 2012 10:26:00 +0200, Igor Salma wrote:
Hi, Adriana, Sebastian,
We have been struggling with this for days - the crawl runs for a few
days and then breaks with the same exception. At first it seemed that
Adriana was right and we were having a disk-space problem, but the last
two failures occurred with 9GB still left on disk. We have also moved to
hadoop-core-1.0.2.jar. One more thing - it seems that it always fails on
job_local_0015 (not 100% sure, though):
2012-05-09 15:55:35,534 WARN mapred.LocalJobRunner - job_local_0015
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
        at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Do you know what it could mean?
@Sebastian: we are running only one instance of Nutch.
We're talking about ~300,000 - 400,000 documents. Should we start
considering crawling in parallel?
Thanks in advance.
All the best,
Igor
On Tue, May 1, 2012 at 11:15 PM, Sebastian Nagel wrote:
(working in local mode).
Are you running multiple instances of Nutch in parallel?
If yes, these instances must use disjoint temp directories
(hadoop.tmp.dir). There are multiple posts on this list
about this topic.
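For example, each instance would get its own value in its conf directory
(the paths below are only placeholders):

  <!-- instance 1: conf/core-site.xml -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/nutch1/hadoop-tmp</value>
  </property>
  <!-- instance 2 points at a different directory, e.g. /data/nutch2/hadoop-tmp -->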
Sebastian
On 04/30/2012 03:33 PM, Adriana Farina wrote:
Hello!
I had the same kind of problem. In my case it was caused by one of the
nodes of my cluster having its memory full, so to solve the problem I
simply freed up memory on that node. Check whether all of the nodes of
your cluster have free memory.
As for the second error, it seems you're missing a library: try adding
it to Hadoop.
Sent from iPhone
On 30 Apr 2012, at 15:15, Igor Salma wrote:
Hi to all,
We're having trouble with Nutch when trying to crawl. Nutch version 1.4,
Hadoop 0.20.2 (working in local mode). After 2 days of crawling we got:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_local_0015/attempt_local_0015_m_000000_0/output/spill0.out
in any of the configured local directories
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
        at org.apache.hadoop.mapred.MapOutputFile.getSpillFile(MapOutputFile.java:94)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1443)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
We've looked at the mailing list archives but I'm not sure whether this
exact thing is mentioned. We tried upgrading to hadoop-core-0.20.203.0.jar,
but then this is thrown:
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/commons/configuration/Configuration
Can someone, please, shed some light on this?
Thanks.
Igor
--
Markus Jelsma - CTO - Openindex
--
Markus Jelsma - CTO - Openindex