Alexander Aristov wrote:
I run Nutch on EC2 small instances; they have about 2 GB of RAM. I use DFS. Yes, I
meant tasks, not jobs; I just took the name from the jobtracker web page.
Just confirming. If it was starting a bunch of jobs that would be a
much different error :)
Where should I add these params? In nutch-site.xml or hadoop-site.xml?
Those should go in the hadoop-site.xml file.
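For example, to cap each tasktracker at one concurrent map and one concurrent
reduce task on your small instances, something like the following could go in
conf/hadoop-site.xml on every node (the values here are only illustrative, not
a recommendation, and the tasktrackers need a restart to pick them up):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
  <description>Maximum number of map tasks one tasktracker runs at once.</description>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
  <description>Maximum number of reduce tasks one tasktracker runs at once.</description>
</property>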
My logs look like this:
08/11/25 02:41:53 INFO mapred.JobClient: map 73% reduce 22%
08/11/25 02:41:59 INFO mapred.JobClient: map 73% reduce 23%
08/11/25 02:42:06 INFO mapred.JobClient: map 73% reduce 24%
08/11/25 02:59:02 INFO mapred.JobClient: Task Id : attempt_200811250109_0014_m_000001_0, Status : FAILED
Task attempt_200811250109_0014_m_000001_0 failed to report status for 603 seconds. Killing!
08/11/25 02:59:06 INFO mapred.JobClient: Task Id : attempt_200811250109_0014_m_000007_0, Status : FAILED
Task attempt_200811250109_0014_m_000007_0 failed to report status for 604 seconds. Killing!
08/11/25 03:01:13 INFO mapred.JobClient: Task Id : attempt_200811250109_0014_m_000000_1, Status : FAILED
Task attempt_200811250109_0014_m_000000_1 failed to report status for 604 seconds. Killing!
08/11/25 03:01:43 INFO mapred.JobClient: Task Id : attempt_200811250109_0014_m_000001_1, Status : FAILED
Task attempt_200811250109_0014_m_000001_1 failed to report status for 600 seconds. Killing!
....
Task attempt_200811250109_0014_m_000019_1 failed to report status for 602 seconds. Killing!
08/11/25 03:37:51 INFO mapred.JobClient: Task Id : attempt_200811250109_0014_m_000021_1, Status : FAILED
Task attempt_200811250109_0014_m_000021_1 failed to report status for 600 seconds. Killing!
java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1113)
        at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:622)
        at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:667)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
What is probably happening is that the servers are getting overloaded, swapping
too much, and the tasks are not reporting status back to the jobtracker in
time. How many tasks per tasktracker do you currently allow? I believe the
default maximum is 2.
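If the merge tasks turn out to be genuinely slow rather than hung, another knob
worth knowing about is the timeout behind the "failed to report status for 600
seconds" message. A minimal sketch, assuming the 0.18-era property name
mapred.task.timeout (value in milliseconds; the default of 600000 is the 10
minutes you are seeing):

<property>
  <name>mapred.task.timeout</name>
  <!-- 1800000 ms = 30 minutes; raise this only if tasks are slow, not stuck -->
  <value>1800000</value>
</property>

That said, if the boxes are swapping, reducing the per-tasktracker maxima is a
better fix than raising the timeout.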
Dennis
Alexander
2008/11/26 Dennis Kubes <[EMAIL PROTECTED]>
The mapred.map.tasks and mapred.reduce.tasks properties define the approximate
number of tasks per job; the actual number also depends heavily on the amount
of data being processed. The mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum properties define the maximum number of
map and reduce tasks that can run concurrently on a single tasktracker.
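To illustrate the distinction in hadoop-site.xml terms (the values below are
only examples, not suggested settings):

<property>
  <name>mapred.map.tasks</name>
  <value>10</value>
  <description>Per-job hint: roughly how many map tasks each job is split into.</description>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
  <description>Per-node cap: how many map tasks one tasktracker runs concurrently.</description>
</property>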
When you say 20 jobs, I am assuming you mean tasks. Also, what type of hardware
are you running this on, what are your memory settings, and are you running in
local or DFS mode?
Dennis
Alexander Aristov wrote:
Hi all
Can someone suggest how to restrict the number of jobs Nutch launches in
Hadoop when it starts the segment merger?
When I run the generate, fetch, and updatedb tasks, Nutch starts about 6-10
MapReduce jobs (on a cluster of 2 datanodes); the actual value varies from task
to task. But when the script starts merging segments, it launches about 20 jobs,
and the servers get overloaded and crash. The Nutch settings are mostly the
defaults.
How can I control the number of jobs?
Best regards
Alexander