I run Nutch on EC2 small instances; they have about 2 GB of RAM. I use DFS.
Yes, I meant tasks, not jobs. I just took the name from the JobTracker web page.

Where should I add these params? In nutch-site.xml or hadoop-site.xml?
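
For illustration, a minimal sketch of the four properties Dennis mentions,
assuming they go in hadoop-site.xml; the values are placeholders for a small
two-node cluster, not recommendations:

<configuration>
  <!-- Hint for how many map/reduce tasks each job is split into -->
  <property>
    <name>mapred.map.tasks</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>2</value>
  </property>
  <!-- Hard cap on tasks running concurrently on one tasktracker -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>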

My logs look like this:

08/11/25 02:41:53 INFO mapred.JobClient:  map 73% reduce 22%
08/11/25 02:41:59 INFO mapred.JobClient:  map 73% reduce 23%
08/11/25 02:42:06 INFO mapred.JobClient:  map 73% reduce 24%
08/11/25 02:59:02 INFO mapred.JobClient: Task Id : attempt_200811250109_0014_m_000001_0, Status : FAILED
Task attempt_200811250109_0014_m_000001_0 failed to report status for 603 seconds. Killing!
08/11/25 02:59:06 INFO mapred.JobClient: Task Id : attempt_200811250109_0014_m_000007_0, Status : FAILED
Task attempt_200811250109_0014_m_000007_0 failed to report status for 604 seconds. Killing!
08/11/25 03:01:13 INFO mapred.JobClient: Task Id : attempt_200811250109_0014_m_000000_1, Status : FAILED
Task attempt_200811250109_0014_m_000000_1 failed to report status for 604 seconds. Killing!
08/11/25 03:01:43 INFO mapred.JobClient: Task Id : attempt_200811250109_0014_m_000001_1, Status : FAILED
Task attempt_200811250109_0014_m_000001_1 failed to report status for 600 seconds. Killing!

....

Task attempt_200811250109_0014_m_000019_1 failed to report status for 602 seconds. Killing!
08/11/25 03:37:51 INFO mapred.JobClient: Task Id : attempt_200811250109_0014_m_000021_1, Status : FAILED
Task attempt_200811250109_0014_m_000021_1 failed to report status for 600 seconds. Killing!
java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1113)
        at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:622)
        at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:667)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
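
Side note: the 600-second figure in these messages matches Hadoop's
mapred.task.timeout property, which defaults to 600000 ms. If the merge tasks
are merely slow rather than hung, raising that timeout in hadoop-site.xml is
one thing to try; the value below is only an example:

<property>
  <name>mapred.task.timeout</name>
  <value>1800000</value>
  <!-- milliseconds; 1800000 ms = 30 minutes (example value only) -->
</property>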

Alexander



2008/11/26 Dennis Kubes <[EMAIL PROTECTED]>

> The mapred.map.tasks and mapred.reduce.tasks settings define the approximate
> number of tasks per job; the actual number also depends heavily on the amount
> of data being processed.  The mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum settings define the maximum number of
> map and reduce tasks that run concurrently on a single tasktracker.
>
> When you say 20 jobs I am assuming you mean tasks.  Also, what type of
> hardware are you running this on, what are your memory settings, and are you
> running in local or DFS mode?
>
> Dennis
>
>
> Alexander Aristov wrote:
>
>> Hi all
>>
>> Can someone suggest how to restrict the number of jobs Nutch launches in
>> Hadoop when it starts the segment merger?
>>
>> When I run the generate, fetch, and updatedb tasks, Nutch starts about
>> 6-10 MapReduce jobs (on a cluster of 2 datanodes); the actual number varies
>> from task to task. But when the script starts merging segments, it launches
>> about 20 jobs, and the servers get overloaded and crash. The Nutch settings
>> are mostly the defaults.
>>
>> How can I control the number of jobs?
>>
>> best Regards
>> Alexander
>>
>>
>>


-- 
Best Regards
Alexander Aristov
