Try to increase the heap size of the child tasks by setting
mapred.child.java.opts in mapred-site.xml. The default in
mapred-default.xml is -Xmx200m, which may be too small for your job.
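For example (the value here is only a sketch; pick a size that fits your nodes' RAM):

```xml
<!-- mapred-site.xml: give each map/reduce child JVM more heap.
     -Xmx512m is an example value, not a recommendation. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```

Restart the tasktrackers after changing it so new child JVMs pick up the option.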



On Fri, Oct 8, 2010 at 6:55 PM, Vincent <[email protected]> wrote:
>
>
>
>  Thanks to Dmitriy and Jeff, I've set:
>
> set default_parallel 20; at the beginning of my script,
>
> and updated 8 JOINs to use the replicated strategy:
>
> JOIN big BY id, small BY id USING 'replicated';
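> (For reference, the whole pattern as a minimal sketch; the file names and
> schemas here are made up:)
>
> ```pig
> -- replicated join: the small relation is loaded into memory on the map side
> set default_parallel 20;
> big    = LOAD 'big_logs'   AS (id:chararray, value:chararray);
> small  = LOAD 'small_dims' AS (id:chararray, name:chararray);
> joined = JOIN big BY id, small BY id USING 'replicated';
> ```
>
> Note that 'replicated' only helps if the small relation really fits in the
> mappers' memory; otherwise the map tasks themselves will run out of heap.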
>
> Unfortunately this didn't improve the script speed (at least it runs for
> more than one hour now).
>
> But looking in the jobtracker at one of the jobs that has a reduce phase,
> I can see this for the map tasks:
>
>
>  Hadoop map task list for job_201010081314_0010 on prog7
>  (http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081314_0010)
>
>   All Tasks
>
> Task                             Complete  Start Time           Finish Time                       Errors                   Counters
> task_201010081314_0010_m_000000  100.00%   8-Oct-2010 14:07:44  8-Oct-2010 14:23:11 (15min 27s)  Too many fetch-failures  8
>
> And I can see this for the reduce tasks:
>
>
>  Hadoop reduce task list for job_201010081314_0010 on prog7
>
>   All Tasks
>
> Task                             Complete  Status                                Start Time           Errors                             Counters
> task_201010081314_0010_r_000000  9.72%     reduce > copy (7 of 24 at 0.01 MB/s)  8-Oct-2010 14:14:49  Error: GC overhead limit exceeded  7
> task_201010081314_0010_r_000001  0.00%                                           8-Oct-2010 14:14:52  Error: Java heap space             0
> task_201010081314_0010_r_000002  0.00%                                           8-Oct-2010 14:15:58  Task process exit with status 1    0
> task_201010081314_0010_r_000003  9.72%     reduce > copy (7 of 24 at 0.01 MB/s)  8-Oct-2010 14:16:58                                     7
> task_201010081314_0010_r_000004  0.00%                                           8-Oct-2010 14:18:11  Error: GC overhead limit exceeded  0
> task_201010081314_0010_r_000005  0.00%                                           8-Oct-2010 14:18:56  Error: GC overhead limit exceeded
>
> (r_000002 full error: java.io.IOException: Task process exit with nonzero status of 1.
>  at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418))
> Seems like it runs out of memory... Which parameter should be increased?
>
> -Vincent
>
>
> On 10/08/2010 01:12 PM, Jeff Zhang wrote:
>>
>> BTW, you can look at the job tracker web UI to see which part of the
>> job costs the most time.
>>
>>
>>
>> On Fri, Oct 8, 2010 at 5:11 PM, Jeff Zhang<[email protected]>  wrote:
>>>
>>> No, I mean: is your MapReduce job's reduce task number 1?
>>>
>>> And could you share your Pig script, so that others can really understand
>>> your problem?
>>>
>>>
>>>
>>> On Fri, Oct 8, 2010 at 5:04 PM, Vincent<[email protected]>
>>>  wrote:
>>>>
>>>>  You are right, I didn't change this parameter, so the default from
>>>> src/mapred/mapred-default.xml is used:
>>>>
>>>> <property>
>>>>   <name>mapred.reduce.tasks</name>
>>>>   <value>1</value>
>>>>   <description>The default number of reduce tasks per job. Typically set to 99%
>>>>   of the cluster's reduce capacity, so that if a node fails the reduces can
>>>>   still be executed in a single wave.
>>>>   Ignored when mapred.job.tracker is "local".
>>>>   </description>
>>>> </property>
>>>>
>>>> Not clear for me what is the reduce capacity of my cluster :)
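>>>> (A note on "reduce capacity": it is just the total number of reduce slots
>>>> in the cluster, i.e. the number of tasktracker nodes times
>>>> mapred.tasktracker.reduce.tasks.maximum, which defaults to 2. As a sketch,
>>>> a 2-node cluster with the default slot count has a reduce capacity of 4:)
>>>>
>>>> ```xml
>>>> <!-- mapred-site.xml on each tasktracker; with 2 nodes and 2 slots each,
>>>>      the cluster's reduce capacity is 2 x 2 = 4 -->
>>>> <property>
>>>>   <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>>>   <value>2</value>
>>>> </property>
>>>> ```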
>>>>
>>>> On 10/08/2010 01:00 PM, Jeff Zhang wrote:
>>>>>
>>>>> I guess your reduce number is 1, which makes the reduce phase very
>>>>> slow.
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Oct 8, 2010 at 4:44 PM, Vincent<[email protected]>
>>>>>  wrote:
>>>>>>
>>>>>>  Well, I can see from the job tracker that all the jobs are done quite
>>>>>> quickly except 2, for which the reduce phase goes really, really slowly.
>>>>>>
>>>>>> But how can I match a job in the Hadoop job tracker
>>>>>> (example: job_201010072150_0045) to a step in the Pig script execution?
>>>>>>
>>>>>> And what is more efficient: several small Pig scripts, or one big Pig
>>>>>> script? I did one big one to avoid loading the same logs several times
>>>>>> in different scripts. Maybe it is not such a good design...
>>>>>>
>>>>>> Thanks for your help.
>>>>>>
>>>>>> - Vincent
>>>>>>
>>>>>>
>>>>>> On 10/08/2010 11:31 AM, Vincent wrote:
>>>>>>>
>>>>>>>  I'm using pig-0.7.0 on hadoop-0.20.2.
>>>>>>>
>>>>>>> For the script, well, it's more than 500 lines; I'm not sure anybody
>>>>>>> would read it to the end if I posted it here :-)
>>>>>>>
>>>>>>>
>>>>>>> On 10/08/2010 11:26 AM, Dmitriy Ryaboy wrote:
>>>>>>>>
>>>>>>>> What version of Pig, and what does your script look like?
>>>>>>>>
>>>>>>>> On Thu, Oct 7, 2010 at 11:48 PM, Vincent<[email protected]>
>>>>>>>>  wrote:
>>>>>>>>
>>>>>>>>>  Hi All,
>>>>>>>>>
>>>>>>>>> I'm quite new to Pig/Hadoop. So maybe my cluster size will make you
>>>>>>>>> laugh.
>>>>>>>>>
>>>>>>>>> I wrote a Pig script handling 1.5GB of logs in less than one hour
>>>>>>>>> in Pig local mode on an Intel Core 2 Duo with 3GB of RAM.
>>>>>>>>>
>>>>>>>>> Then I tried this script on a simple 2-node cluster. These 2 nodes
>>>>>>>>> are not servers but plain desktop machines:
>>>>>>>>> - an Intel Core 2 Duo with 3GB of RAM
>>>>>>>>> - an Intel Quad with 4GB of RAM
>>>>>>>>>
>>>>>>>>> Well, I was aware that Hadoop has overhead and that it wouldn't be
>>>>>>>>> done in half an hour (the local-mode time divided by the number of
>>>>>>>>> nodes). But I was surprised to see this morning that it took 7 hours
>>>>>>>>> to complete!!!
>>>>>>>>>
>>>>>>>>> My configuration was made according to this link:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
>>>>>>>>>
>>>>>>>>> My question is simple: Is it normal?
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Vincent
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>>
>
>



-- 
Best Regards

Jeff Zhang
