Vincent, just a reminder that you need to restart your cluster after the reconfiguration.
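A quick sketch of that restart, assuming a stock hadoop-0.20.2 layout
(as set up in the thread below) with HADOOP_HOME pointing at the
install directory, run from the master node:

    # restart only the MapReduce daemons (JobTracker + TaskTrackers)
    $HADOOP_HOME/bin/stop-mapred.sh
    $HADOOP_HOME/bin/start-mapred.sh

    # or restart HDFS and MapReduce together
    $HADOOP_HOME/bin/stop-all.sh
    $HADOOP_HOME/bin/start-all.sh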
On Fri, Oct 8, 2010 at 7:04 PM, Jeff Zhang <[email protected]> wrote:
> Try increasing the heap size of the tasks by setting
> mapred.child.java.opts in mapred-site.xml. The default value is
> -Xmx200m in mapred-default.xml, which may be too small for you.
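For concreteness, a minimal sketch of that change; the -Xmx512m value
is only an illustration sized for the machines in this thread, not a
value recommended on the list:

    <!-- conf/mapred-site.xml: heap given to each spawned map/reduce
         child JVM (the mapred-default.xml value is -Xmx200m) -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx512m</value>
    </property>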
> On Fri, Oct 8, 2010 at 6:55 PM, Vincent <[email protected]> wrote:
>> Thanks to Dmitriy and Jeff, I've set
>>
>>   set default_parallel 20;
>>
>> at the beginning of my script, and updated 8 JOINs to the form
>>
>>   JOIN big BY id, small BY id USING 'replicated';
>>
>> Unfortunately this didn't improve the script's speed (at least, it
>> has now been running for more than one hour).
>>
>> But looking in the JobTracker at one of the jobs that has a reduce
>> phase, I can see the following (taskdetails/taskstats links from
>> the web UI omitted).
>>
>> Hadoop map task list for job_201010081314_0010 on prog7:
>>
>> task_201010081314_0010_m_000000: 100.00% complete, started
>> 8-Oct-2010 14:07:44, finished 14:23:11 (15mins, 27sec), 8 counters
>> Errors: Too many fetch-failures (reported twice)
>>
>> Hadoop reduce task list for job_201010081314_0010 on prog7:
>>
>> task_201010081314_0010_r_000000: 9.72% complete, status "reduce >
>> copy (7 of 24 at 0.01 MB/s)", started 14:14:49, 7 counters
>> Error: GC overhead limit exceeded
>>
>> task_201010081314_0010_r_000001: 0.00% complete, started 14:14:52
>> Error: Java heap space
>>
>> task_201010081314_0010_r_000002: 0.00% complete, started 14:15:58
>> Error: java.io.IOException: Task process exit with nonzero status of 1.
>>          at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
>>
>> task_201010081314_0010_r_000003: 9.72% complete, status "reduce >
>> copy (7 of 24 at 0.01 MB/s)", started 14:16:58, 7 counters, no
>> error reported
>>
>> task_201010081314_0010_r_000004: 0.00% complete, started 14:18:11
>> Error: GC overhead limit exceeded
>>
>> task_201010081314_0010_r_000005: 0.00% complete, started 14:18:56
>> Error: GC overhead limit exceeded
>>
>> It seems like it runs out of memory... Which parameter should be
>> increased?
>>
>> -Vincent
>>
>> On 10/08/2010 01:12 PM, Jeff Zhang wrote:
>>> BTW, you can look at the JobTracker web UI to see which part of
>>> the job costs the most time.
>>>
>>> On Fri, Oct 8, 2010 at 5:11 PM, Jeff Zhang <[email protected]> wrote:
>>>> No, I mean: is your MapReduce job's reduce task number 1?
>>>>
>>>> And could you share your Pig script, so that others can really
>>>> understand your problem?
>>>>
>>>> On Fri, Oct 8, 2010 at 5:04 PM, Vincent <[email protected]>
>>>> wrote:
>>>>> You are right, I didn't change this parameter, so the default
>>>>> from src/mapred/mapred-default.xml is used:
>>>>>
>>>>> <property>
>>>>>   <name>mapred.reduce.tasks</name>
>>>>>   <value>1</value>
>>>>>   <description>The default number of reduce tasks per job.
>>>>>   Typically set to 99% of the cluster's reduce capacity, so that
>>>>>   if a node fails the reduces can still be executed in a single
>>>>>   wave. Ignored when mapred.job.tracker is "local".
>>>>>   </description>
>>>>> </property>
>>>>>
>>>>> It's not clear to me what the reduce capacity of my cluster is :)
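As a point of reference: reduce capacity is the number of TaskTracker
nodes times mapred.tasktracker.reduce.tasks.maximum (2 by default in
Hadoop 0.20), so the two-node cluster described further down has about
2 x 2 = 4 reduce slots with stock settings. Below is a minimal sketch
of raising the per-job default to match; the value 4 is just that
computed capacity, and the relation names are placeholders:

    <!-- conf/mapred-site.xml: default reduce count for new jobs -->
    <property>
      <name>mapred.reduce.tasks</name>
      <value>4</value>
    </property>

    -- or per operator inside the Pig script:
    joined = JOIN big BY id, small BY id PARALLEL 4;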
>>>>> On 10/08/2010 01:00 PM, Jeff Zhang wrote:
>>>>>> I guess maybe your reduce number is 1, which would make the
>>>>>> reduce phase very slow.
>>>>>>
>>>>>> On Fri, Oct 8, 2010 at 4:44 PM, Vincent <[email protected]>
>>>>>> wrote:
>>>>>>> Well, I can see from the JobTracker that all the jobs finish
>>>>>>> quite quickly except 2, for which the reduce phase goes
>>>>>>> really, really slowly.
>>>>>>>
>>>>>>> But how can I map a job in the Hadoop JobTracker (example:
>>>>>>> job_201010072150_0045) back to the part of the Pig script it
>>>>>>> executes?
>>>>>>>
>>>>>>> And what is more efficient: several small Pig scripts, or one
>>>>>>> big one? I wrote one big script to avoid loading the same logs
>>>>>>> several times in different scripts. Maybe that is not such a
>>>>>>> good design...
>>>>>>>
>>>>>>> Thanks for your help.
>>>>>>>
>>>>>>> - Vincent
>>>>>>>
>>>>>>> On 10/08/2010 11:31 AM, Vincent wrote:
>>>>>>>> I'm using pig-0.7.0 on hadoop-0.20.2.
>>>>>>>>
>>>>>>>> As for the script, well, it's more than 500 lines; I'm not
>>>>>>>> sure anybody would read it to the end if I posted it here :-)
>>>>>>>>
>>>>>>>> On 10/08/2010 11:26 AM, Dmitriy Ryaboy wrote:
>>>>>>>>> What version of Pig, and what does your script look like?
>>>>>>>>>
>>>>>>>>> On Thu, Oct 7, 2010 at 11:48 PM, Vincent
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> I'm quite new to Pig/Hadoop, so maybe my cluster size will
>>>>>>>>>> make you laugh.
>>>>>>>>>>
>>>>>>>>>> I wrote a Pig script that handles 1.5GB of logs in less
>>>>>>>>>> than one hour in Pig local mode on an Intel Core 2 Duo with
>>>>>>>>>> 3GB of RAM.
>>>>>>>>>>
>>>>>>>>>> Then I tried this script on a simple 2-node cluster. These
>>>>>>>>>> 2 nodes are not servers but ordinary desktop machines:
>>>>>>>>>> - an Intel Core 2 Duo with 3GB of RAM
>>>>>>>>>> - an Intel Quad with 4GB of RAM
>>>>>>>>>>
>>>>>>>>>> Well, I was aware that Hadoop has overhead and that the job
>>>>>>>>>> wouldn't finish in half an hour (the local-mode time
>>>>>>>>>> divided by the number of nodes). But I was surprised to see
>>>>>>>>>> this morning that it took 7 hours to complete!!!
>>>>>>>>>>
>>>>>>>>>> My configuration was made according to this link:
>>>>>>>>>> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
>>>>>>>>>>
>>>>>>>>>> My question is simple: is this normal?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Vincent

--
Best Regards

Jeff Zhang
