BTW, you can look at the JobTracker web UI to see which part of the job costs the most time.
On Fri, Oct 8, 2010 at 5:11 PM, Jeff Zhang <[email protected]> wrote:
> No, I mean whether your MapReduce job's reduce task number is 1.
>
> And could you share your Pig script, so that others can really understand
> your problem?
>
> On Fri, Oct 8, 2010 at 5:04 PM, Vincent <[email protected]> wrote:
>> You are right, I didn't change this parameter, so the default is
>> used from src/mapred/mapred-default.xml:
>>
>> <property>
>>   <name>mapred.reduce.tasks</name>
>>   <value>1</value>
>>   <description>The default number of reduce tasks per job. Typically set to 99%
>>   of the cluster's reduce capacity, so that if a node fails the reduces can
>>   still be executed in a single wave.
>>   Ignored when mapred.job.tracker is "local".
>>   </description>
>> </property>
>>
>> It's not clear to me what the reduce capacity of my cluster is :)
>>
>> On 10/08/2010 01:00 PM, Jeff Zhang wrote:
>>> I guess your reduce number may be 1, which would make the reduce phase very
>>> slow.
>>>
>>> On Fri, Oct 8, 2010 at 4:44 PM, Vincent <[email protected]> wrote:
>>>> Well, I can see from the job tracker that all the jobs finish quite
>>>> quickly except two, for which the reduce phase goes really, really slowly.
>>>>
>>>> But how can I match up a job in the Hadoop job tracker
>>>> (example: job_201010072150_0045) with the Pig script execution?
>>>>
>>>> And what is more efficient: several small Pig scripts, or one big Pig
>>>> script? I wrote one big script to avoid loading the same logs several
>>>> times in different scripts. Maybe that is not such a good design...
>>>>
>>>> Thanks for your help.
>>>>
>>>> - Vincent
>>>>
>>>> On 10/08/2010 11:31 AM, Vincent wrote:
>>>>> I'm using pig-0.7.0 on hadoop-0.20.2.
>>>>>
>>>>> As for the script, well, it's more than 500 lines; if I post it
>>>>> here, I'm not sure anybody will read it to the end :-)
>>>>>
>>>>> On 10/08/2010 11:26 AM, Dmitriy Ryaboy wrote:
>>>>>> What version of Pig, and what does your script look like?
>>>>>>
>>>>>> On Thu, Oct 7, 2010 at 11:48 PM, Vincent <[email protected]> wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I'm quite new to Pig/Hadoop, so maybe my cluster size will make you
>>>>>>> laugh.
>>>>>>>
>>>>>>> I wrote a Pig script that handles 1.5 GB of logs in less than one hour
>>>>>>> in Pig local mode on an Intel Core 2 Duo with 3 GB of RAM.
>>>>>>>
>>>>>>> Then I tried this script on a simple 2-node cluster. These 2 nodes
>>>>>>> are not servers but ordinary computers:
>>>>>>> - Intel Core 2 Duo with 3 GB of RAM.
>>>>>>> - Intel Quad with 4 GB of RAM.
>>>>>>>
>>>>>>> Well, I was aware that Hadoop has overhead and that the job wouldn't
>>>>>>> finish in half an hour (the local-mode time divided by the number of
>>>>>>> nodes). But I was surprised to see this morning that it took 7 hours
>>>>>>> to complete!!!
>>>>>>>
>>>>>>> My configuration was made according to this link:
>>>>>>>
>>>>>>> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
>>>>>>>
>>>>>>> My question is simple: is this normal?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Vincent
>
> --
> Best Regards
>
> Jeff Zhang

--
Best Regards

Jeff Zhang
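To illustrate the single-reducer diagnosis above: in Pig 0.7, the reducer count for a reduce-side operator can be raised with the PARALLEL clause; without it, the job falls back to Hadoop's mapred.reduce.tasks default (1 here). The relation and field names below are hypothetical sketches, not taken from Vincent's unposted script:

```pig
-- Hypothetical input; Vincent's actual 500-line script was not shared.
logs = LOAD '/logs/access.log' USING PigStorage('\t')
       AS (user:chararray, bytes:long);

-- In Pig 0.7, PARALLEL on a reduce-side operator (GROUP, JOIN,
-- ORDER, DISTINCT, ...) sets the number of reduce tasks for that
-- operator; without it, Hadoop's mapred.reduce.tasks default applies.
grouped = GROUP logs BY user PARALLEL 4;
totals  = FOREACH grouped GENERATE group, SUM(logs.bytes);

STORE totals INTO '/out/bytes_per_user';
```

Each reduce-side operator needs its own PARALLEL clause; map-side operators such as LOAD, FILTER, and FOREACH are not affected by it.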
