BTW, you can look at the JobTracker web UI to see which part of the job costs the most time.
On Fri, Oct 8, 2010 at 5:11 PM, Jeff Zhang <[email protected]> wrote:
> No, I mean whether your MapReduce job's reduce task number is 1.
>
> And could you share your Pig script, so that others can really understand
> your problem?
>
> On Fri, Oct 8, 2010 at 5:04 PM, Vincent <[email protected]> wrote:
>> You are right, I didn't change this parameter, so the default is
>> used from src/mapred/mapred-default.xml:
>>
>> <property>
>>   <name>mapred.reduce.tasks</name>
>>   <value>1</value>
>>   <description>The default number of reduce tasks per job. Typically set to 99%
>>   of the cluster's reduce capacity, so that if a node fails the reduces can
>>   still be executed in a single wave.
>>   Ignored when mapred.job.tracker is "local".
>>   </description>
>> </property>
>>
>> It's not clear to me what the reduce capacity of my cluster is :)
>>
>> On 10/08/2010 01:00 PM, Jeff Zhang wrote:
>>> I guess your reduce number may be 1, which would make the reduce phase very
>>> slow.
>>>
>>> On Fri, Oct 8, 2010 at 4:44 PM, Vincent <[email protected]> wrote:
>>>> Well, I can see from the job tracker that all the jobs finish quite
>>>> quickly except two, for which the reduce phase goes really, really slowly.
>>>>
>>>> But how can I match up a job in the Hadoop job tracker
>>>> (example: job_201010072150_0045) with the Pig script execution?
>>>>
>>>> And what is more efficient: several small Pig scripts, or one big Pig
>>>> script? I wrote one big script to avoid loading the same logs several
>>>> times in different scripts. Maybe that is not such a good design...
>>>>
>>>> Thanks for your help.
>>>>
>>>> - Vincent
>>>>
>>>> On 10/08/2010 11:31 AM, Vincent wrote:
>>>>> I'm using pig-0.7.0 on hadoop-0.20.2.
>>>>>
>>>>> As for the script, well, it's more than 500 lines; if I post it
>>>>> here, I'm not sure anybody will read it to the end :-)
>>>>>
>>>>> On 10/08/2010 11:26 AM, Dmitriy Ryaboy wrote:
>>>>>> What version of Pig, and what does your script look like?
>>>>>>
>>>>>> On Thu, Oct 7, 2010 at 11:48 PM, Vincent <[email protected]> wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I'm quite new to Pig/Hadoop, so maybe my cluster size will make you
>>>>>>> laugh.
>>>>>>>
>>>>>>> I wrote a Pig script that handles 1.5 GB of logs in less than one hour
>>>>>>> in Pig local mode on an Intel Core 2 Duo with 3 GB of RAM.
>>>>>>>
>>>>>>> Then I tried this script on a simple 2-node cluster. These 2 nodes
>>>>>>> are not servers but ordinary computers:
>>>>>>> - Intel Core 2 Duo with 3 GB of RAM.
>>>>>>> - Intel Quad with 4 GB of RAM.
>>>>>>>
>>>>>>> Well, I was aware that Hadoop has overhead and that the job wouldn't
>>>>>>> finish in half an hour (the local-mode time divided by the number of
>>>>>>> nodes). But I was surprised to see this morning that it took 7 hours
>>>>>>> to complete!!!
>>>>>>>
>>>>>>> My configuration was made according to this link:
>>>>>>>
>>>>>>> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
>>>>>>>
>>>>>>> My question is simple: is this normal?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Vincent
>
> --
> Best Regards
>
> Jeff Zhang

--
Best Regards

Jeff Zhang
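To illustrate the single-reducer diagnosis above: in Pig 0.7, the reducer count for a reduce-side operator can be raised with the PARALLEL clause; without it, the job falls back to Hadoop's mapred.reduce.tasks default (1 here). The relation and field names below are hypothetical sketches, not taken from Vincent's unposted script:

```pig
-- Hypothetical input; Vincent's actual 500-line script was not shared.
logs = LOAD '/logs/access.log' USING PigStorage('\t')
       AS (user:chararray, bytes:long);

-- In Pig 0.7, PARALLEL on a reduce-side operator (GROUP, JOIN,
-- ORDER, DISTINCT, ...) sets the number of reduce tasks for that
-- operator; without it, Hadoop's mapred.reduce.tasks default applies.
grouped = GROUP logs BY user PARALLEL 4;
totals  = FOREACH grouped GENERATE group, SUM(logs.bytes);

STORE totals INTO '/out/bytes_per_user';
```

Each reduce-side operator needs its own PARALLEL clause; map-side operators such as LOAD, FILTER, and FOREACH are not affected by it.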
