Try increasing the heap size of the tasks by setting mapred.child.java.opts in mapred-site.xml. The default value in mapred-default.xml is -Xmx200m, which may be too small for you.
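A minimal sketch of that change, assuming the Hadoop 0.20 layout where mapred-site.xml lives in the conf/ directory. The -Xmx512m value is an illustrative assumption, not a tested recommendation: each task runs in its own child JVM, so this heap times the number of task slots per node (mapred.tasktracker.map.tasks.maximum plus mapred.tasktracker.reduce.tasks.maximum, 2 each by default) should fit in the node's RAM with room left for the Hadoop daemons. With the defaults, that is 4 x 512MB = 2GB of task heap on the 3GB machine.

```xml
<?xml version="1.0"?>
<!-- conf/mapred-site.xml (sketch; -Xmx512m is an assumed value, tune it to each node's RAM) -->
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <!-- JVM options for every map/reduce child task; the shipped default is -Xmx200m -->
    <value>-Xmx512m</value>
  </property>
</configuration>
```

Since this is a per-job setting, it most likely needs to be present in the configuration of the machine that submits the Pig job (keeping the same file on both nodes is simplest), and it applies to jobs submitted after the change.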
On Fri, Oct 8, 2010 at 6:55 PM, Vincent <[email protected]> wrote:
>
> Thanks to Dmitriy and Jeff, I've set:
>
> set default_parallel 20;
>
> at the beginning of my script, and updated 8 JOINs to behave like:
>
> JOIN big BY id, small BY id USING 'replicated';
>
> Unfortunately this didn't improve the script's speed (it has now been
> running for more than an hour).
>
> But looking in the jobtracker at one of the jobs that is reducing, I can
> see this for the map:
>
> Hadoop map task list for job_201010081314_0010 on prog7
> ------------------------------------------------------------------------
> All Tasks
>
> task_201010081314_0010_m_000000
>     Complete:    100.00%
>     Start Time:  8-Oct-2010 14:07:44
>     Finish Time: 8-Oct-2010 14:23:11 (15mins, 27sec)
>     Errors:      Too many fetch-failures
>                  Too many fetch-failures
>     Counters:    8
>
> And I can see this for the reduce:
>
> Hadoop reduce task list for job_201010081314_0010 on prog7
> ------------------------------------------------------------------------
> All Tasks
>
> task_201010081314_0010_r_000000
>     Complete:    9.72%
>     Status:      reduce > copy (7 of 24 at 0.01 MB/s)
>     Start Time:  8-Oct-2010 14:14:49
>     Errors:      Error: GC overhead limit exceeded
>     Counters:    7
> task_201010081314_0010_r_000001
>     Complete:    0.00%
>     Start Time:  8-Oct-2010 14:14:52
>     Errors:      Error: Java heap space
>     Counters:    0
> task_201010081314_0010_r_000002
>     Complete:    0.00%
>     Start Time:  8-Oct-2010 14:15:58
>     Errors:      java.io.IOException: Task process exit with nonzero status of 1.
>                      at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
>     Counters:    0
> task_201010081314_0010_r_000003
>     Complete:    9.72%
>     Status:      reduce > copy (7 of 24 at 0.01 MB/s)
>     Start Time:  8-Oct-2010 14:16:58
>     Counters:    7
> task_201010081314_0010_r_000004
>     Complete:    0.00%
>     Start Time:  8-Oct-2010 14:18:11
>     Errors:      Error: GC overhead limit exceeded
>     Counters:    0
> task_201010081314_0010_r_000005
>     Complete:    0.00%
>     Start Time:  8-Oct-2010 14:18:56
>     Errors:      Error: GC overhead limit exceeded
>
> It seems like it runs out of memory... Which parameter should be increased?
>
> -Vincent
>
> On 10/08/2010 01:12 PM, Jeff Zhang wrote:
>>
>> BTW, you can look at the job tracker web UI to see which part of the
>> job takes the most time.
>>
>> On Fri, Oct 8, 2010 at 5:11 PM, Jeff Zhang <[email protected]> wrote:
>>>
>>> No, I mean whether the number of reduce tasks of your MapReduce job is 1.
>>>
>>> And could you share your Pig script? Then others can really understand
>>> your problem.
>>>
>>> On Fri, Oct 8, 2010 at 5:04 PM, Vincent <[email protected]> wrote:
>>>>
>>>> You are right, I didn't change this parameter, so the default from
>>>> src/mapred/mapred-default.xml is used:
>>>>
>>>> <property>
>>>>   <name>mapred.reduce.tasks</name>
>>>>   <value>1</value>
>>>>   <description>The default number of reduce tasks per job. Typically set
>>>>   to 99% of the cluster's reduce capacity, so that if a node fails the
>>>>   reduces can still be executed in a single wave.
>>>>   Ignored when mapred.job.tracker is "local".
>>>>   </description>
>>>> </property>
>>>>
>>>> It's not clear to me what the reduce capacity of my cluster is :)
>>>>
>>>> On 10/08/2010 01:00 PM, Jeff Zhang wrote:
>>>>>
>>>>> I guess your number of reducers may be 1, which makes the reduce phase
>>>>> very slow.
>>>>>
>>>>> On Fri, Oct 8, 2010 at 4:44 PM, Vincent <[email protected]> wrote:
>>>>>>
>>>>>> Well, I can see from the job tracker that all the jobs finish quite
>>>>>> quickly except 2, whose reduce phase goes really, really slowly.
>>>>>>
>>>>>> But how can I relate a job in the Hadoop job tracker
>>>>>> (example: job_201010072150_0045) to the Pig script execution?
>>>>>>
>>>>>> And which is more efficient: several small Pig scripts, or one big
>>>>>> Pig script? I wrote one big script to avoid loading the same logs
>>>>>> several times in different scripts. Maybe that is not such a good
>>>>>> design...
>>>>>>
>>>>>> Thanks for your help.
>>>>>>
>>>>>> - Vincent
>>>>>>
>>>>>> On 10/08/2010 11:31 AM, Vincent wrote:
>>>>>>>
>>>>>>> I'm using pig-0.7.0 on hadoop-0.20.2.
>>>>>>>
>>>>>>> As for the script, well, it's more than 500 lines; I'm not sure that
>>>>>>> if I post it here anybody will read it to the end :-)
>>>>>>>
>>>>>>> On 10/08/2010 11:26 AM, Dmitriy Ryaboy wrote:
>>>>>>>>
>>>>>>>> What version of Pig, and what does your script look like?
>>>>>>>>
>>>>>>>> On Thu, Oct 7, 2010 at 11:48 PM, Vincent <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> I'm quite new to Pig/Hadoop, so maybe my cluster size will make you
>>>>>>>>> laugh.
>>>>>>>>>
>>>>>>>>> I wrote a Pig script that handles 1.5GB of logs in less than one
>>>>>>>>> hour in Pig local mode on an Intel Core 2 Duo with 3GB of RAM.
>>>>>>>>>
>>>>>>>>> Then I tried this script on a simple 2-node cluster. These 2 nodes
>>>>>>>>> are not servers but simple computers:
>>>>>>>>> - Intel Core 2 Duo with 3GB of RAM.
>>>>>>>>> - Intel Quad with 4GB of RAM.
>>>>>>>>>
>>>>>>>>> Well, I was aware that Hadoop has overhead and that it wouldn't be
>>>>>>>>> done in half an hour (the local-mode time divided by the number of
>>>>>>>>> nodes). But I was surprised to see this morning that it took 7 hours
>>>>>>>>> to complete!!!
>>>>>>>>>
>>>>>>>>> My configuration was made according to this link:
>>>>>>>>>
>>>>>>>>> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
>>>>>>>>>
>>>>>>>>> My question is simple: Is it normal?
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>> Vincent
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>

--
Best Regards

Jeff Zhang
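On the reduce-capacity question in the quoted thread: in Hadoop 0.20 the cluster's reduce capacity is its total number of reduce slots, i.e. mapred.tasktracker.reduce.tasks.maximum (default 2) summed over all tasktrackers. Assuming both nodes keep that default, this 2-node cluster has 2 x 2 = 4 reduce slots, so with set default_parallel 20 each job's 20 reducers would run in roughly 20 / 4 = 5 waves. A sketch of making the slot count and a matching per-job default explicit, where the value 4 is an assumption derived from that arithmetic (99% of 4 rounded up):

```xml
<?xml version="1.0"?>
<!-- conf/mapred-site.xml (sketch; assumes the same file on both nodes) -->
<configuration>
  <!-- Reduce slots per tasktracker; 2 is the Hadoop 0.20 default.
       Read by each tasktracker when it starts. -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- Default reducers per job: 2 nodes x 2 slots = 4 (assumed capacity).
       Pig's default_parallel or a PARALLEL clause should take precedence. -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>4</value>
  </property>
</configuration>
```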
