Vincent, just a reminder that you need to restart your cluster after the reconfiguration.
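A quick sketch of that restart, assuming a stock hadoop-0.20.2 layout
(as set up in the thread below) with HADOOP_HOME pointing at the
install directory, run from the master node:

    # restart only the MapReduce daemons (JobTracker + TaskTrackers)
    $HADOOP_HOME/bin/stop-mapred.sh
    $HADOOP_HOME/bin/start-mapred.sh

    # or restart HDFS and MapReduce together
    $HADOOP_HOME/bin/stop-all.sh
    $HADOOP_HOME/bin/start-all.sh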
On Fri, Oct 8, 2010 at 7:04 PM, Jeff Zhang <[email protected]> wrote:
> Try increasing the heap size of the tasks by setting
> mapred.child.java.opts in mapred-site.xml. The default value is
> -Xmx200m in mapred-default.xml, which may be too small for you.
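For concreteness, a minimal sketch of that change; the -Xmx512m value
is only an illustration sized for the machines in this thread, not a
value recommended on the list:

    <!-- conf/mapred-site.xml: heap given to each spawned map/reduce
         child JVM (the mapred-default.xml value is -Xmx200m) -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx512m</value>
    </property>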
> On Fri, Oct 8, 2010 at 6:55 PM, Vincent <[email protected]> wrote:
>> Thanks to Dmitriy and Jeff, I've set
>>
>>   set default_parallel 20;
>>
>> at the beginning of my script, and updated 8 JOINs to the form
>>
>>   JOIN big BY id, small BY id USING 'replicated';
>>
>> Unfortunately this didn't improve the script's speed (at least, it
>> has now been running for more than one hour).
>>
>> But looking in the JobTracker at one of the jobs that has a reduce
>> phase, I can see the following (taskdetails/taskstats links from
>> the web UI omitted).
>>
>> Hadoop map task list for job_201010081314_0010 on prog7:
>>
>> task_201010081314_0010_m_000000: 100.00% complete, started
>> 8-Oct-2010 14:07:44, finished 14:23:11 (15mins, 27sec), 8 counters
>> Errors: Too many fetch-failures (reported twice)
>>
>> Hadoop reduce task list for job_201010081314_0010 on prog7:
>>
>> task_201010081314_0010_r_000000: 9.72% complete, status "reduce >
>> copy (7 of 24 at 0.01 MB/s)", started 14:14:49, 7 counters
>> Error: GC overhead limit exceeded
>>
>> task_201010081314_0010_r_000001: 0.00% complete, started 14:14:52
>> Error: Java heap space
>>
>> task_201010081314_0010_r_000002: 0.00% complete, started 14:15:58
>> Error: java.io.IOException: Task process exit with nonzero status of 1.
>>          at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
>>
>> task_201010081314_0010_r_000003: 9.72% complete, status "reduce >
>> copy (7 of 24 at 0.01 MB/s)", started 14:16:58, 7 counters, no
>> error reported
>>
>> task_201010081314_0010_r_000004: 0.00% complete, started 14:18:11
>> Error: GC overhead limit exceeded
>>
>> task_201010081314_0010_r_000005: 0.00% complete, started 14:18:56
>> Error: GC overhead limit exceeded
>>
>> It seems like it runs out of memory... Which parameter should be
>> increased?
>>
>> -Vincent
>>
>> On 10/08/2010 01:12 PM, Jeff Zhang wrote:
>>> BTW, you can look at the JobTracker web UI to see which part of
>>> the job costs the most time.
>>>
>>> On Fri, Oct 8, 2010 at 5:11 PM, Jeff Zhang <[email protected]> wrote:
>>>> No, I mean: is your MapReduce job's reduce task number 1?
>>>>
>>>> And could you share your Pig script, so that others can really
>>>> understand your problem?
>>>>
>>>> On Fri, Oct 8, 2010 at 5:04 PM, Vincent <[email protected]>
>>>> wrote:
>>>>> You are right, I didn't change this parameter, so the default
>>>>> from src/mapred/mapred-default.xml is used:
>>>>>
>>>>> <property>
>>>>>   <name>mapred.reduce.tasks</name>
>>>>>   <value>1</value>
>>>>>   <description>The default number of reduce tasks per job.
>>>>>   Typically set to 99% of the cluster's reduce capacity, so that
>>>>>   if a node fails the reduces can still be executed in a single
>>>>>   wave. Ignored when mapred.job.tracker is "local".
>>>>>   </description>
>>>>> </property>
>>>>>
>>>>> It's not clear to me what the reduce capacity of my cluster is :)
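As a point of reference: reduce capacity is the number of TaskTracker
nodes times mapred.tasktracker.reduce.tasks.maximum (2 by default in
Hadoop 0.20), so the two-node cluster described further down has about
2 x 2 = 4 reduce slots with stock settings. Below is a minimal sketch
of raising the per-job default to match; the value 4 is just that
computed capacity, and the relation names are placeholders:

    <!-- conf/mapred-site.xml: default reduce count for new jobs -->
    <property>
      <name>mapred.reduce.tasks</name>
      <value>4</value>
    </property>

    -- or per operator inside the Pig script:
    joined = JOIN big BY id, small BY id PARALLEL 4;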
>>>>> On 10/08/2010 01:00 PM, Jeff Zhang wrote:
>>>>>> I guess maybe your reduce number is 1, which would make the
>>>>>> reduce phase very slow.
>>>>>>
>>>>>> On Fri, Oct 8, 2010 at 4:44 PM, Vincent <[email protected]>
>>>>>> wrote:
>>>>>>> Well, I can see from the JobTracker that all the jobs finish
>>>>>>> quite quickly except 2, for which the reduce phase goes
>>>>>>> really, really slowly.
>>>>>>>
>>>>>>> But how can I map a job in the Hadoop JobTracker (example:
>>>>>>> job_201010072150_0045) back to the part of the Pig script it
>>>>>>> executes?
>>>>>>>
>>>>>>> And what is more efficient: several small Pig scripts, or one
>>>>>>> big one? I wrote one big script to avoid loading the same logs
>>>>>>> several times in different scripts. Maybe that is not such a
>>>>>>> good design...
>>>>>>>
>>>>>>> Thanks for your help.
>>>>>>>
>>>>>>> - Vincent
>>>>>>>
>>>>>>> On 10/08/2010 11:31 AM, Vincent wrote:
>>>>>>>> I'm using pig-0.7.0 on hadoop-0.20.2.
>>>>>>>>
>>>>>>>> As for the script, well, it's more than 500 lines; I'm not
>>>>>>>> sure anybody would read it to the end if I posted it here :-)
>>>>>>>>
>>>>>>>> On 10/08/2010 11:26 AM, Dmitriy Ryaboy wrote:
>>>>>>>>> What version of Pig, and what does your script look like?
>>>>>>>>>
>>>>>>>>> On Thu, Oct 7, 2010 at 11:48 PM, Vincent
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> I'm quite new to Pig/Hadoop, so maybe my cluster size will
>>>>>>>>>> make you laugh.
>>>>>>>>>>
>>>>>>>>>> I wrote a Pig script that handles 1.5GB of logs in less
>>>>>>>>>> than one hour in Pig local mode on an Intel Core 2 Duo with
>>>>>>>>>> 3GB of RAM.
>>>>>>>>>>
>>>>>>>>>> Then I tried this script on a simple 2-node cluster. These
>>>>>>>>>> 2 nodes are not servers but ordinary desktop machines:
>>>>>>>>>> - an Intel Core 2 Duo with 3GB of RAM
>>>>>>>>>> - an Intel Quad with 4GB of RAM
>>>>>>>>>>
>>>>>>>>>> Well, I was aware that Hadoop has overhead and that the job
>>>>>>>>>> wouldn't finish in half an hour (the local-mode time
>>>>>>>>>> divided by the number of nodes). But I was surprised to see
>>>>>>>>>> this morning that it took 7 hours to complete!!!
>>>>>>>>>>
>>>>>>>>>> My configuration was made according to this link:
>>>>>>>>>> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
>>>>>>>>>>
>>>>>>>>>> My question is simple: is this normal?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Vincent

--
Best Regards

Jeff Zhang
