Yep, I did restart the cluster (dfs and mapred stop/start).
Increasing the amount of memory, I can see that the reduce task goes
further (the completion percentage gets higher), but then the percentage
starts to drop again as attempts fail with memory errors.
On 10/08/2010 06:41 PM, Jeff Zhang wrote:
Did you restart cluster after reconfiguration ?
On Fri, Oct 8, 2010 at 9:59 PM, Vincent<[email protected]> wrote:
I've tried the following mapred.child.java.opts values:
-Xmx512m --> still memory errors in reduce phase
-Xmx1024m --> still memory errors in reduce phase
I am now trying with -Xmx1536m, but I'm afraid that my nodes will start
to swap...
Should I continue in this direction? Or is it already too much, and
should I look for the problem elsewhere?
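(Back-of-the-envelope, assuming the stock slot counts --
mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum both default to 2, so each node
can run 4 task JVMs at once:

    4 slots x 1536 MB heap = ~6 GB

which is more than the 3-4 GB of RAM per node, so swapping at -Xmx1536m
does seem likely.)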
Thanks
-Vincent
On 10/08/2010 03:04 PM, Jeff Zhang wrote:
Try increasing the heap size of the tasks by setting
mapred.child.java.opts in mapred-site.xml. The default value is
-Xmx200m (from mapred-default.xml), which may be too small for you.
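For example, something like this in mapred-site.xml (512m is just a
starting point, not a recommendation -- tune it to your nodes):

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx512m</value>
    </property>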
On Fri, Oct 8, 2010 at 6:55 PM, Vincent<[email protected]>
wrote:
Thanks to Dmitriy and Jeff, I've set:
set default_parallel 20;
at the beginning of my script, and updated 8 JOINs to the form:
JOIN big BY id, small BY id USING 'replicated';
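For reference, this is the shape of the change (the relation names and
paths below are placeholders, not my real ones):

    set default_parallel 20;

    big   = LOAD 'big_logs'  AS (id:int, value:chararray);
    small = LOAD 'small_lut' AS (id:int, label:chararray);

    -- 'replicated' loads the small relation into memory on every mapper,
    -- so the join happens map-side with no reduce phase
    joined = JOIN big BY id, small BY id USING 'replicated';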
Unfortunately this didn't improve the script speed (it has been running
for more than an hour now).
But looking in the JobTracker at one of the jobs that is in its reduce
phase, I can see this for the map:
Hadoop map task list for job_201010081314_0010
<http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081314_0010>
on prog7 <http://prog7.lan:50030/jobtracker.jsp>
------------------------------------------------------------------------
All Tasks

Task:        task_201010081314_0010_m_000000
             <http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_m_000000>
Complete:    100.00%
Start Time:  8-Oct-2010 14:07:44
Finish Time: 8-Oct-2010 14:23:11 (15mins, 27sec)
Errors:      Too many fetch-failures
             Too many fetch-failures
Counters:    8
             <http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_m_000000>
And I can see this for the reduce:
Hadoop reduce task list for job_201010081314_0010
<http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081314_0010>
on prog7 <http://prog7.lan:50030/jobtracker.jsp>
------------------------------------------------------------------------
All Tasks

Task:        task_201010081314_0010_r_000000
             <http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000000>
Complete:    9.72%
Status:      reduce > copy (7 of 24 at 0.01 MB/s) >
Start Time:  8-Oct-2010 14:14:49
Errors:      Error: GC overhead limit exceeded
Counters:    7
             <http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000000>

Task:        task_201010081314_0010_r_000001
             <http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000001>
Complete:    0.00%
Start Time:  8-Oct-2010 14:14:52
Errors:      Error: Java heap space
Counters:    0
             <http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000001>

Task:        task_201010081314_0010_r_000002
             <http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000002>
Complete:    0.00%
Start Time:  8-Oct-2010 14:15:58
Errors:      java.io.IOException: Task process exit with nonzero status of 1.
             at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
Counters:    0
             <http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000002>

Task:        task_201010081314_0010_r_000003
             <http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000003>
Complete:    9.72%
Status:      reduce > copy (7 of 24 at 0.01 MB/s) >
Start Time:  8-Oct-2010 14:16:58
Counters:    7
             <http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000003>

Task:        task_201010081314_0010_r_000004
             <http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000004>
Complete:    0.00%
Start Time:  8-Oct-2010 14:18:11
Errors:      Error: GC overhead limit exceeded
Counters:    0
             <http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000004>

Task:        task_201010081314_0010_r_000005
             <http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000005>
Complete:    0.00%
Start Time:  8-Oct-2010 14:18:56
Errors:      Error: GC overhead limit exceeded
Seems like it runs out of memory... Which parameter should be increased?
-Vincent
On 10/08/2010 01:12 PM, Jeff Zhang wrote:
BTW, you can look at the JobTracker web UI to see which part of the
job costs the most time.
On Fri, Oct 8, 2010 at 5:11 PM, Jeff Zhang<[email protected]> wrote:
No, I mean: is your MapReduce job's reduce task number 1?
And could you share your Pig script, so that others can really
understand your problem?
On Fri, Oct 8, 2010 at 5:04 PM, Vincent<[email protected]>
wrote:
You are right, I didn't change this parameter, so the default from
src/mapred/mapred-default.xml is used:
<property>
  <name>mapred.reduce.tasks</name>
  <value>1</value>
  <description>The default number of reduce tasks per job. Typically set
  to 99% of the cluster's reduce capacity, so that if a node fails the
  reduces can still be executed in a single wave.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>
It's not clear to me what the reduce capacity of my cluster is :)
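(From what I can tell, the reduce capacity should be the sum of
mapred.tasktracker.reduce.tasks.maximum over all TaskTrackers -- with
the default below, my 2-node cluster would have 4 reduce slots:

    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>

but please correct me if I'm wrong.)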
On 10/08/2010 01:00 PM, Jeff Zhang wrote:
I guess maybe your reduce number is 1, which would make the reduce
phase very slow.
On Fri, Oct 8, 2010 at 4:44 PM, Vincent<[email protected]>
wrote:
Well, I can see from the JobTracker that all the jobs are done quite
quickly except 2, for which the reduce phase goes really, really slowly.
But how can I match up a job in the Hadoop JobTracker (example:
job_201010072150_0045) with the Pig script execution?
And which is more efficient: several small Pig scripts, or one big Pig
script? I wrote one big script to avoid loading the same logs several
times in different scripts. Maybe that is not such a good design...
Thanks for your help.
- Vincent
On 10/08/2010 11:31 AM, Vincent wrote:
I'm using pig-0.7.0 on hadoop-0.20.2.
For the script, well, it's more than 500 lines; I'm not sure anybody
would read it to the end if I posted it here :-)
On 10/08/2010 11:26 AM, Dmitriy Ryaboy wrote:
What version of Pig, and what does your script look like?
On Thu, Oct 7, 2010 at 11:48 PM,
Vincent<[email protected]>
wrote:
Hi All,
I'm quite new to Pig/Hadoop, so maybe my cluster size will make you
laugh.
I wrote a Pig script that handles 1.5GB of logs in less than one hour
in Pig local mode on an Intel Core 2 Duo with 3GB of RAM.
Then I tried this script on a simple 2-node cluster. These 2 nodes are
not servers but ordinary computers:
- an Intel Core 2 Duo with 3GB of RAM
- an Intel Quad with 4GB of RAM
Well, I was aware that Hadoop has overhead and that the job wouldn't
finish in half an hour (the local-mode time divided by the number of
nodes). But I was surprised to see this morning that it took 7 hours
to complete!!!
My configuration was made according to this link:
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
My question is simple: Is it normal?
Cheers
Vincent
--
Best Regards
Jeff Zhang