RE: Obtain many mappers (or regions)

Florin P Mon, 27 Jun 2011 07:44:58 -0700

Hello!
  I've found the problem with the number of mappers. We run the M/R jobs
through Oozie, which apparently ignores our setting of the mapred.map.tasks
property. That property is used as a hint for computing the number of splits.
Quoting the Javadoc of TableInputFormatBase#getSplits (old API):
"Splits are created in number equal to the smallest between numSplits and
    the number of {@link HRegion}s in the table. If the number of splits is
    smaller than the number of {@link HRegion}s then splits are spanned across
    multiple {@link HRegion}s and are grouped the most evenly possible. In the
    case splits are uneven the bigger splits are placed first in the
    {@link InputSplit} array."


By default, mapred.map.tasks is set to 2. Applying the above algorithm to my
scenario (together with the Oozie observation) gives
min(mapred.map.tasks=2, number_of_my_regions=32) = 2, which is the "magic"
number of mappers we saw.
  We confirmed this behavior by implementing a Driver for the MR job and
setting mapred.map.tasks to, say, 40. The number of mappers is then
correctly calculated as 32.
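
To make the arithmetic concrete, here is a minimal stand-alone sketch that models only what the quoted Javadoc describes: the split count is the smaller of the hint and the region count, and regions are grouped as evenly as possible with the bigger splits first. The class and method names (SplitHintSketch, splitCount, regionsPerSplit) are hypothetical and made up for illustration; this is not the HBase source and it does not call any HBase or Hadoop API.

```java
import java.util.Arrays;

// Hypothetical stand-alone model of the behavior described in the Javadoc.
// It is NOT HBase code and uses no HBase/Hadoop classes.
final class SplitHintSketch {

    // Number of splits (one map task each): the smaller of the hint
    // (mapred.map.tasks) and the number of regions in the table.
    static int splitCount(int numSplitsHint, int regionCount) {
        if (numSplitsHint < 1 || regionCount < 1) {
            throw new IllegalArgumentException("hint and region count must be >= 1");
        }
        return Math.min(numSplitsHint, regionCount);
    }

    // How many regions each split spans: spread as evenly as possible,
    // with the bigger splits placed first in the array.
    static int[] regionsPerSplit(int numSplitsHint, int regionCount) {
        int splits = splitCount(numSplitsHint, regionCount);
        int base = regionCount / splits;       // regions every split gets
        int remainder = regionCount % splits;  // first 'remainder' splits get one more
        int[] sizes = new int[splits];
        for (int i = 0; i < splits; i++) {
            sizes[i] = base + (i < remainder ? 1 : 0);
        }
        return sizes;
    }

    public static void main(String[] args) {
        int regions = 32;
        // Default hint of 2: two splits of 16 regions each.
        System.out.println(Arrays.toString(regionsPerSplit(2, regions)));
        // Hint of 40 set in the driver: capped at 32 splits, one region each.
        System.out.println(Arrays.toString(regionsPerSplit(40, regions)));
    }
}
```

With 32 regions, a hint of 2 yields 2 splits and a hint of 40 yields 32, which matches what we observed on the cluster.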
  Regards,
   Florin



--- On Mon, 6/27/11, Florin P <[email protected]> wrote:

> From: Florin P <[email protected]>
> Subject: RE: Obtain many mappers (or regions)
> To: [email protected]
> Date: Monday, June 27, 2011, 8:46 AM
> Hi!
>   Thank you for your response. As I said, it is a
> temporary table. This table acts as metadata for long
> processing tasks that we would like to trigger from the
> cluster (as map/reduce jobs) so that every machine
> takes some of those tasks.
>   I have read the indicated chapter, and then I
> followed this scenario:
>    1. We loaded the small data set into the HBase table
>    2. From the HBase admin interface we triggered the split action
>    3. We saw that 32 new regions were created for that table
>    4. We ran a map/reduce job that counts the number of rows
>    5. Only two mappers were created
> What puzzles me is that only 2 mapper tasks were
> created, even though the indicated book states
>  (quote) "
> When TableInputFormat, is used to source an HBase table in
> a MapReduce job, its splitter will make a map task for each
> region of the table. Thus, if there are 100 regions in the
> table, there will be 100 map-tasks for the job - regardless
> of how many column families are selected in the Scan.
> "  
> 
> Can you please explain why this happens? Did we miss some
> property configuration?
> 
> Thank you.
>  regards,
>   Florin
> --- On Mon, 6/27/11, Doug Meil <[email protected]>
> wrote:
> 
> > From: Doug Meil <[email protected]>
> > Subject: RE: Obtain many mappers (or regions)
> > To: "[email protected]" <[email protected]>
> > Date: Monday, June 27, 2011, 8:01 AM
> > Hi there-
> > 
> > If you only have 100 rows I think that HBase might be
> > overkill.
> > 
> > You probably want to start with this to get background on
> > what HBase can do...
> > http://hbase.apache.org/book.html
> > .. there is a section on MapReduce with HBase as well.
> > 
> > -----Original Message-----
> > From: Florin P [mailto:[email protected]]
> > 
> > Sent: Monday, June 27, 2011 4:53 AM
> > To: [email protected]
> > Subject: Obtain many mappers (or regions)
> > 
> > Hello!
> > I have the following scenario:
> > 1. A temporary HBase table with a small number of rows (approx. 100)
> > 2. A cluster with 2 machines that I would like to use to
> > crunch the data contained in the rows
> > 
> > I would like to create two mappers that will crunch the
> > data from the rows.
> >  How can I achieve this?
> > A general question is:
> >   how can we obtain many mappers to crunch a small
> > quantity of data?
> > 
> > Thank you.
> >   Regards,
> >   Florin  
> >
>
