Re: Obtain many mappers (or regions)

Doug Meil Thu, 30 Jun 2011 05:24:22 -0700

Hi there-

See the HBase book section on pre-creating regions.


http://hbase.apache.org/book.html#precreate.regions
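
Since an example was asked for, here is a minimal sketch of the part you write yourself: computing the split keys. It assumes row keys are zero-padded decimal strings (a hypothetical key layout, adjust to your own), and the class and method names are made up for illustration. The resulting byte[][] is what you would then hand to HBaseAdmin.createTable(HTableDescriptor, byte[][] splitKeys); check the Javadoc of your HBase version for the exact signature.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of HBase: computes evenly spaced split keys.
final class PreSplitKeys {

    // Returns numRegions - 1 boundaries over the key range [0, keySpace),
    // each encoded as a zero-padded decimal string of the given width.
    // N regions need N - 1 split keys.
    static byte[][] evenSplits(int numRegions, int keySpace, int width) {
        if (numRegions < 2 || keySpace < numRegions) {
            throw new IllegalArgumentException("need numRegions >= 2 and keySpace >= numRegions");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            long boundary = (long) keySpace * i / numRegions;
            String key = String.format(Locale.ROOT, "%0" + width + "d", boundary);
            splits[i - 1] = key.getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four regions over keys 000..099.
        for (byte[] key : evenSplits(4, 100, 3)) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
        // With an HBase client on the classpath, the keys would be passed on
        // roughly like this (assumption, verify against your version):
        //   admin.createTable(tableDescriptor, evenSplits(4, 100, 3));
    }
}
```

The split keys must be sorted and distinct, which the loop above guarantees as long as keySpace is at least numRegions.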

On 6/30/11 12:51 AM, "Florin P" <[email protected]> wrote:

>Hello!
>  Thank you for your responses. For our needs we have implemented our
>own custom TableInputFormat and overridden its getSplits() method.
>By the way, how can you create a number of regions (or, in your words,
>"pre-splits") via the Java API? I have read about them, but I would like
>to see an example of such usage.
>  Thank you.
> Florin
>
>
>--- On Mon, 6/27/11, Michel Segel <[email protected]> wrote:
>
>> From: Michel Segel <[email protected]>
>> Subject: Re: Obtain many mappers (or regions)
>> To: "[email protected]" <[email protected]>
>> Cc: "[email protected]" <[email protected]>
>> Date: Monday, June 27, 2011, 12:22 PM
>> Just a simple suggestion that will
>> make your life a bit easier...
>> 
>> If your data is relatively small, small enough that you can
>> easily fit the result set into memory...
>> You may want to do the following...
>> Oozie calls your map/reduce job.
>> At the start of your m/r job, you connect from the client
>> to HBase and read the result set into a list object (or
>> something similar). You then write a custom input format
>> class that uses the list object as its input. You can then
>> split the input as you need it.
>> 
>> Much easier than trying to pre-split temporary tables, and a
>> lot less work and overhead.
>> 
>> This is something that could be part of an indexing
>> solution. ;-P
>> (meaning that the classes are reusable for other
>> solutions...)
>> 
>> HTH -Mike
>> 
>> Sent from a remote device. Please excuse any typos...
>> 
>> Mike Segel
>> 
>> On Jun 27, 2011, at 7:46 AM, Florin P <[email protected]>
>> wrote:
>> 
>> > Hi!
>> >  Thank you for your response. As I said, it is a temporary table.
>> > This table acts as metadata for long-running task processing that we
>> > would like to trigger from the cluster (as map/reduce jobs) so that
>> > all machines take some of those tasks.
>> >  I have read the indicated chapter, and then followed this scenario:
>> >   1. We loaded the small data set into the HBase table
>> >   2. From the HBase admin interface we triggered the split action
>> >   3. We saw that 32 new regions were created for that table
>> >   4. We ran a map/reduce job that counts the number of rows
>> >   5. Only two mappers were created
>> > What puzzles me is that only 2 mapper tasks were created, even though
>> > the indicated book states (quote):
>> > "When TableInputFormat is used to source an HBase table in a
>> > MapReduce job, its splitter will make a map task for each region of
>> > the table. Thus, if there are 100 regions in the table, there will be
>> > 100 map-tasks for the job - regardless of how many column families
>> > are selected in the Scan."
>> > 
>> > Can you please explain why this is happening? Did we miss some
>> > configuration property?
>> > 
>> > Thank you.
>> > regards,
>> >  Florin
>> > --- On Mon, 6/27/11, Doug Meil <[email protected]> wrote:
>> > 
>> >> From: Doug Meil <[email protected]>
>> >> Subject: RE: Obtain many mappers (or regions)
>> >> To: "[email protected]" <[email protected]>
>> >> Date: Monday, June 27, 2011, 8:01 AM
>> >> Hi there-
>> >> 
>> >> If you only have 100 rows I think that HBase might be overkill.
>> >> 
>> >> You probably want to start with this to get a background on
>> >> what HBase can do...
>> >> http://hbase.apache.org/book.html
>> >> .. there is a section on MapReduce with HBase as well.
>> >> 
>> >> -----Original Message-----
>> >> From: Florin P [mailto:[email protected]]
>> >> Sent: Monday, June 27, 2011 4:53 AM
>> >> To: [email protected]
>> >> Subject: Obtain many mappers (or regions)
>> >> 
>> >> Hello!
>> >> I have the following scenario:
>> >> 1. A temporary HBase table with a small number of rows (approx. 100)
>> >> 2. A cluster with 2 machines that I would like to use to crunch
>> >> the data contained in the rows
>> >> 
>> >> I would like to create two mappers that will crunch the
>> >> data from the rows.
>> >> How can I achieve this?
>> >> A general question is:
>> >>   how can we obtain many mappers to crunch a small quantity
>> >> of data?
>> >> 
>> >> Thank you.
>> >>   Regards,
>> >>   Florin  
>> >> 
>> > 
>>
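
For the in-memory approach suggested earlier in the thread (read the small result set into a list, then split it yourself), the core step is just dividing the list into contiguous chunks. The sketch below is only an illustration with made-up names; in a real job each chunk would back one InputSplit returned from a custom InputFormat's getSplits(), which is not shown because it needs the Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: divides an in-memory list into near-equal chunks,
// one chunk per desired mapper.
final class ListPartitioner {

    static <T> List<List<T>> partition(List<T> items, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        List<List<T>> chunks = new ArrayList<>();
        int n = items.size();
        for (int i = 0; i < numSplits; i++) {
            // Integer arithmetic spreads any remainder across the chunks.
            int from = (int) ((long) n * i / numSplits);
            int to = (int) ((long) n * (i + 1) / numSplits);
            if (from < to) { // skip empty chunks when numSplits > n
                chunks.add(new ArrayList<>(items.subList(from, to)));
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        // Two chunks for a two-machine cluster.
        for (List<String> chunk : partition(rows, 2)) {
            System.out.println(chunk);
        }
    }
}
```

With about 100 rows and 2 machines this yields two chunks of 50 rows each, so each machine gets one map task.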
