Hello! Thank you for your responses. For our needs we have implemented our own custom TableInputFormat, overriding the getSplits() method. By the way, how can you create a number of regions (or, in your words, "pre-splits") via the Java API? I have read about them, but I would like to see an example of such usage. Thank you. Florin
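[Editor's note: for reference, creating a pre-split table through the Java API looks roughly like the sketch below, against the 0.90-era HBase client current at the time of this thread. The table name, column family, and row-key values are made up for illustration; the key idea is that N split keys passed to createTable() produce N+1 regions up front.]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Hypothetical table and family names.
        HTableDescriptor desc = new HTableDescriptor("temp_tasks");
        desc.addFamily(new HColumnDescriptor("cf"));

        // Split keys mark the region boundaries. Three split keys
        // give four regions: (-inf, row-25), [row-25, row-50),
        // [row-50, row-75), [row-75, +inf).
        byte[][] splitKeys = new byte[][] {
            Bytes.toBytes("row-25"),
            Bytes.toBytes("row-50"),
            Bytes.toBytes("row-75"),
        };
        admin.createTable(desc, splitKeys);
    }
}
```

The split keys only help if your actual row keys are distributed across the chosen boundaries; keys that all sort into one range leave the other regions empty.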
--- On Mon, 6/27/11, Michel Segel <[email protected]> wrote:

> From: Michel Segel <[email protected]>
> Subject: Re: Obtain many mappers (or regions)
> To: "[email protected]" <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Date: Monday, June 27, 2011, 12:22 PM
>
> Just a simple suggestion that will make your life a bit easier...
>
> If your data is relatively small, small enough that you can easily fit
> the result set into memory, you may want to do the following:
> Oozie calls your map/reduce job. At the start of your m/r job, you
> connect from the client to HBase and read the result set into a list
> object (or something similar). You then write a custom input format
> class that uses a list object as its input. You can then split the
> input as you need it.
>
> Much easier than trying to pre-split temporary tables, and a lot less
> work and overhead.
>
> This is something that could be part of an indexing solution. ;-P
> (meaning that the classes are reusable for other solutions...)
>
> HTH
> -Mike
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jun 27, 2011, at 7:46 AM, Florin P <[email protected]> wrote:
>
> > Hi!
> > Thank you for your response. As I said, it is a temporary table. This
> > table acts as metadata for long-running tasks that we would like to
> > trigger from the cluster (as map/reduce jobs), so that all machines
> > take on some of those tasks.
> > I have read the indicated chapter, and then followed this scenario:
> > 1. We loaded the small data set into the HBase table.
> > 2. From the HBase admin interface we triggered the split action.
> > 3. We saw that 32 new regions were created for that table.
> > 4. We ran a map/reduce job that counts the number of rows.
> > 5. Only two mappers were created.
> >
> > What puzzles me is that only 2 mapper tasks were created, even though
> > the indicated book states:
> > (cite) "When TableInputFormat is used to source an HBase table in a
> > MapReduce job, its splitter will make a map task for each region of
> > the table. Thus, if there are 100 regions in the table, there will be
> > 100 map-tasks for the job - regardless of how many column families
> > are selected in the Scan."
> >
> > Can you please explain why this happens? Did we miss some property
> > configuration?
> >
> > Thank you.
> > Regards,
> > Florin
> >
> > --- On Mon, 6/27/11, Doug Meil <[email protected]> wrote:
> >
> >> From: Doug Meil <[email protected]>
> >> Subject: RE: Obtain many mappers (or regions)
> >> To: "[email protected]" <[email protected]>
> >> Date: Monday, June 27, 2011, 8:01 AM
> >>
> >> Hi there-
> >>
> >> If you only have 100 rows, I think that HBase might be overkill.
> >>
> >> You probably want to start with this to get a background on what
> >> HBase can do:
> >> http://hbase.apache.org/book.html
> >> ... there is a section on MapReduce with HBase as well.
> >>
> >> -----Original Message-----
> >> From: Florin P [mailto:[email protected]]
> >> Sent: Monday, June 27, 2011 4:53 AM
> >> To: [email protected]
> >> Subject: Obtain many mappers (or regions)
> >>
> >> Hello!
> >> I have the following scenario:
> >> 1. A temporary HBase table with a small number of rows (approx. 100)
> >> 2. A cluster with 2 machines with which I would like to crunch the
> >> data contained in the rows
> >>
> >> I would like to create two mappers that will crunch the data from
> >> the rows.
> >> How can I achieve this?
> >> A more general question: how can we obtain many mappers to crunch a
> >> small quantity of data?
> >>
> >> Thank you.
> >> Regards,
> >> Florin
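[Editor's note: the core of Mike's suggestion, a custom input format backed by an in-memory list, comes down to partitioning that list into one chunk per desired mapper. Below is just that partitioning step in plain Java; the class and method names are illustrative, not part of any HBase or Hadoop API. In the full solution, each sublist would back one InputSplit returned by the custom input format's getSplits().]

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: divide an in-memory result set into roughly
// equal chunks, one chunk per mapper.
public class ListSplitter {
    /** Partition rows into at most nSplits roughly equal chunks. */
    public static <T> List<List<T>> partition(List<T> rows, int nSplits) {
        List<List<T>> splits = new ArrayList<List<T>>();
        int n = rows.size();
        // Chunk size, rounded up so no more than nSplits chunks result.
        int per = (int) Math.ceil((double) n / nSplits);
        for (int start = 0; start < n; start += per) {
            int end = Math.min(start + per, n);
            splits.add(new ArrayList<T>(rows.subList(start, end)));
        }
        return splits;
    }
}
```

For the 100-row, 2-machine scenario in the thread, partitioning 100 rows into 2 chunks yields two splits of 50 rows each, so the framework schedules exactly two map tasks regardless of how many regions the table has.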
