Hi there-

See the HBase book about pre-creating regions.
http://hbase.apache.org/book.html#precreate.regions

On 6/30/11 12:51 AM, "Florin P" <[email protected]> wrote:

>Hello!
> Thank you for your responses. For our needs we have implemented our
>custom TableInputFormat and overridden the method getSplits().
>By the way, how can you create a number of regions (or, in your words,
>"pre-splits") via the Java API? I have read about them but I would like
>to see an example of such usage.
> Thank you.
> Florin
>
>
>--- On Mon, 6/27/11, Michel Segel <[email protected]> wrote:
>
>> From: Michel Segel <[email protected]>
>> Subject: Re: Obtain many mappers (or regions)
>> To: "[email protected]" <[email protected]>
>> Cc: "[email protected]" <[email protected]>
>> Date: Monday, June 27, 2011, 12:22 PM
>> Just a simple suggestion that will
>> make your life a bit easier...
>>
>> If your data is relatively small, small enough that you can
>> easily fit the result set into memory...
>> You may want to do the following...
>> Oozie calls your map/reduce job.
>> At the start of your m/r job, you connect from the client
>> to hbase and read the result set into a list object (or
>> something similar). You then write a custom input format
>> class that uses a list object as its input. You can then
>> split the input as you need it.
>>
>> Much easier than trying to pre-split temporary tables and a
>> lot less work and overhead.
>>
>> This is something that could be part of an indexing
>> solution. ;-P
>> (meaning that the classes are reusable for other
>> solutions...)
>>
>> HTH -Mike
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Jun 27, 2011, at 7:46 AM, Florin P <[email protected]> wrote:
>>
>> > Hi!
>> > Thank you for your response. As I said, it is a
>> > temporary table. This table acts as metadata for long-running
>> > processing tasks that we would like to trigger from the
>> > cluster (as map/reduce jobs) so that all machines
>> > take some of those tasks.
>> > I have read the indicated chapter, and then I
>> > followed this scenario:
>> > 1. We loaded the small data set into the hbase table
>> > 2. From the hbase admin interface we triggered the split action
>> > 3. We saw that 32 new regions were created for that table
>> > 4. We ran a map/reduce job that counts the number of rows
>> > 5. Only two mappers were created
>> > What puzzles me is that only 2 mapper tasks were
>> > created, even though the indicated book states that
>> > (cite)"
>> > When TableInputFormat, is used to source an HBase
>> > table in a MapReduce job, its splitter will make a map task
>> > for each region of the table. Thus, if there are 100 regions
>> > in the table, there will be 100 map-tasks for the job -
>> > regardless of how many column families are selected in the
>> > Scan.
>> > "
>> >
>> > Can you please explain why this happens? Did we miss
>> > some property configuration?
>> >
>> > Thank you.
>> > regards,
>> > Florin
>> > --- On Mon, 6/27/11, Doug Meil <[email protected]> wrote:
>> >
>> >> From: Doug Meil <[email protected]>
>> >> Subject: RE: Obtain many mappers (or regions)
>> >> To: "[email protected]" <[email protected]>
>> >> Date: Monday, June 27, 2011, 8:01 AM
>> >> Hi there-
>> >>
>> >> If you only have 100 rows I think that HBase might be
>> >> overkill.
>> >>
>> >> You probably want to start with this to get a background on
>> >> what HBase can do...
>> >> http://hbase.apache.org/book.html
>> >> .. there is a section on MapReduce with HBase as well.
>> >>
>> >> -----Original Message-----
>> >> From: Florin P [mailto:[email protected]]
>> >>
>> >> Sent: Monday, June 27, 2011 4:53 AM
>> >> To: [email protected]
>> >> Subject: Obtain many mappers (or regions)
>> >>
>> >> Hello!
>> >> I have the following scenario:
>> >> 1. A temporary HBase table with a small number of rows (approx. 100)
>> >> 2. A cluster with 2 machines that I would like to use to
>> >> crunch the data contained in the rows
>> >>
>> >> I would like to create two mappers that will crunch the
>> >> data from the rows.
>> >> How can I achieve this?
>> >> A general question is:
>> >> how can we obtain many mappers to crunch a small
>> >> quantity of data?
>> >>
>> >> Thank you.
>> >> Regards,
>> >> Florin
>> >>
>> >
>>
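On Florin's question about creating pre-split regions through the Java API, here is a minimal sketch of the split-key calculation. It assumes the row keys are fixed-width hexadecimal strings, which the thread does not state, and the names HexSplitKeys and hexSplits are made up for illustration. Only the JDK is used; with the HBase client on the classpath, the resulting byte[][] would be passed to HBaseAdmin.createTable(HTableDescriptor, byte[][] splitKeys).

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: computes evenly spaced split keys for a key space
// made of fixed-width hexadecimal strings (an assumption, not from the thread).
class HexSplitKeys {

    /**
     * Returns numRegions - 1 split keys between startKey and endKey
     * (both hex strings), zero-padded to the length of endKey.
     */
    static byte[][] hexSplits(String startKey, String endKey, int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("numRegions must be at least 2");
        }
        BigInteger lowest = new BigInteger(startKey, 16);
        BigInteger highest = new BigInteger(endKey, 16);
        // Width of one region in key space (integer division).
        BigInteger increment =
                highest.subtract(lowest).divide(BigInteger.valueOf(numRegions));
        String format = "%0" + endKey.length() + "x";

        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 0; i < splits.length; i++) {
            // The (i + 1)-th region boundary above the lowest key.
            BigInteger key = lowest.add(increment.multiply(BigInteger.valueOf(i + 1)));
            splits[i] = String.format(format, key).getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] split : hexSplits("0000", "1000", 4)) {
            System.out.println(new String(split, StandardCharsets.US_ASCII));
        }
        // With the HBase client available, the next step would be roughly:
        //   admin.createTable(tableDescriptor, splits);
        // where admin is an HBaseAdmin and tableDescriptor an HTableDescriptor.
    }
}
```

Three split keys give a table that starts out with four regions. Per the book passage quoted in the thread, TableInputFormat should then make one map task per region.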

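Mike's suggestion in the quoted thread (read the small result set into a list, then split the list yourself) comes down to partitioning a list into as many chunks as you want mappers. A rough sketch of just that partitioning step follows. ListSplitter and partition are hypothetical names, and the Hadoop wiring is left out: in a real job, the getSplits() method of the custom input format would wrap each chunk in its own InputSplit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: divides an in-memory list into a fixed number of
// near-equal chunks, one chunk per desired map task.
class ListSplitter {

    static <T> List<List<T>> partition(List<T> items, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be at least 1");
        }
        List<List<T>> splits = new ArrayList<>();
        int base = items.size() / numSplits;   // minimum chunk size
        int extra = items.size() % numSplits;  // first 'extra' chunks get one more item
        int from = 0;
        for (int i = 0; i < numSplits; i++) {
            int to = from + base + (i < extra ? 1 : 0);
            splits.add(new ArrayList<>(items.subList(from, to)));
            from = to;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Stand-in for the ~100 rows read from the temporary table.
        List<String> rowKeys = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            rowKeys.add("row-" + i);
        }
        // Two chunks, matching the two machines in the original question.
        for (List<String> chunk : partition(rowKeys, 2)) {
            System.out.println(chunk.size() + " rows, starting at " + chunk.get(0));
        }
    }
}
```

The number of chunks is chosen by the job rather than by how many regions the table happens to have, which is the point of Mike's approach for a small temporary table.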