re: "data from raw data file into hbase table"

One approach is bulk loading:
http://hbase.apache.org/book.html#arch.bulk.load

If he's talking about using an HBase table as the source of an MR job, then
see this:
http://hbase.apache.org/book.html#splitter


On 5/25/12 2:35 AM, "Florin P" <[email protected]> wrote:

>Hello!
>
>I've read Lars George's blog
>http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html where,
>at the end of the article, he mentioned "In the next post I will show you
>how to import data from a raw data
>file into a HBase table and how you eventually process the data in the
>HBase table. We will address questions like how many mappers and/or
>reducers are needed and how can I improve import and processing
>performance.". I searched the blog for these questions, but it seems
>that there is no related article. Do you know if he touched on these
>subjects in a different post or book? In particular, I am interested in:
>
>1. How can you set the number of mappers?
>2. Can the number of mappers be set per region server? If yes, how?
>3. How can a large number of mappers affect data locality?
>4. Is this algorithm for computing the number of mappers
>(https://issues.apache.org/jira/browse/HBASE-1172) still in effect?
>"Currently,
>the number of mappers specified when using TableInputFormat is strictly
>followed if less than total regions on the input table. If greater, the
>number of regions is used.
>This will modify the splitting algorithm to do the following:
> * Specify 0 mappers when you want # mappers = # regions
> * If you specify fewer mappers than regions, will use exactly the number
>you specify based on the current algorithm
> * If you specify more mappers than regions, will divide regions up by
>determining [start,X) [X,end). The number of mappers will always be a
>multiple of number of regions. This is so we do not have scanners
>spanning multiple regions.
>There is an additional issue in that the default number of mappers
>in JobConf is set to 1.
>That means if a user does not explicitly set
>number of map tasks, a single mapper will be used. "
>
>I look forward to your answers. Thank you.
>
>Kind regards,
>Florin
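
For reference, the three rules quoted from HBASE-1172 can be sketched as plain arithmetic. This is only an illustration of the quoted description, not HBase code: the class MapperCountSketch and the method effectiveMappers are hypothetical names, and rounding a too-large request down to the nearest multiple of the region count is an assumption (the quoted text only says the result is always a multiple of the number of regions). Whether a given HBase release still behaves this way is exactly the open question in item 4.

```java
// Hypothetical sketch of the mapper-count rules quoted from HBASE-1172.
// Not part of HBase; names and the rounding rule are assumptions.
final class MapperCountSketch {
    private MapperCountSketch() {}

    /**
     * @param requested number of mappers the user asked for (0 means "one per region")
     * @param regions   number of regions in the input table (must be positive)
     * @return the number of mappers the quoted rules would produce
     */
    public static int effectiveMappers(int requested, int regions) {
        if (regions <= 0) {
            throw new IllegalArgumentException("regions must be positive");
        }
        if (requested < 0) {
            throw new IllegalArgumentException("requested must not be negative");
        }
        if (requested == 0) {
            return regions;              // rule 1: 0 means # mappers = # regions
        }
        if (requested <= regions) {
            return requested;            // rule 2: fewer mappers than regions, use as given
        }
        // Rule 3: more mappers than regions. Each region is cut into the same
        // number of sub-ranges so that no scanner spans two regions.
        // Assumption: round the request down to a multiple of the region count.
        return (requested / regions) * regions;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMappers(0, 10));   // 10
        System.out.println(effectiveMappers(4, 10));   // 4
        System.out.println(effectiveMappers(25, 10));  // 20
    }
}
```

With 10 regions, a request for 25 mappers becomes 20 in this sketch, i.e. two sub-ranges [start,X) and [X,end) per region.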
