Hi Ioakim:

Here is a list of links I would suggest you read (I know it is a lot to read):

HBase related:
- http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.html
- http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
- Make sure to read the examples: http://hbase.apache.org/book/mapreduce.example.html
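The TableMapReduceUtil setup discussed further down in this thread typically follows the pattern in the HBase book examples linked above. Here is a minimal sketch assuming the HBase 0.9x-era API (current as of this thread); the table name, mapper, and caching value are placeholders, and it needs the HBase and Hadoop jars on the classpath:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class MyTableScanJob {

  // Placeholder mapper: each map task receives the rows of exactly one
  // region, so mappers over different regions run independently.
  static class MyMapper extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      // process one row of the region assigned to this mapper
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "my-table-scan");
    job.setJarByClass(MyTableScanJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // rows fetched per RPC to the region server
    scan.setCacheBlocks(false);  // don't pollute the block cache with a full scan

    TableMapReduceUtil.initTableMapperJob(
        "mytable",                     // input table (placeholder)
        scan,
        MyMapper.class,
        ImmutableBytesWritable.class,  // mapper output key
        Result.class,                  // mapper output value
        job);
    job.setOutputFormatClass(NullOutputFormat.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

One split (and hence one map task) is created per region of the input table, which is the point Mike makes below: mappers do not contend for a table lock.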
Hadoop related:
- http://wiki.apache.org/hadoop/JobTracker
- http://wiki.apache.org/hadoop/TaskTracker
- http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html
- Some configuration options: http://hadoop.apache.org/common/docs/r1.0.3/cluster_setup.html

HTH,

Jerry

On Tue, Sep 4, 2012 at 12:41 PM, Michael Segel <[email protected]> wrote:

> I think the issue is that you are misinterpreting what you are seeing and
> what Doug was trying to tell you...
>
> The short, simple answer is that you're getting one split per region. Each
> split is assigned to a specific mapper task, and that task sequentially
> walks through the table finding the rows that match your scan request.
>
> There is no lock or blocking.
>
> I think you really should read Lars George's book on HBase to get a
> better understanding.
>
> HTH
>
> -Mike
>
> On Sep 4, 2012, at 11:29 AM, Ioakim Perros <[email protected]> wrote:
>
> > Thank you very much for your response and for the excellent reference.
> >
> > The thing is that I am running jobs in a distributed environment, and
> > beyond the TableMapReduceUtil settings, I have just set the scan's
> > caching to the number of rows I expect to retrieve at each map task,
> > and the scan's "cache blocks" feature to false (just as indicated in
> > the MapReduce examples on HBase's homepage).
> >
> > I am not aware of such a job configuration (requesting the jobtracker
> > to execute more than 1 map task concurrently). Any other ideas?
> >
> > Thank you again and regards,
> > Ioakim
> >
> > On 09/04/2012 06:59 PM, Jerry Lam wrote:
> >> Hi Ioakim:
> >>
> >> Sorry, your hypothesis doesn't make sense. I would suggest you read
> >> "Learning HBase Internals" by Lars Hofhansl at
> >> http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final
> >> to understand how HBase locking works.
> >>
> >> Regarding the issue you are facing, are you sure you configured the
> >> job properly (i.e. requested the jobtracker to run more than 1 mapper
> >> at a time)? If you are testing on a single machine, you probably also
> >> need to configure the number of map task slots per tasktracker to see
> >> more than 1 mapper executing on a single machine.
> >>
> >> my $0.02
> >>
> >> Jerry
> >>
> >> On Tue, Sep 4, 2012 at 11:17 AM, Ioakim Perros <[email protected]> wrote:
> >>
> >>> Hello,
> >>>
> >>> I would be grateful if someone could shed some light on the following:
> >>>
> >>> Each M/R map task is reading data from a separate region of a table.
> >>> From the jobtracker's GUI, at the map completion graph, I notice that
> >>> although the data read by the mappers are different, they read data
> >>> sequentially, as if the table had a lock that permits only one mapper
> >>> to read from a region at a time.
> >>>
> >>> Does this "lock" hypothesis make sense? Is there any way I could
> >>> avoid this needless delay?
> >>>
> >>> Thanks in advance and regards,
> >>> Ioakim
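On the tasktracker-slots point Jerry raises above: in Hadoop 1.x, the number of map tasks a single node runs concurrently is capped by the map slots configured per tasktracker, set in mapred-site.xml (see the cluster_setup.html link). A sketch of the relevant fragment; the value 4 is just an illustrative choice, tune it to your cores and memory:

```
<!-- mapred-site.xml (Hadoop 1.x), per-node tasktracker settings -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value> <!-- map slots per tasktracker; the default is 2 -->
  </property>
</configuration>
```

With only 1 map slot on a single-node test setup, region-parallel mappers will necessarily run one after another, which can look exactly like the sequential "lock" behavior described in the original question.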
