Thank you very much for your response and for the excellent reference.
The thing is that I am running the jobs in a distributed environment and,
beyond the TableMapReduceUtil settings,
I have only set the scan's caching to the number of rows I expect to
retrieve in each map task, and the scan's block caching to
false (just as indicated in the MapReduce examples on HBase's homepage).
I am not aware of any such job configuration (requesting the jobtracker to
execute more than one map task concurrently). Any other ideas?
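For concreteness, the setup described above corresponds roughly to the following sketch, modeled on the read-only example in the HBase documentation. It is not the code of the actual job: the table name "myTable", the class names ScanJobSketch and MyMapper, the job name, and the caching value 500 are placeholders chosen for illustration, and it needs the HBase and Hadoop jars on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ScanJobSketch {

    // Placeholder mapper: receives one row per call and emits nothing.
    public static class MyMapper extends TableMapper<Text, Text> {
        @Override
        public void map(ImmutableBytesWritable row, Result value, Context context) {
            // process the row here
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "ExampleRead");   // placeholder job name
        job.setJarByClass(ScanJobSketch.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // rows fetched per RPC (placeholder value)
        scan.setCacheBlocks(false);  // do not fill the block cache during the MR scan

        TableMapReduceUtil.initTableMapperJob(
            "myTable",       // placeholder input table
            scan,
            MyMapper.class,
            null,            // mapper output key (nothing is emitted)
            null,            // mapper output value
            job);
        job.setOutputFormatClass(NullOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Nothing in this setup asks for a particular number of concurrent map tasks; both scan settings only affect how each individual mapper reads its rows.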
Thank you again and regards,
ioakim
On 09/04/2012 06:59 PM, Jerry Lam wrote:
Hi Ioakim,
Sorry, your hypothesis doesn't make sense. I would suggest you read
"Learning HBase Internals" by Lars Hofhansl at
http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final
to understand how HBase locking works.
Regarding the issue you are facing, are you sure you configured the job
properly (i.e. requesting the jobtracker to run more than one mapper at a
time)? If you are testing on a single machine, you probably also need to
configure the number of concurrent map tasks per tasktracker to see more
than one mapper executing on that machine.
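As a rough illustration of why the slot count matters, the arithmetic can be sketched as below. This assumes the Hadoop 1.x slot model, where each tasktracker offers a fixed number of map slots (the mapred.tasktracker.map.tasks.maximum property in mapred-site.xml), and assumes the default behaviour of one map task per table region. The class and method names are made up for this example.

```java
public class MapWaves {

    // Map tasks that can run at the same time: bounded by the number of
    // splits (assumed here to be one per region) and by the total map slots.
    public static int concurrentMaps(int regions, int taskTrackers, int mapSlotsPerTracker) {
        return Math.min(regions, totalSlots(taskTrackers, mapSlotsPerTracker));
    }

    // Number of sequential "waves" needed to run one map task per region.
    public static int waves(int regions, int taskTrackers, int mapSlotsPerTracker) {
        int slots = totalSlots(taskTrackers, mapSlotsPerTracker);
        return (regions + slots - 1) / slots;   // ceiling division
    }

    private static int totalSlots(int taskTrackers, int mapSlotsPerTracker) {
        int slots = taskTrackers * mapSlotsPerTracker;
        if (slots <= 0) {
            throw new IllegalArgumentException("need at least one map slot");
        }
        return slots;
    }

    public static void main(String[] args) {
        // 8 regions, a single tasktracker with one map slot:
        System.out.println(concurrentMaps(8, 1, 1)); // 1 mapper at a time
        System.out.println(waves(8, 1, 1));          // 8 waves, i.e. fully sequential

        // 8 regions, 4 tasktrackers with 2 map slots each:
        System.out.println(concurrentMaps(8, 4, 2)); // 8 mappers at once
        System.out.println(waves(8, 4, 2));          // 1 wave
    }
}
```

With only one slot available in total, the mappers run strictly one after another, which looks like a lock in the map completion graph even though no locking is involved.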
my $0.02
Jerry
On Tue, Sep 4, 2012 at 11:17 AM, Ioakim Perros <[email protected]> wrote:
Hello,
I would be grateful if someone could shed some light on the following:
Each M/R map task is reading data from a separate region of a table.
From the jobtracker's GUI, in the map completion graph, I notice that
although the mappers read different data, they read it sequentially,
as if the table had a lock that permits only one mapper at a time to read
data, regardless of region.
Does this "lock" hypothesis make sense? Is there any way I could avoid
this useless delay?
Thanks in advance and regards,
Ioakim