Thank you very much for responding, but this was not exactly what I was
looking for.
I understand how splitting works when M/R jobs read from HBase
tables (each map task reads from exactly one region).
What I would like to clarify, if possible, is whether there is indeed
some "locking" between map tasks that read from different regions of the
table (I noticed sequential "reading behaviour" across the different map
tasks),
and if so, how I could avoid it, in order to speed up the procedure and
make the map tasks read data in parallel (each from its respective region).
Thank you again very much, hoping there is an answer to that,
Ioakim
On 09/04/2012 06:32 PM, Doug Meil wrote:
Hi there-
Yes, there is an input split for each region of the source table of an MR
job.
There is a blurb on that in the RefGuide...
http://hbase.apache.org/book.html#splitter
On 9/4/12 11:17 AM, "Ioakim Perros" <[email protected]> wrote:
Hello,
I would be grateful if someone could shed some light on the following:
Each M/R map task is reading data from a separate region of a table.
In the JobTracker's GUI, on the map completion graph, I notice that
although the mappers read different data, they read it sequentially,
as if the table had a lock that permits only one mapper at a time to
read data from any region.
Does this "lock" hypothesis make sense? Is there any way I could avoid
this useless delay?
Thanks in advance and regards,
Ioakim