Re: Reading in parallel from table's regions in MapReduce

Ioakim Perros Tue, 04 Sep 2012 08:43:57 -0700

Thank you very much for responding, but this was not exactly what I was looking for.

I have understood the splitting process when M/R jobs read from HBase tables (each map task reads from exactly one region).

What I would like to clarify, if possible, is whether there is indeed some "locking" between map tasks reading from different regions of the table (because I noticed a sequential "reading behaviour" across the different map tasks),

and if so, how I could avoid it, in order to speed up the procedure and make the map tasks read data in parallel (each from its respective region).
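Assuming no lock is involved, one possible explanation for the sequential pattern (an assumption, not something established in this thread) is that fewer map slots were free than there were regions: in Hadoop 1.x each TaskTracker runs at most `mapred.tasktracker.map.tasks.maximum` map tasks concurrently (default 2). The hypothetical, stdlib-only `MapSlotSketch` below models map tasks as jobs on a fixed-size thread pool standing in for map slots, and reports the peak number of tasks reading at once. There is no lock in it, yet with one slot the tasks still run strictly one after another.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model (hypothetical, not Hadoop code): one map task per region,
// limited only by the number of available map slots. No lock is involved.
final class MapSlotSketch {

    // Runs one simulated "map task" per region on a pool of mapSlots threads
    // and returns the peak number of tasks that were running at the same time.
    static int peakConcurrency(int regionCount, int mapSlots) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService slots = Executors.newFixedThreadPool(mapSlots);
        for (int i = 0; i < regionCount; i++) {
            slots.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(50); // stand-in for scanning one region
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        slots.shutdown();
        try {
            slots.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("1 slot  -> peak " + peakConcurrency(4, 1));
        System.out.println("4 slots -> peak " + peakConcurrency(4, 4));
    }
}
```

With a single slot the peak is always 1; with more slots it can rise up to the slot count, though the exact value depends on thread scheduling.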


Thank you again, and I hope there is an answer to this,
Ioakim

On 09/04/2012 06:32 PM, Doug Meil wrote:

Hi there-

Yes, there is an input split for each region of the source table of a MR
job.

There is a blurb on that in the RefGuide...

http://hbase.apache.org/book.html#splitter
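The one-split-per-region rule can be illustrated with a toy model. The sketch below is a hypothetical, stdlib-only illustration (`RegionSplitSketch` and its `Split` record are invented names, not HBase API): given the sorted start keys of a table's regions, it produces one split per region covering that region's row range. The real `TableInputFormat` also narrows each split to the `Scan`'s start/stop rows and records the hosting region server as a location hint, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names, not the HBase API): one input split per region.
final class RegionSplitSketch {

    // A split covers the half-open row range [startRow, endRow); "" means unbounded.
    record Split(String startRow, String endRow) {}

    // regionStartKeys: sorted start keys of the table's regions; the first region starts at "".
    static List<Split> splitsFor(List<String> regionStartKeys) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String start = regionStartKeys.get(i);
            // Each region ends where the next one begins; the last region is open-ended.
            String end = (i + 1 < regionStartKeys.size()) ? regionStartKeys.get(i + 1) : "";
            splits.add(new Split(start, end));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Three regions: [start, "g"), ["g", "p"), ["p", end)
        for (Split s : splitsFor(List.of("", "g", "p"))) {
            System.out.println(s);
        }
    }
}
```

Because each split maps to exactly one region, the number of map tasks equals the number of regions in the scanned range; how many of those tasks actually run at the same time is a separate scheduling question.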

On 9/4/12 11:17 AM, "Ioakim Perros" wrote:

Hello,

I would be grateful if someone could shed some light on the following:

Each M/R map task is reading data from a separate region of a table.
From the JobTracker's GUI, in the map completion graph, I notice that
although the mappers read different data, they read it
sequentially, as if the table had a lock that permits only one mapper at a
time to read from any of its regions.

Does this "lock" hypothesis make sense? Is there any way I could avoid
this useless delay?

Thanks in advance and regards,
Ioakim
