Hi Guillermo,

You should see as many MR tasks as you have regions in your input table, with one scan per task. They will all run in parallel if you have enough MR slots. Otherwise, some of them will run in parallel and the others will wait for an available slot. HBase will try to run each task on the region server (RS) that hosts its region. Doing the same thing on the client side with multiple threads will therefore have a bigger impact on resource usage, since you will have a lot of calls between the client and all the region servers.
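
To make the slot arithmetic concrete, here is a rough sketch in plain Java. It makes no HBase calls. The class and method names (RegionTaskWaves, waves, tasksPerWave) are made up for illustration, and it assumes a simplified model where every map task takes about the same time, so tasks run in neat "waves". In practice a waiting task starts as soon as any slot frees up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase or Hadoop.
public class RegionTaskWaves {

    // One map task (and one scan) per region of the input table.
    static int mapTasks(int regions) {
        return regions;
    }

    // Number of waves needed when at most `slots` tasks can run at once.
    // Simplifying assumption: all tasks have equal duration.
    static int waves(int regions, int slots) {
        if (regions < 0 || slots <= 0) {
            throw new IllegalArgumentException("regions must be >= 0 and slots > 0");
        }
        return (mapTasks(regions) + slots - 1) / slots; // ceiling division
    }

    // How many tasks run in each wave under the same assumption.
    static List<Integer> tasksPerWave(int regions, int slots) {
        if (regions < 0 || slots <= 0) {
            throw new IllegalArgumentException("regions must be >= 0 and slots > 0");
        }
        List<Integer> result = new ArrayList<>();
        int remaining = mapTasks(regions);
        while (remaining > 0) {
            int running = Math.min(slots, remaining);
            result.add(running);
            remaining -= running;
        }
        return result;
    }

    public static void main(String[] args) {
        // 10 regions, 4 free map slots: some tasks have to wait.
        System.out.println(waves(10, 4));
        System.out.println(tasksPerWave(10, 4));
        // 4 regions, 8 free map slots: everything runs in parallel.
        System.out.println(waves(4, 8));
    }
}
```

With enough slots, waves() returns 1, which is the "all run in parallel" case above. With fewer slots than regions, the later entries of tasksPerWave() are the tasks that wait.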
JM

2014-05-07 8:34 GMT-04:00 Guillermo Ortiz <[email protected]>:

> I am processing data from HBase with a MapReduce. The input of my MapReduce
> is a "full" scan of a table.
>
> When I execute a full scan with TableMapReduceUtil, is this scan executed
> in parallel, so all mappers get the data in parallel?? same way that if I
> would execute many range scans with threads?
