On Mon, Feb 6, 2012 at 8:58 AM, Jon Bender <[email protected]> wrote:
> When you say it'll sort regions for you, does that mean I'll need to
> identify the regions before dividing up the maps?  Or just deal with the
> fact that multiple maps might read from the same regionserver?
>

If you do a multiget on N rows, internally HTable will sort the rows
by region so that the big multiget turns into as many
mini-multigets as there are regions present in the N rows.  HTable
then dispatches them all in parallel and manages the returns, failures,
etc.
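
To make the grouping concrete, here is a rough Python simulation of the idea. It is not HTable's code: the region start keys, region_for, group_by_region, fetch_from_region and multiget are all made-up names for illustration, and the "RPC" just returns a fake value per key.

```python
from bisect import bisect_right
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sorted region start keys; the first region starts at the empty key.
REGION_START_KEYS = ["", "g", "n", "t"]


def region_for(row_key, start_keys=REGION_START_KEYS):
    """Return the index of the region whose key range contains row_key."""
    # A region owns keys from its start key (inclusive) up to the next start key.
    return bisect_right(start_keys, row_key) - 1


def group_by_region(row_keys, start_keys=REGION_START_KEYS):
    """Split one big list of row keys into one sorted list per region."""
    groups = defaultdict(list)
    for key in sorted(row_keys):
        groups[region_for(key, start_keys)].append(key)
    return dict(groups)


def fetch_from_region(region_index, keys):
    """Stand-in for the per-region request; returns a fake value per key."""
    return {key: f"value-of-{key}@region{region_index}" for key in keys}


def multiget(row_keys):
    """Run one mini-multiget per region in parallel and merge the results."""
    groups = group_by_region(row_keys)
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        futures = [
            pool.submit(fetch_from_region, index, keys)
            for index, keys in groups.items()
        ]
        for future in futures:
            results.update(future.result())
    return results
```

The point is only that the caller hands over one flat list of rows and never has to work out the regions itself; the sketch leaves out the failure handling mentioned above.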

I was suggesting you run a client in the mapper and the map input
would be N rows for the client to handle.  Perhaps have each mapper
do 5 minutes' worth of N-row multigets.
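
One way to read that, sketched in Python under assumptions: the mapper cuts its input rows into batches of N and keeps issuing multigets until a time budget runs out. The names batches and run_mapper, the 300-second default, and the multiget callable passed in are all hypothetical; in a real job the callable would wrap the HBase client.

```python
import time


def batches(rows, n):
    """Yield consecutive slices of at most n rows."""
    for start in range(0, len(rows), n):
        yield rows[start:start + n]


def run_mapper(rows, n, multiget, budget_seconds=300.0, clock=time.monotonic):
    """Issue one multiget per batch of n rows until the time budget is spent.

    Returns the merged results and the number of rows actually requested.
    """
    deadline = clock() + budget_seconds
    results = {}
    rows_done = 0
    for batch in batches(rows, n):
        # Stop before starting a new batch once the budget is used up.
        if clock() >= deadline:
            break
        results.update(multiget(batch))
        rows_done += len(batch)
    return results, rows_done


if __name__ == "__main__":
    # Fake client for demonstration: "fetches" a row by upper-casing its key.
    fake_multiget = lambda batch: {key: key.upper() for key in batch}
    rows = [f"row{i}" for i in range(7)]
    print(run_mapper(rows, 3, fake_multiget, budget_seconds=60))
```

The budget is only checked between batches, so a single slow multiget can still overrun it.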

If you run this in MR, your job gets distributed for you, failed tasks
get retried (maybe you won't want retries?), etc.
St.Ack
