On Mon, Feb 6, 2012 at 8:58 AM, Jon Bender <[email protected]> wrote:
> When you say it'll sort regions by you, does that mean I'll need to
> identify the regions before dividing up the maps? Or just deal with the
> fact that multiple maps might read from the same regionserver?
>
If you do a multiget on N rows, internally HTable will sort the rows by region so that the big multiget turns into as many mini-multigets as there are regions present in the N rows. HTable then dispatches them all in parallel and manages the returns, failures, etc.

I was suggesting you run a client in the mapper, and the map input would be N rows for the client to handle. Perhaps have each mapper do 5 minutes' worth of N-row multigets. If you run it in MR, your job gets distributed for you, retried (maybe you won't want retries?), etc.

St.Ack
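
To make the "sort the rows by region" step concrete, here is a rough sketch in plain Java. It is not HTable's actual code and does not touch the HBase API. The class name, the region start keys, and fetchBatch are all made up for illustration, and it assumes each region is identified by its start key, with "" as the start key of the first region.

```java
import java.util.*;
import java.util.concurrent.*;

public class MultiGetSketch {

    // Group row keys by the region that would hold them. A row belongs to the
    // region with the greatest start key <= the row key. Assumes regionStarts
    // contains "" so that every row maps to some region.
    static Map<String, List<String>> groupByRegion(List<String> rowKeys,
                                                   NavigableSet<String> regionStarts) {
        Map<String, List<String>> batches = new TreeMap<>();
        for (String row : rowKeys) {
            String regionStart = regionStarts.floor(row);
            batches.computeIfAbsent(regionStart, k -> new ArrayList<>()).add(row);
        }
        return batches;
    }

    // Hypothetical stand-in for one mini-multiget against a single region.
    static List<String> fetchBatch(String regionStart, List<String> rows) {
        List<String> results = new ArrayList<>();
        for (String row : rows) {
            results.add(row + "@region[" + regionStart + "]");
        }
        return results;
    }

    // Dispatch one task per region in parallel and collect the results.
    static List<String> multiGet(List<String> rowKeys,
                                 NavigableSet<String> regionStarts) throws Exception {
        Map<String, List<String>> batches = groupByRegion(rowKeys, regionStarts);
        if (batches.isEmpty()) {
            return new ArrayList<>();
        }
        ExecutorService pool = Executors.newFixedThreadPool(batches.size());
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : batches.entrySet()) {
                tasks.add(() -> fetchBatch(e.getKey(), e.getValue()));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                all.addAll(f.get()); // a real client would handle failures here
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        NavigableSet<String> regionStarts = new TreeSet<>(Arrays.asList("", "g", "p"));
        List<String> rows = Arrays.asList("apple", "kiwi", "zebra", "grape");
        System.out.println(groupByRegion(rows, regionStarts));
        System.out.println(multiGet(rows, regionStarts));
    }
}
```

The point for the mapper is that none of this grouping has to be done by hand: the map input can just be a list of N rows handed to the client, and two mappers hitting the same regionserver is fine.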
