On Mon, May 14, 2012 at 2:11 PM, Shrijeet Paliwal <[email protected]> wrote:
> Ahh of course! Thank you. One question what partition file I give to
> the top partitioner?
> I am trying to parse your last comment.
> "You could figure how many you need by looking at the output of your MR job"
>
> Chicken and egg? Or am I not following you correctly.
>
I was thinking that your MR job would not look to a table at all to figure where to partition the data. Rather, your reducer would write out files of size N, where N is just under your region max file size. After the MR is done, you'll then have M files. You'll need to create a table w/ M region boundaries (or M+1?) to match the files produced (HFiles write out their first and last keys in metadata IIRC). You'll have to override the likes of configureIncrementalLoad in HFileOutputFormat methinks. It's just a suggestion. I've not dug in on viability.

St.Ack
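
To make the counting concrete, here is a rough sketch of deriving region split keys from the first and last keys of the M output files. It is not HBase code: KeyRange and splitKeys are hypothetical names, and reading the keys out of the HFile metadata is assumed to have happened already. Under those assumptions, M non-overlapping files give M regions, which need M-1 split keys (M+1 boundaries if you count the open-ended start of the first region and end of the last).

```java
import java.util.ArrayList;
import java.util.List;

public class SplitKeysFromHFiles {

    /** Hypothetical holder for the first and last row key of one output file. */
    static final class KeyRange {
        final byte[] firstKey;
        final byte[] lastKey;

        KeyRange(byte[] firstKey, byte[] lastKey) {
            this.firstKey = firstKey;
            this.lastKey = lastKey;
        }
    }

    /** Lexicographic comparison treating bytes as unsigned, the order HBase row keys sort in. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /**
     * Returns the M-1 split keys for M files: the first key of every file
     * except the lowest one. Throws if two files overlap in key range,
     * since overlapping files cannot each map to a single region.
     */
    static List<byte[]> splitKeys(List<KeyRange> files) {
        List<KeyRange> sorted = new ArrayList<>(files);
        sorted.sort((x, y) -> compareBytes(x.firstKey, y.firstKey));

        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            KeyRange prev = sorted.get(i - 1);
            KeyRange cur = sorted.get(i);
            if (compareBytes(prev.lastKey, cur.firstKey) >= 0) {
                throw new IllegalArgumentException("file key ranges overlap");
            }
            splits.add(cur.firstKey);
        }
        return splits;
    }
}
```

The resulting list could then be turned into a byte[][] and passed to something like HBaseAdmin.createTable(desc, splitKeys) before bulk loading. Whether that lines up cleanly with the files in practice is untested here.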
