On Sun, May 13, 2012 at 4:12 PM, Shrijeet Paliwal <[email protected]> wrote:
Shrijeet, can you write an MR job that rewrites the data once? It would take hfiles as input and write hfiles as output, only it would write hfiles no bigger than the max region size. You'd then use the bulk importer to load them (you'd also use the total order partitioner so the output is totally sorted). You'd need to pre-split the table into enough regions before running the bulk import (you can figure out how many you need by looking at the output of your MR job).

St.Ack
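
For the "figure out how many regions you need" step above, here is a rough sketch in Python of one way to do the arithmetic. It is not HBase code: the function names, the list of hfile sizes, and the sampled row keys are all hypothetical inputs that you would collect yourself from the MR job output. It assumes the region count is simply total output bytes divided by the max region size, rounded up; counting the output hfiles directly may be just as good an estimate.

```python
def regions_needed(hfile_sizes_bytes, max_region_bytes):
    """Estimate the region count: total output bytes / max region size, rounded up.

    Both arguments are assumptions for illustration: a list of output hfile
    sizes in bytes and the configured max region size in bytes.
    """
    if max_region_bytes <= 0:
        raise ValueError("max_region_bytes must be positive")
    total = sum(hfile_sizes_bytes)
    # Integer ceiling division; always keep at least one region.
    return max(1, (total + max_region_bytes - 1) // max_region_bytes)


def pick_split_keys(sorted_keys, num_regions):
    """Pick up to num_regions - 1 evenly spaced split keys from a sorted key sample."""
    if num_regions <= 1 or not sorted_keys:
        return []
    step = len(sorted_keys) / num_regions
    splits = []
    for i in range(1, num_regions):
        key = sorted_keys[int(i * step)]
        # Skip duplicates so every split key is distinct.
        if not splits or key != splits[-1]:
            splits.append(key)
    return splits
```

The split keys picked this way would be what you feed to table creation when pre-splitting, so that each bulk-loaded hfile lands in its own region instead of forcing splits afterwards.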
