On 24/09/10 16:55, Ted Yu wrote:
 From TotalOrderPartitioner:
       K[] splitPoints = readPartitions(fs, partFile, keyClass, conf);
       if (splitPoints.length != job.getNumReduceTasks() - 1) {
Partition list can be empty if you use 1 reducer.

But this is not what you want I guess.
Yes, this is not what we want since we want to create x regions.
But, we just found that there is a tool, InputSampler, in the hadoop library for this task. It samples an arbitrary dataset and creates the partition splits. We will try this approach first. My guess is that, even if these partitions are an approximation, it should be ok for hbase. The sizes of the regions will not be totally identical, but that should not be a problem, since the larger regions will be the first ones split into smaller regions by hbase. Can somebody confirm this assumption?
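To illustrate the idea behind InputSampler + TotalOrderPartitioner, here is a minimal standalone sketch (not the Hadoop API itself; class and method names are made up for illustration): sample keys from the dataset, sort the sample, and take evenly spaced quantiles as the split points. Note that x partitions need only x - 1 boundary keys, which matches the `splitPoints.length != job.getNumReduceTasks() - 1` check quoted above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SamplingSplits {

    // Approximate split points: sort the sampled keys and pick evenly
    // spaced quantiles. numPartitions partitions need numPartitions - 1
    // boundary keys. Hypothetical helper, not part of Hadoop.
    static List<String> splitPoints(List<String> sample, int numPartitions) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numPartitions; i++) {
            // index of the i-th quantile within the sorted sample
            splits.add(sorted.get(i * sorted.size() / numPartitions));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Simulate sampled row keys (a fixed seed keeps the demo repeatable)
        Random rnd = new Random(42);
        List<String> sample = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            sample.add(String.format("row%05d", rnd.nextInt(100000)));
        }
        List<String> splits = splitPoints(sample, 10);
        // 10 partitions -> 9 boundary keys
        System.out.println(splits.size());
        System.out.println(splits);
    }
}
```

Because the split points come from a sample rather than the full key set, the resulting partitions (and hence regions) are only approximately equal in size, which is exactly the approximation discussed above.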
--
Renaud Delbru
