Hello,
I am writing an MR job in which each reducer outputs one HFile containing a
subset of the rows of the table to be created.
At first I thought of using the HashPartitioner to achieve load balancing, but
this would mix the rows, so the output of each reducer would not be a contiguous
part of the HBase table that will be created by combining all these files.
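To illustrate what I mean, here is a minimal sketch in plain Java. It does not
use the real Hadoop classes: the class name PartitionSketch, the String row keys
and the split points "row3" and "row6" are made up for this example, and the
range lookup only approximates what I understand TotalOrderPartitioner to do.
The hash formula is the one HashPartitioner uses, applied here to String keys
rather than to the actual HBase key type.

```java
import java.util.Arrays;

public class PartitionSketch {

    // Same formula as Hadoop's HashPartitioner, applied to a String key
    // (assumption: real jobs would use the HBase row-key type instead).
    static int hashPartition(String rowKey, int numReducers) {
        return (rowKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Range partitioning over sorted split points (hypothetical values).
    // Keys below splitPoints[0] go to reducer 0, keys in
    // [splitPoints[i-1], splitPoints[i]) go to reducer i, and so on.
    static int rangePartition(String rowKey, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, rowKey);
        // Exact match: the key starts the next range.
        // No match: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        String[] splitPoints = {"row3", "row6"}; // must be sorted
        for (int i = 0; i < 9; i++) {
            String key = "row" + i;
            System.out.println(key
                + "  hash -> reducer " + hashPartition(key, 3)
                + "  range -> reducer " + rangePartition(key, splitPoints));
        }
    }
}
```

With the range lookup, neighbouring keys stay on the same reducer until a split
point is crossed, whereas with the hash formula neighbouring keys are scattered
across all reducers.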
So, I would like to ask whether it is important to use a Partitioner
(TotalOrderPartitioner, for example) that gives each reducer a contiguous
part of the table.
If I do not do that, will it hurt the performance of HBase when executing
queries or running compactions, since rows that are supposed to be next to
each other will be in different HFiles and the number of disk seeks will
increase?
Thank you for your help!
Panagiotis