I'm only aware of one reason you'd want to pre-partition the data before
inserting it into Kudu, and that's if you are sorting the input data prior
to inserting. Having a way to map a row to a partition means the sort step
can be done per-partition instead of globally, which can help reduce
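
As a rough illustration of the per-partition sort described above, here is a minimal sketch in plain Java. It assumes rows are identified by a String key. The partitionOf() helper is a hypothetical stand-in that maps a key to a partition index and is NOT Kudu's actual partitioning function; the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PerPartitionSort {

    // Hypothetical row-to-partition mapping (an assumption for this sketch,
    // not how Kudu hashes rows). floorMod keeps the result non-negative
    // even when hashCode() is negative.
    static int partitionOf(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Bucket the keys by partition, then sort each bucket on its own
    // instead of sorting the whole input globally.
    static Map<Integer, List<String>> sortPerPartition(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String key : keys) {
            int partition = partitionOf(key, numPartitions);
            buckets.computeIfAbsent(partition, p -> new ArrayList<>()).add(key);
        }
        for (List<String> bucket : buckets.values()) {
            Collections.sort(bucket);
        }
        return buckets;
    }
}
```

Each bucket is sorted independently, so the sort cost is paid on smaller lists and the buckets could be handled by separate workers.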
> 1. We have multiple Kudu clients (Reducers).
Would it be better if each one has a single session to a single tablet
writing a large number of records,
or multiple sessions writing to different tablets (total number of records
is the same)?
The advantage I see in writing to a single tablet from
Hi there,
We're running Map-Reduce jobs in Java, and the Reducers write to Kudu.
In Java we use the hashCode() function to send results from Mappers to Reducers,
e.g.
public int getPartition(ArchiveKey key, Object value, int numReduceTasks) {
    int hash = key.getCaseId().hashCode();