Re: Kudu hashes and Java hashes

2018-08-28 Thread Dan Burkert
I'm only aware of one reason you'd want to pre-partition the data before inserting it into Kudu, and that's if you are sorting the input data prior to inserting. Having a way to map a row to a partition means the sort step can be done per-partition instead of globally, which can help reduce

Re: Kudu hashes and Java hashes

2018-08-28 Thread William Berkeley
> 1. We have multiple Kudu clients (Reducers). Would it be better if each one has a single session to a single tablet writing a large number of records, or multiple sessions writing to different tablets (total number of records is the same)? The advantage I see in writing to a single tablet from

Kudu hashes and Java hashes

2018-08-28 Thread Sergejs Andrejevs
Hi there, We're running MapReduce jobs in Java, and the Reducers write to Kudu. In Java we use the hashCode() function to send results from Mappers to Reducers, e.g. public int getPartition(ArchiveKey key, Object value, int numReduceTasks) { int hash = key.getCaseId().hashCode();