Re: Kudu hashes and Java hashes

2018-08-28 Thread Dan Burkert
I'm only aware of one reason you'd want to pre-partition the data before
inserting it into Kudu, and that's if you are sorting the input data prior
to inserting.  Having a way to map a row to a partition means the sort step
can be done per-partition instead of globally, which can help reduce memory
usage.  Impala does this via the C++ KuduPartitioner API, but there isn't
yet a Java equivalent.
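The per-partition sort can be sketched in plain Java. Note that `partitionOf` below is a hypothetical stand-in for a KuduPartitioner-style row-to-partition mapping, not Kudu's actual partitioning function:

```java
import java.util.*;

public class PerPartitionSort {
    // Hypothetical stand-in for a KuduPartitioner-style row -> partition
    // mapping; Kudu's real partitioning hash is different.
    static int partitionOf(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Group rows by partition, then sort each group independently.
    // Peak memory for sorting is bounded by the largest partition
    // rather than the whole input.
    static Map<Integer, List<String>> sortPerPartition(List<String> rows,
                                                       int numPartitions) {
        Map<Integer, List<String>> byPartition = new HashMap<>();
        for (String row : rows) {
            byPartition
                .computeIfAbsent(partitionOf(row, numPartitions),
                                 p -> new ArrayList<>())
                .add(row);
        }
        byPartition.values().forEach(Collections::sort);
        return byPartition;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> runs =
            sortPerPartition(Arrays.asList("d", "a", "c", "b"), 2);
        runs.forEach((p, keys) -> System.out.println(p + " -> " + keys));
    }
}
```

Each partition's rows can then be inserted in sorted order without ever materializing a global sort of the input.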

- Dan



Re: Kudu hashes and Java hashes

2018-08-28 Thread William Berkeley
> 1. We have multiple Kudu clients (Reducers).

Would it be better for each one to have a single session writing a large
number of records to a single tablet, or multiple sessions writing to
different tablets (the total number of records is the same)?


The advantage I see in writing to a single tablet from a single reducer is
that if the reducer is scheduled locally to the leader replica of the
tablet, then one network hop is eliminated. However, as far as I know, the
client doesn't offer a general mechanism to know where a write will go. If
the table is purely range partitioned it is possible, but not if the table
has hash partitioning. Since leadership can change at any time, it wouldn't
be a reliable mechanism anyway. By comparison, Kudu does offer a way to
split a scan into scan tokens, which can be serialized and rehydrated into
a scanner on a machine where a suitable replica lives.


So it doesn't really matter, as long as enough rows are written to most of
the tablets that the execution time isn't dominated by round trips to n
servers (vs. to 1 server).


Are you seeing a specific problem where Kudu isn't as fast as you
anticipated, or where reducers writing to many tablets is a bottleneck?


> 2. Assuming it is preferable to have a 1-to-1 relationship, i.e. 1 Reducer
should write to 1 Tablet. What would be the proper implementation to reduce
the number of connections between reducers and tablets, i.e. if there are
128 reducers (each gathers its own set of unique hashes) and 128 tablets,
then ideally each reducer should write to 1 tablet, not to each of 128
tablets.


I don't think the 1-to-1 relationship is preferable if, for each reducer,
the number of rows written per tablet is large enough to fill multiple
batches (10,000 rows, say, is good enough).
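The batching idea can be illustrated with a simple accumulate-and-flush pattern in plain Java. The flush callback here is a hypothetical stand-in for flushing a Kudu session; the point is that one flush corresponds to one round trip, so filling batches keeps the round-trip count low regardless of how many tablets a reducer touches:

```java
import java.util.*;
import java.util.function.Consumer;

public class Batcher {
    // Accumulate rows and hand them off in batches of `batchSize`.
    // Each invocation of `flush` stands in for one round trip to a
    // tablet server.
    static void writeBatched(List<String> rows, int batchSize,
                             Consumer<List<String>> flush) {
        List<String> buffer = new ArrayList<>();
        for (String row : rows) {
            buffer.add(row);
            if (buffer.size() == batchSize) {
                flush.accept(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            flush.accept(buffer);  // flush the final partial batch
        }
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 25000; i++) rows.add("row-" + i);
        int[] flushes = {0};
        writeBatched(rows, 10000, batch -> flushes[0]++);
        System.out.println("flushes = " + flushes[0]);  // 25,000 rows -> 3 flushes
    }
}
```

With 25,000 rows per tablet and a 10,000-row batch, a reducer pays only 3 round trips per tablet, so spreading a reducer's output over several tablets costs little.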


> Why have the questions arisen: there is a hash implementation in Java and
another one in Kudu. Is there any chance to ensure the Java Reducers use
the same hash function as Kudu's partitioning hash?


The current Java implementation of the encodings of the primary and
partition keys can be found at
https://github.com/apache/kudu/blob/master/java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java,
so you could adjust your code to match it and, possibly with a bit of
additional custom code, be able to tell ahead of time which tablet a row
belongs to. However, it is an implementation detail, not a public
interface. I don't imagine it changing, but it could, and there's no
guarantee it won't.
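If you do go that route, the computation boils down to: encode the row's hash columns, hash the encoded bytes, and take the result modulo the number of buckets. The sketch below shows only the shape of that computation; both the hash function and the key encoding here are placeholders and do not reproduce what KeyEncoder actually does, so do not expect these bucket numbers to match Kudu's:

```java
import java.nio.charset.StandardCharsets;

public class BucketSketch {
    // Placeholder hash; Kudu's real partitioning hash (see KeyEncoder)
    // is different.
    static long placeholderHash(byte[] encodedKey) {
        long h = 1125899906842597L;  // arbitrary seed
        for (byte b : encodedKey) h = 31 * h + b;
        return h;
    }

    // bucket = hash(encoded hash columns) mod number_of_buckets.
    // The UTF-8 encoding below is a stand-in for Kudu's key encoding.
    static int bucketOf(String caseId, int numBuckets) {
        byte[] encoded = caseId.getBytes(StandardCharsets.UTF_8);
        return (int) Long.remainderUnsigned(placeholderHash(encoded), numBuckets);
    }

    public static void main(String[] args) {
        System.out.println(bucketOf("case-42", 128));
    }
}
```

The mapping is deterministic, so all rows with the same hash-column values land in the same bucket; matching Kudu exactly would require porting its actual hash and encoding from KeyEncoder.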


As a feature request, I think adding the ability to determine the
row-to-tablet mapping might not be a good idea: writes must go to the
tablet leader, and leadership can change at any time, so such a feature
still wouldn't provide a reliable way to determine the row-to-tablet-server
mapping.


-Will



Kudu hashes and Java hashes

2018-08-28 Thread Sergejs Andrejevs
Hi there,

We're running Map-Reduce jobs in Java and the Reducers write to Kudu.
In Java we use the hashCode() function to send results from Mappers to
Reducers, e.g.:

public int getPartition(ArchiveKey key, Object value, int numReduceTasks) {
    int hash = key.getCaseId().hashCode();
    return (hash & Integer.MAX_VALUE) % numReduceTasks;
}
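(A side note on this partitioner: hashCode() can return a negative value, and in Java `%` keeps the sign of the dividend, so the `& Integer.MAX_VALUE` mask is what keeps the partition number non-negative. A minimal check:)

```java
public class PartitionSign {
    // hashCode() may be negative; Java's % keeps the dividend's sign,
    // so mask off the sign bit before taking the modulus.
    static int partition(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        System.out.println(partition(-7, 128));  // prints 121: non-negative
        System.out.println(-7 % 128);            // prints -7 without the mask
    }
}
```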
There is also a partitioning hash function in Kudu tables.

Therefore, there are 2 questions:
1. We have multiple Kudu clients (Reducers).
Would it be better for each one to have a single session writing a large
number of records to a single tablet, or multiple sessions writing to
different tablets (the total number of records is the same)?
2. Assuming it is preferable to have a 1-to-1 relationship, i.e. 1 Reducer
should write to 1 Tablet. What would be the proper implementation to reduce
the number of connections between reducers and tablets, i.e. if there are
128 reducers (each gathers its own set of unique hashes) and 128 tablets,
then ideally each reducer should write to 1 tablet, not to each of 128
tablets.

Why have the questions arisen: there is a hash implementation in Java and
another one in Kudu. Is there any chance to ensure the Java Reducers use
the same hash function as Kudu's partitioning hash?


Best regards,
Sergejs