I'm only aware of one reason you'd want to pre-partition the data before inserting it into Kudu, and that's if you are sorting the input data prior to inserting. Having a way to map a row to a partition means the sort step can be done per-partition instead of globally, which can help reduce memory usage. Impala does this via the C++ KuduPartitioner <https://kudu.apache.org/cpp-client-api/classkudu_1_1client_1_1KuduPartitioner.html> API, but there isn't yet a Java equivalent.
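To illustrate the idea (a minimal, self-contained Java sketch, not the Impala or KuduPartitioner code — partitionOf() here is a hypothetical stand-in, not Kudu's actual partition hash): once each row can be mapped to a partition index, one global sort over all rows can be replaced by independent sorts over each partition's bucket, so each sort only needs that bucket's rows in memory.

```java
import java.util.*;

public class PerPartitionSort {
    // Hypothetical stand-in for a row-to-partition mapping (the real
    // KuduPartitioner derives this from the table's partition schema).
    static int partitionOf(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Bucket rows by partition index, then sort each bucket on its own.
    // Each Collections.sort() call touches only one bucket's rows.
    static Map<Integer, List<String>> partitionAndSort(List<String> rows,
                                                       int numPartitions) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String row : rows) {
            buckets.computeIfAbsent(partitionOf(row, numPartitions),
                                    p -> new ArrayList<>()).add(row);
        }
        for (List<String> bucket : buckets.values()) {
            Collections.sort(bucket);
        }
        return buckets;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> out =
            partitionAndSort(Arrays.asList("b", "a", "d", "c"), 2);
        for (List<String> bucket : out.values()) {
            System.out.println(bucket);  // prints [b, d] then [a, c]
        }
    }
}
```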
- Dan

On Tue, Aug 28, 2018 at 9:58 AM, William Berkeley <[email protected]> wrote:

> > 1. We have multiple Kudu clients (Reducers). Would it be better if each
> > one has a single session to a single tablet writing a large number of
> > records, or multiple sessions writing to different tablets (the total
> > number of records is the same)?
>
> The advantage I see in writing to a single tablet from a single reducer is
> that if the reducer is scheduled locally to the leader replica of the
> tablet, then one network hop is eliminated. However, the client, AFAIK,
> doesn't offer a general mechanism to know where a write will go. If the
> table is purely range partitioned it is possible, but not if the table has
> hash partitioning. Since leadership can change at any time, it wouldn't be
> a reliable mechanism anyway. For comparison, Kudu does offer a way to
> split a scan into scan tokens, which can be serialized and rehydrated into
> a scanner on a machine where a suitable replica lives.
>
> So it doesn't really matter, as long as there are enough rows for most of
> the tablets being written to that the execution time isn't dominated by
> round trips to n servers (vs. to 1 server).
>
> Are you seeing a specific problem where Kudu isn't as fast as you
> anticipated, or where reducers writing to many tablets is a bottleneck?
>
> > 2. Assuming it is preferable to have a 1-to-1 relationship, i.e. 1
> > reducer should write to 1 tablet. What would be the proper
> > implementation to reduce the number of connections from reducers to
> > different tablets, i.e. if there are 128 reducers (each gathers its own
> > set of unique hashes) and 128 tablets, then ideally each reducer should
> > write to 1 tablet, but not to each of the 128 tablets.
>
> I don't think the 1-1 relationship is preferable if, for each reducer,
> the number of rows written per tablet is large enough to fill multiple
> batches (10,000 rows, say, is good enough).
> > Why have the questions arisen: there is a hash implementation in Java
> > and another one in Kudu. Is there any chance to ensure the Java reducers
> > use the same hash function as the Kudu partitioning hash?
>
> The current Java implementation of the encodings of the primary and
> partition keys can be found at
> https://github.com/apache/kudu/blob/master/java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java,
> so you could adjust your code to match it and, possibly with a bit of
> additional custom code, be able to tell ahead of time which tablet a row
> belongs to. However, that is implementation, not interface: I don't
> imagine it changing, but it could, and there's no guarantee it won't.
>
> Adding the ability to determine the row-to-tablet mapping as a feature
> request, I think, might not be a good idea, because writes must go to the
> tablet leader, and leadership can change at any time, so such a feature
> still wouldn't provide a reliable way to determine the row-to-tablet-server
> mapping.
>
> -Will
>
> On Tue, Aug 28, 2018 at 1:08 AM Sergejs Andrejevs <[email protected]>
> wrote:
>
>> Hi there,
>>
>> We're running Map-Reduce jobs in Java, and the reducers write to Kudu.
>>
>> In Java we use the hashCode() function to send results from mappers to
>> reducers, e.g.
>>
>>     public int getPartition(ArchiveKey key, Object value, int numReduceTasks) {
>>         int hash = key.getCaseId().hashCode();
>>         return (hash & Integer.MAX_VALUE) % numReduceTasks;
>>     }
>>
>> There is also a partitioning hash function in Kudu tables.
>>
>> Therefore, there are 2 questions:
>>
>> 1. We have multiple Kudu clients (reducers). Would it be better if each
>> one has a single session to a single tablet writing a large number of
>> records, or multiple sessions writing to different tablets (the total
>> number of records is the same)?
>>
>> 2. Assuming it is preferable to have a 1-to-1 relationship, i.e. 1
>> reducer should write to 1 tablet.
>> What would be the proper implementation to reduce the number of
>> connections from reducers to different tablets, i.e. if there are 128
>> reducers (each gathers its own set of unique hashes) and 128 tablets,
>> then ideally each reducer should write to 1 tablet, but not to each of
>> the 128 tablets.
>>
>> Why have the questions arisen: there is a hash implementation in Java
>> and another one in Kudu. Is there any chance to ensure the Java reducers
>> use the same hash function as the Kudu partitioning hash?
>>
>> Best regards,
>>
>> Sergejs
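One note on the getPartition() quoted above (a standalone sketch for illustration, with the thread's ArchiveKey replaced by a plain String key): the `& Integer.MAX_VALUE` mask matters because hashCode() may return a negative value, and in Java a negative left operand makes `%` yield a negative result, i.e. an invalid partition number.

```java
public class ReducerPartition {
    // Same arithmetic as the getPartition() in the thread: clearing the
    // sign bit makes the value nonnegative, so the result of % is always
    // in [0, numReduceTasks).
    static int getPartition(String caseId, int numReduceTasks) {
        int hash = caseId.hashCode();
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // "polygenelubricants" hashes to Integer.MIN_VALUE, the worst case:
        // masking maps it to 0 rather than a negative partition.
        System.out.println(getPartition("polygenelubricants", 128));  // prints 0
    }
}
```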
