Hi Adar,

Thanks for bumping IMPALA-5323. I'll add my use case there in hopes it will help. The problem we have with UTF-8 is that the bloated size and the necessary decoding double our query times. I agree that this problem is really on Impala, but for now our simplest path forward is a workaround patch adding an addBinaryString method to the PartialRow API. It's a bit of a hack, but it's safe, and hopefully it will be temporary.

Thanks,
Cliff

On Tue, Dec 17, 2019 at 6:34 PM Adar Lieber-Dembo <a...@cloudera.com> wrote:

> From Kudu's perspective, I think the intent is for STRING to enforce
> UTF-8 encoding and if that is inappropriate for your use case, you
> should use BINARY (which is effectively STRING minus that
> enforcement). The fact that the C++ client doesn't enforce the
> encoding is a "bug" rather than a "feature". Though, looking at this
> more deeply, what actually happens if you try to shoehorn your HLL
> intermediates into the Java STRING APIs? Does the data actually get
> mangled, and if so, is it at write time, or at scan time?
>
> Of course, Kudu doesn't operate in a vacuum so Impala's considerations
> are important too. Unfortunately, there doesn't appear to have been
> any progress on IMPALA-5323, which would be the clearest path forward.
> Maybe you could update that ticket with your use case and hopefully
> get the attention of some Impala developers?
>
> On Mon, Dec 16, 2019 at 10:16 AM Cliff Resnick <cre...@gmail.com> wrote:
> >
> > Hi Kudu team,
> >
> > We use Kudu with Impala, and usually update Kudu through the Java api.
> > We store some binary HLL intermediates in Kudu, but must use String type
> > since Impala does not have a Binary type. Kudu's java client forces UTF-8
> > encoding and we have a C++ UDAF in Impala that must decode Kudu's UTF-8 on
> > every value.
> >
> > It looks like UTF-8 is not enforced in Kudu's C++ client, so I'm
> > wondering why we could not have control over the String encoding in Java as
> > well? As-is it looks like we'd have to fork the java code to add this
> > support. Or is there another way?
> >
> > -Cliff
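
As a side illustration of the encoding question raised in this thread, here is a minimal Java sketch of what can happen to arbitrary binary data when it is pushed through a UTF-8 string. It uses only the JDK and no Kudu APIs. The class name, the method names, and the sample bytes are hypothetical, and the ISO-8859-1 mapping is only one possible way to make the round trip lossless; the thread does not say which approach was actually used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Kudu or Impala.
public class BinaryAsStringDemo {

    // Treat raw bytes as if they were UTF-8 text, then encode back to UTF-8.
    // Byte sequences that are not valid UTF-8 are replaced with U+FFFD,
    // so the original bytes cannot be recovered.
    static byte[] roundTripUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
    }

    // Assumed workaround: map every byte to the code point with the same
    // value (ISO-8859-1), then encode that string as UTF-8. This is lossless,
    // but each byte >= 0x80 takes two bytes in the UTF-8 form.
    static byte[] latin1ThenUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.UTF_8);
    }

    // Inverse of latin1ThenUtf8: the decoding step a reader would have to
    // perform on every value to get the original bytes back.
    static byte[] utf8ThenLatin1(byte[] stored) {
        return new String(stored, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Sample bytes; 0x80 and 0xFF are not valid UTF-8 on their own.
        byte[] raw = {(byte) 0x00, (byte) 0x7F, (byte) 0x80, (byte) 0xFF};

        byte[] mangled = roundTripUtf8(raw);
        System.out.println("direct UTF-8 round trip preserved bytes: "
                + Arrays.equals(raw, mangled));

        byte[] stored = latin1ThenUtf8(raw);
        System.out.println("raw length = " + raw.length
                + ", stored length = " + stored.length);
        System.out.println("ISO-8859-1 mapping recovered bytes: "
                + Arrays.equals(raw, utf8ThenLatin1(stored)));
    }
}
```

For uniformly distributed binary data, about half of the bytes are 0x80 or above, so a mapping like this would grow the stored value by roughly 50% and require a decode pass on read, which is consistent with the size and decoding overhead described in the thread.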