Hi Adar,

Thanks for bumping IMPALA-5323. I'll add my use case there in hopes it will
help.
The problem we have with UTF-8 is that the bloated size and the necessary
decoding double our query times. I agree that this problem is really on
Impala, but for now our simplest path forward is a workaround patch adding
an addBinaryString to the PartialRow API. A bit of a hack, but it's safe and
hopefully it will be temporary.
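
To make the overhead concrete, here is a minimal Java sketch. It assumes the
binary values are mapped one byte per char (ISO-8859-1) before being written
as a STRING, which is only one possible way to push bytes through a String
API and may not match our exact pipeline. The class and method names are
hypothetical, and only JDK calls are used, nothing from the Kudu client.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of Kudu or Impala.
public class Utf8Bloat {

    // Assumption: each raw byte becomes one char (ISO-8859-1), and the
    // resulting String is then stored as UTF-8.
    public static byte[] toUtf8Stored(byte[] raw) {
        String s = new String(raw, StandardCharsets.ISO_8859_1);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // The reverse step a reader has to perform on every value.
    public static byte[] fromUtf8Stored(byte[] stored) {
        String s = new String(stored, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Checks whether raw bytes survive being decoded directly as UTF-8
    // and re-encoded. Invalid sequences are replaced with U+FFFD.
    public static boolean survivesDirectUtf8(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        return Arrays.equals(raw, s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0x01, (byte) 0x7F, (byte) 0x80, (byte) 0xFF};
        byte[] stored = toUtf8Stored(raw);

        // Bytes >= 0x80 take two bytes each once encoded as UTF-8.
        System.out.println("raw bytes:    " + raw.length);
        System.out.println("stored bytes: " + stored.length);
        System.out.println("round trip ok: "
                + Arrays.equals(raw, fromUtf8Stored(stored)));
        System.out.println("direct UTF-8 decode lossless: "
                + survivesDirectUtf8(raw));
    }
}
```

Under that assumption, every byte at or above 0x80 costs two bytes in storage
and has to be decoded again on read, while treating the raw bytes directly as
UTF-8 text is lossy for byte sequences that are not valid UTF-8.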

Thanks,
Cliff



On Tue, Dec 17, 2019 at 6:34 PM Adar Lieber-Dembo <a...@cloudera.com> wrote:

> From Kudu's perspective, I think the intent is for STRING to enforce
> UTF-8 encoding and if that is inappropriate for your use case, you
> should use BINARY (which is effectively STRING minus that
> enforcement). The fact that the C++ client doesn't enforce the
> encoding is a "bug" rather than a "feature". Though, looking at this
> more deeply, what actually happens if you try to shoehorn your HLL
> intermediates into the Java STRING APIs? Does the data actually get
> mangled, and if so, is it at write time, or at scan time?
>
> Of course, Kudu doesn't operate in a vacuum so Impala's considerations
> are important too. Unfortunately, there doesn't appear to have been
> any progress on IMPALA-5323, which would be the clearest path forward.
> Maybe you could update that ticket with your use case and hopefully
> get the attention of some Impala developers?
>
> On Mon, Dec 16, 2019 at 10:16 AM Cliff Resnick <cre...@gmail.com> wrote:
> >
> > Hi Kudu team,
> >
> > We use Kudu with Impala, and usually update Kudu through the Java api.
> > We store some binary HLL intermediates in Kudu, but must use String type
> > since Impala does not have a Binary type. Kudu's java client forces UTF-8
> > encoding and we have a C++ UDAF in Impala that must decode Kudu's UTF-8 on
> > every value.
> >
> > It looks like UTF-8 is not enforced in Kudu's C++ client, so I'm
> > wondering why we could not have control over the String encoding in Java as
> > well? As-is it looks like we'd have to fork the java code to add this
> > support. Or is there another way?
> >
> > -Cliff
>
