Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16066 )

Change subject: IMPALA-9482: Support for BINARY columns
......................................................................


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/runtime/descriptors.h@256
PS14, Line 256:     return col_descs_[slot_desc->col_path().back()];
> Hmm, I realized that I have to think nested binaries through more carefully
I checked how Hive handles this: it seems to convert the nested binary to UTF-8 
text, replacing invalid byte sequences:
select named_struct("b", unhex("00112233445566778899AABBCCDDEEFF"));
result: {"b":"3DUfw��������}

select array(unhex("00112233445566778899AABBCCDDEEFF"));
result: ["3DUfw��������]
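
The output above is consistent with decoding the raw bytes as UTF-8 and 
substituting U+FFFD for every invalid sequence. A minimal Python sketch of that 
behavior follows. Python's "replace" error handler is only a stand-in here: 
that Hive does the equivalent internally is an assumption, not something 
verified against its code.

```python
# Illustration only: Python's "replace" error handler stands in for whatever
# Hive does internally (an assumption, not checked against Hive's code).
raw = bytes.fromhex("00112233445566778899AABBCCDDEEFF")
decoded = raw.decode("utf-8", errors="replace")

# The first 8 bytes are valid ASCII (0x22 is the '"' visible in the output);
# each of the last 8 bytes is an invalid sequence and becomes U+FFFD.
print(repr(decoded))
```

Note that the double quote in front of 3DUfw is the 0x22 byte of the value 
itself, not a quote added by the printer.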

A weird behavior around nested binary is that it is not quoted (unlike string):
select named_struct("s", "a", "b", cast("a" as binary));
{"s":"a","b":a}
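
One practical consequence, assuming a client tries to parse this output as 
JSON (nothing above says the format is meant to be JSON): the unquoted value 
makes the text unparseable. A small Python sketch, where the quoted variant is 
hypothetical and only there for comparison:

```python
import json

hive_output = '{"s":"a","b":a}'   # as printed by Hive above
quoted = '{"s":"a","b":"a"}'      # hypothetical variant with the binary quoted


def parses_as_json(text):
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses_as_json(hive_output), parses_as_json(quoted))
```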

I have also checked what happens when we read these back from Impala:
- if we return the whole struct, the binary that is not valid UTF-8 is replaced 
with an empty string
- if we return the whole array, it is returned escaped:
"\000\021\"3DUfw��������"]
- if the member/item is returned directly, we print it the same way as Hive
- if we apply hex() to the result, the original bytes come back correctly, so 
Hive does not write the value in a lossy way
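
The difference between the last point and the printed forms can be sketched in 
Python: a hex round trip recovers the original bytes, while text produced with 
replacement characters cannot be converted back. Here bytes.hex() and the 
"replace" error handler are stand-ins for hex() and for the printing path; 
both are assumptions made for illustration.

```python
raw = bytes.fromhex("00112233445566778899AABBCCDDEEFF")

# Hex round trip, standing in for hex(): lossless.
hex_round_trip_ok = bytes.fromhex(raw.hex()) == raw

# Printing as text with replacement: every invalid byte collapses to U+FFFD,
# so re-encoding the printed text does not give back the original value.
shown = raw.decode("utf-8", errors="replace")
text_round_trip_ok = shown.encode("utf-8") == raw

print(hex_round_trip_ok, text_round_trip_ok)
```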

I am thinking about not allowing nested binary for now, and enabling it once 
the struct/array printing functions are unified and I have verified whether 
not quoting it is intentional in Hive.



--
To view, visit http://gerrit.cloudera.org:8080/16066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582
Gerrit-Change-Number: 16066
Gerrit-PatchSet: 18
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Steve Carlin <scar...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Aug 2022 11:22:57 +0000
Gerrit-HasComments: Yes
