Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16066 )
Change subject: IMPALA-9482: Support for BINARY columns
......................................................................


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/runtime/descriptors.h@256
PS14, Line 256:     return col_descs_[slot_desc->col_path().back()];
> Hmm, I realized that I have to think nested binaries through more carefully
I checked how Hive handles this. It seems to UTF-8 encode the nested binary, replacing invalid byte sequences:

  select named_struct("b", unhex("00112233445566778899AABBCCDDEEFF"));
  result: {"b":"3DUfw��������}

  select array(unhex("00112233445566778899AABBCCDDEEFF"));
  result: ["3DUfw��������]

One odd behavior is that a nested binary is not quoted (unlike a string):

  select named_struct("s", "a", "b", cast("a" as binary));
  {"s":"a","b":a}

I have also checked what happens when we read these back from Impala:
- if we return the whole struct, a binary that is not valid UTF-8 is replaced with an empty string
- if we return the whole array, it is returned escaped: "\000\021\"3DUfw��������"]
- if the member/item is returned directly, we print it the same way as Hive
- if we hex() the result back, it is returned correctly, so Hive does not write it in a lossy way

I am thinking about not allowing nested binary for now, and enabling it once the struct/array printing functions are unified and I have verified whether leaving nested binaries unquoted is intentional in Hive.

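The "encode with replace" behavior described above can be reproduced outside Hive. The following is a minimal sketch in plain Python, not Hive or Impala code; it assumes that Hive's printing is equivalent to a UTF-8 decode that substitutes U+FFFD for undecodable bytes, which the results above suggest but which has not been confirmed against the Hive source. The variable names are made up for illustration.

```python
# Illustration only: plain Python, not Hive or Impala code.
# Same 16-byte value as in the unhex() queries above.
raw = bytes.fromhex("00112233445566778899AABBCCDDEEFF")

# Decode as UTF-8, substituting U+FFFD for every byte that cannot be decoded.
printed = raw.decode("utf-8", errors="replace")

# Bytes 0x00-0x77 are ASCII and survive; bytes 0x88-0xFF do not form any
# valid UTF-8 sequence here, so each one becomes a replacement character.
replaced = printed.count("\ufffd")

# The printed form is lossy: re-encoding it does not give the original bytes.
round_trip_ok = printed.encode("utf-8") == raw

# The underlying value is intact as long as it is handled as bytes, which is
# what the hex() check in the comment above relies on.
hex_ok = raw.hex().upper() == "00112233445566778899AABBCCDDEEFF"

print(repr(printed), replaced, round_trip_ok, hex_ok)
```

Under this assumption, byte 0x22 is an ASCII double quote, which would account for the single quote character in front of 3DUfw in the printed results even though the binary itself is not quoted.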
-- 
To view, visit http://gerrit.cloudera.org:8080/16066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582
Gerrit-Change-Number: 16066
Gerrit-PatchSet: 18
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Steve Carlin <scar...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Aug 2022 11:22:57 +0000
Gerrit-HasComments: Yes