Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20624 )
Change subject: IMPALA-12517: Decode binary data with Python 3
......................................................................

Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20624/3/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/20624/3/tests/shell/test_shell_commandline.py@1203
PS3, Line 1203:       where string_col != "invalid utf8" """
> Python 2 accepts invalid UTF8, while Python 3 errors. I'm not sure whether
Agreed, this would be hard to change: Thrift decodes the string type during deserialization in Python 3. This is actually also the default in Python 2, but there the decoding can be skipped by passing the no_utf8strings option to the Thrift compiler.

I think we can keep it like this unless the error becomes a problem for someone. A possible solution would be to treat STRING as binary in HS2 (with newer server and client versions). Avoiding per-row UTF-8 decoding could also improve performance.

--
To view, visit http://gerrit.cloudera.org:8080/20624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9222cd1ac081a38ab2b37d58628faac0812695ec
Gerrit-Change-Number: 20624
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Fri, 27 Oct 2023 14:41:44 +0000
Gerrit-HasComments: Yes
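
For illustration only: the difference discussed in the comment above (strict UTF-8 decoding raising an error versus passing the bytes through) can be reproduced with plain Python 3 `bytes.decode`. This is a minimal sketch and not Thrift's generated code or the Impala shell's implementation; the helper names `decode_strict` and `decode_lenient` and the sample bytes are assumptions made up for the example.

```python
# Sample payload that is not valid UTF-8: the byte 0xff never occurs in UTF-8.
RAW = b"invalid utf8 \xff"


def decode_strict(data):
    """Decode with the default strict error handling.

    Returns None when the bytes are not valid UTF-8, mirroring the kind of
    failure a strict decoder hits on such a STRING value.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_lenient(data):
    """Decode, substituting U+FFFD for bytes that are not valid UTF-8."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(repr(decode_strict(RAW)))
    print(repr(decode_lenient(RAW)))
```

Treating STRING as binary, as suggested in the comment, would correspond to skipping both helpers and handing the raw `bytes` to the caller.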
