Csaba Ringhofer has posted comments on this change. (http://gerrit.cloudera.org:8080/20624)

Change subject: IMPALA-12517: Decode binary data with Python 3
......................................................................


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20624/3/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/20624/3/tests/shell/test_shell_commandline.py@1203
PS3, Line 1203:                where string_col != "invalid utf8" """
> Python 2 accepts invalid UTF8, while Python 3 errors. I'm not sure whether
Agreed, it would be hard to change this: in Python 3, Thrift decodes values of 
the string type as UTF-8 during deserialization. This is actually also the 
default in Python 2, but there the decoding can be skipped by passing the 
no_utf8strings option to the Thrift compiler.
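To illustrate the difference, here is a minimal stand-alone sketch in plain Python 3. It is not Thrift's actual deserializer: decode_as_string only approximates how a STRING field is decoded (strict UTF-8), and keep_as_binary approximates a BINARY field, or a Python 2 build generated with no_utf8strings, where the raw bytes are passed through. Both function names and the sample value are hypothetical.

```python
# Hypothetical sample value: "invalid utf8" followed by a byte that is never valid UTF-8.
invalid = b"invalid utf8 \xff"


def decode_as_string(raw: bytes) -> str:
    """Approximates a STRING field in Python 3: strict UTF-8 decoding."""
    # Strict decoding raises UnicodeDecodeError on invalid input.
    return raw.decode("utf-8")


def keep_as_binary(raw: bytes) -> bytes:
    """Approximates a BINARY field (or Python 2 with no_utf8strings):
    the bytes are returned untouched, so invalid UTF-8 cannot fail here."""
    return raw


try:
    decode_as_string(invalid)
except UnicodeDecodeError as e:
    print("STRING decode failed:", e.reason)

print("BINARY value:", keep_as_binary(invalid))
```

The error surfaces while a row is being deserialized, before the shell ever sees the value, which is why it is hard to work around on the client.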

I think we can keep it like this unless the error becomes a problem for 
someone. One solution could be to treat STRING as binary in HS2 (with newer 
server and client versions); avoiding per-row UTF-8 decoding could also 
improve performance.
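A rough sketch of the client side of that idea, assuming STRING values arrive as raw bytes: the hypothetical format_rows helper decodes only when rendering values for display and uses errors="replace", so an invalid sequence becomes U+FFFD instead of raising. Nothing here reflects the actual impala-shell or HS2 code; it only shows how decoding moves out of deserialization and could be skipped entirely when values are merely passed through.

```python
from typing import Iterable, List


def format_rows(rows: Iterable[bytes]) -> List[str]:
    """Hypothetical display formatter: values are kept as bytes until they
    are printed, and invalid UTF-8 is replaced with U+FFFD, not raised."""
    return [raw.decode("utf-8", errors="replace") for raw in rows]


rows = [b"valid", b"invalid utf8 \xff"]
for line in format_rows(rows):
    print(line)
```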



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9222cd1ac081a38ab2b37d58628faac0812695ec
Gerrit-Change-Number: 20624
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Fri, 27 Oct 2023 14:41:44 +0000
Gerrit-HasComments: Yes
