Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17170 )
Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0 ...................................................................... Patch Set 21: (8 comments) I played around with this patch and unfortunately encountered some unicode issues in impala-shell. They confused me a lot since I got different results between bin/impala-shell.sh and the tests. I found some causes (still not root cause, need to understand more of thrift's fastbinary.so), so left some comments to make impala-shell more robust. http://gerrit.cloudera.org:8080/#/c/17170/21/common/thrift/CMakeLists.txt File common/thrift/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/17170/21/common/thrift/CMakeLists.txt@146 PS21, Line 146: no_utf8strings Could you add a comment for this? I think based on our offline discussion, this is for impyla (used in tests) to work without changes (though it results in changes in impala-shell...) http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala-shell File shell/impala-shell: http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala-shell@29 PS21, Line 29: 0.9 stale comment http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@85 PS21, Line 85: def utf8_decode_if_needed(val): While calling this on all string fields from thrift, I think we also need to encode unicodes into strings when creating thrift requests. Otherwise, we will fail in using TBinaryProtocol, or in the fallback mode of TBinaryProtocolAccelerated (i.e. failed to load the fastbinary C module then the py module is used). This is found by a local issue with my env. I failed to execute a query contains non-ascii chars using shell/build/impala-shell-4.0.0-SNAPSHOT/impala-shell. The failure is at sending the ExecuteStatement request. Note that using bin/impala-shell.sh is good in my env since the C module (fastbinary) is used and it somehow works for both unicode and string. However, shell/build/impala-shell-4.0.0-SNAPSHOT/impala-shell in my env fails to load the C module so fallback to use the python implementation of TBinaryProtocol. This also causes test_utf8_decoding_error_handling fails since it use shell/build/impala-shell-4.0.0-SNAPSHOT/impala-shell. I notice the GVO pass in PS21. I think the reason is that the C module (fastbinary.so) is always loaded successfully by TBinaryProtocolAccelerated. I tested on ubuntu16.04 and centos7.4 where the default gcc version differ from our toolchain's gcc, the C module failed to be loaded due to this: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/thrift/protocol/fastbinary.so) I think we should make our codes work for both TBinaryProtocol.TBinaryProtocol and TBinaryProtocol.TBinaryProtocolAccelerated. http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@171 PS21, Line 171: TBinaryProtocolAccelerated If we change back this to TBinaryProtocol, or remove/rename the fastbinary.so in toolchain so it can't be loaded, the errors I mentioned in other comments may occur. E.g. bin/impala-shell.sh --protocol=hs2-http -q "select 1" Or execute a query with non-ascii chars in beeswax or hs2 protocol. I think previously we switch to TBinaryProtocolAccelerated for performance and the origin TBinaryProtocol still works. http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@735 PS21, Line 735: query_str I think we need to encode this into 'str' when it's 'unicode' in python2. The 'statement' field could contain non-ascii chars, which may lead to encoding errors in the thrift python module that compiled with no_utf8strings. I hit the following error when running the query of test_utf8_decoding_error_handling using TBinaryProtocol.TBinaryProtocol. Traceback (most recent call last): File "/home/systest/impala/shell/impala_shell.py", line 1241, in _execute_stmt query_str, self.set_query_options) File "/home/systest/impala/shell/impala_client.py", line 751, in execute_query resp = self._do_hs2_rpc(ExecuteStatement) File "/home/systest/impala/shell/impala_client.py", line 977, in _do_hs2_rpc return rpc() File "/home/systest/impala/shell/impala_client.py", line 747, in ExecuteStatement return self.imp_service.ExecuteStatement(query) File "/home/systest/impala/shell/gen-py/TCLIService/TCLIService.py", line 281, in ExecuteStatement self.send_ExecuteStatement(req) File "/home/systest/impala/shell/gen-py/TCLIService/TCLIService.py", line 288, in send_ExecuteStatement args.write(self._oprot) File "/home/systest/impala/shell/gen-py/TCLIService/TCLIService.py", line 1823, in write self.req.write(oprot) File "/home/systest/impala/shell/gen-py/TCLIService/ttypes.py", line 3708, in write oprot.writeString(self.statement) File "/home/systest/impala/toolchain/toolchain-packages-gcc7.5.0/thrift-0.11.0-p4/python/lib/python2.7/site-packages/thrift/protocol/TProtocol.py", line 121, in writeString self.writeBinary(str_to_binary(str_val)) File "/home/systest/impala/toolchain/toolchain-packages-gcc7.5.0/thrift-0.11.0-p4/python/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 131, in writeBinary self.trans.write(str) File "/home/systest/impala/toolchain/toolchain-packages-gcc7.5.0/thrift-0.11.0-p4/python/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 171, in write raise e UnicodeEncodeError: 'ascii' codec can't encode characters in position 15-16: ordinal not in range(128) http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@736 PS21, Line 736: confOverlay=set_query_options This also contains unicodes, which could lead to an error in ImpalaHttpClient.py when hs2-http protocol is used. I got a stacktrace like this: Caught exception 'unicode' does not have the buffer interface, type=<type 'exceptions.TypeError'> in ExecuteStatement. Unknown Exception : 'unicode' does not have the buffer interface Traceback (most recent call last): File "/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/impala_shell.py", line 1241, in _execute_stmt query_str, self.set_query_options) File "/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/impala_client.py", line 751, in execute_query resp = self._do_hs2_rpc(ExecuteStatement) File "/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/impala_client.py", line 977, in _do_hs2_rpc return rpc() File "/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/impala_client.py", line 747, in ExecuteStatement return self.imp_service.ExecuteStatement(query) File "/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/TCLIService.py", line 281, in ExecuteStatement self.send_ExecuteStatement(req) File "/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/TCLIService.py", line 288, in send_ExecuteStatement args.write(self._oprot) File "/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/TCLIService.py", line 1823, in write self.req.write(oprot) File "/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/ttypes.py", line 3714, in write oprot.writeString(kiter150) File "/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/thrift/protocol/TProtocol.py", line 121, in writeString self.writeBinary(str_to_binary(str_val)) File "/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/thrift/protocol/TBinaryProtocol.py", line 131, in writeBinary self.trans.write(str) File "/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/ImpalaHttpClient.py", line 189, in write self.__wbuf.write(buf) TypeError: 'unicode' does not have the buffer interface http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@1120 PS21, Line 1120: query_str I think we need to encode this too, if it's unicode in python2. http://gerrit.cloudera.org:8080/#/c/17170/21/tests/beeswax/impala_beeswax.py File tests/beeswax/impala_beeswax.py: http://gerrit.cloudera.org:8080/#/c/17170/21/tests/beeswax/impala_beeswax.py@158 PS21, Line 158: # TODO: TBinaryProtocol led to negative size error, check if this is a known : # issue in Thrift Maybe it's a similar issue with the one in my local env. -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 21 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tamas Mate <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Sun, 25 Apr 2021 09:09:27 +0000 Gerrit-HasComments: Yes
