Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17170 )

Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0
......................................................................


Patch Set 21:

(8 comments)

I played around with this patch and unfortunately encountered some unicode 
issues in impala-shell. They confused me a lot since I got different results 
between bin/impala-shell.sh and the tests. I found some causes (still not root 
cause, need to understand more of thrift's fastbinary.so), so left some 
comments to make impala-shell more robust.

http://gerrit.cloudera.org:8080/#/c/17170/21/common/thrift/CMakeLists.txt
File common/thrift/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/17170/21/common/thrift/CMakeLists.txt@146
PS21, Line 146: no_utf8strings
Could you add a comment for this? I think based on our offline discussion, this 
is for impyla (used in tests) to work without changes (though it results in 
changes in impala-shell...)


http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala-shell
File shell/impala-shell:

http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala-shell@29
PS21, Line 29: 0.9
stale comment


http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py
File shell/impala_client.py:

http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@85
PS21, Line 85: def utf8_decode_if_needed(val):
While calling this on all string fields from thrift, I think we also need to 
encode unicodes into strings when creating thrift requests. Otherwise, we will 
fail in using TBinaryProtocol, or in the fallback mode of 
TBinaryProtocolAccelerated (i.e. failed to load the fastbinary C module then 
the py module is used).

This is found by a local issue with my env. I failed to execute a query 
contains non-ascii chars using 
shell/build/impala-shell-4.0.0-SNAPSHOT/impala-shell. The failure is at sending 
the ExecuteStatement request.
Note that using bin/impala-shell.sh is good in my env since the C module 
(fastbinary) is used and it somehow works for both unicode and string. However, 
shell/build/impala-shell-4.0.0-SNAPSHOT/impala-shell in my env fails to load 
the C module so fallback to use the python implementation of TBinaryProtocol. 
This also causes test_utf8_decoding_error_handling fails since it use 
shell/build/impala-shell-4.0.0-SNAPSHOT/impala-shell.

I notice the GVO pass in PS21. I think the reason is that the C module 
(fastbinary.so) is always loaded successfully by TBinaryProtocolAccelerated. I 
tested on ubuntu16.04 and centos7.4 where the default gcc version differ from 
our toolchain's gcc, the C module failed to be loaded due to this:

 /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by 
/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/thrift/protocol/fastbinary.so)

I think we should make our codes work for both TBinaryProtocol.TBinaryProtocol 
and TBinaryProtocol.TBinaryProtocolAccelerated.


http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@171
PS21, Line 171: TBinaryProtocolAccelerated
If we change back this to TBinaryProtocol, or remove/rename the fastbinary.so 
in toolchain so it can't be loaded, the errors I mentioned in other comments 
may occur. E.g.

 bin/impala-shell.sh --protocol=hs2-http -q "select 1"

Or execute a query with non-ascii chars in beeswax or hs2 protocol.

I think previously we switch to TBinaryProtocolAccelerated for performance and 
the origin TBinaryProtocol still works.


http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@735
PS21, Line 735: query_str
I think we need to encode this into 'str' when it's 'unicode' in python2. The 
'statement' field could contain non-ascii chars, which may lead to encoding 
errors in the thrift python module that compiled with no_utf8strings.

I hit the following error when running the query of 
test_utf8_decoding_error_handling using TBinaryProtocol.TBinaryProtocol.

Traceback (most recent call last):
  File "/home/systest/impala/shell/impala_shell.py", line 1241, in _execute_stmt
    query_str, self.set_query_options)
  File "/home/systest/impala/shell/impala_client.py", line 751, in execute_query
    resp = self._do_hs2_rpc(ExecuteStatement)
  File "/home/systest/impala/shell/impala_client.py", line 977, in _do_hs2_rpc
    return rpc()
  File "/home/systest/impala/shell/impala_client.py", line 747, in 
ExecuteStatement
    return self.imp_service.ExecuteStatement(query)
  File "/home/systest/impala/shell/gen-py/TCLIService/TCLIService.py", line 
281, in ExecuteStatement
    self.send_ExecuteStatement(req)
  File "/home/systest/impala/shell/gen-py/TCLIService/TCLIService.py", line 
288, in send_ExecuteStatement
    args.write(self._oprot)
  File "/home/systest/impala/shell/gen-py/TCLIService/TCLIService.py", line 
1823, in write
    self.req.write(oprot)
  File "/home/systest/impala/shell/gen-py/TCLIService/ttypes.py", line 3708, in 
write
    oprot.writeString(self.statement)
  File 
"/home/systest/impala/toolchain/toolchain-packages-gcc7.5.0/thrift-0.11.0-p4/python/lib/python2.7/site-packages/thrift/protocol/TProtocol.py",
 line 121, in writeString
    self.writeBinary(str_to_binary(str_val))
  File 
"/home/systest/impala/toolchain/toolchain-packages-gcc7.5.0/thrift-0.11.0-p4/python/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py",
 line 131, in writeBinary
    self.trans.write(str)
  File 
"/home/systest/impala/toolchain/toolchain-packages-gcc7.5.0/thrift-0.11.0-p4/python/lib/python2.7/site-packages/thrift/transport/TTransport.py",
 line 171, in write
    raise e
UnicodeEncodeError: 'ascii' codec can't encode characters in position 15-16: 
ordinal not in range(128)


http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@736
PS21, Line 736: confOverlay=set_query_options
This also contains unicodes, which could lead to an error in 
ImpalaHttpClient.py when hs2-http protocol is used. I got a stacktrace like 
this:

Caught exception 'unicode' does not have the buffer interface, type=<type 
'exceptions.TypeError'> in ExecuteStatement.
Unknown Exception : 'unicode' does not have the buffer interface
Traceback (most recent call last):
  File 
"/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/impala_shell.py", 
line 1241, in _execute_stmt
    query_str, self.set_query_options)
  File 
"/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/impala_client.py",
 line 751, in execute_query
    resp = self._do_hs2_rpc(ExecuteStatement)
  File 
"/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/impala_client.py",
 line 977, in _do_hs2_rpc
    return rpc()
  File 
"/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/impala_client.py",
 line 747, in ExecuteStatement
    return self.imp_service.ExecuteStatement(query)
  File 
"/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/TCLIService.py",
 line 281, in ExecuteStatement
    self.send_ExecuteStatement(req)
  File 
"/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/TCLIService.py",
 line 288, in send_ExecuteStatement
    args.write(self._oprot)
  File 
"/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/TCLIService.py",
 line 1823, in write
    self.req.write(oprot)
  File 
"/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/ttypes.py",
 line 3714, in write
    oprot.writeString(kiter150)
  File 
"/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/thrift/protocol/TProtocol.py",
 line 121, in writeString
    self.writeBinary(str_to_binary(str_val))
  File 
"/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/thrift/protocol/TBinaryProtocol.py",
 line 131, in writeBinary
    self.trans.write(str)
  File 
"/home/systest/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/ImpalaHttpClient.py",
 line 189, in write
    self.__wbuf.write(buf)
TypeError: 'unicode' does not have the buffer interface


http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@1120
PS21, Line 1120: query_str
I think we need to encode this too, if it's unicode in python2.


http://gerrit.cloudera.org:8080/#/c/17170/21/tests/beeswax/impala_beeswax.py
File tests/beeswax/impala_beeswax.py:

http://gerrit.cloudera.org:8080/#/c/17170/21/tests/beeswax/impala_beeswax.py@158
PS21, Line 158:       # TODO: TBinaryProtocol led to negative size error, check 
if this is a known
              :       #       issue in Thrift
Maybe it's a similar issue with the one in my local env.



--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 21
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Sun, 25 Apr 2021 09:09:27 +0000
Gerrit-HasComments: Yes

Reply via email to