Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17170 )
Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0 ...................................................................... Patch Set 23: Code-Review+2 > Patch Set 23: > > (7 comments) > > Thanks a lot Quanlong for the detailed analysis! > > I added more conversions, and now test_shell_interactive.py passes with the > non-accelerated protocol. > > I like the code less and less though and become unsure about the > no_utf8strings option. When reading thrift structures, it makes sense, as we > can avoid unnecessary decode + encode pairs if we expect the result in utf8. > But when writing, it would be better to convert every 'unicode' to utf8, it > too much hassle to do this in the caller. > > I think that ideally Thrift would always encode when writing but return > string during read based on some option from the protocol, and do this > consistently in both accelerated and normal protocol. Yeah, I think the hassle comes from http://gerrit.cloudera.org:8080/15524 (IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3). Start from that patch, we change our internal string type from 'str' to 'unicode' in python2: from __future__ import unicode_literals At that point we expect getting 'unicode' from thrift. Now we switch the thrift py module to be compiled with no_utf8strings, so we are getting 'str' from thrift. This breaks the codes expecting 'unicode' values and needs additional converting codes. To finish the python3 compatibility work in impala-shell, I think we still need to insist in importing unicode_literals. I have some thoughts on future items (need further discussion). * using thrift py module without no_utf8strings in Impyla, then Impyla may be able to remove the dependency on thriftpy2 in Python3. * Impyla can provide an option on whether returning 'str' or 'unicode' values in python2, and then do neccessary converting at the boundary. In our tests, we'd like Impyla returns 'str' values. * Finally we can get rid of the no_utf8strings option in impala-shell and don't need the converting codes added in this patch. The current patch set LGTM. Thanks for addressing the comments! -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 23 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tamas Mate <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 27 Apr 2021 02:14:56 +0000 Gerrit-HasComments: No
