Quanlong Huang has uploaded this change for review.
http://gerrit.cloudera.org:8080/16688
Change subject: IMPALA-10145,IMPALA-10299: Apply unicode decoding bug fixes to
thrift-0.11.0
......................................................................
IMPALA-10145,IMPALA-10299: Apply unicode decoding bug fixes to thrift-0.11.0
After bumping the thrift version that impala-shell depends on to 0.11.0,
we hit some bugs in decoding malformed UTF-8 characters, which crash
impala-shell or cause it to hang forever. Before the thrift version was
bumped, impala-shell was able to print incomplete UTF-8 characters as
replacement symbols, e.g.
  impala-shell> select substr("引擎", 1, 4);
  引�
  impala-shell> select unhex("aa");
  �
The cause is that thrift changed its internal string representation
from bytes to unicode after 0.10 (THRIFT-3503) to support Python 3,
following the "unicode sandwich" rule -- namely "bytes on the outside,
unicode on the inside, encode/decode at the edges". However, the error
handling method is not specified when decoding, so we hit the decoding
error on malformed input. We need the patches of THRIFT-2087 and
THRIFT-5303 to improve its robustness. THRIFT-5303 is enough to resolve
the issue we hit, since we mostly use the _fast_decode code path.
THRIFT-2087 is backported as well in case we use the normal decoding
code path somewhere.
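The difference between the two error handling modes can be sketched in
plain Python 3. This is only an illustration of the decoding behaviour
described above, not the code in either thrift patch; the helper names
decode_strict and decode_lenient are hypothetical.

```python
# Illustration only: plain Python 3 decoding, not the thrift patches.

def decode_strict(raw):
    """Decode with the default 'strict' handler; raises on malformed UTF-8."""
    return raw.decode("utf-8")


def decode_lenient(raw):
    """Decode with errors='replace'; malformed bytes become U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# "\u5f15\u64ce" is the two-character string 引擎 (6 bytes in UTF-8).
# Its first 4 bytes are what substr("引擎", 1, 4) returns.
truncated = "\u5f15\u64ce".encode("utf-8")[:4]
# A lone continuation byte, as returned by unhex("aa").
lone_byte = b"\xaa"

for raw in (truncated, lone_byte):
    try:
        decode_strict(raw)
    except UnicodeDecodeError as err:
        print("strict decoding failed:", err.reason)
    # ascii() keeps the output printable on any terminal encoding.
    print("lenient decoding gives:", ascii(decode_lenient(raw)))
```

With the lenient handler the truncated input decodes to the first
character followed by one replacement character, which matches the
impala-shell output shown earlier.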
Tests:
- Verified that the issue is resolved after bumping the thrift version
  that impala-shell depends on to 0.11.0-p4.
Change-Id: Id16b04248f2db3033bef3ab26b7ba8205768c9af
---
M buildall.sh
A source/thrift/thrift-0.11.0-patches/0003-THRIFT-2087-Python-compiler-replace-non-utf-8-char-w.patch
A source/thrift/thrift-0.11.0-patches/0004-THRIFT-5303-Fix-missing-error-handling-in-using-PyUn.patch
3 files changed, 55 insertions(+), 1 deletion(-)
git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/88/16688/1
--
To view, visit http://gerrit.cloudera.org:8080/16688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id16b04248f2db3033bef3ab26b7ba8205768c9af
Gerrit-Change-Number: 16688
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <[email protected]>