Joe McDonnell has uploaded this change for review. (http://gerrit.cloudera.org:8080/18576)
Change subject: IMPALA-11325: Fix UnicodeDecodeError for shell file output
......................................................................
IMPALA-11325: Fix UnicodeDecodeError for shell file output
When using the --output_file command-line option for
impala-shell, the shell fails with a UnicodeDecodeError
if the output contains non-ASCII characters.
For example, running this command:
impala-shell -B -q "select '引'" --output_file=output.txt
This fails with:
UnicodeDecodeError : 'ascii' codec can't decode byte 0xe5 in position 0:
ordinal not in range(128)
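This happens because OutputStream::write() calls encode('utf-8')
on a string that is already UTF-8 encoded. In Python 2, calling
encode() on a byte string first decodes it implicitly with the
default ASCII codec, which fails on the first non-ASCII byte.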
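The implicit decode step can be reproduced on Python 3, where it has to be written out explicitly. This is a minimal illustrative sketch: the helper name implicit_ascii_decode_error is hypothetical and is not part of impala-shell.

```python
def implicit_ascii_decode_error(data):
    """Hypothetical helper: return the message of the UnicodeDecodeError that
    Python 2's implicit ASCII decode would raise for these bytes, or None."""
    try:
        # Python 2's str.encode('utf-8') on a byte string effectively performs
        # this decode first, using the default 'ascii' codec.
        data.decode('ascii')
    except UnicodeDecodeError as err:
        return str(err)
    return None


# '\u5f15' is '引'; its UTF-8 encoding is the three bytes b'\xe5\xbc\x95'.
print(implicit_ascii_decode_error('\u5f15'.encode('utf-8')))
```

For the UTF-8 bytes of '引' this prints the same message as the error shown above ('ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)), while pure-ASCII input returns None.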
This change skips the encode('utf-8') call on Python 2.
On Python 3, the formatted output is a str (Unicode text),
so it still needs the encode call.
This is mostly a pragmatic fix to make the code somewhat more
functional; more work remains to establish clear contracts for
the format() methods and clear points of conversion to and
from bytes.
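A minimal sketch of the version-dependent write, assuming (this is not confirmed by the change summary) that the output file is opened in binary append mode and that the formatter returns UTF-8 encoded bytes on Python 2 and str on Python 3. The helpers to_file_bytes and write_output are hypothetical simplifications, not the actual code in shell/shell_output.py.

```python
import sys


def to_file_bytes(formatted_data):
    """Hypothetical helper: prepare formatted output for a file opened in 'ab' mode."""
    if sys.version_info[0] < 3:
        # Python 2: assumed to be an already UTF-8 encoded byte string, so
        # calling encode() here would trigger the implicit ASCII decode.
        return formatted_data
    # Python 3: formatted_data is text (str) and must be encoded to bytes.
    return formatted_data.encode('utf-8')


def write_output(filename, formatted_data):
    """Hypothetical, simplified stand-in for OutputStream::write() with --output_file."""
    with open(filename, 'ab') as out_file:
        out_file.write(to_file_bytes(formatted_data))
        out_file.write(b'\n')
```

Checking isinstance(formatted_data, bytes) instead of the interpreter version would be an alternative, since bytes is an alias for str on Python 2; either way, the underlying gap is the missing contract for what format() returns.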
Testing:
- Ran shell tests with Python 2 and Python 3 on Ubuntu 18
- Added a shell test that outputs a Unicode character
  to an output file. Without the fix, this test fails.
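The assertion side of such a test might look like the sketch below. The change summary does not show the actual test, so the helper names shell_args and output_file_contains are hypothetical; only the -B, -q, and --output_file options come from the example command.

```python
import io


def shell_args(query, output_path):
    """Hypothetical: impala-shell arguments mirroring the example command."""
    return ['-B', '-q', query, '--output_file=' + output_path]


def output_file_contains(path, expected_text):
    """Hypothetical check: the output file must decode as UTF-8 and contain
    expected_text. Invalid UTF-8 raises UnicodeDecodeError, failing the test."""
    # io.open accepts an encoding argument on both Python 2 and Python 3.
    with io.open(path, 'r', encoding='utf-8') as out_file:
        return expected_text in out_file.read()
```

A test would run impala-shell with shell_args(u"select '引'", path) and then assert output_file_contains(path, u'引').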
Change-Id: Ic40be3d530c2694465f7bd2edb0e0586ff0e1fba
---
M shell/shell_output.py
M tests/shell/test_shell_commandline.py
2 files changed, 27 insertions(+), 1 deletion(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/18576/1
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic40be3d530c2694465f7bd2edb0e0586ff0e1fba
Gerrit-Change-Number: 18576
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell