Hello Grant Henke, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15132

to look at the new patch set (#6).

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343: WIP - Make impala-shell compatible with python 3.

This is a WIP patch for making the impala-shell code cross-compatible with
python 2 and python 3. The goal is wind up with a version of the shell that
will pass all python e2e tests repsective of the version of python used to
launch the shell, under the assumption that the test framework itself will
continue to run with python 2.7.x, irrespective of the shell version being
tested.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to 
virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/set-pythonpath.sh was updated to properly use the thrift-11
  gen-py files if USE_THRIFT11_GEN_PY is set to "true". This is required for
  testing a version of the impala-shell in a python3-based virtualenv.

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell on Python 3.7.5 with no authentication
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell on Python 2.7.12 with no authentication
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == 
d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could. I'm not sure
  I got it as clean as it could be, but I think what I'm submitting is much
  cleaner than the original code.

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/shell_output.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
15 files changed, 463 insertions(+), 310 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/6
--
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 6
Gerrit-Owner: David Knupp <dkn...@cloudera.com>
Gerrit-Reviewer: David Knupp <dkn...@cloudera.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>

Reply via email to