[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r211688118 --- Diff: python/pyspark/streaming/tests.py --- @@ -206,6 +207,38 @@ def func(dstream): expected = [[len(x)] for x in input] self._test_func(input, func, expected) +def test_slice(self): +"""Basic operation test for DStream.slice.""" +import datetime as dt --- End diff -- lets remove the import here since it is at the top already --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r211688841 --- Diff: python/pyspark/streaming/tests.py --- @@ -206,6 +207,38 @@ def func(dstream): expected = [[len(x)] for x in input] self._test_func(input, func, expected) +def test_slice(self): +"""Basic operation test for DStream.slice.""" +import datetime as dt --- End diff -- Also, not your doing but I noticed this spelling error on ln 183 "DStream.faltMap" should be "DStream.flatMap", would you mind changing that while we are here? https://github.com/apache/spark/pull/20838/files#diff-ca4ec8dece48511b915cad6a801695c1R183 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r207604490 --- Diff: python/pyspark/streaming/tests.py --- @@ -206,6 +207,22 @@ def func(dstream): expected = [[len(x)] for x in input] self._test_func(input, func, expected) +def test_slice(self): +"""Basic operation test for DStream.slice.""" +eol_python2 = dt.datetime(2020, 1, 1) +five_secs = dt.timedelta(seconds=5) +input = [eol_python2 - five_secs, eol_python2] + +def func(dstream): +return dstream.slice() +expected = [dt.datetime(2019, 12, 31, 23, 55), +dt.datetime(2019, 12, 31, 23, 56), +dt.datetime(2019, 12, 31, 23, 57), +dt.datetime(2019, 12, 31, 23, 58), +dt.datetime(2019, 12, 31, 23, 59), +dt.datetime(2020, 1, 1)] # fat lady sings... --- End diff -- please remove this comment. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user cclauss commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r205933934 --- Diff: python/pyspark/streaming/tests.py --- @@ -206,6 +207,22 @@ def func(dstream): expected = [[len(x)] for x in input] self._test_func(input, func, expected) +def test_slice(self): --- End diff -- @holdenk Comments please on __test_slice()__ and on the test results https://github.com/apache/spark/pull/20838#issuecomment-408566860 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user cclauss commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r205870627 --- Diff: dev/create-release/releaseutils.py --- @@ -149,7 +152,11 @@ def get_commits(tag): if not is_valid_author(author): author = github_username # Guard against special characters -author = unidecode.unidecode(unicode(author, "UTF-8")).strip() +try: # Python 2 +author = unicode(author, "UTF-8") +except NameError: # Python 3 +author = str(author) +author = unidecode.unidecode(author).strip() --- End diff -- > The way it is now [...] it's probably safer. Let's agree to leave this as is in this PR. EOL of Python 2 in 500 daze away so safe is better. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r204269682 --- Diff: python/pyspark/sql/conf.py --- @@ -59,7 +62,7 @@ def unset(self, key): def _checkType(self, obj, identifier): """Assert that an object is of type str.""" -if not isinstance(obj, str) and not isinstance(obj, unicode): +if not isinstance(obj, basestring): --- End diff -- Nope --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r204190696 --- Diff: dev/create-release/releaseutils.py --- @@ -149,7 +152,11 @@ def get_commits(tag): if not is_valid_author(author): author = github_username # Guard against special characters -author = unidecode.unidecode(unicode(author, "UTF-8")).strip() +try: # Python 2 +author = unicode(author, "UTF-8") +except NameError: # Python 3 +author = str(author) +author = unidecode.unidecode(author).strip() --- End diff -- My thought was that we are first casting `author` this to unicode already with `unicode(author)` and it doesn't really matter if it is "UTF-8" or not because we then immediately decode it into ASCII with `unidecode`, which can handle it even it it wasn't "UTF-8", so the end result should be the same I believe. It was just to clean up a little, so not a big deal either way. The way it is now replicates the old behavior, so it's probably safer. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user cclauss commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r204094593 --- Diff: dev/create-release/releaseutils.py --- @@ -149,7 +152,11 @@ def get_commits(tag): if not is_valid_author(author): author = github_username # Guard against special characters -author = unidecode.unidecode(unicode(author, "UTF-8")).strip() +try: # Python 2 +author = unicode(author, "UTF-8") +except NameError: # Python 3 +author = str(author) +author = unidecode.unidecode(author).strip() --- End diff -- Yes... My other worry in Py2 would be if [__sys.setdefaultencoding()__](https://docs.python.org/2/library/sys.html#sys.setdefaultencoding) was set to somthing other that utf-8. That method was also thankfully dropped in Py3. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r204093418 --- Diff: python/pyspark/streaming/dstream.py --- @@ -23,6 +23,8 @@ if sys.version < "3": from itertools import imap as map, ifilter as filter +else: +long = int --- End diff -- Adding a call to slice in `tests.py` should be enough to test this. I'm surprised we haven't caught this before, but I suppose this isn't a very frequently exercised code path. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r204091905 --- Diff: dev/create-release/releaseutils.py --- @@ -149,7 +152,11 @@ def get_commits(tag): if not is_valid_author(author): author = github_username # Guard against special characters -author = unidecode.unidecode(unicode(author, "UTF-8")).strip() +try: # Python 2 +author = unicode(author, "UTF-8") +except NameError: # Python 3 +author = str(author) +author = unidecode.unidecode(author).strip() --- End diff -- @BryanCutler I think we do need to specify it because it won't be unicode type in Python 2, we get these from run_cmd which call's Popen communicate which returns a regular string. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user cclauss commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r199701965 --- Diff: dev/merge_spark_pr.py --- @@ -39,6 +39,9 @@ except ImportError: JIRA_IMPORTED = False +if sys.version_info[0] >= 3: +raw_input = input --- End diff -- It creates a new function called __raw_input()__ that is identical to the builtin __input()__. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user cclauss commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r199701953 --- Diff: python/pyspark/sql/conf.py --- @@ -59,7 +62,7 @@ def unset(self, key): def _checkType(self, obj, identifier): """Assert that an object is of type str.""" -if not isinstance(obj, str) and not isinstance(obj, unicode): +if not isinstance(obj, basestring): --- End diff -- Is there an issue here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user cclauss commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r199702001 --- Diff: dev/create-release/releaseutils.py --- @@ -49,6 +49,9 @@ print("Install using 'sudo pip install unidecode'") sys.exit(-1) +if sys.version_info[0] >= 3: +raw_input = input --- End diff -- It creates a new function called __raw_input()__ that is identical to the builtin __input()__. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r194292645 --- Diff: dev/merge_spark_pr.py --- @@ -39,6 +39,9 @@ except ImportError: JIRA_IMPORTED = False +if sys.version_info[0] >= 3: +raw_input = input --- End diff -- Does this script run with Python 3+ now? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r194292662 --- Diff: dev/create-release/releaseutils.py --- @@ -49,6 +49,9 @@ print("Install using 'sudo pip install unidecode'") sys.exit(-1) +if sys.version_info[0] >= 3: +raw_input = input --- End diff -- Does this script work in Python 3+ now? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r194292586 --- Diff: python/pyspark/sql/conf.py --- @@ -59,7 +62,7 @@ def unset(self, key): def _checkType(self, obj, identifier): """Assert that an object is of type str.""" -if not isinstance(obj, str) and not isinstance(obj, unicode): +if not isinstance(obj, basestring): --- End diff -- This is fine since we rely on short-circuiting. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r194292548 --- Diff: python/pyspark/streaming/dstream.py --- @@ -23,6 +23,8 @@ if sys.version < "3": from itertools import imap as map, ifilter as filter +else: +long = int --- End diff -- Can you add a test for it? Seems only used once and shouldn't be difficult to add a test. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user cclauss commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r181192109 --- Diff: python/pyspark/cloudpickle.py --- @@ -802,9 +802,8 @@ def save_not_implemented(self, obj): self.save_reduce(_gen_not_implemented, ()) if PY3: -dispatch[io.TextIOWrapper] = save_file -else: -dispatch[file] = save_file +file = io.TextIOWrapper +dispatch[file] = save_file --- End diff -- This was merged upstream. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r177176851 --- Diff: dev/create-release/releaseutils.py --- @@ -149,7 +152,11 @@ def get_commits(tag): if not is_valid_author(author): author = github_username # Guard against special characters -author = unidecode.unidecode(unicode(author, "UTF-8")).strip() +try: # Python 2 +author = unicode(author, "UTF-8") +except NameError: # Python 3 +author = str(author) +author = unidecode.unidecode(author).strip() --- End diff -- could you just put this above: ``` if sys.version > '3': unicode = str ``` and then just do? ``` author = unidecode.unidecode(unicode(author)).strip() ``` I don't think you need to specify "UTF-8" because either way it will be a unicode object --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r176904365 --- Diff: python/pyspark/cloudpickle.py --- @@ -802,9 +802,8 @@ def save_not_implemented(self, obj): self.save_reduce(_gen_not_implemented, ()) if PY3: -dispatch[io.TextIOWrapper] = save_file -else: -dispatch[file] = save_file +file = io.TextIOWrapper +dispatch[file] = save_file --- End diff -- I think this one is actually related with cloudpickle's PR. I was trying to (exactly) match this file to a specific version of cloudpickle (which is currently 0.4.3 - https://github.com/cloudpipe/cloudpickle/releases/tag/v0.4.3). So, I thought we could wait for more feedback there. At least, I was thinking that we should match it to https://github.com/cloudpipe/cloudpickle/tree/0.4.x If that one is merged, I could backport that change into 0.4.x branch. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r175240674 --- Diff: dev/create-release/releaseutils.py --- @@ -149,7 +152,11 @@ def get_commits(tag): if not is_valid_author(author): author = github_username # Guard against special characters -author = unidecode.unidecode(unicode(author, "UTF-8")).strip() +try: # Python 2 --- End diff -- Two spaces before the inlined comment .. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r175000904 --- Diff: dev/create-release/releaseutils.py --- @@ -49,6 +49,11 @@ print("Install using 'sudo pip install unidecode'") sys.exit(-1) +try: +raw_input # Python 2 --- End diff -- Shall we keep this consistent for now? There are a hell of a lot things to fix in PySpark in that way. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user cclauss commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r175000559 --- Diff: dev/create-release/releaseutils.py --- @@ -49,6 +49,11 @@ print("Install using 'sudo pip install unidecode'") sys.exit(-1) +try: +raw_input # Python 2 --- End diff -- âWhere practical, apply the Python porting best practice: [Use feature detection instead of version detection](https://docs.python.org/3/howto/pyporting.html#use-feature-detection-instead-of-version-detection).â --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r175000330 --- Diff: dev/create-release/releaseutils.py --- @@ -49,6 +49,11 @@ print("Install using 'sudo pip install unidecode'") sys.exit(-1) +try: +raw_input # Python 2 --- End diff -- and shall we do this with if-else of Python version to be consistent with other places? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r17582 --- Diff: dev/create-release/releaseutils.py --- @@ -49,6 +49,11 @@ print("Install using 'sudo pip install unidecode'") sys.exit(-1) +try: +raw_input # Python 2 --- End diff -- Shall we put two spaces before lined comment? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
GitHub user cclauss opened a pull request: https://github.com/apache/spark/pull/20838 [SPARK-23698] Resolve undefined names in Python 3 ## What changes were proposed in this pull request? Fix issues arising from the fact that __file__, __long__, __raw_input()__, __unicode__, __xrange()__, etc. were remove from Python 3. __Undefined names__ have the potential to raise NameError at runtime. Apply the Python porting best practice: [Use feature detection instead of version detection](https://docs.python.org/3/howto/pyporting.html#use-feature-detection-instead-of-version-detection). ## How was this patch tested? * $ __python2 -m flake8 . --count --select=E9,F82 --show-source --statistics__ * $ __python3 -m flake8 . --count --select=E9,F82 --show-source --statistics__ Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cclauss/spark fix-undefined-names Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20838.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20838 commit bdcd740a5144efef0db1f2b6c29b64c85f3d8ef5 Author: cclauss Date: 2018-03-15T23:08:04Z [SPARK-23698] Reduce undefined names in Python 3 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org