[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-21 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r211688118
  
--- Diff: python/pyspark/streaming/tests.py ---
@@ -206,6 +207,38 @@ def func(dstream):
 expected = [[len(x)] for x in input]
 self._test_func(input, func, expected)
 
+def test_slice(self):
+"""Basic operation test for DStream.slice."""
+import datetime as dt
--- End diff --

lets remove the import here since it is at the top already


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-21 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r211688841
  
--- Diff: python/pyspark/streaming/tests.py ---
@@ -206,6 +207,38 @@ def func(dstream):
 expected = [[len(x)] for x in input]
 self._test_func(input, func, expected)
 
+def test_slice(self):
+"""Basic operation test for DStream.slice."""
+import datetime as dt
--- End diff --

Also, not your doing but I noticed this spelling error on ln 183 
"DStream.faltMap" should be "DStream.flatMap", would you mind changing that 
while we are here?

https://github.com/apache/spark/pull/20838/files#diff-ca4ec8dece48511b915cad6a801695c1R183


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-03 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r207604490
  
--- Diff: python/pyspark/streaming/tests.py ---
@@ -206,6 +207,22 @@ def func(dstream):
 expected = [[len(x)] for x in input]
 self._test_func(input, func, expected)
 
+def test_slice(self):
+"""Basic operation test for DStream.slice."""
+eol_python2 = dt.datetime(2020, 1, 1)
+five_secs = dt.timedelta(seconds=5)
+input = [eol_python2 - five_secs, eol_python2]
+
+def func(dstream):
+return dstream.slice()
+expected = [dt.datetime(2019, 12, 31, 23, 55),
+dt.datetime(2019, 12, 31, 23, 56),
+dt.datetime(2019, 12, 31, 23, 57),
+dt.datetime(2019, 12, 31, 23, 58),
+dt.datetime(2019, 12, 31, 23, 59),
+dt.datetime(2020, 1, 1)]  # fat lady sings...
--- End diff --

please remove this comment.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-27 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r205933934
  
--- Diff: python/pyspark/streaming/tests.py ---
@@ -206,6 +207,22 @@ def func(dstream):
 expected = [[len(x)] for x in input]
 self._test_func(input, func, expected)
 
+def test_slice(self):
--- End diff --

@holdenk Comments please on __test_slice()__ and on the test results 
https://github.com/apache/spark/pull/20838#issuecomment-408566860


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-27 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r205870627
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -149,7 +152,11 @@ def get_commits(tag):
 if not is_valid_author(author):
 author = github_username
 # Guard against special characters
-author = unidecode.unidecode(unicode(author, "UTF-8")).strip()
+try:   # Python 2
+author = unicode(author, "UTF-8")
+except NameError:  # Python 3
+author = str(author)
+author = unidecode.unidecode(author).strip()
--- End diff --

> The way it is now [...] it's probably safer.

Let's agree to leave this as is in this PR.  EOL of Python 2 in 500 daze 
away so safe is better.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r204269682
  
--- Diff: python/pyspark/sql/conf.py ---
@@ -59,7 +62,7 @@ def unset(self, key):
 
 def _checkType(self, obj, identifier):
 """Assert that an object is of type str."""
-if not isinstance(obj, str) and not isinstance(obj, unicode):
+if not isinstance(obj, basestring):
--- End diff --

Nope


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-20 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r204190696
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -149,7 +152,11 @@ def get_commits(tag):
 if not is_valid_author(author):
 author = github_username
 # Guard against special characters
-author = unidecode.unidecode(unicode(author, "UTF-8")).strip()
+try:   # Python 2
+author = unicode(author, "UTF-8")
+except NameError:  # Python 3
+author = str(author)
+author = unidecode.unidecode(author).strip()
--- End diff --

My thought was that we are first casting `author` this to unicode already 
with `unicode(author)` and it doesn't really matter if it is "UTF-8" or not 
because we then immediately decode it into ASCII with `unidecode`, which can 
handle it even it it wasn't "UTF-8", so the end result should be the same I 
believe.  It was just to clean up a little, so not a big deal either way.  The 
way it is now replicates the old behavior, so it's probably safer. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-20 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r204094593
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -149,7 +152,11 @@ def get_commits(tag):
 if not is_valid_author(author):
 author = github_username
 # Guard against special characters
-author = unidecode.unidecode(unicode(author, "UTF-8")).strip()
+try:   # Python 2
+author = unicode(author, "UTF-8")
+except NameError:  # Python 3
+author = str(author)
+author = unidecode.unidecode(author).strip()
--- End diff --

Yes... My other worry in Py2 would be if 
[__sys.setdefaultencoding()__](https://docs.python.org/2/library/sys.html#sys.setdefaultencoding)
 was set to somthing other that utf-8.  That method was also thankfully dropped 
in Py3.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-20 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r204093418
  
--- Diff: python/pyspark/streaming/dstream.py ---
@@ -23,6 +23,8 @@
 
 if sys.version < "3":
 from itertools import imap as map, ifilter as filter
+else:
+long = int
--- End diff --

Adding a call to slice in `tests.py` should be enough to test this. I'm 
surprised we haven't caught this before, but I suppose this isn't a very 
frequently exercised code path.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-20 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r204091905
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -149,7 +152,11 @@ def get_commits(tag):
 if not is_valid_author(author):
 author = github_username
 # Guard against special characters
-author = unidecode.unidecode(unicode(author, "UTF-8")).strip()
+try:   # Python 2
+author = unicode(author, "UTF-8")
+except NameError:  # Python 3
+author = str(author)
+author = unidecode.unidecode(author).strip()
--- End diff --

@BryanCutler I think we do need to specify it because it won't be unicode 
type in Python 2, we get these from run_cmd which call's Popen communicate 
which returns a regular string.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-03 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r199701965
  
--- Diff: dev/merge_spark_pr.py ---
@@ -39,6 +39,9 @@
 except ImportError:
 JIRA_IMPORTED = False
 
+if sys.version_info[0] >= 3:
+raw_input = input
--- End diff --

It creates a new function called __raw_input()__ that is identical to the 
builtin __input()__.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-03 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r199701953
  
--- Diff: python/pyspark/sql/conf.py ---
@@ -59,7 +62,7 @@ def unset(self, key):
 
 def _checkType(self, obj, identifier):
 """Assert that an object is of type str."""
-if not isinstance(obj, str) and not isinstance(obj, unicode):
+if not isinstance(obj, basestring):
--- End diff --

Is there an issue here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-03 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r199702001
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -49,6 +49,9 @@
 print("Install using 'sudo pip install unidecode'")
 sys.exit(-1)
 
+if sys.version_info[0] >= 3:
+raw_input = input
--- End diff --

It creates a new function called __raw_input()__ that is identical to the 
builtin __input()__.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-06-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r194292645
  
--- Diff: dev/merge_spark_pr.py ---
@@ -39,6 +39,9 @@
 except ImportError:
 JIRA_IMPORTED = False
 
+if sys.version_info[0] >= 3:
+raw_input = input
--- End diff --

Does this script run with Python 3+ now?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-06-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r194292662
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -49,6 +49,9 @@
 print("Install using 'sudo pip install unidecode'")
 sys.exit(-1)
 
+if sys.version_info[0] >= 3:
+raw_input = input
--- End diff --

Does this script work in Python 3+ now?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-06-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r194292586
  
--- Diff: python/pyspark/sql/conf.py ---
@@ -59,7 +62,7 @@ def unset(self, key):
 
 def _checkType(self, obj, identifier):
 """Assert that an object is of type str."""
-if not isinstance(obj, str) and not isinstance(obj, unicode):
+if not isinstance(obj, basestring):
--- End diff --

This is fine since we rely on short-circuiting.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-06-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r194292548
  
--- Diff: python/pyspark/streaming/dstream.py ---
@@ -23,6 +23,8 @@
 
 if sys.version < "3":
 from itertools import imap as map, ifilter as filter
+else:
+long = int
--- End diff --

Can you add a test for it? Seems only used once and shouldn't be difficult 
to add a test.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-04-12 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r181192109
  
--- Diff: python/pyspark/cloudpickle.py ---
@@ -802,9 +802,8 @@ def save_not_implemented(self, obj):
 self.save_reduce(_gen_not_implemented, ())
 
 if PY3:
-dispatch[io.TextIOWrapper] = save_file
-else:
-dispatch[file] = save_file
+file = io.TextIOWrapper
+dispatch[file] = save_file
--- End diff --

This was merged upstream.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-03-26 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r177176851
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -149,7 +152,11 @@ def get_commits(tag):
 if not is_valid_author(author):
 author = github_username
 # Guard against special characters
-author = unidecode.unidecode(unicode(author, "UTF-8")).strip()
+try:   # Python 2
+author = unicode(author, "UTF-8")
+except NameError:  # Python 3
+author = str(author)
+author = unidecode.unidecode(author).strip()
--- End diff --

could you just put this above:
```
if sys.version > '3':
unicode = str
```
and then just do?
```
author = unidecode.unidecode(unicode(author)).strip()
```

I don't think you need to specify "UTF-8" because either way it will be a 
unicode object


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-03-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r176904365
  
--- Diff: python/pyspark/cloudpickle.py ---
@@ -802,9 +802,8 @@ def save_not_implemented(self, obj):
 self.save_reduce(_gen_not_implemented, ())
 
 if PY3:
-dispatch[io.TextIOWrapper] = save_file
-else:
-dispatch[file] = save_file
+file = io.TextIOWrapper
+dispatch[file] = save_file
--- End diff --

I think this one is actually related with cloudpickle's PR.

I was trying to (exactly) match this file to a specific version of 
cloudpickle (which is currently 0.4.3 - 
https://github.com/cloudpipe/cloudpickle/releases/tag/v0.4.3). So, I thought we 
could wait for more feedback there. 

At least, I was thinking that we should match it to 
https://github.com/cloudpipe/cloudpickle/tree/0.4.x If that one is merged, I 
could backport that change into 0.4.x branch.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r175240674
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -149,7 +152,11 @@ def get_commits(tag):
 if not is_valid_author(author):
 author = github_username
 # Guard against special characters
-author = unidecode.unidecode(unicode(author, "UTF-8")).strip()
+try:   # Python 2
--- End diff --

Two spaces before the inlined comment .. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r175000904
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -49,6 +49,11 @@
 print("Install using 'sudo pip install unidecode'")
 sys.exit(-1)
 
+try:
+raw_input  # Python 2
--- End diff --

Shall we keep this consistent for now? There are a hell of a lot things to 
fix in PySpark in that way.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-03-15 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r175000559
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -49,6 +49,11 @@
 print("Install using 'sudo pip install unidecode'")
 sys.exit(-1)
 
+try:
+raw_input  # Python 2
--- End diff --

“Where practical, apply the Python porting best practice: [Use feature 
detection instead of version 
detection](https://docs.python.org/3/howto/pyporting.html#use-feature-detection-instead-of-version-detection).”


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r175000330
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -49,6 +49,11 @@
 print("Install using 'sudo pip install unidecode'")
 sys.exit(-1)
 
+try:
+raw_input  # Python 2
--- End diff --

and shall we do this with if-else of Python version to be consistent with 
other places?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r17582
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -49,6 +49,11 @@
 print("Install using 'sudo pip install unidecode'")
 sys.exit(-1)
 
+try:
+raw_input  # Python 2
--- End diff --

Shall we put two spaces before lined comment?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-03-15 Thread cclauss
GitHub user cclauss opened a pull request:

https://github.com/apache/spark/pull/20838

[SPARK-23698] Resolve undefined names in Python 3

## What changes were proposed in this pull request?

Fix issues arising from the fact that __file__, __long__, __raw_input()__, 
__unicode__, __xrange()__, etc. were remove from Python 3.  __Undefined names__ 
have the potential to raise NameError at runtime.  Apply the Python porting 
best practice: [Use feature detection instead of version 
detection](https://docs.python.org/3/howto/pyporting.html#use-feature-detection-instead-of-version-detection).
  

## How was this patch tested?
* $ __python2 -m flake8 . --count --select=E9,F82 --show-source 
--statistics__
* $ __python3 -m flake8 . --count --select=E9,F82 --show-source 
--statistics__

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cclauss/spark fix-undefined-names

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20838.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20838


commit bdcd740a5144efef0db1f2b6c29b64c85f3d8ef5
Author: cclauss 
Date:   2018-03-15T23:08:04Z

[SPARK-23698] Reduce undefined names in Python 3




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org