[GitHub] spark issue #22231: [SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle t...

2018-09-14 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/22231
  
@holdenk Can you please do me a big favor and create a separate PR that 
makes sure that we are doing at least __flake8 . --count 
--select=E901,E999,F821,F822,F823 --show-source --statistics__ on the whole 
codebase on Python 2 and Python 3?  When I create PRs on this project, I make 
so many mistakes that it takes pages of conversation and weeks of testing to 
get them merged.  I just want to make sure that the project never backslides on 
F821s because it has taken [a lot of 
effort](https://github.com/apache/spark/pulls?q=is%3Apr+author%3Acclauss+is%3Aclosed)
 to get us clean.  Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22231: [SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle t...

2018-09-14 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/22231
  
Removed the escaped backticks.

@HyukjinKwon Undefined names (F821) is a flake8 error, not a PyCodeStyle 
error.  That means that PyCodeStyle on its own (this PR) is unable to detect 
the problem with `__version__`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22231: [SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle t...

2018-09-13 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/22231
  
OK...  Removed backticks and we are once again testing 
./dev/run-tests-jenkins.py


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22231: [SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle t...

2018-09-13 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/22231
  
OK...  __# noqa__ works within the flake8 wrapper around PyFlakes but does 
not work when PyFlakes is called outside of flake8.  Also added r"comment" as 
you suggested.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22231: [SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle t...

2018-09-12 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/22231
  
@srowen Following the merge of #22400 there are just three W605s at the 
bottom of 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96020/console
  Should we __# noqa__ those lines or do you have a slicker solution/  Thanks 
massively for your persistence.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22231: [SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle t...

2018-09-12 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/22231
  
Let's wait until #22400 is merged before proceeding with this PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warni...

2018-09-12 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/22400#discussion_r216926889
  
--- Diff: dev/create-release/generate-contributors.py ---
@@ -88,7 +88,7 @@ def print_indented(_list):
 
 
 def is_release(commit_title):
-return re.findall("\[release\]", commit_title.lower()) or \
+return re.findall(r"\[release\]", commit_title.lower()) or \
--- End diff --

1. Could we use parents to remove line terminating backslashes as 
recommended in PEP8?
2. Could we get rid of the use of __re__ in this instance with
```python
return ("[release]", commit_title.lower() or
 "preparing spark release" in commit_title.lower() or
 "preparing development version" in commit_title.lower() or
 "CHANGES.txt" in commit_title)
 ```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22231: [SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle t...

2018-09-10 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/22231
  
@srowen Could you please provide some test cases in Python 2 and Python 3 
to prove that the changes that you propose will be properly interpreted without 
side effects in both?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22266: [SPARK-25270] lint-python: Add flake8 to find syn...

2018-08-30 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/22266#discussion_r214201610
  
--- Diff: dev/lint-python ---
@@ -82,6 +82,25 @@ else
 rm "$PYCODESTYLE_REPORT_PATH"
 fi
 
+# stop the build if there are Python syntax errors or undefined names
+flake8 . --count --select=E901,E999,F821,F822,F823 --show-source 
--statistics
--- End diff --

My take is that we want contributors to be able to locally detect syntax 
errors and undefined names _before_ they submit a pull request.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22266: [SPARK-25270] lint-python: Add flake8 to find syn...

2018-08-30 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/22266#discussion_r214196909
  
--- Diff: dev/lint-python ---
@@ -82,6 +82,26 @@ else
 rm "$PYCODESTYLE_REPORT_PATH"
 fi
 
+python -m pip install flake8
--- End diff --

Removed


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22266: [SPARK-25270] lint-python: Add flake8 to find syntax err...

2018-08-29 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/22266
  
@HyukjinKwon Done.
@viirya Done.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22266: lint-python: Add flake8 to find syntax errors and undefi...

2018-08-29 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/22266
  
These tests should fail until #22265 or similar is merged.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22266: lint-python: Add flake8 tests to find Python synt...

2018-08-29 Thread cclauss
GitHub user cclauss opened a pull request:

https://github.com/apache/spark/pull/22266

lint-python: Add flake8 tests to find Python syntax errors and undefi…

…ned names

Add [flake8](http://flake8.pycqa.org) tests to find Python syntax errors 
and undefined names.

__E901,E999,F821,F822,F823__ are the "_showstopper_" flake8 issues that can 
halt the runtime with a SyntaxError, NameError, etc. Most other flake8 issues 
are merely "style violations" -- useful for readability but they do not effect 
runtime safety.
* F821: undefined name `name`
* F822: undefined name `name` in `__all__`
* F823: local variable name referenced before assignment
* E901: SyntaxError or IndentationError
* E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cclauss/spark patch-3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22266.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22266


commit 129119803eacbca1a53384893424e4ae75774626
Author: cclauss 
Date:   2018-08-29T09:46:38Z

lint-python: Add flake8 tests to find Python syntax errors and undefined 
names

Add [flake8](http://flake8.pycqa.org) tests to find Python syntax errors 
and undefined names.

__E901,E999,F821,F822,F823__ are the "_showstopper_" flake8 issues that can 
halt the runtime with a SyntaxError, NameError, etc. Most other flake8 issues 
are merely "style violations" -- useful for readability but they do not effect 
runtime safety.
* F821: undefined name `name`
* F822: undefined name `name` in `__all__`
* F823: local variable name referenced before assignment
* E901: SyntaxError or IndentationError
* E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22265: Undefined name: from pyspark.util import _excepti...

2018-08-29 Thread cclauss
GitHub user cclauss opened a pull request:

https://github.com/apache/spark/pull/22265

Undefined name: from pyspark.util import _exception_message

[flake8](http://flake8.pycqa.org) testing of 
https://github.com/apache/spark on Python 3.7.0

$ __flake8 . --count --select=E901,E999,F821,F822,F823 --show-source 
--statistics__
```
./python/pyspark/java_gateway.py:172:20: F821 undefined name 
'_exception_message'
emsg = _exception_message(e)
   ^
1 F821 undefined name '_exception_message'
1
```
@HyukjinKwon

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cclauss/spark patch-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22265.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22265


commit 197ef63b57f089bc69c08cc9b975df79a73d
Author: cclauss 
Date:   2018-08-29T09:24:38Z

Undefined name: from pyspark.util import _exception_message

[flake8](http://flake8.pycqa.org) testing of 
https://github.com/apache/spark on Python 3.7.0

$ __flake8 . --count --select=E901,E999,F821,F822,F823 --show-source 
--statistics__
```
./python/pyspark/java_gateway.py:172:20: F821 undefined name 
'_exception_message'
emsg = _exception_message(e)
   ^
1 F821 undefined name '_exception_message'
1
```
@HyukjinKwon




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22231: [25238][PYTHON] lint-python: Upgrade pycodestyle to v2.4...

2018-08-24 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/22231
  
https://issues.apache.org/jira/browse/SPARK-25238


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22231: [MINOR][PYTHON] lint-python: Upgrade pycodestyle ...

2018-08-24 Thread cclauss
GitHub user cclauss opened a pull request:

https://github.com/apache/spark/pull/22231

[MINOR][PYTHON] lint-python: Upgrade pycodestyle to v2.4.0

See https://pycodestyle.readthedocs.io/en/latest/developer.html#changes for 
changes made in this release.

## What changes were proposed in this pull request?

Upgrade pycodestyle to v2.4.0

## How was this patch tested?

__pycodestyle__

Please review http://spark.apache.org/contributing.html before opening a 
pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cclauss/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22231.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22231


commit 00f34cafbe4b3526b48ac794677b9569e58ae01d
Author: cclauss 
Date:   2018-08-25T06:04:29Z

[MINOR][PYTHON] lint-python: Upgrade pycodestyle to v2.4.0

See https://pycodestyle.readthedocs.io/en/latest/developer.html#changes for 
changes made in this release.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22214: [SPARK-23698][Python] Avoid 'undefined name' by d...

2018-08-24 Thread cclauss
Github user cclauss closed the pull request at:

https://github.com/apache/spark/pull/22214


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22214: [SPARK-23698][Python] Avoid 'undefined name' by d...

2018-08-24 Thread cclauss
GitHub user cclauss opened a pull request:

https://github.com/apache/spark/pull/22214

[SPARK-23698][Python] Avoid 'undefined name' by defining __version__

## What changes were proposed in this pull request?

Prevent linters from raising __undefined name '\_\_version\_\_'__ by 
initializing the variable before setting it via a call to __exec()__.  This is 
the last remaining issue related to the work done in #20838

## How was this patch tested?
* $ __python2 -m flake8 . --count --select=E9,F82 --show-source 
--statistics__
* $ __python3 -m flake8 . --count --select=E9,F82 --show-source 
--statistics__

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cclauss/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22214.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22214


commit a077542e67213c0add6093c0bf8c7568b6cb7b32
Author: cclauss 
Date:   2018-08-24T09:36:07Z

[SPARK-23698][Python] Avoid 'undefined name' by defining __version__

Prevent linters from raising __undefined name '\_\_version\_\_'__ by 
initializing the variable before setting it via a call to __exec()__.  This is 
the last remaining issue related to the work done in #20838




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22128: Add test_slice() to streaming BasicOperations

2018-08-21 Thread cclauss
Github user cclauss closed the pull request at:

https://github.com/apache/spark/pull/22128


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-20 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/20838
  
Thanks massively for this.  I doubt that I _ever_ would have gotten to that 
on my own.  This is a test so my proposal would be that _you create a separate 
PR_ so that we are all assured that it passes in the current codebase.  Once 
that PR has been merged, I can come back and finish this PR.  Thanks again.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-18 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/20838
  
This is not working at all...  I am wasting way too much time.  5+ months 
and 80+ comments for 12 lines of code is 

I do not have the skills to solve the following undefined name 'long' in a 
satisfactory manner:
```
./python/pyspark/streaming/dstream.py:405:35: F821 undefined name 'long'
return self._sc._jvm.Time(long(timestamp * 1000))
  ^
```
If someone with more skills would be willing to take that one undefined 
name off my plate and solve it with a test in a separate PR then I would be 
grateful.  I will study that PR carefully and can then proceed with the others 
that are in this PR.

My recommended fix is at 
https://github.com/apache/spark/pull/20838/files#diff-6c576c52abc0624ccb6a2f45828dc6a7
 and my proposed test (it is failing!) is immediately following.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-17 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/20838
  
It was reverted because [__slice_test()__](#22128) was causing the build to 
fail.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22128: Add test_slice() to streaming BasicOperations

2018-08-16 Thread cclauss
GitHub user cclauss opened a pull request:

https://github.com/apache/spark/pull/22128

Add test_slice() to streaming BasicOperations

As suggested in 
https://github.com/apache/spark/pull/20838#pullrequestreview-139118618

## What changes were proposed in this pull request?
Add a test for slice operations on streams.

(Please fill in changes proposed in this fix)

## How was this patch tested?
It is a new test being added to the automated test suite.

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cclauss/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22128.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22128


commit 4094422d58077aa95129a7ec9fddf75c2e3af7a7
Author: cclauss 
Date:   2018-08-16T23:06:59Z

Add test_slice() to streaming BasicOperations

As suggested in 
https://github.com/apache/spark/pull/20838#pullrequestreview-139118618




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-10 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/20838
  
jenkins, retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-10 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/20838
  
@HyukjinKwon Your advise on next steps?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-04 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/20838
  
jenkins, retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-04 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/20838
  
May I please break this PR into pieces as I did in #21959 and #21960 so 
that we can isolate why tests keep failing?  These issues were all discovered 
using the same (flake8 F821) technique but they are in separate files and cover 
different undefined names so they could rightfully be part of separate PRs.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-02 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/20838
  
Several are fixed it this PR.  Some others are not fixable without 
resorting to using __#noqa__.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21960: [SPARK-23698] Remove unused definitions of long a...

2018-08-02 Thread cclauss
Github user cclauss closed the pull request at:

https://github.com/apache/spark/pull/21960


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21959: [SPARK-23698] Define xrange() for Python 3 in dum...

2018-08-02 Thread cclauss
Github user cclauss closed the pull request at:

https://github.com/apache/spark/pull/21959


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21960: [SPARK-23698] Remove unused definitions of long a...

2018-08-01 Thread cclauss
GitHub user cclauss opened a pull request:

https://github.com/apache/spark/pull/21960

[SPARK-23698] Remove unused definitions of long and unicode

__intlike__ and __unicode__ were defined but not used in the existing code. 
 __basestring()__was removed in Python 3 in favor of __str()__ because all str 
are Unicode.  This simple change resolves an Undefined Name (__long__) and was 
originally suggested in #20838 which is currently mired in 50+ comments on 
unrelated modifications.  @HyukjinKwon @holdenk

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cclauss/spark patch-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21960.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21960


commit 20d072e6e19114e485711b882ff5f48e48a884a1
Author: cclauss 
Date:   2018-08-02T05:24:03Z

[SPARK-23698] Remove unused definitions of long and unicode

__intlike__ and __unicode__ were defined but not used in the existing code. 
 __basestring()__was removed in Python 3 in favor of __str()__ because all str 
are Unicode.  This simple change resolves an Undefined Name (__long__) and was 
originally suggested in #20838 which is currently mired in 50+ comments on 
unrelated modifications.  @HyukjinKwon @holdenk




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21959: [SPARK-23698] Define xrange() for Python 3 in dumpdata_s...

2018-08-01 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/21959
  
As it says in the commit message, these changes are already in #20838 but 
that PR has been open for 139 days and has 50+ comments.  The only way that I 
seem to make progress is by opening separate PRs that are easier to review and 
approve.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21959: [SPARK-23698] Define xrange() for Python 3 in dum...

2018-08-01 Thread cclauss
GitHub user cclauss opened a pull request:

https://github.com/apache/spark/pull/21959

[SPARK-23698]  Define xrange() for Python 3 in dumpdata_script.py

__xrange()__ was removed in Python 3 in favor of __range()__.  This simple 
change removes three Undefined Names was originally suggested in #20838 which 
is currently mired in 50+ comments on unrelated modifications.  @HyukjinKwon 
@holdenk

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cclauss/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21959.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21959


commit 15676ab875838cc99f71fab6f8618c886b294b1a
Author: cclauss 
Date:   2018-08-02T05:13:37Z

[SPARK-23698]  Define xrange() for Python 3 in dumpdata_script.py

__xrange()__ was removed in Python 3 in favor of __range()__.  This simple 
change removes three Undefined Names was originally suggested in #20838 which 
is currently mired in 50+ comments on unrelated modifications.  @HyukjinKwon 
@holdenk




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-01 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/20838
  
@holdenk Did we miss the window?  I still count 10 undefined names in this 
repo.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-27 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r205933934
  
--- Diff: python/pyspark/streaming/tests.py ---
@@ -206,6 +207,22 @@ def func(dstream):
 expected = [[len(x)] for x in input]
 self._test_func(input, func, expected)
 
+def test_slice(self):
--- End diff --

@holdenk Comments please on __test_slice()__ and on the test results 
https://github.com/apache/spark/pull/20838#issuecomment-408566860


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-27 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r205870627
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -149,7 +152,11 @@ def get_commits(tag):
 if not is_valid_author(author):
 author = github_username
 # Guard against special characters
-author = unidecode.unidecode(unicode(author, "UTF-8")).strip()
+try:   # Python 2
+author = unicode(author, "UTF-8")
+except NameError:  # Python 3
+author = str(author)
+author = unidecode.unidecode(author).strip()
--- End diff --

> The way it is now [...] it's probably safer.

Let's agree to leave this as is in this PR.  EOL of Python 2 in 500 daze 
away so safe is better.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-20 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r204094593
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -149,7 +152,11 @@ def get_commits(tag):
 if not is_valid_author(author):
 author = github_username
 # Guard against special characters
-author = unidecode.unidecode(unicode(author, "UTF-8")).strip()
+try:   # Python 2
+author = unicode(author, "UTF-8")
+except NameError:  # Python 3
+author = str(author)
+author = unidecode.unidecode(author).strip()
--- End diff --

Yes... My other worry in Py2 would be if 
[__sys.setdefaultencoding()__](https://docs.python.org/2/library/sys.html#sys.setdefaultencoding)
 was set to somthing other that utf-8.  That method was also thankfully dropped 
in Py3.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-17 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/20838
  
Three (different) failures in a row. Should I break this PR into separate 
PRs?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21702: [SPARK-23698] Remove raw_input() from Python 2

2018-07-03 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/21702
  
Tested data entry...
In __./dev/merge_spark_pr.py__ just after __clean_up()__, I added the lines:
```
while not continue_maybe('y to conntinue'):
print('loop')
sys.exit()
```
Test: 'y' or 'Y' caused a loop while all others including "", "yes", "n", 
"N", "0", "1", "." caused an exit()
Identical results on both Python 2 and Python 3
---
In __./dev/create-release/releaseutils.py__ just after __yesOrNoPrompt()_ I 
added the lines:
```
while not yesOrNoPrompt('y to quit'):
print('got an 'n'')
sys.exit()
```
Test: 'y' caused an exit() while all others including "", "Y", "yes", "n", 
"N", "0", "1", "." caused a loop
Identical results on both Python 2 and Python 3



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21702: [SPARK-23698] Remove raw_input() from Python 2

2018-07-03 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/21702#discussion_r199723478
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -49,13 +49,16 @@
 print("Install using 'sudo pip install unidecode'")
 sys.exit(-1)
 
+if sys.version < '3':
+input = raw_input
--- End diff --

Two downsides to that approach:
1. We stick with the legacy Python syntax which will unnecessarily 
complicate our lives (and our diffs) [in 18 months](http://pythonclock.org)
2. This approach reduces the linting errors from 10 down to just 2.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21702: [SPARK-23698] Remove raw_input() from Python 2

2018-07-03 Thread cclauss
GitHub user cclauss opened a pull request:

https://github.com/apache/spark/pull/21702

[SPARK-23698] Remove raw_input() from Python 2

Signed-off-by: cclauss 

## What changes were proposed in this pull request?

Humans will be able to enter text in Python 3 prompts which they can not do 
today.
The Python builtin __raw_input()__ was removed in Python 3 in favor of 
__input()__.  This PR does the same thing in Python 2.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)
flake8 testing

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cclauss/spark python-fix-raw_input

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21702.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21702


commit 960769a735933d58d136ea068954ad83d4731b10
Author: cclauss 
Date:   2018-07-03T07:10:46Z

[SPARK-23698] Remove raw_input() from Python 2

Signed-off-by: cclauss 




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-03 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r199701965
  
--- Diff: dev/merge_spark_pr.py ---
@@ -39,6 +39,9 @@
 except ImportError:
 JIRA_IMPORTED = False
 
+if sys.version_info[0] >= 3:
+raw_input = input
--- End diff --

It creates a new function called __raw_input()__ that is identical to the 
builtin __input()__.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-03 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r199701953
  
--- Diff: python/pyspark/sql/conf.py ---
@@ -59,7 +62,7 @@ def unset(self, key):
 
 def _checkType(self, obj, identifier):
 """Assert that an object is of type str."""
-if not isinstance(obj, str) and not isinstance(obj, unicode):
+if not isinstance(obj, basestring):
--- End diff --

Is there an issue here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-03 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r199702001
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -49,6 +49,9 @@
 print("Install using 'sudo pip install unidecode'")
 sys.exit(-1)
 
+if sys.version_info[0] >= 3:
+raw_input = input
--- End diff --

It creates a new function called __raw_input()__ that is identical to the 
builtin __input()__.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-04-12 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r181192109
  
--- Diff: python/pyspark/cloudpickle.py ---
@@ -802,9 +802,8 @@ def save_not_implemented(self, obj):
 self.save_reduce(_gen_not_implemented, ())
 
 if PY3:
-dispatch[io.TextIOWrapper] = save_file
-else:
-dispatch[file] = save_file
+file = io.TextIOWrapper
+dispatch[file] = save_file
--- End diff --

This was merged upstream.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-04-12 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/20838
  
Other issues / suggestions?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-03-28 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/20838
  
The cloudpickle change is now merged upstream.  I would like to avoid 
@BryanCutler suggestion above because Unicode is really touchy in Python 2 and 
a lot can change based on 
[sys.setdefaultencoding()](https://docs.python.org/2/library/sys.html#sys.setdefaultencoding)
 so I would like to avoid assumptions and make as small modifications as 
possible.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-03-23 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/20838
  
@HyukjinKwon Was there something more to do on this one?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-03-15 Thread cclauss
Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r175000559
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -49,6 +49,11 @@
 print("Install using 'sudo pip install unidecode'")
 sys.exit(-1)
 
+try:
+raw_input  # Python 2
--- End diff --

“Where practical, apply the Python porting best practice: [Use feature 
detection instead of version 
detection](https://docs.python.org/3/howto/pyporting.html#use-feature-detection-instead-of-version-detection).”


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-03-15 Thread cclauss
GitHub user cclauss opened a pull request:

https://github.com/apache/spark/pull/20838

[SPARK-23698] Resolve undefined names in Python 3

## What changes were proposed in this pull request?

Fix issues arising from the fact that __file__, __long__, __raw_input()__, 
__unicode__, __xrange()__, etc. were remove from Python 3.  __Undefined names__ 
have the potential to raise NameError at runtime.  Apply the Python porting 
best practice: [Use feature detection instead of version 
detection](https://docs.python.org/3/howto/pyporting.html#use-feature-detection-instead-of-version-detection).
  

## How was this patch tested?
* $ __python2 -m flake8 . --count --select=E9,F82 --show-source 
--statistics__
* $ __python3 -m flake8 . --count --select=E9,F82 --show-source 
--statistics__

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cclauss/spark fix-undefined-names

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20838.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20838


commit bdcd740a5144efef0db1f2b6c29b64c85f3d8ef5
Author: cclauss 
Date:   2018-03-15T23:08:04Z

[SPARK-23698] Reduce undefined names in Python 3




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org