Fokko commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r456679942
##########
File path: python/pyspark/mllib/classification.py
##########
@@ -102,6 +102,7 @@ class LogisticRegressionModel(LinearClassificationModel):
in Multinomial Logistic Regression. By default, it is binary
logistic regression so numClasses will be set to 2.
+ >>> from pyspark.mllib.linalg import SparseVector
Review comment:
The examples are actually run as part of the tests. If SparseVector is
already imported in the Python file itself, there is no issue. However, if the
import isn't used outside of the example, I would suggest moving it inside the
example. Otherwise the doctest fails with an error like this:
```
**********************************************************************
File "/home/runner/work/spark/spark/python/pyspark/mllib/feature.py", line 287, in __main__.ChiSqSelector
Failed example:
data = sc.parallelize([
LabeledPoint(0.0, SparseVector(3, {0: 8.0, 1: 7.0})),
LabeledPoint(1.0, SparseVector(3, {1: 9.0, 2: 6.0})),
LabeledPoint(1.0, [0.0, 9.0, 8.0]),
LabeledPoint(2.0, [7.0, 9.0, 5.0]),
LabeledPoint(2.0, [8.0, 7.0, 3.0])
])
Exception raised:
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.6.11/x64/lib/python3.6/doctest.py", line 1330, in __run
compileflags, 1), test.globs)
File "<doctest __main__.ChiSqSelector[0]>", line 2, in <module>
LabeledPoint(0.0, SparseVector(3, {0: 8.0, 1: 7.0})),
NameError: name 'LabeledPoint' is not defined
```
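A minimal sketch of the point above, not taken from the PR: the functions `scale` and `scale_without_import` and the helper `run_examples` are hypothetical, and `Fraction` stands in for `SparseVector` so the snippet runs without Spark. The helper runs each docstring with a namespace that contains only the function itself, which mimics a module that no longer imports the name at the top of the file.

```python
import doctest


def scale(values, factor):
    """Multiply every element of ``values`` by ``factor``.

    The import lives inside the example, so the doctest does not depend on
    the enclosing module importing ``Fraction``.

    >>> from fractions import Fraction
    >>> scale([Fraction(1, 2), Fraction(1, 4)], 2)
    [Fraction(1, 1), Fraction(1, 2)]
    """
    return [v * factor for v in values]


def scale_without_import(values, factor):
    """Same function, but the example relies on a module-level import.

    >>> scale_without_import([Fraction(1, 2)], 2)
    [Fraction(1, 1)]
    """
    return [v * factor for v in values]


def run_examples(func):
    """Run the doctest examples of ``func`` in an isolated namespace.

    Only ``func`` itself is visible to the examples, as if the module had
    no other imports. Returns doctest's (failed, attempted) result.
    """
    parser = doctest.DocTestParser()
    globs = {func.__name__: func}
    test = parser.get_doctest(func.__doc__, globs, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Swallow the failure report; callers only look at the counts.
    return runner.run(test, out=lambda text: None)


ok = run_examples(scale)
broken = run_examples(scale_without_import)
print(ok.failed, ok.attempted)
print(broken.failed, broken.attempted)
```

The second docstring fails with a `NameError`, the same kind of failure as in the log above, because nothing in its namespace defines `Fraction`.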
##########
File path: dev/lint-python
##########
@@ -147,7 +147,7 @@ flake8 checks failed."
fi
echo "starting $FLAKE8_BUILD test..."
- FLAKE8_REPORT=$( ($FLAKE8_BUILD . --count --select=E901,E999,F821,F822,F823 \
+ FLAKE8_REPORT=$( ($FLAKE8_BUILD . --count --select=E901,E999,F821,F822,F823,F401 \
Review comment:
I've enabled the rule in Flake8 :)
##########
File path: python/pyspark/__init__.py
##########
@@ -112,7 +112,7 @@ def wrapper(self, *args, **kwargs):
# for back compatibility
-from pyspark.sql import SQLContext, HiveContext, Row
+from pyspark.sql import SQLContext, HiveContext, Row # noqa: F401
Review comment:
@srowen I'd be happy to do some more cleanup, but ones like these can't
be removed easily. They provide backward compatibility and allow us to do the
following:
```bash
fokkodriesprong@Fan spark % python3
Python 3.7.7 (v3.7.7:d7c567b08f, Mar 10 2020, 02:56:16)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyspark import Row
>>> # This is allowed due to the import
```
For backward compatibility I would keep these, otherwise we might break a
lot of imports.
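To illustrate why such a re-export has to stay, here is a small self-contained sketch. The package `demo_pkg` and its `records` module are hypothetical and are created in a temporary directory; they only mirror the shape of `pyspark/__init__.py` re-exporting `Row` from `pyspark.sql`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical package layout: demo_pkg/__init__.py and demo_pkg/records.py
    pkg = Path(tmp) / "demo_pkg"
    pkg.mkdir()
    (pkg / "records.py").write_text("class Row(tuple):\n    pass\n")
    # The name is unused inside __init__.py, so Flake8 reports F401 unless
    # the line carries a noqa marker. It is still part of the public API.
    (pkg / "__init__.py").write_text(
        "from demo_pkg.records import Row  # noqa: F401\n"
    )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Works only because of the re-export in __init__.py.
        from demo_pkg import Row
        from demo_pkg.records import Row as OriginalRow

        same_object = Row is OriginalRow
    finally:
        sys.path.remove(tmp)

print(same_object)
```

Deleting the import line from `__init__.py` would satisfy the linter, but `from demo_pkg import Row` would then raise an `ImportError` for every caller that relies on the short path.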
##########
File path: python/pyspark/ml/tests/test_stat.py
##########
@@ -40,7 +40,7 @@ def test_chisquaretest(self):
if __name__ == "__main__":
- from pyspark.ml.tests.test_stat import *
+ from pyspark.ml.tests.test_stat import * # noqa: F401
Review comment:
@srowen We could consider removing these. There are some entries
for running the tests using the xmlrunner. However, it is unclear to me who's
using this, maybe the CI?