Fokko commented on a change in pull request #29121:
URL: https://github.com/apache/spark/pull/29121#discussion_r456679942



##########
File path: python/pyspark/mllib/classification.py
##########
@@ -102,6 +102,7 @@ class LogisticRegressionModel(LinearClassificationModel):
       in Multinomial Logistic Regression. By default, it is binary
       logistic regression so numClasses will be set to 2.
 
+    >>> from pyspark.mllib.linalg import SparseVector

Review comment:
       The examples are actually run as tests. If `SparseVector` is imported in the 
Python file itself, there is no issue. However, if the import isn't used outside of 
the example, I would suggest moving it inside the example (see the sketch below the 
error output). An example error:
   ```
   **********************************************************************
   File "/home/runner/work/spark/spark/python/pyspark/mllib/feature.py", line 
287, in __main__.ChiSqSelector
   Failed example:
       data = sc.parallelize([
           LabeledPoint(0.0, SparseVector(3, {0: 8.0, 1: 7.0})),
           LabeledPoint(1.0, SparseVector(3, {1: 9.0, 2: 6.0})),
           LabeledPoint(1.0, [0.0, 9.0, 8.0]),
           LabeledPoint(2.0, [7.0, 9.0, 5.0]),
           LabeledPoint(2.0, [8.0, 7.0, 3.0])
       ])
   Exception raised:
       Traceback (most recent call last):
          File "/opt/hostedtoolcache/Python/3.6.11/x64/lib/python3.6/doctest.py", line 1330, in __run
           compileflags, 1), test.globs)
         File "<doctest __main__.ChiSqSelector[0]>", line 2, in <module>
           LabeledPoint(0.0, SparseVector(3, {0: 8.0, 1: 7.0})),
       NameError: name 'LabeledPoint' is not defined
   ```
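
   To illustrate what I mean, a minimal sketch (a made-up docstring, not the actual 
   `ChiSqSelector` one; `sc` is assumed to come from the doctest globals as in the 
   existing mllib doctests): importing inside the example makes the names available 
   to the doctest, so the module doesn't need a top-level import that is only used 
   by the docs.
   ```python
   class ChiSqSelectorSketch(object):
       """
       Hypothetical docstring showing the pattern.

       >>> from pyspark.mllib.linalg import SparseVector
       >>> from pyspark.mllib.regression import LabeledPoint
       >>> data = sc.parallelize([
       ...     LabeledPoint(0.0, SparseVector(3, {0: 8.0, 1: 7.0})),
       ...     LabeledPoint(1.0, SparseVector(3, {1: 9.0, 2: 6.0})),
       ... ])
       """
   ```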

##########
File path: dev/lint-python
##########
@@ -147,7 +147,7 @@ flake8 checks failed."
     fi
 
     echo "starting $FLAKE8_BUILD test..."
-    FLAKE8_REPORT=$( ($FLAKE8_BUILD . --count --select=E901,E999,F821,F822,F823 \
+    FLAKE8_REPORT=$( ($FLAKE8_BUILD . --count --select=E901,E999,F821,F822,F823,F401 \

Review comment:
       I've enabled the F401 (unused import) rule in flake8 :)
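
   To make it concrete, F401 flags imports that are never referenced. A throwaway 
   snippet (not actual Spark code) showing what it catches:
   ```python
   # scratch.py -- hypothetical file, only here to illustrate the F401 check
   import os   # never referenced below, so `flake8 --select=F401` flags this line
   import sys

   print(sys.version)  # sys is used, so only the `os` import gets reported
   ```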

##########
File path: python/pyspark/__init__.py
##########
@@ -112,7 +112,7 @@ def wrapper(self, *args, **kwargs):
 
 
 # for back compatibility
-from pyspark.sql import SQLContext, HiveContext, Row
+from pyspark.sql import SQLContext, HiveContext, Row  # noqa: F401

Review comment:
       @srowen I'd be happy to do some more cleanup, but imports like these can't 
be removed easily. They provide backward compatibility and allow us to do the 
following:
   ```bash
   fokkodriesprong@Fan spark % python3
   Python 3.7.7 (v3.7.7:d7c567b08f, Mar 10 2020, 02:56:16) 
   [Clang 6.0 (clang-600.0.57)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   >>> from pyspark import Row
   >>> # This is allowed due to the import
   ```
   For backward compatibility I would keep these; otherwise we might break a lot 
of imports.
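
   To show the pattern I mean (a hypothetical package, not the real 
   `pyspark/__init__.py`):
   ```python
   # pkg/__init__.py -- hypothetical package illustrating the re-export pattern
   # The import exists only so that `from pkg import Row` keeps working for old
   # code; without the noqa comment flake8 would flag it as F401 (unused import).
   from pkg.sql import Row  # noqa: F401
   ```
   An alternative would be listing the re-exported names in `__all__`, which 
   pyflakes also treats as a use, but the `noqa` comment keeps the change minimal.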

##########
File path: python/pyspark/ml/tests/test_stat.py
##########
@@ -40,7 +40,7 @@ def test_chisquaretest(self):
 
 
 if __name__ == "__main__":
-    from pyspark.ml.tests.test_stat import *
+    from pyspark.ml.tests.test_stat import *  # noqa: F401

Review comment:
       @srowen We could consider removing these. These entries are there for 
running the tests with xmlrunner; however, it is unclear to me who's using that 
path, maybe the CI?
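
   For context, roughly the pattern these `__main__` blocks follow (a sketch from 
   memory, assuming the `xmlrunner` package from unittest-xml-reporting; not the 
   exact file contents):
   ```python
   import unittest

   if __name__ == "__main__":
       # The star import that F401 complains about: nothing in this block
       # references the imported names directly.
       from pyspark.ml.tests.test_stat import *  # noqa: F401

       try:
           # xmlrunner writes JUnit-style XML reports, mainly useful for CI.
           import xmlrunner
           testRunner = xmlrunner.XMLTestRunner(output="target/test-reports", verbosity=2)
       except ImportError:
           testRunner = None
       unittest.main(testRunner=testRunner, verbosity=2)
   ```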



