GitBox Sun, 21 Nov 2021 11:39:50 -0800


zero323 commented on a change in pull request #34513:
URL: https://github.com/apache/spark/pull/34513#discussion_r753783981




##########
File path: python/pyspark/mllib/linalg/__init__.pyi
##########
@@ -68,6 +68,7 @@ class Vector:
     __UDT__: VectorUDT
     def toArray(self) -> ndarray: ...
     def asML(self) -> newlinalg.Vector: ...
+    def __len__(self) -> int: ...

Review comment:
       This doesn't look right. ATM `Vector` has no `__len__` method (only 
subclasses do), so it shouldn't have an annotation for it.
   
   I believe we should `type: ignore` this for now, and decide if and how to 
adjust parent class definitions separately.
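
To illustrate the concern, here is a minimal sketch with hypothetical stand-in classes (`BaseVector` and `DenseLike` are not the pyspark classes). The base class defines no `__len__`, so `len()` only works on the subclass, and annotating the base would promise more than the runtime delivers. The `ignore` code shown is the one mypy typically reports for this call; treat it as an assumption.

```python
from typing import List


class BaseVector:
    """Hypothetical stand-in for a parent class that defines no __len__."""

    def to_list(self) -> List[float]:
        raise NotImplementedError


class DenseLike(BaseVector):
    """Hypothetical subclass that does define __len__."""

    def __init__(self, values: List[float]) -> None:
        self.values = list(values)

    def to_list(self) -> List[float]:
        return list(self.values)

    def __len__(self) -> int:
        return len(self.values)


def size_of(v: BaseVector) -> int:
    # The base class is not Sized, so a type checker rejects len(v).
    # The call is suppressed locally instead of adding __len__ to the base.
    return len(v)  # type: ignore[arg-type]


def base_has_len() -> bool:
    # True only if the base class itself defines __len__.
    return hasattr(BaseVector, "__len__")
```

Suppressing at the call site keeps the stub honest about the parent class until its definition is revisited separately.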

##########
File path: python/pyspark/mllib/stat/_statistics.py
##########
@@ -170,10 +190,29 @@ def corr(x, y=None, method=None):
         if not y:
             return callMLlibFunc("corr", x.map(_convert_to_vector), method).toArray()
         else:
-            return callMLlibFunc("corr", x.map(float), y.map(float), method)
+            return callMLlibFunc(
+                "corr", x.map(float), y.map(float), method  # type: ignore[arg-type]

Review comment:
       `float` is `Callable`. What is the exact error you get here if it is not 
ignored?
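
As a small sketch of the point about `float`: it is a class, and classes are callable, so it should satisfy a `Callable` parameter. The `map_values` helper below is hypothetical and only mirrors the shape of a map-style API. It does not reproduce the error being asked about.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_values(f: Callable[[T], U], xs: Iterable[T]) -> List[U]:
    # Hypothetical helper: apply f to every element and collect the results.
    return [f(x) for x in xs]


# float is passed directly as the callable, with no wrapper or lambda.
converted = map_values(float, [1, 2, 3])
```

If an `arg-type` error still appears without the ignore, the cause is more likely the declared parameter type of the receiving function than `float` itself.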

##########
File path: python/pyspark/mllib/stat/_statistics.py
##########
@@ -16,51 +16,57 @@
 #
 
 import sys
+from typing import overload, List, Optional, Union
+from typing_extensions import Literal
+
+from numpy import ndarray
 
 from pyspark.rdd import RDD
 from pyspark.mllib.common import callMLlibFunc, JavaModelWrapper
-from pyspark.mllib.linalg import Matrix, _convert_to_vector
+from pyspark.mllib.linalg import Matrix, Vector, _convert_to_vector  # type: ignore[attr-defined]

Review comment:
       This seems to indicate that we should annotate `__init__.py` first, but 
I guess that's too late.

##########
File path: python/pyspark/mllib/stat/_statistics.py
##########
@@ -16,51 +16,57 @@
 #
 
 import sys
+from typing import overload, List, Optional, Union
+from typing_extensions import Literal

Review comment:
       We should never import `typing_extensions` directly in a `py` module. 
Let's move this and `CorrelationMethod` to `_typing.pyi`.
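
One way this could look on the `py` side is sketched below. It assumes the alias is defined in the stub module and imported only under `TYPE_CHECKING`. The import path follows the suggestion above, and `describe_method` is a hypothetical helper used only for illustration.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Seen only by the type checker; nothing is imported at runtime, so the
    # module needs neither typing_extensions nor the stub when it executes.
    # The path and alias name are assumptions based on the comment above.
    from pyspark.mllib._typing import CorrelationMethod


def describe_method(method: Optional["CorrelationMethod"] = None) -> str:
    # Hypothetical helper. The quoted annotation is never evaluated at runtime.
    return "default" if method is None else str(method)
```

With this layout the `Literal` import lives only in the stub, and the runtime module depends on nothing beyond the standard `typing` module.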

##########
File path: python/pyspark/mllib/linalg/__init__.pyi
##########
@@ -68,6 +68,7 @@ class Vector:
     __UDT__: VectorUDT
     def toArray(self) -> ndarray: ...
     def asML(self) -> newlinalg.Vector: ...
+    def __len__(self) -> int: ...

Review comment:
       For the record: SPARK-37431.





[GitHub] [spark] zero323 commented on a change in pull request #34513: [SPARK-37234][PYTHON] Inline type hints for python/pyspark/mllib/stat/_statistics.py