zero323 commented on a change in pull request #34513:
URL: https://github.com/apache/spark/pull/34513#discussion_r753783981
##########
File path: python/pyspark/mllib/linalg/__init__.pyi
##########
@@ -68,6 +68,7 @@ class Vector:
__UDT__: VectorUDT
def toArray(self) -> ndarray: ...
def asML(self) -> newlinalg.Vector: ...
+ def __len__(self) -> int: ...
Review comment:
This doesn't look right. ATM `Vector` has no `__len__` method (only
subclasses do), so it shouldn't have an annotation for it.
I believe we should `type: ignore` this for now, and decide if and how to
adjust parent class definitions separately.
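A minimal sketch of the concern, using hypothetical stand-in classes rather than the real `pyspark.mllib.linalg` ones: if only the subclass defines `__len__`, annotating it on the parent promises something the runtime cannot deliver.

```python
class Vector:
    """Hypothetical stand-in for the base class: defines no __len__."""

    def toArray(self):
        raise NotImplementedError


class DenseVector(Vector):
    """Hypothetical stand-in for a subclass that does define __len__."""

    def __init__(self, values):
        self.values = list(values)

    def toArray(self):
        return self.values

    def __len__(self):
        return len(self.values)


def base_supports_len() -> bool:
    # len() on an object whose class lacks __len__ raises TypeError.
    try:
        len(Vector())
    except TypeError:
        return False
    return True
```

With a stub that declares `__len__` on the parent, a type checker would accept `len(v)` for any `Vector`, even though the call fails at runtime for the base class in this sketch.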
##########
File path: python/pyspark/mllib/stat/_statistics.py
##########
@@ -170,10 +190,29 @@ def corr(x, y=None, method=None):
if not y:
return callMLlibFunc("corr", x.map(_convert_to_vector), method).toArray()
else:
- return callMLlibFunc("corr", x.map(float), y.map(float), method)
+ return callMLlibFunc(
+ "corr", x.map(float), y.map(float), method # type: ignore[arg-type]
Review comment:
`float` is `Callable`. What is the exact error you get here if it is not
ignored?
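For illustration only, a small sketch of why passing `float` where a callable is expected should normally type-check. `map_values` is a hypothetical stand-in for `RDD.map`, not the real signature, so it does not reproduce whatever error prompted the ignore.

```python
from typing import Callable, Iterable, List


def map_values(values: Iterable[str], f: Callable[[str], float]) -> List[float]:
    # Hypothetical stand-in for RDD.map: applies f to each element.
    return [f(v) for v in values]


# The built-in float type is itself callable, so it can be passed as f.
converted = map_values(["1", "2.5"], float)
```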
##########
File path: python/pyspark/mllib/stat/_statistics.py
##########
@@ -16,51 +16,57 @@
#
import sys
+from typing import overload, List, Optional, Union
+from typing_extensions import Literal
+
+from numpy import ndarray
from pyspark.rdd import RDD
from pyspark.mllib.common import callMLlibFunc, JavaModelWrapper
-from pyspark.mllib.linalg import Matrix, _convert_to_vector
+from pyspark.mllib.linalg import Matrix, Vector, _convert_to_vector # type: ignore[attr-defined]
Review comment:
This seems to indicate that we should annotate `__init__.py` first, but
I guess that's too late.
##########
File path: python/pyspark/mllib/stat/_statistics.py
##########
@@ -16,51 +16,57 @@
#
import sys
+from typing import overload, List, Optional, Union
+from typing_extensions import Literal
Review comment:
We should never import `typing_extensions` directly in a `py` module.
Let's move this and `CorrelationMethod` to `_typing.pyi`.
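One way this could look on the `py` side, as a rough sketch: the import path `pyspark.mllib._typing` and the simplified `corr` signature are assumptions for illustration, not the actual change. Because the import sits under `TYPE_CHECKING`, nothing from `typing_extensions` (or the stub module) is loaded at runtime.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Assumed location: CorrelationMethod defined in a stub-only module,
    # which is where the Literal import from typing_extensions would live.
    from pyspark.mllib._typing import CorrelationMethod


def corr(method: Optional["CorrelationMethod"] = None) -> str:
    # Hypothetical, simplified body: fall back to "pearson" when unset.
    return method or "pearson"
```

The quoted annotation keeps the name unresolved at runtime, so the module imports cleanly even where `typing_extensions` is not installed.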
##########
File path: python/pyspark/mllib/linalg/__init__.pyi
##########
@@ -68,6 +68,7 @@ class Vector:
__UDT__: VectorUDT
def toArray(self) -> ndarray: ...
def asML(self) -> newlinalg.Vector: ...
+ def __len__(self) -> int: ...
Review comment:
For the record SPARK-37431