Github user mengxr commented on the issue:
https://github.com/apache/spark/pull/21483
@HyukjinKwon Sorry I created the JIRA in a rush. I think there are two
issues:
1) Modules under pyspark.ml are not imported by default, e.g., clustering,
image, etc. So users cannot do
~~~python
import pyspark.ml as ml
ml.image.ImageSchema.readImages(...)
# or
kmeans = ml.clustering.KMeans(...)
~~~
We should include the modules under `pyspark.ml` in `ml/__init__.py`'s
`__all__`. It might change the behavior if users' code is
~~~python
from pyspark.ml import *
~~~
That star import might then load modules the user never uses. But overall I
think it would be a good change.
2) image.py doesn't have `__all__` defined. This makes it hard to figure
out what names are imported. We should add `__all__ = ["ImageSchema"]` to
image.py.
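To illustrate what 2) changes, here is a minimal sketch that does not touch pyspark. It builds two in-memory stand-in modules (the names `demo_image_without_all`, `demo_image_with_all`, and `helper` are hypothetical, chosen only for this example) and compares what a star import exposes with and without `__all__`.

```python
import sys
import types

# Source for a stand-in module; it is NOT the real pyspark/ml/image.py.
SOURCE = '''
import os

class ImageSchema(object):
    pass

def helper():
    pass
'''


def make_module(name, source):
    """Create an in-memory module from source and register it for import."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


def star_import_names(module_name):
    """Return the names that `from <module_name> import *` brings in."""
    namespace = {}
    exec("from %s import *" % module_name, namespace)
    namespace.pop("__builtins__", None)  # added by exec, not by the import
    return sorted(namespace)


make_module("demo_image_without_all", SOURCE)
make_module("demo_image_with_all", SOURCE + '\n__all__ = ["ImageSchema"]\n')

# Without __all__, every public name leaks out, including the imported `os`.
names_without_all = star_import_names("demo_image_without_all")
# With __all__, only the listed name is exported.
names_with_all = star_import_names("demo_image_with_all")

print(names_without_all)
print(names_with_all)
```

In this sketch the module without `__all__` also exports `helper` and the re-exported `os`, which is the ambiguity that defining `__all__` removes.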
I think in this PR we should do 2). I can create a new JIRA for 1).
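For reference when filing 1), the sketch below shows the general Python behavior involved, using a throwaway package written to a temporary directory. The package name `demo_ml` and its `clustering` submodule are hypothetical stand-ins, not pyspark code.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway package to disk. "demo_ml" is a hypothetical stand-in
# for a package whose __init__.py lists a submodule in __all__ but does
# not import it.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_ml")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write('__all__ = ["clustering"]\n')
with open(os.path.join(pkg_dir, "clustering.py"), "w") as f:
    f.write("class KMeans(object):\n    pass\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_ml

# A plain package import does not load submodules on its own.
visible_after_plain_import = hasattr(demo_ml, "clustering")

# A star import loads every submodule named in the package's __all__.
namespace = {}
exec("from demo_ml import *", namespace)
visible_after_star_import = "clustering" in namespace
loaded_as_side_effect = "demo_ml.clustering" in sys.modules

print(visible_after_plain_import)
print(visible_after_star_import)
print(loaded_as_side_effect)
```

Under these assumptions, listing a submodule in `__all__` only affects the star import. For attribute access after `import demo_ml` to work, the package's `__init__.py` would also need to import the submodule explicitly.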