Github user mengxr commented on the issue:

    https://github.com/apache/spark/pull/21483
  
    @HyukjinKwon Sorry, I created the JIRA in a rush. I think there are two issues:
    
    1) Modules under `pyspark.ml` (e.g., `clustering`, `image`) are not imported by default, so users cannot do
    
    ~~~python
    import pyspark.ml as ml
    
    ml.image.ImageSchema.readImages(...)
    
    # or
    
    kmeans = ml.clustering.KMeans(...)
    ~~~
    
    We should include the modules under `pyspark.ml` in `ml/__init__.py`'s `__all__`. It might change the behavior if a user's code is
    
    ~~~python
    from pyspark.ml import *
    ~~~
    
    It might load modules the user does not need. But overall I think it should be a good change.
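    
    As a rough illustration of the mechanics (not PySpark code), the sketch below builds two throwaway packages, `pkg_plain` and `pkg_eager`; both names are hypothetical. It assumes the eager package's `__init__.py` both imports the submodule and lists it in `__all__`: the import is what makes attribute access work after a plain package import, and `__all__` is what a star import picks up.
    
    ~~~python
    import importlib
    import sys
    import tempfile
    from pathlib import Path
    
    # Hypothetical throwaway packages; none of these names exist in PySpark.
    root = Path(tempfile.mkdtemp())
    for name, init_src in [
        ("pkg_plain", ""),  # __init__.py imports nothing
        ("pkg_eager", "from . import clustering\n__all__ = ['clustering']\n"),
    ]:
        pkg = root / name
        pkg.mkdir()
        (pkg / "__init__.py").write_text(init_src)
        (pkg / "clustering.py").write_text("class KMeans:\n    pass\n")
    
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    
    import pkg_plain
    import pkg_eager
    
    # Without an import in __init__.py, the submodule is not a package attribute.
    plain_has_clustering = hasattr(pkg_plain, "clustering")
    
    # With the import in __init__.py, attribute access works right away.
    eager_has_clustering = hasattr(pkg_eager, "clustering")
    kmeans = pkg_eager.clustering.KMeans()
    
    # A star import binds the names listed in __all__, here the submodule itself.
    star_ns = {}
    exec("from pkg_eager import *", star_ns)
    star_names = sorted(n for n in star_ns if not n.startswith("__"))
    
    print(plain_has_clustering, eager_has_clustering, star_names)
    ~~~
    
    The cost mentioned above shows up here too: importing `pkg_eager` always executes `clustering.py`, whether or not the caller uses it.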
    
    2) `image.py` doesn't define `__all__`, which makes it hard to figure out what names are imported. We should add `__all__ = ["ImageSchema"]` to `image.py`.
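    
    To show what `__all__` changes for a star import, here is a minimal sketch using an in-memory stand-in for `image.py`. The module names `fake_image_no_all` and `fake_image_with_all`, the helper `star_import_names`, and the module body are all hypothetical; only the `__all__ = ["ImageSchema"]` line comes from the proposal above.
    
    ~~~python
    import sys
    import types
    
    # Hypothetical stand-in for image.py; the real module's contents differ.
    source = '''
    import sys
    {all_line}
    
    class _ImageSchema:
        pass
    
    ImageSchema = _ImageSchema()
    '''.replace("\n    ", "\n")  # drop the indentation added by this example's layout
    
    
    def star_import_names(module_name, all_line):
        """Create an in-memory module and return the names a star import binds."""
        module = types.ModuleType(module_name)
        exec(source.format(all_line=all_line), module.__dict__)
        sys.modules[module_name] = module
        namespace = {}
        exec(f"from {module_name} import *", namespace)
        return sorted(n for n in namespace if not n.startswith("__"))
    
    
    # Without __all__, every name without a leading underscore leaks out,
    # including the module's own imports.
    without_all = star_import_names("fake_image_no_all", "")
    
    # With __all__, only the listed name is exported.
    with_all = star_import_names("fake_image_with_all", '__all__ = ["ImageSchema"]')
    
    print(without_all, with_all)
    ~~~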
    
    I think in this PR we should do 2). I can create a new JIRA for 1).

