GitHub user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19439#discussion_r148027184
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -0,0 +1,139 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark.ml.util import *
    +from pyspark.ml.param.shared import *
    +from pyspark.sql.types import *
    +from pyspark.sql.types import Row, _create_row
    +from pyspark.sql import DataFrame, SparkSession, SQLContext
    +import numpy as np
    +
    +undefinedImageType = "Undefined"
    +
    +imageFields = ["origin", "height", "width", "nChannels", "mode", "data"]
    +
    +
    +def getOcvTypes(spark=None):
    +    """
    +    Returns the OpenCV type mapping supported
    +
    +    :param sparkSession (SparkSession): The current spark session
    +    :rtype dict: The OpenCV type mapping supported
    +
    +    .. versionadded:: 2.3.0
    +    """
    +    spark = spark or SparkSession.builder.getOrCreate()
    +    ctx = spark.sparkContext
    +    return ctx._jvm.org.apache.spark.ml.image.ImageSchema.ocvTypes
    +
    +
    +# DataFrame with a single column of images named "image" (nullable)
    +def getImageSchema(spark=None):
    --- End diff --
    
    Actually, I don't think I like this API inconsistency. Should we maybe
    match this to the Scala one?
    
    Could we maybe do this as below:
    
    ```python
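    from pyspark import SparkContext
    from pyspark.sql.types import _parse_datatype_json_string


    class _ImageSchema(object):
        @property
        def imageSchema(self):
            # Fetch the schema from the JVM side so that Python matches Scala.
            ctx = SparkContext._active_spark_context
            jschema = ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageSchema()
            return _parse_datatype_json_string(jschema.json())


    # Module-level instance, so callers can write ImageSchema.imageSchema.
    ImageSchema = _ImageSchema()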
    ```
    
    so that we can call:
    
    ```python
    >>> from pyspark.ml.image import ImageSchema
    >>> ImageSchema.imageSchema
    StructType(List(StructField(image,StructType(List(StructField(origin,StringType,true),StructField(height,IntegerType,false),StructField(width,IntegerType,false),StructField(nChannels,IntegerType,false),StructField(mode,IntegerType,false),StructField(data,BinaryType,false))),true)))
    ```
    
    like Scala
    
    ```scala
    scala> import org.apache.spark.ml.image.ImageSchema
    import org.apache.spark.ml.image.ImageSchema
    
    scala> ImageSchema.imageSchema
    res0: org.apache.spark.sql.types.StructType = StructType(StructField(image,StructType(StructField(origin,StringType,true), StructField(height,IntegerType,false), StructField(width,IntegerType,false), StructField(nChannels,IntegerType,false), StructField(mode,IntegerType,false), StructField(data,BinaryType,false)),true))
    ```
    
    Please let me know if anyone has some opinions about this suggestion.
    
    cc @jkbradley too. I think I had a talk about API consistency with you
    before, if I remember correctly.

