Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148027184
--- Diff: python/pyspark/ml/image.py ---
@@ -0,0 +1,139 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark.ml.util import *
+from pyspark.ml.param.shared import *
+from pyspark.sql.types import *
+from pyspark.sql.types import Row, _create_row
+from pyspark.sql import DataFrame, SparkSession, SQLContext
+import numpy as np
+
+undefinedImageType = "Undefined"
+
+imageFields = ["origin", "height", "width", "nChannels", "mode", "data"]
+
+
+def getOcvTypes(spark=None):
+    """
+    Returns the OpenCV type mapping supported
+
+    :param sparkSession (SparkSession): The current spark session
+    :rtype dict: The OpenCV type mapping supported
+
+    .. versionadded:: 2.3.0
+    """
+    spark = spark or SparkSession.builder.getOrCreate()
+    ctx = spark.sparkContext
+    return ctx._jvm.org.apache.spark.ml.image.ImageSchema.ocvTypes
+
+
+# DataFrame with a single column of images named "image" (nullable)
+def getImageSchema(spark=None):
--- End diff --
Actually, I don't like the API inconsistency here. Should we match this to
the Scala API instead?
Could we do something like the following:
```python
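from pyspark import SparkContext
from pyspark.sql.types import _parse_datatype_json_string


class _ImageSchema(object):
    @property
    def imageSchema(self):
        # Read the schema from the Scala side so both APIs stay in sync.
        ctx = SparkContext._active_spark_context
        jschema = ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageSchema()
        return _parse_datatype_json_string(jschema.json())


ImageSchema = _ImageSchema()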
```
so that we can call:
```python
>>> from pyspark.ml.image import ImageSchema
>>> ImageSchema.imageSchema
StructType(List(StructField(image,StructType(List(StructField(origin,StringType,true),StructField(height,IntegerType,false),StructField(width,IntegerType,false),StructField(nChannels,IntegerType,false),StructField(mode,IntegerType,false),StructField(data,BinaryType,false))),true)))
```
just as in Scala:
```scala
scala> import org.apache.spark.ml.image.ImageSchema
import org.apache.spark.ml.image.ImageSchema
scala> ImageSchema.imageSchema
res0: org.apache.spark.sql.types.StructType =
StructType(StructField(image,StructType(StructField(origin,StringType,true),
StructField(height,IntegerType,false), StructField(width,IntegerType,false),
StructField(nChannels,IntegerType,false), StructField(mode,IntegerType,false),
StructField(data,BinaryType,false)),true))
```
Please let me know if anyone has opinions on this suggestion.
cc @jkbradley too. If I remember correctly, we talked about API consistency
before.