Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22675#discussion_r223552386
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,51 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources "parquat", "csv", "json", "jdbc", we also provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load libsvm data files from directory.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements Spark SQL data source API for loading image data as DataFrame.
+The loaded DataFrame has one StructType column: "image". containing image data stored as image schema.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct<origin: string, height: int ... 4 more fields>]
+{% endhighlight %}
+</div>
+
+<div data-lang="java" markdown="1">
+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
--- End diff --
Out of curiosity, why did we put the image source inside of Spark, rather
than in a separate module? (See also
https://github.com/apache/spark/pull/21742#discussion_r201552008.) Avro was
added as a separate module.
---