[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

HyukjinKwon Mon, 08 Oct 2018 20:37:14 -0700

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22675#discussion_r223552386
  
    --- Diff: docs/ml-datasource.md ---
    @@ -0,0 +1,51 @@
    +---
    +layout: global
    +title: Data sources
    +displayTitle: Data sources
    +---
    +
    +In this section, we introduce how to use data source in ML to load data.
    +Beside some general data sources "parquat", "csv", "json", "jdbc", we also provide some specific data source for ML.
    +
    +**Table of Contents**
    +
    +* This will become a table of contents (this text will be scraped).
    +{:toc}
    +
    +## Image data source
    +
    +This image data source is used to load libsvm data files from directory.
    +
    +<div class="codetabs">
    +<div data-lang="scala" markdown="1">
    +[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
    +implements Spark SQL data source API for loading image data as DataFrame.
    +The loaded DataFrame has one StructType column: "image". containing image data stored as image schema.
    +
    +{% highlight scala %}
    +scala> spark.read.format("image").load("data/mllib/images/origin")
    +res1: org.apache.spark.sql.DataFrame = [image: struct<origin: string, height: int ... 4 more fields>]
    +{% endhighlight %}
    +</div>
    +
    +<div data-lang="java" markdown="1">
    +[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
    --- End diff --
    
    Out of curiosity, why did we put the image source inside Spark rather than in a separate module? (See also https://github.com/apache/spark/pull/21742#discussion_r201552008.) Avro was put in a separate module.


