[GitHub] spark pull request #22121: [SPARK-25133][SQL][Doc]Avro data source guide

srowen Fri, 17 Aug 2018 07:15:21 -0700

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22121#discussion_r210922729
  
    --- Diff: docs/avro-data-source-guide.md ---
    @@ -0,0 +1,267 @@
    +---
    +layout: global
    +title: Avro Data Source Guide
    +---
    +
    +Since Spark 2.4 release, [Spark SQL](https://spark.apache.org/docs/latest/sql-programming-guide.html) provides support for reading and writing Avro data.
    +
    +## Deploying
    +The <code>spark-avro</code> module is external and not included in `spark-submit` or `spark-shell` by default.
    +
    +As with any Spark applications, `spark-submit` is used to launch your application. `spark-avro_{{site.SCALA_BINARY_VERSION}}`
    +and its dependencies can be directly added to `spark-submit` using `--packages`, such as,
    +
    +    ./bin/spark-submit --packages org.apache.spark:spark-avro_{{site.SCALA_BINARY_VERSION}}:{{site.SPARK_VERSION_SHORT}} ...
    +
    +For experimenting on `spark-shell`, you can also use `--packages` to add `org.apache.spark:spark-avro_{{site.SCALA_BINARY_VERSION}}` and its dependencies directly,
    +
    +    ./bin/spark-shell --packages org.apache.spark:spark-avro_{{site.SCALA_BINARY_VERSION}}:{{site.SPARK_VERSION_SHORT}} ...
    +
    +See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies.
    +
    +## Examples
    +
    +Since `spark-avro` module is external, there is not such API as <code>.avro</code> in 
    +<code>DataFrameReader</code> or <code>DataFrameWriter</code>.
    +To load/save data in Avro format, you need to specify the data source option <code>format</code> as short name <code>avro</code> or full name <code>org.apache.spark.sql.avro</code>.
    +<div class="codetabs">
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +
    +val usersDF = spark.read.format("avro").load("examples/src/main/resources/users.avro")
    +usersDF.select("name", "favorite_color").write.format("avro").save("namesAndFavColors.avro")
    +
    +{% endhighlight %}
    +</div>
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +
    +Dataset<Row> usersDF = spark.read().format("avro").load("examples/src/main/resources/users.avro");
    +usersDF.select("name", "favorite_color").write().format("avro").save("namesAndFavColors.avro");
    +
    +{% endhighlight %}
    +</div>
    +<div data-lang="python" markdown="1">
    +{% highlight python %}
    +
    +df = spark.read.format("avro").load("examples/src/main/resources/users.avro")
    +df.select("name", "favorite_color").write.format("avro").save("namesAndFavColors.avro")
    +
    +{% endhighlight %}
    +</div>
    +<div data-lang="r" markdown="1">
    +{% highlight r %}
    +
    +df <- read.df("examples/src/main/resources/users.avro", "avro")
    +write.df(select(df, "name", "favorite_color"), "namesAndFavColors.avro", "avro")
    +
    +{% endhighlight %}
    +</div>
    +</div>
    +
    +## Configuration
    --- End diff --
    
    Add a blank line after headings like this.


