GitHub user gengliangwang opened a pull request:
https://github.com/apache/spark/pull/21742
[SPARK-24768][SQL] Have a built-in AVRO data source implementation
## What changes were proposed in this pull request?
Apache Avro (https://avro.apache.org) is a popular data serialization
format. It is widely used in the Spark and Hadoop ecosystems, especially in
Kafka-based data pipelines. Using the external package
https://github.com/databricks/spark-avro, Spark SQL can read and write Avro
data. Making spark-avro built-in would provide a better experience for first-time
users of Spark SQL and Structured Streaming. We expect the built-in Avro data
source to further improve the adoption of Structured Streaming.
The proposal is to inline the code from the spark-avro package
(https://github.com/databricks/spark-avro). The target release is Spark 2.4.
[Built-in AVRO Data Source In Spark
2.4.pdf](https://github.com/apache/spark/files/2181511/Built-in.AVRO.Data.Source.In.Spark.2.4.pdf)
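For context, a rough sketch of the user-facing difference (the short format
name `"avro"` for the built-in source is an assumption for illustration; the
external package is addressed by its fully qualified name):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("avro-example").getOrCreate()

// Today, with the external spark-avro package, the fully qualified
// format name must be used:
val dfExternal = spark.read.format("com.databricks.spark.avro").load("users.avro")

// With a built-in data source, a short format name could be used instead
// ("avro" is assumed here for illustration):
val df = spark.read.format("avro").load("users.avro")
df.write.format("avro").save("output")
```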
## How was this patch tested?
Unit tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gengliangwang/spark export_avro
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21742.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21742
----
commit 16d08afe5950ce3728c9be256945567c37346da9
Author: Gengliang Wang <gengliang.wang@...>
Date: 2018-07-10T14:13:55Z
initial import
----
---