GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/21742

    [SPARK-24768][SQL] Have a built-in AVRO data source implementation

    ## What changes were proposed in this pull request?
    
    Apache Avro (https://avro.apache.org) is a popular data serialization 
format. It is widely used in the Spark and Hadoop ecosystems, especially for 
Kafka-based data pipelines. Using the external package 
https://github.com/databricks/spark-avro, Spark SQL can already read and write 
Avro data. Making spark-avro built-in would provide a better experience for 
first-time users of Spark SQL and Structured Streaming, and we expect a 
built-in Avro data source to further improve the adoption of Structured 
Streaming.
    The proposal is to inline the code from the spark-avro package 
(https://github.com/databricks/spark-avro). The target release is Spark 2.4.
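    As a rough sketch of what the built-in source would enable (assuming it 
keeps the short format name `avro` from the spark-avro package and the standard 
`DataFrameReader`/`DataFrameWriter` API; the final short name is not fixed by 
this message), reading and writing Avro could look like:

    ```scala
    import org.apache.spark.sql.SparkSession

    object AvroExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("avro-example")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Write a small DataFrame out as Avro files.
        val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
        df.write.format("avro").save("/tmp/users.avro")

        // Read the Avro files back into a DataFrame.
        val users = spark.read.format("avro").load("/tmp/users.avro")
        users.show()

        spark.stop()
      }
    }
    ```

    With the external package, the same calls require 
`format("com.databricks.spark.avro")` plus an extra `--packages` dependency; 
the built-in source removes that setup step.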
    
    [Built-in AVRO Data Source In Spark 
2.4.pdf](https://github.com/apache/spark/files/2181511/Built-in.AVRO.Data.Source.In.Spark.2.4.pdf)
    
    ## How was this patch tested?
    
    Unit test


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark export_avro

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21742.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21742
    
----
commit 16d08afe5950ce3728c9be256945567c37346da9
Author: Gengliang Wang <gengliang.wang@...>
Date:   2018-07-10T14:13:55Z

    initial import

----


---
