GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/22666

    [SPARK-25672][SQL] schema_of_csv() - schema inference from an example

    ## What changes were proposed in this pull request?
    
    In this PR, I propose adding a new function, *schema_of_csv()*, which infers 
the schema of a CSV string literal. The function returns a string containing 
the schema in DDL format. For example:
    
    ```sql
    select schema_of_csv('1|abc', map('delimiter', '|'))
    ``` 
    ```
    struct<_c0:int,_c1:string>
    ```
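
To illustrate the mapping from an example row to a DDL string, here is a minimal, self-contained Python sketch. It is not the code in this PR and does not call Spark: the names `infer_field_type` and `toy_schema_of_csv` are hypothetical, and the type rules are a simplification. It assumes only int, double, boolean and string, ignores quoting and escaping, and does not try long, decimal or timestamp.

```python
# Toy illustration only: hypothetical helpers, not Spark's CSVInferSchema.

def infer_field_type(value: str) -> str:
    """Return a simplified DDL type name for a single CSV field."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        pass
    if value.lower() in ("true", "false"):
        return "boolean"
    # Anything that does not parse as a narrower type falls back to string.
    return "string"


def toy_schema_of_csv(line: str, delimiter: str = ",") -> str:
    """Build a DDL-style struct string from one example CSV line."""
    fields = line.split(delimiter)  # naive split: no quote handling
    # Columns get positional names _c0, _c1, ... as in the example above.
    columns = [f"_c{i}:{infer_field_type(v)}" for i, v in enumerate(fields)]
    return "struct<" + ",".join(columns) + ">"


print(toy_schema_of_csv("1|abc", delimiter="|"))
```

For the input from the SQL example, `'1|abc'` with delimiter `'|'`, this sketch prints `struct<_c0:int,_c1:string>`, matching the output shown above.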
    
    ## How was this patch tested?
    
    Added new tests to `CsvFunctionsSuite` and `CsvExpressionsSuite`, and SQL 
tests to `csv-functions.sql`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 schema_of_csv-function

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22666.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22666
    
----
commit 4c00900e8bfbe56d13576d6dc21fb2f2dbbb105d
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-10-07T14:17:05Z

    Dependency of uniVocity 2.7.3 is added for sql/catalyst

commit 25f330a617e41c1207efd880be766136ce9b0bca
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-10-07T14:37:50Z

    Moving CSVOptions to sql/catalyst

commit 0d7e7990799a307794f10fe52030eca850762927
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-10-07T17:42:02Z

    Moving CSVInferSchema to sql/catalyst

commit 7abbfcae8444e88391e1d456a9a249fa5fccf6f0
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-16T19:12:58Z

    Added an expression test

commit 6ca4fa3e2bf6b29b82f1ece33c5a75beaf934d87
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-21T15:03:39Z

    Support options

commit e76536bfc62911c4e2039d4fc63d771b1c3b5fe1
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-21T16:05:55Z

    Register schema_of_csv and adding SQL tests

commit ef03d3a38e3a7a31a04cda901821238b01ec8f37
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-21T17:27:33Z

    Adding schema_of_csv and tests

commit 8ed225f3d2c5fbe3df75f8518d539fcdd5f01a2e
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-21T17:54:43Z

    Support schema_of_csv in PySpark

commit 5fb17fbefd52198bcf735abc132b0ab9174cbe0f
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-10-07T18:49:00Z

    2.5 -> 3.0

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
