GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22666
[SPARK-25672][SQL] schema_of_csv() - schema inference from an example

## What changes were proposed in this pull request?

In this PR, I propose to add a new function, *schema_of_csv()*, which infers the schema of a CSV string literal. The result of the function is a string containing the schema in DDL format. For example:

```sql
select schema_of_csv('1|abc', map('delimiter', '|'))
```
```
struct<_c0:int,_c1:string>
```

## How was this patch tested?

Added new tests to `CsvFunctionsSuite` and `CsvExpressionsSuite`, and SQL tests to `csv-functions.sql`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 schema_of_csv-function

Alternatively, you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22666.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22666

----
commit 4c00900e8bfbe56d13576d6dc21fb2f2dbbb105d
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-10-07T14:17:05Z

    Dependency of uniVocity 2.7.3 is added for sql/catalyst

commit 25f330a617e41c1207efd880be766136ce9b0bca
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-10-07T14:37:50Z

    Moving CSVOptions to sql/catalyst

commit 0d7e7990799a307794f10fe52030eca850762927
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-10-07T17:42:02Z

    Moving CSVInferSchema to sql/catalyst

commit 7abbfcae8444e88391e1d456a9a249fa5fccf6f0
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-16T19:12:58Z

    Added an expression test

commit 6ca4fa3e2bf6b29b82f1ece33c5a75beaf934d87
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-21T15:03:39Z

    Support options

commit e76536bfc62911c4e2039d4fc63d771b1c3b5fe1
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-21T16:05:55Z

    Register schema_of_csv and adding SQL tests

commit ef03d3a38e3a7a31a04cda901821238b01ec8f37
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-21T17:27:33Z

    Adding schema_of_csv and tests

commit 8ed225f3d2c5fbe3df75f8518d539fcdd5f01a2e
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-21T17:54:43Z

    Support schema_of_csv in PySpark

commit 5fb17fbefd52198bcf735abc132b0ab9174cbe0f
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-10-07T18:49:00Z

    2.5 -> 3.0
----
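To make the idea behind `schema_of_csv('1|abc', map('delimiter', '|'))` concrete, here is a minimal Python sketch of the general technique: split a single CSV record on the delimiter, guess a type for each field, name the columns `_c0`, `_c1`, ..., and render a DDL `struct<...>` string. The helpers `infer_field_type` and `schema_of_csv_sketch` are hypothetical and deliberately simplified. This is not Spark's `CSVInferSchema`: it only distinguishes `int`, `bigint`, `double`, `boolean` and `string`, and it honours only the `delimiter` option.

```python
import csv
import io
import re

# Hypothetical, simplified stand-in for CSV type inference. Spark's real
# inference also handles decimals, timestamps, nulls and many more options.
_INT_RE = re.compile(r"[+-]?\d+")
_DOUBLE_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")


def infer_field_type(field: str) -> str:
    """Return a DDL type name for a single CSV field (narrowest type first)."""
    if _INT_RE.fullmatch(field):
        value = int(field)
        if -(2**31) <= value < 2**31:
            return "int"      # fits a 32-bit signed integer
        if -(2**63) <= value < 2**63:
            return "bigint"   # fits a 64-bit signed integer
        # Larger integers fall through to the double check below.
    if _DOUBLE_RE.fullmatch(field):
        return "double"
    if field.lower() in ("true", "false"):
        return "boolean"
    return "string"           # fallback, also used for empty fields


def schema_of_csv_sketch(record: str, options=None) -> str:
    """Infer a DDL schema string from one CSV record.

    Only the 'delimiter' option is honoured; columns are named _c0, _c1, ...
    as in the example output of the PR description.
    """
    options = options or {}
    delimiter = options.get("delimiter", ",")
    # csv.reader handles quoted fields; only the first line is used.
    fields = next(csv.reader(io.StringIO(record), delimiter=delimiter), [])
    columns = [f"_c{i}:{infer_field_type(f)}" for i, f in enumerate(fields)]
    return "struct<" + ",".join(columns) + ">"


print(schema_of_csv_sketch("1|abc", {"delimiter": "|"}))
# → struct<_c0:int,_c1:string>
```

Trying narrower types before wider ones and falling back to `string` is what makes `1` come out as `int` rather than `double` or `string`; Spark's actual set of candidate types and their order may differ from this sketch.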