[GitHub] [spark] HyukjinKwon opened a new pull request #27854: [SPARK-31065][SQL] Match schema_of_json to the schema inference of JSON data source

GitBox Mon, 09 Mar 2020 02:58:52 -0700

URL: https://github.com/apache/spark/pull/27854
 
 
   ### What changes were proposed in this pull request?
   
   This PR proposes two things:
   
   1. Convert `null` to `string` type during schema inference of `schema_of_json`, as the JSON datasource does. This is also a bug fix: `null` is not a valid type in a DDL-formatted string, so the SQL parser cannot recognise it as a type string. We should match the JSON datasource and return a string type so that `schema_of_json` returns a proper DDL-formatted string.
   
   2. Let `schema_of_json` respect the `dropFieldIfAllNull` option during schema inference.
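
   The two rules above can be sketched as a small, Spark-free toy model. This is only an illustration under stated assumptions: `JType`, `JNull`, `JString`, `JStruct`, `canonicalize` and `ddl` are hypothetical names invented here, not Spark classes, and the sketch is not the code in this patch.

   ```scala
   object SchemaSketch {
     // Toy stand-ins for inferred JSON types (hypothetical, not Spark's DataType).
     sealed trait JType
     case object JNull extends JType
     case object JString extends JType
     final case class JStruct(fields: List[(String, JType)]) extends JType

     // Rule 1: a leftover null type becomes string.
     // Rule 2: with dropFieldIfAllNull, null fields and structs left empty are dropped.
     def canonicalize(t: JType, dropFieldIfAllNull: Boolean): Option[JType] = t match {
       case JNull => if (dropFieldIfAllNull) None else Some(JString)
       case JStruct(fields) =>
         val kept = fields.flatMap { case (name, fieldType) =>
           canonicalize(fieldType, dropFieldIfAllNull).map(name -> _)
         }
         if (kept.isEmpty && dropFieldIfAllNull) None else Some(JStruct(kept))
       case other => Some(other)
     }

     // Render a type in the struct<name:type> form used in the outputs of this PR.
     def ddl(t: JType): String = t match {
       case JNull => "null"
       case JString => "string"
       case JStruct(fields) =>
         fields.map { case (name, fieldType) => s"$name:${ddl(fieldType)}" }
           .mkString("struct<", ",", ">")
     }
   }
   ```

   In this model, `canonicalize(JStruct(List("id" -> JNull)), dropFieldIfAllNull = false)` renders as `struct<id:string>`, and a struct with fields `drop: struct<drop:null>` and `id: string` canonicalized with `dropFieldIfAllNull = true` also renders as `struct<id:string>`.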
   
   
   ### Why are the changes needed?
   
   To let `schema_of_json` return a proper DDL-formatted string and respect the `dropFieldIfAllNull` option.
   
   ### Does this PR introduce any user-facing change?
   Yes, it does.
   
   ```scala
   import collection.JavaConverters._
   import org.apache.spark.sql.functions._
   
   spark.range(1).select(schema_of_json("""{"id": ""}""")).show()
   spark.range(1).select(schema_of_json(
     lit("""{"id": "a", "drop": {"drop": null}}"""),
     Map("dropFieldIfAllNull" -> "true").asJava)).show(false)
   ```
   
   **Before:**
   
   ```
   struct<id:null>
   struct<drop:struct<drop:null>,id:string> 
   ```
   
   
   **After:**
   
   ```
   struct<id:string>
   struct<id:string>
   ```
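
   One place where a parseable result matters is when the inferred schema is passed straight on to `from_json`. The snippet below is a hedged sketch of that use, not part of the patch: it assumes a `spark` session as in `spark-shell`, and the sample DataFrame and the `value` and `parsed` column names are made up for illustration.

   ```scala
   import org.apache.spark.sql.functions._
   import spark.implicits._ // assumes an active `spark` session, as in spark-shell

   // Hypothetical sample data, for illustration only.
   val df = Seq("""{"id": "a"}""").toDF("value")

   // Infer a schema from an example document whose only value is an empty string,
   // then use that inferred schema to parse the `value` column.
   df.select(from_json($"value", schema_of_json(lit("""{"id": ""}"""))).as("parsed")).show()
   ```

   With the change proposed here the inferred schema is `struct<id:string>`, which is a valid type string; the earlier `struct<id:null>` result is the form the SQL parser could not recognise.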
   
   
   ### How was this patch tested?
   
   Manually tested, and unit tests were added.


