[ https://issues.apache.org/jira/browse/SPARK-43341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-43341:
-----------------------------------
    Labels: pull-request-available  (was: )

> StructType.toDDL does not pick up on non-nullability of column in nested struct
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-43341
>                 URL: https://issues.apache.org/jira/browse/SPARK-43341
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2
>            Reporter: Bram Boogaarts
>            Priority: Major
>              Labels: pull-request-available
>
> h2. The problem
> When a {{StructType}} instance that contains a nested {{StructType}} column,
> which in turn contains a column with {{nullable = false}}, is converted to a
> DDL string using {{.toDDL}}, the resulting DDL string does not include the
> nested column's non-nullability. For example:
> {code:scala}
> import org.apache.spark.sql.types._
>
> val testschema = StructType(List(
>   StructField("key", IntegerType, false),
>   StructField("value", StringType, true),
>   StructField("nestedCols", StructType(List(
>     StructField("nestedKey", IntegerType, false),
>     StructField("nestedValue", StringType, true)
>   )), false)
> ))
> println(testschema.toDDL)
> println(StructType.fromDDL(testschema.toDDL)){code}
> gives:
> {code:java}
> key INT NOT NULL,value STRING,nestedCols STRUCT<nestedKey: INT, nestedValue: STRING> NOT NULL
> StructType(
>   StructField(key,IntegerType,false),
>   StructField(value,StringType,true),
>   StructField(nestedCols,StructType(
>     StructField(nestedKey,IntegerType,true),
>     StructField(nestedValue,StringType,true)
>   ),false)
> ){code}
>  
> This happens because {{StructType.toDDL}} calls {{StructField.toDDL}} for
> each of its fields, which in turn calls {{.sql}} on the field's
> {{dataType}}. If {{dataType}} is a {{StructType}}, its {{.sql}} renders each
> nested field by calling {{.sql}} on that field's data type, and this output
> does not include the nested field's nullability.
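To make the call chain above concrete, here is a minimal, self-contained Scala model of it. The types `DType`, `Field` and `Struct` are hypothetical stand-ins, not Spark's `DataType`, `StructField` and `StructType`; real Spark also quotes names and handles comments, which this sketch ignores. It only reproduces the shape of the problem: the nested struct is rendered through `.sql`, which has no slot for `NOT NULL`.

```scala
// Hypothetical toy model of the call chain; these are NOT Spark's classes.
sealed trait DType { def sql: String }
case object IntT extends DType { val sql = "INT" }
case object StrT extends DType { val sql = "STRING" }

// Stand-in for StructField: emits its own NOT NULL, but renders its type via .sql.
final case class Field(name: String, dataType: DType, nullable: Boolean) {
  def toDDL: String = {
    val notNull = if (nullable) "" else " NOT NULL"
    s"$name ${dataType.sql}$notNull"
  }
}

// Stand-in for StructType.
final case class Struct(fields: List[Field]) extends DType {
  // Nested fields become "name: TYPE" with no nullability: this is where NOT NULL is lost.
  def sql: String =
    fields.map(f => s"${f.name}: ${f.dataType.sql}").mkString("STRUCT<", ", ", ">")

  // Top-level rendering: comma-separated field DDL, as in the report's output.
  def toDDL: String = fields.map(_.toDDL).mkString(",")
}

object LossyDDLDemo {
  val schema: Struct = Struct(List(
    Field("key", IntT, nullable = false),
    Field("value", StrT, nullable = true),
    Field("nestedCols", Struct(List(
      Field("nestedKey", IntT, nullable = false),
      Field("nestedValue", StrT, nullable = true)
    )), nullable = false)
  ))

  def main(args: Array[String]): Unit =
    // Prints the same string as the Spark output quoted in the report.
    println(schema.toDDL)
}
```

Running `LossyDDLDemo` prints `key INT NOT NULL,value STRING,nestedCols STRUCT<nestedKey: INT, nestedValue: STRING> NOT NULL`: the top-level `NOT NULL` markers survive, the one on `nestedKey` does not.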
> h2. Proposed solution
> {{StructField.toDDL}} should call {{dataType.toDDL}} when {{dataType}} is a
> {{StructType}}, since that method includes the nullability of nested
> columns.
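One way to read the proposal is sketched below, again on hypothetical toy types rather than Spark's classes. Because `StructType.toDDL` on its own returns a bare comma-separated field list (like the first line of the output above), this sketch assumes the nested fields are wrapped in `STRUCT<...>` rather than spliced in verbatim, and it assumes the DDL parser accepts `NOT NULL` on fields inside `STRUCT<...>`; both are assumptions to check against the Spark version in use. Arrays and maps whose elements are structs, which Spark would also need to handle, are left out.

```scala
// Hypothetical toy model (NOT Spark's classes) of a nullability-aware DDL renderer.
sealed trait DType
case object IntT extends DType
case object StrT extends DType
final case class Struct(fields: List[Field]) extends DType
final case class Field(name: String, dataType: DType, nullable: Boolean)

object NullabilityAwareDDL {
  private def notNull(f: Field): String = if (f.nullable) "" else " NOT NULL"

  // Renders a type. For a struct, each nested field carries its own NOT NULL
  // marker, and the recursion handles deeper nesting the same way.
  def typeDDL(t: DType): String = t match {
    case IntT => "INT"
    case StrT => "STRING"
    case Struct(fs) =>
      fs.map(f => s"${f.name}: ${typeDDL(f.dataType)}${notNull(f)}")
        .mkString("STRUCT<", ", ", ">")
  }

  // Top-level column, shaped like StructField.toDDL output: "name TYPE[ NOT NULL]".
  def fieldDDL(f: Field): String = s"${f.name} ${typeDDL(f.dataType)}${notNull(f)}"

  // Shaped like StructType.toDDL output: comma-separated columns.
  def toDDL(s: Struct): String = s.fields.map(fieldDDL).mkString(",")

  val schema: Struct = Struct(List(
    Field("key", IntT, nullable = false),
    Field("value", StrT, nullable = true),
    Field("nestedCols", Struct(List(
      Field("nestedKey", IntT, nullable = false),
      Field("nestedValue", StrT, nullable = true)
    )), nullable = false)
  ))

  def main(args: Array[String]): Unit = println(toDDL(schema))
}
```

For the schema from the report this prints `key INT NOT NULL,value STRING,nestedCols STRUCT<nestedKey: INT NOT NULL, nestedValue: STRING> NOT NULL`, so the non-nullability of `nestedKey` is now part of the string.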



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
