[ https://issues.apache.org/jira/browse/SPARK-29432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949068#comment-16949068 ]
Prasanna Saraswathi Krishnan edited comment on SPARK-29432 at 10/11/19 5:51 PM:
--------------------------------------------------------------------------------

My bad. When I formatted the code, I deleted the saveAsTable statement by mistake. If you save the dataframe like

df.write.saveAsTable('default.withcolTest', mode='overwrite')

and then get the schema, you should be able to reproduce the error. Not sure how it is resolved if you can't reproduce the issue.

> nullable flag of new column changes when persisting a pyspark dataframe
> -----------------------------------------------------------------------
>
>                 Key: SPARK-29432
>                 URL: https://issues.apache.org/jira/browse/SPARK-29432
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>        Environment: Spark 2.4.0-cdh6.1.1 (Cloudera distribution)
>                      Python 3.7.3
>            Reporter: Prasanna Saraswathi Krishnan
>            Priority: Minor
>
> When I add a new column to a dataframe with the {{withColumn}} function, by default the column is added with {{nullable=false}}.
> But when I save the dataframe, the flag changes to {{nullable=true}}. Is this the expected behavior? Why?
>
> {code:python}
> >>> l = [('Alice', 1)]
> >>> df = spark.createDataFrame(l)
> >>> df.printSchema()
> root
>  |-- _1: string (nullable = true)
>  |-- _2: long (nullable = true)
> {code}
> {code:python}
> >>> from pyspark.sql.functions import lit
> >>> df = df.withColumn('newCol', lit('newVal'))
> >>> df.printSchema()
> root
>  |-- _1: string (nullable = true)
>  |-- _2: long (nullable = true)
>  |-- newCol: string (nullable = false)
> >>> df.write.saveAsTable('default.withcolTest', mode='overwrite')
> >>> spark.sql("select * from default.withcolTest").printSchema()
> root
>  |-- _1: string (nullable = true)
>  |-- _2: long (nullable = true)
>  |-- newCol: string (nullable = true)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
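A minimal sketch of the flag flip, assuming pyspark is installed. It uses only {{pyspark.sql.types}} (no running SparkSession needed to construct schemas): {{lit('newVal')}} is a non-null literal, so Catalyst marks the new column {{nullable=false}} in memory, while a table round-trip generally comes back with every column nullable. The {{in_memory}}/{{persisted}} names are illustrative, not from the report.

{code:python}
# Hypothetical illustration of the schemas printed in the report above.
from pyspark.sql.types import StructType, StructField, StringType, LongType

# Schema as printed after withColumn('newCol', lit('newVal')):
# the literal cannot be null, so the new field is nullable=false.
in_memory = StructType([
    StructField("_1", StringType(), nullable=True),
    StructField("_2", LongType(), nullable=True),
    StructField("newCol", StringType(), nullable=False),
])

# Rebuilding the schema with every field forced nullable mirrors what
# reading the saved table reports back.
persisted = StructType(
    [StructField(f.name, f.dataType, True) for f in in_memory]
)

print([f.nullable for f in in_memory])   # [True, True, False]
print([f.nullable for f in persisted])   # [True, True, True]
{code}

So the change is in how the schema is recorded by the table format, not in the column's data.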