Wenchen Fan resolved SPARK-26263.
---------------------------------
    Resolution: Fixed
 Fix Version/s: 3.0.0

Issue resolved by pull request 23215
[https://github.com/apache/spark/pull/23215]

> Throw exception when Partition column value can't be converted to user specified type
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-26263
>                 URL: https://issues.apache.org/jira/browse/SPARK-26263
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Currently, if the user provides a data schema, partition column values are
> converted according to it. But if the conversion fails, e.g. converting a
> string to int, the column value silently becomes null.
>
> For the following directory:
>
> /tmp/testDir
> ├── p=bar
> └── p=foo
>
> If we run:
>
> ```
> val schema = StructType(Seq(StructField("p", IntegerType, false)))
> spark.read.schema(schema).csv("/tmp/testDir/").show()
> ```
>
> we get:
>
> +----+
> |   p|
> +----+
> |null|
> |null|
> +----+
>
> This PR proposes to throw an exception in such cases instead of silently
> converting to a null value:
>
> 1. Null partition column values don't make sense to users in most cases. It
> is better to surface the conversion failure so that the schema or the ETL
> jobs can be adjusted to fix it.
> 2. Such conversion failures already throw exceptions for non-partition data
> columns. Partition columns should have the same behavior.
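To make the behavior change concrete, here is a minimal Python sketch of the semantics described above. This is not Spark's actual implementation; the function name, the `strict` flag, and the error message are hypothetical, used only to contrast the old silent-null conversion with the new fail-fast conversion of partition values parsed from directory names like `p=foo`.

```python
def cast_partition_value(raw, target_type, strict=True):
    """Convert a raw partition string (from a path like 'p=foo') to the
    user-specified type.

    strict=False mimics the pre-3.0 behavior: a failed cast silently
    becomes None (null). strict=True mimics the SPARK-26263 behavior:
    a failed cast raises an error.
    """
    try:
        return target_type(raw)
    except ValueError:
        if strict:
            # New behavior: surface the conversion failure to the user.
            raise ValueError(
                f"Failed to cast partition value '{raw}' "
                f"to {target_type.__name__}")
        # Old behavior: swallow the failure and produce null.
        return None

# 'p=bar' with an IntegerType schema: old behavior yields None,
# new behavior raises.
old = cast_partition_value("bar", int, strict=False)
print(old)
```

Under this sketch, a directory value such as `"42"` casts cleanly to `42` either way; only non-convertible values like `"bar"` differ between the two modes.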