[GitHub] spark pull request #23215: [SPARK-26263][SQL] Validate partition values with...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23215 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23215: [SPARK-26263][SQL] Validate partition values with...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/23215#discussion_r239460682 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala --- @@ -272,9 +279,14 @@ object PartitioningUtils { val literal = if (userSpecifiedDataTypes.contains(columnName)) { // SPARK-26188: if user provides corresponding column schema, get the column value without // inference, and then cast it as user specified data type. -val columnValue = inferPartitionColumnValue(rawColumnValue, false, timeZone) -val castedValue = - Cast(columnValue, userSpecifiedDataTypes(columnName), Option(timeZone.getID)).eval() +val dataType = userSpecifiedDataTypes(columnName) +val columnValueLiteral = inferPartitionColumnValue(rawColumnValue, false, timeZone) +val columnValue = columnValueLiteral.eval() +val castedValue = Cast(columnValueLiteral, dataType, Option(timeZone.getID)).eval() +if (validatePartitionColumns && columnValue != null && castedValue == null) { + throw new RuntimeException(s"Failed to cast value `$columnValue` to `$dataType` " + +s"for partition column `$columnName`") +} Literal.create(castedValue, userSpecifiedDataTypes(columnName)) --- End diff -- `Literal.create(castedValue, dataType)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23215: [SPARK-26263][SQL] Validate partition values with...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23215#discussion_r239453312 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1396,6 +1396,16 @@ object SQLConf { .booleanConf .createWithDefault(false) + val VALIDATE_PARTITION_COLUMNS = +buildConf("spark.sql.sources.validatePartitionColumns") + .internal() + .doc("When this option is set to true, partition column values will be validated with " + +"provided schema. If the validation fails, a runtime exception is thrown." + +"When this option is set to false, the partition column value will be converted to null " + +"if it can not be converted to corresponding provided schema.") --- End diff -- `... can not be casted to ...` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23215: [SPARK-26263][SQL] Validate partition values with...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23215#discussion_r239453026 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala --- @@ -95,6 +95,31 @@ class FileIndexSuite extends SharedSQLContext { } } + test("SPARK-26263: Throw exception when partition value can't be converted to specific type") { --- End diff -- `can't be casted to user-specified type` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23215: [SPARK-26263][SQL] Validate partition values with...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/23215#discussion_r239423651 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala --- @@ -272,9 +279,13 @@ object PartitioningUtils { val literal = if (userSpecifiedDataTypes.contains(columnName)) { // SPARK-26188: if user provides corresponding column schema, get the column value without // inference, and then cast it as user specified data type. -val columnValue = inferPartitionColumnValue(rawColumnValue, false, timeZone) -val castedValue = - Cast(columnValue, userSpecifiedDataTypes(columnName), Option(timeZone.getID)).eval() +val dataType = userSpecifiedDataTypes(columnName) +val columnValueLiteral = inferPartitionColumnValue(rawColumnValue, false, timeZone) +val columnValue = columnValueLiteral.eval() +val castedValue = Cast(columnValueLiteral, dataType, Option(timeZone.getID)).eval() +if (validatePartitionColumns && columnValue != null && castedValue == null) { + throw new RuntimeException(s"Failed to cast partition value `$columnValue` to $dataType") --- End diff -- Good suggestion. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23215: [SPARK-26263][SQL] Validate partition values with...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/23215#discussion_r239388954 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala --- @@ -272,9 +279,13 @@ object PartitioningUtils { val literal = if (userSpecifiedDataTypes.contains(columnName)) { // SPARK-26188: if user provides corresponding column schema, get the column value without // inference, and then cast it as user specified data type. -val columnValue = inferPartitionColumnValue(rawColumnValue, false, timeZone) -val castedValue = - Cast(columnValue, userSpecifiedDataTypes(columnName), Option(timeZone.getID)).eval() +val dataType = userSpecifiedDataTypes(columnName) +val columnValueLiteral = inferPartitionColumnValue(rawColumnValue, false, timeZone) +val columnValue = columnValueLiteral.eval() +val castedValue = Cast(columnValueLiteral, dataType, Option(timeZone.getID)).eval() +if (validatePartitionColumns && columnValue != null && castedValue == null) { + throw new RuntimeException(s"Failed to cast partition value `$columnValue` to $dataType") --- End diff -- Can we also show `columnName` in this exception message? So it is easier to know which partition column has such error. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org