sadikovi edited a comment on pull request #34995: URL: https://github.com/apache/spark/pull/34995#issuecomment-1000545069
I think there is a bug in the partition value cast: the type is inferred from the raw value, which can still contain escaped characters. We unescape the column name but not the value! For example, if you have the double value 4.5, you end up trying to infer a double from `4%2E5`. See https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L307-L311 and https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L491. Timestamps are a special case: their value is unescaped separately inside the `inferPartitionColumnValue` method. IMHO, `inferPartitionColumnValue` should already receive the actual, unescaped value rather than the raw one. Because of this issue, this PR introduces breaking changes, since type inference can be incorrect for doubles and decimals. @cloud-fan is this a known issue?
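
To illustrate, here is a minimal, self-contained sketch (not Spark's actual code): `unescapePathName` below is a simplified stand-in for the path unescaping, and `inferType` is a toy version of the inference logic. It shows that inferring from the raw path segment `4%2E5` falls back to string, while inferring from the unescaped `4.5` yields a double.

```scala
import scala.util.Try

object EscapedInferenceSketch {

  // Simplified percent-unescaping: turns "%2E" back into "." (stand-in only).
  def unescapePathName(path: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < path.length) {
      val c = path.charAt(i)
      if (c == '%' && i + 2 < path.length) {
        val code = Try(Integer.parseInt(path.substring(i + 1, i + 3), 16)).getOrElse(-1)
        if (code >= 0) { sb.append(code.toChar); i += 3 }
        else { sb.append(c); i += 1 }
      } else {
        sb.append(c); i += 1
      }
    }
    sb.toString
  }

  // Toy type inference: try integral, then double, otherwise fall back to string.
  def inferType(value: String): String = {
    if (Try(value.toLong).isSuccess) "LongType"
    else if (Try(value.toDouble).isSuccess) "DoubleType"
    else "StringType"
  }

  def main(args: Array[String]): Unit = {
    val rawValue = "4%2E5"                     // value as it appears in the partition path
    val unescaped = unescapePathName(rawValue) // "4.5"

    println(s"raw:       $rawValue -> ${inferType(rawValue)}")   // StringType
    println(s"unescaped: $unescaped -> ${inferType(unescaped)}") // DoubleType
  }
}
```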
