sadikovi edited a comment on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000545069


   I think there is a bug in the partition value cast: the type is inferred from the raw path value, which can still contain escaped characters. We unescape the column name but not the value. For example, if you have the double value 4.5, you end up trying to infer a double from `4%2E5`. See 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L307-L311 
and 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L491.
 Timestamps unescape the value separately in the `inferPartitionColumnValue` method.
   
   IMHO, "inferPartitionColumnValue" method should already take the actual 
value that was unescaped, not the raw one. Because of this issue, this PR 
introduces breaking changes as type inference could be incorrect in doubles and 
decimals.
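   
   To illustrate, here is a minimal, self-contained sketch (not Spark's actual code); `unescapePathName` and `looksLikeDouble` below are simplified stand-ins for the path unescaping scheme and the inference check:
   
```scala
object PartitionValueInferenceSketch {
  // Decode "%XX" escape sequences, e.g. "4%2E5" -> "4.5".
  // Simplified stand-in for the unescaping applied to partition path names;
  // it assumes well-formed input and does not validate the hex digits.
  def unescapePathName(path: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < path.length) {
      val c = path.charAt(i)
      if (c == '%' && i + 2 < path.length) {
        sb.append(Integer.parseInt(path.substring(i + 1, i + 3), 16).toChar)
        i += 3
      } else {
        sb.append(c)
        i += 1
      }
    }
    sb.toString
  }

  // Toy stand-in for type inference: does the value parse as a double?
  def looksLikeDouble(value: String): Boolean =
    scala.util.Try(value.toDouble).isSuccess

  def main(args: Array[String]): Unit = {
    val raw = "4%2E5"                                // value as written in the path
    println(looksLikeDouble(raw))                    // false -> would fall back to string
    println(looksLikeDouble(unescapePathName(raw)))  // true  -> inferred as double
  }
}
```
   
   In other words, the unescaping has to happen before the value reaches the type-inference logic (or inside it for every type, the way it is already done for timestamps).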
   
   @cloud-fan is this a known issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
