LuciferYang commented on PR #42039:
URL: https://github.com/apache/spark/pull/42039#issuecomment-1648155159

   > I see, arrow does not encode the full precision? then always truncate 
here? it is at least simpler.
   
   Let me clarify my thoughts again:
   
   1. This is not specific to Arrow; the same result occurs when using 
`ExpressionEncoder` (https://github.com/apache/spark/pull/40395).
   
   2. In the current implementation of Spark, we use `DateTimeUtils` to convert 
`Instant` or `LocalDateTime` to microseconds:
   
   
https://github.com/apache/spark/blob/4644344f443f2f6ab26a72b44b9219fb6c82d26e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L542-L568
   
   
   3. Therefore, in special scenarios where the input carries sub-microsecond 
precision, the result of the `encoded+decoded` round trip loses everything 
finer than microseconds.
   
   For example:
   
   Normal scenario: the input is `2023-03-13T18:09:12.498162`, the result after 
`encoded+decoded` is `2023-03-13T18:09:12.498162`, so the input equals the output.
   
   Special scenario (Java 17 on Linux): the input is 
`2023-03-13T18:09:12.498162194`, the result after `encoded+decoded` is 
`2023-03-13T18:09:12.498162`, so the input does not equal the output.
   
   Calling `truncatedTo(ChronoUnit.MICROS)` on the input in the special scenario 
is only for the convenience of comparing the results.
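   
   To make the two scenarios concrete, here is a minimal Java sketch of a 
microsecond round trip. It is not Spark's `DateTimeUtils` code: the class 
`MicrosRoundTrip` and the helpers `toMicros`, `fromMicros` and `roundTrip` are 
hypothetical names used only for illustration, and a `Z` suffix is added to 
the example values so that they parse as `Instant`.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Illustrative sketch only; all names here are hypothetical, not Spark APIs.
final class MicrosRoundTrip {
    static final long MICROS_PER_SECOND = 1_000_000L;
    static final long NANOS_PER_MICRO = 1_000L;

    // "Encode": epoch microseconds. Integer division drops the sub-microsecond nanos.
    // Throws ArithmeticException on overflow for instants far outside the usual range.
    static long toMicros(Instant instant) {
        long micros = Math.multiplyExact(instant.getEpochSecond(), MICROS_PER_SECOND);
        return Math.addExact(micros, instant.getNano() / NANOS_PER_MICRO);
    }

    // "Decode": rebuild an Instant from epoch microseconds.
    static Instant fromMicros(long micros) {
        long seconds = Math.floorDiv(micros, MICROS_PER_SECOND);
        long microOfSecond = Math.floorMod(micros, MICROS_PER_SECOND);
        return Instant.ofEpochSecond(seconds, microOfSecond * NANOS_PER_MICRO);
    }

    static Instant roundTrip(Instant instant) {
        return fromMicros(toMicros(instant));
    }

    public static void main(String[] args) {
        Instant microInput = Instant.parse("2023-03-13T18:09:12.498162Z");
        Instant nanoInput = Instant.parse("2023-03-13T18:09:12.498162194Z");

        // Normal scenario: microsecond input survives the round trip.
        System.out.println(roundTrip(microInput).equals(microInput));
        // Special scenario: nanosecond input does not survive the round trip.
        System.out.println(roundTrip(nanoInput).equals(nanoInput));
        // Truncating the input first makes the comparison meaningful again.
        System.out.println(
            roundTrip(nanoInput).equals(nanoInput.truncatedTo(ChronoUnit.MICROS)));
    }
}
```

   The last comparison is the reason for truncating in the test: the 
truncation is applied to the expected value so that it matches what a 
microsecond representation can hold, and the round trip itself is unchanged.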
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

