Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
Ok sounds good. Could I get some opinions on the best way to convert
internal Spark timestamps, given that they are stored as UTC? I think we
have the following options:
1. Write Arrow data with a SESSION_LOCAL timestamp (as currently done in
this PR), then convert to timezone-naive local time in Python after the
data is loaded into Pandas (see the sketch after this list). This would
happen at the end of `toPandas()` or just before the user function is
called in a `pandas_udf`, converting back to UTC again just after.
2. Convert Spark's internal data to the local timezone in Scala and
write it to Arrow as timezone-naive timestamps.
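
Just to make (1) concrete, here's a minimal sketch of the Pandas-side
round trip I have in mind. The timezone name and sample value are
made-up examples, and I'm glossing over where exactly this would hook
into `toPandas()` / `pandas_udf`:

```python
import pandas as pd

session_tz = "America/Los_Angeles"  # e.g. from spark.sql.session.timeZone

# Values arrive from Arrow as timezone-aware UTC timestamps
s = pd.Series(pd.to_datetime(["2017-07-17 12:00:00"]).tz_localize("UTC"))

# UTC-aware -> naive local time, done after loading into Pandas
local_naive = s.dt.tz_convert(session_tz).dt.tz_localize(None)

# naive local -> UTC-aware, done just after the user function returns
back_to_utc = local_naive.dt.tz_localize(session_tz).dt.tz_convert("UTC")

assert (back_to_utc == s).all()
```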
With (1) it's easy to do the conversion with Pandas, but we have to make
sure it gets done in multiple places. With (2) it's just in one spot,
but I'm not sure whether the conversion could end up being applied more
than once (illustrated below).
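
To show why that last point worries me, here's a hypothetical Pandas
illustration (again with a made-up timezone and value): once the data is
timezone-naive, nothing marks it as already converted, so a second
accidental shift silently corrupts the values instead of failing:

```python
import pandas as pd

session_tz = "America/Los_Angeles"  # example session timezone

utc = pd.Series(pd.to_datetime(["2017-07-17 12:00:00"]).tz_localize("UTC"))
local_once = utc.dt.tz_convert(session_tz).dt.tz_localize(None)

# A repeated conversion re-interprets the naive local values as UTC
# and shifts them a second time -- off by the UTC offset, with no error
local_twice = (local_once.dt.tz_localize("UTC")
                         .dt.tz_convert(session_tz)
                         .dt.tz_localize(None))

print(local_once[0])   # 2017-07-17 05:00:00 (correct local time)
print(local_twice[0])  # 2017-07-16 22:00:00 (shifted twice)
```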