Toby Harradine created SPARK-32123:
--------------------------------------

             Summary: [Python] Setting `spark.sql.session.timeZone` only partially respected
                 Key: SPARK-32123
                 URL: https://issues.apache.org/jira/browse/SPARK-32123
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.3.1
            Reporter: Toby Harradine
The setting `spark.sql.session.timeZone` is respected by PySpark when converting to and from Pandas, as described [here|http://spark.apache.org/docs/latest/sql-programming-guide.html#timestamp-with-time-zone-semantics]. However, when timestamps are converted directly to Python `datetime` objects, it is ignored and the system's timezone is used instead. This can be verified with the following code snippet:

{code:java}
import pyspark.sql

spark = (pyspark
         .sql
         .SparkSession
         .builder
         .master('local[1]')
         .config("spark.sql.session.timeZone", "UTC")
         .getOrCreate())

df = spark.createDataFrame([("2018-06-01 01:00:00",)], ["ts"])
df = df.withColumn("ts", df["ts"].astype("timestamp"))

print(df.toPandas().iloc[0, 0])
print(df.collect()[0][0])
{code}

For me this prints the following (the exact result depends on your system's timezone; mine is Europe/Berlin):

{code:java}
2018-06-01 01:00:00
2018-06-01 03:00:00
{code}

Hence, the method `toPandas` respects the timezone setting (UTC), but the method `collect` ignores it and converts the timestamp to my system's timezone.

The cause of this behaviour is that the methods `toInternal` and `fromInternal` of PySpark's `TimestampType` class do not take the setting `spark.sql.session.timeZone` into account and instead use the system timezone.

If the maintainers agree that this should be fixed, I would try to come up with a patch.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
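The root cause described above can be illustrated without Spark at all: Python's `datetime.fromtimestamp` interprets an epoch value in the system's local timezone unless an explicit `tz` argument is passed, which is essentially the conversion `TimestampType.fromInternal` relies on. A minimal sketch (the epoch value below is chosen for illustration and corresponds to 2018-06-01 01:00:00 UTC):

{code:java}
from datetime import datetime, timezone

# Epoch seconds for 2018-06-01 01:00:00 UTC (illustrative value)
epoch_seconds = 1527814800

# No tz argument: the result is rendered in the *system* timezone,
# e.g. 2018-06-01 03:00:00 on a Europe/Berlin machine.
local_dt = datetime.fromtimestamp(epoch_seconds)

# Explicit tz argument: the result is rendered in the requested zone,
# independent of the machine's settings.
utc_dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

print(local_dt)  # depends on the system timezone
print(utc_dt)    # 2018-06-01 01:00:00+00:00
{code}

A fix along these lines would presumably have `fromInternal` pass the zone derived from `spark.sql.session.timeZone` instead of relying on the no-argument default.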