Ref:
https://stackoverflow.com/questions/76436159/apache-spark-not-reading-utc-timestamp-from-mongodb-correctly

Hello All,
I have data stored in a MongoDB collection, and the timestamp column is
not being read correctly by Apache Spark. I'm running Apache Spark on
GCP Dataproc.

Here is some sample data:

-----

In Mongo:

+----------+----------------------+
|timeslot  |timeslot_date         |
+----------+----------------------+
|1683527400|{2023-05-08T06:30:00Z}|
+----------+----------------------+


When I read it with PySpark, I get:

+----------+-------------------+
|timeslot  |timeslot_date      |
+----------+-------------------+
|1683527400|2023-05-07 23:30:00|
+----------+-------------------+

-----

My understanding is that the data in Mongo is stored in UTC, i.e.
2023-05-08T06:30:00Z is a UTC timestamp. I'm in the PST timezone. I'm
not clear why Spark is reading it in a different timezone (neither PST
nor UTC). Note: it is not reading it as PST either; if it were, it
would advance the time by 7 hours, but instead it does the opposite.
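To double-check the offset arithmetic, here is a small stdlib-only Python sketch (plain Python, not Spark or the Mongo connector). It assumes the displayed value is the same UTC instant rendered in the America/Los_Angeles zone, which is on daylight time (PDT, UTC-7) on 2023-05-08; `to_pacific` is just a hypothetical helper name for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_pacific(iso_utc: str) -> datetime:
    """Parse an ISO-8601 UTC string and render it in America/Los_Angeles."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so normalise it to an explicit +00:00 offset first.
    utc_value = datetime.fromisoformat(iso_utc.replace("Z", "+00:00"))
    # Assumption: the zone used for display is America/Los_Angeles.
    return utc_value.astimezone(ZoneInfo("America/Los_Angeles"))


local = to_pacific("2023-05-08T06:30:00Z")
print(local.strftime("%Y-%m-%d %H:%M:%S"), local.tzname())

# The timeslot column holds a Unix epoch; check it is the same UTC instant.
epoch_instant = datetime.fromtimestamp(1683527400, tz=timezone.utc)
print(epoch_instant.isoformat())
```

It also confirms that the `timeslot` value 1683527400 and the Mongo `timeslot_date` value refer to the same instant.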

Where does Spark take its default timezone from when it reads data
from MongoDB?

Any ideas on this?

tia!
