Benedikt Maria Beckermann created SPARK-30767:
-------------------------------------------------

             Summary: from_json changes times of timestamps by several minutes without error
                 Key: SPARK-30767
                 URL: https://issues.apache.org/jira/browse/SPARK-30767
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.4
         Environment: We ran the example code with Spark 2.4.4 on Azure 
Databricks (Databricks Runtime 6.3) in an interactive cluster. 
We first encountered the issue on a job cluster running a streaming application 
on Databricks Runtime 5.4.
            Reporter: Benedikt Maria Beckermann


When a JSON text column contains a timestamp in a format like 
{{2020-01-25T06:39:45.887429Z}} (six fractional digits), the function 
{{from_json(Column, StructType)}} parses the value as a timestamp, but the 
resulting timestamp is shifted by several minutes. 
Spark does not raise any error and silently continues with the corrupted 
timestamp. 

The following Scala snippet reproduces the issue.
 
{code:scala}
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
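import org.apache.spark.sql.types._
// Needed for toDF outside a notebook; `spark` is the active SparkSession
import spark.implicits._

// One JSON document whose timestamp has six fractional digits (microseconds)
val df = Seq("""{"time":"2020-01-25T06:39:45.887429Z"}""").toDF("json")

val struct = new StructType().add("time", TimestampType, nullable = true)

val timeDF = df
  // Extract the raw string value for comparison
  .withColumn("time (string)", get_json_object(col("json"), "$.time"))
  // Casting the extracted string keeps the correct time
  .withColumn("time casted directly (CORRECT)", col("time (string)").cast(TimestampType))
  // Parsing through from_json with a TimestampType schema shifts the time
  .withColumn("time casted via struct (INVALID)", from_json(col("json"), struct))

// display() is Databricks-specific; elsewhere use timeDF.show(false)
display(timeDF)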
{code}


Output: 
||json||time (string)||time casted directly (CORRECT)||time casted via struct (INVALID)||
|{"time":"2020-01-25T06:39:45.887429Z"}|2020-01-25T06:39:45.887429Z|2020-01-25T06:39:45.887+0000|{"time":"2020-01-25T06:54:32.429+0000"}|
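The size of the shift is consistent with the six-digit fraction {{887429}} being read as 887429 milliseconds (14 min 47.429 s) rather than 0.887429 seconds: 06:39:45 plus 00:14:47.429 gives exactly the 06:54:32.429 shown in the INVALID column. The sketch below checks that arithmetic, and also reproduces the same result with a lenient {{java.text.SimpleDateFormat}} using {{yyyy-MM-dd'T'HH:mm:ss.SSSXXX}}, the default JSON {{timestampFormat}} documented for Spark 2.4. That {{from_json}} uses an equivalent SimpleDateFormat-style parser is an assumption here, not something the report confirms; the object name {{MillisShiftDemo}} is hypothetical.

{code:scala}
import java.text.SimpleDateFormat
import java.time.{Duration, LocalTime}
import java.util.Locale

object MillisShiftDemo {
  // Hypothesis: the fraction "887429" is treated as a millisecond count
  val fraction = "887429"

  // Pure arithmetic: original wall-clock time plus 887429 ms
  val shifted: LocalTime =
    LocalTime.parse("06:39:45").plus(Duration.ofMillis(fraction.toLong))

  // Assumption: a SimpleDateFormat-style parser with Spark 2.4's documented
  // default JSON timestampFormat. Because SSS is followed by a non-numeric
  // field (XXX), the parser consumes all six digits as milliseconds, and the
  // lenient calendar rolls the excess over into minutes and seconds.
  val parsed: String = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", Locale.US)
    fmt.parse("2020-01-25T06:39:45.887429Z").toInstant.toString
  }

  def main(args: Array[String]): Unit = {
    println(shifted) // 06:54:32.429
    println(parsed)  // 2020-01-25T06:54:32.429Z
  }
}
{code}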



