Vishal Doshi created SPARK-22070:
------------------------------------

             Summary: Spark SQL filter comparisons failing with timestamps and ISO-8601 strings
                 Key: SPARK-22070
                 URL: https://issues.apache.org/jira/browse/SPARK-22070
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.2.0
            Reporter: Vishal Doshi
            Priority: Minor


Filter comparisons against a TimestampType column seem to ignore the time portion of an ISO-8601 string; only the date part appears to be compared. Code to reproduce:


{code}
import datetime

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, TimestampType

spark = SparkSession.builder.getOrCreate()

data = [{"dates": datetime.datetime(2017, 1, 1, 12)}]
schema = StructType([StructField("dates", TimestampType())])
df = spark.createDataFrame(data, schema=schema)
# df.head() returns (correctly):
#   Row(dates=datetime.datetime(2017, 1, 1, 12, 0))

df.filter(df["dates"] > datetime.datetime(2017, 1, 1, 11).isoformat()).count()
# should return 1, instead returns 0
# datetime.datetime(2017, 1, 1, 11).isoformat() returns '2017-01-01T11:00:00'
df.filter(df["dates"] > datetime.datetime(2016, 12, 31, 11).isoformat()).count()
# this one works
{code}

Of course, the simple workaround is to use datetime objects themselves in the query expression, but in practice that means first parsing incoming strings with something like dateutil just to hand them to Spark, which is not ideal. A sketch of the workaround is below.
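
For reference, a minimal sketch of that workaround, run against the df created above (datetime.strptime with a hard-coded format stands in for dateutil here, and assumes a fixed input format):

{code}
import datetime

# Workaround sketch: parse the ISO-8601 string into a datetime object
# first, then compare the TimestampType column against the object itself.
# strptime with a fixed format stands in for dateutil.parser.parse.
cutoff = datetime.datetime.strptime('2017-01-01T11:00:00', '%Y-%m-%dT%H:%M:%S')

df.filter(df["dates"] > cutoff).count()
# returns 1, as expected
{code}

An explicit cast on the Spark side (e.g. via pyspark.sql.functions.to_timestamp, available since 2.2) might also sidestep the problem, though I have not verified that here.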


