Vishal Doshi created SPARK-22070:
------------------------------------

             Summary: Spark SQL filter comparisons failing with timestamps and ISO-8601 strings
                 Key: SPARK-22070
                 URL: https://issues.apache.org/jira/browse/SPARK-22070
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.2.0
            Reporter: Vishal Doshi
            Priority: Minor
Filter behavior seems to ignore the time component of the ISO-8601 string. See below for code to reproduce:

{code}
import datetime

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, TimestampType

spark = SparkSession.builder.getOrCreate()

data = [{"dates": datetime.datetime(2017, 1, 1, 12)}]
schema = StructType([StructField("dates", TimestampType())])
df = spark.createDataFrame(data, schema=schema)

# df.head() returns (correctly):
# Row(dates=datetime.datetime(2017, 1, 1, 12, 0))

df.filter(df["dates"] > datetime.datetime(2017, 1, 1, 11).isoformat()).count()
# should return 1, instead returns 0
# datetime.datetime(2017, 1, 1, 11).isoformat() returns '2017-01-01T11:00:00'

df.filter(df["dates"] > datetime.datetime(2016, 12, 31, 11).isoformat()).count()
# this one works
{code}

Of course, the simple workaround is to use the datetime objects themselves in the query expression, but in practice this means using dateutil to parse some data, which is not ideal.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
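As a minimal sketch of the workaround described above, assuming Python 3.7+ (which provides {{datetime.fromisoformat}}), the dateutil dependency can be avoided for strings in this simple format by parsing them to datetime objects before they reach the filter expression:

{code}
from datetime import datetime

# Workaround sketch (assumption: ISO-8601 strings without timezone info,
# e.g. '2017-01-01T11:00:00', which fromisoformat handles on Python 3.7+).
# Parsing up front means Spark compares timestamps, not strings.
threshold = datetime.fromisoformat("2017-01-01T11:00:00")

# Comparing datetime objects directly behaves as expected:
assert datetime(2017, 1, 1, 12) > threshold

# In the Spark query, pass the parsed datetime instead of the string
# (df is the DataFrame from the reproduction above):
# df.filter(df["dates"] > threshold).count()
{code}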