Luis created SPARK-28874:
----------------------------

             Summary: Pyspark bug in date_format
                 Key: SPARK-28874
                 URL: https://issues.apache.org/jira/browse/SPARK-28874
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.3.0, 2.1.0
            Reporter: Luis
PySpark's date_format adds one year to dates in the last days of the year. Example:

{code:python}
from datetime import datetime

import pandas as pd
from pyspark.sql.functions import col, date_format
from pyspark.sql.types import DateType, StructField, StructType

start_date = datetime(2010, 1, 1)
end_date = datetime(2055, 1, 1)
indx_ts = pd.date_range(start_date.strftime('%m/%d/%Y'),
                        end_date.strftime('%m/%d/%Y'), freq='D')
data_date = [{"d": datetime.utcfromtimestamp(x.tolist() / 1e9)}
             for x in indx_ts.values]

df_p = spark.createDataFrame(data_date,
                             StructType([StructField('d', DateType(), True)]))
df_string = df_p.withColumn("date_string", date_format(col("d"), "YYYY-MM-dd"))
df_string.filter("d != date_string").show(1000)
{code}

{noformat}
+----------+-----------+
|         d|date_string|
+----------+-----------+
|2010-12-26| 2011-12-26|
|2010-12-27| 2011-12-27|
|2010-12-28| 2011-12-28|
|2010-12-29| 2011-12-29|
|2010-12-30| 2011-12-30|
|2010-12-31| 2011-12-31|
|2012-12-30| 2013-12-30|
|2012-12-31| 2013-12-31|
|2013-12-29| 2014-12-29|
|2013-12-30| 2014-12-30|
|2013-12-31| 2014-12-31|
+----------+-----------+
{noformat}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
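For what it's worth, the shift matches the week-based-year pattern: in Java's SimpleDateFormat (which date_format delegates to in these Spark versions), "YYYY" means the week year, not the calendar year; "yyyy" gives the calendar year. A minimal sketch of the same distinction in plain Python, using %Y versus the ISO 8601 week-based %G (the exact cutoff dates can differ slightly from Java's, whose week rules are locale-dependent):

```python
# Sketch only, no Spark needed: %Y is the calendar year, %G is the
# ISO 8601 week-based year.  Near year end the two diverge, which is
# the same effect "YYYY" produces in date_format.
from datetime import date

d = date(2013, 12, 30)        # Monday of ISO week 1 of 2014
print(d.strftime("%Y-%m-%d")) # 2013-12-30  (calendar year)
print(d.strftime("%G-%m-%d")) # 2014-12-30  (week-based year)
```

Using "yyyy-MM-dd" in the date_format call above makes the filter return no rows.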