Luis created SPARK-28874:
----------------------------

             Summary: Pyspark bug in date_format
                 Key: SPARK-28874
                 URL: https://issues.apache.org/jira/browse/SPARK-28874
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.3.0, 2.1.0
            Reporter: Luis


PySpark's date_format adds one year to dates that fall in the last days of the year:

Example :

{code:python}

from datetime import datetime

import pandas as pd
from pyspark.sql.functions import col, date_format
from pyspark.sql.types import StructType, StructField, DateType

start_date = datetime(2010, 1, 1)
end_date = datetime(2055, 1, 1)

# Daily timestamps from 2010-01-01 through 2055-01-01
indx_ts = pd.date_range(start_date.strftime('%m/%d/%Y'),
                        end_date.strftime('%m/%d/%Y'), freq='D')

data_date = [{"d": datetime.utcfromtimestamp(x.tolist() / 1e9)}
             for x in indx_ts.values]

df_p = spark.createDataFrame(data_date,
                             StructType([StructField('d', DateType(), True)]))
df_string = df_p.withColumn("date_string",
                            date_format(col("d"), "YYYY-MM-dd"))

# Show all rows where the formatted string disagrees with the date
df_string.filter("d != date_string").show(1000)

{code}

{noformat}
+----------+-----------+
|         d|date_string|
+----------+-----------+
|2010-12-26| 2011-12-26|
|2010-12-27| 2011-12-27|
|2010-12-28| 2011-12-28|
|2010-12-29| 2011-12-29|
|2010-12-30| 2011-12-30|
|2010-12-31| 2011-12-31|
|2012-12-30| 2013-12-30|
|2012-12-31| 2013-12-31|
|2013-12-29| 2014-12-29|
|2013-12-30| 2014-12-30|
|2013-12-31| 2014-12-31|
{noformat}
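For context, the mismatching rows are consistent with a week-based year rather than the calendar year: in the Java SimpleDateFormat patterns that Spark 2.x uses, lowercase "yyyy" is the plain year while uppercase "YYYY" is the week year, whose value flips to the next year in the final days of December. A minimal pure-Python sketch of the same distinction, using strftime's %G directive (the ISO week-based year; the Java week-year rules are locale-dependent, so the exact dates that flip can differ slightly):

```python
from datetime import datetime

# Dates taken from the mismatching rows above: both fall in a week
# that ISO 8601 assigns to the following year, so the week-based
# year (%G) disagrees with the calendar year (%Y).
for d in (datetime(2013, 12, 30), datetime(2012, 12, 31)):
    print(d.strftime("%Y-%m-%d"), "->", d.strftime("%G-%m-%d"))
# 2013-12-30 -> 2014-12-30
# 2012-12-31 -> 2013-12-31
```

If the intent in the report above was the calendar year, the pattern "yyyy-MM-dd" produces matching values for every row.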



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
