Hey Everyone

Hope you are doing well

I was looking through the datetime format¹ page in the docs and think the
description of sign handling for the year pattern is slightly incorrect.
It currently says "... If the count of letters is less than four
(but not two), then the sign is only output for negative years. Otherwise,
the sign is output if the pad width is exceeded when ‘G’ is not present ..."

A few things in this statement do not match the actual behaviour (the code
below was run with Python 3.13.1 and Spark 3.5.4):

1. The negative sign is output for four or more letters regardless of
whether the pad width is exceeded.
In the following example the year has fewer digits than the pad width:
>>> df.select(F.date_format(F.make_date(F.lit(-2012), F.lit(1), F.lit(1)),
'yyyyy')).show()
+------------------------------------------+
|date_format(make_date(-2012, 1, 1), yyyyy)|
+------------------------------------------+
|                                    -02012|
+------------------------------------------+
I think the behaviour is obvious, but the doc needs some refinement.

2. The positive sign is output (when the pad width is exceeded) even if 'G'
is present.
Example:
>>> df.select(F.date_format(F.make_date(F.lit(20125), F.lit(1), F.lit(1)),
'yyyy G')).show()
+-------------------------------------------+
|date_format(make_date(20125, 1, 1), yyyy G)|
+-------------------------------------------+
|                                  +20125 AD|
+-------------------------------------------+
I am not sure exactly how 'G' is meant to behave, but it seems that the
negative sign is never printed when it is present. The negative value is
converted to 'BC', and the positive sign is again printed when the pad
width is exceeded.
Example:
>>> df.select(F.date_format(F.make_date(F.lit(-20125), F.lit(1), F.lit(1)),
'yyyy G')).show()
+--------------------------------------------+
|date_format(make_date(-20125, 1, 1), yyyy G)|
+--------------------------------------------+
|                                   +20126 BC|
+--------------------------------------------+
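
For what it's worth, the year changing from 20125 to 20126 in the last
output looks consistent with 'y' being a year-of-era: the proleptic
calendar has a year 0, which corresponds to 1 BC, so proleptic year -20125
is 20126 BC. A minimal plain-Python sketch of that conversion
(to_year_of_era is a hypothetical helper made up for illustration, not a
Spark function):

```python
def to_year_of_era(proleptic_year: int) -> tuple[int, str]:
    """Map a proleptic year (which has a year 0) to (year-of-era, era).

    Hypothetical helper for illustration only; not part of Spark.
    """
    if proleptic_year >= 1:
        return proleptic_year, "AD"
    # Year 0 is 1 BC, year -1 is 2 BC, and so on.
    return 1 - proleptic_year, "BC"


print(to_year_of_era(20125))   # (20125, 'AD')
print(to_year_of_era(-20125))  # (20126, 'BC')
```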

I think the statement should read something like "The negative sign is
printed for any number of letters except two. The positive sign is printed
for four or more letters when the pad width is exceeded."
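
To make the proposed wording concrete, here is a rough plain-Python model
of the sign rule as I understand it from the outputs above. format_year is
a hypothetical helper, not Spark's implementation. The two-letter branch
(last two digits, no sign) is my assumption, and the input is assumed to
already be the year-of-era when 'G' is present.

```python
def format_year(year: int, letters: int) -> str:
    """Model of the proposed sign rule for a run of `letters` 'y' characters.

    Hypothetical sketch based on the observed outputs; not Spark's code.
    """
    if letters < 1:
        raise ValueError("letters must be at least 1")
    digits = str(abs(year))
    if letters == 2:
        # Assumption: two letters print the last two digits and never a sign.
        return digits[-2:].zfill(2)
    padded = digits.zfill(letters)
    if year < 0:
        # The negative sign is printed whether or not the pad width is exceeded.
        return "-" + padded
    if letters >= 4 and len(digits) > letters:
        # The positive sign is printed only when the pad width is exceeded.
        return "+" + padded
    return padded


print(format_year(-2012, 5))  # -02012
print(format_year(20125, 4))  # +20125
print(format_year(20126, 4))  # +20126 (year-of-era for 20126 BC)
print(format_year(2012, 4))   # 2012
```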

It might be a small thing, but I wanted to make sure. If I am correct, I
can raise a pull request to fix it.

Thanks
Dhruv

¹: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
