Previously, in Spark 2, we could use the Spark function date_format with the "W" pattern and it would return the week of month for that date. In Spark 3, trying the same thing returns an error:
* org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'W' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
* Caused by: java.lang.IllegalArgumentException: All week-based patterns are unsupported since Spark 3.0, detected: W, Please use the SQL function EXTRACT instead

If I use the first option and set the policy to LEGACY (it is currently EXCEPTION), the code runs through:

from datetime import date
from pyspark.sql import functions as F

spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

df1 = spark.createDataFrame(
    [
        (1, date(2014, 3, 7)),
        (2, date(2014, 3, 8)),
        (3, date(2014, 3, 30)),
        (4, date(2014, 3, 31)),
        (5, date(2015, 3, 7)),
        (6, date(2015, 3, 8)),
        (7, date(2015, 3, 30)),
        (8, date(2015, 3, 31)),
    ],
    schema="a long, b date",
)
df1 = df1.withColumn("WEEKOFMONTH", F.date_format(F.col("b"), "W"))
df1.show()

+---+----------+-----------+
|  a|         b|WEEKOFMONTH|
+---+----------+-----------+
|  1|2014-03-07|          2|
|  2|2014-03-08|          2|
|  3|2014-03-30|          6|
|  4|2014-03-31|          6|
|  5|2015-03-07|          1|
|  6|2015-03-08|          2|
|  7|2015-03-30|          5|
|  8|2015-03-31|          5|
+---+----------+-----------+

Trying to explore the other two suggestions, I don't see a "W" option or any equivalent way to produce the week of month in either EXTRACT or the datetime patterns that are listed. The datetime pattern page says:

"In Spark 3.0, we define our own pattern strings in Datetime Patterns for Formatting and Parsing <https://spark.apache.org/docs/3.2.1/sql-ref-datetime-pattern.html>, which is implemented via DateTimeFormatter <https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html> under the hood."

If I follow the link to DateTimeFormatter, I do see the "W" symbol listed there:

W       week-of-month       number       4

My first question: with Spark 3 and without setting the policy to LEGACY, is there no longer a way to compute this using the Spark built-in functions?

My second question: if we do set the policy to LEGACY, are there any caveats or items to be aware of that might catch us later? For example, is this option going to be deprecated in a future Spark 3.3.x release?

This is an item we ran into during our Spark 2 to Spark 3 conversion, and we are trying to see how best to handle it.

Thanks for your feedback,
Kevin
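
P.S. One possible non-legacy workaround, offered only as a sketch rather than anything from the Spark docs: compute the week of month arithmetically from dayofmonth and the weekday of the first day of the month. This assumes the legacy "W" behavior we depend on is the US-locale convention (weeks start on Sunday, and the 1st of the month always falls in week 1).

from pyspark.sql import functions as F

# week_of_month = floor((day_of_month + offset - 1) / 7) + 1
# where offset = weekday of the 1st of the month (Sunday=0 .. Saturday=6),
# i.e. F.dayofweek(...) - 1, since dayofweek() returns 1 for Sunday .. 7 for Saturday
df1 = df1.withColumn(
    "WEEKOFMONTH",
    (
        F.floor((F.dayofmonth("b") + F.dayofweek(F.trunc("b", "month")) - 2) / 7) + 1
    ).cast("int"),
)

On the sample data above this appears to reproduce the LEGACY output (2, 2, 6, 6, 1, 2, 5, 5), but it would need to be validated against whatever locale and week-start semantics the original Spark 2 jobs relied on.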
