Hi Spark user group,

Migrating existing Spark jobs from 2.4 to 3.x looks like a big challenge, given the long list of changes in the migration guide <https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-24-to-30>. These behavior changes could introduce failures or silent output changes in Spark 3, which makes the migration risky unless we identify and fix every affected job up front.
However, the guide is fairly high level. For some items, I don't know how to construct an example query/job that demonstrates the behavior difference between 2.4 and 3.x. A specific example from the guide:

"In Spark version 2.4 and below, you can create map values with map type key via built-in functions such as CreateMap, MapFromArrays, etc. In Spark 3.0, it's not allowed to create map values with map type key with these built-in functions. Users can use the map_entries function to convert a map to array<struct<key, value>> as a workaround. In addition, users can still read map values with map type key from data sources or Java/Scala collections, though it is discouraged."

Does anyone have an example that illustrates what this item is about?

Also, I'm curious whether anyone has done, or is doing, a large-scale Spark 2.4 to 3.x migration. What has your experience been in handling the long list of breaking changes?

Thanks,
Jason Xu
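P.S. Here is my best guess at a repro for the map-type-key item — just a sketch, unverified on both versions, assuming a spark-shell session with an active SparkSession `spark`:

```scala
// Spark 2.4: builds a map whose key is itself a map. This is accepted,
// even though such keys can't be meaningfully compared or grouped on.
// Spark 3.0 (my understanding): the same query is rejected, since map
// is no longer an allowed map key type for these built-in functions.
spark.sql("SELECT map(map(1, 2), 'a') AS m").show()

// Workaround from the guide: convert the inner map into
// array<struct<key, value>> with map_entries, which is a legal key type.
spark.sql("SELECT map(map_entries(map(1, 2)), 'a') AS m").show()
```

If someone can confirm this is what the item refers to (or correct it), that would be much appreciated.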