JacobZheng0927 opened a new pull request, #43979: URL: https://github.com/apache/spark/pull/43979
### What changes were proposed in this pull request?

Modify the `toJSON` output format of `CaseWhen` so that child expressions are no longer serialized multiple times in the JSON.

**Before:**

```json
[
  {
    "class": "org.apache.spark.sql.catalyst.expressions.CaseWhen",
    "num-children": 3,
    "branches": [
      {
        "product-class": "scala.Tuple2",
        "_1": [
          {
            "class": "org.apache.spark.sql.catalyst.expressions.EqualTo",
            "num-children": 2,
            "left": 0,
            "right": 1
          },
          {
            "class": "org.apache.spark.sql.catalyst.expressions.Literal",
            "num-children": 0,
            "value": "1",
            "dataType": "integer"
          },
          {
            "class": "org.apache.spark.sql.catalyst.expressions.Literal",
            "num-children": 0,
            "value": "2",
            "dataType": "integer"
          }
        ],
        "_2": [
          {
            "class": "org.apache.spark.sql.catalyst.expressions.Literal",
            "num-children": 0,
            "value": "3",
            "dataType": "integer"
          }
        ]
      }
    ],
    "elseValue": [
      {
        "class": "org.apache.spark.sql.catalyst.expressions.Literal",
        "num-children": 0,
        "value": "4",
        "dataType": "integer"
      }
    ]
  },
  {
    "class": "org.apache.spark.sql.catalyst.expressions.EqualTo",
    "num-children": 2,
    "left": 0,
    "right": 1
  },
  {
    "class": "org.apache.spark.sql.catalyst.expressions.Literal",
    "num-children": 0,
    "value": "1",
    "dataType": "integer"
  },
  {
    "class": "org.apache.spark.sql.catalyst.expressions.Literal",
    "num-children": 0,
    "value": "2",
    "dataType": "integer"
  },
  {
    "class": "org.apache.spark.sql.catalyst.expressions.Literal",
    "num-children": 0,
    "value": "3",
    "dataType": "integer"
  },
  {
    "class": "org.apache.spark.sql.catalyst.expressions.Literal",
    "num-children": 0,
    "value": "4",
    "dataType": "integer"
  }
]
```

**After:**

```json
[
  {
    "class": "org.apache.spark.sql.catalyst.expressions.CaseWhen",
    "num-children": 3,
    "branches": [
      {
        "condition": 0,
        "value": 1
      }
    ],
    "elseValue": 2
  },
  {
    "class": "org.apache.spark.sql.catalyst.expressions.EqualTo",
    "num-children": 2,
    "left": 0,
    "right": 1
  },
  {
    "class": "org.apache.spark.sql.catalyst.expressions.Literal",
    "num-children": 0,
    "value": 1,
    "dataType": "integer"
  },
  {
    "class": "org.apache.spark.sql.catalyst.expressions.Literal",
    "num-children": 0,
    "value": 2,
    "dataType": "integer"
  },
  {
    "class": "org.apache.spark.sql.catalyst.expressions.Literal",
    "num-children": 0,
    "value": 3,
    "dataType": "integer"
  },
  {
    "class": "org.apache.spark.sql.catalyst.expressions.Literal",
    "num-children": 0,
    "value": 4,
    "dataType": "integer"
  }
]
```

### Why are the changes needed?

Calling `toJSON` on an expression that nests several `CASE WHEN` expressions can easily cause an OOM, because each child expression is expanded multiple times. For example:

```
CASE WHEN(`cost` <= 250) THEN '(245-250]'
ELSE CASE WHEN(`cost` <= 255) THEN '(250-255]'
ELSE CASE WHEN(`cost` <= 260) THEN '(255-260]'
ELSE CASE WHEN(`cost` <= 265) THEN '(260-265]'
ELSE '----' END END END END
```

### Does this PR introduce _any_ user-facing change?

Yes. It changes the output of `CaseWhen`'s `toJSON` method.

### How was this patch tested?

Unit test.

### Was this patch authored or co-authored using generative AI tooling?

No.