JacobZheng0927 opened a new pull request, #43979:
URL: https://github.com/apache/spark/pull/43979

   ### What changes were proposed in this pull request?
   Change the `toJSON` output format of `CaseWhen` so that child expressions are no longer populated multiple times in the JSON.
   
   **Before:**
   
   ```json
   [
       {
           "class":"org.apache.spark.sql.catalyst.expressions.CaseWhen",
           "num-children":3,
           "branches":[
               {
                   "product-class":"scala.Tuple2",
                   "_1":[
                       {
                           "class":"org.apache.spark.sql.catalyst.expressions.EqualTo",
                           "num-children":2,
                           "left":0,
                           "right":1
                       },
                       {
                           "class":"org.apache.spark.sql.catalyst.expressions.Literal",
                           "num-children":0,
                           "value":"1",
                           "dataType":"integer"
                       },
                       {
                           "class":"org.apache.spark.sql.catalyst.expressions.Literal",
                           "num-children":0,
                           "value":"2",
                           "dataType":"integer"
                       }
                   ],
                   "_2":[
                       {
                           "class":"org.apache.spark.sql.catalyst.expressions.Literal",
                           "num-children":0,
                           "value":"3",
                           "dataType":"integer"
                       }
                   ]
               }
           ],
           "elseValue":[
               {
                   "class":"org.apache.spark.sql.catalyst.expressions.Literal",
                   "num-children":0,
                   "value":"4",
                   "dataType":"integer"
               }
           ]
       },
       {
           "class":"org.apache.spark.sql.catalyst.expressions.EqualTo",
           "num-children":2,
           "left":0,
           "right":1
       },
       {
           "class":"org.apache.spark.sql.catalyst.expressions.Literal",
           "num-children":0,
           "value":"1",
           "dataType":"integer"
       },
       {
           "class":"org.apache.spark.sql.catalyst.expressions.Literal",
           "num-children":0,
           "value":"2",
           "dataType":"integer"
       },
       {
           "class":"org.apache.spark.sql.catalyst.expressions.Literal",
           "num-children":0,
           "value":"3",
           "dataType":"integer"
       },
       {
           "class":"org.apache.spark.sql.catalyst.expressions.Literal",
           "num-children":0,
           "value":"4",
           "dataType":"integer"
       }
   ]
   ```
   
   **After:**
   
   ```json
   [
       {
           "class":"org.apache.spark.sql.catalyst.expressions.CaseWhen",
           "num-children":3,
           "branches":[
               {
                   "condition":0,
                   "value":1
               }
           ],
           "elseValue":2
       },
       {
           "class":"org.apache.spark.sql.catalyst.expressions.EqualTo",
           "num-children":2,
           "left":0,
           "right":1
       },
       {
           "class":"org.apache.spark.sql.catalyst.expressions.Literal",
           "num-children":0,
           "value":1,
           "dataType":"integer"
       },
       {
           "class":"org.apache.spark.sql.catalyst.expressions.Literal",
           "num-children":0,
           "value":2,
           "dataType":"integer"
       },
       {
           "class":"org.apache.spark.sql.catalyst.expressions.Literal",
           "num-children":0,
           "value":3,
           "dataType":"integer"
       },
       {
           "class":"org.apache.spark.sql.catalyst.expressions.Literal",
           "num-children":0,
           "value":4,
           "dataType":"integer"
       }
   ]
   ```
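
As a reading aid for the new format, here is a small Python sketch (not Spark code) that rebuilds the tree from the flat list and resolves the integer references. It assumes the list is in pre-order and that each integer (`condition`, `value`, `elseValue`) is a position in the node's own children, in the same way `left` and `right` are used on `EqualTo`. The helper names `build_tree` and `resolve_case_when` are made up for illustration.

```python
P = "org.apache.spark.sql.catalyst.expressions."

# The "After" output from this PR, written as Python literals.
after = [
    {"class": P + "CaseWhen", "num-children": 3,
     "branches": [{"condition": 0, "value": 1}], "elseValue": 2},
    {"class": P + "EqualTo", "num-children": 2, "left": 0, "right": 1},
    {"class": P + "Literal", "num-children": 0, "value": 1, "dataType": "integer"},
    {"class": P + "Literal", "num-children": 0, "value": 2, "dataType": "integer"},
    {"class": P + "Literal", "num-children": 0, "value": 3, "dataType": "integer"},
    {"class": P + "Literal", "num-children": 0, "value": 4, "dataType": "integer"},
]


def build_tree(entries):
    """Rebuild nesting from a pre-order list, consuming num-children per node."""
    position = 0

    def build():
        nonlocal position
        entry = entries[position]
        position += 1
        children = [build() for _ in range(entry["num-children"])]
        return {"entry": entry, "children": children}

    return build()


def resolve_case_when(node):
    """Swap child indices in a CaseWhen entry for the child nodes themselves.

    Assumption: each integer is a position in this node's own children list.
    """
    entry, children = node["entry"], node["children"]
    branches = [(children[b["condition"]], children[b["value"]])
                for b in entry["branches"]]
    # Assumption: elseValue may be absent when there is no ELSE branch.
    else_value = children[entry["elseValue"]] if "elseValue" in entry else None
    return branches, else_value


root = build_tree(after)
branches, else_value = resolve_case_when(root)
condition, value = branches[0]
print(condition["entry"]["class"])  # the EqualTo node
print(value["entry"]["value"], else_value["entry"]["value"])  # 3 4
```

Under these assumptions the single branch resolves to the `EqualTo` condition and the literal `3`, and `elseValue` resolves to the literal `4`, which matches the nested tuples in the old output.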
   ### Why are the changes needed?
   Calling `toJSON` on an expression that contains several levels of nested `CaseWhen` can easily cause an OOM, because the child expressions are expanded multiple times.
   For example:
   ```
   CASE
       WHEN (`cost` <= 250) THEN '(245-250]'
       ELSE CASE
           WHEN (`cost` <= 255) THEN '(250-255]'
           ELSE CASE
               WHEN (`cost` <= 260) THEN '(255-260]'
               ELSE CASE
                   WHEN (`cost` <= 265) THEN '(260-265]'
                   ELSE '----'
               END
           END
       END
   END
   ```
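
To illustrate how quickly the old format grows with nesting, here is a toy Python model (not Spark code). It assumes, based on the "Before" output above, that the old `CaseWhen` entry embedded a full serialized copy of each child in addition to the child's own entry in the flat list. `Node`, `flatten`, `count_entries`, and `nested_case` are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy expression node; `kind` stands in for the Catalyst class name."""
    kind: str
    children: List["Node"] = field(default_factory=list)


def flatten(node, embed_children):
    """Pre-order list of entries for `node` and its descendants.

    embed_children=True imitates the old output: a CaseWhen entry carries a
    serialized copy of every child. False imitates the new output: the entry
    carries only child indices.
    """
    entry = {"class": node.kind, "num-children": len(node.children)}
    if node.kind == "CaseWhen":
        if embed_children:
            entry["children"] = [flatten(c, True) for c in node.children]
        else:
            entry["children"] = list(range(len(node.children)))
    out = [entry]
    for child in node.children:
        out.extend(flatten(child, embed_children))
    return out


def count_entries(obj):
    """Count node entries (dicts), including copies embedded in other entries."""
    if isinstance(obj, list):
        return sum(count_entries(item) for item in obj)
    if isinstance(obj, dict):
        return 1 + sum(count_entries(v) for v in obj.values())
    return 0


def nested_case(depth):
    """CASE WHEN cost <= x THEN lit ELSE (nested CASE ...) END, `depth` levels."""
    expr = Node("Literal")  # innermost ELSE value
    for _ in range(depth):
        cond = Node("LessThanOrEqual", [Node("Attribute"), Node("Literal")])
        expr = Node("CaseWhen", [cond, Node("Literal"), expr])
    return expr


if __name__ == "__main__":
    for depth in (1, 2, 4, 8):
        tree = nested_case(depth)
        indexed = count_entries(flatten(tree, False))
        embedded = count_entries(flatten(tree, True))
        print(f"depth={depth}: indexed={indexed} embedded={embedded}")
```

In this model the four-level expression above produces 21 entries with index references and 151 with embedded copies. The embedded count roughly doubles with each extra level, while the indexed count grows by a constant amount.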
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. It changes the output of CaseWhen's toJSON method.
   
   
   ### How was this patch tested?
   Unit test
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   

