viirya commented on pull request #32559:
URL: https://github.com/apache/spark/pull/32559#issuecomment-844310285


   > I figured out how the UDF error is happening. Basically it's due to `val 
values = c.branches.map(_._2) ++ c.elseValue` in `commonChildrenToRecurse`.
   > 
   > I think it should actually be something like
   > 
   > ```
   > val values = if (c.isDefined) {
   >   c.branches.map(_._2) ++ c.elseValue
   > } else {
   >   Nil
   > }
   > ```
   > 
   > because if the `elseValue` isn't defined then you're never going to find a 
common value among all branches.
   
   Good catch! Do you want to submit a separate PR for this fix?
    
   > The problem is then that gets rid of the optimization of `when(length(col) 
> 0, col)`, because col was only being pulled out of the when value because of 
this "bug" I think. So I started thinking about this a little bit. Would it 
make sense to do something like track separately:
   > 
   >     * Definitely evaluated expressions
   > 
   >     * Optionally evaluated expressions
   > 
   > 
   > And then consider an expression to be pulled out as a subexpression if:
   > 
   >     * The expression is definitely evaluated at least once AND
   > 
   >     * The expression is definitely or optionally evaluated at least twice
   > 
   > 
   > Then `commonChildrenToRecurse` would add the common expressions to 
definitely evaluated, but you could add each individual expression to 
optionally evaluated so they would get deduped if possible too. Maybe a 
separate issue from this PR, but figured I'd bring it up
   
   For optionally evaluated expressions, do you mean expressions only happen in 
a few conditions/values? Since it is not definitely evaluated, but as a 
subexpression it will be evaluated all the times. I think it is hard to make 
sure it is always good for performance.
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to