mihailom-db commented on PR #49670:
URL: https://github.com/apache/spark/pull/49670#issuecomment-2614429747

   @LuciferYang thanks for raising the concern, but there are a couple of
reasons why we would want to do this, apart from abandoning non-ANSI behaviour.
   
   In Spark 4.0.0, ANSI mode is turned on by default. Because of this, we need
to make sure we do not suggest turning off the ANSI config that easily.
Previously this suggestion made sense: users had to set the ANSI config
explicitly, so a suggestion to turn it off was a suggestion to revert an
explicit setting to the default state. Now we would be suggesting turning off
(switching to a non-default value) a config that ensures Spark queries return
proper results, without returning unexpected nulls on erroneous inputs.
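
   To make the difference concrete, here is a minimal sketch in plain Python
(not Spark code). `cast_to_int` and its `ansi_enabled` flag are hypothetical
names that only model how a string-to-int cast behaves with
`spark.sql.ansi.enabled` on or off; the real error type and message in Spark
differ.

```python
# Plain-Python model of the two cast behaviours; this is not Spark code.
from typing import Optional


def cast_to_int(value: str, ansi_enabled: bool) -> Optional[int]:
    """Hypothetical stand-in for CAST(value AS INT)."""
    try:
        return int(value)
    except ValueError:
        if ansi_enabled:
            # ANSI on: malformed input surfaces as an error.
            raise
        # ANSI off: malformed input silently becomes NULL (None here).
        return None


rows = ["10", "20", "abc"]

# ANSI off: the bad row turns into None and the job still "succeeds".
lenient = [cast_to_int(v, ansi_enabled=False) for v in rows]

# ANSI on: the same row stops the query instead.
try:
    strict = [cast_to_int(v, ansi_enabled=True) for v in rows]
except ValueError as err:
    strict_error = str(err)
```

   In the lenient run the malformed row is only visible as a `None` in the
output, while the strict run fails at the point where the bad input is read.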
   
   Additionally, once users set a config to a specific value, they usually
stick with it and do not reconsider it until they run into problems.
Switching off ANSI would make many different expressions return nulls, which is
really hard to catch without inspecting the data, and that might not be
something users want to do when the default behaviour of Spark is now ANSI on.
Also, once the query has already run, it is sometimes almost impossible to go
back and revert the change in the data without a lot of pain.
   
   So IMO we need to keep a clear distinction between the change that is
coming, the switch of the default value, and the newly proposed idea of
abandoning non-ANSI behaviour.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

