HeartSaVioR commented on PR #48517: URL: https://github.com/apache/spark/pull/48517#issuecomment-2428109032
UPDATE: @hvanhovell and I had an offline talk. I wasn't very clear about the semantics of the API, and he clarified that its intent is not to cope with default values, but to ensure the node is executed in any case. For example, it is a wrong optimization if any optimization ends up dropping the node. I only see the issue with PruneFilters, so it is easier to make a point fix there, but I need to discuss further with other folks to aim for a better fix.

That said, he also stated that providing a default value is worse than not having the metrics at all, which I agree with given this new understanding of the API's semantics. Users should be able to tell whether Spark failed to calculate the metrics, and that is at least possible before this fix. I'll revert the commit and look for a better fix.
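For context, here is a minimal sketch of the kind of situation being discussed. It assumes the metrics come from `Dataset.observe` (a `CollectMetrics` node); the query shape is only an illustration, not the exact reproduction from this PR:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("prune-filters-metrics-sketch")
  .getOrCreate()
import spark.implicits._

// The observe node sits below an always-false filter. PruneFilters may
// replace the Filter (and its whole child subtree) with an empty
// LocalRelation, so the CollectMetrics node would never be executed.
val df = Seq(1, 2, 3).toDF("value")
  .observe("my_metrics", count(lit(1)).as("rows_seen"))
  .filter(lit(false))

df.collect()
// Whether "my_metrics" is ever reported (e.g. via a QueryExecutionListener
// or the Observation helper) depends on whether the CollectMetrics node
// survived optimization.
```

Under the clarified semantics, dropping the node in a case like this is itself the bug to fix in the optimizer, rather than something to paper over by emitting default metric values.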
