Re: [PR] [SPARK-48481][SQL][SS] Do not apply OptimizeOneRowPlan against streaming Dataset [spark]

via GitHub Fri, 31 May 2024 23:03:26 -0700


HeartSaVioR commented on PR #46820:
URL: https://github.com/apache/spark/pull/46820#issuecomment-2143313356


   > Do we have a general principle for this? It looks like all statistics-based
   > optimizations are risky, as the query plan for each micro-batch has different
   > statistics and the optimizer may rewrite the plan differently.
   
   There are multiple approaches, e.g. never allowing a streaming source to become
an empty source node, but we also need to consider the perf regression. Being
conservative is good but may not be the best approach.
   
   We are looking for a broader, holistic fix for this, but I expect it will
take time. That said, point fixes before we reach the holistic fix are
unfortunately unavoidable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
