HeartSaVioR commented on PR #46820: URL: https://github.com/apache/spark/pull/46820#issuecomment-2143313356
> Do we have a general principle for this? It looks like all statistics-based optimizations are risky, as the query plan for each micro-batch has different statistics and the optimizer may rewrite the plan differently.

There are multiple approaches, e.g. never allowing a streaming source to be treated as an empty source node, but we also need to consider the perf regression. Being conservative is good but may not be the best approach. We are looking for a broader, more holistic fix for this, but I expect it will take time. That said, point fixes before we reach the holistic fix are unfortunately unavoidable.
