carsonwang commented on issue #20303: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL
URL: https://github.com/apache/spark/pull/20303#issuecomment-458820365

@maropu, thanks a lot for reviewing this. I have been working with @cloud-fan on a substantial refactoring, so the code may not reflect the original design, and more changes are coming soon.

For dynamically changing the number of reducers, the new version should always work better than the existing one. The existing version may add an additional shuffle to the plan, but this is resolved in the new implementation. So I think it should be fine to replace the existing implementation.

For benchmark results, we have some on [Intel Developer Zone](https://software.intel.com/en-us/articles/spark-sql-adaptive-execution-at-100-tb). Those are based on an earlier version, and I expect better results now. Many companies, such as Baidu and JD.com, have also reported very good results.

For the naming, the good side of `queryStage` is that it does represent plans running in a single Spark stage. One point of confusion is that in RDD terms we call the dependencies parent stages, but in AE we call them child query stages because they are children in a tree. This is one piece of feedback I received on the design doc. But I am open to any new names.
