maropu commented on issue #21668: [SPARK-24690][SQL] Add a config to control plan stats computation in LogicalRelation
URL: https://github.com/apache/spark/pull/21668#issuecomment-557741390

Ah, thanks for the comment, @dongjoon-hyun! To be honest, I had forgotten about the comment above (thanks for reminding me). On second thought, yeah, I still think this PR is worth a try.

Currently, in master, setting `spark.sql.cbo.enabled=true` turns on both cost-based join reordering and `BasicStatsPlanVisitor`. Recently added features (e.g., dynamic partition pruning) depend on `LogicalPlanVisitor[Statistics]`. To use dynamic partition pruning with `BasicStatsPlanVisitor`, we need to set `spark.sql.cbo.enabled=true`, but this also activates cost-based join reordering. I think how data stats are collected (`BasicStatsPlanVisitor` or `SizeInBytesOnlyStatsPlanVisitor`) is orthogonal to the join reorder logic, so it would be better to be able to turn each of them on and off individually.

I propose the following:
- Add a new config to control how data stats are collected (this PR).
- Since the name `spark.sql.cbo.enabled` is ambiguous, rename it to `spark.sql.cbo.joinReorder.enabled`.
- If dynamic partition pruning is considered a CBO feature, rename `spark.sql.optimizer.dynamicPartitionPruning.enabled` to `spark.sql.cbo.dynamicPartitionPruning.enabled`?

WDYT? @cloud-fan @dongjoon-hyun

(Off-topic: I personally think CBO is one of the optimizer features, so would it be better to move `spark.sql.cbo.enabled` to `spark.sql.optimizer.cbo.enabled`?)
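The coupling described above, and the proposed decoupling, can be illustrated with a toy model. This is a simplified sketch, not Spark code: the functions `current_behavior` and `proposed_behavior` and the flag `basic_stats_enabled` are hypothetical stand-ins. `basic_stats_enabled` represents the new config this PR adds, whose actual key is not given here. The model also follows the comment's description that one flag drives both decisions today, and it ignores any other conditions Spark may check before reordering joins.

```python
from dataclasses import dataclass

# Toy model only: these names are hypothetical stand-ins, not Spark APIs.
BASIC = "BasicStatsPlanVisitor"
SIZE_ONLY = "SizeInBytesOnlyStatsPlanVisitor"


@dataclass(frozen=True)
class Plan:
    stats_visitor: str   # which visitor computes plan statistics
    join_reorder: bool   # whether cost-based join reordering runs


def current_behavior(cbo_enabled: bool) -> Plan:
    # One flag (modeling spark.sql.cbo.enabled) drives both decisions.
    return Plan(stats_visitor=BASIC if cbo_enabled else SIZE_ONLY,
                join_reorder=cbo_enabled)


def proposed_behavior(basic_stats_enabled: bool,
                      join_reorder_enabled: bool) -> Plan:
    # Two independent flags: one for how stats are collected (the new,
    # hypothetically named config) and one for join reordering.
    return Plan(stats_visitor=BASIC if basic_stats_enabled else SIZE_ONLY,
                join_reorder=join_reorder_enabled)


if __name__ == "__main__":
    # Today: getting BasicStatsPlanVisitor for dynamic partition pruning
    # also turns on join reordering.
    print(current_behavior(cbo_enabled=True))
    # Proposed: detailed stats without join reordering.
    print(proposed_behavior(basic_stats_enabled=True, join_reorder_enabled=False))
```

Under the proposal, the combination "detailed stats, no join reordering" becomes expressible. With a single flag, it cannot be expressed.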
