[GitHub] [spark] maropu commented on issue #21668: [SPARK-24690][SQL] Add a config to control plan stats computation in LogicalRelation

GitBox Fri, 22 Nov 2019 16:19:34 -0800

URL: https://github.com/apache/spark/pull/21668#issuecomment-557741390
 
 
   Ah, thanks for the comment, @dongjoon-hyun! To be honest, I had forgotten
about the comment above (thanks for reminding me).
   
   On second thought, yes, I still think this PR is worth a try. Currently,
on master, setting `spark.sql.cbo.enabled=true` turns on both cost-based join
reordering and `BasicStatsPlanVisitor`. Recently added features (e.g.,
dynamic partition pruning) depend on `LogicalPlanVisitor[Statistics]`. To use
dynamic partition pruning with `BasicStatsPlanVisitor`, we need to set
`spark.sql.cbo.enabled=true`, but this also activates cost-based join
reordering.
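
   For reference, the flags discussed above can be set at runtime through
`SparkSession.conf`. This is a minimal sketch assuming a local Spark 3.x
session (the dynamic partition pruning flag does not exist in 2.x). It only
illustrates the current coupling described above; it does not use the new
config this PR adds, whose name is not given in this comment.

```scala
import org.apache.spark.sql.SparkSession

object CboFlagsExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark 3.x session; the app name is arbitrary.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cbo-flags-example")
      .getOrCreate()

    // Per the discussion above, on master this single flag switches plan
    // stats to BasicStatsPlanVisitor and also enables cost-based join reorder.
    spark.conf.set("spark.sql.cbo.enabled", "true")

    // Dynamic partition pruning has its own flag, but it only gets the
    // BasicStatsPlanVisitor stats when the CBO flag above is also on.
    spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")

    // Print the effective values to confirm the settings took effect.
    Seq("spark.sql.cbo.enabled",
        "spark.sql.optimizer.dynamicPartitionPruning.enabled")
      .foreach(key => println(s"$key = ${spark.conf.get(key)}"))

    spark.stop()
  }
}
```

   The same keys can also be passed via `--conf` to `spark-submit` or set
with `SET key=value` in SQL.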
   
    I think the choice of how to collect data stats (`BasicStatsPlanVisitor` or
`SizeInBytesOnlyStatsPlanVisitor`) is orthogonal to the join reorder logic, so
it would be better to be able to turn each of them on or off individually.
   
   What I propose is the following:
    - Add a new config to control how data stats are collected (this PR)
    - Since the name `spark.sql.cbo.enabled` is ambiguous, rename it to
`spark.sql.cbo.joinReorder.enabled`
    - If dynamic partition pruning is one of the CBO features, rename
`spark.sql.optimizer.dynamicPartitionPruning.enabled` to
`spark.sql.cbo.dynamicPartitionPruning.enabled`?
   
   WDYT? @cloud-fan @dongjoon-hyun 
   
   (Off-topic: I think CBO is one of the optimizer features, so would it be
better to rename `spark.sql.cbo.enabled` to `spark.sql.optimizer.cbo.enabled`?)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

