[GitHub] [spark] c21 commented on pull request #29804: [SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically

GitBox Tue, 29 Sep 2020 21:37:39 -0700


c21 commented on pull request #29804:
URL: https://github.com/apache/spark/pull/29804#issuecomment-701151447



   @viirya - thanks for pointing this out. With the query cache, e.g. when a 
DataFrame user calls `persist()`, we store the query data as the logical operator 
`InMemoryRelation` and later read it with the physical operator 
`InMemoryTableScanExec`. So if a user caches a query that only reads the bucketed 
table, the rule would disable the bucketed scan for that cached plan, and there 
will be a regression later when the user runs join/group-by/etc. on the cached 
table data.
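
   A minimal sketch of the scenario, assuming a local Spark session. The table names `bucketed_t` and `other_t`, the bucket count, and the `id` column are hypothetical and only for illustration. Broadcast joins are turned off so that any shuffle needed by the join stays visible in the `explain()` output. It does not reproduce the rule from this PR, it only sets up the cache-then-join/group-by pattern so the plans can be compared with and without the rule.

```scala
import org.apache.spark.sql.SparkSession

object CachedBucketedScanSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cached-bucketed-scan-sketch")
      // Avoid a broadcast join on these small tables.
      .config("spark.sql.autoBroadcastJoinThreshold", "-1")
      .getOrCreate()

    // Two hypothetical tables, both bucketed into 8 buckets on "id".
    Seq("bucketed_t", "other_t").foreach { name =>
      spark.range(0, 1000).toDF("id")
        .write.mode("overwrite")
        .bucketBy(8, "id")
        .saveAsTable(name)
    }

    // The cached query only reads the bucketed table: a rule that looks at
    // this plan alone sees no join or aggregate that benefits from bucketing.
    val cached = spark.table("bucketed_t").persist()
    cached.count() // materialize the cache

    // Later operations read the cached data through InMemoryTableScanExec.
    // Check the printed plans for Exchange nodes above the in-memory scan.
    cached.join(spark.table("other_t"), "id").explain()
    cached.groupBy("id").count().explain()

    // Clean up the hypothetical tables.
    cached.unpersist()
    Seq("bucketed_t", "other_t").foreach(n => spark.sql(s"DROP TABLE IF EXISTS $n"))
    spark.stop()
  }
}
```

   If the scan under the cache keeps its bucketing, the join and group-by on `id` can reuse that partitioning. If the scan was planned without bucketing, the cached data has to be shuffled first, which is the regression in question.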
   
   I am fine with disabling the feature by default and letting users opt in case 
by case. For SQL users (as opposed to DataFrame users), caching queries should 
normally not be very common. WDYT, @maropu? Thanks.


