c21 commented on pull request #29804: URL: https://github.com/apache/spark/pull/29804#issuecomment-701151447
@viirya - thanks for pointing this out. With the query cache (e.g. when a DataFrame user calls `persist()`), the query data is stored as the logical operator `InMemoryRelation` and later read through the physical operator `InMemoryTableScanExec`. So if a user caches a query that only reads the bucketed table, there will be a regression later when the user runs a join/group-by/etc. on the cached table data. I am fine with disabling the feature by default and letting users opt in case by case. Normally, cached queries should not be very common for SQL users (as opposed to DataFrame users). WDYT? @maropu, thanks.
