LantaoJin commented on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-662859641


   > how about to use `CACHE TABLE` command to do that?
   
   I think you mean `CACHE VIEW`, since `CACHE TABLE` still requires a manual 
`DROP TABLE`.
   There are four reasons to build a `temporary table` instead of caching a 
temporary view:
   1. The intermediate tables that users want to create as temporary tables 
are typically very large. To avoid OOM, users have to use `CACHE TABLE viewname 
OPTIONS ('storageLevel' 'DISK_ONLY')`. That is not friendly to SQL users, who 
are confused about what 'storageLevel' and 'DISK_ONLY' mean.
   2. A view is dynamic and a table is static. Whenever the underlying detail 
tables change, accessing a view should always return the latest data. So when 
the underlying detail tables of a view change, Spark will recache all the data 
to the executors' local disks again. For a large intermediate table, this is 
not performance friendly.
   3. The storage used by a cached view and by a temporary table is different. 
The cache command stores blocks on the executors' local disks, which are 
managed by `blockManager`, while the data of a temporary table is stored in 
external storage such as HDFS. Local disks on executors are very limited and 
not easy to scale out. Besides, the data in HDFS can be organized in the 
Parquet file format, which greatly benefits `Scan` operations and predicate 
pushdown.
   4. The table statistics of a cached view easily become stale. IIUC, table 
statistics for a cached view are calculated when the cache operation occurs. 
When the data of the underlying detail tables changes, the statistics of the 
cached view are not updated, so some optimizations such as AQE cannot work 
correctly. A temporary table does not have this problem.




