LantaoJin commented on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-662859641
> how about to use `CACHE TABLE` command to do that?
I think you mean `CACHE VIEW`, since with `CACHE TABLE` the table still has to
be dropped manually with `DROP TABLE`.
There are four reasons to build a temporary table instead of caching a
temporary view:
1. The intermediate tables that users want to create as temporary tables are
usually very large. To avoid OOM, users have to write `CACHE TABLE viewname
OPTIONS ('storageLevel' 'DISK_ONLY')`, which is not friendly to SQL users:
many of them do not know what `storageLevel` and `DISK_ONLY` mean.
2. A view is dynamic while a table is static. Whenever the underlying detail
tables change, a query on the view should see the latest data. So when the
underlying detail tables of a cached view change, Spark re-caches all the data
to the executors' local disks. For a large intermediate table, this is
expensive.
3. A cached view and a temporary table use different storage. The cache
command stores blocks on the executors' local disks, managed by the
`blockManager`, whereas the data of a temporary table is stored in external
storage such as HDFS. Executor local disks are limited in capacity and not
easy to scale out. Besides, data in HDFS can be stored in the Parquet file
format, which greatly benefits `Scan` operations and predicate pushdown.
4. Table statistics for a cached view easily become stale. IIUC, they are
computed when the cache operation happens, so when the data of the underlying
detail tables changes, the statistics of the cached view are not updated. As
a result, optimizations such as AQE may not work correctly. A temporary table
does not have this problem.
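For reference, the cached-view workaround from point 1 looks roughly like the
sketch below, using the Spark SQL `CACHE TABLE ... OPTIONS` / `UNCACHE TABLE`
syntax. The view name `intermediate`, the source table `detail_table` and its
columns are hypothetical placeholders, not part of this PR.

```sql
-- Hypothetical names: intermediate, detail_table, user_id, amount.
-- CACHE TABLE ... AS <query> registers a temporary view and caches it eagerly;
-- DISK_ONLY keeps the cached blocks on executor local disks instead of memory.
CACHE TABLE intermediate OPTIONS ('storageLevel' 'DISK_ONLY') AS
SELECT user_id, SUM(amount) AS total_amount
FROM detail_table
GROUP BY user_id;

-- Subsequent queries read the cached blocks.
SELECT * FROM intermediate WHERE total_amount > 1000;

-- Release the executor disk space once the result is no longer needed.
UNCACHE TABLE IF EXISTS intermediate;
```

The user still has to choose a storage level and remember to uncache, and the
cached data lives on executor local disks rather than in HDFS.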