LantaoJin commented on pull request #28901: URL: https://github.com/apache/spark/pull/28901#issuecomment-654198375
> If I write the output to a temp location and then create a temp view, is it similar to the temp table?

Except that a temp table is removed when the session terminates; a temp view has no path and no materialized data. So the answer is no. You can simply treat a `temporary table` as a Spark permanent data source table that is dropped automatically when the session closes, so the implementation is not complex. But the use cases for temporary tables go beyond what it looks like.

A permanent metastore table needs more maintenance: creating one requires write permission on the database it belongs to and on the underlying storage folder. For many ad-hoc (OLAP) use cases, a user may only have limited permissions, such as read-only. So currently, users rely on `temporary view` to implement their complicated queries, but a view gives no performance improvement without data materialization. If users could create a `temporary table` instead, there would be no need to grant write permission on production databases, which is very convenient for them.

Imagine a use case like this: a Databricks runtime user logs in to a notebook and writes some statements to do an analysis job. In the Databricks runtime, the user may have r/w permissions on his/her own default space (we call it a workspace), but no write permission on a production database (for example, database "dw"). Without a `temporary table`, they must use temporary views in their SQL statements, or they can create a temporary workspace/database (for example, a database named "tony_work"), create permanent tables in "tony_work", and then drop them all when they log out (if they can log out without failure). But users may want to share their scripts (the SQL statements above) with another user or a batch account, and then he/she has to change the scripts, since the batch account or other user doesn't have write permission on database "tony_work". So in our production environment, the `temporary table` feature is widely used, especially by Teradata users who migrated to Spark.
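To make the comparison concrete, here is a minimal sketch of the three workflows described above. Note that the `CREATE TEMPORARY TABLE` syntax is the proposal under discussion and does not exist in Spark SQL today, and the database/table/column names (`dw.events`, `tony_work`, `tmp_result`) are hypothetical:

```sql
-- Today: a temporary view. Nothing is materialized, so the underlying
-- query is re-evaluated by every downstream statement that reads it.
CREATE TEMPORARY VIEW tmp_result AS
SELECT user_id, count(*) AS cnt FROM dw.events GROUP BY user_id;

-- Workaround: a permanent table in a scratch database. This requires
-- write permission on "tony_work" and must be dropped manually at logout.
CREATE TABLE tony_work.tmp_result AS
SELECT user_id, count(*) AS cnt FROM dw.events GROUP BY user_id;

-- Proposed: a temporary table. Materialized like a data source table,
-- but dropped automatically when the session closes, and requiring no
-- write permission on any production database.
CREATE TEMPORARY TABLE tmp_result AS
SELECT user_id, count(*) AS cnt FROM dw.events GROUP BY user_id;
```

A shared script using the proposed form would run unchanged for any user with read access to `dw`, since no account-specific scratch database appears in it.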
@cloud-fan
