LantaoJin commented on pull request #28901: URL: https://github.com/apache/spark/pull/28901#issuecomment-654198375
> If I write the output to a temp location and then create a temp view, is it similar to the temp table?

Except that a temp table is removed when the session terminates; a temp view has no path and no materialized data. So the answer is no. You can simply treat a `temporary table` as a Spark permanent data source table that is dropped automatically when the session closes, so the implementation is not complex. But the use cases for temporary tables go beyond what it looks like.

A permanent metastore table needs more maintenance: creating one requires write permission on the database it belongs to and on the underlying storage folder. For many ad-hoc (OLAP) use cases, a user may only have limited permissions, such as read-only. So currently, users rely on `temporary view` to implement their complicated queries, but a view gives no performance improvement without data materialization. If users could create a `temporary table` instead, there would be no need to grant write permission on production databases, which is very convenient for them.

Imagine a use case like this: a Databricks runtime user logs in to a notebook and writes some statements to do an analysis job. In the Databricks runtime, the user may have r/w permissions on his/her own default space (we call it a workspace), but no write permission on a production database (for example, database "dw"). Without a `temporary table`, they must use temporary views in their SQL statements, or they can create a temporary workspace/database (for example, a database named "tony_work"), create permanent tables in "tony_work", and then drop them all when they log out (if they can log out without failure). But users may want to share their scripts (the SQL statements above) with another user or a batch account, and then he/she has to change the scripts, since the batch account or other user doesn't have write permission on database "tony_work". So in our production environment, the `temporary table` feature is widely used, especially by Teradata users who migrated to Spark.
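To make the comparison concrete, here is a minimal sketch of the three workflows described above. Note that the `CREATE TEMPORARY TABLE` syntax is the proposal under discussion and does not exist in Spark SQL today, and the database/table/column names (`dw.events`, `tony_work`, `tmp_result`) are hypothetical:

```sql
-- Today: a temporary view. Nothing is materialized, so the underlying
-- query is re-evaluated by every downstream statement that reads it.
CREATE TEMPORARY VIEW tmp_result AS
SELECT user_id, count(*) AS cnt FROM dw.events GROUP BY user_id;

-- Workaround: a permanent table in a scratch database. This requires
-- write permission on "tony_work" and must be dropped manually at logout.
CREATE TABLE tony_work.tmp_result AS
SELECT user_id, count(*) AS cnt FROM dw.events GROUP BY user_id;

-- Proposed: a temporary table. Materialized like a data source table,
-- but dropped automatically when the session closes, and requiring no
-- write permission on any production database.
CREATE TEMPORARY TABLE tmp_result AS
SELECT user_id, count(*) AS cnt FROM dw.events GROUP BY user_id;
```

A shared script using the proposed form would run unchanged for any user with read access to `dw`, since no account-specific scratch database appears in it.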
@cloud-fan
