MaxGekk commented on pull request #31475: URL: https://github.com/apache/spark/pull/31475#issuecomment-775373033
> ... why is this necessary instead of deleting from the table or overwriting everything with no new records?

1. Emulating table truncation via the insertion of no rows requires the delete + insert pair to be atomic, but a concrete implementation might not support that even though it can atomically truncate a table.
2. It closes the door on truncation-specific optimizations. If a catalog implementation knew in advance that we want to truncate the entire table rather than delete all of its rows, it could do so more efficiently. For example, a file-based implementation could move the table folder to a trash folder with a single atomic syscall (see the sketch after this list).
3. From a security and permissions point of view, we could distinguish insert with overwrite (or delete) from truncation. I can imagine a case where some roles/users have only truncation permissions but no insert or delete permissions.
4. It is also possible that a truncation is just a record in a catalog-level log, while inserts/deletes are records in table-level logs. We cannot map smoothly onto such an implementation if we emulate table truncation via inserts/deletes.

In general, I do believe we should not hide our intention from catalog implementations: truncation should be explicit. The table catalog implementation should then decide how to carry it out most efficiently. If it can emulate truncation by overwriting with no rows, fine, that is up to it.
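To illustrate point 2, here is a minimal sketch of the difference between emulated and explicit truncation for a hypothetical file-backed table. `FileBackedTable`, `overwriteWithNoRows`, and `truncateTable` are illustrative names for this comment, not the interface proposed by this PR:

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical file-based table: each table is a directory of data files.
class FileBackedTable(tableDir: Path, trashDir: Path) {

  // Emulated truncation: delete every data file, then "insert" zero rows.
  // The delete + insert pair must appear atomic to concurrent readers,
  // which this backend cannot guarantee; readers may observe the gap.
  def overwriteWithNoRows(): Unit = {
    val files = Files.list(tableDir)
    try files.forEach(Files.delete(_))
    finally files.close()
    // ... followed by an (empty) insert.
  }

  // Explicit truncation: one atomic rename moves the whole table directory
  // into the trash, then an empty directory takes its place.
  def truncateTable(): Boolean = {
    val target = trashDir.resolve(tableDir.getFileName.toString)
    Files.move(tableDir, target, StandardCopyOption.ATOMIC_MOVE)
    Files.createDirectory(tableDir)
    true
  }
}
```

The point of the sketch: only when the catalog receives the truncation intent explicitly can it pick the single-syscall path; if Spark rewrites the operation into delete + insert before it reaches the catalog, that option is gone.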
