[GitHub] [spark] MaxGekk opened a new pull request #31066: [SPARK-34027][SQL] Refresh cache in `ALTER TABLE .. RECOVER PARTITIONS`

GitBox Wed, 06 Jan 2021 04:19:18 -0800


MaxGekk opened a new pull request #31066:
URL: https://github.com/apache/spark/pull/31066



   ### What changes were proposed in this pull request?
   Invoke `refreshTable()` from `CatalogImpl` in the v1 implementation of `ALTER TABLE .. RECOVER PARTITIONS`, so that the command refreshes the table cache.
   
   ### Why are the changes needed?
   This fixes the issue shown by the following example:
   ```sql
   spark-sql> create table tbl (col int, part int) using parquet partitioned by (part);
   spark-sql> insert into tbl partition (part=0) select 0;
   spark-sql> cache table tbl;
   spark-sql> select * from tbl;
   0    0
   spark-sql> show table extended like 'tbl' partition(part=0);
   default      tbl     false   Partition Values: [part=0]
   Location: file:/Users/maximgekk/proj/recover-partitions-refresh-cache/spark-warehouse/tbl/part=0
   ...
   ```
   Create a new partition by copying the existing one:
   ```
   $ cp -r /Users/maximgekk/proj/recover-partitions-refresh-cache/spark-warehouse/tbl/part=0 /Users/maximgekk/proj/recover-partitions-refresh-cache/spark-warehouse/tbl/part=1
   ```
   ```sql
   spark-sql> alter table tbl recover partitions;
   spark-sql> select * from tbl;
   0    0
   ```
   
   The last query must also return `0    1`, since the partition `part=1` has been recovered by `ALTER TABLE .. RECOVER PARTITIONS`. It does not, because the query is still served from the stale cached data.
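
The stale result above can be illustrated with a small toy model. This is only a sketch and not Spark code: `ToyCatalog`, `recover_partitions`, `refresh_table`, and `run_example` are hypothetical names invented for the illustration, and the model assumes the cache simply stores the rows that were visible when the table was cached.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyCatalog:
    """Toy stand-in for a catalog with a table cache (hypothetical, not Spark's API)."""
    files: dict = field(default_factory=dict)      # partition value -> col values on disk
    registered: set = field(default_factory=set)   # partitions known to the catalog
    cache: Optional[List[Tuple[int, int]]] = None  # cached rows, if the table is cached

    def scan(self):
        # Read (col, part) rows from the registered partitions only.
        return [(col, part) for part in sorted(self.registered) for col in self.files[part]]

    def cache_table(self):
        self.cache = self.scan()

    def refresh_table(self):
        # Re-populate the cache only if the table was cached before.
        if self.cache is not None:
            self.cache = self.scan()

    def recover_partitions(self, refresh_cache):
        # Register every partition directory found on disk.
        self.registered |= set(self.files)
        if refresh_cache:
            self.refresh_table()

    def select_all(self):
        # A cached table is answered from the cache, not from disk.
        return self.cache if self.cache is not None else self.scan()


def run_example(refresh_cache):
    cat = ToyCatalog()
    cat.files[0] = [0]                  # insert into tbl partition (part=0) select 0
    cat.registered.add(0)
    cat.cache_table()                   # cache table tbl
    cat.files[1] = list(cat.files[0])   # cp -r .../part=0 .../part=1
    cat.recover_partitions(refresh_cache=refresh_cache)
    return cat.select_all()


print(run_example(refresh_cache=False))  # stale: only the part=0 row
print(run_example(refresh_cache=True))   # refreshed: rows for part=0 and part=1
```

In this model, recovering partitions without refreshing leaves the cached rows untouched, which mirrors the behaviour in the example; refreshing as part of the command makes the new partition visible.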
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. After the changes, the example above returns both rows:
   ```sql
   ...
   spark-sql> alter table tbl recover partitions;
   spark-sql> select * from tbl;
   0    0
   0    1
   ```
   
   ### How was this patch tested?
   By running the affected test suite:
   ```
   $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *CachedTableSuite"
   ```


