MaxGekk opened a new pull request #31305:
URL: https://github.com/apache/spark/pull/31305


   ### What changes were proposed in this pull request?
   Invoke `CatalogImpl.refreshTable()` instead of `SessionCatalog.refreshTable` in the v1 implementation of the `LOAD DATA` command. `SessionCatalog.refreshTable` only refreshes table metadata, whereas `CatalogImpl.refreshTable()` also refreshes the cached table data.
   
   ### Why are the changes needed?
   The example below illustrates the issue:
   
   - Create a source table:
   ```sql
   spark-sql> CREATE TABLE src_tbl (c0 int, part int) USING hive PARTITIONED BY (part);
   spark-sql> INSERT INTO src_tbl PARTITION (part=0) SELECT 0;
   spark-sql> SHOW TABLE EXTENDED LIKE 'src_tbl' PARTITION (part=0);
   default      src_tbl false   Partition Values: [part=0]
   Location: file:/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0
   ...
   ```
   - Load data from the source table to a cached destination table:
   ```sql
   spark-sql> CREATE TABLE dst_tbl (c0 int, part int) USING hive PARTITIONED BY (part);
   spark-sql> INSERT INTO dst_tbl PARTITION (part=1) SELECT 1;
   spark-sql> CACHE TABLE dst_tbl;
   spark-sql> SELECT * FROM dst_tbl;
   1    1
   spark-sql> LOAD DATA LOCAL INPATH '/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0' INTO TABLE dst_tbl PARTITION (part=0);
   spark-sql> SELECT * FROM dst_tbl;
   1    1
   ```
   The last query does not return the newly loaded data: it is served from the stale cache of `dst_tbl`.
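   As a possible workaround on versions without this fix (a sketch, not part of this PR), the destination table can be refreshed explicitly right after `LOAD DATA`. `REFRESH TABLE` is expected to go through `CatalogImpl.refreshTable()`, so it should invalidate the cached data of `dst_tbl` as well as its metadata, and the following query should then include the loaded partition `part=0`:
   ```sql
   spark-sql> REFRESH TABLE dst_tbl;
   spark-sql> SELECT * FROM dst_tbl;
   ```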
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. After the changes, the example above works correctly:
   ```sql
   spark-sql> LOAD DATA LOCAL INPATH '/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0' INTO TABLE dst_tbl PARTITION (part=0);
   spark-sql> SELECT * FROM dst_tbl;
   0    0
   1    1
   ```
   
   
   ### How was this patch tested?
   Added a new test to `org.apache.spark.sql.hive.CachedTableSuite`:
   ```
   $ build/sbt -Phive -Phive-thriftserver "test:testOnly *CachedTableSuite"
   ```
   
   Authored-by: Max Gekk <[email protected]>
   Signed-off-by: Dongjoon Hyun <[email protected]>
   (cherry picked from commit f8bf72ed5d1c25cb9068dc80d3996fcd5aade3ae)
   Signed-off-by: Max Gekk <[email protected]>

