[GitHub] spark issue #22721: [SPARK-25403][SQL] Refreshes the table after inserting t...

cloud-fan Thu, 18 Oct 2018 05:24:55 -0700

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22721
  
    I think it's reasonable to follow `InsertIntoHiveTable`, but it's better to 
provide more details about what changes in `InsertIntoHadoopFsRelationCommand`:
    1. What is refreshed? Previously we refreshed the data cache by path and 
also refreshed the file index, but the plan cache was left in place. Now we 
refresh the plan cache. Since the file index lives in the plan, we don't need 
to refresh it separately once the plan cache is refreshed, but the data cache 
still needs to be refreshed.
    2. What is the performance impact? The plan cache is very useful when 
reading partitioned tables, because it avoids listing files repeatedly. But 
this seems OK: we already refreshed the file index before this change, so we 
had to re-list files after an insertion anyway.
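
    To make the two points concrete, here is a toy model of the caches 
involved. It is only a sketch: `ToyCatalog` and all of its methods are 
hypothetical names invented for illustration and do not mirror Spark's 
internals. It assumes the file index is stored inside the cached plan and 
that the data cache is keyed by path.

```python
# Toy model only: every name here is hypothetical, not Spark's real API.

class ToyCatalog:
    def __init__(self, storage):
        self.storage = storage   # path -> list of file names "on disk"
        self.plan_cache = {}     # table name -> cached plan (holds the file index)
        self.data_cache = {}     # path -> cached data (here: the file names read)
        self.listings = 0        # number of file listings performed

    def _list_files(self, path):
        self.listings += 1
        return list(self.storage[path])

    def lookup_plan(self, table, path):
        # The plan, including its file index, is built only on a cache miss.
        if table not in self.plan_cache:
            self.plan_cache[table] = {
                "path": path,
                "file_index": self._list_files(path),
            }
        return self.plan_cache[table]

    def read(self, table, path):
        plan = self.lookup_plan(table, path)
        if path not in self.data_cache:
            self.data_cache[path] = list(plan["file_index"])
        return self.data_cache[path]

    def refresh_plan(self, table):
        # Dropping the plan also drops the file index stored inside it.
        self.plan_cache.pop(table, None)

    def refresh_data(self, path):
        self.data_cache.pop(path, None)


storage = {"/warehouse/t": ["part-0"]}
catalog = ToyCatalog(storage)
catalog.read("t", "/warehouse/t")            # first read: one file listing

storage["/warehouse/t"].append("part-1")     # simulate an insert

catalog.refresh_plan("t")                    # file index is rebuilt on next lookup
stale = catalog.read("t", "/warehouse/t")    # data cache still holds the old rows
catalog.refresh_data("/warehouse/t")
fresh = catalog.read("t", "/warehouse/t")    # now sees both files
```

    In this model, refreshing the plan alone re-lists files once but still 
returns stale data until the data cache is refreshed as well, and the insert 
costs exactly one extra listing, the same as refreshing the file index would.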



