Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22721
I think it's reasonable to follow `InsertIntoHiveTable`, but it's better to
provide more details about what changes in `InsertIntoHadoopFsRelationCommand`:
1. What's refreshed? Previously we refreshed the data cache via path and
also refreshed the file index, but the plan cache was left in place. Now we
refresh the plan cache. Since the file index lives in the plan, we don't need
to refresh it separately if we refresh the plan cache, but the data cache
still needs to be refreshed.
2. What's the performance impact? The plan cache is very useful when reading
partitioned tables, because it avoids listing files repeatedly. But it seems
OK, because we already refreshed the file index before, so we had to re-list
files after insertion anyway.
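
To make the two points above concrete, here is a toy model of the caches being discussed. It is only a sketch: `FileIndex`, `Catalog`, `plan_cache`, `data_cache` and the `listings` counter are hypothetical names invented for illustration and are not Spark's classes or API. It assumes the cached plan owns its file index and the data cache is keyed by path.

```python
# Toy model only: every name here is hypothetical, not Spark's API.

class FileIndex:
    """Lists the files under a path once and remembers the listing."""

    def __init__(self, storage, path):
        self.storage = storage
        self.path = path
        self.files = None
        self.listings = 0  # how many times we actually listed files

    def list_files(self):
        if self.files is None:
            self.files = sorted(self.storage[self.path])
            self.listings += 1
        return self.files


class Catalog:
    def __init__(self, storage):
        self.storage = storage
        self.plan_cache = {}  # table name -> cached plan (here just its FileIndex)
        self.data_cache = {}  # path -> cached rows

    def lookup_plan(self, table, path):
        # Reuse the cached plan, and with it the cached file listing.
        if table not in self.plan_cache:
            self.plan_cache[table] = FileIndex(self.storage, path)
        return self.plan_cache[table]

    def read(self, table, path):
        if path not in self.data_cache:
            plan = self.lookup_plan(table, path)
            self.data_cache[path] = list(plan.list_files())
        return self.data_cache[path]

    def insert(self, table, path, new_file):
        self.storage[path].append(new_file)
        # Dropping the cached plan also drops the file index it contains.
        self.plan_cache.pop(table, None)
        # The data cache is keyed by path, so it is refreshed separately.
        self.data_cache.pop(path, None)


storage = {"/warehouse/t": ["part-0"]}
catalog = Catalog(storage)
catalog.read("t", "/warehouse/t")
catalog.insert("t", "/warehouse/t", "part-1")
print(catalog.read("t", "/warehouse/t"))
```

In this model the read after the insert has to list files once more whether we drop the whole plan or only reset its file index, which is the reason the extra cost looks acceptable.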