Github user sujith71955 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22721#discussion_r231790964
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala
---
@@ -183,13 +183,14 @@ case class InsertIntoHadoopFsRelationCommand(
refreshUpdatedPartitions(updatedPartitionPaths)
}
- // refresh cached files in FileIndex
- fileIndex.foreach(_.refresh())
- // refresh data cache if table is cached
- sparkSession.catalog.refreshByPath(outputPath.toString)
-
if (catalogTable.nonEmpty) {
+
sparkSession.sessionState.catalog.refreshTable(catalogTable.get.identifier)
--- End diff --
This is why I asked why we initialize the stats in some flows but not in
others: when they are not initialized, stats will be None and refreshTable()
will never be called.
In my PR I described the flow: in the insert flow we do not initialize the
stats, so the refreshTable() flow is never executed.
But if you execute a select statement before the insert command, the stats
are initialized and the relation is cached. If you then execute an insert
query, refreshTable() will be called, because this time the stats are
nonempty.
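
To make the two flows concrete, here is a toy sketch of the behaviour I am describing. The names Table, select and refreshesAfterInsert are hypothetical stand-ins for illustration only, not Spark classes or APIs; the only thing the sketch assumes is that the refresh is guarded by a check that the stats are nonempty.

```scala
object RefreshSketch {
  // Hypothetical stand-in for a catalog table; stats is None until initialized.
  final case class Table(name: String, stats: Option[Long])

  // Assumption from the comment: a select initializes the stats if they are missing.
  def select(table: Table): Table =
    table.copy(stats = table.stats.orElse(Some(0L)))

  // Assumption from the comment: the refresh only runs when stats are nonempty.
  def refreshesAfterInsert(table: Table): Boolean =
    table.stats.nonEmpty

  def main(args: Array[String]): Unit = {
    val fresh = Table("t1", None)

    // Flow 1: insert without a prior select, stats are None, so no refresh.
    println(refreshesAfterInsert(fresh))

    // Flow 2: select first, then insert, stats are defined, so the refresh runs.
    println(refreshesAfterInsert(select(fresh)))
  }
}
```

In this model the first flow skips the refresh and the second triggers it, which is the inconsistency I am pointing at.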
---