sunchao commented on a change in pull request #31364:
URL: https://github.com/apache/spark/pull/31364#discussion_r566482732
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala
##########
@@ -515,14 +515,19 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
   }
   /**
-   * Invalidates and refreshes all the cached data and metadata of the given table or view.
-   * For Hive metastore table, the metadata is refreshed. For data source tables, the schema will
-   * not be inferred and refreshed.
+   * The method fully refreshes a table or view with the given name including:
+   *   1. The relation cache in the session catalog. The method remove table entry from the cache.
+   *   2. The file indexes of all relations used by the given view.
+   *   3. Table/View schema in the Hive Metastore if the SQL config
+   *      `spark.sql.hive.caseSensitiveInferenceMode` is set to `INFER_AND_SAVE`.
+   *   4. Cached data of the given table or view, and all its dependents that refer to it.
+   *      Cached data is cleared while keeping the table/view and all its dependents as cached.
Review comment:
I'm a bit confused by this sentence: do you mean that the existing
cached data will be cleared, and the cache will be lazily re-filled the next
time the table/view or its dependents are accessed?
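If it helps, the lazy-refill semantics being asked about can be sketched with a toy cache. This is a minimal illustrative model, not Spark's actual internals: `LazyRelationCache` and its method names are hypothetical, and it only demonstrates the idea that a refresh clears the materialized data while keeping the entry registered as cached, so the data is rebuilt lazily on the next access.

```scala
import scala.collection.mutable

// Hypothetical toy model (not Spark's actual classes) of a relation cache
// whose entries survive a refresh: refreshTable drops the materialized data
// but keeps the key registered, so the data is rebuilt lazily on next access.
final class LazyRelationCache[K, V](compute: K => V) {
  // None = registered as cached, but not materialized yet.
  private val entries = mutable.Map.empty[K, Option[V]]

  def cacheTable(key: K): Unit = entries.getOrElseUpdate(key, None)

  // Clear the data, keep the entry: the table stays "cached".
  def refreshTable(key: K): Unit =
    if (entries.contains(key)) entries.update(key, None)

  def isCached(key: K): Boolean = entries.contains(key)

  def isMaterialized(key: K): Boolean = entries.get(key).exists(_.isDefined)

  // Accessing a cached-but-cleared entry recomputes and re-materializes it.
  def lookup(key: K): V = entries.get(key) match {
    case Some(Some(v)) => v
    case Some(None) =>
      val v = compute(key)
      entries.update(key, Some(v))
      v
    case None => compute(key) // not cached: always recompute
  }
}
```

Under this reading, point 4 in the doc means: after a refresh, `isCached` still returns true, but the first `lookup` recomputes the data.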
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
##########
@@ -998,7 +998,26 @@ class SessionCatalog(
}
   /**
-   * Refresh the cache entry for a metastore table, if any.
+   * Refresh table entries in structures maintained by the session catalog such as:
+   *   - The map of temporary or global temporary view names to their logical plans
+   *   - The relation cache which maps table identifiers to their logical plans
+   *
+   * For temp views, it refreshes their logical plans, and as a consequence of that it can refresh
+   * the file indexes of the base relations (`HadoopFsRelation` for instance) used in the views.
+   * The method still keeps the views in the internal lists of session catalog.
+   *
+   * For tables/views, it removes their entries from the relation cache.
+   *
+   * The method is supposed to use in the following situations:
+   *   1. The logical plan of a table/view was changed, and cached table/view data is cleared
+   *      explicitly. For example, like in `AlterTableRenameCommand` which re-caches the table
+   *      itself. Otherwise if you need to re-fresh cached data, consider using of
Review comment:
nit: re-fresh -> refresh?
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala
##########
@@ -515,14 +515,19 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
   }
   /**
-   * Invalidates and refreshes all the cached data and metadata of the given table or view.
-   * For Hive metastore table, the metadata is refreshed. For data source tables, the schema will
-   * not be inferred and refreshed.
+   * The method fully refreshes a table or view with the given name including:
+   *   1. The relation cache in the session catalog. The method remove table entry from the cache.
Review comment:
nit: remove -> removes
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]