[ https://issues.apache.org/jira/browse/SPARK-27062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
William Wong resolved SPARK-27062. ---------------------------------- Resolution: Duplicate > CatalogImpl.refreshTable should register query in cache with received > tableName > ------------------------------------------------------------------------------- > > Key: SPARK-27062 > URL: https://issues.apache.org/jira/browse/SPARK-27062 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 2.3.2 > Reporter: William Wong > Priority: Minor > Labels: easyfix, pull-request-available > Original Estimate: 2h > Remaining Estimate: 2h > > If _CatalogImpl.refreshTable()_ method is invoked against a cached table, > this method would first uncache corresponding query in the shared state cache > manager, and then cache it back to refresh the cache copy. > However, the table was recached with only 'table name'. The database name > will be missed. Therefore, if cached table is not on the default database, > the recreated cache may refer to a different table. For example, we may see > the cached table name in driver's storage page will be changed after table > refreshing. > Here is related code on github for your reference. > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala] > > > {code:java} > override def refreshTable(tableName: String): Unit = { > val tableIdent = > sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName) > val tableMetadata = > sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent) > val table = sparkSession.table(tableIdent) > if (tableMetadata.tableType == CatalogTableType.VIEW) { > // Temp or persistent views: refresh (or invalidate) any metadata/data > cached > // in the plan recursively. > table.queryExecution.analyzed.refresh() > } else { > // Non-temp tables: refresh the metadata cache. > sessionCatalog.refreshTable(tableIdent) > } > // If this table is cached as an InMemoryRelation, drop the original > // cached version and make the new version cached lazily. > if (isCached(table)) { > // Uncache the logicalPlan. > sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true, > blocking = true) > // Cache it again. > sparkSession.sharedState.cacheManager.cacheQuery(table, > Some(tableIdent.table)) > } > } > {code} > > CatalogImpl cache table with received _tableName_, instead of > _tableIdent.table_ > {code:java} > override def cacheTable(tableName: String): Unit = { > sparkSession.sharedState.cacheManager.cacheQuery(sparkSession.table(tableName), > Some(tableName)) } > {code} > > Therefore, I would like to propose aligning the behavior. RefreshTable method > should reuse the received _tableName_. Here is the proposed line of changes. > > {code:java} > sparkSession.sharedState.cacheManager.cacheQuery(table, > Some(tableIdent.table)) > {code} > to > {code:java} > sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableName)){code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org