[GitHub] [spark] William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
URL: https://github.com/apache/spark/pull/24221#discussion_r286026540

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala
## @@ -361,4 +350,81 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with TestHiveSingleton
     assert(spark.sharedState.cacheManager.lookupCachedData(samePlan).isDefined)
   }
 }
+
+  test("SPARK-27248 refreshTable should recreate cache with same cache name and storage level") {

Review comment: BTW, may I know where we can find the Spark 3.0 SQL reference? https://spark.apache.org/docs/latest/api/sql/index.html only lists the supported functions.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
URL: https://github.com/apache/spark/pull/24221#discussion_r286025787

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala
## @@ -361,4 +350,81 @@, on the lines:

    sql(s"REFRESH TABLE $db.cachedTable")
    assertCached(sql(s"select * from $db.cachedTable"), s"`$db`.`cachedTable`")

Review comment: Will update it.
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
URL: https://github.com/apache/spark/pull/24221#discussion_r286025756

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala
## @@ -361,4 +350,81 @@, on the lines:

    activateDatabase(db) {
      sql("REFRESH TABLE cachedTable")
      assertCached(sql("select * from cachedTable"), s"`$db`.`cachedTable`")

Review comment: Will update it.
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
URL: https://github.com/apache/spark/pull/24221#discussion_r286025654

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala
## @@ -361,4 +350,81 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with TestHiveSingleton
     assert(spark.sharedState.cacheManager.lookupCachedData(samePlan).isDefined)
   }
 }
+
+  test("SPARK-27248 refreshTable should recreate cache with same cache name and storage level") {
+    // This section tests when a table is cached with its qualified name but it is refreshed with
+    // its unqualified name.
+    withTempDatabase { db =>
+      withTable(s"$db.cachedTable") {
+        withCache(s"$db.cachedTable") {
+          // Create table 'cachedTable' in temp db for testing purpose.
+          sql(s"CREATE TABLE $db.cachedTable AS SELECT 1 AS key")
+
+          // Cache the table 'cachedTable' in temp db with qualified table name,
+          // and then check whether the table is cached with expected name.
+          sql(s"CACHE TABLE $db.cachedTable")
+          assertCached(sql(s"select * from $db.cachedTable"), s"`$db`.`cachedTable`")
+          assert(spark.catalog.isCached(s"$db.cachedTable"),
+            s"Table '$db.cachedTable' should be cached.")
+
+          // Refresh the table 'cachedTable' in temp db with qualified table name, and then check
+          // whether the table is still cached with the same name and storage level.
+          sql(s"REFRESH TABLE $db.cachedTable")
+          assertCached(sql(s"select * from $db.cachedTable"), s"`$db`.`cachedTable`")
+          assert(spark.catalog.isCached(s"$db.cachedTable"),
+            s"Table '$db.cachedTable' should be cached after refreshing with its qualified name.")
+
+          // Change the active database to the temp db and refresh the table with unqualified
+          // table name, and then check whether the table is still cached with the same name and
+          // storage level.
+          // Without bug fix 'SPARK-27248', the recreated cache name will be changed to
+          // 'cachedTable', instead of '$db.cachedTable'.
+          activateDatabase(db) {
+            sql("REFRESH TABLE cachedTable")
+            assertCached(sql("select * from cachedTable"), s"`$db`.`cachedTable`")
+            assert(spark.catalog.isCached("cachedTable"),
+              s"Table '$db.cachedTable' should be cached after refreshing with its " +
+                "unqualified name.")
+          }
+        }
+      }
+    }
+
+    // This section tests when a table is cached with its unqualified name but it is refreshed
+    // with its qualified name.
+    withTempDatabase { db =>
+      withTable("cachedTable") {
+        withCache("cachedTable") {
+          // Create table 'cachedTable' in default db for testing purpose.
+          sql("CREATE TABLE cachedTable AS SELECT 1 AS key")
+
+          // Cache the table 'cachedTable' in default db without qualified table name, and then
+          // check whether the table is cached with expected name.
+          sql("CACHE TABLE cachedTable")
+          assertCached(sql("select * from cachedTable"), "`cachedTable`")
+          assert(spark.catalog.isCached("cachedTable"), "Table 'cachedTable' should be cached.")
+
+          // Refresh the table 'cachedTable' in default db with unqualified table name, and then
+          // check whether the table is still cached with the same name.
+          sql("REFRESH TABLE cachedTable")
+          assertCached(sql("select * from cachedTable"), "`cachedTable`")
+          assert(spark.catalog.isCached("cachedTable"),
+            "Table 'cachedTable' should be cached after refreshing with its unqualified name.")
+
+          // Change the active database to the temp db and refresh the table with qualified
+          // table name, and then check whether the table is still cached with the same name and
+          // storage level.
+          // Without bug fix 'SPARK-27248', the recreated cache name will be changed to
+          // 'default.cachedTable', instead of 'cachedTable'.
+          activateDatabase(db) {
+            sql("REFRESH TABLE default.cachedTable")
+            assertCached(sql("select * from default.cachedTable"), "`cachedTable`")

Review comment: Yes. Will update it.
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
URL: https://github.com/apache/spark/pull/24221#discussion_r286025716

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala
## @@ -361,4 +350,81 @@, on the lines:

    sql("CACHE TABLE cachedTable")
    assertCached(sql("select * from cachedTable"), "`cachedTable`")

Review comment: Will update it.
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
URL: https://github.com/apache/spark/pull/24221#discussion_r286020718

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala
## @@ -361,4 +350,81 @@, on the lines:

    sql("REFRESH TABLE cachedTable")
    assertCached(sql("select * from cachedTable"), "`cachedTable`")

Review comment: Thanks. Will change the case.
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
URL: https://github.com/apache/spark/pull/24221#discussion_r286017653

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala
## @@ -361,4 +350,81 @@, on the line:

    test("SPARK-27248 refreshTable should recreate cache with same cache name and storage level") {

Review comment: Thanks. I will update it accordingly.
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
URL: https://github.com/apache/spark/pull/24221#discussion_r286017778

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala
## @@ -361,4 +350,81 @@, on the lines:

    sql(s"CACHE TABLE $db.cachedTable")
    assertCached(sql(s"select * from $db.cachedTable"), s"`$db`.`cachedTable`")

Review comment: Thanks. Will change the case.
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
URL: https://github.com/apache/spark/pull/24221#discussion_r285373211

## File path: sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala
## @@ -248,57 +244,91 @@ private[sql] trait SQLTestUtilsBase
       !spark.sessionState.catalog.functionExists(FunctionIdentifier(functionName)),
       s"Function $functionName should have been dropped. But, it still exists.")
     }
-    }
+    )
   }

   /**
    * Drops temporary view `viewNames` after calling `f`.
    */
   protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      // If the test failed part way, we don't want to mask the failure by failing to remove
-      // temp views that never got created.
-      try viewNames.foreach(spark.catalog.dropTempView) catch {
-        case _: NoSuchTableException =>
-      }
-    }
+    tryWithFinallyBlock(f)(viewNames.foreach(spark.catalog.dropTempView))
   }

   /**
    * Drops global temporary view `viewNames` after calling `f`.
    */
   protected def withGlobalTempView(viewNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      // If the test failed part way, we don't want to mask the failure by failing to remove
-      // global temp views that never got created.
-      try viewNames.foreach(spark.catalog.dropGlobalTempView) catch {
-        case _: NoSuchTableException =>
-      }
-    }
+    tryWithFinallyBlock(f)(viewNames.foreach(spark.catalog.dropGlobalTempView))
   }

   /**
    * Drops table `tableName` after calling `f`.
    */
   protected def withTable(tableNames: String*)(f: => Unit): Unit = {
-    try f finally {
+    tryWithFinallyBlock(f)(
       tableNames.foreach { name =>
         spark.sql(s"DROP TABLE IF EXISTS $name")
       }
-    }
+    )
   }

   /**
    * Drops view `viewName` after calling `f`.
    */
   protected def withView(viewNames: String*)(f: => Unit): Unit = {
-    try f finally {
+    tryWithFinallyBlock(f)(
       viewNames.foreach { name =>
         spark.sql(s"DROP VIEW IF EXISTS $name")
       }
+    )
   }

+  /**
+   * Drops cache `cacheName` after calling `f`.
+   */
+  protected def withCache(cacheNames: String*)(f: => Unit): Unit = {
+    tryWithFinallyBlock(f)(cacheNames.foreach(uncacheTable))
+  }
+
+  /**
+   * Executes the given tryBlock and then the given finallyBlock no matter whether tryBlock throws
+   * an exception. If both tryBlock and finallyBlock throw exceptions, the exception thrown
+   * from the finallyBlock will be added to the exception thrown from tryBlock as a
+   * suppressed exception. It helps to avoid masking the exception from tryBlock with the
+   * exception from finallyBlock.
+   */
+  private def tryWithFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {

Review comment: Many thanks for continuing to review this PR, and for your comments. I reverted the implementation of 'withCache' and I will submit another PR for 'tryWithFinallyBlock'.
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
URL: https://github.com/apache/spark/pull/24221#discussion_r285373159

## File path: sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala
## @@ -248,57 +244,91 @@, on the line:

    tryWithFinallyBlock(f)(viewNames.foreach(spark.catalog.dropTempView))

Review comment: They are not equivalent. I thought that if an unexpected exception happened before the finally block, that exception should be thrown instead, and I hope the 'tryWithFinallyBlock' function does the right thing. I understand the concern, though, and the last commit has been reverted. I will submit another PR for the 'tryWithFinallyBlock' function. I hope it makes sense.
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
URL: https://github.com/apache/spark/pull/24221#discussion_r284789009

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala
## @@ -361,4 +352,87 @@, on the line:

    sql(s"CREATE TABLE $db.cachedTable(key STRING) USING PARQUET LOCATION '${path.toURI}'")

Review comment: Hi @dongjoon-hyun, I modified the test further with the following assert statements to make sure the test is running against Hive tables. To be honest, I am not sure whether this assert statement is necessary. If it is trivial and not a good idea, I will remove it.

```
val isHive = spark.table(s"$db.cachedTable")
  .queryExecution
  .optimizedPlan
  .collectFirst { case _: HiveTableRelation => true }
  .getOrElse(false)
assert(isHive, s"$db.cachedTable must be a HiveTable")
```

In addition, the table creation statements are also updated with the keyword `USING HIVE`:

```
sql(s"CREATE TABLE $db.cachedTable(id INT, val STRING) USING HIVE")
sql("CREATE TABLE cachedTable(id INT, val STRING) USING HIVE")
```

I hope this makes it clear that the test is for Hive tables. I would appreciate it if you could help review the PR again when you have time. Many thanks.

Best regards,
William
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
URL: https://github.com/apache/spark/pull/24221#discussion_r282959517

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala
## @@ -361,4 +352,87 @@, on the line:

    sql(s"CREATE TABLE $db.cachedTable(key STRING) USING PARQUET LOCATION '${path.toURI}'")

Review comment: Hi @dongjoon-hyun, sorry about this. The SQL is updated to:
- s"CREATE TABLE $db.cachedTable AS SELECT 1 AS key"
- "CREATE TABLE cachedTable AS SELECT 1 AS key"

I also modified the `withCache` function in SQLTestUtils to avoid exception masking. Before the modification, any exception thrown from the finally block could mask the exception thrown from the try block. I hope the new `withFinallyBlock` function does the right job.
[GitHub] [spark] William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level URL: https://github.com/apache/spark/pull/24221#discussion_r282571083 ## File path: sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala ## @@ -299,6 +299,28 @@ private[sql] trait SQLTestUtilsBase } } + /** + * Drops cache `cacheName` after calling `f`. + */ + protected def withCache(cacheNames: String*)(f: => Unit): Unit = { +try f catch { + case cause: Throwable => throw cause +} finally { + cacheNames.foreach(uncacheTable) Review comment: The error occurred because the table-creation SQL was incorrect, so the subsequent `uncacheTable` operation failed. I have just fixed the SQL.
[GitHub] [spark] William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level URL: https://github.com/apache/spark/pull/24221#discussion_r282571083 ## File path: sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala ## @@ -299,6 +299,28 @@ private[sql] trait SQLTestUtilsBase } } + /** + * Drops cache `cacheName` after calling `f`. + */ + protected def withCache(cacheNames: String*)(f: => Unit): Unit = { +try f catch { + case cause: Throwable => throw cause +} finally { + cacheNames.foreach(uncacheTable) Review comment: The error occurred because the table-creation SQL was incorrect, so the `uncacheTable` operation failed. The exception thrown by `uncacheTable` then masked the true failure. The `withGlobalTempView` function handles this better by explicitly ignoring `NoSuchTableException` in the finally clause: ``` try f finally { // If the test failed part way, we don't want to mask the failure by failing to remove // global temp views that never got created. try viewNames.foreach(spark.catalog.dropGlobalTempView) catch { case _: NoSuchTableException => } } ``` I would like to enhance the `withCache` function in a similar way, but I am not sure which exception we should ignore. In Scala, do we have anything similar to Java's suppressed exceptions? An exception thrown by the `close()` method in a try-with-resources block is attached to the original exception as a suppressed exception.
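The Java mechanism the comment asks about is `Throwable.addSuppressed`, which try-with-resources applies automatically when `close()` also throws. The same effect can be achieved by hand in a cleanup helper so the finally-block failure never masks the test-body failure. A minimal sketch (the helper name `runWithCleanup` is invented for illustration and is not part of Spark's SQLTestUtils):

```java
public class SuppressedDemo {
    // Runs `body`, then `cleanup`. If both throw, the cleanup failure is
    // attached to the body failure via addSuppressed instead of masking it.
    public static void runWithCleanup(Runnable body, Runnable cleanup) {
        RuntimeException primary = null;
        try {
            body.run();
        } catch (RuntimeException t) {
            primary = t;
            throw t;
        } finally {
            try {
                cleanup.run();
            } catch (RuntimeException t) {
                if (primary != null) {
                    primary.addSuppressed(t); // keep the original failure visible
                } else {
                    throw t; // body succeeded, so the cleanup failure is the real one
                }
            }
        }
    }

    public static void main(String[] args) {
        try {
            runWithCleanup(
                () -> { throw new IllegalStateException("test body failed"); },
                () -> { throw new RuntimeException("cleanup failed too"); });
        } catch (IllegalStateException e) {
            // The body's exception propagates; the cleanup failure rides along
            // as a suppressed exception rather than replacing it.
            System.out.println(e.getMessage());                    // prints "test body failed"
            System.out.println(e.getSuppressed()[0].getMessage()); // prints "cleanup failed too"
        }
    }
}
```

Either approach works for `withCache`: swallow the narrow expected exception as `withGlobalTempView` does, or preserve both failures as above.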
[GitHub] [spark] William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level URL: https://github.com/apache/spark/pull/24221#discussion_r282552019 ## File path: sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala ## @@ -299,6 +299,28 @@ private[sql] trait SQLTestUtilsBase } } + /** + * Drops cache `cacheName` after calling `f`. + */ + protected def withCache(cacheNames: String*)(f: => Unit): Unit = { +try f catch { + case cause: Throwable => throw cause +} finally { + cacheNames.foreach(uncacheTable) Review comment: Many thanks. Working on it.
[GitHub] [spark] William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level URL: https://github.com/apache/spark/pull/24221#discussion_r282153000 ## File path: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala ## @@ -205,6 +206,38 @@ abstract class QueryTest extends PlanTest { planWithCaching) } + /** + * Asserts that a given [[Dataset]] will be executed using the cache with the given name and + * storage level. + */ + def assertCached(query: Dataset[_], + cachedName: String, Review comment: Thanks. I will fix the indentation according to the contribution page.
[GitHub] [spark] William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level URL: https://github.com/apache/spark/pull/24221#discussion_r281727513 ## File path: docs/sql-migration-guide-upgrade.md ## @@ -50,6 +50,8 @@ license: | - In Spark version 2.4 and earlier, JSON datasource and JSON functions like `from_json` convert a bad JSON record to a row with all `null`s in the PERMISSIVE mode when specified schema is `StructType`. Since Spark 3.0, the returned row can contain non-`null` fields if some of JSON column values were parsed and converted to desired types successfully. + - Refreshing a cached table would trigger a table uncache operation and then a table cache (lazily) operation. In Spark version 2.4 and earlier, the cache name and storage level are not preserved before the uncache operation. Therefore, the cache name and storage level could be changed unexpectedly. Since Spark 3.0, cache name and storage level will be first preserved for cache recreation. It helps to maintain a consistent cache behavior upon table refreshing. Review comment: Yes. A similar test has been added using 'CACHE TABLE' and 'REFRESH TABLE'. However, we cannot control the storage level with the 'CACHE TABLE' SQL statement, so the test skips the related validation.
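The migration note's preserve-and-recreate behavior can be pictured with a toy cache registry. This is not Spark's actual `CacheManager` — the class and method names below are invented for illustration — but it shows the shape of the fix: capture the existing entry's name and storage level before uncaching, then recreate the entry with the same settings instead of falling back to defaults.

```java
import java.util.HashMap;
import java.util.Map;

public class RefreshDemo {
    // Toy stand-in for a cache entry; not Spark's real data structures.
    static class CacheEntry {
        final String cacheName;
        final String storageLevel;
        CacheEntry(String cacheName, String storageLevel) {
            this.cacheName = cacheName;
            this.storageLevel = storageLevel;
        }
    }

    static final Map<String, CacheEntry> cache = new HashMap<>();

    static void cacheTable(String table, String cacheName, String storageLevel) {
        cache.put(table, new CacheEntry(cacheName, storageLevel));
    }

    // The Spark 3.0 behavior the migration note describes: preserve the
    // existing entry's name and storage level across the uncache/recache
    // cycle (in Spark the recache is lazy).
    static void refreshTable(String table) {
        CacheEntry old = cache.remove(table); // uncache
        if (old != null) {
            cache.put(table, new CacheEntry(old.cacheName, old.storageLevel));
        }
    }

    public static void main(String[] args) {
        cacheTable("db.cachedTable", "`db`.`cachedTable`", "MEMORY_ONLY");
        refreshTable("db.cachedTable");
        CacheEntry e = cache.get("db.cachedTable");
        // prints "`db`.`cachedTable` MEMORY_ONLY" — settings survive the refresh
        System.out.println(e.cacheName + " " + e.storageLevel);
    }
}
```

In 2.4 and earlier the equivalent of `refreshTable` recreated the entry without consulting `old`, which is why the name and storage level could change unexpectedly.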
[GitHub] [spark] William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level URL: https://github.com/apache/spark/pull/24221#discussion_r281727562 ## File path: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ## @@ -985,4 +976,59 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext val queryStats3 = query().queryExecution.optimizedPlan.stats.attributeStats assert(queryStats3.map(_._1.name).toSet === Set("c0", "v1", "v2")) } + + test("Refresh Qualified Tables") { Review comment: Thanks. Just updated.
[GitHub] [spark] William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level URL: https://github.com/apache/spark/pull/24221#discussion_r281728846 ## File path: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ## @@ -985,4 +976,59 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext val queryStats3 = query().queryExecution.optimizedPlan.stats.attributeStats assert(queryStats3.map(_._1.name).toSet === Set("c0", "v1", "v2")) } + + test("Refresh Qualified Tables") { +withTempDatabase { db => + withTempPath { path => +spark.catalog.createTable( + s"$db.cachedTable", + "PARQUET", + StructType(Array(StructField("key", StringType))), + Map("LOCATION" -> path.toURI.toString)) +withCache(s"$db.cachedTable") { + spark.catalog.cacheTable(s"$db.cachedTable", MEMORY_ONLY) + assertCached(spark.table(s"$db.cachedTable"), s"$db.cachedTable", MEMORY_ONLY) + assert(spark.catalog.isCached(s"$db.cachedTable"), +s"Table '$db.cachedTable' should be cached") + + spark.catalog.refreshTable(s"$db.cachedTable") + assertCached(spark.table(s"$db.cachedTable"), s"$db.cachedTable", MEMORY_ONLY) + assert(spark.catalog.isCached(s"$db.cachedTable"), +s"Table '$db.cachedTable' should be cached after refresh") + + activateDatabase(db) { +assertCached(spark.table("cachedTable"), s"$db.cachedTable", MEMORY_ONLY) +assert(spark.catalog.isCached("cachedTable"), + "Table 'cachedTable' should be cached after refresh") + +spark.catalog.refreshTable(s"cachedTable") +assertCached(spark.table("cachedTable"), s"$db.cachedTable", MEMORY_ONLY) +assert(spark.catalog.isCached("cachedTable"), + "Table 'cachedTable' should be cached after refresh") + } +} + } +} + } + + test("Refresh Unqualified Tables") { Review comment: Yes, they were confusing. I have refactored the test into two sections.
The first section caches a table with its qualified table name and then refreshes the table with its unqualified name. The second section caches a table with its unqualified table name and then refreshes the table with its qualified name. Together they ensure that the cache name is not changed by refreshing the table. I hope the test looks better now.
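The two sections boil down to one invariant: however the table is referred to — qualified or unqualified — it must resolve to the same cache entry. A toy resolver (names invented for illustration; Spark's analyzer does this with full session-catalog state) sketches why the two sections exercise the same key:

```java
public class ResolveDemo {
    // Toy name resolution: an unqualified table name is resolved against the
    // current database, so both spellings of one table map to one cache key.
    static String resolve(String name, String currentDb) {
        return name.contains(".") ? name : currentDb + "." + name;
    }

    public static void main(String[] args) {
        // Section 1: cached with the qualified name, refreshed with the
        // unqualified one while db1 is the current database.
        String cachedAs = resolve("db1.cachedTable", "default");
        String refreshedAs = resolve("cachedTable", "db1");
        System.out.println(cachedAs.equals(refreshedAs)); // prints "true"
    }
}
```

If refresh rebuilt the cache under the literal spelling instead of the resolved name, the two sections would observe different cache names — which is exactly what the tests guard against.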
[GitHub] [spark] William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level URL: https://github.com/apache/spark/pull/24221#discussion_r281727067 ## File path: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ## @@ -985,4 +976,59 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext val queryStats3 = query().queryExecution.optimizedPlan.stats.attributeStats assert(queryStats3.map(_._1.name).toSet === Set("c0", "v1", "v2")) } + + test("Refresh Qualified Tables") { +withTempDatabase { db => + withTempPath { path => +spark.catalog.createTable( + s"$db.cachedTable", + "PARQUET", + StructType(Array(StructField("key", StringType))), + Map("LOCATION" -> path.toURI.toString)) +withCache(s"$db.cachedTable") { + spark.catalog.cacheTable(s"$db.cachedTable", MEMORY_ONLY) + assertCached(spark.table(s"$db.cachedTable"), s"$db.cachedTable", MEMORY_ONLY) + assert(spark.catalog.isCached(s"$db.cachedTable"), +s"Table '$db.cachedTable' should be cached") + + spark.catalog.refreshTable(s"$db.cachedTable") + assertCached(spark.table(s"$db.cachedTable"), s"$db.cachedTable", MEMORY_ONLY) + assert(spark.catalog.isCached(s"$db.cachedTable"), +s"Table '$db.cachedTable' should be cached after refresh") + + activateDatabase(db) { +assertCached(spark.table("cachedTable"), s"$db.cachedTable", MEMORY_ONLY) +assert(spark.catalog.isCached("cachedTable"), + "Table 'cachedTable' should be cached after refresh") Review comment: I would like to keep the cache without uncaching it on line 1000, because I want to make sure that refreshing the table again with the unqualified table name does not change the cache name. I have refactored the test a bit and added some more explanation. I hope it looks better now.
[GitHub] [spark] William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level
William1104 commented on a change in pull request #24221: [SPARK-27248][SQL] `refreshTable` should recreate cache with same cache name and storage level URL: https://github.com/apache/spark/pull/24221#discussion_r281725661 ## File path: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ## @@ -985,4 +976,59 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext val queryStats3 = query().queryExecution.optimizedPlan.stats.attributeStats assert(queryStats3.map(_._1.name).toSet === Set("c0", "v1", "v2")) } + + test("Refresh Qualified Tables") { +withTempDatabase { db => + withTempPath { path => +spark.catalog.createTable( + s"$db.cachedTable", + "PARQUET", + StructType(Array(StructField("key", StringType))), + Map("LOCATION" -> path.toURI.toString)) +withCache(s"$db.cachedTable") { + spark.catalog.cacheTable(s"$db.cachedTable", MEMORY_ONLY) + assertCached(spark.table(s"$db.cachedTable"), s"$db.cachedTable", MEMORY_ONLY) + assert(spark.catalog.isCached(s"$db.cachedTable"), +s"Table '$db.cachedTable' should be cached") + + spark.catalog.refreshTable(s"$db.cachedTable") + assertCached(spark.table(s"$db.cachedTable"), s"$db.cachedTable", MEMORY_ONLY) + assert(spark.catalog.isCached(s"$db.cachedTable"), +s"Table '$db.cachedTable' should be cached after refresh") + + activateDatabase(db) { +assertCached(spark.table("cachedTable"), s"$db.cachedTable", MEMORY_ONLY) +assert(spark.catalog.isCached("cachedTable"), + "Table 'cachedTable' should be cached after refresh") + +spark.catalog.refreshTable(s"cachedTable") Review comment: Updated.