MaxGekk commented on a change in pull request #31101:
URL: https://github.com/apache/spark/pull/31101#discussion_r554466741
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableAddPartitionSuite.scala
##########
@@ -41,6 +45,33 @@ trait AlterTableAddPartitionSuiteBase extends command.AlterTableAddPartitionSuit
"The spec ([p1=]) contains an empty partition column value"))
}
}
+
+ test("SPARK-34055: refresh cache in partition adding") {
+ withTable("t") {
+      sql(s"CREATE TABLE t (id int, part int) USING parquet PARTITIONED BY (part)")
+ sql("INSERT INTO t PARTITION (part=0) SELECT 0")
+ assert(!spark.catalog.isCached("t"))
+ sql("CACHE TABLE t")
+ assert(spark.catalog.isCached("t"))
+ checkAnswer(sql("SELECT * FROM t"), Seq(Row(0, 0)))
+
+ // Create new partition (part = 1) in the filesystem
+      val information = sql("SHOW TABLE EXTENDED LIKE 't' PARTITION (part = 0)")
Review comment:
Sure, as soon as this can be used in one more place, we will move this
to the common trait.
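
For reference, one possible shape of such a shared helper in the common trait could look like the sketch below. This is only an illustration of the refactoring being discussed; the method name and parsing of the `SHOW TABLE EXTENDED` output are hypothetical, not part of this PR:

```scala
// Hypothetical helper for AlterTableAddPartitionSuiteBase, extracted once a
// second test needs it: returns the `Location:` field from the `information`
// column of SHOW TABLE EXTENDED for the given partition spec.
protected def getPartitionLocation(table: String, partSpec: String): String = {
  val information = sql(s"SHOW TABLE EXTENDED LIKE '$table' PARTITION ($partSpec)")
    .select("information")
    .first().getString(0)
  information
    .split("\\r?\\n")
    .filter(_.startsWith("Location:"))
    .head
    .stripPrefix("Location:")
    .trim
}
```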
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
##########
@@ -485,6 +485,7 @@ case class AlterTableAddPartitionCommand(
       catalog.createPartitions(table.identifier, batch, ignoreIfExists = ifNotExists)
}
+ sparkSession.catalog.refreshTable(table.identifier.quotedString)
Review comment:
   In this particular case, it doesn't matter because the table size is re-calculated by getting file statuses directly from the filesystem, so the cached data is not used when updating table stats. I think we should review other places.
   Just in case, I will update the test and check that the table size is updated after adding the partition.
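
   A rough sketch of what such a size check could look like, reading the statistics back through the optimized plan (the exact assertions are an illustration, not the test added by this PR):

```scala
// Illustration only: verify that the recorded table size grows after a new
// partition is added and populated.
sql("CREATE TABLE t (id int, part int) USING parquet PARTITIONED BY (part)")
sql("INSERT INTO t PARTITION (part = 0) SELECT 0")
sql("ANALYZE TABLE t COMPUTE STATISTICS")
val sizeBefore = spark.table("t").queryExecution.optimizedPlan.stats.sizeInBytes

sql("ALTER TABLE t ADD PARTITION (part = 1)")
sql("INSERT INTO t PARTITION (part = 1) SELECT 1")
sql("ANALYZE TABLE t COMPUTE STATISTICS")
val sizeAfter = spark.table("t").queryExecution.optimizedPlan.stats.sizeInBytes

assert(sizeAfter > sizeBefore)
```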
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
##########
@@ -485,6 +485,7 @@ case class AlterTableAddPartitionCommand(
       catalog.createPartitions(table.identifier, batch, ignoreIfExists = ifNotExists)
}
+ sparkSession.catalog.refreshTable(table.identifier.quotedString)
Review comment:
   > Just in case, I will update the test and check that the table size is updated after adding the partition.

   Let me add such a check later, independently of this PR. I think I have found one more issue related to looking up `HiveTableRelation` in the Cache Manager's cache. It seems the clearing of table stats done in https://github.com/apache/spark/pull/30995 is not enough.
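
   The cache lookup in question can be observed through Spark's internal `CacheManager`; a minimal sketch of the kind of check involved, assuming a test that can reach `spark.sharedState.cacheManager` (an internal API, used here only to illustrate the lookup):

```scala
// Sketch: after CACHE TABLE, the table's plan should be found by the
// Cache Manager. For Hive tables the cached plan is built around a
// HiveTableRelation, which is what the lookup has to match; if the
// relation in the cache and the fresh plan differ, the lookup misses.
sql("CACHE TABLE t")
val cached = spark.sharedState.cacheManager.lookupCachedData(spark.table("t"))
assert(cached.isDefined)
```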
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]