Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/14500#discussion_r73729927
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala
---
@@ -827,6 +827,45 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
testAddPartitions(isDatasourceTable = true)
}
+ test("alter table: recover partitions (sequential)") {
+ withSQLConf("spark.rdd.parallelListingThreshold" -> "1") {
+ testRecoverPartitions()
+ }
+ }
+
+ test("alter table: recover partitions (parallel)") {
+ withSQLConf("spark.rdd.parallelListingThreshold" -> "10") {
+ testRecoverPartitions()
+ }
+ }
+
+ private def testRecoverPartitions() {
+ val catalog = spark.sessionState.catalog
+ // table to alter does not exist
+ intercept[AnalysisException] {
+ sql("ALTER TABLE does_not_exist RECOVER PARTITIONS")
+ }
+
+ val tableIdent = TableIdentifier("tab1")
+ createTable(catalog, tableIdent)
+ val part1 = Map("a" -> "1", "b" -> "5")
+ createTablePartition(catalog, part1, tableIdent)
+ assert(catalog.listPartitions(tableIdent).map(_.spec).toSet == Set(part1))
+
+ val part2 = Map("a" -> "2", "b" -> "6")
+ val root = new Path(catalog.getTableMetadata(tableIdent).storage.locationUri.get)
+ val fs = root.getFileSystem(spark.sparkContext.hadoopConfiguration)
+ fs.mkdirs(new Path(new Path(root, "a=1"), "b=5"))
+ fs.mkdirs(new Path(new Path(root, "a=2"), "b=6"))
+ try {
+ sql("ALTER TABLE tab1 RECOVER PARTITIONS")
+ assert(catalog.listPartitions(tableIdent).map(_.spec).toSet ==
+ Set(part1, part2))
+ } finally {
+ fs.delete(root, true)
+ }
+ }
--- End diff --
Let's add more tests to exercise the command. Here are four examples.
1. A partition directory has a bad name (not in the `key=value` format).
2. There are two partition columns, and some files sit at the first level of the
partition directory tree (e.g. `_SUCCESS`, Parquet metadata files, and/or regular
data files).
3. Some directories do not have the expected number of partition columns. For
example, the schema specifies three partition columns, but a path encodes only
two.
4. The partition column names encoded in the path do not match the names
specified in the schema. For example, when we create the table, we specify `c1`
as the first partition column, but the directory in the file system has `c2` as
the first partition column.
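The four cases can be made concrete with a small, dependency-free Scala sketch that classifies a path (relative to the table root) against the expected partition columns. `PartitionPathCheck`, `classify`, `parseSegment`, and the rule that names starting with `_` or `.` are bookkeeping files are all hypothetical, chosen only for illustration. This is not how `ALTER TABLE ... RECOVER PARTITIONS` is implemented in Spark. It only enumerates the inputs each new test could create and the outcome a test might assert.

```scala
// Hypothetical, dependency-free sketch (NOT Spark's implementation):
// classify a path relative to the table root against the partition columns,
// so that each suggested test case maps to one distinct outcome.
object PartitionPathCheck {
  sealed trait Result
  final case class Valid(spec: Map[String, String]) extends Result
  final case class Invalid(reason: String) extends Result

  // Assumption: names starting with "_" or "." (e.g. _SUCCESS, _metadata,
  // _common_metadata) are bookkeeping files, never partition directories.
  def isHidden(name: String): Boolean =
    name.startsWith("_") || name.startsWith(".")

  // Splits "col=value" into (col, value); None if there is no '=' or the key is empty.
  def parseSegment(segment: String): Option[(String, String)] = {
    val idx = segment.indexOf('=')
    if (idx <= 0) None
    else Some(segment.substring(0, idx) -> segment.substring(idx + 1))
  }

  def classify(relativePath: String, partitionCols: Seq[String]): Result = {
    val segments = relativePath.split("/").filter(_.nonEmpty).toSeq
    if (segments.exists(isHidden)) {
      // Case 2: metadata files such as _SUCCESS under a partition directory
      Invalid("hidden or metadata entry")
    } else {
      val parsed = segments.map(parseSegment)
      if (parsed.exists(_.isEmpty)) {
        // Case 1 (bad directory name), and case 2 for regular data files
        Invalid("segment not in key=value format")
      } else if (parsed.length != partitionCols.length) {
        // Case 3: wrong number of partition levels in the path
        Invalid(s"expected ${partitionCols.length} partition levels, found ${parsed.length}")
      } else {
        val kvs = parsed.flatten
        val names = kvs.map(_._1)
        // Case 4: column names in the path differ from the schema
        if (names != partitionCols) Invalid(s"partition names $names do not match $partitionCols")
        else Valid(kvs.toMap)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val cols = Seq("a", "b")
    Seq(
      "a=2/b=6",                // well-formed, as in the existing test
      "a=1/bad_dir",            // case 1: not in key=value format
      "a=1/_SUCCESS",           // case 2: metadata file at the first level
      "a=1/part-00000.parquet", // case 2: data file at the first level
      "a=1",                    // case 3: too few partition levels
      "c=1/b=5"                 // case 4: name mismatch (c instead of a)
    ).foreach(p => println(s"$p -> ${classify(p, cols)}"))
  }
}
```

Each path in `main` corresponds to a directory or file that a new test could create under the table root with `fs.mkdirs` or `fs.create` before running `ALTER TABLE tab1 RECOVER PARTITIONS`. The PR still has to decide what the real command does with each invalid entry (skip it or fail). The sketch only labels the entries.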