Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19691#discussion_r202637455
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -510,40 +511,86 @@ case class AlterTableRenamePartitionCommand(
*
* The syntax of this command is:
* {{{
- * ALTER TABLE table DROP [IF EXISTS] PARTITION spec1[, PARTITION spec2, ...] [PURGE];
+ * ALTER TABLE table DROP [IF EXISTS] PARTITION (spec1, expr1)
+ * [, PARTITION (spec2, expr2), ...] [PURGE];
* }}}
*/
case class AlterTableDropPartitionCommand(
tableName: TableIdentifier,
- specs: Seq[TablePartitionSpec],
+ partitions: Seq[(TablePartitionSpec, Seq[Expression])],
ifExists: Boolean,
purge: Boolean,
retainData: Boolean)
- extends RunnableCommand {
+ extends RunnableCommand with PredicateHelper {
override def run(sparkSession: SparkSession): Seq[Row] = {
val catalog = sparkSession.sessionState.catalog
val table = catalog.getTableMetadata(tableName)
+ val resolver = sparkSession.sessionState.conf.resolver
DDLUtils.verifyAlterTableType(catalog, table, isView = false)
DDLUtils.verifyPartitionProviderIsHive(sparkSession, table, "ALTER TABLE DROP PARTITION")
- val normalizedSpecs = specs.map { spec =>
- PartitioningUtils.normalizePartitionSpec(
- spec,
- table.partitionColumnNames,
- table.identifier.quotedString,
- sparkSession.sessionState.conf.resolver)
+ val toDrop = partitions.flatMap { partition =>
+ if (partition._1.isEmpty && !partition._2.isEmpty) {
+ // There are only expressions in this drop condition.
+ extractFromPartitionFilter(partition._2, catalog, table, resolver)
+ } else if (!partition._1.isEmpty && partition._2.isEmpty) {
+ // There are only partitionSpecs in this drop condition.
+ extractFromPartitionSpec(partition._1, table, resolver)
+ } else if (!partition._1.isEmpty && !partition._2.isEmpty) {
+ // This drop condition has both partitionSpecs and expressions.
+ extractFromPartitionFilter(partition._2, catalog, table, resolver).intersect(
--- End diff --
@DazhuangSu sorry, I missed your last comment somehow.
Why do you say it would not be inefficient if you have a lot of partitions? I think it would be! Imagine that you partition per year and day, and you want to drop the first 6 months of this year. The spec would be something like `(year = 2018, day < 2018-07-01)`. Now imagine we have 10 years of history. With the current implementation, the filter would give us back basically all the partitions, i.e. roughly 3,650 (10 years × 365 days), and only then intersect them with the spec. Anyway, my understanding is that such a case would not even work properly, as it would try to drop the intersection of:
```
Seq(Seq("year"-> "2018", "day" -> "2018-01-01",
...)).intersect(Seq(Map("year"->"2018")))
```
which would result in an empty Seq, so we would drop nothing. Moreover, I saw no test for this case in the test suite. Can we add tests for this use case, and can we add support for it, if my understanding that it is not working is right? A quick standalone check of the intersect behavior, plus a possible direction for the fix, is sketched below.
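Here is a minimal, self-contained check (plain Scala, assuming the specs are `Map[String, String]` values, which is how `TablePartitionSpec` is defined) showing why the intersection comes back empty:
```
// Seq.intersect compares whole elements, and a full partition map is never
// equal to a partial spec map, so nothing survives the intersection.
val fromFilter = Seq(
  Map("year" -> "2018", "day" -> "2018-01-01"),
  Map("year" -> "2018", "day" -> "2018-01-02"))
val fromSpec = Seq(Map("year" -> "2018"))
assert(fromFilter.intersect(fromSpec).isEmpty)
```
If we do add support for the mixed case, one possible direction (just a sketch, `specToPredicates` is a hypothetical helper, not something in this PR) would be to fold the spec into the expression path as equality predicates, so that a single partition lookup can serve both:
```
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
import org.apache.spark.sql.catalyst.expressions.{EqualTo, Expression, Literal}

// Hypothetical helper: turn a partial spec into equality predicates that can
// be ANDed with the user-supplied expressions before listing partitions once.
def specToPredicates(spec: Map[String, String]): Seq[Expression] =
  spec.toSeq.map { case (k, v) => EqualTo(UnresolvedAttribute(k), Literal(v)) }
```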
Thanks