Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19691#discussion_r202637455
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -510,40 +511,86 @@ case class AlterTableRenamePartitionCommand(
*
* The syntax of this command is:
* {{{
- * ALTER TABLE table DROP [IF EXISTS] PARTITION spec1[, PARTITION spec2, ...] [PURGE];
+ * ALTER TABLE table DROP [IF EXISTS] PARTITION (spec1, expr1)
+ * [, PARTITION (spec2, expr2), ...] [PURGE];
* }}}
*/
case class AlterTableDropPartitionCommand(
tableName: TableIdentifier,
- specs: Seq[TablePartitionSpec],
+ partitions: Seq[(TablePartitionSpec, Seq[Expression])],
ifExists: Boolean,
purge: Boolean,
retainData: Boolean)
- extends RunnableCommand {
+ extends RunnableCommand with PredicateHelper {
override def run(sparkSession: SparkSession): Seq[Row] = {
val catalog = sparkSession.sessionState.catalog
val table = catalog.getTableMetadata(tableName)
+ val resolver = sparkSession.sessionState.conf.resolver
DDLUtils.verifyAlterTableType(catalog, table, isView = false)
DDLUtils.verifyPartitionProviderIsHive(sparkSession, table, "ALTER TABLE DROP PARTITION")
- val normalizedSpecs = specs.map { spec =>
- PartitioningUtils.normalizePartitionSpec(
- spec,
- table.partitionColumnNames,
- table.identifier.quotedString,
- sparkSession.sessionState.conf.resolver)
+ val toDrop = partitions.flatMap { partition =>
+ if (partition._1.isEmpty && !partition._2.isEmpty) {
+ // There are only expressions in this drop condition.
+ extractFromPartitionFilter(partition._2, catalog, table, resolver)
+ } else if (!partition._1.isEmpty && partition._2.isEmpty) {
+ // There are only partitionSpecs in this drop condition.
+ extractFromPartitionSpec(partition._1, table, resolver)
+ } else if (!partition._1.isEmpty && !partition._2.isEmpty) {
+ // This drop condition has both partitionSpecs and expressions.
+ extractFromPartitionFilter(partition._2, catalog, table, resolver).intersect(
--- End diff --
@DazhuangSu sorry, I missed your last comment somehow.
Why do you say it would not be inefficient if you have a lot of partitions? I think it would be! Imagine that you partition per year and day, and you want to drop the first 6 months of this year. The spec would be something like `(year = 2018, day < 2018-07-01)`. Now imagine we have 10 years of history. With the current implementation, the filter would give us back basically all the partitions, i.e. roughly 3,650 (10 years × 365 days), and only then intersect them with the spec. Anyway, my understanding is that such a case would not even work properly, as it would try to drop the intersection of:
```
Seq(Seq("year"-> "2018", "day" -> "2018-01-01",
...)).intersect(Seq(Map("year"->"2018")))
```
which would result in an empty Seq, so we would drop nothing. Moreover, I saw no test for this case in the test suite. Can we add tests for this use case, and can we add support for it, if my understanding that it is not working is right? A quick standalone check of the intersect behavior, plus a possible direction for the fix, is sketched below.
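Here is a minimal, self-contained check (plain Scala, assuming the specs are `Map[String, String]` values, which is how `TablePartitionSpec` is defined) showing why the intersection comes back empty:
```
// Seq.intersect compares whole elements, and a full partition map is never
// equal to a partial spec map, so nothing survives the intersection.
val fromFilter = Seq(
  Map("year" -> "2018", "day" -> "2018-01-01"),
  Map("year" -> "2018", "day" -> "2018-01-02"))
val fromSpec = Seq(Map("year" -> "2018"))
assert(fromFilter.intersect(fromSpec).isEmpty)
```
If we do add support for the mixed case, one possible direction (just a sketch, `specToPredicates` is a hypothetical helper, not something in this PR) would be to fold the spec into the expression path as equality predicates, so that a single partition lookup can serve both:
```
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
import org.apache.spark.sql.catalyst.expressions.{EqualTo, Expression, Literal}

// Hypothetical helper: turn a partial spec into equality predicates that can
// be ANDed with the user-supplied expressions before listing partitions once.
def specToPredicates(spec: Map[String, String]): Seq[Expression] =
  spec.toSeq.map { case (k, v) => EqualTo(UnresolvedAttribute(k), Literal(v)) }
```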
Thanks