Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/22707#discussion_r225759293
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -227,18 +227,22 @@ case class InsertIntoHiveTable(
          // Newer Hive largely improves insert overwrite performance. As Spark uses older Hive
          // version and we may not want to catch up new Hive version every time. We delete the
          // Hive partition first and then load data file into the Hive partition.
-         if (oldPart.nonEmpty && overwrite) {
-           oldPart.get.storage.locationUri.foreach { uri =>
-             val partitionPath = new Path(uri)
-             val fs = partitionPath.getFileSystem(hadoopConf)
-             if (fs.exists(partitionPath)) {
-               if (!fs.delete(partitionPath, true)) {
-                 throw new RuntimeException(
-                   "Cannot remove partition directory '" + partitionPath.toString)
-               }
-               // Don't let Hive do overwrite operation since it is slower.
-               doHiveOverwrite = false
+         if (overwrite) {
+           val oldPartitionPath = oldPart.flatMap(_.storage.locationUri.map(new Path(_)))
+             .getOrElse {
+               ExternalCatalogUtils.generatePartitionPath(
+                 partitionSpec,
+                 partitionColumnNames,
+                 HiveClientImpl.toHiveTable(table).getDataLocation)
--- End diff --
Looks correct, as I see we assign `CatalogTable.storage.locationUri` to the
Hive table's data location.
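
For context, a minimal, self-contained sketch of the delete-then-load fallback being discussed (this is illustrative only, not the PR's code): `generatePartitionPath`, `resolvePartitionPath`, and `deletePartitionDir` below are hypothetical stand-ins for Spark's internal helpers, and partition path-name escaping is omitted for brevity.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object PartitionOverwriteSketch {
  // Hypothetical stand-in mimicking ExternalCatalogUtils.generatePartitionPath:
  // builds <tableLocation>/<col1=val1>/<col2=val2>/... from the partition spec.
  // (Real Spark also escapes column names and values.)
  def generatePartitionPath(
      spec: Map[String, String],
      partitionColumnNames: Seq[String],
      tablePath: Path): Path = {
    partitionColumnNames.foldLeft(tablePath) { (path, col) =>
      new Path(path, s"$col=${spec(col)}")
    }
  }

  // Prefer the partition location recorded in the catalog; otherwise fall back
  // to the default location derived from the table's data location.
  def resolvePartitionPath(
      oldPartLocation: Option[java.net.URI],
      spec: Map[String, String],
      partitionColumnNames: Seq[String],
      tableDataLocation: Path): Path = {
    oldPartLocation.map(new Path(_)).getOrElse(
      generatePartitionPath(spec, partitionColumnNames, tableDataLocation))
  }

  // Delete the partition directory ourselves so Hive's slower overwrite path
  // can be skipped; returns true if a directory was actually removed.
  def deletePartitionDir(partitionPath: Path, hadoopConf: Configuration): Boolean = {
    val fs = partitionPath.getFileSystem(hadoopConf)
    if (fs.exists(partitionPath)) {
      if (!fs.delete(partitionPath, true)) {
        throw new RuntimeException(
          s"Cannot remove partition directory '$partitionPath'")
      }
      true
    } else {
      false
    }
  }
}
```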
---