[GitHub] [spark] CHENXCHEN commented on pull request #36070: [SPARK-31675][CORE] Fix rename and delete files with different filesystem


CHENXCHEN commented on PR #36070:
URL: https://github.com/apache/spark/pull/36070#issuecomment-1091300982


   The staging directory is generated from the table location.
   If we require the generated partition location to be placed under the 
table location, we need to:
   1. Find the partition locations that differ from the table location, 
change their locations, and then delete the old locations in 
[InsertIntoHadoopFsRelationCommand.scala](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L142)
   2. Pass the partition locations that differ from the table location 
into the committer as new task files in 
[FileFormatDataWriter.scala](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala#L292)
   
   Hive's approach is to keep the partition location path as it is and move 
files across file systems: 
[Hive.java](https://github.com/apache/hive/blob/rel/release-2.3.9/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3156)
   Should our behavior be consistent with Hive's?
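
   To make the trade-off concrete, here is a minimal sketch of the decision a 
"move across file systems" approach has to make. It is not the Spark or Hive 
implementation: the class `MoveStrategy`, its methods, and the example paths are 
hypothetical, and it assumes two locations share a file system when their URI 
scheme and authority match. It uses only `java.net.URI`, not the Hadoop 
`FileSystem` API.

```java
import java.net.URI;

public class MoveStrategy {
    /** How staged files would be moved to the final partition location. */
    enum Strategy { RENAME, COPY_THEN_DELETE }

    /** Null-safe, case-insensitive comparison of URI components. */
    private static boolean sameComponent(String x, String y) {
        return x == null ? y == null : x.equalsIgnoreCase(y);
    }

    /**
     * Assumption for this sketch: two locations are on the same file system
     * when both scheme and authority are equal.
     */
    static boolean sameFileSystem(URI a, URI b) {
        return sameComponent(a.getScheme(), b.getScheme())
            && sameComponent(a.getAuthority(), b.getAuthority());
    }

    /** A rename only works within one file system; otherwise copy, then delete the source. */
    static Strategy choose(URI stagingDir, URI partitionLocation) {
        return sameFileSystem(stagingDir, partitionLocation)
            ? Strategy.RENAME
            : Strategy.COPY_THEN_DELETE;
    }

    public static void main(String[] args) {
        // Hypothetical locations: staging lives under the table location on HDFS.
        URI staging = URI.create("hdfs://nn1:8020/warehouse/t/.staging/p=1");
        URI sameFs = URI.create("hdfs://nn1:8020/warehouse/t/p=1");
        URI otherFs = URI.create("s3a://bucket/custom/p=1");

        System.out.println(choose(staging, sameFs));   // RENAME
        System.out.println(choose(staging, otherFs));  // COPY_THEN_DELETE
    }
}
```

   Keeping the partition location unchanged means the second case has to be 
handled with a copy followed by a delete, since a rename cannot cross file 
systems.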


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


