CHENXCHEN commented on PR #36070: URL: https://github.com/apache/spark/pull/36070#issuecomment-1091300982
The staging dir is generated based on the table location. If we require that generated partition locations be placed under the table location, we need to:

1. Find the partition locations that differ from the table location, change their locations, and then delete the old locations in [InsertIntoHadoopFsRelationCommand.scala](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L142).
2. Pass the partition locations that differ from the table location into the committer as new task files in [FileFormatDataWriter.scala](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala#L292).

Hive's approach is to keep the partition location path as-is and move the files across file systems ([Hive.java](https://github.com/apache/hive/blob/rel/release-2.3.9/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3156)).

Should our behavior be consistent with Hive's?
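As a rough illustration of the first step (finding partitions whose location is not under the table location), the check might look like the minimal Java sketch below. It assumes locations are plain `java.net.URI` values rather than Hadoop `Path`s. The class and method names (`PartitionLocations`, `isUnderTable`, `customLocations`) are hypothetical and are not Spark or Hive APIs. A partition counts as "under" the table only when scheme, authority, and a directory-boundary path prefix all match.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of Spark or Hive: classifies partition
// locations relative to a table location using java.net.URI only.
public class PartitionLocations {

    // True if `partition` lies inside the `table` directory: same scheme,
    // same authority, and the path starts with the table path plus "/".
    static boolean isUnderTable(URI table, URI partition) {
        URI t = table.normalize();
        URI p = partition.normalize();
        if (t.getPath() == null || p.getPath() == null) {
            return false; // opaque URIs have no hierarchical path to compare
        }
        if (!Objects.equals(t.getScheme(), p.getScheme())
                || !Objects.equals(t.getAuthority(), p.getAuthority())) {
            return false; // different file system, e.g. hdfs:// vs s3a://
        }
        // Compare on a directory boundary so ".../t2" is not treated as inside ".../t".
        String tablePrefix = t.getPath().endsWith("/") ? t.getPath() : t.getPath() + "/";
        return p.getPath().startsWith(tablePrefix);
    }

    // Returns the partitions (spec -> location) whose location is outside
    // the table location, preserving the input order.
    static Map<String, URI> customLocations(URI table, Map<String, URI> partitions) {
        Map<String, URI> result = new LinkedHashMap<>();
        for (Map.Entry<String, URI> e : partitions.entrySet()) {
            if (!isUnderTable(table, e.getValue())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        URI table = URI.create("hdfs://nn:8020/warehouse/db.db/t");
        Map<String, URI> parts = new LinkedHashMap<>();
        parts.put("dt=2022-04-01", URI.create("hdfs://nn:8020/warehouse/db.db/t/dt=2022-04-01"));
        parts.put("dt=2022-04-02", URI.create("hdfs://nn:8020/other/dt=2022-04-02"));
        parts.put("dt=2022-04-03", URI.create("s3a://bucket/t/dt=2022-04-03"));
        // prints [dt=2022-04-02, dt=2022-04-03]
        System.out.println(customLocations(table, parts).keySet());
    }
}
```

The partitions this returns are the ones that would need either relocation plus cleanup of the old location (step 1) or routing to the committer as separate task files (step 2). The example deliberately includes one partition on another file system, which is the case Hive handles by moving files across file systems.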
