viirya commented on a change in pull request #27690:
URL: https://github.com/apache/spark/pull/27690#discussion_r445643405
##########
File path:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
##########
@@ -124,11 +153,24 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
val hiveVersion = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client.version
val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
+ logDebug(s"path '${path.toString}', staging dir '$stagingDir', " +
+ s"scratch dir '$scratchDir' are used")
if (hiveVersionsUsingOldExternalTempPath.contains(hiveVersion)) {
oldVersionExternalTempPath(path, hadoopConf, scratchDir)
} else if (hiveVersionsUsingNewExternalTempPath.contains(hiveVersion)) {
- newVersionExternalTempPath(path, hadoopConf, stagingDir)
+ // HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3
+ // Copied from Context.java#getTempDirForPath of Hive 2.3.
+ if (supportSchemeToUseNonBlobStore(path)) {
+ // Hive sets session_path as HDFS_SESSION_PATH_KEY(_hive.hdfs.session.path) in hive config
+ val HDFS_SESSION_PATH_KEY = "_hive.hdfs.session.path"
+ val sessionScratchDir = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog]
+ .client.getConf(HDFS_SESSION_PATH_KEY, "")
+ logDebug(s"session scratch dir '$sessionScratchDir' is used")
+ getMRTmpPath(hadoopConf, sessionScratchDir, scratchDir)
Review comment:
For Hive versions in `hiveVersionsUsingNewExternalTempPath`, we use
`stagingDir` for the temp path. With this change, when
`supportSchemeToUseNonBlobStore(path)` is true, we might use `scratchDir`
instead, since it is passed to `getMRTmpPath`.
I'm not sure about Hive's behavior here. Is `scratchDir` still used after
Hive 1.1?
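
To make the concern concrete, here is a minimal sketch of the branch selection as I read it from the hunk. It is a simplified model, not the Spark code: `TempPathChoice`, `TempBase`, and `choose` are hypothetical names, the two boolean flags stand in for the `contains(hiveVersion)` checks, and the final `StagingDir` fallback assumes the unchanged `else` branch still calls `newVersionExternalTempPath` with `stagingDir` (that branch is not shown in the hunk).

```scala
// Simplified model of which base directory the temp path is derived from.
// All names are illustrative only; this is not the Spark implementation.
object TempPathChoice {
  sealed trait TempBase
  case object ScratchDir extends TempBase          // hive.exec.scratchdir
  case object StagingDir extends TempBase          // hive.exec.stagingdir
  case object SessionOrScratchDir extends TempBase // session path, with scratchDir passed alongside

  // usesOldTempPath    ~ hiveVersionsUsingOldExternalTempPath.contains(hiveVersion)
  // usesNewTempPath    ~ hiveVersionsUsingNewExternalTempPath.contains(hiveVersion)
  // nonBlobStoreBranch ~ supportSchemeToUseNonBlobStore(path)
  def choose(
      usesOldTempPath: Boolean,
      usesNewTempPath: Boolean,
      nonBlobStoreBranch: Boolean): Option[TempBase] = {
    if (usesOldTempPath) {
      Some(ScratchDir)
    } else if (usesNewTempPath) {
      // New branch added by this PR: scratchDir can come back into play here.
      if (nonBlobStoreBranch) Some(SessionOrScratchDir)
      else Some(StagingDir) // assumed: the pre-existing behavior is kept
    } else {
      None // unsupported Hive version; the real code handles this separately
    }
  }
}
```

The case in question is `choose(false, true, true)`: a Hive version on the new temp-path branch that no longer resolves to `stagingDir`.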