GitHub user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2848#discussion_r21557323
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
}
/**
+ * Download a file from `in` to `tempFile`, then move it to `destFile`, checking whether
+ * `destFile` already exists, has the same contents as the downloaded file, and can be
+ * overwritten.
+ *
+ * @param url URL that `sourceFile` originated from, for logging purposes.
+ * @param in InputStream to download.
+ * @param tempFile File path to download `in` to.
+ * @param destFile File path to move `tempFile` to.
+ * @param fileOverwrite Whether to delete/overwrite an existing `destFile` that does not match
+ * `sourceFile`
+ */
+ private def downloadStreamAndMove(
+ url: String,
+ in: InputStream,
+ tempFile: File,
+ destFile: File,
+ fileOverwrite: Boolean): Unit = {
+
+ val out = new FileOutputStream(tempFile)
+ Utils.copyStream(in, out, closeStreams = true)
+ copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = true)
+
+ }
+
+ /**
+ * Copy file from `sourceFile` to `destFile`, checking whether `destFile` already exists, has
+ * the same contents as the downloaded file, and can be overwritten. Optionally removes
+ * `sourceFile` by moving instead of copying.
+ *
+ * @param url URL that `sourceFile` originated from, for logging purposes.
+ * @param sourceFile File path to copy/move from.
+ * @param destFile File path to copy/move to.
+ * @param fileOverwrite Whether to delete/overwrite an existing `destFile` that does not match
+ * `sourceFile`
+ * @param removeSourceFile Whether to remove `sourceFile` after / as part of moving/copying it to
+ * `destFile`.
+ */
+ private def copyFile(
+ url: String,
+ sourceFile: File,
+ destFile: File,
+ fileOverwrite: Boolean,
+ removeSourceFile: Boolean = false): Unit = {
+
+ var shouldCopy = true
+ if (destFile.exists) {
+ if (!Files.equal(sourceFile, destFile)) {
+ if (fileOverwrite) {
+ destFile.delete()
--- End diff --
This is a good point; the old code (which was a mess) didn't handle this
error case, but we might as well fix it here. I'm in favor of throwing an
exception.
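
For illustration, here is a rough sketch of what throwing could look like. It assumes the error case in question is `destFile.delete()` returning false, since the quoted diff ends on that line. The object and method names are hypothetical, and `IOException` is only a stand-in for whichever exception type the PR settles on.

```scala
import java.io.{File, IOException}

object OverwriteSketch {
  // Hypothetical helper, not part of the PR: delete `destFile` before an
  // overwrite and fail loudly if the delete does not succeed.
  // Intended to be called only after the caller has checked destFile.exists.
  def deleteOrThrow(sourceFile: File, destFile: File): Unit = {
    // java.io.File.delete() reports failure by returning false instead of
    // throwing, so the result has to be checked explicitly.
    if (!destFile.delete()) {
      throw new IOException(
        s"Failed to delete ${destFile.getAbsolutePath} while attempting to " +
          s"overwrite it with ${sourceFile.getAbsolutePath}")
    }
  }
}
```

With something like this, the bare `destFile.delete()` call in `copyFile` would become `OverwriteSketch.deleteOrThrow(sourceFile, destFile)`, so a failed delete surfaces immediately rather than as a confusing failure in the later copy or move.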