Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2848#discussion_r21501514
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
}
/**
+ * Download a file from `in` to `tempFile`, then move it to `destFile`, checking whether
+ * `destFile` already exists, has the same contents as the downloaded file, and can be
+ * overwritten.
+ *
+ * @param url URL that `sourceFile` originated from, for logging purposes.
+ * @param in InputStream to download.
+ * @param tempFile File path to download `in` to.
+ * @param destFile File path to move `tempFile` to.
+ * @param fileOverwrite Whether to delete/overwrite an existing `destFile` that does not match
+ * `sourceFile`
+ */
+ private def downloadStreamAndMove(
+ url: String,
+ in: InputStream,
+ tempFile: File,
+ destFile: File,
+ fileOverwrite: Boolean): Unit = {
+
+ val out = new FileOutputStream(tempFile)
+ Utils.copyStream(in, out, closeStreams = true)
+ copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = true)
+
+ }
+
+ /**
+ * Copy file from `sourceFile` to `destFile`, checking whether `destFile` already exists, has
+ * the same contents as the downloaded file, and can be overwritten. Optionally removes
+ * `sourceFile` by moving instead of copying.
+ *
+ * @param url URL that `sourceFile` originated from, for logging purposes.
+ * @param sourceFile File path to copy/move from.
+ * @param destFile File path to copy/move to.
+ * @param fileOverwrite Whether to delete/overwrite an existing `destFile` that does not match
+ * `sourceFile`
+ * @param removeSourceFile Whether to remove `sourceFile` after / as part of moving/copying it to
+ * `destFile`.
+ */
+ private def copyFile(
+ url: String,
+ sourceFile: File,
+ destFile: File,
+ fileOverwrite: Boolean,
+ removeSourceFile: Boolean = false): Unit = {
+
+ var shouldCopy = true
--- End diff --
This is super-nitpicky of me, but I like to avoid mutability whenever
possible, so it would be nice to find a clean way to remove this
variable. It looks like `shouldCopy` is set to `false` only in the case where
the file contents are the same, so maybe we could just add a `return` on that
branch and remove the `shouldCopy` variable entirely. This would also let us
remove the `if` on line 478.
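
A minimal sketch of the early-return shape suggested above. It is not the actual Spark code: the object name `CopySketch`, the helper `sameContents`, the use of `java.nio.file.Files`, and the `IOException` messages are all assumptions made for illustration, and the `url` logging parameter is left out.

```scala
import java.io.{File, IOException}
import java.nio.file.Files
import java.util.Arrays

object CopySketch {
  // Hypothetical helper: byte-for-byte comparison, adequate for small files.
  private def sameContents(a: File, b: File): Boolean =
    Arrays.equals(Files.readAllBytes(a.toPath), Files.readAllBytes(b.toPath))

  def copyFile(
      sourceFile: File,
      destFile: File,
      fileOverwrite: Boolean,
      removeSourceFile: Boolean = false): Unit = {
    if (destFile.exists) {
      if (sameContents(sourceFile, destFile)) {
        // Identical contents: nothing to do. This early return takes the
        // place of `shouldCopy = false`; note that it leaves sourceFile alone.
        return
      } else if (fileOverwrite) {
        if (!destFile.delete()) {
          throw new IOException(s"Failed to delete existing file $destFile")
        }
      } else {
        throw new IOException(s"File $destFile exists and does not match $sourceFile")
      }
    }

    // Reached only when a copy or move is needed, so no `if (shouldCopy)` guard.
    if (removeSourceFile) {
      Files.move(sourceFile.toPath, destFile.toPath)
    } else {
      Files.copy(sourceFile.toPath, destFile.toPath)
    }
  }
}
```

With the `return` on the identical-contents branch, the code after the `if (destFile.exists)` block runs only when a copy or move is actually required, so neither the `var` nor the trailing guard is needed.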