Github user ryan-williams commented on a diff in the pull request:
https://github.com/apache/spark/pull/2848#discussion_r21711212
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
}
/**
+ * Download a file from `in` to `tempFile`, then move it to `destFile`, checking whether
+ * `destFile` already exists, has the same contents as the downloaded file, and can be
+ * overwritten.
+ *
+ * @param url URL that `sourceFile` originated from, for logging purposes.
+ * @param in InputStream to download.
+ * @param tempFile File path to download `in` to.
+ * @param destFile File path to move `tempFile` to.
+ * @param fileOverwrite Whether to delete/overwrite an existing `destFile` that does not match
+ * `sourceFile`
+ */
+ private def downloadStreamAndMove(
+ url: String,
+ in: InputStream,
+ tempFile: File,
+ destFile: File,
+ fileOverwrite: Boolean): Unit = {
+
+ val out = new FileOutputStream(tempFile)
+ Utils.copyStream(in, out, closeStreams = true)
+ copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = true)
+
+ }
+
+ /**
+ * Copy file from `sourceFile` to `destFile`, checking whether `destFile` already exists, has
+ * the same contents as the downloaded file, and can be overwritten. Optionally removes
+ * `sourceFile` by moving instead of copying.
+ *
+ * @param url URL that `sourceFile` originated from, for logging purposes.
+ * @param sourceFile File path to copy/move from.
+ * @param destFile File path to copy/move to.
+ * @param fileOverwrite Whether to delete/overwrite an existing `destFile` that does not match
+ * `sourceFile`
+ * @param removeSourceFile Whether to remove `sourceFile` after / as part of moving/copying it to
+ * `destFile`.
+ */
+ private def copyFile(
+ url: String,
+ sourceFile: File,
+ destFile: File,
+ fileOverwrite: Boolean,
+ removeSourceFile: Boolean = false): Unit = {
+
+ var shouldCopy = true
+ if (destFile.exists) {
+ if (!Files.equal(sourceFile, destFile)) {
+ if (fileOverwrite) {
+ destFile.delete()
+ logInfo(
+ s"File $destFile exists and does not match contents of $url, replacing it with $url"
+ )
+ } else {
+ throw new SparkException(
+ s"File $destFile exists and does not match contents of $url")
+ }
+ } else {
+ // Do nothing if the file contents are the same, i.e. this file has been copied
+ // previously.
+ logInfo(
+ s"${sourceFile.getAbsolutePath} has been previously copied to " +
+ destFile.getAbsolutePath
--- End diff --
I've changed it to a version that uses `String.format`:
```
logInfo(
  "%s has been previously copied to %s".format(
    sourceFile.getAbsolutePath,
    destFile.getAbsolutePath
  )
)
```
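For comparison, here is a minimal standalone sketch showing that the `format` call builds the same message as the concatenation in the diff. The `FormatSketch` object and the `/tmp` paths are hypothetical stand-ins for illustration only, not part of the PR.

```scala
import java.io.File

object FormatSketch {
  // Hypothetical stand-ins for the `sourceFile` and `destFile` parameters of `copyFile`.
  val sourceFile = new File("/tmp/fetch-source.jar")
  val destFile = new File("/tmp/fetch-dest.jar")

  // Concatenation form, as in the quoted diff.
  val concatenated: String =
    s"${sourceFile.getAbsolutePath} has been previously copied to " +
      destFile.getAbsolutePath

  // `format` form, as in the snippet above.
  val formatted: String =
    "%s has been previously copied to %s".format(
      sourceFile.getAbsolutePath,
      destFile.getAbsolutePath
    )

  def main(args: Array[String]): Unit = {
    // Both forms should yield the same log message.
    assert(concatenated == formatted)
    println(formatted)
  }
}
```

Only the literal format string is interpreted for `%` specifiers, so a `%` inside either path should be passed through unchanged as an argument.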