GitHub user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/10208#discussion_r47190317
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -331,6 +331,30 @@ private[spark] object Utils extends Logging {
}
/**
+ * A file name may contain some invalid url characters, such as " ". This method will convert the
+ * file name to a raw path accepted by `java.net.URI(String)`.
+ *
+ * Note: the file name must not contain "/" or "\"
+ */
+ def encodeFileNameToURIRawPath(fileName: String): String = {
+ require(!fileName.contains("/") && !fileName.contains("\\"))
+ // `file` and `localhost` are not used. Just to prevent URI from parsing `fileName` as
+ // scheme or host. The prefix "/" is required because URI doesn't accept a relative path.
+ // We should remove it after we get the raw path.
+ new URI("file", null, "localhost", -1, "/" + fileName, null, null).getRawPath.substring(1)
+ }
+
+ /**
+ * Get the file name from uri's raw path and decode it. The raw path of uri must not end with "/".
+ */
+ def decodeFileNameInURI(uri: URI): String = {
+ val rawPath = uri.getRawPath
+ assert(!rawPath.endsWith("/"))
+ val rawFileName = rawPath.split("/").last
+ new URI("file:///" + rawFileName).getPath.substring(1)
--- End diff --
I created a separate method here so that I can write unit tests for it and
confirm the desired behavior. As for the security issue with `%2F`, I think the
server should take care of it, since an attacker can send such a special URI
without going through Spark at all.
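
To make the behavior under discussion concrete, here is a minimal standalone sketch. It copies the two helpers from the diff above into a hypothetical `FileNameUriSketch` object (that name, and the `http://localhost:8080/files/` base URL, are assumptions for illustration only and do not come from the PR). It shows the round trip for a file name containing a space, and what the decoder returns when the raw path contains `%2F`.

```scala
import java.net.URI

// Hypothetical wrapper object, used only for this illustration.
object FileNameUriSketch {

  // Same logic as the encode helper in the diff: percent-encode characters
  // that are not legal in a URI path (a space becomes %20).
  def encodeFileNameToURIRawPath(fileName: String): String = {
    require(!fileName.contains("/") && !fileName.contains("\\"))
    new URI("file", null, "localhost", -1, "/" + fileName, null, null)
      .getRawPath.substring(1)
  }

  // Same logic as the decode helper in the diff: take the last raw path
  // segment and decode its percent-escapes.
  def decodeFileNameInURI(uri: URI): String = {
    val rawPath = uri.getRawPath
    assert(!rawPath.endsWith("/"))
    val rawFileName = rawPath.split("/").last
    new URI("file:///" + rawFileName).getPath.substring(1)
  }

  def main(args: Array[String]): Unit = {
    // Round trip: the space is escaped on the way out and restored on the way in.
    val encoded = encodeFileNameToURIRawPath("abc xyz")
    println(encoded) // abc%20xyz

    // The base URL below is an assumed example, not a real Spark endpoint.
    val uri = new URI("http://localhost:8080/files/" + encoded)
    println(decodeFileNameInURI(uri)) // abc xyz

    // The %2F case: the raw path has no extra "/" segment, but the decoded
    // file name does contain a slash.
    val tricky = new URI("http://localhost:8080/files/a%2Fb")
    println(decodeFileNameInURI(tricky)) // a/b
  }
}
```

The last call is the case the `%2F` remark refers to: the decoded name can contain a `/` even though the encoder rejects such names, so whichever side consumes the decoded name has to decide how to treat it.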