Github user cnauroth commented on the issue:
https://github.com/apache/spark/pull/17149
@HyukjinKwon, nice to meet you! I see I was pinged here for some Hadoop
`Path` knowledge, particularly on Windows.
> Is it okay to use both URIs and local file paths for the input string for
> org.apache.hadoop.fs.Path in general (when they are expected to be unescaped)?
Yes, this is correct.
Specifically on the topic of Windows, `Path` has special case logic for
handling a Windows-specific local file path. (This logic is only triggered if
it detects the runtime OS is Windows.) On Windows, I expect a call like `new
Path("C:\\foo\\bar").toUri` to yield a correct `URI` pointing at that local
file path, and further calling `toString` yields a correct `String`
representation of the path. Hadoop code often needs to take a path string that
is possibly a relative path and pass it through `Path` to make it absolute and
escape it according to Hadoop code expectations.
The standard invocation for doing this in the Hadoop code is `new
Path(...).toUri();` or `new Path(...).toUri().toString();`. This works across
all platforms. I don't have any knowledge of the Spark codebase, but I see
this patch uses similar invocations, so I expect it's good.
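As a rough illustration of the Windows handling described above, here is a minimal Java sketch that uses only `java.net.URI`, not Hadoop itself. The class `PathUriSketch`, the method `toUriString`, and the `windows` flag are hypothetical names made up for this example; the real `Path` detects the OS at runtime, and its logic may differ in detail. It only approximates the idea of normalizing a local path string and escaping it into URI form.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stdlib-only sketch; this is NOT Hadoop's Path implementation.
final class PathUriSketch {

    // Approximates turning an unescaped local path string into an escaped URI string.
    // The 'windows' flag is an assumption standing in for runtime OS detection.
    static String toUriString(String pathString, boolean windows) {
        String p = pathString;
        if (windows) {
            // Convert Windows separators to URI-style forward slashes.
            p = p.replace('\\', '/');
            // Prefix a drive-letter path ("C:/...") with a slash so it becomes "/C:/...".
            if (p.length() >= 2 && Character.isLetter(p.charAt(0)) && p.charAt(1) == ':') {
                p = "/" + p;
            }
        }
        try {
            // The multi-argument URI constructor percent-escapes illegal characters
            // in the path component, such as spaces.
            return new URI(null, null, p, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toUriString("C:\\foo bar\\baz", true)); // prints /C:/foo%20bar/baz
        System.out.println(toUriString("/tmp/a b", false));        // prints /tmp/a%20b
    }
}
```

With Hadoop on the classpath, the equivalent call would simply be the `new Path(...).toUri().toString()` invocation mentioned above; the sketch is only meant to show the kind of result to expect.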