xkrogen commented on pull request #32810: URL: https://github.com/apache/spark/pull/32810#issuecomment-859941249
I realized that the existing logic in my PR, which was copied from the `ApplicationMaster`/driver, wouldn't properly handle `local` paths that use the `GATEWAY_ROOT_PATH` / `REPLACEMENT_ROOT_PATH` mechanism to refer to a different path on- vs. off-cluster. I guess this was previously a bug in the driver code, but one that would only manifest with the combination of `cluster` mode and `local` URIs leveraging the gateway/replacement paths.

The new code follows the strategy used by the old code in `ExecutorRunnable` to perform local URI replacements as necessary, and since this code is shared between the driver and executors, it fixes the bug discussed above. I also make use of Java NIO APIs for performing the relative-to-absolute and path-to-URL conversions, instead of relying on the Java `File` API in combination with manual string manipulation to add a `file:` prefix.

I added more tests in both `ClientSuite` and `YarnClusterSuite` for the various options, and also tested running a real job on a real YARN cluster which made use of:
- a `local` URI requiring gateway/replacement configs
- a `local` URI not requiring gateway/replacement configs
- a relative file in the local working dir
- a file on shared storage using the `hdfs` scheme, for good measure

Everything works as expected with the latest diff (only the 2nd through 4th would succeed with the previous one).
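For readers unfamiliar with the two steps mentioned above, here is a rough sketch of the general idea. It is not the code from this PR: the class `LocalPathSketch` and the helpers `toClusterPath` and `toFileUrl` are hypothetical names, the example paths are made up, and the gateway/replacement roots are passed in as plain strings rather than read from Spark configuration.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Paths;

public class LocalPathSketch {

    // Hypothetical helper: rewrite a gateway-side path to its cluster-side
    // equivalent. If either root is missing, the path is returned unchanged.
    public static String toClusterPath(String path, String gatewayRoot, String replacementRoot) {
        if (gatewayRoot == null || replacementRoot == null) {
            return path;
        }
        return path.replace(gatewayRoot, replacementRoot);
    }

    // Hypothetical helper: resolve a possibly relative path against the
    // working directory and convert it to a file: URL using NIO, rather
    // than concatenating "file:" with File#getAbsolutePath by hand.
    public static URL toFileUrl(String path) {
        try {
            return Paths.get(path).toAbsolutePath().toUri().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot convert to URL: " + path, e);
        }
    }

    public static void main(String[] args) {
        // Made-up paths, for illustration only.
        System.out.println(toClusterPath("/gateway/libs/app.jar", "/gateway", "/cluster"));
        System.out.println(toFileUrl("libs/app.jar"));
    }
}
```

Going through `Path#toUri` leaves the escaping of special characters and the `file:` scheme to the JDK, which is the main advantage over manual string manipulation.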
