xkrogen commented on pull request #32810:
URL: https://github.com/apache/spark/pull/32810#issuecomment-859941249


   I realized that the existing logic in my PR, which was copied from the 
`ApplicationMaster`/driver, wouldn't properly handle `local` paths that use 
the `GATEWAY_ROOT_PATH` / `REPLACEMENT_ROOT_PATH` mechanism to specify different 
paths on- vs. off-cluster. I believe this was a pre-existing bug in the driver 
code, but one that would only manifest with the combination of `cluster` mode 
and `local` URIs that rely on the gateway/replacement paths.
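   For readers unfamiliar with the gateway/replacement mechanism (exposed to users as `spark.yarn.config.gatewayPath` / `spark.yarn.config.replacementPath`), here is a minimal sketch of the idea: a path under the gateway root is rewritten to sit under the replacement root, and anything else is left untouched. The object and method names, the example paths, and the prefix-matching rule are hypothetical and are not taken from Spark's actual implementation.

```scala
// Hypothetical sketch of gateway -> cluster path substitution.
// Names, example paths and the prefix-matching rule are assumptions for illustration.
object GatewayPathSketch {
  def toClusterPath(
      path: String,
      gatewayPath: Option[String],
      replacementPath: Option[String]): String =
    (gatewayPath, replacementPath) match {
      // Rewrite only when both settings are present and the path is under the
      // gateway root (exact match or followed by a separator, so that
      // "/disk1/hadoop2" is not mistaken for a child of "/disk1/hadoop").
      case (Some(gw), Some(repl)) if path == gw || path.startsWith(gw + "/") =>
        repl + path.substring(gw.length)
      // Otherwise the path is assumed to be valid as-is on the cluster.
      case _ => path
    }

  def main(args: Array[String]): Unit = {
    val gw = Some("/disk1/hadoop")   // valid on the gateway host (illustrative)
    val repl = Some("$HADOOP_HOME")  // expanded on cluster nodes (illustrative)
    println(toClusterPath("/disk1/hadoop/lib/foo.jar", gw, repl))
    println(toClusterPath("/shared/libs/bar.jar", gw, repl))
  }
}
```

With the illustrative values above, the first call prints `$HADOOP_HOME/lib/foo.jar` and the second prints `/shared/libs/bar.jar` unchanged.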
   
   The new code follows the strategy used by the old code in `ExecutorRunnable` 
to perform `local` URI replacements as necessary. Since this code is shared 
between the driver and executors, it also fixes the bug described above. I also 
use the Java NIO APIs to perform the relative-to-absolute and path-to-URL 
conversions, instead of combining the Java `File` API with manual string 
manipulation to add a `file:` prefix.
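   A rough sketch of the difference between the two approaches, assuming a relative input such as a file in the working directory. The helper names `manualUrl` and `nioUrl` are hypothetical; only the `java.io.File`, `java.nio.file` and `java.net` calls are real JDK APIs.

```scala
import java.io.File
import java.net.URL
import java.nio.file.Paths

object NioConversionSketch {
  // Older style (illustrative): File API plus manual string concatenation.
  // Characters such as spaces are not escaped.
  def manualUrl(p: String): String = "file:" + new File(p).getAbsolutePath

  // NIO style: resolve against the current working directory, normalize
  // away "." and ".." segments, then let the JDK build a proper file: URI/URL.
  def nioUrl(p: String): URL =
    Paths.get(p).toAbsolutePath.normalize.toUri.toURL

  def main(args: Array[String]): Unit = {
    println(manualUrl("dir with space/foo.jar"))
    println(nioUrl("dir with space/foo.jar"))
  }
}
```

One practical difference: `Path.toUri` percent-encodes characters such as spaces (`%20`) and emits a well-formed `file:///...` URI, whereas the manual `"file:" + path` concatenation leaves them unescaped.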
   
   I added more tests in both `ClientSuite` and `YarnClusterSuite` covering the 
various options, and also ran a real job on a real YARN cluster that made use 
of:
   - `local` URI requiring gateway/replacement configs
   - `local` URI not requiring gateway/replacement configs
   - a relative file in the local working dir
   - a file on shared storage using the `hdfs` scheme, for good measure
   
   Everything works as expected with the latest diff (with the previous diff, 
only the 2nd through 4th cases would succeed).


-- 
This is an automated message from the Apache Git Service.