I'm having an issue submitting a Spark YARN job in cluster mode when the cluster filesystem is file:///. It seems that additional resources (--py-files) are simply skipped and never added to the PYTHONPATH. The same issue may also affect --jars, --files, etc.
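
For context, the submission looks something like this (the paths and file names here are placeholders, not my exact command):

    spark-submit --master yarn --deploy-mode cluster \
        --py-files /nfs/shared/deps.zip \
        /nfs/shared/job.py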
We use a simple NFS mount on all our nodes instead of HDFS. The problem is that when I submit a job with extra files (via --py-files), these are neither copied to the application's staging directory nor added to the PYTHONPATH. On startup I can clearly see the message "Source and destination file systems are the same. Not copying", which comes from the check here:
https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L221

The compareFs function only checks whether the scheme, host and port are the same, and if they are (as in my case) it skips the copy. Skipping the copy is fine in itself, but the PYTHONPATH is never updated either.
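
My understanding of that check is roughly the following (a simplified sketch of the comparison, not the actual upstream code, which also deals with null hosts and canonical host names):

    import java.net.URI
    import org.apache.hadoop.fs.FileSystem

    // Simplified sketch: two filesystems are treated as "the same" when
    // scheme, host and port all match. With file:/// on both sides this is
    // always true, so the copy to the staging directory is skipped, yet the
    // resource is then never added to the PYTHONPATH either.
    def roughlySameFs(srcFs: FileSystem, dstFs: FileSystem): Boolean = {
      val src: URI = srcFs.getUri
      val dst: URI = dstFs.getUri
      src.getScheme == dst.getScheme &&
        src.getHost == dst.getHost &&
        src.getPort == dst.getPort
    }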
