Thanks for the quick response and confirmation, Marcelo. I just opened https://issues.apache.org/jira/browse/SPARK-7725.
On Mon, May 18, 2015 at 9:02 PM, Marcelo Vanzin <[email protected]> wrote:

> Hi Shay,
>
> Yeah, that seems to be a bug; it doesn't seem to be related to the default
> FS nor compareFs either - I can reproduce this with HDFS when copying files
> from the local fs too. In yarn-client mode things seem to work.
>
> Could you file a bug to track this? If you don't have a jira account I can
> do that for you.
>
>
> On Mon, May 18, 2015 at 9:38 AM, Shay Rojansky <[email protected]> wrote:
>
>> I'm having issues with submitting a Spark Yarn job in cluster mode when
>> the cluster filesystem is file:///. It seems that additional resources
>> (--py-files) are simply being skipped and not being added to the
>> PYTHONPATH. The same issue may also exist for --jars, --files, etc.
>>
>> We use a simple NFS mount on all our nodes instead of HDFS. The problem
>> is that when I submit a job that has files (via --py-files), these don't
>> get copied across to the application's staging directory, nor do they get
>> added to the PYTHONPATH. On startup, I can clearly see the message "Source
>> and destination file systems are the same. Not copying", which is a result
>> of the check here:
>> https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L221
>>
>> The compareFs function simply checks whether the scheme, host and port are
>> the same, and if so (my case), skips the copy. While that in itself
>> isn't a problem, the PYTHONPATH isn't updated either.
>
> --
> Marcelo
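For anyone following the thread, the scheme/host/port comparison being discussed can be sketched roughly as below. This is a minimal Python approximation of what Client.compareFs is described as doing, not Spark's actual Scala code; the function name `same_fs` and the example URIs are made up for illustration.

```python
from urllib.parse import urlparse

def same_fs(src_uri: str, dst_uri: str) -> bool:
    """Sketch of the scheme/host/port check described in the thread
    (an approximation, not the real Client.compareFs)."""
    src, dst = urlparse(src_uri), urlparse(dst_uri)
    # Two URIs are treated as the "same" filesystem when their scheme,
    # host and port all match; in that case the file is not copied to
    # the staging directory.
    return (
        (src.scheme or "file") == (dst.scheme or "file")
        and (src.hostname or "") == (dst.hostname or "")
        and src.port == dst.port
    )

# With a file:/// cluster filesystem (e.g. an NFS mount), a local
# --py-files resource compares equal to the staging filesystem, so
# the copy is skipped:
print(same_fs("file:///home/user/dep.py", "file:///tmp/staging"))     # True
print(same_fs("file:///home/user/dep.py", "hdfs://nn:8020/staging"))  # False
```

As the thread notes, skipping the copy is fine on its own; the bug is that the skipped files are then never added to the PYTHONPATH either.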
