Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/303#issuecomment-39998654
The issue with yarn-cluster is the following: SparkPi.scala uses
SparkContext.jarOfClass() to define which jar to add to the SparkContext. This
ends up adding the path of the jar without the "local:" prefix, which means the
jar is expected to be in the distributed cache (as per the comment in
SparkContext: "In order for this to work in yarn-cluster mode the user must
specify the --addjars option").
If you add "-addJars
/home/tgraves/test2/tgravescs-spark/examples/target/scala-2.10/spark-examples_2.10-assembly-1.0.0-SNAPSHOT.jar"
to your command line it works (well, it works for me), but it sort of defeats
the purpose of using local: URIs. I had a modified SparkPi in my tree that
hardcoded a local: URI for the addJar() argument, and that worked fine without
needing to add the extra argument (and did not incur in extra copying of the
jar around).
I'm not sure there's an easy way to fix this (how can SparkPi know to add
the jar with a local: URI without some kind of command line argument telling it
to do so?), but it's caused by the client code (in this case, SparkPi), so I'm
less concerned.