How about this: https://github.com/apache/incubator-spark/pull/326
--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen


On Thu, Jan 2, 2014 at 11:07 PM, Matei Zaharia <[email protected]> wrote:

> I agree that it would be good to do it only once, if you can find a nice
> way of doing so.
>
> Matei
>
> On Jan 3, 2014, at 1:33 AM, Andrew Ash <[email protected]> wrote:
>
> In my spark-env.sh I append to the SPARK_CLASSPATH variable rather than
> overriding it, because I want to support both adding a jar to all instances
> of a shell (in spark-env.sh) and adding a jar to a single shell instance
> (SPARK_CLASSPATH=/path/to/my.jar /path/to/spark-shell).
>
> That looks like this:
>
> # spark-env.sh
> export SPARK_CLASSPATH+=":/path/to/hadoop-lzo.jar"
>
> However when my Master and workers run, they have duplicates of the
> SPARK_CLASSPATH jars. There are 3 copies of hadoop-lzo on the classpath, 2
> of which are unnecessary.
>
> The resulting command line in ps looks like this:
>
> /path/to/java -cp
> :/path/to/hadoop-lzo.jar:/path/to/hadoop-lzo.jar:/path/to/hadoop-lzo.jar:[core
> spark jars] ... -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker
> spark://my-host:7077
>
> I tracked it down and the problem is that spark-env.sh is sourced 3 times:
> in spark-daemon.sh, in compute-classpath.sh, and in spark-class. Each of
> those adds to the SPARK_CLASSPATH until its contents are in triplicate.
>
> Are all of those calls necessary? Is it possible to edit the daemon
> scripts to only call spark-env.sh once?
>
> FYI I'm starting the daemons with ./bin/start-master.sh and
> ./bin/start-slave.sh 1 $SPARK_URL
>
> Thanks,
> Andrew
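
In the meantime, one possible stop-gap (a sketch only, independent of whatever the linked PR changes) is to make the append in spark-env.sh idempotent, so that sourcing the file from spark-daemon.sh, compute-classpath.sh, and spark-class adds the jar at most once. The hadoop-lzo path below is just the example jar from Andrew's message:

    # spark-env.sh -- guard the append so repeated sourcing cannot duplicate the entry
    HADOOP_LZO_JAR="/path/to/hadoop-lzo.jar"   # example path from the thread; substitute your own

    case ":${SPARK_CLASSPATH}:" in
      *":${HADOOP_LZO_JAR}:"*)
        # jar already on the classpath; leave SPARK_CLASSPATH unchanged
        ;;
      *)
        export SPARK_CLASSPATH="${SPARK_CLASSPATH}:${HADOOP_LZO_JAR}"
        ;;
    esac

With a guard like that, the worker command line in ps should show a single copy of hadoop-lzo even though spark-env.sh is still sourced three times.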
