I agree that it would be good to do it only once, if you can find a nice way of 
doing so.

Matei
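
One "nice way" could be a guard variable that is checked before sourcing, so that the second and third source calls become no-ops. A minimal sketch, assuming a hypothetical shared helper and a made-up SPARK_ENV_LOADED flag (not the actual scripts):

# hypothetical helper, sourced by spark-daemon.sh, compute-classpath.sh,
# and spark-class in place of sourcing spark-env.sh directly
if [ -z "$SPARK_ENV_LOADED" ]; then
  export SPARK_ENV_LOADED=1              # mark spark-env.sh as already loaded
  if [ -f "$SPARK_HOME/conf/spark-env.sh" ]; then
    . "$SPARK_HOME/conf/spark-env.sh"    # the real sourcing happens at most once
  fi
fi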

On Jan 3, 2014, at 1:33 AM, Andrew Ash <[email protected]> wrote:

> In my spark-env.sh I append to the SPARK_CLASSPATH variable rather than
> overriding it, because I want to support both adding a jar to every shell
> instance (via spark-env.sh) and adding a jar to a single shell instance
> (SPARK_CLASSPATH=/path/to/my.jar /path/to/spark-shell).
> 
> That looks like this:
> 
> # spark-env.sh
> export SPARK_CLASSPATH+=":/path/to/hadoop-lzo.jar"
> 
> However, when my Master and workers run, they end up with duplicates of the
> SPARK_CLASSPATH jars: there are three copies of hadoop-lzo on the classpath,
> two of which are unnecessary.
> 
> The resulting command line in ps looks like this:
> /path/to/java -cp
>   :/path/to/hadoop-lzo.jar:/path/to/hadoop-lzo.jar:/path/to/hadoop-lzo.jar:[core spark jars]
>   ... -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://my-host:7077
> 
> I tracked it down, and the problem is that spark-env.sh is sourced three times: in
> spark-daemon.sh, in compute-classpath.sh, and in spark-class.  Each of those
> appends to SPARK_CLASSPATH, so its contents end up in triplicate.
> 
> Are all of those calls necessary?  Is it possible to edit the daemon scripts
> so that spark-env.sh is sourced only once?
> 
> FYI, I'm starting the daemons with ./bin/start-master.sh and
> ./bin/start-slave.sh 1 $SPARK_URL
> 
> Thanks,
> Andrew
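
Until the scripts are changed, the append itself can also be made idempotent, so that sourcing spark-env.sh repeatedly is harmless. A minimal sketch of that workaround, reusing the hadoop-lzo path from the quoted message (the case-statement guard is an assumption, not something from the thread):

# spark-env.sh -- append hadoop-lzo only if it is not already on the classpath
LZO_JAR="/path/to/hadoop-lzo.jar"
case ":$SPARK_CLASSPATH:" in
  *":$LZO_JAR:"*) ;;                                        # already present, do nothing
  *) export SPARK_CLASSPATH="$SPARK_CLASSPATH:$LZO_JAR" ;;  # first time through: append
esac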
