Hi, all,

I am just setting up to run Spark in standalone mode, as a (Univa) Grid
Engine job. I have been able to set up the appropriate environment
variables such that the master launches correctly, etc. In my setup, I
generate GE job-specific conf and log dirs.
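
For concreteness, a minimal sketch of the kind of per-job setup I mean, not my exact script. JOB_ID is set by Grid Engine inside a running job. SPARK_JOB_ROOT is a hypothetical variable made up for this example, and the fallbacks to TMPDIR and the shell PID are assumptions; in practice the root would need to be on a filesystem that every node can see.

```shell
#!/bin/sh
# Sketch only: per-job Spark directories for a Grid Engine job.
# SPARK_JOB_ROOT is a hypothetical name; the TMPDIR and /tmp fallbacks
# are assumptions so the snippet runs outside a real job too.
job_root="${SPARK_JOB_ROOT:-${TMPDIR:-/tmp}}"
job_id="${JOB_ID:-$$}"            # JOB_ID comes from Grid Engine
job_base="${job_root}/spark-job-${job_id}"

# Point Spark's conf, log and work locations at the per-job tree
# instead of the defaults under $SPARK_HOME.
export SPARK_CONF_DIR="${job_base}/conf"
export SPARK_LOG_DIR="${job_base}/logs"
export SPARK_WORKER_DIR="${job_base}/work"

mkdir -p "$SPARK_CONF_DIR" "$SPARK_LOG_DIR" "$SPARK_WORKER_DIR"
echo "per-job Spark dirs under: ${job_base}"
```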

However, I am finding that the SPARK_* environment variables are not passed
to the worker processes on different physical nodes since they are launched
via ssh. I have added echo commands to the sbin/start-slaves.sh and
sbin/slaves.sh scripts and verified that they have the appropriate SPARK_*
environment variables set.

Since I have a global installation of Spark, I would prefer that Spark jobs
not all write to $SPARK_HOME/work and $SPARK_HOME/logs.

I know that ssh can read environment variables from ~/.ssh/environment, but
that is a single per-user file, so if a user qsubs two different Spark jobs,
both jobs would pick up the same settings.
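
To make the collision concrete, here is a small sketch under stated assumptions: a temporary file stands in for ~/.ssh/environment so nothing real is touched, and the job ids 101 and 102 and the /scratch paths are hypothetical.

```shell
#!/bin/sh
# Sketch only: a temp file plays the role of the per-user
# ~/.ssh/environment. Job ids and paths below are hypothetical.
env_file="$(mktemp)"

# Job 101 records its log directory...
echo "SPARK_LOG_DIR=/scratch/spark-job-101/logs" > "$env_file"
# ...then job 102, submitted by the same user, overwrites the same file.
echo "SPARK_LOG_DIR=/scratch/spark-job-102/logs" > "$env_file"

# A worker that job 101 starts over ssh after this point would read
# whatever the file holds now, which is job 102's value.
seen_by_job_101="$(sed -n 's/^SPARK_LOG_DIR=//p' "$env_file")"
echo "job 101 workers would log to: ${seen_by_job_101}"

rm -f "$env_file"
```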

I appreciate any suggestions.

Thanks,
    Dave Chin
