Greetings,
Thanks for the suggestion. I tried this and noted two things. The first
is that one has to prepend `oozie.launcher.` to the parameter in order
for it to have an effect on the actual environment of the script. The
second is that when I did this, the Python script exited claiming it
couldn't find the module pyspark.sql.types, which leads me to believe
that `mapred.map.child.env` is used underneath to pass some other
environment variables, and that I overwrote those when I manually set it
to a particular set of k=v pairs. I don't know if this is actually the
case, though; I'm just inferring it from the observed behavior.
In the end I managed to get the ${wf:id()} result by reading the
SparkConf object inside the SparkContext that Oozie provides for the
spark action. I noticed in the stdout log that when the script is run,
one of the command-line parameters passed to spark-submit is actually
`--conf spark.oozie.job.id=${wf:id()}`. So in the end I was lucky, and
wf_id = sc._conf.get("spark.oozie.job.id") did the trick for me from
within the script. However, I'd still like to find a way of doing it as
I originally intended, which is by being able to access some environment
variable set from the XML workflow definition.
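For reference, the workaround above can be sketched like this (a minimal sketch; the helper name is mine, and sc._conf is private API, so SparkContext.getConf() is the safer public equivalent):

```python
# Hedged sketch: retrieve the Oozie workflow id from the Spark configuration.
# Oozie's spark action passes `--conf spark.oozie.job.id=<wf id>` to
# spark-submit, so the value is visible on the SparkConf. The helper takes any
# object with a .get(key, default) method, so it also works against a plain
# dict for testing outside a cluster.

def oozie_workflow_id(conf):
    """Return the workflow id Oozie injected, or None if absent."""
    return conf.get("spark.oozie.job.id", None)

# Inside a real job you would call it on the active context's configuration:
#   from pyspark import SparkContext
#   sc = SparkContext.getOrCreate()
#   wf_id = oozie_workflow_id(sc.getConf())  # or sc._conf, as in the thread
```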
On 05/14/2018 05:58 AM, Peter Cseh wrote:
Hi!
There is no easy and straightforward way of doing this for the Spark
action, but you can take advantage of the fact that Oozie 4.1.0 uses
MapReduce to launch Spark.
Just put "mapred.map.child.env" in the action configuration using the
format k1=v1,k2=v2. EL functions should also work here.
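Sketching this suggestion as workflow XML, it might look as follows (an assumption on my part that the property belongs in the action's <configuration> block; and, per the follow-up at the top of the thread, the oozie.launcher. prefix is needed, and setting the value wholesale may clobber variables Oozie itself passes this way):

```xml
<spark xmlns="uri:oozie:spark-action:0.1">
    <!-- ... job-tracker, name-node, master, jar, etc. ... -->
    <configuration>
        <property>
            <!-- oozie.launcher. prefix so it reaches the launcher job -->
            <name>oozie.launcher.mapred.map.child.env</name>
            <value>OOZIE_WORKFLOW_ID=${wf:id()}</value>
        </property>
    </configuration>
</spark>
```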
Gp
On Thu, May 10, 2018 at 6:39 PM, Richard Primera <
richard.prim...@woombatcg.com> wrote:
Greetings,
How can I set an environment variable to be accessible from either a .jar
or .py script launched via a spark action?
The idea is to set the environment variable with the output of the EL
function ${wf:id()} from within the XML workflow definition, something
along these lines:
<jar>script.py</jar>
<env>OOZIE_WORKFLOW_ID=${wf:id()}</env>
And then have the ability to do wf_id = os.getenv("OOZIE_WORKFLOW_ID")
from the script without having to pass it as a command-line argument. The
problem with command-line arguments is that they don't scale as well,
since they rely on a specific ordering or some custom parsing
implementation.
This seems easy to do with a shell action, but I've been unable to find a
similarly straightforward way of doing it for a spark action.
Oozie Version: 4.1.0-cdh5.12.1
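For completeness, the intended consumer side of this would be nothing more than the following (OOZIE_WORKFLOW_ID is the hypothetical variable name from the <env> example above; whether it is actually set depends on the launcher exporting it for the spark action):

```python
import os

# Hypothetical variable name, matching the <env> example in the question.
# os.getenv returns None when the variable is not present in the environment,
# so the script can detect the case where the launcher did not export it.
wf_id = os.getenv("OOZIE_WORKFLOW_ID")
if wf_id is None:
    print("OOZIE_WORKFLOW_ID not set; not running under the expected launcher")
```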