Greetings,

Thanks for the suggestion. I tried this and noticed two things. First, one has to prepend `oozie.launcher.` to the property name in order for it to have an effect on the actual environment of the script. Second, when I did this the Python script exited claiming it couldn't find the module pyspark.sql.types, which leads me to believe that `mapred.map.child.env` is already being used underneath to pass some other environment variables (presumably including PYTHONPATH, given the missing pyspark module), and that I was overwriting those when I manually set it to my own set of k=v pairs. I don't know if this is actually the case though; I'm only inferring it from the observed behavior.
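For reference, this is roughly what the property looked like in the spark action's configuration once I added the `oozie.launcher.` prefix (the variable name is just the one from my original example; the rest of the action is elided):

    <spark xmlns="uri:oozie:spark-action:0.1">
        ...
        <configuration>
            <property>
                <name>oozie.launcher.mapred.map.child.env</name>
                <value>OOZIE_WORKFLOW_ID=${wf:id()}</value>
            </property>
        </configuration>
        ...
    </spark>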

In the end I managed to get the ${wf:id()} value by appealing to the SparkConf object inside the SparkContext that Oozie provides for the spark action. I noticed in the stdout log that when the script is run, one of the command-line parameters given to spark-submit is actually `--conf spark.oozie.job.id=${wf:id()}`. So in the end I was lucky, and wf_id = sc._conf.get("spark.oozie.job.id") did the trick for me from within the script. However, I'd still like to find a way of doing it as I originally intended, which is by accessing an environment variable set from the XML definition.
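For anyone who runs into the same thing, here's a minimal sketch of that workaround from inside a PySpark script. Note that sc._conf is a private attribute, so this leans on an implementation detail; the property name is the one I saw in the spark-submit command line in the stdout log:

    from pyspark import SparkContext

    sc = SparkContext(appName="example")  # app name here is arbitrary

    # Oozie's launcher passes --conf spark.oozie.job.id=<workflow id> to
    # spark-submit, so the value can be read back from the SparkConf.
    wf_id = sc._conf.get("spark.oozie.job.id")
    print("Running under Oozie workflow %s" % wf_id)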


On 05/14/2018 05:58 AM, Peter Cseh wrote:
Hi!

There is no easy and straightforward way of doing this for the Spark
action, but you can take advantage of the fact that Oozie 4.1.0 uses
MapReduce to launch Spark.
Just put "mapred.map.child.env" in the action configuration using the
format k1=v1,k2=v2. EL functions should also work here.

Gp


On Thu, May 10, 2018 at 6:39 PM, Richard Primera <richard.prim...@woombatcg.com> wrote:

Greetings,

How can I set an environment variable to be accessible from either a .jar
or .py script launched via a spark action?

The idea is to set the environment variable with the output of the EL
function ${wf:id()} from within the XML workflow definition, something
along these lines:

    <jar>script.py</jar>
    <env>OOZIE_WORKFLOW_ID=${wf:id()}</env>

And then have the ability to do wf_id = os.getenv("OOZIE_WORKFLOW_ID") from
the script, without having to pass it as a command-line argument. The thing
about command-line arguments is that they don't scale as well, because they
rely on a specific ordering or on some custom parsing implementation. It
seems this can be done easily with a shell action (see the sketch below),
but I've been unable to find a similarly straightforward way of doing it
for a spark action.
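
For comparison, if I understand the shell action schema correctly, it has an
<env-var> element for exactly this purpose, something along these lines
(exec name made up, rest of the action elided):

    <shell xmlns="uri:oozie:shell-action:0.2">
        ...
        <exec>script.sh</exec>
        <env-var>OOZIE_WORKFLOW_ID=${wf:id()}</env-var>
        ...
    </shell>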

Oozie Version: 4.1.0-cdh5.12.1



