Hi Richard! I'm happy you've found a workaround for your issue.
Yes, "SPARK_HOME" is set to the current working directory in SparkActionExecutor <https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/action/hadoop/SparkActionExecutor.java#L111> Based on the surrounding code, the It should be merged with user-defined environment properties though, so the behavior you're experiencing sounds like a bug to me. The issue might be that SparkActionExecutor uses "mapred.child.env" <https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/action/hadoop/SparkActionExecutor.java#L47> and not mapred.map.child.env and MapReduce overwrites one with the other instead of merging them. gp On Wed, May 16, 2018 at 11:11 PM, Richard Primera < richard.prim...@woombatcg.com> wrote: > Greetings, > > Thanks for the suggestion. I tried this and noted two things. The first is > that one has to prepend `oozie.launcher` to the parameter in order for it > to have an effect over the actual environment of the script. The second, is > that when I did this the python script exited claiming it couldn't find the > module pyspark.sql.types, which leads me to believe that > `mapred.map.child.env` is being used underneath in order to pass some other > environment variables, and being overwritten by me when I manually set it > to a particular set of k=v pairs. I don't know if this is the case though, > just concluding it off the observed behavior. > > In the end I managed to get the ${wf:id()} result by appealing to the > SparkConf object inside the SparkContext provided by Oozie for the spark > action. I noticed in the stdout log that when the script is run, one of the > command line parameters given to spark-submit is actually `--conf > spark.oozie.job.id=${wf:id()}. So in the end, I was lucky and wf_id = > sc._conf.get("spark.oozie.job.id") did the trick for me from within the > script. However, I'd still like to find a way of doing it as I originally > intended, which is by being able to access some environment variable set > from the XML definition. > > > > On 05/14/2018 05:58 AM, Peter Cseh wrote: > >> Hi! >> >> There is no easy and straightforward way of doing this for the Spark >> action, but you can take advantage of the fact that Oozie 4.1.0 uses >> MapReduce to launch Spark. >> Just put "mapred.map.child.env" in the action configuration using the >> format k1=v1,k2=v2. EL functions should also work here. >> >> Gp >> >> >> On Thu, May 10, 2018 at 6:39 PM, Richard Primera < >> richard.prim...@woombatcg.com> wrote: >> >> Greetings, >>> >>> How can I set an environment variable to be accessible from either a .jar >>> or .py script launched via a spark action? >>> >>> The idea is to set the environment variable with the output of the EL >>> function ${wf:id()} from within the XML workflow definition, something >>> along these lines: >>> >>> <jar>script.py</jar> >>> >>> <env>OOZIE_WORKFLOW_ID=${wf:id()}</env> >>> >>> And then have the ability to do wf_id = os.getenv("OOZIE_WORKFLOW_ID") >>> from the script without having to pass them as command line arguments. >>> The >>> thing about command line arguments is that they don't scale as well >>> because >>> they rely on a specific ordering or some custom parsing implementation. >>> This can be done easily it seems with a shell action, but I've been >>> unable >>> to find a similar straightforward way of doing it for a spark action. 
--
*Peter Cseh* | Software Engineer
cloudera.com <https://www.cloudera.com>