Hi,

The Java action is pretty useful for running "driver" programs for MapReduce, i.e. Java code that configures and submits a MapReduce job. I'm not super familiar with how to submit Spark jobs, but I imagine you can write a similar driver for a Spark job and give it to the Java action. You'd have to make the necessary Spark jars available on the action's classpath. Oozie has a number of ways to do that, but the easiest is to put them in a directory named "lib" next to your workflow.xml. Other than including the jars in some way, nothing "special" should be needed :)
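
For what it's worth, here is a rough sketch of the shape such a driver could take. It is not a working Spark submission: the class name, the two arguments, and the submitJob placeholder are all made up for illustration, and the actual Spark setup is left out since it depends on your Spark version. The part that should hold is the contract with the Java action: Oozie calls main(), treats a normal return as success and an uncaught exception as an error, and the driver should not call System.exit().

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical driver for an Oozie Java action. For real use, declare the class
// public in its own file so Oozie can invoke main(); it is package-private here
// only to keep the sketch in a single snippet.
final class SparkDriverSketch {

    // Turns "<inputPath> <outputPath>" into named settings.
    // The argument names are assumptions made for this example.
    static Map<String, String> parseArgs(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "usage: SparkDriverSketch <inputPath> <outputPath>");
        }
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("input", args[0]);
        settings.put("output", args[1]);
        return settings;
    }

    // Placeholder only: a real driver would configure and submit the Spark job
    // here, relying on the Spark jars placed in the workflow's lib/ directory.
    static void submitJob(Map<String, String> settings) {
        System.out.println("Would submit job with " + settings);
    }

    public static void main(String[] args) {
        // A normal return signals success to Oozie; an uncaught exception
        // signals failure. Do not call System.exit() from the driver.
        submitJob(parseArgs(args));
    }
}
```

In workflow.xml you would name the class in the Java action's <main-class> element and pass the two paths as <arg> elements.
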
In the long run, a Spark action would be a nice convenience for users, especially since Spark is becoming more popular. Then Oozie could have a Spark sharelib with the necessary jar files and handle that automatically.

(FYI: almost all of the action types are actually subclasses of the Java action, where Oozie provides the driver and some integration logic to make things easier for the user.)

On Wed, Apr 9, 2014 at 4:59 PM, Segerlind, Nathan L <[email protected]> wrote:
> Hi All.
>
> Is it possible to incorporate spark jobs into Oozie workflows? I've heard
> that it is possible to do this as a Java action, but I've not seen an
> example. If it is possible, does it the size of the workflow application
> zip file - in particular would all the spark jars have to be included with
> the workflow or could they be distributed about the cluster already? More
> generally, does anything "special" have to be done to integrate Spark jobs
> into Oozie?
>
> Thanks,
> Nate
