That is an interesting use case, Som! Oozie is tightly coupled with Hadoop
right now. You might be able to write a custom ActionExecutor class and
plug it in to run your Step2 as a Spark job against S3, but the first step
in "launching" that action is still going to be a Hadoop map task started
by Oozie. There would also be a problem fetching secure delegation tokens
from S3, which the launcher needs in order to launch that action.

--Mona

On 6/19/13 1:44 PM, "Som Satpathy" <[email protected]> wrote:

>Does anybody have anything to share here?
>
>Thanks,
>Som
>
>On Mon, Jun 17, 2013 at 3:25 PM, Som Satpathy <[email protected]>
>wrote:
>
>> Hi,
>>
>> I had a question regarding a use case where a user might want to design
>> and execute workflows on YARN/Mesos which runs Hadoop and Spark in
>> parallel. Does Oozie currently support, or will it support in the
>> future, actions that can execute in a framework-agnostic manner?
>>
>> For simplicity let's consider an example - a user defines a workflow as
>> follows:
>>
>> Step1: MapReduce job on Hadoop to do some initial data processing
>> (Input: data from S3, output: data into S3)
>> Step2: Spark job to further process the data (Input: data from S3,
>> output: data into S3)
>>
>> The assumption here is that both Hadoop and Spark are running in the
>> same cluster managed by YARN/Mesos.
>>
>> Can Oozie support such a use case today?
>>
>> Thanks,
>> Som
>>
