Hi, Harshal
Based on the docs, a coordinator job instance materialized each hour can take
multiple datasets as input events, e.g.:
<coordinator-app start="..." end="..." frequency="60" ...>
<datasets>
  <dataset name="logs" />
  <dataset name="siteAccessStats"/>
</datasets>
..
<input-events>
   <data-in name="input1" ... />
   <data-in name="input2" ... />
...
</input-events>
<action>
        <workflow>
          <app-path />
          <configuration>
            <property>
              <name>wfInput1</name>
              <value>${coord:dataIn('input1')}</value>
            </property>
            <property>
              <name>wfInput2</name>
              <value>${coord:dataIn('input2')}</value>
            </property>
...
In the action workflow, you could then check each dataset's readiness and
handle it accordingly. I haven't tried a case this complicated in my own
project yet, so please feel free to let me know if I'm wrong.

-paul



On Tue, Jul 22, 2014 at 9:50 PM, Harshal Vora <[email protected]> wrote:

> Hi,
>
> Thanks for the reply Paul.
> Although this solution works, the issue is that you end up writing your
> own data dependency logic (i.e. to check that data sets required for each
> POI are processed based on the time zone of the POI).
> Data dependency handling is one of the major features Oozie provides
> compared to other schedulers.
>
> Any thoughts on this?
>
> Regards,
>
>
> On 07/20/2014 04:27 AM, Paul Han wrote:
>
>> You could schedule a coordinator job to "wake up" every hour, or at whatever
>> interval (>= 5 mins?), to process the POIs that are ready.
>>
>> Beyond that, I think this is one of Oozie's limits. It's not obvious
>> how to deal with variable sets of data out of the box.
>>
>> I would process it with one "master" workflow, probably with the help of a
>> java action.
>>
>> Thanks,
>> Paul
>>
>>  On Jul 18, 2014, at 22:41, Harshal Vora <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Any ideas on this?
>>>
>>> Regards,
>>>
>>>  On 07/17/2014 10:37 AM, Harshal Vora wrote:
>>>> Hi,
>>>>
>>>> We are in a situation where we want to crunch data on a daily basis for
>>>> a set of Points of Interest (POIs).
>>>> The issue is that these POIs have different opening times and, even worse,
>>>> different closing times; some even go beyond midnight.
>>>>
>>>> Also, they are in different time zones.
>>>> Clearly, one coordinator job that runs at midnight will not
>>>> meet the requirement.
>>>> Nor is it feasible to submit and maintain a separate coordinator job for
>>>> each POI.
>>>>
>>>> Is there a better way to tackle this?
>>>>
>>>> Thanks,
>>>> Regards,
>>>>
>>>
>
