I have three workflows that I wish to coordinate:

* WF-A partitions a single input into multiple outputs.
* WF-B aggregates the partitions of all WF-A workflows that exist at the time it is run.
* WF-C processes a single aggregate partition created by WF-B.

There are some more constraints on this system:

* WF-A is started by an external process. Its start time is random. Each WF-A is independent of the others.
* WF-B cannot run concurrently with another WF-B.
* Each WF-C is independent of the others, except that no two WF-C can process the same partition simultaneously, and once a WF-C succeeds, no other WF-C will reprocess its data.
* The entire system should be driven by the external process that launches WF-A (i.e., there is no clock in this system).

I feel like this system may be expressible with Oozie using coordinators (and perhaps bundles) and some custom MapReduce actions. However, I would appreciate some thoughts on how I might construct this, as it isn't completely clear to me how to proceed.

Thanks,
Chris
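
P.S. To make the constraints above concrete, here is a minimal in-memory sketch in Python of the bookkeeping I have in mind. It is not Oozie code and does not use any Oozie API. The class name `Coordinator` and all of its method names are hypothetical, chosen only for illustration, and a real system would presumably keep this state somewhere durable (for example, marker files in HDFS) rather than in process memory.

```python
import threading


class Coordinator:
    """Hypothetical in-memory model of the WF-A / WF-B / WF-C constraints."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []         # WF-A outputs not yet aggregated by a WF-B
        self._b_running = False    # True while a WF-B is in flight
        self._in_progress = set()  # aggregate partitions claimed by a WF-C
        self._done = set()         # aggregate partitions processed successfully

    def wf_a_finished(self, outputs):
        """Record the outputs of one WF-A (called by the external process)."""
        with self._lock:
            self._pending.extend(outputs)

    def try_start_wf_b(self):
        """Return the batch WF-B should aggregate, or None if it must not start."""
        with self._lock:
            if self._b_running or not self._pending:
                return None
            self._b_running = True
            batch, self._pending = self._pending, []
            return batch

    def wf_b_finished(self):
        """Allow the next WF-B to start."""
        with self._lock:
            self._b_running = False

    def try_claim(self, partition):
        """Claim an aggregate partition for a WF-C; False if busy or already done."""
        with self._lock:
            if partition in self._done or partition in self._in_progress:
                return False
            self._in_progress.add(partition)
            return True

    def release(self, partition, success):
        """Release a claim; a successful partition is never handed out again."""
        with self._lock:
            self._in_progress.discard(partition)
            if success:
                self._done.add(partition)
```

Everything here is driven by calls from the external process (or by workflow completion callbacks), so no clock is involved: a WF-B attempt would follow each `wf_a_finished`, and a failed WF-C simply releases its claim so that a later WF-C can retry the partition.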
