On 9 Sep 2014, at 21:04, Paul Chavez wrote:
If a workflow takes longer than the coordination interval to execute, the next workflow instance will be created and put into 'WAITING' state by default. There are concurrency settings that allow more than one workflow to execute at a time. Since you will be moving data into a processing directory, you can run with concurrency greater than one, as long as the processing directory is unique per workflow instance.
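The settings Paul mentions live in the coordinator's `<controls>` block. A minimal sketch of an Oozie coordinator definition follows; the app name, dates, and the values chosen for timeout and concurrency are illustrative, not recommendations:

```xml
<coordinator-app name="data-ingest" frequency="${coord:days(1)}"
                 start="2014-09-01T00:00Z" end="2015-09-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <controls>
    <!-- Minutes a materialized action may sit in WAITING before being
         discarded; -1 (the default) means wait forever. -->
    <timeout>60</timeout>
    <!-- How many actions may run concurrently; the default is 1,
         which is what causes the pileup of WAITING instances. -->
    <concurrency>3</concurrency>
    <!-- Order in which queued actions are started. -->
    <execution>FIFO</execution>
  </controls>
  <action>
    <workflow>
      <app-path>${workflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```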
This is what I did when I had to do something similar:
- I used dated directories in HDFS (created through WebHDFS from the script pushing the data).
- I set the action to time out, so if it didn't find the expected file there after a while (the file name is always the same), it would not stay in WAITING state forever.
- I set concurrency to a few instances so there's no pileup.

To be honest, I don't remember how I'm doing cleanup of the created dirs, but you get the idea :)

D.Morel
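The first step above (creating a dated directory over WebHDFS from the pushing script) boils down to issuing an HTTP PUT with `op=MKDIRS`. A small sketch that just builds the request URL, assuming a hypothetical namenode host, the default WebHDFS port 50070, and an illustrative user name:

```python
from datetime import date

def webhdfs_mkdirs_url(namenode, base_path, run_date, user):
    """Build the WebHDFS MKDIRS URL for a per-run dated directory.

    namenode, base_path and user are assumptions for illustration;
    op=MKDIRS is the standard WebHDFS operation for mkdir -p.
    """
    dated_dir = "{}/{}".format(base_path, run_date.isoformat())
    return ("http://{}:50070/webhdfs/v1{}?op=MKDIRS&user.name={}"
            .format(namenode, dated_dir, user))

url = webhdfs_mkdirs_url("namenode.example.com", "/data/incoming",
                         date(2014, 9, 9), "etl")
print(url)
```

The pushing script would then issue `curl -X PUT "$url"` (or the equivalent with an HTTP library) before uploading the file into that directory, so each coordinator run gets its own unique processing directory.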
