On 9 Sep 2014, at 21:04, Paul Chavez wrote:

If a workflow takes longer than the coordination interval to execute, then the new workflow instance will be created and put into the 'Waiting' state by default. There are concurrency settings that allow more than one workflow to execute at a time. Since you will be moving data into a processing directory, you can run with concurrency greater than one as long as the processing directory is unique per workflow instance.
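As a sketch, those settings live in the coordinator's <controls> block; the app name, frequency, and dates below are placeholder assumptions, not values from this thread:

```xml
<!-- Hypothetical coordinator fragment; name, frequency, and dates are placeholders -->
<coordinator-app name="ingest-coord" frequency="60"
                 start="2014-09-09T00:00Z" end="2015-09-09T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <controls>
    <timeout>60</timeout>        <!-- minutes an action may wait before timing out -->
    <concurrency>3</concurrency> <!-- allow up to 3 instances to run at once -->
  </controls>
  ...
</coordinator-app>
```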

This is what I did when I had to do something similar:
- I used dated directories in HDFS (created through webhdfs from the
  script pushing the data)
- set the action to time out, so that if it didn't find the expected file
  there after a while (the file name is always the same), it would not
  stay in the WAITING state forever.
- set concurrency to a few instances so there's no pileup
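For the dated directories, a minimal sketch of what the pushing script could do over WebHDFS (the NameNode host/port, user name, and base path are assumptions for illustration):

```shell
# Build a dated HDFS path, e.g. /data/incoming/2014/09/09 (base path is hypothetical)
DATED_DIR="/data/incoming/$(date +%Y/%m/%d)"

# WebHDFS MKDIRS request; namenode host, port, and user.name are placeholders
URL="http://namenode:50070/webhdfs/v1${DATED_DIR}?op=MKDIRS&user.name=etl"
echo "$URL"
# Uncomment to actually create the directory:
# curl -X PUT "$URL"
```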

To be honest, I don't remember how I'm doing cleanup of the created
dirs, but you get the idea :)

D.Morel
