Hi all, as a follow-up to the proposal on enhanced workflow management from last October, here are some ideas on implementing it.
Current situation: Actions in a workflow are either marked as autorun or end up in a state that expects user interaction and are routed to frontend tasks (e.g. approval steps). There is no way to pause and resume a workflow or to recover from a temporary failure.
Proposed enhancements: All activities should be able to catch foreseeable failures (e.g. unavailable tokens or network-connected resources) and provide retry logic if the failure is expected to be temporary. Instead of building an obscure internal sleep/loop construct, the activity should be able to tell the workflow engine "please execute me again in 5 minutes".
The base class OpenXPKI::Server::Workflow::Activity is extended with a helper method "pause" that can be called from within the execute method of each activity. By default, its parameters are taken from the XML configuration, as parameters in the activity definition:
* retry_count - max. number of retries before we finally fail
* retry_interval - time to wait between retries (OpenXPKI DateTime string)
The method should accept a single parameter which overrides the pause interval. There should be setters to manipulate both values from within the activity.
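To make the intended behaviour of "pause" concrete, here is a minimal sketch in Python (not the Perl base class itself). Everything apart from retry_count, retry_interval and pause is an assumption made for illustration: the PauseRequest and RetriesExhausted exceptions, the attempts counter (which a real implementation would have to persist with the workflow), and the use of a timedelta in place of an OpenXPKI DateTime string.

```python
from datetime import datetime, timedelta


class PauseRequest(Exception):
    """Hypothetical signal telling the engine to run the activity again later."""

    def __init__(self, resume_at):
        super().__init__("resume at " + resume_at.isoformat())
        self.resume_at = resume_at


class RetriesExhausted(Exception):
    """Hypothetical error raised once retry_count pauses have been used."""


class Activity:
    def __init__(self, retry_count=3, retry_interval=timedelta(minutes=5)):
        # In the proposal these defaults come from the activity definition
        # in the XML configuration; here they are plain arguments.
        self.retry_count = retry_count
        self.retry_interval = retry_interval
        self.attempts = 0  # assumption: would be stored with the workflow

    def set_retry_count(self, count):
        self.retry_count = count

    def set_retry_interval(self, interval):
        self.retry_interval = interval

    def pause(self, interval=None, now=None):
        """Ask the engine to re-run this activity after the given interval."""
        if self.attempts >= self.retry_count:
            raise RetriesExhausted("gave up after %d retries" % self.attempts)
        self.attempts += 1
        if interval is None:
            interval = self.retry_interval
        if now is None:
            now = datetime.now()
        raise PauseRequest(now + interval)
```

An execute method would call self.pause() when it detects a temporary failure; the engine would catch the signal and store its resume_at value with the workflow.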
To wake up paused workflows, we need a new process inside the OpenXPKI server daemon; I will call it "the watcher". The watcher is forked off once at the start of the server process as a standalone thread, which looks for resumable workflows once a minute. Paused workflows are identified by a new field "resume_at" in the "workflow" table. The field is a timestamp which is non-zero for paused workflows. In addition, we introduce another field "processing status", which gives a verbose indication of what the workflow engine is doing with this workflow:
* running - real Perl process/activity is executed
* paused - waiting to be resumed
* manual - regular stop in a non-autorun state
* finished - reached the last state (usually SUCCESS)
The watcher can also take care of aborted/crashed workflows. A first, simple approach would be: Add another timestamp field "reap_at" to the workflow table and set it to "now + 5 minutes" on every start of an activity. If an activity knows that it will regularly take longer than 5 minutes, this value must be increased or adjusted at runtime. If the current time exceeds the time in "reap_at", the process can be considered dead and the watcher tries to run the last action again.
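A single watcher pass over the table could look roughly like the following Python sketch using an in-memory SQLite database. The column names workflow_id and proc_state, the integer epoch timestamps and the function names are assumptions for illustration only; the proposal fixes just the "resume_at" and "reap_at" fields and the four status values.

```python
import sqlite3


def create_schema(conn):
    # Assumed, simplified layout of the "workflow" table.
    conn.execute(
        """CREATE TABLE workflow (
               workflow_id INTEGER PRIMARY KEY,
               proc_state  TEXT NOT NULL,
               resume_at   INTEGER NOT NULL DEFAULT 0,
               reap_at     INTEGER NOT NULL DEFAULT 0
           )"""
    )


def scan(conn, now):
    """One watcher pass: return (ids to resume, ids to reap)."""
    # Paused workflows whose wake-up time has been reached.
    to_resume = [
        row[0]
        for row in conn.execute(
            "SELECT workflow_id FROM workflow "
            "WHERE proc_state = 'paused' AND resume_at > 0 AND resume_at <= ? "
            "ORDER BY workflow_id",
            (now,),
        )
    ]
    # Running workflows whose reap_at lies in the past are considered dead.
    to_reap = [
        row[0]
        for row in conn.execute(
            "SELECT workflow_id FROM workflow "
            "WHERE proc_state = 'running' AND reap_at > 0 AND reap_at < ? "
            "ORDER BY workflow_id",
            (now,),
        )
    ]
    return to_resume, to_reap
```

The watcher itself would call scan() once a minute and hand each returned id back to the workflow engine. With more than one watcher or node, a row would have to be claimed atomically before it is re-run, which this sketch does not attempt.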
Improvements to support the above ideas: I consider it a good idea to introduce new methods "wakeup" and "restore" in the activity base class, which are called before the "execute" method if a workflow is resumed from a pause or after a crash, respectively. Obviously there is also a need for a way to set the "reap_at" timestamp.
In an earlier discussion, Andreas Leibl suggested setting two timeouts (soft/hard) for different aspects of resuming a non-responsive workflow and providing a multi-node failover strategy. This will fit into this model by adding some more information to the workflow table.
I had a quick look at the Workflow code base and think these ideas can be coded in a suitable manner. If anybody has objections or a better idea, comments are welcome.
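The call order could be sketched as follows in Python. The run_activity function, the reason values ("normal", "resume", "crash"), the dict standing in for a workflow row and the 300-second default are all hypothetical; only the hook names "wakeup", "restore" and "execute" and the "reap_at" field come from the proposal.

```python
class Activity:
    def wakeup(self, workflow):
        """Hook called before execute when resuming from a pause."""

    def restore(self, workflow):
        """Hook called before execute when recovering from a crash."""

    def execute(self, workflow):
        raise NotImplementedError


def run_activity(activity, workflow, reason="normal", now=0, reap_after=300):
    # Mark the workflow as running and arm the reaper before doing any work.
    workflow["proc_state"] = "running"
    workflow["resume_at"] = 0
    workflow["reap_at"] = now + reap_after
    if reason == "resume":
        activity.wakeup(workflow)
    elif reason == "crash":
        activity.restore(workflow)
    activity.execute(workflow)
```

An activity that regularly needs more than five minutes could push workflow["reap_at"] further out from within execute.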
Regards,
Oliver
--
Protect your environment - close windows and adopt a penguin!
PGP-Key: 3B2C 8095 A7DF 8BB5 2CFF 8168 CAB7 B0DD 3985 1721
_______________________________________________ OpenXPKI-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/openxpki-devel
