Hi Folks,

please find below a first idea on how to extend the current crypto
backend and the workflow engine to support asynchronous key operations
when using an offline/offsite CA.

For each kind of such CA, we provide a dedicated set of
OpenXPKI::Crypto::API and its related classes. For most applications
it should be sufficient to subclass the present Backend::OpenSSL
classes and override only the commands that differ from the default model.
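
To illustrate the subclassing idea, here is a minimal sketch. It is
written in Python only for brevity and does not use the real OpenXPKI
API: the class names, the method names (issue_cert, create_pkcs10,
poll) and the ca_has_answered flag are all assumptions made up for
this example.

```python
class OpenSSLBackend:
    """Hypothetical stand-in for the existing synchronous backend."""

    def issue_cert(self, csr):
        # Default model: the certificate is available immediately.
        return {"status": "done", "cert": f"CERT({csr})"}

    def create_pkcs10(self, subject):
        return {"status": "done", "csr": f"CSR({subject})"}


class OfflineCABackend(OpenSSLBackend):
    """Hypothetical backend for an offline CA.

    Only the command that differs from the default model (signing)
    is overridden; everything else is inherited unchanged.
    """

    def __init__(self):
        self._pending = {}  # request id -> CSR waiting for the offline CA
        self._next_id = 1

    def issue_cert(self, csr):
        # Queue the request instead of signing; the caller gets a handle.
        request_id = self._next_id
        self._next_id += 1
        self._pending[request_id] = csr
        return {"status": "pending", "request_id": request_id}

    def poll(self, request_id, ca_has_answered=False):
        # ca_has_answered simulates the offline CA's response arriving.
        if not ca_has_answered:
            return {"status": "pending", "request_id": request_id}
        csr = self._pending.pop(request_id)
        return {"status": "done", "cert": f"CERT({csr})"}
```

The point of the sketch is that the workflow only needs to distinguish
"done" from "pending"; how the pending request is transported to the
offline CA stays inside the backend.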

If an async operation is started, the main workflow returns in a
"waiting for async operation" state. To complete the workflow after the
async operation has finished, we need to poll the async process for
completion and restart/continue the workflow, which is not natively
possible with the current workflow model.
A possible solution based on the current workflow engine might be to
fork off a polling process before the main workflow returns to the
waiting state and to reinject the workflow (or a superseding one) on
success. For async operations with a long latency or on heavily loaded
systems this will not scale well, as we need to keep one fork alive until
the async request returns. Besides, a restart of the daemon will
terminate all polling processes, resulting in stale workflows that need
manual care.

Therefore I suggest a modification of the workflow engine to introduce 
pause/auto-pickup of workflows.
Basic idea: create a workflow status table which records the current 
status of all unfinished workflows. Possible states are:
* running - regularly executed by a running process
* poll - workflow is in a state that needs regular polling
* crashed - no handling process found, but the workflow is not in a
  regular state
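
As a rough sketch of what such a status table could look like (using
an in-memory SQLite database in Python purely for illustration): the
table name, the column names and the helper functions are assumptions,
not an existing OpenXPKI schema. The pid and node columns anticipate
the crash detection, and next_poll the polling.

```python
import sqlite3


def create_status_table(conn):
    # One row per unfinished workflow; the CHECK restricts the state
    # to the three values proposed above.
    conn.execute(
        """CREATE TABLE workflow_status (
               workflow_id INTEGER PRIMARY KEY,
               state TEXT NOT NULL
                   CHECK (state IN ('running', 'poll', 'crashed')),
               pid INTEGER,      -- handling process, if any
               node TEXT,        -- handling node, if any
               next_poll REAL    -- epoch seconds of the next poll
           )"""
    )


def set_state(conn, workflow_id, state, pid=None, node=None, next_poll=None):
    # Register a workflow or overwrite its current entry.
    conn.execute(
        "INSERT OR REPLACE INTO workflow_status VALUES (?, ?, ?, ?, ?)",
        (workflow_id, state, pid, node, next_poll),
    )


def remove(conn, workflow_id):
    # Called when a workflow reaches a regular (finished) state.
    conn.execute(
        "DELETE FROM workflow_status WHERE workflow_id = ?", (workflow_id,)
    )
```

A workflow would call set_state(..., "running") on entry, then either
remove() on completion or set_state(..., "poll") before returning to
the waiting state.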

Each workflow registers itself in the table with a running state and
removes itself when done, or marks itself for polling. An external entity
(a thread inside the daemon) will check and trigger the requested polls.
The "crashed" state can cover several error situations, each one
demanding its own logic:
1) If the daemon is restarted, all unfinished processes can be
considered dead
2) Regularly check if the process is still alive (requires recording of
the PID and might be a bit tricky when using forks)
3) In distributed environments (t.b.d.!) the unavailability of a node is
equivalent to 1) - requires recording of the working node
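
The checks done by the external entity could be sketched roughly as
follows. This is Python for illustration only; the status table is
modelled as a plain dict, and all function names and entry keys
(state, pid, node, next_poll) are assumptions. The PID check relies
on POSIX signal 0 and does not address the fork issue mentioned
in 2).

```python
import os


def pid_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def mark_crashed_after_restart(table):
    # Case 1: after a daemon restart nothing can still be running.
    for entry in table.values():
        if entry["state"] == "running":
            entry["state"] = "crashed"


def mark_dead_pids(table, is_alive=pid_alive):
    # Case 2: periodic liveness check based on the recorded PID.
    for entry in table.values():
        if entry["state"] == "running" and not is_alive(entry["pid"]):
            entry["state"] = "crashed"


def mark_node_down(table, node):
    # Case 3: an unavailable node is treated like a restart of that node.
    for entry in table.values():
        if entry["state"] == "running" and entry["node"] == node:
            entry["state"] = "crashed"


def due_polls(table, now):
    # Workflow ids whose requested poll time has been reached.
    return sorted(
        wf_id
        for wf_id, entry in table.items()
        if entry["state"] == "poll" and entry["next_poll"] <= now
    )
```

The polling thread would call due_polls() in a loop and hand each
returned id back to the workflow engine; what happens to "crashed"
entries is left open here, as in the proposal.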

The approach might also deliver the groundwork for load balancing or
special purpose nodes in a distributed environment. For example, a
resource intensive task will register itself as "paused" and discontinue
if the system load is too high or even if the current node is not suited
for the requested operation. Obviously, this requires another decision
system to delegate paused/unsuited jobs to another node.
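
A very rough sketch of the per-node part of that decision, again in
Python and purely hypothetical: the task and node dictionaries, their
keys and the load threshold are assumptions for illustration, and the
delegation to another node is not covered.

```python
def decide_state(task, node, current_load):
    """Return "running" if this node should execute the task now,
    otherwise "paused" so another entity can reschedule it."""
    # Node is not suited for the requested operation at all.
    if task["requires"] not in node["capabilities"]:
        return "paused"
    # Resource intensive tasks back off when the system load is too high.
    if task["heavy"] and current_load > node["max_load"]:
        return "paused"
    return "running"
```

On a POSIX system, current_load could for instance be taken from
os.getloadavg()[0].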

Comments welcome ;)

Oliver

-- 
Protect your environment -  close windows and adopt a penguin!
PGP-Key: 3B2C 8095 A7DF 8BB5 2CFF  8168 CAB7 B0DD 3985 1721


_______________________________________________
OpenXPKI-devel mailing list
https://lists.sourceforge.net/lists/listinfo/openxpki-devel