Hi, Martin.
 
The problems I am going to talk about were discussed with Sergei and Julia
in November 2009 and twice in 2010. We have not found a reasonable
solution and decided to ask you for advice.
That is why I will sometimes write 'we' instead of 'I'.

In two recent commits I have fixed the LDAP-related workflow and run into
some problems.
For some of them I have found a workaround (implemented by these commits),
but the matter touches some fundamental concepts of OpenXPKI, which might
call for a strategic turn in the main direction of OpenXPKI development.
Maybe it is just that we are missing a Fall Workshop to discuss it.
 
The concrete problems, to start the discussion:
 
1)  A new version of IPC::ShareLite changed its interface
    (this happened about a year ago):
    the constructor now croaks if the shared memory segment does not exist,
    so calling it like this

    $share = IPC::ShareLite->new( -key     => $PID,
                                  -create  => 'no',
                                  -destroy => 'no' );

    may throw an exception.
    The proposed solution is to put the call into an 'eval' block. It works.
    I hope it is a bagatelle...
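
    A minimal sketch of the eval wrapper (assuming a missing segment
    should be treated like any other attach failure):

    my $share = eval {
        IPC::ShareLite->new( -key     => $PID,
                             -create  => 'no',
                             -destroy => 'no' );
    };
    if ($@ or not defined $share) {
        # the segment does not exist (or the attach failed);
        # fall back gracefully instead of dying
    }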
 
2)  The $OpenXPKI::Server::Context::who_forked_me parameter was implemented
    to store the parent PID, but it is overwritten in the case of
    "double forking", which happens in the chain
    CSR workflow -> CertIssue workflow -> LDAP workflow
    So the LDAP workflow, as the last child, is responsible for executing
    activities in the two upper-level workflows but cannot do it, because
    who_forked_me does not contain the proper shared memory block (smb) key
    after the second fork.
    The proposed solution is:
    - use a hash %who_forked_me and fill it in
      OpenXPKI/Server/Workflow/Activity/Tools/ForkWorkflowInstance.pm
      like this:

      $OpenXPKI::Server::Context::who_forked_me{$workflow->id()}
          = [ getppid(), $PID ];

    - use the hash values in
      OpenXPKI/Server/Workflow/Condition/CheckForkedWorkflowChildren.pm
      like this:

      my $my_forker =
          $OpenXPKI::Server::Context::who_forked_me{$workflow->id()}->[0];
      my $my_forked =
          $OpenXPKI::Server::Context::who_forked_me{$workflow->id()}->[1];

    It works. But see (3).
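
    A hypothetical sketch of how the condition could combine the stored
    pair with the eval wrapper from (1) to attach to the forker's shared
    memory block even after the second fork (variable names as above):

    use IPC::ShareLite;

    my ($my_forker, $my_forked) =
        @{ $OpenXPKI::Server::Context::who_forked_me{$workflow->id()} };

    # attach to the block keyed by the forker's PID; the grandchild
    # must neither create nor destroy it
    my $share = eval {
        IPC::ShareLite->new( -key     => $my_forker,
                             -create  => 'no',
                             -destroy => 'no' );
    };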
 
3)  Still, the double-forking test (we had to check the correctness
    of the workflow forking mechanism) failed. The mechanism is
    implemented in the ForkWorkflowInstance.pm,
    CheckForkedWorkflowChildren.pm and NotifyParentWorkflow.pm modules.

    There were two reasons.

    The first reason was that if the intermediate process
    (such as the one serving a CertIssue workflow) terminates before the
    last child (such as LDAP) is finished,
    the $SIG{'CHLD'} handler in ForkWorkflowInstance
    erases the shared memory block, and the last child (grandchild) cannot
    execute activities in its grandparent workflow.

    The second reason was that in such a case the getppid() function
    cannot be used to obtain the shared memory block key,
    as was done in CheckForkedWorkflowChildren.pm
    (see the code after the debugging line
    ##! 16: 'MYDEBUG everything is done, destroying shared memory').
    The parent is dead, so getppid() returns the PID of the 'init' process.
    Using the hash %who_forked_me mentioned above and disabling the handler
    solves that problem. A danger of a memory leak appears in that case,
    although I could not actually produce any leak while testing my
    OpenXPKI instance.
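
    A minimal sketch of the failure mode and the workaround (the real
    handler in ForkWorkflowInstance.pm differs in detail; this is only
    the shape of the problem):

    use POSIX ':sys_wait_h';

    # before: a reaper that destroys the shared memory block as soon
    # as any child exits, even though a grandchild may still need it
    $SIG{CHLD} = sub {
        while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
            # ... shared memory block erased here ...
        }
    };

    # after the intermediate parent dies, the grandchild is reparented:
    my $ppid = getppid();    # returns 1 (init), useless as an smb key

    # proposed workaround: let the kernel reap children and leave the
    # block alone (cleanup must then happen elsewhere - the leak risk)
    $SIG{CHLD} = 'IGNORE';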
 
While working on the problem I tried the proposed cures on a freshly
installed OpenXPKI. Certificate issuing, LDAP publishing and the forking
test work OK. No trouble.
But I cannot consider my local experience a sufficient reason to
commit any of them. Let's come to an agreement first.
 
The handler issue is the only problem without a workaround at the moment.
Solutions like disabling the handler and cleaning up the shared memory
blocks with a special activity seem ugly. Actually, the whole forking
system seems ugly too, although it works OK for the time being, as far as
I can tell from reading the mailing lists.
Sorry to say it, but let me remind you that we have already had trouble
in that part of OpenXPKI twice (caching workflow conditions etc.).
Well, that is a normal way of development, of course.
But there are doubts to discuss.
 
1) The whole mechanism is based on shared memory blocks, which means it is
not as safe as a database (no atomic transactions, no persistence).
It is a bottleneck from three points of view:
development, reliability and possibly security.
 
2) By forking, with all those signals and handlers, we effectively duplicate
the workflow structure for the sake of a single piece of functionality:
'WAITING' until the child does something. A sketch of a possible
database-backed alternative follows below.
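
A hypothetical sketch of that alternative: a condition that polls the child
workflow's state in the workflow table instead of waiting on a shared memory
signal. The module name, the 'child_workflow_id' context parameter and the
exact API call are assumptions, not existing code:

    package OpenXPKI::Server::Workflow::Condition::ChildWorkflowFinished;

    use base qw( Workflow::Condition );
    use Workflow::Exception qw( condition_error );
    use OpenXPKI::Server::Context qw( CTX );

    sub evaluate {
        my ($self, $workflow) = @_;

        # the child's id would be stored in the parent's context
        # at fork (or creation) time
        my $child_id = $workflow->context()->param('child_workflow_id');

        my $info = CTX('api')->get_workflow_info({ ID => $child_id });
        condition_error('child workflow still running')
            if $info->{WORKFLOW}->{STATE} ne 'SUCCESS';

        return 1;
    }

    1;

The parent would simply stay in its WAITING state and be re-checked by
polling, so the full logical state survives a power shutdown together with
the rest of the DB.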
 
Possible alternatives, solutions and workarounds (none of them pretty):
 
1) Do not use forking at all. Use the CertIssue and LDAP workflows as blocks
to be inserted into the CSR and other necessary workflow templates, for
instance at deployment time.
But then what do we do with the bundle CSR?
 
2) Do not use double forking at all and declare it an OpenXPKI limitation.
Use the LDAP workflow as a block to be inserted into the CertIssue workflow
template, for instance at deployment time.
 
3) Do not use double forking at all and declare it an OpenXPKI limitation.
Insert the LDAP workflow into the CertIssue workflow and forget about the
problem.
 
4) I don't know ...
 
And to summarize the problems...

The general issue: once we had all workflows stored in the DB, which was
robust, simple, standard and functional.
This followed the main idea: give all the business process logic to the
workflow sub-system.
After a power shutdown, say, we retained the full logical state of the
system. It was also simple to integrate with new possibilities.
 
Then workflow forking was implemented, based on polling.
And then the other workflow forking method came (let's call it the
'callback method').
The main feature of the callback method is that the child workflow executes
an activity of the parent workflow, using shared memory for inter-process
communication.
I would say that at this moment we lost a good chunk of universality and of
compatibility with possibilities not yet realized. Also, power shutdowns
became a plague.
At the moment we have to manually duplicate all the logic of the system:
    1) composing workflows
and
    2) composing a tree of processes in memory, with interaction between
       processes (termination handling, etc.).
Our opinion is that this approach is a sure way to heavy and hard-to-spot
bugs.

Do you know a general justification for why we cannot live without this
approach?
As far as I understand, the reason was the implementation of bundle CSR
processing.
I believe that communication via shared memory really speeds up the process.
But how high is the fee we pay for it...

Maybe we can take only one of these ways:
1) backward, to workflows alone, without memory structures;
2) forward, to some tables in memory alone, without workflows.

Or maybe we can keep mixing those ways.
 
I hope I have managed to clear up the reasons for our doubts.
We really need your help, so we call upon your common sense and experience,
Martin.

Best regards,
Petr Grigoriev.





