There also seems to be a second problem: Child processes are exiting without something being delivered where it's supposed to go. I can't really tell if this is what you mean, or what the details are, without more information.

--Rocco Caputo - [EMAIL PROTECTED]



I'll get more information on this question, this confuses me as well and I need to get an answer.

Hello, I am able to collect some more information about this performance issue, which is somehow related to perl exit call. Here is what's going on:

client (CGIs)<--->SOAP::Lite<--->CONDA::Component::SOAP::TCP::Server (StoreKeeper)<--->Worker<--->Wheel::Run

As you can see the server is implemented as an TCP server that interacts with a SOAP proxy. When request is accepted, server (in our case called storekeeper) creates an object named Worker that calls Wheel::Run. Wheel::Run forks and executes code handed to it by Worker. Worker roughly looked like this:

Worker::new {
  ....
  POE::Session->create(
       inline_states => {
           _start => sub {
               my ($kernel, $session, $heap, %args) =
                 @_[KERNEL, SESSION, HEAP, ARG0 .. $#_];

               $heap->{callback}      = $args{response_cbk};
               $heap->{prog}          = $args{prog};
               $heap->{progname}      = $args{progname};
               $heap->{stderr_cbk}    = $args{stderr_cbk};
               $heap->{stdout_cbk}    = $args{stdout_cbk};
               $heap->{timeout}       = $args{timeout};
               $heap->{token}         = $args{token};

               $heap->{fh} = tempfile("XXXXXXXX", DIR => $CACHE_ROOT);

               $kernel->yield('request', %args);
           },

           # Allow public to send this
           request => sub {
               my ($heap, %args) = @_[HEAP, ARG0 .. $#_];

               $heap->{'startime'} = time;
               $heap->{'shutdown'} = 0;

               $heap->{'wheel'} = POE::Wheel::Run->new(
                   Program => sub {
                       my (
                           $progname,
                           $prog,
                           $fh,
                           ....
my $res = &$prog(); # the work is done inside Wheel::Run 'CODE' part

                       store_fd($res, $fh);

                       # ok, we are done, exit nicely
                        exit(0);
                         ...
         }
         child => sub {
                        ....
retrieve_fd($res, $fh); # Pick up the result after child exit
                        .....
          }
...

We are able to identify that the exit(0) call takes roughly 100ms to be picked up by waitpid inside POE::Kernel. By changing exit(0) to
                       close($fh);
                       return;
we are able to make POE::Kernel pick up child exit status immediately within 5 ms. We suspect that exit(0) is taking too long to actually exit. I don't know why that's the case, perhaps it's an expensive system call, too many I/O synchronization..?

The reason that POE has to wait for a child to terminate is that the response object is being communicated through Storable serialization. This is very inefficient and I don't know why it's done that way and if it can be improved. Only when POE::Kernel reaps a child server process, it can ask the Wheel::Run Parent session to pick up the response.

At this point, we are hitting a wall again and it seems we have squeezed as much performance as possible out of the entire POE & CONDA marriage. My timing shows that a request is now served usually within 50 ms instead of 100+ ms before the exit(0) change.

11/01/2006 16:44:06.839295 : REQUEST: poe_entry_point called.
11/01/2006 16:44:06.871387 : REQUEST: poe_auth_client_request called.
11/01/2006 16:44:06.871501 : AUTHZ: About to check caps () for function FOO_REQUEST
11/01/2006 16:44:06.871681 : AUTHZ: Function LBUHM::nsp_get_health OK.
11/01/2006 16:44:06.872407 : REQUEST: poe_request called.
11/01/2006 16:44:06.875515 : REQUEST: poe_fork_damnit called. --> creates a new Worker->Wheel::Run (fork)
11/01/2006 16:44:06.876362 : REQUEST: poe_execute_next called.
11/01/2006 16:44:06.876509 : SCHEDULE: FunctionQueue::execute: about to execute FOO_REQUEST <-------------- 11/01/2006 16:44:06.934091 : WORKER: child exit status: 0. <------------- 11/01/2006 16:44:06.934536 : WORKER: session(4067) handled sigchild for 22396, status 0. <--------- This is where we have gained, this used to be slow
11/01/2006 16:44:06.934929 : REQUEST: poe_got_blocking_response called.
11/01/2006 16:44:06.939990 : WORKER: _stop: useragent, id 4067
11/01/2006 16:44:06.940767 : REQUEST: poe_execute_next called.
11/01/2006 16:44:06.941612 : SYSTEM: POE::Component::SOAP::TCP::Server::Handler session 4063 flushed

Again, your insights and comments are welcome!

best,
Fei

Reply via email to