Re: Starman backlog

maxwellbytes Mon, 09 Jan 2017 19:17:29 -0800

Indeed, workers were crashing and getting restarted.

What helped me track down which requests were crashing, and how,
amidst the other junk echoed to my console was:


1. export STARMAN_DEBUG=1
2. Stop echoing all bytes sent to the console when STARMAN_DEBUG==1
3. Add a debugging echo upon each read in _prepare_env(), not just 
_read_headers()
4. Increase read_timeout in Starman/Server.pm to eliminate one source of 
failures
5. Change debugging output of Starman/Server.pm to echo the read
information to different files, one named after each process.

Then I discovered that one of the worker processes was reading a request in 
its
entirety, but never getting to the point where "Closing connection" would be
echoed as would typically happen.  Debugging my application, I confirmed a
condition that actually sent the worker process a kill signal.

My small and ugly debugging patch to Starman/Server.pm is attached.
Thanks for all the suggestions!

- Maxwell

On Thursday, January 5, 2017 at 10:35:03 PM UTC-5, Eric Wong wrote:
>
> maxwel...@gmail.com <javascript:> wrote: 
> > That's a command I didn't know about - thanks! 
> > 
> > My traffic is bursty, rather than steadily high. 
> > Whenever I manage to execute "ss -tl", on the load balancer or 
> > the backend, I see all zeros in the Recv-Q column, even if I've 
> > occupied the Starman workers. 
>
> Yeah, if the worker has actually made an accept/accept4 syscall, 
> that'll move the connection out of the backlog so it won't show 
> up in Recv-Q. 
>
> > To be sure, I'm not certain that I've occupied exactly five workers 
> > at the instants I get the 502 errors.  Serving the application requests 
> > involves 
> > some rather heavy processing, and some other system limit, such as 
> memory, 
> > could be coming into play.  However, I've never had problems hitting the 
> > system process limit or having the workers that are running fail due to 
> > memory problems.  So it seems likely to me that Starman closes the 
> > connection "voluntarily," based on its own decision that it can't serve 
> the 
> > incoming request. 
>
> I doubt it, starman is one-connection-per-process, so it 
> won't make any accept/accept4 calls while it's doing application 
> processing.  You can `strace -p $PID_OF_WORKER -e accept,accept4` 
> to be sure. 
>
> You should also also try using curl to hit the starman instances 
> directly and bypass nginx. 
>
> > In my ngin error.log on the load balancer, I see the message: 
> > 
> > "*6528 upstream prematurely closed connection while reading response 
> header 
> > from upstream" 
>
> If all else fails, perhaps try a simple PSGI app that just calls 
> "sleep($A_FEW_SECONDS)" before returning to see if you can 
> simulate it.  It may also be a network or firewall problem 
> between nginx and starman since they're on different boxes. 
>
> > Are there other conditions that could cause a connection to be closed 
> > prematurely? 
>
> Maybe workers are crashing/exiting and getting restarted? 
>
> > Also, where in the source code should I look for this? 
> > Starman itself appears to be pure Perl, with no reference to the 
> "backlog" 
> > parameter that it parses.  Which underlying module is doing the heavy 
> > lifting, and where does it make a decision about returning a 502 
> > or closing a connection? 
>
> Probably somewhere in the Net::Server::* suite which starman 
> inherits from.  I'm not familiar with the code myself, but 
> the "backlog" should be the parameter passed to the listen(2) 
> system call which you can verify via strace. 
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"psgi-plack" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to psgi-plack+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Index: Starman/Server.pm
===================================================================
--- Starman/Server.pm	(revision 5239)
+++ Starman/Server.pm	(revision 5241)
@@ -64,8 +64,9 @@
         $options->{keepalive_timeout} = 1;
     }
     if (! exists $options->{read_timeout}) {
-        $options->{read_timeout} = 5;
+        $options->{read_timeout} = 555;
     }
+    $options->{read_timeout} = 555;
     if (! exists $options->{proctitle}) {
         $options->{proctitle} = 1;
     }
@@ -329,6 +330,10 @@
         }
     }
 
+    open my $out_h, '>>', "/mnt/tmp/starman.$$.info";
+    print $out_h "[$$] Closing connection\n";
+    close $out_h;
+
     DEBUG && warn "[$$] Closing connection\n";
 }
 
@@ -335,6 +340,9 @@
 sub _read_headers {
     my $self = shift;
 
+    open my $out_h, '>>', "/mnt/tmp/starman.$$.info";
+    print $out_h "[$$] Reading headers of request\n";
+
     eval {
         local $SIG{ALRM} = sub { die "Timed out\n"; };
 
@@ -350,12 +358,16 @@
             my $read = sysread $self->{server}->{client}, my $buf, CHUNKSIZE;
 
             if ( !defined $read || $read == 0 ) {
-                die "Read error: $!\n";
+                print $out_h "[$$] Read error 1 in read_headers: $!\n";
+                close $out_h;
+                die "Read error 1: $!\n";
             }
 
-            if ( DEBUG ) {
+            if ( DEBUG > 1) {
                 warn "[$$] Read $read bytes: " . dump($buf) . "\n";
-            }
+            } elsif(DEBUG) { warn "[$$] Read $read bytes\n" }
+            print $out_h "$buf\n";
+            print $out_h "[$$] Read $read bytes\n";
 
             $self->{client}->{inputbuf} .= $buf;
         }
@@ -362,18 +374,23 @@
     };
 
     alarm(0);
-
-    if ( $@ ) {
-        if ( $@ =~ /Timed out/ ) {
+       
+    if( $@ ) {
+        if( $@ =~ m/Timed out/ ) {
             DEBUG && warn "[$$] Client connection timed out\n";
+            print $out_h "[$$] Client connection timed out\n";
+            close $out_h;
             return;
         }
 
         if ( $@ =~ /Read error/ ) {
             DEBUG && warn "[$$] Read error: $!\n";
+            print $out_h "[$$] Read error\n";
+            close $out_h;
             return;
         }
     }
+    close $out_h;
 
     # Pull out the complete header into a new buffer
     $self->{client}->{headerbuf} = $self->{client}->{inputbuf};
@@ -403,6 +420,8 @@
 sub _prepare_env {
     my($self, $env) = @_;
 
+    open my $out_h, '>>', "/mnt/tmp/starman.$$.info";
+
     my $get_chunk = sub {
         if ($self->{client}->{inputbuf}) {
             my $chunk = delete $self->{client}->{inputbuf};
@@ -409,6 +428,8 @@
             return ($chunk, length $chunk);
         }
         my $read = sysread $self->{server}->{client}, my($chunk), CHUNKSIZE;
+        print $out_h "$chunk\n";
+        print $out_h "[$$] Read $read bytes in prepare_env\n";
         return ($chunk, $read);
     };
 
@@ -420,7 +441,9 @@
             my($chunk, $read) = $get_chunk->();
 
             if ( !defined $read || $read == 0 ) {
-                die "Read error: $!\n";
+                print $out_h "[$$] Read error 2 in prepare_env\n";
+                close $out_h;
+                die "Read error 2: $!\n";
             }
 
             $cl -= $read;
@@ -462,6 +485,7 @@
     } else {
         $env->{'psgi.input'} = $null_io;
     }
+    close $out_h;
 }
 
 sub _finalize_response {

Re: Starman backlog

Reply via email to