Hi Chris,

first, regarding your "Several thoughts":

> 0. You are missing dots (but you told Andre that it was a huge typo...
> not sure how that kind of typo happens).

It happened because I substituted the real worker names to anonymize the configuration.
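For clarity, the missing dot is the one after the "worker" prefix. mod_jk reads worker properties in the form "worker.<worker name>.<property>", so a line such as "workerServer1.host" is not applied to the worker "Server1". The two lines below are taken from the quoted configuration, once as posted and once in the intended form:

```properties
# As posted (anonymized): no dot after "worker", so this is NOT a property of worker Server1
workerServer1.host=Server1.mydomain

# Intended form: worker.<worker name>.<property>
worker.Server1.host=Server1.mydomain
```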
> 1. You should look into using "template" workers.

Yes, the configuration would be cleaner, but is it a functional problem?

> 2. Unless you really want to explicitly set all those properties, don't
> set anything that is the same as a documented default. There's no
> reason to specify all those details.

In some cases we deliberately set values identical to the defaults, to protect ourselves against changed defaults.

> So, 10 requests were sent to Server1 during this minute? That sounds
> reasonable, given:
>
> > workerServer1.retry_interval=100
>
> That means that mod_jk will try 10 times per second to reach Server1
> when it's in an error state.
>
> > 10:22     56   238   243   261   250   247
> > 10:23     10   728   742   716   740   761
>
> 56 seems high, but that might be due to multiple httpd workers all
> re-trying.

The numbers in my analysis are requests that were actually processed with RC=200, not just connection attempts (unfortunately, I can't see the attempts at log level "error").

> > 10:54    686   549   562   506   529   548
>
> So, this is when Server1 becomes operational again? Did you have to
> use the status worker to trigger mod_jk to allow it back into the
> cluster, or did it recover on its own?

The behavior after 10:22 seems reasonable to me: mod_jk recovered on its own, and there was no operator intervention until 10:37, when the faulty server was deactivated in the status worker and restarted. The problem is the behavior from 10:15 to 10:21, because during that time no requests were routed to the operational servers.

> You might want to consider configuring mod_jk to use a "ping_mode" for
> activation management.

Which ping_mode would you recommend for trying this on a server with this number of requests? "P" seems to add a lot of overhead...
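For illustration only, a template-based version of the quoted worker definitions might look roughly like the sketch below. It assumes the "reference" directive described in the mod_jk workers.properties documentation, which is not available in every 1.2.x release, so the documentation for the version in use should be checked. The template name "template", the host name "Server2.mydomain" and the reuse of port 8800 for every node are hypothetical; the other values are simply copied from the quoted Server1 block. The ping_mode line only shows where such a setting would go and is not a recommendation for this load.

```properties
# Hypothetical shared template; values copied from the quoted Server1 block.
# It is not listed in balance_workers, so it only serves as a source of settings.
worker.template.type=ajp13
worker.template.port=8800
worker.template.socket_timeout=600
worker.template.socket_keepalive=0
worker.template.retries=7
worker.template.retry_interval=100
worker.template.connection_pool_timeout=600
worker.template.lbfactor=100
worker.template.connect_timeout=3000
worker.template.prepost_timeout=3000
worker.template.reply_timeout=0
worker.template.recovery_options=0
worker.template.activation=Active
# Example only: "C" tests the connection once after it is opened;
# "P" tests before each request, "I" tests idle connections on an interval,
# "A" enables all of them
worker.template.ping_mode=C

# Each balanced worker inherits the template and sets only what differs
worker.Server1.reference=worker.template
worker.Server1.host=Server1.mydomain
worker.Server1.route=Server1
worker.Server1.domain=Server1

# Hypothetical host name following the same naming pattern
worker.Server2.reference=worker.template
worker.Server2.host=Server2.mydomain
worker.Server2.route=Server2
worker.Server2.domain=Server2
```

With this layout, a per-node setting such as "host" cannot accidentally be copied from the wrong block, which is the kind of slip visible in the quoted Server2 section.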
Thx,
Steffen

Christopher Schultz <ch...@christopherschultz.net> wrote on 19.09.2011 17:47:39:

> From: Christopher Schultz <ch...@christopherschultz.net>
> To: Tomcat Users List <users@tomcat.apache.org>
> Date: 19.09.2011 17:48
> Subject: Re: mod_jk doesn`t distribute and failover on tomcat-error
>
> Steffen,
>
> On 9/19/2011 4:49 AM, steffen.scheu...@fiducia.de wrote:
> > If one of the 6 balanced Tomcat servers throws an
> > OutOfMemoryError, mod_jk doesn't distribute any requests to the
> > other servers until we restart the faulty server or stop the
> > server via jkstatus.
> >
> > Here is a sample of the mod_jk log file from when our Server1 got an
> > OutOfMemoryError. From 10:15 to 10:22 none of the other servers received
> > requests from mod_jk, even though they had no problem at all.
> >
> > [snip]
> >
> > [Mon Sep 05 10:15:07 2011] [26959:22] [error]
> > ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to
> > backend failed. Tomcat is probably not started or is listening on
> > the wrong port (errno=145)
>
> The good thing is that mod_jk could detect the error. I was going to
> guess that Tomcat continued to return 200 responses or something
> foolish like that. This ought to be easier to fix than what I was fearing.
>
> > I made an analysis of the number of requests processed by each
> > server per minute:
> >
> > Time     Srv1  Srv2  Srv3  Srv4  Srv5  Srv6
> > 10:08     750   821   829   792   796   754
> > 10:09     677   630   624   635   617   647
> > 10:10     598   641   604   605   598   624
> > 10:11     573   551   592   547   560   585
> > 10:12     634   613   616   628   662   623
> > 10:13     680   708   634   617   735   771
> > 10:14      10   546   521   450   437   409
>
> So, 10 requests were sent to Server1 during this minute? That sounds
> reasonable, given:
>
> > workerServer1.retry_interval=100
>
> That means that mod_jk will try 10 times per second to reach Server1
> when it's in an error state.
>
> > 10:22     56   238   243   261   250   247
> > 10:23     10   728   742   716   740   761
>
> 56 seems high, but that might be due to multiple httpd workers all
> re-trying.
>
> > 10:24      0   671   638   649   669   608
>
> At this point, it looks like mod_jk has finally given up on the worker.
>
> What did the Tomcat status worker say at this point for the worker
> "Server1"?
>
> > 10:54    686   549   562   506   529   548
>
> So, this is when Server1 becomes operational again? Did you have to
> use the status worker to trigger mod_jk to allow it back into the
> cluster, or did it recover on its own?
>
> Do you know if the [error] messages in mod_jk.log above actually
> correspond to client connection failures, or are they just notices
> that one member of the cluster dropped out?
>
> > workers.properties configuration:
> >
> > worker.loadbalancer.type=lb
> > worker.loadbalancer.balance_workers=Server1,Server2,Server3,Server4,Server5,Server6,Server7,Server8
> > worker.loadbalancer.sticky_session=True
> > worker.loadbalancer.sticky_session_force=False
> > worker.loadbalancer.method=Request
> > worker.loadbalancer.lock=Optimistic
>
> You might want to set "retries" here.
>
> > #########################################################################
> > # Worker loadbalancer Server1                                           #
> > #########################################################################
> >
> > worker.Server1.type=ajp13
> > workerServer1.host=Server1.mydomain
> > workerServer1.port=8800
> > workerServer1.socket_timeout=600
> > workerServer1.socket_keepalive=0
> > workerServer1.retries=7
> > workerServer1.retry_interval=100
> > workerServer1.connection_pool_timeout=600
> > workerServer1.lbfactor=100
> > workerServer1.connect_timeout=3000
> > workerServer1.prepost_timeout=3000
> > workerServer1.reply_timeout=0
> > workerServer1.recovery_options=0
> > workerServer1.activation=Active
> > workerServer1.route=Server1
> > workerServer1.domain=Server1
> > workerServer1.redirect=-
>
> Several thoughts:
>
> 0. You are missing dots (but you told Andre that it was a huge typo...
> not sure how that kind of typo happens).
> 1. You should look into using "template" workers.
> 2. Unless you really want to explicitly set all those properties, don't
> set anything that is the same as a documented default. There's no
> reason to specify all those details.
>
> > #########################################################################
> > # Worker loadbalancer Server2                                           #
> > #########################################################################
> >
> > worker.Server1.type=ajp13
> > workerServer1.host=Server1.mydomain
>
> Another huge typo?
>
> You might want to consider configuring mod_jk to use a "ping_mode" for
> activation management.
>
> -chris

----------------------------------------------------------------------
Fiducia IT AG
Fiduciastraße 20
76227 Karlsruhe

Registered office: Karlsruhe
Register court: AG Mannheim, HRB 100059

Chairman of the Supervisory Board: Gregor Scheller
Chairman of the Executive Board: Michael Krings
Deputy Chairman of the Executive Board: Klaus-Peter Bruns
Executive Board: Jens-Olaf Bartels, Carsten Pfläging, Hans-Peter Straberger

VAT ID No. DE143582320, http://www.fiducia.de
----------------------------------------------------------------------