Hi Chris,

first, regarding your "Several thoughts":
>
> 0. You are missing dots (but you told Andre that it was a huge typo...
>    not sure how that kind of typo happens).
It happened because I substituted the real worker names to anonymize the
configuration.

> 1. You should look into using "template" workers.
Yes, the configuration would be cleaner, but is it a functional problem?
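
For reference, this is roughly how I understand a template-based version
would look. It is only a sketch: the worker name "template" and the host
Server2.mydomain are placeholders I made up, and the values are copied from
the Server1 settings quoted further down.

```properties
# Sketch only: "template" is a hypothetical worker name, used purely as a
# container for shared settings and never listed in balance_workers.
worker.template.type=ajp13
worker.template.port=8800
worker.template.socket_timeout=600
worker.template.connect_timeout=3000
worker.template.prepost_timeout=3000

# Each real worker inherits the template via "reference" and only sets
# what differs. Server2.mydomain is an assumed host name.
worker.Server1.reference=worker.template
worker.Server1.host=Server1.mydomain

worker.Server2.reference=worker.template
worker.Server2.host=Server2.mydomain
```

As far as I can tell, this changes only how the file is written, not how
the balancer behaves.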

> 2. Unless you really want to explicitly set all those properties, don't
>    set anything that is the same as a documented default. There's no
>    reason to specify all those details.
In some cases we deliberately set a value equal to the default, so that a
changed default in a later version cannot cause problems.


> So, 10 requests were sent to Server1 during this minute? That sounds
> reasonable, given:
>
> > workerServer1.retry_interval=100
>
> That means that mod_jk will try 10 times per second to reach Server1
> when it's in an error state.
>
> > 10:22, 56 238 243 261 250 247
> > 10:23, 10 728 742 716 740 761
>
> 56 seems high, but that might be due to multiple httpd workers all
> re-trying.
>
The numbers in my analysis are requests that were actually processed with
RC=200, not just attempts (unfortunately I don't see attempts at log level
error).

> > 10:54, 686 549 562 506 529 548
>
> So, this is when Server1 becomes operational again? Did you have to
> use the status worker to trigger mod_jk to allow it back into the
> cluster, or did it recover on its own?
The behavior after 10:22 seems reasonable to me: mod_jk recovered on its
own. There was no operational intervention until 10:37, when the faulty
server was deactivated in the status worker and restarted.

The problem is the behavior from 10:15 to 10:21, because no requests were
routed to the operational servers.

> You might want to consider configuring mod_jk to use a "ping_mode" for
> activation management.

Which ping_mode would you recommend trying on a server with this number of
requests? "P" seems to add a lot of overhead.
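
To make the question concrete, this is the kind of setting I would try
first. It is an untested sketch and assumes I read the workers.properties
reference correctly: "C" probes a connection right after connect, "I"
probes idle connections at an interval, and the flags can be combined, so
"CI" would avoid the per-request CPing that "P" adds. The timeout and
interval values are guesses, not tuned for our load.

```properties
# Sketch only, not tested: probe after connect (C) and on idle
# connections (I), but not before every request (P).
worker.Server1.ping_mode=CI
# Time to wait for the CPong reply, in milliseconds (illustrative value).
worker.Server1.ping_timeout=3000
# Probe connections idle longer than this, in seconds (illustrative value).
worker.Server1.connection_ping_interval=60
```

If I understand the documentation correctly, these attributes need a
reasonably recent mod_jk (1.2.27 or later).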

Thx
Steffen


Christopher Schultz <ch...@christopherschultz.net> wrote on 19.09.2011
17:47:39:

> From: Christopher Schultz <ch...@christopherschultz.net>
> To: Tomcat Users List <users@tomcat.apache.org>
> Date: 19.09.2011 17:48
> Subject: Re: mod_jk doesn`t distribute and failover on tomcat-error
>
> Steffen,
>
> On 9/19/2011 4:49 AM, steffen.scheu...@fiducia.de wrote:
> > If one out of 6 balanced Tomcat servers throws an
> > OutOfMemoryError, mod_jk doesn't distribute any requests to the
> > other servers until we restart the faulty server or stop the
> > server via jkstatus.
> >
> > Here is a sample of the mod_jk log file from when our Server1 got an
> > OutOfMemoryError. From 10:15 to 10:22 none of the other servers
> > received requests from mod_jk, even though they had no problem at all.
> >
> > [snip]
> >
> > [Mon Sep 05 10:15:07 2011] [26959:22] [error]
> > ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to
> > backend failed. Tomcat is probably not started or is listening on
> > the wrong port (errno=145)
>
> The good thing is that mod_jk could detect the error. I was going to
> guess that Tomcat continued to return 200 responses or something
> foolish like that. This ought to be easier to fix than what I was
> fearing.
>
> > I made an analysis, about the Number of Requests processed by each
> > Server per Minute:
> >
> > Timest.  Srv1  Srv2  Srv3  Srv4  Srv5  Srv6
> > 10:08,    750   821   829   792   796   754
> > 10:09,    677   630   624   635   617   647
> > 10:10,    598   641   604   605   598   624
> > 10:11,    573   551   592   547   560   585
> > 10:12,    634   613   616   628   662   623
> > 10:13,    680   708   634   617   735   771
> > 10:14,     10   546   521   450   437   409
>
> So, 10 requests were sent to Server1 during this minute? That sounds
> reasonable, given:
>
> > workerServer1.retry_interval=100
>
> That means that mod_jk will try 10 times per second to reach Server1
> when it's in an error state.
>
> > 10:22, 56 238 243 261 250 247
> > 10:23, 10 728 742 716 740 761
>
> 56 seems high, but that might be due to multiple httpd workers all
> re-trying.
>
> > 10:24, 0 671 638 649 669 608
>
> At this point, it looks like mod_jk has finally given up on the worker.
>
> What did the Tomcat status worker say at this point for the worker
> "Server1"?
>
> > 10:54, 686 549 562 506 529 548
>
> So, this is when Server1 becomes operational again? Did you have to
> use the status worker to trigger mod_jk to allow it back into the
> cluster, or did it recover on its own?
>
> Do you know if the [error] messages in mod_jk.log above actually
> correspond to client connection failures, or are they just notices
> that one member of the cluster dropped-out?
>
> > workers.properties Configuration:
> >
> > worker.loadbalancer.type=lb
> > worker.loadbalancer.balance_workers=Server1,Server2,Server3,Server4,Server5,Server6,Server7,Server8
> > worker.loadbalancer.sticky_session=True
> > worker.loadbalancer.sticky_session_force=False
> > worker.loadbalancer.method=Request
> > worker.loadbalancer.lock=Optimistic
>
> You might want to set "retries" here.
>
> > #########################################################################
> > # Worker loadbalancer Server1 #
> > #########################################################################
> >
> > worker.Server1.type=ajp13
> > workerServer1.host=Server1.mydomain
> > workerServer1.port=8800
> > workerServer1.socket_timeout=600
> > workerServer1.socket_keepalive=0
> > workerServer1.retries=7
> > workerServer1.retry_interval=100
> > workerServer1.connection_pool_timeout=600
> > workerServer1.lbfactor=100
> > workerServer1.connect_timeout=3000
> > workerServer1.prepost_timeout=3000
> > workerServer1.reply_timeout=0
> > workerServer1.recovery_options=0
> > workerServer1.activation=Active
> > workerServer1.route=Server1
> > workerServer1.domain=Server1
> > workerServer1.redirect=-
>
> Several thoughts:
>
> 0. You are missing dots (but you told Andre that it was a huge typo...
>    not sure how that kind of typo happens).
> 1. You should look into using "template" workers.
> 2. Unless you really want to explicitly set all those properties, don't
>    set anything that is the same as a documented default. There's no
>    reason to specify all those details.
>
> > #########################################################################
> > # Worker loadbalancer Server2 #
> > #########################################################################
> >
> > worker.Server1.type=ajp13
> > workerServer1.host=Server1.mydomain
>
> Another huge typo?
>
> You might want to consider configuring mod_jk to use a "ping_mode" for
> activation management.
>
> -chris