Problems with /balancer-manager
Hi,

I have the following situation: Apache is balancing requests to backend JBoss servers. Everything (the balancing of requests to the web container (Tomcat) of JBoss) works fine, except that I cannot get the balancer-manager working. The GUI itself appears, but after clicking on a worker link nothing happens.

Apache 2.2.3 on SUSE Linux Enterprise Version 10

<Proxy balancer://portal>
  Order deny,allow
  Allow from all
  BalancerMember ajp://lx-tpor01..xxx.xx:8009/portal route=jboss11
  BalancerMember ajp://lx-tpor01..xxx.xx:18009/portal route=jboss12
  and so on...
</Proxy>

ProxyPass /portal balancer://portal stickysession=JSESSIONID lbmethod=byrequests nofailover=Off
ProxyPass /balancer-manager !

<Location /balancer-manager>
  SetHandler balancer-manager
  Order Deny,Allow
  Deny from all
  Allow from xx
</Location>

I get the following GUI:

LoadBalancer Status for balancer://portal

StickySession  Timeout  FailoverAttempts  Method
JSESSIONID     0        7                 byrequests

Worker URL                            Route    RouteRedir  Factor  Status
ajp://lx-tpor01..xxx.xx:8009/portal   jboss11              1       Ok
ajp://lx-tpor01..xxx.xx:18009/portal  jboss12              1       Ok
and so on

But if I disable one JBoss instance, the status remains Ok, and if I click on a worker URL I don't get the option to edit its attributes: only the URL in the browser changes, without any change in the GUI. Also, when I click on the balancer, nothing happens.

I appreciate any help. Thanks in advance,
ahmed

-
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Problems with /balancer-manager
Hello Rainer,

Thanks for your quick answer. I will talk to the colleague responsible about upgrading Apache; that could be a problem because it is part of the SUSE bundle. I have also posted the question to the Apache mailing list, maybe I will get a tip from there.

Thanks for your answer, and best regards to Bonn as well.

ciao
ahmed

Original message
Date: Thu, 15 May 2008 12:25:05 +0200
From: Rainer Jung [EMAIL PROTECTED]
To: Tomcat Users List users@tomcat.apache.org
Subject: Re: Problems with /balancer-manager

Hello Ahmed,

I just tried it with httpd 2.2.8 and it works for me.

Although there seems to be no matching entry in the httpd changelog, 2.2.3 is quite early in the 2.2.x release cycle, and the balancer and balancer manager were new in 2.2.x. So if nothing else helps, upgrading to a more recent 2.2.x (like 2.2.8, or 2.2.9, expected in a few weeks) would be worth trying. I think I remember using it with 2.2.6, but I didn't try it with 2.2.3 or earlier. Others?

Apart from that: the httpd users list might be a better place to ask, because this does not seem to be related to any difficult AJP13 details; it seems to be a more general httpd mod_proxy_* issue.

Regards, and greetings to Vienna
Rainer
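Rainer's advice comes down to comparing the installed httpd release against one reported to work here (2.2.8). A minimal Python sketch of that check, assuming the input is the version banner printed by `httpd -v`; the banner format and the helper names are assumptions, not part of the thread:

```python
import re

# Assumption: input looks like the first line of `httpd -v` output,
# e.g. "Server version: Apache/2.2.3 (Linux/SUSE)".
VERSION_RE = re.compile(r"Apache/(\d+)\.(\d+)\.(\d+)")


def parse_httpd_version(banner: str) -> tuple:
    """Return (major, minor, patch) as integers from an httpd version banner."""
    match = VERSION_RE.search(banner)
    if match is None:
        raise ValueError(f"no Apache version found in {banner!r}")
    return tuple(int(part) for part in match.groups())


def older_than(banner: str, minimum: tuple = (2, 2, 8)) -> bool:
    """True if the banner reports a release older than `minimum`.

    The default 2.2.8 is simply the release reported to work in this
    thread, not a documented fix version for the balancer-manager issue.
    """
    # Integer tuples compare element by element, so 2.2.10 > 2.2.8.
    return parse_httpd_version(banner) < minimum


if __name__ == "__main__":
    print(older_than("Server version: Apache/2.2.3 (Linux/SUSE)"))
    print(older_than("Server version: Apache/2.2.8 (Unix)"))
```

Parsing into integer tuples matters: a plain string comparison would rank "2.2.10" below "2.2.8".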
Re: JkRequestLogFormat Options
Hello Fred,

Ah, you're right: the missing letter was the fault. I checked this line so many times but didn't see it. Thanks a lot.

best
ahmed

Original message
Date: Thu, 28 Feb 2008 12:23:25 -0800 (PST)
From: fredk2 [EMAIL PROTECTED]
To: users@tomcat.apache.org
Subject: Re: JkRequestLogFormat Options

Hi,

btw, in your log format line you have %{JK_REQUEST_DURATON}n instead of %{JK_REQUEST_DURATION}n; see the missing I.

I am using 1.2.25 and I get times like 0.0275 when using Apache 2.2.

Rgds,
Fred
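A typo like DURATON goes unnoticed because httpd simply logs "-" for a note that is never set. A small Python sketch that flags unknown JK_* notes in a LogFormat string. The whitelist is hypothetical: it is collected from the LogFormat lines in this thread and assumes the JK_LB_LAST_* notes mirror the JK_LB_FIRST_* ones, so extend it from the mod_jk Apache reference before relying on it:

```python
import re

# Hypothetical whitelist built from the notes used in this thread.
# It assumes the JK_LB_LAST_* notes mirror the JK_LB_FIRST_* ones.
_LB_FIELDS = ("NAME", "BUSY", "VALUE", "ACCESSED", "READ",
              "TRANSFERRED", "ERRORS", "ACTIVATION", "STATE")
KNOWN_JK_NOTES = {"JK_WORKER_NAME", "JK_WORKER_ROUTE", "JK_REQUEST_DURATION"}
KNOWN_JK_NOTES |= {f"JK_LB_{pos}_{field}"
                   for pos in ("FIRST", "LAST") for field in _LB_FIELDS}

# In a LogFormat string, %{NAME}n logs the request note NAME.
NOTE_RE = re.compile(r"%\{([^}]+)\}n")


def unknown_jk_notes(log_format: str) -> list:
    """Return JK_* note names used in log_format that are not whitelisted."""
    return [name for name in NOTE_RE.findall(log_format)
            if name.startswith("JK_") and name not in KNOWN_JK_NOTES]


if __name__ == "__main__":
    fmt = r'%h %t \"%r\" %T %{JK_WORKER_NAME}n %{JK_REQUEST_DURATON}n'
    print(unknown_jk_notes(fmt))
```

Header and response fields such as %{Referer}i end in a different letter and are ignored by the pattern.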
Re: JkRequestLogFormat Options
Hello Rainer,

Thanks for your input. Of course I have to change my FIRST and LAST variants (I will use FIRST_NAME to check whether the worker has changed), but you're right, I am more interested in the LAST values.

I changed %T to %D and it works fine, thanks.

We upgraded to 1.2.26 last week, but the values for ROUTE and DURATION look the same as before (1.2.25), and we haven't set JkRequestLogFormat explicitly. (Of course I had written DURATION without the I; now it's OK.)

thanks
best
ahmed

Original message
Date: Thu, 28 Feb 2008 23:52:40 +0100
From: Rainer Jung [EMAIL PROTECTED]
To: Tomcat Users List users@tomcat.apache.org
Subject: Re: JkRequestLogFormat Options

In addition to Fred's remark:

Usually you want the LAST variant instead of the FIRST variant. The two are the same if a load balancer only tries one worker, but in case of an error and failover, FIRST will be the first worker tried (so the failed one) and LAST the last one, so usually the successful one (unless all workers fail).

%T: response time in seconds, and I think it always gets rounded down, so usually not very useful. Instead you could use the httpd standard %D, which is the response time in microseconds.

Last remark: until JK 1.2.25 the variables JK_WORKER_ROUTE and JK_REQUEST_DURATION were only filled if some JkRequestLogFormat was set. In your version 1.2.26 both of them should get set even without a JkRequestLogFormat (but only if the request gets handled by mod_jk, so not for static content that is returned by the web server without any Tomcat interaction).

Regards,
Rainer
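The %T versus %D difference is plain unit arithmetic. This Python sketch shows why %T came out as 0 for most requests: it reports whole seconds (assuming, as Rainer does, that the fraction is always discarded), while %D keeps microseconds. The function names are illustrative only:

```python
def d_to_seconds(duration_us: int) -> float:
    """Convert an httpd %D value (microseconds) to seconds."""
    return duration_us / 1_000_000


def approx_percent_t(duration_us: int) -> int:
    """Approximate what %T logs for the same request: whole seconds, rounded down."""
    return duration_us // 1_000_000


if __name__ == "__main__":
    for us in (27_500, 950_000, 2_750_000):
        print(us, d_to_seconds(us), approx_percent_t(us))
```

A 27.5 ms request, the order of magnitude Fred reports, shows up as 0 in %T, which matches the zeros seen in the original log.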
JkRequestLogFormat Options
Hello,

I am logging the mod_jk output through the Apache access_log, as described in the reference at http://tomcat.apache.org/connectors-doc/reference/apache.html

Because I want to get a clear picture of what exactly is going on in our system, I use the following LogFormat:

LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Set-Cookie}o\" %{pid}P %{tid}P %T %{JK_WORKER_NAME}n %{JK_REQUEST_DURATON}n %{JK_WORKER_ROUTE}n %{JK_LB_FIRST_NAME}n %{JK_LB_FIRST_BUSY}n %{JK_LB_FIRST_VALUE}n %{JK_LB_FIRST_ACCESSED}n %{JK_LB_FIRST_READ}n %{JK_LB_FIRST_TRANSFERRED}n %{JK_LB_FIRST_ERRORS}n %{JK_LB_FIRST_ACTIVATION}n %{JK_LB_FIRST_STATE}n %{JK_LB_LAST_NAME}n" mod_jk_log

Everything works fine except the options responsible for the request duration. Mostly neither %T nor %{JK_REQUEST_DURATON}n has a value (%T is mostly 0 and the other parameter is -). For some requests I found that %T has a value like 2 or 3 and JK_REQUEST_DURATION has -, or %T is 0 and JK_REQUEST_DURATION has a value like 2 or 3.

First: why are there not values for each request?
Second: I think both options measure the same value; why are they not the same?
Third: why do they not show seconds.microseconds as written in the reference, but only (I think) rounded seconds?

We use mod_jk 1.2.26.

Thanks for your help
Best
ahmed
Questions to some mod_jk Options
Hello,

I studied the mod_jk docs, and the following questions about mod_jk options are haunting me. I hope I wrote the questions in an understandable form, and I would be pleased to get hints and tips.

.) retries (for LB workers): On Apache we use the prefork MPM. So how big is the connection pool? A retry of an lb worker happens if the load balancer cannot get a free connection for a member worker from the pool (information from the docs). Does it depend on the Apache prefork parameters MaxClients and MaxRequestsPerChild? If so, we have MaxClients 500 and MaxRequestsPerChild 1, which would mean the web server can send/handle 500 requests. Is this the size of our connection pool? I don't think so. On the other side we have 36 Tomcat instances, each with maxThreads=300 on the AJP connector. That doesn't fit, does it? (And there are 3 Apaches as front end, all configured the same.) With the worker MPM I think the number of threads must correspond to the max threads of Tomcat, but how does it work in our prefork model?

.) Why does a load balancer retry to get a free connection for a member worker from the pool? Why doesn't it use another member worker?

.) reply_timeout: does it only apply between the request and the first response packet, or between any two response packets? Is a response packet an AJP packet with the default size of 8k?

.) What is socket_timeout good for? We configured a connect_timeout, a prepost_timeout and a reply_timeout, and I can't find a situation where I need an additional socket_timeout. And when I want to know what happens in my system, I think I need a higher-level failure message to evaluate the situation, not one on socket level.

.) This question concerns the mod_jk option retries (for normal workers) (hint: better to find another name; using the same name for two different things makes it hard to write about) in association with recovery_options.
When I use the value 7 for recovery_options (bits 1+2+4), I think a retry is only possible when the connect_timeout is hit, not on the prepost_timeout and not in the reply_timeout situation. Is this right?

Another question on the same topic: I have a long-running sticky session, which means that many requests in this session go to the same Tomcat. Will a new connection be established for each request, or will the established connection be used for all requests? If the latter, i.e. the established connection is used for all requests of the session, then a retry will not happen if the Tomcat causes problems during the session (with recovery_options 7). Is this right?

Version: mod_jk 1.2.26 (upgraded recently)

Here is my workers.properties:

worker.list=ajp_bam,ajp_ggi,ajp_ad,ajp_svp,...,jkstatus
worker.template.type=ajp13
worker.template.lbfactor=5
worker.template.socket_keepalive=1
worker.template.connect_timeout=7000
worker.template.prepost_timeout=5000
worker.template.reply_timeout=18
worker.template.retries=20
worker.template.activation=Active
worker.template.recovery_options=7
worker.lbtemplate.type=lb
worker.lbtemplate.max_reply_timeouts=6
worker.lbtemplate.method=Session
# Production workers
# AS-INETP101 - 106 - 6/6 GGI
worker.INETP1011.host=AS-INETP101.AEAT.ALLIANZ.AT
worker.INETP1011.port=65001
worker.INETP1011.reference=worker.template
(many more of the same, then)
worker.ajp_ad.reference=worker.lbtemplate
worker.ajp_ad.balance_workers=INETP1032,INETP1062
(many more portals, and at last jkstatus)

The JkMount is very simple:

JkMount /* ajp_ad

and mostly the same for the other portals. The portals are virtual hosts on the Apache.

Tomcat server.xml example:

<Connector port="65001" maxThreads="300" protocol="AJP/1.3" />
<Engine name="Catalina" jvmRoute="INETP5021" defaultHost="default">
..
<Host name="slfinsol.com" appBase="webapps" unpackWARs="true" autoDeploy="false" deployOnStartup="false" xmlValidation="false" xmlNamespaceAware="false">
  <Alias>www.slfinsol.com</Alias>
  <Alias>web1.slfinsol.com</Alias>
  ...
  <Alias>testweb.slfinsol.com</Alias>
  .
  <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs" prefix="swl_access_log." suffix=".txt" pattern="common" resolveHosts="false" />
  <Valve className="at.allianz.tomcat.valve.RequestTimeValve" />
  <Valve className="at.allianz.tomcat.valve.WebcollaborationWorkaroundValve" />
  <Context path="" docBase="swl" />
  <Context path="/monitor5" docBase="monitor" />
  <Context path="/swl" docBase="swl" />
</Host>

Thanks for your time reading this and maybe giving tips.

With kind regards
ahmed musa
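The connection pool question can be explored with a back-of-the-envelope calculation. This Python sketch rests on assumptions that should be checked against the workers reference: that under prefork each httpd child is single-threaded and keeps at most one pooled AJP connection per backend worker (our understanding of how connection_pool_size follows the threads per child), that idle connections are never closed, and that the Tomcat AJP connector ties up one thread per open connection. With the figures from this mail (3 front ends, MaxClients 500, maxThreads=300) it gives an upper bound per Tomcat:

```python
from dataclasses import dataclass


@dataclass
class Frontend:
    """One Apache httpd front end running the prefork MPM."""
    max_clients: int
    # Assumption: one single-threaded prefork child holds at most one
    # pooled connection to a given backend worker.
    conns_per_child: int = 1


def worst_case_connections(frontends: list) -> int:
    """Upper bound of AJP connections that may be open to ONE Tomcat.

    Assumes every child of every front end has served this backend at some
    point and still holds its pooled connection (no idle timeout).
    """
    return sum(f.max_clients * f.conns_per_child for f in frontends)


def thread_headroom(frontends: list, tomcat_max_threads: int) -> int:
    """Connector threads left over; negative means possible thread starvation."""
    return tomcat_max_threads - worst_case_connections(frontends)


if __name__ == "__main__":
    apaches = [Frontend(max_clients=500) for _ in range(3)]
    print(worst_case_connections(apaches))
    print(thread_headroom(apaches, tomcat_max_threads=300))
```

With these numbers the bound is 1500 connections against 300 connector threads. Whether that gap matters in practice depends on how many children actually talk to the same Tomcat and on idle timeouts, which the sketch ignores.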
Re: RE: mod_jk Problems - - worker went to error state and dont recover
Hello Luke,

Here is the information from tomcat.apache.org:

Unsubscription: Send a blank email to [EMAIL PROTECTED]
Digest unsubscription: Send a blank email to [EMAIL PROTECTED]

best
ahmed

Original message
Date: Thu, 21 Feb 2008 09:27:31 -
From: [EMAIL PROTECTED]
To: users@tomcat.apache.org
Subject: RE: mod_jk Problems - - worker went to error state and dont recover

All, apologies, this is unrelated. How do I unsubscribe from this mailing list? I thought it would be useful and small, but it's overwhelming my inbox.

Thanks in advance.

Luke Walshe
BT Operate, HGIPCC Technical Specialist
Telephone: +44 (0)1314483482, Email: [EMAIL PROTECTED]

-Original Message-
From: Ahmed Musa [mailto:[EMAIL PROTECTED]
Sent: 21 February 2008 09:25
To: Tomcat Users List
Subject: Re: mod_jk Problems - - worker went to error state and dont recover
Re: mod_jk Problems - - worker went to error state and dont recover
Hello Rainer,

Thanks for your information, the situation is getting clearer now. I will read some docs again, following your links, and will make further tests, also with the improved logging.

Thanks a lot for your time.

With best regards
ahmed

Original message
Date: Wed, 20 Feb 2008 18:59:01 +0100
From: Rainer Jung [EMAIL PROTECTED]
To: Tomcat Users List users@tomcat.apache.org
Subject: Re: mod_jk Problems - - worker went to error state and dont recover

Ahmed Musa wrote:
Hello,
Wow, thank you very much Rainer for your very quick and informative answer. I will go to 1.2.26 and think about some smoother values for reply_timeout and max_reply_timeouts. I will search for the requests which cause the problems, because I still log the response time in the way you mentioned, but I am not sure that the user requests are responsible for the situation.

One note: for Apache httpd 2.x, %D is microseconds (there is no format for milliseconds); for Tomcat, %D is milliseconds. As long as you are searching for the root cause, it might make sense to have both access logs active to check for duration differences.

So one further question: does mod_jk itself check whether the backend is reachable, without user requests?

No. Everything only works on top of user requests.

When there are connections to the backend, are they closed after the response or are they held open for further requests?

In general, held open. There are parameters for how long they are held open without more requests before they get shut down, and also for how many might be kept open even when no requests are coming in. Those are the connection pool parameters, which you will find on http://tomcat.apache.org/connectors-doc/reference/workers.html

Tomcat also has a connectionTimeout on the connector, which will shut down a connection from the Tomcat side if it is idle for too long.

If you don't want to reuse connections at all, there's also a setting for that (a JkOption in Apache).
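The point about idle connections suggests a simple consistency check: both ends should close an idle AJP connection before any device in between (for example a firewall) silently drops it. A Python sketch under stated assumptions: it compares mod_jk's connection_pool_timeout (seconds, workers.properties) with the Tomcat connector's connectionTimeout (milliseconds), treats None or values of 0 or below as "no idle timeout", and takes the firewall idle limit as a hypothetical input that would have to come from the firewall configuration:

```python
from typing import List, Optional


def idle_timeout_problems(pool_timeout_s: Optional[int],
                          tomcat_timeout_ms: Optional[int],
                          firewall_idle_s: int) -> List[str]:
    """List reasons why idle AJP connections might outlive the firewall limit.

    pool_timeout_s:    mod_jk connection_pool_timeout, in seconds.
    tomcat_timeout_ms: Tomcat AJP <Connector connectionTimeout>, in milliseconds.
    firewall_idle_s:   hypothetical idle limit of the firewall, in seconds.
    None or values <= 0 are treated as "never closes idle connections".
    """
    problems = []
    if not pool_timeout_s or pool_timeout_s <= 0:
        problems.append("mod_jk: no idle timeout on pooled connections")
    elif pool_timeout_s >= firewall_idle_s:
        problems.append("mod_jk: connection_pool_timeout >= firewall idle limit")
    if not tomcat_timeout_ms or tomcat_timeout_ms <= 0:
        problems.append("Tomcat: no idle timeout on AJP connections")
    elif tomcat_timeout_ms / 1000 >= firewall_idle_s:
        problems.append("Tomcat: connectionTimeout >= firewall idle limit")
    return problems


if __name__ == "__main__":
    # Example values only; none of them come from the thread.
    print(idle_timeout_problems(None, None, firewall_idle_s=3600))
    print(idle_timeout_problems(600, 600_000, firewall_idle_s=3600))
```

Even without knowing the firewall limit, the check shows whether either side ever closes idle connections on its own.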
Is it possible that the Checkpoint firewall in between is responsible for the connectivity problem?

It can cut a connection that's idle for too long. Since you have cping/cpong active via connect_timeout and prepost_timeout, you should get a cping error message if the connection was dropped by the firewall during idle times and mod_jk tries to use it again. The reply timeout in the error log indicates that the backend isn't answering. Of course, if it takes *very* long to answer, the firewall might have dropped the connection in between, but then the root cause would still be the long response time of the backend.

Another point is that the worker does not recover. Yes, you are right: in this situation I have many reply_timeouts, but they happen within a certain period of time, for example 30 minutes, and the worker is still dead even when there are no more reply_timeouts. It remains dead. It was necessary to restart it manually via jkstatus.

I assume you are using stickiness, so when a session started on a node, it will stay there. So when a worker is in error for a long time, all new sessions will start on other nodes. When the worker is ready for recovery, it needs a request that doesn't carry a session, so that it can be probed with this request. In jkstatus, the status of an error worker should switch to REC when mod_jk decides that it could send a non-sticky request there (to probe), to PRB while this request is on the node, and finally either to OK or back to ERR depending on the result of the request.

You can log the number of errors (and accesses) that happened on the node in the httpd access log. If you think that the node simply stays in error for a long time, then the error count (and access count) should stay constant. I would expect that they do not.
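The recovery sequence described here (ERR, then REC, then PRB, then OK or back to ERR) can be modelled as a tiny state machine. This Python sketch is a simplified illustration of the description in the mail, not mod_jk's actual code, and the event names are hypothetical. It shows the key point: while only requests bound to other nodes arrive, a worker that is ready for recovery is never probed:

```python
from enum import Enum


class State(Enum):
    OK = "OK"
    ERR = "ERR"
    REC = "REC"  # ready for recovery, waiting for a request to probe with
    PRB = "PRB"  # a probe request is currently on the node


# (current state, event) -> next state; unlisted pairs leave the state unchanged.
# "sticky_request" stands for a request whose session binds it to another node.
TRANSITIONS = {
    (State.OK, "error"): State.ERR,
    (State.ERR, "recovery_due"): State.REC,
    (State.REC, "non_sticky_request"): State.PRB,
    (State.PRB, "probe_ok"): State.OK,
    (State.PRB, "probe_failed"): State.ERR,
}


def replay(events, start=State.ERR):
    """Apply events in order and return the list of visited states."""
    states = [start]
    for event in events:
        states.append(TRANSITIONS.get((states[-1], event), states[-1]))
    return states


if __name__ == "__main__":
    # Only sticky traffic: the worker stays in REC indefinitely.
    print(replay(["recovery_due", "sticky_request", "sticky_request"])[-1])
    # One session-less request lets the probe happen.
    print(replay(["recovery_due", "non_sticky_request", "probe_ok"])[-1])
```

On portals with very few users, session-less requests are rare, which is one way a recovered worker could appear stuck.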
Have a look at how LogFormat in Apache httpd works, and then add some of the variables documented in http://tomcat.apache.org/connectors-doc/reference/apache.html, like:

JK_LB_LAST_NAME
JK_LB_LAST_ACCESSED
JK_LB_LAST_ERRORS
JK_LB_LAST_BUSY
JK_LB_LAST_STATE

using the syntax %{JK_LB_LAST_STATE}n etc.

Another point is learning. I read the docs and the information on the Apache website; I don't find any other sources. Are there other ones? They do not go in depth: if you read the spec and watch the logs, it is (for me) very hard to match the two. The same goes for the many ways mod_jk can check whether there is a connection to the backend: I understand them, but checking the reality in an error situation is very hard. By matching I mean: which part of the communication sequence failed, why, and which error message it causes. But I will try, and also study the mailing list.

It's hard for us too (sometimes).

Thank you for your time. Tomorrow we will have the new version and will see what happens.

best
ahmed

Regards,
Rainer
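Whether a node's error and access counters really stay constant can be checked from the access log. A Python sketch, assuming a hypothetical LogFormat whose last five fields are %{JK_LB_LAST_NAME}n %{JK_LB_LAST_ACCESSED}n %{JK_LB_LAST_ERRORS}n %{JK_LB_LAST_BUSY}n %{JK_LB_LAST_STATE}n, in that order. Requests not handled by mod_jk log "-" there and are skipped; the sample lines are made up:

```python
from collections import defaultdict


def counters_by_worker(lines):
    """Collect (accessed, errors, state) samples per worker from log lines."""
    samples = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        name, accessed, errors, _busy, state = fields[-5:]
        # "-" marks requests that were not served through mod_jk.
        if name == "-" or not (accessed.isdigit() and errors.isdigit()):
            continue
        samples[name].append((int(accessed), int(errors), state))
    return samples


def summarize(lines):
    """Report per worker whether its counters moved during the log window."""
    report = {}
    for name, rows in counters_by_worker(lines).items():
        accesses = [row[0] for row in rows]
        errors = [row[1] for row in rows]
        report[name] = {
            "accesses_changed": min(accesses) != max(accesses),
            "errors_changed": min(errors) != max(errors),
            "last_state": rows[-1][2],
        }
    return report


if __name__ == "__main__":
    log = [  # made-up sample lines
        '10.0.0.1 - - [20/Feb/2008:10:00:00 +0100] "GET /a HTTP/1.1" 200 512 INETP1011 5 1 0 OK',
        '10.0.0.1 - - [20/Feb/2008:10:05:00 +0100] "GET /logo.gif HTTP/1.1" 200 99 - - - - -',
        '10.0.0.2 - - [20/Feb/2008:10:09:00 +0100] "GET /b HTTP/1.1" 504 0 INETP1011 6 2 0 ERR',
    ]
    print(summarize(log))
```

Only the trailing fields are used, so the quoted request line with its embedded spaces does not disturb the split.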
mod_jk Problems - worker went to error state and dont recover
Hello all,

After long unsuccessful research I hope someone can give me a hint about the following problems.

Our Apache/mod_jk/Tomcat infrastructure ran without problems for about one year; then, two months ago, mod_jk errors started to occur. We upgraded the mod_jk version and made improvements in workers.properties; the problems changed and became less frequent, but sometimes they still appear.

It seems that the mod_jk workers lose the connection to their Tomcat backend servers; there are messages in the mod_jk log files which point in this direction. Normally this does not seem to be a big problem, but under certain conditions (which?) the worker goes into an error state and cannot recover by itself; this must be done manually.

Problem 1: The Tomcats are reachable, so why do the workers think the server is dead?
Problem 2: I have no idea why the worker goes into an error state and cannot recover.
Problem 3: I miss explanations of the logged messages. I read the messages but cannot match them to the situation. When does a worker post these messages?

[Wed Feb 20 10:04:01.889 2008] [19237:3086010048] [info] jk_handler::mod_jk.c (2270): Aborting connection for worker=ajp_ggi
[Wed Feb 20 10:04:39.799 2008] [19294:3086010048] [error] ajp_get_reply::jk_ajp_common.c (1623): (INETP1011) Timeout with waiting reply from tomcat. Tomcat is down, stopped or network problems (errno=110)
[Wed Feb 20 10:04:39.799 2008] [19294:3086010048] [error] ajp_service::jk_ajp_common.c (2034): (INETP1011) receiving reply from tomcat failed without recovery in send loop attempt=0
[Wed Feb 20 10:04:41.799 2008] [19294:3086010048] [error] service::jk_lb_worker.c (1105): unrecoverable error 504, request failed. Tomcat failed in the middle of request, we can't recover to another instance.

- Which timeout? How does mod_jk decide that Tomcat is down? Where can I find details on errno=110?
- "receiving reply from tomcat failed without recovery in send loop attempt=0": what does "without recovery in send loop" mean?
- unrecoverable error 504: details on this error?

OK, I turned the logging level to debug. The course of events gets clearer, but more questions appear. There are socket numbers: which sockets, and what are these numbers? E.g. "will be shutting down socket 35 for worker INETP1021". What are the sockets good for? How many are there per worker? Can I configure them?

Generally: how can I solve such problems? I tried to look into the mod_jk code, searching for error codes and error messages, but could not find relevant information. I am studying the log files but cannot find out what really happens.

So maybe someone has an idea why the worker thinks that the corresponding Tomcat is dead, and why it does not recover by itself. I am also looking for tips on how I can help myself, and where to find information about the error codes, messages, ... in mod_jk.

Thanks for your attention
Best
ahmed musa (writing from Vienna)

Current infrastructure

We have 3 Apache web servers (2.2.6) based on CentOS release 4.3, kernel version 2.6.9-34. In front of the web servers there are two hardware load balancers (at two locations), but they play no role in this story. The web servers are hosted at our ISP.

The web servers balance the requests via mod_jk (version 1.2.25) for approx. 10 webapps to 18 backend Tomcat servers (blade servers; because of underlying application parts the OS is Windows 2003 Server, a long story not worth explaining :-) ). The Tomcat servers get their data via requests against DB2 servers/DB2 databases on the mainframe. The Tomcat servers are in-house and are rebooted nightly because of automated deployment processes. Between the web servers and the Tomcat servers there is a Checkpoint firewall.

All webapps are deployed on all Tomcats; only mod_jk routes the requests to particular Tomcat instances. (On one blade server there are two identical Tomcat instances running.)

Versions: Tomcat 5.5.17_11, JDK 1.5.0_11-b03.
The requests against the public website(s) are normal short-lived requests, and not many. Most webapps (portals) need a login and have a strong focus on business logic, so the instances are big (many MB of RAM), the sessions are sticky and the session timeout is 20 minutes. These portals also get relatively few requests. In addition to the user requests there are monitoring requests from our ISP. The problems appear on servers/portals with very few user requests.

workers.properties

worker.list=ajp_bam,ajp_ggi,ajp_ad,ajp_svp,...,jkstatus
worker.template.type=ajp13
worker.template.lbfactor=5
worker.template.socket_keepalive=1
worker.template.connect_timeout=7000
worker.template.prepost_timeout=5000
worker.template.reply_timeout=12
worker.template.retries=6
worker.template.activation=Active
worker.template.recovery_options=7
worker.lbtemplate.type=lb
worker.lbtemplate.max_reply_timeouts=6
worker.lbtemplate.method=Session
# Production workers
# AS-INETP101 - 106 - 6/6 GGI
worker.INETP1011.host=AS-INETP101.AEAT.ALLIANZ.AT
worker.INETP1011.port=65001
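For Problem 3 it helps to split the mod_jk log lines into their parts. A Python sketch, assuming the line layout of the log excerpts in this mail (timestamp, pid:tid, level, function::file (line), optional (worker) prefix, message); the layout is inferred, not taken from a specification. The errno lookup uses Python's errno module, which reports the meaning on the machine running the script, so it should run on the same platform as httpd because errno numbers are platform dependent:

```python
import errno
import os
import re
from typing import Optional

# Layout inferred from the log excerpts (an assumption, not a spec):
# [time] [pid:tid] [level] function::file (line): (worker) message
LINE_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<pid>\d+):(?P<tid>\d+)\] \[(?P<level>\w+)\] "
    r"(?P<function>\w+)::(?P<file>[\w.]+) \((?P<line>\d+)\): (?P<message>.*)$"
)
WORKER_RE = re.compile(r"^\((?P<worker>[^)]+)\) ")
ERRNO_RE = re.compile(r"errno=(?P<errno>\d+)")


def parse_jk_log_line(line: str) -> Optional[dict]:
    """Split one mod_jk log line into fields, or return None if it doesn't match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    for key in ("pid", "tid", "line"):
        entry[key] = int(entry[key])
    worker = WORKER_RE.match(entry["message"])
    entry["worker"] = worker.group("worker") if worker else None
    err = ERRNO_RE.search(entry["message"])
    entry["errno"] = int(err.group("errno")) if err else None
    return entry


def describe_errno(number: int) -> str:
    """Symbolic name and text for an errno value on the current platform."""
    return f"{errno.errorcode.get(number, 'unknown')}: {os.strerror(number)}"


if __name__ == "__main__":
    sample = ("[Wed Feb 20 10:04:39.799 2008] [19294:3086010048] [error] "
              "ajp_get_reply::jk_ajp_common.c (1623): (INETP1011) Timeout with "
              "waiting reply from tomcat. Tomcat is down, stopped or network "
              "problems (errno=110)")
    entry = parse_jk_log_line(sample)
    print(entry["level"], entry["worker"], entry["errno"])
    print(describe_errno(entry["errno"]))  # ETIMEDOUT on Linux
```

Grouping the parsed entries by worker and pid makes it easier to follow one request through the ajp_get_reply, ajp_service and lb service messages.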
Re: mod_jk Problems - worker went to error state and doesn't recover
of max_reply_timeouts in 1.2.25 was wrong, so you need to go to 1.2.26 to get it working right. See: http://issues.apache.org/bugzilla/show_bug.cgi?id=43229

Caution: this does *not* explain why the backends are not automatically recovered after a minute in error state. Maybe there are periods where you get too many of those reply_timeouts (see the log file), so although we recover after a minute, the backend almost immediately goes back into error state.

> - Which timeout? How does mod_jk decide that Tomcat is down? Where can I find details on errno=110?...

reply_timeout, see above and also http://tomcat.apache.org/connectors-doc/generic_howto/timeouts.html

errno: a standard Unix feature. The numbers are platform dependent. I would assume in your case

ETIMEDOUT 110 /* Connection timed out */

so no wonder, that's exactly what we expect (and it doesn't tell us the reason, i.e. what's wrong on the *backend* that makes it take so long to respond).

> - receiving reply from tomcat failed with out recovery in send loop attempt=0 - what does "with out recovery in send loop" mean?

That your configuration doesn't allow us to send the request to another backend. recovery_options 7 includes: if mod_jk was able to send the request to a backend, do not try to send it to another backend in case of an error during response handling. Even if you allowed sending to another backend, it would not help to *avoid* putting the worker into error state. More likely, you would put all workers into error state, because all of them might run into the same timeout, one after the other.

> - unrecoverable error 504 - any details on this error?

That's simply how we report the situation back to the client (browser).

> OK, I turned the logging level up to debug. The course of events became clearer, but more questions appeared. There are socket numbers: which sockets, and what are these numbers, e.g. "will be shutting down socket 35 for worker INETP1021"? What are the sockets good for?
> How many are there per worker? Can I configure them?

Should not be the problem here. For Apache httpd, if you do *not* configure anything, we automatically choose the number of httpd threads as the maximum number of connections. No need to change anything here.

> In general: how can I solve such problems? I tried to look into the mod_jk code, searching for error codes and error messages, but could not find any relevant information. I am studying the log files, but cannot work out what really happens.

Post to the list. Improve our docs. The error message contains the words "timeout" and "reply", and you have a reply_timeout configured. Long-running requests are a frequent problem. If you want to get rid of them, start by adding response times to your httpd and Tomcat access log formats (%D). Then look at which URLs are producing long-running requests, at what time of day they happen, etc. This might give you a clue about the reasons. And if they are very frequent, take Java thread dumps of your backends and analyze them.

> So maybe someone has an idea why the worker thinks the corresponding Tomcat is dead, and why it does not recover by itself.

"Tomcat is dead", from the point of view of mod_jk, simply means: we didn't get an answer when we expected one. Details depend on the additional log lines (could not connect, reply timeout, etc.).

> I am also looking for tips on how I can help myself, and where to find information about the error codes and messages in mod_jk.

Regards,
Rainer