Hello Luke,

Here is the information from tomcat.apache.org:
Unsubscription: Send a blank email to [EMAIL PROTECTED]
Digest unsubscription: Send a blank email to [EMAIL PROTECTED]

best
ahmed

-------- Original Message --------
> Date: Thu, 21 Feb 2008 09:27:31 -0000
> From: [EMAIL PROTECTED]
> To: users@tomcat.apache.org
> Subject: RE: mod_jk Problems - - worker went to error state and dont recover
>
> All
>
> Apologies, this is unrelated. How do I unsubscribe from this mailing
> list? I thought it would be useful and small, but it's overwhelming my
> inbox.
>
> Thanks in Advance.
>
> Luke Walshe
> BT Operate, HGIPCC Technical Specialist
> Telephone: +44 (0)1314483482, Email: [EMAIL PROTECTED]
>
> -----Original Message-----
> From: Ahmed Musa [mailto:[EMAIL PROTECTED]
> Sent: 21 February 2008 09:25
> To: Tomcat Users List
> Subject: Re: mod_jk Problems - - worker went to error state and dont recover
>
> Hello Rainer,
> Thanks for your information - the situation is getting clearer now.
> I will read some docs again, following your links, and will make further
> tests, also with the improved logging.
> Thanks a lot for your time
> with best regards
> ahmed
>
> -------- Original Message --------
> > Date: Wed, 20 Feb 2008 18:59:01 +0100
> > From: Rainer Jung <[EMAIL PROTECTED]>
> > To: Tomcat Users List <users@tomcat.apache.org>
> > Subject: Re: mod_jk Problems - - worker went to error state and dont recover
> >
> > Ahmed Musa wrote:
> > > Hello,
> > > Wow - thank you very much, Rainer, for your very quick and informative
> > > answer.
> > > I will go to 1.2.26 and think about some "smoother" values for
> > > reply_timeout and max_reply_timeouts.
> > > I will search for the requests which cause the problems - because I
> > > still log the response time in the way you mentioned - but I am not sure
> > > that the user requests are responsible for the situation.
> >
> > One note: for Apache httpd 2.x %D is microseconds (there is no format
> > for milliseconds), for Tomcat %D is milliseconds.
> > As long as you are searching for the root cause, it might make sense
> > to have both access logs active to check for duration differences.
> >
> > > So one further question - does mod_jk itself check if the backend is
> > > reachable - without user requests?
> >
> > No. Everything only works on top of user requests.
> >
> > > When there are connections to the backend - are they closed after the
> > > response or are they held open for further requests?
> >
> > In general held open. There are parameters on how long they are held
> > open without more requests before they get shut down, and also how many
> > might be kept open even when no requests are coming in. Those are the
> > connection pool parameters, which you will find on
> >
> > http://tomcat.apache.org/connectors-doc/reference/workers.html
> >
> > Tomcat also has a connectionTimeout on the connector, which will shut
> > down a connection from the Tomcat side if it is idle for too long.
> >
> > If you don't want to reuse connections at all, there's also a setting
> > (a JkOption in Apache).
> >
> > > Is it possible that the Checkpoint Firewall in between can be
> > > responsible for the connectivity problem?
> >
> > It can cut a connection that's idle for too long. Since you have
> > cping/cpong active via connect_timeout and prepost_timeout, you should
> > get a cping error message if the connection was dropped by the firewall
> > during idle times and mod_jk tries to use it again. The reply timeout in
> > the error log indicates that the backend isn't answering. Of course, if
> > it takes *very* long to answer, it might be that the firewall dropped
> > the connection in between, but then the root cause would still be the
> > long response time of the backend.
> >
> > > Another point is the "not recovering" of the worker.
> > > Yes, you are right - in this situation I have many reply_timeouts -
> > > but these happen over a period of time - for example 30 minutes - but
> > > the worker is still dead even when there are no more reply_timeouts.
> > > It remains dead.
> > > It was necessary to restart it manually via jkstatus.
> >
> > I assume you are using stickiness, so when a session started on a node,
> > it will stay there. So when a worker is in error for a long time, all
> > new sessions will start on other nodes. If the worker is ready for
> > recovery, it needs a request that doesn't carry a session, so that it
> > gets probed with this request.
> >
> > In jkstatus, the status of an error worker should switch to REC when
> > mod_jk decides that it could send a non-sticky request there (to probe),
> > to PRB during the time this request is on the node, and finally
> > either to OK or back to ERR depending on the result of the request.
> >
> > You can log the number of errors (and accesses) that happened on the
> > node in the httpd access log. If you think that the node simply stays in
> > error for a long time, then the error count (and access count) should
> > stay constant. I would expect that they do not.
> >
> > Have a look at how LogFormat in Apache httpd works, and then add some of
> > those documented in
> >
> > http://tomcat.apache.org/connectors-doc/reference/apache.html
> >
> > like:
> >
> > JK_LB_LAST_NAME
> > JK_LB_LAST_ACCESSED
> > JK_LB_LAST_ERRORS
> > JK_LB_LAST_BUSY
> > JK_LB_LAST_STATE
> >
> > using the syntax %{JK_LB_LAST_STATE}n etc.
> >
> > > Another point is the learning - I read the docs - the infos on the
> > > Apache website, I didn't find other ones - are there other ones? - and
> > > they are not going in depth - if you read the spec and watch the logs
> > > it is - for me - very hard to match the things. Also the many
> > > possibilities that mod_jk has to prove if there is a connection to the
> > > backend,...
> > > - I understand them, but checking the reality in an error situation
> > > is very hard. By matching I mean "Which part of the communication
> > > sequence failed - why - and causes which error message".
> > > But I will try - and study the mailing list too.
> >
> > It's hard for us too (sometimes).
> >
> > > Thank you for your time - tomorrow we will have the new version and
> > > will see what happens.
> > >
> > > best
> > > ahmed
> >
> > Regards,
> >
> > Rainer
> >
> > > -------- Original Message --------
> > >> Date: Wed, 20 Feb 2008 15:56:42 +0100
> > >> From: Rainer Jung <[EMAIL PROTECTED]>
> > >> To: Tomcat Users List <users@tomcat.apache.org>
> > >> Subject: Re: mod_jk Problems - - worker went to error state and dont recover
> > >>
> > >> [EMAIL PROTECTED] wrote:
> > >>> See Thread at: http://www.techienuggets.com/Detail?tx=25608 Posted on behalf of a User
> > >>>
> > >>> Hello to all, after long unsuccessful research I hope someone can
> > >>> give me a hint on the following problems.
> > >>>
> > >>> Our Apache-mod_jk-Tomcat infrastructure was running without problems
> > >>> for about one year - then, for the last two months, mod_jk errors
> > >>> have occurred. We upgraded the mod_jk version, made improvements in
> > >>> the worker.properties - the problems changed and became fewer, but
> > >>> sometimes they still appear.
> > >>>
> > >>> It seems that the mod_jk workers lose the connection to their
> > >>> Tomcat backend servers - there are messages in the mod_jk log files
> > >>> which point in this direction. Normally this seems not to be a big
> > >>> problem - but under certain conditions (which?) the worker goes to an
> > >>> error state and cannot recover by itself - this must be done manually.
> > >>>
> > >>> Problem 1: The Tomcats are reachable - unknown why the workers think
> > >>> the server is dead?
> > >>> Problem 2: I have no idea why the worker goes to an error state and
> > >>> cannot recover.
> > >>
> > >> 2 is a consequence of 1
> > >>
> > >>> Problem 3: I miss explanations of the logged messages - I read the
> > >>> messages - but cannot match them to the situation - when does a
> > >>> worker post these messages?
> > >>
> > >> 1 is a consequence of these messages
> > >>
> > >>> [Wed Feb 20 10:04:01.889 2008] [19237:3086010048] [info] jk_handler::mod_jk.c (2270): Aborting connection for worker=ajp_ggi
> > >>> [Wed Feb 20 10:04:39.799 2008] [19294:3086010048] [error] ajp_get_reply::jk_ajp_common.c (1623): (INETP1011) Timeout with waiting reply from tomcat. Tomcat is down, stopped or network problems (errno=110)
> > >>> [Wed Feb 20 10:04:39.799 2008] [19294:3086010048] [error] ajp_service::jk_ajp_common.c (2034): (INETP1011) receiving reply from tomcat failed with out recovery in send loop attempt=0
> > >>> [Wed Feb 20 10:04:41.799 2008] [19294:3086010048] [error] service::jk_lb_worker.c (1105): unrecoverable error 504, request failed. Tomcat failed in the middle of request, we can't recover to another instance.
> > >>
> > >> The second line tells us that your configured reply_timeout fired.
> > >> You set it to 120000 (2 minutes), so there are requests taking longer
> > >> than 2 minutes on the backend before the first response packet comes
> > >> back from the backend.
> > >>
> > >> With your configuration mod_jk then doesn't wait any longer for the
> > >> reply *and puts the backend into error mode*.
> > >>
> > >> Up until version 1.2.25, if you use a reply_timeout, you need to set it
> > >> to a high number which justifies the reasoning "if it takes that long,
> > >> then something is wrong with the backend".
> > >>
> > >> Reality shows: there is no such number. Often there are a few requests
> > >> that take unacceptably long on the backend *although* the backend is
> > >> still working.
> > >>
> > >> So in 1.2.25 we added max_reply_timeouts.
> > >> With this set in addition to reply_timeout, mod_jk will abort waiting
> > >> for a reply after reply_timeout, but allow some timeouts before
> > >> actually deciding to put the backend into error.
> > >>
> > >> Unfortunately the implementation of max_reply_timeouts in 1.2.25 was
> > >> wrong, so you need to go to 1.2.26 to get it working right.
> > >>
> > >> See:
> > >>
> > >> http://issues.apache.org/bugzilla/show_bug.cgi?id=43229
> > >>
> > >> Caution: this does *not* explain why the backends are not automatically
> > >> recovered after a minute of error condition. Maybe you have times where
> > >> you get too many of those reply_timeouts (see log file), and although we
> > >> recover after a minute the backend almost immediately goes back into
> > >> error status.
> > >>
> > >>> -> Which timeout - how does mod_jk think Tomcat is down? Where can I
> > >>> find details on errno=110? ...
> > >>
> > >> reply_timeout, see above and also
> > >>
> > >> http://tomcat.apache.org/connectors-doc/generic_howto/timeouts.html
> > >>
> > >> errno: a standard unix feature. The numbers are platform dependent. I
> > >> would assume in your case
> > >>
> > >> ETIMEDOUT 110 /* Connection timed out */
> > >>
> > >> so no wonder, that's exactly what we expect (and doesn't tell us the
> > >> reason, i.e. what's wrong on the *backend* taking that long for a
> > >> response).
> > >>
> > >>> -> receiving reply from tomcat failed with out recovery in send loop
> > >>> attempt=0 - ? with out recovery in send loop - means?
> > >>
> > >> That your configuration doesn't allow us to send the request to another
> > >> backend. recovery_options 7 includes: if mod_jk was able to send the
> > >> request to a backend, do not try to send it to another backend in case
> > >> of an error during the response handling. Even if you would allow
> > >> sending to another backend, it would not help with *not* putting the
> > >> worker into error state.
> > >> More likely, you would put all workers into error state, because all
> > >> of them might run into the same timeout, one after the other.
> > >>
> > >>> -> unrecoverable error 504 - details on this error?
> > >>
> > >> That's simply how we return the situation back to the client (browser).
> > >>
> > >>> Ok - I turned the logging level to debug - the course of events gets
> > >>> clearer - but also more questions appear - there are socket numbers -
> > >>> which sockets - what are these numbers, e.g. "will be shutting down
> > >>> socket 35 for worker INETP1021" - what are the sockets good for? - how
> > >>> many are there per worker? Can I configure them?
> > >>
> > >> Should not be the problem here. For Apache httpd, if you do *not*
> > >> configure anything, we automatically choose the number of httpd threads
> > >> as the maximum number of connections. No need to change anything here.
> > >>
> > >>> => Generally - how can I solve such problems - I tried to look into
> > >>> the mod_jk code - searching for error codes, error messages - but
> > >>> cannot find any relevant information - I am studying the log files -
> > >>> but can't find out what really happens.
> > >>
> > >> Post to the list. Improve our docs.
> > >>
> > >> The error message contains the words "timeout" and "reply" and you have
> > >> a "reply_timeout".
> > >>
> > >> Long running requests are a frequent problem. If you want to get rid of
> > >> them, start by adding response times to your httpd and your tomcat
> > >> access log format (%D). Then have a look at which URLs are producing
> > >> long running requests, during what time of day they are happening etc.
> > >> This might give you a clue about the reasons.
> > >>
> > >> And if they are very frequent: do Java Thread Dumps of your backends
> > >> and analyze them.
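
A minimal sketch of the access-log check suggested above, assuming the duration (%D) has been appended as the last field of a common-format access log line. The sample log lines, the regular expression and the helper name slow_requests are illustrative assumptions, not taken from this thread. Mind the unit difference: httpd writes %D in microseconds, Tomcat in milliseconds, so the divisor has to match the log being read.

```python
import re

# Assumption: common log format with %D appended as the final field, e.g.
#   LogFormat "%h %l %u %t \"%r\" %>s %b %D"
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" '  # request line
    r'(?P<status>\d{3}) \S+ '                     # status and bytes sent
    r'(?P<duration>\d+)$'                         # %D as the last field
)

# Divisors that convert the logged %D value to milliseconds.
HTTPD_UNITS_PER_MS = 1000  # httpd logs %D in microseconds
TOMCAT_UNITS_PER_MS = 1    # Tomcat logs %D in milliseconds

# Hypothetical sample lines in the assumed httpd layout.
SAMPLE_HTTPD_LOG = [
    '10.0.0.1 - - [20/Feb/2008:10:02:39 +0100] "GET /swl/report HTTP/1.1" 200 5120 121500000',
    '10.0.0.2 - - [20/Feb/2008:10:03:01 +0100] "GET /swl/login HTTP/1.1" 200 830 45000',
]


def slow_requests(lines, threshold_ms, units_per_ms):
    """Return (duration_ms, url) pairs at or above threshold_ms, slowest first."""
    hits = []
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        duration_ms = int(match.group("duration")) / units_per_ms
        if duration_ms >= threshold_ms:
            hits.append((duration_ms, match.group("url")))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    # List everything that took a minute or longer on the httpd side.
    for duration_ms, url in slow_requests(SAMPLE_HTTPD_LOG, 60000, HTTPD_UNITS_PER_MS):
        print(f"{duration_ms:.0f} ms  {url}")
```

Running the same function over the Tomcat access log with TOMCAT_UNITS_PER_MS gives the backend view of the same requests, which is one way to compare the two logs for duration differences.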
> > >>
> > >>> So - maybe someone has an idea why the worker thinks that the
> > >>> corresponding Tomcat is dead, and why it will not recover by itself!
> > >>
> > >> Tomcat is dead: from the point of view of mod_jk it simply means: we
> > >> didn't get an answer when we expected one. Details depend on the
> > >> additional log lines (could not connect, reply timeout etc.).
> > >>
> > >>> And I am also searching for tips on how I can help myself - and where
> > >>> to find something about the error codes, messages, ... in mod_jk
> > >>>
> > >>> thanks for your attention
> > >>> Best
> > >>> ahmed musa (writing from vienna)
> > >>
> > >> Regards,
> > >>
> > >> Rainer
> > >>
> > >>> Current infrastructure
> > >>> We have 3 Apache web servers (2.2.6) based on CentOS release 4.3 /
> > >>> kernel version 2.6.9-34.
> > >>> In front of the web servers there are two hardware load balancers
> > >>> (two locations), but they have no role in this story.
> > >>> The web servers are hosted at our ISP.
> > >>>
> > >>> The web servers balance the requests via mod_jk (version 1.2.25) for
> > >>> approx. 10 webapps to 18 backend Tomcat servers (blade servers -
> > >>> because of underlying application parts the OS is Windows 2003 Server -
> > >>> a long story not worth explaining :-) ). The Tomcat servers get data
> > >>> via requests against DB2 servers/DB2 databases on the mainframe. The
> > >>> Tomcat servers are in-house and were rebooted nightly because of
> > >>> automated deployment processes.
> > >>>
> > >>> Between the web servers and the Tomcat servers is a Checkpoint firewall.
> > >>> All webapps are deployed on all Tomcats - only mod_jk directs the
> > >>> requests to certain Tomcat instances.
> > >>> (On one blade server there are two identical Tomcat instances
> > >>> running.)
> > >>>
> > >>> Versions: Tomcat - 5.5.17_11, JDK 1.5.0_11-b03.
> > >>> The requests against the public website(s) are normal short-lived
> > >>> requests - not many. Most webapps (portals) need a login and have a
> > >>> strong focus on business logic - so the instances are big (many MBs
> > >>> in RAM), the sessions are sticky and the session timeout is 20
> > >>> minutes. But there are also fewer requests. Monitoring requests from
> > >>> our ISP are added to the user requests.
> > >>> The problems appear at servers/portals which have very few user requests.
> > >>>
> > >>> worker.properties
> > >>> worker.list=ajp_bam,ajp_ggi,ajp_ad,ajp_svp,.......,jkstatus
> > >>>
> > >>> worker.template.type=ajp13
> > >>> worker.template.lbfactor=5
> > >>> worker.template.socket_keepalive=1
> > >>> worker.template.connect_timeout=7000
> > >>> worker.template.prepost_timeout=5000
> > >>> worker.template.reply_timeout=120000
> > >>> worker.template.retries=6
> > >>> worker.template.activation=Active
> > >>> worker.template.recovery_options=7
> > >>>
> > >>> worker.lbtemplate.type=lb
> > >>> worker.lbtemplate.max_reply_timeouts=6
> > >>> worker.lbtemplate.method=Session
> > >>>
> > >>> #Production Worker
> > >>> # AS-INETP101 - 106 - 6/6 GGI
> > >>> worker.INETP1011.host=AS-INETP101.AEAT.ALLIANZ.AT
> > >>> worker.INETP1011.port=65001
> > >>> worker.INETP1011.reference=worker.template
> > >>>
> > >>> ....many more of the same
> > >>>
> > >>> then
> > >>>
> > >>> worker.ajp_ad.reference=worker.lbtemplate
> > >>> worker.ajp_ad.balance_workers=INETP1032,INETP1062
> > >>>
> > >>> .... many more portals
> > >>>
> > >>> at last jkstatus
> > >>>
> > >>> The JkMount is very simple
> > >>> JkMount /* ajp_ad --- for the other portals mostly the same
> > >>>
> > >>> The portals are virtual hosts on the Apache.
> > >>>
> > >>> Tomcat - server.xml
> > >>> example
> > >>> <Connector port="65001" maxThreads="300" protocol="AJP/1.3" />
> > >>> <Engine name="Catalina" jvmRoute="INETP5021" defaultHost="default">
> > >>> ......
> > >>> <Host name="slfinsol.com" appBase="webapps" unpackWARs="true"
> > >>> autoDeploy="false" deployOnStartup="false" xmlValidation="false"
> > >>> xmlNamespaceAware="false">
> > >>> <Alias>www.slfinsol.com</Alias>
> > >>> <Alias>web1.slfinsol.com</Alias>
> > >>> ...
> > >>> <Alias>testweb.slfinsol.com</Alias>
> > >>> .....
> > >>> <Valve className="org.apache.catalina.valves.AccessLogValve"
> > >>> directory="logs" prefix="swl_access_log." suffix=".txt" pattern="common"
> > >>> resolveHosts="false" />
> > >>> <Valve className="at.allianz.tomcat.valve.RequestTimeValve"/>
> > >>> <Valve className="at.allianz.tomcat.valve.WebcollaborationWorkaroundValve"/>
> > >>> <Context path="" docBase="swl" />
> > >>> <Context path="/monitor5" docBase="monitor" />
> > >>> <Context path="/swl" docBase="swl" />
> > >>> </Host>
> >
> > ---------------------------------------------------------------------
> > To start a new topic, e-mail: users@tomcat.apache.org
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]