Re: URGENT HELP: James 2.3.2 not responding after few days of run
Hi Bernd,

Here is some more information. According to the latest findings, the server was not killed. Our support team restarts the server whenever we get a Connection Refused error on port 25. We have a monitoring tool that connects to the James server every minute and issues a QUIT command. This monitoring tool is getting the Connection Refused error, so the team assumed the server was down and followed the routine Stop and Start commands.

I have the following settings in various places in config.xml.

SMTP Server (the only process enabled):

    <connectiontimeout>36</connectiontimeout>

Spool Manager:

    <threads> 10 </threads>

Connections Manager:

    <connections>
      <idle-timeout>30</idle-timeout>
      <max-connections>30</max-connections>
    </connections>

Thread Manager:

    <thread-manager>
      <thread-group>
        <name>default</name>
        <priority>5</priority>
        <is-daemon>false</is-daemon>
        <max-threads>100</max-threads>
        <min-threads>20</min-threads>
        <min-spare-threads>20</min-spare-threads>
      </thread-group>
    </thread-manager>

The total number of threads (spool + remote delivery etc.) is well under 100. We don't have any DB configuration in config.xml either.

Do you think the timeout values might cause the Connection Refused errors? Especially the idle-timeout? Isn't 5 minutes too high? If, say, 30 clients each take a few minutes, that would be more than enough to raise an alert from the monitoring tool.

Thanks
Mahesh

On Tue, Mar 31, 2015 at 11:16 AM, Mahesh Sivarama Pillai srm...@gmail.com wrote:

Hi Bernd,

Our sysadmin has NOT performed the following steps while configuring James as a service:

1. Adding the lines below to phoenix.sh:

    #chkconfig: 2345 80 05
    #description: James Mail Server

2. Running the chkconfig command:

    chkconfig --add james

They only created the link in /etc/init.d pointing to phoenix.sh. We can start and stop the service using the service command. Do you think skipping the above two steps will affect a running James in any way? I am trying to understand the run levels as well.
Thanks
Mahesh

On Mon, Mar 30, 2015 at 5:28 PM, Mahesh Sivarama Pillai srm...@gmail.com wrote:

If there is a clean shutdown through the RemoteManager, it should show up in the log, right? The thing is, I don't see any entry in the console log that says STOPPED. I am investigating and will keep you posted. Thanks for the help so far.

Thanks
Mahesh

On Mon, Mar 30, 2015 at 2:48 AM, Bernd Waibel bwai...@intarsys.de wrote:

Hi Mahesh,

Finding an hserr file would be a clear sign that something happened outside the VM. E.g. if you load a DLL or lib inside your Java code and it produces a memory fault, then the VM may crash. If an hserr file is produced, the VM has crashed without writing a log or anything else. The log just ends. Not finding an hserr file means you need to look for something else. So I think it is not a crash.

Another idea: in config.xml you can configure a RemoteManager port and user. I am currently on holiday, so I cannot look up the syntax. You could telnet to that port and send a shutdown command. Could something as simple as that have happened?

And about chkconfig: we had a system with James configured to run only in the runlevel with GUI (I think it was 5 or 6). Then a sysadmin switched the system to run without GUI, so the switch to another runlevel just stopped James, with a clean shutdown. After that we looked at the runlevels carefully. James needs to start after the network, and after the database if one is used. It should also stop in a defined order.

Greetings
Bernd

Original message
From: Mahesh Sivarama Pillai srm...@gmail.com
Date: 29.03.2015 07:58 (GMT+01:00)
To: James Users List server-user@james.apache.org
Subject: Re: URGENT HELP: James 2.3.2 not responding after few days of run

Thanks again Bernd...

I couldn't find the hserr files under the temp or James directories. Considering we hit the Too Many Open Files issue, could that prevent the JVM from creating this file? I am clueless on this issue. No process killed James. No one stopped James. No OOM in the logs.
No core dump :) :(

Regarding the file system, I will verify. As far as I know we have a NAS...

On Sat, Mar 28, 2015 at 3:50 AM, Bernd Waibel bwai...@intarsys.de wrote:

Hi Mahesh,

Don't misunderstand: running out of file handles COULD lead to a memory leak that consumes memory over time, but it does not NEED to. As far as I know, OOMs are normally shown in the log, but we have only seen this for heap memory. OOMs normally happen when the heap reaches its limit, and yes, we did get this in the logs sometimes. Every time I got an OOM in the log, I restarted the server, just to be sure it kept running. So I do not have long-running servers with a lot of OOM errors, and therefore no experience with that.

But you could also run short of memory for the Java classes (native area, method area), and I am not sure if this
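For reference, the monitoring check described at the top of this message (connect to port 25 once a minute, issue QUIT) could be sketched roughly as below. This is only an illustration in Python, not the tool actually in use: the function name `probe_smtp` and its return labels are hypothetical. Reporting "refused" separately from "timeout" and from a missing 220 greeting may help tell a process that is not listening at all from one that is listening but slow to answer.

```python
import socket


def probe_smtp(host, port=25, timeout=10.0):
    """Connect, read the SMTP greeting, send QUIT, and classify the outcome.

    Hypothetical helper for illustration; the return labels are made up.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # A healthy SMTP server greets with a line starting with "220".
            greeting = sock.recv(1024).decode("ascii", "replace")
            sock.sendall(b"QUIT\r\n")
            if greeting.startswith("220"):
                return "ok"
            return "unexpected greeting: " + greeting.strip()
    except ConnectionRefusedError:
        # The TCP connect itself was rejected: nothing accepted the connection.
        return "refused"
    except socket.timeout:
        # Connected (or tried to) but got no answer within the time limit.
        return "timeout"
    except OSError as exc:
        return "error: " + str(exc)
```

Logging the returned label together with a timestamp on every run would show whether the refusals come in bursts or persist until the restart.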
Re: URGENT HELP: James 2.3.2 not responding after few days of run
Hi Mahesh,

I am currently on holiday, so I cannot check this on a Linux machine. The chkconfig add will add scripts for startup AND shutdown, in a defined order and for the defined runlevels. Without it, the service has to be started and stopped by hand, and the process may simply be killed when the machine reboots. This MAY result in nothing being logged on shutdown. If you reboot the server, the log may just end and the process will die. It will not be started again. That sounds just like your description, doesn't it?

Greetings
Bernd

Original message
From: Mahesh Sivarama Pillai srm...@gmail.com
Date: 31.03.2015 07:48 (GMT+01:00)
To: James Users List server-user@james.apache.org
Subject: Re: URGENT HELP: James 2.3.2 not responding after few days of run

Hi Bernd,

Our sysadmin has NOT performed the following steps while configuring James as a service:

1. Adding the lines below to phoenix.sh:

    #chkconfig: 2345 80 05
    #description: James Mail Server

2. Running the chkconfig command:

    chkconfig --add james

They only created the link in /etc/init.d pointing to phoenix.sh. We can start and stop the service using the service command. Do you think skipping the above two steps will affect a running James in any way? I am trying to understand the run levels as well.

Thanks
Mahesh

On Mon, Mar 30, 2015 at 5:28 PM, Mahesh Sivarama Pillai srm...@gmail.com wrote:

If there is a clean shutdown through the RemoteManager, it should show up in the log, right? The thing is, I don't see any entry in the console log that says STOPPED. I am investigating and will keep you posted. Thanks for the help so far.

Thanks
Mahesh

On Mon, Mar 30, 2015 at 2:48 AM, Bernd Waibel bwai...@intarsys.de wrote:

Hi Mahesh,

Finding an hserr file would be a clear sign that something happened outside the VM. E.g. if you load a DLL or lib inside your Java code and it produces a memory fault, then the VM may crash. If an hserr file is produced, the VM has crashed without writing a log or anything else. The log just ends.
Not finding an hserr file means you need to look for something else. So I think it is not a crash.

Another idea: in config.xml you can configure a RemoteManager port and user. I am currently on holiday, so I cannot look up the syntax. You could telnet to that port and send a shutdown command. Could something as simple as that have happened?

And about chkconfig: we had a system with James configured to run only in the runlevel with GUI (I think it was 5 or 6). Then a sysadmin switched the system to run without GUI, so the switch to another runlevel just stopped James, with a clean shutdown. After that we looked at the runlevels carefully. James needs to start after the network, and after the database if one is used. It should also stop in a defined order.

Greetings
Bernd

Original message
From: Mahesh Sivarama Pillai srm...@gmail.com
Date: 29.03.2015 07:58 (GMT+01:00)
To: James Users List server-user@james.apache.org
Subject: Re: URGENT HELP: James 2.3.2 not responding after few days of run

Thanks again Bernd...

I couldn't find the hserr files under the temp or James directories. Considering we hit the Too Many Open Files issue, could that prevent the JVM from creating this file? I am clueless on this issue. No process killed James. No one stopped James. No OOM in the logs. No core dump :) :(

Regarding the file system, I will verify. As far as I know we have a NAS...

On Sat, Mar 28, 2015 at 3:50 AM, Bernd Waibel bwai...@intarsys.de wrote:

Hi Mahesh,

Don't misunderstand: running out of file handles COULD lead to a memory leak that consumes memory over time, but it does not NEED to. As far as I know, OOMs are normally shown in the log, but we have only seen this for heap memory. OOMs normally happen when the heap reaches its limit, and yes, we did get this in the logs sometimes. Every time I got an OOM in the log, I restarted the server, just to be sure it kept running. So I do not have long-running servers with a lot of OOM errors, and therefore no experience with that.
But you could also run short of memory for the Java classes (native area, method area), and I am not sure if this will show up in the log. I never had this with James. I got this when running JIRA long ago, but I cannot remember what the log looked like.

The PID (process ID) is handled by the Linux system. It is outside James, and I think you won't find it in the log. The PID is created on startup (phoenix.sh) and may be logged somewhere by the shell script, together with a time stamp, but not in the James logs.

If your sysadmins use a monitoring tool (like Nagios or Icinga), they may be monitoring the memory. You could also monitor the memory inside the VM using JMX, but this is a little hard to set up. But anyway: the memory may NOT be the problem, so do not spend too much time on that.

If you could find a hserr*.pid file, the file will tell the reason for