Re: URGENT HELP: James 2.3.2 not responding after few days of run

2015-03-31 Thread Mahesh Sivarama Pillai
Hi Bernd,

Here is some more information. As per the latest information, the server was
not killed. Our support team restarts the server whenever we get a
Connection Refused error from port 25. We have a monitoring tool which
connects to the James server every minute and issues a QUIT command. This
monitoring tool is getting the Connection Refused error. Hence the team
thought the server was down and followed the routine Stop and Start commands.
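
To tell a refused connection apart from a server that accepts the connection but never answers, the check could record exactly what it saw. A minimal sketch, assuming bash with /dev/tcp redirection is available; smtp_probe is a hypothetical helper, not part of James or of the monitoring tool mentioned above:

```shell
#!/usr/bin/env bash
# Hypothetical probe: connect to an SMTP port, read the greeting, send QUIT.
# Usage: smtp_probe HOST PORT
# Returns non-zero if the connection fails or no greeting arrives in time.
smtp_probe() {
    local host=$1 port=$2 greeting reply
    # Run the dialogue in a subshell so descriptor 3 is closed automatically.
    (
        # Opening /dev/tcp/HOST/PORT is a bash feature; the redirection
        # fails when the connection is refused.
        exec 3<>"/dev/tcp/$host/$port" || exit 2
        # Wait up to 10 seconds for the server greeting.
        IFS= read -r -t 10 greeting <&3 || exit 3
        printf 'QUIT\r\n' >&3
        # The closing reply is optional for this check.
        IFS= read -r -t 10 reply <&3 || reply=''
        printf '%s greeting=%s reply=%s\n' \
            "$(date '+%Y-%m-%d %H:%M:%S')" "${greeting%$'\r'}" "${reply%$'\r'}"
    )
}

# Example: smtp_probe localhost 25 || echo "probe failed with status $?"
```

Logging the status together with a time stamp on every run makes it easier to line up a failed probe with the James logs afterwards.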

I have the following configurations in various places in the config.xml.

SMTP Server (the only process enabled):
<connectiontimeout>36</connectiontimeout>

Spool Manager: <threads> 10 </threads>

Connections Manager

<connections>
  <idle-timeout>30</idle-timeout>
  <max-connections>30</max-connections>
</connections>

Thread Manager

<thread-manager>
  <thread-group>
    <name>default</name>
    <priority>5</priority>
    <is-daemon>false</is-daemon>
    <max-threads>100</max-threads>
    <min-threads>20</min-threads>
    <min-spare-threads>20</min-spare-threads>
  </thread-group>
</thread-manager>

The total number of threads (spool + remote delivery etc.) is well under
100. We don't have any DB configuration in config.xml either. Do you
think the timeout values might cause the Connection Refused errors,
especially the idle-timeout? Isn't 5 minutes too high? If, say, 30 clients
each take a few minutes, that would be more than enough to raise an alert
from the monitoring tool.
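
One way to check this theory would be to count the established connections on port 25 when the alert fires and compare the number with max-connections. A rough sketch, assuming Linux `netstat -tn` output (local address in column 4, state in column 6); count_established is a hypothetical helper:

```shell
# Count ESTABLISHED TCP connections whose local port equals $1.
# Reads `netstat -tn` style output on stdin and prints the count.
count_established() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" && $4 ~ (":" port "$") { n++ }
        END { print n + 0 }
    '
}

# Example: netstat -tn | count_established 25
```

If the count sits at or near the configured max-connections value whenever the monitor complains, the limit and the timeouts are worth a closer look; if it stays low, the cause is probably elsewhere.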

Thanks
Mahesh

On Tue, Mar 31, 2015 at 11:16 AM, Mahesh Sivarama Pillai srm...@gmail.com
wrote:

 Hi Bernd,

  Our Sys Admin has NOT performed the following things while configuring
 james as a service.

 1. Adding the below lines in phoenix.sh

 #chkconfig: 2345 80 05
 #description: James Mail Server

 2. Chkconfig command

 chkconfig --add james


 They only created the link in /etc/init.d pointing to phoenix.sh. We can
 start and stop the service using the service command. Do you think that
 skipping the above two steps will impact a running James in any manner? I am
 trying to understand the run levels as well.

 Thanks
 Mahesh



 On Mon, Mar 30, 2015 at 5:28 PM, Mahesh Sivarama Pillai srm...@gmail.com
 wrote:

 If there is a clean shutdown through RemoteManager, it should be shown in
 the log, right? The thing is, I don't see any entry in the console log which
 says STOPPED. I am investigating and will keep you posted. Thanks for the
 help so far.

 Thanks
 Mahesh

 On Mon, Mar 30, 2015 at 2:48 AM, Bernd Waibel bwai...@intarsys.de
 wrote:

 Hi Mahesh

 Finding an hs_err file would be a clear sign that something happened outside
 the VM.
 E.g. if you load a DLL or native library from your Java code and it produces
 a memory fault, then the VM may crash.
 If an hs_err file is produced, the VM has crashed without writing a log entry
 or anything else. The log just ends.
 Not finding an hs_err file means you need to look for something else.
 So I think it is not a crash.
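
The HotSpot JVM names its fatal error log hs_err_pid<PID>.log and normally writes it to the working directory of the process, falling back to the temp directory. A small sketch for searching a few candidate directories; find_hs_err is a hypothetical helper and the paths in the example are assumptions:

```shell
# List HotSpot fatal error logs (hs_err_pid<PID>.log) found in the given
# directories, looking at most two levels deep.
find_hs_err() {
    find "$@" -maxdepth 2 -type f -name 'hs_err_pid*.log' 2>/dev/null
}

# Example (/opt/james is a hypothetical install path):
# find_hs_err /tmp /opt/james
```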

 Another idea:
 In config.xml you can configure a RemoteManager port and user.
 I am currently on holiday, so I cannot look up the syntax.
 You can telnet to that port and send a shutdown command.
 Could something simple like that have happened?

 And about chkconfig:
 We had a system with James configured to run only in the runlevel with GUI
 (I think it was 5 or 6).
 Then a sysadmin switched the system to run without GUI.
 The switch to another runlevel simply stopped James, with a clean
 shutdown.
 After that we checked the runlevels carefully.
 James needs to start after the network, and after the database if one is used.
 And it should also stop accordingly.

 Greetings Bernd


  Original message 
 From: Mahesh Sivarama Pillai srm...@gmail.com
 Date: 29.03.2015 07:58 (GMT+01:00)
 To: James Users List server-user@james.apache.org
 Subject: Re: URGENT HELP: James 2.3.2 not responding after few days of
 run

 Thanks again Bernd. I couldn't find the hs_err files under the temp or
 James directories. Considering we faced the Too Many Open Files issue, would
 that prevent the JVM from creating this file? I am clueless on this
 issue.
 No process killed James, no one stopped James, no OOM in the logs, no core
 dump :) :(
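
Since Too Many Open Files came up, it may help to watch how close the James process gets to its descriptor limit. A sketch assuming a Linux /proc file system; fd_usage is a hypothetical helper and must be run as the user owning the process (or as root):

```shell
# Print the number of open file descriptors of a process and its soft limit.
# Usage: fd_usage PID
fd_usage() {
    local pid=$1 used limit
    # Each entry in /proc/PID/fd is one open descriptor.
    used=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    # Fourth field of the "Max open files" line is the soft limit.
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    printf 'open=%s soft_limit=%s\n' "$((used))" "$limit"
}

# Example: fd_usage "$(pgrep -f james | head -n 1)"
```

The pgrep pattern in the example is only a guess at how the process shows up in the process list.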

 Regarding the file system I will verify. As far as I know we have a
 NAS...

 On Sat, Mar 28, 2015 at 3:50 AM, Bernd Waibel bwai...@intarsys.de
 wrote:

  Hi Mahesh,
 
  Don't misunderstand: running out of file handles COULD lead to a memory leak,
  consuming memory over time. But it does not NEED to.
 
  OOMs are normally shown in the log, as far as I know, but we have seen this
  only for the heap memory.
  OOMs normally happen when the heap memory reaches its limit, and yes, we
  got this in the logs sometimes.
  Every time I got an OOM in the log, I restarted the server, just to be sure
  it keeps running.
  So I do not have long-running servers with a lot of OOM errors, and thus no
  experience with that.
 
  But you could also run short of memory for the Java classes (native area,
  method area), and I am not sure if this 

Re: URGENT HELP: James 2.3.2 not responding after few days of run

2015-03-31 Thread Bernd Waibel
Hi Mahesh

I am currently on holiday, so I cannot check this on a Linux machine.

chkconfig --add will add scripts for startup AND shutdown, in a defined 
order and in the defined runlevels.
Not having this means the service has to be started and stopped by hand.
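
As far as I know, chkconfig reads the runlevels and the start/stop priorities from two header comments in the init script, "chkconfig:" and "description:", each on its own line. A quick sketch to check whether a script carries both; has_chkconfig_header is a hypothetical helper and the pattern is only a heuristic:

```shell
# Return success if FILE contains both header comments that chkconfig reads,
# each at the start of its own line.
has_chkconfig_header() {
    grep -Eq '^#[[:space:]]*chkconfig:[[:space:]]*[-0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9]+' "$1" &&
    grep -Eq '^#[[:space:]]*description:' "$1"
}

# Example:
# has_chkconfig_header /etc/init.d/james && chkconfig --add james
# chkconfig --list james
```

With the values 2345 80 05, the service would be enabled in runlevels 2 to 5, with start priority 80 and stop priority 05.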

And the process may just be killed when rebooting. This MAY result in nothing 
being logged on shutdown.
If you reboot the server, the log may just end and the process will die. It will 
not be started again.

That sounds just like your description, doesn't it?

Greetings
Bernd


 Original message 
From: Mahesh Sivarama Pillai srm...@gmail.com
Date: 31.03.2015 07:48 (GMT+01:00)
To: James Users List server-user@james.apache.org
Subject: Re: URGENT HELP: James 2.3.2 not responding after few days of run

Hi Bernd,

 Our Sys Admin has NOT performed the following things while configuring
james as a service.

1. Adding the below lines in phoenix.sh

#chkconfig: 2345 80 05
#description: James Mail Server

2. Chkconfig command

chkconfig --add james


They only created the link in /etc/init.d pointing to phoenix.sh. We can
start and stop the service using the service command. Do you think that
skipping the above two steps will impact a running James in any manner? I am
trying to understand the run levels as well.

Thanks
Mahesh



On Mon, Mar 30, 2015 at 5:28 PM, Mahesh Sivarama Pillai srm...@gmail.com
wrote:

 If there is a clean shutdown through RemoteManager, it should be shown in
 the log, right? The thing is, I don't see any entry in the console log which
 says STOPPED. I am investigating and will keep you posted. Thanks for the
 help so far.

 Thanks
 Mahesh

 On Mon, Mar 30, 2015 at 2:48 AM, Bernd Waibel bwai...@intarsys.de wrote:

 Hi Mahesh

 Finding an hs_err file would be a clear sign that something happened outside
 the VM.
 E.g. if you load a DLL or native library from your Java code and it produces
 a memory fault, then the VM may crash.
 If an hs_err file is produced, the VM has crashed without writing a log entry
 or anything else. The log just ends.
 Not finding an hs_err file means you need to look for something else.
 So I think it is not a crash.

 Another idea:
 In config.xml you can configure a RemoteManager port and user.
 I am currently on holiday, so I cannot look up the syntax.
 You can telnet to that port and send a shutdown command.
 Could something simple like that have happened?

 And about chkconfig:
 We had a system with James configured to run only in the runlevel with GUI
 (I think it was 5 or 6).
 Then a sysadmin switched the system to run without GUI.
 The switch to another runlevel simply stopped James, with a clean
 shutdown.
 After that we checked the runlevels carefully.
 James needs to start after the network, and after the database if one is used.
 And it should also stop accordingly.

 Greetings Bernd


  Original message 
 From: Mahesh Sivarama Pillai srm...@gmail.com
 Date: 29.03.2015 07:58 (GMT+01:00)
 To: James Users List server-user@james.apache.org
 Subject: Re: URGENT HELP: James 2.3.2 not responding after few days of run

 Thanks again Bernd. I couldn't find the hs_err files under the temp or
 James directories. Considering we faced the Too Many Open Files issue, would
 that prevent the JVM from creating this file? I am clueless on this issue.
 No process killed James, no one stopped James, no OOM in the logs, no core
 dump :) :(

 Regarding the file system I will verify. As far as I know we have a NAS...

 On Sat, Mar 28, 2015 at 3:50 AM, Bernd Waibel bwai...@intarsys.de
 wrote:

  Hi Mahesh,
 
  Don't misunderstand: running out of file handles COULD lead to a memory leak,
  consuming memory over time. But it does not NEED to.
 
  OOMs are normally shown in the log, as far as I know, but we have seen this
  only for the heap memory.
  OOMs normally happen when the heap memory reaches its limit, and yes, we
  got this in the logs sometimes.
  Every time I got an OOM in the log, I restarted the server, just to be sure
  it keeps running.
  So I do not have long-running servers with a lot of OOM errors, and thus no
  experience with that.
 
  But you could also run short of memory for the Java classes (native area,
  method area), and I am not sure if this will show up in the log. I never
  had this with James. I got this when running JIRA long ago, but cannot
  remember the log.
 
  The PID (process ID) is handled by the Linux system; it is
  outside James, and I think you won't find it in the log.
  But the PID is created on startup (phoenix.sh), and may be logged by the
  shell script somewhere, together with a time stamp.
  But not in the James logs.
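
If the start script does not record the PID yet, something like the following could be added to it. This is only a sketch: log_pid is a hypothetical helper, the log path is an assumption, and where exactly the JVM is launched inside phoenix.sh would need to be checked first:

```shell
# Append "<timestamp> started pid=<PID>" to a log file.
# Usage: log_pid PID LOGFILE
log_pid() {
    printf '%s started pid=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" >> "$2"
}

# Example, right after the JVM has been launched in the background:
# log_pid "$!" /var/log/james-pid.log
```

Comparing these entries with the times of the alerts would show whether the process really was restarted, and when.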
 
  If your sysadmins use a monitoring tool (like Nagios or Icinga), they may
  monitor the memory.
  You could also monitor the memory inside the VM using JMX, but this is a
  little bit hard to set up.
 
  But anyway: the memory may NOT be the problem. So do not spend too much
  time on that.
 
  If you can find an hs_err_pid* file, the file will tell the reason for