Re: URGENT HELP: James 2.3.2 not responding after few days of run

Mahesh Sivarama Pillai Tue, 31 Mar 2015 07:19:12 -0700

Hi Bernd,

Here is some more information. According to the latest findings, the server
was not killed. Our support team restarts the server whenever we get a
"Connection Refused" error from port 25. We have a monitoring tool which
connects to the James server every minute and issues a QUIT command. This
monitoring tool was getting the Connection Refused error, so the team assumed
the server was down and followed the routine stop and start commands.
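
To tell a refused connection apart from a slow or hung server, a probe along
the following lines might help. It is only a minimal sketch in Python, not our
actual monitoring tool: the function name probe_smtp, the status strings, and
the 10-second timeout are assumptions made for illustration. It connects,
reads the greeting, and sends QUIT as described above, then reports whether
the connect was refused, timed out, or answered with a 220 banner.

```python
import socket


def probe_smtp(host, port=25, timeout=10.0):
    """Run one connect / read banner / QUIT cycle and classify the outcome.

    Returns a (status, detail) tuple. The status strings are hypothetical
    labels chosen for this sketch, not anything defined by James.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # An SMTP server is expected to greet with a line starting "220".
            banner = sock.recv(512).decode("ascii", "replace").strip()
            if not banner.startswith("220"):
                return ("unexpected_banner", banner)
            sock.sendall(b"QUIT\r\n")
            return ("ok", banner)
    except ConnectionRefusedError as exc:
        # The connection attempt was actively rejected.
        return ("refused", str(exc))
    except socket.timeout as exc:
        # The connect or the banner read did not finish within the timeout.
        return ("timeout", str(exc))
    except OSError as exc:
        # Any other socket-level failure (reset, unreachable host, ...).
        return ("error", str(exc))
```

Logging the returned status every minute next to the James logs would show
whether the alerts are real refusals or just slow responses.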


I have the following configurations in various places in the config.xml.

SMTP Server (the only process enabled):
<connectiontimeout>360000</connectiontimeout>

Spool Manager: <threads> 10 </threads>

Connections Manager

   <connections>
      <idle-timeout>300000</idle-timeout>
      <max-connections>30</max-connections>
   </connections>

Thread Manager

   <thread-manager>
      <thread-group>
         <name>default</name>
         <priority>5</priority>
         <is-daemon>false</is-daemon>
         <max-threads>100</max-threads>
         <min-threads>20</min-threads>
         <min-spare-threads>20</min-spare-threads>
      </thread-group>
   </thread-manager>

The total number of threads (spool, remote delivery, etc.) is well under
100. We don't have any DB configuration in config.xml either. Do you
think the timeout values might cause the Connection Refused errors,
especially the idle-timeout? Isn't 5 minutes too high? If, say, 30 clients
each hold a connection for a few minutes, all 30 max-connections are in use,
which would be more than enough to raise an alert from the monitoring tool.
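
As a rough way to reason about this question, the average number of
simultaneously open connections is approximately the arrival rate multiplied
by the average time each connection is held (Little's law). The sketch below
is plain arithmetic in Python. It assumes that the roughly 100K emails a day
reported in the original message of this thread arrive evenly over 24 hours,
one message per SMTP connection; both are simplifications, and the function
names are made up for illustration. It says nothing about how James behaves
once max-connections is reached.

```python
SECONDS_PER_DAY = 86400.0


def average_concurrency(messages_per_day, hold_seconds):
    """Average number of open connections for a given per-connection hold time.

    Assumes evenly spread arrivals and one message per connection.
    """
    arrival_rate = messages_per_day / SECONDS_PER_DAY  # connections per second
    return arrival_rate * hold_seconds


def saturating_hold_seconds(max_connections, messages_per_day):
    """Average hold time at which all connection slots are busy on average."""
    arrival_rate = messages_per_day / SECONDS_PER_DAY
    return max_connections / arrival_rate


if __name__ == "__main__":
    print(saturating_hold_seconds(30, 100_000))
    print(average_concurrency(100_000, 300))
```

Under these assumptions, an average hold time of about 26 seconds would
already keep 30 connections busy, and clients idling for the full 300-second
idle-timeout would need far more than 30 slots.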

Thanks
Mahesh

On Tue, Mar 31, 2015 at 11:16 AM, Mahesh Sivarama Pillai <srm...@gmail.com>
wrote:

> Hi Bernd,
>
>  Our sysadmin did NOT perform the following steps while configuring
> James as a service.
>
> 1. Adding the lines below to phoenix.sh
>
> #chkconfig: 2345 80 05
> #description: James Mail Server
>
> 2. Running the chkconfig command
>
> chkconfig --add james
>
>
> They only created the link in /etc/init.d pointing to phoenix.sh. We can
> start and stop the service using the service command. Do you think that
> skipping the two steps above will affect a running James in any way? I am
> trying to understand the runlevels as well.
>
> Thanks
> Mahesh
>
>
>
> On Mon, Mar 30, 2015 at 5:28 PM, Mahesh Sivarama Pillai <srm...@gmail.com>
> wrote:
>
>> If there is a clean shutdown through RemoteManager, it should be shown in
>> the log, right? The thing is, I don't see any entry in the console log that
>> says STOPPED. I am investigating and will keep you posted. Thanks for the
>> help so far.
>>
>> Thanks
>> Mahesh
>>
>> On Mon, Mar 30, 2015 at 2:48 AM, Bernd Waibel <bwai...@intarsys.de>
>> wrote:
>>
>>> Hi Mahesh
>>>
>>> finding an hs_err file would be a clear sign that something happened
>>> outside the VM.
>>> E.g. if you load a DLL or native library inside your Java code and it
>>> produces a memory fault, then the VM may crash.
>>> If an hs_err file is produced, the VM has crashed without writing anything
>>> to the log. The log just ends.
>>> Not finding an hs_err file means you need to look for something else.
>>> So I think it is not a crash.
>>>
>>> Another idea:
>>> In config.xml you can configure a RemoteManager port and user.
>>> I am currently on holiday, so I cannot look up the syntax.
>>> Anyone could telnet to that port and send a shutdown command.
>>> Could something simple like that have happened?
>>>
>>> And about chkconfig:
>>> We had a system with James configured to run only in the runlevel "with
>>> GUI" (I think it was 5 or 6).
>>> Then a sysadmin switched the system to run "without GUI".
>>> The switch to another runlevel simply stopped James, with a clean
>>> shutdown.
>>> After that we checked the runlevels carefully.
>>> James needs to start after the network, and after the database if one is
>>> used. And it should stop in the matching order.
>>>
>>> Greetings Bernd
>>>
>>>
>>> -------- Original Message --------
>>> From: Mahesh Sivarama Pillai <srm...@gmail.com>
>>> Date: 29.03.2015 07:58 (GMT+01:00)
>>> To: James Users List <server-user@james.apache.org>
>>> Subject: Re: URGENT HELP: James 2.3.2 not responding after few days of
>>> run
>>>
>>> Thanks again, Bernd. I couldn't find the hserr files under the temp or
>>> James directories. Considering we hit the Too Many Open Files issue, could
>>> that have prevented the JVM from creating this file? I am clueless on this
>>> issue.
>>> No process killed James, no one stopped James. No OOM in the logs. No core
>>> dump :) :(
>>>
>>> Regarding the file system, I will verify. As far as I know we have a
>>> NAS...
>>>
>>> On Sat, Mar 28, 2015 at 3:50 AM, Bernd Waibel <bwai...@intarsys.de>
>>> wrote:
>>>
>>> > Hi Mahesh,
>>> >
>>> > Don't misunderstand: running out of file handles COULD lead to a memory
>>> > leak, consuming memory over time, but it does not NEED to.
>>> >
>>> > OOMs are normally shown in the log, as far as I know, but we have only
>>> > seen this for heap memory.
>>> > OOMs normally happen when the heap reaches its limit, and yes, we
>>> > sometimes got this in the logs.
>>> > Every time I got an OOM in the log, I restarted the server, just to be
>>> > sure it keeps running.
>>> > So I do not have long-running servers with a lot of OOM errors, and
>>> > therefore no experience with that.
>>> >
>>> > But you could also run short of memory for the Java classes (native
>>> > area, method area), and I am not sure whether this would show up in the
>>> > log. I never had this with James. I got this when running JIRA long ago,
>>> > but cannot remember what the log looked like.
>>> >
>>> > The PID (process ID) is handled by the Linux system. It is
>>> > outside James, and I think you won't find it in the log.
>>> > The PID is created on startup (phoenix.sh) and may be logged by the
>>> > shell script somewhere, together with a timestamp,
>>> > but not in the James logs.
>>> >
>>> > If your sysadmins use a monitoring tool (like Nagios or Icinga), they
>>> > may already monitor the memory.
>>> > You could also monitor the memory inside the VM using JMX, but this is
>>> > a little hard to set up.
>>> >
>>> > But anyway: the memory may NOT be the problem, so do not spend too much
>>> > time on that.
>>> >
>>> > If you can find an hs_err_pid*.log file, it will tell you the reason
>>> > for the "crash".
>>> >
>>> >
>>> > There is something else I remember, but with different software.
>>> > If the log file is stored on a file server (not a local directory) and
>>> > the file server reboots, you will lose the log.
>>> > We had a Java process which "died" because the file server was
>>> > rebooted at midnight and the Java process lost all its mounted
>>> > directories.
>>> > After that we made sure that the log directory is always local, and the
>>> > program directory too.
>>> > You may want to check whether your server uses mounted file systems.
>>> >
>>> >
>>> > Greetings
>>> > Bernd
>>> >
>>> > -----Original Message-----
>>> > From: Mahesh Sivarama Pillai [mailto:srm...@gmail.com]
>>> > Sent: Friday, 27 March 2015 15:17
>>> > To: James Users List
>>> > Subject: Re: URGENT HELP: James 2.3.2 not responding after few days of
>>> > run
>>> >
>>> > Hi Bernd,
>>> >
>>> >  Thanks for the pointers. Let me ask the sysadmin about these details.
>>> > By the way, would this memory leak be shown in the logs? I couldn't find
>>> > any OOM errors in any of the logs. When the issue happened, our team
>>> > restarted the server. That will create a new PID, right? Is there a way
>>> > to see the old PIDs from the James logs?
>>> >
>>> > Thanks
>>> > Mahesh
>>> >
>>> > On Fri, Mar 27, 2015 at 7:33 PM, Bernd Waibel <bwai...@intarsys.de>
>>> > wrote:
>>> >
>>> > > Hi Mahesh
>>> > >
>>> > > Too many open files may result in a memory leak.
>>> > > Could the sysadmin monitor the memory?
>>> > >
>>> > > It is a Java process. Is there a file called hs_err_pid*.log? That is
>>> > > produced if the VM crashes.
>>> > >
>>> > > Ciao
>>> > > Bernd
>>> > >
>>> > >
>>> > > -------- Original Message --------
>>> > > From: Mahesh Sivarama Pillai <srm...@gmail.com>
>>> > > Date: 27.03.2015 14:18 (GMT+01:00)
>>> > > To: James Users List <server-user@james.apache.org>
>>> > > Subject: URGENT HELP: James 2.3.2 not responding after few days of
>>> > > run
>>> > >
>>> > > Hi,
>>> > >
>>> > >  I need urgent help. We have rolled out James 2.3.2 to production
>>> > > for our email processing application. I see James shutting
>>> > > down (with no trace in phoenix.console) after a few days of running. It
>>> > > processes around 100K emails a day and sends a good number of
>>> > > notifications through RemoteDelivery.
>>> > >
>>> > > I have checked the logs but couldn't find any reason for this
>>> > > abnormal shutdown. I have seen a couple of "Too Many Open Files" errors
>>> > > in the smtpserver log and spoolmanager log, but I think those will not
>>> > > bring down the server. Will they? I am not sure whether James is killed
>>> > > by some other Linux process.
>>> > > James is running under a user account (e.g. james) with sudo access to
>>> > > run on port 25. Since I don't have root access, which areas should I
>>> > > look at to figure out what the problem is? If I want to talk to the
>>> > > sysadmin, what information should I ask him/her to gather?
>>> > >
>>> > > James is running on a 4-CPU machine with 8GB RAM. The heap size of
>>> > > James is set to 4GB.
>>> > >
>>> > > I have configured James to run as a service in Linux. I am not sure
>>> > > whether our sysadmin ran the chkconfig command. Is there any impact of
>>> > > not running this command? Please provide your input as early as
>>> > > possible.
>>> > >
>>> > >
>>> > > Thanks
>>> > > Mahesh
>>> > >
>>> >
>>> >
>>>
>>
>>
>
