Phew - at least I am not the only one who has this problem ;-) Are you using Linux 2.4 as well?

I also did a load test before deploying James into production and I was able to deliver about 100,000 emails within 8 hours without any issue. The issue seems to occur only sporadically - sometimes it takes only a few hours, sometimes a few days. But once it hangs, it does not come back in my case (at least not within 6 hours).

I also had the issue of James suddenly "partially" halting before I upgraded from file to db persistence. I don't know though if this was the same problem.

My connections log level was set to DEBUG when the log was made.

I did not have any ESTABLISHED smtp connection while James was hanging - almost all had status TIME_WAIT.

Michael


Cesar Bonadio wrote:


I think you are not alone :)

In the last five days I have found several messages in the connection
log with the same
problem my server stops responding but after a while it comes back and
if I run a netstat when
I am receiving the error I really dont have more than 20 ESTABLISED
connections.

This is interesting problem because in our test server running the same
version of James and java
a manage to deliver more than 20.000 messages with smtp-source and the
connections works just fine
if I go over it complains and releases the connections after use.

I will put my connection log in debug mode to see if I get something


[]s





On Tue, 2004-02-17 at 11:35, Michael Nestler wrote:




OK, James stopped again last night at about 1:50 am. I did not have the time yet to run postal or smtp-source. But this time I have a log showing smtp sockets with a 10 minute resolution. Since the last stop I increased the maximum number of connections to 120 (thread number to 150).

At 1:50 James reported in connections*.log:

17/02/04 01:50:00 WARN connections: Maximum number of open connections exceeded - refusing connection. Current number of connections is 120

repeatedly until 8:29 am when I restarted James. My smtpconn.log (cron job "netstat | grep smtp") shows for the same time:

Tue Feb 17 01:50:00 EST 2004
tcp 0 0 email1.whoglue.net:smtp DSL217-1:fyre-messanger SYN_RECV tcp 0 0 192.168.168.5:smtp vm211.vm:cruise-swroute ESTABLISHED
tcp 1 0 192.168.168.5:smtp 220.127.103.78:idmaps CLOSE_WAIT tcp 0 0 192.168.168.5:smtp mta.email.contine:44071 TIME_WAIT tcp 0 0 192.168.168.5:smtp custd530.innerhost:4074 ESTABLISHED
tcp 0 0 192.168.168.5:smtp pro109.profontis.:59902 ESTABLISHED
tcp 0 0 192.168.168.5:smtp smtp01.1800contact:4220 ESTABLISHED
tcp 0 0 192.168.168.5:smtp mta.email.contine:42345 ESTABLISHED
tcp 0 0 192.168.168.5:smtp mta.email.contine:25694 ESTABLISHED
tcp 0 0 192.168.168.5:smtp stdout-mv-01.inbo:44652 TIME_WAIT tcp 0 0 192.168.168.5:smtp 128.121.26.2:31574 ESTABLISHED
tcp 1 0 192.168.168.5:smtp ool-435798e8.dy:evb-elm CLOSE_WAIT tcp 1 0 192.168.168.5:smtp lsanca1-ar5:notify_srvr CLOSE_WAIT tcp 0 0 192.168.168.5:smtp mta113.cheetahmai:52853 TIME_WAIT tcp 0 0 192.168.168.5:smtp 66.42.142.98:49221 ESTABLISHED
tcp 0 0 192.168.168.5:smtp server611.colonize:3926 ESTABLISHED
tcp 1 0 192.168.168.5:smtp 211-56-243-3.inter:linx CLOSE_WAIT tcp 0 0 192.168.168.5:smtp wchicon1.qvc.com:4611 ESTABLISHED
tcp 0 0 192.168.168.5:smtp vm125.vmadmin.com:8381 ESTABLISHED
tcp 7 0 192.168.168.5:smtp 66.28.60.195:58491 CLOSE_WAIT tcp 0 0 192.168.168.5:smtp imo-r10.mx.aol.co:61761 ESTABLISHED
tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52001 ESTABLISHED
tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52272 ESTABLISHED
tcp 1 0 192.168.168.5:smtp 202.Red-213-97-41:pipes CLOSE_WAIT tcp 0 0 192.168.168.5:smtp n25.grp.scd.yahoo:47649 ESTABLISHED
tcp 1 0 192.168.168.5:smtp 218.98.97.183:4612 CLOSE_WAIT tcp 1 0 192.168.168.5:smtp mail.contactel.cz:43724 CLOSE_WAIT tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43794 ESTABLISHED
tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43801 ESTABLISHED
tcp 0 0 192.168.168.5:smtp 212.143.236.10:hylafax ESTABLISHED
tcp 0 0 192.168.168.5:smtp cust54-224.netcabo:4416 ESTABLISHED
tcp 0 0 192.168.168.5:smtp ana216.anotherday:40710 ESTABLISHED
tcp 1 0 192.168.168.5:smtp 218.232.228.130:vsixml CLOSE_WAIT



And at the next time the netstat snapshot was made:


Tue Feb 17 02:00:00 EST 2004
tcp 0 0 192.168.168.5:smtp mta.email.contine:13622 TIME_WAIT tcp 0 0 192.168.168.5:smtp imo-m23.mx.aol.co:63699 TIME_WAIT tcp 0 0 192.168.168.5:smtp modemcable:conclave-cpp TIME_WAIT tcp 0 0 192.168.168.5:smtp 219.144.194.18:43349 TIME_WAIT tcp 0 0 192.168.168.5:smtp 219.144.194.18:43281 TIME_WAIT tcp 0 0 192.168.168.5:smtp c-24-7-95-19.clien:4669 TIME_WAIT tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30519 TIME_WAIT tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30518 TIME_WAIT tcp 0 0 192.168.168.5:smtp relay.quickhire.c:34089 TIME_WAIT tcp 0 0 192.168.168.5:smtp ACC1B832.:ns-cfg-server TIME_WAIT



From there on the smtp sockets log file basically shows only TIME_WAIT sockets (there are a few SYN_RECV) until I restarted James at 8:29 am. Never netstat shows 120 open smtp connections (what James seems to believe). As a strange aside the default*.log contains a database connection exception (it contained the same exception at the last time James stopped 6 days ago). I don't know if this means anything - could this exception have caused the problem? spoolmanager*.log and mailet*.log stopped recording at 1:32 am.


To summarize the events from my point of view:
o James was running for about 6 days without any problem, delivering about 5,000-10,000 emails a day
o some event at a few minutes before 1:50 am stopped James from returning used smtp connection socket objects to the pool starting at that time
o as a consequence new connections were not accepted and the service stopped effectively at 1:32 am
o it never recovered from that situation until I relieved it by restarting phoenix at 8:29 am


From my point of view it seems to be likely that the mentioned exception in default*.log caused the problem, since it happened so close to the time James stopped:

17/02/04 01:33:11 INFO database-connections.maildb: ***** connection 1 is way too old: 63408 > 60000
17/02/04 01:33:11 INFO database-connections.maildb: java.lang.Throwable
at org.apache.james.util.mordred.PoolConnEntry.lock(PoolConnEntry.java:135)
at org.apache.james.util.mordred.JdbcDataSource.getConnection(JdbcDataSource.java:178)


at org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:484)

at org.apache.james.mailrepository.JDBCSpoolRepository.store(JDBCSpoolRepository.java:243)

at org.apache.james.transport.mailets.RemoteDelivery.service(RemoteDelivery.java:649)

at org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:413)

at org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:436)

at org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:366)

at org.apache.avalon.excalibur.thread.impl.ExecutableRunnable.execute(ExecutableRunnable.java:47)

at org.apache.avalon.excalibur.thread.impl.WorkerThread.run(WorkerThread.java:80)


On the other hand it could be just a side effect. I attached the mentioned log files since midnight to this email in zipped format.


I really hope somebody on this list can make any sense out of this and knows what's going on here and how I can fix it. Let me know if I should run postal or smtp-source first and I will do it.

All help is greatly appreciated.

Best regards,
Michael

P.S. I was not able to attach a ZIP file to this email. mail.apache.org bounced it saying "552 we don't accept email with executable content (#5.3.4)" ? I try it again as TGZ.



Noel J. Bergman wrote:



Run netstat and check to see if they are real.  I have tested the connection
handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
amongst other versions.

What version of James?

--- Noel





---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to