I also did a load test before deploying James into production and I was able to deliver about 100,000 emails within 8 hours without any issue. The issue seems to occur only sporadically - sometimes it takes only a few hours, sometimes a few days. But once it hangs, it does not come back in my case (at least not within 6 hours).
I also had the issue of James suddenly "partially" halting before I upgraded from file to db persistence. I don't know though if this was the same problem.
My connections log level was set to DEBUG when the log was made.
I did not have any ESTABLISHED smtp connection while James was hanging - almost all had status TIME_WAIT.
Michael
Cesar Bonadio wrote:
I think you are not alone :)
In the last five days I have found several messages in the connection log with the same problem my server stops responding but after a while it comes back and if I run a netstat when I am receiving the error I really dont have more than 20 ESTABLISED connections.
This is interesting problem because in our test server running the same version of James and java a manage to deliver more than 20.000 messages with smtp-source and the connections works just fine if I go over it complains and releases the connections after use.
I will put my connection log in debug mode to see if I get something
[]s
On Tue, 2004-02-17 at 11:35, Michael Nestler wrote:
OK, James stopped again last night at about 1:50 am. I did not have the time yet to run postal or smtp-source. But this time I have a log showing smtp sockets with a 10 minute resolution. Since the last stop I increased the maximum number of connections to 120 (thread number to 150).
At 1:50 James reported in connections*.log:
17/02/04 01:50:00 WARN connections: Maximum number of open connections exceeded - refusing connection. Current number of connections is 120
repeatedly until 8:29 am when I restarted James. My smtpconn.log (cron job "netstat | grep smtp") shows for the same time:
Tue Feb 17 01:50:00 EST 2004
tcp 0 0 email1.whoglue.net:smtp DSL217-1:fyre-messanger SYN_RECV tcp 0 0 192.168.168.5:smtp vm211.vm:cruise-swroute ESTABLISHED
tcp 1 0 192.168.168.5:smtp 220.127.103.78:idmaps CLOSE_WAIT tcp 0 0 192.168.168.5:smtp mta.email.contine:44071 TIME_WAIT tcp 0 0 192.168.168.5:smtp custd530.innerhost:4074 ESTABLISHED
tcp 0 0 192.168.168.5:smtp pro109.profontis.:59902 ESTABLISHED
tcp 0 0 192.168.168.5:smtp smtp01.1800contact:4220 ESTABLISHED
tcp 0 0 192.168.168.5:smtp mta.email.contine:42345 ESTABLISHED
tcp 0 0 192.168.168.5:smtp mta.email.contine:25694 ESTABLISHED
tcp 0 0 192.168.168.5:smtp stdout-mv-01.inbo:44652 TIME_WAIT tcp 0 0 192.168.168.5:smtp 128.121.26.2:31574 ESTABLISHED
tcp 1 0 192.168.168.5:smtp ool-435798e8.dy:evb-elm CLOSE_WAIT tcp 1 0 192.168.168.5:smtp lsanca1-ar5:notify_srvr CLOSE_WAIT tcp 0 0 192.168.168.5:smtp mta113.cheetahmai:52853 TIME_WAIT tcp 0 0 192.168.168.5:smtp 66.42.142.98:49221 ESTABLISHED
tcp 0 0 192.168.168.5:smtp server611.colonize:3926 ESTABLISHED
tcp 1 0 192.168.168.5:smtp 211-56-243-3.inter:linx CLOSE_WAIT tcp 0 0 192.168.168.5:smtp wchicon1.qvc.com:4611 ESTABLISHED
tcp 0 0 192.168.168.5:smtp vm125.vmadmin.com:8381 ESTABLISHED
tcp 7 0 192.168.168.5:smtp 66.28.60.195:58491 CLOSE_WAIT tcp 0 0 192.168.168.5:smtp imo-r10.mx.aol.co:61761 ESTABLISHED
tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52001 ESTABLISHED
tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52272 ESTABLISHED
tcp 1 0 192.168.168.5:smtp 202.Red-213-97-41:pipes CLOSE_WAIT tcp 0 0 192.168.168.5:smtp n25.grp.scd.yahoo:47649 ESTABLISHED
tcp 1 0 192.168.168.5:smtp 218.98.97.183:4612 CLOSE_WAIT tcp 1 0 192.168.168.5:smtp mail.contactel.cz:43724 CLOSE_WAIT tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43794 ESTABLISHED
tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43801 ESTABLISHED
tcp 0 0 192.168.168.5:smtp 212.143.236.10:hylafax ESTABLISHED
tcp 0 0 192.168.168.5:smtp cust54-224.netcabo:4416 ESTABLISHED
tcp 0 0 192.168.168.5:smtp ana216.anotherday:40710 ESTABLISHED
tcp 1 0 192.168.168.5:smtp 218.232.228.130:vsixml CLOSE_WAIT
And at the next time the netstat snapshot was made:
Tue Feb 17 02:00:00 EST 2004
tcp 0 0 192.168.168.5:smtp mta.email.contine:13622 TIME_WAIT tcp 0 0 192.168.168.5:smtp imo-m23.mx.aol.co:63699 TIME_WAIT tcp 0 0 192.168.168.5:smtp modemcable:conclave-cpp TIME_WAIT tcp 0 0 192.168.168.5:smtp 219.144.194.18:43349 TIME_WAIT tcp 0 0 192.168.168.5:smtp 219.144.194.18:43281 TIME_WAIT tcp 0 0 192.168.168.5:smtp c-24-7-95-19.clien:4669 TIME_WAIT tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30519 TIME_WAIT tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30518 TIME_WAIT tcp 0 0 192.168.168.5:smtp relay.quickhire.c:34089 TIME_WAIT tcp 0 0 192.168.168.5:smtp ACC1B832.:ns-cfg-server TIME_WAIT
From there on the smtp sockets log file basically shows only TIME_WAIT sockets (there are a few SYN_RECV) until I restarted James at 8:29 am. Never netstat shows 120 open smtp connections (what James seems to believe). As a strange aside the default*.log contains a database connection exception (it contained the same exception at the last time James stopped 6 days ago). I don't know if this means anything - could this exception have caused the problem? spoolmanager*.log and mailet*.log stopped recording at 1:32 am.
To summarize the events from my point of view:
o James was running for about 6 days without any problem, delivering about 5,000-10,000 emails a day
o some event at a few minutes before 1:50 am stopped James from returning used smtp connection socket objects to the pool starting at that time
o as a consequence new connections were not accepted and the service stopped effectively at 1:32 am
o it never recovered from that situation until I relieved it by restarting phoenix at 8:29 am
From my point of view it seems to be likely that the mentioned exception in default*.log caused the problem, since it happened so close to the time James stopped:
17/02/04 01:33:11 INFO database-connections.maildb: ***** connection 1 is way too old: 63408 > 60000
17/02/04 01:33:11 INFO database-connections.maildb: java.lang.Throwable
at org.apache.james.util.mordred.PoolConnEntry.lock(PoolConnEntry.java:135)
at org.apache.james.util.mordred.JdbcDataSource.getConnection(JdbcDataSource.java:178)
at org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:484)
at org.apache.james.mailrepository.JDBCSpoolRepository.store(JDBCSpoolRepository.java:243)
at org.apache.james.transport.mailets.RemoteDelivery.service(RemoteDelivery.java:649)
at org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:413)
at org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:436)
at org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:366)
at org.apache.avalon.excalibur.thread.impl.ExecutableRunnable.execute(ExecutableRunnable.java:47)
at org.apache.avalon.excalibur.thread.impl.WorkerThread.run(WorkerThread.java:80)
On the other hand it could be just a side effect. I attached the mentioned log files since midnight to this email in zipped format.
I really hope somebody on this list can make any sense out of this and knows what's going on here and how I can fix it. Let me know if I should run postal or smtp-source first and I will do it.
All help is greatly appreciated.
Best regards, Michael
P.S. I was not able to attach a ZIP file to this email. mail.apache.org bounced it saying "552 we don't accept email with executable content (#5.3.4)" ? I try it again as TGZ.
Noel J. Bergman wrote:
Run netstat and check to see if they are real. I have tested the connection handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem, amongst other versions.
What version of James?
--- Noel
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
