java version "1.4.2_03" Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02) Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)
And Linux 2.4 On Tue, 2004-02-17 at 17:02, Michael Nestler wrote: > Phew - at least I am not the only one who has this problem ;-) Are you > using Linux 2.4 as well? > > I also did a load test before deploying James into production and I was > able to deliver about 100,000 emails within 8 hours without any issue. > The issue seems to occur only sporadically - sometimes it takes only a > few hours, sometimes a few days. But once it hangs, it does not come > back in my case (at least not within 6 hours). > > I also had the issue of James suddenly "partially" halting before I > upgraded from file to db persistence. I don't know though if this was > the same problem. > > My connections log level was set to DEBUG when the log was made. > > I did not have any ESTABLISHED smtp connection while James was hanging - > almost all had status TIME_WAIT. > > Michael > > > Cesar Bonadio wrote: > > >I think you are not alone :) > > > >In the last five days I have found several messages in the connection > >log with the same > >problem my server stops responding but after a while it comes back and > >if I run a netstat when > >I am receiving the error I really dont have more than 20 ESTABLISED > >connections. > > > >This is interesting problem because in our test server running the same > >version of James and java > >a manage to deliver more than 20.000 messages with smtp-source and the > >connections works just fine > >if I go over it complains and releases the connections after use. > > > >I will put my connection log in debug mode to see if I get something > > > > > >[]s > > > > > > > > > >On Tue, 2004-02-17 at 11:35, Michael Nestler wrote: > > > > > > > >>OK, James stopped again last night at about 1:50 am. I did not have the > >>time yet to run postal or smtp-source. But this time I have a log > >>showing smtp sockets with a 10 minute resolution. Since the last stop I > >>increased the maximum number of connections to 120 (thread number to 150). > >> > >>At 1:50 James reported in connections*.log: > >> > >>17/02/04 01:50:00 WARN connections: Maximum number of open connections > >>exceeded - refusing connection. Current number of connections is 120 > >> > >>repeatedly until 8:29 am when I restarted James. My smtpconn.log (cron > >>job "netstat | grep smtp") shows for the same time: > >> > >>Tue Feb 17 01:50:00 EST 2004 > >>tcp 0 0 email1.whoglue.net:smtp DSL217-1:fyre-messanger > >>SYN_RECV > >>tcp 0 0 192.168.168.5:smtp vm211.vm:cruise-swroute > >>ESTABLISHED > >>tcp 1 0 192.168.168.5:smtp 220.127.103.78:idmaps > >>CLOSE_WAIT > >>tcp 0 0 192.168.168.5:smtp mta.email.contine:44071 > >>TIME_WAIT > >>tcp 0 0 192.168.168.5:smtp custd530.innerhost:4074 > >>ESTABLISHED > >>tcp 0 0 192.168.168.5:smtp pro109.profontis.:59902 > >>ESTABLISHED > >>tcp 0 0 192.168.168.5:smtp smtp01.1800contact:4220 > >>ESTABLISHED > >>tcp 0 0 192.168.168.5:smtp mta.email.contine:42345 > >>ESTABLISHED > >>tcp 0 0 192.168.168.5:smtp mta.email.contine:25694 > >>ESTABLISHED > >>tcp 0 0 192.168.168.5:smtp stdout-mv-01.inbo:44652 > >>TIME_WAIT > >>tcp 0 0 192.168.168.5:smtp 128.121.26.2:31574 > >>ESTABLISHED > >>tcp 1 0 192.168.168.5:smtp ool-435798e8.dy:evb-elm > >>CLOSE_WAIT > >>tcp 1 0 192.168.168.5:smtp lsanca1-ar5:notify_srvr > >>CLOSE_WAIT > >>tcp 0 0 192.168.168.5:smtp mta113.cheetahmai:52853 > >>TIME_WAIT > >>tcp 0 0 192.168.168.5:smtp 66.42.142.98:49221 > >>ESTABLISHED > >>tcp 0 0 192.168.168.5:smtp server611.colonize:3926 > >>ESTABLISHED > >>tcp 1 0 192.168.168.5:smtp 211-56-243-3.inter:linx > >>CLOSE_WAIT > >>tcp 0 0 192.168.168.5:smtp wchicon1.qvc.com:4611 > >>ESTABLISHED > >>tcp 0 0 192.168.168.5:smtp vm125.vmadmin.com:8381 > >>ESTABLISHED > >>tcp 7 0 192.168.168.5:smtp 66.28.60.195:58491 > >>CLOSE_WAIT > >>tcp 0 0 192.168.168.5:smtp imo-r10.mx.aol.co:61761 > >>ESTABLISHED > >>tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52001 > >>ESTABLISHED > >>tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52272 > >>ESTABLISHED > >>tcp 1 0 192.168.168.5:smtp 202.Red-213-97-41:pipes > >>CLOSE_WAIT > >>tcp 0 0 192.168.168.5:smtp n25.grp.scd.yahoo:47649 > >>ESTABLISHED > >>tcp 1 0 192.168.168.5:smtp 218.98.97.183:4612 > >>CLOSE_WAIT > >>tcp 1 0 192.168.168.5:smtp mail.contactel.cz:43724 > >>CLOSE_WAIT > >>tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43794 > >>ESTABLISHED > >>tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43801 > >>ESTABLISHED > >>tcp 0 0 192.168.168.5:smtp 212.143.236.10:hylafax > >>ESTABLISHED > >>tcp 0 0 192.168.168.5:smtp cust54-224.netcabo:4416 > >>ESTABLISHED > >>tcp 0 0 192.168.168.5:smtp ana216.anotherday:40710 > >>ESTABLISHED > >>tcp 1 0 192.168.168.5:smtp 218.232.228.130:vsixml > >>CLOSE_WAIT > >> > >> > >>And at the next time the netstat snapshot was made: > >> > >>Tue Feb 17 02:00:00 EST 2004 > >>tcp 0 0 192.168.168.5:smtp mta.email.contine:13622 > >>TIME_WAIT > >>tcp 0 0 192.168.168.5:smtp imo-m23.mx.aol.co:63699 > >>TIME_WAIT > >>tcp 0 0 192.168.168.5:smtp modemcable:conclave-cpp > >>TIME_WAIT > >>tcp 0 0 192.168.168.5:smtp 219.144.194.18:43349 > >>TIME_WAIT > >>tcp 0 0 192.168.168.5:smtp 219.144.194.18:43281 > >>TIME_WAIT > >>tcp 0 0 192.168.168.5:smtp c-24-7-95-19.clien:4669 > >>TIME_WAIT > >>tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30519 > >>TIME_WAIT > >>tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30518 > >>TIME_WAIT > >>tcp 0 0 192.168.168.5:smtp relay.quickhire.c:34089 > >>TIME_WAIT > >>tcp 0 0 192.168.168.5:smtp ACC1B832.:ns-cfg-server > >>TIME_WAIT > >> > >> > >> From there on the smtp sockets log file basically shows only TIME_WAIT > >>sockets (there are a few SYN_RECV) until I restarted James at 8:29 am. > >>Never netstat shows 120 open smtp connections (what James seems to > >>believe). As a strange aside the default*.log contains a database > >>connection exception (it contained the same exception at the last time > >>James stopped 6 days ago). I don't know if this means anything - could > >>this exception have caused the problem? spoolmanager*.log and > >>mailet*.log stopped recording at 1:32 am. > >> > >>To summarize the events from my point of view: > >>o James was running for about 6 days without any problem, delivering > >>about 5,000-10,000 emails a day > >>o some event at a few minutes before 1:50 am stopped James from > >>returning used smtp connection socket objects to the pool starting at > >>that time > >>o as a consequence new connections were not accepted and the service > >>stopped effectively at 1:32 am > >>o it never recovered from that situation until I relieved it by > >>restarting phoenix at 8:29 am > >> > >> From my point of view it seems to be likely that the mentioned > >>exception in default*.log caused the problem, since it happened so close > >>to the time James stopped: > >> > >>17/02/04 01:33:11 INFO database-connections.maildb: ***** connection 1 > >>is way too old: 63408 > 60000 > >>17/02/04 01:33:11 INFO database-connections.maildb: java.lang.Throwable > >> at > >>org.apache.james.util.mordred.PoolConnEntry.lock(PoolConnEntry.java:135) > >> at > >>org.apache.james.util.mordred.JdbcDataSource.getConnection(JdbcDataSource.java:178) > >> > >> > >> at > >>org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:484) > >> > >> > >> at > >>org.apache.james.mailrepository.JDBCSpoolRepository.store(JDBCSpoolRepository.java:243) > >> > >> > >> at > >>org.apache.james.transport.mailets.RemoteDelivery.service(RemoteDelivery.java:649) > >> > >> at > >>org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:413) > >> > >> at > >>org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:436) > >> > >> at > >>org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:366) > >> > >> at > >>org.apache.avalon.excalibur.thread.impl.ExecutableRunnable.execute(ExecutableRunnable.java:47) > >> > >> > >> at > >>org.apache.avalon.excalibur.thread.impl.WorkerThread.run(WorkerThread.java:80) > >> > >> > >>On the other hand it could be just a side effect. I attached the > >>mentioned log files since midnight to this email in zipped format. > >> > >>I really hope somebody on this list can make any sense out of this and > >>knows what's going on here and how I can fix it. Let me know if I should > >>run postal or smtp-source first and I will do it. > >> > >>All help is greatly appreciated. > >> > >>Best regards, > >>Michael > >> > >>P.S. I was not able to attach a ZIP file to this email. mail.apache.org > >>bounced it saying "552 we don't accept email with executable content > >>(#5.3.4)" ? I try it again as TGZ. > >> > >> > >> > >>Noel J. Bergman wrote: > >> > >> > >> > >>>Run netstat and check to see if they are real. I have tested the connection > >>>handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem, > >>>amongst other versions. > >>> > >>>What version of James? > >>> > >>> --- Noel > >>> > >>> > >>> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] >
