java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

And Linux 2.4


On Tue, 2004-02-17 at 17:02, Michael Nestler wrote:

> Phew - at least I am not the only one who has this problem ;-) Are you 
> using Linux 2.4 as well?
> 
> I also did a load test before deploying James into production and I was 
> able to deliver about 100,000 emails within 8 hours without any issue. 
> The issue seems to occur only sporadically - sometimes it takes only a 
> few hours, sometimes a few days. But once it hangs, it does not come 
> back in my case (at least not within 6 hours).
> 
> I also had the issue of James suddenly "partially" halting before I 
> upgraded from file to db persistence. I don't know though if this was 
> the same problem.
> 
> My connections log level was set to DEBUG when the log was made.
> 
> I did not have any ESTABLISHED smtp connection while James was hanging - 
> almost all had status TIME_WAIT.
> 
> Michael
> 
> 
> Cesar Bonadio wrote:
> 
> >I think you are not alone :) 
> >
> >In the last five days I have found several messages in the connection
> >log with the same
> >problem my server stops responding but after a while it comes back and
> >if I run a netstat when
> >I am receiving the error I really dont have more than 20 ESTABLISED
> >connections.
> >
> >This is interesting problem because in our test server running the same
> >version of James and java
> >a manage to deliver more than 20.000 messages with smtp-source and the
> >connections works just fine
> >if I go over it complains and releases the connections after use.
> >
> >I will put my connection log in debug mode to see if I get something
> >
> >
> >[]s
> >
> >
> >
> >
> >On Tue, 2004-02-17 at 11:35, Michael Nestler wrote:
> >
> >  
> >
> >>OK, James stopped again last night at about 1:50 am. I did not have the 
> >>time yet to run postal or smtp-source. But this time I have a log 
> >>showing smtp sockets with a 10 minute resolution. Since the last stop I 
> >>increased the maximum number of connections to 120 (thread number to 150).
> >>
> >>At 1:50 James reported in connections*.log:
> >>
> >>17/02/04 01:50:00 WARN  connections: Maximum number of open connections 
> >>exceeded - refusing connection.  Current number of connections is 120
> >>
> >>repeatedly until 8:29 am when I restarted James. My smtpconn.log (cron 
> >>job "netstat | grep smtp") shows for the same time:
> >>
> >>Tue Feb 17 01:50:00 EST 2004
> >>tcp        0      0 email1.whoglue.net:smtp DSL217-1:fyre-messanger 
> >>SYN_RECV   
> >>tcp        0      0 192.168.168.5:smtp      vm211.vm:cruise-swroute 
> >>ESTABLISHED
> >>tcp        1      0 192.168.168.5:smtp      220.127.103.78:idmaps   
> >>CLOSE_WAIT 
> >>tcp        0      0 192.168.168.5:smtp      mta.email.contine:44071 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      custd530.innerhost:4074 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      pro109.profontis.:59902 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      smtp01.1800contact:4220 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      mta.email.contine:42345 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      mta.email.contine:25694 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      stdout-mv-01.inbo:44652 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      128.121.26.2:31574      
> >>ESTABLISHED
> >>tcp        1      0 192.168.168.5:smtp      ool-435798e8.dy:evb-elm 
> >>CLOSE_WAIT 
> >>tcp        1      0 192.168.168.5:smtp      lsanca1-ar5:notify_srvr 
> >>CLOSE_WAIT 
> >>tcp        0      0 192.168.168.5:smtp      mta113.cheetahmai:52853 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      66.42.142.98:49221      
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      server611.colonize:3926 
> >>ESTABLISHED
> >>tcp        1      0 192.168.168.5:smtp      211-56-243-3.inter:linx 
> >>CLOSE_WAIT 
> >>tcp        0      0 192.168.168.5:smtp      wchicon1.qvc.com:4611   
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      vm125.vmadmin.com:8381  
> >>ESTABLISHED
> >>tcp        7      0 192.168.168.5:smtp      66.28.60.195:58491      
> >>CLOSE_WAIT 
> >>tcp        0      0 192.168.168.5:smtp      imo-r10.mx.aol.co:61761 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      bonaire.server1.o:52001 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      bonaire.server1.o:52272 
> >>ESTABLISHED
> >>tcp        1      0 192.168.168.5:smtp      202.Red-213-97-41:pipes 
> >>CLOSE_WAIT 
> >>tcp        0      0 192.168.168.5:smtp      n25.grp.scd.yahoo:47649 
> >>ESTABLISHED
> >>tcp        1      0 192.168.168.5:smtp      218.98.97.183:4612      
> >>CLOSE_WAIT 
> >>tcp        1      0 192.168.168.5:smtp      mail.contactel.cz:43724 
> >>CLOSE_WAIT 
> >>tcp        0      0 192.168.168.5:smtp      mail.contactel.cz:43794 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      mail.contactel.cz:43801 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      212.143.236.10:hylafax  
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      cust54-224.netcabo:4416 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      ana216.anotherday:40710 
> >>ESTABLISHED
> >>tcp        1      0 192.168.168.5:smtp      218.232.228.130:vsixml  
> >>CLOSE_WAIT 
> >>
> >>
> >>And at the next time the netstat snapshot was made:
> >>
> >>Tue Feb 17 02:00:00 EST 2004
> >>tcp        0      0 192.168.168.5:smtp      mta.email.contine:13622 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      imo-m23.mx.aol.co:63699 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      modemcable:conclave-cpp 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      219.144.194.18:43349    
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      219.144.194.18:43281    
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      c-24-7-95-19.clien:4669 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      web60804.mail.yah:30519 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      web60804.mail.yah:30518 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      relay.quickhire.c:34089 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      ACC1B832.:ns-cfg-server 
> >>TIME_WAIT  
> >>
> >>
> >> From there on the smtp sockets log file basically shows only TIME_WAIT 
> >>sockets (there are a few SYN_RECV) until I restarted James at 8:29 am. 
> >>Never netstat shows 120 open smtp connections (what James seems to 
> >>believe). As a strange aside the default*.log contains a database 
> >>connection exception (it contained the same exception at the last time 
> >>James stopped 6 days ago). I don't know if this means anything - could 
> >>this exception have caused the problem? spoolmanager*.log and 
> >>mailet*.log stopped recording at 1:32 am.
> >>
> >>To summarize the events from my point of view:
> >>o James was running for about 6 days without any problem, delivering 
> >>about 5,000-10,000 emails a day
> >>o some event at a few minutes before 1:50 am stopped James from 
> >>returning used smtp connection socket objects to the pool starting at 
> >>that time
> >>o as a consequence new connections were not accepted and the service 
> >>stopped effectively at 1:32 am
> >>o it never recovered from that situation until I relieved it by 
> >>restarting phoenix at 8:29 am
> >>
> >> From my point of view it seems to be likely that the mentioned 
> >>exception in default*.log caused the problem, since it happened so close 
> >>to the time James stopped:
> >>
> >>17/02/04 01:33:11 INFO  database-connections.maildb:  ***** connection 1 
> >>is way too old: 63408 > 60000
> >>17/02/04 01:33:11 INFO  database-connections.maildb: java.lang.Throwable
> >>   at 
> >>org.apache.james.util.mordred.PoolConnEntry.lock(PoolConnEntry.java:135)
> >>   at 
> >>org.apache.james.util.mordred.JdbcDataSource.getConnection(JdbcDataSource.java:178)
> >> 
> >>
> >>   at 
> >>org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:484)
> >> 
> >>
> >>   at 
> >>org.apache.james.mailrepository.JDBCSpoolRepository.store(JDBCSpoolRepository.java:243)
> >> 
> >>
> >>   at 
> >>org.apache.james.transport.mailets.RemoteDelivery.service(RemoteDelivery.java:649) 
> >>
> >>   at 
> >>org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:413) 
> >>
> >>   at 
> >>org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:436) 
> >>
> >>   at 
> >>org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:366) 
> >>
> >>   at 
> >>org.apache.avalon.excalibur.thread.impl.ExecutableRunnable.execute(ExecutableRunnable.java:47)
> >> 
> >>
> >>   at 
> >>org.apache.avalon.excalibur.thread.impl.WorkerThread.run(WorkerThread.java:80) 
> >>
> >>
> >>On the other hand it could be just a side effect. I attached the 
> >>mentioned log files since midnight to this email in zipped format.
> >>
> >>I really hope somebody on this list can make any sense out of this and 
> >>knows what's going on here and how I can fix it. Let me know if I should 
> >>run postal or smtp-source first and I will do it.
> >>
> >>All help is greatly appreciated.
> >>
> >>Best regards,
> >>Michael
> >>
> >>P.S. I was not able to attach a ZIP file to this email. mail.apache.org 
> >>bounced it saying "552 we don't accept email with executable content 
> >>(#5.3.4)" ? I try it again as TGZ.
> >>
> >>
> >>
> >>Noel J. Bergman wrote:
> >>
> >>    
> >>
> >>>Run netstat and check to see if they are real.  I have tested the connection
> >>>handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
> >>>amongst other versions.
> >>>
> >>>What version of James?
> >>>
> >>>   --- Noel
> >>>
> >>>      
> >>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 

Reply via email to