On Fri, Mar 15, 2002 at 10:14:57AM -0500, ListServ wrote:
...
> I'm attempting to upgrade my POP server to a new box (bigger) running
> Solaris 8. I've configured the box, and thought everything was working
> great. I put the new box in service yesterday morning, and at first
> everything was humming along great -- this was in the early morning hours,
> when utilization was low. As it got closer to peak time (8am) things
> started to worsen and worsen quickly.
>
> The POP service took longer and longer to respond. Load average on the box
> wasn't high, CPU utilization wasn't high either. However, doing a "df -k"
> command I noticed that my swap space was shrinking dramatically. I have 1GB
> of swap space setup (I also have 1GB of RAM), and I was down to less than
> 400MB of swap space. I then discovered that I had over 1500 active popper
> sessions running on the box at same time "ps -ef|grep popper|wc -l" My
> current POP server is lucky if it ever sees 100 concurrent sessions... big
> difference. This was using QPopper 4.0.3.
...

Ouch. Sounds pretty nightmarish.

It sounds like there is some problem with signal or socket handling, or
with the combination of the two, such that qpopper never sees the
incoming connection go away. That might mean it would fail to terminate
unless it sees a "QUIT" from the client, though I would expect it to
time out and terminate anyway in that case.

There are many clients which don't send a "QUIT" message. For these
clients you *should* be seeing log messages like:

  ... I/O error flushing output to client xxxxxxx at xx.yy.zz.ww [xx.yy.zz.ww]: Operation not permitted (1)
  ... xxxxxxx at xx.yy.zz.ww (xx.yy.zz.ww): -ERR POP hangup from server.example.com

or

  ... xxxxxxx at xx.yy.zz.ww (xx.yy.zz.ww): -ERR POP EOF or I/O Error

If you *never* see these messages, it's probably a serious problem,
because it means that the OS is never telling qpopper that a client has
gone away.

> I also noticed that there were several popper sessions that were currently
> running that were actually myself checking mail on the box (7 sessions to be
> exact), and I had closed my mail program by this time. There were other
> multiple sessions too from other users.

You shouldn't be able to start a second session unless the first
session is *nearly* completely terminated (temp file closed, etc.). It
sounds to me like some bug in Solaris is causing qpopper to hang when it
is trying to terminate. Maybe Solaris is blocking on writes to a
closed/vanished socket, instead of terminating the write with an error?

> my entry in the inetd.conf file looks like this:
> pop3 stream tcp nowait.600 root /usr/local/lib/popper popper -R
>
> the .600 is there because the default of 40 instances per second just wasn't
> enough, and 100 didn't seem to help, 150 didn't help... 600 worked, and
> still is.

This limit normally needs to be raised on busy systems, but it doesn't
sound like it relates to your problem.

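Going back to the log messages above: a quick way to check whether qpopper ever notices clients disappearing is to count those messages in the log. Here is a minimal sketch in Python. It assumes the substrings match the messages quoted above (the exact wording may vary between qpopper versions), and that the log path you pass in is wherever your syslog sends popper output. The function names are made up for illustration.

```python
from collections import Counter

# Substrings taken from the log messages quoted above. The exact wording
# may differ between qpopper versions, so adjust these as needed.
DISCONNECT_PATTERNS = {
    "flush_error": "I/O error flushing output to client",
    "hangup": "-ERR POP hangup",
    "eof": "-ERR POP EOF or I/O Error",
}


def count_disconnects(lines):
    """Count log lines in which qpopper reports a vanished client."""
    counts = Counter({name: 0 for name in DISCONNECT_PATTERNS})
    for line in lines:
        for name, needle in DISCONNECT_PATTERNS.items():
            if needle in line:
                counts[name] += 1
                break  # count each line under one category only
    return counts


def summarize_log(path):
    """Print the per-category counts for one log file (path is site-specific)."""
    with open(path, errors="replace") as log:
        for name, n in count_disconnects(log).items():
            print(f"{name}: {n}")
```

For example, summarize_log("/var/log/syslog") would print the three counts (that path is only a guess; use wherever popper actually logs on your box). If every count stays at zero through peak hours, that would fit the theory that the OS never reports the closed connections.
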
> To summarize Solaris 8 and Qpopper (versions 3.0.2, 3.1.2, and 4.0.3) seem
> to have an issue where the pop sessions take an extremely long time to
> finish and close. Upping the nowait.[max num] parameter helps because it
> allows more popper sessions to be created in a 60 second window, but that's
> not fixing the problem, and ultimately not doing me any good.

Maybe the OS is not detecting the TCP connections as vanished, and is
therefore keeping the processes hanging around, blocked on their final
writes to those connections. Try comparing the TCP timeout settings on
Solaris 8 and on your previous server, with sysctl or the Solaris
equivalent.

I have no access to a Solaris system, and no time to troubleshoot this
even if I did, but I know there are other Solaris users on the list.

  -- Clifton

--
Clifton Royston -- LavaNet Systems Architect -- [EMAIL PROTECTED]
WWJD? "JWRTFM!" - Scott Dorsey (kludge) "JWG" - Eddie Aikau
