Re: [AMaViS-user] amavis proxy problems
Rok, Finally figured out what was the problem - penpals. Apparently that part of the code isn't optimised for my kind of usage usage (pgsql backend, table partitions, ...), I'm not quite sure why, but after disabling penpals on all the gateways (~8 server) the load on the SQL server dropped from 6 to 3 and there were no errors ever since... For now it'll be fine, guess we'll have to do the db redesign sometime soon :S Btw, which version of PostgreSQL was that? Check your %sql_clause settings, the current default is: 'sel_penpals' = # no message-id references list SELECT msgs.time_num, msgs.mail_id, subject. FROM msgs JOIN msgrcpt USING (partition_tag,mail_id). WHERE sid=? AND rid=? AND msgs.content!='V' AND ds='P'. ORDER BY msgs.time_num DESC, # LIMIT 1 'sel_penpals_msgid' = # with a nonempty list of message-id references SELECT msgs.time_num, msgs.mail_id, subject, message_id, rid. FROM msgs JOIN msgrcpt USING (partition_tag,mail_id). WHERE sid=? AND msgs.content!='V' AND ds='P' AND message_id IN (%m). AND rid!=sid. ORDER BY rid=? DESC, msgs.time_num DESC, # LIMIT 1 The $sql_clause{sel_penpals used} used to have a coalesce() function in the expression, which had painful effects on query times. It is gone now in 2.7.0-pre (not sure exactly in which pre-release I dropped it). You could try manually analyzing how a penpals query is evaluated by SQL, which may shed some light. Are you using weekly partition tags? Perhaps adding some index may help. Mark -- Beautiful is writing same markup. Internet Explorer 9 supports standards for HTML5, CSS3, SVG 1.1, ECMAScript5, and DOM L2 L3. Spend less time writing and rewriting code and more time creating great experiences on the web. Be a part of the beta today http://p.sf.net/sfu/msIE9-sfdev2dev ___ AMaViS-user mailing list AMaViS-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/amavis-user Please visit http://www.ijs.si/software/amavisd/ regularly For administrativa requests please send email to rainer at openantivirus dot org
Re: [AMaViS-user] amavis proxy problems
Finally figured out what was the problem - penpals. Apparently that part of the code isn't optimised for my kind of usage usage (pgsql backend, table partitions, ...), I'm not quite sure why, but after disabling penpals on all the gateways (~8 server) the load on the SQL server dropped from 6 to 3 and there were no errors ever since... For now it'll be fine, guess we'll have to do the db redesign sometime soon :S -- BR, Rok -- Centralized Desktop Delivery: Dell and VMware Reference Architecture Simplifying enterprise desktop deployment and management using Dell EqualLogic storage and VMware View: A highly scalable, end-to-end client virtualization framework. Read more! http://p.sf.net/sfu/dell-eql-dev2dev ___ AMaViS-user mailing list AMaViS-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/amavis-user Please visit http://www.ijs.si/software/amavisd/ regularly For administrativa requests please send email to rainer at openantivirus dot org
Re: [AMaViS-user] amavis proxy problems
Rok, Well... that didn't work out as planned... the smtp server keeps deferring the email even though the system load is quite low and according to zabbix, an average of 99.24 instances of amavis is free in the last 12 hours (min is 95 and max is 100), the system load is 0.70... [...] I've changed the $log_level to 5 for the time being... Hopefully I'll be able to do some debugging today as the server isn't used as much due to the holiday... I'm not aware that such behaviour has ever been reported, so I can only guess. Perhaps the first thing to verify is: try telneting (repeatedly) to amavisd port and see if you always get a greeting message soon, or are you stuck on most attempts. Handling of accept(2) on sockets and locking/serialization is entirely under control of a Net::Server module. Below is a patch to add logging to Net/Server/PreForkSimple.pm, so that we could see at which point a child process is stuck. These log lines would appear in the amavis log at level 5, e.g.: amavisd.conf: $log_level = 5; $DO_SYSLOG = 1; $syslog_ident = 'amavis'; $syslog_facility = 'user'; syslog.conf: user.notice /var/log/amavisd.log user.info /var/log/amavisd-info.log user.debug,mail.info /var/log/amavisd-debug.log $ tail -f /var/log/amavisd-debug.log | fgrep 'Net::Server: accept:' --- Net/Server/PreForkSimple.pm~2010-05-05 05:01:58.0 +0200 +++ Net/Server/PreForkSimple.pm 2010-11-02 12:40:11.0 +0100 @@ -248,24 +248,28 @@ ### serialize the child accepts if( $prop-{serialize} eq 'flock' ){ +$self-log(4,accept: waiting on a flock); while (! flock($prop-{lock_fh}, Fcntl::LOCK_EX())) { next if $! == EINTR; $self-fatal(Couldn't get lock on file \$prop-{lock_file}\ [$!]); } }elsif( $prop-{serialize} eq 'semaphore' ){ +$self-log(4,accept: waiting on a semaphore); $prop-{sem}-op( 0, -1, IPC::SysV::SEM_UNDO() ) || $self-fatal(Semaphore Error [$!]); }elsif( $prop-{serialize} eq 'pipe' ){ +$self-log(4,accept: waiting on a pipe read); scalar _WAITING; # read one line - kernel says who gets it } + $self-log(4,accept: lock acquired, doing an accept); ### now do the accept method my $accept_val = $self-SUPER::accept(); - + $self-log(4,accept: accept done, releasing a lock); ### unblock serialization if( $prop-{serialize} eq 'flock' ){ flock($prop-{lock_fh}, Fcntl::LOCK_UN()); @@ -278,6 +282,7 @@ print _READY Next!\n; } + $self-log(4,accept: lock released); ### return our success return $accept_val; One other thing you may try: replace the flock serialization on a socket with a semaphore. It requires modules IPC::SysV and IPC::Semaphore. To let amavisd use it, search for 'serialize' in amavisd and change: serialize = 'flock', # flock, semaphore, pipe into a: serialize = 'semaphore', On a normal system both mechanisms achieve the same and cost about the same. Mark -- Nokia and ATT present the 2010 Calling All Innovators-North America contest Create new apps games for the Nokia N8 for consumers in U.S. and Canada $10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store http://p.sf.net/sfu/nokia-dev2dev ___ AMaViS-user mailing list AMaViS-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/amavis-user Please visit http://www.ijs.si/software/amavisd/ regularly For administrativa requests please send email to rainer at openantivirus dot org
Re: [AMaViS-user] amavis proxy problems
Rok, Oct 28 15:30:26 smtp-1 postfix/smtpd[3215]: warning: timeout talking to proxy 127.0.0.1:10024 Oct 28 15:30:26 smtp-1 postfix/smtpd[3215]: proxy-reject: END-OF-MESSAGE: 451 4.3.0 Error: queue file write error; ... on the other hand, it may just be that amavisd (SpamAssassin) took more than the smtpd_proxy_timeout seconds (default 100) for processing a message. Check the log and see if there are entries that took more than 10 ms for processing - these would certainly result in a queue file write error. Make sure the $child_timeout in amavisd.conf is below 100 seconds - I'd set it to 45 or thereabout. Don't know if these are rare and occasional failures that you noticed, or do they happen regularly. On my previous posting I concentrated on socket serialization - which may or may not be what is happening here. If there is a lock/accept/select problem on accepting a socket, this would probably result in hang of all amavisd processes: most would be idling, but none willing to accept a connection. A stuck telnet attempt with many/all processes idling would be a symptom of such state. I have a more extensive patch to instrument the Net::Server with additional logging, if we decide that this may be the problem. But let's first rule out the more mundane possibility, as described above. Mark -- Nokia and ATT present the 2010 Calling All Innovators-North America contest Create new apps games for the Nokia N8 for consumers in U.S. and Canada $10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store http://p.sf.net/sfu/nokia-dev2dev ___ AMaViS-user mailing list AMaViS-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/amavis-user Please visit http://www.ijs.si/software/amavisd/ regularly For administrativa requests please send email to rainer at openantivirus dot org
Re: [AMaViS-user] amavis proxy problems
On 29.10.2010 17:02, Rok Potočnik wrote: On 29.10.2010 16:16, Mark Martinec wrote: 100 instances is probably better for such a setup than 50. Did you enable the: -o smtpd_proxy_options=speed_adjust on the postfix smtpd service on port 25? (Postfix 2.7.0 or later is required). This is practically a must for a pre-queue setup with any substantial mail traffic. Oops... Didn't know about that one... I've just added that to master.cf on all the servers and also added it for smtps and submission on smtp servers for customers. What does amavisd-nanny show? Are all amavisd processes busy most of the time, or is it common that some processes are idle? If all are busy persistently, that would explain the situation. Is there a way to measure socket backlog queue length on amavisd socket on your system (perhaps with netstat), or is your system using SYN cookies? Netstat may be able to show syncache statistics for the tcp protocol, or socket queue drops - not sure how this works on Linux. Nanny shows that the most of the instances are idle and according to stats on zabbix it seems that ever since the change in the number of instances (50 - 100) the lowest 'idle instance' value on all the servers was 66 on one of the servers, 70 on another one and so on... I don't think that's an issue. SYN cookies are enabled on the servers, max_syn_backlog is set to 1k, iptables is disabled, selinux is permissive for the time being... Well... that didn't work out as planned... the smtp server keeps deferring the email even though the system load is quite low and according to zabbix, an average of 99.24 instances of amavis is free in the last 12 hours (min is 95 and max is 100), the system load is 0.70... --- Transcript of session follows. Out: 220 smtp-1.x.net ESMTP In: HELO rob Out: 250 smtp-1.t-2.net In: MAIL FROM: rob...@x.net Out: 250 2.1.0 Ok In: RCPT TO: x...@xxx.xx Out: 250 2.1.5 Ok In: DATA Out: 354 End data with CRLF.CRLF Out: 451 4.3.0 Error: queue file write error Session aborted, reason: lost connection For other details, see the local mail logfile --- I've changed the $log_level to 5 for the time being... Hopefully I'll be able to do some debugging today as the server isn't used as much due to the holiday... -- BR, Rok -- Nokia and ATT present the 2010 Calling All Innovators-North America contest Create new apps games for the Nokia N8 for consumers in U.S. and Canada $10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store http://p.sf.net/sfu/nokia-dev2dev ___ AMaViS-user mailing list AMaViS-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/amavis-user Please visit http://www.ijs.si/software/amavisd/ regularly For administrativa requests please send email to rainer at openantivirus dot org
Re: [AMaViS-user] amavis proxy problems
Rok, I'm using postfix 2.7.1 with amavis 2.7.0 as a proxy. After some clients reported errors sending email, I've noticed postfix is sometimes unable to relay email to amavis due to 'queue file write error'. I'm kinda clueless, why... Unfortunately I can't see no logs at that time in the maillog. Due to this being an active server ( ~100 msgs/minute) I don't dare starting amavis in debug mode. Oct 28 15:30:26 smtp-1 postfix/smtpd[3215]: warning: timeout talking to proxy 127.0.0.1:10024 Oct 28 15:30:26 smtp-1 postfix/smtpd[3215]: proxy-reject: END-OF-MESSAGE: 451 4.3.0 Error: queue file write error; from=a...@b.c to=d...@e.f proto=SMTP helo=hhh I belive the hardware itself is adequate - ibm hs22 blade, dual quad core xeon with ht, 24GB ram, it's running Centos 5.5 x64. First I thought there aren't enough running processes of amavis, so now I'm running 100 instances (was set to 50 before), but guess that didn't fix the problem. 100 instances is probably better for such a setup than 50. Did you enable the: -o smtpd_proxy_options=speed_adjust on the postfix smtpd service on port 25? (Postfix 2.7.0 or later is required). This is practically a must for a pre-queue setup with any substantial mail traffic. What does amavisd-nanny show? Are all amavisd processes busy most of the time, or is it common that some processes are idle? If all are busy persistently, that would explain the situation. Is there a way to measure socket backlog queue length on amavisd socket on your system (perhaps with netstat), or is your system using SYN cookies? Netstat may be able to show syncache statistics for the tcp protocol, or socket queue drops - not sure how this works on Linux. Mark -- Nokia and ATT present the 2010 Calling All Innovators-North America contest Create new apps games for the Nokia N8 for consumers in U.S. and Canada $10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store http://p.sf.net/sfu/nokia-dev2dev ___ AMaViS-user mailing list AMaViS-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/amavis-user Please visit http://www.ijs.si/software/amavisd/ regularly For administrativa requests please send email to rainer at openantivirus dot org
Re: [AMaViS-user] amavis proxy problems
On 29.10.2010 16:16, Mark Martinec wrote: 100 instances is probably better for such a setup than 50. Did you enable the: -o smtpd_proxy_options=speed_adjust on the postfix smtpd service on port 25? (Postfix 2.7.0 or later is required). This is practically a must for a pre-queue setup with any substantial mail traffic. Oops... Didn't know about that one... I've just added that to master.cf on all the servers and also added it for smtps and submission on smtp servers for customers. What does amavisd-nanny show? Are all amavisd processes busy most of the time, or is it common that some processes are idle? If all are busy persistently, that would explain the situation. Is there a way to measure socket backlog queue length on amavisd socket on your system (perhaps with netstat), or is your system using SYN cookies? Netstat may be able to show syncache statistics for the tcp protocol, or socket queue drops - not sure how this works on Linux. Nanny shows that the most of the instances are idle and according to stats on zabbix it seems that ever since the change in the number of instances (50 - 100) the lowest 'idle instance' value on all the servers was 66 on one of the servers, 70 on another one and so on... I don't think that's an issue. SYN cookies are enabled on the servers, max_syn_backlog is set to 1k, iptables is disabled, selinux is permissive for the time being... -- BR, Rok -- Nokia and ATT present the 2010 Calling All Innovators-North America contest Create new apps games for the Nokia N8 for consumers in U.S. and Canada $10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store http://p.sf.net/sfu/nokia-dev2dev ___ AMaViS-user mailing list AMaViS-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/amavis-user Please visit http://www.ijs.si/software/amavisd/ regularly For administrativa requests please send email to rainer at openantivirus dot org