Re: [AMaViS-user] amavis proxy problems

2010-11-16 Thread Mark Martinec
Rok,

 Finally figured out what was the problem - penpals. Apparently that part
 of the code isn't optimised for my kind of usage usage (pgsql backend,
 table partitions, ...), I'm not quite sure why, but after disabling
 penpals on all the gateways (~8 server) the load on the SQL server
 dropped from 6 to 3 and there were no errors ever since... For now it'll
 be fine, guess we'll have to do the db redesign sometime soon :S

Btw, which version of PostgreSQL was that?

Check your %sql_clause settings, the current default is:

'sel_penpals' =  # no message-id references list
  SELECT msgs.time_num, msgs.mail_id, subject.
   FROM msgs JOIN msgrcpt USING (partition_tag,mail_id).
   WHERE sid=? AND rid=? AND msgs.content!='V' AND ds='P'.
   ORDER BY msgs.time_num DESC,  # LIMIT 1

'sel_penpals_msgid' =  # with a nonempty list of message-id references
  SELECT msgs.time_num, msgs.mail_id, subject, message_id, rid.
   FROM msgs JOIN msgrcpt USING (partition_tag,mail_id).
   WHERE sid=? AND msgs.content!='V' AND ds='P' AND message_id IN (%m).
 AND rid!=sid.
   ORDER BY rid=? DESC, msgs.time_num DESC,  # LIMIT 1

The $sql_clause{sel_penpals used} used to have a coalesce() function in
the expression, which had painful effects on query times. It is gone now
in 2.7.0-pre (not sure exactly in which pre-release I dropped it).

You could try manually analyzing how a penpals query is evaluated
by SQL, which may shed some light. Are you using weekly partition tags?
Perhaps adding some index may help.

  Mark

--
Beautiful is writing same markup. Internet Explorer 9 supports
standards for HTML5, CSS3, SVG 1.1,  ECMAScript5, and DOM L2  L3.
Spend less time writing and  rewriting code and more time creating great
experiences on the web. Be a part of the beta today
http://p.sf.net/sfu/msIE9-sfdev2dev
___
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net 
https://lists.sourceforge.net/lists/listinfo/amavis-user 
 Please visit http://www.ijs.si/software/amavisd/ regularly
 For administrativa requests please send email to rainer at openantivirus dot 
org


Re: [AMaViS-user] amavis proxy problems

2010-11-11 Thread Rok Potočnik
Finally figured out what was the problem - penpals. Apparently that part 
of the code isn't optimised for my kind of usage usage (pgsql backend, 
table partitions, ...), I'm not quite sure why, but after disabling 
penpals on all the gateways (~8 server) the load on the SQL server 
dropped from 6 to 3 and there were no errors ever since... For now it'll 
be fine, guess we'll have to do the db redesign sometime soon :S

-- 
BR, Rok


--
Centralized Desktop Delivery: Dell and VMware Reference Architecture
Simplifying enterprise desktop deployment and management using
Dell EqualLogic storage and VMware View: A highly scalable, end-to-end
client virtualization framework. Read more!
http://p.sf.net/sfu/dell-eql-dev2dev
___
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net 
https://lists.sourceforge.net/lists/listinfo/amavis-user 
 Please visit http://www.ijs.si/software/amavisd/ regularly
 For administrativa requests please send email to rainer at openantivirus dot 
org


Re: [AMaViS-user] amavis proxy problems

2010-11-02 Thread Mark Martinec
Rok,

 Well... that didn't work out as planned... the smtp server keeps
 deferring the email even though the system load is quite low and
 according to zabbix, an average of 99.24 instances of amavis is free in
 the last 12 hours (min is 95 and max is 100), the system load is 0.70...
 [...]
 I've changed the $log_level to 5 for the time being... Hopefully I'll be
 able to do some debugging today as the server isn't used as much due to
 the holiday...

I'm not aware that such behaviour has ever been reported, so I can only
guess.

Perhaps the first thing to verify is: try telneting (repeatedly) to
amavisd port and see if you always get a greeting message soon,
or are you stuck on most attempts.

Handling of accept(2) on sockets and locking/serialization is entirely
under control of a Net::Server module. Below is a patch to add logging
to Net/Server/PreForkSimple.pm, so that we could see at which point
a child process is stuck. These log lines would appear in the amavis
log at level 5, e.g.: 

 amavisd.conf:
$log_level = 5;
$DO_SYSLOG = 1;
$syslog_ident = 'amavis';
$syslog_facility = 'user';

 syslog.conf:
user.notice   /var/log/amavisd.log
user.info /var/log/amavisd-info.log
user.debug,mail.info  /var/log/amavisd-debug.log

$ tail -f /var/log/amavisd-debug.log | fgrep 'Net::Server: accept:'



--- Net/Server/PreForkSimple.pm~2010-05-05 05:01:58.0 +0200
+++ Net/Server/PreForkSimple.pm 2010-11-02 12:40:11.0 +0100
@@ -248,24 +248,28 @@
 
   ### serialize the child accepts
   if( $prop-{serialize} eq 'flock' ){
+$self-log(4,accept: waiting on a flock);
 while (! flock($prop-{lock_fh}, Fcntl::LOCK_EX())) {
   next if $! == EINTR;
   $self-fatal(Couldn't get lock on file \$prop-{lock_file}\ [$!]);
 }
 
   }elsif( $prop-{serialize} eq 'semaphore' ){
+$self-log(4,accept: waiting on a semaphore);
 $prop-{sem}-op( 0, -1, IPC::SysV::SEM_UNDO() )
   || $self-fatal(Semaphore Error [$!]);
 
   }elsif( $prop-{serialize} eq 'pipe' ){
+$self-log(4,accept: waiting on a pipe read);
 scalar _WAITING; # read one line - kernel says who gets it
   }
 
 
+  $self-log(4,accept: lock acquired, doing an accept);
   ### now do the accept method
   my $accept_val = $self-SUPER::accept();
 
-
+  $self-log(4,accept: accept done, releasing a lock);
   ### unblock serialization
   if( $prop-{serialize} eq 'flock' ){
 flock($prop-{lock_fh}, Fcntl::LOCK_UN());
@@ -278,6 +282,7 @@
 print _READY Next!\n;
   }
 
+  $self-log(4,accept: lock released);
   ### return our success
   return $accept_val;
 



One other thing you may try: replace the flock serialization on a
socket with a semaphore. It requires modules IPC::SysV and IPC::Semaphore.
To let amavisd use it, search for 'serialize' in amavisd and change:
  serialize  = 'flock', # flock, semaphore, pipe
into a:
  serialize  = 'semaphore',

On a normal system both mechanisms achieve the same and cost about
the same.

  Mark

--
Nokia and ATT present the 2010 Calling All Innovators-North America contest
Create new apps  games for the Nokia N8 for consumers in  U.S. and Canada
$10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing
Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store 
http://p.sf.net/sfu/nokia-dev2dev
___
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net 
https://lists.sourceforge.net/lists/listinfo/amavis-user 
 Please visit http://www.ijs.si/software/amavisd/ regularly
 For administrativa requests please send email to rainer at openantivirus dot 
org


Re: [AMaViS-user] amavis proxy problems

2010-11-02 Thread Mark Martinec
Rok,

 Oct 28 15:30:26 smtp-1 postfix/smtpd[3215]: warning: timeout talking
   to proxy 127.0.0.1:10024
 Oct 28 15:30:26 smtp-1 postfix/smtpd[3215]: proxy-reject: 
   END-OF-MESSAGE: 451 4.3.0 Error: queue file write error;

... on the other hand, it may just be that amavisd (SpamAssassin) took
more than the smtpd_proxy_timeout seconds (default 100) for processing
a message. Check the log and see if there are entries that took more
than 10 ms for processing - these would certainly result in a
queue file write error. Make sure the $child_timeout in amavisd.conf
is below 100 seconds - I'd set it to 45 or thereabout.

Don't know if these are rare and occasional failures that you noticed,
or do they happen regularly. On my previous posting I concentrated on
socket serialization - which may or may not be what is happening here.
If there is a lock/accept/select problem on accepting a socket, this
would probably result in hang of all amavisd processes: most would
be idling, but none willing to accept a connection. A stuck telnet
attempt with many/all processes idling would be a symptom of such state.
I have a more extensive patch to instrument the Net::Server with
additional logging, if we decide that this may be the problem.
But let's first rule out the more mundane possibility, as described
above.

  Mark

--
Nokia and ATT present the 2010 Calling All Innovators-North America contest
Create new apps  games for the Nokia N8 for consumers in  U.S. and Canada
$10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing
Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store 
http://p.sf.net/sfu/nokia-dev2dev
___
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net 
https://lists.sourceforge.net/lists/listinfo/amavis-user 
 Please visit http://www.ijs.si/software/amavisd/ regularly
 For administrativa requests please send email to rainer at openantivirus dot 
org


Re: [AMaViS-user] amavis proxy problems

2010-11-01 Thread Rok Potočnik
On 29.10.2010 17:02, Rok Potočnik wrote:
 On 29.10.2010 16:16, Mark Martinec wrote:
 100 instances is probably better for such a setup than 50.

 Did you enable the:

 -o smtpd_proxy_options=speed_adjust

 on the postfix smtpd service on port 25?  (Postfix 2.7.0 or later
 is required). This is practically a must for a pre-queue setup
 with any substantial mail traffic.

 Oops... Didn't know about that one... I've just added that to master.cf
 on all the servers and also added it for smtps and submission on smtp
 servers for customers.

 What does amavisd-nanny show? Are all amavisd processes busy
 most of the time, or is it common that some processes are idle?
 If all are busy persistently, that would explain the situation.

 Is there a way to measure socket backlog queue length on amavisd socket
 on your system (perhaps with netstat), or is your system using SYN cookies?
 Netstat may be able to show syncache statistics for the tcp protocol,
 or socket queue drops - not sure how this works on Linux.

 Nanny shows that the most of the instances are idle and according to
 stats on zabbix it seems that ever since the change in the number of
 instances (50 -  100) the lowest 'idle instance' value on all the
 servers was 66 on one of the servers, 70 on another one and so on... I
 don't think that's an issue. SYN cookies are enabled on the servers,
 max_syn_backlog is set to 1k, iptables is disabled, selinux is
 permissive for the time being...


Well... that didn't work out as planned... the smtp server keeps 
deferring the email even though the system load is quite low and 
according to zabbix, an average of 99.24 instances of amavis is free in 
the last 12 hours (min is 95 and max is 100), the system load is 0.70...

---
Transcript of session follows.

  Out: 220 smtp-1.x.net ESMTP
  In:  HELO rob
  Out: 250 smtp-1.t-2.net
  In:  MAIL FROM: rob...@x.net
  Out: 250 2.1.0 Ok
  In:  RCPT TO: x...@xxx.xx
  Out: 250 2.1.5 Ok
  In:  DATA
  Out: 354 End data with CRLF.CRLF
  Out: 451 4.3.0 Error: queue file write error

Session aborted, reason: lost connection

For other details, see the local mail logfile
---

I've changed the $log_level to 5 for the time being... Hopefully I'll be 
able to do some debugging today as the server isn't used as much due to 
the holiday...

-- 
BR, Rok

--
Nokia and ATT present the 2010 Calling All Innovators-North America contest
Create new apps  games for the Nokia N8 for consumers in  U.S. and Canada
$10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing
Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store 
http://p.sf.net/sfu/nokia-dev2dev
___
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net 
https://lists.sourceforge.net/lists/listinfo/amavis-user 
 Please visit http://www.ijs.si/software/amavisd/ regularly
 For administrativa requests please send email to rainer at openantivirus dot 
org

Re: [AMaViS-user] amavis proxy problems

2010-10-29 Thread Mark Martinec
Rok,

 I'm using postfix 2.7.1 with amavis 2.7.0 as a proxy. After some clients
 reported errors sending email, I've noticed postfix is sometimes unable
 to relay email to amavis due to 'queue file write error'. I'm kinda
 clueless, why... Unfortunately I can't see no logs at that time in the
 maillog. Due to this being an active server ( ~100 msgs/minute) I don't
 dare starting amavis in debug mode.
 
 Oct 28 15:30:26 smtp-1 postfix/smtpd[3215]: warning: timeout talking to
 proxy 127.0.0.1:10024
 Oct 28 15:30:26 smtp-1 postfix/smtpd[3215]: proxy-reject:
 END-OF-MESSAGE: 451 4.3.0 Error: queue file write error; from=a...@b.c
 to=d...@e.f proto=SMTP helo=hhh
 
 I belive the hardware itself is adequate - ibm hs22 blade, dual quad
 core xeon with ht, 24GB ram, it's running Centos 5.5 x64. First I
 thought there aren't enough running processes of amavis, so now I'm
 running 100 instances (was set to 50 before), but guess that didn't fix
 the problem.

100 instances is probably better for such a setup than 50.

Did you enable the:

  -o smtpd_proxy_options=speed_adjust

on the postfix smtpd service on port 25?  (Postfix 2.7.0 or later
is required). This is practically a must for a pre-queue setup
with any substantial mail traffic.

What does amavisd-nanny show? Are all amavisd processes busy
most of the time, or is it common that some processes are idle?
If all are busy persistently, that would explain the situation.

Is there a way to measure socket backlog queue length on amavisd socket
on your system (perhaps with netstat), or is your system using SYN cookies?
Netstat may be able to show syncache statistics for the tcp protocol,
or socket queue drops - not sure how this works on Linux.

  Mark

--
Nokia and ATT present the 2010 Calling All Innovators-North America contest
Create new apps  games for the Nokia N8 for consumers in  U.S. and Canada
$10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing
Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store 
http://p.sf.net/sfu/nokia-dev2dev
___
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net 
https://lists.sourceforge.net/lists/listinfo/amavis-user 
 Please visit http://www.ijs.si/software/amavisd/ regularly
 For administrativa requests please send email to rainer at openantivirus dot 
org


Re: [AMaViS-user] amavis proxy problems

2010-10-29 Thread Rok Potočnik
On 29.10.2010 16:16, Mark Martinec wrote:
 100 instances is probably better for such a setup than 50.

 Did you enable the:

-o smtpd_proxy_options=speed_adjust

 on the postfix smtpd service on port 25?  (Postfix 2.7.0 or later
 is required). This is practically a must for a pre-queue setup
 with any substantial mail traffic.

Oops... Didn't know about that one... I've just added that to master.cf 
on all the servers and also added it for smtps and submission on smtp 
servers for customers.

 What does amavisd-nanny show? Are all amavisd processes busy
 most of the time, or is it common that some processes are idle?
 If all are busy persistently, that would explain the situation.

 Is there a way to measure socket backlog queue length on amavisd socket
 on your system (perhaps with netstat), or is your system using SYN cookies?
 Netstat may be able to show syncache statistics for the tcp protocol,
 or socket queue drops - not sure how this works on Linux.

Nanny shows that the most of the instances are idle and according to 
stats on zabbix it seems that ever since the change in the number of 
instances (50 - 100) the lowest 'idle instance' value on all the 
servers was 66 on one of the servers, 70 on another one and so on... I 
don't think that's an issue. SYN cookies are enabled on the servers, 
max_syn_backlog is set to 1k, iptables is disabled, selinux is 
permissive for the time being...

-- 
BR, Rok


--
Nokia and ATT present the 2010 Calling All Innovators-North America contest
Create new apps  games for the Nokia N8 for consumers in  U.S. and Canada
$10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing
Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store 
http://p.sf.net/sfu/nokia-dev2dev
___
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net 
https://lists.sourceforge.net/lists/listinfo/amavis-user 
 Please visit http://www.ijs.si/software/amavisd/ regularly
 For administrativa requests please send email to rainer at openantivirus dot 
org