Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-27 Thread Mark Martinec

Quanah,


> Again, the only difference is 2.9.0 vs 2.10.1.  I.e., we used the same
> uulib library in 2.9.0 without these issues.  Unless do_ascii was
> modified between 2.9.0 and 2.10.1 to add uulib checks, this doesn't seem
> like it would be the source.  I will go and remove it however, since it
> is known problematic. :)


do_ascii has remained essentially unchanged for a long time.


> Hm, it is a newer version of Convert::UUlib as well, however.
> So it may be related to that.

Most likely. Apart from a possible problem in the library itself, a
mismatch between the version of perl that Convert::UUlib was compiled
against and the version actually running could also cause a problem.


> I don't know if you've tested Amavis with the
> current release of Convert::UUlib.  It has had some significant
> changes recently.


I tested briefly (for a couple of hours) with uulib V0.5pl20
(through Convert::UUlib 1.50), from FreeBSD ports.
Seemed to work, but then I disabled it for production usage.


> Either way, I guess it won't matter for me in the long term now that
> do_ascii is disabled. ;)


Hope so.

  Mark


Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-27 Thread Mark Martinec

The change log to Convert::UUlib 1.50 shows:


  Revision history for Perl extension Convert::UUlib.

  1.5  Sat Jul 11 03:56:06 CEST 2015
  - fix a heap overflow (testcase by Krzysztof Wojtaś).
  - on systems that support it (posix + mmap + map_anonymous),
    allocate all dynamic areas via mmap and put four guard
    pages around them, to catch similar heap overflows
    safely in the future.
  - find a safer way to pass in CC/CFLAGS to uulib.
  - added stability canary support.


The extra protection (guard pages) is probably what is
causing your crashes: previously some heap overflow could
cause corruption and havoc without necessarily being noticed,
bringing down a process. If I understand the changelog
correctly, the new guard pages make it possible to detect
some runaway memory access in uulib and terminate the process
if this occurs, instead of letting a corruption spread.

This is a good step in guarding against security exploits:
better crash than let a leak be exploitable. Unfortunately
the violation cannot be contained, which affects apparent
stability.
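
To illustrate the guard-page behaviour described above, here is a minimal
Python sketch. It is not code from Convert::UUlib or uulib. It assumes a
Linux-like system where libc exposes mprotect; the helper name
run_guard_page_demo and the two-page layout are made up for the example.
A child process maps two anonymous pages, marks the second one inaccessible
as a stand-in for a guard page, and then writes a given number of bytes past
the end of the first page. With the guard page in place, an overrun kills
the child with a signal instead of silently corrupting neighbouring memory.

```python
import signal
import subprocess
import sys

# Child program (hypothetical illustration, not uulib code): one usable page
# followed by one inaccessible "guard" page.
CHILD = r"""
import ctypes, mmap, sys

page = mmap.PAGESIZE
overflow = int(sys.argv[1])          # bytes written past the usable page

buf = mmap.mmap(-1, 2 * page)        # anonymous, page-aligned mapping
addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))

libc = ctypes.CDLL(None, use_errno=True)
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.mprotect.restype = ctypes.c_int
PROT_NONE = 0
if libc.mprotect(addr + page, page, PROT_NONE) != 0:
    sys.exit(2)                      # could not set up the guard page

# Fill the usable page, plus `overflow` bytes that land in the guard page.
ctypes.memset(addr, 0x41, page + overflow)
"""


def run_guard_page_demo(overflow):
    """Run the child; a negative return value means it was killed by a signal."""
    proc = subprocess.run([sys.executable, "-c", CHILD, str(overflow)])
    return proc.returncode


if __name__ == "__main__":
    print("in-bounds write, exit status:", run_guard_page_demo(0))
    print("one byte too far, exit status:", run_guard_page_demo(1))
```

From the parent's point of view the overrunning child simply "goes away",
which is consistent with the symptom reported in this thread, although the
sketch does not prove that this is what happens inside uulib.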

  Mark


Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-27 Thread Quanah Gibson-Mount
--On Wednesday, January 27, 2016 4:40 PM +0100 Mark Martinec wrote:



> The change log to Convert::UUlib 1.50 shows:
>
>   Revision history for Perl extension Convert::UUlib.
>
>   1.5  Sat Jul 11 03:56:06 CEST 2015
>   - fix a heap overflow (testcase by Krzysztof Wojtaś).
>   - on systems that support it (posix + mmap + map_anonymous),
>     allocate all dynamic areas via mmap and put four guard
>     pages around them, to catch similar heap overflows
>     safely in the future.
>   - find a safer way to pass in CC/CFLAGS to uulib.
>   - added stability canary support.
>
> The extra protection (guard pages) is probably what is
> causing your crashes: previously some heap overflow could
> cause corruption and havoc without necessarily being noticed,
> bringing down a process. If I understand the changelog
> correctly, the new guard pages make it possible to detect
> some runaway memory access in uulib and terminate the process
> if this occurs, instead of letting a corruption spread.
>
> This is a good step in guarding against security exploits:
> better crash than let a leak be exploitable. Unfortunately
> the violation cannot be contained, which affects apparent
> stability.


Great, thanks Mark!

--Quanah


--

Quanah Gibson-Mount
Platform Architect
Zimbra, Inc.

Zimbra ::  the leader in open source messaging and collaboration


Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-27 Thread Quanah Gibson-Mount
--On Wednesday, January 27, 2016 4:26 PM +0100 Mark Martinec wrote:



> Quanah,
>
> > Again, the only difference is 2.9.0 vs 2.10.1.  I.e., we used the same
> > uulib library in 2.9.0 without these issues.  Unless do_ascii was
> > modified between 2.9.0 and 2.10.1 to add uulib checks, this doesn't seem
> > like it would be the source.  I will go and remove it however, since it
> > is known problematic. :)
>
> do_ascii has remained essentially unchanged for a long time.


It was blindly enabled by someone back in 2005 or earlier (our commit logs 
only go back to 2005. ;) ).


--Quanah





Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-27 Thread Mark Martinec

Quanah,

> We recently updated to Amavisd 2.10.1 from 2.9.0 internally, and have
> found that amavisd constantly dies while processing messages after being
> put under a moderate load in our QA environment.
>
> Jan 19 06:57:52 zqa-211 amavis-services[18544]: PID 13724 went away, 13724-01


The process crashed. This is typically caused by a perl module with
embedded C code, or one linked to an external library, crashing the
perl process. A less likely cause is running into some resource limit
that perl does not handle.

> Jan 21 07:01:24 zqa-211 amavis[4970]: (04970-01-6) get_deadline do_ascii_pre - deadline in 479.9 s, set to 288.000 s
> Jan 21 07:01:24 zqa-211 amavis[4970]: (04970-01-6) prolong_timer do_ascii_pre: timer 288, was 0, deadline in 479.9 s
>
> Then we see (all together):
> Jan 21 07:01:34 zqa-211 amavis-services[6954]: PID 3480 went away, 03480-01-25
> Jan 21 07:01:34 zqa-211 amavis-services[6954]: PID 4970 went away, 04970-01-6
> Jan 21 07:01:34 zqa-211 amavis-services[6954]: PID 2609 went away, 02609-01-31
> Jan 21 07:01:36 zqa-211 amavis-services[6954]: PID 5406 went away, 05406-01
> Jan 21 07:01:38 zqa-211 amavis-services[6954]: PID 5416 went away, 05416-03
> Jan 21 07:01:38 zqa-211 amavis-services[6954]: PID 5421 went away, 05421-01
>
> I.e., every single one of the above processes are in the same function.


There you go, the problem must be in do_ascii, which calls Convert::UUlib,
which in turn uses the uulib library, which has been known to cause
crashes in the past. It is an ancient library, poorly maintained.

do_ascii has been removed (commented out) from the default @decoders
list (I believe in amavisd 2.9.0). I suggest removing it from your
@decoders list in the config file; it causes more grief than it is worth.

  Mark



Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-27 Thread Quanah Gibson-Mount
--On Wednesday, January 27, 2016 3:21 PM +0100 Mark Martinec wrote:



> There you go, the problem must be in do_ascii, which calls Convert::UUlib,
> which in turn uses the uulib library, which has been known to cause
> crashes in the past. It is an ancient library, poorly maintained.
>
> do_ascii has been removed (commented out) from the default @decoders
> list (I believe in amavisd 2.9.0). I suggest removing it from your
> @decoders list in the config file; it causes more grief than it is worth.


Again, the only difference is 2.9.0 vs 2.10.1.  I.e., we used the same 
uulib library in 2.9.0 without these issues.  Unless do_ascii was modified 
between 2.9.0 and 2.10.1 to add uulib checks, this doesn't seem like it 
would be the source.  I will go and remove it however, since it is known 
problematic. :)


--Quanah




Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-27 Thread Quanah Gibson-Mount
--On Wednesday, January 27, 2016 6:40 AM -0800 Quanah Gibson-Mount wrote:

> --On Wednesday, January 27, 2016 3:21 PM +0100 Mark Martinec wrote:
>
> > There you go, the problem must be in do_ascii, which calls Convert::UUlib,
> > which in turn uses the uulib library, which has been known to cause
> > crashes in the past. It is an ancient library, poorly maintained.
> >
> > do_ascii has been removed (commented out) from the default @decoders
> > list (I believe in amavisd 2.9.0). I suggest removing it from your
> > @decoders list in the config file; it causes more grief than it is worth.
>
> Again, the only difference is 2.9.0 vs 2.10.1.  I.e., we used the same
> uulib library in 2.9.0 without these issues.  Unless do_ascii was
> modified between 2.9.0 and 2.10.1 to add uulib checks, this doesn't seem
> like it would be the source.  I will go and remove it however, since it
> is known problematic. :)


Hm, it is a newer version of Convert::UUlib as well, however.  So it may be 
related to that.  I don't know if you've tested Amavis with the current 
release of Convert::UUlib.  It has had some significant changes recently.


Either way, I guess it won't matter for me in the long term now that 
do_ascii is disabled. ;)


--Quanah




Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-25 Thread Quanah Gibson-Mount
--On Thursday, January 21, 2016 11:08 AM -0800 Quanah Gibson-Mount wrote:



> To be clear, this happens on any server (we have hundreds) if we put
> amavis under load.  They have plenty of memory, and are not running out.
> I think this is related to the changes made here:
>
> - use a perl module File::LibMagic when available, instead of spawning
>   a file(1) utility for classifying contents of mail parts.
>   By using a direct interface to a libmagic library the startup cost
>   of spawning an external process is avoided. Benchmarking shows that
>   using libmagic is significantly faster especially for checking a small
>   number of files - takes 4 ms for checking one file with libmagic
>   vs. 27 ms with a spawned file(1); based on a patch by Markus Benning;
>
> or possibly this:
>
> - adjusted some timeouts to leave more reserve for later stages of
>   mail processing and forwarding;


Switching to File::LibMagic reduced the failure rate from 6% to 3%, so that 
helped some.  It appears as though the reworking really broke the usage of 
the "file" binary vs previous versions of Amavisd.  Users be warned.


However, we continue to get a 3% error rate, which really is not 
acceptable.  In the latest run, we again see:


Jan 25 06:40:41 zqa-211 amavis[8673]: (08673-01-13) lookup_re("HTML document, ASCII text, with very long lines") matches key "(?^i:\\btext\\b)", result="asc"
Jan 25 06:40:41 zqa-211 amavis[8673]: (08673-01-13) lookup [map_full_type_to_short_type] => true,  "HTML document, ASCII text, with very long lines" matches, result="asc", matching_key="(?^i:\\btext\\b)"
Jan 25 06:40:41 zqa-211 amavis[8673]: (08673-01-13) File-type of p001: HTML document, ASCII text, with very long lines; (asc)
Jan 25 06:40:41 zqa-211 amavis[8673]: (08673-01-13) do_ascii: Decoding part p001
Jan 25 06:40:41 zqa-211 amavis[8673]: (08673-01-13) do_ascii: Setting sigaction handler, was 0
Jan 25 06:40:41 zqa-211 amavis[8673]: (08673-01-13) get_deadline do_ascii_pre - deadline in 479.8 s, set to 288.000 s
Jan 25 06:40:41 zqa-211 amavis[8673]: (08673-01-13) prolong_timer do_ascii_pre: timer 288, was 0, deadline in 479.8 s
Jan 25 06:40:52 zqa-211 amavis-services[22436]: PID 8673 went away, 08673-01-13


So there seems to be some bug in do_ascii_pre itself as well.  I'll see how 
the function has changed vs 2.9.0 next, I guess.


It would be really helpful if the amavisd source was in a publicly 
accessible SCM like github.


--Quanah





Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-25 Thread Quanah Gibson-Mount
--On Tuesday, January 26, 2016 8:19 AM +1000 Noel Butler wrote:

> you might want to CC Mark directly - he hasn't posted in here in a very
> long time, in fact Oct 2014 is last time I think when he announced that
> version, strange, since he posts regularly on postfix list we know he's
> still alive and kickin.


He's very active on SA development too, but hasn't answered my last few 
direct emails at all.  I guess I can try another one.


--Quanah



Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-25 Thread Noel Butler
Yeah, a lot of people here seem to have no luck, I think I
mentioned twice about a problem with no answer (gave up asking a third
time)


Maybe amavisd-new is going the same way as mailscanner, ie: abandonware


On 26/01/2016 08:31, Quanah Gibson-Mount wrote:

> --On Tuesday, January 26, 2016 8:19 AM +1000 Noel Butler wrote:
>
> > you might want to CC Mark directly - he hasn't posted in here in a very
> > long time, in fact Oct 2014 is last time I think when he announced that
> > version, strange, since he posts regularly on postfix list we know he's
> > still alive and kickin.
>
> He's very active on SA development too, but hasn't answered my last
> few direct emails at all.  I guess I can try another one.
>
> --Quanah





--
If you have the urge to reply to all rather than reply to list, you best
first read  http://members.ausics.net/qwerty/


Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-25 Thread Quanah Gibson-Mount
--On Tuesday, January 26, 2016 8:50 AM +1000 Noel Butler wrote:

> Yeah, a lot of people here seem to have no luck, I think I mentioned
> twice about a problem with no answer (gave up asking a third time)
>
> Maybe amavisd-new is going the same way as mailscanner, ie: abandonware


Yeah, my last email was about getting amavisd officially into a git 
repository somewhere (github maybe?) so that it could perhaps attract some 
more active development, etc.  Ability to review commits would be really 
nice too.


--Quanah



Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-25 Thread Noel Butler


you might want to CC Mark directly - he hasn't posted in here in a very
long time, in fact Oct 2014 is last time I think when he announced that
version, strange, since he posts regularly on postfix list we know he's
still alive and kickin.



On 26/01/2016 04:48, Quanah Gibson-Mount wrote:

> --On Thursday, January 21, 2016 11:08 AM -0800 Quanah Gibson-Mount wrote:
>
> > To be clear, this happens on any server (we have hundreds) if we put
> > amavis under load.  They have plenty of memory, and are not running out.
> > I think this is related to the changes made here:
> >
> > - use a perl module File::LibMagic when available, instead of spawning
> >   a file(1) utility for classifying contents of mail parts.
> >   By using a direct interface to a libmagic library the startup cost
> >   of spawning an external process is avoided. Benchmarking shows that
> >   using libmagic is significantly faster especially for checking a small
> >   number of files - takes 4 ms for checking one file with libmagic
> >   vs. 27 ms with a spawned file(1); based on a patch by Markus Benning;
> >
> > or possibly this:
> >
> > - adjusted some timeouts to leave more reserve for later stages of
> >   mail processing and forwarding;
>
> Switching to File::LibMagic reduced the failure rate from 6% to 3%, so
> that helped some.  It appears as though the reworking really broke the
> usage of the "file" binary vs previous versions of Amavisd.  Users be
> warned.
>
> However, we continue to get a 3% error rate, which really is not
> acceptable.  In the latest run, we again see:
>
> Jan 25 06:40:41 zqa-211 amavis[8673]: (08673-01-13) lookup_re("HTML document, ASCII text, with very long lines") matches key "(?^i:\\btext\\b)", result="asc"
> Jan 25 06:40:41 zqa-211 amavis[8673]: (08673-01-13) lookup [map_full_type_to_short_type] => true,  "HTML document, ASCII text, with very long lines" matches, result="asc", matching_key="(?^i:\\btext\\b)"
> Jan 25 06:40:41 zqa-211 amavis[8673]: (08673-01-13) File-type of p001: HTML document, ASCII text, with very long lines; (asc)
> Jan 25 06:40:41 zqa-211 amavis[8673]: (08673-01-13) do_ascii: Decoding part p001
> Jan 25 06:40:41 zqa-211 amavis[8673]: (08673-01-13) do_ascii: Setting sigaction handler, was 0
> Jan 25 06:40:41 zqa-211 amavis[8673]: (08673-01-13) get_deadline do_ascii_pre - deadline in 479.8 s, set to 288.000 s
> Jan 25 06:40:41 zqa-211 amavis[8673]: (08673-01-13) prolong_timer do_ascii_pre: timer 288, was 0, deadline in 479.8 s
> Jan 25 06:40:52 zqa-211 amavis-services[22436]: PID 8673 went away, 08673-01-13
>
> So there seems to be some bug in do_ascii_pre itself as well.  I'll
> see how the function has changed vs 2.9.0 next, I guess.
>
> It would be really helpful if the amavisd source was in a publicly
> accessible SCM like github.
>
> --Quanah





Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-25 Thread Noel Butler


Yes, I agree, github would be perfect

On 26/01/2016 08:56, Quanah Gibson-Mount wrote:

> --On Tuesday, January 26, 2016 8:50 AM +1000 Noel Butler wrote:
>
> > Yeah, a lot of people here seem to have no luck, I think I mentioned
> > twice about a problem with no answer (gave up asking a third time)
> >
> > Maybe amavisd-new is going the same way as mailscanner, ie: abandonware
>
> Yeah, my last email was about getting amavisd officially into a git
> repository somewhere (github maybe?) so that it could perhaps attract
> some more active development, etc.  Ability to review commits would be
> really nice too.
>
> --Quanah



Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-21 Thread Quanah Gibson-Mount
--On Tuesday, January 19, 2016 10:39 PM -0700 Thomas Spuhler wrote:

> maybe worthwhile to look if it uses the daemonized version of clam (clamd
> not clamav)


Thanks for the thought, but this doesn't appear to be the issue (See my 
reply to Ben coming in shortly. ;) ).


--Quanah




Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-21 Thread Quanah Gibson-Mount
--On Tuesday, January 19, 2016 8:04 PM -0500 listsb-ama...@bitrate.net wrote:

> On Jan 19, 2016, at 18.09, Quanah Gibson-Mount wrote:
>
> > amavis-services[18544]: PID 13724 went away, 13724-01
>
> does $log_level = 5 reveal any additional clues about what happened to
> the process?


It looks like it's related to the recent changes around the use of the 
"file" binary.  Every process that "goes away" occurs here:


Jan 21 07:01:24 zqa-211 amavis[5414]: (04970-01-6) open_on_specific_fd: target fd0 closing, to become < /dev/null
Jan 21 07:01:24 zqa-211 amavis[5414]: (04970-01-6) open_on_specific_fd: target fd1 closing, to become (65) &=25
Jan 21 07:01:24 zqa-211 amavis[5414]: (04970-01-6) open_on_specific_fd: target fd1 dup2 from fd25 (65) &=25
Jan 21 07:01:24 zqa-211 amavis[5414]: (04970-01-6) open_on_specific_fd: source fd25 closed
Jan 21 07:01:24 zqa-211 amavis[5414]: (04970-01-6) open_on_specific_fd: target fd2 closing, to become (65) &1
Jan 21 07:01:24 zqa-211 amavis[5414]: (04970-01-6) open_on_specific_fd: target fd2 dup2 from fd1 (65) &1
Jan 21 07:01:24 zqa-211 amavis[4970]: (04970-01-6) result line from file(1): p001: HTML document, ASCII text, with very long lines\n
Jan 21 07:01:24 zqa-211 amavis[4970]: (04970-01-6) lookup_re("HTML document, ASCII text, with very long lines") matches key "(?^i:\\btext\\b)", result="asc"
Jan 21 07:01:24 zqa-211 amavis[4970]: (04970-01-6) lookup [map_full_type_to_short_type] => true,  "HTML document, ASCII text, with very long lines" matches, result="asc", matching_key="(?^i:\\btext\\b)"
Jan 21 07:01:24 zqa-211 amavis[4970]: (04970-01-6) File-type of p001: HTML document, ASCII text, with very long lines; (asc)
Jan 21 07:01:24 zqa-211 amavis[4970]: (04970-01-6) do_ascii: Decoding part p001
Jan 21 07:01:24 zqa-211 amavis[4970]: (04970-01-6) do_ascii: Setting sigaction handler, was 0
Jan 21 07:01:24 zqa-211 amavis[4970]: (04970-01-6) get_deadline do_ascii_pre - deadline in 479.9 s, set to 288.000 s
Jan 21 07:01:24 zqa-211 amavis[4970]: (04970-01-6) prolong_timer do_ascii_pre: timer 288, was 0, deadline in 479.9 s



Then we see (all together):
Jan 21 07:01:34 zqa-211 amavis-services[6954]: PID 3480 went away, 03480-01-25
Jan 21 07:01:34 zqa-211 amavis-services[6954]: PID 4970 went away, 04970-01-6
Jan 21 07:01:34 zqa-211 amavis-services[6954]: PID 2609 went away, 02609-01-31
Jan 21 07:01:36 zqa-211 amavis-services[6954]: PID 5406 went away, 05406-01
Jan 21 07:01:38 zqa-211 amavis-services[6954]: PID 5416 went away, 05416-03
Jan 21 07:01:38 zqa-211 amavis-services[6954]: PID 5421 went away, 05421-01

I.e., every single one of the above processes are in the same function.
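
The correlation described above can be automated. The following is a
minimal Python sketch, assuming the syslog layout shown in this thread with
each entry on a single line. The function name last_message_before_exit and
the regular expressions are made up for the example and are not part of
amavisd. It keys on the PID in "amavis[...]" rather than on the log id in
parentheses, because the forked file(1) helper logs under the parent's log
id with a different PID.

```python
import re

# "amavis[PID]: (log-id) message" lines written by the amavisd children.
AMAVIS_RE = re.compile(r"amavis\[(\d+)\]: \(([^)]+)\) (.*)")
# "PID N went away" lines written by the supervising amavis-services process.
GONE_RE = re.compile(r"amavis-services\[\d+\]: PID (\d+) went away")


def last_message_before_exit(lines):
    """Map each PID that 'went away' to the last message it logged, or None."""
    last_seen = {}
    result = {}
    for line in lines:
        line = line.rstrip("\n")
        gone = GONE_RE.search(line)
        if gone:
            pid = int(gone.group(1))
            result[pid] = last_seen.get(pid)
            continue
        entry = AMAVIS_RE.search(line)
        if entry:
            last_seen[int(entry.group(1))] = entry.group(3)
    return result


# Sample lines copied from the log excerpt in this message.
SAMPLE = [
    "Jan 21 07:01:24 zqa-211 amavis[4970]: (04970-01-6) do_ascii: Decoding part p001",
    "Jan 21 07:01:24 zqa-211 amavis[4970]: (04970-01-6) do_ascii: Setting sigaction handler, was 0",
    "Jan 21 07:01:24 zqa-211 amavis[4970]: (04970-01-6) prolong_timer do_ascii_pre: timer 288, was 0, deadline in 479.9 s",
    "Jan 21 07:01:34 zqa-211 amavis-services[6954]: PID 4970 went away, 04970-01-6",
]

if __name__ == "__main__":
    for pid, message in last_message_before_exit(SAMPLE).items():
        print(pid, "->", message)
```

Run against a full log at $log_level = 5, a result dominated by one kind of
final message would point at the stage where the children die.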


> It's anecdotal, but, on a handful of occasions, we have had our mail
> server use up all of its memory, and iirc, it seemed that amavis had
> trouble handling that elegantly, and troubleshooting was a little
> obscure.  most recently, the culprit was something wrt razor servers
> changing [hostname, ip address, or such], which caused amavis children
> to get stuck.

To be clear, this happens on any server (we have hundreds) if we put amavis 
under load.  They have plenty of memory, and are not running out.  I think 
this is related to the changes made here:


- use a perl module File::LibMagic when available, instead of spawning
 a file(1) utility for classifying contents of mail parts.
 By using a direct interface to a libmagic library the startup cost
 of spawning an external process is avoided. Benchmarking shows that
 using libmagic is significantly faster especially for checking a small
 number of files - takes 4 ms for checking one file with libmagic
 vs. 27 ms with a spawned file(1); based on a patch by Markus Benning;

or possibly this:


- adjusted some timeouts to leave more reserve for later stages of
 mail processing and forwarding;


--Quanah




Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-19 Thread listsb-amavis
On Jan 19, 2016, at 18.09, Quanah Gibson-Mount  wrote:
> 
> We recently updated to Amavisd 2.10.1 from 2.9.0 internally, and have found 
> that amavisd constantly dies while processing messages after being put under 
> a moderate load in our QA environment.
> 
> For example, here is postfix passing off the email to amavisd:
> 
> Jan 19 06:57:42 zqa-211 postfix/smtp[24983]: 84B33102BD3: 
> to=, relay=127.0.0.1[127.0.0.1]:10024, 
> delay=32179, delays=32178/0.01/0.01/0.22, dsn=4.4.2, status=
> deferred (lost connection with 127.0.0.1[127.0.0.1] while sending end of data 
> -- message may be sent more than once)
> 
> Here we can see amavis accept it, and then die:
> Jan 19 06:57:42 zqa-211 amavis[13724]: (13724-01) ESMTP [127.0.0.1]:10024 
> /opt/zimbra/data/amavisd/tmp/amavis-20160119T065742-13724-gste5uOH: 
>  ->  211.eng.zimbra.com> SIZE=23143 Received: from zqa-211.eng.zimbra.com 
> ([127.0.0.1]) by localhost (zqa-211.eng.zimbra.com [127.0.0.1]) (amavisd-new, 
> port 10024) with ESMTP for  1.eng.zimbra.com>; Tue, 19 Jan 2016 06:57:42 -0800 (PST)
> Jan 19 06:57:42 zqa-211 amavis[13724]: (13724-01) Checking: iugKoVQZWTPd 
> [10.15.32.142]  -> 
> 
> Jan 19 06:57:52 zqa-211 amavis-services[18544]: PID 13724 went away, 13724-01

does $log_level = 5 reveal any additional clues about what happened to the 
process?  perhaps an strace might as well?  it's anecdotal, but, on a handful 
of occasions, we have had our mail server use up all of its memory, and iirc, 
it seemed that amavis had trouble handling that elegantly, and troubleshooting 
was a little obscure.  most recently, the culprit was something wrt razor 
servers changing [hostname, ip address, or such], which caused amavis children 
to get stuck.

-ben

Re: Amavis 2.10.1 dies and is unusable when put under moderate load

2016-01-19 Thread Thomas Spuhler
On Tuesday, January 19, 2016 08:04:50 PM listsb-ama...@bitrate.net wrote:
> On Jan 19, 2016, at 18.09, Quanah Gibson-Mount  wrote:
> > We recently updated to Amavisd 2.10.1 from 2.9.0 internally, and have
> > found that amavisd constantly dies while processing messages after being
> > put under a moderate load in our QA environment.
> > 
> > For example, here is postfix passing off the email to amavisd:
> > 
> > Jan 19 06:57:42 zqa-211 postfix/smtp[24983]: 84B33102BD3:
> > to=, relay=127.0.0.1[127.0.0.1]:10024,
> > delay=32179, delays=32178/0.01/0.01/0.22, dsn=4.4.2, status= deferred
> > (lost connection with 127.0.0.1[127.0.0.1] while sending end of data --
> > message may be sent more than once)
> > 
> > Here we can see amavis accept it, and then die:
> > Jan 19 06:57:42 zqa-211 amavis[13724]: (13724-01) ESMTP [127.0.0.1]:10024
> > /opt/zimbra/data/amavisd/tmp/amavis-20160119T065742-13724-gste5uOH:
> >  -> 
> > SIZE=23143 Received: from zqa-211.eng.zimbra.com ([127.0.0.1]) by
> > localhost (zqa-211.eng.zimbra.com [127.0.0.1]) (amavisd-new, port 10024)
> > with ESMTP for ; Tue, 19 Jan 2016
> > 06:57:42 -0800 (PST)
> > Jan 19 06:57:42 zqa-211 amavis[13724]: (13724-01) Checking: iugKoVQZWTPd
> > [10.15.32.142]  ->
> >  Jan 19 06:57:52 zqa-211
> > amavis-services[18544]: PID 13724 went away, 13724-01
> does $log_level = 5 reveal any additional clues about what happened to the
> process?  perhaps an strace might as well?  it's anecdotal, but, on a
> handful of occasions, we have had our mail server use up all of its memory,
> and iirc, it seemed that amavis had trouble handling that elegantly, and
> troubleshooting was a little obscure.  most recently, the culprit was
> something wrt razor servers changing [hostname, ip address, or such], which
> caused amavis children to get stuck.
> 
> -ben

maybe worthwhile to look if it uses the daemonized version of clam (clamd not
clamav)

-- 
Best regards
Thomas Spuhler

All of my e-mails have a valid digital signature
ID 60114E63
