Justin Erenkrantz wrote:
I have these two corefiles set aside and can do any examination folks would
like to see on it. I don't know enough about Perl's internal structure to do
much good by myself.
You'll want to use Perl_sv_dump() instead of just trying to print the
pointer directly (you'll get more useful details). That being said,
there is no similarity between the three backtraces you have sent, so it
seems likely to me that you are not running into one problem but a host
of them. The first one seemed likely to be a signal race condition; the
second looks awfully suspicious:
#0 Perl_malloc (nbytes=672735716) at malloc.c:1514
That's roughly 641MB in a single request, which is almost certainly more
than Perl should be asking for. The third one seems to be an attempt to
clear a null SV. I
don't know that access to the core files would be much help because by
the time you get to the segfault itself, the damage has been done way
upstream (and it is hard to research those).
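For anyone checking the arithmetic, the 641MB figure is just the nbytes
value from frame #0 divided by 2^20. A quick sketch in Python (the
variable names are mine, nothing here comes from the core files):

```python
# Size requested in frame #0 of the second backtrace.
nbytes = 672735716

# Binary megabytes (MiB): divide by 2**20.
mib = nbytes / 2**20

# Decimal megabytes (MB): divide by 10**6.
mb = nbytes / 10**6

print(f"{nbytes} bytes is about {mib:.1f} MiB ({mb:.1f} MB)")
```

Either way you count it, that is far beyond what handling a single
message should need.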
However, my professional recommendation is to add at least 2 if not 4
other MX boxes and share the load a little. The zombie army tends to
get fixated on a single IP even with multiple MX records, but the legit
mail will load balance much better (and if you segfault handling spam,
too bad ;-). You may just be trying to push too much traffic through a
single qpsmtpd instance.
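To illustrate why legit mail spreads out: a well-behaved sending MTA
sorts your MX records by preference and is expected to randomize among
hosts that share the same preference (RFC 5321, section 5.1). The
following is only a rough sketch of that selection logic in Python, not
code from qpsmtpd or any real MTA; the hostnames and the function names
are made up for the example.

```python
import random
from collections import defaultdict


def delivery_order(mx_records, rng=random):
    """Return hostnames in the order a compliant sender might try them.

    mx_records is a list of (preference, hostname) tuples. Lower
    preference values are tried first; hosts that share a preference
    are shuffled so the load is spread across them.
    """
    by_pref = defaultdict(list)
    for pref, host in mx_records:
        by_pref[pref].append(host)

    order = []
    for pref in sorted(by_pref):
        hosts = list(by_pref[pref])
        rng.shuffle(hosts)  # randomize among equal-preference hosts
        order.extend(hosts)
    return order


def first_choice_counts(mx_records, trials=10000, seed=0):
    """Count how often each host ends up first across many simulated senders."""
    rng = random.Random(seed)
    counts = defaultdict(int)
    for _ in range(trials):
        counts[delivery_order(mx_records, rng)[0]] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical records: two equal-preference boxes plus a backup.
    records = [
        (10, "mx1.example.com"),
        (10, "mx2.example.com"),
        (20, "backup.example.com"),
    ]
    print(first_choice_counts(records))
```

With two hosts at the same preference, each should see roughly half of
the first attempts, and the backup is only tried after both have failed.
Spam zombies that ignore this and hammer a single IP are exactly the
traffic you care least about.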
Our current configuration is two equal-distance MX boxes running
equivalent configurations (including virus scanning, blacklist and
address validation). Once the message has been accepted for delivery
(i.e. not blacklisted or infected), the message is relayed to the actual
mail server via qmail-qmqpc with a fallback to qmail-queue/smtproutes.
We aren't handling nearly the same volume you are, but our MX boxes
are Cobalt RaQ3s (barely 450MHz Pentium-equivalent) and they are quite
happily chugging along with loads below 0.5.
If you want more details, e-mail me directly and I'll tell you what I am
doing...
John