On Fri, 27 Jan 2012, Kris Deugau wrote:
> Every so often, one of our spamd instances gets locked up when a burst of
> messages with "lots" (150-200K+) of body text gets passed in.
> If we catch this happening, restarting spamd seems to clear up whatever gets
> deadlocked. Otherwise, it typically takes 10-15 minutes to get unlocked, and
> then there's a big burst of processing as the backlog clears.
But it does eventually recover?
Sounds like you're hitting swap. When that happens things *really* bog
down.
How much memory do you have, and how many max spamd children are defined?
Can you capture "top" or other process stats while this is happening?
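
If it helps to have something logging while you wait for the next lockup, here is a minimal sketch of a swap watcher. It assumes a Linux box with /proc/meminfo; the function names, the sample count and the interval are all made up for illustration and have nothing to do with SpamAssassin itself.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_kb(info):
    """Swap in use, in kB, from a parsed meminfo dict."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def watch(samples=60, interval=5.0, path="/proc/meminfo"):
    """Print a timestamp, free memory and swap in use every `interval` seconds."""
    for _ in range(samples):
        with open(path) as fh:
            info = parse_meminfo(fh.read())
        print(time.strftime("%H:%M:%S"),
              "MemFree_kB=%d" % info.get("MemFree", 0),
              "SwapUsed_kB=%d" % swap_used_kb(info))
        time.sleep(interval)
```

Calling watch() in a terminal during a burst and seeing SwapUsed_kB climb while spamd stalls would point at swap rather than a database lock.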
> Is this a Bayes update deadlock? (We use a global Bayes DB, currently MySQL
> ISAM tables on a tmpfs.) Testing just before migrating to the current
> hardware showed this was actually the *fastest* (and least I/O-intensive)
> setup (comparing with InnoDB tables on disk, or "memory" tables).
Devoting memory to a tmpfs for bayes means less memory is available to
spamd and makes it more likely you're hitting swap during a message
burst...
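
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. Every figure in it is hypothetical (I don't know your RAM, tmpfs size or per-child footprint); the point is only that the tmpfs comes straight off the top of what the spamd children can use, and that children grow when they scan large messages.

```python
def max_children_that_fit(total_ram_mb, tmpfs_mb, reserved_mb, child_rss_mb):
    """How many spamd children fit in RAM without swapping.

    total_ram_mb  physical memory
    tmpfs_mb      memory pinned by the tmpfs holding the Bayes tables
    reserved_mb   allowance for the OS, MTA, MySQL, etc. (a guess)
    child_rss_mb  resident size of one spamd child (measure this)
    """
    available = total_ram_mb - tmpfs_mb - reserved_mb
    if available <= 0 or child_rss_mb <= 0:
        return 0
    return available // child_rss_mb


# Hypothetical box: 4 GB RAM, 512 MB tmpfs, 1 GB reserved for everything else.
print(max_children_that_fit(4096, 512, 1024, 80))   # children at a typical size
print(max_children_that_fit(4096, 512, 1024, 150))  # children bloated by big messages
```

If the configured max children is above the second figure, a burst of large messages can be enough to push the box into swap.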
How is SA glued to your MTA? Can you enforce process limits there so that
spamc doesn't just return a "can't scan" result if it gets overloaded?
(It is possible this is database-related if you're using ISAM rather than
InnoDB, but apart from asking "have you tried InnoDB on a tmpfs?" I'll let
others pursue that...)
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79