On 21.12.2011 19:10, Kris Deugau wrote:
> Marc Perkel wrote:
>> I've been trying for a long time to get Bayes/MySQL to actually work.
>> Running a dedicated server with MySQL. Several servers running SA
>> configured to talk to it.
>>
>> I'm running big servers with lots of RAM and RAID 0 flash drives for
>> speed. Also using InnoDB. I'm beginning to wonder if it is ever going to
>> work and if someone is going to fix it?
>
> I'm not sure what official testing has been done, but some testing I did
> about a year ago when upgrading the SA cluster here showed pretty much
> the same I/O load for a global Bayes no matter what combination of
> MyISAM, InnoDB, generic SQL, or MySQL-specific SA modules I used.
>
> Enabling MySQL replication also bogged things down pretty badly.
>
> Performance with the database on physical disks simply wasn't keeping up
> with more than about double the average message rate (if that...), so I
> fell back to the "good enough" setup of putting the SA database on a
> RAMdisk, and tweaking the MySQL init script to reload the database on
> startup. A database dump is done once a day, about a half-hour after a
> Bayes expiry run.
>
> This is handling ~250K messages/day, although with some tweaks to
> serialize mail delivery a little more to level off the extreme peaks in
> messages/second it should probably be able to handle a lot more volume.
>
> We also have several SA instances: on the inbound side, the first pass
> has ~25 of the top-scoring only-hits-spam rules (mostly DNSBLs) to skim
> off the junk that would usually score 15+ on a full ruleset. Anything
> that gets past that is then passed to a full SA instance with a long
> list of local rules targeted at the spam customers have reported as
> missed. That first pass tags more than 80% of the junk for far less
> processing cost than feeding it all through the full ruleset.
>
> Occasional mail spikes[1] sometimes cause SA to sloooooooowwwww
> dooowwwnnn due to CPU contention (60+ spamd threads are simply going to
> take a while to chew through mail if you've only got 16 logical CPU
> cores), but otherwise a pair of dual-socket, quad-core Xeon E5630
> machines with 12G of RAM are mostly idle. (RAM usage is fairly steady
> at just over 4G.) Average scan times are just under a second.
>
> -kgd
>
> [1] I'm looking at you, Rocket Science Group - hundreds of messages per
> second from netblocks all over the US, all nominally operated by (AKA
> "tagged in WHOIS for") the same group - and quite a lot of it spam.
> Unfortunately MailChimp seems to buy rack space, hosting, or managed
> email servers from them or I'd drop all of their netblocks in the local
> reject-at-the-border DNSBL and be done with it.
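A rough sanity check on the figures quoted above, using Little's law (average number of items in progress = arrival rate x time per item). This is a back-of-envelope sketch, not a measurement: it assumes a flat arrival rate over the day, treats "just under a second" as 1 s per scan, and assumes scans are CPU-bound so each logical core finishes one scan per second. The 3000-message burst is a hypothetical example, not a figure from the thread.

```python
# Back-of-envelope capacity check for a spamd cluster (assumed inputs, see lead-in).

SECONDS_PER_DAY = 24 * 60 * 60


def avg_rate(messages_per_day: float) -> float:
    """Average arrival rate in messages per second, assuming a flat daily rate."""
    return messages_per_day / SECONDS_PER_DAY


def busy_workers(rate_per_s: float, scan_time_s: float) -> float:
    """Little's law: average number of scans in progress at any moment."""
    return rate_per_s * scan_time_s


def drain_time(backlog: float, cores: int, scan_time_s: float) -> float:
    """Seconds to clear a queued backlog if each core completes one scan per
    scan_time_s. Ignores messages that keep arriving while the queue drains."""
    return backlog * scan_time_s / cores


if __name__ == "__main__":
    rate = avg_rate(250_000)  # ~250K messages/day from the post above
    print(f"average rate: {rate:.2f} msg/s")
    print(f"scans in flight at 1 s each: {busy_workers(rate, 1.0):.2f}")
    # Hypothetical burst of 3000 queued messages on 16 logical cores
    print(f"drain time for a 3000-message burst: {drain_time(3000, 16, 1.0):.1f} s")
```

At roughly 2.9 msg/s and about 1 s per scan, fewer than three scans are in flight on average, which fits the description of 16 logical cores sitting mostly idle. A burst of a few thousand messages, by contrast, takes on the order of three minutes to work through even before counting new arrivals, which is the kind of slowdown the spike footnote describes.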
Interesting info. By the way, does anyone know whether PostgreSQL performs better, e.g. with Bayes clusters etc.?

At least here, using postscreen has helped by stopping bots, so their mail never reaches spamd. But in large mail systems a SpamAssassin setup always has to be configured very carefully and analysed at runtime to find performance tweaks. That said, 250K messages/day doesn't seem like that much to me.

Scanning outbound mail with spamd was slow here too, so I only use clamav-milter with Sanesecurity signatures for that, and also for inbound mail ahead of spamass-milter.

Please, no flames: for performance issues you always need to look at the whole mail setup. In most cases there is no straight right or wrong; only analysing the bottlenecks will help.

--
Best Regards
Robert Schetterer
Germany/Munich/Bavaria
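Both setups in this thread put cheap checks in front of the expensive full scan: Kris's ~25-rule first pass ahead of a full SA instance, and postscreen plus clamav-milter ahead of spamass-milter in Robert's case. A minimal Python sketch of that layering follows. The rule names, scores and thresholds are hypothetical, and in a real deployment the two passes are separate SpamAssassin instances with their own configs, not a Python API; the sketch only shows why messages tagged early never pay for the full ruleset.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Rule:
    """A hypothetical scoring rule: a name, a score, and a match test."""
    name: str
    score: float
    matches: Callable[[str], bool]


def total_score(message: str, rules: Sequence[Rule]) -> float:
    """Sum the scores of every rule that matches the message."""
    return sum(rule.score for rule in rules if rule.matches(message))


def classify(message: str,
             quick_rules: Sequence[Rule],
             full_rules: Sequence[Rule],
             quick_threshold: float = 5.0,
             full_threshold: float = 5.0) -> str:
    """Run the cheap first pass; only messages it does not tag reach the full ruleset."""
    if total_score(message, quick_rules) >= quick_threshold:
        return "spam (first pass)"
    if total_score(message, full_rules) >= full_threshold:
        return "spam (full pass)"
    return "ham"


# Hypothetical rules standing in for DNSBL checks (first pass) and local body rules.
QUICK = [Rule("HYPOTHETICAL_DNSBL_HIT", 6.0, lambda m: "blocklisted-relay" in m)]
FULL = QUICK + [Rule("HYPOTHETICAL_LOCAL_PHRASE", 5.0, lambda m: "cheap pills" in m)]

if __name__ == "__main__":
    for msg in ["Received: from blocklisted-relay", "buy cheap pills now", "lunch at noon?"]:
        print(f"{msg!r}: {classify(msg, QUICK, FULL)}")
```

The full ruleset here includes the first-pass rules as well, since a message that slips past the quick pass should still be scored on everything. The saving comes entirely from the early return: if, as Kris reports, the first pass tags more than 80% of the junk, that share of spam never reaches the slow path.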