Re: [clamav-users] Occasional sendmail queue delay when using clamav-milter
Hello again,

On Wed, 2 May 2018, Aaron Paetznick wrote:

> ... both very helpful, thanks! ... sorry for the late reply ...

It's why we're here. Don't be. :)

> ... small course corrections and long periods of observation.

The correct approach, IMHO.

> Here are my log levels:
>
>     define(`confMILTER_LOG_LEVEL', `8')
>     define(`confLOG_LEVEL', `10')

For investigation I'd suggest 22 and 15 respectively, but keep an eye on the logs as that's rather verbose. And don't tell Claus. :)

> For the record, I'm running a distinct copy of clamd using a local unix domain socket on each server. I'd be happy to consider switching to TCP,

I wouldn't suggest that's necessary, nor even desirable.

> ... Here's some threaded top output for my clamd process: ...
>
>       PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
>      1324 clamav    20   0  975436 709588   3264 S  0.0  5.9   3:17.37 clamd
>      1678 clamav    20   0  975436 709588   3264 S  0.0  5.9   0:08.38 clamd
>     57290 clamav    20   0  975436 709588   3264 S  0.0  5.9   1:17.77 clamd
>     57438 clamav    20   0  975436 709588   3264 S  0.0  5.9   0:56.31 clamd

Interesting. This is the sort of thing I see here:

      670 clamav    20   0 1035580 673556  12344 S  0.0  4.1 199:44.38 clamd
      671 clamav    20   0 1035580 673556  12344 S  0.0  4.1   0:00.22 clamd

This clamd instance has been running for nearly six months. As you can see there are only two threads and only one of them has done any real work. (The reason that it hardly ever does any work is that my homebrew milter does all the heavy lifting before clamd gets a chance to see the mail data.) I typically see more clamav-milter threads. Maybe this is just characteristic of your heavier loads. Maybe this is a threads issue - it's the sort of thing they do - but I'd have expected to hear more about it on the list if it were something in need of fixing and I don't want to send you off chasing wild geese.

> Reloading the clamd databases typically takes under 10 seconds.

That's a lot quicker than typical here, which databases are you using?

> I am running two total milters, the other being opendkim. ...
>
>     INPUT_MAIL_FILTER(`clamav', `S=local:/var/run/clamav-milter/clamav-milter.sock, F=T, T=S:4m;R:4m')
>     INPUT_MAIL_FILTER(`opendkim', `S=local:/var/run/opendkim/opendkim.sock')

You're using the default (10 second) timeouts for 'S' and 'R' for the opendkim milter, you might want to consider increasing them, although I've used similar setups in the past without issues (I've never used the opendkim milter, I rolled my own:).

> All things considered, I may just decide to temporarily set the OnFail option to Accept in my clamav-milter.conf file. It's certainly not an ideal solution though.

It certainly isn't. As it happens a colleague in the USA mentioned that he occasionally has to restart clamd too. Except when I've done something daft I don't recall ever having to do that, so I'm starting to wonder if there's something not going on here that _is_ going on elsewhere. If you'd like to let me have a copy of your clamd.conf privately I'll compare it with my own and my colleague's to see if anything obvious meets the eye.

I've tweaked our filter rules so mail from your address to my list address won't (shouldn't:) be rejected.

--
73,
Ged.
___
clamav-users mailing list
clamav-users@lists.clamav.net
http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-users

Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml
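To make the point about default timeouts concrete, here is a minimal Python sketch that reads the argument string of an INPUT_MAIL_FILTER line and reports the timeouts sendmail would end up using. The helper name `effective_timeouts` and the `DEFAULT_TIMEOUTS` table are hypothetical. The 10 second defaults for S and R are the ones mentioned above; the 5 minute defaults for C and E are my understanding of the sendmail documentation and should be checked against your version.

```python
import re

# Assumed sendmail milter timeout defaults (T= equate): connect, send, read, end-of-message.
DEFAULT_TIMEOUTS = {"C": "5m", "S": "10s", "R": "10s", "E": "5m"}


def effective_timeouts(filter_args: str) -> dict:
    """Return the C/S/R/E timeouts for one INPUT_MAIL_FILTER argument string."""
    timeouts = dict(DEFAULT_TIMEOUTS)
    # Locate the T= equate; its value runs up to the next comma or the end of the string.
    match = re.search(r"(?:^|,)\s*T=([^,]*)", filter_args)
    if match:
        for item in match.group(1).split(";"):
            item = item.strip()
            if ":" in item:
                key, value = item.split(":", 1)
                timeouts[key.strip()] = value.strip()
    return timeouts


# The two argument strings quoted in this thread.
clamav = "S=local:/var/run/clamav-milter/clamav-milter.sock, F=T, T=S:4m;R:4m"
opendkim = "S=local:/var/run/opendkim/opendkim.sock"

print("clamav:  ", effective_timeouts(clamav))
print("opendkim:", effective_timeouts(opendkim))
```

Run against the two lines from this thread, it shows the opendkim filter still on the short send and read defaults while the clamav filter has both raised to four minutes.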
Re: [clamav-users] Occasional sendmail queue delay when using clamav-milter
This and Ted's comment about falling back to a permissive OnFail were both very helpful, thanks! I'm sorry for the late reply, but problems like this require small course corrections and long periods of observation.

Here are my log levels:

    define(`confMILTER_LOG_LEVEL', `8')
    define(`confLOG_LEVEL', `10')

For the record, I'm running a distinct copy of clamd using a local unix domain socket on each server. I'd be happy to consider switching to TCP, but because I'm running a distinct clamd per server I think it's unlikely the overhead would be worth it. I could be wrong though.

The VM I'm using for testing has 12GB of RAM allocated right now, and averages about 70% memory utilization and under 1% CPU utilization. The pool is set up in a DNS round-robin, so the rest should be similar.

Here's some threaded top output for my clamd process:

    [root@smtp ~]# top -b -n 1 -H -p 1324
    top - 13:50:11 up 6 days,  1:25,  1 user,  load average: 0.14, 0.15, 0.14
    Threads:   4 total,   0 running,   4 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem : 12121752 total,  3800616 free,   966264 used,  7354872 buff/cache
    KiB Swap:  1679356 total,  1649968 free,    29388 used. 10741208 avail Mem

      PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
     1324 clamav    20   0  975436 709588   3264 S  0.0  5.9   3:17.37 clamd
     1678 clamav    20   0  975436 709588   3264 S  0.0  5.9   0:08.38 clamd
    57290 clamav    20   0  975436 709588   3264 S  0.0  5.9   1:17.77 clamd
    57438 clamav    20   0  975436 709588   3264 S  0.0  5.9   0:56.31 clamd
    [root@smtp ~]#

Reloading the clamd databases typically takes under 10 seconds.

One more thing. I am running two total milters, the other being opendkim. Your comment got me thinking that what I'm seeing may not be clamd itself. Maybe clamd is aggravating a problem with opendkim, or maybe something is going wrong with the interaction between the two?
Here's how I'm running them side-by-side:

    INPUT_MAIL_FILTER(`clamav', `S=local:/var/run/clamav-milter/clamav-milter.sock, F=T, T=S:4m;R:4m')
    INPUT_MAIL_FILTER(`opendkim', `S=local:/var/run/opendkim/opendkim.sock')

All things considered, I may just decide to temporarily set the OnFail option to Accept in my clamav-milter.conf file. It's certainly not an ideal solution though.

On 5/1/2018 11:34 AM, G.W. Haywood wrote:
> Hi there,
>
> On Tue, 1 May 2018, Aaron Paetznick wrote:
>
>> Occasionally a small percentage of email will seemingly unnecessarily get held in the queue when using clamav-milter, although it will get delivered successfully on the first attempt with the next queue run. The size, time, sender, and recipient all seem to be irrelevant. Our work-around is to simply process the queue every 5 minutes, but this is not sustainable. We've conclusively narrowed it down to ClamAV, as the problem vanishes when we comment out the INPUT_MAIL_FILTER line in our sendmail.cf file. Here's that milter line:
>>
>>     INPUT_MAIL_FILTER(`clamav', `S=local:/var/run/clamav-milter/clamav-milter.sock, F=T, T=S:4m;R:4m')
>
> The intermittent ones are the trickiest. :(
>
> I haven't seen anything like this issue, although our queues won't be nearly as busy as yours. Typically database reloads here (modest hardware) take a couple of minutes, and your T=S and T=R timeouts for clamav-milter are, like ours, much longer than that. But if you have other milters for which the timeouts are not so long, perhaps it could be an issue. Even so, one might then expect that you'd have noticed a correlation with the reloads, and apparently you haven't. A puzzle. :)
>
> What log levels are you using for Sendmail and the milters?
> Are you using a single clamd instance for all the servers, or one per server, or ...?
> Do you know how long your database reloads take?
> Are they reliably taking that long or do they sometimes stall?
> What connections type(s) are you using for clamd?
> Have you checked that clamd always responds quickly/reliably to PINGs on the socket?
> Can we take it that you do have enough memory?
>
> --
> 73,
> Ged.
Re: [clamav-users] Occasional sendmail queue delay when using clamav-milter
Hi there,

On Tue, 1 May 2018, Aaron Paetznick wrote:

> Occasionally a small percentage of email will seemingly unnecessarily get held in the queue when using clamav-milter, although it will get delivered successfully on the first attempt with the next queue run. The size, time, sender, and recipient all seem to be irrelevant. Our work-around is to simply process the queue every 5 minutes, but this is not sustainable. We've conclusively narrowed it down to ClamAV, as the problem vanishes when we comment out the INPUT_MAIL_FILTER line in our sendmail.cf file. Here's that milter line:
>
>     INPUT_MAIL_FILTER(`clamav', `S=local:/var/run/clamav-milter/clamav-milter.sock, F=T, T=S:4m;R:4m')

The intermittent ones are the trickiest. :(

I haven't seen anything like this issue, although our queues won't be nearly as busy as yours. Typically database reloads here (modest hardware) take a couple of minutes, and your T=S and T=R timeouts for clamav-milter are, like ours, much longer than that. But if you have other milters for which the timeouts are not so long, perhaps it could be an issue. Even so, one might then expect that you'd have noticed a correlation with the reloads, and apparently you haven't. A puzzle. :)

What log levels are you using for Sendmail and the milters?
Are you using a single clamd instance for all the servers, or one per server, or ...?
Do you know how long your database reloads take?
Are they reliably taking that long or do they sometimes stall?
What connections type(s) are you using for clamd?
Have you checked that clamd always responds quickly/reliably to PINGs on the socket?
Can we take it that you do have enough memory?

--
73,
Ged.
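For the question about PINGs on the socket, a rough Python sketch of such a check is below. It sends clamd's PING command over a unix domain socket and times the PONG. The function name `ping_clamd` and the socket path in the example are assumptions for illustration; the real path is whatever LocalSocket is set to in your clamd.conf.

```python
import socket
import time


def ping_clamd(socket_path: str, timeout: float = 5.0) -> float:
    """Send PING to clamd over a unix socket and return the round-trip time in seconds.

    Raises OSError (including timeouts) if clamd cannot be reached in time,
    and RuntimeError if the reply is not PONG.
    """
    start = time.monotonic()
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        # The 'n' prefix selects newline-delimited commands and replies.
        sock.sendall(b"nPING\n")
        reply = b""
        while not reply.endswith(b"\n"):
            chunk = sock.recv(64)
            if not chunk:
                break  # peer closed the connection
            reply += chunk
    elapsed = time.monotonic() - start
    if reply.strip() != b"PONG":
        raise RuntimeError(f"unexpected reply from clamd: {reply!r}")
    return elapsed


if __name__ == "__main__":
    # Hypothetical path: substitute the LocalSocket value from your clamd.conf.
    path = "/var/run/clamav/clamd.sock"
    try:
        print(f"PONG in {ping_clamd(path) * 1000:.1f} ms")
    except (OSError, RuntimeError) as exc:
        print(f"clamd PING failed: {exc}")
```

Run from cron every minute or so with the result logged, this would give timestamps of slow or failed replies to set against the times at which messages were held in the queue.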
[clamav-users] Occasional sendmail queue delay when using clamav-milter
We use clamav-milter with sendmail on a pool of SMTP servers to handle email delivery for some 10k mailboxes. This has been working well for a long time, but we've noticed a subtle problem creep up on us.

Occasionally a small percentage of email will seemingly unnecessarily get held in the queue when using clamav-milter, although it will get delivered successfully on the first attempt with the next queue run. The size, time, sender, and recipient all seem to be irrelevant. Our work-around is to simply process the queue every 5 minutes, but this is not sustainable.

We've conclusively narrowed it down to ClamAV, as the problem vanishes when we comment out the INPUT_MAIL_FILTER line in our sendmail.cf file. Here's that milter line:

    INPUT_MAIL_FILTER(`clamav', `S=local:/var/run/clamav-milter/clamav-milter.sock, F=T, T=S:4m;R:4m')

My first thought was some sort of resource contention, but honestly the servers are individually not very busy. I had briefly thought the problem might align with database reloads in the clamd.log file, but that just didn't seem to be the case either.

We're currently using ClamAV 0.100.0 with sendmail 8.15.2 on CentOS 7.4, although the problem has been with us through most of the 0.9x series as well. I can send a lot more details if it will help. I guess I'm just wondering if there are any "gotchas" with using ClamAV in this way, and if there are any best-practices we may be missing.

Thanks!
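One way to check whether the delays line up with database reloads is to pull the reload windows out of clamd.log and compare them with the queue timestamps. The Python sketch below assumes that clamd is logging with timestamps, that each line looks like `<timestamp> -> <message>`, and that a reload is bracketed by "Reading databases from" and "Database correctly reloaded" messages. Those formats are assumptions to verify against your own log, and the sample lines are made up for illustration, not taken from a real server.

```python
from datetime import datetime

# Assumed timestamp layout, e.g. "Wed May  2 13:50:11 2018".
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def reload_windows(lines):
    """Yield (start, end) datetimes for each completed database reload in the log lines."""
    started = None
    for line in lines:
        stamp, sep, message = line.partition(" -> ")
        if not sep:
            continue  # no timestamp prefix on this line
        try:
            # Collapse the double space used for single-digit days before parsing.
            when = datetime.strptime(" ".join(stamp.split()), TIME_FORMAT)
        except ValueError:
            continue
        if message.startswith("Reading databases from"):
            started = when
        elif message.startswith("Database correctly reloaded") and started is not None:
            yield started, when
            started = None


# Synthetic example lines in the assumed format.
sample = [
    "Wed May  2 13:50:11 2018 -> Reading databases from /var/lib/clamav",
    "Wed May  2 13:50:19 2018 -> Database correctly reloaded (1 signatures)",
]

for start, end in reload_windows(sample):
    seconds = (end - start).total_seconds()
    print(f"reload from {start} to {end} took {seconds:.0f} s")
```

Feeding it the real log (for example `reload_windows(open(path))`, with `path` pointing at your clamd.log) would also show whether any reload took far longer than the usual few seconds.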