BAYES_999 strange behavior
Hello. This is the first time SA is giving me enough trouble that I need to ask for help. I hope I get this right.

I observed a marked increase in false negatives in the last few weeks. Only today did I have enough sense to look at the detailed scores, and all the escaped spams have hit the BAYES_999 rule. I grepped the site configuration directory:

  [3+0]~$ fgrep -h BAYES_999 /var/lib/spamassassin/3.003002/updates_spamassassin_org/*.cf
  ##{ BAYES_999 ifplugin Mail::SpamAssassin::Plugin::Bayes
  body BAYES_999 eval:check_bayes('0.999', '1.00')
  tflags BAYES_999 learn,publish
  describe BAYES_999 Bayes spam probability is 99.9 to 100%
  # score BAYES_999 0 0 4.84.5
  ##} BAYES_999 ifplugin Mail::SpamAssassin::Plugin::Bayes

So it seems this is the highest spamminess rule, and the score in the config file reflects that. But the message header is:

  X-Spam-Tests: BAYES_999=1,DOS_OE_TO_MX=2.523,HTML_MESSAGE=0.001,

The score for BAYES_999 is 1 in all cases :( Where does the 1 come from??? Certainly not from my user_prefs; I go to great lengths not to change any scores. And the factory configuration doesn't even seem to have this rule:

  [4+0]~$ fgrep -h BAYES_999 /usr/share/spamassassin/*.cf
  [5+0]~$

I am baffled. Is this a bug?

My configuration:

  version 3.3.2
  daily sa-update run stores updates in /var/lib/spamassassin/
  spamd + spamc --headers

-- 
Please *no* private copies of mailing list or newsgroup messages.
gpg public key: 2048R/984A8AE4
fingerprint: 7953 ADA1 0E8E AB57 FB79 FFD2 360A 88B2 984A 8AE4
Funny pic: http://bit.ly/ZNE2MX
Re: BAYES_999 strange behavior
On Mon, 17 Feb 2014 16:05:23 -0500 Kevin A. McGrail kmcgr...@pccc.com wrote:

Kevin> BAYES_999 is just a finer gradient on BAYES_99 allowing for a
Kevin> higher score on the top .001% of Bayes hits.

Thanks for your reply. Could you explain in a bit more detail what "gradient on top" (of another rule) means? It doesn't mean the score is meant to be additive with the base rule, does it? 'Cause these spams _do not_ trigger any of the Bayes rules _except_ for BAYES_999. That's why they score too low to be caught.
Re: sa-learn from a cronjob?
On Sun, 20 Apr 2014 12:14:37 -0700 (PDT) Dan Mahoney, System Admin d...@prime.gushi.org wrote:

> Most of my users aren't command-line friendly. I'd like to basically have my IMAP server default to handing out two imap mailboxes that get auto-crontabbed to training bayes.

Here is my cronjob for that purpose, in its entirety. Note that each of ~/spam-corpora{ham,spam} is a Maildir. There is a small race condition between the sa-learn run and the move to cur, which wasn't worth fixing in my case; if you use this and fix it, let me know :)

[Attachment: sa-learn-sync]
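[Editor's sketch] The attached script is not reproduced in this archive. What follows is a minimal Python sketch of the same idea, not the poster's script: it learns every message found in a training Maildir's new/ directory and then moves it to cur/. The function names and the ~/spam-corpora/spam path in the usage comment are assumptions made for illustration; only the `sa-learn --spam` and `sa-learn --ham` invocations come from the thread.

```python
import os
import subprocess


def train_maildir(maildir, learn):
    """Feed each message in maildir/new to learn(), then move it to maildir/cur."""
    new_dir = os.path.join(maildir, "new")
    cur_dir = os.path.join(maildir, "cur")
    learned = []
    for name in sorted(os.listdir(new_dir)):
        src = os.path.join(new_dir, name)
        learn(src)
        # Maildir convention: names in cur/ carry an info suffix such as ":2,".
        target = name if ":2," in name else name + ":2,"
        os.rename(src, os.path.join(cur_dir, target))
        learned.append(name)
    return learned


def sa_learn(kind):
    """Return a callable that runs `sa-learn --spam` or `sa-learn --ham` on one file."""
    def run(path):
        subprocess.run(["sa-learn", "--" + kind, path], check=True)
    return run


# Hypothetical usage from cron (paths are assumptions):
#   train_maildir(os.path.expanduser("~/spam-corpora/spam"), sa_learn("spam"))
#   train_maildir(os.path.expanduser("~/spam-corpora/ham"), sa_learn("ham"))
```

Because only files that were actually passed to the learner are moved, a message delivered while the run is in progress simply stays in new/ until the next run. The price is one sa-learn invocation per message.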
Re: sa-learn from a cronjob?
On Thu, 24 Apr 2014 15:07:32 +0100 RW rwmailli...@googlemail.com wrote:

RW> I don't think it will work for the purpose mentioned, and if it's
RW> working properly for you, there's a lot you're not mentioning.

RW> It's only looking for mail in the immediate post-delivery state
RW> after it's been put into the mailbox by an MTA or MDA and before
RW> it's been detected as new mail by an MUA (directly or via IMAP). It
RW> won't learn mail put into the folders by an MUA or IMAP at all.

RW> You need to use separate destination mailboxes.

These are _not_ general purpose Maildirs. The normal mail processing pipe (MTA -> LDA -> IMAP -> MUA) knows nothing about them. To mark something as spam/ham, a user (me) executes a custom macro in the MUA which pipes the message through the safecat command to deliver it explicitly to one of these directories.

Basically, Maildir is just a convenient container format here. It could be a database or whatever.

Does that answer your objections?
Re: Bayes refinement
On Fri, 16 May 2014 07:22:56 -0400 David F. Skoll d...@roaringpenguin.com wrote:

James> Is there any way to limit Bayes content checking to only the
James> first X characters of the message body? I ask this because it is
James> clear that the spam messages getting through contain text meant
James> to poison the tests but this gibberish always trails the main
James> message and is separated by a large white space in most cases.

David> In my experience, trying to be too clever with Bayes is
David> counter-productive. Those Bayes-poisoning attacks rarely work on
David> a well-trained corpus. You probably just need more training for
David> Bayes to figure out what's happening.

In the last few (~10) days, I have seen a marked increase in FNs, usually with Bayes values in the 50s and 60s. By marked, I mean I do pretty much nothing but adjust my various ad-hoc rules to keep from being flooded ;-\

On close inspection, I see that the hash-busting garbage appended is (faux) technical computing talk instead of the usual cookbooks or classical literature :-p That is, scrambled Stack Overflow discussions and the like. And of course that is what most of my ham is about, so it makes very good sense that Bayes gets confused.

I include a magic dump just in case something is wrong with my training. But if not, isn't this a situation where something like James' suggestion would help?

  [4+0]~$ sa-learn --dump magic
  0.000  0           3  0  non-token data: bayes db version
  0.000  0        5593  0  non-token data: nspam
  0.000  0        6190  0  non-token data: nham
  0.000  0      148413  0  non-token data: ntokens
  0.000  0  1384366530  0  non-token data: oldest atime
  0.000  0  1400253567  0  non-token data: newest atime
  0.000  0  1400253356  0  non-token data: last journal sync atime
  0.000  0  1395423790  0  non-token data: last expiry atime
  0.000  0    11059200  0  non-token data: last expire atime delta
  0.000  0       25914  0  non-token data: last expire reduction count
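[Editor's sketch] The raw numbers in the magic dump are easier to judge once the atime values are turned into spans of days. The short Python sketch below does that for the figures quoted above. It assumes the layout shown in the dump (the third column holds the value, the label follows "non-token data:"); the function name is made up for this example.

```python
MAGIC_DUMP = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 5593 0 non-token data: nspam
0.000 0 6190 0 non-token data: nham
0.000 0 148413 0 non-token data: ntokens
0.000 0 1384366530 0 non-token data: oldest atime
0.000 0 1400253567 0 non-token data: newest atime
0.000 0 1400253356 0 non-token data: last journal sync atime
0.000 0 1395423790 0 non-token data: last expiry atime
0.000 0 11059200 0 non-token data: last expire atime delta
0.000 0 25914 0 non-token data: last expire reduction count
"""


def parse_magic(text):
    """Map each 'non-token data' label to its integer value (third column)."""
    values = {}
    for line in text.splitlines():
        fields, sep, label = line.partition("non-token data:")
        if not sep:
            continue  # not a magic line
        values[label.strip()] = int(fields.split()[2])
    return values


magic = parse_magic(MAGIC_DUMP)
spam_to_ham = magic["nspam"] / magic["nham"]
token_span_days = (magic["newest atime"] - magic["oldest atime"]) / 86400
expire_delta_days = magic["last expire atime delta"] / 86400

print(f"spam/ham ratio:  {spam_to_ham:.2f}")
print(f"token age span:  {token_span_days:.1f} days")
print(f"expire delta:    {expire_delta_days:.0f} days")
```

Read this way, the database holds roughly equal amounts of spam and ham, and its tokens span about half a year.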
Re: SPAM from a registrar
On Thu, 15 May 2014 09:45:21 -0800 Kevin Miller kevin_mil...@ci.juneau.ak.us wrote:

> Have you looked into Day old bread?
> http://wiki.apache.org/spamassassin/Rules/URIBL_RHS_DOB

Just for the fun of it, I did a manual whois on the domain of one random spam I got today which was not killed by SA. Sure enough, the domain was a day old.

Running SA --debug on the spam, I can see that the URIBL_RHS_DOB lookup is attempted but comes back with NXDOMAIN. So I have to question how effective that rule really is ...

I don't know how often the underlying RBL [1] refreshes - could that be the reason?

[1] http://www.support-intelligence.com/dob/
Re: SPAM from a registrar
On Sat, 17 May 2014 01:34:58 +0200 Karsten Bräckelmann guent...@rudersport.de wrote:

> I don't know whether DOB limits DNS queries of a single host. However, if you *never* get that rule firing, the NXDOMAIN result may indicate exceeding a query limit. Do you use a local caching DNS resolver, or does SA use your upstream ISP's one, along with a million other SA instances?

Excellent point. I _used to_ run a local DNS cache, but got rid of it a few months ago, in the name of simplicity. Was that a good or bad thing to do in the current context?
Re: Bayes refinement
On Fri, 16 May 2014 16:20:21 -0400 Bowie Bailey bowie_bai...@buc.com wrote:

> Keep in mind that BAYES_50 and BAYES_60 still contribute positive scores by default. Though it is technically a neutral result, it still adds a point or two to the score. Rather than messing with Bayes, I would focus on the spams you are seeing and try to find a common thread that you can use to make a custom rule or two to catch them. If they all have similar garbage appended to them, there are probably other similarities you could find.

I have already made many such custom rules. As I wrote, that's mostly what I was working on this week :-(

For instance, I noticed many of them (but not all) put my address in the Message-ID. Some (but not all) use broken HTML template kits that leave nice fingerprint marks in the body. And so on. But usually only 1 of them fires, at most - that is a 1.0 score, BAYES_50 is also around 1.0 I think, and that's about it - no RBL hits, no Razor or Pyzor hits. And to add insult to injury, they almost always hit RP_MATCHES_RCVD, for a (locally modified) -0.15 boost.

So, these rules are helping, but not enough. I am still getting about 1 unkilled spam an hour, which is too much for me.

Today I have enabled full auto-learning (prior to this, I had bayes_auto_learn_on_error = 1). Hopefully that will give Bayes much more learning material.
Re: SPAM from a registrar
On Mon, 19 May 2014 10:46:25 -0800 Kevin Miller kevin_mil...@ci.juneau.ak.us wrote:

Ian> Excellent point. I _used to_ run a local DNS cache, but got rid of
Ian> it a few months ago, in the name of simplicity. Was that a good or
Ian> bad thing to do in the current context?

Kevin> That's a bad thing to do. A caching name server is pretty easy
Kevin> to implement (all the distros that I've played with do it
Kevin> automatically just installing bind). Many (most?/all?) RBLs
Kevin> require a subscription (read money) if you exceed a certain
Kevin> number of queries. A public dns server can hammer them quite
Kevin> quickly, and thus get filtered out. A local caching server is
Kevin> definitely recommended. I've never read any posts suggesting
Kevin> reasons not to use one...

OK, I installed a local bind instance on Saturday. But it is not helping: out of about 100 spams I got today (counting both those that got flagged and those that didn't, but not counting the horrible spams with score 15 that go directly to /dev/null), _none_ scored on URIBL_RHS_DOB. And I know for a fact that most of them contain fresh domains :-(

Btw, all those domains are registered with enom. Wth?
Re: Bayes refinement
On Thu, 15 May 2014 12:18:25 -0800 Kevin Miller kevin_mil...@ci.juneau.ak.us wrote:

> I implemented a rule that looks for multiple breaks for just that reason. Can't remember where I stole it from - probably some folks here helped me with it a few years ago. Can't remember who, but appreciated the assistance.

I am trying to do a variant of this for text/plain, as that is the type I mostly face now. But I cannot get it to work.

  header __LOCAL_PLAIN_ASCII Content-Type =~ /text\/plain; *charset=us-ascii/i
  rawbody __LOCAL_MUCHO_BLANKS /\n{10,}/m
  meta LOCAL_PLAIN_ASCII_MUCHO_BLANKS (__LOCAL_PLAIN_ASCII && __LOCAL_MUCHO_BLANKS)

Feeding a message into --debug shows __LOCAL_MUCHO_BLANKS never matches. What am I doing wrong?
Re: Bayes refinement
On Wed, 21 May 2014 19:08:51 +0100 Martin Gregorie mar...@gregorie.org wrote:

> rawbody __LOCAL_MUCHO_BLANKS /\n{10,}/m

Martin> Looking for newlines rather than whitespace? Does /\s{10,}/m
Martin> work any better?

Nope, it doesn't :-(

Anyway, looking for newlines was my intention, sorry for the misleading nomenclature. But I guess that is irrelevant as neither variant works.
Matching multiple newlines [Was: Bayes refinement]
On Wed, 21 May 2014 11:50:15 -0700 (PDT) John Hardin jhar...@impsec.org wrote:

> rawbody __LOCAL_MUCHO_BLANKS /\n\n\n\n\n\n\n\n\n\n/m

Hmmm, no, your version doesn't work, either. Would this be of any import?

  [24+0]~$ perl --version
  This is perl 5, version 14, subversion 2 (v5.14.2) built for i486-linux-gnu-thread-multi-64int
  (with 88 registered patches, see perl -V for more detail)
Re: Bayes refinement
On Wed, 21 May 2014 22:26:41 +0200 Karsten Bräckelmann guent...@rudersport.de wrote:

Karsten> Seriously, the above rule, the shorter /\n{10}/, as well as the
Karsten> variant posted by John without quantifier do exactly what you
Karsten> asked for. They match 10 consecutive \n newline chars in the
Karsten> rawbody.

OK, thanks for the improvements.

Karsten> The test message does not have that string. Maybe it uses DOS
Karsten> flavor \r\n. Or what appears to be a bunch of linebreaks
Karsten> actually has spaces mixed in.

Well, no. I looked at the message (the same data I fed to s.a. --debug) with hexdump -C. It definitely has 10 consecutive 0a's.

For rawbody rules, is _the whole_ body really fed to the matcher at once?
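[Editor's sketch] The closing question matters because a pattern for ten consecutive newlines can only match if the matcher sees those newlines in a single string. The Python sketch below is not SpamAssassin code and does not claim to describe how rawbody rules are fed in 3.3.2; it only shows, under the assumption of a matcher that receives one line at a time, why the same pattern would succeed on the whole body and fail line by line. All names and the sample body are invented for the example.

```python
import re

TEN_NEWLINES = re.compile(r"\n{10,}")

# Sample body: a short pitch, a long run of blank lines, then filler text.
BODY = "Buy now!\n" + "\n" * 12 + "scrambled technical filler\n"


def match_whole(pattern, text):
    """Apply the pattern to the complete body in one go."""
    return pattern.search(text) is not None


def match_per_line(pattern, text):
    """Apply the pattern to each line separately (each line keeps its own \\n)."""
    return any(pattern.search(line) for line in text.splitlines(keepends=True))


def has_blank_run(text, minimum=10):
    """Line-oriented alternative: count consecutive blank lines instead."""
    run = 0
    for line in text.splitlines():
        run = run + 1 if line.strip() == "" else 0
        if run >= minimum:
            return True
    return False


print(match_whole(TEN_NEWLINES, BODY))     # whole body in one string
print(match_per_line(TEN_NEWLINES, BODY))  # one line at a time
print(has_blank_run(BODY))                 # counting blank lines
```

If the matcher really is line-oriented, a counting approach like has_blank_run is the kind of logic that would be needed, which a single regular expression cannot express.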
autolearn_force
I don't understand this setting, and reading the documentation doesn't help.

It seems it should make Bayes learn spam whenever the total score surpasses the value of bayes_auto_learn_threshold_spam, and not require 3 points from header and body each; that would make it a global setting similar in purpose to bayes_auto_learn_threshold_spam.

But in fact this is a per-test setting, a subcategory of tflags. Do I have to specify it separately for every test? Why?

Or is there another way to bypass the 3/3 requirement?
Re: autolearn_force
On Thu, 22 May 2014 15:54:42 +0100 RW rwmailli...@googlemail.com wrote:

Ian> I don't understand this setting, and reading the documentation
Ian> doesn't help.

Ian> It seems it should make Bayes learn spam whenever the total score
Ian> surpasses the value of bayes_auto_learn_threshold_spam, and not
Ian> require 3 points from header and body each; that would make it a
Ian> global setting similar in purpose to
Ian> bayes_auto_learn_threshold_spam.

Ian> But in fact this is a per-test setting, a subcategory of tflags.
Ian> Do I have to specify it separately for every test? Why?

RW> The point is to set it for a small number of rules that are
RW> sufficiently strong as to guarantee there will be no mislearning in
RW> combination with the autolearn as spam threshold.

RW> It's probably best to create a single metarule for this - something
RW> that eliminates the possibility of mistraining through a lot of
RW> overlapping rules. I do something similar to get more spam into my
RW> high-scoring folder. I assign a lot of the near-certain spam rules
RW> to different classes: BAYES, RBLs, URIBLs, relaycountry etc and then
RW> count the number of classes.

The problem I am trying to solve is that nearly all of my spam is flagged due to body rules. The header rules seem to be close to useless with the latest campaigns - spammers seem to have learned enough to avoid sending obvious stinking pieces of turd. (The one exception is patterns in the Message-ID, but I am afraid that will be short lived too, and is insufficient by itself even now).

Thus, even if I set bayes_auto_learn_threshold_spam low, very few of my spams are autolearned because of the 3/3 requirement. The damn 3/3 is my problem - how can I work around it? If I have to spend an hour a day manually training the classifier, the spammers have won :-(

By the way, how are meta rules counted for this purpose? The documentation says nothing about that.
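[Editor's sketch] RW's "count the number of classes" idea can be pictured as follows. This is a Python illustration of the counting logic only, not a SpamAssassin meta rule, and the assignment of rules to classes is a made-up example: which rules count as near-certain, and which class each belongs to, is a local decision the thread does not spell out.

```python
# Hypothetical assignment of near-certain spam rules to independent classes.
RULE_CLASSES = {
    "BAYES_99": "bayes",
    "BAYES_999": "bayes",
    "URIBL_BLACK": "uribl",
    "URIBL_DBL_SPAM": "uribl",
    "URIBL_JP_SURBL": "uribl",
    "RCVD_IN_XBL": "rbl",
}


def classes_hit(hits):
    """Return the set of distinct classes represented among the rules that fired."""
    return {RULE_CLASSES[name] for name in hits if name in RULE_CLASSES}


def strong_enough(hits, minimum=2):
    """True when at least `minimum` independent classes agree the mail is spam."""
    return len(classes_hit(hits)) >= minimum


hits = ["BAYES_99", "BAYES_999", "HTML_MESSAGE", "URIBL_BLACK", "URIBL_DBL_SPAM"]
print(sorted(classes_hit(hits)), strong_enough(hits))
```

Counting classes rather than rules means that several overlapping hits from the same source (three URIBL lists, say) count once, which is the protection against mistraining that RW describes.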
Re: Blank line rules
On Thu, 22 May 2014 13:47:04 -0700 (PDT) John Hardin jhar...@impsec.org wrote:

John> Regular expressions by default only consider a single line of
John> text. You need to provide an option to say treat multiple lines
John> as a single line. Try this:

  rawbody RAW_BLANK_LINES_05 /(?:\r?\n){5,9}/m
  rawbody RAW_BLANK_LINES_10 /(?:\r?\n){10,24}/m
  rawbody RAW_BLANK_LINES_15 /(?:\r?\n){25}/m

James, see also the Bayes refinement thread where I posted about doing the exact same thing. Somehow John's multiline rules don't work for me, either. Karsten was looking at it, last I know.
lint versus spamd log
I have diligently used

  spamassassin --lint

after every edit to my user_prefs file, and made sure there was no output. This morning, in the course of the ongoing battle against enom related spam, I looked in /var/log/mail.log, and imagine my surprise when I found this logged with every delivery:

  May 23 09:48:04 host spamd[9033]: config: not parsing, administrator setting: loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
  May 23 09:48:04 host spamd[9033]: config: failed to parse line, skipping, in /home/user/.spamassassin/user_prefs: loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
  May 23 09:48:04 host spamd[9033]: config: not parsing, administrator setting: loadplugin Mail::SpamAssassin::Plugin::RelayCountry
  May 23 09:48:04 host spamd[9033]: config: failed to parse line, skipping, in /home/user/.spamassassin/user_prefs: loadplugin Mail::SpamAssassin::Plugin::RelayCountry
  May 23 09:48:04 host spamd[9033]: config: not parsing, administrator setting: loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
  May 23 09:48:04 host spamd[9033]: config: failed to parse line, skipping, in /home/user/.spamassassin/user_prefs: loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
  May 23 09:48:04 host spamd[9033]: config: not parsing, administrator setting: pyzor_options --homedir /home/user/.pyzor
  May 23 09:48:04 host spamd[9033]: config: failed to parse line, skipping, in /home/user/.spamassassin/user_prefs: pyzor_options --homedir /home/user/.pyzor

My setup is:

  spamc --headers

from within my .procmailrc file.

Does the above mean I cannot use these plugins in this scheme, because they are administrator only? That would be disappointing.

Beyond that, I don't know what to make of the pyzor related error. Pyzor seems to be globally enabled:

  [6+0]~$ fgrep -i pyzor /etc/spamassassin/*.pre
  /etc/spamassassin/v310.pre:# Pyzor - perform Pyzor message checks.
  /etc/spamassassin/v310.pre:loadplugin Mail::SpamAssassin::Plugin::Pyzor

Please help.
Re: lint versus spamd log
On Fri, 23 May 2014 20:35:26 +0200 Karsten Bräckelmann guent...@rudersport.de wrote:

Ian> spamassassin --lint

Ian> after every edit to my user_prefs file, and made sure there was no
Ian> output. This morning, in the course of the ongoing battle against
Ian> enom related spam, I looked in /var/log/mail.log, and imagine my
Ian> surprise when I found this logged with every delivery:

Karsten> That means you have been running lint check as a user, who is
Karsten> not the user receiving mail. Linting also checks user_prefs,
Karsten> but for obvious reasons only for the current user.

I mostly get the rest of your answer, but this is incorrect. Same user, I'm 100% sure. Unless you count spamd checking on my behalf as a different user - do you?

Karsten> (FWIW, what really would be disappointing is allowing users to
Karsten> inject code into the daemon. Which loadplugin in user_prefs
Karsten> would be.)

I assumed spamd forked to process each request, and loaded the plugins only in the child.
Re: lint versus spamd log
On Sat, 24 May 2014 00:51:38 +0200 Karsten Bräckelmann guent...@rudersport.de wrote:

Ian> I mostly get the rest of your answer, but this is incorrect. Same
Ian> user, I'm 100% sure. Unless you count spamd checking on my behalf
Ian> as different user - do you?

Karsten> Yes.

Karsten> user_prefs are per user. They are read by the spamd child
Karsten> process for each and every message processed. If the spamd
Karsten> daemon runs as root, the children setuid to the spamc calling
Karsten> user (or given -u argument), to determine which user_prefs to
Karsten> use. In your case the spamd master process already runs as user
Karsten> spamd and the setuid step is omitted. The user_prefs are still
Karsten> based upon the user the spamd child runs as.

Karsten> Look at it this way: Both the spamd master process as well as
Karsten> its children are running as an unprivileged, dedicated
Karsten> user. You don't expect that user to have access to your actual
Karsten> mail receiving account, do you?

Karsten> My wording of user receiving mail should have been
Karsten> processing user. I was a little sloppy, because your OP did
Karsten> not mention spamd. Given details are my user_prefs, logs
Karsten> showing a user named user, and mentioning spamc being called
Karsten> via procmail.

I apologize for muddying the waters more than necessary. The log was altered - user is in fact my normal user ID.

Karsten> In your case of a dedicated spamd user, an attacker able to
Karsten> load a plugin even potentially can access *any* other user's
Karsten> mail while being processed by SA.

Karsten> Again, see the Administrator Settings section in M::SA::Conf.

There is no dedicated spamd user - spamd runs as root:

  [11+0]~# ps lw 13558 13560 13561
  F  UID    PID   PPID  PRI  NI    VSZ    RSS  WCHAN  STAT  TTY  TIME  COMMAND
  1    0  13558      1   20   0  46656  40888  -      Ss    ?    0:04  /usr/sbin/spamd --create-prefs --max-children 5 --helper-home-d
  5    0  13560  13558   20   0  62016  56908  -      S     ?    1:11  spamd child
  5    0  13561  13558   20   0  51800  46716  -      S     ?    0:04  spamd child

(Sorry if this is also confusion created by my obfuscation of the log.)

According to the docs, this means spamd _does_ change identity to the originator when processing each spamc request.
Re: autolearn_force
On Thu, 22 May 2014 15:54:42 +0100 RW rwmailli...@googlemail.com wrote:

Ian> But in fact this is a per-test setting, a subcategory of tflags.
Ian> Do I have to specify it separately for every test? Why?

RW> The point is to set it for a small number of rules that are
RW> sufficiently strong as to guarantee there will be no mislearning in
RW> combination with the autolearn as spam threshold.

So, now I am really confused. I think I did everything right in user_prefs:

  bayes_auto_learn 1
  bayes_auto_learn_threshold_nonspam -2.00
  bayes_auto_learn_threshold_spam 6.00
  bayes_auto_learn_on_error 0
  [snip]
  tflags URIBL_DBL_SPAM autolearn_force
  tflags URIBL_JP_SURBL autolearn_force
  tflags URIBL_BLACK autolearn_force
  tflags INVALID_DATE autolearn_force

Nonetheless:

  X-Spam-Score: 6.9
  X-Spam-Tests: BAYES_99=3.5,BAYES_999=0.2,HTML_FONT_LOW_CONTRAST=0.001,
    HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.723,RDNS_NONE=0.793,SPF_PASS=-0.001,
    T_REMOTE_IMAGE=0.01,URIBL_BLACK=1.7
  X-Spam-Autolearn: no autolearn_force=no

One suspect thing I see in the log:

  May 24 20:29:58 host spamd[13561]: spamd: result: Y 6 -
    BAYES_99,BAYES_999,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,MIME_HTML_ONLY,
    RDNS_NONE,SPF_PASS,T_REMOTE_IMAGE,URIBL_BLACK
    scantime=1.9,size=6208,user=itz,uid=1000,required_score=4.3,rhost=127.0.0.1,
    raddr=127.0.0.1,rport=60231,
    mid=23931386609892239320827813806...@86adv5n4.disabilism.eu,bayes=1.00,
    autolearn=no autolearn_force=no

Note the 6 - is it possible that SA truncates the score to an integer for this purpose, and then treats it as a strict lower bound? That is, if I set bayes_auto_learn_threshold_spam = 6.00, the lowest score to actually trigger autolearn would be 7? That is the only rational explanation I have, tortured as it is.

It sure looks like SA is going out of its way to force me to do manual training :-(
Re: autolearn_force
> So, now I am really confused. I think I did everything right in user_prefs:
>
>   bayes_auto_learn 1
>   bayes_auto_learn_threshold_nonspam -2.00
>   bayes_auto_learn_threshold_spam 6.00
>   bayes_auto_learn_on_error 0
>   [snip]
>   tflags URIBL_DBL_SPAM autolearn_force
>   tflags URIBL_JP_SURBL autolearn_force
>   tflags URIBL_BLACK autolearn_force
>   tflags INVALID_DATE autolearn_force
>
> Nonetheless:
>
>   X-Spam-Score: 6.9
>   X-Spam-Tests: BAYES_99=3.5,BAYES_999=0.2,HTML_FONT_LOW_CONTRAST=0.001,
>     HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.723,RDNS_NONE=0.793,SPF_PASS=-0.001,
>     T_REMOTE_IMAGE=0.01,URIBL_BLACK=1.7
>   X-Spam-Autolearn: no autolearn_force=no

And here's a case where it doesn't autolearn ham (same user_prefs as above):

  X-Spam-Status: No
  X-Spam-Level:
  X-Spam-Score: -2.7
  X-Spam-Tests: BAYES_00=-1.9,DKIM_SIGNED=0.1,DKIM_VALID=-0.1,DKIM_VALID_AU=-0.1,
    FREEMAIL_FORGED_FROMDOMAIN=0.001,FREEMAIL_FROM=0.001,
    HEADER_FROM_DIFFERENT_DOMAINS=0.001,HTML_MESSAGE=0.001,RCVD_IN_DNSWL_LOW=-0.7,
    RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001
  X-Spam-Autolearn: no autolearn_force=no

The documentation certainly doesn't say that anything like the 3/3 and force mechanism is in place for ham. So this _should_ autolearn. Right? Right??
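[Editor's sketch] One reading that fits both headers: the AutoLearnThreshold plugin documentation states that rules with tflags learn (the BAYES_* rules), userconf or noautolearn are ignored when deciding whether to train, and that the score used for auto-learning is likely to differ from the message score. The Python sketch below redoes the arithmetic under that assumption. It reuses the per-rule scores shown in the headers above, whereas the plugin may take them from a different score set, so the results are approximate; the function names are invented for the example.

```python
SPAM_TESTS = (
    "BAYES_99=3.5,BAYES_999=0.2,HTML_FONT_LOW_CONTRAST=0.001,"
    "HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.723,RDNS_NONE=0.793,SPF_PASS=-0.001,"
    "T_REMOTE_IMAGE=0.01,URIBL_BLACK=1.7"
)
HAM_TESTS = (
    "BAYES_00=-1.9,DKIM_SIGNED=0.1,DKIM_VALID=-0.1,DKIM_VALID_AU=-0.1,"
    "FREEMAIL_FORGED_FROMDOMAIN=0.001,FREEMAIL_FROM=0.001,"
    "HEADER_FROM_DIFFERENT_DOMAINS=0.001,HTML_MESSAGE=0.001,"
    "RCVD_IN_DNSWL_LOW=-0.7,RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001"
)


def parse_tests(header_value):
    """Turn 'RULE=score,RULE=score' into a dict of floats."""
    scores = {}
    for item in header_value.split(","):
        name, _, value = item.strip().partition("=")
        if name:
            scores[name] = float(value)
    return scores


def total_score(scores):
    """Score as reported in X-Spam-Score."""
    return sum(scores.values())


def score_without_bayes(scores):
    """Approximate auto-learn score: leave out the BAYES_* (tflags learn) rules."""
    return sum(v for k, v in scores.items() if not k.startswith("BAYES_"))


for label, tests in (("spam", SPAM_TESTS), ("ham", HAM_TESTS)):
    scores = parse_tests(tests)
    print(label, round(total_score(scores), 3), round(score_without_bayes(scores), 3))
```

With the Bayes rules left out, the first message comes to about 3.2, below the configured spam threshold of 6.00, and the second to about -0.8, which is above the nonspam threshold of -2.00. On this reading neither message qualifies, and no integer truncation is needed to explain the result.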
Re: autolearn_force
On Sun, 25 May 2014 16:40:44 +0200 Axb axb.li...@gmail.com wrote:

Axb> URIBL rules are not set to use 'userconf' (user configuration)
Axb> so entries in user_prefs shouldn't affect the results

Axb> if anything it should go in a system wide rule (ie: local.cf) (not
Axb> user_prefs)

Axb> your: tflags URIBL_DBL_SPAM autolearn_force

Axb> should probably read:

Axb> tflags URIBL_DBL_SPAM net domains_only autolearn_force

Axb> etc, etc - and not in user_

Axb> iirc, this will also influence Bayes's scoring/learning behaviour.

Axb> modifying rules' tflags should be done with care

But it does autolearn in _some_ instances:

  May 25 08:33:50 host spamd[13561]: spamd: result: Y 10 -
    BAYES_99,BAYES_999,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,MIME_HTML_ONLY,
    RDNS_NONE,SPF_PASS,T_REMOTE_IMAGE,URIBL_BLACK,URIBL_DBL_SPAM,URIBL_JP_SURBL
    scantime=1.7,size=6496,user=itz,uid=1000,required_score=4.3,rhost=127.0.0.1,
    raddr=127.0.0.1,rport=52900,
    mid=24251386609892242521126914206...@lun5bim.dollazo.eu,bayes=1.00,
    autolearn=spam autolearn_force=yes (URIBL_JP_SURBL,URIBL_DBL_SPAM,URIBL_BLACK)

So I'm afraid I can't be satisfied with this explanation.

The whole autolearning settings thing just feels way too unpredictable for me. If there are so many hurdles, does anyone actually do it?
Re: autolearn_force
On Sun, 25 May 2014 20:06:22 +0200 Axb axb.li...@gmail.com wrote:

Axb> Yes, when it reached certain conditions and a score above 15.0

Axb> you can tune that score via local.cf entries:

Axb> bayes_auto_learn_threshold_nonspam bayes_auto_learn_threshold_spam

Please see the prefs in my post upthread - I have already done this.

That's why I am so confused, and frankly, irritated. I have done everything the documentation says to do, and it still behaves magically and strangely.
Re: Capture vs non-capture groups
On Wed, 28 May 2014 10:47:35 -0700 (PDT) John Hardin jhar...@impsec.org wrote:

John> The only place I've found backreferences useful is when writing a
John> header rule that is looking for the same string in multiple
John> headers.

John> Other than that, captures are very rare.

There was a pattern in the recent campaigns where backreferences would be perfect. So far I have been busy trying other approaches, but I may come back to that.

Example at http://pastebin.com/KUJAWdHq
Re: SA without procmail?
On Wed, 18 Jun 2014 15:24:36 +0200 Axb axb.li...@gmail.com wrote:

Axb> Dovecot's Sieve is your friend. (replaces procmail)

Not really, not in this context. OP is using procmail merely as an LDA. And in that capacity, it is replaced by the LDA that comes with dovecot. On my Debian system, it is /usr/lib/dovecot/dovecot-lda.
Re: SA without procmail?
On Fri, 20 Jun 2014 14:05:04 +0100 Timothy Murphy gayle...@eircom.net wrote:

> Is there something similar I could append instead to use dovecot-lda?

Yes.

  mailbox_command = /usr/libexec/dovecot/dovecot-lda

or

  mailbox_command = /usr/libexec/dovecot/dovecot-lda -m INBOX

I don't know postfix, so I can't help with the magic to substitute another mailbox for INBOX.

Or you can do this with a .forward file (I am sure postfix supports those):

  echo "|sh -c '/usr/lib/dovecot/dovecot-lda || exit 75'" > ~/.forward
Re: SA and Ubuntu 14.04 LTS
On Wed, 16 Jul 2014 06:09:08 +0200 Karsten Bräckelmann guent...@rudersport.de wrote:

> And to really include *local* plugins, provide a relative path (to the current site-wide configuration dir, without a leading slash) as optional second argument to the loadplugin statement. There's hardly ever any need for a full absolute path. And if there is, there's something wrong with your environment.

There _is_ something wrong with his environment: he's running Ubuntu. :-)

Sorry, couldn't resist.
Re: Ready to throw in the towel on email providing...
On Mon, 28 Jul 2014 12:57:38 -0400 David F. Skoll d...@roaringpenguin.com wrote:

David> 1) Gmail is actually pretty good at filtering spam. I can't
David> speak for MSFT since I don't use it.

David> 2) Especially in North America, companies are short-sighted and
David> go for quick fixes and things that look cheap up-front without
David> considering the long-term costs.

David> 3) Especially in North America, people don't see the value in
David> learning technology. They want simple, spoon-fed solutions and
David> they love the word outsourcing. Sorry if (2) and (3) are not
David> PC, but the slag against North Americans is based on my personal
David> experience. :) And hey, I'm Canadian so I can dis my own crowd...

David> 4) Most non-technical small businesses equate Mail Server with
David> Microsoft Exchange, and Microsoft has steadily been making
David> Exchange more and more of a PITA to administer. Each new version
David> of Exchange breaks things and requires learning new procedures.
David> Combine that with (3) and we see that MSFT is using on-premise
David> Exchange as a trojan horse to get people on O-365. The huge pool
David> of managed service providers that recommend MSFT solutions is
David> by-and-large staffed by incompetents who are only too happy to
David> shove their customers onto O-365 and collect kickbacks every
David> month.

Good summary, but I think you forgot (5): They have prettier icons.

I am not 100% kidding, either.
Mojibake alert [Was: Advice sought on how to convince irresponsible Megapath ISP]
On Sun, 17 Aug 2014 07:37:36 -0700, Linda Walsh sa-u...@tlinx.org wrote:

> Karsten Br<mojibake elided/> wrote:

In addition to other problems with your posts (which experts here have already pointed out), your scripts clearly do not handle non-ASCII emails well, as you have completely mangled Karsten's name in your quote.

The days when you could do all email processing with the basic Unix tools like sed and tr are long gone. Please look into MIME-aware tools or libraries. The Python email package, for instance, is excellent.
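[Editor's sketch] As an illustration of what a MIME-aware library buys you, here is a minimal sketch using the Python standard library email package mentioned above. The sample message, its address (karsten@example.org) and the function name are made up for the example; the point is only that the RFC 2047 encoded display name is decoded by the library rather than by hand.

```python
from email import message_from_string, policy

# Hypothetical message with an RFC 2047 encoded-word in the From header.
RAW = (
    "From: =?UTF-8?Q?Karsten_Br=C3=A4ckelmann?= <karsten@example.org>\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def sender_name(raw_message):
    """Return the decoded display name of the first From address."""
    msg = message_from_string(raw_message, policy=policy.default)
    return msg["From"].addresses[0].display_name


print(sender_name(RAW))
```

A quoting script that took the name from sender_name() instead of cutting the raw header with sed or tr would have the umlaut intact.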
Re: Bayes training via inotify (incron)
On Fri, 22 Aug 2014 08:34:34 +0000, Eric Wong e...@80x24.org wrote:

Eric> I always thought inotify was an obvious way to train for anybody
Eric> using Maildirs on Linux, so I set it up for my server and
Eric> basically forgot about it since it worked well. Fast forward to
Eric> 2014 and I realize what I do is not widespread. I figure I'll
Eric> attempt to document things here to a wider audience on this
Eric> sa-users list and hopefully help other users out.

Isn't inotify a bit of overkill for this? If you have a dedicated maildir for training, you know that anything in maildir/new is, uh, new. So you process it and move it to maildir/cur. What am I missing?
Learning both spam and ham, edge case
I know that if you misclassify a mail as spam with

  sa-learn --spam /path/to/ham

you can later run

  sa-learn --ham /path/to/ham

to correct the mistake, and SA will do the right thing (i.e. forget the wrong classification). The same holds in the other direction, for spam first learned as ham.

My question is, what happens if you run

  sa-learn --spam /path/to/spam --ham /path/to/ham

and the same message is in both mailboxes? Is the behavior even well-defined (i.e. not random)? And if so, can it be relied on in new versions?
Re: drop of score after update tonight
I definitely have FNs today (about 10 by now today, normally 0). Looks like some or all RBL tests are not working. I have not changed my configuration at all.

Sample here: http://pastebin.com/dsqaVA9Z
Re: drop of score after update tonight
On Mon, 25 Aug 2014 19:50:20 +0000, David Jones djo...@ena.com wrote:

Ian> I definitely have FNs today (about 10 by now today, normally 0). Looks like some/all RBLs tests are not working. I have not changed my configuration at all.
Ian> Sample here: http://pastebin.com/dsqaVA9Z

David> This hit DCC_CHECK, BAYES_50, CRM114, BOGOFILTER and KAM_EU rules and would have been blocked on my SA 3.4.0 servers.

Isn't it a bit odd that SA has rules for all these other Bayes powered backends? Why not give a bit more weight to its own Bayes instead, rather than make users forage for other tools that do essentially the same thing?

David> (I understand that the DCC_CHECK hit could have also hit on your mail server too after time had passed if you have DCC enabled.)

Don't you need non-free software for DCC?

(Meanwhile, more spam came in. This is definitely a crisis for me.)
Re: drop of score after update tonight
On Tue, 26 Aug 2014 08:10:23 +0200, Matus UHLAR - fantomas uh...@fantomas.sk wrote:

Ian> Isn't it a bit odd that SA has rules for all these other Bayes powered backends? Why not give a bit more weight to its own Bayes instead, rather than make users forage for other tools that do essentially the same thing?

Matus> are they part of stock 3.4.0?

Apparently not. So, I have to rephrase: Isn't it a bit odd to use these external rules? :)

Ian> Don't you need non-free software for DCC?

Matus> non-free in Debian definition.
Matus> (you need own server if you process over 100k messages daily, and license if you have internal checksum database)
Matus> you can get the source, build and run in most of cases freely.

But that presents difficulties even apart from the religious ones. For instance, it means installing development tools on the target server, or else cross-compiling (and we know how easy that is with average C code).

The good news is the bout of spam seems to have calmed down. _Something_ must have been wrong earlier today. The RBLs and Razor and Pyzor all seemed to be out to lunch. Maybe a connectivity problem on my side.

Matus> Christian Science Programming: "Let God Debug It!".

May I quote this? :-)
Re: Give a penalty to messages with non latin UTF-8 characters?
On Sat, 30 Aug 2014 06:44:39 -0600, LuKreme krem...@kreme.com wrote:

LuKreme> I would welcome rules that would reliably penalize messages that use chinese, japanese, korean, thai, or any other characters in the UTF-8 address space that I don’t read. I would put them in user_prefs.

Don't ok_languages and ok_locales do the job? They do for me.
Re: bayes scroing too low
On Sun, 31 Aug 2014 12:20:41 +0200, Axb axb.li...@gmail.com wrote:

Axb> Bayes scores are *not* set to be a sole indicator of spam/ham. They're supposed to be yet another indicator.

FWIW, I use both Razor and Pyzor, and there are times when they seem to be just asleep. Or maybe a particular kind of spam defeats their hash protection methods. Then for some hours I get repeated cases like Harald's: positive BAYES_999 but nothing much else. It is quite frustrating.

I started using the KAM rules and they seem to push most such messages over, but then _they_ include rules with 5+ scores ...
Re: sa-learn and find
On Sat, 30 Aug 2014 19:59:53 -0600, LuKreme krem...@kreme.com wrote:

RW> This may run into shell argument limits if you have to learn a lot of spam. Consider piping the output of find to xargs, or using -exec ...{} + in find.

LuKreme> Yes, I tried to do that, but as I said in my first post, if I do the find as part of the sa-learn command, then it stalls when the find command returns null.

xargs (the GNU one at least) has an option to not run the inferior when there are no args to give it.
Re: SA works great!
On Sun, 31 Aug 2014 16:55:50 +0200, Axb axb.li...@gmail.com wrote:

Axb> During the last +-4 years, scores have been set by the masscheck GA system. IF more ppl would contribute with masschecks and rules, detection could be better, but the lack of volunteers doing this shows that apparently what SA does is good enough or there is little interest in commitment.

So, how do I take part in masscheck?
Re: sa-learn and find
On Sun, 31 Aug 2014 17:37:50 -0600, LuKreme krem...@kreme.com wrote:

Ian> xargs (the GNU one at least) has an option to not run the inferior when there are no args to give it.

LuKreme> The interior is the find:

_Inferior_, which is GNU speak for subprocess. I should have tried to be less concise :-) Instead of

  sa-learn --ham -u ${i} `find /home/${i}/Maildir/.notspam -type f -mtime -7`

use

  find /home/${i}/Maildir/.notspam -type f -mtime -7 | xargs -r sa-learn --ham -u ${i}

LuKreme> (FreeBSD xargs never runs the command if the input is empty)

You may not need -r then.
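If the learning is driven from a script rather than a shell one-liner, the same guard is easy to spell out. This is only a sketch: the function names are made up, the seven-day window roughly mirrors `-mtime -7`, and the only sa-learn options used are the `--ham` and `-u` already shown in this thread.

```python
import subprocess
import time
from pathlib import Path
from typing import List, Optional


def recent_files(folder: Path, days: int = 7) -> List[str]:
    """List regular files under `folder` modified within the last `days` days."""
    cutoff = time.time() - days * 86400
    return sorted(
        str(p) for p in folder.rglob("*")
        if p.is_file() and p.stat().st_mtime > cutoff
    )


def learn_ham_command(user: str, files: List[str]) -> Optional[List[str]]:
    """Build the sa-learn command line, or None when there is nothing to learn."""
    if not files:
        return None  # same effect as xargs -r: do not run the subprocess at all
    return ["sa-learn", "--ham", "-u", user, *files]


def learn_recent_ham(user: str, folder: Path) -> None:
    """Run sa-learn on recent ham, skipping the call when no files match."""
    command = learn_ham_command(user, recent_files(folder))
    if command is not None:
        subprocess.run(command, check=True)
```

Unlike xargs, this does not split a very long file list into several invocations, so the argument limit RW mentioned still applies to huge folders.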
Re: bayes scroing too low
On Sun, 31 Aug 2014 12:20:41 +0200, Axb axb.li...@gmail.com wrote:

Axb> get the source from http://razor.sourceforge.net/ I don't recommend installing via some rpm.

The last version mentioned on that site is 2.84, from May 2007. Strangely, the version in current Debian packages is 2.85. Anyone know what's going on here?
Re: large spam messages
On Thu, 4 Sep 2014 12:52:34 -0400 (EDT), Jude DaShiell jdash...@panix.com wrote:

Jude> Since spamassassin cannot handle large spam over 2MB in size, what can be used to handle that class of junk?

I use a script on the MX host to MIME reshape all large messages, dropping all non-text attachments, and save them to files there, before forwarding to my IMAP server. If such a message is ham (which is almost never) it is easy enough to download the files after the fact.

Can share the script for the asking.
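To give an idea of the technique (this is not the script mentioned above, just a minimal sketch with made-up names), the Python email package can rebuild a message from its text parts only. Saving the dropped attachments to files is left out here.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Headers copied to the slimmed-down message; the selection is an assumption.
KEPT_HEADERS = ("From", "To", "Subject", "Date", "Message-ID")


def strip_non_text(raw: bytes) -> EmailMessage:
    """Return a single-part message holding only the text/* parts of `raw`."""
    original = message_from_bytes(raw, policy=policy.default)
    slim = EmailMessage()
    for name in KEPT_HEADERS:
        if original[name] is not None:
            slim[name] = str(original[name])
    # walk() visits every part; multipart containers have maintype "multipart"
    # and are skipped by the test below, as are images, archives and so on.
    texts = [
        part.get_content()
        for part in original.walk()
        if part.get_content_maintype() == "text"
    ]
    slim.set_content("\n".join(texts))
    return slim
```

The result is small enough to pass through the scanner, and text/html parts are kept as text, so URI rules still have something to look at.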
Reply versus new thread [Was: Dumping email with blank To: header ?]
Others have gracefully answered as to the substance of your message. I'll have to be a pest and ask that you please do not use Reply or Followup when you're starting a new topic. For list readers with user agents that thread the standard (RFC standard) way, that breaks threading.

The way to start a new topic is to copy the list address, do a New Message or similar, and paste the address into the destination field. You can also save the address in your contact list / address book to avoid the copy and paste in the future.

Thanks for your cooperation.
Re: sa-learn from a remote imap folder
On Fri, 12 Sep 2014 07:45:22 -0500, Dave Pooser dave...@pooserville.com wrote:

Marcus> spamassassin and imap (cyrus) are running on different boxes. What is best practice to learn spam from a remote imap folder?

Dave> At $DAYJOB we export the spam folder (and a ham folder for FPs) via NFS and mount them on the frontline SA servers for sa-learn.

Doesn't that smell of locking issues?
KAM_BODY_URIBL_PCCC misfire
I have just had a false positive due to KAM_BODY_URIBL_PCCC (good for 5 pts.), for no apparent reason whatsoever. There are no URIs in the body.

Spample here: http://pastebin.com/6kaxtNcq
Re: more_spam_from like more_spam_to
On Wed, 17 Sep 2014 13:43:49 +0100, RW rwmailli...@googlemail.com wrote:

RW> A lot of people don't put mailing lists through Spamassassin, most of them have already been spam filtered, and to get the best results you have to extend your internal network and maintain it.

Do you mean the trusted_networks setting here?

I do sometimes get spam from lists, and so far I have been feeding list traffic to SA just like everything else. It doesn't seem to have any adverse effects. My trusted_networks is set to just the MX host.
Re: more_spam_from like more_spam_to
On Fri, 19 Sep 2014 08:37:45 +0200, Matus UHLAR - fantomas uh...@fantomas.sk wrote:

RW> A lot of people don't put mailing lists through Spamassassin, most of them have already been spam filtered, and to get the best results you have to extend your internal network and maintain it.

Ian> Do you mean the trusted_networks setting here?

Matus> no... they do not filter mail from mailing lists through SA. it is a setting outside spamassassin, usually in MTA, milter or procmail.
Matus> trusted_networks is SA configuration setting so it can't be used when SA is avoided. Also, it has much different meaning than not scanning mail from those hosts.

Well, that is not how I read RW's message. To me, it sounds like this:

Lots of people don't put mailing lists through Spamassassin, in part because of the extra work that would be required if they did; namely, they'd have to extend their internal network and maintain it. This is required for best results.

(I'm not a native English speaker either, but I've probably been speaking and reading it a bit longer than you. Just guessing, and of course no disrespect meant.)

Only RW can clarify ...
Re: Non-English spam
On Thu, 25 Sep 2014 13:13:07 -0400, dar...@chaosreigns.com wrote:

> To enable TextCat to flag everything that's not English, in local.pre I have:
>
>   loadplugin Mail::SpamAssassin::Plugin::TextCat
>
> And in local.cf I have:
>
>   ok_languages en

I have done this too, but I live in an English speaking country. If I had to do this while living in a Polish speaking country, I'd consider that the spammers have won.
Re: spam - why spam score is low,
On Fri, 26 Sep 2014 17:07:31 +0200, Antony Stone antony.st...@spamassassin.open.source.it wrote:

motty> Received: from maria.fqdn.com ([127.0.0.1])

Antony> That won't be helping - it means you're not basing any tests on the sending server. can you run SA on your inbound MX instead of relaying locally first?

Is this right? Isn't this precisely what the internal_networks setting works around?
Re: what's wrong
On Tue, 30 Sep 2014 09:47:41 +0200, Matus UHLAR - fantomas uh...@fantomas.sk wrote:

> Do you trust smtp.cesky-hosting.cz? Even if it's open socks and http proxy server?

I wonder if slovensky-hosting.sk does better :-P
Re: Local URL blocking based on NS records?
On Fri, 03 Oct 2014 00:08:49 +0200, Axb axb.li...@gmail.com wrote:

Axb> What's wrong with running rbldnsd? It's the tool all BLs use for mirroring BL data. It's so stable and simple to use nothing can beat it.

From the website: "There is no config file, rbldnsd accepts all configuration in command line."

A bit too simple, I'd say. What about kernel argv limits?
Re: Regarding mass-check access
On Fri, 10 Oct 2014 16:19:39 -0400, staticsafe m...@staticsafe.ca wrote:

> I sent an email to priv...@spamassassin.apache.org regarding access to mass-check back on the first of September. Is anybody out there? :)

So did I, on August 31, to be precise. Crickets for me, too.
Re: procmail (was Re: Spam messages bypassing SA)
On Fri, 24 Oct 2014 08:43:41 -0400, David F. Skoll d...@roaringpenguin.com wrote:

David> Procmail is also unmaintained abandonware, as far as I can tell. If you use SpamAssassin, you probably like Perl, so I would recommend Email::Filter instead. It's far more flexible than procmail and lets you write readable filters.

David> Since procmail is still the default LDA on Debian, this is my .procmailrc:
David>
David>   :0
David>   | /usr/bin/perl /home/dfs/.mail-filter.pl >> /home/dfs/.mail-filter.log 2>&1
David>
David> And excerpts from my filter look something like this:

Or you could run dovecot and its sieve plugin. Sieve is a real standard (RFC 5228) which procmail never was.
Re: procmail
On Tue, 28 Oct 2014 11:43:04 -0700 jdow j...@earthlink.net wrote:

jdow> That is hardly a compelling reason to change from procmail to perl, for me or others with working procmail systems. You seem to be advocating handing me perl and turning me loose after ripping procmail out of my hands. That does not endear you to me. It isn't broken. So why fix it? There is a tremendous amount of experience out there setting it up and using it. Is that a reason to discard it for something new? We're seeing the fruits of that sort of divisiveness with the systemd controversy. If fix means better and still 100% compatible it is an easy sell. If fix means 0% compatible being better is not good for people with better things upon which to spend their time than learning a new way shoved down their throats. In the abstract you are right. In the practical, that rightness appears to tarnish.

You sound like you're replying more to me than to David.

How do you match non-ASCII From: in procmail? Note that the encoding may differ, even for the same sender, depending on which MUA he's using ATM.

_Some_ old stuff deserves to be replaced.
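To show what I mean by the encoding differing, here is a small sketch using Python's email.header module. The name and both encoded forms are invented for the example. A pattern written against the raw header would need one alternative per encoding, while a match on the decoded text needs only one.

```python
from email.header import decode_header, make_header

# The same (hypothetical) display name as two different MUAs might send it.
UTF8_FORM = "=?UTF-8?Q?J=C3=B6rg_M=C3=BCller?="
LATIN1_FORM = "=?ISO-8859-1?Q?J=F6rg_M=FCller?="


def decoded(header_value: str) -> str:
    """Decode RFC 2047 encoded words into plain text."""
    return str(make_header(decode_header(header_value)))


def name_matches(header_value: str, wanted: str) -> bool:
    """Case-insensitive substring match against the decoded header text."""
    return wanted.casefold() in decoded(header_value).casefold()


if __name__ == "__main__":
    for form in (UTF8_FORM, LATIN1_FORM):
        print(form, name_matches(form, "Jörg Müller"))
```

Sieve implementations and Email::Filter do this decoding for you; with procmail you are matching the raw bytes.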
Re: SOUGHT 2.0 ?
On Sat, 01 Nov 2014 10:06:57 -0000, Kevin Golding k...@caomhin.org wrote:

Kevin> So anyone else want to raise their hands?

It depends. Would I mind a bit of regular maintenance work? No, I wouldn't mind.

Would I mind a major change in how I run my server, for instance, run a virus checker, or run the bleeding edge version of SA? You betcha. Not going to do that, sorry.

So, I need more details before I raise my hand much above the keyboard :-P

Of course, I'd love to have the autogenerated rules back, so call me selfish.
Re: SOUGHT 2.0 ?
On Thu, 13 Nov 2014 09:28:30 -0000, Kevin Golding k...@caomhin.org wrote:

Kevin> The main thing that's going to be needed is good, reliable, data. We'll only get good rules with good feeds. That should be fairly low impact for people in many respects.

Kevin> Obviously there's always room to help with some code, so a bit of Perl or shell skills are a good thing. The impact of that on people will vary on how they work, but I doubt anyone will do anything to interfere with their running systems - as proven with masschecks it's fairly easy to sandbox things to one side for such analysis even if people do want to do anything on an important system.

Ok, I am still interested. I'm a coder, my Perl is rusty but my shell is current. I can't provide trap servers but you'd be welcome to my spam (all hand-verified by me).
Re: SOUGHT 2.0
On Thu, 04 Dec 2014 22:41:13 +0100, Axb axb.li...@gmail.com wrote:

Axb> To be able to create usable rules, several times/day I need feeds to spit *at least* +150k/day. As I don't have the data

150k of what? Bytes? Emails? Tokens?

--
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.
whitelist_from_rcvd not working, WAIDW
Header of test message, massaged for privacy, is here: http://pastebin.com/EV6g15aN

I have this in user_prefs:

  trusted_networks 198.1.2.3/32
  [...lots snipped...]
  whitelist_from_rcvd *@wetransfer.com *.wetransfer.com

Why is the whitelist not firing?
Re: whitelist_from_rcvd not working, WAIDW
On Sat, 28 Feb 2015 13:37:29 +0100, Mark Martinec mark.martinec...@ijs.si wrote:

Ian> trusted_networks 198.1.2.3/32
Ian> [...lots snipped...]
Ian> whitelist_from_rcvd *@wetransfer.com *.wetransfer.com

Mark> It seems the:
Mark>   Received: (from itz@localhost)
Mark>     by myalias.trusted.mx (8.14.4/8.14.4/Submit) id t1N7YK8O020727
Mark>     for i...@my.post.office; Sun, 22 Feb 2015 23:34:20 -0800
Mark> is breaking a trust chain.

It shouldn't. I forgot to add that all of the following resolve to 198.1.2.3:

  my.domain
  my.trusted.mx
  myalias.trusted.mx
Confused about Bayes expiry
I am very confused by the various features involving expiry from Bayes.

perldoc Mail::SpamAssassin::Conf:

  bayes_expiry_max_db_size (default: 150000)
    What should be the maximum size of the Bayes tokens database? When expiry occurs, the Bayes system will keep either 75% of the maximum value, or 100,000 tokens, whichever has a larger value. 150,000 tokens is roughly equivalent to a 8Mb database file.

  bayes_auto_expire (default: 1)
    If enabled, the Bayes system will try to automatically expire old tokens from the database. Auto-expiry occurs when the number of tokens in the database surpasses the bayes_expiry_max_db_size value. If a bayes datastore backend does not implement individual key/value expirations, the setting is silently ignored.

  bayes_token_ttl (default: 3w, i.e. 3 weeks)
    Time-to-live / expiration time in seconds for tokens kept in a Bayes database. A numeric value is optionally suffixed by a time unit (s, m, h, d, w, indicating seconds (default), minutes, hours, days, weeks). If bayes_auto_expire is true and a Bayes datastore backend supports it (currently only Redis), this setting controls deletion of expired tokens from a bayes database. The value is observed on a best-effort basis, exact timing promises are not necessarily kept. If a bayes datastore backend does not implement individual key/value expirations, the setting is silently ignored.

This really sounds as if expiry is a no-op for backends other than Redis. And yet Debian bug #334829 [1] exists, and has spawned a whole subculture of solutions and work-arounds. (Sorry for the slight exaggeration.) Clearly the users reporting these problems do not use Redis, in fact by all signs they use the default DB backend, as I do.

So should I be worried about the expiry overhead and set up a separate --force-expire job? I am confused.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=334829
Re: Confused about Bayes expiry
On 2015-05-24 23:25 +0200, Mark Martinec wrote:

Mark> With other bayes back-ends the traditional expiration mechanisms need to be used, either auto-expiration runs triggered from time to time by SpamAssassin, or explicit expiration runs, e.g. from a cron job. With these traditional back-ends the bayes_token_ttl setting has no effect.

Perhaps this paragraph could be included verbatim in the podfile, and the current wording (especially about bayes_auto_expire) removed :-)

Thanks. But, in fact I already have a cronjob running sa-learn --force-expire. The reason I would prefer to remove it (and so the reason for my original post) is that it does a journal sync as well, which I didn't intend and which interferes with other things.

Would sa-learn --no-sync --force-expire make sense?
Re: Confused about Bayes expiry
On 2015-05-25 09:43 +0200, Matus UHLAR - fantomas wrote:

Ian> But, in fact I already have a cronjob running sa-learn --force-expire. The reason I would prefer to remove it (and so the reason for my original post) is that it does a journal sync as well, which I didn't intend and which interferes with other things.

Matus> what other things? Journal is here to fasten database updates, not to avoid database writes. too big journal slows things down.
Matus> The main reason to use manual expire is to avoid occasional delays with automatic expire noted in the bugreport you posted link to.
Matus> so, again, what are reasons you want to avoid journal syncs?

I do the database updates in a batch fashion, learning each input message with --no-sync, then doing a --sync at the end. This --sync cannot wait too long because I want to defend against current spam. That is, it cannot wait as long as the typical time between expires. But if an explicit expiry happens to run at the same time, the result is a mess.

Of course there is a simple solution: have a single job which decides by itself if it's time to expire or not, rather than rely on the cron schedule. But it seemed to me that the two tasks were independent and so should be in separate jobs. As was explained in the other subthread, I was wrong in that assumption.

Thanks.
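The single-job idea could look roughly like the sketch below. It only plans the sa-learn invocations; the weekly interval, the function name and the way the time of the last expiry is remembered are all assumptions for the example. The options used (--no-sync, --spam, --sync, --force-expire) are the ones already discussed in this thread.

```python
import time
from typing import List, Optional, Sequence

EXPIRE_INTERVAL = 7 * 86400  # hypothetical: expire at most once a week


def plan_run(
    spam_paths: Sequence[str],
    last_expire: Optional[float],
    now: Optional[float] = None,
) -> List[List[str]]:
    """Return the sa-learn command lines for one batch run, in order.

    `last_expire` is the Unix time of the previous expiry, or None if unknown.
    """
    now = time.time() if now is None else now
    # Learn each message without touching the main database.
    commands = [["sa-learn", "--no-sync", "--spam", p] for p in spam_paths]
    # One journal sync at the end of the batch.
    commands.append(["sa-learn", "--sync"])
    # Expiry is decided inside the same job, so it can never overlap the sync.
    if last_expire is None or now - last_expire >= EXPIRE_INTERVAL:
        commands.append(["sa-learn", "--force-expire"])
    return commands
```

The caller would run the commands one after another (for instance with subprocess.run) and record the time whenever the expire step was included.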
Re: no reporting methods available
On 2015-07-31 18:28 -0500, David B Funk wrote:

> Reporting is separate from learning. It is the case that spamassassin -r is supposed to report and learn. However it isn't quite the same as sa-learn --spam in that unlike sa-learn --spam it won't override the spam learn prohibition of BAYES_00.

Thanks, that is useful to know. However, it isn't really relevant to this situation. My point is: if learning _is_ part of the job of spamassassin -r, then does it have to fail for the "no reporting methods available" message to be emitted?
no reporting methods available
I run spamassassin -r from cron nightly. Last night I got this output:

  Jul 30 23:00:11.830 [31065] warn: reporter: no reporting methods available, so couldn't report
  Jul 30 23:00:11.830 [31065] warn: spamassassin: warning, unable to report message
  Jul 30 23:00:11.830 [31065] warn: spamassassin: for more information, re-run with -D option to see debug output

I tried to follow the instructions and run

  spamassassin -D -r `ls spam`

but that hangs without producing any output.

The only external reporting method I'm aware of that should be active is Razor. Running

  razor-report `ls spam`

works normally as expected.

Aside from getting an explanation of what happened this time, I'd also like to clarify more generally what spamassassin -r does. From a recent thread here I learned that it also does the equivalent of sa-learn --spam. Right? So presumably it doesn't consider this a reporting method, or how could it be not available?

Also I recently installed the bogofilter plugin by Christian Laußat, and my understanding is that (when bogofilter_learn is set to 1, as it is), it advertises itself as another external reporting agent. So shouldn't this also happen during a spamassassin -r run, and how could it be not available?
bayes expiry not happening when it should
~$ grep '^bayes_expiry_max_db_size' ~/.spamassassin/user_prefs | awk '{print $2}'
2000000
~$ sa-learn --force-expire
bayes: synced databases from journal in 0 seconds: 2784 unique entries (2805 total entries)
~$ sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 24501 0 non-token data: nspam
0.000 0 23548 0 non-token data: nham
0.000 0 2009202 0 non-token data: ntokens
0.000 0 100071 0 non-token data: oldest atime
0.000 0 1438755640 0 non-token data: newest atime
0.000 0 1438755988 0 non-token data: last journal sync atime
0.000 0 1438756034 0 non-token data: last expiry atime
0.000 0 11059200 0 non-token data: last expire atime delta
0.000 0 20174 0 non-token data: last expire reduction count

??wth??? I thought I _finally_ understood this stuff :-(
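For anyone who wants to look at these numbers programmatically, here is a small sketch that parses the "non-token data" lines. The parsing function is made up for the example and assumes the column layout seen above (the value is the third field); the sample lines reuse values from my output, with ntokens read as 2009202.

```python
from typing import Dict

# A few lines in the layout printed by "sa-learn --dump magic" above.
SAMPLE = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 2009202 0 non-token data: ntokens
0.000 0 11059200 0 non-token data: last expire atime delta
0.000 0 20174 0 non-token data: last expire reduction count
"""

PREFIX = "non-token data: "


def parse_magic(text: str) -> Dict[str, int]:
    """Map each non-token data label to its integer value."""
    values = {}
    for line in text.splitlines():
        fields = line.split(None, 4)  # four numeric columns, then the label
        if len(fields) == 5 and fields[4].startswith(PREFIX):
            values[fields[4][len(PREFIX):]] = int(fields[2])
    return values


if __name__ == "__main__":
    magic = parse_magic(SAMPLE)
    print("tokens:", magic["ntokens"])
    print("last expire delta in days:", magic["last expire atime delta"] / 86400)
```

Expressed in days, the atime delta is easier to compare with how old the tokens actually are.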
Re: bayes expiry not happening when it should
On 2015-08-05 12:58 +0100, RW wrote:

> The number of tokens is within 0.5% of the configured value. It's designed to produce a value between 75% and roughly 150%.

I can't quite parse that answer, so let's be more specific. Doc says:

  bayes_expiry_max_db_size (default: 150000)
    What should be the maximum size of the Bayes tokens database? When expiry occurs, the Bayes system will keep either 75% of the maximum value, or 100,000 tokens, whichever has a larger value.

From this (and the more elaborate description in the EXPIRATION section, which I've also read) I thought it worked roughly like this:

  if (ntokens < bayes_expiry_max_db_size)
      do_nothing()
  else
      goal_ntokens = max(100000, 0.75 * bayes_expiry_max_db_size)
      while (ntokens > goal_ntokens)
          kill_oldest_tokens()

If I misunderstood, how/where? Sorry for my density :-(
Live upgrade safe?
Can I safely upgrade SA from 3.4.0 to 3.4.1 without changing any local configuration files, and without regenerating the Bayes database? (I use the default bdb Bayes store.)
Re: bayes expiry not happening when it should
On 2015-08-05 19:34 +0100, RW wrote:

> What it actually does is estimate a cut-off time and then delete all tokens older than that. How it gets the cut-off time is described the next two sections: EXPIRE LOGIC and ESTIMATION PASS LOGIC.

OMG. For one thing, are the clauses in the definition of "weird" conjunctive or disjunctive?

A more insolent question, why this complexity? Why can't I force an expire when I feel like it? :-P Or can I?
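To convince myself why a cut-off time cannot land exactly on a token count, here is a toy model in Python. It is not SpamAssassin's algorithm: the doubling list of candidate deltas and the "closest to the goal" selection rule are assumptions made for illustration (the 11059200 second delta reported earlier in this thread does equal 43200 * 2^8, which fits a doubling scheme). The real rules are in the EXPIRE LOGIC and ESTIMATION PASS LOGIC sections RW points to.

```python
from typing import List, Sequence

HALF_DAY = 43200
# Assumed candidate cut-off ages: 12 hours, doubling up to 256 days.
CANDIDATE_DELTAS = [HALF_DAY * 2 ** n for n in range(10)]


def tokens_removed(atimes: Sequence[int], newest: int, delta: int) -> int:
    """Count tokens whose last access is more than `delta` seconds old."""
    return sum(1 for t in atimes if newest - t > delta)


def pick_delta(atimes: Sequence[int], newest: int, goal_reduction: int) -> int:
    """Pick the candidate delta whose removal count is closest to the goal.

    The selection rule is an assumption for this sketch, not SA's rule.
    """
    return min(
        CANDIDATE_DELTAS,
        key=lambda d: abs(tokens_removed(atimes, newest, d) - goal_reduction),
    )


def expire(atimes: Sequence[int], newest: int, delta: int) -> List[int]:
    """Keep only the tokens accessed within the last `delta` seconds."""
    return [t for t in atimes if newest - t <= delta]


if __name__ == "__main__":
    newest = 100 * 86400
    atimes = [newest - day * 86400 for day in range(100)]  # ages 0..99 days
    delta = pick_delta(atimes, newest, goal_reduction=50)
    print(delta // 86400, "days;", len(expire(atimes, newest, delta)), "tokens kept")
```

With only a handful of candidate cut-offs, the number of tokens removed is whatever the age distribution gives, so the database ends up near the goal rather than at it.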
another bayes oddity
I have

  bayes_auto_learn 0
  bayes_auto_expire 0
  bayes_learn_to_journal 0
  add_header all Autolearn _AUTOLEARN_

and indeed, all messages are tagged with

  X-Spam-Autolearn: disabled

Nevertheless, the mtime _and_ size of ~/.spamassassin/bayes_journal inches forward with every delivery. Why?
Re: Large spam
On 2015-07-15 20:12 +0000, Zinski, Steve wrote:

> We're starting to see a lot of spam in the 800KB to 1.2MB size range. I’m running MIMEdefang and it’s configured to skip messages larger than 100KB (and I hesitate to increase the limit due to performance issues). I read somewhere that there’s a way to have MIMEdefang (or spamassassin) strip out the non-text portions of the e-mail and scan. Can anyone help me set this up or point me in the right direction? Thanks!

Yes, I see the same thing. I have no doubt at all that it is intentional, to defeat the spamc size limit in particular.

Moreover, mimedefang won't help because at least some of them are disguised as plain text messages. That is, the outermost message body is an entire MIME message, headers and all.
Re: Debian jessie - new setup, missing data directory
On 2015-11-09 16:42 +0100, Antony Stone wrote:

> What did Jessie install it as?
> >
> > /var/mail/.spamassassin/user_prefs

This is very strange. Are you really sure it is not operator error? I run wheezy, so I can't flat out exclude it, but it flies in the face of too much Debian tradition. /var/mail is just for the spool mailboxes.

> 1. I seriously doubt that on a Debian system exim is running as root.

Indeed:

[6+0]~$ ps axl | fgrep 'exim4 -bd'
5 101 3230 1 20 0 46824 2860 ? Ss ? 0:06 /usr/sbin/exim4 -bd -q30m
0 1000 8368 8311 20 0 7800 1760 - S+ pts/1 0:00 fgrep exim4 -bd
[7+0]~$ awk 'BEGIN { FS=":" } ( $3 == "101" ) { print $0 }' < /etc/passwd
Debian-exim:x:101:103::/var/spool/exim4:/bin/false

> 2. It sounds like we're talking slightly at cross-purposes here. Exim may be
> calling spamassassin (PS: how?)

It matters a good deal. If it's called from the content filtering hook or the ACLs, spamassassin runs as the exim UID (unless it is itself setuid, of course). But if it's called as a "transport filter", it runs as the destination user.
Re: Checking if sa-learn is actually learning
On 2015-10-16 20:59 -0500, Ryan Coleman wrote:

> sa-learn commands:
> [scans domains for specified folders and scans them]
>
> /usr/bin/find /var/mail/vhosts/ -name '*.Spam.New*' -type d -exec /usr/bin/sa-learn --no-sync --spam --progress {}* \;
>
> /usr/bin/find /var/mail/vhosts/ -name '*.Spam.Suspected*' -type d -exec /usr/bin/sa-learn --no-sync --spam --progress {}* \;
>
> I swear I had issues in the past without having --no-sync, but is that causing
> it?

If you do the routine learning with --no-sync, you must have one run with --sync as well, maybe in a cron job. Or just run with --sync once at the end of this same script.

That much is straightforward, and should be clear from the man/pod pages. The part that caused me some trouble, and is somewhat underdocumented IMO, is the interaction of --sync with --force-expire. I'm afraid I can't help you with that because I took the extreme step of disabling expiration, and instead re-creating a fresh database monthly from the recent corpus which I keep around.
Return Path (TM) whitelists
I just got in my inbox what I consider spam from the Belgian domain selling Japanese copiers and printers (you probably know which one). What made it pass through SA were RCVD_IN_RP_CERTIFIED and RCVD_IN_RP_SAFE. Together they account for a whopping -5 points - a poison antidote pill! Isn't that a bit excessive?

In fact, since Return Path explicitly advertises itself as a service for marketers, and I _never_ knowingly subscribe to a marketing list, these scores should be (smallish) positive as far as I'm concerned.

Also, I'm unsure what membership in SAFE means; the Return Path website doesn't mention it as prominently as it does their certification program.

--
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.
Re: Return Path (TM) whitelists
On 2015-07-09 16:58 +, David Jones wrote:

> Did the email have a valid unsubscribe link/process?

It is in Dutch, and I can't read Dutch. (Yes, I do use the language plugin.)

> I shortcircuit as ham for these two rule hits and never have had a
> report of spam that couldn't be reliably/safely unsubscribed from.
> (I filter about 90,000 mailboxes.)

How can I tell if it is safe if I can't even read the message? But in general, to me it is spam if I didn't explicitly subscribe. And I didn't.

--
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.
Re: Return Path (TM) whitelists
On 2015-07-10 13:54 +0100, RW wrote:

> I don't get any spam at all in the return-path lists.
> ...
> I don't doubt that there's some abuse, but I also find it hard to
> believe that the accuracy of the return-path rules isn't dominated by
> user behaviour.

Can you specify user behaviour in more detail? Are you saying it is something I (and the other posters with viewpoint similar to mine) did, or didn't do, that causes us to receive RP certified UCE?

--
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.
Re: Return Path (TM) whitelists
On 2015-07-10 16:36 +0200, Reindl Harald wrote:

> most users enable checkboxes which are needed to get random forms
> submitted, even if they say i agree to get mails from here and there
> and are missing the context when that mails are coming later

You don't know me, so you can hardly claim a basis to lump me with most users. I repeat (for the last time, I promise): I didn't subscribe to any Belgian/Dutch list. Not by enabling a checkbox, not otherwise.

--
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.
Re: Live upgrade safe?
On 2015-09-11 17:35 +0200, Reindl Harald wrote:

> >>>Can I safely upgrade SA from 3.4.0 to 3.4.1 without changing any local
> >>>configuration files, and without regenerating the Bayes database? (I
> >>>use the default bdb Bayes store.)
> >>
> >>yes, but you need to run "sa-update" before restart to fetch the
> >>latest rules and hopefully have a distribution which restarts
> >>automatically after update the package
> >
> >Isn't this a contradiction? If my distribution automatically restarts
> >(which it does), how can I sneak in a sa-update run after the upgrade
> >but before the restart?
>
> i hope you have a testing environment for production and so just make
> the "sa-update" there and rsync the rule-updates to the liveserver

I appreciate you trying to help, but you don't really answer my question. Even if I could do what you suggest, the rsync would still take finite time - longer than the interval between the upgrade and the restart on the production system.

--
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.
Re: [Announce] SA-Plugins: RedisAWL, RuleTimingRedis
On 2015-06-09 17:57 +0200, Benning, Markus wrote: > RuleTimingRedis - collect SA rule timings in redis I'm trying this out. I have a little annoying problem: the logs beginning on line 178 seem to go to stdout or stderr as well as syslog. The result is that cron sends me email every time spamd is restarted (after every rule update). Do you know how to change that? I find nothing about logging in perldoc Mail::SpamAssassin::Conf. I suppose I could just delete those lines from the module :-) But then I would have extra work when I merge with any new versions you have. Thanks for your ideas. -- Please *no* private copies of mailing list or newsgroup messages. Rule 420: All persons more than eight miles high to leave the court.
Re: Live upgrade safe?
On 2015-08-14 17:45 +0200, Reindl Harald wrote:

> >Can I safely upgrade SA from 3.4.0 to 3.4.1 without changing any local
> >configuration files, and without regenerating the Bayes database? (I
> >use the default bdb Bayes store.)
>
> yes, but you need to run "sa-update" before restart to fetch the
> latest rules and hopefully have a distribution which restarts
> automatically after update the package

Isn't this a contradiction? If my distribution automatically restarts (which it does), how can I sneak in a sa-update run after the upgrade but before the restart?

--
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.
Re: best way to whitelist this list?
On 2015-09-19 20:12 +0200, A. Schulze wrote: > today I was notified by ezmlm that my MTA rejected messages to > me. Messages to this list where classified as spam by .. spamassassin. All of today's messages here scored around -7.5 for me, with no special handling. -- Please *no* private copies of mailing list or newsgroup messages. Rule 420: All persons more than eight miles high to leave the court.
Re: A Plan to Stop Violence on Social Media
On 2015-12-16 14:21 -0800, jdow wrote: > One thing worth pointing out is if this CAN be done refusing to do it > yourself is a shallow gesture. No, it is not. Refusing to take part in what you believe is wrong, even if you know the wrong will be done eventually because the Zeitgeist favors it, is a legitimate point of view. Then again, I don't give a rodent's back what Facebook or Twitter does. But I am afraid it won't stop there. Of course this is totally OT, so I won't post anymore of this here, but I could discuss it off-list. -- Please *no* private copies of mailing list or newsgroup messages. Rule 420: All persons more than eight miles high to leave the court.
Re: Trying Bayes / Redis
On 2015-12-11 14:29 -0800, Marc Perkel wrote:

> Anyone using this rule timing plugin? Having trouble getting it to
> work. Just wondering if it's worth it?
>
> Mail::SpamAssassin::Plugin::RuleTimingRedis

I use it and I have no trouble now. But I remember I had to disable the Lua scripting stuff when I set it up; it wouldn't work even though my Redis version should be recent enough to support it.

--
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.
Re: Is BAYES filtering working? Having doubts.
On 2015-12-29 20:41 -0500, Bill Cole wrote: > Neither su nor sudo magically changes the permissions or ownership of > files. If you pass filenames as arguments they must be readable by the > user actually running sa-learn, which is the *unprivileged* user > handling the system-wide BayesDB ("amavis" in the case originating > this thread, but "spamd" and "defang" are other common ones...) In > most reasonably well-secured systems using Maildir message stores, the > Maildirs are all owned by individual users or by one user that handles > delivery to "virtual users" understood by the MTA and IMAP or POP > server by not by the OS. That is generally NOT the same user running > spamd or content filters for a system-wide BayesDB. As a result, > relearning has to be done as root, shuttling data from files owned by > one user into a process running as another. You are right. The reason it works for me is that I don't use a systemwide DB. May I ask that you turn down the sarcasm a bit? -- Please *no* private copies of mailing list or newsgroup messages. Rule 420: All persons more than eight miles high to leave the court.
Re: Is BAYES filtering working? Having doubts.
On 2015-12-29 19:44 -0500, Bill Cole wrote: > On 29 Dec 2015, at 18:54, Ian Zimmerman wrote: > > >In fact sa-learn accepts multiple named arguments on the command line, > >so the alternative I use is to go through the spambox N files at a time > >in a shell loop. (I have N=100 but obviously this depends.) > > Which successfully ignores the original issue of this thread completely: that > the > user sa-learn must run as cannot read the files being learnt. If you pass > unreadable > filenames as arguments, sa-learn just whines and fails. Shockingly, that is > not the > desired result. Clearly you can do the su magic if needed. The point is that the overhead which you fear is reduced N times. -- Please *no* private copies of mailing list or newsgroup messages. Rule 420: All persons more than eight miles high to leave the court.
Re: Is BAYES filtering working? Having doubts.
On 2015-12-29 17:50 -0500, Bill Cole wrote:

> Yes, with the advantage of using Mail::SpamAssassin::Util::secure_tmpfile() rather
> than whatever I happen to roll up in a bit of Q shell that I never get around to
> reviewing for edge cases...
>
> The main reason to do something like that is to avoid the heavyweight sudo & load of
> a Perl script for each message.
>
> >>The alternative without formail would be to pipe each raw message into
> >>its own sa-learn.
> >
> >The alternative is to give it a directory.

In fact sa-learn accepts multiple named arguments on the command line, so the alternative I use is to go through the spambox N files at a time in a shell loop. (I have N=100 but obviously this depends.)

--
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.
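For the "N files at a time" idea, a rough sketch follows. It swaps the hand-written shell loop for xargs, which does the batching itself; this is not the poster's actual script. SA_LEARN and learn_in_batches are hypothetical names, and -print0 / -0 are assumed to be available (they are in GNU and BSD find/xargs but are not strictly POSIX).

```shell
# Hypothetical name; defaults to the real sa-learn.
SA_LEARN=${SA_LEARN:-sa-learn}

learn_in_batches() {
    # $1: directory holding the spam messages
    # $2: batch size N (files per sa-learn invocation)
    find "$1" -type f -print0 |
        xargs -0 -n "$2" "$SA_LEARN" --no-sync --spam
}
```

With something like learn_in_batches ~/Maildir/.Spam/cur 100, the Perl startup cost is paid once per 100 messages rather than once per message.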
Bayes expiry vs. sync, again
I am sorry to return to this horse which has perhaps been beaten enough. But I still don't know and don't understand (_after_ reading the docs) if I can, at the same time: 1. completely disable expiry 2. force a sync of the journal I just saw with my own eyes that passing --sync to sa-learn does _not_ necessarily force one. (The manpage is ambiguous about it.) But I don't want to pass --force-expire because of 1. I am asking in the context of using the default db backend for Bayes, but if there is a way to do this with one of the other options, I'll consider it. -- Please *no* private copies of mailing list or newsgroup messages. Rule 420: All persons more than eight miles high to leave the court.
Re: Interesting rule combo results
On 2016-03-09 07:12 -0800, Marc Perkel wrote: > >>HAM RULES: > >>... > >> 80056 HTML_MESSAGE > > > >What's happening here? This seems to imply that HTML_MESSAGE only > >appears in ham. > > > > > > I think my results are a little strange in that I might not be > training off all the data but just that which gets past all my other > filters. I'm still working on this but thought I'd share what it came > up with for better or worse. If I take your explanation in the OP verbatim, what happens here is that HTML_MESSAGE _without any other rule hits_ only appears in ham. Which seems entirely plausible, even if perhaps not very useful. -- Please *no* private copies of mailing list or newsgroup messages. Rule 420: All persons more than eight miles high to leave the court.
Re: Disabling spamcop plugin
On 2016-04-07 14:37 +0100, RW wrote:

> What exactly are you trying to do here?
>
> The pyzor plugin does testing and reporting, use_pyzor is mostly there
> to control the test. The spamcop plugin does reporting only.

So, if I don't do any explicit reporting (neither spamc -C nor spamassassin -r), the spamcop plugin is not actually used at all? sa-learn doesn't do any reporting, right?

My high-level goal here is to get rid of as many configuration changes as I can in the system-managed area (/etc in my case) and achieve the same effects by other means. This is because I'm learning that I cannot trust my distro not to screw me over anymore. I noticed that I had disabled the spamcop plugin before by commenting it out in /etc/*/init.pre, and I wanted to continue not using it even after I reverted that file to its pristine distro state.

By the way, manpage for spamc says:

    -C report type, --reporttype=type
        Report or revoke a message to one of the configured
        collaborative filtering databases. The "report type" can be
        either report or revoke.

"To one of the databases"? Which one? Isn't this a bug in the manpage?

--
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.
Disabling spamcop plugin
Is there any way to disable the spamcop plugin for an individual user (i.e. from ~/.spamassassin/user_prefs) if the plugin is loaded by /etc/spamassassin/*.pre ? By comparison, I seem to be able to disable pyzor even if it is loaded, by writing use_pyzor 0 in my user_prefs. -- Please *no* private copies of mailing list or newsgroup messages. Rule 420: All persons more than eight miles high to leave the court.
[OT] still configuring [Was: Disabling spamcop plugin]
On 2016-04-12 10:57 -0400, David Niklas wrote: > You could use Gentoo, you get to configure it all yourself! Funny you'd say that, I _am_ actually switching to it - on my "workstation" role computers. I'm already over 50% over the hump, I think. But on "server type" computers, I just cannot spare a dedicated security branch. I really don't have the time, and more importantly the nerves, to scramble and recompile the world when each new vulnerability is announced. > You might also try Arch or Devuan. What distro are you using now? Debian. Have been using it over 15 years now, and watched some of the fun vanish over the last few. -- Please *no* private copies of mailing list or newsgroup messages. Rule 420: All persons more than eight miles high to leave the court.
Re: [OT] still configuring [Was: Disabling spamcop plugin]
On 2016-04-13 09:12 -0400, Michael Orlitzky wrote: > package will be recompiled automatically as part of the updates. Any > packages *depending on* that package (like, if they're statically linked > to it) will also be recompiled. But also _direct_ dependencies of the affected package, if the latest version has new requirements. And this is the heart of the problem. With a dedicated security channel like debian has, the fixes are recompiled targeted to the base release, so (for example) I'd never have to update perl because of a fix in spamassassin. In fact you can leave debian servers to update themselves unattended, most of the time. This is too huge a benefit for me to drop, even weighed against the recent debian annoyances. -- Please *no* private copies of mailing list or newsgroup messages. Rule 420: All persons more than eight miles high to leave the court.
Re: sa-update through proxy
On 2016-05-04 08:13 -0700, John Hardin wrote: > > alias sa-update='env http_proxy=http://myserver:myport/ > > https_proxy=http://myserver:myport/ sa-update' > > Lose the "env"? Why? Apart from using an extra process, this should work exactly the same. -- Please *no* private copies of mailing list or newsgroup messages. Rule 420: All persons more than eight miles high to leave the court.
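A quick way to check the claim that the env prefix and a plain shell variable assignment behave the same for the child process. The proxy URL is a placeholder, not the one from the thread, and sh -c stands in for sa-update so the sketch runs anywhere.

```shell
# proxy.example:3128 is a placeholder address.
proxy=http://proxy.example:3128/

# Variant 1: go through env(1), which costs one extra exec.
with_env=$(env http_proxy="$proxy" sh -c 'printf %s "$http_proxy"')

# Variant 2: plain assignment prefix, handled by the shell itself.
without_env=$(http_proxy="$proxy" sh -c 'printf %s "$http_proxy"')

printf 'with env:    %s\nwithout env: %s\n' "$with_env" "$without_env"
```

Both variants hand the same http_proxy value to the child; the only difference is the extra process.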
Reporting [Was: Disabling spamcop plugin]
On 2016-04-07 13:55 -0700, Ian Zimmerman wrote: > sa-learn doesn't do any reporting, right? [snip snip] > By the way, manpage for spamc says: > >-C report type, --reporttype=type >Report or revoke a message to one of the configured >collaborative filtering databases. >The "report type" can be either report or revoke. > > "To one of the databases"? Which one? Isn't this a bug in the manpage? Unfortunately the thread went sideways into opinion territory after this, but I'd still like to clarify these factual points. Anyone? -- Please *no* private copies of mailing list or newsgroup messages. Rule 420: All persons more than eight miles high to leave the court.
Re: Childish actions of Harald Reindl
On 2016-08-05 09:46 +0100, Martin wrote:

> The biggest reason is the way this mailing list is set up, when you
> click reply it replies to the poster not the list, this has always
> been a bug bare of mine and something that probably should be
> addressed.

Then don't "click reply" but use a proper mail user agent (like mutt, but there are many others) that has a separate List Reply/Followup function. What "should be addressed" is the misconfigured mailing lists that mess with sender-supplied headers.

--
Please *no* private Cc: on mailing lists and newsgroups
Why does the arrow on Hillary signs point to the right?
Re: Issue on disable ipv6
On 2016-07-01 20:25 +0200, Massimo Sandolo wrote: > Hi, > I have an issue when try to disable ipv6. > I'm running Debian 8.3 with SpamAssassin version 3.4.0 (running on Perl > version 5.20.2). > In /etc/defualt/spamassassin the options line is the following: > OPTIONS="-4 --create-prefs --max-children 5 --helper-home-dir -x -u > usermail" > > I tried also with --ipv4-only, but it doesn't work, I'm still receiving the > following error "spamc[22477]: connect to spamd on ::1 failed, retrying (#1 > of 3): Connection refused". What is the line or lines containing "localhost" in /etc/hosts? You'll need to comment out the one with the IPv6 address (::1), and leave the one with IPv4 address (127.0.0.1) uncommented. This is all assuming you run spamd and spamc on the same host. If not, please tell us about the network setup between the two hosts. -- Please *no* private copies of mailing list or newsgroup messages. Why does the arrow on Hillary signs point to the right?
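The /etc/hosts check can be sketched as follows. To stay harmless, it works on a sample copy rather than the real file; the sample contents are an assumption about what a typical Debian hosts file looks like, not the poster's actual file.

```shell
# Sample stand-in for /etc/hosts (assumed contents).
hosts_sample=$(mktemp)
cat > "$hosts_sample" <<'EOF'
127.0.0.1   localhost
::1         localhost ip6-localhost ip6-loopback
EOF

# Show every line that mentions localhost, with line numbers.
grep -n 'localhost' "$hosts_sample"

# Comment out lines whose address field is ::1; 127.0.0.1 stays as is.
commented=$(sed 's/^::1[[:space:]]/#&/' "$hosts_sample")
printf '%s\n' "$commented"
```

On the real system the grep would be run against /etc/hosts itself, and the edit made by hand as root.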
New type of monstrosity
Last couple of weeks I saw some messages whose entire content is in the Subject. They have both a text/plain and text/html part but both are empty (in the case of html, there is some markup but no character data). The Subject is maybe 400 or 500 chars long. Needless to say, this is a 100% spam trait, but some escaped.

Is there already a rule somewhere to deal with this? (Not among the ones bundled with SA, I don't think.) If I'm writing my own, is the naive way to match the Subject going to work? I'm asking mostly because the header is properly split and continued around 60 character boundaries. That is, does SA join continued lines before matching?

--
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html
Re: New type of monstrosity
On 2017-02-06 20:06, Kevin A. McGrail wrote: > > Last couple of weeks I saw some messages whose entire contents is in > > the Subject. > never seen such a monster. likely killed by some other piece in the > puzzle. Throw it up on pastebin? http://pastebin.com/PYaMcZa7 (I was wrong, the subject is actually one enormous line, it was my MUA that folded it.) -- Please *no* private Cc: on mailing lists and newsgroups Personal signed mail: please _encrypt_ and sign Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html
Re: New type of monstrosity
On 2017-02-07 09:37, Matus UHLAR - fantomas wrote:

> 11.5 - 3.5 = 8.0

And of course 1.2.3.x is not the true relay address, so

> 1.5 BOTNET Relay might be a spambot or virusbot
> [botnet0.8,ip=1.2.3.12,rdns=disorder.censored.net,maildomain=outlook.fr,baddns]

this goes out of the window as well, and you're down to 6.5.

> the op may be early recipient, which is why you've got PYZOR hit,
> while the OP had not. If the OP doesnt't use pyzor, I recomment to
> use it - using razor, pyzor and DCC is very good idea although they
> need external software.

I used to have pyzor, but I dropped it for some reason I don't remember. It may be time to have another look at it.

--
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html
Re: RFC compliance pedantry (was Re: New type of monstrosity)
On 2017-02-07 18:33, Ruga wrote:

> I follow the actual RFC standard, not the proposed revisions. The To
> From and Cc fields are defined by a grammar AND a natural language
> description. Such fields MUST hold addresses, were an address is a
> username the "@" symbol and a domain name. The string "undisclosed
> recipients: ;" does not parse the grammar, and it does not pass the
> natural language requirement for an address. If the sender hides the
> recipients, why should I care delivering its junk to my valued
> accounts?

FWIW, I regularly get completely legitimate non-commercial messages with headers of this form. People use it to conceal from each recipient the addresses of other recipients - just like a list or an alias, but (I'm guessing) done entirely in the sender's MUA.

--
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html
Re: Ignore third-party SA headers
On 2017-01-26 01:03, RW wrote:

> Probably what's happening is that these are emails over 500 kB which
> by default are just passed through by spamc without sending them to
> spamd. If they don't get sent to spamd the existing SA headers don't
> get stripped.
>
> You can to set the -s parameter on spamc to something larger that the
> largest spam you want to filter.

I have never been clear about this, in two ways. The relevant bit of man spamc says:

    -s max_size, --max-size=max_size
        Set the maximum message size which will be sent to spamd -- any
        bigger than this threshold and the message will be returned
        unprocessed (default: 500 KB). If spamc gets handed a message
        bigger than this, it won't be passed to spamd. The maximum
        message size is 256 MB.

        The size is specified in bytes, as a positive integer greater
        than 0. For example, -s 50.

My first confusion is that even if there's a knob I can turn up on spamc, there's a "maximum message size". What does that mean? Does spamd have its own limit? Is it really that high? And what happens if I break it?

Second, is the default 500 * 1000 bytes or 512 * 1024 bytes? The example seems to suggest the latter.

--
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html
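Since -s takes a plain byte count, the candidate readings of "500 KB" can at least be spelled out with shell arithmetic. This does not settle which value spamc actually uses as its default; it only shows what each reading amounts to, plus how an explicit limit could be computed. The 10 MiB figure is an arbitrary illustration, not a recommendation from the thread.

```shell
kb_decimal=$((500 * 1000))    # "500 KB" read as decimal kilobytes
kb_binary=$((500 * 1024))     # "500 KB" read as binary kilobytes
half_mib=$((512 * 1024))      # the 512 * 1024 reading from the question
limit=$((10 * 1024 * 1024))   # arbitrary example: 10 MiB in bytes

echo "500*1000 = $kb_decimal"
echo "500*1024 = $kb_binary"
echo "512*1024 = $half_mib"
echo "example -s value = $limit"

# An explicit limit would then be passed like this (not run here):
#   spamc -s "$limit" < message.eml
```

Passing an explicit byte count sidesteps the ambiguity in the documented default altogether.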
Re: Fastest listing RBL ?
On 2017-02-15 16:30, Tom Hendrikx wrote: > Note that the period that you describe as 'seen by SA a bit later' is > typically less than a second. Not in my case. I have a custom Exim configuration where I intentionally wait for a period of time (currently 4 minutes) between SMTP acceptance and delivery (SA runs at delivery time), precisely because I want to give all the collaborative mechanisms the maximum chance to kick in. When I wrote my OP, 4 minutes was shorter than my BIND max-ncache-ttl parameter. I have since set that to 180 (3 minutes), so that angle shouldn't matter any more. Still the balance between bouncing the most junk outright and the risk of false positives means it's something to think about. > Which RBLs to use, depends on the typical spam you receive, and the > policies that you wish to apply. IMHO, the trust you put in RBLs (and > their listing policies) should be more important in making decisions > than their typical response time to new (types of) spam and their > TTLs. Agreed. -- Please *no* private Cc: on mailing lists and newsgroups Personal signed mail: please _encrypt_ and sign Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html