Re: Certain types of spam seem to get through SA
On Fri, 2014-08-29 at 16:55 -0600, LuKreme wrote:
> On 28 Aug 2014, at 17:38, Martin Gregorie mar...@gregorie.org wrote:
>> http://www.libelle-systems.com/free/portmanteau/portmanteau.tgz
>> This file is a compressed source archive that includes documentation for the tool and the definition file format.
> Any reason not to include your dataset?

Yes: it's almost certainly specific to my mail stream, which seems to differ quite a lot in content from what others receive. I deduce this from the samples that other list members post from time to time, which show that they get types of spam that I never see. In addition, it's a personal stream and as such will be very different both from other personal streams and from the mix seen by list members who look after corporate mail systems.

In addition, my portmanteau rules (the ones containing the lists of alternates) are generally intended to work with metas which are part of my local.cf collection. I've never attempted to separate them from the other metas that don't depend on portmanteau rules, and I suspect that it would be quite a difficult task because many of the metas are layered.

Martin
Re: Give a penalty to messages with non latin UTF-8 characters?
On 29 Aug 2014, at 20:52, jdebert jdeb...@garlic.com wrote:
> On Fri, 29 Aug 2014 11:41:48 +0200 Michael Opdenacker michael.opdenac...@free-electrons.com wrote:
>> I find it hard to believe I'm the only one getting spam in Chinese characters ;)
> And legitimate messages as well. (Here, at least.) Blocking messages merely because they have more than just the Roman alphabet in them is a bit too much.

I would welcome rules that would reliably penalize messages that use Chinese, Japanese, Korean, Thai, or any other script in the Unicode range that I don't read. I would put them in user_prefs.

I have been getting a lot more spam in my inbox over the last few months than I have in many years (20+ a day, into each of 6 inboxes). To be fair, most of the Chinese spam comes to my gmail address.

--
Last night - you were unhinged. You were like some desperate, howling demon. You frightened me. - Do it again!
sa-learn and find
The following command seems to get stuck if the find returns no results. Any suggestions on how to avoid passing an empty find result to sa-learn?

    sa-learn --ham -u ${i} `find /home/${i}/Maildir/.notspam -type f -mtime -7`

(where user $i has no mail in notspam that is new in the last 7 days)

I am already testing for the presence of the folder. Checking whether the folder is empty isn't going to help, because the folder may have mail in it, just old mail.

The only thing I can think of is something like this:

    MYFIND=`find $H_PATH/cur -type f -mtime -7`
    if [ -n $MYFIND ]; then
        /usr/local/bin/sa-learn --ham -u ${i} $MYFIND
    fi

but I haven't gotten that to work: the test seems to pass even with a string that echo "\"$MYFIND\"" prints as "".

--
Why, you stuck-up, half-witted, scruffy-looking... NERFHERDER! Who's Scruffy looking?
Re: sa-learn and find
On 30 Aug 2014, at 07:49, LuKreme krem...@kreme.com wrote:
> MYFIND=`find $H_PATH/cur -type f -mtime -7`
> if [ -n $MYFIND ]; then
>     /usr/local/bin/sa-learn --ham -u ${i} $MYFIND
> fi

Doh!

    if [ -n "$MYFIND" ]; then

or

    if test -n "$MYFIND"; then

Sigh. Feeling extra stupid this Saturday morning. It works, and is no longer processing thousands of old messages for no reason.

    #!/bin/sh
    #
    # Straightforward shell script to be run as root. This parses the /home
    # directory for mailboxes named .Junk and learns those as spam, and then
    # parses the inbox (cur, not new) for ham.
    # sa-learn-script (sal) v2.1 Lewis Butler, released to the Public Domain 2012

    UROOT=/home/
    echo Running SAL
    for i in `ls $UROOT` ; do
        J_PATH=${UROOT}${i}/Maildir/.Junk
        H_PATH=${UROOT}${i}/Maildir
        if test -d "$J_PATH"; then
            # Only messages modified in the last 7 days, skipping dovecot's own files
            MYFIND=`find $J_PATH/ -type f -mtime -7 | grep -v dovecot`
            # The quotes matter: unquoted, the test passes on an empty result
            if test -n "$MYFIND"; then
                /usr/local/bin/sa-learn --spam -u ${i} $MYFIND # >/dev/null 2>&1
            fi
        else
            echo No $J_PATH for $i
        fi
        if test -d "$H_PATH"; then
            MYFIND=`find $H_PATH/cur -type f -mtime -7 | grep -v dovecot`
            if test -n "$MYFIND"; then
                echo Processing $H_PATH
                /usr/local/bin/sa-learn --ham -u ${i} $MYFIND # >/dev/null 2>&1
            fi
        #else
        #    echo "No $H_PATH for $i"
        fi
    done

If I were feeling really clever, I'd make sure the user existed first, but I'm not feeling that clever today.

--
A marriage is always made up of two people who are prepared to swear that only the other one snores.
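For anyone wondering why the unquoted test passed on an empty result: when $MYFIND expands to nothing, the shell ends up running test with a single argument, "-n", and a single non-empty argument is always true. A minimal sketch that shows the difference (the variable names unquoted and quoted are only for illustration):

```shell
#!/bin/sh
# Stand-in for a find that matched nothing.
MYFIND=""

# Unquoted: $MYFIND disappears, so the shell runs `[ -n ]`.
# With one argument, test only checks that the argument ("-n")
# is a non-empty string, so the test succeeds.
if [ -n $MYFIND ]; then
    unquoted=true
else
    unquoted=false
fi

# Quoted: the shell runs `[ -n "" ]`, which fails as intended.
if [ -n "$MYFIND" ]; then
    quoted=true
else
    quoted=false
fi

echo "unquoted=$unquoted quoted=$quoted"
```

The quoted form also stays a single word when find returns several file names, which the unquoted form does not.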
Re: sa-learn and find
On Sat, 30 Aug 2014 08:23:02 -0600 LuKreme wrote:
> if test -d $J_PATH; then
>     MYFIND=`find $J_PATH/ -type f -mtime -7|grep -v dovecot`

mtime may not be the best choice. Ideally what you want is the time since the spam was moved to Junk, rather than the time since it was delivered.

What I see with dovecot when I move mail with claws mail is that a new file is created with the mtime preserved at the delivery time and the current epoch time in the filename. In that case the ideal would be Btime (birth time) if your OS supports it, or failing that ctime. You could also use the time in the filename. Note that epoch times are 10 digits until long after we're dead, so simple lexicographical comparisons between maildir filenames, or between a maildir filename and an epoch time, will work.

You may want to check what happens with whatever you use to move the spam.

> if test -n $MYFIND; then
>     /usr/local/bin/sa-learn --spam -u ${i} $MYFIND # >/dev/null 2>&1

This may run into shell argument limits if you have to learn a lot of spam. Consider piping the output of find to xargs, or using -exec ... {} + in find.
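The filename-based approach mentioned above could look roughly like the sketch below. It assumes that maildir filenames start with the epoch time followed by a dot, and that date +%s is available (it is in GNU and BSD date, but it is not POSIX). The function name recent_by_name is made up for this example, and it compares the timestamps numerically rather than lexicographically, which gives the same answer for 10-digit epoch times.

```shell
#!/bin/sh
# recent_by_name DIR DAYS (hypothetical helper): print the files in DIR
# whose name starts with an epoch time from the last DAYS days.
# Assumes names of the form <epoch>.<unique>.<host>[:2,flags].
recent_by_name() {
    dir=$1
    days=$2
    now=$(date +%s)
    cutoff=$((now - days * 86400))
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}        # strip the directory part
        stamp=${name%%.*}    # keep everything before the first dot
        # Skip names that do not start with a plain number,
        # for example dovecot's index and uidlist files.
        case $stamp in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$stamp" -ge "$cutoff" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Something like recent_by_name "$J_PATH/cur" 7 would then stand in for the find ... -mtime -7 | grep -v dovecot pipeline, provided whatever moves the mail really does put the move time in the filename.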
SA works great!
after two days of running SA for the first two test domains with a well-trained bayes for the global milter user: impressive! the little crap that makes it through postscreen RBL scoring is detected

0.000          0          3          0  non-token data: bayes db version
0.000          0       1389          0  non-token data: nspam
0.000          0       1350          0  non-token data: nham
0.000          0     257152          0  non-token data: ntokens

Aug 30 23:34:19 localhost spamd[4882]: spamd: identified spam (8.9/4.5) for sa-milt:189 in 0.6 seconds, 2454 bytes.
Aug 30 23:34:19 localhost spamd[4882]: spamd: result: Y 8 - BAYES_80,CUST_DNSBL_15,CUST_DNSWL_2,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FREEMAIL_REPLYTO,FREEMAIL_REPLYTO_END_DIGIT,HTML_MESSAGE,MALFORMED_FREEMAIL,MISSING_HEADERS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,REPLYTO_WITHOUT_TO_CC,RP_MATCHES_RCVD,SPF_PASS scantime=0.6,size=2454,user=sa-milt,uid=189,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=51671,mid=snt152-w505982b05a6fbba5c49ad2b1...@phx.gbl,bayes=0.842503,autolearn=disabled
Aug 30 23:34:19 localhost postfix/cleanup[6195]: 3hlrXp5S3dz1w: milter-reject: END-OF-MESSAGE from snt004-omc1s37.hotmail.com[65.55.90.48]: 5.7.1 Blocked by SpamAssassin; from=jenniferje...@hotmail.com to=***
Re: SA works great!
Yes, it does work great when you have the Bayes filter turned on and you take the time to feed it. And that means you have to feed the learner both ham and spam and set up reliable sources for those.

Unfortunately, if Bayes is not turned on, SA does not catch more than around 60-70% of spam. As a SpamAssassin user and server admin, I would really like to see that improve.

Ted

On 8/30/2014 2:41 PM, Reindl Harald wrote:
> after two days running SA for the first two test-domains with a well trained bayes for the global milter-user: impressive!
> the few crap making it through poscreen RBL scroing is detected
Re: sa-learn and find
On 30 Aug 2014, at 15:32, RW rwmailli...@googlemail.com wrote:
> On Sat, 30 Aug 2014 08:23:02 -0600 LuKreme wrote:
>> if test -d $J_PATH; then
>>     MYFIND=`find $J_PATH/ -type f -mtime -7|grep -v dovecot`
> mtime may not be the best choice. Ideally what you want is the time since the spam was moved to Junk, rather than the time since it was delivered.
> What I see with dovecot when I move mail with claws mail is that a new file is created with the mtime preserved at the delivery time and the current epoch time in the filename. In that case the ideal would be Btime if your OS supports it, or failing that ctime. You could also use the time in the filename. Note that epoch times are 10 digits until long after we're dead so simple lexicographical comparisons between maildir filenames or between a maildir filename and an epoch time will work.

On my system the file is not renamed when it is moved.

> You may want to check what happens with whatever you use to move the spam.

Spam is delivered to the junk box at delivery time, or is manually moved via IMAP by the user. Is there a way to actually show the mtime and ctime of a file?

>> if test -n $MYFIND; then
>>     /usr/local/bin/sa-learn --spam -u ${i} $MYFIND # >/dev/null 2>&1
> This may run into shell argument limits if you have to learn a lot of spam. Consider piping the output of find to xargs, or using -exec ... {} + in find.

Yes, I tried to do that, but as I said in my first post, if I do the find as part of the sa-learn command, then it stalls when the find command returns nothing.

--
The fact that Bob and John are married does nothing to diminish anyone else's marriage any more than a black woman marrying a white man, a Jew marrying a Catholic, or an ugly Lyle marrying a Pretty Woman
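On the stall: the original command used backticks, so an empty find result still started sa-learn, just with no file arguments, at which point it presumably sits waiting for a message on standard input. With -exec ... {} + find builds the argument list itself and does not start the command at all when nothing matches, and it also splits long lists to stay under the argument limit. A rough sketch follows, in which the names learn_recent and show_times and the user alice are invented for illustration. It uses -ctime as suggested above, and ! -name 'dovecot*' only checks the file's own name, whereas grep -v dovecot checks the whole path.

```shell
#!/bin/sh
# learn_recent DIR DAYS CMD [ARGS...] (hypothetical helper): run CMD ARGS
# on the files in DIR whose status changed within the last DAYS days.
# find appends the file names in place of {} and runs nothing at all
# when no file matches.
learn_recent() {
    dir=$1
    days=$2
    shift 2
    find "$dir" -type f -ctime "-$days" ! -name 'dovecot*' \
        -exec "$@" {} +
}

# show_times FILE (hypothetical helper): list FILE twice, first with its
# modification time (mtime), then with its status change time (ctime).
show_times() {
    ls -l  "$1"
    ls -lc "$1"
}

# Hypothetical call for one user's Junk folder:
# learn_recent /home/alice/Maildir/.Junk 7 /usr/local/bin/sa-learn --spam -u alice
```

Running learn_recent with echo as the command is a cheap way to check which files would be passed before pointing it at sa-learn.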