Re: Certain types of spam seem to get through SA

2014-08-30 Thread Martin Gregorie
On Fri, 2014-08-29 at 16:55 -0600, LuKreme wrote:
 On 28 Aug 2014, at 17:38 , Martin Gregorie mar...@gregorie.org wrote:
  http://www.libelle-systems.com/free/portmanteau/portmanteau.tgz
  
  This file is a compressed source archive that includes documentation for
  the tool and the definition file format.
 
 Any reason not to include your dataset?
 
Yes: it's almost certainly specific to my mail stream, which seems to
differ quite a lot in content from what others receive. I deduce this from
the samples that other list members post from time to time, which show
that they get types of spam that I never see. In addition, it's a personal
stream and as such will be very different both from other personal
streams and from the mix seen by list members who are looking
after corporate mail systems.

In addition, my portmanteau rules (the ones containing the lists of
alternates) are generally intended to work with metas which are part of
my local.cf collection. I've never attempted to separate them out from
other metas which don't depend on portmanteau rules and suspect that it
would be quite a difficult task as many of the metas are layered.


Martin





Re: Give a penalty to messages with non latin UTF-8 characters?

2014-08-30 Thread LuKreme
On 29 Aug 2014, at 20:52 , jdebert jdeb...@garlic.com wrote:
 On Fri, 29 Aug 2014 11:41:48 +0200 Michael Opdenacker 
 michael.opdenac...@free-electrons.com wrote:
 I find it hard to believe I'm the only one getting spam in Chinese
 characters ;)
 
 And legitimate messages as well. (Here, at least.) Blocking messages
 merely because they have more than just the Roman alphabet in them is a
 bit too much.

I would welcome rules that would reliably penalize messages that use Chinese, 
Japanese, Korean, Thai, or any other UTF-8 encoded script that 
I don’t read. I would put them in user_prefs.

I have been getting a lot more spam in my inbox over the last few months than 
I have in many years (20+ a day, in each of 6 inboxes). To be fair, most of 
the Chinese comes to my Gmail address. 

-- 
Last night - you were unhinged. You were like some desperate, howling
demon. You frightened me. - Do it again!



sa-learn and find

2014-08-30 Thread LuKreme
The following command seems to get stuck if the find returns no results. 
Any suggestions on how to avoid passing an empty find result to sa-learn?

sa-learn --ham -u ${i} `find /home/${i}/Maildir/.notspam -type f -mtime -7` 

(where user $i has no emails in notspam that are new in the last 7 days)

I am already testing for the presence of the folder. Checking if the folder is 
empty isn’t going to help because the folder may have mail in it, just old mail.

The only thing I can think of to do is something like this:

MYFIND= `find $H_PATH/cur -type f -mtime -7` 
if [ -n $MYFIND ]; then
   /usr/local/bin/sa-learn --ham -u ${i} $MYFIND
fi

but I haven’t gotten that to work, as the test seems to pass even with a string 
that, on echo "\"$MYFIND\"", returns "".

-- 
Why, you stuck-up, half-witted, scruffy-looking... NERFHERDER!
Who's Scruffy looking?



Re: sa-learn and find

2014-08-30 Thread LuKreme
On 30 Aug 2014, at 07:49 , LuKreme krem...@kreme.com wrote:
 MYFIND= `find $H_PATH/cur -type f -mtime -7` 
 if [ -n $MYFIND ]; then
   /usr/local/bin/sa-learn --ham -u ${i} $MYFIND
 fi

Doh!

if [ -n "$MYFIND" ]; then

or

if test -n "$MYFIND"; then

Sigh. Feeling extra stupid this Saturday morning.

It works, and is no longer processing thousands of old messages for no reason.
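
For anyone hitting the same thing, here is a minimal sketch of why the unquoted test passed. When the variable is empty and unquoted it disappears during expansion, so the shell sees a one-argument test whose only argument is the literal string -n, and a one-argument test is true whenever that argument is non-empty. The two function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper names, for illustration only.

# Unquoted: an empty value expands to nothing, so this becomes `[ -n ]`,
# which only checks that the literal string "-n" is non-empty (always true).
unquoted_nonempty() {
  MYFIND=$1
  [ -n $MYFIND ]
}

# Quoted: an empty value is still passed as one (empty) argument, so -n fails.
quoted_nonempty() {
  MYFIND=$1
  [ -n "$MYFIND" ]
}

if unquoted_nonempty ""; then echo "unquoted: empty passes"; fi
if ! quoted_nonempty ""; then echo "quoted: empty rejected"; fi
```

Note that the unquoted form also breaks the other way round: with more than one filename in the variable, the test gets too many arguments and reports an error.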

#!/bin/sh
#
# Straightforward shell script to be run as root.  This parses the /home
# directory for mailboxes named .Junk and learns those as spam, and then
# parses the inbox (cur, not new) for ham.

# sa-learn-script (sal) v2.1  Lewis Butler, released to the Public Domain 2012

UROOT=/home/
echo "Running SAL"
for i in `ls $UROOT` ; do
  J_PATH="${UROOT}${i}/Maildir/.Junk"
  H_PATH="${UROOT}${i}/Maildir"

  if test -d "$J_PATH"; then
    # Files changed in the last 7 days, minus dovecot's own index files
    MYFIND=`find "$J_PATH/" -type f -mtime -7 | grep -v dovecot`
    # Quoted, so that an empty result really fails the -n test
    if test -n "$MYFIND"; then
      # $MYFIND is left unquoted on purpose: one argument per file
      /usr/local/bin/sa-learn --spam -u ${i} $MYFIND # >/dev/null 2>&1
    fi
  else
    echo "No $J_PATH for $i"
  fi

  if test -d "$H_PATH"; then
    MYFIND=`find "$H_PATH/cur" -type f -mtime -7 | grep -v dovecot`
    if test -n "$MYFIND"; then
      echo "Processing $H_PATH"
      /usr/local/bin/sa-learn --ham -u ${i} $MYFIND # >/dev/null 2>&1
    fi
  #else
  #  echo "No $H_PATH for $i"
  fi
done

If I were feeling really clever, I’d make sure the user existed first, but I’m 
not feeling that clever today.

-- 
A marriage is always made up of two people who are prepared to swear
that only the other one snores.



Re: sa-learn and find

2014-08-30 Thread RW
On Sat, 30 Aug 2014 08:23:02 -0600
LuKreme wrote:

   if test -d $J_PATH; then
 MYFIND=`find $J_PATH/ -type f -mtime -7|grep -v dovecot`

mtime may not be the best choice. Ideally what you want is the time
since the spam was moved to Junk, rather than the time since it was
delivered. What I see with dovecot when I move mail with Claws Mail is
that a new file is created with the mtime preserved at the
delivery time and the current epoch time in the filename. In that case
the ideal would be Btime if your OS supports it, or failing that
ctime. 

You could also use the time in the filename. Note that epoch times are
10 digits until long after we're dead so simple lexicographical
comparisons between maildir filenames or between a maildir filename and
an epoch time will work.
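
A rough sketch of the filename approach, assuming the maildir filenames start with the epoch time followed by a dot (check what your own delivery agent and mail client produce). The function name newer_than is made up for illustration, and it uses a numeric comparison rather than the lexicographical one mentioned above; for 10-digit times the result is the same.

```shell
#!/bin/sh
# newer_than CUTOFF FILE...  (hypothetical helper)
# Print each FILE whose name starts with an epoch time later than CUTOFF.
newer_than() {
  cutoff=$1; shift
  for f in "$@"; do
    base=${f##*/}        # strip the directory part
    stamp=${base%%.*}    # everything before the first dot
    case $stamp in
      ''|*[!0-9]*) continue ;;   # not a plain number (e.g. dovecot.index): skip
    esac
    if [ "$stamp" -gt "$cutoff" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Example use (assumes a date command that supports +%s, which is not POSIX):
#   cutoff=$(( $(date +%s) - 7*24*60*60 ))
#   newer_than "$cutoff" /home/someuser/Maildir/.Junk/cur/*
```

A side effect is that dovecot's own files drop out without a separate grep, since their names do not start with a number.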

You may want to check what happens with whatever you use to move the
spam.  


 if test -n "$MYFIND"; then
   /usr/local/bin/sa-learn --spam -u ${i} $MYFIND # >/dev/null 2>&1

This may run into shell argument limits if you have to learn a lot of
spam. Consider piping the output of find to xargs, or using 
-exec ...{} + in find.
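
A minimal sketch of the -exec variant. The function name learn_recent is made up for illustration, and the -name filter is a stand-in for the grep -v dovecot in the script (it matches on the filename only, not the whole path). As far as I know, find does not invoke the command at all when nothing matches, and it splits long file lists into several invocations, so this should avoid both the argument limit and the empty-result case.

```shell
#!/bin/sh
# learn_recent DIR CMD [ARGS...]  (hypothetical helper)
# Run CMD ARGS... on the files in DIR modified in the last 7 days,
# passing as many files per invocation as the system allows.
learn_recent() {
  dir=$1; shift
  find "$dir" -type f -mtime -7 ! -name 'dovecot*' -exec "$@" {} +
}

# Example use with the paths from the script:
#   learn_recent "/home/$i/Maildir/.Junk" /usr/local/bin/sa-learn --spam -u "$i"
```

With xargs the empty case needs more care: GNU xargs runs the command once even on empty input unless given -r, so check your platform's man page before relying on it.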





SA works great!

2014-08-30 Thread Reindl Harald
after two days running SA for the first two test-domains with a
well trained bayes for the global milter-user: impressive!

the little crap that makes it through postscreen RBL scoring is detected

0.000  0  3  0  non-token data: bayes db version
0.000  0   1389  0  non-token data: nspam
0.000  0   1350  0  non-token data: nham
0.000  0 257152  0  non-token data: ntokens

Aug 30 23:34:19 localhost spamd[4882]: spamd: identified spam (8.9/4.5) for 
sa-milt:189 in 0.6 seconds, 2454 bytes.
Aug 30 23:34:19 localhost spamd[4882]: spamd: result: Y 8 -
BAYES_80,CUST_DNSBL_15,CUST_DNSWL_2,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FREEMAIL_REPLYTO,FREEMAIL_REPLYTO_END_DIGIT,HTML_MESSAGE,MALFORMED_FREEMAIL,MISSING_HEADERS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,REPLYTO_WITHOUT_TO_CC,RP_MATCHES_RCVD,SPF_PASS
scantime=0.6,size=2454,user=sa-milt,uid=189,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=51671,mid=snt152-w505982b05a6fbba5c49ad2b1...@phx.gbl,bayes=0.842503,autolearn=disabled
Aug 30 23:34:19 localhost postfix/cleanup[6195]: 3hlrXp5S3dz1w: milter-reject: 
END-OF-MESSAGE from
snt004-omc1s37.hotmail.com[65.55.90.48]: 5.7.1 Blocked by SpamAssassin; 
from=jenniferje...@hotmail.com to=***







Re: SA works great!

2014-08-30 Thread Ted Mittelstaedt


Yes, it does work great when you have the Bayes filter turned on and you
take the time to feed it.  And that means you have to feed the
learner both ham and spam and set up reliable sources for those.

Unfortunately, if Bayes is not turned on, it does not catch more than
around 60-70% of spam.  As a SpamAssassin user and server admin, I would
really like to see that improve.

Ted

On 8/30/2014 2:41 PM, Reindl Harald wrote:

after two days running SA for the first two test-domains with a
well trained bayes for the global milter-user: impressive!

the little crap that makes it through postscreen RBL scoring is detected

0.000  0  3  0  non-token data: bayes db version
0.000  0   1389  0  non-token data: nspam
0.000  0   1350  0  non-token data: nham
0.000  0 257152  0  non-token data: ntokens

Aug 30 23:34:19 localhost spamd[4882]: spamd: identified spam (8.9/4.5) for 
sa-milt:189 in 0.6 seconds, 2454 bytes.
Aug 30 23:34:19 localhost spamd[4882]: spamd: result: Y 8 -
BAYES_80,CUST_DNSBL_15,CUST_DNSWL_2,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FREEMAIL_REPLYTO,FREEMAIL_REPLYTO_END_DIGIT,HTML_MESSAGE,MALFORMED_FREEMAIL,MISSING_HEADERS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,REPLYTO_WITHOUT_TO_CC,RP_MATCHES_RCVD,SPF_PASS
scantime=0.6,size=2454,user=sa-milt,uid=189,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=51671,mid=snt152-w505982b05a6fbba5c49ad2b1...@phx.gbl,bayes=0.842503,autolearn=disabled
Aug 30 23:34:19 localhost postfix/cleanup[6195]: 3hlrXp5S3dz1w: milter-reject: 
END-OF-MESSAGE from
snt004-omc1s37.hotmail.com[65.55.90.48]: 5.7.1 Blocked by SpamAssassin; 
from=jenniferje...@hotmail.com  to=***





Re: sa-learn and find

2014-08-30 Thread LuKreme

 On 30 Aug 2014, at 15:32 , RW rwmailli...@googlemail.com wrote:
 
 On Sat, 30 Aug 2014 08:23:02 -0600
 LuKreme wrote:
 
  if test -d $J_PATH; then
MYFIND=`find $J_PATH/ -type f -mtime -7|grep -v dovecot`
 
 mtime may not be the best choice. Ideally what you want is the time
 since the spam was moved to Junk, rather than the time since it was
 delivered. What I see with dovecot when I move mail with Claws Mail is
 that a new file is created with the mtime preserved at the
 delivery time and the current epoch time in the filename. In that case
 the ideal would be Btime if your OS supports it, or failing that
 ctime. 
 
 You could also use the time in the filename. Note that epoch times are
 10 digits until long after we're dead so simple lexicographical
 comparisons between maildir filenames or between a maildir filename and
 an epoch time will work.

On my system the file is not renamed when it is moved.

 You may want to check what happens with whatever you use to move the
 spam.

Spam is delivered to the junk box at delivery time, or is manually moved via 
IMAP by the user.

Is there a way to actually show the mtime and ctime of a file?
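
For reference, a small sketch of one way to see both timestamps. ls -l shows the mtime and ls -lc shows the ctime; stat gives both at once, but its options differ between GNU and BSD systems, so the helper below (show_times, a made-up name) tries the GNU form first and falls back to the BSD form.

```shell
#!/bin/sh
# show_times FILE  (hypothetical helper)
# Print the modification time (mtime) and status-change time (ctime) of FILE.
show_times() {
  if stat -c '%n' "$1" >/dev/null 2>&1; then
    # GNU coreutils stat: %y = last modification, %z = last status change
    stat -c 'mtime=%y ctime=%z %n' "$1"
  else
    # BSD stat: %Sm and %Sc are mtime and ctime formatted as strings
    stat -f 'mtime=%Sm ctime=%Sc %N' "$1"
  fi
}

# Example use:
#   show_times /home/someuser/Maildir/.Junk/cur/somefile
```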

  if test -n "$MYFIND"; then
    /usr/local/bin/sa-learn --spam -u ${i} $MYFIND # >/dev/null 2>&1
 
 This may run into shell argument limits if you have to learn a lot of
 spam. Consider piping the output of find to xargs, or using 
 -exec ...{} + in find.

Yes, I tried to do that, but as I said in my first post, if I do the find as 
part of the sa-learn command, then it stalls when the find command returns nothing.


-- 
The fact that Bob and John are married does nothing to diminish anyone
else's marriage any more than a black woman marrying a white man, a Jew
marrying a Catholic, or an ugly Lyle marrying a Pretty Woman