BAYES_999 strange behavior

2014-02-17 Thread Ian Zimmerman
Hello.  This is the first time SA is giving me enough trouble that I
need to ask for help.  I hope I get this right.

I observed a marked increase in false negatives in the last few weeks.
Only today I had enough sense to look at the detailed scores.  And, all
the escaped spams have hit the BAYES_999 rule.  I grepped the site
configuration directory:

 [3+0]~$ fgrep -h BAYES_999 /var/lib/spamassassin/3.003002/updates_spamassassin_org/*.cf
##{ BAYES_999 ifplugin Mail::SpamAssassin::Plugin::Bayes
body BAYES_999  eval:check_bayes('0.999', '1.00')
tflags BAYES_999  learn,publish
describe BAYES_999  Bayes spam probability is 99.9 to 100%
#  score BAYES_999  0  0  4.84.5
##} BAYES_999 ifplugin Mail::SpamAssassin::Plugin::Bayes

so it seems this is the highest spamminess rule, and the score in the
config file reflects that.  But the message header is:

X-Spam-Tests: BAYES_999=1,DOS_OE_TO_MX=2.523,HTML_MESSAGE=0.001,

The score for BAYES_999 is 1 in all cases :( Where does the 1 come
from???  Certainly not from my user_prefs, I go to great lengths not to
change any scores.  And the factory configuration doesn't even seem to
have this rule:

 [4+0]~$ fgrep -h BAYES_999 /usr/share/spamassassin/*.cf
 [5+0]~$

I am baffled.  Is this a bug?
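
One possible reading of the grep output above, offered as an assumption
rather than a confirmed diagnosis: the only score line for BAYES_999 is
commented out, and SpamAssassin assigns a default score of 1.0 to any rule
that ends up without an explicit score.  The Python sketch below scans rule
text for rules that are defined but never scored.  The helper name
unscored_rules and the keyword list are made up for illustration; they are
not part of SpamAssassin.

```python
# Rule-defining keywords in .cf files (a partial list, enough for this sketch).
RULE_KEYWORDS = {"header", "body", "rawbody", "uri", "full", "meta"}


def unscored_rules(cf_text):
    """Return names of rules that are defined but have no active score line."""
    defined, scored = set(), set()
    for line in cf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # commented-out lines, such as '#  score ...', do not count
        parts = line.split()
        if len(parts) >= 2 and parts[0] in RULE_KEYWORDS:
            defined.add(parts[1])
        elif len(parts) >= 3 and parts[0] == "score":
            scored.add(parts[1])
    # Sub-rules whose names start with '__' never carry a score of their own.
    return sorted(n for n in defined - scored if not n.startswith("__"))


sample = """\
body BAYES_999  eval:check_bayes('0.999', '1.00')
tflags BAYES_999  learn,publish
describe BAYES_999  Bayes spam probability is 99.9 to 100%
#  score BAYES_999  0  0  4.84.5
"""
print(unscored_rules(sample))  # ['BAYES_999']
```

If this reading is right, the 1 in the X-Spam-Tests header is simply the
fallback score, not something set in user_prefs.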

My configuration:

version 3.3.2
daily sa-update run stores updates in /var/lib/spamassassin/
spamd + spamc --headers


-- 
Please *no* private copies of mailing list or newsgroup messages.

gpg public key: 2048R/984A8AE4
fingerprint: 7953 ADA1 0E8E AB57 FB79  FFD2 360A 88B2 984A 8AE4
Funny pic: http://bit.ly/ZNE2MX




Re: BAYES_999 strange behavior

2014-02-17 Thread Ian Zimmerman
On Mon, 17 Feb 2014 16:05:23 -0500
Kevin A. McGrail kmcgr...@pccc.com wrote:

Kevin BAYES_999 is just a finer gradient on BAYES_99 allowing for a
Kevin higher score on the top .001% of Bayes hits.

Thanks for your reply.  Could you explain in a bit more detail what
gradient on top (of another rule) means?  It doesn't mean the score
is meant to be additive with the base rule, does it?  'Cause these spams
_do not_ trigger any of the bayes rules _except_ for BAYES_999.  That's
why they score too low to be caught.



Re: sa-learn from a cronjob?

2014-04-23 Thread Ian Zimmerman
On Sun, 20 Apr 2014 12:14:37 -0700 (PDT)
Dan Mahoney, System Admin d...@prime.gushi.org wrote:

> Most of my users aren't command-line friendly.  I'd like to basically
> have my IMAP server default to handing out two imap mailboxes that
> get auto-crontabbed to training bayes.

Here is my cronjob for that purpose, in its entirety.  Note that each of
~/spam-corpora{ham,spam} is a Maildir.  There is a small race condition
between the sa-learn run and the move to cur, which wasn't worth fixing
in my case; if you use this and fix it let me know :)
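
A hedged sketch of one way to avoid the race mentioned above, written in
Python rather than taken from the attached script: rename each message from
new/ to cur/ first, then hand exactly the renamed files to sa-learn.  A
message that arrives while sa-learn is running stays in new/ and is picked
up by the next run.  The function name learn_maildir and the injectable run
parameter are assumptions made for illustration.

```python
import os
import subprocess


def learn_maildir(maildir, kind, run=subprocess.run):
    """Move everything in maildir/new to maildir/cur, then learn those files.

    kind is 'spam' or 'ham'; it becomes the sa-learn --spam / --ham switch.
    """
    new_dir = os.path.join(maildir, "new")
    cur_dir = os.path.join(maildir, "cur")
    moved = []
    for name in sorted(os.listdir(new_dir)):
        # Maildir convention: names in cur/ carry an info suffix; ':2,S' = seen.
        target = os.path.join(cur_dir, name + ":2,S")
        os.rename(os.path.join(new_dir, name), target)
        moved.append(target)
    if moved:
        # Only the files renamed above are learned, never a fresh directory scan.
        run(["sa-learn", "--" + kind] + moved, check=True)
    return moved
```

The trade-off in this ordering is that a failed sa-learn run leaves the
messages in cur/ unlearned, so the caller should log the returned paths or
retry them.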



sa-learn-sync
Description: Binary data


Re: sa-learn from a cronjob?

2014-04-24 Thread Ian Zimmerman
On Thu, 24 Apr 2014 15:07:32 +0100
RW rwmailli...@googlemail.com wrote:

RW I don't think it will work for the purpose mentioned, and if it's
RW working properly for you, there's a lot you're not mentioning.

RW It's only looking for mail in the immediate post-delivery state
RW after it's been put into the mailbox by an MTA or MDA and before
RW it's been detected as new mail by an MUA (directly or via IMAP). It
RW won't learn mail put into the folders by an MUA or IMAP at all.

RW You need to use separate destination mailboxes.

These are _not_ general purpose Maildirs.  The normal mail processing
pipe (MTA -> LDA -> IMAP -> MUA) knows nothing about them.  To mark
something as spam/ham, a user (me) executes a custom macro in the MUA
which pipes the message through the safecat command to deliver it
explicitly to one of these directories.  Basically, Maildir is just a
convenient container format here.  It could be a database or whatever.

Does that answer your objections?



Re: Bayes refinement

2014-05-16 Thread Ian Zimmerman
On Fri, 16 May 2014 07:22:56 -0400
David F. Skoll d...@roaringpenguin.com wrote:

James Is there any way to limit Bayes content checking to only the
James first X characters of the message body?  I ask this because it is
James clear that the spam messages getting through contain text meant
James to poison the tests but this gibberish always trails the main
James message and is separated by a large white space in most cases.

David In my experience, trying to be too clever with Bayes is
David counter-productive.  Those Bayes-poisoning attacks rarely work on
David a well-trained corpus.  You probably just need more training for
David Bayes to figure out what's happening.

In the last few (~10) days, I have seen a marked increase in FNs,
usually with Bayes values in the 50s and 60s.  By marked, I mean I do
pretty much nothing but adjust my various ad-hoc rules to keep from
being flooded ;-\

On close inspection, I see that the hash-busting garbage appended is
(faux) technical computing talk instead of the usual cookbooks or
classical literature :-p  That is, scrambled Stack Overflow discussions
and the like.  And of course that is what most of my ham is about, so
it makes very good sense that Bayes gets confused.

I include a magic dump just in case something is wrong with my
training.  But if not, isn't this a situation where something like
James' suggestion would help?

 [4+0]~$ sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0   5593  0  non-token data: nspam
0.000  0   6190  0  non-token data: nham
0.000  0 148413  0  non-token data: ntokens
0.000  0 1384366530  0  non-token data: oldest atime
0.000  0 1400253567  0  non-token data: newest atime
0.000  0 1400253356  0  non-token data: last journal sync atime
0.000  0 1395423790  0  non-token data: last expiry atime
0.000  0   11059200  0  non-token data: last expire atime delta
0.000  0  25914  0  non-token data: last expire reduction count
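
To make James' suggestion concrete, here is a minimal Python sketch of the
idea: keep only the text that precedes the first large run of blank lines.
This is a pre-processing illustration under stated assumptions, not an
existing SpamAssassin option; the name cut_at_blank_gap and the default of
10 blank lines are invented for the example.

```python
import re


def cut_at_blank_gap(body, min_blank_lines=10):
    """Return the part of body before the first run of min_blank_lines empty lines."""
    # N empty lines show up as N + 1 consecutive line endings (LF or CRLF):
    # one that ends the last text line, plus one per empty line.
    gap = re.compile(r"(?:\r?\n){%d,}" % (min_blank_lines + 1))
    match = gap.search(body)
    return body if match is None else body[:match.start()]


message = "Buy now\n" + "\n" * 10 + "scrambled technical chatter\n"
print(repr(cut_at_blank_gap(message)))  # 'Buy now'
```

Whether feeding only the leading part to Bayes actually helps is exactly the
open question in this thread; the sketch only shows that the cut itself is
simple.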



Re: SPAM from a registrar

2014-05-16 Thread Ian Zimmerman
On Thu, 15 May 2014 09:45:21 -0800
Kevin Miller kevin_mil...@ci.juneau.ak.us wrote:

> Have you looked into "Day old bread"?
> http://wiki.apache.org/spamassassin/Rules/URIBL_RHS_DOB

Just for the fun of it, I did a manual whois on the domain of one random
spam I got today which was not killed by SA.

Sure enough, the domain was a day old.

Running SA --debug on the spam I can see that URIBL_RHS_DOB lookup is
attempted but comes back with NXDOMAIN.  So I have to question how
effective that rule really is ... I don't know how often the
underlying RBL [1] refreshes - could that be the reason?

[1]
http://www.support-intelligence.com/dob/
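
The lookup can be reproduced outside SA.  A minimal Python sketch, assuming
the list is queried by prepending the domain to a zone name and that any
answer means "listed".  The zone in the usage line is a placeholder, not the
real DOB zone, and rhs_listed is a hypothetical helper.

```python
import socket


def rhs_listed(domain, zone, resolve=socket.gethostbyname):
    """Return True if domain has an address record under the given list zone."""
    query = "%s.%s" % (domain.strip(".").lower(), zone.strip("."))
    try:
        resolve(query)
        return True
    except socket.gaierror:
        # NXDOMAIN and other resolver failures both land here, so from the
        # client side "not listed" and "query refused" look the same.
        return False
```

Usage would be something like rhs_listed("fresh-domain.example",
"dob.zone.example").  Running it against the same resolver that SA uses
shows whether the NXDOMAIN comes from the list or from the DNS path.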



Re: SPAM from a registrar

2014-05-16 Thread Ian Zimmerman
On Sat, 17 May 2014 01:34:58 +0200
Karsten Bräckelmann guent...@rudersport.de wrote:

> I don't know whether DOB limits DNS queries of a single host.
>
> However, if you *never* get that rule firing, the NXDOMAIN result may
> indicate exceeding a query limit. Do you use a local caching DNS
> resolver, or does SA use your upstream ISP's one, along with a million
> other SA instances?

Excellent point.  I _used to_ run a local DNS cache, but got rid of it a
few months ago, in the name of simplicity.  Was that a good or bad thing
to do in the current context?



Re: Bayes refinement

2014-05-16 Thread Ian Zimmerman
On Fri, 16 May 2014 16:20:21 -0400
Bowie Bailey bowie_bai...@buc.com wrote:

> Keep in mind that BAYES_50 and BAYES_60 still contribute positive
> scores by default.  Though it is technically a neutral result, it
> still adds a point or two to the score.
>
> Rather than messing with Bayes, I would focus on the spams you are
> seeing and try to find a common thread that you can use to make a
> custom rule or two to catch them.  If they all have similar garbage
> appended to them, there are probably other similarities you could
> find.

I have already made many such custom rules.  As I wrote, that's mostly
what I was working on this week :-(

For instance, I noticed many of them (but not all) put my address in the
Message-ID.  Some (but not all) use broken HTML template kits that
leave nice fingerprint marks in the body.  And so on.  But usually only
1 of them fires, at most - that is a 1.0 score, BAYES_50 is also around
1.0 I think, and that's about it - no RBL hits, no Razor or Pyzor hits.
And to add insult to injury they almost always hit RP_MATCHES_RCVD, for
a (locally modified) -0.15 boost.

So, these rules are helping, but not enough.  I am still getting about 1
unkilled spam an hour, which is too much for me.

Today I have enabled full auto-learning (prior to this, I had
bayes_auto_learn_on_error = 1).  Hopefully that will give Bayes much
more learning material.



Re: SPAM from a registrar

2014-05-19 Thread Ian Zimmerman
On Mon, 19 May 2014 10:46:25 -0800
Kevin Miller kevin_mil...@ci.juneau.ak.us wrote:

Ian Excellent point.  I _used to_ run a local DNS cache, but got rid of
Ian it a few months ago, in the name of simplicity.  Was that a good or
Ian bad thing to do in the current context?

Kevin That's a bad thing to do.  A caching name server is pretty easy
Kevin to implement (all the distros that I've played with do it
Kevin automatically just installing bind).  Many (most?/all?) RBLs
Kevin require a subscription (read money) if you exceed a certain
Kevin number of queries.  A public dns server can hammer them quite
Kevin quickly, and thus get filtered out.  A local caching server is
Kevin definitely recommended.  I've never read any posts suggesting
Kevin reasons not to use one...

Ok, I installed a local bind instance on Saturday.  But it is not
helping: out of about 100 spams I got today (counting both those that
got flagged and those that didn't, but not counting the horrible spams
with score > 15 that go directly to /dev/null), _none_ scored on
URIBL_RHS_DOB.  And I know for a fact that most of them contain fresh
domains :-(  Btw, all those domains are registered with enom.  Wth?



Re: Bayes refinement

2014-05-21 Thread Ian Zimmerman
On Thu, 15 May 2014 12:18:25 -0800
Kevin Miller kevin_mil...@ci.juneau.ak.us wrote:

> I implemented a rule that looks for multiple breaks for just that
> reason.  Can't remember where I stole it from - probably some folks
> here helped me with it a few years ago.  Can't remember who, but
> appreciated the assistance.

I am trying to do a variant of this for text/plain, as that is the type
I mostly face now.  But I cannot get it to work.

header __LOCAL_PLAIN_ASCII Content-Type =~ /text\/plain; *charset=us-ascii/i

rawbody __LOCAL_MUCHO_BLANKS /\n{10,}/m

meta LOCAL_PLAIN_ASCII_MUCHO_BLANKS (__LOCAL_PLAIN_ASCII && __LOCAL_MUCHO_BLANKS)

Feeding message into --debug shows __LOCAL_MUCHO_BLANKS never matches.
What am I doing wrong?



Re: Bayes refinement

2014-05-21 Thread Ian Zimmerman
On Wed, 21 May 2014 19:08:51 +0100
Martin Gregorie mar...@gregorie.org wrote:

> rawbody __LOCAL_MUCHO_BLANKS /\n{10,}/m

Martin Looking for newlines rather than whitespace? Does /\s{10,}/m
Martin work any better?

Nope, it doesn't :-(  Anyway, looking for newlines was my intention,
sorry for the misleading nomenclature.  But I guess that is irrelevant
as neither variant works.



Matching multiple newlines [Was: Bayes refinement]

2014-05-21 Thread Ian Zimmerman
On Wed, 21 May 2014 11:50:15 -0700 (PDT)
John Hardin jhar...@impsec.org wrote:

rawbody  __LOCAL_MUCHO_BLANKS /\n\n\n\n\n\n\n\n\n\n/m

Hmmm, no, your version doesn't work, either.  Would this be of any import?

 [24+0]~$ perl --version

This is perl 5, version 14, subversion 2 (v5.14.2) built for i486-linux-gnu-thread-multi-64int
(with 88 registered patches, see perl -V for more detail)




Re: Bayes refinement

2014-05-21 Thread Ian Zimmerman
On Wed, 21 May 2014 22:26:41 +0200
Karsten Bräckelmann guent...@rudersport.de wrote:

Karsten Seriously, the above rule, the shorter /\n{10}/, as well as the
Karsten variant posted by John without quantifier do exactly what you
Karsten asked for. They match 10 consecutive \n newline chars in the
Karsten rawbody.

Ok, thanks for the improvements.

Karsten The test message does not have that string. Maybe it uses DOS
Karsten flavor \r\n. Or what appears to be a bunch of linebreaks
Karsten actually has spaces mixed in.

Well, no.  I looked at the message (the same data I fed to s.a. --debug)
with hexdump -C.  It definitely has 10 consecutive 0a's.

For rawbody rules, is really _the whole_ body fed to the matcher at once?
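
Why that question matters can be shown with a small Python sketch.  It
assumes, without confirmation from this thread, that the matcher might
receive the body in line-sized pieces; in that case a pattern asking for ten
newlines in a row can never match, even though the same pattern matches the
message as a whole.  The function names are made up for the illustration.

```python
import re

BLANKS = re.compile(r"\n{10,}")


def matches_whole(body):
    """Apply the pattern to the entire body in one go."""
    return BLANKS.search(body) is not None


def matches_per_line(body):
    """Apply the pattern to each line separately, line endings kept."""
    # Every piece ends at its own newline, so no piece holds two in a row.
    return any(BLANKS.search(line) for line in body.splitlines(keepends=True))


body = "first paragraph\n" + "\n" * 10 + "trailing text\n"
print(matches_whole(body), matches_per_line(body))  # True False
```

Under that assumption a hexdump showing ten consecutive 0a bytes and a rule
that never fires are not in contradiction.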



autolearn_force

2014-05-21 Thread Ian Zimmerman
I don't understand this setting, and reading the documentation doesn't
help.

It seems it should make Bayes learn spam whenever the total score
surpasses the value of bayes_auto_learn_threshold_spam, and not require
3 points from header and body each; that would make it a global setting
similar in purpose to bayes_auto_learn_threshold_spam.

But in fact this is a per-test setting, a subcategory of tflags.  Do I
have to specify it separately for every test?  Why?

Or is there another way to bypass the 3/3 requirement?
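
The behaviour described above can be written down as a small decision
function.  This Python sketch is a simplified model of the description in
this post, not SpamAssassin's actual code; the name should_autolearn_spam is
hypothetical, and treating a score equal to the threshold as a pass is an
assumption.

```python
def should_autolearn_spam(score, header_points, body_points, threshold,
                          forced=False):
    """Simplified model of the spam auto-learn gate.

    threshold stands for bayes_auto_learn_threshold_spam; forced is True
    when at least one hit rule carries the autolearn_force tflag.
    """
    if score < threshold:
        return False  # the threshold applies with or without the flag
    if forced:
        return True   # the flag waives only the 3/3 requirement
    return header_points >= 3.0 and body_points >= 3.0


# A body-heavy spam: over the threshold, but almost no header points.
print(should_autolearn_spam(7.0, 0.5, 6.5, threshold=6.0))               # False
print(should_autolearn_spam(7.0, 0.5, 6.5, threshold=6.0, forced=True))  # True
```

In this model the flag is attached to rules, while the threshold stays a
single global number, which matches the split between tflags and
bayes_auto_learn_threshold_spam noted above.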



Re: autolearn_force

2014-05-22 Thread Ian Zimmerman
On Thu, 22 May 2014 15:54:42 +0100
RW rwmailli...@googlemail.com wrote:

Ian I don't understand this setting, and reading the documentation
Ian doesn't help.

Ian It seems it should make Bayes learn spam whenever the total score
Ian surpasses the value of bayes_auto_learn_threshold_spam, and not
Ian require 3 points from header and body each; that would make it a
Ian global setting similar in purpose to
Ian bayes_auto_learn_threshold_spam.

Ian But in fact this is a per-test setting, a subcategory of tflags.
Ian Do I have to specify it separately for every test?  Why?

RW The point is to set it for a small number of rules that are
RW sufficiently strong as to guarantee there will be no mislearning in
RW combination with the autolearn as spam threshold.

RW It's probably best to create a single metarule for this - something
RW that eliminates the possibility of mistraining through a lot of
RW overlapping rules. I do something similar to get more spam into my
RW high-scoring folder. I assign a lot of the near-certain spam rules
RW to different classes: BAYES, RBLs, URIBLs, relaycountry etc and then
RW count the number of classes.

The problem I am trying to solve is that nearly all of my spam is
flagged due to body rules.  The header rules seem to be close to useless
with the latest campaigns - spammers seem to have learned enough to
avoid sending obvious stinking pieces of turd.  (The one exception is
patterns in the Message-ID, but I am afraid that will be short lived
too, and is insufficient by itself even now).

Thus, even if I set bayes_auto_learn_threshold_spam low, very few of my
spams are autolearned because of the 3/3 requirement.  The damn 3/3 is
my problem - how can I work around it?  If I have to spend an hour a day
manually training the classifier the spammers have won :-(

By the way, how are meta rules counted for this purpose?  The
documentation says nothing about that.



Re: Blank line rules

2014-05-22 Thread Ian Zimmerman
On Thu, 22 May 2014 13:47:04 -0700 (PDT)
John Hardin jhar...@impsec.org wrote:

John Regular expressions by default only consider a single line of
John text.  You need to provide an option to say treat multiple lines
John as a single line. Try this:

rawbody RAW_BLANK_LINES_05 /(?:\r?\n){5,9}/m
rawbody RAW_BLANK_LINES_10 /(?:\r?\n){10,24}/m
rawbody RAW_BLANK_LINES_15 /(?:\r?\n){25}/m

James, see also the Bayes refinement thread where I posted about doing
the exact same thing.  Somehow John's multiline rules don't work for me,
either.  Karsten was looking at it last I know.



lint versus spamd log

2014-05-23 Thread Ian Zimmerman
I have diligently used

spamassassin --lint

after every edit to my user_prefs file, and made sure there was no
output.  This morning, in the course of the ongoing battle against enom
related spam, I looked in /var/log/mail.log, and imagine my surprise
when I found this logged with every delivery:

May 23 09:48:04 host spamd[9033]: config: not parsing, administrator setting: loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
May 23 09:48:04 host spamd[9033]: config: failed to parse line, skipping, in /home/user/.spamassassin/user_prefs: loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
May 23 09:48:04 host spamd[9033]: config: not parsing, administrator setting: loadplugin Mail::SpamAssassin::Plugin::RelayCountry
May 23 09:48:04 host spamd[9033]: config: failed to parse line, skipping, in /home/user/.spamassassin/user_prefs: loadplugin Mail::SpamAssassin::Plugin::RelayCountry
May 23 09:48:04 host spamd[9033]: config: not parsing, administrator setting: loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
May 23 09:48:04 host spamd[9033]: config: failed to parse line, skipping, in /home/user/.spamassassin/user_prefs: loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
May 23 09:48:04 host spamd[9033]: config: not parsing, administrator setting: pyzor_options --homedir /home/user/.pyzor
May 23 09:48:04 host spamd[9033]: config: failed to parse line, skipping, in /home/user/.spamassassin/user_prefs: pyzor_options --homedir /home/user/.pyzor

My setup is: spamc --headers from within my .procmailrc file.  Does the
above mean I cannot use these plugins in this scheme, because they are
administrator only?  That would be disappointing.

Beyond that, I don't know what to make of the pyzor related error.
Pyzor seems to be globally enabled:

 [6+0]~$ fgrep -i pyzor /etc/spamassassin/*.pre
/etc/spamassassin/v310.pre:# Pyzor - perform Pyzor message checks.
/etc/spamassassin/v310.pre:loadplugin Mail::SpamAssassin::Plugin::Pyzor

Please help.
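
Since --lint stayed silent here, a separate check of user_prefs can at least
surface the lines that spamd complained about.  The Python sketch below
flags lines whose directive appears in a set of administrator-only settings.
That set is taken solely from the log excerpt above and is certainly
incomplete; admin_only_lines is a hypothetical helper.

```python
# Directives the spamd log above reports as "administrator setting".
# Taken from that log only; the full list lives in the SA documentation.
ADMIN_ONLY = {"loadplugin", "pyzor_options"}


def admin_only_lines(prefs_text, admin_only=ADMIN_ONLY):
    """Return (line_number, line) pairs that spamd would likely skip."""
    found = []
    for number, line in enumerate(prefs_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are harmless
        if stripped.split()[0] in admin_only:
            found.append((number, stripped))
    return found


prefs = """\
# my prefs
loadplugin Mail::SpamAssassin::Plugin::RelayCountry
required_score 4.3
pyzor_options --homedir /home/user/.pyzor
"""
for number, line in admin_only_lines(prefs):
    print(number, line)
```

Lines reported this way would have to move to the site-wide configuration
to take effect under spamd.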



Re: lint versus spamd log

2014-05-23 Thread Ian Zimmerman
On Fri, 23 May 2014 20:35:26 +0200
Karsten Bräckelmann guent...@rudersport.de wrote:

Ian spamassassin --lint

Ian after every edit to my user_prefs file, and made sure there was no
Ian output.  This morning, in the course of the ongoing battle against
Ian enom related spam, I looked in /var/log/mail.log, and imagine my
Ian surprise when I found this logged with every delivery:

Karsten That means you have been running lint check as a user, who is
Karsten not the user receiving mail. Linting also checks user_prefs,
Karsten but for obvious reasons only for the current user.

I mostly get the rest of your answer, but this is incorrect.  Same user,
I'm 100% sure.  Unless you count spamd checking on my behalf as
different user - do you?

Karsten (FWIW, what really would be disappointing is allowing users to
Karsten inject code into the daemon. Which loadplugin in user_prefs
Karsten would be.)

I assumed spamd forked to process each request, and loaded the plugins
only in the child.



Re: lint versus spamd log

2014-05-23 Thread Ian Zimmerman
On Sat, 24 May 2014 00:51:38 +0200
Karsten Bräckelmann guent...@rudersport.de wrote:

Ian I mostly get the rest of your answer, but this is incorrect.  Same
Ian user, I'm 100% sure.  Unless you count spamd checking on my behalf
Ian as different user - do you?

Karsten Yes.

Karsten user_prefs are per user. They are read by the spamd child
Karsten process for each and every message processed. If the spamd
Karsten daemon runs as root, the children setuid to the spamc calling
Karsten user (or given -u argument), to determine which user_prefs to
Karsten use. In your case the spamd master process already runs as user
Karsten spamd and the setuid step is omitted. The user_prefs are still
Karsten based upon the user the spamd child runs as.

Karsten Look at it this way: Both the spamd master process as well as
Karsten its children are running as an unprivileged, dedicated
Karsten user. You don't expect that user to have access to your actual
Karsten mail receiving account, do you?

Karsten My wording of "user receiving mail" should have been
Karsten "processing user". I was a little sloppy, because your OP did
Karsten not mention spamd.  Given details are "my user_prefs", logs
Karsten showing a user named "user", and mentioning spamc being called
Karsten via procmail.

I apologize for muddying the waters more than necessary.  The log was
altered: "user" there stands in for my normal user ID.

Karsten In your case of a dedicated spamd user, an attacker able to
Karsten load a plugin even potentially can access *any* other user's
Karsten mail while being processed by SA.

Karsten Again, see the Administrator Settings section in M::SA::Conf.

There is no dedicated spamd user - spamd runs as root:

 [11+0]~# ps lw 13558 13560 13561
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY   TIME COMMAND
1     0 13558     1  20   0  46656 40888 -      Ss   ?     0:04 /usr/sbin/spamd --create-prefs --max-children 5 --helper-home-d
5     0 13560 13558  20   0  62016 56908 -      S    ?     1:11 spamd child
5     0 13561 13558  20   0  51800 46716 -      S    ?     0:04 spamd child

(Sorry if this is also confusion created by my obfuscation of the log.)

According to the docs, this means spamd _does_ change identity to the
originator when processing each spamc request.



Re: autolearn_force

2014-05-24 Thread Ian Zimmerman
On Thu, 22 May 2014 15:54:42 +0100
RW rwmailli...@googlemail.com wrote:

Ian But in fact this is a per-test setting, a subcategory of tflags.
Ian Do I have to specify it separately for every test?  Why?

RW The point is to set it for a small number of rules that are
RW sufficiently strong as to guarantee there will be no mislearning in
RW combination with the autolearn as spam threshold.

So, now I am really confused.  I think I did everything right in user_prefs:

bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam -2.00
bayes_auto_learn_threshold_spam 6.00
bayes_auto_learn_on_error 0

[snip]

tflags URIBL_DBL_SPAM autolearn_force
tflags URIBL_JP_SURBL autolearn_force
tflags URIBL_BLACK autolearn_force
tflags INVALID_DATE autolearn_force

Nonetheless:

X-Spam-Score: 6.9
X-Spam-Tests: BAYES_99=3.5,BAYES_999=0.2,HTML_FONT_LOW_CONTRAST=0.001,
 HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.723,RDNS_NONE=0.793,SPF_PASS=-0.001,
 T_REMOTE_IMAGE=0.01,URIBL_BLACK=1.7
X-Spam-Autolearn: no autolearn_force=no



One suspect thing I see in the log:

May 24 20:29:58 host spamd[13561]: spamd: result: Y 6 -
BAYES_99,BAYES_999,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,MIME_HTML_ONLY,
RDNS_NONE,SPF_PASS,T_REMOTE_IMAGE,URIBL_BLACK
scantime=1.9,size=6208,user=itz,uid=1000,required_score=4.3,rhost=127.0.0.1,
raddr=127.0.0.1,rport=60231,
mid=23931386609892239320827813806...@86adv5n4.disabilism.eu,bayes=1.00,
autolearn=no autolearn_force=no

Note the 6 - is it possible that SA truncates the score to an integer
for this purpose, and then treats it as a strict lower bound - that is,
if I set bayes_auto_learn_threshold_spam = 6.00, the lowest score
to actually trigger autolearn would be 7?

That is the only rational explanation I have, tortured as it is.

It sure looks like SA is going out of its way to force me to do manual
training :-(



Re: autolearn_force

2014-05-24 Thread Ian Zimmerman
> So, now I am really confused.  I think I did everything right in
> user_prefs:
>
> bayes_auto_learn  1
> bayes_auto_learn_threshold_nonspam -2.00
> bayes_auto_learn_threshold_spam 6.00
> bayes_auto_learn_on_error 0
>
> [snip]
>
> tflags URIBL_DBL_SPAM autolearn_force
> tflags URIBL_JP_SURBL autolearn_force
> tflags URIBL_BLACK autolearn_force
> tflags INVALID_DATE autolearn_force
>
> Nonetheless:
>
> X-Spam-Score: 6.9
> X-Spam-Tests: BAYES_99=3.5,BAYES_999=0.2,HTML_FONT_LOW_CONTRAST=0.001,
>  HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.723,RDNS_NONE=0.793,SPF_PASS=-0.001,
>  T_REMOTE_IMAGE=0.01,URIBL_BLACK=1.7
> X-Spam-Autolearn: no autolearn_force=no

And here's a case where it doesn't autolearn ham (same user_prefs as above):

X-Spam-Status: No
X-Spam-Level: 
X-Spam-Score: -2.7
X-Spam-Tests: BAYES_00=-1.9,DKIM_SIGNED=0.1,DKIM_VALID=-0.1,DKIM_VALID_AU=-0.1,
 FREEMAIL_FORGED_FROMDOMAIN=0.001,FREEMAIL_FROM=0.001,
 HEADER_FROM_DIFFERENT_DOMAINS=0.001,HTML_MESSAGE=0.001,RCVD_IN_DNSWL_LOW=-0.7,
 RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001
X-Spam-Autolearn: no autolearn_force=no

The documentation certainly doesn't say anything like the 3/3 and force
mechanism is in place for ham.  So this _should_ autolearn.  Right?  Right??
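
One thing worth checking before blaming rounding: the AutoLearnThreshold
plugin documentation states that rules flagged with tflags learn (the Bayes
rules), userconf or noautolearn are ignored for the auto-learn decision, and
that the decision uses the non-Bayes score set.  The score compared against
the thresholds is therefore not the X-Spam-Score.  The Python sketch below
re-adds the two X-Spam-Tests headers quoted in this thread without the
BAYES_* hits.  It reuses the displayed per-rule scores as a stand-in for the
non-Bayes score set, which is an approximation, and autolearn_score is a
hypothetical helper.

```python
def autolearn_score(tests_header, ignored_prefixes=("BAYES_",)):
    """Sum per-rule scores from an X-Spam-Tests value, skipping ignored rules."""
    total = 0.0
    for item in tests_header.replace("\n", "").split(","):
        name, _, value = item.strip().partition("=")
        if name and not name.startswith(ignored_prefixes):
            total += float(value)
    return round(total, 3)


spam = ("BAYES_99=3.5,BAYES_999=0.2,HTML_FONT_LOW_CONTRAST=0.001,"
        "HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.723,RDNS_NONE=0.793,"
        "SPF_PASS=-0.001,T_REMOTE_IMAGE=0.01,URIBL_BLACK=1.7")
ham = ("BAYES_00=-1.9,DKIM_SIGNED=0.1,DKIM_VALID=-0.1,DKIM_VALID_AU=-0.1,"
       "FREEMAIL_FORGED_FROMDOMAIN=0.001,FREEMAIL_FROM=0.001,"
       "HEADER_FROM_DIFFERENT_DOMAINS=0.001,HTML_MESSAGE=0.001,"
       "RCVD_IN_DNSWL_LOW=-0.7,RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001")

print(autolearn_score(spam))  # 3.227, below the spam threshold of 6.00
print(autolearn_score(ham))   # -0.798, not below the nonspam threshold of -2.00
```

On these numbers neither message crosses its threshold once the Bayes hits
are left out, which would be consistent with autolearn=no in both cases.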



Re: autolearn_force

2014-05-25 Thread Ian Zimmerman
On Sun, 25 May 2014 16:40:44 +0200
Axb axb.li...@gmail.com wrote:

Axb URIBL rules are not set to use 'userconf' (user configuration)
Axb so entries in user_prefs shouldn't affect the results

Axb if anything it should go in a system wide rule (ie: local.cf) (not
Axb user_prefs)

Axb your: tflags URIBL_DBL_SPAM autolearn_force

Axb should probably read:

Axb tflags URIBL_DBL_SPAM net domains_only autolearn_force

Axb etc, etc - and not in user_

Axb iirc, this will also influence Bayes's scoring/learning behaviour.
Axb modifying rules' tflags should be done with care

But it does autolearn in _some_ instances:

May 25 08:33:50 host spamd[13561]: spamd: result: Y 10 -
BAYES_99,BAYES_999,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,MIME_HTML_ONLY,
RDNS_NONE,SPF_PASS,T_REMOTE_IMAGE,URIBL_BLACK,URIBL_DBL_SPAM,URIBL_JP_SURBL
scantime=1.7,size=6496,user=itz,uid=1000,required_score=4.3,rhost=127.0.0.1,
raddr=127.0.0.1,rport=52900,
mid=24251386609892242521126914206...@lun5bim.dollazo.eu,bayes=1.00,
autolearn=spam autolearn_force=yes (URIBL_JP_SURBL,URIBL_DBL_SPAM,URIBL_BLACK)

So I'm afraid I can't be satisfied with this explanation.

The whole autolearning settings thing just feels way unpredictable for
me.  If there are so many hurdles, does anyone actually do it?



Re: autolearn_force

2014-05-25 Thread Ian Zimmerman
On Sun, 25 May 2014 20:06:22 +0200
Axb axb.li...@gmail.com wrote:

Axb Yes, when it reached certain conditions and a score above 15.0

Axb you can tune that score via local.cf entries:

Axb bayes_auto_learn_threshold_nonspam bayes_auto_learn_threshold_spam

Please see the prefs in my post upthread - I have already done this.
That's why I am so confused, and frankly, irritated.  I have done
everything the documentation says to do, and it still behaves magically
and strangely.



Re: Capture vs non-capture groups

2014-05-28 Thread Ian Zimmerman
On Wed, 28 May 2014 10:47:35 -0700 (PDT)
John Hardin jhar...@impsec.org wrote:

John The only place I've found backreferences useful is when writing a
John header rule that is looking for the same string in multiple
John headers.

John Other than that, captures are very rare.

There was a pattern in the recent campaigns where backreferences would
be perfect.  So far I have been busy trying other approaches but I may
come back to that.

Example at

http://pastebin.com/KUJAWdHq
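
As an illustration of the kind of check a backreference makes possible
(this is not the pattern from the pastebin, whose content is not reproduced
here), the Python sketch below tests whether the local part of the
recipient address shows up again inside the Message-ID.  The regular
expression and the name id_reuses_recipient are assumptions for the example.

```python
import re

# \1 must repeat whatever group 1 captured: the local part of the recipient.
REUSED_LOCAL_PART = re.compile(r"\A<?([\w.+-]+)@[^\n]*\n.*\1", re.IGNORECASE)


def id_reuses_recipient(to_addr, message_id):
    """True if the local part of to_addr also occurs inside message_id."""
    combined = "%s\n%s" % (to_addr, message_id)
    return REUSED_LOCAL_PART.search(combined) is not None


print(id_reuses_recipient("itz@example.org", "<20140528.itz.4711@mailer.example>"))
print(id_reuses_recipient("itz@example.org", "<20140528.4711@mailer.example>"))
```

The first call reports a match and the second does not.  The same idea in a
SpamAssassin rule would need both headers in one string for the
backreference to span them.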



Re: SA without procmail?

2014-06-20 Thread Ian Zimmerman
On Wed, 18 Jun 2014 15:24:36 +0200
Axb axb.li...@gmail.com wrote:

Axb Dovecot's Sieve is your friend.  (replaces procmail)

Not really, not in this context.  OP is using procmail merely as a LDA.
And in that capacity, it is replaced by the LDA that comes with dovecot.
On my debian system, it is /usr/lib/dovecot/dovecot-lda.



Re: SA without procmail?

2014-06-20 Thread Ian Zimmerman
On Fri, 20 Jun 2014 14:05:04 +0100
Timothy Murphy gayle...@eircom.net wrote:

> Is there something similar I could append instead to use dovecot-lda?

Yes.

mailbox_command = /usr/libexec/dovecot/dovecot-lda

or

mailbox_command = /usr/libexec/dovecot/dovecot-lda -m INBOX

I don't know postfix, so I can't help with the magic to substitute
another mailbox for INBOX.

Or you can do this with a .forward file (I am sure postfix supports
those):

echo "|sh -c '/usr/lib/dovecot/dovecot-lda || exit 75'" > ~/.forward



Re: SA and Ubuntu 14.04 LTS

2014-07-16 Thread Ian Zimmerman
On Wed, 16 Jul 2014 06:09:08 +0200
Karsten Bräckelmann guent...@rudersport.de wrote:

> And to really include *local* plugins, provide a relative path (to the
> current site-wide configuration dir, without a leading slash) as
> optional second argument to the loadplugin statement. There's hardly
> ever any need for a full absolute path. And if there is, there's
> something wrong with your environment.

There _is_ something wrong with his environment: he's running
Ubuntu. :-)

Sorry, couldn't resist.



Re: Ready to throw in the towel on email providing...

2014-07-28 Thread Ian Zimmerman
On Mon, 28 Jul 2014 12:57:38 -0400
David F. Skoll d...@roaringpenguin.com wrote:

David 1) Gmail is actually pretty good at filtering spam.  I can't
David speak for MSFT since I don't use it.

David 2) Especially in North America, companies are short-sighted and
David go for quick fixes and things that look cheap up-front without
David considering the long-term costs.

David 3) Especially in North America, people don't see the value in
David learning technology.  They want simple, spoon-fed solutions and
David they love the word "outsourcing".  Sorry if (2) and (3) are not
David PC, but the slag against North Americans is based on my personal
David experience. :) And hey, I'm Canadian so I can dis my own crowd...

David 4) Most non-technical small businesses equate Mail Server with
David Microsoft Exchange, and Microsoft has steadily been making
David Exchange more and more of a PITA to administer.  Each new version
David of Exchange breaks things and requires learning new procedures.
David Combine that with (3) and we see that MSFT is using on-premise
David Exchange as a trojan horse to get people on O-365.  The huge pool
David of managed service providers that recommend MSFT solutions is
David by-and-large staffed by incompetents who are only too happy to
David shove their customers onto O-365 and collect kickbacks every
David month.

Good summary, but I think you forgot (5):

They have prettier icons.

I am not 100% kidding, either.



Mojibake alert [Was: Advice sought on how to convince irresponsible Megapath ISP]

2014-08-18 Thread Ian Zimmerman
On Sun, 17 Aug 2014 07:37:36 -0700,
Linda Walsh sa-u...@tlinx.org wrote:

 Karsten Br<mojibake elided/> wrote:

In addition to other problems with your posts (which experts here have
already pointed out), your scripts clearly do not handle non-ASCII
emails well, as you have completely mangled Karsten's name in your quote.

The days when you could do all email processing with the basic Unix
tools like sed and tr are long gone.  Please look into MIME-aware tools
or libraries.  The python email package, for instance, is excellent.
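
As a minimal sketch of what MIME-aware handling buys you: the Python
standard-library email package decodes RFC 2047 encoded words in headers
on its own when a message is parsed with policy.default.  The sample
message, the sender name and the address below are invented for
illustration.

```python
from email import message_from_string, policy

# Hypothetical message; the sender name and address are made up.
RAW = (
    "From: =?utf-8?q?Zo=C3=AB_M=C3=BCller?= <zoe@example.org>\n"
    "Subject: =?iso-8859-1?q?Gr=FC=DFe?=\n"
    "\n"
    "body\n"
)

def decoded_headers(raw):
    """Return (display name, address, subject) with encoded words decoded."""
    msg = message_from_string(raw, policy=policy.default)
    sender = msg["From"].addresses[0]
    return sender.display_name, sender.addr_spec, str(msg["Subject"])

if __name__ == "__main__":
    print(decoded_headers(RAW))
```

The same name comes out whichever charset or transfer encoding the
sending MUA chose, which is exactly what sed and tr cannot give you.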

-- 
Please *no* private copies of mailing list or newsgroup messages.
Local Variables:
mode:claws-external
End:


Re: Bayes training via inotify (incron)

2014-08-22 Thread Ian Zimmerman
On Fri, 22 Aug 2014 08:34:34 +0000,
Eric Wong e...@80x24.org wrote:

Eric I always thought inotify was an obvious way to train for anybody
Eric using Maildirs on Linux, so I set it up for my server and
Eric basically forgot about it since it worked well.  Fast forward to
Eric 2014 and I realize what I do is not widespread.  I figure I'll
Eric attempt to document things here to a wider audience on this
Eric sa-users list and hopefully help other users out.

Isn't inotify a bit of overkill for this?  If you have a dedicated
maildir for training, you know that anything in maildir/new is, uh,
new.  So you process it and move it to maildir/cur.  What am I missing?
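
The polling approach could be sketched roughly like this in Python.  It
is only an illustration: drain_new and the learn callback are
hypothetical names, and the ":2," suffix is the usual Maildir
convention for a message that has been moved to cur with no flags set.

```python
import os

def drain_new(maildir, learn):
    """Call learn(path) for each message in maildir/new, then move it to cur."""
    new_dir = os.path.join(maildir, "new")
    cur_dir = os.path.join(maildir, "cur")
    moved = []
    for name in sorted(os.listdir(new_dir)):
        src = os.path.join(new_dir, name)
        learn(src)  # e.g. a wrapper that runs sa-learn on this one file
        dst = os.path.join(cur_dir, name + ":2,")
        os.rename(src, dst)  # same filesystem, so the move is atomic
        moved.append(dst)
    return moved
```

Run from cron every few minutes, this needs no inotify at all; the
price is the polling delay.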

-- 
Please *no* private copies of mailing list or newsgroup messages.


Learning both spam and ham, edge case

2014-08-22 Thread Ian Zimmerman
I know that if you misclassify a mail as spam with

 sa-learn --spam /path/to/ham

you can later run

 sa-learn --ham /path/to/ham

to correct the mistake, and SA will do the right thing (i.e., forget the
wrong classification).  And conversely, with ham <-> spam.

My question is, what happens if you run

 sa-learn --spam /path/to/spam --ham /path/to/ham

and the same message is in both mailboxes?  Is the behavior even
well-defined (i.e., not random)?  And if so, can it be relied on in new
versions?
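
Whatever the answer, one way to sidestep the question is to look for
overlap before learning.  The sketch below assumes both paths are
directories holding one message per file, and it only catches
byte-identical copies; digests and in_both are hypothetical helper
names, and nothing here says anything about what sa-learn itself does.

```python
import hashlib
import os

def digests(directory):
    """Map SHA-256 of file contents to file name, for each file in directory."""
    found = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, "rb") as handle:
                found[hashlib.sha256(handle.read()).hexdigest()] = name
    return found

def in_both(spam_dir, ham_dir):
    """Return (spam name, ham name) pairs whose contents are byte-identical."""
    spam, ham = digests(spam_dir), digests(ham_dir)
    return sorted((spam[d], ham[d]) for d in spam.keys() & ham.keys())
```

If in_both() returns anything, those files can be sorted out by hand
before either mailbox is fed to sa-learn.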

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: drop of score after update tonight

2014-08-25 Thread Ian Zimmerman
I definitely have FNs today (about 10 by now today, normally 0).

Looks like some or all RBL tests are not working.  I have not changed my
configuration at all.

Sample here:

http://pastebin.com/dsqaVA9Z

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: drop of score after update tonight

2014-08-25 Thread Ian Zimmerman
On Mon, 25 Aug 2014 19:50:20 +0000,
David Jones djo...@ena.com wrote:

Ian I definitely have FNs today (about 10 by now today, normally 0).

Ian Looks like some or all RBL tests are not working.  I have not changed
Ian my configuration at all.

Ian Sample here:

Ian http://pastebin.com/dsqaVA9Z

David This hit DCC_CHECK, BAYES_50, CRM114, BOGOFILTER and KAM_EU rules
David and would have been blocked on my SA 3.4.0 servers.

Isn't it a bit odd that SA has rules for all these other Bayes powered
backends?  Why not give a bit more weight to its own Bayes instead,
rather than make users forage for other tools that do essentially the
same thing?

David (I understand that the DCC_CHECK hit could have also hit on your
David mail server too after time had passed if you have DCC enabled.)

Don't you need non-free software for DCC?

(Meanwhile, more spam came in.  This is definitely a crisis for me.)

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: drop of score after update tonight

2014-08-26 Thread Ian Zimmerman
On Tue, 26 Aug 2014 08:10:23 +0200,
Matus UHLAR - fantomas uh...@fantomas.sk wrote:

Ian Isn't it a bit odd that SA has rules for all these other Bayes
Ian powered backends?  Why not give a bit more weight to its own Bayes
Ian instead, rather than make users forage for other tools that do
Ian essentially the same thing?

Matus are they part of stock 3.4.0?

Apparently not.  So, I have to rephrase: Isn't it a bit odd to use
these external rules? :)

Ian Don't you need non-free software for DCC?

Matus non-free in Debian definition.
Matus (you need own server if you process over 100k messages daily, and
Matus license if you have internal checksum database)
Matus you can get the source, build and run in most of cases freely.

But that presents difficulties even apart from the religious ones.  For
instance, it means installing development tools on the target server, or
else cross-compiling (and we know how easy that is with average C code).

The good news is the bout of spam seems to have calmed down.
_Something_ must have been wrong earlier today.  The RBLs and Razor and
Pyzor all seemed to be out to lunch.  Maybe a connectivity problem on my
side.

 Christian Science Programming: Let God Debug It!.

May I quote this? :-)

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: Give a penalty to messages with non latin UTF-8 characters?

2014-08-31 Thread Ian Zimmerman
On Sat, 30 Aug 2014 06:44:39 -0600,
LuKreme krem...@kreme.com wrote:

LuKreme I would welcome rules that would reliably penalize messages
LuKreme that use chinese, japanese, korean, thai, or any other
LuKreme characters in the UTF-8 address space that I don’t read. I
LuKreme would put them in user_prefs.

Don't ok_languages and ok_locales do the job?  They do for me.

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: bayes scroing too low

2014-08-31 Thread Ian Zimmerman
On Sun, 31 Aug 2014 12:20:41 +0200,
Axb axb.li...@gmail.com wrote:

Axb Bayes scores are *not* set to be a sole indicator of spam/ham.
Axb They're supposed to be yet another indicator.

FWIW, I use both Razor and Pyzor, and there are times when they seem to
be just asleep.  Or maybe a particular kind of spam defeats their hash
protection methods.  Then for some hours I get repeated cases like
Harald's - positive BAYES_999 but nothing much else.  It is quite
frustrating.

I started using the KAM rules and they seem to push most such messages
over - but then _they_ include rules with 5+ scores ...

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: sa-learn and find

2014-08-31 Thread Ian Zimmerman
On Sat, 30 Aug 2014 19:59:53 -0600,
LuKreme krem...@kreme.com wrote:

RW This may run into shell argument limits if you have to learn a lot
RW of spam. Consider piping the output of find to xargs, or using -exec
RW ...{} + in find.

LuKreme Yes, I tried to do that, but as I said in my first post, if I
LuKreme do the find as part of the sa-learn command, then it stalls when
LuKreme the find command returns null.

xargs (the GNU one at least) has an option to not run the inferior when
there are no args to give it.

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: SA works great!

2014-08-31 Thread Ian Zimmerman
On Sun, 31 Aug 2014 16:55:50 +0200,
Axb axb.li...@gmail.com wrote:

Axb During the last +-4 years, scores have been set by the masscheck GA
Axb system.  IF more ppl would contribute with masschecks and rules,
Axb detection could be better, but the lack of volunteers doing this
Axb shows that apparently what SA does is good enough or there is
Axb little interest in commitment.

So, how do I take part in masscheck?

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: sa-learn and find

2014-08-31 Thread Ian Zimmerman
On Sun, 31 Aug 2014 17:37:50 -0600,
LuKreme krem...@kreme.com wrote:

Ian xargs (the GNU one at least) has an option to not run the inferior
Ian when there are no args to give it.

LuKreme The interior is the find:

_Inferior_, which is GNU-speak for subprocess.  I should have tried to
be less concise :-)

 sa-learn --ham -u ${i} `find /home/${i}/Maildir/.notspam -type f -mtime -7`

find /home/${i}/Maildir/.notspam -type f -mtime -7 | xargs -r sa-learn --ham -u ${i}

LuKreme (FreeBSD xargs never runs the command if the input is empty)

You may not need -r then.
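
For anyone scripting this outside the shell, the same guard can be
sketched in Python.  The helper names recent_files and learn_command are
made up for this example, and the mtime test only approximates find's
-mtime -N.  The point is simply that sa-learn is never started with an
empty file list, which is what xargs -r achieves.

```python
import os
import time

SECONDS_PER_DAY = 86400

def recent_files(root, days, now=None):
    """Regular files under root modified less than `days` days ago."""
    cutoff = (time.time() if now is None else now) - days * SECONDS_PER_DAY
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.path.getmtime(path) > cutoff:
                found.append(path)
    return sorted(found)

def learn_command(user, files):
    """Return the sa-learn argv, or None when there is nothing to learn."""
    if not files:
        return None  # same effect as xargs -r: do not start sa-learn at all
    return ["sa-learn", "--ham", "-u", user, *files]
```

A caller would pass the result to subprocess.run() only when it is not
None.  Very long file lists would still need batching, which xargs does
for you.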

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: bayes scroing too low

2014-09-01 Thread Ian Zimmerman
On Sun, 31 Aug 2014 12:20:41 +0200,
Axb axb.li...@gmail.com wrote:

Axb get the source from http://razor.sourceforge.net/ I don't recommend
Axb installing via some rpm.

The last version mentioned on that site is 2.84, from May 2007.

Strangely, the version in current Debian packages is 2.85.  Anyone
know what's going on here?

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: large spam messages

2014-09-06 Thread Ian Zimmerman
On Thu, 4 Sep 2014 12:52:34 -0400 (EDT),
Jude DaShiell jdash...@panix.com wrote:

Jude Since spamassassin cannot handle large spam over 2MB in size, what
Jude can be used to handle that class of junk?

I use a script on the MX host to MIME reshape all large messages, dropping
all non-text attachments, and save them to files there, before forwarding
to my IMAP server.  If such a message is ham (which is almost never) it
is easy enough to download the files after the fact.

Can share the script for the asking.
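
This is not that script, only a rough sketch of the reshaping idea using
the Python email package.  strip_non_text and sample_message are
hypothetical names, and saving the dropped attachments to files is left
out.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

def strip_non_text(part):
    """Drop every leaf part whose main type is not text, in place."""
    if part.is_multipart():
        kept = [
            strip_non_text(child)
            for child in part.get_payload()
            if child.is_multipart() or child.get_content_maintype() == "text"
        ]
        part.set_payload(kept)
    return part

def sample_message():
    """Build a made-up message: one text body plus one binary attachment."""
    msg = EmailMessage()
    msg["Subject"] = "sample"
    msg.set_content("hello\n")
    msg.add_attachment(b"\x00\x01\x02", maintype="application",
                       subtype="octet-stream", filename="blob.bin")
    return msg.as_bytes()

if __name__ == "__main__":
    slim = strip_non_text(message_from_bytes(sample_message(), policy=policy.default))
    print([p.get_content_type() for p in slim.walk()])
```

A multipart container whose children are all non-text ends up empty
under this sketch, so a production version would want to handle that
case as well.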

-- 
Please *no* private copies of mailing list or newsgroup messages.


Reply versus new thread [Was: Dumping email with blank To: header ?]

2014-09-06 Thread Ian Zimmerman
Others have graciously answered as to the substance of your message.

I'll have to be a pest and ask that you please do not use Reply or
Followup when you're starting a new topic.  For list readers with user
agents that thread the standard (RFC standard) way, that breaks
threading.

The way to start a new topic is to copy the list address, do a New
Message or similar, and paste the address into the destination field.
You can also save the address in your contact list / address book to
avoid the copy and paste in the future.

Thanks for your cooperation.

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: sa-learn from a remote imap folder

2014-09-12 Thread Ian Zimmerman
On Fri, 12 Sep 2014 07:45:22 -0500,
Dave Pooser dave...@pooserville.com wrote:

Marcus spamassassin and imap (cyrus) are running on different
Marcus boxes. What is best practice to learn spam from a remote imap
Marcus folder?

Dave At $DAYJOB we export the spam folder (and a ham folder for FPs)
Dave via NFS and mount them on the frontline SA servers for sa-learn.

Doesn't that smell of locking issues?

-- 
Please *no* private copies of mailing list or newsgroup messages.


KAM_BODY_URIBL_PCCC misfire

2014-09-15 Thread Ian Zimmerman
I have just had a false positive due to KAM_BODY_URIBL_PCCC (good for 5
pts.), for no apparent reason whatsoever.  There are no URIs in the body.

spample here:

http://pastebin.com/6kaxtNcq

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: more_spam_from like more_spam_to

2014-09-18 Thread Ian Zimmerman
On Wed, 17 Sep 2014 13:43:49 +0100,
RW rwmailli...@googlemail.com wrote:

RW A lot of people don't put mailing lists through Spamassassin, most
RW of them have already been spam filtered, and to get the best results
RW you have to extend your internal network and maintain it.

Do you mean the trusted_networks setting here?

I do sometimes get spam from lists, and so far I have been feeding list
traffic to SA just like everything else.  It doesn't seem to have any
adverse effects.  My trusted_networks is set to just the MX host.

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: more_spam_from like more_spam_to

2014-09-19 Thread Ian Zimmerman
On Fri, 19 Sep 2014 08:37:45 +0200,
Matus UHLAR - fantomas uh...@fantomas.sk wrote:

RW A lot of people don't put mailing lists through Spamassassin, most
RW of them have already been spam filtered, and to get the best results
RW you have to extend your internal network and maintain it.

Ian Do you mean the trusted_networks setting here?

Matus no... they do not filter mail from mailing lists through SA.  it
Matus is a setting outside spamassassin, usually in the MTA, a milter or
Matus procmail.

Matus trusted_networks is SA configuration setting so it can't be used
Matus when SA is avoided. Also, it has much different meaning than not
Matus scanning mail from those hosts.

Well, that is not how I read RW's message.

To me, it sounds like this:

Lots of people don't put mailing lists through Spamassassin, in part
because of the extra work that would be required if they did; namely,
they'd have to extend their internal network and maintain it.  This is
required for best results.

(I'm not a native English speaker either, but I've probably been
speaking and reading it a bit longer than you.  Just guessing, and of
course no disrespect meant.)

Only RW can clarify ...

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: Non-English spam

2014-09-27 Thread Ian Zimmerman
On Thu, 25 Sep 2014 13:13:07 -0400,
dar...@chaosreigns.com wrote:

 To enable TextCat to flag everything that's not English, in local.pre
 I have:
 loadplugin Mail::SpamAssassin::Plugin::TextCat

 And in local.cf I have:
 ok_languages en

I have done this too, but I live in an English speaking country.

If I had to do this while living in a Polish speaking country, I'd
consider that the spammers have won.

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: spam - why spam score is low,

2014-09-28 Thread Ian Zimmerman
On Fri, 26 Sep 2014 17:07:31 +0200,
Antony Stone antony.st...@spamassassin.open.source.it wrote:

motty Received: from maria.fqdn.com ([127.0.0.1])

Antony That won't be helping - it means you're not basing any tests on
Antony the sending server.  can you run SA on your inbound MX instead
Antony of relaying locally first?

Is this right?  Isn't this precisely what the internal_networks setting
works around?

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: what's wrong

2014-10-01 Thread Ian Zimmerman
On Tue, 30 Sep 2014 09:47:41 +0200,
Matus UHLAR - fantomas uh...@fantomas.sk wrote:

 Do you trust smtp.cesky-hosting.cz?
 Even if it's open socks and http proxy server?

I wonder if slovensky-hosting.sk does better :-P

-- 
Please *no* private copies of mailing list or newsgroup messages.
Local Variables:
mode:claws-external
End:


Re: Local URL blocking based on NS records?

2014-10-06 Thread Ian Zimmerman
On Fri, 03 Oct 2014 00:08:49 +0200,
Axb axb.li...@gmail.com wrote:

Axb What's wrong with running rbldnsd?  It's the tool all BLs use for
Axb mirroring BL data. It's so stable and simple to use nothing can
Axb beat it.

From the website:

 There is no config file, rbldnsd accepts all configuration in command line.

A bit too simple, I'd say.  What about kernel argv limits?

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: Regarding mass-check access

2014-10-11 Thread Ian Zimmerman
On Fri, 10 Oct 2014 16:19:39 -0400,
staticsafe m...@staticsafe.ca wrote:

 I sent an email to priv...@spamassassin.apache.org regarding access to
 mass-check back on the first of September. Is anybody out there? :)

So did I, on August 31, to be precise.  Crickets for me, too.

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: procmail (was Re: Spam messages bypassing SA)

2014-10-28 Thread Ian Zimmerman
On Fri, 24 Oct 2014 08:43:41 -0400,
David F. Skoll d...@roaringpenguin.com wrote:

David Procmail is also unmaintained abandonware, as far as I can tell.
David If you use SpamAssassin, you probably like Perl, so I would
David recommend Email::Filter instead.  It's far more flexible than
David procmail and lets you write readable filters.

David Since procmail is still the default LDA on Debian, this is my
David .procmailrc:

David :0
David | /usr/bin/perl /home/dfs/.mail-filter.pl >> /home/dfs/.mail-filter.log 2>&1

David And excerpts from my filter look something like this:

Or you could run dovecot and its sieve plugin.  Sieve is a real standard
(RFC 5228) which procmail never was.

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: procmail

2014-10-28 Thread Ian Zimmerman
On Tue, 28 Oct 2014 11:43:04 -0700
jdow j...@earthlink.net wrote:

jdow That is hardly a compelling reason to change from procmail to
jdow perl, for me or others with working procmail systems. You seem to
jdow be advocating handing me perl and turning me loose after ripping
jdow procmail out of my hands. That does not endear you to me. It isn't
jdow broken. So why fix it? There is a tremendous amount of experience
jdow out there setting it up and using it. Is that a reason to discard
jdow it for something new? We're seeing the fruits of that sort of
jdow divisiveness with the systemd controversy. If fix means better and
jdow still 100% compatible it is an easy sell. If fix means 0%
jdow compatible being better is not good for people with better things
jdow upon which to spend their time than learning a new way shoved down
jdow their throats. In the abstract you are right. In the practical,
jdow that rightness appears to tarnish.

You sound like you're replying more to me than to David.

How do you match non-ASCII From: in procmail?  Note that the encoding
may differ, even for the same sender, depending on which MUA he's using
ATM.

_Some_ old stuff deserves to be replaced.

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: SOUGHT 2.0 ?

2014-11-12 Thread Ian Zimmerman
On Sat, 01 Nov 2014 10:06:57 -0000,
Kevin Golding k...@caomhin.org wrote:

Kevin So anyone else want to raise their hands?

It depends.

Would I mind a bit of regular maintenance work?  No, I wouldn't mind.

Would I mind a major change in how I run my server - for instance,
run a virus checker, or run the bleeding edge version of SA?  You
betcha.  Not going to do that, sorry.

So, I need more details before I raise my hand much above the keyboard :-P

Of course, I'd love to have the autogenerated rules back, so call me selfish.

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: SOUGHT 2.0 ?

2014-11-13 Thread Ian Zimmerman
On Thu, 13 Nov 2014 09:28:30 -0000,
Kevin Golding k...@caomhin.org wrote:

Kevin The main thing that's going to be needed is good, reliable,
Kevin data. We'll only get good rules with good feeds. That should be
Kevin fairly low impact for people in many respects.

Kevin Obviously there's always room to help with some code, so a bit of
Kevin Perl or shell skills are a good thing. The impact of that on
Kevin people will vary on how they work, but I doubt anyone will do
Kevin anything to interfere with their running systems - as proven with
Kevin masschecks it's fairly easy to sandbox things to one side for
Kevin such analysis even if people do want to do anything on an
Kevin important system.

Ok, I am still interested.  I'm a coder, my Perl is rusty but my shell
is current.  I can't provide trap servers but you'd be welcome to my
spam (all hand-verified by me).

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: SOUGHT 2.0

2014-12-04 Thread Ian Zimmerman
On Thu, 04 Dec 2014 22:41:13 +0100,
Axb axb.li...@gmail.com wrote:

Axb To be able to create usable rules, several times/day I need feeds
Axb to spit *at least* +150k/day. As I don't have the data

150k of what?  Bytes?  Emails?  Tokens?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


whitelist_from_rcvd not working, WAIDW

2015-02-27 Thread Ian Zimmerman
Header of test message, massaged for privacy, is here:

http://pastebin.com/EV6g15aN

I have this in user_prefs:

 trusted_networks 198.1.2.3/32

 [...lots snipped...]

 whitelist_from_rcvd *@wetransfer.com *.wetransfer.com

Why is the whitelist not firing?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Re: whitelist_from_rcvd not working, WAIDW

2015-02-28 Thread Ian Zimmerman
On Sat, 28 Feb 2015 13:37:29 +0100,
Mark Martinec mark.martinec...@ijs.si wrote:

Ian trusted_networks 198.1.2.3/32
Ian [...lots snipped...]
Ian whitelist_from_rcvd *@wetransfer.com *.wetransfer.com

Mark It seems the:

Mark Received: (from itz@localhost)
Mark by myalias.trusted.mx (8.14.4/8.14.4/Submit) id t1N7YK8O020727
Mark for i...@my.post.office; Sun, 22 Feb 2015 23:34:20 -0800

Mark is breaking a trust chain.

It shouldn't.  I forgot to add that all of the following resolve to
198.1.2.3:

my.domain
my.trusted.mx
myalias.trusted.mx

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Confused about Bayes expiry

2015-05-24 Thread Ian Zimmerman
I am very confused by the various features involving expiry from Bayes.

perldoc Mail::SpamAssassin::Conf :

   bayes_expiry_max_db_size  (default: 150000)

   What should be the maximum size of the Bayes tokens database?
   When expiry occurs, the Bayes system will keep either 75% of
   the maximum value, or 100,000 tokens, whichever has a larger
   value.  150,000 tokens is roughly equivalent to a 8Mb
   database file.

   bayes_auto_expire (default: 1)

   If enabled, the Bayes system will try to automatically expire
   old tokens from the database.  Auto-expiry occurs when the
   number of tokens in the database surpasses the
   bayes_expiry_max_db_size value. If a bayes datastore backend
   does not implement individual key/value expirations, the
   setting is silently ignored.

   bayes_token_ttl   (default: 3w, i.e. 3 weeks)

   Time-to-live / expiration time in seconds for tokens kept in
   a Bayes database.  A numeric value is optionally suffixed by
   a time unit (s, m, h, d, w, indicating seconds (default),
   minutes, hours, days, weeks).

   If bayes_auto_expire is true and a Bayes datastore backend
   supports it (currently only Redis), this setting controls
   deletion of expired tokens from a bayes database. The value
   is observed on a best-effort basis, exact timing promises are
   not necessarily kept. If a bayes datastore backend does not
   implement individual key/value expirations, the setting is
   silently ignored.

This really sounds as if expiry is a no-op for backends other than
Redis.  And yet Debian bug #334829 [1] exists, and has spawned a whole
subculture of solutions and work-arounds.  (Sorry for the slight
exaggeration.)  Clearly the users reporting these problems do not use
Redis, in fact by all signs they use the default DB backend, as I do.
So should I be worried about the expiry overhead and set up a separate
--force-expire job?  I am confused.

[1]
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=334829
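
To make the bayes_token_ttl syntax quoted above concrete, here is a small
sketch that turns such a value into seconds.  It follows only the wording
of the documentation (a number, optionally followed by s, m, h, d or w);
the syntax SpamAssassin really accepts may be looser or stricter, and
ttl_seconds is a hypothetical name.

```python
import re

# Multipliers for the unit suffixes listed in the documentation.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

def ttl_seconds(value):
    """Convert a TTL such as '3w' or '90' to seconds (seconds by default)."""
    match = re.fullmatch(r"\s*(\d+)\s*([smhdw]?)\s*", value)
    if match is None:
        raise ValueError("unrecognised TTL: %r" % value)
    number, unit = match.groups()
    return int(number) * _UNIT_SECONDS[unit or "s"]
```

Under this reading the default of 3w comes to 1814400 seconds.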

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Re: Confused about Bayes expiry

2015-05-24 Thread Ian Zimmerman
On 2015-05-24 23:25 +0200, Mark Martinec wrote:

Mark With other bayes back-ends the traditional expiration mechanisms
Mark need to be used, either auto-expiration runs triggered from time
Mark to time by SpamAssassin, or explicit expiration runs, e.g. from a
Mark cron job. With these traditional back-ends the bayes_token_ttl
Mark setting has no effect.

Perhaps this paragraph could be included verbatim in the podfile, and
the current wording (especially about bayes_auto_expire) removed :-)
Thanks.

But, in fact I already have a cronjob running sa-learn
--force-expire.  The reason I would prefer to remove it (and so the
reason for my original post) is that it does a journal sync as well,
which I didn't intend and which interferes with other things.

Would sa-learn --no-sync --force-expire make sense?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Re: Confused about Bayes expiry

2015-05-25 Thread Ian Zimmerman
On 2015-05-25 09:43 +0200, Matus UHLAR - fantomas wrote:

Ian But, in fact I already have a cronjob running sa-learn
Ian --force-expire.  The reason I would prefer to remove it (and so
Ian the reason for my original post) is that it does a journal sync as
Ian well, which I didn't intend and which interferes with other things.

Matus what other things? Journal is here to speed up database updates,
Matus not to avoid database writes. too big journal slows things down.

Matus The main reason to use manual expire is to avoid occasional
Matus delays with automatic expire noted in the bugreport you posted
Matus link to.

Matus so, again, what are reasons you want to avoid journal syncs?

I do the database updates in a batch fashion, learning each input
message with --no-sync, then doing a --sync at the end.  This --sync
cannot wait too long because I want to defend against current spam.
That is, it cannot wait as long as the typical time between expires.
But if an explicit expiry happens to run at the same time, the result is
a mess.

Of course there is a simple solution, have a single job which decides by
itself if it's time to expire or not, rather than rely on the cron
schedule.  But it seemed to me that the two tasks were independent and
so should be in separate jobs.  As was explained in the other
subthread, I was wrong in that assumption.
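
The single-job idea could look roughly like the sketch below.  It only
builds the command lines, in order, so that sync and expiry can never
overlap; plan and its parameters are hypothetical, times are in seconds,
and remembering last_expire between runs is left to the caller.

```python
def plan(spam_paths, now, last_expire, expire_every):
    """Return the sa-learn command lines for one batch run, in order."""
    # Learn every message without touching the main database yet.
    commands = [["sa-learn", "--no-sync", "--spam", path] for path in spam_paths]
    if now - last_expire >= expire_every:
        # As noted in this thread, --force-expire syncs the journal too,
        # so no separate --sync is needed on an expiry run.
        commands.append(["sa-learn", "--force-expire"])
    else:
        commands.append(["sa-learn", "--sync"])
    return commands
```

Each list can then be handed to subprocess.run() one after another from a
single cron entry.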

Thanks.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Re: no reporting methods available

2015-07-31 Thread Ian Zimmerman
On 2015-07-31 18:28 -0500, David B Funk wrote:

 Reporting is separate from learning.
 
 It is the case that spamassassin -r is supposed to report and learn.
 However it isn't quite the same as sa-learn --spam in that unlike
 sa-learn --spam it won't override the spam learn prohibition of BAYES_00.

Thanks, that is useful to know.  However, it isn't really relevant to
this situation.  My point is: if learning _is_ part of the job of
spamassassin -r, then does it have to fail for the "no reporting methods
available" message to be emitted?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



no reporting methods available

2015-07-31 Thread Ian Zimmerman
I run spamassassin -r from cron nightly.  Last night I got this output:

Jul 30 23:00:11.830 [31065] warn: reporter: no reporting methods available, so couldn't report
Jul 30 23:00:11.830 [31065] warn: spamassassin: warning, unable to report message
Jul 30 23:00:11.830 [31065] warn: spamassassin: for more information, re-run with -D option to see debug output

I tried to follow the instructions and run

spamassassin -D -r `ls spam`

but that hangs without producing any output.

The only external reporting method I'm aware of that should be active is
Razor.  Running razor-report `ls spam` works normally as expected.

Aside from getting an explanation of what happened this time, I'd also
like to clarify more generally what spamassassin -r does.  From a recent
thread here I learned that it also does the equivalent of sa-learn
--spam.  Right?  So presumably it doesn't consider this a reporting
method or how could it be not available?

Also I recently installed the bogofilter plugin by Christian Laußat, and
my understanding is that (when bogofilter_learn is set to 1, as it is),
it advertises itself as another external reporting agent.  So shouldn't
this also happen during a spamassassin -r run, and how could it be not
available?


-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



bayes expiry not happening when it should

2015-08-05 Thread Ian Zimmerman
~$ grep '^bayes_expiry_max_db_size' ~/.spamassassin/user_prefs | awk '{print $2}'
2000000
~$ sa-learn --force-expire
bayes: synced databases from journal in 0 seconds: 2784 unique entries (2805 total entries)
~$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      24501          0  non-token data: nspam
0.000          0      23548          0  non-token data: nham
0.000          0    2009202          0  non-token data: ntokens
0.000          0     100071          0  non-token data: oldest atime
0.000          0 1438755640          0  non-token data: newest atime
0.000          0 1438755988          0  non-token data: last journal sync atime
0.000          0 1438756034          0  non-token data: last expiry atime
0.000          0   11059200          0  non-token data: last expire atime delta
0.000          0      20174          0  non-token data: last expire reduction count

??wth???  I thought I _finally_ understood this stuff :-(
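
For reading these numbers without squinting, a small parsing sketch
follows.  It assumes whitespace-separated columns with the value in the
third one, as in the dump above; parse_magic is a hypothetical name and
SAMPLE just repeats three of the lines.

```python
def parse_magic(text):
    """Map each 'non-token data' name to its integer value."""
    values = {}
    for line in text.splitlines():
        head, sep, name = line.partition("non-token data:")
        fields = head.split()
        if sep and len(fields) == 4:
            values[name.strip()] = int(fields[2])  # third column holds the value
    return values

SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0    2009202          0  non-token data: ntokens
0.000          0   11059200          0  non-token data: last expire atime delta
"""

if __name__ == "__main__":
    magic = parse_magic(SAMPLE)
    print(magic["ntokens"], magic["last expire atime delta"] / 86400)
```

Dividing the last expire atime delta by 86400 gives 128 days.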

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Re: bayes expiry not happening when it should

2015-08-05 Thread Ian Zimmerman
On 2015-08-05 12:58 +0100, RW wrote:

 The number of tokens is within 0.5% of the configured value. It's
 designed to produce a value between 75% and roughly 150%.

I can't quite parse that answer, so let's be more specific.

Doc says:

  bayes_expiry_max_db_size  (default: 150000)

What should be the maximum size of the Bayes tokens database?  When
expiry occurs, the Bayes system will keep either 75% of the maximum
value, or 100,000 tokens, whichever has a larger value.

From this (and the more elaborate description in the EXPIRATION section,
which I've also read) I thought it worked roughly like this:

if (ntokens < bayes_expiry_max_db_size)
    do_nothing()
else
    goal_ntokens = max(100000, 0.75 * bayes_expiry_max_db_size)
    while (ntokens > goal_ntokens)
        kill_oldest_tokens()

If I misunderstood, how/where?  Sorry for my density :-(
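
To put numbers on that reading, here is the arithmetic as a runnable
sketch.  It models only the documentation text quoted above (75% of the
maximum, or 100,000 tokens, whichever is larger), not necessarily what
SpamAssassin really does; expiry_goal and should_expire are hypothetical
names.

```python
def expiry_goal(max_db_size):
    """Tokens to keep after an expiry, per the quoted documentation."""
    return max(100_000, int(0.75 * max_db_size))

def should_expire(ntokens, max_db_size):
    """True once the token count surpasses the configured maximum."""
    return ntokens > max_db_size
```

With a maximum of 150,000 tokens the goal would be 112,500; with any
maximum up to about 133,333 the 100,000 floor wins.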

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Live upgrade safe?

2015-08-14 Thread Ian Zimmerman
Can I safely upgrade SA from 3.4.0 to 3.4.1 without changing any local
configuration files, and without regenerating the Bayes database?  (I
use the default bdb Bayes store.)

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Re: bayes expiry not happening when it should

2015-08-05 Thread Ian Zimmerman
On 2015-08-05 19:34 +0100, RW wrote:

 What it actually does is estimate a cut-off time and then delete all
 tokens older than that. How it gets the cut-off time is described the
 next two sections:  EXPIRE LOGIC and ESTIMATION PASS LOGIC.

OMG.  For one thing, are the clauses in the definition of "weird"
conjunctive or disjunctive?

A more insolent question, why this complexity?  Why can't I force an
expire when I feel like it? :-P  Or can I?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



another bayes oddity

2015-07-23 Thread Ian Zimmerman
I have

bayes_auto_learn 0
bayes_auto_expire 0
bayes_learn_to_journal 0

add_header all Autolearn _AUTOLEARN_


and indeed, all messages are tagged with

X-Spam-Autolearn: disabled


Nevertheless, the mtime _and_ size of ~/.spamassassin/bayes_journal
inches forward with every delivery.  Why?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Re: Large spam

2015-07-15 Thread Ian Zimmerman
On 2015-07-15 20:12 +0000, Zinski, Steve wrote:

> We're starting to see a lot of spam in the 800KB to 1.2MB size
> range. I’m running MIMEdefang and it’s configured to skip messages
> larger than 100KB (and I hesitate to increase the limit due to
> performance issues). I read somewhere that there’s a way to have
> MIMEdefang (or spamassassin) strip out the non-text portions of the
> e-mail and scan. Can anyone help me set this up or point me in the
> right direction? Thanks!

Yes, I see the same thing.  I have no doubt at all that it is
intentional, to defeat spamc size limit in particular.

Moreover, mimedefang won't help because at least some of them are
disguised as plain text messages.  That is, the outermost message body
is an entire MIME message, headers and all.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Re: Debian jessie - new setup, missing data directory

2015-11-09 Thread Ian Zimmerman
On 2015-11-09 16:42 +0100, Antony Stone wrote:

> What did Jessie install it as?
> 
> > > /var/mail/.spamassassin/user_prefs

This is very strange.  Are you really sure it is not operator error?

I run wheezy, so I can't flat out exclude it, but it flies in the face
of too much Debian tradition. /var/mail is just for the spool mailboxes.

> 1. I seriously doubt that on a Debian system exim is running as root.

Indeed:

 [6+0]~$ ps axl | fgrep 'exim4 -bd'
5   101  3230 1  20   0  46824  2860 ?  Ss   ?  0:06
/usr/sbin/exim4 -bd -q30m
0  1000  8368  8311  20   0   7800  1760 -  S+   pts/1  0:00
fgrep exim4 -bd
 [7+0]~$ awk 'BEGIN { FS=":" } ( $3 == "101" ) { print $0 }' <
 /etc/passwd
Debian-exim:x:101:103::/var/spool/exim4:/bin/false

> 2. It sounds like we're talking slightly at cross-purposes here.  Exim may be 
> calling spamassassin (PS: how?)

It matters a good deal.  If it's called from the content filtering hook
or the ACLs, spamassassin runs as the exim UID (unless it is itself
setuid, of course).  But if it's called as a "transport filter", it runs
as the destination user.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Checking if sa-learn is actually learning

2015-10-16 Thread Ian Zimmerman
On 2015-10-16 20:59 -0500, Ryan Coleman wrote:

> sa-learn commands:
> [scans domains for specified folders and scans them]
> > /usr/bin/find /var/mail/vhosts/ -name '*.Spam.New*' -type d -exec 
> > /usr/bin/sa-learn --no-sync --spam --progress {}* \;
> > /usr/bin/find /var/mail/vhosts/ -name '*.Spam.Suspected*' -type d -exec 
> > /usr/bin/sa-learn --no-sync --spam --progress {}* \;
> 
> I swear I had issues in the past without having --no-sync, but is that causing 
> it?

If you do the routine learning with --no-sync, you must have one run with
--sync as well, maybe in a cron job.  Or just run with --sync once at
the end of this same script.  That much is straightforward, and should
be clear from the man/pod pages.
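
A minimal sketch of the "sync once at the end" variant follows. `SA_LEARN` and the function name are placeholders introduced here so the script can be dry-run with a stub; the `--no-sync`, `--spam` and `--sync` options are the ones already discussed in this thread.

```shell
#!/bin/sh
# Sketch only. SA_LEARN is a hypothetical override so that the function
# can be exercised with a stub instead of the real sa-learn.
SA_LEARN=${SA_LEARN:-sa-learn}

learn_then_sync() {
    # Learn every directory given as an argument, journal only...
    for dir in "$@"; do
        "$SA_LEARN" --no-sync --spam "$dir" || return 1
    done
    # ...then merge the journal into the database once, at the end.
    "$SA_LEARN" --sync
}
```

It would be called as `learn_then_sync DIR1 DIR2 ...` with whatever spam folders the find commands above locate.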

The part that caused me some trouble, and is somewhat underdocumented
IMO, is the interaction of --sync with --force-expire.  I'm afraid I
can't help you with that because I took the extreme step of disabling
expiration, and instead re-creating a fresh database monthly from the
recent corpus which I keep around.
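
Roughly, such a monthly rebuild could look like the sketch below. The corpus layout (`spam/` and `ham/` subdirectories), the `SA_LEARN` override and the function name are assumptions made for illustration, not a description of an actual setup; `--clear` wipes the existing database, so this is not something to run casually.

```shell
#!/bin/sh
# Sketch only. SA_LEARN is a hypothetical override for dry runs; the
# corpus is assumed to have "spam" and "ham" subdirectories.
SA_LEARN=${SA_LEARN:-sa-learn}

rebuild_bayes() {
    corpus=$1
    "$SA_LEARN" --clear                          || return 1  # drop the old database
    "$SA_LEARN" --no-sync --spam "$corpus/spam"  || return 1  # relearn recent spam
    "$SA_LEARN" --no-sync --ham  "$corpus/ham"   || return 1  # relearn recent ham
    "$SA_LEARN" --sync                                        # flush the journal once
}
```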

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Return Path (TM) whitelists

2015-07-09 Thread Ian Zimmerman
I just got in my inbox what I consider spam from the Belgian domain
selling Japanese copiers & printers (you probably know which one).  What
made it pass through SA were RCVD_IN_RP_CERTIFIED and RCVD_IN_RP_SAFE.
Together they account for a whopping -5 points - a poison antidote pill!
Isn't that a bit excessive?  In fact, since Return Path explicitly
advertises itself as a service for marketers, and I _never_ knowingly
subscribe to a marketing list, these scores should be (smallish)
positive as far as I'm concerned.

Also, I'm unsure what membership in SAFE means; the Return Path website
doesn't mention it as prominently as it does their certification program.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Re: Return Path (TM) whitelists

2015-07-09 Thread Ian Zimmerman
On 2015-07-09 16:58 +0000, David Jones wrote:

> Did the email have a valid unsubscribe link/process?

It is in Dutch, and I can't read Dutch.
(Yes, I do use the language plugin.)

> I shortcircuit as ham for these two rule hits and never have had a
> report of spam that couldn't be reliably/safely unsubscribed from.  (I
> filter about 90,000 mailboxes.)

How can I tell if it is safe if I can't even read the message?

But in general, to me it is spam if I didn't explicitly subscribe.  And
I didn't.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Re: Return Path (TM) whitelists

2015-07-10 Thread Ian Zimmerman
On 2015-07-10 13:54 +0100, RW wrote:

> I don't get any spam at all in the return-path lists.

 ...

> I don't doubt that there's some abuse, but I also find it hard to
> believe that the accuracy of the return-path rules isn't dominated by
> user behaviour.

Can you specify user behaviour in more detail?  Are you saying it is
something I (and the other posters with viewpoint similar to mine) did,
or didn't do, that causes us to receive RP certified UCE?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Re: Return Path (TM) whitelists

2015-07-10 Thread Ian Zimmerman
On 2015-07-10 16:36 +0200, Reindl Harald wrote:

> most users enable checkboxes which are needed to get random forms
> submitted, even if they say i agree to get mails from here and
> there and are missing the context when that mails are coming later

You don't know me, so you can hardly claim a basis to lump me with most
users.

I repeat (for the last time, I promise): I didn't subscribe to any
Belgian/Dutch list.  Not by enabling a checkbox, not otherwise.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Re: Live upgrade safe?

2015-09-11 Thread Ian Zimmerman
On 2015-09-11 17:35 +0200, Reindl Harald wrote:

> >>>Can I safely upgrade SA from 3.4.0 to 3.4.1 without changing any local
> >>>configuration files, and without regenerating the Bayes database?  (I
> >>>use the default bdb Bayes store.)
> >>
> >>yes, but you need to run "sa-update" before restart to fetch the
> >>latest rules and hopefully have a distribution which restarts
> >>automatically after update the package
> >
> >Isn't this a contradiction?  If my distribution automatically restarts
> >(which it does), how can I sneak in a sa-update run after the upgrade
> >but before the restart?
> 
> i hope you have a testing environment for production and so just make
> the "sa-update" there and rsync the rule-updates to the liveserver

I appreciate you trying to help, but you don't really answer my
question.  Even if I could do what you suggest, the rsync would still
take finite time - longer than the interval between the upgrade and the
restart on the production system.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: [Announce] SA-Plugins: RedisAWL, RuleTimingRedis

2015-09-15 Thread Ian Zimmerman
On 2015-06-09 17:57 +0200, Benning, Markus wrote:

> RuleTimingRedis - collect SA rule timings in redis

I'm trying this out.  I have a little annoying problem: the logs
beginning on line 178 seem to go to stdout or stderr as well as syslog.
The result is that cron sends me email every time spamd is restarted
(after every rule update).  Do you know how to change that?  I find
nothing about logging in perldoc Mail::SpamAssassin::Conf.

I suppose I could just delete those lines from the module :-)  But then
I would have extra work when I merge with any new versions you have.

Thanks for your ideas.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Live upgrade safe?

2015-09-11 Thread Ian Zimmerman
On 2015-08-14 17:45 +0200, Reindl Harald wrote:

> >Can I safely upgrade SA from 3.4.0 to 3.4.1 without changing any local
> >configuration files, and without regenerating the Bayes database?  (I
> >use the default bdb Bayes store.)
> 
> yes, but you need to run "sa-update" before restart to fetch the
> latest rules and hopefully have a distribution which restarts
> automatically after update the package

Isn't this a contradiction?  If my distribution automatically restarts
(which it does), how can I sneak in a sa-update run after the upgrade
but before the restart?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: best way to whitelist this list?

2015-09-19 Thread Ian Zimmerman
On 2015-09-19 20:12 +0200, A. Schulze wrote:

> today I was notified by ezmlm that my MTA rejected messages to
> me. Messages to this list where classified as spam by .. spamassassin.

All of today's messages here scored around -7.5 for me, with no special
handling.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: A Plan to Stop Violence on Social Media

2015-12-16 Thread Ian Zimmerman
On 2015-12-16 14:21 -0800, jdow wrote:

> One thing worth pointing out is if this CAN be done refusing to do it
> yourself is a shallow gesture.

No, it is not.  Refusing to take part in what you believe is wrong, even
if you know the wrong will be done eventually because the Zeitgeist
favors it, is a legitimate point of view.

Then again, I don't give a rodent's back what Facebook or Twitter does.
But I am afraid it won't stop there.

Of course this is totally OT, so I won't post anymore of this here, but
I could discuss it off-list.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Trying Bayes / Redis

2015-12-11 Thread Ian Zimmerman
On 2015-12-11 14:29 -0800, Marc Perkel wrote:

> Anyone using this rule timing plugin? Having trouble getting it to
> work. Just wondering if it's worth it?
> 
> Mail::SpamAssassin::Plugin::RuleTimingRedis

I use it and I have no trouble now.  But I remember I had to disable the
Lua scripting stuff when I set it up; it wouldn't work even though my
Redis version should be recent enough to support it.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Is BAYES filtering working? Having doubts.

2015-12-29 Thread Ian Zimmerman
On 2015-12-29 20:41 -0500, Bill Cole wrote:

> Neither su nor sudo magically changes the permissions or ownership of
> files. If you pass filenames as arguments they must be readable by the
> user actually running sa-learn, which is the *unprivileged* user
> handling the system-wide BayesDB ("amavis" in the case originating
> this thread, but "spamd" and "defang" are other common ones...) In
> most reasonably well-secured systems using Maildir message stores, the
> Maildirs are all owned by individual users or by one user that handles
> delivery to "virtual users" understood by the MTA and IMAP or POP
> server but not by the OS. That is generally NOT the same user running
> spamd or content filters for a system-wide BayesDB. As a result,
> relearning has to be done as root, shuttling data from files owned by
> one user into a process running as another.

You are right.  The reason it works for me is that I don't use a
systemwide DB.

May I ask that you turn down the sarcasm a bit?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Is BAYES filtering working? Having doubts.

2015-12-29 Thread Ian Zimmerman
On 2015-12-29 19:44 -0500, Bill Cole wrote:

> On 29 Dec 2015, at 18:54, Ian Zimmerman wrote:
> 
> >In fact sa-learn accepts multiple named arguments on the command line,
> >so the alternative I use is to go through the spambox N files at a time
> >in a shell loop.  (I have N=100 but obviously this depends.)
> 
> Which successfully ignores the original issue of this thread completely:
> that the user sa-learn must run as cannot read the files being learnt. If
> you pass unreadable filenames as arguments, sa-learn just whines and fails.
> Shockingly, that is not the desired result.

Clearly you can do the su magic if needed.  The point is that the
overhead which you fear is reduced N times.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Is BAYES filtering working? Having doubts.

2015-12-29 Thread Ian Zimmerman
On 2015-12-29 17:50 -0500, Bill Cole wrote:

> Yes, with the advantage of using Mail::SpamAssassin::Util::secure_tmpfile()
> rather than whatever I happen to roll up in a bit of Q shell that I never
> get around to reviewing for edge cases...
> 
> The main reason to do something like that is to avoid the heavyweight sudo &
> load of a Perl script for each message.
> 
> >
> >>The alternative without formail would be to pipe each raw message into
> >>its own sa-learn.
> >
> >The alternative is to give it a directory.

In fact sa-learn accepts multiple named arguments on the command line,
so the alternative I use is to go through the spambox N files at a time
in a shell loop.  (I have N=100 but obviously this depends.)
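
For illustration, such a loop could be sketched as below. The function name and the `SA_LEARN` override are placeholders introduced here, and the sketch assumes a flat directory with one message per file.

```shell
#!/bin/sh
# Sketch only. SA_LEARN is a hypothetical override so the loop can be
# tested with a stub instead of the real sa-learn.
SA_LEARN=${SA_LEARN:-sa-learn}

learn_in_batches() {
    dir=$1
    n=${2:-100}
    set --                          # reuse "$@" as the current batch
    for f in "$dir"/*; do
        [ -f "$f" ] || continue     # skip subdirectories and empty globs
        set -- "$@" "$f"
        if [ "$#" -ge "$n" ]; then
            "$SA_LEARN" --spam --no-sync "$@" || return 1
            set --
        fi
    done
    # Leftover partial batch, if any.
    if [ "$#" -gt 0 ]; then
        "$SA_LEARN" --spam --no-sync "$@"
    fi
}
```

Each invocation then pays the Perl start-up cost once per batch rather than once per message. Because `--no-sync` is used, a separate `sa-learn --sync` is still needed afterwards.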

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Bayes expiry vs. sync, again

2016-03-15 Thread Ian Zimmerman
I am sorry to return to this horse which has perhaps been beaten
enough.  But I still don't know and don't understand (_after_ reading
the docs) if I can, at the same time:

1. completely disable expiry

2. force a sync of the journal

I just saw with my own eyes that passing --sync to sa-learn does _not_
necessarily force one.  (The manpage is ambiguous about it.)  But I
don't want to pass --force-expire because of 1.

I am asking in the context of using the default db backend for Bayes,
but if there is a way to do this with one of the other options, I'll
consider it.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Interesting rule combo results

2016-03-09 Thread Ian Zimmerman
On 2016-03-09 07:12 -0800, Marc Perkel wrote:

> >>HAM RULES:
> >>...
> >>   80056 HTML_MESSAGE
> >
> >What's happening here? This seems to imply that  HTML_MESSAGE only
> >appears in ham.
> >
> >
> 
> I think my results are a little strange in that I might not be
> training off all the data but just that which gets past all my other
> filters. I'm still working on this but thought I'd share what it came
> up with for better or worse.

If I take your explanation in the OP verbatim, what happens here is that
HTML_MESSAGE _without any other rule hits_ only appears in ham.  Which
seems entirely plausible, even if perhaps not very useful.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Disabling spamcop plugin

2016-04-07 Thread Ian Zimmerman
On 2016-04-07 14:37 +0100, RW wrote:

> What exactly are you trying to do here?
> 
> The pyzor plugin does testing and reporting, use_pyzor is mostly there
> to control the test. The spamcop plugin does reporting only.

So, if I don't do any explicit reporting (neither spamc -C nor
spamassassin -r), the spamcop plugin is not actually used at all?

sa-learn doesn't do any reporting, right?

My high-level goal here is to get rid of as many configuration changes
as I can in the system-managed area (/etc in my case) and achieve the
same effects by other means.  This is because I'm learning that I cannot
trust my distro not to screw me over anymore.

I noticed that I had disabled the spamcop plugin before by commenting it
out in /etc/*/init.pre, and I wanted to continue not using it even after
I reverted that file to its pristine distro state.

By the way, manpage for spamc says:

   -C report type, --reporttype=type
   Report or revoke a message to one of the configured
   collaborative filtering databases.
   The "report type" can be either report or revoke.

"To one of the databases"?  Which one?  Isn't this a bug in the manpage?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Disabling spamcop plugin

2016-04-06 Thread Ian Zimmerman
Is there any way to disable the spamcop plugin for an individual user
(i.e. from ~/.spamassassin/user_prefs) if the plugin is loaded by
/etc/spamassassin/*.pre ?

By comparison, I seem to be able to disable pyzor even if it is loaded,
by writing

  use_pyzor 0

in my user_prefs.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


[OT] still configuring [Was: Disabling spamcop plugin]

2016-04-12 Thread Ian Zimmerman
On 2016-04-12 10:57 -0400, David Niklas wrote:

> You could use Gentoo, you get to configure it all yourself!

Funny you'd say that, I _am_ actually switching to it - on my
"workstation" role computers.  I'm already over 50% over the hump, I
think. 

But on "server type" computers, I just cannot spare a dedicated security
branch.  I really don't have the time, and more importantly the nerves,
to scramble and recompile the world when each new vulnerability is
announced.

> You might also try Arch or Devuan.  What distro are you using now?

Debian.  Have been using it over 15 years now, and watched some of the
fun vanish over the last few.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: [OT] still configuring [Was: Disabling spamcop plugin]

2016-04-13 Thread Ian Zimmerman
On 2016-04-13 09:12 -0400, Michael Orlitzky wrote:

> package will be recompiled automatically as part of the updates. Any
> packages *depending on* that package (like, if they're statically linked
> to it) will also be recompiled.

But also _direct_ dependencies of the affected package, if the latest
version has new requirements.  And this is the heart of the problem.
With a dedicated security channel like debian has, the fixes are
recompiled targeted to the base release, so (for example) I'd never have
to update perl because of a fix in spamassassin.

In fact you can leave debian servers to update themselves unattended,
most of the time.  This is too huge a benefit for me to drop, even
weighed against the recent debian annoyances.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: sa-update through proxy

2016-05-04 Thread Ian Zimmerman
On 2016-05-04 08:13 -0700, John Hardin wrote:

> > alias sa-update='env http_proxy=http://myserver:myport/
> > https_proxy=http://myserver:myport/  sa-update'
> 
> Lose the "env"?

Why?  Apart from using an extra process, this should work exactly the same.
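
A quick way to see the equivalence, as a sketch: `proxy.example:3128` is a made-up address, and a child `sh` stands in for sa-update so that the environment it receives can be printed.

```shell
#!/bin/sh
# proxy.example:3128 is a made-up proxy address, for illustration only.
proxy=http://proxy.example:3128/

# Variant 1: through env(1), which costs one extra exec.
with_env() {
    env http_proxy="$proxy" sh -c 'printf "%s\n" "$http_proxy"'
}

# Variant 2: plain shell assignment prefix, no extra process.
without_env() {
    http_proxy="$proxy" sh -c 'printf "%s\n" "$http_proxy"'
}
```

Both functions print the same URL, since in either case the variable ends up in the environment of the child process only.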

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Reporting [Was: Disabling spamcop plugin]

2016-04-21 Thread Ian Zimmerman
On 2016-04-07 13:55 -0700, Ian Zimmerman wrote:

> sa-learn doesn't do any reporting, right?

[snip snip]

> By the way, manpage for spamc says:
> 
>-C report type, --reporttype=type
>Report or revoke a message to one of the configured
>collaborative filtering databases.
>The "report type" can be either report or revoke.
> 
> "To one of the databases"?  Which one?  Isn't this a bug in the manpage?

Unfortunately the thread went sideways into opinion territory after
this, but I'd still like to clarify these factual points.  Anyone?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Childish actions of Harald Reindl

2016-08-05 Thread Ian Zimmerman
On 2016-08-05 09:46 +0100, Martin wrote:

> The biggest reason is the way this mailing list is set up, when you
> click reply it replies to the poster not the list, this has always
> been a bug bare of mine and something that probably should be
> addressed.

Then don't "click reply" but use a proper mail user agent (like mutt,
but there are many others) that has a separate List Reply/Followup
function.

What "should be addressed" is the misconfigured mailing lists that mess
with sender-supplied headers.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Why does the arrow on Hillary signs point to the right?


Re: Issue on disable ipv6

2016-07-01 Thread Ian Zimmerman
On 2016-07-01 20:25 +0200, Massimo Sandolo wrote:

> Hi,
> I have an issue when try to disable ipv6.
> I'm running Debian 8.3 with SpamAssassin version 3.4.0 (running on Perl
> version 5.20.2).
> In /etc/default/spamassassin the options line is the following:
> OPTIONS="-4 --create-prefs --max-children 5 --helper-home-dir -x -u
> usermail"
> 
> I tried also with --ipv4-only, but it doesn't work, I'm still receiving the
> following error "spamc[22477]: connect to spamd on ::1 failed, retrying (#1
> of 3): Connection refused".

What is the line or lines containing "localhost" in /etc/hosts?  You'll
need to comment out the one with the IPv6 address (::1), and leave the
one with IPv4 address (127.0.0.1) uncommented.
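
To list the relevant lines, something like the following sketch would do. The function name is made up here; it only reads the file and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) lines of a hosts
# file that map some address to the exact name "localhost".
localhost_entries() {
    hosts_file=${1:-/etc/hosts}
    awk '!/^[[:space:]]*#/ {
        for (i = 2; i <= NF; i++)
            if ($i == "localhost") { print; break }
    }' "$hosts_file"
}
```

If the output includes a line starting with `::1`, that is the one to comment out.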

This is all assuming you run spamd and spamc on the same host.  If not,
please tell us about the network setup between the two hosts.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Why does the arrow on Hillary signs point to the right?


New type of monstrosity

2017-02-06 Thread Ian Zimmerman
In the last couple of weeks I saw some messages whose entire content is in the
Subject.  They have both a text/plain and text/html part but both are
empty (in the case of html, there is some markup but no character
data).  The Subject is maybe 400 or 500 chars long.

Needless to say, this is a 100% spam trait, but some escaped.

Is there already a rule somewhere to deal with this?  (not among the
ones bundled with SA, I don't think)

If I'm writing my own, is the naive way to match the Subject going to
work?  I'm asking mostly because the header is properly split and
continued around 60 character boundaries.  That is, does SA join
continued lines before matching?
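
Independently of what SA does internally, the length of the unfolded Subject can be checked by hand outside SA. The sketch below assumes a message with plain LF line endings on stdin; the function name is made up, and this says nothing about how SA itself matches headers.

```shell
#!/bin/sh
# Hypothetical helper: read one message on stdin and print the length of
# its Subject after joining continuation lines (leading whitespace kept).
subject_length() {
    awk '
        /^$/     { exit }                               # end of the header block
        /^[ \t]/ { if (in_subj) subj = subj $0; next }  # continuation line
                 { in_subj = 0 }                        # any new header field
        tolower($0) ~ /^subject:/ {
            in_subj = 1
            subj = $0
            sub(/^[^:]*:[ \t]*/, "", subj)              # drop the field name
        }
        END      { print length(subj) }
    '
}
```

Usage would be `subject_length < message`, which prints 0 when there is no Subject header at all.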

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: New type of monstrosity

2017-02-06 Thread Ian Zimmerman
On 2017-02-06 20:06, Kevin A. McGrail wrote:

> > Last couple of weeks I saw some messages whose entire contents is in
> > the Subject.

> never seen such a monster.  likely killed by some other piece in the
> puzzle.  Throw it up on pastebin?

http://pastebin.com/PYaMcZa7

(I was wrong, the subject is actually one enormous line, it was my MUA
that folded it.)

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: New type of monstrosity

2017-02-07 Thread Ian Zimmerman
On 2017-02-07 09:37, Matus UHLAR - fantomas wrote:

> 11.5 - 3.5 = 8.0

And of course 1.2.3.x is not the true relay address, so

> 1.5 BOTNET Relay might be a spambot or virusbot
> [botnet0.8,ip=1.2.3.12,rdns=disorder.censored.net,maildomain=outlook.fr,baddns]

this goes out of the window as well, and you're down to 6.5

> the op may be early recipient, which is why you've got PYZOR hit,
> while the OP had not.  If the OP doesn't use pyzor, I recommend to
> use it - using razor, pyzor and DCC is a very good idea although they
> need external software.

I used to have pyzor, but I dropped it for some reason I don't
remember.  It may be time to have another look at it.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: RFC compliance pedantry (was Re: New type of monstrosity)

2017-02-07 Thread Ian Zimmerman
On 2017-02-07 18:33, Ruga wrote:

> I follow the actual RFC standard, not the proposed revisions. The To
> From and Cc fields are defined by a grammar AND a natural language
> description. Such fields MUST hold addresses, where an address is a
> username the "@" symbol and a domain name. The string "undisclosed
> recipients: ;" does not parse the grammar, and it does not pass the
> natural language requirement for an address. If the sender hides the
> recipients, why should I care delivering its junk to my valued
> accounts?

FWIW, I regularly get completely legitimate non-commercial messages with
headers of this form.  People use it to conceal from each recipient the
addresses of other recipients - just like a list or an alias, but (I'm
guessing) done entirely in the sender's MUA.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: Ignore third-party SA headers

2017-01-25 Thread Ian Zimmerman
On 2017-01-26 01:03, RW wrote:

> Probably what's happening is that these are emails over 500 kB which
> by default are just passed through by spamc without sending them to
> spamd.  If they don't get sent to spamd the existing SA headers don't
> get stripped.
> 
> You can set the -s parameter on spamc to something larger than the
> largest spam you want to filter.

I have never been clear about this, in two ways.

The relevant bit of man spamc says:

 -s max_size, --max-size=max_size

 Set the maximum message size which will be sent to spamd -- any bigger
 than this threshold and the message will be returned unprocessed
 (default: 500 KB).  If spamc gets handed a message bigger than this, it
 won't be passed to spamd.  The maximum message size is 256 MB.

 The size is specified in bytes, as a positive integer greater than 0.
 For example, -s 50.

My first confusion is that even if there's a knob I can turn up on
spamc, there's a "maximum message size".  What does that mean?  Does
spamd have its own limit?  Is it really that high?  And what happens if
I break it?

Second, is the default 500 * 1000 bytes or 512 * 1024 bytes?  The
example seems to suggest the latter.
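
For reference, the two candidate values work out as in the sketch below, which also shows one way to sidestep the ambiguity by always passing an explicit byte count to `-s`. The helper names and `message.eml` are made up for illustration.

```shell
#!/bin/sh
# Print the two candidate readings of the default limit, in bytes.
default_candidates() {
    echo "500 * 1000 = $((500 * 1000))"
    echo "512 * 1024 = $((512 * 1024))"
}

# Hypothetical helper: convert MiB to the byte count that -s expects.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Usage (not run here): spamc -s "$(mib_to_bytes 2)" < message.eml
```

The two readings differ by 24288 bytes, which only matters for messages right at the threshold.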

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: Fastest listing RBL ?

2017-02-15 Thread Ian Zimmerman
On 2017-02-15 16:30, Tom Hendrikx wrote:

> Note that the period that you describe as 'seen by SA a bit later' is
> typically less than a second.

Not in my case.  I have a custom Exim configuration where I
intentionally wait for a period of time (currently 4 minutes) between
SMTP acceptance and delivery (SA runs at delivery time), precisely
because I want to give all the collaborative mechanisms the maximum
chance to kick in.

When I wrote my OP, 4 minutes was shorter than my BIND max-ncache-ttl
parameter.  I have since set that to 180 (3 minutes), so that angle
shouldn't matter any more.  Still the balance between bouncing the most
junk outright and the risk of false positives means it's something to
think about.

> Which RBLs to use, depends on the typical spam you receive, and the
> policies that you wish to apply. IMHO, the trust you put in RBLs (and
> their listing policies) should be more important in making decisions
> than their typical response time to new (types of) spam and their
> TTLs.

Agreed.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html

