Re: NYTimes hitting Bayes_99?

2015-02-13 Thread Reindl Harald


On 13.02.2015 at 03:49, LuKreme wrote:

> Yeah, in my own email NYT hits bayes_00.
>
> I just switched to using spamass-milter:
> /usr/local/sbin/spamass-milter -f -p /var/run/spamass-milter.sock -u spamd -r 9 -- -s 5242880
> And it occurs to me that maybe it is not picking up bayes properly.
> Should I train bayes as the spamd user?


Look in your logs: the sa-milt in the message below is the user that both
spamass-milter and spamd run as here, both on high ports.


With the milter you in fact have a site-wide bayes in the .spamassassin
folder of the milter user, and hence you need to train *that* bayes.
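
As a rough sketch of what training that bayes could look like: the helper below is hypothetical. It assumes the milter user is called sa-milt as in the log line in this message, that sudo is available, and that the sample mail sits in a directory of your choosing. The only SpamAssassin pieces it relies on are sa-learn's --ham and --spam switches.

```shell
# Hypothetical helper: run sa-learn as the user whose ~/.spamassassin
# holds the Bayes database, so the learned tokens land in the right place.
# Usage: train_as USER ham|spam PATH
train_as() {
    user=$1 kind=$2 src=$3
    case $kind in
        ham|spam) ;;
        *) echo "usage: train_as USER ham|spam PATH" >&2; return 2 ;;
    esac
    # -H makes sudo set HOME to the target user's home directory.
    set -- sudo -H -u "$user" sa-learn "--$kind" "$src"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"      # only show the command that would run
    else
        "$@"
    fi
}

# Example (the path is a placeholder):
#   DRY_RUN=1 train_as sa-milt ham /path/to/nytimes-samples
#   train_as sa-milt ham /path/to/nytimes-samples
```

Running it with DRY_RUN=1 first shows the exact command before anything is learned.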


Feb 13 09:44:43 mail-gw spamd[15338]: spamd: clean message (-4.0/5.5) for sa-milt:189 in 0.3 seconds, 5195 bytes.
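
To check which user your own spamd scans for, something like the following may help. It is only a sketch: it assumes result lines shaped like the one above ("clean message" or "identified spam", followed by "for USER:UID"), each on a single line, and the log path in the usage comment is a placeholder.

```shell
# Print the user name from spamd result lines read on stdin, e.g.
#   "... spamd: clean message (-4.0/5.5) for sa-milt:189 in 0.3 seconds, ..."
spamd_users() {
    sed -nE 's/.*spamd: (clean message|identified spam) \([^)]*\) for ([^:]+):[0-9]+ in .*/\2/p'
}

# Example (the log path is a placeholder):
#   spamd_users < /var/log/maillog | sort | uniq -c
```

Whatever name this prints is the user whose bayes gets consulted, and so the one to train as.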






Re: NYTimes hitting Bayes_99?

2015-02-12 Thread David B Funk

On Thu, 12 Feb 2015, LuKreme wrote:


> An email from the New York Times daily headlines service is hitting Bayes_99
> and Bayes_999
>
> pts  rule name          description
> ---- ------------------ --------------------------------------------------
>  4.0 BAYES_99           BODY: Bayes spam probability is 99 to 100%
>                         [score: 1.]
>  0.2 BAYES_999          BODY: Bayes spam probability is 99.9 to 100%
>                         [score: 1.]
>  0.7 MIME_HTML_ONLY     BODY: Message only has text/html MIME parts
>  0.0 HTML_MESSAGE       BODY: HTML included in message
> -0.1 DKIM_VERIFIED      No description available.
> -0.1 DKIM_VALID_AU      Message has a valid DKIM or DK signature from author's
>                         domain
>  0.1 DKIM_SIGNED        Message has a DKIM or DK signature, not necessarily valid
>  3.0 DCC_CHECK          Detected as bulk mail by DCC (dcc-servers.net)
> -0.1 DKIM_VALID         Message has at least one valid DKIM or DK signature
>  0.0 UNPARSEABLE_RELAY  Informational: message has unparseable relay lines
>  0.5 MISSING_MID        Missing Message-Id: header
>
> I’m curious about the two bayes hits and also the 3 points for bulk mail for
> something that I can’t see anyone would consider to be actual spam. Oh, and why
> is Bayes_999 so low scoring?


Where'd you get that score of 3.0 for DCC_CHECK, mine is 1.1. DCC is a bulk mail
detection service, not spam detection.

Those BAYES_99 & BAYES_999 hits for a bulk-but-solicited mail really say
mis-trained Bayes.
For New York Times subscriptions my users usually hit either BAYES_00 or 
BAYES_05.


That BAYES_999 is an addition to BAYES_99, thus the small score. It's more
intended to be used as meta fodder (or re-scored based on your trust of
your Bayes).
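
For illustration, both options could look roughly like this in local.cf. The shell function below only writes an example file. The rule name LOCAL_BAYES999_DCC and every score in it are made up for this sketch; they are not recommendations from this thread or values from the stock rules.

```shell
# Write an example override file; review it, then merge it into local.cf by hand.
# Usage: write_bayes999_example FILE
write_bayes999_example() {
    cat > "$1" <<'EOF'
# Option 1: re-score BAYES_999 if you trust your Bayes (value is illustrative)
score BAYES_999 1.0

# Option 2: use BAYES_999 as meta fodder (rule name and score are hypothetical)
meta     LOCAL_BAYES999_DCC (BAYES_999 && DCC_CHECK)
describe LOCAL_BAYES999_DCC Bayes 99.9 to 100% and listed as bulk by DCC
score    LOCAL_BAYES999_DCC 0.5
EOF
}

# Example:
#   write_bayes999_example /tmp/bayes999-example.cf
#   spamassassin --lint    # after merging, checks that the config parses
```

Either way, the extra points only make sense once the Bayes database itself is trained correctly.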


--
Dave Funk                                  University of Iowa
dbfunk (at) engineering.uiowa.edu          College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

NYTimes hitting Bayes_99?

2015-02-12 Thread LuKreme
An email from the New York Times daily headlines service is hitting Bayes_99
and Bayes_999

pts  rule name          description
---- ------------------ --------------------------------------------------
 4.0 BAYES_99           BODY: Bayes spam probability is 99 to 100%
                        [score: 1.]
 0.2 BAYES_999          BODY: Bayes spam probability is 99.9 to 100%
                        [score: 1.]
 0.7 MIME_HTML_ONLY     BODY: Message only has text/html MIME parts
 0.0 HTML_MESSAGE       BODY: HTML included in message
-0.1 DKIM_VERIFIED      No description available.
-0.1 DKIM_VALID_AU      Message has a valid DKIM or DK signature from author's
                        domain
 0.1 DKIM_SIGNED        Message has a DKIM or DK signature, not necessarily valid
 3.0 DCC_CHECK          Detected as bulk mail by DCC (dcc-servers.net)
-0.1 DKIM_VALID         Message has at least one valid DKIM or DK signature
 0.0 UNPARSEABLE_RELAY  Informational: message has unparseable relay lines
 0.5 MISSING_MID        Missing Message-Id: header

I’m curious about the two bayes hits and also the 3 points for bulk mail for
something that I can’t see anyone would consider to be actual spam. Oh, and why
is Bayes_999 so low scoring?
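
For reference, the points in the report above add up to 8.2, of which 4.2 come from the two Bayes rules and 3.0 from DCC_CHECK. A small sketch for summing such a report follows; it assumes each rule line starts with its score followed by the rule name, and that continuation lines do not start with a number. The function name is made up for the example.

```shell
# Sum the "pts" column of a SpamAssassin report read on stdin.
# Lines whose first field is not a decimal score (headers, wrapped
# descriptions, "[score: ...]" lines) are ignored.
sa_report_total() {
    awk '$1 ~ /^-?[0-9]+\.[0-9]+$/ && NF >= 2 { total += $1 }
         END { printf "%.1f\n", total }'
}

# Example (report.txt is a placeholder holding just the report lines):
#   sa_report_total < report.txt
```

Leaving a rule's line out of the input is a quick way to see what the message would have scored without it.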

Here are the headers:

X-Envelope-From: bou...@ms3.lga2.nytimes.com
X-Envelope-To: *munged*
Received: from pmta01.sea1.nytimes.com (unknown)
    by mail.covisp.net(Postfix 2.11.3/8.13.0) with SMTP id unknown;
    Thu, 05 Feb 2015 02:49:50 -0700
    (envelope-from bou...@ms3.lga2.nytimes.com)
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; s=paperboy-1024;
 d=nytimes.com;
 h=From:Reply-To:Date:To:Subject:List-Unsubscribe:Content-Type:Content-Transfer-Encoding:Mime-version;
 i=nytdir...@nytimes.com;
 bh=QBBvEngh4H4VJh+esN1V9ZXrEvM=;
 b=nEM/BXRsjVQS6eg8IbBlkoGyDkkvdum/HTeAHs23BWniftrODk69nY1G7aD/hyiSZ8Mt1mfugICd
   46Eo90oUmNPbl+PZG7gWQgJBu3Gzpy81GXM/WP/IiUe+rJAu3niemR2PLCHbAgB89JsfmuEM5cz4
   MvOqLffdWt61lyniYcA=
Received: by pmta01.sea1.nytimes.com (PowerMTA(TM) v3.5r3) id hqcubs0hstka for
    *munged*; Thu, 5 Feb 2015 04:48:51 -0500
    (envelope-from bou...@ms3.lga2.nytimes.com)
X-SegmentId: 68668
X-CampaignId: 129
X-InstanceId: 53489
X-ClientId: 34527544
From: NYTimes.com <nytdir...@nytimes.com>
Reply-To: nytdir...@nytimes.com
Date: Thu, 05 Feb 2015 04:48:51 -0500
To: *munged*
X-job: TH-20150205
Subject: Today's Headlines: Claims Against Saudis Cast New Light on Secret
    Pages of 9/11 Report
List-Unsubscribe:
    <mailto:nyt_unsubscr...@lga2.nytimes.com?subject=http://www.nytimes.com/gst/unsub.html?email=*munged*&id=34527544&segment=68668&group=nl&product=TH>,
    <http://www.nytimes.com/gst/unsub.html?email=*munged*&id=34527544&segment=68668&group=nl&product=TH>
Content-Type: text/html; charset=utf-8; 
Content-Transfer-Encoding: quoted-printable
Mime-version: 1.0


-- 
'Listen,' said Rincewind. 'It's all over, do you see? You can't put the
spells back in the book, you can't unsay what's been said, you can't-'
'You can try!' --The Light Fantastic



Re: NYTimes hitting Bayes_99?

2015-02-12 Thread LuKreme

> On 12 Feb 2015, at 19:05, David B Funk <dbf...@engineering.uiowa.edu> wrote:
>
> On Thu, 12 Feb 2015, LuKreme wrote:
>
>> An email from the New York Times daily headlines service is hitting Bayes_99
>> and Bayes_999
>>
>> pts  rule name          description
>> ---- ------------------ --------------------------------------------------
>>  4.0 BAYES_99           BODY: Bayes spam probability is 99 to 100%
>>                         [score: 1.]
>>  0.2 BAYES_999          BODY: Bayes spam probability is 99.9 to 100%
>>                         [score: 1.]
>>  0.7 MIME_HTML_ONLY     BODY: Message only has text/html MIME parts
>>  0.0 HTML_MESSAGE       BODY: HTML included in message
>> -0.1 DKIM_VERIFIED      No description available.
>> -0.1 DKIM_VALID_AU      Message has a valid DKIM or DK signature from author's
>>                         domain
>>  0.1 DKIM_SIGNED        Message has a DKIM or DK signature, not necessarily valid
>>  3.0 DCC_CHECK          Detected as bulk mail by DCC (dcc-servers.net)
>> -0.1 DKIM_VALID         Message has at least one valid DKIM or DK signature
>>  0.0 UNPARSEABLE_RELAY  Informational: message has unparseable relay lines
>>  0.5 MISSING_MID        Missing Message-Id: header
>>
>> I’m curious about the two bayes hits and also the 3 points for bulk mail for
>> something that I can’t see anyone would consider to be actual spam. Oh, and
>> why is Bayes_999 so low scoring?
>
> Where'd you get that score of 3.0 for DCC_CHECK, mine is 1.1. DCC is a bulk
> mail detection service, not spam detection.

Probably in local.cf then. I’ve commented out all the score adjustments in 
there for right now.

> Those BAYES_99 & BAYES_999 hits for a bulk-but-solicited mail really say
> mis-trained Bayes.
> For New York Times subscriptions my users usually hit either BAYES_00 or
> BAYES_05.

Yeah, in my own email NYT hits bayes_00.

I just switched to using spamass-milter:

/usr/local/sbin/spamass-milter -f -p /var/run/spamass-milter.sock -u spamd -r 9 -- -s 5242880

And it occurs to me that maybe it is not picking up bayes properly.

Should I train bayes as the spamd user?

use_bayes 1
bayes_auto_learn 1
bayes_store_module Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn DBI:mysql:bayes:localhost:3306
bayes_sql_username user
bayes_sql_password *pass*
bayes_sql_override_username user

> That BAYES_999 is an addition to BAYES_99, thus the small score. It's more
> intended to be used as meta fodder (or re-scored based on your trust of
> your Bayes).

OK, that makes sense.

When I make changes to local.cf, do I need to restart SA, or does it reload that
file if it sees it’s changed?

-- 
Any man who says he can see through women is really missing a lot. -
Groucho Marx