Re: Spamassassin not capturing obvious Spam

2016-05-30 Thread @lbutlr
On May 30, 2016, at 11:06 PM, Shivram Krishnan  wrote:
> 2) I have set a threshold of -10 to see how spamassassin assigns a score for 
> every mail. 

No. Do not do this.

-- 
When the routine bites hard / and ambitions are low And the resentment
rides high / but emotions won't grow And we're changing our ways, /
taking different roads Then love, love will tear us apart again



Re: Spamassassin not capturing obvious Spam

2016-05-30 Thread Shivram Krishnan
1) The message is indeed fabricated. I had to generate an RFC 2822 message
from JSON, because I am harvesting spam from mailinator.com (public
mailboxes). The missing IP addresses are an error in my RFC 2822 generation.
I did not change it, as spamassassin did not assign a score.

2) I have set a threshold of -10 to see how spamassassin assigns a score
for every mail.



On Mon, May 30, 2016 at 8:25 PM, Dave Funk 
wrote:

> That message is either a fabrication or something from a messed up system.
> There's no sign of an IP address (neither IPv4 nor IPv6) in it.
>
> There are two identical 'Received:' headers which have '()' where
> there should be at least the IP address of the incoming connection.
>
> This indicates that the message has either been tampered with or is from a
> postfix system that somebody has messed up the configuration.
>
>
>
> On Mon, 30 May 2016, Shivram Krishnan wrote:
>
>> Hey guys,
>>
>> I am testing spamassassin on a SPAM/HAM corpus of mails. Spamassassin is
>> not picking up an obvious
>> spam like in this case http://pastebin.com/MbNRNFWy .
>>
>> I have followed the guidelines on
>> https://wiki.apache.org/spamassassin/ImproveAccuracy .
>>
>> Let me know how to catch these type of Spams. It would be interesting to
>> know what your spamassassin
>> assigns the score for this spam.
>>
>> spamassassin assigned this score -
>>
>> Content analysis details:   (3.9 points, -10.0 required)
>>
>>  pts rule name              description
>> ---- ---------------------- --------------------------------------------------
>>  0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
>>                             [score: 0.4292]
>>  0.0 HTML_MESSAGE           BODY: HTML included in message
>>  0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
>>  0.4 HTML_MIME_NO_HTML_TAG  HTML-only message, but there is no HTML tag
>>  0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines
>>  2.0 XPRIO                  Has X-Priority header
>>
>>
>>
>> Notice that none of the  other body tags are triggered.
>>
>> Thanks,
>>
>> Shivram
>>
>>
>>
> --
> Dave Funk  University of Iowa
> College of Engineering
> 319/335-5751   FAX: 319/384-0549   1256 Seamans Center
> Sys_admin/Postmaster/cell_adminIowa City, IA 52242-1527
> #include 
> Better is not better, 'standard' is better. B{


Re: Spamassassin not capturing obvious Spam

2016-05-30 Thread Dave Funk

That message is either a fabrication or something from a messed up system.
There's no sign of an IP address (neither IPv4 nor IPv6) in it.

There are two identical 'Received:' headers which have '()' where
there should be at least the IP address of the incoming connection.

This indicates that the message has either been tampered with or is from a
postfix system whose configuration somebody has messed up.



On Mon, 30 May 2016, Shivram Krishnan wrote:


> Hey guys,
>
> I am testing spamassassin on a SPAM/HAM corpus of mails. Spamassassin is
> not picking up an obvious spam like in this case
> http://pastebin.com/MbNRNFWy .
>
> I have followed the guidelines on
> https://wiki.apache.org/spamassassin/ImproveAccuracy .
>
> Let me know how to catch these type of Spams. It would be interesting to
> know what your spamassassin assigns the score for this spam.
>
> spamassassin assigned this score -
>
> Content analysis details:   (3.9 points, -10.0 required)
>
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
>  0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
>                             [score: 0.4292]
>  0.0 HTML_MESSAGE           BODY: HTML included in message
>  0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
>  0.4 HTML_MIME_NO_HTML_TAG  HTML-only message, but there is no HTML tag
>  0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines
>  2.0 XPRIO                  Has X-Priority header
>
> Notice that none of the other body tags are triggered.
>
> Thanks,
>
> Shivram




--
Dave Funk  University of Iowa
College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_adminIowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{

Re: Spamassassin not capturing obvious Spam

2016-05-30 Thread LuKreme
On May 30, 2016, at 20:24, Shivram Krishnan  wrote:
> I have followed the guidelines on 
> https://wiki.apache.org/spamassassin/ImproveAccuracy .

No, you really haven't.

> Content analysis details:   (3.9 points, -10.0 required)

This makes no sense at all. Either you have set the spam scores negative, which 
makes no sense, or you have set it to 10, which makes no sense.

Train more spam and don't muck with the levels.
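
For reference, the arithmetic behind the quoted report can be sketched as
below. This is only an illustration: the rule scores are copied from the
report, the 5.0 threshold is the stock SpamAssassin default for
required_score, and the helper name is_spam is made up for the example.

```python
# Rule hits copied from the "Content analysis details" report in this thread.
hits = {
    "BAYES_50": 0.8,
    "HTML_MESSAGE": 0.0,
    "MIME_HTML_ONLY": 0.7,
    "HTML_MIME_NO_HTML_TAG": 0.4,
    "UNPARSEABLE_RELAY": 0.0,
    "XPRIO": 2.0,
}


def is_spam(total, required):
    """A message is tagged as spam when its score reaches the threshold."""
    return total >= required


# Round to one decimal to hide floating-point noise in the sum.
total = round(sum(hits.values()), 1)

# With required = -10.0 every message is tagged, so the flag says nothing;
# with the default of 5.0 this message stays below the threshold.
print(total, is_spam(total, -10.0), is_spam(total, 5.0))
```

The score itself does not depend on the threshold, so there is no need to
lower the threshold just to see what each message scores.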



Re: Spamassassin not capturing obvious Spam

2016-05-30 Thread Rob McEwen

On 5/30/2016 10:24 PM, Shivram Krishnan wrote:

> I am testing spamassassin on a SPAM/HAM corpus of mails. Spamassassin is
> not picking up an obvious spam like in this case
> http://pastebin.com/MbNRNFWy .


Your pastebin example didn't show the "last external" sending IP. Could
it have been there originally, but was expunged from this sample? Could
there have also been a link in the body of the message that was likewise
removed?


it would be nice to be able to check those against respected low-FP DNSBLs.
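
A rough sketch of what such a check involves, assuming the relay IPs have
already been pulled out of the Received headers (newest hop first). The
function names, the internal network and the sample addresses are all
hypothetical; only the reversed-octet query name is the standard DNSBL
convention, and no DNS lookup is actually performed here.

```python
import ipaddress


def last_external(relay_ips, internal_networks):
    """Return the first hop (newest first) that is outside our own networks."""
    nets = [ipaddress.ip_network(n) for n in internal_networks]
    for ip in relay_ips:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in nets):
            return ip
    return None


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets plus the zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4")
    return ".".join(reversed(str(addr).split("."))) + "." + zone


# Hypothetical hops, newest first; 10.0.0.0/8 stands in for our own network.
relays = ["10.0.0.5", "192.0.2.44", "198.51.100.7"]
external = last_external(relays, ["10.0.0.0/8"])
query = dnsbl_query_name(external, "zen.spamhaus.org")
print(external, query)
```

Without any IP in the Received headers, as in the pastebin sample, there is
nothing to feed into the first function.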

Or, if the clickable link really wasn't in the original message, then 
this particular example was probably a rare malfunctioned spam that will 
be of no benefit to the spammer, and would then probably not be worth 
investigating since the spammer then has no incentive to keep sending 
these types.


--
Rob McEwen




Spamassassin not capturing obvious Spam

2016-05-30 Thread Shivram Krishnan
Hey guys,

I am testing spamassassin on a SPAM/HAM corpus of mails. Spamassassin is
not picking up an obvious spam like in this case
http://pastebin.com/MbNRNFWy .

I have followed the guidelines on
https://wiki.apache.org/spamassassin/ImproveAccuracy .

Let me know how to catch these type of Spams. It would be interesting to
know what your spamassassin assigns the score for this spam.

spamassassin assigned this score -

Content analysis details:   (3.9 points, -10.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                            [score: 0.4292]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.4 HTML_MIME_NO_HTML_TAG  HTML-only message, but there is no HTML tag
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines
 2.0 XPRIO                  Has X-Priority header



Notice that none of the  other body tags are triggered.

Thanks,

Shivram


Re: Multiple RBLs and dynamic IPs

2016-05-30 Thread Bill Cole

On 30 May 2016, at 15:07, Alex wrote:


> Yeah, that's it exactly. Particularly overseas where it doesn't appear
> NAT and/or submission are used as readily as they are here.


Irrelevant in this case because if you trust that header not to be an 
intentionally deceptive lie, the receiving server claims the mail was 
received with authentication, making it very unlikely that the message 
is spam. That is what "with esmtpa" in an Exim Received header means, 
and your other rule hits indicate that you trust 116.251.209.92 
(vio1.naveca.biz) so I don't quite get why this didn't also hit 
"ALL_TRUSTED" and why SA is doing DNSBL checks on the authenticated 
client of a trusted host.
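
As a small illustration of reading that keyword, here is a sketch that pulls
the "with" protocol out of a Received header and checks it against the
authenticated transmission types registered by RFC 3848. The header below is
invented for the example (documentation addresses and hostnames), and the
function names are hypothetical; this is not how SA itself parses relays.

```python
import re

# "with" keywords from RFC 3848 that indicate SMTP AUTH was used.
AUTHENTICATED_TYPES = {"esmtpa", "esmtpsa", "lmtpa", "lmtpsa"}


def received_protocol(header):
    """Return the lowercased protocol keyword after 'with', or None."""
    match = re.search(r"\bwith\s+([A-Za-z0-9]+)", header)
    return match.group(1).lower() if match else None


def claims_authentication(header):
    """True when the receiving server says the client authenticated."""
    return received_protocol(header) in AUTHENTICATED_TYPES


# Made-up Exim-style header, for illustration only.
sample = (
    "Received: from [203.0.113.9] (helo=client.example.org) "
    "by mail.example.net with esmtpa (Exim 4.86) "
    "id 1b7abc-0001Xy-2Z; Mon, 30 May 2016 15:00:00 +0000"
)
print(received_protocol(sample), claims_authentication(sample))
```

The claim is only as good as the server that wrote the header, which is why
it matters whether that host is in your trusted networks.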


And in ANY case, getting *a customer* to use port 587 submission with 
authentication over an encrypted channel directly to your server instead 
of trusting an intermediate machine that maybe should not be trusted 
should not be hard. Even shoddy PHP mailing scripts these days can 
handle it. If you are nominally selling any sort of email service to 
that customer and not requiring them to submit through your server to be 
treated as a trusted customer, you're making a mistake.



> So even though that IP is on virtually every blacklist, you wouldn't
> add any points? And there's nothing further the user could do to fix
> the problem, given the dynamic nature of the IP?


I think there's a more complex problem in this case that is not evident 
in a single Received header and list of SA hits.


Note that the IP you are worried about was, at the time you scanned its
output, and still is today, either a badly compromised system itself or a
shared NAT address with one or more compromised systems behind it. Either
way, it is an ongoing source of spam of the worst sorts to the outside
world. It isn't listed because it's a dynamic IP; it's listed because it's
an active, ongoing spamming IP.


(and to answer the original question: I don't trust other people's mail 
servers to tell me the truth about where they get mail, so my SA 
instances don't ever hit those rules. However,  I would NEVER make a 
mailspike 'none' listing contribute to anything at all, even as a meta 
rule. LOC_MULTI_RBL seems like a bad idea, whatever it is...)


Re: SA Concepts - plugin for email semantics

2016-05-30 Thread Bill Cole

On 30 May 2016, at 18:25, Dianne Skoll wrote:


> On Mon, 30 May 2016 17:45:52 -0400
> "Bill Cole"  wrote:
>
>> So you could have 'sex' and 'meds' and 'watches' tallied up into
>> frequency counts that sum up natural (word) and synthetic (concept)
>> occurrences, not just as incompatible types of input feature but as
>> a conflation of incompatible features.
>
> That is easy to patch by giving "concepts" a separate namespace.  You
> could do that by picking a character that can't be in a normal token and
> using something like:  concept*meds, concept*sex, etc. as tokens.


Yes, but I'd still be reluctant to have that namespace directly blended 
with 1-word Bayes because those "concepts" are qualitatively different: 
inherently much more complex in their measurement than words. Robotic 
semantic analysis hasn't reached the point where an unremarkable machine 
can decide whether a message is porn or a discussion of current 
political issues, and I would not hazard a guess as to which actual 
concept in email is more likely to be spam or ham these days. Any old 
mail server can of course tell whether the word 'Carolina' is present in 
a message, which probably distributes quite disproportionately towards 
ham.



>> FWIW, I have roughly no free time for anything between work and
>> family demands but if I did, I would most like to build a blind
>> fixed-length tokenization Bayes classifier: just slice up a message
>> into all of its n-byte sequences (so that a message of bytelength x
>> would have x-(n-1) different tokens) and use those as inputs instead
>> of words.
>
> I think that could be very effective with (as you said) plenty of
> training.  I think there *may* be slight justification for
> canonicalizing text parts into utf-8 first; while you are losing
> information, it's hard to see how 手机色情 should be treated
> differently depending on the character encoding.


Well, I've not thought it through deeply, but an evasion of the charset 
issue might be to just decode any Base64 or QP transfer encoding (which 
can be path-dependent rather than a function of the sender or content) 
to get 8-bit bytes and use 6-byte tokens as if it was all 1-byte chars. 
UCS-4 messages would be a wreck, but pairs of non-ASCII chars in UTF-8 
would be seen cleanly once and as an aura of 10 semi-junk tokens around 
them, in a manner that might effectively wash itself out. Or go to 
12-byte tokens and get the same effect with UCS-4. Or 3-byte tokens: 
screw 32-bit charsets, screw encoding semantics of UTF-8, just have 16.8 
million possible 24-bit tokens and see how they distribute. It seems to 
me that this is almost the ultimate test for Naive Bayes text analysis: 
break away from the idea that the input features have any innate meaning 
at all, let them be pure proxies for whatever complex larger patterns 
give rise to them.
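
To make the "seen cleanly once plus an aura of 10" claim concrete, here is a
quick sketch. It undoes the transfer encoding with the standard base64 and
quopri modules and then slices the raw bytes into 6-byte tokens. The function
names and the padded test string are made up for the illustration; nothing
here is SA code.

```python
import base64
import quopri


def decode_transfer(payload, cte):
    """Undo Base64 or quoted-printable; pass anything else through as-is."""
    cte = cte.lower()
    if cte == "base64":
        return base64.b64decode(payload)
    if cte == "quoted-printable":
        return quopri.decodestring(payload)
    return payload


def byte_tokens(data, n):
    """All overlapping n-byte slices of data, charset-blind."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


# Two CJK characters are 3 bytes each in UTF-8, so the pair is 6 bytes.
pair = "手机".encode("utf-8")
raw = b"AAAAA" + pair + b"BBBBB"

tokens = byte_tokens(decode_transfer(base64.b64encode(raw), "base64"), 6)
exact = [t for t in tokens if t == pair]
partial = [t for t in tokens if t != pair and any(b in pair for b in t)]

print(len(tokens), len(exact), len(partial))
print(2 ** 24)  # number of possible 3-byte (24-bit) tokens
```

The pair shows up exactly once as a whole token, with five partly
overlapping tokens on each side of it, which is the aura described above.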


Oh, and did I mention that Bayes' Theorem has different 
"interpretations" in the same way Heisenberg's Uncertainty Principle and 
quantum superposition do? 24-bit tokens could settle the dispute...


Re: Multiple RBLs and dynamic IPs

2016-05-30 Thread Reindl Harald



On 31.05.2016 at 00:59, Reindl Harald wrote:
> On 31.05.2016 at 00:57, Reindl Harald wrote:
>> On 31.05.2016 at 00:49, Alex wrote:
>>> Hi,
>>>
>>>>> So I created the RCVD_IN_XBL_ALL "deep header" rule and have since
>>>>> reduced its score. However, there's still RCVD_IN_BL_SPAMCOP_NET as
>>>>> part of the default ruleset, which I could of course change, but it's
>>>>> scored 1.3 by default for that same "deep header" IP address.
>>>>>
>>>>> Does that rule deserve some attention to determine whether it should
>>>>> also be reduced by default for the same reason as the SBL/XBL rule?
>>>>
>>>> DUNNO - we disabled all internal RBL's (except mailspike) from start
>>>> because we feed postscreen and spamassassin from the same webinterface
>>>> with different scores for both but same lists (and some of them are
>>>> mirrored on the local rbldnsd with different names in the own domain)
>>>
>>> So then what were all those RBLs you listed initially with their
>>> weights? bl.spamcop.net was among them...
>>
>> can't say initially - maintained starting summer 2014 - current state
>>
>> don't use anything ending with "thelounge.net": our public nameservers
>> always answer with 127.0.0.2 to stop users who blindly copy the list,
>> because they have no access to those zones and there was a lot of
>> useless response-rate-limiting
>>
>> in case of mirrored zones the alias contains the real list
>>
>> hopefully that gets displayed somehow usable in the mail
>
> did not - attached as textfile this time


below are some numbers from the current month showing why postscreen in
front is that important for performance (at the moment 250 MHz CPU usage on
the virtual machine, with 2% for journald/rsyslog writing the maillog),
while the 434722 dnsbl-rejects are only a small part of the game


the "Hangup: 665860" did not wait for the result at all and closed 
connection because "postscreen_greet_wait = ${stress?2}${stress:12}s"
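
For anyone not used to that syntax: in Postfix, ${name?value} expands to
value when the parameter is non-empty and ${name:value} expands to value
when it is empty, so the setting above gives 12s normally and 2s under
stress. A tiny simulation of just these two forms (the expand function is
made up for the illustration and is not Postfix code):

```python
import re


def expand(template, params):
    """Expand ${name?value} and ${name:value} the way described above."""
    def substitute(match):
        name, operator, value = match.groups()
        is_set = bool(params.get(name, ""))
        if operator == "?":
            # ${name?value}: value only when the parameter is non-empty
            return value if is_set else ""
        # ${name:value}: value only when the parameter is empty
        return "" if is_set else value

    return re.sub(r"\$\{(\w+)([?:])([^}]*)\}", substitute, template)


SETTING = "${stress?2}${stress:12}s"
normal = expand(SETTING, {"stress": ""})
overload = expand(SETTING, {"stress": "yes"})
print(normal, overload)
```

A client that hangs up before the greet wait is over never gets as far as
the dnsbl score, which is why it is counted separately.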


70% of all that crap is from the last 7 days where numbers started to
explode, on the inbound-mx as well as on our honeypot network, which has
blacklisted currently 5 ip's while normally 15000-2 and lists at
the moment 21161 blacklistings refreshed within the last 24 hours


BAYES_00       27216   73.67 %
BAYES_05         804    2.17 %
BAYES_20        1067    2.88 %
BAYES_40         901    2.43 %
BAYES_50        3110    8.41 %
BAYES_60         358    0.96 %    8.91 % (OF TOTAL BLOCKED)
BAYES_80         347    0.93 %    8.64 % (OF TOTAL BLOCKED)
BAYES_95         293    0.79 %    7.29 % (OF TOTAL BLOCKED)
BAYES_99        2845    7.70 %   70.84 % (OF TOTAL BLOCKED)
BAYES_999       2503    6.77 %   62.32 % (OF TOTAL BLOCKED)

DNSWL          52213   94.10 %
SPF            36458   65.70 %
SPF/DKIM WL    16232   29.25 %
SHORTCIRCUIT   18515   33.36 %

BLOCKED         4016    7.23 %
SPAMMY          3843    6.92 %   95.69 % (OF TOTAL BLOCKED)

spamhaus.org             321543
sorbs.net                 60687
inps.de                   35828
barracudacentral.org       9023
thelounge.net              5255
junkemailfilter.com         939
psbl.org                    437
manitu.net                  380
senderscore.com             234
mailspike.net               217
spamcannibal.org            102
spamcop.net                  70
swinog.ch                     7
===============================
Total DNSBL rejections:  434722
_______________________________

Connections:        806720
Postscreen WL:       29636 (3.67 %)
Delivered:           52751
Blocked:            753969
Invalid User:         7288
Disallowed User:        12
Reject Postscreen:  438583
Reject Postfix:      15419
Reject Milter:        4201
Reject Temporary:     1266
Greylisted:           1464
Blacklist:          436079
Pregreet:            43449
Hangup:             665860
Protocol Error:       1247
Illegal Syntax:          7
SpamAssassin:         4016
Virus (Milter):        180
Virus (SA):             97
Helo:                 1644
Subject:               248
From:                   65
Attachment:             62
Header Length:          22
Sender Regex:           90
Sender Blocked:        237
Sender Verify:         168
Sender Invalid:       1460
Sender Spoofed:         96
Sender Parked:          13
Spam-TLD:              328
PTR Missing:           297
PTR Generic:           499
SPF:                   494




signature.asc
Description: OpenPGP digital signature


Re: Multiple RBLs and dynamic IPs

2016-05-30 Thread Reindl Harald



On 31.05.2016 at 00:57, Reindl Harald wrote:
> On 31.05.2016 at 00:49, Alex wrote:
>> Hi,
>>
>>>> So I created the RCVD_IN_XBL_ALL "deep header" rule and have since
>>>> reduced its score. However, there's still RCVD_IN_BL_SPAMCOP_NET as
>>>> part of the default ruleset, which I could of course change, but it's
>>>> scored 1.3 by default for that same "deep header" IP address.
>>>>
>>>> Does that rule deserve some attention to determine whether it should
>>>> also be reduced by default for the same reason as the SBL/XBL rule?
>>>
>>> DUNNO - we disabled all internal RBL's (except mailspike) from start
>>> because we feed postscreen and spamassassin from the same webinterface
>>> with different scores for both but same lists (and some of them are
>>> mirrored on the local rbldnsd with different names in the own domain)
>>
>> So then what were all those RBLs you listed initially with their
>> weights? bl.spamcop.net was among them...
>
> can't say initially - maintained starting summer 2014 - current state
>
> don't use anything ending with "thelounge.net": our public nameservers
> always answer with 127.0.0.2 to stop users who blindly copy the list,
> because they have no access to those zones and there was a lot of
> useless response-rate-limiting
>
> in case of mirrored zones the alias contains the real list
>
> hopefully that gets displayed somehow usable in the mail

did not - attached as textfile this time


+----------------------------------+--------+--------------------+-------------------------------+----------+---------------------------+
| name                             | weight | resp               | alias                         | sa_weigt | sa_resp                   |
+----------------------------------+--------+--------------------+-------------------------------+----------+---------------------------+
| dnsbl.thelounge.net              |     16 | 127.0.0.2          | dnsbl.thelounge.net           |        7 | ^127\.0\.0\.2$            |
| dnsbl.sorbs.net                  |      9 | 127.0.0.10         | dul.dnsbl.sorbs.net           |      6.5 | ^127\.0\.0\.10$           |
| dnsbl.sorbs.net                  |      9 | 127.0.0.14         | noserver.dnsbl.sorbs.net      |      6.5 | ^127\.0\.0\.14$           |
| zen.spamhaus.org                 |      8 | 127.0.0.[10;11]    | pbl.spamhaus.org              |      6.5 | ^127\.0\.0\.1[01]$        |
| zen.spamhaus.org                 |      7 | 127.0.0.[4..7]     | xbl.spamhaus.org              |      5.5 | ^127\.0\.0\.[4-7]$        |
| dnsbl.sorbs.net                  |      7 | 127.0.0.5          | smtp.dnsbl.sorbs.net          |      5.5 | ^127\.0\.0\.5$            |
| b.barracudacentral.org           |      7 | 127.0.0.2          | b.barracudacentral.org        |        5 | ^127\.0\.0\.2$            |
| zen.spamhaus.org                 |      7 | 127.0.0.3          | css.spamhaus.org              |        5 | ^127\.0\.0\.3$            |
| dnsbl.inps.de                    |      7 | 127.0.0.2          | dnsbl.inps.de                 |        5 | ^127\.0\.0\.2$            |
| dnsbl-ix.thelounge.net           |      4 | 127.0.0.2          | ix.dnsbl.manitu.net           |      2.5 | ^127\.0\.0\.2$            |
| dnsbl.sorbs.net                  |      4 | 127.0.0.7          | web.dnsbl.sorbs.net           |      4.5 | ^127\.0\.0\.7$            |
| bl.spamcop.net                   |      4 | 127.0.0.2          | bl.spamcop.net                |      2.5 | ^127\.0\.0\.2$            |
| bl.mailspike.net                 |      4 | 127.0.0.2          | z.mailspike.net               |        0 |                           |
| bl.mailspike.net                 |      4 | 127.0.0.[10;11;12] | bl.mailspike.net              |        0 |                           |
| hostkarma.junkemailfilter.com    |      4 | 127.0.0.2          | hostkarma.junkemailfilter.com |      3.5 | ^127\.0\.0\.2$            |
| dnsbl-surriel.thelounge.net      |      4 | 127.0.0.2          | psbl.surriel.com              |      2.5 | ^127\.0\.0\.2$            |
| bl.spameatingmonkey.net          |      4 | 127.0.0.[2;3]      | bl.spameatingmonkey.net       |      2.5 | ^127\.0\.0\.[23]$         |
| dnsrbl.swinog.ch                 |      4 | 127.0.0.3          | dnsrbl.swinog.ch              |      2.5 | ^127\.0\.0\.3$            |
| dnsbl-spamcannibal.thelounge.net |      3 | 127.0.0.2          | bl.spamcannibal.org           |      1.5 | ^127\.0\.0\.2$            |
| dnsbl.sorbs.net                  |      3 | 127.0.0.6          | spam.dnsbl.sorbs.net          |      1.5 | ^127\.0\.0\.6$            |
| score.senderscore.com            |      3 | 127.0.4.[0..20]    | senderscore.com High          |      1.5 | ^127\.0\.4\.(1?[0-9]|20)$ |
| zen.spamhaus.org                 |      3 | 127.0.0.2          | sbl.spamhaus.org              |      1.5 | ^127\.0\.0\.2$            |
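
To show how the sa_resp column is meant to be read, here is a small sketch
that maps a DNSBL answer to the matching alias and sa_weigt using the
zen.spamhaus.org rows from the table. It is only an illustration: the
function name and data layout are made up, and the real setup generates its
rules from the web interface rather than from code like this.

```python
import re

# (alias, sa_weigt, sa_resp) for the zen.spamhaus.org rows of the table.
ZEN_RULES = [
    ("pbl.spamhaus.org", 6.5, r"^127\.0\.0\.1[01]$"),
    ("xbl.spamhaus.org", 5.5, r"^127\.0\.0\.[4-7]$"),
    ("css.spamhaus.org", 5.0, r"^127\.0\.0\.3$"),
    ("sbl.spamhaus.org", 1.5, r"^127\.0\.0\.2$"),
]

# sa_resp of the score.senderscore.com row: last octet 0 through 20.
SENDERSCORE_HIGH = r"^127\.0\.4\.(1?[0-9]|20)$"


def score_response(answer):
    """Return (alias, weight) for every rule whose regex matches the answer."""
    return [(alias, weight)
            for alias, weight, pattern in ZEN_RULES
            if re.search(pattern, answer)]


print(score_response("127.0.0.4"))
print(score_response("127.0.0.11"))
print(bool(re.search(SENDERSCORE_HIGH, "127.0.4.20")),
      bool(re.search(SENDERSCORE_HIGH, "127.0.4.21")))
```

One combined zone can carry several sub-lists this way, each with its own
weight.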

Re: Multiple RBLs and dynamic IPs

2016-05-30 Thread Reindl Harald



On 31.05.2016 at 00:49, Alex wrote:
> Hi,
>
>>> So I created the RCVD_IN_XBL_ALL "deep header" rule and have since
>>> reduced its score. However, there's still RCVD_IN_BL_SPAMCOP_NET as
>>> part of the default ruleset, which I could of course change, but it's
>>> scored 1.3 by default for that same "deep header" IP address.
>>>
>>> Does that rule deserve some attention to determine whether it should
>>> also be reduced by default for the same reason as the SBL/XBL rule?
>>
>> DUNNO - we disabled all internal RBL's (except mailspike) from start because
>> we feed postscreen and spamassassin from the same webinterface with
>> different scores for both but same lists (and some of them are mirrored on
>> the local rbldnsd with different names in the own domain)
>
> So then what were all those RBLs you listed initially with their
> weights? bl.spamcop.net was among them...

can't say initially - maintained starting summer 2014 - current state

don't use anything ending with "thelounge.net": our public nameservers
always answer with 127.0.0.2 to stop users who blindly copy the list,
because they have no access to those zones and there was a lot of
useless response-rate-limiting

in case of mirrored zones the alias contains the real list

hopefully that gets displayed somehow usable in the mail

+-++-+---+--+--+
| name| weight | resp| 
alias | sa_weigt | sa_resp  |

+-++-+---+--+--+
| dnsbl.thelounge.net | 16 | 127.0.0.2   | 
dnsbl.thelounge.net   |7 | ^127\.0\.0\.2$   |
| dnsbl.sorbs.net |  9 | 127.0.0.10  | 
dul.dnsbl.sorbs.net   |  6.5 | ^127\.0\.0\.10$  |
| dnsbl.sorbs.net |  9 | 127.0.0.14  | 
noserver.dnsbl.sorbs.net  |  6.5 | ^127\.0\.0\.14$  |
| zen.spamhaus.org|  8 | 127.0.0.[10;11] | 
pbl.spamhaus.org  |  6.5 | ^127\.0\.0\.1[01]$   |
| zen.spamhaus.org|  7 | 127.0.0.[4..7]  | 
xbl.spamhaus.org  |  5.5 | ^127\.0\.0\.[4-7]$   |
| dnsbl.sorbs.net |  7 | 127.0.0.5   | 
smtp.dnsbl.sorbs.net  |  5.5 | ^127\.0\.0\.5$   |
| b.barracudacentral.org  |  7 | 127.0.0.2   | 
b.barracudacentral.org|5 | ^127\.0\.0\.2$   |
| zen.spamhaus.org|  7 | 127.0.0.3   | 
css.spamhaus.org  |5 | ^127\.0\.0\.3$   |
| dnsbl.inps.de   |  7 | 127.0.0.2   | 
dnsbl.inps.de |5 | ^127\.0\.0\.2$   |
| dnsbl-ix.thelounge.net  |  4 | 127.0.0.2   | 
ix.dnsbl.manitu.net   |  2.5 | ^127\.0\.0\.2$   |
| dnsbl.sorbs.net |  4 | 127.0.0.7   | 
web.dnsbl.sorbs.net   |  4.5 | ^127\.0\.0\.7$   |
| bl.spamcop.net  |  4 | 127.0.0.2   | 
bl.spamcop.net|  2.5 | ^127\.0\.0\.2$   |
| bl.mailspike.net|  4 | 127.0.0.2   | 
z.mailspike.net   |0 |  |
| bl.mailspike.net|  4 | 127.0.0.[10;11;12]  | 
bl.mailspike.net  |0 |  |
| hostkarma.junkemailfilter.com   |  4 | 127.0.0.2   | 
hostkarma.junkemailfilter.com |  3.5 | ^127\.0\.0\.2$   |
| dnsbl-surriel.thelounge.net |  4 | 127.0.0.2   | 
psbl.surriel.com  |  2.5 | ^127\.0\.0\.2$   |
| bl.spameatingmonkey.net |  4 | 127.0.0.[2;3]   | 
bl.spameatingmonkey.net   |  2.5 | ^127\.0\.0\.[23]$|
| dnsrbl.swinog.ch|  4 | 127.0.0.3   | 
dnsrbl.swinog.ch  |  2.5 | ^127\.0\.0\.3$   |
| dnsbl-spamcannibal.thelounge.net|  3 | 127.0.0.2   | 
bl.spamcannibal.org   |  1.5 | ^127\.0\.0\.2$   |
| dnsbl.sorbs.net |  3 | 127.0.0.6   | 
spam.dnsbl.sorbs.net  |  1.5 | ^127\.0\.0\.6$   |
| score.senderscore.com   |  3 | 127.0.4.[0..20] | 
senderscore.com High  |  1.5 | ^127\.0\.4\.(1?[0-9]|20)$|
| zen.spamhaus.org|  3 | 127.0.0.2   | 
sbl.spamhaus.org  |  1.5 | ^127\.0\.0\.2$   |
| hostkarma.junkemailfilter.com   |  2 | 127.0.0.4   | 

Re: Multiple RBLs and dynamic IPs

2016-05-30 Thread Alex
Hi,

>> So I created the RCVD_IN_XBL_ALL "deep header" rule and have since
>> reduced its score. However, there's still RCVD_IN_BL_SPAMCOP_NET as
>> part of the default ruleset, which I could of course change, but it's
>> scored 1.3 by default for that same "deep header" IP address.
>>
>> Does that rule deserve some attention to determine whether it should
>> also be reduced by default for the same reason as the SBL/XBL rule?
>
> DUNNO - we disabled all internal RBL's (except mailspike) from start because
> we feed postscreen and spamassassin from the same webinterface with
> different scores for both but same lists (and some of them are mirrored on
> the local rbldnsd with different names in the own domain)

So then what were all those RBLs you listed initially with their
weights? bl.spamcop.net was among them...


>
>


Re: SA Concepts - plugin for email semantics

2016-05-30 Thread Dianne Skoll
On Mon, 30 May 2016 17:45:52 -0400
"Bill Cole"  wrote:

> So you could have 'sex' and 'meds' and 'watches' tallied up into
> frequency counts that sum up natural (word) and synthetic (concept)
> occurrences, not just as incompatible types of input feature but as
> a conflation of incompatible features.

That is easy to patch by giving "concepts" a separate namespace.  You
could do that by picking a character that can't be in a normal token and
using something like:  concept*meds, concept*sex, etc. as tokens.
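
A minimal sketch of that idea, assuming a word tokenizer that only emits
lowercase letters, digits and apostrophes, so "*" can never occur in a
natural token. The concept lexicon and all names below are invented for the
illustration; this is not how the existing Bayes plugin tokenizes.

```python
import re
from collections import Counter

# Natural tokens: "*" is not in this character class, so it cannot collide.
WORD_RE = re.compile(r"[a-z0-9']+")
CONCEPT_PREFIX = "concept*"

# Hypothetical concept lexicon: concept name -> trigger words.
CONCEPTS = {
    "meds": {"pills", "pharmacy", "viagra"},
    "watches": {"rolex", "replica"},
}


def tokenize(text):
    """Count word tokens, then add namespaced tokens for matched concepts."""
    words = WORD_RE.findall(text.lower())
    tokens = Counter(words)
    for concept, triggers in CONCEPTS.items():
        if triggers.intersection(words):
            tokens[CONCEPT_PREFIX + concept] += 1
    return tokens


tokens = tokenize("Cheap meds from our pharmacy")
print(tokens["meds"], tokens["concept*meds"])
```

The literal word "meds" and the synthetic "concept*meds" end up as two
separate entries, so their counts are never summed together.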

> FWIW, I have roughly no free time for anything between work and
> family demands but if I did, I would most like to build a blind
> fixed-length tokenization Bayes classifier: just slice up a message
> into all of its n-byte sequences (so that a message of bytelength x
> would have x-(n-1) different tokens) and use those as inputs instead
> of words.

I think that could be very effective with (as you said) plenty of
training.  I think there *may* be slight justification for
canonicalizing text parts into utf-8 first; while you are losing
information, it's hard to see how 手机色情 should be treated
differently depending on the character encoding.

Regards,

Dianne.


Re: SA Concepts - plugin for email semantics

2016-05-30 Thread Bill Cole

On 28 May 2016, at 17:53, John Hardin wrote:

> Based on that, do you have an opinion on the proposal to add two-word
> (or configurable-length) combinations to Bayes?


CAVEAT: it has literally been decades since I've worked deep in 
statistics on a routine basis rather than just using blindly trusted 
black-box tools every now and then, so some of the below could be 
influenced by senile dementia...


Tallying word pairs *instead* of single words or as a second discrete 
Bayes analysis wouldn't be a problem and would surely be useful, 
possibly more useful than single-word analysis.


Doing one unified analysis where single words and multi-word phrases are 
both tallied in one Bayes DB to determine one Bayes score is less 
clearly valid because there is absolute dependence in one direction: the 
presence of any phrase requires its component words also to be present. 
OTOH, whether sets of words that are commonly used in particular 
sequences occur independently with or without matching those sequences 
is pretty clearly an independent feature of a text not captured by 
1-word tokenization, so it wouldn't be blatantly wrong to capture it 
indirectly by having a unified word and phrase Bayes DB. So I guess I'm 
undecided, leaning in favor because it captures information otherwise 
invisible to the Bayes DB.


The "Naive Bayes" classification approach is theoretically moored to 
Bayes' Theorem by the concept that even if there's SOME dependent 
correlation across the features being measured to feed the 
classification database, incomplete dependency makes a large set of 
similar measurable features (like the presence of words in a message) 
usable as a proxy for a hypothetical set of truly independent features 
which are unknown and may not be readily quantified. For textual 
analysis, this ironically might be "concepts" but to be accurate that 
set would have to include a properly distributed sample of all possible 
concepts and a concrete way to detect each one accurately. Using words 
or n-word phrases instead of concepts means that Bayesian spam 
classification does not require a full-resolution simulation of Brahman 
on every mail server. Those are very resource-heavy...


The canonical empirical example of Naive Bayes classification is the use 
of simple physical body measurements to classify humans by biological 
sex. That classification improves as one adds more direct physical 
measurements, even though they all relate to each other via abstract 
ideas like "size," "muscularity," and "shape". However, if one includes 
such subjective abstractions, accuracy usually suffers (unless you cheat 
with features like 'femininity'.) Less intuitively, if one adds 
arbitrary derived features like BMI which can be calculated from the 
simpler measured features also in the input set, classification accuracy 
also is usually made worse. Perversely, classifiers using purely 
subjective abstractions or purely derived values such as various ratios 
of direct physical metrics work better on average than classifiers of 
mixed types, but can work better or worse than classifiers using the 
simple measurements on which the derived features are based. This is 
where the serious arguments about various Naive Bayes implementations 
arise: What constitutes features of compatible classes? How strong can a 
correlation between features be without effectively being measurements 
of the same thing twice? Is the empirical support for the idea of 
semi-independent features as proxies for truly independent features 
strong enough? Are the distributions of the predictive features and the 
classifications compatible with each other for Bayes or even for Bayes 
*AT ALL*?


The approach of mixing "concepts" into the existing Bayes DB is 
qualitatively broken because concept tokens would be deterministically 
derived from the actual word tokens in messages based on some subjective 
scheme and then added as words which are likely to also be naturally 
occurring in some but not all of the messages to which they are added. 
So you could have 'sex' and 'meds' and 'watches' tallied up into 
frequency counts that sum up natural (word) and synthetic (concept) 
occurrences, not just as incompatible types of input feature but as a 
conflation of incompatible features.



FWIW, I have roughly no free time for anything between work and family 
demands but if I did, I would most like to build a blind fixed-length 
tokenization Bayes classifier: just slice up a message into all of its 
n-byte sequences (so that a message of bytelength x would have x-(n-1) 
different tokens) and use those as inputs instead of words. An advantage 
to this over word-wise Bayes would be attenuation of semantic 
entanglement and better detection of intentional obfuscation, at the 
cost of needing huge training volume to get a usable classifier.
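
The slicing step itself is trivial; a sketch, with made-up names and a
made-up sample string, just to show the x-(n-1) token count (the classifier
and its training are the hard part and are not shown):

```python
from collections import Counter


def byte_ngrams(message, n):
    """Return every overlapping n-byte slice of the message, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [message[i:i + n] for i in range(len(message) - n + 1)]


sample = b"V1AGRA for less"
n = 4
tokens = byte_ngrams(sample, n)

# One token per starting offset: x - (n - 1) tokens for x bytes.
print(len(sample), len(tokens))

# These counts would be the input features instead of word counts.
counts = Counter(tokens)
```

Because no word boundaries are involved, an obfuscated spelling still shares
most of its tokens with nearby text, at the price of a much larger token
space to train.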


Re: Multiple RBLs and dynamic IPs

2016-05-30 Thread Reindl Harald



On 30.05.2016 at 21:49, Alex wrote:
>>> Yeah, that's it exactly. Particularly overseas where it doesn't appear
>>> NAT and/or submission are used as readily as they are here.
>>
>> with carrier grade NAT and "DS-Lite" aka "public ipv6 but NAT ipv4" becoming
>> more and more common the problem is and will be growing fast
>>
>>> So even though that IP is on virtually every blacklist, you wouldn't
>>> add any points? And there's nothing further the user could do to fix
>>> the problem, given the dynamic nature of the IP?
>>
>> no, see above
>>
>> with enough blacklists in the scoring for last-external you get the
>> offending mailservers with hacked useraccounts blacklisted fast enough and
>> in many cases faster because the submission ip's of a hacked account are
>> changing fast
>>
>> saw that the very few times it happened for customers of ours where the
>> submission clients came from all over the world - because of rate-limiting
>> and a good monitoring of the mailqueue (how many mails are queued to the
>> outside world) it was each time a short enough timeframe to shut down the
>> affected account and avoid blacklisting (some abuse reports answered
>> promptly)
>>
>> so at the end of the day it's enough to check the last-external for good
>> results and not affect innocent clients which got a dynamic address abused 30
>> minutes before by a different enduser or by a user sitting behind the same
>> ISP NAT
>
> So I created the RCVD_IN_XBL_ALL "deep header" rule and have since
> reduced its score. However, there's still RCVD_IN_BL_SPAMCOP_NET as
> part of the default ruleset, which I could of course change, but it's
> scored 1.3 by default for that same "deep header" IP address.
>
> Does that rule deserve some attention to determine whether it should
> also be reduced by default for the same reason as the SBL/XBL rule?

DUNNO - we disabled all internal RBL's (except mailspike) from start
because we feed postscreen and spamassassin from the same webinterface
with different scores for both but same lists (and some of them are
mirrored on the local rbldnsd with different names in the own domain)







Re: Multiple RBLs and dynamic IPs

2016-05-30 Thread Alex
Hi,

>> Yeah, that's it exactly. Particularly overseas where it doesn't appear
>> NAT and/or submission are used as readily as they are here.
>
>
> with carrier grade NAT and "DS-Lite" aka "public ipv6 but NAT ipv4" becoming
> more and more common the problem is and will be growing fast
>
>> So even though that IP is on virtually every blacklist, you wouldn't
>> add any points? And there's nothing further the user could do to fix
>> the problem, given the dynamic nature of the IP?
>
>
> no, see above
>
> with enough blacklists in the scoring for last-external you get the
> offending mailservers with hacked useraccounts blacklisted fast enough and
> in many cases faster because the submission ip's of a hacked account are
> changing fast
>
> saw that the very few times it happened for customers of us where the
> submission clients came from all over the world - because of rate-limiting
> and a good monitoring of the mailqueue (how many mails are queued to the
> outside world) it was each time a short enough timeframe to shut down the
> affected account and avoid blacklisting (some abuse reports answered
> promptly)
>
> so at the end of the day it's enough to check the last-external for good
> results and not affect innocent clients which got a dynamic address abused 30
> minutes before by a different enduser or by a user sitting behind the same
> ISP NAT

So I created the RCVD_IN_XBL_ALL "deep header" rule and have since
reduced its score. However, there's still RCVD_IN_BL_SPAMCOP_NET as
part of the default ruleset, which I could of course change, but it's
scored 1.3 by default for that same "deep header" IP address.

Does that rule deserve some attention to determine whether it should
also be reduced by default for the same reason as the SBL/XBL rule?

Thanks,
Alex

>


Re: PHP eval()'d code

2016-05-30 Thread John Hardin

On Mon, 30 May 2016, Reindl Harald wrote:




On 30.05.2016 at 01:20, John Hardin wrote:

 On Sun, 29 May 2016, Reindl Harald wrote:
>  On 29.05.2016 at 23:38, John Hardin wrote:
> >   On Thu, 26 May 2016, RW wrote:
> > 
> > >   I noticed that Bayes is picking-up on very strong tokens from
> > >   "eval" and "code" in headers like this:
> > >
> > >   X-PHP-Originating-Script: 1013:global.php(1938) : eval()'d code
> > >
> > >   The "eval()'d code" part is in just over 2% of my spam, but it's
> > >   never occurred in a single ham in my corpus.
> > 
> >   It doesn't do too well in masscheck:
> > 
> >   http://ruleqa.spamassassin.org/20160528-r1745852-n/__PHP_ORIG_SCRIPT_EVAL/detail
> 
>  where is the rule?


 
https://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/jhardin/20_misc_testing.cf

>  if masscheck pretends that this hits a relevant amount of ham

 It doesn't. 3 out of 139k.


so what did you want to say with "It doesn't do too well in masscheck"


Few hits (either spam or ham) relative to the overall corpora (less than 
6/10 of a percent for either), and the S/O isn't that good (.73).



>  while we see 250 samples *at all* with a "X-PHP-Originating-Script"

 Here is the basic "header exists" rule for that same masscheck run:

 http://ruleqa.spamassassin.org/20160528-r1745852-n/__HAS_PHP_ORIG_SCRIPT/detail


i see there a lot of stuff but not the rule source itself but that is only 
"has that header" i guess


The rule source for both is in the SVN link posted above. The __HAS rule 
is a basic rule for "does the header exist?". The other rule is the latest 
change in the history:


https://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/jhardin/20_misc_testing.cf?r1=1741551&r2=1745822&diff_format=h


header    CUST_PHP_EVAL X-PHP-Originating-Script =~ /eval\(\)\'d code/
score     CUST_PHP_EVAL 1.5
describe  CUST_PHP_EVAL Looks like from exploited webserver
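
As a quick sanity check of the pattern in that rule, here is a minimal Python sketch. It is not SpamAssassin itself, and the helper name `hits_php_eval` is made up for illustration. It runs the same regular expression against the header value RW quoted at the start of this thread:

```python
import re

# Same pattern as the CUST_PHP_EVAL rule body. Python does not need the
# backslash before the single quote, so it is dropped here.
PHP_EVAL_RE = re.compile(r"eval\(\)'d code")


def hits_php_eval(header_value: str) -> bool:
    """Return True if an X-PHP-Originating-Script value matches the pattern."""
    return PHP_EVAL_RE.search(header_value) is not None


# Header value quoted by RW earlier in the thread.
sample = "1013:global.php(1938) : eval()'d code"
print(hits_php_eval(sample))
# A value without the eval() suffix should not match.
print(hits_php_eval("1013:global.php"))
```

This only shows that the regex matches the quoted header; how well the rule separates spam from ham is what the masscheck numbers are for.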


 It hits 1595 spam and 1972 ham. Where are you getting only 250 hits for
 that header?


in our corpus containing 9 eml files


OK. My apologies, when you said "we see" I thought you were referring to 
the masscheck results, not your local results.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The ["assault weapons"] ban is the moral equivalent of banning red
  cars because they look too fast.  -- Steve Chapman, Chicago Tribune
---
 Today: Memorial Day - honor those who sacrificed for our liberty


Re: Multiple RBLs and dynamic IPs

2016-05-30 Thread Reindl Harald



On 30.05.2016 at 21:07, Alex wrote:

it's nonsense to give points for dynamic enduser machines, they are
*typically* on a lot of blacklists and the users behind are changing all the
time

when you want to know why - try to use sbl-xbl as suggested by spiderlabs
for a web-application-firewall, did that *only* for form-submissions and
reverted it after few hours on a sunday because support hell with no good
excuse


Yeah, that's it exactly. Particularly overseas where it doesn't appear
NAT and/or submission are used as readily as they are here.


with carrier grade NAT and "DS-Lite" aka "public ipv6 but NAT ipv4" 
becoming more and more common the problem is and will be growing fast



So even though that IP is on virtually every blacklist, you wouldn't
add any points? And there's nothing further the user could do to fix
the problem, given the dynamic nature of the IP?


no, see above

with enough blacklists in the scoring for last-external you get the 
offending mailservers with hacked useraccounts blacklisted fast enough 
and in many cases faster because the submission ip's of a hacked account 
are changing fast


saw that the very few times it happened for customers of us where the 
submission clients came from all over the world - because of 
rate-limiting and a good monitoring of the mailqueue (how many mails are 
queued to the outside world) it was each time a short enough timeframe 
to shut down the affected account and avoid blacklisting (some abuse 
reports answered promptly)


so at the end of the day it's enough to check the last-external for good 
results and not affect innocent clients which got a dynamic address 
abused 30 minutes before by a different enduser or by a user sitting 
behind the same ISP NAT






Re: Multiple RBLs and dynamic IPs

2016-05-30 Thread Alex
Hi,

>>>>> "RCVD_IN_XBL_ALL" smells like deep header inspection
>>>>>
>>>>
>>>> The question was:
>>>>
>>>>   "How many points do you add to an email that  *originated*
>>>>    from a dynamic IP that [is] on a number of blacklists?"
>>>
>>>
>>> no - that was the question of the OP
>>> i responded long ago with config values
>>
>>
>> You're probably misunderstanding the precise meaning of "originated".
>
>
> well *no points at all* if we talk about the client using a submission
> server and not about the server itself deliver the mail to our machine
>
> you can do that only for your *personal* mail, but it's a no-go if you host
> users
>
>>> the question above is a different one while i can't parse it completely
>>
>>
>> The question is about an email from a client IP that's in a lot of
>> blacklists.
>>
>> The IP address that's in the blacklists, 180.178.104.22, authenticated
>>
>>   Received: from [180.178.104.22] (port=51022 helo=CapriciousDude)
>>   by vio1.naveca.biz with esmtpa (Exim 4.87)
>
>
> it's nonsense to give points for dynamic enduser machines, they are
> *typically* on a lot of blacklists and the users behind are changing all the
> time
>
> when you want to know why - try to use sbl-xbl as suggested by spiderlabs
> for a web-application-firewall, did that *only* for form-submissions and
> reverted it after few hours on a sunday because support hell with no good
> excuse

Yeah, that's it exactly. Particularly overseas where it doesn't appear
NAT and/or submission are used as readily as they are here.

So even though that IP is on virtually every blacklist, you wouldn't
add any points? And there's nothing further the user could do to fix
the problem, given the dynamic nature of the IP?

Thanks,
Alex


Re: Multiple RBLs and dynamic IPs

2016-05-30 Thread Reindl Harald



On 30.05.2016 at 20:45, RW wrote:

On Mon, 30 May 2016 19:59:10 +0200
Reindl Harald wrote:


On 30.05.2016 at 18:11, RW wrote:

On Mon, 30 May 2016 14:12:27 +0200
Reindl Harald wrote:



"RCVD_IN_XBL_ALL" smells like deep header inspection



The question was:

  "How many points do you add to an email that  *originated*
   from a dynamic IP that [is] on a number of blacklists?"


no - that was the question of the OP
i responded long ago with config values


You're probably misunderstanding the precise meaning of "originated".


well *no points at all* if we talk about the client using a submission 
server and not about the server itself deliver the mail to our machine


you can do that only for your *personal* mail, but it's a no-go if you 
host users



the question above is a different one while i can't parse it completely


The question is about an email from a client IP that's in a lot of
blacklists.

The IP address that's in the blacklists, 180.178.104.22, authenticated

  Received: from [180.178.104.22] (port=51022 helo=CapriciousDude)
  by vio1.naveca.biz with esmtpa (Exim 4.87)


it's nonsense to give points for dynamic enduser machines, they are 
*typically* on a lot of blacklists and the users behind are changing all 
the time


when you want to know why - try to use sbl-xbl as suggested by 
spiderlabs for a web-application-firewall, did that *only* for 
form-submissions and reverted it after few hours on a sunday because 
support hell with no good excuse







Re: Multiple RBLs and dynamic IPs

2016-05-30 Thread RW
On Mon, 30 May 2016 19:59:10 +0200
Reindl Harald wrote:

> On 30.05.2016 at 18:11, RW wrote:
> > On Mon, 30 May 2016 14:12:27 +0200
> > Reindl Harald wrote:
> >  

> >> "RCVD_IN_XBL_ALL" smells like deep header inspection
> >>  
> >
> > The question was:
> >
> >   "How many points do you add to an email that  *originated*
> >from a dynamic IP that [is] on a number of blacklists?"  
> 
> no - that was the question of the OP
> i responded long ago with config values

You're probably misunderstanding the precise meaning of "originated".
 
> the question above is a different one while i can't parse it completely

The question is about an email from a client IP that's in a lot of
blacklists.

The IP address that's in the blacklists, 180.178.104.22, authenticated

  Received: from [180.178.104.22] (port=51022 helo=CapriciousDude)
  by vio1.naveca.biz with esmtpa (Exim 4.87)


And RCVD_IN_DNSWL_NONE rules-out it being a test on outgoing mail.
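
For readers less familiar with how such a listing is checked, here is a minimal Python sketch of how the query name for an IPv4 DNSBL or DNSWL lookup is usually formed: the octets are reversed and the list's zone is appended. The function name `dnsbl_query_name` is made up for illustration, and no DNS query is actually sent:

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name that an IPv4 DNSBL/DNSWL lookup would query."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# The two IPs and two of the zones mentioned in this thread.
print(dnsbl_query_name("180.178.104.22", "xbl.spamhaus.org"))
print(dnsbl_query_name("116.251.209.92", "list.dnswl.org"))
```

Which Received header the IP is taken from (last-external only, or every hop as in a "deep header" rule) is a separate decision that this sketch does not cover.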


Re: Multiple RBLs and dynamic IPs

2016-05-30 Thread Reindl Harald



On 30.05.2016 at 18:11, RW wrote:

On Mon, 30 May 2016 14:12:27 +0200
Reindl Harald wrote:


On 30.05.2016 at 14:10, Matthias Leisi wrote:

Hm, that looks odd:


On 27.05.2016 at 20:15, Alex wrote:

X-Spam-Report:
* -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no
*  trust
*  [116.251.209.92 listed in list.dnswl.org]
-^
*  0.0 RCVD_IN_XBL_ALL RBL: Received via a relay in Spamhaus SBL-XBL
*  [180.178.104.22 listed in mykey.zen.dq.spamhaus.net]
-^

Why do these two different IPs show up? _NONE for 116.251.209.92
does not add any points, but if that IP ever gets a higher score at
dnswl.org, then it may affect the accuracy of your spamfilter.

Is that a legitimate forwarder IP?


"RCVD_IN_XBL_ALL" smells like deep header inspection



The question was:

  "How many points do you add to an email that  *originated*
   from a dynamic IP that [is] on a number of blacklists?"


no - that was the question of the OP
i responded long ago with config values

the question above is a different one while i can't parse it completely

On 27.05.2016 at 20:15, Alex wrote:
> How many points do you add to an email that originated from a dynamic
> IP that is on a number of blacklists?
>
> This 180.178.104.22 is an IP from a customer in Indonesia:
>
> Received: from [180.178.104.22] (port=51022 helo=CapriciousDude)
> by vio1.naveca.biz with esmtpa (Exim 4.87)
> (envelope-from )
> id 1b6FMu-00087L-42; Fri, 27 May 2016 18:51:52 +0800
>
> This IP is on virtually every blacklist, but it doesn't necessarily
> mean it's the result of something this particular customer/user did

don't matter - a enduser IP has no business to deliver mail on port 25 
anywhere



+----------------+-------------------------------+
| spamass_weight | alias                         |
+----------------+-------------------------------+
|            6.5 | pbl.spamhaus.org              |
|            6.5 | dul.dnsbl.sorbs.net           |
|            6.5 | noserver.dnsbl.sorbs.net      |
|            5.5 | smtp.dnsbl.sorbs.net          |
|            5.5 | xbl.spamhaus.org              |
|              5 | b.barracudacentral.org        |
|              5 | dnsbl.inps.de                 |
|              5 | css.spamhaus.org              |
|            4.5 | web.dnsbl.sorbs.net           |
|            3.5 | hostkarma.junkemailfilter.com |
|            2.5 | ix.dnsbl.manitu.net           |
|            2.5 | psbl.surriel.com              |
|            2.5 | dnsrbl.swinog.ch              |
|            2.5 | bl.spameatingmonkey.net       |
|            2.5 | bl.spamcop.net                |
|            1.5 | senderscore.com High          |
|            1.5 | hostkarma.junkemailfilter.com |
|            1.5 | block.dnsbl.sorbs.net         |
|            1.5 | bl.spamcannibal.org           |
|            1.5 | zombie.dnsbl.sorbs.net        |
|            1.5 | spam.dnsbl.sorbs.net          |
|            1.5 | sbl.spamhaus.org              |
|              1 | senderscore.com Medium        |
|              1 | bl.nszones.com                |
|              1 | http.dnsbl.sorbs.net          |
|              1 | socks.dnsbl.sorbs.net         |
|              1 | spam.spamrats.com             |
|              1 | misc.dnsbl.sorbs.net          |
|              1 | dnsbl-1.uceprotect.net        |
|              1 | dnsbl-2.uceprotect.net        |
|            0.5 | hostkarma.junkemailfilter.com |
|            0.5 | virus.dnsbl.sorbs.net         |
|            0.1 | ips.backscatterer.org         |
+----------------+-------------------------------+
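
To get a feel for how quickly such weights add up, here is a minimal Python sketch that simply sums the weights of the lists an IP is found on. Plain summing is an assumption made for illustration; how the setup described here actually combines the hits is not stated in this thread. The names `RBL_WEIGHTS` and `rbl_score` and the example hit list are made up:

```python
# Weights copied from the table above (subset only). hostkarma is left out
# because it appears three times with different weights.
RBL_WEIGHTS = {
    "pbl.spamhaus.org": 6.5,
    "dul.dnsbl.sorbs.net": 6.5,
    "xbl.spamhaus.org": 5.5,
    "bl.spamcop.net": 2.5,
    "sbl.spamhaus.org": 1.5,
    "ips.backscatterer.org": 0.1,
}


def rbl_score(listed_on):
    """Sum the weights of every list the IP is on; unknown lists add nothing."""
    return sum(RBL_WEIGHTS.get(name, 0.0) for name in listed_on)


# Hypothetical lookup result for a dynamic end-user IP.
hits = ["pbl.spamhaus.org", "xbl.spamhaus.org", "bl.spamcop.net"]
print(rbl_score(hits))
```

With three typical "dynamic IP" listings the total is already far above a common spam threshold, which is why applying these weights to anything other than the last-external hop hurts innocent submission clients.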





Re: Multiple RBLs and dynamic IPs

2016-05-30 Thread RW
On Mon, 30 May 2016 14:12:27 +0200
Reindl Harald wrote:

> On 30.05.2016 at 14:10, Matthias Leisi wrote:
> > Hm, that looks odd:
> >  
> >> On 27.05.2016 at 20:15, Alex wrote:
> >  
> >> X-Spam-Report:
> >> * -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no
> >> *  trust
> >> *  [116.251.209.92 listed in list.dnswl.org]
> > -^
> >> *  0.0 RCVD_IN_XBL_ALL RBL: Received via a relay in Spamhaus SBL-XBL
> >> *  [180.178.104.22 listed in mykey.zen.dq.spamhaus.net]
> > -^
> >
> > Why do these two different IPs show up? _NONE for 116.251.209.92
> > does not add any points, but if that IP ever gets a higher score at
> > dnswl.org, then it may affect the accuracy of your spamfilter.
> >
> > Is that a legitimate forwarder IP?  
> 
> "RCVD_IN_XBL_ALL" smells like deep header inspection
> 

The question was: 

  "How many points do you add to an email that  *originated* 
   from a dynamic IP that [is] on a number of blacklists?"


Re: Odd results when using whitelisting

2016-05-30 Thread Reindl Harald


On 30.05.2016 at 16:35, Nick Howitt wrote:

Just for a bit of closure, it looks like when you use amavisd-new with
SA, it is amavisd-new and not SA which is adding the X-Spam headers. In
/etc/amavisd/api.conf there is a parameter, $sa_tag_level_deflt,
defaulted to -99, below which no X-Spam headers are set. If you
whitelist, you start at -100. So, if the rest of the tests total to less
than 1, you will not get an X-Spam header. This can be confirmed by
playing around with this parameter and by upping the amavisd log level
so you can see the results of all the spam tests for each e-mail even if
it does not get the X-Spam headers.


well, the next time save us from your arrogance like below and accept 
that people with knowledge are knowing what they are talking about 
because otherwise you won't need to ask :-)


On 26.05.2016 at 12:22, Nick Howitt wrote:
> On 2016-05-26 10:08, Reindl Harald wrote:
>> like above with the SA-setting *you do not read* what others are
>> answering - you are likely on the wrong mailing-list because you are
>> running *AMAVIS* which is not a pure spamassassin and can skip SA
>> based on several settings
> I get the drift. SA is perfect and has no bugs
> so it is not worth doing any diagnostics. The
> people on the amavis lists will acknowledge this
> and assume, therefore, that it is their product
> causing the issue. There is no chance that the
> amavis people will say it is an SA issue


On 26/05/2016 07:17, Nick Howitt wrote:


On 26/05/2016 00:29, Reindl Harald wrote:


On 25.05.2016 at 21:58, Nick Howitt wrote:

and what is the problem run a local unbound on port 1053 and just add
"dns_server [127.0.0.1]:1053" to your SA-configuration when one thinks
he is capable to run his own servers?

I've tried looking and failed. Any chance of pointing me to where this
is documented?


seriously?

unbound.conf:
 interface: 127.0.0.1
 port: 1053

/etc/mail/spamassassin/local.cf:
dns_server [127.0.0.1]:1053

https://www.google.com/search?q=unbound.conf
https://www.google.com/search?q=spamassassin+dns_server


Seriously, yes. I'd found and set up unbound OK, if you'd read another
of my posts. I had not found it for SA. Not good searching, but I had
not - and I'd tried a few of the links on google and the some man pages.

OK, I've been heavily shot at for my set up which is totally
irrelevant to the question I posed and not a pleasant experience. Is
there any possibility of some help with the problem I posted about?






Re: Odd results when using whitelisting

2016-05-30 Thread Nick Howitt
Just for a bit of closure, it looks like when you use amavisd-new with 
SA, it is amavisd-new and not SA which is adding the X-Spam headers. In 
/etc/amavisd/api.conf there is a parameter, $sa_tag_level_deflt, 
defaulted to -99, below which no X-Spam headers are set. If you 
whitelist, you start at -100. So, if the rest of the tests total to less 
than 1, you will not get an X-Spam header. This can be confirmed by 
playing around with this parameter and by upping the amavisd log level 
so you can see the results of all the spam tests for each e-mail even if 
it does not get the X-Spam headers.


Nick
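
The arithmetic described above can be written down as a minimal Python sketch. It assumes a whitelist hit contributes exactly -100 and that headers are added when the final score is at or above the tag level; the function name `gets_x_spam_headers` is made up, and none of this is amavisd-new code:

```python
SA_TAG_LEVEL_DEFLT = -99.0   # default value quoted in the message above
WHITELIST_SCORE = -100.0     # assumed starting point for a whitelisted sender


def gets_x_spam_headers(other_tests_total, whitelisted=True,
                        tag_level=SA_TAG_LEVEL_DEFLT):
    """Return True if the final score is at or above the tag level."""
    score = other_tests_total + (WHITELIST_SCORE if whitelisted else 0.0)
    return score >= tag_level


# Whitelisted mail whose other tests total 0.5: -99.5 is below -99.
print(gets_x_spam_headers(0.5))
# Whitelisted mail whose other tests total 2.3: -97.7 is above -99.
print(gets_x_spam_headers(2.3))
```

Lowering the tag level below -100 would make the headers show up on whitelisted mail as well, which matches the "playing around with this parameter" observation.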

On 26/05/2016 07:17, Nick Howitt wrote:



On 26/05/2016 00:29, Reindl Harald wrote:



On 25.05.2016 at 21:58, Nick Howitt wrote:

and what is the problem run a local unbound on port 1053 and just add
"dns_server [127.0.0.1]:1053" to your SA-configuration when one thinks
he is capable to run his own servers?

I've tried looking and failed. Any chance of pointing me to where this
is documented?


seriously?

unbound.conf:
 interface: 127.0.0.1
 port: 1053

/etc/mail/spamassassin/local.cf:
dns_server [127.0.0.1]:1053

https://www.google.com/search?q=unbound.conf
https://www.google.com/search?q=spamassassin+dns_server

Seriously, yes. I'd found and set up unbound OK, if you'd read another 
of my posts. I had not found it for SA. Not good searching, but I had 
not - and I'd tried a few of the links on google and the some man pages.


OK, I've been heavily shot at for my set up which is totally 
irrelevant to the question I posed and not a pleasant experience. Is 
there any possibility of some help with the problem I posted about?






Re: Multiple RBLs and dynamic IPs

2016-05-30 Thread Reindl Harald



On 30.05.2016 at 14:10, Matthias Leisi wrote:

Hm, that looks odd:


On 27.05.2016 at 20:15, Alex wrote:

X-Spam-Report:
* -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no
*  trust
*  [116.251.209.92 listed in list.dnswl.org]
-^
*  0.0 RCVD_IN_XBL_ALL RBL: Received via a relay in Spamhaus SBL-XBL
*  [180.178.104.22 listed in mykey.zen.dq.spamhaus.net]
-^

Why do these two different IPs show up? _NONE for 116.251.209.92 does
not add any points, but if that IP ever gets a higher score at dnswl.org,
then it may affect the accuracy of your spamfilter.

Is that a legitimate forwarder IP?


"RCVD_IN_XBL_ALL" smells like deep header inspection





Re: Multiple RBLs and dynamic IPs

2016-05-30 Thread Matthias Leisi
Hm, that looks odd:

> On 27.05.2016 at 20:15, Alex wrote:

> X-Spam-Report:
> * -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no
> *  trust
> *  [116.251.209.92 listed in list.dnswl.org]
-^
> *  0.0 RCVD_IN_XBL_ALL RBL: Received via a relay in Spamhaus SBL-XBL
> *  [180.178.104.22 listed in mykey.zen.dq.spamhaus.net]
-^

Why do these two different IPs show up? _NONE for 116.251.209.92 does not add 
any points, but if that IP ever gets a higher score at dnswl.org, then it may 
affect the accuracy of your spamfilter.

Is that a legitimate forwarder IP? 

— Matthias




Re: PHP eval()'d code

2016-05-30 Thread Reindl Harald



On 30.05.2016 at 01:20, John Hardin wrote:

On Sun, 29 May 2016, Reindl Harald wrote:

On 29.05.2016 at 23:38, John Hardin wrote:

 On Thu, 26 May 2016, RW wrote:

>  I noticed that Bayes is picking-up on very strong tokens from
>  "eval" and "code" in headers like this:
>
>  X-PHP-Originating-Script: 1013:global.php(1938) : eval()'d code
>
>  The "eval()'d code" part is in just over 2% of my spam, but it's
>  never occurred in a single ham in my corpus.

 It doesn't do too well in masscheck:

 
http://ruleqa.spamassassin.org/20160528-r1745852-n/__PHP_ORIG_SCRIPT_EVAL/detail



where is the rule?


https://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/jhardin/20_misc_testing.cf


if masscheck pretends that this hits a relevant amount of ham


It doesn't. 3 out of 139k.


so what did you want to say with "It doesn't do too well in masscheck"


while we see 250 samples *at all* with a "X-PHP-Originating-Script"


Here is the basic "header exists" rule for that same masscheck run:

http://ruleqa.spamassassin.org/20160528-r1745852-n/__HAS_PHP_ORIG_SCRIPT/detail


i see there a lot of stuff but not the rule source itself but that is 
only "has that header" i guess


header    CUST_PHP_EVAL X-PHP-Originating-Script =~ /eval\(\)\'d code/
score     CUST_PHP_EVAL 1.5
describe  CUST_PHP_EVAL Looks like from exploited webserver


It hits 1595 spam and 1972 ham. Where are you getting only 250 hits for
that header?


in our corpus containing 9 eml files



