Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)

2018-02-25 Thread David B Funk

On Sun, 25 Feb 2018, LeandroCarlosRodrigues wrote:


> Amir Caspi wrote
> > On that note -- regardless of what OTHER HW/SW solutions might do, since
> > this is a SpamAssassin mailing list ... is there any facility to implement
> > this in SA?  That is, when calling the URIBL plugin, could it check both
> > the shortened URL and the expanded URL (for known shorteners) ?  Does that
> > facility already exist and I missed it?
>
> Hi Guys! We provide a URIBL that already has a script in Perl to expand
> redirections until no more redirections:
>
> [snip..]

Just be careful how you do that "expand redirections until no more redirections" 
or you may get caught in a spammer trap.


If you're going through a professional redirect site like goo.gl or bit.ly, you're
probably pretty safe, but if it's a dedicated spammer site, beware.


I was testing some redirection expansions on URLs from spam and found a site
that clearly had been crafted to foil this kind of thing.


It was in one of those "check this out" spams that contain one line of
greeting and then a URL.


When I grabbed it using curl, it returned a 301 redirect, so I grabbed that
target, which led to another 301, lather-rinse-repeat ad nauseam.
However, if you used a browser, it went to the target "burn fat pills" site in
just two redirects.


So my bet is that the spammers are crafty enough to check things like browser
referrer, cookies, etc. to distinguish a browser from a link checker.
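One way to avoid the endless-301 trap described above is to cap the number of hops and to stop on any URL already visited. The following is only a minimal sketch under stated assumptions: `expand_redirects`, `max_hops`, and the `fetch` callable are hypothetical names, not part of SpamAssassin or of any script mentioned in this thread. `fetch` stands in for whatever performs the HTTP request and returns the `Location` target, or `None` when the response is not a redirect.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical hook: given a URL, return the redirect target (the Location
# header value) or None when the response is not a redirect.
Fetch = Callable[[str], Optional[str]]


def expand_redirects(url: str, fetch: Fetch, max_hops: int = 5) -> Tuple[str, str]:
    """Follow at most `max_hops` redirects starting from `url`.

    Returns (last_url, status) where status is one of
    "final", "loop" or "too_many_hops".
    """
    seen = {url}
    current = url
    hops = 0
    while True:
        location = fetch(current)
        if location is None:
            return current, "final"          # no further redirect
        if hops == max_hops:
            return current, "too_many_hops"  # give up instead of chasing forever
        nxt = urljoin(current, location)     # Location may be a relative URL
        if nxt in seen:
            return nxt, "loop"               # redirect cycle
        seen.add(nxt)
        current = nxt
        hops += 1


if __name__ == "__main__":
    # A fake redirect table stands in for the network in this example.
    table = {
        "http://short.example/a": "http://mid.example/b",
        "http://mid.example/b": "http://final.example/",
    }
    print(expand_redirects("http://short.example/a", table.get))
```

A result of "too_many_hops" or "loop" is itself a signal: a chain that never settles for a link checker but resolves quickly in a browser matches the behaviour described above, so a caller might reasonably treat it as suspicious rather than as a lookup failure.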



--
Dave Funk  University of Iowa
College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_admin   Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{


Re: Loading custom rules.

2018-02-25 Thread David Jones

On 02/25/2018 08:20 PM, Philip wrote:
> How do you load custom rules... is it as simple as dropping the .cf file
> in the spamassassin directory and restart?
>
> I'm looking at these: https://wiki.apache.org/spamassassin/CustomRulesets
>
> Phil


Yep.  The directory is usually /etc/mail/spamassassin; then restart whatever
is calling SA (spamd, amavisd-new, MIMEDefang, MailScanner, MTA, etc.).


--
David Jones


Loading custom rules.

2018-02-25 Thread Philip
How do you load custom rules... is it as simple as dropping the .cf file 
in the spamassassin directory and restart?


I'm looking at these: https://wiki.apache.org/spamassassin/CustomRulesets

Phil


Re: Run expensive test last, and skip if meaningless

2018-02-25 Thread Bill Cole

On 25 Feb 2018, at 11:13 (-0500), Peter Thomassen wrote:

> Reminder: My question was not "how to run DNS efficiently" or "how does
> SpamAssassin run DNS queries", my question was "how can I influence the
> order of tests".


The canonical answer is: by adjusting rule priority values and using the 
short-circuit feature.


Unfortunately, that's not applicable to DNS tests because SA's code is 
optimized for total scan time. This means that DNS checks, which have 
built-in latency, are started asynchronously before everything else. So 
if you want to change that to postpone DNS checks with the possibility 
of short-circuiting them, you will need to re-architect that part of SA. 
If you choose that route, I expect that patches to make that alternative 
design a configurable option would be welcome upstream but I doubt that 
creating such a mechanism would be made a project priority in any way 
because the current optimization for overall performance is much more 
useful for most users than economizing on DNS queries.
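The idea behind "priority plus short-circuit" can be illustrated with a toy model. This is only a sketch, not SpamAssassin's code: the `Rule` class, the `scan` function, and the rule names are all hypothetical, and the model deliberately leaves out the asynchronous DNS lookups that, as described above, are started before everything else.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    name: str
    priority: int                    # lower values run first
    score: float
    matches: Callable[[str], bool]
    shortcircuit: bool = False       # stop scanning as soon as this rule hits


def scan(message: str, rules: List[Rule]) -> Tuple[float, List[str]]:
    """Run rules in increasing priority order, stopping on a short-circuit hit."""
    total = 0.0
    hits: List[str] = []
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(message):
            total += rule.score
            hits.append(rule.name)
            if rule.shortcircuit:
                break                # later (more expensive) rules are skipped
    return total, hits


if __name__ == "__main__":
    rules = [
        Rule("EXPENSIVE_BODY_CHECK", 500, 2.0, lambda m: "pills" in m),
        Rule("LOCAL_BLOCKED_SENDER", -100, 100.0,
             lambda m: "spammer@example.test" in m, shortcircuit=True),
    ]
    print(scan("From: spammer@example.test\n\nburn fat pills", rules))
```

In this model the cheap, decisive rule runs first and the expensive one is never evaluated when it hits. The limitation raised above is exactly what the model omits: lookups that were already sent at the start of the scan cannot be saved by stopping early.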


For what it's worth, a few years ago I had to analyze the value of URIBL
data with hard numbers and I found that for that particular
operation, URIBLs were decisive in a large majority of spams accurately 
classified as spam by SA. It is important to note that for this site (as 
for all sites I have done significant work with in this millennium) the 
vast majority of mail never was seen by SA because it was rejected (or 
much less often was exempted from SA by whitelisting) ahead of the DATA 
phase. This also meant that in that particular case, there was no risk 
of hitting the point of URIBL_BLOCKED. If all messages had been run 
through SA, they would have needed to pay for data feeds to have a 
useful spam control function.


Obviously, no two mail systems see exactly the same distribution of ham
and spam, so your circumstances might make buying feeds uneconomic.
OTOH, unless you're an extremely adept programmer or your time is not
very valuable, redesigning the way SA runs tests is very likely to be
the most uneconomic choice available for addressing your root problem.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Run expensive test last, and skip if meaningless

2018-02-25 Thread Axb

On 02/25/2018 05:13 PM, Peter Thomassen wrote:

> Reminder: My question was not "how to run DNS efficiently" or "how does
> SpamAssassin run DNS queries", my question was "how can I influence the
> order of tests".


You will have to hack the priorities in the plugins & rules.
This is definitely not trivial and should not be attempted without very 
thorough knowledge of the SA code or the outcome will be ugly or even 
hilarious. Good luck with that!




Re: Run expensive test last, and skip if meaningless

2018-02-25 Thread RW
On Sun, 25 Feb 2018 16:10:01 +0100
Peter Thomassen wrote:

> Hi,
> 
> While being very satisfied, I noticed that some DNS tests often cannot
> be conducted because of rate limits, e.g. by URIBL. 

This is usually the result of not running a proper recursive local DNS
server. If you are using an ISP cache, or allowing your own cache to
forward, then you share per-IP limits with others.


> Another solution would be to avoid running queries that don't mean
> much. Unfortunately, I could not find information on this (besides a
> thread from 2004; it seems that at that time, such configuration was
> not possible).

SA sends DNS queries as early as possible, in parallel, so it can get
on with local tests while waiting.


Re: Run expensive test last, and skip if meaningless

2018-02-25 Thread Peter Thomassen
Hi Harald,

On 02/25/2018 05:02 PM, Reindl Harald wrote:
>> 1.) If only tests that are known to have positive weight are postponed,
>> no issue can occur. Furthermore, the situation I was referring to is
>> where the score is way beyond the threshold; that is, it is a _lot_ over
>> the threshold. Any negative scores that (in sum) fall into that excess
>> range will be of no consequence.
> 
> well, SA can't know if something has positive or negative weight,
> especially when it comes to DNS queries *before* run it

But I do. That's why I'm asking how to configure it.

>> 2.) If the DNS tests are done so often that at some point they start
>> failing, any potential negative score (e.g. from a DNS whitelist test)
>> will not be applied (because the test failed). Thus, NOT limiting
>> queries will actually _harm_ negatives scores under some circumstances.
> 
> no, test fail when you exceed a limit to a distinct dns-server, the
> DNSWL is pretty sure a different a different dns-server or your local
> cache would step in anyways

That's true. But as I told you, tests are already failing, so there is
already a problem with a distinct DNS server.

What I am telling you is the following: You are claiming that running
certain tests (e.g. the problematic DNS test) in the end, and only if
necessary, is a bad idea, because negative weights may change the fate
of the message. My point is that precisely because of the same reason,
risking running into a rate limit is also a bad idea.

So, in summary, the chance of complications is lower if DNS tests are
only run when necessary.

>>> use unbound as local caching resolver and play around with that two
>>> values:
>>>
>>> cache-min-ttl: 90
>>> cache-max-negative-ttl: 90
>>>
>>> 2018-02-24 07:00:56 [856:0] info: server stats for thread 0: 285034
>>> queries, 125619 answers from cache, 159415 recursions, 3196 prefetch, 0
>>> rejected by ip ratelimitin
>>
>> I am already using a caching resolver
> 
> most people do, but most DNSWBLDNSWL have very low TTL (in the range of
> a few seconds) and by default no caching resolver overrides that

1.) Apart from the fact that it may indeed be reasonable to do some ttl
config tuning here, we should also consider why the TTLs are so low.
Obviously, it has to do with how quickly new entries on the lists become
effective (especially the negative TTL).

In other words, messing with this makes the DNS lists less efficient. On
the other hand, if I could just run those tests after determining
whether it's necessary at all, there would be no efficiency impact, but
my problem would be solved.

2.) There's a great number of hosts that only send messages once within
the TTL (even when increasing to a few minutes). For those, tuning that
setup hardly helps.

However, avoiding unnecessary queries helps!


So, while I'm glad that we've clarified all this, I don't think that any
of this was actually directly related to my question.

Reminder: My question was not "how to run DNS efficiently" or "how does
SpamAssassin run DNS queries", my question was "how can I influence the
order of tests".

Best,
Peter

-- 
--
a4a GmbH
Scheffelstr. 14
97072 Würzburg
Germany

fon: +49-931-2705351
fax: +49-931-27049942

web: https://a4a.de
e-mail: i...@a4a.de





Re: Run expensive test last, and skip if meaningless

2018-02-25 Thread Peter Thomassen
Hi Harald,

On 02/25/2018 04:25 PM, Reindl Harald wrote:
>> While being very satisfied, I noticed that some DNS tests often cannot
>> be conducted because of rate limits, e.g. by URIBL. At the same time,
>> running these tests often will not change the outcome of other tests, if
>> the score is already way beyond the threshold.
> 
> wrong assumption
> 
> remaining tests can also lower the score (DNSWL as example)
> 
> remaining tests can also put the score which is below the threshold
> because of DNSWL/BAYES_00 and so on beyond the threshold when it hits
> URIBL_BLACK

wrong interpretation

Here are two arguments how a) the issue you're raising can be mitigated;
b) interference with negative scores is _lowered_ by my proposal:

1.) If only tests that are known to have positive weight are postponed,
no issue can occur. Furthermore, the situation I was referring to is
where the score is way beyond the threshold; that is, it is a _lot_ over
the threshold. Any negative scores that (in sum) fall into that excess
range will be of no consequence.

2.) If the DNS tests are done so often that at some point they start
failing, any potential negative score (e.g. from a DNS whitelist test)
will not be applied (because the test failed). Thus, NOT limiting
queries will actually _harm_ negatives scores under some circumstances.

In other words, a "failing" test is _bad_ because it may lead to false
positives. The "fail-safe" solution would be that failure leads to an
increased number of false negatives (i.e. less efficient filtering), but
you would never want to inflate the number of false positives.

Next time you write, I'd appreciate it if you a) assumed that the person
asking does in fact have some knowledge, and b) did not make claims
about which ungrounded assumptions I may be making.

> use unbound as local caching resolver and play around with that two values:
> 
> cache-min-ttl: 90
> cache-max-negative-ttl: 90
> 
> 2018-02-24 07:00:56 [856:0] info: server stats for thread 0: 285034
> queries, 125619 answers from cache, 159415 recursions, 3196 prefetch, 0
> rejected by ip ratelimitin

I am already using a caching resolver.

>> Another solution would be to avoid running queries that don't mean much.
>> Unfortunately, I could not find information on this (besides a thread
>> from 2004; it seems that at that time, such configuration was not
>> possible).
> 
> all DNS queries are fired at the begin in parallel and while SA waits on
> the repsones it can and will still run other tests

Thank you, this is what I observed. I know that some consider this a
performance benefit, and it very well may be.

However, my concern is not performance. You know, my mail setup is
otherwise so blazingly fast that I don't care if a message arrives two
seconds delayed because of a DNS delay ...

Instead, my concern is precisely the question that I asked, namely how
to influence that behavior you have described.

>> Hence: Can I, and if so, how can I configure SpamAssassin to run certain
>> tests only after other tests have been run, under the condition that the
>> classification result is not yet certain?
> 
> since it is a *scoring summary* the result is *not* certain until all
> positive and negative score-points are applied

That is just plain wrong.

It's like saying "since voting for one party reduces the share of the
others, the voting result depends on the last voter". It's just wrong.

Imagine a situation where the sum of any potential outstanding scores is
positive (or negative, but close to zero). In this case, the result is
already certain. And this can very well be the case, as there are
certainly tests that never produce a negative score.

For example, let's say my threshold is 5.0, and I am using a DNS
blacklist with score +5 and a whitelist with score -5. Let's say that
these two tests are the only ones postponed (everything else already was
run and summed up), and the score sum so far is 32. Thus, the two
missing tests can only move the score sum within the range 27 to 37.

Are you telling me that the classification result is uncertain as long
as the two tests have not been run?
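The arithmetic in this example can be written down as a small check. This is a sketch only: `score_bounds` and `verdict_if_certain` are hypothetical helpers, not SpamAssassin functions, and the sketch assumes that each postponed test contributes either its full score (if it hits) or nothing, and that a message counts as spam when its total is at or above the threshold.

```python
from typing import Iterable, Optional, Tuple


def score_bounds(score_so_far: float,
                 pending_scores: Iterable[float]) -> Tuple[float, float]:
    """Lowest and highest total reachable once the pending tests have run."""
    pending = list(pending_scores)
    lowest = score_so_far + sum(s for s in pending if s < 0)   # only negatives hit
    highest = score_so_far + sum(s for s in pending if s > 0)  # only positives hit
    return lowest, highest


def verdict_if_certain(score_so_far: float, threshold: float,
                       pending_scores: Iterable[float]) -> Optional[str]:
    """Return "spam" or "ham" if the pending tests cannot change the outcome,
    otherwise None (the pending tests still have to be run)."""
    lowest, highest = score_bounds(score_so_far, pending_scores)
    if lowest >= threshold:
        return "spam"
    if highest < threshold:
        return "ham"
    return None


if __name__ == "__main__":
    # Numbers from the example: threshold 5.0, score so far 32,
    # blacklist +5 and whitelist -5 still outstanding.
    print(score_bounds(32.0, [5.0, -5.0]))             # (27.0, 37.0)
    print(verdict_if_certain(32.0, 5.0, [5.0, -5.0]))  # spam
    print(verdict_if_certain(3.0, 5.0, [5.0, -5.0]))   # None
```

With a score of 32 the whole reachable range lies above the threshold, so the two postponed tests could be skipped. With a score of 3 the range is -2 to 8, so they could not.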


Nevertheless, thank you for your answer. In any case, as far as my
original question is concerned, I'm as informed as before, and I'd
appreciate any additional input ...

Best,
Peter

-- 
--
a4a GmbH
Scheffelstr. 14
97072 Würzburg
Germany

fon: +49-931-2705351
fax: +49-931-27049942

web: https://a4a.de
e-mail: i...@a4a.de





Run expensive test last, and skip if meaningless

2018-02-25 Thread Peter Thomassen
Hi,

While being very satisfied, I noticed that some DNS tests often cannot
be conducted because of rate limits, e.g. by URIBL. At the same time,
running these tests often will not change the outcome of other tests, if
the score is already way beyond the threshold.

One solution would be to purchase additional queries. In that sense,
these tests would not only be expensive in the computational sense, but
also in the literal sense.

Another solution would be to avoid running queries that don't mean much.
Unfortunately, I could not find information on this (besides a thread
from 2004; it seems that at that time, such configuration was not possible).

Hence: Can I, and if so, how can I configure SpamAssassin to run certain
tests only after other tests have been run, under the condition that the
classification result is not yet certain?

Thanks a lot!

Best,
Peter

-- 
--
a4a GmbH
Scheffelstr. 14
97072 Würzburg
Germany

fon: +49-931-2705351
fax: +49-931-27049942

web: https://a4a.de
e-mail: i...@a4a.de


Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)

2018-02-25 Thread RW
On Sun, 25 Feb 2018 05:25:06 -0700 (MST)
LeandroCarlosRodrigues wrote:

> Amir Caspi wrote
> > On that note -- regardless of what OTHER HW/SW solutions might do,
> > since this is a SpamAssassin mailing list ... is there any facility
> > to implement this in SA?  That is, when calling the URIBL plugin,
> > could it check both the shortened URL and the expanded URL (for
> > known shorteners) ?  Does that facility already exist and I missed
> > it?  
> 
> Hi Guys! We provide a URIBL that already has a script in Perl to
> expand redirections until no more redirections:
> 

There's already a plugin to do that.


Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)

2018-02-25 Thread LeandroCarlosRodrigues
Amir Caspi wrote
> On that note -- regardless of what OTHER HW/SW solutions might do, since
> this is a SpamAssassin mailing list ... is there any facility to implement
> this in SA?  That is, when calling the URIBL plugin, could it check both
> the shortened URL and the expanded URL (for known shorteners) ?  Does that
> facility already exist and I missed it?

Hi Guys! We provide a URIBL that already has a script in Perl to expand
redirections until there are no more redirections:

https://www.dropbox.com/s/5aorrijafw5ygk0/uribl.pl?dl=0
  

Our URIBL ignores the intermediary URLs. The target is the final URL,
the one that the user will actually see:

root@mx-br:/home/ubuntu# ./uribl.pl "http://goo.gl/ylUAd"
(www.hollywoodreporter.com) is not listed in 'uribl.spfbl.net'.

More about how to query our URIBL can be read here:

http://spfbl.net/en/uribl   

Could you guys help me adjust this script to run inside SA?



--
Sent from: http://spamassassin.1065346.n5.nabble.com/SpamAssassin-Users-f3.html