Re: Warning: Your Pyzor may be broken.

2024-06-09 Thread Michael Orlitzky
On 2024-06-08 14:45:34, Bill Cole wrote:

> I went looking for a better fix and found a reported issue at
> https://github.com/SpamExperts/pyzor/issues/155 matching my original
> symptoms in which a workaround was provided: install directly from
> the GitHub project's master.zip link, i.e. a snapshot assembled from
> the current state of the repo, which claims to be v1.1.1. I do not
> like that solution at all, and added a comment to that issue
> suggesting that they fix the problem by cutting a release for
> PyPI. No response yet, but it has only been a matter of minutes.

The same issue was reported in 2016 and ignored for eight years before
being closed out of frustration (rather than because they did
something about it):

  https://github.com/SpamExperts/pyzor/issues/54


Re: Dinged for .Date

2024-01-15 Thread Michael Orlitzky
On Mon, 2024-01-15 at 17:06 -0800, Cabel Sasser wrote:
> 
> There are 1,239 gTLDs. The SpamAssassin source* blocks just *22* of them.
> 

The official unofficial KAM ruleset blocks a few more, and there are
plenty of third-party URIBLs that essentially block gTLDs through SA,
albeit at one level of abstraction.


> If you believe every new gTLD is garbage (and I get that!), why isn’t 
> SpamAssassin automatically dinging, say, 1,200+ of them?
> 
> Or put another way, why _these_ 22, and _only_ these 22, and not the rest?

Be careful what you wish for :P



Re: Dinged for .Date

2024-01-15 Thread Michael Orlitzky
On Mon, 2024-01-15 at 15:58 -0800, Cabel Sasser wrote:
> 
> Can anyone help me understand “the science”? And how these domains are chosen 
> for such a heavy punishment?

What you're facing is essentially an economic problem. Everyone knows
dot-com, and to a lesser extent dot-net and dot-org. But everything
else is junk: if you're the fifth guy to try to buy example.com, you're
probably not who people are looking for when they type www.example.com
into their web browsers. The other TLDs are also much harder for people
to remember if they see them in a commercial. As a result, dot-info,
dot-biz, and everything after have always been considered knock-offs.

When the wave of new gTLDs hit, the value of each successive one became
diluted even further. By the time you get to dot-date, you're at what
should be, like, somebody's 40th choice for a domain name. How do you
sell that? At a huge fucking discount, if you want anyone to buy it!

That's one half of your economic problem.

Now imagine you're trying to block spammers by domain name, and there's
one particular set of domain names that they can get at a 90% discount
because nobody wants them otherwise. Regardless of how many legitimate
companies use those domains, the signal-to-noise ratio is going to be
crap.

So, the other half of your economic problem is: how much money does it
cost me (as a recipient) to block dot-date, versus how much does it
cost me to not block it? We have customers who complain about spam and
customers who complain about blocked messages. It's a pretty easy
calculation for a recipient to make, and the result for me at least is
that it's less work (i.e. less expensive) to just block every new gTLD
and whitelist the few legitimate senders brave enough to live there.


Re: 4.0.0 dnsbl_subtests.t test failures

2022-12-28 Thread Michael Orlitzky
On Wed, 2022-12-28 at 16:44 +0200, Henrik K wrote:
> 
> Doesn't look too good for Gentoo packaging though, if since 2009 v310.pre
> and newer have been full of all sorts of plugins loaded.  It's like nobody
> actually cared since most of the stuff is useful.  :-)
> 

Nobody noticed until now, and now it's getting fixed. The intersection
of,

  1. Gentoo users
  2. People who run their own mail server
  3. People who blindly run the default configuration on an important
     network-facing daemon

is pretty small. And given that changing it is likely to generate a few
complaints, compared to the contented silence regarding the existing
behavior, you can maybe understand why no one has tried to proactively
fix it when it wasn't broken.



Re: 4.0.0 dnsbl_subtests.t test failures

2022-12-28 Thread Michael Orlitzky
On Wed, 2022-12-28 at 16:20 +0200, Henrik K wrote:
> 
> Common sense would ask that how is SPF harmful for the user?  One would
> think it would be actually desirable like any other network lookups, that
> user might have accidentally left disabled?  But sure, if this is the Gentoo
> way, so be it.  I had enough of 90's linux flashbacks trying it for the
> first and last time today.  :-)
> 

Well, SPF wasn't nearly as reliable in 2005 as it is now, and it pulls
in an extra dependency.

Probably the best answer is that by having this ability, Gentoo
attracts the sort of user who likes to disable such things to save disk
space, shave off a few CPU cycles, or improve security. And then
there's a feedback loop wherein most of our users want to retain the
ability to control what gets installed/enabled.



Re: 4.0.0 dnsbl_subtests.t test failures

2022-12-28 Thread Michael Orlitzky
On Wed, 2022-12-28 at 15:38 +0200, Henrik K wrote:
> 
> Disabling default plugins solves nothing, just creates a worse experience
> for user.  Educating and guiding users to use DNS properly does not require
> this.

Gentoo builds everything from source and allows the user to
enable/disable some options for each package, called USE flags. In the
context of a C program, you might have USE=spf which would translate to
an additional dependency on libspf2 and passing

  ./configure --enable-spf

at build time to enable that feature.

These map less well to scripting languages where features are often
enabled at runtime based on the existence of some optional package. In
2005, we had a flag for USE=spf in spamassassin that was supposed to
control whether or not spamassassin used SPF.

Without disabling the plugin, how would that work? If the user happens
to install Mail::SPF as a dependency of something else and if the
plugin is *not* disabled, spamassassin will (surprise!) start using SPF
against the user's wishes.

There's no reason for it today because there's no USE=spf flag for
spamassassin, and it wasn't implemented very well back in 2005 (only
certain plugins should have been disabled, and only conditionally). But
the idea isn't as crazy as it first sounds.
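
The runtime-detection problem described above can be sketched in Python. This is only an illustration, not Gentoo's or SpamAssassin's actual logic: the function names are hypothetical, and the standard-library `json` module stands in for an optional dependency (such as Mail::SPF) that some other package happened to install.

```python
import importlib.util


def autodetect_only(module_name: str) -> bool:
    """Feature turns on whenever the optional module happens to be installed."""
    return importlib.util.find_spec(module_name) is not None


def feature_enabled(module_name: str, user_wants_it: bool) -> bool:
    """Feature turns on only if the user opted in AND the module is installed."""
    return user_wants_it and autodetect_only(module_name)


# The user never asked for the feature, but the module is present anyway.
print(autodetect_only("json"))         # True: the feature silently switches on
print(feature_enabled("json", False))  # False: the explicit switch is respected
```

Pure autodetection is the "surprise!" case; an explicit switch (here a boolean, in the 2005 ebuild a disabled plugin) is what lets the user's choice win.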


Re: My 10 years old domain have a bad TLD

2021-05-04 Thread Michael Orlitzky
On Tue, 2021-05-04 at 08:28 +0200, Denis Chenu wrote:
> Yes,
> 
> You receive spam from pro and then all pro gTLD owner received a punishment.
> 
> It's same for all gTLDS, like the old teachers who punish a whole school 
> class.
> 

You're right, but as someone who blocks .pro I don't care anymore.
I've wasted half my life fighting assholes who make money by wasting my
time. To a few decimal points, 100% of the mail we get from .pro
domains is spam. I don't care about right or wrong, I just want the
spam to stop, and blocking all of .pro is the easiest way to do that.
You can email postmaster@ to be whitelisted if you're legitimate.




Constructive solution to the blacklist thread

2020-07-23 Thread Michael Orlitzky
I'd like to offer a constructive solution to the blacklist/whitelist
argument to the Apache foundation and Kevin in particular.

There is opposition to this change on at least two fronts:

  * Philosophical: the change does nothing to address the underlying
    political problems. Black people are asking not to be murdered;
    changing "blacklist" to "blocklist" as the sole response is
    insulting and transparently virtue signaling.

  * Practical: the gesture costs the Apache foundation nothing, because
    the "gift" is paid for by the labor of the users who have to
    reconfigure their systems.

Whether or not you agree with those bullet points, here's what I propose
to address them...

The Apache foundation has some cash laying around. Make whatever wording
changes you like, but **at the same time**, donate a meaningful amount
of money to a cause like the ACLU or the defense/medical funds for the
protestors. This addresses the bullet points above:

  * The donation is of real value to the people who receive it, and
    addresses the underlying problem in that it helps the people who are
    themselves helping in more direct ways.

  * The donation is also of value to the donor, so cannot be considered
    a token gesture.

This will not be free for users: we will all still have to reconfigure
our systems. But if that "wasted" time actually helps the stated cause,
then it's no longer wasted. Knowing that an hour in my text editor may
have helped someone get out of jail or replace an eyeball shot out by a
federal goon makes it much more palatable. In other words, people might
still think it's stupid, but could be willing to suck it up if the
Apache foundation puts its money where its mouth is.

This surely won't please everyone, but it may be satisfactory to a
majority of people on both sides. Also, it will stop the email threads.



Re: IMPORTANT NOTICE FOR PEOPLE RUNNING TRUNK re: [Bug 7826] Improve language around whitelist/blacklist and master/slave

2020-07-10 Thread Michael Orlitzky
On 2020-07-10 20:02, Luis E. Muñoz wrote:
> 
> I keep hearing about these mythical people that get terribly offended by 
> the use of these words. I've been working in IT since the 90s, and I've 
> never actually seen one in real life. Do they really exist?
> 

What black people are asking for is to not be murdered. The idea to
change the word "blacklist" to "blocklist" instead as a consolation
prize comes solely from rich white folks, and is itself condescending
and offensive.

As with "all lives matter," it's possible to have the best of intentions
yet still come across as a patronizing douchebag.


Re: Spamhaus Technology contributions to SpamAssassin

2019-07-03 Thread Michael Orlitzky
On 7/3/19 5:43 AM, Riccardo Alfieri wrote:
> 
> You can find all the needed files here: 
> https://github.com/spamhaus/spamassassin-dqs
> 

Could I talk you into tagging a v0.0.1 release? That would make it
easier for us to create a system package for the new plugin.


Re: sa-update is broken on updates.spamassassin.org channel [was: Re: config: warning: description exists for non-existent rule EXCUSE_24]

2018-12-21 Thread Michael Orlitzky

On 12/21/18 5:52 PM, Bill Cole wrote:


> Fine:
>
> #!/bin/sh
> cd `mktemp -d -t HappyMichael???`



Yes, Merry Christmas =P



Re: sa-update is broken on updates.spamassassin.org channel [was: Re: config: warning: description exists for non-existent rule EXCUSE_24]

2018-12-21 Thread Michael Orlitzky

On 12/20/18 7:00 PM, Bill Cole wrote:


> mkdir /tmp/saupdate-1849156


Never use a fixed path under /tmp =)
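
A fixed name under /tmp is predictable, so another local user can create the directory (or a symlink with that name) first. As a minimal sketch of the alternative, Python's `tempfile.mkdtemp` picks an unpredictable name and creates the directory atomically; the `saupdate-` prefix is just an illustrative choice, and `mktemp -d` is the shell counterpart.

```python
import os
import stat
import tempfile

# Create a fresh directory with a random suffix; it fails rather than
# reusing a directory that somebody else created in advance.
workdir = tempfile.mkdtemp(prefix="saupdate-")

# The directory is created readable, writable, and searchable only by us.
mode = stat.S_IMODE(os.stat(workdir).st_mode)
print(workdir, oct(mode))

# ... do the work inside workdir, then clean up ...
os.rmdir(workdir)
```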



Re: spamd Will Not Create unix:socket

2017-11-28 Thread Michael Orlitzky
On 11/27/2017 10:34 PM, Colony.three wrote:
>> ExecStartPre=/bin/chown -R spamd:spamd /run/spamassassin
>>
>> There's a root exploit for the "spamd" user in that last line. Assuming
>> you got the tmpfiles.d thing working, you should delete those
>> ExecStartPre commands.
> 
> Can you explain further please?
> 
> If this is true, someone should tell Red Hat that their
> /usr/lib/systemd/system/spamass-milter-root.service has the same problem.
> 

The "chown" command follows both symlinks and hardlinks by default. When
used with the "-R" flag, it only follows hardlinks, but that can still
be abused by the "spamd" user. The first time "chown -R" gets executed,
you give ownership of /run/spamassassin to the "spamd" user. The second
(and third, ...) time that the service is started, the "spamd" user owns
that directory and can place a hard link in it pointing to a root-owned
file. The "chown" call will then give root's file to the "spamd" user.

The exploit is trickier in this case because /run is on a tmpfs, and
because hard links can't cross filesystem boundaries. But I would bet
that you have something else sensitive in /run that can be used to gain
root.
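
The mechanism can be illustrated without root privileges. The sketch below uses hypothetical file names inside a scratch directory and only shows that a hard link is the same inode as its target, which is why changing the ownership of the link changes the ownership of the original file. Kernels with the fs.protected_hardlinks sysctl enabled refuse some such links, so treat this as an illustration of the idea rather than a working exploit.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()

# Stand-in for a sensitive file that lives outside the service directory.
sensitive = os.path.join(base, "sensitive-file")
with open(sensitive, "w") as f:
    f.write("secret\n")

# Stand-in for /run/spamassassin, which the service user controls.
service_dir = os.path.join(base, "run-spamassassin")
os.mkdir(service_dir)

# The service user plants a hard link to the sensitive file.
planted = os.path.join(service_dir, "innocent-name")
os.link(sensitive, planted)

# Both names refer to one inode, so a recursive chown of service_dir
# would change the owner of the sensitive file too.
same_inode = os.stat(sensitive).st_ino == os.stat(planted).st_ino
link_count = os.stat(sensitive).st_nlink
print(same_inode, link_count)

shutil.rmtree(base)
```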


Re: spamd Will Not Create unix:socket

2017-11-27 Thread Michael Orlitzky
On 11/27/2017 11:53 AM, Colony.three wrote:
> 
> It simply would not create /run/spamassassin directory on boot.  It is
> supposed to create it automatically like clamd does, since /run is wiped
> at each boot.  To make it work I finally had to add:
> ExecStartPre=/usr/bin/mkdir /run/spamassassin
> ExecStartPre=/bin/chown -R spamd:spamd /run/spamassassin
> 

There's a root exploit for the "spamd" user in that last line. Assuming
you got the tmpfiles.d thing working, you should delete those
ExecStartPre commands.


Re: Oracle Eloqua.com marketing emails

2017-10-22 Thread Michael Orlitzky
On 10/22/2017 09:31 AM, David Jones wrote:
> 
> You hard-coded the IPs based on their current SPF record?  What if 
> things change and they start sending out different servers/IPs?

If they add IPs, then either,

  a) I never know because we don't get spam from them -- great.
  b) We get spam from them, and I track down and block the new IPs.

If they release some of their IPs on the market, whoever buys them will
have to complain (our postmaster address is in the rejection message).


Re: Oracle Eloqua.com marketing emails

2017-10-22 Thread Michael Orlitzky
On 10/21/2017 11:23 AM, David Jones wrote:
> Anyone have any experience with eloqua.com marketing emails and handle 
> these with custom local rules?

We blocked some of their space back in 2013 with no complaints, and
thanks to their SPF record, just blocked a bunch more.


Re: apache.org have URIBL_BLOCKED now :/

2017-08-08 Thread Michael Orlitzky
On 08/08/2017 02:32 PM, Benny Pedersen wrote:
> subj might concern infra staff
> 
> forward please to infra
> 

URIBL_BLOCKED means that the URIBL refused your DNS query:

  http://uribl.com/refused.shtml

The name "apache.org" isn't blacklisted, and there's nothing apache can
do to fix it. You need to make your DNS queries from somewhere else,
probably.


Re: Uninitialized values in URIDNSBL

2017-02-08 Thread Michael Orlitzky
On 02/08/2017 02:08 PM, Kevin A. McGrail wrote:
> On 2/8/2017 1:22 PM, Philip Prindeville wrote:
>> While we’re waiting for that, can I just grab Util.pm and 
>> Plugin/URIDNSBL.pm out of trunk, or are there more dependencies than 
>> that to splice the fix back into 3.4.1?
> I wouldn't be able to say.  Either custom patch or run trunk would be my 
> recommendation.
> 

I posted a custom patch to our Gentoo bug at

  https://590338.bugs.gentoo.org/attachment.cgi?id=452626

But as the warning in the comment states:

  * I don't know perl.
  * I haven't even tried it.

Give it a try if you're desperate =)



Re: Legit Yahoo mail servers list

2017-01-26 Thread Michael Orlitzky
On 01/26/2017 02:53 PM, David Jones wrote:
> 
> I  understand what their SPF record means and how it works
> but what they are publishing in their SPF record is not common.
> Normally this would expand out to a list of IPs and CIDRs or DNS
> records that can be turned into IPs that postwhite can use to build
> a list for bypassing RBL checks.
> 

Are the problematic RBL checks performed by Postfix, or by SpamAssassin?

The possibilities for whitelisting in SpamAssassin are a lot more
flexible, so if I were you, I would tweak postscreen (or my smtpd
restrictions) to the point where it causes no false positives. Then
SpamAssassin can be configured to do the same level of RBL checks that
are occasionally causing false positives now. The double lookups aren't
expensive because they're cached locally. And the false positives are
easy to deal with in SA, where for example you have access to the result
of SPF.

If you can get it to the point where SA is the one blocking Yahoo, then
all you have to do is add a meta rule that subtracts a few points when
the sender's domain belongs to Yahoo and the SPF_PASS rule hits.



Re: Legit Yahoo mail servers list

2017-01-26 Thread Michael Orlitzky
On 01/26/2017 01:29 PM, Reindl Harald wrote:
> 
> SPF_NEUTRAL will NEVER hit SPF_PASS and that's the problem with ?all
> 

SPF mechanisms are evaluated in order, and each one has a result type
associated with it. The default result is "+" for "pass". Another type
of result is "?" for "neutral."

The record,

  v=spf1 ptr:yahoo.com ptr:yahoo.net ?all

is equivalent to

  v=spf1 +ptr:yahoo.com +ptr:yahoo.net ?all

and it means

  a) PASS if "ptr:yahoo.com" matches
  b) PASS if "ptr:yahoo.net" matches
  c) NEUTRAL if "all" matches
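
The in-order evaluation can be sketched in a few lines of Python. This is a simplified model, not a real SPF implementation: modifiers such as `redirect=` are skipped, and the `matches` callback is a hypothetical stand-in for the DNS work that decides whether a mechanism matches the sending host.

```python
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def parse_term(term):
    """Split a term into (result, mechanism); a missing qualifier means '+'."""
    if term[0] in QUALIFIERS:
        return QUALIFIERS[term[0]], term[1:]
    return "pass", term


def evaluate(record, matches):
    """Return the result of the first mechanism for which matches() is true."""
    for term in record.split()[1:]:      # skip the leading "v=spf1"
        if "=" in term:                  # ignore modifiers in this sketch
            continue
        result, mechanism = parse_term(term)
        if matches(mechanism):
            return result
    return "neutral"                     # nothing matched at all


record = "v=spf1 ptr:yahoo.com ptr:yahoo.net ?all"

# A host whose PTR is under yahoo.net, and a host from somewhere else.
yahoo_net_host = lambda m: m in ("ptr:yahoo.net", "all")
unrelated_host = lambda m: m == "all"

print(evaluate(record, yahoo_net_host))  # pass
print(evaluate(record, unrelated_host))  # neutral
```

The "?all" at the end is only reached when neither ptr mechanism matched, so it never turns a PASS into a NEUTRAL.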



Re: Legit Yahoo mail servers list

2017-01-26 Thread Michael Orlitzky
On 01/26/2017 12:59 PM, Reindl Harald wrote:
> 
> 
> Am 26.01.2017 um 18:51 schrieb Michael Orlitzky:
>> On 01/26/2017 12:22 PM, David Jones wrote:
>>> ...
>>> They don't publish a good SPF record so I am not able to add
>>> them to my postwhite list.
>>>
>>
>> Isn't that what their SPF record does?
> 
> did you notice the "?all"
> re-read your spf manuals
> 

The OP is looking for a way to whitelist so the "?all" is irrelevant.
Does the sending IP pass the SPF check? If so, whitelist it.



Re: Legit Yahoo mail servers list

2017-01-26 Thread Michael Orlitzky
On 01/26/2017 12:22 PM, David Jones wrote:
> Anyone know how to get a list of legit mail servers for Yahoo?
> They don't publish a good SPF record so I am not able to add
> them to my postwhite list.
> 
> # dig yahoo.com txt +short
> "v=spf1 redirect=_spf.mail.yahoo.com"
> # dig _spf.mail.yahoo.com txt +short
> "v=spf1 ptr:yahoo.com ptr:yahoo.net ?all"
> 
> The only way I can think of even coming close is to analyse
> my mail logs for clean mail IPs with PTR values ending in
> yahoo.com and yahoo.net. 

Isn't that what their SPF record does?




Re: T_DKIM_INVALID from yahoo.com

2016-12-24 Thread Michael Orlitzky
On 12/24/2016 11:05 AM, Ian Zimmerman wrote:
> All mail I get from yahoo customers [1] scores on T_DKIM_INVALID, and
> always has.  Why?
> 

Is there any correlation between the DKIM result and the size of the
message?




Re: Matching infinite sets

2016-08-22 Thread Michael Orlitzky
On 08/22/2016 09:02 AM, Joe Quinn wrote:
> On 8/22/2016 8:54 AM, Michael Orlitzky wrote:
>> On 08/21/2016 03:22 PM, Damian wrote:
>>> There is no such set B, as it would contain itself.
>> The empty set contains itself.
> That's an easy mistake to make. The empty set is {}, the set that
> contains only the empty set is {{}}. Sets are discrete elements that
> don't get "flattened".
> 
> In perl syntactic lists do get flattened though, which leads to some fun
> times. You can do silly things like @concatenated = (@listOne, @listTwo).

"Contains" in the context of sets means "is a superset of" =)

(I'm just being pedantic, I don't actually have a point.)



Re: Matching infinite sets

2016-08-22 Thread Michael Orlitzky
On 08/21/2016 03:22 PM, Damian wrote:
>>
> There is no such set B, as it would contain itself.

The empty set contains itself.



Re: Disabling spamcop plugin

2016-04-13 Thread Michael Orlitzky
On 04/13/2016 09:50 AM, Reindl Harald wrote:
> 
> enough problems by wasting time if you have to maintain 10, 20, 30 or 
> more servers and in case of problems need fast downgrades - especially 
> if you run virtual machines where all the compile jobs share hardware

emerge --buildpkg will create a binary package that you can instantly
downgrade to with emerge --usepkg


> besides that on a production server no compilers should be installed at 
> all - the generation of malware which compiles itself is only a question 
> of time

I'm not convinced that an attacker who can execute commands on your
server is more dangerous when one of those commands is `gcc`.


> 
> what gentoo would need to solve for professional environments is that 
> you have one machine which pulls the updates, compiles them and package 
> them in a way all other machines in the network can pull and apply them 
> in precompiled form over ftp, http or whatever network protocol
> 

As you wish:

  https://wiki.gentoo.org/wiki/Binary_package_guide



Re: [OT] still configuring [Was: Disabling spamcop plugin]

2016-04-13 Thread Michael Orlitzky
On 04/13/2016 01:26 AM, Ian Zimmerman wrote:
> On 2016-04-12 10:57 -0400, David Niklas wrote:
> 
>> You could use Gentoo, you get to configure it all yourself!
> 
> Funny you'd say that, I _am_ actually switching to it - on my
> "workstation" role computers.  I'm already over 50% over the hump, I
> think. 
> 
> But on "server type" computers, I just cannot spare a dedicated security
> branch.  I really don't have the time, and more importantly the nerves,
> to scramble and recompile the world when each new vulnerability is
> announced.
> 

This shouldn't be worse on Gentoo than it is anywhere else. We have a
mailing list, gentoo-announce [0], where security advisories get sent.
But, they only get sent out once the vulnerability has been fixed and
marked stable /everywhere/, so they often come a little late.
Nevertheless, security issues are fixed ASAP:

  1. Some vulnerability is found.

  2. The security team opens a bug, and contacts the maintainer of the
     affected package.

  3. A fix is committed to the tree.

  4. The arch teams scramble to stabilize the version with the fix.

  5. The announcement is sent out.

As long as you follow a semi-regular update cycle, you shouldn't have to
do anything special, even if you run a stable system. The affected
package will be recompiled automatically as part of the updates. Any
packages *depending on* that package (like, if they're statically linked
to it) will also be recompiled. No need to recompile @world.


[0] https://www.gentoo.org/get-involved/mailing-lists/



Re: Rejecting without backscatter (was Re: Spamassassin not catching spam (Follow-up))

2015-03-26 Thread Michael Orlitzky
On 03/26/2015 08:43 AM, David F. Skoll wrote:
> On Thu, 26 Mar 2015 12:09:58 +0100
> Reindl Harald h.rei...@thelounge.net wrote:
>
>> why in the world would a reject *before queue* trigger a backscatter
>> or bounce on my side?
>
> How do you do before-queue rejection of a message that is...
>
> 1) Directed to multiple recipients...
>
> 2) Some of which have different spam thresholds or have even opted-out?
>
> Solve that problem, and then I agree with you.  And saying "well, don't
> let different end-users have different settings" is not a solution.
> Neither is "tempfail all recipients but the first" so the message
> is transmitted one time for each recipient.
>


If one of your customer domains has non-default settings, give them
their own IP address and a separate MX record pointing to that address.
Then if a multi-recipient message is addressed to someone in that
domain, the sending MTA will split the message before sending it
(because it's headed to a different server, as far as the MTA knows).

Your pre-queue filter can then switch settings depending on the IP
address, and should satisfy your criteria above.
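
As a rough sketch of that switch, the pre-queue filter only needs a lookup keyed on the local IP address that accepted the connection. Everything here is hypothetical: the addresses come from the documentation range 192.0.2.0/24, and the thresholds and setting names are made up for illustration.

```python
DEFAULT_SETTINGS = {"filtering": True, "reject_threshold": 5.0}

# One listening address per customer domain with non-default settings.
SETTINGS_BY_LOCAL_IP = {
    "192.0.2.10": DEFAULT_SETTINGS,                               # shared MX
    "192.0.2.11": {"filtering": True, "reject_threshold": 8.0},   # lenient domain
    "192.0.2.12": {"filtering": False, "reject_threshold": 0.0},  # opted out
}


def settings_for(local_ip):
    """Pick the settings profile for the address the sender connected to."""
    return SETTINGS_BY_LOCAL_IP.get(local_ip, DEFAULT_SETTINGS)


def reject_before_queue(local_ip, spam_score):
    """One verdict per SMTP transaction, since all recipients share a profile."""
    settings = settings_for(local_ip)
    return settings["filtering"] and spam_score >= settings["reject_threshold"]


print(reject_before_queue("192.0.2.10", 6.2))  # True
print(reject_before_queue("192.0.2.11", 6.2))  # False
print(reject_before_queue("192.0.2.12", 6.2))  # False
```

Because the sending MTA has already split the message by MX, every recipient in a given transaction falls under the same profile, and a single accept-or-reject answer is enough.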

Obviously it's a little annoying to set up an MX for every such domain,
but you can charge a little PITA fee for domains that want special
treatment.



Re: PayPal spam filter?

2013-06-16 Thread Michael Orlitzky
On 06/16/2013 06:48 PM, Jason Haar wrote:
> Just a FYI but SA scores failures of ~all much stronger than it does
> for -all
>
> eg I just deliberately forged an email for my own domain and SA picked
> up the SPF hard failure and added 0.0 to the final score :-(
>
> The logic of the score is well documented, just shows how much SPF
> doesn't work
>
> http://spamassassin.1065346.n5.nabble.com/default-score-for-SPF-HELO-FAIL-too-low-td13894.html
>

The reasoning is sound. Softfail has a better ham/spam ratio than
hardfail. Which is beside the point -- SPF is not a spam filtering
mechanism. It prevents HELO/MAIL FROM forgery. If you don't want to
accept forgeries (this is independent of what you want to do with spam),
reject the hardfails.




Re: .pw / Palau URL domains in spam

2013-05-08 Thread Michael Orlitzky
(replying randomly in the thread)

We've been getting complaints about these, so while I don't like to
target a TLD indiscriminately, I think I'd like to add a few points to
mail from *.pw for a couple of months until things clear up.

What's the correct way to do this? A regexp on the from/return-path
headers? Or is something built-in?



Re: FROM_MISSP_* causing FPs

2012-11-29 Thread Michael Orlitzky
On 11/29/2012 05:43 PM, John Hardin wrote:
> On Thu, 29 Nov 2012, Kris Deugau wrote:
>
>> I've just had another couple of reports of false positives due to hits
>> on one or more of the FROM_MISSP_* rules.
>>
>> Curious coincidence:  Almost all of the reports to date have involved
>> webform email for real estate companies.  Most of the rest have involved
>> scan-to-email multifunction devices - mostly Xerox used by real
>> estate companies.  O_o
>
> Is there any possibility of getting user agent headers for these FPs? If a 
> particular piece of legit software always does this then obviously those 
> rules should ignore such messages.
>

I had one guy actually read the rejection message and contact
postmaster@ about this.

His sig shows:

  Sent from my MOTOROLA ATRIX™ 2 on AT&T

And the headers:

  X-Spam-Flag: NO
  X-Spam-Score: 4.224
  X-Spam-Level: 
  X-Spam-Status: No, score=4.224 required=5 tests=[FREEMAIL_FROM=0.001,
  FROM_MISSP_EH_MATCH=2.499, FROM_MISSP_FREEMAIL=1.723,
  HTML_MESSAGE=0.001] autolearn=disabled
  From: u...@example.comu...@example.com
  X-Mailer: Motorola android mail 1.0

It was relayed through AOL, who you'd think would clean that up. This
particular model also base64 encodes the entire message...


Re: Claims manager / LOTTO_AGENT

2012-11-08 Thread Michael Orlitzky
On 11/08/2012 10:44 AM, John Hardin wrote:

>> This is a client of ours (a law firm) and not the company that I work
>> for. *I* know there's probably nothing sensitive in there, but just to
>> cover my ass I'd need to get permission to send the results off-site.
>
> Only the list of rules which hit is publicly visible, the actual content 
> of the message is not. Any leakage of confidential information is very 
> unlikely.

I know, but the chance isn't zero. For example, I wouldn't want to
mass-check a corpus of emails to my girlfriend, and have it report that
they hit LOTS_OF_VIAGRA.

Likewise, things like LOTTO_AGENT can reveal that someone communicated
with a claims manager. I've explained both sides, and as long as it's a
non-zero chance, they aren't having it. It isn't even that there's a
risk of leaking anything -- the fact that anything at all is sent could
be used as justification for a pain-in-the-ass investigation that nobody
wants.


>> From their perspective, it's just simpler to say no: it's not worth the
>> time or effort to even think about if there's a minute chance of it
>> coming back to bite them legally.
>
> I will take a look at "claims manager" in the 419 rules.
>

I appreciate it, thanks.


Claims manager / LOTTO_AGENT

2012-11-07 Thread Michael Orlitzky
So, LOTTO_AGENT will hit the string Claims Manager for 3.5 points.
This is bad news for,

  Barbara R. Krieg, Claims Manager
  Foodliner, Inc. / Quest Liner / Truck Country P.O. Box 1565 Dubuque,IA

who has a signature at the bottom of her messages.

This is compounded by the fact that

  ADVANCE_FEE_2_NEW_MONEY = __ADVANCE_FEE_2_NEW_MONEY && ...
  __ADVANCE_FEE_2_NEW_MONEY = LOTS_OF_MONEY && __ADVANCE_FEE_2_NEW
  __ADVANCE_FEE_2_NEW  = (__AFRICAN_STATE + ... + LOTTO_AGENT + ... > 1)

for a total score of around 7.8. Believe it or not, claims managers talk
about LOTS_OF_MONEY =)
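
To make the chain concrete, here is a small Python model of the meta rules quoted above, reading the operators in the listing as `&&` and `> 1` (an assumption). It ignores the elided conditions, and `__SOME_OTHER_SUBRULE` is a hypothetical placeholder for whichever elided sub-rule also hit.

```python
def advance_fee_2_new(subrule_hits):
    """True when more than one of the counted sub-rules hit."""
    return sum(1 for hit in subrule_hits.values() if hit) > 1


def advance_fee_2_new_money(lots_of_money, subrule_hits):
    """True when LOTS_OF_MONEY hit and at least two sub-rules hit."""
    return lots_of_money and advance_fee_2_new(subrule_hits)


# A claims manager's signature trips LOTTO_AGENT; one more sub-rule hit
# plus any talk of large sums is enough to fire the meta rule as well.
hits = {
    "__AFRICAN_STATE": False,
    "LOTTO_AGENT": True,
    "__SOME_OTHER_SUBRULE": True,  # hypothetical placeholder
}
print(advance_fee_2_new_money(True, hits))  # True

# With LOTTO_AGENT as the only sub-rule hit, the meta rule stays quiet.
only_lotto = {"__AFRICAN_STATE": False, "LOTTO_AGENT": True}
print(advance_fee_2_new_money(True, only_lotto))  # False
```

The point of the model is that LOTTO_AGENT is counted twice in practice: once for its own score and once as one of the hits that pushes the meta rule over its threshold.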

Can one of these be made a little more strict? Sorry to be a pain and
submit these one at a time, but most of the ones that give me trouble
are confidential.


Re: Claims manager / LOTTO_AGENT

2012-11-07 Thread Michael Orlitzky
On 11/07/2012 09:49 PM, dar...@chaosreigns.com wrote:
> On 11/07, Michael Orlitzky wrote:
>> So, LOTTO_AGENT will hit the string Claims Manager for 3.5 points.
>> This is bad news for,
>>
>>   Barbara R. Krieg, Claims...
>
> When you put a string in an email that hits a spamassassin rule... your
> email then hits that spamassassin rule.  You should generally try to avoid
> that.
>

Yeah, well it's her job title, so...? You misunderstand statistics. The
data aren't wrong.


Re: Claims manager / LOTTO_AGENT

2012-11-07 Thread Michael Orlitzky
On 11/07/2012 10:12 PM, dar...@chaosreigns.com wrote:
> On 11/07, Michael Orlitzky wrote:
>> Yeah, well it's her job title, so...? You misunderstand statistics. The
>> data aren't wrong.
>
> Do I?  I think it's more likely that you misunderstand what is expected of
> spamassassin rules.
>

Sorry, I was a little rude. But saying that she shouldn't put her job
title anywhere in an email, ever, is ridiculous. The inputs (spam, ham)
to the classifier are assumed god-given; and the classification needs to
reflect the data, not the other way around.


> Somebody really should put up a page in the wiki explaining that rules all
> have false positives, and that's the entire reason we don't flag an email
> as spam for any one rule, etc..

Sure, that's why I pointed out that LOTTO_AGENT also helps trigger
ADVANCE_FEE_2_NEW_MONEY, and combined they score 7.8.


> But if you provide us with more masscheck data, we can do a better job of
> automatically calculating ideal scores.

This is my fault, of course, but I'm not allowed to mass-check this
stuff. It's ongoing legal correspondence.


Re: Claims manager / LOTTO_AGENT

2012-11-07 Thread Michael Orlitzky
On 11/07/2012 10:21 PM, dar...@chaosreigns.com wrote:
> On 11/07, Michael Orlitzky wrote:
>> On 11/07/2012 09:49 PM, dar...@chaosreigns.com wrote:
>>> On 11/07, Michael Orlitzky wrote:
>>>> So, LOTTO_AGENT will hit the string Claims Manager for 3.5 points.
>>>> This is bad news for,
>>>>
>>>>   Barbara R. Krieg, Claims...
>>>
>>> When you put a string in an email that hits a spamassassin rule... your
>>> email then hits that spamassassin rule.  You should generally try to avoid
>>> that.
>>
>> Yeah, well it's her job title, so...? You misunderstand statistics. The
>> data aren't wrong.
>
> After re-reading, I think you may have misunderstood my suggestion to avoid
> putting stuff in emails that is known to hit spam rules.  I wasn't
> suggesting that Barbara R. Krieg change her signature, I was suggesting
> that you not include it intact when posting to this mailing list about it.
>

I see. My apologies. Disregard the first half of that last message.


Re: Claims manager / LOTTO_AGENT

2012-11-07 Thread Michael Orlitzky
On 11/07/2012 10:36 PM, dar...@chaosreigns.com wrote:
> On 11/07, Michael Orlitzky wrote:
>> Sorry, I was a little rude. But saying that she shouldn't put her job
>> title anywhere in an email, ever, is ridiculous. 
>
> Certainly.
>
>> The inputs (spam, ham)
>> to the classifier are assumed god-given; and the classification needs to
>> reflect the data, not the other way around.
>
> If "the classifier" is spamassassin, and "The inputs" are the spam
> and ham data provided via masscheck, then... the scores provided via
> sa-update *do* reflect the data.  So I'm not sure what you mean.
>
> The ideal rule scores are chosen to cause one false positive (ham flagged
> as spam) in every 2,500 hams, while maximizing the number of spams
> correctly flagged as spams.  With so few hams hitting this rule in the
> masscheck corpora, we're way below that threshold based on the data we
> have.
>

I wrote that before I saw your clarification, sorry again for coming off
as a jerk. Ignore it.


>> This is my fault, of course, but I'm not allowed to mass-check this
>> stuff. It's ongoing legal correspondence.
>
> Er, what?  You're not allowed to provide a list of which rules hit each
> of your emails?  Or you're not allowed to run a program on your emails
> that isn't spamassassin?  Or did I just not put "This does not require
> sending us your email" in bold enough times on the masscheck page?
>

This is a client of ours (a law firm) and not the company that I work
for. *I* know there's probably nothing sensitive in there, but just to
cover my ass I'd need to get permission to send the results off-site.
From their perspective, it's just simpler to say no: it's not worth the
time or effort to even think about if there's a minute chance of it
coming back to bite them legally.


Re: Overlay between BILLION_DOLLARS and US_DOLLARS_3

2012-09-07 Thread Michael Orlitzky
On 09/07/2012 02:36 PM, Kevin A. McGrail wrote:
 On 9/6/2012 11:32 AM, Michael Orlitzky wrote:
 On 09/06/2012 06:16 AM, Kevin A. McGrail wrote:
 With no examples in corpora and good s/o's, i think mass check is likely
 to score the rule high which brings us back to the same point. I did
 consider that though.
 Regards,
 KAM
 I admit my initial instinct was what Jari suggested, but I defer to your
 expertise =)
 Let's see what masscheck shows:
 
 svn commit -m 'Added overlap meta rule for BILLION_DOLLARS and 
 US_DOLLARS_3' rulesrc
 Adding rulesrc/sandbox/kmcgrail/20_kam.cf
 Transmitting file data .
 Committed revision 1382118.

Thanks for taking the time to do this.



Re: Overlay between BILLION_DOLLARS and US_DOLLARS_3

2012-09-06 Thread Michael Orlitzky
On 09/06/2012 06:16 AM, Kevin A. McGrail wrote:
 With no examples in corpora and good s/o's, i think mass check is likely
 to score the rule high which brings us back to the same point. I did
 consider that though.
 Regards,
 KAM

I admit my initial instinct was what Jari suggested, but I defer to your
expertise =)


 Jari Fredriksson ja...@iki.fi wrote:
 
 how about
 
 One RULE that will trigger and add score, if one or both of
 BILLION_DOLLARS and/or US_DOLLARS_3 was hit. BILLION_DOLLARS and
 US_DOLLARS_3 would not have a score, only the resulting rule, which
 triggers separately if one of those is true.
 
 Those overlap arrangements seem like a kludge to me...


Re: Sensitivity of FILL_THIS_FORM_SHORT (score: 2.556)

2012-09-06 Thread Michael Orlitzky
On 09/05/2012 01:07 PM, John Hardin wrote:
 On Wed, 5 Sep 2012, Michael Orlitzky wrote:
 
 My recent logwatch reports show it hitting more ham than spam,
 
 If you could send me offline the rule hits for the hams it's hitting at 
 your site that would help. That should be an easy grep of your maillog.
 
 Its primary use is for metas (e.g. a short fill-in form plus mention of 
 millions of dollars _is_ reasonably suspicious); its apparent utility as a 
 standalone rule may be artificially emphasized in masschecks if the 
 masscheck corpus is deficient in ham that includes short fill-in forms.
 

I'll still send my logs, but I was wrong about which subrule was causing
trouble. It's,

  meta __FILL_THIS_FORM_SHORT ...
  (__FILL_THIS_FORM_PARTIAL > 2 || __FILL_THIS_FORM_PARTIAL_RAW > 2)

  body __FILL_THIS_FORM_PARTIAL
  /^\s?<FF_LNNO>?<FF_YOUR>(?:<FF_ALL><ANDOR>?){1,3}<FF_SUFFIX>
  (?:<FF_BLANK1>|(?:[-=_.,:;*\s]|=20){1,4}$)/im

In my previous message I mentioned that FF_YOUR and FF_SUFFIX always
match, so this is just,

  (?:<FF_ALL>){1,3}(?:<FF_BLANK1>|(?:[-=_.,:;*\s]|=20){1,4}$)

But those matches can show up anywhere in the body, not necessarily
adjacent to one another. I ran with -D on the message that brought this
to my attention, and this is what triggered it:

  Sep  6 11:58:36.169 [10858] dbg: rules: ran body rule
  __FILL_THIS_FORM_PARTIAL == got hit: Address:
  Sep  6 11:58:36.169 [10858] dbg: rules: [...] 

  Sep  6 11:58:36.170 [10858] dbg: rules: ran body rule
  __FILL_THIS_FORM_PARTIAL == got hit: Address:
  Sep  6 11:58:36.170 [10858] dbg: rules: [...] 

  Sep  6 11:58:36.189 [10858] dbg: rules: ran body rule
  __FILL_THIS_FORM_PARTIAL == got hit: Address:
  Sep  6 11:58:36.189 [10858] dbg: rules: [...] 

  Sep  6 11:58:36.191 [10858] dbg: rules: ran body rule
  __FILL_THIS_FORM_PARTIAL == got hit: Address:
  Sep  6 11:58:36.191 [10858] dbg: rules: [...] 


Just 2 mentions of an address, anywhere in the message.
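
To illustrate why that is so easy to trigger, here is a rough Python sketch
of a count-based meta. The pattern is a hypothetical, much-simplified
stand-in for the expanded __FILL_THIS_FORM_PARTIAL, the sample message is
made up, and the threshold is a parameter rather than the real value. It
only shows that hits are counted wherever they occur, however far apart:

```python
import re

# Hypothetical stand-in for the expanded subrule: a form label at the
# start of a line, followed only by "blank" filler characters.
PARTIAL = re.compile(r"^\s?(?:address|name|phone):[-=_.,:;*\s]{0,4}$",
                     re.IGNORECASE | re.MULTILINE)

def count_hits(body):
    """Count every match in the body, regardless of position."""
    return len(PARTIAL.findall(body))

def meta_fires(body, threshold):
    """Mimic a meta of the form `SUBRULE > threshold`."""
    return count_hits(body) > threshold

# Made-up ham: two unrelated "Address:" labels, far apart.
ham = """Dear Mr. Smith,

Please confirm the following before the hearing.

Address:
123 Main Street

(twenty paragraphs of unrelated text)

Billing
Address:
PO Box 456
"""

print(count_hits(ham))      # number of label lines found
print(meta_fires(ham, 1))   # does the count exceed a threshold of 1?
```

Nothing in the count knows whether the labels belong to the same form, which
is the problem described above.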


Overlay between BILLION_DOLLARS and US_DOLLARS_3

2012-09-05 Thread Michael Orlitzky
These two rules seem to have significant overlap:

  BILLION_DOLLARS /[BM]ILLION DOLLAR/

and,

  US_DOLLARS_3 /(?:\$|usd).?\d{1,3}[,.]\d{3}[,.]\d{3}(?:[,.]\d\d)?/i

will both match e.g.

  (a)Comprehensive General Liability insurance with a minimum
  combined single limit of not less than ONE MILLION DOLLARS
  ($1,000,000) for each occurrence.

which comes up frequently in contracts, insurance documents, EULAs, etc.
-- all of which then start out with a score of around 4.

Does it make sense to apply them both? Or should BILLION_DOLLARS just be
one of the US_DOLLARS patterns?
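
For anyone who wants to check the overlap, a minimal Python sketch follows.
It uses the two patterns as quoted above, with the optional cents group
written as two digits (an assumption about the rule text), and the contract
sentence joined onto one line. The helper name rules_hit is made up:

```python
import re

# BILLION_DOLLARS is case-sensitive as quoted; US_DOLLARS_3 carries /i.
BILLION_DOLLARS = re.compile(r"[BM]ILLION DOLLAR")
US_DOLLARS_3 = re.compile(
    r"(?:\$|usd).?\d{1,3}[,.]\d{3}[,.]\d{3}(?:[,.]\d\d)?",  # cents group assumed
    re.IGNORECASE,
)

RULES = [("BILLION_DOLLARS", BILLION_DOLLARS), ("US_DOLLARS_3", US_DOLLARS_3)]

def rules_hit(text):
    """Return the names of the rules whose pattern occurs in the text."""
    return [name for name, pattern in RULES if pattern.search(text)]

sample = ("(a) Comprehensive General Liability insurance with a minimum "
          "combined single limit of not less than ONE MILLION DOLLARS "
          "($1,000,000) for each occurrence.")

print(rules_hit(sample))
```

The words and the parenthesized figure state the same amount once, but each
rule matches its own half of the phrase.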


Sensitivity of FILL_THIS_FORM_SHORT (score: 2.556)

2012-09-05 Thread Michael Orlitzky
It looks to me like this score is much too high given how easy it is to
match. FILL_THIS_FORM_SHORT matches either __FILL_THIS_FORM_SHORT1 or
__FILL_THIS_FORM_SHORT2, and the second is more lenient:

  body __FILL_THIS_FORM_SHORT2
  /(?:<FF_YOUR><FF_ALL><FF_SUFFIX>(?:<FF_BLANK2>|<ANDOR>)){3}/i

which contains...

  replace_tag FF_YOUR
  (?:a?\s?copy\sof\s)?
  (?:(?:your|din|seu)[\s,:]{1,5})?
  (?:present\s|c[uo]rrent\s|full(?:st[\xe4]ndigt)?\s?|complete\s|direct
   \s|private?\s|valid\s|personal\s|nuvarande\s|vollst[\xe4]ndige
   \s|aktuelle\s){0,3}

Optional group, optional group, and a match on zero occurrences. The
entire thing is optional. So __FILL_THIS_FORM_SHORT2 can be reduced to,

  /(?:<FF_ALL><FF_SUFFIX>(?:<FF_BLANK2>|<ANDOR>)){3,}/i

First, let's look at FF_SUFFIX:

  FF_SUFFIX
  (?:\sin\s(?:full|words)|\scompleto)?:?(?:\s?[({][^)}]{1,30}[)}])?

Optional, optional. The whole thing is optional, so we can remove that,
too. All that's left is,

  /(?:<FF_ALL>(?:<FF_BLANK2>|<ANDOR>)){3,}/i

So all we're really matching is 3 or more occurrences of FF_ALL, and
that matches a lot of stuff. If I'm reading everything right, any
lengthy email is likely to hit it. My recent logwatch reports show it
hitting more ham than spam, which makes sense for something like
HTML_MESSAGE but not when it's scoring 2.5 points.
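
The "entire thing is optional" claim can be checked mechanically. Below is
a small Python sketch: FF_YOUR and FF_SUFFIX are transcribed from above,
while FF_ALL and TAIL are hypothetical stand-ins (the real FF_ALL, FF_BLANK2
and ANDOR tags are much longer). It checks that both tags match the empty
string and that, on one made-up input, the full pattern and the reduced one
both match:

```python
import re

# Transcribed from the replace_tag definitions quoted above.
FF_YOUR = (
    r"(?:a?\s?copy\sof\s)?"
    r"(?:(?:your|din|seu)[\s,:]{1,5})?"
    r"(?:present\s|c[uo]rrent\s|full(?:st[\xe4]ndigt)?\s?|complete\s|direct\s"
    r"|private?\s|valid\s|personal\s|nuvarande\s|vollst[\xe4]ndige\s"
    r"|aktuelle\s){0,3}"
)
FF_SUFFIX = r"(?:\sin\s(?:full|words)|\scompleto)?:?(?:\s?[({][^)}]{1,30}[)}])?"

# Hypothetical stand-ins, NOT the real tags.
FF_ALL = r"(?:name|address|phone)"
TAIL = r"(?:\s*[:,]\s*)"   # plays the role of (?:<FF_BLANK2>|<ANDOR>)

def matches_empty(pattern):
    """True if the pattern can match the empty string."""
    return re.fullmatch(pattern, "", re.IGNORECASE) is not None

full = re.compile("(?:" + FF_YOUR + FF_ALL + FF_SUFFIX + TAIL + "){3}", re.I)
reduced = re.compile("(?:" + FF_ALL + TAIL + "){3}", re.I)

text = "name, address, phone: "
print(matches_empty(FF_YOUR), matches_empty(FF_SUFFIX))
print(bool(full.search(text)), bool(reduced.search(text)))
```

Since the two tags can always match nothing, anything the reduced pattern
matches is also matched by the full one.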


Re: Overlay between BILLION_DOLLARS and US_DOLLARS_3

2012-09-05 Thread Michael Orlitzky
On 09/05/12 13:16, Kevin A. McGrail wrote:
 
 I think they both make sense since one checks for words and another 
 checks for numeric.
 
 We could discuss scoring though the S/O looks pretty good at

Agreed, it hits a lot more spam than ham here, too.


 
 I typically focus on score set 1 in my installations.  Which score set 
 are you using?
 

Same here.


 If you have Hams that hit this a lot, we might ask that you get involved 
 in our masscheck program to improve the scoring perhaps?

Nope, the only thing that looked suspicious to me was that both rules
would hit "ONE MILLION DOLLARS ($1,000,000)" for a score of ~4.

Individually, "ONE MILLION DOLLARS" should add some points, and so
should "$1,000,000". But if one is just clarifying the other, there's
really only one hit, yet it's getting scored twice.
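
To put numbers on that, here is a rough Python sketch of three ways to score
the pair. The individual scores are made up (all I know is that together
they come to about 4), and the function names are hypothetical:

```python
# Hypothetical per-rule scores; only the ~4 total comes from real reports.
SCORES = {"BILLION_DOLLARS": 2.0, "US_DOLLARS_3": 2.0}
PAIR = {"BILLION_DOLLARS", "US_DOLLARS_3"}

def additive(hits):
    """Current behavior: every rule that hits adds its own score."""
    return sum(SCORES[h] for h in hits)

def either_or(hits, score=2.0):
    """Unscored subrules plus one scored rule that fires if either hits."""
    return score if PAIR & set(hits) else 0.0

def with_overlap_meta(hits, overlap_score=-2.0):
    """Keep both scores, plus a meta that fires only when both hit
    and offsets the double count (overlap_score is an assumption)."""
    total = additive(hits)
    if PAIR <= set(hits):
        total += overlap_score
    return total

both = ["BILLION_DOLLARS", "US_DOLLARS_3"]
print(additive(both), either_or(both), with_overlap_meta(both))
```

With these made-up values the last two schemes charge the phrase once, and a
message that hits only one of the rules is scored the same as before.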