Re: Warning: Your Pyzor may be broken.
On 2024-06-08 14:45:34, Bill Cole wrote: > I went looking for a better fix and found a reported issue at > https://github.com/SpamExperts/pyzor/issues/155 matching my original > symptoms in which a workaround was provided: install directly from > the GitHub project's master.zip link, i.e. a snapshot assembled from > the current state of the repo, which claims to be v1.1.1. I do not > like that solution at all, and added a comment to that issue > suggesting that they fix the problem by cutting a release for > PyPI. No response yet, but it has only been a matter of minutes. The same issue was reported in 2016 and ignored for eight years before being closed out of frustration (rather than because they did something about it): https://github.com/SpamExperts/pyzor/issues/54
Re: Dinged for .Date
On Mon, 2024-01-15 at 17:06 -0800, Cabel Sasser wrote: > > There are 1,239 gTLDs. The SpamAssassin source* blocks just *22* of them. > The official unofficial KAM ruleset blocks a few more, and there are plenty of third-party URIBLs that essentially block gTLDs through SA, albeit at one level of abstraction. > If you believe every new gTLD is garbage (and I get that!), why isn’t > SpamAssassin automatically dinging, say, 1,200+ of them? > > Or put another way, why _these_ 22, and _only_ these 22, and not the rest? Be careful what you wish for :P
Re: Dinged for .Date
On Mon, 2024-01-15 at 15:58 -0800, Cabel Sasser wrote: > > Can anyone help me understand “the science”? And how these domains are chosen > for such a heavy punishment? What you're facing is essentially an economic problem. Everyone knows dot-com, and to a lesser extent dot-net and dot-org. But everything else is junk: if you're the fifth guy to try to buy example.com, you're probably not who people are looking for when they type www.example.com into their web browsers. The other TLDs are also much harder for people to remember if they see them in a commercial. As a result, dot-info, dot-biz, and everything after have always been considered knock-offs. When the wave of new gTLDs hit, the value of each successive one became diluted even further. By the time you get to dot-date, you're at what should be, like, somebody's 40th choice for a domain name. How do you sell that? At a huge fucking discount, if you want anyone to buy it! That's one half of your economic problem. Now imagine you're trying to block spammers by domain name, and there's one particular set of domain names that they can get at a 90% discount because nobody wants them otherwise. Regardless of how many legitimate companies use those domains, the signal to noise ratio is going to be crap. So, the other half of your economic problem is: how much money does it cost me (as a recipient) to block dot-date, versus how much does it cost me to not block it? We have customers who complain about spam and customers who complain about blocked messages. It's a pretty easy calculation for a recipient to make, and the result for me at least is that it's less work (i.e. less expensive) to just block every new gTLD and whitelist the few legitimate senders brave enough to live there.
Re: 4.0.0 dnsbl_subtests.t test failures
On Wed, 2022-12-28 at 16:44 +0200, Henrik K wrote:
>
> Doesn't look too good for Gentoo packaging though, if since 2009 v310.pre
> and newer have been full of all sorts of plugins loaded. It's like nobody
> actually cared since most of the stuff is useful. :-)
>

Nobody noticed until now, and now it's getting fixed. The intersection of,

  1. Gentoo users
  2. People who run their own mail server
  3. People who blindly run the default configuration on an important network-facing daemon

is pretty small. And given that changing it is likely to generate a few complaints, compared to the contented silence regarding the existing behavior, you can maybe understand why no one has tried to proactively fix it when it wasn't broken.
Re: 4.0.0 dnsbl_subtests.t test failures
On Wed, 2022-12-28 at 16:20 +0200, Henrik K wrote: > > Common sense would ask that how is SPF harmful for the user? One would > think it would be actually desirable like any other network lookups, that > user might have accidentally left disabled? But sure, if this is the Gentoo > way, so be it. I had enough of 90's linux flashbacks trying it for the > first and last time today. :-) > Well, SPF wasn't nearly as reliable in 2005 as it is now, and it pulls in an extra dependency. Probably the best answer is that by having this ability, Gentoo attracts the sort of user who likes to disable such things to save disk space, shave off a few CPU cycles, or improve security. And then there's a feedback loop wherein most of our users want to retain the ability to control what gets installed/enabled.
Re: 4.0.0 dnsbl_subtests.t test failures
On Wed, 2022-12-28 at 15:38 +0200, Henrik K wrote: > > Disabling default plugins solves nothing, just creates a worse experience > for user. Educating and guiding users to use DNS properly does not require > this. Gentoo builds everything from source and allows the user to enable/disable some options for each package, called USE flags. In the context of a C program, you might have USE=spf which would translate to an additional dependency on libspf2 and passing ./configure --enable-spf at build time to enable that feature. These map less well to scripting languages where features are often enabled at runtime based on the existence of some optional package. In 2005, we had a flag for USE=spf in spamassassin that was supposed to control whether or not spamassassin used SPF. Without disabling the plugin, how would that work? If the user happens to install Mail::SPF as a dependency of something else and if the plugin is *not* disabled, spamassassin will (surprise!) start using SPF against the user's wishes. There's no reason for it today because there's no USE=spf flag for spamassassin, and it wasn't implemented very well back in 2005 (only certain plugins should have been disabled, and only conditionally). But the idea isn't as crazy as it first sounds.
Re: My 10 years old domain have a bad TLD
On Tue, 2021-05-04 at 08:28 +0200, Denis Chenu wrote: > Yes, > > You receive spam from pro and then all pro gTLD owner received a punishment. > > It's same for all gTLDS, like the old teachers who punish a whole school > class. > You're right, but as someone who blocks .pro I don't care anymore. I've wasted half my life fighting assholes who make money by wasting my time. To a few decimal points, 100% of the mail we get from .pro domains is spam. I don't care about right or wrong, I just want the spam to stop, and blocking all of .pro is the easiest way to do that. You can email postmaster@ to be whitelisted if you're legitimate.
Constructive solution to the blacklist thread
I'd like to offer a constructive solution to the blacklist/whitelist argument to the Apache foundation and Kevin in particular. There is opposition to this change on at least two fronts: * Philosophical: the change does nothing to address the underlying political problems. Black people are asking not to be murdered; changing "blacklist" to "blocklist" as the sole response is insulting and transparently virtue signaling. * Practical: the gesture costs the Apache foundation nothing, because the "gift" is paid for by the labor of the users who have to reconfigure their systems. Whether or not you agree with those bullet points, here's what I propose to address them... The Apache foundation has some cash laying around. Make whatever wording changes you like, but **at the same time**, donate a meaningful amount of money to a cause like the ACLU or the defense/medical funds for the protestors. This addresses the bullet points above: * The donation is of real value to the people who receive it, and addresses the underlying problem in that it helps the people who are themselves helping in more direct ways. * The donation is also of value to the donor, so cannot be considered a token gesture. This will not be free for users: we will all still have to reconfigure our systems. But if that "wasted" time actually helps the stated cause, then it's no longer wasted. Knowing that an hour in my text editor may have helped someone get out of jail or replace an eyeball shot out by a federal goon makes it much more palatable. In other words, people might still think it's stupid, but could be willing to suck it up if the Apache foundation puts its money where its mouth is. This surely won't please everyone, but it may be satisfactory to a majority of people on both sides. Also, it will stop the email threads.
Re: IMPORTANT NOTICE FOR PEOPLE RUNNING TRUNK re: [Bug 7826] Improve language around whitelist/blacklist and master/slave
On 2020-07-10 20:02, Luis E. Muñoz wrote: > > I keep hearing about this mythical people that get terribly offended by > the use of these words. I've been working in IT since the 90s, and I've > never actually seen one in real life. Do they really exist? > What black people are asking for is to not be murdered. The idea to change the word "blacklist" to "blocklist" instead as a consolation prize comes solely from rich white folks, and is itself condescending and offensive. As with "all lives matter," it's possible to have the best of intentions yet still come across as a patronizing douchebag.
Re: Spamhaus Technology contributions to SpamAssassin
On 7/3/19 5:43 AM, Riccardo Alfieri wrote: > > You can find all the needed files here: > https://github.com/spamhaus/spamassassin-dqs > Could I talk you into tagging a v0.0.1 release? That would make it easier for us to create a system package for the new plugin.
Re: sa-update is broken on updates.spamassassin.org channel [was: Re: config: warning: description exists for non-existent rule EXCUSE_24]
On 12/21/18 5:52 PM, Bill Cole wrote: Fine: #!/bin/sh cd `mktemp -d -t HappyMichael???` Yes, Merry Christmas =P
Re: sa-update is broken on updates.spamassassin.org channel [was: Re: config: warning: description exists for non-existent rule EXCUSE_24]
On 12/20/18 7:00 PM, Bill Cole wrote: mkdir /tmp/saupdate-1849156 Never use a fixed path under /tmp =)
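For anyone scripting this outside of sh, here is a minimal Python sketch of the same advice: let the system pick an unpredictable directory name instead of hard-coding one under /tmp. The function name and the "saupdate-" prefix are made up for illustration; nothing here comes from sa-update itself.

```python
import os
import stat
import tempfile


def make_private_workdir(prefix="saupdate-"):
    """Create a fresh working directory with an unpredictable name.

    tempfile.mkdtemp() chooses a random suffix and creates the directory
    itself, so another local user cannot pre-create the path (or plant a
    symlink there) the way they could with a fixed name under /tmp.
    """
    return tempfile.mkdtemp(prefix=prefix)


if __name__ == "__main__":
    workdir = make_private_workdir()
    mode = stat.S_IMODE(os.stat(workdir).st_mode)
    print(workdir, oct(mode))
    os.rmdir(workdir)  # the caller is responsible for cleaning up
```

The caller still has to remove the directory when done; mkdtemp only handles safe creation.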
Re: spamd Will Not Create unix:socket
On 11/27/2017 10:34 PM, Colony.three wrote: >> ExecStartPre=/bin/chown -R spamd:spamd /run/spamassassin >> >> There's a root exploit for the "spamd" user in that last line. Assuming >> you got the tmpfiles.d thing working, you should delete those >> ExecStartPre commands. > > Can you explain further please? > > If this is true, someone should tell Red Hat that their > /usr/lib/systemd/system/spamass-milter-root.service has the same problem. > The "chown" command follows both symlinks and hardlinks by default. When used with the "-R" flag, it only follows hardlinks, but that can still be abused by the "spamd" user. The first time "chown -R" gets executed, you give ownership of /run/spamassassin to the "spamd" user. The second (and third, ...) time that the service is started, the "spamd" user owns that directory and can place a hard link in it pointing to a root-owned file. The "chown" call will then give root's file to the "spamd" user. The exploit is trickier in this case because /run is on a tmpfs, and because hard links can't cross filesystem boundaries. But I would bet that you have something else sensitive in /run that can be used to gain root.
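To see why a hard link is enough, here is a small Python sketch showing that a hard link is just a second name for the same inode, so metadata changed through one name is visible through the other. It uses chmod as an unprivileged stand-in for chown (ownership is stored in the same inode, but changing it needs root); the file names are hypothetical.

```python
import os
import stat
import tempfile


def demo_hardlink_shares_metadata():
    """Return (same_inode, mode_seen_via_original) after a chmod via the link."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original")
        link = os.path.join(d, "link")
        with open(original, "w") as f:
            f.write("data\n")
        os.chmod(original, 0o600)

        # Create a second directory entry pointing at the same inode.
        os.link(original, link)
        same_inode = os.stat(original).st_ino == os.stat(link).st_ino

        # Change permissions through the link only...
        os.chmod(link, 0o644)
        # ...and observe the change through the original name.
        mode = stat.S_IMODE(os.stat(original).st_mode)
        return same_inode, mode


if __name__ == "__main__":
    print(demo_hardlink_shares_metadata())
```

A recursive chown run as root behaves the same way with respect to ownership: whatever inode a directory entry points at gets the new owner, regardless of who created the entry.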
Re: spamd Will Not Create unix:socket
On 11/27/2017 11:53 AM, Colony.three wrote: > > It simply would not create /run/spamassassin directory on boot. It is > supposed to create it automatically like clamd does, since /run is wiped > at each boot. To make it work I finally had to add: > ExecStartPre=/usr/bin/mkdir /run/spamassassin > ExecStartPre=/bin/chown -R spamd:spamd /run/spamassassin > There's a root exploit for the "spamd" user in that last line. Assuming you got the tmpfiles.d thing working, you should delete those ExecStartPre commands.
Re: Oracle Eloqua.com marketing emails
On 10/22/2017 09:31 AM, David Jones wrote: > > You hard-coded the IPs based on their current SPF record? What if > things change and they start sending out different servers/IPs? If they add IPs, then either, a) I never know because we don't get spam from them -- great. b) We get spam from them, and I track down and block the new IPs. If they release some of their IPs on the market, whoever buys them will have to complain (our postmaster address is in the rejection message).
Re: Oracle Eloqua.com marketing emails
On 10/21/2017 11:23 AM, David Jones wrote: > Anyone have any experience with eloqua.com marketing emails and handle > these with custom local rules? We blocked some of their space back in 2013 with no complaints, and thanks to their SPF record, just blocked a bunch more.
Re: apache.org have URIBL_BLOCKED now :/
On 08/08/2017 02:32 PM, Benny Pedersen wrote: > subj might concern infra staff > > forward please to infra > URIBL_BLOCKED means that the URIBL refused your DNS query: http://uribl.com/refused.shtml The name "apache.org" isn't blacklisted, and there's nothing apache can do to fix it. You need to make your DNS queries from somewhere else, probably.
Re: Uninitialized values in URIDNSBL
On 02/08/2017 02:08 PM, Kevin A. McGrail wrote: > On 2/8/2017 1:22 PM, Philip Prindeville wrote: >> While we’re waiting for that, can I just grab Util.pm and >> Plugin/URIDNSBL.pm out of trunk, or are there more dependencies than >> that to splice the fix back into 3.4.1? > I wouldn't be able to say. EIther custom patch or run trunk would be my > recommendation. > I posted a custom patch to our Gentoo bug at https://590338.bugs.gentoo.org/attachment.cgi?id=452626 But as the warning in the comment states: * I don't know perl. * I haven't even tried it. Give it a try if you're desperate =)
Re: Legit Yahoo mail servers list
On 01/26/2017 02:53 PM, David Jones wrote: > > I understand what their SPF record means and how it works > but what they are publishing in their SPF record is not common. > Normally this would expand out to a list of IPs and CIDRs or DNS > records that can be turned into IPs that postwhite can use to build > a list for bypassing RBL checks. > Are the problematic RBL checks performed by Postfix, or by SpamAssassin? The possibilities for whitelisting in SpamAssassin are a lot more flexible, so if I were you, I would tweak postscreen (or my smtpd restrictions) to the point where it causes no false positives. Then SpamAssassin can be configured to do the same level of RBL checks that are occasionally causing false positives now. The double lookups aren't expensive because they're cached locally. And the false positives are easy to deal with in SA, where for example you have access to the result of SPF. If you can get it to the point where SA is the one blocking Yahoo, then all you have to do is add a meta rule that subtracts a few points when the sender's domain belongs to Yahoo and the SPF_PASS rule hits.
Re: Legit Yahoo mail servers list
On 01/26/2017 01:29 PM, Reindl Harald wrote:
>
> SPF_NEUTRAL will NEVER hit SPF_PASS and that's the problem with ?all
>

SPF mechanisms are evaluated in order, and each one has a result type associated with it. The default result is "+" for "pass". Another type of result is "?" for "neutral." The record,

  v=spf1 ptr:yahoo.com ptr:yahoo.net ?all

is equivalent to

  v=spf1 +ptr:yahoo.com +ptr:yahoo.net ?all

and it means

  a) PASS if "ptr:yahoo.com" matches
  b) PASS if "ptr:yahoo.net" matches
  c) NEUTRAL if "all" matches
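The evaluation order can be sketched in a few lines of Python. This is a toy model, not a real SPF implementation: it ignores modifiers such as redirect= and does no DNS lookups. The matches callback is a hypothetical stand-in for "does this mechanism match the sending IP".

```python
# Map each SPF qualifier to the result it produces when its mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def parse_term(term):
    """Split a term into (result, mechanism); a missing qualifier means '+'."""
    if term[0] in QUALIFIERS:
        return QUALIFIERS[term[0]], term[1:]
    return "pass", term


def evaluate(record, matches):
    """Return the result of the first mechanism that matches.

    `matches` is a caller-supplied function (an assumption of this sketch)
    that takes a mechanism string such as "ptr:yahoo.com" and returns a bool.
    """
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    for term in terms[1:]:
        result, mechanism = parse_term(term)
        # "all" always matches; everything else is delegated to the callback.
        if mechanism == "all" or matches(mechanism):
            return result
    return "neutral"  # nothing matched and there was no "all"


if __name__ == "__main__":
    record = "v=spf1 ptr:yahoo.com ptr:yahoo.net ?all"
    print(evaluate(record, lambda m: m == "ptr:yahoo.net"))
    print(evaluate(record, lambda m: False))
```

With this record, a sender whose PTR matches either domain gets a pass before "?all" is ever consulted; only senders matching neither fall through to neutral.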
Re: Legit Yahoo mail servers list
On 01/26/2017 12:59 PM, Reindl Harald wrote: > > > Am 26.01.2017 um 18:51 schrieb Michael Orlitzky: >> On 01/26/2017 12:22 PM, David Jones wrote: >>> ... >>> They don't publish a good SPF record so I am not able to add >>> them to my postwhite list. >>> >> >> Isn't that what their SPF record does? > > did you notice the "?all" > re-read your spf manuals > The OP is looking for a way to whitelist so the "?all" is irrelevant. Does the sending IP pass the SPF check? If so, whitelist it.
Re: Legit Yahoo mail servers list
On 01/26/2017 12:22 PM, David Jones wrote: > Anyone know how to get a list of legit mail servers for Yahoo? > They don't publish a good SPF record so I am not able to add > them to my postwhite list. > > # dig yahoo.com txt +short > "v=spf1 redirect=_spf.mail.yahoo.com" > # dig _spf.mail.yahoo.com txt +short > "v=spf1 ptr:yahoo.com ptr:yahoo.net ?all" > > The only way I can think of even coming close is to analyse > my mail logs for clean mail IPs with PTR values ending in > yahoo.com and yahoo.net. Isn't that what their SPF record does?
Re: T_DKIM_INVALID from yahoo.com
On 12/24/2016 11:05 AM, Ian Zimmerman wrote: > All mail I get from yahoo customers [1] scores on T_DKIM_INVALID, and > always has. Why? > Is there any correlation between the DKIM result and the size of the message?
Re: Matching infinite sets
On 08/22/2016 09:02 AM, Joe Quinn wrote: > On 8/22/2016 8:54 AM, Michael Orlitzky wrote: >> On 08/21/2016 03:22 PM, Damian wrote: >>> There is no such set B, as it would contain itself. >> The empty set contains itself. > That's an easy mistake to make. The empty set is {}, the set that > contains only the empty set is {{}}. Sets are discrete elements that > don't get "flattened". > > In perl syntactic lists do get flattened though, which leads to some fun > times. You can do silly things like @concatenated = (@listOne, @listTwo). "Contains" in the context of sets means "is a superset of" =) (I'm just being pedantic, I don't actually have a point.)
Re: Matching infinite sets
On 08/21/2016 03:22 PM, Damian wrote: >> > There is no such set B, as it would contain itself. The empty set contains itself.
Re: Disabling spamcop plugin
On 04/13/2016 09:50 AM, Reindl Harald wrote:
>
> enough problems by wasting time if you have to maintain 10, 20, 30 or
> more servers and in case of problems need fast downgrades - especially
> if you run virtual machines where all the compile jobs share hardware

emerge --buildpkg will create a binary package that you can instantly downgrade to with emerge --usepkg

> besides that on a production server no compilers should be installed at
> all - the generation of malware which compiles itself is only a question
> of time

I'm not convinced that an attacker who can execute commands on your server is more dangerous when one of those commands is `gcc`.

>
> what gentoo would need to solve for professional environments is that
> you have one machine which pulls the updates, compiles them and package
> them in a way all other machines in the network can pull and apply them
> in precompiled form over ftp, http or whatever network protocol
>

As you wish: https://wiki.gentoo.org/wiki/Binary_package_guide
Re: [OT] still configuring [Was: Disabling spamcop plugin]
On 04/13/2016 01:26 AM, Ian Zimmerman wrote:
> On 2016-04-12 10:57 -0400, David Niklas wrote:
>
>> You could use Gentoo, you get to configure it all yourself!
>
> Funny you'd say that, I _am_ actually switching to it - on my
> "workstation" role computers. I'm already over 50% over the hump, I
> think.
>
> But on "server type" computers, I just cannot spare a dedicated security
> branch. I really don't have the time, and more importantly the nerves,
> to scramble and recompile the world when each new vulnerability is
> announced.
>

This shouldn't be worse on Gentoo than it is anywhere else. We have a mailing list, gentoo-announce [0], where security advisories get sent. But, they only get sent out once the vulnerability has been fixed and marked stable /everywhere/, so they often come a little late. Nevertheless, security issues are fixed ASAP:

  1. Some vulnerability is found.
  2. The security team opens a bug, and contacts the maintainer of the affected package.
  3. A fix is committed to the tree.
  4. The arch teams scramble to stabilize the version with the fix.
  5. The announcement is sent out.

As long as you follow a semi-regular update cycle, you shouldn't have to do anything special, even if you run a stable system. The affected package will be recompiled automatically as part of the updates. Any packages *depending on* that package (like, if they're statically linked to it) will also be recompiled. No need to recompile @world.

[0] https://www.gentoo.org/get-involved/mailing-lists/
Re: Rejecting without backscatter (was Re: Spamassassin not catching spam (Follow-up))
On 03/26/2015 08:43 AM, David F. Skoll wrote: On Thu, 26 Mar 2015 12:09:58 +0100 Reindl Harald h.rei...@thelounge.net wrote: why in the world would a reject *before queue* trigger a backscatter or bounce on my side? How do you do before-queue rejection of a message that is... 1) Directed to multiple recipients... 2) Some of which have different spam thresholds or have even opted-out? Solve that problem, and then I agree with you. And saying well, don't let different end-users have different settings is not a solution. Neither is tempfail all recipients but the first so the message is transmitted one time for each recipient. If one of your customer domains has non-default settings, give them their own IP address and a separate MX record pointing to that address. Then if a multi-recipient message is addressed to someone in that domain, the sending MTA will split the message before sending it (because it's headed to a different server, as far as the MTA knows). Your pre-queue filter can then switch settings depending on the IP address, and should satisfy your criteria above. Obviously it's a little annoying to set up an MX for every such domain, but you can charge a little PITA fee for domains that want special treatment.
Re: PayPal spam filter?
On 06/16/2013 06:48 PM, Jason Haar wrote: Just a FYI but SA scores failures of ~all much stronger than it does for -all eg I just deliberately forged an email for my own domain and SA picked up the SPF hard failure and added 0.0 to the final score :-( The logic of the score is well documented, just shows how much SPF doesn't work http://spamassassin.1065346.n5.nabble.com/default-score-for-SPF-HELO-FAIL-too-low-td13894.html The reasoning is sound. Softfail has a better ham/spam ratio than hardfail. Which is beside the point -- SPF is not a spam filtering mechanism. It prevents HELO/MAIL FROM forgery. If you don't want to accept forgeries (this is independent of what you want to do with spam), reject the hardfails.
Re: .pw / Palau URL domains in spam
(replying randomly in the thread) We've been getting complaints about these, so while I don't like to target a TLD indiscriminately, I think I'd like to add a few points to mail from *.pw for a couple of months until things clear up. What's the correct way to do this? A regexp on the from/return-path headers? Or is something built-in?
Re: FROM_MISSP_* causing FPs
On 11/29/2012 05:43 PM, John Hardin wrote: On Thu, 29 Nov 2012, Kris Deugau wrote: I've just had another couple of reports of false positives due to hits on one or more of the FROM_MISSP_* rules. Curious coincidence: Almost all of the reports to date have involved webform email for real estate companies. Most of the rest have involved scan-to-email multifunction devices - mostly Xerox used by real estate companies. O_o Is there any possibility of getting user agent headers for these FPs? If a particular piece of legit software always does this then obviously those rules should ignore such messages. I had one guy actually read the rejection message and contact postmaster@ about this. His sig shows: Sent from my MOTOROLA ATRIX™ 2 on ATT And the headers: X-Spam-Flag: NO X-Spam-Score: 4.224 X-Spam-Level: X-Spam-Status: No, score=4.224 required=5 tests=[FREEMAIL_FROM=0.001, FROM_MISSP_EH_MATCH=2.499, FROM_MISSP_FREEMAIL=1.723, HTML_MESSAGE=0.001] autolearn=disabled From: u...@example.comu...@example.com X-Mailer: Motorola android mail 1.0 It was relayed through AOL, who you think would clean that up. This particular model also base64 encodes the entire message...
Re: Claims manager / LOTTO_AGENT
On 11/08/2012 10:44 AM, John Hardin wrote: This is a client of ours (a law firm) and not the company that I work for. *I* know there's probably nothing sensitive in there, but just to cover my ass I'd need to get permission to send the results off-site. Only the list of rules which hit is publicly visible, the actual content of the message is not. Any leakage of confidential information is very unlikely. I know, but the chance isn't zero. For example, I wouldn't want to mass-check a corpus of emails to my girlfriend, and have it report that they hit LOTS_OF_VIAGRA. Likewise, things like LOTTO_AGENT can reveal that someone communicated with a claims manager. I've explained both sides, and as long as it's a non-zero chance, they aren't having it. It isn't even that there's a risk of leaking anything -- the fact that anything at all is sent could be used as justification for a pain-in-the-ass investigation that nobody wants. From their perspective, it's just simpler to say no: it's not worth the time or effort to even think about if there's a minute chance of it coming back to bite them legally. I will take a look at claims manager in the 419 rules. I appreciate it, thanks.
Claims manager / LOTTO_AGENT
So, LOTTO_AGENT will hit the string Claims Manager for 3.5 points. This is bad news for, Barbara R. Krieg, Claims Manager Foodliner, Inc. / Quest Liner / Truck Country P.O. Box 1565 Dubuque,IA who has a signature at the bottom of her messages. This is compounded by the fact that ADVANCE_FEE_2_NEW_MONEY = __ADVANCE_FEE_2_NEW_MONEY ... __ADVANCE_FEE_2_NEW_MONEY = LOTS_OF_MONEY __ADVANCE_FEE_2_NEW __ADVANCE_FEE_2_NEW = (__AFRICAN_STATE + ... + LOTTO_AGENT + ... 1) for a total score of around 7.8. Believe it or not, claims managers talk about LOTS_OF_MONEY =) Can one of these be made a little more strict? Sorry to be a pain and submit these one at a time, but most of the ones that give me trouble are confidential.
Re: Claims manager / LOTTO_AGENT
On 11/07/2012 09:49 PM, dar...@chaosreigns.com wrote: On 11/07, Michael Orlitzky wrote: So, LOTTO_AGENT will hit the string Claims Manager for 3.5 points. This is bad news for, Barbara R. Krieg, Claims... When you put a string an an email that hits a spamassassin rule... your email then hits that spamassassin rule. You should generally try to avoid that. Yeah, well it's her job title, so...? You misunderstand statistics. The data aren't wrong.
Re: Claims manager / LOTTO_AGENT
On 11/07/2012 10:12 PM, dar...@chaosreigns.com wrote: On 11/07, Michael Orlitzky wrote: Yeah, well it's her job title, so...? You misunderstand statistics. The data aren't wrong. Do I? I think it's more likely that you misunderstand what is expected of spamassassin rules. Sorry, I was a little rude. But saying that she shouldn't put her job title anywhere in an email, ever, is ridiculous. The inputs (spam, ham) to the classifier are assumed god-given; and the classification needs to reflect the data, not the other way around. Somebody really should put up a page in the wiki explaining that rules all have false positives, and that's the entire reason we don't flag an email as spam for any one rule, etc.. Sure, that's why I pointed out that LOTTO_AGENT also helps trigger ADVANCE_FEE_2_NEW_MONEY, and combined they score 7.8. But if you provide us with more masscheck data, we can do a better job of automatically calculating ideal scores. This is my fault, of course, but I'm not allowed to mass-check this stuff. It's ongoing legal correspondence.
Re: Claims manager / LOTTO_AGENT
On 11/07/2012 10:21 PM, dar...@chaosreigns.com wrote: On 11/07, Michael Orlitzky wrote: On 11/07/2012 09:49 PM, dar...@chaosreigns.com wrote: On 11/07, Michael Orlitzky wrote: So, LOTTO_AGENT will hit the string Claims Manager for 3.5 points. This is bad news for, Barbara R. Krieg, Claims... When you put a string an an email that hits a spamassassin rule... your email then hits that spamassassin rule. You should generally try to avoid that. Yeah, well it's her job title, so...? You misunderstand statistics. The data aren't wrong. After re-reading, I think you may have misunderstood my suggestion to avoid putting stuff in emails that is known to hit spam rules. I wasn't suggesting that Barbara R. Krieg change her signature, I was suggesting that you not include it intact when posting to this mailing list about it. I see. My apologies. Disregard the first half of that last message.
Re: Claims manager / LOTTO_AGENT
On 11/07/2012 10:36 PM, dar...@chaosreigns.com wrote: On 11/07, Michael Orlitzky wrote: Sorry, I was a little rude. But saying that she shouldn't put her job title anywhere in an email, ever, is ridiculous. Certainly. The inputs (spam, ham) to the classifier are assumed god-given; and the classification needs to reflect the data, not the other way around. If the classifier is spamassassin, and The inputs are the spam and ham data provided via masscheck, then... the scores provided via sa-update *do* reflect the data. So I'm not sure what you mean. The ideal rule scores are chosen to cause one false positive (ham flagged as spam) in every 2,500 hams, while maximizing the number of spams correctly flagged as spams. With so few hams hitting this rule in the masscheck corpora, we're way below that threshold based on the data we have. I wrote that before I saw your clarification, sorry again for coming off as a jerk. Ignore it. This is my fault, of course, but I'm not allowed to mass-check this stuff. It's ongoing legal correspondence. Er, what? You're not allowed to provide a list of which rules hit each of your emails? Or you're not allowed to run a program on your emails that isn't spamassassin? Or did I just not put This does not require sending us your email in bold enough times on the masscheck page? This is a client of ours (a law firm) and not the company that I work for. *I* know there's probably nothing sensitive in there, but just to cover my ass I'd need to get permission to send the results off-site. From their perspective, it's just simpler to say no: it's not worth the time or effort to even think about if there's a minute chance of it coming back to bite them legally.
Re: Overlay between BILLION_DOLLARS and US_DOLLARS_3
On 09/07/2012 02:36 PM, Kevin A. McGrail wrote: On 9/6/2012 11:32 AM, Michael Orlitzky wrote: On 09/06/2012 06:16 AM, Kevin A. McGrail wrote: With no examples in corpora and good s/o's, i think mass check is likely to score the rule high which brings us back to the same point. I did consider that though. Regards, KAM I admit my initial instinct was what Jari suggested, but I defer to your expertise =) Let's see what masscheck shows: svn commit -m 'Added overlap meta rule for BILLION_DOLLARS and US_DOLLARS_3' rulesrc Adding rulesrc/sandbox/kmcgrail/20_kam.cf Transmitting file data . Committed revision 1382118. Thanks for taking the time to do this.
Re: Overlay between BILLION_DOLLARS and US_DOLLARS_3
On 09/06/2012 06:16 AM, Kevin A. McGrail wrote: With no examples in corpora and good s/o's, i think mass check is likely to score the rule high which brings us back to the same point. I did consider that though. Regards, KAM I admit my initial instinct was what Jari suggested, but I defer to your expertise =) Jari Fredriksson ja...@iki.fi wrote: how about One RULE that will trigger and add score, if one or both of BILLION_DOLLARS and/US_DOLLARS_3 was hit. BILLION_DOLLARS and US_DOLLARS_3 would not have a score, only the resulting rule, which triggers separately if one of those is true. Those overlap arrangements seem like a kludge to me...
Re: Sensitivity of FILL_THIS_FORM_SHORT (score: 2.556)
On 09/05/2012 01:07 PM, John Hardin wrote: On Wed, 5 Sep 2012, Michael Orlitzky wrote: My recent logwatch reports show it hitting more ham than spam, If you could send me offline the rule hits for the hams it's hitting at your site that would help. That should be an easy grep of your maillog. Its primary use is for metas (e.g. a short fill-in form plus mention of millions of dollars _is_ reasonably suspicious); its apparent utility as a standalone rule may be artificially emphasized in masschecks if the masscheck corpus is deficient in ham that includes short fill-in forms. I'll still send my logs, but I was wrong about which subrule was causing trouble. It's, meta __FILL_THIS_FORM_SHORT ... (__FILL_THIS_FORM_PARTIAL 2 || __FILL_THIS_FORM_PARTIAL_RAW 2) body __FILL_THIS_FORM_PARTIAL /^\s?FF_LNNO?FF_YOUR(?:FF_ALLANDOR?){1,3}FF_SUFFIX (?:FF_BLANK1|(?:[-=_.,:;*\s]|=20){1,4}$)/im In my previous message I mentioned that FF_YOUR and FF_SUFFIX always match, so this is just, (?:FF_ALL){1,3}(?:FF_BLANK1|(?:[-=_.,:;*\s]|=20){1,4}$) But those matches can show up anywhere in the body, not necessarily adjacent to one another. I ran with -D on the message that brought this to my attention, and this is what triggered it: Sep 6 11:58:36.169 [10858] dbg: rules: ran body rule __FILL_THIS_FORM_PARTIAL == got hit: Address: Sep 6 11:58:36.169 [10858] dbg: rules: [...] Sep 6 11:58:36.170 [10858] dbg: rules: ran body rule __FILL_THIS_FORM_PARTIAL == got hit: Address: Sep 6 11:58:36.170 [10858] dbg: rules: [...] Sep 6 11:58:36.189 [10858] dbg: rules: ran body rule __FILL_THIS_FORM_PARTIAL == got hit: Address: Sep 6 11:58:36.189 [10858] dbg: rules: [...] Sep 6 11:58:36.191 [10858] dbg: rules: ran body rule __FILL_THIS_FORM_PARTIAL == got hit: Address: Sep 6 11:58:36.191 [10858] dbg: rules: [...] Just 2 mentions of an address, anywhere in the message.
Overlay between BILLION_DOLLARS and US_DOLLARS_3
These two rules seem to have significant overlap:

  BILLION_DOLLARS /[BM]ILLION DOLLAR/

and,

  US_DOLLARS_3 /(?:\$|usd).?\d{1,3}[,.]\d{3}[,.]\d{3}(?:[,.]\d\d)?/i

will both match e.g.

  (a) Comprehensive General Liability insurance with a minimum combined single limit of not less than ONE MILLION DOLLARS ($1,000,000) for each occurrence.

which comes up frequently in contracts, insurance documents, EULAs, etc. -- all of which then start out with a score of around 4.

Does it make sense to apply them both? Or should BILLION_DOLLARS just be one of the US_DOLLARS patterns?
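The overlap is easy to reproduce outside SpamAssassin. The sketch below uses Python's `re` module with the two patterns transcribed from the message (reading the final optional group of US_DOLLARS_3 as `\d\d`); the variable names are only for illustration, and Python's regex engine stands in for Perl's, which behaves the same way for these particular patterns.

```python
import re

# Patterns as quoted in the message; BILLION_DOLLARS has no /i flag, US_DOLLARS_3 does.
BILLION_DOLLARS = re.compile(r"[BM]ILLION DOLLAR")
US_DOLLARS_3 = re.compile(
    r"(?:\$|usd).?\d{1,3}[,.]\d{3}[,.]\d{3}(?:[,.]\d\d)?", re.IGNORECASE
)

CLAUSE = (
    "Comprehensive General Liability insurance with a minimum combined single "
    "limit of not less than ONE MILLION DOLLARS ($1,000,000) for each occurrence."
)

word_hit = BILLION_DOLLARS.search(CLAUSE)
digit_hit = US_DOLLARS_3.search(CLAUSE)

print(word_hit.group(0))   # MILLION DOLLAR
print(digit_hit.group(0))  # $1,000,000
```

Both patterns fire on the same clause, one on the spelled-out amount and one on the parenthesized digits that restate it.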
Sensitivity of FILL_THIS_FORM_SHORT (score: 2.556)
It looks to me like this score is much too high given how easy it is to match. FILL_THIS_FORM_SHORT matches either __FILL_THIS_FORM_SHORT1 or __FILL_THIS_FORM_SHORT2, and the second is more lenient:

  body __FILL_THIS_FORM_SHORT2 /(?:FF_YOURFF_ALLFF_SUFFIX(?:FF_BLANK2|ANDOR)){3}/i

which contains...

  replace_tag FF_YOUR (?:a?\s?copy\sof\s)?(?:(?:your|din|seu)[\s,:]{1,5})?(?:present\s|c[uo]rrent\s|full(?:st[\xe4]ndigt)?\s?|complete\s|direct\s|private?\s|valid\s|personal\s|nuvarande\s|vollst[\xe4]ndige\s|aktuelle\s){0,3}

Optional group, optional group, and a match on zero occurrences. The entire thing is optional. So FILL_THIS_FORM_SHORT can be reduced to,

  /(?:FF_ALLFF_SUFFIX(?:FF_BLANK2|ANDOR)){3,}/i

Next, let's look at FF_SUFFIX:

  FF_SUFFIX (?:\sin\s(?:full|words)|\scompleto)?:?(?:\s?[({][^)}]{1,30}[)}])?

Optional, optional. The whole thing is optional, so we can remove that, too. All that's left is,

  /(?:FF_ALL(?:FF_BLANK2|ANDOR)){3,}/i

So all we're really matching is 3 or more occurrences of FF_ALL, and that matches a lot of stuff. If I'm reading everything right, any lengthy email is likely to hit it.

My recent logwatch reports show it hitting more ham than spam, which makes sense for something like HTML_MESSAGE but not when it's scoring 2.5 points.
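The claim that FF_YOUR and FF_SUFFIX are entirely optional can be checked mechanically: a fragment contributes nothing to a rule if it is able to match the empty string. The Python sketch below tests that with `re.fullmatch`; the fragments are transcribed from the message with the line-wrap spaces removed, and the helper name `matches_empty` is made up for this example.

```python
import re

# Fragments as quoted in the message (line-wrap whitespace removed).
FF_YOUR = (
    r"(?:a?\s?copy\sof\s)?"
    r"(?:(?:your|din|seu)[\s,:]{1,5})?"
    r"(?:present\s|c[uo]rrent\s|full(?:st[\xe4]ndigt)?\s?|complete\s|direct\s"
    r"|private?\s|valid\s|personal\s|nuvarande\s|vollst[\xe4]ndige\s|aktuelle\s){0,3}"
)
FF_SUFFIX = r"(?:\sin\s(?:full|words)|\scompleto)?:?(?:\s?[({][^)}]{1,30}[)}])?"


def matches_empty(fragment):
    """Return True if the regex fragment can match the empty string."""
    return re.fullmatch(fragment, "", re.IGNORECASE) is not None


print(matches_empty(FF_YOUR))
print(matches_empty(FF_SUFFIX))
```

Both calls report that the fragment accepts an empty match, which is what lets the rule collapse to repetitions of FF_ALL and its separator.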
Re: Overlay between BILLION_DOLLARS and US_DOLLARS_3
On 09/05/12 13:16, Kevin A. McGrail wrote:
> I think they both make sense since one checks for words and another checks for numeric. We could discuss scoring though the S/O looks pretty good at

Agreed, it hits a lot more spam than ham here, too.

> I typically focus on score set 1 in my installations. Which score set are you using?

Same here.

> If you have Hams that hit this a lot, we might ask that you get involved in our masscheck program to improve the scoring perhaps?

Nope, the only thing that looked suspicious to me was that both rules would hit

  ONE MILLION DOLLARS ($1,000,000)

for a score of ~4. Individually, ONE MILLION DOLLARS should add some points, and so should $1,000,000. But if one is just clarifying the other, there's really only one hit, but it's getting scored twice.