On 2014-07-24 16:11, Philip Prindeville wrote:
You might have a shorter wait if you move to CentOS 6.5 instead.
I would, but the VPS software I'm using does not run on CentOS 6.x, only
5.x. It's rather old software and I should convert to something else,
but it's not worth the time I don't
On 2014-07-02 15:04, Amir Caspi wrote:
For what it's worth, I just received a spam that basically is the same
as what Philip complained about. I've posted a spample here:
http://pastebin.com/Y2YGwL49
[...]
I'm wondering if we shouldn't write a rule looking for lots of
#x0[0-9]{3};
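A minimal sketch of the counting idea, in Python rather than as an SA rule. It assumes the pattern is meant to match HTML numeric character references of the form `&#x0NNN;` (the leading `&` is an assumption, since the quoted pattern starts at `#`). The function names and the threshold of 20 are hypothetical.

```python
import re

# Assumed form of the encoded characters: "&#x0" + three digits + ";".
# The leading "&" is an assumption; the quoted pattern starts at "#".
ENCODED_CHAR = re.compile(r"&#x0[0-9]{3};")


def count_encoded_chars(raw_body: str) -> int:
    """Return how many encoded character references appear in the raw body."""
    return len(ENCODED_CHAR.findall(raw_body))


def looks_obfuscated(raw_body: str, threshold: int = 20) -> bool:
    """Flag a body with at least `threshold` references (arbitrary cutoff)."""
    return count_encoded_chars(raw_body) >= threshold
```

The count has to run on the raw body, since the rendered text would already have the references decoded.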
On 2014-07-23 12:23, Paul Stead wrote:
I've also implemented several rules to try and catch these types of emails.
Care to share? Counting encoded chars is easy, of course.
One thing to note, webmail and my MUA often will render the encoded
characters in their translated format, not
On 2014-07-23 13:14, Axb wrote:
doesn't your VPS offer you shell access?
if yes, uninstall the SA rpm stuff and install SA 3.4 from
source/trunk.
I think I didn't explain properly. I'm running the dedicated server on
which there is VPS software. I need RPMs so that they get distributed
to
On 2014-07-23 13:38, Axb wrote:
If you're using spamd, why not run a/multiple dedicated VMs for SA 3.4
and have your other VMs use the spamd on the SA VMs ?
There is a dedicated spamd. It's the other tools that need to be
distributed, like sa-learn. Bayes rules are handled per-user. (No,
On Thu, February 20, 2014 12:57 pm, John Hardin wrote:
0 messages examined generally means either the format isn't what
sa-learn expected, or the message is larger than the size limit.
The file format is most certainly MBOX... it was created by my MUA, and
running file on it tells me that it is
On Thu, February 20, 2014 2:39 pm, Axb wrote:
what's wrong with installing from source?
I run a virtual-hosting server where the individual site RPMs are copied
from server-level RPMs. Basically all software has to be installed as RPMs
in order to propagate to the individual virtual hosts.
---
On Thu, February 20, 2014 2:49 pm, Benny Pedersen wrote:
On 2014-02-20 22:39, Kevin A. McGrail wrote:
--max-size= I believe. Default is 256K.
sa-learn barfs, that flag is not accepted. That flag works for spamc, but
not for sa-learn. sa-learn man page and CLI help don't have any mention
of a
On Thu, February 20, 2014 3:16 pm, Kevin A. McGrail wrote:
Are you using 3.4.0? I believe the size was hard-coded until then when
the max-size option was added to sa-learn.
No, as mentioned previously in this flurry of emails, I'm using 3.3.2.
However, note that using spamassassin directly
On Thu, February 20, 2014 3:52 pm, Kevin A. McGrail wrote:
Questions that will be answered by "that is solved in 3.4.0" aren't
really going to get much support from me...
Understood, though it'll be a while before I can upgrade to 3.4 due to the
RPM issue that I've mentioned previously. However,
On Thu, February 20, 2014 4:08 pm, Kevin A. McGrail wrote:
Probably best if you install 3.4.0 (or even trunk) on a test system and
throw the offending email onto that server and run sa-learn on that box
with -D.
In the meantime, anyone want to do it on my behalf? =) I provided the
mbox link
On Thu, February 20, 2014 5:13 pm, Kevin A. McGrail wrote:
Resend the mbox link and I will likely have a cycle to throw it through.
https://www.dropbox.com/s/m4fuv670wnvwa16/SA_testspam.mbox
To be deleted in 24-48 hours (don't want spammers harvesting it).
If you have a chance, please run it
Hi all,
Occasionally, I will receive an FN that is autolearned as ham. Normally,
I dump it into my spam folder and sa-learn that as spam, so it should be
forgotten as ham and relearned as spam, and all is well with the world
(except for actually getting the spam, of course).
Today, I was
On Wed, January 29, 2014 11:34 am, John Hardin wrote:
There is already a style gibberish rule.
http://ruleqa.spamassassin.org/20140128-r1562007-n/STYLE_GIBBERISH/detail
I haven't seen STYLE_GIBBERISH hitting on very much in the last month...
some hits, but it's missing a bunch of stuff,
On Thu, January 9, 2014 6:20 pm, Karsten Bräckelmann wrote:
Even the most effective result I have ever seen on a non-personal
attack is merely getting the Bayes classification to a neutral. And that
was not a regular text token, but includes mail headers. And a biased
Bayes database towards
On Thu, January 9, 2014 9:46 pm, Karsten Bräckelmann wrote:
Unfortunately, well, for the scumbags, the shorter it gets, the less
likely it is to be understood. Fallen for. Or even understood to be
actual language.
Well, not really true, because of the rising resurgence of spammers using
On Mon, December 23, 2013 5:39 pm, Amir 'CG' Caspi wrote:
I did check... unless I'm completely blind, I don't see it. I'm basically
looking for something like:
header EMAIL_IN_SUBJ Subject =~ /[A-Za-z0-9._+-]+@(?:[A-Za-z0-9]+\.)+[A-Za-z]{3,5}/
i.e. something that will match a valid email
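A rough way to try such a pattern outside SA, using Python's `re` as a stand-in for Perl regexes (the two agree for the constructs used here). Inside a character class, a hyphen between two characters defines a range: `.-_` runs from `.` to `_` and therefore also covers `@`, so this sketch keeps the hyphen last, where it is literal. The function name is hypothetical.

```python
import re

# Address pattern with "-" placed last in the class so it is a literal hyphen.
EMAIL_IN_SUBJ = re.compile(r"[A-Za-z0-9._+-]+@(?:[A-Za-z0-9]+\.)+[A-Za-z]{3,5}")

# For comparison: here the hyphen forms a range from "." to "_".
RANGE_CLASS = re.compile(r"[.-_]")


def subject_has_email(subject: str) -> bool:
    """Return True if the subject line contains something address-like."""
    return EMAIL_IN_SUBJ.search(subject) is not None
```

Note that the `{3,5}` at the end requires a top-level domain of three to five letters, so two-letter country-code domains would not match.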
On Sat, December 28, 2013 7:57 pm, John Hardin wrote:
And in case you actually did mean From: rather than recipient address...
Sorry, no, I meant To: as you surmised.
Unfortunately __SUBJ_HAS_TO_1 isn't performing well enough against the
current masscheck corpora to be published. It's
Hi all,
Over the past couple of days I've been getting slammed with spams that
have subjects of the form:
Subject: <u...@host.com> spammy subject line
That is, the recipient's email address is included explicitly in angle
brackets at the beginning of the subject line. These are new and varied
On Mon, December 23, 2013 3:06 pm, Axb wrote:
To save you time, grep the pattern you're looking for thru the rules
directories.
If it's not there check in SVN's trunk to see if such a rule indeed
exists and if it's getting auto-promoted and what score it gets.
I did check... unless I'm
On Fri, December 6, 2013 1:23 pm, Marcio Humpris wrote:
But how to make it catch just an email with ONLY 2 words in BODY? Not
match empty message?
I strongly recommend http://www.regular-expressions.info to learn how
regexps work. If you want to catch _exactly_ two words then you'd do
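One possible reading of "exactly two words", sketched in Python: two runs of non-whitespace separated by whitespace, with optional whitespace around them. This is an illustration only, not the pattern from the original reply, and how it maps onto an SA body rule is not shown. The function name is hypothetical.

```python
import re

# Two non-whitespace runs separated by whitespace, nothing else.
TWO_WORDS = re.compile(r"\s*\S+\s+\S+\s*")


def is_exactly_two_words(text: str) -> bool:
    """Return True only if the whole text consists of exactly two words."""
    return TWO_WORDS.fullmatch(text) is not None
```

Because `\S+` requires at least one character, an empty message cannot match.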
On Fri, December 6, 2013 1:25 pm, Marcio Humpris wrote:
how can I do something that catches in subject of an email 225
spreadsheets for download and variations? such as xx to xxx
spreadsheets,
header DL_SPREADSHEETS Subject =~ /^\d+ spreadsheets for download$/
Omit for download if you want. Omit the ^
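The effect of dropping the anchors and the trailing phrase can be checked with a short Python sketch. The looser pattern, including the optional "N to M" range, is an assumption about which variations are wanted, and both constant names are hypothetical.

```python
import re

# Anchored form: the subject must be exactly "<digits> spreadsheets for download".
STRICT = re.compile(r"^\d+ spreadsheets for download$")

# Looser form (assumed variations): no anchors, optional "N to M" range,
# and "for download" made optional.
LOOSE = re.compile(r"\d+(?: to \d+)? spreadsheets(?: for download)?")


def matches(pattern: re.Pattern, subject: str) -> bool:
    """Return True if the pattern is found anywhere in the subject."""
    return pattern.search(subject) is not None
```

For example, `matches(LOOSE, "Get 10 to 200 spreadsheets today")` is true, while the anchored form rejects any subject with extra text before or after the phrase.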
On Tue, December 3, 2013 4:18 pm, Geoff Soper wrote:
Are other people also suffering from these?
Yes. I think there are regexps that would work but I haven't had a chance
to test any yet, to add them to my growing list of spammy template
regexps. I'll try to work something up next week.
---
On Fri, November 8, 2013 2:39 pm, Sergio Durigan Junior wrote:
I don't think sa-learn can help with spamd. Its own manpage mentions
that, for spamd users, spamc -L is the way to go.
Hm, really? I thought spamd kept a global Bayes database, and that
everyone calling spamc -L would end up
On Fri, November 8, 2013 2:56 pm, Sergio Durigan Junior wrote:
The problem with having a user-tailored database is that I will have to
run sa-update for every user, right?
No, or at least, not that I've seen. If spamd is running as root, it will
load the sa-update rules from the root
On Fri, November 8, 2013 3:24 pm, Karsten Bräckelmann wrote:
The latter is incorrect -- spamc by default sends the effective user ID,
and spamd switches users before processing the mail (assuming the daemon
has been started as root). The -u user option is only necessary to
change that default.
are below.
Thanks.
--- Amir
At 2:10 PM -0600 08/10/2013, Amir 'CG' Caspi wrote:
At 12:42 PM -0600 06/30/2013, Amir 'CG' Caspi wrote:
Hi all,
Just got this spam:
http://pastebin.com/KM5paaZ9
To me, it looks like LONGWORDS should have hit
At 3:39 PM -0600 07/31/2013, Amir 'CG' Caspi wrote:
At 3:23 AM +0200 07/25/2013, Karsten Bräckelmann wrote:
header LOCALPART_IN_SUBJECT eval:check_for_to_in_subject('user')
And all of them do hit that rule. A super-set of the ADDRESS variant,
using the local part instead of the complete
At 1:43 PM +0100 08/24/2013, RW wrote:
LONGWORDS is a body rule, i.e. it runs on a normalized version of the
Gah, THAT'S why it wasn't working? I feel like an idiot now. =P
--- Amir
At 1:41 PM -0600 08/10/2013, Amir 'CG' Caspi wrote:
(The HTML comment gibberish rule would be a big step here, since
that's one of the few things that would distinguish this from ham...
unlikely that a real person would embed tens of KB of comment
gibberish.)
OK, I'm trying to test an HTML
At 2:22 AM -0600 08/11/2013, Amir 'CG' Caspi wrote:
My regex is valid and appropriate for those comments... I tested it
at regexpal.com, which shows that all three comments match just fine
(all three get highlighted).
So... why is SA hitting only on the final comment, and ignoring the first
At 9:31 PM -0400 08/11/2013, Alex wrote:
Can you post this rule again so we can investigate?
# HTML comment gibberish
# Looks for sequence of 100 or more words (alphanum + punct separated by whitespace) within HTML comment
rawbody HTML_COMMENT_GIBBERISH
At 6:56 PM -0700 08/11/2013, John Hardin wrote:
I'm also going to make FP-avoidance changes that should also help.
Care to share? =)
Just make sure that the rule does not match the -- comment-end token
I tried doing that and it caused SA to hang... couldn't figure out
why the regex wasn't
At 7:20 PM -0700 08/11/2013, John Hardin wrote:
The unbounded matches you're using probably caused the RE engine to
get stuck backing off and retrying.
That's what I figured. That's why I changed things to the current
version, which is bounded by the end-tag of the comment. My
current
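A sketch of the bounded approach in Python, not the rule from this thread (its regex is not shown here). Each comment is matched non-greedily up to its own `-->`, and the words are counted in code instead of with nested quantifiers, which sidesteps the backtracking problem described above. The function name is hypothetical; the default of 100 words follows the rule description quoted earlier.

```python
import re
from typing import List

# Non-greedy match stops at the first "-->", so each comment is found separately.
HTML_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def gibberish_comments(raw_html: str, min_words: int = 100) -> List[str]:
    """Return the bodies of HTML comments holding at least `min_words` tokens."""
    hits = []
    for match in HTML_COMMENT.finditer(raw_html):
        body = match.group(1)
        # Whitespace-separated tokens stand in for "words" here.
        if len(body.split()) >= min_words:
            hits.append(body)
    return hits
```

Since every comment is examined on its own, a long comment early in the message is reported even when shorter ones follow it.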
At 8:23 PM -0700 08/11/2013, John Hardin wrote:
However, I may be taking too-conservative a stance here. It's
possible that, while HTML comments can appear in ham, *long* HTML
comments won't, and the fact that we're looking for long blocks of
comment text is enough safety.
That's why
At 10:41 AM -0700 08/09/2013, John Hardin wrote:
Can you provide a spample or two?
Sure.
http://pastebin.com/VfSCB7fw
http://pastebin.com/VCtvzjzV
Note the outl and outi links near the very bottom. The actual
domains used in these URIs vary... they used to be .pw, but recently
most have
At 10:41 AM -0700 08/09/2013, John Hardin wrote:
Can you provide a spample or two?
Looks like a similar spam method has come out in recent weeks (since
Jul 30, it seems) that uses slightly different footers... example is
here:
http://pastebin.com/QCmSPzwG
Although running SA on this spam
At 12:42 PM -0600 06/30/2013, Amir 'CG' Caspi wrote:
Hi all,
Just got this spam:
http://pastebin.com/KM5paaZ9
To me, it looks like LONGWORDS should have hit... but it didn't. I
ran it manually through spamassassin and spamc, and LONGWORDS still
didn't hit, so it seems to just
At 2:17 PM -0700 08/10/2013, John Hardin wrote:
Perhaps it's time to bring FuzzyOCR up-to-date...?
Is this something I need to manually update or something that needs
updating in the SA distribution?
Thanks.
--- Amir
Hi all,
A number of my users have been receiving spam formatted in a
very specific way which seems to very often miss Bayes... I don't
know why, whether it's because of the HTML gibberish flooding Bayes
with useless tokens (to reduce the relative strength of the spammy
tokens), or if it's
On Fri, August 9, 2013 1:01 pm, RW wrote:
BAYES works on rendered text; it doesn't see the HTML.
Hmmm. It doesn't see HTML comments, which would appear in rendered HTML
source even though they are invisible? OK, in that case, I have NO idea
why the spam isn't hitting Bayes, because it looks
At 3:23 AM +0200 07/25/2013, Karsten Bräckelmann wrote:
header LOCALPART_IN_SUBJECT eval:check_for_to_in_subject('user')
And all of them do hit that rule. A super-set of the ADDRESS variant,
using the local part instead of the complete address. Still in stock
rules.
Hm. One of my
At 12:00 PM +0200 07/29/2013, Karsten Bräckelmann wrote:
Your best bet is to just train what you have as spam, to counter the
Sure, I was planning to do that. The reason I
wanted to --forget it was to make sure that I
wasn't learning it twice (once as ham, once as
spam).
You do not
At 5:48 PM +0200 07/29/2013, Karsten Bräckelmann wrote:
I strongly suggest to NEVER copy-n-paste like that, but to either run
sa-learn on an entire mbox, or *save* a single mail to a file. Since
For what it's worth, I also opened the mbox in a
text editor and copied the actual raw message (as
On Mon, July 29, 2013 10:21 am, Karsten Bräckelmann wrote:
There were none for this email.
Content-Type: text/plain
Content-Transfer-Encoding: 8bit
Whoops. I missed those... I guess this could be why a 7-bit copy/paste
wouldn't work, and using the mbox file directly is required.
Tried
Hi all,
So, some of my FNs get autolearned as ham, and because of the
way my mail queue is set up, I typically only see this once the mail
reaches my MUA and has already been deleted from the online inbox. I
have one particular message that got autolearned as ham (but should
be spam), and
At 12:05 AM +0100 07/16/2013, RW wrote:
OTOH when I just tried this in 3.3.2, spamd didn't pick up a test
rule I added to ~/.spamassassin/user_prefs (which worked with the
spamassassin script).
Do you have allow_user_rules enabled in your local.cf? According to
At 3:24 PM +0200 07/01/2013, Benny Pedersen wrote:
If the content the end user sees is mangled, then the end user can't relearn ham as spam.
Yes, they can, because SA sees the mangled email before the user
does. Therefore if SA misclassifies an email as ham, that exact same
email is the one seen by the
Hi all,
Just got this spam:
http://pastebin.com/KM5paaZ9
To me, it looks like LONGWORDS should have hit... but it didn't. I
ran it manually through spamassassin and spamc, and LONGWORDS still
didn't hit, so it seems to just not be hitting that rule. But, to my
eye, it looks like
At 8:57 PM +0200 06/30/2013, Benny Pedersen wrote:
well it might confuse bayes yes, but it cant confuse you to run
sa-learn --spam on it ?
I've been running sa-learn --spam on these messages for a month
straight. Some get picked up, others don't. I'm still getting a lot
of BAYES_50 on
At 11:23 PM +0200 06/30/2013, Benny Pedersen wrote:
does it continue if one msg is learned as spam, does it still after
say bayes_50 ?
No, it has BAYES_99 if I learn the message. That is, running SA on
the SAME message will give BAYES_99 after it's learned. It's not a
ham problem.
you
On Tue, June 25, 2013 5:15 am, Matus UHLAR - fantomas wrote:
This looks like a Net::DNS issue, try upgrading that one.
Also, try upgrading perl...
Why do you say this looks like a Net::DNS issue? The error is being
reported from Mail::SPF, and I've traced that error through the code, it
tracks to
Hi all,
So, I think I've gotten my Bayes DB largely under control...
most of the FN spam I'm getting is getting high Bayes scores and
simply not a large enough aggregate score to count as spam. So, now
I'm wondering if I should increase the points assigned to high Bayes
scores.
For
end up failing due to this
error and some spam therefore gets missed (FNs).
Any ideas are most welcome.
Thanks.
--- Amir
At 12:01 AM -0600 06/13/2013, Amir 'CG' Caspi wrote:
Hi all,
I am getting the following error peppering my
At 9:47 AM +0200 06/20/2013, Tom Hendrikx wrote:
Since mailscanner already has support for integrating spamassassin [1]
(As I mentioned explicitly in a previous email...)
why would you ever want to put work in reversing some of mailscanners
'protection'?
Because, given the particulars of
On Wed, June 19, 2013 3:14 pm, Axb wrote:
iirc, MailScanner munges the URL before SA sees it so unless your plugin
idea involves a crystal ball, it's not possible.
Yes, MailScanner gets to it before SA does, unless SA is called from
within MailScanner (which it isn't, on my setup, but that is a
On Wed, June 19, 2013 3:47 pm, Axb wrote:
SA's URIBL plugin doesn't and shouldn't look in the alt attribute.
Why not, exactly? I wouldn't look at it for _all_ img tags, only for ones
that are clearly MailScanner-munged. That is, one would look for the
patterns that MailScanner uses for
At 4:37 PM -0400 06/14/2013, Alex wrote:
On Fri, Jun 14, 2013 at 4:18 PM, Amir 'CG' Caspi ceph...@3phase.com wrote:
I wonder if there's some
difference between running spamassassin manually on the message versus
running spamd.
I think the only difference would be if spamd somehow didn't
At 10:13 AM -0700 06/18/2013, John Hardin wrote:
On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote:
Any idea why it failed to hit, and does this need another rule revision?
Yep, and yep. Revision committed. Initial comment gibberish rule committed.
Thanks for the revision. Do you want to explain
At 8:58 AM -0400 06/18/2013, Ben Johnson wrote:
a.) You are copying/pasting the body of the email, but not the headers.
No, I am copying the headers... however, I am using Eudora (ancient,
I know) as a mail client, and it's possible the headers are not
properly formatted. For example, for
At 10:24 AM -0700 06/18/2013, John Hardin wrote:
The earlier version wasn't allowing for some punctuation in the
gibberish. There may be a period of whack-a-mole here, I was
conservative in the change I made.
Makes sense. Both of those examples are good for creating an
Replies to multiple folks below...
At 1:42 PM -0400 06/18/2013, Kris Deugau wrote:
Try opening the on-disk file with Notepad (or your favourite text editor
on *nix). If you see the same thing you see when you hit the blah blah
blah button in Eudora, you should be OK. If not...
I've done
At 7:20 PM -0700 06/15/2013, John Hardin wrote:
I took a closer look at this and it seems they're working around
trivial gibberish detection by putting a valid CSS property at the
very beginning of the style tag.
Revising the rules...
I am now seeing STYLE_GIBBERISH hitting on a lot of spam
At 10:48 AM -0700 06/17/2013, John Hardin wrote:
On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote:
I am now seeing STYLE_GIBBERISH hitting on a lot of spam in the
past day or so, since the new rules hit the distribution. So far,
all TPs, no FPs.
Yay!
But, I found one today that should have hit
At 9:43 PM -0400 06/13/2013, Alex wrote:
I'd say if you have any that are hitting bayes20 or lower, your
database is not working properly and you should probably start over.
Not quite sure I want to do that... I don't really have a sufficient
corpus of mail for good training. It's working
At 4:37 PM -0400 06/14/2013, Alex wrote:
Yeah, but not bayes20. That's bad for sure. You should start
collecting now, or pull a few hundred from your recent quarantine and
use those, along with people's mail folders.
Well, I got bayes99 when I ran spamassassin manually just now. So, I
really
At 4:37 PM -0400 06/14/2013, Alex wrote:
I think the only difference would be if spamd somehow didn't recognize
all the locations for your rules. Perhaps create a rule that you know
will hit with a very low score in each directory that contains rules.
Maybe there's a way to run spamd in the
At 11:43 PM +0100 06/14/2013, Martin Gregorie wrote:
Are you sure? Take a look at how sa-update is getting run to make sure
that it is doing what you expect.
Yes, I'm sure. I looked at the update script (in my case, it's
called update_spamassassin, due to the way Parallels Pro configures
Hi all,
I am getting the following error peppering my maillogs:
Jun 13 01:26:42 kismet spamd[24575]: spf: lookup failed: Can't locate
object method new_from_string via package Mail::SPF::v1::Record
at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SPF/Server.pm line 524.
This occurs very often,
Lately, I've been getting hit with a LOT of this type of spam:
http://pastebin.com/HD0rNdxU
Not all of it is identical in format, but there seems to be one thing
in common: they include lots of random garbage inside either CSS or
in HTML comments. All of this gets ignored by the HTML parser
At 7:25 PM -0400 06/13/2013, Alex wrote:
I think people will start by telling you to block the pw domain
Sure, but not all of the comment-laden spam is from the pw domain.
It comes in from .net, .com, .us, and a bunch of other places as
well. This is just the one example I happened to pick
At 8:04 PM -0400 06/13/2013, Alex wrote:
After looking at it more closely, it's also only hitting bayes20 for
you. Do the others also score so low? This hits bayes99 on my system.
The ones that SA doesn't catch, yes, they are typically low. I have
some that are bayes50, some bayes20, some