Re: Re-running SA on an mbox

2009-09-20 Thread Theo Van Dinter
You probably want spamassassin --mbox. :)
It won't modify the messages in-place, but you can do something like
spamassassin --mbox infile > outfile.

If you're talking about sa-learn, though, it also knows --mbox.


On Sun, Sep 20, 2009 at 9:46 PM, MySQL Student mysqlstud...@gmail.com wrote:
 Yeah, that's kind of what I thought. Maybe a program that can split
 each message back into an individual file? Would procmail even help
 here? Or even a simple shell script that looks for '^From ', redirects
 it to a file, runs spamassassin -d on it, then re-runs SA on each
 file? I could then concatenate each of them back together and pass it
 through sa-learn.
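
For the "split each message back into an individual file" idea, here is a minimal sketch in Python rather than shell. It relies only on the standard-library `mailbox` module; the function name `split_mbox` is made up for illustration, and it does not run SpamAssassin itself.

```python
import mailbox


def split_mbox(path):
    """Return every message in the mbox file at `path` as its own string."""
    # create=False: fail instead of silently creating an empty mbox.
    box = mailbox.mbox(path, create=False)
    try:
        # The mailbox module splits on the "From " separator lines for us.
        return [msg.as_string() for msg in box]
    finally:
        box.close()
```

Each returned string could then be written to its own file and fed to spamassassin or sa-learn separately, though as the reply notes, `--mbox` makes that unnecessary in most cases.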


Re: About reporting

2009-09-13 Thread Theo Van Dinter
On Sun, Sep 13, 2009 at 5:08 PM, João Eiras joao.ei...@gmail.com wrote:
 Should the file message.txt in the example contain the full e-mail with
 headers, attachments and everything ?

Yes.  It should be the original and complete message.

 Does the reporting tool remove all information about the receiver for
 privacy sake ?

No, nothing is removed from the message.


Re: Filtering depending mail header

2009-09-08 Thread Theo Van Dinter
There's no way to do that with SpamAssassin itself.  Once you send
something to SA, it will do the whole process (there's short
circuiting, but that's not really what you want here).  It sounds like
you're trying to not filter internal mail but filter external mail, so
I would recommend two things:

a) Ideally, have your MTA listen on two different IPs, one internal
and one external.  Apply different rules depending on which IP is
being used.  You mentioned using Postfix, and doing this is fairly
trivial.

b) Send the mail through something like procmail.  It can make
lightweight decisions about "if this header exists, do X".


On Tue, Sep 8, 2009 at 5:17 AM, Daniel Ruiz
Molinadaniel.r...@caos.uab.es wrote:
 I want to know whether a SpamAssassin configuration is possible that
 runs SpamAssassin only when a mail header exists with a
 defined value.
[...]
 With this configuration, each mail that Postfix receives is scanned before
 being delivered to each user. My idea is that SpamAssassin scans only when the
 mail has a header added by another SMTP server. My SMTP server receives mail
 that has already been received by the Enterprise SMTP server. That server adds
 a header called X-imss-result:. Depending on the value of this header, I
 would like to configure SpamAssassin to scan only when this
 header has a value equal to Default_Triggered; in other cases, mail will
 be sent to the user with no scan.
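
The "lightweight decision" from option (b) can be sketched outside SpamAssassin like this. This is Python, not procmail syntax, and `should_scan` is a hypothetical helper; only the header name and value come from the question above.

```python
from email import message_from_string


def should_scan(raw_message, header="X-imss-result", trigger="Default_Triggered"):
    """Return True only when the upstream header carries the trigger value."""
    msg = message_from_string(raw_message)
    value = msg.get(header)  # None when the header is absent
    return value is not None and value.strip() == trigger
```

A delivery script would call the scanner only when this returns True and pass everything else straight through.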


Re: How do I make Net::DNS::Resolver take /etc/hosts into account?

2009-07-01 Thread Theo Van Dinter
On Wed, Jul 1, 2009 at 3:23 AM, Per Jessenp...@computer.org wrote:
 Back to the subject line - how do I make Net::DNS::Resolver
 take /etc/hosts into account?

a) of course it doesn't, /etc/hosts isn't DNS, so why would Net::DNS
look at it? :)
b) my guess is that you can't, but it's a question for the Net::DNS
folks, not SA.


Re: Plugin extracting text from docs

2009-06-25 Thread Theo Van Dinter
On Thu, Jun 25, 2009 at 11:48 AM, Matus UHLAR -
fantomasuh...@fantomas.sk wrote:
 I am not sure but I think something alike was done. What I mean is to have
 generic chain of format converters, where at the end would be plain image
 or even text, that could be processed by classic rules like bayes,
 replacetags etc.

Already exists, check recent list history for set_rendered.
:)


Re: Plugin extracting text from docs

2009-06-25 Thread Theo Van Dinter
On Thu, Jun 25, 2009 at 1:12 PM, Jonas Eckermanjonas_li...@frukt.org wrote:
 Already exists, check recent list history for set_rendered.

 I thought that was for text only.

It is only for text.

 In any case, any plugin looking for images, or a PDF, will most likely look
 at MIME type and/or file name, and then use the decode method to get the
 data, and AFAICT the set_rendered method doesn't have any impact on any of
 that.

Of course.  There are three states for the data in a Message::Node object:
  - raw: whatever the email had originally.  may be encoded, etc.
  - decoded: the raw content, decoded (ie: base64 or
quoted-printable).  may be binary.
  - rendered: the text content.  if it was a text part, it's the same
as decoded.  if it was a html part, the decoded data gets rendered
into text.  if it's anything else, the rendered text is blank because
nothing else is supported.
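
As a rough analogy for the three states, here is a sketch using Python's standard `email` package. This is not the Message::Node API; `part_states` is a made-up helper, and HTML rendering is deliberately left out.

```python
from email import message_from_string


def part_states(part):
    """Return (raw, decoded, rendered) for a single, non-multipart MIME part."""
    raw = part.get_payload()                 # as found in the mail, maybe encoded
    decoded = part.get_payload(decode=True)  # bytes after base64/QP decoding
    if part.get_content_type() == "text/plain":
        charset = part.get_content_charset() or "ascii"
        rendered = decoded.decode(charset, errors="replace")
    else:
        # HTML would need real rendering; other types have no text at all.
        rendered = ""
    return raw, decoded, rendered


text_part = message_from_string(
    "Content-Type: text/plain; charset=us-ascii\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "aGVsbG8=\n"
)
```

For `text_part`, the raw state is the base64 text, the decoded state is the bytes behind it, and the rendered state is the same content as a string.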

The goal with the plugin calls and set_rendered is to allow other
plugins to find parts that they understand how to convert into text,
and set the rendered version of the part to whatever as appropriate.
So if you want to do OCR on image/*, you can do that.  If you want to
convert PDF/DOC/whatever to text, you can do that.

I would comment that plugins should probably skip parts they want to
render that already have rendered text available.

Rules, Bayes, etc, then take all the rendered parts and use them.

 I can't see how set_rendered would help in creating a functioning chain
 where one converter could put an arbitrary extracted object (image, pdf,
 whatever) where another converter could have a go at it.

Well, you wouldn't do that because there's no point. ;)   (feel free
to disagree with me though)
If a plugin wants to get image/* parts and do something with the
contents, they can do that already.
If a plugin wants to get application/octet-stream w/ filename *.pdf
and do something with the contents, they can do that already.

If you want to have a plugin do some work on a part's contents, then
store that result and let another plugin pick up and continue doing
other work ...  There's no official method to do that.  You can store
data as part of the Node object.  You could potentially also write a
tempfile, though you'll want to be careful to clean up the tempfile as
necessary.

But what would be a use case for that?  I guess something like
converting a PDF to a TIFF, then OCR the TIFF?
I'd probably implement that as a single plugin w/ ocr as a function
that gets called from both the PDF and TIFF handlers.
Arguably, there could be multiple people developing plugins for
different types, but you'd need some coordination for the
register_method_priority calls to figure out who goes in what order.
(btw: I just found the register_method_priority() method. \o/)
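
The ordering problem can be illustrated with a toy model. Everything here is hypothetical (the dict layout, the `run_converters` name, the priority numbers) and none of it is SpamAssassin's plugin API; it only shows converters running in priority order and skipping parts that already have rendered text.

```python
def run_converters(parts, converters):
    """Apply converters to parts, lowest priority number first.

    parts: list of dicts with keys 'type', 'data' and 'rendered'.
    converters: list of (priority, type_prefix, function) tuples.
    """
    # Sort on the priority only, so the functions are never compared.
    for _priority, prefix, convert in sorted(converters, key=lambda c: c[0]):
        for part in parts:
            if part["rendered"]:
                continue  # an earlier converter already produced text
            if part["type"].startswith(prefix):
                part["rendered"] = convert(part["data"])
    return parts
```

With two OCR converters registered for `image/`, only the one with the lower priority number ever sees an image, which is why independent plugin authors would need to agree on the ordering.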

Note: Do not try to add or remove parts in the tree.  The tree is
meant to represent the mime structure of the mail, and each node
relates to that specific mime part.  The tree is not meant to be a
temporary data storage mechanism.


Hope this helps.


Re: Plugin extracting text from docs

2009-06-25 Thread Theo Van Dinter
On Thu, Jun 25, 2009 at 3:41 PM, Jonas Eckermanjonas_li...@frukt.org wrote:
 Matus' example was a Word document that contained a PDF which might in turn
 contain an image. A plugin that knows how to read Word documents could
 extract the text of the Word document and then use set_rendered to make
 that available to SA. It cannot currently extract the PDF and make it
 available to any plugins that know how to read PDFs, though.

My view would be that if someone is going to try making things so
convoluted such as that, a) we've won because no one is going to go
through the trouble of opening that doc, b) the convolution is a
fingerprint that you could write a rule for and then you don't care
what the content actually is.  For example, you'd render something
like doc_pdf_jpg, which would make an obvious Bayes token.  In the
same way for a zip file, you could do zip_pdf zip_jpg zip_txt, etc,
and they'd all be different tokens.

But yes, you're right, the Message/Message::Node stuff wasn't designed
with the idea of supporting multiple independent data objects from a
single mime part.  I can see the argument for treating embedded files
similar to multipart, but I still lean towards mime structure only.

 For some stuff coordination would be needed, yes. But not for what I'm
 thinking of.

Why not?  If you have no coordination, you would possibly look for
images first, then pdfs, then word docs, and end up not getting
anywhere.  If it's all your plugin, you can configure the order.  If
it's not, you need coordination.  For example, as from above, if
there's zip file with a doc which has a pdf which has a jpg, and your
plugin doesn't handle zip but another one does ...

 The most common thing to extract apart from text will most likely be images.
 Any OCR text extractor tied into my plugin would get to see those images,
 but any OCR SA plugins run after my plugin won't. It might be good to make
 extracted images available to those, and other image handling plugins.

But yours already ran, so who cares about the others?

Seriously.

If you're expending the resources to OCR the same image in an email
multiple times ...  You clearly either have a lot of hardware or not a
lot of mail.


Re: How many people are still using perl 5.6.x?

2009-06-25 Thread Theo Van Dinter
Well, the point is that if it works, don't break it.
Yes, you can totally avoid upgrades, depending on your environment.
Sometimes you have no choice and continue to run old versions of
software or firmware or ...
Get over it. :)

If you want to continue debating system administration issues, there
are several lists to do so (go to sage or lopsa, for example).  The
goal for this thread is to get a sense of how many people are still
running SA on Perl 5.6 and therefore how disruptive would it be to the
user base to require a newer version of Perl for newer versions of SA.


On Thu, Jun 25, 2009 at 5:35 PM, Yet Another Ninjasa-l...@alexb.ch wrote:
 On 6/25/2009 11:27 PM, John Rudd wrote:
 On Thu, Jun 25, 2009 at 10:09, Chris Hoogendykhoogen...@bio.umass.edu
 wrote:
 Gone are the days when you totally avoided upgrades because of the time,
 hassle and risk involved.

 Time and hassle, maybe.  Risk, no.  Risk is not a binary, it's a
 balancing act.  Live updates don't remove risk, they simply alter the
 risk balance.  There will always be applications and environments
 where risk is high enough that will cause you to wait.
 can we get back to Spamassassin and a sane update cycle context? .-)


Re: Bayes and SQL.

2009-06-22 Thread Theo Van Dinter
On Mon, Jun 22, 2009 at 6:06 AM, Kasper Sacharias Eenbergk...@hovmark.dk 
wrote:
 I'm not completely sure that force-expire does anything. I ran it
 several times last week, and nothing showed up in the 'last expiry
 atime' column. So i figured it wasn't working.

Please keep in mind that --force-expire means "force an expire run
to occur", which isn't the same as "force tokens to be expired".
Reading the verbose expiration docs in man sa-learn may be useful.

fwiw, that's actually what the man page says for --force-expire as well. ;)
   --force-expire
   Forces an expiry attempt, regardless of whether it may be necessary
   or not.  Note: This doesn't mean any tokens will actually expire.
   Please see the EXPIRATION section below.

 1) Now, expiry gives me some 'strange' output. Can anyone take a look at
 this and tell me if it's normal?

What's strange?  It says "couldn't find a good delta atime, need more
token difference, skipping expire".
That's explained in the man page as mentioned above (see ESTIMATION
PASS LOGIC).

In short:
[3753] dbg: bayes: expiry check keep size, 0.75 * max: 112500
[3753] dbg: bayes: token count: 171682, final goal reduction size: 59182

SA wants to expire down to 112500, by removing 59182 tokens.

[3753] dbg: bayes: 1382400 89566
[3753] dbg: bayes: 2764800 0

Your DB is pretty new, and so when looking at the atime deltas, there
is no delta which will expire <= 59182 (it can't take 2764800 because
there's nothing to do there, and 1382400 expires too many tokens).

Therefore it can't do anything, and needs more atime differences which
should let it find an appropriate delta to use.
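
The selection described above can be sketched as follows. This is a simplification of the estimation pass as explained in this thread, not the actual sa-learn code; `choose_atime_delta` is a made-up name, and the numbers are the ones from the debug output.

```python
def choose_atime_delta(goal, counts):
    """Pick the atime delta that expires the most tokens without passing `goal`.

    counts maps an atime delta in seconds to the number of tokens that delta
    would expire. Returns None when no delta expires between 1 and `goal`
    tokens, which is the "skipping expire" case.
    """
    usable = {delta: n for delta, n in counts.items() if 0 < n <= goal}
    if not usable:
        return None
    return max(usable, key=usable.get)


keep_size = 112500            # "expiry check keep size" from the log
goal = 171682 - keep_size     # token count minus keep size
result = choose_atime_delta(goal, {1382400: 89566, 2764800: 0})
```

Here `result` is None: one delta expires too many tokens and the other expires none, so the run is skipped until the token atimes spread out more.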

 2) The 'sync' apparently will not work. No sync atime is reported.

If you're using SQL, there is no sync time because there is no journal.


Re: BAYES_99 score lint

2009-06-22 Thread Theo Van Dinter
The debug output is saying that the meta rule, LOCAL_BAYES_RTF, has a
dependency, BAYES_99, which has a 0 score.
In the score line, there are two zero values. ;)   It depends what
scoreset you're running in.
Also, just because 50_scores.cf has something set doesn't mean
something later on doesn't change it.
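
To see which of the four values applies, here is a small sketch. It assumes the usual scoreset layout (0: no Bayes and no network tests, 1: network tests only, 2: Bayes only, 3: both); `score_for` is a hypothetical helper, not part of SpamAssassin.

```python
def score_for(score_line, use_bayes, use_net):
    """Return (rule_name, score) for the scoreset implied by the two flags."""
    fields = score_line.split()
    name = fields[1]
    values = [float(v) for v in fields[2:]]
    if len(values) == 1:
        values = values * 4  # a single value applies to every scoreset
    index = (2 if use_bayes else 0) + (1 if use_net else 0)
    return name, values[index]
```

With `score BAYES_99 0 0 3.5 3.5`, any run without Bayes enabled lands on one of the two zeros, which would produce the lint message quoted below.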


On Mon, Jun 22, 2009 at 2:50 PM, MySQL Studentmysqlstud...@gmail.com wrote:
 Hi all,

 When I run spamassassin -D --lint, I receive this output:

 [14406] info: rules: meta test LOCAL_BAYES_RTF has dependency 'BAYES_99'
 with a zero score

 Which is it saying has a zero score?

 BAYES_99 in 50_scores.cf is shown as:

 score BAYES_99 0 0 3.5 3.5

 The LOCAL_BAYES_RTF is a meta rule that combines BAYES_99 with a mimeheader
 rule with 0.1 score that catches RTF files.


Re: new spam using large images

2009-06-19 Thread Theo Van Dinter
On Fri, Jun 19, 2009 at 3:04 AM, Jason Haarjason.h...@trimble.co.nz wrote:
 Speaking of image/rtf/word attachment spam; is there any work going on
 to standardize this so that the textual output of such attachments could
 be fed back into SA?

That functionality already exists (has for almost 3 years, actually),
but as in the past (list archives) the documentation hasn't improved
for it. :(

Here's my last(?) post about it which has some sample code and everything:

http://www.nabble.com/Re:-PDFText-Plugin-for-PDF-file-scoring---not-for-PDF-images-p11595641.html


Re: new spam using large images

2009-06-19 Thread Theo Van Dinter
On Fri, Jun 19, 2009 at 4:42 PM, Charles Gregorycgreg...@hwcn.org wrote:
 H. Big question for developers: Does the performance 'burden' of a large
 e-mail come from the 'reading' of that mail into spamassassin and initial
 processing? Or is the 'cost' of a large message only 'paid' when SA attempts
 to run 'rawbody' or 'full' rules against the entire message?

There is very little load for reading in a message.  It's all about
the running of rules.  Some rules cost more than others, full, for
example.


Re: new spam using large images

2009-06-19 Thread Theo Van Dinter
Once you have a part you can use the documented methods in
Message::Node to access data (see perldoc
Mail::SpamAssassin::Message::Node).  You will probably want
$p->decode() which returns a decoded (base64, quoted-printable) string
of the part contents.


On Fri, Jun 19, 2009 at 7:00 PM, Rosenbaum, Larry
M.rosenbau...@ornl.gov wrote:
 From: felic...@kluge.net On Behalf Of Theo Van Dinter

 On Fri, Jun 19, 2009 at 3:04 AM, Jason Haarjason.h...@trimble.co.nz wrote:
  Speaking of image/rtf/word attachment spam; is there any work going on
  to standardize this so that the textual output of such attachments could
  be fed back into SA?

 That functionality already exists (has for almost 3 years, actually),
 but as in the past (list archives) the documentation hasn't improved
 for it. :(

 Here's my last(?) post about it which has some sample code and everything:

 http://www.nabble.com/Re:-PDFText-Plugin-for-PDF-file-scoring---not-for-PDF-images-p11595641.html

 Thanks for the sample code.  Once you get the $p object from
 $msg->find_parts(), how do you extract the contents of the message part to
 run it through antiword or whatever?

 L



Re: Suggested Change For FS_TEEN_BAD

2009-06-18 Thread Theo Van Dinter
On Thu, Jun 18, 2009 at 7:26 AM, Michael
Monneriemichael.monne...@is.it-management.at wrote:
 On Mittwoch 17 Juni 2009 Theo Van Dinter wrote:
 Yes, it matters (one path is tried then the other has to be tried, as
 opposed to having a single path)

 So which is better performance wise? I guess [sz]? but I'm not sure now.

[sz] is better than (s|z), I want to say always (true from the
theoretical POV), but it depends on the RE compiler which can optimize
(convert) one to the other (the reality POV).  IMO, it's good habit to
just do the right thing yourself, since different RE compilers are
well, different.

In short:
if you want to match one of several specific single characters, use a
character class [].  only use (...|...) if you need to catch more
complicated/non-single character things.
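
A quick way to convince yourself the two forms accept the same strings, sketched in Python rather than Perl. The pattern `word[sz]?` is just an illustrative stand-in, and this checks equivalence only; it says nothing about speed.

```python
import re

CHAR_CLASS = re.compile(r"^word[sz]?$")
ALTERNATION = re.compile(r"^word(?:s|z)?$")


def same_result(text):
    """True when both patterns agree on whether `text` matches."""
    return bool(CHAR_CLASS.match(text)) == bool(ALTERNATION.match(text))


for sample in ["word", "words", "wordz", "wordx", "wordsz"]:
    print(sample, bool(CHAR_CLASS.match(sample)), same_result(sample))
```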


If you want to know more gory details, search around for finite automata. :)


Re: Suggested Change For FS_TEEN_BAD

2009-06-17 Thread Theo Van Dinter
Yes, it matters (one path is tried then the other has to be tried, as
opposed to having a single path), though the overall amount is
probably negligible.  Perl's RE compiler could well optimize this away
anyway.


On Wed, Jun 17, 2009 at 7:45 PM, Kelsonkel...@speed.net wrote:
 Wouldn't it be more efficient to write all the single-letter matches like
 (?:s|z)? as [sz]? or does it end up not making a difference when the
 regex is actually processed?


Re: Suggested Change For FS_TEEN_BAD

2009-06-15 Thread Theo Van Dinter
On Tue, Jun 16, 2009 at 12:23 AM, Andy Dormanador...@ironicdesign.com wrote:
 However, I was a little surprised that SpamAssassin did not have a test for
 a phrase in the subject that seemed to clearly indicate potential child porn
 like girls getting f**ked.

SpamAssassin is not a porn filter, whatever the variety.


Re: Capturing and using values....

2009-06-14 Thread Theo Van Dinter
No, SA doesn't do that.  The best way to do this is to write a plugin
where you can do whatever you want. :)

On Sun, Jun 14, 2009 at 3:18 PM, Charles Gregorycgreg...@hwcn.org wrote:
 Got a usage question. Is there a simple mechanism, similar to Perl's use
 of parentheses and $1 to 'capture' a value in one rule and USE that captured
 value in the next rule?


Re: Question on add-to-blacklist

2009-06-02 Thread Theo Van Dinter
Well, the first problem is that the AWL has no impact on Bayes.
They're totally independent.
Perhaps you want sa-learn ?

On Tue, Jun 2, 2009 at 2:32 PM, Larry Starr lar...@fullcompass.com wrote:
 I have been using the AWL ( --add-addr-to-blacklist ) for some time, to bump
 new spam senders above the Bayes-99 score.

 My problem is that this feature seems extremely slow.

 I'm now trying to use the ( --add-to-blacklist ) option and am finding that
 this is, equally, slow.

 I'm running it as:
 spamassassin  -d --progress --add-to-blacklist --mbox mboxfile

 The mboxfile contains the messages whose senders I wish to blacklist, via
 AWL.

 The process seems to take anywhere from 5 to 15 minutes, per address.

 Can anyone offer a faster alternative?

 Thanks,
 --
 Larry G. Starr - lar...@fullcompass.com or sta...@globaldialog.com
 Software Engineer: Full Compass Systems LTD.
 Phone: 608-831-7330 x 1347  FAX: 608-831-6330
 ===
 There are only three sports: bullfighting, mountaineering and motor
 racing, all the rest are merely games! - Ernest Hemmingway



Re: Identifying Source of False Positives

2009-06-01 Thread Theo Van Dinter
fwiw, even if there isn't a blank line, SA will figure it out (though
it'll trigger a MISSING_HB_SEP rule hit).

As for the debug output ... it depends, how did you run the command
(ie: what was the command you tried).  My guess is you did something
like spamassassin -D filename, where filename gets treated as the
argument to -D, so then it was waiting for input.  If this is the
case, try spamassassin -D < filename > /dev/null. :)

On Mon, Jun 1, 2009 at 6:09 PM, Rich Shepard rshep...@appl-ecosys.com wrote:
  There is always a blank line between headers and body. I tried running
 'spamassassin -D' on the saved message and nothing happened. Should it take
 more than a few seconds to complete and return a debug report?


Re: Plugin/TVD.pm

2009-05-31 Thread Theo Van Dinter
That depends, what's TVD.pm?  ;)

Doing a quick search shows
http://mail-archives.apache.org/mod_mbox/spamassassin-users/200603.mbox/%3c20060316233124.gv22...@kluge.net%3e
which was a conversation we had way back in 2006 about SA 3.1 and bug
4255.  There was a TVD.pm in discussion, so I assume that's the plugin
in question.

It appears to have become HTTPSMismatch.pm, already included as a
standard plugin in SA 3.2 and beyond. :)


On Sun, May 31, 2009 at 2:03 PM, Philip Prindeville
philipp_s...@redfish-solutions.com wrote:
 I upgraded from FC8 to FC9 recently, and spamassassin could no longer
 find TVD.pm after I deprecated the old Perl install.

 Where does TVD.pm currently live?


Re: sa-learn doesn't remember messages it's already learned from

2009-05-31 Thread Theo Van Dinter
When you say "the database", do you mean "bayes_toks" or "bayes_toks
and bayes_seen"?  If the former, you need to grant write privs to
bayes_seen as well.

Also, when in doubt, run w/ -D to see what's going on.


On Sun, May 31, 2009 at 1:41 PM, Russell Jones rjo...@eggycrew.com wrote:
 I am running a global bayes database. The file permissions for the database
 is 0666. For some reason I just realized that sa-learn is not remembering
 the messages it's already learned from. I've checked the bayes file
 permissions and everything else I could think of, but if you run sa-learn,
 wait for it to finish, and then run it again it learns off of the same
 messages. (IE, learned 5 tokens from 5 messages. This will show everytime
 you run sa-learn). Nothing has changed from my configuration that should
 have caused this.

 I am running SpamAssassin 3.2.5, Perl 5.8.8. There are no errors shown when
 sa-learn finishes. Any ideas what I could look at as the cause for this?


Re: Problem with check_invalid_ip()

2009-05-29 Thread Theo Van Dinter
None of the IPs you listed will match.
Have you tried simply running a loop in Perl to see what the results are?

Also, negation ~ ?  What do you mean?  =~ is not a negation (that
would be !~).
Also also, the ^ and $ chars are important.  If you remove them,
you change the RE.
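
The suggested loop, sketched in Python instead of Perl. The pattern is copied from the quoted source below, and Python's `re` handles this particular expression the same way; the names `is_illegal_ip`, `ANCHORED` and `UNANCHORED` are made up here.

```python
import re

OCTETS = (r"(?:[01257]|(?!127.0.0.)127|22[3-9]|2[3-9]\d|[12]\d{3,}|[3-9]\d\d+)"
          r"\.\d+\.\d+\.\d+")
ANCHORED = re.compile("^" + OCTETS + "$")   # as in the original source
UNANCHORED = re.compile(OCTETS)             # with ^ and $ removed


def is_illegal_ip(ip):
    """True when the anchored pattern matches, i.e. the IP is flagged."""
    return ANCHORED.search(ip) is not None


for ip in ["127.0.0.1", "192.168.1.1", "87.248.121.75",
           "193.1.1.1", "194.1.1.1", "127.1.1.1"]:
    hit = UNANCHORED.search(ip)
    print(ip, is_illegal_ip(ip), hit.group(0) if hit else None)
```

Without the anchors the pattern is free to match in the middle of the string, which is where partial hits such as 7.0.0.1 come from.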


On Fri, May 29, 2009 at 7:59 AM, Eric Rodriguez thewa...@gmail.com wrote:
 Hi,

 I removed the negation ~ , the begin ^ and end $ characters from the
 original source:

 sub check_for_illegal_ip {
   my ($self, $pms) = @_;

   foreach my $rcvd ( @{$pms->{relays_untrusted}} ) {
     # (note this might miss some hits if the Received.pm skips any invalid IPs)
     foreach my $check ( $rcvd->{ip}, $rcvd->{by} ) {
       return 1 if ($check =~ /^
         (?:[01257]|(?!127.0.0.)127|22[3-9]|2[3-9]\d|[12]\d{3,}|[3-9]\d\d+)\.\d+\.\d+\.\d+
         $/x);
     }
   }
   return 0;
 }

 Here are my results:
 Test  Target String   matches()  replaceFirst()  replaceAll()  lookingAt()  find()  group(0)
 1     127.0.0.1       No         12              12            No           Yes     7.0.0.1
 2     192.168.1.1     No         19              19            No           Yes     2.168.1.1
 3     87.248.121.75   No         8               8             No           Yes     7.248.121.75
 4     193.1.1.1       No         193.1.1.1       193.1.1.1     No           No
 5     194.1.1.1       No         194.1.1.1       194.1.1.1     No           No

 If I understand correctly the first 3 tests are valid IP, but not the
 193.1.1.1 and 194.1.1.1 ??

 Eric Rodriguez


 On Fri, May 29, 2009 at 13:53, Matt Kettler mkettler...@verizon.net wrote:

 Eric Rodriguez wrote:
  Hi,
 
  I'm having trouble with the check invalid_ip subroutine in the
  RelayEval.pm.
  See
 
  http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/RelayEval.pm?view=log&r1=451385&pathrev=451385
 
  After a couple of tests, it seems that 193.X.X.X and 194.X.X.X IPs are
  not valid with respect to the regexp.
  Is this a bug? Or am I wrong about the test?
 
  I used http://www.fileformat.info/tool/regex.htm with
  RegExp:
 
  (?:[01257]|(?!127.0.0.)127|22[3-9]|2[3-9]\d|[12]\d{3,}|[3-9]\d\d+)\.\d+\.\d+\.\d+
  Tests:
  127.0.0.1
  192.168.1.1
  87.248.121.75
  193.1.1.1
  194.1.1.1
 
 
  Could someone explain to me which IPs are valid according to this test?
  Thanks
 
  Eric Rodriguez
 Using the above tool I get results telling me that 193.1.1.1 and
 194.1.1.1 do NOT match, and therefore are valid IPs.

 Test    Target String   matches()       replaceFirst()  replaceAll()    lookingAt()     find()  group(0)
 1       193.1.1.1       *No*            193.1.1.1       193.1.1.1       No              No
 2       194.1.1.1       *No*            194.1.1.1       194.1.1.1       No              No



 In fact, NONE of your test strings match the regex. But 127.1.1.1,
 correctly, does.






Re: Filtering through mailing lists

2009-05-29 Thread Theo Van Dinter
Sure, change your mail system so it doesn't call SA more than once on
the same message. :)

On Fri, May 29, 2009 at 9:26 AM, Garik garik@gmail.com wrote:
 Is there anything that can be done so there's only one instance of
 [**SPAM**] in the subject? Have postfix strip out the spam headers from the
 subject, or is there another solution? Someone would have run across this
 problem before me.


Re: Error when running sa-update

2009-05-20 Thread Theo Van Dinter
What version of IO::Zlib do you have installed?  Line 82 of sa-update is
where it tries to load IO::Zlib 1.04 or later:

use IO::Zlib 1.04;


So my guess is that you either have an early non-version exporting
version, or a strange/corrupted module.  Either way, reinstalling it
would be the way to go.


On Wed, May 20, 2009 at 2:13 PM, Patrick Saweikis psawei...@techpro.com wrote:
 Has anyone seen the following when trying to run SA-update?

 IO::Zlib does not define $IO::Zlib::VERSION--version check failed at
 /usr/bin/sa-update line 82.
 BEGIN failed--compilation aborted at /usr/bin/sa-update line 82.


Re: catch22: MIRRORED.BY wrong, sa-update won't

2009-05-19 Thread Theo Van Dinter
just fyi, I left spamassassin.kluge.net up for over a month after
removing it from the MIRRORED.BY file, and forced a new update to deal
with https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6083.  I
figured that anyone using sa-update would run it at least once a
month, and then get the new MIRRORED.BY file.

Fixing 6083 will stop this from happening again in the future.
I'm interested in working on it, but haven't had time to set aside to
do so yet.  Feel free to go for it if you want to. :)


2009/5/16 Karsten Bräckelmann guent...@rudersport.de:
 # test mirror: zone, cached via Coral
 #http://buildbot.spamassassin.org.nyud.net:8090/updatestage/
 http://spamassassin.kluge.net/updates/

 Oh, that explains it indeed. A single mirror. :/  Thanks.


Re: Boxtrapper and Spamassassin Cpanel 11 strange behaviour.

2009-05-11 Thread Theo Van Dinter
fwiw, I also confirm any CR mails that I get.  I just wanted to paste
in this quote...  :)

challenge response is a great way to tell people they are less important
than you   - Dan Quinlan via IRC


On Mon, May 11, 2009 at 2:33 PM, Dave Pooser dave...@pooserville.com wrote:
 Not necessarily true-- anytime I see one of those challenges to a (forged
 sender) address I control I'll click the confirmation link just to make sure
 the backscatterer gets the spam. I figure if he wants me to filter his spam
 for free, he'll get every bit of his money's worth.  ;-)


Re: Errors during installation spamassasssin

2009-05-05 Thread Theo Van Dinter
Mail::SPF replaced Mail::SPF::Query.  You should pick one or the
other, though Mail::SPF is preferred.  See the INSTALL doc.

Also note, the module diag output is not a list of things that you
need to install, it's just a list that can help when debugging.


On Tue, May 5, 2009 at 4:58 AM, Jack Raats j...@jarasoft.net wrote:
 I'm using FreeBSD 7.2-RELEASE. I've installed spamassassin using the
 ports.
 When running sa-update -D I get the following output (part of it)

 [97306] dbg: diag: module installed: Mail::SPF, version v2.006
 [97306] dbg: diag: module not installed: Mail::SPF::Query ('require' failed)

 When installing the module Mail::SPF::Query I'll get:

 ===  p5-Mail-SPF-Query-1.999.1 conflicts with installed package(s):
   p5-Mail-SPF-2.006

 Is this a bug in sa-update or a bug in the ports system of FreeBSD?


Re: Error: spamc: connection attempt to spamd aborted after 3 retries

2009-05-05 Thread Theo Van Dinter
This has been said before, but there seems to still be some confusion.

In short -- you seem to think you're using amavis, and have an amavis
config file ...  But instead you seem to be calling spamc/spamd, which
is completely different and unrelated.

If you want to use amavis, then stop using spamc/spamd, and make sure
your MTA configuration uses amavis.
Once you are sure you have amavis configured in the MTA, if you are
still not getting the expected results, you will want to ask the
amavis folks for support.

If you want to use spamc/spamd instead, then stop trying to configure
amavis and set SpamAssassin config files appropriately to do the
markup that you want.


On Tue, May 5, 2009 at 1:49 PM, Alejandro Cabrera Obed
aco1...@gmail.com wrote:
 Now the messages are checked for spam and assigned a score, but the
 ***SPAM*** tag that Amavisd-new sets when the spam score is greater
 than the defined threshold no longer appears. I need this tag in order to
 filter the spam for each user.

 My amavis conf file has the following lines:

 
 $inet_socket_port = 10024;   # default listenting socket
 $inet_socket_bind = '127.0.0.1'; # limit socket bind to loopback interface
 @inet_acl = qw ( 10.1.1.2 127.0.0.1 ); # allow SMTP access from these IP's
 $sa_spam_subject_tag = '***SPAM*** ';
 $sa_tag_level_deflt  = 4.0;  # add spam info headers if at, or above that level
 $sa_tag2_level_deflt = 5.0; # add 'spam detected' headers at that level
 $sa_kill_level_deflt = 5.0; # triggers spam evasive actions
 $sa_dsn_cutoff_level = 10;
 ...

 Why, if I use a socket for spamd, does Amavisd-new not put the ***SPAM***
 tag on the spam messages?


Re: bayes training doesn't seem to have any affect

2009-05-05 Thread Theo Van Dinter
On Tue, May 5, 2009 at 5:40 PM, Micah Anderson mi...@riseup.net wrote:
 Eh?  Last journal sync atime is Jan 1 1970?
 Try running:   sa-learn --sync

 Doesn't seem to change the 'last journal sync atime' from 0.
[...]
 I'm using a mysql DB and I've got the following set in my local.cf:

SQL Bayes DBs don't have journals, so no last sync time is expected.  fyi.


Re: [sa] Re: The weirdest problem .....

2009-05-04 Thread Theo Van Dinter
You're wrong (but you're close). :)

You can configure your own whitelist_from_* and blacklist_from_* (or
the other whitelist/blacklist commands) in your user_prefs/configs.
Either you have the config or you don't, and the scores are for the
rule not each sender, so in that sense, it's permanent.

Then there's the AWL, aka the historical score averager, which has
some commands via spamassassin to do simple manipulation, usually to
correct undesired entries.  The score changes per message, typically.
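
Why the AWL contribution differs for every message can be shown with a one-line model. This assumes the adjustment is a plain pull toward the sender's historical mean; the factor of 0.5 is an illustrative value, and `awl_adjusted_score` is not a real SpamAssassin function.

```python
def awl_adjusted_score(score, sender_mean, factor=0.5):
    """Move `score` toward the sender's historical mean by `factor` (0..1)."""
    return score + (sender_mean - score) * factor


# The same sender mean yields a different correction for each message score.
for message_score in (10.0, 2.0, -1.0):
    print(message_score, awl_adjusted_score(message_score, sender_mean=2.0))
```

A sender with a low average gets high-scoring messages pulled down and low-scoring ones pulled up, so there is no fixed per-sender score the way there is with whitelist_from or blacklist_from.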

Hope this helps. :)


On Mon, May 4, 2009 at 12:16 PM, Charles Gregory cgreg...@hwcn.org wrote:
 Okay, maybe I'm misunderstanding. I was under the impression that
 spamassassin had TWO 'whitelists'. One was user specified, with 'add' and
 'remove' capability (and anyone removed *stayed* removed), and the other was
 'auto', which was generated automatically by AWL rules, and with NO commands
 to manipulate it? Gurus? Am I wrong?


Re: Spam from windows live

2009-05-04 Thread Theo Van Dinter
2009/5/4 Karsten Bräckelmann guent...@rudersport.de:
 Bear in mind that an email that gets a Bayes score of more than one
 point can't be autolearned as ham.

 Nope, this is wrong.

 The Bayes rules (as well as some other rules) do NOT have any impact on
 the auto-learning. In fact, the auto-learner even uses a score-set
 without Bayes, to avoid self-feeding.

  http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

Actually it's not wrong.  The POD just doesn't match the code,
unfortunately. :(  (feel like opening a bug?)

Yes, the different score set is used to avoid any biasing by the Bayes
system as to whether or not to autolearn, but there's also a check
of the Bayes score that was applied (rule score not bayes probability)
via https://issues.apache.org/SpamAssassin/show_bug.cgi?id=2865.  In
short, if the message seems to strongly be ham or spam, don't
autolearn it the other way and let train-on-error happen if it is
actually wrong.
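
The two-step decision described above could be sketched like this (Python,
illustration only, not the actual Perl code).  The 0.1 / 12.0 values are
the documented defaults for bayes_auto_learn_threshold_nonspam and
bayes_auto_learn_threshold_spam; the +/-1.0 guard values are an
assumption based on the "more than one point" remark in this thread,
and the real code has further requirements (header/body points) that are
left out here.

```python
def autolearn_decision(score_without_bayes, bayes_rule_points,
                       ham_threshold=0.1, spam_threshold=12.0,
                       learner_said_ham_points=-1.0,
                       learner_said_spam_points=1.0):
    """Return "ham", "spam" or "no" for a scanned message.

    score_without_bayes -- message score from the non-Bayes score set
    bayes_rule_points   -- points the BAYES_* rule actually contributed
    """
    # Step 1: the Bayes-free score picks the candidate class.
    if score_without_bayes < ham_threshold:
        candidate = "ham"
    elif score_without_bayes >= spam_threshold:
        candidate = "spam"
    else:
        return "no"

    # Step 2: never learn against what the classifier already believes;
    # leave those cases to train-on-error.
    if candidate == "ham" and bayes_rule_points > learner_said_spam_points:
        return "no"
    if candidate == "spam" and bayes_rule_points < learner_said_ham_points:
        return "no"
    return candidate


print(autolearn_decision(-0.5, 3.5))   # low score, but Bayes says spam
print(autolearn_decision(-0.5, -1.9))  # low score, Bayes agrees
```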


Re: The weirdest problem

2009-05-04 Thread Theo Van Dinter
I think usually when renaming it comes up, people just start talking
about the stuff it should or could be doing, and that branches into a
"write a more fully featured plugin" conversation, which then doesn't
go anywhere. :(

The AWL has also been around for so long that renaming it would
probably just cause more confusion.


On Mon, May 4, 2009 at 1:27 PM, Adam Katz antis...@khopis.com wrote:
 Theo Van Dinter wrote:
 Then there's the AWL, aka the historical score averager, which has
 some commands via spamassassin to do simple manipulation, usually to
 correct undesired entries.  The score changes per message, typically.

 Any movement to rename AWL and thus to avoid explaining it so often?

 We could keep the acronym:
 * Average Weight Leveler
 * Auto-weighting Lean
 * Adjust to Widespread Listing
 * All Worth (was) Lost
 * Assassins We Love
 (or mix and match as needed)

 Or replace it altogether:
 * Historical Score Averager (HSA)
 * Regression to the Mean (RTM or RttM)
 * Out-of-bound Adjustor (OOB or OOBA or OA)
 * Return to Average (RTA)
 * Return to Former Mean (RTFM)
 * Historical Off-Target Average Score Standardizer (...)

 I personally like either Average Weight Leveler or Return to Average,
 though as somebody who hangs around statisticians, I have a strong
 pull towards RttM (and as a Systems Admin, a tendency towards RTFM).



Re: Error: spamc: connection attempt to spamd aborted after 3 retries

2009-05-04 Thread Theo Van Dinter
If you're using amavis, what is calling spamc?  It sounds like
something changed your config somewhere.  Did someone put in a
procmailrc entry?


On Mon, May 4, 2009 at 2:57 PM, Alejandro Cabrera Obed
aco1...@gmail.com wrote:
 Dear all, I use Postfix (version 2.3.8-2+etch1) + amavisd-new (version
 2.4.2-6.1) + spamassassin (version 3.2.3-0.volatile1), and they are Debian
 Etch packages.

 Spamassassin is invoked from amavisd-new, so port TCP/783 is never open.

 A pair of days ago, I notice that the messages are not being checked for
 spam, and I have this log messages in /var/log/mail.err time after time:

 May  4 15:55:04 mail spamc[18892]: connect to spamd on 127.0.0.1 failed,
 retrying (#1 of 3): Connection refused
 May  4 15:55:04 mail spamc[18893]: connect to spamd on 127.0.0.1 failed,
 retrying (#1 of 3): Connection refused
 May  4 15:55:04 mail spamc[18894]: connect to spamd on 127.0.0.1 failed,
 retrying (#1 of 3): Connection refused
 May  4 15:55:04 mail spamc[18881]: connection attempt to spamd aborted after
 3 retries

 I tried restarting all the mail services but I fail.

 What can be the problem, because this model has worked very well until last
 week and nobody has change nothing except apt-get dist-upgrade from Debian
 volatile repositories ???

 Special thanks

 Alejandro



Re: Spam from windows live

2009-05-04 Thread Theo Van Dinter
2009/5/4 Karsten Bräckelmann guent...@rudersport.de:
 via https://issues.apache.org/SpamAssassin/show_bug.cgi?id=2865.  In

 No commit pointer. I'm lazy, Theo, any hints to the actual commit so I
 don't have to dig? :)

Sure.  I found it by a) looking at the code and validating my
understanding, and b) looking at svn log and finding:


r157204 | jm | 2005-03-11 21:02:03 -0500 (Fri, 11 Mar 2005) | 1 line

bug 2788: doco fixes for blacklist rules where autolearning is
concerned; also bug 2865: don't learn messages as ham if they were
previously marked spam by the classifier (due to blacklists etc.), and
vice-versa.


That said, the diff doesn't really show much, and svn blame actually
points at r149224 instead:


r149224 | jm | 2005-01-31 00:52:33 -0500 (Mon, 31 Jan 2005) | 1 line

move default Bayes auto-learn discriminator out of core, into an
active-by-default plugin, so that it can be overridden if desired


so then you have to find the original module.  I thought it was
Bayes.pm, but it's actually PerMsgStatus.pm, which makes sense when I
think about it some more ...However, finding when the code got
added was hard -- I ended up doing a binary search w/ svn cat and
ended up here, which was the first mention of learner_said_ham_points:


r6746 | duncf | 2004-02-18 20:24:48 -0500 (Wed, 18 Feb 2004) | 1 line

Bug 1332: replace hits with points (or score) internally and
externally... Some variable names have changed, notably $self->{hits}
is now $self->{score}. Backwards compatibility is maintained where
possible



Re: Can't locate File/Scan/ClamAV.pm

2009-05-03 Thread Theo Van Dinter
Apparently the clamav.pm plugin requires other modules which you
didn't install.  You need to find out what the dependencies are, and
make sure they're met before trying to use the plugin.


On Sun, May 3, 2009 at 12:05 PM, Chris cpoll...@embarqmail.com wrote:
 Can't locate File/Scan/ClamAV.pm in @INC (@INC
 contains: /usr/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
 /usr/lib/perl5/site_perl/5.10.0 
 /usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
 /usr/lib/perl5/vendor_perl/5.10.0 
 /usr/lib/perl5/5.10.0/i386-linux-thread-multi
 /usr/lib/perl5/5.10.0 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl) at 
 /etc/mail/spamassassin/clamav.pm line 5.

 I reinstalled SA from source this time and restored all my .cf files
 from a current backup. Did I miss something?


Re: Restarting bayes

2009-05-02 Thread Theo Van Dinter
bayes_seen is rather irrelevant.
bayes_toks is very binary-oriented, and uses lots of pack() calls.

There is no SA-based validity check for the DB files/data.  If you
think the DB file itself is corrupt, you could try the appropriate DBM
tools (db_verify, etc.).  The dump/restore method really should have
solved your issue.  If you're still having the same problem, I would
say: a) make sure you're looking at the right DB file, b) do the
dump/restore again and make sure to delete/move the DB file before
restoring, and c) make sure the data you're restoring is valid (GIGO
and all that).


On Sat, May 2, 2009 at 2:34 PM, Gene Heskett gene.hesk...@verizon.net wrote:
 Greetings;

 1. The suggestions to rebuild the bayes db didn't make any difference.
 2. The error complains about the packing format of the db, when as near as I
 can tell, it isn't packed, its plain text, or at least the bayes_seen file is.
 And its nearly 9 megabytes.

 bayes_toks, OTOH, is inscrutable. and over 2 megabytes.

 Is there a way to check this bayes_toks file for validity,  maybe even fix
 it, or should I just nuke all bayes_* and retrain?

 Thanks.

 --
 Cheers, Gene
 There are four boxes to be used in defense of liberty:
  soap, ballot, jury, and ammo. Please use in that order.
 -Ed Howdershelt (Author)
 Look afar and see the end from the beginning.




Re: Looks like sa-learn --spam troubles

2009-05-01 Thread Theo Van Dinter
I would say it's less someone poisoning your DB and more your DB
becoming corrupt.  As it says, a pack format of dec(73) is not a valid
value.  It's set by the BayesStore module itself, not influenced by
the token in question.

You can try to do a dump/verify/restore ...  ala:

sa-learn --sync
sa-learn --backup > db-dump
vi db-dump   [... make sure things look as expected, etc ...]
[... backup your db, however appropriate, depending on your setup ...]
sa-learn --restore db-dump



On Fri, May 1, 2009 at 11:23 AM, Gene Heskett gene.hesk...@verizon.net wrote:
 The error:
 bayes: unknown packing format for bayes db, please re-learn: 73 at
 /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/BayesStore/DBM.pm line
 1883.

 This seems to be repeated at about 3x for every spam I put in the spam folder.
 Obviously someone has figured out a way to poison the bayes_db.

 Is there a fix?


Re: trying to score based on image name and image size

2009-04-30 Thread Theo Van Dinter
There could be various reasons, ranging from "plugin isn't loaded"
(though you'd get an error w/ the rules then) to "image isn't exactly
that size", to "plugin can't determine width+height from image", to
...

Assuming the plugin is loaded (spamassassin -D plugin --lint would
tell you), and you've verified that the size is what you think it is
using some method ...   For option 3, run the message through
spamassassin -D imageinfo and see what it spits out.  If you don't
see something like:

imageinfo: png image FOO.PNG is 400 x 240 pixels (96000 pixels sq.)

Then it didn't figure out the height + width.  If it does output that,
compare the height and width to what you expected.
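
If you'd rather not eyeball the debug output, the comparison can be
scripted.  A small sketch in Python (illustration only); it assumes the
debug line looks exactly like the example above, and the helper names
are made up.  It also checks the file name against the pattern from the
rule, which is case-sensitive as written.

```python
import re

# Matches the "imageinfo:" debug line format shown above (assumed format).
DEBUG_LINE = re.compile(
    r"imageinfo: (?P<type>\w+) image (?P<name>\S+) is "
    r"(?P<width>\d+) x (?P<height>\d+) pixels"
)

# Same pattern as the image_named() rule in the question.
NAME_PATTERN = re.compile(r"^DS[CL]\d{4}\.png$")


def matches_expected(debug_line, img_type, width, height):
    """True if the debug line reports the expected type and dimensions."""
    m = DEBUG_LINE.search(debug_line)
    if m is None:
        return False  # the plugin did not report a size at all
    return (m.group("type") == img_type
            and int(m.group("width")) == width
            and int(m.group("height")) == height)


def name_matches(filename):
    """True if the attachment name would satisfy the image_named() pattern."""
    return NAME_PATTERN.search(filename) is not None


line = "imageinfo: png image FOO.PNG is 400 x 240 pixels (96000 pixels sq.)"
print(matches_expected(line, "png", 400, 240))  # True
print(name_matches("DSL1234.png"), name_matches("DSL1234.PNG"))
```

It is probably also worth checking the plugin's POD for the argument
order (height vs. width) that image_size_exact expects.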


On Thu, Apr 30, 2009 at 1:57 PM, aixenv aix...@yahoo.com wrote:

 I notice there's a:

 mx1:/usr/share/perl5/Mail/SpamAssassin/Plugin# ls -lah ImageInfo.pm
 -rw-r--r-- 1 root root 11K Aug  8  2007 ImageInfo.pm

 and within that there's two subs 'image_named' and 'image_size_exact'

 mx1:/usr/share/perl5/Mail/SpamAssassin/Plugin# cat ImageInfo.pm |grep
 image_named
  $self->register_eval_rule ("image_named");
 sub image_named {
 mx1:/usr/share/perl5/Mail/SpamAssassin/Plugin# cat ImageInfo.pm |grep exact
  $self->register_eval_rule ("image_size_exact");
 sub image_size_exact {
 mx1:/usr/share/perl5/Mail/SpamAssassin/Plugin#

 I am trying the following rule and it is not scoring, what am i missing?:

 (this rule is in my local.cf)

 # rule to block annoying viagra spam with scraped text based off image size,
 # name and having other rule hits
 # 4/30/09 8:45AM
 body __ZL_PNG_400_240 eval:image_size_exact('png',400,240)
 body __ZL_CAM eval:image_named('/^DS[CL]\d{4}\.png$/')
 meta ZL_VIAGRAIMG HTML_MESSAGE && __ZL_CAM && __ZL_PNG_400_240
 describe ZL_VIAGRAIMG Includes 400x240 viagra png image
 score ZL_VIAGRAIMG 1.00

 any help is appreciate thanks

 aixenv





Re: [SA] 419 emailBL?

2009-04-29 Thread Theo Van Dinter
On Wed, Apr 29, 2009 at 6:24 PM, Adam Katz antis...@khopis.com wrote:
 The mechanism for sa-update is brilliant, but
 doesn't lend itself to enormous indices of frequently-changing rulesets.

I guess it depends what you mean by enormous.  A sought rule update is 135k.

The likelihood is, imo, that you would probably split up your updates
into multiple channels before they really got out of control in size.
For example, you could do something like a weekly, daily, and
sub-daily channel, and move rules appropriately between them.  Yes, a
little more of a PITA for clients, but how much churn do you really
expect?

 Justin:  Perhaps sa-update could support [version].torrent in addition
 to [version].tar.gz on each mirror?  (This doesn't touch the current
 DNS-based version/announce system.)  Channels hosted for versions of
 SA after the supporting release (e.g. 0.4.3.[channel] and higher)
 would be allowed to host only the torrent file.

I had actually thought about doing a P2P sa-update so as to better
withstand DoS issues, skip the need for a mirrored.by file, etc.  But
the main issue is that most channel updates are rather small, so the
downloads are rather fast.  A torrent, by comparison, takes a relatively
long time to get set up, and just as you start, you're done.  Also, it
means clients are serving data, which makes the quick "sa-update and
move on" more of a procedure, and you have to worry about remote
connectivity, etc, etc.

In the end it didn't seem worthwhile beyond the security aspect, so I
didn't move beyond the "thinking about" stage.


(and yes, I know I'm not Justin. ;))


Re: 419 emailBL?

2009-04-29 Thread Theo Van Dinter
On Wed, Apr 29, 2009 at 8:06 PM, John Hardin jhar...@impsec.org wrote:
 And 135k doesn't add up to a lot of bandwidth?

 ...so don't look for updates more than once every day or two.

Yeah, but I think the point was that a frequently changing ruleset
would be downloaded frequently.

 And if bandwidth at the server is a problem, would publishing the ruleset
 updates via the Coral Cache network work?

Unfortunately, no.  In fact, they kind of suck as a CDN.  We
originally were putting updates through there and would regularly have
issues w/ 404s, corrupt or incomplete downloads, etc.

It may have improved since the 2005 or so timeframe when we started w/
updates, but ...  Haven't checked in a while.


Re: 419 emailBL?

2009-04-29 Thread Theo Van Dinter
On Wed, Apr 29, 2009 at 7:56 PM, Adam Katz antis...@khopis.com wrote:
 I guess it depends what you mean by enormous.  A sought rule update is 135k.

 And 135k doesn't add up to a lot of bandwidth?  I suppose it depends
 on the number of users, and I'm figuring worst-case scenario, e.g.
 when/if it ships enabled in the default SA install.

Well, it depends what you're measuring.  :)

The update itself isn't large, it's just 135k, which is the "not
enormous" bit.  135k in and of itself is a pretty tiny file, but I'm
not sure what "enormous" means in this context -- megs?  gigs?

The aggregate bandwidth could very well be large, depending on update
publish frequency, client update frequency, number of clients, client
bandwidth, etc.  From what I've seen, the standard SA updates w/ the
same ~130k size and the current number of users ... isn't a lot of
bandwidth.
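
For a back-of-the-envelope feel of how those factors multiply, here is a
tiny Python sketch.  The client count and fetch frequency are made-up
numbers for illustration, not measurements of the real sa-update
channels, and it ignores the fact that clients normally only download
when the published version has changed.

```python
def monthly_transfer_gb(update_kb, clients, fetches_per_day, days=30):
    """Aggregate server-side transfer in GB for one update channel.

    update_kb       -- size of one update tarball in KB
    clients         -- number of clients pulling the channel (hypothetical)
    fetches_per_day -- full downloads per client per day (hypothetical)
    """
    total_kb = update_kb * clients * fetches_per_day * days
    return total_kb / (1024 * 1024)


# 135k update, 100,000 clients, one download per day:
print(round(monthly_transfer_gb(135, 100_000, 1), 1), "GB/month")
# Same channel published (and fetched) every 15 minutes:
print(round(monthly_transfer_gb(135, 100_000, 96), 1), "GB/month")
```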

There are some pretty standard ways to deal with this issue though, such as:

a) have lots of mirrors, same idea as your P2P idea though less
dynamic  (oh, that was another thought I had ... go short of using
torrents since they're resource heavy and instead make our own P2P
protocol doing a dynamic http/mirrored.by system)

b) split the channel into a frequent / not frequent channel (or stable
/ testing, or split based on content, or ...) for patterns which don't
change often, there's no reason to keep sending them out.  same idea I
mentioned before.

c) shrink or hold update size steady in face of updates.  hard.

d) make updates less frequently.  defeats the purpose?  clearly every
15m is different than every day is different than weekly ...


To be perfectly honest, I really don't worry about the "omg, update
bandwidth" issue right now.  I worry that there aren't enough updates
right now.  The only auto-generated one, sought, is daily, and the
manual ones now are more than weekly on average.  I don't know if
sought could even be produced faster: you need a certain amount of
incoming ham and spam to sample and produce test rules, and enough
diversity of mails to test against to avoid obvious bad rules...


Re: Procmail Setup NOT Working

2009-04-28 Thread Theo Van Dinter
2009/4/28 Robert Ober ro...@robob.com:
 It was global and I want it to stay global.  The old procmailrc is:

 DROPPRIVS=yes

 :0fw
 | /usr/bin/spamc

That's a global config, but you're running it per-user due to the
DROPPRIVS line.  fyi.

 All I want to do now is have all the identified spam(X-Spam-Status: Yes ?)
 go to a global file instead of delivered to the users.  The global spam file
 will be readable by only myself and management.

Just create a file and set the permissions to be globally writable,
then point procmail at it.
You can set the read perms however you want.

This makes it hard for users to figure out that some of their mail is
missing though, and makes it harder for them to recover it.


Re: Code Rot?

2009-04-27 Thread Theo Van Dinter
fwiw, I was going to say Yes to the first question.  Not sure about
the second question, though I've always wanted to see more
sharing/give-back from those folks.

While there have been a bunch of mails on the dev list, most of it is
incorrectly opened bugs, or other randomness.
IMO, there hasn't been a lot of actual development going on in quite a
while -- it's definitely *way* less than it was back in the 2004-2005
days (wow, really?  I didn't realize that was so long ago...)

I only speak for myself here, but SA mostly accomplished the goal that
I wanted it to -- the vast (*vast*) majority of my spam was dealt with
automatically, so I didn't have to think about it much anymore.  That
combined w/ several job-related changes, kind of pulled me away, which
is why I haven't been very active since 2005.

With sa-update, I had hoped that there'd be more effort in bug fixing
and maintenance releases of older versions, along with more focus on
rule development ... but that didn't really happen.  That's actually
the big killer, IMO: lack of rule development.  New SA releases just
update the engine, which is great, but there's diminishing returns to
update something which works pretty well already.  This is really why
I wanted the third-party rule folks to get more involved w/ the main
project (thereby being less third-party and thus giving more
momentum to the project), but that never really happened either.

These days there is basically no rule development going on, it seems.
Justin's sought rules are the only ones really being updated, and
that's because they're computer generated. :)

That's actually something else I'm sad about -- we had such a huge
corpus of mail, I would really like to have seen something that took
advantage of it.


So anyway ...  Yeah, IMO, if more people don't get involved, and
specifically to work on rule development, SA is going to completely
stagnate.


On Mon, Apr 27, 2009 at 7:56 AM, Matt Kettler mkettler...@verizon.net wrote:
 Dan Mahoney, System Admin wrote:
 Hey all,

 While there's a decent amount of spamassassin list traffic to imply
 otherwise, is the SA project falling dormant?

 the sare-rules claim they won't be updated due to lives, wives, and
 hockey.

 the fuzzyOCR project claims the only thing that works with 3.2 is the
 SVN version, and on the same page claims you shouldn't really expect
 the SVN version to work.

 The wiki pages show the last release as almost a year ago, with no
 notice of any betas, pending releases, or whatnot.

 Many commercial products have happily used SA in their core offering,
 is that where the future of development is?

 Well, I can't speak for third-party efforts like SARE and fuzzyOCR.
 However, you can check out the SA devel effort over on our dev list
 archives:

 http://mail-archives.apache.org/mod_mbox/spamassassin-dev/200904.mbox/browser

 I'd say our effort has been a little lower than normal lately, but it's
 hardly dead. We're trying to wrap 3.3 up, see the 3.3.0 plans thread.




Re: Image spam and failing rule

2009-04-26 Thread Theo Van Dinter
It's already been mentioned, but mimeheader is the right way to look
at the headers of MIME parts.

The rule of thumb is if you are using 'full' you're probably doing it
wrong. :)
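
To illustrate why the per-part headers are the right thing to look at,
here is a sketch in Python using the standard email package (not SA
itself; the SA-side equivalent would be a mimeheader rule against
Content-Type or Content-Disposition).  The sample message and function
name are invented for the example.  Note that the file name appearing in
the body text is not picked up, only the real attachment.

```python
import re
from email import message_from_string

# Same file name pattern as in the quoted rawbody rule, anchored.
NAME_RE = re.compile(r"^dsl[0-9]{4}\.png$", re.IGNORECASE)


def suspicious_attachment_names(raw_message):
    """Return attachment file names (from MIME part headers) that match."""
    msg = message_from_string(raw_message)
    names = []
    for part in msg.walk():
        filename = part.get_filename()  # reads the part's own headers
        if filename and NAME_RE.match(filename):
            names.append(filename)
    return names


SAMPLE = """\
From: a@example.com
To: b@example.com
Subject: test
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

dsl1234.png is mentioned in the text but not attached
--XYZ
Content-Type: image/png; name="DSL1234.png"
Content-Disposition: attachment; filename="DSL1234.png"
Content-Transfer-Encoding: base64

aGVsbG8=
--XYZ--
"""

print(suspicious_attachment_names(SAMPLE))
```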


On Sun, Apr 26, 2009 at 11:57 AM, Charles Gregory cgreg...@hwcn.org wrote:
 On Sat, 25 Apr 2009, Gary Forrest wrote:

 We are receiving the same image spam many times, random text within the
 body. The only common thing is a image attachment, with the filename in the
 following format
  DSL1234.png
 I have made the following ' RAWBODY ' rule
 /dsl[0-9]{4}\.png/i

 You need to use a 'full' rule to scan attachment names.
 While you are at it, you can also scan for
   full /Content-Type: image\/gif;\n[^a-z]+name=/

 As this seems to be the next evolution of the spam. Nameless gifs :)

 Enjoy!

 - Charles




Re: DATE_IN_FUTURE

2009-04-24 Thread Theo Van Dinter
You'd really want to post the message headers in pastebot or something
so people can look at them.  It's not just the Date header, the rule
also looks at the Received headers, etc.
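
As a simplified illustration of that comparison (Python, not the actual
rule code), the sketch below measures how far the Date header is ahead
of the timestamp on the topmost Received header.  Which Received header
SA really uses, and the exact thresholds, are not shown here; the sample
headers are invented.  The second sample shows how a wrong timezone on
one side makes an otherwise sane Date look hours in the future.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def date_ahead_of_received_hours(raw_message):
    """Hours by which Date is ahead of the topmost Received timestamp."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received", [])
    if not received or msg["Date"] is None:
        return None
    sent = parsedate_to_datetime(msg["Date"])
    # The timestamp of a Received header follows the last semicolon.
    stamp = received[0].rsplit(";", 1)[-1].strip()
    seen = parsedate_to_datetime(stamp)
    return (sent - seen).total_seconds() / 3600.0


CONSISTENT = (
    "Received: from mail.example.net by mx.example.org; "
    "Fri, 24 Apr 2009 06:21:00 +0000\n"
    "Date: Fri, 24 Apr 2009 14:20:32 +0800\n"
    "\n"
    "body\n"
)

WRONG_ZONE = (
    "Received: from mail.example.net by mx.example.org; "
    "Fri, 24 Apr 2009 06:21:00 +0800\n"
    "Date: Fri, 24 Apr 2009 14:20:32 +0800\n"
    "\n"
    "body\n"
)

print(date_ahead_of_received_hours(CONSISTENT))
print(date_ahead_of_received_hours(WRONG_ZONE))
```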


On Fri, Apr 24, 2009 at 1:44 PM, Rik hlug090...@buzzhost.co.uk wrote:
 I was stumped on a question today about DATE_IN_FUTURE. My googling
 offered me nothing more than the obvious 'The message has a date in the
 future'.

 Thing is, I could not see it. The time stamp was 24 Apr 2009 14:20:32
 +0800 and matched the firewall connection log OK. Can anyone point me to
 a sensible explanation of what this rule looks at so I can troubleshoot
 it?


Re: Bayes filter not always triggered

2009-04-20 Thread Theo Van Dinter
On Mon, Apr 20, 2009 at 8:47 AM, m.b mbarc...@f451.net wrote:
 scantime=3.2,size=2745,user=(unknown),uid=104,required_score=5.0,rhost=,raddr=..,rport=57786,mid=

 Do you have any suggestions why not every message is passing through BAYES?
 I thought it was a locking problem but I'm using flock (and no signs of
 'open bayes database' errors).

My guess is that user=(unknown) is causing your issue.  Perhaps the
user that calls spamd doesn't exist on that server?

Running spamd in debug mode would probably give you more information, fyi.


Re: Bayes filter not always triggered

2009-04-20 Thread Theo Van Dinter
That depends how you have SA setup and how you call it, which you
haven't explained. :)

If you're running it in a site-wide setup, and always with the same
user (even if it doesn't exist on the server), then I'd recommend
running spamd in debug mode and see what it says.


On Mon, Apr 20, 2009 at 11:27 AM, m.b mbarc...@f451.net wrote:
 If user would be missing, it would always cause problems. But it works 75% of
 the time.

 Mark


 Theo Van Dinter-2 wrote:

 On Mon, Apr 20, 2009 at 8:47 AM, m.b mbarc...@f451.net wrote:
 scantime=3.2,size=2745,user=(unknown),uid=104,required_score=5.0,rhost=,raddr=..,rport=57786,mid=

 Do you have any suggestions why not every message is passing through
 BAYES?
 I thought it was a locking problem but I'm using flock (and no signs of
 'open bayes database' errors).

 My guess is that user=(unknown) is causing your issue.  Perhaps the
 user that calls spamd doesn't exist on that server?

 Running spamd in debug mode would probably give you more information, fyi.


Re: accept only gpg/pgp mail

2009-03-07 Thread Theo Van Dinter
It's already been mentioned, but SpamAssassin doesn't accept, deliver,
or route mail.  It simply marks up a message, particularly with some
added headers, and then you would need something else to filter/route
mails as you want.

As for looking for encrypted vs unencrypted mails, you'd have to write
your own rules and/or plugins (depending on how far/complicated you
wanted to go) to identify those mails you do/don't want.

Going in the way back machine for a minute, in SA 2.5 we had some
rules that looked for pgp/gpg signed/encrypted mails and gave a
negative score to those mails, figuring that they weren't spam.
Unfortunately, that makes a very attractive target for spammers and
they started forging the signs, which caused us to remove the
rules.

In the end, in general, signing/encrypting does not necessarily help
for spam vs non-spam, and it's resource intensive to look at mails to
validate that the signatures/encryption are valid, making it even less
useful.

But since SpamAssassin is a mail scanning engine, you can write
rules/plugins to do whatever you feel appropriate. :)

Hope this helps.


On Sat, Mar 7, 2009 at 2:07 PM, dmdm dmd...@yahoo.com wrote:
 What lines would need to be added, and in which file,
 to accept only gpg/pgp encrypted and non-encrypted signed emails to my admin
 account?
 (debian lenny mail server amavisd-new)


Re: how to make a custom ruleset

2009-03-06 Thread Theo Van Dinter
Just fyi, this particular topic keeps getting raised here.  It'd be
great if people would search the list archives.  :)

One of the last times around:
http://www.nabble.com/forum/ViewPost.jtp?post=21296293&framed=y

In short, if you want to do this, write a plugin.  REs are great until
you get complicated, like doing multiple headers and comparing
captured text.
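
To show how little code the comparison needs once the addresses are
parsed properly, here is a sketch of the logic in Python.  It is only an
illustration: a real SA plugin would be written in Perl against the
plugin API, and the function name and sample headers below are made up.

```python
from email import message_from_string
from email.utils import getaddresses


def from_matches_to(raw_headers):
    """True if any From address also appears among the To addresses."""
    msg = message_from_string(raw_headers)
    senders = {addr.lower()
               for _, addr in getaddresses(msg.get_all("From", []))
               if addr}
    recipients = {addr.lower()
                  for _, addr in getaddresses(msg.get_all("To", []))
                  if addr}
    return bool(senders & recipients)


SAME = (
    "From: Adi <adi@example.com>\n"
    'To: "Someone Else" <ADI@example.com>, other@example.net\n'
    "\n"
)
DIFFERENT = (
    "From: Adi <adi@example.com>\n"
    "To: other@example.net\n"
    "\n"
)

print(from_matches_to(SAME), from_matches_to(DIFFERENT))
```

Unlike a single regexp over all headers, this checks every To address
and copes with display names, comments and case differences.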

SA used to have a rule that looked for from=to, but it didn't do well
and got removed.  Some pointers in the above thread.


On Fri, Mar 6, 2009 at 2:44 PM, Mark Martinec mark.martinec...@ijs.si wrote:
 Adi,

 First, it read the sender, and put it into a variable
 Then, it check, if the recipient is the same as that variable
 if true, then give score 3.0

 The trick is to let a regexp see an entire mail header section.
 Unfortunately it means we can't reuse already parsed addresses
 in From and To header fields, but need to reparse it all in a regexp.

 The rules below comes close, but is not exact (the TOFROM rule
 only checks the first To). Mind the line wraps, there are three
 long lines, each starting by 'header':


 header SAME_FROMTO1 ALL =~ m{^From: (?: . | \n[\ \t] )* \s*(.+)\s* (?s:.*) 
 ^To: (?: (?: [^]* | \([^)]*\) |
 [\ \t]+ | \n[\ \t]+ )*? \1 [,(\ \t\n] | (?: . | \n[\ \t])* \s*\1\s*)}mix
 header SAME_FROMTO2 ALL =~ m{^From: (?: [^]* | \([^)]*\) | [\ \t]+ | \n[\ 
 \t]+ )* ([^,;\s...@[0-9a-z._-]+\.
 [a-z]{2,})\b (?s:.*) ^To: (?: (?: [^]* | \([^)]*\) | [\ \t]+ | \n[\ \t]+ 
 )*? \1 [,(\ \t\n] | (?: . | \n[\
 \t])* \s*\1\s*)}mix
 header SAME_TOFROM  ALL =~ m{^To: (?: . | \n[\ \t] )* (?:\b|) 
 ([^,;\s...@[0-9a-z._-]+\.[a-z]{2,}) \b (?!\.)
 (?s:.*) ^From: (?: (?: [^]* | \([^)]*\) | [\ \t]+ | \n[\ \t]+ )*? \1 [,(\ 
 \t\n] | (?: . | \n[\ \t])*
 \s*\1\s*)}mix
 meta   SAME_FROMTO  SAME_FROMTO1 || SAME_FROMTO2 || SAME_TOFROM
 score  SAME_FROMTO1 0.1
 score  SAME_FROMTO2 0.1
 score  SAME_TOFROM  0.1
 score  SAME_FROMTO  1.5


 Mark



Re: Something doofuzzled in a * ^To: line.

2009-02-23 Thread Theo Van Dinter
It sounds like an issue w/ kmail/vim and not so much a nefarious
spammer ability.

And I'm not sure what you mean by "unlisted header".  If you mean:

[other headers]
To:
unlisted header

Then the answer is "unlisted header" is actually the first line of the body.
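
You can see the same behaviour with any mail parser.  A quick sketch
using Python's standard email package (not SA's parser, so treat it as
an illustration of the general rule only): a line that is neither a
"Name: value" header nor an indented continuation ends the header block,
and everything from there on is body.

```python
from email import message_from_string

RAW = (
    "Subject: test\n"
    "To:\n"
    "unlisted header\n"
    "\n"
    "actual body text\n"
)


def split_headers_and_body(raw_message):
    """Return (headers as a dict, body text) for a raw message string."""
    msg = message_from_string(raw_message)
    return dict(msg.items()), msg.get_payload()


headers, body = split_headers_and_body(RAW)
print(headers)      # To is present but empty
print(repr(body))   # body should start with "unlisted header"
```

Had the second line started with a space or tab, it would instead have
been folded into the To header as a continuation.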


On Mon, Feb 23, 2009 at 5:55 PM, Gene Heskett gene.hesk...@verizon.net wrote:
 I've had zip luck getting a trigger line based on Undisclosed Recipients:, or
 Unlisted Recipients: here, so I called up my .procmailrc and tried to enter
 the check phrase by doing a copy/paste from the kmail displayed line when in
 show all headers mode.  But, when pasting that into vim, there is an
 invisible linefeed occupying the underscores place in the header line, and it
 doesn't show up in the show all headers display.

 The input line looks like this:

 To: unlisted-recipients:; (no To-header on input)@gmail-pop.l.google.com

 But copy/pastes as:
 To: _
 unlisted-recipients:; (no To-header on input)@gmail-pop.l.google.com

 Where the underscore is the hidden line feed.  I save the message, and
 inspected it with khexedit, but the saved version does not have an 0x0a
 there.

 Anybody got an idea how the spammers have managed that?

 And better yet, how to defend against it as I'd like to /dev/null any message
 with an unlisted header.


Re: Something doofuzzled in a * ^To: line.

2009-02-23 Thread Theo Van Dinter
Oh, and having a sample mail via pastebin/etc would be handy if you
want more commentary about the mail. :)


On Mon, Feb 23, 2009 at 6:52 PM, Theo Van Dinter felic...@apache.org wrote:
 It sounds like an issue w/ kmail/vim and not so much a nefarious
 spammer ability.

 And I'm not sure what you mean by unlisted header.  If you mean:

 [other headers]
 To:
 unlisted header

 Then the answer is unlisted header is actually the first line of the body.


 On Mon, Feb 23, 2009 at 5:55 PM, Gene Heskett gene.hesk...@verizon.net 
 wrote:
 I've had zip luck getting a trigger line based on Undisclosed Recipients:, or
 Unlisted Recipients: here, so I called up my .procmailrc and tried to enter
 the check phrase by doing a copy/paste from the kmail displayed line when in
 show all headers mode.  But, when pasting that into vim, there is an
 invisible linefeed occupying the underscores place in the header line, and it
 doesn't show up in the show all headers display.

 The input line looks like this:

 To: unlisted-recipients:; (no To-header on input)@gmail-pop.l.google.com

 But copy/pastes as:
 To: _
 unlisted-recipients:; (no To-header on input)@gmail-pop.l.google.com

 Where the underscore is the hidden line feed.  I save the message, and
 inspected it with khexedit, but the saved version does not have an 0x0a
 there.

 Anybody got an idea how the spammers have managed that?

 And better yet, how to defend against it as I'd like to /dev/null any message
 with an unlisted header.



Re: cpan question

2009-02-22 Thread Theo Van Dinter
Since you don't need Net::Ident for SA, I'm going to say no.

:)

On Sat, Feb 21, 2009 at 10:28 PM, Gene Heskett gene.hesk...@verizon.net wrote:
 On Saturday 21 February 2009, Bill Landry wrote:
Gene Heskett wrote:
 Using cpan, trying to install Net::Ident (the other bits except razor were
 nominal from the same source)

 Checking for Apache.pm... not found
 Writing Makefile for Net::Ident
 cp Ident.pm blib/lib/Net/Ident.pm
 Manifying blib/man3/Net::Ident.3pm
   JPC/Net-Ident-1.20.tar.gz
   /usr/bin/make -- OK
 Warning (usually harmless): 'YAML' not installed, will not store
 persistent state
 Running make test
 PERL_DL_NONLAZY=1 /usr/bin/perl -MExtUtils::Command::MM -e
 test_harness(0, 'blib/lib', 'blib/arch') t/*.t
 t/0use.t  Net::Ident::_export_hooks() called too early to check
 prototype at /root/.cpan/build/Net-Ident-1.20-FRTCAm/blib/lib/Net/Ident.pm
 line 29. t/0use.t  ok
 t/apache.t .. Net::Ident::_export_hooks() called too early to check
 prototype at /root/.cpan/build/Net-Ident-1.20-FRTCAm/blib/lib/Net/Ident.pm
 line 29. t/apache.t .. skipped: (no reason given)
 t/compat.t .. Net::Ident::_export_hooks() called too early to check
 prototype at /root/.cpan/build/Net-Ident-1.20-FRTCAm/blib/lib/Net/Ident.pm
 line 29. t/compat.t .. skipped: (no reason given)
 t/Ident.t ... Net::Ident::_export_hooks() called too early to check
 prototype at /root/.cpan/build/Net-Ident-1.20-FRTCAm/blib/lib/Net/Ident.pm
 line 29. t/Ident.t ... Failed 3/8 subtests

 Test Summary Report
 ---
 t/Ident.t (Wstat: 0 Tests: 8 Failed: 3)
   Failed tests:  1-3
 Files=4, Tests=9, 112 wallclock secs ( 0.04 usr  0.01 sys +  1.61 cusr
 0.42 csys =  2.08 CPU)
 Result: FAIL
 Failed 1/4 test programs. 3/9 subtests failed.
 make: *** [test_dynamic] Error 255
   JPC/Net-Ident-1.20.tar.gz
   /usr/bin/make test -- NOT OK
 //hint// to see the cpan-testers results for installing this module, try:
   reports JPC/Net-Ident-1.20.tar.gz
 Warning (usually harmless): 'YAML' not installed, will not store
 persistent state
 Running make install
   make test had returned bad status, won't install without force
 Failed during this command:
  JPC/Net-Ident-1.20.tar.gz: make_test NO

 This YAML does not appear to be available via yum if that's important

 Suggestions please?

 Many thanks too, I forgot to add that to the other message I sent a few
 minutes ago.  My apologies.

Try cpan install YAML (yes, in all caps).

Bill

 2 questions then.
 1) what is it?

 and 2) do I need it for SA?

 Thanks.

 --
 Cheers, Gene
 There are four boxes to be used in defense of liberty:
  soap, ballot, jury, and ammo. Please use in that order.
 -Ed Howdershelt (Author)
 ... relaxed in the manner of a man who has no need to put up a front of
 any kind.
-- John Ball, Mark One: the Dummy



Re: NO_RELAYS FP on relayed mail via IPv6

2009-02-21 Thread Theo Van Dinter
On Sat, Feb 21, 2009 at 7:11 PM, Greg Troxel g...@ir.bbn.com wrote:
 This is a funny case, since the message in question is generated by a
 machine that I would set as TRUSTED.  I am the moderator for
 regional-bos...@netbsd.org, and it gets spam, stunningly enough.  The
 mail is sent to me over IPv6, and SA appears not to parse postfix's IPv6
 received lines.  Is anyone else seeing this problem, and is it specific
 to postfix?  Any hints for where in the sources to read to fix?

At the last check, SA doesn't have a lot of support for IPv6 yet.  For
example, here's some code from the Received header parser in 3.2.x:

  $ip = Mail::SpamAssassin::Util::extract_ipv4_addr_from_string ($ip);
  if (!$ip) {
    dbg("received-header: could not parse IPv4 address, assuming IPv6");
    return 0;   # ignore IPv6 handovers
  }

Taking a quick look at the 3.3 code, it seems the code now handles
IPv6, but I'm not sure if it's complete support or if partial, how
much, etc.

The code is all in .../lib/Mail/SpamAssassin/Message/Metadata/Received.pm


Re: Everything gets a score of 0

2009-02-21 Thread Theo Van Dinter
According to the debug output, you just have the openprotect channel
and not the SA updates channel.  Hence, none of the standard rules
exist.  Run sa-update. :)

On Sat, Feb 21, 2009 at 8:15 PM, oliver oli...@schinagl.nl wrote:
 This is a clean install on a gentoo hardened box. I'm using SA 3.2.5 and
 have learned about 15k worth of mails for the bayes filter. I only
 started to use sa-learn yesterday as someone suggested that this would
 'fix' things. I used sa-learn --spam on my 'junk' folder and --ham on my
 inbox that should be about spam free. No change. I am using the
 sa-update channel from SA and openprotect (which explains the 70 rules
 below). The only thing I seem to be missing in the dbg output is
 inclusion of the rules from the default path: '/usr/share/spamassassin/'.

 From what I can tell, SA is loading up the rules just fine, but then
 awards no points for them? There seem to be also some strange dependency
 issues from the rules, but I found that that shouldn't be really an
 issue. I used the sample-spam.txt as input to let SA figure it out.

 enterprise ~ # spamassassin -tD < sample-spam.txt
[...]
 [26970] dbg: dns: Net::DNS version: 0.63
 [26970] dbg: config: using /etc/mail/spamassassin for site rules pre files

ok pre files, then the sa-update dir for rules ...

 [26970] dbg: config: using /var/lib/spamassassin/3.002005 for default
 rules dir
 [26970] dbg: config: read file
 /var/lib/spamassassin/3.002005/saupdates_openprotect_com.cf

and that's it...

 [26970] dbg: config: using /etc/mail/spamassassin for site rules dir
[...]


Re: misc_10.cf

2009-02-09 Thread Theo Van Dinter
10_misc.cf isn't in 3.2; 3.1 was the last version to have it.
In 3.2 it's called 10_default_prefs.cf.

You should have it installed in the default rules dir, probably
/usr/share/spamassassin.

And no, it's not editable.  Or more specifically, you shouldn't edit it.


On Mon, Feb 09, 2009 at 09:40:47PM -0800, RobertH wrote:
  Um, that's a file that comes with SA, and it is *NOT* user editable.
  Therefore, it's not an example, it is a standard config file 
  that generates the default settings that you later over-ride 
  with your local.cf.
  
  The 3.2.5 installation tarball will install the version of 
  this file that is appropriate for 3.2.5, and sa-update may update it.
  
  
 
 matt,
 
 i am not seeing that file anywhere in my install and i am quite capable of
 using the locate command etc...
 
 i am fairly certain i hand generated and installed via rpm generated by
 
 rpm -tb sa-tarballname.whateveritwas.somethingsomething
 
 something like that.
 
 on a centos aka redhat clone
 
 the misc_10.cf file looks pretty editable to me in some respects.
 
 i wouldnt have even have asked if i had not gone to
 
 spamassassin.apache.org and then clicked on downloads and on that page it
 says
 
 System Administrators
 Please create a local copy of the report_template text in a file named
 something like /etc/mail/spamassassin/10_local_report.cf, and modify it to
 provide your tech support desk's contact information, instead of the
 default. Otherwise your users will be confused, and some may ultimately
 contact the SpamAssassin development team, which is not appreciated; we
 cannot help them with whitelisting/blacklisting/customisation of settings at
 your site, after all. The default report text can be found in the file
 rules/10_misc.cf. 
 
 so, i searched for 10_misc.cf so that i could consider and generate a
 /etc/mail/spamassassin/10_local_report.cf
 
 eh???
 
  - rh

-- 
Randomly Selected Tagline:
* Cool serving items ... trays, platters, vases, bowls, etc.
CAREFUL: Be careful if you either (a) have radically different tastes or
(b) have Y chromosomes. - Ed Bailey on possible wedding gifts




Re: Calling spamc and looping through files

2009-02-08 Thread Theo Van Dinter
I would use formail -s to go through the mbox file, and pipe the
mail through procmail w/ an appropriate recipe file to filter the mails as
you'd want.

SpamAssassin is happy to markup your mails, but has no filtering capabilities
since it doesn't deliver mail.
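
If formail and procmail are not an option, the same loop can be sketched in Python. This is a rough sketch under stated assumptions, not a tested deployment: it assumes `spamc` is on the PATH and that `spamc -c` exits with status 1 for spam, and the names `split_spam` and `spamc_says_spam` are invented for this example. The classifier is passed in as a parameter so that it can be swapped out.

```python
import mailbox
import subprocess
from pathlib import Path


def spamc_says_spam(raw):
    """Assumes `spamc -c` exits with status 1 when the message is spam."""
    proc = subprocess.run(["spamc", "-c"], input=raw, stdout=subprocess.DEVNULL)
    return proc.returncode == 1


def split_spam(mbox_path, spam_dir, is_spam=spamc_says_spam):
    """Copy each message that is_spam() flags into spam_dir, one file each."""
    spam_dir = Path(spam_dir)
    spam_dir.mkdir(parents=True, exist_ok=True)
    box = mailbox.mbox(str(mbox_path), create=False)
    saved = []
    try:
        for key, msg in box.iteritems():
            raw = msg.as_bytes()
            if is_spam(raw):
                out = spam_dir / f"{Path(mbox_path).name}.{key}.eml"
                out.write_bytes(raw)
                saved.append(out)
    finally:
        box.close()
    return saved
```

Calling `split_spam(path, "spam/")` for every mbox file in a directory gives the requested behaviour; the original mbox files are only read, never modified.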

On Sun, Feb 08, 2009 at 04:37:30PM -0800, cnone wrote:
 How can I call spamc and loop through all mails(like 100 mbox email files)
 under a directory and decide which is spam which is not  and save the spams
 in a different dir?

-- 
Randomly Selected Tagline:
T5 - knows sendmail exists, can fog a mirror - Seann Herdejurgen




Re: html experts: empty style tags.

2009-01-29 Thread Theo Van Dinter
On Thu, Jan 29, 2009 at 08:50:32PM +0100, Per Jessen wrote:
  you have LEGIT EMAIL with this in it?
  style
 
 I do too. AFAICT, it's Microsoft related. 

taking a look at my january corpus, there are relatively many hits
for that, including things like <STYLE></STYLE>.  a lot of the mails,
as mentioned above, seem to have this (QP-encoded):

<meta name=3DGenerator content=3D"Microsoft Word 12 (filtered medium)">

-- 
Randomly Selected Tagline:
At least it had heated rear windows--so your hands would stay warm
 while you pushed. - Unknown about the Yugo




Re: bayes autolearn off but journal updated

2009-01-20 Thread Theo Van Dinter
On Tue, Jan 20, 2009 at 04:49:12PM +0100, Matus UHLAR - fantomas wrote:
 Why does it update the journal? Why does it try to open journal in R/W mode?

$ man sa-learn
[...]
   bayes_journal
   While SpamAssassin is scanning mails, it needs to track which tokens it
   uses in its calculations.  To avoid the contention of having each
   SpamAssassin process attempting to gain write access to the Bayes DB,
   the token timestamps are written to a ’journal’ file which will later
   (either automatically or via sa-learn --sync) be used to synchronize
   the Bayes DB.

In other words, the journal isn't just for learning.

-- 
Randomly Selected Tagline:
Cats are smarter than dogs.  You can't make eight cats pull a sled through
 the snow.




Re: Test order

2009-01-17 Thread Theo Van Dinter
On Sun, Jan 18, 2009 at 01:58:48AM +0100, mouss wrote:
  Then I should use postfix regexp capabilities to rewrite subject and 
  replace 
  [SPAM] with [VIRII] in case X-Spam-Virus: Yes
 
 If you mean header_checks, you can't. header_checks operate on headers
 ONE at a time. you can't tell it to rewrite the subject based on
 X-Spam-* headers.

FWIW, you should be able to do this with a plugin.  I'd probably do
something like generating your own tag to be used during the rewrite
stage.

-- 
Randomly Selected Tagline:
At this point you can step away from the computer for a little while
 and have a quick snack while INN compiles.   - INN INSTALL file




Re: more habeas spam

2009-01-09 Thread Theo Van Dinter
On Thu, Jan 08, 2009 at 04:37:37PM +0100, Karsten Bräckelmann wrote:
  It appears to me that the HABEAS rules are hitting only a very tiny 
  fraction of 
  mail, many of the nightly mass-checks don't have a hit at all (or is it 
  that those 
  checks don't contain any network checks?). The aggregated view shows no 
  hits at all 
  for these rules. 
 
 Network tests are done once a week, not daily.

Just to share some data, my last weekly run shows:

  0.084   0.0000   1.2638    0.000   0.58    0.00  HABEAS_ACCREDITED_SOI
  0.010   0.0000   0.1484    0.000   0.47    0.00  HABEAS_ACCREDITED_COI
  0.000   0.0000   0.0000    0.500   0.44    0.00  HABEAS_CHECKED

and generating stats from the last weekly run results from everyone:

  0.039   0.0001   0.6879    0.000   0.62    0.00  HABEAS_ACCREDITED_SOI
  0.003   0.0000   0.0573    0.000   0.51    0.00  HABEAS_ACCREDITED_COI
  0.000   0.0000   0.0000    0.500   0.49    0.00  HABEAS_CHECKED

There's a handful of spam hits for a couple of people, so it's not clear if
that's misfiling or an abusive sender.  But these results are pretty good IMO.

Other related services/rules to compare to (everyone's results):

  0.076   0.0000   1.7505    0.000   0.68    0.00  RCVD_IN_BSP_TRUSTED
  0.008   0.0003   0.1722    0.002   0.52    0.00  RCVD_IN_BSP_OTHER

  0.143   0.0118   3.0312    0.004   0.62    0.00  RCVD_IN_DNSWL_LOW
  0.203   0.0376   3.8239    0.010   0.55    0.00  RCVD_IN_DNSWL_MED
  0.001   0.0002   0.0143    0.011   0.50    0.00  RCVD_IN_DNSWL_HI

  0.054   0.0001   0.9585    0.000   0.66    0.00  __RCVD_IN_IADB
  0.054   0.0001   0.9495    0.000   0.65    0.00  RCVD_IN_IADB_LISTED
  0.053   0.0001   0.9352    0.000   0.65    0.00  RCVD_IN_IADB_SPF
  0.025   0.0000   0.4425    0.000   0.59    0.00  RCVD_IN_IADB_DOPTIN
  0.012   0.0000   0.2042    0.000   0.55    0.00  RCVD_IN_IADB_SENDERID
  0.005   0.0000   0.0842    0.000   0.51    0.00  RCVD_IN_IADB_VOUCHED
  0.002   0.0000   0.0287    0.000   0.50    0.00  RCVD_IN_IADB_UNVERIFIED_2
  0.001   0.0000   0.0215    0.000   0.50    0.00  RCVD_IN_IADB_OPTIN_GT50
  0.001   0.0000   0.0143    0.000   0.50    0.00  RCVD_IN_IADB_EPIA
  0.001   0.0000   0.0125    0.000   0.50    0.00  RCVD_IN_IADB_LOOSE
  0.001   0.0000   0.0107    0.000   0.49    0.00  RCVD_IN_IADB_EDDB
  0.001   0.0000   0.0107    0.000   0.49    0.00  RCVD_IN_IADB_ML_DOPTIN
  [the other IADB rules show 0 hits]

-- 
Randomly Selected Tagline:
We use a NetApp 820 with Oracle8i (running on win2k)- The machine
 itself is amazing.  Fast, reliable, smarter than us when it breaks,
 and support is great.
 - JoAnne Martone in 006901c21ddf$11bd8220$86a77...@oit.ads.umass.edu




Re: custom post-processing. Howto?

2009-01-08 Thread Theo Van Dinter
On Thu, Jan 08, 2009 at 11:12:47PM +0300, JVlad wrote:
 Thanks, but is there a way to get this perl script executed as part of 
 Spamassassin work and pass there score, ip, and address?
 Does spamassassin support such post-processing plugins?

Yes, though unfortunately writing plugins is rather badly documented. :(
If you grep -r call_plugins through the tarball, you'll find all the plugin
calls.

You'd probably want check_end or finish_tests, possibly log_scan_result
if you use spamd.  You can get some idea of how to do plugins via looking at
the code in lib/Mail/SpamAssassin/Plugin ...

-- 
Randomly Selected Tagline:
Professor Farnsworth: Oh my, that steamed carrot was a bit spicy for me. 




Re: Spam with clean URI's which forward to DNSBListed URL (by HTML redirect header)

2009-01-07 Thread Theo Van Dinter
On Wed, Jan 07, 2009 at 04:46:44PM +0100, Florian Lagg wrote:
 So - if possible - I want SpamAssassin to:
 1. Request the links in the mail body and check them for http-error 302 or
 meta redirects
 2. Check the links we got by doing this against some DNSBL's
  
 Is this possible? Is there a reason why we shouldn't do this?

Possible?  Sure.
Should?  Not unless you want to turn your (and anyone else running that code's)
machine into a DDoS client.

In other words, while it's possible to shoot yourself in the face, it's really
not a good idea to do so.

-- 
Randomly Selected Tagline:
Where are all the great pot head writers?  There aren't any.  Because no
 one wants to read a book about the most delicious twinkie.
 - Dave Attell, Insomniac, New York City, 2001




Re: AND logical operation for scoring options

2009-01-07 Thread Theo Van Dinter
rtm for meta rules

:)

On Wed, Jan 07, 2009 at 09:45:18AM -0800, ml wrote:
 Concerning scoring options defined on “user_prefs”, is there a way to
 apply AND logical operation for two or more SYMBOLIC_TEST_NAMEs describing
 like “score A && B 2.0”?  If it is not available now, let me know how to
 react as a temporary resolution.
 
 In case that (A || B) sometimes appears on non-spams but (A && B) frequently
 appears on spams, we can eliminate such spams by applying AND logical
 operation.

-- 
Randomly Selected Tagline:
I have a simple test to determine if any windows executable that I
 received via E-mail is a virus or not: If I received it, it's a virus.
 - Charlie Watts on the SpamAssassin mailing list




Re: What does it mean?

2009-01-05 Thread Theo Van Dinter
On Mon, Jan 05, 2009 at 08:46:37AM -0800, schnee wrote:
 1: MIME_HTML_ONLY BODY: Message only has text/html MIME parts
 So what ? Do I have to send a text only part also? All my users can read
 HTML.

It'd probably be a good idea to do multipart/alternative w/ an appropriate
text/plain version.

 2. HTML_IMAGE_ONLY_16 BODY: HTML: images with 1200-1600 bytes of words
 There is one 862 bytes image in the message. It is an example of an icon the
 user could 
 click for some new tool in the main site. What's wrong with that? 

It's an image with a small amount of text, so looks like a graphic spam.

 3. HTML_MIME_NO_HTML_TAG  HTML-only message, but there is no HTML tag
 If someone could be more specific about these, I'd appreciate.

It's text/html only w/ no html tag.

-- 
Randomly Selected Tagline:
... one of the main causes of the fall of the Roman Empire was that,
 lacking zero, they had no way to indicate successful termination of their
 C programs.  - Robert Firth




Re: TO: and FROM: line are the same.

2009-01-05 Thread Theo Van Dinter
On Sun, Jan 04, 2009 at 05:28:45PM -0500, Matt Kettler wrote:
 I don't know that anyone said it couldn't be done. It is however rather
 expensive. That long multi-header regex could take a very long time to
 run because it may have to scan the entire header block if one of the
 From/To headers is missing.

fwiw, in 3.1 there was a rule to look for this stuff (FROM_AND_TO_SAME) using
an eval rule (would now be a plugin), which is much more efficient for this
type of thing than a RE rule.

I don't recall the details, but since it's not in 3.2, I would say that the
rule was found not to provide useful results and was removed.  The 3.1
STATISTICS files say:

STATISTICS-set0.txt:  0.009   0.0113   0.0019    0.857   0.30    0.00  FROM_AND_TO_SAME
STATISTICS-set1.txt:  0.008   0.0105   0.0019    0.848   0.30    0.00  FROM_AND_TO_SAME
STATISTICS-set2.txt:  0.008   0.0105   0.0019    0.848   0.30    0.00  FROM_AND_TO_SAME
STATISTICS-set3.txt:  0.010   0.0129   0.0019    0.873   0.30    0.00  FROM_AND_TO_SAME

So that's pretty horrible.  The situation may be different now, but someone
would have to do a test run to see what the results are given newer mails.

-- 
Randomly Selected Tagline:
No prisoner's dilemma here.  Over the long term, symbiosis is more
 useful than parasitism.  More fun, too.  Ask any mitochondria. - Larry Wall




Re: Problem with spamassassin not finding razor-agent.conf

2008-12-11 Thread Theo Van Dinter
On Thu, Dec 11, 2008 at 05:33:36PM +0000, Johan Borch wrote:
 [22640] warn: razor2: razor2 check failed: No such file or directory razor2:
 Can't read conf file: = /etc/razor/razor-agent.conf at
 /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/Plugin/Razor2.pm line 326.

Do you have a razor_config config line somewhere, perhaps that looks like:

razor_config = /etc/razor/razor-agent.conf

?

-- 
Randomly Selected Tagline:
I won't be made useless, or be idle with despair. - Jewel, Hands




Re: 1000 times easier to just do sa-update --nogpg

2008-12-09 Thread Theo Van Dinter
On Tue, Dec 09, 2008 at 10:54:23PM -0700, LuKreme wrote:
 echo 24F434CE  gpg.keys
 echo 6C6191E3  gpg.keys
 echo 856AA88A  gpg.keys
 
 The three lines that are echo HEXCODE  gpg.keys are the issue for  
 me, I guess. Where do those numbers come from?

They're the keyids for the given channels you're using.  The channel
publishers should state the keyid in use for the channel.  You need to specify
them so that when sa-update checks the signature on the update file, it will
know what keyid to consider valid, which protects you from someone else
creating a channel update file and signing it with another random key.

-- 
Randomly Selected Tagline:
I've always tried to teach you two things. Never let them see you bleed,
 always have an escape plan. - Q in The World is Not Enough




Re: Log

2008-12-06 Thread Theo Van Dinter
On Sat, Dec 06, 2008 at 01:50:20PM +0100, Jon Essen-Moller wrote:
 So you look in the /var/log/maillog (maybe with grep) and find messages 
 and their id you are interested in. I get you that far.

:)

 Is there a log somewhere where one can find information like the last 
 log entry you pasted below?

The last log entry was from spamd which logs (by default) to syslog's mail
facility, and so typically ends up in maillog with everything else.  If you're
using a different method for calling SA (third party daemon, etc,) then they
may log differently and you'd have to talk to those folks about what they do.

-- 
Randomly Selected Tagline:
Your next question is 'How does this gate work?'  I don't know.  I
 don't have to know, I'm not an Electrical Engineer, I'm a Computer
 Scientist.  - Prof. Hamel




Re: Single URI spam not checked against URIBLs

2008-12-06 Thread Theo Van Dinter
On Sat, Dec 06, 2008 at 11:16:03PM +0100, Wolfgang Zeikat wrote:
 Could you describe more elaborately how you did that?

You may wish to take a look at cpan2rpm, fwiw.

-- 
Randomly Selected Tagline:
... Either this man is suffering from serious brain damage, or the new 
 vacuum cleaner's arrived... - Rowan Atkinson




Re: Spam slipping through

2008-12-06 Thread Theo Van Dinter
On Sat, Dec 06, 2008 at 08:00:10PM -0800, John Hardin wrote:
 mechanism for. Devs: there've been wishes for this before; how hard
 would it be to add the ability to match on the substring match captured
 by another rule? Add a flag to say capture the match for this rule and
 a syntax for substituting that into the match RE of another rule, and
 dependency enforcement?

Non-trivial.  Write a plugin, where it is trivial.  :)

-- 
Randomly Selected Tagline:
Advice is kind of like sex. It's not always good, it's not always free
 and you don't always get from the person you want to get it from.
  - Peter Liam Taylor




Re: Backup command for AWL?

2008-12-05 Thread Theo Van Dinter
On Fri, Dec 05, 2008 at 11:58:26AM -0500, Rosenbaum, Larry M. wrote:
 The Bayes database can be backed up and restored with sa-learn 
 --backup/--restore.  Is there any similar way to back up and restore a 
 MySQL-based AWL database?  The check_whitelist command is only good for DBM 
 files.

If you're using MySQL, why not just use the standard MySQL backup tools?
ie: mysqldump, etc.

-- 
Randomly Selected Tagline:
Stop stealing my blanket.  You're an arctic wolf for god sakes.  You're
 getting soft.  - Due South




Re: Log

2008-12-05 Thread Theo Van Dinter
On Fri, Dec 05, 2008 at 12:53:20AM +0100, Jon Essen-Moller wrote:

the mail was in HTML, so it's basically unreadable.  text please.

I did get out of it:

I wish to check a specific mail address and see if many mails are
classified as spam that are sent to that address.

It sounds like you want SA statistics instead of information out of Bayes.  SA
doesn't keep track of this kind of information.  You probably want to take a
look at your mail log (or wherever the appropriate location is for however you
run SA) to get that kind of information.

For example:

Nov 30 04:03:08 eclectic postfix/smtpd[617]: EF297AF143: client=p4FCCB2A7.dip.t-dialin.net[79.204.178.167]
Nov 30 04:03:09 eclectic postfix/cleanup[608]: EF297AF143: message-id=[EMAIL PROTECTED]
Nov 30 04:03:09 eclectic postfix/qmgr[12948]: EF297AF143: from=[EMAIL PROTECTED], size=1814, nrcpt=2 (queue active)
Nov 30 04:03:10 eclectic postfix/local[32692]: EF297AF143: to=[EMAIL PROTECTED], orig_to=[EMAIL PROTECTED], relay=local, delay=2, status=sent (delivered to command: /usr/bin/procmail -a $EXTENSION)
Nov 30 04:03:17 eclectic postfix/local[917]: EF297AF143: to=[EMAIL PROTECTED], orig_to=[EMAIL PROTECTED], relay=local, delay=9, status=sent (delivered to command: /usr/bin/procmail -a $EXTENSION)
Nov 30 04:03:17 eclectic postfix/qmgr[12948]: EF297AF143: removed

So if I was interested in mails to [EMAIL PROTECTED], this would come up, and I
see the message-id in there.  Then, since I use spamd, I can figure out what
the results were:

Nov 30 04:03:10 eclectic spamd[336]: spamd: result: Y 17 - BAYES_99,DCC_CHECK,DIGEST_MULTIPLE,DRUGS_MUSCLE,FB_CIALIS_LEO3,FB_GET_MEDS,FR_ALMOST_VIAG2,FUZZY_MEDICATION,FUZZY_PRICES,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RCVD_IN_PBL,RDNS_DYNAMIC,VIA_GAP_GRA scantime=1.5,size=1964,user=felicity,uid=501,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=33858,mid=[EMAIL PROTECTED],bayes=1.00,autolearn=disabled

Your system may vary entirely.
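
To turn log lines like the spamd one above into rough statistics, something along these lines could work. It is a sketch, not a supported tool: the regex assumes the `spamd: result: Y 17 - TESTS key=value,...` layout shown in that entry, and the function names are made up for this example. Note that it counts by the `user=` field spamd logs, which is not necessarily the recipient address; mapping to an address would still need the message-id correlation described above.

```python
import re
from collections import Counter

# Assumed layout: "spamd: result: <Y|.> <score> - <tests> <key=value,...>"
RESULT_RE = re.compile(
    r"spamd: result: (?P<verdict>[Y.]) (?P<score>-?\d+) - "
    r"(?P<tests>\S*)\s+(?P<fields>\S+=.*)$"
)


def parse_spamd_result(line):
    """Parse one spamd result line into a dict, or return None if no match."""
    m = RESULT_RE.search(line.strip())
    if m is None:
        return None
    fields = dict(
        item.split("=", 1) for item in m.group("fields").split(",") if "=" in item
    )
    return {
        "is_spam": m.group("verdict") == "Y",
        "score": int(m.group("score")),
        "tests": [t for t in m.group("tests").split(",") if t],
        "user": fields.get("user"),
    }


def spam_counts_by_user(lines):
    """Count spam verdicts per spamd user across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        result = parse_spamd_result(line)
        if result and result["is_spam"]:
            counts[result["user"]] += 1
    return counts
```

Feeding it the mail log, for example `spam_counts_by_user(open("/var/log/maillog"))`, gives a per-user spam tally; the log path is whatever your syslog configuration uses.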

-- 
Randomly Selected Tagline:
The goal of computer science is to build something that will last at
 least until we're finished building it.  - Unknown




Re: Running message through a single SA test

2008-12-04 Thread Theo Van Dinter
On Wed, Dec 03, 2008 at 06:46:32PM -0700, Kelly Jones wrote:
 I want to run a message through ONE SpamAssassin test w/o the overhead
 of running all the tests.
 
 Does SA have a --run-just-this-test=FOO option?

It sounds like you want to take a look at the mass-check tool. :)

-- 
Randomly Selected Tagline:
A side impact by a bicycle totaled my Dauphine after only one year. 
 - Unknown about the Renault Dauphine




Re: Log

2008-12-04 Thread Theo Van Dinter
On Thu, Dec 04, 2008 at 10:52:18PM +0100, Jon wrote:
 Does anyone know if it is possible to retrieve information from any of 
 these files below about mails that are classified as spam?
 
 Or in general: is there a way to view statistics from SpamAssassin?
 
 bayes_seen 
 bayes_toks 

What kind of information do you want?  sa-learn --dump magic is really all
you can get out of these.  bayes_seen will give you IDs of messages learned,
but you'd have to do some processing of mails to generate the IDs to map
backwards.

 bayes_toks.expire16585 
 bayes_toks.expire1852 
 bayes_toks.expire26661 
 bayes_toks.expire31998 
 bayes_toks.expire4343

delete these.  they're temp files used when doing a bayes expiry.  if they're
still around, it means the expire process is being killed externally before it
completes.
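
A small sketch for finding those leftovers before deleting them. The directory and the one-hour age threshold are assumptions chosen for the example (an expire run that is still in progress will have a recent file, so very new ones are skipped), and `stale_expire_files` is a made-up name.

```python
import time
from pathlib import Path


def stale_expire_files(bayes_dir, min_age_seconds=3600, now=None):
    """List bayes_toks.expire* files not modified for min_age_seconds."""
    now = time.time() if now is None else now
    return sorted(
        p
        for p in Path(bayes_dir).glob("bayes_toks.expire*")
        if p.is_file() and now - p.stat().st_mtime >= min_age_seconds
    )


if __name__ == "__main__":
    # Assumed location of a per-user Bayes DB; adjust for your setup.
    for path in stale_expire_files(Path.home() / ".spamassassin"):
        print("stale:", path)  # call path.unlink() here once you are sure
```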

-- 
Randomly Selected Tagline:
Never try to outstubborn a cat.
-- Lazarus Long, Time Enough for Love




Re: Bad check_for_from_to_same code in EvalTests.pm?

2008-12-03 Thread Theo Van Dinter
On Wed, Dec 03, 2008 at 07:13:26AM -0700, Kelly Jones wrote:
 SA doesn't use EvalTests.pm's check_for_from_to_same test, but part of
 the code looks like this:

Wow.  Had to whip out the 3.1 code to find this...

 Is that right? Shouldn't the 'eq' be 'ne'?

As the comment about 6 lines up from there says:

# From and To have same address, but are not exactly the same and
# neither contains intermediate spaces.

:)

-- 
Randomly Selected Tagline:
Before his State of the Union speech, the president's niece was arrested
 for trying to fill a fake prescription for the anti-anxiety drug Xanax. If
 you're not familiar with Xanax, the best way to describe it is, after
 taking three or four with a wine cooler, you become a really, really
 compassionate conservative.- Bill Maher, Politically Incorrect




Re: Change Score Set

2008-12-01 Thread Theo Van Dinter
On Mon, Dec 01, 2008 at 08:30:32AM -0800, jlefvendahl wrote:
 I am new to administrating a server with SpamAssassin.  Currently, our server
 is using score set (0), and I would like for it to be (3) - Bayes + network. 
 I need some specific instructions on how to change this server-wide -
 assistance much appreciated.

It really all depends on how you have things configured.

Network tests are usually enabled by default unless you don't have Net::DNS
installed or are running in local-only mode (-L).

Bayes is also enabled by default, but won't be used until you've learned at
least 200 ham and 200 spam messages, using the DB that you are using
site-wide.

If you are using a third-party daemon/etc, there are possibly other things
that need to be done, but you'd have to research those at the appropriate
site/list.

Hope this helps.

-- 
Randomly Selected Tagline:
In USA Today, a new survey reports that seventy-nine percent of Americans
 said that rudeness is a serious national problem.  The other twenty-one
 percent told the survey takers to screw off.
 - Conan O'Brian, The Conan O'Brian Show, 2002.07.24




Re: Auto-whitelist not closing file

2008-12-01 Thread Theo Van Dinter
On Mon, Dec 01, 2008 at 03:42:05PM -0500, Dan Barker wrote:
 How do I go about trapping this error in locker? (Specifically, how do I
 figure out who Called locker, to find the code that's not closing the file
 it opened?)
 
 Has anyone else run into this sort of issue?

The last time this sort of issue came up, it was traced back to a bug in
DB_File.  Specifically, the untie call would actually not let go of the DB.
SA doesn't actually open the database files itself, it lets tie/untie
(DB_File) deal with it.

-- 
Randomly Selected Tagline:
It is our job to protect the magic smoke ...  - Prof. Michaelson




Re: Custom rules

2008-12-01 Thread Theo Van Dinter
On Mon, Dec 01, 2008 at 10:37:36PM +0100, Fabrizio Regalli wrote:
   uri LOCAL_URI_VIAPAYPAL /www\.viapaypal\.com\//
  score LOCAL_URI_VIAPAYPAL   5.0
  (to add five points to e-mail containing www.viapaypal.com in the body)
   I've added it to /etc/mail/spamassassin/local.cf but I
  can't see it with spamassassin --lint -D (and the rule doesn't seem to work)

What are you expecting to see in --lint -D output for this rule?  --lint
generates an internal message which is not going to trigger the rule, so you
aren't going to see anything in the -D output.

Perhaps you want to shove a message through just spamassassin -D ?

-- 
Randomly Selected Tagline:
It is far more impressive when others discover your good qualities
 without your help. - Zen Musings




Re: OS Upgrade Broke SpamAssassin; Help Needed to Fix

2008-11-30 Thread Theo Van Dinter
On Sun, Nov 30, 2008 at 04:39:49PM -0800, Rich Shepard wrote:
 [EMAIL PROTECTED] ~]$ /usr/local/bin/spamassassin -V
 spamassassin: spamassassin script is v3.001007, but using modules v3.002005
 
   How should I proceed to fix the installation so there's only one copy
 (either in /etc/mail/spamassassin or /usr/local/bin) and that's the latest
 version? I would like to get this fixed ASAP so I can turn it back on and
 still have the MTA working.

Find all SA-related files and delete them.  Then go back and install a fresh
version.  Make sure to save your own site configs and such, and reinstall them
when you're done.

-- 
Randomly Selected Tagline:
I cannot have an aide who will not look up. You will be forever walking
 into things. - Dukhat on Babylon 5




Re: spamc and extra rules

2008-11-23 Thread Theo Van Dinter
On Sun, Nov 23, 2008 at 09:23:51PM +0000, Geoff Soper wrote:
 Is this the right way of bringing in extra rules? I want these rules to 
 be in addition to the ones in /etc/mail/spamassassin/local.cf and not 
 instead of.

No, spamc has no impact on rules.  You could look at putting the rules in the
user's user_prefs file, but you'd have to then also set allow_user_rules 1
in local.cf to allow user preferences to include rules.  Be sure to think
about the security aspects to this before doing so.  If you're the only user,
I'd recommend just using local.cf, of course. :)

-- 
Randomly Selected Tagline:
You're not significant until someone complains about you publically.
 - Theo Van Dinter




Re: prefork: oops! no idle kids in need_to_del_server?

2008-10-27 Thread Theo Van Dinter
On Mon, Oct 27, 2008 at 10:07:15PM +0100, Per Jessen wrote:
 I was about to open a bugreport on this until I did a search for spamd
 reports:
 
 https://issues.apache.org/SpamAssassin/buglist.cgi?quicksearch=spamd
 
 There are 195 reports, of which 90% or more seem to be new.  Has the
 spamd maintainer gone away and died?

This is where someone, apparently me, gets to say: patches welcome and
encouraged.  :)

The SA devs are generally a pretty busy lot, and typically $DAYJOB doesn't
involve SA, so ...

-- 
Randomly Selected Tagline:
Software engineering is a race between engineers who try to create
 foolproof software and the universe which is trying to create bigger
 fools.  So far, the universe is winning...   - Michael H. Warfield




Re: URIBL_BLACK

2008-10-10 Thread Theo Van Dinter
This has come up on the list before, but...  Looking at my most recent
network run:

OVERALL    SPAM%     HAM%     S/O    RANK   SCORE  NAME
      0   460740    21564    0.955   0.00    0.00  (all messages)
    0.0  95.5290   4.4710    0.955   0.00    0.00  (all messages as %)
 74.714  78.1593   1.1130    0.986   0.78    0.00  URIBL_BLACK

a 1.1% FP rate is very bad IMO.  SURBL is < 0.1%, for comparison.
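
For anyone reading these tables: the S/O column appears to be the spam hit percentage divided by the sum of the spam and ham hit percentages. That formula is an assumption, but it reproduces the URIBL_BLACK row (SPAM% 78.1593, HAM% 1.1130). A quick check in Python, with a made-up function name:

```python
def spam_over_overall(spam_pct, ham_pct):
    """S/O as assumed here: SPAM% / (SPAM% + HAM%); 0.5 when a rule never hits."""
    total = spam_pct + ham_pct
    if total == 0:
        return 0.5
    return spam_pct / total


if __name__ == "__main__":
    print(f"{spam_over_overall(78.1593, 1.1130):.3f}")  # prints 0.986
```

So a rule can show a high S/O and still hit more than 1% of ham, which is why the HAM% column is the one to look at for false positives.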


On Fri, Oct 10, 2008 at 04:55:57PM -0400, [EMAIL PROTECTED] wrote:
 Of the fair amount of false negatives that get through, more than 90% of 
 them appear to hit on URIBL_BLACK.  I have incrementally increased it 
 recently to a score of 5.0 (I hit on 6.0).  The stuff that's still getting 
 through seems to be hitting on only URIBL_BLACK.
 
 I am very tempted to bump the score of it to 6.0 or higher, as it would 
 drastically reduce spam, but I'd like to get any false positive feedback 
 on doing that first.  I haven't seen any so far, but I figure others must 
 be doing this.

-- 
Randomly Selected Tagline:
... the Saab company didn't report a slight problem with the Saab 9000 
 cars.  The Saabs have a problem with the wiring which causes the engine 
 to fail, and the power windows and door locks to stop working.  The car then 
 fills with smoke pouring from the dashboard, and then may explode.
  - From Headline News




Re: URIBL_BLACK

2008-10-10 Thread Theo Van Dinter
On Sat, Oct 11, 2008 at 12:01:48AM +0200, Benny Pedersen wrote:
 meta URIBL_BLACK_ADJ (URIBL_BLACK)
 describe URIBL_BLACK_ADJ Meta: i trust uribl more :)
 score URIBL_BLACK_ADJ 1.5
 
 that way you still benefit from score adjust on sa-rules

The right way to do this is:

score URIBL_BLACK (1.5)

you don't need another rule, you just want to add a value to the score.

-- 
Randomly Selected Tagline:
Presenting MIP- Men in Pain
 Starring... Mr. T... Our new security advisor... pocket protectors with
 an attitude!  I pity the fool who tries to break into MY firewall!
 -Don Roeber




Re: URIBL_BLACK

2008-10-10 Thread Theo Van Dinter
On Sat, Oct 11, 2008 at 12:15:00AM +0200, Yet Another Ninja wrote:
  74.714  78.1593   1.1130    0.986   0.78    0.00  URIBL_BLACK
 
 Would you pls post those FP URIs so ppl can judge what your rating is 
 based upon.

(imperfect) command posted for my future reference ...
$ grep URIBL_BLACK ham-net-theo.log | samailoffset | egrep -A1 ' URIBL_BLACK ' 
| grep URIs | sort | uniq -c | sort -rn

180 *  [URIs: displaymarketplace.com]
 16 *  [URIs: cmcx4.com]
  6 *  [URIs: bme1.net]
  5 *  [URIs: s2d6.com]
  5 *  [URIs: expatica.com]
  3 *  [URIs: closeoutcatalogoutlet.com]
  2 *  [URIs: n-email.com]
  2 *  [URIs: lduhtrp.net]
  2 *  [URIs: internetbrands1.com]
  1 *  [URIs: mybid.com.au]
  1 *  [URIs: jdoqocy.com]
  1 *  [URIs: delivra.com]
  1 *  [URIs: bit.ly]
  1 *  [URIs: barackobama.com]
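
The counting half of that pipeline can also be sketched in Python, for use on report text that has already been extracted. This is an illustration under assumptions: it expects the `[URIs: ...]` line to directly follow the line naming the rule, as in the output above, and `count_rule_uris` is an invented name. It does not replace the mass-check log and samailoffset steps.

```python
import re
from collections import Counter

URIS_RE = re.compile(r"\[URIs: ([^\]]+)\]")


def count_rule_uris(report_lines, rule="URIBL_BLACK"):
    """Count domains on the [URIs: ...] line that follows each hit of rule."""
    lines = list(report_lines)
    counts = Counter()
    for i, line in enumerate(lines[:-1]):
        if f" {rule} " in line:
            match = URIS_RE.search(lines[i + 1])
            if match:
                # One line may list several space-separated domains.
                counts.update(match.group(1).split())
    return counts
```

`count_rule_uris(lines).most_common()` gives the same descending order as `sort | uniq -c | sort -rn`.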

-- 
Randomly Selected Tagline:
Oh ...   I love God.  He's so deliciously evil. - Stewie on Family Guy




Re: check_whitelist

2008-10-08 Thread Theo Van Dinter
On Wed, Oct 08, 2008 at 07:49:41PM +0200, Per olof Ljungmark wrote:
 The check_whitelist tool is apparently gone,
 - can we use this tool from older releases with 3.2.5?

Not sure.  Probably, unless the format changed.

 Is there any work to get tools/ back?

It got removed from the tarball because the stuff in there is totally
unsupported, but you can still get it from SVN:

http://svn.apache.org/repos/asf/spamassassin/trunk/tools/

-- 
Randomly Selected Tagline:
Programming isn't so much a profession as it is an obsessive-compulsive
 disorder.  - Unknown


Re: Turning off all tests

2008-10-06 Thread Theo Van Dinter
On Mon, Oct 06, 2008 at 08:19:49AM -0700, NeoSHNIK wrote:
 I am making a new plugin and in order gather enough data about its
 performance I need to turn off all other tests. I was very surprised that
 there aren't any topics about it.
 So how does one turn off all SA tests?

Set their scores to 0 or remove the cf files, and disable other plugins as
appropriate.
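
If you go the "set their scores to 0" route, the score lines can be generated rather than typed by hand. A rough Python sketch, assuming rules are introduced by the usual definition keywords (the list below is not exhaustive, and plugins add more); the function name and the EXAMPLE_* rule names are invented for the example.

```python
import re

# Common keywords that introduce a named test in a .cf file (not exhaustive).
RULE_DEF = re.compile(
    r"^\s*(?:header|body|rawbody|uri|full|meta)\s+([A-Za-z0-9_]+)\b"
)


def zero_score_lines(cf_text):
    """Return 'score NAME 0' lines for each rule defined in cf_text."""
    names = []
    for line in cf_text.splitlines():
        match = RULE_DEF.match(line)
        if not match:
            continue  # comments, score lines, describe lines, etc.
        name = match.group(1)
        # Sub-rules starting with '__' carry no score of their own.
        if name.startswith("__") or name in names:
            continue
        names.append(name)
    return [f"score {name} 0" for name in names]


example_cf = """\
body EXAMPLE_A /foo/
header __EXAMPLE_SUB Subject =~ /bar/
meta EXAMPLE_B (__EXAMPLE_SUB)
"""
print("\n".join(zero_score_lines(example_cf)))
```

The output would go at the end of local.cf, leaving only your plugin's rule with a non-zero score.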

Perhaps you want mass-check, which lets you test specific rule(s) on a corpus
of mail instead of going through the other SA tools.

 inserted that at the end of the local.cf file. I also disabled all the
 network tests and bayes learning. So why does my test email still have a
 score of 0.9? 
 X-Spam-Status: No, score=0.9 required=1.0 tests=AWL,DRUGS_ERECTILE,
 PLING_PLING autolearn=disabled version=3.1.7-deb

AWL is a dynamic rule that you disable through other means (like disabling the
plugin).

-- 
Randomly Selected Tagline:
The Internet treats censorship like damage and routes around it.
  - Sebastian Kuzminsky


Re: bayes_token table too big?

2008-10-06 Thread Theo Van Dinter
On Mon, Oct 06, 2008 at 03:42:53PM -0400, Rosenbaum, Larry M. wrote:
 And here is the information from the local.cf file:
 
 bayes_expiry_max_db_size  50
 
 So the config file says 500 thousand tokens, but the database has 105 million 
 entries.  Have I misunderstood something, or is expiry not working correctly?

Do an expire run w/ -D bayes and show the expiry details.

It's likely that your tokens are such that there's no good expiry delta to
use, so each run removes as many as it can w/out going over (it's like the
Price is Right...)
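
For the curious, the "as many as it can without going over" selection can be sketched in Python. This is a simplified reading of the estimation pass described in the sa-learn documentation (candidate deltas of 12 hours times powers of 2, up to 256 days), not the actual implementation; the function name and parameters are mine.

```python
HALF_DAY = 12 * 60 * 60  # smallest candidate delta, in seconds


def choose_expiry_delta(token_atimes, goal_reduction, min_reduction=1000):
    """Pick the atime delta that expires the most tokens without
    exceeding goal_reduction. Returns (delta, removed) or None."""
    newest = max(token_atimes)
    best = None
    # Walk from the largest delta (fewest tokens expired) to the smallest.
    for power in range(9, -1, -1):
        delta = HALF_DAY * 2 ** power
        removed = sum(1 for atime in token_atimes if newest - atime > delta)
        if removed > goal_reduction:
            break  # went over; keep the previous candidate
        best = (delta, removed)
    # No usable delta: either every candidate goes over the goal,
    # or the best one would remove too few tokens to bother.
    if best is None or best[1] < min_reduction:
        return None
    return best
```

If the token ages are clumped, the best delta that stays under the goal can remove far fewer tokens than the goal, which is how a database can stay well above the configured size.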

-- 
Randomly Selected Tagline:
... and on that side you have a 50kg kid, and that's a pretty good sized
  kid...  - Prof. Farr


Re: updates.spamassassin.org 2ndaries not updating (was re dsbl)

2008-09-26 Thread Theo Van Dinter
On Fri, Sep 26, 2008 at 03:04:56PM +0100, Justin Mason wrote:
 Kelsey, Theo, can you check and see why your secondaries aren't picking up
 the zone change on updates.spamassassin.org?  cheers,

Grrr.  I really need to fix this stupid bind package:

Sep 26 11:31:05 eclectic named[29926]: dumping master file: slave/tmp-8bVUx5lsNT: open: permission denied

it keeps setting the permissions on the directory such that named can't
write to it so updates fail, at least when I upgrade the package as I
did the other day.  It's fixed now and I forced a refresh:

Sep 26 11:50:07 eclectic named[29926]: zone spamassassin.org/IN: transferred serial 2008092600

:(

-- 
Randomly Selected Tagline:
Screens are sometimes called displays because they display stuff ... 
  - UNIX for Dummies


Re: MATCH_WORDS false positives

2008-09-24 Thread Theo Van Dinter
On Wed, Sep 24, 2008 at 01:52:27PM -0500, Alan Lehman wrote:
 I've seen a few false positives that hit MATCH_WORDS_5. Can someone
 point me to this rule so I can try to determine what is causing the hit?

As far as I can see, there is no such rule in the standard or updates
rulesets.  Perhaps it's something you have defined locally?  Check out
/etc/mail/spamassassin/*.cf or whatever your site rules dir is.
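
A quick way to look for a locally defined rule is to scan the site rules directory for the name. A minimal Python sketch, assuming the rules live in *.cf files directly under one directory; find_rule is a made-up helper, and the path is the one mentioned above, which may differ on your system.

```python
import glob
import os


def find_rule(rule_name, rules_dir):
    """Return (path, line_number, line) for each line mentioning rule_name."""
    hits = []
    for path in sorted(glob.glob(os.path.join(rules_dir, "*.cf"))):
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                # Compare whole tokens so MATCH_WORDS_5 != MATCH_WORDS_50.
                if rule_name in line.split():
                    hits.append((path, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for path, number, line in find_rule("MATCH_WORDS_5", "/etc/mail/spamassassin"):
        print(f"{path}:{number}: {line}")
```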

-- 
Randomly Selected Tagline:
I'm a fraud - a poor, lazy, sexy fraud. -Bender 


Re: Folder Redirection Besides classification

2008-09-11 Thread Theo Van Dinter
On Thu, Sep 11, 2008 at 05:03:06PM +0100, David Carvalho wrote:
 Is it possible to redirect classified spam to another file, just after
 classification,  instead of 

No.

 appending to the user regular mail file (like /var/mail/usermail) ?

SA isn't doing that either.  It's just marking up the message.
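
Since SA only marks up the message, the filing decision belongs to whatever delivers the mail. As a sketch of that decision in Python, assuming the filter has added an X-Spam-Flag: YES header to messages it considers spam; choose_mailbox and the mailbox names are hypothetical.

```python
from email import message_from_string


def choose_mailbox(raw_message, inbox="inbox", spam_box="spam"):
    """Pick a destination based on the X-Spam-Flag header added by the filter."""
    msg = message_from_string(raw_message)
    # A missing header is treated the same as "not spam".
    flag = (msg["X-Spam-Flag"] or "").strip().upper()
    return spam_box if flag == "YES" else inbox


tagged = "X-Spam-Flag: YES\nSubject: example\n\nbody\n"
print(choose_mailbox(tagged))
```

In practice the same test is usually written as a rule in the delivery agent rather than as a separate script.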

-- 
Randomly Selected Tagline:
It started as all journies do, with a beginning...   - Commercial


Re: Setting up razor

2008-09-06 Thread Theo Van Dinter
On Sat, Sep 06, 2008 at 11:32:54AM -0400, Skip wrote:
 [EMAIL PROTECTED] [~]# telnet discovery.razor.cloudmark.com 2703
 Trying 208.83.137.205...
 telnet: connect to address 208.83.137.205: Connection timed out
 Trying 208.83.137.117...
 telnet: connect to address 208.83.137.117: Connection timed out
 
 Should I be able to telnet to discovery.razor.cloudmark.com on port 
 2703?  If my system is blocking that port for some reason, can other 
 ports be used and where is that configured?  I don't know how successful 
 I would be at getting my server to unblock that port.

It would seem you probably have a firewall in the way.  As far as I know,
no, you can't use other ports; the servers only run on 2703.
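
If telnet isn't handy, the same reachability check can be scripted. A minimal Python sketch using only the standard library; can_connect is a made-up helper, and the host and port are the ones quoted above.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


if __name__ == "__main__":
    print(can_connect("discovery.razor.cloudmark.com", 2703))
```

A False result here only says the connection could not be opened from this machine; it does not say which hop is blocking it.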

-- 
Randomly Selected Tagline:
Oh My God! They Killed init! You Bastards!   - Unknown


Re: How to avoid localhost mails tagged as spam

2008-08-25 Thread Theo Van Dinter
Since you're using amavis, you'd have to ask those folks.
SA will scan anything given to it, so ...

On Tue, Aug 26, 2008 at 01:05:39AM +0200, GoodnGo.de (R) Zentrale wrote:
 Easy solution: Don't pass mail from localhost to spamassassin.
 
 Hello Evan,
 
 how can I do that?
 
 (I am newbie)

-- 
Randomly Selected Tagline:
How to knock yourself out: Take a large herring, make contact with back 
 of head.  Repeat if necessary.- Theo


Re: SA scores MISSING_SUBJECT, but message _has_ a Subject

2008-08-20 Thread Theo Van Dinter
If you think there's an issue, feel free to pastebot the message somewhere and
folks can take a look.  Otherwise there's not much people are going to be able
to comment on.

My guess is that however you're feeding mails into SA is having issues.

On Wed, Aug 20, 2008 at 09:18:37AM -0700, Bob Gereford wrote:
 If at all relevant, I just received another legit message and, despite
 having both a Subject & Message that are apparently valid, the spam score
 includes:
 
  1.8 MISSING_SUBJECT MISSING_SUBJECT
  1.4 EMPTY_MESSAGE EMPTY_MESSAGE
 
 Clearly, something's not right here ... :-(

-- 
Randomly Selected Tagline:
Cats are smarter than dogs.  You can't make eight cats pull a sled through
 the snow.


Re: SA scores MISSING_SUBJECT, but message _has_ a Subject

2008-08-20 Thread Theo Van Dinter
On Wed, Aug 20, 2008 at 09:34:34AM -0700, Bob Gereford wrote:
 Here's the paste of the raw message content from the last message
 http://pastebin.com/d57d0894d

Yeah, nothing strange there.  Passing it through spamassassin shows what
you'd expect:

X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,RDNS_NONE,SPF_FAIL,
SPF_HELO_PASS autolearn=disabled version=3.2.5

I was noticing that the X-Spam headers as posted aren't in the standard
format (X-Spam-Status), nor is there an X-Spam-Checker-Version header, which
makes me think you're not calling SA directly to process the mails.  So how
are you sending mails to SA?
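
The header check described above is easy to script for a batch of messages. A small Python sketch, assuming the raw message text is available as a string; spam_header_summary is an invented name, and the header names are the two mentioned in this message.

```python
from email import message_from_string

STANDARD = ("x-spam-status", "x-spam-checker-version")


def spam_header_summary(raw_message):
    """Report which SpamAssassin-style headers a message carries."""
    msg = message_from_string(raw_message)
    return {
        "has_status": msg["X-Spam-Status"] is not None,
        "has_checker_version": msg["X-Spam-Checker-Version"] is not None,
        # Any other X-Spam-* headers, which may come from a wrapper tool.
        "other_x_spam": sorted(
            name for name in msg.keys()
            if name.lower().startswith("x-spam-")
            and name.lower() not in STANDARD
        ),
    }
```

A message with no X-Spam-Checker-Version and a set of unfamiliar X-Spam-* headers suggests something other than a direct spamassassin or spamc call wrote them.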

-- 
Randomly Selected Tagline:
Commitment can be illustrated by a breakfast of ham and eggs.
 The chicken was involved, but the pig was committed - Unknown


Re: sa-update needs --nogpg

2008-08-20 Thread Theo Van Dinter
http://wiki.apache.org/spamassassin/SaUpdateKeyNotCrossCertified

On Thu, Aug 21, 2008 at 01:36:30AM +0800, [EMAIL PROTECTED] wrote:
 Just want to mention that
 $ sa-update -D
 [7581] dbg: gpg: gpg: WARNING: signing subkey 24F434CE is not cross-certified
 [7581] dbg: gpg: gpg: please see http://www.gnupg.org/faq/subkey-cross-certify.html for more information
 The update downloaded successfully, but the GPG signature verification failed.
 
 So need
 $ sa-update -D --nogpg
 [7612] dbg: http: GET request, http://daryl.dostech.ca/sa-update/asf/681717.tar.gz
 for it to work.

-- 
Randomly Selected Tagline:
Oh...  Well at least it didn't explode...  - Prof. Wills


Re: RCVD_ILLEGAL_IP question(s)

2008-08-13 Thread Theo Van Dinter
On Wed, Aug 13, 2008 at 03:33:56PM -0700, SM wrote:
 They are not the only ones using these IP addresses for internal 
 use.  It will be interesting to see what happens when these IP 
 addresses are assigned.

Reminds me of a time when I ran into a company that was internally using
long-standing public address space belonging to a different company.  They
were surprised when they couldn't get to http://www.hp.com/.  Oops.

-- 
Randomly Selected Tagline:
Do not underestimate the value of print statements for debugging.
 Don't have aesthetic convulsions when using them, either.


Re: Pharma spam getting through again

2008-08-12 Thread Theo Van Dinter
On Tue, Aug 12, 2008 at 12:41:17PM -0700, Owen Mehegan wrote:
 Here are two more that got through today. Even several hours later, these 
 haven't shown up in blacklists. Do anyone else's rules catch these?

Your main problem is that both messages hit BAYES_00:

 X-Spam-Status: No, score=2.0 required=5.0 tests=BAYES_00,FREEMAIL_FROM,
   HTML_MESSAGE autolearn=no version=3.2.1
...
 X-Spam-Status: No, score=4.7 required=5.0 tests=BAYES_00,BOTNET_SERVERWORDS,
   FREEMAIL_FROM,GEO_QUERY_STRING,HTML_MESSAGE autolearn=no version=3.2.1

Versus:

$ spamassassin -D check < spam1 > /dev/null
[10048] dbg: check: is spam? score=7.001 required=5
[10048] dbg: check: tests=BAYES_99,DKIM_SIGNED,DKIM_VERIFIED,HTML_MESSAGE,RCVD_IN_BL_SPAMCOP_NET
$ spamassassin -D check < spam2 > /dev/null
[10069] dbg: check: is spam? score=6.197 required=5
[10069] dbg: check: tests=BAYES_99,DKIM_SIGNED,DKIM_VERIFIED,GEO_QUERY_STRING,HTML_MESSAGE

-- 
Randomly Selected Tagline:
Oh gee, there it is, too bad.- Prof. Farr


Re: Mass-check not scanning all messages.

2008-08-10 Thread Theo Van Dinter
On Sun, Aug 10, 2008 at 12:16:38PM -0700, RN-Chris wrote:
 I have a custom spam corpus that I am trying to run rules against to test
 their effectiveness however mass-check will only scan a few ( < 5 ) messages
 of the spam and usually only 1 or 2 of the ham messages.  Any clues? Roughly
 a week of googling and I can't find anyone with this exact problem.

Can you be more specific about what you're doing / how your corpus
is set up / etc?  You've essentially said "things don't work, what's
wrong?" :)

Some random thoughts: do you have mbox files but are not specifying them
as such?  Are the majority of messages > 250k?
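
Both questions can be checked quickly with a script. A minimal Python sketch using the standard mailbox module, assuming the corpus file is in mbox format and taking 250k to mean 250 * 1024 bytes; mbox_size_report is a made-up name.

```python
import mailbox


def mbox_size_report(path, limit=250 * 1024):
    """Return (total_messages, messages_larger_than_limit) for an mbox file."""
    total = 0
    oversized = 0
    # create=False raises an error instead of silently making a new file.
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            total += 1
            if len(message.as_bytes()) > limit:
                oversized += 1
    finally:
        box.close()
    return total, oversized
```

If the total is far higher than the number of messages mass-check reports, the file is probably being read as a single message rather than as an mbox; if most messages are oversized, size is the more likely culprit.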

-- 
Randomly Selected Tagline:
Besides, I think [Slackware] sounds better than 'Microsoft,' don't you?
   - Patrick Volkerding


Re: rules dataset archive with creation_date

2008-08-07 Thread Theo Van Dinter
On Thu, Aug 07, 2008 at 11:36:57AM -0700, Gigi Albertosi wrote:
   I'm wondering if there is a place where I can find
 an archive of spamassassin official rules and their associated date
 of creation/update.
 
 For example, a dataset of the type
 
 RULE_NAME1    LAST_UPDATE
 RULE_NAME2    LAST_UPDATE
 RULE_NAME3    LAST_UPDATE

Nope, there's nothing like that.

 Can you point me to a link or info that I can use to reconstruct such dataset?

Everything is in SVN.  You'd have to go through each commit and figure out the
timelines.

-- 
Randomly Selected Tagline:
In case you were wondering, that was just for Zapp. 
-Leela, after kissing Fry


Re: Sa-update failures? Yerp AND kluge Offline? DOS?

2008-08-04 Thread Theo Van Dinter
I don't know of any connectivity issues w/ the kluge.net server.
There were some ISP issues last month that took it offline for a day or
so, but nothing in the last couple of days.


On Mon, Aug 04, 2008 at 11:34:22AM +0100, Rob Sharp wrote:
 There was a message recently posted saying that Yerp was being taken 
 offline for a server move.
 
 Rob
 
 Michael Scheidell wrote:
 Didn't think too much of seeing this in every SA box log last night, just
 thought maybe yerp.org offline.
 Running 350.sa-update
 http: request failed: 500 Can't connect to yerp.org:80 (connect: Invalid
 argument): 500 Can't connect to yerp.org:80 (connect: Invalid argument)
 channel: could not find working mirror, channel failed
 http: request failed: 500 Can't connect to yerp.org:80 (connect: Invalid
 argument): 500 Can't connect to yerp.org:80 (connect: Invalid argument)
 Tested it, yep, off line:
 telnet yerp.org 80
 Trying 72.232.31.42...
 telnet: connect to address 72.232.31.42: Connection refused
 telnet: Unable to connect to remote host
 
 But, then saw this in a couple of them and thought this was too weird.
 Concentrated DOS attack against the saupdate channel servers?
 
 http: request failed: 500 Can't connect to spamassassin.kluge.net:80
 (connect: timeout): 500 Can't connect to spamassassin.kluge.net:80 (connect: timeout)
 
 While looking up information on taint.org, got it offline also.
 (well, its the same box ;)
 telnet taint.org 80
 Trying 72.232.31.42...
 telnet: connect to address 72.232.31.42: Connection refused
 telnet: Unable to connect to remote host
 
 Looks fine now, and sa-update -D doesn't show any missing updates 
 available.
 

-- 
Randomly Selected Tagline:
How do I type "for i in *.dvi do xdvi $i done" in a GUI?
 (Discussion in comp.os.linux.misc on the intuitiveness of interfaces.)

