Resigning from SpamAssassin project

2014-01-03 Thread Duncan Findlay
Hi everybody,

I greatly enjoyed my time working on SpamAssassin, but sadly I haven’t had time 
to work on SpamAssassin in a long time.

As a result, I’d like to officially resign from the project, both as a 
committer and as a member of the SpamAssassin PMC.

Thanks
Duncan Findlay



Resigning from SpamAssassin project

2014-01-01 Thread Duncan Findlay
Hi everybody,

I greatly enjoyed my time working on SpamAssassin, but sadly I haven’t had time 
to work on SpamAssassin in a long time.

As a result, I’d like to officially resign from the project, both as a 
committer and as a member of the SpamAssassin PMC.

Thanks
Duncan Findlay



Re: syncing SpamAssassin with Debian downstream

2010-01-26 Thread Duncan Findlay

On Jan 26, 2010, at 3:00 PM, Adam Katz wrote:

> If only I had noticed this before the 3.3.0 release...
> 
> There are some patches to the Debian package for 3.2.5 which are
> applicable to the trunk.  I'm comfortable incorporating some of them
> myself, but wanted to double-check on this one:
> 
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=491159
> 
> The update:
> http://svn.apache.org/viewvc?view=revision&revision=903473

All the patches to the source used in the Debian packaging can be found here: 

http://svn.debian.org/viewsvn/collab-maint/deb-maint/spamassassin/trunk/debian/patches/

Noah, is there something we can do to make it easier to submit these changes 
upstream?

> I've also added Duncan Findlay's Debian-specific rules (which don't
> actually look so Debian-specific to me) to my sandbox to see how
> useful they might be to a larger audience.  They're marked "nopublish"
> until we either get a thumbs-up from Duncan or enough time passes that
> we can assume full license compatibility.

Yikes, that file is ancient; it probably dates back to when we used nice rules 
with impunity. I'm a little surprised that nobody ever exploited those rules 
(or nobody noticed somebody exploiting the rules enough to file a bug report).

As far as license issues go: I permit all or any part of 65_debian.cf to be
included and licensed under the Apache License, in accordance with the CLA I
have on file.

> To ensure there aren't licensing issues, I've copied Duncan and Noah
> Meyerhans (the current Debian packager for SpamAssassin) on this email
> as a way of keeping everybody on the same page.


Thanks

Duncan

Re: Hudson build became unstable: SpamAssassin-trunk #2776

2009-03-03 Thread Duncan Findlay


On Mar 3, 2009, at 1:56 AM, Justin Mason wrote:

> On Tue, Mar 3, 2009 at 00:05, Duncan Findlay wrote:
>> According to
>> http://hudson.zones.apache.org/hudson/job/SpamAssassin-trunk/2776 my
>> commit broke the build because we run make test on the distributed
>> files, and mass-check isn't distributed. What's the best way to work
>> around this? Presumably we should skip reuse.t if masses/mass-check
>> doesn't exist, but that could indicate an error we want to fail on.
>> Should it be made a config option?
>
> No, I think if (-e "masses/mass-check") fails, or maybe just if
> (-d "masses") fails, skip the test.  That will be very obvious, and
> it's unlikely we could accidentally svn delete the entire masses
> directory without someone noticing. ;)

Makes sense to me. I've fixed this in r749680. Sorry for all the build
failure spam!


Duncan
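For illustration, the guard Justin suggests can be sketched in Python. This is a hypothetical analogue only: the real t/reuse.t is a Perl test, and the function and class names below are invented. Checking for the masses/ directory rather than for masses/mass-check itself keeps Duncan's concern covered: in an svn checkout the directory exists, so a missing mass-check script still makes the test run and fail.

```python
import os
import unittest

# Hypothetical: assume tests are run from the top of the source tree.
REPO_ROOT = os.getcwd()


def masses_available(root: str) -> bool:
    """True if masses/ exists; the release tarballs do not ship it."""
    return os.path.isdir(os.path.join(root, "masses"))


@unittest.skipUnless(masses_available(REPO_ROOT),
                     "masses/ is not distributed; skipping reuse test")
class ReuseTest(unittest.TestCase):
    def test_mass_check_present(self):
        # Only reached in a checkout where masses/ exists, so a missing
        # mass-check script here is a real error and fails the test.
        path = os.path.join(REPO_ROOT, "masses", "mass-check")
        self.assertTrue(os.path.exists(path))
```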



Re: Hudson build became unstable: SpamAssassin-trunk #2776

2009-03-02 Thread Duncan Findlay


On Mar 2, 2009, at 3:48 PM, Apache Hudson Server wrote:

> See http://hudson.zones.apache.org/hudson/job/SpamAssassin-trunk/2776/changes


According to http://hudson.zones.apache.org/hudson/job/SpamAssassin-trunk/2776
my commit broke the build because we run make test on the distributed
files, and mass-check isn't distributed. What's the best way to work
around this? Presumably we should skip reuse.t if masses/mass-check
doesn't exist, but that could indicate an error we want to fail on.
Should it be made a config option?


Also, I'm confused by this output:
http://hudson.zones.apache.org/hudson/job/SpamAssassin-trunk/2776/console

It suggests that a couple of other tests failed as well, but these  
aren't listed in the summary?


Failed Test              Stat Wstat Total Fail  Failed  List of Failed
-----------------------------------------------------------------------
t/config_tree_recurse.t    17  4352     4    8 200.00%  1-4
t/whitelist_addrs.t       255 65280    35   44 125.71%  14-35
36 tests skipped.
Failed 2/152 test scripts, 98.68% okay. 26/3128 subtests failed, 99.17% okay.

*** Error code 11

Any thoughts?

Thanks

Duncan




Re: svn commit: r601070 - in /spamassassin/trunk/spamc: libspamc.c libspamc.h

2007-12-06 Thread Duncan Findlay


On Dec 5, 2007, at 1:22 AM, Justin Mason wrote:

> All of these constants are exposed for public use; by changing their
> values, ABI compatibility is broken.
>
> I suggest changing them back to what they were before, and simply
> using (1<<14) for SPAMC_LOG_TO_CALLBACK.  That way ABI compatibility
> is maintained, and callers don't need to recompile their code to
> use a new libspamc.


Good point. I was only thinking in terms of source compatibility, but
you're right: that's a gratuitous and unnecessary change to the ABI.


Duncan
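A small Python sketch of the compatibility argument. The three existing flag names and values are hypothetical placeholders (the real constants live in libspamc.h); only SPAMC_LOG_TO_CALLBACK = (1<<14) comes from the thread. The check encodes the rule Justin describes: a caller compiled against the old header keeps working only if every existing constant keeps its value and the new flag takes a bit nobody else uses.

```python
# Hypothetical stand-ins for the existing libspamc flag constants.
OLD_FLAGS = {
    "FLAG_A": 1 << 0,
    "FLAG_B": 1 << 1,
    "FLAG_C": 1 << 2,
}

# ABI-preserving change: keep every old value, add the new flag on an unused bit.
NEW_FLAGS = dict(OLD_FLAGS, SPAMC_LOG_TO_CALLBACK=1 << 14)


def abi_compatible(old: dict, new: dict) -> bool:
    # Every constant an old binary was compiled with must keep its value.
    if any(new.get(name) != value for name, value in old.items()):
        return False
    # Newly added flags must not reuse a bit an old flag already occupies.
    used = 0
    for value in old.values():
        used |= value
    added = [value for name, value in new.items() if name not in old]
    return all((value & used) == 0 for value in added)
```

Renumbering the existing flags fails the first check even if source code using the names still compiles, which is exactly the source-versus-binary distinction in the reply above.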


Re: add a new rule type: single-line body?

2007-08-07 Thread Duncan Findlay


On Jul 27, 2007, at 7:08 AM, Justin Mason wrote:

> Just wondering.  would it be handy to have a new "body" type, the same
> as "body" but matched as a single string, with all newlines converted
> to " "?
>
> in other words, this text:
>
> [...]
>
> ie, no newlines, all whitespace converted to " ".  this would be
> optimal for matching with phrase rules.  (To avoid exponential-runtime
> .* problems, it'd chop the text after the first 8000 characters or so.)


How is this different than rawbody /s rules?
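To make the comparison concrete, here is a Python sketch of the proposed transformation, assuming runs of whitespace collapse to a single space and using the 8000-character cut-off from Justin's mail; the function name is hypothetical. The usage lines show one difference from a /s rule: /s (re.DOTALL in Python) only changes what "." matches, so a phrase pattern containing a literal space still fails when the phrase is split across lines.

```python
import re

MAX_CHARS = 8000  # the "first 8000 characters or so" cut-off


def single_line_body(body: str, max_chars: int = MAX_CHARS) -> str:
    """Collapse every run of whitespace (newlines included) to one space,
    then truncate to bound the cost of .* patterns."""
    flattened = re.sub(r"\s+", " ", body).strip()
    return flattened[:max_chars]


if __name__ == "__main__":
    phrase = re.compile(r"click here", re.IGNORECASE | re.DOTALL)
    raw = "Please click\nhere now"
    print(bool(phrase.search(raw)))                    # False
    print(bool(phrase.search(single_line_body(raw))))  # True
```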




Re: CEAS 2007 Live Spam Challenge (today)

2007-08-04 Thread Duncan Findlay


On Aug 2, 2007, at 3:57 AM, Justin Mason wrote:



> Mark Martinec writes:
>> Does anyone have a SpamAssassin-based content filter
>> registered for today's CEAS 2007 Live Spam Challenge?
>>
>>   http://www.ceas.cc/challenge/
>
> not that I know of...


I suspect there were some SpamAssassin-derived entries, but we can't
know for sure. The Challenge itself was a bit of a failure: they had
numerous technical problems, and in the end they measured performance
based on only a few hundred spam and non-spam messages.


Duncan


Re: Is re2c 0.10.x really needed?

2007-05-28 Thread Duncan Findlay
On Mon, May 28, 2007 at 12:13:07PM +0100, Justin Mason wrote:
> The re2c author has recommended 0.12.0:

> http://sourceforge.net/tracker/index.php?func=detail&aid=1708378&group_id=96864&atid=616200

>   Actually i highly suggest you use cvs version of either 0.12.0
>   (unreleased) or even HEAD. It appears the generated code is wrong using
>   the older re2c versions. I discovered this while fixing the crash. So in
>   0.12.0 and HEAD both issues are resolved.

Alright, I'll make spamassassin recommend re2c (>= 0.12.0) as soon as
such a version of re2c is uploaded to Debian.

-- 
Duncan Findlay




Re: Is re2c 0.10.x really needed?

2007-05-27 Thread Duncan Findlay
On Sun, May 27, 2007 at 07:59:11PM -0400, Matt Sergeant wrote:
> Bug fixes. :-)

> While I was hacking on the original re2xs that Justin based sa-compile off,
> I found some bugs which they fixed in the 0.10.x series.

> But they were just bugs in the code generation, so if it's working fine for 
> you then that should be OK.

So, if sa-compile doesn't fail, the version of re2c is adequate?

-- 
Duncan Findlay




Is re2c 0.10.x really needed?

2007-05-27 Thread Duncan Findlay
Hey folks,

The version of re2c currently in Debian is 0.9.x, and according to the
sa-compile man page, 0.10.x is needed for the Rule2XSBody plugin. As
far as I can tell, sa-compile is working fine with 0.9.x. Anybody know
the reason behind recommending 0.10.x?

-- 
Duncan Findlay




Re: sa-update-keys

2007-05-25 Thread Duncan Findlay
On Thu, May 24, 2007 at 02:44:58PM -0400, Daryl C. W. O'Shea wrote:
> Duncan Findlay wrote:
> >I'm working on the 3.2.0 Debian package and I'm running into some
> >problems with the way I handle the sa-update-keys directory.
> >It's in /etc/spamassassin/sa-update-keys, which implies it contains
> >configuration files. As far as I can tell, it's not something that we
> >expect anyone to touch manually (we specifically provide sa-update
> >--import to import keys instead of having them use gpg directly on
> >these files), so it would seem to me that it should be in
> >/var/lib/spamassassin/sa-update-keys or something instead.

> I wouldn't put it there... you should be able to rm /var/lib/spamassassin and 
> have everything continue to work fine.

Well... it would, assuming you have no non-standard channels.

> >Is there a reason I'm missing for putting it in /etc?

> The keys aren't really variable (we're not going to release new ones in a 
> rule update), they are a part of the software 
> configuration and configurable (manually or with sa-update) though.

I'm not sure I buy that -- they really aren't configuration. I mean
you might configure which keys to allow, but the actual contents of
the public keys aren't really configuration information.

-- 
Duncan Findlay




sa-update-keys

2007-05-21 Thread Duncan Findlay
I'm working on the 3.2.0 Debian package and I'm running into some
problems with the way I handle the sa-update-keys directory.

It's in /etc/spamassassin/sa-update-keys, which implies it contains
configuration files. As far as I can tell, it's not something that we
expect anyone to touch manually (we specifically provide sa-update
--import to import keys instead of having them use gpg directly on
these files), so it would seem to me that it should be in
/var/lib/spamassassin/sa-update-keys or something instead.

Is there a reason I'm missing for putting it in /etc?

-- 
Duncan Findlay




Re: [VOTE][DRAFT] SpamAssassin 3.2.0

2007-05-01 Thread Duncan Findlay
On Tue, May 01, 2007 at 03:13:13PM +0100, Justin Mason wrote:

> ok, here's the proposed release announcement and tarballs.

> PMC members, please vote on these tarballs -- for a full release,
> we need 3 +1's from PMC members ;)

+1 on the tarballs: they passed basic sanity checks, and make test
passed successfully.

-- 
Duncan Findlay




Re: Score Generation for Apache SpamAssassin

2007-04-26 Thread Duncan Findlay
On Thu, Apr 26, 2007 at 12:15:52PM +0100, Justin Mason wrote:
> thanks Duncan -- a great read, and looks promising!

> Would it help btw if we came up with a spec for what a score-generation
> tool needs to generate, in terms of score ranges and so on?
> This would also be useful for the future (I'm sure there'll be
> more... ;)

Probably not to me, but it might be useful to others. (I think I
already know what needs to be done.) Also, it might limit creativity
in possible solutions: we need a score-ranges mechanism, but not
necessarily the specific one we have now.


-- 
Duncan Findlay




Score Generation for Apache SpamAssassin

2007-04-23 Thread Duncan Findlay
Hi everybody,

As you may already know, Steven Birk and I have been working on our
4th year undergraduate project in Math and Engineering at Queen's
University.

The goal of our project was to examine the use of logistic regression
as a potential replacement for the Perceptron/GA currently used by the
SpamAssassin project.

It's now done, and it's available here:
http://people.apache.org/~duncf/FindlayBirkThesis.pdf

Basically, we've found a technique that shows promise as a possible
replacement, but it requires some modifications in order to handle
some of the restrictions the SpamAssassin project puts on scores.

I hope to make those modifications in the next month or so, but I
have no idea how well it will turn out, or how easy it will be.

The paper may be an interesting read for people not too familiar with
the way the scoring process works now, as it discusses many of the
issues that differentiate the scoring process from most other machine
learning problems. (Then again, it might just be boring.)

Enjoy!

-- 
Duncan Findlay




Re: Better score generation tool

2007-03-13 Thread Duncan Findlay
On Tue, Mar 13, 2007 at 10:32:20AM +, Justin Mason wrote:
> > There seem to be a lot of... issues... relating to promoting rules,
> > for example, there are rules that were mass-checked under one name and
> > then promoted (I guess I need to check out the exact revision of
> > rulesrc before running any scoring scripts?). Or maybe I just don't
> > understand how it all works.

> I suspect the latter ;)
> Read the documentation on the wiki: I've kept it up to date for
> the 3.2.0 mass-checks, so it's canonical.

>   http://wiki.apache.org/spamassassin/RescoreMassCheck

> Basically, you have to keep a single "rules/active.list" file for the
> entire process, and ensure you don't overwrite it with an "svn update"
> halfway through.  (see '4.3 resync to mcsnapshot rules list')

Yeah, sorry I didn't end up reading that the first time round. Guess I
wanted to go off memory... Thanks for the link.

Anyway, I'm still getting some weirdness. Running

make tmp/ranges.data

spewed a whole lot of errors about tests like T_FRT_CONTACT no longer
existing: it exists in the mass-check logs, but I don't see it in the
rules directory. Is this just a matter of mass-checkers checking
against 70_sandbox.cf even though they shouldn't?

-- 
Duncan Findlay




Re: Better score generation tool

2007-03-12 Thread Duncan Findlay
Whoops...

Turns out I used all the sandbox rules when generating my scores
instead of just the active ones. Naturally, the TCR I reported was
much higher than it should have been. This suggests two things:

1. We should probably loosen our promotion criteria.

2. The results I quoted in my previous e-mail are wrong. Sorry if I
got your hopes up.

There seem to be a lot of... issues... relating to promoting rules,
for example, there are rules that were mass-checked under one name and
then promoted (I guess I need to check out the exact revision of
rulesrc before running any scoring scripts?). Or maybe I just don't
understand how it all works.

-- 
Duncan Findlay




Re: Better score generation tool

2007-03-12 Thread Duncan Findlay
On Mon, Mar 12, 2007 at 01:48:10PM +, Justin Mason wrote:
> that *is* good news ;)   can you give a rough idea of what algorithm
> it uses?

It's basically a logistic regression algorithm, but optimized for
binary data. It's called Truncated Regularized Iteratively Reweighted
Least Squares (TR-IRLS).

I'll see if I can get some spare time to at least provide valid scores
that I've optimized (once I work out the min/max bits), even if I
can't commit my scripts yet.
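As a rough illustration of the idea (not the project's uncommitted scripts), here is a pure-Python sketch of ridge-regularized logistic regression fitted by iteratively reweighted least squares, where each Newton system is solved only approximately by a few conjugate-gradient steps; as the name suggests, that truncation plus the regularization term is what distinguishes TR-IRLS from plain IRLS. Parameter names and defaults (lam, newton_iters, cg_iters, tol) are assumptions. Each row of X is a binary rule-hit vector with a leading 1 for the bias, and y is 1 for spam, 0 for ham; for simplicity the bias is penalized too.

```python
import math


def sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


def tr_irls(X, y, lam=1.0, newton_iters=20, cg_iters=5, tol=1e-6):
    """Ridge-regularized logistic regression via IRLS/Newton, where each
    system (X^T W X + lam*I) d = X^T (y - p) - lam*beta is solved only
    approximately with at most cg_iters conjugate-gradient steps."""
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(newton_iters):
        p = [sigmoid(dot(row, beta)) for row in X]
        w = [pi * (1.0 - pi) for pi in p]
        # Gradient of the penalized log-likelihood.
        g = [sum(X[i][j] * (y[i] - p[i]) for i in range(n)) - lam * beta[j]
             for j in range(k)]

        def hess_vec(v):
            # (X^T W X + lam*I) v, without ever forming the matrix.
            xv = [dot(row, v) for row in X]
            return [sum(X[i][j] * w[i] * xv[i] for i in range(n)) + lam * v[j]
                    for j in range(k)]

        # Truncated conjugate gradient for H d = g, starting from d = 0.
        d = [0.0] * k
        r = list(g)
        s = list(r)
        rr = dot(r, r)
        for _ in range(cg_iters):
            if rr <= 1e-20:
                break
            hs = hess_vec(s)
            alpha = rr / dot(s, hs)
            d = [di + alpha * si for di, si in zip(d, s)]
            r = [ri - alpha * hi for ri, hi in zip(r, hs)]
            rr_new = dot(r, r)
            s = [ri + (rr_new / rr) * si for ri, si in zip(r, s)]
            rr = rr_new
        beta = [bj + dj for bj, dj in zip(beta, d)]
        if max(abs(dj) for dj in d) < tol:
            break
    return beta
```

The Hessian is only ever used through matrix-vector products, which is what makes the approach attractive for sparse, binary rule-hit data with many rules.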

-- 
Duncan Findlay




Better score generation tool

2007-03-11 Thread Duncan Findlay
Good news, everyone!

As part of our 4th year Math & Engineering Design Project, Steven Birk
and I have been working to develop a better scoring algorithm for
SpamAssassin.

We've come across an algorithm that shows some great promise:

Using the 3.2.0 logs:

scoreset 0:

# SUMMARY for threshold 5.0:
# Correctly non-spam:  67528  99.97%
# Correctly spam:     100519  84.41%
# False positives:        22  0.03%
# False negatives:     18564  15.59%
# TCR(l=50): 6.055889  SpamRecall: 84.411%  SpamPrec: 99.978%

# SUMMARY for threshold 3.5:
# Correctly non-spam:  67446  99.85%
# Correctly spam:     108479  91.10%
# False positives:       104  0.15%
# False negatives:     10604  8.90%
# TCR(l=50): 7.534991  SpamRecall: 91.095%  SpamPrec: 99.904%

scoreset 1:

# SUMMARY for threshold 5.0:
# Correctly non-spam:  67498  99.92%
# Correctly spam:     112670  94.61%
# False positives:        52  0.08%
# False negatives:      6413  5.39%
# TCR(l=50): 13.212360  SpamRecall: 94.615%  SpamPrec: 99.954%

scoreset 2:

# SUMMARY for threshold 5.0:
# Correctly non-spam:  67517  99.95%
# Correctly spam:     115916  97.34%
# False positives:        33  0.05%
# False negatives:      3167  2.66%
# TCR(l=50): 24.721403  SpamRecall: 97.341%  SpamPrec: 99.972%

scoreset 3:

# SUMMARY for threshold 5.0:
# Correctly non-spam:  67518  99.95%
# Correctly spam:     117809  98.93%
# False positives:        32  0.05%
# False negatives:      1274  1.07%
# TCR(l=50): 41.434586  SpamRecall: 98.930%  SpamPrec: 99.973%

# SUMMARY for threshold 5.2:
# Correctly non-spam:  67521  99.96%
# Correctly spam:     117727  98.86%
# False positives:        29  0.04%
# False negatives:      1356  1.14%
# TCR(l=50): 42.438703  SpamRecall: 98.861%  SpamPrec: 99.975%

These are using the same training and validation sets as bug 5270. The
run time is roughly of the same order of magnitude as the
perceptron. (The slow bit is the analog of the logs-to-c script.)
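The TCR(l=50), SpamRecall and SpamPrec figures in each SUMMARY block can be recomputed from the raw counts. A minimal Python sketch, assuming the usual total cost ratio definition (total spam divided by l times the false positives plus the false negatives, so each FP counts l times as much as an FN); the function name is hypothetical.

```python
def summary(spam_ok: int, fp: int, fn: int, lam: int = 50):
    """Return (TCR, SpamRecall %, SpamPrec %) from SUMMARY-block counts.

    TCR(l) = total spam / (l * FP + FN), i.e. each false positive is
    treated as l times as costly as a false negative."""
    n_spam = spam_ok + fn
    tcr = n_spam / (lam * fp + fn)
    recall = 100.0 * spam_ok / n_spam
    precision = 100.0 * spam_ok / (spam_ok + fp)
    return tcr, recall, precision


if __name__ == "__main__":
    # scoreset 0, threshold 5.0
    tcr, recall, precision = summary(spam_ok=100519, fp=22, fn=18564)
    print(f"TCR(l=50): {tcr:.6f}  SpamRecall: {recall:.3f}%  SpamPrec: {precision:.3f}%")
    # prints: TCR(l=50): 6.055889  SpamRecall: 84.411%  SpamPrec: 99.978%
```

Because l=50, each extra false positive costs as much as 50 missed spams, which is why the 3.5 threshold in scoreset 0 still comes out ahead on TCR despite its 104 FPs.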

Clearly from the set 0 results, we need to tune the algorithm some
more to get the threshold of 5.0 to be optimal.

At this point, the algorithm breaks a number of our current score
generation "rules", so there is room for improvement. (We're working
on it).

 - Our handling of immutable rules is pretty much broken at this
point. (We assume all rules are mutable, evaluate the optimal
threshold value and scale our scores appropriately, and then only
update the mutable scores for evaluating against the validation
set. For our purposes, we also assumed BAYES_* is mutable.) I'm not
sure how hard this will be to fix, or if it's worth it.

 - We have no concept of max/min scores or score ranges. Many tests
get small negative scores and should simply be set to 0. We haven't
yet figured out what effect this has on the TCR. Also, some scores get
set really high; e.g. BAYES_99 is scored 6.1 in scoreset 3. I'm not
sure people are comfortable with that. There are at least two ways we
can fix this: adapting the algorithm to take min/max scores into
account (hard), or simply capping the scores after they are generated
(easy). A quick look through the scores and score-ranges-from-freqs
output suggests that capping will not hurt our performance all that
much.

Our project is due in a few weeks, and with any luck we'll have a
complete new score generation system for SpamAssassin.
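The "easy" fix above, capping scores after they are generated, amounts to clamping each score into its allowed range. A Python sketch, assuming ranges are supplied as (low, high) pairs per rule; the 4.0 ceiling for BAYES_99 and the rule name HYPOTHETICAL_RULE are made-up examples, not proposed values. A lower bound of 0 also takes care of the small negative scores.

```python
def cap_scores(scores, ranges):
    """Clamp each rule's generated score into its (low, high) range.
    Rules without a range entry are left untouched."""
    capped = {}
    for rule, score in scores.items():
        if rule in ranges:
            low, high = ranges[rule]
            score = min(max(score, low), high)
        capped[rule] = score
    return capped


if __name__ == "__main__":
    generated = {"BAYES_99": 6.1, "HYPOTHETICAL_RULE": -0.3}
    allowed = {"BAYES_99": (0.0, 4.0), "HYPOTHETICAL_RULE": (0.0, 3.0)}
    print(cap_scores(generated, allowed))
    # prints: {'BAYES_99': 4.0, 'HYPOTHETICAL_RULE': 0.0}
```

Because the clamp happens after fitting, the other scores get no chance to compensate for the capped ones; that is the trade-off against the harder option of building the ranges into the optimization itself.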

-- 
Duncan Findlay




Re: VOTE: change voting procedure for prerelease tarballs

2007-03-08 Thread Duncan Findlay
On Tue, Mar 06, 2007 at 01:26:05PM +, Justin Mason wrote:

> As noted in recent dev list traffic (see below), we have a problem: we
> haven't been able to publish a 3.2.0 prerelease tarball yet in the past
> few weeks, due to lack of votes across two attempts.

> Currently, our release policy [1] requires 3 committer +1s to mark a
> tarball as a pre-release.  I propose that we change this to "lazy
> consensus", instead, since ASF policy requires votes only for "package
> releases" [2], which I'd interpret as a *full*, general-availability
> release.

>   [1]: http://wiki.apache.org/spamassassin/ReleasePolicy
>   [2]: http://www.apache.org/foundation/voting.html

> My vote: +1

> I guess this is a PMC vote, so PMC members: please vote...

+1 after board clarification that this is OK.

-- 
Duncan Findlay




Re: VOTE: change voting procedure for prerelease tarballs

2007-03-07 Thread Duncan Findlay
On Tue, Mar 06, 2007 at 01:26:05PM +, Justin Mason wrote:

> As noted in recent dev list traffic (see below), we have a problem: we
> haven't been able to publish a 3.2.0 prerelease tarball yet in the past
> few weeks, due to lack of votes across two attempts.

> Currently, our release policy [1] requires 3 committer +1s to mark a
> tarball as a pre-release.  I propose that we change this to "lazy
> consensus", instead, since ASF policy requires votes only for "package
> releases" [2], which I'd interpret as a *full*, general-availability
> release.

Can we get a confirmation of this (I don't know who to ask)? IIRC,
this has come up before and prerelease tarballs required a vote.

-- 
Duncan Findlay




Re: VOTE: SpamAssassin 3.2.0 prerelease 2 tarballs

2007-03-03 Thread Duncan Findlay
On Fri, Mar 02, 2007 at 01:22:12PM +, Justin Mason wrote:

> should we just not bother with votes for prereleases?

> To be honest, I can't see the harm in accidentally pushing a prerelease
> tarball at the wrong time -- and this is the second 3.2.0-preX that isn't
> garnering votes, so clearly the process is getting in the way here. :(

> (Votes for "official" full releases, of course, would still be necessary)

I think by ASF policy, we need a vote. That said, right now a +1 vote means:

 a) I think we should have a pre-release now.
 b) The tarballs presented are well constructed, work well, etc., and
I've tested them.

I think if we agree that a +1 vote for pre-release only implies a)
then we won't have the issue of not getting votes. I haven't had time
to test the tarballs, so I haven't voted, but I'm +1 on the idea of a
pre-release. (i.e. +1 to part a) above).

-- 
Duncan Findlay


Re: VOTE: SpamAssassin 3.2.0 prerelease 2 tarballs

2007-02-28 Thread Duncan Findlay
On Wed, Feb 28, 2007 at 07:46:55PM +0100, Matthias Leisi wrote:
> I installed it and all is fine except that spamd does not like the
> --daemonize [1] option. Spamd starts, runs the test/lint message, and
> then dies without warning before I can even feed a message using spamc.
> No problems if I omit --daemonize (even with all custom things loaded).

Silly question, but are you sure it didn't just background itself, as
it's supposed to with the daemonize option? (I mean, did you check
that the process isn't still running, with "ps aux" or similar?) Did
you try using spamc to send it a message anyway?

-- 
Duncan Findlay




Re: Nagios

2007-02-12 Thread Duncan Findlay
On Tue, Feb 13, 2007 at 12:12:06AM -0500, Theo Van Dinter wrote:
> On Mon, Feb 12, 2007 at 10:53:40PM -0500, Duncan Findlay wrote:
> > Any chance we can turn off the nagios notifications? Or at least turn
> > them down in frequency?

> How about fixing the issues?  ;) I haven't had time to figure out
> what it's monitoring for, so I haven't prodded the box to figure out
> what's up.  A quick look around makes it seem that things are ok,
> but ...  

Well, to be fair, the vast majority of this list can't (due to
permissions and such) fix said issues. (I'm going on the assumption
that the number of subscribers to dev >> number of committers/PMC
members.) And far more can't easily fix them due to a lack of
understanding of how it works. :-) (I suppose I probably fall in the
latter category.)

Is it possible to acknowledge the issue to silence the alerts? I
imagine this requires access to the apache nagios web interface, but I
don't have any idea where that is or who has access.

Or perhaps, notifications should go to [EMAIL PROTECTED]

I just don't want anyone driven away from the dev list by the intense
volume of nags. :-)

-- 
Duncan Findlay




Nagios

2007-02-12 Thread Duncan Findlay
Hey folks,

Any chance we can turn off the nagios notifications? Or at least turn
them down in frequency?

Thanks,
-- 
Duncan Findlay




Re: Moving on

2007-01-14 Thread Duncan Findlay
On Sun, Jan 14, 2007 at 09:48:57AM -0800, Robert Menschel wrote:
> As you may have gathered by my complete silence these last several
> months, I've been unable to contribute any time to SARE or SA.  My
> time is taken up by other things right now, and it doesn't look
> like that's going to change for a while.

Thank you for your contributions -- we'll miss you.

-- 
Duncan Findlay


Re: 3.2.0 release schedule

2007-01-14 Thread Duncan Findlay
On Wed, Jan 03, 2007 at 02:42:44PM +, Justin Mason wrote:

>   - T + 0 days: announce a heads-up mail. clean up our corpora, get ready
> for mass-checking, try out mass-check to spot any big memory leaks or
> whatnot, fix remaining bugs that affect mass-checks (esp bug 5260!),
> get people signed up, enable all rules in svn.

>   - T + 1 week, around a Thursday or so: start --bayes --net mass-checks;
> move to C-T-R.

>   - T + 3 weeks, a Monday or so: hopefully finish mass-checks, bugs
> allowing ;) (note that includes two weekends.)

>   - T + 3 weeks: perceptron runs, voting on new proposed scores, etc

>   - T + 4 weeks and a bit: hopefully ready to release

+1

BTW, how do we generate all 4 scoresets from one run? We used to have
to do two runs, and I can't remember the rationale for that, or the
rationale for doing it in one. :-)

-- 
Duncan Findlay


Re: ham check on RCVD_BAD_ID

2006-12-27 Thread Duncan Findlay
On Tue, Dec 26, 2006 at 12:27:18AM -0500, Theo Van Dinter wrote:
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200606.6140261
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200606.6181618
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200606.6587291
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200606.6592701
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200606.6876629
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200606.7151699
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200607.792396
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200610.6358
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200611.587266
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200611.1704258
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200611.1817957
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200611.1844028
> ham-daf.log:. /home/duncf/Maildir/Old/debian-project/debian-project200612.75347
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200612.1435129
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200612.2505460
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200612.2536901

All valid ham, from the same sender. I can send you a copy off-list if
you want.

-- 
Duncan Findlay




Re: [Spamassassin Wiki] Update of "BetterDocumentation/SqlReadme" by TimHunter

2006-12-12 Thread Duncan Findlay
On Tue, Dec 12, 2006 at 06:06:42PM +, Justin Mason wrote:

> Duncan, are you merging changes back from these pages?  I'm
> concerned that they're diverging from what's in SVN.

I haven't seen one of these in a while. I'll merge it when I get a
chance.

-- 
Duncan Findlay




Re: Rule update over DNS?

2006-12-07 Thread Duncan Findlay
On Thu, Dec 07, 2006 at 08:56:45PM +1300, Jason Haar wrote:
> If all SA users set sa-update to run hourly - then when an update comes
> out, you will have *all* SA users contacting the same sites
> simultaneously for the downloads. Och...

That's a good point. Those of us packaging SpamAssassin for
distributions should think about this. :-) Will it be okay if all
Debian users start running sa-update on the same minute of the hour?
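One common way to avoid that is to give each host its own fixed minute. A Python sketch, assuming a packaging script that writes an hourly cron entry; the function names and the sa-update path in the generated line are hypothetical, and nothing here reflects what any distribution's package actually does.

```python
import hashlib
import socket


def cron_minute(hostname: str) -> int:
    """Map a host name to a stable minute in 0..59, spreading hosts out
    across the hour instead of all hitting the mirrors at once."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 60


def cron_line(hostname: str) -> str:
    # Hypothetical hourly cron entry; the sa-update path is an assumption.
    return f"{cron_minute(hostname)} * * * * root /usr/bin/sa-update"


if __name__ == "__main__":
    print(cron_line(socket.gethostname()))
```

Hashing the host name rather than picking a random minute keeps each machine's schedule stable across package upgrades while still spreading the load over the hour.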

-- 
Duncan Findlay




Re: another fp check

2006-12-06 Thread Duncan Findlay
On Wed, Dec 06, 2006 at 05:06:43PM -0500, Theo Van Dinter wrote:
> ../ham-daf.log:. /home/duncf/Maildir/Old/debian-project/debian-project200606.349198
> ../ham-daf.log:. /home/duncf/Maildir/Old/debian-project/debian-project200606.353902
> ../ham-daf.log:. /home/duncf/Maildir/Old/debian-project/debian-project200606.365610

These are spam. Sorry.

-- 
Duncan Findlay




Re: ham check please

2006-12-04 Thread Duncan Findlay
On Sun, Dec 03, 2006 at 11:04:55PM -0500, Theo Van Dinter wrote:
> ham-daf.log:. /home/duncf/Maildir/Old/rogers/rogers200606.236990
> ham-daf.log:. /home/duncf/Maildir/Old/rogers/rogers200606.550207
> ham-daf.log:. /home/duncf/Maildir/Old/rogers/rogers200606.574290
> ham-daf.log:. /home/duncf/Maildir/Old/rogers/rogers200607.129882

All ham, for a legitimate newsletter I'm subscribed to.

The attachment is an image/jpg, BASE64 encoded. I don't know what your
rules are looking for, but the first line of the base 64 is one
character longer than the others.
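For illustration only (I don't know what the actual rule checks), a Python sketch of a test that would flag this layout: every encoded line except the last should share one length, which RFC 2045 caps at 76 characters and which encoders typically fill. The function name is hypothetical.

```python
def uneven_base64_lines(encoded: str) -> bool:
    """True if the base64 lines before the final one do not all share one
    length; the final line is allowed to be shorter."""
    lines = [line for line in encoded.splitlines() if line.strip()]
    full_lines = lines[:-1]
    return len({len(line) for line in full_lines}) > 1
```

A first line one character longer than the rest, as in these newsletters, is enough to trip it, which is exactly why such a heuristic can hit legitimate mail from an unusual encoder.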

-- 
Duncan Findlay




logs-to-c

2006-12-02 Thread Duncan Findlay
Hey folks,

What's the deal with logs-to-c? As far as I can tell, it has two
purposes:

 1. Convert mass-check logs to C code for use with the perceptron.

 2. Spit out FPs, FNs, TCR, and other statistics for the mass-check
logs and the current scores.

These two purposes seem to be entirely different. Furthermore, the
code is ugly and it doesn't even "use strict".

I was going to try to split the script into two (logs-to-c, which
would do #1 above, and something else, maybe "evaluate-logs" to do
#2). Unfortunately, I've run into some confusion.

Logs-to-c reads in the ranges data from score-ranges-from-freqs. It
iterates through those data and modifies its internal concept of the
score for each rule based on the range. (I.e., if the current score in
50_scores.cf is outside the range from score-ranges-from-freqs, it
will set the score to the upper or lower limit of the range.) There are
other scenarios where it will change its internal concept of the score
for a rule, but that's the idea.

Now this makes sense when it needs to output the score for the
perceptron. (i.e. Use #1) But when it's evaluating the FPs on the
current logs, I'm not sure this makes sense.

In theory, after rewriting the new scores as output by the perceptron
and re-running parse-rules-for-masses, this munging of scores
shouldn't make a difference, since the scores should be set within
their ranges.*

So, the ultimate question is "do we need to fudge our scores based on
ranges info in order to do a logs-to-c --count?" I would argue no. But
I'm a relative neophyte to the internals of the scoring mechanisms. If
the answer is no, I plan on splitting logs-to-c in half.

Let me know what you think.

Thanks,

-- 
Duncan Findlay

* Currently it makes a small difference; rules that are ignored have
their scores set to 0 by read_ranges in logs-to-c, but never make it
into perceptron.scores, so they can't be rewritten in
50_scores.cf. This can account for the difference. I'm not sure what
the ideal behaviour is here.
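A Python sketch of what the second half (the hypothetical "evaluate-logs") might do: score each logged message with the current scores exactly as given, with no range munging, and count FPs and FNs at a threshold, treating a score at or above the threshold as spam. It assumes the fourth whitespace-separated field of a mass-check log line is the comma-separated list of rules hit (as in the mass-check output quoted in these threads), that paths contain no spaces, and that lines starting with # are comments; all function names are made up.

```python
def message_score(line: str, scores: dict) -> float:
    """Sum current scores over the rules recorded for one mass-check log line."""
    fields = line.split()
    # Assumed layout: class, score, path, comma-separated rule hits, metadata.
    if len(fields) < 4 or "=" in fields[3]:
        return 0.0  # no rule list on this line
    return sum(scores.get(rule, 0.0) for rule in fields[3].split(","))


def evaluate(ham_lines, spam_lines, scores, threshold=5.0):
    """Count false positives and false negatives at a threshold, using the
    scores exactly as given (no clamping to score ranges)."""
    def messages(lines):
        return [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    fp = sum(1 for ln in messages(ham_lines)
             if message_score(ln, scores) >= threshold)
    fn = sum(1 for ln in messages(spam_lines)
             if message_score(ln, scores) < threshold)
    return fp, fn
```

Keeping this separate from the C-generation half means any range clamping lives in one place, the perceptron input path, and evaluation reports exactly what the configured scores would do.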





Re: spam in daf corpus

2006-11-21 Thread Duncan Findlay
On Tue, Nov 21, 2006 at 01:16:25PM +, Justin Mason wrote:
> Duncan -- fyi --

> Y  6 /home/duncf/Maildir/Old/debian-devel/debian-devel200611.1718489
> AXB_FAKETZ,GMD_FAKETZ,L_SPAM_TOOL_13,REPTO_OVERQUOTE_THEBAT,T_RCVD_CORRUPT_ESMTP,T_RCVD_FORGED_WROTE2,__CT,__CTE,__CT_TEXT_PLAIN,__FH_RCVD_NODNS,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__HAS_X_LOOP,__HAS_X_MAILER,__HAS_X_MAILING_LIST,__KAM_NUMBER2,__LIST_MAIL,__LIST_UNSUB,__MIME_VERSION,__MSGID_OK_DIGITS,__MSGID_OK_HOST,__NAKED_TO,__NONEMPTY_BODY,__REPTO_OVERQUOTE,__REPTO_QUOTE,__SANE_MSGID,__SOBER_P_PRIO,__THEBAT_MUA,__THEBAT_MUA_V1,__TOCC_EXISTS
> time=1163176269,scantime=0,format=m,reuse=no

> looks spammy.

Of course you're right. Fixed. Sorry about that.

-- 
Duncan Findlay




Re: new rule->sa-update speedup idea (was Re: spam attacks - so and so wrote about a stock )

2006-10-18 Thread Duncan Findlay
On Wed, Oct 18, 2006 at 06:07:01PM +0100, Justin Mason wrote:

> Theo Van Dinter writes:
> in other words, reducing the worst-case scenario to just under 1 day. (If
> we were to increase frequency of update publishing in the future, that
> would then reduce that further, if necessary.)

> Rules that got promoted based on "being new" and having a 1.0 S/O in the
> preflight mass-checks would then only *stay* promoted if they then passed
> the normal, existing promotion criteria -- so a rule that was good
> "enough" to get into the update due to a 1.0 S/O, but had FPs on the
> larger test set, would fall out anyway after 1 day.


I think I'd want to see a spam% restriction on there
too. Unfortunately, this probably won't help, since (correct me if I'm
wrong) the preflight mass-checks run on old messages, not brand new
ones. That would mean new rules wouldn't get a good S/O ratio anyway.

-- 
Duncan Findlay




Re: Nightly run OOMs during scan...

2006-10-14 Thread Duncan Findlay
On Fri, Oct 13, 2006 at 10:40:26AM -0400, Theo Van Dinter wrote:
> Second day in a row, I haven't started debugging yet, but fyi.

I've been getting "too many open files" errors for a few weeks now. I
haven't debugged it either.

-- 
Duncan Findlay




Re: mass-check: Too many open files!

2006-10-01 Thread Duncan Findlay
On Sun, Oct 01, 2006 at 08:57:28AM -0400, Duncan Findlay wrote:
> On Sat, Sep 30, 2006 at 03:06:44PM -0400, Daryl C. W. O'Shea wrote:
> > Duncan Findlay wrote:
> > >Fixed now! Thanks, Daryl!

> > I'm confused.  What did I unknowingly fix now? :)

> I think your fixes to ArchiveIterator must have fixed my problem with
> mass-check complaining about too many open files.

I lied. I've still got problems. :-(

-- 
Duncan Findlay




L1 Logistic Regression in SpamAssassin

2006-10-01 Thread Duncan Findlay
Hey everybody,

Just wanted to let you know that, as part of my 4th Year Math &
Engineering Design Project, I'm working on using L1 regularized
Logistic Regression to replace the Perceptron scoring mechanism, as
described by Lee, Lee, Abbeel and Ng in this paper:

http://ai.stanford.edu/~ang/papers/aaai06-efficientL1logisticregression.pdf

It's way too early for any kind of predictions, but we're hoping to
see better performance with this algorithm than with the perceptron. I'll
let you know when we start to see results.

-- 
Duncan Findlay




Re: mass-check: Too many open files!

2006-10-01 Thread Duncan Findlay
On Sat, Sep 30, 2006 at 03:06:44PM -0400, Daryl C. W. O'Shea wrote:
> Duncan Findlay wrote:
> >Fixed now! Thanks, Daryl!

> I'm confused.  What did I unknowingly fix now? :)

I think your fixes to ArchiveIterator must have fixed my problem with
mass-check complaining about too many open files.


-- 
Duncan Findlay




Re: mass-check: Too many open files!

2006-09-30 Thread Duncan Findlay
Fixed now! Thanks, Daryl!

-- 
Duncan Findlay




mass-check: Too many open files!

2006-09-25 Thread Duncan Findlay
Has anybody seen this before? It's been a while since I've looked
through my nightly mass-check logs, but I'm now getting lots of
errors:

bayes: cannot write to /home/duncf/svn/spamassassin-nightly/masses/spamassassin/bayes_journal, bayes db update ignored: Too many open files
bayes: cannot write to /home/duncf/svn/spamassassin-nightly/masses/spamassassin/bayes_journal, bayes db update ignored: Too many open files
bayes: cannot open bayes databases /home/duncf/svn/spamassassin-nightly/masses/spamassassin/bayes_* R/O: tie failed: Too many open files
bayes: cannot open bayes databases /home/duncf/svn/spamassassin-nightly/masses/spamassassin/bayes_* R/O: tie failed: Too many open files
bayes: cannot write to /home/duncf/svn/spamassassin-nightly/masses/spamassassin/bayes_journal, bayes db update ignored: Too many open files

...

util: secure_tmpfile failed to create file '/tmp/.spamassassin14261y8fPFutmp': Too many open files
util: secure_tmpfile failed to create file '/tmp/.spamassassin14261MWKNz4tmp': Too many open files

...


archive-iterator: unable to open /home/duncf/Maildir/Old/spam/spam200604: Too many open files
archive-iterator: unable to open /home/duncf/Maildir/Old/spam/spam200604: Too many open files
archive-iterator: unable to open /home/duncf/Maildir/Old/spam/spam200604: Too many open files
archive-iterator: unable to open /home/duncf/Maildir/Old/spam/spam200604: Too many open files
archive-iterator: unable to open /home/duncf/Maildir/Old/debian-devel/debian-devel200604: Too many open files
archive-iterator: unable to open /home/duncf/Maildir/Old/spam/spam200604: Too many open files
archive-iterator: unable to open /home/duncf/Maildir/Old/spam/spam200604: Too many open files
archive-iterator: unable to open /home/duncf/Maildir/Old/spam/spam200604: Too many open files
archive-iterator: unable to open /home/duncf/Maildir/Old/spam/spam200604: Too many open files
archive-iterator: unable to open /home/duncf/Maildir/Old/debian-project/debian-project200604: Too many open files

etc.

Anybody know what's going on? (I haven't had time to dig in myself...)
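One way to tell a descriptor leak from a limit that's simply too low is to compare a process's open descriptors against its RLIMIT_NOFILE. Below is a minimal Python sketch, assuming Linux with /proc mounted; the helper names are hypothetical. Pointing `open_fd_count` at a running mass-check's PID (same user or root) and sampling it over time should show whether the count climbs steadily.

```python
import os
import resource


def open_fd_count(pid: str = "self") -> int:
    """Count a process's open file descriptors via /proc (Linux only).

    For pid "self", the count includes the descriptor that os.listdir
    briefly holds open on /proc/self/fd while listing it.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_limits() -> tuple:
    """Return the (soft, hard) RLIMIT_NOFILE limits of *this* process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Since `fd_limits` reports the limits of the Python process, compare the mass-check count against `ulimit -n` in the shell that launches mass-check instead.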

-- 
Duncan Findlay


Re: BZ Quips

2006-08-30 Thread Duncan Findlay
On Wed, Aug 30, 2006 at 05:55:43PM -0400, Theo Van Dinter wrote:
> and so the issue was: is it appropriate to have that on an "official"
> (for lack of a better term) website.  There was a concern that maybe
> the SA PMC or ASF Board should/would have a problem with that.
> 
> So I wanted to bring it up for him and see what the general thoughts were
> about this.
> 
> There seems to be two options really if people consider this a worthy issue:
> 
> 1) remove questionable content from the quips and limit the addition cgi
> 2) disable quips altogether and avoid having to censor/police the list

Personally, I feel the BZ quips are stupid (at best), and I would not
be the least bit upset if they were removed. That's certainly a lot
easier than #1 above.

-- 
Duncan Findlay


signature.asc
Description: Digital signature


Re: criticism of our changelogs

2006-08-30 Thread Duncan Findlay
On Wed, Aug 30, 2006 at 05:41:50PM -0400, Theo Van Dinter wrote:
> On Wed, Aug 30, 2006 at 09:54:05PM +0100, Justin Mason wrote:
> > http://use.perl.org/~petdance/journal/30809
> 
> Yeah, I can understand the POV.  Our changelog is generally written so that we
> know what's changed in a commit, and if we need more info, that's what the bug
> ticket reference is for -- and the Changes file lets people see those as well
> w/out needing SVN and such.

I'd argue that "written" is the wrong word here. AFAIK, we use SVN for
this only because it's easier than actually writing a useful change
log.

Maybe we want to keep a running changelog, and every time someone fixes
something worthwhile, they make an entry in it.

-- 
Duncan Findlay




Re: plugins in sa-update

2006-07-01 Thread Duncan Findlay
On Sat, Jul 01, 2006 at 05:22:19AM -0700, Loren Wilton wrote:
> If it is a rule that requires new code to work, then the new code better in
> some way come with the new rule.  Otherwise there is no point in
> distributing the (unworkable) rule, and no point in listing it in sa-update.
> (And, contra-wise, if someone makes a wonderful new rule that just happens
> to require code to work, the need for code shouldn't disqualify it from
> distribution.)

On the other hand, I may not want to be installing new code that is
run as root on an automated basis, though I'm fine with rules.

-- 
Duncan Findlay




Re: Summer of Code

2006-04-14 Thread Duncan Findlay
On Fri, Apr 14, 2006 at 03:32:19AM -0400, Duncan Findlay wrote:
> On Thu, Apr 13, 2006 at 09:52:07PM +0100, Justin Mason wrote:
> > this is coming up soon -- do we want to get a couple of entrants
> > for SpamAssassin?
> 
> Uh... is it coming up soon? I've seen no announcement or indication
> that they will be continuing that program this year.

Actually, it very much looks like they will continue this year. It
just hasn't been officially announced.

http://groups.google.com/group/summer-discuss/browse_thread/thread/675724ce3b035acb/b2ab8df0dc512e58#b2ab8df0dc512e58

-- 
Duncan Findlay




Re: Summer of Code

2006-04-14 Thread Duncan Findlay
On Thu, Apr 13, 2006 at 09:52:07PM +0100, Justin Mason wrote:
> this is coming up soon -- do we want to get a couple of entrants
> for SpamAssassin?

Uh... is it coming up soon? I've seen no announcement or indication
that they will be continuing that program this year.

Maybe you've seen something I haven't? :-)

-- 
Duncan Findlay




Re: svn commit: r384691 - in /spamassassin/rules/trunk/sandbox/dos: SIQ.cf SIQ.pm

2006-03-09 Thread Duncan Findlay
On Fri, Mar 10, 2006 at 01:01:07AM -0500, Daryl C. W. O'Shea wrote:
> >>Note that Outbound Index is a member/subscription service.
> >
> >Will this automatically get enabled in the nightly checks? Is this a
> >problem for those of us not subscribed to the service?
> 
> The loadplugin line is commented out, so no.

Doh! Missed that, obviously.

> Running mass-checks with it, even if you've got access, probably isn't a 
> great idea though, since it'll likely severely skew their database.
> 
> I think some co-ordination would have to take place to make sure that 
> doesn't happen if we ever want to do mass-checks with it.

Fair enough. I really don't know anything about this service.

-- 
Duncan Findlay




Re: svn commit: r384691 - in /spamassassin/rules/trunk/sandbox/dos: SIQ.cf SIQ.pm

2006-03-09 Thread Duncan Findlay
On Fri, Mar 10, 2006 at 03:29:02AM -, [EMAIL PROTECTED] wrote:
> Author: dos
> Date: Thu Mar  9 19:28:58 2006
> New Revision: 384691
> 
> URL: http://svn.apache.org/viewcvs?rev=384691&view=rev
> Log:
> sandbox: add my SIQ plugin for anyone who's interested
> 
> insomnia == non-engineered, messy code :(
> 
> This plugin executes SIQ queries in the background and allows the data
> returned by the reputation service provider to be tested via a number of evals
> and a psuedo-header
> 
> Note that Outbound Index is a member/subscription service.

Will this automatically get enabled in the nightly checks? Is this a
problem for those of us not subscribed to the service?

-- 
Duncan Findlay




Nagios notifications

2006-03-09 Thread Duncan Findlay
Can these be throttled more, or disabled? I understand the merit of
telling someone who can do something about it when it breaks, but as
far as I can tell, only a very small percentage of this list actually
can; the rest of us don't really need the notification. Plus they're
getting annoying.

Thanks,
-- 
Duncan Findlay





Re: svn commit: r382049 - in /spamassassin/rules/trunk/sandbox/kam: ./ 20_stock.cf

2006-03-03 Thread Duncan Findlay
On Thu, Mar 02, 2006 at 12:38:42PM -0500, Daryl C. W. O'Shea wrote:
> [EMAIL PROTECTED] wrote:
> >Author: kmcgrail
> >Date: Wed Mar  1 07:15:05 2006
> >New Revision: 382049
> >
> >URL: http://svn.apache.org/viewcvs?rev=382049&view=rev
> >Log:
> >KAM Sandbox Creation and 1 test rule
> >
> >Added:
> >spamassassin/rules/trunk/sandbox/kam/
> >spamassassin/rules/trunk/sandbox/kam/20_stock.cf
> 
> I think we were (well, we have so far) sticking to using our Apache 
> logins for sandbox directory names.

Yeah, that's a good point. Kevin, would you please consider moving
your sandbox directory to kmcgrail? (Use svn move to preserve
history.)

Thanks,
-- 
Duncan Findlay




Re: Nightly mass-checks

2006-03-01 Thread Duncan Findlay
On Wed, Mar 01, 2006 at 11:50:58PM -0600, Doc Schneider wrote:
> >>is what I'm personally using. The machine is a dual 500 with a gig of 
> >>RAM. And perl 5.8.6 on it. Anyone have any ideas?
> >
> >What size are these mailboxes?
> >
> Total size of the files? or how many messages in each mbox?
> 
> Size of ham is 333 megs
> Size of spam is 535 megs.
> 
> A bit over 100k messages total
> spams: 63731, hams: 39385, give or take.

My current checks are roughly 40k messages. I use --after '6 months'
on a much larger corpus, which gets me down to 40k. (On my weekly
net-enabled runs I use --after '1 month'.)

It takes roughly 3 hours from start to finish (including scanning the
corpus and rsyncing); this is on a 2.8 GHz P4 w/ 1GB RAM.

Suggestions:

 - Get rid of --all; you could be hitting some giant messages and burning
   a lot of CPU.
 - Use -j2 since you have 2 processors... might as well use them.
 - Trim your corpus (use --after).

-- 
Duncan Findlay




Re: Nightly mass-checks

2006-03-01 Thread Duncan Findlay
On Wed, Mar 01, 2006 at 10:57:27PM -0600, Doc Schneider wrote:
> I started one with an rsync'd version I grabbed last night about this 
> time and it is still going. Says it is 50% complete. I think I'm missing 
> an option or something.
> ./mass-check --progress --all \
> ham:mbox:/home/masschecker/mail/ham \
> spam:mbox:/home/masschecker/mail/spam
> 
> is what I'm personally using. The machine is a dual 500 with a gig of 
> RAM. And perl 5.8.6 on it. Anyone have any ideas?

What size are these mailboxes?

-- 
Duncan Findlay




Re: Nightly mass-checks

2006-02-28 Thread Duncan Findlay
On Tue, Feb 28, 2006 at 10:29:10PM -0600, Doc Schneider wrote:
> Gang,
> 
> I'm working on a better wiki page for nightly mass-checks and an easier 
> way for those old and new to SA to be able to do these.
> 
> A few things which I'm not entirely sure of are:
> 
> 1) Command line to run a mass-check $./mass-check --all --progress 
> ham:mbox:path/to/ham (does this need a / or a /* to run through a whole 
> directory of mbox ham files? Ditto with spam?)

I recommend using mass-check's -f option, which lets you define all of
your targets (ham:dir:/path, etc) in one file, and just pass the name
of the file to mass-check.

For nightly checks I do:

mass-check -f  --after='6 months'

For weekly checks I do:

mass-check --net -j 8 -f  --after='6 months' 

Looks like I don't use --all; I probably should. --progress is
probably unnecessary if you're running this from a cron job.

> 2) Command line for maildir files? ham:dir:/path/to/ham (Wondering the 
> same on this also about needing a / or /*)

For maildirs:
ham:dir:/path/to/maildir/cur

For a dir of mboxes:
ham:mbox:/path/to/mboxdir/*

> 3) A good example of how to do a mass-check using the SVN method. (Theo 
> gave me some clues on using it so am pretty much set for doing this.)

It seems this is the "old", almost deprecated way now... Guess I
missed that memo :-)

Manually:
Check out a spamassassin tree

Nightly:
wget http://rsync.spamassassin.org/weekly-versions.txt or nightly-versions.txt
take the revision number from the last line of that file

(inside a loop with some (randomized) retry logic)
svn update -r 

perl Makefile.PL
make

run mass-check

upload results via rsync (setting the RSYNC_PASSWORD environment variable is useful)

--

My script is designed to be run every hour, and it will only run if it
hasn't yet run "today", where "today" is defined as the period since
9:00 UTC (when the revision file gets updated).
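The once-a-day gating described here, plus the randomized retry around `svn update`, might look something like the following Python sketch. It's an illustration, not the actual script: the function names are hypothetical, and it assumes the revision is the last whitespace-separated field on the last line of the versions file, which hasn't been checked against the real file. The mass-check and rsync steps are left out because their arguments are site-specific.

```python
import random
import time
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")

# The revision file is refreshed at 9:00 UTC, so "today" starts then.
CUTOFF_HOUR_UTC = 9


def window_start(now: datetime) -> datetime:
    """Most recent 09:00 UTC at or before `now` (expects an aware datetime)."""
    now = now.astimezone(timezone.utc)
    start = now.replace(hour=CUTOFF_HOUR_UTC, minute=0, second=0, microsecond=0)
    if now < start:
        start -= timedelta(days=1)
    return start


def should_run(last_run: Optional[datetime], now: datetime) -> bool:
    """True if the hourly cron job hasn't completed a run since today's cutoff."""
    return last_run is None or last_run < window_start(now)


def revision_from_versions(text: str) -> int:
    """Revision from the last non-empty line of nightly/weekly-versions.txt.

    Assumption: the revision is the last whitespace-separated field on that line.
    """
    last = [line for line in text.splitlines() if line.strip()][-1]
    return int(last.split()[-1])


def with_retries(action: Callable[[], T], attempts: int = 5, base_delay: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `action`, retrying on any exception with randomized exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Jitter spreads retries so many checkers don't hit SVN in lockstep.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    raise AssertionError("unreachable")


def build_commands(rev: int) -> List[List[str]]:
    """The update/build steps from the list above, pinned to `rev`."""
    return [
        ["svn", "update", "-r", str(rev)],
        ["perl", "Makefile.PL"],
        ["make"],
    ]
```

Each command list could then be passed to `subprocess.run(cmd, check=True)`, with the `svn update` call wrapped in `with_retries`.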


> 5) Anyone having a particular problem that should be addressed for doing 
> mass-checks?

Now that I think about it, with the rules sandbox stuff, the SVN
checkout probably won't always get the same revision if people update
their SVN at different times of the day. Is this a bad thing?

I just tested this. If I run my nightly mass checks at 11:00 UTC (for
example) and someone commits a rule change between 9:00 UTC and 11:00
UTC, I'll pick up the rule and test it, but others might not. I don't
know what ramifications this might have.

[EMAIL PROTECTED]:~/svn/spamassassin-nightly$ svn update -r 381595

Fetching external item into 'rulesrc'
U    rulesrc/sandbox/duncf/20_drugs.cf
Updated external to revision 381910.

Updated to revision 381595.

Note the rules directory gets updated to 381910 instead of 381595.

Interesting...

-- 
Duncan Findlay




Re: Issues with nightly runs & rsync, docs, etc.

2006-02-26 Thread Duncan Findlay
On Sun, Feb 26, 2006 at 11:47:34PM -0500, Theo Van Dinter wrote:
> On Sun, Feb 26, 2006 at 11:32:57PM -0500, Duncan Findlay wrote:
> > > So the problem here is that mkrules is never run, so the majority
> > > of rules aren't actually run.  I've added in a mkrules call into the
> > > nightlymc script.
> > 
> > Umm... which nightlymc script is this? Also, we don't all use the same
> > script for running nightlies, so if you change something make sure you
> > let people know very clearly. :-)
> 
> The one that runs on the zones machine to generate the rsync image.
> http://wiki.apache.org/spamassassin/NightlyMassCheck simply says that
> people can rsync down the appropriate version and start running, which
> wasn't really true since you had to run mkrules as well, so now it's
> run at the server so when people rsync they get the full image.

Ah... OK...

I'd forgotten we even allow rsync download of the tree for this
purpose, so I had no idea what you were talking about :-)

-- 
Duncan Findlay




Re: Issues with nightly runs & rsync, docs, etc.

2006-02-26 Thread Duncan Findlay
On Wed, Feb 22, 2006 at 06:32:58PM -0500, Theo Van Dinter wrote:
> Second, the suggested method for doing rsync nightlies is (basically):
> 
> cd nightly_mass_check/masses
> rm -f ham.log spam.log
> ./mass-check --progress \
>  [list of files]
> 
> So the problem here is that mkrules is never run, so the majority
> of rules aren't actually run.  I've added in a mkrules call into the
> nightlymc script.

Umm... which nightlymc script is this? Also, we don't all use the same
script for running nightlies, so if you change something make sure you
let people know very clearly. :-)

(I really should commit mine at some point; it has some nice features
over the shell script that's there.)

-- 
Duncan Findlay




[OT] SpamAssassin Developer for Hire

2006-02-06 Thread Duncan Findlay
I am currently seeking employment for 8 or 9 weeks in May and June
2006.[1] I would greatly enjoy working for a company involved in the
anti-spam / e-mail security industry, especially if it would allow me
to use or contribute to the Apache SpamAssassin project.

As you may know, I've been a SpamAssassin developer since 2002,
although I have contributed little recently -- being at school
full-time seems to get in the way of that. Last summer, I worked for
IronPort Systems as an Anti-Spam Developer and I greatly enjoyed the
experience.

I am located in Toronto, Ontario (Canada) and/or Kingston, Ontario,
and I am not eager to relocate for such a short term. I would be
interested in working in either of these two cities or the surrounding
areas, or remotely.[2]

If your company would be interested in hiring a highly motivated and
skilled young computer programmer with extensive experience in the
anti-spam industry, I would love to hear from you.

I realize that this is a rather short period of time, but I am
confident I could tackle a sizeable project in this time frame.

My resume is available online at the following address:

http://people.apache.org/~duncf/DuncanFindlay.pdf

Please feel free to forward this message to anyone who may be
interested. References are available on request.

Thank you,
Duncan Findlay

[1] More precisely, I'd like to work May, June and the first little
bit of July. I am going to be travelling in Europe for the last half
of the summer, from mid-July to the end of August.

[2] Some travel, on the other hand, would be perfectly fine. I just do
not want to deal with the hassles of finding somewhere to live,
furnishings, etc. for a short period. If this were a permanent job, I
would be happy to relocate.




Re: Proposal: two-implementation requirement for new plugin interfaces

2006-02-03 Thread Duncan Findlay
On Fri, Feb 03, 2006 at 10:58:21AM -0800, John Myers wrote:
> Related to bug 4776, I propose a necessary requirement before creating a 
> new plugin interface that at least two plugins implementing that 
> interface exist and be intended for production use.  The null 
> implementation (no plugin of that type registered) could count for one 
> of these implementations.  A non-contributed, proprietary plugin could 
> similarly count for one implementation.

That's fine with me, though it'd be nice to see at least two open
source ones. :-)

If someone is going to the effort of pluginizing something, chances
are that there exists a plugin other than the default.

In bug 4776, the default would be the current behaviour, wouldn't it?
So the rule would only demand one other implementation, which I
presume must exist.

-- 
Duncan Findlay




Re: [Spamassassin Wiki] Update of "BecomingCommitter" by DuncanFindlay

2006-02-02 Thread Duncan Findlay
On Thu, Feb 02, 2006 at 10:54:38PM -0500, Daryl C. W. O'Shea wrote:
> On 02/02/2006 10:49 PM, Theo Van Dinter wrote:
> >On Fri, Feb 03, 2006 at 03:14:19AM -, Apache Wiki wrote:
> >
> >>You have to set your own svn password -- infra doesnt do this
> >>- In theory, when the account request was made to the infrastructure 
> >>group, it would have included the output from ''htpasswd -ns username'' 
> >>and that would have your initial SVN password setup.  If there is a 
> >>problem with this however, you will need to run svnpasswd.
> >
> >
> >If this is the case, why do we ask for htpasswd output when we offer commit
> >access?
> >
> >Also, do we have a wikidoc or something with information about what/how to 
> >request
> >from infra, etc?
> 
> They setup mine (account on minotaur and svn password) with the htpasswd 
> output I supplied last year.

Hmmm... I could be wrong then, I suppose. My understanding is they
just set up the regular passwords, and we were supposed to change the
svn ones ourselves.

Maybe I should just go ahead and ask the people that actually *do*
this stuff.

-- 
Duncan Findlay




Re: review reminder

2006-01-25 Thread Duncan Findlay
On Wed, Jan 25, 2006 at 04:33:02PM -0600, Doc Schneider wrote:
> Welcome to the world of out of work Admins. I've been out of work for 
> close to a year now. And the job pickings are really slim.
> 
> I'm still looking for work too!

And I'm worried about finding a summer job :-)

Sorry I haven't been doing much either. For some reason, I'm taking 8
courses this term, which leaves little time for all the extracurricular
stuff I'm doing, and unfortunately SpamAssassin lies below that on my
priority list. :-(

Hopefully I'll find a job for this summer that leaves me with a bit of
time to code on SpamAssassin...


-- 
Duncan Findlay




Security-related bugs

2006-01-10 Thread Duncan Findlay
I was hoping to start a discussion about what constitutes a "security"
bug in our Bugzilla. This is not meant to criticize any previous
decisions around security, merely to gauge how we feel about this as a
community.

So, here I'd like to outline the criteria I would suggest for
determining whether a bug should be classified as "security" and
restricted to the "security team." Please comment. :-)

 - Bugs which allow false negatives are not security bugs. In
particular if a bug allows a carefully crafted message to bypass some,
but not all, of SpamAssassin's tests, then it should not be marked as
"security".

 - DoS attacks and other related, *exploitable* bugs that disrupt
mail scanning or cause other problems for the server are security
bugs. (I don't consider 4570 to be a security bug, for example. It's
just not exploitable by spammers.)

 - Bugs that allow a specially crafted spammy message to get through
regardless of any other characteristics (i.e. header, body, Bayes and
other tests fail to count) may be security bugs. (I'd argue it's not
strictly speaking a security issue for the system, but it is something
we should maybe not make public. I could be convinced either way on
this.)


Lastly, I'd like to say that once a bug is outlined in the open, there
is no point in hiding it after the fact. In fact, all that may
accomplish is hiding the fix from our users, even though a
description of the "exploit" is publicly available. (Examples: bugs
4759, 4535, and others, I'm sure.)

-- 
Duncan Findlay




Re: sa-updates of the main ruleset: require GPG?

2005-12-16 Thread Duncan Findlay
On Fri, Dec 16, 2005 at 03:10:10PM -0800, Justin Mason wrote:
> a question that Henry put to me -- should sa-updates of the main ruleset
> mandate that GPG verification be used?
> 
> Otherwise an attacker that rooted the download server (or a mirror) could
> put out faked updates, which would be automatically downloaded by
> thousands of servers.

I'm not sure it should be "required", since users could just download
the update manually and stick it in the right place, in which case
requiring GPG would only be an inconvenience. But it should be
"strongly recommended unless you give sa-update the
--yes-im-crazy-and-dont-want-to-use-gpg option".*

-- 
Duncan Findlay

* That said, "--no-gpg" would probably be equally suitable.




Re: updated sa-update proposal

2005-12-16 Thread Duncan Findlay
On Fri, Dec 16, 2005 at 06:04:18PM -0500, Warren Togami wrote:
> Justin Mason wrote:
> >If we version the /var/lib/spamassassin directory, then we would have
> >this timeline:

I agree versioning is useful. My only concern is that we'll never purge
the old stuff.

> I think this makes a lot of sense and we should go for it.  I also think 
> that we don't need auto-purging to remove other versions of these 
> directories.  There are just too many things that can go wrong, and 
> these things are not very big on space consumption.

I just don't like cruft hanging around forever. :-)

-- 
Duncan Findlay




Re: updated sa-update proposal

2005-12-15 Thread Duncan Findlay
On Thu, Dec 15, 2005 at 10:06:36AM -0500, Thomas Schulz wrote:
> Just checking on various operating systems.  /var/lib seems to exist on
> Linux, but not on Solaris, HP-UX or AIX.  There is a /var/opt on Solaris
> and HP-UX, but not on AIX.  Of course you could always create the /var/lib
> (or /var/opt) directory in the install.

Right. I'm going by the Filesystem Hierarchy Standard, which is what
many Linux distributions go by. Many other OSs don't, and we would need
to stick this in the appropriate place on those OSs too. Specifically, it
should be somewhere where "variable" data goes. I just don't know
where that is. ;-)

-- 
Duncan Findlay




Re: updated sa-update proposal

2005-12-14 Thread Duncan Findlay
On Wed, Dec 14, 2005 at 04:54:54PM -0800, Justin Mason wrote:
> So we have these requirements:
> 
>   1. use /var for updates, instead of /etc or /usr
> 
>   2. sa-update updates must not overwrite any packaged files
> 
>   3. the user shouldn't have to choose at package-install time whether
>   they want to use packaged rules, or sa-update rules.  (although
>   conversely, it's ok to entirely stop using packaged rules from that
>   point on, if sa-update installs an update set.)

Agreed.

> So the suggestion is to use:
> 
> /etc/mail/spamassassin:
> 
> *.cf
> *.pre: Admin-installed local settings
> 
> 
> /usr/share/spamassassin:
> 
> default, distro-package-installed scores and rules
> 
> 
> /var/lib/spamassassin/3.1.0:
> /var/lib/spamassassin/3.1.1:
> /var/lib/spamassassin/3.1.2:
> /var/lib/spamassassin/3.2.0:
> 
> sa-update-installed scores and rules

I'm not sure I see the need for multiple directories lying around. I
suppose it can be useful, but I'm assuming that most people will only
have one directory. Also, sa-update should be smart enough to remove old
directories of previous versions (optionally?).

> The presence of anything in /var/lib/spamassassin/3.1.1 causes
> /usr/share/spamassassin to be ignored.
> 
> All rules, including the code-tied stuff for that release, are put in the
> sa-update tarballs (and therefore /var/lib/spamassassin/3.1.2 etc.)

Hmm... does it make sense to redistribute the code-tied stuff? That
seems like unnecessary bandwidth usage. sa-update should only be
grabbing the "changing", non-code-tied stuff.
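The lookup rule in the proposal (anything present in /var/lib/spamassassin/<version> causes /usr/share/spamassassin to be ignored), plus the optional cleanup of old version directories, could be modelled in Python roughly like this. It's only a sketch of the intended behaviour, not Mail::SpamAssassin's actual code; the function names are hypothetical, and whether purging should happen automatically is exactly what's being debated.

```python
from pathlib import Path
from typing import List

# Locations taken from the proposal in this thread.
UPDATE_ROOT = Path("/var/lib/spamassassin")
DEFAULT_RULES = Path("/usr/share/spamassassin")


def rules_dir(version: str, update_root: Path = UPDATE_ROOT,
              default_rules: Path = DEFAULT_RULES) -> Path:
    """Use the sa-update directory for this version if it contains anything,
    otherwise fall back to the packaged default rules."""
    candidate = update_root / version
    if candidate.is_dir() and any(candidate.iterdir()):
        return candidate
    return default_rules


def stale_update_dirs(version: str, update_root: Path = UPDATE_ROOT) -> List[Path]:
    """Update directories belonging to other versions: candidates for an
    (optional) purge by sa-update."""
    if not update_root.is_dir():
        return []
    return sorted(p for p in update_root.iterdir() if p.is_dir() and p.name != version)
```

Note that an empty version directory deliberately doesn't count, so a half-created update directory can't hide the packaged rules.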

-- 
Duncan Findlay




Re: hackathon notes from Sat

2005-12-14 Thread Duncan Findlay

On Wed, Dec 14, 2005 at 11:36:11AM -0800, Justin Mason wrote:
> Duncan Findlay writes:

> >Right. I also don't see any need to split the rules out of the main
> >package -- spamassassin just needs to be smart enough to use the right
> >set of rules -- either where sa-update drops them or where they are
> >installed by default.
> 
> So you're suggesting we'd have:
> 
> /usr/share/spamassassin/72_active.cf: base, released copy of
>  rule updates
> /etc/mail/spamassassin/sa_update.cf: override of that default set
> 
> ??

Yes, except that I'd argue /etc/ isn't the right place for it
either. I'm really thinking it should go in /var/lib somewhere. But
that would mean we'd have the following:
 
 /etc/spamassassin | /etc/mail/spamassassin - site config
 /usr/share/spamassassin | ...  - default rules
 /var/lib/spamassassin  - sa-update drop directory

> I could go for that.  We'd have to modify the Mail::SpamAssassin code
> to recognise the 72_active.cf file somehow and allow it to be ignored
> in the system rules dir, if it appears in the site rules dir.

Are we going to be consolidating all the rules to one file? It would
make it tougher for users to read and play with, if that's a concern.

-- 
Duncan Findlay




Re: hackathon notes from Sat

2005-12-13 Thread Duncan Findlay
On Tue, Dec 13, 2005 at 03:49:44PM -0500, Warren Togami wrote:
> Duncan Findlay wrote:
> >The only problem I see with the above, is that no script should be
> >overwriting rules that are distributed in a package. So if I
> >distribute a spamassassin-rules .deb, which would stick files in
> >/usr/share/spamassassin, no script should go in and overwrite those
> >rules. sa-update should be writing to somewhere in
> >/var/lib/spamassassin (or /var/cache/spamassassin ?) and
> >spamassassin/spamd should be reading from that location if it exists.
> >
> >So, looks like spamassassin/spamd probably needs to be modified to
> >read from /var/lib/spamassassin if we want sa-update to work this way.
> >
> 
> I am in agreement that sa-update should download rules/scores into 
> somewhere in /var, and it shouldn't overwrite files distributed by the 
> package.  I am not so sure I like the separate co-dependent package for 
> scores thing as a requirement.

Right. I also don't see any need to split the rules out of the main
package -- spamassassin just needs to be smart enough to use the right
set of rules -- either where sa-update drops them or where they are
installed by default.

> I am a little confused about the terminology, active-set means network 
> tests right?

I believe "active-set" refers to the latest scored set of rules -- the
idea being that rules will be updated more often than code.

-- 
Duncan Findlay




Re: hackathon notes from Sat

2005-12-12 Thread Duncan Findlay
On Sun, Dec 11, 2005 at 12:35:46PM -0800, Justin Mason wrote:
> OK, we're rethinking this; it no longer seems necessary for it
> to be a requirement, and you have good points there.
> 
> What about this?
> 
>   - basic "spamassassin" package (rpm/deb) contains no active-set rules
> 
>   - there's another package which contains the active-set rules, in the
> location where "sa-update" can later overwrite them
> 
>   - both packages co-depend on each other.
> 
> The second package can be updated either via distro packaging methods --
> apt-get/yum, or can be overwritten using "sa-update".

Yeah, sorry I didn't read the original message carefully enough. I
think I'm pretty much in agreement with Warren though as far as
requirements go.

The only problem I see with the above, is that no script should be
overwriting rules that are distributed in a package. So if I
distribute a spamassassin-rules .deb, which would stick files in
/usr/share/spamassassin, no script should go in and overwrite those
rules. sa-update should be writing to somewhere in
/var/lib/spamassassin (or /var/cache/spamassassin ?) and
spamassassin/spamd should be reading from that location if it exists.

So, looks like spamassassin/spamd probably needs to be modified to
read from /var/lib/spamassassin if we want sa-update to work this way.
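The lookup order proposed above (prefer the rules sa-update downloaded, fall back to the packaged ones) can be sketched in Python. This is a hypothetical illustration, not SpamAssassin's actual Perl code. The two directory paths come from this thread. The rule that a directory "counts" only if it contains at least one .cf file is an assumption made for the example.

```python
import os

# Paths taken from the discussion above; the real defaults may differ
# between SpamAssassin versions and distributions (assumption).
UPDATE_DIR = "/var/lib/spamassassin"     # written by sa-update
PACKAGE_DIR = "/usr/share/spamassassin"  # owned by the .deb/.rpm, never overwritten


def choose_rules_dir(update_dir: str = UPDATE_DIR,
                     package_dir: str = PACKAGE_DIR) -> str:
    """Prefer sa-update's rules if that directory exists and holds at least
    one .cf file (hypothetical criterion); otherwise use the packaged rules."""
    if os.path.isdir(update_dir):
        if any(name.endswith(".cf") for name in os.listdir(update_dir)):
            return update_dir
    return package_dir
```

Because sa-update only ever writes to its own directory, the files owned by the package manager are never touched, which is the property the packaging discussion is after.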

-- 
Duncan Findlay



Re: BTW / RE: Bug 4535 - Re: 3.0.5 almost done, votes needed

2005-11-20 Thread Duncan Findlay
On Mon, Nov 21, 2005 at 12:20:39AM -0500, Duncan Findlay wrote:
> On Sun, Nov 20, 2005 at 09:11:02PM -0800, Linda Walsh wrote:
> > Was reading the bugs to see what was yet to go into 3.0.5 and
> > ran into some weirdness with bug# 4535:
> > 
> >"You are not authorized to access bug #4535."
> > 
> > Hmmm...not authorized to "look" at a bug"...what's it take to
> > look at bugs these days (or at least this specific one...I could
> > view the rest, they all seem to be resolved)?

> Bug 4535 is classified because it has been marked as a security bug.

Also, it's RESOLVED FIXED, FYI.

-- 
Duncan Findlay



Re: BTW / RE: Bug 4535 - Re: 3.0.5 almost done, votes needed

2005-11-20 Thread Duncan Findlay
On Sun, Nov 20, 2005 at 09:11:02PM -0800, Linda Walsh wrote:
> Was reading the bugs to see what was yet to go into 3.0.5 and
> ran into some weirdness with bug# 4535:
> 
>"You are not authorized to access bug #4535."
> 
> Hmmm...not authorized to "look" at a bug"...what's it take to
> look at bugs these days (or at least this specific one...I could
> view the rest, they all seem to be resolved)?

In general, bugs can be restricted so that only the "Security Team"
can look at them. I believe the "Security Team" is simply the group of
all committers.

Bug 4535 is classified because it has been marked as a security bug.

Obviously, I can't elaborate any more than that. :-)

-- 
Duncan Findlay



Re: svn commit: r345103 - in /spamassassin/rules/trunk/sandbox: duncf/25_replace.cf jm/20_vbounce.cf

2005-11-16 Thread Duncan Findlay
On Wed, Nov 16, 2005 at 08:28:22PM -, [EMAIL PROTECTED] wrote:
> ==
> --- spamassassin/rules/trunk/sandbox/duncf/25_replace.cf (original)
> +++ spamassassin/rules/trunk/sandbox/duncf/25_replace.cf Wed Nov 16 12:28:21 2005
> @@ -6,7 +6,9 @@
>  describe T_FUZZY_STOCK   Obfuscates the word "stock"
>  
>  test T_FUZZY_STOCK   fail Stock
> -test T_FUZZY_STOCK   ok St0ck
> +
> +# this is failing; "make test TEST_FILES=t/rule_tests.t" will repro it
> +# test   T_FUZZY_STOCK   ok St0ck
>  
>  
>  endif

Weird. This definitely worked for me before. As far as I can tell, it
*should* work.

I am relying on the definitions of tags in rules/25_replace.cf. Is
this a problem?
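For readers unfamiliar with the test syntax: "test RULE ok TEXT" asserts that the rule hits TEXT, and "test RULE fail TEXT" asserts that it does not. The Python sketch below mimics what a tag-based fuzzy rule for "stock" is expected to do. The tag names and character classes are hypothetical stand-ins, not the definitions in rules/25_replace.cf, and Python's re module stands in for Perl regexes.

```python
import re

# Hypothetical tag definitions in the spirit of ReplaceTags' replace_tag
# lines; the real definitions in rules/25_replace.cf are different.
TAGS = {
    "S": r"[s5$]",
    "T": r"[t7+]",
    "O": r"[o0]",
    "C": r"[c(]",
    "K": r"k",
}


def expand_tags(template: str) -> str:
    """Replace every <X> placeholder with the character class for tag X."""
    return re.sub(r"<([A-Z])>", lambda m: TAGS[m.group(1)], template)


# Hit an obfuscated "stock" but not the plain word (the "fail Stock" case):
# the negative lookahead rejects a correctly spelled "stock".
FUZZY_STOCK = re.compile(
    r"\b(?!stock\b)" + expand_tags("<S><T><O><C><K>") + r"\b",
    re.IGNORECASE,
)


def rule_hits(text: str) -> bool:
    return FUZZY_STOCK.search(text) is not None
```

Here rule_hits("St0ck") is True and rule_hits("Stock") is False, matching the two test lines in the diff. If the real "ok" case fails only under t/rule_tests.t, one plausible thing to check is whether the tag definitions the rule relies on are actually loaded in that test environment.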

-- 
Duncan Findlay



Re: "ready to release" votes -- procedure change

2005-11-15 Thread Duncan Findlay
On Tue, Nov 15, 2005 at 12:06:39PM -0800, Justin Mason wrote:
> I've just updated http://wiki.apache.org/spamassassin/VotingProcedure
> to fix a bug -- looks like we were reading the ASF pages wrongly. 
> quoting the fixed version:
> 
>   For code modifications, patches, and R-T-C changes to svn, committers have
>   the binding votes. However, for "ready to release" and project-procedural
>   ASF votes, votes must come from PMC members to be considered binding.
> 
>   (Note: previously committers could vote for releases, but this has had to
>   be changed, due to ASF regulations. While the Apache Voting page is a
>   little unclear on the subject, discussion on the 'legal-discuss' list has
>   made it clear that it is part of the ASF's bylaws that PMCs, and only
>   PMCs, can direct this action.) 

Hmm... are "pre-releases" and "release candidates" different in this
respect? Personally, I think requiring +3 from PMC for an RC or -pre
is excessive.
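A minimal Python sketch of how such a tally might be computed. It assumes the ASF's usual majority-approval rule for releases (at least three binding +1 votes and more binding +1s than -1s) and counts only PMC members' votes as binding, per the quoted wiki text. The min_plus parameter is a hypothetical knob showing where a lower threshold for pre-releases or RCs would plug in; it does not reflect any policy the project adopted.

```python
from __future__ import annotations


def release_vote_passes(votes: dict[str, int], pmc: set[str],
                        min_plus: int = 3) -> bool:
    """votes maps each voter to +1, 0 or -1 (one vote per person).
    Only votes cast by PMC members are treated as binding."""
    binding = [value for voter, value in votes.items() if voter in pmc]
    plus = sum(1 for value in binding if value > 0)
    minus = sum(1 for value in binding if value < 0)
    return plus >= min_plus and plus > minus
```

For example, four +1 votes of which only two come from PMC members fail with the default threshold, but would pass if min_plus were lowered to 2.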

-- 
Duncan Findlay



Sandboxes

2005-11-08 Thread Duncan Findlay
Could someone please outline the process I would go through to propose
rules for testing? (*cough* jmason *cough*) ;-)

I assume it's like this. (In MoinMoin format, suitable for someone to
paste into the Wiki.)

 1. Create a new sandbox in rulesrc/sandbox (e.g. rulesrc/sandbox/duncf)
 1. Create as many files in this directory as I wish, containing as many rules 
as I wish, named whatever I wish.
 1. Wait until the mass-check results are in, and look at the results
at http://buildbot.spamassassin.org/ruleqa.
 1. Tweak until I'm satisfied with my rule.
 1. ???
 1. Profit!

The last couple of steps are a little hazy -- I'm not sure if we've
ever decided how we're going to move rules from sandboxes to "core
rules". Can anyone comment about this? I guess we don't need to worry
about that until closer to "release time".

Anyways... it looks like the mechanism is pretty much set up... now
all we need to do is write rules. (?) Anyone notice a lot more
"clever" spam recently?

Thanks,
-- 
Duncan Findlay



Re: please fix your nightly mass-checks

2005-10-30 Thread Duncan Findlay
On Sat, Oct 29, 2005 at 07:28:09PM -0700, Justin Mason wrote:
> It's only myself, daf and bzoetekouw submitting results afaics.
> 
> BTW, that new rule-hits-over-time graph is *cool*.  Check out these
> graphs!

What do the different colours represent? (Could you provide a legend?)

-- 
Duncan Findlay



Re: svn commit: r328517 - /spamassassin/trunk/UPGRADE

2005-10-25 Thread Duncan Findlay
On Wed, Oct 26, 2005 at 01:24:30AM -, [EMAIL PROTECTED] wrote:
> Author: duncf
> Date: Tue Oct 25 18:24:28 2005
> New Revision: 328517
> 
> URL: http://svn.apache.org/viewcvs?rev=328517&view=rev
> Log:
> Fix typo (Debian Bug 335799) in UPGRADE

For those following commits way too closely, that bug number is in
fact incorrect. It should be 335794. The commit to the 3.1 branch is
correct.

Is there a way to change log entries? Is it desirable?
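On the first question: Subversion stores the log message as the svn:log revision property. It can be edited with "svn propset svn:log --revprop -r REV ...", but only if the repository's pre-revprop-change hook allows it (by default such changes are refused). The Python wrapper below only builds and optionally runs that command. The repository URL is a placeholder assumption; the corrected message is taken from this thread.

```python
from __future__ import annotations

import shlex
import subprocess


def log_fix_command(revision: int, message: str, repo_url: str) -> list[str]:
    """Build the svn command that replaces the log message of one revision."""
    return ["svn", "propset", "svn:log", "--revprop",
            "-r", str(revision), message, repo_url]


def fix_log(revision: int, message: str, repo_url: str,
            dry_run: bool = True) -> None:
    cmd = log_fix_command(revision, message, repo_url)
    if dry_run:
        print(shlex.join(cmd))  # show what would run, without touching svn
    else:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder URL (assumption): any URL inside the repository works,
    # since revision properties belong to the revision, not to a path.
    fix_log(328517, "Fix typo (Debian Bug 335794) in UPGRADE",
            "https://svn.example.org/repos/spamassassin/trunk")
```

As for whether it is desirable: revision properties are unversioned, so once the message is replaced the old one is gone unless the hook script archives it.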

-- 
Duncan Findlay



Re: Suggestion: new list for corpus run announcements and discussion

2005-10-17 Thread Duncan Findlay
On Sun, Oct 16, 2005 at 12:19:35PM -0700, Justin Mason wrote:
> ok, I can go for a new bugs@ list, too.  +1

-0

I kinda doubt there's enough on dev@ to keep it going if we don't have
bugs sent there. If we want to lower the traffic so that certain
people can read what they need to read, then we should just create a
dev-announce list or something like that for the important stuff.

By creating a bugs@ list, we'll be splitting discussion of development
(i.e. bug fixing) over two lists, which probably doesn't make much
sense.

-- 
Duncan Findlay



Re: Suggestion: new list for corpus run announcements and discussion

2005-10-13 Thread Duncan Findlay
On Thu, Oct 13, 2005 at 12:14:27PM -0400, Theo Van Dinter wrote:
> I suggest to people who volunteer to do nightly/weekly runs that they
> subscribe to dev to keep up with any issues regarding changes that
> affect them.  I've heard complaints that there's too much mail on dev
> to keep up, or changes may be missed because they're only in a commit
> mail which most of those folks ignore.
> 
> So I'd like to make a new list (not sure about name) where we can have
> discussions/send announcements/etc that are directly relevent to those
> folks.  Thoughts/votes? (do we need to vote for a new list?)

[EMAIL PROTECTED] ?

+1

-- 
Duncan Findlay



Re: 3.1.0rc3, or general release?

2005-09-13 Thread Duncan Findlay
On Mon, Sep 12, 2005 at 06:25:26PM -0700, Justin Mason wrote:
> I think 4570 is the only relatively big one there -- and it was a
> one-liner.  Given that, I think we could do the general release of 3.1.0,
> instead of another RC.
> 
> Personally, I'd like to get 3.1.0 out.
> 
> Thoughts?

I'd be more in favour of 3.1.0 rc3, to get some last minute testing in
before we release. (It'd be nice for 3.1.0 to be rock solid, so we
don't need to release 3.1.1 right away...)

We've dragged the process on long enough that another couple of days won't
hurt.  :-)

-- 
Duncan Findlay



Re: Debian Packages for 3.1.0-rc2

2005-08-31 Thread Duncan Findlay
On Wed, Aug 31, 2005 at 02:23:14PM -0500, Chris Thielen wrote:
> Installed cleanly for me on my Sarge box.  I merged my local.cf into the 
> updated package one.
> 
> Aug 31 14:04:27 ns1 spamd[4269]: Can't locate Sys/Hostname/Long.pm in @INC 
> (@INC contains: ../lib /usr/share/perl5 /etc/perl /usr/local/lib/perl/5.8.4 
> /usr/local/share/perl/5.8.4 /usr/lib/perl5 /usr/lib/perl/5.8 
> /usr/share/perl/5.8 /usr/local/lib/site_perl) at 
> /usr/share/perl5/Mail/SPF/Query.pm line 328,  line 64. 
> I'm getting these in mail.log, but that's because libmail-spf-query-perl 
> only suggests libsys-hostname-long-perl (which I don't have installed).  
> I believe this is a warning at worst since the package is only 
> suggested, and I assume it is out of your control.

That should be reported to the BTS as a libmail-spf-query-perl bug, I
believe.

> I am seeing some other anomalous messages in the log, but I believe they 
> are not packaging related.

Hmm... feel free to report those to SA Bugzilla. :-)

-- 
Duncan Findlay



Re: Spamassassin in Fedora service restart problem

2005-08-31 Thread Duncan Findlay
On Wed, Aug 31, 2005 at 08:07:31AM -1000, Warren Togami wrote:
> This is not a spamassassin bug per-se, but rather a problem in the way 
> we had packaged it for years.
> 
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=141323
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=161785
> It would be helpful if upstream could look at the excellent explanation 
> in Bug #161785.  I am hoping there is a better solution to this problem 
> than the suggested shell script .pid solution suggested in that bug.
> 
> Wouldn't it be preferable to have a solution where spamd itself returns 
> when it is finished stopping rather than relying on an asynchronous kill 
> signal?

I think the solution is for spamd's --pidfile option to write the PID to
the file before dropping root privileges; this is fixed in 3.1.0.
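The ordering matters because once the daemon has given up root it may no longer be able to write into a root-owned directory such as /var/run, leaving the init script without a usable PID. spamd itself is Perl; the Python sketch below only illustrates the generic Unix pattern, and the pidfile path, uid and gid a caller would pass in are hypothetical.

```python
import os


def write_pidfile(path: str) -> None:
    """Record this process's PID; meant to be called while still privileged."""
    with open(path, "w") as fh:
        fh.write(f"{os.getpid()}\n")


def drop_privileges(uid: int, gid: int) -> None:
    """Switch to an unprivileged user (no-op unless running as root).
    Supplementary groups and the gid must be changed before the uid."""
    if os.geteuid() != 0:
        return
    os.setgroups([])
    os.setgid(gid)
    os.setuid(uid)


def start(pidfile: str, uid: int, gid: int) -> None:
    write_pidfile(pidfile)     # 1. write the PID while still root...
    drop_privileges(uid, gid)  # 2. ...and only then give root up
    # ... accept connections and serve requests here ...
```

With a reliable pidfile, a stop script can read the PID, send SIGTERM, and poll until that process has exited, rather than returning as soon as the signal is sent.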

-- 
Duncan Findlay



Debian Packages for 3.1.0-rc2

2005-08-30 Thread Duncan Findlay
On Mon, Aug 29, 2005 at 11:41:39PM -0400, Duncan Findlay wrote:
> *** THIS IS A RELEASE CANDIDATE ONLY, NOT THE FINAL 3.1.0 RELEASE ***
> 
> SpamAssassin 3.1.0-rc2 is released!  SpamAssassin 3.1.0 is a major
> update.  SpamAssassin is a mail filter which uses advanced statistical
> and heuristic tests to identify spam (also known as unsolicited bulk
> email).

Debian packages for 3.1.0-rc2 are available from the experimental
distribution (version 3.0.99pre3.1.0+rc2-1). I'd appreciate help
testing them, so that all the bugs in the packaging can be worked out
by the time 3.1.0 is uploaded to unstable.

Packages.debian.org hasn't been updated yet, but if you have the
appropriate line for experimental in your /etc/apt/sources.list, you can
try:

# apt-get install spamassassin/experimental spamc/experimental

Or you can download the files from your favourite (up-to-date) mirror:

ftp://ftp.us.debian.org/debian/pool/main/s/spamassassin/spamassassin_3.0.99pre3.1.0+rc2-1_all.deb
ftp://ftp.us.debian.org/debian/pool/main/s/spamassassin/spamc_3.0.99pre3.1.0+rc2-1_i386.deb


All bugs in the Debian packaging should be reported to the Debian BTS,
not the SpamAssassin bugzilla.

Thanks,
-- 
Duncan Findlay



Re: body rule speed

2005-08-30 Thread Duncan Findlay
On Tue, Aug 30, 2005 at 06:29:20AM -0700, Dale Luck wrote:

> In my study of where SA is spending most of its time, it became
> quickly apparent the do_body_tests is by far the largest cpu hog.
> Indeed i've seen just a single file (sare_fraud) can use up half of
> the cpu cycles for every spam scan.
>  
> I was wondering if anyone investigating flipping inside out the
> algorithm used to apply the rules to the body.

I believe I tried to look at this one time, but it got pretty messy to
hack that in and I didn't have enough time to spend on it. Any speedup
seemed to be minimal, but it might be worth looking into in greater
detail. Also, I'm not convinced Perl's study() helps a whole lot. Having
said that, some of our regular expressions could probably be tuned
better so that study() helps more.

The case-insensitive thing can be a very large speedup; however, we do
have many tests that rely on capitalization. We'd need a way of
splitting them up or something, since we definitely need some
case-sensitive rules.
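One way to get the case-insensitive speedup while keeping the rules that depend on capitalization is to split the rule set into two groups. Lowercase the body once per message, run the case-insensitive group against the lowercased copy without a per-rule ignore-case flag, and run the case-sensitive group against the original text. The sketch below is in Python rather than SpamAssassin's Perl, and the rule names and patterns are made up. It assumes case-insensitive patterns are already written in lowercase, because blindly lowercasing a regex is unsafe (it would turn \S into \s, for example). Whether this actually saves time depends on the regex engine and would need measuring.

```python
import re

# Hypothetical rules: (name, pattern, case_sensitive).  Patterns of
# case-insensitive rules are assumed to be written in lowercase already.
RULES = [
    ("T_FREE_MONEY", r"free\s+money", False),
    ("T_SHOUTED_ACT_NOW", r"\bACT NOW\b", True),
]


def compile_rules(rules):
    """Split rules into (case-insensitive, case-sensitive) groups."""
    insensitive, sensitive = [], []
    for name, pattern, case_sensitive in rules:
        group = sensitive if case_sensitive else insensitive
        group.append((name, re.compile(pattern)))
    return insensitive, sensitive


def run_body_tests(body, insensitive, sensitive):
    lowered = body.lower()  # lowercased once per message, not once per rule
    hits = [name for name, rx in insensitive if rx.search(lowered)]
    hits += [name for name, rx in sensitive if rx.search(body)]
    return hits


INSENSITIVE, SENSITIVE = compile_rules(RULES)
```

For instance, "act now for free money" hits only T_FREE_MONEY, since the shouted-caps rule still sees the original capitalization.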

-- 
Duncan Findlay



ANNOUNCE: SpamAssassin 3.1.0-rc2 release candidate available!

2005-08-29 Thread Duncan Findlay
[...] plugins: DomainKeys (off by default), MIMEHeader: a new plugin to perform
  tests against headers in the internal MIME structure, ReplaceTags: plugin by Felix
  Bauer to support fuzzy text matching, WhiteListSubject: plugin added to
  support user whitelists by Subject header.

- TextCat language guesser moved to a plugin.  (This means "ok_languages"
  is no longer part of the core engine by default.)

- Razor: disable Razor2 support by default per our policy, since the
  service is not free for non-personal use.  It's trivial to reenable.

- DCC: disable DCC for similar reasons, due to new license terms.

- Net::DNS bug: high load caused answer packets to be mixed up and delivered as
  answers to the wrong request, causing false positives.  Worked around.

- DNSBL lookups and other DNS operations are now more efficient, by using a
  custom single-socket event-based model instead of Net::DNS.

- add support for accreditation services, including Habeas v2.

- better URI parsing -- many evasion tricks now caught.

- URIBL lookups are prioritized based on the location in the message
  the URI was found.

- mass-check now supports reusing realtime DNSBL hit results, and sample-based
  Bayes autolearning emulation, to reduce complexity.

- sa-learn, spamassassin and mass-check now have optional progress bars.

- modify header ordering for DomainKeys compatibility, by placing markup
  headers at the top of the message instead of at the bottom of the list.

- spamd/spamc now support remote Bayes training, and reporting spam.

- spamc now supports reading its flags from a configuration file using the -F
  switch, contributed by John Madden.

- added SPF-based whitelisting.

- Polish rules contributed by Radoslaw Stachowiak.

- many rule changes and additions.


-- 
Duncan Findlay



[vote] Release 3.1.0-rc2

2005-08-27 Thread Duncan Findlay
I hereby propose that we release Apache SpamAssassin 3.1.0-rc2

http://people.apache.org/~duncf/devel/

md5sum of archive files:
1e2ecf555d62deae136b08fb482e8f68  Mail-SpamAssassin-3.1.0-rc2.tar.bz2
41fe5c0c5ab226e0d33de20c10f69240  Mail-SpamAssassin-3.1.0-rc2.tar.gz
91bc48f87eb520040ece42dced886243  Mail-SpamAssassin-3.1.0-rc2.zip
sha1sum of archive files:
a68a040c2b2c51d7284fbd15336e639a32a0d45d  Mail-SpamAssassin-3.1.0-rc2.tar.bz2
a20f3d82743186af085fac1deb540c22ebdc8ce1  Mail-SpamAssassin-3.1.0-rc2.tar.gz
f76cc96981c6766d48edd6ed60c621036a9dfcf5  Mail-SpamAssassin-3.1.0-rc2.zip
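For anyone verifying the candidate archives, the listed sums can be checked with Python's hashlib as sketched below; the path is whichever archive you downloaded. MD5/SHA-1 sums only catch corrupted downloads. They do not authenticate the release, which is what the (not yet available) GPG signatures are for.

```python
from __future__ import annotations

import hashlib


def file_digests(path: str, chunk_size: int = 1 << 16) -> tuple[str, str]:
    """Return (md5, sha1) hex digests of a file, reading it in chunks."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def matches(path: str, expected_md5: str, expected_sha1: str) -> bool:
    md5, sha1 = file_digests(path)
    return md5 == expected_md5.lower() and sha1 == expected_sha1.lower()
```

With the .tar.gz sums above, matches("Mail-SpamAssassin-3.1.0-rc2.tar.gz", "41fe5c0c5ab226e0d33de20c10f69240", "a20f3d82743186af085fac1deb540c22ebdc8ce1") should return True for an intact download.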

I vote +1.

For those following the commits mailing lists, yes, this is my first
build. ;-)

The GPG signatures are not available at the above URL since I do not
yet have access to the GPG signing key.

-- 
Duncan Findlay



RC2?

2005-08-27 Thread Duncan Findlay
Remaining bugs in 3.1.0 queue:

4494  nor  ASSI  [review] sa-learn uses local_tests_only=0 which can mess ...

Patch ready, needs one more vote.

4552  maj  NEW   [review] Unitialized value warnings in spamd

Patch ready, needs two more votes.

4558  nor  NEW   oscommerce ships with an open redirector

Punt to 3.1.1? Certainly not a release blocker, but it looks trivial;
someone just needs to put together a patch.

I'm happy to get RC2 ready, though I need access to the private
signing key. (Anyone want to send that to me via GPG encrypted mail?)
But first, we need to fix at least 4494 and 4552. Please, committers,
take a second to review those bugs.

Thanks,
-- 
Duncan Findlay



Re: Wanted: Better Documentation

2005-08-26 Thread Duncan Findlay
On Wed, Aug 24, 2005 at 03:54:19PM -0700, Daniel Quinlan wrote:
> I doubt this will work any better than just working on it incrementally
> in SVN.

Well, I think anything that lowers the bar for submissions is going to
help somewhat. It's much more of a hassle to submit a bug report for a
minor typo as compared to going and fixing it yourself.

Now all we need to do is to throw a link into the documentation so we
can catch users as they're first reading it. Then the fixes will start
pouring in. (I'm hoping)

I think it might be a little *more* work on me (or whoever else wants
to copy stuff over), but also will yield *more* fixes, don't you
think?

> If anyone here would like to help us work on documentation, submit some
> patches via bugzilla and if we like them, the PMC will consider giving
> you commit access.

We could do this with Wiki contributions too, just as easily, and
without the Bugzilla overhead.

So to summarise: to contribute documentation, either a) post to the
Wiki, or b) file a bug (preferably with a patch). If we like what you
do, we'll consider giving you commit access.

-- 
Duncan Findlay



Wanted: Better Documentation

2005-08-21 Thread Duncan Findlay
Certain parts of SpamAssassin's documentation are horribly out of date
and could really use some help. (For example, spamd/README still
recommends people set --max-children to 20, and has numerous "FIXME"
sections.)

Since it's often a bit arduous to file a bug and submit a patch to
make corrections, I've put all our documentation up on the Wiki, so
it's now super easy to edit. Please go through it if you get a
chance. Periodically, we'll go through it and apply the changes to
SVN.

It's my hope that this will lead to more accurate and useful
documentation.

So, here's the link:
http://wiki.apache.org/spamassassin/BetterDocumentation


Happy editing! We appreciate your help.

-- 
Duncan Findlay



Re: proposed branch policy change

2005-08-15 Thread Duncan Findlay
On Mon, Aug 15, 2005 at 10:37:01AM -0700, Justin Mason wrote:
> Daniel Quinlan writes:
> > I propose that new branches default to CTR mode and only enter RTC if
> > explicitly made so.  All existing branches are RTC mode, of course.

+1

I agree that the act of branching and the decision to go to RTC should
not necessarily be made at the same time.

-- 
Duncan Findlay



Re: VOTE: the rules project

2005-08-13 Thread Duncan Findlay
On Sat, Aug 13, 2005 at 03:23:58PM -0700, Justin Mason wrote:
> Based on email from the last few weeks, I think we're all pretty happy
> with the sandboxes idea as described on
> http://wiki.apache.org/spamassassin/RulesProjSandboxes .  It's also the
> first step on the way to all the other listed ideas. Given that, I've come
> up with a task list to get us there.   So can we get votes, both for the
> plan, and for the tasks?  Here they are:

I'm not sure the tasks really need to be voted on, I mean, would it
make sense if we approved all but number 3? :-)

> - PMC: vote to approve the sandboxes project
> - reorganise the rules directory into core/ , sandbox/, and extra/; link
>   that rules project SVN repository to 3.2.0's 'rules' dir; use SVN
>   externals to do this.
> - write scripts to test, filter, and pull rules from sandboxes
>   automatically into core/ production ruleset
> - move current ruleset into a new "legacy" sandbox
> - start using the above scripts to generate core/ ruleset in svn

+5

i.e. +1 for each of the above.

-- 
Duncan Findlay



Re: svn commit: r232534 - /spamassassin/branches/3.1/lib/Mail/SpamAssassin/BayesStore/MySQL.pm

2005-08-13 Thread Duncan Findlay
On Sat, Aug 13, 2005 at 10:02:23PM -, [EMAIL PROTECTED] wrote:
> Author: parker
> Date: Sat Aug 13 15:02:22 2005
> New Revision: 232534
> 
> URL: http://svn.apache.org/viewcvs?rev=232534&view=rev
> Log:
> Fix typo

Yeah, sorry, I meant to fix both trunk and 3.1, but got distracted
half way through. :-)

Duncan



Re: [Spamassassin Wiki] Update of "DnsblAccuracy082005" by JustinMason

2005-08-12 Thread Duncan Findlay
On Fri, Aug 12, 2005 at 10:52:55PM -, Apache Wiki wrote:
> The following page has been changed by JustinMason:
> http://wiki.apache.org/spamassassin/DnsblAccuracy082005
> 
> The comment on the change is:
> everybody likes DNSBL stats ;)
> 
> New page:
> = DNS Blocklist Accuracy Figures (as of July 2005) =

[...]

> * hits are recorded from 'live' data at the time the messages were
> received, not post-facto testing (using 'mass-check --reuse')

I don't think *everyone* used --reuse. :-( So these stats are probably
not nearly as valid as we'd like to think. (I, for example, did not
use --reuse.)

-- 
Duncan Findlay



Re: change R-T-C rules

2005-08-10 Thread Duncan Findlay
On Wed, Aug 10, 2005 at 08:13:16PM -0700, Justin Mason wrote:
> I propose we change our procedures to lower the number of +1's required
> for code changes during R-T-C, from 3 to 2.  Please vote.

I'm not convinced that we've been blocked by the review process. Bug
4505 has taken forever to resolve, but that's primarily due to a lack
of patches to review, or perhaps because it hasn't been made clear
when a review is needed.

Either way, I don't think the above proposal would have helped us. I'm
-1 (assuming we need a majority; this is not a veto), until I can be
convinced that this would actually speed up the release process by a
meaningful amount. :-)

-- 
Duncan Findlay



Re: PROPOSAL: create "SpamAssassin Rules Project"

2005-07-26 Thread Duncan Findlay
On Tue, Jul 26, 2005 at 02:33:12PM -0400, Chris Santerre wrote:
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > Herb Martin writes:
> > > Normally in an open source project anyone who wishes to
> > > listen, lurk, and read or even use the bleeding edge code
> > > is free to do so to learn and get into the frame of the
> > > project.
> > > 
> > > That cannot be true (to the same extent) if there are 
> > > security layers that make such gradual involvement 
> > > difficult.
> > 
> > Yep, this is entirely true -- and this is the reason why the 
> > ASF suggests
> > that lists should be open if at all possible.
> > 
> > It's a tricky conundrum -- need to think about this some more...
> 
> I don't see official rules majorly discussed in the open now. With a new
> release of SA, you don't go into detail about what new rules are looking
> for, so why should that change. 

If they aren't discussed in the open right now, they aren't being
discussed. :-) The development process is perfectly open right now,
yet it's not a problem; i.e. we don't have any evidence that spammers
are exploiting this.

> People who update from SARE, just hear: "Hey .cf got updated." And they
> go and get it. Or they don't even know it gets updated and the RDJ script
> does it. So public is pretty good at just accepting the rule updates. 

Yes, but it's difficult for people to join SARE, or learn what goes
into rule development. If all the development takes place in private,
then there's no way for newcomers to join and this is a really bad
thing.

> Having an open public discussion on new rule ideas, pretty much defeats the
> purpose.

I'd like to see the data that supports this claim. I'm really
skeptical.

-- 
Duncan Findlay



Hackathon summary

2005-07-23 Thread Duncan Findlay
Just thought I'd post a quick note about the hackathon that took place
today at Stanford University. "We" below refers to Justin Mason,
Daniel Quinlan, Michael Parker and me. Matt Sergeant was also present
for a while, so "we" includes him for some of the following items. :-)

Discussion

 * We discussed the new rules project at length and came up with some
ideas, which we're tracking at
http://wiki.apache.org/spamassassin/RulesProjectPlan (please give us
your feedback).

 * We discussed the 3.2 release goals
(http://wiki.apache.org/spamassassin/ReleaseGoals)

 * Dr. Andrew Ng gave us a brief presentation of how Logistic
Regression may be an algorithm we could use in the future to replace
the perceptron.

Development

 * We came up with a plan to restructure PerMsgStatus.pm so it's not
so unwieldy and out of
control. (http://bugzilla.spamassassin.org/show_bug.cgi?id=4497)

 * We branched the tree so we could start committing stuff to
HEAD. Strangely, however, we got almost no coding done.

QA/Bugs

 * We went through all the bugs targeted for 3.1.0 and triaged
them. (all the bugzilla comments from me today were really from all of
us that were present)

 * We added a "moreinfo" keyword to bugzilla for bugs that are in need
of more info. One side effect of this is that we'll need to remove
that keyword when more info is actually given. :-)


That's about all.

-- 
Duncan Findlay



Branch 3.1?

2005-07-23 Thread Duncan Findlay
I think it's time to branch 3.1 (also so Justin, Dan, Michael and I
can get some work done while we're here). Votes?

+1

-- 
Duncan Findlay



Re: PROPOSAL: create "SpamAssassin Rules Project"

2005-07-21 Thread Duncan Findlay
On Wed, Jul 20, 2005 at 11:53:25PM -0700, Robert Menschel wrote:
> Hello Duncan,
> 
> Wednesday, July 20, 2005, 9:07:15 PM, you wrote:
> 
> >> The SARE list is private and invitation only for exactly these reasons.
> 
> DF> I'm *really worried* about proposals that involve mailing lists that
> DF> have only private archives and require moderator approval for
> DF> subscription. It just doesn't feel right for an open source project.
> 
> Agreed.  But you do secure the security-bug submissions from
> publicly accessible lists and archives...

Leaking rules to the public doesn't compromise users' systems! Obviously,
there is a tradeoff.

> DF> It's quite possible that this drives people away. In fact I'm quite
> DF> sure people are less likely to get involved if they have to somehow
> DF> prove that they aren't a spammer in order to subscribe.
> 
> Yes, but you also don't want spammers wrecking the system, making it
> useless.  There's a viable balance somewhere...

Agreed.

-- 
Duncan Findlay



Re: PROPOSAL: create "SpamAssassin Rules Project"

2005-07-21 Thread Duncan Findlay
On Wed, Jul 20, 2005 at 11:35:20PM -0700, Loren Wilton wrote:
> > I'm *really worried* about proposals that involve mailing lists that
> > have only private archives and require moderator approval for
> > subscription. It just doesn't feel right for an open source project.
> 
> I understand the feeling.  I'm trying to balance the obvious desire for a
> completely public process with the absolutely known fact that publishing a
> rule in the user's group will literally within hours lead to the rule
> becoming useless in many cases.

I guess you'd have better data than I would; but I'm still having
trouble believing that spammers are adjusting on that time frame.

> (I've even a couple of times as a test given the bodies for slightly bogus
> rules out - that detected a not particularly useful spam sign - to see if
> the spam sign disappeared, and how quickly.  Indeed, the signs would usually
> disappear.  One could probably conclude something about the spam gang using
> a particular sign from how quickly after publication of a rule the sign
> disappears; but I'm not particularly interested in that form of research.)
> 
> This led to my twofold suggestion that a) entry to the group be moderated,
> and b) the archives be embargoed for a week or two, or perhaps a month.

But how do we know who should be allowed access to the group?

I definitely prefer delayed archives to closed ones.
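A delayed (embargoed) archive amounts to a simple visibility filter: a message becomes public only once it is older than the embargo period. The Python sketch below assumes each archived message carries a timezone-aware date; the 14-day default is just one point in the "a week or two, or perhaps a month" range suggested above, not a decided value.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def publicly_visible(messages: list[dict], now: datetime | None = None,
                     embargo: timedelta = timedelta(days=14)) -> list[dict]:
    """Return the messages old enough to appear in the public archive.
    Each message is a dict with a timezone-aware "date" entry (assumption)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - embargo
    return [m for m in messages if m["date"] <= cutoff]
```

Subscribers would still see everything immediately; only the public archive lags, which keeps the process open to newcomers while delaying what spammers can read.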

> For instance, on many projects to be a developer you have to be
> admitted to developer access to the source.  Others can look at the
> source and make their own versions, but can't necessarily modify the
> actual project source unless the local gods approve of them.  (See
> for instance the description of the Audacity project over at SF,
> which I was looking at earlier today.)

I'm really not sure what you mean here. Audacity is licensed under the
GPL. The main difference between the GPL and the Apache license (IIRC,
IANAL, etc) is that with the GPL, if you do make changes and
distribute a changed version, you need to distribute the source of the
changed version.

I'm sure they have the same procedures with respect to modifying the
official project source as we do, namely there is a group of
committers that have access to do this, everyone else gets to submit
patches to them. (And I'm not sure what you mean by "local gods". Most
developers are human, at least the ones I've met in person... :-P )

-- 
Duncan Findlay



Re: NOTICE: rescore mass-checks

2005-07-20 Thread Duncan Findlay
On Wed, Jul 20, 2005 at 01:22:47PM -0700, Justin Mason wrote:
> hmm -- do they have to have a ham file?  I don't think there's a need for
> that to be a rule.

I think there is. Without spam, I'd be pretty leery of the quality of
the corpus, and specifically of the quality of the BAYES results.

-- 
Duncan Findlay



Re: PROPOSAL: create "SpamAssassin Rules Project"

2005-07-20 Thread Duncan Findlay
On Wed, Jul 20, 2005 at 11:37:20AM -0400, Chris Santerre wrote:
> Perhaps some thing like the dev "bug squish events" could be used? Once a
> week the people who run SARE rule sets check to see the biggest hitters, and
> on that day we test those heavy hitters against a bigger corpus, and look to
> add to SA. Successful ones get moved out of SARE and into SA.  

Interesting idea. I think I'd like to see more of the development take
place under the Apache umbrella, so that the failures and the
successes are available to all rather than just the SARE people. Also,
if people get bogged down and don't get a chance to submit rules, etc,
it's easier for other people to take over.

With Dan's sandbox idea, that can certainly happen: the development
would take place separately from SA, but still in Apache so that rules
can easily be brought in to the main distribution, at some interval,
say, once a week? :-)

-- 
Duncan Findlay



Re: PROPOSAL: create "SpamAssassin Rules Project"

2005-07-20 Thread Duncan Findlay
On Wed, Jul 20, 2005 at 08:44:26PM -0700, Robert Menschel wrote:
> Indeed, it's not uncommon for a rule or ruleset to be checked 2-3
> times with knowingly excessive regexes, so we can see what actually is
> or isn't being matched in various regex hits.  We use this information
> to improve the rule, and then remove the excess to the regex for a
> final pre-publication run.

When more of the committers were actually writing rules, we'd do the
same thing, we'd commit some giant number of rules (up to 20, for
example) and wait till the next day when the results came back. Sure,
it's a lot nicer to be closer to real time!

-- 
Duncan Findlay


