Resigning from SpamAssassin project
Hi everybody, I greatly enjoyed my time working on SpamAssassin, but sadly I haven’t had time to work on SpamAssassin in a long time. As a result, I’d like to officially resign from the project, both as a committer and as a member of the SpamAssassin PMC. Thanks Duncan Findlay
Re: syncing SpamAssassin with Debian downstream
On Jan 26, 2010, at 3:00 PM, Adam Katz wrote: > If only I had noticed this before the 3.3.0 release... > > There are some patches to the Debian package for 3.2.5 which are > applicable to the trunk. I'm comfortable incorporating some of them > myself, but wanted to double-check on this one: > > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=491159 > > The update: > http://svn.apache.org/viewvc?view=revision&revision=903473 All the patches to the source used in the Debian packaging can be found here: http://svn.debian.org/viewsvn/collab-maint/deb-maint/spamassassin/trunk/debian/patches/ Noah, is there something we can do to make it easier to submit these changes upstream? > I've also added Duncan Findlay's Debian-specific rules (which don't > actually look so Debian-specific to me) to my sandbox to see how > useful they might be to a larger audience. They're marked "nopublish" > until we either get a thumbs-up from Duncan or enough time passes that > we can assume full license compatibility. Yikes, that file is ancient; it probably dates back to when we used nice rules with impunity. I'm a little surprised that nobody ever exploited those rules (or nobody noticed somebody exploiting the rules enough to file a bug report). As far as license issues: I allow inclusion of all or any part of 65_debian.cf to be licensed under the Apache License in accordance with the CLA I have on file. > To ensure there aren't licensing issues, I've copied Duncan and Noah > Meyerhans (the current Debian packager for SpamAssassin) on this email > as a way of keeping everybody on the same page. Thanks Duncan
Re: Hudson build became unstable: SpamAssassin-trunk #2776
On Mar 3, 2009, at 1:56 AM, Justin Mason wrote:
> On Tue, Mar 3, 2009 at 00:05, Duncan Findlay wrote:
> > According to http://hudson.zones.apache.org/hudson/job/SpamAssassin-trunk/2776 my commit broke the build because we run make test on the distributed files, and mass-check isn't distributed. What's the best way to work around this? Presumably we should skip reuse.t if masses/mass-check doesn't exist, but that could indicate an error we want to fail on. Should it be made a config option?
> No, I think if (-e "masses/mass-check") fails, or maybe just if (-d "masses") fails, skip the test. That will be very obvious, and it's unlikely we could accidentally svn delete the entire masses directory without someone noticing. ;)

Makes sense to me. I've fixed this in r749680. Sorry for all the build failure spam!

Duncan
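The skip condition discussed above can be sketched as follows. This is only an illustration in Python of the directory check; the real test is Perl, the actual change is the one in r749680, and the function name `should_skip_reuse_test` is hypothetical.

```python
import os
import tempfile


def should_skip_reuse_test(source_root: str) -> bool:
    """Return True when masses/ is absent, as in a distribution tarball.

    Mirrors the Perl `-d "masses"` check suggested in the thread; the
    function name is made up for this sketch.
    """
    return not os.path.isdir(os.path.join(source_root, "masses"))


# Usage: an empty temporary directory stands in for an unpacked tarball.
with tempfile.TemporaryDirectory() as root:
    print(should_skip_reuse_test(root))   # no masses/ directory yet
    os.mkdir(os.path.join(root, "masses"))
    print(should_skip_reuse_test(root))   # masses/ now exists
```

Checking for the directory rather than the single script keeps the skip obvious: it only triggers when the whole masses tree is missing.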
Re: Hudson build became unstable: SpamAssassin-trunk #2776
On Mar 2, 2009, at 3:48 PM, Apache Hudson Server wrote:
> See http://hudson.zones.apache.org/hudson/job/SpamAssassin-trunk/2776/changes

According to http://hudson.zones.apache.org/hudson/job/SpamAssassin-trunk/2776 my commit broke the build because we run make test on the distributed files, and mass-check isn't distributed. What's the best way to work around this? Presumably we should skip reuse.t if masses/mass-check doesn't exist, but that could indicate an error we want to fail on. Should it be made a config option?

Also, I'm confused by this output: http://hudson.zones.apache.org/hudson/job/SpamAssassin-trunk/2776/console It suggests that a couple of other tests failed as well, but these aren't listed in the summary?

Failed Test              Stat  Wstat  Total  Fail  Failed   List of Failed
---
t/config_tree_recurse.t    17   4352      4     8  200.00%  1-4
t/whitelist_addrs.t       255  65280     35    44  125.71%  14-35
36 tests skipped.
Failed 2/152 test scripts, 98.68% okay. 26/3128 subtests failed, 99.17% okay.
*** Error code 11

Any thoughts?

Thanks
Duncan
Re: svn commit: r601070 - in /spamassassin/trunk/spamc: libspamc.c libspamc.h
On Dec 5, 2007, at 1:22 AM, Justin Mason wrote:
> All of these constants are exposed for public use; by changing their values, ABI compatibility is broken. I suggest changing them back to what they were before, and simply using (1<<14) for SPAMC_LOG_TO_CALLBACK. That way ABI compatibility is maintained, and callers don't need to recompile their code to use a new libspamc.

Good point. I was only thinking in terms of source compatibility, but you're right that's a gratuitous and unnecessary change to the ABI.

Duncan
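The ABI point can be illustrated with a small sketch. Only SPAMC_LOG_TO_CALLBACK and the (1<<14) value come from the thread; the other flag names and bit positions below are hypothetical placeholders, not values copied from libspamc.h.

```python
# Hypothetical existing flags; NOT the real libspamc.h values.
OLD_FLAGS = {"FLAG_A": 1 << 0, "FLAG_B": 1 << 1, "FLAG_C": 1 << 2}


def add_flag(existing: dict, name: str, bit: int) -> dict:
    """Add a new flag on an unused bit without touching existing values."""
    value = 1 << bit
    if any(value & v for v in existing.values()):
        raise ValueError(f"bit {bit} is already in use")
    updated = dict(existing)
    updated[name] = value
    return updated


def abi_compatible(old: dict, new: dict) -> bool:
    """Binaries compiled against `old` keep working if no value changed."""
    return all(new.get(name) == value for name, value in old.items())


# Compatible: the new flag takes a free bit, as suggested in the thread.
new_flags = add_flag(OLD_FLAGS, "SPAMC_LOG_TO_CALLBACK", 14)
print(abi_compatible(OLD_FLAGS, new_flags))

# Incompatible: renumbering shifts values that callers already compiled in.
renumbered = {"FLAG_A": 1 << 1, "FLAG_B": 1 << 2, "FLAG_C": 1 << 3}
print(abi_compatible(OLD_FLAGS, renumbered))
```

Callers bake the numeric values into their binaries at compile time, so only additions on unused bits are safe without a recompile.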
Re: add a new rule type: single-line body?
On Jul 27, 2007, at 7:08 AM, Justin Mason wrote:
> Just wondering. would it be handy to have a new "body" type, the same as "body" but matched as a single string, with all newlines converted to " "? in other words, this text: [...] ie, no newlines, all whitespace converted to " ". this would be optimal for matching with phrase rules. (To avoid exponential-runtime .* problems, it'd chop the text after the first 8000 characters or so.)

How is this different than rawbody /s rules?
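For readers unfamiliar with the proposal, the normalization Justin describes can be sketched like this. It is a minimal Python illustration, not SpamAssassin code; the function name `flatten_body` is hypothetical and the 8000-character limit is the rough figure from the message.

```python
def flatten_body(text: str, limit: int = 8000) -> str:
    """Collapse all whitespace runs (including newlines) to single spaces,
    then keep only the first `limit` characters."""
    return " ".join(text.split())[:limit]


print(flatten_body("Buy\n  now \t today\n"))
```

A phrase rule can then match "buy now today" regardless of where the original message wrapped its lines.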
Re: CEAS 2007 Live Spam Challenge (today)
On Aug 2, 2007, at 3:57 AM, Justin Mason wrote:
> Mark Martinec writes:
> > Does anyone have a SpamAssassin-based content filter registered for the today's CEAS 2007 Live Spam Challenge? http://www.ceas.cc/challenge/
> not that I know of...

I'm sure there were some SpamAssassin-derived entries, but we can't know for sure. The Challenge itself was a bit of a failure -- they had numerous technical problems and in the end they measured performance based on only a few hundred spam and non-spam.

Duncan
Re: Is re2c 0.10.x really needed?
On Mon, May 28, 2007 at 12:13:07PM +0100, Justin Mason wrote: > The re2c author has recommended 0.12.0: > http://sourceforge.net/tracker/index.php?func=detail&aid=1708378&group_id=96864&atid=616200 > Actually i highly suggest you use cvs version of either 0.12.0 > (unreleased) or even HEAD. It appears the generated code is wrong using > the older re2c versions. I discoverd this while fixing the crash. So in > 0.12.0 and HEAD both issues are resolved. Alright, I'll make spamassassin recommend re2c (>= 0.12.0) as soon as such a version of re2c is uploaded to Debian. -- Duncan Findlay
Re: Is re2c 0.10.x really needed?
On Sun, May 27, 2007 at 07:59:11PM -0400, Matt Sergeant wrote: > Bug fixes. :-) > While I was hacking on the original re2xs that Justin based sa-compile off, I > found some bugs which they fixed in the 0.10.x > series. > But they were just bugs in the code generation, so if it's working fine for > you then that should be OK. So, if sa-compile doesn't fail, the version of re2c is adequate? -- Duncan Findlay
Is re2c 0.10.x really needed?
Hey folks, The version of re2c currently in Debian is 0.9.x, and according to the sa-compile man page, 0.10.x is needed for the Rule2XSBody plugin. As far as I can tell, sa-compile is working fine with 0.9.x. Anybody know the reason behind recommending 0.10.x? -- Duncan Findlay
Re: sa-update-keys
On Thu, May 24, 2007 at 02:44:58PM -0400, Daryl C. W. O'Shea wrote: > Duncan Findlay wrote: > >I'm working on the 3.2.0 Debian package and I'm running into some > >problems with the way I handle the sa-update-keys directory. > >It's in /etc/spamassassin/sa-update-keys, which implies it contains > >configuration files. As far as I can tell, it's not something that we > >expect anyone to touch manually (we specifically provide sa-update > >--import to import keys instead of having them use gpg directly on > >these files), so it would seem to me that it should be in > >/var/lib/spamassassin/sa-update-keys or something instead. > I wouldn't put it there... you should be able to rm /var/lib/spamassassin and > have everything continue to work fine. Well... It would assuming you have no non-standard channels. > >Is there a reason I'm missing for putting it in /etc? > The keys aren't really variable (we're not going to release new ones in a > rule update), they are a part of the software > configuration and configurable (manually or with sa-update) though. I'm not sure I buy that -- they really aren't configuration. I mean you might configure which keys to allow, but the actual contents of the public keys aren't really configuration information. -- Duncan Findlay
sa-update-keys
I'm working on the 3.2.0 Debian package and I'm running into some problems with the way I handle the sa-update-keys directory. It's in /etc/spamassassin/sa-update-keys, which implies it contains configuration files. As far as I can tell, it's not something that we expect anyone to touch manually (we specifically provide sa-update --import to import keys instead of having them use gpg directly on these files), so it would seem to me that it should be in /var/lib/spamassassin/sa-update-keys or something instead. Is there a reason I'm missing for putting it in /etc? -- Duncan Findlay
Re: [VOTE][DRAFT] SpamAssassin 3.2.0
On Tue, May 01, 2007 at 03:13:13PM +0100, Justin Mason wrote: > ok, here's the proposed release announcement and tarballs. > PMC members, please vote on these tarballs -- for a full release, > we need 3 +1's from PMC members ;) +1 on the tarballs, they met basic sanity checks, make test passed successfully. -- Duncan Findlay
Re: Score Generation for Apache SpamAssassin
On Thu, Apr 26, 2007 at 12:15:52PM +0100, Justin Mason wrote: > thanks Duncan -- a great read, and looks promising! > Would it help btw if we came up with a spec for what a score-generation > tool needs to generate, in terms of score ranges and so on? > This would also be useful for the future (I'm sure there'll be > more... ;) Probably not to me, but it might be useful to others. (I think I already know what needs to be done.) Also, it might limit creativity in possible solutions. We need a score ranges mechanism, we don't need the specific one we have now. -- Duncan Findlay
Score Generation for Apache SpamAssassin
Hi everybody, As you may already know, Steven Birk and I have been working on our 4th year undergraduate project in Math and Engineering at Queen's University. The goal of our project was to examine the use of logistic regression as a potential replacement for the Perceptron/GA currently used by the SpamAssassin project. It's now done, and it's available here: http://people.apache.org/~duncf/FindlayBirkThesis.pdf Basically, we've found a technique that shows promise as a possible replacement, but requires some modifications in order to handle some of the restrictions the SpamAssassin project puts on scores. I hope to try to make those modifications in the next month or so, but I have no idea how well it will turn out, or how easy it will be. The paper may be an interesting read for people not too familiar with the way the scoring process works now, as it discusses many of the issues that differentiate the scoring process from most other machine learning problems. (Then again, it might just be boring.) Enjoy! -- Duncan Findlay
Re: Better score generation tool
On Tue, Mar 13, 2007 at 10:32:20AM +, Justin Mason wrote: > > There seem to be a lot of... issues... relating to promoting rules, > > for example, there are rules that were mass-checked under one name and > > then promoted (I guess I need to check out the exact revision of > > rulesrc before running any scoring scripts?). Or maybe I just don't > > understand how it all works. > I suspect the latter ;) > Read the documentation on the wiki: I've kept it up to date for > the 3.2.0 mass-checks, so it's canonical. > http://wiki.apache.org/spamassassin/RescoreMassCheck > Basically, you have to keep a single "rules/active.list" file for the > entire process, and ensure you don't overwrite it with an "svn update" > halfway through. (see '4.3 resync to mcsnapshot rules list') Yeah, sorry I didn't end up reading that the first time round. Guess I wanted to go off memory... Thanks for the link. Anyways, I'm still getting some weirdness: make tmp/ranges.data spewed a whole lot of errors about tests like T_FRT_CONTACT no longer existing -- it exists in the mass-check logs, but I don't see it in the rules directory. Is this just a matter of mass-checkers checking against 70_sandbox.cf though they shouldn't? -- Duncan Findlay
Re: Better score generation tool
Whoops... Turns out I used all the sandbox rules when generating my scores instead of just the active ones. Naturally, the TCR I reported was much higher than it should have been. This suggests two things: 1. We should probably loosen our promotion criteria. 2. The results I quoted in my previous e-mail are wrong. Sorry if I got your hopes up. There seem to be a lot of... issues... relating to promoting rules, for example, there are rules that were mass-checked under one name and then promoted (I guess I need to check out the exact revision of rulesrc before running any scoring scripts?). Or maybe I just don't understand how it all works. -- Duncan Findlay
Re: Better score generation tool
On Mon, Mar 12, 2007 at 01:48:10PM +, Justin Mason wrote: > that *is* good news ;) can you give a rough idea of what algorithm > it uses? It's basically a logistic regression algorithm, but optimized for binary data. It's called Truncated Regularized Iteratively Reweighted Least Squares (TR-IRLS). I'll see if I can get some spare time to at least provide valid scores that I've optimized (once I work out the min/max bits), even if I can't commit my scripts yet. -- Duncan Findlay
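To give a rough feel for logistic regression over binary rule hits, here is a minimal sketch. It deliberately uses plain batch gradient descent rather than TR-IRLS, it is not the project's code, and the toy data and the function name `train_logistic` are assumptions made for illustration.

```python
import math


def train_logistic(rows, labels, n_rules, epochs=200, lr=0.5):
    """Fit per-rule weights with batch gradient descent on the logistic loss.

    rows   -- list of sets; each set holds the indices of rules that hit
    labels -- 1 for spam, 0 for ham
    NOTE: this is ordinary gradient descent, swapped in for TR-IRLS.
    """
    weights = [0.0] * n_rules
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_rules
        grad_b = 0.0
        for hits, y in zip(rows, labels):
            z = bias + sum(weights[j] for j in hits)
            p = 1.0 / (1.0 + math.exp(-z))   # predicted spam probability
            err = p - y
            grad_b += err
            for j in hits:                   # only rules that hit contribute
                grad_w[j] += err
        bias -= lr * grad_b / n
        for j in range(n_rules):
            weights[j] -= lr * grad_w[j] / n
    return weights, bias


# Toy data: rule 0 hits only spam, rule 1 hits only ham.
rows = [{0}, {0}, {1}, {1}]
labels = [1, 1, 0, 0]
weights, bias = train_logistic(rows, labels, n_rules=2)
print(weights[0] > 0, weights[1] < 0)
```

Because the features are binary, each message only touches the weights of the rules it hit, which is the property that specialised solvers exploit for speed.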
Better score generation tool
Good news, everyone!

As part of our 4th year Math & Engineering Design Project, Steven Birk and I have been working to develop a better scoring algorithm for SpamAssassin. We've come across an algorithm that shows some great promise:

Using the 3.2.0 logs:

scoreset 0:
# SUMMARY for threshold 5.0:
# Correctly non-spam:  67528  99.97%
# Correctly spam:     100519  84.41%
# False positives:        22   0.03%
# False negatives:     18564  15.59%
# TCR(l=50): 6.055889  SpamRecall: 84.411%  SpamPrec: 99.978%

# SUMMARY for threshold 3.5:
# Correctly non-spam:  67446  99.85%
# Correctly spam:     108479  91.10%
# False positives:       104   0.15%
# False negatives:     10604   8.90%
# TCR(l=50): 7.534991  SpamRecall: 91.095%  SpamPrec: 99.904%

scoreset 1:
# SUMMARY for threshold 5.0:
# Correctly non-spam:  67498  99.92%
# Correctly spam:     112670  94.61%
# False positives:        52   0.08%
# False negatives:      6413   5.39%
# TCR(l=50): 13.212360  SpamRecall: 94.615%  SpamPrec: 99.954%

scoreset 2:
# SUMMARY for threshold 5.0:
# Correctly non-spam:  67517  99.95%
# Correctly spam:     115916  97.34%
# False positives:        33   0.05%
# False negatives:      3167   2.66%
# TCR(l=50): 24.721403  SpamRecall: 97.341%  SpamPrec: 99.972%

scoreset 3:
# SUMMARY for threshold 5.0:
# Correctly non-spam:  67518  99.95%
# Correctly spam:     117809  98.93%
# False positives:        32   0.05%
# False negatives:      1274   1.07%
# TCR(l=50): 41.434586  SpamRecall: 98.930%  SpamPrec: 99.973%

# SUMMARY for threshold 5.2:
# Correctly non-spam:  67521  99.96%
# Correctly spam:     117727  98.86%
# False positives:        29   0.04%
# False negatives:      1356   1.14%
# TCR(l=50): 42.438703  SpamRecall: 98.861%  SpamPrec: 99.975%

These are using the same training and validation sets as bug 5270. The run time is roughly of the same order of magnitude as the perceptron. (The slow bit is the analog of the logs-to-c script.)

Clearly from the set 0 results, we need to tune the algorithm some more to get the threshold of 5.0 to be optimal.
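For anyone who wants to check the summary figures, the sketch below recomputes them from the raw counts. The formulas are the usual definitions of total cost ratio, recall and precision and are an assumption about what the summary script does; the function name `summarize` is hypothetical. With the scoreset 0, threshold 5.0 counts they reproduce the reported values.

```python
def summarize(spam_ok, false_pos, false_neg, lam=50):
    """Return (TCR, spam recall, spam precision) from raw counts.

    Assumed definitions:
      TCR       = n_spam / (lam * false_pos + false_neg)
      recall    = spam_ok / n_spam
      precision = spam_ok / (spam_ok + false_pos)
    """
    n_spam = spam_ok + false_neg
    tcr = n_spam / (lam * false_pos + false_neg)
    recall = spam_ok / n_spam
    precision = spam_ok / (spam_ok + false_pos)
    return tcr, recall, precision


# Counts from the scoreset 0, threshold 5.0 summary.
tcr, recall, precision = summarize(spam_ok=100519, false_pos=22, false_neg=18564)
print(f"TCR(l=50): {tcr:.6f}  SpamRecall: {100 * recall:.3f}%  "
      f"SpamPrec: {100 * precision:.3f}%")
```

With l=50 a single false positive costs as much as fifty false negatives, which is why scoreset 3 gains so much from cutting false negatives while holding false positives near 30.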
At this point, the algorithm breaks a number of our current score generation "rules", so there is room for improvement. (We're working on it.)

- Our handling of immutable rules is pretty much broken at this point. (We assume all rules are mutable, evaluate the optimal threshold value and scale our scores appropriately, and then only update the mutable scores for evaluating against the validation set. For our purposes, we also assumed BAYES_* is mutable.) I'm not sure how hard this will be to fix, or if it's worth it.
- We have no concept of max/min scores or score ranges. Many tests get small negative scores and should simply be set to 0. We haven't yet figured out what effect this has on the TCR. Also, some scores get set really high, e.g. BAYES_99 is scored 6.1 in scoreset 3. I'm not sure people are comfortable with that. There are at least two ways we can fix this: adapting the algorithm to take into account min/max scores (hard), or simply capping the scores after they are generated (easy). A quick look through the scores and score-ranges-from-freqs output suggests that this will not hurt our performance all that much.

Our project is due in a few weeks, and with any luck we'll have a complete new score generation system for SpamAssassin.

-- Duncan Findlay
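The "easy" option, capping scores after they are generated, could look roughly like this. It is a sketch only: the ranges and the rule name SOME_RULE are made up for illustration, and only the 6.1 score for BAYES_99 comes from the message.

```python
def cap_scores(scores, ranges):
    """Clamp each generated score into its allowed (low, high) range.

    Rules without a range entry are left unchanged.
    """
    capped = {}
    for rule, score in scores.items():
        low, high = ranges.get(rule, (float("-inf"), float("inf")))
        capped[rule] = min(max(score, low), high)
    return capped


# BAYES_99's 6.1 is from the message; everything else is hypothetical.
generated = {"BAYES_99": 6.1, "SOME_RULE": -0.02}
ranges = {"BAYES_99": (0.0, 4.5), "SOME_RULE": (0.0, 3.0)}
print(cap_scores(generated, ranges))
```

Capping after the fact is simple but ignores the constraint during optimization, so the other scores are no longer tuned for the capped values; that is the trade-off against the "hard" option.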
Re: VOTE: change voting procedure for prerelease tarballs
On Tue, Mar 06, 2007 at 01:26:05PM +, Justin Mason wrote: > As noted in recent dev list traffic (see below), we have a problem: we > haven't been able to publish a 3.2.0 prerelease tarball yet in the past > few weeks, due to lack of votes across two attempts. > Currently, our release policy [1] requires 3 committer +1s to mark a > tarball as a pre-release. I propose that we change this to "lazy > consensus", instead, since ASF policy requires votes only for "package > releases" [2], which I'd interpret as a *full*, general-availability > release. > [1]: http://wiki.apache.org/spamassassin/ReleasePolicy > [2]: http://www.apache.org/foundation/voting.html > My vote: +1 > I guess this is a PMC vote, so PMC members: please vote... +1 after board clarification that this is OK. -- Duncan Findlay
Re: VOTE: change voting procedure for prerelease tarballs
On Tue, Mar 06, 2007 at 01:26:05PM +, Justin Mason wrote: > As noted in recent dev list traffic (see below), we have a problem: we > haven't been able to publish a 3.2.0 prerelease tarball yet in the past > few weeks, due to lack of votes across two attempts. > Currently, our release policy [1] requires 3 committer +1s to mark a > tarball as a pre-release. I propose that we change this to "lazy > consensus", instead, since ASF policy requires votes only for "package > releases" [2], which I'd interpret as a *full*, general-availability > release. Can we get a confirmation of this (I don't know who to ask)? IIRC, this has come up before and prerelease tarballs required a vote. -- Duncan Findlay
Re: VOTE: SpamAssassin 3.2.0 prerelease 2 tarballs
On Fri, Mar 02, 2007 at 01:22:12PM +, Justin Mason wrote: > should we just not bother with votes for prereleases? > To be honest, I can't see the harm in accidentally pushing a prerelease > tarball at the wrong time -- and this is the second 3.2.0-preX that isn't > garnering votes, so clearly the process is getting in the way here. :( > (Votes for "official" full releases, of course, would still be necessary) I think by ASF policy, we need a vote. That said, right now a +1 vote means: a) I think we should have a pre-release now. b) The tarballs presented are well constructed, work well, etc and I've tested them. I think if we agree that a +1 vote for pre-release only implies a) then we won't have the issue of not getting votes. I haven't had time to test the tarballs, so I haven't voted, but I'm +1 on the idea of a pre-release. (i.e. +1 to part a) above). -- Duncan Findlay
Re: VOTE: SpamAssassin 3.2.0 prerelease 2 tarballs
On Wed, Feb 28, 2007 at 07:46:55PM +0100, Matthias Leisi wrote: > I installed it and all is fine except that spamd does not like the > - --daemonize [1] option. Spamd starts, runs the test/lint message, and > then dies without warning before I can even feed a message using spamc. > No problems if I omit --daemonize (even with all custom things loaded). Silly question, but are you sure it didn't just background itself like it's supposed to do with the daemonize option? (I mean did you check that the process isn't still running with "ps aux" (or similar)) Did you try using spamc to send it a message anyways? -- Duncan Findlay
Re: Nagios
On Tue, Feb 13, 2007 at 12:12:06AM -0500, Theo Van Dinter wrote: > On Mon, Feb 12, 2007 at 10:53:40PM -0500, Duncan Findlay wrote: > > Any chance we can turn off the nagios notifications? Or at least turn > > them down in frequency? > How about fixing the issues? ;) I haven't had time to figure out > what it's monitoring for, so I haven't prodded the box to figure out > what's up. A quick look around makes it seem that things are ok, > but ... Well, to be fair, the vast majority of this list can't (due to permissions and such) fix said issues. (I'm going on the assumption that the number of subscribers to dev >> number of committers/PMC members.) And far more can't easily fix them due to lack of understanding on how it works. :-) (I suppose I probably fall in the latter category.) Is it possible to acknowledge the issue to silence the alerts? I imagine this requires access to the apache nagios web interface, but I don't have any idea where that is or who has access. Or perhaps, notifications should go to [EMAIL PROTECTED] I just don't want anyone driven away from the dev list by the intense volume of nags. :-) -- Duncan Findlay
Nagios
Hey folks, Any chance we can turn off the nagios notifications? Or at least turn them down in frequency? Thanks, -- Duncan Findlay
Re: Moving on
On Sun, Jan 14, 2007 at 09:48:57AM -0800, Robert Menschel wrote: > As you may have gathered by my complete silence these last several > months, I've been unable to contribute any time to SARE or SA. My > time is taken up by other things right now, and it doesn't look > like that's going to change for a while. Thank you for your contributions -- we'll miss you. -- Duncan Findlay
Re: 3.2.0 release schedule
On Wed, Jan 03, 2007 at 02:42:44PM +, Justin Mason wrote: > - T + 0 days: announce a heads-up mail. clean up our corpora, get ready > for mass-checking, try out mass-check to spot any big memory leaks or > whatnot, fix remaining bugs that affect mass-checks (esp bug 5260!), > get people signed up, enable all rules in svn. > - T + 1 week, around a Thursday or so: start --bayes --net mass-checks; > move to C-T-R. > - T + 3 weeks, a Monday or so: hopefully finish mass-checks, bugs > allowing ;) (note that includes two weekends.) > - T + 3 weeks: perceptron runs, voting on new proposed scores, etc > - T + 4 weeks and a bit: hopefully ready to release +1 BTW, how do we generate all 4 scoresets from one run? We used to have to do two runs, and I can't remember the rationale for that, or the rationale for doing it in one. :-) -- Duncan Findlay
Re: ham check on RCVD_BAD_ID
On Tue, Dec 26, 2006 at 12:27:18AM -0500, Theo Van Dinter wrote:
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200606.6140261
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200606.6181618
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200606.6587291
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200606.6592701
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200606.6876629
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200606.7151699
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200607.792396
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200610.6358
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200611.587266
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200611.1704258
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200611.1817957
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200611.1844028
> ham-daf.log:. /home/duncf/Maildir/Old/debian-project/debian-project200612.75347
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200612.1435129
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200612.2505460
> ham-daf.log:. /home/duncf/Maildir/Old/debian-devel/debian-devel200612.2536901

All valid ham, from the same sender. I can send you a copy off-list if you want.

-- Duncan Findlay
Re: [Spamassassin Wiki] Update of "BetterDocumentation/SqlReadme" by TimHunter
On Tue, Dec 12, 2006 at 06:06:42PM +, Justin Mason wrote: > Duncan, are you merging changes back from these pages? I'm > concerned that they're diverging from what's in SVN. I haven't seen one of these in a while. I'll merge it when I get a chance. -- Duncan Findlay
Re: Rule update over DNS?
On Thu, Dec 07, 2006 at 08:56:45PM +1300, Jason Haar wrote: > If all SA users set sa-update to run hourly - then when an update comes > out, you will have *all* SA users contacting the same sites > simultaneously for the downloads. Och... That's a good point. Those of us packaging SpamAssassin for distributions should think about this. :-) Will it be okay if all Debian users start running sa-update on the same minute of the hour? -- Duncan Findlay
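One common way a packager might avoid the same-minute stampede is to derive a stable per-host offset, so each machine picks a different minute of the hour. This is not something proposed in the thread or shipped by Debian; it is a hypothetical sketch, and the function name `update_minute` and the example hostnames are made up.

```python
import hashlib


def update_minute(hostname: str) -> int:
    """Map a hostname to a stable minute in the range 0..59.

    Hashing spreads hosts roughly evenly across the hour while keeping
    each host's schedule the same from run to run.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 60


# Usage: each (hypothetical) host gets its own fixed minute.
for host in ("mail1.example.org", "mail2.example.org"):
    print(host, update_minute(host))
```

A random sleep before each run would spread load equally well; the hash variant just makes the schedule predictable for the administrator.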
Re: another fp check
On Wed, Dec 06, 2006 at 05:06:43PM -0500, Theo Van Dinter wrote:
> ../ham-daf.log:. /home/duncf/Maildir/Old/debian-project/debian-project200606.349198
> ../ham-daf.log:. /home/duncf/Maildir/Old/debian-project/debian-project200606.353902
> ../ham-daf.log:. /home/duncf/Maildir/Old/debian-project/debian-project200606.365610

These are spam. Sorry.

-- Duncan Findlay
Re: ham check please
On Sun, Dec 03, 2006 at 11:04:55PM -0500, Theo Van Dinter wrote: > ham-daf.log:. /home/duncf/Maildir/Old/rogers/rogers200606.236990 > ham-daf.log:. /home/duncf/Maildir/Old/rogers/rogers200606.550207 > ham-daf.log:. /home/duncf/Maildir/Old/rogers/rogers200606.574290 > ham-daf.log:. /home/duncf/Maildir/Old/rogers/rogers200607.129882 All ham, for a legitimate newsletter I'm subscribed to. The attachment is an image/jpg, BASE64 encoded. I don't know what your rules are looking for, but the first line of the base 64 is one character longer than the others. -- Duncan Findlay
logs-to-c
Hey folks,

What's the deal with logs-to-c? As far as I can tell, it has two purposes:

1. Convert mass-check logs to C code for use with the perceptron.
2. Spit out FPs, FNs, TCR, and other statistics for the mass-check logs and the current scores.

These two purposes seem to be entirely different. Furthermore, the code is ugly and it doesn't even "use strict". I was going to try to split the script into two (logs-to-c, which would do #1 above, and something else, maybe "evaluate-logs" to do #2).

Unfortunately, I've run into some confusion. Logs-to-c reads in the ranges data from scores-ranges-from-freqs. It iterates through those data and modifies its internal concept of the score for each rule based on the range. (i.e. if the current score in 50_scores.cf is outside the range from scores-ranges-from-freqs it will set the score to the upper/lower limit of the range). There are other scenarios where it will change its internal concept of the score for a rule, but that's the idea.

Now this makes sense when it needs to output the score for the perceptron. (i.e. Use #1) But when it's evaluating the FPs on the current logs, I'm not sure this makes sense. In theory, after rewriting the new scores as output by perceptron and re-running parse-rules-for-masses this munging of scores shouldn't make a difference, since the scores should be set within their ranges.*

So, the ultimate question is "do we need to fudge with our scores based on ranges info, in order to do a logs-to-c --count?" I would argue no. But I'm a relative neophyte to the internals of the scoring mechanisms. If the answer is no, I plan on splitting logs-to-c in half. Let me know what you think.

Thanks,
-- Duncan Findlay

* Currently it makes a small difference; rules that are ignored have their scores set to 0 by read_ranges in logs-to-c, but never make it into perceptron.scores, so they can't be rewritten in 50_scores.cf. This can account for the difference. I'm not sure what the ideal behaviour is here.
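Purpose #2, the counting half that the proposed "evaluate-logs" script would cover, can be sketched as below. This is not the real logs-to-c logic: the in-memory log format, the rule names and the scores are all hypothetical, and the sketch uses the scores exactly as given, without any range adjustment.

```python
def evaluate(messages, scores, threshold=5.0):
    """Count classification outcomes for a set of mass-check results.

    messages -- iterable of (is_spam, rule_hits) pairs
    scores   -- dict of rule name -> score; unknown rules count as 0
    A message is flagged when its total reaches the threshold (assumed >=).
    """
    counts = {"ham_ok": 0, "spam_ok": 0, "fp": 0, "fn": 0}
    for is_spam, hits in messages:
        total = sum(scores.get(rule, 0.0) for rule in hits)
        flagged = total >= threshold
        if is_spam:
            counts["spam_ok" if flagged else "fn"] += 1
        else:
            counts["fp" if flagged else "ham_ok"] += 1
    return counts


# Hypothetical rules and log entries, for illustration only.
scores = {"RULE_A": 3.0, "RULE_B": 2.5, "RULE_C": -1.0}
messages = [
    (True, ["RULE_A", "RULE_B"]),            # 5.5 -> flagged, correct
    (True, ["RULE_A"]),                      # 3.0 -> missed (false negative)
    (False, ["RULE_C"]),                     # -1.0 -> correct ham
    (False, ["RULE_A", "RULE_B", "RULE_C"]), # 4.5 -> correct ham
]
print(evaluate(messages, scores))
```

Keeping this path free of range munging means the reported FP and FN counts describe the scores users would actually run with.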
Re: spam in daf corpus
On Tue, Nov 21, 2006 at 01:16:25PM +, Justin Mason wrote: > Duncan -- fyi -- > Y 6 /home/duncf/Maildir/Old/debian-devel/debian-devel200611.1718489 > AXB_FAKETZ,GMD_FAKETZ,L_SPAM_TOOL_13,REPTO_OVERQUOTE_THEBAT,T_RCVD_CORRUPT_ESMTP,T_RCVD_FORGED_WROTE2,__CT,__CTE,__CT_TEXT_PLAIN,__FH_RCVD_NODNS,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__HAS_X_LOOP,__HAS_X_MAILER,__HAS_X_MAILING_LIST,__KAM_NUMBER2,__LIST_MAIL,__LIST_UNSUB,__MIME_VERSION,__MSGID_OK_DIGITS,__MSGID_OK_HOST,__NAKED_TO,__NONEMPTY_BODY,__REPTO_OVERQUOTE,__REPTO_QUOTE,__SANE_MSGID,__SOBER_P_PRIO,__THEBAT_MUA,__THEBAT_MUA_V1,__TOCC_EXISTS > time=1163176269,scantime=0,format=m,reuse=no > looks spammy. Of course you're right. Fixed. Sorry about that. -- Duncan Findlay
Re: new rule->sa-update speedup idea (was Re: spam attacks - so and so wrote about a stock )
On Wed, Oct 18, 2006 at 06:07:01PM +0100, Justin Mason wrote: > Theo Van Dinter writes: > in other words, reducing the worst-case scenario to just under 1 day. (If > we were to increase frequency of update publishing in the future, that > would then reduce that further, if necessary.) > Rules that got promoted based on "being new" and having a 1.0 S/O in the > preflight mass-checks would then only *stay* promoted if they then passed > the normal, existing promotion criteria -- so a rule that was good > "enough" to get into the update due to a 1.0 S/O, but had FPs on the > larger test set, would fall out anyway after 1 day. I think I'd want to see a spam% restriction on there too. Unfortunately, this probably won't help, since (correct me if I'm wrong) the preflight mass-checks are old messages, not brand new ones, right? This would mean they wouldn't get a good S/O ratio anyways. -- Duncan Findlay
Re: Nightly run OOMs during scan...
On Fri, Oct 13, 2006 at 10:40:26AM -0400, Theo Van Dinter wrote: > Second day in a row, I haven't started debugging yet, but fyi. I've been getting "too many open files" for a few weeks now. Also haven't debugged. -- Duncan Findlay
Re: mass-check: Too many open files!
On Sun, Oct 01, 2006 at 08:57:28AM -0400, Duncan Findlay wrote: > On Sat, Sep 30, 2006 at 03:06:44PM -0400, Daryl C. W. O'Shea wrote: > > Duncan Findlay wrote: > > >Fixed now! Thanks, Daryl! > > I'm confused. What did I unknowingly fix now? :) > I think your fixes to ArchiveIterator must have fixed my problem with > mass-check complaining about too many open files. I lied. I've still got problems. :-( -- Duncan Findlay
L1 Logistic Regression in SpamAssassin
Hey everybody, Just wanted to let you know that, as part of my 4th Year Math & Engineering Design Project, I'm working on using L1 regularized Logistic Regression to replace the Perceptron scoring mechanism, as described by Lee, Lee, Abbeel and Ng in this paper: http://ai.stanford.edu/~ang/papers/aaai06-efficientL1logisticregression.pdf It's way too early for any kind of predictions, but we're hoping to see better performance with this algorithm than the perceptron. I'll let you know when we start to see results. -- Duncan Findlay
Re: mass-check: Too many open files!
On Sat, Sep 30, 2006 at 03:06:44PM -0400, Daryl C. W. O'Shea wrote: > Duncan Findlay wrote: > >Fixed now! Thanks, Daryl! > I'm confused. What did I unknowingly fix now? :) I think your fixes to ArchiveIterator must have fixed my problem with mass-check complaining about too many open files. -- Duncan Findlay
Re: mass-check: Too many open files!
Fixed now! Thanks, Daryl! -- Duncan Findlay
mass-check: Too many open files!
Has anybody seen this before? It's been a while since I've looked through my nightly mass-check logs, but I'm now getting lots of errors: bayes: cannot write to /home/duncf/svn/spamassassin-nightly/masses/spamassassin/bayes_journal, bayes db update ignored: Too many open files bayes: cannot write to /home/duncf/svn/spamassassin-nightly/masses/spamassassin/bayes_journal, bayes db update ignored: Too many open files bayes: cannot open bayes databases /home/duncf/svn/spamassassin-nightly/masses/spamassassin/bayes_* R/O: tie failed: Too many open files bayes: cannot open bayes databases /home/duncf/svn/spamassassin-nightly/masses/spamassassin/bayes_* R/O: tie failed: Too many open files bayes: cannot write to /home/duncf/svn/spamassassin-nightly/masses/spamassassin/bayes_journal, bayes db update ignored: Too many open files ... util: secure_tmpfile failed to create file '/tmp/.spamassassin14261y8fPFutmp': Too many open files util: secure_tmpfile failed to create file '/tmp/.spamassassin14261MWKNz4tmp': Too many open files ... 
archive-iterator: unable to open /home/duncf/Maildir/Old/spam/spam200604: Too many open files archive-iterator: unable to open /home/duncf/Maildir/Old/spam/spam200604: Too many open files archive-iterator: unable to open /home/duncf/Maildir/Old/spam/spam200604: Too many open files archive-iterator: unable to open /home/duncf/Maildir/Old/spam/spam200604: Too many open files archive-iterator: unable to open /home/duncf/Maildir/Old/debian-devel/debian-devel200604: Too many open files archive-iterator: unable to open /home/duncf/Maildir/Old/spam/spam200604: Too many open files archive-iterator: unable to open /home/duncf/Maildir/Old/spam/spam200604: Too many open files archive-iterator: unable to open /home/duncf/Maildir/Old/spam/spam200604: Too many open files archive-iterator: unable to open /home/duncf/Maildir/Old/spam/spam200604: Too many open files archive-iterator: unable to open /home/duncf/Maildir/Old/debian-project/debian-project200604: Too many open files etc. Anybody know what's going on? (I haven't had time to dig in myself...) -- Duncan Findlay
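"Too many open files" is the standard message for hitting the per-process file descriptor limit, so a first step in debugging is to compare that limit with what the process actually holds open. The sketch below is a generic check and not part of mass-check; `fd_report` is a name made up for this example, and the `/proc/self/fd` count is Linux-specific.

```python
import os
import resource


def fd_report() -> dict:
    """Report the file descriptor limits and, where possible, current usage."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    fd_dir = "/proc/self/fd"  # Linux only; absent on most other systems
    open_count = len(os.listdir(fd_dir)) if os.path.isdir(fd_dir) else None
    return {"soft": soft, "hard": hard, "open": open_count}


print(fd_report())
```

If the open count climbs toward the soft limit during a long run, something is leaking descriptors; raising the limit only postpones the failure.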
Re: BZ Quips
On Wed, Aug 30, 2006 at 05:55:43PM -0400, Theo Van Dinter wrote: > and so the issue was: is it appropriate to have that on an "official" > (for lack of a better term) website. There was a concern that maybe > the SA PMC or ASF Board should/would have a problem with that. > > So I wanted to bring it up for him and see what the general thoughts were > about this. > > There seems to be two options really if people consider this a worthy issue: > > 1) remove questionable content from the quips and limit the addition cgi > 2) disable quips altogether and avoid having to censor/police the list Personally, I feel the BZ quips are stupid (at best), and I would not be the least bit upset if they were removed. That's certainly a lot easier than #1 above. -- Duncan Findlay signature.asc Description: Digital signature
Re: criticism of our changelogs
On Wed, Aug 30, 2006 at 05:41:50PM -0400, Theo Van Dinter wrote: > On Wed, Aug 30, 2006 at 09:54:05PM +0100, Justin Mason wrote: > > http://use.perl.org/~petdance/journal/30809 > > Yeah, I can understand the POV. Our changelog is generally written so that we > know what's changed in a commit, and if we need more info, that's what the bug > ticket reference is for -- and the Changes file lets people see those as well > w/out needing SVN and such. I'd argue that "written" is the wrong word here. AFAIK, we use SVN for this only because it's easier than actually writing a useful change log. Maybe we want to keep a running changelog, and every time someone fixes something worthwhile they make an entry in it. -- Duncan Findlay
Re: plugins in sa-update
On Sat, Jul 01, 2006 at 05:22:19AM -0700, Loren Wilton wrote: > If it is a rule that requires new code to work, then the new code better in > some way come with the new rule. Otherwise there is no point in > distributing the (unworkable) rule, and no point in listing it in sa-update. > (And, contra-wise, if someone makes a wonderful new rule that just happens > to require code to work, the need fo code shouldn't disqualify it from > distribution.) On the other hand, I maybe don't want to be installing new code that is run by root on an automated basis, but I'm fine with rules. -- Duncan Findlay signature.asc Description: Digital signature
Re: Summer of Code
On Fri, Apr 14, 2006 at 03:32:19AM -0400, Duncan Findlay wrote: > On Thu, Apr 13, 2006 at 09:52:07PM +0100, Justin Mason wrote: > > this is coming up soon -- do we want to get a couple of entrants > > for SpamAssassin? > > Uh... is it coming up soon? I've seen no announcement or indication > that they will be continuing that program this year. Actually, it very much looks like they will continue this year. It just hasn't been officially announced. http://groups.google.com/group/summer-discuss/browse_thread/thread/675724ce3b035acb/b2ab8df0dc512e58#b2ab8df0dc512e58 -- Duncan Findlay signature.asc Description: Digital signature
Re: Summer of Code
On Thu, Apr 13, 2006 at 09:52:07PM +0100, Justin Mason wrote: > this is coming up soon -- do we want to get a couple of entrants > for SpamAssassin? Uh... is it coming up soon? I've seen no announcement or indication that they will be continuing that program this year. Maybe you've seen something I haven't? :-) -- Duncan Findlay signature.asc Description: Digital signature
Re: svn commit: r384691 - in /spamassassin/rules/trunk/sandbox/dos: SIQ.cf SIQ.pm
On Fri, Mar 10, 2006 at 01:01:07AM -0500, Daryl C. W. O'Shea wrote: > >>Note that Outbound Index is a member/subscription service. > > > >Will this automatically get enabled in the nightly checks? Is this a > >problem for those of us not subscribed to the service? > > The loadplugin line is commented out, so no. Doh! Missed that, obviously. > Running mass-checks with it, even if you've got access, probably isn't a > great idea though, since it'll likely severely skew their database. > > I think some co-ordination would have to take place to make sure that > doesn't happen if we ever want to do mass-checks with it. Fair enough. I really don't know anything about this service. -- Duncan Findlay signature.asc Description: Digital signature
Re: svn commit: r384691 - in /spamassassin/rules/trunk/sandbox/dos: SIQ.cf SIQ.pm
On Fri, Mar 10, 2006 at 03:29:02AM -, [EMAIL PROTECTED] wrote: > Author: dos > Date: Thu Mar 9 19:28:58 2006 > New Revision: 384691 > > URL: http://svn.apache.org/viewcvs?rev=384691&view=rev > Log: > sandbox: add my SIQ plugin for anyone who's interested > > insomnia == non-engineered, messy code :( > > This plugin executes SIQ queries in the background and allows the data > returned by the reputation service provider to be tested via a number of evals > and a psuedo-header > > Note that Outbound Index is a member/subscription service. Will this automatically get enabled in the nightly checks? Is this a problem for those of us not subscribed to the service? -- Duncan Findlay signature.asc Description: Digital signature
Nagios notifications
Can these be throttled more / disabled? I understand the merit of telling someone that can do something about it when it breaks, but as far as I can tell, a very small percentage of this list actually can; the rest of us don't really need the notification. Plus they're getting annoying. Thanks, -- Duncan Findlay signature.asc Description: Digital signature
Re: svn commit: r382049 - in /spamassassin/rules/trunk/sandbox/kam: ./ 20_stock.cf
On Thu, Mar 02, 2006 at 12:38:42PM -0500, Daryl C. W. O'Shea wrote: > [EMAIL PROTECTED] wrote: > >Author: kmcgrail > >Date: Wed Mar 1 07:15:05 2006 > >New Revision: 382049 > > > >URL: http://svn.apache.org/viewcvs?rev=382049&view=rev > >Log: > >KAM Sandbox Creation and 1 test rule > > > >Added: > >spamassassin/rules/trunk/sandbox/kam/ > >spamassassin/rules/trunk/sandbox/kam/20_stock.cf > > I think we were (well, we have so far) sticking to using our Apache > logins for sandbox directory names. Yeah, that's a good point. Kevin, would you please consider moving your sandbox directory to kmcgrail? (Use svn move to preserve history.) Thanks, -- Duncan Findlay
Re: Nightly mass-checks
On Wed, Mar 01, 2006 at 11:50:58PM -0600, Doc Schneider wrote: > >>is what I'm personally using. The machine is a dual 500 with a gig of > >>RAM. And perl 5.8.6 on it. Anyone have any ideas? > > > >What size are these mailboxes? > > > Total size of the files? or how many messages in each mbox? > > Size of ham is 333 megs > Size of spam is 535 megs. > > A bit over 100k messages total > spams:63731hams:39385 give or take. My current checks are roughly 40k messages. I use --after '6 months' on a much larger corpus, but that gets me down to 40k. (On my weekly net-enabled runs I use --after '1 month') It takes roughly 3 hours start to finish (including scanning the corpus and rsyncing), this is on a 2.8 GHz P4 w/ 1GB RAM. Suggestions: - Get rid of --all; you could be hitting some giant messages and burning a lot of CPU. - Use -j2 since you have 2 processors... might as well use them. - Trim your corpus, (use --after) -- Duncan Findlay signature.asc Description: Digital signature
Re: Nightly mass-checks
On Wed, Mar 01, 2006 at 10:57:27PM -0600, Doc Schneider wrote: > I started one with an rsync'd version I grabbed last night about this > time and it is still going. Says it is 50% complete. I think I'm missing > an option or something. > ./mass-check --progress --all \ > ham:mbox:/home/masschecker/mail/ham \ > spam:mbox:/home/masschecker/mail/spam > > is what I'm personally using. The machine is a dual 500 with a gig of > RAM. And perl 5.8.6 on it. Anyone have any ideas? What size are these mailboxes? -- Duncan Findlay
Re: Nightly mass-checks
On Tue, Feb 28, 2006 at 10:29:10PM -0600, Doc Schneider wrote: > Gang, > > I'm working on a better wiki page for nightly mass-checks and an easier > way for those old and new to SA to be able to do these. > > A few things which I'm not entirely sure of are: > > 1) Command line to run a mass-check $./mass-check --all --progress > ham:mbox:path/to/ham (does this need a / or a /* to run through a whole > directory of mbox ham files? Ditto with spam?) I recommend using mass-check's -f option, which lets you define all of your targets (ham:dir:/path, etc) in one file, and just pass the name of the file to mass-check. For nightly checks I do: mass-check -f --after='6 months' For weekly checks I do: mass-check --net -j 8 -f --after='6 months' Looks like I don't use --all, I probably should. --progress is probably unnecessary if you're running this from a cron job. > 2) Command line for maildir files? ham:dir:/path/to/ham (Wondering the > same on this also about needing a / or /*) For maildirs: ham:dir:/path/to/maildir/cur For a dir of mboxes: ham:dir/path/to/mboxdir/* > 3) A good example of how to do a mass-check using the SVN method. (Theo > gave me some clues on using it so am pretty much set for doing this.) It seems this is the "old", almost deprecated way now... Guess I missed that memo :-) Manually: Check out a spamassassin tree Nightly: wget http://rsync.spamassassin.org/weekly-versions.txt or nightly-versions.txt take the revision number from the last line of that file (inside a loop with some (randomized) retry logic) svn update -r perl Makefile.PL make run mass-check upload results via rsync (setting the RSYNC_PASSWORD environment variable is useful) -- My script is designed to be run every hour, and it will only run if it hasn't yet run "today" where "today" is defined as since 9:00 UTC (when the revision file gets updated) > 5) Anyone having a particular problem that should be addressed for doing > mass-checks? 
Now that I think about it, with the rules sandbox stuff, the SVN checkout probably won't always get the same revision if people update their SVN at different times of the day. Is this a bad thing? I just tested this. If I run my nightly mass checks at 11:00 UTC (for example) and someone commits a rule change between 9:00 UTC and 11:00 UTC, I'll pick up the rule and test it, but others might not. I don't know what ramifications this might have.
[EMAIL PROTECTED]:~/svn/spamassassin-nightly$ svn update -r 381595
Fetching external item into 'rulesrc'
U rulesrc/sandbox/duncf/20_drugs.cf
Updated external to revision 381910.
Updated to revision 381595.
Note the rules directory gets updated to 381910 instead of 381595. Interesting... -- Duncan Findlay
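The "run every hour, but only once per day" logic described in this message can be sketched as follows. This is a minimal illustration and not the actual script: `latest_cutoff` and `should_run` are hypothetical names, and the sketch assumes timezone-aware UTC timestamps and that the time of the last successful run is stored somewhere by the caller.

```python
from datetime import datetime, timedelta, timezone

CUTOFF_HOUR_UTC = 9  # the revision file is updated at 9:00 UTC


def latest_cutoff(now: datetime) -> datetime:
    """Most recent 9:00 UTC boundary at or before `now`."""
    cutoff = now.replace(hour=CUTOFF_HOUR_UTC, minute=0, second=0, microsecond=0)
    if now < cutoff:
        cutoff -= timedelta(days=1)  # before 9:00, so "today" began yesterday
    return cutoff


def should_run(last_run, now: datetime) -> bool:
    """True if no run has happened since the latest 9:00 UTC boundary."""
    return last_run is None or last_run < latest_cutoff(now)
```

An hourly cron job would call `should_run(last_run, datetime.now(timezone.utc))` and exit early when it returns False.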
Re: Issues with nightly runs & rsync, docs, etc.
On Sun, Feb 26, 2006 at 11:47:34PM -0500, Theo Van Dinter wrote: > On Sun, Feb 26, 2006 at 11:32:57PM -0500, Duncan Findlay wrote: > > > So the problem here is that mkrules is never run, so the majority > > > of rules aren't actually run. I've added in a mkrules call into the > > > nightlymc script. > > > > Umm... which nightlymc script is this? Also, we don't all use the same > > script for running nightlies, so if you change something make sure you > > let people know very clearly. :-) > > The one that runs on the zones machine to generate the rsync image. > http://wiki.apache.org/spamassassin/NightlyMassCheck simply says that > people can rsync down the appropriate version and start running, which > wasn't really true since you had to run mkrules as well, so now it's > run at the server so when people rsync they get the full image. Ah... OK... I'd forgotten we even allow rsync download of the tree for this purpose, so I had no idea what you were talking about :-) -- Duncan Findlay signature.asc Description: Digital signature
Re: Issues with nightly runs & rsync, docs, etc.
On Wed, Feb 22, 2006 at 06:32:58PM -0500, Theo Van Dinter wrote: > Second, the suggested method for doing rsync nightlies is (basically): > > cd nightly_mass_check/masses > rm -f ham.log spam.log > ./mass-check --progress \ > [list of files] > > So the problem here is that mkrules is never run, so the majority > of rules aren't actually run. I've added in a mkrules call into the > nightlymc script. Umm... which nightlymc script is this? Also, we don't all use the same script for running nightlies, so if you change something make sure you let people know very clearly. :-) (I really should commit mine at some point, it has some nice features over the shell script that's there.) -- Duncan Findlay signature.asc Description: Digital signature
[OT] SpamAssassin Developer for Hire
I am currently seeking employment for 8 or 9 weeks in May and June 2006.[1] I would greatly enjoy working for a company involved in the anti-spam / e-mail security industry, especially if it would allow me to use or contribute to the Apache SpamAssassin project. As you may know, I've been a SpamAssassin developer since 2002, although I have contributed little recently -- being at school full-time seems to get in the way of that. Last summer, I worked for IronPort Systems as an Anti-Spam Developer and I greatly enjoyed the experience. I am located in Toronto, Ontario (Canada) and/or Kingston, Ontario, and I am not eager to relocate for such a short term. I would be interested in working in either of these two cities or the surrounding areas, or remotely.[2] If your company would be interested in hiring a highly motivated and skilled young computer programmer with extensive experience in the anti-spam industry, I would love to hear from you. I realize that this is a rather short period of time, but I am confident I could tackle a sizeable project in this time frame. My resume is available online at the following address: http://people.apache.org/~duncf/DuncanFindlay.pdf Please feel free to forward this message to anyone that may be interested. References are available on request. Thank you, Duncan Findlay [1] More precisely, I'd like to work May, June and the first little bit of July. I am going to be travelling in Europe for the last half of the summer, from mid-July to the end of August. [2] Some travel, on the other hand, would be perfectly fine. I just do not want to deal with the hassles of finding somewhere to live, furnishings, etc. for a short period. If this were a permanent job, I would be happy to relocate.
Re: Proposal: two-implementation requirement for new plugin interfaces
On Fri, Feb 03, 2006 at 10:58:21AM -0800, John Myers wrote: > Related to bug 4776, I propose a necessary requirement before creating a > new plugin interface that at least two plugins implementing that > interface exist and be intended for production use. The null > implementation (no plugin of that type registered) could count for one > of these implementations. A non-contributed, proprietary plugin could > similarly count for one implementation. That's fine with me; though it'd be nice to see at least two open source ones. :-) If someone is going to the effort of pluginizing something, chances are that there exists a plugin other than the default. In bug 4776, the default would be the current behaviour, wouldn't it? So the requirement would only require one other implementation, which I presume must exist. -- Duncan Findlay signature.asc Description: Digital signature
Re: [Spamassassin Wiki] Update of "BecomingCommitter" by DuncanFindlay
On Thu, Feb 02, 2006 at 10:54:38PM -0500, Daryl C. W. O'Shea wrote: > On 02/02/2006 10:49 PM, Theo Van Dinter wrote: > >On Fri, Feb 03, 2006 at 03:14:19AM -, Apache Wiki wrote: > > > >>You have to set your own svn password -- infra doesnt do this > >>- In theory, when the account request was made to the infrastructure > >>group, it would have included the output from ''htpasswd -ns username'' > >>and that would have your initial SVN password setup. If there is a > >>problem with this however, you will need to run svnpasswd. > > > > > >If this is the case, why do we ask for htpasswd output when we offer commit > >access? > > > >Also, do we have a wikidoc or something with information about what/how to > >request > >from infra, etc? > > They setup mine (account on minotaur and svn password) with the htpasswd > output I supplied last year. Hmmm... I could be wrong then, I suppose. My understanding is they just set up the regular passwords, and we were supposed to change the svn ones ourselves. Maybe I should just go ahead and ask the people that actually *do* this stuff. -- Duncan Findlay signature.asc Description: Digital signature
Re: review reminder
On Wed, Jan 25, 2006 at 04:33:02PM -0600, Doc Schneider wrote: > Welcome to the world of out of work Admins. I've been out of work for > close to a year now. And the job pickings are really slim. > > I'm still looking for work too! And I'm worried about finding a summer job :-) Sorry I also haven't been doing much. For some reason, I'm taking 8 courses this term leaving little time for all the extracurricular stuff I'm doing, and unfortunately SpamAssassin lies below that on my priority list. :-( Hopefully I'll find a job for this summer that leaves me with a bit of time to code on SpamAssassin... -- Duncan Findlay
Security-related bugs
I was hoping to start a discussion over what constitutes a "security" bug in our Bugzilla. This is not meant to criticize any previous decisions around security, merely to gauge how we feel about this as a community. So, here I'd like to outline the criteria I would suggest for determining whether a bug should be classified as "security" and restricted to the "security team." Please comment. :-)
- Bugs which allow false negatives are not security bugs. In particular if a bug allows a carefully crafted message to bypass some, but not all, of SpamAssassin's tests, then it should not be marked as "security".
- DOS attacks and other related, *exploitable* bugs that cause disruption to mail-scanning or other problems for the server are security bugs. (I don't consider 4570 to be a security bug, for example. It's just not exploitable by spammers.)
- Bugs that allow a specially crafted spammy message to get through regardless of any other characteristics (i.e. header, body, Bayes and other tests fail to count) may be security bugs. (I'd argue it's not strictly speaking a security issue for the system, but it is something we should maybe not make public. I could be convinced either way on this.)
Lastly, I'd like to say that once a bug is outlined in the open, there is no point in hiding it after the fact. In fact, all this may accomplish is to hide the fix from our users, even though a description of the "exploit" is publicly available. (Example: bug 4759, 4535, others I'm sure.) -- Duncan Findlay
Re: sa-updates of the main ruleset: require GPG?
On Fri, Dec 16, 2005 at 03:10:10PM -0800, Justin Mason wrote: > a question that Henry put to me -- should sa-updates of the main ruleset > mandate that GPG verification be used? > > Otherwise an attacker that rooted the download server (or a mirror) could > put out faked updates, which would be automatically downloaded by > thousands of servers. I'm not sure it should be "required" since users could just manually download it and stick it in the right place and requiring it would be an inconvenience then, but "strongly recommended unless you give sa-update the --yes-im-crazy-and-dont-want-to-use-gpg option".* -- Duncan Findlay * That said, "--no-gpg" would probably be equally suitable. signature.asc Description: Digital signature
Re: updated sa-update proposal
On Fri, Dec 16, 2005 at 06:04:18PM -0500, Warren Togami wrote: > Justin Mason wrote: > >If we version the /var/lib/spamassassin directory, then we would have > >this timeline: I agree versioning is useful. My only concern is we'll never purge the old stuff. > I think this makes a lot of sense and we should go for it. I also think > that we don't need auto-purging to remove other versions of these > directories. There are just too many things that can go wrong, and > these things are not very big on space consumption. I just don't like cruft hanging around forever. :-) -- Duncan Findlay signature.asc Description: Digital signature
Re: updated sa-update proposal
On Thu, Dec 15, 2005 at 10:06:36AM -0500, Thomas Schulz wrote: > Just checking on various operating systems. /var/lib seems to exist on > Linux, but not on Solaris, HP-UX or AIX. There is a /var/opt on Solaris > and HP-UX, but not on AIX. Of course you could always create the /var/lib > (or /var/opt) directory in the install. Right. I'm going by the Filesystem Hierarchy Standard, which is what many Linux distributions go by. Many other OSs don't, and we would need to stick this in the appropriate place on other OSs too. Namely, it should be somewhere where "variable" data goes. I just don't know where that is. ;-) -- Duncan Findlay
Re: updated sa-update proposal
On Wed, Dec 14, 2005 at 04:54:54PM -0800, Justin Mason wrote: > So we have these requirements: > > 1. use /var for updates, instead of /etc or /usr > > 2. sa-update updates must not overwrite any packaged files > > 3. the user shouldn't have to choose at package-install time whether > they want to use packaged rules, or sa-update rules. (although > conversely, it's ok to entirely stop using packaged rules from that > point on, if sa-update installs an update set.) Agreed. > So the suggestion is to use: > > /etc/mail/spamassassin: > > *.cf > *.pre: Admin-installed local settings > > > /usr/share/spamassassin: > > default, distro-package-installed scores and rules > > > /var/lib/spamassassin/3.1.0: > /var/lib/spamassassin/3.1.1: > /var/lib/spamassassin/3.1.2: > /var/lib/spamassassin/3.2.0: > > sa-update-installed scores and rules I'm not sure I see the need for multiple directories lying around. I suppose it can be useful, but I'm assuming that most will only have one directory. Also, sa-update should be smart enough to remove old directories of previous versions (optionally?). > The presence of anything in /var/lib/spamassassin/3.1.1 causes > /usr/share/spamassassin to be ignored. > > All rules, including the code-tied stuff for that release, are put in the > sa-update tarballs (and therefore /var/lib/spamassassin/3.1.2 etc.) Hmm... does it make sense to redistribute the code-tied stuff? That seems like unnecessary bandwidth usage. sa-update should only be grabbing the "changing" non-code tied stuff. -- Duncan Findlay
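The lookup rule in the quoted proposal (anything present in the versioned /var/lib directory causes /usr/share to be ignored) can be sketched like this. It is an illustration of the proposal only, not code from Mail::SpamAssassin; `pick_rules_dir` is a hypothetical name and the default paths are simply the ones mentioned in the thread.

```python
import os


def pick_rules_dir(version: str,
                   update_root: str = "/var/lib/spamassassin",
                   default_dir: str = "/usr/share/spamassassin") -> str:
    """Prefer the sa-update directory for this version if it holds anything."""
    candidate = os.path.join(update_root, version)
    if os.path.isdir(candidate) and os.listdir(candidate):
        return candidate  # an update set exists, so packaged rules are ignored
    return default_dir    # no update installed, fall back to packaged rules
```

Under this rule an empty or missing versioned directory is harmless, which satisfies the requirement that the user never has to choose between packaged and updated rules at install time.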
Re: hackathon notes from Sat
On Wed, Dec 14, 2005 at 11:36:11AM -0800, Justin Mason wrote: > Duncan Findlay writes: > >Right. I also don't see any need to split the rules out of the main > >package -- spamassassin just needs to be smart enough to use the right > >set of rules -- either where sa-update drops them or where they are > >installed by default. > > So you're suggesting we'd have: > > /usr/share/spamassassin/72_active.cf: base, released copy of > rule updates > /etc/mail/spamassassin/sa_update.cf: override of that default set > > ?? Yes, except that I'd argue /etc/ isn't the right place for it either. I'm really thinking it should go in /var/lib somewhere. But that would mean we'd have the following: /etc/spamassassin | /etc/mail/spamassassin - site config /usr/share/spamassassin | ... - default rules /var/lib/spamassassin - sa-update drop directory > I could go for that. We'd have to modify the Mail::SpamAssassin code > to recognise the 72_active.cf file somehow and allow it to be ignored > in the system rules dir, if it appears in the site rules dir. Are we going to be consolidating all the rules to one file? It would make it tougher for users to read and play with, if that's a concern. -- Duncan Findlay signature.asc Description: Digital signature
Re: hackathon notes from Sat
On Tue, Dec 13, 2005 at 03:49:44PM -0500, Warren Togami wrote: > Duncan Findlay wrote: > >The only problem I see with the above, is that no script should be > >overwriting rules that are distributed in a package. So if I > >distribute a spamassassin-rules .deb, which would stick files in > >/usr/share/spamassassin, no script should go in and overwrite those > >rules. sa-update should be writing to somewhere in > >/var/lib/spamassassin (or /var/cache/spamassassin ?) and > >spamassassin/spamd should be reading from that location if it exists. > > > >So, looks like spamassassin/spamd probably needs to be modified to > >read from /var/lib/spamassassin if we want sa-update to work this way. > > > > I am in agreement that sa-update should download rules/scores into > somewhere in /var, and it shouldn't overwrite files distributed by the > package. I am not so sure I like the separate co-dependent package for > scores thing as a requirement. Right. I also don't see any need to split the rules out of the main package -- spamassassin just needs to be smart enough to use the right set of rules -- either where sa-update drops them or where they are installed by default. > I am a little confused about the terminology, active-set means network > tests right? I believe "active-set" refers to the latest scored set of rules -- the idea being that rules will be updated more often than code. -- Duncan Findlay signature.asc Description: Digital signature
Re: hackathon notes from Sat
On Sun, Dec 11, 2005 at 12:35:46PM -0800, Justin Mason wrote: > OK, we're rethinking this; it no longer seems necessary for it > to be a requirement, and you have good points there. > > What about this? > > - basic "spamassassin" package (rpm/deb) contains no active-set rules > > - there's another package which contains the active-set rules, in the > location where "sa-update" can later overwrite them > > - both packages co-depend on each other. > > The second package can be updated either via distro packaging methods -- > apt-get/yum, or can be overwritten using "sa-update". Yeah, sorry I didn't read the original message carefully enough. I think I'm pretty much in agreement with Warren though as far as requirements go. The only problem I see with the above, is that no script should be overwriting rules that are distributed in a package. So if I distribute a spamassassin-rules .deb, which would stick files in /usr/share/spamassassin, no script should go in and overwrite those rules. sa-update should be writing to somewhere in /var/lib/spamassassin (or /var/cache/spamassassin ?) and spamassassin/spamd should be reading from that location if it exists. So, looks like spamassassin/spamd probably needs to be modified to read from /var/lib/spamassassin if we want sa-update to work this way. -- Duncan Findlay signature.asc Description: Digital signature
Re: BTW / RE: Bug 4535 - Re: 3.0.5 almost done, votes needed
On Mon, Nov 21, 2005 at 12:20:39AM -0500, Duncan Findlay wrote: > On Sun, Nov 20, 2005 at 09:11:02PM -0800, Linda Walsh wrote: > > Was reading the bugs to see what was yet to go into 3.0.5 and > > ran into some weirdness with bug# 4535: > > > >"You are not authorized to access bug #4535." > > > > Hmmm...not authorized to "look" at a bug"...what's it take to > > look at bugs these days (or at least this specific one...I could > > view the rest, they all seem to be resolved)? > Bug 4535 is classified because it has been marked as a security bug. Also, it's RESOLVED FIXED, FYI. -- Duncan Findlay
Re: BTW / RE: Bug 4535 - Re: 3.0.5 almost done, votes needed
On Sun, Nov 20, 2005 at 09:11:02PM -0800, Linda Walsh wrote: > Was reading the bugs to see what was yet to go into 3.0.5 and > ran into some weirdness with bug# 4535: > >"You are not authorized to access bug #4535." > > Hmmm...not authorized to "look" at a bug"...what's it take to > look at bugs these days (or at least this specific one...I could > view the rest, they all seem to be resolved)? In general, bugs can be restricted so that only the "Security Team" can look at them. I believe the "Security Team" is simply the group of all committers. Bug 4535 is classified because it has been marked as a security bug. Obviously, I can't elaborate any more than that. :-) -- Duncan Findlay signature.asc Description: Digital signature
Re: svn commit: r345103 - in /spamassassin/rules/trunk/sandbox: duncf/25_replace.cf jm/20_vbounce.cf
On Wed, Nov 16, 2005 at 08:28:22PM -, [EMAIL PROTECTED] wrote: > == > --- spamassassin/rules/trunk/sandbox/duncf/25_replace.cf (original) > +++ spamassassin/rules/trunk/sandbox/duncf/25_replace.cf Wed Nov 16 12:28:21 > 2005 > @@ -6,7 +6,9 @@ > describe T_FUZZY_STOCK Obfuscates the word "stock" > > test T_FUZZY_STOCK fail Stock > -test T_FUZZY_STOCK ok St0ck > + > +# this is failing; "make test TEST_FILES=t/rule_tests.t" will repro it > +# test T_FUZZY_STOCK ok St0ck > > > endif Weird. This definitely worked for me before. As far as I can tell, it *should* work. I am relying on the definitions of tags in rules/25_replace.cf. Is this a problem? -- Duncan Findlay
Re: "ready to release" votes -- procedure change
On Tue, Nov 15, 2005 at 12:06:39PM -0800, Justin Mason wrote: > I've just updated http://wiki.apache.org/spamassassin/VotingProcedure > to fix a bug -- looks like we were reading the ASF pages wrongly. > quoting the fixed version: > > For code modifications, patches, and R-T-C changes to svn, committers have > the binding votes. However, for "ready to release" and project-procedural > ASF votes, votes must come from PMC members to be considered binding. > > (Note: previously committers could vote for releases, but this has had to > be changed, due to ASF regulations. While the Apache Voting page is a > little unclear on the subject, discussion on the 'legal-discuss' list has > made it clear that it is part of the ASF's bylaws that PMCs, and only > PMCs, can direct this action.) Hmm... are "pre-releases" and "release candidates" different in this respect? Personally, I think requiring +3 from PMC for an RC or -pre is excessive. -- Duncan Findlay signature.asc Description: Digital signature
Sandboxes
Could someone please outline the process I would go through to propose rules for testing? (*cough* jmason *cough*) ;-)

I assume it's like this. (In MoinMoin format, suitable for someone to paste into the Wiki.)

 1. Create a new sandbox in rulesrc/sandbox (e.g. rulesrc/sandbox/duncf)
 1. Create as many files in this directory as I wish, containing as many rules as I wish, named whatever I wish.
 1. Wait until the mass-check results are in, and look at the results at http://buildbot.spamassassin.org/ruleqa.
 1. Tweak until I'm satisfied with my rule.
 1. ???
 1. Profit!

The last couple of steps are a little hazy -- I'm not sure if we've ever decided how we're going to move rules from sandboxes to "core rules". Can anyone comment about this? I guess we don't need to worry about that until closer to "release time".

Anyways... it looks like the mechanism is pretty much set up... now all we need to do is write rules. (?) Anyone notice a lot more "clever" spam recently?

Thanks, -- Duncan Findlay
Re: please fix your nightly mass-checks
On Sat, Oct 29, 2005 at 07:28:09PM -0700, Justin Mason wrote: > It's only myself, daf and bzoetekouw submitting results afaics. > > BTW, that new rule-hits-over-time graph is *cool*. Check out these > graphs! What do the different colours represent? (Could you provide a legend?) -- Duncan Findlay signature.asc Description: Digital signature
Re: svn commit: r328517 - /spamassassin/trunk/UPGRADE
On Wed, Oct 26, 2005 at 01:24:30AM -, [EMAIL PROTECTED] wrote: > Author: duncf > Date: Tue Oct 25 18:24:28 2005 > New Revision: 328517 > > URL: http://svn.apache.org/viewcvs?rev=328517&view=rev > Log: > Fix typo (Debian Bug 335799) in UPGRADE For those following commits way too closely, that bug number is in fact incorrect. It should be 335794. The commit to the 3.1 branch is correct. Is there a way to change log entries? Is it desirable? -- Duncan Findlay signature.asc Description: Digital signature
Re: Suggestion: new list for corpus run announcements and discussion
On Sun, Oct 16, 2005 at 12:19:35PM -0700, Justin Mason wrote: > ok, I can go for a new bugs@ list, too. +1 -0 I kinda doubt there's enough on dev@ to keep it going if we don't have bugs sent there. If we want to lower the traffic so that certain people can read what they need to read, then we should just create a dev-announce list or something like that for the important stuff. By creating a bugs@ list, we'll be splitting discussion of development (i.e. bug fixing) over two lists, which probably doesn't make much sense. -- Duncan Findlay
Re: Suggestion: new list for corpus run announcements and discussion
On Thu, Oct 13, 2005 at 12:14:27PM -0400, Theo Van Dinter wrote: > I suggest to people who volunteer to do nightly/weekly runs that they > subscribe to dev to keep up with any issues regarding changes that > affect them. I've heard complaints that there's too much mail on dev > to keep up, or changes may be missed because they're only in a commit > mail which most of those folks ignore. > > So I'd like to make a new list (not sure about name) where we can have > discussions/send announcements/etc that are directly relevent to those > folks. Thoughts/votes? (do we need to vote for a new list?) [EMAIL PROTECTED] ? +1 -- Duncan Findlay signature.asc Description: Digital signature
Re: 3.1.0rc3, or general release?
On Mon, Sep 12, 2005 at 06:25:26PM -0700, Justin Mason wrote: > I think 4570 is the only relatively big one there -- and it was a > one-liner. Given that, I think we could do the general release of 3.1.0, > instead of another RC. > > Personally, I'd like to get 3.1.0 out. > > Thoughts? I'd be more in favour of 3.1.0 rc3, to get some last minute testing in before we release. (It'd be nice for 3.1.0 to be rock solid, so we don't need to release 3.1.1 right away...) We've dragged the process on long enough that another couple days won't hurt. :-) -- Duncan Findlay
Re: Debian Packages for 3.1.0-rc2
On Wed, Aug 31, 2005 at 02:23:14PM -0500, Chris Thielen wrote: > Installed cleanly for me on my Sarge box. I merged my local.cf into the > updated package one. > > Aug 31 14:04:27 ns1 spamd[4269]: Can't locate Sys/Hostname/Long.pm in @INC > (@INC contains: ../lib /usr/share/perl5 /etc/perl /usr/local/lib/perl/5.8.4 > /usr/local/share/perl/5.8.4 /usr/lib/perl5 /usr/lib/perl/5.8 > /usr/share/perl/5.8 /usr/local/lib/site_perl) at > /usr/share/perl5/Mail/SPF/Query.pm line 328, line 64. > I'm getting these in mail.log, but that's because libmail-spf-query-perl > only suggests libsys-hostname-long-perl (which I don't have installed). > I believe this is a warning at worst since the package is only > suggested, and I assume it is out of your control. That should be reported to the BTS as a libmail-spf-query-perl bug, I believe. > I am seeing some other anomalous messages in the log, but I believe they > are not packaging related. Hmm... feel free to report those to SA Bugzilla. :-) -- Duncan Findlay signature.asc Description: Digital signature
Re: Spamassassin in Fedora service restart problem
On Wed, Aug 31, 2005 at 08:07:31AM -1000, Warren Togami wrote: > This is not a spamassassin bug per-se, but rather a problem in the way > we had packaged it for years. > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=141323 > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=161785 > It would be helpful if upstream could look at the excellent explanation > in Bug #161785. I am hoping there is a better solution to this problem > than the suggested shell script .pid solution suggested in that bug. > > Wouldn't it be preferable to have a solution where spamd itself returns > when it is finished stopping rather than relying on an asynchronous kill > signal? I think the solution is for spamd's --pidfile to write the pid to the file before dropping root, and this is fixed in 3.1.0. -- Duncan Findlay signature.asc Description: Digital signature
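The ordering described in that reply (write the pid file first, give up root afterwards) can be illustrated with a short sketch. This is not spamd's code, which is Perl; it is a minimal Python illustration, and the names `write_pidfile`, `drop_root` and `startup` are hypothetical.

```python
import os


def write_pidfile(path):
    """Record our pid while still privileged, so root-only directories work."""
    with open(path, "w") as fh:
        fh.write(f"{os.getpid()}\n")


def drop_root(uid, gid):
    """Give up root: supplementary groups first, then gid, then uid."""
    os.setgroups([])
    os.setgid(gid)
    os.setuid(uid)


def startup(pidfile, uid, gid, drop=drop_root):
    """Write the pid file *before* dropping privileges, never after."""
    write_pidfile(pidfile)
    drop(uid, gid)
```

With this ordering an init script can read the pid as soon as the pid file appears, and the file can live in a directory that the unprivileged user cannot write to. The `drop` parameter exists only so the ordering can be exercised without actually being root.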
Debian Packages for 3.1.0-rc2
On Mon, Aug 29, 2005 at 11:41:39PM -0400, Duncan Findlay wrote: > *** THIS IS A RELEASE CANDIDATE ONLY, NOT THE FINAL 3.1.0 RELEASE *** > > SpamAssassin 3.1.0-rc2 is released! SpamAssassin 3.1.0 is a major > update. SpamAssassin is a mail filter which uses advanced statistical > and heuristic tests to identify spam (also known as unsolicited bulk > email). Debian packages for 3.1.0-rc2 are available from the experimental distribution (version 3.0.99pre3.1.0+rc2-1). I'd appreciate help testing them, so that all the bugs in the packaging can be worked out by the time 3.1.0 is uploaded to unstable. Packages.debian.org hasn't been updated yet, but if you have the appropriate line for experimental in your /etc/apt/sources.list, you can try: # apt-get install spamassassin/experimental spamc/experimental Or you can download the files from your favourite (up-to-date) mirror: ftp://ftp.us.debian.org/debian/pool/main/s/spamassassin/spamassassin_3.0.99pre3.1.0+rc2-1_all.deb ftp://ftp.us.debian.org/debian/pool/main/s/spamassassin/spamc_3.0.99pre3.1.0+rc2-1_i386.deb All bugs in the Debian packaging should be reported to the Debian BTS, not the SpamAssassin bugzilla. Thanks, -- Duncan Findlay
Re: body rule speed
On Tue, Aug 30, 2005 at 06:29:20AM -0700, Dale Luck wrote: > In my study of where SA is spending most of its time, it became > quickly apparent the do_body_tests is by far the largest cpu hog. > Indeed i've seen just a single file (sare_fraud) can use up half of > the cpu cycles for every spam scan. > > I was wondering if anyone investigating flipping inside out the > algorithm used to apply the rules to the body. I believe I tried to look at this one time, but it got pretty messy to hack that in and I didn't have enough time to spend on it. Any speedup seemed to be minimal, but it might be worth looking into in greater detail. Also, I'm not convinced study helps a whole lot. Having said that, some of our regular expressions could probably be tuned better so that study helps more. The case insensitive thing can be a very large speedup; however, we do have many tests that rely on capitalization. We'd need a way of splitting them up or something, since we definitely need some case sensitive rules. -- Duncan Findlay
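One reading of the "splitting them up" idea above can be sketched as follows. This is only an illustration in Python, not how SpamAssassin (which is Perl) evaluates body rules. It assumes that case-insensitive rules are authored with lowercase literals, so each body line can be lowercased once and matched without a per-rule case-insensitive flag. `BodyRule`, `split_by_case`, `run_body_rules` and the `DEMO_*` rule names are all hypothetical.

```python
import re
from typing import NamedTuple


class BodyRule(NamedTuple):
    name: str
    pattern: str       # assumed lowercase when ignore_case is True
    ignore_case: bool


def split_by_case(rules):
    """Partition rules into (case-insensitive, case-sensitive) groups."""
    insensitive = [r for r in rules if r.ignore_case]
    sensitive = [r for r in rules if not r.ignore_case]
    return insensitive, sensitive


def run_body_rules(rules, body_lines):
    """Return the names of rules that hit anywhere in the body."""
    insensitive, sensitive = split_by_case(rules)
    ci = [(r.name, re.compile(r.pattern)) for r in insensitive]
    cs = [(r.name, re.compile(r.pattern)) for r in sensitive]
    hits = set()
    for line in body_lines:
        folded = line.lower()  # fold case once per line, not once per rule
        for name, rx in ci:
            if name not in hits and rx.search(folded):
                hits.add(name)
        for name, rx in cs:
            if name not in hits and rx.search(line):
                hits.add(name)
    return hits
```

For example, a rule `BodyRule("DEMO_STOCK", r"st[o0]ck", True)` would hit the line "Hot ST0CK tip", while a case-sensitive `BodyRule("DEMO_SHOUT", r"\bFREE\b", False)` would hit "FREE offer" but not "get it free". Whether this buys any real speed would have to be measured.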
ANNOUNCE: SpamAssassin 3.1.0-rc2 release candidate available!
ugins: DomainKeys (off by default), MIMEHeader: a new plugin to perform tests against header in internal MIME structure, ReplaceTags: plugin by Felix Bauer to support fuzzy text matching, WhiteListSubject: plugin added to support user whitelists by Subject header.
- TextCat language guesser moved to a plugin. (This means "ok_languages" is no longer part of the core engine by default.)
- Razor: disable Razor2 support by default per our policy, since the service is not free for non-personal use. It's trivial to reenable.
- DCC: disable DCC for similar reasons, due to new license terms.
- Net::DNS bug: high load caused answer packets to be mixed up and delivered as answers to the wrong request, causing false positives. worked around.
- DNSBL lookups and other DNS operations are now more efficient, by using a custom single-socket event-based model instead of Net::DNS.
- add support for accreditation services, including Habeas v2.
- better URI parsing -- many evasion tricks now caught.
- URIBL lookups are prioritized based on the location in the message the URI was found.
- mass-check now supports reusing realtime DNSBL hit results, and sample-based Bayes autolearning emulation, to reduce complexity.
- sa-learn, spamassassin and mass-check now have optional progress bars.
- modify header ordering for DomainKeys compatibility, by placing markup headers at the top of the message instead at the bottom of the list.
- spamd/spamc now support remote Bayes training, and reporting spam.
- spamc now supports reading its flags from a configuration file using the -F switch, contributed by John Madden.
- added SPF-based whitelisting.
- Polish rules contributed by Radoslaw Stachowiak.
- many rule changes and additions.
-- Duncan Findlay
[vote] Release 3.1.0-rc2
I hereby propose that we release Apache SpamAssassin 3.1.0-rc2 http://people.apache.org/~duncf/devel/ md5sum of archive files: 1e2ecf555d62deae136b08fb482e8f68 Mail-SpamAssassin-3.1.0-rc2.tar.bz2 41fe5c0c5ab226e0d33de20c10f69240 Mail-SpamAssassin-3.1.0-rc2.tar.gz 91bc48f87eb520040ece42dced886243 Mail-SpamAssassin-3.1.0-rc2.zip sha1sum of archive files: a68a040c2b2c51d7284fbd15336e639a32a0d45d Mail-SpamAssassin-3.1.0-rc2.tar.bz2 a20f3d82743186af085fac1deb540c22ebdc8ce1 Mail-SpamAssassin-3.1.0-rc2.tar.gz f76cc96981c6766d48edd6ed60c621036a9dfcf5 Mail-SpamAssassin-3.1.0-rc2.zip I vote +1. For those following the commits mailing lists, yes, this is my first build. ;-) The GPG signatures are not available at the above URL since I do not yet have access to the GPG signing key. -- Duncan Findlay signature.asc Description: Digital signature
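For anyone wanting to check a downloaded archive against the sums in the vote message, here is a minimal sketch in Python. The digests are copied from the message above; the helper names `file_digest` and `verify` are hypothetical, and the archive is assumed to sit in the current directory.

```python
import hashlib

# Digests copied from the vote message above.
PUBLISHED = {
    "Mail-SpamAssassin-3.1.0-rc2.tar.bz2": {
        "md5": "1e2ecf555d62deae136b08fb482e8f68",
        "sha1": "a68a040c2b2c51d7284fbd15336e639a32a0d45d",
    },
    "Mail-SpamAssassin-3.1.0-rc2.tar.gz": {
        "md5": "41fe5c0c5ab226e0d33de20c10f69240",
        "sha1": "a20f3d82743186af085fac1deb540c22ebdc8ce1",
    },
    "Mail-SpamAssassin-3.1.0-rc2.zip": {
        "md5": "91bc48f87eb520040ece42dced886243",
        "sha1": "f76cc96981c6766d48edd6ed60c621036a9dfcf5",
    },
}


def file_digest(path, algorithm, chunk_size=65536):
    """Hash a file in chunks so large archives need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected):
    """True only if every published digest matches the file on disk."""
    return all(
        file_digest(path, algorithm) == digest.lower()
        for algorithm, digest in expected.items()
    )
```

Usage would be along the lines of `verify(name, PUBLISHED[name])` for each downloaded file. A matching checksum only shows the download was not corrupted; it is no substitute for the GPG signature once that becomes available.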
RC2?
Remaining bugs in 3.1.0 queue:

4494 nor ASSI [review] sa-learn uses local_tests_only=0 which can mess ...
  Patch ready, needs one more vote.
4552 maj NEW [review] Unitialized value warnings in spamd
  Patch ready, needs two more votes.
4558 nor NEW oscommerce ships with an open redirector
  Punt to 3.1.1? Certainly not a release blocker, but looks trivial, someone just needs to put together a patch.

I'm happy to get RC2 ready, though I need access to the private signing key. (Anyone want to send that to me via GPG encrypted mail?) But first, we need to fix at least 4494 and 4552. Please, committers, take a second to review those bugs.

Thanks, -- Duncan Findlay
Re: Wanted: Better Documentation
On Wed, Aug 24, 2005 at 03:54:19PM -0700, Daniel Quinlan wrote: > I doubt this will work any better than just working on it incrementally > in SVN. Well, I think anything that lowers the bar for submissions is going to help somewhat. It's much more of a hassle to submit a bug report for a minor typo as compared to going and fixing it yourself. Now all we need to do is to throw a link into the documentation so we can catch users as they're first reading it. Then the fixes will start pouring in. (I'm hoping) I think it might be a little *more* work on me (or whoever else wants to copy stuff over), but also will yield *more* fixes, don't you think? > If anyone here would like to help us work on documentation, submit some > patches via bugzilla and if we like them, the PMC will consider giving > you commit access. We could do this with Wiki contributions too, just as easily; without the Bugzilla overhead. So to summarise: to help contribute documentation, either a) Post to the Wiki, or b) File a bug (preferably with a patch). If we like what you do, we'll consider giving commit access. -- Duncan Findlay signature.asc Description: Digital signature
Wanted: Better Documentation
Certain parts of SpamAssassin's documentation are horribly out of date and could really use some help. (For example spamd/README still recommends people set --max-children to 20, and has numerous "FIXME" sections.) Since it's often a bit arduous to file a bug and submit a patch to make corrections, I've put all our documentation up on the Wiki, so it's now super easy to edit. Please go through it if you get a chance. Periodically, we'll go through it and apply the changes to SVN. It's my hope that this will lead to more accurate and useful documentation. So, here's the link: http://wiki.apache.org/spamassassin/BetterDocumentation Happy editing! We appreciate your help. -- Duncan Findlay
Re: proposed branch policy change
On Mon, Aug 15, 2005 at 10:37:01AM -0700, Justin Mason wrote: > Daniel Quinlan writes: > > I propose that new branches default to CTR mode and only enter RTC if > > explicitly made so. All existing branches are RTC mode, of course. +1 I agree that the act of branching and the decision to go to RTC should not necessarily be made at the same time. -- Duncan Findlay signature.asc Description: Digital signature
Re: VOTE: the rules project
On Sat, Aug 13, 2005 at 03:23:58PM -0700, Justin Mason wrote: > Based on email from the last few weeks, I think we're all pretty happy > with the sandboxes idea as described on > http://wiki.apache.org/spamassassin/RulesProjSandboxes . It's also the > first step on the way to all the other listed ideas. Given that, I've come > up with a task list to get us there. So can we get votes, both for the > plan, and for the tasks? Here they are: I'm not sure the tasks really need to be voted on, I mean, would it make sense if we approved all but number 3? :-) > - PMC: vote to approve the sandboxes project > - reorganise the rules directory into core/ , sandbox/, and extra/; link > that rules project SVN repository to 3.2.0's 'rules' dir; use SVN > externals to do this. > - write scripts to test, filter, and pull rules from sandboxes > automatically into core/ production ruleset > - move current ruleset into a new "legacy" sandbox > - start using the above scripts to generate core/ ruleset in svn +5 i.e. +1 for each of the above. -- Duncan Findlay signature.asc Description: Digital signature
Re: svn commit: r232534 - /spamassassin/branches/3.1/lib/Mail/SpamAssassin/BayesStore/MySQL.pm
On Sat, Aug 13, 2005 at 10:02:23PM -, [EMAIL PROTECTED] wrote: > Author: parker > Date: Sat Aug 13 15:02:22 2005 > New Revision: 232534 > > URL: http://svn.apache.org/viewcvs?rev=232534&view=rev > Log: > Fix typo Yeah, sorry, I meant to fix both trunk and 3.1, but got distracted half way through. :-) Duncan signature.asc Description: Digital signature
Re: [Spamassassin Wiki] Update of "DnsblAccuracy082005" by JustinMason
On Fri, Aug 12, 2005 at 10:52:55PM -, Apache Wiki wrote: > The following page has been changed by JustinMason: > http://wiki.apache.org/spamassassin/DnsblAccuracy082005 > > The comment on the change is: > everybody likes DNSBL stats ;) > > New page: > = DNS Blocklist Accuracy Figures (as of July 2005) = [...] > * hits are recorded from 'live' data at the time the messages were > received, not post-facto testing (using 'mass-check --reuse') I don't think *everyone* used --reuse. :-( So these stats are probably not nearly as valid as we'd like to think. (I, for example, did not use --reuse.) -- Duncan Findlay signature.asc Description: Digital signature
Re: change R-T-C rules
On Wed, Aug 10, 2005 at 08:13:16PM -0700, Justin Mason wrote: > I propose we change our procedures to lower the number of +1's required > for code changes during R-T-C, from 3 to 2. Please vote. I'm not convinced that we've been blocked by the review process. Bug 4505 has taken forever to resolve, but it's primarily a lack of patches for review, or, perhaps it's not been made clear when a review is needed. Either way, I don't think the above proposal would have helped us. I'm -1 (assuming we need a majority; this is not a veto), until I can be convinced that this would actually speed up the release process by a meaningful amount. :-) -- Duncan Findlay signature.asc Description: Digital signature
Re: PROPOSAL: create "SpamAssassin Rules Project"
On Tue, Jul 26, 2005 at 02:33:12PM -0400, Chris Santerre wrote: > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > > Herb Martin writes: > > > Normally in an open source project anyone who wishes to > > > listen, lurk, and read or even use the bleeding edge code > > > is free to do so to learn and get into the frame of the > > > project. > > > > > > That cannot be true (to the same extent) if there are > > > security layers that make such gradual involvement > > > difficult. > > > > Yep, this is entirely true -- and this is the reason why the > > ASF suggests > > that lists should be open if at all possible. > > > > It's a tricky conundrum -- need to think about this some more... > > I don't see official rules majorly discussed in the open now. With a new > release of SA, you don't go into detail about what new rules are looking > for, so why should that change. If they aren't discussed in the open right now, they aren't being discussed. :-) The development process is perfectly open right now, yet it's not a problem; i.e. we don't have any evidence that spammers are exploiting this. > People who update from SARE, just hear: "Hey .cf got updated." And they > go and get it. Or they don't even know it gets updated and the RDJ script > does it. So public is pretty good at just accepting the rule updates. Yes, but it's difficult for people to join SARE, or learn what goes into rule development. If all the development takes place in private, then there's no way for newcomers to join and this is a really bad thing. > Having an open public discussion on new rule ideas, pretty much defeats the > purpose. I'd like to see the data that supports this claim. I'm really skeptical. -- Duncan Findlay signature.asc Description: Digital signature
Hackathon summary
Just thought I'd post a quick note about the hackathon that took place today at Stanford University. "We" below refers to Justin Mason, Daniel Quinlan, Michael Parker and me. Matt Sergeant was also present for a while, so "we" can include him too for some of the following items. :-)

Discussion
* We discussed at length the ideas for the new rules project, and we came up with some ideas, which we're trying to track at http://wiki.apache.org/spamassassin/RulesProjectPlan (Please give us your feedback)
* We discussed the 3.2 release goals (http://wiki.apache.org/spamassassin/ReleaseGoals)
* Dr. Andrew Ng gave us a brief presentation of how Logistic Regression may be an algorithm we could use in the future to replace the perceptron.

Development
* We came up with a plan to restructure PerMsgStatus.pm so it's not so unwieldy and out of control. (http://bugzilla.spamassassin.org/show_bug.cgi?id=4497)
* We branched the tree so we could start committing stuff to HEAD. Strangely, however, we got almost no coding done.

QA/Bugs
* We went through all the bugs targeted for 3.1.0 and triaged them. (All the bugzilla comments from me today were really from all of us that were present.)
* We added a "moreinfo" keyword to bugzilla for bugs that are in need of more info. One side effect of this is that we'll need to remove that keyword when more info is actually given. :-)

That's about all. -- Duncan Findlay
Branch 3.1?
I think it's time to branch 3.1 (also so Justin, Dan, Michael and I can get some work done while we're here). Votes? +1 -- Duncan Findlay signature.asc Description: Digital signature
Re: PROPOSAL: create "SpamAssassin Rules Project"
On Wed, Jul 20, 2005 at 11:53:25PM -0700, Robert Menschel wrote: > Hello Duncan, > > Wednesday, July 20, 2005, 9:07:15 PM, you wrote: > > >> The SARE list is private and invitation only for exactly these reasons. > > DF> I'm *really worried* about proposals that involve mailing lists that > DF> have only private archives and require moderator approval for > DF> subscription. It just doesn't feel right for an open source project. > > Agreed. But you do secure the security-bug submissions from > publicly accessible lists and archives... Leaking rules to the public doesn't compromise users' systems! Obviously there is a tradeoff. > DF> It's quite possible that this drives people away. In fact I'm quite > DF> sure people are less likely to get involved if they have to somehow > DF> prove that they aren't a spammer in order to subscribe. > > Yes, but you also don't want spammers wrecking the system, making it > useless. There's a viable balance somewhere... Agreed. -- Duncan Findlay
Re: PROPOSAL: create "SpamAssassin Rules Project"
On Wed, Jul 20, 2005 at 11:35:20PM -0700, Loren Wilton wrote: > > I'm *really worried* about proposals that involve mailing lists that > > have only private archives and require moderator approval for > > subscription. It just doesn't feel right for an open source project. > > I understand the feeling. I'm trying to balance the obvious desire for a > completely public process with the absolutely known fact that publishing a > rule in the user's group will literally within hours lead to the rule > becoming useless in many cases. I guess you'd have better data than I would; but I'm still having trouble believing that Spammers are adjusting on that time frame. > (I've even a couple of times as a test given the bodies for slightly bogus > rules out - that detected a not particularly useful spam sign - to see if > the spam sign disappeared, and how quickly. Indeed, the signs would usually > disappear. One could probably conclude something about the spam gang using > a particular sign from how quickly after publication of a rule the sign > disappears; but I'm not particularly interested in that form of research.) > > This led to my twofold suggestion that a) entry to the group be moderated, > and b) the archives be embargoed for a week or two, or perhaps a month. But how do we know who should be allowed access to the group? I definitely prefer delayed archives to closed ones. > For instance, on many projects to be a developer you have to be > admitted to developer access to the source. Others can look at the > source and make their own versions, but can't necessarily modify the > actual project source unless the local gods approve of them. (See > for instance the description of the Audacity project over at SF, > which I was looking at earlier today.) I'm really not sure what you mean here. Audacity is licensed under the GPL. 
The main difference between the GPL and the Apache license (IIRC, IANAL, etc) is that with the GPL, if you do make changes and distribute a changed version, you need to distribute the source of the changed version. I'm sure they have the same procedures with respect to modifying the official project source as we do, namely there is a group of committers that have access to do this, everyone else gets to submit patches to them. (And I'm not sure what you mean by "local gods". Most developers are human, at least the ones I've met in person... :-P ) -- Duncan Findlay signature.asc Description: Digital signature
Re: NOTICE: rescore mass-checks
On Wed, Jul 20, 2005 at 01:22:47PM -0700, Justin Mason wrote: > hmm -- do they have to have a ham file? I don't think there's a need for > that to be a rule. I think there is. Without spam, I'd be pretty leery of the quality of the corpus, but specifically the quality of the BAYES results. -- Duncan Findlay
Re: PROPOSAL: create "SpamAssassin Rules Project"
On Wed, Jul 20, 2005 at 11:37:20AM -0400, Chris Santerre wrote: > Perhaps some thing like the dev "bug squish events" could be used? Once a > week the people who run SARE rule sets check to see the biggest hitters, and > on that day we test those heavy hitters against a bigger corpus, and look to > add to SA. Successful ones get moved out of SARE and into SA. Interesting idea. I think I'd like to see more of the development take place under the Apache umbrella, so that the failures and the successes are available to all rather than just the SARE people. Also, if people get bogged down and don't get a chance to submit rules, etc, it's easier for other people to take over. With Dan's sandbox idea, that can certainly happen - the development would take place separate from SA, but still in Apache so that rules can easily be brought in to the main distribution, at some interval, say, once a week? :-) -- Duncan Findlay signature.asc Description: Digital signature
Re: PROPOSAL: create "SpamAssassin Rules Project"
On Wed, Jul 20, 2005 at 08:44:26PM -0700, Robert Menschel wrote: > Indeed, it's not uncommon for a rule or ruleset to be checked 2-3 > times with knowingly excessive regexes, so we can see what actually is > or isn't being matched in various regex hits. We use this information > to improve the rule, and then remove the excess to the regex for a > final pre-publication run. When more of the committers were actually writing rules, we'd do the same thing, we'd commit some giant number of rules (up to 20, for example) and wait till the next day when the results came back. Sure, it's a lot nicer to be closer to real time! -- Duncan Findlay signature.asc Description: Digital signature