Eric Shubert wrote:
Jon Ernster wrote:
Eric Shubert wrote:
Jon Ernster wrote:
Eric Shubert wrote:
Jon Ernster wrote:
Figured I'd share something I wrote that others here might use since
many of you have shared or helped me out as well.

I wrote this because when I go on vacation I usually shut down the
laptop whose mail rules send all my spam to the spam folder.  By the
time I get back, the shell script I run daily (courtesy of Jake
Vickers) gives this error because there are too many spam files:

/root/learn-spam: /usr/bin/sa-learn: /usr/bin/perl: bad interpreter:
Argument list too long
/root/learn-spam: line 10: /bin/rm: Argument list too long

So I just wrote this to process the files individually.  Not the fastest
script in the world (because of SpamAssassin, not because of my code,
obviously) ;), but it works.

J.

------------------------------------------------------------------------

#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Std;

#~ $Id: learn-spam.pl 15 2008-07-03 17:27:09Z jernster $

my %opts = ();

getopts( 'd:u:', \%opts );

my ( $domain, $user ) = @opts{ qw( d u ) };

my $usage =<<EOF;
Usage:

   $0 -d example.com -u user

EOF

die "$usage" unless ( $domain && $user );

my $dir = "/home/vpopmail/domains/$domain/$user/Maildir/cur";
my $starttime = time;
my $count = 0;

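# Stop early if the Maildir cannot be read; directory handles need closedir
opendir(DIR, $dir) or die "Cannot open $dir: $!\n";
my @files = readdir(DIR);
closedir(DIR);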

foreach my $file ( @files )
{
   if ( $file =~ /^\./ )
   {
      next;

   }
   else
   {
      my $fpfile = "$dir/$file";

      $count++;

      print "Learning SPAM - $file\n";
      # List form bypasses the shell, so odd characters in file names are safe
      system( "/usr/bin/sa-learn", "--spam", $fpfile );

      print "Deleting $file\n";

      unlink($fpfile);

   }

}

print "Syncing databases...\n";
system("/usr/bin/sa-learn --sync");

print "De-linting files...\n";
system("/usr/bin/spamassassin --lint");

system("chown vpopmail:vchkpw /home/vpopmail/.spamassassin/*");

system("/usr/bin/qmail-spam restart");

print "Done!\n";

my $duration = time - $starttime;

print "\nTotal duration: $duration seconds\n";
print "Processed $count SPAM files.\n";
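
Since the per-file sa-learn calls are what make the script slow, one
possible variation is to hand sa-learn a limited batch of files per call,
which stays under the argument-list limit but starts far fewer processes.
This is only a sketch, not part of the script above: the batches() helper,
the batch size of 100 and the @fullpaths name are made up for illustration.

```perl
use strict;
use warnings;

# Split a list of file paths into batches of at most $size entries.
# (Hypothetical helper, not part of the original script.)
sub batches {
    my ( $size, @paths ) = @_;
    die "batch size must be positive\n" unless $size > 0;

    my @out;
    while (@paths) {
        # splice removes up to $size paths from the front of the list
        push @out, [ splice( @paths, 0, $size ) ];
    }
    return @out;
}

# Assumed usage: one sa-learn process per batch instead of one per file.
# @fullpaths would hold the "$dir/$file" paths built in the main loop.
#
# for my $batch ( batches( 100, @fullpaths ) ) {
#     system( '/usr/bin/sa-learn', '--spam', @$batch ) == 0
#         or warn "sa-learn failed on a batch\n";
# }
```

With this approach the files in a batch would only be deleted after the
sa-learn call for that batch returned successfully.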


Thanks, Jon. I wish we had a little more of this.

Observations:
.) I like the way you've handled parameters
.) Is this learning everything in the user's cur directory as spam? Doesn't
seem appropriate to me
.) all sa-learn and spamassassin commands need to be run as user vpopmail.
How is that happening?
.) qmail-spam is usually in /usr/sbin, not /usr/bin

Eric,

I create a user named spam and I forward everything to that
user/directory.  You're right that this wouldn't be ideal for someone to
run this against their personal mail box.  Alternatively this could
easily be modified to learn email as HAM instead of SPAM.
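
A rough idea of what that modification could look like, assuming a new -t
option were added to the getopts call (the option letter and the
learn_flag() helper are invented here; --spam and --ham are the real
sa-learn switches):

```perl
use strict;
use warnings;

# Map the value of a hypothetical -t option to the matching sa-learn flag.
# Defaults to spam so the existing behaviour is unchanged.
sub learn_flag {
    my ($type) = @_;
    $type = 'spam' unless defined $type;

    my %flag = ( spam => '--spam', ham => '--ham' );
    my $key  = lc $type;

    die "unknown type '$type' (expected spam or ham)\n"
        unless exists $flag{$key};
    return $flag{$key};
}

# Assumed usage inside the loop, after getopts( 'd:u:t:', \%opts ):
#
# system( '/usr/bin/sa-learn', learn_flag( $opts{t} ), $fpfile );
```

The file deletion would probably need to be skipped when learning ham,
since those are messages someone wants to keep.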

I just run the script as root - seems to work.
It would update root's bayes database ok (the database is created if it
doesn't exist). That's not the same database which is used on incoming mail
though.
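
One way to check which directory is in play: unless bayes_path is set,
SpamAssassin keeps its bayes files under ~/.spamassassin of whoever runs
it, so the home directory of the account decides the location. A small
sketch (the bayes_dir_for() helper is made up for illustration, and it
ignores any bayes_path override in local.cf):

```perl
use strict;
use warnings;

# Return the default per-user SpamAssassin directory for an account,
# or undef if the account does not exist. Hypothetical helper; assumes
# no bayes_path override in the configuration.
sub bayes_dir_for {
    my ($user) = @_;

    # Element 7 of getpwnam's list is the account's home directory
    my $home = ( getpwnam($user) )[7];
    return undef unless defined $home;

    return "$home/.spamassassin";
}

# Assumed usage: compare the two locations before trusting a run as root.
#
# for my $user (qw( root vpopmail )) {
#     my $dir = bayes_dir_for($user);
#     print "$user: ", ( defined $dir ? $dir : 'no such user' ), "\n";
# }
```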

[EMAIL PROTECTED] ~]# perl learn-spam.pl -d dumbfounded.net -u spam
Learning SPAM -
1215351471.M922793P3482V000000000000004AI098D13EC_417.vps.dumbfounded.net,S=2432:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting
1215351471.M922793P3482V000000000000004AI098D13EC_417.vps.dumbfounded.net,S=2432:2,
Learning SPAM -
1215370267.M407813P30062V000000000000004AI098D1994_2.vps.dumbfounded.net,S=2260:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting
1215370267.M407813P30062V000000000000004AI098D1994_2.vps.dumbfounded.net,S=2260:2,
Learning SPAM -
1215344032.M28632P3482V000000000000004AI098D1066_416.vps.dumbfounded.net,S=3416:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting
1215344032.M28632P3482V000000000000004AI098D1066_416.vps.dumbfounded.net,S=3416:2,
Learning SPAM -
1215381127.M194384P30062V000000000000004AI098D1A54_5.vps.dumbfounded.net,S=1986:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting
1215381127.M194384P30062V000000000000004AI098D1A54_5.vps.dumbfounded.net,S=1986:2,
Learning SPAM -
1215361414.M459353P3482V000000000000004AI098D161C_419.vps.dumbfounded.net,S=2887:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting
1215361414.M459353P3482V000000000000004AI098D161C_419.vps.dumbfounded.net,S=2887:2,
Learning SPAM -
1215366247.M890863P30062V000000000000004AI098D1972_0.vps.dumbfounded.net,S=20120:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting
1215366247.M890863P30062V000000000000004AI098D1972_0.vps.dumbfounded.net,S=20120:2,
Learning SPAM -
1215343789.M956277P3482V000000000000004AI098D083E_415.vps.dumbfounded.net,S=3323:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting
1215343789.M956277P3482V000000000000004AI098D083E_415.vps.dumbfounded.net,S=3323:2,
Learning SPAM -
1215336171.M990235P3482V000000000000004AI098D105C_414.vps.dumbfounded.net,S=2384:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting
1215336171.M990235P3482V000000000000004AI098D105C_414.vps.dumbfounded.net,S=2384:2,
Learning SPAM -
1215379388.M717118P30062V000000000000004AI098D1A26_4.vps.dumbfounded.net,S=15039:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting
1215379388.M717118P30062V000000000000004AI098D1A26_4.vps.dumbfounded.net,S=15039:2,
Learning SPAM -
1215368887.M166353P30062V000000000000004AI098D1980_1.vps.dumbfounded.net,S=2380:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting
1215368887.M166353P30062V000000000000004AI098D1980_1.vps.dumbfounded.net,S=2380:2,
Learning SPAM -
1215374947.M536410P30062V000000000000004AI098D19C6_3.vps.dumbfounded.net,S=2662:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting
1215374947.M536410P30062V000000000000004AI098D19C6_3.vps.dumbfounded.net,S=2662:2,
Learning SPAM -
1215359554.M445829P3482V000000000000004AI098D108A_418.vps.dumbfounded.net,S=6187:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting
1215359554.M445829P3482V000000000000004AI098D108A_418.vps.dumbfounded.net,S=6187:2,
Syncing databases...
De-linting files...
Use of uninitialized value in concatenation (.) or string at
/usr/lib/perl5/5.8.8/i386-linux-thread-multi/Scalar/Util.pm line 30.
Use of uninitialized value in concatenation (.) or string at
/usr/lib/perl5/5.8.8/i386-linux-thread-multi/Scalar/Util.pm line 30.
Restarting spamd....

/var/qmail/supervise/spamd: up (pid 9686) 2 seconds
/var/qmail/supervise/spamd/log: up (pid 9687) 2 seconds
Done!


Total duration: 83 seconds
Processed 12 SPAM files.
[EMAIL PROTECTED] ~]# ll /home/vpopmail/.spamassassin/
total 24984
-rw-------  1 vpopmail vchkpw   20967424 Jul  6 15:00 auto-whitelist
-rw-------  1 vpopmail vchkpw          5 Jun  9  2007 bayes.mutex
-rw-------  1 vpopmail vpopmail    24648 Jul  6 15:00 bayes_journal
-rw-------  1 vpopmail vchkpw    5214208 Jul  6 15:00 bayes_seen
-rw-------  1 vpopmail vchkpw    5398528 Jul  6 15:00 bayes_toks

Do you perhaps have bayes_path defined in your local.cf file? That would
force sa to use the correct location. It's not a stock toaster setting
though iirc. Perhaps that has been changed.

Nope, I haven't made any custom mods to my local.cf
I commented out the system command that chown's the files in that
directory.  It doesn't seem to change the ownership of the files to root
if it's run as root so I don't really see a problem if it's not run as
the vpopmail user.

IIRC the ownership only gets changed to root when it does expiration
processing, which can happen automatically or manually depending on your
configuration settings.

Location is pretty much irrelevant as long as it's somewhere in the
PATH, so that it doesn't have to be referenced with the full path.
That's a little confusing to me. If the full path isn't necessary, why
specify one (especially an incorrect one)? This is typically done for
security purposes (in case $PATH is somehow modified).
I'm just used to specifying the path to files.  Even if /usr/bin is
in the path of the user you execute the script from, that doesn't mean
the path is present if you decide to run the script as a
cron job...therefore specifying the full path would save you there.

That's the safest thing to do all right, but what if it's pointing to the
wrong directory? (as I think might be the case with qmail-spam)
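
A middle ground between a hard-coded path and relying on $PATH would be
to probe a short, fixed list of directories and use the first one that
holds an executable copy. A sketch only; the find_tool() helper and the
directory list are assumptions, not something either script does today:

```perl
use strict;
use warnings;

# Return the full path of the first executable named $name found in
# @dirs, or undef if none of the directories has one.
# (Hypothetical helper for illustration.)
sub find_tool {
    my ( $name, @dirs ) = @_;

    for my $dir (@dirs) {
        my $path = "$dir/$name";

        # -f follows symlinks, so a symlinked script still counts;
        # "-x _" reuses the stat result from the -f test.
        return $path if -f $path && -x _;
    }
    return undef;
}

# Assumed usage: works whether qmail-spam lives in /usr/sbin or /usr/bin,
# and does not depend on the PATH that cron provides.
#
# my $qmail_spam = find_tool( 'qmail-spam', '/usr/sbin', '/usr/bin' )
#     or die "qmail-spam not found\n";
# system( $qmail_spam, 'restart' );
```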

I wrote this script to work for my environment, I don't expect this to
work flawlessly on every toaster install.  Some people might not even
have the qmail-spam script.  I fail to see how the path is "incorrect"
though.  Because it doesn't work on your server?  Then change the path
in the script. ;)

The location for qmail-spam, as installed with qmailtoaster-plus, is
/usr/sbin (actually there's a symlink there to its real location, per LSB).
If your qmail-spam is in /usr/bin, I would expect that it's an old version
(not that it has changed any since it was first written).
# locate qmail-spam ?

Ah, I see where the confusion is. I'm not using qmailtoaster-plus - I started using qmail-spam before qt-plus existed and just put it in /usr/bin because it was included in my path.
