Eric Shubert wrote:
Jon Ernster wrote:
Eric Shubert wrote:
Jon Ernster wrote:
Figured I'd share something I wrote that others here might use since
many of you have shared or helped me out as well.

I wrote this because when I go on vacation I usually shut down the
laptop that has my mail rules, which send all my spam to the spam
folder.  By the time I get back the shell script that I have that runs
on a daily basis (courtesy of Jake Vickers) gives this error because
there are too many spam files:

/root/learn-spam: /usr/bin/sa-learn: /usr/bin/perl: bad interpreter:
Argument list too long
/root/learn-spam: line 10: /bin/rm: Argument list too long

So I just wrote this to process the files individually.  Not the fastest
script in the world (because of SpamAssassin, not because of my code,
obviously) ;), but it works.
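
A middle ground between one huge argument list and one sa-learn run per file is to hand sa-learn the files in fixed-size batches. The following is only a sketch: the `batches` helper and the batch size of 200 are hypothetical, not part of the script below, and it assumes sa-learn is given several file paths per call.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a list into batches of at most $size entries, so that each
# sa-learn call receives a bounded argument list.
# (Hypothetical helper, not part of the original script.)
sub batches {
    my ( $size, @items ) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @out;
    push @out, [ splice( @items, 0, $size ) ] while @items;
    return @out;
}

# Possible use, assuming @paths holds full paths to the spam files:
# for my $batch ( batches( 200, @paths ) ) {
#     system( '/usr/bin/sa-learn', '--spam', @$batch ) == 0
#         or warn "sa-learn exited with status $?\n";
# }
```

Fewer sa-learn start-ups should cut the run time noticeably, since most of the time per file goes into loading SpamAssassin rather than into learning the message.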

J.

------------------------------------------------------------------------

#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Std;

#~ $Id: learn-spam.pl 15 2008-07-03 17:27:09Z jernster $

my %opts = ();

getopts( 'd:u:', \%opts );

my ( $domain, $user ) = @opts{ qw( d u ) };

my $usage =<<EOF;
Usage:

   $0 -d example.com -u user

EOF

die "$usage" unless ( $domain && $user );

my $dir = "/home/vpopmail/domains/$domain/$user/Maildir/cur";
my $starttime = time;
my $count = 0;

opendir( my $dh, $dir ) or die "Cannot open $dir: $!\n";
my @files = readdir($dh);
closedir($dh);

foreach my $file ( @files )
{
   if ( $file =~ /^\./ )
   {
      next;

   }
   else
   {
      my $fpfile = "$dir/$file";

      $count++;

      print "Learning SPAM - $file\n";
      # List form avoids the shell, so odd characters in $fpfile are safe.
      system( "/usr/bin/sa-learn", "--spam", $fpfile );

      print "Deleting $file\n";

      unlink($fpfile);

   }

}

print "Syncing databases...\n";
system("/usr/bin/sa-learn --sync");

print "De-linting files...\n";
system("/usr/bin/spamassassin --lint");

system("chown vpopmail:vchkpw /home/vpopmail/.spamassassin/*");

system("/usr/bin/qmail-spam restart");

print "Done!\n";

my $duration = time - $starttime;

print "\nTotal duration: $duration seconds\n";
print "Processed $count SPAM files.\n";


Thanks, Jon. I wish we had a little more of this.

Observations:
.) I like the way you've handled parameters
.) Is this learning everything in the user's cur directory as spam? Doesn't
seem appropriate to me
.) all sa-learn and spamassassin commands need to be run as user vpopmail.
How is that happening?
.) qmail-spam is usually in /usr/sbin, not /usr/bin
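
One way to address the vpopmail point is to wrap each command in sudo. This is only a sketch: it assumes sudo is installed and that root may switch to vpopmail, and the `as_user` helper is hypothetical, not something the script above contains.

```perl
use strict;
use warnings;

# Build an argument list that runs @cmd as $user through sudo.
# -H sets HOME to the target user's home directory, so SpamAssassin
# would pick up that user's ~/.spamassassin rather than root's.
# (Hypothetical helper; assumes sudo is available.)
sub as_user {
    my ( $user, @cmd ) = @_;
    return ( 'sudo', '-H', '-u', $user, '--', @cmd );
}

# Possible use inside the loop, with $fpfile as in the script above:
# system( as_user( 'vpopmail', '/usr/bin/sa-learn', '--spam', $fpfile ) );
```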

Eric,

I create a user named spam and I forward everything to that
user/directory.  You're right that this wouldn't be ideal for someone to
run this against their personal mail box.  Alternatively this could
easily be modified to learn email as HAM instead of SPAM.
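
As a rough illustration of the HAM variant, the sa-learn flag could be picked from a mode name. This is a sketch only: the `learn_flag` helper and the idea of feeding it from an extra command-line option are hypothetical; `--spam` and `--ham` are the sa-learn flags it chooses between.

```perl
use strict;
use warnings;

# Map a mode name to the matching sa-learn flag.
my %flag_for = ( spam => '--spam', ham => '--ham' );

# Return the sa-learn flag for $mode, defaulting to spam.
# (Hypothetical helper, not part of the original script.)
sub learn_flag {
    my ($mode) = @_;
    $mode = 'spam' unless defined $mode;
    my $flag = $flag_for{ lc $mode }
        or die "Unknown mode '$mode' (expected spam or ham)\n";
    return $flag;
}

# Possible use in the loop:
# system( '/usr/bin/sa-learn', learn_flag($mode), $fpfile );
```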

I just run the script as root - seems to work.

It would update root's bayes database ok (the database is created if it
doesn't exist). That's not the same database which is used on incoming mail
though.

[EMAIL PROTECTED] ~]# perl learn-spam.pl -d dumbfounded.net -u spam
Learning SPAM - 1215351471.M922793P3482V000000000000004AI098D13EC_417.vps.dumbfounded.net,S=2432:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting 1215351471.M922793P3482V000000000000004AI098D13EC_417.vps.dumbfounded.net,S=2432:2,
Learning SPAM - 1215370267.M407813P30062V000000000000004AI098D1994_2.vps.dumbfounded.net,S=2260:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting 1215370267.M407813P30062V000000000000004AI098D1994_2.vps.dumbfounded.net,S=2260:2,
Learning SPAM - 1215344032.M28632P3482V000000000000004AI098D1066_416.vps.dumbfounded.net,S=3416:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting 1215344032.M28632P3482V000000000000004AI098D1066_416.vps.dumbfounded.net,S=3416:2,
Learning SPAM - 1215381127.M194384P30062V000000000000004AI098D1A54_5.vps.dumbfounded.net,S=1986:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting 1215381127.M194384P30062V000000000000004AI098D1A54_5.vps.dumbfounded.net,S=1986:2,
Learning SPAM - 1215361414.M459353P3482V000000000000004AI098D161C_419.vps.dumbfounded.net,S=2887:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting 1215361414.M459353P3482V000000000000004AI098D161C_419.vps.dumbfounded.net,S=2887:2,
Learning SPAM - 1215366247.M890863P30062V000000000000004AI098D1972_0.vps.dumbfounded.net,S=20120:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting 1215366247.M890863P30062V000000000000004AI098D1972_0.vps.dumbfounded.net,S=20120:2,
Learning SPAM - 1215343789.M956277P3482V000000000000004AI098D083E_415.vps.dumbfounded.net,S=3323:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting 1215343789.M956277P3482V000000000000004AI098D083E_415.vps.dumbfounded.net,S=3323:2,
Learning SPAM - 1215336171.M990235P3482V000000000000004AI098D105C_414.vps.dumbfounded.net,S=2384:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting 1215336171.M990235P3482V000000000000004AI098D105C_414.vps.dumbfounded.net,S=2384:2,
Learning SPAM - 1215379388.M717118P30062V000000000000004AI098D1A26_4.vps.dumbfounded.net,S=15039:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting 1215379388.M717118P30062V000000000000004AI098D1A26_4.vps.dumbfounded.net,S=15039:2,
Learning SPAM - 1215368887.M166353P30062V000000000000004AI098D1980_1.vps.dumbfounded.net,S=2380:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting 1215368887.M166353P30062V000000000000004AI098D1980_1.vps.dumbfounded.net,S=2380:2,
Learning SPAM - 1215374947.M536410P30062V000000000000004AI098D19C6_3.vps.dumbfounded.net,S=2662:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting 1215374947.M536410P30062V000000000000004AI098D19C6_3.vps.dumbfounded.net,S=2662:2,
Learning SPAM - 1215359554.M445829P3482V000000000000004AI098D108A_418.vps.dumbfounded.net,S=6187:2,
Learned tokens from 1 message(s) (1 message(s) examined)
Deleting 1215359554.M445829P3482V000000000000004AI098D108A_418.vps.dumbfounded.net,S=6187:2,
Syncing databases...
De-linting files...
Use of uninitialized value in concatenation (.) or string at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/Scalar/Util.pm line 30.
Use of uninitialized value in concatenation (.) or string at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/Scalar/Util.pm line 30.
Restarting spamd....

/var/qmail/supervise/spamd: up (pid 9686) 2 seconds
/var/qmail/supervise/spamd/log: up (pid 9687) 2 seconds
Done!


Total duration: 83 seconds
Processed 12 SPAM files.
[EMAIL PROTECTED] ~]# ll /home/vpopmail/.spamassassin/
total 24984
-rw-------  1 vpopmail vchkpw   20967424 Jul  6 15:00 auto-whitelist
-rw-------  1 vpopmail vchkpw          5 Jun  9  2007 bayes.mutex
-rw-------  1 vpopmail vpopmail    24648 Jul  6 15:00 bayes_journal
-rw-------  1 vpopmail vchkpw    5214208 Jul  6 15:00 bayes_seen
-rw-------  1 vpopmail vchkpw    5398528 Jul  6 15:00 bayes_toks

I commented out the system command that chowns the files in that directory. The files don't seem to end up owned by root when the script is run as root, so I don't really see a problem if it isn't run as the vpopmail user.
Location is pretty much irrelevant as long as it's somewhere in the
PATH, so that it doesn't have to be referenced with the full path.
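
The ownership check could also be done from Perl instead of reading `ll` output by eye. This is a sketch under assumptions: the `owner_of` helper is hypothetical, and the directory in the usage comment is simply the one listed above.

```perl
use strict;
use warnings;

# Return "user:group" for a path, or undef if it cannot be stat'ed.
# Falls back to numeric ids when no name is known for them.
# (Hypothetical helper, not part of the original script.)
sub owner_of {
    my ($path) = @_;
    my @st = stat($path) or return;
    my ( $uid, $gid ) = @st[ 4, 5 ];
    my $user  = getpwuid($uid);
    my $group = getgrgid($gid);
    $user  = $uid unless defined $user;
    $group = $gid unless defined $group;
    return "$user:$group";
}

# Possible use: report any Bayes file not owned by vpopmail.
# for my $file ( glob '/home/vpopmail/.spamassassin/*' ) {
#     my $owner = owner_of($file);
#     print "$file is owned by $owner\n"
#         if defined $owner && $owner !~ /^vpopmail:/;
# }
```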
That's a little confusing to me. If the full path isn't necessary, why
specify one (especially an incorrect one)? This is typically done for
security purposes (in case $PATH is somehow modified).
I'm just used to specifying the path to files. Even if /usr/bin is in the PATH of the user you run the script as, that doesn't mean it will be in the PATH when the script runs from cron, so specifying the full path saves you there.
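
To keep full paths without hard-coding a single location, the script could probe a short list of candidate directories. This is only a sketch: the `find_program` helper is hypothetical, and the two directories in the usage comment are just the ones mentioned in this thread.

```perl
use strict;
use warnings;

# Return the first "$dir/$name" that is an executable regular file,
# or undef if none of the directories contains one.
# (Hypothetical helper, not part of the original script.)
sub find_program {
    my ( $name, @dirs ) = @_;
    for my $dir (@dirs) {
        my $path = "$dir/$name";
        return $path if -f $path && -x _;
    }
    return;
}

# Possible use:
# my $qmail_spam = find_program( 'qmail-spam', '/usr/sbin', '/usr/bin' )
#     or die "qmail-spam not found\n";
# system( $qmail_spam, 'restart' );
```

This does not depend on PATH at all, so it should behave the same from an interactive shell and from cron.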

I wrote this script to work for my environment; I don't expect it to work flawlessly on every toaster install. Some people might not even have the qmail-spam script. I fail to see how the path is "incorrect" though. Because it doesn't work on your server? Then change the path in the script. ;)
Appreciate the comments!

Jon


