Jon Ernster wrote:
> Eric Shubert wrote:
>> Jon Ernster wrote:
>>   
>>> Eric Shubert wrote:
>>>     
>>>> Jon Ernster wrote:
>>>>   
>>>>       
>>>>> Figured I'd share something I wrote that others here might use since
>>>>> many of you have shared or helped me out as well.
>>>>>
>>>>> I wrote this because when I go on vacation I usually shut down the
>>>>> laptop that has my mail rules, which send all my spam to the spam
>>>>> folder.  By the time I get back, the shell script I run daily
>>>>> (courtesy of Jake Vickers) fails with this error because there are
>>>>> too many spam files:
>>>>>
>>>>> /root/learn-spam: /usr/bin/sa-learn: /usr/bin/perl: bad interpreter:
>>>>> Argument list too long
>>>>> /root/learn-spam: line 10: /bin/rm: Argument list too long
>>>>>
>>>>> So I just wrote this to process the files individually.  Not the fastest
>>>>> script in the world (because of SpamAssassin, not because of my code,
>>>>> obviously) ;), but it works.
>>>>>
>>>>> J.
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> #!/usr/bin/perl
>>>>>
>>>>> use strict;
>>>>> use warnings;
>>>>> use Getopt::Std;
>>>>>
>>>>> #~ $Id: learn-spam.pl 15 2008-07-03 17:27:09Z jernster $
>>>>>
>>>>> my %opts = ();
>>>>>
>>>>> getopts( 'd:u:', \%opts );
>>>>>
>>>>> my ( $domain, $user ) = @opts{ qw( d u ) };
>>>>>
>>>>> my $usage =<<EOF;
>>>>> Usage:
>>>>>
>>>>>    $0 -d example.com -u user
>>>>>
>>>>> EOF
>>>>>
>>>>> die "$usage" unless ( $domain && $user );
>>>>>
>>>>> my $dir = "/home/vpopmail/domains/$domain/$user/Maildir/cur";
>>>>> my $starttime = time;
>>>>> my $count = 0;
>>>>>
>>>>> opendir(DIR, $dir) or die "Cannot open $dir: $!\n";
>>>>> my @files = readdir(DIR);
>>>>> closedir(DIR);
>>>>>
>>>>> foreach my $file ( @files )
>>>>> {
>>>>>    # Skip ".", ".." and any other dot-files
>>>>>    next if ( $file =~ /^\./ );
>>>>>
>>>>>    my $fpfile = "$dir/$file";
>>>>>
>>>>>    $count++;
>>>>>
>>>>>    print "Learning SPAM - $file\n";
>>>>>
>>>>>    # List form of system() bypasses the shell, so characters in
>>>>>    # Maildir file names cannot be misinterpreted
>>>>>    system("/usr/bin/sa-learn", "--spam", $fpfile);
>>>>>
>>>>>    print "Deleting $file\n";
>>>>>
>>>>>    unlink($fpfile);
>>>>> }
>>>>>
>>>>> print "Syncing databases...\n";
>>>>> system("/usr/bin/sa-learn --sync");
>>>>>
>>>>> print "De-linting files...\n";
>>>>> system("/usr/bin/spamassassin --lint");
>>>>>
>>>>> system("chown vpopmail:vchkpw /home/vpopmail/.spamassassin/*");
>>>>>
>>>>> system("/usr/bin/qmail-spam restart");
>>>>>
>>>>> print "Done!\n";
>>>>>
>>>>> my $duration = time - $starttime;
>>>>>
>>>>> print "\nTotal duration: $duration seconds\n";
>>>>> print "Processed $count SPAM files.\n";
>>>>>
>>>>>
>>>>>     
>>>>>         
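A possible middle ground between the shell glob (one huge argument list) and one sa-learn run per message is to hand sa-learn the files in fixed-size batches. Each sa-learn run loads SpamAssassin once, so batching should also cut the per-message startup cost. This is a minimal sketch, not Jon's script: the helper names `batches` and `learn_dir_in_batches` and the batch size of 100 are hypothetical, and `/usr/bin/sa-learn` is taken from the script above. It still runs as whichever user invokes it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a list into groups of at most $size items (hypothetical helper).
sub batches {
    my ( $size, @items ) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @out;
    push @out, [ splice( @items, 0, $size ) ] while @items;
    return @out;
}

# Learn every regular, non-dot file in $dir as spam, passing at most $size
# files to each sa-learn run. A batch is deleted only if sa-learn exited 0.
sub learn_dir_in_batches {
    my ( $dir, $size ) = @_;
    opendir( my $dh, $dir ) or die "Cannot open $dir: $!\n";
    my @files = map  { "$dir/$_" }
                grep { !/^\./ && -f "$dir/$_" } readdir($dh);
    closedir($dh);

    foreach my $batch ( batches( $size, @files ) ) {
        # List form of system(): no shell, so no glob expansion or quoting.
        if ( system( '/usr/bin/sa-learn', '--spam', @$batch ) == 0 ) {
            unlink @$batch;
        }
        else {
            warn "sa-learn exited with status $?; keeping this batch\n";
        }
    }
    return scalar @files;
}

# Usage: perl learn-spam-batched.pl /home/vpopmail/domains/example.com/spam/Maildir/cur
if (@ARGV) {
    my $n = learn_dir_in_batches( $ARGV[0], 100 );
    print "Examined $n file(s)\n";
}
```

Keeping failed batches on disk means a broken sa-learn run no longer deletes mail it never learned from, which the per-file loop above does unconditionally.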
>>>> Thanks, Jon. I wish we had a little more of this.
>>>>
>>>> Observations:
>>>> .) I like the way you've handled parameters
>>>> .) Is this learning everything in the user's cur directory as spam? Doesn't
>>>> seem appropriate to me
>>>> .) all sa-learn and spamassassin commands need to be run as user vpopmail.
>>>> How is that happening?
>>>> .) qmail-spam is usually in /usr/sbin, not /usr/bin
>>>>
>>>>   
>>>>       
>>> Eric,
>>>
>>> I create a user named spam and I forward everything to that
>>> user/directory.  You're right that this wouldn't be ideal for someone to
>>> run this against their personal mail box.  Alternatively this could
>>> easily be modified to learn email as HAM instead of SPAM.
>>>
>>> I just run the script as root - seems to work.
>>>     
>>
>> It would update root's bayes database ok (the database is created if it
>> doesn't exist). That's not the same database which is used on incoming mail
>> though.
>>
>>   
> [EMAIL PROTECTED] ~]# perl learn-spam.pl -d dumbfounded.net -u spam
> Learning SPAM -
> 1215351471.M922793P3482V000000000000004AI098D13EC_417.vps.dumbfounded.net,S=2432:2,
> Learned tokens from 1 message(s) (1 message(s) examined)
> Deleting
> 1215351471.M922793P3482V000000000000004AI098D13EC_417.vps.dumbfounded.net,S=2432:2,
> Learning SPAM -
> 1215370267.M407813P30062V000000000000004AI098D1994_2.vps.dumbfounded.net,S=2260:2,
> Learned tokens from 1 message(s) (1 message(s) examined)
> Deleting
> 1215370267.M407813P30062V000000000000004AI098D1994_2.vps.dumbfounded.net,S=2260:2,
> Learning SPAM -
> 1215344032.M28632P3482V000000000000004AI098D1066_416.vps.dumbfounded.net,S=3416:2,
> Learned tokens from 1 message(s) (1 message(s) examined)
> Deleting
> 1215344032.M28632P3482V000000000000004AI098D1066_416.vps.dumbfounded.net,S=3416:2,
> Learning SPAM -
> 1215381127.M194384P30062V000000000000004AI098D1A54_5.vps.dumbfounded.net,S=1986:2,
> Learned tokens from 1 message(s) (1 message(s) examined)
> Deleting
> 1215381127.M194384P30062V000000000000004AI098D1A54_5.vps.dumbfounded.net,S=1986:2,
> Learning SPAM -
> 1215361414.M459353P3482V000000000000004AI098D161C_419.vps.dumbfounded.net,S=2887:2,
> Learned tokens from 1 message(s) (1 message(s) examined)
> Deleting
> 1215361414.M459353P3482V000000000000004AI098D161C_419.vps.dumbfounded.net,S=2887:2,
> Learning SPAM -
> 1215366247.M890863P30062V000000000000004AI098D1972_0.vps.dumbfounded.net,S=20120:2,
> Learned tokens from 1 message(s) (1 message(s) examined)
> Deleting
> 1215366247.M890863P30062V000000000000004AI098D1972_0.vps.dumbfounded.net,S=20120:2,
> Learning SPAM -
> 1215343789.M956277P3482V000000000000004AI098D083E_415.vps.dumbfounded.net,S=3323:2,
> Learned tokens from 1 message(s) (1 message(s) examined)
> Deleting
> 1215343789.M956277P3482V000000000000004AI098D083E_415.vps.dumbfounded.net,S=3323:2,
> Learning SPAM -
> 1215336171.M990235P3482V000000000000004AI098D105C_414.vps.dumbfounded.net,S=2384:2,
> Learned tokens from 1 message(s) (1 message(s) examined)
> Deleting
> 1215336171.M990235P3482V000000000000004AI098D105C_414.vps.dumbfounded.net,S=2384:2,
> Learning SPAM -
> 1215379388.M717118P30062V000000000000004AI098D1A26_4.vps.dumbfounded.net,S=15039:2,
> Learned tokens from 1 message(s) (1 message(s) examined)
> Deleting
> 1215379388.M717118P30062V000000000000004AI098D1A26_4.vps.dumbfounded.net,S=15039:2,
> Learning SPAM -
> 1215368887.M166353P30062V000000000000004AI098D1980_1.vps.dumbfounded.net,S=2380:2,
> Learned tokens from 1 message(s) (1 message(s) examined)
> Deleting
> 1215368887.M166353P30062V000000000000004AI098D1980_1.vps.dumbfounded.net,S=2380:2,
> Learning SPAM -
> 1215374947.M536410P30062V000000000000004AI098D19C6_3.vps.dumbfounded.net,S=2662:2,
> Learned tokens from 1 message(s) (1 message(s) examined)
> Deleting
> 1215374947.M536410P30062V000000000000004AI098D19C6_3.vps.dumbfounded.net,S=2662:2,
> Learning SPAM -
> 1215359554.M445829P3482V000000000000004AI098D108A_418.vps.dumbfounded.net,S=6187:2,
> Learned tokens from 1 message(s) (1 message(s) examined)
> Deleting
> 1215359554.M445829P3482V000000000000004AI098D108A_418.vps.dumbfounded.net,S=6187:2,
> Syncing databases...
> De-linting files...
> Use of uninitialized value in concatenation (.) or string at
> /usr/lib/perl5/5.8.8/i386-linux-thread-multi/Scalar/Util.pm line 30.
> Use of uninitialized value in concatenation (.) or string at
> /usr/lib/perl5/5.8.8/i386-linux-thread-multi/Scalar/Util.pm line 30.
> Restarting spamd....
> 
> /var/qmail/supervise/spamd: up (pid 9686) 2 seconds
> /var/qmail/supervise/spamd/log: up (pid 9687) 2 seconds
> Done!
> 
> 
> Total duration: 83 seconds
> Processed 12 SPAM files.
> [EMAIL PROTECTED] ~]# ll /home/vpopmail/.spamassassin/
> total 24984
> -rw-------  1 vpopmail vchkpw   20967424 Jul  6 15:00 auto-whitelist
> -rw-------  1 vpopmail vchkpw          5 Jun  9  2007 bayes.mutex
> -rw-------  1 vpopmail vpopmail    24648 Jul  6 15:00 bayes_journal
> -rw-------  1 vpopmail vchkpw    5214208 Jul  6 15:00 bayes_seen
> -rw-------  1 vpopmail vchkpw    5398528 Jul  6 15:00 bayes_toks

Do you perhaps have bayes_path defined in your local.cf file? That would
force SpamAssassin to use the correct location, even when run as root. It's
not a stock toaster setting though, IIRC. Perhaps that has been changed.
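A minimal local.cf fragment along those lines, assuming the shared database lives in /home/vpopmail/.spamassassin/ as in Jon's listing above. The local.cf location (often /etc/mail/spamassassin/local.cf) varies by install. bayes_path is a base name: SpamAssassin appends _toks, _seen and so on to it.

```
# Point every caller of SpamAssassin, including sa-learn run as root,
# at the shared Bayes database (path assumed from the listing above)
bayes_path /home/vpopmail/.spamassassin/bayes
```

Note that sa-learn run as root may still create new files there owned by root, so a chown step afterwards may still be needed.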

> I commented out the system command that chowns the files in that
> directory.  It doesn't seem to change the ownership of the files to root
> if it's run as root, so I don't really see a problem if it's not run as
> the vpopmail user.

IIRC the ownership only gets changed to root when it does expiration
processing, which can happen automatically or manually depending on your
configuration settings.
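If the goal is to keep root from ever owning those files, another option is to run sa-learn as vpopmail in the first place. A minimal sketch, assuming sudo is installed at /usr/bin/sudo and root may use it; `sa_learn_cmd_as` is a hypothetical helper. sudo's `-H` sets HOME to the target user's home directory, so SpamAssassin's default ~/.spamassassin/ should resolve to vpopmail's database even without bayes_path.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the argument list for running sa-learn as another user.
# Assumptions: sudo lives at /usr/bin/sudo and root may use it;
# "-H" makes sudo set HOME to the target user's home directory.
sub sa_learn_cmd_as {
    my ( $run_as, $mode, @files ) = @_;
    die "mode must be 'spam' or 'ham'\n" unless $mode =~ /\A(?:spam|ham)\z/;
    die "no files given\n" unless @files;
    return ( '/usr/bin/sudo', '-u', $run_as, '-H',
             '/usr/bin/sa-learn', "--$mode", @files );
}

# Usage: perl learn-as-vpopmail.pl spam FILE [FILE ...]
if ( @ARGV >= 2 ) {
    my ( $mode, @files ) = @ARGV;
    my @cmd = sa_learn_cmd_as( 'vpopmail', $mode, @files );
    # List form of system(): the command is exec'd directly, no shell.
    system(@cmd) == 0 or die "sa-learn failed with status $?\n";
}
```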

>>> Location is pretty much irrelevant as long as it's somewhere in the
>>> PATH, so as not to have to reference it WITH the full path if necessary.
>>>     
>> That's a little confusing to me. If the full path isn't necessary, why
>> specify one (especially an incorrect one)? This is typically done for
>> security purposes (in case $PATH is somehow modified).
>>   
> I'm just used to specifying the full path to files.  Even if /usr/bin is
> in the PATH of the user you run the script as, that doesn't mean it will
> be in the PATH if you decide to run the script from cron... so specifying
> the full path saves you there.

That's the safest thing to do all right, but what if it's pointing to the
wrong directory? (as I think might be the case with qmail-spam)
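One way to keep the full path (no reliance on $PATH under cron) while guarding against it being wrong is to check a short list of candidate locations and stop with an error if none exists. A minimal sketch; `find_executable` is a hypothetical helper, and the candidate order (/usr/sbin first, then /usr/bin) is an assumption based on the locations discussed in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first candidate that is an executable regular file, or die
# listing every location tried (hypothetical helper). A wrong hard-coded
# location then stops the script with a clear error.
sub find_executable {
    my @candidates = @_;
    foreach my $path (@candidates) {
        # "_" reuses the stat buffer filled by the preceding -f test
        return $path if -f $path && -x _;
    }
    die "No executable found at: @candidates\n";
}

# Example: prefer the qmailtoaster-plus location, fall back to /usr/bin.
if ( @ARGV && $ARGV[0] eq 'restart' ) {
    my $qmail_spam = find_executable( '/usr/sbin/qmail-spam', '/usr/bin/qmail-spam' );
    system( $qmail_spam, 'restart' ) == 0
        or die "$qmail_spam restart failed with status $?\n";
}
```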

> I wrote this script to work for my environment, I don't expect this to
> work flawlessly on every toaster install.  Some people might not even
> have the qmail-spam script.  I fail to see how the path is "incorrect"
> though.  Because it doesn't work on your server?  Then change the path
> in the script. ;)

The location for qmail-spam, as installed with qmailtoaster-plus, is
/usr/sbin (actually there's a symlink there to its real location, per LSB).
If your qmail-spam is in /usr/bin, I would expect that it's an old version
(not that it has changed any since it was first written).
# locate qmail-spam ?

-- 
-Eric 'shubes'

---------------------------------------------------------------------
     QmailToaster hosted by: VR Hosted <http://www.vr.org>
---------------------------------------------------------------------
