FreeBSD is currently installing TxRep.pm rev 1651114 from Jan 12 15:17:46 2015 
(it is the only revision that has only whitespace differences, all leading 
padding, there are code differences between installed and 1650327 (previous) 
and 1678017 (next).  The most recent is 1720440 from Dec 16 20:23:02 2015.

https://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/TxRep.pm

TxRep.pm 1651114 emits error messages as below, it would seem for each message:

Use of uninitialized value $msgscore in addition (+) at 
/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415.
Use of uninitialized value $msgscore in subtraction (-) at 
/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415.

to be displayed as reported in bug 7164 
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7164

The fix is to use 1678017, though work seems to continue through 1720440.  
However, 1720440 seems to cause massive performance issues.

== With FreeBSD installed TxRep.pm (1651114) ==

# sa-learn --clear
# sa-learn --dump
ERROR: Bayes dump returned an error, please re-run with -D for more information

(folder has 236 messages)

# perl -T -d:NYTProf /usr/local/bin/sa-learn --spam --no-sync 
/mail/blackrosetech.com/gessel/.ManJunk/{cur,new}
Subroutine NetAddr::IP::STORABLE_freeze redefined at 
/usr/local/lib/perl5/site_perl/mach/5.24/NetAddr/IP.pm line 365.
Subroutine NetAddr::IP::STORABLE_thaw redefined at 
/usr/local/lib/perl5/site_perl/mach/5.24/NetAddr/IP.pm line 377.
Subroutine NetAddr::IP::STORABLE_freeze redefined at 
/usr/local/lib/perl5/site_perl/mach/5.24/NetAddr/IP.pm line 365.
Subroutine NetAddr::IP::STORABLE_thaw redefined at 
/usr/local/lib/perl5/site_perl/mach/5.24/NetAddr/IP.pm line 377.
Use of uninitialized value $msgscore in addition (+) at 
/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415.
Use of uninitialized value $msgscore in subtraction (-) at 
/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415.
....
....
Use of uninitialized value $msgscore in addition (+) at 
/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415.
Use of uninitialized value $msgscore in subtraction (-) at 
/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415.
Learned tokens from 234 message(s) (234 message(s) examined)
locker: error accessing /var/amavis/.spamassassin/bayes.lock: No such file or 
directory at 
/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Locker/UnixNFSSafe.pm line 190.
locker: safe_unlock: lock on /var/amavis/.spamassassin/bayes.lock was lost due 
to expiry at 
/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Locker/UnixNFSSafe.pm line 219.

# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        234          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0      31174          0  non-token data: ntokens
0.000          0 1498800507          0  non-token data: oldest atime
0.000          0 1509734778          0  non-token data: newest atime
0.000          0 1509879373          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction 
count

Profiling results for sa-learn: 
http://www.blackrosetech.com/insecure/nytprof-txrep-1651114/
732 seconds (3.1 seconds/message), 1.64 seconds in TxRep.pm (pretty trivial), 
mostly in the check_reputation subroutine
TxRep.pm profiling results 
http://www.blackrosetech.com/insecure/nytprof-txrep-1651114/Mail-SpamAssassin-Plugin-TxRep-pm-171-line.html

(side note, this maxed out one of 8 physical cores for the duration of the run, 
mostly at 100%)


== With latest TxRep.pm (1720440) ==

# sa-learn --clear
# sa-learn --dump
ERROR: Bayes dump returned an error, please re-run with -D for more information

(Same folder, 236 messages)

# perl -T -d:NYTProf /usr/local/bin/sa-learn --spam --no-sync 
/mail/blackrosetech.com/gessel/.ManJunk/{cur,new}
Subroutine NetAddr::IP::STORABLE_freeze redefined at 
/usr/local/lib/perl5/site_perl/mach/5.24/NetAddr/IP.pm line 365.
Subroutine NetAddr::IP::STORABLE_thaw redefined at 
/usr/local/lib/perl5/site_perl/mach/5.24/NetAddr/IP.pm line 377.
Subroutine NetAddr::IP::STORABLE_freeze redefined at 
/usr/local/lib/perl5/site_perl/mach/5.24/NetAddr/IP.pm line 365.
Subroutine NetAddr::IP::STORABLE_thaw redefined at 
/usr/local/lib/perl5/site_perl/mach/5.24/NetAddr/IP.pm line 377.
Learned tokens from 234 message(s) (234 message(s) examined)
locker: error accessing /var/amavis/.spamassassin/bayes.lock: No such file or 
directory at 
/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Locker/UnixNFSSafe.pm line 190.
locker: safe_unlock: lock on /var/amavis/.spamassassin/bayes.lock was lost due 
to expiry at 
/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Locker/UnixNFSSafe.pm line 219.


Profiling results for sa-learn: 
http://www.blackrosetech.com/insecure/nytprof-txrep-1720440/
96451 seconds (132x longer) 94419 seconds in sleep.  

Weirdly, TxRep.pm consumed only 12.1 seconds, more but only 7.4x more.    

Note the only change was to replace TxRep.pm-1651114 with TxRep.pm-1720440

(side note, this command never showed > 4.2% CPU use, usually at 0.0%)


I'll see if anyone can make more sense of this than I can.   If not, I'll try 
each of the TxRep.pm variants from the one that is fast but throws errors to 
the current one and see if that provides any insight into the problematic code 
change.

It should significantly speed up sa-learn if it were threaded, no?  I mean not 
with 1720440, which has other problems, but if those are fixed....  This seems 
like the sort of operation that could be easily parallelized.  

Reply via email to