[SNIP]
>>>> Hi there,
>>>>
>>>> This is the table I have in MySQL, and the one I intend to populate
>>>> with data:
>>>>
>>>> mysql> describe bayes_vars;
>>>> +--------------------+--------------+------+-----+------------+----------------+
>>>> | Field              | Type         | Null | Key | Default    | Extra          |
>>>> +--------------------+--------------+------+-----+------------+----------------+
>>>> | id                 | int(11)      | NO   | PRI | NULL       | auto_increment |
>>>> | username           | varchar(200) | NO   | UNI |            |                |
>>>> | spam_count         | int(11)      | NO   |     | 0          |                |
>>>> | ham_count          | int(11)      | NO   |     | 0          |                |
>>>> | token_count        | int(11)      | NO   |     | 0          |                |
>>>> | last_expire        | int(11)      | NO   |     | 0          |                |
>>>> | last_atime_delta   | int(11)      | NO   |     | 0          |                |
>>>> | last_expire_reduce | int(11)      | NO   |     | 0          |                |
>>>> | oldest_token_age   | int(11)      | NO   |     | 2147483647 |                |
>>>> | newest_token_age   | int(11)      | NO   |     | 0          |                |
>>>> +--------------------+--------------+------+-----+------------+----------------+
>>>> 10 rows in set (0.00 sec)
>>>>
>>>> The configuration I intend to use for Bayes is:
>>>>
>>>> -------------------- START local.cf -------------------------------
>>>> rewrite_header Subject *****SPAM*****
>>>> report_safe 0
>>>> report_hostname xxx.xxx.com
>>>> dns_available yes
>>>> use_dcc 1
>>>> dcc_path /usr/local/bin/dccproc
>>>> dcc_home /var/dcc
>>>> use_pyzor 1
>>>> pyzor_path /usr/bin/pyzor
>>>> pyzor_timeout 5
>>>> use_razor2 1
>>>> razor_config /etc/razor/razor-agent.conf
>>>> razor_timeout 5
>>>>
>>>> required_score 6.0
>>>>
>>>> use_bayes 1
>>>> skip_rbl_checks 1
>>>> bayes_auto_learn 0
>>>> # bayes_auto_learn_threshold_nonspam 0.1
>>>> # bayes_auto_learn_threshold_spam 13.0
>>>>
>>>> bayes_expiry_max_db_size 300000
>>>> bayes_auto_expire 1
>>>>
>>>> bayes_sql_override_username postfix
>>>> # I don't understand what this setting does, nor why it's postfix.
>>>> # Postfix has no interaction with SA in my set-up, as Postfix pipes the
>>>> # mail into Dovecot, and Dovecot handles the spamc portion before filing
>>>> # the email.
>>>>
>>>> |bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
>>>> bayes_sql_dsn DBI:mysql:spamassassin:localhost
>>>> bayes_sql_username |shamster_user
>>>> |bayes_sql_password shamster||_password|
>>>>
>>>> ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
>>>> shortcircuit USER_IN_WHITELIST on
>>>> shortcircuit SUBJECT_IN_WHITELIST on
>>>> shortcircuit USER_IN_BLACKLIST on
>>>> shortcircuit SUBJECT_IN_BLACKLIST on
>>>>
>>>> loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
>>>> endif
>>>>
>>>> score RDNS_DYNAMIC 2.639 0.363 1.663 1.700
>>>> meta __PILL_PRICE_1 (0)
>>>> meta __PILL_PRICE_2 (0)
>>>> meta __PILL_PRICE_3 (0)
>>>> -------------------- END local.cf -------------------------------
>>>>
>>>> N.B. Yes, I know there are some custom rules in the local.cf and these
>>>> will be lost after an upgrade of SA, but I have reasonable backups.
>>>>
>>>> * Questions
>>>> Does the configuration above look correct?
>>>> Will SA only write into the table bayes_vars, or will it touch other
>>>> tables?
>>> Seems that some process butchered part of the config by discovering some
>>> pipe characters.
>>> [SNIP]
>>>
>>> Other question: If the above looks correct, is there something else that I
>>> ought to enable? E.g. plugins for MySQL, or a particular Perl module
>>> that I might have omitted?
>>>
>>> Regards, S.
>> Regarding local.cf
>>
>> Should the password be quoted, such as in single quotes?
>>
>> The password has many strange chars in it, e.g.
>> bayes_sql_password fg$%-)_()(Wsuisrt{^%TEST
> RTFM problem... Apologies.
>
> Jun 30 16:10:11.628 [2220] dbg: bayes: found bayes db version 3
> Jun 30 16:10:11.628 [2220] dbg: bayes: Using userid: 186
> Jun 30 16:10:11.628 [2220] dbg: bayes: not available for scanning,
> only 0 spam(s) in bayes DB < 200
>
> Solved by feeding one piece of spam to init the database:
> sa-learn --spam gtube.txt
>
> However, I added some messages, but the detail from --dump magic shows
> nothing:
> # sa-learn --ham cur/
> Learned tokens from 25 message(s) (26 message(s) examined)
> # sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0          0          0  non-token data: nspam
> 0.000          0          0          0  non-token data: nham
> 0.000          0          0          0  non-token data: ntokens
> 0.000          0 2147483647          0  non-token data: oldest atime
> 0.000          0          0          0  non-token data: newest atime
> 0.000          0          0          0  non-token data: last journal sync atime
> 0.000          0          0          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire atime delta
> 0.000          0          0          0  non-token data: last expire reduction count
>
> I checked if the postfix entry was created in bayes_vars;
> | postfix                       |          0 |         0 |
> +-------------------------------+------------+-----------+
>
> Does this look correct?
>

I loaded a substantial number of messages via sa-learn:
mysql> select * from bayes_vars where username='postfix';
+-----+----------+------------+-----------+-------------+-------------+------------------+--------------------+------------------+------------------+
| id  | username | spam_count | ham_count | token_count | last_expire | last_atime_delta | last_expire_reduce | oldest_token_age | newest_token_age |
+-----+----------+------------+-----------+-------------+-------------+------------------+--------------------+------------------+------------------+
| 186 | postfix  |          0 |         0 |           0 |           0 |                0 |                  0 |       2147483647 |                0 |
+-----+----------+------------+-----------+-------------+-------------+------------------+--------------------+------------------+------------------+
1 row in set (0.00 sec)

# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0          0          0  non-token data: ntokens
0.000          0 2147483647          0  non-token data: oldest atime
0.000          0          0          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

Still, the data was not put into it.

I would be interested to know where it did store the data, because there
might well be a file on the disk that is growing for no real reason.

Does anyone know where sa-learn would put the data, if it's not loading it
into MySQL?

Regards
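
For anyone hitting the same symptom, here is a rough sketch of how the question above might be narrowed down. It is not a confirmed diagnosis. It assumes that, when the SQL store is not actually in effect, SpamAssassin falls back to its default DBM files (bayes_toks, bayes_seen, bayes_journal) under ~/.spamassassin of whichever user ran sa-learn, and that the database follows the stock schema, where tokens live in a bayes_token table keyed by the id from bayes_vars. The three function names are made up for illustration and are not part of SpamAssassin.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not SpamAssassin tools.

# Print any files that look like the default DBM Bayes store
# (bayes_toks, bayes_seen, bayes_journal) in the given directories.
list_bayes_dbm_files() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        for f in "$dir"/bayes_*; do
            # Skip the unexpanded pattern when nothing matched.
            [ -e "$f" ] && printf '%s\n' "$f"
        done
    done
    return 0
}

# Show the "bayes:" debug lines, which report the db version and userid
# that sa-learn is using (as in the log excerpt quoted earlier).
show_bayes_debug() {
    command -v sa-learn >/dev/null 2>&1 || {
        echo "sa-learn not found" >&2
        return 1
    }
    sa-learn -D bayes --dump magic 2>&1 | grep 'bayes:'
}

# Count token rows stored for one Bayes user id.
# $1 = MySQL user, $2 = numeric id from bayes_vars (186 in the output above).
# Assumes the database is named "spamassassin", as in bayes_sql_dsn.
count_sql_tokens() {
    mysql -u "$1" -p spamassassin \
        -e "SELECT COUNT(*) FROM bayes_token WHERE id = $2;"
}
```

Something like `list_bayes_dbm_files "$HOME/.spamassassin" /root/.spamassassin` shows whether DBM files exist and are growing, and `count_sql_tokens shamster_user 186` shows whether tokens reached MySQL even though the counters in bayes_vars still read 0.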