Re: price less

2023-03-04 Thread Bert Van de Poel
There are similar mailings for other kinds of "customers". MongoDB 
customers, MariaDB Users, SugarCRM users, Unix Software users.


I have a bunch of rules against them. If I send samples they won't make 
it through our filters.



On 4/03/2023 19:05, Benny Pedersen wrote:

Hello,



I would like to know if you are interested in acquiring Colocation 
Customer List.




Information fields: Names, Title, Email, Phone, Company Name, Company 
URL, Company physical address, SIC Code, Industry, and Specialty 
(Revenue and Employee).




Please let me know your target geography so that I will get back to 
you with the counts, Pricing, and more information.




Regards,

Sarah

Marketing Executive


spamassassin tag

X-Spam-ASN: AS15169 GOOGLE
Return-Path: 
X-Spam-Status: No, score=2.6 required=5.0 tests=DKIM_SIGNED,DKIM_VALID,
DKIM_VALID_AU,DKIM_VALID_EF,DMARC_PASS,FREEMAIL_FROM,HTML_MESSAGE,
ITA_GMAIL_UNDISCLOSED,KAM_LIST3_1,RCVD_IN_DNSWL_NONE,RELAYCOUNTRY_GREY,
SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=4.0.0

maybe time for a new job without any urls ?





Re: How to incorporate network blocks

2022-11-11 Thread Bert Van de Poel

Actually, ipset supports the fromip-toip range syntax:
   CREATE-OPTIONS := range fromip-toip|ip/cidr [ netmask cidr ] [ timeout value ] [ counters ] [ comment ] [ skbinfo ]



On 11/11/2022 18:10, Bill Cole wrote:

On 2022-11-11 at 11:26:13 UTC-0500 (Fri, 11 Nov 2022 09:26:13 -0700)
Grant Taylor via users 
is rumored to have said:


On 11/11/22 9:09 AM, Bert Van de Poel wrote:

- IP/CIDR lists like the one you mention, but also lists like Stop Forum Spam 
(https://www.stopforumspam.com/) I cron fetch then add to an ipset with a DROP 
(which is quite similar to what others are suggesting).

Stop Forum Spam seems interesting.

I'd be curious to see how you're converting SFS list(s) to ipset entries.  Mostly I've not yet had 
enough coffee to convert from a range of IPs to CIDR.

From my bashrc...

# type cidrcon
cidrcon is a function
cidrcon ()
{
 for a in $*;
 do
 echo $a;
 done | perl -e "use Net::CIDR::Lite;  \$cidr = Net::CIDR::Lite->new(<>) ; \$_ = join 
(\"\n\",\$cidr->list) ; print \"\$_\n\";"
}

Obviously requires Perl and the Net::CIDR::Lite module. I do not recall why the 
implementation is so weird, but I've been using it for decades(!?)
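
For readers without Perl at hand, the range-to-CIDR conversion discussed above can be sketched in Python. This is only an illustration, not the cidrcon function itself: the helper name range_to_cidrs is made up here, and it only handles a single first-last range, using the standard library's ipaddress module.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Return the minimal list of CIDR blocks covering first..last inclusive.

    range_to_cidrs is a hypothetical helper name, not part of any tool
    mentioned in the thread.
    """
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    # summarize_address_range yields the smallest set of networks for the range
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


# Example: a range that does not fall on a single CIDR boundary
for cidr in range_to_cidrs("192.0.2.0", "192.0.2.130"):
    print(cidr)
```

Since ipset accepts fromip-toip ranges directly for some set types, a conversion like this is probably only needed for tools that insist on CIDR notation.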



I didn't pay close attention to the list, but I did see that it was range based 
and would need some conversion.  --  I have added it to my pile of things to 
look at more closely later.



--
Grant. . . .
unix || die




Re: How to incorporate network blocks

2022-11-11 Thread Bert Van de Poel
I've been dealing with IP blocklists using two other methods before 
email even reaches SA:
- In postfix my smtpd_recipient_restrictions includes "reject_rbl_client 
zen.spamhaus.org, reject_rhsbl_reverse_client dbl.spamhaus.org, 
reject_rhsbl_helo dbl.spamhaus.org, reject_rhsbl_sender 
dbl.spamhaus.org" and I'm guessing potentially others could be added.
- IP/CIDR lists like the one you mention, but also lists like Stop Forum 
Spam (https://www.stopforumspam.com/), which I fetch via cron and add to 
an ipset with a DROP rule (quite similar to what others are suggesting).

I find that those are quite suitable.
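
The cron-fetch-then-ipset step described above could look roughly like the following sketch. It assumes the downloaded list has one IPv4 address or CIDR per line (the actual format of each list should be checked), and the set name "blocklist" and the function name are placeholders. It only builds the text that would be fed to `ipset restore`; it does not touch the firewall itself.

```python
import ipaddress


def ipset_restore_lines(entries, set_name="blocklist"):
    """Build input lines for `ipset restore` from a list of IPs/CIDRs.

    Assumption: one IPv4 address or CIDR per entry, '#' starts a comment.
    The set name is a placeholder.
    """
    nets = []
    for raw in entries:
        text = raw.split("#", 1)[0].strip()
        if not text:
            continue
        try:
            net = ipaddress.ip_network(text, strict=False)
        except ValueError:
            continue  # skip anything that is not an address or CIDR
        if net.version == 4:
            nets.append(net)
    # Merge overlapping and adjacent networks to keep the set small
    merged = ipaddress.collapse_addresses(nets)
    lines = [f"create {set_name} hash:net family inet"]
    lines += [f"add {set_name} {net}" for net in merged]
    return lines


sample = ["192.0.2.1", "192.0.2.0/25", "192.0.2.128/25", "# comment", "198.51.100.7"]
print("\n".join(ipset_restore_lines(sample)))
```

The output would then be piped into ipset restore and the set referenced from an iptables rule with a DROP target; the exact invocation depends on the local firewall setup.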

Bert

On 10/11/2022 18:05, Grant Taylor via users wrote:

On 11/10/22 9:54 AM, Joey J wrote:

Hello All,


Hi,

I'm trying to see if there is a way to incorporate network ranges 
into SA to essentially flag messages.


N.B. at least one of the lists below is individual IPs and not 
networks / ranges of IPs.  --  I'm not sure how to square that peg 
with your wants / needs.


I know I can use iptables and reject it before getting to SA, but in 
some cases we would have legit email get flagged within these bigger 
blocks.


I would suggest investigating the other offerings from each vendor.  I 
suspect there is a good chance that many, if not all, of them offer a 
DNS based query method.


See Riccardo's comment about Spamhaus / Spamteq.


I'm trying to incorporate:
feeds.dshield.org/block.txt
spamhaus.org/drop/drop.lasso
ciarmy.com/list/ci-badguys.txt
openbl.org/lists/base.txt
Short of that, it wouldn't be hard to turn them into a locally hosted 
BL and then configure SpamAssassin to query it.








Re: subscribe to blacklist for domains

2022-08-13 Thread Bert Van de Poel
I think what Noel is referring to is Postfix configuration like this for 
example:
smtpd_recipient_restrictions = permit_mynetworks,
    permit_sasl_authenticated, reject_unauth_destination,
    reject_rbl_client zen.spamhaus.org,
    reject_rhsbl_reverse_client dbl.spamhaus.org,
    reject_rhsbl_helo dbl.spamhaus.org,
    reject_rhsbl_sender dbl.spamhaus.org,
    reject_non_fqdn_recipient, reject_unknown_recipient_domain


Note the Spamhaus entries, each applying a blocklist to a different part of the SMTP session (client IP, reverse client name, HELO name, sender domain).
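
To make it clearer what such restrictions look up, here is a small sketch of how DNSBL query names are conventionally formed: an IP-based list is queried with the IPv4 octets reversed and the zone appended, a domain-based list with the domain followed by the zone. The function names are illustrative only and this is not how Postfix is implemented internally.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """DNS name queried for an IPv4 address on an IP-based blocklist."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def rhsbl_query_name(domain, zone):
    """DNS name queried for a domain on a domain-based (RHS) blocklist."""
    return f"{domain.rstrip('.').lower()}.{zone}"


print(dnsbl_query_name("192.0.2.99", "zen.spamhaus.org"))
print(rhsbl_query_name("example.com", "dbl.spamhaus.org"))
```

A listed entry answers with an A record, typically in 127.0.0.0/8; an unlisted one returns NXDOMAIN.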

On 13/08/2022 15:38, joe a wrote:

On 8/12/2022 11:43 PM, Noel Butler wrote:

Why are you not blocking with blacklists at the border, ie: MTA.


I'm not familiar with how to do that or whether it can be done.  Since SA 
offers this functionality, I did not even consider that. I'll look 
into it.


Given it's 0 resources for your MTA, with anti-spam checking in SA 
often using significant resources (depending on traffic/number of 
tests/rules etc), it's best to stop it getting to SA in the first place.


SA also has this by-default list of domains that it never checks. For 
a long time I have disagreed with this: we are the ones to decide who 
gets whitelisted, not SA, not some paid third party. The option 
clear_uridnsbl_skip_domain however prevents this, but then you have 
to locate and zero all the general ruleset scores that are whitelists 
as well.




The configuration/usage of those lists causes me great frustration. 
Semi retirement and infrequent "tech stuff" may be partly to blame.







Re: Spam with Pyzor and DCC scores

2022-07-11 Thread Bert Van de Poel

On 11/07/2022 15:44, Matus UHLAR - fantomas wrote:

On 11.07.22 12:57, Bert Van de Poel wrote:
A few times a month we have spam messages getting through, often in 
German, that have some spam score but not enough to be 
marked/discarded. Always these messages are marked by DCC, since 
they're of course bulk spam, but it's also not uncommon to see Pyzor 
as well. I've been wondering if there are realistic cases where both 
DCC and Pyzor would mark as spam while the message was ham.


This is likely to happen if the message is empty or nearly empty.
Some people are stupid and send one or two words or a short link in a message 
without a Subject: ...



Oh yeah, that's a case I hadn't thought of, good point!
I feel like when both co-occur it's a pretty solid sign it's spam.  
Therefore, I'm wondering if an upstream amplification (or a local 
one) would make sense.


Some examples (I can also supply full emails, but fear this might 
prevent my message from arriving):

X-Spam-Status: No, score=4.082 tagged_above=- required=5
    tests=[DCC_CHECK=1.1, DIGEST_MULTIPLE=0.001, FSL_BULK_SIG=0.001,
    HEADER_FROM_DIFFERENT_DOMAINS=0.25, HTML_IMAGE_RATIO_08=0.001,
    HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.1, PYZOR_CHECK=1.985,
    SPF_HELO_NONE=0.001, SPF_NEUTRAL=0.652, T_SCC_BODY_TEXT_LINE=-0.01]
X-Spam-Status: No, score=4.816 tagged_above=- required=5
    tests=[DCC_CHECK=1.1, DIGEST_MULTIPLE=0.001, FSL_BULK_SIG=0.001,
    HEADER_FROM_DIFFERENT_DOMAINS=0.248, HTML_IMAGE_ONLY_28=0.726,
    HTML_IMAGE_RATIO_02=0.001, HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.1,
    PYZOR_CHECK=1.985, SPF_HELO_NONE=0.001, SPF_NEUTRAL=0.652,
    T_REMOTE_IMAGE=0.01, T_SCC_BODY_TEXT_LINE=-0.01]
X-Spam-Status: No, score=4.109 tagged_above=- required=5
    tests=[DCC_CHECK=1.1, DIGEST_MULTIPLE=0.001, FSL_BULK_SIG=0.029,
    HEADER_FROM_DIFFERENT_DOMAINS=0.249, HTML_IMAGE_RATIO_04=0.001,
    HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.1, PYZOR_CHECK=1.985,
    SPF_HELO_NONE=0.001, SPF_NEUTRAL=0.652, T_SCC_BODY_TEXT_LINE=-0.01]


looks like you should implement bayes.
since these are generated by amavis, you could train amavis database.

We have Bayes running on the main server, but my own local server 
doesn't have it, which is why it's missing. I did however take all spam I 
received myself in 2022 that wasn't caught and fed it to sa-learn (for 
the amavis user); thanks for that suggestion. Let's hope it works to remove 
this minor inconvenience :)




Spam with Pyzor and DCC scores

2022-07-11 Thread Bert Van de Poel

Hi everyone,

A few times a month we have spam messages getting through, often in 
German, that have some spam score but not enough to be marked/discarded. 
Always these messages are marked by DCC, since they're of course bulk 
spam, but it's also not uncommon to see Pyzor as well. I've been 
wondering if there are realistic cases where both DCC and Pyzor would 
mark as spam while the message was ham. I feel like when both co-occur 
it's a pretty solid sign it's spam. Therefore, I'm wondering if an 
upstream amplification (or a local one) would make sense.


Some examples (I can also supply full emails, but fear this might 
prevent my message from arriving):

X-Spam-Status: No, score=4.082 tagged_above=- required=5
    tests=[DCC_CHECK=1.1, DIGEST_MULTIPLE=0.001, FSL_BULK_SIG=0.001,
    HEADER_FROM_DIFFERENT_DOMAINS=0.25, HTML_IMAGE_RATIO_08=0.001,
    HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.1, PYZOR_CHECK=1.985,
    SPF_HELO_NONE=0.001, SPF_NEUTRAL=0.652, T_SCC_BODY_TEXT_LINE=-0.01]
X-Spam-Status: No, score=4.816 tagged_above=- required=5
    tests=[DCC_CHECK=1.1, DIGEST_MULTIPLE=0.001, FSL_BULK_SIG=0.001,
    HEADER_FROM_DIFFERENT_DOMAINS=0.248, HTML_IMAGE_ONLY_28=0.726,
    HTML_IMAGE_RATIO_02=0.001, HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.1,
    PYZOR_CHECK=1.985, SPF_HELO_NONE=0.001, SPF_NEUTRAL=0.652,
    T_REMOTE_IMAGE=0.01, T_SCC_BODY_TEXT_LINE=-0.01]
X-Spam-Status: No, score=4.109 tagged_above=- required=5
    tests=[DCC_CHECK=1.1, DIGEST_MULTIPLE=0.001, FSL_BULK_SIG=0.029,
    HEADER_FROM_DIFFERENT_DOMAINS=0.249, HTML_IMAGE_RATIO_04=0.001,
    HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.1, PYZOR_CHECK=1.985,
    SPF_HELO_NONE=0.001, SPF_NEUTRAL=0.652, T_SCC_BODY_TEXT_LINE=-0.01]
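
To get a feel for what a local amplification would do to scores like these, the following sketch parses an amavis-style X-Spam-Status header and adds a bonus when DCC_CHECK and PYZOR_CHECK both hit. The bonus of 1.5 points is an arbitrary assumption for illustration, not a recommended value, and the function names are made up.

```python
import re

# First example header from this message
HEADER = """X-Spam-Status: No, score=4.082 tagged_above=- required=5
    tests=[DCC_CHECK=1.1, DIGEST_MULTIPLE=0.001, FSL_BULK_SIG=0.001,
    HEADER_FROM_DIFFERENT_DOMAINS=0.25, HTML_IMAGE_RATIO_08=0.001,
    HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.1, PYZOR_CHECK=1.985,
    SPF_HELO_NONE=0.001, SPF_NEUTRAL=0.652, T_SCC_BODY_TEXT_LINE=-0.01]"""


def parse_tests(header):
    """Map rule name -> score from the tests=[...] part of the header."""
    body = re.search(r"tests=\[(.*?)\]", header, re.S)
    if not body:
        return {}
    pairs = re.findall(r"([A-Za-z0-9_]+)=(-?\d+(?:\.\d+)?)", body.group(1))
    return {name: float(score) for name, score in pairs}


def score_with_bonus(header, bonus=1.5):
    """Sum of rule scores, plus a hypothetical bonus if DCC and Pyzor both hit."""
    tests = parse_tests(header)
    total = sum(tests.values())
    if "DCC_CHECK" in tests and "PYZOR_CHECK" in tests:
        total += bonus
    return round(total, 3)


print(score_with_bonus(HEADER, bonus=0))  # plain sum of the listed rule scores
print(score_with_bonus(HEADER))           # with the assumed 1.5 point bonus
```

In SpamAssassin itself the same idea would normally be expressed as a meta rule combining DCC_CHECK and PYZOR_CHECK with its own score in local.cf; the rule name and score would be local choices.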

What's people's opinion here?

Kind regards,
Bert Van de Poel
ULYSSIS


Re: Spamassassin spamming in log

2022-06-02 Thread Bert Van de Poel
If you are using systemd, you can "systemctl disable spamd". Otherwise 
you can indeed use ENABLED=0. I would probably do both just in case ;)



On 2/06/2022 20:36, Timo Brandt wrote:


Maybe one of you has a hint for me how to disable the automatic 
startup of spamd?


It's been a long time since I set up a Debian system from scratch :-(

It seems that spamd doesn't need to start at system boot, so I will 
disable it.



Will this be done when I add ENABLED=0 into the file 
/etc/default/spamassassin  ?




Thanks,

Timo


Am 2022-06-02 20:27, schrieb Timo Brandt:


Hi all,


indeed - sorry.

I wasn't aware that I do not need to run spamd alongside amavis.

Thanks for all your help.

Timo


Am 2022-06-02 20:18, schrieb Matija Nalis:

On Thu, Jun 02, 2022 at 02:47:28PM +0200, Bert Van de Poel wrote:

For the errors about nonexistent users you will want to have a
look at /etc/default/spamassassin I'm guessing.
For the info messages: this has just got to do with your
logging level. You will want to decrease it in local.cf or maybe
also in the default file.


Also, depending on your distro and init system, /etc/default/spamassassin
might not be processed (e.g. on Debian systems, in many cases /etc/default/*
entries are only read via /etc/init.d/* System-V-init scripts, and
not used when using the default systemd init system).

You should use "ps auxw" to determine exactly which parameters it is
being run with, and then grep the system for those flags if they differ
from the ones in /etc/default/spamassassin (esp. when you change that
file and restart, but the changes are not applied).



Re: Spamassassin spamming in log

2022-06-02 Thread Bert Van de Poel
Did you restart the unit after changing the configuration? It does seem 
like debian-spamd is indeed the correct user. I'm not sure how exactly 
the spawning of children works within SA. Has your CPU usage decreased now?


PS: you can just reply to the list, no need to email me personally every 
time, that just causes me to get each message twice.


On 2/06/2022 15:17, Timo Brandt wrote:


Hi Bert,


I checked the user table:

debian-spamd:x:114:120::/var/lib/spamassassin:/usr/sbin/nologin

And also adjusted the config file:

OPTIONS="-u debian-spamd --create-prefs --max-children 5 
--helper-home-dir -s /var/log/spamassassin/spamd.log



But process is already running under root:


Am 2022-06-02 15:13, schrieb Bert Van de Poel:

For the error: does the spamd user actually exist? that's a 
requirement of course.


I've always controlled SA loglevels through amavis, but from the 
spamd man page I would expect that it's related to -D. I'm not 
completely sure what the default is. 
http://wiki.apache.org/spamassassin/DebugChannels is listed for 
more information.


I expect your high CPU usage is just coming from SA trying to spawn 
children as a user that doesn't exist though.


On 2/06/2022 14:57, Timo Brandt wrote:


Hi Bert,

many thanks for your answer.

Please find the spamassassin config below.

I already checked it but did not find anything to change that would 
stop the flooding.



Also, spamassassin is consuming almost all of my CPU power.

When it's running, the CPU is at 99% nearly the whole time. When I 
stop spamassassin, the CPU consumption goes down.



# /etc/default/spamassassin
# Duncan Findlay

# WARNING: please read README.spamd before using.
# There may be security risks.

# Prior to version 3.4.2-1, spamd could be enabled by setting
# ENABLED=1 in this file. This is no longer supported. Instead, please
# use the update-rc.d command, invoked for example as "update-rc.d 
spamassassin enable", to enable the spamd service.


# Options
# See man spamd for possible options. The -d option is automatically 
added.

ENABLED=1
# SpamAssassin uses a preforking model, so be careful! You need to
# make sure --max-children is not set to anything higher than 5,
# unless you know what you're doing.
OPTIONS="--create-prefs --max-children 5 --helper-home-dir 
--username spamd --helper-home-dir /home/spamd -s 
/var/log/spamassassin/spamd.log


# Pid file
# Where should spamd write its PID to file? If you use the -u or
# --username option above, this needs to be writable by that user.
# Otherwise, the init script will not be able to shut spamd down.
PIDFILE="/var/run/spamd.pid"

# Set nice level of spamd
#NICE="--nicelevel 15"

# Cronjob
# Set to anything but 0 to enable the cron job to automatically update
# spamassassin's rules on a nightly basis
CRON=1


Am 2022-06-02 14:47, schrieb Bert Van de Poel:

For the errors about nonexistent users you will want to have a
look at /etc/default/spamassassin I'm guessing.
For the info messages: this has just got to do with your logging
level. You will want to decrease it in local.cf or maybe also in
the default file.

On 2/06/2022 14:33, Timo Brandt wrote:

Hi all,

I have a running debian 11 with postfix/dovecot and Amavis
with clamav / spamassassin.
I saw that the spamassassin logfile is growing very fast and
found the following entries occurring many times per second.
Can you maybe help me to get this fixed?
I searched the internet but did not really find a solution.
Do you need any config files to check?

Thanks for your help in advance,
Timo

Thu Jun  2 11:43:11 2022 [1848608] info: spamd: handled
cleanup of child pid [1849690] due to SIGCHLD: exit 255
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: handled
cleanup of child pid [1849691] due to SIGCHLD: exit 255
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: handled
cleanup of child pid [1849692] due to SIGCHLD: exit 255
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: handled
cleanup of child pid [1849693] due to SIGCHLD: exit 255
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: server
successfully spawned child process, pid 1849698
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: handled
cleanup of child pid [1849696] due to SIGCHLD: exit 255
Thu Jun  2 11:43:11 2022 [1849698] error: spamd: cannot run
as nonexistent user or root with -u option
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: server
successfully spawned child process, pid 1849699
Thu Jun  2 11:43:11 2022 [1848608] info: prefork: child states: SS
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: server
successfully spawned child process, pid 1849700
Thu Jun  2 11:43:11 2022 [

Re: Spamassassin spamming in log

2022-06-02 Thread Bert Van de Poel
For the error: does the spamd user actually exist? that's a requirement 
of course.


I've always controlled SA loglevels through amavis, but from the spamd 
man page I would expect that it's related to -D. I'm not completely sure 
what the default is. http://wiki.apache.org/spamassassin/DebugChannels 
is listed for more information.


I expect your high CPU usage is just coming from SA trying to spawn 
children as a user that doesn't exist though.
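
A quick way to check the first point is to pull the user name out of the OPTIONS string and ask the system whether that account exists. The sketch below does this with the standard pwd and shlex modules; the function names are illustrative, and it only recognises the -u/--username forms.

```python
import pwd
import shlex


def spamd_username(options):
    """Return the user given via -u/--username in a spamd OPTIONS string, or None."""
    args = shlex.split(options)
    for i, arg in enumerate(args):
        if arg in ("-u", "--username") and i + 1 < len(args):
            return args[i + 1]
        if arg.startswith("--username="):
            return arg.split("=", 1)[1]
    return None


def user_exists(name):
    """True if the account can be resolved through the system's passwd database."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


# OPTIONS value as quoted in the config file below (without the outer quotes)
options = ("--create-prefs --max-children 5 --helper-home-dir --username spamd "
           "--helper-home-dir /home/spamd -s /var/log/spamassassin/spamd.log")
user = spamd_username(options)
print(user, "exists" if user and user_exists(user) else "does not exist")
```

If the account is missing, each spamd child would be expected to fail with the "cannot run as nonexistent user" error seen in the log.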


On 2/06/2022 14:57, Timo Brandt wrote:


Hi Bert,

many thanks for your answer.

Please find the spamassassin config below.

I already checked it but did not find anything to change that would 
stop the flooding.



Also, spamassassin is consuming almost all of my CPU power.

When it's running, the CPU is at 99% nearly the whole time. When I stop 
spamassassin, the CPU consumption goes down.



# /etc/default/spamassassin
# Duncan Findlay

# WARNING: please read README.spamd before using.
# There may be security risks.

# Prior to version 3.4.2-1, spamd could be enabled by setting
# ENABLED=1 in this file. This is no longer supported. Instead, please
# use the update-rc.d command, invoked for example as "update-rc.d 
spamassassin enable", to enable the spamd service.


# Options
# See man spamd for possible options. The -d option is automatically 
added.

ENABLED=1
# SpamAssassin uses a preforking model, so be careful! You need to
# make sure --max-children is not set to anything higher than 5,
# unless you know what you're doing.
OPTIONS="--create-prefs --max-children 5 --helper-home-dir --username 
spamd --helper-home-dir /home/spamd -s /var/log/spamassassin/spamd.log


# Pid file
# Where should spamd write its PID to file? If you use the -u or
# --username option above, this needs to be writable by that user.
# Otherwise, the init script will not be able to shut spamd down.
PIDFILE="/var/run/spamd.pid"

# Set nice level of spamd
#NICE="--nicelevel 15"

# Cronjob
# Set to anything but 0 to enable the cron job to automatically update
# spamassassin's rules on a nightly basis
CRON=1


Am 2022-06-02 14:47, schrieb Bert Van de Poel:

For the errors about nonexistent users you will want to have a look at 
/etc/default/spamassassin I'm guessing.
For the info messages: this has just got to do with your logging 
level. You will want to decrease it in local.cf or maybe also in the 
default file.


On 2/06/2022 14:33, Timo Brandt wrote:

Hi all,

I have a running debian 11 with postfix/dovecot and Amavis with 
clamav / spamassassin.
I saw that the spamassassin logfile is growing very fast and found 
the following entries occurring many times per second.

Can you maybe help me to get this fixed?
I searched the internet but did not really find a solution.
Do you need any config files to check?

Thanks for your help in advance,
Timo

Thu Jun  2 11:43:11 2022 [1848608] info: spamd: handled cleanup of 
child pid [1849690] due to SIGCHLD: exit 255
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: handled cleanup of 
child pid [1849691] due to SIGCHLD: exit 255
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: handled cleanup of 
child pid [1849692] due to SIGCHLD: exit 255
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: handled cleanup of 
child pid [1849693] due to SIGCHLD: exit 255
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: server successfully 
spawned child process, pid 1849698
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: handled cleanup of 
child pid [1849696] due to SIGCHLD: exit 255
Thu Jun  2 11:43:11 2022 [1849698] error: spamd: cannot run as 
nonexistent user or root with -u option
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: server successfully 
spawned child process, pid 1849699

Thu Jun  2 11:43:11 2022 [1848608] info: prefork: child states: SS
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: server successfully 
spawned child process, pid 1849700
Thu Jun  2 11:43:11 2022 [1848608] info: prefork: adjust: 0 idle 
children less than 1 minimum idle children. Increasing spamd 
children: 1849700 started.
Thu Jun  2 11:43:11 2022 [1849699] error: spamd: cannot run as 
nonexistent user or root with -u option
Thu Jun  2 11:43:11 2022 [1849700] error: spamd: cannot run as 
nonexistent user or root with -u option

Thu Jun  2 11:43:11 2022 [1848608] info: prefork: child states: SSS
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: server successfully 
spawned child process, pid 1849701
Thu Jun  2 11:43:11 2022 [1848608] info: prefork: adjust: 0 idle 
children less than 1 minimum idle children. Increasing spamd 
children: 1849701 started.

Thu Jun  2 11:43:11 2022 [1848608] info: prefork: child states: 
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: server successfully 
spawned child process, pid 1849702
Thu Jun  2 11:43:11 2022 [1849701] error: spamd: cannot run as 
nonexistent user or root with -u option
Thu Jun  2 11:43:11 2022 [1848608] info: prefork: adjust: 0 idle 
children less than 1 minimum idle children. 

Re: Spamassassin spamming in log

2022-06-02 Thread Bert Van de Poel
For the errors about nonexistent users you will want to have a look at 
/etc/default/spamassassin I'm guessing.
For the info messages: this has just got to do with your logging level. 
You will want to decrease it in local.cf or maybe also in the default file.


On 2/06/2022 14:33, Timo Brandt wrote:

Hi all,

I have a running debian 11 with postfix/dovecot and Amavis with clamav 
/ spamassassin.
I saw that the spamassassin logfile is growing very fast and found the 
following entries occurring many times per second.

Can you maybe help me to get this fixed?
I searched the internet but did not really find a solution.
Do you need any config files to check?

Thanks for your help in advance,
Timo

Thu Jun  2 11:43:11 2022 [1848608] info: spamd: handled cleanup of 
child pid [1849690] due to SIGCHLD: exit 255
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: handled cleanup of 
child pid [1849691] due to SIGCHLD: exit 255
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: handled cleanup of 
child pid [1849692] due to SIGCHLD: exit 255
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: handled cleanup of 
child pid [1849693] due to SIGCHLD: exit 255
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: server successfully 
spawned child process, pid 1849698
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: handled cleanup of 
child pid [1849696] due to SIGCHLD: exit 255
Thu Jun  2 11:43:11 2022 [1849698] error: spamd: cannot run as 
nonexistent user or root with -u option
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: server successfully 
spawned child process, pid 1849699

Thu Jun  2 11:43:11 2022 [1848608] info: prefork: child states: SS
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: server successfully 
spawned child process, pid 1849700
Thu Jun  2 11:43:11 2022 [1848608] info: prefork: adjust: 0 idle 
children less than 1 minimum idle children. Increasing spamd children: 
1849700 started.
Thu Jun  2 11:43:11 2022 [1849699] error: spamd: cannot run as 
nonexistent user or root with -u option
Thu Jun  2 11:43:11 2022 [1849700] error: spamd: cannot run as 
nonexistent user or root with -u option

Thu Jun  2 11:43:11 2022 [1848608] info: prefork: child states: SSS
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: server successfully 
spawned child process, pid 1849701
Thu Jun  2 11:43:11 2022 [1848608] info: prefork: adjust: 0 idle 
children less than 1 minimum idle children. Increasing spamd children: 
1849701 started.

Thu Jun  2 11:43:11 2022 [1848608] info: prefork: child states: 
Thu Jun  2 11:43:11 2022 [1848608] info: spamd: server successfully 
spawned child process, pid 1849702
Thu Jun  2 11:43:11 2022 [1849701] error: spamd: cannot run as 
nonexistent user or root with -u option
Thu Jun  2 11:43:11 2022 [1848608] info: prefork: adjust: 0 idle 
children less than 1 minimum idle children. Increasing spamd children: 
1849702 started.

Thu Jun  2 11:43:11 2022 [1848608] info: prefork: child states: S






Re: [SPAM?] Re: Memory requirement for SpamAssassin/Postfix/Roundcube/Dovecot stack

2022-05-26 Thread Bert Van de Poel
If you want to save on memory usage, just having amavis filter out exe 
files or exe-like files (screensavers, exes in archives, etc.) is much 
more efficient than using clamav. Of course this doesn't filter out 
Office macros/OLE, but there's a plugin in SA related to that, I believe.



On 26/05/2022 16:49, Ian Evans wrote:
On Thu, May 26, 2022, 10:36 AM Reindl Harald,  
wrote:




Am 26.05.22 um 16:32 schrieb Ian Evans:
> File under "questions I think I already know the answer to."
>
> Looking at moving my site to a new host and I'm pondering splitting my
> web/email servers which have always shared the same server.
>
> Our email server is five accounts. Just me and the missus. A big day is
> receiving 200 emails.
>
> Is it safe to assume that a $5/mth 1gig memory account will laugh at the
> resources needed to run a SpamAssassin/Postfix/Roundcube/Dovecot/Nginx
> stack and not ever break a sweat?

when you add clamav later it will be clamav who laughs about 1 GB
memory after it has sucked it up completely


Just looked at clamav's memory usage. Ouch. :)



Regex error in most recent update

2022-02-18 Thread Bert Van de Poel

Hi everyone,

I just noticed we had two email servers complain last night after 
running sa-update about a regex problem:

/etc/cron.daily/spamassassin:
config: invalid regexp for __URI_TRY_3LD 
'm,^https?://(?:try(?!r\.codeschool)|start|get(?!\.adobe)|save|check(?!out)|act|compare|join|learn(?!ing)|request|visit(?!or|\.vermont)|my(?!sub|turbotax|news\.apple|a\.godaddy|account|support|build|blob)\w)[^.]*\.[^/]+\.(?Variable length lookbehind is experimental in regex; marked by <-- HERE 
in 
m/(?i)^https?://(?:try(?!r\.codeschool)|start|get(?!\.adobe)|save|check(?!out)|act|compare|join|learn(?!ing)|request|visit(?!or|\.vermont)|my(?!sub|turbotax|news\.apple|a\.godaddy|account|support|build|blob)\w)[^.]*\.[^/]+\.(?<-- HERE /


channel 'updates.spamassassin.org': lint check of update failed, channel 
failed

sa-update failed for unknown reasons
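
As an aside, the kind of check that tripped here can be illustrated with a small lint loop. The sketch uses Python's re module purely as an analogy: Python rejects variable-width lookbehind outright, whereas recent Perl versions (5.30 and later, as far as I know) accept a bounded form but flag it as experimental. The rule names and patterns below are made up; the real __URI_TRY_3LD pattern is truncated in the output above and is not reproduced.

```python
import re


def lint_patterns(patterns):
    """Return {rule name: error message} for every pattern that fails to compile."""
    failures = {}
    for name, pattern in patterns.items():
        try:
            re.compile(pattern)
        except re.error as exc:
            failures[name] = str(exc)
    return failures


# Hypothetical rules: the second uses alternatives of different lengths
# inside a lookbehind, which Python's re module does not allow.
RULES = {
    "FIXED_LOOKBEHIND": r"(?<!foo)bar",
    "VARIABLE_LOOKBEHIND": r"(?<!foo|quux)bar",
}

for rule, error in lint_patterns(RULES).items():
    print(f"invalid regexp for {rule}: {error}")
```

A lint step like this fails the whole update when a single pattern is rejected, which matches the "lint check of update failed, channel failed" behaviour reported above.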


Did anyone else notice the same thing or is it just on our end?

Kind regards,
Bert


Re: Do these domains merit blocking?

2021-12-15 Thread Bert Van de Poel
You can find the email we received from them here: 
http://paste.debian.net/1223611/ (just the body; I don't know if anyone 
also wants the headers).


I must admit I thought it was a scam, just because it came from its own 
domain, out of the blue and, as many have mentioned, unsolicited.


Bert

On 15/12/2021 19:24, Charles Sprickman wrote:

Does anyone have a sample of one of their emails?

I’m composing a brief nastygram and would like to get my eyes on one before 
finishing up.

Thanks,

Charles


On Dec 15, 2021, at 11:39 AM, Bill Cole 
 wrote:

There has recently been a spate of odd spams to harvested addresses asking 
hypothetical questions about domains' privacy practices. It turns out this is a 
grad student enrolling human subjects in a study without informed consent... 
The explanation is at https://measurement.cs.princeton.edu/privacystudy/ and 
there is a list of domains there which were created to run this maldesigned 
study.

Many of the early batch compounded the consent problem with outright fraud, 
claiming to be from people who do not exist.

I am curious about what the SA user world thinks of such domains. My personal 
opinion is that the grad student, his faculty advisors, and his IRB should all 
be forced to find new careers and the domains should have a null CNAME at the 
root forever. It appears that URIBL, SURBL, and Spamhaus DBL have all noticed 
the domains unflatteringly, which I suppose constitutes a more balanced 
consequence...

A customer has expressed mild dismay at the concept that a fine research institution 
should be "punished for doing research." I'm less attached to Princeton than my 
NJ-based customer and (having worked in a NIH-funded lab) less idolizing of the Ivory 
Tower in general. I have no difficulty explaining my position, but I am rather surprised 
that I need to in 2021. Am I missing something special that makes such research spam 
somehow not spam?

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire




Re: why are not all rules run all the time

2021-10-08 Thread Bert Van de Poel
DNSWL is a whitelist for mail servers, so the tests based on it use the 
IP address that handed the email to your trusted_networks.


Several tests look at the transmitting server instead of just the email 
contents. Contents can be made to look convincing, but a server that is 
notorious for sending spam will end up on blocklists, for example.
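
Since the header only lists rules that actually matched, comparing two mails comes down to comparing two sets of hits. A minimal sketch, using the two example headers from the question quoted below (the function name is made up):

```python
import re

MAIL_A = """X-Spam-Status: No, score=-2.0 required=2.0 tests=BAYES_00,DKIM_SIGNED,
DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,
T_KAM_HTML_FONT_INVALID autolearn=ham autolearn_force=no version=3.4.2"""

MAIL_B = """X-Spam-Status: No, score=2.9 required=3.0 tests=BAYES_50,HTML_MESSAGE,
HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,SPF_HELO_PASS,URIBL_BLACK
autolearn=no autolearn_force=no version=3.4.2"""


def hit_rules(header):
    """Return the set of rule names listed in tests= of an X-Spam-Status header."""
    match = re.search(r"tests=(.*?)(?:\s+autolearn=|$)", header, re.S)
    if not match:
        return set()
    return {name for name in re.split(r"[,\s]+", match.group(1)) if name}


print("only in A:", sorted(hit_rules(MAIL_A) - hit_rules(MAIL_B)))
print("only in B:", sorted(hit_rules(MAIL_B) - hit_rules(MAIL_A)))
print("in both:  ", sorted(hit_rules(MAIL_A) & hit_rules(MAIL_B)))
```

RCVD_IN_DNSWL_NONE shows up only for the first mail because only that mail's sending server matched the DNSWL lookup; the rule was evaluated for both.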



On 8/10/2021 11:57, Thomas Seilund wrote:


On 10/8/21 11:38 AM, Matus UHLAR - fantomas wrote:

On 08.10.21 11:18, Thomas Seilund wrote:

I run SA 3.4.2 on Debian GNU/Linux 10 (buster)

If I look at incoming mails after SA has processed them, the list of SA 
rules that have been run is not the same for all mails.


Below are two examples:

X-Spam-Status: No, score=-2.0 required=2.0 tests=BAYES_00,DKIM_SIGNED,
DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,
T_KAM_HTML_FONT_INVALID autolearn=ham autolearn_force=no version=3.4.2

X-Spam-Status: No, score=2.9 required=3.0 tests=BAYES_50,HTML_MESSAGE,
HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,SPF_HELO_PASS,URIBL_BLACK
autolearn=no autolearn_force=no version=3.4.2

For instance, rule RCVD_IN_DNSWL_NONE is run for the first mail but 
not for the second.


Why is that?


perhaps the rule did not match, that's how spam score is evaluated.
did those mails come from the same host?


Thanks.

No, the mails did not come from the same host.

I am a little in the dark here!

Why does it matter where the mails came from? In my 
/etc/spamassassin/local.cf I have nothing about trusted networks.


Is it so that the list of rules only show rules that contribute to the 
score?


What do you mean by a rule did not match?






Re: Disabling autolearn on given rule

2021-09-22 Thread Bert Van de Poel
This is complete news to me! Based on the activity on the dev list, I 
had assumed there were still 10-20 people devoting some of their time to 
developing SA. If you are the only one, that of course changes my view 
very much, and would be something worth communicating in some spot. When 
I asked about my Bayes bugs in this list a long time ago, I also got 
very mixed responses on whether my suggested solutions to the bugs I 
found through discussion on the list were actually the right ones, so I 
filed those bugs specifically to get feedback on whether my solutions 
were deemed acceptable by SA developers (assuming there was a whole team 
working on SA either in the evenings or as part of their job at a 
company that heavily uses SA). If the idea is that bugs will most 
probably never get resolved except if you write and submit patches to 
solve them, that's completely understandable if there are barely any 
developers or maintainers, but then people have to be told of course.


Maybe it would then also be a good idea to start some kind of bug review 
project, similar to how projects like Inkscape have been asking their 
community to retest *all* bugs, where members from the mailing list and 
other SA users are encouraged to go through a few bugs at a time, 
starting with the very oldest ones, to check whether they're still valid 
and otherwise close them. There are currently 373 unresolved bugs on
Bugzilla (if that counter can be trusted; it's the same number of bugs I
get under "my bugs", which seems suspicious). I wouldn't be surprised if
over half of those were questions or about things that have long been 
resolved or become irrelevant. For example, I'm guessing 
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=5679 can be closed 
since if this problem had persisted, there would be a ton of reports of 
those still ongoing.


What do you think?

I would also like to point out, as a sort of PS, that while I do 
understand that Perl isn't rocket science, there is quite a barrier due 
to Perl's reputation and the decreasing number of people with experience 
in Perl. If I'm brutally honest, I would have probably already fixed 
those 4 bugs I reported myself if SA was on GitHub and written in 
Python, since I could most probably read the code more easily, and 
especially submit my changes more easily. I do understand that SA is 
like that for historic reasons, and I don't think a rewrite would be 
sensible at all, but I wouldn't underestimate how much of a deterrent 
the combination of Perl, Bugzilla, SVN and email patch submission is for 
new FOSS developers used to the newer languages and GitHub. I for one 
have no idea how I would submit a fix to SA once I've written it, to 
give a concrete example. I'm guessing I just paste the patch to a 
Bugzilla comment and hope someone merges it?


Anyway, this is way offtopic for Matt's initial issue, but probably 
still relevant since he's hoping to fix it himself.


On 22/09/2021 10:54, Henrik K wrote:

On Wed, Sep 22, 2021 at 10:45:43AM +0200, Bert Van de Poel wrote:

I hope I'm not passing on too much of a negative message. It would be great
if someone had a look at the Bayes autolearn code. I think it would be a
great service to the community!

The fact is that there really aren't any active developers around these
days.  We are no different from any other semi-active open source project.
I can only give so much of personal free time to "service the community".
The community is supposed to try to take care of itself, so where are all
the volunteers?  :-) Doing Perl is not rocket science, but getting familiar
with SA internals can be daunting.  I can help with that, but someone needs
to step up with decent effort.





Re: Disabling autolearn on given rule

2021-09-22 Thread Bert Van de Poel
I think having a look at the code itself is a good idea. I'm not sure if 
it's up-to-date but you can find some information on 
https://cwiki.apache.org/confluence/display/SPAMASSASSIN/DevelopmentStuff


I've found that just reporting issues on SA's bugzilla is completely 
useless since it's just used as a fancy interface to display email 
conversations of the development list. Newly reported bugs or issues 
often go ignored by email and their status is never changed since no one
uses the interface to manage bugs. This means that Bugzilla is filled to
the brim with hundreds of bugs marked as new, of which some are actual 
bugs and large parts are just questions or fixed problems that were 
never closed. Bugzilla is also very buggy, for example when I press "my 
bugs", I get a list of 373 bugs, some predating the existence of my 
account, and obviously I didn't take part in the discussion of almost 
all of them. So keep in mind that Bugzilla can be untrustworthy and that 
the dev mailing list mentioned on 
https://cwiki.apache.org/confluence/display/SPAMASSASSIN/mailinglists is 
connected to that.


If you're planning to work on the Bayes plugin, I can tell you there are 
several problems with it I've reported in the past that have gone ignored:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7904
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7905
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7906
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7907
I assume many others have also reported valid bugs, but they can be hard 
to find between the many questions that have been asked on 
https://bz.apache.org/SpamAssassin/buglist.cgi?quicksearch=bayes_id=34478 
and I'm also not too sure we can trust the search functionality.


I hope I'm not passing on too much of a negative message. It would be
great if someone had a look at the Bayes autolearn code. I think it
would be a great service to the community!


Bert

On 22/09/2021 03:29, Matt Corallo wrote:



On 9/21/21 18:01, Loren Wilton wrote:

None of these seem to accomplish disabling learning for a specific rule


I think the problem is that I believe Bayes works off of the total 
score, and probably only sees rule names as more tokens, if it sees 
them at all. If it indeed works off the total score, about all you 
can do is somehow tweak that score for a given rule or rule combination.


Right, I expected roughly as much from the docs I could find. Two 
things, then:


(1) maybe time to revisit the old discussions of providing this as a 
default feature?,
(2) where would I go to look at building a plugin for this? Ideally 
something that ends up upstream, but though I can write code, I know 
no perl :).


Matt




Re: Does anyone know what generates these email headers?

2021-09-08 Thread Bert Van de Poel
By default any PHP script that's sending an email will contain 
X-PHP-Originating-Script on several Linux distros, even though it's not 
the official default (see 
https://www.php.net/manual/en/mail.configuration.php , one of the first 
Google results). It's a pretty common occurrence to see that header in 
automated emails of all kinds (e.g. registration confirmation emails, 
notifications, login link emails). Alone it's a sign of neither spam nor ham,
but combined with other things it can be interesting. The others don't 
ring a bell for me.


Bert

On 8/09/2021 23:27, Loren Wilton wrote:

I'm getting a lot of mails with some very curious headers in them.
I tried searching with Google, and it has never heard of many of these 
strings.

Does anyone recognize what might be generating these headers?

X-EOPTenantAttributedMessage
X-EmailAdvisor
X-Mxtb-Transitionid
X-MG-Subscriptionuid
X-PHP-Originating-Script
X-EmailTransmit-type
CMM-X-SID-Result
CMM-X-AUTH-Result
CMM-X-Message-Status
X-OutGoing-Spam-Status
X-EmailTransmit-aid
X-rext

Thanks!

   Loren







Re: Office phish

2021-06-30 Thread Bert Van de Poel
SpamAssassin has plugins for PhishTank and OpenPhish. I would suggest 
you submit the link to them.
You can also reach out to the domain provider, hosting provider(s) and 
other companies involved.



On 30/06/2021 21:51, Alex wrote:

Hi,
Would anyone like to help me block this office phish? It includes an
HTML file that presents an O365 login page:

https://pastebin.com/JMSrY6KU

More javascript in an HTML file.





Re: Gmail spam filters

2021-06-17 Thread Bert Van de Poel

Dear Bowie,

I'm afraid this really isn't a question for this email list, since it 
has nothing to do with SpamAssassin.


However, to not just send you off with nothing: IP reputation plays a 
big role for Google. If you're hosted by a provider like OVH, that seems 
to serve lots of cybercriminals, your IP might have been previously used 
for spamming and therefore just has a bad reputation already. Spammers 
nowadays also more often set up SPF, DKIM and DMARC properly. If you've
made sure you have SSL/TLS enabled and SPF, DKIM and DMARC set up, and that
reverse DNS, DNS, and your email server's domain are all set up properly,
then really the best thing you can do is give it time and ask people to mark
your emails as "not spam" in the meantime. You may also consider
changing providers/IP if you're with a more notorious provider.


I'm afraid you really can't do much more. It's quite unfair but it's the 
way things work I'm afraid. But again, this really isn't a question for 
this list. Perhaps try Libera IRC, some forum or something like Reddit?


Kind regards,
Bert

On 17/06/2021 17:42, Bowie Bailey wrote:
This is a bit off-topic, but I'm hoping someone here might have some 
suggestions.


We are having a problem getting mail to Gmail users.  It almost always 
ends up in their spam folder.  I have set up SPF, DKIM, and DMARC.  
The mail-tester.com email test gives a 10/10 for the test emails I 
have sent to it.  The information I've been able to find from Google 
is completely unhelpful.  I tried signing up for their postmaster 
tools, but my volume is too low to show any data.


Does anyone have any tips on how to get mail through Gmail's spam 
filters?


Thanks,

Bowie





Re: Detect Emoticons in Subject

2021-05-20 Thread Bert Van de Poel
We've started getting lots of spam with emoji in the subject too the 
past few weeks, so I've looked into this as well. As mentioned by RW, 
you would need to create some kind of UTF8 regex header Subject rule. As 
I'm not too excited about writing such a regex, it's way at the bottom 
of my todo list to contemplate whether an SA plugin could be written for 
that and to then reach out to the SA developers to see whether that 
would be something upstream would accept. But honestly, I won't be able
to get to it any time soon (I don't have the time). Still, thought I'd mention it,
since it might be relevant to your question. If you do end up figuring 
out a regex that works out and isn't an extreme length, I think plenty 
of people on this list would love to know!


Bert

On 20/05/2021 18:19, RW wrote:

On Thu, 20 May 2021 11:42:59 -0400
Clive Jacques wrote:


Hi,

I've been using SA a long time.  Lately, I'm getting more and more
spam with emoticons in the subject line.  I'd say about 90% of my
emails with emoticons in the subject are spam.  I'd like to create a
local rule which scores email with emoticons in the subject.
# Local Rule for Emoticons in subject
subject EMOTICON_IN_SUBJECT  Subject =~ /\p{Emoticons}/

The rule should start with "header", that's what's causing the lint
failure.

However, AFAIK, the rule still won't work because \p{Emoticons}
isn't supported in spamassassin, which works on byte sequences. You
need to rewrite it to match UTF-8 bytes.
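As a rough illustration of what "match UTF-8 bytes" could look like, here is a sketch in Python. It assumes "Emoticons" means the Unicode Emoticons block (U+1F600 to U+1F64F), whose characters all encode to four bytes starting with F0 9F 98 or F0 9F 99. The names EMOTICON_BYTES and has_emoticon are made up for this example, and the byte-level pattern has not been tested in an actual SpamAssassin rule.

```python
import re

# Unicode "Emoticons" block: U+1F600 .. U+1F64F.
FIRST, LAST = 0x1F600, 0x1F64F

# Byte-level pattern for the UTF-8 encodings of that block:
#   U+1F600..U+1F63F -> F0 9F 98 80..BF
#   U+1F640..U+1F64F -> F0 9F 99 80..8F
EMOTICON_BYTES = re.compile(rb"\xF0\x9F(?:\x98[\x80-\xBF]|\x99[\x80-\x8F])")


def has_emoticon(subject: str) -> bool:
    """True if the UTF-8 encoded subject contains a character from the block."""
    return EMOTICON_BYTES.search(subject.encode("utf-8")) is not None


# Check the pattern against every code point in the block.
assert all(
    EMOTICON_BYTES.fullmatch(chr(cp).encode("utf-8")) for cp in range(FIRST, LAST + 1)
)

print(has_emoticon("Big sale \U0001F600 today"))
print(has_emoticon("Plain subject line"))
```

Whether the Subject header reaches a header rule as raw UTF-8 bytes depends on the SpamAssassin version and on settings such as normalize_charset, so this is only a starting point. Many emoji also live outside this one block and would need further byte ranges.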




Re: Bayes autolearn: how does it resolve whether rules are body or header related?

2021-05-10 Thread Bert Van de Poel

Dear Loren,

Thank you very much for your email. Based on your message I could deduce 
there were earlier messages (which I then read through a web archive). 
For some unexplained reason I never received the previous 3 responses to 
my email. I hope the university network isn't randomly over-filtering
spam again (we've had those kinds of problems for a while now, and it's
quite a problem; we ourselves are much more careful about how we mark spam).


Based on what I've read, I agree that this is indeed a bug (or actually 
several). I've filed the following bug reports:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7904 (missing body 
types, as mentioned by RW)
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7905 (meta tflags=net 
tests are ignored)
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7906 (meta 
tflags!=net tests are always header tests)
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7907 (better support 
for meta tests in autolearning in general, with 2 possible solutions)


Thank you very much to RW and Matus Uhlar for helping me figure out what
code to look at, and to all three of you for confirming that this is clearly
a set of bugs.


Feel free to file more bugs if you consider there are more based on my 
issue, as well as to give support, write suggestions or submit patches 
on the bugs I have already filed.


Kind regards,
Bert Van de Poel

On 10/05/2021 06:41, Loren Wilton wrote:

so you don't have points from body rules.

your mentioned URI_DEOBFU_INSTR is a meta rule:

meta URI_DEOBFU_INSTR __URI_DEOBFU_INSTR && !__MSGID_OK_HOST

so maybe it's not considered.


They are treated as header, or ignored if marked as net.


I think a bug report should be submitted for this.

Either they should be treated split 50/50 as header and body score, or
when the metas are built they should have a "body rule" flag, and that
used to determine where the score goes.
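The 50/50 suggestion can be made concrete with a small sketch. This is an illustration in Python, not SpamAssassin's actual (Perl) autolearn code. The function name and the rule-kind labels are made up here, and the two scores are the ones quoted elsewhere in this thread for the meta rules URI_DEOBFU_INSTR and MONEY_NOHTML.

```python
from typing import Dict, Tuple


def split_points(hits: Dict[str, Tuple[float, str]], split_meta: bool) -> Tuple[float, float]:
    """Return (header_points, body_points) for a set of rule hits.

    hits maps rule name -> (score, kind), kind being one of
    "header", "body", "meta" or "net" (labels assumed for this sketch).
    """
    header = body = 0.0
    for score, kind in hits.values():
        if kind == "net":
            continue  # net rules are ignored, as described above
        if kind == "body":
            body += score
        elif kind == "meta" and split_meta:
            # Proposed behaviour: share the score between both buckets.
            header += score / 2
            body += score / 2
        else:
            # Behaviour described above: metas end up counted as header.
            header += score
    return header, body


hits = {
    "URI_DEOBFU_INSTR": (3.595, "meta"),
    "MONEY_NOHTML": (2.497, "meta"),
}
print(split_points(hits, split_meta=False))  # everything lands in the header bucket
print(split_points(hits, split_meta=True))   # the same points shared 50/50
```

With the split, the same two hits would contribute to the body side as well, which is the part of the autolearn decision that was coming up short.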


I tried, but for some reason apache decided that I'm evil and blocked 
the submission attempt, so someone else can do it.


   Loren





Bayes autolearn: how does it resolve whether rules are body or header related?

2021-05-08 Thread Bert Van de Poel

Dear fellow Spamassassin users,

I recently noticed that quite a lot of spam emails with high scores 
weren't marked for Bayes autolearning. While some senders and receivers 
were a common match, explaining why autolearn was no, there was no 
clear explanation for other cases. I therefore put Spamassassin in debug 
mode to check in more detail, and noticed that fairly often autolearn is 
not used because the minimum score for body tests isn't achieved. After 
looking at some specific cases, it seems however that several rules are
not considered when calculating the header rule score and body
rule score for Bayes autolearning. I've always presumed these scores are
calculated based on whether the underlying rule performs a regex on a 
header or on the body, but now I'm not so sure any more. I hope you can 
help clear up whether this is intended behaviour (and what that 
behaviour is) or whether I should report this as a bug.


One example I noticed is URI_DEOBFU_INSTR=3.595. This is if I understand 
it correctly a URI test that's performed on the body. Should a test like 
this be counted towards the body score count? Then there's the question 
of meta rules such as MONEY_NOHTML. If you resolve the different meta 
levels within this rule, it's a combination of header and body, however 
it's only counted towards the header score. Finally, it seems as if 
custom rules I've added within local.cf aren't considered. Is that 
indeed the case (and if so, is that by design)? I'm also not completely 
sure if UNWANTED_BODY_LANGUAGE and tests like razor, pyzor and DCC are 
considered for body scores.


Within the same realm, I'm also wondering whether these expected numbers 
for body and header can be tweaked and if so, how. For example the case 
below isn't autolearned even though it has a huge score and a vast 
amount of tests going off, but seemingly not enough body-related scores. 
Is that really the intended behaviour?


May  8 10:40:32 mail amavis[4076058]: (4076058-16) 
header_edits_for_quar:  -> 
, Yes, score=24.619 tag=- tag2=5 kill=7.5 
tests=[ADVANCE_FEE_3_NEW_MONEY=0.001, 
AXB_XMAILER_MIMEOLE_OL_024C2=0.001, BAYES_50=0.8, BERT_KULSPAM=1, 
FORGED_MUA_OUTLOOK=1.927, FREEMAIL_FORGED_REPLYTO=2.095, 
FREEMAIL_REPLYTO=1, FREEMAIL_REPLYTO_END_DIGIT=0.25, 
FROM_MISSPACED=0.001, FROM_MISSP_EH_MATCH=0.001, 
FROM_MISSP_FREEMAIL=0.001, FROM_MISSP_MSFT=0.001, 
FROM_MISSP_REPLYTO=2.497, FSL_BULK_SIG=0.001, FSL_CTYPE_WIN1251=0.001, 
FSL_NEW_HELO_USER=0.001, KHOP_HELO_FCRDNS=0.398, LOTS_OF_MONEY=0.001, 
MISSING_HEADERS=1.021, MISSING_MID=0.497, MONEY_FREEMAIL_REPTO=1.202, 
MONEY_FROM_MISSP=0.001, MONEY_NOHTML=2.497, NSL_RCVD_HELO_USER=0.001, 
PYZOR_CHECK=1.392, REPLYTO_WITHOUT_TO_CC=1.552, REPTO_419_FRAUD=2.996, 
SPF_HELO_NONE=0.001, TO_NO_BRKTS_FROM_MSSP=1.593, 
TO_NO_BRKTS_MSFT=1.888, XFER_LOTSA_MONEY=0.001] autolearn=no 
autolearn_force=no
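For anyone who wants to poke at the numbers in that log line, a small Python sketch that parses the tests=[...] list and adds up the per-rule scores. The helper name parse_tests is made up for this example, and the sketch only does arithmetic on the logged values; it does not reproduce how SpamAssassin itself decides on autolearning.

```python
import re

# Test list copied from the amavis log line above.
LOGGED_TESTS = (
    "ADVANCE_FEE_3_NEW_MONEY=0.001, AXB_XMAILER_MIMEOLE_OL_024C2=0.001, "
    "BAYES_50=0.8, BERT_KULSPAM=1, FORGED_MUA_OUTLOOK=1.927, "
    "FREEMAIL_FORGED_REPLYTO=2.095, FREEMAIL_REPLYTO=1, "
    "FREEMAIL_REPLYTO_END_DIGIT=0.25, FROM_MISSPACED=0.001, "
    "FROM_MISSP_EH_MATCH=0.001, FROM_MISSP_FREEMAIL=0.001, "
    "FROM_MISSP_MSFT=0.001, FROM_MISSP_REPLYTO=2.497, FSL_BULK_SIG=0.001, "
    "FSL_CTYPE_WIN1251=0.001, FSL_NEW_HELO_USER=0.001, "
    "KHOP_HELO_FCRDNS=0.398, LOTS_OF_MONEY=0.001, MISSING_HEADERS=1.021, "
    "MISSING_MID=0.497, MONEY_FREEMAIL_REPTO=1.202, MONEY_FROM_MISSP=0.001, "
    "MONEY_NOHTML=2.497, NSL_RCVD_HELO_USER=0.001, PYZOR_CHECK=1.392, "
    "REPLYTO_WITHOUT_TO_CC=1.552, REPTO_419_FRAUD=2.996, "
    "SPF_HELO_NONE=0.001, TO_NO_BRKTS_FROM_MSSP=1.593, "
    "TO_NO_BRKTS_MSFT=1.888, XFER_LOTSA_MONEY=0.001"
)
LOGGED_SCORE = 24.619  # the score= value from the same log line


def parse_tests(tests: str) -> dict:
    """Turn 'NAME=score, NAME=score, ...' into {name: score}."""
    pairs = re.findall(r"([A-Z0-9_]+)=(-?\d+(?:\.\d+)?)", tests)
    return {name: float(score) for name, score in pairs}


scores = parse_tests(LOGGED_TESTS)
total = sum(scores.values())
print(len(scores), "rules hit, total", round(total, 3))

# The per-rule scores add up to the logged total.
assert abs(total - LOGGED_SCORE) < 1e-6

# Rules sorted by contribution, largest first.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{score:6.3f}  {name}")
```

Sorting by contribution makes it easier to see which rules carry the score, and therefore which ones matter most for the header/body question.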


Thank you in advance for your help. If you need any more examples or 
would like us to run some tests, then feel free to let me know.


Kind regards,
Bert Van de Poel
ULYSSIS



Re: Why does sa-compile access the bayes db?

2020-05-28 Thread Bert Van de Poel
Oh, I had misunderstood you, Matus. My bad! I thought you meant we 
should use a separate bayes db for every mailbox user, but now I 
understand you were referring to the amavis user which indeed runs 
everything.


I just moved the existing bayes db (after stopping amavis of course) to 
the amavis user's .spamassassin folder and removed the path from 
local.cf and it seems to work just fine and indeed solves our issue with 
sa-compile. Thank you very much for the suggestion. This is a much 
cleaner solution than what I had initially in mind!



On 28/05/2020 17:03, Matus UHLAR - fantomas wrote:

On 28.05.20 15:32, Bert Van de Poel wrote:
Almost all of the email we process are forwarders. It doesn't really 
make sense for us to do a non-global bayes db. The large majority of 
email we process is also for a uniform group: student organizations 
at our local university.


you have apparently missed what I said before, so I repeat:

you said you use amavis.  amavis daemon runs (usually) under amavis
user. Therefore, all mails processed by amavis use amavis' bayes
database stored in amavis home directory.

move the database to amavis' home (and chown it to the amavis user):

# ls -la ~amavis/.spamassassin/
total 41368
drwx-- 2 amavis amavis 4096 May 28 16:59 .
drwxr-x--- 7 amavis amavis 4096 May 28 06:50 ..
-rw--- 1 amavis amavis    89136 May 28 17:01 bayes_journal
-rw--- 1 amavis amavis 21065728 May 28 16:59 bayes_seen
-rw--- 1 amavis amavis 40144896 May 28 16:59 bayes_toks
-rw-r--r-- 1 amavis amavis 2304 May  5 12:41 user_prefs

Then remove global setting of bayes database in 
/etc/spamassassin/local.cf

and your problem will most probably go away.


On 28.05.20 13:38, Bert Van de Poel wrote:

We're using a global bayes_path defined in local.cf:



On 28/05/2020 15:22, Matus UHLAR - fantomas wrote:

This is your problem imho.

if you use amavis, you need no bayes database other than the amavis user's,
I guess in /var/lib/amavis/.spamassassin/



On 28/05/2020 10:18, Matus UHLAR - fantomas wrote:

On 25.05.20 23:34, Bert Van de Poel wrote:
Recently, we've been setting up Bayesian learning on our existing 
Amavis with Spamassassin setup on Ubuntu 18.04 (Spamassassin 
3.4.2-0ubuntu0.18.04.3 and Amavis 1:2.11.0-1ubuntu1). We've 
decided to use a global db that was seeded with an aggregation of 
spam and ham we've received, then enabling autolearn to further 
train the set. As Spamassassin runs inside Amavis, the Bayes 
database files are owned by the amavis user. This setup works 
fine, and results for Bayes are great and growing in accuracy by 
autolearning.


What was somewhat confusing is that we noticed our daily cronjob 
running sa-update and sa-compile was giving us an error 
concerning permissions:
May 25 00:31:25.488 [8381] warn: bayes: cannot write to 
/var/lib/spamassassin/bayes_db/bayes_journal, bayes db update 
ignored: Permission denied
bayes: cannot write to 
/var/lib/spamassassin/bayes_db/bayes_journal, bayes db update 
ignored: Permission denied


I wonder where these files came from.
Did you set bayes_path in /etc/spamassassin/ ?










Re: Why does sa-compile access the bayes db?

2020-05-28 Thread Bert Van de Poel
Almost all of the email we process are forwarders. It doesn't really 
make sense for us to do a non-global bayes db. The large majority of 
email we process is also for a uniform group: student organizations at 
our local university.


On 28/05/2020 15:22, Matus UHLAR - fantomas wrote:

On 28.05.20 13:38, Bert Van de Poel wrote:

We're using a global bayes_path defined in local.cf:


This is your problem imho.

if you use amavis, you need no bayes database other than the amavis user's,
I guess in /var/lib/amavis/.spamassassin/



On 28/05/2020 10:18, Matus UHLAR - fantomas wrote:

On 25.05.20 23:34, Bert Van de Poel wrote:
Recently, we've been setting up Bayesian learning on our existing 
Amavis with Spamassassin setup on Ubuntu 18.04 (Spamassassin 
3.4.2-0ubuntu0.18.04.3 and Amavis 1:2.11.0-1ubuntu1). We've decided 
to use a global db that was seeded with an aggregation of spam and 
ham we've received, then enabling autolearn to further train the 
set. As Spamassassin runs inside Amavis, the Bayes database files 
are owned by the amavis user. This setup works fine, and results 
for Bayes are great and growing in accuracy by autolearning.


What was somewhat confusing is that we noticed our daily cronjob 
running sa-update and sa-compile was giving us an error concerning 
permissions:
May 25 00:31:25.488 [8381] warn: bayes: cannot write to 
/var/lib/spamassassin/bayes_db/bayes_journal, bayes db update 
ignored: Permission denied
bayes: cannot write to 
/var/lib/spamassassin/bayes_db/bayes_journal, bayes db update 
ignored: Permission denied


I wonder where these files came from.
Did you set bayes_path in /etc/spamassassin/ ?








Re: Why does sa-compile access the bayes db?

2020-05-28 Thread Bert Van de Poel

We're using a global bayes_path defined in local.cf:

use_bayes 1
use_bayes_rules 1
bayes_auto_learn 1
bayes_expiry_max_db_size 150
bayes_path /var/lib/spamassassin/bayes_db/bayes
bayes_file_mode 0775
bayes_ignore_to spam-analy...@ulyssis.org
bayes_ignore_from spam-analy...@ulyssis.org
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam 10.0

score BAYES_00  -0.001 -0.001 -0.001 -0.001
score BAYES_05  -0.001 -0.001 -0.001 -0.001
score BAYES_20  -0.001 -0.001 -0.001 -0.001
score BAYES_40  -0.001 -0.001 -0.001 -0.001
score BAYES_50  0.001 0.001 0.001 0.001
score BAYES_60  0.001 0.001 0.001 0.001
score BAYES_80  0.001 0.001 0.001 0.001
score BAYES_95  0.001 0.001 0.001 0.001
score BAYES_99  0.001 0.001 0.001 0.001
score BAYES_999 0.001 0.001 0.001 0.001

Currently we're still evaluating the amount of false positives (and 
contacting users who seem to have broken cronjobs that confuse bayes) 
before taking away the artificial scores. We wanted to clear up our 
sa-compile cronjob error.



On 28/05/2020 10:18, Matus UHLAR - fantomas wrote:

On 25.05.20 23:34, Bert Van de Poel wrote:
Recently, we've been setting up Bayesian learning on our existing 
Amavis with Spamassassin setup on Ubuntu 18.04 (Spamassassin 
3.4.2-0ubuntu0.18.04.3 and Amavis 1:2.11.0-1ubuntu1). We've decided 
to use a global db that was seeded with an aggregation of spam and 
ham we've received, then enabling autolearn to further train the set. 
As Spamassassin runs inside Amavis, the Bayes database files are 
owned by the amavis user. This setup works fine, and results for 
Bayes are great and growing in accuracy by autolearning.


What was somewhat confusing is that we noticed our daily cronjob 
running sa-update and sa-compile was giving us an error concerning 
permissions:
May 25 00:31:25.488 [8381] warn: bayes: cannot write to 
/var/lib/spamassassin/bayes_db/bayes_journal, bayes db update 
ignored: Permission denied
bayes: cannot write to /var/lib/spamassassin/bayes_db/bayes_journal, 
bayes db update ignored: Permission denied


I wonder where these files came from.
Did you set bayes_path in /etc/spamassassin/ ?






Re: Why does sa-compile access the bayes db?

2020-05-27 Thread Bert Van de Poel

Plugin initialization+journal sync would make a lot of sense.

What would be the cleanest solution in that case? It's quite annoying to 
receive the same error mail every day. Should we use --cnf to disable 
the bayes plugin, or is there a more elegant solution? Should we file a 
bug about this?



On 26/05/2020 00:45, RW wrote:

On Mon, 25 May 2020 23:34:27 +0200
Bert Van de Poel wrote:



My question therefore specifically is: what exactly does sa-compile
do to the bayes database files?

I don't know for sure, but it's probably just a side-effect of
initializing plugins. Possibly it's trying to perform an opportunistic
sync on the journal file.

sa-compile doesn't need to access Bayes, so you could just treat it as
a cosmetic error. I wouldn't change ownership or permissions just for
this.




Why does sa-compile access the bayes db?

2020-05-25 Thread Bert Van de Poel

Dear Spamassassin users and developers,

Recently, we've been setting up Bayesian learning on our existing Amavis 
with Spamassassin setup on Ubuntu 18.04 (Spamassassin 
3.4.2-0ubuntu0.18.04.3 and Amavis 1:2.11.0-1ubuntu1). We've decided to 
use a global db that was seeded with an aggregation of spam and ham 
we've received, then enabling autolearn to further train the set. As 
Spamassassin runs inside Amavis, the Bayes database files are owned by 
the amavis user. This setup works fine, and results for Bayes are great 
and growing in accuracy by autolearning.


What was somewhat confusing is that we noticed our daily cronjob running 
sa-update and sa-compile was giving us an error concerning permissions:
May 25 00:31:25.488 [8381] warn: bayes: cannot write to 
/var/lib/spamassassin/bayes_db/bayes_journal, bayes db update ignored: 
Permission denied
bayes: cannot write to /var/lib/spamassassin/bayes_db/bayes_journal, 
bayes db update ignored: Permission denied


While this makes a lot of sense, considering that the files are owned by 
the amavis user, we were quite surprised this cronjob would need to 
access these files in the first place. Looking further into the issue, 
we figured out it was specifically sa-compile, and the specific message 
probably originated from 
/usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm. While I have some 
programming experience, I was sadly unable to understand this Perl file 
enough to properly comprehend why this code was accessing bayes_journal 
and what it was planning to do there.
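A quick way to confirm the ownership side of this is to look at who owns the journal file and whether the user running the cronjob can write to it. A minimal Python sketch, assuming a Unix system; the helper name describe_access is made up, and the path is the one from the warning above.

```python
import os
import pwd
import stat


def describe_access(path: str) -> dict:
    """Report owner, mode and whether the current user may write to path."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid without a passwd entry
    return {
        "owner": owner,
        "mode": stat.filemode(st.st_mode),
        "writable_by_me": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    journal = "/var/lib/spamassassin/bayes_db/bayes_journal"
    if os.path.exists(journal):
        print(describe_access(journal))
    else:
        print("not found (or not visible to this user):", journal)
```

Run as the cron user, a result with owner amavis and writable_by_me False would match the "Permission denied" warning; it does not explain why sa-compile touches the file in the first place.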


My question therefore specifically is: what exactly does sa-compile do 
to the bayes database files?


I've asked this same question on IRC but was unable to get an answer.
While a fix for this issue (changing permissions and user/group ownership)
is rather obvious, we'd first want to understand what sa-compile is up to.


Kind regards,
Bert Van de Poel
ULYSSIS




Custom rule aware of occurrences

2019-09-15 Thread Bert Van de Poel

Dear fellow Spamassassin users,

I'm contacting you as a member of ULYSSIS. ULYSSIS is a student 
non-profit organisation at the University of Leuven trying to make 
computers and technology more approachable and available to students. As 
part of this objective, we run a hosting service within our university's 
network for student organisations, student unions and individuals at our 
university.


We've battled with spam from time to time, since we seem to attract a 
lot of exotic languages which are rather well able to circumvent 
commonly used methods. This has had us resort to some custom rulesets to 
battle against mostly targeted French and SEO spam often coming from 
very respectable servers and very normal addresses.


Now because SEO spam specifically has been adapting quite well to any 
rule we think of (finding alternative ways of saying the same thing time 
and time again), I was hoping to write a rule that basically boiled down 
to "give some spam score to emails that contain the word SEO 3 or more 
times" to push those already being detected by other rules over the 
edge. To be clear, this will be a low score rule, I'm aware that ham can 
perfectly well contain that word 3 times, just like this email for 
example. Now while investigating I started wondering how to tackle that 
some spam will just have a plain text body, while others will also 
feature HTML, which means that suddenly the amount may double or halve. 
Beyond that it seems quite hacky to use a regex that boils down to 
something like /\bSEO\b.*\bSEO\b.*\bSEO\b/i instead of something that is 
properly aware of the count of certain words.
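To make the two concerns concrete (counting instead of chaining the pattern, and not double counting when a mail has both a plain text and an HTML part), here is a sketch in Python of the logic I have in mind. It is not a SpamAssassin rule: the function names are made up, the HTML part is assumed to have been reduced to its visible text already, and taking the maximum over the parts is just one possible choice.

```python
import re

SEO_WORD = re.compile(r"\bSEO\b", re.IGNORECASE)


def count_word(text: str) -> int:
    """Number of whole-word occurrences of 'SEO', case-insensitively."""
    return len(SEO_WORD.findall(text))


def seo_hits(plain_part: str = "", html_text: str = "") -> int:
    """Count per part and keep the larger count.

    Using the maximum instead of the sum means a multipart/alternative
    mail that repeats the same text in both parts is not counted twice.
    """
    return max(count_word(plain_part), count_word(html_text))


def looks_like_seo_pitch(plain_part: str, html_text: str = "", threshold: int = 3) -> bool:
    return seo_hits(plain_part, html_text) >= threshold


body = "We offer SEO. Our seo team does SEO audits."
print(seo_hits(body))                      # plain text only
print(seo_hits(body, body))                # same text in both parts
print(looks_like_seo_pitch(body, body))
print(count_word("Greetings from Seoul"))  # word boundary keeps this out
```

The threshold of 3 matches the "3 or more times" idea above and would only ever justify a low score, for the reasons already given.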


Since I sort of expected Spamassassin to have a solution for both the 
text/text+html and the counting problems, I asked around on IRC but was 
pointed here. So uhm, any suggestions or pointers are more than welcome. 
Not too sure if any more information is required, but feel free to ask 
questions or correct my presumptions if necessary.


Kind regards,
Bert Van de Poel
ULYSSIS
University of Leuven