Re: sa-learn on an Exchange public folder

2023-12-05 Thread jahlives

Is there any argument against just fetching the messages from said folder
(e.g. with getmail or fetchmail) and feeding them to sa-learn? At least with
getmail one can define a filter section which then calls sa-learn and
gives it the message for learning. I use a getmail config like this:

[retriever]
type = SimpleIMAPRetriever
server = MyServer
port = 143
username = u...@example.com
password = TopSecret
timeout = 180
mailboxes = ("INBOX.Spam",)

[filter-salearn]
type = Filter_classifier
path = /usr/bin/sa-learn
arguments = ("--spam",)
user = spamassassin
group = spamassassin
ignore_stderr = True

[destination]
type = Maildir
path = /sa-learn/spam/
user = spamassassin
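For readers without getmail, roughly the same flow can be sketched in Python with the standard imaplib module. This is only an illustration, not part of the original post: the helper names (sa_learn_command, feed_messages, fetch_folder) are made up here, the server details simply mirror the config above, and it assumes that the folder is reachable over IMAP and that sa-learn reads one message from standard input, which is how the getmail filter above hands it over. Plain IMAP on port 143 sends the password unencrypted; imaplib.IMAP4_SSL is the usual alternative.

```python
import imaplib
import subprocess


def sa_learn_command(kind, sa_learn="/usr/bin/sa-learn"):
    """Build the sa-learn argument list for 'spam' or 'ham' training."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return [sa_learn, "--" + kind]


def feed_messages(messages, kind="spam", run=subprocess.run):
    """Pipe each raw message (bytes) to sa-learn on stdin; return the count fed."""
    cmd = sa_learn_command(kind)
    count = 0
    for raw in messages:
        run(cmd, input=raw, check=True)
        count += 1
    return count


def fetch_folder(host, user, password, folder, port=143):
    """Yield the raw RFC822 bytes of every message in an IMAP folder (read-only)."""
    conn = imaplib.IMAP4(host, port)
    try:
        conn.login(user, password)
        conn.select(folder, readonly=True)
        typ, data = conn.search(None, "ALL")
        for num in data[0].split():
            typ, parts = conn.fetch(num, "(RFC822)")
            # parts[0] is a (header, body) tuple; the body is the raw message.
            yield parts[0][1]
    finally:
        conn.logout()
```

A call such as feed_messages(fetch_folder("MyServer", "u...@example.com", "TopSecret", "INBOX.Spam")) would then learn every message in the folder as spam. Passing the runner as a parameter keeps the piping logic testable without sa-learn installed.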


Have a good one

tobi

On 04/12/2023 18:23, Kris Deugau wrote:

Emmanuel Seyman wrote:


Hello all.

I've set up SA at $WORK and now want to train the bayesian classifier.
To that end, a public folder has been setup on our Exchange server and
I want to run sa-learn on any email that is transferred to it.

I'm guessing this is a popular thing to do and that there would already
be a wrapper around sa-learn on github but my Google-foo seems to be
off today.

Is there such a wrapper or do I have to write my own script?


Have a look at http://deepnet.cx/~kdeugau/spamtools/imap-learn. It
looks like the link for the original script I mangled to create that
has moved to https://dmzs.com/tools/files/spam.php.

Fair warning, I gave up on using IMAP for feeding Bayes locally
because it started to glitch out and fail for no reason I could see. 
But the mailboxes I'm learning from are maildir on a *nix platform,
not whatever black box Exchange hides things in.

-kgd


Re: sa-learn on an Exchange public folder

2023-12-04 Thread Bill Cole

On 2023-12-03 at 14:58:36 UTC-0500 (Sun, 3 Dec 2023 20:58:36 +0100)
Emmanuel Seyman 
is rumored to have said:


Hello all.

I've set up SA at $WORK and now want to train the bayesian classifier.
To that end, a public folder has been setup on our Exchange server and
I want to run sa-learn on any email that is transferred to it.

I'm guessing this is a popular thing to do and that there would already
be a wrapper around sa-learn on github but my Google-foo seems to be
off today.

Is there such a wrapper or do I have to write my own script?



I am aware of no such script. The overwhelming majority of sites using 
SA use operating systems other than Windows and mail servers using open 
format standards like mbox and Maildir. Last I knew, Exchange folders 
were binary blobs in a format (PST?) that MS either does not document or 
documents poorly, but that could be a decade or more out of date.


SpamAssassin understands the standard format of Internet mail messages 
as defined in RFC822 and its successors. It also understands a few 
simple ways that RFC822 messages are packaged together (mbox, mbx, 
bsmtp) but Exchange only uses that format for sending mail over the 
Internet, while it uses its own proprietary formats internally.
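To illustrate the packaging point: once messages have been exported from Exchange into mbox or Maildir form, the sa-learn invocation differs only by the --mbox switch. A minimal sketch, assuming a regular file is an mbox and a directory holds one message per file; the helper name learn_args is hypothetical.

```python
import os


def learn_args(path, kind="spam"):
    """Return sa-learn arguments for an mbox file or a directory of messages."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    args = ["sa-learn", "--" + kind]
    if os.path.isfile(path):
        # Assumption: a plain file is an mbox holding many messages.
        args.append("--mbox")
    # A directory (for example a Maildir folder) is passed as-is.
    args.append(path)
    return args
```

The resulting list can be handed to subprocess.run. How the messages get out of Exchange in the first place is not covered by this sketch.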


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: sa-learn on an Exchange public folder

2023-12-04 Thread Benny Pedersen

Kris Deugau skrev den 2023-12-04 18:23:

Fair warning, I gave up on using IMAP for feeding Bayes locally because 
it started to glitch out and fail for no reason I could see.  But the 
mailboxes I'm learning from are maildir on a *nix platform, not 
whatever black box Exchange hides things in.


+1

https://gitlab.com/isbg/isbg i just prefer python :)

That said, Dovecot sieve via Roundcube integration might kill Exchange
once and for all.





Re: sa-learn on an Exchange public folder

2023-12-04 Thread Kris Deugau

Emmanuel Seyman wrote:


Hello all.

I've set up SA at $WORK and now want to train the bayesian classifier.
To that end, a public folder has been setup on our Exchange server and
I want to run sa-learn on any email that is transferred to it.

I'm guessing this is a popular thing to do and that there would already
be a wrapper around sa-learn on github but my Google-foo seems to be
off today.

Is there such a wrapper or do I have to write my own script?


Have a look at http://deepnet.cx/~kdeugau/spamtools/imap-learn.  It 
looks like the link for the original script I mangled to create that has 
moved to https://dmzs.com/tools/files/spam.php.


Fair warning, I gave up on using IMAP for feeding Bayes locally because 
it started to glitch out and fail for no reason I could see.  But the 
mailboxes I'm learning from are maildir on a *nix platform, not whatever 
black box Exchange hides things in.


-kgd


Re: sa-learn using multiple CPUs?

2021-04-16 Thread Benny Pedersen

On 2021-04-16 03:29, John Hardin wrote:


So I will re-configure my installation to use MariaDB.

You should also consider the Redis backend.


I don't like that Redis needs non-default sysctl settings.

Redis does not offer that much more power.

IMHO one could use the MEMORY engine in MySQL and then periodically dump to
SQL, or copy from memory to CSV in MariaDB. Both the MEMORY engine and the CSV
engine are very memory-friendly while still giving fast access.


Maybe I am wrong, I just use PostgreSQL.


Re: sa-learn using multiple CPUs?

2021-04-16 Thread Axb

How hard is it to keep list mail on list and not reply directly to sender?

Have you seen
https://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/ 
?


there may be some helpful info in there.

On 4/16/21 9:47 AM, Christian Völker wrote:
Thanks for the hint. I will monitor it. The machine has 16GB of memory,
which should be sufficient, but I already notice Redis preallocating
2GB.


It is somewhat unclear what happens. If there is no limit I will get an
OOM error and Redis will (if killed) lose the transactions made after
the last "save 900 1" snapshot, right?


If I set a limit it will discard the oldest entries, correct?

Neither seems to be perfect for SpamAssassin.

However, I will ignore the topic for the moment and see how it goes.
16GB should (hopefully) be enough. Once scanning is done, SpamAssassin's
expiry rules should take effect and reduce the memory usage.
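
The memory questions above can be checked against what Redis itself reports. The following is a small sketch, not something posted in the thread: it parses the output of "redis-cli info memory" and pulls out the current usage, the configured limit and the eviction policy. The helper names are invented here, and the field names (used_memory, maxmemory, maxmemory_policy) are those reported by recent Redis versions. What happens once a limit is reached depends on that policy, so it is worth reading alongside the usage figure.

```python
import subprocess


def parse_redis_info(text):
    """Parse 'key:value' lines from redis-cli INFO output into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as '# Memory'.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def memory_report(info):
    """Return (used_bytes, limit_bytes, policy); a limit of 0 means none is set."""
    used = int(info.get("used_memory", 0))
    limit = int(info.get("maxmemory", 0))
    policy = info.get("maxmemory_policy", "unknown")
    return used, limit, policy


def read_redis_memory(redis_cli="redis-cli"):
    """Run 'redis-cli info memory' and return the parsed report."""
    out = subprocess.run([redis_cli, "info", "memory"],
                         capture_output=True, text=True, check=True).stdout
    return memory_report(parse_redis_info(out))
```

Calling read_redis_memory() from a cron job or a monitoring check would give an early warning well before the machine runs out of memory.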


Greetings

/Christian




Am 16.04.2021 um 09:15 schrieb Axb:

To avoid surprises, remember to watch your memory usage.
Redis reads/writes the DB in memory and only dumps to disk for backup.

"redis-cli info" is of help


On 4/16/21 9:10 AM, Christian Völker wrote:

Sorry to annoy you. Another addition to my tests:

When using Redis it took me around 15 seconds to scan ~1,500 messages.
When using MariaDB it took one minute to do the same.
With the file-based backend I had strange issues whatever lock type I used
(flock yes/no):
"bayes: bayes db version 0 is not able to be used, aborting! at 
/usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 206."



Anyways, now using Redis which appears to be the fastest.

Thanks again!

/Christian



Am 16.04.2021 um 08:48 schrieb Christian Völker:

Hi,

So I will re-configure my installation to use MariaDB.

You should also consider the Redis backend.


OK, I had a look while using MariaDB and monitored it for the last
24 hours. My 10 vCPUs were used, with no I/O waits. But overall CPU usage
was, according to "top", only at 25%, as top showed 75% idle. I assume
there is some locking in place limiting the CPU usage.


I have now configured it to use Redis instead of MySQL, and top shows
about 25% idle with 0% I/O waits when running 10 sa-learn processes in
parallel. Increasing or decreasing the number of jobs does not
significantly change the idle percentage.


So using redis the CPU usage is higher compared to MySQL.

Thanks for ideas!

/Christian













Re: sa-learn using multiple CPUs?

2021-04-16 Thread Axb

To avoid surprises, remember to watch your memory usage.
Redis reads/writes the DB in memory and only dumps to disk for backup.

"redis-cli info" is of help


On 4/16/21 9:10 AM, Christian Völker wrote:

Sorry to annoy you. Another addition to my tests:

When using Redis it took me around 15 seconds to scan ~1,500 messages.
When using MariaDB it took one minute to do the same.
With the file-based backend I had strange issues whatever lock type I used
(flock yes/no):
"bayes: bayes db version 0 is not able to be used, aborting! at 
/usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 206."



Anyways, now using Redis which appears to be the fastest.

Thanks again!

/Christian



Am 16.04.2021 um 08:48 schrieb Christian Völker:

Hi,

So I will re-configure my installation to use MariaDB.

You should also consider the Redis backend.


OK, I had a look while using MariaDB and monitored it for the last
24 hours. My 10 vCPUs were used, with no I/O waits. But overall CPU usage
was, according to "top", only at 25%, as top showed 75% idle. I assume
there is some locking in place limiting the CPU usage.


I have now configured it to use Redis instead of MySQL, and top shows
about 25% idle with 0% I/O waits when running 10 sa-learn processes in
parallel. Increasing or decreasing the number of jobs does not
significantly change the idle percentage.


So using redis the CPU usage is higher compared to MySQL.

Thanks for ideas!

/Christian








Re: sa-learn using multiple CPUs?

2021-04-16 Thread Christian Völker

Sorry to annoy you. Another addition to my tests:

When using Redis it took me around 15 seconds to scan ~1,500 messages.
When using MariaDB it took one minute to do the same.
With the file-based backend I had strange issues whatever lock type I used
(flock yes/no):
"bayes: bayes db version 0 is not able to be used, aborting! at 
/usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 206."



Anyways, now using Redis which appears to be the fastest.
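
Put as throughput, the two timings above work out to roughly 100 versus 25 messages per second. A back-of-the-envelope sketch follows; the helper names and the one-million-message corpus are hypothetical and only there to show how the difference scales.

```python
def messages_per_second(messages, seconds):
    """Average learning throughput."""
    return messages / seconds


def estimated_minutes(total_messages, rate):
    """Minutes needed to learn total_messages at a given rate (msgs/sec)."""
    return total_messages / rate / 60


redis_rate = messages_per_second(1500, 15)    # Redis: ~1,500 messages in ~15 s
mariadb_rate = messages_per_second(1500, 60)  # MariaDB: same batch in ~1 minute

# Hypothetical corpus size, not a figure from the thread.
corpus = 1_000_000
redis_minutes = estimated_minutes(corpus, redis_rate)
mariadb_minutes = estimated_minutes(corpus, mariadb_rate)
print(redis_rate, mariadb_rate, round(redis_minutes), round(mariadb_minutes))
```

The figures are single measurements from one machine, so they indicate a rough ratio rather than a benchmark.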

Thanks again!

/Christian



Am 16.04.2021 um 08:48 schrieb Christian Völker:

Hi,

So I will re-configure my installation to use MariaDB.

You should also consider the Redis backend.


OK, I had a look while using MariaDB and monitored it for the last
24 hours. My 10 vCPUs were used, with no I/O waits. But overall CPU usage
was, according to "top", only at 25%, as top showed 75% idle. I assume
there is some locking in place limiting the CPU usage.


I have now configured it to use Redis instead of MySQL, and top shows
about 25% idle with 0% I/O waits when running 10 sa-learn processes in
parallel. Increasing or decreasing the number of jobs does not
significantly change the idle percentage.


So using redis the CPU usage is higher compared to MySQL.

Thanks for ideas!

/Christian





Re: sa-learn using multiple CPUs?

2021-04-16 Thread Christian Völker

Hi,

So I will re-configure my installation to use MariaDB.

You should also consider the Redis backend.


OK, I had a look while using MariaDB and monitored it for the last
24 hours. My 10 vCPUs were used, with no I/O waits. But overall CPU usage
was, according to "top", only at 25%, as top showed 75% idle. I assume
there is some locking in place limiting the CPU usage.


I have now configured it to use Redis instead of MySQL, and top shows
about 25% idle with 0% I/O waits when running 10 sa-learn processes in
parallel. Increasing or decreasing the number of jobs does not
significantly change the idle percentage.


So using redis the CPU usage is higher compared to MySQL.

Thanks for ideas!

/Christian



Re: sa-learn using multiple CPUs?

2021-04-15 Thread John Hardin

On Thu, 15 Apr 2021, Christian Völker wrote:


Hi,

so I did some testing.

When using bayes_ files as backend and flock only a single process consumes 
CPU (strange, I have seen different behaviour before).
When using MariaDB as backend all processes use CPU and share them with the 
MariaDB process.


So I will re-configure my installation to use MariaDB.


You should also consider the Redis backend.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Our politicians should bear in mind the fact that
  the American Revolution was touched off by the then-current
  government attempting to confiscate firearms from the people.
---
 4 days until the 246th anniversary of The Shot Heard 'Round The World

Re: sa-learn using multiple CPUs?

2021-04-15 Thread Christian Völker

Hi,

so I did some testing.

When using bayes_ files as backend and flock only a single process 
consumes CPU (strange, I have seen different behaviour before).
When using MariaDB as backend all processes use CPU and share them with 
the MariaDB process.


So I will re-configure my installation to use MariaDB.


Thanks for your input!

/Christian



Am 15.04.2021 um 15:07 schrieb Henrik K:

If you insist on file bayes, at least make sure you use "lock_method flock".
Or maybe the BDB backend, I don't remember if it's faster.


On 4/15/21 2:45 PM, Christian Völker wrote:

Hi,

well, here it is not I/O bound (running on RAID1-SSDs). I am using the
"default" file based backend ~/.spamassassin/bayes*.

40msg/sec is not really fast enough for me. The number of messages to be
processed is really huge.

So again asking: is it possible with the file-based backend to do this
stuff in parallel?

Thanks

/Christian

Am 15.04.2021 um 14:38 schrieb Axb:

Depending on your Bayes backend, your bottleneck will not be the
CPUs but I/O.
Normally there's no need for running multiple sa-learn instances.

My sa-learn is learning +40 msgs/sec from a SSD into a Redis DB.

On 4/15/21 2:33 PM, Christian Völker wrote:

Hi all,

I am going to add some large spam archives for my Bayes database
with sa-learn.

I have a machine with six vCPUs and obviously I would like to
speed up the learning process. I am thinking of running six
sa-learn processes in parallel. Is there any issue with this
like locks for the database?

Or is sa-learn itself multithreaded and I do not need to run it
in parallel (does not look so)?

Next, when running the above in parallel (if possible) should I
use the "--no-sync" and do the syncing afterwards? But again,
this is then only single-threaded, right?

Thanks a lot for your input!

/Christian








Re: sa-learn using multiple CPUs?

2021-04-15 Thread Henrik K


If you insist on file bayes, at least make sure you use "lock_method flock".
Or maybe the BDB backend, I don't remember if it's faster.

> On 4/15/21 2:45 PM, Christian Völker wrote:
> > Hi,
> > 
> > well, here it is not I/O bound (running on RAID1-SSDs). I am using the
> > "default" file based backend ~/.spamassassin/bayes*.
> > 
> > 40msg/sec is not really fast enough for me. The number of messages to be
> > processed is really huge.
> > 
> > So again asking: is it possible with the file-based backend to do this
> > stuff in parallel?
> > 
> > Thanks
> > 
> > /Christian
> > 
> > Am 15.04.2021 um 14:38 schrieb Axb:
> > > Depending on your Bayes backend, your bottleneck will not be the
> > > CPUs but I/O.
> > > Normally there's no need for running multiple sa-learn instances.
> > > 
> > > My sa-learn is learning +40 msgs/sec from a SSD into a Redis DB.
> > > 
> > > On 4/15/21 2:33 PM, Christian Völker wrote:
> > > > Hi all,
> > > > 
> > > > I am going to add some large spam archives for my Bayes database
> > > > with sa-learn.
> > > > 
> > > > I have a machine with six vCPUs and obviously I would like to
> > > > speed up the learning process. I am thinking of running six
> > > > sa-learn processes in parallel. Is there any issue with this
> > > > like locks for the database?
> > > > 
> > > > Or is sa-learn itself multithreaded and I do not need to run it
> > > > in parallel (does not look so)?
> > > > 
> > > > Next, when running the above in parallel (if possible) should I
> > > > use the "--no-sync" and do the syncing afterwards? But again,
> > > > this is then only single-threaded, right?
> > > > 
> > > > Thanks a lot for your input!
> > > > 
> > > > /Christian
> > > > 
> > > > 
> > > 
> > > 
> > 
> 


Re: sa-learn using multiple CPUs?

2021-04-15 Thread Axb

Please keep list mail on list!
if you run parallel sa-learn instances you'll run into locked DB errors.
With a SDBM backend it would be a bit faster but still lock up.
afaik, Redis backend won't have locking issues.
(dunno about SQL - I use Redis)
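
For a backend that tolerates concurrent writers (Redis, going by the note above), the parallel run asked about in this thread could be driven along these lines. This is a hedged sketch rather than a recommended setup: the function names are invented, the default of six jobs simply matches the six vCPUs mentioned in the question, and whether --no-sync followed by a single --sync helps at all depends on the backend. With the file-based backend the same script would be expected to hit the lock errors described above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(paths, jobs):
    """Distribute paths over at most `jobs` non-empty chunks."""
    if jobs < 1:
        raise ValueError("jobs must be >= 1")
    chunks = [paths[i::jobs] for i in range(jobs)]
    return [c for c in chunks if c]


def learn_parallel(paths, kind="spam", jobs=6, run=subprocess.run):
    """Run one 'sa-learn --no-sync' per chunk concurrently, then sync once."""
    chunks = split_round_robin(list(paths), jobs)
    cmds = [["sa-learn", "--no-sync", "--" + kind] + chunk for chunk in chunks]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Each worker thread only waits on its own sa-learn child process.
        list(pool.map(lambda cmd: run(cmd, check=True), cmds))
    # A single sync after all learners have finished.
    run(["sa-learn", "--sync"], check=True)
    return len(cmds)
```

The paths can be individual message files or whole directories; passing directories keeps the argument lists short for very large archives.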

On 4/15/21 2:45 PM, Christian Völker wrote:

Hi,

well, here it is not I/O bound (running on RAID1-SSDs). I am using the 
"default" file based backend ~/.spamassassin/bayes*.


40msg/sec is not really fast enough for me. The number of messages to be 
processed is really huge.


So again asking: is it possible with the file-based backend to do this
stuff in parallel?


Thanks

/Christian

Am 15.04.2021 um 14:38 schrieb Axb:
Depending on your Bayes backend, your bottleneck will not be the CPUs 
but I/O.

Normally there's no need for running multiple sa-learn instances.

My sa-learn is learning +40 msgs/sec from a SSD into a Redis DB.

On 4/15/21 2:33 PM, Christian Völker wrote:

Hi all,

I am going to add some large spam archives for my Bayes database with 
sa-learn.


I have a machine with six vCPUs and obviously I would like to speed 
up the learning process. I am thinking of running six sa-learn 
processes in parallel. Is there any issue with this like locks for 
the database?


Or is sa-learn itself multithreaded and I do not need to run it in 
parallel (does not look so)?


Next, when running the above in parallel (if possible) should I use 
the "--no-sync" and do the syncing afterwards? But again, this is 
then only single-threaded, right?


Thanks a lot for your input!

/Christian












Re: sa-learn using multiple CPUs?

2021-04-15 Thread Henrik K
On Thu, Apr 15, 2021 at 08:39:42AM -0400, Greg Troxel wrote:
>
> I don't know, but beware that if you have TXREP configured, and you do
> not use -L to sa-learn, I believe you will end up making DNSBL queries
> for all of them.

Thanks, TxRep actually seems to be the culprit.  Will look into it..

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7881



Re: sa-learn using multiple CPUs?

2021-04-15 Thread Christian Völker

Hi,


I don't know, but beware that if you have TXREP configured, and you do
not use -L to sa-learn, I believe you will end up making DNSBL queries
for all of them.


Good catch! I did not use "-L" so far and I am pretty sure there is
nothing configured, but from reading the man page it will not do any
harm. So I will add "-L".


Besides this, a test run really came up with 100% single-CPU usage, so I
doubt it is doing the queries here.


Thanks!

/Christian



Re: sa-learn using multiple CPUs?

2021-04-15 Thread Greg Troxel

Christian Völker  writes:

> I am going to add some large spam archives for my Bayes database with
> sa-learn.
>
> I have a machine with six vCPUs and obviously I would like to speed up
> the learning process. I am thinking of running six sa-learn processes
> in parallel. Is there any issue with this like locks for the database?

I don't know, but beware that if you have TXREP configured, and you do
not use -L to sa-learn, I believe you will end up making DNSBL queries
for all of them.




Re: sa-learn using multiple CPUs?

2021-04-15 Thread Axb
Depending on your Bayes backend, your bottleneck will not be the CPUs 
but I/O.

Normally there's no need for running multiple sa-learn instances.

My sa-learn is learning +40 msgs/sec from a SSD into a Redis DB.

On 4/15/21 2:33 PM, Christian Völker wrote:

Hi all,

I am going to add some large spam archives for my Bayes database with 
sa-learn.


I have a machine with six vCPUs and obviously I would like to speed up 
the learning process. I am thinking of running six sa-learn processes in 
parallel. Is there any issue with this like locks for the database?


Or is sa-learn itself multithreaded and I do not need to run it in 
parallel (does not look so)?


Next, when running the above in parallel (if possible) should I use the 
"--no-sync" and do the syncing afterwards? But again, this is then only 
single-threaded, right?


Thanks a lot for your input!

/Christian







Re: sa-learn, TXREP, network queries, documentation

2021-04-12 Thread RW
On Mon, 12 Apr 2021 09:40:47 -0400
Greg Troxel wrote:



>   3) sa-learn does not document that it is no longer for BAYES, but a
>   general interface to mechanisms that learn.  

It always was in theory. 

>   4) There is a bonus of txrep_learn_penalty for learning spam,
> default 20.  If the user says it is spam by calling learn, then I
> don't understand why it isn't just treated as score 20.  

Probably it's to keep retraining straightforward if you make a mistake. 

>   (I've added -L to my script that calls sa-learn.)

That sounds like a bad idea because you will be feeding it bogus data.
It's probably better to just turn off TxRep when training from a
historic corpus and only run your periodic training on new mail.



Re: sa-learn absurdly slow on re-iterating over mailboxes (TxRep involved)

2020-01-24 Thread Nix
On 14 Jan 2020, Henrik K. said:

> On Tue, Jan 14, 2020 at 12:05:57PM +, Nix wrote:
>> 
>> I've come to the conclusion that TxRep is essentially unmaintained and
>> basically doesn't work unless you use SQL storage, and have migrated
>> back to the AWL, which still works fine. I hope I'm wrong.
>
> There's only so much a few inactive developers can do, I don't even use
> TxRep (or AWL for that matter), so my priorities are on more critical
> things.  Feel free to contribute.  :-)

I tried, but couldn't make it work well enough to not lose functionality
at the same time as I gained the ability to not break locks.

> In any case one should use SQL if possible (and Redis for Bayes), file based
> databases have always been painful to use.

Honestly, for smaller installations a whole database server is just one
more thing to break. I'd only want one of those tied up with my email if
I was storing *the email itself* in the database as well.

-- 
NULL && (void)


Re: sa-learn absurdly slow on re-iterating over mailboxes (TxRep involved)

2020-01-24 Thread Nix
On 14 Jan 2020, Dean Carpenter spake thusly:

> On 2020-01-14 7:05 am, Nix wrote:
>> On 8 Jan 2020, Benjamin Block told this:
>>
>> ... looks like it to me. It's at least spotting the lock and breaking
>> it, but it's still taking a second and a half to do it, and it happens
>> for each message. That's better than the 90s it used to take, but still
>> bad.
>>
>> I've come to the conclusion that TxRep is essentially unmaintained and
>> basically doesn't work unless you use SQL storage, and have migrated
>> back to the AWL, which still works fine. I hope I'm wrong.
>
> Are you saying that it DOES work cleanly when using SQL storage ? I've
> been using AWL with SQL for years and it's been "fine".

Other people seem to be saying that. I never tried SQL anything with
SpamAssassin, so I can't be sure.

-- 
NULL && (void)


Re: sa-learn absurdly slow on re-iterating over mailboxes (TxRep involved)

2020-01-17 Thread RW
On Tue, 14 Jan 2020 12:05:57 +
Nix wrote:


> I've come to the conclusion that TxRep is essentially unmaintained and
> basically doesn't work unless you use SQL storage, and have migrated
> back to the AWL, which still works fine. I hope I'm wrong.

I think people should think about whether they actually need TxRep. To
me it's an additional risk rather than a safety net.

TxRep looks to be hacked-out from AWL, it's complex and lacks
transparency. Most of its reported bugs are clearly visible, they
involve long delays, runtime errors and debug messages. The chances are
that these bugs are the tip of the iceberg. If it's also getting its
computed score wrong, it will have to be pretty bad, pretty often,
before anyone notices. 

Most of what it does doesn't seem well designed. I think in part this
is because it reuses AWL's database code and so sees everything as a
score-averaging problem.

The chief flaw in AWL was that it used the first-public IP address
from a forgeable received header. This potentially allows spammers to
exploit a good reputation if they can match email addresses to IP
address blocks. 

TxRep uses a trusted IP address which is mostly a step forward (except
for forwarded email where it's very much worse). However, in practice
this is rarely used and it uses DKIM or SPF reputations instead. 

Unfortunately TxRep appears to mishandle SPF and treats the header
"From" as being authenticated by a pass regardless of alignment with
the envelope sender. This can allow spam to abuse good reputations
without the spammer even trying.



Re: sa-learn absurdly slow on re-iterating over mailboxes (TxRep involved)

2020-01-17 Thread Benjamin Block
On Tue, Jan 14, 2020 at 12:05:57PM +, Nix wrote:
> On 8 Jan 2020, Benjamin Block told this:
>
> > Now, if I run sa-learn again on the same folder (the manual says 
> > "SpamAssassin remembers which mail messages it has learnt already,
> > and will not re-learn those messages again, unless you use the --forget 
> > option.", so I think this is OK to do), it gets absurdly
> > slow, taking over 2 minutes for the same directory with 45 mails.
> >
> > + /usr/bin/sa-learn --no-sync --progress --ham 
> > /var/spool/fetchmail/Maildir/.Congstar
> >  92% [=]   0.30 msgs/sec 02m40s DONE
> > Learned tokens from 0 message(s) (49 message(s) examined)
> >
> > Now imagine this for a folder with over 2k messages (of which I have 
> > several).
>
> Possibly related to ?

Ah yes, I saw that as well, and thought it might be related. But I saw
they made changes in response to the bug, so I wasn't sure that still
applies.

>
> > Jan  8 23:49:52.209 [308] dbg: TxRep: reputation: none, count: 0, learning: 
> > -20, MSG_ID:
> > ec300f7aa9c95003b94439831b843605e9a94660@sa_generated
> > Jan  8 23:49:52.209 [308] dbg: auto-whitelist: add_score: new count: 1, new 
> > totscore: 20
> > Jan  8 23:49:53.710 [308] dbg: auto-whitelist: DB addr list: untie-ing and 
> > unlocking
> > Jan  8 23:49:53.715 [308] dbg: auto-whitelist: DB addr list: file locked, 
> > breaking lock
> > Jan  8 23:49:53.716 [308] dbg: locker: safe_unlock: unlink 
> > /var/spool/fetchmail/.spamassassin/tx-reputation.lock
>
> ... looks like it to me. It's at least spotting the lock and breaking
> it, but it's still taking a second and a half to do it, and it happens
> for each message. That's better than the 90s it used to take, but still
> bad.
>
> I've come to the conclusion that TxRep is essentially unmaintained and
> basically doesn't work unless you use SQL storage, and have migrated
> back to the AWL, which still works fine. I hope I'm wrong.

Hmm, interesting. Maybe I should try SQL then to see whether its faster
with that. Makes my setup more complex though, not a huge fan of that,
but OK.

Thanks,
 - Benjamin


Re: sa-learn absurdly slow on re-iterating over mailboxes (TxRep involved)

2020-01-14 Thread Dean Carpenter

On 2020-01-14 7:05 am, Nix wrote:

On 8 Jan 2020, Benjamin Block told this:

... looks like it to me. It's at least spotting the lock and breaking
it, but it's still taking a second and a half to do it, and it happens
for each message. That's better than the 90s it used to take, but still
bad.

I've come to the conclusion that TxRep is essentially unmaintained and
basically doesn't work unless you use SQL storage, and have migrated
back to the AWL, which still works fine. I hope I'm wrong.


Are you saying that it DOES work cleanly when using SQL storage ?  I've 
been using AWL with SQL for years and it's been "fine".  Want to change 
up to TxRep with SQL, but now not so sure ...


Re: sa-learn absurdly slow on re-iterating over mailboxes (TxRep involved)

2020-01-14 Thread Henrik K
On Tue, Jan 14, 2020 at 12:05:57PM +, Nix wrote:
> 
> I've come to the conclusion that TxRep is essentially unmaintained and
> basically doesn't work unless you use SQL storage, and have migrated
> back to the AWL, which still works fine. I hope I'm wrong.

There's only so much a few inactive developers can do, I don't even use
TxRep (or AWL for that matter), so my priorities are on more critical
things.  Feel free to contribute.  :-)

In any case one should use SQL if possible (and Redis for Bayes), file based
databases have always been painful to use.



Re: sa-learn absurdly slow on re-iterating over mailboxes (TxRep involved)

2020-01-14 Thread Nix
On 8 Jan 2020, Benjamin Block told this:

> Now, if I run sa-learn again on the same folder (the manual says 
> "SpamAssassin remembers which mail messages it has learnt already,
> and will not re-learn those messages again, unless you use the --forget 
> option.", so I think this is OK to do), it gets absurdly
> slow, taking over 2 minutes for the same directory with 45 mails.
>
> + /usr/bin/sa-learn --no-sync --progress --ham 
> /var/spool/fetchmail/Maildir/.Congstar
>  92% [=]   0.30 msgs/sec 02m40s DONE
> Learned tokens from 0 message(s) (49 message(s) examined)
>
> Now imagine this for a folder with over 2k messages (of which I have several).

Possibly related to ?

> Jan  8 23:49:52.209 [308] dbg: TxRep: reputation: none, count: 0, learning: 
> -20, MSG_ID:
> ec300f7aa9c95003b94439831b843605e9a94660@sa_generated
> Jan  8 23:49:52.209 [308] dbg: auto-whitelist: add_score: new count: 1, new 
> totscore: 20
> Jan  8 23:49:53.710 [308] dbg: auto-whitelist: DB addr list: untie-ing and 
> unlocking
> Jan  8 23:49:53.715 [308] dbg: auto-whitelist: DB addr list: file locked, 
> breaking lock
> Jan  8 23:49:53.716 [308] dbg: locker: safe_unlock: unlink 
> /var/spool/fetchmail/.spamassassin/tx-reputation.lock

... looks like it to me. It's at least spotting the lock and breaking
it, but it's still taking a second and a half to do it, and it happens
for each message. That's better than the 90s it used to take, but still
bad.

I've come to the conclusion that TxRep is essentially unmaintained and
basically doesn't work unless you use SQL storage, and have migrated
back to the AWL, which still works fine. I hope I'm wrong.


Re: sa-learn

2018-10-15 Thread cs993232
Great, and it works!
Kevin, you rock!



--
Sent from: http://spamassassin.1065346.n5.nabble.com/SpamAssassin-Users-f3.html


Re: sa-learn

2018-10-15 Thread Kevin A. McGrail
I think it's possibly related to a misspelling.  See
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7636

Regards,
KAM
--
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171


On Mon, Oct 15, 2018 at 3:55 PM cs993232  wrote:

>  Hi,
>
> Appreciated for any help on this.
> After upgrading to latest SpamAssassin 3.4.2 , sa-learn --spam
> /home/guests/userid/Mail/testspam
> I got the following error message. I ran the same command several days
> back on 3.4.1, so I am not sure what happened, but it does look like it has
> something to do with the upgrade:
>
> plugin: failed to parse plugin (from @INC): Can't locate
> Mail/SpamAssassin/Plugin/PhishTag.pm in @INC (@INC contains: lib
> /usr/local/share/perl5 /usr/local/lib64/perl5 /usr/lib64/perl5/vendor_perl
> /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5) at (eval
> 94)
> line 1.
>
> Learned tokens from 0 message(s) (0 message(s) examined)
> FYI, my perl version is 5.10.1
>
> Thanks again,
> Steven
>
>
>
> --
> Sent from:
> http://spamassassin.1065346.n5.nabble.com/SpamAssassin-Users-f3.html
>


Re: sa-learn - not able to get a byes lock

2018-07-23 Thread Nick Bright

On 7/19/2018 1:22 PM, John Hardin wrote:


Is this something I should fix on a flat file bayes DB, or should I 
look at going to Bayes-SQL?


Redis would probably be better.

I'm having some trouble finding any kind of significant documentation
about this.


Could somebody please point me at some reading?

http://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/readme.txt 
has *very* little information.


How does it work? How does one set it up? From what is available, this
seems highly experimental and not advisable for a production server?


--
---
-  Nick Bright-
-  Vice President of Technology   -
-  Valnet -=- We Connect You -=-  -
-  Tel 888-332-1616 x 315 / Fax 620-331-0789  -
-  Web http://www.valnet.net/ -
---
- Are your files safe?-
- Valnet Vault - Secure Cloud Backup  -
- More information & 30 day free trial at -
- http://www.valnet.net/services/valnet-vault -
---



Re: sa-learn - not able to get a bayes lock

2018-07-19 Thread John Hardin

On Thu, 19 Jul 2018, Nick Bright wrote:

> On 7/19/2018 1:22 PM, John Hardin wrote:
>
>> Do you happen to have autolearn enabled? If so, turn it off.
>
> In general, or just while trying to run sa-learn?

I think there's consensus that you leave it disabled initially, and do
manual training to a base reliable state. Then if you don't have a pool of
trustworthy users to provide training messages that you review, you can
turn on autolearn.

>> Redis would probably be better.
>
> I'll check it out, thanks.
>
>> Also, if you don't have autolearn enabled and you're using flat files, you
>> could learn into an offline database and when done copy the files over to
>> the live instance (ideally by directory renaming to minimize the window).
>
> Thanks for the tip!


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  If the rock of doom requires a gentle nudge away from Gaia to
  prevent a very bad day for Earthlings, NASA won’t be riding to the
  rescue. These days, NASA does dodgy weather research and outreach
  programs, not stuff in actual space with rockets piloted by
  flinty-eyed men called Buzz.   -- Daily Bayonet
---
 Tomorrow: the 49th anniversary of Apollo 11 landing on the Moon

Re: sa-learn - not able to get a bayes lock

2018-07-19 Thread Nick Bright

On 7/19/2018 1:22 PM, John Hardin wrote:

> Do you happen to have autolearn enabled? If so, turn it off.

In general, or just while trying to run sa-learn?

> Redis would probably be better.

I'll check it out, thanks.

> Also, if you don't have autolearn enabled and you're using flat files,
> you could learn into an offline database and when done copy the files
> over to the live instance (ideally by directory renaming to minimize
> the window).

Thanks for the tip!




Re: sa-learn - not able to get a bayes lock

2018-07-19 Thread John Hardin

On Thu, 19 Jul 2018, Nick Bright wrote:

> I've deployed SA into my environment, and I'm trying to add some training
> data. This is in a site-wide configuration, so it's a site-wide bayes file.
> The server is fairly active (several thousand mailboxes, hundreds of
> messages per second).
>
> When attempting to sa-learn some spam, it runs for a few moments, then gets:
>
> Jul 19 13:12:03.797 [5437] dbg: locker: safe_lock: trying to get lock on
> /var/spamassassin/bayes_db/bayes with 175 retries
>
> It doesn't seem to get anywhere, it's been running for several minutes that
> way.

Do you happen to have autolearn enabled? If so, turn it off.

> I suspect that this is simply because my activity level in bayes is higher
> than a flat file may support.
>
> Is this something I should fix on a flat file bayes DB, or should I look at
> going to Bayes-SQL?

Redis would probably be better.

Also, if you don't have autolearn enabled and you're using flat files, you
could learn into an offline database and when done copy the files over to
the live instance (ideally by directory renaming to minimize the window).
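
The offline-learn-then-rename idea could look roughly like the sketch
below. It assumes flat-file Bayes with autolearn off, that all Bayes files
sit in one directory with the prefix "bayes" (as in the lock path in the
log above), and that the staging directory is on the same filesystem so
the renames are cheap. The helper names are made up; --dbpath is the
sa-learn option that takes a path in bayes_path form (directory plus file
prefix).

```shell
#!/bin/sh
# Sketch: train a copy of the Bayes files offline, then swap directories.
# Assumptions: flat-file Bayes, autolearn disabled, file prefix "bayes",
# staging directory does not exist yet and is on the same filesystem.

SA_LEARN=${SA_LEARN:-sa-learn}

# Copy the live directory and learn into the copy.
# $1 = live dir, $2 = staging dir, $3 = directory of spam messages
learn_offline() {
    cp -Rp "$1" "$2" &&
    "$SA_LEARN" --dbpath "$2/bayes" --spam "$3"
}

# Swap staging into place with two renames; the old copy is kept as $1.old.
# $1 = live dir, $2 = staging dir
swap_into_place() {
    [ ! -e "$1.old" ] || return 1   # refuse to clobber an earlier backup
    mv "$1" "$1.old" &&
    mv "$2" "$1"
}

# Example (not run here):
#   learn_offline   /var/spamassassin/bayes_db /var/spamassassin/bayes_new /path/to/spam
#   swap_into_place /var/spamassassin/bayes_db /var/spamassassin/bayes_new
```

Anything written to the live database between the copy and the swap is
lost, which is why this only makes sense with autolearn off. Scanners that
keep the old files open may need a restart to pick up the new ones.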



Re: sa-learn

2018-02-12 Thread Matus UHLAR - fantomas

On 11.02.18 19:09, Hendrik Haddorp wrote:
> I have a maildir with about 2 mails. In the past this does
> not seem to have been a problem. But since a few weeks my
> sa-learn process dies with an OOM now.

On 11.02.18 20:10, Hendrik Haddorp wrote:
> so far I was always letting it run once a week over my inbox in --ham
> mode and over my spam folder in --spam mode. all tutorials I saw did
> it the same way. this also worked for years but likely with less mail
> files. I was under the impression that sa-learn would skip messages
> that it already learned. the debug log also indicated that it
> recognized those.

The problem with this approach is that all those messages must be opened,
read and parsed; only then is it possible to find out that they have
already been trained and can be skipped.

Even if there's a memory bug in sa-learn and it can be fixed, it's still
very inefficient.

Luckily you have been advised of better approaches. Good luck.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I'm not interested in your website anymore.
If you need cookies, bake them yourself.


Re: sa-learn

2018-02-11 Thread RW
On Sun, 11 Feb 2018 19:09:32 +0100
Hendrik Haddorp wrote:

> Hi,
> 
> I have a maildir with about 2 mails. In the past this does not
> seem to have been a problem. But since a few weeks my sa-learn
> process dies with an OOM now. My server has only 1GB of memory with
> another GB for swap. sa-learn is eating up pretty much the complete
> memory for the run and is only able to finish when I stop everything
> else. Why is sa-learn using more and more memory even when it learned
> all those messages already in the past? 

I don't know, it sounds like a bug.

This is a bit of a long shot, but try turning off autoexpiry if you are
using it.


Re: sa-learn

2018-02-11 Thread Hendrik Haddorp

thanks, I'll give that a try

On 11.02.2018 20:15, Reindl Harald wrote:



On 11.02.2018 at 20:10, Hendrik Haddorp wrote:
so far I was always letting it run once a week over my inbox in --ham 
mode and over my spam folder in --spam mode. all tutorials I saw did 
it the same way. this also worked for years but likely with less mail 
files. I was under the impression that sa-learn would skip messages 
that it already learned. the debug log also indicated that it 
recognized those.


but to recognize it needs to read them

man find
man xargs

find "$SA_MILTER_HOME/training/spam/" -type f -mtime -$TRAIN_DAYS | 
xargs -r sa-learn --max-size=0 --no-sync --spam
find "$SA_MILTER_HOME/training/ham/" -type f -mtime -$TRAIN_DAYS | 
xargs -r sa-learn --max-size=0 --no-sync --ham
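
A variant of the same idea that copes with file names containing spaces
might look like the sketch below. It assumes find and xargs understand
-print0 and -0 (GNU and BSD versions do); the function name and arguments
are only illustrative. Because --no-sync skips the journal sync, a single
"sa-learn --sync" at the end finishes the job.

```shell
#!/bin/sh
# Sketch: learn only recently changed files, coping with odd file names.
# Assumptions: find/xargs support -print0 / -0 and xargs supports -r;
# learn_recent is a made-up helper name.

SA_LEARN=${SA_LEARN:-sa-learn}

# $1 = directory, $2 = age limit in days, $3 = --spam or --ham
learn_recent() {
    find "$1" -type f -mtime "-$2" -print0 |
        xargs -0 -r "$SA_LEARN" --max-size=0 --no-sync "$3"
}

# Example (not run here):
#   learn_recent "$SA_MILTER_HOME/training/spam" "$TRAIN_DAYS" --spam
#   learn_recent "$SA_MILTER_HOME/training/ham"  "$TRAIN_DAYS" --ham
#   sa-learn --sync
```

Only files modified within the last N days are handed to sa-learn, so the
rest of a large maildir is never opened.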



On 11.02.2018 19:44, Matus UHLAR - fantomas wrote:

On 11.02.18 19:09, Hendrik Haddorp wrote:
I have a maildir with about 2 mails. In the past this does not 
seem to have been a problem. But since a few weeks my sa-learn 
process dies with an OOM now.


do you run sa-learn over whole maildir all the time?
why?

My server has only 1GB of memory with another GB for swap. sa-learn 
is eating up pretty much the complete memory for the run and is 
only able to finish when I stop everything else. Why is sa-learn 
using more and more memory even when it learned all those messages 
already in the past? Is there a way to limit the memory usage 
except from making the set of messages smaller?


you are not supposed to repeatedly call sa-learn over huge maildir.

calling over new mail (or, better, false-positives and false-negatives) is
faster and won't eat all your memory






Re: sa-learn

2018-02-11 Thread Hendrik Haddorp
so far I was always letting it run once a week over my inbox in --ham 
mode and over my spam folder in --spam mode. all tutorials I saw did it 
the same way. this also worked for years but likely with less mail 
files. I was under the impression that sa-learn would skip messages that 
it already learned. the debug log also indicated that it recognized those.


On 11.02.2018 19:44, Matus UHLAR - fantomas wrote:

On 11.02.18 19:09, Hendrik Haddorp wrote:
I have a maildir with about 2 mails. In the past this does not 
seem to have been a problem. But since a few weeks my sa-learn 
process dies with an OOM now.


do you run sa-learn over whole maildir all the time?
why?

My server has only 1GB of memory with another GB for swap. sa-learn 
is eating up pretty much the complete memory for the run and is only 
able to finish when I stop everything else. Why is sa-learn using 
more and more memory even when it learned all those messages already 
in the past? Is there a way to limit the memory usage except from 
making the set of messages smaller?


you are not supposed to repeatedly call sa-learn over huge maildir.

calling over new mail (or, better, false-positives and false-negatives) is
faster and won't eat all your memory.





Re: sa-learn

2018-02-11 Thread Matus UHLAR - fantomas

On 11.02.18 19:09, Hendrik Haddorp wrote:
I have a maildir with about 2 mails. In the past this does not 
seem to have been a problem. But since a few weeks my sa-learn 
process dies with an OOM now.


do you run sa-learn over whole maildir all the time?
why?

My server has only 1GB of memory with 
another GB for swap. sa-learn is eating up pretty much the complete 
memory for the run and is only able to finish when I stop everything 
else. Why is sa-learn using more and more memory even when it learned 
all those messages already in the past? Is there a way to limit the 
memory usage except from making the set of messages smaller?


you are not supposed to repeatedly call sa-learn over huge maildir.

calling over new mail (or, better, false-positives and false-negatives) is
faster and won't eat all your memory.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
We are but packets in the Internet of life (userfriendly.org)


Re: sa-learn

2018-02-11 Thread Hendrik Haddorp

it's a small hosted VM running fine for years.

On 11.02.2018 19:35, Reindl Harald wrote:



On 11.02.2018 at 19:09, Hendrik Haddorp wrote:
I have a maildir with about 2 mails. In the past this does not 
seem to have been a problem. But since a few weeks my sa-learn 
process dies with an OOM now. My server has only 1GB of memory with 
another GB for swap. sa-learn is eating up pretty much the complete 
memory for the run and is only able to finish when I stop everything 
else. Why is sa-learn using more and more memory even when it learned 
all those messages already in the past? Is there a way to limit the 
memory usage except from making the set of messages smaller?


from where did you get a machine with 1 GB in the last decade?
below 1.5 GB i don't even deploy a golden-master VM

My problem sounds somewhat like 
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=5141
probably - but my corpus is 150 mails large, two bayes
(sa-builtin and bogofilter) with 425 MB living in tmpfs and so in
memory, rsynced at boot/shutdown to a persistent location


clamav needs some hundred MB
dns-cache needs some memory

sorry, but 1 GB is not suitable in 2018




Re: sa-learn won't read db created via MSTOR

2017-07-10 Thread RW
On Sat, 8 Jul 2017 21:55:36 +0100
RW wrote:

> On Sat, 8 Jul 2017 14:14:42 -0500
> Jerry Malcolm wrote:

> As a proof of concept try a small mbox file with
> 
> mbox_format_from_regex /^From\s/

and if it works try this instead:


/^From \S+  ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d \d{4})/



Re: sa-learn won't read db created via MSTOR

2017-07-08 Thread Antony Stone
On Saturday 08 July 2017 at 22:55:36, RW wrote:

> I had a spillage and most of the punctuation characters
> on my keyboard aren't working at the moment.

Oh dear, my sympathies - but what a splendid quote on a mailing list :)


Antony.

-- 
Salad is what food eats.

   Please reply to the list;
 please *don't* CC me.


Re: sa-learn won't read db created via MSTOR

2017-07-08 Thread RW
On Sat, 8 Jul 2017 14:14:42 -0500
Jerry Malcolm wrote:

> Thanks for the info.  Unfortunately, I don't have a clue how to 
> interpret a regex expression.  I couldn't find any reference to 
> mbox_format_from_regex in the 3.1.x Mail::SpamAssassin::Conf that
> came up when I googled it.

I hope you aren't actually running 3.1.x because that's ten years old.
 
> The separators in my mbox file are:
> 
>  From - Sat Jul 8 01:02:28 2017

That looks to be the problem.

As a proof of concept try a small mbox file with

mbox_format_from_regex /^From\s/

This is actually all you need if the mbox files are properly formatted
and lines that start "From " are escaped. I would give you a fuller
replacement but I had a spillage and most of the punctuation characters
on my keyboard aren't working at the moment.


Re: sa-learn won't read db created via MSTOR

2017-07-08 Thread Jerry Malcolm
Upon further investigation, I don't think sa-learn is even attempting to 
open the file.   I get the exact same message whether I give it a real 
file or just a string of characters for a file name:


[C:\Program Files\JAM Software\SpamAssassin in a Box]sa-learn.exe --spam 
--mbox c:\IMAPUtil\temp\uncaughtSpam.mstor\temp

Learned tokens from 0 message(s) (0 message(s) examined)

[C:\Program Files\JAM Software\SpamAssassin in a Box]sa-learn.exe --spam 
--mbox 

Learned tokens from 0 message(s) (0 message(s) examined)

This can't be right.  How can I tell if it's really reading the file?


On 7/8/2017 2:14 PM, Jerry Malcolm wrote:
Thanks for the info.  Unfortunately, I don't have a clue how to 
interpret a regex expression.  I couldn't find any reference to 
mbox_format_from_regex in the 3.1.x Mail::SpamAssassin::Conf that came 
up when I googled it.


The separators in my mbox file are:

From - Sat Jul 8 01:02:28 2017

Can someone who speaks regex tell me if this syntax is my problem, and 
if so, point me to where I can find the correct regex that matches 
this that I can copy/paste?


Thanks.

Jerry


On 7/8/2017 8:45 AM, RW wrote:

On Sat, 8 Jul 2017 01:57:47 -0500
Jerry Malcolm wrote:


Below is a complete log dump from the -D option on sa-learn.

...


_set_default_message_selection_opts After: Scanprob[1], want_date[0],
cache[0], from_regex[^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d
\d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)]

Check that this default regex matches your mbox separator, you may need
to set mbox_format_from_regex. See the Mail::SpamAssassin::Conf
documentation








Re: sa-learn won't read db created via MSTOR

2017-07-08 Thread Jerry Malcolm
Thanks for the info.  Unfortunately, I don't have a clue how to 
interpret a regex expression.  I couldn't find any reference to 
mbox_format_from_regex in the 3.1.x Mail::SpamAssassin::Conf that came 
up when I googled it.


The separators in my mbox file are:

From - Sat Jul 8 01:02:28 2017

Can someone who speaks regex tell me if this syntax is my problem, and 
if so, point me to where I can find the correct regex that matches this 
that I can copy/paste?


Thanks.

Jerry


On 7/8/2017 8:45 AM, RW wrote:

On Sat, 8 Jul 2017 01:57:47 -0500
Jerry Malcolm wrote:


Below is a complete log dump from the -D option on sa-learn.

...


_set_default_message_selection_opts After: Scanprob[1], want_date[0],
cache[0], from_regex[^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d
\d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)]

Check that this default regex matches your mbox separator, you may need
to set mbox_format_from_regex. See the Mail::SpamAssassin::Conf
documentation




Re: sa-learn won't read db created via MSTOR

2017-07-08 Thread RW
On Sat, 8 Jul 2017 01:57:47 -0500
Jerry Malcolm wrote:

> Below is a complete log dump from the -D option on sa-learn. 
...

> _set_default_message_selection_opts After: Scanprob[1], want_date[0], 
> cache[0], from_regex[^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d 
> \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)]

Check that this default regex matches your mbox separator, you may need
to set mbox_format_from_regex. See the Mail::SpamAssassin::Conf
documentation


Re: sa-learn won't read db created via MSTOR

2017-07-08 Thread Jerry Malcolm
Below is a complete log dump from the -D option on sa-learn.  I am 
really curious that the file name I passed in is never even mentioned in 
the log. Is that expected? Do I have some sort of syntax error passing 
the mbox filename in?  Here's the command:


 [C:\Program Files\JAM Software\SpamAssassin in a Box]sa-learn -D 
--spam --showdots --mbox c:\imaputil\temp\uncaughtspam.mstor\temp


Thx,

Jerry

Jul  8 01:47:42.704 [12972] dbg: logger: adding facilities: all
Jul  8 01:47:42.704 [12972] dbg: logger: logging level is DBG
Jul  8 01:47:42.704 [12972] dbg: generic: SpamAssassin version 3.4.1
Jul  8 01:47:42.704 [12972] dbg: generic: Perl 5.022001, 
PREFIX=C:\Program Files\JAM Software\SpamAssassin in a Box\runtime, 
DEF_RULES_DIR=C:\ProgramData\JAM Software\spamdService\sa-rules, 
LOCAL_RULES_DIR=C:\ProgramData\JAM Software\spamdService\sa-config, 
LOCAL_STATE_DIR=..\share

Jul  8 01:47:42.705 [12972] dbg: config: timing enabled
Jul  8 01:47:42.706 [12972] dbg: config: score set 0 chosen.
Jul  8 01:47:42.712 [12972] dbg: util: running in taint mode? no
Jul  8 01:47:42.712 [12972] dbg: util: defining getpwuid() wrapper using 
'unknown' as username
Jul  8 01:47:42.715 [12972] dbg: config: using "C:\ProgramData\JAM 
Software\spamdService\sa-config" for site rules pre files
Jul  8 01:47:42.715 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/init.pre
Jul  8 01:47:42.716 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/v310.pre
Jul  8 01:47:42.716 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/v312.pre
Jul  8 01:47:42.716 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/v320.pre
Jul  8 01:47:42.716 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/v330.pre
Jul  8 01:47:42.716 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/v340.pre
Jul  8 01:47:42.716 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/v341.pre
Jul  8 01:47:42.717 [12972] dbg: config: using "C:\ProgramData\JAM 
Software\spamdService\sa-rules" for sys rules pre files
Jul  8 01:47:42.717 [12972] dbg: config: using "C:\ProgramData\JAM 
Software\spamdService\sa-rules" for default rules dir
Jul  8 01:47:42.717 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-rules/sa_zmi_at.cf
Jul  8 01:47:42.718 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-rules/sought_rules_yerp_org.cf
Jul  8 01:47:42.718 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-rules/spamassassin_heinlein-support_de.cf
Jul  8 01:47:42.718 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-rules/updates_spamassassin_org.cf
Jul  8 01:47:42.718 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-rules/xsaupdate_jam-software_com.cf
Jul  8 01:47:42.718 [12972] dbg: config: using "C:\ProgramData\JAM 
Software\spamdService\sa-config" for site rules dir
Jul  8 01:47:42.719 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/20_khop_bl.cf
Jul  8 01:47:42.719 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/contact.cf
Jul  8 01:47:42.720 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/jam.cf
Jul  8 01:47:42.720 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/jam_DNSBL.cf
Jul  8 01:47:42.720 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/jam_example_rules.cf
Jul  8 01:47:42.720 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/jam_virus_bounce_rules.cf
Jul  8 01:47:42.720 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/local.cf
Jul  8 01:47:42.721 [12972] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::URIDNSBL from @INC
Jul  8 01:47:42.727 [12972] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::Hashcash from @INC
Jul  8 01:47:42.733 [12972] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::SPF from @INC
Jul  8 01:47:42.738 [12972] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::Pyzor from @INC

Jul  8 01:47:42.740 [12972] dbg: pyzor: network tests on, attempting Pyzor
Jul  8 01:47:42.740 [12972] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::Razor2 from @INC

Jul  8 01:47:42.806 [12972] dbg: razor2: razor2 is available, version 2.84
Jul  8 01:47:42.806 [12972] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::SpamCop from @INC
Jul  8 01:47:45.307 [12972] dbg: reporter: network tests on, attempting 
SpamCop
Jul  8 01:47:45.307 [12972] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::AutoLearnThreshold from @INC
Jul  8 01:47:45.309 [12972] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::TextCat from @INC
Jul  8 01:47:45.313 [12972] dbg: textcat: loading languages file 

Re: sa-learn seems to ignore auto_whitelist_path directive for global txrep database

2017-01-20 Thread Michael Meier

>> I just switched from AWL to txrep. It seems to be working properly
>> from amavis, the only problem I've got, is that sa-learn seems to
>> ignore the auto_whitelist_path directive in local.cf .
>> It doesn't matter to what I set it, sa-learn always updates
>> /root/.spamassassin/tx-reputation
>
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7383

Thanks a lot for the quick answer :-). I didn't find that one via
Google search.

>> (I'm running the command as root).
>
> that doesn't sound good.

only for testing purposes...


Re: sa-learn seems to ignore auto_whitelist_path directive for global txrep database

2017-01-19 Thread RW
On Thu, 19 Jan 2017 21:43:54 +0100
Michael Meier wrote:

> Hi
> 
> I just switched from AWL to txrep. It seems to be working properly
> from amavis, the only problem I've got, is that sa-learn seems to
> ignore the auto_whitelist_path directive in local.cf .
> It doesn't matter to what I set it, sa-learn always updates 
> /root/.spamassassin/tx-reputation 


https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7383


> (I'm running the command as root).

that doesn't sound good.


Re: sa-learn --force-expire bails with "netset: illegal IP address given (patricia trie): '=/128'"

2016-07-26 Thread Benny Pedersen

On 2016-07-26 16:37, Ralf Hildebrandt wrote:


> I grepped for "=/128" in /etc and /var/lib/spamassassin -- nothing.
> What is amiss here?


http://ipv6bingo.com/

me hiddes, check where =/128 is located

provide spamassassin 2>&1 -D --lint

so we know what perl modules you have


Re: sa-learn --force-expire bails with "netset: illegal IP address given (patricia trie): '=/128'"

2016-07-26 Thread Benny Pedersen

On 2016-07-26 16:42, Ralf Hildebrandt wrote:


> Argh. I set
> trusted_networks = something
> instead of
> trusted_networks something


would a spamassassin --lint show that ? :=)


Re: sa-learn --force-expire bails with "netset: illegal IP address given (patricia trie): '=/128'"

2016-07-26 Thread Reindl Harald



On 26.07.2016 at 17:27, Benny Pedersen wrote:

> On 2016-07-26 16:37, Ralf Hildebrandt wrote:
>
>> I grepped for "=/128" in /etc and /var/lib/spamassassin -- nothing.
>> What is amiss here?
>
> http://ipv6bingo.com/

do you post that crap everytime?
then place it in your signature

> me hiddes, check where =/128 is located
>
> provide spamassassin 2>&1 -D --lint
>
> so we know what perl modules you have

just read the next message before press reply in a reflex





Re: sa-learn --force-expire bails with "netset: illegal IP address given (patricia trie): '=/128'"

2016-07-26 Thread Ralf Hildebrandt
* Ralf Hildebrandt :

> I grepped for "=/128" in /etc and /var/lib/spamassassin -- nothing.
> What is amiss here?

Argh. I set
trusted_networks = something
instead of
trusted_networks something
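
SpamAssassin config lines are "option value" with no equals sign, so the
"=" was taken as the first address in the list; presumably the "/128" was
appended internally, which would explain why grepping for "=/128" found
nothing. A rough way to hunt for this kind of slip is sketched below. It
assumes that a bare "=" as the second word of a line is never intended in
a .cf file (true as far as I know, but treat it as a heuristic); the
function name and the example network are made up.

```shell
#!/bin/sh
# Sketch: flag config lines written as "option = value".
# Assumption: a lone "=" as the second word is never valid in SA .cf files.

# $@ = one or more .cf files; prints file:line:content for suspicious lines
find_equals_lines() {
    grep -nHE '^[[:space:]]*[A-Za-z_]+[[:space:]]+=([[:space:]]|$)' "$@"
}

# Example (not run here):
#   find_equals_lines /etc/spamassassin/*.cf
# A line such as "trusted_networks = 192.0.2.0/24" would be reported,
# while "trusted_networks 192.0.2.0/24" would not.
```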

-- 
Ralf Hildebrandt   Charite Universitätsmedizin Berlin
ralf.hildebra...@charite.deCampus Benjamin Franklin
http://www.charite.de  Hindenburgdamm 30, 12203 Berlin
Geschäftsbereich IT, Abt. Netzwerk fon: +49-30-450.570.155


Re: sa-learn from mails which passed SA

2015-12-17 Thread Reindl Harald



On 17.12.2015 at 11:41, Matthias Apitz wrote:

> I'm sorting all mails out into 2 mbox: HAM and SPAM and from time to
> time I run sa-learn with them. Short question: the mails in both are not
> the original mail, but the result of what SA added to them. Isn't it
> wrong to pass this so modified to sa-learn? It will see also the tokens
> of the lines of SA

the SA headers are ignored






Re: sa-learn

2015-04-13 Thread Kevin A. McGrail

On 4/13/2015 6:24 PM, Roman Gelfand wrote:
> I have deployed spamassassin with postfix on mail gateway machine.
> My dovecot mailbox server is on another machine.  The mail on mailbox
> server is stored on maildir.
>
> If I understand this correctly, If I want to run sa-learn, I need to
> nfs mount mailbox server to the mail gateway.
>
> Is this an optimal way to accomplish this?  Are there other methods?
>
> Thanks in advance

Well, I'm thinking that off-the-cuff, sa-learn just needs access to a cf
file, the mail and the backend store so you could just run it on the
mailbox server...


Regards,
KAM




Re: sa-learn

2015-04-13 Thread John Hardin

On Mon, 13 Apr 2015, Roman Gelfand wrote:


> I have deployed spamassassin with postfix on mail gateway machine.   My
> dovecot mailbox server is on another machine.  The mail on mailbox server
> is stored on maildir.
>
> If I understand this correctly, If I want to run sa-learn, I need to nfs
> mount mailbox server to the mail gateway.
>
> Is this an optimal way to accomplish this?  Are there other methods?


Probably somewhat suboptimal.

You can run sa-learn locally on the mailbox server to populate local Bayes 
database files, and then copy those files over to the SA server, 
momentarily down SA, rename them on top of the existing ones, and restart 
SA.
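
A different route to the same end, not the file rename described above,
is to move the data as a text dump with sa-learn's --backup and --restore
options. The sketch below assumes both machines use the same Bayes
settings; the helper names and paths are made up. Note that --restore
replaces whatever Bayes data the target already has.

```shell
#!/bin/sh
# Sketch: move Bayes data between hosts as a dump instead of raw DB files.
# Assumptions: same Bayes configuration on both sides; helper names are
# hypothetical; how the dump file is copied across is left to you.

SA_LEARN=${SA_LEARN:-sa-learn}

# On the mailbox server: learn locally, then dump the database to a file.
# $1 = spam dir, $2 = ham dir, $3 = dump file to write
learn_and_dump() {
    "$SA_LEARN" --spam "$1" &&
    "$SA_LEARN" --ham "$2" &&
    "$SA_LEARN" --backup > "$3"
}

# On the SA server: load the dump. This REPLACES the existing Bayes data.
# $1 = dump file
load_dump() {
    "$SA_LEARN" --restore "$1"
}

# Example (not run here):
#   learn_and_dump /path/to/spam /path/to/ham /tmp/bayes.dump
#   (copy /tmp/bayes.dump to the SA server, then there:)
#   load_dump /tmp/bayes.dump
```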


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 Today: Thomas Jefferson's 272nd Birthday


Re: sa-learn strip last Received: header for own MDA

2014-09-19 Thread John Hardin

On Fri, 19 Sep 2014, Marcus Schopen wrote:


> still playing with sa-learn. If I feed sa-learn do I have to strip the
> last Received: header which is the Received header for my own MDA
> (imap-backend) before piping the message into sa-learn?


No, that shouldn't matter. The common bits will be learned as neutral 
because they appear in both ham and spam.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  ...for a nation to tax itself into prosperity is like a man
  standing in a bucket and trying to lift himself up by the handle.
 -- Winston Churchill
---
 Today: Talk Like a Pirate day


Re: sa-learn strip last Received: header for own MDA

2014-09-19 Thread LuKreme
On 19 Sep 2014, at 09:06 , Marcus Schopen li...@localguru.de wrote:
> still playing with sa-learn. If I feed sa-learn do I have to strip the
> last Received: header which is the Received header for my own MDA
> (imap-backend) before piping the message into sa-learn?

All you need to do is make sure that you feed both ham and spam into sa-learn. 
A mistake many people make is to train only spam.

-- 
The Salvation Army Band played and the children drunk lemonade and the
morning lasted all day, all day. And through an open window came like
Sinatra in a younger day pushing the town away



Re: sa-learn from a remote imap folder

2014-09-15 Thread Giles Coochey

On 12/09/2014 18:34, Rick Macdougall wrote:
> On 2014-09-12 1:24 PM, John Hardin wrote:
>> On Fri, 12 Sep 2014, Reindl Harald wrote:
>>> On 12.09.2014 at 15:26, Giles Coochey wrote:
>>>> On 12/09/2014 13:47, Rick Macdougall wrote:
>>>>> I have used imap-sa-learn.pl for years.  Works great.
>>>>>
>>>>> Google imap-sa-learn.pl to get the perl source code.
>>>>
>>>> Wouldn't mind using it, but don't think I can get it working as my
>>>> IMAP server requires SSL
>>>
>>> have you tried it?
>>>
>>> these days almost anything works with SSL
>>> because common used libraries
>>
>> Also: stunnel.
>
> Or you know, set ssl = 1 and port = 993 where the program opens a
> new connection.
>
> It's very straight forward.

Straightforward once the documentation is located and read...

So now it appears to connect, but doesn't appear to read my Junk
E-mail folder correctly (perhaps due to handling of white space, or
hyphen in the folder name?)...


--
Regards,

Giles Coochey, CCNP, CCNA, CCNAS
NetSecSpec Ltd
+44 (0) 8444 780677
+44 (0) 7584 634135
http://www.coochey.net
http://www.netsecspec.co.uk
gi...@coochey.net






Re: sa-learn from a remote imap folder

2014-09-15 Thread RW
On Sat, 13 Sep 2014 13:54:13 -0500
Dave Pooser wrote:

> On 9/13/14 1:38 PM, Bob Proulx b...@proulx.com wrote:
>
>> If you are using Maildir format then there are no locks.  That makes
>> Maildir format the format of choice for use over NFS.
>
> Excellent. I am in fact using Maildir, so I guess it's not just luck
> that I've had no locking issues.

Delivery into Maildir doesn't require a lock, but a well designed
IMAP server will still lock Maildir to avoid flag corruption. This is
unlikely to make a difference to sa-learn, unless you are trying to
do something exotic.


Re: sa-learn from a remote imap folder

2014-09-13 Thread Jari Fredriksson
On 13.09.2014 at 08:57, Dave Pooser wrote:
>> Dave> At $DAYJOB we export the spam folder (and a ham folder for FPs)
>> Dave> via NFS and mount them on the frontline SA servers for sa-learn.
>>
>> Doesn't that smell of locking issues?
>
> To be honest, I'd just assumed that NFS wouldn't do any locking on a
> read-only export. I haven't seen any issues yet, but we're talking small
> volume, and the frontline servers mount the directories periodically, run
> the sa-learn, and then unmount them again.
>
> ...Great, now I'm paranoid. Thanks, Ian. ;-) I guess I can go back and
> force the exports to be NFS3 instead of NFS4 so I can explicitly state the
> nolocks option

I have my Maildir on a NAS via NFS, but that is only a one user system
anyway.

-- 
jarif.bit






Re: sa-learn from a remote imap folder

2014-09-13 Thread Bob Proulx
Dave Pooser wrote:
>> Dave> At $DAYJOB we export the spam folder (and a ham folder for FPs)
>> Dave> via NFS and mount them on the frontline SA servers for sa-learn.
>>
>> Doesn't that smell of locking issues?
>
> To be honest, I'd just assumed that NFS wouldn't do any locking on a
> read-only export.

If you are using Maildir format then there are no locks.  That makes
Maildir format the format of choice for use over NFS.  Here is a
reference.

  http://www.postfix.org/NFS_README.html
  The maildir format uses one file per message and needs no file
  locking support in Postfix or in other mail software.

> ...Great, now I'm paranoid. Thanks, Ian. ;-) I guess I can go back and
> force the exports to be NFS3 instead of NFS4 so I can explicitly state the
> nolocks option

Paranoia is good.  But using Maildir format would also be okay too.

Bob


Re: sa-learn from a remote imap folder

2014-09-13 Thread Dave Pooser
On 9/13/14 1:38 PM, Bob Proulx b...@proulx.com wrote:

If you are using Maildir format then there are no locks.  That makes
Maildir format the format of choice for use over NFS.

Excellent. I am in fact using Maildir, so I guess it's not just luck that
I've had no locking issues.

Thanks, Ian, for the warning and thanks, Bob, for setting my mind at ease.
;-) 
-- 
Dave Pooser
Cat-Herder-in-Chief, Pooserville.com




Re: sa-learn from a remote imap folder

2014-09-12 Thread Axb

On 09/12/2014 10:05 AM, Marcus Schopen wrote:

Hi,

spamassassin and imap (cyrus) are running on different boxes. What is
best practice to learn spam from a remote imap folder? Any good working
scripts?

I found these:

https://wiki.apache.org/spamassassin/RemoteImapFolder

https://gist.github.com/colinmollenhour/4127743


Imapsync or OfflineIMAP

http://imapsync.lamiral.info/
http://offlineimap.org/

there are tons more - Google sync imap



Re: sa-learn from a remote imap folder

2014-09-12 Thread Marcus Schopen
Hi,

On Friday, 12.09.2014, at 10:13 +0200, Axb wrote:
 On 09/12/2014 10:05 AM, Marcus Schopen wrote:
  Hi,
 
  spamassassin and imap (cyrus) are running on different boxes. What is
  best practice to learn spam from a remote imap folder? Any good working
  scripts?
 
  I found these:
 
  https://wiki.apache.org/spamassassin/RemoteImapFolder
 
  https://gist.github.com/colinmollenhour/4127743
 
 Imapsync or OfflineIMAP
 
 http://imapsync.lamiral.info/
 http://offlineimap.org/


There is no IMAP server running on the spamassassin box, so an
IMAP-to-IMAP sync is not possible, and I don't think imapsync can handle
this. offlineimap looks interesting, because it seems to dump the mails
to a local folder, which could then be eaten by sa-learn.

 there are tons more - Google sync imap

Before I start scripting this myself, I am looking for a smart and stable
script that people have used for a long time without problems.

Ciao
Marcus




RE: sa-learn from a remote imap folder

2014-09-12 Thread David Jones


From: Marcus Schopen li...@localguru.de
Sent: Friday, September 12, 2014 3:33 AM
To: Axb
Cc: users@spamassassin.apache.org
Subject: Re: sa-learn from a remote imap folder

Hi,

On Friday, 12.09.2014, at 10:13 +0200, Axb wrote:
 On 09/12/2014 10:05 AM, Marcus Schopen wrote:
  Hi,
 
  spamassassin and imap (cyrus) are running on different boxes. What is
  best practice to learn spam from a remote imap folder? Any good working
  scripts?
 
  I found these:
 
  https://wiki.apache.org/spamassassin/RemoteImapFolder
 
  https://gist.github.com/colinmollenhour/4127743

 Imapsync or OfflineIMAP

 http://imapsync.lamiral.info/
 http://offlineimap.org/


There is no IMAP server running on the spamassassin box, so an
IMAP-to-IMAP sync is not possible, and I don't think imapsync can handle
this. offlineimap looks interesting, because it seems to dump the mails
to a local folder, which could then be eaten by sa-learn.

 there are tons more - Google sync imap

Before I start scripting this myself, I am looking for a smart and stable
script that people have used for a long time without problems.

Check out fetchmail.  Combined with procmail, it can do just about anything
you need to do with email.  In this case you may only need fetchmail to
gather up the email for sa-learn.

Ciao
Marcus




Re: sa-learn from a remote imap folder

2014-09-12 Thread Dave Pooser
On 9/12/14 3:05 AM, Marcus Schopen li...@localguru.de wrote:

spamassassin and imap (cyrus) are running on different boxes. What is
best practice to learn spam from a remote imap folder?

At $DAYJOB we export the spam folder (and a ham folder for FPs) via NFS
and mount them on the frontline SA servers for sa-learn.
-- 
Dave Pooser
Cat-Herder-in-Chief, Pooserville.com
...Life is not a journey to the grave with the intention of arriving
safely in one pretty and well-preserved piece, but to slide across the
finish line broadside, thoroughly used up, worn out, leaking oil, and
shouting GERONIMO!!! -- Bill McKenna






Re: sa-learn from a remote imap folder

2014-09-12 Thread Rick Macdougall

On Friday, 12.09.2014, at 10:13 +0200, Axb wrote:

On 09/12/2014 10:05 AM, Marcus Schopen wrote:

Hi,

spamassassin and imap (cyrus) are running on different boxes. What is
best practice to learn spam from a remote imap folder? Any good working
scripts?



Hi,

I have used imap-sa-learn.pl for years.  Works great.

Google imap-sa-learn.pl to get the perl source code.

Regards,

Rick



Re: sa-learn from a remote imap folder

2014-09-12 Thread Giles Coochey

On 12/09/2014 13:47, Rick Macdougall wrote:


Hi,

I have used imap-sa-learn.pl for years.  Works great.

Google imap-sa-learn.pl to get the perl source code.

Wouldn't mind using it, but don't think I can get it working as my IMAP 
server requires SSL


--
Regards,

Giles Coochey, CCNP, CCNA, CCNAS
NetSecSpec Ltd
+44 (0) 8444 780677
+44 (0) 7584 634135
http://www.coochey.net
http://www.netsecspec.co.uk
gi...@coochey.net







Re: sa-learn from a remote imap folder

2014-09-12 Thread Reindl Harald

On 12.09.2014 at 15:26, Giles Coochey wrote:
 On 12/09/2014 13:47, Rick Macdougall wrote:

 I have used imap-sa-learn.pl for years.  Works great.

 Google imap-sa-learn.pl to get the perl source code.

 Wouldn't mind using it, but don't think I can get it working as my IMAP 
 server requires SSL

have you tried it?

these days almost anything works with SSL
because of commonly used libraries





Re: sa-learn from a remote imap folder

2014-09-12 Thread Kris Deugau
Marcus Schopen wrote:
 Hi,
 
 spamassassin and imap (cyrus) are running on different boxes. What is
 best practice to learn spam from a remote imap folder? Any good working
 scripts?

I've been using http://www.deepnet.cx/~kdeugau/spamtools/imap-learn
(derived from the script at http://www.dmzs.com/tools/files/spam.phtml)
in production here for quite a while.  (The timestamp on that particular
copy is from 2012;  pretty sure it hasn't changed for quite some time
before that.)

There are probably things it could do better or more efficiently, but it
works well enough I can't justify fiddling with it.

-kgd


Re: sa-learn from a remote imap folder

2014-09-12 Thread Giles Coochey

On 12/09/2014 14:30, Reindl Harald wrote:



Wouldn't mind using it, but don't think I can get it working as my IMAP server 
requires SSL

have you tried it?

these days almost anything works with SSL
because of commonly used libraries


It times out, as it tries to connect on port 143, server runs on port 993.

--
Regards,

Giles Coochey, CCNP, CCNA, CCNAS
NetSecSpec Ltd
+44 (0) 8444 780677
+44 (0) 7584 634135
http://www.netsecspec.co.uk
giles.cooc...@netsecspec.co.uk



--
Regards,

Giles Coochey, CCNP, CCNA, CCNAS
NetSecSpec Ltd
+44 (0) 8444 780677
+44 (0) 7584 634135
http://www.coochey.net
http://www.netsecspec.co.uk
gi...@coochey.net





Re: sa-learn from a remote imap folder

2014-09-12 Thread Reindl Harald

On 12.09.2014 at 17:48, Giles Coochey wrote:
 On 12/09/2014 14:30, Reindl Harald wrote:

 Wouldn't mind using it, but don't think I can get it working as my IMAP 
 server requires SSL
 have you tried it?

 these days almost anything works with SSL
 because of commonly used libraries

 It times out, as it tries to connect on port 143, server runs on port 993

uhm Port 143 should offer and enforce STARTTLS independent
of the script, just because some clients don't work with
993 and others not with STARTTLS






Re: sa-learn from a remote imap folder

2014-09-12 Thread Giles Coochey

On 12/09/2014 17:01, Reindl Harald wrote:

On 12.09.2014 at 17:48, Giles Coochey wrote:

On 12/09/2014 14:30, Reindl Harald wrote:

Wouldn't mind using it, but don't think I can get it working as my IMAP server 
requires SSL

have you tried it?

these days almost anything works with SSL
because of commonly used libraries


It times out, as it tries to connect on port 143, server runs on port 993

uhm Port 143 should offer and enforce STARTTLS independent
of the script, just because some clients don't work with
993 and others not with STARTTLS


Yes, it's not what the server offers, it's about what is open on the 
firewall - which separates the smtp server (external DMZ), from the 
mailbox server (internal DMZ). The point is that the server does not 
support the certificate based SSL that is run by policy, STARTTLS is 
different altogether.


--
Regards,

Giles Coochey, CCNP, CCNA, CCNAS
NetSecSpec Ltd
+44 (0) 8444 780677
+44 (0) 7584 634135
http://www.coochey.net
http://www.netsecspec.co.uk
gi...@coochey.net






Re: sa-learn from a remote imap folder

2014-09-12 Thread John Hardin

On Fri, 12 Sep 2014, Reindl Harald wrote:



On 12.09.2014 at 15:26, Giles Coochey wrote:

On 12/09/2014 13:47, Rick Macdougall wrote:


I have used imap-sa-learn.pl for years.  Works great.

Google imap-sa-learn.pl to get the perl source code.


Wouldn't mind using it, but don't think I can get it working as my IMAP server 
requires SSL


have you tried it?

these days almost anything works with SSL
because of commonly used libraries


Also: stunnel.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Maxim II: A Sergeant in motion outranks a Lieutenant who doesn't
  know what's going on.
  Maxim III: An ordnance technician at a dead run outranks everybody.
---
 5 days until the 227th anniversary of the signing of the U.S. Constitution


Re: sa-learn from a remote imap folder

2014-09-12 Thread Rick Macdougall

On 2014-09-12 1:24 PM, John Hardin wrote:

On Fri, 12 Sep 2014, Reindl Harald wrote:



On 12.09.2014 at 15:26, Giles Coochey wrote:

On 12/09/2014 13:47, Rick Macdougall wrote:


I have used imap-sa-learn.pl for years.  Works great.

Google imap-sa-learn.pl to get the perl source code.


Wouldn't mind using it, but don't think I can get it working as my
IMAP server requires SSL


have you tried it?

these days almost anything works with SSL
because of commonly used libraries


Also: stunnel.



Or you know, set ssl = 1 and port = 993 where the program opens a new 
connection.


It's very straightforward.

Regards,

Rick



Re: sa-learn from a remote imap folder

2014-09-12 Thread Ian Zimmerman
On Fri, 12 Sep 2014 07:45:22 -0500,
Dave Pooser dave...@pooserville.com wrote:

Marcus spamassassin and imap (cyrus) are running on different
Marcus boxes. What is best practice to learn spam from a remote imap
Marcus folder?

Dave At $DAYJOB we export the spam folder (and a ham folder for FPs)
Dave via NFS and mount them on the frontline SA servers for sa-learn.

Doesn't that smell of locking issues?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Local Variables:
mode:claws-external
End:


Re: sa-learn from a remote imap folder

2014-09-12 Thread Dave Pooser
Dave At $DAYJOB we export the spam folder (and a ham folder for FPs)
Dave via NFS and mount them on the frontline SA servers for sa-learn.

Doesn't that smell of locking issues?

To be honest, I'd just assumed that NFS wouldn't do any locking on a
read-only export. I haven't seen any issues yet, but we're talking small
volume, and the frontline servers mount the directories periodically, run
the sa-learn, and then unmount them again.

...Great, now I'm paranoid. Thanks, Ian. ;-) I guess I can go back and
force the exports to be NFS3 instead of NFS4 so I can explicitly state the
nolocks option
-- 
Dave Pooser
Cat-Herder-in-Chief, Pooserville.com
...Life is not a journey to the grave with the intention of arriving
safely in one pretty and well-preserved piece, but to slide across the
finish line broadside, thoroughly used up, worn out, leaking oil, and
shouting GERONIMO!!! -- Bill McKenna






Re: sa-learn and find

2014-09-03 Thread Matus UHLAR - fantomas

On Sat, 30 Aug 2014 08:23:02 -0600
LuKreme wrote:


  if test -d $J_PATH; then
MYFIND=`find $J_PATH/ -type f -mtime -7|grep -v dovecot`


On 30.08.14 22:32, RW wrote:

mtime may not be the best choice. Ideally what you want is the time
since the spam was moved to Junk, rather than the time since it was
delivered.


ctime should provide this information - it's changed when a file is moved.
For example, courier-imap uses ctime information for deleting old mail from
trash and spam (and whatever you configure in the TRASH variable).

Note that something that manipulates file status can break this feature,
e.g.  a backup system that reads files and then resets atime will also
reset the ctime.  Setting it _not_ to reset atime (nobody uses atime
nowadays) should fix the problem.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Fighting for peace is like fucking for virginity...


Re: sa-learn and find

2014-09-03 Thread LuKreme

 On 03 Sep 2014, at 02:05 , Matus UHLAR - fantomas uh...@fantomas.sk wrote:
 
 On Sat, 30 Aug 2014 08:23:02 -0600
 LuKreme wrote:
 
  if test -d $J_PATH; then
MYFIND=`find $J_PATH/ -type f -mtime -7|grep -v dovecot`
 
 On 30.08.14 22:32, RW wrote:
 mtime may not be the best choice. Ideally what you want is the time
 since the spam was moved to Junk, rather than the time since it was
 delivered.
 
 ctime should provide this information - it's changed when a file is moved.
 For example, courier-imap uses ctime information for deleting old mail from
 trash and spam (and whatever you configure in the TRASH variable).

I agree that it should. However, I’ve had very poor luck with -ctime.

For example, I have a command I run to delete files in my ~/tmp that are more 
than 30 days old. If I use -ctime, none of the files are ever deleted, while if 
I use -mtime, everything works as expected.
 
 Note that something that manipulates file status can break this feature,
 e.g.  a backup system that reads files and resets atime back will cause
 resetting the ctime.  Setting it _not_ to reset atime (nobody uses atime
 nowadays) should fix the problem.

That may be what is happening then, since the system is backed up with 
rsnapshot.

-- 
Personal isn't the same as important



Re: sa-learn and find

2014-09-01 Thread LuKreme

On 31 Aug 2014, at 18:16 , Ian Zimmerman i...@buug.org wrote:

 find /home/${i}/Maildir/.notspam -type f -mtime -7 | xargs -r sa-learn --ham -u ${i}

Right. Doh. I got so held up in running find under sa-learn...

Well, that does make things a lot easier, doesn't it.

Thanks for your patience.

-- 
There will always be women in rubber flirting with me.



Re: sa-learn and find

2014-08-31 Thread Ian Zimmerman
On Sat, 30 Aug 2014 19:59:53 -0600,
LuKreme krem...@kreme.com wrote:

RW This may run into shell argument limits if you have to learn a lot
RW of spam. Consider piping the output of find to xargs, or using -exec
RW ...{} + in find.

LuKreme Yes, I tried to do that, but as I said in my first post, if I
LuKreme do the find as part of the sa-learn command, then it stalls when
LuKreme the find command returns null.

xargs (the GNU one at least) has an option to not run the inferior when
there are no args to give it.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Local Variables:
mode:claws-external
End:


Re: sa-learn and find

2014-08-31 Thread LuKreme

On 31 Aug 2014, at 14:46 , Ian Zimmerman i...@buug.org wrote:

 On Sat, 30 Aug 2014 19:59:53 -0600,
 LuKreme krem...@kreme.com wrote:
 
 RW This may run into shell argument limits if you have to learn a lot
 RW of spam. Consider piping the output of find to xargs, or using -exec
 RW ...{} + in find.
 
 LuKreme Yes, I tried to do that, but as I said in my first post, if I
 LuKreme do the find as part of the sa-learn command, then it stalls when
 LuKreme the find command returns null.
 
 xargs (the GNU one at least) has an option to not run the inferior when
 there are no args to give it.

The interior is the find:

This was my original command:

sa-learn --ham -u ${i} `find /home/${i}/Maildir/.notspam -type f -mtime -7`

Which stalls if find returns nothing. I am not seeing how xargs would help this.

(FreeBSD xargs never runs the command if the input is empty)

-- 
'I really should talk to him, sir. He's had a near-death experience!'
'We all do. It's called living.'



Re: sa-learn and find

2014-08-31 Thread Ian Zimmerman
On Sun, 31 Aug 2014 17:37:50 -0600,
LuKreme krem...@kreme.com wrote:

Ian xargs (the GNU one at least) has an option to not run the inferior
Ian when there are no args to give it.

LuKreme The interior is the find:

_Inferior_ which is GNU speak for subprocess.  I should have tried to
be less concise :-)

 sa-learn --ham -u ${i} `find /home/${i}/Maildir/.notspam -type f -mtime -7`

find /home/${i}/Maildir/.notspam -type f -mtime -7 | xargs -r sa-learn --ham -u ${i}

LuKreme (FreeBSD xargs never runs the command if the input is empty)

You may not need -r then.
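
A minimal sketch of that pipeline wrapped in a function, in case it helps. The function name and the optional third argument (a stand-in for sa-learn, useful for a dry run with echo) are made up for illustration. The -print0/-0 pair is an addition that is not in the command above; it assumes a find and xargs that support it (GNU and FreeBSD both do), so filenames containing spaces survive.

```sh
#!/bin/sh
# learn_ham_for_user USER FOLDER [LEARN_CMD]
# Feed files modified within the last 7 days to sa-learn as ham.
# LEARN_CMD is a hypothetical override (default: sa-learn) for dry runs.
learn_ham_for_user() {
    user=$1
    folder=$2
    learn_cmd=${3:-sa-learn}
    # -r: do not run the command at all when find prints nothing
    # (GNU xargs needs it; FreeBSD xargs accepts it for compatibility).
    find "$folder" -type f -mtime -7 -print0 |
        xargs -0 -r "$learn_cmd" --ham -u "$user"
}
```

Called as learn_ham_for_user "$i" "/home/$i/Maildir/.notspam", it simply does nothing when the folder has no recent mail, which is the case that made the backtick version stall.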

-- 
Please *no* private copies of mailing list or newsgroup messages.
Local Variables:
mode:claws-external
End:


Re: sa-learn and find

2014-08-30 Thread LuKreme
On 30 Aug 2014, at 07:49 , LuKreme krem...@kreme.com wrote:
 MYFIND= `find $H_PATH/cur -type f -mtime -7` 
 if [ -n $MYFIND ]; then
   /usr/local/bin/sa-learn --ham -u ${i} $MYFIND
 fi

Doh!

if [ -n "$MYFIND" ]; then

or

if test -n "$MYFIND"; then

Sigh. Feeling extra stupid this Saturday morning.

It works, and is no longer processing thousands of old messages for no reason.

#!/bin/sh
#
# Straightforward shell script to be run as root.  This parses the /home
# directory for mailboxes named .Junk and learns those as spam, and then
# parses the inbox (cur, not new) for ham.

# sa-learn-script (sal) v2.1  Lewis Butler, released to the Public Domain 2012

UROOT=/home/
echo "Running SAL"
for i in `ls $UROOT` ; do
  J_PATH="${UROOT}${i}/Maildir/.Junk"
  H_PATH="${UROOT}${i}/Maildir"

  if test -d "$J_PATH"; then
    # Only files touched in the last 7 days, skipping dovecot's own files
    MYFIND=`find "$J_PATH/" -type f -mtime -7 | grep -v dovecot`
    if test -n "$MYFIND"; then
      # $MYFIND is left unquoted on purpose so each path becomes one argument
      /usr/local/bin/sa-learn --spam -u "${i}" $MYFIND #>/dev/null 2>&1
    fi
  else
    echo "No $J_PATH for $i"
  fi

  if test -d "$H_PATH"; then
    MYFIND=`find "$H_PATH/cur" -type f -mtime -7 | grep -v dovecot`
    if test -n "$MYFIND"; then
      echo "Processing $H_PATH"
      /usr/local/bin/sa-learn --ham -u "${i}" $MYFIND #>/dev/null 2>&1
    fi
  #else
  #  echo "No $H_PATH for $i"
  fi
done

If I were feeling really clever, I’d make sure the user existed first, but I’m 
not feeling that clever today.
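
For what it's worth, the "make sure the user existed first" check could look roughly like this. It is only a sketch: user_exists and mail_users are made-up helper names, and it assumes every account that should be learned for is visible to id(1).

```sh
#!/bin/sh
# user_exists NAME: succeed if NAME is a known account.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# mail_users ROOT: print the names of directories under ROOT that
# belong to real accounts, one per line.
mail_users() {
    root=$1
    for d in "$root"/*/; do
        [ -d "$d" ] || continue
        name=${d%/}        # strip the trailing slash
        name=${name##*/}   # keep only the last path component
        if user_exists "$name"; then
            printf '%s\n' "$name"
        fi
    done
    return 0
}
```

The loop in the script could then iterate over the output of mail_users /home instead of the raw ls output, so stray directories never reach sa-learn -u.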

-- 
A marriage is always made up of two people who are prepared to swear
that only the other one snores.



Re: sa-learn and find

2014-08-30 Thread RW
On Sat, 30 Aug 2014 08:23:02 -0600
LuKreme wrote:

   if test -d $J_PATH; then
 MYFIND=`find $J_PATH/ -type f -mtime -7|grep -v dovecot`

mtime may not be the best choice. Ideally what you want is the time
since the spam was moved to Junk, rather than the time since it was
delivered. What I see with dovecot when I move mail with claws mail is
that  a new file is created with the mtime preserved at the
delivery time and the current epoch time in the filename. In that case
the ideal would be Btime if your OS supports it, or failing that
ctime. 

You could also use the time in the filename. Note that epoch times are
10 digits until long after we're dead so simple lexicographical
comparisons between maildir filenames or between a maildir filename and
an epoch time will work.
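
A rough sketch of the filename approach, assuming maildir names of the usual form where the part before the first dot is the epoch time. The function name is invented for illustration, and it compares numerically rather than lexicographically, which gives the same answer for 10-digit stamps.

```sh
#!/bin/sh
# maildir_newer_than DIR CUTOFF_EPOCH
# Print the files in DIR whose filename timestamp is >= CUTOFF_EPOCH.
maildir_newer_than() {
    dir=$1
    cutoff=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}       # basename
        stamp=${name%%.*}   # text before the first dot
        case $stamp in
            ''|*[!0-9]*) continue ;;   # not a maildir-style name, skip it
        esac
        if [ "$stamp" -ge "$cutoff" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}
```

A seven-day cutoff would be something like $(( $(date +%s) - 7 * 86400 )). Whether the stamp reflects delivery or the move into Junk depends on what moved the message, as noted above.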

You may want to check what happens with whatever you use to move the
spam.  


 if test -n $MYFIND; then
   /usr/local/bin/sa-learn --spam -u ${i} $MYFIND #>/dev/null 2>&1

This may run into shell argument limits if you have to learn a lot of
spam. Consider piping the output of find to xargs, or using 
-exec ...{} + in find.
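
To illustrate the -exec ... {} + variant, here is a sketch under a few assumptions: the function name and the optional third argument (a stand-in for sa-learn so it can be dry-run with echo) are invented, and the name test is only an approximation of the grep -v dovecot filter, since it matches on the file name rather than the whole path.

```sh
#!/bin/sh
# learn_spam_recent JUNK_DIR USER [LEARN_CMD]
# Let find batch the arguments itself, so no shell variable ever has
# to hold the whole file list.
learn_spam_recent() {
    junk_dir=$1
    user=$2
    learn_cmd=${3:-sa-learn}
    # With "{} +" find appends as many paths as fit per invocation and
    # does not invoke the command at all when nothing matches.
    find "$junk_dir" -type f -mtime -7 ! -name 'dovecot*' \
        -exec "$learn_cmd" --spam -u "$user" {} +
}
```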





Re: sa-learn and find

2014-08-30 Thread LuKreme

 On 30 Aug 2014, at 15:32 , RW rwmailli...@googlemail.com wrote:
 
 On Sat, 30 Aug 2014 08:23:02 -0600
 LuKreme wrote:
 
  if test -d $J_PATH; then
MYFIND=`find $J_PATH/ -type f -mtime -7|grep -v dovecot`
 
 mtime may not be the best choice. Ideally what you want is the time
 since the spam was moved to Junk, rather than the time since it was
 delivered. What I see with dovecot when I move mail with claws mail is
 that  a new file is created with the mtime preserved at the
 delivery time and the current epoch time in the filename. In that case
 the ideal would be Btime if your OS supports it, or failing that
 ctime. 
 
 You could also use the time in the filename. Note that epoch times are
 10 digits until long after we're dead so simple lexicographical
 comparisons between maildir filenames or between a maildir filename and
 an epoch time will work.

On my system the file is not renamed when it is moved.

 You may want to check what happens with whatever you use to move the
 spam.

Spam is delivered to the junk box at delivery time, or is manually moved via 
IMAP by the user.

Is there a way to actually show the mtime and ctime of a file?
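
One way to look at both, sketched as a small helper. show_times is a made-up name; it tries the GNU stat format option first and falls back to the BSD one, printing mtime and ctime as epoch seconds followed by the file name.

```sh
#!/bin/sh
# show_times FILE...
# Print "mtime ctime name" for each file, times in epoch seconds.
show_times() {
    for f in "$@"; do
        if stat -c '%Y %Z %n' "$f" 2>/dev/null; then
            :   # GNU stat understood -c
        else
            stat -f '%m %c %N' "$f"   # BSD stat (FreeBSD, macOS)
        fi
    done
}
```

For a quick look without stat, ls -l shows the mtime and ls -lc shows the ctime.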

if test -n $MYFIND; then
  /usr/local/bin/sa-learn --spam -u ${i} $MYFIND #>/dev/null 2>&1
 
 This may run into shell argument limits if you have to learn a lot of
 spam. Consider piping the output of find to xargs, or using 
 -exec ...{} + in find.

Yes, I tried to do that, but as I said in my first post, if I do the find as 
part of the sa-learn command, then it stalls when the find command returns null.


-- 
The fact that Bob and John are married does nothing to diminish anyone
else's marriage any more than a black woman marrying a white man, a Jew
marrying a Catholic, or an ugly Lyle marrying a Pretty Woman



Re: sa-learn site-wide bayes on Redis

2014-08-21 Thread Marcin Mirosław
On 20.08.2014 at 14:42, Axb wrote:
 On 08/20/2014 02:25 PM, Matteo Dessalvi wrote:
 Hi all.


 I am managing a bunch of Linux MTAs which are placed in
 front of some Exchange servers. In such a configuration
 the Bayes filter is deployed site-wide.

 For a new deployment of these servers I am planning
 to use Redis as a centralized backend (previously
 the bayes db were just files saved on the disk).

 My question is: do I have to use a specific option
 to tell sa-learn that the bayes db is now hosted on
 Redis? Or sa-learn will use the info from the
 bayes_sql_dsn directive in my local.cf?

 Looking into the wiki:
 http://wiki.apache.org/spamassassin/SiteWideBayesSetup

 or into the sa-learn docs:
 http://spamassassin.apache.org/full/3.4.x/doc/sa-learn.html

 did not give me any clues.
 
 see
 
 http://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/
 
 
 hope that helps.
 This is not an official doc, so if you see anything that needs to be
 added/changed, pls let me know.

Hi!
I'm reading bayes_redis.cf and I can see:

#NOTE: We're not using authentication assuming the Redis server/port
should not be reachable form the outside
# You can add authentication once you've seen it work.


Does it mean that this example config doesn't include authentication
options, or does it mean that SA doesn't support auth for Redis?

Marcin






Re: sa-learn site-wide bayes on Redis

2014-08-21 Thread Matteo Dessalvi

I am pretty sure SA supports the Redis authentication mechanism.
For my tests I have used the following line:

bayes_sql_dsn  server=127.0.0.1:6379;password=MySecretPWD;database=2

Matteo

On 21.08.2014 12:56, Marcin Mirosław wrote:


Hi!
I'm reading bayes_redis.cf and I can see:

#NOTE: We're not using authentication assuming the Redis server/port
should not be reachable form the outside
# You can add authentication once you've seen it work.


Does it mean that this example config doesn't include authentication
options, or does it mean that SA doesn't support auth for Redis?

Marcin






Re: sa-learn site-wide bayes on Redis

2014-08-21 Thread Marcin Mirosław
On 21.08.2014 at 13:45, Matteo Dessalvi wrote:
 I am pretty sure SA supports the Redis authentication mechanism.
 For my tests I have used the following line:
 
 bayes_sql_dsn  server=127.0.0.1:6379;password=MySecretPWD;database=2

Thanks Matteo,
first I should try, and only then write to the ML :) So now I did my own
check. It looks like SA doesn't authenticate when it connects to Redis. It
didn't work for me with your example, nor when I used
bayes_sql_password   password

When Redis requires a password, SA throws "bayes: Redis failed: Redis
error: ERR operation not permitted"; tcpdump also confirms that SA
doesn't do AUTH.
It's strange, because in Redis.pm I can see that authentication is
supported. Now I'm wondering where I could have made a mistake in the
configuration...

Thanks,
Marcin


Re: sa-learn site-wide bayes on Redis

2014-08-21 Thread Matteo Dessalvi

Which version of Redis are you using? I did have some
problems with the 2.4 version packaged by Debian and
I did solve a similar problem using a more recent
version, like the 2.7 or 2.8.

Matteo

On 21.08.2014 14:45, Marcin Mirosław wrote:

On 21.08.2014 at 13:45, Matteo Dessalvi wrote:

I am pretty sure SA supports the Redis authentication mechanism.
For my tests I have used the following line:

bayes_sql_dsn  server=127.0.0.1:6379;password=MySecretPWD;database=2


Thanks Matteo,
first I should try, and only then write to the ML :) So now I did my own
check. It looks like SA doesn't authenticate when it connects to Redis. It
didn't work for me with your example, nor when I used
bayes_sql_password   password

When Redis requires a password, SA throws "bayes: Redis failed: Redis
error: ERR operation not permitted"; tcpdump also confirms that SA
doesn't do AUTH.
It's strange, because in Redis.pm I can see that authentication is
supported. Now I'm wondering where I could have made a mistake in the
configuration...

Thanks,
Marcin



Re: sa-learn site-wide bayes on Redis

2014-08-20 Thread Axb

On 08/20/2014 02:25 PM, Matteo Dessalvi wrote:

Hi all.


I am managing a bunch of Linux MTAs which are placed in
front of some Exchange servers. In such a configuration
the Bayes filter is deployed site-wide.

For a new deployment of these servers I am planning
to use Redis as a centralized backend (previously
the bayes db were just files saved on the disk).

My question is: do I have to use a specific option
to tell sa-learn that the bayes db is now hosted on
Redis? Or sa-learn will use the info from the
bayes_sql_dsn directive in my local.cf?

Looking into the wiki:
http://wiki.apache.org/spamassassin/SiteWideBayesSetup

or into the sa-learn docs:
http://spamassassin.apache.org/full/3.4.x/doc/sa-learn.html

did not give me any clues.


see

http://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/

hope that helps.
This is not an official doc, so if you see anything that needs to be 
added/changed, pls let me know.




Re: sa-learn site-wide bayes on Redis

2014-08-20 Thread Matteo Dessalvi

No, unfortunately it does not help me.
I already have a proper config file for SA
to access Redis as backend and most of
the configurations are done automatically
through a Chef cookbook (Redis included).

In the docs you pointed me there's nothing
about the interaction between sa-learn and
Redis.

Best regards,
   Matteo

On 20.08.2014 14:42, Axb wrote:


see

http://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/


hope that helps.
This is not an official doc, so if you see anything that needs to be
added/changed, pls let me know.



Re: sa-learn site-wide bayes on Redis

2014-08-20 Thread Axb

bayes_store_module  Mail::SpamAssassin::BayesStore::Redis

tells SA to use the Redis backend. To sa-learn this becomes transparent, 
as with any other backend (DBD,SDBM,SQL)


bayes_redis.cf shows what parameters are mandatory/optional
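
Since sa-learn just follows the SA configuration, a quick sanity check is to see which store module a given config file sets. A small sketch; bayes_backend is an invented helper name and the path you pass it is whatever your local.cf (or bayes_redis.cf) happens to be.

```sh
#!/bin/sh
# bayes_backend CONFIG_FILE
# Print the last bayes_store_module value set in CONFIG_FILE.
# Returns non-zero if the file does not set one.
bayes_backend() {
    awk '$1 == "bayes_store_module" { mod = $2 }
         END { if (mod != "") print mod; else exit 1 }' "$1"
}
```

If that prints Mail::SpamAssassin::BayesStore::Redis, running sa-learn --dump magic afterwards is an easy way to confirm that sa-learn really reaches the same database.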

On 08/20/2014 03:02 PM, Matteo Dessalvi wrote:

No, unfortunately it does not help me.
I already have a proper config file for SA
to access Redis as backend and most of
the configurations are done automatically
through a Chef cookbook (Redis included).

In the docs you pointed me there's nothing
about the interaction between sa-learn and
Redis.

Best regards,
Matteo

On 20.08.2014 14:42, Axb wrote:


see

http://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/



hope that helps.
This is not an official doc, so if you see anything that needs to be
added/changed, pls let me know.





Re: sa-learn site-wide bayes on Redis

2014-08-20 Thread Matteo Dessalvi

Ok, perfect! Thanks a lot! This is what I want to know
and I was not so sure about.

I may be wrong but it looks to me the fact that
tools like sa-learn can access transparently the
backends configured for SA is not exactly clear
from the docs.

It would be great if the wiki maintainers could add
a short note somewhere in the pages regarding the
SiteWide deployment or related topics.

Best regards,
 Matteo

On 20.08.2014 15:08, Axb wrote:

bayes_store_module  Mail::SpamAssassin::BayesStore::Redis

tells SA to use the Redis backend. To sa-learn this becomes transparent,
as with any other backend (DBD,SDBM,SQL)

bayes_redis.cf shows what parameters are mandatory/optional




Re: sa-learn site-wide bayes on Redis

2014-08-20 Thread Axb

I so love top posters.

On 08/20/2014 03:33 PM, Matteo Dessalvi wrote:

Ok, perfect! Thanks a lot! This is what I want to know
and I was not so sure about.

I may be wrong but it looks to me the fact that
tools like sa-learn can access transparently the
backends configured for SA is not exactly clear
from the docs.

It would be great if the wiki maintainers could add
a short note somewhere in the pages regarding the
SiteWide deployment or related topics.

Best regards,
  Matteo

On 20.08.2014 15:08, Axb wrote:

bayes_store_module  Mail::SpamAssassin::BayesStore::Redis

tells SA to use the Redis backend. To sa-learn this becomes transparent,
as with any other backend (DBD,SDBM,SQL)

bayes_redis.cf shows what parameters are mandatory/optional


Watch your memory usage:

If you configure Redis to dump data from memory to file, it's safe to 
*double* the amount of memory you planned for Redis usage



as in my case:

sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0   25218483  0  non-token data: nspam
0.000  0   11919587  0  non-token data: nham

# Memory
used_memory:3637407032
used_memory_human:3.39G
used_memory_rss:4068585472
used_memory_peak:3702485960
used_memory_peak_human:3.45G
used_memory_lua:205824
mem_fragmentation_ratio:1.12
mem_allocator:jemalloc-3.2.0


I keep at least 5 GB of free memory for the dump to file to avoid ugly 
swaps or crashes.


free
             total       used       free     shared    buffers     cached
Mem:      14262648    5786664    8475984          0     162744    1343408
-/+ buffers/cache:    4280512    9982136
Swap:      2046968          0    2046968
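
The "double it" rule of thumb as plain arithmetic, using the used_memory figure above. Only a sketch: the function names are made up, and the input is assumed to be the used_memory value in bytes as reported in the Memory section of the Redis INFO output.

```sh
#!/bin/sh
# redis_dump_budget USED_BYTES
# Memory to plan for when Redis dumps to file: twice what it uses now.
redis_dump_budget() {
    used_bytes=$1
    echo $(( used_bytes * 2 ))
}

# bytes_to_gib BYTES: the same figure in GiB, two decimals.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1073741824 }'
}
```

For the numbers above, redis_dump_budget 3637407032 gives 7274814064 bytes, a bit under 7 GiB in total for a 3.39G data set.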





Re: SA-Learn - OT (slightly) Bash Script help needed

2014-05-29 Thread Axb

On 05/29/2014 12:22 PM, Arthur Dent wrote:


So...

Will this work for sa-learn?

8
# Proposed sa-learn maildir script
#!/bin/bash

sa-learn --ham ~/Maildir/.Hobby/{cur,new}
sa-learn --ham ~/Maildir/.{Misc,Personal,etc}.*/{cur,new}
sa-learn --spam ~/Maildir/.Malware.*/{cur,new}
8


new means unread - you really want to run sa-learn on stuff you 
haven't looked at? (as in learning false negatives as ham?)




Re: SA-Learn - OT (slightly) Bash Script help needed

2014-05-29 Thread Giles Coochey

On 29/05/2014 11:43, Axb wrote:

On 05/29/2014 12:22 PM, Arthur Dent wrote:


So...

Will this work for sa-learn?

8 


# Proposed sa-learn maildir script
#!/bin/bash

sa-learn --ham ~/Maildir/.Hobby/{cur,new}
sa-learn --ham ~/Maildir/.{Misc,Personal,etc}.*/{cur,new}
sa-learn --spam ~/Maildir/.Malware.*/{cur,new}
8 



new means unread - you really want to run sa-learn on stuff you 
haven't looked at? (as in learning false negatives as ham?)


If it was his Inbox then perhaps it would be best to avoid new, but a 
lot of the time stuff that is unread in other folders generally means 
that it has been looked at - perhaps not totally read, or perhaps put 
aside for later inspection.
For me, I use unread / read as a marker to whether I have actioned a 
particular email and keep messages unread until such time that they 
are dealt with.


--
Regards,

Giles Coochey, CCNP, CCNA, CCNAS
NetSecSpec Ltd
+44 (0) 8444 780677
+44 (0) 7983 877438
http://www.coochey.net
http://www.netsecspec.co.uk
gi...@coochey.net






Re: SA-Learn - OT (slightly) Bash Script help needed

2014-05-29 Thread Arthur Dent
On Thu, 2014-05-29 at 12:04 +0100, Giles Coochey wrote:
 On 29/05/2014 11:43, Axb wrote:
  On 05/29/2014 12:22 PM, Arthur Dent wrote:
 
  So...
 
  Will this work for sa-learn?
 
  8<--------
 
  #!/bin/bash
  # Proposed sa-learn maildir script
 
  sa-learn --ham ~/Maildir/.Hobby/{cur,new}
  sa-learn --ham ~/Maildir/.{Misc,Personal,etc}.*/{cur,new}
  sa-learn --spam ~/Maildir/.Malware.*/{cur,new}
  8<--------
 
 
  new means unread - you really want to run sa-learn on stuff you 
  haven't looked at? (as in learning false negatives as ham?)
 
 If it was his Inbox then perhaps it would be best to avoid new, but a 
 lot of the time stuff that is unread in other folders generally means 
 that it has been looked at - perhaps not totally read, or perhaps put 
 aside for later inspection.
 For me, I use unread / read as a marker to whether I have actioned a 
 particular email and keep messages unread until such time that they 
 are dealt with.

Yes, quite right. All the stuff in each of those mbox files has been
either put there manually, or by a well-tested procmail recipe from
known contacts etc. I do sometimes file an unread message for later
reading, so I think I will need new.

The (very few) FNs that slip through get hand-filed by me into the
Malware/Spam mbox and get re-learned on the next run of the script
(every night).

So - is the syntax correct for a maildir format? 
In particular will it work with the current structure (i.e. will the
line sa-learn --ham ~/Maildir/.{Misc,Personal,etc}.*/{cur,new}
correctly catch:
.Misc.Clubs.cur
.Misc.Clubs.new
.Misc.Car.cur
.Misc.Car.new
etc...?
 
Should I use --no-sync?
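
One way to check the pattern without touching the Bayes database is to let the shell expand it in a scratch tree and look at the result. The sketch below builds a throwaway Maildir (folder names partly taken from the question, the rest made up), expands the same pattern, and only echoes the sa-learn commands instead of running them. Two things it should make visible: the subfolders are directories, so the matches are .Misc.Clubs/cur and .Misc.Clubs/new rather than .Misc.Clubs.cur, and the pattern requires a dot after the top-level name, so a plain .Misc folder is not matched. Passing --no-sync on each call and finishing with a single sa-learn --sync is a common way to batch the journal sync; whether it is worth it depends on the size of the run.

```bash
#!/bin/bash
# Dry run: expand the maildir pattern in a scratch tree and print the commands.
scratch=$(mktemp -d)
for folder in .Misc .Misc.Clubs .Misc.Car .Personal.Family .Malware.Spam; do
    mkdir -p "$scratch/Maildir/$folder/cur" "$scratch/Maildir/$folder/new"
done

# Same pattern as in the proposed script ("etc" left out, it was a placeholder).
ham_dirs=( "$scratch"/Maildir/.{Misc,Personal}.*/{cur,new} )
spam_dirs=( "$scratch"/Maildir/.Malware.*/{cur,new} )

printf 'ham:  %s\n' "${ham_dirs[@]#"$scratch"/}"
printf 'spam: %s\n' "${spam_dirs[@]#"$scratch"/}"

# What a nightly run could look like; echoed here, not executed.
echo sa-learn --no-sync --ham "${ham_dirs[@]}"
echo sa-learn --no-sync --spam "${spam_dirs[@]}"
echo sa-learn --sync

rm -rf "$scratch"
```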

Many thanks for the help so far...

Mark





Re: sa-learn from a cronjob?

2014-05-01 Thread RW
On Wed, 30 Apr 2014 13:52:52 -0600
Bob Proulx wrote:


 
 The maildir exists and a cron script can be used to scan and process
 mail incoming there.  People do it.  It works.  Saying it does not
 work or is not sensible is just wrong mean talk.  People do this all
 of the time.

So do I. I haven't said it's wrong to train from Maildir, the issue is
that Ian's script trains from the new/ directory and then moves it to
cur/. 

When you train from a Maildir that's accessed from a client you need to
train from the cur/ directory (or both new/ and cur/), and either leave
it in that Maildir or move it to another. 
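
If a script does move messages after training, the Maildir convention is that a file in cur/ carries an info suffix (":2," followed by flags), while a file in new/ has none. A minimal sketch of a move helper that keeps whatever flags a file already has; the function name, folder names and file names are made up for illustration.

```bash
#!/bin/bash
# Sketch: move one message file into DEST/cur, adding the ":2," info
# suffix if the file came from new/ and does not have one yet.
maildir_move_to_cur() {
    local src=$1 dest=$2 name
    name=${src##*/}
    case $name in
        *:2,*) ;;                    # already has an info field, keep its flags
        *)     name="$name:2," ;;
    esac
    mv -- "$src" "$dest/cur/$name"
}

# Usage on a scratch tree with one message in new/ and one in cur/.
box=$(mktemp -d)
mkdir -p "$box"/spam/{cur,new,tmp} "$box"/learned/{cur,new,tmp}
: > "$box/spam/new/1398888888.M1P2.host"
: > "$box/spam/cur/1398888889.M3P4.host:2,S"

maildir_move_to_cur "$box/spam/new/1398888888.M1P2.host" "$box/learned"
maildir_move_to_cur "$box/spam/cur/1398888889.M3P4.host:2,S" "$box/learned"

moved=$(ls "$box/learned/cur")
echo "$moved"
rm -rf "$box"
```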

   Ian does it.  I do it.


I don't know how many times I have to repeat this before you
understand, but Ian doesn't use the conventional training folders
that the OP was asking about. His scheme is reliable because his
folders aren't opened in IMAP or an MUA. He acknowledged this in his
own reply to my original comment.

When I looked at Ian's script it was obvious that it either didn't
work for Ian or he was holding something back that the OP needed to
know. Either way a clarification seemed important. 

You accuse me of being negative, but all I've done is present technical
issues. You've made little effort to understand them or address my
points on a technical level, you've spent a lot of time misrepresenting
my motives and accusing me of being confused. Which of us is being
negative?

  








Re: sa-learn from a cronjob?

2014-04-30 Thread Bob Proulx
RW wrote:
 Bob Proulx wrote:
  The script is looping through mail files in a maildir and processing
  them remotely on the server through sa-learn.  After processing the
  messages it is moving the messages to mark them as having been read.
 
 No, the Maildir spec defines the S flag in the info field for marking
 mail as read (seen), the new/ to cur/ move  is done by an IMAP server
 (or a local Unix client) in the first session that sees the new mail. 
 
 Copying an email into an IMAP folder via IMAP will not put it into the
 new/ sub-directory of the underlying maildir. Opening a folder in IMAP
 will empty the new/ sub-directory.
 
 If you don't believe this, I suggest you actually try it on a real
 IMAP server.   I just tried it on Dovecot, and I found it behaves as I
 expected. Newly delivered mail is moved to cur/ when a client is first
 informed about it, copied mail goes to cur/ in the destination mailbox. 

Hmm...  Works for me.  Apparently it works for Ian.  YMMV.

Personally my process removes mail from incoming spam-new folder and
then saves it into the processed spam folder.  That is the way I
prefer to run it.  I use two folders rather than one.  Again YMMV.
Works for me.  Sorry if it does not work for you.

   You might have mentioned that because it means it's not the
   solution you implied when you wrote "Here is my cronjob for that
   purpose." It's certainly not appropriate to users that don't like
   the command line.
  
  Sorry but you are incorrect.  Users of Ian's system need not use the
  command line.  His solution directly answered Dan's question.
 
 No, he said himself that my objections don't apply because it's an
 isolated mailbox that's not read by anything except the cron script. A
 macro in the client places the mail directly into the mailbox (bypassing
 the client's conventional mailbox handling) - this is really only even
 remotely sensible for a local instance of mutt, emacs etc.

I think you are completely misunderstanding how this type of process
works.  And I can't avoid saying that this seems intentional by the
tone.  Sorry.  But that is the way it reads to me.  Have tried to help
in good faith but if that good faith is not reciprocated then I am
going to lose interest very quickly.

But let me try again very briefly one last time anyway since I am an
incorrigible optimist.  Two things are very common.  IMAP servers.
Use of maildir.  One does not require the other.  But they very often
appear together.  It is not required to use mutt or emacs or other of
the traditional email clients for this even if that is a typical
desired developer environment.  All that is required for this type of
scripted method is that the backend use maildirs for mail storage.
That way the files can be scanned and processed offline.  I dare say
that most of the masses use web email clients these days.  Or if not
most then a very large number.  They will never see the maildir.

Since use of maildirs is typical for an IMAP server it means that any
of the plethora of imap clients, including web email interfaces to
imap, can be used to interact with the imap server and through that
the maildir folders on the backend.  A user running an imap client
might never see the maildir.  A user running a web mail client would
certainly never see a maildir.  That does not mean that the maildir
does not exist.  That does not mean that the maildir cannot be scanned
and processed offline for background training of the Bayes database.
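
A sketch of the scanning half of such a cron job, assuming the backend really is a maildir on local disk. It lists the message files whose unique name (the part before ":2,") is not yet in a state file, so a message is still recognised after its flags change or it moves from new/ to cur/. The state file and the function name are made up for illustration. sa-learn keeps its own record of messages it has learned, so the state file only saves scanning time.

```bash
#!/bin/bash
# Sketch: print message files in cur/ and new/ whose unique name is not
# listed in the state file (one unique name per line).
maildir_unprocessed() {
    local maildir=$1 state=$2 f base
    for f in "$maildir"/cur/* "$maildir"/new/*; do
        [ -f "$f" ] || continue          # also skips unmatched globs
        base=${f##*/}
        base=${base%%:2,*}               # drop the flags so renames do not matter
        grep -qxF -- "$base" "$state" 2>/dev/null || printf '%s\n' "$f"
    done
}

# Usage on a scratch maildir: one message already learned, one not.
box=$(mktemp -d)
mkdir -p "$box"/{cur,new,tmp}
: > "$box/cur/1398888888.M1P2.host:2,S"
: > "$box/new/1398888889.M3P4.host"
echo "1398888888.M1P2.host" > "$box/learned.list"

todo=$(maildir_unprocessed "$box" "$box/learned.list")
echo "$todo"
# A real job would pass these files to sa-learn and append their
# unique names to the state file afterwards.
rm -rf "$box"
```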

The maildir exists and a cron script can be used to scan and process
mail incoming there.  People do it.  It works.  Saying it does not
work or is not sensible is just wrong mean talk.  People do this all
of the time.  Ian does it.  I do it.  Meanwhile no one is disputing
that there are better ways to do things.  There are always better
ways.  Which is why it is so much appreciated when people share.  Then
we can all learn and move forward.  But what can be said when someone
says that something people are doing and making good use of is not
sensible?  I think I will choose to say nothing more.

 Mostly, it's pretty trivial to train Bayes from Maildir, but there
 is one significant complication, and that's that moving mail between
 Maildirs after training may break IMAP keywords, which some clients
 use for custom flags or for sharing proprietary metadata between
 separate client instances. 

Yes it is pretty trivial.  Which has been the topic of this thread.
Simple scripts to scan and process maildirs.  Here you point out some
likely valid issues of breaking tags.  However maintaining tags for
spam messages moved into the training folder isn't a problem that I
find compelling.  Certainly not compelling enough to not do it.

I look forward to reading your positive contribution to the anti-spam
effort.

Bob

-- 
  http://xkcd.com/386/


Re: sa-learn from a cronjob?

2014-04-27 Thread RW
On Thu, 24 Apr 2014 14:37:52 -0600
Bob Proulx wrote:

 RW wrote:
  Ian Zimmerman wrote:
   RW wrote:
   RW I don't think it will work for the purpose mentioned, and if
   RW it's working properly for you, there's a lot you're not
   RW mentioning.
 
 I looked at the script and it looks like an example that would work
 for Ian fine.
 ...
 
   RW It's only looking for mail in the immediate post-delivery
   RW state after it's been put into the mailbox by an MTA or MDA
   RW and before it's been detected as new mail by an MUA (directly
   RW or via IMAP). It won't learn mail put into the folders by an
   RW MUA or IMAP at all.
 
 No.  That isn't what the script is doing.
 
 The script is looping through mail files in a maildir and processing
 them remotely on the server through sa-learn.  After processing the
 messages it is moving the messages to mark them as having been read.

No, the Maildir spec defines the S flag in the info field for marking
mail as read (seen), the new/ to cur/ move  is done by an IMAP server
(or a local Unix client) in the first session that sees the new mail. 

Copying an email into an IMAP folder via IMAP will not put it into the
new/ sub-directory of the underlying maildir. Opening a folder in IMAP
will empty the new/ sub-directory.

If you don't believe this, I suggest you actually try it on a real
IMAP server.   I just tried it on Dovecot, and I found it behaves as I
expected. Newly delivered mail is moved to cur/ when a client is first
informed about it, copied mail goes to cur/ in the destination mailbox. 
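
To make the distinction concrete: whether a message counts as read is encoded in the file name, not in which sub-directory it sits. A minimal sketch that tests for the S flag in the info field (everything after ":2,"); the function name and the sample file names are made up.

```bash
#!/bin/bash
# Sketch: succeed if a maildir file name carries the S (seen) flag.
# The info field is everything after ":2," in the file name.
maildir_is_seen() {
    local name=${1##*/} flags
    case $name in
        *:2,*) flags=${name##*:2,} ;;
        *)     flags= ;;             # no info field yet (typical for new/)
    esac
    [[ $flags == *S* ]]
}

for f in new/1398888888.M1P2.host cur/1398888889.M3P4.host:2, cur/1398888890.M5P6.host:2,RS; do
    if maildir_is_seen "$f"; then
        echo "seen:   $f"
    else
        echo "unseen: $f"
    fi
done
```

Note that the second sample file is in cur/ and still unseen, which is the state a message is in after an IMAP session has noticed it but nobody has opened it.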

  You might have mentioned that because it means it's not the
  solution you implied when you wrote "Here is my cronjob for that
  purpose." It's certainly not appropriate to users that don't like
  the command line.
 
 Sorry but you are incorrect.  Users of Ian's system need not use the
 command line.  His solution directly answered Dan's question.

No, he said himself that my objections don't apply because it's an
isolated mailbox that's not read by anything except the cron script. A
macro in the client places the mail directly into the mailbox (bypassing
the client's conventional mailbox handling) - this is really only even
remotely sensible for a local instance of mutt, emacs etc.


Mostly, it's pretty trivial to train Bayes from Maildir, but there
is one significant complication, and that's that moving mail between
Maildirs after training may break IMAP keywords, which some clients
use for custom flags or for sharing proprietary metadata between
separate client instances. 
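
As an illustration of how this can go wrong: Dovecot, to the best of my knowledge, stores IMAP keywords as lowercase letters in the maildir file name and keeps the letter-to-keyword mapping in a per-folder dovecot-keywords file, so the same letter can mean something different after a move. The sketch below flags such files before moving them. Treat the lowercase-letter convention as an assumption to verify against your own server; the function name and file names are made up.

```bash
#!/bin/bash
# Sketch: succeed if a maildir file name has lowercase flag letters,
# which are assumed here to be per-folder IMAP keywords (Dovecot style).
maildir_has_keywords() {
    local name=${1##*/} flags
    case $name in
        *:2,*) flags=${name##*:2,} ;;
        *)     return 1 ;;           # no info field, so no keywords
    esac
    [[ $flags == *[[:lower:]]* ]]
}

for f in cur/1398888889.M3P4.host:2,S cur/1398888890.M5P6.host:2,Sab; do
    if maildir_has_keywords "$f"; then
        echo "keywords set, a move may change their meaning: $f"
    else
        echo "no keywords: $f"
    fi
done
```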








Re: sa-learn from a cronjob?

2014-04-24 Thread RW
On Wed, 23 Apr 2014 19:15:13 -0700
Ian Zimmerman wrote:

 On Sun, 20 Apr 2014 12:14:37 -0700 (PDT)
 Dan Mahoney, System Admin d...@prime.gushi.org wrote:
 
  Most of my users aren't command-line friendly.  I'd like to
  basically have my IMAP server default to handing out two imap
  mailboxes that get auto-crontabbed to training bayes.
 
 Here is my cronjob for that purpose, in its entirety.  

I don't think it will work for the purpose mentioned, and if it's
working properly for you, there's a lot you're not mentioning.

It's only looking for mail in the immediate post-delivery state after
it's been put into the mailbox by an MTA or MDA and before it's
been detected as new mail by an MUA (directly or via IMAP). It won't
learn mail put into the folders by an MUA or IMAP at all.

You need to use separate destination mailboxes.


