Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

2021-11-15 Thread Thomas Eckardt
> 1 of the changes in 21277 is because of my report.  Very slow startup of
> the rebuild process.

minor fix for too slow SQL servers by preventing one unneeded SQL
statement

> 2+ of the changes in 21280 stemmed from my messages.  Too many open files
> in Windows, early bad SSL changes, catching invalid regex instead of ASSP
> crashing

fixes a Super-DAU setup

> 21287 & 21290: your changes to griplist folder creation, changes/fixes to
> BerkeleyDB error logging, gui changes, and windows file descriptor changes
> are because of things I've brought up

fixes a Super-DAU setup

> 21293: The NWLI changes are because of what I asked

has worked before and after code changes nearly the same way (no changes
for advanced users) - yes code and doc are somehow better now

> 7 of the 8 changes in 21302 are because of my reports, questions,
> requests, and suggestions.  Related to external file change times not
> being recorded in ASSP (long time bug), improvement in a single file
> changing causing all to be reloaded, changes to the analyzer for reports
> from Outlook, corpus cleanup for DKIM WL/NP matches.

bug in file change time for 'Groups' feature if include files are
externally changed, not using any of the recommended and documented ways
anything else is more or less code cosmetic - there is no need to cleanup
anything from the corpus (it's nice to have) - the default rebuild engine
is doing it well

> 21396 more changes because of discussions about Outlook reporting  (FYI
> forward as attachment from Outlook still doesn't result in correct analyze
> reports nor does multiple report attachments in a single email from
> Outlook work at all.)

nothing really changed - one minor change to catch wrong Outlook reporting
.msg + header corrections for wrong reported mails .eml

> 21317 After my questions about the unusual request for help for a way to
> match username of the recipient to the sender we discovered the bug about
> unoptimized weighted bombs with a scoring parameter and the bug with
> definite statements

?(DEFINE) forced by me - nobody is using it
<<<...>>>=>xx really a bug


So what is left from over 30 of your posts in the last 2 months - hours of 
reading, rereading and answering, analyzing and fixing things which 
normally never happen, thinking about touching the assp core functions? - 
Not much left - one bug.
To come to an end - take for example the subject of this thread "Scan 
entire message for Bombs, regardless of MaxBytes setting?" 
Everybody who knows the concept of assp will get tears in their eyes when 
reading this. I don't want to talk with you about the assp concept!

Thank you again Ken for testing assp and reporting bugs.
Join the assp forum, if you want. This mailing list has only ~80 members; 
the forum has ~1390 members. Possibly you'll get better help there. Every 
forum has a 'Suggestion and Feedback', a 'How do I' and a 
'Troubleshooting' section.

Ken, I don't want to prevent you from posting here using any SF project 
rule - everyone should be and is free to join or leave this mailing list. 
But 'think assp' before you post, keep your posts short, be patient and 
accept answers like 'I'll think about it' and in particular 'No'.

Thomas 





Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

2021-11-15 Thread Thomas Eckardt
>I have no other way to communicate with you

I told you, to use the forum (the assp - forum, what else?). 
http://sourceforge.net/p/assp/forum/

Thomas




Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

2021-11-14 Thread K Post
I cannot decipher what this means:

most - where? -> forum , bug tracker , self testing, forced by attackers

and it's my lack of clarity on your short replies which leads me to
question further.

I need to find a way to still be able to report my findings and ask my
questions without being a bother.  The last thing I want to be is a burden,
but I have no other way to communicate with you, as the sole developer on a
project that has minimal user communication other than what you and I
discuss.

While I wish it were easier for me to be more concise, my persistence and
full description of issues and challenges has resulted in far more than the
one change you referenced.  I've outlined some of them from the last 7
versions below.


   - 1 of the changes in 21277 is because of my report.  Very slow startup
   of the rebuild process.
   - 2+ of the changes in 21280 stemmed from my messages.  Too many open
   files in Windows, early bad SSL changes, catching invalid regex instead of
   ASSP crashing
   - 21287 & 21290: your changes to griplist folder creation, changes/fixes
   to BerkeleyDB error logging, gui changes, and windows file descriptor
   changes are because of things I've brought up
   - 21293: The NWLI changes are because of what I asked
   - 7 of the 8 changes in 21302 are because of my reports, questions,
   requests, and suggestions.  Related to external file change times not being
   recorded in ASSP (long time bug), improvement in a single file changing
   causing all to be reloaded, changes to the analyzer for reports from
   Outlook, corpus cleanup for DKIM WL/NP matches.
   - 21396 more changes because of discussions about Outlook reporting
   (FYI  forward as attachment from Outlook still doesn't result in correct
   analyze reports nor does multiple report attachments in a single email from
   Outlook work at all.)
   - 21317 After my questions about the unusual request for help for a way
   to match username of the recipient to the sender we discovered the bug
   about unoptimized weighted bombs with a scoring parameter and the bug with
   definite statements

And over the years you've added useful features and fixed bugs because of
my questions or requests which you originally dismissed as being misguided

There's a trend here. When I'm active on this forum, I discuss things that
lead you to improve ASSP which benefits everyone.

If I had asked my question and then not responded to your short "no" or
"have you thought about this" type of replies, would these changes have
been made?  If I hadn't fully described the issue/question/challenge, how
would you have known what I was talking about?

I will now step away from this forum as requested for as long as I am able.
I do hope that you are willing to entertain future questions/concerns once
I return, if not for me, then for the rest of the quiet spam fighters on
this list.


Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

2021-11-14 Thread Thomas Eckardt
>How many of the changes in the last 10 or so versions of ASSP have been 
from the requests of anyone else on this list? 

how many? 1 at 5.11.2021 - weight bug

most - where? -> forum , bug tracker , self testing, forced by attackers

You may use the forum, where everyone is free to skip reading your endless 
posts and blogs. It takes simply too much time to pick up the 1 to 5% of 
helpful content and to be forced by you to answer also the rest.


Thomas






Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

2021-11-13 Thread K Post
I don't know what I've done to deserve that reply, but regardless, I'm
sorry to have upset you.  I will take a long break from posting
further here, but please do know that I'm appreciative of your continued
support of this important program.

Before I go, please entertain these thoughts:

I hope that you're able to re-evaluate your request for me to go away.
I've recommended more very good change requests to ASSP than ones that you
consider to be bad.  I'm not able to implement them myself.  I'm not
perfect, but your request for me to sign off of this list, which is a
critical resource, is unfair.

How many of the changes in the last 10 or so versions of ASSP have been
from the requests of anyone else on this list?  How many bugs have been
quashed because of things I've discovered?  How many improvements did you,
and only you, make because of questions I've asked and because of feature
requests I've made (recently and over the many years)?

Are you angry because I'm (admittedly) long winded?  Please understand
that this is not out of disrespect, it's because I want to make sure that
I'm being clear.  When I get a short answer, I try to continue the
conversation.  This is a discussion list after all.

Are you angry because I'm persistent?  My persistence is also not out of
disrespect, it's because I'm inquisitive,  am by no means an expert in
coding or the inner workings of spam detection, and have a burning desire
to continue to see ASSP improve.  Often I ask a detailed question, and only
get an answer back from you like "have you considered this?" or "no"
without explanation.  Is it so bad that I ask why not?  I wait patiently
for your replies, but do inquire more if my questions haven't been fully
answered.  If you don't have the time or desire to entertain my questions,
so be it, but please remember that most of what I ask has ultimately led to
you eventually improving ASSP.

Anyway, I don't expect and certainly don't require a reply here.  But
please know that my intentions are pure, I'm charitable, patient, and a
good person. It hurts deeply that you seem to think otherwise.  I don't
have the experience nor the ability that you do, not even close, but I like
to think that even if I can be frustrating, I'm ultimately bringing some
good to the ASSP world by offering suggestions and asking questions.




Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

2021-11-13 Thread Thomas Eckardt
Ken , it would be nice if you consider to signoff this list or at least to 
no longer post here.

Thank you.

Thomas





From: "K Post" 
To: "ASSP development mailing list" 
Date:  12.11.2021 22:46
Subject:    Re: [Assp-test] Concept Question: Scan entire message for 
Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?



First off, WOW.  Our rebuild times are in no way similar.   At first I 
thought it was you with fancy SSDs and lots of horsepower, but I'm seeing 
now that you have both useDB4Rebuild off and RebuildUseFileModel on.  The 
opposite of my settings.  I have useDB4Rebuild on and never enabled 
RebuildUseFileModel after initial attempts were failing (early on with 
that feature).  useDB4Rebuild is the default and I was always worried 
about RAM when I started using ASSP 10+ years ago and never looked back.  

A long rebuild time doesn't bother me, but seeing how fast you can do one 
has got me back to needing to test the settings on my end again.  Thanks 
for that encouragement.


I'm worried that going up to 50k maxbytes on my system seemed to cause a 
lot of false positives.  I don't understand how that's possible, but it's 
what happened.  I would have thought it was the other way around, too much 
spam getting through vs. too much legit being blocked.  Plus, I don't 
think that generally using that much for bayesian is necessary (or maybe 
it's even detrimental?)  Accuracy was very high for me at  6k and 10k, but 
I was missing the bombs. 


The question remains for me about the >CONCEPT< of optionally scanning 
more of a message at the time of attempted delivery for bombs.  ClamAV 
uses its own maximum size setting.  Why not also give us that option for 
Bombs?  For the case I explained where bombs are late in the email body 
and likely other scenarios, don't you think it would be helpful to have a 
BombAddlBytes variable in the GUI? 

You know there's no way that I could ever code a plugin and that there's 
even less of a chance of this charity paying for one to be built!  I still 
have duct tape holding my desk chair together.  

Modifying getbody seems pretty straightforward.  Add a new variable 
called $bombdataref that would be used in place of $dataref for all bomb 
comparisons - similarly to the way that $clamavbytes is for the clamav 
stuff.  
my $bombdataref = $maxbytes + ($BombAddlBytes ? $BombAddlBytes : 0);
then, instead of
if ( ! BombOK( $fh, $dataref ) ) { 
use
if ( ! BombOK( $fh, $bombdataref ) ) {
and the like everywhere that there's a bomb or script check in getbody

There would also need to be changes in analyze and anywhere else that the 
bomb checks are done.
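
The idea of giving bomb checks a longer window than the Bayesian check can be illustrated with a small standalone Perl sketch. This is not ASSP code: `body_window` and `bomb_found` are hypothetical helpers, `$BombAddlBytes` is only the option proposed in this thread, and the sizes and phone number are made-up test values.

```perl
use strict;
use warnings;

# Hypothetical settings; $BombAddlBytes is the option proposed in this
# thread, not an existing ASSP parameter.
my $MaxBytes      = 20_000;   # bytes of the body used for Bayesian checks
my $BombAddlBytes = 40_000;   # extra bytes that only the bomb checks see

# Return a reference to the first $limit bytes of the message body.
sub body_window {
    my ($bodyref, $limit) = @_;
    my $window = substr($$bodyref, 0, $limit);
    return \$window;
}

# Stand-in for a bomb check: 1 if the regex matches inside the window.
sub bomb_found {
    my ($dataref, $bomb_re) = @_;
    return $$dataref =~ $bomb_re ? 1 : 0;
}

# A scam phone number placed after 30,000 bytes of HTML filler.
my $body    = ('<td>&nbsp;</td>' x 2000) . 'Call 1-800-555-0100 for a refund';
my $bomb_re = qr/1-800-555-0100/;

my $dataref     = body_window(\$body, $MaxBytes);
my $bombdataref = body_window(\$body, $MaxBytes + ($BombAddlBytes || 0));

printf "bayesian window: %d bytes, bomb found: %d\n",
    length($$dataref), bomb_found($dataref, $bomb_re);
printf "bomb window:     %d bytes, bomb found: %d\n",
    length($$bombdataref), bomb_found($bombdataref, $bomb_re);
```

With these values the number sits past the 20,000-byte Bayesian window, so only the longer bomb window sees it. The corpus and the Bayesian check would keep using the shorter window.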

I'm more than willing to try to modify ASSP as described above, give it a 
go, and report back.  It won't be easy for me to make the changes and have 
it work, but I'm game.  Before I do though, I'm concerned that you don't 
think that scanning more for bombs is a sound concept.  Or maybe you just 
don't think it's necessary?  I'm most interested in your opinion on that 
before I move forward.




On Fri, Nov 12, 2021 at 1:08 PM Thomas Eckardt  wrote:
Nov-12-21 04:00:20 RebuildSpamDB-thread rebuildspamdb-version 8.14 started 
in ASSP version 2.6.6(21314) 

Nov-12-21 04:00:20 detection of local disclaimers is enabled 

Nov-12-21 04:00:20 info: 'useDB4Rebuild' is NOT set to on - the rebuild 
spamdb process will possibly require a large amount of memory - but it 
will run very fast! 

Nov-12-21 04:00:20 RebuildSpamDB reloaded and uses the internal FileModel 
(with 39917 entries) to speedup processing 

Nov-12-21 04:00:20 RebuildSpamDB allocated 963.08 MByte of RAM to load the 
internal FileModel 

Nov-12-21 04:00:20 RebuildSpamDB will create a Hidden Markov Model 

Nov-12-21 04:00:20 RebuildSpamDB will include attachment-database-entries 
in to spamdb 

Nov-12-21 04:00:20 RebuildSpamDB will create unicode enabled databases 

Nov-12-21 04:00:20 RebuildSpamDB will process all words as Sequence of UAX 
#29 Grapheme Clusters 

Nov-12-21 04:00:20 RebuildSpamDB will normalize unicode characters 

Nov-12-21 04:00:20 RebuildSpamDB will use the ASSP_WordStem engine 

Nov-12-21 04:00:20 ---ASSP Settings--- 

Nov-12-21 04:00:20 RebuildSpamDB will create private spamdb entries for 
users email addresses and each local domain. 

Nov-12-21 04:00:20 Do Not Collect RedRe Messages: Enabled 
**Messages matching the RedRe will be removed from the corpus!** 

Nov-12-21 04:00:20 Use Subject as Maillog Names: True 
Nov-12-21 04:00:20 Maxbytes: 25,000 
Nov-12-21 04:00:20 Maxfiles: 31,000 
Nov-12-21 04:00:20 RebuildFileTimeLimit: 1 5 
Nov-12-21 04:00:20 RebuildFileTimeLimit: files will be moved away from the 
corpus if their processing takes longer than 5 second(s) 

processing ~40.000 corpus files in ~4 minutes 
building 15.500 spamdb.helo records in 2 seconds 
building 3.200.000 spamdb records in 25 seconds 
building 7.200.000 hmmdb records in 1:33 minutes 

complete pr

Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

2021-11-12 Thread K Post
mprove detection rates)
>
> If you need to process complete mails for bombs - you'll need to write
> your own level 2 assp-plugin.
>
> Thomas
>
>
>
>
>
>
>
>
> From:    "K Post" 
> To:    "ASSP development mailing list" <
> assp-test@lists.sourceforge.net>
> Date:12.11.2021 16:56
> Subject:Re: [Assp-test] Concept Question: Scan entire message for
> Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?
> --
>
>
>
>
> Absolutely I've thought about this.  I consider everything I post prior to
> posting.
>
> Can you briefly explain why the ability to scan (MaxBytes + some
> additional amount)kb on incoming mails for bombs but only use MaxBytes for
> bayesian and the rebuild would be such a bad idea?
>
> Since you questioned if I ever thought about this, here's what the thought
> process is and the reason for the request.  Maybe I didn't explain myself
> well enough in the previous messages:
>
> The MaxBytes "documentation" says to lower it to 3000 for a mature
> installation, but 10x larger than that if you can handle it.
>
> How many bytes of the message body will ASSP look at - the message header
> is always included in all checks. Mails stored in the collecting folders
> will be truncated to this size, if StoreCompleteMail is disabled. *The
> average of Ham messages (message body) is 6K, the average of Spam messages
> is 3K.* Usually the spam folder will be filled quicker than the notspam
> folder, therefore set this value to 4000 to get more wordpairs per Ham
> Message. When both folders are close to the maxfiles limit, reduce it to
> 3000.
>
> If your system is fast enough and has enough RAM multiply all the above
> recommendations and the default value by ten.
>
>
> The gui doesn't say "IF the average is 6k ham, 3k spam," it says that it
> IS 6k ham / 3k spam.  That's not true of my installation.  My spam, as
> I've mentioned before, has a median size of about 20kb because of
> all of the html in it.  And not-spam has a median size of 40kb.  Using
> the logic in your gui, *I believe I should set my MaxBytes to 20kb*, the
> median size of my spam corpus.
>
> But, if I set my MaxBytes to 20kb (which it appears to be able to handle
> okay, rebuilding in an hour and change), then bombs after 20kb aren't
> detected when a message is attempting delivery.
>
> Why does this matter to me?
> We're seeing messages from @gmail.com and @whatever.onmicrosoft.com
> addresses that copy legitimate-looking order receipts from vendors like
> Amazon.com, BestBuy (US-based big box electronics store), and Norton.  Many
> look identical to a legitimate message.  Ultimately, they want you to call
> them on the phone and give them your credit card number, under the guise
> that they're going to refund it.  Classic scam.
>
> These messages will always pass bayesian, they read identically to real
> messages.  BUT, I can detect some with the phone numbers that they direct
> people to.   The email addresses change frequently, but the scam phone
> numbers remain pretty constant.  I could maintain a list of known bad phone
> numbers (also available online) to capture these messages before they're
> delivered.  Simple.  If the message has one of these phone numbers, score
> it such that it'll get blocked.
>
> *The problem with many of these emails is that the phone number is way
> past the 3k mark, and past the 20k mark too.  The scammers have a bunch of
> HTML in the "confirmation" email, just like real stores tend to do.  I
> tried increasing MaxBytes up to 50kb, which easily caught messages with
> bombs later in the body, but that then seemed to cause a lot of false
> positives and obviously much longer rebuild process.  *
>
> If there could be a "continue scanning for bombs for ___kb after maxbytes"
> setting, that would let bombs later in the body be detected.  I don't know
> what the downside to having such a feature would be.
>
>
> Based on your reaction to my question, I'm obviously missing something
> important.
>
>
>
>
>
> On Thu, Nov 11, 2021 at 1:38 AM Thomas Eckardt <
> *thomas.ecka...@thockar.com* > wrote:
> >Is there logic to having a separate MaxBytes setting like
> MaxBytesForBombs that's used only during message delivery?  That way, the
> entire message can be scanned for bombs, but the rebuild could use a lower
> number to better balance the differential between the average sized spam
> and average sized not-spam message.
>
> DID YOU EVER think about that??? Or d

Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

2021-11-12 Thread Thomas Eckardt
Nov-12-21 04:00:20 RebuildSpamDB-thread rebuildspamdb-version 8.14 started 
in ASSP version 2.6.6(21314)

Nov-12-21 04:00:20 detection of local disclaimers is enabled

Nov-12-21 04:00:20 info: 'useDB4Rebuild' is NOT set to on - the rebuild 
spamdb process will possibly require a large amount of memory - but it 
will run very fast!

Nov-12-21 04:00:20 RebuildSpamDB reloaded and uses the internal FileModel 
(with 39917 entries) to speedup processing

Nov-12-21 04:00:20 RebuildSpamDB allocated 963.08 MByte of RAM to load the 
internal FileModel

Nov-12-21 04:00:20 RebuildSpamDB will create a Hidden Markov Model

Nov-12-21 04:00:20 RebuildSpamDB will include attachment-database-entries 
in to spamdb

Nov-12-21 04:00:20 RebuildSpamDB will create unicode enabled databases

Nov-12-21 04:00:20 RebuildSpamDB will process all words as Sequence of UAX 
#29 Grapheme Clusters

Nov-12-21 04:00:20 RebuildSpamDB will normalize unicode characters

Nov-12-21 04:00:20 RebuildSpamDB will use the ASSP_WordStem engine

Nov-12-21 04:00:20 ---ASSP Settings---

Nov-12-21 04:00:20 RebuildSpamDB will create private spamdb entries for 
users email addresses and each local domain.

Nov-12-21 04:00:20 Do Not Collect RedRe Messages: Enabled
**Messages matching the RedRe will be removed from the corpus!**

Nov-12-21 04:00:20 Use Subject as Maillog Names: True
Nov-12-21 04:00:20 Maxbytes: 25,000 
Nov-12-21 04:00:20 Maxfiles: 31,000 
Nov-12-21 04:00:20 RebuildFileTimeLimit: 1 5 
Nov-12-21 04:00:20 RebuildFileTimeLimit: files will be moved away from the 
corpus if their processing takes longer than 5 second(s) 

processing ~40,000 corpus files in ~4 minutes
building 15,500 spamdb.helo records in 2 seconds
building 3,200,000 spamdb records in 25 seconds
building 7,200,000 hmmdb records in 1:33 minutes

complete processing time is 6 minutes.

populating the MySQL database with the records takes a few minutes longer


So, MaxBytes := 100,000 seems to be a possible setting (but IMHO this will
not improve detection rates)

If you need to process complete mails for bombs - you'll need to write 
your own level 2 assp-plugin.

Thomas








From:    "K Post"
To:      "ASSP development mailing list"
Date:    12.11.2021 16:56
Subject: Re: [Assp-test] Concept Question: Scan entire message for
Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?




Absolutely I've thought about this.  I consider everything I post prior to 
posting.

Can you briefly explain why the ability to scan (MaxBytes + some 
additional amount)kb on incoming mails for bombs but only use MaxBytes for 
bayesian and the rebuild would be such a bad idea?

Since you questioned if I ever thought about this, here's what the thought 
process is and the reason for the request.  Maybe I didn't explain myself 
well enough in the previous messages:

The MaxBytes "documentation" says to lower it to 3000 for a mature 
installation, but 10x larger than that if you can handle it.

How many bytes of the message body will ASSP look at - the message header 
is always included in all checks. Mails stored in the collecting folders 
will be truncated to this size, if StoreCompleteMail is disabled. The 
average of Ham messages (message body) is 6K, the average of Spam messages 
is 3K. Usually the spam folder will be filled quicker than the notspam 
folder, therefore set this value to 4000 to get more wordpairs per Ham 
Message. When both folders are close to the maxfiles limit, reduce it to 
3000.
If your system is fast enough and has enough RAM multiply all the above 
recommendations and the default value by ten.

The GUI doesn't say "IF the average is 6k ham, 3k spam," it says that it
IS 6k ham / 3k spam.  That's not true of my installation.  My spam, as
I've mentioned before, has a median size of about 20kb because of all of
the HTML in it.  And not-spam has a median size of 40kb.  Using
the logic in your GUI, I believe I should set my MaxBytes to 20kb, the
median size of my spam corpus.

But, if I set my MaxBytes to 20kb (which it appears to be able to handle 
okay, rebuilding in an hour and change), then bombs after 20kb aren't 
detected when a message is attempting delivery.  

Why does this matter to me?
We're seeing messages from @gmail.com and @whatever.onmicrosoft.com
addresses that copy legitimate-looking order receipts from vendors
like Amazon.com, BestBuy (US-based big box electronics store), and
Norton.  Many look identical to a legitimate message.  Ultimately, they
want you to call them on the phone and give them your credit card number,
under the guise that they're going to refund it.  Classic scam.

These messages will always pass bayesian, they read identically to real 
messages.  BUT, I can detect some with the phone numbers that they direct 
people to.   The email addresses change frequently, but the scam phone 
numbers remain pretty constant.  I could maintain a list of known bad 
phone nu

Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

2021-11-12 Thread K Post
Absolutely I've thought about this.  I consider everything I post prior to
posting.

Can you briefly explain why the ability to scan (MaxBytes + some additional
amount)kb on incoming mails for bombs but only use MaxBytes for bayesian
and the rebuild would be such a bad idea?

Since you questioned if I ever thought about this, here's what the thought
process is and the reason for the request.  Maybe I didn't explain myself
well enough in the previous messages:

The MaxBytes "documentation" says to lower it to 3000 for a mature
installation, but 10x larger than that if you can handle it.

How many bytes of the message body will ASSP look at - the message header
is always included in all checks. Mails stored in the collecting folders
will be truncated to this size, if StoreCompleteMail is disabled. *The
average of Ham messages (message body) is 6K, the average of Spam messages
is 3K.* Usually the spam folder will be filled quicker than the notspam
folder, therefore set this value to 4000 to get more wordpairs per Ham
Message. When both folders are close to the maxfiles limit, reduce it to
3000.
If your system is fast enough and has enough RAM multiply all the above
recommendations and the default value by ten.


The GUI doesn't say "IF the average is 6k ham, 3k spam," it says that it IS
6k ham / 3k spam.  That's not true of my installation.  My spam, as
I've mentioned before, has a median size of about 20kb because of all of
the HTML in it.  And not-spam has a median size of 40kb.  Using
the logic in your GUI, *I believe I should set my MaxBytes to 20kb*, the
median size of my spam corpus.

But, if I set my MaxBytes to 20kb (which it appears to be able to handle
okay, rebuilding in an hour and change), then bombs after 20kb aren't
detected when a message is attempting delivery.

Why does this matter to me?
We're seeing messages from @gmail.com and @whatever.onmicrosoft.com
addresses that copy legitimate-looking order receipts from vendors
like Amazon.com, BestBuy (US-based big box electronics store), and Norton.
Many look identical to a legitimate message.  Ultimately, they want you to
call them on the phone and give them your credit card number, under the
guise that they're going to refund it.  Classic scam.

These messages will always pass bayesian, they read identically to real
messages.  BUT, I can detect some with the phone numbers that they direct
people to.   The email addresses change frequently, but the scam phone
numbers remain pretty constant.  I could maintain a list of known bad phone
numbers (also available online) to capture these messages before they're
delivered.  Simple.  If the message has one of these phone numbers, score
it such that it'll get blocked.
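
As a rough illustration of the phone-number idea (this is not ASSP code, and it is not how BombDataRE is implemented), the check could be sketched in Python as below. The list contents, `build_pattern` and `find_bad_numbers` are hypothetical names invented for this sketch, and the numbers are made-up placeholders. The pattern tolerates spaces, dots, dashes and parentheses between digits, since scammers vary the formatting.

```python
import re

# Hypothetical blocklist of known scam numbers (placeholder values only).
BAD_NUMBERS = ["+1 (800) 555-0100", "800-555-0199"]


def digits_only(text):
    """Strip everything except the digits from a string."""
    return re.sub(r"\D", "", text)


def build_pattern(numbers):
    """Build one regex matching any listed number with optional separators."""
    alternatives = []
    for number in numbers:
        digits = digits_only(number)
        # Drop a leading US country code so "1-800..." and "800..." both match.
        if len(digits) == 11 and digits.startswith("1"):
            digits = digits[1:]
        # Allow whitespace, parentheses, dots and dashes between the digits.
        alternatives.append(r"[\s().\-]*".join(digits))
    # The lookarounds keep the match from starting or ending inside a longer digit run.
    return re.compile(r"(?<!\d)(?:" + "|".join(alternatives) + r")(?!\d)")


def find_bad_numbers(body, pattern):
    """Return the normalized digits of every blocklisted number found in body."""
    return [digits_only(m.group(0)) for m in pattern.finditer(body)]
```

Run against the complete body, a number sitting after 30kb of HTML is found; run against only the first 3000 characters, it is not, which is exactly the gap described above.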

*The problem with many of these emails is that the phone number is way past
the 3k mark, and past the 20k mark too.  The scammers have a bunch of HTML
in the "confirmation" email, just like real stores tend to do.  I tried
increasing MaxBytes up to 50kb, which easily caught messages with bombs
later in the body, but that then seemed to cause a lot of false positives
and obviously much longer rebuild process.  *

If there could be a "continue scanning for bombs for ___kb after maxbytes"
setting, that would let bombs later in the body be detected.  I don't know
what the downside to having such a feature would be.


Based on your reaction to my question, I'm obviously missing something
important.





On Thu, Nov 11, 2021 at 1:38 AM Thomas Eckardt 
wrote:

> >Is there logic to having a separate MaxBytes setting like
> MaxBytesForBombs that's used only during message delivery?  That way, the
> entire message can be scanned for bombs, but the rebuild could use a lower
> number to better balance the differential between the average sized spam
> and average sized not-spam message.
>
> DID YOU EVER think about that??? Or do you only write
> something to fill up the community mailing list?
>
> No - no way!
>
> Thomas
>
>
>
>
>
>
>
> From:    "K Post"
> To:      "ASSP development mailing list" <
> assp-test@lists.sourceforge.net>
> Date:    10.11.2021 20:22
> Subject: Re: [Assp-test] Concept Question: Scan entire message for
> Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?
> --
>
>
>
> After about 12 weeks of going from MaxBytes of 4k to MaxBytes of 50k, I've
> seen:
> 1) Rebuild go from just over an hour (with 30k MaxFiles) to just over 2
> hours.  I'm fine with that, there's more to scan
> 2) Bomb detections improve, as a lot of what's detected is beyond the 20k
> or 30k mark
> 3) but, bayesian false positives going way up.  Lots of mail that would
> have (correctly) been delivered, is now getting too high of a score and is
> blocked.
>
> Surely #3 is specific to the

Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

2021-11-10 Thread Thomas Eckardt
>Is there logic to having a separate MaxBytes setting like 
MaxBytesForBombs that's used only during message delivery?  That way, the 
entire message can be scanned for bombs, but the rebuild could use a lower 
number to better balance the differential between the average sized spam 
and average sized not-spam message.

DID YOU EVER think about that??? Or do you only write
something to fill up the community mailing list?

No - no way!

Thomas







From:    "K Post"
To:      "ASSP development mailing list"
Date:    10.11.2021 20:22
Subject: Re: [Assp-test] Concept Question: Scan entire message for
Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?



After about 12 weeks of going from MaxBytes of 4k to MaxBytes of 50k, I've
seen:
1) Rebuild go from just over an hour (with 30k MaxFiles) to just over 2 
hours.  I'm fine with that, there's more to scan
2) Bomb detections improve, as a lot of what's detected is beyond the 20k 
or 30k mark
3) but, bayesian false positives going way up.  Lots of mail that would 
have (correctly) been delivered, is now getting too high of a score and is 
blocked.

Surely #3 is specific to the types of messages my users are getting and I 
can tweak settings.  BUT, it makes me raise this question again:
Is there logic to having a separate MaxBytes setting like MaxBytesForBombs 
that's used only during message delivery?  That way, the entire message 
can be scanned for bombs, but the rebuild could use a lower number to 
better balance the differential between the average sized spam and average 
sized not-spam message.



On Mon, Nov 1, 2021 at 2:43 PM K Post  wrote:
When looking at the "Use this HTML Parser" section on the GUI, I found 
this line:
it is recommended to set MaxBytes to 5 (be carefull on heavy load 
systems - spam bomb regular expressions will take longer using 5!).\
I'm going to change my settings and see how bad the rebuild time is.  I've 
got enough processing power and RAM now, but the disks aren't SSD.  Just a 
4 disk Raid 1+0 traditional HDD setup.  We'll see...

Since HTML email accounts for a big percentage of all mail, might it be a
good idea to update/expand the guidance in the MaxBytes section of the 
GUI?   



On Fri, Oct 29, 2021 at 8:40 PM K Post  wrote:
Summary:
Should/could any consideration be given to having ASSP scan the entire 
message at the time it is received for Bombs (only), while still using 
MaxBytes for Bayesian/HMM?

We've been having some cleverly crafted messages slipping through all 
filters that would be easy to catch with Bombs if only the catchable 
content came before MaxBytes.  These messages are 20kb+, They have a scam 
phone number at the very end of the larger than MaxBytes messages.  I 
want/need to use bombs to catch the scam phone numbers.

With MaxBytes set to 3000, which is useful for faster RebuildSpamDB, these 
BombDataRE matches just aren't being caught.  If I increase MaxBytes, my 
BombDataRE catches them, but then rebuildspamdb is (probably? see below) 
longer than it needs to be.

So, is there any value in considering a MaxBytesAdditionalForBombs 
variable which would be added to MaxBytes and only used when scanning for 
bombs as messages arrive?   Would that kill performance??  Other 
downsides?

We could still only look at MaxBytes for Bayesian/HMM since it's only 
MaxBytes used when building those databases.

What do you think?

And while we're talking MaxBytes:
I've asked this before, is the guidance for 3kb for MaxBytes once there's 
a mature corpus still a valid recommendation?  With unlimited horsepower 
and ram, sure, why not, do 30kb or 100kb.  That's not my reality, so I 
want to see where to best allocate resources. If 3kb is still the 
guidance, even though the spam files I'm seeing have a median size around 
20kb, so be it.  I feel like when that guidance was written, html wasn't 
used as prolifically in spam.  The median size of notspam in my corpus is 
about 40kb.  That's determined unscientifically by sorting by size and 
scrolling to approximately half way down.

Thanks.  Have a good weekend.
Ken
___
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test






Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

2021-11-10 Thread K Post
After about 12 weeks of going from MaxBytes of 4k to MaxBytes of 50k, I've
seen:
1) Rebuild go from just over an hour (with 30k MaxFiles) to just over 2
hours.  I'm fine with that, there's more to scan
2) Bomb detections improve, as a lot of what's detected is beyond the 20k
or 30k mark
3) but, bayesian false positives going way up.  Lots of mail that would
have (correctly) been delivered, is now getting too high of a score and is
blocked.

Surely #3 is specific to the types of messages my users are getting and I
can tweak settings.  BUT, it makes me raise this question again:
Is there logic to having a separate MaxBytes setting like MaxBytesForBombs
that's used only during message delivery?  That way, the entire message can
be scanned for bombs, but the rebuild could use a lower number to better
balance the differential between the average sized spam and average sized
not-spam message.



On Mon, Nov 1, 2021 at 2:43 PM K Post  wrote:

> When looking at the "Use this HTML Parser" section on the GUI, I found
> this line:
>
> it is recommended to set MaxBytes to 5 (be carefull on heavy load
> systems - spam bomb regular expressions will take longer using 5!).\
>
> I'm going to change my settings and see how bad the rebuild time is.  I've
> got enough processing power and RAM now, but the disks aren't SSD.  Just a
> 4 disk Raid 1+0 traditional HDD setup.  We'll see...
>
> Since HTML email accounts for a big percentage of all mail, might it be a
> good idea to update/expand the guidance in the MaxBytes section of the
> GUI?
>
>
>
> On Fri, Oct 29, 2021 at 8:40 PM K Post  wrote:
>
>> Summary:
>> *Should/could any consideration be given to having ASSP scan the entire
>> message at the time it is received for Bombs (only), while still using
>> MaxBytes for Bayesian/HMM?*
>>
>> We've been having some cleverly crafted messages slipping through all
>> filters that would be easy to catch with Bombs if only the catchable
>> content came before MaxBytes.  These messages are 20kb+, They have a scam
>> phone number at the very end of the larger than MaxBytes messages.  I
>> want/need to use bombs to catch the scam phone numbers.
>>
>> With MaxBytes set to 3000, which is useful for faster RebuildSpamDB,
>> these BombDataRE matches just aren't being caught.  If I increase MaxBytes,
>> my BombDataRE catches them, but then rebuildspamdb is (probably? see below)
>> longer than it needs to be.
>>
>> So, is there any value in considering a* MaxBytesAdditionalForBombs *variable
>> which would be *added to MaxBytes *and only used when scanning for bombs
>> as messages arrive?   Would that kill performance??  Other downsides?
>>
>> We could still only look at MaxBytes for Bayesian/HMM since it's only
>> MaxBytes used when building those databases.
>>
>> What do you think?
>>
>> And while we're talking MaxBytes:
>> I've asked this before, is the guidance for 3kb for MaxBytes once there's
>> a mature corpus still a valid recommendation?  With unlimited horsepower
>> and ram, sure, why not, do 30kb or 100kb.  That's not my reality, so I want
>> to see where to best allocate resources. If 3kb is still the guidance, even
>> though the spam files I'm seeing have a median size around 20kb, so be it.
>> I feel like when that guidance was written, html wasn't used as
>> prolifically in spam.  The median size of notspam in my corpus is about
>> 40kb.  That's determined unscientifically by sorting by size and scrolling
>> to approximately half way down.
>>
>> Thanks.  Have a good weekend.
>> Ken
>>
>>

Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

2021-11-01 Thread K Post
When looking at the "Use this HTML Parser" section on the GUI, I found this
line:

it is recommended to set MaxBytes to 5 (be carefull on heavy load
systems - spam bomb regular expressions will take longer using 5!).\

I'm going to change my settings and see how bad the rebuild time is.  I've
got enough processing power and RAM now, but the disks aren't SSD.  Just a
4 disk Raid 1+0 traditional HDD setup.  We'll see...

Since HTML email accounts for a big percentage of all mail, might it be a
good idea to update/expand the guidance in the MaxBytes section of the
GUI?



On Fri, Oct 29, 2021 at 8:40 PM K Post  wrote:

> Summary:
> *Should/could any consideration be given to having ASSP scan the entire
> message at the time it is received for Bombs (only), while still using
> MaxBytes for Bayesian/HMM?*
>
> We've been having some cleverly crafted messages slipping through all
> filters that would be easy to catch with Bombs if only the catchable
> content came before MaxBytes.  These messages are 20kb+, They have a scam
> phone number at the very end of the larger than MaxBytes messages.  I
> want/need to use bombs to catch the scam phone numbers.
>
> With MaxBytes set to 3000, which is useful for faster RebuildSpamDB, these
> BombDataRE matches just aren't being caught.  If I increase MaxBytes, my
> BombDataRE catches them, but then rebuildspamdb is (probably? see below)
> longer than it needs to be.
>
> So, is there any value in considering a* MaxBytesAdditionalForBombs *variable
> which would be *added to MaxBytes *and only used when scanning for bombs
> as messages arrive?   Would that kill performance??  Other downsides?
>
> We could still only look at MaxBytes for Bayesian/HMM since it's only
> MaxBytes used when building those databases.
>
> What do you think?
>
> And while we're talking MaxBytes:
> I've asked this before, is the guidance for 3kb for MaxBytes once there's
> a mature corpus still a valid recommendation?  With unlimited horsepower
> and ram, sure, why not, do 30kb or 100kb.  That's not my reality, so I want
> to see where to best allocate resources. If 3kb is still the guidance, even
> though the spam files I'm seeing have a median size around 20kb, so be it.
> I feel like when that guidance was written, html wasn't used as
> prolifically in spam.  The median size of notspam in my corpus is about
> 40kb.  That's determined unscientifically by sorting by size and scrolling
> to approximately half way down.
>
> Thanks.  Have a good weekend.
> Ken
>
>

[Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

2021-10-29 Thread K Post
Summary:
*Should/could any consideration be given to having ASSP scan the entire
message at the time it is received for Bombs (only), while still using
MaxBytes for Bayesian/HMM?*

We've been having some cleverly crafted messages slipping through all
filters that would be easy to catch with Bombs if only the catchable
content came before MaxBytes.  These messages are 20kb+, and they have a scam
phone number at the very end, well past MaxBytes.  I
want/need to use bombs to catch the scam phone numbers.

With MaxBytes set to 3000, which is useful for faster RebuildSpamDB, these
BombDataRE matches just aren't being caught.  If I increase MaxBytes, my
BombDataRE catches them, but then rebuildspamdb is (probably? see below)
longer than it needs to be.

So, is there any value in considering a* MaxBytesAdditionalForBombs *variable
which would be *added to MaxBytes *and only used when scanning for bombs as
messages arrive?   Would that kill performance??  Other downsides?

We could still only look at MaxBytes for Bayesian/HMM since it's only
MaxBytes used when building those databases.
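
To make the proposal concrete, here is a minimal Python sketch of the two-limit idea. It is not ASSP code and says nothing about how ASSP applies MaxBytes internally. `EXTRA_BYTES_FOR_BOMBS` stands in for the proposed MaxBytesAdditionalForBombs setting, and `split_views` and `check_message` are hypothetical names used only for illustration.

```python
import re

MAX_BYTES = 3000               # existing limit, value taken from the message above
EXTRA_BYTES_FOR_BOMBS = 47000  # hypothetical MaxBytesAdditionalForBombs value


def split_views(body, max_bytes=MAX_BYTES, extra=EXTRA_BYTES_FOR_BOMBS):
    """Return (bayes_view, bomb_view): two prefixes of the same message body."""
    bayes_view = body[:max_bytes]         # what Bayesian/HMM would see
    bomb_view = body[:max_bytes + extra]  # longer prefix, used for bombs only
    return bayes_view, bomb_view


def check_message(body, bomb_re):
    """Report how much each check looked at and whether the bomb regex hit."""
    bayes_view, bomb_view = split_views(body)
    return {
        "bayes_bytes": len(bayes_view),
        "bomb_bytes": len(bomb_view),
        "bomb_hit": bomb_re.search(bomb_view) is not None,
    }
```

In this sketch the Bayesian view never grows, so the rebuild and scoring would be unaffected; only the regex scan pays for the extra bytes, which is where any performance cost of such a setting would show up.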

What do you think?

And while we're talking MaxBytes:
I've asked this before, is the guidance for 3kb for MaxBytes once there's a
mature corpus still a valid recommendation?  With unlimited horsepower and
ram, sure, why not, do 30kb or 100kb.  That's not my reality, so I want to
see where to best allocate resources. If 3kb is still the guidance, even
though the spam files I'm seeing have a median size around 20kb, so be it.
I feel like when that guidance was written, html wasn't used as
prolifically in spam.  The median size of notspam in my corpus is about
40kb.  That's determined unscientifically by sorting by size and scrolling
to approximately half way down.
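
A less manual way to get the same median would be a short script along these lines. This is only a sketch: `median_file_size` is a hypothetical helper, and the folder path passed to it is whatever directory holds the collected spam or notspam files on a given installation.

```python
import statistics
from pathlib import Path


def median_file_size(folder):
    """Return the median size in bytes of the regular files directly inside folder."""
    sizes = [p.stat().st_size for p in Path(folder).iterdir() if p.is_file()]
    if not sizes:
        raise ValueError(f"no files found in {folder}")
    return statistics.median(sizes)
```

Calling it once on the spam folder and once on the notspam folder gives the two numbers being compared above. Note that this measures whole files, headers included, so it slightly overstates the body size that MaxBytes applies to.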

Thanks.  Have a good weekend.
Ken