On 2020-12-09 9:48 am, deano-spamassas...@areyes.com wrote: 

> On 2020-12-09 4:41 am, @lbutlr wrote: 
> 
> On 08 Dec 2020, at 13:54, micah anderson <mi...@riseup.net> wrote:
> Kris Deugau <kdeu...@vianet.ca> writes:
>
> There will only be one database and set of tables, but one of the fields in
> each table is the user identifier. Fair warning - if you go full per-user on
> a large system, this will MASSIVELY balloon the size of your Bayes database,
> and most users will idle below the learning thresholds for quite a long time.

> Can you give an idea of the size calculation? I'm wanting to do this, but I 
> need to figure out how much space I need to allocate per user!

> That would be pretty hard to predict as it would vary a lot based on the
> users and the mail.
>
> I don't think Bayes is really that big (a few MB max?)

It's not big. Here's my personal spamassassin database (just a few
users, but SA has been running for years and years). About 48MB:

> mysql> SELECT TABLE_NAME AS `Table`, ROUND((DATA_LENGTH + INDEX_LENGTH) / 
> 1024 ) AS `Size (KB)` FROM information_schema.TABLES WHERE TABLE_SCHEMA = 
> "spamassassin" ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;
> +-------------------+-----------+
> | Table             | Size (KB) |
> +-------------------+-----------+
> | bayes_token       |     48160 |
> | awl               |      1040 |
> | bayes_vars        |        32 |
> | bayes_seen        |        16 |
> | bayes_global_vars |        16 |
> | bayes_expire      |        16 |
> +-------------------+-----------+
> 6 rows in set (0.00 sec)

I did it again on a test server - same corpus, latest SA etc. It's been
trained on ham/spam. 

> MariaDB [spamassassin]> SELECT TABLE_NAME AS `Table`, ROUND((DATA_LENGTH + 
> INDEX_LENGTH) / 1024 / 1024 ) AS `Size (MB)` FROM information_schema.TABLES 
> WHERE TABLE_SCHEMA = "spamassassin" ORDER BY (DATA_LENGTH + INDEX_LENGTH) 
> DESC;
> +-------------------+-----------+
> | Table             | Size (MB) |
> +-------------------+-----------+
> | bayes_token       |       118 |
> | txrep             |        17 |
> | bayes_seen        |         3 |
> | bayes_vars        |         0 |
> | awl               |         0 |
> | bayes_expire      |         0 |
> | bayes_global_vars |         0 |
> +-------------------+-----------+
> 7 rows in set (0.001 sec)

So a bit bigger: bayes_token is 118 MB here versus about 48 MB on my
personal server.
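
For the per-user sizing question, a rough sketch rather than a definitive
recipe: assuming the stock SpamAssassin MySQL Bayes schema, where bayes_vars
carries username and token_count columns (worth checking against your own
install), something like the following lists learned tokens per user and an
approximate bytes-per-token figure. TABLE_ROWS is only an estimate on InnoDB,
so treat the second result as a ballpark.

```sql
-- Tokens learned per user, largest first.
-- Assumes the stock bayes_vars columns
-- (username, token_count, spam_count, ham_count).
SELECT username, token_count, spam_count, ham_count
FROM bayes_vars
ORDER BY token_count DESC;

-- Approximate on-disk bytes per token (data + indexes).
-- TABLE_ROWS is an estimate for InnoDB, so this is a ballpark only.
SELECT ROUND((DATA_LENGTH + INDEX_LENGTH) / NULLIF(TABLE_ROWS, 0))
         AS approx_bytes_per_token
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'spamassassin'
  AND TABLE_NAME = 'bayes_token';
```

Multiplying a typical user's token_count by approx_bytes_per_token gives a
rough per-user figure to plan around.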
