On 2020-12-09 9:48 am, deano-spamassas...@areyes.com wrote:
> On 2020-12-09 4:41 am, @lbutlr wrote:
> > On 08 Dec 2020, at 13:54, micah anderson <mi...@riseup.net> wrote:
> > > Kris Deugau <kdeu...@vianet.ca> writes:
> > > > There will only be one database and set of tables, but one of the
> > > > fields in each table is the user identifier.
> > > >
> > > > Fair warning - if you go full per-user on a large system, this will
> > > > MASSIVELY balloon the size of your Bayes database, and most users
> > > > will idle below the learning thresholds for quite a long time.
> > >
> > > Can you give an idea of the size calculation? I'm wanting to do this,
> > > but I need to figure out how much space I need to allocate per user!
> >
> > That would be pretty hard to predict as it would vary a lot based on the
> > users and the mail. I don't think Bayes is really that big (a few MB max?)
>
> It's not big. Here's my personal spamassassin database (just a few users,
> but SA has been running for years and years ... About 48MB
>
> mysql> SELECT TABLE_NAME AS `Table`,
>     ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024) AS `Size (KB)`
>     FROM information_schema.TABLES
>     WHERE TABLE_SCHEMA = "spamassassin"
>     ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;
> +-------------------+-----------+
> | Table             | Size (KB) |
> +-------------------+-----------+
> | bayes_token       |     48160 |
> | awl               |      1040 |
> | bayes_vars        |        32 |
> | bayes_seen        |        16 |
> | bayes_global_vars |        16 |
> | bayes_expire      |        16 |
> +-------------------+-----------+
> 6 rows in set (0.00 sec)

I did it again on a test server - same corpus, latest SA etc. It's been trained on ham/spam.
> MariaDB [spamassassin]> SELECT TABLE_NAME AS `Table`,
>     ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS `Size (MB)`
>     FROM information_schema.TABLES
>     WHERE TABLE_SCHEMA = "spamassassin"
>     ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;
> +-------------------+-----------+
> | Table             | Size (MB) |
> +-------------------+-----------+
> | bayes_token       |       118 |
> | txrep             |        17 |
> | bayes_seen        |         3 |
> | bayes_vars        |         0 |
> | awl               |         0 |
> | bayes_expire      |         0 |
> | bayes_global_vars |         0 |
> +-------------------+-----------+
> 7 rows in set (0.001 sec)

So a bit bigger.
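
For the "how much space per user" question, a back-of-the-envelope sketch in Python might help. It converts the 48160 KB figure to MB so it can be compared with the 118 MB result, and gives a rough ceiling for a per-user setup. Everything in the estimate is an assumption, not something measured in this thread: `estimate_total_mb` is a made-up helper, 150000 tokens per user is what I believe is the documented default for `bayes_expiry_max_db_size` (check your own config), and 50 bytes per token row is a pure placeholder that you should replace with a number measured on your own database.

```python
# Sizes of the bayes_token table as reported in the two queries.
FIRST_BAYES_TOKEN_KB = 48160   # first query, reported in KB
SECOND_BAYES_TOKEN_MB = 118    # second query, already rounded to MB


def kb_to_mb(kb: float) -> float:
    """Convert KB to MB using the same 1024 divisor as the SQL queries."""
    return kb / 1024


def estimate_total_mb(users: int, tokens_per_user: int,
                      bytes_per_token: float) -> float:
    """Hypothetical ceiling: users * tokens * bytes per row, in MB.

    All three inputs are assumptions supplied by the caller; index
    overhead and the other Bayes tables are not modelled separately.
    """
    return users * tokens_per_user * bytes_per_token / (1024 * 1024)


if __name__ == "__main__":
    first_mb = kb_to_mb(FIRST_BAYES_TOKEN_KB)
    print(f"first server bayes_token: {first_mb:.1f} MB")
    print(f"test server is {SECOND_BAYES_TOKEN_MB / first_mb:.1f}x larger")

    # ASSUMPTIONS: 1000 users, 150000 tokens each, 50 bytes per token row.
    ceiling = estimate_total_mb(users=1000, tokens_per_user=150_000,
                                bytes_per_token=50)
    print(f"rough ceiling for 1000 users: {ceiling:.0f} MB")
```

Since most users will sit well below the token limit (and many below the learning thresholds, as noted earlier in the thread), the real figure should come in under this ceiling. Measuring the average row size on an existing install would give a far better `bytes_per_token` than the placeholder.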