Benoit Tellier created JAMES-3937:
-------------------------------------
Summary: Improve CassandraThreadIdGuessingAlgorithm
Key: JAMES-3937
URL: https://issues.apache.org/jira/browse/JAMES-3937
Project: James Server
Issue Type: Improvement
Components: cassandra, mailbox
Affects Versions: 3.8.0
Reporter: Benoit Tellier
Fix For: 3.9.0
h3. Why?
The CassandraThreadIdGuessingAlgorithm tables occupy a non-negligible amount of
space.
Out of a 20 GB database on one of my production platforms:
{code:java}
Table: threadlookuptable
SSTable count: 4
Space used (total): 360 263 739
Table: threadtable
SSTable count: 8
Space used (total): 1 050 590 715
{code}
Together that is roughly 1.4 GB, or about 7% of the database, which is
non-negligible.
The goal here is to reduce the database space used by thread allocation.
h4. Other concerns
Storing raw subjects linked to usernames is likely problematic in terms of
privacy.
h3. How?
The thread guessing algorithm does not need raw values to operate: it works
just as well with hashes, as demonstrated by [insert poc PR link here].
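As an illustration only (this is not the code of the PoC), a minimal sketch of matching threads on hashes rather than raw values could look like the following. The class and method names are hypothetical, FNV-1a is just one example of a non-cryptographic hash from the standard toolbox, and the matching rule (one shared message id plus an equal base subject) is an assumption about how the guessing works.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: store and compare hashes instead of raw message ids and subjects.
class ThreadHashSketch {
    // Standard FNV-1a 64-bit parameters (a non-cryptographic hash).
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    // Hash a raw value (message id or base subject) to 64 bits.
    static long fnv1a64(String value) {
        long hash = FNV_OFFSET_BASIS;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= FNV_PRIME;
        }
        return hash;
    }

    // Hash every message id (Message-ID, In-Reply-To, References) of a message.
    static Set<Long> hashAll(Set<String> mimeMessageIds) {
        Set<Long> hashes = new HashSet<>();
        for (String id : mimeMessageIds) {
            hashes.add(fnv1a64(id));
        }
        return hashes;
    }

    // Assumed rule: same thread when the base subject hashes are equal
    // and at least one message id hash is shared.
    static boolean sameThread(Set<Long> storedIdHashes, long storedSubjectHash,
                              Set<Long> incomingIdHashes, long incomingSubjectHash) {
        return storedSubjectHash == incomingSubjectHash
            && !Collections.disjoint(storedIdHashes, incomingIdHashes);
    }
}
```

Only the fixed-size hashes would need to be persisted per user, so neither the subject nor the message ids appear in clear text in the table.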
h3. Impact?
As threads are partitioned by user, the risk of collision is extremely low, and
false positives might only result in incorrect thread grouping, making this use
case insensitive to hash collisions. Use of non-cryptographic hash methods
is thus acceptable.
We expect a significant space reduction.
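To put a rough number on the collision risk discussed above, the usual birthday approximation can be evaluated for a single user partition. This is a sketch only: the hash width and the number of hashed values per user are not stated in this ticket, so both are free parameters here, and the class name is hypothetical.

```java
// Hypothetical helper: birthday approximation for one user's partition.
class CollisionEstimate {
    // Approximate probability that at least two of n uniformly distributed
    // hashes of the given bit width collide: 1 - exp(-n(n-1) / (2 * 2^bits)).
    static double collisionProbability(long n, int bits) {
        double space = Math.pow(2.0, bits);
        double exponent = -(double) n * (n - 1) / (2.0 * space);
        // expm1 keeps precision when the probability is very small.
        return -Math.expm1(exponent);
    }
}
```

The estimate grows with the number of values stored per user and shrinks with the hash width, which is the trade-off between space saved and the chance of an occasional wrong grouping.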
*Migration*: We will just create a new table and drop the old one. This will
cause a discontinuity in thread allocation: a conversation spanning the switch
will be split into two threads instead of one. This seems acceptable and
preferable to a complex migration in our eyes.