[ 
https://issues.apache.org/jira/browse/JAMES-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Tellier updated JAMES-3937:
----------------------------------
    Description: 
h3. Why?

The CassandraThreadIdGuessingAlgorithm tables occupy a non-negligible amount of 
space.

Out of a 20 GB database on one of my production platforms:

{code:java}
                Table: threadlookuptable
                SSTable count: 4
                Space used (total): 360 263 739

                Table: threadtable
                SSTable count: 8
                Space used (total): 1 050 590 715
{code}

Together that is about 1.4 GB, roughly 7% of the database, which is 
non-negligible.

The goal here is to reduce the database space used by thread 
allocation.

h4. Other concerns

Storing raw subjects linked to usernames is likely problematic in terms of 
privacy.

h3. How?

The thread guessing algorithm does not need raw values to operate; it can work 
with hashes, as demonstrated by https://issues.apache.org/jira/browse/JAMES-3937.
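
As a minimal sketch of the idea, assuming the values to hash are the Message-ID style headers and the base subject: the class below hashes each raw value and compares only hashes. All names are hypothetical and do not come from the James code base, and FNV-1a 64-bit is picked only because it is short to write out with the JDK alone; the issue does not prescribe a specific hash function.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: class and method names are illustrative, not James APIs.
final class HashedThreadKeySketch {
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    // FNV-1a 64-bit, a simple non-cryptographic hash (assumption: any
    // well-distributed non-cryptographic hash would serve the same purpose).
    static long fnv1a64(String value) {
        long hash = FNV_OFFSET_BASIS;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xffL);
            hash *= FNV_PRIME;
        }
        return hash;
    }

    // Replaces every raw value (Message-ID, base subject, ...) by its hash.
    static List<Long> hashAll(List<String> rawValues) {
        return rawValues.stream()
            .map(HashedThreadKeySketch::fnv1a64)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hashes that would be stored for an already threaded message
        // (the Message-ID values here are made up).
        Set<Long> stored = new HashSet<>(hashAll(Arrays.asList("<parent@example.org>")));

        // A reply arrives: hash its referenced IDs and look for an overlap.
        List<Long> incoming = hashAll(Arrays.asList("<parent@example.org>", "<other@example.org>"));
        boolean sameThread = incoming.stream().anyMatch(stored::contains);

        System.out.println("same thread: " + sameThread);
    }
}
```

Only fixed-size hashes would be stored and compared, so the raw subject and header values never need to be written to the table.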

h3. Impact?

As threads are partitioned by user, the risk of collision is extremely low, and 
false positives would only result in incorrect thread grouping, making this use 
case insensitive to hash collisions. Use of non-cryptographic hash methods 
is thus acceptable.
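
To put a rough number on "extremely low", the sketch below applies the usual birthday-bound approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)), to a single user's partition. The number of hashed values per user and the hash widths are assumptions chosen for illustration; the issue does not state either, and the class name is hypothetical.

```java
// Hypothetical sketch: estimates the chance that at least two values
// collide among the hashed values stored for one user.
final class CollisionEstimateSketch {

    // Birthday-bound approximation: p ~ 1 - exp(-n(n-1) / 2^(bits+1)).
    static double collisionProbability(long valuesPerUser, int hashBits) {
        double n = (double) valuesPerUser;
        double space = Math.pow(2.0, hashBits);
        // -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
        return -Math.expm1(-n * (n - 1) / (2.0 * space));
    }

    public static void main(String[] args) {
        long valuesPerUser = 1_000_000L; // assumed, deliberately pessimistic
        for (int bits : new int[] {32, 64}) {
            System.out.printf("%d-bit hash, %d values: p = %.3e%n",
                bits, valuesPerUser, collisionProbability(valuesPerUser, bits));
        }
    }
}
```

Under these assumptions a 64-bit hash keeps the per-user collision probability negligible even for a very large mailbox, whereas a 32-bit hash would not, so the hash width matters more than whether the function is cryptographic.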

We expect a significant space reduction.

*Migration*: We will just create a new table and drop the old one. This will 
cause a discontinuity in thread allocation: a conversation that spans the 
migration will be split into two threads instead of one. This seems acceptable 
and preferable to a complex migration in our eyes.

 


  was:
h3. Why?

The CassandraThreadIdGuessingAlgorithm tables occupy a non-negligible amount of 
space.

Out of a 20 GB database on one of my production platforms:

{code:java}
                Table: threadlookuptable
                SSTable count: 4
                Space used (total): 360 263 739

                Table: threadtable
                SSTable count: 8
                Space used (total): 1 050 590 715
{code}

Together that is about 1.4 GB, roughly 7% of the database, which is 
non-negligible.

The goal here is to reduce the database space used by thread 
allocation.

h4. Other concerns

Storing raw subjects linked to usernames is likely problematic in terms of 
privacy.

h3. How?

The thread guessing algorithm does not need raw values to operate; it can work 
with hashes, as demonstrated by [insert poc PR link here].

h3. Impact?

As threads are partitioned by user, the risk of collision is extremely low, and 
false positives would only result in incorrect thread grouping, making this use 
case insensitive to hash collisions. Use of non-cryptographic hash methods 
is thus acceptable.

We expect a significant space reduction.

*Migration*: We will just create a new table and drop the old one. This will 
cause a discontinuity in thread allocation: a conversation that spans the 
migration will be split into two threads instead of one. This seems acceptable 
and preferable to a complex migration in our eyes.

 



> Improve CassandraThreadIdGuessingAlgorithm
> ------------------------------------------
>
>                 Key: JAMES-3937
>                 URL: https://issues.apache.org/jira/browse/JAMES-3937
>             Project: James Server
>          Issue Type: Improvement
>          Components: cassandra, mailbox
>    Affects Versions: 3.8.0
>            Reporter: Benoit Tellier
>            Priority: Major
>             Fix For: 3.9.0
>


