[ 
https://issues.apache.org/jira/browse/JAMES-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

René Cordier updated JAMES-3586:
--------------------------------
    Description: 
h2. Context

Some users store all message content in Cassandra and thus store a huge 
amount of data.

We would like to reduce the performance cost of reading this large amount of 
data.

Blobs being immutable, we have a guarantee that:
 * If we read something, the value is up to date
 * If we fail to read something, we are guaranteed that the content had not 
been replicated yet. A second read with a higher consistency level will read 
the data (and the read repair piggybacked on that consistency level will heal 
the data)

Cassandra being very efficient at replicating things (think hinted handoff, 
direct asynchronous replication), we can expect that data is correctly 
replicated before reads are attempted.
h2. Decision

Via a configuration option, allow optimizing blob access.
 * If enabled, perform a first read at CL ONE and fall back, if needed, to a 
second read at the regular CL
 * If disabled, only a read at the regular CL will be attempted

A metric should be implemented to track the CL ONE hit rate, so that the 
effectiveness of this solution can be reviewed.
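
The read path and the hit-rate metric described above can be sketched as follows. This is a minimal illustration, not the James implementation: {{OptimisticBlobReader}}, {{BlobReader}}, the two-value {{ConsistencyLevel}} enum and the in-memory counters are all hypothetical names, and the Cassandra read is abstracted behind an interface.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these names come from the James code base.
class OptimisticBlobReader {
    enum ConsistencyLevel { ONE, QUORUM }

    // Stand-in for the actual Cassandra read: empty when no replica
    // contacted at the given consistency level returned the blob.
    interface BlobReader {
        Optional<byte[]> read(String blobId, ConsistencyLevel consistencyLevel);
    }

    private final BlobReader reader;
    private final boolean optimisticEnabled;
    private final ConsistencyLevel regularLevel;
    private final AtomicLong clOneHits = new AtomicLong();
    private final AtomicLong clOneMisses = new AtomicLong();

    OptimisticBlobReader(BlobReader reader, boolean optimisticEnabled, ConsistencyLevel regularLevel) {
        this.reader = reader;
        this.optimisticEnabled = optimisticEnabled;
        this.regularLevel = regularLevel;
    }

    Optional<byte[]> read(String blobId) {
        if (!optimisticEnabled) {
            // Option disabled: a single read at the regular consistency level.
            return reader.read(blobId, regularLevel);
        }
        // Option enabled: cheap first attempt at CL ONE.
        Optional<byte[]> first = reader.read(blobId, ConsistencyLevel.ONE);
        if (first.isPresent()) {
            clOneHits.incrementAndGet();
            return first;
        }
        // Miss at CL ONE: fall back to the regular consistency level.
        clOneMisses.incrementAndGet();
        return reader.read(blobId, regularLevel);
    }

    // Fraction of optimistic reads served at CL ONE (0.0 before any read).
    double clOneHitRate() {
        long hits = clOneHits.get();
        long total = hits + clOneMisses.get();
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

A low hit rate in such a counter would indicate that many reads pay for two round trips, which is the situation in which the option costs more than it saves.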
h2. Consequences

In a multi-DC setup with RF=3 and 2 DCs, this implies a reduction of IO by a 
factor of 4 across the cluster, greatly lowering the read pressure on the 
Cassandra BlobStore.
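
The issue does not spell out how the factor 4 is obtained. One reading that yields this number, assuming RF=3 means three replicas in each of the two DCs and that the regular CL is QUORUM (both are assumptions, as is the {{ReadFanOut}} helper name), is that a QUORUM read involves a majority of the six replicas while a CL ONE read involves a single one:

```java
// Hypothetical helper illustrating the arithmetic; not part of James.
class ReadFanOut {
    // Replicas involved in a QUORUM read: a majority of all replicas,
    // summed over every data center.
    static int quorumReplicas(int replicationFactorPerDc, int dataCenters) {
        int totalReplicas = replicationFactorPerDc * dataCenters;
        return totalReplicas / 2 + 1;
    }

    // Ratio between replicas involved at QUORUM and the single replica
    // involved at CL ONE.
    static int reductionFactor(int replicationFactorPerDc, int dataCenters) {
        int replicasAtClOne = 1;
        return quorumReplicas(replicationFactorPerDc, dataCenters) / replicasAtClOne;
    }
}
```

With these assumptions, {{ReadFanOut.reductionFactor(3, 2)}} evaluates to 4, since a majority of 6 replicas is 4.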
h2. Work to be conducted

Add a configuration option in cassandra.properties:
{code:bash}
# Experimental configuration option. Defaults to false.
# Enabling it results in reading strictly immutable (not deleted, not updated)
# data at CL ONE. If the data is missing, we can be sure that the data had not
# been replicated yet, and a second read is performed with a higher consistency
# level.
# This option still offers the same level of consistency (thanks to strict
# immutability) but might result in higher resource usage in case of
# misbehaving replication.
# Metrics can be used to measure the efficiency of this.
optimistic.consistency.level.enabled=false
{code}
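
As an illustration of how such a flag could be read with its documented default of false, here is a small sketch. It uses plain {{java.util.Properties}} for simplicity and is not necessarily how James loads cassandra.properties; the {{OptimisticConsistencyConfig}} class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: only the property key comes from the issue.
class OptimisticConsistencyConfig {
    static final String KEY = "optimistic.consistency.level.enabled";

    // Parses properties-file content and returns the flag, false when absent.
    static boolean isEnabled(String propertiesContent) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(propertiesContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Boolean.parseBoolean(properties.getProperty(KEY, "false").trim());
    }
}
```

Defaulting to false when the key is absent keeps existing deployments on the regular consistency level unless the option is set explicitly.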
 

  was:
[Same description as above, followed by:]
Also apply it to the messagev3 and attachmentv2 tables so that they benefit 
from it.


> CL one option for the Cassandra blob store
> ------------------------------------------
>
>                 Key: JAMES-3586
>                 URL: https://issues.apache.org/jira/browse/JAMES-3586
>             Project: James Server
>          Issue Type: Improvement
>            Reporter: René Cordier
>            Priority: Major
>             Fix For: 3.7.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h


