Benoit Tellier created JAMES-3591:
-------------------------------------

             Summary: Warn against CassandraBlobStoreDAO usage and its use
                 Key: JAMES-3591
                 URL: https://issues.apache.org/jira/browse/JAMES-3591
             Project: James Server
          Issue Type: Task
            Reporter: Benoit Tellier


h3. Why?

Cassandra is not designed for storing large binaries, and it delivers 
sub-optimal performance compared to object storage alternatives (like S3, 
MinIO or Apache Ozone).

We need to ensure users are fully aware of the consequences when choosing this 
option.

Thus we should add warnings in:

 - The code, via Javadoc
 - The documentation websites
 - The Docker Hub README
 - A log message upon startup
 - The sample configuration file
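
As a rough illustration of the code-level and startup warnings listed above, a sketch could look like the following. The class name {{CassandraBlobStoreWarning}}, its method and the message wording are hypothetical, and {{java.util.logging}} is used only to keep the example self-contained; this is not the actual James code.

{code:java}
import java.util.logging.Logger;

/**
 * Hypothetical helper illustrating the proposed warnings (not part of James).
 *
 * WARNING: Cassandra is not designed for storing large binaries. Storing blobs
 * in Cassandra carries storage and performance overhead compared to an object
 * store such as S3, MinIO or Apache Ozone.
 */
public final class CassandraBlobStoreWarning {
    private static final Logger LOGGER =
        Logger.getLogger(CassandraBlobStoreWarning.class.getName());

    // Assumed wording; the real message would be decided when implementing the task.
    static final String MESSAGE =
        "The Cassandra blob store is in use. Cassandra is not designed for large "
            + "binaries; consider an object store (S3, MinIO, Apache Ozone) instead.";

    private CassandraBlobStoreWarning() {
    }

    /** Logs the warning at WARNING level and returns the message that was logged. */
    public static String logAtStartup() {
        LOGGER.warning(MESSAGE);
        return MESSAGE;
    }
}
{code}

Calling {{CassandraBlobStoreWarning.logAtStartup()}} once while the blob store is being wired would surface the message in the startup logs.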

h3. Related exchanges

I had exchanges with Nate McCall on this topic:

{code:java}
Hi folks - would really like to talk to anyone that worked on the Cassandra 
Blob Store implementation about potentially pulling this out for general use. 
Please ping on zzn...@apache.org or zznate on asf's slack. 
{code}

I then replied by email:

{code:java}
Hello Nate,

Thank you very much for raising this topic.

I have been seriously concerned about the performance and storage costs of the 
Cassandra blob store for quite some time already.

The Apache James PMC had been reluctant to remove it as we were worried about 
bringing additional runtime dependencies to the project (meaning forcing users 
to rely on an object store like Ozone or MinIO).

I personally encourage any move on this topic to deprecate/provide extensive 
warnings regarding its use and am very curious to know what you have to say 
about it.

Best regards,
{code}

Nate answered:

{code:java}
Hi Benoit,
Thanks for the response. At a high level, I completely agree with you - a 
database of any sort is not the right place for binary content. That said, I 
regularly see cases where folks are in a situation like "this is what we have 
provisioned and accounted for, let's just use it."

As it stands, this is one of the better binary storage approaches which I have 
seen implemented. A checksumming, reactive API with a configurable chunk size 
solves a lot of problems for people.

At the end of the day though, I do very much agree that the right answer is to 
use a distributed filesystem of some sort (Ozone and MinIO would definitely be 
better), and folks should be warned about the substantial storage and 
performance overhead of doing it in C*. But this approach at least will "suck 
less" than many others I have seen using C* similarly. 

Thanks again for the response, and nice to meet you either way.

Cheers,
-Nate
{code}
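
To make the "checksumming" and "configurable chunk size" ideas mentioned above more concrete, here is a minimal, self-contained sketch of splitting a blob into fixed-size chunks and computing a checksum over its content. It only illustrates the general technique: the class {{ChunkedBlob}}, its methods and the choice of SHA-256 are assumptions made for this example, the code is synchronous rather than reactive, and it does not describe the actual CassandraBlobStoreDAO implementation.

{code:java}
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of chunking plus checksumming (not James code). */
public final class ChunkedBlob {
    private ChunkedBlob() {
    }

    /** Splits data into consecutive chunks of at most chunkSize bytes each. */
    public static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int end = Math.min(data.length, offset + chunkSize);
            chunks.add(Arrays.copyOfRange(data, offset, end));
        }
        return chunks;
    }

    /** Returns the hex-encoded SHA-256 digest of the chunks, read in order. */
    public static String checksum(List<byte[]> chunks) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (byte[] chunk : chunks) {
                digest.update(chunk);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 support is mandatory for every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
{code}

Because the digest is computed over the chunks in order, the checksum of the reassembled chunks matches the checksum of the original blob regardless of the chunk size chosen, which is what allows integrity to be verified on read.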





