Thank you Jon, great article as usual!
One topic discussed in the article is the filesystem cache, which Cassandra traditionally relies on for data caching (row caching is disabled by default). IIRC, mmap() is used. Some RDBMSs and NoSQL DBs instead use direct I/O plus async I/O and maintain their own DB cache rather than a kernel-managed one, thereby improving overall performance. Since Cassandra is designed to be a DB with low response times, the DIO/AIO/DB cache approach seems like a really useful feature. Just out of curiosity, are there reasons why this more advanced I/O stack wasn't implemented, other than a lack of resources to do it?

Regards,
Kyrill

________________________________
From: Eric Plowe <eric.pl...@gmail.com>
Sent: Wednesday, August 8, 2018 9:39:44 PM
To: user@cassandra.apache.org
Subject: Re: Compression Tuning Tutorial

Great post, Jonathan! Thank you very much.

~Eric

On Wed, Aug 8, 2018 at 2:34 PM Jonathan Haddad <j...@jonhaddad.com> wrote:

Hey folks,

We've noticed over the years that people usually create tables with the default compression parameters, and we have spent a lot of time helping teams figure out the right settings for their cluster based on their workload. I finally managed to write some thoughts down, along with a high-level breakdown of how the internals function, which should help people pick better settings for their cluster.

This post focuses on a mixed 50:50 read:write workload, but the same conclusions can be drawn from a read-heavy workload. Hopefully this helps some folks get better performance / save some money on hardware!

http://thelastpickle.com/blog/2018/08/08/compression_performance.html

--
Jon Haddad
Principal Consultant, The Last Pickle