Hi Julien,

> Whether skinny rows or wide rows, data for a partition key is always
> completely updated / overwritten, ie. every command is an insert.
Inserts and updates are essentially the same thing in Cassandra for
standard data types: Cassandra appends the operation and does not actually
update any past data right away. My guess is you are actually 'updating'
existing columns, rows or partitions.

> We manage data expiration with TTL set to several days.

I believe that, for the reason mentioned above, this TTL only applies to
data that is never overwritten. Any updated / reinserted data has its TTL
timer reset to the new value given to the column, range, row, or partition.

> This imposes a great load on the cluster (huge CPU consumption), this load
> greatly impacts the constant reads we have. Read latency are fine the rest
> of the time.

This is expected in a write-heavy scenario. Writes do not touch the data
disk (they only hit the commit log and memtables), so the CPU is often the
bottleneck in this case. Also, it is known that Spark (and similar
distributed processing technologies) can harm regular transactions.

Possible options to reduce the impact:

- Use a specific data center for analytics, within the same cluster, and
work locally there. Writes will still be replicated to the original DC
(asynchronously), but that DC will no longer be responsible for
coordinating the analytical jobs.
- Use a coordinator ring to delegate most of the work to this 'proxy
layer' sitting between the clients and the Cassandra (data) nodes. A good
starting point could be: https://www.youtube.com/watch?v=K0sQvaxiDH0. I am
not sure how experimental or hard to deploy this architecture is, but I
see it as a smart move, probably very good for some use cases. Maybe yours?
- Simply limit the write speed in Spark, if doable from a service
perspective, or add nodes so Spark is never strong enough to break regular
transactions (this could be very expensive).
- Run Spark mostly during off-peak hours.
- ... probably some more I cannot think of just now :).
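On the throttling option: if the load job uses the Spark Cassandra
Connector, the write rate can be capped from the Spark configuration. The
two property names below are real connector settings (2.x naming); the
values and the jar name are placeholders to tune for your own cluster, so
treat this as a sketch rather than a recommendation:

```shell
# Cap the connector's write rate per executor core and limit the number
# of batches written in parallel. Values are illustrative starting points.
THROTTLE_CONF="--conf spark.cassandra.output.throughput_mb_per_sec=5 --conf spark.cassandra.output.concurrent.writes=2"

# 'daily-load.jar' is a hypothetical job name; substitute your own.
echo "spark-submit $THROTTLE_CONF daily-load.jar"
```

Lowering these numbers directly trades load-job duration for cluster
headroom, which is usually the right trade when the job competes with
latency-sensitive reads.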
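Another option in the same vein, since the whole dataset is replaced
daily: load each day into fresh, date-suffixed tables and drop the tables
from a few days back, so old data is removed by a cheap DROP instead of
being compacted or expired away. A sketch of the naming arithmetic only;
the keyspace, table and column names are made up, and the generated CQL is
printed rather than executed:

```shell
# Build the day's DDL for a date-suffixed table scheme.
# 'ks.myawesometable_*' is hypothetical; note the underscore, since
# unquoted CQL identifiers cannot contain '-'.
TODAY=$(date -u +%Y%m%d)
DROP_DAY=$(date -u -d '3 days ago' +%Y%m%d)   # GNU date syntax

echo "CREATE TABLE ks.myawesometable_${TODAY} (id uuid PRIMARY KEY, payload text);"
echo "DROP TABLE IF EXISTS ks.myawesometable_${DROP_DAY};"
```

After the DROP, remember to clean the automatic snapshot (auto_snapshot is
on by default), otherwise the disk space is not actually reclaimed.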
> Is there any best practices we should follow to ease the load when
> importing data into C* except
> - reducing the number of concurrent writes and throughput on the driver
> side

Yes, as mentioned above, throttling the Spark throughput, if doable, is
definitely a good idea. If not, you might have terrible surprises if
someone from the dev team suddenly decides to add some more writes and the
Cassandra side is not ready for it.

> - reducing the number of compaction threads and throughput on the cluster

Generally the number of compaction threads is well defined by default. You
don't want to use more than 1/4 or 1/2 of the total cores available, and
generally no more than 8. Lowering the compaction throughput is a
double-edged sword. Yes, it would free some disk throughput immediately.
Yet if compactions are stacking up, SSTables merge slowly and read
performance will decrease substantially, quite fast, as each read will
have to hit a lot of files. The throughput should be set to a value that
is fast enough to keep up with compactions.

If you really have to rewrite 100% of the data every day, I would suggest
you create 10 new tables every day instead of rewriting the existing data.
Writing a new table 'MyAwesomeTable-20180130' for example, then simply
dropping the one from 2 or 3 days ago and cleaning the snapshot, might be
more efficient I would say. On the client side, it is just a matter of
adding the date (passed or calculated) to the table name.

> In particular :
> - is there any evidence that writing multiple tables at the same time
> produces more load than writing the tables one at a time when tables are
> completely written at once such as we do?

I don't think so, except maybe that compactions within a single table
cannot all be done in parallel, so writing one table at a time would
probably limit the load a bit. I am not even sure; a lot of progress was
made in the past to make compactions more efficient :).

> - because of the heavy writes, we use STC.
> Is it the best choice
> considering data is completely overwritten once a day? Tables contain
> collections and UDTs.

STCS sounds reasonable; I would not start tuning here. TWCS could be
considered to evict tombstones efficiently, but as I said earlier, I don't
think you have a lot of expired tombstones. I would guess compactions plus
coordination of the writes are the cluster killer in your case. But
please, let us know how compactions and tombstones look in your cluster:

- compactions: 'nodetool compactionstats -H' / check pending compactions
- tombstones: use 'sstablemetadata' on the biggest / oldest SSTables, or
use monitoring to check the ratio of droppable tombstones.

It's a very specific use case I have never faced, and I don't know your
exact context, so I am mostly guessing here. I can be wrong on some of the
points above, but I am sure some people around will step in and correct me
where that is the case :). I hope you'll find some useful information
though.

C*heers,
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-01-30 8:12 GMT+00:00 Julien Moumne <jmou...@deezer.com>:

> Hello, I am looking for best practices for the following use case :
>
> Once a day, we insert at the same time 10 full tables (several 100GiB
> each) using Spark C* driver, without batching, with CL set to ALL.
>
> Whether skinny rows or wide rows, data for a partition key is always
> completely updated / overwritten, ie. every command is an insert.
>
> This imposes a great load on the cluster (huge CPU consumption), this load
> greatly impacts the constant reads we have. Read latency are fine the rest
> of the time.
> Is there any best practices we should follow to ease the load when
> importing data into C* except
> - reducing the number of concurrent writes and throughput on the driver
> side
> - reducing the number of compaction threads and throughput on the cluster
>
> In particular :
> - is there any evidence that writing multiple tables at the same time
> produces more load than writing the tables one at a time when tables are
> completely written at once such as we do?
> - because of the heavy writes, we use STC. Is it the best choice
> considering data is completely overwritten once a day? Tables contain
> collections and UDTs.
>
> (We manage data expiration with TTL set to several days.
> We use SSDs.)
>
> Thanks!