Hi Julien,

> Whether skinny rows or wide rows, data for a partition key is always
> completely updated / overwritten, ie. every command is an insert.


Inserts and updates are essentially the same thing in Cassandra for standard
data types: Cassandra appends the new operation and does not actually touch
any past data right away. My guess is that you are actually 'updating'
existing columns, rows, or partitions.
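
To illustrate the upsert behaviour, a minimal sketch with a made-up table
(not your schema):

    -- Hypothetical table, just to show the upsert behaviour:
    CREATE TABLE ks.example (id text PRIMARY KEY, value text);

    -- Both statements simply append a new cell for partition 'k1';
    -- neither reads nor modifies the previously written data on disk:
    INSERT INTO ks.example (id, value) VALUES ('k1', 'v1');
    UPDATE ks.example SET value = 'v2' WHERE id = 'k1';

    -- The older cell only goes away once the SSTables holding both
    -- versions are eventually compacted together.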

> We manage data expiration with TTL set to several days.

I believe that, for the reason mentioned above, this TTL only matters for
data that ends up not being overwritten. Every updated / reinserted cell gets
a fresh TTL timer set to the new value given to the column, range, row, or
partition.
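
For example, still on the hypothetical ks.example table above:

    -- Day 1: write with a 3-day TTL (259200 seconds):
    INSERT INTO ks.example (id, value) VALUES ('k1', 'v1') USING TTL 259200;

    -- Day 2: the daily job rewrites the same row. The new cells start a
    -- brand new 3-day timer; the old cells are simply shadowed:
    INSERT INTO ks.example (id, value) VALUES ('k1', 'v1') USING TTL 259200;

    -- TTL() shows the timer was reset to ~259200 again:
    SELECT TTL(value) FROM ks.example WHERE id = 'k1';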

> This imposes a great load on the cluster (huge CPU consumption), this load
> greatly impacts the constant reads we have. Read latency are fine the rest
> of the time.

This is expected in a write-heavy scenario. Writes do not touch the data disk
(they only hit the commit log and memtables), so the CPU is often the
bottleneck in this case. Also, it is well known that Spark (and similar
distributed processing technologies) can hurt regular transactions.

Possible options to reduce the impact:

- Use a dedicated data center for analytics, within the same cluster, and
work locally there. Writes will still be replicated to the original DC
(asynchronously), but it will no longer be responsible for coordinating the
analytical jobs (see the rough connector configuration sketch after this
list).
- Use a coordinator ring to delegate most of the work to this 'proxy layer'
sitting between the clients and the Cassandra (data) nodes. A good starting
point could be: https://www.youtube.com/watch?v=K0sQvaxiDH0. I am not sure
how experimental or hard to deploy this architecture is, but it looks like a
smart move, probably very good for some use cases. Maybe yours?
- Simply limit the write speed in Spark, if doable from a service
perspective, or add nodes, so that Spark never has enough throughput to
disrupt regular transactions (adding nodes could be very expensive).
- Run Spark jobs mostly during off-peak hours.
- ... probably some more I cannot think of just now :).
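
For the dedicated analytics DC option, the relevant knobs with the DataStax
spark-cassandra-connector look roughly like this (property names can differ
between connector versions, and the contact points / DC name are
placeholders):

    # Point the job at nodes from the analytics DC and keep coordination
    # local to that DC, with a LOCAL_* consistency level:
    spark-submit \
      --conf spark.cassandra.connection.host=<analytics_dc_contact_points> \
      --conf spark.cassandra.connection.local_dc=DC_ANALYTICS \
      --conf spark.cassandra.output.consistency.level=LOCAL_QUORUM \
      daily-load-job.jar

Of course, moving from CL ALL to a LOCAL_* level changes the write
guarantees, so that part depends on your requirements.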

> Is there any best practices we should follow to ease the load when
> importing data into C* except
>  - reducing the number of concurrent writes and throughput on the driver
> side

Yes, as mentioned above, throttling the Spark throughput, if doable, is
definitely a good idea. If not, you might have nasty surprises the day
someone from the dev team decides to suddenly add some more writes and the
Cassandra side is not ready for it.
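
As a rough back-of-the-envelope sketch (numbers are made up, adjust to your
own volumes and loading window): 10 tables of a few hundred GiB each is on
the order of 1-2 TiB per day; spread over, say, a 6-hour window that is
roughly 50-100 MB/s cluster-wide, which you can then split across your Spark
cores with the connector's throttling settings:

    # Cap write throughput on the Spark side. Note the cap applies per
    # core/executor depending on the connector version, so factor in your
    # parallelism (settings from the spark-cassandra-connector, names may
    # vary by version):
    spark-submit \
      --conf spark.cassandra.output.throughput_mb_per_sec=5 \
      --conf spark.cassandra.output.concurrent.writes=5 \
      daily-load-job.jar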

>  - reducing the number of compaction threads and throughput on the cluster

Generally the number of compaction threads (concurrent_compactors) is well
defined by default. You don't want to use more than 1/4 to 1/2 of the
available cores, and generally no more than 8. Lowering the compaction
throughput is a double-edged sword. Yes, it would free some disk throughput
immediately. Yet if compactions stack up, SSTables are merged slowly and read
performance will decrease substantially, quite fast, as each read will have
to hit a lot of files, thus triggering an increasing number of disk reads.
The throughput should be set to a value that is high enough for compactions
to keep up with the incoming writes.
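
To see whether the current setting keeps up, and to adjust it at runtime
(a sketch, assuming a reasonably recent nodetool):

    # Pending compactions should stay low most of the time:
    nodetool compactionstats -H

    # Current cap, in MB/s:
    nodetool getcompactionthroughput

    # Adjust it at runtime (0 = unthrottled); raise it if pending
    # compactions keep growing, e.g.:
    nodetool setcompactionthroughput 64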

If you really have to rewrite 100% of the data every day, I would suggest
creating 10 new tables each day instead of rewriting existing data. Writing
to a new table, 'myawesometable_20180130' for example, and then simply
dropping the one from 2 or 3 days ago and clearing its snapshot might be more
efficient, I would say. On the client side, it is just a matter of appending
the date (passed in or computed) to the table name.
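
A sketch of what that rotation could look like (keyspace / table names are
placeholders, and the schema below is a dummy one, obviously use yours):

    -- Day N: create and load a fresh table:
    CREATE TABLE ks.myawesometable_20180130 (
        id text PRIMARY KEY,
        payload text            -- dummy schema, replace with the real one
    );
    -- Spark writes into ks.myawesometable_20180130, readers switch to the
    -- new name once the load is complete.

    -- Day N: drop the table from a few days ago:
    DROP TABLE ks.myawesometable_20180127;

and on each node, since DROP TABLE leaves an automatic snapshot behind by
default:

    nodetool clearsnapshot ks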

> In particular :
>  - is there any evidence that writing multiple tables at the same time
> produces more load than writing the tables one at a time when tables are
> completely written at once such as we do?


I don't think so, except maybe that compactions within a single table cannot
all run in parallel, so writing one table at a time would probably limit the
load a bit on the Cassandra side. I am not even sure about that; a lot of
progress has been made over the years to make compactions more efficient :).

>  - because of the heavy writes, we use STC. Is it the best choice
> considering data is completely overwritten once a day? Tables contain
> collections and UDTs.


STCS sounds reasonable; I would not start tuning here. TWCS could be
considered to evict tombstones efficiently, but as I said earlier, I don't
think you have a lot of expired tombstones. I would guess compactions + write
coordination are the cluster killers in your case. But please, let us know
what compactions and tombstones look like in your cluster:

- compactions: nodetool compactionstats -H / check pending compactions
- tombstones: use sstablemetadata on biggest / oldest SSTables or use
monitoring to check the ratio of droppable tombstones.
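
Concretely, something like this (the SSTable path is just an example, adjust
to your data directory and table names):

    # Pending compactions per node:
    nodetool compactionstats -H

    # Estimated droppable tombstones for given SSTables:
    sstablemetadata /var/lib/cassandra/data/ks/mytable-*/mc-*-big-Data.db \
        | grep -i droppable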

This is a very specific use case I have never faced, and I don't know your
exact setup, so I am mostly guessing here. I may be wrong on some of the
points above, but I am sure some people around will step in and correct me
where that is the case :). I hope you'll find some useful information in here
though.

C*heers,
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



2018-01-30 8:12 GMT+00:00 Julien Moumne <jmou...@deezer.com>:

> Hello, I am looking for best practices for the following use case :
>
> Once a day, we insert at the same time 10 full tables (several 100GiB
> each) using Spark C* driver, without batching, with CL set to ALL.
>
> Whether skinny rows or wide rows, data for a partition key is always
> completely updated / overwritten, ie. every command is an insert.
>
> This imposes a great load on the cluster (huge CPU consumption), this load
> greatly impacts the constant reads we have. Read latency are fine the rest
> of the time.
>
> Is there any best practices we should follow to ease the load when
> importing data into C* except
>  - reducing the number of concurrent writes and throughput on the driver
> side
>  - reducing the number of compaction threads and throughput on the cluster
>
> In particular :
>  - is there any evidence that writing multiple tables at the same time
> produces more load than writing the tables one at a time when tables are
> completely written at once such as we do?
>  - because of the heavy writes, we use STC. Is it the best choice
> considering data is completely overwritten once a day? Tables contain
> collections and UDTs.
>
> (We manage data expiration with TTL set to several days.
> We use SSDs.)
>
> Thanks!
>
