Re: Network Bandwidth and Multi-DC replication

2020-12-16 Thread Jens Fischer
Hello Jeff, very interesting stuff, thank you for sharing! Indeed, I am storing time-series data. The table has 67 columns. Writing is done in two steps: first 43 fields (3 primary key fields and 40 data fields), then 27 fields (3 primary key fields and 24 data fields) in a second step, always

Re: Network Bandwidth and Multi-DC replication

2020-12-15 Thread Jeff Jirsa
There's a small amount of overhead on each packet for serialization - e.g., each mutation is tied to a column family (uuid) and gets serialized with sizes and checksums, so I guess there's a point where your updates are small enough that the overhead of the mutations starts being visible. You
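
A rough sketch of why small updates make this overhead visible, assuming a fixed per-mutation overhead. The 50-byte overhead and the payload sizes are hypothetical values chosen for illustration, not measurements from Cassandra:

```python
def overhead_fraction(payload_bytes, overhead_bytes):
    """Share of the bytes on the wire that is serialization overhead."""
    return overhead_bytes / (payload_bytes + overhead_bytes)


# Hypothetical fixed overhead of 50 bytes per mutation (an assumption).
for payload in (50, 500, 5000):
    print(payload, round(overhead_fraction(payload, overhead_bytes=50), 3))
```

The smaller the payload, the larger the share of traffic taken up by the fixed overhead.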

Re: Network Bandwidth and Multi-DC replication

2020-12-15 Thread Jens Fischer
Hi Scott, thank you for your help. There was an error, or at least an ambiguity, in my second mail! I wrote: I still see outgoing cross-DC traffic of ~2x the “write size A”. What I wanted to say was: I still see outgoing cross-DC traffic of ~2x the “write size A” per remote DC or 4x the

Re: Network Bandwidth and Multi-DC replication

2020-12-09 Thread Scott Hirleman
2x makes sense tho. If you have 3 DCs, you write locally to DC1, where it gets replicated once; it also gets replicated to DC2 AND DC3 at consistency LOCAL_ONE via cross-DC traffic to one of the nodes in each DC, and is then replicated within each DC to a second node via local traffic. Write comes
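
The arithmetic behind the "2x" figure can be sketched as follows. This is a simplified model, assuming the coordinator sends exactly one copy of each write to every remote DC and a node there forwards it to the remaining local replicas; the function name and parameters are illustrative and not part of Cassandra:

```python
def replication_traffic(write_size, num_dcs, rf_per_dc):
    """Return (cross_dc, local_forward_per_remote_dc) for a single write.

    Assumption: one cross-DC copy per remote DC, which is then forwarded
    to the other (rf_per_dc - 1) replicas inside that DC.
    """
    remote_dcs = num_dcs - 1
    cross_dc = remote_dcs * write_size
    local_forward_per_remote_dc = (rf_per_dc - 1) * write_size
    return cross_dc, local_forward_per_remote_dc


# Setup from the thread: 3 DCs, RF=2 in each DC, write size normalized to 1.
cross_dc, local_forward = replication_traffic(1.0, num_dcs=3, rf_per_dc=2)
print(cross_dc, local_forward)
```

Under these assumptions the outgoing cross-DC traffic is one write size per remote DC, so two write sizes in total for three DCs.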

Re: Network Bandwidth and Multi-DC replication

2020-12-02 Thread Jens Fischer
Hi, I checked for all the other factors mentioned (anti-entropy repair, hints, read repair) and I still see outgoing cross-DC traffic of ~2x the “write size A” (as defined below). Given Jeff's answers, this is not to be expected, i.e. there is something wrong here. Does anybody have an idea how

Re: Network Bandwidth and Multi-DC replication

2020-11-30 Thread Jens Fischer
Hi Jeff, Thank you for your answer, very helpful already! All writes are done with `LOCAL_ONE` and we have RF=2 in each data center. To compare our examples we need to come to an agreement on what you are calling “write size A”. I gave two different write sizes: I call the bandwidth for

Re: Network Bandwidth and Multi-DC replication

2020-11-26 Thread Jeff Jirsa
> On Nov 26, 2020, at 9:53 AM, Jens Fischer wrote:
>
> Hi,
>
> we run a Cassandra cluster with three DCs. We noticed that the traffic
> incurred by running the cluster is significant.
>
> Consider the following simplified IoT scenario:
>
> * time series data from devices in the field is

Network Bandwidth and Multi-DC replication

2020-11-26 Thread Jens Fischer
Hi, we run a Cassandra cluster with three DCs. We noticed that the traffic incurred by running the cluster is significant. Consider the following simplified IoT scenario:
* time series data from devices in the field is received at Node A
* Node A inserts the data into DC 1
* DC 1 replicates