[ https://issues.apache.org/jira/browse/CASSANDRA-13162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122833#comment-16122833 ]
Paulo Motta commented on CASSANDRA-13162:
-----------------------------------------

Closing as this was superseded by CASSANDRA-13614 and CASSANDRA-13065.

> Batchlog replay is throttled during bootstrap, creating conditions for
> incorrect query results on materialized views
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13162
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13162
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Materialized Views
>            Reporter: Wei Deng
>            Assignee: Andrés de la Peña
>            Priority: Critical
>              Labels: bootstrap, materializedviews
>
> I've tested this in a C* 3.0 cluster with a couple of Materialized Views
> defined (one base table and two MVs on that base table). The data volume is
> not very high per node (about 80GB of data per node total, and that
> particular base table has about 25GB of data uncompressed with one MV taking
> 18GB compressed and the other MV taking 3GB), and the cluster is using decent
> hardware (EC2 C4.8XL with 18 cores + 60GB RAM + 18K IOPS RAID0 from two 3TB
> gp2 EBS volumes).
> This is originally a 9-node cluster. It appears that after adding 3 more
> nodes to the DC, the system.batches table accumulated a lot of data on the 3
> new nodes (each having around 20GB under system.batches directory), and in
> the subsequent week the batchlog on the 3 new nodes got slowly replayed back
> to the rest of the nodes in the cluster. The bottleneck seems to be the
> throttling defined in this cassandra.yaml setting:
> batchlog_replay_throttle_in_kb, which by default is set to 1MB/s.
> Given that it is taking almost a week (and still hasn't finished) for the
> batchlog (from MV) to be replayed after the bootstrap finishes, it seems only
> reasonable to unthrottle (or at least give it a much higher throttle rate)
> during the initial bootstrap, and hence I'd consider this a bug for our
> current MV implementation.
> Also, as far as I understand, the bootstrap logic won't wait for the
> backlogged batchlog to be fully replayed before changing the new
> bootstrapping node to "UN" state, and if the batchlog for the MVs gets stuck
> in this state for a long time, we will basically get wrong answers on the MVs
> during that whole duration (until the batchlog is fully replayed to the
> cluster), which adds even more criticality to this bug.
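
For a sense of scale, the throttle arithmetic in the description can be sketched roughly as follows. This is only a back-of-the-envelope estimate, not a measurement from the reported cluster. The helper name `replay_seconds` is hypothetical, and the `nodes` divisor rests on an assumption: as far as I recall, the cassandra.yaml comment for batchlog_replay_throttle_in_kb describes the value as a cluster-wide total that is reduced in proportion to the number of nodes.

```python
# Rough estimate of how long a batchlog backlog takes to drain under a
# fixed replay throttle. Illustrative only; replay_seconds is a
# hypothetical helper, not part of Cassandra.

def replay_seconds(backlog_gib: float, throttle_kb_per_s: int, nodes: int = 1) -> float:
    """Seconds needed to replay `backlog_gib` GiB at the per-node rate.

    Assumption: the configured throttle is a cluster-wide total, so each
    node replays at throttle_kb_per_s / nodes.
    """
    per_node_kb_per_s = throttle_kb_per_s / nodes
    backlog_kb = backlog_gib * 1024 * 1024
    return backlog_kb / per_node_kb_per_s

# 20 GiB backlog per new node, default throttle of 1024 KB/s.
full_rate = replay_seconds(20, 1024)              # throttle applied undivided
shared_rate = replay_seconds(20, 1024, nodes=12)  # divided across 12 nodes

print(f"undivided throttle:   {full_rate / 3600:.1f} hours")
print(f"divided by 12 nodes:  {shared_rate / 86400:.1f} days")
```

Under the undivided 1 MB/s rate the backlog would drain in hours, while the divided rate stretches it to days, which is closer to the multi-day replay described in the report. Neither figure accounts for batches that keep arriving while the replay runs.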