Re: Cassandra commitlog corruption on hard shutdown

2022-04-05 Thread Erick Ramirez
Thanks for circling back and posting your experience!

Re: Cassandra commitlog corruption on hard shutdown

2022-04-04 Thread Leon Zaruvinsky
Hi all, I wanted to echo back on this thread a bit of a "win". In investigating ways to mitigate the "corruption on hard shutdown" issue, we came across the Group Commitlog feature that was added in 4.0 ( https://issues.apache.org/jira/browse/CASSANDRA-13530). We backported and enabled this

Re: Cassandra commitlog corruption on hard shutdown

2021-08-03 Thread Leon Zaruvinsky
Following up, I've found that we tend to encounter one of three types of exceptions/commitlog corruptions: 1. org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Mutation checksum failure at ... in CommitLog-5-1531150627243.log at

Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Leon Zaruvinsky
Thanks for the links/comments Jeff and Bowen. We run xfs. Not sure that we can switch to zfs, so a different solution would be preferred. I’ll take a look through that patch – maybe I’ll try to backport and replicate. We’ve seen both cases where the commitlog is just 0s (empty) and where it has

Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Jeff Jirsa
The commitlog code has changed DRASTICALLY between 2.x and trunk. If it's really a bunch of trailing 0s as was suggested later, then https://issues.apache.org/jira/browse/CASSANDRA-11995 addresses at least one cause/case of that particular bug. On Mon, Jul 26, 2021 at 3:11 PM Leon Zaruvinsky

Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Bowen Song
I have seen the same error in Cassandra 3.x too, and in fact quite a few times. On a few occasions, I opened the corrupted commit log file in a hex editor, and it was filled with lots of 0x00s. I believe it was caused by the combination of the way Cassandra flushes the commit log + the way
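
[Editor's sketch] The hex-editor check described in this message (looking for a run of 0x00 bytes in the corrupted segment) can be approximated with a short script. This is a minimal sketch, not part of Cassandra or its tooling: the function name trailing_zero_bytes and the chunk size are hypothetical choices made for illustration, and the script only counts zero bytes at the end of a file. It does not parse the commit log format or validate checksums.

```python
import os


def trailing_zero_bytes(path, chunk_size=64 * 1024):
    """Return the number of consecutive 0x00 bytes at the end of the file.

    Hypothetical helper for illustration only: it reads the file backwards
    in chunks and stops at the first non-zero byte it finds.
    """
    size = os.path.getsize(path)
    zeros = 0
    with open(path, "rb") as f:
        pos = size
        while pos > 0:
            read_len = min(chunk_size, pos)
            pos -= read_len
            f.seek(pos)
            chunk = f.read(read_len)
            # Strip zero bytes from the right; what was removed is the zero tail.
            stripped = chunk.rstrip(b"\x00")
            zeros += len(chunk) - len(stripped)
            if stripped:
                # A non-zero byte was found, so the zero run ends here.
                break
    return zeros


def is_all_zeros(path):
    """True if the file is non-empty and consists only of 0x00 bytes."""
    size = os.path.getsize(path)
    return size > 0 and trailing_zero_bytes(path) == size
```

Running trailing_zero_bytes on a copy of a suspect segment gives the length of the zero tail, and is_all_zeros distinguishes a completely empty (all 0x00) segment from one that has a zero-filled tail after valid data. Note that a zero tail alone does not prove corruption, since a segment file that was preallocated but not fully written may legitimately end in zeros.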

Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Leon Zaruvinsky
And for completeness, a sample stack trace: ERROR [2021-07-21T02:11:01.994Z] org.apache.cassandra.db.commitlog.CommitLog: Failed commit log replay. Commit disk failure policy is stop_on_startup; terminating thread (throwable0_message: Mutation checksum failure at 15167277 in

Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Leon Zaruvinsky
Currently we're using commitlog_batch:
    commitlog_sync: batch
    commitlog_sync_batch_window_in_ms: 2
    commitlog_segment_size_in_mb: 32
durable_writes is also true. Unfortunately we are still using Cassandra 2.2.x :( Though I'd be curious if much in this space has changed since then

Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Jeff Jirsa
What commitlog settings are you using? Default is periodic with 10s sync. That leaves you a 10s window on hard poweroff/crash. I would also expect Cassandra to clean up and start cleanly. Which version are you running? On Mon, Jul 26, 2021 at 1:00 PM Leon Zaruvinsky wrote: > Hi Cassandra

Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Arvinder Dhillon
I thought durable_writes is the solution. -Arvinder On Mon, Jul 26, 2021, 1:00 PM Leon Zaruvinsky wrote: > Hi Cassandra community, > > We (and others) regularly run into commit log corruptions that are caused > by Cassandra, or the underlying infrastructure, being hard restarted. I > suspect

Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Leon Zaruvinsky
Hi Cassandra community, We (and others) regularly run into commit log corruptions that are caused by Cassandra, or the underlying infrastructure, being hard restarted. I suspect that this is because it happens in the middle of a commitlog file write to disk. Could anyone point me at resources /