Re: Cassandra commitlog corruption on hard shutdown

Jeff Jirsa Mon, 26 Jul 2021 15:37:58 -0700

The commitlog code has changed DRASTICALLY between 2.x and trunk.

If it's really a bunch of trailing 0s as was suggested later, then
https://issues.apache.org/jira/browse/CASSANDRA-11995 addresses at least
one cause/case of that particular bug.
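
To check whether a failing segment really ends in zeros, a small script along these lines could help. This is only a rough diagnostic sketch, not anything from the Cassandra codebase: the function names trailing_zero_run and failure_is_in_zero_tail are made up for illustration. It assumes the offset in the "Mutation checksum failure at ..." message is a byte position in the segment file, which I have not verified against the 2.2 replayer.

```python
import os


def trailing_zero_run(path, chunk_size=1 << 20):
    """Return (tail_start, size): the offset where the file's trailing run
    of zero bytes begins, and the total file size.

    tail_start == size means the file does not end in zeros.
    """
    size = os.path.getsize(path)
    pos = size
    with open(path, "rb") as f:
        # Walk backwards in chunks until a non-zero byte is found.
        while pos > 0:
            start = max(0, pos - chunk_size)
            f.seek(start)
            chunk = f.read(pos - start)
            stripped = chunk.rstrip(b"\x00")
            if stripped:
                return start + len(stripped), size
            pos = start
    return 0, size


def failure_is_in_zero_tail(path, failure_offset):
    """Heuristic: True if the reported offset lies inside the zero tail."""
    tail_start, size = trailing_zero_run(path)
    return tail_start <= failure_offset < size
```

For the trace quoted below, that would be something like failure_is_in_zero_tail("CommitLog-5-1626828286977.log", 15167277), run against a copy of the segment. A True result is consistent with the trailing-zeros case, but it does not prove that CASSANDRA-11995 is the cause.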

On Mon, Jul 26, 2021 at 3:11 PM Leon Zaruvinsky <leonzaruvin...@gmail.com>
wrote:

> And for completeness, a sample stack trace:
>
> ERROR [2021-07-21T02:11:01.994Z] org.apache.cassandra.db.commitlog.CommitLog: Failed commit log replay. Commit disk failure policy is stop_on_startup; terminating thread (throwable0_message: Mutation checksum failure at 15167277 in CommitLog-5-1626828286977.log)
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Mutation checksum failure at 15167277 in CommitLog-5-1626828286977.log
>       at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:647)
>       at org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:519)
>       at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:401)
>       at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:143)
>       at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:175)
>       at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:155)
>       at org.apache.cassandra.service.CassandraDaemon.recoverCommitlogAndCompleteSetup(CassandraDaemon.java:296)
>       at org.apache.cassandra.service.CassandraDaemon.completeSetupMayThrowSstableException(CassandraDaemon.java:289)
>       at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:222)
>       at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630)
>       at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:741)
>
>
> On Mon, Jul 26, 2021 at 6:08 PM Leon Zaruvinsky <leonzaruvin...@gmail.com>
> wrote:
>
>> Currently we're using batch commitlog sync:
>>
>>     commitlog_sync: batch
>>     commitlog_sync_batch_window_in_ms: 2
>>     commitlog_segment_size_in_mb: 32
>>
>> durable_writes is also true.
>>
>> Unfortunately we are still using Cassandra 2.2.x :( Though I'd be curious
>> if much in this space has changed since then (I've looked through the
>> changelogs and nothing stood out).
>>
>> On Mon, Jul 26, 2021 at 5:20 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> What commitlog settings are you using?
>>>
>>> Default is periodic with 10s sync. That leaves you a 10s window on hard
>>> poweroff/crash.
>>>
>>> I would also expect Cassandra to clean up and start cleanly. Which
>>> version are you running?
>>>
>>>
>>>
>>> On Mon, Jul 26, 2021 at 1:00 PM Leon Zaruvinsky <
>>> leonzaruvin...@gmail.com> wrote:
>>>
>>>> Hi Cassandra community,
>>>>
>>>> We (and others) regularly run into commit log corruptions that are
>>>> caused by Cassandra, or the underlying infrastructure, being hard
>>>> restarted.  I suspect that this is because it happens in the middle of a
>>>> commitlog file write to disk.
>>>>
>>>> Could anyone point me at resources / code to understand why this is
>>>> happening?  Shouldn't Cassandra hold off on acking writes until the commitlog is
>>>> safely written to disk?  I would expect that on startup, Cassandra should
>>>> be able to clean up bad commitlog files and recover gracefully.
>>>>
>>>> I've seen various references online to this issue as something that
>>>> will be fixed in the future - so I'm curious if there is any movement or
>>>> thoughts there.
>>>>
>>>> Thanks a bunch,
>>>> Leon
>>>>
>>>
