[jira] [Updated] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment

2019-10-28 Thread Phil O Conduin (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil O Conduin updated CASSANDRA-15274:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

Our datafile corruption issues were caused by a block that still belonged to a 
C* data file being wrongly treated as free: the OS/SSD assumed it was no longer 
in use, so the block could later be reused.

For example:
C* deletes a file after compaction, the OS collects the blocks that are now free 
and sends a TRIM command to the SSD, but from time to time the SSD trims the 
wrong block - not the one reported by the OS - which zeroes a block that is 
still part of a live datafile; that block can later be reused for a different file.
So the symptom is that we suddenly see 4096 zeroes in the datafile (the SSD has 
just trimmed the block), and some time later we see other data written to those 
bytes (the block is now in use by another file), which leaves us with a corrupt 
file.
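
Not something attached to this ticket, but a minimal sketch for spotting that 
symptom: report runs of at least 4 KiB of zero bytes in a suspect sstable 
component, passing the *-Data.db path as the first argument (od -v prints every 
byte instead of folding repeated lines, so this is slow and only meant for spot 
checks):

#!/bin/bash
# scan a file for runs of >= 4096 zero bytes (the TRIM symptom described above)
FILE="$1"
od -An -v -tx1 "$FILE" | awk '
{
    for (i = 1; i <= NF; i++) {
        if ($i == "00") {
            run++
        } else {
            if (run >= 4096)
                print "zero run of", run, "bytes starting at byte", pos - run
            run = 0
        }
        pos++
    }
}
END {
    if (run >= 4096)
        print "zero run of", run, "bytes at end of file"
}'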

We turned off the scheduled TRIM function on the OS and we are no longer 
getting corruptions.

This was very difficult to pinpoint.

> Multiple Corrupt datafiles across entire environment 
> -
>
> Key: CASSANDRA-15274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15274
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Phil O Conduin
>Priority: Normal
>  Labels: impact-high
>
> Cassandra Version: 2.2.13
> PRE-PROD environment.
>  * 2 datacenters.
>  * 9 physical servers in each datacenter - (_Cisco UCS C220 M4 SFF_)
>  * 4 Cassandra instances on each server (cass_a, cass_b, cass_c, cass_d)
>  * 72 Cassandra instances across the 2 data centres, 36 in site A, 36 in site 
> B.
> We also have 2 Reaper Nodes we use for repair.  One reaper node in each 
> datacenter each running with its own Cassandra back end in a cluster together.
> OS Details [Red Hat Linux]
> cass_a@x 0 10:53:01 ~ $ uname -a
> Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 
> x86_64 x86_64 GNU/Linux
> cass_a@x 0 10:57:31 ~ $ cat /etc/*release
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.6 (Maipo)"
> ID="rhel"
> Storage Layout 
> cass_a@xx 0 10:46:28 ~ $ df -h
> Filesystem                         Size  Used Avail Use% Mounted on
> /dev/mapper/vg01-lv_root            20G  2.2G   18G  11% /
> devtmpfs                            63G     0   63G   0% /dev
> tmpfs                               63G     0   63G   0% /dev/shm
> tmpfs                               63G  4.1G   59G   7% /run
> tmpfs                               63G     0   63G   0% /sys/fs/cgroup
> >> 4 cassandra instances
> /dev/sdd                           1.5T  802G  688G  54% /data/ssd4
> /dev/sda                           1.5T  798G  692G  54% /data/ssd1
> /dev/sdb                           1.5T  681G  810G  46% /data/ssd2
> /dev/sdc                           1.5T  558G  932G  38% /data/ssd3
> Cassandra load is about 200GB and the rest of the space is snapshots
> CPU
> cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
> CPU(s):                64
> Thread(s) per core:    2
> Core(s) per socket:    16
> Socket(s):             2
> *Description of problem:*
> During repair of the cluster, we are seeing multiple corruptions in the log 
> files on a lot of instances.  There seems to be no pattern to the corruption. 
>  It seems that the repair job is finding all the corrupted files for us.  The 
> repair will hang on the node where the corrupted file is found.  To fix this 
> we remove/rename the datafile and bounce the Cassandra instance.  Our 
> hardware/OS team have stated there is no problem on their side.  I do not 
> believe it is the repair causing the corruption.
>  
> So let me give you an example of a corrupted file and maybe someone might be 
> able to work through it with me?
> When this corrupted file was reported in the log it looks like it was the 
> repair that found it.
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until 
> "2019-08-07 22:45:00"
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Writing 
> Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle 
> tree for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, 
> (-1476350953672479093,-1474461
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread 
> Thread[ValidationExecutor:825,1,main]
> Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /x/ssd2/data/KeyspaceMetadata/x-1e453cb0
> Aug 07 22:30:33 cassandra[34611]: at 
> 

[jira] [Commented] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment

2019-10-28 Thread Phil O Conduin (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961436#comment-16961436
 ] 

Phil O Conduin commented on CASSANDRA-15274:


Hi Benedict,

Sorry I forgot to come back and update this jira.

 

Our datafile corruption issues were caused by a block that still belonged to a 
C* data file being wrongly treated as free: the OS/SSD assumed it was no longer 
in use, so the block could later be reused.

For example:
C* deletes a file after compaction, the OS collects the blocks that are now free 
and sends a TRIM command to the SSD, but from time to time the SSD trims the 
wrong block - not the one reported by the OS - which zeroes a block that is 
still part of a live datafile; that block can later be reused for a different file.
So the symptom is that we suddenly see 4096 zeroes in the datafile (the SSD has 
just trimmed the block), and some time later we see other data written to those 
bytes (the block is now in use by another file), which leaves us with a corrupt 
file.

We turned off the scheduled TRIM function on the OS and we are no longer 
getting corruptions.
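
For reference, a sketch of the kind of change (the exact mechanism depends on 
how the scheduled TRIM was set up, so these are illustrative commands, not the 
ones we ran): on RHEL 7 the periodic TRIM is usually either systemd's 
fstrim.timer or a cron-driven fstrim, so check both and disable whichever is 
active.

# is the weekly systemd TRIM timer scheduled?
systemctl is-enabled fstrim.timer
systemctl list-timers | grep fstrim
# or is there a cron-driven fstrim?
grep -r fstrim /etc/cron*
# stop the periodic discard (if the timer is the source)
sudo systemctl stop fstrim.timer
sudo systemctl disable fstrim.timer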

This was very difficult to pinpoint.

> Multiple Corrupt datafiles across entire environment 
> -
>
> Key: CASSANDRA-15274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15274
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Phil O Conduin
>Priority: Normal
>  Labels: impact-high
>
> Cassandra Version: 2.2.13
> PRE-PROD environment.
>  * 2 datacenters.
>  * 9 physical servers in each datacenter - (_Cisco UCS C220 M4 SFF_)
>  * 4 Cassandra instances on each server (cass_a, cass_b, cass_c, cass_d)
>  * 72 Cassandra instances across the 2 data centres, 36 in site A, 36 in site 
> B.
> We also have 2 Reaper Nodes we use for repair.  One reaper node in each 
> datacenter each running with its own Cassandra back end in a cluster together.
> OS Details [Red Hat Linux]
> cass_a@x 0 10:53:01 ~ $ uname -a
> Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 
> x86_64 x86_64 GNU/Linux
> cass_a@x 0 10:57:31 ~ $ cat /etc/*release
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.6 (Maipo)"
> ID="rhel"
> Storage Layout 
> cass_a@xx 0 10:46:28 ~ $ df -h
> Filesystem                         Size  Used Avail Use% Mounted on
> /dev/mapper/vg01-lv_root            20G  2.2G   18G  11% /
> devtmpfs                            63G     0   63G   0% /dev
> tmpfs                               63G     0   63G   0% /dev/shm
> tmpfs                               63G  4.1G   59G   7% /run
> tmpfs                               63G     0   63G   0% /sys/fs/cgroup
> >> 4 cassandra instances
> /dev/sdd                           1.5T  802G  688G  54% /data/ssd4
> /dev/sda                           1.5T  798G  692G  54% /data/ssd1
> /dev/sdb                           1.5T  681G  810G  46% /data/ssd2
> /dev/sdc                           1.5T  558G  932G  38% /data/ssd3
> Cassandra load is about 200GB and the rest of the space is snapshots
> CPU
> cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
> CPU(s):                64
> Thread(s) per core:    2
> Core(s) per socket:    16
> Socket(s):             2
> *Description of problem:*
> During repair of the cluster, we are seeing multiple corruptions in the log 
> files on a lot of instances.  There seems to be no pattern to the corruption. 
>  It seems that the repair job is finding all the corrupted files for us.  The 
> repair will hang on the node where the corrupted file is found.  To fix this 
> we remove/rename the datafile and bounce the Cassandra instance.  Our 
> hardware/OS team have stated there is no problem on their side.  I do not 
> believe it is the repair causing the corruption.
>  
> So let me give you an example of a corrupted file and maybe someone might be 
> able to work through it with me?
> When this corrupted file was reported in the log it looks like it was the 
> repair that found it.
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until 
> "2019-08-07 22:45:00"
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Writing 
> Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle 
> tree for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, 
> (-1476350953672479093,-1474461
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread 
> Thread[ValidationExecutor:825,1,main]
> Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /x/ssd2/data/KeyspaceMetadata/x-1e453cb0
> Aug 07 22:30:33 cassandra[34611]: at 
> 

[jira] [Comment Edited] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment

2019-09-05 Thread Phil O Conduin (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920810#comment-16920810
 ] 

Phil O Conduin edited comment on CASSANDRA-15274 at 9/5/19 2:18 PM:


Hi,

We managed to remove the CRC check from the code and build it. When we run 
sstable2json on a corrupt file we are not seeing a CRC issue - this time it is 
not the CRC check but an exception during an attempt to decompress the chunk, 
so I think we have the answer to our question: it is not just a CRC check problem.
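
(For context, that check is simply the 2.2-era sstable2json tool pointed at the 
suspect Data component; the path below is an example, not the actual file:)

sstable2json /data/ssd2/data/KeyspaceMetadata/CF-xxxx/lb-26352-big-Data.db > /tmp/lb-26352.json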

 

As another line of investigation, we created a script that generates MD5 
checksums for all sstable files. The script runs from cron twice per day and 
logs the checksum of every sstable file.
We capture the MD5 and compare it over the lifetime of the file, and we have 
proved that the checksum does not change after the file is written - i.e. the 
file is already bad when it first hits disk - which would indicate a possible 
bug in Cassandra at the time of compacting/writing the file.
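
A minimal sketch of the kind of cron job described above - the log location, 
schedule and *-Data.db pattern are illustrative assumptions, not lifted from 
our actual script:

#!/bin/bash
# record an MD5 per sstable Data component so the checksum can be compared
# over the lifetime of each file; run from cron, e.g. twice a day:
#   15 1,13 * * *  /usr/local/bin/sstable_md5.sh
LOG=/var/log/cassandra/sstable_md5.log
STAMP=$(date '+%F %T')
for d in /data/ssd1/data /data/ssd2/data /data/ssd3/data /data/ssd4/data; do
    find "$d" -name '*-Data.db' -exec md5sum {} +
done | while read -r sum path; do
    echo "$STAMP $sum $path"
done >> "$LOG"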

 

 

Taking the latest file for example:

First reported in cassandra log Sep 01 08:39:48

{{Sep 01 08:39:48 hostname cassandra[16223]: ERROR 07:39:48 Failed creating a 
merkle tree for repair #fb265fa0-cc8a-11e9-9296-5b5fb0093f98 on 
KeyspaceMetadata/CF, (-2320162195562336336,-2318312110429971422], /x.x.x.x (see 
log for details)}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: INFO 07:39:48 repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 Received merkle tree for CF from 
/x.x.x.x}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: WARN 07:39:48 repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 CF sync failed}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: INFO 07:39:48 repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 Requesting merkle trees for 
CF_RecentIndex (to [/x.x.x.x, /1x.x.x.x, /x.x.x.x, /x.x.x.x, /x.x.x.x, 
/x.x.x.x])}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: ERROR 07:39:48 Exception in 
thread Thread[RepairJobTask:24,5,main]}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: 
org.apache.cassandra.exceptions.RepairException: repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 on KeyspaceMetadata/CF, 
(-2320162195562336336,-2318312110429971422] Validation failed in /x.x.x.x}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:178)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:478)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:174)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_172]}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[na:1.8.0_172]}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_172]}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: at 
java.lang.Thread.run(Thread.java:748) [na:1.8.0_172]}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: ERROR 07:39:48 Exception in 
thread Thread[ValidationExecutor:53,1,main]}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: 
org.apache.cassandra.io.FSReadError: 
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
/data/ssd2/data/KeyspaceMetadata/CF-1e77be609c7911e8ac12255de1fb512a/lb-26352-big-Data.db}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:365)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:361) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:324)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:132)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{ Sep 01 08:39:48 hostname cassandra[16223]: at 

[jira] [Comment Edited] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment

2019-09-05 Thread Phil O Conduin (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920810#comment-16920810
 ] 

Phil O Conduin edited comment on CASSANDRA-15274 at 9/5/19 2:17 PM:


Hi,

We managed to remove the CRC check from the code and build it. When we run 
sstable2json on a corrupt file we are not seeing a CRC issue - this time it is 
not the CRC check but an exception during an attempt to decompress the chunk, 
so I think we have the answer to our question: it is not just a CRC check problem.

 

As another line of investigation, we created a script that generates MD5 
checksums for all sstable files. The script runs from cron twice per day and 
logs the checksum of every sstable file.
We capture the MD5 and compare it over the lifetime of the file, and we have 
proved that the checksum does not change after the file is written - i.e. the 
file is already bad when it first hits disk - which would indicate a possible 
bug in Cassandra at the time of compacting/writing the file.

 

 

Taking the latest file for example:

First reported in cassandra log Sep 01 08:39:48

Sep 01 08:39:48 hostname cassandra[16223]: ERROR 07:39:48 Failed creating a 
merkle tree for repair #fb265fa0-cc8a-11e9-9296-5b5fb0093f98 on 
KeyspaceMetadata/CF, (-2320162195562336336,-2318312110429971422], /x.x.x.x (see 
log for details)
Sep 01 08:39:48 hostname cassandra[16223]: INFO 07:39:48 repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 Received merkle tree for CF from /x.x.x.x
Sep 01 08:39:48 hostname cassandra[16223]: WARN 07:39:48 repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 CF sync failed
Sep 01 08:39:48 hostname cassandra[16223]: INFO 07:39:48 repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 Requesting merkle trees for 
CF_RecentIndex (to [/x.x.x.x, /1x.x.x.x, /x.x.x.x, /x.x.x.x, /x.x.x.x, 
/x.x.x.x])
Sep 01 08:39:48 hostname cassandra[16223]: ERROR 07:39:48 Exception in thread 
Thread[RepairJobTask:24,5,main]
Sep 01 08:39:48 hostname cassandra[16223]: 
org.apache.cassandra.exceptions.RepairException: repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 on KeyspaceMetadata/CF, 
(-2320162195562336336,-2318312110429971422] Validation failed in /x.x.x.x
Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) 
~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:178)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:478)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:174)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_172]
Sep 01 08:39:48 hostname cassandra[16223]: at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]
Sep 01 08:39:48 hostname cassandra[16223]: at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[na:1.8.0_172]
Sep 01 08:39:48 hostname cassandra[16223]: at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_172]
Sep 01 08:39:48 hostname cassandra[16223]: at 
java.lang.Thread.run(Thread.java:748) [na:1.8.0_172]
Sep 01 08:39:48 hostname cassandra[16223]: ERROR 07:39:48 Exception in thread 
Thread[ValidationExecutor:53,1,main]
Sep 01 08:39:48 hostname cassandra[16223]: org.apache.cassandra.io.FSReadError: 
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
/data/ssd2/data/KeyspaceMetadata/CF-1e77be609c7911e8ac12255de1fb512a/lb-26352-big-Data.db
Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:365)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:361) 
~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:324)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:132)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at 
org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:92)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at 

[jira] [Updated] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment

2019-09-02 Thread Phil O Conduin (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil O Conduin updated CASSANDRA-15274:
---
Impacts:   (was: None)

> Multiple Corrupt datafiles across entire environment 
> -
>
> Key: CASSANDRA-15274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15274
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Phil O Conduin
>Priority: Normal
>
> Cassandra Version: 2.2.13
> PRE-PROD environment.
>  * 2 datacenters.
>  * 9 physical servers in each datacenter - (_Cisco UCS C220 M4 SFF_)
>  * 4 Cassandra instances on each server (cass_a, cass_b, cass_c, cass_d)
>  * 72 Cassandra instances across the 2 data centres, 36 in site A, 36 in site 
> B.
> We also have 2 Reaper Nodes we use for repair.  One reaper node in each 
> datacenter each running with its own Cassandra back end in a cluster together.
> OS Details [Red Hat Linux]
> cass_a@x 0 10:53:01 ~ $ uname -a
> Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 
> x86_64 x86_64 GNU/Linux
> cass_a@x 0 10:57:31 ~ $ cat /etc/*release
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.6 (Maipo)"
> ID="rhel"
> Storage Layout 
> cass_a@xx 0 10:46:28 ~ $ df -h
> Filesystem                         Size  Used Avail Use% Mounted on
> /dev/mapper/vg01-lv_root            20G  2.2G   18G  11% /
> devtmpfs                            63G     0   63G   0% /dev
> tmpfs                               63G     0   63G   0% /dev/shm
> tmpfs                               63G  4.1G   59G   7% /run
> tmpfs                               63G     0   63G   0% /sys/fs/cgroup
> >> 4 cassandra instances
> /dev/sdd                           1.5T  802G  688G  54% /data/ssd4
> /dev/sda                           1.5T  798G  692G  54% /data/ssd1
> /dev/sdb                           1.5T  681G  810G  46% /data/ssd2
> /dev/sdc                           1.5T  558G  932G  38% /data/ssd3
> Cassandra load is about 200GB and the rest of the space is snapshots
> CPU
> cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
> CPU(s):                64
> Thread(s) per core:    2
> Core(s) per socket:    16
> Socket(s):             2
> *Description of problem:*
> During repair of the cluster, we are seeing multiple corruptions in the log 
> files on a lot of instances.  There seems to be no pattern to the corruption. 
>  It seems that the repair job is finding all the corrupted files for us.  The 
> repair will hang on the node where the corrupted file is found.  To fix this 
> we remove/rename the datafile and bounce the Cassandra instance.  Our 
> hardware/OS team have stated there is no problem on their side.  I do not 
> believe it is the repair causing the corruption.
>  
> So let me give you an example of a corrupted file and maybe someone might be 
> able to work through it with me?
> When this corrupted file was reported in the log it looks like it was the 
> repair that found it.
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until 
> "2019-08-07 22:45:00"
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Writing 
> Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle 
> tree for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, 
> (-1476350953672479093,-1474461
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread 
> Thread[ValidationExecutor:825,1,main]
> Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /x/ssd2/data/KeyspaceMetadata/x-1e453cb0
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:365)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:361) 
> ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:340)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:366)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:81)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 

[jira] [Commented] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment

2019-09-02 Thread Phil O Conduin (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920810#comment-16920810
 ] 

Phil O Conduin commented on CASSANDRA-15274:


Hi,


We managed to remove the CRC check from the code and build it. When we run 
sstable2json on a corrupt file we are not seeing a CRC issue - this time it is 
not the CRC check but an exception during an attempt to decompress the chunk, 
so I think we have the answer to our question: it is not just a CRC check problem.

 

As another line of investigation, we created a script that generates MD5 
checksums for all sstable files. The script runs from cron twice per day and 
logs the checksum of every sstable file.
We capture the MD5 and compare it over the lifetime of the file, and we have 
proved that the checksum does not change after the file is written - i.e. the 
file is already bad when it first hits disk - which would indicate a possible 
bug in Cassandra at the time of compacting/writing the file.

 

 

Taking the latest file for example:

*First reported in cassandra log Sep 01 08:39:48*

{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
ERROR 07:39:48 Failed creating a merkle tree for [repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 on 
KeyspaceMetadata/CF_ConversationIndex1, 
(-2320162195562336336,-2318312110429971422]], /10.2.41.38 (see log for 
details)}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
INFO 07:39:48 [repair #fb265fa0-cc8a-11e9-9296-5b5fb0093f98] Received merkle 
tree for CF_ConversationIndex1 from /10.2.41.38}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
WARN 07:39:48 [repair #fb265fa0-cc8a-11e9-9296-5b5fb0093f98] 
CF_ConversationIndex1 sync failed}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
INFO 07:39:48 [repair #fb265fa0-cc8a-11e9-9296-5b5fb0093f98] Requesting merkle 
trees for CF_RecentIndex (to [/10.2.41.34, /10.2.41.48, /10.2.57.54, 
/10.2.57.46, /10.2.57.12, /10.2.41.38])}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
ERROR 07:39:48 Exception in thread Thread[RepairJobTask:24,5,main]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
org.apache.cassandra.exceptions.RepairException: [repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 on 
KeyspaceMetadata/CF_ConversationIndex1, 
(-2320162195562336336,-2318312110429971422]] Validation failed in /10.2.41.38}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:178)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:478)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:174)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.lang.Thread.run(Thread.java:748) [na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
ERROR 07:39:48 Exception in thread Thread[ValidationExecutor:53,1,main]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
org.apache.cassandra.io.FSReadError: 
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
/data/ssd2/data/KeyspaceMetadata/CF_ConversationIndex1-1e77be609c7911e8ac12255de1fb512a/lb-26352-big-Data.db}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:365)
 

[jira] [Commented] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment

2019-08-23 Thread Phil O Conduin (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914135#comment-16914135
 ] 

Phil O Conduin commented on CASSANDRA-15274:


Hi [~benedict]

We are having trouble building the code to bypass the CRC check (setCrcCheckChance).

On the new build when we run sstable2json it still hits the chunk exception.

Any chance you could help us on our version of the code - 2.2.13?

> Multiple Corrupt datafiles across entire environment 
> -
>
> Key: CASSANDRA-15274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15274
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Phil O Conduin
>Priority: Normal
>
> Cassandra Version: 2.2.13
> PRE-PROD environment.
>  * 2 datacenters.
>  * 9 physical servers in each datacenter - (_Cisco UCS C220 M4 SFF_)
>  * 4 Cassandra instances on each server (cass_a, cass_b, cass_c, cass_d)
>  * 72 Cassandra instances across the 2 data centres, 36 in site A, 36 in site 
> B.
> We also have 2 Reaper Nodes we use for repair.  One reaper node in each 
> datacenter each running with its own Cassandra back end in a cluster together.
> OS Details [Red Hat Linux]
> cass_a@x 0 10:53:01 ~ $ uname -a
> Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 
> x86_64 x86_64 GNU/Linux
> cass_a@x 0 10:57:31 ~ $ cat /etc/*release
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.6 (Maipo)"
> ID="rhel"
> Storage Layout 
> cass_a@xx 0 10:46:28 ~ $ df -h
> Filesystem                         Size  Used Avail Use% Mounted on
> /dev/mapper/vg01-lv_root            20G  2.2G   18G  11% /
> devtmpfs                            63G     0   63G   0% /dev
> tmpfs                               63G     0   63G   0% /dev/shm
> tmpfs                               63G  4.1G   59G   7% /run
> tmpfs                               63G     0   63G   0% /sys/fs/cgroup
> >> 4 cassandra instances
> /dev/sdd                           1.5T  802G  688G  54% /data/ssd4
> /dev/sda                           1.5T  798G  692G  54% /data/ssd1
> /dev/sdb                           1.5T  681G  810G  46% /data/ssd2
> /dev/sdc                           1.5T  558G  932G  38% /data/ssd3
> Cassandra load is about 200GB and the rest of the space is snapshots
> CPU
> cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
> CPU(s):                64
> Thread(s) per core:    2
> Core(s) per socket:    16
> Socket(s):             2
> *Description of problem:*
> During repair of the cluster, we are seeing multiple corruptions in the log 
> files on a lot of instances.  There seems to be no pattern to the corruption. 
>  It seems that the repair job is finding all the corrupted files for us.  The 
> repair will hang on the node where the corrupted file is found.  To fix this 
> we remove/rename the datafile and bounce the Cassandra instance.  Our 
> hardware/OS team have stated there is no problem on their side.  I do not 
> believe it is the repair causing the corruption.
>  
> So let me give you an example of a corrupted file and maybe someone might be 
> able to work through it with me?
> When this corrupted file was reported in the log it looks like it was the 
> repair that found it.
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until 
> "2019-08-07 22:45:00"
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Writing 
> Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle 
> tree for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, 
> (-1476350953672479093,-1474461
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread 
> Thread[ValidationExecutor:825,1,main]
> Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /x/ssd2/data/KeyspaceMetadata/x-1e453cb0
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:365)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:361) 
> ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:340)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> 

[jira] [Commented] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment

2019-08-13 Thread Phil O Conduin (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906344#comment-16906344
 ] 

Phil O Conduin commented on CASSANDRA-15274:


[~benedict] thanks a lot for the explanation.  We have a ticket open with Cisco 
for help on this also.

Can you explain a little more about how we would validate for actual corruption - 
how would I go about comparing the data written to files?

> Multiple Corrupt datafiles across entire environment 
> -
>
> Key: CASSANDRA-15274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15274
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Phil O Conduin
>Priority: Normal
>
> Cassandra Version: 2.2.13
> PRE-PROD environment.
>  * 2 datacenters.
>  * 9 physical servers in each datacenter - (_Cisco UCS C220 M4 SFF_)
>  * 4 Cassandra instances on each server (cass_a, cass_b, cass_c, cass_d)
>  * 72 Cassandra instances across the 2 data centres, 36 in site A, 36 in site 
> B.
> We also have 2 Reaper Nodes we use for repair.  One reaper node in each 
> datacenter each running with its own Cassandra back end in a cluster together.
> OS Details [Red Hat Linux]
> cass_a@x 0 10:53:01 ~ $ uname -a
> Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 
> x86_64 x86_64 GNU/Linux
> cass_a@x 0 10:57:31 ~ $ cat /etc/*release
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.6 (Maipo)"
> ID="rhel"
> Storage Layout 
> cass_a@xx 0 10:46:28 ~ $ df -h
> Filesystem                         Size  Used Avail Use% Mounted on
> /dev/mapper/vg01-lv_root            20G  2.2G   18G  11% /
> devtmpfs                            63G     0   63G   0% /dev
> tmpfs                               63G     0   63G   0% /dev/shm
> tmpfs                               63G  4.1G   59G   7% /run
> tmpfs                               63G     0   63G   0% /sys/fs/cgroup
> >> 4 cassandra instances
> /dev/sdd                           1.5T  802G  688G  54% /data/ssd4
> /dev/sda                           1.5T  798G  692G  54% /data/ssd1
> /dev/sdb                           1.5T  681G  810G  46% /data/ssd2
> /dev/sdc                           1.5T  558G  932G  38% /data/ssd3
> Cassandra load is about 200GB and the rest of the space is snapshots
> CPU
> cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
> CPU(s):                64
> Thread(s) per core:    2
> Core(s) per socket:    16
> Socket(s):             2
> *Description of problem:*
> During repair of the cluster, we are seeing multiple corruptions in the log 
> files on a lot of instances.  There seems to be no pattern to the corruption. 
>  It seems that the repair job is finding all the corrupted files for us.  The 
> repair will hang on the node where the corrupted file is found.  To fix this 
> we remove/rename the datafile and bounce the Cassandra instance.  Our 
> hardware/OS team have stated there is no problem on their side.  I do not 
> believe it is the repair causing the corruption.
>  
> So let me give you an example of a corrupted file and maybe someone might be 
> able to work through it with me?
> When this corrupted file was reported in the log it looks like it was the 
> repair that found it.
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until 
> "2019-08-07 22:45:00"
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Writing 
> Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle 
> tree for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, 
> (-1476350953672479093,-1474461
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread 
> Thread[ValidationExecutor:825,1,main]
> Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /x/ssd2/data/KeyspaceMetadata/x-1e453cb0
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:365)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:361) 
> ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:340)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> 

[jira] [Created] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment

2019-08-12 Thread Phil O Conduin (JIRA)
Phil O Conduin created CASSANDRA-15274:
--

 Summary: Multiple Corrupt datafiles across entire environment 
 Key: CASSANDRA-15274
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15274
 Project: Cassandra
  Issue Type: Bug
  Components: Local/Compaction
Reporter: Phil O Conduin


Cassandra Version: 2.2.13

PRE-PROD environment.
 * 2 datacenters.
 * 9 physical servers in each datacenter - (_Cisco UCS C220 M4 SFF_)
 * 4 Cassandra instances on each server (cass_a, cass_b, cass_c, cass_d)
 * 72 Cassandra instances across the 2 data centres, 36 in site A, 36 in site B.

We also have 2 Reaper Nodes we use for repair.  One reaper node in each 
datacenter each running with its own Cassandra back end in a cluster together.

OS Details [Red Hat Linux]
cass_a@x 0 10:53:01 ~ $ uname -a
Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 
x86_64 x86_64 GNU/Linux

cass_a@x 0 10:57:31 ~ $ cat /etc/*release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.6 (Maipo)"
ID="rhel"

Storage Layout 
cass_a@xx 0 10:46:28 ~ $ df -h
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg01-lv_root            20G  2.2G   18G  11% /
devtmpfs                            63G     0   63G   0% /dev
tmpfs                               63G     0   63G   0% /dev/shm
tmpfs                               63G  4.1G   59G   7% /run
tmpfs                               63G     0   63G   0% /sys/fs/cgroup
>> 4 cassandra instances
/dev/sdd                           1.5T  802G  688G  54% /data/ssd4
/dev/sda                           1.5T  798G  692G  54% /data/ssd1
/dev/sdb                           1.5T  681G  810G  46% /data/ssd2
/dev/sdc                           1.5T  558G  932G  38% /data/ssd3

Cassandra load is about 200GB and the rest of the space is snapshots

CPU
cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
CPU(s):                64
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             2

*Description of problem:*
During repair of the cluster, we are seeing multiple corruptions in the log 
files on a lot of instances.  There seems to be no pattern to the corruption.  
It seems that the repair job is finding all the corrupted files for us.  The 
repair will hang on the node where the corrupted file is found.  To fix this we 
remove/rename the datafile and bounce the Cassandra instance.  Our hardware/OS 
team have stated there is no problem on their side.  I do not believe it is the 
repair causing the corruption.
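
Roughly, the workaround amounts to something like the following sketch (the 
keyspace/table path and sstable number are illustrative; the cassmeta-cass_b 
service name is from the journalctl example below):

# stop the affected instance, set the corrupt sstable's components aside, restart
sudo systemctl stop cassmeta-cass_b.service
mkdir -p /data/ssd2/quarantine
mv /data/ssd2/data/KeyspaceMetadata/CF-xxxx/lb-26352-big-* /data/ssd2/quarantine/
sudo systemctl start cassmeta-cass_b.service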

 

So let me give you an example of a corrupted file and maybe someone might be 
able to work through it with me?

When this corrupted file was reported in the log it looks like it was the 
repair that found it.

$ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until 
"2019-08-07 22:45:00"

Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Writing 
Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, 
0%/0% of on/off-heap limit)
Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle tree 
for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, 
(-1476350953672479093,-1474461
Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread 
Thread[ValidationExecutor:825,1,main]
Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: 
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
/x/ssd2/data/KeyspaceMetadata/x-1e453cb0
Aug 07 22:30:33 cassandra[34611]: at 
org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:365)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Aug 07 22:30:33 cassandra[34611]: at 
org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:361) 
~[apache-cassandra-2.2.13.jar:2.2.13]
Aug 07 22:30:33 cassandra[34611]: at 
org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:340)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Aug 07 22:30:33 cassandra[34611]: at 
org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Aug 07 22:30:33 cassandra[34611]: at 
org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:366)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Aug 07 22:30:33 cassandra[34611]: at 
org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:81)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Aug 07 22:30:33 cassandra[34611]: at 
org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:52) 
~[apache-cassandra-2.2.13.jar:2.2.13]
Aug 07 22:30:33 cassandra[34611]: at 
org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:46) 
~[apache-cassandra-2.2.13.jar:2.2.13]
Aug 07 22:30:33 cassandra[34611]: at