[jira] [Comment Edited] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment
[ https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920810#comment-16920810 ] Phil O Conduin edited comment on CASSANDRA-15274 at 9/5/19 2:18 PM:

Hi,

We managed to remove the CRC check from the code and build. When we run sstable2json on a corrupt file we are no longer seeing an issue with CRC. This time it is not the CRC check but an exception during an attempt to decompress the chunk, so I think we have the answer to our question - it is not just a CRC check problem.

As another area of investigation of this issue, we decided to create a script that generates MD5 checksums of all sstable files. This script runs from cron twice per day and logs the checksums of all sstable files. We capture the MD5 and then compare it over the lifetime of the file. We have proved that the MD5 checksum is not changing. This would indicate a possible bug in Cassandra at the time of compacting/writing the file.

Taking the latest file for example, it was first reported in the cassandra log at Sep 01 08:39:48:

Sep 01 08:39:48 hostname cassandra[16223]: ERROR 07:39:48 Failed creating a merkle tree for repair #fb265fa0-cc8a-11e9-9296-5b5fb0093f98 on KeyspaceMetadata/CF, (-2320162195562336336,-2318312110429971422], /x.x.x.x (see log for details)
Sep 01 08:39:48 hostname cassandra[16223]: INFO 07:39:48 repair #fb265fa0-cc8a-11e9-9296-5b5fb0093f98 Received merkle tree for CF from /x.x.x.x
Sep 01 08:39:48 hostname cassandra[16223]: WARN 07:39:48 repair #fb265fa0-cc8a-11e9-9296-5b5fb0093f98 CF sync failed
Sep 01 08:39:48 hostname cassandra[16223]: INFO 07:39:48 repair #fb265fa0-cc8a-11e9-9296-5b5fb0093f98 Requesting merkle trees for CF_RecentIndex (to [/x.x.x.x, /1x.x.x.x, /x.x.x.x, /x.x.x.x, /x.x.x.x, /x.x.x.x])
Sep 01 08:39:48 hostname cassandra[16223]: ERROR 07:39:48 Exception in thread Thread[RepairJobTask:24,5,main]
Sep 01 08:39:48 hostname cassandra[16223]: org.apache.cassandra.exceptions.RepairException: repair #fb265fa0-cc8a-11e9-9296-5b5fb0093f98 on KeyspaceMetadata/CF, (-2320162195562336336,-2318312110429971422] Validation failed in /x.x.x.x
Sep 01 08:39:48 hostname cassandra[16223]: at org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) ~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:178) ~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:478) ~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:174) ~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_172]
Sep 01 08:39:48 hostname cassandra[16223]: at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]
Sep 01 08:39:48 hostname cassandra[16223]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_172]
Sep 01 08:39:48 hostname cassandra[16223]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_172]
Sep 01 08:39:48 hostname cassandra[16223]: at java.lang.Thread.run(Thread.java:748) [na:1.8.0_172]
Sep 01 08:39:48 hostname cassandra[16223]: ERROR 07:39:48 Exception in thread Thread[ValidationExecutor:53,1,main]
Sep 01 08:39:48 hostname cassandra[16223]: org.apache.cassandra.io.FSReadError: org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /data/ssd2/data/KeyspaceMetadata/CF-1e77be609c7911e8ac12255de1fb512a/lb-26352-big-Data.db
Sep 01 08:39:48 hostname cassandra[16223]: at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:365) ~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:361) ~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:324) ~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:132) ~[apache-cassandra-2.2.13.jar:2.2.13]
Sep 01 08:39:48 hostname cassandra[16223]: at
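The cron-driven checksum-tracking approach described above could look roughly like the sketch below. The function name, the data-directory argument, and the log-file layout are illustrative assumptions, not the reporter's actual script:

```shell
#!/bin/sh
# Sketch of an SSTable checksum tracker (hypothetical names/paths).
# Intended to run from cron twice per day; it records the MD5 of every
# *-Data.db file into a history log and prints a CHANGED line whenever a
# file's checksum differs from the last recorded value.
check_sstables() {
    dir=$1   # root of the Cassandra data directories, e.g. /data
    log=$2   # append-only history log: "<md5> <path>" per line
    find "$dir" -name '*-Data.db' 2>/dev/null | while read -r f; do
        sum=$(md5sum "$f" | awk '{print $1}')
        # most recent checksum previously recorded for this file, if any
        prev=$(grep -F " $f" "$log" 2>/dev/null | tail -n1 | awk '{print $1}')
        if [ -n "$prev" ] && [ "$prev" != "$sum" ]; then
            echo "CHANGED: $f ($prev -> $sum)"
        fi
        echo "$sum $f" >>"$log"
    done
}
```

A crontab entry such as `0 6,18 * * * check_sstables /data /var/log/sstable_md5.log` (again, hypothetical paths) would give the twice-daily cadence the comment describes; an unchanging MD5 over a file's lifetime then rules out post-write modification on disk.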
[jira] [Comment Edited] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment
[ https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908894#comment-16908894 ] Vladimir Vavro edited comment on CASSANDRA-15274 at 8/16/19 9:30 AM:

Since the affected version is 2.2.x there is no sstabledump available, but there is sstable2json. We tried to export one file and the attempt failed - but it looks like it again failed during the CRC check, based on this part of the error message:

Caused by: org.apache.cassandra.io.compress.CorruptBlockException: (/data/ssd2/data/KeyspaceMetadata/CF_ConversationIndex1-1e77be609c7911e8ac12255de1fb512a/lb-10664-big-Data.db): corruption detected, chunk at 7392105638 of length 35173.
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferMmap(CompressedRandomAccessReader.java:185)

Is it possible that sstable2json uses the same code to handle the data as Cassandra normally does? If that is true, is it different for the newer utilities sstableexport/sstabledump?
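The thread distinguishes two failure modes when reading a compressed sstable chunk: the stored block checksum can mismatch, or the payload can fail to decompress even when the checksum passes. A rough illustration of why these are independent checks, using Python's zlib rather than Cassandra's actual CompressedRandomAccessReader (the function name and layout here are assumptions for illustration only):

```python
import zlib

def read_chunk(chunk: bytes, stored_crc: int) -> bytes:
    """Validate then decompress one compressed chunk.

    Mirrors the two failure modes seen in the reports above:
    a checksum mismatch vs. an exception while decompressing.
    """
    # failure mode 1: the recomputed CRC does not match the stored one
    if zlib.crc32(chunk) & 0xFFFFFFFF != stored_crc:
        raise ValueError("checksum mismatch")
    # failure mode 2: checksum passes, but the payload will not decompress
    try:
        return zlib.decompress(chunk)
    except zlib.error as exc:
        raise ValueError(f"corrupt chunk: {exc}") from exc

# a valid chunk round-trips cleanly
payload = zlib.compress(b"sstable row data")
crc = zlib.crc32(payload) & 0xFFFFFFFF
assert read_chunk(payload, crc) == b"sstable row data"
```

Since the reporter removed the CRC check and still saw decompression exceptions, the corruption here is of the second kind: the bytes on disk are internally inconsistent as a compressed stream, not merely mismatched against their stored checksum.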
> Multiple Corrupt datafiles across entire environment
> -
>
> Key: CASSANDRA-15274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15274
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Compaction
> Reporter: Phil O Conduin
> Priority: Normal
>
> Cassandra Version: 2.2.13
> PRE-PROD environment.
> * 2 datacenters.
> * 9 physical servers in each datacenter (_Cisco UCS C220 M4 SFF_)
> * 4 Cassandra instances on each server (cass_a, cass_b, cass_c, cass_d)
> * 72 Cassandra instances across the 2 data centres, 36 in site A, 36 in site B.
> We also have 2 Reaper nodes we use for repair, one in each datacenter, each running with its own Cassandra back end in a cluster together.
> OS Details [Red Hat Linux]
> cass_a@x 0 10:53:01 ~ $ uname -a
> Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
> cass_a@x 0 10:57:31 ~ $ cat /etc/*release
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.6 (Maipo)"
> ID="rhel"
> Storage Layout
> cass_a@xx 0 10:46:28 ~ $ df -h
> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/vg01-lv_root 20G 2.2G 18G 11% /
> devtmpfs 63G 0 63G 0% /dev
> tmpfs 63G 0 63G 0% /dev/shm
> tmpfs 63G 4.1G 59G 7% /run
> tmpfs 63G 0 63G 0% /sys/fs/cgroup
> >> 4 cassandra instances
> /dev/sdd 1.5T 802G 688G 54% /data/ssd4
> /dev/sda 1.5T 798G 692G 54% /data/ssd1
> /dev/sdb 1.5T 681G 810G 46% /data/ssd2
> /dev/sdc 1.5T 558G 932G 38% /data/ssd3
> Cassandra load is about 200GB and the rest of the space is snapshots
> CPU
> cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
> CPU(s): 64
> Thread(s) per core: 2
> Core(s) per socket: 16
> Socket(s): 2
> *Description of problem:*
> During repair of the cluster, we are seeing multiple corruptions in the log files on a lot of instances. There seems to be no pattern to the corruption. It seems that the repair job is finding all the corrupted files for us. The repair will hang on the node where the corrupted file is found. To fix this we remove/rename the datafile and bounce the Cassandra instance. Our hardware/OS team have stated there is no problem on their side. I do not believe it is the repair causing the corruption.
>
> So let me give you an example of a corrupted file and maybe someone might be able to work through it with me?
> When this corrupted file was reported in the log it looks like it was the repair that found it.
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until "2019-08-07 22:45:00"
> Aug 07 22:30:33 cassandra[34611]: INFO
[jira] [Comment Edited] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment
[ https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907229#comment-16907229 ] Benedict edited comment on CASSANDRA-15274 at 8/14/19 12:30 PM:

bq. if they print their entire contents successfully there's already a reasonable chance that the data is not corrupted

This comment was alluding to that likelihood - but that we would instead fail to parse the data because of corruption of the stream, long before we printed any garbage out. If we manage to print out, and we do this for every "corrupted" block (and there are many of them), it becomes very likely (but not certain) that the files aren't truly corrupted.

> When this corrupted file was reported in the log it looks like it was the repair that found it.
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until "2019-08-07 22:45:00"
> Aug 07 22:30:33 cassandra[34611]: INFO 21:30:33 Writing Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, 0%/0% of on/off-heap limit)
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle tree for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, (-1476350953672479093,-1474461
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread Thread[ValidationExecutor:825,1,main]
> Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /x/ssd2/data/KeyspaceMetadata/x-1e453cb0
> Aug 07 22:30:33
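The remove/rename-and-bounce workaround described in the report could be scripted roughly as below. The helper name and quarantine directory are illustrative assumptions; note that all components of the affected sstable generation (Data, Index, Summary, etc. sharing the same prefix) should be moved together, not just the Data.db file:

```shell
#!/bin/sh
# Sketch of the workaround: quarantine every component of a corrupt
# sstable generation, after which the instance can be restarted.
# (Helper name and quarantine path are hypothetical.)
quarantine_sstable() {
    data_db=$1      # reported path, e.g. .../lb-26352-big-Data.db
    quarantine=$2   # directory to move the components into
    mkdir -p "$quarantine"
    prefix=${data_db%-Data.db}          # strip the component suffix
    for comp in "$prefix"-*; do         # Data, Index, Summary, CRC, ...
        mv "$comp" "$quarantine/"
    done
}
# Usage (paths from the log above; service name is an assumption):
#   quarantine_sstable \
#     /data/ssd2/data/KeyspaceMetadata/CF-1e77be609c7911e8ac12255de1fb512a/lb-26352-big-Data.db \
#     /var/tmp/sstable-quarantine
#   systemctl restart cassmeta-cass_b.service
```

Moving the files aside rather than deleting them preserves the corrupt sstables for later analysis (e.g. the sstable2json experiments discussed in this thread), and a subsequent repair restores the affected ranges from replicas.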