>>> Eric Robinson <[email protected]> wrote on 25.09.2017 at 23:03 in message <dm5pr03mb27290264e6e09f5c8f744467fa...@dm5pr03mb2729.namprd03.prod.outlook.com>:
> Problem:
>
> Under high write load, DRBD exhibits data corruption. In repeated tests over
> a month-long period, file corruption occurred after 700-900 GB of data had
> been written to the DRBD volume.
>
> Testing Platform:
>
> 2 x Dell PowerEdge R610 servers
> 32GB RAM
> 6 x Samsung SSD 840 Pro 512GB (latest firmware)
> Dell H200 JBOD Controller
> SUSE Linux Enterprise Server 12 SP2 (kernel 4.4.74-92.32)
> Gigabit network, 900 Mbps throughput, < 1ms latency, 0 packet loss
>
> Initial Setup:
>
> Create 2 RAID-0 software arrays using either mdadm or LVM
> On Array 1: sda5 through sdf5, create DRBD replicated volume
> (drbd0) with an ext4 filesystem
> On Array 2: sda6 through sdf6, create LVM logical volume
> with an ext4 filesystem
>
> Procedure:
>
> Download and build the TrimTester SSD burn-in and TRIM
> verification tool from Algolia (https://github.com/algolia/trimtester).
> Run TrimTester against the filesystem on drbd0, wait for
> corruption to occur
> Run TrimTester against the non-drbd backed filesystem, wait
> for corruption to occur

I don't know the tool, but isn't it expecting a bit much that the tool will trim the correct blocks through drbd->LVM/mdadm->device? Why not use the tool on the affected devices directly?

>
> Results:
>
> In multiple tests over a period of a month, TrimTester would report file
> corruption when run against the DRBD volume after 700-900 GB of data had been
> written. The error would usually appear within an hour or two. However, when
> running it against the non-DRBD volume on the same physical drives, no
> corruption would occur. We could let the burn-in run for 15+ hours and write
> 20+ TB of data without a problem. Results were the same with DRBD 8.4 and
> 9.0. We also tried disabling the TRIM-testing part of TrimTester and using it
> as a simple burn-in tool, just to make sure that SSD TRIM was not a factor.
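
For the burn-in run without TRIM, a plain checksum-based write/verify pass may help to take the tool itself out of the equation. Below is a minimal sketch, assuming Python 3 is available on the test nodes. The function names, file names and sizes are hypothetical, and this is not how TrimTester works internally; the script only writes random data, records a SHA-256 sum per file, and re-reads everything to compare.

```python
import hashlib
import os
import tempfile


def write_files(directory, count, size, chunk=1 << 20):
    """Write `count` files of `size` random bytes; return {path: sha256 hex}."""
    digests = {}
    for i in range(count):
        path = os.path.join(directory, "burnin_%06d.bin" % i)
        h = hashlib.sha256()
        remaining = size
        with open(path, "wb") as f:
            while remaining > 0:
                block = os.urandom(min(chunk, remaining))
                f.write(block)
                h.update(block)
                remaining -= len(block)
            f.flush()
            os.fsync(f.fileno())  # push the data down to the block layer
        digests[path] = h.hexdigest()
    return digests


def verify_files(digests, chunk=1 << 20):
    """Re-read every file and return the paths whose checksum changed."""
    corrupted = []
    for path, expected in digests.items():
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                h.update(block)
        if h.hexdigest() != expected:
            corrupted.append(path)
    return corrupted


if __name__ == "__main__":
    # Hypothetical target: replace the temp dir with the mount point under test.
    with tempfile.TemporaryDirectory() as target:
        sums = write_files(target, count=4, size=3 * (1 << 20) + 123)
        print("corrupted files:", verify_files(sums))
```

Note that a verify pass run right after the write pass may be served from the page cache rather than from the device, so the data set would have to be clearly larger than the 32GB of RAM (or the caches dropped between the passes) for the comparison to say anything about the storage stack.
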
>
> Conclusion:
>
> We are aware of some controversy surrounding the Samsung SSD 8XX series
> drives; however, the issues related to that controversy were resolved and no
> longer exist as of kernel 4.2. The 840 Pro drives are confirmed to support
> RZAT. Also, the data corruption would only occur when writing through the
> DRBD layer. It never occurred when bypassing the DRBD layer and writing
> directly to the drives, so we must conclude that DRBD has a data corruption
> bug under high write load. However, we would be more than happy to be proved
> wrong.
>
> --
> Eric Robinson

_______________________________________________
Users mailing list: [email protected]
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
