Re: [PATCH] btrfs-progs: RAID5:Inject data stripe corruption and verify scrub fixes it.

2017-02-21 Thread Lakshmipathi.G
> >
> >Looked into patch description:
> >
> >After scrubbing dev3 only:
> >0xcdcd (Good)  |  0xcdcd  | 0xcdcd (Bad)
> >    (D1)           (D2)         (P)
> >
> >So does the parity stripe (P) always get replaced by the exact content of
> >D1/D2 (data stripe), or by random data?
> 
> Neither. It's just the XOR result of D2 (never changed, 0xcdcd) and the old
> D1 (wrong, 0x0000).
> 0xcdcd XOR 0x0000 = 0xcdcd
> 
> So you got 0xcdcd, a bad result.
> 
> If you corrupt D1 with random data, then the parity will be random too.
> 
> >If it always gets replaced by the exact value from either D1 or D2, I think
> >the current script can be modified to detect that bug. If the parity gets
> >replaced by a random value, then it will make the task more difficult.
> 
> Not hard to detect.
> As the content is completely under your control, you know the correct parity
> value, and you can verify it very easily.
> 

Version 3 of this script calculates the exact data/parity locations, instead of
dumping data and searching for the locations. Tested with files of up to 8MB;
from the output, all 128 data-stripe and 64 parity-stripe locations seem fine.
The script consistently hits the parity bug.
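
For reference, the 128/64 figures follow from the stripe size. A rough sketch
of the arithmetic, assuming the 64KB data stripe from the script header, the
3-device RAID5 layout (two data stripes plus one parity stripe per full
stripe), and that "8MB" means 8 MiB. The function name stripe_counts is made
up for illustration and is not part of the test script.

```shell
#!/bin/bash
# stripe_counts FILE_BYTES NUM_DEVICES [STRIPE_BYTES]
# Print "<data stripes> <parity stripes>" for a file on RAID5.
# Assumption: one parity stripe per full stripe, 64KiB stripes by default.
stripe_counts() {
    local file_bytes=$1 num_devs=$2 stripe_bytes=${3:-65536}
    local data_per_full=$((num_devs - 1))
    # Round up: a partially filled stripe still occupies a stripe.
    local data=$(( (file_bytes + stripe_bytes - 1) / stripe_bytes ))
    local parity=$(( (data + data_per_full - 1) / data_per_full ))
    echo "$data $parity"
}

stripe_counts $((8 * 1024 * 1024)) 3   # prints: 128 64
```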


If the script gets accepted, I will add a few other corruption variants, such as:
- corrupt all even stripes (D2, D4, ...)
- corrupt all odd stripes (D1, D3, ...)
- corrupt all parity stripes
- corrupt both data stripes (D0 & D1) and expect an error message
(The above cases would also be covered for RAID6.)
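
A rough sketch of how one such variant could zero out stripes with dd, run
here against a scratch file rather than a real device. The 64KB stripe size
and the zeroing via dd come from the script header; the helper names
zero_stripe and stripe_is_zero, the scratch file, and the mapping of D1..D4
onto consecutive blocks of one file are assumptions for illustration only. On
a real RAID5 the stripes are spread across devices. status=none needs GNU dd.

```shell
#!/bin/bash
STRIPE=65536

# zero_stripe IMAGE INDEX: overwrite the INDEX-th 64KiB block with zeros.
# conv=notrunc keeps the rest of the image intact.
zero_stripe() {
    dd if=/dev/zero of="$1" bs="$STRIPE" seek="$2" count=1 conv=notrunc status=none
}

# stripe_is_zero IMAGE INDEX: succeed if that block holds only zero bytes.
stripe_is_zero() {
    [ "$(dd if="$1" bs="$STRIPE" skip="$2" count=1 status=none | tr -d '\0' | wc -c)" -eq 0 ]
}

img=$(mktemp)
trap 'rm -f "$img"' EXIT

# Four 64KiB blocks filled with 0xcd (octal 315), standing in for D1..D4.
head -c $((4 * STRIPE)) /dev/zero | tr '\0' '\315' > "$img"

# Corrupt the "even" stripes D2 and D4, i.e. 0-based blocks 1 and 3.
for idx in 1 3; do
    zero_stripe "$img" "$idx"
done

for idx in 0 1 2 3; do
    if stripe_is_zero "$img" "$idx"; then
        echo "block $idx: zeroed"
    else
        echo "block $idx: intact"
    fi
done
```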

thanks.

Cheers.
Lakshmipathi.G

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: RAID5:Inject data stripe corruption and verify scrub fixes it.

2017-02-16 Thread Lakshmipathi.G
On Thu, Feb 16, 2017 at 09:12:31AM +0800, Qu Wenruo wrote:
> 
> 
> At 02/16/2017 04:56 AM, Lakshmipathi.G wrote:
> >On Wed, Feb 15, 2017 at 05:29:33PM +0800, Qu Wenruo wrote:
> >>
> >>
> >>At 02/15/2017 05:03 PM, Lakshmipathi.G wrote:
> >>>Signed-off-by: Lakshmipathi.G 
> >>>---
> >>>.../020-raid5-datastripe-corruption/test.sh| 224 
> >>>+
> >>>1 file changed, 224 insertions(+)
> >>>create mode 100755 tests/misc-tests/020-raid5-datastripe-corruption/test.sh
> >>>
> >>>diff --git a/tests/misc-tests/020-raid5-datastripe-corruption/test.sh 
> >>>b/tests/misc-tests/020-raid5-datastripe-corruption/test.sh
> >>>new file mode 100755
> >>>index 000..d04c430
> >>>--- /dev/null
> >>>+++ b/tests/misc-tests/020-raid5-datastripe-corruption/test.sh
> >>>@@ -0,0 +1,224 @@
> >>>+#!/bin/bash
> >>>+#
> >>>+# Raid5: Inject data stripe corruption and fix them using scrub.
> >>>+#
> >>>+# Script will perform the following:
> >>>+# 1) Create Raid5 using 3 loopback devices.
> >>>+# 2) Ensure file layout is created in a predictable manner.
> >>>+#    Each data stripe(64KB) should uniquely start with 'DN',
> >>>+#    where N represents the data stripe number.(ex:D0,D1 etc)
> >>
> >>If you want really predictable layout, you could just upload compressed
> >>images for this purpose.
> >>
> >>Which makes things super easy, and unlike fstests, btrfs-progs self-test
> >>accepts such images.
> >>
> >>>+# 3) Once file is created with specific layout, gather data stripe details
> >>>+#    like devicename, position and actual on-disk data.
> >>>+# 4) Now use 'dd' to verify the data-stripe against its expected value
> >>>+#    and inject corruption by zero'ing out contents.
> >>>+# 5) After injecting corruption, running online-scrub is expected to fix
> >>>+#    the corrupted data stripe with the help of parity block and
> >>>+#    corresponding data stripe.
> >>
> >>You should also verify that the parity stripe is not corrupted.
> >>It's already known that RAID5/6 will corrupt parity while recovering a data
> >>stripe.
> >>
> >>Kernel patch for this, with detailed bug info.
> >>https://patchwork.kernel.org/patch/9553581/
> >>
> >>>+# 6) Finally, validate the data stripe has original un-corrupted value.
> >>>+#
> >>>+#  Note: This script doesn't handle parity block corruption.
> >>
> >>Normally such test case should belong to xfstests (renamed to fstests
> >>recently) as we're verifying kernel behavior, not btrfs-progs behavior.
> >>
> >>But since an fstests test case should be as generic as possible, and we don't
> >>have a good enough tool to corrupt a given data/parity stripe, my previously
> >>submitted test case was rejected.
> >>
> >>Personally speaking, this seems to be a dilemma for me.
> >>
> >>We really need a test case for this; bugs have been spotted where RAID5/6
> >>scrub corrupts P/Q while recovering a data stripe.
> >>But we need to get btrfs-corrupt-block into better shape before fstests
> >>will accept it, and that will take a while.
> >>
> >>So I really have no idea what we should do for such a test.
> >>
> >>Thanks,
> >>Qu
> >
> >Will check compressed images for parity stripe testing. I assume that at the
> >moment we support only a single static compressed image. Is adding more than
> >one static compressed image, like disk1.img disk2.img disk3.img for RAID,
> >supported in the existing test framework?
> 
> Not yet, but since you can use test.sh instead of running check_image() from
> the test framework, it's not a big problem.
> 
ok, will check it out.
> >
> >Using compressed images for checking parity seems a little easier than
> >computing it via scripting.
> >
> >Looked into patch description:
> >
> >After scrubbing dev3 only:
> >0xcdcd (Good)  |  0xcdcd  | 0xcdcd (Bad)
> >    (D1)           (D2)         (P)
> >
> >So does the parity stripe (P) always get replaced by the exact content of
> >D1/D2 (data stripe), or by random data?
> 
> Neither. It's just the XOR result of D2 (never changed, 0xcdcd) and the old
> D1 (wrong, 0x0000).
> 0xcdcd XOR 0x0000 = 0xcdcd
> 
> So you got 0xcdcd, a bad result.
> 
> If you corrupt D1 with random data, then the parity will be random too.
> 
> >If it always gets replaced by the exact value from either D1 or D2, I think
> >the current script can be modified to detect that bug. If the parity gets
> >replaced by a random value, then it will make the task more difficult.
> 
> Not hard to detect.
> As the content is completely under your control, you know the correct parity
> value, and you can verify it very easily.
> 
The script corrupts a data stripe (D1 or D2) in a random manner, so let's assume
the wrong parity will look random as well.

I tried to find a one-liner for computing the XOR of two strings:
str1 = "D0x"
str2 = "D1x"

but failed to figure it out. I think the parity will be
"0001"
for the above case. For a higher-numbered data stripe (D15), the parity will be
slightly different, like "1000".
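
One way to do this in plain bash, as a minimal sketch: XOR the two strings
byte by byte and print the result as hex. xor_hex is a made-up helper name,
and it assumes two equal-length ASCII strings; it is not taken from the test
script.

```shell
#!/bin/bash
# xor_hex A B: print the byte-wise XOR of two equal-length ASCII strings
# as a hex string, two hex digits per byte.
xor_hex() {
    local a=$1 b=$2 out="" i x y
    for ((i = 0; i < ${#a}; i++)); do
        # printf '%d' "'c" yields the numeric character code of c.
        printf -v x '%d' "'${a:i:1}"
        printf -v y '%d' "'${b:i:1}"
        out+=$(printf '%02x' $((x ^ y)))
    done
    echo "$out"
}

xor_hex "D0x" "D1x"   # prints: 000100
```

For the two strings above only the byte holding the stripe number differs
('0' XOR '1' = 0x01), so every other byte of the result is zero.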


Re: [PATCH] btrfs-progs: RAID5:Inject data stripe corruption and verify scrub fixes it.

2017-02-15 Thread Lakshmipathi.G
On Wed, Feb 15, 2017 at 05:29:33PM +0800, Qu Wenruo wrote:
> 
> 
> At 02/15/2017 05:03 PM, Lakshmipathi.G wrote:
> >Signed-off-by: Lakshmipathi.G 
> >---
> > .../020-raid5-datastripe-corruption/test.sh| 224 
> > +
> > 1 file changed, 224 insertions(+)
> > create mode 100755 tests/misc-tests/020-raid5-datastripe-corruption/test.sh
> >
> >diff --git a/tests/misc-tests/020-raid5-datastripe-corruption/test.sh 
> >b/tests/misc-tests/020-raid5-datastripe-corruption/test.sh
> >new file mode 100755
> >index 000..d04c430
> >--- /dev/null
> >+++ b/tests/misc-tests/020-raid5-datastripe-corruption/test.sh
> >@@ -0,0 +1,224 @@
> >+#!/bin/bash
> >+#
> >+# Raid5: Inject data stripe corruption and fix them using scrub.
> >+#
> >+# Script will perform the following:
> >+# 1) Create Raid5 using 3 loopback devices.
> >+# 2) Ensure file layout is created in a predictable manner.
> >+#    Each data stripe(64KB) should uniquely start with 'DN',
> >+#    where N represents the data stripe number.(ex:D0,D1 etc)
> 
> If you want really predictable layout, you could just upload compressed
> images for this purpose.
> 
> Which makes things super easy, and unlike fstests, btrfs-progs self-test
> accepts such images.
> 
> >+# 3) Once file is created with specific layout, gather data stripe details
> >+#    like devicename, position and actual on-disk data.
> >+# 4) Now use 'dd' to verify the data-stripe against its expected value
> >+#    and inject corruption by zero'ing out contents.
> >+# 5) After injecting corruption, running online-scrub is expected to fix
> >+#    the corrupted data stripe with the help of parity block and
> >+#    corresponding data stripe.
> 
> You should also verify that the parity stripe is not corrupted.
> It's already known that RAID5/6 will corrupt parity while recovering a data
> stripe.
> 
> Kernel patch for this, with detailed bug info.
> https://patchwork.kernel.org/patch/9553581/
> 
> >+# 6) Finally, validate the data stripe has original un-corrupted value.
> >+#
> >+#  Note: This script doesn't handle parity block corruption.
> 
> Normally such test case should belong to xfstests (renamed to fstests
> recently) as we're verifying kernel behavior, not btrfs-progs behavior.
> 
> But since an fstests test case should be as generic as possible, and we don't
> have a good enough tool to corrupt a given data/parity stripe, my previously
> submitted test case was rejected.
> 
> Personally speaking, this seems to be a dilemma for me.
> 
> We really need a test case for this; bugs have been spotted where RAID5/6
> scrub corrupts P/Q while recovering a data stripe.
> But we need to get btrfs-corrupt-block into better shape before fstests
> will accept it, and that will take a while.
> 
> So I really have no idea what we should do for such a test.
> 
> Thanks,
> Qu

Will check compressed images for parity stripe testing. I assume that at the
moment we support only a single static compressed image. Is adding more than
one static compressed image, like disk1.img disk2.img disk3.img for RAID,
supported in the existing test framework?

Using compressed images for checking parity seems a little easier than
computing it via scripting.

Looked into patch description:

After scrubbing dev3 only:
0xcdcd (Good)  |  0xcdcd  | 0xcdcd (Bad)
    (D1)           (D2)         (P)

So does the parity stripe (P) always get replaced by the exact content of
D1/D2 (data stripe), or by random data? If it always gets replaced by the
exact value from either D1 or D2, I think the current script can be modified
to detect that bug. If the parity gets replaced by a random value, then it
will make the task more difficult.

Yes, without better RAID support in tools like btrfs-corrupt-block, it will be
hard to play around with RAID to create test scripts.

Cheers.
Lakshmipathi.G