Re: Help RAID5 reshape Oops / backup-file

2007-10-17 Thread Nagilum

- Message from [EMAIL PROTECTED] -
Date: Tue, 16 Oct 2007 14:50:09 +0200
From: Nagilum [EMAIL PROTECTED]
Reply-To: Nagilum [EMAIL PROTECTED]
 Subject: Re: Help RAID5 reshape Oops / backup-file
  To: Neil Brown [EMAIL PROTECTED]
  Cc: linux-raid@vger.kernel.org



- Message from [EMAIL PROTECTED] -
Date: Tue, 16 Oct 2007 11:16:19 +1000
From: Neil Brown [EMAIL PROTECTED]
Reply-To: Neil Brown [EMAIL PROTECTED]
 Subject: Re: Help RAID5 reshape Oops / backup-file
  To: Nagilum [EMAIL PROTECTED]
  Cc: linux-raid@vger.kernel.org


Please let me know how it goes.

Also, if you could show me mdadm.conf and mdrun.conf from the
initrd, that might help.

Thanks,
NeilBrown




- End message from [EMAIL PROTECTED] -


Ok, the array reshaped successfully and is back in production. :)
Here is the content of the (old) initrd /etc/mdadm/mdadm.conf:

DEVICE partitions
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=25da80a6:d56eb9d6:c7780c0e:bc15422d


That needed updating, of course. I don't have an mdrun.conf on my
system or on the initrd.
I'll try to replicate the issue using a different machine and plain
files (let's see if that works) at the weekend and let you know if I
succeed.

Again, thank you so much for the patch!
Alex.


#_  __  _ __ http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__  _(_) /_  _  [EMAIL PROTECTED] \n +491776461165 #
#  // _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#   /___/ x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #




cakebox.homeunix.net - all the machine one needs..





Unsure of changes to array

2007-10-17 Thread Jonathan Gazeley

Dear all,

This is hopefully a simple question for you to answer, but I am fairly 
new to RAID and don't want to risk losing my data!


My setup is as follows:
- I have four 500GB disks. Each disk is split into a 5GB partition, and 
a 495GB partition.
- The four 5GB partitions are in a RAID-5 array, md0. CentOS is 
installed on this 15GB partition.
- The four 495GB partitions are in a RAID-5 array, md1. This partition 
holds my user data.


However, I have decided I no longer wish to have two arrays across the
disks. I've added a fifth disk on which I have installed the OS without
RAID, meaning the old md0 is currently unused. Can I simply remove the
four 5GB partitions, and resize the four 495GB partitions to fill the
entire disk? Will this break anything?


Before anybody tells me off for having the OS on a non-RAID disk, this 
is a home server and therefore high availability is not an issue. But 
keeping my data safe against disk failures is important to me.


Cheers,
Jonathan

--
Jonathan Gazeley
ResNet | Wireless Team
Information Services
--



Re: [BUG] Raid5 trouble

2007-10-17 Thread BERTRAND Joël

BERTRAND Joël wrote:

Hello,

I run 2.6.23 linux kernel on two T1000 (sparc64) servers. Each 
server has a partitionable raid5 array (/dev/md/d0) and I have to 
synchronize both raid5 volumes by raid1. Thus, I have tried to build a 
raid1 volume between /dev/md/d0p1 and /dev/sdi1 (exported by iscsi from 
the second server) and I obtain a BUG :


Root gershwin:[/usr/scripts]  mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1 
/dev/sdi1

...


Hello,

	I have fixed iscsi-target, and I have tested it. It now works without
any trouble. Patches were posted on the iscsi-target mailing list. When I
use iSCSI to access the foreign raid5 volume, it works fine. I can format
the foreign volume, copy large files onto it... But when I try to create a
new raid1 volume with a local raid5 volume and a foreign raid5 volume, I
receive my well-known Oops. You can find my dmesg after the Oops:


md: md_d0 stopped.
md: bind<sdd1>
md: bind<sde1>
md: bind<sdf1>
md: bind<sdg1>
md: bind<sdh1>

md: bind<sdc1>
raid5: device sdc1 operational as raid disk 0
raid5: device sdh1 operational as raid disk 5
raid5: device sdg1 operational as raid disk 4
raid5: device sdf1 operational as raid disk 3
raid5: device sde1 operational as raid disk 2
raid5: device sdd1 operational as raid disk 1
raid5: allocated 12518kB for md_d0
raid5: raid level 5 set md_d0 active with 6 out of 6 devices, algorithm 2
RAID5 conf printout:
 --- rd:6 wd:6
 disk 0, o:1, dev:sdc1
 disk 1, o:1, dev:sdd1
 disk 2, o:1, dev:sde1
 disk 3, o:1, dev:sdf1
 disk 4, o:1, dev:sdg1
 disk 5, o:1, dev:sdh1
 md_d0: p1
scsi3 : iSCSI Initiator over TCP/IP
scsi 3:0:0:0: Direct-Access IET  VIRTUAL-DISK 0PQ: 0 ANSI: 4
sd 3:0:0:0: [sdi] 2929451520 512-byte hardware sectors (1499879 MB)
sd 3:0:0:0: [sdi] Write Protect is off
sd 3:0:0:0: [sdi] Mode Sense: 77 00 00 08
sd 3:0:0:0: [sdi] Write cache: disabled, read cache: enabled, doesn't 
support DPO or FUA

sd 3:0:0:0: [sdi] 2929451520 512-byte hardware sectors (1499879 MB)
sd 3:0:0:0: [sdi] Write Protect is off
sd 3:0:0:0: [sdi] Mode Sense: 77 00 00 08
sd 3:0:0:0: [sdi] Write cache: disabled, read cache: enabled, doesn't 
support DPO or FUA

 sdi: sdi1
sd 3:0:0:0: [sdi] Attached SCSI disk
md: bind<md_d0p1>
md: bind<sdi1>
md: md7: raid array is not clean -- starting background reconstruction
raid1: raid set md7 active with 2 out of 2 mirrors
md: resync of RAID array md7
md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 20 
KB/sec) for resync.

md: using 256k window, over a total of 1464725632 blocks.
kernel BUG at drivers/md/raid5.c:380!
  \|/  \|/
  @'/ .. \`@
  /_| \__/ |_\
 \__U_/
md7_resync(4929): Kernel bad sw trap 5 [#1]
TSTATE: 80001606 TPC: 005ed50c TNPC: 005ed510 Y: 
Not tainted

TPC: get_stripe_work+0x1f4/0x200
g0: 0005 g1: 007c0400 g2: 0001 g3: 
00748400
g4: f800feeb6880 g5: f8000208 g6: f800e7598000 g7: 
00748528
o0: 0029 o1: 00715798 o2: 017c o3: 
0005
o4: 0006 o5: f800e8f0a060 sp: f800e759ad81 ret_pc: 
005ed504

RPC: get_stripe_work+0x1ec/0x200
l0: 0002 l1:  l2: f800e8f0a0a0 l3: 
f800e8f09fe8
l4: f800e8f0a088 l5: fff8 l6: 0005 l7: 
f800e8374000
i0: f800e8f0a028 i1:  i2: 0004 i3: 
f800e759b720
i4: 0080 i5: 0080 i6: f800e759ae51 i7: 
005f0274

I7: handle_stripe5+0x4fc/0x1340
Caller[005f0274]: handle_stripe5+0x4fc/0x1340
Caller[005f211c]: handle_stripe+0x24/0x13e0
Caller[005f4450]: make_request+0x358/0x600
Caller[00542890]: generic_make_request+0x198/0x220
Caller[005eb240]: sync_request+0x608/0x640
Caller[005fef7c]: md_do_sync+0x384/0x920
Caller[005ff8f0]: md_thread+0x38/0x140
Caller[00478b40]: kthread+0x48/0x80
Caller[004273d0]: kernel_thread+0x38/0x60
Caller[00478de0]: kthreadd+0x148/0x1c0
Instruction DUMP: 9210217c  7ff8f57f  90122398 91d02005 30680004 
0100  0100  0100  9de3bf00


I suspect a major bug in the raid5 code but I don't know how to debug it...

	md7 was created by mdadm -C /dev/md7 -l1 -n2 /dev/md/d0 /dev/sdi1.
/dev/md/d0 is a raid5 volume, and sdi is an iSCSI disk.


Regards,

JKB


mdadm 2.6.x regression, fails creation of raid1 w/ v1.0 sb and internal bitmap

2007-10-17 Thread Mike Snitzer
mdadm 2.4.1 through 2.5.6 works. mdadm-2.6's "Improve allocation and
use of space for bitmaps in version1 metadata"
(199171a297a87d7696b6b8c07ee520363f4603c1) would seem like the
offending change.  Using 1.2 metadata works.

I get the following using the tip of the mdadm git repo or any other
version of mdadm 2.6.x:

# mdadm --create /dev/md2 --run -l 1 --metadata=1.0 --bitmap=internal
-n 2 /dev/sdf --write-mostly /dev/nbd2
mdadm: /dev/sdf appears to be part of a raid array:
level=raid1 devices=2 ctime=Wed Oct 17 10:17:31 2007
mdadm: /dev/nbd2 appears to be part of a raid array:
level=raid1 devices=2 ctime=Wed Oct 17 10:17:31 2007
mdadm: RUN_ARRAY failed: Input/output error
mdadm: stopped /dev/md2

kernel log shows:
md2: bitmap initialized from disk: read 22/22 pages, set 715290 bits, status: 0
created bitmap (350 pages) for device md2
md2: failed to create bitmap (-5)
md: pers->run() failed ...
md: md2 stopped.
md: unbind<nbd2>
md: export_rdev(nbd2)
md: unbind<sdf>
md: export_rdev(sdf)
md: md2 stopped.


Re: [BUG] Raid5 trouble

2007-10-17 Thread Dan Williams
On 10/17/07, Dan Williams [EMAIL PROTECTED] wrote:
 On 10/17/07, BERTRAND Joël [EMAIL PROTECTED] wrote:
  BERTRAND Joël wrote:
   Hello,
  
   I run 2.6.23 linux kernel on two T1000 (sparc64) servers. Each
   server has a partitionable raid5 array (/dev/md/d0) and I have to
   synchronize both raid5 volumes by raid1. Thus, I have tried to build a
   raid1 volume between /dev/md/d0p1 and /dev/sdi1 (exported by iscsi from
   the second server) and I obtain a BUG :
  
   Root gershwin:[/usr/scripts]  mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1
   /dev/sdi1
   ...
 
  Hello,
 
  I have fixed iscsi-target, and I have tested it. It works now 
  without
  any trouble. Patches were posted on iscsi-target mailing list. When I
  use iSCSI to access to foreign raid5 volume, it works fine. I can format
  foreign volume, copy large files on it... But when I tried to create a
  new raid1 volume with a local raid5 volume and a foreign raid5 volume, I
  receive my well known Oops. You can find my dmesg after Oops :
 

 Can you send your .config and your bootup dmesg?


I found a problem which may lead to the operations count dropping
below zero.  If ops_complete_biofill() gets preempted in between the
following calls:

raid5.c:554 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
raid5.c:555 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);

...then get_stripe_work() can recount/re-acknowledge STRIPE_OP_BIOFILL
causing the assertion.  In fact, the 'pending' bit should always be
cleared first, but the other cases are protected by
spin_lock(&sh->lock).  Patch attached.
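
For readers following the race: below is a minimal, self-contained C sketch
of the interleaving described above. It models sh->ops.{pending,ack,complete}
and sh->ops.count with plain variables instead of the real struct stripe_head,
atomic bitops and spin_lock(&sh->lock), and the simplified "acknowledge
anything pending and not complete" rule is an illustrative assumption, not the
actual test_and_ack_op() logic from raid5.c.

/* toy model of the race -- not kernel code */
#include <stdio.h>

#define OP_BIOFILL 0x1UL

static unsigned long pending, ack, complete;
static int count;			/* outstanding acknowledged requests */

/* simplified stand-in for get_stripe_work(): acknowledge any op that
 * looks pending and is not marked complete, consuming one count */
static void get_stripe_work_sim(void)
{
	if ((pending & OP_BIOFILL) && !(complete & OP_BIOFILL)) {
		ack |= OP_BIOFILL;
		count--;		/* kernel: BUG_ON(sh->ops.count < 0) */
	}
}

int main(void)
{
	/* biofill was requested and has already been acknowledged once */
	pending = ack = OP_BIOFILL;
	complete = 0;
	count = 0;			/* its count was already consumed */

	/* ops_complete_biofill() starts: clears the ack bit ... */
	ack &= ~OP_BIOFILL;
	/* ... and is preempted here, before it clears the pending bit ... */

	/* ... meanwhile handle_stripe5() calls get_stripe_work() again */
	get_stripe_work_sim();		/* re-acknowledges: count drops to -1 */

	/* the completion finally resumes and clears pending */
	pending &= ~OP_BIOFILL;

	printf("count = %d (negative => the raid5.c:380 BUG fires)\n", count);
	return 0;
}

Doing the two clears under the stripe lock, or only marking the operation
complete and letting handle_stripe5() do the clearing (as the later "try2"
patch in this thread does), closes that window.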

--
Dan


fix-biofill-clear.patch
Description: Binary data


Re: [BUG] Raid5 trouble

2007-10-17 Thread BERTRAND Joël

Dan Williams wrote:

On 10/17/07, BERTRAND Joël [EMAIL PROTECTED] wrote:

BERTRAND Joël wrote:

Hello,

I run 2.6.23 linux kernel on two T1000 (sparc64) servers. Each
server has a partitionable raid5 array (/dev/md/d0) and I have to
synchronize both raid5 volumes by raid1. Thus, I have tried to build a
raid1 volume between /dev/md/d0p1 and /dev/sdi1 (exported by iscsi from
the second server) and I obtain a BUG :

Root gershwin:[/usr/scripts]  mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1
/dev/sdi1
...

Hello,

I have fixed iscsi-target, and I have tested it. It works now without
any trouble. Patches were posted on iscsi-target mailing list. When I
use iSCSI to access to foreign raid5 volume, it works fine. I can format
foreign volume, copy large files on it... But when I tried to create a
new raid1 volume with a local raid5 volume and a foreign raid5 volume, I
receive my well known Oops. You can find my dmesg after Oops :



	Your patch does not work for me. It was applied, a new kernel was
built, and I obtain the same Oops.



Can you send your .config and your bootup dmesg?


Yes, of course ;-) Both files are attached. My new Oops is :

kernel BUG at drivers/md/raid5.c:380!
  \|/  \|/
  @'/ .. \`@
  /_| \__/ |_\
 \__U_/
md7_resync(4258): Kernel bad sw trap 5 [#1]
TSTATE: 80001606 TPC: 005ed50c TNPC: 005ed510 Y: 
Not tainted

TPC: get_stripe_work+0x1f4/0x200

(exactly the same as the old one ;-) ). I have patched iscsi-target to
avoid an alignment bug on sparc64. Do you think a bug in ietd can produce
this kind of bug? The patch I have written for iscsi-target (against SVN)
is attached too.


Regards,

JKB
PROMLIB: Sun IEEE Boot Prom 'OBP 4.23.4 2006/08/04 20:45'
PROMLIB: Root node compatible: sun4v
Linux version 2.6.23 ([EMAIL PROTECTED]) (gcc version 4.1.3 20070831 
(prerelease) (Debian 4.1.2-16)) #7 SMP Wed Oct 17 17:52:22 CEST 2007
ARCH: SUN4V
Ethernet address: 00:14:4f:6f:59:fe
OF stdout device is: /[EMAIL PROTECTED]/[EMAIL PROTECTED]
PROM: Built device tree with 74930 bytes of memory.
MDESC: Size is 32560 bytes.
PLATFORM: banner-name [Sun Fire(TM) T1000]
PLATFORM: name [SUNW,Sun-Fire-T1000]
PLATFORM: hostid [846f59fe]
PLATFORM: serial# [00ab4130]
PLATFORM: stick-frequency [3b9aca00]
PLATFORM: mac-address [144f6f59fe]
PLATFORM: watchdog-resolution [1000 ms]
PLATFORM: watchdog-max-timeout [3153600 ms]
On node 0 totalpages: 522246
  Normal zone: 3583 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 518663 pages, LIFO batch:15
  Movable zone: 0 pages used for memmap
Built 1 zonelists in Zone order.  Total pages: 518663
Kernel command line: root=/dev/md0 ro md=0,/dev/sda4,/dev/sdb4 raid=noautodetect
md: Will configure md0 (super-block) from /dev/sda4,/dev/sdb4, below.
PID hash table entries: 4096 (order: 12, 32768 bytes)
clocksource: mult[1] shift[16]
clockevent: mult[8000] shift[31]
Console: colour dummy device 80x25
console [tty0] enabled
Dentry cache hash table entries: 524288 (order: 9, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 8, 2097152 bytes)
Memory: 4138072k available (2608k kernel code, 960k data, 144k init) 
[f800,fffc8000]
SLUB: Genslabs=23, HWalign=32, Order=0-2, MinObjects=8, CPUs=32, Nodes=1
Calibrating delay using timer specific routine.. 1995.16 BogoMIPS (lpj=3990330)
Mount-cache hash table entries: 512
Brought up 24 CPUs
xor: automatically using best checksumming function: Niagara
   Niagara   :   240.000 MB/sec
xor: using function: Niagara (240.000 MB/sec)
NET: Registered protocol family 16
PCI: Probing for controllers.
SUN4V_PCI: Registered hvapi major[1] minor[0]
/[EMAIL PROTECTED]: SUN4V PCI Bus Module
/[EMAIL PROTECTED]: PCI IO[e81000] MEM[ea]
/[EMAIL PROTECTED]: SUN4V PCI Bus Module
/[EMAIL PROTECTED]: PCI IO[f01000] MEM[f2]
PCI: Scanning PBM /[EMAIL PROTECTED]
PCI: Scanning PBM /[EMAIL PROTECTED]
ebus: No EBus's found.
SCSI subsystem initialized
NET: Registered protocol family 2
Time: stick clocksource has been installed.
Switched to high resolution mode on CPU 0
Switched to high resolution mode on CPU 20
Switched to high resolution mode on CPU 8
Switched to high resolution mode on CPU 21
Switched to high resolution mode on CPU 9
Switched to high resolution mode on CPU 22
Switched to high resolution mode on CPU 10
Switched to high resolution mode on CPU 23
Switched to high resolution mode on CPU 11
Switched to high resolution mode on CPU 12
Switched to high resolution mode on CPU 13
Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 14
Switched to high resolution mode on CPU 2
Switched to high resolution mode on CPU 15
Switched to high resolution mode on CPU 3
Switched to high resolution mode on CPU 16
Switched to high resolution mode on CPU 4
Switched to high resolution mode on CPU 17
Switched to high resolution mode on 

Re: [BUG] Raid5 trouble

2007-10-17 Thread BERTRAND Joël

Dan Williams wrote:

On 10/17/07, Dan Williams [EMAIL PROTECTED] wrote:

On 10/17/07, BERTRAND Joël [EMAIL PROTECTED] wrote:

BERTRAND Joël wrote:

Hello,

I run 2.6.23 linux kernel on two T1000 (sparc64) servers. Each
server has a partitionable raid5 array (/dev/md/d0) and I have to
synchronize both raid5 volumes by raid1. Thus, I have tried to build a
raid1 volume between /dev/md/d0p1 and /dev/sdi1 (exported by iscsi from
the second server) and I obtain a BUG :

Root gershwin:[/usr/scripts]  mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1
/dev/sdi1
...

Hello,

I have fixed iscsi-target, and I have tested it. It works now without
any trouble. Patches were posted on iscsi-target mailing list. When I
use iSCSI to access to foreign raid5 volume, it works fine. I can format
foreign volume, copy large files on it... But when I tried to create a
new raid1 volume with a local raid5 volume and a foreign raid5 volume, I
receive my well known Oops. You can find my dmesg after Oops :


Can you send your .config and your bootup dmesg?



I found a problem which may lead to the operations count dropping
below zero.  If ops_complete_biofill() gets preempted in between the
following calls:

raid5.c:554 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
raid5.c:555 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);

...then get_stripe_work() can recount/re-acknowledge STRIPE_OP_BIOFILL
causing the assertion.  In fact, the 'pending' bit should always be
cleared first, but the other cases are protected by
spin_lock(&sh->lock).  Patch attached.


Dan,

I have modified get_stripe_work like this :

static unsigned long get_stripe_work(struct stripe_head *sh)
{
	unsigned long pending;
	int ack = 0;
	int a,b,c,d,e,f,g;

	pending = sh->ops.pending;

	test_and_ack_op(STRIPE_OP_BIOFILL, pending);
	a=ack;
	test_and_ack_op(STRIPE_OP_COMPUTE_BLK, pending);
	b=ack;
	test_and_ack_op(STRIPE_OP_PREXOR, pending);
	c=ack;
	test_and_ack_op(STRIPE_OP_BIODRAIN, pending);
	d=ack;
	test_and_ack_op(STRIPE_OP_POSTXOR, pending);
	e=ack;
	test_and_ack_op(STRIPE_OP_CHECK, pending);
	f=ack;
	if (test_and_clear_bit(STRIPE_OP_IO, &sh->ops.pending))
		ack++;
	g=ack;

	sh->ops.count -= ack;

	if (sh->ops.count < 0)
		printk("%d %d %d %d %d %d %d\n", a,b,c,d,e,f,g);

	BUG_ON(sh->ops.count < 0);

	return pending;
}

and I obtain on console :

 1 1 1 1 1 2
kernel BUG at drivers/md/raid5.c:390!
  \|/  \|/
  @'/ .. \`@
  /_| \__/ |_\
 \__U_/
md7_resync(5409): Kernel bad sw trap 5 [#1]

If that can help you...

JKB


Re: experiences with raid5: stripe_queue patches

2007-10-17 Thread Bernd Schubert
Hello Dan, hello Neil,

thanks for your help!

On Tuesday 16 October 2007 19:31:08 Dan Williams wrote:
 On Mon, 2007-10-15 at 08:03 -0700, Bernd Schubert wrote:
  Hi,
 
  in order to tune raid performance I did some benchmarks with and
  without the
  stripe queue patches. 2.6.22 is only for comparison to rule out other
  effects, e.g. the new scheduler, etc.

 Thanks for testing!

 It seems there is a regression with these patches regarding the re-write
 performance; as you can see, it's almost 50% of what it should be.
 
  write  re-write   read   re-read
  480844.26  448723.48  707927.55  706075.02 (2.6.22 w/o SQ patches)
  487069.47  232574.30  709038.28  707595.09 (2.6.23 with SQ patches)
  469865.75  438649.88  711211.92  703229.00 (2.6.23 without SQ patches)

 A quick way to verify that it is a fairness issue is to simply not
 promote full stripe writes to their own list, debug patch follows:

I tested with that and the rewrite performance is better, but still not 
perfect:

  write       re-write     read       re-read
461794.14   377896.27  701793.81  693018.02


[...]

 I made a rough attempt at multi-threading raid5[1] a while back.
 However, this configuration only helps affinity, it does not address the
 cases where the load needs to be further rebalanced between cpus.

  Thanks,
  Bernd

 [1] http://marc.info/?l=linux-raidm=117262977831208w=2
 Note this implementation incorrectly handles the raid6 spare_page, we
 would need a spare_page per cpu.

Ah great, I will test this on Friday.

Thanks,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH


Re: Unsure of changes to array

2007-10-17 Thread Bill Davidsen

Jonathan Gazeley wrote:

Dear all,

This is hopefully a simple question for you to answer, but I am fairly 
new to RAID and don't want to risk losing my data!


My setup is as follows:
- I have four 500GB disks. Each disk is split into a 5GB partition, 
and a 495GB partition.
- The four 5GB partitions are in a RAID-5 array, md0. CentOS is 
installed on this 15GB partition.
- The four 495GB partitions are in a RAID-5 array, md1. This partition 
holds my user data.


However, I have decided I no longer wish to have two arrays across the
disks. I've added a fifth disk on which I have installed the OS
without RAID, meaning the old md0 is currently unused. Can I simply
remove the four 5GB partitions, and resize the four 495GB partitions
to fill the entire disk? Will this break anything?


Not unless you make a mistake in typing, or your hardware or power fails 
during the process. Of course in that case you will possibly lose 
everything.


Before anybody tells me off for having the OS on a non-RAID disk, this 
is a home server and therefore high availability is not an issue. But 
keeping my data safe against disk failures is important to me.


Having your o/s and swap on RAID reduces your chances of losing your
data. If it were me and reliability were the goal, I would have the o/s
mirrored on the first two drives (as seen by the BIOS) so you can still
boot if one fails hard, then put swap on RAID-10 in the little 5GB
partitions on the other three drives, then convert the raid-5 to raid-6
using the rest of the added fifth drive.


Anything which reduces the chances of an unclean shutdown improves the 
chances of keeping your data. A decent UPS is a big help in that regard. 
Disk failures on the data drives are not the only threat to your data!


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: mdadm 2.6.x regression, fails creation of raid1 w/ v1.0 sb and internal bitmap

2007-10-17 Thread Bill Davidsen

Mike Snitzer wrote:

mdadm 2.4.1 through 2.5.6 works. mdadm-2.6's "Improve allocation and
use of space for bitmaps in version1 metadata"
(199171a297a87d7696b6b8c07ee520363f4603c1) would seem like the
offending change.  Using 1.2 metadata works.

I get the following using the tip of the mdadm git repo or any other
version of mdadm 2.6.x:

# mdadm --create /dev/md2 --run -l 1 --metadata=1.0 --bitmap=internal
-n 2 /dev/sdf --write-mostly /dev/nbd2
mdadm: /dev/sdf appears to be part of a raid array:
level=raid1 devices=2 ctime=Wed Oct 17 10:17:31 2007
mdadm: /dev/nbd2 appears to be part of a raid array:
level=raid1 devices=2 ctime=Wed Oct 17 10:17:31 2007
mdadm: RUN_ARRAY failed: Input/output error
mdadm: stopped /dev/md2

kernel log shows:
md2: bitmap initialized from disk: read 22/22 pages, set 715290 bits, status: 0
created bitmap (350 pages) for device md2
md2: failed to create bitmap (-5)
md: pers->run() failed ...
md: md2 stopped.
md: unbind<nbd2>
md: export_rdev(nbd2)
md: unbind<sdf>
md: export_rdev(sdf)
md: md2 stopped.
  


I would start by retrying with an external bitmap, to see if for some 
reason there isn't room for the bitmap. If that fails, perhaps no bitmap 
at all would be a useful data point. Was the original metadata the same 
version? Things moved depending on the exact version, and some 
--zero-superblock magic might be needed. Hopefully Neil can clarify, I'm 
just telling you what I suspect is the problem, and maybe a 
non-destructive solution.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: mdadm 2.6.x regression, fails creation of raid1 w/ v1.0 sb and internal bitmap

2007-10-17 Thread Mike Snitzer
On 10/17/07, Bill Davidsen [EMAIL PROTECTED] wrote:
 Mike Snitzer wrote:
  mdadm 2.4.1 through 2.5.6 works. mdadm-2.6's "Improve allocation and
  use of space for bitmaps in version1 metadata"
  (199171a297a87d7696b6b8c07ee520363f4603c1) would seem like the
  offending change.  Using 1.2 metadata works.
 
  I get the following using the tip of the mdadm git repo or any other
  version of mdadm 2.6.x:
 
  # mdadm --create /dev/md2 --run -l 1 --metadata=1.0 --bitmap=internal
  -n 2 /dev/sdf --write-mostly /dev/nbd2
  mdadm: /dev/sdf appears to be part of a raid array:
  level=raid1 devices=2 ctime=Wed Oct 17 10:17:31 2007
  mdadm: /dev/nbd2 appears to be part of a raid array:
  level=raid1 devices=2 ctime=Wed Oct 17 10:17:31 2007
  mdadm: RUN_ARRAY failed: Input/output error
  mdadm: stopped /dev/md2
 
  kernel log shows:
  md2: bitmap initialized from disk: read 22/22 pages, set 715290 bits, 
  status: 0
  created bitmap (350 pages) for device md2
  md2: failed to create bitmap (-5)
  md: pers->run() failed ...
  md: md2 stopped.
  md: unbind<nbd2>
  md: export_rdev(nbd2)
  md: unbind<sdf>
  md: export_rdev(sdf)
  md: md2 stopped.
 

 I would start by retrying with an external bitmap, to see if for some
 reason there isn't room for the bitmap. If that fails, perhaps no bitmap
 at all would be a useful data point. Was the original metadata the same
 version? Things moved depending on the exact version, and some
 --zero-superblock magic might be needed. Hopefully Neil can clarify, I'm
 just telling you what I suspect is the problem, and maybe a
 non-destructive solution.

Creating with an external bitmap works perfectly fine.  As does
creating without a bitmap.  --zero-superblock hasn't helped.  Metadata
v1.1 and v1.2 work with an internal bitmap.  I'd like to use v1.0
with an internal bitmap (using an external bitmap isn't an option for
me).

It does appear that the changes to super1.c aren't leaving adequate
room for the bitmap.  Looking at the relevant diff for v1.0 metadata,
the newer super1.c code makes use of a larger bitmap (128K) for
devices > 200GB.  My block device is 700GB.  So could the larger
block device possibly explain why others haven't noticed this?
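
As a rough cross-check on that theory, the page counts in the kernel log
line up with an on-disk bitmap of roughly 88K at this device size, well
into the territory where mdadm 2.6's larger 128K allocation applies. Here
is a small standalone C sketch of the arithmetic; treating the logged
715290 set bits as roughly the total number of bitmap chunks, and assuming
4K pages and a 16-bit in-memory counter per chunk, are simplifications for
illustration rather than values taken from super1.c:

#include <stdio.h>

int main(void)
{
	/* from the log: "read 22/22 pages, set 715290 bits" and
	 * "created bitmap (350 pages) for device md2" */
	unsigned long long chunks = 715290;	/* ~ one bit per bitmap chunk */

	unsigned long long disk_bytes = (chunks + 7) / 8;
	unsigned long long disk_pages = (disk_bytes + 4095) / 4096;

	unsigned long long mem_bytes = chunks * 2;	/* assumed 16-bit counters */
	unsigned long long mem_pages = (mem_bytes + 4095) / 4096;

	printf("on-disk bitmap: %llu bytes (~%lluK), %llu pages\n",
	       disk_bytes, disk_bytes / 1024, disk_pages);	/* ~87K, 22 pages */
	printf("in-memory bitmap: %llu pages\n", mem_pages);	/* 350 pages */
	return 0;
}

So at this size the new code's bitmap really is in the tens-of-kilobytes
to 128K range rather than a few kilobytes, which makes the v1.0 space
calculation in super1.c the natural suspect.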

Mike


Re: Software RAID when it works and when it doesn't

2007-10-17 Thread Support
On Tue, 2007-10-16 at 17:57 -0400, Mike Accetta wrote:

 Was the disk driver generating any low level errors or otherwise
 indicating that it might be retrying operations on the bad drive at
 the time (i.e. console diagnostics)?  As Neil mentioned later, the md layer
 is at the mercy of the low level disk driver.  We've observed abysmal
 RAID1 recovery times on failing SATA disks because all the time is
 being spent in the driver retrying operations which will never succeed.
 Also, read errors don't tend to fail the array so when the bad disk is
 again accessed for some subsequent read the whole hopeless retry process
 begins anew.

The console was full of errors like:

end_request: I/O error, dev sdb, sector 42644555

I don't know what generates those messages.

As I asked before but never got an answer, is there a way to do timeouts
within the md code so that we are not at the mercy of the lower layer
drivers?

 
 I posted a patch about 6 weeks ago which attempts to improve this situation
 for RAID1 by telling the driver not to retry on failures and giving some
 weight to read errors for failing the array.  Hopefully, Neil is still
 mulling it over and it or something similar will eventually make it into
 the main line kernel as a solution for this problem.
 --
 Mike Accetta
 

Thanks,

Alberto


Re: [BUG] Raid5 trouble

2007-10-17 Thread Dan Williams
On Wed, 2007-10-17 at 09:44 -0700, BERTRAND Joël wrote:
 Dan,
 
 I have modified get_stripe_work like this :
 
 static unsigned long get_stripe_work(struct stripe_head *sh)
 {
 	unsigned long pending;
 	int ack = 0;
 	int a,b,c,d,e,f,g;
 
 	pending = sh->ops.pending;
 
 	test_and_ack_op(STRIPE_OP_BIOFILL, pending);
 	a=ack;
 	test_and_ack_op(STRIPE_OP_COMPUTE_BLK, pending);
 	b=ack;
 	test_and_ack_op(STRIPE_OP_PREXOR, pending);
 	c=ack;
 	test_and_ack_op(STRIPE_OP_BIODRAIN, pending);
 	d=ack;
 	test_and_ack_op(STRIPE_OP_POSTXOR, pending);
 	e=ack;
 	test_and_ack_op(STRIPE_OP_CHECK, pending);
 	f=ack;
 	if (test_and_clear_bit(STRIPE_OP_IO, &sh->ops.pending))
 		ack++;
 	g=ack;
 
 	sh->ops.count -= ack;
 
 	if (sh->ops.count < 0) printk("%d %d %d %d %d %d %d\n",
 		a,b,c,d,e,f,g);
 	BUG_ON(sh->ops.count < 0);
 
 	return pending;
 }
 
 and I obtain on console :
 
   1 1 1 1 1 2
 kernel BUG at drivers/md/raid5.c:390!
\|/  \|/
@'/ .. \`@
/_| \__/ |_\
   \__U_/
 md7_resync(5409): Kernel bad sw trap 5 [#1]
 
 If that can help you...
 
 JKB

This gives more evidence that it is probably mishandling of
STRIPE_OP_BIOFILL.  The attached patch (replacing the previous) moves
the clearing of these bits into handle_stripe5 and adds some debug
information.

--
Dan
raid5: fix clearing of biofill operations (try2)

From: Dan Williams [EMAIL PROTECTED]

ops_complete_biofill() runs outside of spin_lock(&sh->lock) and clears the
'pending' and 'ack' bits.  Since the test_and_ack_op() macro only checks
against 'complete' it can get an inconsistent snapshot of pending work.

Move the clearing of these bits to handle_stripe5(), under the lock.

Signed-off-by: Dan Williams [EMAIL PROTECTED]
---

 drivers/md/raid5.c |   17 ++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f96dea9..3808f52 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -377,7 +377,12 @@ static unsigned long get_stripe_work(struct stripe_head *sh)
 		ack++;
 
 	sh->ops.count -= ack;
-	BUG_ON(sh->ops.count < 0);
+	if (unlikely(sh->ops.count < 0)) {
+		printk(KERN_ERR "pending: %#lx ops.pending: %#lx ops.ack: %#lx "
+			"ops.complete: %#lx\n", pending, sh->ops.pending,
+			sh->ops.ack, sh->ops.complete);
+		BUG();
+	}
 
 	return pending;
 }
@@ -551,8 +556,7 @@ static void ops_complete_biofill(void *stripe_head_ref)
 			}
 		}
 	}
-	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
-	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+	set_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
 
 	return_io(return_bi);
 
@@ -2630,6 +2634,13 @@ static void handle_stripe5(struct stripe_head *sh)
 	s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
 	/* Now to look around and see what can be done */
 
+	/* clean-up completed biofill operations */
+	if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
+	}
+
 	rcu_read_lock();
 	for (i=disks; i--; ) {
 		mdk_rdev_t *rdev;


Re: mdadm 2.6.x regression, fails creation of raid1 w/ v1.0 sb and internal bitmap

2007-10-17 Thread Neil Brown
On Wednesday October 17, [EMAIL PROTECTED] wrote:
 mdadm 2.4.1 through 2.5.6 works. mdadm-2.6's "Improve allocation and
 use of space for bitmaps in version1 metadata"
 (199171a297a87d7696b6b8c07ee520363f4603c1) would seem like the
 offending change.  Using 1.2 metadata works.
 
 I get the following using the tip of the mdadm git repo or any other
 version of mdadm 2.6.x:
 
 # mdadm --create /dev/md2 --run -l 1 --metadata=1.0 --bitmap=internal
 -n 2 /dev/sdf --write-mostly /dev/nbd2
 mdadm: /dev/sdf appears to be part of a raid array:
 level=raid1 devices=2 ctime=Wed Oct 17 10:17:31 2007
 mdadm: /dev/nbd2 appears to be part of a raid array:
 level=raid1 devices=2 ctime=Wed Oct 17 10:17:31 2007
 mdadm: RUN_ARRAY failed: Input/output error
 mdadm: stopped /dev/md2
 
 kernel log shows:
 md2: bitmap initialized from disk: read 22/22 pages, set 715290 bits, status:  0
 created bitmap (350 pages) for device md2
 md2: failed to create bitmap (-5)

Could you please tell me the exact size of your device?  Then I should
be able to reproduce it and test a fix.

(It works for a 734003201K device).

Thanks,
NeilBrown