Re: Can't get md array to shut down cleanly

2006-07-07 Thread Neil Brown
On Friday July 7, [EMAIL PROTECTED] wrote:
  How are you shutting down the machine?  Is something sending SIGKILL
  to all processes?
 
 First SIGTERM, then SIGKILL, yes.
 

That really should cause the array to be clean.  Once the md thread
gets SIGKILL (it ignores SIGTERM) it will mark the array as 'clean'
the moment there are no pending writes.

  You could try the following patch.  I think it should be safe.
 
 Hmm, it said the hunk failed, so I replaced the line by hand. That didn't
 want to compile because mode supposedly wasn't defined ... was that
 supposed to be mddev->safemode? Closest thing to a mode I could find
 ...

That patch was against latest -mm.  For earlier kernels you want to
test 'ro'.

   if (!ro && atomic_read(&mddev->active) > 2) {
  	printk("md: %s still in use.\n",


 Anyway, this is much better: (lines with * are new)
 
 Done unmounting local file systems.
 *md: md0 stopped
 *md: unbind<sdf>
 *md: export_rdev(sdf)
 *[last two lines for each disk.]
 *Stopping RAID arrays ... done (1 array(s) stopped).
 Mounting root filesystem read-only ... done

That isn't good. You've stopped the array before the filesystem is
readonly.  Switching to readonly could cause a write which won't work
as the array doesn't exist any more...

NeilBrown


 Will now halt.
 md: stopping all md devices
 * md: md0 switched to read-only mode
 Synchronizing SCSI cache for disk /dev/sdf:
 [...]
 
 As you can see the error message is gone now. Much more interesting
 are the lines before the "Will now halt." line. Those were not there
 before -- apparently this first attempt by whatever to shutdown the
 array failed silently.
 
 Not sure if this actually fixes the resync problem (I sure hope so,
 after the last of these no fs could be found anymore on the device)
 but it's 5 past 3 already, will try tomorrow.
 
 Thanks,
 
 C.
 -
 To unsubscribe from this list: send the line unsubscribe linux-raid in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: issue with internal bitmaps

2006-07-07 Thread Luca Berra

On Fri, Jul 07, 2006 at 08:16:18AM +1000, Neil Brown wrote:

On Thursday July 6, [EMAIL PROTECTED] wrote:

hello, i just realized that internal bitmaps do not seem to work
anymore.


I cannot imagine why.  Nothing you have listed show anything wrong
with md...

Maybe you were expecting
  mdadm -X /dev/md100
to do something useful.  Like -E, -X must be applied to a component
device.  Try
  mdadm -X /dev/sda1


/me needs some strong coffee. yes you are right, sorry

L.


--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
 /"\
 \ /    ASCII RIBBON CAMPAIGN
  X       AGAINST HTML MAIL
 / \
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Strange intermittant errors + RAID doesn't fail the disk.

2006-07-07 Thread Mattias Wadenstein

On Fri, 7 Jul 2006, Neil Brown wrote:


On Thursday July 6, [EMAIL PROTECTED] wrote:

I suggest you find a SATA related mailing list to post this to (Look
in the MAINTAINERS file maybe) or post it to linux-kernel.


linux-ide couldn't help much, aside from recommending a bleeding-edge
patchset which should fix a lot of SATA-related things:
http://home-tj.org/files/libata-tj-stable/

What fixed the error, though, was exchanging one of the cables. (Just
my luck, it was new and supposedly quality, ... oh well)

I'm still interested in why the md code didn't fail the disk. While it
was 'up' any access to the array would hang for a long time,
ultimately fail and corrupt the fs to boot. When I failed the disk
manually everything was fine (if degraded) again.


md is very dependent on the driver doing the right thing.  It doesn't
do any timeouts or anything like that - it assumes the driver will.
md simply trusts the return status from the drive, and fails a drive
if and only if a write to the drive is reported as failing (if a read
fails, md tries to over-write with good data first).


Hmm.. Perhaps a bit of extra logic there might be good? If you try to 
re-write the failing bit with good data, try to read the recently written 
data back (perhaps after a bit of wait). If that still fails, then fail 
the disk.


If it can't remember recently written data, it is clearly unsuitable for a 
running system. But the occasional block going bad (and getting remapped 
at a write) wouldn't trigger it.


/Mattias Wadenstein
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Can't get md array to shut down cleanly

2006-07-07 Thread Christian Pernegger

Good morning!


That patch was against latest -mm For earlier kernels you want to
test 'ro'.


Ok. Was using stock 2.6.17.


 Done unmounting local file systems.
 *md: md0 stopped
 *md: unbind<sdf>
 *md: export_rdev(sdf)
 *[last two lines for each disk.]
 *Stopping RAID arrays ... done (1 array(s) stopped).
 Mounting root filesystem read-only ... done

That isn't good. You've stopped the array before the filesystem is
readonly.  Switching to readonly could cause a write which won't work
as the array doesn't exist any more...


I don't have root on the md, just a regular fs, which is unmounted
just before that first line above.


That really should cause the array to be clean.  Once the md thread
gets SIGKILL (it ignores SIGTERM) it will mark the array as 'clean'
the moment there are no pending writes.


After digging a little deeper it seems that the md thread(s) might not
get their SIGKILL after all. The relevant portion from S20sendsigs is
as follows:

do_stop () {
   # Kill all processes.
   log_action_begin_msg "Sending all processes the TERM signal"
   killall5 -15
   log_action_end_msg 0
   sleep 5
   log_action_begin_msg "Sending all processes the KILL signal"
   killall5 -9
   log_action_end_msg 0
}

Apparently killall5 excludes kernel threads. I tried regular killall
but that kills the shutdown script as well :) What do other distros
use? I could file a bug but I highly doubt it would be seen as such.

S40umountfs unmounts non-root filesystems
S50mdadm-raid tries to stop arrays (and maybe succeeds, with patch)
via mdadm --stop.
S90halt halts the machine.
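
For what it's worth, a minimal way to make that stop explicit late in the
shutdown sequence (a hedged sketch, not Debian's actual script) is to let
mdadm scan for whatever is still assembled:

   # stop every array that is still assembled; --scan takes the list
   # from /proc/mdstat, so no mdadm.conf entries are needed here
   mdadm --stop --scan

Of course that only helps if nothing else still holds the array open,
which is exactly the mddev->active question below.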

I'd really feel better if I didn't have to rely on userspace at all to
shut down my arrays, though. At least for people with root-on-RAID the
shutdown just before halt / reboot will have to work, anyway.

Any idea what could keep the mddev->active count above 2?

Happy to help with bughunting -- I can't use the box properly anyway
until I can be sure this is solved.

Thanks,

C.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Kernel 2.6.17 and RAID5 Grow Problem (critical section backup)

2006-07-07 Thread Justin Piszcz

p34:~# mdadm /dev/md3 -a /dev/hde1
mdadm: added /dev/hde1

p34:~# mdadm -D /dev/md3
/dev/md3:
Version : 00.90.03
  Creation Time : Fri Jun 30 09:17:12 2006
 Raid Level : raid5
 Array Size : 1953543680 (1863.04 GiB 2000.43 GB)
Device Size : 390708736 (372.61 GiB 400.09 GB)
   Raid Devices : 6
  Total Devices : 7
Preferred Minor : 3
Persistence : Superblock is persistent

Update Time : Fri Jul  7 08:25:44 2006
  State : clean
 Active Devices : 6
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 1

 Layout : left-symmetric
 Chunk Size : 512K

   UUID : e76e403c:7811eb65:73be2f3b:0c2fc2ce
 Events : 0.232940

    Number   Major   Minor   RaidDevice State
       0      22        1        0      active sync   /dev/hdc1
       1      56        1        1      active sync   /dev/hdi1
       2       3        1        2      active sync   /dev/hda1
       3       8       49        3      active sync   /dev/sdd1
       4      88        1        4      active sync   /dev/hdm1
       5       8       33        5      active sync   /dev/sdc1

       6      33        1        -      spare   /dev/hde1
p34:~# mdadm --grow /dev/md3 --raid-disks=7
mdadm: Need to backup 15360K of critical section..
mdadm: Cannot set device size/shape for /dev/md3: No space left on device
p34:~# mdadm --grow /dev/md3 --bitmap=internal --raid-disks=7
mdadm: can change at most one of size, raiddisks, bitmap, and layout
p34:~# umount /dev/md3
p34:~# mdadm --grow /dev/md3 --raid-disks=7
mdadm: Need to backup 15360K of critical section..
mdadm: Cannot set device size/shape for /dev/md3: No space left on device
p34:~#

The disk only has about 350GB of 1.8TB used, any idea why I get this 
error?


I searched google but could not find anything on this issue when trying to 
grow the array?



-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Setting up mdadm.conf for UUID?

2006-07-07 Thread Ewan Grantham

My RAID-5 array is composed of six USB drives. Unfortunately, my
Ubuntu Dapper system doesn't always assign the same devices to the
drives after a reboot. However, mdadm doesn't seem to like having an
mdadm.conf that doesn't have a DEVICE line with specified device
names.

Any way to set up an mdadm.conf so that it will just assemble the drives
using the UUID of the array? Or is the trick to not have an mdadm.conf
and add something to a runlevel script?

TIA,
Ewan
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel 2.6.17 and RAID5 Grow Problem (critical section backup)

2006-07-07 Thread Justin Piszcz

On Fri, 7 Jul 2006, Justin Piszcz wrote:


p34:~# mdadm /dev/md3 -a /dev/hde1
mdadm: added /dev/hde1

p34:~# mdadm -D /dev/md3
/dev/md3:
   Version : 00.90.03
 Creation Time : Fri Jun 30 09:17:12 2006
Raid Level : raid5
Array Size : 1953543680 (1863.04 GiB 2000.43 GB)
   Device Size : 390708736 (372.61 GiB 400.09 GB)
  Raid Devices : 6
 Total Devices : 7
Preferred Minor : 3
   Persistence : Superblock is persistent

   Update Time : Fri Jul  7 08:25:44 2006
 State : clean
Active Devices : 6
Working Devices : 7
Failed Devices : 0
 Spare Devices : 1

Layout : left-symmetric
Chunk Size : 512K

  UUID : e76e403c:7811eb65:73be2f3b:0c2fc2ce
Events : 0.232940

    Number   Major   Minor   RaidDevice State
       0      22        1        0      active sync   /dev/hdc1
       1      56        1        1      active sync   /dev/hdi1
       2       3        1        2      active sync   /dev/hda1
       3       8       49        3      active sync   /dev/sdd1
       4      88        1        4      active sync   /dev/hdm1
       5       8       33        5      active sync   /dev/sdc1

       6      33        1        -      spare   /dev/hde1
p34:~# mdadm --grow /dev/md3 --raid-disks=7
mdadm: Need to backup 15360K of critical section..
mdadm: Cannot set device size/shape for /dev/md3: No space left on device
p34:~# mdadm --grow /dev/md3 --bitmap=internal --raid-disks=7
mdadm: can change at most one of size, raiddisks, bitmap, and layout
p34:~# umount /dev/md3
p34:~# mdadm --grow /dev/md3 --raid-disks=7
mdadm: Need to backup 15360K of critical section..
mdadm: Cannot set device size/shape for /dev/md3: No space left on device
p34:~#

The disk only has about 350GB of 1.8TB used, any idea why I get this error?

I searched google but could not find anything on this issue when trying to 
grow the array?



-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Is it because I use a 512kb chunksize?

Jul  7 08:44:59 p34 kernel: [4295845.933000] raid5: reshape: not enough 
stripes.  Needed 512
Jul  7 08:44:59 p34 kernel: [4295845.962000] md: couldn't update array 
info. -28


So the RAID5 reshape only works if you use a 128kb or smaller chunk size?

Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: second controller: what will my discs be called, and does it matter?

2006-07-07 Thread Gabor Gombas
On Thu, Jul 06, 2006 at 08:12:14PM +0200, Dexter Filmore wrote:

 How can I tell if the discs on the new controller will become sd[e-h] or if 
 they'll be the new a-d and push the existing ones back?

If they are the same type (or more precisely, if they use the same
driver), then their order on the PCI bus will decide. Otherwise, if you
are using modules, then the order you load the drivers will decide. If
the drivers are built into the kernel, then their link order will
decide.
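
Whether it matters depends mostly on how the arrays are assembled: if mdadm
assembles them by superblock UUID rather than by fixed device names, the
renumbering is harmless. A sketch of such an mdadm.conf, with a made-up UUID:

   DEVICE partitions
   ARRAY /dev/md0 UUID=01234567:89abcdef:01234567:89abcdef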

Gabor

-- 
 -
 MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
 -
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Resizing RAID-1 arrays - some possible bugs and problems

2006-07-07 Thread Reuben Farrelly
I'm just in the process of upgrading the RAID-1 disks in my server, and have 
started to experiment with the RAID-1 --grow command.  The first phase of the 
change went well, I added the new disks to the old arrays and then increased the 
size of the arrays to include both the new and old disks.  This meant that I had 
a full and clean transfer of all the data.  Then took the old disks out...it all 
worked nicely.


However I've had two problems with the next phase which was the resizing of the 
arrays.


Firstly, after moving the array, the kernel still seems to think that the raid 
array is only as big as the older disks.  This is to be expected, however 
looking at the output of this:


[EMAIL PROTECTED] /]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
  Creation Time : Sat Nov  5 14:02:50 2005
 Raid Level : raid1
 Array Size : 24410688 (23.28 GiB 25.00 GB)
Device Size : 24410688 (23.28 GiB 25.00 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent

  Intent Bitmap : Internal

Update Time : Sat Jul  8 01:23:54 2006
  State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

   UUID : 24de08b7:e256a424:cca64cdd:638a1428
 Events : 0.5139442

    Number   Major   Minor   RaidDevice State
       0       8       34        0      active sync   /dev/sdc2
       1       8        2        1      active sync   /dev/sda2
[EMAIL PROTECTED] /]#

We note that the Device Size according to the system is still 25.0 GB.  Except 
that the device size is REALLY 40Gb, as seen by the output of fdisk -l:


/dev/sda2               8        4871    39070080   fd  Linux raid autodetect

and

/dev/sdc2               8        4871    39070080   fd  Linux raid autodetect

Is that a bug?  My expectation is that this field should now reflect the size of 
the device/partition, with the *Array Size* still being the original, unresized 
size.


Secondly, I understand that I need to use the --grow command to bring the array 
up to the size of the device.
How do I know what size I should specify?  On my old disk, the size of the 
partition as read by fdisk was slightly larger than the array and device size as 
shown by mdadm.

How much difference should there be?
(Hint:  maybe this could be documented in the manpage (please), NeilB?)


And lastly, I felt brave and decided to plunge ahead, resize to 128 blocks 
smaller than the device size.  mdadm --grow /dev/md1 --size=


The kernel then went like this:

md: couldn't update array info. -28
VFS: busy inodes on changed media.
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)

...and kept going and going and going, every now and then the count incremented 
up until about 155 by which point I shut the box down.
The array then refused to come up on boot and after forcing it to reassemble it 
did a full dirty resync up:


md: bind<sda3>
md: md1 stopped.
md: unbind<sda3>
md: export_rdev(sda3)
md: bind<sda3>
md: bind<sdc3>
md: md1: raid array is not clean -- starting background reconstruction
raid1: raid set md1 active with 2 out of 2 mirrors
attempt to access beyond end of device
sdc3: rw=16, want=39086152, limit=39086145
attempt to access beyond end of device
sda3: rw=16, want=39086152, limit=39086145
md1: bitmap initialized from disk: read 23/38 pages, set 183740 bits, status: -5
md1: failed to create bitmap (-5)
md: pers->run() failed ...
md: array md1 already has disks!
raid1: raid set md1 active with 2 out of 2 mirrors
md1: bitmap file is out of date (0 < 4258299) -- forcing full recovery
md1: bitmap file is out of date, doing full recovery
md1: bitmap initialized from disk: read 10/10 pages, set 305359 bits, status: 0
created bitmap (150 pages) for device md1
md: syncing RAID array md1
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) 
 for reconstruction.

md: using 128k window, over a total of 19542944 blocks.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md1, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
md: md1: sync done.
RAID1 conf printout:
 --- wd:2 rd:2
 disk 0, wo:0, o:1, dev:sdc3
 disk 1, wo:0, o:1, dev:sda3

That was not really what I expected to happen.

I am running mdadm-2.3.1 which is the current version shipped with Fedora Core 
right now, but I'm about to file a bug report to get this upgraded.  A 

Re: Kernel 2.6.17 and RAID5 Grow Problem (critical section backup)

2006-07-07 Thread Justin Piszcz



On Fri, 7 Jul 2006, Justin Piszcz wrote:


On Fri, 7 Jul 2006, Justin Piszcz wrote:


On Fri, 7 Jul 2006, Justin Piszcz wrote:


p34:~# mdadm /dev/md3 -a /dev/hde1
mdadm: added /dev/hde1

p34:~# mdadm -D /dev/md3
/dev/md3:
   Version : 00.90.03
 Creation Time : Fri Jun 30 09:17:12 2006
Raid Level : raid5
Array Size : 1953543680 (1863.04 GiB 2000.43 GB)
   Device Size : 390708736 (372.61 GiB 400.09 GB)
  Raid Devices : 6
 Total Devices : 7
Preferred Minor : 3
   Persistence : Superblock is persistent

   Update Time : Fri Jul  7 08:25:44 2006
 State : clean
Active Devices : 6
Working Devices : 7
Failed Devices : 0
 Spare Devices : 1

Layout : left-symmetric
Chunk Size : 512K

  UUID : e76e403c:7811eb65:73be2f3b:0c2fc2ce
Events : 0.232940

    Number   Major   Minor   RaidDevice State
       0      22        1        0      active sync   /dev/hdc1
       1      56        1        1      active sync   /dev/hdi1
       2       3        1        2      active sync   /dev/hda1
       3       8       49        3      active sync   /dev/sdd1
       4      88        1        4      active sync   /dev/hdm1
       5       8       33        5      active sync   /dev/sdc1

       6      33        1        -      spare   /dev/hde1
p34:~# mdadm --grow /dev/md3 --raid-disks=7
mdadm: Need to backup 15360K of critical section..
mdadm: Cannot set device size/shape for /dev/md3: No space left on device
p34:~# mdadm --grow /dev/md3 --bitmap=internal --raid-disks=7
mdadm: can change at most one of size, raiddisks, bitmap, and layout
p34:~# umount /dev/md3
p34:~# mdadm --grow /dev/md3 --raid-disks=7
mdadm: Need to backup 15360K of critical section..
mdadm: Cannot set device size/shape for /dev/md3: No space left on device
p34:~#

The disk only has about 350GB of 1.8TB used, any idea why I get this 
error?


I searched google but could not find anything on this issue when trying to 
grow the array?



-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Is it because I use a 512kb chunksize?

Jul  7 08:44:59 p34 kernel: [4295845.933000] raid5: reshape: not enough 
stripes.  Needed 512
Jul  7 08:44:59 p34 kernel: [4295845.962000] md: couldn't update array 
info. -28


So the RAID5 reshape only works if you use a 128kb or smaller chunk size?

Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




From the source:


/* Can only proceed if there are plenty of stripe_heads.
@@ -2599,30 +2593,48 @@ static int raid5_reshape(mddev_t *mddev,
* If the chunk size is greater, user-space should request more
* stripe_heads first.
*/
- if ((mddev->chunk_size / STRIPE_SIZE) * 4 > conf->max_nr_stripes) {
+ if ((mddev->chunk_size / STRIPE_SIZE) * 4 > conf->max_nr_stripes ||
+     (mddev->new_chunk / STRIPE_SIZE) * 4 > conf->max_nr_stripes) {
 	printk(KERN_WARNING "raid5: reshape: not enough stripes.  Needed %lu\n",
 	       (mddev->chunk_size / STRIPE_SIZE)*4);
 	return -ENOSPC;
 }
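
To spell out the arithmetic of that check for the array above (assuming
STRIPE_SIZE is one page, i.e. 4K, and that the default stripe cache on this
kernel is 256 entries -- my recollection, not something stated in the thread):

   (chunk_size / STRIPE_SIZE) * 4 = (512K / 4K) * 4 = 128 * 4 = 512

512 is larger than 256, so the check fails: hence "not enough stripes.
Needed 512" and -ENOSPC (-28), which mdadm reports as "No space left on
device".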

I don't see anything that mentions one needs to use a certain chunk size?

Any idea what the problem is here?

Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Neil,

Any comments?

Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Strange intermittant errors + RAID doesn't fail the disk.

2006-07-07 Thread Doug Ledford
On Fri, 2006-07-07 at 00:29 +0200, Christian Pernegger wrote:

  I don't know exactly how the driver was responding to the bad cable,
  but it clearly wasn't returning an error, so md didn't fail it.
 
 There were a lot of errors in dmesg -- seems like they did not get
 passed up to md? I find it surprising that the md layer doesn't have
 its own timeouts, but then I know nothing about such things :)
 
 Thanks for clearing this up for me,
 
 C.
 
 [...]
 ata2: port reset, p_is 800 is 2 pis 0 cmd 44017 tf d0 ss 123 se 0
 ata2: status=0x50 { DriveReady SeekComplete }
 sdc: Current: sense key: No Sense
Additional sense: No additional sense information
 ata2: handling error/timeout
 ata2: port reset, p_is 0 is 0 pis 0 cmd 44017 tf 150 ss 123 se 0
 ata2: status=0x50 { DriveReady SeekComplete }
 ata2: error=0x01 { AddrMarkNotFound }
 sdc: Current: sense key: No Sense
Additional sense: No additional sense information
 [repeat]

This looks like a bad sd/sata lld interaction problem.  Specifically,
the sata driver wasn't filling in a suitable sense code block to
simulate auto-sense on the command, and the scsi disk driver was either
trying to get sense or retrying the same command.  Anyway, not an md
issue, a sata/scsi issue in terms of why it wasn't getting out of the
reset loop eventually.  I would send your bad cable to Jeff Garzik for
further analysis of the problem ;-)

-- 
Doug Ledford [EMAIL PROTECTED]
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Resizing RAID-1 arrays - some possible bugs and problems

2006-07-07 Thread Justin Piszcz



On Sat, 8 Jul 2006, Reuben Farrelly wrote:

I'm just in the process of upgrading the RAID-1 disks in my server, and have 
started to experiment with the RAID-1 --grow command.  The first phase of the 
change went well, I added the new disks to the old arrays and then increased 
the size of the arrays to include both the new and old disks.  This meant 
that I had a full and clean transfer of all the data.  Then took the old 
disks out...it all worked nicely.


However I've had two problems with the next phase which was the resizing of 
the arrays.


Firstly, after moving the array, the kernel still seems to think that the 
raid array is only as big as the older disks.  This is to be expected, 
however looking at the output of this:


[EMAIL PROTECTED] /]# mdadm --detail /dev/md0
/dev/md0:
   Version : 00.90.03
 Creation Time : Sat Nov  5 14:02:50 2005
Raid Level : raid1
Array Size : 24410688 (23.28 GiB 25.00 GB)
   Device Size : 24410688 (23.28 GiB 25.00 GB)
  Raid Devices : 2
 Total Devices : 2
Preferred Minor : 0
   Persistence : Superblock is persistent

 Intent Bitmap : Internal

   Update Time : Sat Jul  8 01:23:54 2006
 State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 0
 Spare Devices : 0

  UUID : 24de08b7:e256a424:cca64cdd:638a1428
Events : 0.5139442

    Number   Major   Minor   RaidDevice State
       0       8       34        0      active sync   /dev/sdc2
       1       8        2        1      active sync   /dev/sda2
[EMAIL PROTECTED] /]#

We note that the Device Size according to the system is still 25.0 GB. 
Except that the device size is REALLY 40Gb, as seen by the output of fdisk 
-l:


/dev/sda2               8        4871    39070080   fd  Linux raid autodetect

and

/dev/sdc2               8        4871    39070080   fd  Linux raid autodetect

Is that a bug?  My expectation is that this field should now reflect the size 
of the device/partition, with the *Array Size* still being the original, 
unresized size.


Secondly, I understand that I need to use the --grow command to bring the 
array up to the size of the device.
How do I know what size I should specify?  On my old disk, the size of the 
partition as read by fdisk was slightly larger than the array and device size 
as shown by mdadm.

How much difference should there be?
(Hint:  maybe this could be documented in the manpage (please), NeilB?)


And lastly, I felt brave and decided to plunge ahead, resize to 128 blocks 
smaller than the device size.  mdadm --grow /dev/md1 --size=


The kernel then went like this:

md: couldn't update array info. -28
VFS: busy inodes on changed media.
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)

...and kept going and going and going, every now and then the count 
incremented up until about 155 by which point I shut the box down.
The array then refused to come up on boot and after forcing it to reassemble 
it did a full dirty resync up:


md: bind<sda3>
md: md1 stopped.
md: unbind<sda3>
md: export_rdev(sda3)
md: bind<sda3>
md: bind<sdc3>
md: md1: raid array is not clean -- starting background reconstruction
raid1: raid set md1 active with 2 out of 2 mirrors
attempt to access beyond end of device
sdc3: rw=16, want=39086152, limit=39086145
attempt to access beyond end of device
sda3: rw=16, want=39086152, limit=39086145
md1: bitmap initialized from disk: read 23/38 pages, set 183740 bits, status: 
-5

md1: failed to create bitmap (-5)
md: pers->run() failed ...
md: array md1 already has disks!
raid1: raid set md1 active with 2 out of 2 mirrors
md1: bitmap file is out of date (0 < 4258299) -- forcing full recovery
md1: bitmap file is out of date, doing full recovery
md1: bitmap initialized from disk: read 10/10 pages, set 305359 bits, status: 
0

created bitmap (150 pages) for device md1
md: syncing RAID array md1
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 20 
KB/sec)  for reconstruction.

md: using 128k window, over a total of 19542944 blocks.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md1, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
md: md1: sync done.
RAID1 conf printout:
--- wd:2 rd:2
disk 0, wo:0, o:1, dev:sdc3
disk 1, wo:0, o:1, dev:sda3

That was not really what I expected to happen.

I am running mdadm-2.3.1 which is the current version shipped with Fedora 
Core right now, but I'm about to file a bug report 

Re: Kernel 2.6.17 and RAID5 Grow Problem (critical section backup)

2006-07-07 Thread Justin Piszcz



On Fri, 7 Jul 2006, Justin Piszcz wrote:




On Fri, 7 Jul 2006, Justin Piszcz wrote:


On Fri, 7 Jul 2006, Justin Piszcz wrote:


On Fri, 7 Jul 2006, Justin Piszcz wrote:


p34:~# mdadm /dev/md3 -a /dev/hde1
mdadm: added /dev/hde1

p34:~# mdadm -D /dev/md3
/dev/md3:
   Version : 00.90.03
 Creation Time : Fri Jun 30 09:17:12 2006
Raid Level : raid5
Array Size : 1953543680 (1863.04 GiB 2000.43 GB)
   Device Size : 390708736 (372.61 GiB 400.09 GB)
  Raid Devices : 6
 Total Devices : 7
Preferred Minor : 3
   Persistence : Superblock is persistent

   Update Time : Fri Jul  7 08:25:44 2006
 State : clean
Active Devices : 6
Working Devices : 7
Failed Devices : 0
 Spare Devices : 1

Layout : left-symmetric
Chunk Size : 512K

  UUID : e76e403c:7811eb65:73be2f3b:0c2fc2ce
Events : 0.232940

    Number   Major   Minor   RaidDevice State
       0      22        1        0      active sync   /dev/hdc1
       1      56        1        1      active sync   /dev/hdi1
       2       3        1        2      active sync   /dev/hda1
       3       8       49        3      active sync   /dev/sdd1
       4      88        1        4      active sync   /dev/hdm1
       5       8       33        5      active sync   /dev/sdc1

       6      33        1        -      spare   /dev/hde1
p34:~# mdadm --grow /dev/md3 --raid-disks=7
mdadm: Need to backup 15360K of critical section..
mdadm: Cannot set device size/shape for /dev/md3: No space left on device
p34:~# mdadm --grow /dev/md3 --bitmap=internal --raid-disks=7
mdadm: can change at most one of size, raiddisks, bitmap, and layout
p34:~# umount /dev/md3
p34:~# mdadm --grow /dev/md3 --raid-disks=7
mdadm: Need to backup 15360K of critical section..
mdadm: Cannot set device size/shape for /dev/md3: No space left on device
p34:~#

The disk only has about 350GB of 1.8TB used, any idea why I get this 
error?


I searched google but could not find anything on this issue when trying 
to grow the array?



-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Is it because I use a 512kb chunksize?

Jul  7 08:44:59 p34 kernel: [4295845.933000] raid5: reshape: not enough 
stripes.  Needed 512
Jul  7 08:44:59 p34 kernel: [4295845.962000] md: couldn't update array 
info. -28


So the RAID5 reshape only works if you use a 128kb or smaller chunk size?

Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




From the source:


/* Can only proceed if there are plenty of stripe_heads.
@@ -2599,30 +2593,48 @@ static int raid5_reshape(mddev_t *mddev,
* If the chunk size is greater, user-space should request more
* stripe_heads first.
*/
- if ((mddev->chunk_size / STRIPE_SIZE) * 4 > conf->max_nr_stripes) {
+ if ((mddev->chunk_size / STRIPE_SIZE) * 4 > conf->max_nr_stripes ||
+     (mddev->new_chunk / STRIPE_SIZE) * 4 > conf->max_nr_stripes) {
 	printk(KERN_WARNING "raid5: reshape: not enough stripes.  Needed %lu\n",
 	       (mddev->chunk_size / STRIPE_SIZE)*4);
 	return -ENOSPC;
 }

I don't see anything that mentions one needs to use a certain chunk size?

Any idea what the problem is here?

Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Neil,

Any comments?

Justin.



The --grow option worked, sort of.

p34:~# mdadm /dev/md3 --grow --size=max
p34:~# umount /dev/md3
p34:~# mdadm -S /dev/md3
p34:~# mount /dev/md3
Segmentation fault
p34:~#

[4313355.425000] BUG: unable to handle kernel NULL pointer dereference at 
virtual address 00d4

[4313355.425000]  printing eip:
[4313355.425000] c03c377b
[4313355.425000] *pde = 
[4313355.425000] Oops: 0002 [#1]
[4313355.425000] PREEMPT SMP
[4313355.425000] CPU:0
[4313355.425000] EIP:    0060:[<c03c377b>]    Not tainted VLI
[4313355.425000] EFLAGS: 00010046   (2.6.17.3 #4)
[4313355.425000] EIP is at _spin_lock_irqsave+0x14/0x61
[4313355.425000] eax:    ebx: f7e6c000   ecx: c0333d12   edx: 
0202
[4313355.425000] esi: 00d4   edi: f7fb9600   ebp: 00d4   esp: 
f7e6dc94

[4313355.425000] ds: 007b   es: 007b   ss: 0068
[4313355.425000] Process mount (pid: 22892, threadinfo=f7e6c000 
task=c1a90580)
[4313355.425000] Stack: c19947e4  c0333d32 0002 c012aaa2 
f7e6dccc f7e6dc9c f7e6dc9c
[4313355.425000]f7e6dccc c0266b8d c19947e4   
e11a61f8 f7e6dccc f7e6dccc
[4313355.425000]0005 f7fda014 f7fda000 f7fe8c00 c0259a79 
e11a61c0 0001 001f

[4313355.425000] Call Trace:
[4313355.425000]  [<c0333d32>] raid5_unplug_device+0x20/0x65  [<c012aaa2>] 
flush_workqueue+0x67/0x87

Re: Kernel 2.6.17 and RAID5 Grow Problem (critical section backup)

2006-07-07 Thread Justin Piszcz



On Fri, 7 Jul 2006, Justin Piszcz wrote:




On Fri, 7 Jul 2006, Justin Piszcz wrote:




On Fri, 7 Jul 2006, Justin Piszcz wrote:


On Fri, 7 Jul 2006, Justin Piszcz wrote:


On Fri, 7 Jul 2006, Justin Piszcz wrote:


p34:~# mdadm /dev/md3 -a /dev/hde1
mdadm: added /dev/hde1

p34:~# mdadm -D /dev/md3
/dev/md3:
   Version : 00.90.03
 Creation Time : Fri Jun 30 09:17:12 2006
Raid Level : raid5
Array Size : 1953543680 (1863.04 GiB 2000.43 GB)
   Device Size : 390708736 (372.61 GiB 400.09 GB)
  Raid Devices : 6
 Total Devices : 7
Preferred Minor : 3
   Persistence : Superblock is persistent

   Update Time : Fri Jul  7 08:25:44 2006
 State : clean
Active Devices : 6
Working Devices : 7
Failed Devices : 0
 Spare Devices : 1

Layout : left-symmetric
Chunk Size : 512K

  UUID : e76e403c:7811eb65:73be2f3b:0c2fc2ce
Events : 0.232940

    Number   Major   Minor   RaidDevice State
       0      22        1        0      active sync   /dev/hdc1
       1      56        1        1      active sync   /dev/hdi1
       2       3        1        2      active sync   /dev/hda1
       3       8       49        3      active sync   /dev/sdd1
       4      88        1        4      active sync   /dev/hdm1
       5       8       33        5      active sync   /dev/sdc1

       6      33        1        -      spare   /dev/hde1
p34:~# mdadm --grow /dev/md3 --raid-disks=7
mdadm: Need to backup 15360K of critical section..
mdadm: Cannot set device size/shape for /dev/md3: No space left on 
device

p34:~# mdadm --grow /dev/md3 --bitmap=internal --raid-disks=7
mdadm: can change at most one of size, raiddisks, bitmap, and layout
p34:~# umount /dev/md3
p34:~# mdadm --grow /dev/md3 --raid-disks=7
mdadm: Need to backup 15360K of critical section..
mdadm: Cannot set device size/shape for /dev/md3: No space left on 
device

p34:~#

The disk only has about 350GB of 1.8TB used, any idea why I get this 
error?


I searched google but could not find anything on this issue when trying 
to grow the array?



-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Is it because I use a 512kb chunksize?

Jul  7 08:44:59 p34 kernel: [4295845.933000] raid5: reshape: not enough 
stripes.  Needed 512
Jul  7 08:44:59 p34 kernel: [4295845.962000] md: couldn't update array 
info. -28


So the RAID5 reshape only works if you use a 128kb or smaller chunk size?

Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel 
in

the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




From the source:


/* Can only proceed if there are plenty of stripe_heads.
@@ -2599,30 +2593,48 @@ static int raid5_reshape(mddev_t *mddev,
* If the chunk size is greater, user-space should request more
* stripe_heads first.
*/
- if ((mddev->chunk_size / STRIPE_SIZE) * 4 > conf->max_nr_stripes) {
+ if ((mddev->chunk_size / STRIPE_SIZE) * 4 > conf->max_nr_stripes ||
+     (mddev->new_chunk / STRIPE_SIZE) * 4 > conf->max_nr_stripes) {
 	printk(KERN_WARNING "raid5: reshape: not enough stripes.  Needed %lu\n",
 	       (mddev->chunk_size / STRIPE_SIZE)*4);
 	return -ENOSPC;
 }

I don't see anything that mentions one needs to use a certain chunk size?

Any idea what the problem is here?

Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Neil,

Any comments?

Justin.



The --grow option worked, sort of.

p34:~# mdadm /dev/md3 --grow --size=max
p34:~# umount /dev/md3
p34:~# mdadm -S /dev/md3
p34:~# mount /dev/md3
Segmentation fault
p34:~#

[4313355.425000] BUG: unable to handle kernel NULL pointer dereference at 
virtual address 00d4

[4313355.425000]  printing eip:
[4313355.425000] c03c377b
[4313355.425000] *pde = 
[4313355.425000] Oops: 0002 [#1]
[4313355.425000] PREEMPT SMP
[4313355.425000] CPU:0
[4313355.425000] EIP:    0060:[<c03c377b>]    Not tainted VLI
[4313355.425000] EFLAGS: 00010046   (2.6.17.3 #4)
[4313355.425000] EIP is at _spin_lock_irqsave+0x14/0x61
[4313355.425000] eax:    ebx: f7e6c000   ecx: c0333d12   edx: 
0202
[4313355.425000] esi: 00d4   edi: f7fb9600   ebp: 00d4   esp: 
f7e6dc94

[4313355.425000] ds: 007b   es: 007b   ss: 0068
[4313355.425000] Process mount (pid: 22892, threadinfo=f7e6c000 
task=c1a90580)
[4313355.425000] Stack: c19947e4  c0333d32 0002 c012aaa2 f7e6dccc 
f7e6dc9c f7e6dc9c
[4313355.425000]f7e6dccc c0266b8d c19947e4   e11a61f8 
f7e6dccc f7e6dccc
[4313355.425000]0005 f7fda014 f7fda000 f7fe8c00 c0259a79 e11a61c0 
0001 001f

[4313355.425000] Call Trace:
[4313355.425000]  [<c0333d32>] 

Re: Can't get md array to shut down cleanly

2006-07-07 Thread Christian Pernegger

It seems like it really isn't an md issue -- when I remove everything
to do with evms (userspace tools + initrd hooks) everything works
fine.

I took your patch back out and put a few printks in there ...
Without evms the active counter is 1 in an idle state, i. e. after the box
has finished booting.
With evms the counter is 2 in an idle state, and always one higher.

Directly before any attempt to shut down the array the counter is 3
with evms (thus the error) but only 2 without it.

I don't know if evms is buggy and fails to put back a reference or if
the +1 increase in the active counter is legit, and md.c needs a
better check than just "active needs to be below 3".

Longish dmesg excerpt follows, maybe someone can pinpoint the cause
and decide what
needs to be done.

md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
md: linear personality registered for level -1
md: raid0 personality registered for level 0
md: raid1 personality registered for level 1
md: raid10 personality registered for level 10
raid5: automatically using best checksumming function: generic_sse
 generic_sse:  4566.000 MB/sec
raid5: using function: generic_sse (4566.000 MB/sec)
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
raid6: int64x1   1331 MB/s
raid6: int64x2   1650 MB/s
raid6: int64x4   2018 MB/s
raid6: int64x8   1671 MB/s
raid6: sse2x1    2208 MB/s
raid6: sse2x2    3104 MB/s
raid6: sse2x4    2806 MB/s
raid6: using algorithm sse2x2 (3104 MB/s)
md: raid6 personality registered for level 6
md: REF UP: 2
md: REF DOWN: 1
md: REF UP: 2
md: REF DOWN: 1
md: REF UP: 2
md: REF DOWN: 1
md: REF UP: 2
md: REF DOWN: 1
md: REF UP: 2
md: REF DOWN: 1
md: REF UP: 2
md: bind<sdb>
md: REF DOWN: 1
md: REF UP: 2
md: bind<sdc>
md: REF DOWN: 1
md: REF UP: 2
md: bind<sdd>
md: REF DOWN: 1
md: REF UP: 2
md: bind<sde>
md: REF DOWN: 1
md: REF UP: 2
md: REF UP: 3
md: REF DOWN: 2
raid5: device sdd operational as raid disk 2
raid5: device sdc operational as raid disk 1
raid5: device sdb operational as raid disk 0
raid5: allocated 4262kB for md0
raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2
RAID5 conf printout:
--- rd:4 wd:3 fd:1
disk 0, o:1, dev:sdb
disk 1, o:1, dev:sdc
disk 2, o:1, dev:sdd
md0: bitmap initialized from disk: read 15/15 pages, set 0 bits, status: 0
created bitmap (233 pages) for device md0
RAID5 conf printout:
md: REF DOWN: 1
--- rd:4 wd:3 fd:1
disk 0, o:1, dev:sdb
disk 1, o:1, dev:sdc
disk 2, o:1, dev:sdd
disk 3, o:1, dev:sde
md: REF UP: 2
md: REF UP: 3
md: REF DOWN: 2
md: REF DOWN: 1
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than
20 KB/sec) for reconstruction.
md: using 128k window, over a total of 488386432 blocks.
md: REF UP: 2
md: REF DOWN: 1

*** [up to here everything is fine, but the counter never again drops
to 1 afterwards] ***

md: REF UP: 2
Attempting manual resume
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
md: REF UP: 3
hw_random: RNG not detected
md: REF DOWN: 2
Adding 4000176k swap on /dev/evms/sda2.  Priority:-1 extents:1 across:4000176k
EXT3 FS on dm-0, internal journal
md: REF UP: 3
md: REF DOWN: 2
*** [last two lines repeated fairly often, but more like excessive
polling than an infinite error loop] ***

Regards,

C.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Resizing RAID-1 arrays - some possible bugs and problems

2006-07-07 Thread Reuben Farrelly



On 8/07/2006 6:52 a.m., Justin Piszcz wrote:

Reuben,

What chunk size did you use?

I can't even get mine to get past this part:

p34:~# mdadm /dev/md3 --grow --raid-disks=7
mdadm: Need to backup 15360K of critical section..
mdadm: Cannot set device size/shape for /dev/md3: No space left on device

Justin.



Just whatever the system selected for the chunk size, ie I didn't specify it 
myself ;)


[EMAIL PROTECTED] cisco]# cat /proc/mdstat
Personalities : [raid1]

md0 : active raid1 sdc2[0] sda2[1]
  24410688 blocks [2/2] [UU]
  bitmap: 0/187 pages [0KB], 64KB chunk

md1 : active raid1 sdc3[0] sda3[1]
  19542944 blocks [2/2] [UU]
  bitmap: 0/150 pages [0KB], 64KB chunk

md2 : active raid1 sdc5[0] sda5[1]
  4891648 blocks [2/2] [UU]
  bitmap: 2/150 pages [8KB], 16KB chunk

I was working on md1 when I filed the email earlier.

I wonder if the chunk size is left as-is after a --grow, and if this is optimal 
or not or if this could lead to issues...


reuben

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel 2.6.17 and RAID5 Grow Problem (critical section backup)

2006-07-07 Thread Neil Brown
On Friday July 7, [EMAIL PROTECTED] wrote:
  
  Jul  7 08:44:59 p34 kernel: [4295845.933000] raid5: reshape: not enough 
  stripes.  Needed 512
  Jul  7 08:44:59 p34 kernel: [4295845.962000] md: couldn't update array 
  info. -28
  
  So the RAID5 reshape only works if you use a 128kb or smaller chunk size?
  
 
 Neil,
 
 Any comments?
 

Yes.   This is something I need to fix in the next mdadm.
You need to tell md/raid5 to increase the size of the stripe cache
before the grow can proceed.  You can do this with

  echo 600 > /sys/block/md3/md/stripe_cache_size

Then the --grow should work.  The next mdadm will do this for you.
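
Putting that together with the grow command from earlier in the thread, the
sequence would look roughly like this (a sketch using the names from this
thread; 600 is simply comfortably more than the 512 the kernel asked for):

   echo 600 > /sys/block/md3/md/stripe_cache_size   # enlarge the stripe cache first
   mdadm --grow /dev/md3 --raid-disks=7             # then the reshape can proceed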

NeilBrown

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Resizing RAID-1 arrays - some possible bugs and problems

2006-07-07 Thread Neil Brown
On Saturday July 8, [EMAIL PROTECTED] wrote:
 I'm just in the process of upgrading the RAID-1 disks in my server, and have 
 started to experiment with the RAID-1 --grow command.  The first phase of the 
 change went well, I added the new disks to the old arrays and then increased 
 the 
 size of the arrays to include both the new and old disks.  This meant that I 
 had 
 a full and clean transfer of all the data.  Then took the old disks out...it 
 all 
 worked nicely.
 
 However I've had two problems with the next phase which was the resizing of 
 the 
 arrays.
 
 Firstly, after moving the array, the kernel still seems to think that the 
 raid 
 array is only as big as the older disks.  This is to be expected, however 
 looking at the output of this:
 
 [EMAIL PROTECTED] /]# mdadm --detail /dev/md0
 /dev/md0:
  Version : 00.90.03
Creation Time : Sat Nov  5 14:02:50 2005
   Raid Level : raid1
   Array Size : 24410688 (23.28 GiB 25.00 GB)
  Device Size : 24410688 (23.28 GiB 25.00 GB)
 
 We note that the Device Size according to the system is still 25.0 GB.  
 Except 
 that the device size is REALLY 40Gb, as seen by the output of fdisk -l:

Device Size is a slight misnomer.  It actually means the amount of
this device that will be used in the array.   Maybe I should make it
Used Device Size.
 
 Secondly, I understand that I need to use the --grow command to bring the 
 array 
 up to the size of the device.
 How do I know what size I should specify? 

 --size=max

   This value can be set with --grow for RAID level 1/4/5/6.  If the
   array was created with a size smaller than the currently active
   drives, the extra space can be accessed using --grow.  The size can
   be given as max which means to choose the largest size that fits on
   all current drives.

 How much difference should there be?
 (Hint:  maybe this could be documented in the manpage (please), NeilB?)

man 4 md
   The common format - known as version 0.90 - has a superblock that is
   4K long and is written into a 64K aligned block that starts at least
   64K and less than 128K from the end of the device (i.e. to get the
   address of the superblock round the size of the device down to a
   multiple of 64K and then subtract 64K).  The available size of each
   device is the amount of space before the super block, so between 64K
   and 128K is lost when a device is incorporated into an MD array.
   This superblock stores multi-byte fields in a processor-dependent
   manner, so arrays cannot easily be moved between computers with
   different processors.
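
A quick worked example of that formula, using a hypothetical 40000000 KB
component rather than one of the partitions above:

   round 40000000 KB down to a multiple of 64 KB -> 40000000 KB (already a multiple)
   superblock offset = 40000000 KB - 64 KB       -> 39999936 KB
   available size    = 39999936 KB, i.e. 64 KB of this device is lost
   (a component whose size is not a 64 KB multiple loses up to 128 KB)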


 
 
 And lastly, I felt brave and decided to plunge ahead, resize to 128 blocks 
 smaller than the device size.  mdadm --grow /dev/md1 --size=
 
 The kernel then went like this:
 
 md: couldn't update array info. -28
 VFS: busy inodes on changed media.
 md1: invalid bitmap page request: 150 (> 149)
 md1: invalid bitmap page request: 150 (> 149)
 md1: invalid bitmap page request: 150 (> 149)

Oh dear, that's bad.

I guess I didn't think through resizing of an array with an active
bitmap properly... :-(
That won't be fixed in a hurry I'm afraid.
You'll need to remove the bitmap before the grow and re-add it
afterwards, which isn't really ideal.  
I'll look at making this more robust when I return from vacation in a
week or so.

NeilBrown
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Setting up mdadm.conf for UUID?

2006-07-07 Thread Neil Brown
On Friday July 7, [EMAIL PROTECTED] wrote:
 My RAID-5 array is composed of six USB drives. Unfortunately, my
 Ubuntu Dapper system doesn't always assign the same devices to the
 drives after a reboot. However, mdadm doesn't seem to like having an
 mdadm.conf that doesn't have a Devices line with specified device
 names.

Sure it does

  DEVICE /dev/sd*
or 
  DEVICE partitions

 
 Any way to set up an mdadm.conf so that it will just assemble the drives
 using the UUID of the array? Or is the trick to not have an mdadm.conf
 and add something to a runlevel script?
 

Yes.

 DEVICE partitions
 ARRAY /dev/md0 UUID=what:ever:it:is
 ARRAY /dev/md1 UUID=uuid:for:md:one
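
One common way to generate those ARRAY lines rather than typing the UUIDs by
hand (a sketch; the exact output format depends on the mdadm version):

   # print an ARRAY line, including the UUID, for every array whose
   # superblocks mdadm can find, and append them to the config file
   mdadm --examine --scan >> /etc/mdadm.conf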

NeilBrown
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel 2.6.17 and RAID5 Grow Problem (critical section backup)

2006-07-07 Thread Justin Piszcz



On Sat, 8 Jul 2006, Neil Brown wrote:


On Friday July 7, [EMAIL PROTECTED] wrote:


Jul  7 08:44:59 p34 kernel: [4295845.933000] raid5: reshape: not enough
stripes.  Needed 512
Jul  7 08:44:59 p34 kernel: [4295845.962000] md: couldn't update array
info. -28

So the RAID5 reshape only works if you use a 128kb or smaller chunk size?



Neil,

Any comments?



Yes.   This is something I need to fix in the next mdadm.
You need to tell md/raid5 to increase the size of the stripe cache
before the grow can proceed.  You can do this with

 echo 600 > /sys/block/md3/md/stripe_cache_size

Then the --grow should work.  The next mdadm will do this for you.

NeilBrown



Hey!  You're awake :)

I am going to try it with just 64kb to prove to myself it works with that, 
but then I will re-create the raid5 again like I had it before and attempt 
it again, I did not see that documented anywhere!! Also, how do you use 
the --backup-file option? Nobody seems to know!

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel 2.6.17 and RAID5 Grow Problem (critical section backup)

2006-07-07 Thread Neil Brown
On Friday July 7, [EMAIL PROTECTED] wrote:
 
 Hey!  You're awake :)

Yes, and thinking about breakfast (it's 8:30am here).

 
 I am going to try it with just 64kb to prove to myself it works with that, 
 but then I will re-create the raid5 again like I had it before and attempt 
 it again, I did not see that documented anywhere!! Also, how do you use 
 the --backup-file option? Nobody seems to know!

man mdadm
   --backup-file=
  This  is  needed  when  --grow is used to increase the number of
  raid-devices in a RAID5 if there  are no  spare  devices  avail-
  able.   See  the section below on RAID_DEVICE CHANGES.  The file
  should be stored on a separate device, not  on  the  raid  array
  being reshaped.


So e.g.
   mdadm --grow /dev/md3 --raid-disk=7 --backup-file=/root/md3-backup

mdadm will copy the first few stripes to /root/md3-backup and start
the reshape.  Once it gets past the critical section, mdadm will
remove the file.
If your system crashed during the critical section, then you won't be
able to assemble the array without providing the backup file:

e.g.
  mdadm --assemble /dev/md3 --backup-file=/root/md3-backup /dev/sd[a-g]

NeilBrown
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel 2.6.17 and RAID5 Grow Problem (critical section backup)

2006-07-07 Thread Justin Piszcz



On Sat, 8 Jul 2006, Neil Brown wrote:


On Friday July 7, [EMAIL PROTECTED] wrote:


Hey!  You're awake :)


Yes, and thinking about breakfast (it's 8:30am here).



I am going to try it with just 64kb to prove to myself it works with that,
but then I will re-create the raid5 again like I had it before and attempt
it again, I did not see that documented anywhere!! Also, how do you use
the --backup-file option? Nobody seems to know!


man mdadm
  --backup-file=
 This  is  needed  when  --grow is used to increase the number of
 raid-devices in a RAID5 if there  are no  spare  devices  avail-
 able.   See  the section below on RAID_DEVICE CHANGES.  The file
 should be stored on a separate device, not  on  the  raid  array
 being reshaped.


So e.g.
  mdadm --grow /dev/md3 --raid-disk=7 --backup-file=/root/md3-backup

mdadm will copy the first few stripes to /root/md3-backup and start
the reshape.  Once it gets past the critical section, mdadm will
remove the file.
If your system crashed during the critical section, then you won't be
able to assemble the array without providing the backup file:

e.g.
 mdadm --assemble /dev/md3 --backup-file=/root/md3-backup /dev/sd[a-g]

NeilBrown



Gotcha, thanks.

Quick question regarding reshaping, must one wait until the re-shape is 
completed before he or she grows the file system?


With the re-shape still in progress, I tried to grow the xfs FS but it 
stayed the same.


p34:~# df -h | grep /raid5
/dev/md3  746G   80M  746G   1% /raid5

p34:~# mdadm /dev/md3 --grow --raid-disks=4
mdadm: Need to backup 384K of critical section..
mdadm: ... critical section passed.
p34:~#

p34:~# cat /proc/mdstat
md3 : active raid5 hdc1[3] sdc1[2] hde1[1] hda1[0]
      781417472 blocks super 0.91 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      [>....................]  reshape =  0.0% (85120/390708736)
      finish=840.5min speed=7738K/sec

p34:~#

p34:~# mount /raid5
p34:~# xfs_growfs /raid5
meta-data=/dev/md3         isize=256    agcount=32, agsize=6104816 blks
         =                 sectsz=4096  attr=0
data     =                 bsize=4096   blocks=195354112, imaxpct=25
         =                 sunit=16     swidth=48 blks, unwritten=1
naming   =version 2        bsize=4096
log      =internal         bsize=4096   blocks=32768, version=2
         =                 sectsz=4096  sunit=1 blks
realtime =none             extsz=196608 blocks=0, rtextents=0
data blocks changed from 195354112 to 195354368
p34:~#

p34:~# umount /raid5
p34:~# mount /raid5
p34:~# df -h
FilesystemSize  Used Avail Use% Mounted on
/dev/md3  746G   80M  746G   1% /raid5
p34:~#

I guess one has to wait until the reshape is complete before growing the 
filesystem..?

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel 2.6.17 and RAID5 Grow Problem (critical section backup)

2006-07-07 Thread Justin Piszcz



On Sat, 8 Jul 2006, Neil Brown wrote:


On Friday July 7, [EMAIL PROTECTED] wrote:


I guess one has to wait until the reshape is complete before growing the
filesystem..?


Yes.  The extra space isn't available until the reshape has completed
(if it was available earlier, the reshape wouldn't be necessary)

NeilBrown



Just wanted to confirm, thanks for all the help, I look forward to the new 
revision of mdadm :)  In the meantime, after I get another drive I will 
try your workaround, but so far it looks good. Thanks!


Justin.

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel 2.6.17 and RAID5 Grow Problem (critical section backup)

2006-07-07 Thread Neil Brown
On Friday July 7, [EMAIL PROTECTED] wrote:
 
 I guess one has to wait until the reshape is complete before growing the 
 filesystem..?

Yes.  The extra space isn't available until the reshape has completed
(if it was available earlier, the reshape wouldn't be necessary)
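
Concretely, the sequence on a box like p34 would be something along these
lines (a sketch; the device and mount point are taken from the thread above):

   # wait for the reshape shown in /proc/mdstat to finish
   while grep -q reshape /proc/mdstat; do sleep 60; done
   mount /raid5          # if it is not already mounted
   xfs_growfs /raid5     # only now is the extra space available to grow into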

NeilBrown
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Resizing RAID-1 arrays - some possible bugs and problems

2006-07-07 Thread Reuben Farrelly



On 8/07/2006 10:12 a.m., Neil Brown wrote:

On Saturday July 8, [EMAIL PROTECTED] wrote:


And lastly, I felt brave and decided to plunge ahead, resize to 128 blocks 
smaller than the device size.  mdadm --grow /dev/md1 --size=


The kernel then went like this:

md: couldn't update array info. -28
VFS: busy inodes on changed media.
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)
md1: invalid bitmap page request: 150 (> 149)


Oh dear, that's bad.

I guess I didn't think through resizing of an array with an active
bitmap properly... :-(
That won't be fixed in a hurry I'm afraid.
You'll need to remove the bitmap before the grow and re-add it
afterwards, which isn't really ideal.  
I'll look at making this more robust when I return from vacation in a

week or so.

NeilBrown


Thanks for the response and references to the manpage, Neil.  I had misread the 
reference to 'max' and not realised it was a keyword/option that could be passed 
to --size.


I disabled bitmaps and the machine went into a bit of a spin.  It returned 
to the prompt immediately when I removed the bitmap (--bitmap=none), but each 
session locked up as soon as I attempted any sort of disk I/O, i.e. things 
like 'df' or 'less', whose binaries live on the root filesystem on md0, which 
is the one I was disabling bitmaps on.  I had to power cycle the box to get a 
response from it.


Nothing was logged in /var/log/messages, so unfortunately I haven't anything 
to work with.  But I guess it further suggests that it was the disk I/O that 
fell over.


Then I rebooted into single user mode, noticed that the system had in fact 
removed the bitmaps, and did my resize using --size=max.  It worked!  Now I can 
move on to resizing the filesystems themselves.


In other news, last night I requested an upgrade of mdadm in Fedora Core/Devel 
and this has since been done, so 2.5.2 should come through tonight in the 
nightly FC build (and of course be in FC6 when it comes out).


reuben


-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html