Re: Is shrinking raid5 possible?

2006-06-18 Thread Paul Davidson

Neil Brown wrote:

Yep.
The '--size' option refers to:
  Amount  (in  Kibibytes)  of  space  to  use  from  each drive in
  RAID1/4/5/6.  This must be a multiple of  the  chunk  size,  and
  must  leave about 128Kb of space at the end of the drive for the
  RAID superblock.  
(from the man page).


So you were telling md to use the first 600GB of each device in the
array, and it told you there wasn't that much room.
If your array has N drives, you need to divide the target array size
by N-1 to find the target device size.
So if you have a 5 drive array, then you want
  --size=157286400

NeilBrown


Thanks, and sorry for not being able to read properly -- I read this at
least three times and didn't notice it was the drive size and not the
array size.

Cheers, Paul.


Re: Is shrinking raid5 possible?

2006-06-18 Thread Neil Brown
On Monday June 19, [EMAIL PROTECTED] wrote:
> Hi,
> 
> I'd like to shrink the size of a RAID5 array - is this
> possible? My first attempt at shrinking 1.4TB to 600GB,
> 
> mdadm --grow /dev/md5 --size=629145600
> 
> gives
> 
> mdadm: Cannot set device size/shape for /dev/md5: No space left on device

Yep.
The '--size' option refers to:
  Amount  (in  Kibibytes)  of  space  to  use  from  each drive in
  RAID1/4/5/6.  This must be a multiple of  the  chunk  size,  and
  must  leave about 128Kb of space at the end of the drive for the
  RAID superblock.  
(from the man page).

So you were telling md to use the first 600GB of each device in the
array, and it told you there wasn't that much room.
If your array has N drives, you need to divide the target array size
by N-1 to find the target device size.
So if you have a 5 drive array, then you want
  --size=157286400
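
For example, checking the arithmetic (a purely illustrative sketch, not
part of mdadm; the 64 KiB chunk size here is an assumption, not taken
from your array):

/* Illustrative helper, not part of mdadm: compute the per-device
 * --size (in KiB) for a RAID5 shrink from the target array size.
 * Build with: cc -o raidsize raidsize.c
 */
#include <stdio.h>

int main(void)
{
    unsigned long long target_array_kib = 629145600ULL; /* 600 GiB array */
    unsigned int ndisks = 5;     /* drives in the RAID5 array */
    unsigned int chunk_kib = 64; /* chunk size in KiB -- an assumption */

    /* RAID5 spends one drive's worth of space on parity, so divide
     * the array size by N-1, then round down to a chunk multiple. */
    unsigned long long per_dev_kib = target_array_kib / (ndisks - 1);
    per_dev_kib -= per_dev_kib % chunk_kib;

    printf("mdadm --grow /dev/md5 --size=%llu\n", per_dev_kib);
    return 0;
}

which prints --size=157286400, matching the figure above.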

NeilBrown


Is shrinking raid5 possible?

2006-06-18 Thread Paul Davidson

Hi,

I'd like to shrink the size of a RAID5 array - is this
possible? My first attempt at shrinking 1.4TB to 600GB,

mdadm --grow /dev/md5 --size=629145600

gives

mdadm: Cannot set device size/shape for /dev/md5: No space left on device

which is true but not particularly relevant :). If mdadm
doesn't support this for online arrays, can I do it offline
somehow?

I'd like to retain the ext3 filesystem on this device,
which I have already shrunk to 400Gb with resize2fs.

Thanks for any help,
Paul


Re: Raid5 reshape

2006-06-18 Thread Neil Brown
On Sunday June 18, [EMAIL PROTECTED] wrote:
> This from dmesg might help diagnose the problem:
> 

Yes, that helps a lot, thanks.

The problem is that the reshape thread is restarted before the array
is fully set up, so it ends up dereferencing a NULL pointer.

This patch should fix it.
In fact, there is a small chance that next time you boot it will work
without this patch, but the patch makes it more reliable.

There definitely should be no data-loss due to this bug.
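
The essence of the fix is just ordering: don't wake a thread until
everything it will dereference has been set up.  As a rough userspace
analogy (pthreads; this is not md code, and all the names here are
invented):

/* Rough userspace analogy of the ordering rule -- NOT md code.
 * Build with: cc -pthread -o order order.c
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct array_state {
    int (*do_reshape)(void);    /* stands in for the reshape work */
};

static int reshape_work(void)
{
    puts("reshape running");
    return 0;
}

static void *reshape_thread(void *arg)
{
    struct array_state *st = arg;
    /* If this thread were started before do_reshape was assigned,
     * the call below would go through a NULL pointer -- the same
     * class of crash reported in the md_do_sync backtrace. */
    st->do_reshape();
    return NULL;
}

int main(void)
{
    struct array_state st = { NULL };
    pthread_t tid;

    /* Finish all the setup first ... */
    st.do_reshape = reshape_work;

    /* ... and only then start (or wake) the worker, which is what
     * moving the md_wakeup_thread() calls to the end of do_md_run()
     * achieves in the patch below. */
    if (pthread_create(&tid, NULL, reshape_thread, &st) != 0) {
        perror("pthread_create");
        return EXIT_FAILURE;
    }
    pthread_join(tid, NULL);
    return 0;
}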

Thanks,
NeilBrown



### Diffstat output
 ./drivers/md/md.c    |    6 ++++--
 ./drivers/md/raid5.c |    3 ---
 2 files changed, 4 insertions(+), 5 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c   2006-05-30 15:07:14.0 +1000
+++ ./drivers/md/md.c   2006-06-19 12:01:47.0 +1000
@@ -2719,8 +2719,6 @@ static int do_md_run(mddev_t * mddev)
}

set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-   md_wakeup_thread(mddev->thread);
-   
if (mddev->sb_dirty)
md_update_sb(mddev);
 
@@ -2738,6 +2736,10 @@ static int do_md_run(mddev_t * mddev)
 
mddev->changed = 1;
md_new_event(mddev);
+
+   md_wakeup_thread(mddev->thread);
+   md_wakeup_thread(mddev->sync_thread);
+
return 0;
 }
 

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c2006-06-19 11:56:41.0 +1000
+++ ./drivers/md/raid5.c2006-06-19 11:56:44.0 +1000
@@ -2373,9 +2373,6 @@ static int run(mddev_t *mddev)
set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
mddev->sync_thread = md_register_thread(md_do_sync, mddev,
"%s_reshape");
-   /* FIXME if md_register_thread fails?? */
-   md_wakeup_thread(mddev->sync_thread);
-
}
 
/* read-ahead size must cover two whole stripes, which is


Re: the question about raid0_make_request

2006-06-18 Thread Neil Brown
On Monday June 19, [EMAIL PROTECTED] wrote:
> When I read the code of raid0_make_request, I ran into some questions.
> 
> 1\ block = bio->bi_sector >> 1 is the device offset in kilobytes,
> so why do we subtract zone->zone_offset from block? The
> zone->zone_offset is the zone offset relative to the mddev, in sectors.

zone_offset is set to 'curr_zone_offset' in create_strip_zones, and
curr_zone_offset is a sum of 'zone->size' values.
zone->size is (typically) calculated as
  (smallest->size - current_offset) * c
where 'smallest' is an rdev and 'c' is the number of devices in the zone.
So the units of 'zone_offset' are ultimately the same as those of
rdev->size.
rdev->size is set in md.c, e.g. from
  calc_dev_size(rdev, sb->chunk_size);
which uses the value from calc_dev_sboffset(), which shifts the size in
bytes right by BLOCK_SIZE_BITS (defined in fs.h to be 10).
So zone_offset is in kilobytes, not sectors.

> 
> 2\ In the code below:
>   x = block >> chunksize_bits;
>   tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
> we get the underlying device from 'sector_div(x, zone->nb_dev)'.
> In my opinion the variable x is the chunk number relative to the start of the
> mddev, but not all zones have the same zone->nb_dev, so I think we
> can't get the right rdev from 'sector_div(x, zone->nb_dev)'.

x is the chunk number relative to the start of the current zone, not
the start of the mddev:
sector_t x =  (block - zone->zone_offset) >> chunksize_bits;

taking the remainder after dividing this by the number of devices in
the current zone gives the number of the device to use.
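
A small standalone sketch of the whole mapping may make this concrete
(illustrative only: the zone geometry is made up, and sector_div() is
replaced by plain division/modulo):

/* Illustrative sketch of the raid0 mapping -- not the kernel code.
 * All offsets are in KiB, matching the units of zone_offset above.
 * Build with: cc -o raid0map raid0map.c
 */
#include <stdio.h>

int main(void)
{
    /* Made-up zone: 4 member devices, 64 KiB chunks, and the zone
     * starting 1 GiB (1048576 KiB) into the md device. */
    unsigned long long zone_offset = 1048576;  /* KiB from start of mddev */
    unsigned int nb_dev = 4;
    unsigned int chunksize_bits = 6;           /* 64 KiB chunks: 64 == 1 << 6 */

    unsigned long long bi_sector = 2201600;    /* request offset in 512-byte sectors */

    unsigned long long block = bi_sector >> 1;                       /* sectors -> KiB */
    unsigned long long x = (block - zone_offset) >> chunksize_bits;  /* chunk number within the zone */
    unsigned int dev_index = (unsigned int)(x % nb_dev);             /* what sector_div() returns */
    unsigned long long chunk_on_dev = x / nb_dev;                    /* what is left in x afterwards */

    printf("block=%llu KiB -> chunk %llu of the zone -> device %u, chunk %llu on that device\n",
           block, x, dev_index, chunk_on_dev);
    return 0;
}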

Hope that helps.

NeilBrown


the question about raid0_make_request

2006-06-18 Thread liu yang

When I read the code of raid0_make_request, I ran into some questions.

1\ block = bio->bi_sector >> 1 is the device offset in kilobytes,
so why do we subtract zone->zone_offset from block? The
zone->zone_offset is the zone offset relative to the mddev, in sectors.

2\ In the code below:
  x = block >> chunksize_bits;
  tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
we get the underlying device from 'sector_div(x, zone->nb_dev)'.
In my opinion the variable x is the chunk number relative to the start of the
mddev, but not all zones have the same zone->nb_dev, so I think we
can't get the right rdev from 'sector_div(x, zone->nb_dev)'.

Why is that? Could you explain it to me?
Thanks!
Regards.

YangLiu


Re: Raid5 reshape

2006-06-18 Thread Nigel J. Terry

Nigel J. Terry wrote:

Neil Brown wrote:

OK, thanks for the extra details.  I'll have a look and see what I can
find, but it'll probably be a couple of days before I have anything
useful for you.

NeilBrown

This from dmesg might help diagnose the problem:

md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdb1 ...
md:  adding sdb1 ...
md:  adding sda1 ...
md:  adding hdc1 ...
md:  adding hdb1 ...
md: created md0
md: bind
md: bind
md: bind
md: bind
md: running: 
raid5: automatically using best checksumming function: generic_sse
  generic_sse:  6795.000 MB/sec
raid5: using function: generic_sse (6795.000 MB/sec)
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
raid5: reshape will continue
raid5: device sdb1 operational as raid disk 1
raid5: device sda1 operational as raid disk 0
raid5: device hdb1 operational as raid disk 2
raid5: allocated 4268kB for md0
raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2
RAID5 conf printout:
--- rd:4 wd:3 fd:1
disk 0, o:1, dev:sda1
disk 1, o:1, dev:sdb1
disk 2, o:1, dev:hdb1
...ok start reshape thread
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 20 
KB/sec) for reconstruction.

md: using 128k window, over a total of 245111552 blocks.
Unable to handle kernel NULL pointer dereference at  RIP:
<>{stext+2145382632}
PGD 7c3f9067 PUD 7cb9e067 PMD 0
Oops: 0010 [1] SMP
CPU 0
Modules linked in: raid5 xor usb_storage video button battery ac lp 
parport_pc parport floppy nvram snd_intel8x0 snd_ac97_codec snd_ac97_bus 
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device 
snd_pcm_oss snd_mixer_oss ehci_hcd ohci1394 ieee1394 sg snd_pcm uhci_hcd 
i2c_nforce2 i2c_core forcedeth ohci_hcd snd_timer snd soundcore 
snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd sata_nv 
libata sd_mod scsi_mod

Pid: 1432, comm: md0_reshape Not tainted 2.6.17-rc6 #1
RIP: 0010:[<>] <>{stext+2145382632}
RSP: :81007aa43d60  EFLAGS: 00010246
RAX: 81007cf72f20 RBX: 81007c682000 RCX: 0006
RDX:  RSI:  RDI: 81007cf72f20
RBP: 02090900 R08:  R09: 810037f497b0
R10: 000b44ffd564 R11: 8022c92a R12: 
R13: 0100 R14:  R15: 
FS:  0066d870() GS:80611000() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2:  CR3: 7bebc000 CR4: 06e0
Process md0_reshape (pid: 1432, threadinfo 81007aa42000, task 
810037f497b0)

Stack: 803dce42  1d383600 
     
   
Call Trace: {md_do_sync+1307} 
{thread_return+0}
  {thread_return+94} 
{keventd_create_kthread+0}
  {md_thread+248} 
{keventd_create_kthread+0}

  {md_thread+0} {kthread+254}
  {child_rip+8} 
{keventd_create_kthread+0}

  {thread_return+0} {kthread+0}
  {child_rip+0}

Code:  Bad RIP value.
RIP <>{stext+2145382632} RSP 
CR2: 
<6>md: ... autorun DONE.


SW RAID 5 Bug? - Slow After Rebuild (XFS+2.6.16.20)

2006-06-18 Thread Justin Piszcz
I set a disk faulty and then rebuilt it; afterwards I got horrible
performance.  I was using 2.6.16.20 during the tests.


The FS I use is XFS.

# xfs_info /dev/md3
meta-data=/dev/root              isize=256    agcount=16, agsize=1097941 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=17567056, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=8577, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0

After a raid5 rebuild before reboot:

$ cat 448mb.img > /dev/null

 0  1  4  25104 64 90556000 4 0 1027   154  0  0 88 12
 0  0  4  14580 64 91412800 1434434 1081   718  0  2 77 21
 0  0  4  14516 64 91236000 10312   184 1128  1376  0  3 97  0
 0  0  4  15244 64 91188400 12660 0 1045  1248  0  3 97  0
 0  0  4  15464 64 91127200 11916 0 1055  1081  0  3 98  0
 0  1  4  15100 64 91548800  7844 0 1080   592  0  3 76 21
 0  1  4  13840 64 91678000  1268 0 1295  1757  0  1 49 49
 0  1  4  13480 64 91718800   38848 1050   142  0  1 50 49
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1  4  14816 64 91589600   492 0 1047   321  0  1 49 49
 0  1  4  14504 64 91623600   324 0 1022   108  0  2 50 49
 0  1  4  14144 64 91657600   388 0 1021   108  0  1 50 50
 0  1  4  13904 64 91684800   256 0 1043   159  0  0 50 49
 0  1  4  13728 64 91712000   26024 1032   102  0  1 50 49
 0  0  4  15244 64 91304000 11856 0 1042  1315  0  3 90  7
 0  0  4  14564 64 91365200 12288 0 1068  1137  1  3 97  0
 0  0  4  15252 64 91297200 12288 0 1054  1128  0  3 97  0
 0  0  4  15132 64 91310800 16384 0 1048  1368  0  4 96  0
 0  0  4  15372 64 91283600 12288 0 1062  1125  0  3 97  0
 0  0  4  15660 64 91263200 12288 0 1065  1093  0  3 97  0
 0  0  4  15388 64 91276800 12288 0 1042  1051  0  3 97  0
 0  0  4  15028 64 91331200 12288 0 1040  1122  0  3 97  0

With an ftp:
 0  1  4 208564 64 72366000  8192 0 1945   495  0  4 53 44
 1  0  4 200592 64 73182000  8192 0 1828   459  0  5 52 44
 0  0  4 194472 64 73794000  6144 0 1396   220  0  2 50 47
 0  1  4 186128 64 74616800  8192 0 1622   377  0  4 51 45
 0  1  4 180008 64 75228800  6144 0 1504   339  0  3 51 46
 0  1  4 174012 64 75847600  6144 0 1438   229  0  3 51 47
 0  1  4 167956 64 76459600  6144 0 1498   263  0  2 51 46
 0  1  4 162084 64 77071600  6144 0 1497   326  0  3 51 46
 0  1  4 156152 64 77690400  6144 0 1476   293  0  3 51 47
 0  1  4 150048 64 78302400  614420 1514   273  0  2 51 46

Also note that when I ran 'sync' it would take up to 5 minutes! And I
was not even doing anything on the array.


After reboot:

`448mb.img' at 161467144 (34%) 42.82M/s eta:7s [Receiving data] 
`448mb.img' at 283047424 (60%) 45.23M/s eta:4s [Receiving data] 
`448mb.img' at 406802192 (86%) 46.29M/s eta:1s [Receiving data]


Write speed to the RAID5 is also back to normal.

 0  0  0  16664  8 92894000 0 44478 1522 19791  1 35 43 21
 0  0  0  15304  8 9303680020 49816 1437 19260  0 21 59 20
 0  0  4  16964  8 9283240020 50388 1410 20059  0 20 47 33
 0  0  4  13504  8 93192800 0 46792 1449 16712  0 17 69 15
 0  0  4  14952  8 93043200 8 43510 1489 16443  0 16 60 23
 0  0  4  16328  8 9290720036 50316 1498 16972  1 19 59 23
 0  1  4  16708  8 92846000 0 45604 1504 17196  0 19 55 26
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0  4  16968  8 92812004 0 47640 1584 17821  0 19 57 25
 0  0  4  15160  8 92988800 0 40836 1637 15335  0 17 63 19
 0  1  4  15372  8 92961600 0 41932 1630 14862  0 17 64 19

I was curious whether anyone else has seen this.

/dev/md3:
        Version : 00.90.03
  Creation Time : Sun Jun 11 16:52:00 2006
     Raid Level : raid5
     Array Size : 1562834944 (1490.44 GiB 1600.34 GB)
    Device Size : 390708736 (372.61 GiB 400.09 GB)
   Raid Devices : 5
  Total Devices : 5
Preferre