Re: Raid over 48 disks

2007-12-19 Thread Mattias Wadenstein

On Wed, 19 Dec 2007, Neil Brown wrote:


On Tuesday December 18, [EMAIL PROTECTED] wrote:

We're investigating the possibility of running Linux (RHEL) on top of
Sun's X4500 Thumper box:

http://www.sun.com/servers/x64/x4500/

Basically, it's a server with 48 SATA hard drives. No hardware RAID.
It's designed for Sun's ZFS filesystem.

So... we're curious how Linux will handle such a beast. Has anyone run
MD software RAID over so many disks? Then piled LVM/ext3 on top of
that? Any suggestions?


There are those who have run Linux MD RAID on Thumpers before. I vaguely 
recall some driver issues (unrelated to MD) that made it less suitable 
than Solaris, but those might have been fixed in recent kernels.



Alternately, 8 6-drive RAID5s or 6 8-disk RAID6s, and use RAID0 to
combine them together.  This would give you adequate reliability and
performance and still a large amount of storage space.


My personal suggestion would be 5 9-disk raid6s, one raid1 root mirror and 
one hot spare. Then raid0, lvm, or separate filesystem on those 5 raidsets 
for data, depending on your needs.


You get almost as much data space as with the 6 8-disk raid6s, and have a 
separate pair of disks for all the small updates (logging, metadata, etc), 
so this makes a lot of sense if most of the data is bulk file access.


/Mattias Wadenstein
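
(A rough mdadm sketch of such a layout, purely illustrative: the device
names, md numbers and the 256 KiB chunk below are invented, and only one
of the five RAID6 sets is spelled out.)

# root mirror on the first two disks
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb

# first of five 9-disk RAID6 sets; md2..md5 are built the same way
# from the remaining disks
mdadm --create /dev/md1 --level=6 --raid-devices=9 --chunk=256 \
    /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk

# one hot spare, attached to md1 (or shared between the sets with a
# spare-group entry in mdadm.conf plus mdadm --monitor)
mdadm /dev/md1 --add /dev/sdav

# stripe the five RAID6 sets together, or hand them to LVM instead
mdadm --create /dev/md6 --level=0 --raid-devices=5 \
    /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5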


raid5 resizing

2007-12-19 Thread CaT
Hi,

I'm thinking of slowly replacing disks in my raid5 array with bigger
disks and then resize the array to fill up the new disks. Is this
possible? Basically I would like to go from:

3 x 500gig RAID5 to 3 x 1tb RAID5, thereby going from 1tb to 2tb of
storage.

It seems like it should be, but... :)

-- 
To the extent that we overreact, we proffer the terrorists the
greatest tribute.
- High Court Judge Michael Kirby


Re: raid5 resizing

2007-12-19 Thread Neil Brown
On Wednesday December 19, [EMAIL PROTECTED] wrote:
 Hi,
 
 I'm thinking of slowly replacing disks in my raid5 array with bigger
 disks and then resize the array to fill up the new disks. Is this
 possible? Basically I would like to go from:
 
 3 x 500gig RAID5 to 3 x 1tb RAID5, thereby going from 1tb to 2tb of
 storage.
 
 It seems like it should be, but... :)

Yes.

mdadm --grow /dev/mdX --size=max

NeilBrown
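
(Spelled out a bit more, one possible sequence for the whole
replace-then-grow cycle; a sketch only, and /dev/md0, the member names
and the ext3 resize at the end are assumptions rather than anything
from this thread.)

# for each of the three members in turn: swap in the bigger disk
mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
# physically replace the disk, partition it (type fd), then:
mdadm /dev/md0 --add /dev/sda1
cat /proc/mdstat        # wait for the rebuild to finish before the next disk

# once all members are on the larger disks, grow the array ...
mdadm --grow /dev/md0 --size=max
# ... and then the filesystem on top of it
resize2fs /dev/md0      # for ext3; other filesystems have their own tools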


[ERROR] scsi.c: In function 'scsi_get_serial_number_page'

2007-12-19 Thread Thierry Iceta

Hi

I would like to use raidtools-1.00.3 on the RHEL5 distribution,
but I got this error.
Could you tell me if a new version is available or if a patch exists
to use raidtools on RHEL5.
Thanks for your answer

Thierry

gcc -O2 -Wall -DMD_VERSION=\"raidtools-1.00.3\" -c -o rrc_common.o rrc_common.c

raid_io.c:96: error: expected declaration specifiers or '...' before '…'
raid_io.c:96: error: expected declaration specifiers or '...' before '…'
raid_io.c:96: error: expected declaration specifiers or '...' before '…'
raid_io.c:97: error: expected declaration specifiers or '...' before '…'
raid_io.c:97: error: expected declaration specifiers or '...' before '…'
raid_io.c:98: error: expected declaration specifiers or '...' before '…'
raid_io.c:101: warning: return type defaults to 'int'
raid_io.c: In function '…':
raid_io.c:102: error: expected '=', ',', ';', 'asm' or '__attribute__' before '…' token
raid_io.c:119: error: expected '=', ',', ';', 'asm' or '__attribute__' before '…' token
raid_io.c:214: error: expected '=', ',', ';', 'asm' or '__attribute__' before '…' token
raid_io.c:267: error: expected '=', ',', ';', 'asm' or '__attribute__' before '…' token
raid_io.c:361: error: expected '=', ',', ';', 'asm' or '__attribute__' before '…' token
raid_io.c:519: error: expected '=', ',', ';', 'asm' or '__attribute__' before '…' token
raid_io.c:96: error: parameter name omitted
raid_io.c:96: error: parameter name omitted
raid_io.c:96: error: parameter name omitted
raid_io.c:97: error: parameter name omitted
raid_io.c:97: error: parameter name omitted
raid_io.c:98: error: parameter name omitted
raid_io.c:539: error: expected '…' at end of input
make: *** [raid_io.o] Error 1
make: *** Waiting for unfinished jobs
scsi.c: In function 'scsi_get_serial_number_page':
scsi.c:434: warning: pointer targets in passing argument 2 of '…' differ 
in signedness


--
 ____________________________________________________
 Bull, Architect of an Open World TM

 Open Software R&D

   Email : [EMAIL PROTECTED]
   Bull SA                        Bullcom : 229 76 29
   1, rue de Provence             Phone   : +33 04 76 29 76 29
   B.P. 208                       http://www.bull.com
   38432 Echirolles-CEDEX         Office  : FREC B1-361
 ____________________________________________________




Re: [ERROR] scsi.c: In function 'scsi_get_serial_number_page'

2007-12-19 Thread Michael Tokarev
Thierry Iceta wrote:
 Hi
 
 I would like to use raidtools-1.00.3 on the RHEL5 distribution,
 but I got this error.

Use mdadm instead.  Raidtools is dangerous/unsafe and has not been
maintained for a long time.

/mjt


Re: raid5 resizing

2007-12-19 Thread CaT
On Wed, Dec 19, 2007 at 10:59:41PM +1100, Neil Brown wrote:
 On Wednesday December 19, [EMAIL PROTECTED] wrote:
  Hi,
  
  I'm thinking of slowly replacing disks in my raid5 array with bigger
  disks and then resize the array to fill up the new disks. Is this
  possible? Basically I would like to go from:
  
  3 x 500gig RAID5 to 3 x 1tb RAID5, thereby going from 1tb to 2tb of
  storage.
  
  It seems like it should be, but... :)
 
 Yes.
 
 mdadm --grow /dev/mdX --size=max

Oh -joy-. I love linux sw raid. :) The only thing it seems to lack is
a battery-backed cache.

Thank you.

-- 
To the extent that we overreact, we proffer the terrorists the
greatest tribute.
- High Court Judge Michael Kirby


Re: Raid over 48 disks

2007-12-19 Thread Russell Smith

Guy Watkins wrote:

} -Original Message-
} From: [EMAIL PROTECTED] [mailto:linux-raid-
} [EMAIL PROTECTED] On Behalf Of Brendan Conoboy
} Sent: Tuesday, December 18, 2007 3:36 PM
} To: Norman Elton
} Cc: linux-raid@vger.kernel.org
} Subject: Re: Raid over 48 disks
} 
} Norman Elton wrote:

}  We're investigating the possibility of running Linux (RHEL) on top of
}  Sun's X4500 Thumper box:
} 
}  http://www.sun.com/servers/x64/x4500/
} 
} Neat - six 8-port SATA controllers!  It'll be worth checking to be sure

} each controller has equal bandwidth.  If some controllers are on slower
} buses than others you may want to consider that and balance the md
} device layout.

Assuming the 6 controllers are equal, I would make 3 16-disk RAID6 arrays
using 2 disks from each controller.  That way any 1 controller can fail and
your system will still be running.  6 disks will be used for redundancy.

Or 6 8-disk RAID6 arrays using 1 disk from each controller.  That way any 2
controllers can fail and your system will still be running.  12 disks will
be used for redundancy.  Might be too excessive!

Combine them into a RAID0 array.

Guy

Sounds interesting!

Just out of interest, what's stopping you from using Solaris?

Though I'm curious how md will compare to ZFS performance-wise. There 
is some interesting configuration info / advice for Solaris here: 
http://www.solarisinternals.com/wiki/index.php/ZFS_Configuration_Guide 
especially for the X4500.



Russell


Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-19 Thread Justin Piszcz

The (up to) 30 percent figure is mentioned here:
http://insights.oetiker.ch/linux/raidoptimization.html

On http://forums.storagereview.net/index.php?showtopic=25786:
This user writes about the problem:
XP, and virtually every O/S and partitioning software of XP's day, by default 
places the first partition on a disk at sector 63. Being an odd number, and 
31.5KB into the drive, it isn't ever going to align with any stripe size. This 
is an unfortunate industry standard.
Vista, on the other hand, aligns the first partition on sector 2048 by default 
as a by-product of its revisions to support large-sector hard drives. As 
RAID5 arrays in write mode mimic the performance characteristics of 
large-sector hard drives, this comes as a great, if inadvertent, 
benefit. 2048 is evenly divisible by 2 and 4 (allowing for 3- and 5-drive arrays 
optimally) and virtually every stripe size in common use. If you are however 
using a 4-drive RAID5, you're SOOL.

Page 9 in this PDF (EMC_BestPractice_R22.pdf) shows the problem graphically:
http://bbs.doit.com.cn/attachment.php?aid=6757

--

Now to my setup / question:

# fdisk -l /dev/sdc

Disk /dev/sdc: 150.0 GB, 150039945216 bytes
255 heads, 63 sectors/track, 18241 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x5667c24a

   Device Boot  Start End  Blocks   Id  System
/dev/sdc1   1   18241   146520801   fd  Linux raid autodetect

---

If I use 10-disk RAID5 with 1024 KiB stripe, what would be the correct 
start and end size if I wanted to make sure the RAID5 was stripe aligned?


Or is there a better way to do this, does parted handle this situation 
better?


What is the best (and correct) way to calculate stripe-alignment on the 
RAID5 device itself?


---

The EMC paper recommends:

Disk partition adjustment for Linux systems
In Linux, align the partition table before data is written to the LUN, as 
the partition map will be rewritten
and all data on the LUN destroyed. In the following example, the LUN is 
mapped to
/dev/emcpowerah, and the LUN stripe element size is 128 blocks. Arguments 
for the fdisk utility are as follows:
fdisk /dev/emcpowerah
x  # expert mode
b  # adjust starting block number
1  # choose partition 1
128 #set it to 128, our stripe element size
w  # write the new partition

---

Does this also apply to Linux/SW RAID5?  Or are there any caveats that are 
not taken into account since it is SW rather than HW RAID?


---

What it currently looks like:

Command (m for help): x

Expert command (m for help): p

Disk /dev/sdc: 255 heads, 63 sectors, 18241 cylinders

Nr AF  Hd Sec  Cyl  Hd Sec  Cyl      Start       Size ID
 1 00   1   1    0 254  63 1023         63  293041602 fd
 2 00   0   0    0   0   0    0          0          0 00
 3 00   0   0    0   0   0    0          0          0 00
 4 00   0   0    0   0   0    0          0          0 00

Justin.
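
(One way to sanity-check an existing partition against a given chunk size;
a sketch, not from the thread, with sdc1 used purely as an example and
512-byte sectors assumed.)

fdisk -lu /dev/sdc | grep sdc1    # with -u the Start column is in sectors
# a 1024 KiB chunk is 2048 sectors, so the start must be a multiple of 2048
echo $(( 63 % 2048 ))             # 63   -> 63, not aligned
echo $(( 2048 % 2048 ))           # 2048 -> 0, aligned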


Re: Raid over 48 disks

2007-12-19 Thread Bill Davidsen

Thiemo Nagel wrote:

Performance of the raw device is fair:
# dd if=/dev/md2 of=/dev/zero bs=128k count=64k
8589934592 bytes (8.6 GB) copied, 15.6071 seconds, 550 MB/s

Somewhat less through ext3 (created with -E stride=64):
# dd if=largetestfile of=/dev/zero bs=128k count=64k
8589934592 bytes (8.6 GB) copied, 26.4103 seconds, 325 MB/s


Quite slow?

10 disks (raptors) raid 5 on regular sata controllers:

# dd if=/dev/md3 of=/dev/zero bs=128k count=64k
8589934592 bytes (8.6 GB) copied, 10.718 seconds, 801 MB/s

# dd if=bigfile of=/dev/zero bs=128k count=64k
3640379392 bytes (3.6 GB) copied, 6.58454 seconds, 553 MB/s


Interesting.  Any ideas what could be the reason?  How much do you get 
from a single drive?  -- The Samsung HD501LJ that I'm using gives 
~84MB/s when reading from the beginning of the disk.


With RAID 5 I'm getting slightly better results (though I really 
wonder why, since naively I would expect identical read performance) 
but that only accounts for a small part of the difference:


               16k read            64k write
chunk size   RAID 5   RAID 6    RAID 5   RAID 6
  128k          492      497       268      270
  256k          615      530       288      270
  512k          625      607       230      174
 1024k          650      620       170       75


What is your stripe cache size?

--
Bill Davidsen [EMAIL PROTECTED]
 Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over... Otto von Bismark 





Re: Raid over 48 disks

2007-12-19 Thread Justin Piszcz



On Wed, 19 Dec 2007, Bill Davidsen wrote:


Thiemo Nagel wrote:

Performance of the raw device is fair:
# dd if=/dev/md2 of=/dev/zero bs=128k count=64k
8589934592 bytes (8.6 GB) copied, 15.6071 seconds, 550 MB/s

Somewhat less through ext3 (created with -E stride=64):
# dd if=largetestfile of=/dev/zero bs=128k count=64k
8589934592 bytes (8.6 GB) copied, 26.4103 seconds, 325 MB/s


Quite slow?

10 disks (raptors) raid 5 on regular sata controllers:

# dd if=/dev/md3 of=/dev/zero bs=128k count=64k
8589934592 bytes (8.6 GB) copied, 10.718 seconds, 801 MB/s

# dd if=bigfile of=/dev/zero bs=128k count=64k
3640379392 bytes (3.6 GB) copied, 6.58454 seconds, 553 MB/s


Interesting.  Any ideas what could be the reason?  How much do you get from 
a single drive?  -- The Samsung HD501LJ that I'm using gives ~84MB/s when 
reading from the beginning of the disk.


With RAID 5 I'm getting slightly better results (though I really wonder 
why, since naively I would expect identical read performance) but that does 
only account for a small part of the difference:


               16k read            64k write
chunk size   RAID 5   RAID 6    RAID 5   RAID 6
  128k          492      497       268      270
  256k          615      530       288      270
  512k          625      607       230      174
 1024k          650      620       170       75



What is your stripe cache size?


# Set stripe_cache_size for RAID5.
echo Setting stripe_cache_size to 16 MiB for /dev/md3
echo 16384 > /sys/block/md3/md/stripe_cache_size

Justin.
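
(An aside, based on how the stripe cache works rather than on anything in
this thread: each cache entry is one page per member device, so the memory
pinned is roughly page_size * stripe_cache_size * nr_disks.  A quick check
for the setting above, assuming 4 KiB pages and the 10-disk md3 mentioned
earlier:)

echo $(( 4096 * 16384 * 10 / 1024 / 1024 ))   # ~640 MiB pinned for the stripe cache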


Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-19 Thread Mattias Wadenstein

On Wed, 19 Dec 2007, Justin Piszcz wrote:


--

Now to my setup / question:

# fdisk -l /dev/sdc

Disk /dev/sdc: 150.0 GB, 150039945216 bytes
255 heads, 63 sectors/track, 18241 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x5667c24a

  Device Boot  Start End  Blocks   Id  System
/dev/sdc1   1   18241   146520801   fd  Linux raid autodetect

---

If I use 10-disk RAID5 with 1024 KiB stripe, what would be the correct start 
and end size if I wanted to make sure the RAID5 was stripe aligned?


Or is there a better way to do this, does parted handle this situation 
better?


From that setup it seems simple: scrap the partition table and use the 
disk device for raid. This is what we do for all data storage disks (hw 
raid) and sw raid members.


/Mattias Wadenstein
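
(In practice that looks something like the sketch below; the device names,
md number and chunk size are placeholders, and the dd line destroys
whatever partition table is on the disk.)

# wipe the old partition table / any stale md superblock (data-destroying!)
dd if=/dev/zero of=/dev/sdc bs=512 count=1
mdadm --zero-superblock /dev/sdc     # only if the disk was an md member before

# then build the array straight on the whole-disk devices
mdadm --create /dev/md5 --level=5 --raid-devices=10 --chunk=1024 /dev/sd[c-l]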


Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-19 Thread Justin Piszcz



On Wed, 19 Dec 2007, Mattias Wadenstein wrote:


On Wed, 19 Dec 2007, Justin Piszcz wrote:


--

Now to my setup / question:

# fdisk -l /dev/sdc

Disk /dev/sdc: 150.0 GB, 150039945216 bytes
255 heads, 63 sectors/track, 18241 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x5667c24a

  Device Boot  Start End  Blocks   Id  System
/dev/sdc1   1   18241   146520801   fd  Linux raid 
autodetect


---

If I use 10-disk RAID5 with 1024 KiB stripe, what would be the correct 
start and end size if I wanted to make sure the RAID5 was stripe aligned?


Or is there a better way to do this, does parted handle this situation 
better?


From that setup it seems simple, scrap the partition table and use the 
disk device for raid. This is what we do for all data storage disks (hw raid) 
and sw raid members.


/Mattias Wadenstein



Is there any downside to doing that?  I remember when I had to take my 
machine apart for a BIOS downgrade: when I plugged in the sata devices 
again I did not plug them back in the same order.  Everything worked of 
course, but when I ran LILO it said the disk was not part of the RAID set, 
because /dev/sda had become /dev/sdg, and it overwrote the MBR on that 
disk.  If I had not used partitions here, wouldn't I have lost one (or 
more) of the drives due to a bad LILO run?


Justin.


Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-19 Thread Jon Nelson
On 12/19/07, Justin Piszcz [EMAIL PROTECTED] wrote:


 On Wed, 19 Dec 2007, Mattias Wadenstein wrote:
  From that setup it seems simple, scrap the partition table and use the
  disk device for raid. This is what we do for all data storage disks (hw 
  raid)
  and sw raid members.
 
  /Mattias Wadenstein
 

 Is there any downside to doing that?  I remember when I had to take my

There is one (just pointed out to me yesterday): having the partition
and having it labeled as raid makes identification quite a bit easier
for humans and software, too.

-- 
Jon
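
(For what it's worth, that identification boils down to something like the
following; output details vary by version, so treat it as a sketch.)

fdisk -l /dev/sdb            # type fd partitions show up as Linux raid autodetect
mdadm --examine --scan       # lists every md superblock found on member devices
mdadm --examine /dev/sdb1    # per-member detail: array UUID, level, slot, ...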


Re: Raid over 48 disks

2007-12-19 Thread Bill Davidsen

Mattias Wadenstein wrote:

On Wed, 19 Dec 2007, Neil Brown wrote:


On Tuesday December 18, [EMAIL PROTECTED] wrote:

We're investigating the possibility of running Linux (RHEL) on top of
Sun's X4500 Thumper box:

http://www.sun.com/servers/x64/x4500/

Basically, it's a server with 48 SATA hard drives. No hardware RAID.
It's designed for Sun's ZFS filesystem.

So... we're curious how Linux will handle such a beast. Has anyone run
MD software RAID over so many disks? Then piled LVM/ext3 on top of
that? Any suggestions?


There are those who have run Linux MD RAID on Thumpers before. I 
vaguely recall some driver issues (unrelated to MD) that made it less 
suitable than Solaris, but those might have been fixed in recent kernels.



Alternately, 8 6-drive RAID5s or 6 8-disk RAID6s, and use RAID0 to
combine them together.  This would give you adequate reliability and
performance and still a large amount of storage space.


My personal suggestion would be 5 9-disk raid6s, one raid1 root mirror 
and one hot spare. Then raid0, lvm, or separate filesystem on those 5 
raidsets for data, depending on your needs.


Other than thinking raid-10 is better than raid-1 for performance, I like it.


You get almost as much data space as with the 6 8-disk raid6s, and 
have a separate pair of disks for all the small updates (logging, 
metadata, etc), so this makes a lot of sense if most of the data is 
bulk file access.


--
Bill Davidsen [EMAIL PROTECTED]
 Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over... Otto von Bismark 





Re: [ERROR] scsi.c: In function 'scsi_get_serial_number_page'

2007-12-19 Thread Bill Davidsen

Thierry Iceta wrote:

Hi

I would like to use raidtools-1.00.3 on the RHEL5 distribution,
but I got this error.
Could you tell me if a new version is available or if a patch exists
to use raidtools on RHEL5.


raidtools is old and unmaintained. Use mdadm.

--
Bill Davidsen [EMAIL PROTECTED]
 Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over... Otto von Bismark 





Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-19 Thread Justin Piszcz



On Wed, 19 Dec 2007, Jon Nelson wrote:


On 12/19/07, Justin Piszcz [EMAIL PROTECTED] wrote:



On Wed, 19 Dec 2007, Mattias Wadenstein wrote:

From that setup it seems simple, scrap the partition table and use the

disk device for raid. This is what we do for all data storage disks (hw raid)
and sw raid members.

/Mattias Wadenstein



Is there any downside to doing that?  I remember when I had to take my


There is one (just pointed out to me yesterday): having the partition
and having it labeled as raid makes identification quite a bit easier
for humans and software, too.

--
Jon



Some nice graphs found here:
http://sqlblog.com/blogs/linchi_shea/archive/2007/02/01/performance-impact-of-disk-misalignment.aspx



Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-19 Thread Bill Davidsen

Justin Piszcz wrote:



On Wed, 19 Dec 2007, Mattias Wadenstein wrote:


On Wed, 19 Dec 2007, Justin Piszcz wrote:


--

Now to my setup / question:

# fdisk -l /dev/sdc

Disk /dev/sdc: 150.0 GB, 150039945216 bytes
255 heads, 63 sectors/track, 18241 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x5667c24a

  Device Boot  Start End  Blocks   Id  System
/dev/sdc1   1   18241   146520801   fd  Linux raid 
autodetect


---

If I use 10-disk RAID5 with 1024 KiB stripe, what would be the 
correct start and end size if I wanted to make sure the RAID5 was 
stripe aligned?


Or is there a better way to do this, does parted handle this 
situation better?


From that setup it seems simple, scrap the partition table and use the 
disk device for raid. This is what we do for all data storage disks 
(hw raid) and sw raid members.


/Mattias Wadenstein



Is there any downside to doing that?  I remember when I had to take my 
machine apart for a BIOS downgrade: when I plugged in the sata devices 
again I did not plug them back in the same order.  Everything worked of 
course, but when I ran LILO it said the disk was not part of the RAID set, 
because /dev/sda had become /dev/sdg, and it overwrote the MBR on that 
disk.  If I had not used partitions here, wouldn't I have lost one (or 
more) of the drives due to a bad LILO run?


As other posts have detailed, putting the partition on a 64k aligned 
boundary can address the performance problems. However, a poor choice of 
chunk size, cache_buffer size, or just random i/o in small sizes can eat 
up a lot of the benefit.


I don't think you need to give up your partitions to get the benefit of 
alignment.


--
Bill Davidsen [EMAIL PROTECTED]
 Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over... Otto von Bismark 





Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-19 Thread Justin Piszcz



On Wed, 19 Dec 2007, Bill Davidsen wrote:


Justin Piszcz wrote:



On Wed, 19 Dec 2007, Mattias Wadenstein wrote:


On Wed, 19 Dec 2007, Justin Piszcz wrote:


--

Now to my setup / question:

# fdisk -l /dev/sdc

Disk /dev/sdc: 150.0 GB, 150039945216 bytes
255 heads, 63 sectors/track, 18241 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x5667c24a

  Device Boot  Start End  Blocks   Id  System
/dev/sdc1   1   18241   146520801   fd  Linux raid 
autodetect


---

If I use 10-disk RAID5 with 1024 KiB stripe, what would be the correct 
start and end size if I wanted to make sure the RAID5 was stripe aligned?


Or is there a better way to do this, does parted handle this situation 
better?


From that setup it seems simple, scrap the partition table and use the 
disk device for raid. This is what we do for all data storage disks (hw 
raid) and sw raid members.


/Mattias Wadenstein



Is there any downside to doing that?  I remember when I had to take my 
machine apart for a BIOS downgrade: when I plugged in the sata devices 
again I did not plug them back in the same order.  Everything worked of 
course, but when I ran LILO it said the disk was not part of the RAID set, 
because /dev/sda had become /dev/sdg, and it overwrote the MBR on that 
disk.  If I had not used partitions here, wouldn't I have lost one (or 
more) of the drives due to a bad LILO run?


As other posts have detailed, putting the partition on a 64k aligned boundary 
can address the performance problems. However, a poor choice of chunk size, 
cache_buffer size, or just random i/o in small sizes can eat up a lot of the 
benefit.


I don't think you need to give up your partitions to get the benefit of 
alignment.


--
Bill Davidsen [EMAIL PROTECTED]
Woe unto the statesman who makes war without a reason that will still
be valid when the war is over... Otto von Bismark 



Hrmm..

I am doing a benchmark now with:

6 x 400GB (SATA) / 256 KiB stripe with unaligned vs. aligned raid setup.

unaligned, just fdisk /dev/sdc, mkpartition, fd raid.
 aligned, fdisk, expert, start at 512 as the offset

Per a Microsoft KB:

Example of alignment calculations in kilobytes for a 256-KB stripe unit 
size:

(63 * .5) / 256 = 0.123046875
(64 * .5) / 256 = 0.125
(128 * .5) / 256 = 0.25
(256 * .5) / 256 = 0.5
(512 * .5) / 256 = 1
These examples show that the partition is not aligned correctly for a 
256-KB stripe unit size until the partition is created by using an offset 
of 512 sectors (512 bytes per sector).


So I should start at 512 for a 256k chunk size.

I ran bonnie++ three consecutive times and took the average for the 
unaligned case.  I am rebuilding the RAID5 now, and then I will re-execute 
the test 3 additional times and take the average of that.


Justin.


Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-19 Thread Jon Nelson
On 12/19/07, Bill Davidsen [EMAIL PROTECTED] wrote:
 As other posts have detailed, putting the partition on a 64k aligned
 boundary can address the performance problems. However, a poor choice of
 chunk size, cache_buffer size, or just random i/o in small sizes can eat
 up a lot of the benefit.

 I don't think you need to give up your partitions to get the benefit of
 alignment.

How might that benefit be realized?
Assume I have 3 disks, /dev/sd{b,c,d} all partitioned identically with
4 partitions, and I want to use /dev/sd{b,c,d}3 for a new SW raid.

What sequence of steps can I take to ensure that my raid is aligned on
a 64K boundary?
What effect do the different superblock formats have, if any, in this situation?


-- 
Jon
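
(A possible sequence, offered as a sketch rather than a recipe; sector
numbers assume 512-byte sectors and the sizes below are invented.  The idea
is to start every partition on a multiple of 128 sectors (64 KiB) and then
build the array from those partitions.  As for superblock formats: with
0.90 and 1.0 the superblock sits at the end of the member and the data
starts at its beginning, so partition alignment carries straight through;
1.1 and 1.2 put the superblock near the start and shift the data by the
Data Offset that mdadm -E reports, which is worth checking afterwards.)

# rewrite the partition table on one member (destroys the existing table);
# starts and sizes are in sectors and all multiples of 128
sfdisk -uS /dev/sdb <<EOF
128,2097152,83
2097280,2097152,82
4194432,409600000,fd
,,83
EOF
# repeat for sdc and sdd, then:
mdadm --create /dev/md4 --level=5 --chunk=64 --raid-devices=3 \
    /dev/sdb3 /dev/sdc3 /dev/sdd3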


help diagnosing bad disk

2007-12-19 Thread Jon Sabo
So I was trying to copy over some Indiana Jones wav files and it
wasn't going my way.  I noticed that my software raid device showed:

/dev/md1 on / type ext3 (rw,errors=remount-ro)

Is this saying that it was remounted, read only because it found a
problem with the md1 meta device?  That's what it looks like it's
saying but I can still write to /.

mdadm --detail showed:

[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
  Creation Time : Mon Jul 30 21:47:14 2007
 Raid Level : raid1
 Array Size : 1951744 ( 1906.32 MiB 1998.59 MB)
Device Size : 1951744 (1906.32 MiB 1998.59 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Wed Dec 19 12:59:56 2007
  State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

   UUID : 157f716c:0e7aebca:c20741f6:bb6099c9
 Events : 0.28

 Number   Major   Minor   RaidDevice State
   0   810  active sync   /dev/sda1
   1   001  removed

[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md1
 /dev/md1:
Version : 00.90.03
  Creation Time : Mon Jul 30 21:47:47 2007
 Raid Level : raid1
 Array Size : 974808064 (929.65 GiB 998.20 GB)
Device Size : 974808064 (929.65 GiB 998.20 GB)
Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Wed Dec 19 13:14:53 2007
  State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

   UUID : 156a030e:9a6f8eb3:9b0c439e:d718e744
 Events : 0.1990

Number   Major   Minor   RaidDevice State
   0   820  active sync   /dev/sda2
   1   001  removed


I have two 1 terabyte sata drives in this box.  From what I was
reading wouldn't it show an F for the failed drive?  I thought I would
see that /dev/sdb1 and /dev/sdb2 were failed and it would show an F.
What is this saying and how do you know that it's /dev/sdb and not some
other drive?  It shows removed and that the state is clean, degraded.
Is that something you can recover from without returning this disk
and putting in a new one to add to the raid1 array?

Thanks,

Jonathan


Re: help diagnosing bad disk

2007-12-19 Thread Justin Piszcz



On Wed, 19 Dec 2007, Jon Sabo wrote:


So I was trying to copy over some Indiana Jones wav files and it
wasn't going my way.  I noticed that my software raid device showed:

/dev/md1 on / type ext3 (rw,errors=remount-ro)

Is this saying that it was remounted, read only because it found a
problem with the md1 meta device?  That's what it looks like it's
saying but I can still write to /.

mdadm --detail showed:

[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md0
/dev/md0:
   Version : 00.90.03
 Creation Time : Mon Jul 30 21:47:14 2007
Raid Level : raid1
Array Size : 1951744 ( 1906.32 MiB 1998.59 MB)
   Device Size : 1951744 (1906.32 MiB 1998.59 MB)
  Raid Devices : 2
 Total Devices : 1
Preferred Minor : 0
   Persistence : Superblock is persistent

   Update Time : Wed Dec 19 12:59:56 2007
 State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
 Spare Devices : 0

  UUID : 157f716c:0e7aebca:c20741f6:bb6099c9
Events : 0.28

Number   Major   Minor   RaidDevice State
  0   810  active sync   /dev/sda1
  1   001  removed

[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md1
/dev/md1:
   Version : 00.90.03
 Creation Time : Mon Jul 30 21:47:47 2007
Raid Level : raid1
Array Size : 974808064 (929.65 GiB 998.20 GB)
   Device Size : 974808064 (929.65 GiB 998.20 GB)
   Raid Devices : 2
 Total Devices : 1
Preferred Minor : 1
   Persistence : Superblock is persistent

   Update Time : Wed Dec 19 13:14:53 2007
 State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
 Spare Devices : 0

  UUID : 156a030e:9a6f8eb3:9b0c439e:d718e744
Events : 0.1990

   Number   Major   Minor   RaidDevice State
  0   820  active sync   /dev/sda2
  1   001  removed


I have two 1 terabyte sata drives in this box.  From what I was
reading wouldn't it show an F for the failed drive?  I thought I would
see that /dev/sdb1 and /dev/sdb2 were failed and it would show an F.
What is this saying and how do you know that its /dev/sdb and not some
other drive?  It shows removed and that the state is clean, degraded.
Is that something you can recover from with out returning this disk
and putting in a new one to add to the raid1 array?


mdadm /dev/md1 -a /dev/sdb2 to re-add it back into the array

What does cat /proc/mdstat show?

I would also show us: smartctl -a /dev/sdb

Justin.



Re: help diagnosing bad disk

2007-12-19 Thread Jon Sabo
I found the problem.   The power was unplugged from the drive.  The
sata power connectors aren't very good at securing the connector.  I
reattached the power connector to the sata drive and booted up.  This
is what it looks like now:

[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
  Creation Time : Mon Jul 30 21:47:14 2007
 Raid Level : raid1
 Array Size : 1951744 (1906.32 MiB 1998.59 MB)
Device Size : 1951744 (1906.32 MiB 1998.59 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Wed Dec 19 13:48:12 2007
  State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

   UUID : 157f716c:0e7aebca:c20741f6:bb6099c9
 Events : 0.44

Number   Major   Minor   RaidDevice State
   0   810  active sync   /dev/sda1
   1   001  removed
[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md1
/dev/md1:
Version : 00.90.03
  Creation Time : Mon Jul 30 21:47:47 2007
 Raid Level : raid1
 Array Size : 974808064 (929.65 GiB 998.20 GB)
Device Size : 974808064 (929.65 GiB 998.20 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Wed Dec 19 13:50:02 2007
  State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

   UUID : 156a030e:9a6f8eb3:9b0c439e:d718e744
 Events : 0.1498340

Number   Major   Minor   RaidDevice State
   0   000  removed
   1   8   181  active sync   /dev/sdb2


How do I put it back into the correct state?

Thanks!

Jonathan

On Dec 19, 2007 1:23 PM, Justin Piszcz [EMAIL PROTECTED] wrote:



 On Wed, 19 Dec 2007, Jon Sabo wrote:

  So I was trying to copy over some Indiana Jones wav files and it
  wasn't going my way.  I noticed that my software raid device showed:
 
  /dev/md1 on / type ext3 (rw,errors=remount-ro)
 
  Is this saying that it was remounted, read only because it found a
  problem with the md1 meta device?  That's what it looks like it's
  saying but I can still write to /.
 
  mdadm --detail showed:
 
  [EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md0
  /dev/md0:
 Version : 00.90.03
   Creation Time : Mon Jul 30 21:47:14 2007
  Raid Level : raid1
  Array Size : 1951744 ( 1906.32 MiB 1998.59 MB)
 Device Size : 1951744 (1906.32 MiB 1998.59 MB)
Raid Devices : 2
   Total Devices : 1
  Preferred Minor : 0
 Persistence : Superblock is persistent
 
 Update Time : Wed Dec 19 12:59:56 2007
   State : clean, degraded
  Active Devices : 1
  Working Devices : 1
  Failed Devices : 0
   Spare Devices : 0
 
UUID : 157f716c:0e7aebca:c20741f6:bb6099c9
  Events : 0.28
 
  Number   Major   Minor   RaidDevice State
0   810  active sync   /dev/sda1
1   001  removed
 
  [EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md1
  /dev/md1:
 Version : 00.90.03
   Creation Time : Mon Jul 30 21:47:47 2007
  Raid Level : raid1
  Array Size : 974808064 (929.65 GiB 998.20 GB)
 Device Size : 974808064 (929.65 GiB 998.20 GB)
 Raid Devices : 2
   Total Devices : 1
  Preferred Minor : 1
 Persistence : Superblock is persistent
 
 Update Time : Wed Dec 19 13:14:53 2007
   State : clean, degraded
  Active Devices : 1
  Working Devices : 1
  Failed Devices : 0
   Spare Devices : 0
 
UUID : 156a030e:9a6f8eb3:9b0c439e:d718e744
  Events : 0.1990
 
 Number   Major   Minor   RaidDevice State
0   820  active sync   /dev/sda2
1   001  removed
 
 
  I have two 1 terabyte sata drives in this box.  From what I was
  reading wouldn't it show an F for the failed drive?  I thought I would
  see that /dev/sdb1 and /dev/sdb2 were failed and it would show an F.
  What is this saying and how do you know that its /dev/sdb and not some
  other drive?  It shows removed and that the state is clean, degraded.
  Is that something you can recover from with out returning this disk
  and putting in a new one to add to the raid1 array?

 mdadm /dev/md1 -a /dev/sdb2 to re-add it back into the array

 What does cat /proc/mdstat show?

 I would also show us: smartctl -a /dev/sdb

 Justin.




Re: help diagnosing bad disk

2007-12-19 Thread Bill Davidsen

Jon Sabo wrote:

So I was trying to copy over some Indiana Jones wav files and it
wasn't going my way.  I noticed that my software raid device showed:

/dev/md1 on / type ext3 (rw,errors=remount-ro)

Is this saying that it was remounted, read only because it found a
problem with the md1 meta device?  That's what it looks like it's
saying but I can still write to /.

mdadm --detail showed:

[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
  Creation Time : Mon Jul 30 21:47:14 2007
 Raid Level : raid1
 Array Size : 1951744 ( 1906.32 MiB 1998.59 MB)
Device Size : 1951744 (1906.32 MiB 1998.59 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Wed Dec 19 12:59:56 2007
  State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

   UUID : 157f716c:0e7aebca:c20741f6:bb6099c9
 Events : 0.28

 Number   Major   Minor   RaidDevice State
   0   810  active sync   /dev/sda1
   1   001  removed

[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md1
 /dev/md1:
Version : 00.90.03
  Creation Time : Mon Jul 30 21:47:47 2007
 Raid Level : raid1
 Array Size : 974808064 (929.65 GiB 998.20 GB)
Device Size : 974808064 (929.65 GiB 998.20 GB)
Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Wed Dec 19 13:14:53 2007
  State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

   UUID : 156a030e:9a6f8eb3:9b0c439e:d718e744
 Events : 0.1990

Number   Major   Minor   RaidDevice State
   0   820  active sync   /dev/sda2
   1   001  removed


I have two 1 terabyte sata drives in this box.  From what I was
reading wouldn't it show an F for the failed drive?  I thought I would
see that /dev/sdb1 and /dev/sdb2 were failed and it would show an F.
What is this saying and how do you know that its /dev/sdb and not some
other drive?  It shows removed and that the state is clean, degraded.
Is that something you can recover from with out returning this disk
and putting in a new one to add to the raid1 array?
  


You can try adding the partitions back to your array, but I suspect 
something bad has happened to your sdb drive, since it's failed out of 
both arrays. You can use dmesg to look for any additional information.


Justin gave you the rest of the info you need to investigate, I'll not 
repeat it. ;-)


--
Bill Davidsen [EMAIL PROTECTED]
 Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over... Otto von Bismark 





Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-19 Thread Bill Davidsen

Justin Piszcz wrote:



On Wed, 19 Dec 2007, Bill Davidsen wrote:


Justin Piszcz wrote:



On Wed, 19 Dec 2007, Mattias Wadenstein wrote:


On Wed, 19 Dec 2007, Justin Piszcz wrote:


--

Now to my setup / question:

# fdisk -l /dev/sdc

Disk /dev/sdc: 150.0 GB, 150039945216 bytes
255 heads, 63 sectors/track, 18241 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x5667c24a

  Device Boot  Start End  Blocks   Id  System
/dev/sdc1   1   18241   146520801   fd  Linux raid 
autodetect


---

If I use 10-disk RAID5 with 1024 KiB stripe, what would be the 
correct start and end size if I wanted to make sure the RAID5 was 
stripe aligned?


Or is there a better way to do this, does parted handle this 
situation better?


From that setup it seems simple, scrap the partition table and use 
the 
disk device for raid. This is what we do for all data storage disks 
(hw raid) and sw raid members.


/Mattias Wadenstein



Is there any downside to doing that?  I remember when I had to take 
my machine apart for a BIOS downgrade: when I plugged in the sata 
devices again I did not plug them back in the same order.  Everything 
worked of course, but when I ran LILO it said the disk was not part of 
the RAID set, because /dev/sda had become /dev/sdg, and it overwrote 
the MBR on that disk.  If I had not used partitions here, wouldn't I 
have lost one (or more) of the drives due to a bad LILO run?


As other posts have detailed, putting the partition on a 64k aligned 
boundary can address the performance problems. However, a poor choice 
of chunk size, cache_buffer size, or just random i/o in small sizes 
can eat up a lot of the benefit.


I don't think you need to give up your partitions to get the benefit 
of alignment.


--
Bill Davidsen [EMAIL PROTECTED]
Woe unto the statesman who makes war without a reason that will still
be valid when the war is over... Otto von Bismark


Hrmm..

I am doing a benchmark now with:

6 x 400GB (SATA) / 256 KiB stripe with unaligned vs. aligned raid setup.

unaligned, just fdisk /dev/sdc, mkpartition, fd raid.
 aligned, fdisk, expert, start at 512 as the offset

Per a Microsoft KB:

Example of alignment calculations in kilobytes for a 256-KB stripe 
unit size:

(63 * .5) / 256 = 0.123046875
(64 * .5) / 256 = 0.125
(128 * .5) / 256 = 0.25
(256 * .5) / 256 = 0.5
(512 * .5) / 256 = 1
These examples show that the partition is not aligned correctly for a 
256-KB stripe unit size until the partition is created by using an 
offset of 512 sectors (512 bytes per sector).


So I should start at 512 for a 256k chunk size.

I ran bonnie++ three consecutive times and took the average for the 
unaligned, rebuilding the RAID5 now and then I will re-execute the 
test 3 additional times and take the average of that.


I'm going to try another approach, I'll describe it when I get results 
(or not).


--
Bill Davidsen [EMAIL PROTECTED]
 Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over... Otto von Bismark 





Re: help diagnosing bad disk

2007-12-19 Thread Jon Sabo
Well, here's the rest of the info I should have sent in the last email:

[EMAIL PROTECTED]:/home/illsci# cat /proc/mdstat
Personalities : [multipath] [raid1]
md1 : active raid1 sdb2[1]
  974808064 blocks [2/1] [_U]

md0 : active raid1 sda1[0]
  1951744 blocks [2/1] [U_]

unused devices: <none>
[EMAIL PROTECTED]:/home/illsci# dmesg | grep sdb
sd 1:0:0:0: [sdb] 1953523055 512-byte hardware sectors (1000204 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 1:0:0:0: [sdb] 1953523055 512-byte hardware sectors (1000204 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
 sdb: sdb1 sdb2
sd 1:0:0:0: [sdb] Attached SCSI disk
md: bind<sdb1>
md: kicking non-fresh sdb1 from array!
md: unbind<sdb1>
md: export_rdev(sdb1)
md: bind<sdb2>
[EMAIL PROTECTED]:/home/illsci# dmesg | grep sda
sd 0:0:0:0: [sda] 1953523055 512-byte hardware sectors (1000204 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 0:0:0:0: [sda] 1953523055 512-byte hardware sectors (1000204 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
 sda: sda1 sda2
sd 0:0:0:0: [sda] Attached SCSI disk
md: bind<sda1>
md: bind<sda2>
md: kicking non-fresh sda2 from array!
md: unbind<sda2>
md: export_rdev(sda2)

[EMAIL PROTECTED]:/home/illsci# smartctl -a /dev/sda
smartctl version 5.36 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6
Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA  Hitachi HDS72101 Version: GKAO
Serial number:   GTJ000PAG2HZUC
Device type: disk
Local Time is: Wed Dec 19 14:13:47 2007 EST
Device does not support SMART

Error Counter logging not supported

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging
[EMAIL PROTECTED]:/home/illsci# smartctl -a /dev/sdb
smartctl version 5.36 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6
Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA  Hitachi HDS72101 Version: GKAO
Serial number:   GTJ000PAG2K43C
Device type: disk
Local Time is: Wed Dec 19 14:13:49 2007 EST
Device does not support SMART

Error Counter logging not supported

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging
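
(Side note, not from the thread: that "Device does not support SMART"
message is typically smartctl speaking plain SCSI to a SATA disk behind
libata; with a reasonably recent smartmontools, forcing the ATA
pass-through usually gets the real attributes.)

smartctl -a -d ata /dev/sda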




On Dec 19, 2007 2:16 PM, Bill Davidsen [EMAIL PROTECTED] wrote:

 Jon Sabo wrote:
  So I was trying to copy over some Indiana Jones wav files and it
  wasn't going my way.  I noticed that my software raid device showed:
 
  /dev/md1 on / type ext3 (rw,errors=remount-ro)
 
  Is this saying that it was remounted, read only because it found a
  problem with the md1 meta device?  That's what it looks like it's
  saying but I can still write to /.
 
  mdadm --detail showed:
 
  [EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md0
  /dev/md0:
  Version : 00.90.03
Creation Time : Mon Jul 30 21:47:14 2007
   Raid Level : raid1
   Array Size : 1951744 ( 1906.32 MiB 1998.59 MB)
  Device Size : 1951744 (1906.32 MiB 1998.59 MB)
 Raid Devices : 2
Total Devices : 1
  Preferred Minor : 0
  Persistence : Superblock is persistent
 
  Update Time : Wed Dec 19 12:59:56 2007
State : clean, degraded
   Active Devices : 1
  Working Devices : 1
   Failed Devices : 0
Spare Devices : 0
 
 UUID : 157f716c:0e7aebca:c20741f6:bb6099c9
   Events : 0.28
 
   Number   Major   Minor   RaidDevice State
 0   810  active sync   /dev/sda1
 1   001  removed
 
  [EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md1
   /dev/md1:
  Version : 00.90.03
Creation Time : Mon Jul 30 21:47:47 2007
   Raid Level : raid1
   Array Size : 974808064 (929.65 GiB 998.20 GB)
  Device Size : 974808064 (929.65 GiB 998.20 GB)
  Raid Devices : 2
Total Devices : 1
  Preferred Minor : 1
  Persistence : Superblock is persistent
 
  Update Time : Wed Dec 19 13:14:53 2007
State : clean, degraded
   Active Devices : 1
  Working Devices : 1
   Failed Devices : 0
Spare Devices : 0
 
 UUID : 156a030e:9a6f8eb3:9b0c439e:d718e744
   Events : 0.1990
 
  Number   Major   Minor   RaidDevice State
 0   820  active sync   /dev/sda2
 1   001  removed
 
 
  I have two 1 terabyte sata drives in this box.  From what I was
  reading wouldn't it show an F for the failed drive?  I thought I would
  see that /dev/sdb1 and /dev/sdb2 were failed and it 

Re: help diagnosing bad disk

2007-12-19 Thread Jon Sabo
I think I got it now.  Thanks for your help!

[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
  Creation Time : Mon Jul 30 21:47:14 2007
 Raid Level : raid1
 Array Size : 1951744 (1906.32 MiB 1998.59 MB)
Device Size : 1951744 (1906.32 MiB 1998.59 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Wed Dec 19 14:15:31 2007
  State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

   UUID : 157f716c:0e7aebca:c20741f6:bb6099c9
 Events : 0.48

Number   Major   Minor   RaidDevice State
   0   810  active sync   /dev/sda1
   1   001  removed
[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md1
/dev/md1:
Version : 00.90.03
  Creation Time : Mon Jul 30 21:47:47 2007
 Raid Level : raid1
 Array Size : 974808064 (929.65 GiB 998.20 GB)
Device Size : 974808064 (929.65 GiB 998.20 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Wed Dec 19 14:19:06 2007
  State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

   UUID : 156a030e:9a6f8eb3:9b0c439e:d718e744
 Events : 0.1498998

Number   Major   Minor   RaidDevice State
   0   000  removed
   1   8   181  active sync   /dev/sdb2
[EMAIL PROTECTED]:/home/illsci# mdadm /dev/md0 -a /dev/sdb1
mdadm: re-added /dev/sdb1
[EMAIL PROTECTED]:/home/illsci# mdadm /dev/md1 -a /dev/sda2
mdadm: re-added /dev/sda2
[EMAIL PROTECTED]:/home/illsci# cat /proc/mdstat
Personalities : [multipath] [raid1]
md1 : active raid1 sda2[2] sdb2[1]
  974808064 blocks [2/1] [_U]
resync=DELAYED

md0 : active raid1 sdb1[2] sda1[0]
  1951744 blocks [2/1] [U_]
  [=...]  recovery = 86.6% (1693504/1951744)
finish=0.0min speed=80643K/sec

unused devices: <none>
[EMAIL PROTECTED]:/home/illsci# cat /proc/mdstat
Personalities : [multipath] [raid1]
md1 : active raid1 sda2[2] sdb2[1]
  974808064 blocks [2/1] [_U]
  []  recovery =  0.0% (86848/974808064)
finish=186.9min speed=86848K/sec

md0 : active raid1 sdb1[1] sda1[0]
  1951744 blocks [2/2] [UU]

unused devices: <none>


On Dec 19, 2007 2:09 PM, Jon Sabo [EMAIL PROTECTED] wrote:
 Well, here's the rest of the info I should have sent in the last email:

 [EMAIL PROTECTED]:/home/illsci# cat /proc/mdstat
 Personalities : [multipath] [raid1]
 md1 : active raid1 sdb2[1]
   974808064 blocks [2/1] [_U]

 md0 : active raid1 sda1[0]
   1951744 blocks [2/1] [U_]

 unused devices: none
 [EMAIL PROTECTED]:/home/illsci# dmesg | grep sdb
 sd 1:0:0:0: [sdb] 1953523055 512-byte hardware sectors (1000204 MB)
 sd 1:0:0:0: [sdb] Write Protect is off
 sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
 sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
 support DPO or FUA
 sd 1:0:0:0: [sdb] 1953523055 512-byte hardware sectors (1000204 MB)
 sd 1:0:0:0: [sdb] Write Protect is off
 sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
 sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
 support DPO or FUA
  sdb: sdb1 sdb2
 sd 1:0:0:0: [sdb] Attached SCSI disk
 md: bindsdb1
 md: kicking non-fresh sdb1 from array!
 md: unbindsdb1
 md: export_rdev(sdb1)
 md: bindsdb2
 [EMAIL PROTECTED]:/home/illsci# dmesg | grep sda
 sd 0:0:0:0: [sda] 1953523055 512-byte hardware sectors (1000204 MB)
 sd 0:0:0:0: [sda] Write Protect is off
 sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
 support DPO or FUA
 sd 0:0:0:0: [sda] 1953523055 512-byte hardware sectors (1000204 MB)
 sd 0:0:0:0: [sda] Write Protect is off
 sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
 support DPO or FUA
  sda: sda1 sda2
 sd 0:0:0:0: [sda] Attached SCSI disk
 md: bindsda1
 md: bindsda2
 md: kicking non-fresh sda2 from array!
 md: unbindsda2
 md: export_rdev(sda2)

 [EMAIL PROTECTED]:/home/illsci# smartctl -a /dev/sda
 smartctl version 5.36 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6
 Bruce Allen
 Home page is http://smartmontools.sourceforge.net/

 Device: ATA  Hitachi HDS72101 Version: GKAO
 Serial number:   GTJ000PAG2HZUC
 Device type: disk
 Local Time is: Wed Dec 19 14:13:47 2007 EST
 Device does not support SMART

 Error Counter logging not supported

 [GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
 Device does not support Self Test logging
 [EMAIL PROTECTED]:/home/illsci# smartctl -a /dev/sdb
 smartctl version 5.36 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6
 Bruce Allen
 Home page is http://smartmontools.sourceforge.net/

 Device: ATA  Hitachi HDS72101 Version: GKAO
 Serial number:   

Re: help diagnosing bad disk

2007-12-19 Thread Justin Piszcz



On Wed, 19 Dec 2007, Jon Sabo wrote:


I found the problem.   The power was unplugged from the drive.  The
sata power connectors aren't very good at securing the connector.  I
reattached the power connector to the sata drive and booted up.  This
is what it looks like now:

[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md0
/dev/md0:
   Version : 00.90.03
 Creation Time : Mon Jul 30 21:47:14 2007
Raid Level : raid1
Array Size : 1951744 (1906.32 MiB 1998.59 MB)
   Device Size : 1951744 (1906.32 MiB 1998.59 MB)
  Raid Devices : 2
 Total Devices : 1
Preferred Minor : 0
   Persistence : Superblock is persistent

   Update Time : Wed Dec 19 13:48:12 2007
 State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
 Spare Devices : 0

  UUID : 157f716c:0e7aebca:c20741f6:bb6099c9
Events : 0.44

   Number   Major   Minor   RaidDevice State
  0   810  active sync   /dev/sda1
  1   001  removed
[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md1
/dev/md1:
   Version : 00.90.03
 Creation Time : Mon Jul 30 21:47:47 2007
Raid Level : raid1
Array Size : 974808064 (929.65 GiB 998.20 GB)
   Device Size : 974808064 (929.65 GiB 998.20 GB)
  Raid Devices : 2
 Total Devices : 1
Preferred Minor : 1
   Persistence : Superblock is persistent

   Update Time : Wed Dec 19 13:50:02 2007
 State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
 Spare Devices : 0

  UUID : 156a030e:9a6f8eb3:9b0c439e:d718e744
Events : 0.1498340

   Number   Major   Minor   RaidDevice State
  0   000  removed
  1   8   181  active sync   /dev/sdb2


How do I put it back into the correct state?

Thanks!


mdadm /dev/md0 -a /dev/sdb1
mdadm /dev/md1 -a /dev/sda2

Weird that they got out of sync on different drives.

Justin.
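
(One way to see which member of each pair went stale; a sketch, using the
event counter in the superblock, which is what produced the "kicking
non-fresh" messages in the dmesg output earlier in the thread.)

mdadm --examine /dev/sda1 /dev/sdb1 | egrep 'Events|Update Time'
mdadm --examine /dev/sda2 /dev/sdb2 | egrep 'Events|Update Time'
# the partition with the lower event count is the one that was kicked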



Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-19 Thread Justin Piszcz



On Wed, 19 Dec 2007, Bill Davidsen wrote:


Justin Piszcz wrote:



On Wed, 19 Dec 2007, Bill Davidsen wrote:


Justin Piszcz wrote:



On Wed, 19 Dec 2007, Mattias Wadenstein wrote:


On Wed, 19 Dec 2007, Justin Piszcz wrote:


--

Now to my setup / question:

# fdisk -l /dev/sdc

Disk /dev/sdc: 150.0 GB, 150039945216 bytes
255 heads, 63 sectors/track, 18241 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x5667c24a

  Device Boot  Start End  Blocks   Id  System
/dev/sdc1   1   18241   146520801   fd  Linux raid 
autodetect


---

If I use 10-disk RAID5 with 1024 KiB stripe, what would be the correct 
start and end size if I wanted to make sure the RAID5 was stripe 
aligned?


Or is there a better way to do this, does parted handle this situation 
better?


From that setup it seems simple, scrap the partition table and use the 
disk device for raid. This is what we do for all data storage disks (hw 
raid) and sw raid members.


/Mattias Wadenstein



Is there any downside to doing that?  I remember when I had to take my 
machine apart for a BIOS downgrade: when I plugged in the sata devices 
again I did not plug them back in the same order.  Everything worked of 
course, but when I ran LILO it said the disk was not part of the RAID set, 
because /dev/sda had become /dev/sdg, and it overwrote the MBR on that 
disk.  If I had not used partitions here, wouldn't I have lost one (or 
more) of the drives due to a bad LILO run?


As other posts have detailed, putting the partition on a 64k aligned 
boundary can address the performance problems. However, a poor choice of 
chunk size, cache_buffer size, or just random i/o in small sizes can eat 
up a lot of the benefit.


I don't think you need to give up your partitions to get the benefit of 
alignment.


--
Bill Davidsen [EMAIL PROTECTED]
Woe unto the statesman who makes war without a reason that will still
be valid when the war is over... Otto von Bismark


Hrmm..

I am doing a benchmark now with:

6 x 400GB (SATA) / 256 KiB stripe with unaligned vs. aligned raid setup.

unaligned, just fdisk /dev/sdc, make the partition, type fd raid.
  aligned, fdisk in expert mode, start at sector 512 as the offset

Per a Microsoft KB:

Example of alignment calculations in kilobytes for a 256-KB stripe unit 
size:

(63 * .5) / 256 = 0.123046875
(64 * .5) / 256 = 0.125
(128 * .5) / 256 = 0.25
(256 * .5) / 256 = 0.5
(512 * .5) / 256 = 1
These examples show that the partition is not aligned correctly for a 
256-KB stripe unit size until the partition is created using an offset 
of 512 sectors (512 bytes per sector).


So I should start at 512 for a 256k chunk size.
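
One way to actually create a partition at that offset is to work in
sectors rather than cylinders (a sketch, reusing /dev/sdc and assuming a
256 KiB chunk; for a 1024 KiB chunk the start would be sector 2048, i.e.
the chunk size divided by the 512-byte sector size):

# echo '512,,fd' | sfdisk -uS /dev/sdc   (partition 1: start sector 512, rest of disk, type fd)
# fdisk -lu /dev/sdc                     (verify the start sector)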

I ran bonnie++ three consecutive times and took the average for the 
unaligned setup; I am rebuilding the RAID5 now, and then I will re-run 
the test 3 additional times and take the average of that.


I'm going to try another approach, I'll describe it when I get results (or 
not).


Waiting for the raid to rebuild; I will re-run the tests thereafter.

  [=...]  recovery = 86.7% (339104640/390708480) 
finish=30.8min speed=27835K/sec


...


-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: raid5 reshape/resync - BUGREPORT/PROBLEM

2007-12-19 Thread Nagilum

- Message from [EMAIL PROTECTED] -

- Message from [EMAIL PROTECTED] -

Nagilum said: (by the date of Tue, 18 Dec 2007 11:09:38 +0100)


 Ok, I've recreated the problem in the form of a semiautomatic testcase.
 All necessary files (plus the old xfs_repair output) are at:

   http://www.nagilum.de/md/

 After running the test.sh script, the created xfs filesystem on the raid
 device is broken and (at least in my case) cannot be mounted anymore.

 I think that you should file a bug report



- End message from [EMAIL PROTECTED] -

Where would I file this bug report? I thought this was the place?
I could also really use a way to fix that corruption. :(


ouch. To be honest, I subscribed here just a month ago, so I'm not
sure. But I haven't seen other bug reports here so far.

I was expecting there to be some bugzilla?


Not really, I'm afraid. At least I'm not aware of anything like that for vanilla kernels.

Anyway, I just verified the bug on 2.6.23.11 and 2.6.24-rc5-git4.
Also, I originally came across the bug on amd64, while I'm now using a
PPC750 machine to verify it, so it's an architecture-independent bug
(but that was to be expected).
I also prepared a different version of the testcase, v2_start.sh and
v2_test.sh, which prints out all the wrong bytes (longs to be exact)
plus their locations.

It shows the data is there, but scattered. :(
Kind regards,
Alex.

- End message from [EMAIL PROTECTED] -




#_  __  _ __ http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__  _(_) /_  _  [EMAIL PROTECTED] \n +491776461165 #
#  // _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#   /___/ x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #




cakebox.homeunix.net - all the machine one needs..





Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-19 Thread Justin Piszcz



On Wed, 19 Dec 2007, Bill Davidsen wrote:

I'm going to try another approach, I'll describe it when I get results (or 
not).


http://home.comcast.net/~jpiszcz/align_vs_noalign/

Hardly any difference whatsoever; only on the per-char read/write is it 
any faster?


Average of 3 runs taken:

$ cat align/*log|grep ,
p63,8G,57683,94,86479,13,55242,8,63495,98,147647,11,434.8,0,16:10:16/64,1334210,10,330,2,120,1,3978,10,312,2
p63,8G,57973,95,76702,11,50830,7,62291,99,136477,10,388.3,0,16:10:16/64,1252548,6,296,1,115,1,7927,20,373,2
p63,8G,57758,95,80847,12,52144,8,63874,98,144747,11,443.4,0,16:10:16/64,1242445,6,303,1,117,1,6767,17,359,2

$ cat noalign/*log|grep ,
p63,8G,57641,94,85494,12,55669,8,63802,98,146925,11,434.8,0,16:10:16/64,1353180,8,314,1,117,1,8684,22,283,2
p63,8G,57705,94,85929,12,56708,8,63855,99,143437,11,436.2,0,16:10:16/64,12211519,29,297,1,113,1,3218,8,325,2
p63,8G,57783,94,78226,11,48580,7,63487,98,137721,10,438.7,0,16:10:16/64,1243229,8,307,1,120,1,4247,11,313,2

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: help diagnosing bad disk

2007-12-19 Thread Iustin Pop
On Wed, Dec 19, 2007 at 01:18:21PM -0500, Jon Sabo wrote:
 So I was trying to copy over some Indiana Jones wav files and it
 wasn't going my way.  I noticed that my software raid device showed:
 
 /dev/md1 on / type ext3 (rw,errors=remount-ro)
 
 Is this saying that it was remounted read-only because it found a
 problem with the md1 meta device?  That's what it looks like it's
 saying, but I can still write to /.

FYI, it means that it is currently rw, and if there are errors, it
will remount the filesystem read-only (as opposed to panicking).
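
The policy itself is the ext3 error behaviour, which can be set per mount
or stored as a superblock default (a sketch; the fstab line is illustrative,
not copied from the poster's system):

/dev/md1  /  ext3  defaults,errors=remount-ro  0  1

# tune2fs -e remount-ro /dev/md1                 (same default, kept in the superblock)
# tune2fs -l /dev/md1 | grep -i 'errors behavior'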

regards,
iustin
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-19 Thread Robin Hill
On Wed Dec 19, 2007 at 09:50:16AM -0500, Justin Piszcz wrote:

 The (up to) 30% percent figure is mentioned here:
 http://insights.oetiker.ch/linux/raidoptimization.html

That looks to be referring to partitioning a RAID device - this'll only
apply to hardware RAID or partitionable software RAID, not to the normal
use case.  When you're creating an array out of standard partitions then
you know the array stripe size will align with the disks (there's no way
it cannot), and you can set the filesystem stripe size to align as well
(XFS will do this automatically).

I've actually done tests on this with hardware RAID to try to find the
correct partition offset, but wasn't able to see any difference (using
bonnie++ and moving the partition start by one sector at a time).
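
Where the geometry is not picked up automatically (hardware RAID, or an
md array hidden behind another layer), the stripe unit and width can be
handed to mkfs.xfs explicitly (a sketch, assuming a 256 KiB chunk on a
10-disk RAID5, i.e. 9 data disks; the device and mount point names are
illustrative):

# mkfs.xfs -d su=256k,sw=9 /dev/md3    (mkfs prints the resulting sunit/swidth)
# xfs_info /mnt/data | grep sunit      (re-check after mounting)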

 # fdisk -l /dev/sdc

 Disk /dev/sdc: 150.0 GB, 150039945216 bytes
 255 heads, 63 sectors/track, 18241 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes
 Disk identifier: 0x5667c24a

Device Boot  Start End  Blocks   Id  System
 /dev/sdc1   1   18241   146520801   fd  Linux raid 
 autodetect

This looks to be a normal disk - the partition offsets shouldn't be
relevant here (barring any knowledge of the actual physical disk layout
anyway, and block remapping may well make that rather irrelevant).

That's my take on this one anyway.

Cheers,
Robin
-- 
 ___
( ' } |   Robin Hill[EMAIL PROTECTED] |
   / / )  | Little Jim says |
  // !!   |  He fallen in de water !! |




Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-19 Thread Justin Piszcz



On Wed, 19 Dec 2007, Robin Hill wrote:


On Wed Dec 19, 2007 at 09:50:16AM -0500, Justin Piszcz wrote:


The (up to) 30% percent figure is mentioned here:
http://insights.oetiker.ch/linux/raidoptimization.html


That looks to be referring to partitioning a RAID device - this'll only
apply to hardware RAID or partitionable software RAID, not to the normal
use case.  When you're creating an array out of standard partitions then
you know the array stripe size will align with the disks (there's no way
it cannot), and you can set the filesystem stripe size to align as well
(XFS will do this automatically).

I've actually done tests on this with hardware RAID to try to find the
correct partition offset, but wasn't able to see any difference (using
bonnie++ and moving the partition start by one sector at a time).


# fdisk -l /dev/sdc

Disk /dev/sdc: 150.0 GB, 150039945216 bytes
255 heads, 63 sectors/track, 18241 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x5667c24a

   Device Boot  Start End  Blocks   Id  System
/dev/sdc1   1   18241   146520801   fd  Linux raid
autodetect


This looks to be a normal disk - the partition offsets shouldn't be
relevant here (barring any knowledge of the actual physical disk layout
anyway, and block remapping may well make that rather irrelevant).

That's my take on this one anyway.

Cheers,
   Robin
--
___
   ( ' } |   Robin Hill[EMAIL PROTECTED] |
  / / )  | Little Jim says |
 // !!   |  He fallen in de water !! |



Interesting, yes, I am using XFS as well, thanks for the response.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-19 Thread Jon Nelson
On 12/19/07, Michal Soltys [EMAIL PROTECTED] wrote:
 Justin Piszcz wrote:
 
  Or is there a better way to do this, does parted handle this situation
  better?
 
  What is the best (and correct) way to calculate stripe-alignment on the
  RAID5 device itself?
 
 
  Does this also apply to Linux/SW RAID5?  Or are there any caveats that
  are not taken into account since it is based in SW vs. HW?
 
  ---

 In case of SW or HW raid, when you place a raid-aware filesystem directly on
 it, I don't see any potential problems.

 Also, if md's superblock version/placement actually mattered, it'd be pretty
 strange. The space available for actual use - be it partitions or filesystem
 directly - should always be nicely aligned. I don't know that for sure though.

 If you use SW partitionable raid, or HW raid with partitions, then you would
 have to align it on a chunk boundary manually. Any self-respecting OS
 shouldn't complain that a partition doesn't start on a cylinder boundary
 these days. LVM can complicate life a bit too - if you want its volumes to
 be chunk-aligned.

That, for me, is the next question: how can one educate LVM about the
underlying block device so that logical volumes carved out of that
space align properly? Many of us have experienced 30% (or so)
performance losses for the convenience of LVM (and mighty convenient
it is).


-- 
Jon
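
As a rough answer (a sketch - option support depends on the LVM2 release:
--dataalignment only exists in newer versions, while older ones can only
nudge the data area indirectly via --metadatasize; /dev/md3 is
illustrative): the goal is to get the first physical extent onto a
chunk-size multiple, and pe_start is where to check it.

# pvcreate --dataalignment 256k /dev/md3   (newer LVM2)
# pvcreate --metadatasize 250k /dev/md3    (older trick; the metadata area is rounded up so the PE area starts at 256k)
# pvs -o +pe_start /dev/md3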
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: mdadm --stop goes off and never comes back?

2007-12-19 Thread Neil Brown
On Tuesday December 18, [EMAIL PROTECTED] wrote:
 This just happened to me.
 Create raid with:
 
 mdadm --create /dev/md2 --level=raid10 --raid-devices=3
 --spare-devices=0 --layout=o2 /dev/sdb3 /dev/sdc3 /dev/sdd3
 
 cat /proc/mdstat
 
 md2 : active raid10 sdd3[2] sdc3[1] sdb3[0]
   5855424 blocks 64K chunks 2 offset-copies [3/3] [UUU]
   [==..]  resync = 14.6% (859968/5855424)
 finish=1.3min speed=61426K/sec
 
 Some log messages:
 
 Dec 18 15:02:28 turnip kernel: md: md2: raid array is not clean --
 starting background reconstruction
 Dec 18 15:02:28 turnip kernel: raid10: raid set md2 active with 3 out
 of 3 devices
 Dec 18 15:02:28 turnip kernel: md: resync of RAID array md2
 Dec 18 15:02:28 turnip kernel: md: minimum _guaranteed_  speed: 1000
 KB/sec/disk.
 Dec 18 15:02:28 turnip kernel: md: using maximum available idle IO
 bandwidth (but not more than 20 KB/sec) for resync.
 Dec 18 15:02:28 turnip kernel: md: using 128k window, over a total of
 5855424 blocks.
 Dec 18 15:03:36 turnip kernel: md: md2: resync done.
 Dec 18 15:03:36 turnip kernel: md: checkpointing resync of md2.
 
 I tried to stop the array:
 
 mdadm --stop /dev/md2
 
 and mdadm never came back. It's off in the kernel somewhere. :-(
 
 kill, of course, has no effect.
 The machine still runs fine, the rest of the raids (md0 and md1) work
 fine (same disks).
 
 The output (snipped, only mdadm) of 'echo t > /proc/sysrq-trigger'
 
 Dec 18 15:09:13 turnip kernel: mdadm S 0001e5359fa38fb0 0
 3943  1 (NOTLB)
 Dec 18 15:09:13 turnip kernel:  810033e7ddc8 0086
  0092
 Dec 18 15:09:13 turnip kernel:  0fc7 810033e7dd78
 80617800 80617800
 Dec 18 15:09:13 turnip kernel:  8061d210 80617800
 80617800 
 Dec 18 15:09:13 turnip kernel: Call Trace:
 Dec 18 15:09:13 turnip kernel:  [803fac96]
 __mutex_lock_interruptible_slowpath+0x8b/0xca
 Dec 18 15:09:13 turnip kernel:  [802acccb] do_open+0x222/0x2a5
 Dec 18 15:09:13 turnip kernel:  [8038705d] md_seq_show+0x127/0x6c1
 Dec 18 15:09:13 turnip kernel:  [80275597] vma_merge+0x141/0x1ee
 Dec 18 15:09:13 turnip kernel:  [802a2aa0] seq_read+0x1bf/0x28b
 Dec 18 15:09:13 turnip kernel:  [8028a42d] vfs_read+0xcb/0x153
 Dec 18 15:09:13 turnip kernel:  [8028a7c1] sys_read+0x45/0x6e
 Dec 18 15:09:13 turnip kernel:  [80209c2e] system_call+0x7e/0x83
 
 
 
 What happened? Is there any debug info I can provide before I reboot?

Don't know - very odd.

The rest of the 'sysrq' output would possibly help.

NeilBrown
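
For capturing the whole task dump rather than a snippet, something along
these lines usually works (a sketch; the output file name is arbitrary and
dmesg option behaviour varies slightly between util-linux versions):

# dmesg -n 8                        (let all kernel messages through to the console)
# echo t > /proc/sysrq-trigger
# dmesg -s 1000000 > sysrq-t.txt    (read with a large buffer so the dump isn't truncated)

If the dump is larger than the kernel ring buffer, booting with
log_buf_len=1M (or collecting the trace over a serial console) is the
usual fallback.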
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: mdadm --stop goes off and never comes back?

2007-12-19 Thread Jon Nelson
On 12/19/07, Neil Brown [EMAIL PROTECTED] wrote:
 On Tuesday December 18, [EMAIL PROTECTED] wrote:
  This just happened to me.
  Create raid with:
 
  mdadm --create /dev/md2 --level=raid10 --raid-devices=3
  --spare-devices=0 --layout=o2 /dev/sdb3 /dev/sdc3 /dev/sdd3
 
  cat /proc/mdstat
 
  md2 : active raid10 sdd3[2] sdc3[1] sdb3[0]
5855424 blocks 64K chunks 2 offset-copies [3/3] [UUU]
[==..]  resync = 14.6% (859968/5855424)
  finish=1.3min speed=61426K/sec
 
  Some log messages:
 
  Dec 18 15:02:28 turnip kernel: md: md2: raid array is not clean --
  starting background reconstruction
  Dec 18 15:02:28 turnip kernel: raid10: raid set md2 active with 3 out
  of 3 devices
  Dec 18 15:02:28 turnip kernel: md: resync of RAID array md2
  Dec 18 15:02:28 turnip kernel: md: minimum _guaranteed_  speed: 1000
  KB/sec/disk.
  Dec 18 15:02:28 turnip kernel: md: using maximum available idle IO
  bandwidth (but not more than 20 KB/sec) for resync.
  Dec 18 15:02:28 turnip kernel: md: using 128k window, over a total of
  5855424 blocks.
  Dec 18 15:03:36 turnip kernel: md: md2: resync done.
  Dec 18 15:03:36 turnip kernel: md: checkpointing resync of md2.
 
  I tried to stop the array:
 
  mdadm --stop /dev/md2
 
  and mdadm never came back. It's off in the kernel somewhere. :-(
 
  kill, of course, has no effect.
  The machine still runs fine, the rest of the raids (md0 and md1) work
  fine (same disks).
 
  The output (snipped, only mdadm) of 'echo t > /proc/sysrq-trigger'
 
  Dec 18 15:09:13 turnip kernel: mdadm S 0001e5359fa38fb0 0
  3943  1 (NOTLB)
  Dec 18 15:09:13 turnip kernel:  810033e7ddc8 0086
   0092
  Dec 18 15:09:13 turnip kernel:  0fc7 810033e7dd78
  80617800 80617800
  Dec 18 15:09:13 turnip kernel:  8061d210 80617800
  80617800 
  Dec 18 15:09:13 turnip kernel: Call Trace:
  Dec 18 15:09:13 turnip kernel:  [803fac96]
  __mutex_lock_interruptible_slowpath+0x8b/0xca
  Dec 18 15:09:13 turnip kernel:  [802acccb] do_open+0x222/0x2a5
  Dec 18 15:09:13 turnip kernel:  [8038705d] md_seq_show+0x127/0x6c1
  Dec 18 15:09:13 turnip kernel:  [80275597] vma_merge+0x141/0x1ee
  Dec 18 15:09:13 turnip kernel:  [802a2aa0] seq_read+0x1bf/0x28b
  Dec 18 15:09:13 turnip kernel:  [8028a42d] vfs_read+0xcb/0x153
  Dec 18 15:09:13 turnip kernel:  [8028a7c1] sys_read+0x45/0x6e
  Dec 18 15:09:13 turnip kernel:  [80209c2e] system_call+0x7e/0x83
 
 
 
  What happened? Is there any debug info I can provide before I reboot?

 Don't know - very odd.

 The rest of the 'sysrq' output would possibly help.

Does this help? It's the same syscall and args, I think, as above.

Dec 18 15:09:13 turnip kernel: hald  S 0001e52f4793e397 0
3040  1 (NOTLB)
Dec 18 15:09:13 turnip kernel:  81003aa51e38 0086
 802
68ee6
Dec 18 15:09:13 turnip kernel:  81002a97e5c0 81003aa51de8
80617800 806
17800
Dec 18 15:09:13 turnip kernel:  8061d210 80617800
80617800 810
0bb48
Dec 18 15:09:13 turnip kernel: Call Trace:
Dec 18 15:09:13 turnip kernel:  [80268ee6]
get_page_from_freelist+0x3c4/0x545
Dec 18 15:09:13 turnip kernel:  [803fac96]
__mutex_lock_interruptible_slowpath+0x8b/
0xca
Dec 18 15:09:13 turnip kernel:  [80387adf] md_attr_show+0x2f/0x64
Dec 18 15:09:13 turnip kernel:  [802cd142] sysfs_read_file+0xb3/0x111
Dec 18 15:09:13 turnip kernel:  [8028a42d] vfs_read+0xcb/0x153
Dec 18 15:09:13 turnip kernel:  [8028a7c1] sys_read+0x45/0x6e
Dec 18 15:09:13 turnip kernel:  [80209c2e] system_call+0x7e/0x83
Dec 18 15:09:13 turnip kernel:


-- 
Jon
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: mdadm --stop goes off and never comes back?

2007-12-19 Thread Jon Nelson
On 12/19/07, Jon Nelson [EMAIL PROTECTED] wrote:
 On 12/19/07, Neil Brown [EMAIL PROTECTED] wrote:
  On Tuesday December 18, [EMAIL PROTECTED] wrote:
   This just happened to me.
   Create raid with:
  
   mdadm --create /dev/md2 --level=raid10 --raid-devices=3
   --spare-devices=0 --layout=o2 /dev/sdb3 /dev/sdc3 /dev/sdd3
  
   cat /proc/mdstat
  
   md2 : active raid10 sdd3[2] sdc3[1] sdb3[0]
 5855424 blocks 64K chunks 2 offset-copies [3/3] [UUU]
 [==..]  resync = 14.6% (859968/5855424)
   finish=1.3min speed=61426K/sec
  
   Some log messages:
  
   Dec 18 15:02:28 turnip kernel: md: md2: raid array is not clean --
   starting background reconstruction
   Dec 18 15:02:28 turnip kernel: raid10: raid set md2 active with 3 out
   of 3 devices
   Dec 18 15:02:28 turnip kernel: md: resync of RAID array md2
   Dec 18 15:02:28 turnip kernel: md: minimum _guaranteed_  speed: 1000
   KB/sec/disk.
   Dec 18 15:02:28 turnip kernel: md: using maximum available idle IO
   bandwidth (but not more than 20 KB/sec) for resync.
   Dec 18 15:02:28 turnip kernel: md: using 128k window, over a total of
   5855424 blocks.
   Dec 18 15:03:36 turnip kernel: md: md2: resync done.
   Dec 18 15:03:36 turnip kernel: md: checkpointing resync of md2.
  
   I tried to stop the array:
  
   mdadm --stop /dev/md2
  
   and mdadm never came back. It's off in the kernel somewhere. :-(
  
   kill, of course, has no effect.
   The machine still runs fine, the rest of the raids (md0 and md1) work
   fine (same disks).
  
   The output (snipped, only mdadm) of 'echo t > /proc/sysrq-trigger'
  
   Dec 18 15:09:13 turnip kernel: mdadm S 0001e5359fa38fb0 0
   3943  1 (NOTLB)
   Dec 18 15:09:13 turnip kernel:  810033e7ddc8 0086
    0092
   Dec 18 15:09:13 turnip kernel:  0fc7 810033e7dd78
   80617800 80617800
   Dec 18 15:09:13 turnip kernel:  8061d210 80617800
   80617800 
   Dec 18 15:09:13 turnip kernel: Call Trace:
   Dec 18 15:09:13 turnip kernel:  [803fac96]
   __mutex_lock_interruptible_slowpath+0x8b/0xca
   Dec 18 15:09:13 turnip kernel:  [802acccb] do_open+0x222/0x2a5
   Dec 18 15:09:13 turnip kernel:  [8038705d] 
   md_seq_show+0x127/0x6c1
   Dec 18 15:09:13 turnip kernel:  [80275597] vma_merge+0x141/0x1ee
   Dec 18 15:09:13 turnip kernel:  [802a2aa0] seq_read+0x1bf/0x28b
   Dec 18 15:09:13 turnip kernel:  [8028a42d] vfs_read+0xcb/0x153
   Dec 18 15:09:13 turnip kernel:  [8028a7c1] sys_read+0x45/0x6e
   Dec 18 15:09:13 turnip kernel:  [80209c2e] system_call+0x7e/0x83
  
  
  
   What happened? Is there any debug info I can provide before I reboot?
 
  Don't know - very odd.
 
  The rest of the 'sysrq' output would possibly help.

 Does this help? It's the same syscall and args, I think, as above.

 Dec 18 15:09:13 turnip kernel: hald  S 0001e52f4793e397 0
 3040  1 (NOTLB)
 Dec 18 15:09:13 turnip kernel:  81003aa51e38 0086
  802
 68ee6
 Dec 18 15:09:13 turnip kernel:  81002a97e5c0 81003aa51de8
 80617800 806
 17800
 Dec 18 15:09:13 turnip kernel:  8061d210 80617800
 80617800 810
 0bb48
 Dec 18 15:09:13 turnip kernel: Call Trace:
 Dec 18 15:09:13 turnip kernel:  [80268ee6]
 get_page_from_freelist+0x3c4/0x545
 Dec 18 15:09:13 turnip kernel:  [803fac96]
 __mutex_lock_interruptible_slowpath+0x8b/
 0xca
 Dec 18 15:09:13 turnip kernel:  [80387adf] md_attr_show+0x2f/0x64
 Dec 18 15:09:13 turnip kernel:  [802cd142] 
 sysfs_read_file+0xb3/0x111
 Dec 18 15:09:13 turnip kernel:  [8028a42d] vfs_read+0xcb/0x153
 Dec 18 15:09:13 turnip kernel:  [8028a7c1] sys_read+0x45/0x6e
 Dec 18 15:09:13 turnip kernel:  [80209c2e] system_call+0x7e/0x83
 Dec 18 15:09:13 turnip kernel:

NOTE: kernel is stock openSUSE 10.3 kernel, x86_64, 2.6.22.13-0.3-default.


-- 
Jon
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html