Re: Raid-10 mount at startup always has problem
Daniel L. Miller wrote:
> Doug Ledford wrote:
> > Nah.  Even if we had concluded that udev was to blame here, I'm not entirely certain that we hadn't left Daniel with the impression that we suspected it versus blamed it, so reiterating it doesn't hurt.  And I'm sure no one has given him a fix for the problem (although Neil did request a change that will give debug output, but not solve the problem), so not dropping it entirely would seem appropriate as well.
> I've opened a bug report on Ubuntu's Launchpad.net.  Scott James Remnant asked me to cc him on Neil's incremental reference - we'll see what happens from here.  Thanks for the help guys.
>
> At the moment, I've changed my mdadm.conf to explicitly list the drives, instead of the auto=partition parameter.  We'll see what happens on the next reboot.
>
> I don't know if it means anything, but I'm using a self-compiled 2.6.22 kernel - with initrd.  At least I THINK I'm using initrd - I have an image, but I don't see an initrd line in my grub config.  Hmm.  I'm going to add a stanza that includes the initrd and see what happens also.

Wow.  Been a while since I asked about this - I just realized a reboot or two has come and gone.  I checked my md status - everything was online!  Cool.  My current dmesg output:

sata_nv 0000:00:07.0: version 3.4
ACPI: PCI Interrupt Link [LTID] enabled at IRQ 23
ACPI: PCI Interrupt 0000:00:07.0[A] -> Link [LTID] -> GSI 23 (level, high) -> IRQ 23
sata_nv 0000:00:07.0: Using ADMA mode
PCI: Setting latency timer of device 0000:00:07.0 to 64
scsi0 : sata_nv
scsi1 : sata_nv
ata1: SATA max UDMA/133 cmd 0xc20001428480 ctl 0xc200014284a0 bmdma 0x0000011410 irq 23
ata2: SATA max UDMA/133 cmd 0xc20001428580 ctl 0xc200014285a0 bmdma 0x0000011418 irq 23
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata1.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata2.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata2.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access ATA ST3160811AS 3.AA PQ: 0 ANSI: 5
ata1: bounce limit 0x, segment boundary 0x, hw segs 61
scsi 1:0:0:0: Direct-Access ATA ST3160811AS 3.AA PQ: 0 ANSI: 5
ata2: bounce limit 0x, segment boundary 0x, hw segs 61
ACPI: PCI Interrupt Link [LSI1] enabled at IRQ 22
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LSI1] -> GSI 22 (level, high) -> IRQ 22
sata_nv 0000:00:08.0: Using ADMA mode
PCI: Setting latency timer of device 0000:00:08.0 to 64
scsi2 : sata_nv
scsi3 : sata_nv
ata3: SATA max UDMA/133 cmd 0xc2000142a480 ctl 0xc2000142a4a0 bmdma 0x0000011420 irq 22
ata4: SATA max UDMA/133 cmd 0xc2000142a580 ctl 0xc2000142a5a0 bmdma 0x0000011428 irq 22
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata3.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata3.00: configured for UDMA/133
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata4.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata4.00: configured for UDMA/133
scsi 2:0:0:0: Direct-Access ATA ST3160811AS 3.AA PQ: 0 ANSI: 5
ata3: bounce limit 0x, segment boundary 0x, hw segs 61
scsi 3:0:0:0: Direct-Access ATA ST3160811AS 3.AA PQ: 0 ANSI: 5
ata4: bounce limit 0x, segment boundary 0x, hw segs 61
sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: unknown partition table
sd 0:0:0:0: [sda] Attached SCSI disk
sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 1:0:0:0: [sdb] 312581
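For reference, an explicit listing of the kind described above might look something like this sketch of an mdadm.conf (the device names and the UUID are placeholders, not values taken from this thread):

    # /etc/mdadm/mdadm.conf - list component devices explicitly instead
    # of relying on auto=partition scanning
    DEVICE /dev/sda /dev/sdb /dev/sdc /dev/sdd
    # RAID-10 array assembled from the four whole-disk devices above;
    # substitute the real UUID reported by "mdadm --detail /dev/md0"
    ARRAY /dev/md0 level=raid10 num-devices=4 UUID=00000000:00000000:00000000:00000000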
Re: Raid-10 mount at startup always has problem
Daniel L. Miller wrote:
> Doug Ledford wrote:
> > Nah.  Even if we had concluded that udev was to blame here, I'm not entirely certain that we hadn't left Daniel with the impression that we suspected it versus blamed it, so reiterating it doesn't hurt.  And I'm sure no one has given him a fix for the problem (although Neil did request a change that will give debug output, but not solve the problem), so not dropping it entirely would seem appropriate as well.
> I've opened a bug report on Ubuntu's Launchpad.net.  Scott James Remnant asked me to cc him on Neil's incremental reference - we'll see what happens from here.  Thanks for the help guys.
>
> At the moment, I've changed my mdadm.conf to explicitly list the drives, instead of the auto=partition parameter.  We'll see what happens on the next reboot.
>
> I don't know if it means anything, but I'm using a self-compiled 2.6.22 kernel - with initrd.  At least I THINK I'm using initrd - I have an image, but I don't see an initrd line in my grub config.  Hmm.  I'm going to add a stanza that includes the initrd and see what happens also.

What did that do?

-- 
bill davidsen <[EMAIL PROTECTED]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: Raid-10 mount at startup always has problem
Doug Ledford wrote:
> Nah.  Even if we had concluded that udev was to blame here, I'm not entirely certain that we hadn't left Daniel with the impression that we suspected it versus blamed it, so reiterating it doesn't hurt.  And I'm sure no one has given him a fix for the problem (although Neil did request a change that will give debug output, but not solve the problem), so not dropping it entirely would seem appropriate as well.

I've opened a bug report on Ubuntu's Launchpad.net.  Scott James Remnant asked me to cc him on Neil's incremental reference - we'll see what happens from here.  Thanks for the help guys.

At the moment, I've changed my mdadm.conf to explicitly list the drives, instead of the auto=partition parameter.  We'll see what happens on the next reboot.

I don't know if it means anything, but I'm using a self-compiled 2.6.22 kernel - with initrd.  At least I THINK I'm using initrd - I have an image, but I don't see an initrd line in my grub config.  Hmm.  I'm going to add a stanza that includes the initrd and see what happens also.

-- 
Daniel
Re: Raid-10 mount at startup always has problem
On Mon, 2007-10-29 at 22:29 +0100, Luca Berra wrote:
> > At which point he found that the udev scripts in ubuntu are being stupid, and from the looks of it are the cause of the problem.  So, I've considered the initial issue root caused for a bit now.
> It seems i made an idiot of myself by missing half of the thread, and i even knew ubuntu was braindead in their use of udev at startup, since a similar discussion came up on the lvm or the dm-devel mailing list (that time iirc it was about lvm over multipath)

Nah.  Even if we had concluded that udev was to blame here, I'm not entirely certain that we hadn't left Daniel with the impression that we suspected it versus blamed it, so reiterating it doesn't hurt.  And I'm sure no one has given him a fix for the problem (although Neil did request a change that will give debug output, but not solve the problem), so not dropping it entirely would seem appropriate as well.

-- 
Doug Ledford <[EMAIL PROTECTED]> GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: Raid-10 mount at startup always has problem
On Mon, Oct 29, 2007 at 11:47:19AM -0400, Doug Ledford wrote:
> On Mon, 2007-10-29 at 09:18 +0100, Luca Berra wrote:
> > On Sun, Oct 28, 2007 at 10:59:01PM -0700, Daniel L. Miller wrote:
> > > Doug Ledford wrote:
> > > > Anyway, I happen to *like* the idea of using full disk devices, but the reality is that the md subsystem doesn't have exclusive ownership of the disks at all times, and without that it really needs to stake a claim on the space instead of leaving things to chance IMO.
> > > I've been re-reading this post numerous times - trying to ignore the burgeoning flame war :) - and this last sentence finally clicked with me.
> > I am sorry Daniel, when i read Doug and Bill, stating that your issue was not having a partition table, i immediately took the bait and forgot about your original issue.
> I never said *his* issue was lack of partition table, I just said I don't recommend that because it's flaky.

maybe i misread you but Bill was quite clear.

> The last statement I made about his issue was to ask about whether the problem was happening during initrd time or sysinit time, to try and identify if it was failing before or after / was mounted, to try and determine where the issue might lie.  Then we got off on the tangent about partitions, and at the same time Neil started asking about udev, at which point it came out that he's running ubuntu, and as much as I would like to help, the fact of the matter is that I've never touched ubuntu and wouldn't have the faintest clue, so I let Neil handle it.  At which point he found that the udev scripts in ubuntu are being stupid, and from the looks of it are the cause of the problem.  So, I've considered the initial issue root caused for a bit now.

It seems i made an idiot of myself by missing half of the thread, and i even knew ubuntu was braindead in their use of udev at startup, since a similar discussion came up on the lvm or the dm-devel mailing list (that time iirc it was about lvm over multipath)

> > like udev/hal that believes it knows better than you about what you have on your disks.
> > but _NEITHER OF THESE IS YOUR PROBLEM_ imho
> Actually, it looks like udev *is* the problem, but not because of partition tables.

you are right.

L.
-- 
Luca Berra -- [EMAIL PROTECTED]
Communication Media & Services S.r.l.
Re: Raid-10 mount at startup always has problem
Daniel L. Miller wrote:
> Nothing in the documentation (that I read - granted I don't always read everything) stated that partitioning prior to md creation was necessary - in fact references were provided on how to use complete disks.  Is there an "official" position on, "To Partition, or Not To Partition"?  Particularly for my application - dedicated Linux server, RAID-10 configuration, identical drives.

My simplistic reason for always making one partition on md drives, about 100MB smaller than the full space, has been insurance: it allows the use of a replacement drive from another manufacturer which, while nominally marked as the same size as the originals, is in fact slightly smaller.

Regards,
Richard
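One way to carve out such a partition, as a sketch only (the device name is hypothetical, and the arithmetic assumes the 312581808-sector disks from the dmesg earlier in the thread; 204800 sectors of 512 bytes is 100MB):

    # Single type-fd (Linux raid autodetect) partition starting at the
    # fdisk default of sector 63 and stopping ~100MB short of the end:
    # 312581808 - 204800 - 63 = 312376945 sectors.
    echo '63,312376945,fd' | sfdisk -uS /dev/sdX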
Re: Raid-10 mount at startup always has problem
On Sun, 2007-10-28 at 22:59 -0700, Daniel L. Miller wrote:
> Doug Ledford wrote:
> > Anyway, I happen to *like* the idea of using full disk devices, but the reality is that the md subsystem doesn't have exclusive ownership of the disks at all times, and without that it really needs to stake a claim on the space instead of leaving things to chance IMO.
> I've been re-reading this post numerous times - trying to ignore the burgeoning flame war :) - and this last sentence finally clicked with me.
>
> As I'm a novice Linux user - and not involved in development at all - bear with me if I'm stating something obvious.  And if I'm wrong - please be gentle!
>
> 1. md devices are not "native" to the kernel - they are created/assembled/activated/whatever by a userspace program.

My real point was that md doesn't own the disks, meaning that during startup, and at other points in time, software other than the md stack can attempt to use the disk directly.  That software may be the linux file system code, linux lvm code, or in some cases entirely different OS software.  Given that these situations can arise, using a partition table to mark the space as in use by linux is what I meant by staking a claim.  It doesn't keep the linux kernel from using it because it thinks it owns it, but it does stop other software from attempting to use it.

> 2. Because md devices are "non-native" devices, and are composed of "native" devices, the kernel may try to use those components directly without going through md.

In the case of superblocks at the end, yes.  The kernel may see the underlying file system or lvm disk label even if the md device is not started.

> 3. Creating a partition table somehow (I'm still not clear how/why) reduces the chance the kernel will access the drive directly without md.

The partition table is more to tell other software that linux owns the space, and to avoid mistakes where someone runs fdisk on a disk accidentally and wipes out your array because they added a partition table on what they thought was a new disk (more likely when you have large arrays of disks attached via fiber channel or such than in a single system).  Putting the superblock at the beginning of the md device is the main thing that guarantees the kernel will never try to use what's inside the md device without the md device running.

> These concepts suddenly have me terrified over my data integrity.  Is the md system so delicate that the BOOT sequence can corrupt it?

If you have your superblocks at the end of the devices, then there are certain failure modes that can cause data inconsistencies.  Generally speaking they won't harm the array itself, it's just that the different disks in a raid1 array might contain different data.  If you don't use partitions, then the majority of failure scenarios involve things like accidental use of fdisk on the unpartitioned device, access of the device by other OSes, that sort of thing.

> How is it more reliable AFTER the completed boot sequence?

Once the array is up and running, the constituent disks are marked as busy in the operating system, which prevents other portions of the linux kernel, and other software in general, from getting at the md owned disks.

> Nothing in the documentation (that I read - granted I don't always read everything) stated that partitioning prior to md creation was necessary - in fact references were provided on how to use complete disks.  Is there an "official" position on, "To Partition, or Not To Partition"?  Particularly for my application - dedicated Linux server, RAID-10 configuration, identical drives.
>
> And if partitioning is the answer - what do I need to do with my live dataset?  Drop one drive, partition, then add the partition as a new drive to the set - and repeat for each drive after the rebuild finishes?

You *probably*, and I emphasize probably, don't need to do anything.  I emphasize it because I don't know enough about your situation to say so with 100% certainty.  If I'm wrong, it's not my fault.  Now, that said, here's the gist of the situation.  There are specific failure cases that can corrupt data in an md raid1 array, mainly related to superblocks at the end of devices.  There are specific failure cases where an unpartitioned device can be accidentally partitioned, or where a partitioned md array in combination with superblocks at the end and using a whole disk device can be misrecognized as a partitioned normal drive.  There are, on the other hand, cases where it's perfectly safe to use unpartitioned devices, or superblocks at the end of devices.  My recommendation when someone asks what to do is to use partitions, and to use superblocks at the beginning of the devices (except for /boot since that isn't supported at the moment).  The reason I give that advice is that I assume if a person knows enough to know when it's safe to use unpartitioned devices, like Luca, then they w
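A sketch of the layout recommended above, assuming four disks that already carry a single type-fd partition each (the device and array names are placeholders): version 1.1 superblocks sit at the start of each component, so nothing inside the array can be mistaken for a bare filesystem or partition table.

    # Create a RAID-10 array from partitions, with superblocks at the
    # beginning of each component device (metadata version 1.1).
    mdadm --create /dev/md0 --level=10 --raid-devices=4 \
          --metadata=1.1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1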
Re: Raid-10 mount at startup always has problem
On Mon, Oct 29, 2007 at 08:41:39AM +0100, Luca Berra wrote:
> consider a storage with 64 spt, an io size of 4k and a partition starting at sector 63.
> the first io request will require two ios from the storage (one for sector 63, and one for sectors 64 to 70)
> the next 7 ios (71-78,79-86,87-94,95-102,103-110,111-118,119-126) will be on the same track
> the 8th will again require to be split, and so on.
> this causes the storage to do 1 unnecessary io every 8.  YMMV.

That's only true for random reads.  If the OS does sufficient read-ahead then sequential reads are affected much less.  But the killers are the misaligned random writes, since then (considering RAID5/6 for simplicity) the stripe has to be read from all component disks before it can be written back.

Gabor

-- 
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
Re: Raid-10 mount at startup always has problem
On Mon, 2007-10-29 at 09:18 +0100, Luca Berra wrote:
> On Sun, Oct 28, 2007 at 10:59:01PM -0700, Daniel L. Miller wrote:
> > Doug Ledford wrote:
> > > Anyway, I happen to *like* the idea of using full disk devices, but the reality is that the md subsystem doesn't have exclusive ownership of the disks at all times, and without that it really needs to stake a claim on the space instead of leaving things to chance IMO.
> > I've been re-reading this post numerous times - trying to ignore the burgeoning flame war :) - and this last sentence finally clicked with me.
> I am sorry Daniel, when i read Doug and Bill, stating that your issue was not having a partition table, i immediately took the bait and forgot about your original issue.

I never said *his* issue was lack of partition table, I just said I don't recommend that because it's flaky.  The last statement I made about his issue was to ask about whether the problem was happening during initrd time or sysinit time, to try and identify if it was failing before or after / was mounted, to try and determine where the issue might lie.  Then we got off on the tangent about partitions, and at the same time Neil started asking about udev, at which point it came out that he's running ubuntu, and as much as I would like to help, the fact of the matter is that I've never touched ubuntu and wouldn't have the faintest clue, so I let Neil handle it.  At which point he found that the udev scripts in ubuntu are being stupid, and from the looks of it are the cause of the problem.  So, I've considered the initial issue root caused for a bit now.

> like udev/hal that believes it knows better than you about what you have on your disks.
> but _NEITHER OF THESE IS YOUR PROBLEM_ imho

Actually, it looks like udev *is* the problem, but not because of partition tables.

> I am also sorry to say that i fail to identify what the source of your problem is, we should try harder instead of flaming between us.

We can do both, or at least I can :-P

> Is it possible to reproduce it on the live system
> e.g. unmount, stop array, start it again and mount.
> I bet it will work flawlessly in this case.
> then i would disable starting this array at boot, and start it manually when the system is up (stracing mdadm, so we can see what it does)
>
> I am also wondering about this:
> md: md0: raid array is not clean -- starting background reconstruction
> does your system shut down properly?
> do you see the message about stopping md at the very end of the reboot/halt process?

The root cause is that as udev adds its sata devices one at a time, on each add of a sata device it invokes mdadm to see if there is an array to start, and it doesn't use incremental mode on mdadm.  As a result, as soon as there are 3 out of the 4 disks present, mdadm starts the array in degraded mode.  It's probably a race between the mdadm started on the third disk and the mdadm started on the fourth disk that results in the message about being unable to set the array info.  The one losing the race gets the error as the other one has already manipulated the array (for example, the 4th disk's mdadm could be trying to add the first disk to the array, but it's already there, so it gets this error and bails).

So, as much as you might dislike mkinitrd since 5.0 Luca, it doesn't have this particular problem ;-)  In the initrd we produce, it loads all the SCSI/SATA/etc drivers first, then calls mkblkdevs which forces all of the devices to appear in /dev, and only then does it start the mdadm/lvm configuration.

Daniel, I make no promises whatsoever that this will even work at all as it may fail to load modules or all other sorts of weirdness, but if you want to test the theory, you can download the latest mkinitrd from fedoraproject.org, then use it to create an initrd image under some other name than your default image name, then manually edit your boot to have an extra stanza that uses the mkinitrd generated initrd image instead of the ubuntu image, and then just see if it brings the md device up cleanly instead of in degraded mode.  That should be a fairly quick and easy way to test whether Neil's analysis of the udev script was right.

-- 
Doug Ledford <[EMAIL PROTECTED]> GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
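The incremental mode Neil referred to is what lets udev hand devices to mdadm one at a time safely.  A sketch of such a rule, assuming a udev setup that tags raid members via vol_id as linux_raid_member (the rule file name and the environment key are assumptions and vary by distribution):

    # /etc/udev/rules.d/85-mdadm.rules (hypothetical path)
    # Feed each newly appearing raid member to mdadm in incremental mode;
    # mdadm then waits for enough members before starting the array,
    # instead of racing to assemble it degraded with 3 of 4 disks.
    SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
        RUN+="/sbin/mdadm --incremental $env{DEVNAME}"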
Re: Raid-10 mount at startup always has problem
On Mon, 2007-10-29 at 09:22 -0400, Bill Davidsen wrote:
> > consider a storage with 64 spt, an io size of 4k and a partition starting at sector 63.
> > the first io request will require two ios from the storage (one for sector 63, and one for sectors 64 to 70)
> > the next 7 ios (71-78,79-86,87-94,95-102,103-110,111-118,119-126) will be on the same track
> > the 8th will again require to be split, and so on.
> > this causes the storage to do 1 unnecessary io every 8.  YMMV.
> No one makes drives with fixed spt any more.  Your assumptions are a decade out of date.

You're missing the point: it's not about drive tracks, it's about array tracks, aka chunks.  A 64k write, that should write to one and only one chunk, ends up spanning two.  That increases the amount of writing the array has to do and the number of disks it busies for a typical single I/O operation.

-- 
Doug Ledford <[EMAIL PROTECTED]> GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
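To make the chunk-spanning concrete, a small sketch with assumed numbers (64KiB chunks, a partition at the old fdisk default of sector 63):

    #!/bin/sh
    # Does a 64KiB write at a given offset cross a chunk boundary?
    CHUNK=128    # chunk size in 512-byte sectors (64KiB)
    START=63     # partition start sector (misaligned)
    OFFSET=0     # write offset within the partition, in sectors
    IO=128       # I/O size in sectors (64KiB)
    first=$(( (START + OFFSET) / CHUNK ))
    last=$(( (START + OFFSET + IO - 1) / CHUNK ))
    if [ "$first" -ne "$last" ]; then
        echo "64KiB write spans chunks $first and $last"
    fi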
Re: Raid-10 mount at startup always has problem
On Sun, 2007-10-28 at 20:21 -0400, Bill Davidsen wrote:
> Doug Ledford wrote:
> > On Fri, 2007-10-26 at 11:15 +0200, Luca Berra wrote:
> > > On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote:
> > > > The partition table is the single, (mostly) universally recognized arbiter of what possible data might be on the disk.  Having a partition table may not make mdadm recognize the md superblock any better, but it keeps all that other stuff from even trying to access data that it doesn't have a need to access and prevents random luck from turning your day bad.
> > > on a pc maybe, but that is 20 years old design.
> > So?  Unix is 35+ year old design, I suppose you want to switch to Vista then?
> > > partition table design is limited because it is still based on C/H/S, which do not exist anymore.
> > > Put a partition table on a big storage, say a DMX, and enjoy a 20% performance decrease.
> > Because you didn't stripe align the partition, your bad.
> Align to /what/ stripe?  Hardware (CHS is fiction), software (of the RAID you're about to create), or ???  I don't notice my FC6 or FC7 install programs using any special partition location to start, I have only run (tried to run) FC8-test3 for the live CD, so I can't say what it might do.  CentOS4 didn't do anything obvious, either, so unless I really misunderstand your position at redhat, that would be your bad. ;-)
>
> If you mean start a partition on a pseudo-CHS boundary, fdisk seems to use what it thinks are cylinders for that.
>
> Please clarify what alignment provides a performance benefit.

Luca was specifically talking about the big multi-terabyte to petabyte hardware arrays on the market.  DMX, DDN, and others.  When they export a volume to the OS, there is an underlying stripe layout to that volume.  If you don't use any partition table at all, you are automatically aligned with their stripes.  However, if you do, then you have to align your partition on a chunk boundary or else performance drops pretty dramatically as a result of more writes than not crossing chunk boundaries unnecessarily.  It's only relevant when you are talking about a raid device that shows the OS a single logical disk made from lots of other disks.

-- 
Doug Ledford <[EMAIL PROTECTED]> GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
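As a sketch, partitioning so the data area starts on a chunk boundary might look like this (the device name is a placeholder, and a 64-sector stripe on the exported volume is an assumption):

    # Start the partition at sector 128 (a multiple of the assumed
    # 64-sector stripe) instead of fdisk's default 63; the size defaults
    # to the rest of the device, type fd for Linux raid autodetect.
    echo '128,,fd' | sfdisk -uS /dev/sdX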
Re: Raid-10 mount at startup always has problem
Luca Berra wrote:
> On Sun, Oct 28, 2007 at 08:21:34PM -0400, Bill Davidsen wrote:
> > > Because you didn't stripe align the partition, your bad.
> > Align to /what/ stripe?  Hardware (CHS is fiction), software (of the RAID
> the real stripe (track) size of the storage, you must read the manual and/or bug technical support for that info.

That's my point, there *is* no "real stripe (track) size of the storage" because modern drives use zone bit recording, and sectors per track depends on the track, and changes within a partition.  See
http://www.dewassoc.com/kbase/hard_drives/hard_disk_sector_structures.htm
http://www.storagereview.com/guide2000/ref/hdd/op/mediaTracks.html

> > you're about to create), or ???  I don't notice my FC6 or FC7 install programs using any special partition location to start, I have only run (tried to run) FC8-test3 for the live CD, so I can't say what it might do.  CentOS4 didn't do anything obvious, either, so unless I really misunderstand your position at redhat, that would be your bad. ;-)
> >
> > If you mean start a partition on a pseudo-CHS boundary, fdisk seems to use what it thinks are cylinders for that.
> Yes, fdisk will create partitions at sector 63 (due to CHS being braindead, besides fictional: 63 sectors-per-track)
> most arrays use 64 or 128 spt, and array caches are aligned accordingly.
> So 63 is almost always the wrong choice.

As the above links show, there's no right choice.

> for the default choice you must consider what spt your array uses, iirc (this is from memory, so double check these figures)
> IBM 64 spt (i think)
> EMC DMX 64
> EMC CX 128???
> HDS (and HP XP) except OPEN-V 96
> HDS (and HP XP) OPEN-V 128
> HP EVA 4/6/8 with XCS 5.x state that no alignment is needed, even if i never found a technical explanation of that.
> previous HP EVA versions did (maybe 64).
> you might then want to consider how data is laid out on the storage, but i believe the storage cache is enough to deal with that issue.
> Please note that "0" is always well aligned.
>
> Note to people who are now wondering WTH i am talking about:
> consider a storage with 64 spt, an io size of 4k and a partition starting at sector 63.
> the first io request will require two ios from the storage (one for sector 63, and one for sectors 64 to 70)
> the next 7 ios (71-78,79-86,87-94,95-102,103-110,111-118,119-126) will be on the same track
> the 8th will again require to be split, and so on.
> this causes the storage to do 1 unnecessary io every 8.  YMMV.

No one makes drives with fixed spt any more.  Your assumptions are a decade out of date.

-- 
bill davidsen <[EMAIL PROTECTED]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: Raid-10 mount at startup always has problem
On Sun, Oct 28, 2007 at 10:59:01PM -0700, Daniel L. Miller wrote:
> Doug Ledford wrote:
> > Anyway, I happen to *like* the idea of using full disk devices, but the reality is that the md subsystem doesn't have exclusive ownership of the disks at all times, and without that it really needs to stake a claim on the space instead of leaving things to chance IMO.
> I've been re-reading this post numerous times - trying to ignore the burgeoning flame war :) - and this last sentence finally clicked with me.

I am sorry Daniel, when i read Doug and Bill, stating that your issue was not having a partition table, i immediately took the bait and forgot about your original issue.  I have no reason to believe your problem is due to not having a partition table on your devices.

 sda: unknown partition table
 sdb: unknown partition table
 sdc: unknown partition table
 sdd: unknown partition table

the above clearly shows that the kernel does not see a partition table where there is none.  The opposite - finding a partition table where there is none - happens in some cases, and bit Doug so hard.  Note, it does not happen at random, it should happen only if you use a partitioned md device with a superblock at the end.  Or if you configure it wrongly as Doug did.  (i am not accusing Doug of being stupid at all, it is a fairly common mistake to make and we should try to prevent it in mdadm as much as we can)

Again, having the kernel find a partition table where there is none should not pose a problem at all, unless there is some badly designed software like udev/hal that believes it knows better than you about what you have on your disks.
but _NEITHER OF THESE IS YOUR PROBLEM_ imho

I am also sorry to say that i fail to identify what the source of your problem is, we should try harder instead of flaming between us.
Is it possible to reproduce it on the live system
e.g. unmount, stop array, start it again and mount.
I bet it will work flawlessly in this case.
then i would disable starting this array at boot, and start it manually when the system is up (stracing mdadm, so we can see what it does)

I am also wondering about this:
md: md0: raid array is not clean -- starting background reconstruction
does your system shut down properly?
do you see the message about stopping md at the very end of the reboot/halt process?

L.
-- 
Luca Berra -- [EMAIL PROTECTED]
Communication Media & Services S.r.l.
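Concretely, that live-system test might look like the following sketch (the array name, mount point and component devices are assumptions):

    # Reproduce outside the boot sequence: unmount, stop the array, then
    # reassemble under strace so we can see what mdadm actually does.
    umount /mnt/raid
    mdadm --stop /dev/md0
    strace -f -o /tmp/mdadm-assemble.trace \
        mdadm --assemble /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd
    mount /dev/md0 /mnt/raid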
Re: Raid-10 mount at startup always has problem
On Sun, Oct 28, 2007 at 08:21:34PM -0400, Bill Davidsen wrote:
> > Because you didn't stripe align the partition, your bad.
> Align to /what/ stripe?  Hardware (CHS is fiction), software (of the RAID

the real stripe (track) size of the storage, you must read the manual and/or bug technical support for that info.

> you're about to create), or ???  I don't notice my FC6 or FC7 install programs using any special partition location to start, I have only run (tried to run) FC8-test3 for the live CD, so I can't say what it might do.  CentOS4 didn't do anything obvious, either, so unless I really misunderstand your position at redhat, that would be your bad. ;-)
>
> If you mean start a partition on a pseudo-CHS boundary, fdisk seems to use what it thinks are cylinders for that.

Yes, fdisk will create partitions at sector 63 (due to CHS being braindead, besides fictional: 63 sectors-per-track)
most arrays use 64 or 128 spt, and array caches are aligned accordingly.
So 63 is almost always the wrong choice.

for the default choice you must consider what spt your array uses, iirc (this is from memory, so double check these figures)
IBM 64 spt (i think)
EMC DMX 64
EMC CX 128???
HDS (and HP XP) except OPEN-V 96
HDS (and HP XP) OPEN-V 128
HP EVA 4/6/8 with XCS 5.x state that no alignment is needed, even if i never found a technical explanation of that.
previous HP EVA versions did (maybe 64).

you might then want to consider how data is laid out on the storage, but i believe the storage cache is enough to deal with that issue.
Please note that "0" is always well aligned.

Note to people who are now wondering WTH i am talking about:
consider a storage with 64 spt, an io size of 4k and a partition starting at sector 63.
the first io request will require two ios from the storage (one for sector 63, and one for sectors 64 to 70)
the next 7 ios (71-78,79-86,87-94,95-102,103-110,111-118,119-126) will be on the same track
the 8th will again require to be split, and so on.
this causes the storage to do 1 unnecessary io every 8.  YMMV.

L.
-- 
Luca Berra -- [EMAIL PROTECTED]
Communication Media & Services S.r.l.
Re: Raid-10 mount at startup always has problem
Doug Ledford wrote:
> Anyway, I happen to *like* the idea of using full disk devices, but the reality is that the md subsystem doesn't have exclusive ownership of the disks at all times, and without that it really needs to stake a claim on the space instead of leaving things to chance IMO.

I've been re-reading this post numerous times - trying to ignore the burgeoning flame war :) - and this last sentence finally clicked with me.

As I'm a novice Linux user - and not involved in development at all - bear with me if I'm stating something obvious.  And if I'm wrong - please be gentle!

1. md devices are not "native" to the kernel - they are created/assembled/activated/whatever by a userspace program.
2. Because md devices are "non-native" devices, and are composed of "native" devices, the kernel may try to use those components directly without going through md.
3. Creating a partition table somehow (I'm still not clear how/why) reduces the chance the kernel will access the drive directly without md.

These concepts suddenly have me terrified over my data integrity.  Is the md system so delicate that the BOOT sequence can corrupt it?  How is it more reliable AFTER the completed boot sequence?

Nothing in the documentation (that I read - granted I don't always read everything) stated that partitioning prior to md creation was necessary - in fact references were provided on how to use complete disks.  Is there an "official" position on, "To Partition, or Not To Partition"?  Particularly for my application - dedicated Linux server, RAID-10 configuration, identical drives.

And if partitioning is the answer - what do I need to do with my live dataset?  Drop one drive, partition, then add the partition as a new drive to the set - and repeat for each drive after the rebuild finishes?

-- 
Daniel
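If one did go that route, the per-drive cycle might look roughly like this sketch (device names are assumed, and note Doug's reply earlier in the thread: this is probably unnecessary):

    # For each component disk, one at a time (sda shown):
    mdadm /dev/md0 --fail /dev/sda --remove /dev/sda
    echo '63,,fd' | sfdisk -uS /dev/sda   # one type-fd partition
    mdadm /dev/md0 --add /dev/sda1
    # wait for the rebuild to finish before touching the next disk
    cat /proc/mdstat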
Re: Raid-10 mount at startup always has problem
Doug Ledford wrote:
> On Fri, 2007-10-26 at 11:15 +0200, Luca Berra wrote:
> > On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote:
> > > The partition table is the single, (mostly) universally recognized arbiter of what possible data might be on the disk.  Having a partition table may not make mdadm recognize the md superblock any better, but it keeps all that other stuff from even trying to access data that it doesn't have a need to access and prevents random luck from turning your day bad.
> > on a pc maybe, but that is 20 years old design.
> So?  Unix is 35+ year old design, I suppose you want to switch to Vista then?
> > partition table design is limited because it is still based on C/H/S, which do not exist anymore.
> > Put a partition table on a big storage, say a DMX, and enjoy a 20% performance decrease.
> Because you didn't stripe align the partition, your bad.

Align to /what/ stripe?  Hardware (CHS is fiction), software (of the RAID you're about to create), or ???  I don't notice my FC6 or FC7 install programs using any special partition location to start; I have only run (tried to run) FC8-test3 for the live CD, so I can't say what it might do.  CentOS4 didn't do anything obvious, either, so unless I really misunderstand your position at redhat, that would be your bad. ;-)

If you mean start a partition on a pseudo-CHS boundary, fdisk seems to use what it thinks are cylinders for that.

Please clarify what alignment provides a performance benefit.

-- 
bill davidsen <[EMAIL PROTECTED]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: Raid-10 mount at startup always has problem
On Sun, 2007-10-28 at 14:37 +0100, Luca Berra wrote:
> On Sat, Oct 27, 2007 at 04:47:30PM -0400, Doug Ledford wrote:
> > Most of the time it does.  But those times where it can fail, the failure is due to not taking the precautions necessary to prevent it: aka labeling disk usage via some sort of partition table/disklabel/etc.
> I strongly disagree.
> the failure is badly designed software.

Then you need to blame Ingo, who made putting the superblock at the end of the device the standard.  If the superblock were always at the beginning, then this whole argument would be moot.  Things would be reliable the way you want.

> > Using whole disk devices isn't a means of organizing space.  It's a way to get a rather miniscule amount of space back by *not* organizing the space.
> if i am using, say lvm to organize disk space, a partition table is unnecessary to the organization, and it is natural not using them.

If you are using straight lvm then you don't have this problem anyway.  Lvm doesn't allow the underlying physical device to *look* like a valid, partitioned, single device.  Md does when the superblock is at the end.

> > This whole argument seems to boil down to you wanting to perfectly optimize your system for your use case, which includes controlling the environment enough that you know it's safe to not partition your disks, whereas I argue that although this works in controlled environments, it is known to have failure modes in other environments, and I would be totally remiss if I recommended to my customers that they should take the risk that you can ignore because of your controlled environment, since I know a lot of my customers *don't* have a controlled environment such as you do.
> The whole argument to me boils down to the fact that not having a partition table on a device is possible, and software that does not consider this eventuality is flawed,

It's simply not possible to differentiate with 100% certainty between an md whole-disk partitioned device with a superblock at the end and a regular device.  Period.  You can try to be clever, but you can also get tripped up.  The flaw is not with the software, it's with a design that allowed this to happen.

> and recommending to work around flawed software is just digging your head in the sand.

If a design is broken but in place, I have no choice but to work around it.  Anything else is just stupid.

> But i believe i did not convince you one ounce more than you convinced me, so i'll quit this thread, which is getting too far.
>
> Regards,
> L.

-- 
Doug Ledford <[EMAIL PROTECTED]> GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: Raid-10 mount at startup always has problem
On Sat, Oct 27, 2007 at 04:47:30PM -0400, Doug Ledford wrote:
> On Sat, 2007-10-27 at 09:50 +0200, Luca Berra wrote:
> > On Fri, Oct 26, 2007 at 03:26:33PM -0400, Doug Ledford wrote:
> > > On Fri, 2007-10-26 at 11:15 +0200, Luca Berra wrote:
> > > > On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote:
> > > > > The partition table is the single, (mostly) universally recognized arbiter of what possible data might be on the disk.  Having a partition table may not make mdadm recognize the md superblock any better, but it keeps all that other stuff from even trying to access data that it doesn't have a need to access and prevents random luck from turning your day bad.
> > > > on a pc maybe, but that is 20 years old design.
> > > So?  Unix is 35+ year old design, I suppose you want to switch to Vista then?
> > unix is a 35+ year old design that evolved in time, some ideas were kept, some ditched.
> BSD disk labels are still in use, SunOS disk labels are still in use,

i am not a solaris expert, do they still use disk labels under vxvm?
oh, by the way, disklabels do not support the partition type attribute.

> partition tables are somewhat on the way out, but only because they are being replaced by the new EFI disk partitioning method.  The only place where partitionless devices are common is in dedicated raid boxes where the raid controller is the only thing that will *ever* see that disk.

well i am more used to other os (HP, AIX) where lvm is the common means of accessing disk devices

> > by default fdisk misaligns partition tables
> > and aligning them is more complex than just doing without.
> So.  You really need to take the time to understand the alignment of the device because then and only then can you pass options to mke2fs to

yes, and i am not the only person in the world doing that.

> > > Linux works properly with a partition table, so this is a specious statement.
> > It should also work properly without one.
> Most of the time it does.  But those times where it can fail, the failure is due to not taking the precautions necessary to prevent it: aka labeling disk usage via some sort of partition table/disklabel/etc.

I strongly disagree.
the failure is badly designed software.

> Did you stick your mmc card in there during the install of the OS?

My laptop has a built-in mmc slot, so i sometimes leave a card plugged in.  But the mmc thing was just an example, it is not that critical.

> > i don't count myself as a moron, what i am trying to say is that partition tables are one way of organizing disk space, not the only one.
> Using whole disk devices isn't a means of organizing space.  It's a way to get a rather miniscule amount of space back by *not* organizing the space.

if i am using, say lvm to organize disk space, a partition table is unnecessary to the organization, and it is natural not using them.

> This whole argument seems to boil down to you wanting to perfectly optimize your system for your use case, which includes controlling the environment enough that you know it's safe to not partition your disks, whereas I argue that although this works in controlled environments, it is known to have failure modes in other environments, and I would be totally remiss if I recommended to my customers that they should take the risk that you can ignore because of your controlled environment, since I know a lot of my customers *don't* have a controlled environment such as you do.

The whole argument to me boils down to the fact that not having a partition table on a device is possible, and software that does not consider this eventuality is flawed, and recommending to work around flawed software is just digging your head in the sand.

But i believe i did not convince you one ounce more than you convinced me, so i'll quit this thread, which is getting too far.

Regards,
L.
-- 
Luca Berra -- [EMAIL PROTECTED]
Communication Media & Services S.r.l.
Re: Raid-10 mount at startup always has problem
On Sat, 2007-10-27 at 09:50 +0200, Luca Berra wrote:
> On Fri, Oct 26, 2007 at 03:26:33PM -0400, Doug Ledford wrote:
> > On Fri, 2007-10-26 at 11:15 +0200, Luca Berra wrote:
> > > On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote:
> > > > The partition table is the single, (mostly) universally recognized arbiter of what possible data might be on the disk.  Having a partition table may not make mdadm recognize the md superblock any better, but it keeps all that other stuff from even trying to access data that it doesn't have a need to access and prevents random luck from turning your day bad.
> > > on a pc maybe, but that is 20 years old design.
> > So?  Unix is 35+ year old design, I suppose you want to switch to Vista then?
> unix is a 35+ year old design that evolved in time, some ideas were kept, some ditched.

BSD disk labels are still in use, SunOS disk labels are still in use, partition tables are somewhat on the way out, but only because they are being replaced by the new EFI disk partitioning method.  The only place where partitionless devices are common is in dedicated raid boxes where the raid controller is the only thing that will *ever* see that disk.  Sometimes they do it on big SAN/NAS stuff because they don't want to align the partition table to the underlying device's stripe layout, but even then they do so in a tightly controlled environment where they know exactly which machines will be allowed to even try and access the device.

> > > partition table design is limited because it is still based on C/H/S, which do not exist anymore.
> > > Put a partition table on a big storage, say a DMX, and enjoy a 20% performance decrease.
> > Because you didn't stripe align the partition, your bad.
> :)
> by default fdisk misaligns partition tables
> and aligning them is more complex than just doing without.

So.  You really need to take the time to understand the alignment of the device, because then and only then can you pass options to mke2fs to align the fs metadata with the stripes as well, thereby buying you even more performance than just leaving off the partition table (assuming that's what you use; I don't know if other mkfs programs have the same options for aligning metadata with stripes).  And if you take the time to understand the underlying stripe layout for the mkfs stuff, then you can use the same information to align the partition table.

> > > > Oh, and let's not go into what can happen if you're talking about a dual boot machine and what Windows might do to the disk if it doesn't think the disk space is already spoken for by a linux partition.
> > > Why the hell should the existance of windows limit the possibility of linux working properly.
> > Linux works properly with a partition table, so this is a specious statement.
> It should also work properly without one.

Most of the time it does.  But those times where it can fail, the failure is due to not taking the precautions necessary to prevent it: aka labeling disk usage via some sort of partition table/disklabel/etc.

> > > If i have a pc that dualboots windows i will take care of using the common denominator of a partition table, if it is my big server i will probably not. since it won't boot anything else than Linux.
> > Doesn't really gain you anything, but your choice.  Besides, the question wasn't "why shouldn't Luca Berra use whole disk devices", it was why I don't recommend using whole disk devices, and my recommendation wasn't based in the least bit upon a single person's use scenario.
> If i am the only person in the world that believes partition tables should not be required then i'll shut up.
>
> > > On the opposite, i once inserted an mmc memory card, which had been initialized on my mobile phone, into the mmc slot of my laptop, and was faced with a load of errors about mmcblk0 having an invalid partition table.
> > So?  The messages are just informative, feel free to ignore them.
> but did not anaconda propose to wipe unpartitioned disks?

Did you stick your mmc card in there during the install of the OS?  That's the only time anaconda ever runs, and therefore the only time it ever checks your devices.  It makes sense that during the initial install, when the OS is only configured to see locally connected devices, or possibly iSCSI devices that you have specifically told it to probe, that it would then ask you the question about those devices.  Other network attached or shared devices are generally added after the initial install.

> > The phone dictates the format, only a moron would say otherwise.  But, then again, the phone doesn't care about interoperability and many other issues on memory cards that it thinks it owns, so only a moron would argue that because a phone doesn't use a partition table that nothing else in the computer realm needs to either.
> i don't count myself as a moron, w
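The mke2fs options Doug mentions can be sketched like this, assuming 4KiB filesystem blocks on an array with 64KiB chunks (both figures are illustrative assumptions):

    # stride = chunk size / block size = 64KiB / 4KiB = 16 blocks,
    # so the ext3 metadata is spread evenly across the stripes.
    mke2fs -j -b 4096 -E stride=16 /dev/md0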
Re: Raid-10 mount at startup always has problem
On Sat, Oct 27, 2007 at 09:50:55AM +0200, Luca Berra wrote:
> > Because you didn't stripe align the partition, your bad.
> :)
> by default fdisk misaligns partition tables
> and aligning them is more complex than just doing without.

Why use fdisk then?  Use parted instead.  It's not the kernel's fault if you use tools not suited for a given task...

> > Linux works properly with a partition table, so this is a specious statement.
> It should also work properly without one.

It does:

sd 0:0:2:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
sd 0:0:2:0: [sdc] 7812333568 512-byte hardware sectors (315 MB)
sd 0:0:2:0: [sdc] Write Protect is off
sd 0:0:2:0: [sdc] Mode Sense: 23 00 00 00
sd 0:0:2:0: [sdc] Write cache: enabled, read cache: disabled, doesn't support DPO or FUA
 sdc: unknown partition table

Works perfectly without any partition tables...  You seem to be annoyed because the kernel tells you that there is no partition table it recognizes - but if that bothers you so, simply stop reading the kernel logs.  My kernel also tells me that it failed to find an AGP bridge - by your logic that should mean that everyone still using AGP-capable motherboards should toss their systems to the junkyard?!?

Gabor

-- 
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
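A sketch of the parted route with a stripe-aligned start (the device name, the 64-sector stripe and the end sector are all assumptions, not values from this thread):

    # Script mode: label the disk, then create one partition starting at
    # sector 128, a multiple of the assumed 64-sector stripe.  The ext2
    # argument only sets the partition type hint, it creates no filesystem.
    parted -s /dev/sdX mklabel msdos
    parted -s /dev/sdX unit s mkpart primary ext2 128 312581807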
Re: Raid-10 mount at startup always has problem
On Fri, Oct 26, 2007 at 06:53:40PM +0200, Gabor Gombas wrote:
> On Fri, Oct 26, 2007 at 11:15:13AM +0200, Luca Berra wrote:
> > on a pc maybe, but that is 20 years old design.
> > partition table design is limited because it is still based on C/H/S, which do not exist anymore.
> The MS-DOS format is not the only possible partition table layout.  Other formats such as GPT do not have such limitations.
>
> > Put a partition table on a big storage, say a DMX, and enjoy a 20% performance decrease.
> I assume your "big storage" uses some kind of RAID.  Are your partitions stripe-aligned?  (Btw. that has nothing to do with partitions, LVM can also suffer if PEs are not aligned).

mine are, unfortunately the default is to start them at 32256 bytes into the device.

> > > Oh, and let's not go into what can happen if you're talking about a dual boot machine and what Windows might do to the disk if it doesn't think the disk space is already spoken for by a linux partition.
> > Why the hell should the existance of windows limit the possibility of linux working properly.

what i am saying is that a dual boot machine is not the only scenario we have.

> > On the opposite, i once inserted an mmc memory card, which had been initialized on my mobile phone, into the mmc slot of my laptop, and was faced with a load of errors about mmcblk0 having an invalid partition table.
> > Obviously it had none, it was a plain fat filesystem.
> > Is the solution partitioning it?  I don't think the phone would agree.
> Well, it said it could not find a valid partition table.  That was the truth.  Why is it a problem if the kernel states a fact?

it is random.  reformatting it made the kernel message go away.
i wonder if by chance something would decide it is a valid partition table

-- 
Luca Berra -- [EMAIL PROTECTED]
Communication Media & Services S.r.l.
Re: Raid-10 mount at startup always has problem
On Fri, Oct 26, 2007 at 03:26:33PM -0400, Doug Ledford wrote:
> On Fri, 2007-10-26 at 11:15 +0200, Luca Berra wrote:
> > On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote:
> > > The partition table is the single, (mostly) universally recognized arbiter of what possible data might be on the disk.  Having a partition table may not make mdadm recognize the md superblock any better, but it keeps all that other stuff from even trying to access data that it doesn't have a need to access and prevents random luck from turning your day bad.
> > on a pc maybe, but that is 20 years old design.
> So?  Unix is 35+ year old design, I suppose you want to switch to Vista then?

unix is a 35+ year old design that evolved in time, some ideas were kept, some ditched.

> > partition table design is limited because it is still based on C/H/S, which do not exist anymore.
> > Put a partition table on a big storage, say a DMX, and enjoy a 20% performance decrease.
> Because you didn't stripe align the partition, your bad.

:)
by default fdisk misaligns partition tables
and aligning them is more complex than just doing without.

> > > Oh, and let's not go into what can happen if you're talking about a dual boot machine and what Windows might do to the disk if it doesn't think the disk space is already spoken for by a linux partition.
> > Why the hell should the existance of windows limit the possibility of linux working properly.
> Linux works properly with a partition table, so this is a specious statement.

It should also work properly without one.

> > If i have a pc that dualboots windows i will take care of using the common denominator of a partition table, if it is my big server i will probably not. since it won't boot anything else than Linux.
> Doesn't really gain you anything, but your choice.  Besides, the question wasn't "why shouldn't Luca Berra use whole disk devices", it was why I don't recommend using whole disk devices, and my recommendation wasn't based in the least bit upon a single person's use scenario.

If i am the only person in the world that believes partition tables should not be required then i'll shut up.

> > On the opposite, i once inserted an mmc memory card, which had been initialized on my mobile phone, into the mmc slot of my laptop, and was faced with a load of errors about mmcblk0 having an invalid partition table.
> So?  The messages are just informative, feel free to ignore them.

but did not anaconda propose to wipe unpartitioned disks?

> The phone dictates the format, only a moron would say otherwise.  But, then again, the phone doesn't care about interoperability and many other issues on memory cards that it thinks it owns, so only a moron would argue that because a phone doesn't use a partition table that nothing else in the computer realm needs to either.

i don't count myself as a moron, what i am trying to say is that partition tables are one way of organizing disk space, not the only one.

> > > Anyway, I happen to *like* the idea of using full disk devices, but the reality is that the md subsystem doesn't have exclusive ownership of the disks at all times, and without that it really needs to stake a claim on the space instead of leaving things to chance IMO.
> > Start removing the partition detection code from the blasted kernel and move it to userspace, which is already in place, but it is not the default.
> Which just moves where the work is done, not what work needs to be done.

and it also permits deciding whether it has to be done or not.

> It's a change for no benefit and a waste of time.

the waste of time was having to put code in mdadm to undo partition detection on component devices, where partition detection should not have taken place.

-- 
Luca Berra -- [EMAIL PROTECTED]
Communication Media & Services S.r.l.
Re: Raid-10 mount at startup always has problem
On Fri, 2007-10-26 at 11:15 +0200, Luca Berra wrote:
> On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote:
>> The partition table is the single, (mostly) universally recognized
>> arbiter of what possible data might be on the disk. Having a partition
>> table may not make mdadm recognize the md superblock any better, but it
>> keeps all that other stuff from even trying to access data that it
>> doesn't have a need to access and prevents random luck from turning your
>> day bad.
> on a pc maybe, but that is a 20 year old design.

So? Unix is a 35+ year old design, I suppose you want to switch to Vista
then?

> partition table design is limited because it is still based on C/H/S,
> which do not exist anymore.
> Put a partition table on a big storage, say a DMX, and enjoy a 20%
> performance decrease.

Because you didn't stripe align the partition, your bad.

>> Oh, and let's not go into what can happen if you're talking about a dual
>> boot machine and what Windows might do to the disk if it doesn't think
>> the disk space is already spoken for by a linux partition.
> Why the hell should the existence of windows limit the possibility of
> linux working properly.

Linux works properly with a partition table, so this is a specious
statement.

> If i have a pc that dualboots windows i will take care of using the
> common denominator of a partition table; if it is my big server i will
> probably not, since it won't boot anything else than Linux.

Doesn't really gain you anything, but your choice. Besides, the question
wasn't "why shouldn't Luca Berra use whole disk devices", it was why I
don't recommend using whole disk devices, and my recommendation wasn't
based in the least bit upon a single person's use scenario.

>> And, in particular with mdadm, I once created a full disk md raid array
>> on a couple disks, then couldn't get things arranged like I wanted, so I
>> just partitioned the disks and then created new arrays in the partitions
>> (without first manually zeroing the superblock for the whole disk
>> array). Since I used a version 1.0 superblock on the whole disk array,
>> and then used version 1.1 superblocks in the partitions, the net result
>> was that when I ran mdadm -Eb, mdadm would find both the 1.1 and 1.0
>> superblocks in the last partition on the disk. Confused both myself and
>> mdadm for a while.
> yes, this is fun
> On the opposite, i once inserted an mmc memory card, which had been
> initialized on my mobile phone, into the mmc slot of my laptop, and was
> faced with a load of errors about mmcblk0 having an invalid partition
> table.

So? The messages are just informative, feel free to ignore them.

> Obviously it had none, it was a plain fat filesystem.
> Is the solution partitioning it? I don't think the phone would
> agree.

The phone dictates the format, only a moron would say otherwise. But, then
again, the phone doesn't care about interoperability and many other issues
on memory cards that it thinks it owns, so only a moron would argue that
because a phone doesn't use a partition table that nothing else in the
computer realm needs to either.

>> Anyway, I happen to *like* the idea of using full disk devices, but the
>> reality is that the md subsystem doesn't have exclusive ownership of the
>> disks at all times, and without that it really needs to stake a claim on
>> the space instead of leaving things to chance IMO.
> Start removing the partition detection code from the blasted kernel and
> move it to userspace, which is already in place, but it is not the
> default.

Which just moves where the work is done, not what work needs to be done.
It's a change for no benefit and a waste of time.

-- 
Doug Ledford <[EMAIL PROTECTED]>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
Re: Raid-10 mount at startup always has problem
On Fri, Oct 26, 2007 at 11:15:13AM +0200, Luca Berra wrote:
> on a pc maybe, but that is a 20 year old design.
> partition table design is limited because it is still based on C/H/S,
> which do not exist anymore.

The MS-DOS format is not the only possible partition table layout. Other
formats such as GPT do not have such limitations.

> Put a partition table on a big storage, say a DMX, and enjoy a 20%
> performance decrease.

I assume your "big storage" uses some kind of RAID. Are your partitions
stripe-aligned? (Btw. that has nothing to do with partitions; LVM can also
suffer if PEs are not aligned.)

>> Oh, and let's not go into what can happen if you're talking about a dual
>> boot machine and what Windows might do to the disk if it doesn't think
>> the disk space is already spoken for by a linux partition.
> Why the hell should the existence of windows limit the possibility of
> linux working properly.

Well, if you want to convert a Windows partition to Linux by just changing
the partition type, running mke2fs over it, and filling it with data,
Windows will happily ignore the partition table change and will overwrite
your data without any notice on the next boot (happened to one colleague,
not fun to debug). So much for automatic device type detection...

> On the opposite, i once inserted an mmc memory card, which had been
> initialized on my mobile phone, into the mmc slot of my laptop, and was
> faced with a load of errors about mmcblk0 having an invalid partition
> table. Obviously it had none, it was a plain fat filesystem.
> Is the solution partitioning it? I don't think the phone would
> agree.

Well, it said it could not find a valid partition table. That was the
truth. Why is it a problem if the kernel states a fact?

Gabor

-- 
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
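[Editor's note: for readers who want to follow Gabor's GPT-plus-alignment
suggestion, one way to do it might look roughly like the sketch below. This
is not from the thread; /dev/sdX is a placeholder, the 1MiB unit syntax
assumes a reasonably recent parted, and the 1 MiB start offset is just a
common choice that happens to be a multiple of typical chunk sizes - pick
an offset matching your array's stripe width.]

    # create a GPT label, then one partition starting at 1 MiB so its
    # start is aligned for common RAID chunk sizes
    parted -s /dev/sdX mklabel gpt
    parted -s /dev/sdX mkpart raid 1MiB 100%
    # mark the partition as a RAID member
    parted -s /dev/sdX set 1 raid on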
Re: Raid-10 mount at startup always has problem
On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote:
> partition table (something that the Fedora/RHEL installers do to all
> disks without partition tables...well, the installer tells you there's
> no partition table and asks if you want to initialize it, but if someone
> is in a hurry and hits yes when they meant no, bye bye data).

Cool feature

> The partition table is the single, (mostly) universally recognized
> arbiter of what possible data might be on the disk. Having a partition
> table may not make mdadm recognize the md superblock any better, but it
> keeps all that other stuff from even trying to access data that it
> doesn't have a need to access and prevents random luck from turning your
> day bad.

on a pc maybe, but that is a 20 year old design.
partition table design is limited because it is still based on C/H/S,
which do not exist anymore.
Put a partition table on a big storage, say a DMX, and enjoy a 20%
performance decrease.

> Oh, and let's not go into what can happen if you're talking about a dual
> boot machine and what Windows might do to the disk if it doesn't think
> the disk space is already spoken for by a linux partition.

Why the hell should the existence of windows limit the possibility of
linux working properly.
If i have a pc that dualboots windows i will take care of using the
common denominator of a partition table; if it is my big server i will
probably not, since it won't boot anything else than Linux.

> And, in particular with mdadm, I once created a full disk md raid array
> on a couple disks, then couldn't get things arranged like I wanted, so I
> just partitioned the disks and then created new arrays in the partitions
> (without first manually zeroing the superblock for the whole disk
> array). Since I used a version 1.0 superblock on the whole disk array,
> and then used version 1.1 superblocks in the partitions, the net result
> was that when I ran mdadm -Eb, mdadm would find both the 1.1 and 1.0
> superblocks in the last partition on the disk. Confused both myself and
> mdadm for a while.

yes, this is fun

On the opposite, i once inserted an mmc memory card, which had been
initialized on my mobile phone, into the mmc slot of my laptop, and was
faced with a load of errors about mmcblk0 having an invalid partition
table. Obviously it had none, it was a plain fat filesystem.
Is the solution partitioning it? I don't think the phone would agree.

> Anyway, I happen to *like* the idea of using full disk devices, but the
> reality is that the md subsystem doesn't have exclusive ownership of the
> disks at all times, and without that it really needs to stake a claim on
> the space instead of leaving things to chance IMO.

Start removing the partition detection code from the blasted kernel and
move it to userspace, which is already in place, but it is not the
default.

-- 
Luca Berra -- [EMAIL PROTECTED]
Communication Media & Services S.r.l.
 /"\
 \ /   ASCII RIBBON CAMPAIGN
  X      AGAINST HTML MAIL
 / \
Re: Raid-10 mount at startup always has problem
On Thursday October 25, [EMAIL PROTECTED] wrote:
> Neil Brown wrote:
>> It might be worth finding out where mdadm is being run in the init
>> scripts and add a "-v" flag, and redirecting stdout/stderr to some log
>> file.
>> e.g.
>>    mdadm -As -v > /var/log/mdadm-$$ 2>&1
>>
>> And see if that leaves something useful in the log file.
>
> I haven't rebooted yet, but here's my /etc/udev/rules.d/70-mdadm.rules
> file (BTW - running on Ubuntu 7.10 Gutsy):
>
> SUBSYSTEM=="block", ACTION=="add|change",
> ENV{ID_FS_TYPE}=="linux_raid*", RUN+="watershed -i udev-mdadm
> /sbin/mdadm -As -v > /var/log/mdadm-$$ 2>&1"

Yes, that would do exactly what you are experiencing.

Every time a component of a raid array is discovered, it will try to
assemble all known arrays. So one drive appears; it tries to assemble the
array, but there aren't enough devices, so it gives up. Then two drives.
Chances are there still aren't enough, so it gives up again. Then when
there are three drives it will successfully assemble the array -
degraded. Then when there are 4 drives, it will be too late.

I cannot see why that would lead to the "cannot update array info" error,
but it certainly explains the rest.

That is really bad stuff to have in udev. The "--incremental" mode was
written precisely for use in udev. I wonder why they didn't use it.

Maybe you should log a bug report with Ubuntu and suggest they discuss
their udev scripts with the developer of mdadm (that would be me I guess).

NeilBrown
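[Editor's note: for comparison, a udev rule built around the incremental
mode Neil describes might look roughly like this. It is a sketch only -
the match keys are copied from the Ubuntu rule quoted above, and the exact
rule a distribution ships may differ.]

    # hand each newly discovered component to mdadm one at a time;
    # mdadm -I (--incremental) slots it into the right array and only
    # activates the array once all expected members have appeared
    SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid*", \
        RUN+="/sbin/mdadm --incremental $env{DEVNAME}"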
Re: Raid-10 mount at startup always has problem
On Thursday October 25, [EMAIL PROTECTED] wrote:
> Neil Brown wrote:
>>
>> BTW, I don't think your problem has anything to do with the fact that
>> you are using whole partitions.
>>
> You don't think the "unknown partition table" on sdd is related? Because
> I read that as a sure indication that the system isn't considering the
> drive as one without a partition table, and therefore isn't looking for
> the superblock on the whole device. And as Doug pointed out, once you
> decide that there is a partition table lots of things might try to use it.

"unknown partition table" is what I would expect when using a whole
drive. It just means "the first block doesn't look like a partition
table", and if you have some early block of an ext3 (or other) filesystem
in the first block (as you would in this case), you wouldn't expect it to
look like a partition table.

I don't understand what you are trying to say with your second sentence.

NeilBrown
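[Editor's note: Neil's point can be checked directly; a sketch follows
(the device name is just the one from this thread). A DOS-style partition
table carries the signature bytes 0x55 0xaa at offsets 510-511 of sector
0, so anything else there yields "unknown partition table".]

    # dump the tail of sector 0; 55 aa in the last two bytes would mean
    # a DOS partition table, anything else is "unknown" to the kernel
    dd if=/dev/sdd bs=512 count=1 2>/dev/null | od -A d -t x1 | tail -n 3
    # ask blkid what it detects on the whole device instead
    blkid /dev/sdd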
Re: Raid-10 mount at startup always has problem
Bill Davidsen wrote:
> You don't think the "unknown partition table" on sdd is related? Because
> I read that as a sure indication that the system isn't considering the
> drive as one without a partition table, and therefore isn't looking for
> the superblock on the whole device. And as Doug pointed out, once you
> decide that there is a partition table lots of things might try to use it.

Now, would the drive "letters" (sd[a-d]) change from reboot to reboot?
Because it's not consistent - so far I've seen each of the four drives at
one time or another fail during the boot.

I've added the verbose logging to the udev mdadm rule, and I've also
manually specified the drives in mdadm.conf instead of leaving it on
auto. Curious what the next boot will bring.

-- 
Daniel
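[Editor's note: on the drive-letter question - sd[a-d] are assigned in
discovery order and can indeed move around between boots. One way to pin
the configuration to the physical drives instead is sketched below; it
assumes udev populates /dev/disk/by-id on this system, and the glob is a
hypothetical one matching the four Seagate drives from the thread.]

    # see the stable, serial-number-based names udev created per drive
    ls -l /dev/disk/by-id/
    # then scan only those names in mdadm.conf instead of raw letters
    DEVICE /dev/disk/by-id/ata-ST3160811AS_*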
Re: Raid-10 mount at startup always has problem
Neil Brown wrote:
> On Wednesday October 24, [EMAIL PROTECTED] wrote:
>> Current mdadm.conf:
>> DEVICE partitions
>> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
>> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
>>
>> still have the problem where on boot one drive is not part of the
>> array. Is there a log file I can check to find out WHY a drive is not
>> being added? It's been a while since the reboot, but I did find some
>> entries in dmesg - I'm appending both the md lines and the physical
>> disk related lines. The bottom shows one disk not being added (this
>> time it was sda) - and the disk that gets skipped on each boot seems
>> to be random - there's no consistent failure:
>
> Odd but interesting. Does it sometimes fail to start the array
> altogether?
>
>> md: md0 stopped.
>> md: md0 stopped.
>> md: bind
>> md: bind
>> md: bind
>> md: md0: raid array is not clean -- starting background reconstruction
>> raid10: raid set md0 active with 3 out of 4 devices
>> md: couldn't update array info. -22
>      ^^^
>
> This is the most surprising line, and hence the one most likely to
> convey helpful information.
>
> This message is generated when a process calls "SET_ARRAY_INFO" on an
> array that is already running, and the changes implied by the new
> "array_info" are not supportable.
>
> The only way I can see this happening is if two copies of "mdadm" are
> running at exactly the same time and both are trying to assemble the
> same array. The first calls SET_ARRAY_INFO and assembles the (partial)
> array. The second calls SET_ARRAY_INFO and gets this error. Not all
> devices are included because when one mdadm went to look at a device,
> the other had it locked, and so the first just ignored it.
>
> I just tried that, and sometimes it worked, but sometimes it assembled
> with 3 out of 4 devices. I didn't get the "couldn't update array info"
> message, but that doesn't prove I'm wrong.
>
> I cannot imagine how that might be happening (two at once) unless maybe
> 'udev' had been configured to do something as soon as devices were
> discovered... but that seems unlikely.
>
> It might be worth finding out where mdadm is being run in the init
> scripts and add a "-v" flag, and redirecting stdout/stderr to some log
> file.
> e.g.
>    mdadm -As -v > /var/log/mdadm-$$ 2>&1
>
> And see if that leaves something useful in the log file.
>
> BTW, I don't think your problem has anything to do with the fact that
> you are using whole partitions.

You don't think the "unknown partition table" on sdd is related? Because
I read that as a sure indication that the system isn't considering the
drive as one without a partition table, and therefore isn't looking for
the superblock on the whole device. And as Doug pointed out, once you
decide that there is a partition table lots of things might try to use it.

> While it is debatable whether that is a good idea or not (I like the
> idea, but Doug doesn't and I respect his opinion) I doubt it would
> contribute to the current problem. Your description makes me nearly
> certain that there is some sort of race going on (that is the easiest
> way to explain randomly differing behaviours). The race is probably
> between different code 'locking' (opening with O_EXCL) the various
> devices. Given the above error message, two different 'mdadm's seems
> most likely, but an mdadm and a mount-by-label scan could probably do
> it too.

-- 
bill davidsen <[EMAIL PROTECTED]>
  CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: Raid-10 mount at startup always has problem
Neil Brown wrote:
> It might be worth finding out where mdadm is being run in the init
> scripts and add a "-v" flag, and redirecting stdout/stderr to some log
> file.
> e.g.
>    mdadm -As -v > /var/log/mdadm-$$ 2>&1
>
> And see if that leaves something useful in the log file.

I haven't rebooted yet, but here's my /etc/udev/rules.d/70-mdadm.rules
file (BTW - running on Ubuntu 7.10 Gutsy):

SUBSYSTEM=="block", ACTION=="add|change",
ENV{ID_FS_TYPE}=="linux_raid*", RUN+="watershed -i udev-mdadm
/sbin/mdadm -As -v > /var/log/mdadm-$$ 2>&1"
# This next line (only) is put into the initramfs,
# where we run a strange script to activate only some of the arrays
# as configured, instead of mdadm -As:
#initramfs# SUBSYSTEM=="block", ACTION=="add|change",
ENV{ID_FS_TYPE}=="linux_raid*", RUN+="watershed -i udev-mdadm
/scripts/local-top/mdadm from-udev"

Could that initramfs line be causing the problem?

-- 
Daniel
Re: Raid-10 mount at startup always has problem
On Thu, 2007-10-25 at 16:12 +1000, Neil Brown wrote:
>> md: md0 stopped.
>> md: md0 stopped.
>> md: bind
>> md: bind
>> md: bind
>> md: md0: raid array is not clean -- starting background reconstruction
>> raid10: raid set md0 active with 3 out of 4 devices
>> md: couldn't update array info. -22
>      ^^^
>
> This is the most surprising line, and hence the one most likely to
> convey helpful information.
>
> This message is generated when a process calls "SET_ARRAY_INFO" on an
> array that is already running, and the changes implied by the new
> "array_info" are not supportable.
>
> The only way I can see this happening is if two copies of "mdadm" are
> running at exactly the same time and both are trying to assemble the
> same array. The first calls SET_ARRAY_INFO and assembles the (partial)
> array. The second calls SET_ARRAY_INFO and gets this error. Not all
> devices are included because when one mdadm went to look at a device,
> the other had it locked, and so the first just ignored it.

If mdadm copy A gets three of the devices, I wouldn't think mdadm copy B
would have been able to get enough devices to decide to even try and
assemble the array (assuming that once copy A locked the devices during
open, it then held the devices until time to assemble the array).

-- 
Doug Ledford <[EMAIL PROTECTED]>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
Re: Raid-10 mount at startup always has problem
On Wed, 2007-10-24 at 22:43 -0700, Daniel L. Miller wrote:
> Bill Davidsen wrote:
>> Daniel L. Miller wrote:
>>> Current mdadm.conf:
>>> DEVICE partitions
>>> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
>>> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
>>>
>>> still have the problem where on boot one drive is not part of the
>>> array. Is there a log file I can check to find out WHY a drive is
>>> not being added? It's been a while since the reboot, but I did find
>>> some entries in dmesg - I'm appending both the md lines and the
>>> physical disk related lines. The bottom shows one disk not being
>>> added (this time it was sda) - and the disk that gets skipped on
>>> each boot seems to be random - there's no consistent failure:
>>
>> I suspect the base problem is that you are using whole disks instead
>> of partitions, and the problem with the partition table below is
>> probably an indication that you have something on that drive which
>> looks like a partition table but isn't. That prevents the drive from
>> being recognized as a whole drive. You're lucky, if the data looked
>> enough like a partition table to be valid the o/s probably would have
>> tried to do something with it.
>> [...]
>> This may be the rare case where you really do need to specify the
>> actual devices to get reliable operation.
> OK - I'm officially confused now (I was just unofficially before). WHY
> is it a problem using whole drives as RAID components? I would have
> thought that building a RAID storage unit with identically sized drives
> - and using each drive's full capacity - is exactly the way you're
> supposed to!

As much as anything else this can be summed up as: you are thinking of
how you are using the drives and not how unexpected software on your
system might try and use your drives. Without a partition table, none of
the software on your system can know what to do with the drives except
mdadm when it finds an md superblock. That doesn't stop other software
from *trying* to find out how to use your drives though. That includes
the kernel trying to look for a valid partition table, mount possibly
scanning the drive for a file system label, lvm scanning for an lvm
superblock, mtools looking for a dos filesystem, etc. Under normal
conditions, the random data on your drive will never look valid to these
other pieces of software. But, once in a great while, it will look valid.
And that's when all hell breaks loose. Or worse, you run a partition
program such as fdisk on the device and it initializes the partition
table (something that the Fedora/RHEL installers do to all disks without
partition tables...well, the installer tells you there's no partition
table and asks if you want to initialize it, but if someone is in a hurry
and hits yes when they meant no, bye bye data).

The partition table is the single, (mostly) universally recognized
arbiter of what possible data might be on the disk. Having a partition
table may not make mdadm recognize the md superblock any better, but it
keeps all that other stuff from even trying to access data that it
doesn't have a need to access and prevents random luck from turning your
day bad.

Oh, and let's not go into what can happen if you're talking about a dual
boot machine and what Windows might do to the disk if it doesn't think
the disk space is already spoken for by a linux partition.

And, in particular with mdadm, I once created a full disk md raid array
on a couple disks, then couldn't get things arranged like I wanted, so I
just partitioned the disks and then created new arrays in the partitions
(without first manually zeroing the superblock for the whole disk
array). Since I used a version 1.0 superblock on the whole disk array,
and then used version 1.1 superblocks in the partitions, the net result
was that when I ran mdadm -Eb, mdadm would find both the 1.1 and 1.0
superblocks in the last partition on the disk. Confused both myself and
mdadm for a while.

Anyway, I happen to *like* the idea of using full disk devices, but the
reality is that the md subsystem doesn't have exclusive ownership of the
disks at all times, and without that it really needs to stake a claim on
the space instead of leaving things to chance IMO.

> I should mention that the boot/system drive is IDE, and NOT part of the
> RAID. So I'm not worried about losing the system - but I AM concerned
> about the data. I'm using four drives in a RAID-10 configuration - I
> thought this would provide a good blend of safety and performance for a
> small fileserver.
>
> Because it's RAID-10 - I would ASSuME that I can drop one drive (after
> all, I keep booting one drive short), partition if necessary, and add
> it back in. But how would splitting these disks into partitions improve
> either stability or performance?

-- 
Doug Ledford <[EMAIL PROTECTED]>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
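[Editor's note: the stale-superblock trap Doug describes has a simple
guard, sketched below with placeholder device names. Wipe the old
whole-disk superblock before creating arrays inside new partitions, then
verify with the examine scan Doug mentions.]

    # wipe the old whole-disk md superblock before repartitioning
    mdadm --zero-superblock /dev/sdX
    # after partitioning and creating the new arrays, confirm each
    # device carries exactly one superblock (-Eb = --examine --brief)
    mdadm -Eb /dev/sdX /dev/sdX1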
Re: Raid-10 mount at startup always has problem
On Wednesday October 24, [EMAIL PROTECTED] wrote:
> Current mdadm.conf:
> DEVICE partitions
> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
>
> still have the problem where on boot one drive is not part of the
> array. Is there a log file I can check to find out WHY a drive is not
> being added? It's been a while since the reboot, but I did find some
> entries in dmesg - I'm appending both the md lines and the physical
> disk related lines. The bottom shows one disk not being added (this
> time it was sda) - and the disk that gets skipped on each boot seems to
> be random - there's no consistent failure:

Odd but interesting. Does it sometimes fail to start the array
altogether?

> md: md0 stopped.
> md: md0 stopped.
> md: bind
> md: bind
> md: bind
> md: md0: raid array is not clean -- starting background reconstruction
> raid10: raid set md0 active with 3 out of 4 devices
> md: couldn't update array info. -22
     ^^^

This is the most surprising line, and hence the one most likely to convey
helpful information.

This message is generated when a process calls "SET_ARRAY_INFO" on an
array that is already running, and the changes implied by the new
"array_info" are not supportable.

The only way I can see this happening is if two copies of "mdadm" are
running at exactly the same time and both are trying to assemble the same
array. The first calls SET_ARRAY_INFO and assembles the (partial) array.
The second calls SET_ARRAY_INFO and gets this error. Not all devices are
included because when one mdadm went to look at a device, the other had
it locked, and so the first just ignored it.

I just tried that, and sometimes it worked, but sometimes it assembled
with 3 out of 4 devices. I didn't get the "couldn't update array info"
message, but that doesn't prove I'm wrong.

I cannot imagine how that might be happening (two at once) unless maybe
'udev' had been configured to do something as soon as devices were
discovered... but that seems unlikely.

It might be worth finding out where mdadm is being run in the init
scripts and add a "-v" flag, and redirecting stdout/stderr to some log
file.
e.g.
   mdadm -As -v > /var/log/mdadm-$$ 2>&1

And see if that leaves something useful in the log file.

BTW, I don't think your problem has anything to do with the fact that you
are using whole partitions. While it is debatable whether that is a good
idea or not (I like the idea, but Doug doesn't and I respect his opinion)
I doubt it would contribute to the current problem. Your description
makes me nearly certain that there is some sort of race going on (that is
the easiest way to explain randomly differing behaviours). The race is
probably between different code 'locking' (opening with O_EXCL) the
various devices. Given the above error message, two different 'mdadm's
seems most likely, but an mdadm and a mount-by-label scan could probably
do it too.

NeilBrown
Re: Raid-10 mount at startup always has problem
Bill Davidsen wrote:
> Daniel L. Miller wrote:
>> Current mdadm.conf:
>> DEVICE partitions
>> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
>> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
>>
>> still have the problem where on boot one drive is not part of the
>> array. Is there a log file I can check to find out WHY a drive is not
>> being added? It's been a while since the reboot, but I did find some
>> entries in dmesg - I'm appending both the md lines and the physical
>> disk related lines. The bottom shows one disk not being added (this
>> time it was sda) - and the disk that gets skipped on each boot seems
>> to be random - there's no consistent failure:
>
> I suspect the base problem is that you are using whole disks instead of
> partitions, and the problem with the partition table below is probably
> an indication that you have something on that drive which looks like a
> partition table but isn't. That prevents the drive from being
> recognized as a whole drive. You're lucky, if the data looked enough
> like a partition table to be valid the o/s probably would have tried to
> do something with it.
> [...]
> This may be the rare case where you really do need to specify the
> actual devices to get reliable operation.

OK - I'm officially confused now (I was just unofficially before). WHY is
it a problem using whole drives as RAID components? I would have thought
that building a RAID storage unit with identically sized drives - and
using each drive's full capacity - is exactly the way you're supposed to!
I should mention that the boot/system drive is IDE, and NOT part of the
RAID. So I'm not worried about losing the system - but I AM concerned
about the data. I'm using four drives in a RAID-10 configuration - I
thought this would provide a good blend of safety and performance for a
small fileserver.

Because it's RAID-10 - I would ASSuME that I can drop one drive (after
all, I keep booting one drive short), partition if necessary, and add it
back in. But how would splitting these disks into partitions improve
either stability or performance?

-- 
Daniel
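[Editor's note: Daniel's one-drive-at-a-time idea would look roughly like
the sketch below - hypothetical commands, not vetted by anyone in the
thread. One real catch, which Bill raises in the next message: a partition
is slightly smaller than the whole disk, so the re-add only works if the
partition is still at least as large as the array's Used Dev Size.]

    # remove one mirror from the running RAID-10
    mdadm /dev/md0 --fail /dev/sda --remove /dev/sda
    # create a single partition spanning the disk, partition type fd
    # (Linux raid autodetect), e.g. with: fdisk /dev/sda
    # then give the partition back to the array and let it resync
    mdadm /dev/md0 --add /dev/sda1
    # wait for the rebuild to finish before converting the next drive
    cat /proc/mdstat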
Re: Raid-10 mount at startup always has problem
Daniel L. Miller wrote:
> Daniel L. Miller wrote:
>> Richard Scobie wrote:
>>> Daniel L. Miller wrote:
>>>> And you didn't ask, but my mdadm.conf:
>>>> DEVICE partitions
>>>> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
>>>> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a
>>> Try adding
>>>    auto=part
>>> at the end of your mdadm.conf ARRAY line.
>> Thanks - will see what happens on my next reboot.
>
> Current mdadm.conf:
> DEVICE partitions
> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
>
> still have the problem where on boot one drive is not part of the
> array. Is there a log file I can check to find out WHY a drive is not
> being added? It's been a while since the reboot, but I did find some
> entries in dmesg - I'm appending both the md lines and the physical
> disk related lines. The bottom shows one disk not being added (this
> time it was sda) - and the disk that gets skipped on each boot seems to
> be random - there's no consistent failure:

I suspect the base problem is that you are using whole disks instead of
partitions, and the problem with the partition table below is probably an
indication that you have something on that drive which looks like a
partition table but isn't. That prevents the drive from being recognized
as a whole drive. You're lucky, if the data looked enough like a
partition table to be valid the o/s probably would have tried to do
something with it.

I can't see any easy (or safe) backout on this, you have used the whole
disk, so you can't just drop a drive, partition, and add the partition
back in place of the drive. And if you have a failure and ever have to
replace a drive, you will have to use a drive or partition at least as
large as what you have. Hopefully someone will have a good idea how to
gracefully transition to a safer setup; if random data ever looks like a
valid partition table, evil may occur. And if you ever get this on two
drives at once the system won't boot. Two time-bomb cases, and they're
not mutually exclusive.

This may be the rare case where you really do need to specify the actual
devices to get reliable operation.

> [...]
> md: raid10 personality registered for level 10
> [...]
> md: Autodetecting RAID arrays.
> md: autorun ...
> md: ... autorun DONE.
> [...]
> scsi0 : sata_nv
> scsi1 : sata_nv
> ata1: SATA max UDMA/133 cmd 0xc20001428480 ctl 0xc200014284a0 bmdma 0x00011410 irq 23
> ata2: SATA max UDMA/133 cmd 0xc20001428580 ctl 0xc200014285a0 bmdma 0x00011418 irq 23
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata1.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
> ata1.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
> ata1.00: configured for UDMA/133
> ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata2.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
> ata2.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
> ata2.00: configured for UDMA/133
> scsi 0:0:0:0: Direct-Access ATA ST3160811AS 3.AA PQ: 0 ANSI: 5
> ata1: bounce limit 0x, segment boundary 0x, hw segs 61
> scsi 1:0:0:0: Direct-Access ATA ST3160811AS 3.AA PQ: 0 ANSI: 5
> ata2: bounce limit 0x, segment boundary 0x, hw segs 61
> ACPI: PCI Interrupt Link [LSI1] enabled at IRQ 22
> ACPI: PCI Interrupt :00:08.0[A] -> Link [LSI1] -> GSI 22 (level, high) -> IRQ 22
> sata_nv :00:08.0: Using ADMA mode
> PCI: Setting latency timer of device :00:08.0 to 64
> scsi2 : sata_nv
> scsi3 : sata_nv
> ata3: SATA max UDMA/133 cmd 0xc2000142a480 ctl 0xc2000142a4a0 bmdma 0x00011420 irq 22
> ata4: SATA max UDMA/133 cmd 0xc2000142a580 ctl 0xc2000142a5a0 bmdma 0x00011428 irq 22
> ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata3.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
> ata3.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
> ata3.00: configured for UDMA/133
> ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata4.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
> ata4.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
> ata4.00: configured for UDMA/133
> scsi 2:0:0:0: Direct-Access ATA ST3160811AS 3.AA PQ: 0 ANSI: 5
> ata3: bounce limit 0x, segment boundary 0x, hw segs 61
> scsi 3:0:0:0: Direct-Access ATA ST3160811AS 3.AA PQ: 0 ANSI: 5
> ata4: bounce limit 0x, segment boundary 0x, hw segs 61
> sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sda: unknown partition table
> sd 0:0:0:0: [sda] Attached SCSI disk
> sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
Re: Raid-10 mount at startup always has problem
On Wed, 2007-10-24 at 07:22 -0700, Daniel L. Miller wrote:
> Daniel L. Miller wrote:
>> Richard Scobie wrote:
>>> Daniel L. Miller wrote:
>>>> And you didn't ask, but my mdadm.conf:
>>>> DEVICE partitions
>>>> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
>>>> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a
>>> Try adding
>>>    auto=part
>>> at the end of your mdadm.conf ARRAY line.
>> Thanks - will see what happens on my next reboot.
>
> Current mdadm.conf:
> DEVICE partitions
> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
>
> still have the problem where on boot one drive is not part of the
> array. Is there a log file I can check to find out WHY a drive is not
> being added?

It usually means either the device is busy at the time the raid startup
happened, or the device wasn't created by udev yet at the time the
startup happened. Is it failing to start the array properly in the initrd
or is this happening after you've switched to the rootfs and are running
the startup scripts?

> md: md0 stopped.
> md: md0 stopped.
> md: bind
> md: bind
> md: bind

Whole disk raid devices == bad. Lots of stuff can go wrong with that
setup.

> md: md0: raid array is not clean -- starting background reconstruction
> raid10: raid set md0 active with 3 out of 4 devices
> md: couldn't update array info. -22
> md: resync of RAID array md0
> md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 20
> KB/sec) for resync.
> md: using 128k window, over a total of 312581632 blocks.
> Filesystem "md0": Disabling barriers, not supported by the underlying
> device
> XFS mounting filesystem md0
> Starting XFS recovery on filesystem: md0 (logdev: internal)
> Ending XFS recovery on filesystem: md0 (logdev: internal)

-- 
Doug Ledford <[EMAIL PROTECTED]>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
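[Editor's note: Doug's initrd-or-rootfs question can be answered without
guessing, along the lines of the sketch below. The path is the usual
Ubuntu naming and is an assumption here; it also assumes the image is a
gzipped cpio initramfs rather than an old-style filesystem initrd, which
may not hold for a self-built 2.6.22 setup.]

    # list the archive the bootloader loads and look for mdadm; if it is
    # there, assembly is (also) attempted inside the initramfs
    zcat /boot/initrd.img-$(uname -r) | cpio -t 2>/dev/null | grep -i mdadm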
Re: Raid-10 mount at startup always has problem
Daniel L. Miller wrote:
> Richard Scobie wrote:
>> Daniel L. Miller wrote:
>>> And you didn't ask, but my mdadm.conf:
>>> DEVICE partitions
>>> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
>>> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a
>> Try adding
>>    auto=part
>> at the end of your mdadm.conf ARRAY line.
> Thanks - will see what happens on my next reboot.

Current mdadm.conf:
DEVICE partitions
ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part

still have the problem where on boot one drive is not part of the array.
Is there a log file I can check to find out WHY a drive is not being
added? It's been a while since the reboot, but I did find some entries in
dmesg - I'm appending both the md lines and the physical disk related
lines. The bottom shows one disk not being added (this time it was sda) -
and the disk that gets skipped on each boot seems to be random - there's
no consistent failure:

[...]
md: raid10 personality registered for level 10
[...]
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
[...]
scsi0 : sata_nv
scsi1 : sata_nv
ata1: SATA max UDMA/133 cmd 0xc20001428480 ctl 0xc200014284a0 bmdma 0x00011410 irq 23
ata2: SATA max UDMA/133 cmd 0xc20001428580 ctl 0xc200014285a0 bmdma 0x00011418 irq 23
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata1.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata2.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata2.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access ATA ST3160811AS 3.AA PQ: 0 ANSI: 5
ata1: bounce limit 0x, segment boundary 0x, hw segs 61
scsi 1:0:0:0: Direct-Access ATA ST3160811AS 3.AA PQ: 0 ANSI: 5
ata2: bounce limit 0x, segment boundary 0x, hw segs 61
ACPI: PCI Interrupt Link [LSI1] enabled at IRQ 22
ACPI: PCI Interrupt :00:08.0[A] -> Link [LSI1] -> GSI 22 (level, high) -> IRQ 22
sata_nv :00:08.0: Using ADMA mode
PCI: Setting latency timer of device :00:08.0 to 64
scsi2 : sata_nv
scsi3 : sata_nv
ata3: SATA max UDMA/133 cmd 0xc2000142a480 ctl 0xc2000142a4a0 bmdma 0x00011420 irq 22
ata4: SATA max UDMA/133 cmd 0xc2000142a580 ctl 0xc2000142a5a0 bmdma 0x00011428 irq 22
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata3.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata3.00: configured for UDMA/133
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata4.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata4.00: configured for UDMA/133
scsi 2:0:0:0: Direct-Access ATA ST3160811AS 3.AA PQ: 0 ANSI: 5
ata3: bounce limit 0x, segment boundary 0x, hw segs 61
scsi 3:0:0:0: Direct-Access ATA ST3160811AS 3.AA PQ: 0 ANSI: 5
ata4: bounce limit 0x, segment boundary 0x, hw segs 61
sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: unknown partition table
sd 0:0:0:0: [sda] Attached SCSI disk
sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdb: unknown partition table
sd 1:0:0:0: [sdb] Attached SCSI disk
sd 2:0:0:0: [sdc] 312581808 512-byte hardware sectors (160042 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:0:0: [sdc] 312581808 512-byte hardware sectors (160042 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdc: unknown partition table
sd 2:0:0:0: [sdc] Attached SCSI disk
sd 3:0:0:0: [sdd] 312581808 512-byte hardware sectors (160042 MB)
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00 s
Re: Raid-10 mount at startup always has problem
Richard Scobie wrote:
> Daniel L. Miller wrote:
>> And you didn't ask, but my mdadm.conf:
>> DEVICE partitions
>> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
>> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a
>
> Try adding
>    auto=part
> at the end of your mdadm.conf ARRAY line.

Thanks - will see what happens on my next reboot.

Daniel
Re: Raid-10 mount at startup always has problem
Daniel L. Miller wrote:
> And you didn't ask, but my mdadm.conf:
> DEVICE partitions
> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a

Hi Daniel,

Try adding

   auto=part

at the end of your mdadm.conf ARRAY line.

Regards,
Richard
Re: Raid-10 mount at startup always has problem
Bill Davidsen wrote:
> Daniel L. Miller wrote:
>> Hi! I have a four-disk Raid-10 array that I created and mount with
>> mdadm. It seems like every re-boot, either the array is not recognized
>> altogether, or one of the disks is not added. Manually adding using
>> mdadm works.
>
> What superblock version and partition type did you use? mdadm -D please.

Thanks for the reply. I've been wondering why no one answered me - then
discovered your answer in my mailbox! Must have been hiding somewhere . . . .

Anyway - mdadm -D /dev/md0

/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Oct  3 19:11:53 2006
     Raid Level : raid10
     Array Size : 312581632 (298.10 GiB 320.08 GB)
  Used Dev Size : 156290816 (149.05 GiB 160.04 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Sep  9 18:51:17 2007
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2, far=1
     Chunk Size : 32K

           UUID : 9d94b17b:f5fac31a:577c252b:0d4c4b2a
         Events : 0.10811466

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16       1      active sync   /dev/sdb
       2       8       32       2      active sync   /dev/sdc
       3       8       48       3      active sync   /dev/sdd

And you didn't ask, but my mdadm.conf:
DEVICE partitions
ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a

Daniel
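[Editor's note: as an aside, the ARRAY line can be generated from the
running array instead of typed by hand, which avoids UUID transcription
errors. A sketch; /etc/mdadm/mdadm.conf is the Debian/Ubuntu location and
the .new suffix is just a suggestion for reviewing before merging.]

    # print an ARRAY line for each currently running array
    mdadm --detail --scan
    # review the output, then merge it into the config by hand
    mdadm --detail --scan >> /etc/mdadm/mdadm.conf.new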
Re: Raid-10 mount at startup always has problem
Daniel L. Miller wrote:
> Hi! I have a four-disk Raid-10 array that I created and mount with
> mdadm. It seems like every re-boot, either the array is not recognized
> altogether, or one of the disks is not added. Manually adding using
> mdadm works.

What superblock version and partition type did you use? mdadm -D please.

> Ubuntu, custom compiled kernel, 2.6.22
> mdadm 2.6.2
> Sata hard drives, nvidia CK804 controller - NOT using nvidia raid.

-- 
bill davidsen <[EMAIL PROTECTED]>
  CTO TMR Associates, Inc
Doing interesting things with small computers since 1979