Re: mdadm 2.6.3 segfaults on assembly (v1 superblocks)
also sprach Neil Brown [EMAIL PROTECTED] [2007.09.24.0528 +0100]:

> Sure could. Thanks for the report.
> This patch (already in .git) should fix it.

Apparently it does not, and it seems to be amd64-only, since I saw it on amd64 and a bunch of people reported success on i386:

  http://bugs.debian.org/444682

Any help appreciated. I don't have an amd64 system around for another three weeks...

-- 
martin;              (greetings from the heart of the sun.)
  \ echo mailto: !#^.*|tr * mailto:; [EMAIL PROTECTED]

scientists will study your brain to learn more about your distant cousin, man.

spamtraps: [EMAIL PROTECTED]

digital_signature_gpg.asc
Description: Digital signature (see http://martin-krafft.net/gpg/)
[solved] Bug#444682: mdadm segfault at super1.c:1004
also sprach martin f krafft [EMAIL PROTECTED] [2007.09.30.1234 +0100]:

> Oh well, I think this is an amd64-specific problem. Daniel, are you
> around today to debug this? Or anyone else with amd64? I don't have
> an amd64 machine around to test this for another three weeks, so I'd
> really appreciate if someone else stepped in.

Okay, I did find one and I can reproduce. First thing to note:

  #535 0x0041f07c in load_super1 (st=0x634030, fd=8, sbp=0x7fff9f4fefd0, devname=0x0) at super1.c:1005
  #536 0x0041f07c in load_super1 (st=0x634030, fd=8, sbp=0x7fff9f4fefd0, devname=0x0) at super1.c:1005

load_super1 apparently recurses infinitely. Looking at the code:

  static int load_super1(struct supertype *st, int fd, void **sbp, char *devname)
  {
      unsigned long long dsize;
      unsigned long long sb_offset;
      struct mdp_superblock_1 *super;
      int uuid[4];
      struct bitmap_super_s *bsb;
      struct misc_dev_info *misc;

      if (st->ss == NULL || st->minor_version == -1) {
          int bestvers = -1;
          struct supertype tst;
          __u64 bestctime = 0;
          /* guess... choose latest ctime */
          tst.ss = &super1;
          for (tst.minor_version = 0; tst.minor_version <= 2; tst.minor_version++) {
              switch(load_super1(st, fd, sbp, devname)) {

I can't help but note that there is no way to break out of this loop: if (st->ss == NULL || st->minor_version == -1) is true when it's called the first time, it remains true in every recursive call, since st is passed down unchanged.

So it turns out that I think Neil simply forgot to replace the first argument by &tst in commit a40b4fe, as the forthcoming patch does.

-- 
.''`.   martin f. krafft [EMAIL PROTECTED]
: :' :  proud Debian developer, author, administrator, and user
`. `'`  http://people.debian.org/~madduck - http://debiansystem.info
  `-    Debian - when you have better things to do than fixing systems

because light travels faster than sound, some people appear to be intelligent, until you hear them speak.

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fix segfault on assembly on amd64 with v1 superblocks
Commit a40b4fe introduced a temporary supertype variable tst, instead of manipulating st directly. However, it was forgotten to pass &tst into the recursive load_super1 call, causing an infinite recursion.

Signed-off-by: martin f. krafft [EMAIL PROTECTED]
---
 super1.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/super1.c b/super1.c
index 52783e7..06c2655 100644
--- a/super1.c
+++ b/super1.c
@@ -1001,7 +1001,7 @@ static int load_super1(struct supertype *st, int fd, void **sbp, char *devname)
 		/* guess... choose latest ctime */
 		tst.ss = &super1;
 		for (tst.minor_version = 0; tst.minor_version <= 2; tst.minor_version++) {
-			switch(load_super1(st, fd, sbp, devname)) {
+			switch(load_super1(&tst, fd, sbp, devname)) {
 			case 0: super = *sbp;
 				if (bestvers == -1 ||
 				    bestctime < __le64_to_cpu(super->ctime)) {
--
1.5.3.1
mdadm 2.6.3 segfaults on assembly (v1 superblocks)
Hi,

preparing the Debian package for mdadm 2.6.3, I found a segfault in mdadm/Assemble.c:254, in the line:

  } else if (tst->ss->load_super(tst, dfd, &super, NULL)) {

The problem is that tst->ss is NULL, due to reasons I have not yet uncovered. The segfault happens only in the second iteration of the for loop at line 212; the load_super1 call, caused by the above load_super in the first iteration, causes tst->ss to be set to NULL. This happens in the first recursion (load_super1 calls itself), at which point the

  if (dsize < 24) {

check in super1.c:1033 fails and thus returns 1, which causes the outer load_super1 function to return 1 after setting st->ss to NULL in line super1.c:1013.

This all happens while the dfd variable in Assemble.c:254 has value 8, and assuming this is a file descriptor, then lsof says:

  mdadm   25664  root  8r  BLK  22,3  2806  /dev/hdc3

/dev/hdc3 is an extended partition on the disk:

  /dev/hdc1   *         1         8       64228+  83  Linux
  /dev/hdc2             9       132      996030   82  Linux swap / Solaris
  /dev/hdc3           133     30401   243135742+   5  Extended
  /dev/hdc5           133       256      995998+  83  Linux
  /dev/hdc6           257       505         261   83  Linux
  /dev/hdc7           506     28347   223640833+  83  Linux
  /dev/hdc8         28348     30339    16000708+  83  Linux
  /dev/hdc9         30340     30401      497983+  83  Linux

I am failing to reproduce this on v0.9 superblock systems. Neil, could this be a bug?

-- 
martin;              (greetings from the heart of the sun.)
Re: removed disk md-device
also sprach Bernd Schubert [EMAIL PROTECTED] [2007.05.09.1417 +0200]:

> Problem-1) When the disk fails, udev will remove it from /dev.
> Unfortunately this will make it impossible to remove the disk or its
> partitions from the /dev/mdX device, since mdadm tries to read the
> device file and will abort if this file is not there.

Please also see http://bugs.debian.org/416512. It would be nice if you could keep [EMAIL PROTECTED] on CC. mdadm upstream knows of the problem; see the bug log.

-- 
martin;              (greetings from the heart of the sun.)
Re: what does md do if it finds an inconsistency?
also sprach martin f krafft [EMAIL PROTECTED] [2007.05.06.0245 +0200]:

> With the check feature of recent md versions, the question popped up
> what happens when an inconsistency is found. Does it fix it? If so,
> which disk does it assume to be wrong when an inconsistency is found?

What I meant was of course

  echo repair > sync_action

I am unsure what happens:

  piper:/sys/block/md7/md# cat mismatch_cnt
  128
  piper:/sys/block/md7/md# echo repair > sync_action
  piper:/sys/block/md7/md# cat sync_action
  idle
  piper:/sys/block/md7/md# cat mismatch_cnt
  128

If I do this again, then mismatch_cnt goes to 0. Not the first time.

  md7 : active raid10 sda2[0] sdc2[2] sdb2[1]
        1373376 blocks 64K chunks 2 near-copies [3/3] [UUU]

-- 
martin;              (greetings from the heart of the sun.)
Re: what does md do if it finds an inconsistency?
> The first time it reports that it found (and repaired) 128 items. It
> does not mean that you now *have* 128 mismatches. The next run
> ('repair' or 'check') will find none (hopefully...) and report zero.

Oh, this makes perfect sense, thanks for the explanation.

As the mdadm maintainer for Debian, I would like to come up with a way to handle mismatches somewhat intelligently. I already have the check sync_action run once a month on all machines by default (it can be turned on/off via debconf), and now I would like to find a good way to react when mismatch_cnt is non-zero. I don't want to write to the components without the admin's consent, though. Maybe the ideal way would be to have mdadm --monitor send an email on mismatch_cnt > 0, or a cronjob that regularly sends reminders until the admin logs in and runs e.g. /usr/share/mdadm/repairarray. Thoughts?

Also, if a mismatch is found on a RAID1, how does md decide which copy is mismatched and which is correct? What about RAID5/6/10?

Thanks for your time!

-martin

-- 
Eyal Lebedinsky ([EMAIL PROTECTED]) http://samba.org/eyal/
	attach .zip as .dat
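The proposed never-write-without-consent policy can be sketched as follows. This is a hypothetical helper of mine, not part of mdadm: a cronjob would read /sys/block/mdX/md/mismatch_cnt and decide only whether to nag the admin, never whether to repair.

```c
#include <assert.h>
#include <stdlib.h>

enum action { ACT_NONE, ACT_MAIL_ADMIN };

/* parse the contents of the mismatch_cnt sysfs file (a decimal number
 * followed by a newline) */
static unsigned long parse_mismatch_cnt(const char *buf)
{
    return strtoul(buf, NULL, 10);
}

/* never repair automatically; just keep reporting until the admin
 * runs e.g. /usr/share/mdadm/repairarray by hand */
static enum action decide_action(unsigned long mismatch_cnt)
{
    return mismatch_cnt > 0 ? ACT_MAIL_ADMIN : ACT_NONE;
}
```

A monthly cronjob could loop over all arrays, call decide_action() on each, and mail the admin for every array still reporting mismatches.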
what does md do if it finds an inconsistency?
Neil,

With the check feature of recent md versions, the question popped up what happens when an inconsistency is found. Does it fix it? If so, which disk does it assume to be wrong when an inconsistency is found?

Cheers,

-- 
martin;              (greetings from the heart of the sun.)
why not make everything partitionable?
Hi folks,

you cannot create partitions within partitions, but you can perfectly well use a whole disk for a filesystem without any partitions. Along the same lines, I wonder why md/mdadm distinguish between partitionable and non-partitionable arrays in the first place. Why isn't everything partitionable?

Thanks for any explanation(s)!

-- 
martin;              (greetings from the heart of the sun.)
Re: RAID5/10 chunk size and ext2/3 stride parameter
also sprach dean gaudet [EMAIL PROTECTED] [2006.11.03.2019 +0100]:

>> I cannot find authoritative information about the relation between
>> the RAID chunk size and the correct stride parameter to use when
>> creating an ext2/3 filesystem.
>
> you know, it's interesting -- mkfs.xfs somehow gets the right
> sunit/swidth automatically from the underlying md device.

i don't know enough about xfs to be able to agree or disagree with you on that.

> # mdadm --create --level=5 --raid-devices=4 --assume-clean --auto=yes /dev/md0 /dev/sd[abcd]1
> mdadm: array /dev/md0 started.

with 64k chunks i assume...

> # mkfs.xfs /dev/md0
> meta-data=/dev/md0    isize=256    agcount=32, agsize=9157232 blks
>          =            sectsz=4096  attr=0
> data     =            bsize=4096   blocks=293031424, imaxpct=25
>          =            sunit=16     swidth=48 blks, unwritten=1

sunit seems like the stride width i determined (64k chunks / 4k block size), but what is swidth? Is it 64 * 3/4 because of the four-device RAID5?

> # mdadm --create --level=10 --layout=f2 --raid-devices=4 --assume-clean --auto=yes /dev/md0 /dev/sd[abcd]1
> mdadm: array /dev/md0 started.
> # mkfs.xfs -f /dev/md0
> meta-data=/dev/md0    isize=256    agcount=32, agsize=6104816 blks
>          =            sectsz=512   attr=0
> data     =            bsize=4096   blocks=195354112, imaxpct=25
>          =            sunit=16     swidth=64 blks, unwritten=1

okay, so as before, 16 stride size and 64 stripe width, because we're now dealing with mirrors.

> # mdadm --create --level=10 --layout=n2 --raid-devices=4 --assume-clean --auto=yes /dev/md0 /dev/sd[abcd]1
> mdadm: array /dev/md0 started.
> # mkfs.xfs -f /dev/md0
> meta-data=/dev/md0    isize=256    agcount=32, agsize=6104816 blks
>          =            sectsz=512   attr=0
> data     =            bsize=4096   blocks=195354112, imaxpct=25
>          =            sunit=16     swidth=64 blks, unwritten=1

why not? in this case, -n2 and -f2 aren't any different, are they?

> in a near 2 layout i would expect sunit=16, swidth=32 ... but
> swidth=64 probably doesn't hurt.

why?

> that's how i think it works -- i don't think ext[23] have a concept
> of stripe width like xfs does. they just want to know how to avoid
> putting all the critical data on one disk (which needs only the
> chunk size). but you should probably ask on the linux-ext4 mailing
> list.

once i understand everything...

-- 
martin;              (greetings from the heart of the sun.)
RAID5/10 chunk size and ext2/3 stride parameter
Hi,

I cannot find authoritative information about the relation between the RAID chunk size and the correct stride parameter to use when creating an ext2/3 filesystem.

My understanding is that (block size * stride) == (chunk size). So if I create a default RAID5/10 with 64k chunks, and create a filesystem with 4k blocks on it, I should choose stride = 64k/4k = 16.

Is the chunk size of an array equal to the stripe size? Or is it (n-1) * chunk size for RAID5 and (n/2) * chunk size for a plain near=2 RAID10?

Also, I understand that it makes no sense to use stride for RAID1, as there are no stripes in that sense. But for RAID10 it makes sense, right?

Thanks,

-- 
martin;              (greetings from the heart of the sun.)
why partition arrays?
As the Debian mdadm maintainer, I am often subjected to questions about partitionable arrays; people seem to want to use them in favour of normal arrays, and I don't understand why.

There is possibly an argument to be made about flexibility when it comes to resizing partitions within the array, but even most MD array types can be resized now. There is possibly an argument about saving space because of fewer sectors used/wasted on superblock information, but I am not going to buy that.

Why would anyone want to create a partitionable array and put partitions in it, rather than creating separate arrays for each filesystem? Intuitively, the latter makes way more sense, as then the partitions are independent of each other; one array can fail and the rest still works -- part of the reason why you partition in the first place.

Would anyone help me answer this FAQ?

(btw: [0] and [1] are obviously for public consumption; they are available under the terms of the Artistic Licence 2.0)

0. http://svn.debian.org/wsvn/pkg-mdadm/mdadm/trunk/debian/FAQ?op=file&rev=0&sc=0
1. http://svn.debian.org/wsvn/pkg-mdadm/mdadm/trunk/debian/README.recipes?op=file&rev=0&sc=0

-- 
martin;              (greetings from the heart of the sun.)
Re: why partition arrays?
also sprach Doug Ledford [EMAIL PROTECTED] [2006.10.18.1526 +0200]:

> There are a couple reasons I can think of.

Thanks for your elaborate response. If you don't mind, I shall link to it from the FAQ.

I have one other question: do partitionable and traditional arrays actually differ in format? Put differently: can I assemble a traditional array as a partitionable one simply by specifying

  mdadm --create ... /dev/md0 ...
  mdadm --stop /dev/md0
  mdadm --assemble --auto=part ... /dev/md0 ...

? Or do the superblocks actually differ?

Thanks,

-- 
martin;              (greetings from the heart of the sun.)
avoiding the initial resync on --create
Hi all,

I am looking at http://bugs.debian.org/251898 and wondering whether it is safe to use --assume-clean (which prevents the initial resync) when creating RAID arrays from the Debian installer. Please also see the following discussion on IRC:

  <madduck> yeah, i am not sure --assume-clean is a good idea.
  <peterS> madduck: why not?  I've tried to think of a reason it would
           fail for months, and so far I'm too stupid to think of one
  <madduck> even then
  <madduck> peterS: because it then assumes that it
  <madduck> it's clean, period.
  <peterS> yeah, so?
  <peterS> the blocks you have not written will have unreliable contents
  <madduck> in reality, the three components are not properly XORed
  <peterS> but why would you care about that?
  <madduck> hm. kinda true.
  <peterS> the blocks you _do_ write will be correct
  <peterS> even an uninitialised raid5 or raid6 seems like it would work
           perfectly well with --assume-clean

Do you have any thoughts on the issue? If Debian were to --create its arrays with --assume-clean just before slapping a filesystem on them and installing the system, do you see any potential problems?

-- 
martin;              (greetings from the heart of the sun.)
converting RAID5 to RAID10
I have a 1.5Tb RAID5 machine (3*750Gb disks + 1 spare) and need to move some write-intensive services there. Unfortunately, the performance is unacceptable, so I wanted to convert the machine to RAID10.

My theory was: back up, remove the spare, set one disk faulty, remove it, create a degraded RAID10 on the two freed disks, copy the data, kill the RAID5, add its disks to the new RAID10. Unfortunately, mdadm (2.5.3) doesn't seem to agree; it complains that it cannot assemble a RAID10 with 4 devices when I ask it to:

  mdadm --create -l 10 -n4 -pn2 /dev/md1 /dev/sd[cd] missing missing

I can kind of understand, but on the other hand I don't. After all, if you'll allow me to think in terms of 1+0 instead of 10 for a second: why doesn't mdadm just assemble /dev/sd[cd] as a RAID0 and make that couple one of the two components of the RAID1? What I mean is: I could set up RAID1+0 that way; why doesn't it work for RAID10?

Do you know of a way in which I could migrate the data to RAID10? Unfortunately, I do not have more 750Gb disks available, nor a budget, and the 1.5Tb are 96% full.

Cheers,

-- 
martin;              (greetings from the heart of the sun.)
Re: converting RAID5 to RAID10
also sprach Neil Brown [EMAIL PROTECTED] [2006.10.05.1214 +0200]:

> mdadm --create -l 10 -n 4 -pn2 /dev/md1 /dev/sdc missing /dev/sdd missing

Peter Samuelson of the Debian project already suggested this and it seems to work. Thanks a lot, Neil, for the quick and informative response.

-- 
martin;              (greetings from the heart of the sun.)
RAID10: near, far, offset -- which one?
I am trying to compare the three RAID10 layouts with each other. Assuming a simple 4-drive setup with 2 copies of each block, I understand that a near layout makes RAID10 resemble RAID1+0 (although it's not 1+0). I also understand that the far layout trades some write performance for some read performance, so it's best for read-intensive operations, like read-only file servers.

I don't really understand the offset layout. Am I right in asserting that, like near, it keeps stripes together and thus requires less seeking, but stores the blocks at different offsets wrt the disks?

If A,B,C are data blocks, a,b their parts, and 1,2 denote their copies, the following would be a classic RAID1+0 where drives 1,2 and 3,4 are RAID0 pairs combined into a RAID1:

  hdd1  Aa1 Ba1 Ca1
  hdd2  Ab1 Bb1 Cb1
  hdd3  Aa2 Ba2 Ca2
  hdd4  Ab2 Bb2 Cb2

How would this look with the three different layouts? I think near is pretty much the same as above, but I can't figure out far and offset from the md(4) manpage. Also, what are their respective advantages and disadvantages?

Thanks,

-- 
martin;              (greetings from the heart of the sun.)