Re: mdadm 2.6.3 segfaults on assembly (v1 superblocks)

2007-09-30 Thread martin f krafft
also sprach Neil Brown [EMAIL PROTECTED] [2007.09.24.0528 +0100]:
 Sure could.  Thanks for the report.
 
 This patch (already in .git) should fix it.

Apparently it does not, and it seems to be amd64-only since I saw it
on amd64 and a bunch of people reported success on i386:

  http://bugs.debian.org/444682

Any help appreciated. I don't have an amd64 system around for
another three weeks...

-- 
martin;  (greetings from the heart of the sun.)
 
scientists will study your brain to learn
more about your distant cousin, man.




[solved] Bug#444682: mdadm segfault at super1.c:1004

2007-09-30 Thread martin f. krafft
also sprach martin f krafft [EMAIL PROTECTED] [2007.09.30.1234 +0100]:
 Oh well, I think this is an amd64-specific problem. Daniel, are you
 around today to debug this? Or anyone else with amd64? I don't have
 an amd64 machine around to test this for another three weeks, so I'd
 really appreciate if someone else stepped in.

Okay, I did find one and I can reproduce. First thing to note:

#535 0x0041f07c in load_super1 (st=0x634030, fd=8, sbp=0x7fff9f4fefd0, devname=0x0) at super1.c:1005
#536 0x0041f07c in load_super1 (st=0x634030, fd=8, sbp=0x7fff9f4fefd0, devname=0x0) at super1.c:1005

load_super1 apparently recurses infinitely. Looking at the code:

  static int load_super1(struct supertype *st, int fd, void **sbp, char *devname)
  {
          unsigned long long dsize;
          unsigned long long sb_offset;
          struct mdp_superblock_1 *super;
          int uuid[4];
          struct bitmap_super_s *bsb;
          struct misc_dev_info *misc;

          if (st->ss == NULL || st->minor_version == -1) {
                  int bestvers = -1;
                  struct supertype tst;
                  __u64 bestctime = 0;
                  /* guess... choose latest ctime */
                  tst.ss = &super1;
                  for (tst.minor_version = 0; tst.minor_version <= 2 ; tst.minor_version++) {
                          switch(load_super1(st, fd, sbp, devname)) {

I can't help but note that there is no way to break out of this
recursion if (st->ss == NULL || st->minor_version == -1) is true when
it's called the first time.

So it turns out that I think Neil simply forgot to replace the first
argument with &tst in commit a40b4fe, as the forthcoming patch does.

-- 
 .''`.   martin f. krafft [EMAIL PROTECTED]
: :'  :  proud Debian developer, author, administrator, and user
`. `'`   http://people.debian.org/~madduck - http://debiansystem.info
  `-  Debian - when you have better things to do than fixing systems
 
because light travels faster than sound,
some people appear to be intelligent,
until you hear them speak.


[PATCH] Fix segfault on assembly on amd64 with v1 superblocks

2007-09-30 Thread martin f. krafft
Commit a40b4fe introduced a temporary supertype variable tst instead of
manipulating st directly. However, it forgot to pass &tst into the
recursive load_super1 call, causing infinite recursion.

Signed-off-by: martin f. krafft [EMAIL PROTECTED]
---
 super1.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/super1.c b/super1.c
index 52783e7..06c2655 100644
--- a/super1.c
+++ b/super1.c
@@ -1001,7 +1001,7 @@ static int load_super1(struct supertype *st, int fd, void **sbp, char *devname)
         /* guess... choose latest ctime */
         tst.ss = &super1;
         for (tst.minor_version = 0; tst.minor_version <= 2 ; tst.minor_version++) {
-                switch(load_super1(st, fd, sbp, devname)) {
+                switch(load_super1(&tst, fd, sbp, devname)) {
                 case 0: super = *sbp;
                         if (bestvers == -1 ||
                             bestctime < __le64_to_cpu(super->ctime)) {
-- 
1.5.3.1



mdadm 2.6.3 segfaults on assembly (v1 superblocks)

2007-09-07 Thread martin f krafft
Hi,

preparing the Debian package for mdadm 2.6.3, I found a segfault in
mdadm/Assemble.c:254, in the line:

  } else if (tst->ss->load_super(tst, dfd, &super, NULL)) {

The problem is that tst->ss is NULL, for reasons I have not yet
uncovered. The segfault happens only in the second iteration of the
for loop at line 212; the load_super1 call triggered by the above
load_super in the first iteration causes tst->ss to be set to NULL.

This happens in the first recursion (load_super1 calls itself), at
which point the

  if (dsize < 24) {

check in super1.c:1033 triggers and thus returns 1, which causes the
outer load_super1 function to return 1 after setting st->ss to NULL
in line super1.c:1013.

This all happens while the dfd variable in Assemble.c:254 has value
8, and assuming this is a file descriptor, then lsof says:

  mdadm 25664 root    8r   BLK   22,3          2806 /dev/hdc3

/dev/hdc3 is an extended partition on the disk.

   Device Boot      Start         End      Blocks   Id  System
/dev/hdc1   *           1           8      64228+   83  Linux
/dev/hdc2               9         132     996030    82  Linux swap / Solaris
/dev/hdc3             133       30401  243135742+    5  Extended
/dev/hdc5             133         256     995998+   83  Linux
/dev/hdc6             257         505         261   83  Linux
/dev/hdc7             506       28347  223640833+   83  Linux
/dev/hdc8           28348       30339   16000708+   83  Linux
/dev/hdc9           30340       30401     497983+   83  Linux

I am failing to reproduce this on v0.9 superblock systems.

Neil, could this be a bug?

-- 
martin;  (greetings from the heart of the sun.)
 
nothing can cure the soul but the senses,
 just as nothing can cure the senses but the soul.
-- oscar wilde




Re: removed disk md-device

2007-05-09 Thread martin f krafft
also sprach Bernd Schubert [EMAIL PROTECTED] [2007.05.09.1417 +0200]:
 Problem-1) When the disk fails, udev will remove it from /dev. Unfortunately
 this will make it impossible to remove the disk or its partitions
 from the /dev/mdX device, since mdadm tries to read the device file and will
 abort if this file is not there.

Please also see http://bugs.debian.org/416512. It would be nice if
you could keep [EMAIL PROTECTED] on CC.

mdadm upstream knows of the problem. See the bug log.

-- 
martin;  (greetings from the heart of the sun.)
 
i worked myself up from nothing to a state of extreme poverty.
   -- groucho marx




Re: what does md do if it finds an inconsistency?

2007-05-06 Thread martin f krafft
also sprach martin f krafft [EMAIL PROTECTED] [2007.05.06.0245 +0200]:
 With the check feature of the recent md feature, the question popped
 up what happens when an inconsistency is found. Does it fix it? If
 so, which disk it assumes to be wrong if an inconsistency is found?

What I meant was of course

  echo repair > sync_action

I am unsure what happens:

  piper:/sys/block/md7/md# cat mismatch_cnt
  128
  piper:/sys/block/md7/md# echo repair > sync_action
  piper:/sys/block/md7/md# cat sync_action
  idle
  piper:/sys/block/md7/md# cat mismatch_cnt
  128 

If I do this again, mismatch_cnt goes to 0, but not the first time.

md7 : active raid10 sda2[0] sdc2[2] sdb2[1]
  1373376 blocks 64K chunks 2 near-copies [3/3] [UUU]

-- 
martin;  (greetings from the heart of the sun.)
 
the thought of suicide is a great consolation: by means of it one
 gets successfully through many a bad night.
 - friedrich nietzsche




Re: what does md do if it finds an inconsistency?

2007-05-06 Thread martin f krafft
 The first time it reports that it found (and repaired) 128 items.
 It does not mean that you now *have* 128 mismatches.

 The next run ('repair' or 'check') will find none (hopefully...)
 and report zero.

Oh, this makes perfect sense, thanks for the explanation.

As the mdadm maintainer for Debian, I would like to come up with a way to
handle mismatches somewhat intelligently. I already have the check
sync_action run once a month on all machines by default (can be turned
on/off via debconf), and now I would like to find a good way to react when
mismatch_cnt is non-zero. I don't want to write to the components
without the admin's consent though.

Maybe the ideal way would be to have mdadm --monitor send an email on
mismatch_cnt > 0, or a cronjob that regularly sends reminders until the
admin logs in and runs e.g. /usr/share/mdadm/repairarray.
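
For the cronjob variant, I am thinking of something along these lines
(only a sketch; the mail recipient and the repairarray helper are made
up, nothing of this exists yet):

  #!/bin/sh
  # sketch: nag the admin while any array reports mismatches
  for md in /sys/block/md*/md; do
      dev=$(basename $(dirname $md))
      cnt=$(cat $md/mismatch_cnt)
      if [ "$cnt" -gt 0 ]; then
          echo "$dev has $cnt mismatches; consider /usr/share/mdadm/repairarray $dev" \
            | mail -s "mdadm: mismatches on $dev" root
      fi
  done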

Thoughts?

Also, if a mismatch is found on a RAID1, how does md decide which copy is
mismatched and which is correct? What about RAID 5/6/10?

Thanks for your time!
-martin


 --
 Eyal Lebedinsky ([EMAIL PROTECTED]) http://samba.org/eyal/
   attach .zip as .dat





what does md do if it finds an inconsistency?

2007-05-05 Thread martin f krafft
Neil,

With the check feature of recent md versions, the question popped
up what happens when an inconsistency is found. Does md fix it? If
so, which disk does it assume to be wrong?

Cheers,

-- 
martin;  (greetings from the heart of the sun.)
 
frank harris has been received
 in all the great houses -- once!
-- oscar wilde




why not make everything partitionable?

2006-11-15 Thread martin f krafft
Hi folks,

you cannot create partitions within partitions, but you can perfectly
well put a filesystem on a whole disk without any partitions.

Along the same lines, I wonder why md/mdadm distinguish between
partitionable and non-partitionable arrays in the first place. Why
isn't everything partitionable?

Thanks for any explanation(s)!

-- 
martin;  (greetings from the heart of the sun.)
 
the reason that every major university
maintains a department of mathematics
is that it's cheaper than
institutionalizing all those people.




Re: RAID5/10 chunk size and ext2/3 stride parameter

2006-11-04 Thread martin f krafft
also sprach dean gaudet [EMAIL PROTECTED] [2006.11.03.2019 +0100]:
  I cannot find authoritative information about the relation between
  the RAID chunk size and the correct stride parameter to use when
  creating an ext2/3 filesystem.
 
 you know, it's interesting -- mkfs.xfs somehow gets the right sunit/swidth 
 automatically from the underlying md device.

i don't know enough about xfs to be able to agree or disagree with
you on that.

 # mdadm --create --level=5 --raid-devices=4 --assume-clean --auto=yes 
 /dev/md0 /dev/sd[abcd]1
 mdadm: array /dev/md0 started.

with 64k chunks i assume...

 # mkfs.xfs /dev/md0
 meta-data=/dev/md0   isize=256    agcount=32, agsize=9157232 blks
  =                   sectsz=4096  attr=0
 data     =           bsize=4096   blocks=293031424, imaxpct=25
  =                   sunit=16     swidth=48 blks, unwritten=1

sunit seems like the stride width i determined (64k chunks / 4k
block size), but what is swidth? Is it 64 * 3/4 because of the four
device RAID5?
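
To spell out the arithmetic i am assuming here (just my reading of
the numbers, nothing authoritative):

  # 4k filesystem blocks, 64k chunks, 4 devices
  # sunit  = chunk / block size      = 64k / 4k = 16 blocks
  # RAID5 : swidth = sunit * (n - 1) = 16 * 3   = 48 blocks (matches the output)
  # RAID10: sunit * (n / copies)     = 16 * 2   = 32 blocks would be my guess,
  #         yet mkfs.xfs reports swidth=64 = 16 * 4 for both f2 and n2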

 # mdadm --create --level=10 --layout=f2 --raid-devices=4 --assume-clean 
 --auto=yes /dev/md0 /dev/sd[abcd]1
 mdadm: array /dev/md0 started.
 # mkfs.xfs -f /dev/md0
 meta-data=/dev/md0   isize=256agcount=32, agsize=6104816 blks
  =   sectsz=512   attr=0
 data =   bsize=4096   blocks=195354112, imaxpct=25
  =   sunit=16 swidth=64 blks, unwritten=1

okay, so as before, 16 stride size and 64 stripe width, because
we're now dealing with mirrors.

 # mdadm --create --level=10 --layout=n2 --raid-devices=4 --assume-clean 
 --auto=yes /dev/md0 /dev/sd[abcd]1
 mdadm: array /dev/md0 started.
 # mkfs.xfs -f /dev/md0
 meta-data=/dev/md0   isize=256agcount=32, agsize=6104816 blks
  =   sectsz=512   attr=0
 data =   bsize=4096   blocks=195354112, imaxpct=25
  =   sunit=16 swidth=64 blks, unwritten=1

why not? in this case, -n2 and -f2 aren't any different, are they?

 in a near 2 layout i would expect sunit=16, swidth=32 ...  but swidth=64
 probably doesn't hurt.

why?

 that's how i think it works -- i don't think ext[23] have a concept of stripe
 width like xfs does.  they just want to know how to avoid putting all the
 critical data on one disk (which needs only the chunk size).  but you should
 probably ask on the linux-ext4 mailing list.

once i understand everything...

-- 
martin;  (greetings from the heart of the sun.)
 
if you find a spelling mistake in the above, you get to keep it.




RAID5/10 chunk size and ext2/3 stride parameter

2006-10-24 Thread martin f krafft
Hi,

I cannot find authoritative information about the relation between
the RAID chunk size and the correct stride parameter to use when
creating an ext2/3 filesystem.

My understanding is that (block size * stride) == (chunk size). So if
I create a default RAID5/10 with 64k chunks, and create a filesystem
with 4k blocks on it, I should choose stride = 64k/4k = 16.
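
In other words, for such an array I would expect to run something like
this (a sketch only; recent mke2fs spells the option -E stride=, older
versions used -R stride=, and the device name is made up):

  # 64k chunk / 4k block = 16
  mke2fs -j -b 4096 -E stride=16 /dev/md0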

Is the chunk size of an array equal to the stripe size? Or is it
(n-1)*chunk size for RAID5 and (n/2)*chunk size for a plain near=2
RAID10?

Also, I understand that it makes no sense to use stride for RAID1 as
there are no stripes in that sense. But for RAID10 it makes sense,
right?

Thanks,

-- 
martin;  (greetings from the heart of the sun.)
 
i like wagner's music better than anybody's. it is so loud that one
 can talk the whole time without other people hearing what one says.
-- oscar wilde




why partition arrays?

2006-10-18 Thread martin f krafft
As the Debian mdadm maintainer, I am often subjected to questions
about partitionable arrays; people seem to want to use them in
favour of normal arrays. I don't understand why.

There's possibly an argument to be made about flexibility when it
comes to resizing partitions within the array, but even most MD
array types can be resized now.

There's possibly an argument about saving space because of fewer
sectors used/wasted with superblock information, but I am not going
to buy that.

Why would anyone want to create a partitionable array and put
partitions in it, rather than creating separate arrays for each
filesystem? Intuitively, the latter makes way more sense, as the
arrays are then independent of each other; one can fail and the
rest still work -- part of the reason why you partition in the
first place.

Would anyone help me answer this FAQ?

(btw: [0] and [1] are obviously for public consumption; they are
available under the terms of the artistic licence 2.0)

0. http://svn.debian.org/wsvn/pkg-mdadm/mdadm/trunk/debian/FAQ?op=file&rev=0&sc=0
1. http://svn.debian.org/wsvn/pkg-mdadm/mdadm/trunk/debian/README.recipes?op=file&rev=0&sc=0

-- 
martin;  (greetings from the heart of the sun.)
 
the liar at any rate recognises that recreation, not instruction, is
 the aim of conversation, and is a far more civilised being than the
 blockhead who loudly expresses his disbelief in a story which is told
 simply for the amusement of the company.
-- oscar wilde




Re: why partition arrays?

2006-10-18 Thread martin f krafft
also sprach Doug Ledford [EMAIL PROTECTED] [2006.10.18.1526 +0200]:
 There are a couple reasons I can think.

Thanks for your elaborate response. If you don't mind, I shall link
to it from the FAQ.

I have one other question: do partitionable and traditional arrays
actually differ in format? Put differently: can I assemble
a traditional array as a partitionable one simply by specifying:

  mdadm --create ... /dev/md0 ...
  mdadm --stop /dev/md0
  mdadm --assemble --auto=part ... /dev/md0 ...

? Or do the superblocks actually differ?

Thanks,

-- 
martin;  (greetings from the heart of the sun.)
 
the images rushed around his mind and tried
to find somewhere to settle down and make sense.
-- douglas adams, the hitchhiker's guide to the galaxy




avoiding the initial resync on --create

2006-10-09 Thread martin f krafft
Hi all,

I am looking at http://bugs.debian.org/251898 and wondering whether
it is safe to use --assume-clean (which prevents the initial resync)
when creating RAID arrays from the Debian installer.

Please also see the following discussion on IRC:

 <madduck> yeah, i am not sure --assume-clean is a good idea.
 <peterS> madduck: why not?  I've tried to think of a reason it
          would fail for months, and so far I'm too stupid to think of one
 <madduck> even then
 <madduck> peterS: because it then assumes that it
 <madduck> it's clean, period.
 <peterS> yeah, so?
 <peterS> the blocks you have not written will have unreliable
          contents
 <madduck> in reality, the three components are not properly XORed
 <peterS> but why would you care about that?
 <madduck> hm. kinda true.
 <peterS> the blocks you _do_ write will be correct
 <peterS> even an uninitialised raid5 or raid6 seems like it would
          work perfectly well with --assume-clean

Do you have any thoughts on the issue? If Debian were to --create
its arrays with --assume-clean just before slapping a filesystem on
them and installing the system, do you see any potential problems?
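
Concretely, the installer would then do something along these lines
(a sketch only, device names made up):

  mdadm --create /dev/md0 --level=5 --raid-devices=3 --assume-clean \
      /dev/sda1 /dev/sdb1 /dev/sdc1
  mke2fs -j /dev/md0     # filesystem goes on right away, no initial resync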

-- 
martin;  (greetings from the heart of the sun.)
 
sometimes we sit and read other people's interpretations of our
 lyrics and think, 'hey, that's pretty good.' if we liked it, we would
 keep our mouths shut and just accept the credit as if it was what we
 meant all along.
-- john lennon




converting RAID5 to RAID10

2006-10-05 Thread martin f krafft
I have a 1.5Tb RAID5 machine (3*750Gb disks + 1 spare) and need to
move some write-intensive services there. Unfortunately, the
performance is unacceptable. Thus, I wanted to convert the machine
to RAID10.

My theory was: backup, remove the spare, set one disk faulty, remove
it, create a degraded RAID10 on the two freed disks, copy data, kill
RAID5, add disks to new RAID10.
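
Spelled out in commands, the plan would be roughly this (a sketch
only, device names made up):

  mdadm /dev/md0 --remove /dev/sdd                  # free the spare
  mdadm /dev/md0 --fail /dev/sdc --remove /dev/sdc  # free one active disk
  mdadm --create -l 10 -n 4 -p n2 /dev/md1 /dev/sd[cd] missing missing
  # ^ this is the step that fails below; Neil's reply further down has
  #   the working device/missing ordering
  mkfs -t ext3 /dev/md1                             # then mount and copy the data
  mdadm --stop /dev/md0                             # kill the RAID5
  mdadm /dev/md1 --add /dev/sda /dev/sdb            # let the RAID10 resync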

Unfortunately, mdadm (2.5.3) doesn't seem to agree; it complains
that it cannot assemble a RAID10 with 4 devices when I ask it to:

  mdadm --create -l 10 -n4 -pn2 /dev/md1 /dev/sd[cd] missing missing

I can kind of understand, but on the other hand I don't. After all,
if you'll allow me to think in terms of 1+0 instead of 10 for
a second, why doesn't mdadm just assemble /dev/sd[cd] as RAID0 and
make the couple one of the two components of the RAID1? What I mean
is: I could set up RAID1+0 that way; why doesn't it work for RAID10?
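
For comparison, the nested construction I am alluding to would be
something like this (again just a sketch with made-up device names):

  mdadm --create /dev/md2 -l 0 -n 2 /dev/sdc /dev/sdd   # stripe the two freed disks
  mdadm --create /dev/md3 -l 1 -n 2 /dev/md2 missing    # degraded mirror on top
  # later: build a second RAID0 from the remaining disks and --add it to /dev/md3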

Do you know of a way in which I could migrate the data to RAID10?
Unfortunately, I do not have more 750Gb disks available nor
a budget, and the 1.5Tb are 96% full.

Cheers,

-- 
martin;  (greetings from the heart of the sun.)
 
if a man treats life artistically, his brain is his heart.
-- oscar wilde




Re: converting RAID5 to RAID10

2006-10-05 Thread martin f krafft
also sprach Neil Brown [EMAIL PROTECTED] [2006.10.05.1214 +0200]:
 mdadm --create -l 10 -n 4 -pn2 /dev/md1 /dev/sdc missing /dev/sdd missing

Peter Samuelson of the Debian project already suggested this and it
seems to work.

Thanks a lot, Neil, for the quick and informative response.

-- 
martin;  (greetings from the heart of the sun.)
 
the ships hung in the sky in much the same way that bricks don't.
 -- hitchhiker's guide to the galaxy




RAID10: near, far, offset -- which one?

2006-10-05 Thread martin f krafft
I am trying to compare the three RAID10 layouts with each other.
Assuming a simple 4 drive setup with 2 copies of each block,
I understand that a near layout makes RAID10 resemble RAID1+0
(although it's not 1+0).

I also understand that the far layout trades some write performance
for some read performance, so it's best for read-intensive
operations, like read-only file servers.

I don't really understand the offset layout. Am I right in
asserting that like near it keeps stripes together and thus
requires less seeking, but stores the blocks at different offsets
wrt the disks?

If A,B,C are data blocks, a,b their parts, and 1,2 denote their
copies, the following would be a classic RAID1+0 where hdd1/hdd2 and
hdd3/hdd4 are RAID0 pairs combined into a RAID1:
  hdd1  Aa1 Ba1 Ca1
  hdd2  Ab1 Bb1 Cb1
  hdd3  Aa2 Ba2 Ca2
  hdd4  Ab2 Bb2 Cb2

How would this look with the three different layouts? I think near
is pretty much the same as above, but I can't figure out far and
offset from the md(4) manpage.

Also, what are their respective advantages and disadvantages?

Thanks,

-- 
martin;  (greetings from the heart of the sun.)
 
a woman begins by resisting a man's advances and ends by blocking
 his retreat.
-- oscar wilde

