Re: block level vs. file level

2006-02-13 Thread PFC



This also raises another point, which is relevant for both cases - the same
exact model of hard disk can have a different number of cylinders, so if a
RAID partition is created on a larger drive it cannot be mirrored to a
smaller drive.


	I have a RAID5 with 5 250 GB drives, but some are 250.99 GB (Maxtors), some
are 250.06 GB (Seagates)... say, if I started with 5 Seagates, I could
later replace one of them with a Maxtor, but not the other way around, as
the Seagates are just a tiny bit smaller.


cfdisk says :

sdb1  250994,42
sdc1  250056,74
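
	To compare the exact usable sizes it is also possible to ask the block
layer directly; just a sketch, assuming a util-linux recent enough to have
the --getsize64 option and the sdb1/sdc1 names above:

blockdev --getsize64 /dev/sdb1 /dev/sdc1

	This prints the exact size of each device in bytes, which is what decides
whether one drive can stand in for another.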

	I suggest, when using software raid, to create partitions that are, say,  
100 megabytes or even a gigabyte smaller than the size of the drive. You  
lose a bit of space, but if you ever need to change one, you won't feel  
stupid with a brand new drive that you can't use because it's a few  
sectors too short.
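
	For example, something along these lines (an untested sketch - check your
sfdisk's unit handling before pointing it at a real disk):

# one partition of type fd (Linux raid autodetect), sized in MB,
# stopping roughly 1 GB short of the end of a ~250 GB drive
echo ',249000,fd' | sfdisk -uM /dev/sdb

	Any partitioning tool will do, of course; the only point is to type in a
size a little below the maximum instead of accepting the whole disk.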




Re: NVRAM support

2006-02-13 Thread Andy Smith
On Mon, Feb 13, 2006 at 10:22:04AM +0100, Erik Mouw wrote:
 On Fri, Feb 10, 2006 at 05:02:02PM -0800, dean gaudet wrote:
  it doesn't seem to make any sense at all to use a non-volatile external 
  memory for swap... swap has no purpose past a power outage.
 
 No, but it is a very fast swap device. Much faster than a hard drive.

Wouldn't the same amount of money be better spent on RAM then?

-- 
http://strugglers.net/wiki/Xen_hosting -- A Xen VPS hosting hobby
Encrypted mail welcome - keyid 0x604DE5DB




Re: block level vs. file level

2006-02-13 Thread Andy Smith
On Mon, Feb 13, 2006 at 09:48:49AM +0100, PFC wrote:
   I suggest, when using software raid, to create partitions that are, 
   say,  100 megabytes or even a gigabyte smaller than the size of the 
 drive. 
 You  lose a bit of space, but if you ever need to change one, you won't 
 feel  stupid with a brand new drive that you can't use because it's a few  
 sectors too short.

After my previous experience what I tend to do now is set aside
about 2GB on each disk to use as components of a RAID-0 that I use
for scratch space (/tmp or whatever, anything that I don't care
about losing) while the machine is running.

That way, if by bad luck I end up with a slightly smaller replacement
drive, I can just do away with or shrink its RAID-0 component while
keeping the other partitions the same, yet the space is not *totally*
wasted.
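
For example (device names made up, just a sketch):

  # the last ~2GB partition on each disk becomes striped scratch space
  mdadm --create /dev/md3 --level=0 --raid-devices=2 /dev/sda4 /dev/sdb4
  mkfs -t ext2 /dev/md3
  mount /dev/md3 /tmp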

-- 
http://strugglers.net/wiki/Xen_hosting -- A Xen VPS hosting hobby
Encrypted mail welcome - keyid 0x604DE5DB




RE: NVRAM support

2006-02-13 Thread Guy
Not the same amount!  You could match the size of the NVRAM disk with RAM at a
fraction of the cost.  With the money saved, buy a computer for the kids.
:)

} -Original Message-
} From: [EMAIL PROTECTED] [mailto:linux-raid-
} [EMAIL PROTECTED] On Behalf Of Andy Smith
} Sent: Monday, February 13, 2006 6:55 AM
} To: linux-raid@vger.kernel.org
} Subject: Re: NVRAM support
} 
} On Mon, Feb 13, 2006 at 10:22:04AM +0100, Erik Mouw wrote:
}  On Fri, Feb 10, 2006 at 05:02:02PM -0800, dean gaudet wrote:
}   it doesn't seem to make any sense at all to use a non-volatile
} external
}   memory for swap... swap has no purpose past a power outage.
} 
}  No, but it is a very fast swap device. Much faster than a hard drive.
} 
} Wouldn't the same amount of money be better spent on RAM then?
} 
} --
} http://strugglers.net/wiki/Xen_hosting -- A Xen VPS hosting hobby
} Encrypted mail welcome - keyid 0x604DE5DB



RAID 5 inaccessible - continued

2006-02-13 Thread Krekna Mektek
All right, this weekend I was able to use dd to create an image file
out of the disk.
I did the following:

dd conv=noerror if=/dev/hdd1 of=/mnt/hdb1/Faulty-RAIDDisk.img
losetup /dev/loop0 /mnt/hdb1/Faulty-RAIDDisk.img
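
(Side note: with plain conv=noerror, dd silently drops any block it fails to
read, so everything after the first bad sector ends up shifted in the image;
conv=noerror,sync pads failed reads with zeroes instead and keeps the offsets
intact, e.g.

dd if=/dev/hdd1 of=/mnt/hdb1/Faulty-RAIDDisk.img bs=4k conv=noerror,sync

which can matter when mdadm later looks for the superblock at a fixed offset.)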

I edited the mdadm.conf, replacing /dev/hdd1 with /dev/loop0.

But it did not work out (yet).

mdadm -E /dev/loop0
mdadm: No super block found on /dev/loop0 (Expected magic a92b4efc,
got )


How can I continue best?

- mdadm -A --force /dev/md0

or

- can I restore the superblock from the hdd1 disk (which is still alive)

or

- can I configure mdadm.conf other than this:
 (/dev/hdc1 is spare, probably out of date)

DEVICE /dev/hdb1 /dev/hdc1 /dev/loop0
ARRAY /dev/md0 devices=/dev/hdb1,/dev/hdc1,/dev/loop0

or
- some other solution?
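
(A sketch of the forced-assembly route, using the device names above and
leaving the out-of-date spare /dev/hdc1 out so that no rebuild is started:

mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/hdb1 /dev/loop0
mdadm --detail /dev/md0

--force tells mdadm to accept a stale event count on one of the members; it
is only worth trying once the image itself is known to be good.)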

Krekna

2006/2/8, Krekna Mektek [EMAIL PROTECTED]:
 Hi,

 I found out that my storage drive was gone and I went to my server to
 check out what went wrong.
 I've got 3 400GB disks which form the array.

 I found out I had one spare and one faulty drive, and the RAID 5 array
 was not able to recover.
 After a reboot because of some stuff with Xen my main rootdisk (hda)
 was also failing, and the whole machine was not able to boot anymore.
 And there I was...
 After I tried to commit suicide and did not succeed, I went back to my
 server to try something out.
 I booted with Knoppix 4.02 and edited the mdadm.conf as follows:

 DEVICE /dev/hd[bcd]1
 ARRAY /dev/md0 devices=/dev/hdb1,/dev/hdc1,/dev/hdd1


 I executed mdrun and the following messages appeared:

 Forcing event count in /dev/hdd1(2) from 81190986 upto 88231796
 clearing FAULTY flag for device 2 in /dev/md0 for /dev/hdd1
 /dev/md0 has been started with 2 drives (out of 3) and 1 spare.

 So I thought I was lucky enough to get my data back, maybe with a bit
 lost because of the missing event counts. Am I right?

 But when I tried to mount it the next day, that did not work either. I
 ended up with one faulty, one spare and one active. After stopping and
 starting the array a few times, the array was rebuilding again. I found
 out that the disk it needs to rebuild the array (hdd1, that is) is
 getting errors and falls back to faulty again.



  Number   Major   Minor   RaidDevice   State
     0        3      65        0        active sync
     1        0       0        -        removed
     2       22      65        2        active sync

     3       22       1        1        spare rebuilding


 and then this:

 Rebuild Status : 1% complete

  Number   Major   Minor   RaidDevice   State
     0        3      65        0        active sync
     1        0       0        -        removed
     2        0       0        -        removed

     3       22       1        1        spare rebuilding
     4       22      65        2        faulty

 And my dmesg is full of these errors coming from the faulty hdd:
 end_request: I/O error, dev hdd, sector 13614775
 hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
 hdd: dma_intr: error=0x40 { UncorrectableError }, LBAsect=13615063,
 high=0, low=13615063, sector=13614783
 ide: failed opcode was: unknown
 end_request: I/O error, dev hdd, sector 13614783


 I guess this will never succeed...

 Is there a way to get this data back from the individual disks perhaps?


 FYI:


 [EMAIL PROTECTED] cat /proc/mdstat
 Personalities : [raid5]
 md0 : active raid5 hdb1[0] hdc1[3] hdd1[4](F)
   781417472 blocks level 5, 64k chunk, algorithm 2 [3/1] [U__]
   [>....................]  recovery =  1.7% (6807460/390708736)
 finish=3626.9min speed=1764K/sec
 unused devices: <none>

 Krekna



Re: Question: array locking, possible?

2006-02-13 Thread Chris Osicki


Rick

On HP-UX disk mirroring is done in LVM. I'm using md driver for
mirroring and LVM on top of it.  Controlling access to my disks in LVM
is just too late. I would have to assemble the array before I can activate
VGs. If the array in question is being used on the other host nobody
can guarantee that bad things won't happen. And what I would like to
prevent is two hosts accessing (writing to) an array.
Thanks anyway for the hint.

Regards,
Chris


On Thu, 9 Feb 2006 10:28:58 -0800
Stern, Rick (Serviceguard Linux) [EMAIL PROTECTED] wrote:

 There is more interest, just not vocal.
 
 May want to look at LVM2 and its ability to use tagging to control enablement 
 of VGs. This way it is not HW dependent.
 
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Osicki
 Sent: Thursday, February 09, 2006 2:26 AM
 To: linux-raid@vger.kernel.org
 Subject: Re: Question: array locking, possible?
 
 
 
 It looks like we are the only two md users interested in such a
 feature.
 Not enough to get Neil's attention ;-)
 
 Regards,
 Chris
 
 On Wed, 8 Feb 2006 21:45:33 +0100
Jure Pečar [EMAIL PROTECTED] wrote:
 
  On Wed, 8 Feb 2006 11:55:49 +0100
  Chris Osicki [EMAIL PROTECTED] wrote:
  
   
   
   I was thinking about it, I have no idea how to do it on Linux if ever 
   possible.
   I connect over fibre channel SAN, using QLogic QLA2312 HBAS, if it 
   matters.
   
   Anyone any hints?
  
  I too am running a JBOD with md RAID between two machines. So far md has never
  caused any kind of problem, although I did have situations where both
  machines were syncing mirrors at once.

  If there's a little tool to reserve a disk via SCSI, I'd like to know about
  it too. Even a piece of code would be enough.
  
  
  -- 
  
  Jure Pečar
  http://jure.pecar.org/
  
 


RE: Question: array locking, possible?

2006-02-13 Thread Stern, Rick (Serviceguard Linux)
I understand about HP-UX mirroring/LVM.

I was a little too obtuse.

LVM2 has a feature (not well advertised) that allows a VG to be tagged so it
will not be activated by system b if it is already tagged as being in use by
system a.  I was suggesting that a similar feature could be added to MD.
This way an MD array could be marked as owned and, if so, mdadm would not
activate it from another system.  This way all of the MD control is still
within mdadm.
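
Roughly, the LVM2 side looks like this (from memory, so treat it as a sketch
and check vgchange(8)/lvm.conf(5); "nodeA" and "vg00" are made-up names):

  # tag the VG with the name of the node that owns it
  vgchange --addtag nodeA vg00
  # and in lvm.conf on each node, only allow activation of VGs
  # carrying that node's own tag:
  activation { volume_list = [ "@nodeA" ] }

The equivalent for MD would just be a flag in the superblock plus a check in
mdadm at assembly time.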

If Neil is interested, I'll try to dig up more info.

Regards,
Rick  

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Osicki
Sent: Monday, February 13, 2006 9:13 AM
To: linux-raid@vger.kernel.org
Subject: Re: Question: array locking, possible?



Rick

On HP-UX disk mirroring is done in LVM. I'm using md driver for
mirroring and LVM on top of it.  Controlling access to my disks in LVM
is just too late. I would have to assemble the array before I can activate
VGs. If the array in question is being used on the other host nobody
can guarantee that bad thing wont happen. And what I would like to
prevent is: two hosts accessing (writing) an array.
Thanks anyway for the hint.

Regards,
Chris


On Thu, 9 Feb 2006 10:28:58 -0800
Stern, Rick (Serviceguard Linux) [EMAIL PROTECTED] wrote:

 There is more interest, just not vocal.
 
 May want to look at LVM2 and its ability to use tagging to control enablement 
 of VGs. This way it is not HW dependent.
 
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Osicki
 Sent: Thursday, February 09, 2006 2:26 AM
 To: linux-raid@vger.kernel.org
 Subject: Re: Question: array locking, possible?
 
 
 
 It looks like we are the only two md users interested in such a
 feature.
 Not enough to get Neil's attention ;-)
 
 Regards,
 Chris
 
 On Wed, 8 Feb 2006 21:45:33 +0100
  Jure Pečar [EMAIL PROTECTED] wrote:
 
  On Wed, 8 Feb 2006 11:55:49 +0100
  Chris Osicki [EMAIL PROTECTED] wrote:
  
   
   
   I was thinking about it, I have no idea how to do it on Linux if ever 
   possible.
   I connect over fibre channel SAN, using QLogic QLA2312 HBAS, if it 
   matters.
   
   Anyone any hints?
  
  I too am running a JBOD with md RAID between two machines. So far md has never
  caused any kind of problem, although I did have situations where both
  machines were syncing mirrors at once.

  If there's a little tool to reserve a disk via SCSI, I'd like to know about
  it too. Even a piece of code would be enough.
  
  
  -- 
  
   Jure Pečar
  http://jure.pecar.org/
  
 


Re: Question: array locking, possible?

2006-02-13 Thread Chris Osicki

Luca

On Thu, 9 Feb 2006 21:48:48 +0100
Luca Berra [EMAIL PROTECTED] wrote:

 On Thu, Feb 09, 2006 at 10:28:58AM -0800, Stern, Rick (Serviceguard Linux) 
 wrote:
 There is more interest, just not vocal.
 
 May want to look at LVM2 and its ability to use tagging to control 
 enablement of VGs. This way it is not HW dependent.
 
 I believe there is space in the md v1 superblock for a cluster/exclusive
 flag; if not, the name field could be used

Great, if there is space for it there is hope.
Unfortunately I don't think my programming skills are up to
such a task as making proof-of-concept patches.

 what is missing is an interface between mdadm and cmcld so mdadm can ask
 cmcld permission to activate an array with the cluster/exclusive flag
 set.

For the time being we could live without it. I'm convinced HP would
make use of it once it's there.

And I wouldn't say mdadm should get permission from cmcld (for those
who don't know Service Guard cluster software from HP: cmcld is
the Cluster daemon). IMHO cmcld should clear the flag on the array
when initiating a fail-over in case the host which used it crashed.

Once again, what I would like it for is preventing two hosts writing to
the array at the same time because I accidentally activated it.
Without cmcld's awareness of the cluster/exclusive flag I would
always run mdadm with the '--force' option to enable the array during
package startup, because if I trust the cluster software I know the
fail-over is happening because the other node crashed or it is a
manual (clean) fail-over. 

We can discuss details of SG integration after Neil implements this
flag. I can hope you have already found space for it ... ;-)

Regards,
Chris


 
 L.
 
 -- 
 Luca Berra -- [EMAIL PROTECTED]
 Communication Media & Services S.r.l.
  /"\
  \ / ASCII RIBBON CAMPAIGN
   X  AGAINST HTML MAIL
  / \
 


Re: Question: array locking, possible?

2006-02-13 Thread Chris Osicki

Rick

You must have missed my first posting, or maybe I was not clear enough.
We _are_ talking about the same thing.

Now that we are already three or four thinking of it as a useful feature,
the pressure on Neil is dramatically increasing ... ;-)

Regards,
Chris

On Mon, 13 Feb 2006 09:21:06 -0800
Stern, Rick (Serviceguard Linux) [EMAIL PROTECTED] wrote:

 I understand about HP-UX mirroring/LVM.
 
 I was a little too obtuse.
 
 LVM2 has a feature (not well advertised) that allows a VG to be tagged so it
 will not be activated by system b if it is already tagged as being in use
 by system a.  I was suggesting that a similar feature could be added to MD.
 This way an MD array could be marked as owned and, if so, mdadm would not
 activate it from another system.  This way all of the MD control is still
 within mdadm.
 
 If Neil is interested, I'll try to dig up more info.
 
 Regards,
 Rick  
 
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Osicki
 Sent: Monday, February 13, 2006 9:13 AM
 To: linux-raid@vger.kernel.org
 Subject: Re: Question: array locking, possible?
 
 
 
 Rick
 
 On HP-UX disk mirroring is done in LVM. I'm using md driver for
 mirroring and LVM on top of it.  Controlling access to my disks in LVM
 is just too late. I would have to assemble the array before I can activate
 VGs. If the array in question is being used on the other host nobody
 can guarantee that bad thing wont happen. And what I would like to
 prevent is: two hosts accessing (writing) an array.
 Thanks anyway for the hint.
 
 Regards,
 Chris
 
 
 On Thu, 9 Feb 2006 10:28:58 -0800
 Stern, Rick (Serviceguard Linux) [EMAIL PROTECTED] wrote:
 
  There is more interest, just not vocal.
  
  May want to look at LVM2 and its ability to use tagging to control 
  enablement of VGs. This way it is not HW dependent.
  
  -Original Message-
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Osicki
  Sent: Thursday, February 09, 2006 2:26 AM
  To: linux-raid@vger.kernel.org
  Subject: Re: Question: array locking, possible?
  
  
  
  It looks like we are the only two md users interested in such a
  feature.
  Not enough to get Neil's attention ;-)
  
  Regards,
  Chris
  
  On Wed, 8 Feb 2006 21:45:33 +0100
  Jure Pečar [EMAIL PROTECTED] wrote:
  
   On Wed, 8 Feb 2006 11:55:49 +0100
   Chris Osicki [EMAIL PROTECTED] wrote:
   


I was thinking about it, I have no idea how to do it on Linux if ever 
possible.
I connect over fibre channel SAN, using QLogic QLA2312 HBAS, if it 
matters.

Anyone any hints?
   
   I too am running a JBOD with md RAID between two machines. So far md has never
   caused any kind of problem, although I did have situations where both
   machines were syncing mirrors at once.

   If there's a little tool to reserve a disk via SCSI, I'd like to know
   about it too. Even a piece of code would be enough.
   
   
   -- 
   
   Jure Pečar
   http://jure.pecar.org/
   
  
 


Re: Question: array locking, possible?

2006-02-13 Thread Luca Berra

On Mon, Feb 13, 2006 at 06:52:47PM +0100, Chris Osicki wrote:


Luca

On Thu, 9 Feb 2006 21:48:48 +0100
Luca Berra [EMAIL PROTECTED] wrote:


On Thu, Feb 09, 2006 at 10:28:58AM -0800, Stern, Rick (Serviceguard Linux) 
wrote:
There is more interest, just not vocal.

May want to look at LVM2 and its ability to use tagging to control enablement 
of VGs. This way it is not HW dependent.

I believe there is space in the md v1 superblock for a cluster/exclusive
flag; if not, the name field could be used


Great, if there is space for it there is hope.
Unfortunately I don't think my programming skills are up to
such a task as making proof-of-concept patches.


I was thinking of adding a bit in the feature_map flags to enable this
kind of behaviour; the downside is that kernel-space code has to
be updated to account for this flag, as it does for anything in the
superblock except for the name.

Neil, what would you think of reserving some more space in the superblock for
other data which can be used from user-space?

I believe playing with the name is a kludge.


what is missing is an interface between mdadm and cmcld so mdadm can ask
cmcld permission to activate an array with the cluster/exclusive flag
set.


For the time being we could live without it. I'm convinced HP would
make use of it once it's there.


I was thinking of something like a socket-based interface between mdadm and
a generic cluster daemon, not necessarily cmcld.


And I wouldn't say mdadm should get permission from cmcld (for those
who don't know Service Guard cluster software from HP: cmcld is
the Cluster daemon). IMHO cmcld should clear the flag on the array
when initiating a fail-over in case the host which used it crashed.

No, I don't like the flag being cleared; there is too much room for a
race. The flag should be permanent (unless it is forcibly removed with
mdadm --grow).


Once again, what I would like it for is preventing two hosts writing to
the array at the same time because I accidentally activated it.
Without cmcld's awareness of the cluster/exclusive flag I would
always run mdadm with the '--force' option to enable the array during
package startup, because if I trust the cluster software I know the
fail-over is happening because the other node crashed or it is a
manual (clean) fail-over. 


If you only want this, it could be implemented entirely in mdadm, just by
adding an 'exclusive' flag to the ARRAY line in mdadm.conf.
This is not foolproof, as it will only prevent 'mdadm -As' from assembling
a device; providing the identification information on the command line,
or running something like 'mdadm -Asc partitions', would fool it.
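
Something like this, say (hypothetical syntax, matching the proof-of-concept
patch below - current mdadm does not understand it):

ARRAY /dev/md0 UUID=<uuid-of-the-array> exclusive

and then a plain "mdadm -As" would refuse to start /dev/md0 unless it were
also given a new --exclusive switch.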


--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
 /"\
 \ / ASCII RIBBON CAMPAIGN
  X  AGAINST HTML MAIL
 / \
diff -urN mdadm-2.3.1/Assemble.c mdadm-2.3.1.exclusive/Assemble.c
--- mdadm-2.3.1/Assemble.c  2006-01-25 08:01:10.0 +0100
+++ mdadm-2.3.1.exclusive/Assemble.c2006-02-13 22:48:04.0 +0100
@@ -34,7 +34,7 @@
 mddev_dev_t devlist,
 int readonly, int runstop,
 char *update,
-int verbose, int force)
+int verbose, int force, int exclusive)
 {
/*
 * The task of Assemble is to find a collection of
@@ -255,6 +255,15 @@
continue;
}
 
+		if (ident->exclusive != UnSet &&
+		    !exclusive ) {
+			if ((inargv && verbose >= 0) || verbose > 0)
+				fprintf(stderr, Name ": %s can be activated in exclusive mode only.\n",
+					devname);
+			continue;
+		}
+
+
/* If we are this far, then we are commited to this device.
 * If the super_block doesn't exist, or doesn't match others,
 * then we cannot continue
diff -urN mdadm-2.3.1/ReadMe.c mdadm-2.3.1.exclusive/ReadMe.c
--- mdadm-2.3.1/ReadMe.c2006-02-06 05:09:35.0 +0100
+++ mdadm-2.3.1.exclusive/ReadMe.c  2006-02-13 22:27:26.0 +0100
@@ -147,6 +147,7 @@
 {"scan",  0, 0, 's'},
 {"force", 0, 0, 'f'},
 {"update",   1, 0, 'U'},
+{"exclusive", 0, 0, 'x'},
 
 /* Management */
 {"add",   0, 0, 'a'},
diff -urN mdadm-2.3.1/config.c mdadm-2.3.1.exclusive/config.c
--- mdadm-2.3.1/config.c2005-12-09 06:00:47.0 +0100
+++ mdadm-2.3.1.exclusive/config.c  2006-02-13 22:23:02.0 +0100
@@ -286,6 +286,7 @@
mis.st = NULL;
mis.bitmap_fd = -1;
mis.name[0] = 0;
+   mis.exclusive = 0;
 
for (w=dl_next(line); w!=line; w=dl_next(w)) {
if (w[0] == '/') {
@@ -386,6 +387,8 @@
 			fprintf(stderr, Name ": auto type of \"%s\" ignored for %s\n",
 				w+5, mis.devname?mis.devname:"unlabeled-array");
}
+   } else if (strncasecmp(w, 

Re: Question: array locking, possible?

2006-02-13 Thread Luca Berra

On Mon, Feb 13, 2006 at 10:53:43PM +0100, Luca Berra wrote:


diff -urN mdadm-2.3.1/Assemble.c mdadm-2.3.1.exclusive/Assemble.c


Please note that the patch was written as a proof-of-concept while I was
composing the email; it should not be considered working (or even
compiling) code.

L.

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
 /"\
 \ / ASCII RIBBON CAMPAIGN
  X  AGAINST HTML MAIL
 / \


Re: [RFC][PATCH 000 of 3] MD Acceleration and the ADMA interface: Introduction

2006-02-13 Thread Neil Brown
On Monday February 6, [EMAIL PROTECTED] wrote:
 On 2/5/06, Neil Brown [EMAIL PROTECTED] wrote:
  I've looked through the patches - not exhaustively, but hopefully
  enough to get a general idea of what is happening.
  There are some things I'm not clear on and some things that I could
  suggest alternates too...
 
 I have a few questions to check that I understand your suggestions.

(sorry for the delay).

 
   - Each ADMA client (e.g. a raid5 array) gets a dedicated adma thread
 to handle all its requests.  And it handles them all in series.  I
 wonder if this is really optimal.  If there are multiple adma
 engines, then a single client could only make use of one of them
 reliably.
 It would seem to make more sense to have just one thread - or maybe
 one per processor or one per adma engine - and have any ordering
 between requests made explicit in the interface.
 
 Actually as each processor could be seen as an ADMA engine, maybe
 you want one thread per processor AND one per engine.  If there are
 no engines, the per-processor threads run with high priority, else
 with low.
 
 ...so the engine thread would handle explicit client requested
 ordering constraints and then hand the operations off to per processor
 worker threads in the pio case or queue directly to hardware in the
 presence of such an engine.  In md_thread you talk about priority
 inversion deadlocks, do those same concerns apply here?

That comment in md.c about priority inversion deadlocks predates my
involvement - making it last-millennium code...
I don't think it is relevant any more, and possibly never was.

I don't see any room for priority inversion here.

I probably wouldn't even have an 'engine thread'.  If I were to write
'md' today, it probably wouldn't have a dedicated thread but would use
'schedule_work' to arrange for code to be run in process-context.
The ADMA engine could do the same.
Note: I'm not saying this is the right way to go.  But I do think it
is worth exploring.

I'm not sure about threads for the 'pio' case.  It would probably be
easiest that way, but I would explore the 'schedule_work' family of
services first.

But yes, the ADMA engine would handle explicit client requested
ordering and arrange for work to be done somehow.

 
   - I have thought that the way md/raid5 currently does the
 'copy-to-buffer' and 'xor' in two separate operations may not be
 the best use of the memory bus.  If you could have a 3-address
 operation that read from A, stored into B, and xorred into C, then
 A would have to be read half as often.  Would such an interface
 make sense with ADMA?  I don't have sufficient knowledge of
  assembler to do it myself for the current 'xor' code.
 
 At the very least I can add a copy+xor command to ADMA, that way
 developers implementing engines can optimize for this case, if the
 hardware supports it, and the hand coded assembly guys can do their
 thing.
 
   - Your handling of highmem doesn't seem right.  You shouldn't kmap it
 until you have decided that you have to do the operation 'by hand'
 (i.e. in the cpu, not in the DMA engine).  If the dma engine can be
 used at all, kmap isn't needed at all.
 
 I made the assumption that if CONFIG_HIGHMEM is not set then the kmap
 call resolves to a simple page_address() call.  I think its ok, but
 it does look fishy so I will revise this code.  I also was looking to
 handle the case where the underlying hardware DMA engine does not
 support high memory addresses.

I think the only way to handle the ADMA engine not supporting high
memory is to do the operation 'polled' - i.e. in the CPU.
The alternative is to copy it to somewhere that the DMA engine can
reach, and if you are going to do that, you have done most of the work
already.
Possibly you could still gain by using the engine for RAID6
calculations, but not for copy, compare, or xor operations.

And if you are using the DMA engine, then you don't want the
page_address. You want to use pci_map_page (or similar?) to get a
dma_handle. 

 For example, once it has been decided to initiate a write (there is
 enough data to correctly update the parity block), you need to
 perform a sequence of copies and xor operations, and then submit
 write requests.
 This is currently done by the copy/xor happening inline under the
 sh->lock spinlock, and then R5_WantWrite is set.  Then, outside
 the spinlock, if WantWrite is set, generic_make_request is called as
 appropriate.
 
 I would change this so that a sequence of descriptors was assembled
 which describes the copies and xors.  Appropriate call-backs would
 be set so that generic_make_request is called at the right time
 (after the copy, or after the last xor for the parity block).
 Then, outside the sh->lock spinlock, this sequence is passed to the
 ADMA manager.  If there is no ADMA engine present, everything is
 performed 

Lilo append= , A suggestion .

2006-02-13 Thread Mr. James W. Laferriere

	Hello Neil & All ,
I'll bet I am going to get harassed over this , but ...

The present form (iirc) of the lilo append statement is

append=md=d0,/dev/sda,/dev/sdb

I am wondering how difficult the below would be to code ?
	This allows a (relatively) short string to be append'd
instead of the sometimes large listing of devices .

append=md=d0,UUID=e9e0f605:9ed694c2:3e2002c9:0415c080

	Ok ,  I got my asbestos britches on .  Have at it ;-) .
Tia ,  JimL
--
+--+
| James   W.   Laferriere | SystemTechniques | Give me VMS |
| NetworkEngineer | 3542 Broken Yoke Dr. |  Give me Linux  |
| [EMAIL PROTECTED] | Billings , MT. 59105 |   only  on  AXP |
|  http://www.asteriskhelpdesk.com/cgi-bin/astlance/r.cgi?babydr   |
+--+


Re: Lilo append= , A suggestion .

2006-02-13 Thread Neil Brown
On Monday February 13, [EMAIL PROTECTED] wrote:
   Hello Neil & All ,
   I'll bet I am going to get harassed over this , but ...
 
   The present form (iirc) of the lilo append statement is
 
   append=md=d0,/dev/sda,/dev/sdb
 
   I am wondering how difficult the below would be to code ?
   This allows a (relatively) short strings to be append'd
   instead of the sometimes large listing of devices .
 
   append=md=d0,UUID=e9e0f605:9ed694c2:3e2002c9:0415c080
 
   Ok ,  I got my asbestos brithes on .  Have at it ;-) .

This is just the job for an initramfs.  They are *really* easy to
make, and very flexible.  mdadm-2.2 and later come with a little
script which (tested on Debian) makes a simple initramfs which
will recognise a kernel parameter (as passed by lilo's 'append')
like
rootuuid=97e58306:2c85fd85:2346b91e:aaca5fee

and will assemble the appropriate array as /dev/md_d0 and will then
mount a filesystem off there as root.  If it doesn't do exactly what
you want, it is fairly easy to modify.
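
The guts of such an initramfs boil down to something like this (a simplified
sketch, not the exact script shipped with mdadm):

  # pull rootuuid= off the kernel command line and assemble by UUID
  uuid=`sed -n 's/.*rootuuid=\([^ ]*\).*/\1/p' /proc/cmdline`
  echo 'DEVICE partitions' > /tmp/mdadm.conf
  mdadm --assemble /dev/md_d0 --config=/tmp/mdadm.conf --uuid="$uuid"
  # then mount it wherever the real root filesystem is expected
  mount /dev/md_d0 /root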

NeilBrown


Re: Lilo append= , A suggestion .

2006-02-13 Thread Luca Berra

On Mon, Feb 13, 2006 at 09:12:42PM -0700, Mr. James W. Laferriere wrote:

Hello Neil & All ,
I'll bet I am going to get harassed over this , but ...

The present form (iirc) of the lilo append statement is

append=md=d0,/dev/sda,/dev/sdb

I am wondering how difficult the below would be to code ?
This allows a (relatively) short string to be append'd
instead of the sometimes large listing of devices .

append=md=d0,UUID=e9e0f605:9ed694c2:3e2002c9:0415c080

Ok ,  I got my asbestos britches on .  Have at it ;-) .
Tia ,  JimL

what about all the past threads about in-kernel autodetection?

L.

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
 /"\
 \ / ASCII RIBBON CAMPAIGN
  X  AGAINST HTML MAIL
 / \