Re: Behavior of mdadm depending on user

2007-07-03 Thread Ian Dall
On Mon, 2007-07-02 at 21:10 -0500, Michael Schwarz wrote:
 This is just a couple of quick questions.
 
 I'm charged with developing a prototype application that will assemble a
 hot-swapped drive array, mount it, transfer files to it, unmount it,
 and stop the array. And it is an application delivered by a local webserver
 (don't ask).
 
 I don't want to do any of the incredibly stupid acts of making mdadm and
 mount/umount setuid root, nor do I want to run the webserver as root.
 
 Instead, I took the slightly less stupid approach of invoking mdadm and
 mount/umount with a hardcoded C application that is setuid root. (We can
 debate the stupidity of this -- I know it isn't best, but it is fast and less
 stupid than the alternatives presented above).

This isn't really an answer to your question, but isn't this an ideal
application for sudo?  Make a shell script with the mdadm command(s) you
want, and set it up so that apache (or whatever user your web server runs
as) is able to run that shell script as root without authentication.
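
For example, a rough sketch along those lines (the script path, device
names, and the apache user are assumptions about your setup, not a tested
recipe):

    #!/bin/sh
    # /usr/local/sbin/hotswap-backup.sh -- root-owned, mode 0700, and not
    # writable by the web server user.  Assemble, mount, copy, tear down.
    mdadm --assemble /dev/md1 /dev/sdc1 /dev/sdd1 || exit 1
    mount /dev/md1 /mnt/backup || { mdadm --stop /dev/md1; exit 1; }
    cp -a /var/spool/outgoing/. /mnt/backup/
    umount /mnt/backup
    mdadm --stop /dev/md1

    # /etc/sudoers entry (add with visudo), restricting apache to this one
    # command:
    apache ALL=(root) NOPASSWD: /usr/local/sbin/hotswap-backup.sh

The web application then simply runs
    sudo /usr/local/sbin/hotswap-backup.sh
and sudo handles the privilege transition (and logs the invocation).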

Ian
-- 
Ian Dall [EMAIL PROTECTED]


Proposed enhancement to mdadm: Allow --write-behind= to be done in grow mode.

2007-07-03 Thread Ian Dall
There doesn't seem to be any designated place to send bug reports and
feature requests to mdadm, so I hope I am doing the right thing by
sending it here.

I have a small patch to mdadm which allows the write-behind amount to be
set at array grow time (instead of only at build or create time, as is
currently the case).  I have tested this fairly extensively on some arrays
built out of loopback devices, and once on a real live array.  I haven't
lost any data and it seems to work OK, though it is possible I am missing
something.

--- mdadm-2.6.1/mdadm.c.writebehind 2006-12-21 16:12:50.0 +1030
+++ mdadm-2.6.1/mdadm.c 2007-06-30 13:16:22.0 +0930
@@ -827,6 +827,7 @@
bitmap_chunk = bitmap_chunk ? bitmap_chunk * 1024 : 512;
continue;
 
+   case O(GROW, WriteBehind):
case O(BUILD, WriteBehind):
case O(CREATE, WriteBehind): /* write-behind mode */
write_behind = DEFAULT_MAX_WRITE_BEHIND;
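
With the patch applied, an invocation along these lines should be accepted
in grow mode (the device name and values are only illustrative, and
write-behind still needs a write-intent bitmap plus a write-mostly member):

    mdadm --grow /dev/md0 --bitmap=internal --write-behind=256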


-- 
Ian Dall [EMAIL PROTECTED]



Re: Proposed enhancement to mdadm: Allow --write-behind= to be done in grow mode.

2007-07-03 Thread David Greaves

Ian Dall wrote:

There doesn't seem to be any designated place to send bug reports and
feature requests to mdadm, so I hope I am doing the right thing by
sending it here.

I have a small patch to mdadm which allows the write-behind amount to be
set at array grow time (instead of only at build or create time, as is
currently the case).  I have tested this fairly extensively on some arrays
built out of loopback devices, and once on a real live array.  I haven't
lost any data and it seems to work OK, though it is possible I am missing
something.


Sounds like a useful feature...

Did you test the bitmap cases you mentioned?

David


Re: Linux Software RAID is really RAID?

2007-07-03 Thread Mark Lord

Johny Mail list wrote:

2007/7/3, Tejun Heo [EMAIL PROTECTED]:

Brad Campbell wrote:
 Johny Mail list wrote:
 Hello list,
 I have a little question about software RAID on Linux.
 I have installed software RAID on all my Dell SC1425 servers, believing
 that md RAID was a robust driver.
 Recently I ran some tests on a server to see whether it copes with a hard
 drive losing power, so I powered up the server and, once it had booted to
 the prompt, I disconnected the power cable of one SATA hard drive.
 Normally md should drop the failed drive from the logical drive it built,
 and the server should keep working as if nothing had happened.  Instead
 the server stopped responding and I got these messages:
 ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
 ata4.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

 ata4: port is slow to respond, please be patient (Status 0xd0)
 ata4: port failed to respond (30sec, Status 0xd0)
 ata4: soft resetting port

 After that my system is frozen.

How hard is it frozen?  Can you blink the Numlock LED?


I believe he said it was ICH5 (different post/thread).

My observation on ICH5 is that if one unplugs a drive,
then the chipset/cpu locks up hard when toggling SRST
in the EH code.

Specifically, it locks up at the instruction
which restores SRST back to the non-asserted state,
which likely corresponds to the chipset finally actually
sending a FIS to the drive.

A hard(ware) lockup, not software.
That's why Intel says ICH5 doesn't do hotplug.

Cheers


[RFC PATCH 0/2] raid5: 65% sequential-write performance improvement, stripe-queue take2

2007-07-03 Thread Dan Williams
The first take of the stripe-queue implementation[1] had a performance
limiting bug in __wait_for_inactive_queue.  Fixing that issue
drastically changed the performance characteristics.  The following data
from tiobench shows the relative performance difference of the
stripe-queue patchset.

Unit information

File size = megabytes
Blk Size  = bytes
Num Thr   = number of threads
Avg Rate  = relative throughput
CPU%  = relative percentage of CPU used during the test
CPU Eff   = Rate divided by CPU% - relative throughput per cpu load

Configuration
=============
Platform: 1200MHz iop348 with 4-disk sata_vsc array
mdadm --create /dev/md0 /dev/sd[abcd] -n 4 -l 5
mkfs.ext2 /dev/md0
mount /dev/md0 /mnt/raid
tiobench --size 2048 --numruns 5 --block 4096 --block 131072 --dir /mnt/raid

Sequential Reads
                 File   Blk     Num  Avg    Maximum  CPU
Identifier       Size   Size    Thr  Rate   (CPU%)   Eff
---------------  -----  ------  ---  -----  -------  -----
2.6.22-rc7-iop1  2048   4096    1    0%     4%       -3%
2.6.22-rc7-iop1  2048   4096    2    -38%   -33%     -8%
2.6.22-rc7-iop1  2048   4096    4    -35%   -30%     -8%
2.6.22-rc7-iop1  2048   4096    8    -14%   -11%     -3%
2.6.22-rc7-iop1  2048   131072  1    2%     1%       2%
2.6.22-rc7-iop1  2048   131072  2    -11%   -10%     -2%
2.6.22-rc7-iop1  2048   131072  4    -7%    -6%      -1%
2.6.22-rc7-iop1  2048   131072  8    -9%    -6%      -4%

Random Reads
                 File   Blk     Num  Avg    Maximum  CPU
Identifier       Size   Size    Thr  Rate   (CPU%)   Eff
---------------  -----  ------  ---  -----  -------  -----
2.6.22-rc7-iop1  2048   4096    1    -9%    15%      -21%
2.6.22-rc7-iop1  2048   4096    2    -1%    -30%     42%
2.6.22-rc7-iop1  2048   4096    4    -14%   -22%     10%
2.6.22-rc7-iop1  2048   4096    8    -21%   -28%     9%
2.6.22-rc7-iop1  2048   131072  1    -8%    -4%      -4%
2.6.22-rc7-iop1  2048   131072  2    -13%   -13%     0%
2.6.22-rc7-iop1  2048   131072  4    -15%   -15%     0%
2.6.22-rc7-iop1  2048   131072  8    -13%   -13%     0%

Sequential Writes
                 File   Blk     Num  Avg    Maximum  CPU
Identifier       Size   Size    Thr  Rate   (CPU%)   Eff
---------------  -----  ------  ---  -----  -------  -----
2.6.22-rc7-iop1  2048   4096    1    25%    11%      12%
2.6.22-rc7-iop1  2048   4096    2    41%    42%      -1%
2.6.22-rc7-iop1  2048   4096    4    40%    18%      19%
2.6.22-rc7-iop1  2048   4096    8    15%    -5%      21%
2.6.22-rc7-iop1  2048   131072  1    65%    57%      4%
2.6.22-rc7-iop1  2048   131072  2    46%    36%      8%
2.6.22-rc7-iop1  2048   131072  4    24%    -7%      34%
2.6.22-rc7-iop1  2048   131072  8    28%    -15%     51%

Random Writes
                 File   Blk     Num  Avg    Maximum  CPU
Identifier       Size   Size    Thr  Rate   (CPU%)   Eff
---------------  -----  ------  ---  -----  -------  -----
2.6.22-rc7-iop1  2048   4096    1    2%     -8%      11%
2.6.22-rc7-iop1  2048   4096    2    -1%    -19%     21%
2.6.22-rc7-iop1  2048   4096    4    2%     2%       0%
2.6.22-rc7-iop1  2048   4096    8    -1%    -28%     37%
2.6.22-rc7-iop1  2048   131072  1    2%     -3%      5%
2.6.22-rc7-iop1  2048   131072  2    3%     -4%      7%
2.6.22-rc7-iop1  2048   131072  4    4%     -3%      8%
2.6.22-rc7-iop1  2048   131072  8    5%     -9%      15%

The write performance numbers are better than I expected and would seem
to address the concerns raised in the thread "Odd (slow) RAID
performance" [2].  The read performance drop was not expected.  However,
the numbers suggest some additional changes to be made to the queuing
model.  Where read performance drops there appears to be an equal drop in
CPU utilization, which suggests that pure read requests should be handled
immediately, without a trip to the stripe-queue workqueue.

Although it is not shown in the above data, another positive aspect is that
increasing the cache size past a certain point causes the write performance
gains to erode; in other words, negative returns rather than merely
diminishing returns.  The stripe-queue can only carry out optimizations
while the cache is busy.  When the cache is large, requests can be handled
without waiting, and performance approaches the original 1:1
(queue-to-stripe-head) model.  CPU speed dictates the maximum effective
cache size: once the CPU can no longer keep the stripe-queue saturated,
performance falls off from the peak.  This is a positive change because it
shows that the new queuing model can produce higher performance with fewer
resources, but it does require more care when changing 'stripe_cache_size'.
The above numbers were taken with the default cache size of 256.
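
For anyone who wants to explore that trade-off, the stripe cache can be
inspected and resized at runtime through sysfs; a sketch, assuming the
array is md0 (this interface is existing md, not part of the patchset):

    # show the current number of stripe-cache entries
    cat /sys/block/md0/md/stripe_cache_size
    # try a larger cache, then re-run the benchmark to see where the write
    # gains start to erode
    echo 1024 > /sys/block/md0/md/stripe_cache_size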

Changes since take1:
* separate write and overwrite in the io_weight fields, i.e. an overwrite
  no longer implies a write
* rename 

Re: Linux Software RAID is really RAID?

2007-07-03 Thread Tejun Heo
Mark Lord wrote:
 I believe he said it was ICH5 (different post/thread).
 
 My observation on ICH5 is that if one unplugs a drive,
 then the chipset/cpu locks up hard when toggling SRST
 in the EH code.
 
 Specifically, it locks up at the instruction
 which restores SRST back to the non-asserted state,
 which likely corresponds to the chipset finally actually
 sending a FIS to the drive.
 
 A hard(ware) lockup, not software.
 That's why Intel says ICH5 doesn't do hotplug.

OIC.  I don't think there's much left to do from the driver side then.
Or is there any workaround?

-- 
tejun


Re: Proposed enhancement to mdadm: Allow --write-behind= to be done in grow mode.

2007-07-03 Thread Ian Dall
On Tue, 2007-07-03 at 15:03 +0100, David Greaves wrote:
 Ian Dall wrote:
  There doesn't seem to be any designated place to send bug reports and
  feature requests to mdadm, so I hope I am doing the right thing by
  sending it here.
  
  I have a small patch to mdadm which allows the write-behind amount to be
  set at array grow time (instead of only at build or create time, as is
  currently the case).  I have tested this fairly extensively on some arrays
  built out of loopback devices, and once on a real live array.  I haven't
  lost any data and it seems to work OK, though it is possible I am missing
  something.
 
 Sounds like a useful feature...
 
 Did you test the bitmap cases you mentioned?

Yes. And I can use mdadm -X to see that the write behind parameter is
set in the superblock. I don't know any way to monitor how much the
write behind feature is being used though.

My motivation for doing this was to let me experiment and see how
effective it is.  Currently I have a RAID 0 array across 3 very fast
(15k rpm) SCSI disks.  This array is mirrored by a single large vanilla
ATA (7.2k rpm) disk.  I figure that the read performance of the
combination is basically the read performance of the RAID 0, and the
sustained write performance is basically that of the ATA disk, which
gives roughly a 6:1 read-to-write speed ratio.  I also typically see
about 6 times as much read traffic as write traffic.  So I figure it
should be close to optimal IF the bursts of write activity are not too
long.
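
For reference, a layout like that could be created roughly as follows
(device names and the write-behind value are assumptions; the ATA disk is
marked write-mostly so that write-behind applies to it):

    mdadm --create /dev/md0 --level=0 --raid-devices=3 \
          /dev/sda1 /dev/sdb1 /dev/sdc1
    mdadm --create /dev/md1 --level=1 --raid-devices=2 --bitmap=internal \
          --write-behind=256 /dev/md0 --write-mostly /dev/sdd1
    mdadm -X /dev/sdd1   # examine the bitmap; reports the write-behind limit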

Does anyone know how I can monitor the number of pending writes?  Where
are these queued?  Are they simply stuck on the block device queue (which
I could see with iostat), or does the md device maintain its own special
queue for this?


Ian
-- 
Ian Dall [EMAIL PROTECTED]