Re: Behavior of mdadm depending on user
On Mon, 2007-07-02 at 21:10 -0500, Michael Schwarz wrote:
> This is just a couple of quick questions. I'm charged with developing a
> prototype application that will assemble a hot-swapped drive array, mount
> it, transfer files to it, unmount it, and stop the array. And it is an
> application delivered by a local webserver (don't ask). I don't want to do
> anything as incredibly stupid as making mdadm and mount/umount setuid root,
> nor do I want to run the webserver as root. Instead, I took the slightly
> less stupid approach of invoking mdadm and mount/umount from a hardcoded C
> application that is setuid root. (We can debate the stupidity of this -- I
> know it isn't best, but it is fast and less stupid than the alternatives
> presented above.)

This isn't really an answer to your question, but isn't this an ideal
application for sudo? Make a shell script with the mdadm command(s) you want,
and set it up so that apache (or whatever user your web server runs as) is
able to run your shell script as root without authentication.

Ian
-- 
Ian Dall [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
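For what it's worth, the sudo setup described above might look something like
the sketch below. The script path, device names, mount point, and the "apache"
user are all assumptions to adapt, not anything from Michael's actual setup:

```shell
#!/bin/sh
# /usr/local/sbin/swap-array -- hypothetical wrapper script, run as root via
# sudo. Assembles and mounts the hot-swap array; a companion script would
# umount and stop it. All device names here are examples only.
/sbin/mdadm --assemble /dev/md1 /dev/sdb1 /dev/sdc1 || exit 1
/bin/mount /dev/md1 /mnt/hotswap
#
# Matching /etc/sudoers entry (edit with visudo) so the web-server user may
# run exactly this one script as root, with no password prompt:
#   apache ALL = (root) NOPASSWD: /usr/local/sbin/swap-array
```

The point of the wrapper is that sudo's grant is limited to one fixed command
line, so the web application never gets a general root shell.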
Proposed enhancement to mdadm: Allow --write-behind= to be done in grow mode.
There doesn't seem to be any designated place to send bug reports and feature
requests for mdadm, so I hope I am doing the right thing by sending it here.

I have a small patch to mdadm which allows the write-behind amount to be set
at array grow time (instead of only at build or create time, as currently).
I have tested this fairly extensively on some arrays built out of loopback
devices, and once on a real live array. I haven't lost any data and it seems
to work OK, though it is possible I am missing something.

--- mdadm-2.6.1/mdadm.c.writebehind	2006-12-21 16:12:50.000000000 +1030
+++ mdadm-2.6.1/mdadm.c	2007-06-30 13:16:22.000000000 +0930
@@ -827,6 +827,7 @@
 			bitmap_chunk = bitmap_chunk ? bitmap_chunk * 1024 : 512;
 			continue;
 
+		case O(GROW, WriteBehind):
 		case O(BUILD, WriteBehind):
 		case O(CREATE, WriteBehind): /* write-behind mode */
 			write_behind = DEFAULT_MAX_WRITE_BEHIND;

-- 
Ian Dall [EMAIL PROTECTED]
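With the patch applied, the new case would be exercised roughly as below. This
is a sketch, not taken from the patch submission: the device names are
hypothetical, and write-behind only takes effect on an array that has a
write-intent bitmap and a write-mostly member.

```shell
# Set the write-behind amount on an existing (already-created) array.
# Guarded so it is a no-op on machines without the array.
grow_write_behind() {
    dev="$1"; amount="$2"
    if [ -b "$dev" ]; then
        # Requires the GROW/WriteBehind patch; adds an internal bitmap too.
        mdadm --grow "$dev" --bitmap=internal --write-behind="$amount"
    else
        echo "skipping: $dev is not a block device"
    fi
}

grow_write_behind /dev/md0 256
```

Afterwards, `mdadm -X` on a member device shows whether the superblock
recorded the new write-behind setting.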
Re: Proposed enhancement to mdadm: Allow --write-behind= to be done in grow mode.
Ian Dall wrote:
> There doesn't seem to be any designated place to send bug reports and
> feature requests for mdadm, so I hope I am doing the right thing by sending
> it here. I have a small patch to mdadm which allows the write-behind amount
> to be set at array grow time (instead of only at build or create time, as
> currently). I have tested this fairly extensively on some arrays built out
> of loopback devices, and once on a real live array. I haven't lost any data
> and it seems to work OK, though it is possible I am missing something.

Sounds like a useful feature... Did you test the bitmap cases you mentioned?

David
Re: Linux Software RAID is really RAID?
Johny Mail list wrote:
> 2007/7/3, Tejun Heo [EMAIL PROTECTED]:
>> Brad Campbell wrote:
>>> Johny Mail list wrote:
>>>> Hello list, I have a little question about software RAID on Linux. I
>>>> have installed software RAID on all my Dell SC1425 servers, believing
>>>> that md RAID was a robust driver. Recently I ran a test on one server to
>>>> see whether the RAID survives a hard drive power failure: I powered up
>>>> the server and, once it had booted and the prompt appeared, I
>>>> disconnected the power cable of one SATA hard drive. Normally md should
>>>> eliminate the failed hard drive from the logical drive it built, and the
>>>> server should continue to work as if nothing had happened. Oddly, the
>>>> server stopped responding and I got these messages:
>>>>
>>>> ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
>>>> ata4.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
>>>>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>>>> ata4: port is slow to respond, please be patient (Status 0xd0)
>>>> ata4: port failed to respond (30sec, Status 0xd0)
>>>> ata4: soft resetting port
>>>>
>>>> After that my system is frozen.
>>
>> How hard is it frozen? Can you blink the Numlock LED?

I believe he said it was ICH5 (different post/thread). My observation on ICH5
is that if one unplugs a drive, the chipset/CPU locks up hard when toggling
SRST in the EH code. Specifically, it locks up at the instruction which
restores SRST back to the non-asserted state, which likely corresponds to the
chipset finally actually sending a FIS to the drive. A hard(ware) lockup, not
software. That's why Intel says ICH5 doesn't do hotplug.

Cheers
[RFC PATCH 0/2] raid5: 65% sequential-write performance improvement, stripe-queue take2
The first take of the stripe-queue implementation [1] had a performance
limiting bug in __wait_for_inactive_queue. Fixing that issue drastically
changed the performance characteristics. The following data from tiobench
shows the relative performance difference of the stripe-queue patchset.

Unit information:
  File Size = megabytes
  Blk Size  = bytes
  Num Thr   = number of threads
  Avg Rate  = relative throughput
  CPU%      = relative percentage of CPU used during the test
  CPU Eff   = Rate divided by CPU% - relative throughput per cpu load

Configuration:
  Platform: 1200MHz iop348 with 4-disk sata_vsc array
  mdadm --create /dev/md0 /dev/sd[abcd] -n 4 -l 5
  mkfs.ext2 /dev/md0
  mount /dev/md0 /mnt/raid
  tiobench --size 2048 --numruns 5 --block 4096 --block 131072 --dir /mnt/raid

Sequential Reads
                 File  Blk     Num  Avg   Maximum  CPU
Identifier       Size  Size    Thr  Rate  (CPU%)   Eff
---------------  ----  ------  ---  ----  -------  ----
2.6.22-rc7-iop1  2048  4096    1      0%      4%    -3%
2.6.22-rc7-iop1  2048  4096    2    -38%    -33%    -8%
2.6.22-rc7-iop1  2048  4096    4    -35%    -30%    -8%
2.6.22-rc7-iop1  2048  4096    8    -14%    -11%    -3%
2.6.22-rc7-iop1  2048  131072  1      2%      1%     2%
2.6.22-rc7-iop1  2048  131072  2    -11%    -10%    -2%
2.6.22-rc7-iop1  2048  131072  4     -7%     -6%    -1%
2.6.22-rc7-iop1  2048  131072  8     -9%     -6%    -4%

Random Reads
                 File  Blk     Num  Avg   Maximum  CPU
Identifier       Size  Size    Thr  Rate  (CPU%)   Eff
---------------  ----  ------  ---  ----  -------  ----
2.6.22-rc7-iop1  2048  4096    1     -9%     15%   -21%
2.6.22-rc7-iop1  2048  4096    2     -1%    -30%    42%
2.6.22-rc7-iop1  2048  4096    4    -14%    -22%    10%
2.6.22-rc7-iop1  2048  4096    8    -21%    -28%     9%
2.6.22-rc7-iop1  2048  131072  1     -8%     -4%    -4%
2.6.22-rc7-iop1  2048  131072  2    -13%    -13%     0%
2.6.22-rc7-iop1  2048  131072  4    -15%    -15%     0%
2.6.22-rc7-iop1  2048  131072  8    -13%    -13%     0%

Sequential Writes
                 File  Blk     Num  Avg   Maximum  CPU
Identifier       Size  Size    Thr  Rate  (CPU%)   Eff
---------------  ----  ------  ---  ----  -------  ----
2.6.22-rc7-iop1  2048  4096    1     25%     11%    12%
2.6.22-rc7-iop1  2048  4096    2     41%     42%    -1%
2.6.22-rc7-iop1  2048  4096    4     40%     18%    19%
2.6.22-rc7-iop1  2048  4096    8     15%     -5%    21%
2.6.22-rc7-iop1  2048  131072  1     65%     57%     4%
2.6.22-rc7-iop1  2048  131072  2     46%     36%     8%
2.6.22-rc7-iop1  2048  131072  4     24%     -7%    34%
2.6.22-rc7-iop1  2048  131072  8     28%    -15%    51%

Random Writes
                 File  Blk     Num  Avg   Maximum  CPU
Identifier       Size  Size    Thr  Rate  (CPU%)   Eff
---------------  ----  ------  ---  ----  -------  ----
2.6.22-rc7-iop1  2048  4096    1      2%     -8%    11%
2.6.22-rc7-iop1  2048  4096    2     -1%    -19%    21%
2.6.22-rc7-iop1  2048  4096    4      2%      2%     0%
2.6.22-rc7-iop1  2048  4096    8     -1%    -28%    37%
2.6.22-rc7-iop1  2048  131072  1      2%     -3%     5%
2.6.22-rc7-iop1  2048  131072  2      3%     -4%     7%
2.6.22-rc7-iop1  2048  131072  4      4%     -3%     8%
2.6.22-rc7-iop1  2048  131072  8      5%     -9%    15%

The write performance numbers are better than I expected and would seem to
address the concerns raised in the thread "Odd (slow) RAID performance" [2].
The read performance drop was not expected. However, the numbers suggest some
additional changes to be made to the queuing model. Where read performance
drops there appears to be an equal drop in CPU utilization, which suggests
that pure read requests should be handled immediately, without a trip to the
stripe-queue workqueue.

Although it is not shown in the above data, another positive aspect is that
increasing the cache size past a certain point causes the write performance
gains to erode -- in other words, negative returns in contrast to diminishing
returns. The stripe-queue can only carry out optimizations while the cache is
busy. When the cache is large, requests can be handled without waiting, and
performance approaches the original 1:1 (queue-to-stripe-head) model. CPU
speed dictates the maximum effective cache size: once the CPU can no longer
keep the stripe-queue saturated, performance falls off from the peak. This is
a positive change because it shows that the new queuing model can produce
higher performance with fewer resources, but it does require more care when
changing 'stripe_cache_size'. The above numbers were taken with the default
cache size of 256.

Changes since take1:
* separate write and overwrite in the io_weight fields, i.e. an overwrite no
  longer implies a write
* rename
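The 'stripe_cache_size' knob discussed above lives in md's sysfs directory. A
small sketch for inspecting it follows; the sysfs path is the standard md
layout, but "md0" is just an example device name:

```shell
# Print the raid5/raid6 stripe cache size for an array, degrading gracefully
# when the array (or the sysfs attribute) does not exist on this machine.
show_stripe_cache() {
    attr="/sys/block/$1/md/stripe_cache_size"
    if [ -r "$attr" ]; then
        cat "$attr"
    else
        echo "no stripe_cache_size for $1"
    fi
}

show_stripe_cache md0
# To experiment with the cache size (as root):
#   echo 512 > /sys/block/md0/md/stripe_cache_size
```

Given the results above, larger is not automatically better: past the point
where the CPU can keep the stripe-queue saturated, raising the value costs
write throughput rather than gaining it.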
Re: Linux Software RAID is really RAID?
Mark Lord wrote:
> I believe he said it was ICH5 (different post/thread). My observation on
> ICH5 is that if one unplugs a drive, the chipset/CPU locks up hard when
> toggling SRST in the EH code. Specifically, it locks up at the instruction
> which restores SRST back to the non-asserted state, which likely corresponds
> to the chipset finally actually sending a FIS to the drive. A hard(ware)
> lockup, not software. That's why Intel says ICH5 doesn't do hotplug.

OIC. I don't think there's much left to do from the driver side then. Or is
there any workaround?

-- 
tejun
Re: Proposed enhancement to mdadm: Allow --write-behind= to be done in grow mode.
On Tue, 2007-07-03 at 15:03 +0100, David Greaves wrote:
> Ian Dall wrote:
>> There doesn't seem to be any designated place to send bug reports and
>> feature requests for mdadm, so I hope I am doing the right thing by
>> sending it here. I have a small patch to mdadm which allows the
>> write-behind amount to be set at array grow time (instead of only at
>> build or create time, as currently). I have tested this fairly
>> extensively on some arrays built out of loopback devices, and once on a
>> real live array. I haven't lost any data and it seems to work OK, though
>> it is possible I am missing something.
>
> Sounds like a useful feature... Did you test the bitmap cases you mentioned?

Yes. And I can use mdadm -X to see that the write-behind parameter is set in
the superblock. I don't know any way to monitor how much the write-behind
feature is being used, though.

My motivation for doing this was to enable me to experiment to see how
effective it is. Currently I have a RAID-0 array across 3 very fast (15k rpm)
SCSI disks. This array is mirrored by a single large vanilla ATA (7.2k rpm)
disk. I figure that the read performance of the combination is basically the
read performance of the RAID-0, and the sustained write performance is
basically that of the ATA disk, which gives about a 6:1 read-to-write speed
ratio. I also typically see about 6 times as much read traffic as write
traffic, so I figure the arrangement should be close to optimal IF the bursts
of write activity are not too long.

Does anyone know how I can monitor the number of pending writes? Where are
these queued? Are they simply stuck on the block device queue (where I could
see them with iostat), or does the md device maintain its own special queue
for this?

Ian
-- 
Ian Dall [EMAIL PROTECTED]
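I don't know of a direct counter for pending write-behind writes either. One
indirect approach is to compare per-device write completion between the fast
RAID-0 members and the write-mostly ATA disk with iostat (from the sysstat
package): during a burst, a persistent gap suggests a write-behind backlog.
The dirty-chunk counts from `mdadm -X` on the bitmap give another rough view.
A guarded sketch, with nothing array-specific assumed:

```shell
# Take two short extended-statistics samples from all block devices; compare
# the write columns of the mirror's members by eye. Falls back cleanly when
# sysstat is not installed.
watch_backlog() {
    if command -v iostat >/dev/null 2>&1; then
        iostat -x 1 2    # two 1-second samples, per-device extended stats
    else
        echo "iostat not installed"
    fi
}

watch_backlog
```

This only infers the backlog from device-level traffic; whether md exposes its
internal write-behind queue anywhere would need an answer from the md side.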