Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])

2007-08-12 Thread Paul Clements
Iustin Pop wrote: On Sun, Aug 12, 2007 at 07:03:44PM +0200, Jan Engelhardt wrote: On Aug 12 2007 09:39, [EMAIL PROTECTED] wrote: now, I am not an expert on either option, but there are a couple things that I would question about the DRBD+MD option 1. when the remote machine is down, how does

Re: raid1 with nbd member hangs MD on SLES10 and RHEL5

2007-06-14 Thread Paul Clements
Mike Snitzer wrote: Here are the steps to reproduce reliably on SLES10 SP1: 1) establish a raid1 mirror (md0) using one local member (sdc1) and one remote member (nbd0) 2) power off the remote machine, thereby severing nbd0's connection 3) perform IO to the filesystem that is on the md0 device
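A minimal shell sketch of step 1, using the device names from the report (sdc1 as the local member, nbd0 served by a remote nbd-server on a hypothetical port); this is not part of the original message:
  nbd-client remote-host 2000 /dev/nbd0
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdc1 /dev/nbd0
  mkfs.ext3 /dev/md0 && mount /dev/md0 /mnt/test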

Re: accessing windows raid with mdadm?

2006-11-04 Thread Paul Clements
Shaya Potter wrote: [please cc: me on responses] if one read Documentation/fs/ntfs.txt in the linux kernel, it talks about accessing windows raid volumes in Linux If one is using non raid-5, one can use the device mapper, but if one is using raid-5, one has to use the md driver. However, it

[PATCH/BUG] mdadm: write behind value does not have endian conversion

2006-10-06 Thread Paul Clements
Neil, The write behind value does not get converted to/from little endian which causes write behind not to work on big endian machines (RUN ARRAY fails with invalid argument error): # mdadm -B /dev/md0 -l1 -n2 --write-behind=256 /dev/sdb10 --write-mostly /dev/nbd0 --bitmap=/bitmaps/test5

Re: [PATCH/BUG] mdadm: write behind value does not have endian conversion

2006-10-06 Thread Paul Clements
Paul Clements wrote: --- mdadm-2.5.1/bitmap.c.orig Fri Oct 6 13:40:35 2006 +++ mdadm-2.5.1/bitmap.c Fri Oct 6 13:40:53 2006 @@ -33,6 +33,7 @@ inline void sb_le_to_cpu(bitmap_super_t *sb) sb->chunksize = __le32_to_cpu(sb->chunksize); sb->daemon_sleep = __le32_to_cpu(sb

Re: [BUG/PATCH] md bitmap broken on big endian machines

2006-09-28 Thread Paul Clements
Michael Tokarev wrote: Neil Brown wrote: ffs is closer, but takes an 'int' and we have a 'unsigned long'. So use ffz(~X) to convert a chunksize into a chunkshift. So we don't use ffs(int) for an unsigned value because of int vs unsigned int, but we use ffz() with negated UNSIGNED. Looks

[BUG/PATCH] md bitmap broken on big endian machines

2006-09-21 Thread Paul Clements
Neil, We just discovered this problem on a 64-bit IBM POWER (ppc64) system. The symptom was this BUG(): Sep 20 20:55:51 caspian kernel: kernel BUG in sync_request at drivers/md/raid1.c:1743! Sep 20 20:55:51 caspian kernel: Oops: Exception in kernel mode, sig: 5 [#1] Sep 20 20:55:51 caspian

Re: way too high reconstruction speed - bug?

2006-08-10 Thread Paul Clements
Tomasz Chmielewski wrote: [42949428.59] md: md11: raid array is not clean -- starting background reconstruction [42949428.62] raid10: raid set md11 active with 4 out of 4 devices [42949428.63] md: syncing RAID array md11 [42949428.63] md: minimum _guaranteed_ reconstruction
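The thread is about the resync throttle; the knobs that bound the reported reconstruction speed are the standard md sysctls (the value below is illustrative, not taken from the thread):
  cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
  # cap background reconstruction at roughly 10 MB/s per device, for example
  echo 10000 > /proc/sys/dev/raid/speed_limit_max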

[BUG] raid1: barrier retry does not work correctly with write-behind

2006-08-03 Thread Paul Clements
Description: --- When a filesystem sends a barrier write down to raid1, raid1 tries to pass the write down to its component devices. However, if one or more of the devices return EOPNOTSUPP, it means that they do not support barriers, and raid1 must strip the barrier out of the write

Re: [BUG] raid1: barrier retry does not work correctly with write-behind

2006-08-03 Thread Paul Clements
Neil Brown wrote: sector, bdev, size are all remembered in r1_bio. That leaves bi_idx and an array of len/offset pairs that we need to preserve. So I guess the first step is to change alloc_behind_pages to return a new 'struct bio_vec' array rather than just a list of pages, and we should keep

Re: [BUG] raid1: barrier retry does not work correctly with write-behind

2006-08-03 Thread Paul Clements
Paul Clements wrote: I think bio_clone gives us that already. I may have missed something but I think we have everything we need: When a bio comes into raid1's make_request we bio_clone for each drive and attach those to r1_bio->bios. We also have behind_pages, which contains the pages. I

Re: [PATCH] md: new bitmap sysfs interface

2006-08-02 Thread Paul Clements
Neil Brown wrote: Is 'bitmap' the best name for the sysfs file? It seems a bit generic to me. write-bits-here-to-dirty-them-in-the-bitmap is probably (no, definitely) too verbose. dirty-in-bitmap maybe? bitmap-set-bits Any better suggestions? I like bitmap-set-bits or bitmap-dirty

Re: [PATCH] md: new bitmap sysfs interface

2006-07-27 Thread Paul Clements
Mike Snitzer wrote: On 7/26/06, Paul Clements [EMAIL PROTECTED] wrote: Mike Snitzer wrote: Also, what is the interface one should use to collect dirty bits from the primary's bitmap? Whatever you'd like. scp the bitmap file over or collect the ranges into a file and scp that over
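A rough sketch of the two options mentioned, assuming an external bitmap file at /bitmaps/md0 and a host named secondary (both hypothetical):
  # option 1: ship the raw bitmap file
  scp /bitmaps/md0 secondary:/var/tmp/md0.bitmap
  # option 2: ship a decoded summary produced by mdadm instead
  mdadm -X /bitmaps/md0 > /var/tmp/md0-dirty.txt
  scp /var/tmp/md0-dirty.txt secondary:/var/tmp/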

Re: [PATCH] md: new bitmap sysfs interface

2006-07-27 Thread Paul Clements
Mike Snitzer wrote: On 7/26/06, Paul Clements [EMAIL PROTECTED] wrote: Right. At the time of the failover, there were (probably) blocks that were out of sync between the primary and secondary. OK, so now that I understand the need to merge the bitmaps... the various scenarios that create

Re: [PATCH] md: new bitmap sysfs interface

2006-07-26 Thread Paul Clements
Mike Snitzer wrote: I tracked down the thread you referenced and these posts (by you) seem to summarize things well: http://marc.theaimsgroup.com/?l=linux-raid&m=16563016418&w=2 http://marc.theaimsgroup.com/?l=linux-raid&m=17515400864&w=2 But for clarity's sake, could you elaborate on the

[PATCH] md: new bitmap sysfs interface

2006-07-25 Thread Paul Clements
This patch (tested against 2.6.18-rc1-mm1) adds a new sysfs interface that allows the bitmap of an array to be dirtied. The interface is write-only, and is used as follows: echo 1000 > /sys/block/md2/md/bitmap (dirty the bit for chunk 1000 [offset 0] in the in-memory and on-disk bitmaps of
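Usage sketch based on the interface described above (the final name of the sysfs file was still being discussed in the follow-ups, so the path is as proposed here):
  # mark chunks 1000 and 2000 dirty in both the in-memory and on-disk bitmaps of md2
  echo 1000 > /sys/block/md2/md/bitmap
  echo 2000 > /sys/block/md2/md/bitmap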

Re: mdadm -X bitmap status off by 2^16

2006-07-18 Thread Paul Clements
Janos Farkas wrote: # for i in hdb3 hdd3 hda3 ; mdadm -X /dev/$i|grep map Bitmap : 285923 bits (chunks), 0 dirty (0.0%) Bitmap : 285923 bits (chunks), 0 dirty (0.0%) Bitmap : 285923 bits (chunks), 65536 dirty (22.9%) This indicates that the _on-disk_ bits are

Re: bitmap status question

2006-06-21 Thread Paul Clements
David Greaves wrote: How do I interpret: bitmap: 0/117 pages [0KB], 1024KB chunk in the mdstat output what does it mean when it's, eg: 23/117 This refers to the in-memory bitmap (basically a cache of what's in the on-disk bitmap -- it allows bitmap operations to be more efficient).

Re: [PATCH] ANNOUNCE: mdadm 2.5.1 - A tool for managing Soft RAID under Linux

2006-06-19 Thread Paul Clements
Bill Davidsen wrote: Paul Clements wrote: Neil Brown wrote: I am pleased to announce the availability of mdadm version 2.5.1 and here's another patch for a compile error on ppc... Since ppc is big endian, the compiler is complaining because it can't determine whether the isuper post

Re: [PATCH] ANNOUNCE: mdadm 2.5.1 - A tool for managing Soft RAID under Linux

2006-06-19 Thread Paul Clements
Paul Clements wrote: --- super1.c 2006-06-19 05:17:36.0 -0400 +++ /export/public/clemep/tmp/super1-ppc-compile-error.c 2006-06-19 00:40:26.0 -0400 @@ -124,8 +124,11 @@ static unsigned int calc_sb_1_csum(struc disk_csum = sb->sb_csum; sb->sb_csum = 0

Re: [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks.

2006-05-13 Thread Paul Clements
Andrew Morton wrote: The loss of pagecache coherency seems sad. I assume there's never a requirement for userspace to read this file. Actually, there is. mdadm reads the bitmap file, so that would be broken. Also, it's just useful for a user to be able to read the bitmap (od -x, or
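Two ways a user can read an external bitmap file from userspace, as mentioned above (the path is hypothetical):
  od -x /bitmaps/md0 | head        # raw hex view of the bitmap file
  mdadm -X /bitmaps/md0            # decoded view: chunk size, bit count, dirty chunks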

Re: Can't mount /dev/md0 after stopping a synchronization

2006-04-05 Thread Paul Clements
Mike Garey wrote: I seem to be getting closer.. If I try booting from a kernel without raid1 and md support, but using an initrd with raid1/md modules, then I get the ALERT! /dev/md0 does not exist. Dropping to a shell! message. I can't understand why there would be any difference between
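From the initramfs emergency shell, the array can usually be brought up by hand, which helps narrow down whether the modules or the assembly step is what's missing (device names are assumed):
  modprobe raid1
  mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1
  # or let mdadm scan for superblocks itself:
  mdadm --assemble --scan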

Re: [PATCH] mdadm: monitor event argument passing

2006-04-03 Thread Paul Clements
Neil Brown wrote: On Friday March 31, [EMAIL PROTECTED] wrote: I've been looking at the mdadm monitor, and thought it might be useful if it allowed extra context information (in the form of command line arguments) to be sent to the event program, so instead of just: # mdadm -F /dev/md0 -p
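For context, the monitor invocation that the patch extends looks roughly like this (the handler path is hypothetical); mdadm runs the program with the event name, the md device, and, where relevant, the component device as arguments:
  mdadm --monitor --delay=60 --program=/usr/local/sbin/md-event-handler /dev/md0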

[PATCH] mdadm: monitor event argument passing

2006-03-31 Thread Paul Clements
Signed-Off-By: Paul Clements [EMAIL PROTECTED] Monitor.c | 23 ++- 1 files changed, 22 insertions(+), 1 deletion(-) --- mdadm-2.4/Monitor.c 2006-03-28 17:59:42.0 -0500 +++ mdadm-2.4-event-args/Monitor.c 2006-03-31 13:19:40.0 -0500 @@ -464,6 +464,27

[PATCH] mdadm 2.4: fix write mostly for add and re-add

2006-03-31 Thread Paul Clements
The following patch makes it possible to tag a device as write-mostly on --add and --re-add with a non-persistent superblock array. Previously, this was not working. Thanks, Paul Signed-Off-By: Paul Clements [EMAIL PROTECTED] Manage.c |2 ++ 1 files changed, 2 insertions(+) --- mdadm
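A sketch of the case the patch enables, assuming a superblock-less (--build) mirror with an external bitmap and the device names used elsewhere in these threads:
  mdadm --build /dev/md0 --level=1 --raid-devices=2 --bitmap=/bitmaps/md0 /dev/sda1 /dev/nbd0
  # later, after the remote member has been removed and comes back:
  mdadm /dev/md0 --re-add --write-mostly /dev/nbd0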

Re: Linux MD RAID5/6 bitmap patches

2006-03-23 Thread Paul Clements
, going back to 2003. -- Paul -Original Message- From: Paul Clements [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 22, 2006 11:37 PM To: Yogesh Pahilwan Cc: 'Neil Brown'; linux-raid@vger.kernel.org Subject: Re: Linux MD RAID5/6 bitmap patches Yogesh Pahilwan wrote: Thanks

Re: Question: array locking, possible?

2006-02-10 Thread Paul Clements
Jure Pečar wrote: I too am running a jbod with md raid between two machines. So far md never caused any kind of problems, although I did have situations where both machines were syncing mirrors at once. If there's a little tool to reserve a disk via scsi, I'd like to know about it too. Even a
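One option, if the enclosure honours SCSI-3 persistent reservations, is sg_persist from sg3_utils; this is a suggestion rather than something discussed in the thread (device name and keys are made up):
  sg_persist --out --register --param-sark=0xabc1 /dev/sdb
  sg_persist --out --reserve --param-rk=0xabc1 --prout-type=1 /dev/sdb   # write-exclusive reservation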

Re: NVRAM support

2006-02-10 Thread Paul Clements
Mirko Benz wrote: Does a high speed NVRAM device makes sense for Linux SW RAID? E.g. a PCI card that exports battery backed memory. Sure. There are a couple ways I can think of using such a thing: 1) put an md intent bitmap on the NVRAM device for faster resyncs 2) use the NVRAM as a write
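A sketch of option 1, assuming the NVRAM card shows up as a block device that can be formatted and mounted at /nvram (all names hypothetical):
  mkfs.ext2 /dev/nvram0 && mount /dev/nvram0 /nvram
  mdadm --create /dev/md0 --level=1 --raid-devices=2 --bitmap=/nvram/md0.bitmap /dev/sda1 /dev/sdb1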

Re: Question: array locking, possible?

2006-02-07 Thread Paul Clements
Chris Osicki wrote: The problem now is how to prevent somebody on the other host from accidentally assembling the array. Because the result of doing so would be something from strange to catastrophic ;-) To rephrase my question, is there any way to make it visible to the other host that the

Re: [PATCH md 014 of 18] Attempt to auto-correct read errors in raid1.

2005-11-29 Thread Paul Clements
Hi Neil, Glad to see this patch is making its way to mainline. I have a couple of questions on the patch, though... NeilBrown wrote: + if (uptodate || conf->working_disks <= 1) { Is it valid to mask a read error just because we have only 1 working disk? +

Re: Poor Software RAID-0 performance with 2.6.14.2

2005-11-22 Thread Paul Clements
Bill Davidsen wrote: One of the advantages of mirroring is that if there is heavy read load when one drive is busy there is another copy of the data on the other drive(s). But doing 1MB reads on the mirrored device did not show that the kernel took advantage of this in any way. In fact, it

Re: raid5 write performance

2005-11-20 Thread Paul Clements
Carlos Carvalho wrote: I think the demand for any solution to the unclean array is indeed low because of the small probability of a double failure. Those that want more reliability can use a spare drive that resyncs automatically or raid6 (or both). A spare disk would help, but note that

Re: split RAID1 during backups?

2005-10-25 Thread Paul Clements
Jeff Breidenbach wrote: your suggestion about kernel 2.6.13 and intent logging and having mdadm pull a disk sounds like a winner. I'm going to try it if the software looks mature enough. Should I be scared? There have been a couple bug fixes in the bitmap stuff since 2.6.13 was released,
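The pull/backup/re-add cycle under discussion looks roughly like this (device names are assumed; with a write-intent bitmap the eventual resync covers only the chunks written while the disk was out):
  mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
  # ... run the backup from /dev/sdb1 ...
  mdadm /dev/md0 --re-add /dev/sdb1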

Re: [PATCH md 006 of 6] Add write-behind support for md/raid1

2005-08-12 Thread Paul Clements
Al Boldi wrote: NeilBrown wrote: If a device is flagged 'WriteMostly' and the array has a bitmap, and the bitmap superblock indicates that write_behind is allowed, then write_behind is enabled for WriteMostly devices. Nice, but why is it dependent on WriteMostly? WriteMostly is just a
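A hedged sketch of how the flags fit together at creation time (device names assumed): write-behind applies only to members marked write-mostly, and both require a bitmap:
  mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        --bitmap=internal --write-behind=256 \
        /dev/sda1 --write-mostly /dev/nbd0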

Re: does chunksize matter in raid-1?

2005-08-06 Thread Paul Clements
Jeff Breidenbach wrote: Does chunk size matter *at all* for RAID-1? mdadm --create /dev/md0 --level=1 --chunk=8 /dev/sdc1 /dev/sdd1 mdadm --create /dev/md0 --level=1 --chunk=128 /dev/sdc1 /dev/sdd1 In my mental model of how RAID works, it can't possibly matter what my chunk size is whether

Re: endianness of Linux kernel RAID

2005-08-03 Thread Paul Clements
Gregory Seidman wrote: It turns out that if one uses the kernel (2.4.x-2.6.x) RAID support (RAID5, anyway, since that's all I've tested), the RAID'd disks cannot be moved to another system with a different endianness. I don't know how hard that Unfortunately, I haven't gotten into kernel

Re: endianness of Linux kernel RAID

2005-08-03 Thread Paul Clements
Gregory Seidman wrote: 1) Does this mean that the fix will be in 2.6.13? Yes, version 1 superblock support is in 2.6.13. You need the latest mdadm as well. 2) Does the version 1 you refer to have to do with pre-2.6 RAID support? No. 2.4 kernels do not understand version 1 superblocks.
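So an array meant to move between machines of different endianness would be created with version-1 metadata, for example (device names are illustrative):
  mdadm --create /dev/md0 --metadata=1.0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1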

Re: sofware raid5 oops

2005-07-11 Thread Paul Clements
Farkas Levente wrote: anybody has any useful tip about it? Unable to handle kernel NULL pointer dereference at virtual address printing eip: *pde = 0f94a067 Oops: [#1] Modules linked in: cifs nls_utf8 ncpfs nfsd exportfs lockd sunrpc parport_pc lp parport netconsole

Re: multi-hosting support for carrier grade Linux

2005-04-05 Thread Paul Clements
Dave Jiang wrote: I'm attempting to implement multihost support of the MD for environments such as carrier grade Linux. Multihost support allows the RAID array to be claimed by a particular host via a unique ID (unique SCSI host ID, FibreChannel WWN, or geographical address of a chassis blade.

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-21 Thread Paul Clements
Peter T. Breuer wrote: Paul Clements [EMAIL PROTECTED] wrote: Peter T. Breuer wrote: I don't see that this solves anything. If you had both sides going at once, receiving different writes, then you are sc**ed, and no resolution of bitmaps will help you, since both sides have received different

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-21 Thread Paul Clements
Luca Berra wrote: On Mon, Mar 21, 2005 at 11:07:06AM -0500, Paul Clements wrote: All I'm saying is that in a split-brain scenario, typical cluster frameworks will make two (or more) systems active at the same time. This I sincerely hope not. Perhaps my choice of wording was not the best? I

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-21 Thread Paul Clements
Peter T. Breuer wrote: Paul Clements [EMAIL PROTECTED] wrote: At any rate, this is all irrelevant given the second part of that email reply that I gave. You still have to do the bitmap combining, regardless of whether two systems were active at the same time or not. But why don't we already

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-18 Thread Paul Clements
Peter T. Breuer wrote: Lars Marowsky-Bree [EMAIL PROTECTED] wrote: On 2005-03-18T13:52:54, Peter T. Breuer [EMAIL PROTECTED] wrote: The problem is for multi-nodes, both sides have their own bitmap. When a split scenario occurs, Here I think you mean that both nodes go their independent ways, due

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-18 Thread Paul Clements
Peter T. Breuer wrote: Paul Clements [EMAIL PROTECTED] wrote: I don't see that this solves anything. If you had both sides going at once, receiving different writes, then you are sc**ed, and no resolution of bitmaps will help you, since both sides have received different (legitimate) data

[PATCH 1/3] md bitmap async write enabling

2005-03-17 Thread Paul Clements
This patch enables the async write capability in md. It requires the md bitmap patches and the 117-WriteMostly-update patch. Signed-Off-By: Paul Clements [EMAIL PROTECTED] drivers/md/bitmap.c | 26 ++ drivers/md/md.c | 13 + include

[PATCH 2/3] md bitmap async writes for raid1

2005-03-17 Thread Paul Clements
This patch implements async writes in raid1. Signed-Off-By: Paul Clements [EMAIL PROTECTED] drivers/md/raid1.c | 107 ++--- include/linux/raid/raid1.h |2 2 files changed, 102 insertions(+), 7 deletions(-) diff -purN --exclude core --exclude

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-14 Thread Paul Clements
Neil Brown wrote: On Wednesday March 9, [EMAIL PROTECTED] wrote: avoid setting of sb->events_lo = 1 when creating a 0.90 superblock -- it doesn't seem to be necessary and it was causing the event counters to start at 4 billion+ (events_lo is actually the high part of the events counter, on

[PATCH 1/2] md bitmap bug fixes

2005-03-09 Thread Paul Clements
, as in the kernel if'ed out super1 definition which is now in the kernel headers included sys/time.h to avoid compile error Thanks, Paul Signed-Off-By: Paul Clements [EMAIL PROTECTED] bitmap.c | 58 +- raid1.c |1 + 2 files changed, 34

[PATCH 2/2] md bitmap bug fixes

2005-03-09 Thread Paul Clements
Here's the mdadm patch... Paul Clements wrote: Neil, here are a couple of patches -- this one for the kernel, the next for mdadm. They fix a few issues that I found while testing the new bitmap intent logging code. Briefly, the issues were: kernel: added call to bitmap_daemon_work() from raid1d

Re: [PATCH md 0 of 4] Introduction

2005-03-08 Thread Paul Clements
Peter T. Breuer wrote: Neil - can you describe for me (us all?) what is meant by intent-logging here. Since I wrote a lot of the code, I guess I'll try... Well, I can guess - I suppose the driver marks the bitmap before a write (or group of writes) and unmarks it when they have completed
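In practice the intent bitmap is simply created with, or added to, an array; a sketch with an assumed device name:
  mdadm --grow /dev/md0 --bitmap=internal   # add a write-intent bitmap to an existing array
  cat /proc/mdstat                          # the "bitmap:" line shows how much is currently marked dirty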

Re: RAID 1 on a server with possible mad mobo

2005-03-08 Thread Paul Clements
Colin McDonald wrote: Without taking up too much of y'all's time, what would be the best solution for moving the RAID array over to a new box? 2. Try to boot off of the disks after they have been transferred into the new machine? I know this will cause all kinds of problems with kernel/devices, etc
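Moving the member disks to the new machine and assembling by superblock is usually enough; a sketch (device names on the new box are assumptions):
  mdadm --examine /dev/sda1 /dev/sdb1      # confirm both superblocks are visible
  mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1
  # or let mdadm find the members itself:
  mdadm --assemble --scan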

Re: kernel race with mdadm monitor

2005-02-10 Thread Paul Clements
Mario Holbe wrote: I'm running Linux 2.4.27 i686 single-processor from debian's kernel-source-2.4.27 and mdadm 1.9.0 in monitor mode: While stopping a raid1 (raidstop /dev/md8) it seems there Unable to handle kernel NULL pointer dereference at virtual address 03d8 c024be53 *pde =

Re: Migrating from SINGLE DISK to RAID1

2005-02-01 Thread Paul Clements
Hi Robert, Robert Heinzmann wrote: can someone verify if the following statements are true? - It's not possible to simply convert an existing partition with a filesystem on it to a raid1 mirror set. If you create a raid1 without a superblock it is possible (although not a very common
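The usual superblock-based route, for contrast with the superblock-less variant mentioned above, is to build a degraded mirror on the new disk, copy the data over, then add the original disk (device names and filesystem are assumptions):
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 missing
  mkfs.ext3 /dev/md0
  mount /dev/md0 /mnt/new && cp -ax / /mnt/new    # copy the existing data
  # after switching the system over to /dev/md0:
  mdadm /dev/md0 --add /dev/sda1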