asynchronous write

2007-07-18 Thread Peter T. Breuer
Did the asynchronous write stuff (as it was in fr1) ever get into kernel software raid? I see from the raid acceleration (ioat) patching going on that some sort of asynchronicity is being contemplated, but blessed if I can make head or tail of the descriptions I've read. It looks vaguely like

Re: possible deadlock through raid5/md

2006-10-15 Thread Peter T. Breuer
While travelling the last few days, a theory has occurred to me to explain this sort of thing ... A user has sent me a ps ax output showing an enbd client daemon blocked in get_active_stripe (I presume in raid5.c). ps ax -of,uid,pid,ppid,pri,ni,vsz,rss,wchan:30,stat,tty,time,command

Re: remark and RFC

2006-08-19 Thread Peter T. Breuer
Also sprach Gabor Gombas: On Thu, Aug 17, 2006 at 08:28:07AM +0200, Peter T. Breuer wrote: 1) if the network disk device has decided to shut down wholesale (temporarily) because of lack of contact over the net, then retries and writes are _bound_ to fail for a while, so

Re: remark and RFC

2006-08-17 Thread Peter T. Breuer
Hi Neil .. Also sprach Neil Brown: On Wednesday August 16, [EMAIL PROTECTED] wrote: 1) I would like raid request retries to be done with exponential delays, so that we get a chance to overcome network brownouts. 2) I would like some channel of communication to be available with

Re: remark and RFC

2006-08-16 Thread Peter T. Breuer
Also sprach Molle Bestefich: Peter T. Breuer wrote: I would like raid request retries to be done with exponential delays, so that we get a chance to overcome network brownouts. Hmm, I don't think MD even does retries of requests. I had a robust read patch in FR1, and I thought Neil

Re: remark and RFC

2006-08-16 Thread Peter T. Breuer
Also sprach Molle Bestefich: Peter T. Breuer wrote: You want to hurt performance for every single MD user out there, just There's no performance drop! Exponentially staged retries on failure are standard in all network protocols
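
The exponential staging being argued for is the standard backoff pattern from network protocols: each failed retry waits longer, up to a cap, so retry traffic stays bounded while a brownout passes. A minimal userspace sketch of the idea (illustration only, not md code; retry_io and the delay limits are hypothetical):

    #include <stdbool.h>
    #include <time.h>

    #define INITIAL_DELAY_MS 100
    #define MAX_DELAY_MS     8000
    #define MAX_ATTEMPTS     10

    static void sleep_ms(unsigned int ms)
    {
        struct timespec ts = { ms / 1000, (long)(ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
    }

    /* Retry a failed request, doubling the delay after each failure. */
    static bool retry_with_backoff(bool (*retry_io)(void))
    {
        unsigned int delay_ms = INITIAL_DELAY_MS;
        int attempt;

        for (attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            if (retry_io())
                return true;              /* the request finally went through */
            sleep_ms(delay_ms);           /* ride out the brownout */
            if (delay_ms < MAX_DELAY_MS)
                delay_ms *= 2;            /* exponential staging */
        }
        return false;                     /* only now give up on the component */
    }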

Re: remark and RFC

2006-08-16 Thread Peter T. Breuer
Also sprach Molle Bestefich: See above. The problem is generic to fixed bandwidth transmission channels, which, in the abstract, is everything. As soon as one does retransmits one has a kind of obligation to keep retransmissions down to a fixed maximum percentage of the potential

Re: Questions about software RAID

2005-04-20 Thread Peter T. Breuer
Molle Bestefich [EMAIL PROTECTED] wrote: There seems to be an obvious lack of a properly thought out interface to notify userspace applications of MD events (disk failed -- go light a LED, etc). Well, that's probably truish. I've been meaning to ask for a per-device sysctl interface for some

Re: remove resyncing disk

2005-04-20 Thread Peter T. Breuer
Robbie Hughes [EMAIL PROTECTED] wrote:
    Number   Major   Minor   RaidDevice  State
       0        0       0       -1       removed
       1       22      66        1       active sync   /dev/hdd2
       2        3       3        0       spare         /dev/hda3
The main problem I have now is that

Re: Questions about software RAID

2005-04-19 Thread Peter T. Breuer
tmp [EMAIL PROTECTED] wrote: I've read man mdadm and man mdadm.conf but I certainly don't have an overview of software RAID. Then try using it instead/as well as reading about it, and you will obtain a more comprehensive understanding. OK. The HOWTO describes mostly a raidtools context,

Re: RAID1 and data safety?

2005-04-10 Thread Peter T. Breuer
Doug Ledford [EMAIL PROTECTED] wrote: Now, if I recall correctly, Peter posted a patch that changed this semantic in the raid1 code. The raid1 code does not complete a write to the upper layers of the kernel until it's been completed on all devices and his patch made it such that as

Re: RAID1 and data safety?

2005-04-08 Thread Peter T. Breuer
I forgot to say thanks! Thanks for the breakdown. Doug Ledford [EMAIL PROTECTED] wrote: (of event count increment) I think the best explanation is this: any change in array state that OK .. would necessitate kicking a drive out of the array if it didn't also make this change in state with
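
A simplified illustration of how such an event count arbitrates staleness at assembly time (a sketch of the idea as I understand it, not md's exact rules; the names are hypothetical):

    #include <stdbool.h>

    /* Each component's superblock records how many array state changes
     * (starts, stops, failures, spare activations, ...) it has seen. */
    struct component_sb {
        unsigned long long events;
    };

    /* At assembly time, a component whose count lags the newest one
     * missed some state change and is treated as stale (to be resynced
     * or left out) rather than trusted as current. */
    static bool component_is_fresh(const struct component_sb *sbs,
                                   int ndisks, int i)
    {
        unsigned long long newest = 0;
        int j;

        for (j = 0; j < ndisks; j++)
            if (sbs[j].events > newest)
                newest = sbs[j].events;

        return sbs[i].events == newest;
    }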

Re: RAID1 and data safety?

2005-03-29 Thread Peter T. Breuer
Schuett Thomas EXT [EMAIL PROTECTED] wrote: And here the fault happens: By chance, it reads the transaction log from hda, then sees that the transaction was finished, and clears the overall unclean bit. This cleaning is a write, so it goes to *both* HDs. Don't put the journal on the raid

Re: RAID1 and data safety?

2005-03-29 Thread Peter T. Breuer
Neil Brown [EMAIL PROTECTED] wrote: Due to the system crash the data on hdb is completely ignored. Data Neil - can you explain the algorithm that stamps the superblocks with an event count, once and for all? (until further amendment :-). It goes without saying that sb's are not stamped at

Re: RAID1 and data safety?

2005-03-29 Thread Peter T. Breuer
Neil Brown [EMAIL PROTECTED] wrote: On Tuesday March 29, [EMAIL PROTECTED] wrote: Don't put the journal on the raid device, then - I'm not even sure why people do that! (they probably have a reason that is good - to them). Not good advice. DO put the journal on a raid device. It is

Re: RAID1 and data safety?

2005-03-29 Thread Peter T. Breuer
Luca Berra [EMAIL PROTECTED] wrote: On Tue, Mar 29, 2005 at 01:29:22PM +0200, Peter T. Breuer wrote: Neil Brown [EMAIL PROTECTED] wrote: Due to the system crash the data on hdb is completely ignored. Data Neil - can you explain the algorithm that stamps the superblocks with an event count

[PATCH] md sleeps under spinlock on exit

2005-03-27 Thread Peter T. Breuer
md_exit calls mddev_put on each mddev during module exit. mddev_put calls blk_put_queue under spinlock, although it can sleep (it clearly calls kblockd_flush). This patch lifts the spinlock to do the flush. --- md.c.orig Fri Dec 24 22:34:29 2004 +++ md.c Sun Mar 27 14:14:22 2005 @@
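
A schematic of the pattern the patch applies (illustration only, not the actual md.c hunk; the demo_* names are hypothetical stand-ins): drop the spinlock first, then make the call that may sleep.

    #include <linux/kernel.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(demo_lock);

    struct demo_dev { int dying; };

    static void demo_flush(struct demo_dev *dev)
    {
        might_sleep();              /* stands in for the kblockd flush */
    }

    static void demo_put(struct demo_dev *dev)
    {
        spin_lock(&demo_lock);
        dev->dying = 1;             /* bookkeeping done under the lock */
        spin_unlock(&demo_lock);    /* lift the lock first ...          */

        demo_flush(dev);            /* ... then the call that may sleep */
    }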

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-25 Thread Peter T. Breuer
Luca Berra [EMAIL PROTECTED] wrote: we can have a series of failures which must be accounted for and dealt with according to a policy that might be site specific. A) Failure of the standby node A.1) the active is allowed to continue in the absence of a data replica A.2) disk writes from

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-22 Thread Peter T. Breuer
Luca Berra [EMAIL PROTECTED] wrote: If we want to do data-replication, access to the data-replicated device should be controlled by the data replication process (*), md does not guarantee this. Well, if one writes to the md device, then md does guarantee this - but I find it hard to parse the

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-22 Thread Peter T. Breuer
Paul Clements [EMAIL PROTECTED] wrote:
    system A  [raid1]
              /     \
         [disk]    [nbd] -- system B
2) you're writing, say, block 10 to the raid1 when A crashes (block 10 is dirty in the bitmap, and you don't know whether it got written to the disk on A or B, neither, or both)

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-21 Thread Peter T. Breuer
Paul Clements [EMAIL PROTECTED] wrote: At any rate, this is all irrelevant given the second part of that email reply that I gave. You still have to do the bitmap combining, regardless of whether two systems were active at the same time or not. As I understand it, you want both bitmaps in

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-21 Thread Peter T. Breuer
Paul Clements [EMAIL PROTECTED] wrote: OK - thanks for the reply, Paul ... Peter T. Breuer wrote: But why don't we already know from the _single_ bitmap on the array node (the node with the array) what to rewrite in total? All writes must go through the array. We know how many didn't go

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-19 Thread Peter T. Breuer
Mario Holbe [EMAIL PROTECTED] wrote: Peter T. Breuer [EMAIL PROTECTED] wrote: Yes, you can sync them by writing any one of the two mirrors to the other one, and need do so only on the union of the mapped data areas, As far as I understand the issue, this is exactly what should be possible
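
A sketch of that one-way, union-of-bitmaps resync (illustration only, not md's bitmap code; the chunk-copy callback is hypothetical):

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical hook that copies one chunk from the chosen source
     * mirror to the target mirror. */
    typedef void (*copy_chunk_fn)(size_t chunk);

    static void resync_union(const bool *bitmap_a, const bool *bitmap_b,
                             size_t nchunks, copy_chunk_fn copy_chunk)
    {
        size_t i;

        for (i = 0; i < nchunks; i++)
            if (bitmap_a[i] || bitmap_b[i])   /* dirty on either side */
                copy_chunk(i);                /* one-way copy, only for those */
    }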

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-19 Thread Peter T. Breuer
Michael Tokarev [EMAIL PROTECTED] wrote: Luca Berra wrote: On Fri, Mar 18, 2005 at 02:42:55PM +0100, Lars Marowsky-Bree wrote: The problem is for multi-nodes, both sides have their own bitmap. When a split scenario occurs, and both sides begin modifying the data, that bitmap needs to

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-19 Thread Peter T. Breuer
Michael Tokarev [EMAIL PROTECTED] wrote: Ok, you intrigued me enough already.. what's the FR1 patch? I want to give it a try... ;) Especially I'm interested in the Robust Read thing... That was published on this list a few weeks ago (probably needs updating, but I am sure you can help :-).

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-19 Thread Peter T. Breuer
Lars Marowsky-Bree [EMAIL PROTECTED] wrote: On 2005-03-19T12:43:41, Peter T. Breuer [EMAIL PROTECTED] wrote: Well, there is the right data from our point of view, and it is what should be on (one/both?) device by now. One doesn't get to recover that right data by copying one disk over

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-19 Thread Peter T. Breuer
Lars Marowsky-Bree [EMAIL PROTECTED] wrote: On 2005-03-19T14:27:45, Peter T. Breuer [EMAIL PROTECTED] wrote: Which one of the datasets you choose you could either arbitate via some automatic mechanisms (drbd-0.8 has a couple) or let a human decide. But how on earth can you get

Re: Robust Read

2005-03-19 Thread Peter T. Breuer
Michael Tokarev [EMAIL PROTECTED] wrote: Peter T. Breuer wrote: [] The patch was originally developed for 2.4, then ported to 2.6.3, and then to 2.6.8.1. Neil has recently been doing stuff, so I don't think it applies cleanly

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-19 Thread Peter T. Breuer
Lars Marowsky-Bree [EMAIL PROTECTED] wrote: On 2005-03-19T16:06:29, Peter T. Breuer [EMAIL PROTECTED] wrote: I'm cutting out those parts of the discussion which are irrelevant (or which I don't consider worth pursuing; maybe you'll find someone else to explain with more patience). Probably

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-19 Thread Peter T. Breuer
Guy [EMAIL PROTECTED] wrote: I agree, but I don't think a block device can do a re-sync without corrupting both. How do you merge a superset at the block level? AND the 2 Don't worry - it's just a one-way copy done efficiently (i.e., leaving out all the blocks known to be unmodified both

Re: Robust Read

2005-03-19 Thread Peter T. Breuer
Michael Tokarev [EMAIL PROTECTED] wrote: Uh OK. As I recall one only needs to count, one doesn't need a bitwise map of what one has dealt with. Well. I see read_balance() is now used to resubmit reads. There's a reason to use it instead of choosing next disk, I think. I can't think of

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-18 Thread Peter T. Breuer
Lars Marowsky-Bree [EMAIL PROTECTED] wrote: On 2005-03-18T13:52:54, Peter T. Breuer [EMAIL PROTECTED] wrote: (proviso - I didn't read the post where you set out the error situations, but surely, on theoretical grounds, all that can happen is that the bitmap causes more to be synced than

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-18 Thread Peter T. Breuer
Mario Holbe [EMAIL PROTECTED] wrote: Peter T. Breuer [EMAIL PROTECTED] wrote: different (legitimate) data. It doesn't seem relevant to me to consider if they are equally up to date wrt the writes they have received. They will be in the wrong even if they are up to date. The goal

Re: [PATCH md 0 of 4] Introduction

2005-03-09 Thread Peter T. Breuer
Neil Brown [EMAIL PROTECTED] wrote: On Tuesday March 8, [EMAIL PROTECTED] wrote: Have you remodelled the md/raid1 make_request() fn? Somewhat. Write requests are queued, and raid1d submits them when it is happy that all bitmap updates have been done. OK - so a slight modification of the

Re: Write Order Restrictions

2005-03-08 Thread Peter T. Breuer
Can Sar [EMAIL PROTECTED] wrote: the driver just cycles through all devices that make up a soft raid device and just calls generic_make_request on them. Is this correct, or does some other function involved in the write process (starting from the soft raid level down) actually wait on

Re: [PATCH md 0 of 4] Introduction

2005-03-08 Thread Peter T. Breuer
NeilBrown [EMAIL PROTECTED] wrote: The second two fix bugs that were introduced by the recent bitmap-based-intent-logging patches and so are not relevant Neil - can you describe for me (us all?) what is meant by intent-logging here. Well, I can guess - I suppose the driver marks the bitmap
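
For what it's worth, here is that guess sketched in code (an illustration of the write-intent idea only, not the actual md bitmap implementation; the structure and its flush hook are hypothetical):

    #include <stdbool.h>
    #include <stddef.h>

    struct intent_bitmap {
        bool  *dirty;                            /* one flag per chunk */
        void (*flush)(struct intent_bitmap *);   /* hypothetical: make it durable */
    };

    /* Called before the write is issued to any mirror: the intent must
     * reach stable storage first. */
    static void intent_mark(struct intent_bitmap *bm, size_t chunk)
    {
        bm->dirty[chunk] = true;
        bm->flush(bm);
    }

    /* Called only once every mirror has acknowledged the write.  The
     * clear may be lazy: losing it merely costs a little extra resync. */
    static void intent_clear(struct intent_bitmap *bm, size_t chunk)
    {
        bm->dirty[chunk] = false;
    }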

Re: [PATCH md 0 of 4] Introduction

2005-03-08 Thread Peter T. Breuer
Paul Clements [EMAIL PROTECTED] wrote: Peter T. Breuer wrote: Neil - can you describe for me (us all?) what is meant by intent-logging here. Since I wrote a lot of the code, I guess I'll try... Hi, Paul. Thanks. Well, I can guess - I suppose the driver marks the bitmap before a write

Re: Joys of spare disks!

2005-03-07 Thread Peter T. Breuer
[EMAIL PROTECTED] wrote: I've been going through the MD driver source, and to tell the truth, can't figure out where the read error is detected and how to hook that event and force a re-write of the failing sector. I would very much appreciate it if I did that for RAID1, or at least most of

Re: Creating RAID1 with missing - mdadm 1.90

2005-03-05 Thread Peter T. Breuer
berk walker [EMAIL PROTECTED] wrote: What might the proper [or functional] syntax be to do this? I'm running 2.6.10-1.766-FC3, and mdadm 1.90. Substitute the word "missing" for the corresponding device in the mdadm create command. (quotes manual page) To create a degraded array in which

Re: RAID1 robust read and read/write correct patch

2005-02-23 Thread Peter T. Breuer
J. David Beutel [EMAIL PROTECTED] wrote: I'd like to try this patch http://marc.theaimsgroup.com/?l=linux-raidm=110704868115609w=2 with EVMS BBR. Has anyone tried it on 2.6.10 (with FC2 1.9 and EVMS patches)? Has anyone tried the rewrite part at all? I don't know md or the kernel or

Re: *terrible* direct-write performance with raid5

2005-02-23 Thread Peter T. Breuer
Michael Tokarev [EMAIL PROTECTED] wrote: (note raid5 performs faster than a single drive, it's to be expected as it is possible to write to several drives in parallel). Each raid5 write must include at least ONE write to a target. I think you're saying that the writes go to different targets from
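
Background sketch, not a claim from the thread: a write that touches only part of a stripe must read the old data and old parity back before anything can be written, because the new parity is derived from them. That read-modify-write is why each small direct write costs much more than one device write.

    #include <stddef.h>
    #include <stdint.h>

    /* Recompute the parity chunk in place for a partial-stripe update:
     * new_parity = old_parity ^ old_data ^ new_data.  The old data and
     * old parity must be read from the disks before this can run, which
     * is the extra cost of a small write. */
    static void raid5_rmw_parity(uint8_t *parity,
                                 const uint8_t *old_data,
                                 const uint8_t *new_data,
                                 size_t len)
    {
        size_t i;

        for (i = 0; i < len; i++)
            parity[i] ^= old_data[i] ^ new_data[i];
    }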

Re: RAID1 robust read and read/write correct patch

2005-02-23 Thread Peter T. Breuer
J. David Beutel [EMAIL PROTECTED] wrote: Peter T. Breuer wrote, on 2005-Feb-23 1:50 AM: Quite possibly - I never tested the rewrite part of the patch, just wrote it to indicate how it should go and stuck it in to encourage others to go on from there. It's disabled by default. You almost

Re: *terrible* direct-write performance with raid5

2005-02-22 Thread Peter T. Breuer
Michael Tokarev [EMAIL PROTECTED] wrote: When debugging some other problem, I noticed that direct-io (O_DIRECT) write speed on a software raid5 And normal write speed (over 10 times the size of ram)? is terribly slow. Here's a small table just to show the idea (not numbers by itself as

Re: 2TB ?

2005-02-10 Thread Peter T. Breuer
No email [EMAIL PROTECTED] wrote: Forgive me as this is probably a silly question and one that has been answered many times, I have tried to search for the answers but have ended up more confused than when I started. So thought maybe I could ask the community to put me out of my misery

Re: Robust read patch for raid1

2005-02-01 Thread Peter T. Breuer
Peter T. Breuer [EMAIL PROTECTED] wrote: Allow me to remind what the patch does: it allows raid1 to proceed smoothly after a read error on a mirror component, without faulting the component. If the information is on another component, it will be returned. If all components are faulty
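
A schematic of the behaviour just described (not the FR1 patch itself; the mirror_set structure and its I/O hook are hypothetical):

    #include <stdbool.h>

    struct mirror_set {
        int ndisks;
        /* Hypothetical hook: read one sector's worth from one mirror,
         * returning false on a media error. */
        bool (*read_from)(struct mirror_set *ms, int disk,
                          long long sector, void *buf);
    };

    static bool robust_read(struct mirror_set *ms, int first_disk,
                            long long sector, void *buf)
    {
        int tried;

        for (tried = 0; tried < ms->ndisks; tried++) {
            int disk = (first_disk + tried) % ms->ndisks;  /* next mirror */
            if (ms->read_from(ms, disk, sector, buf))
                return true;      /* data found; nothing gets faulted */
        }
        return false;             /* every mirror failed for this sector */
    }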

Robust read patch for raid1

2005-01-29 Thread Peter T. Breuer
I've had the opportunity to test the robust read patch that I posted earlier in the month (10 Jan, Subject: Re: Spares and partitioning huge disks), and it needs one more change ... I assumed that the raid1 map function would move a (retried) request to another disk, but it does not, it always moves

Re: patches for mdadm 1.8.0 (auto=dev and stacking of devices)

2005-01-23 Thread Peter T. Breuer
Lars Marowsky-Bree [EMAIL PROTECTED] wrote: On 2005-01-23T16:13:05, Luca Berra [EMAIL PROTECTED] wrote: the first one adds an auto=dev parameter rationale: udev does not create /dev/md* device files, so we need a way to create them when assembling the md device. Am I missing something

Re: patches for mdadm 1.8.0 (auto=dev and stacking of devices)

2005-01-23 Thread Peter T. Breuer
Luca Berra [EMAIL PROTECTED] wrote: I believe the correct solution to this would be implementing a char-misc /dev/mdadm device that mdadm would use instead of the block device, like device-mapper does. Alas I have no time for this in the foreseeable future. It's a generic problem (or

corruption on disk

2005-01-23 Thread Peter T. Breuer
Just a followup ... Neil said he has never seen disks corrupt spontaneously. I'm just making the rounds of checking the daily md5sums on one group of machines with a view to estimating the corruption rates. Here's one of the typical (one bit) corruptions: doc013:/usr/oboe/ptb% cmp --verbose

Re: No response?

2005-01-20 Thread Peter T. Breuer
David Dougall [EMAIL PROTECTED] wrote: If I am running software raid1 and a disk device starts throwing I/O errors, Is the filesystem supposed to see any indication of this? I No - not if the error is on only one disk. The first error will fault the disk from the array and the driver will

Re: RAID1 2.6.9 performance problem

2005-01-18 Thread Peter T. Breuer
Hans Kristian Rosbach [EMAIL PROTECTED] wrote: On Mon, 2005-01-17 at 17:46, Peter T. Breuer wrote: Interesting. How did you measure latency? Do you have a script you could post? It's part of another application we use internally at work. I'll check to see whether part of it could be GPL'ed

Re: RAID1 2.6.9 performance problem

2005-01-17 Thread Peter T. Breuer
Hans Kristian Rosbach [EMAIL PROTECTED] wrote: -It selects the disk that is closest to the wanted sector by remembering what sector was last requested and what disk was used for it. -For sequential reads (such as hdparm) it will override and use the same disk anyways. (sector =
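
A simplified sketch of the selection rule as described (illustration only, not the actual raid1.c read_balance()):

    #include <stdlib.h>

    struct mirror {
        long long head_position;   /* sector just past the last request served */
    };

    static int pick_read_disk(const struct mirror *m, int ndisks,
                              long long sector, int last_disk)
    {
        long long best_dist;
        int best, i;

        /* Sequential stream: stay on the disk that served the previous
         * request if the new read starts where that one left off. */
        if (m[last_disk].head_position == sector)
            return last_disk;

        /* Otherwise pick the mirror whose head is nearest the target. */
        best = 0;
        best_dist = llabs(m[0].head_position - sector);
        for (i = 1; i < ndisks; i++) {
            long long dist = llabs(m[i].head_position - sector);
            if (dist < best_dist) {
                best_dist = dist;
                best = i;
            }
        }
        return best;
    }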

Re: Spares and partitioning huge disks

2005-01-15 Thread Peter T. Breuer
Michael Tokarev [EMAIL PROTECTED] wrote: That all to say: yes indeed, this lack of smart error handling is a noticeable omission in linux software raid. There are quite some (sometimes fatal to the data) failure scenarios that'd not have happened provided the smart error handling were in