Re: Implementing low level timeouts within MD

2007-11-02 Thread Doug Ledford
On Fri, 2007-11-02 at 03:41 -0500, Alberto Alonso wrote: On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote: Not in the older kernel versions you were running, no. These old versions (especially the RHEL) are supposed to be the official versions supported by Red Hat and the hardware

Re: Implementing low level timeouts within MD

2007-11-02 Thread Bill Davidsen
Alberto Alonso wrote: On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote: Not in the older kernel versions you were running, no. These old versions (especially the RHEL) are supposed to be the official versions supported by Red Hat and the hardware vendors, as they were very

Re: Implementing low level timeouts within MD

2007-11-02 Thread David Greaves
Alberto Alonso wrote: On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote: Not in the older kernel versions you were running, no. These old versions (especially the RHEL) are supposed to be the official versions supported by Red Hat and the hardware vendors, as they were very specific as

Re: Implementing low level timeouts within MD

2007-11-02 Thread Alberto Alonso
On Fri, 2007-11-02 at 11:09 +, David Greaves wrote: David PS I can't really contribute to your list - I'm only using cheap desktop hardware. - If you had failures and it properly handled them, then you can contribute to the good combinations, so far that's the list that is kind of

Re: Implementing low level timeouts within MD

2007-11-02 Thread Alberto Alonso
On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote: I wasn't belittling them. I was trying to isolate the likely culprit in the situations. You seem to want the md stack to time things out. As has already been commented by several people, myself included, that's a band-aid and not a fix

Re: Implementing low level timeouts within MD

2007-11-02 Thread Doug Ledford
On Fri, 2007-11-02 at 13:21 -0500, Alberto Alonso wrote: On Fri, 2007-11-02 at 11:45 -0400, Doug Ledford wrote: The key word here being supported. That means if you run across a problem, we fix it. It doesn't mean there will never be any problems. On hardware specs I normally read

Re: Implementing low level timeouts within MD

2007-11-02 Thread Alberto Alonso
On Fri, 2007-11-02 at 15:15 -0400, Doug Ledford wrote: It was tested, it simply obviously had a bug you hit. Assuming that your particular failure situation is the only possible outcome for all the other people that used it would be an invalid assumption. There are lots of code paths in an

Re: Implementing low level timeouts within MD

2007-11-01 Thread Bill Davidsen
Alberto Alonso wrote: On Tue, 2007-10-30 at 13:39 -0400, Doug Ledford wrote: Really, you've only been bitten by three so far. Serverworks PATA (which I tend to agree with the other person, I would probably chock 3 types of bugs is too many, it basically affected all my customers with

Re: Implementing low level timeouts within MD

2007-11-01 Thread Bill Davidsen
Alberto Alonso wrote: On Mon, 2007-10-29 at 13:22 -0400, Doug Ledford wrote: What kernels were these under? Yes, these 3 were all SATA. The kernels (in the same order as above) are: * 2.4.21-4.ELsmp #1 (Basically RHEL v3) * 2.6.18-4-686 #1 SMP on a Fedora Core release 2 *

Re: Implementing low level timeouts within MD

2007-11-01 Thread Doug Ledford
On Thu, 2007-11-01 at 00:08 -0500, Alberto Alonso wrote: On Tue, 2007-10-30 at 13:39 -0400, Doug Ledford wrote: Really, you've only been bitten by three so far. Serverworks PATA (which I tend to agree with the other person, I would probably chock 3 types of bugs is too many, it

Re: Implementing low level timeouts within MD

2007-10-30 Thread Gabor Gombas
On Tue, Oct 30, 2007 at 12:08:07AM -0500, Alberto Alonso wrote: * Internal serverworks PATA controller on a netengine server. The server is off waiting to get picked up, so I can't get the important details. 1 PATA failure. I was surprised on this one, I did have good luck

Re: Implementing low level timeouts within MD

2007-10-30 Thread Doug Ledford
On Tue, 2007-10-30 at 00:19 -0500, Alberto Alonso wrote: On Sat, 2007-10-27 at 12:33 +0200, Samuel Tardieu wrote: I agree with Doug: nothing prevents you from using md above very slow drivers (such as remote disks or even a filesystem implemented over a tape device to make it extreme). Only

Re: Implementing low level timeouts within MD

2007-10-30 Thread Doug Ledford
On Tue, 2007-10-30 at 00:08 -0500, Alberto Alonso wrote: On Mon, 2007-10-29 at 13:22 -0400, Doug Ledford wrote: OK, these you don't get to count. If you run raid over USB...well...you get what you get. IDE never really was a proper server interface, and SATA is much better, but USB was

Re: Implementing low level timeouts within MD

2007-10-29 Thread Doug Ledford
On Sun, 2007-10-28 at 01:27 -0500, Alberto Alonso wrote: On Sat, 2007-10-27 at 19:55 -0400, Doug Ledford wrote: On Sat, 2007-10-27 at 16:46 -0500, Alberto Alonso wrote: Regardless of the fact that it is not MD's fault, it does make software raid an invalid choice when combined with those

Re: Implementing low level timeouts within MD

2007-10-29 Thread Alberto Alonso
On Sat, 2007-10-27 at 12:33 +0200, Samuel Tardieu wrote: I agree with Doug: nothing prevents you from using md above very slow drivers (such as remote disks or even a filesystem implemented over a tape device to make it extreme). Only the low-level drivers know when it is appropriate to

Re: Implementing low level timeouts within MD

2007-10-29 Thread Neil Brown
On Friday October 26, Alberto Alonso wrote: I've been asking on my other posts but haven't seen a direct reply to this question: Can MD implement timeouts so that it detects problems when drivers don't come back? No. However it is possible that we will start sending the BIO_RW_FAILFAST

Re: Implementing low level timeouts within MD

2007-10-29 Thread Alberto Alonso
On Mon, 2007-10-29 at 13:22 -0400, Doug Ledford wrote: OK, these you don't get to count. If you run raid over USB...well...you get what you get. IDE never really was a proper server interface, and SATA is much better, but USB was never anything other than a means to connect simple devices

Re: Implementing low level timeouts within MD

2007-10-28 Thread Alberto Alonso
On Sat, 2007-10-27 at 19:55 -0400, Doug Ledford wrote: On Sat, 2007-10-27 at 16:46 -0500, Alberto Alonso wrote: Regardless of the fact that it is not MD's fault, it does make software raid an invalid choice when combined with those drivers. A single disk failure within a RAID5 array

Re: Implementing low level timeouts within MD

2007-10-27 Thread Samuel Tardieu
Doug Ledford writes: This isn't an md problem, this is a low level disk driver problem. Yell at the author of the disk driver in question. If that driver doesn't time things out and return errors up the stack in a reasonable time, then it's broken.

Re: Implementing low level timeouts within MD

2007-10-27 Thread Richard Scobie
Alberto Alonso wrote: After 4 different array failures all due to a single drive failure I think it would really be helpful if the md code timed out the driver. Hi Alberto, Sorry you've been having so much trouble. For interest, can you tell us what drives and controllers are involved?

Re: Implementing low level timeouts within MD

2007-10-27 Thread Alberto Alonso
On Fri, 2007-10-26 at 15:00 -0400, Doug Ledford wrote: This isn't an md problem, this is a low level disk driver problem. Yell at the author of the disk driver in question. If that driver doesn't time things out and return errors up the stack in a reasonable time, then it's broken. Md

Re: Implementing low level timeouts within MD

2007-10-27 Thread Richard Scobie
Alberto Alonso wrote: What hardware do you use? I was trying to compile a list of known configurations capable of detecting and degrading properly. To date I have not yet had a SATA based array drive go faulty - all mine have been PATA arrays on Intel or AMD MB controllers, which as per your

Re: Implementing low level timeouts within MD

2007-10-27 Thread Doug Ledford
On Sat, 2007-10-27 at 16:46 -0500, Alberto Alonso wrote: On Fri, 2007-10-26 at 15:00 -0400, Doug Ledford wrote: This isn't an md problem, this is a low level disk driver problem. Yell at the author of the disk driver in question. If that driver doesn't time things out and return errors

Implementing low level timeouts within MD

2007-10-26 Thread Alberto Alonso
I've been asking on my other posts but haven't seen a direct reply to this question: Can MD implement timeouts so that it detects problems when drivers don't come back? For me this year shall be known as the year the array stood still (bad scifi reference :-) After 4 different array failures

Re: Implementing low level timeouts within MD

2007-10-26 Thread Doug Ledford
On Fri, 2007-10-26 at 12:12 -0500, Alberto Alonso wrote: I've been asking on my other posts but haven't seen a direct reply to this question: Can MD implement timeouts so that it detects problems when drivers don't come back? For me this year shall be known as the year the array stood