Re: Implementing low level timeouts within MD
On Fri, 2007-11-02 at 03:41 -0500, Alberto Alonso wrote:
> On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote:
> > Not in the older kernel versions you were running, no.
>
> These old versions (especially the RHEL) are supposed to be the official versions supported by Redhat and the hardware vendors, as they were very specific as to what versions of Linux were supported.

The key word here being supported. That means if you run across a problem, we fix it. It doesn't mean there will never be any problems.

> Of all people, I would think you would appreciate that. Sorry if I sound frustrated and upset, but it is clearly a result of what supported and tested really mean in this case.

I'm sorry, but given the "especially the RHEL" case you cited, it is clear I can't help you. No one can. You were running first gen software on first gen hardware. You show me *any* software company whose first gen software never has to be updated to fix bugs, and I'll show you a software company that went out of business the day after they released their software. Our RHEL3 update kernels contained *significant* updates to the SATA stack after our GA release, replete with hardware driver updates and bug fixes. I don't know *when* that RHEL3 system failed, but I would venture a guess that it wasn't prior to RHEL3 Update 1. So, I'm guessing you didn't take advantage of those bug fixes. And I would hardly call once a quarter "continuously updating your kernel". In any case, given your insistence on running first gen software on first gen hardware and not taking advantage of the support we *did* provide to protect you against that failure, I say again that I can't help you.

> I don't want to go into a discussion of commercial distros, which are supported, as this is neither the time nor the place, but I don't want to open the door to the excuse of "it's an old kernel"; it wasn't when it got installed.

I *really* can't help you.

> Outside of the rejected suggestion, I just want to figure out when software raid works and when it doesn't. With SATA, my experience is that it doesn't. So far I've only received one response stating success (they were using the 3ware and Areca product lines).

No, your experience, as you listed it, is that SATA/usb-storage/Serverworks PATA failed you. The software raid never failed to perform as designed. However, one of the things you are doing here is drawing sweeping generalizations that are totally invalid. You are saying your experience is that SATA doesn't work, but you aren't qualifying it with the key factor: SATA doesn't work in *which* kernel version? It is pointless to try to establish whether or not something like SATA works in a global, all-kernel-inclusive fashion, because the answer to the question varies depending on the kernel version. And the same is true of pretty much every driver you can name. This is why commercial companies don't just certify hardware, but the specific software version that actually works, as opposed to all versions. In truth, you have *no idea* if SATA works today, because you haven't tried. As David pointed out, there was a significant overhaul of the SATA error recovery that took place *after* the kernel versions that failed you, which totally invalidates your experiences and requires retesting of the later software to see if it performs differently.

> Anyway, this thread just posed the question, and as Neil pointed out, it isn't feasible/worthwhile to implement timeouts within the md code. I think most of the points/discussions raised beyond that original question really belong in the thread "Software RAID when it works and when it doesn't". I do appreciate all comments and suggestions and I hope to keep them coming. I would hope however to hear more about success stories with specific hardware details. It would be helpful to have a list of tested configurations that are known to work.

I've had *lots* of success with software RAID as I've been running it for years. I've had old PATA drives fail, SCSI drives fail, FC drives fail, and I've had SATA drives that got kicked from the array due to read errors but not out-and-out drive failures. But I keep at least reasonably up to date with my kernels.

-- Doug Ledford [EMAIL PROTECTED] GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: Implementing low level timeouts within MD
Alberto Alonso wrote:
> On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote:
> > Not in the older kernel versions you were running, no.
>
> These old versions (especially the RHEL) are supposed to be the official versions supported by Redhat and the hardware vendors, as they were very specific as to what versions of Linux were supported.

So the vendors of the failing drives claimed that these kernels were supported? That's great; most vendors don't even consider Linux supported. What response did you get when you reported the problem to Redhat on your RHEL support contract? Did they agree that this hardware, and its use for software raid, was supported and intended?

> Of all people, I would think you would appreciate that. Sorry if I sound frustrated and upset, but it is clearly a result of what supported and tested really mean in this case. I don't want to go into a discussion of commercial distros, which are supported, as this is neither the time nor the place, but I don't want to open the door to the excuse of "it's an old kernel"; it wasn't when it got installed.

The problem is in the time travel module. It didn't properly cope with future hardware, and since you have very long uptimes, I'm reasonably sure you haven't updated the kernel to get fixes installed.

-- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979

- To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Implementing low level timeouts within MD
Alberto Alonso wrote:
> On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote:
> > Not in the older kernel versions you were running, no.
>
> These old versions (especially the RHEL) are supposed to be the official versions supported by Redhat and the hardware vendors, as they were very specific as to what versions of Linux were supported. Of all people, I would think you would appreciate that. Sorry if I sound frustrated and upset, but it is clearly a result of what supported and tested really mean in this case. I don't want to go into a discussion of commercial distros, which are supported, as this is neither the time nor the place, but I don't want to open the door to the excuse of "it's an old kernel"; it wasn't when it got installed.

It may be worth noting that the context of this email is the upstream linux-raid list. In my time watching the list it is mainly focused on 'current' code and development (but hugely supportive of older environments). In general, discussions in this context will have a certain mindset, and it's not going to be the same as that which you'd find in an enterprise product support list.

> Outside of the rejected suggestion, I just want to figure out when software raid works and when it doesn't. With SATA, my experience is that it doesn't.

SATA, or more precisely error handling in SATA, has recently been significantly overhauled by Tejun Heo (IIRC). We're talking post 2.6.18 though (again IIRC), so as far as SATA EH goes, older kernels bear no relation to the new ones. And the initial SATA EH code was, of course, beta :)

David

PS I can't really contribute to your list - I'm only using cheap desktop hardware.
Re: Implementing low level timeouts within MD
On Fri, 2007-11-02 at 11:09, David Greaves wrote:
> David
>
> PS I can't really contribute to your list - I'm only using cheap desktop hardware.

If you had failures and it properly handled them, then you can contribute to the good combinations; so far that's the list that is kind of empty :-(

Thanks,

Alberto
Re: Implementing low level timeouts within MD
On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote:
> I wasn't belittling them. I was trying to isolate the likely culprit in the situations. You seem to want the md stack to time things out. As has already been commented by several people, myself included, that's a band-aid and not a fix in the right place. The linux kernel community in general is pretty hard-line when it comes to fixing a bug in the wrong way.

It did sound as if I was complaining about nothing and that I shouldn't bother the linux-raid people and instead just continuously update the kernel and stop raising issues. If I misunderstood you I'm sorry, but somehow I still think that belittling my problems was implied in your responses.

> Not in the older kernel versions you were running, no.

These old versions (especially the RHEL) are supposed to be the official versions supported by Redhat and the hardware vendors, as they were very specific as to what versions of Linux were supported. Of all people, I would think you would appreciate that. Sorry if I sound frustrated and upset, but it is clearly a result of what supported and tested really mean in this case. I don't want to go into a discussion of commercial distros, which are supported, as this is neither the time nor the place, but I don't want to open the door to the excuse of "it's an old kernel"; it wasn't when it got installed.

> And I guarantee not a single one of those systems even knows what SATA is. They all use tried and true SCSI/FC technology.

Sure, the tru64 units I talked about don't use SATA (although some did use PATA); I'll concede that point. In any case, if Neil is so inclined to do so, he can add timeout code into the md stack; it's not my decision to make. The timeout was nothing more than a suggestion based on what I consider a reasonable expectation of usability. Neil said no and I respect that. If I didn't, I could always write my own as per the open source model :-) But I am not inclined to do so.

Outside of the rejected suggestion, I just want to figure out when software raid works and when it doesn't. With SATA, my experience is that it doesn't. So far I've only received one response stating success (they were using the 3ware and Areca product lines). Anyway, this thread just posed the question, and as Neil pointed out, it isn't feasible/worthwhile to implement timeouts within the md code. I think most of the points/discussions raised beyond that original question really belong in the thread "Software RAID when it works and when it doesn't". I do appreciate all comments and suggestions and I hope to keep them coming. I would hope however to hear more about success stories with specific hardware details. It would be helpful to have a list of tested configurations that are known to work.

Alberto
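[Editorial note: although md itself has no timeout knob, the layer below it does expose one. SCSI-class devices, which on current kernels includes libata-driven SATA disks, carry a per-command timeout in /sys/block/<dev>/device/timeout. A minimal sketch of tuning it; the function name, the 60-second value, and the device names are illustrative, and the sysfs root is a parameter purely so the logic can be exercised against a scratch directory (real use needs root and the real /sys).]

```shell
#!/bin/sh
# set_disk_timeouts SYSFS SECONDS DEV...
# Raise the SCSI-layer command timeout (in seconds) for each named disk.
# SYSFS is normally /sys; it is a parameter here purely for testability.
set_disk_timeouts() {
    sysfs=$1 secs=$2
    shift 2
    for dev in "$@"; do
        f="$sysfs/block/$dev/device/timeout"
        [ -f "$f" ] || continue          # skip devices without the knob
        echo "$dev: $(cat "$f")s -> ${secs}s"
        echo "$secs" > "$f"
    done
}

# Real-world use (as root), e.g. for three md member disks:
#   set_disk_timeouts /sys 60 sda sdb sdc
```

Raising this value gives a slow drive longer to respond before the kernel's error handler fires; it does not fix a driver whose error handler never returns, which was the failure mode discussed in this thread.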
Re: Implementing low level timeouts within MD
On Fri, 2007-11-02 at 13:21 -0500, Alberto Alonso wrote:
> On Fri, 2007-11-02 at 11:45 -0400, Doug Ledford wrote:
> > The key word here being supported. That means if you run across a problem, we fix it. It doesn't mean there will never be any problems.
>
> On hardware specs I normally read supported as tested within that OS version to work within specs. I may be expecting too much.

It was tested; it simply, obviously, had a bug you hit. Assuming that your particular failure situation is the only possible outcome for all the other people that used it would be an invalid assumption. There are lots of code paths in an error handler routine, and lots of different hardware failure scenarios, and they each have their own independent outcome should they ever be experienced.

> > I'm sorry, but given the "especially the RHEL" case you cited, it is clear I can't help you. No one can. You were running first gen software on first gen hardware. You show me *any* software company whose first gen software never has to be updated to fix bugs, and I'll show you a software company that went out of business the day after they released their software.
>
> I only pointed to RHEL as an example since that was a particular distro that I use and exhibited the problem. I probably could have replaced it with Suse, Ubuntu, etc. I may have called the early versions back in 94 first gen, but not today's versions. I know I didn't expect the SLS distro to work reliably back then.

Then you didn't pay attention to what I said before: RHEL3 was the first ever RHEL product that had support for SATA hardware. The SATA drivers in RHEL3 *were* first gen.

> Can you provide specific chipsets that you used (especially for SATA)?

All of the Adaptec SCSI chipsets through the 7899, Intel PATA, QLogic FC, and nVidia and winbond based SATA.

-- Doug Ledford
Re: Implementing low level timeouts within MD
On Fri, 2007-11-02 at 15:15 -0400, Doug Ledford wrote:
> It was tested; it simply, obviously, had a bug you hit. Assuming that your particular failure situation is the only possible outcome for all the other people that used it would be an invalid assumption. There are lots of code paths in an error handler routine, and lots of different hardware failure scenarios, and they each have their own independent outcome should they ever be experienced.

This is the kind of statement that made me say you were belittling my experiences. And to think that, since I've hit it in three different machines with different hardware and different kernel versions, it won't affect others is something else. I thought I was helping, but don't worry, I learned my lesson; it won't happen again. I asked people for their experiences; clearly not everybody is as lucky as I am.

> Then you didn't pay attention to what I said before: RHEL3 was the first ever RHEL product that had support for SATA hardware. The SATA drivers in RHEL3 *were* first gen.

Oh, I paid attention alright. It is my fault for assuming that things not marked as experimental are not experimental.

Alberto
Re: Implementing low level timeouts within MD
Alberto Alonso wrote:
> On Tue, 2007-10-30 at 13:39 -0400, Doug Ledford wrote:
> > Really, you've only been bitten by three so far: Serverworks PATA (which I tend to agree with the other person, I would probably chalk this up to Serverworks, not PATA), USB storage, and SATA (the SATA stack is arranged similar to the SCSI stack, with a core library that all the drivers use and then hardware dependent driver modules... I suspect that since you got bit on three different hardware versions you were in fact hitting a core library bug, but that's just a suspicion and I could well be wrong). What you haven't tried is any of the SCSI/SAS/FC stuff, and generally that's what I've always used and had good things to say about. I've only used SATA for my home systems or workstations, not any production servers.
>
> 3 types of bugs is too many; it basically affected all my customers with multi-terabyte arrays. Heck, we can also oversimplify things and say that it is really just one type and define everything as kernel type problems (or as some other kernel used to say... general protection error). I am sorry for not having hundreds of RAID servers from which to draw statistical analysis. As I have clearly stated in the past, I am trying to come up with a list of known combinations that work. I think my data points are worth something to some people, especially those considering SATA drives and software RAID for their file servers. If you don't consider them important for you that's fine, but please don't belittle them just because they don't match your needs.
>
> The USB array was never meant to be a full production system, just to buy some time until the budget was allocated to buy a real array. Having said that, the raid code is written to withstand the USB disks getting disconnected as long as the driver reports it properly. Since it doesn't, I consider it another case that shows when not to use software RAID thinking that it will work.
>
> As for SCSI, I think it is a greatly proven and reliable technology; I've dealt with it extensively and have always had great results. I now deal with it mostly on non-Linux based systems. But I don't think it is affordable to most SMBs that need multi-terabyte arrays.

Actually, SCSI can fail as well. Until recently I was running servers with multi-TB arrays, and regularly, several times a year, a drive would fail and glitch the SCSI bus such that the next i/o to another drive would fail. And I've had SATA drives fail cleanly on small machines, so neither is an "always" config.

-- bill davidsen
Re: Implementing low level timeouts within MD
Alberto Alonso wrote:
> On Mon, 2007-10-29 at 13:22 -0400, Doug Ledford wrote:
> > What kernels were these under?
>
> Yes, these 3 were all SATA. The kernels (in the same order as above) are:
>
> * 2.4.21-4.ELsmp #1 (basically RHEL v3)
> * 2.6.18-4-686 #1 SMP on a Fedora Core release 2
> * 2.6.17.13 (compiled from vanilla sources)

*Old* kernels. If you are going to build your own kernel, get a new one!

> The RocketRAID was configured for all drives as legacy/normal and software RAID5 across all drives. I wasn't using hardware raid on the last described system when it crashed.

-- bill davidsen
Re: Implementing low level timeouts within MD
On Thu, 2007-11-01 at 00:08 -0500, Alberto Alonso wrote:
> On Tue, 2007-10-30 at 13:39 -0400, Doug Ledford wrote:
> > Really, you've only been bitten by three so far: Serverworks PATA (which I tend to agree with the other person, I would probably chalk this up to Serverworks, not PATA), USB storage, and SATA (the SATA stack is arranged similar to the SCSI stack, with a core library that all the drivers use and then hardware dependent driver modules... I suspect that since you got bit on three different hardware versions you were in fact hitting a core library bug, but that's just a suspicion and I could well be wrong). What you haven't tried is any of the SCSI/SAS/FC stuff, and generally that's what I've always used and had good things to say about. I've only used SATA for my home systems or workstations, not any production servers.
>
> 3 types of bugs is too many; it basically affected all my customers with multi-terabyte arrays. Heck, we can also oversimplify things and say that it is really just one type and define everything as kernel type problems (or as some other kernel used to say... general protection error). I am sorry for not having hundreds of RAID servers from which to draw statistical analysis. As I have clearly stated in the past, I am trying to come up with a list of known combinations that work. I think my data points are worth something to some people, especially those considering SATA drives and software RAID for their file servers. If you don't consider them important for you that's fine, but please don't belittle them just because they don't match your needs.

I wasn't belittling them. I was trying to isolate the likely culprit in the situations. You seem to want the md stack to time things out. As has already been commented by several people, myself included, that's a band-aid and not a fix in the right place. The linux kernel community in general is pretty hard-line when it comes to fixing a bug in the wrong way.

> The USB array was never meant to be a full production system, just to buy some time until the budget was allocated to buy a real array. Having said that, the raid code is written to withstand the USB disks getting disconnected as long as the driver reports it properly. Since it doesn't, I consider it another case that shows when not to use software RAID thinking that it will work.
>
> As for SCSI, I think it is a greatly proven and reliable technology; I've dealt with it extensively and have always had great results. I now deal with it mostly on non-Linux based systems. But I don't think it is affordable to most SMBs that need multi-terabyte arrays.
>
> > > I'll repeat my plea one more time. Is there a published list of tested combinations that respond well to hardware failures and fully signal the md code so that nothing hangs?
> >
> > I don't know of one, but like I said, I've not used a lot of the SATA stuff for production. I would make this one suggestion though: SATA is still an evolving driver stack to a certain extent, and as such, keeping with more current kernels than you have been using is likely to be a big factor in whether or not these sorts of things happen.
>
> OK, so based on this it seems that you would not recommend the use of SATA for production systems due to its immaturity, correct?

Not in the older kernel versions you were running, no.

> Keep in mind that production systems are not able to be brought down just to keep up with kernel changes. We have some tru64 production servers with 1500 to 2500 days uptime; that's not uncommon in industry.

And I guarantee not a single one of those systems even knows what SATA is. They all use tried and true SCSI/FC technology.

> In any case, if Neil is so inclined to do so, he can add timeout code into the md stack. However, I would say that the current RAID subsystem relies on the underlying disk subsystem to report errors when they occur instead of hanging infinitely, which implies that the raid subsystem relies upon a bug free low level driver.

It is intended to deal with hardware failure, in as much as possible, and a driver bug isn't a hardware failure. You are asking the RAID subsystem to be extended to deal with software errors as well. Even though you may have thought it should handle this type of failure when you put those systems together, it in fact was not designed to do so. For that reason, choice of hardware and status of drivers for specific versions of hardware is important, and therefore it is also important to keep up to date with driver updates. It's highly likely that had you been keeping up to date with kernels, several of those failures might not have happened. One of the benefits of having many people running a software setup is that when one person hits a bug and you fix it, and then distribute that fix to everyone else, you save everyone else from also hitting that bug. You have
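[Editorial note: whatever the driver reports (or fails to report), a member that md has kicked does show up in /proc/mdstat, where the [total/active] counter drops below full. mdadm --monitor is the proper tool for watching this, so the sketch below is only an illustration of the format; it reads mdstat-style text on stdin so it is self-contained, and the sample layout follows the usual 2.6-era output.]

```shell
#!/bin/sh
# degraded_md: read /proc/mdstat-format text on stdin and print the name
# of every array whose active-member count is below its configured count,
# i.e. whose status token looks like [3/2] instead of [3/3].
degraded_md() {
    awk '
        /^md/ { name = $1 }                 # remember the current array name
        /\[[0-9]+\/[0-9]+\]/ {              # status line with [total/active]
            match($0, /\[[0-9]+\/[0-9]+\]/)
            tok = substr($0, RSTART + 1, RLENGTH - 2)
            split(tok, n, "/")
            if (n[2] + 0 < n[1] + 0) print name
        }'
}

# Real-world use, e.g. from cron:  degraded_md < /proc/mdstat
```

This only notices members the driver has already failed out; an error handler that hangs forever never gets as far as updating mdstat, which is exactly the failure mode argued about above.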
Re: Implementing low level timeouts within MD
On Tue, Oct 30, 2007 at 12:08:07AM -0500, Alberto Alonso wrote:
> * Internal serverworks PATA controller on a netengine server. The server is off waiting to get picked up, so I can't get the important details. 1 PATA failure. I was surprised on this one; I did have good luck with PATA in the past. The kernel is whatever came standard in Fedora Core 2.

The keyword here is probably not PATA but Serverworks... AFAIR that chipset was always considered somewhat problematic. You may want to try the libata driver; it has a nice comment:

 * Note that we don't copy the old serverworks code because the old
 * code contains obvious mistakes

But even the new driver retained this comment from the old driver:

 * Documentation:
 *  Available under NDA only. Errata info very hard to get.

It isn't exactly giving me warm feelings to trust data to this chipset...

Gabor

-- MTA SZTAKI Computer and Automation Research Institute, Hungarian Academy of Sciences
Re: Implementing low level timeouts within MD
On Tue, 2007-10-30 at 00:19 -0500, Alberto Alonso wrote:
> On Sat, 2007-10-27 at 12:33 +0200, Samuel Tardieu wrote:
> > I agree with Doug: nothing prevents you from using md above very slow drivers (such as remote disks, or even a filesystem implemented over a tape device to make it extreme). Only the low-level drivers know when it is appropriate to timeout or fail.
> >
> > Sam
>
> The problem is when some of these drivers are just not smart enough to keep themselves out of trouble. Unfortunately I've been bitten by apparently too many of them.

Really, you've only been bitten by three so far: Serverworks PATA (which I tend to agree with the other person, I would probably chalk this up to Serverworks, not PATA), USB storage, and SATA (the SATA stack is arranged similar to the SCSI stack, with a core library that all the drivers use and then hardware dependent driver modules... I suspect that since you got bit on three different hardware versions you were in fact hitting a core library bug, but that's just a suspicion and I could well be wrong). What you haven't tried is any of the SCSI/SAS/FC stuff, and generally that's what I've always used and had good things to say about. I've only used SATA for my home systems or workstations, not any production servers.

> I'll repeat my plea one more time. Is there a published list of tested combinations that respond well to hardware failures and fully signal the md code so that nothing hangs?

I don't know of one, but like I said, I've not used a lot of the SATA stuff for production. I would make this one suggestion though: SATA is still an evolving driver stack to a certain extent, and as such, keeping with more current kernels than you have been using is likely to be a big factor in whether or not these sorts of things happen.

> If not, I would like to see what people that have experienced hardware failures and survived them are using, so that such a list can be compiled.

-- Doug Ledford
Re: Implementing low level timeouts within MD
On Tue, 2007-10-30 at 00:08 -0500, Alberto Alonso wrote:
> On Mon, 2007-10-29 at 13:22 -0400, Doug Ledford wrote:
> > OK, these you don't get to count. If you run raid over USB... well... you get what you get. IDE never really was a proper server interface, and SATA is much better, but USB was never anything other than a means to connect simple devices without having to put a card in your PC; it was never intended to be a raid transport.
>
> I still count them ;-) I guess I just would have hoped for software raid to really not care about the lower layers.

The job of software raid is to help protect your data. In order to do that, the raid needs to be run over something that *at least* provides a minimum level of reliability itself. The entire USB spec is written under the assumption that a USB device can disappear at any time and the stack must accept that (and it can; just trip on a cable some time and watch your raid device get all pissy). So, yes, software raid can run over any block device, but putting it over an unreliable connection medium is like telling a gladiator that he has to face the lion with no sword, no shield, and his hands tied behind his back. He might survive, but you have so seriously handicapped him that it's all but over.

> > > * Supermicro MB with ICH5/ICH5R controller and 2 RAID5 arrays of 3 disks each. (Only one drive on one array went bad.)
> > > * VIA VT6420 built into the MB with RAID1 across 2 SATA drives.
> > > * And the most complex is this week's server with 4 PCI/PCI-X cards. But the one that hanged the server was a 4 disk RAID5 array on a RocketRAID1540 card.
> >
> > And 3 SATA failures, right? I'm assuming the Supermicro is SATA or else it has more PATA ports than I've ever seen. Was the RocketRAID card in hardware or software raid mode? It sounds like it could be a combination of both, something like hardware on the card and software across the different cards, or something like that. What kernels were these under?
>
> Yes, these 3 were all SATA. The kernels (in the same order as above) are:
>
> * 2.4.21-4.ELsmp #1 (basically RHEL v3)

*Really* old kernel. RHEL3 is in maintenance mode already, and that was the GA kernel. It was also the first RHEL release with SATA support. So: first gen driver on first gen kernel.

> * 2.6.18-4-686 #1 SMP on a Fedora Core release 2
> * 2.6.17.13 (compiled from vanilla sources)
>
> The RocketRAID was configured for all drives as legacy/normal and software RAID5 across all drives. I wasn't using hardware raid on the last described system when it crashed.

So, the system that died *just this week* was running 2.6.17.13? Like I said in my last email, the SATA stack has been evolving over the last few years, and that's quite a few revisions behind. My basic advice is this: if you are going to use the latest and greatest hardware options, then you should either make sure you are using an up to date distro kernel of some sort, or you need to watch the kernel update announcements for fixes related to that hardware and update your kernels/drivers as appropriate.

-- Doug Ledford
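[Editorial note: the "keep your kernel current" advice can be mechanized with a version comparison in a script. The sort(1) trick below is generic; using 2.6.19 as the cutoff for the reworked libata error handling is an assumption drawn from David's "post 2.6.18 (IIRC)" remark earlier in the thread, not a verified boundary.]

```shell
#!/bin/sh
# kver_ge A B: succeed if dotted kernel version A is at least version B.
# Sorts the two version strings numerically field by field and checks
# that B sorts first (or that they are equal).
kver_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" \
        | sort -t. -k1,1n -k2,2n -k3,3n -k4,4n | head -n1)" = "$2" ]
}

# Example: warn if the running kernel predates the assumed libata EH rework.
if kver_ge "$(uname -r | cut -d- -f1)" 2.6.19; then
    echo "kernel is at or past the assumed SATA error-handling rework"
else
    echo "kernel predates the assumed SATA error-handling rework"
fi
```

Run against the kernels cited in this thread, 2.4.21 and 2.6.17.13 both fall before the assumed cutoff, which is consistent with Doug's point about how far behind those systems were.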
Re: Implementing low level timeouts within MD
On Sun, 2007-10-28 at 01:27 -0500, Alberto Alonso wrote: On Sat, 2007-10-27 at 19:55 -0400, Doug Ledford wrote: On Sat, 2007-10-27 at 16:46 -0500, Alberto Alonso wrote: Regardless of the fact that it is not MD's fault, it does make software raid an invalid choice when combined with those drivers. A single disk failure within a RAID5 array bringing a file server down is not a valid option under most situations. Without knowing the exact controller you have and driver you use, I certainly can't tell the situation. However, I will note that there are times when no matter how well the driver is written, the wrong type of drive failure *will* take down the entire machine. For example, on an SPI SCSI bus, a single drive failure that involves a blown terminator will cause the electrical signaling on the bus to go dead no matter what the driver does to try and work around it. Sorry I thought I copied the list with the info that I sent to Richard. Here is the main hardware combinations. --- Excerpt Start Certainly. The times when I had good results (ie. failed drives with properly degraded arrays have been with old PATA based IDE controllers built in the motherboard and the Highpoint PATA cards). The failures (ie. single disk failure bringing the whole server down) have been with the following: * External disks on USB enclosures, both RAID1 and RAID5 (two different systems) Don't know the actual controller for these. I assume it is related to usb-storage, but can probably research the actual chipset, if it is needed. OK, these you don't get to count. If you run raid over USB...well...you get what you get. IDE never really was a proper server interface, and SATA is much better, but USB was never anything other than a means to connect simple devices without having to put a card in your PC, it was never intended to be a raid transport. * Internal serverworks PATA controller on a netengine server. The server if off waiting to get picked up, so I can't get the important details. 
1 PATA failure. * Supermicro MB with ICH5/ICH5R controller and 2 RAID5 arrays of 3 disks each. (only one drive on one array went bad) * VIA VT6420 built into the MB with RAID1 across 2 SATA drives. * And the most complex is this week's server with 4 PCI/PCI-X cards. But the one that hung the server was a 4 disk RAID5 array on a RocketRAID1540 card. And 3 SATA failures, right? I'm assuming the Supermicro is SATA or else it has more PATA ports than I've ever seen. Was the RocketRAID card in hardware or software raid mode? It sounds like it could be a combination of both, something like hardware on the card, and software across the different cards or something like that. What kernels were these under? --- Excerpt End I wasn't even asking as to whether or not it should, I was asking if it could. It could, but without careful control of timeouts for differing types of devices, you could end up making the software raid less reliable instead of more reliable overall. Even if the default timeout was really long (i.e. 1 minute) and then configurable per device (or class) via /proc, it would really help. It's a band-aid. It's working around other bugs in the kernel instead of fixing the real problem. Generally speaking, most modern drivers will work well. It's easier to maintain a list of known bad drivers than known good drivers. That's what has been so frustrating. The old PATA IDE hardware always worked and the new stuff is what has crashed. In all fairness, the SATA core is still relatively young. IDE was around for eons, whereas Jeff started the SATA code just a few years back. In that time I know he's had to deal with both software bugs and hardware bugs that would lock a SATA port up solid with no return. What it sounds like to me is you found some of those. Be careful which hardware raid you choose, as in the past several brands have been known to have the exact same problem you are having with software raid, so you may not end up buying yourself anything. 
(I'm not naming names because it's been long enough since I paid attention to hardware raid driver issues that the issues I knew of could have been solved by now and I don't want to improperly accuse a currently well working driver of being broken) I have settled for 3ware. All my tests showed that it performed quite well and kicked drives out when needed. Of course, I haven't had a bad drive on a 3ware production server yet, so I may end up pulling the little bit of hair I have left. I am now rushing the RocketRAID 2220 into production without testing due to it being the only thing I could get my hands on. I'll report any experiences as they happen. Thanks for all the info, Alberto -- Doug Ledford [EMAIL PROTECTED] GPG KeyID: CFBFF194 http://people.redhat.com/dledford
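The per-device timeout knob discussed above does, in fact, have a close relative in later kernels: the SCSI midlayer exposes a per-device command timeout through sysfs rather than /proc. As a hedged illustration (the helper names here are mine, and the sysfs path assumes a SCSI-backed block device on a sysfs-era kernel):

```python
import os

def timeout_path(dev):
    """sysfs file holding the SCSI command timeout, in seconds,
    for a block device such as 'sda'."""
    return "/sys/block/%s/device/timeout" % dev

def set_scsi_timeout(dev, seconds):
    """Write a new command timeout. Requires root and a device
    actually driven through the SCSI midlayer (sd)."""
    with open(timeout_path(dev), "w") as f:
        f.write(str(seconds))

def get_scsi_timeout(dev):
    """Read the current command timeout in seconds."""
    with open(timeout_path(dev)) as f:
        return int(f.read().strip())
```

Lowering this value makes the low-level driver give up on a wedged command sooner, which is the layer Doug argues the fix belongs in; it does nothing for drivers that hang below the midlayer.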
Re: Implementing low level timeouts within MD
On Sat, 2007-10-27 at 12:33 +0200, Samuel Tardieu wrote: I agree with Doug: nothing prevents you from using md above very slow drivers (such as remote disks, or even a filesystem implemented over a tape device to make it extreme). Only the low-level drivers know when it is appropriate to timeout or fail. Sam The problem is when some of these drivers are just not smart enough to keep themselves out of trouble. Unfortunately I've been bitten by apparently too many of them. I'll repeat my plea one more time. Is there a published list of tested combinations that respond well to hardware failures and fully signal the md code so that nothing hangs? If not, I would like to see what people that have experienced hardware failures and survived them are using, so that such a list can be compiled. Alberto - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Implementing low level timeouts within MD
On Friday October 26, [EMAIL PROTECTED] wrote: I've been asking on my other posts but haven't seen a direct reply to this question: Can MD implement timeouts so that it detects problems when drivers don't come back? No. However it is possible that we will start sending the BIO_RW_FAILFAST flag down on some or all requests. That might make drivers fail more promptly, which might be a good thing. However it won't fix bugs in drivers and - as has been said elsewhere on this thread - that is the real problem. NeilBrown
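Since md itself will not time out stuck requests, the practical fallback is userspace monitoring: mdadm --monitor does this robustly, but the underlying idea can be sketched as a check of /proc/mdstat for arrays whose member bitmap shows a missing disk. This is a minimal illustration, not the mdadm implementation; the function name and parsing approach are mine:

```python
import re

def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status bitmap shows a missing
    member, e.g. [UU_] instead of [UUU], in /proc/mdstat output."""
    bad = []
    current = None
    for line in mdstat_text.splitlines():
        m = re.match(r"^(md\d+) :", line)
        if m:
            current = m.group(1)
        # the status bitmap like [UU_] appears on the line after the
        # device list; [3/2]-style counters contain digits and won't match
        s = re.search(r"\[([U_]+)\]", line)
        if s and current and "_" in s.group(1):
            bad.append(current)
    return bad
```

A cron job or daemon feeding open("/proc/mdstat").read() into this and alerting on a non-empty result would at least surface a degraded array promptly, though it cannot unwedge a driver that never returns.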
Re: Implementing low level timeouts within MD
On Mon, 2007-10-29 at 13:22 -0400, Doug Ledford wrote: OK, these you don't get to count. If you run raid over USB...well...you get what you get. IDE never really was a proper server interface, and SATA is much better, but USB was never anything other than a means to connect simple devices without having to put a card in your PC; it was never intended to be a raid transport. I still count them ;-) I guess I just would have hoped for software raid to really not care about the lower layers. * Internal serverworks PATA controller on a netengine server. The server is off waiting to get picked up, so I can't get the important details. 1 PATA failure. I was surprised on this one, I did have good luck with PATA in the past. The kernel is whatever came standard in Fedora Core 2. * Supermicro MB with ICH5/ICH5R controller and 2 RAID5 arrays of 3 disks each. (only one drive on one array went bad) * VIA VT6420 built into the MB with RAID1 across 2 SATA drives. * And the most complex is this week's server with 4 PCI/PCI-X cards. But the one that hung the server was a 4 disk RAID5 array on a RocketRAID1540 card. And 3 SATA failures, right? I'm assuming the Supermicro is SATA or else it has more PATA ports than I've ever seen. Was the RocketRAID card in hardware or software raid mode? It sounds like it could be a combination of both, something like hardware on the card, and software across the different cards or something like that. What kernels were these under? Yes, these 3 were all SATA. The kernels (in the same order as above) are: * 2.4.21-4.ELsmp #1 (Basically RHEL v3) * 2.6.18-4-686 #1 SMP on a Fedora Core release 2 * 2.6.17.13 (compiled from vanilla sources) The RocketRAID was configured for all drives as legacy/normal and software RAID5 across all drives. I wasn't using hardware raid on the last described system when it crashed. 
Alberto
Re: Implementing low level timeouts within MD
On Sat, 2007-10-27 at 19:55 -0400, Doug Ledford wrote: On Sat, 2007-10-27 at 16:46 -0500, Alberto Alonso wrote: Regardless of the fact that it is not MD's fault, it does make software raid an invalid choice when combined with those drivers. A single disk failure within a RAID5 array bringing a file server down is not a valid option under most situations. Without knowing the exact controller you have and driver you use, I certainly can't tell the situation. However, I will note that there are times when, no matter how well the driver is written, the wrong type of drive failure *will* take down the entire machine. For example, on an SPI SCSI bus, a single drive failure that involves a blown terminator will cause the electrical signaling on the bus to go dead no matter what the driver does to try and work around it. Sorry, I thought I copied the list with the info that I sent to Richard. Here are the main hardware combinations. --- Excerpt Start Certainly. The times when I had good results (i.e. failed drives with properly degraded arrays) have been with old PATA-based IDE controllers built into the motherboard and the Highpoint PATA cards. The failures (i.e. a single disk failure bringing the whole server down) have been with the following: * External disks on USB enclosures, both RAID1 and RAID5 (two different systems). Don't know the actual controller for these. I assume it is related to usb-storage, but can probably research the actual chipset, if it is needed. * Internal serverworks PATA controller on a netengine server. The server is off waiting to get picked up, so I can't get the important details. * Supermicro MB with ICH5/ICH5R controller and 2 RAID5 arrays of 3 disks each. (only one drive on one array went bad) * VIA VT6420 built into the MB with RAID1 across 2 SATA drives. * And the most complex is this week's server with 4 PCI/PCI-X cards. But the one that hung the server was a 4 disk RAID5 array on a RocketRAID1540 card. 
--- Excerpt End I wasn't even asking as to whether or not it should, I was asking if it could. It could, but without careful control of timeouts for differing types of devices, you could end up making the software raid less reliable instead of more reliable overall. Even if the default timeout was really long (i.e. 1 minute) and then configurable per device (or class) via /proc, it would really help. Generally speaking, most modern drivers will work well. It's easier to maintain a list of known bad drivers than known good drivers. That's what has been so frustrating. The old PATA IDE hardware always worked and the new stuff is what has crashed. Be careful which hardware raid you choose, as in the past several brands have been known to have the exact same problem you are having with software raid, so you may not end up buying yourself anything. (I'm not naming names because it's been long enough since I paid attention to hardware raid driver issues that the issues I knew of could have been solved by now and I don't want to improperly accuse a currently well working driver of being broken) I have settled for 3ware. All my tests showed that it performed quite well and kicked drives out when needed. Of course, I haven't had a bad drive on a 3ware production server yet, so I may end up pulling the little bit of hair I have left. I am now rushing the RocketRAID 2220 into production without testing due to it being the only thing I could get my hands on. I'll report any experiences as they happen. Thanks for all the info, Alberto
Re: Implementing low level timeouts within MD
Doug == Doug Ledford [EMAIL PROTECTED] writes: Doug This isn't an md problem, this is a low level disk driver Doug problem. Yell at the author of the disk driver in question. If Doug that driver doesn't time things out and return errors up the Doug stack in a reasonable time, then it's broken. Md should not, Doug and realistically can not, take the place of a properly written Doug low level driver. I agree with Doug: nothing prevents you from using md above very slow drivers (such as remote disks or even a filesystem implemented over a tape device to make it extreme). Only the low-level drivers know when it is appropriate to timeout or fail. Sam -- Samuel Tardieu -- [EMAIL PROTECTED] -- http://www.rfc1149.net/
Re: Implementing low level timeouts within MD
Alberto Alonso wrote: After 4 different array failures all due to a single drive failure I think it would really be helpful if the md code timed out the driver. Hi Alberto, Sorry you've been having so much trouble. For interest, can you tell us what drives and controllers are involved? I've been running md for 8 years and over that time have had probably half a dozen drives failed out of arrays without any problems. Regards, Richard
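Questions like Richard's (which drives and controllers are involved?) can largely be answered from sysfs without powering the box back up, as long as it still boots: each block device's sysfs path carries 'driver' symlinks for every layer above it. A best-effort sketch, assuming a sysfs-era kernel; the function name and the walking strategy are mine:

```python
import os

def driver_chain(dev):
    """All kernel drivers bound along the sysfs path of /dev/<dev>,
    innermost first (e.g. ['sd', 'ahci'] for a SATA disk behind AHCI).
    Returns [] if the device or sysfs is not present."""
    node = os.path.realpath("/sys/block/%s/device" % dev)
    drivers = []
    while node not in ("", "/"):
        link = os.path.join(node, "driver")
        if os.path.islink(link):
            # the symlink target's basename is the driver name
            drivers.append(os.path.basename(os.readlink(link)))
        node = os.path.dirname(node)  # walk up toward the controller
    return drivers
```

Together with the kernel version from os.uname(), this gives exactly the (controller driver, kernel) pair that any list of tested configurations would need to record.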
Re: Implementing low level timeouts within MD
On Fri, 2007-10-26 at 15:00 -0400, Doug Ledford wrote: This isn't an md problem, this is a low level disk driver problem. Yell at the author of the disk driver in question. If that driver doesn't time things out and return errors up the stack in a reasonable time, then it's broken. Md should not, and realistically can not, take the place of a properly written low level driver. I am not arguing whether or not MD is at fault, I know it isn't. Regardless of the fact that it is not MD's fault, it does make software raid an invalid choice when combined with those drivers. A single disk failure within a RAID5 array bringing a file server down is not a valid option under most situations. I wasn't even asking as to whether or not it should, I was asking if it could. Should is a relative term, could is not. If the MD code can not cope with poorly written drivers then a list of valid drivers and cards would be nice to have (that's why I posted my ... when it works and when it doesn't, I was trying to come up with such a list). I only got 1 answer with brand-specific information to figure out when it works and when it doesn't work. My recent experience is that too many drivers seem to have the problem, so software raid is no longer an option for any new systems that I build, and as time and money permit I'll be switching all my legacy servers to hardware/firmware raid. Thanks, Alberto
Re: Implementing low level timeouts within MD
Alberto Alonso wrote: What hardware do you use? I was trying to compile a list of known configurations capable of detecting failures and degrading properly. To date I have not yet had a SATA-based array drive go faulty - all mine have been PATA arrays on Intel or AMD MB controllers, which, as per your experience, have failed out drives OK. I have one 3ware PATA card that is running hardware RAID10 and it has failed 4 drives over the years without trouble. Regards, Richard
Re: Implementing low level timeouts within MD
On Sat, 2007-10-27 at 16:46 -0500, Alberto Alonso wrote: On Fri, 2007-10-26 at 15:00 -0400, Doug Ledford wrote: This isn't an md problem, this is a low level disk driver problem. Yell at the author of the disk driver in question. If that driver doesn't time things out and return errors up the stack in a reasonable time, then it's broken. Md should not, and realistically can not, take the place of a properly written low level driver. I am not arguing whether or not MD is at fault, I know it isn't. Regardless of the fact that it is not MD's fault, it does make software raid an invalid choice when combined with those drivers. A single disk failure within a RAID5 array bringing a file server down is not a valid option under most situations. Without knowing the exact controller you have and driver you use, I certainly can't tell the situation. However, I will note that there are times when no matter how well the driver is written, the wrong type of drive failure *will* take down the entire machine. For example, on an SPI SCSI bus, a single drive failure that involves a blown terminator will cause the electrical signaling on the bus to go dead no matter what the driver does to try and work around it. I wasn't even asking as to whether or not it should, I was asking if it could. It could, but without careful control of timeouts for differing types of devices, you could end up making the software raid less reliable instead of more reliable overall. Should is a relative term, could is not. If the MD code can not cope with poorly written drivers then a list of valid drivers and cards would be nice to have (that's why I posted my ... when it works and when it doesn't, I was trying to come up with such a list). Generally speaking, most modern drivers will work well. It's easier to maintain a list of known bad drivers than known good drivers. I only got 1 answer with brand specific information to figure out when it works and when it doesn't work. 
My recent experience is that too many drivers seem to have the problem so software raid is no longer an option for any new systems that I build, and as time and money permits I'll be switching to hardware/firmware raid all my legacy servers. Be careful which hardware raid you choose, as in the past several brands have been known to have the exact same problem you are having with software raid, so you may not end up buying yourself anything. (I'm not naming names because it's been long enough since I paid attention to hardware raid driver issues that the issues I knew of could have been solved by now and I don't want to improperly accuse a currently well working driver of being broken)
Re: Implementing low level timeouts within MD
On Fri, 2007-10-26 at 12:12 -0500, Alberto Alonso wrote: I've been asking on my other posts but haven't seen a direct reply to this question: Can MD implement timeouts so that it detects problems when drivers don't come back? For me this year shall be known as the year the array stood still (bad scifi reference :-) After 4 different array failures all due to a single drive failure I think it would really be helpful if the md code timed out the driver. This isn't an md problem, this is a low level disk driver problem. Yell at the author of the disk driver in question. If that driver doesn't time things out and return errors up the stack in a reasonable time, then it's broken. Md should not, and realistically can not, take the place of a properly written low level driver.