Re: Implementing low level timeouts within MD

2007-11-02 Thread Doug Ledford
On Fri, 2007-11-02 at 03:41 -0500, Alberto Alonso wrote:
 On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote:
  Not in the older kernel versions you were running, no.
 
 These old versions (especially the RHEL) are supposed to be
 the official versions supported by Red Hat and the hardware
 vendors, as they were very specific as to what versions of
 Linux were supported.

The key word here being "supported."  That means if you run across a
problem, we fix it.  It doesn't mean there will never be any problems.

 Of all people, I would think you would
 appreciate that. Sorry if I sound frustrated and upset, but
 it is clearly a result of what "supported" and "tested" really
 mean in this case.

I'm sorry, but given the "especially the RHEL" case you cited, it is
clear I can't help you.  No one can.  You were running first gen
software on first gen hardware.  You show me *any* software company
whose first gen software never has to be updated to fix bugs, and I'll
show you a software company that went out of business the day after
they released their software.

Our RHEL3 update kernels contained *significant* updates to the SATA
stack after our GA release, replete with hardware driver updates and bug
fixes.  I don't know *when* that RHEL3 system failed, but I would
venture a guess that it wasn't prior to RHEL3 Update 1.  So, I'm
guessing you didn't take advantage of those bug fixes.  And I would
hardly call once a quarter "continuously updating your kernel."  In any
case, given your insistence on running first gen software on first gen
hardware and not taking advantage of the support we *did* provide to
protect you against that failure, I say again that I can't help you.

 I don't want to go into a discussion of
 commercial distros, which are supported, as this is neither the
 time nor the place, but I don't want to open the door to the
 excuse of "it's an old kernel"; it wasn't when it got installed.

I *really* can't help you.

 Outside of the rejected suggestion, I just want to figure out 
 when software raid works and when it doesn't. With SATA, my 
 experience is that it doesn't. So far I've only received one 
 response stating success (they were using the 3ware and Areca 
 product lines).

No, your experience, as you listed it, is that
SATA/usb-storage/Serverworks PATA failed you.  The software raid never
failed to perform as designed.

However, one of the things you are doing here is drawing sweeping
generalizations that are totally invalid.  You are saying your
experience is that SATA doesn't work, but you aren't qualifying it with
the key factor: SATA doesn't work in what kernel version?  It is
pointless to try and establish whether or not something like SATA works
in a global, all kernel inclusive fashion because the answer to the
question varies depending on the kernel version.  And the same is true
of pretty much every driver you can name.  This is why commercial
companies don't just certify hardware, but the software version that
actually works as opposed to all versions.  In truth, you have *no idea*
if SATA works today, because you haven't tried.  As David pointed out,
there was a significant overhaul of the SATA error recovery that took
place *after* the kernel versions that failed you which totally
invalidates your experiences and requires retesting of the later
software to see if it performs differently.

 Anyway, this thread just posed the question, and as Neil pointed
 out, it isn't feasible/worthwhile to implement timeouts within the md
 code. I think most of the points/discussions raised beyond that
 original question really belong to the thread "Software RAID when
 it works and when it doesn't".
 
 I do appreciate all comments and suggestions and I hope to keep
 them coming. I would hope, however, to hear more about success
 stories with specific hardware details. It would be helpful
 to have a list of tested configurations that are known to work.

I've had *lots* of success with software RAID as I've been running it
for years.  I've had old PATA drives fail, SCSI drives fail, FC drives
fail, and I've had SATA drives that got kicked from the array due to
read errors but not out and out drive failures.  But I keep at least
reasonably up to date with my kernels.

-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband




Re: Implementing low level timeouts within MD

2007-11-02 Thread Bill Davidsen

Alberto Alonso wrote:

 On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote:
  Not in the older kernel versions you were running, no.
 
 These old versions (especially the RHEL) are supposed to be
 the official versions supported by Red Hat and the hardware
 vendors, as they were very specific as to what versions of
 Linux were supported.

So the vendors of the failing drives claimed that these kernels were 
supported? That's great, most vendors don't even consider Linux 
supported. What response did you get when you reported the problem to 
Red Hat on your RHEL support contract? Did they agree that this hardware, 
and its use for software raid, was supported and intended?

 Of all people, I would think you would
 appreciate that. Sorry if I sound frustrated and upset, but
 it is clearly a result of what "supported" and "tested" really
 mean in this case. I don't want to go into a discussion of
 commercial distros, which are supported, as this is neither the
 time nor the place, but I don't want to open the door to the
 excuse of "it's an old kernel"; it wasn't when it got installed.

The problem is in the time travel module. It didn't properly cope with 
future hardware, and since you have very long uptimes, I'm reasonably 
sure you haven't updated the kernel to get fixes installed.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: Implementing low level timeouts within MD

2007-11-02 Thread David Greaves
Alberto Alonso wrote:
 On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote:
 Not in the older kernel versions you were running, no.
 
 These old versions (especially the RHEL) are supposed to be
 the official versions supported by Red Hat and the hardware
 vendors, as they were very specific as to what versions of
 Linux were supported. Of all people, I would think you would
 appreciate that. Sorry if I sound frustrated and upset, but
 it is clearly a result of what "supported" and "tested" really
 mean in this case. I don't want to go into a discussion of
 commercial distros, which are supported, as this is neither the
 time nor the place, but I don't want to open the door to the
 excuse of "it's an old kernel"; it wasn't when it got installed.

It may be worth noting that the context of this email is the upstream linux-raid
 list. In my time watching the list it is mainly focused on 'current' code and
development (but hugely supportive of older environments).
In general, discussions in this context will have a certain mindset - and it's
not going to be the same as that which you'd find in an enterprise product
support list.

 Outside of the rejected suggestion, I just want to figure out 
 when software raid works and when it doesn't. With SATA, my 
 experience is that it doesn't.

SATA, or more precisely, error handling in SATA has recently been significantly
overhauled by Tejun Heo (IIRC). We're talking post 2.6.18 though (again IIRC) -
so as far as SATA EH goes, older kernels bear no relation to the new ones.

And the initial SATA EH code was, of course, beta :)

David
PS I can't really contribute to your list - I'm only using cheap desktop 
hardware.


Re: Implementing low level timeouts within MD

2007-11-02 Thread Alberto Alonso
On Fri, 2007-11-02 at 11:09 +, David Greaves wrote:

 David
 PS I can't really contribute to your list - I'm only using cheap desktop 
 hardware.
 -

If you had failures and it handled them properly, then you can 
contribute to the good combinations; so far that's the list
that is kind of empty :-(

Thanks,

Alberto



Re: Implementing low level timeouts within MD

2007-11-02 Thread Alberto Alonso
On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote:
 I wasn't belittling them.  I was trying to isolate the likely culprit in
 the situations.  You seem to want the md stack to time things out.  As
 has already been commented by several people, myself included, that's a
 band-aid and not a fix in the right place.  The linux kernel community
 in general is pretty hard lined when it comes to fixing the bug in the
 wrong way.

It did sound as if I was complaining about nothing and that I shouldn't
bother the linux-raid people and instead just continuously update the
kernel and stop raising issues. If I misunderstood you I'm sorry, but
somehow I still think that belittling my problems was implied in your
responses.

 Not in the older kernel versions you were running, no.

These old versions (especially the RHEL) are supposed to be
the official versions supported by Red Hat and the hardware 
vendors, as they were very specific as to what versions of 
Linux were supported. Of all people, I would think you would
appreciate that. Sorry if I sound frustrated and upset, but 
it is clearly a result of what "supported" and "tested" really 
mean in this case. I don't want to go into a discussion of
commercial distros, which are supported, as this is neither the
time nor the place, but I don't want to open the door to the
excuse of "it's an old kernel"; it wasn't when it got installed.

 And I guarantee not a single one of those systems even knows what SATA
 is.  They all use tried and true SCSI/FC technology.

Sure, the Tru64 units I talked about don't use SATA (although 
some did use PATA); I'll concede that point.

 In any case, if Neil is so inclined to do so, he can add timeout code
 into the md stack, it's not my decision to make.

The timeout was nothing more than a suggestion based on what
I consider a reasonable expectation of usability. Neil said no
and I respect that. If I didn't, I could always write my own as
per the open source model :-) But I am not inclined to do so.

Outside of the rejected suggestion, I just want to figure out 
when software raid works and when it doesn't. With SATA, my 
experience is that it doesn't. So far I've only received one 
response stating success (they were using the 3ware and Areca 
product lines).

Anyway, this thread just posed the question, and as Neil pointed
out, it isn't feasible/worthwhile to implement timeouts within the md
code. I think most of the points/discussions raised beyond that
original question really belong to the thread "Software RAID when
it works and when it doesn't".

I do appreciate all comments and suggestions and I hope to keep
them coming. I would hope, however, to hear more about success
stories with specific hardware details. It would be helpful
to have a list of tested configurations that are known to work.

Alberto



Re: Implementing low level timeouts within MD

2007-11-02 Thread Doug Ledford
On Fri, 2007-11-02 at 13:21 -0500, Alberto Alonso wrote:
 On Fri, 2007-11-02 at 11:45 -0400, Doug Ledford wrote:
 
  The key word here being "supported."  That means if you run across a
  problem, we fix it.  It doesn't mean there will never be any problems.
 
 On hardware specs I normally read "supported" as "tested within that
 OS version to work within specs." I may be expecting too much.

It was tested; it simply had a bug that you hit.  Assuming that
your particular failure situation is the only possible outcome for all
the other people that used it would be an invalid assumption.  There are
lots of code paths in an error handler routine, and lots of different
hardware failure scenarios, and they each have their own independent
outcome should they ever be experienced.

  I'm sorry, but given the "especially the RHEL" case you cited, it is
  clear I can't help you.  No one can.  You were running first gen
  software on first gen hardware.  You show me *any* software company
  whose first gen software never has to be updated to fix bugs, and I'll
  show you a software company that went out of business the day after
  they released their software.
 
 I only pointed to RHEL as an example since that was a particular
 distro that I use and that exhibited the problem. I probably could have
 replaced it with SUSE, Ubuntu, etc. I may have called the early
 versions back in '94 first gen, but not today's versions. I know I
 didn't expect the SLS distro to work reliably back then.

Then you didn't pay attention to what I said before: RHEL3 was the first
ever RHEL product that had support for SATA hardware.  The SATA drivers
in RHEL3 *were* first gen.

 Can you provide specific chipsets that you used (especially for SATA)?

All of the Adaptec SCSI chipsets through the 7899, Intel PATA, QLogic
FC, and nVidia- and Winbond-based SATA.

-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband




Re: Implementing low level timeouts within MD

2007-11-02 Thread Alberto Alonso
On Fri, 2007-11-02 at 15:15 -0400, Doug Ledford wrote:
 It was tested; it simply had a bug that you hit.  Assuming that
 your particular failure situation is the only possible outcome for all
 the other people that used it would be an invalid assumption.  There are
 lots of code paths in an error handler routine, and lots of different
 hardware failure scenarios, and they each have their own independent
 outcome should they ever be experienced.

This is the kind of statement that made me say you were belittling my 
experiences.

And to think that, since I've hit it on three different machines with
different hardware and different kernel versions, it won't affect
others is something else. I thought I was helping, but don't worry, I
learned my lesson; it won't happen again. I asked people for their
experiences; clearly not everybody is as lucky as I am.

 Then you didn't pay attention to what I said before: RHEL3 was the first
 ever RHEL product that had support for SATA hardware.  The SATA drivers
 in RHEL3 *were* first gen.

Oh, I paid attention alright. It is my fault for assuming that things
not marked as experimental are not experimental.

Alberto



Re: Implementing low level timeouts within MD

2007-11-01 Thread Bill Davidsen

Alberto Alonso wrote:

 On Tue, 2007-10-30 at 13:39 -0400, Doug Ledford wrote:
  Really, you've only been bitten by three so far.  Serverworks PATA
  (which I tend to agree with the other person, I would probably chalk

 3 types of bugs is too many, it basically affected all my customers
 with multi-terabyte arrays. Heck, we can also oversimplify things and 
 say that it is really just one type and define everything as kernel type
 problems (or as some other kernel used to say... general protection
 error).
 
 I am sorry for not having hundreds of RAID servers from which to draw
 statistical analysis. As I have clearly stated in the past I am trying
 to come up with a list of known combinations that work. I think my
 data points are worth something to some people, especially those 
 considering SATA drives and software RAID for their file servers. If
 you don't consider them important for you that's fine, but please don't
 belittle them just because they don't match your needs.

  this up to Serverworks, not PATA), USB storage, and SATA (the SATA stack
  is arranged similar to the SCSI stack with a core library that all the
  drivers use, and then hardware dependent driver modules...I suspect that
  since you got bit on three different hardware versions that you were in
  fact hitting a core library bug, but that's just a suspicion and I could
  well be wrong).  What you haven't tried is any of the SCSI/SAS/FC stuff,
  and generally that's what I've always used and had good things to say
  about.  I've only used SATA for my home systems or workstations, not any
  production servers.

 The USB array was never meant to be a full production system, just to 
 buy some time until the budget was allocated to buy a real array. Having
 said that, the raid code is written to withstand the USB disks getting
 disconnected as long as the driver reports it properly. Since it doesn't,
 I consider it another case that shows when not to use software RAID
 thinking that it will work.
 
 As for SCSI, I think it is a greatly proven and reliable technology; I've
 dealt with it extensively and have always had great results. I now deal
 with it mostly on non Linux based systems. But I don't think it is
 affordable to most SMBs that need multi-terabyte arrays.

Actually, SCSI can fail as well. Until recently I was running servers 
with multi-TB arrays, and regularly, several times a year, a drive would 
fail and glitch the SCSI bus such that the next i/o to another drive 
would fail. And I've had SATA drives fail cleanly on small machines, so 
neither is an always config.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: Implementing low level timeouts within MD

2007-11-01 Thread Bill Davidsen

Alberto Alonso wrote:

 On Mon, 2007-10-29 at 13:22 -0400, Doug Ledford wrote:
  What kernels were these under?
 
 Yes, these 3 were all SATA. The kernels (in the same order as above) 
 are:
 
 * 2.4.21-4.ELsmp #1 (Basically RHEL v3)
 * 2.6.18-4-686 #1 SMP on a Fedora Core release 2
 * 2.6.17.13 (compiled from vanilla sources)

*Old* kernels. If you are going to build your own kernel, get a new one!

 The RocketRAID was configured for all drives as legacy/normal and
 software RAID5 across all drives. I wasn't using hardware raid on
 the last described system when it crashed.

--

bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: Implementing low level timeouts within MD

2007-11-01 Thread Doug Ledford
On Thu, 2007-11-01 at 00:08 -0500, Alberto Alonso wrote:
 On Tue, 2007-10-30 at 13:39 -0400, Doug Ledford wrote:
  
  Really, you've only been bitten by three so far.  Serverworks PATA
  (which I tend to agree with the other person, I would probably chalk
 
 3 types of bugs is too many, it basically affected all my customers
 with  multi-terabyte arrays. Heck, we can also oversimplify things and 
 say that it is really just one type and define everything as kernel type
 problems (or as some other kernel used to say... general protection
 error).
 
 I am sorry for not having hundreds of RAID servers from which to draw
 statistical analysis. As I have clearly stated in the past I am trying
 to come up with a list of known combinations that work. I think my
 data points are worth something to some people, especially those 
 considering SATA drives and software RAID for their file servers. If
 you don't consider them important for you that's fine, but please don't
 belittle them just because they don't match your needs.

I wasn't belittling them.  I was trying to isolate the likely culprit in
the situations.  You seem to want the md stack to time things out.  As
has already been commented by several people, myself included, that's a
band-aid and not a fix in the right place.  The linux kernel community
in general is pretty hard lined when it comes to fixing the bug in the
wrong way.

  this up to Serverworks, not PATA), USB storage, and SATA (the SATA stack
  is arranged similar to the SCSI stack with a core library that all the
  drivers use, and then hardware dependent driver modules...I suspect that
  since you got bit on three different hardware versions that you were in
  fact hitting a core library bug, but that's just a suspicion and I could
  well be wrong).  What you haven't tried is any of the SCSI/SAS/FC stuff,
  and generally that's what I've always used and had good things to say
  about.  I've only used SATA for my home systems or workstations, not any
  production servers.
 
 The USB array was never meant to be a full production system, just to 
 buy some time until the budget was allocated to buy a real array. Having
 said that, the raid code is written to withstand the USB disks getting
 disconnected as long as the driver reports it properly. Since it doesn't,
 I consider it another case that shows when not to use software RAID
 thinking that it will work.
 
 As for SCSI, I think it is a greatly proven and reliable technology; I've
 dealt with it extensively and have always had great results. I now deal
 with it mostly on non Linux based systems. But I don't think it is
 affordable to most SMBs that need multi-terabyte arrays.
 
  
   I'll repeat my plea one more time. Is there a published list
   of tested combinations that respond well to hardware failures
   and fully signals the md code so that nothing hangs?
  
  I don't know of one, but like I said, I've not used a lot of the SATA
  stuff for production.  I would make this one suggestion though, SATA is
  still an evolving driver stack to a certain extent, and as such, keeping
  with more current kernels than you have been using is likely to be a big
  factor in whether or not these sorts of things happen.
 
 OK, so based on this it seems that you would not recommend the use
 of SATA for production systems due to its immaturity, correct?

Not in the older kernel versions you were running, no.

  Keep in
 mind that production systems are not able to be brought down just to
 keep up with kernel changes. We have some Tru64 production servers with
 1500 to 2500 days of uptime; that's not uncommon in industry.

And I guarantee not a single one of those systems even knows what SATA
is.  They all use tried and true SCSI/FC technology.

In any case, if Neil is so inclined to do so, he can add timeout code
into the md stack, it's not my decision to make.

However, I would say that the current RAID subsystem relies on the
underlying disk subsystem to report errors when they occur instead of
hanging infinitely, which implies that the raid subsystem relies upon a
bug free low level driver.  It is intended to deal with hardware
failure, in as much as possible, and a driver bug isn't a hardware
failure.  You are asking the RAID subsystem to be extended to deal with
software errors as well.

Even though you may have thought it should handle this type of failure
when you put those systems together, it in fact was not designed to do
so.  For that reason, choice of hardware and status of drivers for
specific versions of hardware is important, and therefore it is also
important to keep up to date with driver updates.
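To illustrate that division of labour, here is a toy sketch in C.  It is
not the actual md source; every name in it is made up for the example,
but it mirrors the design: the raid layer only ever reacts to
completions the low-level driver delivers.

#include <stdio.h>

/* Hypothetical, simplified model of a raid member device. */
struct member { const char *name; int faulty; };

/* md-style behaviour: an error completion fails the member; a request
 * the driver never completes never reaches this callback at all, and
 * the array just waits.  That is why the timeout has to live below
 * the raid layer. */
static void raid_end_request(struct member *m, int error)
{
	if (error) {
		m->faulty = 1;
		printf("%s: driver reported error %d, member failed\n",
		       m->name, error);
	} else {
		printf("%s: I/O completed, result passed up the stack\n",
		       m->name);
	}
}

int main(void)
{
	struct member sda = { "sda", 0 };

	raid_end_request(&sda, 0);   /* healthy completion       */
	raid_end_request(&sda, -5);  /* driver reported an error */
	/* A hung driver corresponds to raid_end_request() never being
	 * called: nothing at this layer can tell "slow" from "stuck". */
	return 0;
}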

It's highly likely that had you been keeping up to date with kernels,
several of those failures might not have happened.  One of the benefits
of having many people running a software setup is that when one person
hits a bug and you fix it, and then distribute that fix to everyone
else, you save everyone else from also hitting that bug.  You have

Re: Implementing low level timeouts within MD

2007-10-30 Thread Gabor Gombas
On Tue, Oct 30, 2007 at 12:08:07AM -0500, Alberto Alonso wrote:

   * Internal serverworks PATA controller on a netengine server. The
 server is off waiting to get picked up, so I can't get the important
 details.
  
  1 PATA failure.
 
 I was surprised on this one; I did have good luck with PATA in
 the past. The kernel is whatever came standard in Fedora Core 2

The keyword here is probably not PATA but Serverworks... AFAIR that
chipset was always considered somewhat problematic. You may want to try
with the libata driver, it has a nice comment:

 *  Note that we don't copy the old serverworks code because the old
 *  code contains obvious mistakes

But even the new driver retained this comment from the old driver:

 * Documentation:
 *  Available under NDA only. Errata info very hard to get.

It isn't exactly giving me warm feelings to trust data to this chipset...

Gabor

-- 
 -
 MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
 -


Re: Implementing low level timeouts within MD

2007-10-30 Thread Doug Ledford
On Tue, 2007-10-30 at 00:19 -0500, Alberto Alonso wrote:
 On Sat, 2007-10-27 at 12:33 +0200, Samuel Tardieu wrote:
  I agree with Doug: nothing prevents you from using md above very slow
  drivers (such as remote disks or even a filesystem implemented over a
  tape device to make it extreme). Only the low-level drivers know when
  it is appropriate to timeout or fail.
  
Sam
 
 The problem is when some of these drivers are just not smart
 enough to keep themselves out of trouble. Unfortunately I've
 been bitten by apparently too many of them.

Really, you've only been bitten by three so far.  Serverworks PATA
(which I tend to agree with the other person, I would probably chalk
this up to Serverworks, not PATA), USB storage, and SATA (the SATA stack
is arranged similar to the SCSI stack with a core library that all the
drivers use, and then hardware dependent driver modules...I suspect that
since you got bit on three different hardware versions that you were in
fact hitting a core library bug, but that's just a suspicion and I could
well be wrong).  What you haven't tried is any of the SCSI/SAS/FC stuff,
and generally that's what I've always used and had good things to say
about.  I've only used SATA for my home systems or workstations, not any
production servers.

 I'll repeat my plea one more time. Is there a published list
 of tested combinations that respond well to hardware failures
 and fully signals the md code so that nothing hangs?

I don't know of one, but like I said, I've not used a lot of the SATA
stuff for production.  I would make this one suggestion though, SATA is
still an evolving driver stack to a certain extent, and as such, keeping
with more current kernels than you have been using is likely to be a big
factor in whether or not these sorts of things happen.

 If not, I would like to see what people that have experienced
 hardware failures and survived them are using so that such
 a list can be compiled.

-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband




Re: Implementing low level timeouts within MD

2007-10-30 Thread Doug Ledford
On Tue, 2007-10-30 at 00:08 -0500, Alberto Alonso wrote:
 On Mon, 2007-10-29 at 13:22 -0400, Doug Ledford wrote:
 
  OK, these you don't get to count.  If you run raid over USB...well...you
  get what you get.  IDE never really was a proper server interface, and
  SATA is much better, but USB was never anything other than a means to
  connect simple devices without having to put a card in your PC, it was
  never intended to be a raid transport.
 
 I still count them ;-) I guess I just would have hoped for software raid
 to really not care about the lower layers.

The job of software raid is to help protect your data.  In order to do
that, the raid needs to be run over something that *at least* provides a
minimum level of reliability itself.  The entire USB spec is written
under the assumption that a USB device can disappear at any time and the
stack must accept that (and it can, just trip on a cable some time and
watch your raid device get all pissy).  So, yes, software raid can run
over any block device, but putting it over an unreliable connection
medium is like telling a gladiator that he has to face the lion with no
sword, no shield, and his hands tied behind his back.  He might survive,
but you have so seriously handicapped him that it's all but over.

  
   * Supermicro MB with ICH5/ICH5R controller and 2 RAID5 arrays of 3 
 disks each. (only one drive on one array went bad)
   
   * VIA VT6420 built into the MB with RAID1 across 2 SATA drives.
   
   * And the most complex is this week's server with 4 PCI/PCI-X cards.
  But the one that hung the server was a 4 disk RAID5 array on a
 RocketRAID1540 card.
  
  And 3 SATA failures, right?  I'm assuming the Supermicro is SATA or else
  it has more PATA ports than I've ever seen.
  
  Was the RocketRAID card in hardware or software raid mode?  It sounds
  like it could be a combination of both, something like hardware on the
  card, and software across the different cards or something like that.
  
  What kernels were these under?
 
 
 Yes, these 3 were all SATA. The kernels (in the same order as above) 
 are:
 
 * 2.4.21-4.ELsmp #1 (Basically RHEL v3)

*Really* old kernel.  RHEL3 is in maintenance mode already, and that was
the GA kernel.  It was also the first RHEL release with SATA support.
So, first gen driver on first gen kernel.

 * 2.6.18-4-686 #1 SMP on a Fedora Core release 2
 * 2.6.17.13 (compiled from vanilla sources)
 
 The RocketRAID was configured for all drives as legacy/normal and
 software RAID5 across all drives. I wasn't using hardware raid on
 the last described system when it crashed.

So, the system that died *just this week* was running 2.6.17.13?  Like I
said in my last email, the SATA stack has been evolving over the last
few years, and that's quite a few revisions behind.  My basic advice is
this: if you are going to use the latest and greatest hardware options,
then you should either make sure you are using an up to date distro
kernel of some sort or you need to watch the kernel update announcements
for fixes related to that hardware and update your kernels/drivers as
appropriate.

-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband




Re: Implementing low level timeouts within MD

2007-10-29 Thread Doug Ledford
On Sun, 2007-10-28 at 01:27 -0500, Alberto Alonso wrote:
 On Sat, 2007-10-27 at 19:55 -0400, Doug Ledford wrote:
  On Sat, 2007-10-27 at 16:46 -0500, Alberto Alonso wrote:
   Regardless of the fact that it is not MD's fault, it does make
   software raid an invalid choice when combined with those drivers. A
   single disk failure within a RAID5 array bringing a file server down
   is not a valid option under most situations.
  
  Without knowing the exact controller you have and driver you use, I
  certainly can't tell the situation.  However, I will note that there are
  times when no matter how well the driver is written, the wrong type of
  drive failure *will* take down the entire machine.  For example, on an
  SPI SCSI bus, a single drive failure that involves a blown terminator
  will cause the electrical signaling on the bus to go dead no matter what
  the driver does to try and work around it.
 
 Sorry I thought I copied the list with the info that I sent to Richard.
 Here is the main hardware combinations.
 
 --- Excerpt Start 
 Certainly. The times when I had good results (ie. failed drives
 with properly degraded arrays) have been with old PATA based IDE 
 controllers built into the motherboard and the Highpoint PATA
 cards. The failures (ie. single disk failure bringing the whole
 server down) have been with the following:
 
 * External disks on USB enclosures, both RAID1 and RAID5 (two different
   systems) Don't know the actual controller for these. I assume it is
   related to usb-storage, but can probably research the actual chipset,
   if it is needed.

OK, these you don't get to count.  If you run raid over USB...well...you
get what you get.  IDE never really was a proper server interface, and
SATA is much better, but USB was never anything other than a means to
connect simple devices without having to put a card in your PC, it was
never intended to be a raid transport.

 * Internal serverworks PATA controller on a netengine server. The
   server is off waiting to get picked up, so I can't get the important
   details.

1 PATA failure.

 * Supermicro MB with ICH5/ICH5R controller and 2 RAID5 arrays of 3 
   disks each. (only one drive on one array went bad)
 
 * VIA VT6420 built into the MB with RAID1 across 2 SATA drives.
 
 * And the most complex is this week's server with 4 PCI/PCI-X cards.
   But the one that hung the server was a 4 disk RAID5 array on a
   RocketRAID1540 card.

And 3 SATA failures, right?  I'm assuming the Supermicro is SATA or else
it has more PATA ports than I've ever seen.

Was the RocketRAID card in hardware or software raid mode?  It sounds
like it could be a combination of both, something like hardware on the
card, and software across the different cards or something like that.

What kernels were these under?

 --- Excerpt End 
 
  
   I wasn't even asking as to whether or not it should, I was asking if
   it could.
  
  It could, but without careful control of timeouts for differing types of
  devices, you could end up making the software raid less reliable instead
  of more reliable overall.
 
 Even if the default timeout was really long (ie. 1 minute) and then
 configurable on a per-device (or per-class) basis via /proc, it would really help.

It's a band-aid.  It's working around other bugs in the kernel instead
of fixing the real problem.
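For what it's worth, a knob like that already exists in the layer below
md: the SCSI/libata midlayer exposes a per-device command timeout (in
seconds) through sysfs.  A minimal sketch of reading it in C (the path
layout is the usual one on 2.6-era kernels and "sda" is just an example
device name, so treat both as assumptions about your particular system):

#include <stdio.h>

int main(void)
{
	/* Per-device command timeout exposed by the SCSI midlayer. */
	const char *path = "/sys/block/sda/device/timeout";
	FILE *f = fopen(path, "r");
	int seconds;

	if (!f) {
		perror(path);
		return 1;
	}
	if (fscanf(f, "%d", &seconds) == 1)
		printf("sda command timeout: %d seconds\n", seconds);
	fclose(f);
	return 0;
}

Writing a new value into that same file (as root) changes the timeout
for that one device, which is roughly the per-device tuning being asked
for, only implemented where the request state actually lives.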

  Generally speaking, most modern drivers will work well.  It's easier to
  maintain a list of known bad drivers than known good drivers.
 
 That's what has been so frustrating. The old PATA IDE hardware always
 worked and the new stuff is what has crashed.

In all fairness, the SATA core is still relatively young.  IDE was
around for eons, whereas Jeff started the SATA code just a few years
back.  In that time I know he's had to deal with both software bugs and
hardware bugs that would lock a SATA port up solid with no return.  What
it sounds like to me is you found some of those.

  Be careful which hardware raid you choose, as in the past several brands
  have been known to have the exact same problem you are having with
  software raid, so you may not end up buying yourself anything.  (I'm not
  naming names because it's been long enough since I paid attention to
  hardware raid driver issues that the issues I knew of could have been
  solved by now and I don't want to improperly accuse a currently well
  working driver of being broken)
 
 I have settled for 3ware. All my tests showed that it performed quite
 well and kicked drives out when needed. Of course, I haven't had a
 bad drive on a 3ware production server yet, so I may end up
 pulling the little bit of hair I have left.
 
 I am now rushing the RocketRAID 2220 into production without testing
 due to it being the only thing I could get my hands on. I'll report
 any experiences as they happen.
 
 Thanks for all the info,
 
 Alberto
 
-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford


Re: Implementing low level timeouts within MD

2007-10-29 Thread Alberto Alonso
On Sat, 2007-10-27 at 12:33 +0200, Samuel Tardieu wrote:
 I agree with Doug: nothing prevents you from using md above very slow
 drivers (such as remote disks or even a filesystem implemented over a
 tape device to make it extreme). Only the low-level drivers know when
 it is appropriate to timeout or fail.
 
   Sam

The problem is when some of these drivers are just not smart
enough to keep themselves out of trouble. Unfortunately I've
been bitten by apparently too many of them.

I'll repeat my plea one more time. Is there a published list
of tested combinations that respond well to hardware failures
and fully signal the md code so that nothing hangs?

If not, I would like to see what people that have experienced
hardware failures and survived them are using so that such
a list can be compiled.

Alberto




Re: Implementing low level timeouts within MD

2007-10-29 Thread Neil Brown
On Friday October 26, [EMAIL PROTECTED] wrote:
 I've been asking on my other posts but haven't seen
 a direct reply to this question:
 
 Can MD implement timeouts so that it detects problems when
 drivers don't come back?

No.
However it is possible that we will start sending the BIO_RW_FAILFAST
flag down on some or all requests.  That might make drivers fail more
promptly, which might be a good thing.  However, it won't fix bugs in
drivers and - as has been said elsewhere on this thread - that is the
real problem.
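
For readers who haven't met the flag: BIO_RW_FAILFAST is a hint in the
2.6-era block layer that asks the lower layers to report failure
promptly instead of retrying at length.  Setting it looks roughly like
this (a sketch against the block-layer interfaces of that time, not
md's actual code; the exact names changed in later kernels):

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/fs.h>

/* Sketch only: flag a bio "failfast" before submitting it, so a broken
 * or dying device gets reported up the stack quickly and the raid layer
 * can retry on another member instead. */
static void submit_failfast_read(struct bio *bio)
{
	bio->bi_rw |= (1 << BIO_RW_FAILFAST);
	submit_bio(READ, bio);
}

A hint like this only helps if the low-level driver honours it; it
cannot unstick a driver that never completes the request at all, which
is the bug being discussed here.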

NeilBrown



Re: Implementing low level timeouts within MD

2007-10-29 Thread Alberto Alonso
On Mon, 2007-10-29 at 13:22 -0400, Doug Ledford wrote:

 OK, these you don't get to count.  If you run raid over USB...well...you
 get what you get.  IDE never really was a proper server interface, and
 SATA is much better, but USB was never anything other than a means to
 connect simple devices without having to put a card in your PC, it was
 never intended to be a raid transport.

I still count them ;-) I guess I just would have hoped for software raid
to really not care about the lower layers.
 
  * Internal serverworks PATA controller on a netengine server. The
    server is off waiting to get picked up, so I can't get the important
details.
 
 1 PATA failure.

I was surprised on this one; I did have good luck with PATA in
the past. The kernel is whatever came standard in Fedora Core 2

 
  * Supermicro MB with ICH5/ICH5R controller and 2 RAID5 arrays of 3 
disks each. (only one drive on one array went bad)
  
  * VIA VT6420 built into the MB with RAID1 across 2 SATA drives.
  
  * And the most complex is this week's server with 4 PCI/PCI-X cards.
    But the one that hung the server was a 4 disk RAID5 array on a
RocketRAID1540 card.
 
 And 3 SATA failures, right?  I'm assuming the Supermicro is SATA or else
 it has more PATA ports than I've ever seen.
 
 Was the RocketRAID card in hardware or software raid mode?  It sounds
 like it could be a combination of both, something like hardware on the
 card, and software across the different cards or something like that.
 
 What kernels were these under?


Yes, these 3 were all SATA. The kernels (in the same order as above) 
are:

* 2.4.21-4.ELsmp #1 (Basically RHEL v3)
* 2.6.18-4-686 #1 SMP on a Fedora Core release 2
* 2.6.17.13 (compiled from vanilla sources)

The RocketRAID was configured for all drives as legacy/normal and
software RAID5 across all drives. I wasn't using hardware raid on
the last described system when it crashed.

Alberto




Re: Implementing low level timeouts within MD

2007-10-28 Thread Alberto Alonso
On Sat, 2007-10-27 at 19:55 -0400, Doug Ledford wrote:
 On Sat, 2007-10-27 at 16:46 -0500, Alberto Alonso wrote:
  Regardless of the fact that it is not MD's fault, it does make
  software raid an invalid choice when combined with those drivers. A
  single disk failure within a RAID5 array bringing a file server down
  is not a valid option under most situations.
 
 Without knowing the exact controller you have and driver you use, I
 certainly can't tell the situation.  However, I will note that there are
 times when no matter how well the driver is written, the wrong type of
 drive failure *will* take down the entire machine.  For example, on an
 SPI SCSI bus, a single drive failure that involves a blown terminator
 will cause the electrical signaling on the bus to go dead no matter what
 the driver does to try and work around it.

Sorry I thought I copied the list with the info that I sent to Richard.
Here is the main hardware combinations.

--- Excerpt Start 
Certainly. The times when I had good results (ie. failed drives
with properly degraded arrays) have been with old PATA based IDE 
controllers built into the motherboard and the Highpoint PATA
cards. The failures (ie. single disk failure bringing the whole
server down) have been with the following:

* External disks on USB enclosures, both RAID1 and RAID5 (two different
  systems) Don't know the actual controller for these. I assume it is
  related to usb-storage, but can probably research the actual chipset,
  if it is needed.

* Internal serverworks PATA controller on a netengine server. The
  server is off waiting to get picked up, so I can't get the important
  details.

* Supermicro MB with ICH5/ICH5R controller and 2 RAID5 arrays of 3 
  disks each. (only one drive on one array went bad)

* VIA VT6420 built into the MB with RAID1 across 2 SATA drives.

* And the most complex is this week's server with 4 PCI/PCI-X cards.
  But the one that hung the server was a 4 disk RAID5 array on a
  RocketRAID1540 card.

--- Excerpt End 

 
  I wasn't even asking as to whether or not it should, I was asking if
  it could.
 
 It could, but without careful control of timeouts for differing types of
 devices, you could end up making the software raid less reliable instead
 of more reliable overall.

Even if the default timeout was really long (ie. 1 minute) and then
configurable on a per-device (or per-class) basis via /proc, it would really help.

 Generally speaking, most modern drivers will work well.  It's easier to
 maintain a list of known bad drivers than known good drivers.

That's what has been so frustrating. The old PATA IDE hardware always
worked and the new stuff is what has crashed.

 Be careful which hardware raid you choose, as in the past several brands
 have been known to have the exact same problem you are having with
 software raid, so you may not end up buying yourself anything.  (I'm not
 naming names because it's been long enough since I paid attention to
 hardware raid driver issues that the issues I knew of could have been
 solved by now and I don't want to improperly accuse a currently well
 working driver of being broken)

I have settled for 3ware. All my tests showed that it performed quite
well and kicked drives out when needed. Of course, I haven't had a
bad drive on a 3ware production server yet, so I may end up
pulling the little bit of hair I have left.

I am now rushing the RocketRAID 2220 into production without testing
due to it being the only thing I could get my hands on. I'll report
any experiences as they happen.

Thanks for all the info,

Alberto




Re: Implementing low level timeouts within MD

2007-10-27 Thread Samuel Tardieu
 Doug == Doug Ledford [EMAIL PROTECTED] writes:

Doug This isn't an md problem, this is a low level disk driver
Doug problem.  Yell at the author of the disk driver in question.  If
Doug that driver doesn't time things out and return errors up the
Doug stack in a reasonable time, then it's broken.  Md should not,
Doug and realistically can not, take the place of a properly written
Doug low level driver.

I agree with Doug: nothing prevents you from using md above very slow
drivers (such as remote disks or even a filesystem implemented over a
tape device to make it extreme). Only the low-level drivers know when
it is appropriate to timeout or fail.

  Sam
-- 
Samuel Tardieu -- [EMAIL PROTECTED] -- http://www.rfc1149.net/



Re: Implementing low level timeouts within MD

2007-10-27 Thread Richard Scobie

Alberto Alonso wrote:

 After 4 different array failures all due to a single drive
 failure I think it would really be helpful if the md code
 timed out the driver.

Hi Alberto,

Sorry you've been having so much trouble.

For interest, can you tell us what drives and controllers are involved?

I've been running md for 8 years and over that time have had probably 
half a dozen drives failed out of arrays without any problems.


Regards,

Richard


Re: Implementing low level timeouts within MD

2007-10-27 Thread Alberto Alonso
On Fri, 2007-10-26 at 15:00 -0400, Doug Ledford wrote:
 
 This isn't an md problem, this is a low level disk driver problem.  Yell
 at the author of the disk driver in question.  If that driver doesn't
 time things out and return errors up the stack in a reasonable time,
 then it's broken.  Md should not, and realistically can not, take the
 place of a properly written low level driver.
 

I am not arguing whether or not MD is at fault, I know it isn't. 

Regardless of the fact that it is not MD's fault, it does make
software raid an invalid choice when combined with those drivers. A
single disk failure within a RAID5 array bringing a file server down
is not a valid option under most situations.

I wasn't even asking as to whether or not it should, I was asking if
it could. "Should" is a relative term; "could" is not. If the MD code
cannot cope with poorly written drivers then a list of valid drivers
and cards would be nice to have (that's why I posted my "... when it
works and when it doesn't" thread; I was trying to come up with such a list).

I only got 1 answer with brand specific information to figure out when
it works and when it doesn't work. My recent experience is that too
many drivers seem to have the problem so software raid is no longer
an option for any new systems that I build, and as time and money
permit I'll be switching all my legacy servers to hardware/firmware
raid.

Thanks,

Alberto




Re: Implementing low level timeouts within MD

2007-10-27 Thread Richard Scobie

Alberto Alonso wrote:

 What hardware do you use? I was trying to compile a list of known
 configurations capable of detecting and degrading properly.

To date I have not yet had a SATA based array drive go faulty - all mine
have been PATA arrays on Intel or AMD MB controllers which, as per your
experience, have failed out drives OK.

I have one 3ware PATA card that is running hardware RAID10 and it has
failed 4 drives over the years without trouble.

Regards,

Richard



Re: Implementing low level timeouts within MD

2007-10-27 Thread Doug Ledford
On Sat, 2007-10-27 at 16:46 -0500, Alberto Alonso wrote:
 On Fri, 2007-10-26 at 15:00 -0400, Doug Ledford wrote:
  
  This isn't an md problem, this is a low level disk driver problem.  Yell
  at the author of the disk driver in question.  If that driver doesn't
  time things out and return errors up the stack in a reasonable time,
  then it's broken.  Md should not, and realistically can not, take the
  place of a properly written low level driver.
  
 
 I am not arguing whether or not MD is at fault, I know it isn't. 
 
 Regardless of the fact that it is not MD's fault, it does make
 software raid an invalid choice when combined with those drivers. A
 single disk failure within a RAID5 array bringing a file server down
 is not a valid option under most situations.

Without knowing the exact controller you have and driver you use, I
certainly can't tell the situation.  However, I will note that there are
times when no matter how well the driver is written, the wrong type of
drive failure *will* take down the entire machine.  For example, on an
SPI SCSI bus, a single drive failure that involves a blown terminator
will cause the electrical signaling on the bus to go dead no matter what
the driver does to try and work around it.

 I wasn't even asking as to whether or not it should, I was asking if
 it could.

It could, but without careful control of timeouts for differing types of
devices, you could end up making the software raid less reliable instead
of more reliable overall.

  Should is a relative term, could is not. If the MD code
 can not cope with poorly written drivers then a list of valid drivers
 and cards would be nice to have (that's why I posted my ... when it
 works and when it doesn't, I was trying to come up with such a list).

Generally speaking, most modern drivers will work well.  It's easier to
maintain a list of known bad drivers than known good drivers.

 I only got 1 answer with brand specific information to figure out when
 it works and when it doesn't work. My recent experience is that too
 many drivers seem to have the problem so software raid is no longer
 an option for any new systems that I build, and as time and money
 permits I'll be switching to hardware/firmware raid all my legacy
 servers.

Be careful which hardware raid you choose, as in the past several brands
have been known to have the exact same problem you are having with
software raid, so you may not end up buying yourself anything.  (I'm not
naming names because it's been long enough since I paid attention to
hardware raid driver issues that the issues I knew of could have been
solved by now and I don't want to improperly accuse a currently well
working driver of being broken)

-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband




Re: Implementing low level timeouts within MD

2007-10-26 Thread Doug Ledford
On Fri, 2007-10-26 at 12:12 -0500, Alberto Alonso wrote:
 I've been asking on my other posts but haven't seen
 a direct reply to this question:
 
 Can MD implement timeouts so that it detects problems when
 drivers don't come back?
 
 For me this year shall be known as the year the array
 stood still (bad scifi reference :-)
 
 After 4 different array failures all due to a single drive
 failure I think it would really be helpful if the md code
 timed out the driver.

This isn't an md problem, this is a low level disk driver problem.  Yell
at the author of the disk driver in question.  If that driver doesn't
time things out and return errors up the stack in a reasonable time,
then it's broken.  Md should not, and realistically can not, take the
place of a properly written low level driver.

-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband

