Re: Software raid - controller options

2007-11-05 Thread Alberto Alonso
and any info you are able to give me! Lyle - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- Alberto Alonso, Global Gate Systems LLC

Re: Implementing low level timeouts within MD

2007-11-02 Thread Alberto Alonso
On Fri, 2007-11-02 at 11:09 +, David Greaves wrote: David PS I can't really contribute to your list - I'm only using cheap desktop hardware. - If you had failures and it properly handled them, then you can contribute to the good combinations, so far that's the list that is kind of

Re: Software RAID when it works and when it doesn't

2007-11-02 Thread Alberto Alonso
On Sat, 2007-10-27 at 11:26 -0400, Bill Davidsen wrote: Alberto Alonso wrote: On Fri, 2007-10-26 at 18:12 +0200, Goswin von Brederlow wrote: Depending on the hardware you can still access a different disk while another one is resetting. But since there is no timeout in md it won't

Re: Implementing low level timeouts within MD

2007-11-02 Thread Alberto Alonso
On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote: I wasn't belittling them. I was trying to isolate the likely culprit in the situations. You seem to want the md stack to time things out. As has already been commented by several people, myself included, that's a band-aid and not a fix

Re: Implementing low level timeouts within MD

2007-11-02 Thread Alberto Alonso
On Fri, 2007-11-02 at 15:15 -0400, Doug Ledford wrote: It was tested, it simply obviously had a bug you hit. Assuming that your particular failure situation is the only possible outcome for all the other people that used it would be an invalid assumption. There are lots of code paths in an

Re: Implementing low level timeouts within MD

2007-10-29 Thread Alberto Alonso
On Sat, 2007-10-27 at 12:33 +0200, Samuel Tardieu wrote: I agree with Doug: nothing prevents you from using md above very slow drivers (such as remote disks or even a filesystem implemented over a tape device to make it extreme). Only the low-level drivers know when it is appropriate to

Re: Implementing low level timeouts within MD

2007-10-29 Thread Alberto Alonso
On Mon, 2007-10-29 at 13:22 -0400, Doug Ledford wrote: OK, these you don't get to count. If you run raid over USB...well...you get what you get. IDE never really was a proper server interface, and SATA is much better, but USB was never anything other than a means to connect simple devices

Re: Implementing low level timeouts within MD

2007-10-28 Thread Alberto Alonso
On Sat, 2007-10-27 at 19:55 -0400, Doug Ledford wrote: On Sat, 2007-10-27 at 16:46 -0500, Alberto Alonso wrote: Regardless of the fact that it is not MD's fault, it does make software raid an invalid choice when combined with those drivers. A single disk failure within a RAID5 array

Re: Implementing low level timeouts within MD

2007-10-27 Thread Alberto Alonso
On Fri, 2007-10-26 at 15:00 -0400, Doug Ledford wrote: This isn't an md problem, this is a low level disk driver problem. Yell at the author of the disk driver in question. If that driver doesn't time things out and return errors up the stack in a reasonable time, then it's broken. Md

RocketRAID 2220 firmware raid experience

2007-10-27 Thread Alberto Alonso
Has anybody used the RocketRAID 2220 to build hardware raid and lived through failures? As some of you may know from my previous posts, I've been having problems with software raid. Unfortunately, this was the only card available to me to add to my server so I haven't been able to test anything

Re: Software RAID when it works and when it doesn't

2007-10-26 Thread Alberto Alonso
On Fri, 2007-10-26 at 18:12 +0200, Goswin von Brederlow wrote: Depending on the hardware you can still access a different disk while another one is resetting. But since there is no timeout in md it won't try to use any other disk while one is stuck. That is exactly what I miss. MfG

Implementing low level timeouts within MD

2007-10-26 Thread Alberto Alonso
I've been asking on my other posts but haven't seen a direct reply to this question: Can MD implement timeouts so that it detects problems when drivers don't come back? For me this year shall be known as the year the array stood still (bad scifi reference :-) After 4 different array failures

Re: Software RAID when it works and when it doesn't

2007-10-24 Thread Alberto Alonso
On Wed, 2007-10-24 at 16:04 -0400, Bill Davidsen wrote: I think what you really want is to notice how long the drive and driver took to recover or fail, and take action based on that. In general kick the drive is not optimal for a few bad spots, even if the drive recovery sucks. The

Re: Software RAID when it works and when it doesn't

2007-10-23 Thread Alberto Alonso
On Tue, 2007-10-23 at 18:45 -0400, Bill Davidsen wrote: I'm not sure the timeouts are the problem, even if md did its own timeout, it then needs a way to tell the driver (or device) to stop retrying. I don't believe that's available, certainly not everywhere, and anything other than

Re: Software RAID when it works and when it doesn't

2007-10-19 Thread Alberto Alonso
On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote: Mike Accetta [EMAIL PROTECTED] writes: What I would like to see is a timeout driven fallback mechanism. If one mirror does not return the requested data within a certain time (say 1 second) then the request should be duplicated on
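The timeout-driven fallback proposed above can be illustrated in user space. The following is only a sketch of the idea, not how md behaves (the thread's point is that md has no such timeout). The function name `read_with_fallback` and the notion of a "reader" callable per mirror are hypothetical, introduced here for illustration.

```python
import concurrent.futures as cf


def read_with_fallback(readers, timeout_s=1.0):
    """Ask each mirror in turn; if a mirror has not answered within
    timeout_s, duplicate the request on the next mirror.

    readers: list of zero-argument callables, one per mirror (hypothetical).
    Returns the first successful result; raises OSError if every mirror fails.
    """
    if not readers:
        raise ValueError("need at least one mirror")
    pool = cf.ThreadPoolExecutor(max_workers=len(readers))
    pending = set()
    try:
        for read in readers:
            pending.add(pool.submit(read))
            # Wait up to timeout_s for any outstanding request to finish.
            done, _ = cf.wait(pending, timeout=timeout_s,
                              return_when=cf.FIRST_COMPLETED)
            for fut in done:
                pending.discard(fut)
                if fut.exception() is None:
                    return fut.result()
            # Timed out or failed: fall through and ask the next mirror.
        # Every mirror has been asked; wait for whatever is still running.
        # Note: this final wait has no timeout, so a mirror that never
        # returns would still block here if all others have failed.
        while pending:
            done, _ = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
            for fut in done:
                pending.discard(fut)
                if fut.exception() is None:
                    return fut.result()
        raise OSError("all mirrors failed")
    finally:
        # Do not block on a stuck mirror when returning.
        pool.shutdown(wait=False)
```

The sketch also shows the objection raised elsewhere in the thread: the caller can stop waiting, but the stuck request itself is not cancelled and keeps running in its worker thread.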

Re: Software RAID when it works and when it doesn't

2007-10-14 Thread Alberto Alonso
On Sun, 2007-10-14 at 10:21 -0600, Maurice Hilarius wrote: Alberto Alonso wrote: PATA (IDE) with Master and Slave drives is a bad idea as, when one drive fails, the other of the Master Slave pair often is no longer usable. On discrete interfaces, with all drives configured as Master

Kicking the right drive out

2007-10-13 Thread Alberto Alonso
I have a need to kick a disk out of a RAID 5 array. I can do a fdisk on 2 out of the 3 devices that form part of the array, so I suspect I know which one is bad. The problem is that mdstat shows the array as follows: md3 : active raid5 sda6[0] sdc6[2] sdb6[1] 960863488 blocks level 5, 64k
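For reference, a minimal sketch of reading the member list out of an mdstat array line like the one quoted above. The helper name `parse_mdstat_line` is hypothetical, and the parsing assumes the common "name : state level members..." layout of an active array; other layouts (for example inactive arrays) are not handled.

```python
import re

# A member token looks like "sda6[0]", optionally followed by a flag
# such as "(F)" when the kernel has marked that member as faulty.
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\](?:\((\w)\))?")


def parse_mdstat_line(line):
    """Parse one array line from /proc/mdstat into a small dict."""
    name, _, rest = line.partition(" : ")
    fields = rest.split()
    state, level = fields[0], fields[1]
    members = []
    for token in fields[2:]:
        m = MEMBER_RE.fullmatch(token)
        if m is None:
            break  # reached the block count or other trailing text
        members.append({
            "device": m.group(1),
            "slot": int(m.group(2)),
            "failed": m.group(3) == "F",
        })
    return {"array": name.strip(), "state": state,
            "level": level, "members": members}


info = parse_mdstat_line("md3 : active raid5 sda6[0] sdc6[2] sdb6[1]")
for member in info["members"]:
    print(member["device"], member["slot"], member["failed"])
```

With the line from the post, no member carries a failed flag, which matches the problem described: the kernel has not marked any device faulty. A member can be failed and removed by hand with mdadm's `--fail` and `--remove` options, for example `mdadm /dev/md3 --fail /dev/sdX6 --remove /dev/sdX6`, where `sdX6` is a placeholder for whichever member is actually bad.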

Software RAID when it works and when it doesn't

2007-10-13 Thread Alberto Alonso
Over the past several months I have encountered 3 cases where the software RAID didn't work in keeping the servers up and running. In all cases, the failure has been on a single drive, yet the whole md device and server become unresponsive. (usb-storage) In one situation a RAID 0 across 2 USB

Re: When does a disk get flagged as bad?

2007-06-02 Thread Alberto Alonso
-- Alberto Alonso, Global Gate Systems LLC. (512) 351-7233 http://www.ggsys.net Hardware, consulting, sysadmin, monitoring and remote backups

Re: When does a disk get flagged as bad?

2007-05-30 Thread Alberto Alonso
On Wed, 2007-05-30 at 22:28 -0400, Mike Accetta wrote: Alberto Alonso writes: OK, lets see if I can understand how a disk gets flagged as bad and removed from an array. I was under the impression that any read or write operation failure flags the drive as bad and it gets removed

When does a disk get flagged as bad?

2007-05-24 Thread Alberto Alonso
where the array is never degraded. Does an error of type: end_request: I/O error, dev sdb, sector not count as a read/write error? Thanks, Alberto -- Alberto Alonso, Global Gate Systems LLC. (512) 351-7233 http://www.ggsys.net Hardware, consulting

I/O errors, server unresponsive, array NOT-degraded

2007-05-22 Thread Alberto Alonso
the array rebuilds that the disks themselves should be OK? * Why is the md device not being downgraded? Thanks, Alberto -- Alberto Alonso, Global Gate Systems LLC. (512) 351-7233 http://www.ggsys.net Hardware, consulting, sysadmin, monitoring and remote