Re: RAID questions

2000-08-08 Thread Danilo Godec

On Mon, 7 Aug 2000, Adam McKenna wrote:

 2)  If I do, will it still be broken unless I apply the "2.2.16combo" patch?
 3)  If it will, then how do I resolve the problem with the md.c hunk failing 
 with "2.2.16combo"?

If I remember correctly, 2.2.16combo was there to make it possible to use
Ingo's older raid patches on 2.2.16 (before raid-2.2.16-A0 was released).
I'm not 100% sure, though.

 This is a production system I am working on here.  I can't afford to have it
 down for an hour or two to test a new kernel.  I'd rather not be working with
 this mess to begin with, but unfortunately this box was purchased before I
 started this job, and whoever ordered it decided that software raid was
 "Good enough".

A test machine comes in handy. Not to actually test the new RAID code (we
did/do that already ;) ), but just to practise handling SW raid.

 I am not subscribed to either list so CC's are desirable.  However if you
 don't want to CC then you don't have to -- I'll just read the archives.
 That is, if someone fixes the "Mailing list archives" link on www.linux.org 
 to point to someplace that exists and actually has archives.

IMHO, if you need (or want) to work with SW raid, it would be better to
subscribe. There's not all that much traffic here and (usually) the stuff we
get here is relevant (with the exception of too many questions about where
the patches are, but that should be fixed anyway). Besides, any real
problems, bug reports and warnings appear here very soon.


   D.





RE: owie, disk failure

2000-08-07 Thread Danilo Godec

On Mon, 7 Aug 2000, Corin Hartland-Swann wrote:

 I have to confess I've never heard of manufacturers offering diagnostic
 utilities for disks... Gregory, can you point me at any examples? Am I
 just being a complete dumbass here?

At least Western Digital does, at
ftp://ftp.wdc.com/pub/drivers/hdutil. However, I don't know what those
utils do better than badblocks & friends, or how.


   D.





Re: Raid developers question

2000-07-27 Thread Danilo Godec

On Thu, 27 Jul 2000, Art wrote:

 After pulling out one disk (system off line), it came back on line with the
 data intact... It started automatically the reconfiguration using the spare
 disk.
 The funny thing was, after reinserting the original disk it did not
 reconfigure it automatically. I had to raidhotadd the disk. Then it started
 reconfiguring it.

That's expected behaviour. 

 I would like to be able to stop my raid array and switch off the power of
 this box (not the computer). If I switch the array off and on, the scsi
 disks do not spin up. So I have to reboot the machine (scsi card spins
 them up). This is a bit awkward.

Check out /usr/src/linux/drivers/scsi/scsi.c. You need to do some magic
with 'remove-single-device' and then, after restarting the disks,
'add-single-device'. If you turn your disk off without first removing it
from /proc/scsi/scsi, the controller seems to get confused...
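
A minimal sketch of that "magic", for illustration only. The string written to /proc/scsi/scsi has the form "scsi remove-single-device HOST CHANNEL ID LUN" (and the same with add-single-device). The function name scsi_hotplug and its optional third argument are hypothetical, added here so the helper can be tried against a scratch file first.

```shell
# Sketch only: scsi_hotplug and the CONTROL_FILE argument are hypothetical.
# Usage: scsi_hotplug remove|add "HOST CHANNEL ID LUN" [CONTROL_FILE]
scsi_hotplug() {
    action=$1
    address=$2
    control=${3:-/proc/scsi/scsi}   # overridable so the helper can be dry-run
    case $action in
        remove|add) ;;
        *) echo "usage: scsi_hotplug remove|add 'H C I L' [file]" >&2; return 2 ;;
    esac
    # The kernel expects e.g. "scsi remove-single-device 0 0 1 0"
    echo "scsi ${action}-single-device ${address}" > "$control"
}
```

For a disk at host 0, channel 0, id 1, lun 0 one would run scsi_hotplug remove "0 0 1 0" before powering the disk down, and scsi_hotplug add "0 0 1 0" once it has been powered up again.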

 Can the raid-software also control the above mentioned leds?

AFAIK, no.

 What is translucent mode?

No idea, it's just not supposed to be used... :)

 Which drive is my hotspare if I issue `cat /proc/mdstat` ?

md0 : active raid5 sdd2[3] sdc2[2] sdb2[1] sda2[0] 17333888 blocks level 5, 32k chunk, algorithm 2 [3/3] [UUU]

I suppose active RAID drives are numbered from 0, meaning that 0, 1 and 2
are active (in a 3+1 RAID5 array). So sdd2[3] is the spare drive.
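
Under that same assumption (slots are numbered from 0, and any member whose slot number is at or above the raid-disk count is a spare), the spares could be picked out of /proc/mdstat with a small filter. This is only a sketch: the function name mdstat_spares is hypothetical, it treats the first number in the [3/3] field as the raid-disk count, and it only recognises simple device names such as sdd2.

```shell
# Sketch only: mdstat_spares is a hypothetical helper.
# Reads /proc/mdstat-style text on stdin, prints suspected spare devices.
mdstat_spares() {
    awk '
        /^md[0-9]+ *:/ { n = 0 }    # a new array starts: forget old members
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[a-z]+[0-9]*\[[0-9]+]$/) {
                    # member entry such as sdd2[3]
                    b = index($i, "[")
                    n++
                    dev[n] = substr($i, 1, b - 1)
                    slot[n] = substr($i, b + 1, length($i) - b - 1) + 0
                } else if ($i ~ /^\[[0-9]+\/[0-9]+]$/) {
                    # count field such as [3/3]; first number = raid disks (assumed)
                    s = index($i, "/")
                    raid_disks = substr($i, 2, s - 2) + 0
                    for (j = 1; j <= n; j++)
                        if (slot[j] >= raid_disks) print dev[j]
                    n = 0
                }
            }
        }
    '
}
```

Typical use would be mdstat_spares < /proc/mdstat; for the md0 line above it prints sdd2.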

 Is it possible to modify the raidstart/stop code so that it uses scsi
 commands to start/stop the disks (in a running machine)?

This is not trivial, as software raid is not limited to SCSI disks only, so
that would involve quite a lot of sanity checking...

However, it's pretty easy to write a couple of scripts doing just that.

   D.




RE: raid and 2.4 kernels

2000-07-27 Thread Danilo Godec

On Thu, 27 Jul 2000, Neil Brown wrote:

 If raid on 2.4 is faster than raid in 2.2, we say "great".
 If it is slower, we look at the no-raid numbers.
 If no-raid on 2.4 is slower than no-raid on 2.2, we say "oh dear, the
 disc subsystem is slower on 2.4", and point the finger appropriately.
 If no-raid on 2.4 is faster than no-raid on 2.2, then we say "Hmm, must
 be a problem with raid" and point the finger there.
 
 Does that make sense?

In a way, yes. But raid could depend on other parts of the kernel more
heavily than no-raid disk access and thus could be more affected by
errors/problems in those parts.
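
For what it's worth, the comparison described in the quoted text can be written down as a tiny helper. This is only a sketch of that rule of thumb: the function name and the idea of feeding it four integer throughput figures (e.g. K/sec from bonnie) are assumptions, and the caveat above still applies, so "suspect raid" is a hint and not a verdict.

```shell
# Sketch only: raid_regression_hint is a hypothetical helper.
# Usage: raid_regression_hint RAID_2_2 RAID_2_4 PLAIN_2_2 PLAIN_2_4
# All four arguments are integer throughput figures in the same unit.
raid_regression_hint() {
    raid22=$1
    raid24=$2
    plain22=$3
    plain24=$4
    if [ "$raid24" -ge "$raid22" ]; then
        echo "no regression"             # raid on 2.4 is at least as fast
    elif [ "$plain24" -lt "$plain22" ]; then
        echo "suspect disk subsystem"    # plain disk access got slower too
    else
        echo "suspect raid"              # only the raid numbers got worse
    fi
}
```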

D.





Re: raid5 troubles

2000-07-20 Thread Danilo Godec

On Thu, 20 Jul 2000, Hermann 'mrq1' Gausterer wrote:

 but when i do mkraid, i get an error :-(((
 
 [root@mrqserv2 linux]# mkraid /dev/md0
 handling MD device /dev/md0
 analyzing super-block
 disk 0: /dev/sdb1, 4233096kB, raid superblock at 4233024kB
 disk 1: /dev/sdc1, 4233096kB, raid superblock at 4233024kB
 disk 2: /dev/sda6, failed
 mkraid: aborted, see the syslog and /proc/mdstat for potential clues.
 [root@mrqserv2 linux]#
 
 what is wrong here ?

Most probably your version of raidtools-0.90 doesn't recognize the
failed-disk directive. I use the version from Ingo's page (marked
dangerous) http://people.redhat.com/mingo/raid-patches/... and it works
fine.
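
For reference, a raidtab using the failed-disk directive might look roughly like the one printed by the sketch below. The device names come from the mkraid output quoted above; chunk-size, nr-spare-disks and persistent-superblock are assumed values, and the wrapper function sample_raidtab is hypothetical (it only prints the text so it can be reviewed before anything is written to /etc/raidtab).

```shell
# Sketch only: prints an illustrative raidtab, it does not touch /etc/raidtab.
# chunk-size, nr-spare-disks and persistent-superblock are assumed values.
sample_raidtab() {
    cat <<'EOF'
raiddev /dev/md0
    raid-level              5
    nr-raid-disks           3
    nr-spare-disks          0
    persistent-superblock   1
    chunk-size              32
    device                  /dev/sdb1
    raid-disk               0
    device                  /dev/sdc1
    raid-disk               1
    device                  /dev/sda6
    failed-disk             2
EOF
}
```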

D.





Re: upgrading a raid kernel

2000-07-11 Thread Danilo Godec

On Tue, 11 Jul 2000, Dirk Bonenkamp - Bean IT wrote:

 I want to upgrade a machine running 2.2.10 kernel running software raid to 
 2.2.16. I only found raid patches ending with 2.2.11 (ftp.fi.kernel.org), 
 will this work on 2.2.16?? And, is patching and installing the new kernel 
 enough to get things working? (I guess so, raid devices are built &
 working, so no need for new raidtools etc?).

Aren't you reading this mailing list?

Patches for new kernels are available at
http://people.redhat.com/mingo/raid-patches/. You should also grab the
raidtools from there, as they support some new useful features (such as
the failed-disk directive).

 D.





Re: Problem with raidhotremove

2000-06-27 Thread Danilo Godec

On Tue, 27 Jun 2000, Neil Brown wrote:

 If you don't have raidsetfaulty (so RedHats don't have it), grab the
 latest raidtools from

http://www.{country}.kernel.org/pub/linux/daemons/raid/alpha/raidtools-19990824-0.90.tar.gz

I think you can get more recent raidtools from
http://people.redhat.com/mingo/raid-patches/ .

   D.





Re: 2.2.16 RAID patch

2000-06-15 Thread Danilo Godec

On Wed, 14 Jun 2000, Matthew DeFoor wrote:

 now!
 sdb6's event counter: 0006
 sda6's event counter: 0006
 request_module[md-personality-3]: Root fs not mounted

Seems to me you have raid1 compiled as a module. That's OK if you really
know initrd stuff, but personally I prefer to compile raid1 in the
kernel. It saves me the trouble of creating an initrd...
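
A quick way to see whether a personality is built in or a module is to look at the kernel's .config. The helper below is only a sketch: kconfig_state is a hypothetical name, and the option name has to be supplied by the caller, since the exact name of the raid1 option depends on the kernel version and patch.

```shell
# Sketch only: kconfig_state is a hypothetical helper.
# Usage: kconfig_state CONFIG_FILE OPTION
# Prints "builtin", "module" or "unset" for OPTION in a kernel .config file.
kconfig_state() {
    if grep -q "^$2=y\$" "$1"; then
        echo builtin
    elif grep -q "^$2=m\$" "$1"; then
        echo module
    else
        echo unset
    fi
}
```

Something like grep '^CONFIG_MD' /usr/src/linux/.config lists the candidate option names first; the one covering raid1 should then report "builtin" if no initrd is to be used for a raid1 root.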

D.




Re: Software Raid on linux 2.2.14/5 with version 0.90.0 of raidtools

2000-06-09 Thread Danilo Godec

On Thu, 8 Jun 2000, Maria Blackmore wrote:

 In a nutshell, it simply doesn't work, there isn't much more I can say
 than that, because that is just it.

In a nutshell, get the patches (http://www.redhat.com/~mingo/raid-patches/), 
compile the kernel and off you go.

 needless to say, neither the syslog nor /proc/mdstat provide any hints
 whatsoever, in fact there is nothing logged at all during this.

Needless to say, this was on the list like 6000 times. I wish the HOWTO
would mention the location of recent patches.

   D.





Problems again

2000-03-28 Thread Danilo Godec

Today, I had another SCSI failure. I was able to get a bit more of the dmesg
output, but I can't figure out what is going wrong there.

In /var/log/messages, the unusual stuff starts with this, repeated a
couple of times:

Mar 28 12:00:45 mail kernel: (scsi0:0:2:0) Parity error during Message-In phase
Mar 28 12:00:45 mail kernel: (scsi0:0:2:0) Parity error during Data-In phase.

It goes on with a lot of messages similar to this (the pid, the id and
everything to the right of 'lun 0' vary):

Mar 28 12:00:45 mail kernel: scsi : aborting command due to timeout : pid 14301024, 
scsi0, channel 0, id 0, lun 0 Write (10) 00 00 6b 0f 14 00 00 08 00

Then this (a lot of lines):

Mar 28 12:00:45 mail kernel: SCSI host 0 abort (pid 14301062) timed out - resetting
Mar 28 12:00:45 mail kernel: SCSI bus is being reset for host 0 channel 0.

Somewhere in between this shows up:

Mar 28 12:00:45 mail kernel: (scsi0:0:2:0) Performing Domain validation.

Then this:

Mar 28 12:00:45 mail kernel: SCSI host 0 reset (pid 14301061) timed out again - 
Mar 28 12:00:45 mail kernel: probably an unrecoverable SCSI bus or device hang.

And finally this:

Mar 28 12:00:45 mail kernel: (scsi0:0:2:0) Successfully completed Domain validation.
Mar 28 12:00:45 mail kernel: (scsi0:0:2:0) Using asynchronous transfers.
Mar 28 12:00:45 mail kernel: (scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
Mar 28 12:00:45 mail kernel: (scsi0:0:0:0) Using asynchronous transfers.

followed by some more lines of the previous messages. These are the last
entries I got in /var/log/messages before rebooting (hard). The machine
was sort of alive (i.e. ping, httpd, php3...), but I was unable to log in
(even locally). The one console I had open was able to run 'ls', 'free'
and 'dmesg', but anything that touched the hard disk froze up. Even
'shutdown' and 'reboot' failed to execute.

The weird thing is that all of these messages occurred in a single second
(12:00:45).

Could someone with more SCSI experience diagnose what could be the cause
of this?

  Thanks, D.

PS: More info about the machine:

CPU:    Dual P-III 500 MHz
Board:  Intel L440GX
Disks:  4x IBM DNES-309170Y (3 RAID5 + 1 spare)
LAN:    Integrated Intel EtherExpress Pro 10/100

cat /proc/interrupts
   CPU0   CPU1
  0: 253546 252241IO-APIC-edge  timer
  1: 99103IO-APIC-edge  keyboard
  2:  0  0  XT-PIC  cascade
  4:473472IO-APIC-edge  serial
  8:  0  0IO-APIC-edge  rtc
 13:  1  0  XT-PIC  fpu
 19: 358370 359232   IO-APIC-level  aic7xxx, aic7xxx
 21: 225846 225239   IO-APIC-level  Intel EtherExpress Pro 10/100 Ethernet

cat /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
03c0-03df : vga+
03f8-03ff : serial(auto)
1080-109f : Intel Speedo3 Ethernet
1400-14be : aic7xxx
1800-18be : aic7xxx

uname -a
Linux my.host.name 2.2.13 #1 SMP Tue Mar 14 11:55:56 CET 2000 i686 unknown




Re: Problems again

2000-03-28 Thread Danilo Godec

On Tue, 28 Mar 2000, Mike Bilow wrote:

 That's a hardware problem.  A SCSI parity error is reported by the
 hardware and simply passed up the chain.  Unless there is something
 seriously wrong in the aic7xxx sequencer code, which I doubt, this looks
 like a typical cabling and termination issue.

Well, the chassis is an Intel pre-installed rack-mountable one with a
hot-swappable SCSI backplane. All the cables were already connected to
the disk racks. All I had to do was install the disks in the racks and
slide them in.

I don't think I could have screwed anything up there... :/

 Hard to say, but my guess is that your drive has elected to shut down.  I
 don't know what devices are on the bus, but the negotiation of asynchronous
 transfers is not a good sign and it may indicate one of the lines is
 being held in a funny state.  Are you trying to run slow and fast devices
 on the same SCSI bus?

No, the disks are the only SCSI devices there. No other disk/tape
devices there (except a standard 3,5" floppy, but it really shouldn't
matter).

 I think you have an electrical issue.

I feared that, but what should I do? It's all LVD and all pre-installed by
Intel... except disks, of course. 

Besides, it only happens every few weeks even though the machine is
pretty active (in use).

 
   19: 358370 359232   IO-APIC-level  aic7xxx, aic7xxx
 * * *
  1400-14be : aic7xxx
  1800-18be : aic7xxx
 
 Are you really running two separate aic7xxx controllers?  Do they have the
 same firmware revision?

I guess the motherboard has two chips integrated. I didn't really check
at the time (now it's off-site), but the kernel detects two hosts (scsi0 &
scsi1). The board also features two 68-pin SCSI connectors (the one I use
is marked LVD, the other is marked SE).



Thanks, D.




Re: RAID5 array not coming up after repaired disk

2000-03-24 Thread Danilo Godec

On Fri, 24 Mar 2000, Douglas Egan wrote:

 When this happened to me I had to "raidhotadd" to get it back in the
 list.  What does your /proc/mdstat indicate?
 
 Try:
 raidhotadd /dev/md0 /dev/sde7
 

I *think* you should 'raidhotremove' the failed disk-partition first, then
you can 'raidhotadd' it back.

   D.




Re: Which patch? Kernel 2.2.14

2000-03-14 Thread Danilo Godec

On Mon, 13 Mar 2000, Clinton Bittel wrote:

 I tried patching 
 ide_2_2_14_2124_patch.gz
 raid-2_2.14-B1.gz
 And still cannot find a mention of the Ultra 66 or 33 
 when I go to recompile the kernel.  Is it already
 built in??

ide_2_2_14_2124 takes care of the ATA-66 drivers. They are not referred to
generically as 'ATA-66', but are listed on a per-chipset basis.

CONFIG_BLK_DEV_HPT366 is one, for example...

D.




Disk or SCSI bus problem?

2000-03-13 Thread Danilo Godec

Hi!

I have a three-disk RAID5 with a 2.2.13-SMP kernel (with the 2.2.11 raid
patches) and recently I seem to be having some disk-related trouble.

Once the machine was brought down by a huge number of SCSI errors (printed
out to the console). That time I was unable to track down the problem,
especially because the machine was working well after a reboot (I even
badblocked the suspected disk, and the reconstruction went well...).

Of course, the machine is now under 'heavy surveillance' and recently I got
this in /var/log/messages:

(scsi0:0:0:0) Parity error during Message-In phase.
(scsi0:0:0:0) Parity error during Data-In phase.
(scsi0:0:0:0) Parity error during Message-In phase.
(scsi0:0:0:0) Parity error during Data-In phase.
(scsi0:0:2:0) Parity error during Message-In phase.
(scsi0:0:2:-1) Unexpected busfree, LASTPHASE = 0xa0, SEQADDR = 0x14f 

Does anyone have a clue what this might mean?
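
To see at a glance which targets are reporting parity errors, the log can be tallied per (scsiH:C:I:L) prefix. A minimal sketch follows; the function name parity_errors_by_target is hypothetical, and it only counts messages, it does not diagnose anything by itself.

```shell
# Sketch only: parity_errors_by_target is a hypothetical helper.
# Reads kernel log text on stdin and prints "COUNT TARGET" for every
# (scsiH:C:I:L) prefix that reported a parity error.
parity_errors_by_target() {
    grep 'Parity error' |
        sed -n 's/.*(\(scsi[0-9]*:[0-9]*:[0-9]*:[0-9-]*\)).*/\1/p' |
        sort | uniq -c | awk '{ print $1, $2 }'
}
```

Run as parity_errors_by_target < /var/log/messages. For the six lines above it reports 4 errors for scsi0:0:0:0 and 1 for scsi0:0:2:0 (the "Unexpected busfree" line is not a parity error and is skipped).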


Thanks, Danilo

__
|Danilo Godec|Agenda d.o.o.|   ISP for business  |
|  jr. Syst. Admin   |   Gosposvetska 84   | WAN networks|
|   [EMAIL PROTECTED]   |   si-2000 Maribor   |  Internet/Intranet  |
| tel:+386.62.226364 |  Slovenija  | Application servers |
| fax:+386.62.226364 | http://www.slon.net |  Caldera OpenLinux  |



Failed disk

2000-03-13 Thread Danilo Godec

I have a three-disk RAID5 array and one disk seems to be slowly
failing. The disks are on a hot-swappable backplane.

I know that 'echo "scsi remove-single-device X X X X" > /proc/scsi/scsi'
works for me and I can remove and replace the disk, but NOT as long as it is
in use in a RAID5 array. I don't want to stop the array for two reasons:

1. / file system is on it
2. it's a production machine, running web and mail services for a lot of
users

Is it somehow possible to temporarily mark the disk as unused (failed?),
apply this setting to the running array and thus 'free' the device for
removal?


 Thanks, D.




RE: Failed disk

2000-03-13 Thread Danilo Godec

On Mon, 13 Mar 2000 [EMAIL PROTECTED] wrote:

 I think what you are looking for is: 
   raidhotremove /dev/md? /dev/sd??

I already tried that. Simply raidhotremove-ing doesn't work while the
/dev/sd?? is in use (it complains about it). But you're close.

However, I found out that Ingo's _dangerous_ raidtools (2116) include a
'raidsetfaulty' command, which marks the device as failed. It is possible
to raidhotremove it afterwards. Redhat 6.1's original raidtools-0.90-5
doesn't include that command.

I'm currently testing this on my local machine using multiple partitions
of a single disk as a RAID5 array (for testing only) and it's looking
good.
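
Put together, the sequence being tested looks roughly like the plan printed by the sketch below. It deliberately only prints the commands and runs nothing. The helper name raid_swap_plan is hypothetical, and the argument order of raidsetfaulty is assumed to match raidhotremove and raidhotadd (array first, then member).

```shell
# Sketch only: raid_swap_plan is a hypothetical helper that prints a plan.
# Usage: raid_swap_plan MD_DEVICE MEMBER "HOST CHANNEL ID LUN"
raid_swap_plan() {
    md=$1
    member=$2
    addr=$3
    cat <<EOF
raidsetfaulty $md $member
raidhotremove $md $member
echo "scsi remove-single-device $addr" > /proc/scsi/scsi
# ...physically swap the disk here...
echo "scsi add-single-device $addr" > /proc/scsi/scsi
raidhotadd $md $member
EOF
}
```

For example, raid_swap_plan /dev/md0 /dev/sdb1 "0 0 1 0" prints the six steps for that member so they can be reviewed before being typed in by hand.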

  Thanks,  D.




Re: RAID5 and 2.2.14

2000-01-24 Thread Danilo Godec

On Sun, 23 Jan 2000, David Cooley wrote:

 Here's what I get when patching against a fresh 2.2.13-1.3.0 kernel source
 
 
 
 Where'd you get your source?
 I downloaded mine from ftp.kernel.org and it's 2.2.14-1.3.0

What is this '-1.3.0'? I don't think this is plain kernel source...

If I go to ftp://ftp.kernel.org/pub/linux/kernel/v2.2/ (that is where
official kernel tarballs are) I see linux-2.2.14.tar.bz2 (and .gz and
.sign files).


D.




Re: raid with 2.2.13

2000-01-16 Thread Danilo Godec

On Sun, 16 Jan 2000, Standardaccount wrote:

 How can I get raid running?
 
 BTW: I've tried to apply the kernel-patch for the 2.2.11 kernel, but
 patch won't work. Is there any need for the kernel- patch and where can
 I get it for the 2.2.13 kernel?

You need to apply the 2.2.11 patch to 2.2.13 kernel tree. There are a few
(I think two) errors reported, but they can be safely ignored.
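
After patching, the rejected hunks are left behind as *.rej files, so it is easy to check that only the expected ones failed. A minimal sketch, assuming the patch is applied from the top of the kernel tree with -p1 (an assumption, check the paths in the patch header); list_rejects is a hypothetical helper name.

```shell
# Sketch only: list_rejects is a hypothetical helper.
# Assumed patching step (the -p1 level is an assumption):
#   cd /usr/src/linux && patch -p1 < /path/to/raid-patch
# Usage: list_rejects KERNEL_TREE
# Lists the *.rej files that patch leaves behind for hunks it could not apply.
list_rejects() {
    find "$1" -name '*.rej' -print | sort
}
```

Running list_rejects /usr/src/linux afterwards shows which files to look at; each .rej file contains the hunks that did not apply.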

This is necessary as the plain 2.2.13 kernel uses the 'old style' raid code,
while raidtools-0.90 makes use of the 'new style' raid code.


   D.

PS: You can use 2.2.14 kernel with a 2.2.14 patch now
(http://people.redhat.com/mingo/raid...).




Re: kernel patch?

2000-01-14 Thread Danilo Godec

On Thu, 13 Jan 2000, Edward Schernau wrote:

 I am running 2.2.13, whose config script has options for RAID.  I have
 raidtools-0.90.  Why/Do I need to patch?  Pointers appreciated.

You have to patch because plain 2.2.13 kernel has an 'old style' raid,
while raidtools-0.90 are designed for 'new style' raid (which adds
autodetection and other nice stuff).

For 2.2.13 you can use the patch for 2.2.11, while for 2.2.14 you have to
get a new patch from http://www.redhat.com/~mingo/raid-2.2.14-B1

D.





Re: Performance?

1999-12-10 Thread Danilo Godec

On Thu, 9 Dec 1999, Randy Winch wrote:

 Retested with mem=256M:

I usually use mem=12M and boot into single mode, so that memory really has
almost no influence.

 bonnie
 
               -------Sequential Output-------- ---Sequential Input-- --Random--
               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
 Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
          2000  7230 98.3 37168 59.2 19576 55.8  8305 97.4 71834 58.3 339.7  6.6

Well, in my case the read performance dropped from 280MB/sec (hehe, would
be nice though) to ~30MB/sec (which is pretty cool too). Write performance
was also influenced, but I don't remember the figures. I guess bonnie is
using pretty standard routines for file operations which get cached and
buffered by the kernel. 

I suggest you REALLY limit the memory down (like 12M, could try lower) and
run bonnie with 'normal' file sizes - this way you do get pretty real
results and you don't have to wait that long...


D.



Re: 5*36.5 GB SoftRAID problem

1999-12-08 Thread Danilo Godec

On Wed, 8 Dec 1999, Jakob Sandgren wrote:

 No, actually I did not. I read the FAQ and according to it, it should
 be ok to start with mke2fs _before_ the sync/(re)build was finished.
 Anyone else who could confirm that this should be a problem?

A few days ago a bug - that could cause that - was mentioned on this list.

It was said it could corrupt swap writes when the background
reconstruction was going on, but who knows...

D.



Re: Web page for kernel/raid updates Promise Ultra66 issues

1999-12-05 Thread Danilo Godec

On Sun, 5 Dec 1999, Zach Coombes wrote:

 - the Ultra66 isn't supported yet.  I'm running a 2.3.XX beta (yes I
   take digitalis regularly) as recent 2.2 kernels didn't even register
   the Ultra66 controllers' existence.  Are the Ultra66 patches to the
   kernel nearing a state where we'll see them in a 2.2 release soon?
   (Slap me if this reads as a "aren't you finished yet" poke at the
   developers - it's not meant to be)

Well, the ide patches located in
ftp.kernel.org/pub/linux/kernel/people/hedrick apply to 2.2 kernels very
nicely and include support for a variety of UDMA66 controllers and
chipsets.

I think it's better and safer to run a non-development kernel with
patches made for it (especially on production machines).

 Also some of the drives are coming up in PIO mode.  Is there any
 redress to adjust this before mounting the drives (i.e. request
 that it re-check for DMA capable drives)?

You could use 'hdparm -d1 device'. You can do that after mounting too.

D.




Re: Raid with new kernel

1999-12-05 Thread Danilo Godec

On Sun, 5 Dec 1999, ACEAlex wrote:

 the 2.2.13 kernel which is the latest stable. But when i try to start using
 it i get different startup screens (see below). Do i have to patch the
 kernel before i use raidtools. Cause i get errors when trying to execute

Yes. The RedHat kernel includes the latest RAID patches. You should patch
your 2.2.13 kernel too. The patch will probably produce two rejects, but
you can ignore them.

ftp.kernel.org/pub/linux/daemons/raid/alpha/raid0145-19990824-2.2.11.bz2

 mkraid etc.. Also i have another question. In some faqs they are talking
 about mdadd and mddel etc.. But i cant find those with the redhat package. I
 use mkraid and edit the /etc/raidtab file.

These are the 'old' raid utilities. Now you should look for raidstart,
raidstop, etc.



 D.




Re: problems with RAID fs

1999-12-03 Thread Danilo Godec

On Thu, 2 Dec 1999, Terry Ewing wrote:

 I also manually used cp to copy about 10 or 12 of the corrupted files from 
 the original tree to the RAID filesystem.  After this, the files that I 
 copied did not differ from the originals.  It seems that files become 
 corrupted under a heavy load either by the RAID5 daemon or in hardware.

Bad RAM can often be the cause of weird problems like that (I had my
share of that). Now I use memtest86 on every machine I build and it seems
very reliable - last week I discovered 4 out of 12 brand new DIMMs to be
faulty and the machines didn't even complain under moderate load. A kernel
compile using 'make -j' resulted in many 'signal 11' errors.

http://reality.sgi.com/cbrady_denver/memtest86/ 

   D.




Re: ac?

1999-11-30 Thread Danilo Godec

On Tue, 30 Nov 1999, David Cunningham wrote:

 I've seen a lot of recommendations for obtaining the 2.2.13ac kernel.  So
 far I've found nothing listed with the ac suffix.  What is this ac?

These are Alan Cox's patches located on kernel.org mirrors in
/pub/linux/kernel/people/alan/ directory. They combine some features
and/or bug fixes usually found in separate patches.


D.

PS: I got excellent results with plain 2.2.13+raid0145-19990824-2.2.11
(only two rejects while patching, both due to already patched files).



Re: Could not change configuration.

1999-11-25 Thread Danilo Godec

On Thu, 25 Nov 1999, Dong Hu wrote:

 Now I want to change the configuration to raid0, 
 so I edit the /etc/raidtab file,
 issue  mkraid --force /dev/md0,

I suppose you did stop the raid device ('raidstop /dev/md0') first? Then I
think you should use --really-force (this is not in the documentation, but
it is printed on the screen when you do mkraid --force).


 D.




Partitions on RAID ?

1999-11-15 Thread Danilo Godec

Hi!

Can I use partitions on a software raid device (/dev/md0, raid-5 in my
case)?

Using 2.2.13 with raid0145-19990824-2.2.11 patch and raidtools-0.90.

Thanks, D.




Archive?

1999-11-12 Thread Danilo Godec

Hi!

I'm new to this list and have some questions. However I'd like to first
browse through the list archive if it's available somewhere.

Is it? :)

Thanks, D.



Monitoring?

1999-11-12 Thread Danilo Godec

Hi!

Ok, found an archive, but haven't found the questions/answers I was hoping
to find.

I have a RAID1 setup with kernel 2.2.13 and appropriate patches for 2.2.11
(only two files didn't patch correctly, as they were already patched in
2.2.13) and raidtools-0.90. Everything works nice, even hot-swapping disks
(with hot-pluggable SCSI backplane and some caution, of course) didn't
cause a problem.

However, are there any tools already available to monitor the md device
and notify the administrator via mail, modem, pager etc.? 

Thanks, D.




Re: Monitoring?

1999-11-12 Thread Danilo Godec

On Fri, 12 Nov 1999, Jakob Østergaard wrote:

 It should be fairly simple to grep for underscores in /proc/mdstat using
 cron+{perl,grep,whatever} and send a mail if one is found.
 
 When a disk dies it is marked in /proc/mdstat like  [UU_U].

Thanks, I think I will do that.
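
A minimal sketch of such a check, assuming it is run from cron. The function name mdstat_degraded, the recipient and the mail invocation shown in the comment are all assumptions.

```shell
# Sketch only: mdstat_degraded is a hypothetical helper.
# Reads /proc/mdstat-style text on stdin; succeeds (exit 0) if any status
# field such as [UU_U] contains an underscore, fails otherwise.
mdstat_degraded() {
    grep -q '\[[U_]*_[U_]*]'
}

# Possible cron job body (recipient and mail invocation are assumptions):
#   mdstat_degraded < /proc/mdstat &&
#       mail -s "RAID degraded on $(hostname)" root < /proc/mdstat
```

Matching only inside the [U_] status field avoids false alarms from underscores that might appear elsewhere in the file.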

Now for another question: 

I have a hot-swappable SCSI backplane, so I simulated a dead disk by
simply removing it (while there was no I/O activity). If I umount /dev/md0
and stop it (raidstop /dev/md0), I can use /proc/scsi/scsi and first
remove the dead-disk entry and then add a new disk (echo "scsi
[remove|add]-single-device 0 0 1 0" > /proc/scsi/scsi). Then, I can
raidhotadd the new disk to /dev/md0 and the world is nice.

However, is there a way to do all this while raid1 is still active? So that
users never have to notice something went wrong with the disks?


Thanks, D.


PS: I thought of adding the new disk with some other ID, but the backplane
has fixed IDs so I cannot change them (disk0= ID 0, disk1= ID 1).