Re: [PATCH md 2 of 4] Fix raid6 problem

2005-02-13 Thread Gordon Henderson
On Sun, 13 Feb 2005, Mark Hahn wrote:

  Interesting - the private mail was from me, and I've got two dual
  Opterons in service. The one with significantly more PCI activity has
  significantly more problems than the one with less PCI activity.

 that's pretty odd, since the most intense IO devices I know of
 are cluster interconnects (quadrics, myrinet, infiniband),
 and those vendors *love* opterons.  I've never heard any of them
 say anything other than that Opteron IO handling is noticeably better
 than Intel's.

 otoh, I could easily believe that if you're running the Opteron
 systems in acts-like-a-faster-xeon mode (ie, not x86_64),
 you might be exercising some less-tested paths.

I was about to post that I'd solved my problems with that Tyan dual
Opteron motherboard, but it's still crap. I upgraded the BIOS to the 2.02
beta and it seemed to work a lot better.  I still couldn't boot off it with
all 8 drives in, but solved that by using a 32MB flash IDE unit
holding /boot... However, it dropped a drive during the initial sync of the
raid6 arrays with lots of SCSI errors, and it had given lots of DMA
interrupt missing errors, etc. through the day while I ran soak tests on
it, so I'm going to conclude that this Tyan motherboard is utterly useless
and deserves nothing more than being driven over. Slowly. With a steam
roller.  Then jumped on. Just to make me feel better.
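
For anyone soak-testing a similar box, here is a minimal sketch of one way to spot a dropped member without scrolling through the logs. It assumes the usual /proc/mdstat layout, where each array's status line ends in a member map such as [UUUUUUU_] and "_" marks a failed or missing device. The function name degraded_arrays is made up for illustration.

```shell
# Print the name of every md array whose member map contains "_".
# Assumption: the usual /proc/mdstat layout, i.e. an "mdN : ..." line
# followed by a status line ending in a map like [UUUUUUU_].
# Reads the file given as $1, or /proc/mdstat if no argument is given.
degraded_arrays() {
    awk '
        $1 ~ /^md/ && $2 == ":" { name = $1 }    # remember the current array
        /\[[U_]*_[U_]*\]/       { print name }   # member map shows a gap
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays with no argument reads the live /proc/mdstat; passing a saved copy as the first argument lets it be tried away from the machine. The resync progress bar is not matched, because it contains characters other than U and _.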

Gordon
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH md 2 of 4] Fix raid6 problem

2005-02-13 Thread Mike Hardy

Mark Hahn wrote:

  Interesting - the private mail was from me, and I've got two dual
  Opterons in service. The one with significantly more PCI activity has
  significantly more problems than the one with less PCI activity.

 that's pretty odd, since the most intense IO devices I know of
 are cluster interconnects (quadrics, myrinet, infiniband),
 and those vendors *love* opterons.  I've never heard any of them
 say anything other than that Opteron IO handling is noticeably better
 than Intel's.

Sure, but which variables changed between the rigs the vendors
loved and the rig we're having problems with?

 otoh, I could easily believe that if you're running the Opteron
 systems in acts-like-a-faster-xeon mode (ie, not x86_64),
 you might be exercising some less-tested paths.

It's running x86_64 (Fedora Core 3) and the problem is rooted in the
chipset, I believe. I don't think it's Opterons per se; I think it's just
the Athlon take two - which is to say that it's a wonderful chip, but
some of the chipsets it's saddled with are horrible, and careful
selection (as well as heavy testing prior to putting a machine in
service) is essential.

-Mike


Re: [PATCH md 2 of 4] Fix raid6 problem

2005-02-13 Thread Richard Scobie
Mike Hardy wrote:

 It's running x86_64 (Fedora Core 3) and the problem is rooted in the
 chipset, I believe. I don't think it's Opterons per se; I think it's just
 the Athlon take two - which is to say that it's a wonderful chip, but
 some of the chipsets it's saddled with are horrible, and careful
 selection (as well as heavy testing prior to putting a machine in
 service) is essential.

I'd be interested to hear the outcome, as I'll be looking at Opteron
systems soon and, having been badly bitten on dual Athlon (AMD768
Southbridge), would prefer to avoid a repeat of the experience.

Regards,
Richard


Re: [PATCH md 2 of 4] Fix raid6 problem

2005-02-13 Thread Tim Moore

Gordon Henderson wrote:

 Anyone using Tyan Thunder K8W motherboards???

 I now know there is a K8S (server?) version of that mobo, but at the time
 it was all ordered I wasn't aware of it - my thoughts are that there is
 some sort of PCI/PCI-X problem with either the motherboard or the chipset,
 and in all probability the K8S mobo will have the same chipset and the
 same problems anyway...

I'm using a K8W at work as a driver client for NAS testing.  Onboard
Broadcom GigE, Linksys Marvell GigE, 2xWD1200JD + 2xMaxtor Maxline Plus II
as RAID-0 and RAID-5 using the Sil_3114, 2.4.29, raidtools 1.0.  2+2x1GB
PC-2700 in first and third slots for each CPU.  All PCI-X/HT configs set to
Auto in BIOS and jumpers, 2.02b BIOS.  No issues except for a bad SATA cable.

Striping yields ~90MB/s, RAID-5 about 65MB/s read and 55MB/s write on 8GB
Bonnie++ runs; 2GB dd reads on raw devices yield ~55MB/s.
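
A raw-device read of that kind can be sketched roughly as below. This is a minimal sketch rather than the exact command used for the numbers above. It assumes GNU dd (for the bs=1M suffix), and the function name read_test and its default size are made up for illustration.

```shell
# Sequential read test: stream the first N MiB of a device (or file) to
# /dev/null. Assumption: GNU dd, which accepts bs=1M; recent versions also
# print elapsed time and throughput on stderr, otherwise wrap the call in
# "time" and divide by hand.
# Hypothetical helper: read_test DEVICE [MIB]
read_test() {
    dev=$1
    mib=${2:-2048}      # default of 2048 MiB, i.e. a 2GB read
    dd if="$dev" of=/dev/null bs=1M count="$mib"
}
```

For example, read_test /dev/sda 2048 reads the first 2GB of sda. Reads smaller than RAM may be served partly from the page cache on a repeat run, so the first run is the one to trust.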

Fedora Core 2 tests next week.



Re: [PATCH md 2 of 4] Fix raid6 problem

2005-02-13 Thread Tim Moore

Gordon Henderson wrote:

 What I wanted was an 8-way RAID-1 for the boot partition (all of /, in
 reality) and I've done this many times in the past on other 2-5 way
 systems without issue. So I do the stuff I've done in the past, and there's
 nothing really new to me in that respect. (I'm using LILO) So when I try
 to get it to boot off the md device, it boots and says LIL and then
 nothing more. (Lilo diagnostics interpret this as a media failure, or
 geometry mismatch) If I make it boot off /dev/sda1 then it would work.

We put /boot on a 100MB /dev/sda1 partition; the rest of the drive is md.
The Lilo script section does

dd if=/dev/sda of=/boot/boot446.sda bs=446 count=1 && \
fdisk -l /dev/sda > /boot/fdisk.sda && \
dd if=/dev/sda1 of=/dev/sdb1

every time a new kernel is built.  Recovery is much easier without RAID
involved (lilo 22.6).  I've considered manipulating the boot block/disk
label on copy so that it would boot off either sda1 or sdb1 transparently.
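
The restore direction isn't spelled out above. A minimal sketch of both directions follows; the function names are made up for illustration, and the paths are taken as arguments so it can be tried on image files before going near a real disk.

```shell
# Save the 446-byte boot code area at the start of a disk. The partition
# table and signature that follow it in the first sector are not copied.
# Hypothetical helper: save_bootcode DISK BACKUP
save_bootcode() {
    dd if="$1" of="$2" bs=446 count=1
}

# Write a saved boot code image back over the first 446 bytes only.
# conv=notrunc keeps dd from truncating the target when it is a regular
# file (e.g. a disk image); bytes from offset 446 onward are left alone.
# Hypothetical helper: restore_bootcode BACKUP DISK
restore_bootcode() {
    dd if="$1" of="$2" bs=446 count=1 conv=notrunc
}
```

For example, restore_bootcode /boot/boot446.sda /dev/sda puts the saved boot code back without rewriting the partition table.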

 (ie. boot off /dev/sda1, root on /dev/md1, an 8-way RAID-1) I tried many
 combinations of old (Debian woody) and new Lilo (compiled from the latest
 source), I even tried GRUB at one point with no luck either. It was more
 frustrating as the turn-around time is several minutes by the time you go
 through the BIOS to change the boot device, then reboot, change lilo.conf,
 then try again )-:

 It seemed more stable with just one PCI card in, so I have a 4-port card
 on order as a last ditch attempt to make it work - I did try re-flashing
 the BIOS on one board, (I have 2) as it seemed to be about a year old and
 there are several updates on the Tyan web-site, however that resulted in
 wiping out the BIOS - it seemed to be going just fine, then it went beep
 and was silent forever more )-: Anyone in the SW have a flash
 programmer/copier handy???

Flashing from 1.x to 2.02b, same problem.  Power off, pull plug, pull both
power connectors off mobo, wait 15 seconds, clear CMOS for 15 seconds,
reboot, reset BIOS, no worries.

 Maybe, or maybe we just move to an Intel system, although power
 dissipation was a consideration and the Opterons are attractive in that
 aspect... The case has a 600W PSU before anyone asks..

Yeech.  Get the 8131s working and you'll never go back.  No Northbridge
bottlenecks, thank you.