Re: [CentOS] HP ProLiant DL380 G5

2014-08-22 Thread John R Pierce
On 8/21/2014 9:50 PM, Keith Keller wrote:
 On 2014-08-22, Valeri Galtsev galt...@kicp.uchicago.edu wrote:
 
 3ware was an independent company until it was first bought by AMCC; LSI
 then bought them from AMCC. I didn't know LSI sold them to someone else,
 Sorry, I was not clear: LSI was bought, not just 3ware.  If you look at
 LSI's home page, it says "An Avago Technologies Company".

there's been a whole lot of merging and splitting.

Avago used to be HP's semiconductor business; it went to Agilent in the big 
HP split, then in 2005 was kicked to the curb as an independent company.   
They are big on analog, mixed-signal, and microwave semiconductors.

LSI had acquired 3ware, ONStor, and a bunch more, and merged them into 
Engenio, which was their NAS/SAN division, making OEM storage arrays 
sold by IBM, HP, Dell and others.  Engenio has since been sold to Network 
Appliance (NetApp), the original NAS company.

LSI acquired the SandForce SSD chip business.  Avago acquired and 
merged with LSI.  Avago unbundled SandForce and sold it to Seagate.  
Avago sold LSI's networking stuff to Intel just last week.  AFAIK, Avago 
still sells the LSI SAS chips and both the MegaRAID and 3ware RAID controllers.






-- 
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: [CentOS] HP ProLiant DL380 G5

2014-08-22 Thread Keith Keller
On 2014-08-22, John R Pierce pie...@hogranch.com wrote:

 there's been a whole lot of merging and splitting.

You know more about this than is probably healthy.  ;-)

--keith

-- 
kkel...@wombat.san-francisco.ca.us




Re: [CentOS] HP ProLiant DL380 G5

2014-08-22 Thread Les Mikesell
On Thu, Aug 21, 2014 at 6:17 PM, John R Pierce pie...@hogranch.com wrote:

 Yes, but try a software RAID when you have intermittently bad RAM.
 I've been there.  Mirrored disks that were almost, but not quite,
 mirrors.
 try any file system when you've got flaky RAM. Data that's not quite
 what you wanted, oh boy.

Yes, but if you fix the RAM, fsck the disk, and rewrite the data you
sort of expect it to work again.  In this case with the mirrors
randomly mismatching but marked as good, fsck would read the good one
in some spots when checking but later the system would read the bad
one.  In hindsight the reason is obvious but it took me a while to see
why the box still crashed every few weeks.

 which, btw, is why I insist on ECC for servers.  and really prefer ZFS
 where each block of each part of a raid is checksummed and timestamped,
 so when scrub finds mismatching blocks, it can know which one is correct.

I thought this was supposed to be ECC with 1-bit correction - and I
thought that was supposed to mean that if it couldn't correct it would
just stop, but it didn't.  It took about 3 days of a memtest86 run to
hit the problem and show that it was RAM - and the box has run for many
years since I swapped it all out.  But the only reason that box is
still around is that it is an enormous tower case and the only thing I
had with enough drive bays for what I was doing then.

-- 
  Les Mikesell
 lesmikes...@gmail.com


Re: [CentOS] HP ProLiant DL380 G5

2014-08-22 Thread m . roth
Les Mikesell wrote:
 On Thu, Aug 21, 2014 at 6:17 PM, John R Pierce pie...@hogranch.com
 wrote:

 Yes, but try a software RAID when you have intermittently bad RAM.
 I've been there.  Mirrored disks that were almost, but not quite,
 mirrors.
 try any file system when you've got flaky RAM. Data that's not quite
 what you wanted, oh boy.

 Yes, but if you fix the RAM, fsck the disk, and rewrite the data you
 sort of expect it to work again.  In this case with the mirrors
 randomly mismatching but marked as good, fsck would read the good one
 in some spots when checking but later the system would read the bad
 one.  In hindsight the reason is obvious but it took me a while to see
 why the box still crashed every few weeks.

 which, btw, is why I insist on ECC for servers.  and really prefer ZFS
 where each block of each part of a raid is checksummed and timestamped,
 so when scrub finds mismatching blocks, it can know which one is
 correct.

 I thought this was supposed to be ECC with 1-bit correction - and I
<snip>
But wait, Jim, it's worse than that... does the 380 have *mirrored*
memory? I know that the 580 I have does.

mark



[CentOS] HP ProLiant DL380 G5

2014-08-21 Thread Matt
I have CentOS 6.x installed on an HP ProLiant DL380 G5 server.  It
has eight 750GB drives in a hardware RAID6 array.  It's acting as a
host for a number of OpenVZ containers.

Seems like every time I reboot this server, which is not very often, it
sits for hours running a disk check or something on boot.  The server
is located 200+ miles away, so it's not very convenient to look at.  Is
there any way to tell if it plans to run this, or to tell it not to?

Right now it's reporting that one of the drives in the array is bad, and
last time it did this a reboot resolved it.


Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread John R Pierce
On 8/21/2014 12:43 PM, Matt wrote:
 I have CentOS 6.x installed on an HP ProLiant DL380 G5 server.  It
 has eight 750GB drives in a hardware RAID6 array.  It's acting as a
 host for a number of OpenVZ containers.

 Seems like every time I reboot this server, which is not very often, it
 sits for hours running a disk check or something on boot.  The server
 is located 200+ miles away, so it's not very convenient to look at.  Is
 there any way to tell if it plans to run this, or to tell it not to?

 Right now it's reporting that one of the drives in the array is bad, and
 last time it did this a reboot resolved it.

assuming that's an HP SmartArray RAID controller, use hpacucli to diagnose 
the RAID problem.

a degraded RAID6 is /really/ slow.
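
from memory, the status checks look something like this (slot=0 below is 
just a guess; 'ctrl all show' tells you the real slot number):

  hpacucli ctrl all show config             # controllers, arrays, drives
  hpacucli ctrl slot=0 pd all show status   # physical drive status
  hpacucli ctrl slot=0 ld all show status   # logical drive status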



-- 
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread m . roth
Matt wrote:
 I have CentOS 6.x installed on an HP ProLiant DL380 G5 server.  It
 has eight 750GB drives in a hardware RAID6 array.  It's acting as a
 host for a number of OpenVZ containers.

 Seems like every time I reboot this server, which is not very often, it
 sits for hours running a disk check or something on boot.  The server
 is located 200+ miles away, so it's not very convenient to look at.  Is
 there any way to tell if it plans to run this, or to tell it not to?

 Right now it's reporting that one of the drives in the array is bad, and
 last time it did this a reboot resolved it.

You need to know what it's running. If it's doing an fsck, that will take
a lot of time. If it's firmware in the RAID controller, that's different.
You can run tune2fs -l /dev/whatever and see how often it wants to run fsck.
For that matter, what's the entry in /etc/fstab?
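
Untested, from memory, but something along these lines shows both (the
device name is just a placeholder):

  tune2fs -l /dev/sda1 | grep -iE 'mount count|check'   # fsck counters
  grep -v '^#' /etc/fstab      # last field: 0 means no boot-time fsck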

  mark



Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread GKH
Hate to change the conversation here but that's why I hate hardware RAID.
If it was software RAID, Linux would always tell you what's going on.
Besides, Linux knows much more about what is going on on the disk and what is 
about to happen (like a megabyte DMA transfer).

BTW, check if something is creating:

/forcefsck

That would make the fsck run every time.
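
A quick way to look for the usual trigger files (these are the paths the
CentOS 6 init scripts check, if I remember right):

  ls -l /forcefsck /fsckoptions /.autofsck 2>/dev/null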

GKH

 Matt wrote:
 I have CentOS 6.x installed on an HP ProLiant DL380 G5 server.  It
 has eight 750GB drives in a hardware RAID6 array.  It's acting as a
 host for a number of OpenVZ containers.

 Seems like every time I reboot this server, which is not very often, it
 sits for hours running a disk check or something on boot.  The server
 is located 200+ miles away, so it's not very convenient to look at.  Is
 there any way to tell if it plans to run this, or to tell it not to?

 Right now it's reporting that one of the drives in the array is bad, and
 last time it did this a reboot resolved it.

 You need to know what it's running. If it's doing an fsck, that will take
 a lot of time. If it's firmware in the RAID controller, that's different.
 You can run tune2fs -l /dev/whatever and see how often it wants to run fsck.
 For that matter, what's the entry in /etc/fstab?

   mark



Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread Digimer
You know that hpacucli (and MegaCli on LSI-based hardware RAID systems) 
can usually tell you more about the array and the drives than mdadm can, 
right? Also, if you're doing parity, having hardware RAID moves the 
parity calculations to a dedicated ASIC, avoiding any load of note on 
the CPU. Also, with hardware RAID, you get battery-backed or 
flash-backed write-back caching, which can dramatically improve performance.
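
For example, on the LSI side the usual checks are something like this (the
binary name varies by package, MegaCli vs MegaCli64):

  MegaCli64 -LDInfo -Lall -aALL              # logical drive state
  MegaCli64 -PDList -aALL                    # physical drives, error counts
  MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL   # battery/BBU health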

That said, mdadm is better than cheap fake-RAID controllers like you 
find on most mainboards, I will give you that.

digimer

On 21/08/14 04:33 PM, GKH wrote:
 Hate to change the conversation here but that's why I hate hardware RAID.
 If it was software RAID, Linux would always tell you what's going on.
 Besides, Linux knows much more about what is going on on the disk and what is 
 about to happen (like a megabyte DMA transfer).

 BTW, check if something is creating:

 /forcefsck

 That would make the fsck run every time.

 GKH

 Matt wrote:
 I have CentOS 6.x installed on an HP ProLiant DL380 G5 server.  It
 has eight 750GB drives in a hardware RAID6 array.  It's acting as a
 host for a number of OpenVZ containers.

 Seems like every time I reboot this server, which is not very often, it
 sits for hours running a disk check or something on boot.  The server
 is located 200+ miles away, so it's not very convenient to look at.  Is
 there any way to tell if it plans to run this, or to tell it not to?

 Right now it's reporting that one of the drives in the array is bad, and
 last time it did this a reboot resolved it.

 You need to know what it's running. If it's doing an fsck, that will take
 a lot of time. If it's firmware in the RAID controller, that's different.
 You can run tune2fs -l /dev/whatever and see how often it wants to run fsck.
 For that matter, what's the entry in /etc/fstab?

mark




-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread Matt
 Hate to change the conversation here but that's why I hate hardware RAID.
 If it was software RAID, Linux would always tell you what's going on.
 Besides, Linux knows much more about what is going on on the disk and what is 
 about to happen (like a megabyte DMA transfer).

 BTW, check if something is creating:

 /forcefsck

These exist:

-rw-r--r--1 root root 0 Jul  7 10:03 .autofsck
-rw-r--r--1 root root 0 Jul  7 10:03 .autorelabel

What does that mean?

 That would make the fsck run every time.

 GKH

 Matt wrote:
 I have CentOS 6.x installed on an HP ProLiant DL380 G5 server.  It
 has eight 750GB drives in a hardware RAID6 array.  It's acting as a
 host for a number of OpenVZ containers.

 Seems like every time I reboot this server, which is not very often, it
 sits for hours running a disk check or something on boot.  The server
 is located 200+ miles away, so it's not very convenient to look at.  Is
 there any way to tell if it plans to run this, or to tell it not to?

 Right now it's reporting that one of the drives in the array is bad, and
 last time it did this a reboot resolved it.

 You need to know what it's running. If it's doing an fsck, that will take
 a lot of time. If it's firmware in the RAID controller, that's different.
 You can run tune2fs -l /dev/whatever and see how often it wants to run fsck.
 For that matter, what's the entry in /etc/fstab?

   mark



Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread m . roth
Matt wrote:
 Hate to change the conversation here but that's why I hate hardware
 RAID.
 If it was software RAID, Linux would always tell you what's going on.
 Besides, Linux knows much more about what is going on on the disk and
 what is about to happen (like a megabyte DMA transfer).

 BTW, check if something is creating:

 /forcefsck

 These exist:

 -rw-r--r--1 root root 0 Jul  7 10:03 .autofsck
 -rw-r--r--1 root root 0 Jul  7 10:03 .autorelabel

 What does that mean?

ARRRGGGHGHGHGHGHGHHGHG!!!

First, delete /.autofsck. That will stop it from fsckin'g *everything*
every reboot. Second, is selinux in enforcing mode? In any case, have you
recently done major changes? If not, delete /.autorelabel, since an
selinux relabel takes a *while*, esp. if you have *lots* of files.
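
To check where SELinux stands before touching anything (from memory):

  getenforce                           # Enforcing / Permissive / Disabled
  grep ^SELINUX= /etc/selinux/config   # what it will be after the next boot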

 mark



Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread Alexander Dalloz
On 21.08.2014 at 23:00, m.r...@5-cent.us wrote:
 Matt wrote:
 Hate to change the conversation here but that's why I hate hardware
 RAID.
 If it was software RAID, Linux would always tell you what's going on.
 Besides, Linux knows much more about what is going on on the disk and
 what is about to happen (like a megabyte DMA transfer).

 BTW, check if something is creating:

 /forcefsck

 These exist:

 -rw-r--r--1 root root 0 Jul  7 10:03 .autofsck
 -rw-r--r--1 root root 0 Jul  7 10:03 .autorelabel

 What does that mean?

 ARRRGGGHGHGHGHGHGHHGHG!!!

 First, delete /.autofsck. That will stop it from fsckin'g *everything*
 every reboot. Second, is selinux in enforcing mode? In any case, have you
 recently done major changes? If not, delete /.autorelabel, since an
 selinux relabel takes a *while*, esp. if you have *lots* of files.

   mark

No, /.autofsck is not harmful and will cause nothing unless 
/etc/sysconfig/autofsck exists and has something specific defined.  
/.autofsck is automatically created at each boot by the system.

/.autorelabel is likewise only a control file and does not cause a full 
SELinux relabeling at each boot.

If you don't believe me, please see /etc/rc.sysinit.
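
It is easy to see for yourself how those flag files are handled, e.g.:

  grep -n -i -e autofsck -e autorelabel -e forcefsck /etc/rc.sysinit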

Alexander




Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread Matt
 Hate to change the conversation here but that's why I hate hardware
 RAID.
 If it was software RAID, Linux would always tell you what's going on.
 Besides, Linux knows much more about what is going on on the disk and
 what is about to happen (like a megabyte DMA transfer).

 BTW, check if something is creating:

 /forcefsck

 These exist:

 -rw-r--r--1 root root 0 Jul  7 10:03 .autofsck
 -rw-r--r--1 root root 0 Jul  7 10:03 .autorelabel

 What does that mean?

 ARRRGGGHGHGHGHGHGHHGHG!!!

 First, delete /.autofsck. That will stop it from fsckin'g *everything*
 every reboot. Second, is selinux in enforcing mode? In any case, have you
 recently done major changes? If not, delete /.autorelabel, since an
 selinux relabel takes a *while*, esp. if you have *lots* of files.

  mark

The directions for installing OpenVZ on CentOS 6 said to disable
SELinux; on this box I missed that step, whoops.


Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread Valeri Galtsev

On Thu, August 21, 2014 3:54 pm, Matt wrote:
 Hate to change the conversation here but that's why I hate hardware
 RAID.

I love hardware RAID, 3ware more than others. In the case of hardware RAID,
it is a tiny specialized system (firmware) that does the RAID function, in a
specialized CPU (I should have called it differently) inside the hardware
RAID controller, independent of the rest of the computer and needing only
power to keep going. A tiny piece of code with a very simple function; it is
really hard to introduce bugs into these, so you are unlikely to have a
problem at the device level. To find the status of the device and its
components (physical drives) you can always use the utility that comes from
the hardware vendor; with 3ware you even get a web interface.

 If it was software RAID, Linux would always tell you what's going on.

It does. And so does a hardware RAID device. And most of them (3ware in
particular) do not do an offline (i.e. boot-delaying) check/rebuild; they do
it online: they stay operational in a degraded state and do the necessary
rebuild while I/O is present on the device, just exporting themselves to the
Linux kernel with a warning about the degraded RAID during boot.
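
With 3ware that status is one command away via tw_cli (the controller and
unit numbers below are just examples):

  tw_cli /c0 show      # controller/unit/port summary, DEGRADED or REBUILDING
  tw_cli /c0/u0 show   # per-unit detail and rebuild progress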

Software RAID, however, has a disadvantage (more knowledgeable people will
correct me wherever necessary). The software RAID function is executed by the
main CPU, under a very sophisticated system (the Linux kernel), as one of
many processes (even if it is a real-time process) on a system that keeps
switching between processes. Therefore the RAID task for software RAID lives
in a much more dangerous environment. Now, if it never finishes (say, the
kernel panics due to something else), you get an inconsistent device (the
software RAID one), and it is a much, much harder task to bring that back to
a reasonably consistent state than, e.g., to bring back a dirty filesystem
that lives on a sane device. This is why we still pay for hardware RAID
devices. I do.

Just my 2c.

Valeri


Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread Matt
 Hate to change the conversation here but that's why I hate hardware
 RAID.
 If it was software RAID, Linux would always tell you what's going on.
 Besides, Linux knows much more about what is going on on the disk and
 what is about to happen (like a megabyte DMA transfer).

 BTW, check if something is creating:

 /forcefsck

 These exist:

 -rw-r--r--1 root root 0 Jul  7 10:03 .autofsck
 -rw-r--r--1 root root 0 Jul  7 10:03 .autorelabel

 What does that mean?

 ARRRGGGHGHGHGHGHGHHGHG!!!

 First, delete /.autofsck. That will stop it from fsckin'g *everything*
 every reboot. Second, is selinux in enforcing mode? In any case, have you
 recently done major changes? If not, delete /.autorelabel, since an
 selinux relabel takes a *while*, esp. if you have *lots* of files.

   mark

 No, /.autofsck is not harmful and will cause nothing unless
 /etc/sysconfig/autofsck exists and has something specific defined. The
 /.autofsck is automatically created at each boot by the system.

 /.autorelabel is as well only a control file and does not cause a full
 SELiux relabeling at each boot.

 If you don't believe me, please see /etc/rc.sysinit.

 Alexander

So I just need to set SELINUX=disabled and ignore .autofsck and
.autorelabel?  Was the targeted SELinux policy causing the slow reboots?


Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread Bruce Ferrell
Have you inspected it via the system iLO console?  Assuming it's cabled to 
the network.



On 08/21/2014 01:33 PM, GKH wrote:
 Hate to change the conversation here but that's why I hate hardware RAID.
 If it was software RAID, Linux would always tell you what's going on.
 Besides, Linux knows much more about what is going on on the disk and what is 
 about to happen (like a megabyte DMA transfer).

 BTW, check if something is creating:

 /forcefsck

 That would make the fsck run every time.

 GKH

 Matt wrote:
 I have CentOS 6.x installed on an HP ProLiant DL380 G5 server.  It
 has eight 750GB drives in a hardware RAID6 array.  It's acting as a
 host for a number of OpenVZ containers.

 Seems like every time I reboot this server, which is not very often, it
 sits for hours running a disk check or something on boot.  The server
 is located 200+ miles away, so it's not very convenient to look at.  Is
 there any way to tell if it plans to run this, or to tell it not to?

 Right now it's reporting that one of the drives in the array is bad, and
 last time it did this a reboot resolved it.

 You need to know what it's running. If it's doing an fsck, that will take
 a lot of time. If it's firmware in the RAID controller, that's different.
 You can run tune2fs -l /dev/whatever and see how often it wants to run fsck.
 For that matter, what's the entry in /etc/fstab?

mark



Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread GKH
Valeri,

I hope you realize that your arguments for hardware RAID
all depend on everything working just right.

If something goes wrong with a disk (on HW RAID)
you can't just simply take out the disk, move it to another
computer and maybe do some forensics.

The formatting of disks on HW RAID is transparent to Linux.
Therefore my disks are all RAID or not.

What if I wanted to mix and match? Maybe I don't want my swap
RAID for performance.

The idea of taking my data (which is controlled by an OSS
Operating System, Linux) and putting it behind a closed source
and closed system RAID controller is appalling to me.

It comes down to this: Linux knows where and when to position
the heads of disks in order to max performance. If a
RAID controller is in the middle, whatever algorithm
Linux is using is no longer valid. The RAID controller
is the one who makes the I/O decisions.

Sorry, this is not something I want to live with.

GKH



 On Thu, August 21, 2014 3:54 pm, Matt wrote:
 Hate to change the conversation here but that's why I hate hardware
 RAID.

 I love hardware RAID, 3ware more than others. In the case of hardware RAID,
 it is a tiny specialized system (firmware) that does the RAID function, in a
 specialized CPU (I should have called it differently) inside the hardware
 RAID controller, independent of the rest of the computer and needing only
 power to keep going. A tiny piece of code with a very simple function; it is
 really hard to introduce bugs into these, so you are unlikely to have a
 problem at the device level. To find the status of the device and its
 components (physical drives) you can always use the utility that comes from
 the hardware vendor; with 3ware you even get a web interface.

 If it was software RAID, Linux would always tell you what's going on.

 It does. And so does a hardware RAID device. And most of them (3ware in
 particular) do not do an offline (i.e. boot-delaying) check/rebuild; they do
 it online: they stay operational in a degraded state and do the necessary
 rebuild while I/O is present on the device, just exporting themselves to the
 Linux kernel with a warning about the degraded RAID during boot.

 Software RAID, however, has a disadvantage (more knowledgeable people will
 correct me wherever necessary). The software RAID function is executed by the
 main CPU, under a very sophisticated system (the Linux kernel), as one of
 many processes (even if it is a real-time process) on a system that keeps
 switching between processes. Therefore the RAID task for software RAID lives
 in a much more dangerous environment. Now, if it never finishes (say, the
 kernel panics due to something else), you get an inconsistent device (the
 software RAID one), and it is a much, much harder task to bring that back to
 a reasonably consistent state than, e.g., to bring back a dirty filesystem
 that lives on a sane device. This is why we still pay for hardware RAID
 devices. I do.

 Just my 2c.

 Valeri

 
 Valeri Galtsev
 Sr System Administrator
 Department of Astronomy and Astrophysics
 Kavli Institute for Cosmological Physics
 University of Chicago
 Phone: 773-702-4247
 


Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread Les Mikesell
On Thu, Aug 21, 2014 at 5:32 PM, GKH x...@darksmile.net wrote:

 I hope you realize that your arguments for hardware RAID
 all depend on everything working just right.

Yes, but try a software RAID when you have intermittently bad RAM.
I've been there.  Mirrored disks that were almost, but not quite,
mirrors.

 If something goes wrong with a disk (on HW RAID)
 you can't just simply take out the disk, move it to another
 computer and maybe do some forensics.

You can if that other computer has a matching controller.   If you
expect to do forensics you should have that.  Most people would just
use a backup, though.

 What if I wanted to mix and match? Maybe I don't want my swap
 RAID for performance.

If you want performance, you'll have enough RAM that you won't ever
page swap back in.

 The idea of taking my data (which is controlled by an OSS
 Operating System, Linux) and putting it behind a closed source
 and closed system RAID controller is appalling to me.

Why?  It should all be backed up.

 It comes down to this: Linux knows where and when to position
 the heads of disks in order to max performance. If a
 RAID controller is in the middle, whatever algorithm
 Linux is using is no longer valid.

Really??? I don't think linux has ever known or cared much about disk
geometry and most disks lie about it anyway.

 The RAID controller
 is the one who makes the I/O decisions.

 Sorry, this is not something I want to live with.

I think you haven't actually measured any performance.

-- 
   Les Mikesell
 lesmikes...@gmail.com


Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread John R Pierce
On 8/21/2014 4:06 PM, Les Mikesell wrote:
 Yes, but try a software RAID when you have intermittently bad RAM.
 I've been there.  Mirrored disks that were almost, but not quite,
 mirrors.

try any file system when you've got flaky RAM. Data that's not quite 
what you wanted, oh boy.



-- 
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread John R Pierce
On 8/21/2014 4:11 PM, John R Pierce wrote:
 On 8/21/2014 4:06 PM, Les Mikesell wrote:
 Yes, but try a software RAID when you have intermittently bad RAM.
 I've been there.  Mirrored disks that were almost, but not quite,
 mirrors.
 try any file system when you've got flaky RAM. Data that's not quite
 what you wanted, oh boy.

which, btw, is why I insist on ECC for servers.  and really prefer ZFS 
where each block of each part of a raid is checksummed and timestamped, 
so when scrub finds mismatching blocks, it can know which one is correct.
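
e.g. (the pool name is just a placeholder):

  zpool scrub tank        # kick off a scrub
  zpool status -v tank    # scrub progress, repaired and unrecoverable blocks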



-- 
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread Steven Tardy
On Thu, Aug 21, 2014 at 3:43 PM, Matt matt.mailingli...@gmail.com wrote:

 I have CentOS 6.x installed on an HP ProLiant DL380 G5 server.  It
 has eight 750GB drives in a hardware RAID6 array.  It's acting as a
 host for a number of OpenVZ containers.

 Seems like every time I reboot this server, which is not very often, it
 sits for hours running a disk check or something on boot.  The server
 is located 200+ miles away, so it's not very convenient to look at.  Is
 there any way to tell if it plans to run this, or to tell it not to?

 Right now it's reporting that one of the drives in the array is bad, and
 last time it did this a reboot resolved it.


run:
  tune2fs -l /dev/mapper/whatever_the_disk_is_called
check:
  Maximum mount count
  Next check after
if those are NOT -1 and 0 respectively, change the settings by running:
  tune2fs -i 0 -c 0 /dev/mapper/whatever_the_disk_is_called
(-c 0 disables the mount-count check, -i 0 the time-based check)


Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread Valeri Galtsev

On Thu, August 21, 2014 5:32 pm, GKH wrote:
 Valeri,

 I hope you realize that your arguments for hardware RAID
 all depend on everything working just right.

 If something goes wrong with a disk (on HW RAID)
 you can't just simply take out the disk, move it to another
 computer and maybe do some forensics.

If a drive that is a member of a hardware RAID fails (say, it timed out
while dealing with reallocation of a bad block), it is kicked out of the
RAID, I get a notification from the 3ware daemon, hot-unplug the drive,
hot-plug in a replacement (of the same or larger size), and start the RAID
rebuild through the firmware utility or the GUI interface, or the controller
starts the rebuild automatically if configured so. The system on the machine
has no idea about any of this and keeps running happily.
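
From memory, the manual version of that rebuild kick-off is roughly like
this (controller, unit and port numbers are examples; check tw_cli's help):

  tw_cli /c0 rescan                      # pick up the newly inserted drive
  tw_cli /c0/u0 start rebuild disk=p3    # rebuild the unit onto port 3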

I do no forensics on failed drives. I run the manufacturer's drive fitness
test (whatever the particular manufacturer calls it); if the drive passes I
usually reuse it, and if it fails the test I send it to the manufacturer for
warranty replacement, or toss it if it is out of warranty.


 The formatting of disks on HW RAID is transparent to Linux.
 Therefore my disks are all RAID or not.

 What if I wanted to mix and match?

With a 3ware RAID controller you can export one or more of the attached
drives directly to the system; they then do not participate in the
[hardware] RAID.

 Maybe I don't want my swap
 RAID for performance.

Speaking of swap: RAM is large and cheap these days, and I do not use swap
on machines with 32 GB of RAM or more. On a multitasking system you have to
switch between processes more often than every millisecond. Now imagine
needing to swap 32 GB in or out during some of that switching between
processes; your system will be on its knees just because of that. Unless you
have a very special block device that is almost as fast as RAM, in which
case you would do better to just add it to the RAM address space (from the
kernel's point of view), so we are not speaking of such devices as hard
drives.


 The idea of taking my data (which is controlled by an OSS
 Operating System, Linux) and putting it behind a closed source
 and closed system RAID controller is appalling to me.

Me too. Yet all of us use them. Example: hard drive firmware, which has a
somewhat more sophisticated function than just taking data and writing a
track, or the other way around: it detects bad blocks, reallocates them, and
recovers as much as possible of the information that was inside the bad
block. Another example: a video card, say an NVIDIA one. It has processors
inside that run software flashed into its non-volatile memory. With NVIDIA
cards you sometimes have to use the proprietary driver (if you have a more
or less sophisticated display arrangement), which is a closed-source binary
driver. What you compile when you install it is just an interface between
the closed driver and this particular version of the kernel, and that code
runs not on the card itself but under your system. And the list goes on, not
to mention our cell phones...

The only time I was mad about the firmware of some controller was one
particular version of a VIA PATA controller that had a bug leading to drive
corruption...


 It comes down to this: Linux knows where and when to position
 the heads of disks in order to max performance. If a
 RAID controller is in the middle, whatever algorithm
 Linux is using is no longer valid. The RAID controller
 is the one who makes the I/O decisions.

Yes, but... and some of what I'll say was already mentioned in this thread.
You can tune a RAID controller to align stripe sizes with the optimal data
chunks of the drives. Furthermore, you can have a battery-backed cache,
which increases device speed tremendously (30 times in one particular case,
just off the top of my head). Of course RAM is used as a cache for software
RAID too, but unlike the RAID controller's cache, RAM content vanishes with
power loss. You don't seem to be a person who has had to recover after a
[software] RAID cache loss, and _I_ definitely would not like to be one.
Hence I use hardware RAID, with an optimized stripe size, an optimized
filesystem block size, and a battery-backed cache in the RAID controller. If
you beat me in I/O with software RAID, I will live with that, as I do not
like to give up reliability, not at as small an extra cost as I pay for
hardware RAID. (And I do not count as hardware RAID those fake RAID cards
that rely on a driver, which are really software RAID cards. 3ware never
fell so low as to make or sell any such junk; somebody who knows LSI better
than I do will probably say the same about them.)
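
As a made-up example of that alignment: with a 64 KB controller stripe, a
4 KB filesystem block and 8 drives in RAID6 (so 6 data drives), the ext4
numbers work out to stride = 64/4 = 16 and stripe-width = 16 * 6 = 96:

  mkfs.ext4 -E stride=16,stripe-width=96 /dev/sdX   # numbers are illustrative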

Again, just my 2c.

Valeri


 Sorry, this is not something I want to live with.

 GKH



 On Thu, August 21, 2014 3:54 pm, Matt wrote:
 Hate to change the conversation here but that's why I hate hardware
 RAID.

 I love hardware RAID, 3ware more than others. In the case of hardware RAID,
 it is a tiny specialized system (firmware) that does the RAID function, in a
 specialized CPU (I should 

Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread Keith Keller
On 2014-08-21, Valeri Galtsev galt...@kicp.uchicago.edu wrote:
 (And I do not count as hardware RAID those fake RAID cards that rely on a
 driver, which are really software RAID cards. 3ware never fell so low as to
 make or sell any such junk; somebody who knows LSI better than I do will
 probably say the same about them.)

LSI has owned 3ware for some time (and LSI in turn was recently gobbled up
by another company).  The LSI MegaRAID cards are real hardware RAID.
What I have heard, but only validated the former, is that the 3ware UI
is simpler but the MegaRAID controllers are faster.

That being said, I do also like mdraid for certain purposes.  I was able
to successfully repurpose a very old controller in an old but nicely
sized chassis (16 bays) by moving to an md RAID6 array, which the
hardware controller doesn't support (I did say it's old).  I also love
the want_replacement feature in md RAID, a feature I don't believe is
supported in the 3ware line (you can rebuild just one disk if it's bad
but hasn't been kicked out of the array yet; that rebuild is much
faster).
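
If memory serves, that replacement is driven through sysfs, roughly like
this (array, member and spare names are placeholders):

  mdadm /dev/md0 --add /dev/sdh1                             # add a spare
  echo want_replacement > /sys/block/md0/md/dev-sdb1/state   # copy, then kick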

--keith

-- 
kkel...@wombat.san-francisco.ca.us




Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread Valeri Galtsev

On Thu, August 21, 2014 6:47 pm, Keith Keller wrote:
 On 2014-08-21, Valeri Galtsev galt...@kicp.uchicago.edu wrote:
 (And I do not count as hardware RAID those fake RAID cards that rely on a
 driver, which are really software RAID cards. 3ware never fell so low as to
 make or sell any such junk; somebody who knows LSI better than I do will
 probably say the same about them.)

 LSI has owned 3ware for some time (and LSI in turn was recently gobbled up
 by another company).

3ware was an independent company until it was first bought by AMCC; LSI
then bought them from AMCC. I didn't know LSI sold them to someone else;
www.3ware.com still redirects to LSI. To the credit of all the 3ware owners,
they left the nicely working 3ware machinery (engineering, manufacturing,
etc.) as is without destroying it, which is rare, and they deserve my
respect for that alone. I do like LSI too, and have a few of their MegaRAID
controllers in long-lived and still very reliable boxes. I like 3ware more
because, as you said, it has a more transparent UI, which in my book
diminishes the chance of human error tremendously; and I'm merely human. I
also like that 3ware comes with a daemon that watches and notifies me when
necessary.

 The LSI MegaRAID cards are real hardware RAID.

I never heard of any LSI card called RAID that is not hardware RAID. I
cannot say the same about, say, Adaptec or HighPoint, to name two: some
Adaptec and some HighPoint cards, even though they are sold as RAID cards,
are junk and have no place in our server room. Areca is probably one more
company whose RAID cards are all indeed hardware RAID cards (I have 3 or 4
Areca cards in our server room).

So let me name them separately (I consider a bad guy anyone who sells as a
RAID card even one product that is not a hardware RAID card; they confuse
the world and interfere with the fair profits of the good guys):

good guys: LSI, 3ware, Areca

bad guys: Adaptec, HighPoint


 Somebody correct me if/where I'm wrong.

Valeri

 What I have heard, but only validated the former, is that the 3ware UI
 is simpler but the MegaRAID controllers are faster.

 That being said, I do also like mdraid for certain purposes.  I was able
 to successfully repurpose a very old controller in an old but nicely
 sized chassis (16 bays) by moving to an md RAID6 array, which the
 hardware controller doesn't support (I did say it's old).  I also love
 the want_replacement feature in md RAID, a feature I don't believe is
 supported in the 3ware line (you can rebuild just one disk if it's bad
 but hasn't been kicked out of the array yet; that rebuild is much
 faster).

 --keith

 --
 kkel...@wombat.san-francisco.ca.us






Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread John R Pierce
On 8/21/2014 5:49 PM, Valeri Galtsev wrote:
 I never heard of any LSI card called RAID that is not hardware RAID.

the base LSI SAS HBA cards, like the 2008 chip (9211-8i etc. boards), have 
microcoded hardware RAID without any write buffer or battery backup.  It's 
not quite fake RAID, but it's kind of in limbo.

I *much* prefer to use these cards in IT mode, which requires reflashing 
the firmware and BIOS, as they always seem to be shipped in IR mode.  
IT mode is plain SAS2 Initiator-Target mode, where the disks are just 
plain disks, without needing to create volumes or any other nonsense.
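
the reflash itself is done with LSI's sas2flash tool; from memory it goes 
roughly like this (the image names are whatever the IT firmware package for 
your board ships, the ones below are just examples, and don't interrupt it 
mid-flash):

  sas2flash -listall                          # note the adapter / IR firmware
  sas2flash -o -e 6                           # erase the existing flash
  sas2flash -o -f 2118it.bin -b mptsas2.rom   # write IT firmware and boot BIOS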



-- 
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: [CentOS] HP ProLiant DL380 G5

2014-08-21 Thread Keith Keller
On 2014-08-22, Valeri Galtsev galt...@kicp.uchicago.edu wrote:

 3ware was an independent company until it was first bought by AMCC; LSI
 then bought them from AMCC. I didn't know LSI sold them to someone else,

Sorry, I was not clear: LSI was bought, not just 3ware.  If you look at
LSI's home page, it says "An Avago Technologies Company".

--keith

-- 
kkel...@wombat.san-francisco.ca.us

