On Mon, Mar 18, 2013 at 12:05 AM, Dan Egli <[email protected]> wrote:
> All this discussion about raid levels and whatnot has brought to my mind
> a different, if related, question. One of the reasons I like software raid
> is that it's easy to monitor. For example, I could have a cron script that
> runs once every 15 minutes and checks the status of the /proc/mdstat file
> to ensure any raid(s) listed show a healthy status. But how do you do
> something like that for a hardware raid? How can you tell, for example,
> if drive #3 in a HW raid10 has failed? This is something I honestly don't
> know off the top of my head. I know many of you folks have had experience
> with HW raid and device failures in such an array. How do you know?
> There's no file you can check like mdstat, is there? I'd think this would
> be especially important for remote hosted/co-located servers.
IME it's vendor specific. Some of the cards I've used shipped their own
monitoring software. Others provided a utility you could query from a
script, so you could write your own monitoring plugin around it. Some had
nothing--they would just beep, and you'd have to use their access tool (a
front end to their BIOS software) and navigate its menus to figure out the
failure and deal with it. That could *possibly* be automated via an expect
script, but not easily--you'd be driving an ncurses-type interface.

mdadm also has its own monitoring daemon you can run, rather than polling
the contents of /proc/mdstat yourself.

/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/
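For the software-raid side of the question, the cron check Dan describes can be sketched in a few lines of shell. This is a minimal sketch, not a polished monitor; the function name and the output strings are made up for illustration, and it only looks for the two degradation markers mdstat prints--an empty member slot ("_" in the [UU...] field) or a member flagged faulty with "(F)":

```shell
#!/bin/sh
# check_mdstat FILE
# Scan an mdstat-format file (normally /proc/mdstat) for signs of a
# degraded array: an underscore inside the [UU...] member-status field,
# or a device marked faulty with "(F)". Prints a one-line verdict and
# returns nonzero when something looks wrong, so cron's MAILTO (or a
# wrapper that mails on failure) can alert you.
check_mdstat() {
    if grep -Eq '\[U*_[U_]*\]|\(F\)' "$1"; then
        echo "DEGRADED: $1"
        return 1
    fi
    echo "OK: $1"
}
```

From cron you would run something like `check_mdstat /proc/mdstat` every 15 minutes and mail yourself whenever it exits nonzero.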
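As for the mdadm monitoring daemon, here's a sketch of how it's typically wired up (the mail address is a placeholder--see mdadm(8) and mdadm.conf(5) for the full set of options):

```shell
# In /etc/mdadm.conf, MAILADDR tells the monitor where to send alerts:
#   MAILADDR admin@example.com

# Run the monitor as a daemon, watching every array listed in
# mdadm.conf and rechecking every 1800 seconds:
mdadm --monitor --scan --daemonise --delay=1800
```

Many distros start this for you from an init script, so it's worth checking whether it's already running before you roll your own polling.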
