[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines
Hey Frank,

On Fri, 2011-01-14 at 05:12 -0800, Frank Steiner wrote:
> Al Chu wrote
>>> Maybe a little warning from the bmc-watchdog binary when ipmi kernel modules are loaded when running on a Sun machine could help others to avoid this?
>>
>> I'm not really sure what kind of error message could be portable and/or reasonable. From the perspective of most FreeIPMI software, any driver/mechanism to reach the BMC card is ok. If the OpenIPMI driver is available, it can use it.
>
> I think I have misunderstood your reply to Dave. I thought that the Sun BMCs had general problems whenever the kernel modules are loaded.

The latter statement might be true, they might have problems on the Suns. My point was that FreeIPMI can't really tell that (at least at this moment). FreeIPMI sees a driver available (e.g. /dev/ipmi0) and assumes it can use it.

>> I'm not sure why (in this case) using the OpenIPMI kernel driver didn't work. Did you get any other error messages? The predominant issue that
>
> I just figured out what was wrong: When calling /etc/init.d/ipmi start I can use -D OPENIPMI. After calling /etc/init.d/ipmi stop I can use bmc-watchdog without -D. Problem here was that starting lm_sensors from the runlevel loads only some of the ipmi kernel modules. Not enough to make -D openipmi work, but enough to cause many delays and busy problems when calling bmc-watchdog without -D.
>
> Thus: loading all or none of the ipmi kernel modules works (with/without -D), but loading only some of them causes both methods to fail.

Ahhh. That's nasty. I'm not really sure why that would be the case. I would think that lm_sensors would load all the modules appropriately. I would report this as a bug to the appropriate people at Sun/Redhat/Suse or whoever. That's quite nasty.

Al

> cu,
> Frank

--
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

___
Freeipmi-devel mailing list
Freeipmi-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/freeipmi-devel
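Frank's finding (all or none of the ipmi kernel modules works, only some of them breaks both access methods) can be checked before starting bmc-watchdog. The following is only a rough sketch, not something FreeIPMI does itself. It assumes that a complete OpenIPMI setup means ipmi_msghandler, ipmi_si and ipmi_devintf are all loaded; the thread only names the first two, and ipmi_devintf is assumed here to be the missing piece that provides /dev/ipmi0. The function name is made up for illustration.

```shell
#!/bin/sh
# Classify the IPMI kernel-module state from /proc/modules-style text on stdin.
# Prints one of: none, partial, full.
# Assumption (not from the thread): "full" means ipmi_msghandler, ipmi_si and
# ipmi_devintf are all present.
ipmi_module_state() {
    awk '
        $1 == "ipmi_msghandler" { seen++ }
        $1 == "ipmi_si"         { seen++ }
        $1 == "ipmi_devintf"    { seen++ }
        END {
            if (seen == 0)      print "none"
            else if (seen == 3) print "full"
            else                print "partial"
        }
    '
}

# Typical use on a live Linux system:
#   ipmi_module_state < /proc/modules
```

With "none", bmc-watchdog should work without -D; with "full", -D openipmi should work; "partial" is the state that lm_sensors left behind on Frank's machines.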
[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines
Hi Frank,

On Thu, 2010-12-23 at 06:53 -0800, Frank Steiner wrote:
> Dave Love wrote
>> says ps, where I need the -D on x4100 or x4200m2 to avoid
>>
>>   bmc-watchdog: Get Watchdog Timer Error: BMC Busy
>
> Just ran into this (and -D openipmi didn't work for me) and figured out that the /etc/init.d/lm_sensors script loads the ipmi_si and ipmi_msghandler modules on our Sun machines. I didn't figure out the problem for some time because the ipmi service itself was disabled in the runlevels and so I didn't even think of the kernel modules :-)
>
> Maybe a little warning from the bmc-watchdog binary when ipmi kernel modules are loaded when running on a Sun machine could help others to avoid this?

I'm not really sure what kind of error message could be portable and/or reasonable. From the perspective of most FreeIPMI software, any driver/mechanism to reach the BMC card is ok. If the OpenIPMI driver is available, it can use it.

I'm not sure why (in this case) using the OpenIPMI kernel driver didn't work. Did you get any other error messages?

The predominant issue that all IPMI software should try to avoid is using different drivers (e.g. one tool uses /dev/ipmi0 and another uses memory mapped I/O from userspace). That can lead to issues.

Also, just to make sure, did you use the ignorestateflag workaround when running bmc-watchdog?

Al

> cu,
> Frank

--
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines
Dave Love wrote
> says ps, where I need the -D on x4100 or x4200m2 to avoid
>
>   bmc-watchdog: Get Watchdog Timer Error: BMC Busy

Just ran into this (and -D openipmi didn't work for me) and figured out that the /etc/init.d/lm_sensors script loads the ipmi_si and ipmi_msghandler modules on our Sun machines. I didn't figure out the problem for some time because the ipmi service itself was disabled in the runlevels and so I didn't even think of the kernel modules :-)

Maybe a little warning from the bmc-watchdog binary when ipmi kernel modules are loaded when running on a Sun machine could help others to avoid this?

cu,
Frank

--
Dipl.-Inform. Frank Steiner   Web:   http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail:  http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17           Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:   +49 89 2180-99-4049
* Recursion can only be understood once one has understood recursion. *
[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines
Frank Steiner wrote
> Btw., Sun has confirmed it's a bug that the state flag does not change. The 2nd level support has requested a firmware update for the ILOM, but it's not yet clear if it will be granted.

Sun finally made it :-) They released a new ILOM firmware 2.0.2.10 for the X4100M2/X4200M2 which can be downloaded at
https://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US/-/USD/viewproductdetail-start?productref=sunfire-x4100-x4200-m2-2.3.2-...@cds-cds_smi

With this firmware, the timer status field changes from Stopped to Running after e.g. bmc-watchdog -r. Their support now even mentions how to use freeipmi :-)

cu,
Frank

Quote from Sun engineering:

Now we have two solutions for Customer to use BMC Watchdog Timer function in new respin G12f ILOM:

===== 1. Use open source ipmitool =====
(http://sourceforge.net/projects/ipmitool/files/)

ipmitool ... mc watchdog off can correctly stopped current running BMC Watchdog Timer in SP/ILOM.

To monitor Server status with open source ipmitool:

a. Set up BMC Watchdog Timer in BIOS. The setup path is Advanced - IPMI 2.0 Configuration - BMC Watch Dog Timer Action
b. Upon BIOS POST finishes, Timer will start automatically. Then first stop Timer with ipmitool ... bmc watchdog off command from a remote system.
c. Set up a daemon in local system. The daemon should do jobs as below: Periodically call ipmitool ... bmc watchdog reset to restart Watchdog Timer. The reset period should be less than the Timer Initial beginning value.

===== 2. Use freeipmi =====

Latest freeipmi can be download from: http://ftp.gluster.com/pub/freeipmi/0.8.9/
Details of the bmc-watchdog tool usage: http://www.gnu.org/software/freeipmi/manpages/man8/bmc-watchdog.8.html

*NOTICE* We don't need to pre-setup BMC Watchdog Timer in BIOS, because bmc-watchdog tool bundled in freeipmi provide SET command.

*Usage Example*

First please install freeipmi in local Host system which will be controlled by BMC Watchdog Timer.
[r...@ ~]# /usr/local/sbin/bmc-watchdog -g
Timer Use:                   BIOS POST
Timer:                       Stopped
Logging:                     Enabled
Timeout Action:              None
Pre-Timeout Interrupt:       None
Pre-Timeout Interval:        0 seconds
Timer Use BIOS FRB2 Flag:    Set
Timer Use BIOS POST Flag:    Set
Timer Use BIOS OS Load Flag: Set
Timer Use BIOS SMS/OS Flag:  Set
Timer Use BIOS OEM Flag:     Set
Initial Countdown:           6553 seconds
Current Countdown:           6553 seconds

[r...@ ~]# /usr/local/sbin/bmc-watchdog --set -a 1 -i 300
[r...@ ~]# /usr/local/sbin/bmc-watchdog -g
Timer Use:                   BIOS POST
Timer:                       Stopped
Logging:                     Enabled
Timeout Action:              Hard Reset =
Pre-Timeout Interrupt:       None
Pre-Timeout Interval:        0 seconds
Timer Use BIOS FRB2 Flag:    Set
Timer Use BIOS POST Flag:    Set
Timer Use BIOS OS Load Flag: Set
Timer Use BIOS SMS/OS Flag:  Set
Timer Use BIOS OEM Flag:     Set
Initial Countdown:           300 seconds=
Current Countdown:           300 seconds

[r...@ ~]# /usr/local/sbin/bmc-watchdog --logfile=/var/log/bmc-watchdog.log -d
[r...@ ~]# cat /var/log/bmc-watchdog.log
[Sep 01 01:34:39]: starting bmc-watchdog daemon

[r...@ ~]# /usr/local/sbin/bmc-watchdog -g
Timer Use:                   BIOS POST
Timer:                       Running
Logging:                     Enabled
Timeout Action:              Hard Reset
Pre-Timeout Interrupt:       None
Pre-Timeout Interval:        0 seconds
Timer Use BIOS FRB2 Flag:    Clear
Timer Use BIOS POST Flag:    Clear
Timer Use BIOS OS Load Flag: Clear
Timer Use BIOS SMS/OS Flag:  Clear
Timer Use BIOS OEM Flag:     Clear
Initial Countdown:           300 seconds
Current Countdown:           296 seconds

The Timer will be reset by bmc-watchdog daemon periodically. The default reset period is 60 seconds.

--
Dipl.-Inform. Frank Steiner   Web:   http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail:  http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17           Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:   +49 89 2180-99-4049
* Recursion can only be understood once one has understood recursion. *
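The Sun notes above make two checkable points: the "Timer:" field should read Running once the daemon is up, and the reset period has to be shorter than the initial countdown (60 seconds against 300 seconds in the example). A small sketch of both checks follows. The function names are invented for illustration, and the parsing assumes the `bmc-watchdog -g` output looks exactly like the session above.

```shell
#!/bin/sh
# Print the value of the "Timer:" field from `bmc-watchdog -g` output on stdin.
# The field separator is a colon followed by optional spaces, so the line
# "Timer Use: BIOS POST" yields the key "Timer Use" and is not matched.
watchdog_timer_state() {
    awk -F': *' '$1 == "Timer" { print $2 }'
}

# Succeed only if the daemon reset period (seconds) is shorter than the
# initial countdown (seconds), as the Sun notes require.
reset_period_ok() {
    reset_period=$1
    initial_countdown=$2
    [ "$reset_period" -lt "$initial_countdown" ]
}

# Typical use on a live system (needs root and a reachable BMC):
#   bmc-watchdog -g | watchdog_timer_state
#   reset_period_ok 60 300 && echo "reset period is safe"
```

On the affected pre-2.0.2.10 firmware the first check would keep reporting Stopped even while the timer counts down, which is the state-flag bug this thread is about.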
[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines
I thought I'd replied to this before...

Albert Chu ch...@llnl.gov writes:
> Just want to clarify, you're saying this sleep(5) makes it work on the 4100?

Yes, to the limited extent I tested it, as I'm not interested in running it on x4100. However, it seemed to be running on the x4500 initially, with a similar ILOM version, so I wonder if it's reliable. I haven't had a chance to mess around any more with it on the x4500, unfortunately.
[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines
Hey Dave,

Ok. I'll put a sleep(5)-like workaround addition into bmc-watchdog.

Al

On Mon, 2010-06-28 at 08:06 -0700, Dave Love wrote:
> I thought I'd replied to this before...
>
> Albert Chu ch...@llnl.gov writes:
>> Just want to clarify, you're saying this sleep(5) makes it work on the 4100?
>
> Yes, to the limited extent I tested it, as I'm not interested in running it on x4100. However, it seemed to be running on the x4500 initially, with a similar ILOM version, so I wonder if it's reliable. I haven't had a chance to mess around any more with it on the x4500, unfortunately.

--
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines
[Sorry, I thought I sent this earlier, but I'm harassed and losing track.]

Al Chu ch...@llnl.gov writes:
> --- bmc-watchdog.c 16 Jun 2010 17:52:38 - 1.131
> +++ bmc-watchdog.c 17 Jun 2010 23:18:57 -
> @@ -1920,6 +1920,8 @@ _daemon_cmd (void)
>        reset_period (retry_wait_time * retry_attempt))
>      retry_attempt = reset_period/retry_wait_time;
>
> +  sleep (5);
> +
>    while (shutdown_flag)
>      {
>        struct timeval start_tv, end_tv;

That seemed to work on x4500/Solaris, but then I tried with a shorter sleep, and now it's not working after restoring the sleep (5), but does last a longer, variable, time than without the sleep. I'll see if I can figure any more out.

It does seem to work on an x4100 (not M2) with RedHat.
[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines
Al Chu ch...@llnl.gov writes:
>> says ps, where I need the -D on x4100 or x4200m2 to avoid
>>
>>   bmc-watchdog: Get Watchdog Timer Error: BMC Busy
>
> Yeah, I'm certainly unsure of that one. Have you tried running w/o the IPMI kernel driver? I'm wondering if they don't work together for you (don't work together for me on some mobos there).

Spot on, thanks. I was about to say I'm sure it wasn't installed -- I don't normally run it due to problems on our Supermicros -- but the driver must have been installed. I've just checked and `service ipmi start' on a RH5 system does trigger the failure.

It's unrelated to the timer stopped issue, though, and I haven't noticed it causing trouble with other freeipmi stuff, though I mostly run that remotely.
[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines
Hey Dave,

On Mon, 2010-06-21 at 15:57 -0700, Dave Love wrote:
> [Sorry, I thought I sent this earlier, but I'm harassed and losing track.]
>
> Al Chu ch...@llnl.gov writes:
>> --- bmc-watchdog.c 16 Jun 2010 17:52:38 - 1.131
>> +++ bmc-watchdog.c 17 Jun 2010 23:18:57 -
>> @@ -1920,6 +1920,8 @@ _daemon_cmd (void)
>>        reset_period (retry_wait_time * retry_attempt))
>>      retry_attempt = reset_period/retry_wait_time;
>>
>> +  sleep (5);
>> +
>>    while (shutdown_flag)
>>      {
>>        struct timeval start_tv, end_tv;
>
> That seemed to work on x4500/Solaris, but then I tried with a shorter sleep, and now it's not working after restoring the sleep (5), but does last a longer, variable, time than without the sleep. I'll see if I can figure any more out.
>
> It does seem to work on an x4100 (not M2) with RedHat.

Just want to clarify, you're saying this sleep(5) makes it work on the 4100?

Al

--
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines
> says ps, where I need the -D on x4100 or x4200m2 to avoid
>
>   bmc-watchdog: Get Watchdog Timer Error: BMC Busy

Yeah, I'm certainly unsure of that one. Have you tried running w/o the IPMI kernel driver? I'm wondering if they don't work together for you (don't work together for me on some mobos there).

Al

On Thu, 2010-06-17 at 05:30 -0700, Dave Love wrote:
> Al Chu ch...@llnl.gov writes:
>> (Naturally, I added confirmed to work around based on the assumption you guys can confirm it for me :-)
>
> I can't confirm it on either x4100 -- not x4100m2 -- (RedHat 5) or x4500 (Solaris). In each case, the daemon reports:
>
>   timer stopped by another process
>   stopping bmc-watchdog daemon
>
> Under Solaris, SMF restarts the daemon anyway, so it kind-of works, but not under RedHat.
>
> I'm running it as
>
>   /usr/sbin/bmc-watchdog -d -u 4 -p 0 -a 1 -F -P -L -S -O -i 900 -e 60 -D openipmi -W ignorestateflag
>
> says ps, where I need the -D on x4100 or x4200m2 to avoid
>
>   bmc-watchdog: Get Watchdog Timer Error: BMC Busy
>
> I think I have the latest firmware in each case -- some version of 2.0.2.5. I wonder what's different between me and Frank. I don't have time to investigate immediately, but I'll try to later.
>
> By the way, I know I'm running the right version as it accepts ignorestateflag. However, bmc-watchdog doesn't accept --version, though it's in the --help output.
>
> By the way 2, I noticed that it's still using GPLv2, not v3. Is that an oversight, or because the copyrights don't allow it?

--
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines
Frank Steiner fsteiner-ma...@bio.ifi.lmu.de writes:
> Seems to work fine on my X4100 M2 machines :-) Starts up, reports status correctly, resets etc.

Does it actually run indefinitely, though? I just realized I gave incomplete info, as I see it work for a few cycles. Here's a sample of the logged events with -e 60:

[Jun 17 12:41:12]: starting bmc-watchdog daemon
[Jun 17 12:45:12]: timer stopped by another process
[Jun 17 12:45:12]: stopping bmc-watchdog daemon
[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines
Hey Dave,

Doh! The --version bug was due to me not understanding a subtlety in the argp library. Thanks for the catch. I've put up a new QA release with a fix in it.

http://ftp.gluster.com/pub/freeipmi/qa-release/freeipmi-0.8.8.beta0.tar.gz

Hmm. I don't know much about the openipmi kernel driver, but I know it can have its own bmc-watchdog running. Are you sure it's not running its own watchdog out of the kernel?

I do have one other guess as to how you're seeing timer stopped by another process. It's possible there is an early race, where the timer has not yet quite started. In bmc-watchdog/src/bmc-watchdog.c, perhaps you can try this tiny test.

--- bmc-watchdog.c 16 Jun 2010 17:52:38 - 1.131
+++ bmc-watchdog.c 17 Jun 2010 23:18:57 -
@@ -1920,6 +1920,8 @@ _daemon_cmd (void)
       reset_period (retry_wait_time * retry_attempt))
     retry_attempt = reset_period/retry_wait_time;

+  sleep (5);
+
   while (shutdown_flag)
     {
       struct timeval start_tv, end_tv;

> By the way 2, I noticed that it's still using GPLv2, not v3. Is that an oversight, or because the copyrights don't allow it?

Nope, it'll be GPLv3 when I release the next major release of FreeIPMI. Was just too lazy to back update all the headers and documents in the current release line :P

Al

On Thu, 2010-06-17 at 05:30 -0700, Dave Love wrote:
> Al Chu ch...@llnl.gov writes:
>> (Naturally, I added confirmed to work around based on the assumption you guys can confirm it for me :-)
>
> I can't confirm it on either x4100 -- not x4100m2 -- (RedHat 5) or x4500 (Solaris). In each case, the daemon reports:
>
>   timer stopped by another process
>   stopping bmc-watchdog daemon
>
> Under Solaris, SMF restarts the daemon anyway, so it kind-of works, but not under RedHat.
>
> I'm running it as
>
>   /usr/sbin/bmc-watchdog -d -u 4 -p 0 -a 1 -F -P -L -S -O -i 900 -e 60 -D openipmi -W ignorestateflag
>
> says ps, where I need the -D on x4100 or x4200m2 to avoid
>
>   bmc-watchdog: Get Watchdog Timer Error: BMC Busy
>
> I think I have the latest firmware in each case -- some version of 2.0.2.5. I wonder what's different between me and Frank. I don't have time to investigate immediately, but I'll try to later.
>
> By the way, I know I'm running the right version as it accepts ignorestateflag. However, bmc-watchdog doesn't accept --version, though it's in the --help output.
>
> By the way 2, I noticed that it's still using GPLv2, not v3. Is that an oversight, or because the copyrights don't allow it?

--
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
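Al's guess is an early race: the daemon looks at the timer before the BMC reports it as started. The sleep (5) test patch waits a fixed time; a different way to probe the same hypothesis from the shell is to poll for a bounded number of attempts. This is a sketch of that alternative, not what bmc-watchdog does. The function name and the state-reporting command passed to it are hypothetical.

```shell
#!/bin/sh
# Poll a state-reporting command until it prints "Running" or the attempts
# run out. Returns 0 as soon as "Running" is seen, 1 otherwise.
#   $1 = maximum number of attempts
#   $2 = delay in seconds after each unsuccessful attempt
#   remaining arguments = command that prints the current timer state
wait_for_running() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        state=$("$@")
        if [ "$state" = "Running" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Typical use, with get_timer_state standing in for any command that prints
# the timer state (hypothetical helper, e.g. a wrapper around bmc-watchdog -g):
#   wait_for_running 5 1 get_timer_state || echo "timer never started"
```

If the timer reliably shows Running within a few polls, the race explanation gains weight; on firmware where the state flag never changes, polling cannot help and the ignorestateflag workaround is still needed.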
[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines
Hey Frank,

Great. I'll make sure this'll get into the FreeIPMI 0.8.7 release.

Al

On Wed, 2010-06-16 at 01:32 -0700, Frank Steiner wrote:
> Al Chu wrote
>> Hi Frank, Dave,
>>
>> I finally got around to trying to develop that workaround into BMC watchdog to get around the issue on the Sun motherboards you have. Do you think you guys could try it out and make sure it works for you?
>>
>> http://ftp.gluster.com/pub/freeipmi/qa-release/freeipmi-0.8.7.ignorestateflag.tar.gz
>>
>> You have to add the '-W ignorestateflag' option to turn on the workaround. You'll have to edit /etc/sysconfig/bmc-watchdog to add the workaround for Linux. Not sure how you would do this on Solaris.
>
> Seems to work fine on my X4100 M2 machines :-) Starts up, reports status correctly, resets etc.
>
> Btw., Sun has confirmed it's a bug that the state flag does not change. The 2nd level support has requested a firmware update for the ILOM, but it's not yet clear if it will be granted.
>
> Thanks a lot for the patch, now I can safely deploy the watchdog on all our servers :-)
>
> cu,
> Frank

--
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory