[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines

2011-01-14 Thread Albert Chu
Hey Frank,

On Fri, 2011-01-14 at 05:12 -0800, Frank Steiner wrote:
 Al Chu wrote
 
  Maybe a little warning from the bmc-watchdog binary when ipmi kernel modules
  are loaded when running on a Sun machine could help others to avoid this?
  
  I'm not really sure what kind of error message could be portable and/or
  reasonable.  From the perspective of most FreeIPMI software, any
  driver/mechanism to reach the BMC card is ok.  If the OpenIPMI driver is
  available, it can use it.
 
 I think I have misunderstood your reply to Dave. I thought that the sun BMCs
 had general problems whenever the kernel modules are loaded.

The latter statement might be true; they might have problems on the Suns.
My point was that FreeIPMI can't really tell that (at least at this
moment).  FreeIPMI sees a driver available (e.g. /dev/ipmi0) and assumes
it can use it.

  I'm not sure why (in this case) using the OpenIPMI kernel driver didn't
  work.  Did you get any other error messages?  The predominant issue that
 
 I just figured out what was wrong: When calling /etc/init.d/ipmi start
 I can use -D OPENIPMI. After calling /etc/init.d/ipmi stop I can
 use bmc-watchdog without -D.
 
 Problem here was that starting lm_sensors from the runlevel loads only
 some of the ipmi kernel modules. Not enough to make -D openipmi work,
 but enough to cause many delays and busy problems when calling
 bmc-watchdog without -D.
 Thus: loading all or none of the ipmi kernel modules works (with/without -D),
 but loading only some of them causes both methods to fail.

Ahhh.  That's nasty.  I'm not really sure why that would be the case.  I
would think that lm_sensors would load all the modules appropriately.  I
would report this as a bug to the appropriate people at Sun/Redhat/Suse
or whoever.  That's quite nasty.
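
For anyone hitting the same thing, a rough way to spot the half-loaded state
is to compare the loaded modules against the set the OpenIPMI driver needs.
This is only a sketch: it assumes the Linux /proc/modules format, and it
assumes the complete set is ipmi_msghandler, ipmi_si and ipmi_devintf. Only
the first two are named in this thread; ipmi_devintf is an assumption for the
module that provides /dev/ipmi0.

```python
# Sketch only: classify the IPMI kernel module state from /proc/modules.
# ASSUMPTION: the "complete" set is the three modules below; the thread
# itself only names ipmi_si and ipmi_msghandler.
REQUIRED = ("ipmi_msghandler", "ipmi_si", "ipmi_devintf")


def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules content."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The module name is the first whitespace-separated field.
            names.add(fields[0])
    return names


def ipmi_module_state(proc_modules_text, required=REQUIRED):
    """Return 'none', 'partial' or 'all' for the required IPMI modules."""
    loaded = loaded_modules(proc_modules_text)
    present = [name for name in required if name in loaded]
    if not present:
        return "none"
    if len(present) == len(required):
        return "all"
    return "partial"


if __name__ == "__main__":
    with open("/proc/modules") as handle:
        print(ipmi_module_state(handle.read()))
```

A result of "partial" would correspond to the case described above, where
neither bmc-watchdog with -D nor without -D behaves.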

Al

 cu,
 Frank
 
 
-- 
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory


___
Freeipmi-devel mailing list
Freeipmi-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/freeipmi-devel


[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines

2010-12-27 Thread Al Chu
Hi Frank,

On Thu, 2010-12-23 at 06:53 -0800, Frank Steiner wrote:
 Dave Love wrote
 
  says ps, where I need the -D on x4100 or x4200m2 to avoid 
  
bmc-watchdog:  Get Watchdog Timer Error: BMC Busy
  
 Just ran into this (and -D openipmi didn't work for me) and figured out
 that the /etc/init.d/lm_sensors script loads the ipmi_si and ipmi_msghandler
 modules on our Sun machines. I didn't figure out the problem for 
 some time because the ipmi service itself was disabled in the runlevels
 and so I didn't even think of the kernel modules :-)

 Maybe a little warning from the bmc-watchdog binary when ipmi kernel modules
 are loaded when running on a Sun machine could help others to avoid this?

I'm not really sure what kind of error message could be portable and/or
reasonable.  From the perspective of most FreeIPMI software, any
driver/mechanism to reach the BMC card is ok.  If the OpenIPMI driver is
available, it can use it.

I'm not sure why (in this case) using the OpenIPMI kernel driver didn't
work.  Did you get any other error messages?  The predominant issue that
all IPMI software should try to avoid is using different drivers (e.g.
one tool uses /dev/ipmi0 and another uses memory mapped I/O from
userspace).  That can lead to issues.
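
A wrapper script can at least make the driver choice explicit, so that every
tool on a host goes through the same one. The following is a sketch, not part
of FreeIPMI: the helper name is made up, the device path /dev/ipmi0 is the
example from this thread, and -D openipmi is the option used elsewhere in
this thread.

```python
import os


def bmc_watchdog_driver_args(device="/dev/ipmi0", exists=os.path.exists):
    """Return extra bmc-watchdog arguments forcing the OpenIPMI driver
    when its device node is present, otherwise no driver arguments.

    `exists` is injectable so the choice can be checked without the device.
    """
    if exists(device):
        return ["-D", "openipmi"]
    return []


if __name__ == "__main__":
    print(bmc_watchdog_driver_args())
```

Note that the presence of the device node says nothing about whether the
driver actually works on a given BMC.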

Also, just to make sure, did you use the ignorestateflag workaround
when running bmc-watchdog?

Al

 cu,
 Frank
 
-- 
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory




[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines

2010-12-23 Thread Frank Steiner
Dave Love wrote

 says ps, where I need the -D on x4100 or x4200m2 to avoid 
 
   bmc-watchdog:  Get Watchdog Timer Error: BMC Busy
 
Just ran into this (and -D openipmi didn't work for me) and figured out
that the /etc/init.d/lm_sensors script loads the ipmi_si and ipmi_msghandler
modules on our Sun machines. I didn't figure out the problem for 
some time because the ipmi service itself was disabled in the runlevels
and so I didn't even think of the kernel modules :-)

Maybe a little warning from the bmc-watchdog binary when ipmi kernel modules
are loaded when running on a Sun machine could help others to avoid this?

cu,
Frank

-- 
Dipl.-Inform. Frank Steiner   Web:   http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail:  http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17           Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:   +49 89 2180-99-4049
* You can only understand recursion once you have understood recursion. *



[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines

2010-09-09 Thread Frank Steiner
Frank Steiner wrote
 
 Btw., Sun has confirmed it's a bug that the state flag does not change.
 The 2nd level support has requested a firmware update for the ILOM, but it's
 not yet clear if it will be granted.

Sun finally made it :-) They released a new ILOM firmware 2.0.2.10 for
the X4100M2/X4200M2 which can be downloaded at
https://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US/-/USD/viewproductdetail-start?productref=sunfire-x4100-x4200-m2-2.3.2-...@cds-cds_smi

With this firmware, the timer status field changes from Stopped to
Running after e.g. bmc-watchdog -r. Their support now even mentions
how to use freeipmi :-)

cu,
Frank


Quote from Sun engineering:

Now we have two solutions for Customer to use BMC Watchdog Timer function in
new respin G12f ILOM:

== 1. Use open source ipmitool ==

(http://sourceforge.net/projects/ipmitool/files/)
ipmitool ... mc watchdog off can correctly stop the currently running BMC
Watchdog Timer in SP/ILOM.

To monitor Server status with open source ipmitool:
a. Set up BMC Watchdog Timer in BIOS.
   The setup path is Advanced - IPMI 2.0 Configuration - BMC Watch Dog
   Timer Action
b. When BIOS POST finishes, the Timer will start automatically. Then first
   stop the Timer with the
   ipmitool ... bmc watchdog off command from a remote system.
c. Set up a daemon on the local system. The daemon should do the following:
   Periodically call ipmitool ... bmc watchdog reset to restart the Watchdog
   Timer.
   The reset period should be less than the Timer's initial countdown value.
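
The daemon in step c can be sketched roughly as follows. This is an
illustration, not Sun's or ipmitool's code: reset_timer is a hypothetical
callable supplied by the caller (in practice it would run the watchdog reset
command quoted above), and the iteration count exists only so the loop can be
exercised without running forever.

```python
import time


def run_reset_loop(reset_timer, initial_countdown, reset_period,
                   iterations, sleep=time.sleep):
    """Call reset_timer() every reset_period seconds, `iterations` times.

    Raises ValueError if the reset period is not shorter than the initial
    countdown, since the timer would then expire between resets.
    """
    if reset_period >= initial_countdown:
        raise ValueError("reset period must be shorter than the countdown")
    for _ in range(iterations):
        reset_timer()
        sleep(reset_period)
```

For example, run_reset_loop(my_reset, 300, 60, 10) would reset a 300 second
timer ten times at 60 second intervals, where my_reset is whatever function
issues the reset.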

== 2. Use freeipmi ==

The latest freeipmi can be downloaded from:
http://ftp.gluster.com/pub/freeipmi/0.8.9/
Details of the bmc-watchdog tool usage:
http://www.gnu.org/software/freeipmi/manpages/man8/bmc-watchdog.8.html

*NOTICE*
We don't need to pre-set up the BMC Watchdog Timer in BIOS, because the
bmc-watchdog tool bundled in freeipmi provides a SET command.

*Usage Example*
First, install freeipmi on the local host system that will be controlled by
the BMC Watchdog Timer.

[r...@ ~]# /usr/local/sbin/bmc-watchdog -g
Timer Use:                   BIOS POST
Timer:                       Stopped
Logging:                     Enabled
Timeout Action:              None
Pre-Timeout Interrupt:       None
Pre-Timeout Interval:        0 seconds
Timer Use BIOS FRB2 Flag:    Set
Timer Use BIOS POST Flag:    Set
Timer Use BIOS OS Load Flag: Set
Timer Use BIOS SMS/OS Flag:  Set
Timer Use BIOS OEM Flag:     Set
Initial Countdown:           6553 seconds
Current Countdown:           6553 seconds
[r...@ ~]# /usr/local/sbin/bmc-watchdog --set -a 1 -i 300
[r...@ ~]# /usr/local/sbin/bmc-watchdog -g
Timer Use:                   BIOS POST
Timer:                       Stopped
Logging:                     Enabled
Timeout Action:              Hard Reset
Pre-Timeout Interrupt:       None
Pre-Timeout Interval:        0 seconds
Timer Use BIOS FRB2 Flag:    Set
Timer Use BIOS POST Flag:    Set
Timer Use BIOS OS Load Flag: Set
Timer Use BIOS SMS/OS Flag:  Set
Timer Use BIOS OEM Flag:     Set
Initial Countdown:           300 seconds
Current Countdown:           300 seconds
[r...@ ~]# /usr/local/sbin/bmc-watchdog --logfile=/var/log/bmc-watchdog.log -d
[r...@ ~]# cat /var/log/bmc-watchdog.log
[Sep 01 01:34:39]: starting bmc-watchdog daemon
[r...@ ~]# /usr/local/sbin/bmc-watchdog -g
Timer Use:                   BIOS POST
Timer:                       Running
Logging:                     Enabled
Timeout Action:              Hard Reset
Pre-Timeout Interrupt:       None
Pre-Timeout Interval:        0 seconds
Timer Use BIOS FRB2 Flag:    Clear
Timer Use BIOS POST Flag:    Clear
Timer Use BIOS OS Load Flag: Clear
Timer Use BIOS SMS/OS Flag:  Clear
Timer Use BIOS OEM Flag:     Clear
Initial Countdown:           300 seconds
Current Countdown:           296 seconds

The Timer will be reset by bmc-watchdog daemon periodically. The default reset
period is 60 seconds.
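
To check the timer state from a script, the -g output shown above can be
treated as plain "Key: Value" lines. A minimal sketch follows; the function
names are made up for illustration, and the field layout is taken only from
the sample output in this message, so other versions may print something
different.

```python
import sys


def parse_watchdog_status(text):
    """Parse 'Key: Value' lines as printed by `bmc-watchdog -g` above."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only; values may contain spaces.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def timer_is_running(text):
    """Return True if the 'Timer' field reads 'Running'."""
    return parse_watchdog_status(text).get("Timer") == "Running"


if __name__ == "__main__":
    print("running" if timer_is_running(sys.stdin.read()) else "not running")
```

Saved under a name of your choice, it can be fed the tool's output on
standard input, e.g. by piping bmc-watchdog -g into it.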


-- 
Dipl.-Inform. Frank Steiner   Web:   http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail:  http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17           Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:   +49 89 2180-99-4049
* You can only understand recursion once you have understood recursion. *



[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines

2010-06-28 Thread Dave Love
I thought I'd replied to this before...

Albert Chu ch...@llnl.gov writes:

 Just want to clarify, you're saying this sleep(5) makes it work on the
 4100?

Yes, to the limited extent I tested it, as I'm not interested in running
it on x4100.  However, it seemed to be running on the x4500 initially,
with a similar ILOM version, so I wonder if it's reliable.

I haven't had a chance to mess around any more with it on the x4500,
unfortunately.



[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines

2010-06-28 Thread Albert Chu
Hey Dave,

Ok.  I'll put a sleep(5) like workaround-addition into bmc-watchdog.

Al

On Mon, 2010-06-28 at 08:06 -0700, Dave Love wrote:
 I thought I'd replied to this before...
 
 Albert Chu ch...@llnl.gov writes:
 
  Just want to clarify, you're saying this sleep(5) makes it work on the
  4100?
 
 Yes, to the limited extent I tested it, as I'm not interested in running
 it on x4100.  However, it seemed to be running on the x4500 initially,
 with a similar ILOM version, so I wonder if it's reliable.
 
 I haven't had a chance to mess around any more with it on the x4500,
 unfortunately.
-- 
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory




[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines

2010-06-22 Thread Dave Love
[Sorry, I thought I sent this earlier, but I'm harassed and losing
track.]

Al Chu ch...@llnl.gov writes:

 --- bmc-watchdog.c  16 Jun 2010 17:52:38 -  1.131
 +++ bmc-watchdog.c  17 Jun 2010 23:18:57 -
 @@ -1920,6 +1920,8 @@ _daemon_cmd (void)
  reset_period  (retry_wait_time * retry_attempt))
  retry_attempt = reset_period/retry_wait_time;
  
 +  sleep (5);
 +
while (shutdown_flag)
  {
struct timeval start_tv, end_tv;


That seemed to work on x4500/Solaris, but then I tried with a shorter
sleep, and now it's not working after restoring the sleep (5), but does
last a longer, variable, time than without the sleep.  I'll see if I can
figure any more out.

It does seem to work on an x4100 (not M2) with RedHat.



[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines

2010-06-22 Thread Dave Love
Al Chu ch...@llnl.gov writes:

 says ps, where I need the -D on x4100 or x4200m2 to avoid 
 
   bmc-watchdog:  Get Watchdog Timer Error: BMC Busy

 Yeah, I'm certainly unsure of that one.  Have you tried running w/o the
 IPMI kernel driver?  I'm wondering if they don't work together for you (they
 don't work together for me on some mobos there).

Spot on, thanks.  I was about to say I'm sure it wasn't installed -- I
don't normally run it due to problems on our Supermicros -- but the
driver must have been installed.  I've just checked and `service ipmi
start' on a RH5 system does trigger the failure.  It's unrelated to the
timer stopped issue, though, and I haven't noticed it causing trouble
with other freeipmi stuff, though I mostly run that remotely.



[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines

2010-06-22 Thread Albert Chu
Hey Dave,

On Mon, 2010-06-21 at 15:57 -0700, Dave Love wrote:
 [Sorry, I thought I sent this earlier, but I'm harassed and losing
 track.]
 
 Al Chu ch...@llnl.gov writes:
 
  --- bmc-watchdog.c  16 Jun 2010 17:52:38 -  1.131
  +++ bmc-watchdog.c  17 Jun 2010 23:18:57 -
  @@ -1920,6 +1920,8 @@ _daemon_cmd (void)
   reset_period  (retry_wait_time * retry_attempt))
   retry_attempt = reset_period/retry_wait_time;
   
  +  sleep (5);
  +
 while (shutdown_flag)
   {
 struct timeval start_tv, end_tv;
 
 
 That seemed to work on x4500/Solaris, but then I tried with a shorter
 sleep, and now it's not working after restoring the sleep (5), but does
 last a longer, variable, time than without the sleep.  I'll see if I can
 figure any more out.
 
 It does seem to work on an x4100 (not M2) with RedHat.

Just want to clarify, you're saying this sleep(5) makes it work on the
4100?

Al

-- 
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory




[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines

2010-06-18 Thread Al Chu
 says ps, where I need the -D on x4100 or x4200m2 to avoid 
 
   bmc-watchdog:  Get Watchdog Timer Error: BMC Busy

Yeah, I'm certainly unsure of that one.  Have you tried running w/o the
IPMI kernel driver?  I'm wondering if they don't work together for you (they
don't work together for me on some mobos there).

Al

On Thu, 2010-06-17 at 05:30 -0700, Dave Love wrote:
 Al Chu ch...@llnl.gov writes:
 
  (Naturally, I added confirmed to work around based on the assumption
  you guys can confirm it for me :-)
 
 I can't confirm it on either x4100 -- not x4100m2 -- (RedHat 5) or x4500
 (Solaris).  In each case, the daemon reports:
 
   timer stopped by another process
   stopping bmc-watchdog daemon
 
 Under Solaris, SMF restarts the daemon anyway, so it kind-of works, but
 not under RedHat.
 
 I'm running it as
 
   /usr/sbin/bmc-watchdog -d -u 4 -p 0 -a 1 -F -P -L -S -O -i 900 -e 60 -D openipmi -W ignorestateflag
 
 says ps, where I need the -D on x4100 or x4200m2 to avoid 
 
   bmc-watchdog:  Get Watchdog Timer Error: BMC Busy
 
 I think I have the latest firmware in each case -- some version of
 2.0.2.5.
 
 I wonder what's different between me and Frank.  I don't have time to
 investigate immediately, but I'll try to later.
 
 By the way, I know I'm running the right version as it accepts
 ignorestateflag.  However, bmc-watchdog doesn't accept --version, though
 it's in the --help output.
 
 By the way 2, I noticed that it's still using GPLv2, not v3.  Is that an
 oversight, or because the copyrights don't allow it?
-- 
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory




[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines

2010-06-17 Thread Dave Love
Frank Steiner fsteiner-ma...@bio.ifi.lmu.de writes:

 Seems to work fine on my X4100 M2 machines :-) Starts up, reports status
 correctly, resets etc. 

Does it actually run indefinitely, though?  I just realize I gave
incomplete info, as I see it work for a few cycles.  Here's a sample of
the logged events with -e 60:

  [Jun 17 12:41:12]: starting bmc-watchdog daemon
  [Jun 17 12:45:12]: timer stopped by another process
  [Jun 17 12:45:12]: stopping bmc-watchdog daemon
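
The run time between two such log lines can be worked out mechanically. The
sketch below assumes the "[Mon DD HH:MM:SS]: message" layout shown above and
an English locale for the month name; since the log carries no year, one is
passed in (2010 here, the year of this thread). The function names are made
up for illustration.

```python
from datetime import datetime


def parse_log_time(line, year=2010):
    """Extract the timestamp from a '[Mon DD HH:MM:SS]: message' line.

    ASSUMPTION: the year is supplied by the caller because the log
    format shown in this thread does not include one.
    """
    text = line.strip()
    stamp = text[1:text.index("]")]
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def seconds_between(first_line, last_line, year=2010):
    """Return the elapsed seconds between two log lines."""
    delta = parse_log_time(last_line, year) - parse_log_time(first_line, year)
    return delta.total_seconds()


if __name__ == "__main__":
    start = "[Jun 17 12:41:12]: starting bmc-watchdog daemon"
    stop = "[Jun 17 12:45:12]: timer stopped by another process"
    print(seconds_between(start, stop))
```

For the sample above this gives 240 seconds, i.e. four reset periods at
-e 60.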



[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines

2010-06-17 Thread Al Chu
Hey Dave,

Doh!  The --version bug was due to me not understanding a subtlety in
the argp library.  Thanks for the catch.  I've put up a new QA release
with a fix in it.

http://ftp.gluster.com/pub/freeipmi/qa-release/freeipmi-0.8.8.beta0.tar.gz

Hmm.  I don't know much about the openipmi kernel driver, but I know it
can have its own bmc-watchdog running.  Are you sure it's not running
its own watchdog out of the kernel?

I do have one other guess as to how you're seeing "timer stopped by
another process".  It's possible there is an early race, where the timer
has not yet quite started.  In bmc-watchdog/src/bmc-watchdog.c, perhaps
you can try this tiny test.

--- bmc-watchdog.c  16 Jun 2010 17:52:38 -  1.131
+++ bmc-watchdog.c  17 Jun 2010 23:18:57 -
@@ -1920,6 +1920,8 @@ _daemon_cmd (void)
 reset_period  (retry_wait_time * retry_attempt))
 retry_attempt = reset_period/retry_wait_time;
 
+  sleep (5);
+
   while (shutdown_flag)
 {
   struct timeval start_tv, end_tv;
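
If the race guess is right, an alternative to a fixed sleep (5) would be to
poll until the timer reports running, with an upper bound on the wait. This is
not what the patch above does; it is a Python sketch of the idea only, and
get_state is a hypothetical callable standing in for whatever reads the timer
state.

```python
import time


def wait_until_running(get_state, timeout=5.0, interval=0.5,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns 'Running' or the timeout expires.

    Returns True if the timer was seen running, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if get_state() == "Running":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

On firmware where the state flag never changes (the bug the ignorestateflag
workaround exists for), a check like this would simply time out, so it is no
substitute for that workaround.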

 By the way 2, I noticed that it's still using GPLv2, not v3.  Is that 
 an oversight, or because the copyrights don't allow it?

Nope, it'll be GPLv3 when I release the next major release of FreeIPMI.
Was just too lazy to back update all the headers and documents in the
current release line :P

Al

On Thu, 2010-06-17 at 05:30 -0700, Dave Love wrote:
 Al Chu ch...@llnl.gov writes:
 
  (Naturally, I added confirmed to work around based on the assumption
  you guys can confirm it for me :-)
 
 I can't confirm it on either x4100 -- not x4100m2 -- (RedHat 5) or x4500
 (Solaris).  In each case, the daemon reports:
 
   timer stopped by another process
   stopping bmc-watchdog daemon
 
 Under Solaris, SMF restarts the daemon anyway, so it kind-of works, but
 not under RedHat.
 
 I'm running it as
 
   /usr/sbin/bmc-watchdog -d -u 4 -p 0 -a 1 -F -P -L -S -O -i 900 -e 60 -D openipmi -W ignorestateflag
 
 says ps, where I need the -D on x4100 or x4200m2 to avoid 
 
   bmc-watchdog:  Get Watchdog Timer Error: BMC Busy
 
 I think I have the latest firmware in each case -- some version of
 2.0.2.5.
 
 I wonder what's different between me and Frank.  I don't have time to
 investigate immediately, but I'll try to later.
 
 By the way, I know I'm running the right version as it accepts
 ignorestateflag.  However, bmc-watchdog doesn't accept --version, though
 it's in the --help output.
 
 By the way 2, I noticed that it's still using GPLv2, not v3.  Is that an
 oversight, or because the copyrights don't allow it?
-- 
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory




[Freeipmi-devel] Re: FreeIPMI beta w/ BMC watchdog workaround for Sun machines

2010-06-16 Thread Al Chu
Hey Frank,

Great.  I'll make sure this'll get into the FreeIPMI 0.8.7 release.

Al

On Wed, 2010-06-16 at 01:32 -0700, Frank Steiner wrote:
 Al Chu wrote
 
  Hi Frank, Dave,
  
  I finally got around to trying to develop that workaround into BMC
  watchdog to get around the issue on the Sun motherboards you have.  Do
  you think you guys could try it out and make sure it works for you?
  
  http://ftp.gluster.com/pub/freeipmi/qa-release/freeipmi-0.8.7.ignorestateflag.tar.gz
  
  You have to add the '-W ignorestateflag' option to turn on the
  workaround.  You'll have to edit /etc/sysconfig/bmc-watchdog to add the
  workaround for Linux.  Not sure how you would do this on Solaris.
 
 Seems to work fine on my X4100 M2 machines :-) Starts up, reports status
 correctly, resets etc. 
 
 Btw., Sun has confirmed it's a bug that the state flag does not change.
 The 2nd level support has requested a firmware update for the ILOM, but it's
 not yet clear if it will be granted.
 
 Thanks a lot for the patch, now I can safely deploy the watchdog on all
 our servers :-)
 
 cu,
 Frank
 
-- 
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

