Re: [Ipmitool-devel] IPMI problem with FAI and wheezy

2012-11-12 Thread Steffen Grunewald
On Fri, Sep 07, 2012 at 03:02:05PM +0200, Steffen Grunewald wrote:
 Hi,
 
 I'm at my wits' end now with this old system, perhaps one of you can come
 up with another idea:
 
 The hardware is somewhat old, SuperMicro H8SSL board with IPMI card (BMC)
 looped into eth0 (Broadcom Tigon3).
 
 Excerpts from the dmesg output:
 [0.00] Linux version 3.2.0-3-amd64 (Debian 3.2.23-1) (debian-ker...@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-8) ) #1 SMP Mon Jul 23 02:45:17 UTC 2012
 [0.00] ACPI: FACP 7ffe0290 000F4 (v03 A M I  OEMFACP  12000606 MSFT 0097)
 [0.00] ACPI: DSDT 7ffe0410 033A8 (v01  0ABSW 0ABSW005 0005 INTL 02002026)
 [0.884954] tg3 :02:03.0: eth0: Tigon3 [partno(BCM95704A6) rev 2100] (PCIX:133MHz:64-bit) MAC address xx:xx:xx:xx:xx:xx
 
 I used to set console=ttyS1,19200n1 in the pxelinux.cfg file and watch
 FAI running via serial-over-LAN, but the output now stops right at the
 beginning, and the IPMI card cannot be reached afterwards, neither by
 rebooting nor by other tricks. The only way to get the connection back
 is power-cycling the whole box.
 
 This behaviour did not show up with Squeeze (2.6.32-5 kernel).
 
 I'm suspecting a change in the handling of the eth0/BMC bridge by the tg3
 driver, but that's only part of the story: it gets worse.

Actually, the problem has gone away with the latest kernel now available
for Wheezy (3.2.32, compared to 3.2.23 before).

S

___
Ipmitool-devel mailing list
Ipmitool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ipmitool-devel


Re: [Ipmitool-devel] IPMI problem with FAI and wheezy

2012-09-11 Thread Steffen Grunewald
On Mon, Sep 10, 2012 at 10:40:01AM -0700, Albert Chu wrote:
 On Mon, 2012-09-10 at 10:32 -0700, Andy Cress wrote:
  For this symptom:
   Trying to shut down the machine (actually, a whole set of machines,
   all behaving the same, so it's not a single fault), by running
   shutdown -h now, will not halt but reboot it.
   The only way to reliably switch it off seems to be to run ipmitool
   chassis power soft, then shutdown -h now.
   The machine will then stay off for exactly 24 hours, then magically
   restart.
  
  It sounds to me like someone is doing one of these every 24 hours:
    * sending a Wake-On-LAN magic packet to eth0
    * sending an IPMI LAN chassis control power on command.

Since it happens at random times, and the box affected had been disconnected
from mains power, *and* the BMC is not reachable from the network side, I
can exclude both of these. (If it were a WOL packet, other nodes would be
affected too. If it were a chassis power on command, it must have been sent
from somewhere that has access to the BMC - certainly not a powered-down
mainboard.)
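
For anyone who wants to double-check the Wake-on-LAN theory from the host
side, the active WOL mode of eth0 can be read from ethtool. The following is
only a sketch: it assumes ethtool is installed and prints its usual
"Wake-on:" line, and the function name wol_mode is made up for illustration.

```shell
#!/bin/sh
# Extract the active Wake-on-LAN mode from `ethtool <iface>` output on stdin.
# Prints e.g. "g" (magic packet) or "d" (disabled); prints nothing if no
# "Wake-on:" line is present. The "Supports Wake-on:" line is ignored.
wol_mode() {
    sed -n 's/^[[:space:]]*Wake-on:[[:space:]]*//p'
}

# Usage on a real machine (typically needs root):
#   ethtool eth0 | wol_mode
# To switch WOL off for a test:
#   ethtool -s eth0 wol d
```

If this prints "d" and the box still comes back on its own, a magic packet
is an unlikely cause.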

 The get system restart cause IPMI command might be useful for
 debugging these possibilities.  In ipmitool I believe it's the chassis
 restart_cause command.

# ipmitool chassis restart_cause
System restart cause: unknown

Not very helpful, I guess...
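
When the restart cause comes back as "unknown", the chassis status, the
watchdog timer state and the event log may still hold a hint. A rough
sketch follows, assuming a local ipmitool with a working BMC interface. The
function names are hypothetical, and an old BMC may not answer every one of
these commands.

```shell
#!/bin/sh
# Return success (0) if the given `ipmitool chassis restart_cause` output
# names a concrete cause, failure (1) if it is "unknown" or empty.
restart_cause_known() {
    cause=$(printf '%s\n' "$1" | sed -n 's/^System restart cause:[[:space:]]*//p')
    [ -n "$cause" ] && [ "$cause" != "unknown" ]
}

# Collect further BMC state when the restart cause is not informative.
# Every command here talks to the local BMC and typically needs root.
dump_bmc_state() {
    ipmitool chassis status      # power state and last power event
    ipmitool mc watchdog get     # is a watchdog timer armed?
    ipmitool sel elist           # event log entries with timestamps
}

# Usage on a real machine:
#   restart_cause_known "$(ipmitool chassis restart_cause)" || dump_bmc_state
```

An armed watchdog or a timestamped power event in the log would at least
narrow down where the 24-hour restart comes from.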

Thanks for your ideas, anything else you can imagine?

S



Re: [Ipmitool-devel] IPMI problem with FAI and wheezy

2012-09-10 Thread Zdenek Styblik
On Mon, Sep 10, 2012 at 3:37 PM, Steffen Grunewald
steffen.grunew...@aei.mpg.de wrote:
 On Fri, Sep 07, 2012 at 08:02:32AM -0700, Andy Cress wrote:
 Steffen,

 Sounds like a firmware bug to me.  Is there a later firmware version for
 this board?

 Nothing I'm aware of - as I said, those boxen are 6 years old now.

 Looking for an explanation of the IPMI behaviour, I found that chassis
 power soft is connected with a sysctl named IPMI_CHASSIS_CTL_ACPI_SOFT,
 and Debian Wheezy's kernel doesn't have any /proc/acpi structure anymore
 (and supposedly, some other acpi functionality probably has moved as well)
 - that's why the expected shutdown doesn't happen... Why there's an
 alarm being set that wakes up the machine after 24 hours, that's
 still unknown, probably there's a date-less clock in the BMC? (Cutting
 power, and re-connecting to mains, doesn't change the behaviour.)

 With Squeeze kernels, everything worked. I wouldn't expect buggy
 (or old) firmware to interact with kernels in such a way, and am
 suspecting a bug in the tg3 driver instead :( Got to UTS, I guess.


Have you considered the possibility that this might be a kernel bug? Have
you tried a vanilla kernel instead of the Debian stock kernel? Or compared
kernel configs to guess/bisect the problem? Maybe it's just some missing
kernel feature that wasn't compiled in.
Things do get broken, even in the kernel.
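
A quick way to compare the two kernel configs could look like the sketch
below. It assumes both config files are available as plain text (Debian
normally installs them as /boot/config-<version>); the function name
config_diff and the example file names are assumptions, not taken from the
affected machines.

```shell
#!/bin/sh
# Show kernel config options that differ between two config files,
# optionally restricted to options matching an extended regex.
# Lines prefixed "<" come from the first file, ">" from the second.
config_diff() {
    old=$1
    new=$2
    pattern=${3:-.}
    tmp_old=$(mktemp) || return 2
    tmp_new=$(mktemp) || return 2
    # Keep set options and explicit "is not set" comments, sorted for diff.
    grep -E '^(CONFIG_|# CONFIG_)' "$old" | grep -E "$pattern" | sort > "$tmp_old"
    grep -E '^(CONFIG_|# CONFIG_)' "$new" | grep -E "$pattern" | sort > "$tmp_new"
    diff "$tmp_old" "$tmp_new"
    status=$?
    rm -f "$tmp_old" "$tmp_new"
    return "$status"
}

# Usage (file names are assumed, adjust to what is in /boot):
#   config_diff /boot/config-2.6.32-5-amd64 /boot/config-3.2.0-3-amd64 'TIGON3|IPMI|ACPI'
```

The return value follows diff: 0 if the filtered options are identical,
1 if they differ.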

Regards,
Z.

 -Original Message-
 From: Steffen Grunewald [mailto:steffen.grunew...@aei.mpg.de]
 Sent: Friday, September 07, 2012 9:02 AM
 To: FAI mailing list
 Cc: ipmitool developers list
 Subject: [Ipmitool-devel] IPMI problem with FAI and wheezy

 Hi,

 I'm at my wits' end now with this old system, perhaps one of you can come
 up with another idea:

 The hardware is somewhat old, SuperMicro H8SSL board with IPMI card (BMC)
 looped into eth0 (Broadcom Tigon3).

 Excerpts from the dmesg output:
 [0.00] Linux version 3.2.0-3-amd64 (Debian 3.2.23-1) (debian-ker...@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-8) ) #1 SMP Mon Jul 23 02:45:17 UTC 2012
 [0.00] ACPI: FACP 7ffe0290 000F4 (v03 A M I  OEMFACP 12000606 MSFT 0097)
 [0.00] ACPI: DSDT 7ffe0410 033A8 (v01  0ABSW 0ABSW005 0005 INTL 02002026)
 [0.884954] tg3 :02:03.0: eth0: Tigon3 [partno(BCM95704A6) rev 2100] (PCIX:133MHz:64-bit) MAC address xx:xx:xx:xx:xx:xx

 I used to set console=ttyS1,19200n1 in the pxelinux.cfg file and watch
 FAI running via serial-over-LAN, but the output now stops right at the
 beginning, and the IPMI card cannot be reached afterwards, neither by
 rebooting nor by other tricks. The only way to get the connection back
 is power-cycling the whole box.

 This behaviour did not show up with Squeeze (2.6.32-5 kernel).

 I'm suspecting a change in the handling of the eth0/BMC bridge by the tg3
 driver, but that's only part of the story: it gets worse.

 Trying to shut down the machine (actually, a whole set of machines, all
 behaving the same, so it's not a single fault) by running shutdown -h now
 will not halt but reboot it.
 The only way to reliably switch it off seems to be to run ipmitool chassis
 power soft, then shutdown -h now.
 The machine will then stay off for exactly 24 hours, then magically
 restart.

 Needless to say, I didn't change any BIOS settings, nor did I implement
 any kind of watchdog on the BMC.

 Is there anything I can do to nail down the problem?

 Thank you in advance for your suggestions.

 Steffen

 
 --
 Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
 Cluster Admin * - * http://www.aei.mpg.de/
 * e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7274,fax:7298}


Re: [Ipmitool-devel] IPMI problem with FAI and wheezy

2012-09-10 Thread Albert Chu
On Mon, 2012-09-10 at 10:32 -0700, Andy Cress wrote:
 For this symptom:
  Trying to shut down the machine (actually, a whole set of machines,
  all behaving the same, so it's not a single fault), by running
  shutdown -h now, will not halt but reboot it.
  The only way to reliably switch it off seems to be to run ipmitool
  chassis power soft, then shutdown -h now.
  The machine will then stay off for exactly 24 hours, then magically
  restart.
 
 It sounds to me like someone is doing one of these every 24 hours:
   * sending a Wake-On-LAN magic packet to eth0
   * sending an IPMI LAN chassis control power on command.

The get system restart cause IPMI command might be useful for
debugging these possibilities.  In ipmitool I believe it's the chassis
restart_cause command.

Al

 Andy
 
 
-- 
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory





Re: [Ipmitool-devel] IPMI problem with FAI and wheezy

2012-09-07 Thread Andy Cress
Steffen,

Sounds like a firmware bug to me.  Is there a later firmware version for
this board?

Andy


