Re: [E1000-devel] Possible to stop external IPMI/BMC access (port 623) by bringing iface up?

2011-09-28 Thread Carsten Aulbert
Hi Jesse

On Thursday 29 September 2011 01:12:19 Jesse Brandeburg wrote:
> This is probably the driver touching a register that prevents IPMI
> traffic from flowing to the BMC. It may be a patch Debian made that
> broke it; I don't generally track Debian's forks of the kernel. :-)
> 

At least I can rule that one out, as it's a self-compiled 2.6.32.28 or
2.6.32.46 kernel, and I'm using exactly the same binary kernel package.

> Can you send the output of the ethregs tool before and after bringing the interface down?
> 
> ethregs is available on e1000.sf.net in the downloads area.

Attached tarball with 4 files

ethregs.{lenny,squeeze}.{down,up}

which should hopefully cover all four cases well enough.
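A quick way to read the four dumps is to compare them register by register and list only what changes between down and up, since Jesse's theory is that the driver writes a register on ifup that stops IPMI traffic. A minimal sketch in C, assuming each ethregs dump has already been parsed into name/value pairs (the parser is left out because ethregs' exact output format is not shown here); `struct reg_value` and `diff_regs` are hypothetical names, not part of ethregs.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One register read from a dump (hypothetical parsed form). */
struct reg_value {
    const char *name;
    uint32_t value;
};

/* Print every register present in both snapshots whose value changed and
 * return how many did. Pass out == NULL to count without printing. */
size_t diff_regs(const struct reg_value *down, size_t n_down,
                 const struct reg_value *up, size_t n_up, FILE *out)
{
    size_t changed = 0;

    assert((down != NULL || n_down == 0) && (up != NULL || n_up == 0));
    for (size_t i = 0; i < n_down; i++) {
        for (size_t j = 0; j < n_up; j++) {
            if (strcmp(down[i].name, up[j].name) != 0)
                continue;
            if (down[i].value != up[j].value) {
                if (out != NULL)
                    fprintf(out, "%-10s %08" PRIx32 " -> %08" PRIx32 "\n",
                            down[i].name, down[i].value, up[j].value);
                changed++;
            }
            break; /* matched this name; move on to the next register */
        }
    }
    return changed;
}
```

Called as `diff_regs(down, n_down, up, n_up, stdout)` on the parsed lenny pair and again on the squeeze pair, the registers that change on ifup under Squeeze but not under Lenny would be the first candidates to look at. The nested loop is quadratic, which is fine for a few hundred registers.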

Just in case you need it:

lenny:
n1570:/tmp/ethregs-1.13.0# modinfo e1000e
filename:       /lib/modules/2.6.32.28-atlas-generic/kernel/drivers/net/e1000e/e1000e.ko
version:        1.0.2-k2
license:        GPL
description:    Intel(R) PRO/1000 Network Driver
author:         Intel Corporation,
srcversion:     AF3F52EBD9A435E0A141B19
[...]

squeeze:
root@n1670:~# modinfo e1000e
filename:       /lib/modules/2.6.32.28-atlas-generic/kernel/drivers/net/e1000e/e1000e.ko
version:        1.0.2-k2
license:        GPL
description:    Intel(R) PRO/1000 Network Driver
author:         Intel Corporation,
srcversion:     AF3F52EBD9A435E0A141B19

HTH

Carsten
--
All the data continuously generated in your IT infrastructure contains a
definitive record of customers, application performance, security
threats, fraudulent activity and more. Splunk takes this data and makes
sense of it. Business sense. IT sense. Common sense.
http://p.sf.net/sfu/splunk-d2dcopy1___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired


Re: [E1000-devel] Possible to stop external IPMI/BMC access (port 623) by bringing iface up?

2011-09-28 Thread Bokhan Artem
On 29.09.2011 1:23, Carsten Aulbert wrote:
> But now we have reinstalled several machines with Debian Squeeze, and suddenly we
> can only query the BMC while eth0 is down.

Same here with Ubuntu 10.04 LTS (2.6.32) and the latest e1000 drivers from
sf.net. We have some hardware with AOC-SIMSO IPMI modules.



Re: [E1000-devel] [BUG?] e1000e Detected Hardware Unit Hang

2011-09-28 Thread joeyli
On Thu, 2011-09-29 at 13:37 +0800, Dave Young wrote:
> On 09/29/2011 01:28 PM, joeyli wrote:
> 
> > Hi Dave, 
> > 
> > On Thu, 2011-09-29 at 10:41 +0800, Dave Young wrote:
> >> Hi,
> >>
> >> After suspend to RAM and resume, I got the messages below (full dmesg attached):
> >>
> >> [106900.343520] e1000e 0000:00:19.0: eth0: Detected Hardware Unit Hang:
> >> [106900.343521]   TDH  <1>
> >> [106900.343522]   TDT  <2>
> >> [106900.343523]   next_to_use  <2>
> >> [106900.343523]   next_to_clean<1>
> >> [106900.343524] buffer_info[next_to_clean]:
> >> [106900.343525]   time_stamp   <101e7f773>
> >> [106900.343526]   next_to_watch<1>
> >> [106900.343526]   jiffies  <101e7fa4a>
> >> [106900.343527]   next_to_watch.status <0>
> >> [106900.343528] MAC Status <80683>
> >> [106900.343529] PHY Status <796d>
> >> [106900.343529] PHY 1000BASE-T Status  <3800>
> >> [106900.343530] PHY Extended Status<3000>
> >> [106900.343531] PCI Status <10>
> >> [106902.342904] e1000e 0000:00:19.0: eth0: Detected Hardware Unit Hang:
> >> [106902.342905]   TDH  <1>
> >> [106902.342906]   TDT  <2>
> >> [106902.342907]   next_to_use  <2>
> >> [106902.342907]   next_to_clean<1>
> >> [106902.342908] buffer_info[next_to_clean]:
> >> [106902.342909]   time_stamp   <101e7f773>
> >> [106902.342909]   next_to_watch<1>
> >> [106902.342910]   jiffies  <101e7fca2>
> >> [106902.342911]   next_to_watch.status <0>
> >> [106902.342912] MAC Status <80683>
> >> [106902.342912] PHY Status <796d>
> >> [106902.342913] PHY 1000BASE-T Status  <3800>
> >> [106902.342914] PHY Extended Status<3000>
> >> [106902.342915] PCI Status <10>
> >> [106903.349326] ------------[ cut here ]------------
> >> [106903.349336] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xeb/0x14b()
> >> [106903.349339] Hardware name: OptiPlex 760
> >> [106903.349342] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
> >> [106903.349344] Modules linked in: cdc_ether usbnet mii tun kvm_intel
> >> kvm snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device
> >> snd_pcm_oss snd_mixer_oss fuse snd_hda_codec_analog snd_hda_intel
> >> snd_hda_codec snd_hwdep snd_pcm radeon snd_timer snd_page_alloc ttm
> >> dell_wmi sparse_keymap wmi
> >> [106903.349379] Pid: 0, comm: swapper Not tainted 3.1.0-rc6+ #202
> >> [106903.349382] Call Trace:
> >> [106903.349384][] warn_slowpath_common+0x80/0x98
> >> [106903.349394]  [] warn_slowpath_fmt+0x41/0x43
> >> [106903.349399]  [] dev_watchdog+0xeb/0x14b
> >> [106903.349404]  [] run_timer_softirq+0x217/0x300
> >> [106903.349408]  [] ? run_timer_softirq+0x184/0x300
> >> [106903.349413]  [] ? netif_tx_unlock+0x51/0x51
> >> [106903.349419]  [] __do_softirq+0xe2/0x1bc
> >> [106903.349424]  [] ? paravirt_read_tsc+0x9/0xd
> >> [106903.349428]  [] ? sched_clock+0x9/0xd
> >> [106903.349434]  [] call_softirq+0x1c/0x30
> >> [106903.349438]  [] do_softirq+0x46/0x9c
> >> [106903.349442]  [] irq_exit+0x5b/0xbe
> >> [106903.349446]  [] do_IRQ+0x89/0xa0
> >> [106903.349452]  [] common_interrupt+0x73/0x73
> >> [106903.349454][] ? mwait_idle+0x8a/0xc1
> >> [106903.349462]  [] ? mwait_idle+0x81/0xc1
> >> [106903.349467]  [] cpu_idle+0xb3/0xd5
> >> [106903.349473]  [] rest_init+0xb2/0xb9
> >> [106903.349477]  [] ? csum_partial_copy_generic+0x16c/0x16c
> >> [106903.349483]  [] start_kernel+0x390/0x39b
> >> [106903.349487]  [] x86_64_start_reservations+0xb6/0xba
> >> [106903.349491]  [] x86_64_start_kernel+0x101/0x110
> >> [106903.349495] ---[ end trace 1d36d9ed335e092c ]---
> >> [106903.349722] e1000e 0000:00:19.0: eth0: Reset adapter
> >>
> > 
> > What's your kernel version?
> 
> 
> My info:
> 
> bash-4.1$ uname -a
> Linux darkstar 3.1.0-rc6+ #202 SMP Tue Sep 20 12:55:02 HKT 2011 x86_64
> Intel(R) Core(TM)2 Quad CPU Q9400  @ 2.66GHz GenuineIntel GNU/Linux
> 
> bash-4.1$ lspci|grep Ethernet
> 00:19.0 Ethernet controller: Intel Corporation 82567LM-3 Gigabit Network Connection (rev 02)
> 

My PCI info:
06:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)

> > 
> > My machine also has the same problem, but it doesn't need a suspend/resume;
> > I just need to boot and wait!
> > Sometimes I have to wait 35-45 minutes, but sometimes only 3
> > minutes.
> > 
> > My kernel version is v3.0.
> > 
> > 


Thanks
Joey Lee
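The hang report above can be checked with a little arithmetic. A minimal sketch in C, assuming the usual Intel descriptor-ring convention (TDH is the head the hardware fetches from, TDT the tail the driver advances) and that the jiffies and time_stamp values are printed in hex, as they appear to be; the helper names are hypothetical and not taken from e1000e, and the ring size of 256 used below is only an example.

```c
#include <assert.h>
#include <stdint.h>

/* Descriptors written by the driver (tail, TDT) that the hardware has not
 * yet consumed (head, TDH), on a ring of ring_size entries. */
unsigned tx_pending(unsigned tdh, unsigned tdt, unsigned ring_size)
{
    assert(ring_size > 0 && tdh < ring_size && tdt < ring_size);
    return (tdt + ring_size - tdh) % ring_size;
}

/* Age in jiffies of the buffer at next_to_clean. Unsigned subtraction
 * stays correct across a counter wrap. */
uint64_t stuck_jiffies(uint64_t jiffies_now, uint64_t time_stamp)
{
    return jiffies_now - time_stamp;
}

/* Tick rate implied by two reports: jiffies advanced per second of
 * printk time. */
double implied_hz(uint64_t j1, double t1, uint64_t j2, double t2)
{
    assert(t2 > t1);
    return (double)(j2 - j1) / (t2 - t1);
}
```

With the logged values, tx_pending(1, 2, 256) is 1 (any ring larger than two entries gives the same), stuck_jiffies(0x101e7fa4a, 0x101e7f773) is 727 and stuck_jiffies(0x101e7fca2, 0x101e7f773) is 1327. The two reports are 600 jiffies and about 2.0 s apart, which implies HZ=300 on this kernel (an inference from the log, not something stated in the thread), so the same descriptor had been pending for roughly 2.4 s at the first report and 4.4 s at the second.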



Re: [E1000-devel] [BUG?] e1000e Detected Hardware Unit Hang

2011-09-28 Thread Dave Young
On 09/29/2011 01:28 PM, joeyli wrote:

> Hi Dave, 
> 
> On Thu, 2011-09-29 at 10:41 +0800, Dave Young wrote:
>> Hi,
>>
>> After suspend to RAM and resume, I got the messages below (full dmesg attached):
>>
>> [...]
>>
> 
> What's your kernel version?


My info:

bash-4.1$ uname -a
Linux darkstar 3.1.0-rc6+ #202 SMP Tue Sep 20 12:55:02 HKT 2011 x86_64
Intel(R) Core(TM)2 Quad CPU Q9400  @ 2.66GHz GenuineIntel GNU/Linux

bash-4.1$ lspci|grep Ethernet
00:19.0 Ethernet controller: Intel Corporation 82567LM-3 Gigabit Network Connection (rev 02)

> 
> My machine also has the same problem, but it doesn't need a suspend/resume;
> I just need to boot and wait!
> Sometimes I have to wait 35-45 minutes, but sometimes only 3
> minutes.
> 
> My kernel version is v3.0.
> 
> 
> Thanks
> Joey Lee
> 



-- 
Thanks
Dave



Re: [E1000-devel] [BUG?] e1000e Detected Hardware Unit Hang

2011-09-28 Thread joeyli
Hi Dave, 

On Thu, 2011-09-29 at 10:41 +0800, Dave Young wrote:
> Hi,
> 
> After suspend to RAM and resume, I got the messages below (full dmesg attached):
> 
> [...]
> 

What's your kernel version?

My machine also has the same problem, but it doesn't need a suspend/resume;
I just need to boot and wait!
Sometimes I have to wait 35-45 minutes, but sometimes only 3
minutes.

My kernel version is v3.0.


Thanks
Joey Lee




Re: [E1000-devel] Possible to stop external IPMI/BMC access (port 623) by bringing iface up?

2011-09-28 Thread Jesse Brandeburg
On Wed, 28 Sep 2011 11:23:50 -0700
Carsten Aulbert  wrote:
> But now we have reinstalled several machines with Debian Squeeze, and
> suddenly we can only query the BMC while eth0 is down. The kernel we
> use is exactly the same (currently 2.6.32.28 or 2.6.32.46), i.e. the same
> binary .deb package and the same config; only the userland has changed.

This is probably the driver touching a register that prevents IPMI
traffic from flowing to the BMC. It may be a patch Debian made that
broke it; I don't generally track Debian's forks of the kernel. :-)

Can you send the output of the ethregs tool before and after bringing the interface down?

ethregs is available on e1000.sf.net in the downloads area.
 




Re: [E1000-devel] 82574 DMA Burst Mode Enablement

2011-09-28 Thread Jesse Brandeburg
On Wed, 28 Sep 2011 11:39:54 -0700
Denis Radovanovic  wrote:
> We are currently testing small-packet performance on the 82574, comparing
> it to the 82571. Initial pktgen measurements show a significant
> performance difference, most visible when running bidirectional
> traffic with 256-byte packets.
> 
> Looking at the e1000e driver, we noticed that the FLAG2_DMA_BURST flag is
> enabled for the 82571 and 82572 but not for the 82574. After
> enabling the flag, 82574 performance improved significantly,
> approaching that of the 82571.

At the time the feature was implemented, we didn't have the bandwidth to
validate it on parts other than the 82571/82572.

As it stands, yes, you can enable it, but you will likely run into some
bugs that we already know about but have not fully fixed in the code.
The bugs might result in Tx hangs or other issues. I do agree that there
are significant performance gains to be had from this feature, if the
bugs can all be worked out.

If this is a feature that you would really like implemented, please use
your Intel Field Agent or TME contacts to document your requirement
so we can consider it for future releases.

Thanks,
  Jesse



[E1000-devel] 82574 DMA Burst Mode Enablement

2011-09-28 Thread Denis Radovanovic
Hi,

We are currently testing small-packet performance on the 82574, comparing it 
to the 82571. Initial pktgen measurements show a significant performance 
difference, most visible when running bidirectional traffic with 256-byte 
packets.

Looking at the e1000e driver, we noticed that the FLAG2_DMA_BURST flag is 
enabled for the 82571 and 82572 but not for the 82574. After enabling the 
flag, 82574 performance improved significantly, approaching that of the 82571.
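What is described here is a per-part feature bit: the driver keeps capability flags for each MAC type, and the DMA burst behaviour is only used where FLAG2_DMA_BURST is set. The following is a simplified model of that gating, not the e1000e code; the enum, the table, and the `MODEL_FLAG2_DMA_BURST` bit value are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in bit; not the real FLAG2_DMA_BURST value. */
#define MODEL_FLAG2_DMA_BURST (1u << 0)

enum mac_part { PART_82571, PART_82572, PART_82574, PART_COUNT };

/* Per-part second feature word, indexed by part. As described in the
 * thread, only the 82571 and 82572 have the DMA burst bit set. */
static const uint32_t part_flags2[PART_COUNT] = {
    [PART_82571] = MODEL_FLAG2_DMA_BURST,
    [PART_82572] = MODEL_FLAG2_DMA_BURST,
    [PART_82574] = 0,
};

/* True if the model enables DMA burst for this part. */
bool dma_burst_enabled(enum mac_part part)
{
    assert(part < PART_COUNT);
    return (part_flags2[part] & MODEL_FLAG2_DMA_BURST) != 0;
}
```

In this model, the experiment amounts to setting the bit in the `[PART_82574]` entry. Jesse's reply explains why the real driver did not do that by default: the feature had only been validated on the 82571/82572.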

Is there a reason that this flag is not enabled for the 82574?

Thank you,
Denis Radovanovic


[E1000-devel] Possible to stop external IPMI/BMC access (port 623) by bringing iface up?

2011-09-28 Thread Carsten Aulbert
Hi all

This is a shot in the dark, but I thought an expert here might 
recognize this issue.

We use many Supermicro PDSML-LN2+ based servers where the baseboard management 
controller (BMC/IPMI) shares the physical network connection with the system's 
eth0 device. lspci shows:

0d:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
0e:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller

The 82573E controller is the one shared with the BMC. Under Debian Lenny we 
have not encountered any problems with remote access to these systems, e.g.

ipmitool -U USER -P PASSWORD -I lan -H IPMI_IP power status

works. In our setup, eth0, eth1 and the BMC have distinct IP addresses, 
but eth0 and the BMC share one MAC address, e.g.

n1570:~# ifconfig
eth0  Link encap:Ethernet  HWaddr 00:30:48:99:97:3a  
  inet addr:172.26.15.70  Bcast:172.31.255.255  Mask:255.240.0.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:39289518 errors:0 dropped:0 overruns:0 frame:0
  TX packets:28656886 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:100 
  RX bytes:3059622083 (2.8 GiB)  TX bytes:2912754831 (2.7 GiB)
  Memory:ee100000-ee120000 

eth1  Link encap:Ethernet  HWaddr 00:30:48:99:97:3b  
  inet addr:10.10.15.70  Bcast:10.255.255.255  Mask:255.0.0.0
  UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
  RX packets:733539842 errors:0 dropped:11890044 overruns:0 frame:0
  TX packets:373277670 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:3467288935387 (3.1 TiB)  TX bytes:350737301109 (326.6 GiB)
  Memory:ee200000-ee220000 

and
n1570:~# ipmitool lan print
Set in Progress         : Set Complete
Auth Type Support       : NONE MD2 MD5 PASSWORD
Auth Type Enable        : Callback : MD2 MD5 PASSWORD
                        : User     : MD2 MD5 PASSWORD
                        : Operator : MD2 MD5 PASSWORD
                        : Admin    : MD5 PASSWORD
                        : OEM      : MD2 MD5 PASSWORD
IP Address Source       : Static Address
IP Address              : 172.27.15.70
Subnet Mask             : 255.240.0.0
MAC Address             : 00:30:48:99:97:3a
SNMP Community String   : public
IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
BMC ARP Control         : ARP Responses Enabled, Gratuitous ARP Disabled
Gratituous ARP Intrvl   : 2.0 seconds
Default Gateway IP      : 0.0.0.0
Default Gateway MAC     : 00:00:00:00:00:00
Backup Gateway IP       : 0.0.0.0
Backup Gateway MAC      : 00:00:00:00:00:00
802.1q VLAN ID          : Disabled
802.1q VLAN Priority    : 0
RMCP+ Cipher Suites     : 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
Cipher Suite Priv Max   : Xaa
                        :     X=Cipher Suite Unused
                        :     c=CALLBACK
                        :     u=USER
                        :     o=OPERATOR
                        :     a=ADMIN
                        :     O=OEM
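To check whether the BMC answers on UDP port 623 without involving IPMI session setup, one option is a bare RMCP/ASF presence ping. A minimal sketch in C, assuming the presence-ping layout from the DMTF ASF specification (RMCP version 0x06, sequence 0xff, class ASF 0x06, IANA enterprise number 4542, message type 0x80); `build_presence_ping` and `bmc_answers_ping` are hypothetical names, and whether a given BMC replies to presence pings at all depends on its firmware.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define RMCP_PORT 623 /* RMCP / IPMI-over-LAN */
#define PING_LEN 12   /* 4-byte RMCP header + 8-byte ASF header, no data */

/* Fill buf with an ASF presence ping and return its length. */
size_t build_presence_ping(uint8_t buf[PING_LEN], uint8_t tag)
{
    static const uint8_t tmpl[PING_LEN] = {
        0x06, 0x00, 0xff, 0x06, /* RMCP v1.0, reserved, seq 0xff (no ACK), class ASF */
        0x00, 0x00, 0x11, 0xbe, /* IANA enterprise number 4542, big-endian */
        0x80, 0x00, 0x00, 0x00, /* presence ping, tag, reserved, data length 0 */
    };

    memcpy(buf, tmpl, PING_LEN);
    buf[9] = tag; /* echoed back in the presence pong */
    return PING_LEN;
}

/* Send one ping to bmc_ip:623. Returns 1 if any datagram comes back within
 * timeout_sec, 0 on timeout or receive error, -1 if setup or send fails. */
int bmc_answers_ping(const char *bmc_ip, int timeout_sec)
{
    uint8_t req[PING_LEN], resp[64];
    struct sockaddr_in sa;
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

    assert(bmc_ip != NULL && timeout_sec > 0);
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(RMCP_PORT);
    if (inet_pton(AF_INET, bmc_ip, &sa.sin_addr) != 1)
        return -1;

    size_t len = build_presence_ping(req, 0x01);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0 ||
        sendto(fd, req, len, 0, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(fd);
        return -1;
    }
    ssize_t n = recv(fd, resp, sizeof resp, 0);
    close(fd);
    return n > 0 ? 1 : 0;
}
```

Running `bmc_answers_ping("172.27.15.70", 2)` from another host, once with eth0 down and once with it up, would show whether the BMC stops answering at the RMCP layer, independent of ipmitool's authentication and session handling.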

But now we have reinstalled several machines with Debian Squeeze, and suddenly 
we can only query the BMC while eth0 is down. The kernel we use is exactly the 
same (currently 2.6.32.28 or 2.6.32.46), i.e. the same binary .deb package and 
the same config; only the userland has changed.

No service is listening on port 623 (used by IPMI), so there should not be 
any interference. strace shows some differences when running ifconfig eth0 
up (both traces attached). From my point of view, the only significant difference is

socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4
[...]
ioctl(4, SIOCGIFFLAGS, {ifr_name="eth0", ifr_flags=IFF_BROADCAST|IFF_MULTICAST}) = 0

with Lenny, versus this with Squeeze:

socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4
[...]
ioctl(4, SIOCGIFFLAGS, {ifr_name="eth0", ifr_flags=IFF_BROADCAST|IFF_MULTICAST}) = 0
ioctl(4, SIOCSIFFLAGS, {ifr_name="eth0", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_MULTICAST}) = 0

Could this be the problem?
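In the excerpts, the visible difference is whether ifconfig writes the flags back. A minimal sketch of that read-modify-write in C, mirroring the Squeeze trace: fetch the flags with SIOCGIFFLAGS, OR in IFF_UP|IFF_RUNNING, and store them with SIOCSIFFLAGS. `format_if_flags` reproduces strace's rendering of the four flags seen here for comparison; both function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <net/if.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Render the four flags seen in the traces in strace's "A|B|C" style. */
void format_if_flags(short flags, char *buf, size_t len)
{
    static const struct { short bit; const char *name; } names[] = {
        { IFF_UP, "IFF_UP" },
        { IFF_BROADCAST, "IFF_BROADCAST" },
        { IFF_RUNNING, "IFF_RUNNING" },
        { IFF_MULTICAST, "IFF_MULTICAST" },
    };

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (!(flags & names[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, names[i].name, len - strlen(buf) - 1);
    }
}

/* Read-modify-write of the interface flags, as in the Squeeze trace.
 * Needs CAP_NET_ADMIN. Returns 0 on success, -1 on error. */
int set_iface_up(const char *ifname)
{
    struct ifreq ifr;
    int rc = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&ifr, 0, sizeof ifr);
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    if (ioctl(fd, SIOCGIFFLAGS, &ifr) == 0) {  /* current flags */
        ifr.ifr_flags |= IFF_UP | IFF_RUNNING; /* what ifconfig "up" sets */
        rc = ioctl(fd, SIOCSIFFLAGS, &ifr);    /* if it was down, the kernel opens it */
    }
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

Setting IFF_UP is what makes the kernel invoke the driver's open routine, which is where e1000e (re)initializes the hardware; that would be consistent with a register write on ifup cutting off the BMC, though the traces alone do not prove it.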

Thanks for any suggestions!

Cheers

Carsten




-- 
Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
Callinstrasse 38, 30167 Hannover, Germany
Phone/Fax: +49 511 762-17185 / -17193
http://www.top500.org/system/9234 | http://www.top500.org/connfam/6
CaCert Assurer | Get free certificates from http://www.cacert.org/
execve("/sbin/ifconfig", ["ifconfig", "eth0", "up"], [/* 19 vars */]) = 0
brk(0)  = 0x11c
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe0ceca1000
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe0cec9f000
access("/etc/ld.so.preload", R_OK)  = -1 ENOENT (No such file or directory)
open("/opt/vdt/globus/lib/tls/x86_64/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/opt/vdt/globus/lib/tls/x86_64",