[ofa-general] Re: [Bug 465] IPoIB CM HA fails after several hours of failovers

2007-04-10 Thread Michael S. Tsirkin
> I've tried this with RHEL4 U3 x86_64 LionMini SDR, SLES10 x86_64 LionCub DDR,
> and RHEL4 U3 x86_64 LionMini DDR so far.

You reported an oops. On which OS/HW/FW did you observe it?

-- 
MST
___
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[ofa-general] Re: [Bug 465] IPoIB CM HA fails after several hours of failovers

2007-04-10 Thread Michael S. Tsirkin
Scott, pls provide data about the crash as requested by previous
comments (note you don't have to reproduce it to provide that data).

-- 
MST


[ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failovers

2007-04-10 Thread Scott Weitzenkamp (sweitzen)
Yes, I'm toggling one port at a time, among 4 ports (2 for each host).

I've only seen a crash once; every other time, IPoIB CM just stops
working.


[ofa-general] Re: [Bug 465] IPoIB CM HA fails after several hours of failovers

2007-04-10 Thread Michael S. Tsirkin
Is that what you are doing?

I can't easily do this (it requires a third system),
and I can't see how toggling ports connected to host A
would crash host B.

-- 
MST


[ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failovers

2007-04-10 Thread Scott Weitzenkamp (sweitzen)
You have:

ports="7 8";

This only toggles ports on one host. Try adding the ports for the
other host to the list too.

Scott 
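
The suggestion above amounts to putting the ports of both hosts into the same toggle loop. A minimal sketch follows, assuming all four ports sit on the same switch LID. The thread does not give the second host's port numbers, so they are passed as arguments; `DRY_RUN`, `run`, and `toggle_once` are hypothetical names introduced here for illustration.

```shell
#!/bin/bash
# Sketch: toggle one port at a time across the ports of BOTH hosts.
# Only the ibportstate disable/enable calls and the 5-second sleeps
# come from the thread; LID and port numbers are supplied by the caller.

# Hypothetical knob: set DRY_RUN=1 to print commands without running them.
run() {
    echo "$@"
    [ -n "$DRY_RUN" ] || "$@"
}

# toggle_once LID PORT...: disable and re-enable each port in turn.
toggle_once() {
    local lid="$1"
    shift
    local port
    for port in "$@"
    do
        run ibportstate "$lid" "$port" disable
        run sleep 5
        run ibportstate "$lid" "$port" enable
        run sleep 5
    done
}

# Loop forever when called as: <script> LID PORT [PORT...]
if [ "$#" -ge 2 ]; then
    while true
    do
        toggle_once "$@"
    done
fi
```

Running it with `DRY_RUN=1` first shows the exact sequence of commands before any port is actually disabled.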


[ofa-general] Re: [Bug 465] IPoIB CM HA fails after several hours of failovers

2007-04-10 Thread Michael S. Tsirkin
> Regarding comment #34, can you add details on how you are doing port failover
> (using ibportstate?)

> and what traffic you are running (what is the netperf
> command line?)?

I have opensm running on host 11.4.3.175.
Host 11.4.3.178 is connected to switch lid 5 ports 7 and 8.
I run on 11.4.3.175:

#!/bin/bash
while ./netperf -D -H 11.4.3.178
do
    date
    sleep 5
done

and at the same time, also on 11.4.3.175:


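#!/bin/bash
# Switch LID and the switch ports that host 11.4.3.178 is connected to.
lid=5;
ports="7 8";
# Cycle forever: disable each port for 5 seconds, then re-enable it.
while true
do
    for port in $ports
    do
        echo ibportstate $lid $port disable
        ibportstate $lid $port disable
        sleep 5
        echo ibportstate $lid $port enable
        ibportstate $lid $port enable
        sleep 5
    done
done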



[ofa-general] Re: [Bug 465] IPoIB CM HA fails after several hours of failovers

2007-04-10 Thread Michael S. Tsirkin
BTW, could you pls add more detail?
What OS/HW/FW does this happen with?


[ofa-general] Re: [Bug 465] IPoIB CM HA fails after several hours of failovers

2007-04-10 Thread Michael S. Tsirkin
> Unable to handle kernel NULL pointer dereference at 0020 RIP:
> {:ib_ipoib:ipoib_mcast_join_finish+69}

...

> Code: 8b 48 20 89 c8 89 ca 25 00 ff 00 00 c1 e2 18 c1 e0 08 09 c2
> RIP {:ib_ipoib:ipoib_mcast_join_finish+69} RSP <0101bc83bc38>
> CR2: 0020
>  <0>Kernel panic - not syncing: Oops

Can you please check which code line ipoib_mcast_join_finish+69 points at?
You can either use plain objdump for this, or use the script from
http://www.openfabrics.org/~mst/oops
(give it the path to the ib_ipoib.ko file and ipoib_mcast_join_finish+69).

-- 
MST
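
For readers without access to the script mentioned above, one common way to map a symbol+offset to a source line is gdb's `list *(symbol+offset)`. This is a swapped-in alternative, not the objdump procedure or the oops script from the message. The sketch below assumes that gdb is installed, that ib_ipoib.ko was built with debug info, and that the +69 in the oops is a decimal offset (pass it with a 0x prefix if it is hex). `oops_to_gdb_expr` is a hypothetical helper name.

```shell
#!/bin/bash
# Turn "symbol+offset" from an oops line into a gdb location expression.
# Assumption: the offset is decimal unless written with a 0x prefix.
oops_to_gdb_expr() {
    local spec="$1"
    local sym="${spec%%+*}"   # text before the first '+'
    local off="${spec#*+}"    # text after the first '+'
    printf '*(%s+0x%x)\n' "$sym" "$off"
}

# Usage: <script> /path/to/ib_ipoib.ko ipoib_mcast_join_finish+69
# Needs gdb and a module built with debug info.
if [ "$#" -eq 2 ]; then
    gdb -batch -ex "list $(oops_to_gdb_expr "$2")" "$1"
fi
```

The hex form of the offset is also what objdump shows in its disassembly listing, so the same value can be used to locate the instruction by hand.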