Hi Jean,

Thanks for reporting.

On 18:28 Fri 03 Mar     , Jean-Christophe Hugly wrote:
> 
> run osm somewhere.
> 
> then on whatever workstation has an HCA connected to the same subnet,
> do this:
> 
> i=1
> while true; do
>       modprobe -r ib_mthca
>       sleep 3
>       modprobe ib_mthca
>       ibstat
>       echo $i
>       sleep 3
>       i=`expr $i + 1`
> done
> 
> For me, after i reaches 7 or 8, the port no longer gets initialized and
> ibstat reports:
> 
>                State: Initializing
>                Physical state: LinkUp
> 
> On the other hand if you run osm with -d1 option (mostly
> single-threaded), then it seems to work indefinitely.

I've tried your script and don't see any difference between running with
and without -d1. However, my network is small (two hosts and a switch),
so it is probably different from yours.

I also see that the port does eventually become active, but only after a
delay. Those delays look strange and inconsistent; I will need to test
more tomorrow. Could you try this modification of your script?

i=1
while true; do
        # reload the HCA driver to force the port to re-initialize
        modprobe -r ib_mthca
        sleep 3
        modprobe ib_mthca
        # count the seconds until ibstat reports "State: Active"
        count=0
        while true ; do
                ibstat | grep 'State: Active$' > /dev/null && break
                count=`expr $count + 1`
                sleep 1
        done
        echo $i: delay $count
        sleep 3
        i=`expr $i + 1`
done
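
Note that if the port never leaves Initializing, the inner loop above
spins forever. A bounded variant might be handier in that case. This is
only a rough sketch: wait_for_active, IBSTAT_CMD and POLL_INTERVAL are
names made up here for illustration, and the 'State: Active$' match is
the same assumption about ibstat output as in the script above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any openib tool): poll until the port
# reports "State: Active" or the given number of polls is used up.
IBSTAT_CMD=${IBSTAT_CMD:-ibstat}      # command whose output is checked
POLL_INTERVAL=${POLL_INTERVAL:-1}     # seconds to sleep between polls

# wait_for_active MAX_POLLS
# Prints how many polls failed before the port became active.
# Returns 0 if the port became active, 1 if all MAX_POLLS polls failed.
wait_for_active() {
        max=$1
        count=0
        while [ "$count" -lt "$max" ]; do
                if $IBSTAT_CMD | grep 'State: Active$' > /dev/null; then
                        echo "$count"
                        return 0
                fi
                count=$((count + 1))
                sleep "$POLL_INTERVAL"
        done
        echo "$count"
        return 1
}
```

In the outer loop, something like "count=$(wait_for_active 60)" could
replace the inner while, with a non-zero exit status meaning the port was
still not active after 60 polls.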

> I did this with osm r5594, compiled and running on suse10 (dual xeon)
> with openib of the same rev. The "client side" is the same os and rev;
> cpus are 4 opterons.
> 
> I have not started to look for faulty mutexes yet. Were the fixes
> recently proposed in that area committed as of 5594?

They are not committed yet, and I think the problems there are
different (I am not sure, however).

One resweep-related symptom that the "atomic" patch should solve is the
outstanding MAD counter becoming corrupted and going negative, which
leaves osm stuck in the resweep state. In my tests, though, that failure
takes longer to reproduce (but again, my network is small).

Sasha.
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

