Eli Cohen wrote:
I just posted a patch which might fix your problem. Please try it and
let us know if it fixed anything.
Hi Eli
Josh already reported that the patch seems to fix the issue, but I still
have a question.
post_send failed prints were during work in datagram mode. I don't
Hi Roland,
This patchset adds support for the following enhanced atomic
operations:
- Masked atomic compare and swap
- Masked atomic fetch and add
These operations enable using a smaller amount of memory when using
multiple locks by using portions of a 64 bit value in an atomic
operation.
For
- Add new IB_WR_MASKED_ATOMIC_CMP_AND_SWP and
IB_WR_MASKED_ATOMIC_FETCH_AND_ADD send opcodes that mark a work request
as a masked atomic compare and swap or a masked atomic fetch and add,
respectively.
- Add IB_DEVICE_MASKED_ATOMIC capability bit.
- Add mask fields to atomic struct
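To make the idea of operating on portions of a 64 bit value concrete, here is a minimal software model of a masked compare and swap. It is only a sketch: the function name and parameter names are hypothetical, the semantics shown (compare only the bits under a compare mask, replace only the bits under a swap mask, return the old value) are my assumption about how such an operation usually behaves rather than something taken from the patch, and the function is not itself atomic.

```c
#include <stdint.h>

/*
 * Software model of a masked compare and swap (hypothetical helper,
 * not part of the patch and not atomic).
 *
 * Only the bits selected by compare_mask take part in the comparison,
 * and only the bits selected by swap_mask are replaced on success.
 * The previous value of the target is always returned.
 */
static uint64_t model_masked_cmp_swap(uint64_t *target,
                                      uint64_t compare, uint64_t compare_mask,
                                      uint64_t swap, uint64_t swap_mask)
{
    uint64_t old = *target;

    /* Equal under the mask means no differing bit survives the mask. */
    if (((old ^ compare) & compare_mask) == 0)
        *target = (old & ~swap_mask) | (swap & swap_mask);

    return old;
}
```

With a mask of 0xff, for example, the low byte of a 64 bit word can act as one small lock while the remaining seven bytes are left untouched, which is how several locks could share one word.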
Added support for masked atomic operations:
- Masked Compare and Swap
- Masked Fetch and Add
Signed-off-by: Vladimir Sokolovsky v...@mellanox.co.il
---
drivers/infiniband/hw/mlx4/cq.c |8
drivers/infiniband/hw/mlx4/main.c |3 ++-
drivers/infiniband/hw/mlx4/qp.c | 27
On Wed, Mar 10, 2010 at 04:03:26PM +0100, Fredrik Unger wrote:
When investigating the error it seems to stem from next_eqe_sw in
drivers/net/mlx4/eq.c
called by the interrupt handler.
What happens is that (eqe->owner & 0x80) is true, causing the routine to return
NULL resulting in an
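For readers unfamiliar with the ownership check being discussed, the following is a simplified, self-contained model of it. The struct and function names are hypothetical and the layout is reduced to the two fields that matter; it is a sketch of the technique (an owner bit whose meaning flips on every pass through a power-of-two ring), not a copy of the driver code.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced model of an event queue entry: only the owner byte is kept. */
struct model_eqe {
    uint8_t owner;
};

/* Reduced model of an event queue; nent must be a power of two. */
struct model_eq {
    uint32_t cons_index;        /* free-running consumer index */
    uint32_t nent;              /* number of entries in the ring */
    struct model_eqe *entries;
};

/*
 * Return the next entry if software owns it, or NULL if hardware still does.
 * The owner bit (0x80) is compared against the wrap bit of the consumer
 * index, so the meaning of the bit alternates on each pass over the ring.
 */
static struct model_eqe *model_next_eqe_sw(struct model_eq *eq)
{
    struct model_eqe *eqe = &eq->entries[eq->cons_index & (eq->nent - 1)];
    int hw_owned = !!(eqe->owner & 0x80) ^ !!(eq->cons_index & eq->nent);

    return hw_owned ? NULL : eqe;
}
```

In this model, an entry with the owner bit set on the first pass is treated as still owned by hardware, so the routine returns NULL and the caller sees no event.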
A routing engine that wants to make contributions to SL2VL maps in support
of routing free from credit loops may need to know the minimum number
of supported data VLs in the fabric.
This code tracks that value.
Signed-off-by: Jim Schutt jasc...@sandia.gov
---
opensm/include/opensm/osm_subnet.h
A routing engine that needs to compute multicast spanning trees with
special properties will need to delete old trees. There's already
a function that does this: mcast_mgr_purge_tree().
Make it available outside osm_mcast_mgr.c, and change the name
to follow the naming convention (osm_ prefix)
If a routing engine needs to compute spanning trees with special
properties, it needs a way to override the default implementation.
A routing engine callback provides that mechanism. Routing engines
that can use the default implementation can leave the callback
pointer set to NULL.
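The callback-with-NULL-default pattern described above can be sketched as follows. All names here (the struct, its field, and the helper functions) are hypothetical and chosen for illustration; they are not the identifiers used in the patch.

```c
#include <stddef.h>

/* Hypothetical routing engine descriptor with an optional override. */
struct model_routing_engine {
    const char *name;
    void *context;
    /* NULL means "use the default spanning tree implementation". */
    int (*build_mcast_tree)(void *context, int mgrp_id);
};

/* Stand-in for the default implementation; returns 0 to mark its use. */
static int model_default_build_tree(int mgrp_id)
{
    (void)mgrp_id;
    return 0;
}

/* Stand-in for an engine-specific implementation; returns 1 to mark its use. */
static int model_custom_build_tree(void *context, int mgrp_id)
{
    (void)context;
    (void)mgrp_id;
    return 1;
}

/* Dispatch to the engine's callback when set, else fall back to the default. */
static int model_build_tree(const struct model_routing_engine *re, int mgrp_id)
{
    if (re != NULL && re->build_mcast_tree != NULL)
        return re->build_mcast_tree(re->context, mgrp_id);

    return model_default_build_tree(mgrp_id);
}
```

An engine that is happy with the default simply leaves build_mcast_tree as NULL, so existing engines need no changes when the callback is introduced.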
In the event a routing engine needs to participate in SL assignment and
SL2VL map setup in order to avoid credit loops in a fabric, it will be
useful to make the routing engine context more widely available.
To this end, have osm_opensm_t save a pointer to the routing engine used,
rather than its
Signed-off-by: Jim Schutt jasc...@sandia.gov
---
opensm/doc/current-routing.txt | 269 +++-
opensm/man/opensm.8.in |9 ++-
2 files changed, 275 insertions(+), 3 deletions(-)
diff --git a/opensm/doc/current-routing.txt
Note that the original code assumes that QoS setup is mostly static and
based only on user configuration. As a result, there is no provision for
routing engines that want to compute contributions to the SL2VL maps.
Fix this up by adding a callback to struct osm_routing_engine that computes
a
Signed-off-by: Jim Schutt jasc...@sandia.gov
---
opensm/include/opensm/osm_base.h | 18 ++
opensm/include/opensm/osm_subnet.h |5 +
opensm/opensm/main.c |9 +
opensm/opensm/osm_subnet.c |1 +
opensm/opensm/osm_torus.c |
This is v2 of a patchset to add to opensm a new routing engine designed
to handle large fabrics connected with a 2D/3D topology.
Changes since initial version:
- Merged my patchsets from 11/20/2009, 12/18/2009, 2/16/2010.
- Moved information contained in the earlier patch series introduction
Torus-2QoS makes persistent use of osm_port_t:priv to speed calculation
of path SL values.
It cannot clear osm_port_t:priv members when it tears down its persistent
data for the following reason: If a port is removed from the fabric, the
opensm core will delete the corresponding osm_port_t
The torus-2QoS engine provides a deadlock-free routing for a 2D/3D torus,
but requires that switch SL2VL maps be programmed. Before this change,
opensm -Q was required for that to happen.
When a routing engine sets the struct osm_routing_engine:update_sl2vl
pointer, it is signalling its intent
Generating routes for a torus that are free of credit loops requires
the use of multiple virtual lanes, and thus SLs on IB. For IB fabrics
it also requires that _every_ application use path record queries -
any application that uses an SL that was not obtained via a path record
query may cause
The attached files can be used to test the torus-2QoS routing
engine using ibsim.
fabric-torus-5x5x5 contains a fabric description that ibsim can read.
Once ibsim is running, run opensm like this:
opensm --config opensm.conf --torus_config torus-2QoS-5x5x5.conf
or
opensm --config
Eli Cohen wrote:
On Wed, Mar 10, 2010 at 04:03:26PM +0100, Fredrik Unger wrote:
When investigating the error it seems to stem from next_eqe_sw in
drivers/net/mlx4/eq.c
called by the interrupt handler.
What happens is that (eqe->owner & 0x80) is true, causing the routine to
return
NULL
On Wed, Mar 10, 2010 at 05:30:38PM +0200, Moni Shoua wrote:
Hi Eli
Josh already reported that the patch seems to fix the issue, but I still
have a question.
post_send failed prints were during work in datagram mode. I don't know if
Josh verified
that but I don't expect that these
Eli Cohen wrote:
The patch does not address these failures directly but maybe as a side effect they would go away too.
The patch seems to solve a case of possible livelock happening in a
node which has both CM and datagram neighbors, e.g. where ipoib has
called netif_stop etc but there is now
On Thu, Mar 11, 2010 at 09:47:31AM +0200, Or Gerlitz wrote:
The patch does not address these failures directly but maybe as a
side effect they would go away too.
The patch seems to solve a case of possible livelock happening in
a node which has both CM and datagram neighbors, e.g. where ipoib