Michael S. Tsirkin wrote:
plug
Have you read the boring list of rules?
http://git.openfabrics.org/~mst/boring.txt
/plug
Thanks for the pointer.
core: uncached find gid and find pkey queries
* Add ib_find_gid and ib_find_pkey over uncached device queries.
The calls might block but the
Michael S. Tsirkin wrote:
Quoting Yosef Etigin [EMAIL PROTECTED]:
Subject: [PATCHv3 1/2] ipoib: handle pkey change events
This should have been 1 of 2, is that right?
Yes. should have been 2/2.
___
general mailing list
Quoting Yosef Etigin [EMAIL PROTECTED]:
Subject: Re: [PATCHv3 1/2] core: uncached find gid and find pkey queries
Michael S. Tsirkin wrote:
plug
Have you read the boring list of rules?
http://git.openfabrics.org/~mst/boring.txt
/plug
Thanks for the pointer.
This still violates
On Tue, 2007-05-08 at 17:57 -0700, Roland Dreier wrote:
@@ -249,8 +249,7 @@ static int mlx4_eq_int(struct mlx4_dev *dev, struct
mlx4_eq *eq)
}
}
-if (eqes_found)
-eq_set_ci(eq, 1);
+eq_set_ci(eq, 1);
It has been explained in a different thread on [ofa-general] that the
problem lies in a combination of the OpenIB-cma provider not setting the
local and remote port numbers on endpoints correctly and Open MPI
stepping over the IA to save the port number to circumvent this problem,
thereby
Michael S. Tsirkin wrote:
Quoting Yosef Etigin [EMAIL PROTECTED]:
Subject: Re: [PATCHv3 1/2] core: uncached find gid and find pkey queries
Michael S. Tsirkin wrote:
plug
Have you read the boring list of rules?
http://git.openfabrics.org/~mst/boring.txt
/plug
Thanks for the pointer.
This
* Add ib_find_gid and ib_find_pkey over uncached device queries.
The calls might block but the returns are always up-to-date.
* Cache pkey, gid table lengths in core to avoid port info queries.
Signed-off-by: Yosef Etigin [EMAIL PROTECTED]
---
drivers/infiniband/core/device.c | 138
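One way to picture the uncached lookup described above is a linear walk over the port's P_Key table that asks the device for every entry instead of trusting a cached copy, so a table reordered by an SM failover is still searched correctly. The following is only a userspace sketch under stated assumptions, not the patch's code: `demo_port`, `demo_query_pkey` and `find_pkey_uncached` are hypothetical names, the query callback stands in for a real (possibly blocking) device query, and ignoring the membership bit in the comparison is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a port: its P_Key table and the table length. */
struct demo_port {
    const uint16_t *pkey_tbl;
    uint16_t pkey_tbl_len;
};

/* Hypothetical stand-in for a device query that reads one table entry. */
typedef int (*demo_query_fn)(void *port, uint16_t index, uint16_t *pkey);

static int demo_query_pkey(void *port, uint16_t index, uint16_t *pkey)
{
    const struct demo_port *p = port;

    if (index >= p->pkey_tbl_len)
        return -EINVAL;
    *pkey = p->pkey_tbl[index];
    return 0;
}

/*
 * Walk the table entry by entry, querying the "device" each time.
 * Only the table length (cached_len) is taken from a cache.
 * Assumption: the top (membership) bit is ignored when comparing.
 */
static int find_pkey_uncached(void *port, uint16_t cached_len,
                              demo_query_fn query, uint16_t pkey,
                              uint16_t *index)
{
    for (uint16_t i = 0; i < cached_len; ++i) {
        uint16_t tmp;
        int ret = query(port, i, &tmp);

        if (ret)
            return ret;             /* propagate query failure */
        if ((tmp & 0x7fff) == (pkey & 0x7fff)) {
            *index = i;
            return 0;
        }
    }
    return -ENOENT;                 /* P_Key not present in the table */
}
```

Because every call re-reads the table, a caller that looks up the same P_Key before and after the table is reordered gets the new index, at the cost of one query per entry.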
fix missing initialization of write_mtt_size
Signed-off-by: Eli Cohen [EMAIL PROTECTED]
---
Index: connectx_kernel/drivers/infiniband/hw/mthca/mthca_provider.c
===
---
This email was generated automatically, please do not reply
Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod
--with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod
--with-addr_trans-mod --with-rds-mod --with-cxgb3-mod
Passed:
Passed on i686 with
@@ -642,6 +651,12 @@ void ipoib_ib_dev_flush(struct work_stru
ipoib_ib_dev_down(dev, 0);
+ if (restart_qp) {
+ if (test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
+ ipoib_ib_dev_stop(dev, 0);
+ ipoib_ib_dev_open(dev);
+ }
+
Michael S. Tsirkin wrote:
@@ -642,6 +651,12 @@ void ipoib_ib_dev_flush(struct work_stru
ipoib_ib_dev_down(dev, 0);
+ if (restart_qp) {
+ if (test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
+ ipoib_ib_dev_stop(dev, 0);
+
I don't understand what you mean here. How is unconditionally arming
the EQ at the end of mlx4_eq_int() any different from your proposed
patch? My change calls eq_set_ci() at the end of every call to
mlx4_eq_int(), and your change calls eq_set_ci() after every call to
OK. looks pretty good to me. One coding style violation I found:
@@ -268,5 +265,9 @@ void ipoib_event(struct ib_event_handler
record->element.port_num == priv->port) {
ipoib_dbg(priv, "Port state change event\n");
queue_work(ipoib_workqueue, &priv->flush_task);
On Wed, 2007-05-09 at 04:03 -0700, Roland Dreier wrote:
I understand all that. The question is, what's the difference between
my version (which is in my tree now), which does:
mlx4_eq_int(...eq...)
{
...
eq_set_ci(eq, 1);
Quoting Eli Cohen [EMAIL PROTECTED]:
Subject: [PATCH] IB/core user memory registrations
fix missing initialization of write_mtt_size
Signed-off-by: Eli Cohen [EMAIL PROTECTED]
This is actually IB/mthca, right? Wow, this seems to fix breakage introduced by
latest core changes, is that
From: Stefan Roscher [EMAIL PROTECTED]
Some pSeries hypervisor versions show a race condition in the allocate MR hCall.
Serialize this call per adapter to circumvent this problem.
Signed-off-by: Joachim Fenkes [EMAIL PROTECTED]
---
drivers/infiniband/hw/ehca/ehca_classes.h |1 +
The driver needs to always supply the GRH present flag to the hypervisor,
whether it's true or false. Not supplying it (i.e. not setting the
corresponding mask bit) amounts to a "perhaps", which we don't want.
Signed-off-by: Joachim Fenkes [EMAIL PROTECTED]
---
drivers/infiniband/hw/ehca/ehca_qp.c
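The point of the change above can be illustrated with a value-plus-mask request: a field only counts as supplied when its mask bit is set, so the mask bit has to be set whether the flag itself is true or false. This is a minimal sketch with hypothetical names (`demo_modify_req`, `DEMO_MASK_GRH_PRESENT`, `demo_set_grh`); it does not reproduce the eHCA control block or the hypervisor interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask bit marking the GRH-present field as valid. */
#define DEMO_MASK_GRH_PRESENT (1u << 0)

/* Hypothetical request: field values plus a mask of supplied fields. */
struct demo_modify_req {
    uint32_t update_mask;
    uint8_t  grh_present;
};

/*
 * Always mark the field as supplied, for true and for false alike.
 * Setting the mask bit only when has_grh is true would leave the
 * "false" case unspecified from the receiver's point of view.
 */
static void demo_set_grh(struct demo_modify_req *req, bool has_grh)
{
    req->grh_present = has_grh ? 1 : 0;
    req->update_mask |= DEMO_MASK_GRH_PRESENT;
}
```

With this shape, a request for "no GRH" carries the mask bit together with a zero value, which is distinguishable from a request that says nothing about the GRH at all.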
eHCA's sysfs attributes are now being created via sysfs_create_group(),
making the process neatly table-driven. The return value is checked, thus
fixing a few compiler warnings.
Signed-off-by: Joachim Fenkes [EMAIL PROTECTED]
---
drivers/infiniband/hw/ehca/ehca_main.c | 86
- In ehca_process_eq(), we're IRQ safe throughout the whole function, so we
don't need another _irqsave in the middle of flight.
- take_over_work() is only called by comp_pool_callback(), so it can move
into the same #ifdef block.
Signed-off-by: Joachim Fenkes [EMAIL PROTECTED]
---
On May 9, 2007, at 1:37 AM, Or Gerlitz wrote:
Doing a bit of zoom out from the "how to make ofed's udapl work for
ompi" thread, my thinking is that the ompi udapl btl enablement is
actually only the first step, where for production/longterm/etc you
want to have an rdmacm btl.
I think this
Michael S. Tsirkin wrote:
OK. looks pretty good to me. One coding style violation I found:
fixed
--
This issue was found during partitioning SM fail over testing.
* Added flush flag to ipoib_ib_dev_stop(), ipoib_ib_dev_down() alike
* Rename the polling thread work to 'pkey_poll_task' to
Quoting Yosef Etigin [EMAIL PROTECTED]:
Subject: Re: [PATCHv3 2/2] ipoib: handle pkey change events
Michael S. Tsirkin wrote:
OK. looks pretty good to me. One coding style violation I found:
fixed
OK, Ack for this latest revision.
I'm quite happy with the latest state of these 2
Michael S. Tsirkin wrote:
Quoting Yosef Etigin [EMAIL PROTECTED]:
Subject: Re: [PATCHv3 2/2] ipoib: handle pkey change events
Michael S. Tsirkin wrote:
OK. looks pretty good to me. One coding style violation I found:
fixed
OK, Ack for this latest revision.
I'm quite happy with the
These two patches fix bug #577: PKey table reordering caused by SM failover
stops ipoib traffic
patch 1: add uncached device queries to core
patch 2: restart ipoib qp on pkey change event, and use uncached queries on qp
init
--
* Add ib_find_gid and ib_find_pkey over uncached device queries.
The calls might block but the returns are always up-to-date.
* Cache pkey, gid table lengths in core to avoid port info queries.
Signed-off-by: Yosef Etigin [EMAIL PROTECTED]
---
drivers/infiniband/core/device.c | 138
This issue was found during partitioning SM fail over testing.
* Added flush flag to ipoib_ib_dev_stop(), ipoib_ib_dev_down() alike
* Rename the polling thread work to 'pkey_poll_task' to avoid ambiguity
* Upon PKEY_CHANGE event, schedule a work that restarts the QP
* Restart child
Quoting Yosef Etigin [EMAIL PROTECTED]:
Subject: [PATCHv4 for-2.6.22 1 of 2] core: uncached find gid and find
pkey queries
* Add ib_find_gid and ib_find_pkey over uncached device queries.
The calls might block but the returns are always up-to-date.
* Cache pkey, gid table lengths in
Quoting Yosef Etigin [EMAIL PROTECTED]:
Subject: [PATCHv4 for-2.6.22 2 of 2] ipoib: handle pkey change events
This issue was found during partitioning SM fail over testing.
* Added flush flag to ipoib_ib_dev_stop(), ipoib_ib_dev_down() alike
* Rename the polling thread work to
Quoting Yosef Etigin [EMAIL PROTECTED]:
Subject: [PATCHv4 for-2.6.22 0 of 2] pkey change handling - fix bug #577
These two patches fix bug #577: PKey table reordering caused by SM failover
stops ipoib traffic
patch 1: add uncached device queries to core
patch 2: restart ipoib qp on pkey
You say that fixes the problem, does it work even when running more than
one MPI process per node? (that is the case the hack fixes) Simply
doing an mpirun with a -np parameter higher than the number of nodes you
have set up should trigger this case, and making sure to use '-mca btl
606 opened to track the udapl change.
607 opened to track the ompi change to remove the port number stashing
hack.
Status: I have a patch from Arlin to test today. I will test with that
patch and with the OMPI port hack removed. Stay tuned...
Steve.
On Tue, 2007-05-08 at 15:47 -0700,
I've run the whole IMB Benchmark Suite on 2, 3, and 4 nodes with 2
processes per node and --mca btl udapl,self. I didn't encounter any problems.
The comment above line 197 says that dat_ep_query() returns wrong port
numbers (which it does indeed), but I can't find any call to
dat_ep_query() in the
The following patches add the required backports kernel addons in order to
support open-iscsi over iSER in RHAS4 up3 up4 in OFED (currently SLES 10,
SLES 10 sp1 RHEL 5 are supported).
--
Erez Zilber | 972-9-971-7689
Software
On Wed, 2007-05-09 at 08:37 +0300, Or Gerlitz wrote:
Andrew Friedley wrote:
Jeff Squyres wrote:
FWIW, yes, adding RDMA CM support has actually been on my to-do list
for a while, but it keeps getting bumped by higher priority items.
It would be *much* better if some iWARP companies got
Although as Boris pointed out, perhaps the hack in OMPI is no longer
needed at all...
On Wed, 2007-05-09 at 08:41 -0500, Steve Wise wrote:
606 opened to track the udapl change.
607 opened to track the ompi change to remove the port number stashing
hack.
Status: I have a patch from Arlin
FWIW, I would marginally prefer if this bug is tracked in the Open
MPI trac ticket system, not the OFA bugzilla (Steve W. will have
write access there as soon as Chelsio submits their OMPI 3rd party
contribution agreement). We've traditionally [mostly] tracked OMPI
bugs in the OMPI bug
On May 9, 2007, at 10:30 AM, Steve Wise wrote:
Agreed. enabling udapl will get OMPI over iwarp immediately (and
hopefully in ofed-1.2). Post ofed-1.2, I think OMPI _should_ create a
rdma-cm btl. That's the plan...
Yes and no. Please see my other reply about an rdma cm BTL...
--
Jeff
Here is a fourth version of the IPOIB_CM_NOSRQ patch for review. This
patch will benefit adapters that do not support shared receive queues.
This patch incorporates the following review comments from v3:
1. Incorporated review comments (related to style) from Roland Dreier
and Michael Tsirkin
I agree OMPI trac ticket #890 should cover this. I will test the
suggested fix, just removing that one line from btl_udapl.c, on Solaris.
I am still not set up on Linux so hopefully Steve can confirm there.
-DON
Jeff Squyres wrote:
FWIW, I would marginally prefer if this bug is tracked in
Under what conditions is the field abi_compat of struct ibv_context set to
non-zero? I'm encountering a situation where it is set when coding to verbs on a
clean OFED 1.2 install. Seems odd that it would be set since I suspected that
it would only occur for verbs 1.0/1.1 compatibility.
thanks!
Hi Doug,
I installed RHEL-4.5 on one of our ppc64 systems and recognized that asm-ppc
directory is missing in /usr/src/kernels/2.6.9-55.EL/include.
Normally I don't need this directory, but ibmebus.h includes
asm-ppc64/of_device.h. And there asm-ppc64/of_device.h includes
asm-ppc/of_device.h.
On Wednesday 09 May 2007, Michael S. Tsirkin wrote:
Quoting Peter Kjellstrom [EMAIL PROTECTED]:
Subject: Re: [ofa-general] ofa_1_2_kernel 20070508-0200 daily build
status
Not related to the failed 2.6.21.1 below, but, are there any plans to add
the EL5 and EL4u5 kernels to the list?
@@ -159,11 +214,14 @@ static struct ib_qp *ipoib_cm_create_rx_
.recv_cq = priv->cq,
.srq = priv->cm.srq,
.cap.max_send_wr = 1, /* FIXME: 0 Seems not to work */
+ .cap.max_recv_wr = ipoib_recvq_size + 1,
.cap.max_send_sge = 1,
Michael S. Tsirkin wrote:
+void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
+{
+ struct ipoib_dev_priv *priv = netdev_priv(dev);
+
+ if (priv->cm.srq)
+ handle_rx_wc_srq(dev, wc);
+ else
+ handle_rx_wc_nosrq(dev, wc);
}
I still
Quoting Pradeep Satyanarayana [EMAIL PROTECTED]:
Subject: Re: IPOIB CM (NOSRQ)[PATCH V4] patch for review
Michael S. Tsirkin wrote:
+void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
+{
+ struct ipoib_dev_priv *priv = netdev_priv(dev);
+
+ if (priv->cm.srq)
+
Michael S. Tsirkin wrote:
@@ -159,11 +214,14 @@ static struct ib_qp *ipoib_cm_create_rx_
.recv_cq = priv->cq,
.srq = priv->cm.srq,
.cap.max_send_wr = 1, /* FIXME: 0 Seems not to work */
+ .cap.max_recv_wr = ipoib_recvq_size + 1,
With the memory registration restrictions of the eHCA coupled with our
applications which require large memory registrations, we've found that
we can quickly trigger a case where ibv_reg_mr() will return -EINVAL,
when it should be returning -ENOMEM. If we were able to differentiate
this type
Roland, please pull from:
git://git.openfabrics.org/~shefty/rdma-dev.git for-roland
This will cleanup device removal synchronization in the rdma_cm. The changes
are based on 2.6.21.
Sean Hefty (3):
rdma/cm: simplify device removal handling code
rdma/cm: Fix synchronization
On Wed, 2007-05-09 at 18:24 +0200, Stefan Roscher wrote:
Hi Doug,
I installed RHEL-4.5 on one of our ppc64 systems and recognized that asm-ppc
directory is missing in /usr/src/kernels/2.6.9-55.EL/include.
Normally I don't need this directory, but ibmebus.h includes
asm-ppc64/of_device.h.
Hi Hal,
This simplifies osm_port_t structure and related API functions -
the main idea is to not use duplicated (from osm_node_t) physical port
pointers table, but only one direct pointer to appropriated physical
port (osm_physp_t).
Sasha
This removes osm_port_get_num_physp() function and instead uses native
node oriented osm_node_get_num_physp().
Signed-off-by: Sasha Khapyorsky [EMAIL PROTECTED]
---
osm/include/opensm/osm_port.h | 29 -
osm/opensm/osm_drop_mgr.c |2 +-
This removes some not really needed functions: osm_port_get_phys_ptr(),
osm_port_get_default_phys_ptr() and osm_port_get_parent_node().
Signed-off-by: Sasha Khapyorsky [EMAIL PROTECTED]
---
osm/include/opensm/osm_port.h| 101 --
osm/opensm/osm_drop_mgr.c
On Wed, 2007-05-09 at 16:20 -0400, Donald Kerr wrote:
I'm missing some context here. Where are you plugging iwarp and OMPI
together?
ofed-1.2 supports iwarp and the chelsio rnic. It can be accessed
directly via the ofa verbs and ofa rdma-cm _as well as_ via udapl.
I'm attempting to run OMPI
So then I agree with Andrew, I think you are trying to impose
restrictions on uDAPL which are not part of the Spec.
-DON
Steve Wise wrote:
On Wed, 2007-05-09 at 16:20 -0400, Donald Kerr wrote:
I'm missing some context here. Where are you plugging iwarp and OMPI
together?
ofed-1.2
I guess I have not read enough about iwarp yet but if iwarp is sitting
below ib verbs or udapl in the stack and is trying to impose
restrictions which ib verbs or udapl do not adhere to then maybe iwarp
is in the wrong place in the ofed stack.
Having said that I do agree the OMPI community
I talked with Steve a bunch on the phone about this.
1. This "connector must RDMA first" issue is an iWARP restriction --
it's not specific to udapl or verbs. For example, if you try to use
udapl with iWARP on Solaris, you'll have the same issue (I have no
idea whether you have iWARP
On Wed, 2007-05-09 at 16:15 -0700, Andrew Friedley wrote:
Steve Wise wrote:
There have been a series of discussions on the ofa general list about
this issue, and the conclusion to date is that it cannot be resolved in
the rdma-cm or iwarp-cm code of the linux rdma stack. Mainly because
Therefore, the only truly safe thing for an iWARP btl to do (or a
udapl btl since that is also an iWARP btl) is to have the active
layer send an MPI Layer nop of some kind immediately after
establishing the connection if there is nothing else to send.
This is fine for an
[EMAIL PROTECTED] wrote:
Therefore, the only truly safe thing for an iWARP btl to do (or a
udapl btl since that is also an iWARP btl) is to have the active
layer send an MPI Layer nop of some kind immediately after
establishing the connection if there is nothing else to send.
This is fine
Steve Wise wrote:
On Wed, 2007-05-09 at 16:15 -0700, Andrew Friedley wrote:
Steve Wise wrote:
There have been a series of discussions on the ofa general list about
this issue, and the conclusion to date is that it cannot be resolved in
the rdma-cm or iwarp-cm code of the linux rdma stack.
On Wed, 2007-05-09 at 17:46 -0700, Andrew Friedley wrote:
Therefore, the only truly safe thing for an iWARP btl to do (or a
udapl btl since that is also an iWARP btl) is to have the active
layer send an MPI Layer nop of some kind immediately after
establishing the connection if there is
On Wednesday 09 May 2007 21:05, Doug Ledford wrote:
On Wed, 2007-05-09 at 18:24 +0200, Stefan Roscher wrote:
Hi Doug,
I installed RHEL-4.5 on one of our ppc64 systems and recognized that asm-ppc
directory is missing in /usr/src/kernels/2.6.9-55.EL/include.
Normally I don't need this
The reason it is hard or impossible to solve this in the DAPL layer is
that any rdma operation on the QP affects the state of that QP and the
associated CQs. In addition, if you use an RDMA send to enforce this you
impact the other side by consuming a RECV buffer. So it's hard if not
impossible to
I see a new patch ipoib_correct_timers.patch in OFED-1.2-20070509-0600,
which patch should I try?
Scott
-Original Message-
From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
Sent: Monday, May 07, 2007 1:03 PM
To: Scott Weitzenkamp (sweitzen)
Cc: Yohad Dickman; Amit Krig; Tziporet
@@ -1020,7 +1020,7 @@ static struct ib_mr *mthca_reg_user_mr(s
int shift, n, len;
int i, j, k;
int err = 0;
-int write_mtt_size;
+int write_mtt_size = mthca_write_mtt_size(dev);
mr = kmalloc(sizeof *mr, GFP_KERNEL);
if (!mr)
Not sure I
thanks, it all looks fine... I'll apply when I'm back from my trip on Monday.
Under what conditions is the field abi_compat of struct ibv_context
set to non-zero? I'm encountering a situation where it is set
when coding to verbs on a clean OFED 1.2 install. Seems odd that it
would be set since I suspected that it would only occur for verbs
1.0/1.1 compatibility.
ok, I'll look at this when I get back home on Monday.
___
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
-Original Message-
Under what conditions is the field abi_compat of struct ibv_context
set to non-zero? I'm encountering a situation where it is set
when coding to verbs on a clean OFED 1.2 install. Seems odd that it
would be set since I suspected that it would only occur
Michael S. Tsirkin wrote:
@@ -642,6 +651,11 @@ void ipoib_ib_dev_flush(struct work_stru
ipoib_ib_dev_down(dev, 0);
+ if (restart_qp) {
+ ipoib_ib_dev_stop(dev, 0);
+ ipoib_ib_dev_open(dev);
+ }
+
/*
* The device could have been brought down