[ofa-general] ofa_1_3_kernel 20071220-0200 daily build status
This email was generated automatically, please do not reply.

git_url: git://git.openfabrics.org/ofed_1_3/linux-2.6.git
git_branch: ofed_kernel

Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-mlx4-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod --with-nes-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.12
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.17
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.16
Passed on ia64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19
Passed on ia64 with linux-2.6.23
Passed on x86_64 with linux-2.6.22.5-31-default
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.15
Passed on ia64 with linux-2.6.13
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.14
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.15
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.12
Passed on ppc64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.13
Passed on ppc64 with linux-2.6.17
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on ia64 with linux-2.6.22
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on ia64 with linux-2.6.21.1
Passed on ppc64 with linux-2.6.13
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.18-53.el5
Passed on ia64 with linux-2.6.16.21-0.8-default

Failed:

___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] can you please add a new product to OpenFabrics Linux?
The product mstflint is missing. The owner of this product is orenk.at.dev.mellanox.co.il.

Thanks,
Dotan
[ofa-general] [RFC] XRC -- make receiving XRC QP independent of any one user process
Background: see the "XRC Cleanup order issue" thread at http://lists.openfabrics.org/pipermail/general/2007-December/043935.html (a userspace process which created the receiving XRC QP on a given host dies before other processes which still need to receive XRC messages on their SRQs, which are paired with the now-destroyed receiving XRC QP.)

Solution: add a userspace verb (as part of the XRC suite) which enables the user process to create an XRC QP owned by the kernel, belonging to the required XRC domain. This QP will be destroyed when the XRC domain is closed (i.e., as part of an ibv_close_xrc_domain call, but only when the domain's reference count goes to zero).

Below, I give the new userspace API for this function. Any feedback will be appreciated. This API will be implemented in the upcoming OFED 1.3 release, so we need feedback ASAP.

Notes:
1. There is no query or destroy verb for this QP. There is also no userspace object for the QP. Userspace has ONLY the raw qp number to use when creating the (X)RC connection.
2. Since the QP is owned by kernel space, async events for this QP are also handled in kernel space (i.e., reported in /var/log/messages). There are no completion events for the QP, since it does not send, and all receive completions are reported in the XRC SRQ's cq. If this QP enters the error state, the remote QP which sends will start receiving RETRY_EXCEEDED errors, so the application will be aware of the failure.

- Jack

==

/**
 * ibv_alloc_xrc_rcv_qp - creates an XRC QP for serving as a receive-side only QP,
 * and moves the created qp through the RESET->INIT and INIT->RTR transitions.
 * (The RTR->RTS transition is not needed, since this QP does no sending.)
 * The sending XRC QP uses this QP as destination, while specifying an XRC SRQ
 * for actually receiving the transmissions and generating all completions on
 * the receiving side.
 *
 * This QP is created in kernel space, and persists until the XRC domain is
 * closed (i.e., its reference count goes to zero).
 *
 * @pd: protection domain to use. At the lower layer, this provides access to
 *      the userspace object.
 * @xrc_domain: xrc domain to use for the QP.
 * @attr: modify-qp attributes needed to bring the QP to RTR.
 * @attr_mask: bitmap indicating which attributes are provided in the attr
 *             struct; used for validity checking.
 * @xrc_rcv_qpn: qp_num of created QP (if success). To be passed to the remote
 *               node. The remote node will use xrc_rcv_qpn in ibv_post_send
 *               when sending to XRC SRQ's on this host in the same xrc domain.
 *
 * RETURNS: success (0), or a (negative) error value.
 */
int ibv_alloc_xrc_rcv_qp(struct ibv_pd *pd,
                         struct ibv_xrc_domain *xrc_domain,
                         struct ibv_qp_attr *attr,
                         enum ibv_qp_attr_mask attr_mask,
                         uint32_t *xrc_rcv_qpn);

Notes:
1. Although the kernel creates the qp in the kernel's own PD, we still need the PD parameter to determine the device.
2. I chose to use struct ibv_qp_attr, which is used in modify QP, rather than create a new structure for this purpose. This also guards against API changes in the event that during development I notice that more modify-qp parameters must be specified for this operation to work.
3. Table of the ibv_qp_attr parameters showing what values to set:

struct ibv_qp_attr {
    enum ibv_qp_state  qp_state;        Not needed
    enum ibv_qp_state  cur_qp_state;    Not needed -- Driver starts from RESET and takes qp to RTR.
    enum ibv_mtu       path_mtu;        Yes
    enum ibv_mig_state path_mig_state;  Yes
    uint32_t           qkey;            Yes
    uint32_t           rq_psn;          Yes
    uint32_t           sq_psn;          Not needed
    uint32_t           dest_qp_num;     Yes -- this is the remote side QP for the RC conn.
    int                qp_access_flags; Yes
    struct ibv_qp_cap  cap;             Need only XRC domain. Other caps will use hard-coded
                                        values: max_send_wr = 1; max_recv_wr = 0;
                                        max_send_sge = 1; max_recv_sge = 0; max_inline_data = 0;
    struct ibv_ah_attr ah_attr;         Yes
    struct ibv_ah_attr alt_ah_attr;     Optional
    uint16_t
Re: [ofa-general] Re: some questions on stale connection handling at the IB CM
Sean Hefty wrote:

So in the case of a lost DREQ etc., in cm_match_req() we will pass the check for duplicate REQs but fail the check for stale connections, and this can happen in an endless loop? This seems like a bug to me.

This problem isn't limited to stale connections. If a client tries to connect, gets a reject for whatever reason, ignores the reject, then tries to reconnect with the same parameters, then they've put themselves into an endless loop.

I don't follow: if they don't ignore the reject, but reuse the same QP for their successive connection requests, each new REQ will pass the ID check (duplicate REQs) but will fail on the remote QPN check, correct? So what can a client do to avoid falling into that? What does it mean to not ignore the reject? Note that even if, on getting a reject, they release the QP and allocate a new one, they can get the same QP number.

Yes, this seems to be able to solve the keep-alive thing in a generic fashion for all ULPs using the IB CM. Will you be able to look at this during the next few weeks or so?

This method can be used by apps today. The only enhancement that I can see being made is having the CM automatically send the messages at regular intervals. But I hesitate to add this to the CM, since it doesn't have knowledge of traffic occurring over the QP, and may interfere with an app that wants to actually change alternate path information.

You mean one side sends a LAP message with the current path and the peer replies with an APR message confirming this is fine? I guess this LAP sending has to be carried out by both sides, correct? And it's not supported for RDMA-CM users... As for your comments, assuming an app must notify the CM that it does not use a QP anymore (and if not, we declare it an RTFM bug), as long as the QP is alive from the CM's viewpoint it's perfectly fine to send these LAPs; doing this once every few seconds or tens of seconds will not create heavy load, I think.
As for the point of interfering with apps that want to use LAP/APR for an APM implementation over their protocols, we can let the CM consumer specify whether they want the CM to issue keep-alives for them, and the frequency of sending the messages.

Or.
Re: [ofa-general] peer to peer connections support
Kanevsky, Arkady wrote:

Are you proposing that rdma_cm try to separate 2 cases: one where the 2 sides are each trying to set up a connection to the other side, vs. one where the 2 sides are trying to set up 1 connection but each side issues a connection request?

I am not proposing anything now, but rather trying to understand with Sean what his vision of a possible API is.

Isn't it easier to handle in MPI, which has a unique rank, so only one side issues a connection request?

That holds for MPI schemes that connect all-to-all on job start; I am referring to the case of connections on demand.

Or.
***SPAM*** RE: [ofa-general] ***SPAM*** SFS 3012 SRP problem
We are using 2 IBM FAStT900's. Normally the timestamps of the messages on both the SFS and the IB host match.

Thanks,
Jeroen

From: Scott Weitzenkamp (sweitzen) [mailto:[EMAIL PROTECTED]
Sent: woensdag 19 december 2007 18:32
To: Jeroen Van Aken; general@lists.openfabrics.org
Subject: RE: [ofa-general] ***SPAM*** SFS 3012 SRP problem

If you have a Cisco support contract, you should open a case with the Cisco TAC. What kind of FC storage are you using? The chassis syslog messages show the host is unresponsive (the OUT_SERVICE and IN_SERVICE messages). Does the timing of these messages match the ib_srp messages on the host?

Scott Weitzenkamp
SQA and Release Manager, Server Virtualization Business Unit, Cisco Systems

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeroen Van Aken
Sent: Wednesday, December 19, 2007 6:54 AM
To: general@lists.openfabrics.org
Subject: [ofa-general] ***SPAM*** SFS 3012 SRP problem

Hello,

We are doing some SRP tests with the Cisco SFS 3012 Gateway. We connected 4 hosts, each with 2 InfiniBand cables on one dual InfiniBand card, to the SFS 3012 gateway. The gateway is also connected to our fibre channel storage. The OFED used is OFED-1.3-beta2 on each of the hosts. The InfiniBand cards used are Mellanox Technologies MT25208 InfiniHost III Ex (rev a0) and Mellanox Technologies MT23108 InfiniHost (rev a1) cards. When generating heavy load over the switch (by reading from our FC storage over all the luns simultaneously), we sometimes get the following errors:

On the hosts:

Dec 13 13:07:54 gpfs4n1 syslog-ng[8212]: STATS: dropped 0
Dec 13 13:20:26 gpfs4n1 run_srp_daemon[8422]: failed srp_daemon: [HCA=mthca0] [port=1] [exit status=110]. Will try to restart srp_daemon periodically.
No more warnings will be issued in the next 7200 seconds if the same problem repeats
Dec 13 13:20:27 gpfs4n1 run_srp_daemon[8428]: starting srp_daemon: [HCA=mthca0] [port=1]
Dec 13 14:01:20 gpfs4n1 sshd[8539]: Accepted keyboard-interactive/pam for root from 172.16.0.18 port 3545 ssh2
Dec 13 14:07:55 gpfs4n1 syslog-ng[8212]: STATS: dropped 0
Dec 13 14:13:01 gpfs4n1 syslog-ng[8212]: Changing permissions on special file /dev/xconsole
Dec 13 14:13:01 gpfs4n1 syslog-ng[8212]: Changing permissions on special file /dev/tty10
Dec 13 14:13:01 gpfs4n1 kernel: SRP abort called
Dec 13 14:13:01 gpfs4n1 kernel: SRP abort called
Dec 13 14:13:01 gpfs4n1 kernel: SRP abort called
Dec 13 14:13:01 gpfs4n1 kernel: SRP abort called
Dec 13 14:13:01 gpfs4n1 kernel: SRP abort called
Dec 13 14:13:01 gpfs4n1 kernel: SRP abort called
Dec 13 14:13:01 gpfs4n1 kernel: SRP abort called
Dec 13 14:13:01 gpfs4n1 kernel: SRP abort called
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed send status 12
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed receive status 5
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed receive status 5
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed receive status 5
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed receive status 5
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed receive status 5
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed receive status 5
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed send status 12
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed receive status 5
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed receive status 5
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed receive status 5
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed receive status 5
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed receive status 5
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed receive status 5
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed receive status 5
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed receive status 5
Dec 13 14:13:02 gpfs4n1 kernel: ib_srp: failed receive status 5

On the switch ts_log:

***SWITCH LOG***
Dec 13 14:04:30 topspin-cc ib_sm.x[1357]: [INFO]: Configuration caused by multicast membership change
Dec 13 14:05:49 topspin-cc ib_sm.x[1383]: [INFO]: Session not initiated: Cold Sync Limit exceeded for Standby SM guid 00:05:ad:00:00:08:94:5d
Dec 13 14:07:49 topspin-cc ib_sm.x[1383]: [INFO]: Initialize a backup session with Standby SM guid 00:05:ad:00:00:08:94:5d
Dec 13 14:07:59 topspin-cc ib_sm.x[1383]: [INFO]: Session initialization failed with Standby SM guid 00:05:ad:00:00:08:94:5d
Dec 13 14:09:59 topspin-cc ib_sm.x[1383]: [INFO]: Initialize a backup session with Standby SM guid 00:05:ad:00:00:08:94:5d
Dec 13 14:10:09 topspin-cc ib_sm.x[1383]: [INFO]: Session initialization failed with Standby SM guid 00:05:ad:00:00:08:94:5d
Dec 13 14:12:06 topspin-cc ib_sm.x[1357]: [INFO]: Generate SM OUT_OF_SERVICE trap for GID=fe:80:00:00:00:00:00:00:00:05:ad:00:00:1d:ce:21
Dec 13 14:12:06 topspin-cc ib_sm.x[1357]: [INFO]: Generate SM OUT_OF_SERVICE trap for
Re: [ofa-general] smpquery regression in 1.3-rc1
On Thu, 2007-12-20 at 13:42 +0200, Yevgeny Kliteynik wrote:
Hal Rosenstock wrote:
On Wed, 2007-12-19 at 11:58 -0800, [EMAIL PROTECTED] wrote:

We're seeing a regression in smpquery from alpha2 to rc1. For example, with alpha2 I get:

grommit:~ # smpquery -G nodeinfo 0x66a01a000737c
# Node info: Lid 3
BaseVers:1
ClassVers:...1
NodeType:Channel Adapter
NumPorts:2
SystemGuid:..0x00066a009800737c
Guid:0x00066a009800737c
PortGuid:0x00066a01a000737c
PartCap:.64
DevId:...0x6278
Revision:0x00a0
LocalPort:...2
VendorId:0x00066a
grommit:~ #

And with rc1, I get:

grommit:~ # smpquery -G nodeinfo 0x66a01a000737c
ibwarn: [5650] ib_path_query: sa call path_query failed
smpquery: iberror: failed: can't resolve destination port 0x66a01a000737c
grommit:~ #

But using a LID works fine:

grommit:~ # smpquery nodeinfo 3
# Node info: Lid 3
BaseVers:1
ClassVers:...1
NodeType:Channel Adapter
NumPorts:2
SystemGuid:..0x00066a009800737c
Guid:0x00066a009800737c
PortGuid:0x00066a01a000737c
PartCap:.64
DevId:...0x6278
Revision:0x00a0
LocalPort:...2
VendorId:0x00066a
grommit:~ #

Strangest of all, running it under strace also works:

grommit:~ # strace smpquery -G nodeinfo 0x66a01a000737c /tmp/smpquery.out .
grommit:~ # cat /tmp/smpquery.out
# Node info: Lid 3
BaseVers:1
ClassVers:...1
NodeType:Channel Adapter
NumPorts:2
SystemGuid:..0x00066a009800737c
Guid:0x00066a009800737c
PortGuid:0x00066a01a000737c
PartCap:.64
DevId:...0x6278
Revision:0x00a0
LocalPort:...2
VendorId:0x00066a
grommit:~ #

Some weird race condition... Anyone else seeing the same?

-G requires an SA path record lookup, so this could be an issue with that timing out in some cases (assuming the port is active and the SM is operational).

I'm seeing the same problem. Sometimes the query works, and sometimes it doesn't. I also see that when the query fails, OpenSM doesn't get the PathRecord query at all. Hal, can you elaborate on that "timing out in some cases" issue?
I just meant that the SM not responding (for an unknown reason right now) would yield this effect.

Adding Jack for the libibmad issue: I see that ib_path_query() in libibmad/sa.c sometimes fails when calling safe_sa_call().

This could just be more detail on the same thing in terms of the (smpquery) client, which is layered on top of libibmad: the SA path query timing out. I would suggest running OpenSM in verbose mode (both instances are with OpenSM) and seeing if it responds to the PathRecord query used by this form of smpquery, and continuing troubleshooting from there based on the result.

-- Hal

-- Yevgeny

-- Hal
[ofa-general] [PATCH] IB/ehca: Forward event client-reregister-required to registered clients
This patch allows ehca to forward the client-reregister-required event to registered clients. Such an event is generated by the switch, e.g. after its reboot.

Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED]
---
 drivers/infiniband/hw/ehca/ehca_irq.c | 12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c
index 3f617b2..4c734ec 100644
--- a/drivers/infiniband/hw/ehca/ehca_irq.c
+++ b/drivers/infiniband/hw/ehca/ehca_irq.c
@@ -62,6 +62,7 @@
 #define NEQE_PORT_NUMBER       EHCA_BMASK_IBM( 8, 15)
 #define NEQE_PORT_AVAILABILITY EHCA_BMASK_IBM(16, 16)
 #define NEQE_DISRUPTIVE        EHCA_BMASK_IBM(16, 16)
+#define NEQE_SPECIFIC_EVENT    EHCA_BMASK_IBM(16, 23)
 #define ERROR_DATA_LENGTH      EHCA_BMASK_IBM(52, 63)
 #define ERROR_DATA_TYPE        EHCA_BMASK_IBM( 0, 7)
@@ -354,6 +355,7 @@ static void parse_ec(struct ehca_shca *shca, u64 eqe)
 {
 	u8 ec = EHCA_BMASK_GET(NEQE_EVENT_CODE, eqe);
 	u8 port = EHCA_BMASK_GET(NEQE_PORT_NUMBER, eqe);
+	u8 spec_event;

 	switch (ec) {
 	case 0x30: /* port availability change */
@@ -394,6 +396,16 @@ static void parse_ec(struct ehca_shca *shca, u64 eqe)
 	case 0x33: /* trace stopped */
 		ehca_err(&shca->ib_device, "Traced stopped.");
 		break;
+	case 0x34: /* util async event */
+		spec_event = EHCA_BMASK_GET(NEQE_SPECIFIC_EVENT, eqe);
+		if (spec_event == 0x80) /* client reregister required */
+			dispatch_port_event(shca, port,
+					    IB_EVENT_CLIENT_REREGISTER,
+					    "client reregister req.");
+		else
+			ehca_warn(&shca->ib_device,
+				  "Unknown util async event %x on port %x",
+				  spec_event, port);
+		break;
 	default:
 		ehca_err(&shca->ib_device, "Unknown event code: %x on %s.",
 			 ec, shca->ib_device.name);
--
1.5.2
Re: [ofa-general] [RFC] XRC -- make receiving XRC QP independent of any one user process
Adding Open MPI and MVAPICH community to the thread. Pasha (Pavel Shamis) Jack Morgenstein wrote: background: see XRC Cleanup order issue thread at http://lists.openfabrics.org/pipermail/general/2007-December/043935.html (userspace process which created the receiving XRC qp on a given host dies before other processes which still need to receive XRC messages on their SRQs which are paired with the now-destroyed receiving XRC QP.) Solution: Add a userspace verb (as part of the XRC suite) which enables the user process to create an XRC QP owned by the kernel -- which belongs to the required XRC domain. This QP will be destroyed when the XRC domain is closed (i.e., as part of a ibv_close_xrc_domain call, but only when the domain's reference count goes to zero). Below, I give the new userspace API for this function. Any feedback will be appreciated. This API will be implemented in the upcoming OFED 1.3 release, so we need feedback ASAP. Notes: 1. There is no query or destroy verb for this QP. There is also no userspace object for the QP. Userspace has ONLY the raw qp number to use when creating the (X)RC connection. 2. Since the QP is owned by kernel space, async events for this QP are also handled in kernel space (i.e., reported in /var/log/messages). There are no completion events for the QP, since it does not send, and all receives completions are reported in the XRC SRQ's cq. If this QP enters the error state, the remote QP which sends will start receiving RETRY_EXCEEDED errors, so the application will be aware of the failure. - Jack == /** * ibv_alloc_xrc_rcv_qp - creates an XRC QP for serving as a receive-side only QP, * and moves the created qp through the RESET-INIT and INIT-RTR transitions. * (The RTR-RTS transition is not needed, since this QP does no sending). * The sending XRC QP uses this QP as destination, while specifying an XRC SRQ * for actually receiving the transmissions and generating all completions on the * receiving side. 
 *
 * This QP is created in kernel space, and persists until the XRC domain is
 * closed (i.e., its reference count goes to zero).
 *
 * @pd: protection domain to use. At the lower layer, this provides access to
 *      the userspace object.
 * @xrc_domain: xrc domain to use for the QP.
 * @attr: modify-qp attributes needed to bring the QP to RTR.
 * @attr_mask: bitmap indicating which attributes are provided in the attr
 *             struct; used for validity checking.
 * @xrc_rcv_qpn: qp_num of created QP (if success). To be passed to the remote
 *               node. The remote node will use xrc_rcv_qpn in ibv_post_send
 *               when sending to XRC SRQ's on this host in the same xrc domain.
 *
 * RETURNS: success (0), or a (negative) error value.
 */
int ibv_alloc_xrc_rcv_qp(struct ibv_pd *pd,
                         struct ibv_xrc_domain *xrc_domain,
                         struct ibv_qp_attr *attr,
                         enum ibv_qp_attr_mask attr_mask,
                         uint32_t *xrc_rcv_qpn);

Notes:
1. Although the kernel creates the qp in the kernel's own PD, we still need the PD parameter to determine the device.
2. I chose to use struct ibv_qp_attr, which is used in modify QP, rather than create a new structure for this purpose. This also guards against API changes in the event that during development I notice that more modify-qp parameters must be specified for this operation to work.
3. Table of the ibv_qp_attr parameters showing what values to set:

struct ibv_qp_attr {
    enum ibv_qp_state  qp_state;        Not needed
    enum ibv_qp_state  cur_qp_state;    Not needed -- Driver starts from RESET and takes qp to RTR.
    enum ibv_mtu       path_mtu;        Yes
    enum ibv_mig_state path_mig_state;  Yes
    uint32_t           qkey;            Yes
    uint32_t           rq_psn;          Yes
    uint32_t           sq_psn;          Not needed
    uint32_t           dest_qp_num;     Yes -- this is the remote side QP for the RC conn.
    int                qp_access_flags; Yes
    struct ibv_qp_cap  cap;             Need only XRC domain. Other caps will use hard-coded
                                        values: max_send_wr = 1; max_recv_wr = 0;
                                        max_send_sge = 1; max_recv_sge = 0; max_inline_data = 0;
    struct ibv_ah_attr ah_attr;         Yes
    struct ibv_ah_attr alt_ah_attr;     Optional
[ofa-general] Re: [PATCH] opensm: osm_state_mgr.c - stop idle queue processing if heavy sweep requested
On 09:40 Wed 19 Dec, Yevgeny Kliteynik wrote:
Sasha Khapyorsky wrote: Hi Yevgeny,
On 15:33 Mon 17 Dec, Yevgeny Kliteynik wrote:

If a heavy sweep is requested during idle queue processing, OSM continues to process the queue till the end and only then notices the heavy sweep request. In some cases this might leave a topology change unhandled for several minutes.

Could you provide more details about such cases? As far as I know the idle queue is used only for multicast re-routing. If so, it is interesting by itself why it takes minutes, and where. Is there an MCG join/leave storm?

Exactly. The problem was discovered on a big cluster with hundreds of mcast groups, when there is some massive change in the subnet (like rebooting hundreds of nodes).

Ok, then the proposed patch looks like half a solution to me. During an mcast join/leave storm the idle queue will be filled with requests to rebuild mcast routing. OpenSM will process them one by one (and this will take a lot of time) instead of processing all pending mcast groups in one run. I think that is the first improvement needed here. Even with such an improvement we will not be able to control the order of heavy sweep/mcast join requests, so basically the idea of breaking idle queue processing looks fine to me, but it is not all that should be done here. A heavy sweep by itself recalculates mcast routing for all existing groups; it should invalidate all pending mcast rerouting requests instead of continuing idle queue processing after the heavy sweep. Makes sense?

Sasha

-- Yevgeny

Or does a single re-routing cycle take minutes?
Sasha

Signed-off-by: Yevgeny Kliteynik [EMAIL PROTECTED]
---
 opensm/opensm/osm_state_mgr.c | 31 ++++++++++++++++++++++-------
 1 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index 5c39f11..6ee5ee6 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1607,13 +1607,30 @@ void osm_state_mgr_process(IN osm_state_mgr_t * const p_mgr,
 		/* CALL the done function */
 		__process_idle_time_queue_done(p_mgr);

-		/*
-		 * Set the signal to OSM_SIGNAL_IDLE_TIME_PROCESS
-		 * so that the next element in the queue gets processed
-		 */
-
-		signal = OSM_SIGNAL_IDLE_TIME_PROCESS;
-		p_mgr->state = OSM_SM_STATE_PROCESS_REQUEST;
+		if (p_mgr->p_subn->force_immediate_heavy_sweep) {
+			/*
+			 * Do not read next item from the idle queue.
+			 * Immediate heavy sweep is requested, so it's
+			 * more important.
+			 * Besides, there is a chance that after the
+			 * heavy sweep completion, idle queue processing
+			 * that SM would have performed here will be obsolete.
+			 */
+			if (osm_log_is_active(p_mgr->p_log, OSM_LOG_DEBUG))
+				osm_log(p_mgr->p_log, OSM_LOG_DEBUG,
+					"osm_state_mgr_process: "
+					"interrupting idle time queue processing - "
+					"heavy sweep requested\n");
+			signal = OSM_SIGNAL_NONE;
+			p_mgr->state = OSM_SM_STATE_IDLE;
+		}
+		else {
+			/*
+			 * Set the signal to OSM_SIGNAL_IDLE_TIME_PROCESS
+			 * so that the next element in the queue gets processed
+			 */
+			signal = OSM_SIGNAL_IDLE_TIME_PROCESS;
+			p_mgr->state = OSM_SM_STATE_PROCESS_REQUEST;
+		}
 		break;
 	default:
--
1.5.1.4
Re: [ofa-general] peer to peer connections support
Sean Hefty wrote: ... I didn't follow this. ... Peer to peer SIDs are in a different domain than client/server SIDs, and the peer_to_peer field is used to indicate which domain a SID is in. Sorry if I wasn't clear, let me see if I understand you: with this different domain implementation, under client/server the passive side calls cm listen and the active side calls cm connect, whereas under peer-to-peer both sides call cm listen and later both sides may call cm connect, or only one side, correct? To add to my comments on the CM API, struct ib_cm_req_param, which is used to send the REQ, includes service_id and peer_to_peer fields. The latter is a boolean used by the CM to distinguish if incoming REQs can be matched with the outgoing REQ. OK, this makes things clearer. Why should there be a difference between the rdma-cm and the cm? If in the cm you have a model without an API change, wouldn't it apply also to the rdma-cm? The rdma_cm does not know how to set the peer_to_peer field in the ib_cm_req_param. It sets this field to 0 today. But it could set it to one as well... assuming my understanding above of the suggested implementation is correct, we can change the RDMA-CM API to let users specify on rdma_connect that they want peer to peer support, so such apps can issue an rdma_listen call and later call rdma_connect with this bit set and they are done (or almost done... I guess there is some more devil in the details here, isn't there?) I think that in the MPI world each rank gets a SID from the local CM and they exchange the SIDs out-of-band, then connections are opened. If it's a connection-on-demand scheme, then whenever the rank process calls mpi_send() to a peer for which the local MPI library does not have a connection, it tries to connect. So if this happens at once between some pair of ranks, there should be a way to form one connection out of these two connecting requests. 
My thinking/motivation is that support of this scheme should be at the IB stack (cm and rdma-cm) level and not at the specific MPI implementation level. Are the out of band connections used by MPI formed using client/server or peer to peer? I believe that Intel MPI has each rank listen for connections from the ranks below it using client/server. Yes, MPIs that do all-to-all-connect on job start typically use client/server, where all the ranks > 0 issue a listen call and then all lower ranks connect to higher ranks, or some other symmetry-breaking scheme. I am trying to see what needs to be supported by the IB stack to let MPIs that do connect on demand use the RDMA-CM. There are a couple of problems with the peer to peer model. First, unless the connections occur at exactly the same time, they miss connecting (rejected with invalid SID). This makes the all-peer-to-peer model useless, since an app cannot make sure that connections occur at exactly the same time! My understanding of the spec is that the peer to peer model has the ability to handle also connections that occur at exactly the same time, but not only those. Second, if multiple peer to peer connections need to form between the same pair of nodes, things can go screwy (that's the technical term) trying to match up the peer requests. Under MPI each rank uses a different SID, so I think we are safe from this problem. Or ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
RE: [ofa-general] peer to peer connections support
So in a nutshell the proposal is to add some identifier into the CM private data which indicates that it is the peer-to-peer model, plus unique peer IDs for the requested connection. Is this the model? Thanks, Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -----Original Message----- From: Or Gerlitz [mailto:[EMAIL PROTECTED] Sent: Thursday, December 20, 2007 10:09 AM To: Sean Hefty Cc: OpenFabrics General Subject: Re: [ofa-general] peer to peer connections support ...
Re: [ofa-general] smpquery regression in 1.3-rc1
Hal Rosenstock wrote: On Thu, 2007-12-20 at 13:42 +0200, Yevgeny Kliteynik wrote: Hal Rosenstock wrote: On Wed, 2007-12-19 at 11:58 -0800, [EMAIL PROTECTED] wrote: We're seeing a regression in smpquery from alpha2 to rc1. For example, with alpha2 I get: grommit:~ # smpquery -G nodeinfo 0x66a01a000737c # Node info: Lid 3 BaseVers:1 ClassVers:...1 NodeType:Channel Adapter NumPorts:2 SystemGuid:..0x00066a009800737c Guid:0x00066a009800737c PortGuid:0x00066a01a000737c PartCap:.64 DevId:...0x6278 Revision:0x00a0 LocalPort:...2 VendorId:0x00066a grommit:~ # And with rc1, I get: grommit:~ # smpquery -G nodeinfo 0x66a01a000737c ibwarn: [5650] ib_path_query: sa call path_query failed smpquery: iberror: failed: can't resolve destination port 0x66a01a000737c grommit:~ # But using a LID works fine: grommit:~ # smpquery nodeinfo 3 # Node info: Lid 3 BaseVers:1 ClassVers:...1 NodeType:Channel Adapter NumPorts:2 SystemGuid:..0x00066a009800737c Guid:0x00066a009800737c PortGuid:0x00066a01a000737c PartCap:.64 DevId:...0x6278 Revision:0x00a0 LocalPort:...2 VendorId:0x00066a grommit:~ # Strangest of all, running it under strace also works: grommit:~ # strace smpquery -G nodeinfo 0x66a01a000737c /tmp/smpquery.out . grommit:~ # cat /tmp/smpquery.out # Node info: Lid 3 BaseVers:1 ClassVers:...1 NodeType:Channel Adapter NumPorts:2 SystemGuid:..0x00066a009800737c Guid:0x00066a009800737c PortGuid:0x00066a01a000737c PartCap:.64 DevId:...0x6278 Revision:0x00a0 LocalPort:...2 VendorId:0x00066a grommit:~ # Some weird race condition... Anyone else seeing the same? -G requires a SA path record lookup so this could be an issue with that timing out in some cases (assuming the port is active and the SM is operational). I'm seeing the same problem. Sometimes the query works, and sometimes it doesn't. I also see that when the query fails, OpenSM doesn't get PathRecord query at all. Hal, can you elaborate on that timing out in some cases issue? 
I just meant that the SM not responding (for an unknown reason right now) would yield this effect. Adding Jack for the libibmad issue: I see that the ib_path_query() in libibmad/sa.c sometimes fails when calling safe_sa_call(). This could just be more detail on the same thing in terms of the (smpquery) client which is layered on top of libibmad: the SA path query timeout. I would suggest running OpenSM in verbose mode (both instances are with OpenSM) and seeing if it responds to the PathRecord query used by this form of smpquery, and continue troubleshooting from there based on the result. This is actually what I was saying here. I have *debugged* smpquery, and saw that the failing function is ib_path_query() in libibmad/sa.c. As I've mentioned, I did run it with OpenSM in verbose mode, and saw that when smpquery fails, the OpenSM log does not have any PathRecord request. When smpquery passes, I see the PathRecord request and response in the OpenSM log. -- Yevgeny -- Hal -- Yevgeny -- Hal
RE: [ofa-general] [RFC] XRC -- make receiving XRC QP independent of any one user process
Jack: Thanks for adding this new function, this is what we need. There is one issue I want to make clear: this new kernel-owned QP will be destroyed when the XRC domain is closed (i.e., as part of an ibv_close_xrc_domain call, but only when the domain's reference count goes to zero). If I have MPI server processes on a node, many other MPI client processes will dynamically connect/disconnect with the server. The server uses the same XRC domain. Will this cause the kernel QPs to accumulate for such an application? We want the server to run 365 days a year. Thanks. --CQ -----Original Message----- From: Pavel Shamis (Pasha) [mailto:[EMAIL PROTECTED] Sent: Thursday, December 20, 2007 9:15 AM To: Jack Morgenstein Cc: Tang, Changqing; Roland Dreier; general@lists.openfabrics.org; Open MPI Developers; [EMAIL PROTECTED] Subject: Re: [ofa-general] [RFC] XRC -- make receiving XRC QP independent of any one user process Adding Open MPI and MVAPICH community to the thread. Pasha (Pavel Shamis) Jack Morgenstein wrote: background: see the XRC Cleanup order issue thread at http://lists.openfabrics.org/pipermail/general/2007-December/043935.html (userspace process which created the receiving XRC qp on a given host dies before other processes which still need to receive XRC messages on their SRQs which are paired with the now-destroyed receiving XRC QP.) Solution: Add a userspace verb (as part of the XRC suite) which enables the user process to create an XRC QP owned by the kernel -- which belongs to the required XRC domain. This QP will be destroyed when the XRC domain is closed (i.e., as part of an ibv_close_xrc_domain call, but only when the domain's reference count goes to zero). Below, I give the new userspace API for this function. Any feedback will be appreciated. This API will be implemented in the upcoming OFED 1.3 release, so we need feedback ASAP. Notes: 1. There is no query or destroy verb for this QP. There is also no userspace object for the QP. 
Userspace has ONLY the raw qp number to use when creating the (X)RC connection. 2. Since the QP is owned by kernel space, async events for this QP are also handled in kernel space (i.e., reported in /var/log/messages). There are no completion events for the QP, since it does not send, and all receive completions are reported in the XRC SRQ's cq. If this QP enters the error state, the remote QP which sends will start receiving RETRY_EXCEEDED errors, so the application will be aware of the failure. - Jack ==

/**
 * ibv_alloc_xrc_rcv_qp - creates an XRC QP for serving as a receive-side-only QP,
 *	and moves the created qp through the RESET->INIT and INIT->RTR transitions
 *	(the RTR->RTS transition is not needed, since this QP does no sending).
 *	The sending XRC QP uses this QP as destination, while specifying an XRC SRQ
 *	for actually receiving the transmissions and generating all completions on
 *	the receiving side.
 *	This QP is created in kernel space, and persists until the XRC domain is
 *	closed (i.e., its reference count goes to zero).
 *
 * @pd: protection domain to use. At lower layer, this provides access to userspace obj
 * @xrc_domain: xrc domain to use for the QP.
 * @attr: modify-qp attributes needed to bring the QP to RTR.
 * @attr_mask: bitmap indicating which attributes are provided in the attr struct.
 *	used for validity checking.
 * @xrc_rcv_qpn: qp_num of created QP (if success). To be passed to the remote node.
 *	The remote node will use xrc_rcv_qpn in ibv_post_send when sending to
 *	XRC SRQ's on this host in the same xrc domain.
 *
 * RETURNS: success (0), or a (negative) error value.
 */
int ibv_alloc_xrc_rcv_qp(struct ibv_pd *pd,
			 struct ibv_xrc_domain *xrc_domain,
			 struct ibv_qp_attr *attr,
			 enum ibv_qp_attr_mask attr_mask,
			 uint32_t *xrc_rcv_qpn);

Notes: 1. Although the kernel creates the qp in the kernel's own PD, we still need the PD parameter to determine the device. 2. 
I chose to use struct ibv_qp_attr, which is used in modify QP, rather than create a new structure for this purpose. This also guards against API changes in the event that during development I notice that more modify-qp parameters must be specified for this operation to work. 3. Table of the ibv_qp_attr parameters showing what values to set:

struct ibv_qp_attr {
	enum ibv_qp_state	qp_state;	/* Not needed */
	enum ibv_qp_state	cur_qp_state;	/* Not needed -- Driver starts from RESET and takes qp to RTR */
	enum ibv_mtu
[ofa-general] Re: [PATCH] opensm: osm_state_mgr.c - stop idle queue processing if heavy sweep requested
Sasha Khapyorsky wrote: On 09:40 Wed 19 Dec , Yevgeny Kliteynik wrote: Sasha Khapyorsky wrote: Hi Yevgeny, On 15:33 Mon 17 Dec , Yevgeny Kliteynik wrote: If a heavy sweep is requested during idle queue processing, OSM continues to process it till the end and only then notices the heavy sweep request. In some cases this might leave a topology change unhandled for several minutes. Could you provide more details about such cases? As far as I know the idle queue is used only for multicast re-routing. If so, it is interesting by itself why it takes minutes, and where. Is there a MCG join/leave storm? Exactly. The problem was discovered on a big cluster with hundreds of mcast groups, when there is some massive change in the subnet (like rebooting hundreds of nodes). Ok, then the proposed patch looks like half a solution to me. During a mcast join/leave storm the idle queue will be filled with requests to rebuild mcast routing. OpenSM will process them one by one (and this will take a lot of time) instead of processing all pending mcast groups in one run. I think this is the first improvement needed here. Even with such an improvement we will not be able to control the order of heavy sweep/mcast join requests, so basically the idea of breaking idle queue processing looks fine to me, but it is not all that should be done here. A heavy sweep by itself recalculates mcast routing for all existing groups; it should invalidate all pending mcast rerouting requests instead of continuing idle queue processing after the heavy sweep. Make sense? OK, makes sense. So bottom line, when breaking the idle queue processing because of an immediate sweep request, the state manager should just purge the whole idle queue and then start the new heavy sweep. I'll work on it. -- Yevgeny Sasha -- Yevgeny Or does a single re-routing cycle take minutes? 
Sasha Signed-off-by: Yevgeny Kliteynik [EMAIL PROTECTED] --- opensm/opensm/osm_state_mgr.c | 31 +++++++++++++++++++++------- 1 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index 5c39f11..6ee5ee6 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1607,13 +1607,30 @@ void osm_state_mgr_process(IN osm_state_mgr_t * const p_mgr,
 		/* CALL the done function */
 		__process_idle_time_queue_done(p_mgr);
-		/*
-		 * Set the signal to OSM_SIGNAL_IDLE_TIME_PROCESS
-		 * so that the next element in the queue gets processed
-		 */
-
-		signal = OSM_SIGNAL_IDLE_TIME_PROCESS;
-		p_mgr->state = OSM_SM_STATE_PROCESS_REQUEST;
+		if (p_mgr->p_subn->force_immediate_heavy_sweep) {
+			/*
+			 * Do not read next item from the idle queue.
+			 * Immediate heavy sweep is requested, so it's
+			 * more important.
+			 * Besides, there is a chance that after the
+			 * heavy sweep completion, idle queue processing
+			 * that SM would have performed here will be obsolete.
+			 */
+			if (osm_log_is_active(p_mgr->p_log, OSM_LOG_DEBUG))
+				osm_log(p_mgr->p_log, OSM_LOG_DEBUG,
+					"osm_state_mgr_process: "
+					"interrupting idle time queue processing - heavy sweep requested\n");
+			signal = OSM_SIGNAL_NONE;
+			p_mgr->state = OSM_SM_STATE_IDLE;
+		}
+		else {
+			/*
+			 * Set the signal to OSM_SIGNAL_IDLE_TIME_PROCESS
+			 * so that the next element in the queue gets processed
+			 */
+			signal = OSM_SIGNAL_IDLE_TIME_PROCESS;
+			p_mgr->state = OSM_SM_STATE_PROCESS_REQUEST;
+		}
 		break;
 	default:
--
1.5.1.4
[ofa-general] Re: [PATCH] opensm: osm_state_mgr.c - stop idle queue processing if heavy sweep requested
On 18:41 Thu 20 Dec , Yevgeny Kliteynik wrote: ... OK, makes sense. So bottom line, when breaking the idle queue processing because of an immediate sweep request, the state manager should just purge the whole idle queue and then start the new heavy sweep. 
Yes, it is one patch; another expected patch, for improving mcast join request/node reboot storm handling by OpenSM, is recalculating mcast routing for more than one mcast group (actually I think requested mcast groups should be queued in a list and mcast re-routing requests merged, plus some trivial processor function in osm_mcast_mgr.c). Maybe the whole idle queue mechanism can be killed as useless; then this will impact the heavy sweep related patch. Sasha
Re: [ofa-general] smpquery regression in 1.3-rc1
On 08:49 Thu 20 Dec , Hal Rosenstock wrote: ... This is actually what I was saying here. I have *debugged* smpquery, and saw that the failing function is ib_path_query() in libibmad/sa.c. As I've mentioned, I did run it with OpenSM in verbose mode, and saw that when smpquery fails, the OpenSM log does not have any PathRecord request. When smpquery passes, I see the PathRecord request and response in the OpenSM log. OK; that wasn't clear before but is now (that the failure appears to be a client and not an SM issue) :-) FWIW, I don't know what has changed that would affect this so it could be a latent bug as opposed to a regression. Right, there were no changes in this area in this period, likely the issue was just triggered. I'm not sure, but probably I saw something like this in the past, and then thought it was a cabling issue. Yevgeny, Arthur, could you rerun smpquery with - (for lot of debug stuff)? 
Sasha
Re: [ofa-general] smpquery regression in 1.3-rc1
On Thu, Dec 20, 2007 at 05:13:18PM +, Sasha Khapyorsky wrote: ... Yevgeny, Arthur, could you rerun smpquery with - (for lot of debug stuff)? Well, just about any perturbation changes the behavior - run it under strace, or gdb, link the IB libraries statically, or look at the machine funny and it works fine. But using the debug flags reveals an apparent problem with the debug code itself:

# ./smpquery_1.3_rc1 -d -G nodeinfo 0x00066a01a000737c
ibwarn: [19328] smp_query: attr 0x15 mod 0x0 route DR path 0
ibwarn: [19328] mad_rpc: data offs 64 sz 64 mad data fe80 0002 0002 0251 0a6a 0103 0302 3452 0023 4040 0008 0804 ff40 005e 2012 1088
Segmentation fault

and gdb shows:

(gdb) bt
#0  0x2b0b9222ed0f in _IO_default_xsputn_internal () from /lib64/libc.so.6
#1  0x2b0b92207177 in vfprintf () from /lib64/libc.so.6
#2  0x2b0b9229577d in __vsprintf_chk () from /lib64/libc.so.6
#3  0x2b0b922956c0 in __sprintf_chk () from /lib64/libc.so.6
#4  0x2b0b91c71166 in portid2str (portid=0x7fff1905bc00) at src/portid.c:91
#5  0x2b0b91c72529 in sa_rpc_call (ibmad_port=0x7fff1905b680, rcvbuf=0x7fff1905bb30, portid=0x7fff1905bc00, sa=0x7fff1905bac0, timeout=0) at src/sa.c:58
#6  0x2b0b91c71791 in sa_call (rcvbuf=0x7fff1905bb30, portid=0x7fff1905bc00, sa=0x7fff1905bac0, timeout=0) at src/rpc.c:395
#7  0x2b0b91c723bf in ib_path_query (srcgid=0x7fff1905be30 "\200", destgid=0x7fff1905be30 "\200", sm_id=0x7fff1905bc00, buf=0x7fff1905bb30) at ./include/infiniband/mad.h:790
#8  0x2b0b91c7144f in ib_resolve_guid (portid=0x7fff1905bde0, guid=0x7fff1905bd20, sm_id=0x7fff1905bc00, timeout=<value optimized out>) at src/resolve.c:83
#9  0x2b0b91c71610 in ib_resolve_portid_str (portid=0x7fff1905bde0, addr_str=0x7fff1905d341 "0x00066a01a000737c", dest_type=2, sm_id=0x0) at src/resolve.c:115
#10 0x00401cd1 in main (argc=2, argv=0x7fff1905bfd0) at smpquery_1.3_rc1.c:522

-- Arthur
[ofa-general] iommu dma mapping alignment requirements
Hey Roland (and any iommu/ppc/dma experts out there): I'm debugging a data corruption issue that happens on PPC64 systems running rdma on kernels where the iommu page size is 4KB yet the host page size is 64KB. This feature was added to the PPC64 code recently, and is in kernel.org from 2.6.23. So if the kernel is built with a 4KB page size, no problems. If the kernel is prior to 2.6.23 then 64KB page configs work too. Its just a problem when the iommu page size != host page size. It appears that my problem boils down to a single host page of memory that is mapped for dma, and the dma address returned by dma_map_sg() is _not_ 64KB aligned. Here is an example: app registers va 0x2d9a3000 len 12288 ib_umem_get() creates and maps a umem and chunk that looks like (dumping state from a registered user memory region): umem len 12288 off 12288 pgsz 65536 shift 16 chunk 0: nmap 1 nents 1 sglist[0] page 0xc0930b08 off 0 len 65536 dma_addr 5bff4000 dma_len 65536 So the kernel maps 1 full page for this MR. But note that the dma address is 5bff4000 which is 4KB aligned, not 64KB aligned. I think this is causing grief to the RDMA HW. My first question is: Is there an assumption or requirement in linux that dma_addressess should have the same alignment as the host address they are mapped to? IE the rdma core is mapping the entire 64KB page, but the mapping doesn't begin on a 64KB page boundary. If this mapping is considered valid, then perhaps the rdma hw is at fault here. But I'm wondering if this is an PPC/iommu bug. BTW: Here is what the Memory Region looks like to the HW: TPT entry: stag idx 0x2e800 key 0xff state VAL type NSMR pdid 0x2 perms RW rem_inv_dis 0 addr_type VATO bind_enable 1 pg_size 65536 qpid 0x0 pbl_addr 0x003c67c0 len 12288 va 2d9a3000 bind_cnt 0 PBL: 5bff4000 Any thoughts? Steve. 
Re: [ofa-general] iommu dma mapping alignment requirements
On Thu, 2007-12-20 at 11:14 -0600, Steve Wise wrote: ... My first question is: Is there an assumption or requirement in linux that dma_addresses should have the same alignment as the host address they are mapped to? ... The Ammasso certainly works this way. 
If you tell it the page size is 64KB, it will ignore bits in the page address that encode 0-65535. 
Re: [ofa-general] Re: [PATCH 3/3] ib/cm: add basic performance counters
Roland Dreier wrote: by the way, I had to make cm_class not static, or else a build with ib_cm and ib_ucm built into the kernel failed... I think that exported symbols can't be static. thanks for fixing this 
Re: [ofa-general] peer to peer connections support
Sorry if I wasn't clear, let me see if I understand you: with this different domain implementation, under client/server the passive side calls cm listen and the active side calls cm connect, whereas under peer-to-peer both sides call cm listen and later both sides may call cm connect, or only one side, correct? My thinking was that the peer to peer model would have both sides call connect only. The peer to peer connection model only kicks in when both sides are in the REQ sent state. But it could set it to one as well... assuming my understanding above of the suggested implementation is correct, we can change the RDMA-CM API to let users specify on rdma_connect that they want peer to peer support, so such apps can issue an rdma_listen call and later call rdma_connect with this bit set and they are done (or almost done... I guess there is some more devil in the details here, isn't there?) This was why I said that the IB CM API was fine, but the RDMA CM API would require changes. This makes the all-peer-to-peer model useless, since an app cannot make sure that connections occur at exactly the same time! yep - (anyone can feel free to step in and set me straight on this...) my understanding of the spec is that the peer to peer model has the ability to handle also connections that occur at exactly the same time, but not only those. Peer to peer seems inherently racy to me. Under MPI each rank uses a different SID, so I think we are safe from this problem. Any peer to peer implementation should handle this case, however. - Sean 
[ofa-general] Re: [RFC] XRC -- make receiving XRC QP independent of any one user process
This API will be implemented in the upcoming OFED 1.3 release, so we need feedback ASAP. I hope we can learn some lessons about the development process... clearly, changing APIs after -rc1 is not something that leads to good quality in general. int ibv_alloc_xrc_rcv_qp(struct ibv_pd *pd, struct ibv_xrc_domain *xrc_domain, struct ibv_qp_attr *attr, enum ibv_qp_attr_mask attr_mask, uint32_t *xrc_rcv_qpn); I can't say this interface is very appealing. Another option would be to create an XRC verb that detaches a userspace QP and gives it the same lifetime as an XRC domain, but that doesn't seem any nicer. And I guess we can't combine creating the QP with allocating the XRC domain, because the consumer might want to open the XRC domain before it has connected with the remote side. Oh well, I guess this XRC stuff just ends up being ugly. - R.
Re: [ofa-general] iommu dma mapping alignment requirements
Roland Dreier wrote: It appears that my problem boils down to a single host page of memory that is mapped for dma, and the dma address returned by dma_map_sg() is _not_ 64KB aligned. Here is an example: My first question is: Is there an assumption or requirement in linux that dma addresses should have the same alignment as the host address they are mapped to? I.e., the rdma core is mapping the entire 64KB page, but the mapping doesn't begin on a 64KB page boundary. I don't think this is explicitly documented anywhere, but it certainly seems that we want the bus address to be page-aligned in this case. For mthca/mlx4 at least, we tell the adapter what the host page size is (so that it knows how to align doorbell pages etc.) and I think this sort of thing would confuse the HW. - R. In arch/powerpc/kernel/iommu.c:iommu_map_sg() I see that it calls iommu_range_alloc() with an alignment_order of 0: vaddr = (unsigned long)page_address(s->page) + s->offset; npages = iommu_num_pages(vaddr, slen); entry = iommu_range_alloc(tbl, npages, &handle, mask >> IOMMU_PAGE_SHIFT, 0); But perhaps the alignment order needs to be based on the host page size? Steve.
Re: [ofa-general] iommu dma mapping alignment requirements
Steve Wise wrote: Roland Dreier wrote: It appears that my problem boils down to a single host page of memory that is mapped for dma, and the dma address returned by dma_map_sg() is _not_ 64KB aligned. Here is an example: My first question is: Is there an assumption or requirement in linux that dma addresses should have the same alignment as the host address they are mapped to? I.e., the rdma core is mapping the entire 64KB page, but the mapping doesn't begin on a 64KB page boundary. I don't think this is explicitly documented anywhere, but it certainly seems that we want the bus address to be page-aligned in this case. For mthca/mlx4 at least, we tell the adapter what the host page size is (so that it knows how to align doorbell pages etc.) and I think this sort of thing would confuse the HW. - R. In arch/powerpc/kernel/iommu.c:iommu_map_sg() I see that it calls iommu_range_alloc() with an alignment_order of 0: vaddr = (unsigned long)page_address(s->page) + s->offset; npages = iommu_num_pages(vaddr, slen); entry = iommu_range_alloc(tbl, npages, &handle, mask >> IOMMU_PAGE_SHIFT, 0); But perhaps the alignment order needs to be based on the host page size? Or based on the alignment of vaddr, actually...
[ofa-general] Re: iommu dma mapping alignment requirements
Adding a few more people to the discussion. You may well be right and we would have to provide the same alignment, though that sucks a bit, as one of the reasons we switched to 4K for the IOMMU is that the iommu space available on pSeries is very small and we were running out of it with 64K pages and lots of networking activity. On Thu, 2007-12-20 at 11:14 -0600, Steve Wise wrote: Hey Roland (and any iommu/ppc/dma experts out there): I'm debugging a data corruption issue that happens on PPC64 systems running rdma on kernels where the iommu page size is 4KB yet the host page size is 64KB. This feature was added to the PPC64 code recently, and is in kernel.org from 2.6.23. So if the kernel is built with a 4KB page size, there are no problems. If the kernel is prior to 2.6.23, then 64KB page configs work too. It's only a problem when the iommu page size != host page size. It appears that my problem boils down to a single host page of memory that is mapped for dma, and the dma address returned by dma_map_sg() is _not_ 64KB aligned. Here is an example: app registers va 0x2d9a3000 len 12288 ib_umem_get() creates and maps a umem and chunk that looks like (dumping state from a registered user memory region): umem len 12288 off 12288 pgsz 65536 shift 16 chunk 0: nmap 1 nents 1 sglist[0] page 0xc0930b08 off 0 len 65536 dma_addr 5bff4000 dma_len 65536 So the kernel maps 1 full page for this MR. But note that the dma address is 5bff4000, which is 4KB aligned, not 64KB aligned. I think this is causing grief to the RDMA HW. My first question is: Is there an assumption or requirement in linux that dma addresses should have the same alignment as the host address they are mapped to? I.e., the rdma core is mapping the entire 64KB page, but the mapping doesn't begin on a 64KB page boundary. If this mapping is considered valid, then perhaps the rdma hw is at fault here. But I'm wondering if this is a PPC/iommu bug. 
BTW: Here is what the Memory Region looks like to the HW: TPT entry: stag idx 0x2e800 key 0xff state VAL type NSMR pdid 0x2 perms RW rem_inv_dis 0 addr_type VATO bind_enable 1 pg_size 65536 qpid 0x0 pbl_addr 0x003c67c0 len 12288 va 2d9a3000 bind_cnt 0 PBL: 5bff4000 Any thoughts? Steve.
Re: [ofa-general] iommu dma mapping alignment requirements
On Thu, 2007-12-20 at 13:29 -0600, Steve Wise wrote: Or based on the alignment of vaddr actually... The latter wouldn't be realistic. What I think might be necessary, though it would definitely cause us problems with running out of iommu space (which is the reason we did the switch down to 4K), is to provide alignment to the real page size, and alignment to the allocation order for dma_map_consistent. It might be possible to -tweak- and only provide alignment to the page size for allocations that are larger than IOMMU_PAGE_SIZE. That would solve the problem with small network packets eating up too much iommu space, though. What do you think? Ben.
[ofa-general] [PATCH 1/10] nes: accelerated loopback fix
Accelerated loopback code did not properly handle private data. Add loopback connection counter to ethtool stats. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] --- diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index 638bc51..79889a4 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -67,6 +67,7 @@ u32 cm_packets_received; u32 cm_listens_created; u32 cm_listens_destroyed; u32 cm_backlog_drops; +atomic_t cm_loopbacks; atomic_t cm_nodes_created; atomic_t cm_nodes_destroyed; atomic_t cm_accel_dropped_pkts; @@ -1638,6 +1639,7 @@ struct nes_cm_node * mini_cm_connect(struct nes_cm_core *cm_core, if (loopbackremotelistener == NULL) { create_event(cm_node, NES_CM_EVENT_ABORTED); } else { + atomic_inc(cm_loopbacks); loopback_cm_info = *cm_info; loopback_cm_info.loc_port = cm_info-rem_port; loopback_cm_info.rem_port = cm_info-loc_port; @@ -2445,7 +2447,13 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) cm_event.private_data = NULL; cm_event.private_data_len = 0; ret = cm_id-event_handler(cm_id, cm_event); - nes_debug(NES_DBG_CM, OFA CM event_handler returned, ret=%d\n, ret); + if (cm_node-loopbackpartner) { + cm_node-loopbackpartner-mpa_frame_size = nesqp-private_data_len; + /* copy entire MPA frame to our cm_node's frame */ + memcpy(cm_node-loopbackpartner-mpa_frame_buf, nesqp-ietf_frame-priv_data, + nesqp-private_data_len); + create_event(cm_node-loopbackpartner, NES_CM_EVENT_CONNECTED); + } if (ret) printk(%s[%u] OFA CM event_handler returned, ret=%d\n, __FUNCTION__, __LINE__, ret); diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index e01aab4..810a9ae 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -114,6 +114,7 @@ extern u32 cm_packets_retrans; extern u32 cm_listens_created; extern u32 cm_listens_destroyed; extern u32 cm_backlog_drops; +extern atomic_t cm_loopbacks; extern 
atomic_t cm_nodes_created; extern atomic_t cm_nodes_destroyed; extern atomic_t cm_accel_dropped_pkts; @@ -967,7 +968,7 @@ void nes_netdev_exit(struct nes_vnic *nesvnic) } -#define NES_ETHTOOL_STAT_COUNT 54 +#define NES_ETHTOOL_STAT_COUNT 55 static const char nes_ethtool_stringset[NES_ETHTOOL_STAT_COUNT][ETH_GSTRING_LEN] = { Link Change Interrupts, Linearized SKBs, @@ -1011,6 +1012,7 @@ static const char nes_ethtool_stringset[NES_ETHTOOL_STAT_COUNT][ETH_GSTRING_LEN] CM Listens Created, CM Listens Destroyed, CM Backlog Drops, + CM Loopbacks, CM Nodes Created, CM Nodes Destroyed, CM Accel Drops, @@ -1206,11 +1208,11 @@ static void nes_netdev_get_ethtool_stats(struct net_device *netdev, target_stat_values[39] = cm_listens_created; target_stat_values[40] = cm_listens_destroyed; target_stat_values[41] = cm_backlog_drops; - target_stat_values[42] = atomic_read(cm_nodes_created); - target_stat_values[43] = atomic_read(cm_nodes_destroyed); - target_stat_values[44] = atomic_read(cm_accel_dropped_pkts); - target_stat_values[45] = atomic_read(cm_resets_recvd); - target_stat_values[46] = int_mod_timer_init; + target_stat_values[42] = atomic_read(cm_loopbacks); + target_stat_values[43] = atomic_read(cm_nodes_created); + target_stat_values[44] = atomic_read(cm_nodes_destroyed); + target_stat_values[45] = atomic_read(cm_accel_dropped_pkts); + target_stat_values[46] = atomic_read(cm_resets_recvd); target_stat_values[47] = int_mod_cq_depth_1; target_stat_values[48] = int_mod_cq_depth_4; target_stat_values[49] = int_mod_cq_depth_16; diff --git a/drivers/infiniband/hw/nes/nes_utils.c b/drivers/infiniband/hw/nes/nes_utils.c index b6aa6d3..8d2c1ee 100644 --- a/drivers/infiniband/hw/nes/nes_utils.c +++ b/drivers/infiniband/hw/nes/nes_utils.c @@ -620,8 +620,6 @@ void nes_post_cqp_request(struct nes_device *nesdev, } - - /** * nes_arp_table */ ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit 
http://openib.org/mailman/listinfo/openib-general
[ofa-general] [PATCH 2/10] nes: add support for external flash update utility
Allows an external utility to read/write flash for firmware upgrades. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] --- diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c index a5e0bb5..1088330 100644 --- a/drivers/infiniband/hw/nes/nes.c +++ b/drivers/infiniband/hw/nes/nes.c @@ -780,6 +780,136 @@ static struct pci_driver nes_pci_driver = { .remove = __devexit_p(nes_remove), }; +static ssize_t nes_show_ee_cmd(struct device_driver *ddp, char *buf) +{ + u32 eeprom_cmd; + struct nes_device *nesdev; + + nesdev = list_entry(nes_dev_list.next, typeof(*nesdev), list); + eeprom_cmd = nes_read32(nesdev-regs + NES_EEPROM_COMMAND); + + return snprintf(buf, PAGE_SIZE, 0x%x\n, eeprom_cmd); +} + +static ssize_t nes_store_ee_cmd(struct device_driver *ddp, + const char *buf, size_t count) +{ + char *p = (char *)buf; + u32 val; + struct nes_device *nesdev; + + if (p[1] == 'x' || p[1] == 'X' || p[0] == 'x' || p[0] == 'X') { + val = simple_strtoul(p, p, 16); + nesdev = list_entry(nes_dev_list.next, typeof(*nesdev), list); + nes_write32(nesdev-regs + NES_EEPROM_COMMAND, val); + } + return strnlen(buf, count); +} + +static ssize_t nes_show_ee_data(struct device_driver *ddp, char *buf) +{ + u32 eeprom_data; + struct nes_device *nesdev; + + nesdev = list_entry(nes_dev_list.next, typeof(*nesdev), list); + eeprom_data = nes_read32(nesdev-regs + NES_EEPROM_DATA); + + return snprintf(buf, PAGE_SIZE, 0x%x\n, eeprom_data); +} + +static ssize_t nes_store_ee_data(struct device_driver *ddp, + const char *buf, size_t count) +{ + char *p = (char *)buf; + u32 val; + struct nes_device *nesdev; + + if (p[1] == 'x' || p[1] == 'X' || p[0] == 'x' || p[0] == 'X') { + val = simple_strtoul(p, p, 16); + nesdev = list_entry(nes_dev_list.next, typeof(*nesdev), list); + nes_write32(nesdev-regs + NES_EEPROM_DATA, val); + } + return strnlen(buf, count); +} + +static ssize_t nes_show_flash_cmd(struct device_driver *ddp, char *buf) +{ + u32 flash_cmd; + struct nes_device *nesdev; + + 
nesdev = list_entry(nes_dev_list.next, typeof(*nesdev), list); + flash_cmd = nes_read32(nesdev-regs + NES_FLASH_COMMAND); + + return snprintf(buf, PAGE_SIZE, 0x%x\n, flash_cmd); +} + +static ssize_t nes_store_flash_cmd(struct device_driver *ddp, + const char *buf, size_t count) +{ + char *p = (char *)buf; + u32 val; + struct nes_device *nesdev; + + if (p[1] == 'x' || p[1] == 'X' || p[0] == 'x' || p[0] == 'X') { + val = simple_strtoul(p, p, 16); + nesdev = list_entry(nes_dev_list.next, typeof(*nesdev), list); + nes_write32(nesdev-regs + NES_FLASH_COMMAND, val); + } + return strnlen(buf, count); +} + +static ssize_t nes_show_flash_data(struct device_driver *ddp, char *buf) +{ + u32 flash_data; + struct nes_device *nesdev; + + nesdev = list_entry(nes_dev_list.next, typeof(*nesdev), list); + flash_data = nes_read32(nesdev-regs + NES_FLASH_DATA); + + return snprintf(buf, PAGE_SIZE, 0x%x\n, flash_data); +} + +static ssize_t nes_store_flash_data(struct device_driver *ddp, + const char *buf, size_t count) +{ + char *p = (char *)buf; + u32 val; + struct nes_device *nesdev; + + if (p[1] == 'x' || p[1] == 'X' || p[0] == 'x' || p[0] == 'X') { + val = simple_strtoul(p, p, 16); + nesdev = list_entry(nes_dev_list.next, typeof(*nesdev), list); + nes_write32(nesdev-regs + NES_FLASH_DATA, val); + } + return strnlen(buf, count); +} + +DRIVER_ATTR(eeprom_cmd, S_IRUSR | S_IWUSR, + nes_show_ee_cmd, nes_store_ee_cmd); +DRIVER_ATTR(eeprom_data, S_IRUSR | S_IWUSR, + nes_show_ee_data, nes_store_ee_data); +DRIVER_ATTR(flash_cmd, S_IRUSR | S_IWUSR, + nes_show_flash_cmd, nes_store_flash_cmd); +DRIVER_ATTR(flash_data, S_IRUSR | S_IWUSR, + nes_show_flash_data, nes_store_flash_data); + +int nes_create_driver_sysfs(struct pci_driver *drv) +{ + int error; + error = driver_create_file(drv-driver, driver_attr_eeprom_cmd); + error |= driver_create_file(drv-driver, driver_attr_eeprom_data); + error |= driver_create_file(drv-driver, driver_attr_flash_cmd); + error |= driver_create_file(drv-driver, 
driver_attr_flash_data); + return error; +} + +void nes_remove_driver_sysfs(struct pci_driver *drv) +{ + driver_remove_file(drv-driver, driver_attr_eeprom_cmd); + driver_remove_file(drv-driver, driver_attr_eeprom_data); + driver_remove_file(drv-driver, driver_attr_flash_cmd); + driver_remove_file(drv-driver, driver_attr_flash_data); +} /** * nes_init_module - module initialization entry point @@ -787,12 +917,20 @@ static struct
[ofa-general] [PATCH 3/10] nes: nic queue start/stop and carrier fix
If a full send queue occurs, netif_stop_queue() is called but netif_start_queue() was not being called. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] --- diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 810a9ae..2ff4c41 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -203,6 +203,7 @@ static int nes_netdev_open(struct net_device *netdev) return ret; } + netif_carrier_off(netdev); netif_stop_queue(netdev); if ((!nesvnic-of_device_registered) (nesvnic-rdma_enabled)) { @@ -502,6 +503,13 @@ static int nes_netdev_start_xmit(struct sk_buff *skb, struct net_device *netdev) netdev-name, skb-len, skb_headlen(skb), skb_shinfo(skb)-nr_frags, skb_is_gso(skb)); */ + + if (!netif_carrier_ok(netdev)) + return NETDEV_TX_OK; + + if (netif_queue_stopped(netdev)) + return NETDEV_TX_BUSY; + local_irq_save(flags); if (!spin_trylock(nesnic-sq_lock)) { local_irq_restore(flags); @@ -511,12 +519,20 @@ static int nes_netdev_start_xmit(struct sk_buff *skb, struct net_device *netdev) /* Check if SQ is full */ if nesnic-sq_tail+(nesnic-sq_size*2))-nesnic-sq_head) (nesnic-sq_size - 1)) == 1) { - netif_stop_queue(netdev); - spin_unlock_irqrestore(nesnic-sq_lock, flags); + if (!netif_queue_stopped(netdev)) { + netif_stop_queue(netdev); + barrier(); + if ((volatile u16)nesnic-sq_tail)+(nesnic-sq_size*2))-nesnic-sq_head) (nesnic-sq_size - 1)) != 1) { + netif_start_queue(netdev); + goto sq_no_longer_full; + } + } nesvnic-sq_full++; + spin_unlock_irqrestore(nesnic-sq_lock, flags); return NETDEV_TX_BUSY; } +sq_no_longer_full: nr_frags = skb_shinfo(skb)-nr_frags; if (skb_headlen(skb) NES_FIRST_FRAG_SIZE) { nr_frags++; @@ -534,13 +550,23 @@ static int nes_netdev_start_xmit(struct sk_buff *skb, struct net_device *netdev) (nesnic-sq_size - 1); if (unlikely(wqes_needed wqes_available)) { - netif_stop_queue(netdev); + if (!netif_queue_stopped(netdev)) { + netif_stop_queue(netdev); + barrier(); + wqes_available = 
(volatile u16)nesnic-sq_tail)+nesnic-sq_size)-nesnic-sq_head) - 1) + (nesnic-sq_size - 1); + if (wqes_needed = wqes_available) { + netif_start_queue(netdev); + goto tso_sq_no_longer_full; + } + } + nesvnic-sq_full++; spin_unlock_irqrestore(nesnic-sq_lock, flags); nes_debug(NES_DBG_NIC_TX, %s: HNIC SQ full- TSO request has too many frags!\n, netdev-name); - nesvnic-sq_full++; return NETDEV_TX_BUSY; } +tso_sq_no_longer_full: /* Map all the buffers */ for (tso_frag_count=0; tso_frag_count skb_shinfo(skb)-nr_frags; tso_frag_count++) { ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] [PATCH 4/10] nes: interrupt moderation fix
Hardware interrupt moderation timer gave average performance on slower systems. These fixes increase performance. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] --- diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 674ce32..1048db2 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -155,26 +155,41 @@ static void nes_nic_tune_timer(struct nes_device *nesdev) spin_lock_irqsave(nesadapter-periodic_timer_lock, flags); + if (shared_timer-cq_count_old shared_timer-cq_count) { + if (shared_timer-cq_count shared_timer-threshold_low ) { + shared_timer-cq_direction_downward=0; + } + } + if (shared_timer-cq_count_old = shared_timer-cq_count) { + shared_timer-cq_direction_downward++; + } + shared_timer-cq_count_old = shared_timer-cq_count; + if (shared_timer-cq_direction_downward NES_NIC_CQ_DOWNWARD_TREND) { + if (shared_timer-cq_count = shared_timer-threshold_low ) { + shared_timer-threshold_low = shared_timer-threshold_low/2; + shared_timer-cq_direction_downward=0; + shared_timer-cq_count = 0; + spin_unlock_irqrestore(nesadapter-periodic_timer_lock, flags); + return; + } + } + if (shared_timer-cq_count1) { nesdev-deepcq_count += shared_timer-cq_count; if (shared_timer-cq_count = shared_timer-threshold_low ) { /* increase timer gently */ shared_timer-timer_direction_upward++; shared_timer-timer_direction_downward = 0; - } - else if (shared_timer-cq_count = shared_timer-threshold_target ) { /* balanced */ + } else if (shared_timer-cq_count = shared_timer-threshold_target ) { /* balanced */ shared_timer-timer_direction_upward = 0; shared_timer-timer_direction_downward = 0; - } - else if (shared_timer-cq_count = shared_timer-threshold_high ) { /* decrease timer gently */ + } else if (shared_timer-cq_count = shared_timer-threshold_high ) { /* decrease timer gently */ shared_timer-timer_direction_downward++; shared_timer-timer_direction_upward = 0; - } - else if (shared_timer-cq_count = 
(shared_timer-threshold_high)*2) { + } else if (shared_timer-cq_count = (shared_timer-threshold_high)*2) { shared_timer-timer_in_use -= 2; shared_timer-timer_direction_upward = 0; shared_timer-timer_direction_downward++; - } - else { + } else { shared_timer-timer_in_use -= 4; shared_timer-timer_direction_upward = 0; shared_timer-timer_direction_downward++; @@ -2241,7 +2256,7 @@ void nes_nic_ce_handler(struct nes_device *nesdev, struct nes_hw_nic_cq *cq) if (atomic_read(nesvnic-rx_skbs_needed) (nesvnic-nic.rq_size1)) { nes_write32(nesdev-regs+NES_CQE_ALLOC, cq-cq_number | (cqe_count 16)); -nesadapter-tune_timer.cq_count += cqe_count; + nesadapter-tune_timer.cq_count += cqe_count; cqe_count = 0; nes_replenish_nic_rq(nesvnic); } diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h index 25cfda2..ca0b006 100644 --- a/drivers/infiniband/hw/nes/nes_hw.h +++ b/drivers/infiniband/hw/nes/nes_hw.h @@ -957,6 +957,7 @@ struct nes_arp_entry { #define DEFAULT_JUMBO_NES_QL_LOW12 #define DEFAULT_JUMBO_NES_QL_TARGET 40 #define DEFAULT_JUMBO_NES_QL_HIGH 128 +#define NES_NIC_CQ_DOWNWARD_TREND 8 struct nes_hw_tune_timer { u16 cq_count; @@ -969,6 +970,8 @@ struct nes_hw_tune_timer { u16 timer_in_use_max; u8 timer_direction_upward; u8 timer_direction_downward; +u16 cq_count_old; +u8 cq_direction_downward; }; #define NES_TIMER_INT_LIMIT 2 @@ -1051,17 +1054,17 @@ struct nes_adapter { u32 nic_rx_eth_route_err; - u32 et_rx_coalesce_usecs; + u32 et_rx_coalesce_usecs; u32 et_rx_max_coalesced_frames; u32 et_rx_coalesce_usecs_irq; - u32 et_rx_max_coalesced_frames_irq; - u32 et_pkt_rate_low; - u32 et_rx_coalesce_usecs_low; - u32 et_rx_max_coalesced_frames_low; - u32 et_pkt_rate_high; - u32 et_rx_coalesce_usecs_high; - u32
[ofa-general] [PATCH 5/10] nes: remove unneeded arp cache update
The hardware arp cache is updated by inet event notifiers. Therefore, no arp cache update is needed at netdev_open. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] --- diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 2ff4c41..496024a 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -260,16 +260,6 @@ static int nes_netdev_open(struct net_device *netdev) } - if (netdev-ip_ptr) { - struct in_device *ip = netdev-ip_ptr; - struct in_ifaddr *in = NULL; - if (ip ip-ifa_list) { - in = ip-ifa_list; - nes_manage_arp_cache(nesvnic-netdev, netdev-dev_addr, - ntohl(in-ifa_address), NES_ARP_ADD); - } - } - nes_write32(nesdev-regs+NES_CQE_ALLOC, NES_CQE_ALLOC_NOTIFY_NEXT | nesvnic-nic_cq.cq_number); nes_read32(nesdev-regs+NES_CQE_ALLOC); ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] [PATCH 6/10] nes: use control QP callback at connection teardown
Prevents a race condition between hardware and ULPs when tearing down connections. Memory and data structures are cleaned up after the hardware ce handler has run. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] --- diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c index 1088330..4376bc2 100644 --- a/drivers/infiniband/hw/nes/nes.c +++ b/drivers/infiniband/hw/nes/nes.c @@ -263,13 +263,43 @@ void nes_add_ref(struct ib_qp *ibqp) atomic_inc(nesqp-refcount); } +static void nes_cqp_rem_ref_callback(struct nes_device *nesdev, struct nes_cqp_request *cqp_request) +{ + unsigned long flags; + struct nes_qp *nesqp = cqp_request-cqp_callback_pointer; + struct nes_adapter *nesadapter = nesdev-nesadapter; + u32 qp_id; + + atomic_inc(qps_destroyed); + + /* Free the control structures */ + + qp_id = nesqp-hwqp.qp_id; + if (nesqp-pbl_vbase) { + pci_free_consistent(nesdev-pcidev, nesqp-qp_mem_size, + nesqp-hwqp.q2_vbase, nesqp-hwqp.q2_pbase); + spin_lock_irqsave(nesadapter-pbl_lock, flags); + nesadapter-free_256pbl++; + spin_unlock_irqrestore(nesadapter-pbl_lock, flags); + pci_free_consistent(nesdev-pcidev, 256, nesqp-pbl_vbase, nesqp-pbl_pbase); + nesqp-pbl_vbase = NULL; + kunmap(nesqp-page); + + } else { + pci_free_consistent(nesdev-pcidev, nesqp-qp_mem_size, + nesqp-hwqp.sq_vbase, nesqp-hwqp.sq_pbase); + } + nes_free_resource(nesadapter, nesadapter-allocated_qps, nesqp-hwqp.qp_id); + + kfree(nesqp-allocated_buffer); + +} /** * nes_rem_ref */ void nes_rem_ref(struct ib_qp *ibqp) { - unsigned long flags; u64 u64temp; struct nes_qp *nesqp; struct nes_vnic *nesvnic = to_nesvnic(ibqp-device); @@ -287,27 +317,7 @@ void nes_rem_ref(struct ib_qp *ibqp) } if (atomic_dec_and_test(nesqp-refcount)) { - atomic_inc(qps_destroyed); - - /* Free the control structures */ - - if (nesqp-pbl_vbase) { - pci_free_consistent(nesdev-pcidev, nesqp-qp_mem_size, - nesqp-hwqp.q2_vbase, nesqp-hwqp.q2_pbase); - spin_lock_irqsave(nesadapter-pbl_lock, flags); - 
nesadapter-free_256pbl++; - spin_unlock_irqrestore(nesadapter-pbl_lock, flags); - pci_free_consistent(nesdev-pcidev, 256, nesqp-pbl_vbase, nesqp-pbl_pbase); - nesqp-pbl_vbase = NULL; - kunmap(nesqp-page); - - } else { - pci_free_consistent(nesdev-pcidev, nesqp-qp_mem_size, - nesqp-hwqp.sq_vbase, nesqp-hwqp.sq_pbase); - } - nesadapter-qp_table[nesqp-hwqp.qp_id-NES_FIRST_QPN] = NULL; - nes_free_resource(nesadapter, nesadapter-allocated_qps, nesqp-hwqp.qp_id); /* Destroy the QP */ cqp_request = nes_get_cqp_request(nesdev); @@ -316,6 +326,9 @@ void nes_rem_ref(struct ib_qp *ibqp) return; } cqp_request-waiting = 0; + cqp_request-callback = 1; + cqp_request-cqp_callback = nes_cqp_rem_ref_callback; + cqp_request-cqp_callback_pointer = nesqp; cqp_wqe = cqp_request-cqp_wqe; cqp_wqe-wqe_words[NES_CQP_WQE_OPCODE_IDX] = @@ -339,8 +352,6 @@ void nes_rem_ref(struct ib_qp *ibqp) cpu_to_le32((u32)(u64temp 32)); nes_post_cqp_request(nesdev, cqp_request, NES_CQP_REQUEST_RING_DOORBELL); - - kfree(nesqp-allocated_buffer); } } diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 1048db2..06d1963 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -2427,6 +2427,16 @@ void nes_cqp_ce_handler(struct nes_device *nesdev, struct nes_hw_cq *cq) spin_unlock_irqrestore(nesdev-cqp.lock, flags); } } + } else if (cqp_request-callback) { + /* Envoke the callback routine */ + cqp_request-cqp_callback(nesdev, cqp_request); + if (cqp_request-dynamic) { + kfree(cqp_request); + } else { + spin_lock_irqsave(nesdev-cqp.lock, flags); +
[ofa-general] [PATCH 7/10] nes: process mss option
Process a packet with mss option set or use default value. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] --- diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index 79889a4..169 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -1220,11 +1220,12 @@ static int rem_ref_cm_node(struct nes_cm_core *cm_core, /** * process_options */ -static void process_options(struct nes_cm_node *cm_node, u8 *optionsloc, u32 optionsize) +static int process_options(struct nes_cm_node *cm_node, u8 *optionsloc, u32 optionsize, u32 syn_packet) { u32 tmp; u32 offset = 0; union all_known_options *all_options; + char got_mss_option = 0; while (offset optionsize) { all_options = (union all_known_options *)(optionsloc + offset); @@ -1236,9 +1237,17 @@ static void process_options(struct nes_cm_node *cm_node, u8 *optionsloc, u32 opt offset += 1; continue; case OPTION_NUMBER_MSS: - tmp = htons(all_options-as_mss.mss); - if (tmp cm_node-tcp_cntxt.mss) - cm_node-tcp_cntxt.mss = tmp; + nes_debug(NES_DBG_CM, %s: MSS Length: %d Offset: %d Size: %d\n, + __FUNCTION__, + all_options-as_mss.length, offset, optionsize); + got_mss_option = 1; + if (all_options-as_mss.length != 4) { + return 1; + } else { + tmp = htons(all_options-as_mss.mss); + if (tmp 0 tmp cm_node-tcp_cntxt.mss) + cm_node-tcp_cntxt.mss = tmp; + } break; case OPTION_NUMBER_WINDOW_SCALE: cm_node-tcp_cntxt.snd_wscale = all_options-as_windowscale.shiftcount; @@ -1253,6 +1262,9 @@ static void process_options(struct nes_cm_node *cm_node, u8 *optionsloc, u32 opt } offset += all_options-as_base.length; } + if ((!got_mss_option) (syn_packet)) + cm_node-tcp_cntxt.mss = NES_CM_DEFAULT_MSS; + return 0; } @@ -1343,6 +1355,8 @@ int process_packet(struct nes_cm_node *cm_node, struct sk_buff *skb, u8 *optionsloc = (u8 *)tcph[1]; process_options(cm_node, optionsloc, optionsize); } + else if (tcph-syn) + cm_node-tcp_cntxt.mss = NES_CM_DEFAULT_MSS; cm_node-tcp_cntxt.snd_wnd = 
htons(tcph-window) cm_node-tcp_cntxt.snd_wscale; diff --git a/drivers/infiniband/hw/nes/nes_cm.h b/drivers/infiniband/hw/nes/nes_cm.h index cd8e003..c511242 100644 --- a/drivers/infiniband/hw/nes/nes_cm.h +++ b/drivers/infiniband/hw/nes/nes_cm.h @@ -152,6 +152,8 @@ struct nes_timer_entry { #define NES_CM_DEFAULT_FREE_PKTS 0x000A #define NES_CM_FREE_PKT_LO_WATERMARK 2 +#define NES_CM_DEFAULT_MSS 536 + #define NES_CM_DEF_SEQ 0x159bf75f #define NES_CM_DEF_LOCAL_ID 0x3b47 ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] [PATCH 8/10] nes: multicast performance enhancement
Move multicast processing to its own QP and set up the hardware to use it.

Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED]
---
diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c
index 06d1963..515133d 100644
--- a/drivers/infiniband/hw/nes/nes_hw.c
+++ b/drivers/infiniband/hw/nes/nes_hw.c
@@ -698,7 +698,7 @@ void nes_init_csr_ne020(struct nes_device *nesdev, u8 hw_rev, u8 port_count)
 	nes_write_indexed(nesdev, 0x01E4, 0x0007);
 	/* nes_write_indexed(nesdev, 0x01E8, 0x000208C4); */
-	nes_write_indexed(nesdev, 0x01E8, 0x00020844);
+	nes_write_indexed(nesdev, 0x01E8, 0x00020874);
 	nes_write_indexed(nesdev, 0x01D8, 0x00048002);
 	/* nes_write_indexed(nesdev, 0x01D8, 0x0004B002); */
 	nes_write_indexed(nesdev, 0x01FC, 0x00050005);
@@ -753,7 +753,7 @@ void nes_init_csr_ne020(struct nes_device *nesdev, u8 hw_rev, u8 port_count)
 	nes_write_indexed(nesdev, 0x60C0, 0x028e);
 	nes_write_indexed(nesdev, 0x60C8, 0x0020);
 	//
-	nes_write_indexed(nesdev, 0x01EC, 0x5b2625a0);
+	nes_write_indexed(nesdev, 0x01EC, 0x7b2625a0);
 	/* nes_write_indexed(nesdev, 0x01EC, 0x5f2625a0); */
 
 	if (hw_rev != NE020_REV) {
@@ -1377,7 +1377,7 @@ int nes_init_nic_qp(struct nes_device *nesdev, struct net_device *netdev)
 		nic_sqe = &nesvnic->nic.sq_vbase[counter];
 		nic_sqe->wqe_words[NES_NIC_SQ_WQE_MISC_IDX] =
 			cpu_to_le32(NES_NIC_SQ_WQE_DISABLE_CHKSUM |
-					NES_NIC_SQ_WQE_COMPLETION);
+				    NES_NIC_SQ_WQE_COMPLETION);
 		nic_sqe->wqe_words[NES_NIC_SQ_WQE_LENGTH_0_TAG_IDX] =
 			cpu_to_le32((u32)NES_FIRST_FRAG_SIZE << 16);
 		nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_LOW_IDX] =
@@ -1386,6 +1386,15 @@ int nes_init_nic_qp(struct nes_device *nesdev, struct net_device *netdev)
 			cpu_to_le32((u32)((u64)nesvnic->nic.frag_paddr[counter] >> 32));
 	}
 
+	nesvnic->mcrq_nic.sq_vbase = (void *)0;
+	nesvnic->mcrq_nic.sq_pbase = 0;
+	nesvnic->mcrq_nic.sq_head = 0;
+	nesvnic->mcrq_nic.sq_tail = 0;
+	nesvnic->mcrq_nic.sq_size = 0;
+	nesvnic->get_cqp_request = nes_get_cqp_request;
+	nesvnic->post_cqp_request = nes_post_cqp_request;
+	nesvnic->mcrq_mcast_filter = 0;
+
 	spin_lock_init(&nesvnic->nic.sq_lock);
 	spin_lock_init(&nesvnic->nic.rq_lock);
@@ -1404,6 +1413,17 @@ int nes_init_nic_qp(struct nes_device *nesdev, struct net_device *netdev)
 	vmem += (NES_NIC_WQ_SIZE * sizeof(struct nes_hw_nic_rq_wqe));
 	pmem += (NES_NIC_WQ_SIZE * sizeof(struct nes_hw_nic_rq_wqe));
 
+	nesvnic->mcrq_nic.rq_vbase = vmem;
+	nesvnic->mcrq_nic.rq_pbase = pmem;
+	nesvnic->mcrq_nic.rq_head = 0;
+	nesvnic->mcrq_nic.rq_tail = 0;
+	nesvnic->mcrq_nic.rq_size = NES_NIC_WQ_SIZE;
+
+	/* setup the CQ */
+	vmem += (NES_NIC_WQ_SIZE * sizeof(struct nes_hw_nic_rq_wqe));
+	pmem += (NES_NIC_WQ_SIZE * sizeof(struct nes_hw_nic_rq_wqe));
+	nesvnic->mcrq_nic.qp_id = nesvnic->nic_index + 32;
+
 	nesvnic->nic_cq.cq_vbase = vmem;
 	nesvnic->nic_cq.cq_pbase = pmem;
 	nesvnic->nic_cq.cq_head = 0;
@@ -1484,6 +1504,19 @@ int nes_init_nic_qp(struct nes_device *nesdev, struct net_device *netdev)
 	/* Ring doorbell (2 WQEs) */
 	nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x0280 | nesdev->cqp.qp_id);
 
+	/* Send CreateQP request to CQP */
+	nic_context++;
+	nic_context->context_words[NES_NIC_CTX_MISC_IDX] =
+		cpu_to_le32((u32)NES_NIC_CTX_SIZE |
+			((u32)PCI_FUNC(nesdev->pcidev->devfn) << 12) | (1 << 18));
+	u64temp = (u64)nesvnic->mcrq_nic.sq_pbase;
+	nic_context->context_words[NES_NIC_CTX_SQ_LOW_IDX] = cpu_to_le32((u32)u64temp);
+	nic_context->context_words[NES_NIC_CTX_SQ_HIGH_IDX] = cpu_to_le32((u32)(u64temp >> 32));
+	u64temp = (u64)nesvnic->mcrq_nic.rq_pbase;
+	nic_context->context_words[NES_NIC_CTX_RQ_LOW_IDX] = cpu_to_le32((u32)u64temp);
+	nic_context->context_words[NES_NIC_CTX_RQ_HIGH_IDX] = cpu_to_le32((u32)(u64temp >> 32));
+
 	spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
 
 	nes_debug(NES_DBG_INIT, "Waiting for create NIC QP%u to complete.\n", nesvnic->nic.qp_id);
diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h
index 0279d4c..2efb55e 100644
--- a/drivers/infiniband/hw/nes/nes_hw.h
+++ b/drivers/infiniband/hw/nes/nes_hw.h
@@ -1161,9 +1161,11 @@ struct nes_vnic {
 	dma_addr_t nic_pbase;
 	struct nes_hw_nic    nic;
 	struct nes_hw_nic_cq nic_cq;
-
+	struct
[ofa-general] lock dependency in ib_user_mad
I see hangs killing opensm related to a bug in user_mad.c. The problem appears to be:

ib_umad_close()
	downgrade_write(&file->port->mutex)
	ib_unregister_mad_agent(...)
	up_read(&file->port->mutex)

ib_unregister_mad_agent() flushes any outstanding MADs, resulting in calls to send_handler() and recv_handler(), both of which call queue_packet():

queue_packet()
	down_read(&file->port->mutex)
	...
	up_read(&file->port->mutex)

ib_umad_kill_port() has a similar issue as ib_umad_close(). Does anyone know the reasoning for holding the mutex around ib_unregister_mad_agent()?

- Sean
___
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] peer to peer connections support
On 12/20/07, Kanevsky, Arkady [EMAIL PROTECTED] wrote:
> So in a nutshell the proposal is to add some identifier into CM private data which indicates that it is the peer-to-peer model, plus unique peer IDs for the requested connection. Is this the model?

For the time being, I am trying to understand whether, in the peer-to-peer model, both sides issue a listen before connecting or not. Without this listen, peer-to-peer does not seem usable to me. What's your understanding of the spec?

Or.
Re: [ofa-general] iommu dma mapping alignment requirements
Benjamin Herrenschmidt wrote:
> On Thu, 2007-12-20 at 13:29 -0600, Steve Wise wrote:
>> Or based on the alignment of vaddr actually...
>
> The latter wouldn't be realistic. What I think might be necessary, though it would definitely cause us problems with running out of iommu space (which is the reason we did the switch down to 4K), is to provide alignment to the real page size, and alignment to the allocation order for dma_map_consistent.
>
> It might be possible to -tweak- and only provide alignment to the page size for allocations that are larger than IOMMU_PAGE_SIZE. That would solve the problem with small network packets eating up too much iommu space though.
>
> What do you think ?

That might work. If you gimme a patch, I'll try it out!

Steve.
RE: [ofa-general] peer to peer connections support
Yes. The question is who issues it? It can be done by the CM and not the ULP. Looking way back at VIPL, there was a peer-to-peer model with an API similar to the one which Shane outlines.

Thanks,

Arkady Kanevsky                    email: [EMAIL PROTECTED]
Network Appliance Inc.             phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.       Fax: 781-895-1195
Waltham, MA 02451                  central phone: 781-768-5300

> From: Or Gerlitz [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, December 20, 2007 4:17 PM
> To: Kanevsky, Arkady
> Cc: OpenFabrics General
> Subject: Re: [ofa-general] peer to peer connections support
>
> On 12/20/07, Kanevsky, Arkady [EMAIL PROTECTED] wrote:
>> So in a nutshell the proposal is to add some identifier into CM private data which indicates that it is the peer-to-peer model, plus unique peer IDs for the requested connection. Is this the model?
>
> For the time being, I am trying to understand whether, in the peer-to-peer model, both sides issue a listen before connecting or not. Without this listen, peer-to-peer does not seem usable to me. What's your understanding of the spec?
>
> Or.
Re: [ofa-general] peer to peer connections support
On 12/20/07, Kanevsky, Arkady [EMAIL PROTECTED] wrote:
> Yes. The question is who issues it? It can be done by the CM and not the ULP. Looking way back at VIPL, there was a peer-to-peer model with an API similar to the one which Shane outlines.

If the CM issues the listen, it means I can connect to you now only if you try to connect to me NOW. My understanding is that this is a useless protocol, but I will be happy to hear why I am wrong.

(The IB stack co-maintainer's name is Sean.)

Or.
RE: [ofa-general] peer to peer connections support
What is the difference between the ULP not issuing a listen yet vs. the ULP not yet issuing a peer-to-peer connect which does the listen under the cover? If a connection request comes from the other side before either of them, it will be rejected by the CM since nobody is listening.

Arkady

P.S. Sean, my apologies.

Arkady Kanevsky                    email: [EMAIL PROTECTED]
Network Appliance Inc.             phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.       Fax: 781-895-1195
Waltham, MA 02451                  central phone: 781-768-5300

> From: Or Gerlitz [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, December 20, 2007 4:29 PM
> To: Kanevsky, Arkady
> Cc: OpenFabrics General
> Subject: Re: [ofa-general] peer to peer connections support
>
> On 12/20/07, Kanevsky, Arkady [EMAIL PROTECTED] wrote:
>> Yes. The question is who issues it? It can be done by the CM and not the ULP. Looking way back at VIPL, there was a peer-to-peer model with an API similar to the one which Shane outlines.
>
> If the CM issues the listen, it means I can connect to you now only if you try to connect to me NOW. My understanding is that this is a useless protocol, but I will be happy to hear why I am wrong.
>
> Or.
Re: [ofa-general] lock dependency in ib_user_mad
> I see hangs killing opensm related to a bug in user_mad.c. The problem appears to be:
>
> ib_umad_close()
> 	downgrade_write(&file->port->mutex)
> 	ib_unregister_mad_agent(...)
> 	up_read(&file->port->mutex)
>
> ib_unregister_mad_agent() flushes any outstanding MADs, resulting in calls to send_handler() and recv_handler(), both of which call queue_packet():
>
> queue_packet()
> 	down_read(&file->port->mutex)
> 	...
> 	up_read(&file->port->mutex)

This should be fine (and comes from an earlier set of changes to fix deadlocks): ib_umad_close() does a downgrade_write() before calling ib_unregister_mad_agent(), so it only holds the mutex with a read lock, which means that queue_packet() should be able to take another read lock. Unless there's something that prevents one thread from taking a read lock twice? What kernel are you seeing these problems with?

> Does anyone know the reasoning for holding the mutex around ib_unregister_mad_agent()?

It's to keep things serialized against a port disappearing because a device is being removed. But looking at things, I think we can probably rejigger the locking to make things simpler, and avoid the use of downgrade_write(), which the -rt people don't like.

- R.
[ofa-general] Re: iommu dma mapping alignment requirements
Benjamin Herrenschmidt wrote:
> On Thu, 2007-12-20 at 15:02 -0600, Steve Wise wrote:
>> Benjamin Herrenschmidt wrote:
>>> Adding a few more people to the discussion. You may well be right and we would have to provide the same alignment, though that sucks a bit, as one of the reasons we switched to 4K for the IOMMU is that the iommu space available on pSeries is very small and we were running out of it with 64K pages and lots of networking activity.
>>
>> But smarter NIC drivers can resolve this too, I think, perhaps by carving up full pages of mapped buffers instead of just assuming mapping is free...
>
> True, but the problem still happens today. If we switch back to 64K iommu page size (which should be possible, I need to fix that), we -will- run out of iommu space on typical workloads, and that is not acceptable. So we need to find a compromise. What I might do is something along the lines of:
>
> If size >= PAGE_SIZE, and vaddr (page_address + offset) is PAGE_SIZE aligned, then I enforce alignment of the resulting mapping.
>
> That should fix your case. Anything requesting smaller than PAGE_SIZE mappings would lose that alignment, but I -think- it should be safe, and you still always get 4K alignment anyway (+/- your offset), so at least small alignment restrictions are still enforced (such as cache line alignment etc...). I'll send you a test patch later today.
>
> Ben.

Sounds good. Thanks!

Note that these smaller sub-host-page-sized mappings might pollute the address space, causing fully aligned host-page-size maps to become scarce... Maybe there's a clever way to keep those in their own segment of the address space?
[ofa-general] [PATCH] [RFC] IPOIB/CM Enable SRQ support on HCAs with less than 16 SG entries
Some HCAs like ehca2 support fewer than 16 SG entries. Currently IPoIB/CM implicitly assumes all HCAs will support 16 SG entries of 4K pages for 64K MTUs. This patch removes that restriction. This patch continues to use order-0 allocations and enables implementation of connected mode on such HCAs with smaller MTUs. HCAs having the capability to support 16 SG entries are left untouched.

This patch addresses bug #728: https://bugs.openfabrics.org/show_bug.cgi?id=728

While working on this patch I discovered that mthca reports an incorrect value of max_srq_sge. I had reported this issue previously, several weeks ago. I solved that by using a hard-coded value of 16 for max_srq_sge (mthca only). More on that in a following mail.

Signed-off-by: Pradeep Satyanarayana [EMAIL PROTECTED]
---
--- a/drivers/infiniband/ulp/ipoib/ipoib.h	2007-11-03 11:37:02.0 -0700
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h	2007-12-20 13:17:43.0 -0800
@@ -466,6 +466,7 @@ void ipoib_drain_cq(struct net_device *d
 #define IPOIB_CM_SUPPORTED(ha)   (ha[0] & (IPOIB_FLAGS_RC))
 
 extern int ipoib_max_conn_qp;
+extern int max_cm_mtu;
 
 static inline int ipoib_cm_admin_enabled(struct net_device *dev)
 {
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c	2007-11-21 07:46:35.0 -0800
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c	2007-12-20 14:47:13.0 -0800
@@ -74,6 +74,9 @@ static struct ib_send_wr ipoib_cm_rx_dra
 	.opcode = IB_WR_SEND,
 };
 
+static int num_of_frags;
+int max_cm_mtu;
+
 static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
 			       struct ib_cm_event *event);
@@ -96,13 +99,13 @@ static int ipoib_cm_post_receive_srq(str
 	priv->cm.rx_wr.wr_id = id | IPOIB_OP_CM | IPOIB_OP_RECV;
 
-	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
+	for (i = 0; i < num_of_frags; ++i)
 		priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i];
 
 	ret = ib_post_srq_recv(priv->cm.srq, &priv->cm.rx_wr, &bad_wr);
 	if (unlikely(ret)) {
 		ipoib_warn(priv, "post srq failed for buf %d (%d)\n", id, ret);
-		ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1,
+		ipoib_cm_dma_unmap_rx(priv, num_of_frags - 1,
 				      priv->cm.srq_ring[id].mapping);
 		dev_kfree_skb_any(priv->cm.srq_ring[id].skb);
 		priv->cm.srq_ring[id].skb = NULL;
@@ -623,6 +626,7 @@ repost:
 			--p->recv_count;
 			ipoib_warn(priv, "ipoib_cm_post_receive_nonsrq failed for buf %d\n", wr_id);
+			kfree(mapping); /*** Check if this needed ***/
 		}
 	}
 }
@@ -1399,16 +1403,17 @@ int ipoib_cm_add_mode_attr(struct net_de
 	return device_create_file(&dev->dev, &dev_attr_mode);
 }
 
-static void ipoib_cm_create_srq(struct net_device *dev)
+static void ipoib_cm_create_srq(struct net_device *dev, int max_sge)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ib_srq_init_attr srq_init_attr = {
 		.attr = {
 			.max_wr  = ipoib_recvq_size,
-			.max_sge = IPOIB_CM_RX_SG
 		}
 	};
 
+	srq_init_attr.attr.max_sge = max_sge;
+
 	priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr);
 	if (IS_ERR(priv->cm.srq)) {
 		if (PTR_ERR(priv->cm.srq) != -ENOSYS)
@@ -1418,6 +1423,7 @@ static void ipoib_cm_create_srq(struct n
 		return;
 	}
 
+
 	priv->cm.srq_ring = kzalloc(ipoib_recvq_size * sizeof *priv->cm.srq_ring,
 				    GFP_KERNEL);
 	if (!priv->cm.srq_ring) {
@@ -1431,7 +1437,9 @@ static void ipoib_cm_create_srq(struct n
 int ipoib_cm_dev_init(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	int i;
+	int i, ret;
+	struct ib_srq_attr srq_attr;
+	struct ib_device_attr attr;
 
 	INIT_LIST_HEAD(&priv->cm.passive_ids);
 	INIT_LIST_HEAD(&priv->cm.reap_list);
@@ -1448,22 +1456,46 @@ int ipoib_cm_dev_init(struct net_device
 	skb_queue_head_init(&priv->cm.skb_queue);
 
-	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
+	ret = ib_query_device(priv->ca, &attr);
+	if (ret) {
+		printk(KERN_WARNING "ib_query_device() failed with %d\n", ret);
+		return ret;
+	}
+
+	ipoib_dbg(priv, "max_srq_sge=%d\n", attr.max_srq_sge);
+
+	ipoib_cm_create_srq(dev, attr.max_srq_sge);
+
+	if (ipoib_cm_has_srq(dev)) {
+		ret = ib_query_srq(priv->cm.srq, &srq_attr);
+		if (ret) {
+			printk(KERN_WARNING "ib_query_srq() failed with %d\n", ret);
+			return -EINVAL;
+		}
+		/* pad similar to IPOIB_CM_MTU */
+		max_cm_mtu = srq_attr.max_sge * PAGE_SIZE - 0x10;
+		num_of_frags = srq_attr.max_sge;
+		ipoib_dbg(priv,
Re: [ofa-general] lock dependency in ib_user_mad
> This should be fine (and comes from an earlier set of changes to fix deadlocks): ib_umad_close() does a downgrade_write() before calling ib_unregister_mad_agent(), so it only holds the mutex with a read lock, which means that queue_packet() should be able to take another read lock.

I'll see if I can reproduce and get more info. I thought the mutex was contributing to the hang, but you're right.

> Unless there's something that prevents one thread from taking a read lock twice? What kernel are you seeing these problems with?

I'm running 2.6.24-rc3. I'm out on vacation through the end of the year, so I'm not sure if I'll be able to debug this further for a couple of weeks.

- Sean
[ofa-general] Oops in mthca
I discovered the following Oops while developing a patch to enable SRQ on HCAs with fewer than 16 SG elements. The root of this issue appears to be that ib_query_device(priv->ca, &attr) reports an incorrect value for attr.max_srq_sge. The value that ib_query_device returns is 28 (instead of the 16 that I expected).

Dec 20 13:19:47 elm3b39 kernel: Oops: Kernel access of bad area, sig: 11 [#2]
Dec 20 13:19:47 elm3b39 kernel: SMP NR_CPUS=128 NUMA pSeries
Dec 20 13:19:47 elm3b39 kernel: Modules linked in: ib_ipoib autofs4 rdma_ucm rdma_cm ib_addr iw_cm ib_uverbs ib_umad ib_mthca ib_cm ib_sa ib_mad ib_core ipv6 binfmt_misc parport_pc lp parport sg e1000 dm_snapshot dm_zero dm_mirror dm_mod ipr libata firmware_class sd_mod scsi_mod ehci_hcd ohci_hcd usbcore
Dec 20 13:19:47 elm3b39 kernel: NIP: d02ffb60 LR: d02ffb08 CTR: c043a9b0
Dec 20 13:19:47 elm3b39 kernel: REGS: c001d05ff2e0 TRAP: 0300 Tainted: G D (2.6.24-rc5)
Dec 20 13:19:47 elm3b39 kernel: MSR: 80009032 <EE,ME,IR,DR> CR: 24024424 XER: 0010
Dec 20 13:19:47 elm3b39 kernel: DAR: 60bf0008, DSISR: 4000
Dec 20 13:19:47 elm3b39 kernel: TASK = c001d2e4a000[8233] 'modprobe' THREAD: c001d05fc000 CPU: 4
Dec 20 13:19:47 elm3b39 kernel: GPR00: 0001 c001d05ff560 d0320308 c001d2e54010
Dec 20 13:19:47 elm3b39 kernel: GPR04: 0001 c001d0654000 0001
Dec 20 13:19:47 elm3b39 kernel: GPR08: 001c 60bf 60bf
Dec 20 13:19:47 elm3b39 kernel: GPR12: d0301fc8 c057f600 d05a2090 d05a20d0
Dec 20 13:19:47 elm3b39 kernel: GPR16: 01e3 01e3 d032eba0
Dec 20 13:19:47 elm3b39 kernel: GPR20: 0034 c001d05ff690 0001
Dec 20 13:19:47 elm3b39 kernel: GPR24: c000e482b000
Dec 20 13:19:47 elm3b39 kernel: GPR28: c001d2972c00 d031f190 c001d020ee78
Dec 20 13:19:47 elm3b39 kernel: NIP [d02ffb60] .mthca_tavor_post_srq_recv+0xe0/0x2e0 [ib_mthca]
Dec 20 13:19:47 elm3b39 kernel: LR [d02ffb08] .mthca_tavor_post_srq_recv+0x88/0x2e0 [ib_mthca]
Dec 20 13:19:47 elm3b39 kernel: Call Trace:
Dec 20 13:19:47 elm3b39 kernel: [c001d05ff560] [d02ffad4] .mthca_tavor_post_srq_recv+0x54/0x2e0 [ib_mthca] (unreliable)
Dec 20 13:19:47 elm3b39 kernel: [c001d05ff620] [d03239fc] .ipoib_cm_post_receive_srq+0xbc/0x150 [ib_ipoib]
Dec 20 13:19:47 elm3b39 kernel: [c001d05ff6d0] [d0325984] .ipoib_cm_dev_init+0x2f4/0x560 [ib_ipoib]
Dec 20 13:19:47 elm3b39 kernel: [c001d05ff870] [d0322c74] .ipoib_transport_dev_init+0xd4/0x330 [ib_ipoib]
Dec 20 13:19:47 elm3b39 kernel: [c001d05ff970] [d031f90c] .ipoib_ib_dev_init+0x3c/0xc0 [ib_ipoib]
Dec 20 13:19:47 elm3b39 kernel: [c001d05ffa00] [d031aaac] .ipoib_dev_init+0x9c/0x160 [ib_ipoib]
Dec 20 13:19:48 elm3b39 kernel: [c001d05ffaa0] [d031ad98] .ipoib_add_one+0x228/0x3b0 [ib_ipoib]
Dec 20 13:19:48 elm3b39 kernel: [c001d05ffb60] [d01bf6ec] .ib_register_client+0xcc/0x110 [ib_core]
Dec 20 13:19:48 elm3b39 kernel: [c001d05ffc00] [d0328484] .ipoib_init_module+0x174/0x2288 [ib_ipoib]
Dec 20 13:19:48 elm3b39 kernel: [c001d05ffc90] [c008eeec] .sys_init_module+0x20c/0x1aa0
Dec 20 13:19:48 elm3b39 kernel: [c001d05ffe30] [c00086ac] syscall_exit+0x0/0x40
Dec 20 13:19:48 elm3b39 kernel: Instruction dump:
Dec 20 13:19:48 elm3b39 kernel: 419c0204 2f89 38630010 38e0 409d0060 38e0 3900 6000
Dec 20 13:19:48 elm3b39 kernel: e95f0010 38070001 7c0707b4 7d6a4214 800b0008 9003 6000 6000

lspci -v gives me the following:

0002:d8:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1) (prog-if 00 [Normal decode])
	Flags: bus master, 66MHz, medium devsel, latency 144
	Bus: primary=d8, secondary=d9, subordinate=d9, sec-latency=128
	Memory behind bridge: c000-c88f
	Capabilities: [70] PCI-X bridge device

0002:d9:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1)
	Subsystem: Mellanox Technologies MT23108 InfiniHost
	Flags: bus master, 66MHz, medium devsel, latency 144, IRQ 121
	Memory at 400c880 (64-bit, non-prefetchable) [size=1M]
	Memory at 400c800 (64-bit, prefetchable) [size=8M]
	Memory at 400c000 (64-bit, prefetchable) [size=128M]
	Capabilities: [40] MSI-X: Enable- Mask- TabSize=32
	Capabilities: [50] Vital Product Data
	Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
	Capabilities: [70] PCI-X non-bridge device

Pradeep
[ofa-general] Java invoke the verbs through JNI
Hi, all

I just wrote a JNI wrapper to use IB from a Java program. I wrote some simple test programs, and they work. But when I integrate it with another program, a local protection error is reported. It is unstable and fails most of the time. Can someone give me some advice? Thanks.
Re: [ofa-general] Oops in mthca
> I discovered the following Oops while developing a patch to enable SRQ on HCAs with fewer than 16 SG elements.

So is this oops with some version of your patch for limited SRQ scatter entries applied? It's hard to know exactly what is going wrong, but I suspect that if you get a device that allows more than 16 SRQ scatter entries, your patch passes that value for num_sg without changing the declaration of rx_sge[] to have enough entries, so when posting the receive request, the low-level driver goes off the end of the array.

> The root of this issue appears to be that ib_query_device(priv->ca, &attr) reports an incorrect value for attr.max_srq_sge. The value that ib_query_device returns is 28 (instead of the 16 that I expected).

Why do you think the value 28 is incorrect? Unfortunately I don't have any PCI-X systems any more, but I don't see anything obvious in the mthca code that would make the value it returns for max_srq_sge wrong.

- R.
Re: [ofa-general] [PATCH] [RFC] IPOIB/CM Enable SRQ support on HCAs with less than 16 SG entries
> +static int num_of_frags;
> +int max_cm_mtu;

I think these values need to be per-interface -- think of the case of a system with more than one type of HCA installed, where the different HCAs have different limits.

> @@ -623,6 +626,7 @@ repost:
>  			--p->recv_count;
>  			ipoib_warn(priv, "ipoib_cm_post_receive_nonsrq failed for buf %d\n", wr_id);
> +			kfree(mapping); /*** Check if this needed ***/

This looks really bogus -- I don't see anything in your patch that changes mapping from being allocated on the stack.

> +	if (ipoib_cm_has_srq(dev)) {
> +		ret = ib_query_srq(priv->cm.srq, &srq_attr);
> +		if (ret) {
> +			printk(KERN_WARNING "ib_query_srq() failed with %d\n", ret);
> +			return -EINVAL;
> +		}
> +		/* pad similar to IPOIB_CM_MTU */
> +		max_cm_mtu = srq_attr.max_sge * PAGE_SIZE - 0x10;
> +		num_of_frags = srq_attr.max_sge;
> +		ipoib_dbg(priv, "max_cm_mtu = 0x%x, num_of_frags=%d\n",
> +			  max_cm_mtu, num_of_frags);
> +	} else {
> +		max_cm_mtu = IPOIB_CM_MTU;
> +		num_of_frags = IPOIB_CM_RX_SG;
> +	}

I think in the SRQ case you still want to make sure num_of_frags is no more than IPOIB_CM_RX_SG. And if we're going to check the SRQ scatter capabilities, we should probably add the same thing for the non-SRQ case to make sure we don't exceed what QP receive queues can handle.
[ofa-general] [PATCH] ibnetdiscover - ports report
Hello IB developers and users,

I would like to get feedback on the following patch to ibnetdiscover. The patch introduces an additional output mode for ibnetdiscover which is focused on the ports, and prints one line for each port with the needed port information. The output looks like:

SW     4 18 0x0008f104003f0838 4x SDR 'ISR9288/ISR9096 Voltaire sLB-24'
SW     4 17 0x0008f104003f0838 4x SDR 'ISR9288/ISR9096 Voltaire sLB-24'
SW     4 16 0x0008f104003f0838 4x SDR 'ISR9288/ISR9096 Voltaire sLB-24'
SW     4 15 0x0008f104003f0838 4x SDR 'ISR9288/ISR9096 Voltaire sLB-24'
SW     4 14 0x0008f104003f0838 4x SDR 'ISR9288/ISR9096 Voltaire sLB-24'
SW     4 13 0x0008f104003f0838 4x SDR 'ISR9288/ISR9096 Voltaire sLB-24'
SW     4  9 0x0008f104003f0838 4x SDR 'ISR9288/ISR9096 Voltaire sLB-24'
SW     4  8 0x0008f104003f0838 4x SDR 'ISR9288/ISR9096 Voltaire sLB-24'
SW     4  7 0x0008f104003f0838 4x SDR 'ISR9288/ISR9096 Voltaire sLB-24'
SW     4  6 0x0008f104003f0838 4x SDR 'ISR9288/ISR9096 Voltaire sLB-24'
SW     4  5 0x0008f104003f0838 4x SDR 'ISR9288/ISR9096 Voltaire sLB-24'
SW     4  4 0x0008f104003f0838 4x SDR 'ISR9288/ISR9096 Voltaire sLB-24'
SW     4  1 0x0008f104003f0838 4x SDR -> SW     6  3 0x0008f104004005f5 ( 'ISR9288/ISR9096 Voltaire sLB-24' - 'ISR9288 Voltaire sFB-12' )
SW     4  2 0x0008f104003f0838 4x SDR -> SW     7  3 0x0008f104004005f6 ( 'ISR9288/ISR9096 Voltaire sLB-24' - 'ISR9288 Voltaire sFB-12' )
SW     4  3 0x0008f104003f0838 4x SDR -> SW     1  3 0x0008f104004005f7 ( 'ISR9288/ISR9096 Voltaire sLB-24' - 'ISR9288 Voltaire sFB-12' )
SW     4 10 0x0008f104003f0838 4x SDR -> SW     8  3 0x0008f104004006f5 ( 'ISR9288/ISR9096 Voltaire sLB-24' - 'ISR9288 Voltaire sFB-12' )
SW     4 11 0x0008f104003f0838 4x SDR -> SW     9  3 0x0008f104004006f6 ( 'ISR9288/ISR9096 Voltaire sLB-24' - 'ISR9288 Voltaire sFB-12' )
SW     4 12 0x0008f104003f0838 4x SDR -> SW    10  3 0x0008f104004006f7 ( 'ISR9288/ISR9096 Voltaire sLB-24' - 'ISR9288 Voltaire sFB-12' )
CA    14  1 0x0008f10403960091 4x SDR -> SW     4 20 0x0008f104003f0838 ( 'Voltaire HCA400' - 'ISR9288/ISR9096 Voltaire sLB-24' )
CA    11  1 0x0002c90107a4e431 4x SDR -> SW     4 19 0x0008f104003f0838 ( 'Voltaire HCA400' - 'ISR9288/ISR9096 Voltaire sLB-24' )
CA     2  1 0x0008f1000102d801 4x SDR -> SW     1 15 0x0008f104004005f7 ( 'Voltaire IB-to-TCP/IP Router' - 'ISR9288 Voltaire

Thanks,
Erez Strauss
Voltaire.

-
Date: Thu Dec 20 19:36:14 2007 -0500

    Added the -p(orts) option, to generate ports reports

    Signed-off-by: Erez Strauss <erezs _at_ voltaire.com>

---
 infiniband-diags/src/ibnetdiscover.c | 64 ++++++++++++++++++++++++++++---
 1 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/infiniband-diags/src/ibnetdiscover.c b/infiniband-diags/src/ibnetdiscover.c
index 8b229c1..3c2e6b6 100644
--- a/infiniband-diags/src/ibnetdiscover.c
+++ b/infiniband-diags/src/ibnetdiscover.c
@@ -119,6 +119,17 @@ get_linkspeed_str(int linkspeed)
 	return linkspeed_str[linkspeed];
 }
 
+static inline const char*
+node_type_str2(Node *node)
+{
+	switch (node->type) {
+	case SWITCH_NODE: return "SW";
+	case CA_NODE:     return "CA";
+	case ROUTER_NODE: return "RT";
+	}
+	return "??";
+}
+
 int
 get_port(Port *port, int portnum, ib_portid_t *portid)
 {
@@ -839,11 +850,50 @@ dump_topology(int listtype, int group)
 	return i;
 }
 
+void dump_ports_report()
+{
+	int b, n = 0, p;
+	Node *node;
+	Port *port;
+
+	// If switch and LID == 0, search of other switch ports with valid LID
+	// and assign it to all ports of that switch
+	for (b = 0; b <= MAXHOPS; b++)
+		for (node = nodesdist[b]; node; node = node->dnext)
+			if (node->type == SWITCH_NODE) {
+				int swlid = 0;
+				for (p = 0, port = node->ports;
+				     p < node->numports && port && !swlid;
+				     port = port->next)
+					if (port->lid != 0)
+						swlid = port->lid;
+				for (p = 0, port = node->ports;
+				     p < node->numports && port;
+				     port = port->next)
+					port->lid = swlid;
+			}
+	for (b = 0; b <= MAXHOPS; b++)
+		for (node = nodesdist[b]; node; node = node->dnext) {
+			for (p = 0, port = node->ports;
+			     p < node->numports && port;
+			     p++, port = port->next) {
+				fprintf(stdout, "%2s %5d %2d 0x%016llx %s %s",
+					node_type_str2(port->node), port->lid,
+					port->portnum,
+					(unsigned long long)port->portguid,
+					get_linkwidth_str(port->linkwidth),
+					get_linkspeed_str(port->linkspeed));
+				if (port->remoteport)
+					fprintf(stdout, " -> %2s %5d %2d 0x%016llx ( '%s' -
Re: [ofa-general] [PATCH] [RFC] IPOIB/CM Enable SRQ support on HCAs with less than 16 SG entries
Good points. I will incorporate your comments.

Roland Dreier wrote:
>> +static int num_of_frags;
>> +int max_cm_mtu;
>
> I think these values need to be per-interface -- think of the case of a system with more than one type of HCA installed, where the different HCAs have different limits.
>
>> @@ -623,6 +626,7 @@ repost:
>>  			--p->recv_count;
>>  			ipoib_warn(priv, "ipoib_cm_post_receive_nonsrq failed for buf %d\n", wr_id);
>> +			kfree(mapping); /*** Check if this needed ***/
>
> This looks really bogus -- I don't see anything in your patch that changes mapping from being allocated on the stack.

Right, as the comment illustrates, it is a holdover from something else and slipped into the patch.

Pradeep
[ofa-general] Re: iommu dma mapping alignment requirements
Benjamin Herrenschmidt wrote:
>> Sounds good. Thanks!
>>
>> Note that these smaller sub-host-page-sized mappings might pollute the address space, causing fully aligned host-page-size maps to become scarce... Maybe there's a clever way to keep those in their own segment of the address space?
>
> We already have a large vs. small split in the iommu virtual space to alleviate this (though it's not a hard constraint, we can still get into the other side if the default one is full). Try that patch and let me know:

Seems to be working! :)

> Index: linux-work/arch/powerpc/kernel/iommu.c
> ===================================================================
> --- linux-work.orig/arch/powerpc/kernel/iommu.c	2007-12-21 10:39:39.0 +1100
> +++ linux-work/arch/powerpc/kernel/iommu.c	2007-12-21 10:46:18.0 +1100
> @@ -278,6 +278,7 @@ int iommu_map_sg(struct iommu_table *tbl
>  	unsigned long flags;
>  	struct scatterlist *s, *outs, *segstart;
>  	int outcount, incount, i;
> +	unsigned int align;
>  	unsigned long handle;
>  
>  	BUG_ON(direction == DMA_NONE);
> @@ -309,7 +310,11 @@ int iommu_map_sg(struct iommu_table *tbl
>  		/* Allocate iommu entries for that segment */
>  		vaddr = (unsigned long) sg_virt(s);
>  		npages = iommu_num_pages(vaddr, slen);
> -		entry = iommu_range_alloc(tbl, npages, &handle, mask >> IOMMU_PAGE_SHIFT, 0);
> +		align = 0;
> +		if (IOMMU_PAGE_SHIFT < PAGE_SHIFT && (vaddr & ~PAGE_MASK) == 0)
> +			align = PAGE_SHIFT - IOMMU_PAGE_SHIFT;
> +		entry = iommu_range_alloc(tbl, npages, &handle,
> +					  mask >> IOMMU_PAGE_SHIFT, align);
>  
>  		DBG("  - vaddr: %lx, size: %lx\n", vaddr, slen);
> @@ -572,7 +577,7 @@ dma_addr_t iommu_map_single(struct iommu
>  {
>  	dma_addr_t dma_handle = DMA_ERROR_CODE;
>  	unsigned long uaddr;
> -	unsigned int npages;
> +	unsigned int npages, align;
>  
>  	BUG_ON(direction == DMA_NONE);
>  
> @@ -580,8 +585,13 @@ dma_addr_t iommu_map_single(struct iommu
>  	npages = iommu_num_pages(uaddr, size);
>  
>  	if (tbl) {
> +		align = 0;
> +		if (IOMMU_PAGE_SHIFT < PAGE_SHIFT &&
> +		    ((unsigned long)vaddr & ~PAGE_MASK) == 0)
> +			align = PAGE_SHIFT - IOMMU_PAGE_SHIFT;
> +
>  		dma_handle = iommu_alloc(tbl, vaddr, npages, direction,
> -					 mask >> IOMMU_PAGE_SHIFT, 0);
> +					 mask >> IOMMU_PAGE_SHIFT, align);
>  		if (dma_handle == DMA_ERROR_CODE) {
>  			if (printk_ratelimit()) {
>  				printk(KERN_INFO "iommu_alloc failed,
[ofa-general] nightly osm_sim report 2007-12-21:normal completion
OSM Simulation Regression Summary

[Generated mail - please do NOT reply]

OpenSM binary date = 2007-12-20
OpenSM git rev = Mon_Dec_17_15:20:43_2007 [9988f459cb81dd025bde8b2dd53b3c551616be0c]
ibutils git rev = Wed_Dec_19_12:06:28_2007 [9961475294fbf1d3782edb8f377a77b13fa80d70]

Total=560 Pass=559 Fail=1

Pass:
42 Stability IS1-16.topo
42 Pkey IS1-16.topo
42 OsmTest IS1-16.topo
42 OsmStress IS1-16.topo
42 Multicast IS1-16.topo
42 LidMgr IS1-16.topo
14 Stability IS3-loop.topo
14 Stability IS3-128.topo
14 Pkey IS3-128.topo
14 OsmTest IS3-loop.topo
14 OsmTest IS3-128.topo
14 OsmStress IS3-128.topo
14 Multicast IS3-loop.topo
14 Multicast IS3-128.topo
14 FatTree merge-roots-4-ary-2-tree.topo
14 FatTree merge-root-4-ary-3-tree.topo
14 FatTree gnu-stallion-64.topo
14 FatTree blend-4-ary-2-tree.topo
14 FatTree RhinoDDR.topo
14 FatTree FullGnu.topo
14 FatTree 4-ary-2-tree.topo
14 FatTree 2-ary-4-tree.topo
14 FatTree 12-node-spaced.topo
14 FTreeFail 4-ary-2-tree-missing-sw-link.topo
14 FTreeFail 4-ary-2-tree-links-at-same-rank-2.topo
14 FTreeFail 4-ary-2-tree-links-at-same-rank-1.topo
14 FTreeFail 4-ary-2-tree-diff-num-pgroups.topo
13 LidMgr IS3-128.topo

Failures:
1 LidMgr IS3-128.topo
[ofa-general] Re: iommu dma mapping alignment requirements
BTW, I need to know urgently what HW is broken by this.

Ben.