Re: [ofa-general] [PATCH] IB/ipoib: ignore membership bit when looking for a P_Key in the table
Hal Rosenstock wrote:
> On 7/23/07, Moni Shoua [EMAIL PROTECTED] wrote:
>> Hal Rosenstock wrote:
>>>> -	if (pkey == tmp_pkey) {
>>>> +	if ((pkey & 0x7fff) == (tmp_pkey & 0x7fff)) {
>>>
>>> Wouldn't this allow 2 limited PKeys to match though ?
>>
>> Hi Hal,
>> Can you please explain what do you mean? Perhaps by example?
>
> Two Pkeys which have their full membership bit off (0x8000).
> Two limited members are not allowed to talk with each other.

Hal,

ib_find_pkey() is the buddy of ib_find_cached_pkey(), which has been in
the stack from day one. Now, ib_find_cached_pkey() does some abstraction
where it masks out the membership bit, so pkeys are matched in 15 bit
fashion.

Indeed, the overall design of the IB stack wrt partial membership in a
partition is neither perfect nor final. I don't see why this masking off
makes things worse than they could have been without it.

As you know, as some changes need to be done in the IB spec and the
IPoIB RFC, I am personally holding off with suggesting changes/fixes
till the spec is done; this is per the approach expressed by you and
Sean.

Or.

___
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
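The two positions in the exchange above can be put side by side in a small sketch. It is illustrative only: the function and macro names are hypothetical and are not the kernel's ib_find_pkey() or ib_find_cached_pkey(). The first helper is the 15-bit match the patch performs; the second is the stricter rule Hal describes, where two limited members of the same partition must not communicate.

```c
#include <stdbool.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u /* membership bit: set means full member */
#define PKEY_BASE_MASK   0x7fffu /* low 15 bits: the partition itself */

/* 15-bit comparison, as in the patch: the membership bit is ignored. */
bool pkey_base_match(uint16_t a, uint16_t b)
{
	return (a & PKEY_BASE_MASK) == (b & PKEY_BASE_MASK);
}

/*
 * Hypothetical stricter check: same partition AND at least one side is
 * a full member. Two limited members never pass.
 */
bool pkey_may_communicate(uint16_t a, uint16_t b)
{
	return pkey_base_match(a, b) && ((a | b) & PKEY_FULL_MEMBER) != 0;
}
```

With P_Keys 0x0001 and 0x0001 (both limited), the first helper reports a match while the second does not, which is the case Hal is asking about.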
[ofa-general] ofa_1_2_kernel 20070724-0100 daily build status
This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_2/linux-2.6.git
git_branch: ofed_1_2

Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.15
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.12
Passed on x86_64 with linux-2.6.12
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.14
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18
Passed on powerpc with linux-2.6.14
Passed on x86_64 with linux-2.6.13
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.13
Passed on ia64 with linux-2.6.21.1
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.13
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.14
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on ppc64 with linux-2.6.18-8.el5
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.18-8.el5

Failed:
Re: [ofa-general] Re: IPoIB path caching
Sean Hefty wrote:
>> What I have in mind is that IPoIB must not use cached IB path info.
>> If the IB stack has path caching which is in the default flow of
>> requesting a path record, it should provide an API (eg flag to the
>> function through which one does path query) to request a non cached
>> path.
>
> Argh! This was the original design. I believe the current design is
> a better approach. The ULP shouldn't care whether the PR is cached or
> not - only that it's usable.

Linux has a quite sophisticated mechanism to maintain / cache / probe /
invalidate / update the network stack L2 neighbour info. Stating that,
although the neighbour cache state machine decided to update/delete a
neighbour, it is just correct by design for IPoIB to use cached IB L2
info is somehow moving too fast I think; some discussion is needed here.

My basic thought is that for IPoIB it's better to never use a cached
path than to always use a cached path. But! maybe there's a way in the
middle here, let's think. This is what I was referring to when saying
almost always.

For example, in the Voltaire gen1 stack we had an ib arp module which
was used by both IPoIB and native IB ULPs (SDP, iSER, Lustre, etc). This
module managed some sort of path cache, where IPoIB was always asking
for a non-cached path and other ULPs were willing to get a cached path.

>> The design I was thinking to suggest for IPoIB is to almost always
>> use this API since this policy makes the implementation consistent
>> with the decisions made by the network stack neighbour cache
>
> This defeats one of the benefits of caching, which is using a single
> GetTable query, versus literally hundreds or thousands of Get queries.
> Consider that constant all-to-all communication using IPoIB between
> 1024 ports, with a 15 minute ARP table timeout would hit the SA with
> close to 600 queries per second.

If the cache comes to serve all-to-all MPI jobs, and practically with
IB, to get MPI performance (specifically latency) people would --not--
be using IPoIB for their MPI jobs since they want kernel AND net-stack
bypass, it does make sense to use a non-cached path in IPoIB if we agree
that design-wise it's the correct approach.

> While I agree that there's the potential for a problem, given that
> IPoIB has always cached PRs and no one has reported problems, I think
> we're overstating the likelihood of issues occurring in practice.
> Even the SA caches the path data -- getting a PR from the SA doesn't
> provide any additional guarantees.

I am not with you... I would expect an SA implementation to invalidate /
recompute the relevant data structures associated with each change in
the fabric and get a trap for each change.

Or.
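Sean's "close to 600 queries per second" figure is not derived in the thread. One reading that reproduces it, offered here as an assumption, is one path query per unordered pair of ports per ARP timeout interval. The helper name below is hypothetical.

```c
/*
 * Rough SA load estimate, assuming one path query per unordered pair of
 * ports every timeout_sec seconds. This is an interpretation of the
 * figure quoted in the thread, not a formula taken from it.
 */
double sa_query_rate(unsigned int ports, double timeout_sec)
{
	double pairs;

	if (ports < 2 || timeout_sec <= 0.0)
		return 0.0;

	pairs = (double)ports * (double)(ports - 1) / 2.0;
	return pairs / timeout_sec;
}
```

For 1024 ports and a 15 minute (900 second) timeout this gives 523776 / 900, roughly 582 queries per second. Counting each direction separately would double that.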
[ofa-general] Re: opensm: a bug in heavy sweep? - no LFT re-configuration
On 7/23/07, Sasha Khapyorsky [EMAIL PROTECTED] wrote:
> Hi Eitan,
>
> On 20:59 Mon 23 Jul , Eitan Zahavi wrote:
>> Hi Sasha, Hal,
>>
>> I think I have an idea:
>> Since this is a specific switch that reported ChangeBit or Trap why
>> can't we just qualify that there was no change in the switch setup?
>
> The ChangeBit seems to be good start point - then OpenSM will query
> all switch ports PortInfo anyway and if for all ports PortState is
> <= INIT (and at least for one port it is = INIT), it means that this
> switch was rebooted/reinitialized. And for single port PortState drop
> to <= INIT should indicate reinitialization. Seems correct?

Wouldn't this be all ports in INIT indicate reset of switch ?

-- Hal

>> We could send PortInfo, SwitchInfo,
>
> SwitchInfo is queried at each light sweep, PortInfo's if ChangeBit is
> set. Guess we are ok with it even now.
>
>> LFT, MFT, SL2VL, VLArb, PKey queries and make sure no change from
>> previous state. Or we could simply enforce last state by sending it
>> over again ...
>
> I think we could want to re-read PKey tables in order to preserve
> existing PKey indices and just to flush (overwrite with new settings)
> LFT, MFT, SL2VL, VLArb tables. Reasonable?
>
> Sasha
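Sasha's heuristic can be sketched as a small predicate. This assumes the condition is "no port is above INIT and at least one port is at INIT"; the enum and function names are hypothetical and are not OpenSM code. The numeric values follow the usual IB PortState encoding (Down = 1, Initialize = 2, Armed = 3, Active = 4).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; values follow the IB PortState encoding. */
enum port_state {
	PORT_DOWN   = 1,
	PORT_INIT   = 2,
	PORT_ARMED  = 3,
	PORT_ACTIVE = 4
};

/*
 * Returns true when every port is DOWN or INIT and at least one port is
 * in INIT, i.e. the switch looks like it was rebooted/reinitialized.
 */
bool switch_looks_reinitialized(const enum port_state *states, size_t n)
{
	bool saw_init = false;
	size_t i;

	for (i = 0; i < n; ++i) {
		if (states[i] > PORT_INIT)
			return false;	/* a port survived in ARMED/ACTIVE */
		if (states[i] == PORT_INIT)
			saw_init = true;
	}
	return saw_init;
}
```

Hal's variant ("all ports in INIT") would be stricter: it would also return false when some ports are DOWN, which is common for unconnected ports.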
[ofa-general] [PATCH] libibmad: Fixed a name of a field in SwitchInfo to the right name
Fixed a name of a field in SwitchInfo to the right name.

Signed-off-by: Dotan Barak [EMAIL PROTECTED]

---

Index: connectx_user/src/userspace/management/libibmad/src/fields.c
===================================================================
--- connectx_user.orig/src/userspace/management/libibmad/src/fields.c	2007-07-22 16:34:02.0 +0300
+++ connectx_user/src/userspace/management/libibmad/src/fields.c	2007-07-24 13:58:41.0 +0300
@@ -193,7 +193,7 @@ ib_field_t ib_mad_f [] = {
 	[IB_SW_PARTITION_ENF_INB_F]	{BITSOFFS(128, 1), "InboundPartEnf", mad_dump_uint},
 	[IB_SW_PARTITION_ENF_OUTB_F]	{BITSOFFS(129, 1), "OutboundPartEnf", mad_dump_uint},
 	[IB_SW_FILTER_RAW_INB_F]	{BITSOFFS(130, 1), "FilterRawInbound", mad_dump_uint},
-	[IB_SW_FILTER_RAW_OUTB_F]	{BITSOFFS(131, 1), "FilterRawInbound", mad_dump_uint},
+	[IB_SW_FILTER_RAW_OUTB_F]	{BITSOFFS(131, 1), "FilterRawOutbound", mad_dump_uint},
 	[IB_SW_ENHANCED_PORT0_F]	{BITSOFFS(132, 1), "EnhancedPort0", mad_dump_uint},
 
 	/*
[ofa-general] Bug in inline sends with sge_num 0 in libmlx4
Hi,

There is a bug in mlx4_post_send(). Data that is sent inline and
consists of multiple small sges isn't copied properly into the wqe.
The following patch fixes it for me.

Signed-off-by: Gleb Natapov [EMAIL PROTECTED]

diff --git a/src/qp.c b/src/qp.c
index 66ee309..83a4fd4 100644
--- a/src/qp.c
+++ b/src/qp.c
@@ -288,6 +288,7 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
 				memcpy(wqe, addr, len);
 				wqe += len;
 				seg_len += len;
+				off += len;
 			}
 
 			if (seg_len) {

--
	Gleb.
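Why the one-line fix matters can be seen in a simplified model of the copy loop. The sketch below is not the libmlx4 code: it drops the inline segment headers and memory barriers, uses a made-up 16-byte chunk size, and all names are hypothetical. What it keeps is the role of off, the number of bytes already used in the current chunk, which has to advance after the tail copy so that the next small sge knows where the chunk boundary is.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 16 /* hypothetical alignment unit, not MLX4_INLINE_ALIGN */

struct sge {
	const void *addr;
	size_t length;
};

/*
 * Copies all sges back to back into dst and counts how many chunk
 * boundaries were crossed. Returns the total number of bytes copied.
 */
size_t gather_inline(unsigned char *dst, const struct sge *sg, int num_sge,
		     size_t *chunks_filled)
{
	size_t off = 0;		/* bytes already used in the current chunk */
	size_t total = 0;
	size_t full = 0;
	int i;

	for (i = 0; i < num_sge; ++i) {
		const unsigned char *addr = sg[i].addr;
		size_t len = sg[i].length;

		/* Fill up the current chunk while the data reaches its end. */
		while (len >= CHUNK - off) {
			size_t to_copy = CHUNK - off;

			memcpy(dst, addr, to_copy);
			dst += to_copy;
			addr += to_copy;
			len -= to_copy;
			total += to_copy;
			off = 0;	/* a fresh chunk starts here */
			++full;
		}

		/* Tail of this sge stays in the partially filled chunk. */
		memcpy(dst, addr, len);
		dst += len;
		total += len;
		off += len;	/* the step the patch adds */
	}

	*chunks_filled = full;
	return total;
}
```

With three 6-byte sges, the third one crosses the 16-byte boundary only if off has been advanced to 12 by the first two. Without the last assignment, off stays 0 and the boundary is never noticed.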
RE: [ofa-general] RE: OFA website edits
> I would like to propose adding project directories under
> http://www.openfabrics.org/downloads/ where appropriate and give
> maintainers access. For example:
>
> http://www.openfabrics.org/downloads/verbs (rdreier)
> http://www.openfabrics.org/downloads/rdmacm (shefty)
> http://www.openfabrics.org/downloads/dapl (ardavis)
> http://www.openfabrics.org/downloads/management (sashak)
> http://www.openfabrics.org/downloads/OFED (vlad)
> http://www.openfabrics.org/downloads/WinOF (ardavis)
> http://www.openfabrics.org/downloads/archives (vlad) ??
> etc...
>
> Each of these would contain a README that details the contents of the
> directory along with WEB_README that provides a short description for
> the webpage. Jeff could then automatically parse for directories under
> downloads and if it contains WEB_README add a webpage link to the
> directory along with the short description.

Looks good to me.

Regards,
Vladimir
Re: [ofa-general] Bug in inline sends with sge_num 0 in libmlx4
On Tuesday 24 July 2007 15:14, Gleb Natapov wrote:
> Hi,
>
> There is a bug in mlx4_post_send(). A data that is sent inline and
> consists from multiple small sges isn't copied properly into wqe.
> The following patch fixes it for me.
>
> Signed-off-by: Gleb Natapov [EMAIL PROTECTED]
>
> diff --git a/src/qp.c b/src/qp.c
> index 66ee309..83a4fd4 100644
> --- a/src/qp.c
> +++ b/src/qp.c
> @@ -288,6 +288,7 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
>  				memcpy(wqe, addr, len);
>  				wqe += len;
>  				seg_len += len;
> +				off += len;
>  			}
>
>  			if (seg_len) {

Good catch! This patch is correct.

Roland?

- Jack
[ofa-general] Re: [PATCH] libibmad: Fixed a name of a field in SwitchInfo to the right name
On 14:32 Tue 24 Jul , Dotan Barak wrote:
> Fixed a name of a field in SwitchInfo to the right name.
>
> Signed-off-by: Dotan Barak [EMAIL PROTECTED]

Applied. Thanks.

Sasha
[ofa-general] OpenSM detection of duplicated GUIDs on loopback
Hi,

This is what starts off as a minor issue and I know it has been
discussed somewhat in the past:

Putting a loopback connector on a (switch) link causes OpenSM to
indicate duplicated GUID error 0D18 as follows:

__osm_ni_rcv_set_links
{
...
    /*
      When there are only two nodes with exact same guids (connected
      back to back) - the previous check for duplicated guid will not
      catch them. But the link will be from the port to itself...
      Enhanced Port 0 is an exception to this
    */
    if ((osm_node_get_node_guid( p_node ) == p_ni_context->node_guid) &&
        (port_num == p_ni_context->port_num) &&
        (port_num != 0))
    {
      osm_log( p_rcv->p_log, OSM_LOG_ERROR,
               "__osm_ni_rcv_set_links: ERR 0D18: "
               "Duplicate GUID found by link from a port to itself: "
               "node 0x%" PRIx64 ", port number 0x%X\n",
               cl_ntoh64( osm_node_get_node_guid( p_node ) ),
               port_num );
...

So this occurs over and over and over and fills the log with the same
spew. This should be improved IMO.

Is this really a fatal condition ? Doesn't seem like it should be to me.

Also, OpenSM can ride this out with -y (stay on fatal) but is that safe
for this condition ?

Seems like something like an extra loopback bit should be added to some
port structure which should cause these links to be ignored. This bit
would then be reset when the peer is no longer itself.

Also, is there a relationship of this with the 12x/duplicated GUID code ?

Thanks.

-- Hal
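The "extra loopback bit" Hal proposes could look roughly like the sketch below. It is only an illustration of the idea: the structure and function are hypothetical and do not correspond to any existing OpenSM type. The bit is set when a port's peer turns out to be the port itself, the condition is reported once instead of on every sweep, and the bit is cleared when the peer is no longer itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port record; not an OpenSM structure. */
struct port_link {
	uint64_t node_guid;
	uint8_t port_num;
	bool loopback;	/* the proposed extra loopback bit */
};

/*
 * Records what was discovered at the far end of a link. Returns true
 * only the first time a loopback is seen, so the caller can log once.
 * Port 0 is excluded, matching the Enhanced Port 0 exception.
 */
bool note_peer(struct port_link *p, uint64_t peer_guid, uint8_t peer_port)
{
	bool self = p->port_num != 0 &&
		    peer_guid == p->node_guid &&
		    peer_port == p->port_num;
	bool first_time = self && !p->loopback;

	p->loopback = self;	/* reset when the peer is no longer itself */
	return first_time;
}
```

A caller would log ERR 0D18 (or a lower-severity message) only when note_peer() returns true, and skip the link while p->loopback is set.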
[ofa-general] RE: OpenSM detection of duplicated GUIDs on loopback
Hi Hal,

What is this loopback connector used for?
Does not seem to me like a very useful thing to do.

Anyway, if it is not a production environment we could add a debug mode
(-d flag option) to ignore this check.

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 24, 2007 5:31 PM
To: OpenFabrics General
Cc: Sasha Khapyorsky; Eitan Zahavi; Yevgeny Kliteynik
Subject: OpenSM detection of duplicated GUIDs on loopback

[...]
[ofa-general] Re: OpenSM detection of duplicated GUIDs on loopback
Hi Eitan,

On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:
> Hi Hal,
>
> What is this loopback connector used for?
> Does not seem to me like a very useful thing to do.

Perhaps not but no reason OpenSM can't handle this more gracefully.

> Anyway, if it is not a production environment we could add a debug
> mode (-d flag option) to ignore this check.

Why would a separate flag be needed ?

-- Hal

> Eitan Zahavi
> Senior Engineering Director, Software Architect
> Mellanox Technologies LTD
>
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 24, 2007 5:31 PM
> To: OpenFabrics General
> Cc: Sasha Khapyorsky; Eitan Zahavi; Yevgeny Kliteynik
> Subject: OpenSM detection of duplicated GUIDs on loopback
>
> [...]
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
hi michael, ...

On Tue, Jul 24, 2007 at 06:03:41AM +0300, Michael S. Tsirkin wrote:
> [...]
> But I also see a serious problem with addressing: basically
> git tracks content. It's not designed to track a bush of
> branches taken together. For example, take tagging: tag
> namespace is global, so you can not have the same tag point
> at multiple branches at the same time.

agreed. however, the way we use git, with the location
of the git DB as the tag, it's not really a problem in
practice. but tagging each branch separately is indeed
a PITA...

>> anyway, what do you think? is there any way i could
>> convince you to dump the backport patches and put
>> all the backports in branches? i'm willing to do the
>> legwork if you see value...
>
> Can you publish the scripts and/or the tree?
> I think we can start by just running the scripts nightly, making
> it possible for people to view backport history with gitview.

i've attached the script that i'm using to compare
the trees, but it's a total hack. it doesn't keep
the patch history. that would not be too hard to
do i guess -- if there's interest...

to run the script:

cp attached files here...
$ git clone git://git.openfabrics.org/~mst/ofed_kernel.git ofed_kernel
$ cd ofed_kernel
$ for b in `cat ../ofed-backports.txt`; do ../create-backport.sh $b; done

now you'll have a bunch of backport-2.6.xxx branches...

arthur

2.6.5_sles9_sp3
2.6.9_U2
2.6.9_U3
2.6.9_U4
2.6.9_U5
2.6.11_FC4
2.6.11
2.6.12
2.6.13_suse10_0_u
2.6.13
2.6.14
2.6.15_ubuntu606
2.6.15
2.6.16_sles10
2.6.16_sles10_sp1
2.6.16
2.6.17
2.6.18_FC6
2.6.18
2.6.19
2.6.20
2.6.21
2.6.22

create-backport.sh
Description: Bourne shell script
[ofa-general] RE: OpenSM detection of duplicated GUIDs on loopback
From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 24, 2007 5:53 PM
To: Eitan Zahavi
Cc: OpenFabrics General; Sasha Khapyorsky; Yevgeny Kliteynik
Subject: Re: OpenSM detection of duplicated GUIDs on loopback

Hi Eitan,

On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:
> Hi Hal,
>
> What is this loopback connector used for?
> Does not seem to me like a very useful thing to do.

Perhaps not but no reason OpenSM can't handle this more gracefully.

> Anyway, if it is not a production environment we could add a debug
> mode (-d flag option) to ignore this check.

Why would a separate flag be needed ?

[EZ] Since I do not see any other solution for the SM to know it is
really a loopback plug rather than two devices with the same GUID
connected back to back ...

-- Hal

[...]
[ofa-general] Re: OpenSM detection of duplicated GUIDs on loopback
On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 24, 2007 5:53 PM
> To: Eitan Zahavi
> Cc: OpenFabrics General; Sasha Khapyorsky; Yevgeny Kliteynik
> Subject: Re: OpenSM detection of duplicated GUIDs on loopback
>
> Hi Eitan,
>
> On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:
>> Hi Hal,
>>
>> What is this loopback connector used for?
>> Does not seem to me like a very useful thing to do.
>
> Perhaps not but no reason OpenSM can't handle this more gracefully.
>
>> Anyway, if it is not a production environment we could add a debug
>> mode (-d flag option) to ignore this check.
>
> Why would a separate flag be needed ?
>
> [EZ] Since I do not see any other solution for the SM to know it is
> really a loop back plug rather then two devices with same GUID
> connected back to back ...

Technically, this should only occur when looped back and not two devices
with same GUID as GUID == globally unique and a duplication indicates a
manufacturing issue. Anyhow, can't these be treated the same (and
handled more gracefully) without an additional option/flag ?

-- Hal

> [...]
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
Quoting Arthur Jones [EMAIL PROTECTED]:
Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

> hi michael, ...
>
> On Tue, Jul 24, 2007 at 06:03:41AM +0300, Michael S. Tsirkin wrote:
>> [...]
>> But I also see a serious problem with addressing: basically
>> git tracks content. It's not designed to track a bush of
>> branches taken together. For example, take tagging: tag
>> namespace is global, so you can not have the same tag point
>> at multiple branches at the same time.
>
> agreed. however, the way we use git, with the location
> of the git DB as the tag, it's not really a problem in
> practice.

who uses git this way?

> but tagging each branch separately is indeed
> a PITA...

This is just one problem. For example, git pull can only merge one
branch at a time.

>>> anyway, what do you think? is there any way i could
>>> convince you to dump the backport patches and put
>>> all the backports in branches? i'm willing to do the
>>> legwork if you see value...
>>
>> can you publish the scripts and/or the tree?
>> i think we can start by just running the scripts nightly, making
>> it possible for people to view backport history with gitview.
>
> i've attached the script that i'm using to compare
> the trees, but it's a total hack. it doesn't keep
> the patch history. that would not be too hard to
> do i guess -- if there's interest...
>
> to run the script:
>
> cp attached files here...
> $ git clone git://git.openfabrics.org/~mst/ofed_kernel.git ofed_kernel
> $ cd ofed_kernel
> $ for b in `cat ../ofed-backports.txt`; do ../create-backport.sh $b; done
>
> now you'll have a bunch of backport-2.6.xxx branches...

So, would you like to have this script run nightly on ofed trees?

--
MST
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
hi michael, ...

On Tue, Jul 24, 2007 at 06:09:09PM +0300, Michael S. Tsirkin wrote:
> Quoting Arthur Jones [EMAIL PROTECTED]:
> Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
>
>> hi michael, ...
>>
>> On Tue, Jul 24, 2007 at 06:03:41AM +0300, Michael S. Tsirkin wrote:
>>> [...]
>>> But I also see a serious problem with addressing: basically
>>> git tracks content. It's not designed to track a bush of
>>> branches taken together. For example, take tagging: tag
>>> namespace is global, so you can not have the same tag point
>>> at multiple branches at the same time.
>>
>> agreed. however, the way we use git, with the location
>> of the git DB as the tag, it's not really a problem in
>> practice.
>
> who uses git this way?

i do.

>> but tagging each branch separately is indeed
>> a PITA...
>
> This is just one problem. For example, git pull can only merge one
> branch at a time.

how is this a problem? the way i use git, i use a
script to reflow the changes into the dependent
branches. over the last few months, anyway, it
has worked fine...

>>>> anyway, what do you think? is there any way i could
>>>> convince you to dump the backport patches and put
>>>> all the backports in branches? i'm willing to do the
>>>> legwork if you see value...
>>>
>>> can you publish the scripts and/or the tree?
>>> i think we can start by just running the scripts nightly, making
>>> it possible for people to view backport history with gitview.
>>
>> i've attached the script that i'm using to compare
>> the trees, but it's a total hack. it doesn't keep
>> the patch history. that would not be too hard to
>> do i guess -- if there's interest...
>>
>> to run the script:
>>
>> cp attached files here...
>> $ git clone git://git.openfabrics.org/~mst/ofed_kernel.git ofed_kernel
>> $ cd ofed_kernel
>> $ for b in `cat ../ofed-backports.txt`; do ../create-backport.sh $b; done
>>
>> now you'll have a bunch of backport-2.6.xxx branches...
>
> So, would you like to have this script run nightly on ofed trees?

if someone finds that useful. my main motivation
is getting rid of all the patches in ofed, if running
this script nightly helps us to get there, then i'm
all for it. if it's just for me, it's easy enough to
run the scripts by hand...

arthur
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
hi michael, ...

On Tue, Jul 24, 2007 at 06:32:28PM +0300, Michael S. Tsirkin wrote:
> [...]
>>> For example, git pull can only merge one branch at a time.
>>
>> how is this a problem? the way i use git, i use a
>> script to reflow the changes into the dependent
>> branches. over the last few months, anyway, it
>> has worked fine...
>
> Precisely because no one developed on these branches, so you are
> re-generating them from patches - not a problem, but as you point out
> not too useful either.

well, no, i _have_ been doing development on the local
branches in our internal repo. i also merge in changes
that you make to the ofed repo to our internal backport
branches. the script i posted is just so that i can
more easily compare our internal branches to the
ofed backport branches.

> If people start developing on these branches, then eventually you will
> need to merge them - and git only merges them one at a time.

yes, i have to merge them one at a time. i still don't
see how this is a problem. backport changes can be
pulled in and the changes from upstream can be merged
in as well. i haven't had a problem with this so far.
can you be more specific about what you expect will
fail?

arthur
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
Quoting Arthur Jones [EMAIL PROTECTED]:
Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

> hi michael, ...
>
> On Tue, Jul 24, 2007 at 06:09:09PM +0300, Michael S. Tsirkin wrote:
> > Quoting Arthur Jones [EMAIL PROTECTED]:
> > Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
> >
> > > hi michael, ...
> > >
> > > On Tue, Jul 24, 2007 at 06:03:41AM +0300, Michael S. Tsirkin wrote:
> > > > [...]
> > > > But I also see a serious problem with addressing: basically git tracks content. It's not designed to track a bush of branches taken together. For example, take tagging: tag namespace is global, so you can not have the same tag point at multiple branches at the same time.
> > >
> > > agreed. however, the way we use git, with the location of the git DB as the tag, it's not really a problem in practice.
> >
> > who uses git this way?
>
> i do. but tagging each branch separately is indeed a PITA...
>
> > > > This is just one problem. For example, git pull can only merge one branch at a time.
> > >
> > > how is this a problem? the way i use git, i use a script to reflow the changes into the dependent branches. over the last few months, anyway, it has worked fine...

Precisely because no one developed on these branches, so you are re-generating them from patches - not a problem, but as you point out not too useful either.

If people start developing on these branches, then eventually you will need to merge them - and git only merges them one at a time.

-- MST
Re: [ofa-general] Command specification of ca_name and ca_port
On Tue, 24 Jul 2007 04:33:06 +0300 Sasha Khapyorsky [EMAIL PROTECTED] wrote:

> Hi David,
>
> On 09:52 Mon 23 Jul , David McMillen wrote:
> > There is a standard set of command line options that allow specification of the CA to use for sending the requests. I'm adding these to programs that don't have them, since they are very useful when diagnosing a node connected to multiple subnets. Even if you discount multiple subnets on purpose, sometimes this happens when the hardware connecting all of the CA ports to the same place gets broken, and that is when you need diagnostics that can help figure out what is where.
> >
> > The standard options are:
> >
> >   -C ca_name     use the specified ca_name.
> >   -P ca_port     use the specified ca_port.
> >   -t timeout_ms  override the default timeout for the solicited mads.
> >
> > My problem is that saquery already uses -C and -P, although the -t exists for the expected purpose. Also, ibcheckerrs already uses -t for specifying the threshold file.
>
> I think unified command line options over diags are a good thing, so I guess reasonable renaming should be acceptable.

I agree, however right now saquery does not support specifying the ca_name or ca_port, so you would have to add that support.

> > Changing the timeout for ibcheckerrs isn't critical, but not being able to do it doesn't seem right. However, the saquery command could be really handy for figuring out split fabrics, and is useful to those of us that connect to multiple subnets. Does anybody have a useful suggestion?
>
> '-T' for the threshold file?

That sounds good.

> But it is easy part - saquery renames are less intuitive :(. Probably just lower case? Or special query option (-q or -Q), so queries could be specified as -qP, -qC?

I disagree with this because ~50% of the options are queries, its primary purpose is to query, and most of the other options change the format of the output of the query. Therefore, I don't think a -q should be required for a query. I think that seems redundant.

Perhaps just changing the current options to -c,-p, and adding -C and -P would be best. I know this might break some scripts out there, particularly mine, but I think it is the right thing to do if you really want consistency.

Thoughts?

Ira
RE: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
> at qlogic we now keep the backports as branches in our git tree and this, i find, is much easier to handle. because:
>
> * viewing and navigating backport source becomes _much_ easier.
> * merges are easier -- patches are much more fragile than branches.
> * comparisons are easier -- checking for differences between backports and between a backport and the canonical source is faster and more convenient...
> * changesets are readable. trying to decipher diffs to patches is medically proven to take months, if not years, off your life.

Let's add that you don't need patches to patches, and the order patches are applied isn't determined alphabetically.

> anyway, what do you think? is there anyway i could convince you to dump the backport patches and put all the backports in branches? i'm willing to do the legwork if you see value...

I would love OFED to dump the patch directory concept.

- Sean
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
hi michael, ...

On Tue, Jul 24, 2007 at 06:53:48PM +0300, Michael S. Tsirkin wrote:
> [...]
> > well, no, i _have_ been doing development on the local branches in our internal repo. i also merge in changes that you make to the ofed repo to our internal backport branches. the script i posted is just so that i can more easily compare our internal branches to the ofed backport branches.
>
> How do you do the merging?

for just the backport branches, i merge different ways from different sources:

* from upstream, it's a pull into master and a git merge master into local backport branches -- i call this a reflow.
* from local developers, it's a git pull straight into the backport branch, then reflow the repo.
* from ofed, i apply the backport patch by hand and fixup the inevitable clashes -- either because part of the patch is already applied, or because context has changed enough for git apply to get confused. when these are fixed up, reflow the repo...

> > > If people start developing on these branches, then eventually you will need to merge them - and git only merges them one at a time.
> >
> > yes, i have to merge them one at a time. i still don't see how this is a problem. backport changes can be pulled in and the changes from upstream can be merged in as well. i haven't had a problem with this so far. can you be more specific about what you expect will fail?
>
> Well, as distro maintainers we need to merge a lot, from different people. We'll have to write all kind of scripts to do it instead of a plain git pull.

i can't imagine what script you would need. can you be more specific? it would seem to me that you could just pull straight in to the backport branch...

> And, I expect almost all git operations will have to be wrapped in a script in some way, to operate on a bush of branches.

so far, this hasn't been an issue for me. the only operation that i've scripted is the reflow. for most work, i can just ignore the backport branches and do the work in the (copy of) master, then reflow the changes into the backports...

arthur
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
Quoting Sean Hefty [EMAIL PROTECTED]:
Subject: RE: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

> > at qlogic we now keep the backports as branches in our git tree and this, i find, is much easier to handle. because:
> >
> > * viewing and navigating backport source becomes _much_ easier.
> > * merges are easier -- patches are much more fragile than branches.
> > * comparisons are easier -- checking for differences between backports and between a backport and the canonical source is faster and more convenient...
> > * changesets are readable. trying to decipher diffs to patches is medically proven to take months, if not years, off your life.
>
> Let's add that you don't need patches to patches, and the order patches are applied isn't determined alphabetically.
>
> > anyway, what do you think? is there anyway i could convince you to dump the backport patches and put all the backports in branches? i'm willing to do the legwork if you see value...
>
> I would love OFED to dump the patch directory concept.

I'd love to have a common source for all kernels, and the kernel_addons mechanism does this for us whenever possible. But, for these cases where the code actually needs to be modified, applying a patch seems like the least evil way to do it. Alternatives seem to be much worse.

-- MST
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
Quoting Arthur Jones [EMAIL PROTECTED]:
Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

> hi michael, ...
>
> On Tue, Jul 24, 2007 at 06:53:48PM +0300, Michael S. Tsirkin wrote:
> > [...]
> > > well, no, i _have_ been doing development on the local branches in our internal repo. i also merge in changes that you make to the ofed repo to our internal backport branches. the script i posted is just so that i can more easily compare our internal branches to the ofed backport branches.
> >
> > How do you do the merging?
>
> for just the backport branches, i merge different ways from different sources:
>
> * from upstream, it's a pull into master and a git merge master into local backport branches -- i call this a reflow.
> * from local developers, it's a git pull straight into the backport branch, then reflow the repo.
> * from ofed, i apply the backport patch by hand and fixup the inevitable clashes -- either because part of the patch is already applied, or because context has changed enough for git apply to get confused. when these are fixed up, reflow the repo...

Hmm. Consider that you did all of the above, and then mail me that there's an update. Now I need to merge updates to multiple branches directly and git pull does not do this. It's a problem.

> > > > If people start developing on these branches, then eventually you will need to merge them - and git only merges them one at a time.
> > >
> > > yes, i have to merge them one at a time. i still don't see how this is a problem. backport changes can be pulled in and the changes from upstream can be merged in as well. i haven't had a problem with this so far. can you be more specific about what you expect will fail?
> >
> > Well, as distro maintainers we need to merge a lot, from different people. We'll have to write all kind of scripts to do it instead of a plain git pull.
>
> i can't imagine what script you would need. can you be more specific? it would seem to me that you could just pull straight in to the backport branch...

You'll have to check out branches one by one, and do a pull. What if there's a conflict? I currently just do git reset --hard ORIG_HEAD and mail the maintainer to fix it up - but this won't work with the bush of branches approach.

> > And, I expect almost all git operations will have to be wrapped in a script in some way, to operate on a bush of branches.
>
> so far, this hasn't been an issue for me. the only operation that i've scripted is the reflow. for most work, i can just ignore the backport branches and do the work in the (copy of) master, then reflow the changes into the backports...

Because you only have your driver to maintain.

-- MST
Re: [ofa-general] Re: IPoIB path caching
> Linux has a quite sophisticated mechanism to maintain / cache / probe / invalidate / update the network stack L2 neighbour info.

Path records are not just L2 info. They contain L4, L3, and L2 info together.

> For example, in the Voltaire gen1 stack we had an ib arp module which was used by both IPoIB and native IB ULPs (SDP, iSER, Lustre, etc). This module managed some sort of path cache, where IPoIB was always asking for non-cached path and other ULPs were willing to get cached path.

IMO, using a cached AH is no different than using a cached path. You're simply mapping the PR data into another structure.

We're ignoring the problem here, and that is that a centralized SA doesn't scale. MPI stacks have largely ignored this problem by simply not doing path record queries. Path information is often hard-coded, with QPN data exchanged out of band over sockets (often over Ethernet).

We've seen problems running large MPI jobs without PR caching. I know that Silverstorm/QLogic did as well. And apparently Voltaire hit the same type of problem, since you added a caching module. (Did Mellanox and Topspin/Cisco create PR caches as well?) At least three companies working on IB came up with the same solution.

What is the objection to the current patch set?

- Sean
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
hi michael, ...

On Tue, Jul 24, 2007 at 07:23:06PM +0300, Michael S. Tsirkin wrote:
> [...]
> > for just the backport branches, i merge different ways from different sources:
> >
> > * from upstream, it's a pull into master and a git merge master into local backport branches -- i call this a reflow.
> > * from local developers, it's a git pull straight into the backport branch, then reflow the repo.
> > * from ofed, i apply the backport patch by hand and fixup the inevitable clashes -- either because part of the patch is already applied, or because context has changed enough for git apply to get confused. when these are fixed up, reflow the repo...
>
> Hmm. Consider that you did all of the above, and then mail me that there's an update. Now I need to merge updates to multiple branches directly and git pull does not do this. It's a problem.

for changes made to the canonical source, it's just git pull into ofed_kernel and a reflow. for changes made to the backports, you would need to git checkout and git pull into each of the backport branches _in which i made a change_. the case that i make changes to _all_ or even a significant number of backport patches is sufficiently rare that i doubt it is worth scripting. but, if the script is necessary, it's pretty straightforward:

    set -e
    for b in branches-which-have-changed; do
        git checkout $b
        git pull remote $b
    done

> [...]
> > i can't imagine what script you would need. can you be more specific? it would seem to me that you could just pull straight in to the backport branch...
>
> You'll have to check out branches one by one, and do a pull. What if there's a conflict? I currently just do git reset --hard ORIG_HEAD and mail the maintainer to fix it up - but this won't work with the bush of branches approach.

it works for me. what do you expect will break?

> > > And, I expect almost all git operations will have to be wrapped in a script in some way, to operate on a bush of branches.
> >
> > so far, this hasn't been an issue for me. the only operation that i've scripted is the reflow. for most work, i can just ignore the backport branches and do the work in the (copy of) master, then reflow the changes into the backports...
>
> Because you only have your driver to maintain.

no, i have to maintain quite a few of the ofed backport branches as well for our release. if i started getting pull requests from people with changes to 15 backport branches in one go, i'd probably want to script it...

i have found that drawing a DAG with graphviz has been a big help in making sure that i organize the branches correctly...

arthur
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
hi michael, ...

On Tue, Jul 24, 2007 at 07:16:46PM +0300, Michael S. Tsirkin wrote:
> [...]
> But, for these cases where the code actually needs to be modified, applying a patch seems like the least evil way to do it. Alternatives seem to be much worse.

what is it about patches that are less evil than changesets? can you list some of the advantages?

arthur
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
> > Because you only have your driver to maintain.
>
> no, i have to maintain quite a few of the ofed backport branches as well for our release. if i started getting pull requests from people with changes to 15 backport branches in one go, i'd probably want to script it...

Yea. Happens all the time here: when component maintainer makes a change, it will typically affect all backports or none.

> i have found that drawing a DAG with graphviz has been a big help in making sure that i organize the branches correctly...

Ugh .. *that* sounds complicated. Looks like it's much simpler with current setup.

-- MST
[ofa-general] Re: OpenSM detection of duplicated GUIDs on loopback
Hi,

On 11:03 Tue 24 Jul , Hal Rosenstock wrote:
> On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:
> > From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, July 24, 2007 5:53 PM
> > To: Eitan Zahavi
> > Cc: OpenFabrics General; Sasha Khapyorsky; Yevgeny Kliteynik
> > Subject: Re: OpenSM detection of duplicated GUIDs on loopback
> >
> > Hi Eitan,
> >
> > On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:
> > > Hi Hal,
> > >
> > > What is this loopback connector used for?
> > > Does not seem to me like a very useful thing to do.
> >
> > Perhaps not but no reason OpenSM can't handle this more gracefully.

I don't have loopback plug, but used loopback connections for some checks with simulator. There is nothing illegal, so I think it would be better to support it.

> > > Anyway, if it is not a production environment we could add a debug mode (-d flag option) to ignore this check.
> >
> > Why would a separate flag be needed ?
> >
> > [EZ] Since I do not see any other solution for the SM to know it is really a loop back plug rather than two devices with same GUID connected back to back ...

Also we saw the cases when port moving triggers duplicated GUIDs detector (originally was reported on real fabric and it is trivially reproducible in simulated environment). So probably we need to find some better way to handle duplication GUID detector (in general, not just for loopback). For example node_info content could be compared. More ideas?

Sasha
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
Quoting Arthur Jones [EMAIL PROTECTED]:
Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

> hi michael, ...
>
> On Tue, Jul 24, 2007 at 07:16:46PM +0300, Michael S. Tsirkin wrote:
> > [...]
> > But, for these cases where the code actually needs to be modified, applying a patch seems like the least evil way to do it. Alternatives seem to be much worse.
>
> what is it about patches that are less evil than changesets? can you list some of the advantages?

changesets *do not exist* in git - git tracks content. I compare multiple directories with patches with the bush of branches. With bush of branches: git pull broken, git archive broken, git tag broken, git reset broken. It looks like the list can be continued. Yes, we can start building our own tools on top of git to do this, but I'd rather not.

-- MST
[ofa-general] Re: opensm: a bug in heavy sweep? - no LFT re-configuration
On 07:56 Tue 24 Jul , Eitan Zahavi wrote:
> > On 20:59 Mon 23 Jul , Eitan Zahavi wrote:
> > > Hi Sasha, Hal,
> > >
> > > I think I have an idea:
> > > Since this is a specific switch that reported ChangeBit or Trap why can't we just qualify that there was no change in the switch setup?
> >
> > The ChangeBit seems to be good start point - then OpenSM will query all switch ports PortInfo anyway and if for all ports PortState is <= INIT (and at least for one port it is = INIT), it means that this switch was rebooted/reinitialized. And for single port PortState drop to <= INIT should indicate reinitialization. Seems correct?
>
> Yes.
>
> > > We could send PortInfo, SwitchInfo,
> >
> > SwitchInfo is queried at each light sweep, PortInfo's if ChangeBit is set. Guess we are ok with it even now.
>
> I will double check that...
> Well - even setting one port state to INIT did not cause the switch to be reconfigured. Seems the code does not enforce this condition yet.
>
> > > LFT, MFT, SL2VL, VLArb, PKey queries and make sure no change from previous state. Or we could simply enforce last state by sending it over again ...
> >
> > I think we could want to re-read PKey tables in order to preserve existing PKey indices and just to flush (overwrite with new settings) LFT, MFT, SL2VL, VLArb tables. Reasonable?
>
> Correct.

Ok, I will prepare patches. I think about separate patches for switches and ports. Also likely MFT should be handled separately, since we don't do incremental update there yet.

Sasha
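The reinitialization test discussed in this message can be sketched as a small predicate. It is a minimal illustration, assuming the comparison means "PortState at or below INIT on every port, and exactly INIT on at least one". The function name `switch_looks_reinitialized` and the idea of passing the states as a plain array are hypothetical, not OpenSM code; the array is assumed to hold only the external ports, since how port 0 should be treated is not stated in the thread. The numeric PortState values are the standard PortInfo encodings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PortState encodings from the InfiniBand PortInfo attribute. */
enum { PORT_DOWN = 1, PORT_INIT = 2, PORT_ARMED = 3, PORT_ACTIVE = 4 };

/*
 * Hypothetical helper: true when no port is above INIT and at least one
 * port is exactly in INIT, which the thread takes as a sign that the
 * switch was rebooted and its tables need to be rewritten.
 */
bool switch_looks_reinitialized(const uint8_t *port_states, size_t n_ports)
{
    bool saw_init = false;

    for (size_t i = 0; i < n_ports; i++) {
        if (port_states[i] > PORT_INIT)
            return false;          /* an ARMED or ACTIVE port survived */
        if (port_states[i] == PORT_INIT)
            saw_init = true;
    }
    return saw_init;               /* all-DOWN alone does not qualify */
}
```

A switch with every port DOWN returns false here, because nothing has come back up to INIT yet; whether that case should also trigger a rewrite is left open in the discussion.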
Re: [ofa-general] Command specification of ca_name and ca_port
On 09:05 Tue 24 Jul , Ira Weiny wrote:
> > But it is easy part - saquery renames are less intuitive :(. Probably just lower case? Or special query option (-q or -Q), so queries could be specified as -qP, -qC?
>
> I disagree with this because ~50% of the options are queries, its primary purpose is to query, and most of the other options change the format of the output of the query. Therefore, I don't think a -q should be required for a query. I think that seems redundant.
>
> Perhaps just changing the current options to -c,-p, and adding -C and -P would be best. I know this might break some scripts out there, particularly mine, but I think it is the right thing to do if you really want consistency.
>
> Thoughts?

-c,-p are fine for me too.

Sasha
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
hi michael, ...

On Tue, Jul 24, 2007 at 07:52:03PM +0300, Michael S. Tsirkin wrote:
> [...]
> > i have found that drawing a DAG with graphviz has been a big help in making sure that i organize the branches correctly...
>
> Ugh .. *that* sounds complicated. Looks like it's much simpler with current setup.

compared to the rather sophisticated linux-kernel changesets that i see from you on this list -- it's child's play...

compared to figuring out the list of options for ofed_scripts/configure just so we can _see_ the source we're running on our box -- it's a walk in the park...

one of the goals of OFED 1.3 is to make access to the source easier. to do that, we will prob need to rid ourselves of patches...

arthur
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
> Hmm. Consider that you did all of the above, and then mail me that there's an update. Now I need to merge updates to multiple branches directly and git pull does not do this. It's a problem.

A simple script can do this.

> You'll have to check out branches one by one, and do a pull. What if there's a conflict? I currently just do git reset --hard ORIG_HEAD and mail the maintainer to fix it up - but this won't work with the bush of branches approach.

If there's a conflict, then you need a different patch. A single patch may work for all backports, or a fix may require different patches depending on the kernel version. As it stands now, there are patches that we apply that do not work and expect a subsequent patch to fix it up.

- Sean
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
> one of the goals of OFED 1.3 is to make access to the source easier. to do that, we will prob need to rid ourselves of patches...

I'm working on a rather simpler solution to this problem. Stay tuned.

-- MST
RE: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
> Examples: What if there's a conflict? I currently do git reset, we'll

If there's a conflict applying a patch, you reject it. I fail to see any issue here.

- Sean
[ofa-general] [ANNOUNCE] NFS-RDMA for OFED 1.2 G/A
For those interested in NFS-RDMA, OGC has created an install package based on the OFA 1.2 GA release. The package supports both SLES 10 and RHEL 5. You can download this package from http://www.opengridcomputing.com/nfs-rdma.html. Please let me know if you find any problems. Thanks, Tom
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
> i'd _really_ like to see a list of the advantages of patches over branches. it's hard for me to know if i'm just missing something if the case is not laid out...

Here's a short list off the top of my head

- A single git pull merges any number of backport changes
- A single git reset ORIG_HEAD recovers from a conflicting merge
- A single tag tags all code for all kernels
- On update from upstream, if there is a conflict between upstream code and a patch it's easy to temporarily remove the patch, complete the merge, and go bugger the patch author
- For recent kernels there are almost no patches. So an update from upstream for these kernels is free, with branches I will still need to update all branches.
- Adding a fix which only affects common code is currently straight-forward: make a change, commit. With multiple branches every fix must be pulled into all branches.

-- MST
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
> But the proposal here was to have a bush of branches, all of which need to be merged at the same time. It's possible that some would merge and some would fail, leaving me in an inconsistent state, and no easy way to get back to where I started.

A fix could be applied to some kernels, but not others. In fact, if a patch works for kernel X & Y, but has a conflict with kernel Z, then different patches are needed anyway. I don't see the requirement to merge everything or even apply a fix to all kernels at the same time.

- Sean
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
> Here's a short list off the top of my head
>
> - A single git pull merges any number of backport changes
> - A single git reset ORIG_HEAD recovers from a conflicting merge
> - A single tag tags all code for all kernels
> - On update from upstream, if there is a conflict between upstream code and a patch it's easy to temporarily remove the patch, complete the merge, and go bugger the patch author
> - For recent kernels there are almost no patches. So an update from upstream for these kernels is free, with branches I will still need to update all branches.
> - Adding a fix which only affects common code is currently straight-forward: make a change, commit. With multiple branches every fix must be pulled into all branches.

You seem to be overlooking the fact that you already require a script to check that things work for all kernels. Until you apply a series of patches to form a particular kernel, you don't know if a change that you pulled in caused a conflict. You still have the requirement to verify the fix on all kernels, and it still requires running a script that pushes/pops patches to create each tree.

- Sean
[ofa-general] Re: OpenSM detection of duplicated GUIDs on loopback
On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:
> Hi Hal,
>
> The code to find duplicated GUIDs stem from real user cases where flawed burning procedure caused actual GUID duplications. There is nothing impossible.

No one said impossible; just a violation of what globally unique (GU from GUID) really means. It's largely because vendors allowed users to program non volatile RAM for GUIDs rather than a real manufacturing process for this which guarantees uniqueness that we are even discussing this aspect of it.

> So it is really critical that the SM will be able to recognize this case and abort.

I agree with the detect part but not the abort part. Why can't it report these errors and continue on ? That seems better to me than aborting.

-- Hal

> It might be that for testing someone wants to use a loopback plug that cause the same port GUID appear on both sides of link - but it is better to require the user doing the test to set some flag than to miss such a situation in real life cluster.
>
> This requirement was written after many people wasted many hours trying to figure out what was going on.
> PLEASE DO NOT TAKE IT AWAY
>
> Eitan Zahavi
> Senior Engineering Director, Software Architect
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
>
> --
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 24, 2007 6:04 PM
> To: Eitan Zahavi
> Cc: OpenFabrics General; Sasha Khapyorsky; Yevgeny Kliteynik
> Subject: Re: OpenSM detection of duplicated GUIDs on loopback
>
> On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:
> > From: Hal Rosenstock [mailto:[EMAIL PROTECTED] ]
> > Sent: Tuesday, July 24, 2007 5:53 PM
> > To: Eitan Zahavi
> > Cc: OpenFabrics General; Sasha Khapyorsky; Yevgeny Kliteynik
> > Subject: Re: OpenSM detection of duplicated GUIDs on loopback
> >
> > Hi Eitan,
> >
> > On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:
> > > Hi Hal,
> > >
> > > What is this loopback connector used for?
> > > Does not seem to me like a very useful thing to do.
> >
> > Perhaps not but no reason OpenSM can't handle this more gracefully.
> >
> > > Anyway, if it is not a production environment we could add a debug mode (-d flag option) to ignore this check.
> >
> > Why would a separate flag be needed ?
> >
> > [EZ] Since I do not see any other solution for the SM to know it is really a loop back plug rather than two devices with same GUID connected back to back ...
>
> Technically, this should only occur when looped back and not two devices with same GUID as GUID == globally unique and a duplication indicates a manufacturing issue. Anyhow, can't these be treated the same (and handled more gracefully) without an additional option/flag ?
>
> -- Hal
>
> > -- Hal
> >
> > > Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O.
Box 586 Yokneam 20692 ISRAEL -- *From:* Hal Rosenstock [mailto:[EMAIL PROTECTED] *Sent: *Tuesday, July 24, 2007 5:31 PM *To:* OpenFabrics General *Cc:* Sasha Khapyorsky; Eitan Zahavi; Yevgeny Kliteynik *Subject:* OpenSM detection of duplicated GUIDs on loopback Hi, This is what starts off as a minor issue and I know it has been discussed it somewhat in the past: Putting a loopback connector on a (switch) link causes OpenSM to indicate duplicated GUID error 0D18 as follows: __osm_ni_rcv_set_links { ... /* When there are only two nodes with exact same guids (connected back to back) - the previous check for duplicated guid will not catch them. But the link will be from the port to itself... Enhanced Port 0 is an exception to this */ if ((osm_node_get_node_guid( p_node ) == p_ni_context-node_guid) (port_num == p_ni_context-port_num) (port_num != 0)) { osm_log( p_rcv-p_log, OSM_LOG_ERROR, __osm_ni_rcv_set_links: ERR 0D18: Duplicate GUID found by link from a port to itself: node 0x% PRIx64 , port number 0x%X\n, cl_ntoh64( osm_node_get_node_guid( p_node ) ), port_num ); ... So this occurs over and over and over and fills the log with the same spew. This should be improved IMO. Is this really a fatal condition ? Doesn't seem like it should be to me. Also, OpenSM can ride this out with -y (stay on fatal) but is that safe for this condition ? Seems like something like an extra loopback bit should be added to some port structure which should cause these links to be ignored. This bit would then be reset when the peer is now longer itself. Also, is there a relationship of this with the 12x/duplicated GUID code ? Thanks. -- Hal
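
The "extra loopback bit" idea from the original message can be sketched roughly as below. This is only an illustration under stated assumptions: `struct port`, `port_check_loopback()` and the log counter are hypothetical names, not OpenSM's real port structures or API, and the sketch does not settle the point raised in the thread that a loopback plug and two back-to-back devices with the same GUID look identical from here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified port record; NOT OpenSM's osm_physp_t. */
struct port {
	uint64_t node_guid;
	uint8_t port_num;
	bool loopback;		/* set while the port's peer is the port itself */
};

/*
 * Process one discovery result for a port. Returns true if the link should
 * be ignored because the port is connected to itself. A log line is written
 * only when the loopback state is first entered, so repeated sweeps over
 * the same loopback plug do not fill the log.
 */
static bool port_check_loopback(struct port *p, uint64_t peer_guid,
				uint8_t peer_port_num, int *log_count)
{
	assert(p != NULL && log_count != NULL);

	bool self = (p->node_guid == peer_guid) &&
		    (p->port_num == peer_port_num) &&
		    (p->port_num != 0);	/* enhanced port 0 is exempt */

	if (self && !p->loopback) {
		fprintf(stderr, "ERR 0D18: link from a port to itself: "
			"node 0x%016llx, port number 0x%X\n",
			(unsigned long long)p->node_guid, p->port_num);
		(*log_count)++;
	}

	/* The bit is reset as soon as the peer is no longer the port itself. */
	p->loopback = self;
	return self;
}
```

With this shape the error is still detected and reported on every new occurrence, but a plug left in place produces one message rather than one per sweep.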
[ofa-general] RE: OpenSM detection of duplicated GUIDs on loopback
Hi Hal,

For many users such a critical failure (one the SM cannot really do anything with) is better aborted than forgotten in some log file.
Anyway, the -y flag lets you ignore it if you like.

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208 Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 24, 2007 9:38 PM
To: Eitan Zahavi
Cc: OpenFabrics General; Sasha Khapyorsky; Yevgeny Kliteynik
Subject: Re: OpenSM detection of duplicated GUIDs on loopback

> On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:
>> Hi Hal,
>>
>> The code to find duplicated GUIDs stems from real user cases where a flawed
>> burning procedure caused actual GUID duplications. There is nothing impossible.
>
> No one said impossible; just a violation of what globally unique (GU from GUID) really means. It's largely because vendors allowed users to program non-volatile RAM for GUIDs, rather than using a real manufacturing process that guarantees uniqueness, that we are even discussing this aspect of it.
>
>> So it is really critical that the SM be able to recognize this case and abort.
>
> I agree with the detect part but not the abort part. Why can't it report these errors and continue on? That seems better to me than aborting.
>
> -- Hal
>
>> It might be that for testing someone wants to use a loopback plug that causes the same
>> port GUID to appear on both sides of a link - but it is better to require the user doing the test
>> to set some flag than to miss such a situation in a real life cluster.
>>
>> This requirement was written after many people wasted many hours trying to figure out what was going on.
>> PLEASE DO NOT TAKE IT AWAY
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
Quoting Sean Hefty [EMAIL PROTECTED]:
Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

>> But the proposal here was to have a bush of branches, all of which
>> need to be merged at the same time. It's possible that some would merge
>> and some would fail, leaving me in an inconsistent state, and no easy way
>> to get back to where I started.
>
> A fix could be applied to some kernels, but not others. In fact, if a
> patch works for kernel X & Y, but has a conflict with kernel Z, then
> different patches are needed anyway. I don't see the requirement to
> merge everything or even apply a fix to all kernels at the same time.

This is typically component maintainer's job, not integrator's.
As an integrator, I want to pull but if the merge fails, reset everything back to the original state, and let the maintainer know.

-- MST
[ofa-general] Re: [PATCH 00/10] Implement batching skb API
KK, On Tue, 2007-24-07 at 09:14 +0530, Krishna Kumar2 wrote: J Hadi Salim [EMAIL PROTECTED] wrote on 07/23/2007 06:02:01 PM: Actually you have not sent netperf results with prep and without prep. My results were based on pktgen (which i explained as testing the driver). I think depending on netperf without further analysis is simplistic. It was like me doing forwarding tests on these patches. So _which_ non-LLTX driver doesnt do that? ;- I have no idea since I haven't looked at all drivers. Can you tell which all non-LLTX drivers does that ? I stated this as the sole criterea. The few i have peeked at all do it. I also think the e1000 should be converted to be non-LLTX. The rest of netdev is screaming to kill LLTX. tun driver doesnt use it either - but i doubt that makes it bloat Adding extra code that is currently not usable (esp from a submission point) is bloat. So far i have converted 3 drivers, 1 of them doesnt use it. Two more driver conversions are on the way, they will both use it. How is this bloat again? A few emails back you said if only IPOIB can use batching then thats good enough justification. You waltz in, have the luxury of looking at my code, presentations, many discussions with me etc ... luxury ? I had implemented the entire thing even before knowing that you are working on something similar! and I had sent the first proposal to netdev, I saw your patch at the end of may (or at least 2 weeks after you said it existed). That patch has very little resemblance to what you just posted conceptwise or codewise. I could post it if you would give me permission. *after* which you told that you have your own code and presentations (which I had never seen earlier - I joined netdev a few months back, earlier I was working on RDMA, Infiniband as you know). I am gonna assume you didnt know of my work - which i have been making public for about 3 years. 
Infact i talked about this topic when i visited your office in 2006 on a day you were not present, so it is plausible you didnt hear of it. And it didn't give me any great ideas either, remember I had posted results for E1000 at the time of sending the proposals. In mid-June you sent me a series of patches which included anything from changing variable names to combining qdisc_restart and about everything i referred to as being cosmetic differences in your posted patches. I took two of those and incorporated them in. One was an XXX in my code already to allocate the dev-blist (Commit: bb4464c5f67e2a69ffb233fcf07aede8657e4f63). The other one was a mechanical removal of the blist being passed (Commit: 0e9959e5ee6f6d46747c97ca8edc91b3eefa0757). Some of the others i asked you to defer. For example, the reason i gave you for not merging any qdisc_restart_combine changes is because i was waiting for Dave to swallow the qdisc_restart changes i made; otherwise maintainance becomes extremely painful for me. Sridhar actually provided a lot more valuable comments and fixes but has not planted a flag on behalf of the queen of spain like you did. However I do give credit in my proposal to you for what ideas that your provided (without actual code), and the same I did for other people who did the same, like Dave, Sridhar. BTW, you too had discussions with me, and I sent some patches to improve your code too, I incorporated two of your patches and asked for deferal of others. These patches have now shown up in what you claim as the difference. I just call them cosmetic difference not to downplay the importance of having an ethtool interface but because they do not make batching perform any better. The real differences are those two items. I am suprised you havent cannibalized those changes as well. 
I thought you renamed them to something else; according to your posting: This patch will work with drivers updated by Jamal, Matt Michael Chan with minor modifications - rename xmit_win to xmit_slots rename batch handler. Or maybe thats a future plan you have in mind? so it looks like a two way street to me (and that is how open source works and should). Open source is a lot more transparent than that. You posted a question, which was part of your research. I responded and told you i have patches; you asked me for them and i promptly ported them from pre-2.6.18 to the latest kernel at the time. The nature of this batching work is one of performance. So numbers are important. If you had some strong disagreements on something in the architecture, then it would be of great value to explain it in a technical detail - and more importantly to provide some numbers to say why it is a bad idea. You get numbers by running some tests. You did none of the above. Your effort has been to produce your patch for whatever reasons. This would not have been problematic to me if it actually was based within reasons of optimization because the end goal would have been achieved. I have deleted the rest of the email
[ofa-general] [PATCH] amso1100: QP init bug in amso driver
Roland:

The guys at UNH found this and fixed it. I'm surprised no one has hit this before. I guess it only breaks when the refcount on the QP is non-zero.

Initialize the wait_queue_head_t in the c2_qp structure.

Signed-off-by: Ethan Burns [EMAIL PROTECTED]
Acked-by: Tom Tucker [EMAIL PROTECTED]
---

 drivers/infiniband/hw/amso1100/c2_qp.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/amso1100/c2_qp.c b/drivers/infiniband/hw/amso1100/c2_qp.c
index 420c138..01d0786 100644
--- a/drivers/infiniband/hw/amso1100/c2_qp.c
+++ b/drivers/infiniband/hw/amso1100/c2_qp.c
@@ -506,6 +506,7 @@ int c2_alloc_qp(struct c2_dev *c2dev,
 	qp->send_sgl_depth = qp_attrs->cap.max_send_sge;
 	qp->rdma_write_sgl_depth = qp_attrs->cap.max_send_sge;
 	qp->recv_sgl_depth = qp_attrs->cap.max_recv_sge;
+	init_waitqueue_head(&qp->wait);
 
 	/* Initialize the SQ MQ */
 	q_size = be32_to_cpu(reply->sq_depth);
[ofa-general] RE: OpenSM detection of duplicated GUIDs on loopback
Maybe avoid the log if -y is provided?

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208 Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 24, 2007 9:56 PM
To: Eitan Zahavi
Cc: OpenFabrics General; Sasha Khapyorsky; Yevgeny Kliteynik
Subject: Re: OpenSM detection of duplicated GUIDs on loopback

> On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:
>> Hi Hal,
>>
>> For many users such a critical failure (one the SM cannot really do anything with) is better aborted than forgotten in some log file.
>> Anyway, the -y flag lets you ignore it if you like.
>
> So everything else continues to work fine with -y ? In which case, I'm not sure which is the better default.
>
> Users certainly won't like their logs filling up with continuous duplicated GUID messages. The log spew should be cleaned up IMO.
>
> -- Hal
[ofa-general] [PATCH] opensm: detect port external reset and flush cached tables
This detects port external reset by validating PortState == INIT, and when detected flushes cached port related tables - re-reads pkey table and drops (overwrites) SL2VL and VLArb tables.

Signed-off-by: Sasha Khapyorsky [EMAIL PROTECTED]
---
 opensm/include/opensm/osm_port.h  |    5 +++++
 opensm/opensm/osm_port.c          |    1 +
 opensm/opensm/osm_port_info_rcv.c |    9 ++++++++-
 opensm/opensm/osm_qos.c           |    9 +++++----
 4 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/opensm/include/opensm/osm_port.h b/opensm/include/opensm/osm_port.h
index f6c40c7..44323ab 100644
--- a/opensm/include/opensm/osm_port.h
+++ b/opensm/include/opensm/osm_port.h
@@ -118,6 +118,7 @@ typedef struct _osm_physp
 	struct _osm_physp	*p_remote_physp;
 	boolean_t		healthy;
 	uint8_t			vl_high_limit;
+	unsigned		need_update;
 	osm_dr_path_t		dr_path;
 	osm_pkey_tbl_t		pkeys;
 	ib_vl_arb_table_t	vl_arb[4];
@@ -157,6 +158,10 @@ typedef struct _osm_physp
 *		PortInfo:VLHighLimit value which installed by QoS manager
 *		and should be uploaded to port's PortInfo
 *
+*	need_update
+*		When set indicates that port was probably reset and port
+*		related tables (PKey, SL2VL, VLArb) require refreshing.
+*
 *	dr_path
 *		The directed route path to this port.
 *
diff --git a/opensm/opensm/osm_port.c b/opensm/opensm/osm_port.c
index e03e316..11cc5ca 100644
--- a/opensm/opensm/osm_port.c
+++ b/opensm/opensm/osm_port.c
@@ -118,6 +118,7 @@ osm_physp_init(
   p_physp->port_guid = port_guid;
   p_physp->port_num = port_num;
   p_physp->healthy = TRUE;
+  p_physp->need_update = 2;
   p_physp->p_node = (struct _osm_node*)p_node;
 
   osm_dr_path_init(
diff --git a/opensm/opensm/osm_port_info_rcv.c b/opensm/opensm/osm_port_info_rcv.c
index 6fe2d1d..0528e38 100644
--- a/opensm/opensm/osm_port_info_rcv.c
+++ b/opensm/opensm/osm_port_info_rcv.c
@@ -801,6 +801,12 @@ osm_pi_rcv_process(
       p_rcv->p_subn->master_sm_base_lid = p_pi->master_sm_base_lid;
     }
 
+    /* if port just inited or reached INIT state (external reset)
+       request update for port related tables */
+    p_physp->need_update =
+      (ib_port_info_get_port_state(p_pi) == IB_LINK_INIT ||
+       p_physp->need_update > 1 ) ? 1 : 0;
+
     switch( osm_node_get_type( p_node ) )
     {
     case IB_NODE_TYPE_CA:
@@ -824,7 +830,8 @@ osm_pi_rcv_process(
 
     /*
       Get the tables on the physp.
     */
-    __osm_pi_rcv_get_pkey_slvl_vla_tables( p_rcv, p_node, p_physp );
+    if (p_physp->need_update)
+      __osm_pi_rcv_get_pkey_slvl_vla_tables( p_rcv, p_node, p_physp );
 
   }
diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
index 17b7e3a..596b6d4 100644
--- a/opensm/opensm/osm_qos.c
+++ b/opensm/opensm/osm_qos.c
@@ -87,8 +87,9 @@ static ib_api_status_t vlarb_update_table_block(osm_req_t * p_req,
 	for (i = 0; i < block_length; i++)
 		block.vl_entry[i].vl &= vl_mask;
 
-	if (!memcmp(&p->vl_arb[block_num], &block,
-		    block_length * sizeof(block.vl_entry[0])))
+	if (!p->need_update &&
+	    !memcmp(&p->vl_arb[block_num], &block,
+		    block_length * sizeof(block.vl_entry[0])))
 		return IB_SUCCESS;
 
 	context.vla_context.node_guid =
@@ -170,8 +171,8 @@ static ib_api_status_t sl2vl_update_table(osm_req_t * p_req,
 		tbl.raw_vl_by_sl[i] = (vl1 << 4 ) | vl2 ;
 	}
 
-	p_tbl = osm_physp_get_slvl_tbl(p, in_port);
-	if (p_tbl && !memcmp(p_tbl, &tbl, sizeof(tbl)))
+	if (!p->need_update && (p_tbl = osm_physp_get_slvl_tbl(p, in_port)) &&
+	    !memcmp(p_tbl, &tbl, sizeof(tbl)))
 		return IB_SUCCESS;
 
 	context.slvl_context.node_guid = osm_node_get_node_guid(p_node);
-- 
1.5.3.rc2.29.gc4640f
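
The way need_update moves between its values in the patch above can be followed with a small stand-alone model. This is a sketch, not OpenSM code: `struct physp`, `physp_init()`, `physp_on_port_info()` and the `LINK_*` enum are hypothetical stand-ins, and it assumes the comparison in the patch is `need_update > 1` (the operator is not legible in the archived copy).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the IB port states; only LINK_INIT matters here. */
enum link_state { LINK_DOWN = 1, LINK_INIT, LINK_ARMED, LINK_ACTIVE };

/* Hypothetical reduced port object holding only the new field. */
struct physp {
	unsigned need_update;
};

/* A freshly created port starts at 2: "never refreshed yet". */
static void physp_init(struct physp *p)
{
	assert(p != NULL);
	p->need_update = 2;
}

/*
 * Called for every received PortInfo. Returns true when the cached
 * PKey/SL2VL/VLArb tables should be re-read or overwritten:
 *   - on the first PortInfo after creation (need_update was 2), or
 *   - whenever the port reports INIT, which suggests an external reset.
 */
static bool physp_on_port_info(struct physp *p, enum link_state state)
{
	assert(p != NULL);
	p->need_update = (state == LINK_INIT || p->need_update > 1) ? 1 : 0;
	return p->need_update != 0;
}
```

In this model a port that stays ACTIVE across sweeps is refreshed once and then left alone, while any sweep that finds it back in INIT triggers another refresh.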
Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
hi michael, ...

On Tue, Jul 24, 2007 at 08:52:20PM +0300, Michael S. Tsirkin wrote:
>> i'd _really_ like to see a list of the advantages of patches over
>> branches. it's hard for me to know if i'm just missing something if
>> the case is not laid out...

thanks for the list...

> Here's a short list off the top of my head
> - A single git pull merges any number of backport changes

ok, you can run one command instead of a 4-line script. hmm, i guess you could say this is a very slight advantage to using patches...

> - A single git reset ORIG_HEAD recovers from a conflicting merge

handling conflicts is a big part of a maintainer's job! the _vast_ majority of the time i bet you already know how to do the merge. if you don't, then only the backport branches which haven't merged yet are stuck and you can pick up where you left off (which is how i do it now).

but if you're stuck in some strange intermediate state with some patches pushed and some yet to push in the configure script, i could see how you'd want to punt. but, someone is doing this work, and that someone almost certainly has a difficult time reproducing and developing a stack of patches..

if, though, you must have a pristine environment, this is easily solved by using an intermediate repo:

    git clone -s <canonical repo>
    run the pull
    any conflicts, dump this guy, otherwise, pull this in

i bet this is very similar time-wise to running the merge, then the ofed_scripts/configure over all supported branches. merges in git are _fast_...

> - A single tag tags all code for all kernels

store commit ids in a file and tag that?

> - On update from upstream, if there is a conflict between upstream code
>   and a patch it's easy to temporarily remove the patch, complete the
>   merge, and go bugger the patch author

i think this is easier with the backport branches, see git clone -s above. or, just fixup the error. the reason you have to bugger the author may be that you don't have the tools necessary to actually fix up the patch -- but you can prob bet the author doesn't like to fixup patches in quilt any more than you do...

> - For recent kernels there are almost no patches. So an update from
>   upstream for these kernels is free, with branches I will still need
>   to update all branches.

i can say from a couple months experience that upstream merges are free using backport branches. running the script to reflow the branches is _far_ less complex than the configure script, has fewer dependencies and is much simpler to maintain and understand. also, if the upstream changes touch code that conflicts with a backport patch, you get to fix the problem as it happens in a much more comfortable environment (i.e. you don't need quilt)...

> - Adding a fix which only affects common code is currently
>   straight-forward: make a change, commit. With multiple branches
>   every fix must be pulled into all branches.

this use case is actually a good reason to use backport branches. with the patches, you still need to fan out the changes to all the backport branches. but, in general, you don't. so you end up making a change and _not realizing_ that it broke some random backport patch. by reflowing after every change, you get to see it break right there in front of you and you're way more likely to know how to fix it. you could do this with the build script too, but that would require a 4 line script -- and you'd need to switch over to using quilt or some other patch queue based system (yuck!)...

all your points above you made from the POV of the maintainer. but, what about the _users_ of the repo. as long as changes are kept as patches, trying to figure out what has changed with your latest round of backports comes down to recreating a tree and pulling from that. it's extremely fragile and error prone. there is only one maintainer, but many developers. if we can make their lives significantly easier then it should be a net gain...

the backport branches make merging upstream changes easier. they make merging developer changes easier. they make finding and fixing backport conflicts easier. they make viewing and navigating changes easier.

but, you need to use very short scripts (which i'm happy to create and maintain) to tag and pull -- doesn't seem like much of a price to pay to me...

arthur
[ofa-general] Re: OpenSM detection of duplicated GUIDs on loopback
On 23:25 Tue 24 Jul, Eitan Zahavi wrote:
>> On 7/24/07, Eitan Zahavi [EMAIL PROTECTED] wrote:
>>> Maybe avoid the log if -y is provided?
>>
>> That avoids the spew but the duplicated GUID is important to know so
>> IMO something in the middle is needed where duplicated GUIDs are logged
>> but not continually the same ones.
>
> [EZ] OK so in -y mode only we track which ones were reported and do not repeat the log?

And how should the port moving problem be solved? We cannot ask a user to run OpenSM with '-y' if he or she plans to reconnect some ports in the future and just wants to decrease logging.

Sasha
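
One way to picture "track which ones were reported and do not repeat the log", while still handling ports that get moved later, is a small table of already-reported entries with an explicit forget step. The sketch below is an assumption-laden illustration: `struct dup_log`, `dup_log_should_report()` and `dup_log_forget()` are hypothetical names and the fixed-size array is chosen only for brevity; none of this is taken from OpenSM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REPORTED 64	/* arbitrary capacity for the sketch */

struct dup_key {
	uint64_t guid;
	uint8_t port_num;
};

/* Set of (GUID, port) pairs whose duplicate-GUID error was already logged. */
struct dup_log {
	struct dup_key seen[MAX_REPORTED];
	size_t count;
};

/* Returns the index of the entry, or l->count if it is not present. */
static size_t dup_log_find(const struct dup_log *l, uint64_t guid,
			   uint8_t port_num)
{
	for (size_t i = 0; i < l->count; i++)
		if (l->seen[i].guid == guid && l->seen[i].port_num == port_num)
			return i;
	return l->count;
}

/* True if the caller should emit the log line (first report of this pair). */
static bool dup_log_should_report(struct dup_log *l, uint64_t guid,
				  uint8_t port_num)
{
	assert(l != NULL);
	if (dup_log_find(l, guid, port_num) < l->count)
		return false;
	if (l->count < MAX_REPORTED) {
		l->seen[l->count].guid = guid;
		l->seen[l->count].port_num = port_num;
		l->count++;
	}
	/* If the table is full the error is logged rather than hidden. */
	return true;
}

/*
 * Drop a pair once the port is seen with a normal peer again, so that a
 * later reconnection into the bad state is reported as a new event.
 */
static void dup_log_forget(struct dup_log *l, uint64_t guid, uint8_t port_num)
{
	assert(l != NULL);
	size_t i = dup_log_find(l, guid, port_num);
	if (i < l->count)
		l->seen[i] = l->seen[--l->count];
}
```

The forget step is what would address the port-moving concern: suppression lasts only as long as the same condition persists, not for the lifetime of the SM.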
[ofa-general] Re: [PATCH 00/10] Implement batching skb API
Jamal,

This is silly. I am not responding to this type of presumptuous and insulting mails.

Regards,

- KK

J Hadi Salim [EMAIL PROTECTED] wrote on 07/25/2007 12:58:20 AM:
[ofa-general] nightly osm_sim report 2007-07-25:normal completion
OSM Simulation Regression Summary

[Generated mail - please do NOT reply]

OpenSM rev = Thu_Jul_12_11:56:08_2007 [de69204d60071532833b0cdd3baa5e2386dc2c73]
ibutils rev = Tue_Mar_13_14:36:32_2007 [80aaff94f0eb65117db39b9db7d609ffdcc055de]

Total=520 Pass=520 Fail=0

Pass:
39 Stability IS1-16.topo
39 Pkey IS1-16.topo
39 OsmTest IS1-16.topo
39 OsmStress IS1-16.topo
39 Multicast IS1-16.topo
39 LidMgr IS1-16.topo
13 Stability IS3-loop.topo
13 Stability IS3-128.topo
13 Pkey IS3-128.topo
13 OsmTest IS3-loop.topo
13 OsmTest IS3-128.topo
13 OsmStress IS3-128.topo
13 Multicast IS3-loop.topo
13 Multicast IS3-128.topo
13 LidMgr IS3-128.topo
13 FatTree merge-roots-4-ary-2-tree.topo
13 FatTree merge-root-4-ary-3-tree.topo
13 FatTree gnu-stallion-64.topo
13 FatTree blend-4-ary-2-tree.topo
13 FatTree RhinoDDR.topo
13 FatTree FullGnu.topo
13 FatTree 4-ary-2-tree.topo
13 FatTree 2-ary-4-tree.topo
13 FatTree 12-node-spaced.topo
13 FTreeFail 4-ary-2-tree-missing-sw-link.topo
13 FTreeFail 4-ary-2-tree-links-at-same-rank-2.topo
13 FTreeFail 4-ary-2-tree-links-at-same-rank-1.topo
13 FTreeFail 4-ary-2-tree-diff-num-pgroups.topo

Failures:
[ofa-general] Re: [PATCH 02/12 -Rev2] Changes to netdevice.h
Hi Patrick,

Krishna Kumar2/India/IBM wrote on 07/23/2007 08:27:53 AM:

> Hi Patrick,
>
> Patrick McHardy [EMAIL PROTECTED] wrote on 07/22/2007 10:36:51 PM:
>
>> Krishna Kumar wrote:
>>> @@ -472,6 +474,9 @@ struct net_device
>>>  	void		*priv;	/* pointer to private data */
>>>  	int		(*hard_start_xmit) (struct sk_buff *skb,
>>>  					    struct net_device *dev);
>>> +	int		(*hard_start_xmit_batch) (struct net_device
>>> +						  *dev);
>>> +
>>
>> Is this function really needed? Can't you just call hard_start_xmit with
>> a NULL skb and have the driver use dev->blist?
>
> Probably not. I will see how to do it this way and get back to you.

I think this is a good idea and makes code everywhere simpler. I will try this change and test to make sure it doesn't have any negative impact. Will mostly send out rev3 tomorrow.

Thanks,

- KK
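
Patrick's suggestion, a single hard_start_xmit entry point where a NULL skb means "send whatever is queued on dev->blist", can be modelled in user space as follows. This is only a sketch: the `struct sk_buff` and `struct net_device` here are tiny mock types, `blist` is modelled as a plain singly linked list, and `mock_send_one()` / `mock_hard_start_xmit()` are hypothetical names, not kernel or driver code from the patch set.

```c
#include <assert.h>
#include <stddef.h>

/* Mock packet: just enough structure to chain buffers together. */
struct sk_buff {
	struct sk_buff *next;
	int len;
};

/* Mock device with the single transmit hook and a batch list. */
struct net_device {
	struct sk_buff *blist;	/* packets queued for batched transmit */
	int tx_packets;
	int (*hard_start_xmit)(struct sk_buff *skb, struct net_device *dev);
};

/* Stand-in for handing one packet to the hardware. */
static void mock_send_one(struct sk_buff *skb, struct net_device *dev)
{
	(void)skb;
	dev->tx_packets++;
}

/*
 * One entry point for both cases:
 *   skb != NULL  -> classic single-packet transmit
 *   skb == NULL  -> drain everything queued on dev->blist
 */
static int mock_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	assert(dev != NULL);

	if (skb != NULL) {
		mock_send_one(skb, dev);
		return 0;
	}

	while (dev->blist != NULL) {
		struct sk_buff *next = dev->blist->next;

		dev->blist->next = NULL;
		mock_send_one(dev->blist, dev);
		dev->blist = next;
	}
	return 0;
}
```

The appeal of this shape is that drivers which do not batch need no new hook at all; the cost is that every driver's transmit routine must be prepared for a NULL skb once the core starts passing one.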