[ewg] [PATCH for-2.6.22] ipoib/cm: initialize RX before moving QP to RTR
Fix a crasher bug in IPoIB CM: once QP is in RTR, an RX completion (and even an asynchronous error) might be observed on this QP, so we have to initialize all RX fields beforehand.

This fixes bug https://bugs.openfabrics.org/show_bug.cgi?id=662

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

Quoting Woodruff, Robert J [EMAIL PROTECTED]:
Subject: RE: [ofa-general] crash in ipoib

Sean wrote,

And here's a version with error handling fixed. Sean, does this solve your crash?

We've been running this patch since yesterday and haven't seen any crashes. We'll continue testing this over the week-end.

- Sean

This looks like it fixed the panic. Should we try to put out a new RC with this latest ipoib fix ? I really think we need it in the release. If we could get another RC out today, that would only delay the release by a couple of more days and we could release on next Friday rather than wed. and still give people a week to test the final RC.

woody

OK, the following patch has been added to OFED 1.2.
Roland, please consider this bugfix for 2.6.22.

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 076a0bb..c64249f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -309,6 +309,11 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 		return -ENOMEM;
 	p->dev = dev;
 	p->id = cm_id;
+	cm_id->context = p;
+	p->state = IPOIB_CM_RX_LIVE;
+	p->jiffies = jiffies;
+	INIT_LIST_HEAD(&p->list);
+
 	p->qp = ipoib_cm_create_rx_qp(dev, p);
 	if (IS_ERR(p->qp)) {
 		ret = PTR_ERR(p->qp);
@@ -320,24 +325,24 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	if (ret)
 		goto err_modify;
 
+	spin_lock_irq(&priv->lock);
+	queue_delayed_work(ipoib_workqueue,
+			   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
+	/* Add this entry to passive ids list head, but do not re-add it
+	 * if IB_EVENT_QP_LAST_WQE_REACHED has moved it to flush list. */
+	p->jiffies = jiffies;
+	if (p->state == IPOIB_CM_RX_LIVE)
+		list_move(&p->list, &priv->cm.passive_ids);
+	spin_unlock_irq(&priv->lock);
+
 	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
 	if (ret) {
 		ipoib_warn(priv, "failed to send REP: %d\n", ret);
-		goto err_rep;
+		if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE))
+			ipoib_warn(priv, "unable to move qp to error state\n");
 	}
-
-	cm_id->context = p;
-	p->jiffies = jiffies;
-	p->state = IPOIB_CM_RX_LIVE;
-	spin_lock_irq(&priv->lock);
-	if (list_empty(&priv->cm.passive_ids))
-		queue_delayed_work(ipoib_workqueue,
-				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
-	list_add(&p->list, &priv->cm.passive_ids);
-	spin_unlock_irq(&priv->lock);
 	return 0;
 
-err_rep:
 err_modify:
 	ib_destroy_qp(p->qp);
 err_qp:

--
MST
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
[ewg] Re: Issue with IPoIB-CM being enabled at boot
Quoting Jeremy Brown [EMAIL PROTECTED]:
Subject: Re: Issue with IPoIB-CM being enabled at boot

I apologize for replying to myself, but I just set up two em64t systems with Mellanox HCAs, Fedora 4, and a fresh build and installation of OFED 1.2, and the IPoIB interfaces came up in datagram mode, despite the fact that IPoIB is enabled and configured to come up in connected mode.

Does it help if you do
#/etc/init.d/openibd restart
?

--
MST
[ewg] Re: RFC OFED-1.3 installation
Quoting Doug Ledford [EMAIL PROTECTED]:
Subject: Re: RFC OFED-1.3 installation

On Tue, 2007-07-17 at 19:27 +0300, Michael S. Tsirkin wrote:
Quoting Doug Ledford [EMAIL PROTECTED]:
Subject: Re: RFC OFED-1.3 installation

On Tue, 2007-07-17 at 18:25 +0300, Michael S. Tsirkin wrote:

Let me give an example. In OFED 1.0, you shipped dapl version 1.2. In OFED 1.1, you also shipped dapl version 1.2. However, code inspection shows that between OFED 1.0 and OFED 1.1, dapl did in fact change (not a lot, but anything is enough). So, between OFED 1.0 and OFED 1.1, you have two different versions of dapl, but with exactly the same version number. A person can't tell them apart.

Yes, this sure looks like a problem. I think that versioning needs to be addressed at the package level, not at OFED level though. Right?

Versioning needs to be addressed at both levels. You need versions of software to start with, but then you still need releases of packages to differentiate between different builds of a specific version of software.

Why would we want to have different builds of a specific version of software for a specific OS? Could you give an example pls?

It's how you integrate needed patches immediately while waiting on the next release of the software.

OK.

...

You also bump the release number of the package any time you make changes to the spec file and rebuild.

Since we have spec files as part of package, this will be really the same as the previous case, right?

--
MST
[ewg] Re: RFC OFED-1.3 installation
Quoting Doug Ledford [EMAIL PROTECTED]:
Subject: Re: RFC OFED-1.3 installation

On Tue, 2007-07-17 at 19:45 +0300, Michael S. Tsirkin wrote:
Quoting Doug Ledford [EMAIL PROTECTED]:
Subject: Re: RFC OFED-1.3 installation

On Tue, 2007-07-17 at 19:27 +0300, Michael S. Tsirkin wrote:
Quoting Doug Ledford [EMAIL PROTECTED]:
Subject: Re: RFC OFED-1.3 installation

On Tue, 2007-07-17 at 18:25 +0300, Michael S. Tsirkin wrote:

Let me give an example. In OFED 1.0, you shipped dapl version 1.2. In OFED 1.1, you also shipped dapl version 1.2. However, code inspection shows that between OFED 1.0 and OFED 1.1, dapl did in fact change (not a lot, but anything is enough). So, between OFED 1.0 and OFED 1.1, you have two different versions of dapl, but with exactly the same version number. A person can't tell them apart.

Yes, this sure looks like a problem. I think that versioning needs to be addressed at the package level, not at OFED level though. Right?

Versioning needs to be addressed at both levels. You need versions of software to start with, but then you still need releases of packages to differentiate between different builds of a specific version of software.

Why would we want to have different builds of a specific version of software for a specific OS? Could you give an example pls?

It's how you integrate needed patches immediately while waiting on the next release of the software.

OK.

...

You also bump the release number of the package any time you make changes to the spec file and rebuild.

Since we have spec files as part of package, this will be really the same as the previous case, right?

Depends. Right now the spec file gets its version out of the configure stuff. That version only updates when you update the version of the software itself. It doesn't increment on each change to the source repo, only on the major updates when you would release a new tarball anyway. Package versioning is, by necessity, finer grained than source repo versioning. You don't release a new dapl tarball just because you updated some comments to remove a typo. But you *do* update rpm versions on every single change, at least if you are going to distribute the rpm. Look, rpms are just like versioned tarballs. Once they go out in the wild, that particular name-version-release combination is FROZEN.

It really looks like this is a work around for when you want to apply a patch without going through maintainer. The way OFED release process works, we really don't do releases all that often, and when we do, we can coordinate with the maintainer.

--
MST
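Editor's sketch of the name-version-release point made in this thread: two packages can carry the same upstream version and still be different builds, and only the release field tells them apart. The struct and function names are hypothetical, the release values in the usage note are invented for illustration, and this is plain string equality, not rpm's real version-comparison algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical package identity, mirroring name-version-release. */
struct pkg_id {
	const char *name;
	const char *version;	/* upstream software version */
	const char *release;	/* packaging release, bumped on every rebuild */
};

/* True when both packages claim the same upstream software version. */
bool same_upstream_version(const struct pkg_id *a, const struct pkg_id *b)
{
	assert(a && b);
	return strcmp(a->name, b->name) == 0 &&
	       strcmp(a->version, b->version) == 0;
}

/* True only when name, version and release all match, i.e. the same build. */
bool same_build(const struct pkg_id *a, const struct pkg_id *b)
{
	return same_upstream_version(a, b) &&
	       strcmp(a->release, b->release) == 0;
}
```

With two hypothetical dapl packages, { "dapl", "1.2", "1" } and { "dapl", "1.2", "2" }, same_upstream_version() reports a match while same_build() does not, which is the distinction the dapl example above is missing.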
[ewg] Re: RFC OFED-1.3 installation
There are lots of things that we as a distributor have to care about that upstream generally does not. The spec file and patches are how we solve our customer's problems. They are what make a stable distribution, as opposed to a bleeding edge, must always update to latest upstream version to fix any problem system, a reality. It's the difference between RHEL and Fedora.

I think I am getting it - you want to release a patched version of some OFED library without going through openfabrics? OK. So I imagine that's when you would increment the rpm-specific version number. But I can't see why would an OFED release want to play with these.

--
MST
[ewg] Re: RFC OFED-1.3 installation
So you need to be able to tell the difference between a customer running libibverbs-1.0.4 from OFED-1.3-beta1 and libibverbs-1.0.4 from OFED-1.3 final.

I don't really think we want customers to run beta code, or intend to support such configurations.

--
MST
[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
Quoting Arthur Jones [EMAIL PROTECTED]:
Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

hi michael, ...

On Tue, Jun 12, 2007 at 11:41:08AM +0300, Michael S. Tsirkin wrote:

For whom it may concern, I have created an ofed git tree updated with kernel bits from 2.6.22-rc4, and put that up at git://git.openfabrics.org/~mst/ofed_kernel.git
[...]
In particular, there were a ton of ipath patches that it seems were for the most part applied. Qlogic maintainers, please help double check that I did not miss something of value.

thanks for setting this up, i'm still looking at the diffs to make sure things got setup correctly for the ipath stuff...

i have found it difficult to navigate the source having to run:

./ofed_scripts/configure --kernel-version=2.6.xxx --without-quilt

everytime to check against our tree. so, rather than spending the better part of the afternoon running these scripts by hand, i created a shell script to populate a bunch of branches with the backports in each branch.

at qlogic we now keep the backports as branches in our git tree and this, i find, is much easier to handle. because:

* viewing and navigating backport source becomes _much_ easier.
* merges are easier -- patches are much more fragile than branches.
* comparisons are easier -- checking for differences between backports and between a backport and the canonical source is faster and more convenient...
* changesets are readable. trying to decipher diffs to patches is medically proven to take months, if not years, off your life.

Sigh. I wish it were possible to do everything through addons tricks.

I see the advantages of the bush of branches - for example it's possible to add a backport patch to a recent kernel, and then merge this into other kernel branches.

But I also see a serious problem with addressing: basically git tracks content. It's not designed to track a bush of branches taken together. For example, take tagging: tag namespace is global, so you can not have the same tag point at multiple branches at the same time.

anyway, what do you think? is there anyway i could convince you to dump the backport patches and put all the backports in branches? i'm willing to do the legwork if you see value...

Can you publish the scripts and/or the tree? I think we can start by just running the scripts nightly, making it possible for people to view backport history with gitview.

--
MST
[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
Quoting Arthur Jones [EMAIL PROTECTED]:
Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

hi michael, ...

On Tue, Jul 24, 2007 at 06:03:41AM +0300, Michael S. Tsirkin wrote:
[...]
But I also see a serious problem with addressing: basically git tracks content. It's not designed to track a bush of branches taken together. For example, take tagging: tag namespace is global, so you can not have the same tag point at multiple branches at the same time.

agreed. however, the way we use git, with the location of the git DB as the tag, it's not really a problem in practice.

who uses git this way?

but tagging each branch separately is indeed a PITA...

This is just one problem. For example, git pull can only merge one branch at a time.

anyway, what do you think? is there anyway i could convince you to dump the backport patches and put all the backports in branches? i'm willing to do the legwork if you see value...

can you publish the scripts and/or the tree? i think we can start by just running the scripts nightly, making it possible for people to view backport history with gitview.

i've attached the script that i'm using to compare the trees, but it's a total hack. it doesn't keep the patch history. that would not be too hard to do i guess -- if there's interest...

to run the script:

cp attached files here...
$ git clone git://git.openfabrics.org/~mst/ofed_kernel.git ofed_kernel
$ cd ofed_kernel
$ for b in `cat ../ofed-backports.txt`; do ../create-backport.sh $b; done

now you'll have a bunch of backport-2.6.xxx branches...

So, would you like to have this script run nightly on ofed trees?

--
MST
[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
Quoting Arthur Jones [EMAIL PROTECTED]:
Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

hi michael, ...

On Tue, Jul 24, 2007 at 06:32:28PM +0300, Michael S. Tsirkin wrote:
[...]
For example, git pull can only merge one branch at a time.

how is this a problem? the way i use git, i use a script to reflow the changes into the dependent branches. over the last few months, anyway, it has worked fine...

Precisely because no one developed on these branches, so you are re-generating them from patches - not a problem, but as you point out not too useful either.

well, no, i _have_ been doing development on the local branches in our internal repo. i also merge in changes that you make to the ofed repo to our internal backport branches. the script i posted is just so that i can more easily compare our internal branches to the ofed backport branches.

How do you do the merging?

If people start developing on these branches, then eventually you will need to merge them - and git only merges them one at a time.

yes, i have to merge them one at a time. i still don't see how this is a problem. backport changes can be pulled in and the changes from upstream can be merged in as well. i haven't had a problem with this so far. can you be more specific about what you expect will fail?

Well, as distro maintainers we need to merge a lot, from different people. We'll have to write all kinds of scripts to do it instead of a plain git pull. And, I expect almost all git operations will have to be wrapped in a script in some way, to operate on a bush of branches.

--
MST
[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
Quoting Arthur Jones [EMAIL PROTECTED]:
Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

hi michael, ...

On Tue, Jul 24, 2007 at 06:09:09PM +0300, Michael S. Tsirkin wrote:
Quoting Arthur Jones [EMAIL PROTECTED]:
Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

hi michael, ...

On Tue, Jul 24, 2007 at 06:03:41AM +0300, Michael S. Tsirkin wrote:
[...]
But I also see a serious problem with addressing: basically git tracks content. It's not designed to track a bush of branches taken together. For example, take tagging: tag namespace is global, so you can not have the same tag point at multiple branches at the same time.

agreed. however, the way we use git, with the location of the git DB as the tag, it's not really a problem in practice.

who uses git this way?

i do.

but tagging each branch separately is indeed a PITA...

This is just one problem. For example, git pull can only merge one branch at a time.

how is this a problem? the way i use git, i use a script to reflow the changes into the dependent branches. over the last few months, anyway, it has worked fine...

Precisely because no one developed on these branches, so you are re-generating them from patches - not a problem, but as you point out not too useful either.

If people start developing on these branches, then eventually you will need to merge them - and git only merges them one at a time.

--
MST
[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
Because you only have your driver to maintain.

no, i have to maintain quite a few of the ofed backport branches as well for our release. if i started getting pull requests from people with changes to 15 backport branches in one go, i'd probably want to script it...

Yea. Happens all the time here: when component maintainer makes a change, it will typically affect all backports or none.

i have found that drawing a DAG with graphviz has been a big help in making sure that i organize the branches correctly...

Ugh .. *that* sounds complicated. Looks like it's much simpler with current setup.

--
MST
[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
i'd _really_ like to see a list of the advantages of patches over branches. it's hard for me to know if i'm just missing something if the case is not laid out...

Here's a short list off the top of my head

- A single git pull merges any number of backport changes
- A single git reset ORIG_HEAD recovers from a conflicting merge
- A single tag tags all code for all kernels
- On update from upstream, if there is a conflict between upstream code and a patch it's easy to temporarily remove the patch, complete the merge, and go bugger the patch author
- For recent kernels there are almost no patches. So an update from upstream for these kernels is free, with branches I will still need to update all branches.
- Adding a fix which only affects common code is currently straight-forward: make a change, commit. With multiple branches every fix must be pulled into all branches.

--
MST
[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
Quoting Sean Hefty [EMAIL PROTECTED]:
Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

Here's a short list off the top of my head

- A single git pull merges any number of backport changes
- A single git reset ORIG_HEAD recovers from a conflicting merge
- A single tag tags all code for all kernels
- On update from upstream, if there is a conflict between upstream code and a patch it's easy to temporarily remove the patch, complete the merge, and go bugger the patch author
- For recent kernels there are almost no patches. So an update from upstream for these kernels is free, with branches I will still need to update all branches.
- Adding a fix which only affects common code is currently straight-forward: make a change, commit. With multiple branches every fix must be pulled into all branches.

You seem to be overlooking the fact that you already require a script to check that things work for all kernels. Until you apply a series of patches to form a particular kernel, you don't know if a change that you pulled in caused a conflict. You still have the requirement to verify the fix on all kernels, and it still requires running a script that pushes/pops patches to create each tree.

Yes. But I find it preferable to manage history with full power of native git tools, where a single hash identifies a revision, and limit the scope of the scripts to the build process.

This, as opposed to an elaborate methodology that is based on naming conventions, and requires use of scripts to do basic tasks such as tagging, history rewriting, etc.

--
MST
[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
- A single git reset ORIG_HEAD recovers from a conflicting merge

handling conflicts is a big part of a maintainer's job!

Because you are a driver maintainer.

That's what's different here from regular merge. Please understand: we have upstream code and we have changes against it. Upstream code is golden. If some patch conflicts with it, it is always this patch that needs to be fixed. And I want the ability to bounce that job to patch author - I simply do not know enough about e.g. ehca.

also, if the upstream changes touch code that conflicts with a backport patch, you get to fix the problem as it happens

That's exactly the thing that I do not want to do.

--
MST
[ewg] Re: [PATCH ofed-1.2-rc3 2/4] ehca: backport for rhel-4.5 - mmap functonality
Quoting Stefan Roscher [EMAIL PROTECTED]:
Subject: [PATCH ofed-1.2-rc3 2/4] ehca: backport for rhel-4.5 - mmap functonality

Signed-off-by: Stefan Roscher [EMAIL PROTECTED]
---
backport_ehca_2_rhel45_umap.patch | 850 ++
1 files changed, 850 insertions(+)

Guys, I have updated the ofed_kernel (destined for OFED 1.3) kernel tree to 2.6.23-rc1, and this patch no longer applies. The conflicts aren't trivial (e.g. there's been ABI change). I moved it to kernel_patches/attic for now.

Could you please take a look and update the patch for that tree? The updated code is here:
git://git.openfabrics.org/~mst/ofed_kernel.git ofed_kernel

I expect Vlad'll pull it soon, too.

--
MST
[ewg] add_open_iscsi_h.patch
Erez, add_open_iscsi_h currently does:

-#include <scsi/iscsi_if.h>
+#include "iscsi_if.h"

why is this bit needed?

--
MST
[ewg] Re: add_open_iscsi_h.patch
Quoting Erez Zilber [EMAIL PROTECTED]:
Subject: Re: add_open_iscsi_h.patch

Michael S. Tsirkin wrote:
Quoting Erez Zilber [EMAIL PROTECTED]:
Subject: Re: add_open_iscsi_h.patch

Michael S. Tsirkin wrote:
Quoting Erez Zilber [EMAIL PROTECTED]:
Subject: Re: add_open_iscsi_h.patch

Michael S. Tsirkin wrote:

Erez, add_open_iscsi_h currently does:

-#include <scsi/iscsi_if.h>
+#include "iscsi_if.h"

why is this bit needed?

Strange. I remember that I couldn't build OFED 1.2 without it in the past. I tried to rebuild it without this now, and it compiles successfully, so let's remove that code.

OK, I killed these patches completely and things still build fine. Vlad, please pull my tree into ofed_kernel.

Yes, it also works for me. I guess that these are all leftovers. Deleted.

Hmm. Do we want to kill them in 1.2.c too?

Yes (why not?)

Dunno. It's in bugfix-only mode after all. You decide.

--
MST
[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
also, if the upstream changes touch code that conflicts with a backport patch, you get to fix the problem as it happens

That's exactly the thing that I do not want to do.

you don't want to know about a problem with a patch until days or weeks later when the auto build keeps failing and you don't know why? it is easy to catch many problems _before_ the build check fails...

I don't work this way. I just apply all patches before pushing out. And I see *immediately* the patch that conflicts - unlike merge conflict where I will know which file conflicts but not which change created the conflict.

And if a patch conflicts with upstream code, an option to move the patch aside and defer the merge decision to patch author is very important to me: this just happened with ehca backport and update to 2.6.23-rc1. I do not want to delay update to 2.6.23-rc1 until IBM can be bothered to update their backport.

Yes, this means that the specific module won't build on a specific kernel until the conflict is resolved. But there are multiple conflicts and each needs to be resolved by another person.

--
MST
[ewg] Re: [PATCH ofed-1.2-rc3 2/4] ehca: backport for rhel-4.5 - mmap functonality
Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
Subject: Re: [PATCH ofed-1.2-rc3 2/4] ehca: backport for rhel-4.5 - mmap functonality

Hi Michael,
Below is the version without conflicts. And it should compile.

Seems to apply fine. I pushed it out. Vlad, can you take it pls?

As soon as the build scripts are ready, I'll test the whole backport.

What kind of scripts are you waiting for?

--
MST
[ewg] RFC: SRC API
Hello!
Here is an API proposal for support of the SRC (scalable reliable connected) protocol extension in libibverbs.

This adds APIs to:
- manage SRC domains
- share SRC domains between processes, by means of creating a 1:1 association between an SRC domain and a file.

Notes:
- The file is specified by means of a file descriptor, this makes it possible for the user to manage file creation/deletion in the most flexible manner (e.g. tmpfile can be used).
- I envision implementing this sharing mechanism in kernel by means of a per-device tree, with inode as a key and domain object as a value.

Please comment.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index acc1b82..503f201 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -370,6 +370,11 @@ struct ibv_ah_attr {
 	uint8_t			port_num;
 };
 
+struct ibv_src_domain {
+	struct ibv_context     *context;
+	uint32_t		handle;
+};
+
 enum ibv_srq_attr_mask {
 	IBV_SRQ_MAX_WR	= 1 << 0,
 	IBV_SRQ_LIMIT	= 1 << 1
@@ -389,7 +394,8 @@ struct ibv_srq_init_attr {
 enum ibv_qp_type {
 	IBV_QPT_RC = 2,
 	IBV_QPT_UC,
-	IBV_QPT_UD
+	IBV_QPT_UD,
+	IBV_QPT_SRC
 };
 
 struct ibv_qp_cap {
@@ -408,6 +414,7 @@ struct ibv_qp_init_attr {
 	struct ibv_qp_cap	cap;
 	enum ibv_qp_type	qp_type;
 	int			sq_sig_all;
+	struct ibv_src_domain  *src_domain;
 };
 
 enum ibv_qp_attr_mask {
@@ -526,6 +533,7 @@ struct ibv_send_wr {
 			uint32_t	remote_qkey;
 		} ud;
 	} wr;
+	uint32_t		src_remote_srq_num;
 };
 
 struct ibv_recv_wr {
@@ -553,6 +561,10 @@ struct ibv_srq {
 	pthread_mutex_t		mutex;
 	pthread_cond_t		cond;
 	uint32_t		events_completed;
+
+	uint32_t		src_srq_num;
+	struct ibv_src_domain  *src_domain;
+	struct ibv_cq	       *src_cq;
 };
 
 struct ibv_qp {
@@ -570,6 +582,8 @@ struct ibv_qp {
 	pthread_mutex_t		mutex;
 	pthread_cond_t		cond;
 	uint32_t		events_completed;
+
+	struct ibv_src_domain  *src_domain;
 };
 
 struct ibv_comp_channel {
@@ -912,6 +926,25 @@ struct ibv_srq *ibv_create_srq(struct ibv_pd *pd,
 			       struct ibv_srq_init_attr *srq_init_attr);
 
 /**
+ * ibv_create_src_srq - Creates a SRQ associated with the specified protection
+ *   domain and src domain.
+ * @pd: The protection domain associated with the SRQ.
+ * @src_domain: The SRC domain associated with the SRQ.
+ * @src_cq: CQ to report completions for SRC packets on.
+ *
+ * @srq_init_attr: A list of initial attributes required to create the SRQ.
+ *
+ * srq_attr->max_wr and srq_attr->max_sge are read the determine the
+ * requested size of the SRQ, and set to the actual values allocated
+ * on return.  If ibv_create_srq() succeeds, then max_wr and max_sge
+ * will always be at least as large as the requested values.
+ */
+struct ibv_srq *ibv_create_src_srq(struct ibv_pd *pd,
+				   struct ibv_src_domain *src_domain,
+				   struct ibv_cq *src_cq,
+				   struct ibv_srq_init_attr *srq_init_attr);
+
+/**
  * ibv_modify_srq - Modifies the attributes for the specified SRQ.
  * @srq: The SRQ to modify.
  * @srq_attr: On input, specifies the SRQ attributes to modify.  On output,
@@ -1074,6 +1107,44 @@ int ibv_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid);
  */
 int ibv_fork_init(void);
 
+/**
+ * ibv_alloc_src_domain - Allocate an SRC domain
+ * Returns a reference to an SRC domain.
+ * Use ibv_put_src_domain to free the reference.
+ * @context: Device context
+ */
+struct ibv_src_domain *ibv_get_new_src_domain(struct ibv_context *context);
+
+/**
+ * ibv_share_src_domain - associate the src domain with a file.
+ * Establishes a connection between an SRC domain object and a file descriptor.
+ *
+ * @d: SRC domain to share
+ * @fd: descriptor for a file to associate with the domain
+ */
+int ibv_share_src_domain(struct ibv_src_domain *d, int fd);
+
+/**
+ * ibv_unshare_src_domain - disassociate the src domain from a file.
+ * Subsequent calls to ibv_get_shared_src_domain will fail.
+ * @d: SRC domain to unshare
+ */
+int ibv_unshare_src_domain(struct ibv_src_domain *d);
+
+/**
+ * ibv_get_src_domain - get a reference to shared SRC domain
+ * @context: Device context
+ * @fd: descriptor for a file associated with the domain
+ */
+struct ibv_src_domain *ibv_get_shared_src_domain(struct ibv_context *context, int fd);
+
+/**
+ * ibv_put_src_domain - destroy a reference to an SRC domain
+ * If this is the last reference, destroys the domain.
+ * @d: reference to SRC domain to put
+ */
+int ibv_put_src_domain(struct ibv_src_domain *d
[ewg] Re: [ofa-general] RFC: SRC API
On Sun, Jul 29, 2007 at 05:04:31PM +0300, Michael S. Tsirkin wrote:

Hello!
Here is an API proposal for support of the SRC (scalable reliable connected) protocol extension in libibverbs.

This adds APIs to:
- manage SRC domains
- share SRC domains between processes, by means of creating a 1:1 association between an SRC domain and a file.

Notes:
- The file is specified by means of a file descriptor, this makes it possible for the user to manage file creation/deletion in the most flexible manner (e.g. tmpfile can be used).
- I envision implementing this sharing mechanism in kernel by means of a per-device tree, with inode as a key and domain object as a value.

Please comment.

Can you provide a pseudo code of an application using this API? Especially QP sharing part.

There's no QP sharing here. You mean SRC domain sharing?

--
MST
[ewg] Re: [ofa-general] RFC: SRC API
Some code examples:

/* create a domain and share it: */
struct ibv_src_domain * d = ibv_get_new_src_domain(ctx);
int fd = open(path, O_CREAT | O_RDWR, mode);
ibv_share_src_domain(d, fd);

/* get a reference to a shared domain: */
int fd = open(path, O_CREAT | O_RDWR, mode);
struct ibv_src_domain * d = ibv_get_shared_src_domain(ctx, fd);

/* once done: */
ibv_put_src_domain(d);

Note: when all users do put, domain is destroyed.

--
MST
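Editor's sketch of the sharing semantics described in this thread: a domain is keyed by the file behind a descriptor (the proposal envisions an inode-keyed tree in the kernel), lookups take a reference, and the last put destroys the domain. This is a user-space mock with a fixed-size table, not libibverbs code; every mock_* name is hypothetical, and keying on the (st_dev, st_ino) pair is an assumption about how the inode key might be formed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MOCK_MAX_DOMAINS 16

/* Hypothetical stand-in for a shared SRC domain. */
struct mock_domain {
	dev_t dev;	/* device holding the file */
	ino_t ino;	/* inode used as the lookup key */
	int   refs;	/* 0 means the slot is free */
};

static struct mock_domain mock_table[MOCK_MAX_DOMAINS];

/* Find the live domain bound to the file behind fd, or NULL. */
static struct mock_domain *mock_lookup(int fd)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return NULL;
	for (size_t i = 0; i < MOCK_MAX_DOMAINS; i++)
		if (mock_table[i].refs > 0 &&
		    mock_table[i].dev == st.st_dev &&
		    mock_table[i].ino == st.st_ino)
			return &mock_table[i];
	return NULL;
}

/* Create a domain and bind it to fd's file; the caller holds one reference. */
struct mock_domain *mock_share_new_domain(int fd)
{
	struct stat st;

	if (fstat(fd, &st) != 0 || mock_lookup(fd))
		return NULL;	/* bad fd, or the file is already bound */
	for (size_t i = 0; i < MOCK_MAX_DOMAINS; i++)
		if (mock_table[i].refs == 0) {
			mock_table[i].dev = st.st_dev;
			mock_table[i].ino = st.st_ino;
			mock_table[i].refs = 1;
			return &mock_table[i];
		}
	return NULL;		/* table full */
}

/* Take another reference to the domain bound to fd's file, if any. */
struct mock_domain *mock_get_shared_domain(int fd)
{
	struct mock_domain *d = mock_lookup(fd);

	if (d)
		d->refs++;
	return d;
}

/* Drop one reference; returns 1 when this was the last one. */
int mock_put_domain(struct mock_domain *d)
{
	assert(d && d->refs > 0);
	return --d->refs == 0;
}

/* Helper for trying the mock: descriptor of an anonymous temporary file.
 * The FILE object is deliberately left open for the life of the process. */
int mock_tmp_fd(void)
{
	FILE *f = tmpfile();

	return f ? fileno(f) : -1;
}
```

In this mock, a second process would be modelled by a second call to mock_get_shared_domain() on a descriptor for the same file: it gets the same domain back, and the slot is only released once both references have been put.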
[ewg] Re: [ofa-general] RFC: SRC API
More code examples:

Create an SRC QP, part of SRC domain:

attr.qp_type = IBV_QPT_SRC;
attr.src_domain = d;
qp = ibv_create_qp(pd, &attr);

Given remote SRQ number, send data to this SRQ over an SRC QP:

wr.src_remote_srq_num = src_remote_srq_num;
ibv_post_send(qp, &wr, &bad_wr);

Note: SRQ number needs to be exchanged as part of CM private data or some other protocol.

--
MST
[ewg] Re: [ofa-general] RFC: SRC API
Quoting Gleb Natapov [EMAIL PROTECTED]:
Subject: Re: [ofa-general] RFC: SRC API

On Mon, Jul 30, 2007 at 12:16:39PM +0300, Michael S. Tsirkin wrote:

More code examples:

Create an SRC QP, part of SRC domain:

attr.qp_type = IBV_QPT_SRC;
attr.src_domain = d;
qp = ibv_create_qp(pd, &attr);

Given remote SRQ number, send data to this SRQ over an SRC QP:

wr.src_remote_srq_num = src_remote_srq_num;
ibv_post_send(qp, &wr, &bad_wr);

Note: SRQ number needs to be exchanged as part of CM private data or some other protocol.

You are too brief. I can come up with one-liners based on the API by myself. I am trying to understand how sharing of SRC between processes will work and your example doesn't show this.

It seems what you are missing is what SRC is, not how to use the API. I'll have a working example when I get closer to implementation. For now you'll have to look up Dror's preso if you want to understand what SRC is.

Can I connect the same SRC to different QPs? If yes, can I send packet to any SRQ connected to the SRC through any QP connected to the same SRC?

Yes to both.

If yes how is this different from having regular QPs?

With regular QP you can only send to a single SRQ. But again, look at Dror's preso.

--
MST
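Editor's sketch of the answer given above ("Yes to both"): within one SRC domain, a message arriving on any QP is steered to a receive queue by the SRQ number it carries, not by the QP it arrived on. This is a toy model with no hardware or libibverbs involved; the mock_* types and functions are hypothetical, and the receiving QP is left out of the model on purpose because it plays no part in choosing the SRQ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_SRQS 8

/* Hypothetical receive queue owned by one job. */
struct mock_srq {
	uint32_t srq_num;	/* number carried in the packet header */
	unsigned received;	/* messages delivered to this SRQ so far */
};

/* Hypothetical SRC domain: the set of SRQs reachable through its QPs. */
struct mock_src_domain {
	struct mock_srq *srqs[MOCK_MAX_SRQS];
	size_t nsrqs;
};

/* Register an SRQ with the domain; returns 0 on success, -1 if full. */
int mock_attach_srq(struct mock_src_domain *d, struct mock_srq *srq)
{
	assert(d && srq);
	if (d->nsrqs == MOCK_MAX_SRQS)
		return -1;
	d->srqs[d->nsrqs++] = srq;
	return 0;
}

/* Deliver one message that arrived on some QP of domain d.
 * Only the SRQ number selects the target; unknown numbers are dropped. */
struct mock_srq *mock_deliver(struct mock_src_domain *d, uint32_t srq_num)
{
	assert(d);
	for (size_t i = 0; i < d->nsrqs; i++)
		if (d->srqs[i]->srq_num == srq_num) {
			d->srqs[i]->received++;
			return d->srqs[i];
		}
	return NULL;
}
```

With two SRQs numbered 7 and 9 attached to one domain, mock_deliver(&d, 9) bumps only the second queue's counter, whichever QP the message is imagined to have arrived on. A regular RC QP attached to an SRQ would correspond to a domain with exactly one entry.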
[ewg] Re: [ofa-general] RFC: SRC API
> > It seems what you are missing is what SRC is, not how to use the API.
>
> So tell us.

This calls for a separate document. From feedback from Sonoma I really
assumed people have it figured out.
Let's open a separate thread, and there I will try writing up what SRC is
from the protocol point of view.

-- 
MST
[ewg] Scalable reliable connection
Here's some background on what SRC is.
This is basically slide 6 in Dror's talk,
for those that missed the talk.

        * * *

SRC is an extension supported by recent Mellanox hardware
which is geared toward reducing the number of QPs
required for all-to-all communication on systems
with a high number of jobs per node.

===
Motivation:
===
Given N nodes with J jobs per node, number of QPs required
for all-to-all communication is:

With RC:
        O((N * J) ^ 2)

        Since each job out of O(N * J) jobs must create a single QP
        to communicate with each one of O(N * J) other jobs.

With SRC:
        O(N ^ 2 * J)

        This is achieved by using a single send queue (per job, out of
        O(N * J) jobs) to send data to all J jobs running on a specific
        node (out of O(N) nodes).
        Hardware uses new SRQ number field in packet header to
        multiplex receive WRs and WCs to private memory of each job.

This is a similar idea to IB RD.
Q: Why not use RD then?
A: Because no hardware supports it.

Details:

===
Verbs extension:
===

- There is a new transport/QP type SRC.
- There is a new object type SRC domain
- Each SRQ gets new (optional) attributes:
        SRC domain
        SRC SRQ number
        SRC CQ
  SRQ must have either all 3 of these or none of these attributes

- QPs of type SRC have all the same attributes as regular RC QPs
  connected to SRQ, except that:
        A. Each SRC QP has a new required attribute SRC domain
        B. SRC QPs do *not* have SRQ attribute
           (do not have a specific SRQ associated with them)

===
Protocol extension:
===
SRC QP behaviour: Requestor
- Post send WR for this QP type is extended with SRQ number field
  This number is sent as part of packet header
- SRC Packets follow rules for RC packets on the wire, exactly
  What is different is their handling at the responder side

SRC QP behaviour: Responder
Each incoming packet passes transport checks with respect
to the SRC QP, following RC rules, exactly.

After this, SRQ number in packet header is used to look up
a specific SRQ. SRC domain of the resulting SRQ must be equal
to SRC domain of the QP, otherwise a NAK is sent,
and QP moves to error state.

If the SRC domains match, receive WR and receive WC processing
are as follows:

- RC Send
        - Rather than using SRQ to which the QP is attached,
          SRQ is looked up by SRQ number in the packet.
          Receive WR is taken from this SRQ.
        - Completions are generated on the CQ specified in the SRQ

- RDMA/Atomic
        - Rather than using PD to which the QP is attached,
          SRQ is looked up by SRQ number in the packet.
          PD of this SRQ is used for protection checks.

===

-- 
MST
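The responder rules above can be sketched as a small userspace model. This is only an illustration under stated assumptions: the `toy_*` types and `toy_src_dispatch()` are hypothetical names invented here, not libibverbs or hardware interfaces, and treating an unknown SRQ number the same way as a domain mismatch is an assumption (the text above only specifies the mismatch case).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for verbs objects; none of these are libibverbs types. */
struct toy_domain { int id; };

struct toy_srq {
        unsigned int srq_num;       /* SRC SRQ number carried in the packet */
        struct toy_domain *domain;  /* SRC domain this SRQ belongs to */
};

struct toy_src_qp {
        struct toy_domain *domain;  /* required SRC domain attribute of the QP */
        int in_error;               /* set once the QP moves to error state */
};

enum toy_verdict { TOY_DELIVER, TOY_NAK };

/*
 * Model of the responder step that follows the RC transport checks:
 * look up the SRQ by the number from the packet header, then compare
 * its SRC domain with the SRC domain of the receiving QP.
 */
enum toy_verdict toy_src_dispatch(struct toy_src_qp *qp,
                                  struct toy_srq *srqs, size_t nsrqs,
                                  unsigned int srq_num,
                                  struct toy_srq **out)
{
        struct toy_srq *srq = NULL;
        size_t i;

        assert(qp != NULL && out != NULL);
        *out = NULL;

        for (i = 0; i < nsrqs; i++) {
                if (srqs[i].srq_num == srq_num) {
                        srq = &srqs[i];
                        break;
                }
        }

        /* Assumption: an unknown SRQ number is handled like a mismatch. */
        if (srq == NULL || srq->domain != qp->domain) {
                qp->in_error = 1;   /* NAK is sent, QP moves to error state */
                return TOY_NAK;
        }

        /* Receive WR, CQ and PD would all be taken from this SRQ. */
        *out = srq;
        return TOY_DELIVER;
}
```

In this model a QP in one domain can deliver to any SRQ of that domain, while a packet naming an SRQ from another domain is refused and leaves the QP in error.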
[ewg] Re: [PATCH 0/4]: add kfifo from upstream for SLES9 RH4
> The following patches add kfifo to ibcore (for SLES9 RH4).
> kfifo is taken from upstream code.

Thanks, applied to 1.2.c and ofed_kernel.
Vlad already took 1.2.c, and will, I guess, take ofed_kernel after
it passes his checks.

-- 
MST
[ewg] Re: Scalable reliable connection
> Quoting Gleb Natapov [EMAIL PROTECTED]:
> Subject: Re: Scalable reliable connection
>
> On Mon, Jul 30, 2007 at 03:50:54PM +0300, Michael S. Tsirkin wrote:
> > With SRC:
> >         O(N ^ 2 * J)
> >
> >         This is achieved by using a single send queue (per job, out of
> >         O(N * J) jobs) to send data to all J jobs running on a specific
> >         node (out of O(N) nodes).
> >         Hardware uses new SRQ number field in packet header to
> >         multiplex receive WRs and WCs to private memory of each job.
>
> But since the send queue cannot be used for receiving packets, additional
> receive QPs have to be created, one per job, so with SRC it is actually
> O(N ^ 2 * J + N * J) unless I am missing something.

Yes, but since N >= 1, N ^ 2 >= N, and so
O(N ^ 2 * J + N * J) == O(N ^ 2 * J).

-- 
MST
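The counts in this exchange are easy to check numerically. The sketch below takes the formulas exactly as quoted (RC: (N * J) ^ 2; SRC: N ^ 2 * J send QPs plus N * J receive QPs); the function names are made up for illustration and the formulas are the thread's estimates, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* RC: each of the N * J jobs keeps one QP per job it talks to. */
uint64_t rc_qp_count(uint64_t n, uint64_t j)
{
        uint64_t jobs = n * j;

        return jobs * jobs;
}

/*
 * SRC, using the accounting from the exchange above: every job
 * (N * J of them) has one send QP per node (N), plus N * J
 * additional receive QPs.
 */
uint64_t src_qp_count(uint64_t n, uint64_t j)
{
        uint64_t send_qps = n * n * j;
        uint64_t recv_qps = n * j;

        return send_qps + recv_qps;
}

/*
 * For N >= 1 we have N * J <= N ^ 2 * J, so the SRC total is at most
 * 2 * N ^ 2 * J, which is why the extra term does not change the
 * O(N ^ 2 * J) bound.
 */
int src_within_bound(uint64_t n, uint64_t j)
{
        assert(n >= 1);
        return src_qp_count(n, j) <= 2 * n * n * j;
}
```

For example, with 4 nodes and 8 jobs per node the RC formula gives 1024 QPs, while the SRC formula gives 128 + 32 = 160.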
[ewg] Re: Scalable reliable connection
> Quoting Tang, Changqing [EMAIL PROTECTED]:
> Subject: RE: Scalable reliable connection
>
> A send queue can only serve max J jobs within a node. Is it possible to
> make a single send queue to serve all jobs on all nodes ?

How do you propose to do this?

-- 
MST
[ewg] Re: patches for 1.2.c
> Quoting Steve Wise [EMAIL PROTECTED]:
> Subject: patches for 1.2.c
>
> Guys,
>
> I have 2 more patches to go in ofed_1_2/ofed_1_2_c. Is there some grand
> scheme to the naming of kernel_patches/fixes/* for 1.2.c? I noticed a slew
> of new files for the post-2.6.22 fixes, and wondered if there is a naming
> scheme?

Not really, just stick the module name in there please so it's easy to
figure that cxgb3 is involved.

> Or should I just post a patch for the ofed_1_2 branch and let you all
> create the ofed_1_2_c kernel_patches/fixes/ patch file ??

It's best if you post the patch that should go into kernel_patches/fixes/,
or clone the ofed_1_2_c branch and add the file there.

-- 
MST
[ewg] Re: [PATCH 0/2] IB/iser: move open-iscsi crypto functions to kernel_addons
> Quoting Erez Zilber [EMAIL PROTECTED]:
> Subject: [PATCH 0/2] IB/iser: move open-iscsi crypto functions to kernel_addons
>
> The following patches move open-iscsi crypto functions from kernel_patches
> to kernel_addons. By doing so, we also solve a bug in iscsi tx hash that
> caused an oops when crc32c was used for data digest.

Great, these patches were really fragile.

-- 
MST
[ewg] Re: [ofa-general] Re: OFED 1.2.c-9 is available
> Quoting Roland Dreier [EMAIL PROTECTED]:
> Subject: Re: [ofa-general] Re: OFED 1.2.c-9 is available
>
> > Why under drivers/net rather than drivers/infiniband like all the other
> > drivers ? Does this really need special casing (in libibumad) ?
>
> Tziporet is incorrect. There's nothing from the mlx4_core driver either,
> and when it is implemented, it should work exactly the same as all other
> drivers.

At some point you suggested sticking this stuff under the pci device
and adding softlinks under drivers/infiniband, so that if there's
an ethernet device on top of the core these can be shared.

Not sure how to do this though, and no idea why just adding the
attributes in both places would be any worse, either.
Comments?

-- 
MST
[ewg] Re: [ofa-general] Re: ofa_1_2_c_kernel 20070802-0201 daily build status
> Quoting Steve Wise [EMAIL PROTECTED]:
> Subject: Re: [ofa-general] Re: ofa_1_2_c_kernel 20070802-0201 daily build status
>
> Also, is something broken in the ofed_1_2 branch? I cannot even build
> against the local kernel on the ofa server using the
> ~vlad/ofed_1_2/linux-2.6 repository.

Does directory ~vlad/ofed_1_2/linux-2.6 exist?

-- 
MST
Re: [ewg] Re: [ofa-general] Re: ofa_1_2_c_kernel 20070802-0201 daily build status
Look here: /home/vlad/scripts/ofed_1_2

> Quoting Steve Wise [EMAIL PROTECTED]:
> Subject: Re: [ewg] Re: [ofa-general] Re: ofa_1_2_c_kernel 20070802-0201 daily build status
>
> I'm havin' a bad day. Can you all help me?
>
> My normal process is to use the build_ofa_kernel.sh script from the
> ofabuild repository to build against all ofed kernels. But that script
> in the master branch of the ofabuild repository now assumes 1.2.c because
> it tries to configure in the connectx device. There aren't ofed_1_2 and
> ofed_1_2_c branches in that repo for tree-specific build scripts.
>
> So: What exactly should I be using to do cross-compile builds of my
> patched trees before submitting patches for inclusion into ofed?
>
> Thanks and sorry for the pain. And if there is an RTFM somewhere that I
> should be reading, feel free to say RTFM. :)
>
> Steve.

-- 
MST
[ewg] Re: [PATCH 0/2] IB/iser: move open-iscsi crypto functions to kernel_addons
Vlad?

> Quoting Erez Zilber [EMAIL PROTECTED]:
> Subject: Re: [PATCH 0/2] IB/iser: move open-iscsi crypto functions to kernel_addons
>
> Michael S. Tsirkin wrote:
> > > Quoting Erez Zilber [EMAIL PROTECTED]:
> > > Subject: [PATCH 0/2] IB/iser: move open-iscsi crypto functions to kernel_addons
> > >
> > > The following patches move open-iscsi crypto functions from
> > > kernel_patches to kernel_addons. By doing so, we also solve a bug in
> > > iscsi tx hash that caused an oops when crc32c was used for data digest.
> >
> > Great, these patches were really fragile.
>
> I saw that these patches are not in 1.2.c-10. Will they be in 1.2.c-11?
> This is a real bug fix.
>
> Thanks,
> Erez

-- 
MST
[ewg] RFCv2: SRC API
This is version 2 of the proposal, addressing comments from version 1.

Changelog:
- Use oflags to make API smaller
- Clarify sharing semantics
- Add documentation

This is the API proposal for support of the SRC
(scalable reliable connected) protocol extension in libibverbs.

This adds APIs to:
- manage SRC domains

- share SRC domains between processes,
  by means of creating a 1:1 association
  between an SRC domain and an inode.

Notes:
- The inode is specified by means of a file descriptor,
  this makes it possible for the user to manage file
  creation/deletion in the most flexible manner
  (e.g. tmpfile can be used).

- I envision implementing this sharing mechanism in kernel by means
  of a per-device tree, with inode as a key and domain object
  as a value.

Please comment.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

diff --git a/SRC.txt b/SRC.txt
new file mode 100644
index 000..3881477
--- /dev/null
+++ b/SRC.txt
@@ -0,0 +1,133 @@
+Here's some documentation on Scalable Reliable Connections.
+
+        * * *
+
+SRC is an extension supported by recent Mellanox hardware
+which is geared toward reducing the number of QPs
+required for all-to-all communication on systems
+with a high number of jobs per node.
+
+===
+Motivation:
+===
+Given N nodes with J jobs per node, number of QPs required
+for all-to-all communication is:
+
+With RC:
+        O((N * J) ^ 2)
+
+        Since each job out of O(N * J) jobs must create a single QP
+        to communicate with each one of O(N * J) other jobs.
+
+With SRC:
+        O(N ^ 2 * J)
+
+        This is achieved by using a single send queue (per job, out of O(N * J) jobs)
+        to send data to all J jobs running on a specific node (out of O(N) nodes).
+        Hardware uses new SRQ number field in packet header to
+        multiplex receive WRs and WCs to private memory of each job.
+
+This is a similar idea to IB RD.
+Q: Why not use RD then?
+A: Because no hardware supports it.
+
+Details:
+
+===
+Verbs extension:
+===
+
+- There is a new transport/QP type SRC.
+- There is a new object type SRC domain
+- Each SRQ gets new (optional) attributes:
+        SRC domain
+        SRC SRQ number
+        SRC CQ
+  SRQ must have either all 3 of these or none of these attributes
+
+- QPs of type SRC have all the same attributes as regular RC QPs
+  connected to SRQ, except that:
+        A. Each SRC QP has a new required attribute SRC domain
+        B. SRC QPs do *not* have SRQ attribute
+           (do not have a specific SRQ associated with them)
+
+===
+Protocol extension:
+===
+SRC QP behaviour: Requestor
+- Post send WR for this QP type is extended with SRQ number field
+  This number is sent as part of packet header
+- SRC Packets follow rules for RC packets on the wire, exactly
+  What is different is their handling at the responder side
+
+SRC QP behaviour: Responder
+Each incoming packet passes transport checks with respect
+to the SRC QP, following RC rules, exactly.
+
+After this, SRQ number in packet header is used to look up
+a specific SRQ. SRC domain of the resulting SRQ must be equal
+to SRC domain of the QP, otherwise a NAK is sent,
+and QP moves to error state.
+
+If the SRC domains match, receive WR and receive WC processing
+are as follows:
+
+- RC Send
+        - Rather than using SRQ to which the QP is attached,
+          SRQ is looked up by SRQ number in the packet.
+          Receive WR is taken from this SRQ.
+        - Completions are generated on the CQ specified in the SRQ
+
+- RDMA/Atomic
+        - Rather than using PD to which the QP is attached,
+          SRQ is looked up by SRQ number in the packet.
+          PD of this SRQ is used for protection checks.
+
+===
+Pseudo code:
+===
+
+Consider again a setup where there are N nodes with J jobs per node.
+All N * J jobs need to perform all-to-all communication.
+Using RC QPs, this would call for O((N * J) ^ 2) QPs.
+Here is how SRC can be used to reduce the number of QPs to O(N ^ 2 * J).
+
+At startup:
+1. All jobs on each node share a single SRC domain
+2. Each job creates a CQ for receive WCs
+3. Each job creates a SRQ attached to this CQ and to the shared domain
+
+When job j1 needs to transmit to job j2 on remote node n for the first time:
+1. Test: does job j1 have an existing connection to some job on node n?
+- If no:
+        j1 creates an SRC QP qp1 (send QP)
+        qp1 is only used to post send WRs
+        j2 creates an SRC QP qp2
+        qp2 is part of SRC domain
[ewg] Re: ofa_1_2_c_kernel 20070802-0201 daily build status
> Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
> Subject: Re: ofa_1_2_c_kernel 20070802-0201 daily build status
>
> Hello Michael and Vladimir!
> > ehca backports for kernel.org kernels seem to be broken.
> > 1. Does anyone care enough to fix them? If not we'll disable ehca in
> > build for these kernels.
>
> I downloaded daily build package ofa_1_2_c_kernel-20070804-0200.tgz
> and followed the build scheme configure, make on 2.6.19, 2.6.18, 2.6.17
> and 2.6.16/sles10/sles10_sp1. Except for 2.6.16/sles10/sles10_sp1, where
> a patch for kmem_cache_zalloc() is required for ehca, the others were
> built without errors, see below.
> Thus, I'm wondering what I'm doing differently than your daily build
> script? Could be different kernel configs or compiler version?

Can you please build on ofa server against kernels in ~vlad/kernel.org/?
The cross tool chain is here:
/home/vlad/cross/

-- 
MST
[ewg] Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.10/sles10/sles10_sp1
Let's not do it this way. I think the right thing is to implement
kmem_cache_zalloc by means of kmem_cache_alloc and memset in kernel_addons.

> Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
> Subject: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.10/sles10/sles10_sp1
>
> Hello Michael and Vladimir!
> This patch below adds a backport patch for ehca to the dirs 2.6.16,
> 2.6.16_sles10 and 2.6.16_sles10_sp1 underneath kernel_patches/backport
> of ofed-1.2.c source tree.
> Thanks!
> Nam
>
> backport kmem_cache_zalloc() to 2.6.10, 2.6.10_sles10 and 2.6.10_sles10_sp1
>
> Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED]
> ---
>
>  2.6.16/ehca_kmem_cache_zalloc_to_2_6_16.patch            |   97 +++
>  2.6.16_sles10/ehca_kmem_cache_zalloc_to_2_6_16.patch     |   97 +++
>  2.6.16_sles10_sp1/ehca_kmem_cache_zalloc_to_2_6_16.patch |   97 +++
>  3 files changed, 291 insertions(+)
>
> diff -Nurp ofa_1_2_c_kernel-20070804-0200_orig/kernel_patches/backport/2.6.16/ehca_kmem_cache_zalloc_to_2_6_16.patch ofa_1_2_c_kernel-20070804-0200/kernel_patches/backport/2.6.16/ehca_kmem_cache_zalloc_to_2_6_16.patch
> --- ofa_1_2_c_kernel-20070804-0200_orig/kernel_patches/backport/2.6.16/ehca_kmem_cache_zalloc_to_2_6_16.patch 1970-01-01 01:00:00.0 +0100
> +++ ofa_1_2_c_kernel-20070804-0200/kernel_patches/backport/2.6.16/ehca_kmem_cache_zalloc_to_2_6_16.patch 2007-08-06 00:53:59.0 +0200
> @@ -0,0 +1,97 @@
> +diff -Nurp ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_cq.c ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_cq.c
> +--- ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_cq.c 2007-08-04 11:00:05.0 +0200
> ++++ ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_cq.c 2007-08-06 00:41:50.0 +0200
> +@@ -134,13 +134,14 @@ struct ib_cq *ehca_create_cq(struct ib_d
> +         if (cqe >= 0x - 64 - additional_cqe)
> +                 return ERR_PTR(-EINVAL);
> +
> +-        my_cq = kmem_cache_zalloc(cq_cache, GFP_KERNEL);
> ++        my_cq = kmem_cache_alloc(cq_cache, GFP_KERNEL);
> +         if (!my_cq) {
> +                 ehca_err(device, "Out of memory for ehca_cq struct device=%p",
> +                          device);
> +                 return ERR_PTR(-ENOMEM);
> +         }
> +
> ++        memset(my_cq, 0, sizeof(*my_cq));
> +         memset(&param, 0, sizeof(struct ehca_alloc_cq_parms));
> +
> +         spin_lock_init(&my_cq->spinlock);
> +diff -Nurp ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_main.c ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_main.c
> +--- ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_main.c 2007-08-04 11:00:05.0 +0200
> ++++ ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_main.c 2007-08-06 00:40:58.0 +0200
> +@@ -113,9 +113,11 @@ static struct kmem_cache *ctblk_cache =
> +
> + void *ehca_alloc_fw_ctrlblock(gfp_t flags)
> + {
> +-        void *ret = kmem_cache_zalloc(ctblk_cache, flags);
> ++        void *ret = kmem_cache_alloc(ctblk_cache, flags);
> +         if (!ret)
> +                 ehca_gen_err("Out of memory for ctblk");
> ++        else
> ++                memset(ret, 0, EHCA_PAGESIZE);
> +         return ret;
> + }
> +
> +diff -Nurp ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_mrmw.c ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_mrmw.c
> +--- ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_mrmw.c 2007-08-04 11:00:05.0 +0200
> ++++ ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_mrmw.c 2007-08-06 00:39:30.0 +0200
> +@@ -55,8 +55,9 @@ static struct ehca_mr *ehca_mr_new(void)
> + {
> +         struct ehca_mr *me;
> +
> +-        me = kmem_cache_zalloc(mr_cache, GFP_KERNEL);
> ++        me = kmem_cache_alloc(mr_cache, GFP_KERNEL);
> +         if (me) {
> ++                memset(me, 0, sizeof(*me));
> +                 spin_lock_init(&me->mrlock);
> +         } else
> +                 ehca_gen_err("alloc failed");
> +@@ -73,8 +74,9 @@ static struct ehca_mw *ehca_mw_new(void)
> + {
> +         struct ehca_mw *me;
> +
> +-        me = kmem_cache_zalloc(mw_cache, GFP_KERNEL);
> ++        me = kmem_cache_alloc(mw_cache, GFP_KERNEL);
> +         if (me) {
> ++                memset(me, 0, sizeof(*me));
> +                 spin_lock_init(&me->mwlock);
> +         } else
> +                 ehca_gen_err("alloc failed");
> +diff -Nurp ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_pd.c ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_pd.c
> +--- ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_pd.c 2007-08-04 11:00:05.0 +0200
> ++++ ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_pd.c 2007-08-06 00:38:14.0 +0200
> +@@ -50,13 +50,14 @@ struct ib_pd *ehca_alloc_pd(struct ib_de
> + {
> +         struct ehca_pd *pd;
> +
> +-        pd = kmem_cache_zalloc(pd_cache, GFP_KERNEL);
> ++        pd = kmem_cache_alloc(pd_cache, GFP_KERNEL);
> +         if (!pd) {
> +                 ehca_err(device,
[ewg] Re: RFCv2: SRC API
> Only one of the jobs among j2, j3, j4 on remote node n needs to create a
> receiving qp2 for j1, right ?

Correct. A single QP can be used to send data to any SRQ
that shares the same domain.

-- 
MST
[ewg] Re: RFCv2: SRC API
> Quoting Tang, Changqing [EMAIL PROTECTED]:
> Subject: RE: RFCv2: SRC API
>
> > > OK, I was wrong before, here is my question.
> > > if remote node n has j2, j3, and j4, and j2 is the job to create qp2
> > > and make connection with qp1 in j1. if j2 is done before j3 and j4,
> > > then we can not let j2 destroy qp2, because j3 and j4 are still
> > > communicating with j1. Since j2 owns qp2, j2 needs to be the last job
> > > to clean up. Am I right ?
> >
> > Correct. Is this clear from the text, or is some kind of additional
> > clarification necessary?
>
> It is not clear at the first read, so please add one sentence to clarify it.

Would something like this help?

 Cleanup:
 When job j1 does not need to communicate to any jobs on node n,
 it disconnects qp1 from qp2, and asks j2 to destroy qp2.
+
+Note: both qp1 and qp2 must exist for the communication to take place.
+Thus, j2 should not destroy qp2 (and in particular, should not exit)
+until j1 has completed communication with node n and
+has asked j2 to disconnect.

> if j2 is the last job to clean up, how can it know all other jobs on the
> same node have called ibv_close_src_domain(), and it is time for itself
> to clean up ? Is this something up to the application to do ?

No, this is handled automatically. Have you seen this text?

 * ibv_close_src_domain - close an SRC domain
 * If this is the last reference, destroys the domain.

So, each job has a reference to the domain. Once the last reference is gone,
the domain is destroyed.

-- 
MST
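The "last reference destroys the domain" rule can be illustrated with a minimal reference-counting sketch. This is a userspace toy written under assumptions: `struct toy_src_domain`, `toy_open_src_domain()` and `toy_close_src_domain()` are hypothetical names that only model the semantics described above, and they say nothing about how the proposed ibv_close_src_domain() would actually be implemented.

```c
#include <assert.h>

/* Toy model of a shared SRC domain; not a libibverbs type. */
struct toy_src_domain {
        unsigned int refs;  /* one reference per job that opened the domain */
        int alive;          /* 1 while the domain exists */
};

/* Each job that opens the domain takes one reference. */
void toy_open_src_domain(struct toy_src_domain *d)
{
        if (d->refs == 0)
                d->alive = 1;   /* first opener creates the domain */
        d->refs++;
}

/*
 * Each job drops its own reference when it is done.  Returns 1 if this
 * call dropped the last reference and therefore destroyed the domain,
 * 0 otherwise.  No job needs to know how many other users remain.
 */
int toy_close_src_domain(struct toy_src_domain *d)
{
        assert(d->refs > 0);
        if (--d->refs == 0) {
                d->alive = 0;
                return 1;
        }
        return 0;
}
```

With three jobs sharing a domain, the first two close calls leave it alive and only the third destroys it, regardless of which job happens to close last.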
[ewg] Re: RFCv2: SRC API
> Quoting Tang, Changqing [EMAIL PROTECTED]:
> Subject: RE: RFCv2: SRC API
>
> >  Cleanup:
> >  When job j1 does not need to communicate to any jobs on node n,
> >  it disconnects qp1 from qp2, and asks j2 to destroy qp2.
> > +
> > +Note: both qp1 and qp2 must exist for the communication to take place.
> > +Thus, j2 should not destroy qp2 (and in particular, should not exit)
> > +until j1 has completed communication with node n and has asked j2 to
> > +disconnect.
>
> Thanks.
> Another question. if a node n has 8 jobs, say, j2-j9, usually the first
> job j2 is the one to create the SRC domain (other jobs just attach and
> share) and it makes sense to let j2 create all the receiving QPs for all
> other remote jobs and make all the connections. (we can do it in a
> round-robin way, but that is more work).

Sure, creating all connections upfront will work too,
this is just a usage example.

> Is there any performance worry to let j2 (the first job on a node) do all
> the work ?

How do you mean?

> What is the latency of SRC+SRQ ?

I'd expect it to be more or less the same as regular SRQ.

-- 
MST
[ewg] Re: [PATCH 0/5 VNIC] VNIC patch series for OFED-1.2 and OFED-1.2.c
> Quoting Ramachandra K [EMAIL PROTECTED]:
> Subject: [PATCH 0/5 VNIC] VNIC patch series for OFED-1.2 and OFED-1.2.c
>
> Vlad,
>
> Please apply this VNIC patch series to both the OFED-1.2 and OFED-1.2.c
> branches. This series contains changes to the VNIC driver for supporting
> iPath and the new version of the VEx hardware, the Ethernet Virtual I/O
> Controller (EVIC).

I don't see how adding features to 1.2 *at this stage* can be justified.

-- 
MST
[ewg] Re: [PATCH 0/5 VNIC] VNIC patch series for OFED-1.2 and OFED-1.2.c
> Quoting Kuchimanchi, Ramachandra [EMAIL PROTECTED]:
> Subject: RE: [PATCH 0/5 VNIC] VNIC patch series for OFED-1.2 and OFED-1.2.c
>
> -----Original Message-----
> From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
> Sent: Mon 8/6/2007 11:48 PM
> To: Kuchimanchi, Ramachandra
> Cc: [EMAIL PROTECTED]; ewg@lists.openfabrics.org
> Subject: Re: [PATCH 0/5 VNIC] VNIC patch series for OFED-1.2 and OFED-1.2.c
>
> > > Quoting Ramachandra K [EMAIL PROTECTED]:
> > > Subject: [PATCH 0/5 VNIC] VNIC patch series for OFED-1.2 and OFED-1.2.c
> > >
> > > Vlad,
> > >
> > > Please apply this VNIC patch series to both the OFED-1.2 and
> > > OFED-1.2.c branches. This series contains changes to the VNIC driver
> > > for supporting iPath and the new version of the VEx hardware,
> > > the Ethernet Virtual I/O Controller (EVIC).
> >
> > I don't see how adding features to 1.2 *at this stage* can be justified.
>
> Just to clarify, when I say OFED-1.2, I mean the next release in the 1.2
> series of OFED, i.e. OFED-1.2.1, and ultimately OFED-1.3 down the line.
> Is there any other branch designated for that ?

I think EWG decided that the next release in the 1.2 series will be 1.2.c.
So far, the definition of 1.2.c was 1.2 plus bugfixes plus connectx support.

Stuff intended for 1.3 should go here for now:
git://openfabrics.org/~vlad/ofed_kernel ofed_kernel
This has been updated to 2.6.23-rc2, but otherwise is tracking ofed_1_2_c.

> And I hope there is no objection for inclusion of these patches in
> OFED-1.2.c branch.

This looks like a change of methodology so this might be something
EWG would have to agree on. Right?

-- 
MST
[ewg] Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.10/sles10/sles10_sp1
Hmm, I thought about it some more. kmem_cache struct is not exported
on recent kernels, so this might be hard to do.
So I think the patch is probably the right approach, after all.

> Quoting Michael S. Tsirkin [EMAIL PROTECTED]:
> Subject: Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.10/sles10/sles10_sp1
>
> Let's not do it this way. I think the right thing is to implement
> kmem_cache_zalloc by means of kmem_cache_alloc and memset in kernel_addons.
>
> > Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
> > Subject: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.10/sles10/sles10_sp1
> >
> > Hello Michael and Vladimir!
> > This patch below adds a backport patch for ehca to the dirs 2.6.16,
> > 2.6.16_sles10 and 2.6.16_sles10_sp1 underneath kernel_patches/backport
> > of ofed-1.2.c source tree.
> > Thanks!
> > Nam
> >
> > backport kmem_cache_zalloc() to 2.6.10, 2.6.10_sles10 and 2.6.10_sles10_sp1
> >
> > Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED]
> > ---
> >
> >  2.6.16/ehca_kmem_cache_zalloc_to_2_6_16.patch            |   97 +++
> >  2.6.16_sles10/ehca_kmem_cache_zalloc_to_2_6_16.patch     |   97 +++
> >  2.6.16_sles10_sp1/ehca_kmem_cache_zalloc_to_2_6_16.patch |   97 +++
> >  3 files changed, 291 insertions(+)
[ewg] Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.10/sles10/sles10_sp1
I'm happy with stuff as it is: the ifdefs make it easy to figure out which version the backport applies to. BTW, I think the same backport will be needed for older kernels as well, no?

Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
Subject: Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.10/sles10/sles10_sp1

Hello Michael!
Below is the patch to backport kmem_cache_zalloc() for 2.6.16/sles10/sles10_sp1, as we've discussed previously. While doing so I noticed that the current backport code in slab.h looks weird to me (a sort of copy-paste mixture). It causes no build error; it is only a coding issue. Therefore this patch also includes some cleanup. If it's ok, please apply.
PS: The mentioned issue in backport slab.h also exists in other versions. If you want me to fix them as well, let me know.
Regards
Nam

backport kmem_cache_zalloc() in slab.h to 2.6.16, 2.6.16_sles10 and 2.6.16_sles10_sp1

Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED]
---
 2.6.16/include/linux/slab.h            |   22 +++++++---------------
 2.6.16_sles10/include/linux/slab.h     |   22 +++++++---------------
 2.6.16_sles10_sp1/include/linux/slab.h |   22 +++++++---------------
 3 files changed, 21 insertions(+), 45 deletions(-)

diff -Nurp ofa_1_2_c_kernel-20070804-0200_orig/kernel_addons/backport/2.6.16/include/linux/slab.h ofa_1_2_c_kernel-20070804-0200/kernel_addons/backport/2.6.16/include/linux/slab.h
--- ofa_1_2_c_kernel-20070804-0200_orig/kernel_addons/backport/2.6.16/include/linux/slab.h	2007-08-04 11:00:08.0 +0200
+++ ofa_1_2_c_kernel-20070804-0200/kernel_addons/backport/2.6.16/include/linux/slab.h	2007-08-06 18:29:17.0 +0200
@@ -1,10 +1,8 @@
-#include_next <linux/slab.h>
+#ifndef _LINUX_SLAB_BACKPORT_TO_2_6_16
+#define _LINUX_SLAB_BACKPORT_TO_2_6_16
 
 #include_next <linux/slab.h>
 
-#ifndef BACKPORT_LINUX_STRING_TO_2_6_18
-#define BACKPORT_LINUX_STRING_TO_2_6_18
-
 static inline
 void *kmemdup(const void *src, size_t len, gfp_t gfp)
 {
@@ -16,19 +14,13 @@ void *kmemdup(const void *src, size_t le
 	return p;
 }
 
-#endif
-#ifndef BACKPORT_LINUX_STRING_TO_2_6_18
-#define BACKPORT_LINUX_STRING_TO_2_6_18
-
 static inline
-void *kmemdup(const void *src, size_t len, gfp_t gfp)
+void *kmem_cache_zalloc(struct kmem_cache *cache, gfp_t flags)
 {
-	void *p;
-
-	p = kmalloc(len, gfp);
-	if (p)
-		memcpy(p, src, len);
-	return p;
+	void *ret = kmem_cache_alloc(cache, flags);
+	if (ret)
+		memset(ret, 0, kmem_cache_size(cache));
+	return ret;
 }
 
 #endif
diff -Nurp ofa_1_2_c_kernel-20070804-0200_orig/kernel_addons/backport/2.6.16_sles10/include/linux/slab.h ofa_1_2_c_kernel-20070804-0200/kernel_addons/backport/2.6.16_sles10/include/linux/slab.h
--- ofa_1_2_c_kernel-20070804-0200_orig/kernel_addons/backport/2.6.16_sles10/include/linux/slab.h	2007-08-04 11:00:08.0 +0200
+++ ofa_1_2_c_kernel-20070804-0200/kernel_addons/backport/2.6.16_sles10/include/linux/slab.h	2007-08-06 18:30:33.0 +0200
@@ -1,10 +1,8 @@
-#include_next <linux/slab.h>
+#ifndef _LINUX_SLAB_BACKPORT_TO_2_6_16
+#define _LINUX_SLAB_BACKPORT_TO_2_6_16
 
 #include_next <linux/slab.h>
 
-#ifndef BACKPORT_LINUX_STRING_TO_2_6_18
-#define BACKPORT_LINUX_STRING_TO_2_6_18
-
 static inline
 void *kmemdup(const void *src, size_t len, gfp_t gfp)
 {
@@ -16,19 +14,13 @@ void *kmemdup(const void *src, size_t le
 	return p;
 }
 
-#endif
-#ifndef BACKPORT_LINUX_STRING_TO_2_6_18
-#define BACKPORT_LINUX_STRING_TO_2_6_18
-
 static inline
-void *kmemdup(const void *src, size_t len, gfp_t gfp)
+void *kmem_cache_zalloc(struct kmem_cache *cache, gfp_t flags)
 {
-	void *p;
-
-	p = kmalloc(len, gfp);
-	if (p)
-		memcpy(p, src, len);
-	return p;
+	void *ret = kmem_cache_alloc(cache, flags);
+	if (ret)
+		memset(ret, 0, kmem_cache_size(cache));
+	return ret;
 }
 
 #endif
diff -Nurp ofa_1_2_c_kernel-20070804-0200_orig/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/slab.h ofa_1_2_c_kernel-20070804-0200/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/slab.h
--- ofa_1_2_c_kernel-20070804-0200_orig/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/slab.h	2007-08-04 11:00:08.0 +0200
+++ ofa_1_2_c_kernel-20070804-0200/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/slab.h	2007-08-06 18:30:40.0 +0200
@@ -1,10 +1,8 @@
-#include_next <linux/slab.h>
+#ifndef _LINUX_SLAB_BACKPORT_TO_2_6_16
+#define _LINUX_SLAB_BACKPORT_TO_2_6_16
 
 #include_next <linux/slab.h>
 
-#ifndef BACKPORT_LINUX_STRING_TO_2_6_18
-#define BACKPORT_LINUX_STRING_TO_2_6_18
-
 static inline
 void *kmemdup(const void *src, size_t len, gfp_t gfp)
 {
@@ -16,19 +14,13 @@ void *kmemdup(const void *src, size_t le
 	return p;
 }
 
-#endif
-#ifndef BACKPORT_LINUX_STRING_TO_2_6_18
-#define BACKPORT_LINUX_STRING_TO_2_6_18
-
 static inline
-void *kmemdup(const void *src, size_t len, gfp_t gfp)
+void
[ewg] Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.10/sles10/sles10_sp1
Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
Subject: Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.10/sles10/sles10_sp1

On Tuesday 07 August 2007 15:23, Michael S. Tsirkin wrote:
Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
Subject: Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.10/sles10/sles10_sp1

Hello Michael!
Below is the patch to backport kmem_cache_zalloc() for 2.6.16/sles10/sles10_sp1, as we've discussed previously. While doing so I noticed that the current backport code in slab.h looks weird to me (a sort of copy-paste mixture). It causes no build error; it is only a coding issue. Therefore this patch also includes some cleanup. If it's ok, please apply.
PS: The mentioned issue in backport slab.h also exists in other versions. If you want me to fix them as well, let me know.
Regards
Nam

Would not the following work? If yes, Vlad, I parked this at

Wow, you're pretty quick. Yes, it should work. And you're right, we need this patch for <=2.6.16.
PS: The weird thing in slab.h I meant previously is that the #ifndef block around kmemdup() exists twice in the same file, which does not harm the build.
Thanks!
Nam

I hadn't seen this. Right. Can you please post a separate patch that just kills the duplicates?

-- MST
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
[ewg] Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.10/sles10/sles10_sp1
Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
Subject: Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.10/sles10/sles10_sp1

On Tuesday 07 August 2007 15:23, Michael S. Tsirkin wrote:
Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
Subject: Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.10/sles10/sles10_sp1

Hello Michael!
Below is the patch to backport kmem_cache_zalloc() for 2.6.16/sles10/sles10_sp1, as we've discussed previously. While doing so I noticed that the current backport code in slab.h looks weird to me (a sort of copy-paste mixture). It causes no build error; it is only a coding issue. Therefore this patch also includes some cleanup. If it's ok, please apply.
PS: The mentioned issue in backport slab.h also exists in other versions. If you want me to fix them as well, let me know.
Regards
Nam

Would not the following work? If yes, Vlad, I parked this at

Wow, you're pretty quick. Yes, it should work. And you're right, we need this patch for <=2.6.16.
PS: The weird thing in slab.h I meant previously is that the #ifndef block around kmemdup() exists twice in the same file, which does not harm the build.
Thanks!
Nam

OK, I've cleaned these up. Thanks for pointing this out.

-- MST
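For readers following the thread without the kernel tree at hand, the idea behind the backport can be sketched in plain userspace C: an allocate-then-zero helper kept behind a single include guard, so that pulling the header in twice cannot define anything twice. This is only an illustration under stated assumptions, not the kernel code: `toy_cache`, `toy_cache_alloc()`, `toy_cache_zalloc()` and `toy_zalloc_is_zeroed()` are hypothetical stand-ins for `struct kmem_cache`, `kmem_cache_alloc()` and `kmem_cache_zalloc()`, and `malloc()` replaces the slab allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One guard around the whole "header" body, as in the cleaned-up slab.h. */
#ifndef TOY_SLAB_BACKPORT_H
#define TOY_SLAB_BACKPORT_H

/* Hypothetical stand-in for struct kmem_cache: it only records the object size. */
struct toy_cache {
	size_t object_size;
};

/* Stand-in for kmem_cache_alloc(): returns uninitialized memory or NULL. */
static inline void *toy_cache_alloc(const struct toy_cache *cache)
{
	return malloc(cache->object_size);
}

/* Same shape as the backported helper: allocate, then zero only on success. */
static inline void *toy_cache_zalloc(const struct toy_cache *cache)
{
	void *ret = toy_cache_alloc(cache);

	if (ret)
		memset(ret, 0, cache->object_size);
	return ret;
}

#endif /* TOY_SLAB_BACKPORT_H */

/* Returns 1 if every byte of a freshly zalloc'ed object is zero, else 0. */
int toy_zalloc_is_zeroed(size_t size)
{
	struct toy_cache cache = { size };
	unsigned char *obj = toy_cache_zalloc(&cache);
	size_t i;
	int ok = 1;

	if (!obj)
		return 0;
	for (i = 0; i < size; i++)
		if (obj[i] != 0)
			ok = 0;
	free(obj);
	return ok;
}
```

The duplicated block discussed above was harmless for the same reason the guard works here: the second copy sat behind an already-defined macro, so the preprocessor skipped it.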
[ewg] Re: [PATCH] IB/ehca: fix bugs to support rhel 4.5 in OFED 1.2.c-11
* ehca patches for 2.6.23-rcX were incorporated, which is not acceptable for us to support in 1.2.c. Upstream code of ehca in kernel contains major changes in order to support ehca2 with new features, which is targeted for ofed-1.3. We have not requested to have those new features for ofed-1.2.1/1.2.c/1.2.5.

The following command gives empty output, which demonstrates that no changes on top of 2.6.22 were applied to ehca sources in ofed_1_2_c:

$ git log v2.6.22..ofed_1_2_c -- drivers/infiniband/hw/ehca/
$

* In kernel_addons/backport/2.6.16 (including sles10/sles10_sp1) I don't see the backport of kmem_cache_zalloc() as we have discussed and agreed on last week.

See http://lists.openfabrics.org/pipermail/ewg/2007-August/004186.html

* Compiler error report from today's ofed_1_2_c daily build script - I consider 2.6.16 as an example:

--
Build failed on powerpc with linux-2.6.16
Log:
/home/vlad/tmp/ofa_1_2_c_kernel-20070813-0200_linux-2.6.16_powerpc_check/drivers/infiniband/hw/ehca/ehca_main.c:831: error: invalid type argument of '->'
/home/vlad/tmp/ofa_1_2_c_kernel-20070813-0200_linux-2.6.16_powerpc_check/drivers/infiniband/hw/ehca/ehca_main.c:834: error: invalid type argument of '->'
/home/vlad/tmp/ofa_1_2_c_kernel-20070813-0200_linux-2.6.16_powerpc_check/drivers/infiniband/hw/ehca/ehca_main.c:835: error: invalid type argument of '->'
make[4]: *** [/home/vlad/tmp/ofa_1_2_c_kernel-20070813-0200_linux-2.6.16_powerpc_check/drivers/infiniband/hw/ehca/ehca_main.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_c_kernel-20070813-0200_linux-2.6.16_powerpc_check/drivers/infiniband/hw/ehca] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_c_kernel-20070813-0200_linux-2.6.16_powerpc_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_c_kernel-20070813-0200_linux-2.6.16_powerpc_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.16'
make: *** [kernel] Error 2
--

I downloaded ofa_1_2_c_kernel-20070813-0200, ran

configure --with-core-mod --with-ehca-mod --with-ipoib-mod --with-user_access-mod

on our native ppc64 system and looked at the ehca_main.c source code:

int __init ehca_module_init(void)
{
	ret = sysfs_create_group(&ehca_driver.driver.kobj, &ehca_drv_attr_grp);
	if (ret) /* only complain; we can live without attributes */
#831:		ehca_gen_err("Cannot create driver attributes ret=%d", ret);
	if (ehca_poll_all_eqs != 1) {
#834:		ehca_gen_err("WARNING!!!");
		ehca_gen_err("It is possible to lose interrupts.");
	} else {
		init_timer(&poll_eqs_timer);
		poll_eqs_timer.function = ehca_poll_eqs;
		poll_eqs_timer.expires = jiffies + HZ;
		add_timer(&poll_eqs_timer);
	}

Thus, the line numbers do not match the ones reported. It looks like we have some config issues on the ofa build server. I'll take time tomorrow to look there. Please advise us how to reproduce these errors. Vlad, does your build script detect and report patch rejects? That would help to catch such an error sooner. Needless to say, I could build ofed without errors on our ppc64 systems.

My guess from all of the above is that something's wrong with the tarball. Can you please get code from git and work from there?

-- MST
[ewg] Re: ofa_1_2_c_kernel 20070802-0201 daily build status
Quoting Doug Ledford [EMAIL PROTECTED]:
Subject: Re: ofa_1_2_c_kernel 20070802-0201 daily build status

On Sat, 2007-08-11 at 21:13 +0300, Michael S. Tsirkin wrote:
Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
Subject: Re: ofa_1_2_c_kernel 20070802-0201 daily build status

Hello Doug and Scott!

On Thursday 02 August 2007 18:08, Michael S. Tsirkin wrote:
ehca backports for kernel.org kernels seem to be broken.
1. Does anyone care enough to fix them? If not we'll disable ehca in build for these kernels.
2. Could you upload kernels for RHEL4U5 and SLES10 ppc64?

Don't you guys already have RHEL4U5? It had a backports directory in the OFED 1.2 release...and it's been out for quite a while...

Our cross build environment has the headers from the x86_64 version but not the ppc version.

-- MST
[ewg] Re: OFED Aug 13 meeting summary
1. OFED 1.2.5 (was 1.2.c) is ready for release:
An issue with ehca: There are patches from kernel 2.6.23 that were inserted by mistake and must be removed before the release

There aren't, really. The snapshot generating scripts seem to be broken and seem to put code from the ofed_kernel branch under the 1.2.c name.

-- MST
[ewg] Re: [ofa-general] Re: ofa_1_2_c_kernel 20070802-0201 daily build status
Quoting Doug Ledford [EMAIL PROTECTED]:
Subject: Re: [ofa-general] Re: ofa_1_2_c_kernel 20070802-0201 daily build status

On Tue, 2007-08-14 at 09:59 +0200, Hoang-Nam Nguyen wrote:
Hi Doug!

On Sat, 2007-08-11 at 21:13 +0300, Michael S. Tsirkin wrote:
Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
Subject: Re: ofa_1_2_c_kernel 20070802-0201 daily build status

Hello Doug and Scott!

On Thursday 02 August 2007 18:08, Michael S. Tsirkin wrote:
ehca backports for kernel.org kernels seem to be broken.
1. Does anyone care enough to fix them? If not we'll disable ehca in build for these kernels.
2. Could you upload kernels for RHEL4U5 and SLES10 ppc64?

Don't you guys already have RHEL4U5? It had a backports directory in the OFED 1.2 release...and it's been out for quite a while...

Some parts of this thread might be confusing. And really, it's not about any specific backport issue from ehca or other component(s). It's a general prerequisite for ofed's daily build to have rhel4.5 and sles10 ppc64 in the daily build runs too.
Thanks
Nam

All of the kernel rpms from our U5 kernel have been on my web page in my sig for *ages*. All you need to do is download the needed rpms and install.

I think there's no way to unpack these without a ppc machine, though. Is that right?

-- MST
Re: [ewg] Re: OFED Aug 13 meeting summary
Quoting Stefan Roscher [EMAIL PROTECTED]:
Subject: Re: [ewg] Re: OFED Aug 13 meeting summary

On Tuesday 14 August 2007 14:06, Tziporet Koren wrote:
Michael S. Tsirkin wrote:
1. OFED 1.2.5 (was 1.2.c) is ready for release:
An issue with ehca: There are patches from kernel 2.6.23 that were inserted by mistake and must be removed before the release

There aren't, really. The snapshot generating scripts seem to be broken and seem to put code from the ofed_kernel branch under the 1.2.c name.

Good - so we will release 1.2.5 today

Hi Tziporet,
can we ensure that this patch http://lists.openfabrics.org/pipermail/ewg/2007-August/004299.html is applied? Without this patch we have a broken ehca build on rhel-4.5.
regards
Stefan

I pushed a fix for this out in my tree, so Vlad will take it in the 1.2.c branch.

-- MST
[ewg] Re: ib_local_sa.ko is not created
IIRC this tree includes new local sa bits from Sean which are integrated as part of the sa module.

Quoting Erez Zilber [EMAIL PROTECTED]:
Subject: ib_local_sa.ko is not created

Vlad,
I'm trying to build and run ofa_kernel from git://git.openfabrics.org/~vlad/ofed_kernel.git ofed_kernel

I'm running the following configure cmd:

./configure --with-core-mod --with-ipoib-mod --with-mthca-mod --with-mlx4-mod --with-iser-mod

However, ib_local_sa.ko is not created after I run `make`:

[EMAIL PROTECTED] ofed_kernel]# ll drivers/infiniband/core/*.ko
-rw-r--r-- 1 root root  467132 Aug 23 2007 drivers/infiniband/core/ib_cm.ko
-rw-r--r-- 1 root root 1239100 Aug 23 2007 drivers/infiniband/core/ib_core.ko
-rw-r--r-- 1 root root  761570 Aug 23 2007 drivers/infiniband/core/ib_mad.ko
-rw-r--r-- 1 root root  744899 Aug 23 2007 drivers/infiniband/core/ib_sa.ko
-rw-r--r-- 1 root root  232824 Aug 23 2007 drivers/infiniband/core/iw_cm.ko

Did I miss something?
Thanks,
--
Erez Zilber | 972-9-971-7689
Software Engineer, Storage Solutions Team
Voltaire - The Grid Backbone
www.voltaire.com

-- MST
[ewg] Re: ib_local_sa.ko is not created
It disagrees about the symbol version because my machine still has the original ib_local_sa module that comes with RH4 up4. How can we solve this problem?

Reboot the machine.

-- MST
[ewg] Re: [PATCH] stop OFED before uninstalling it
Quoting Erez Zilber [EMAIL PROTECTED]:
Subject: Re: [PATCH] stop OFED before uninstalling it

Tziporet Koren wrote:
Erez Zilber wrote:
stop OFED before uninstalling it

Signed-off-by: Erez Zilber [EMAIL PROTECTED]
---
 uninstall.sh |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/uninstall.sh b/uninstall.sh
index 177b8a1..89ee3f1 100755
--- a/uninstall.sh
+++ b/uninstall.sh
@@ -110,6 +110,11 @@ uninstall()
 {
 local RC=0
 local OLD_PREFIX=
+
+echo Stopping OFED stack
+echo
+/etc/init.d/openibd stop
+
 echo
 echo Removing ${PACKAGE} Software installations
 echo

What would the installer do if this fails or the machine hangs?
Tziporet

The user will have to stop OFED at some point. If we don't stop OFED while uninstalling, he will stop it later (and then the machine may hang).

The motivation for this patch is: if the user installs OFED over an older version (while the old version is running), he will end up with a new version of OFED installed and an old version loaded. This may lead to strange scenarios. For example: if the user tries to load iSER, modprobe will fail because iSER (from the new OFED version) cannot use the loaded OFED modules (from the old version). Of course, this can happen with any OFED module.

Actually, this fix is related to bug #536. Maybe we should move this discussion to bugzilla.

NAK. This would break e.g. systems which rely on ipoib for connectivity. iSER failing with a clear version conflict message seems like a minor problem. How about just documenting this? How about producing a message telling the user to reboot?

-- MST
[ewg] Re: RFC: OFED-1.3-20070823-1130 - first build
Quoting Yosef Etigin [EMAIL PROTECTED]:
Subject: Re: RFC: OFED-1.3-20070823-1130 - first build

Hi Vlad,
I have some comments regarding install.pl. Overall, I think it's too long for a perl script.

So ... what's your point?

1. The first ~1K lines are a database of the existing packages. It has some unnecessary initializations: selected => 0, installed => 0, rpm_exist => 0, rpm_exist32 => 0

I agree here.

In my opinion, this database could live in an external XML file, which is rather easy to parse with perl.

It's hard to see what inventing yet another format would buy us. Let's keep it simple.

2. How about doing a ? b : c instead of if (a) { b } else { c } ?

Looks like a matter of style.

3. There are some copy-and-paste blocks. For example, in select_packages(), instead of:

if ($package eq "mvapich2_conf_impl") { $mvapich2_conf_impl = $selected; next; } elsif ...

write:

if ($package =~ /^mvapich2_conf_/) { $$package = $selected; next; }

The same goes for the stuff in set_compilers().

4. Instead of print RED ..., RESET "\n"; exit 1, you could do something like error(), since redirecting this to files causes some mess.

5. Instead of iterating over arrays and checking conditions you could use grep, map, and such.

Could. But shouldn't. Simple loops are much easier to understand.

-- MST
Re: [ewg] ib_local_sa.ko is not created
Quoting Vladimir Sokolovsky [EMAIL PROTECTED]:
Subject: Re: [ewg] ib_local_sa.ko is not created

Erez Zilber wrote:
Vlad,
I'm trying to build and run ofa_kernel from git://git.openfabrics.org/~vlad/ofed_kernel.git ofed_kernel

I'm running the following configure cmd:

./configure --with-core-mod --with-ipoib-mod --with-mthca-mod --with-mlx4-mod --with-iser-mod

However, ib_local_sa.ko is not created after I run `make`:

[EMAIL PROTECTED] ofed_kernel]# ll drivers/infiniband/core/*.ko
-rw-r--r-- 1 root root  467132 Aug 23 2007 drivers/infiniband/core/ib_cm.ko
-rw-r--r-- 1 root root 1239100 Aug 23 2007 drivers/infiniband/core/ib_core.ko
-rw-r--r-- 1 root root  761570 Aug 23 2007 drivers/infiniband/core/ib_mad.ko
-rw-r--r-- 1 root root  744899 Aug 23 2007 drivers/infiniband/core/ib_sa.ko
-rw-r--r-- 1 root root  232824 Aug 23 2007 drivers/infiniband/core/iw_cm.ko

Did I miss something?
Thanks,

Hi Erez,
You are right, it is not in ofed_kernel yet.

1.2.5 and 1.3 include the sean_local_sa_*.patch patches, which implement local sa caching. It just isn't put in a separate module the way it was in 1.2.

See the following commit:

commit b054b6c133aa89907ee93e5d105c0d44774e9e6a
Author: Michael S. Tsirkin [EMAIL PROTECTED]
Date: Tue May 29 16:07:56 2007 +0300

Update fixes for kernel 2.6.21-rc3: remove applied patches, update patches dma_map_sg.patch and zap_ipoib_5_cm_drain_by_send_wr.patch

Patch merged_sean_rdma_dev_ofed_1_2.patch is out for now - it mixes multiple topics, most of them merged already, except local sa. Remember to generate and add in local sa patch at some later point.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

Regards,
Vladimir

This commit is way old - the patches have since been updated by Sean and made their way into the 1.2.5 release and the 1.3 tree.

-- MST
[ewg] Re: [ofa-general] OFED 1.2.5 - GA release
Quoting Arlin Davis [EMAIL PROTECTED]:
Subject: Re: [ofa-general] OFED 1.2.5 - GA release

How can I build/install OFED 1.2.5 with ib_local_sa.ko? It seems to build but does not install and I need SA caching options. Can anyone tell me how to get ib_local_sa.ko installed with OFED 1.2.5? We cannot move to OFED 1.2.5 without SA caching options.

ib_local_sa was merged with ib_sa in 1.2.5. There are no extra modules to load.

-- MST
[ewg] Re: [PATCH 0/2] IB/iser: iSCSI iSER fixes for RH4 in OFED 1.3
Quoting Erez Zilber [EMAIL PROTECTED]:
Subject: RE: [PATCH 0/2] IB/iser: iSCSI iSER fixes for RH4 in OFED 1.3

Quoting Erez Zilber [EMAIL PROTECTED]:
Subject: [PATCH 0/2] IB/iser: iSCSI iSER fixes for RH4 in OFED 1.3

The following patches fix bugs in open-iscsi over iSER for the RH4 backport in OFED 1.3.

can you pls stick this in a git tree so I can pull?

No problem. I thought that you can take the patch from the e-mail. Anyway, the git tree is here:
git://git.openfabrics.org/~erezz/linux-2.6.git ofed_kernel

I can, it's just much more work, and with Vlad not here I might not find the time. git pull takes several seconds.

-- MST
[ewg] Re: [PATCH 0/2] IB/iser: iSCSI iSER fixes for RH4 in OFED 1.3
Quoting Erez Zilber [EMAIL PROTECTED]:
Subject: RE: [PATCH 0/2] IB/iser: iSCSI iSER fixes for RH4 in OFED 1.3

Quoting Erez Zilber [EMAIL PROTECTED]:
Subject: [PATCH 0/2] IB/iser: iSCSI iSER fixes for RH4 in OFED 1.3

The following patches fix bugs in open-iscsi over iSER for the RH4 backport in OFED 1.3.

can you pls stick this in a git tree so I can pull?

No problem. I thought that you can take the patch from the e-mail. Anyway, the git tree is here:
git://git.openfabrics.org/~erezz/linux-2.6.git ofed_kernel

What about kernel_patches/backport/2.6.9_U5/iser_cmd_to_2_6_22.patch: given the name, isn't it needed in other kernels up to 2.6.22 too?

-- MST
[ewg] Re: [GIT PULL ofed_1_2_c] cxgb3 bug fixes
Done. I'll push soon.

Quoting Steve Wise [EMAIL PROTECTED]:
Subject: [GIT PULL ofed_1_2_c] cxgb3 bug fixes

Vlad (Michael/Tziporet in Vlad's absence),

Please integrate the following cxgb3 bug fixes into ofed-1.2.5. All of these patches are either in 2.6.23 or merged into Jeff Garzik's upstream branch of netdev-2.6 and will go into 2.6.24.

Chelsio recommends we update ofed-1.2.5 and ofed-1.3 with all of these fixes. I'll send another email with the ofed-1.3 changes as they will be slightly different.

Please pull the ofed_1_2_c changes from:

git://git.openfabrics.org/~swise/ofed_1_2 ofed_1_2_c

The patch files added to kernel_patches/fixes include:

[EMAIL PROTECTED]:~/git/ofed-1.2.5 stg series
+ 0029-cxgb3-engine-microcode-load
+ 0030-cxgb3-MAC-workaround-update
+ 0031-cxgb3-Update-rx-coalescing-length
+ 0032-cxgb3-SGE-doorbell-overflow-warning
+ 0033-cxgb3-use-immediate-data-for-offload-Tx
+ 0034-cxgb3-Expose-HW-memory-page-info
+ 0035-cxgb3-tighten-checks-on-TID-values
+ 0036-cxgb3-Fatal-error-update
+ 0037-cxgb3-log-adapter-serial-number
+ 0038-cxgb3-Update-internal-memory-management
+ 0039-cxgb3-update-firmware-version
+ 0040-cxgb3-log-and-clear-PEX-errors
+ 0041-cxgb3-remove-false-positive-in-xgmac-workaround
+ 0042-cxgb3-Set-the-CQ_ERR-bit-in-CQ-contexts
+ 0043-cxgb3-CQ-context-operations-time-out-too-soon
+ 0044-cxgb3-Add-T3C-rev
+ 0045-cxgb3-Update-engine-microcode-version
> 0046-cxgb3-driver-version

Steve.

-- MST
[ewg] Re: RFC: modify upstream code to make backporting easier
Quoting Roland Dreier [EMAIL PROTECTED]:
Subject: Re: RFC: modify upstream code to make backporting easier

I wonder whether it's acceptable in cases such as this to add a wrapper in upstream code. For example, upstream could have:

#ifndef pci_get_revision
#define pci_get_revision(dev) ((dev)->revision)
#endif

My feeling is that this type of wrapper is just obfuscation that makes the driver harder to read and maintain.

Note that some people only run backported drivers, so making it easier to read and maintain *the backport* is also important.

If there's a way to make backporting easier that also makes the upstream driver better, then I'm in favor of it, but this sounds like a bad example to me.

Do you think applying a patch as we do now is the best way to do it then? Or do you have other ideas on how to make backporting this example better?

-- MST
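To make the trade-off under discussion concrete, here is a small userspace sketch of the proposed wrapper pattern. It is only an illustration: `struct toy_pci_dev` and `toy_driver_is_rev_a0()` are hypothetical names standing in for `struct pci_dev` and real driver code, and only the three-line `#ifndef` block is taken from the thread.

```c
#include <assert.h>

/* Hypothetical stand-in for struct pci_dev on a kernel that caches the revision. */
struct toy_pci_dev {
	unsigned char revision;
};

/*
 * Fallback in the style of the quoted proposal: it takes effect only
 * if no backport header has already defined pci_get_revision().
 */
#ifndef pci_get_revision
#define pci_get_revision(dev) ((dev)->revision)
#endif

/* Driver-side code goes through the wrapper and never touches the field. */
int toy_driver_is_rev_a0(const struct toy_pci_dev *dev)
{
	return pci_get_revision(dev) == 0xa0;
}
```

Under this scheme, a backport header for a kernel without the `revision` field would presumably define `pci_get_revision()` itself before the driver is compiled, so the fallback is skipped and the driver source needs no patch. The cost is the one raised in the thread: the upstream driver carries an indirection that upstream itself never needs.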
[ewg] Re: building userspace on ppc64 is broken
Quoting Yosef Etigin [EMAIL PROTECTED]:
Subject: building userspace on ppc64 is broken

While building user-space binaries on ppc64, the libs are placed in /usr/lib64, but they are built as 32 bit. This happens because in ofed 1.2 CFLAGS=-m64 was passed through the environment from the install script, which is no longer the case.

What do you think about doing something like this in the spec files to solve the problem?

--
diff --git a/libibverbs.spec.in b/libibverbs.spec.in
index 459e6f2..8fcdd72 100644
--- a/libibverbs.spec.in
+++ b/libibverbs.spec.in
@@ -47,6 +47,9 @@ displays information about InfiniBand de
 %setup -q -n [EMAIL PROTECTED]@

 %build
+%ifarch ppc64
+%{expand: %%define optflags %{optflags} -m64}
+%endif
 %configure
 make %{?_smp_mflags}

Hmm. Roland?

-- MST
[ewg] Re: ofed-1.3 daily build package's content
Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
Subject: ofed-1.3 daily build package's content

Hello Vlad and Michael!
Just downloaded daily build package OFED-1.3-20070917-0600 and saw in SRPMS:

localhost:/home/nguyen/tmp/OFED-1.3-20070917-0600/SRPMS # ls -l ofa_kernel-1.3-ofed2007091*
-rw-r--r-- 1 1011 1011 1967453 2007-09-10 15:27 ofa_kernel-1.3-ofed20070910.src.rpm
-rw-r--r-- 1 1011 1011 1960701 2007-09-11 15:02 ofa_kernel-1.3-ofed20070911.src.rpm
-rw-r--r-- 1 1011 1011 1966672 2007-09-12 15:02 ofa_kernel-1.3-ofed20070912.src.rpm
-rw-r--r-- 1 1011 1011 1957624 2007-09-13 15:02 ofa_kernel-1.3-ofed20070913.src.rpm
-rw-r--r-- 1 1011 1011 1963469 2007-09-14 15:02 ofa_kernel-1.3-ofed20070914.src.rpm
-rw-r--r-- 1 1011 1011 1965865 2007-09-15 15:02 ofa_kernel-1.3-ofed20070915.src.rpm
-rw-r--r-- 1 1011 1011 1963044 2007-09-16 15:01 ofa_kernel-1.3-ofed20070916.src.rpm
-rw-r--r-- 1 1011 1011 1959261 2007-09-17 15:01 ofa_kernel-1.3-ofed20070917.src.rpm

I see this too:

tar tvzf OFED-1.3-20070917-0600.tgz | grep kernel
-rw-r--r-- vlad/vlad 1967453 2007-09-10 16:27:48 OFED-1.3-20070917-0600/SRPMS/ofa_kernel-1.3-ofed20070910.src.rpm
-rw-r--r-- vlad/vlad 1960701 2007-09-11 16:02:55 OFED-1.3-20070917-0600/SRPMS/ofa_kernel-1.3-ofed20070911.src.rpm
-rw-r--r-- vlad/vlad 1966672 2007-09-12 16:02:32 OFED-1.3-20070917-0600/SRPMS/ofa_kernel-1.3-ofed20070912.src.rpm
-rw-r--r-- vlad/vlad 1957624 2007-09-13 16:02:46 OFED-1.3-20070917-0600/SRPMS/ofa_kernel-1.3-ofed20070913.src.rpm
-rw-r--r-- vlad/vlad 1963469 2007-09-14 16:02:30 OFED-1.3-20070917-0600/SRPMS/ofa_kernel-1.3-ofed20070914.src.rpm
-rw-r--r-- vlad/vlad 1965865 2007-09-15 16:02:32 OFED-1.3-20070917-0600/SRPMS/ofa_kernel-1.3-ofed20070915.src.rpm
-rw-r--r-- vlad/vlad 1963044 2007-09-16 15:01:56 OFED-1.3-20070917-0600/SRPMS/ofa_kernel-1.3-ofed20070916.src.rpm
-rw-r--r-- vlad/vlad 1959261 2007-09-17 15:01:58 OFED-1.3-20070917-0600/SRPMS/ofa_kernel-1.3-ofed20070917.src.rpm

Is there a reason to include earlier versions of ofa_kernel-1.3? Are they needed by the build script?

I don't think so.

-- MST
[ewg] ANNOUNCE orenk taking over mstflint/imgen
Oren Kladnitsky [EMAIL PROTECTED] is taking over maintenance of the mstflint and imgen tools from me.

His trees:
git://git.openfabrics.org/~orenk/mstflint.git
git://git.openfabrics.org/~orenk/imgen.git
are, starting now, the authoritative source for these tools.

Oren is the internal maintainer of Mellanox FW tools (MFT) and now he is assuming ownership of the OFED tools too.

Thanks,
-- MST
[ewg] Re: [PATCH] installer: fix build environment for ppc64
Will it break the build of 32 bit libraries on ppc64?

Quoting Yosef Etigin [EMAIL PROTECTED]:
Subject: [PATCH] installer: fix build environment for ppc64

On ppc64, binaries are compiled as 32 bit by default unless the -m64 flag is specified. When libs are built for ppc64 they are placed in /usr/lib64, despite the fact that they are actually 32-bit. This patch forces 64 bit compilation on ppc64.

Signed-off-by: Yosef Etigin [EMAIL PROTECTED]
--
diff --git a/install.pl b/install.pl
index 7965cf4..5ce2345 100755
--- a/install.pl
+++ b/install.pl
@@ -169,6 +169,8 @@ my $mandir = `rpm --eval '%{_mandir
 chomp $mandir;
 my $sysconfdir = `rpm --eval '%{_sysconfdir}'`;
 chomp $sysconfdir;
+chomp (my $optflags = `rpm --eval '%{optflags}'`);
+
 my %main_packages = ();
 my @selected_packages = ();
 my @selected_by_user = ();
@@ -2270,7 +2272,7 @@ # Build RPM from source RPM
 sub build_rpm
 {
 my $name = shift @_;
-my $cmd;
+my $cmd = "";
 my $res = 0;
 my $sig = 0;
 my $TMPRPMS;
@@ -2279,7 +2281,10 @@ sub build_rpm
 print "Build $name RPM\n" if ($verbose);

 if (not $packages_info{$name}{'rpm_exist'}) {
-$cmd = "rpmbuild --rebuild --define '_topdir $TOPDIR'";
+if ($arch eq "ppc64") {
+$cmd = "CFLAGS='$optflags -m64' CXXFLAGS='$optflags -m64' FFLAGS='$optflags -m64' ";
+}
+$cmd .= "rpmbuild --rebuild --define '_topdir $TOPDIR'";
 $cmd .= " --target $target_cpu";

 if ( $parent eq "mvapich") {

--
Yossi

-- MST
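As a side note on the 32-bit versus 64-bit question, a tiny C helper can report which ABI a translation unit was actually compiled for, which is one way to sanity-check what the flags above produce. This is a sketch only; `pointer_width_bits()` and `built_as_64bit()` are hypothetical names and nothing like this is part of the patch.

```c
#include <assert.h>
#include <limits.h>

/* Width of a data pointer, in bits, for the ABI this file was compiled for. */
int pointer_width_bits(void)
{
	return (int)(sizeof(void *) * CHAR_BIT);
}

/* Returns 1 when built with 64-bit pointers (e.g. with -m64), else 0. */
int built_as_64bit(void)
{
	return pointer_width_bits() == 64;
}
```

Compiling the same file once with `-m32` and once with `-m64` and calling `built_as_64bit()` from a small test program should give 0 and 1 respectively, assuming the toolchain supports both targets.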
Re: [ewg] Re: [PATCH 0/2] IB/iser: iSCSI iSER fixes for RH4 in OFED 1.3
Quoting Erez Zilber [EMAIL PROTECTED]:
Subject: Re: [ewg] Re: [PATCH 0/2] IB/iser: iSCSI iSER fixes for RH4 in OFED 1.3

Erez Zilber wrote:
What about kernel_patches/backport/2.6.9_U5/iser_cmd_to_2_6_22.patch: given the name, isn't it needed in other kernels up to 2.6.22 too?

You're right. I've just fixed that in the git tree. I hope it's ok now.
Erez

Michael?

Looks ok.

-- MST
[ewg] Re: [GIT PULL] ofed-1.2.5 / ofed-1.3 - new libcxgb3 release v1.0.2
Quoting Steve Wise [EMAIL PROTECTED]:
Subject: [GIT PULL] ofed-1.2.5 / ofed-1.3 - new libcxgb3 release v1.0.2

Please pull the latest from my libcxgb3 git repos to update the ofed-1.2.5 and ofed-1.3 libcxgb3 release. This will update to version 1.0.2 of libcxgb3 which fixes a doorbell issue on big-endian platforms.

git://git.openfabrics.org/~swise/libcxgb3 ofed_1_2_5

This looks wrong. 1.2.X releases are done from the ofed_1_2 branch. 1.2.5 is just a tag. What do you want me to do?

and

git://git.openfabrics.org/~swise/libcxgb3 ofed_1_3

OK for that one.

-- MST
[ewg] Re: [GIT PULL] ofed-1.2.5 / ofed-1.3 - new libcxgb3 release v1.0.2
Quoting Michael S. Tsirkin [EMAIL PROTECTED]:
Subject: Re: [GIT PULL] ofed-1.2.5 / ofed-1.3 - new libcxgb3 release v1.0.2

Quoting Steve Wise [EMAIL PROTECTED]:
Subject: [GIT PULL] ofed-1.2.5 / ofed-1.3 - new libcxgb3 release v1.0.2

Please pull the latest from my libcxgb3 git repos to update the ofed-1.2.5 and ofed-1.3 libcxgb3 release. This will update to version 1.0.2 of libcxgb3 which fixes a doorbell issue on big-endian platforms.

git://git.openfabrics.org/~swise/libcxgb3 ofed_1_2_5

This looks wrong. 1.2.X releases are done from the ofed_1_2 branch. 1.2.5 is just a tag. What do you want me to do?

I figured it out. Done.

-- MST
[ewg] Re: [GIT PULL] ofed-1.2.5 / ofed-1.3 - new libcxgb3 release v1.0.2
Quoting Steve Wise [EMAIL PROTECTED]:
Subject: Re: [GIT PULL] ofed-1.2.5 / ofed-1.3 - new libcxgb3 release v1.0.2

Michael S. Tsirkin wrote:
Quoting Steve Wise [EMAIL PROTECTED]:
Subject: [GIT PULL] ofed-1.2.5 / ofed-1.3 - new libcxgb3 release v1.0.2

Please pull the latest from my libcxgb3 git repos to update the ofed-1.2.5 and ofed-1.3 libcxgb3 release. This will update to version 1.0.2 of libcxgb3 which fixes a doorbell issue on big-endian platforms.

git://git.openfabrics.org/~swise/libcxgb3 ofed_1_2_5

Go look at http://www.openfabrics.org/git/?p=ofed_1_2_5/libcxgb3.git;a=summary

It has an ofed_1_2_5 branch. I believe Vlad set up the build scripts to handle this. Yes?

This looks wrong. 1.2.X releases are done from the ofed_1_2 branch. 1.2.5 is just a tag. What do you want me to do?

and

git://git.openfabrics.org/~swise/libcxgb3 ofed_1_3

OK for that one.

It's OK, done for both.

-- MST
[ewg] Re: Please pull libehca.git/libehca ofed_1_3 branch
Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
Subject: Please pull libehca.git/libehca ofed_1_3 branch

Hi Michael and Vlad!
Please pull from git://git.openfabrics.org/~hnguyen/libehca.git branch ofed_1_3 to get the fixes below.

Done.

-- MST
[ewg] Re: [for OFED 1.3 PATCH 2/2] IB/ipoib: enable IGMP for userpsace multicast IB apps
Quoting Or Gerlitz [EMAIL PROTECTED]:
Subject: [for OFED 1.3 PATCH 2/2] IB/ipoib: enable IGMP for userpsace multicast IB apps

Michael,
This patch needs to go into all the directories under kernel_patches/backport that contain ipoib_class_device_to_2_6_20.patch. I suggest it be named ipoib_class_device_to_2_6_20_umcast.patch

Or, please create a public git tree that I or Vlad can pull from. Please remember to run a cross build before requesting a pull.

-- MST
[ewg] off list for a while, email address change
Please note that my email address is changing. You can contact me at my new address m dot s dot tsirkin at gmail dot com (address mangled to confuse spambots, replace dot with . and at with @ to get the actual mail address)

Near term, I might not have time for openfabrics related issues, and might not monitor openfabrics lists. Please copy me directly if my attention is required.

Here is a list of people at Mellanox you might want to contact:

Oren Kladnitsky [EMAIL PROTECTED] - for firmware, imgen and mstflint
Eli Cohen [EMAIL PROTECTED] - for IPoIB, mlx4 and mthca
Jim Mott [EMAIL PROTECTED] - for SDP
Jack Morgenstein [EMAIL PROTECTED] - for core, mlx4, mthca, libmlx4, libmthca
Vlad Sokolovsky [EMAIL PROTECTED] - for OFED kernel, backports and build
Tziporet Koren [EMAIL PROTECTED] - for OFED release, perftest
Sagi Rotem [EMAIL PROTECTED] - for perftest

Take care,
-- MST
Re: [ewg] Mellanox target workaround in SRP
On Mon, Jan 10, 2011 at 10:51:13AM -0800, Roland Dreier wrote:
Maybe we can use MST's current email to ask him... Michael, do you have any memory of the issue we worked around here?

I have a question regarding the workaround introduced in commit 559ce8f1 of the mainline tree:

IB/srp: Work around data corruption bug on Mellanox targets

Data corruption has been seen with Mellanox SRP targets when FMRs create a memory region with I/O virtual address != 0. Add a workaround that disables FMR merging for Mellanox targets (OUI 0002c9).

I don't see how this can make a difference to the target -- it sees an address and length, and there should be no visible difference to it when it gets an FMR versus a direct-mapped region of the same space, right? And how is it different from getting a direct or indirect descriptor with a similar offset?

I could see there being a bug on the initiator HCA not liking such FMR mappings, but then it should be keyed off of the vendor of our HCA and not the target.

I'm sure this was tested and shown to fix the problem; I'm just confused as to what the problem really was and if this is still relevant. Can someone please enlighten me?

I don't recall, unfortunately. Sorry.

-- MST