from:"Michael S. Tsirkin"

[ewg] [PATCH for-2.6.22] ipoib/cm: initialize RX before moving QP to RTR

2007-06-18 Thread Michael S. Tsirkin

Fix a crasher bug in IPoIB CM: once QP is in RTR, an RX completion (and even an
asynchronous error) might be observed on this QP, so we have to initialize all
RX fields beforehand.
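
The ordering constraint can be shown with a small userspace toy. This is only a sketch, not the driver code: struct rx_ctx, rx_init(), rx_arm(), rx_on_error() and the RX_* states are hypothetical stand-ins for the IPoIB CM RX context, for the transition to RTR, and for an asynchronous event handler.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's RX context; not kernel code. */
enum rx_state { RX_UNINIT = 0, RX_LIVE, RX_ERROR };

struct rx_ctx {
	enum rx_state state;
	unsigned long stamp;	/* plays the role of p->jiffies */
	struct rx_ctx *next;	/* plays the role of the list linkage */
	struct rx_ctx *prev;
	bool armed;		/* true once events may be delivered (QP in RTR) */
};

/* Step 1: fill in every field an event handler might read. */
static void rx_init(struct rx_ctx *p, unsigned long now)
{
	p->state = RX_LIVE;
	p->stamp = now;
	p->next = p;		/* empty list head, like INIT_LIST_HEAD() */
	p->prev = p;
	p->armed = false;
}

/* Step 2: only now allow events, which is what moving to RTR does. */
static void rx_arm(struct rx_ctx *p)
{
	p->armed = true;
}

/*
 * An asynchronous error handler.  Returns false if it would have had to
 * touch an uninitialized context, which models the crash being fixed.
 */
static bool rx_on_error(struct rx_ctx *p)
{
	if (!p->armed)
		return true;	/* no events can arrive yet */
	if (p->state == RX_UNINIT || p->next == NULL)
		return false;	/* handler ran before init: the bug */
	p->state = RX_ERROR;
	return true;
}
```

Calling rx_init() and then rx_arm() keeps the handler safe; arming a zeroed context first makes rx_on_error() report the failure.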

This fixes bug https://bugs.openfabrics.org/show_bug.cgi?id=662

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

 Quoting Woodruff, Robert J [EMAIL PROTECTED]:
 Subject: RE: [ofa-general] crash in ipoib
 
 Sean wrote,
  And here's a version with error handling fixed.
  Sean, does this solve your crash?
 
 We've been running this patch since yesterday and haven't seen any 
 crashes.  We'll continue testing this over the week-end.
 
 - Sean
 
 This looks like it fixed the panic. 
 
 Should we try to put out a new RC with this latest ipoib fix ?
 I really think we need it in the release. If we could get another RC out
 today, that would only delay the release by a couple of more days and we
 could release on next Friday rather than wed. and still give people a week
 to test the final RC.
 
 woody

OK, the following patch has been added to OFED 1.2.
Roland, please consider this bugfix for 2.6.22.

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 076a0bb..c64249f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -309,6 +309,11 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 		return -ENOMEM;
 	p->dev = dev;
 	p->id = cm_id;
+	cm_id->context = p;
+	p->state = IPOIB_CM_RX_LIVE;
+	p->jiffies = jiffies;
+	INIT_LIST_HEAD(&p->list);
+
 	p->qp = ipoib_cm_create_rx_qp(dev, p);
 	if (IS_ERR(p->qp)) {
 		ret = PTR_ERR(p->qp);
@@ -320,24 +325,24 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	if (ret)
 		goto err_modify;
 
+	spin_lock_irq(&priv->lock);
+	queue_delayed_work(ipoib_workqueue,
+			   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
+	/* Add this entry to passive ids list head, but do not re-add it
+	 * if IB_EVENT_QP_LAST_WQE_REACHED has moved it to flush list. */
+	p->jiffies = jiffies;
+	if (p->state == IPOIB_CM_RX_LIVE)
+		list_move(&p->list, &priv->cm.passive_ids);
+	spin_unlock_irq(&priv->lock);
+
 	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
 	if (ret) {
 		ipoib_warn(priv, "failed to send REP: %d\n", ret);
-		goto err_rep;
+		if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE))
+			ipoib_warn(priv, "unable to move qp to error state\n");
 	}
-
-	cm_id->context = p;
-	p->jiffies = jiffies;
-	p->state = IPOIB_CM_RX_LIVE;
-	spin_lock_irq(&priv->lock);
-	if (list_empty(&priv->cm.passive_ids))
-		queue_delayed_work(ipoib_workqueue,
-				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
-	list_add(&p->list, &priv->cm.passive_ids);
-	spin_unlock_irq(&priv->lock);
 	return 0;
 
-err_rep:
 err_modify:
 	ib_destroy_qp(p->qp);
 err_qp:

-- 
MST
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg

[ewg] Re: Issue with IPoIB-CM being enabled at boot

2007-07-03 Thread Michael S. Tsirkin

 Quoting Jeremy Brown [EMAIL PROTECTED]:
 Subject: Re: Issue with IPoIB-CM being enabled at boot
 
 I apologize for replying to myself, but I just set up two em64t systems
 with Mellanox HCAs, Fedora 4, and a fresh build and installation of OFED
 1.2, and the IPoIB interfaces came up in datagram mode, despite the fact
 that IPoIB is enabled and configured to come up in connected mode.

Does it help if you do
# /etc/init.d/openibd restart
?

-- 
MST

[ewg] Re: RFC OFED-1.3 installation

2007-07-17 Thread Michael S. Tsirkin

 Quoting Doug Ledford [EMAIL PROTECTED]:
 Subject: Re: RFC OFED-1.3 installation

 On Tue, 2007-07-17 at 19:27 +0300, Michael S. Tsirkin wrote:
   Quoting Doug Ledford [EMAIL PROTECTED]:
   Subject: Re: RFC OFED-1.3 installation

   On Tue, 2007-07-17 at 18:25 +0300, Michael S. Tsirkin wrote:
 Let me give an example.  In OFED 1.0, you shipped dapl version 1.2.  In
 OFED 1.1, you also shipped dapl version 1.2.  However, code inspection
 shows that between OFED 1.0 and OFED 1.1, dapl did in fact change (not a
 lot, but anything is enough).  So, between OFED 1.0 and OFED 1.1, you
 have two different versions of dapl, but with exactly the same version
 number.  A person can't tell them apart.

Yes, this sure looks like a problem. I think that versioning needs to be
addressed at the package level, not at OFED level though. Right?

   Versioning needs to be addressed at both levels.  You need versions of
   software to start with, but then you still need releases of packages to
   differentiate between different builds of a specific version of
   software.

  Why would we want to have different builds of a specific version of software
  for a specific OS?  Could you give an example pls?

 It's how you integrate needed patches immediately while waiting on the
 next release of the software.

OK.

 ...
 You also bump the release number of the package any time you make
 changes to the spec file and rebuild.

Since we have spec files as part of package, this will be really
the same as the previous case, right?

-- 
MST

[ewg] Re: RFC OFED-1.3 installation

2007-07-17 Thread Michael S. Tsirkin

 Quoting Doug Ledford [EMAIL PROTECTED]:
 Subject: Re: RFC OFED-1.3 installation

 On Tue, 2007-07-17 at 19:45 +0300, Michael S. Tsirkin wrote:
   Quoting Doug Ledford [EMAIL PROTECTED]:
   Subject: Re: RFC OFED-1.3 installation

   On Tue, 2007-07-17 at 19:27 +0300, Michael S. Tsirkin wrote:
 Quoting Doug Ledford [EMAIL PROTECTED]:
 Subject: Re: RFC OFED-1.3 installation

 On Tue, 2007-07-17 at 18:25 +0300, Michael S. Tsirkin wrote:
   Let me give an example.  In OFED 1.0, you shipped dapl version 1.2.  In
   OFED 1.1, you also shipped dapl version 1.2.  However, code inspection
   shows that between OFED 1.0 and OFED 1.1, dapl did in fact change (not a
   lot, but anything is enough).  So, between OFED 1.0 and OFED 1.1, you
   have two different versions of dapl, but with exactly the same version
   number.  A person can't tell them apart.

  Yes, this sure looks like a problem. I think that versioning needs to be
  addressed at the package level, not at OFED level though. Right?

 Versioning needs to be addressed at both levels.  You need versions of
 software to start with, but then you still need releases of packages to
 differentiate between different builds of a specific version of
 software.

Why would we want to have different builds of a specific version of software
for a specific OS?  Could you give an example pls?

   It's how you integrate needed patches immediately while waiting on the
   next release of the software.

  OK.

   ...
   You also bump the release number of the package any time you make
   changes to the spec file and rebuild.

  Since we have spec files as part of package, this will be really
  the same as the previous case, right?

 Depends.  Right now the spec file gets its version out of the configure
 stuff.  That version only updates when you update the version of the
 software itself.  It doesn't increment on each change to the source
 repo, only on the major updates when you would release a new tarball
 anyway.  Package versioning is, by necessity, finer grained than source
 repo versioning.  You don't release a new dapl tarball just because you
 updated some comments to remove a typo.  But you *do* update rpm
 versions on every single change, at least if you are going to distribute
 the rpm.

 Look, rpms are just like versioned tarballs.  Once they go out in the
 wild, that particular name-version-release combination is FROZEN.
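
To make the name-version-release point concrete, here is a minimal sketch in C. It is not how rpm compares packages; struct pkg_id and pkg_same_label() are hypothetical names used only to show that two builds stay indistinguishable unless the release field changes.

```c
#include <string.h>

/* Hypothetical package identity: name-version-release. */
struct pkg_id {
	const char *name;
	const char *version;	/* upstream software version, e.g. "1.2" */
	int release;		/* packaging release, bumped on every rebuild */
};

/* Returns nonzero when two packages cannot be told apart by their labels. */
static int pkg_same_label(const struct pkg_id *a, const struct pkg_id *b)
{
	return strcmp(a->name, b->name) == 0 &&
	       strcmp(a->version, b->version) == 0 &&
	       a->release == b->release;
}
```

With this model, two dapl 1.2 builds that both carry release 1 compare as the same label even if their sources differ, while bumping the release of the second build to 2 makes them distinct.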

It really looks like this is a work around for when you want to apply
a patch without going through maintainer.

The way OFED release process works, we really don't
do releases all that often, and when we do, we can coordinate with
the maintainer.

-- 
MST

[ewg] Re: RFC OFED-1.3 installation

2007-07-17 Thread Michael S. Tsirkin

 There are lots of things that we as a distributor have to care about
 that upstream generally does not.  The spec file and patches are how we
 solve our customer's problems.  They are what make a stable
 distribution, as opposed to a bleeding edge, must always update to
 latest upstream version to fix any problem system, a reality.  It's the
 difference between RHEL and Fedora.

I think I am getting it - you want to release a patched version of some OFED
library without going through openfabrics? OK.
So I imagine that's when you would increment the rpm-specific version number.
But I can't see why an OFED release would want to play with these.

-- 
MST

[ewg] Re: RFC OFED-1.3 installation

2007-07-17 Thread Michael S. Tsirkin

 So you need to be able to
 tell the difference between a customer running libibverbs-1.0.4 from
 OFED-1.3-beta1 and libibverbs-1.0.4 from OFED-1.3 final.

I don't really think we want customers to run beta code, or intend to support
such configurations.


-- 
MST

[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-23 Thread Michael S. Tsirkin

Quoting Arthur Jones [EMAIL PROTECTED]:
Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

hi michael, ...

On Tue, Jun 12, 2007 at 11:41:08AM +0300, Michael S. Tsirkin wrote:
 For whom it may concern,
 I have created an ofed git tree updated with kernel bits from 2.6.22-rc4,
 and put that up at git://git.openfabrics.org/~mst/ofed_kernel.git
 [...] 
 In particular, there were a ton of ipath patches that it seems were
 for the most part applied.
 Qlogic maintainers, please help double check that I did not miss something
 of value.

thanks for setting this up, i'm still looking
at the diffs to make sure things got setup
correctly for the ipath stuff...

i have found it difficult to navigate the
source having to run:

./ofed_scripts/configure --kernel-version=2.6.xxx --without-quilt

every time to check against our tree.  so, rather
than spending the better part of the afternoon
running these scripts by hand, i created a shell
script to populate a bunch of branches with the
backports in each branch.

at qlogic we now keep the backports as branches in
our git tree and this, i find, is much easier to
handle.  because:

* viewing and navigating backport source becomes
  _much_ easier.
* merges are easier -- patches are much more fragile
  than branches.
* comparisons are easier -- checking for differences
  between backports and between a backport and the
  canonical source is faster and more convenient...
* changesets are readable.  trying to decipher diffs
  to patches is medically proven to take months, if not
  years, off your life.

Sigh. I wish it were possible to do everything through
addons tricks.

I see the advantages of the bush of branches -
for example it's possible
to add a backport patch to a recent kernel, and then
merge this into other kernel branches.

But I also see a serious problem with addressing: basically
git tracks content. It's not designed to track a bush
of branches taken together.  For example, take tagging:
tag namespace is global, so you can not have the same
tag point at multiple branches at the same time.

anyway, what do you think?  is there any way i could
convince you to dump the backport patches and put
all the backports in branches?  i'm willing to do the
legwork if you see value...

Can you publish the scripts and/or the tree?
I think we can start by just running the scripts nightly,
making it possible for people to view backport history
with gitview.

-- 
MST

[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Michael S. Tsirkin

 Quoting Arthur Jones [EMAIL PROTECTED]:
 Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
 
 hi michael, ...
 
 On Tue, Jul 24, 2007 at 06:03:41AM +0300, Michael S. Tsirkin wrote:
  [...]
  But I also see a serious problem with addressing: basically
  git tracks content. It's not designed to track a bush
  of branches taken together.  For example, take tagging:
  tag namespace is global, so you can not have the same
  tag point at multiple branches at the same time.
 
 agreed.  however, the way we use git, with the
 location of the git DB as the tag, it's not
 really a problem in practice.

who uses git this way?

 but tagging each
 branch separately is indeed a PITA...

This is just one problem.
For example, git pull can only merge one branch at a time.

  anyway, what do you think?  is there any way i could
  convince you to dump the backport patches and put
  all the backports in branches?  i'm willing to do the
  legwork if you see value...
  
  can you publish the scripts and/or the tree?
  i think we can start by just running the scripts nightly,
  making it possible for people to view backport history
  with gitview.
 
 i've attached the script that i'm using to compare
 the trees, but it's a total hack.  it doesn't keep
 the patch history.  that would not be too hard to
 do i guess -- if there's interest...
 
 to run the script:
 
 cp attached files here...
 $ git clone git://git.openfabrics.org/~mst/ofed_kernel.git ofed_kernel
 $ cd ofed_kernel
 $ for b in `cat ../ofed-backports.txt`; do ../create-backport.sh $b; done
 
 now you'll have a bunch of backport-2.6.xxx branches...

So, would you like to have this script run nightly on ofed trees?

-- 
MST

[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Michael S. Tsirkin

 Quoting Arthur Jones [EMAIL PROTECTED]:
 Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
 
 hi michael, ...
 
 On Tue, Jul 24, 2007 at 06:32:28PM +0300, Michael S. Tsirkin wrote:
  [...]
For example, git pull can only merge one branch at a time.
   
   how is this a problem?  the way i use git,
   i use a script to reflow the changes into
   the dependent branches.  over the last few
   months, anyway, it has worked fine...
  
  Precisely because no one developed on these branches,
  so you are re-generating them from patches - not a problem,
  but as you point out not too useful either.
 
 well, no, i _have_ been doing development on the
 local branches in our internal repo.  i also
 merge in changes that you make to the ofed repo
 to our internal backport branches.  the script
 i posted is just so that i can more easily compare
 our internal branches to the ofed backport branches.

How do you do the merging?

  If people start developing on these branches, then
  eventually you will need to merge them - and git only merges
  them one at a time.
 
 yes, i have to merge them one at a time.  i
 still don't see how this is a problem.  backport
 changes can be pulled in and the changes from
 upstream can be merged in as well.  i haven't
 had a problem with this so far.  can you be more
 specific about what you expect will fail?

Well, as distro maintainers we need to merge a lot, from different
people. We'll have to write all kinds of scripts to do it instead of
a plain git pull.

And, I expect almost all git operations will have to be wrapped
in a script in some way, to operate on a bush of branches.

-- 
MST

[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Michael S. Tsirkin

 Quoting Arthur Jones [EMAIL PROTECTED]:
 Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

 hi michael, ...

 On Tue, Jul 24, 2007 at 06:09:09PM +0300, Michael S. Tsirkin wrote:
   Quoting Arthur Jones [EMAIL PROTECTED]:
   Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

   hi michael, ...

   On Tue, Jul 24, 2007 at 06:03:41AM +0300, Michael S. Tsirkin wrote:
[...]
But I also see a serious problem with addressing: basically
git tracks content. It's not designed to track a bush
of branches taken together.  For example, take tagging:
tag namespace is global, so you can not have the same
tag point at multiple branches at the same time.

   agreed.  however, the way we use git, with the
   location of the git DB as the tag, it's not
   really a problem in practice.

  who uses git this way?

 i do.

   but tagging each
   branch separately is indeed a PITA...

  This is just one problem.
  For example, git pull can only merge one branch at a time.

 how is this a problem?  the way i use git,
 i use a script to reflow the changes into
 the dependent branches.  over the last few
 months, anyway, it has worked fine...

Precisely because no one developed on these branches,
so you are re-generating them from patches - not a problem,
but as you point out not too useful either.

If people start developing on these branches, then
eventually you will need to merge them - and git only merges
them one at a time.

-- 
MST

[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Michael S. Tsirkin

  Because you only have your driver to maintain.
 
 no, i have to maintain quite a few of the
 ofed backport branches as well for our release.
 if i started getting pull requests from people
 with changes to 15 backport branches in one go,
 i'd probably want to script it...

Yea. Happens all the time here: when component maintainer
makes a change, it will typically affect all backports or none.

 i have found that drawing a DAG with graphviz has
 been a big help in making sure that i organize the
 branches correctly...

Ugh .. *that* sounds complicated.
Looks like it's much simpler with current setup.

-- 
MST

[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Michael S. Tsirkin

 i'd _really_ like to see a list of the advantages of
 patches over branches.  it's hard for me to know if
 i'm just missing something if the case is not laid out...

Here's a short list off the top of my head

- A single git pull merges any number of backport changes
- A single git reset ORIG_HEAD recovers from a conflicting merge
- A single tag tags all code for all kernels
- On update from upstream, if there is a conflict
  between upstream code and a patch
  it's easy to temporarily remove the patch, complete the merge,
  and go bugger the patch author
- For recent kernels there are almost no patches.
  So an update from upstream for these kernels is free,
  with branches I will still need to update all branches.
- Adding a fix which only affects common code
  is currently straight-forward: make a change, commit.
  With multiple branches every fix must be pulled into
  all branches.

-- 
MST

[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-24 Thread Michael S. Tsirkin

 Quoting Sean Hefty [EMAIL PROTECTED]:
 Subject: Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
 
 Here's a short list off the top of my head
 
 - A single git pull merges any number of backport changes
 - A single git reset ORIG_HEAD recovers from a conflicting merge
 - A single tag tags all code for all kernels
 - On update from upstream, if there is a conflict
   between upstream code and a patch
   it's easy to temporarily remove the patch, complete the merge,
   and go bugger the patch author
 - For recent kernels there are almost no patches.
   So an update from upstream for these kernels is free,
   with branches I will still need to update all branches.
 - Adding a fix which only affects common code
   is currently straight-forward: make a change, commit.
   With multiple branches every fix must be pulled into
   all branches.
 
 You seem to be overlooking the fact that you already require a script to 
 check that things work for all kernels.  Until you apply a series of 
 patches to form a particular kernel, you don't know if a change that you 
 pulled in caused a conflict.  You still have the requirement to verify 
 the fix on all kernels, and it still requires running a script that 
 pushes/pops patches to create each tree.

Yes. But I find it preferable to manage history with
full power of native git tools, where a single hash identifies a revision,
and limit the scope of the scripts to the build process.

This, as opposed to an elaborate methodology that is based
on naming conventions, and requires use of scripts to do
basic tasks such as tagging, history rewriting, etc.

-- 
MST

[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-25 Thread Michael S. Tsirkin

  - A single git reset ORIG_HEAD recovers from a conflicting merge
 
 handling conflicts is a big part of a maintainer's job!

Because you are a driver maintainer.
That's what's different here from regular merge.
Please understand: we have upstream code and we have changes against it.

Upstream code is golden. If some patch conflicts with it,
it is always this patch that needs to be fixed.
And I want the ability to bounce that job to the patch author -
I simply do not know enough about e.g. ehca.

 also, if the upstream
 changes touch code that conflicts with a backport
 patch, you get to fix the problem as it happens

That's exactly the thing that I do not want to do.

-- 
MST

[ewg] Re: [PATCH ofed-1.2-rc3 2/4] ehca: backport for rhel-4.5 - mmap functonality

2007-07-25 Thread Michael S. Tsirkin

 Quoting Stefan Roscher [EMAIL PROTECTED]:
 Subject: [PATCH ofed-1.2-rc3 2/4] ehca: backport for rhel-4.5 - mmap functonality
 
 
 
 Signed-off-by: Stefan Roscher [EMAIL PROTECTED]
 ---
 backport_ehca_2_rhel45_umap.patch |  850 ++
 1 files changed, 850 insertions(+)

Guys,
I have updated the ofed_kernel (destined for OFED 1.3)
kernel tree to 2.6.23-rc1, and this patch no longer applies.

The conflicts aren't trivial (e.g. there's been ABI change).

I moved it to kernel_patches/attic for now.

Could you please take a look and update the patch for that tree?

The updated code is here:

git://git.openfabrics.org/~mst/ofed_kernel.git ofed_kernel

I expect Vlad'll pull it soon, too.

-- 
MST

[ewg] add_open_iscsi_h.patch

2007-07-25 Thread Michael S. Tsirkin

Erez, add_open_iscsi_h currently does:

-#include scsi/iscsi_if.h
+#include iscsi_if.h

why is this bit needed?


-- 
MST

[ewg] Re: add_open_iscsi_h.patch

2007-07-25 Thread Michael S. Tsirkin

 Quoting Erez Zilber [EMAIL PROTECTED]:
 Subject: Re: add_open_iscsi_h.patch

 Michael S. Tsirkin wrote:

   Quoting Erez Zilber [EMAIL PROTECTED]:
   Subject: Re: add_open_iscsi_h.patch

   Michael S. Tsirkin wrote:

 Quoting Erez Zilber [EMAIL PROTECTED]:
 Subject: Re: add_open_iscsi_h.patch

 Michael S. Tsirkin wrote:

  Erez, add_open_iscsi_h currently does:

  -#include scsi/iscsi_if.h
  +#include iscsi_if.h

  why is this bit needed?

 Strange. I remember that I couldn't build OFED 1.2 without it in the
 past. I tried to rebuild it without this now, and it compiles
 successfully, so let's remove that code.

OK, I killed these patches completely and things still build fine.
Vlad, please pull my tree into ofed_kernel.

   Yes, it also works for me. I guess that these are all leftovers.

  Deleted. Hmm. Do we want to kill them in 1.2.c too?

 Yes (why not?)

Dunno. It's in bugfix-only mode after all. You decide.

-- 
MST

[ewg] Re: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits

2007-07-25 Thread Michael S. Tsirkin

   also, if the upstream
   changes touch code that conflicts with a backport
   patch, you get to fix the problem as it happens
  
  That's exactly the thing that I do not want to do.
 
 you don't want to know about a problem a patch
 until days or weeks later when the auto build
 keeps failing and you don't know why?  it is
 easy to catch many problems _before_ the build
 check fails...

I don't work this way.

I just apply all patches before pushing out.
And I see *immediately* the patch that conflicts - unlike merge
conflict where I will know which file conflicts but not
which change created the conflict.

And if a patch conflicts with upstream code,
an option to move the patch aside and defer
the merge decision to patch author
is very important to me: this just happened
with ehca backport and update to 2.6.23-rc1.
I do not want to delay update to 2.6.23-rc1 until
IBM can be bothered to update their backport.

Yes, this means that the specific module won't
build on a specific kernel until the conflict
is resolved. But there are multiple conflicts and each
needs to be resolved by another person.

-- 
MST

[ewg] Re: [PATCH ofed-1.2-rc3 2/4] ehca: backport for rhel-4.5 - mmap functonality

2007-07-25 Thread Michael S. Tsirkin

 Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
 Subject: Re: [PATCH ofed-1.2-rc3 2/4] ehca: backport for rhel-4.5 - mmap functonality
 
 Hi Michael,
 Below is the version without conflicts. And it should compile.

Seems to apply fine. I pushed it out. Vlad, can you take it pls?

 As soon as the build scripts are ready, I'll test the whole backport.

What kind of scripts are you waiting for?

-- 
MST

[ewg] RFC: SRC API

2007-07-29 Thread Michael S. Tsirkin

Hello!
Here is an API proposal for support of the SRC
(scalable reliable connected) protocol extension in libibverbs.

This adds APIs to:
- manage SRC domains

- share SRC domains between processes,
  by means of creating a 1:1 association
  between an SRC domain and a file.

Notes:
- The file is specified by means of a file descriptor,
  this makes it possible for the user to manage file
  creation/deletion in the most flexible manner
  (e.g. tmpfile can be used).

- I envision implementing this sharing mechanism in kernel by means
  of a per-device tree, with inode as a key and domain object
  as a value.
 
Please comment.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index acc1b82..503f201 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -370,6 +370,11 @@ struct ibv_ah_attr {
 	uint8_t			port_num;
 };
 
+struct ibv_src_domain {
+	struct ibv_context     *context;
+	uint32_t		handle;
+};
+
 enum ibv_srq_attr_mask {
 	IBV_SRQ_MAX_WR	= 1 << 0,
 	IBV_SRQ_LIMIT	= 1 << 1
@@ -389,7 +394,8 @@ struct ibv_srq_init_attr {
 enum ibv_qp_type {
 	IBV_QPT_RC = 2,
 	IBV_QPT_UC,
-	IBV_QPT_UD
+	IBV_QPT_UD,
+	IBV_QPT_SRC
 };
 
 struct ibv_qp_cap {
@@ -408,6 +414,7 @@ struct ibv_qp_init_attr {
 	struct ibv_qp_cap	cap;
 	enum ibv_qp_type	qp_type;
 	int			sq_sig_all;
+	struct ibv_src_domain  *src_domain;
 };
 
 enum ibv_qp_attr_mask {
@@ -526,6 +533,7 @@ struct ibv_send_wr {
 			uint32_t	remote_qkey;
 		} ud;
 	} wr;
+	uint32_t		src_remote_srq_num;
 };
 
 struct ibv_recv_wr {
@@ -553,6 +561,10 @@ struct ibv_srq {
 	pthread_mutex_t		mutex;
 	pthread_cond_t		cond;
 	uint32_t		events_completed;
+
+	uint32_t		src_srq_num;
+	struct ibv_src_domain  *src_domain;
+	struct ibv_cq	       *src_cq;
 };
 
 struct ibv_qp {
@@ -570,6 +582,8 @@ struct ibv_qp {
 	pthread_mutex_t		mutex;
 	pthread_cond_t		cond;
 	uint32_t		events_completed;
+
+	struct ibv_src_domain  *src_domain;
 };
 
 struct ibv_comp_channel {
@@ -912,6 +926,25 @@ struct ibv_srq *ibv_create_srq(struct ibv_pd *pd,
 			       struct ibv_srq_init_attr *srq_init_attr);
 
 /**
+ * ibv_create_src_srq - Creates a SRQ associated with the specified protection
+ *   domain and SRC domain.
+ * @pd: The protection domain associated with the SRQ.
+ * @src_domain: The SRC domain associated with the SRQ.
+ * @src_cq: CQ to report completions for SRC packets on.
+ *
+ * @srq_init_attr: A list of initial attributes required to create the SRQ.
+ *
+ * srq_attr->max_wr and srq_attr->max_sge are read to determine the
+ * requested size of the SRQ, and set to the actual values allocated
+ * on return.  If ibv_create_src_srq() succeeds, then max_wr and max_sge
+ * will always be at least as large as the requested values.
+ */
+struct ibv_srq *ibv_create_src_srq(struct ibv_pd *pd,
+				   struct ibv_src_domain *src_domain,
+				   struct ibv_cq *src_cq,
+				   struct ibv_srq_init_attr *srq_init_attr);
+
+/**
  * ibv_modify_srq - Modifies the attributes for the specified SRQ.
  * @srq: The SRQ to modify.
  * @srq_attr: On input, specifies the SRQ attributes to modify.  On output,
@@ -1074,6 +1107,44 @@ int ibv_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid);
  */
 int ibv_fork_init(void);
 
+/**
+ * ibv_get_new_src_domain - Allocate an SRC domain
+ * Returns a reference to an SRC domain.
+ * Use ibv_put_src_domain to free the reference.
+ * @context: Device context
+ */
+struct ibv_src_domain *ibv_get_new_src_domain(struct ibv_context *context);
+
+/**
+ * ibv_share_src_domain - associate the src domain with a file.
+ * Establishes a connection between an SRC domain object and a file descriptor.
+ *
+ * @d: SRC domain to share
+ * @fd: descriptor for a file to associate with the domain
+ */
+int ibv_share_src_domain(struct ibv_src_domain *d, int fd);
+
+/**
+ * ibv_unshare_src_domain - disassociate the src domain from a file.
+ * Subsequent calls to ibv_get_shared_src_domain will fail.
+ * @d: SRC domain to unshare
+ */
+int ibv_unshare_src_domain(struct ibv_src_domain *d);
+
+/**
+ * ibv_get_shared_src_domain - get a reference to shared SRC domain
+ * @context: Device context
+ * @fd: descriptor for a file associated with the domain
+ */
+struct ibv_src_domain *ibv_get_shared_src_domain(struct ibv_context *context, int fd);
+
+/**
+ * ibv_put_src_domain - destroy a reference to an SRC domain
+ * If this is the last reference, destroys the domain.
+ * @d: reference to SRC domain to put
+ */
+int ibv_put_src_domain(struct ibv_src_domain *d

[ewg] Re: [ofa-general] RFC: SRC API

2007-07-30 Thread Michael S. Tsirkin

 On Sun, Jul 29, 2007 at 05:04:31PM +0300, Michael S. Tsirkin wrote:
  Hello!
  Here is an API proposal for support of the SRC
  (scalable reliable connected) protocol extension in libibverbs.
  
  This adds APIs to:
  - manage SRC domains
  
  - share SRC domains between processes,
by means of creating a 1:1 association
between an SRC domain and a file.
  
  Notes:
  - The file is specified by means of a file descriptor,
this makes it possible for the user to manage file
creation/deletion in the most flexible manner
(e.g. tmpfile can be used).
  
  - I envision implementing this sharing mechanism in kernel by means
of a per-device tree, with inode as a key and domain object
as a value.
   
  Please comment.
 Can you provide a pseudo code of an application using this API?
 Especially QP sharing part.

There's no QP sharing here.
You mean SRC domain sharing?

-- 
MST
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg

[ewg] Re: [ofa-general] RFC: SRC API

2007-07-30 Thread Michael S. Tsirkin

Some code examples:
/* create a domain and share it: */

struct ibv_src_domain * d = ibv_get_new_src_domain(ctx);
int fd = open(path, O_CREAT | O_RDWR, mode);
ibv_share_src_domain(d, fd);

/* get a reference to a shared domain: */

int fd = open(path, O_CREAT | O_RDWR, mode);
struct ibv_src_domain * d = ibv_get_shared_src_domain(ctx, fd);

/* once done: */
ibv_put_src_domain(d);

Note: once all users have called ibv_put_src_domain(), the domain is destroyed.

-- 
MST

[ewg] Re: [ofa-general] RFC: SRC API

2007-07-30 Thread Michael S. Tsirkin

More code examples:

Create an SRC QP, part of SRC domain:

attr.qp_type = IBV_QPT_SRC;
attr.src_domain = d;
qp = ibv_create_qp(pd, &attr);

Given remote SRQ number, send data to this SRQ over an SRC QP:

wr.src_remote_srq_num = src_remote_srq_num;
ibv_post_send(qp, &wr, &bad_wr);

Note: SRQ number needs to be exchanged as part of CM private data
  or some other protocol.

-- 
MST

[ewg] Re: [ofa-general] RFC: SRC API

2007-07-30 Thread Michael S. Tsirkin

 Quoting Gleb Natapov [EMAIL PROTECTED]:
 Subject: Re: [ofa-general] RFC: SRC API
 
 On Mon, Jul 30, 2007 at 12:16:39PM +0300, Michael S. Tsirkin wrote:
  More code examples:
  
  Create an SRC QP, part of SRC domain:
  
  attr.qp_type = IBV_QPT_SRC;
  attr.src_domain = d;
  qp = ibv_create_qp(pd, attr);
  
  Given remote SRQ number, send data to this SRQ over an SRC QP:
  
  wr.src_remote_srq_num = src_remote_srq_num;
  ib_post_send(qp, wr);
  
  Note: SRQ number needs to be exchanged as part of CM private data
or some other protocol.
  
 You are too brief. I can come up with one-liners based on the API by
 myself. I am trying to understand how sharing of SRC between processes
 will work and your example doesn't show this.

It seems what you are missing is what SRC is, not how to use the API.
I'll have a working example when I get closer to implementation.
For now you'll have to look up Dror's preso if you want to
understand what SRC is.

 Can I connect the same
 SRC to different QPs? If yes, can I send a packet to any SRQ connected to
 the SRC through any QP connected to the same SRC?

Yes to both.

 If yes how is this
 different from having regular QPs?

With regular QP you can only send to a single SRQ.
But again, look at Dror's preso.

-- 
MST

[ewg] Re: [ofa-general] RFC: SRC API

2007-07-30 Thread Michael S. Tsirkin

  It seems what you are missing is what SRC is, not how to use the API.
 
 So tell us.

This calls for a separate document. Based on feedback from Sonoma, I had assumed
people have it figured out.

Let's open a separate thread, and there I will try writing up
what SRC is from the protocol point of view.


-- 
MST

[ewg] Scalable reliable connection

2007-07-30 Thread Michael S. Tsirkin


Here's some background on what SRC is.  This is basically slide 6 in Dror's
talk, for those that missed the talk.

 * * *

SRC is an extension supported by recent Mellanox hardware
which is geared toward reducing the number of QPs
required for all-to-all communication on systems
with a high number of jobs per node.

===
Motivation:
===
Given N nodes with J jobs per node, number of QPs required
for all-to-all communication is:

With RC:
O((N * J) ^ 2)

Since each job out of O(N * J) jobs must create a single QP
to communicate with each one of O(N * J) other jobs.

With SRC:
O(N ^ 2 * J)

This is achieved by using a single send queue (per job, out of O(N * J) jobs)
to send data to all J jobs running on a specific node (out of O(N) nodes).
Hardware uses new SRQ number field in packet header to
multiplex receive WRs and WCs to private memory of each job.

This is a similar idea to IB RD.
Q: Why not use RD then?
A: Because no hardware supports it.
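
To get a feel for the numbers in the Motivation section above, here is a small sketch in C. It computes only the leading terms, (N * J) ^ 2 for RC and N ^ 2 * J for SRC, and ignores constant factors. The function names are made up for illustration and are not part of any API.

```c
#include <assert.h>
#include <stdint.h>

/* Leading-term estimate for RC: every job pairs with every job. */
uint64_t rc_qp_estimate(uint64_t nodes, uint64_t jobs_per_node)
{
    uint64_t jobs = nodes * jobs_per_node;

    /* Keep inputs small enough that jobs * jobs cannot overflow. */
    assert(nodes < (1u << 15) && jobs_per_node < (1u << 15));
    return jobs * jobs;
}

/* Leading-term estimate for SRC: each job has one send queue per node. */
uint64_t src_qp_estimate(uint64_t nodes, uint64_t jobs_per_node)
{
    uint64_t jobs = nodes * jobs_per_node;

    assert(nodes < (1u << 15) && jobs_per_node < (1u << 15));
    return jobs * nodes;
}
```

For example, with N = 100 nodes and J = 8 jobs per node this gives 640000 for RC versus 80000 for SRC, that is, a factor of J fewer.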

Details:

===
Verbs extension:
===

- There is a new transport/QP type SRC.
- There is a new object type SRC domain
- Each SRQ gets new (optional) attributes:
    SRC domain
    SRC SRQ number
    SRC CQ
  SRQ must have either all 3 of these or none of these attributes

- QPs of type SRC have all the same attributes as regular RC QPs
  connected to SRQ, except that:
  A. Each SRC QP has a new required attribute SRC domain
  B. SRC QPs do *not* have SRQ attribute
(do not have a specific SRQ associated with them)

===
Protocol extension:
===
SRC QP behaviour: Requestor
- Post send WR for this QP type is extended with SRQ number field
  This number is sent as part of packet header
- SRC Packets follow rules for RC packets on the wire, exactly
  What is different is their handling at the responder side

SRC QP behaviour: Responder
Each incoming packet passes transport checks with respect
to the SRC QP, following RC rules, exactly.

After this, SRQ number in packet header is used to look up
a specific SRQ. SRC domain of the resulting SRQ must be equal
to SRC domain of the QP, otherwise a NAK is sent,
and QP moves to error state.

If the SRC domains match, receive WR and receive WC processing
are as follows:

- RC Send
  - Rather than using SRQ to which the QP is attached,
SRQ is looked up by SRQ number in the packet.
Receive WR is taken from this SRQ.
  - Completions are generated on the CQ specified in the SRQ

- RDMA/Atomic
  - Rather than using PD to which the QP is attached,
SRQ is looked up by SRQ number in the packet.
PD of this SRQ is used for protection checks.
===
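
The responder-side rules above can be sketched roughly as follows. This is a userspace mock, not hardware or driver code: all types and names are hypothetical, and treating an unknown SRQ number the same way as a domain mismatch is my assumption, since the text only covers the mismatch case.

```c
#include <assert.h>
#include <stddef.h>

enum src_rx_result { SRC_RX_DELIVER, SRC_RX_NAK };

/* Hypothetical stand-ins for an SRQ and an SRC QP. */
struct mock_srq {
    unsigned srq_num;   /* SRQ number carried in the packet header */
    int domain_id;      /* SRC domain the SRQ belongs to */
    int cq_id;          /* CQ on which receive completions are reported */
};

struct mock_src_qp {
    int domain_id;      /* SRC domain of the QP */
    int in_error;       /* set once the QP has moved to error state */
};

/* Look up an SRQ by number in a small table; NULL if there is none. */
const struct mock_srq *mock_srq_lookup(const struct mock_srq *tbl, size_t n,
                                       unsigned srq_num)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tbl[i].srq_num == srq_num)
            return &tbl[i];
    return NULL;
}

/*
 * Runs after the usual RC transport checks have passed.
 * On success, *cq_id_out names the CQ that gets the completion.
 */
enum src_rx_result mock_src_responder(struct mock_src_qp *qp,
                                      const struct mock_srq *tbl, size_t n,
                                      unsigned srq_num, int *cq_id_out)
{
    const struct mock_srq *srq;

    assert(qp != NULL && cq_id_out != NULL);
    srq = mock_srq_lookup(tbl, n, srq_num);
    if (srq == NULL || srq->domain_id != qp->domain_id) {
        qp->in_error = 1;       /* NAK is sent, QP moves to error state */
        return SRC_RX_NAK;
    }
    *cq_id_out = srq->cq_id;    /* receive WR and WC come from this SRQ */
    return SRC_RX_DELIVER;
}
```

The point of the sketch is that the SRQ, and through it the CQ, is chosen per packet by the SRQ number, while the QP only contributes its domain for the match.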
 
-- 
MST

[ewg] Re: [PATCH 0/4]: add kfifo from upstream for SLES9 RH4

2007-07-30 Thread Michael S. Tsirkin

 The following patches add kfifo to ibcore (for SLES9  RH4). kfifo is taken 
 from upstream code.

Thanks, applied to 1.2.c and ofed_kernel.
Vlad already took 1.2.c, and I guess he will take ofed_kernel
after it passes his checks.

-- 
MST

[ewg] Re: Scalable reliable connection

2007-07-31 Thread Michael S. Tsirkin

 Quoting Gleb Natapov [EMAIL PROTECTED]:
 Subject: Re: Scalable reliable connection
 
 On Mon, Jul 30, 2007 at 03:50:54PM +0300, Michael S. Tsirkin wrote:
  With SRC:
  O(N ^ 2 * J)
  
  This is achived by using a single send queue (per job, out of O(N * J) 
  jobs)
  to send data to all J jobs running on a specific node (out of O(N) 
  nodes).
  Hardware uses new SRQ number field in packet header to
  multiplex receive WRs and WCs to private memory of each job.
  
 But since the send queue cannot be used for receiving packets additional
 receive QPs have to be created one per job so with SRC it is actually
 O(N ^ 2 * J + N * J)
 unless I am missing something.

Yes, but since N >= 1, N ^ 2 >= N, and so O(N ^ 2 * J + N * J) == O(N ^ 2 * J).

-- 
MST

[ewg] Re: Scalable reliable connection

2007-07-31 Thread Michael S. Tsirkin

 Quoting Tang, Changqing [EMAIL PROTECTED]:
 Subject: RE: Scalable reliable connection
 
 
 A send queue can only serve max J jobs within a node. Is it possible to
 make a single send queue to serve all jobs on all nodes ?

How do you propose to do this?

-- 
MST

[ewg] Re: patches for 1.2.c

2007-07-31 Thread Michael S. Tsirkin

 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: patches for 1.2.c
 
 Guys,
 
 I have 2 more patches to go in ofed_1_2/ofed_1_2_c.
 
 Is there some grand scheme to the naming of kernel_patches/fixes/* for 
 1.2.c?  I noticed a slew of new files for the post-2.6.22 fixes, and 
 wondered if there is a naming scheme?

Not really, just stick the module name in there please, so it's
easy to figure out that cxgb3 is involved.

 Or should I just post a patch for the ofed_1_2 branch and let you all 
 create the ofed_1_2_c kernel_patches/fixes/ patch file ??

It's best if you post the patch that should go into kernel_patches/fixes/,
or clone the ofed_1_2_c branch and add the file there.

-- 
MST

[ewg] Re: [PATCH 0/2] IB/iser: move open-iscsi crypto functions to kernel_addons

2007-07-31 Thread Michael S. Tsirkin

 Quoting Erez Zilber [EMAIL PROTECTED]:
 Subject: [PATCH 0/2] IB/iser: move open-iscsi crypto functions to 
 kernel_addons
 
 The following patches move open-iscsi crypto functions from kernel_patches to 
 kernel_addons. By doing so, we also solve a bug in iscsi tx hash that caused 
 an oops when crc32c was used for data digest.

Great, these patches were really fragile.

-- 
MST

[ewg] Re: [ofa-general] Re: OFED 1.2.c-9 is available

2007-07-31 Thread Michael S. Tsirkin

 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [ofa-general] Re: OFED 1.2.c-9 is available
 
   Why under drivers/net rather than drivers/infiniband like all the
   other drivers ? Does this really need special casing (in libibumad) ?
 
 Tziporet is incorrect.  There's nothing from the mlx4_core driver
 either, and when it is implemented, it should work exactly the same as
 all other drivers.

At some point you suggested sticking this stuff under the pci device and
adding softlinks under drivers/infiniband, so that
if there's an ethernet device on top of the core these can be shared.

I'm not sure how to do this, though, and I also don't see why
just adding the attributes in both places would be any worse.

Comments?

-- 
MST

[ewg] Re: [ofa-general] Re: ofa_1_2_c_kernel 20070802-0201 daily build status

2007-08-02 Thread Michael S. Tsirkin

 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: Re: [ofa-general] Re: ofa_1_2_c_kernel 20070802-0201 daily build 
 status
 
 Also,
 
 Is something broken in the ofed_1_2 branch?  I cannot even build against 
 the local kernel on the ofa server using the ~vlad/ofed_1_2/linux-2.6 
 repository.

Does directory ~vlad/ofed_1_2/linux-2.6 exist?


-- 
MST

Re: [ewg] Re: [ofa-general] Re: ofa_1_2_c_kernel 20070802-0201 daily build status

2007-08-02 Thread Michael S. Tsirkin

Look here:

/home/vlad/scripts/ofed_1_2

Quoting Steve Wise [EMAIL PROTECTED]:
Subject: Re: [ewg] Re: [ofa-general] Re: ofa_1_2_c_kernel 20070802-0201 daily 
build status

I'm havin' a bad day.

Can you all help me?

My normal process is to use the build_ofa_kernel.sh script from the 
ofabuild repository to build against all ofed kernels.  But that script 
in the master branch of the ofabuild repository now assumes 1.2.c 
because it tries to configure in the connectx device.  There aren't 
ofed_1_2 and ofed_1_2_c branches in that repo for tree-specific build 
scripts.

S:

What exactly should I be using to do cross-compile builds of my patched 
trees before submitting patches for inclusion into ofed?

Thanks and sorry for the pain.  And if there is an FM somewhere that I 
should be reading, feel free to say RTFM. :)

Steve.


-- 
MST

[ewg] Re: [PATCH 0/2] IB/iser: move open-iscsi crypto functions to kernel_addons

2007-08-05 Thread Michael S. Tsirkin


Vlad?

Quoting Erez Zilber [EMAIL PROTECTED]:
Subject: Re: [PATCH 0/2] IB/iser: move open-iscsi crypto functions to kernel_addons

Michael S. Tsirkin wrote:

 Quoting Erez Zilber [EMAIL PROTECTED]:
 Subject: [PATCH 0/2] IB/iser: move open-iscsi crypto functions to 
 kernel_addons

 The following patches move open-iscsi crypto functions from kernel_patches 
 to kernel_addons. By doing so, we also solve a bug in iscsi tx hash that 
 caused an oops when crc32c was used for data digest.
 

 Great, these patches were really fragile.

   
I saw that these patches are not in 1.2.c-10. Will they be in 1.2.c-11?
This is a real bug fix.

Thanks,
Erez

-- 
MST

[ewg] RFCv2: SRC API

2007-08-06 Thread Michael S. Tsirkin

This is version 2 of the proposal, addressing comments
from version 1.

Changelog:
- Use oflags to make API smaller
- Clarify sharing semantics
- Add documentation

This is the API proposal for support of the SRC
(scalable reliable connected) protocol extension in libibverbs.

This adds APIs to:
- manage SRC domains

- share SRC domains between processes,
by means of creating a 1:1 association
between an SRC domain and an inode.

Notes:
- The inode is specified by means of a file descriptor,
this makes it possible for the user to manage file
creation/deletion in the most flexible manner
(e.g. tmpfile can be used).

- I envision implementing this sharing mechanism in kernel by means
of a per-device tree, with inode as a key and domain object
as a value.
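
To illustrate the idea of keying the domain by inode rather than by path or descriptor, here is a userspace sketch in C. It is not the proposed kernel implementation: the struct and function names are hypothetical, and pairing the inode number with the device number is my assumption, since inode numbers are only unique within one filesystem.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical lookup key: device number plus inode number. */
struct domain_key {
    dev_t dev;
    ino_t ino;
};

/* Fill *key from an open descriptor; returns 0 on success, -1 on error. */
int domain_key_from_fd(int fd, struct domain_key *key)
{
    struct stat st;

    assert(key != NULL);
    if (fstat(fd, &st) != 0)
        return -1;
    key->dev = st.st_dev;
    key->ino = st.st_ino;
    return 0;
}

/*
 * Open both paths independently and compare the resulting keys.
 * Returns 1 if they name the same inode, 0 if not, -1 on error.
 */
int paths_share_key(const char *path_a, const char *path_b)
{
    struct domain_key ka, kb;
    int result = -1;
    int fd_a = open(path_a, O_RDONLY);
    int fd_b = open(path_b, O_RDONLY);

    if (fd_a >= 0 && fd_b >= 0 &&
        domain_key_from_fd(fd_a, &ka) == 0 &&
        domain_key_from_fd(fd_b, &kb) == 0)
        result = (ka.dev == kb.dev && ka.ino == kb.ino);
    if (fd_a >= 0)
        close(fd_a);
    if (fd_b >= 0)
        close(fd_b);
    return result;
}
```

Two processes that each open the same file get different descriptors but the same key, which is what lets them find the same domain object.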

Please comment.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]



diff --git a/SRC.txt b/SRC.txt
new file mode 100644
index 000..3881477
--- /dev/null
+++ b/SRC.txt
@@ -0,0 +1,133 @@
+Here's some documentation on Scalable Reliable Connections.
+
+ * * *
+
+SRC is an extension supported by recent Mellanox hardware
+which is geared toward reducing the number of QPs
+required for all-to-all communication on systems
+with a high number of jobs per node.
+
+===
+Motivation:
+===
+Given N nodes with J jobs per node, number of QPs required
+for all-to-all communication is:
+
+With RC:
+   O((N * J) ^ 2)
+
+   Since each job out of O(N * J) jobs must create a single QP
+   to communicate with each one of O(N * J) other jobs.
+
+With SRC:
+   O(N ^ 2 * J)
+
+   This is achieved by using a single send queue (per job, out of O(N * J) jobs)
+   to send data to all J jobs running on a specific node (out of O(N) nodes).
+   Hardware uses new SRQ number field in packet header to
+   multiplex receive WRs and WCs to private memory of each job.
+
+This is a similar idea to IB RD.
+Q: Why not use RD then?
+A: Because no hardware supports it.
+
+Details:
+
+===
+Verbs extension:
+===
+
+- There is a new transport/QP type SRC.
+- There is a new object type SRC domain
+- Each SRQ gets new (optional) attributes:
+   SRC domain
+   SRC SRQ number
+   SRC CQ
+  SRQ must have either all 3 of these or none of these attributes
+
+- QPs of type SRC have all the same attributes as regular RC QPs
+  connected to SRQ, except that:
+  A. Each SRC QP has a new required attribute SRC domain
+  B. SRC QPs do *not* have SRQ attribute
+   (do not have a specific SRQ associated with them)
+
+===
+Protocol extension:
+===
+SRC QP behaviour: Requestor
+- Post send WR for this QP type is extended with SRQ number field
+  This number is sent as part of packet header
+- SRC Packets follow rules for RC packets on the wire, exactly
+  What is different is their handling at the responder side
+
+SRC QP behaviour: Responder
+Each incoming packet passes transport checks with respect
+to the SRC QP, following RC rules, exactly.
+
+After this, SRQ number in packet header is used to look up
+a specific SRQ. SRC domain of the resulting SRQ must be equal
+to SRC domain of the QP, otherwise a NAK is sent,
+and QP moves to error state.
+
+If the SRC domains match, receive WR and receive WC processing
+are as follows:
+
+- RC Send
+  - Rather than using SRQ to which the QP is attached,
+SRQ is looked up by SRQ number in the packet.
+Receive WR is taken from this SRQ.
+  - Completions are generated on the CQ specified in the SRQ
+
+- RDMA/Atomic
+  - Rather than using PD to which the QP is attached,
+SRQ is looked up by SRQ number in the packet.
+PD of this SRQ is used for protection checks.
+
+===
+Pseudo code:
+===
+
+Consider again a setup where there are N nodes with J jobs per node.
+All N * J jobs need to perform all-to-all communication.
+Using RC QPs, this would call for O((N * J) ^ 2) QPs.
+Here is how SRC can be used to reduce the number of QPs to O(N ^ 2 * J).
+
+At startup:
+1. All jobs on each node share a single SRC domain
+2. Each job creates a CQ for receive WCs
+3. Each job creates a SRQ attached to this CQ and to the shared domain
+
+When job j1 needs to transmit to job j2 on remote node n for the first time:
+1. Test: does job j1 have an existing connection to some job on node n?
+- If no:
+   j1 creates an SRC QP qp1 (send QP)
+   qp1 is only used to post send WRs
+   j2 creates an SRC QP qp2
+   qp2 is part of SRC domain

[ewg] Re: ofa_1_2_c_kernel 20070802-0201 daily build status

2007-08-06 Thread Michael S. Tsirkin


 Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
 Subject: Re: ofa_1_2_c_kernel 20070802-0201 daily build status
 
 Hello Michael and Vladimir!
  ehca backports for kernel.org kernels seem to be broken.
  1. Does anyone care enough to fix them? If not we'll disable
 ehca in build for these kernels.
 I downloaded daily build package ofa_1_2_c_kernel-20070804-0200.tgz
 and followed the build scheme configure, make on 2.6.19, 2.6.18, 2.6.17
 and 2.6.16/sles10/sles10_sp1. Except for 2.6.16/sles10/sles10_sp1, where
 a patch for kmem_cache_zalloc() is required for ehca, the others were
 built without errors, see below. Thus, I'm wondering what I'm doing
 differently from your daily build script?

Could be different kernel configs or compiler version?
Can you please build on ofa server against kernels in ~vlad/kernel.org/?
The cross tool chain is here: /home/vlad/cross/

-- 
MST

[ewg] Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.10/sles10/sles10_sp1

2007-08-06 Thread Michael S. Tsirkin

Let's not do it this way.

I think the right thing is to implement kmem_cache_zalloc
by means of kmem_cache_alloc and memset in kernel_addons.
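
Roughly, the wrapper would have the shape below. This is a userspace mock for illustration only: struct mock_cache and both functions are hypothetical stand-ins, not the kernel slab API, and the mock fills fresh objects with 0xAA purely to make uninitialized memory visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a slab cache; only the object size matters. */
struct mock_cache {
    size_t obj_size;
};

/* Stand-in for the plain allocator: contents are not zeroed. */
void *mock_cache_alloc(const struct mock_cache *cache)
{
    void *obj;

    assert(cache != NULL && cache->obj_size > 0);
    obj = malloc(cache->obj_size);
    if (obj)
        memset(obj, 0xAA, cache->obj_size);   /* simulate stale contents */
    return obj;
}

/* Zeroing wrapper: allocate, then clear the whole object. */
void *mock_cache_zalloc(const struct mock_cache *cache)
{
    void *obj = mock_cache_alloc(cache);

    if (obj)
        memset(obj, 0, cache->obj_size);
    return obj;
}
```

Note that the wrapper has to know the object size in order to call memset; in the mock it is simply read from the cache structure.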



Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
Subject: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 
2.6.10/sles10/sles10_sp1

Hello Michael and Vladimir!
This patch below adds a backport patch for ehca to the dirs 2.6.16,
2.6.16_sles10 and 2.6.16_sles10_sp1 underneath kernel_patches/backport of
the ofed-1.2.c source tree.
Thanks!
Nam



backport kmem_cache_zalloc() to 2.6.10, 2.6.10_sles10 and 2.6.10_sles10_sp1

Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED]
---

 2.6.16/ehca_kmem_cache_zalloc_to_2_6_16.patch|   97 +++
 2.6.16_sles10/ehca_kmem_cache_zalloc_to_2_6_16.patch |   97 +++
 2.6.16_sles10_sp1/ehca_kmem_cache_zalloc_to_2_6_16.patch |   97 +++
 3 files changed, 291 insertions(+)

diff -Nurp 
ofa_1_2_c_kernel-20070804-0200_orig/kernel_patches/backport/2.6.16/ehca_kmem_cache_zalloc_to_2_6_16.patch
 
ofa_1_2_c_kernel-20070804-0200/kernel_patches/backport/2.6.16/ehca_kmem_cache_zalloc_to_2_6_16.patch
--- 
ofa_1_2_c_kernel-20070804-0200_orig/kernel_patches/backport/2.6.16/ehca_kmem_cache_zalloc_to_2_6_16.patch
   1970-01-01 01:00:00.0 +0100
+++ 
ofa_1_2_c_kernel-20070804-0200/kernel_patches/backport/2.6.16/ehca_kmem_cache_zalloc_to_2_6_16.patch
2007-08-06 00:53:59.0 +0200
@@ -0,0 +1,97 @@
+diff -Nurp 
ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_cq.c 
ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_cq.c
+--- ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_cq.c   
2007-08-04 11:00:05.0 +0200
 ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_cq.c
2007-08-06 00:41:50.0 +0200
+@@ -134,13 +134,14 @@ struct ib_cq *ehca_create_cq(struct ib_d
+   if (cqe = 0x - 64 - additional_cqe)
+   return ERR_PTR(-EINVAL);
+ 
+-  my_cq = kmem_cache_zalloc(cq_cache, GFP_KERNEL);
++  my_cq = kmem_cache_alloc(cq_cache, GFP_KERNEL);
+   if (!my_cq) {
+   ehca_err(device, "Out of memory for ehca_cq struct device=%p",
+device);
+   return ERR_PTR(-ENOMEM);
+   }
+ 
++  memset(my_cq, 0, sizeof(*my_cq));
+   memset(&param, 0, sizeof(struct ehca_alloc_cq_parms));
+ 
+   spin_lock_init(&my_cq->spinlock);
+diff -Nurp 
ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_main.c 
ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_main.c
+--- ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_main.c 
2007-08-04 11:00:05.0 +0200
 ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_main.c  
2007-08-06 00:40:58.0 +0200
+@@ -113,9 +113,11 @@ static struct kmem_cache *ctblk_cache = 
+ 
+ void *ehca_alloc_fw_ctrlblock(gfp_t flags)
+ {
+-  void *ret = kmem_cache_zalloc(ctblk_cache, flags);
++  void *ret = kmem_cache_alloc(ctblk_cache, flags);
+   if (!ret)
+   ehca_gen_err("Out of memory for ctblk");
++  else
++  memset(ret, 0, EHCA_PAGESIZE);
+   return ret;
+ }
+ 
+diff -Nurp 
ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_mrmw.c 
ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_mrmw.c
+--- ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_mrmw.c 
2007-08-04 11:00:05.0 +0200
 ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_mrmw.c  
2007-08-06 00:39:30.0 +0200
+@@ -55,8 +55,9 @@ static struct ehca_mr *ehca_mr_new(void)
+ {
+   struct ehca_mr *me;
+ 
+-  me = kmem_cache_zalloc(mr_cache, GFP_KERNEL);
++  me = kmem_cache_alloc(mr_cache, GFP_KERNEL);
+   if (me) {
++  memset(me, 0, sizeof(*me));
+   spin_lock_init(&me->mrlock);
+   } else
+   ehca_gen_err("alloc failed");
+@@ -73,8 +74,9 @@ static struct ehca_mw *ehca_mw_new(void)
+ {
+   struct ehca_mw *me;
+ 
+-  me = kmem_cache_zalloc(mw_cache, GFP_KERNEL);
++  me = kmem_cache_alloc(mw_cache, GFP_KERNEL);
+   if (me) {
++  memset(me, 0, sizeof(*me));
+   spin_lock_init(&me->mwlock);
+   } else
+   ehca_gen_err("alloc failed");
+diff -Nurp 
ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_pd.c 
ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_pd.c
+--- ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_pd.c   
2007-08-04 11:00:05.0 +0200
 ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_pd.c
2007-08-06 00:38:14.0 +0200
+@@ -50,13 +50,14 @@ struct ib_pd *ehca_alloc_pd(struct ib_de
+ {
+   struct ehca_pd *pd;
+ 
+-  pd = kmem_cache_zalloc(pd_cache, GFP_KERNEL);
++  pd = kmem_cache_alloc(pd_cache, GFP_KERNEL);
+   if (!pd) {
+   ehca_err(device,

[ewg] Re: RFCv2: SRC API

2007-08-06 Thread Michael S. Tsirkin

 Only one of the jobs among j2, j3, j4 on remote node n needs to create a
 receiving qp2 for j1, right?

Correct. A single QP can be used to send data to any SRQ that shares the
same domain.
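
On the sending side this boils down to keeping at most one QP per remote node and reusing it for every job there. A minimal sketch in C, assuming nodes are identified by a small integer index; the cache structure and function names are hypothetical, and the counter stands in for actually creating and connecting an SRC QP.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64

/* Hypothetical per-job cache of send QPs, indexed by remote node. */
struct send_qp_cache {
    int qp_id[MAX_NODES];   /* 0 means no QP to that node yet */
    int created;            /* number of QPs created so far */
};

void send_qp_cache_init(struct send_qp_cache *cache)
{
    size_t i;

    for (i = 0; i < MAX_NODES; i++)
        cache->qp_id[i] = 0;
    cache->created = 0;
}

/*
 * Return the handle of the QP used to reach any job on the given node,
 * creating it on first use.
 */
int send_qp_for_node(struct send_qp_cache *cache, size_t node)
{
    assert(cache != NULL && node < MAX_NODES);
    if (cache->qp_id[node] == 0)
        cache->qp_id[node] = ++cache->created;  /* stand-in for QP setup */
    return cache->qp_id[node];
}
```

The destination job is then selected per send by the SRQ number in the work request, not by the choice of QP.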

-- 
MST

[ewg] Re: RFCv2: SRC API

2007-08-06 Thread Michael S. Tsirkin

 Quoting Tang, Changqing [EMAIL PROTECTED]:
 Subject: RE: RFCv2: SRC API
 
  
 
   OK, I was wrong before, here is my question.
   
   If remote node n has j2, j3, and j4, and j2 is the job that creates qp2
   and connects it with qp1 in j1, then if j2 is done before j3 and j4,
   we cannot let j2 destroy qp2, because j3 and j4 are still communicating
   with j1. Since j2 owns qp2, j2 needs to be the last job to clean up.
   
   Am I right ?
  
  Correct. Is this clear from the text, or is some kind of 
  additional clarification necessary?
 
 It is not clear at the first read, so please add one sentence to clarify
 it.

Would something like this help?

Cleanup:
When job j1 does not need to communicate to any jobs on node n,
it disconnects qp1 from qp2, and asks j2 to destroy qp2.
+
+Note: both qp1 and qp2 must exist for the communication to take place.
+Thus, j2 should not destroy qp2 (and in particular, should not exit)
+until j1 has completed communication with node n and
+has asked j2 to disconnect.


 If j2 is the last job to clean up, how can it know that all other jobs on the
 same node have called ibv_close_src_domain(), and that it is time for it to
 clean up?
 
 Is this something that is up to the application to do?

No, this is handled automatically.
Have you seen this text?
 * ibv_close_src_domain - close an SRC domain
 * If this is the last reference, destroys the domain.
 
So, each job has a reference to the domain.
Once the last reference is gone, the domain is destroyed.
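
As a rough single-process illustration of these reference semantics (the real thing would span processes and live in the kernel), here is a mock in C. All names are hypothetical; the destroyed flag exists only so that the teardown can be observed from outside.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shared SRC domain. */
struct mock_src_domain {
    int refs;         /* number of holders */
    int *destroyed;   /* set to 1 when the domain is torn down */
};

/* Create a domain holding one reference; NULL on allocation failure. */
struct mock_src_domain *mock_domain_open(int *destroyed_flag)
{
    struct mock_src_domain *d = malloc(sizeof(*d));

    assert(destroyed_flag != NULL);
    if (d == NULL)
        return NULL;
    d->refs = 1;
    d->destroyed = destroyed_flag;
    *destroyed_flag = 0;
    return d;
}

/* Another job attaches to the same domain. */
struct mock_src_domain *mock_domain_get(struct mock_src_domain *d)
{
    assert(d != NULL && d->refs > 0);
    d->refs++;
    return d;
}

/* Drop one reference; returns 1 if this call destroyed the domain. */
int mock_domain_close(struct mock_src_domain *d)
{
    assert(d != NULL && d->refs > 0);
    if (--d->refs > 0)
        return 0;
    *d->destroyed = 1;
    free(d);
    return 1;
}
```

No job needs to know whether it is the last one: each simply closes its own reference, and whichever close happens last triggers the teardown.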


-- 
MST

[ewg] Re: RFCv2: SRC API

2007-08-06 Thread Michael S. Tsirkin

 Quoting Tang, Changqing [EMAIL PROTECTED]:
 Subject: RE: RFCv2: SRC API
 
  
  Cleanup:
  When job j1 does not need to communicate to any jobs on node 
  n, it disconnects qp1 from qp2, and asks j2 to destroy qp2.
  +
  +Note: both qp1 and qp2 must exist for the communication to 
  take place.
  +Thus, j2 should not destroy qp2 (and in particular, should not exit) 
  +until j1 has completed communication with node n and has asked j2 to 
  +disconnect.
  
 Thanks. 
 
 Another question. if a node n has 8 jobs, say, j2-j9, usually the first
 job j2 is the one to create the SRC
 domain(other jobs just attach and share) and it make sense to let j2 to
 create all the receiving QPs for all other
 remote jobs and make all the connections. (we can do in roundrobin way,
 but more work).

Sure, creating all connections up front will work too; this is just a usage
example.

 Is there any performance worry to let j2(the first job on a node) to do
 all the work ?

How do you mean?

 What is the latency of SRC+SRQ ?

I'd expect it to be more or less the same as regular SRQ.

-- 
MST

[ewg] Re: [PATCH 0/5 VNIC] VNIC patch series for OFED-1.2 and OFED-1.2.c

2007-08-06 Thread Michael S. Tsirkin

 Quoting Ramachandra K [EMAIL PROTECTED]:
 Subject: [PATCH 0/5 VNIC] VNIC patch series for OFED-1.2 and OFED-1.2.c
 
 
 Vlad,
 
 Please apply this VNIC patch series to both the OFED-1.2 and OFED-1.2.c
 branches.
 
 This series contains changes to the VNIC driver for supporting iPath and the
 new version of the VEx hardware, the Ethernet Virtual I/O Controller (EVIC).

I don't see how adding features to 1.2 *at this stage* can be justified.

-- 
MST

[ewg] Re: [PATCH 0/5 VNIC] VNIC patch series for OFED-1.2 and OFED-1.2.c

2007-08-06 Thread Michael S. Tsirkin

 Quoting Kuchimanchi, Ramachandra [EMAIL PROTECTED]:
 Subject: RE: [PATCH 0/5 VNIC] VNIC patch series for OFED-1.2 and OFED-1.2.c

 -Original Message-
 From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
 Sent: Mon 8/6/2007 11:48 PM
 To: Kuchimanchi, Ramachandra
 Cc: [EMAIL PROTECTED]; ewg@lists.openfabrics.org
 Subject: Re: [PATCH 0/5 VNIC] VNIC patch series for OFED-1.2 and OFED-1.2.c

  Quoting Ramachandra K [EMAIL PROTECTED]:
  Subject: [PATCH 0/5 VNIC] VNIC patch series for OFED-1.2 and OFED-1.2.c

  Vlad,

  Please apply this VNIC patch series to both the OFED-1.2 and OFED-1.2.c
  branches.

  This series contains changes to the VNIC driver for supporting iPath and the
  new version of the VEx hardware, the Ethernet Virtual I/O Controller (EVIC).

  I don't see how adding features to 1.2 *at this stage* can be justified.

 Just to clarify, when I said OFED-1.2, I meant the next release in the 1.2
 series of OFED, i.e. OFED-1.2.1, and ultimately OFED-1.3 down the line. Is
 there any other branch designated for that?

I think EWG decided that the next release in the 1.2 series will be 1.2.c.
So far, the definition of 1.2.c was 1.2 plus bugfixes plus connectx support.

Stuff intended for 1.3 should go here for now:
git://openfabrics.org/~vlad/ofed_kernel ofed_kernel
This has been updated to 2.6.23-rc2, but otherwise is tracking ofed_1_2_c.

 And I hope there is no objection for inclusion of these patches in OFED-1.2.c
 branch.

This looks like a change of methodology so this might be something EWG
would have to agree on. Right?

-- 
MST

[ewg] Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.10/sles10/sles10_sp1

2007-08-06 Thread Michael S. Tsirkin

Hmm, I thought about it some more.
The kmem_cache struct is not exported on recent kernels,
so this might be hard to do.

So I think the patch is probably the right approach, after all.

Quoting Michael S. Tsirkin [EMAIL PROTECTED]:
Subject: Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 
2.6.10/sles10/sles10_sp1

Let's not do it this way.

I think the right thing is to implement kmem_cache_zalloc
by means of kmem_cache_alloc and memset in kernel_addons.



Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
Subject: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 
2.6.10/sles10/sles10_sp1

Hello Michael and Vladimir!
This patch below adds a backport patch for ehca to the dirs 2.6.16, 
2.6.16_sles10
and 2.6.16_sles10_sp1 underneath kernel_patches/backport of ofed-1.2.c source 
tree.
Thanks!
Nam



backport kmem_cache_zalloc() to 2.6.10, 2.6.10_sles10 and 2.6.10_sles10_sp1

Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED]
---

 2.6.16/ehca_kmem_cache_zalloc_to_2_6_16.patch|   97 +++
 2.6.16_sles10/ehca_kmem_cache_zalloc_to_2_6_16.patch |   97 +++
 2.6.16_sles10_sp1/ehca_kmem_cache_zalloc_to_2_6_16.patch |   97 +++
 3 files changed, 291 insertions(+)

diff -Nurp 
ofa_1_2_c_kernel-20070804-0200_orig/kernel_patches/backport/2.6.16/ehca_kmem_cache_zalloc_to_2_6_16.patch
 
ofa_1_2_c_kernel-20070804-0200/kernel_patches/backport/2.6.16/ehca_kmem_cache_zalloc_to_2_6_16.patch
--- 
ofa_1_2_c_kernel-20070804-0200_orig/kernel_patches/backport/2.6.16/ehca_kmem_cache_zalloc_to_2_6_16.patch
   1970-01-01 01:00:00.0 +0100
+++ 
ofa_1_2_c_kernel-20070804-0200/kernel_patches/backport/2.6.16/ehca_kmem_cache_zalloc_to_2_6_16.patch
2007-08-06 00:53:59.0 +0200
@@ -0,0 +1,97 @@
+diff -Nurp 
ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_cq.c 
ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_cq.c
+--- ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_cq.c   
2007-08-04 11:00:05.0 +0200
 ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_cq.c
2007-08-06 00:41:50.0 +0200
+@@ -134,13 +134,14 @@ struct ib_cq *ehca_create_cq(struct ib_d
+   if (cqe = 0x - 64 - additional_cqe)
+   return ERR_PTR(-EINVAL);
+ 
+-  my_cq = kmem_cache_zalloc(cq_cache, GFP_KERNEL);
++  my_cq = kmem_cache_alloc(cq_cache, GFP_KERNEL);
+   if (!my_cq) {
+   ehca_err(device, Out of memory for ehca_cq struct device=%p,
+device);
+   return ERR_PTR(-ENOMEM);
+   }
+ 
++  memset(my_cq, 0, sizeof(*my_cq));
+   memset(param, 0, sizeof(struct ehca_alloc_cq_parms));
+ 
+   spin_lock_init(my_cq-spinlock);
+diff -Nurp ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_main.c ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_main.c
+--- ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_main.c	2007-08-04 11:00:05.0 +0200
++++ ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_main.c	2007-08-06 00:40:58.0 +0200
+@@ -113,9 +113,11 @@ static struct kmem_cache *ctblk_cache = 
+ 
+ void *ehca_alloc_fw_ctrlblock(gfp_t flags)
+ {
+-  void *ret = kmem_cache_zalloc(ctblk_cache, flags);
++  void *ret = kmem_cache_alloc(ctblk_cache, flags);
+   if (!ret)
+   ehca_gen_err("Out of memory for ctblk");
++  else
++  memset(ret, 0, EHCA_PAGESIZE);
+   return ret;
+ }
+ 
+diff -Nurp ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_mrmw.c ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_mrmw.c
+--- ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_mrmw.c	2007-08-04 11:00:05.0 +0200
++++ ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_mrmw.c	2007-08-06 00:39:30.0 +0200
+@@ -55,8 +55,9 @@ static struct ehca_mr *ehca_mr_new(void)
+ {
+   struct ehca_mr *me;
+ 
+-  me = kmem_cache_zalloc(mr_cache, GFP_KERNEL);
++  me = kmem_cache_alloc(mr_cache, GFP_KERNEL);
+   if (me) {
++  memset(me, 0, sizeof(*me));
+   spin_lock_init(&me->mrlock);
+   } else
+   ehca_gen_err("alloc failed");
+@@ -73,8 +74,9 @@ static struct ehca_mw *ehca_mw_new(void)
+ {
+   struct ehca_mw *me;
+ 
+-  me = kmem_cache_zalloc(mw_cache, GFP_KERNEL);
++  me = kmem_cache_alloc(mw_cache, GFP_KERNEL);
+   if (me) {
++  memset(me, 0, sizeof(*me));
+   spin_lock_init(&me->mwlock);
+   } else
+   ehca_gen_err("alloc failed");
+diff -Nurp 
ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_pd.c 
ofa_1_2_c_kernel-20070804-0200/drivers/infiniband/hw/ehca/ehca_pd.c
+--- ofa_1_2_c_kernel-20070804-0200_orig/drivers/infiniband/hw/ehca/ehca_pd.c   
2007-08-04 11:00:05.0 +0200
 ofa_1_2_c_kernel-20070804-0200/drivers

[ewg] Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.16/sles10/sles10_sp1

2007-08-07 Thread Michael S. Tsirkin

I'm happy with stuff as it is: the ifdefs make it easy to figure out
which version the backport applies to.

BTW, I think the same backport will be needed for older kernels as well, no?
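
For reference, the backport under discussion amounts to allocate-then-zero.
Below is a minimal userspace sketch of that idiom, not the kernel code
itself: malloc() stands in for kmem_cache_alloc(), and struct toy_cache and
all toy_* functions are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct kmem_cache: it only records the object size. */
struct toy_cache {
	size_t obj_size;
};

/* Stand-in for kmem_cache_alloc(): returns uninitialized memory. */
static void *toy_cache_alloc(const struct toy_cache *cache)
{
	return malloc(cache->obj_size);
}

/* Same shape as the backported helper: allocate first, then zero on success. */
static void *toy_cache_zalloc(const struct toy_cache *cache)
{
	void *ret = toy_cache_alloc(cache);

	if (ret)
		memset(ret, 0, cache->obj_size);
	return ret;
}

/* Returns 1 if every byte of a freshly zalloc'ed object is zero, else 0. */
static int toy_zalloc_is_zeroed(size_t size)
{
	struct toy_cache cache = { size };
	unsigned char *obj = toy_cache_zalloc(&cache);
	size_t i;
	int ok = 1;

	assert(obj != NULL);
	for (i = 0; i < size; i++)
		if (obj[i] != 0)
			ok = 0;
	free(obj);
	return ok;
}
```

In the real backport the object size comes from kmem_cache_size(cache)
rather than from a field the caller fills in; the sketch only mirrors the
control flow.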


Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
Subject: Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 
2.6.16/sles10/sles10_sp1

Hello Michael!
Below is the patch to backport kmem_cache_zalloc() for 2.6.16/sles10/sles10_sp1
as we've discussed previously. Thereby I realized current backport code
in slab.h looks weird to me (sort of copypaste mixture) - actually no build
error, only coding issue. 
Therefore this patch also includes some cleanup. If it's ok, please apply.
PS: The mentioned issue in backport slab.h exists also in other versions.
If you want me to fix them as well, let me know.
Regards
Nam


backport kmem_cache_zalloc() in slab.h to 2.6.16, 2.6.16_sles10 and 
2.6.16_sles10_sp1

Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED]
---

 2.6.16/include/linux/slab.h|   22 +++---
 2.6.16_sles10/include/linux/slab.h |   22 +++---
 2.6.16_sles10_sp1/include/linux/slab.h |   22 +++---
 3 files changed, 21 insertions(+), 45 deletions(-)

diff -Nurp ofa_1_2_c_kernel-20070804-0200_orig/kernel_addons/backport/2.6.16/include/linux/slab.h ofa_1_2_c_kernel-20070804-0200/kernel_addons/backport/2.6.16/include/linux/slab.h
--- ofa_1_2_c_kernel-20070804-0200_orig/kernel_addons/backport/2.6.16/include/linux/slab.h	2007-08-04 11:00:08.0 +0200
+++ ofa_1_2_c_kernel-20070804-0200/kernel_addons/backport/2.6.16/include/linux/slab.h	2007-08-06 18:29:17.0 +0200
@@ -1,10 +1,8 @@
-#include_next <linux/slab.h>
+#ifndef _LINUX_SLAB_BACKPORT_TO_2_6_16
+#define _LINUX_SLAB_BACKPORT_TO_2_6_16
 
 #include_next <linux/slab.h>
 
-#ifndef BACKPORT_LINUX_STRING_TO_2_6_18
-#define BACKPORT_LINUX_STRING_TO_2_6_18
-
 static inline
 void *kmemdup(const void *src, size_t len, gfp_t gfp)
 {
@@ -16,19 +14,13 @@ void *kmemdup(const void *src, size_t le
return p;
 }
 
-#endif
-#ifndef BACKPORT_LINUX_STRING_TO_2_6_18
-#define BACKPORT_LINUX_STRING_TO_2_6_18
-
 static inline
-void *kmemdup(const void *src, size_t len, gfp_t gfp)
+void *kmem_cache_zalloc(struct kmem_cache *cache, gfp_t flags)
 {
-   void *p;
-
-   p = kmalloc(len, gfp);
-   if (p)
-   memcpy(p, src, len);
-   return p;
+   void *ret = kmem_cache_alloc(cache, flags);
+   if (ret)
+   memset(ret, 0, kmem_cache_size(cache));
+   return ret;
 }
 
 #endif
diff -Nurp ofa_1_2_c_kernel-20070804-0200_orig/kernel_addons/backport/2.6.16_sles10/include/linux/slab.h ofa_1_2_c_kernel-20070804-0200/kernel_addons/backport/2.6.16_sles10/include/linux/slab.h
--- ofa_1_2_c_kernel-20070804-0200_orig/kernel_addons/backport/2.6.16_sles10/include/linux/slab.h	2007-08-04 11:00:08.0 +0200
+++ ofa_1_2_c_kernel-20070804-0200/kernel_addons/backport/2.6.16_sles10/include/linux/slab.h	2007-08-06 18:30:33.0 +0200
@@ -1,10 +1,8 @@
-#include_next <linux/slab.h>
+#ifndef _LINUX_SLAB_BACKPORT_TO_2_6_16
+#define _LINUX_SLAB_BACKPORT_TO_2_6_16
 
 #include_next <linux/slab.h>
 
-#ifndef BACKPORT_LINUX_STRING_TO_2_6_18
-#define BACKPORT_LINUX_STRING_TO_2_6_18
-
 static inline
 void *kmemdup(const void *src, size_t len, gfp_t gfp)
 {
@@ -16,19 +14,13 @@ void *kmemdup(const void *src, size_t le
return p;
 }
 
-#endif
-#ifndef BACKPORT_LINUX_STRING_TO_2_6_18
-#define BACKPORT_LINUX_STRING_TO_2_6_18
-
 static inline
-void *kmemdup(const void *src, size_t len, gfp_t gfp)
+void *kmem_cache_zalloc(struct kmem_cache *cache, gfp_t flags)
 {
-   void *p;
-
-   p = kmalloc(len, gfp);
-   if (p)
-   memcpy(p, src, len);
-   return p;
+   void *ret = kmem_cache_alloc(cache, flags);
+   if (ret)
+   memset(ret, 0, kmem_cache_size(cache));
+   return ret;
 }
 
 #endif
diff -Nurp ofa_1_2_c_kernel-20070804-0200_orig/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/slab.h ofa_1_2_c_kernel-20070804-0200/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/slab.h
--- ofa_1_2_c_kernel-20070804-0200_orig/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/slab.h	2007-08-04 11:00:08.0 +0200
+++ ofa_1_2_c_kernel-20070804-0200/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/slab.h	2007-08-06 18:30:40.0 +0200
@@ -1,10 +1,8 @@
-#include_next <linux/slab.h>
+#ifndef _LINUX_SLAB_BACKPORT_TO_2_6_16
+#define _LINUX_SLAB_BACKPORT_TO_2_6_16
 
 #include_next <linux/slab.h>
 
-#ifndef BACKPORT_LINUX_STRING_TO_2_6_18
-#define BACKPORT_LINUX_STRING_TO_2_6_18
-
 static inline
 void *kmemdup(const void *src, size_t len, gfp_t gfp)
 {
@@ -16,19 +14,13 @@ void *kmemdup(const void *src, size_t le
return p;
 }
 
-#endif
-#ifndef BACKPORT_LINUX_STRING_TO_2_6_18
-#define BACKPORT_LINUX_STRING_TO_2_6_18
-
 static inline
-void *kmemdup(const void *src, size_t len, gfp_t gfp)
+void

[ewg] Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.16/sles10/sles10_sp1

2007-08-07 Thread Michael S. Tsirkin

 Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
 Subject: Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 
 2.6.16/sles10/sles10_sp1

 On Tuesday 07 August 2007 15:23, Michael S. Tsirkin wrote:
   Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
   Subject: Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 
   2.6.16/sles10/sles10_sp1

   Hello Michael!
   Below is the patch to backport kmem_cache_zalloc() for 
   2.6.16/sles10/sles10_sp1
   as we've discussed previously. Thereby I realized current backport code
   in slab.h looks weird to me (sort of copypaste mixture) - actually no 
   build
   error, only coding issue. 
   Therefore this patch also includes some cleanup. If it's ok, please apply.
   PS: The mentioned issue in backport slab.h exists also in other versions.
   If you want me to fix them as well, let me know.
   Regards
   Nam

  Would not the following work? If yes, Vlad, I parked this at
 Wow, you're pretty quick. Yes, it should work. And you're right we need
 this patch for <=2.6.16.
 PS: The weird thing in slab.h I meant previously is that the
 ifdef-kmemdup()-block exists twice in the same file, which does not harm
 the build.
 Thanks!
 Nam

I haven't seen this. Right.
Can you please post a separate patch that just kills
the duplicates?
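
For context on why the duplicated block was harmless: both copies were
wrapped in the same #ifndef guard, so the preprocessor drops the second
copy. The sketch below reproduces that situation in a single userspace
file. TOY_BACKPORT_GUARD and the toy_* functions are hypothetical names,
and malloc() stands in for kmalloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* First copy of the guarded block: the guard is not yet defined, so it is compiled. */
#ifndef TOY_BACKPORT_GUARD
#define TOY_BACKPORT_GUARD

static inline void *toy_kmemdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

#endif

/*
 * Second, identical copy: the guard is already defined, so the
 * preprocessor skips everything up to #endif. Without the guard this
 * would be a redefinition error.
 */
#ifndef TOY_BACKPORT_GUARD
#define TOY_BACKPORT_GUARD

static inline void *toy_kmemdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

#endif

/* Returns 1 if the duplicate made by toy_kmemdup() matches the source bytes. */
static int toy_kmemdup_matches(const char *src, size_t len)
{
	char *copy = toy_kmemdup(src, len);
	int same;

	assert(copy != NULL);
	same = (memcmp(copy, src, len) == 0);
	free(copy);
	return same;
}
```

The build stays clean either way, which is why this is a cleanup rather
than a bug fix; removing the dead copy just makes the header easier to read.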

-- 
MST
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg

[ewg] Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 2.6.16/sles10/sles10_sp1

2007-08-07 Thread Michael S. Tsirkin

 Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
 Subject: Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 
 2.6.16/sles10/sles10_sp1

 On Tuesday 07 August 2007 15:23, Michael S. Tsirkin wrote:
   Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
   Subject: Re: [PATCH ofed-1.2.c] ehca: backport kmem_cache_zalloc() for 
   2.6.16/sles10/sles10_sp1

   Hello Michael!
   Below is the patch to backport kmem_cache_zalloc() for 
   2.6.16/sles10/sles10_sp1
   as we've discussed previously. Thereby I realized current backport code
   in slab.h looks weird to me (sort of copypaste mixture) - actually no 
   build
   error, only coding issue. 
   Therefore this patch also includes some cleanup. If it's ok, please apply.
   PS: The mentioned issue in backport slab.h exists also in other versions.
   If you want me to fix them as well, let me know.
   Regards
   Nam

  Would not the following work? If yes, Vlad, I parked this at
 Wow, you're pretty quick. Yes, it should work. And you're right we need
 this patch for <=2.6.16.
 PS: The weird thing in slab.h I meant previously is that the
 ifdef-kmemdup()-block exists twice in the same file, which does not harm
 the build.
 Thanks!
 Nam

OK, I've cleaned these up. Thanks for pointing this out.

-- 
MST

[ewg] Re: [PATCH] IB/ehca: fix bugs to support rhel 4.5 in OFED 1.2.c-11

2007-08-13 Thread Michael S. Tsirkin

 * ehca patches for 2.6.23-rcX were incorporated, which is not acceptable
   for us to support in 1.2.c. Upstream code of ehca in kernel contains
   major changes in order to support ehca2 with new features, which is
   targeted for ofed-1.3. We have not requested to have those new
   features for ofed-1.2.1/1.2.c/1.2.5.

The following command gives empty output, which demonstrates
that no changes on top of 2.6.22 were applied to ehca sources in ofed_1_2_c:

$ git log v2.6.22..ofed_1_2_c -- drivers/infiniband/hw/ehca/
$

 * In kernel_addons/backport/2.6.16 (including sles10/sles10_sp1) I don't
   see the backport of kmem_cache_zalloc() as we have discussed and agreed
   on last week.
   See http://lists.openfabrics.org/pipermail/ewg/2007-August/004186.html

 * Compiler error report from today's ofed_1_2_c daily build script - I
   consider 2.6.16 as an example:
 --
 Build failed on powerpc with linux-2.6.16
 Log:
 /home/vlad/tmp/ofa_1_2_c_kernel-20070813-0200_linux-2.6.16_powerpc_check/drivers/infiniband/hw/ehca/ehca_main.c:831: error: invalid type argument of '->'
 /home/vlad/tmp/ofa_1_2_c_kernel-20070813-0200_linux-2.6.16_powerpc_check/drivers/infiniband/hw/ehca/ehca_main.c:834: error: invalid type argument of '->'
 /home/vlad/tmp/ofa_1_2_c_kernel-20070813-0200_linux-2.6.16_powerpc_check/drivers/infiniband/hw/ehca/ehca_main.c:835: error: invalid type argument of '->'
 make[4]: *** [/home/vlad/tmp/ofa_1_2_c_kernel-20070813-0200_linux-2.6.16_powerpc_check/drivers/infiniband/hw/ehca/ehca_main.o] Error 1
 make[3]: *** [/home/vlad/tmp/ofa_1_2_c_kernel-20070813-0200_linux-2.6.16_powerpc_check/drivers/infiniband/hw/ehca] Error 2
 make[2]: *** [/home/vlad/tmp/ofa_1_2_c_kernel-20070813-0200_linux-2.6.16_powerpc_check/drivers/infiniband] Error 2
 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_c_kernel-20070813-0200_linux-2.6.16_powerpc_check] Error 2
 make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.16'
 make: *** [kernel] Error 2
 --
 
   I downloaded ofa_1_2_c_kernel-20070813-0200, ran
   configure --with-core-mod --with-ehca-mod --with-ipoib-mod
 --with-user_access-mod
   on our native ppc64 system and looked at ehca_main.c source code:
 
 int __init ehca_module_init(void)
 {
   ret = sysfs_create_group(&ehca_driver.driver.kobj,
                            &ehca_drv_attr_grp);
   if (ret) /* only complain; we can live without attributes */
 #831:   ehca_gen_err("Cannot create driver attributes  ret=%d", ret);
 
   if (ehca_poll_all_eqs != 1) {
 #834:   ehca_gen_err("WARNING!!!");
         ehca_gen_err("It is possible to lose interrupts.");
   } else {
         init_timer(&poll_eqs_timer);
         poll_eqs_timer.function = ehca_poll_eqs;
         poll_eqs_timer.expires = jiffies + HZ;
         add_timer(&poll_eqs_timer);
   }
 
   Thus, the line numbers do not match those reported. It looks like we
   have some config issues on the ofa build server. I'll take time tomorrow
   to look there. Please advise us how to reproduce these errors.
   Vlad, does your build script detect and report patch rejects? That
   would help to catch such an error sooner.
 
   Needless to say I could build ofed without errors on our ppc64 systems.

My guess from all of the above is that something's wrong with the tarball.
Can you please get code from git and work from there?

-- 
MST

[ewg] Re: ofa_1_2_c_kernel 20070802-0201 daily build status

2007-08-13 Thread Michael S. Tsirkin

 Quoting Doug Ledford [EMAIL PROTECTED]:
 Subject: Re: ofa_1_2_c_kernel 20070802-0201 daily build status

 On Sat, 2007-08-11 at 21:13 +0300, Michael S. Tsirkin wrote:
   Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
   Subject: Re: ofa_1_2_c_kernel 20070802-0201 daily build status

   Hello Doug and Scott!

   On Thursday 02 August 2007 18:08, Michael S. Tsirkin wrote:
ehca backports for kernel.org kernels seem to be broken.
1. Does anyone care enough to fix them? If not we'll disable
   ehca in build for these kernels.

2. Could you upload kernels for RHEL4U5 and SLES10 ppc64?

 Don't you guys already have RHEL4U5?  It had a backports directory in
 the OFED 1.2 release...and it's been out for quite a while...

Our cross build environment has the headers from the x86_64 version
but not the ppc version.

-- 
MST

[ewg] Re: OFED Aug 13 meeting summary

2007-08-13 Thread Michael S. Tsirkin

 1. OFED 1.2.5 (was 1.2.c) is ready for release:
 An issue with ehca: There are patches from kernel 2.6.23 that were 
 inserted by mistake and must be removed before the release

There aren't, really. The snapshot generating scripts seem
to be broken and seem to put code from ofed_kernel branch
under the 1.2.c name.


-- 
MST

[ewg] Re: [ofa-general] Re: ofa_1_2_c_kernel 20070802-0201 daily build status

2007-08-14 Thread Michael S. Tsirkin

 Quoting Doug Ledford [EMAIL PROTECTED]:
 Subject: Re: [ofa-general] Re: ofa_1_2_c_kernel 20070802-0201 daily build 
 status

 On Tue, 2007-08-14 at 09:59 +0200, Hoang-Nam Nguyen wrote:
  Hi Doug!
   On Sat, 2007-08-11 at 21:13 +0300, Michael S. Tsirkin wrote:
 Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
 Subject: Re: ofa_1_2_c_kernel 20070802-0201 daily build status

 Hello Doug and Scott!

 On Thursday 02 August 2007 18:08, Michael S. Tsirkin wrote:
  ehca backports for kernel.org kernels seem to be broken.
  1. Does anyone care enough to fix them? If not we'll disable
 ehca in build for these kernels.

  2. Could you upload kernels for RHEL4U5 and SLES10 ppc64?

   Don't you guys already have RHEL4U5?  It had a backports directory in
   the OFED 1.2 release...and it's been out for quite a while...
  Some part of this thread might confuse. And really, it's not about
  any specific backport issue from ehca or other component(s). It's a
  general prereq for ofed's daily build to have rhel4.5 resp sles10 ppc64
  in their daily build runs too.
  Thanks
  Nam

 All of the kernel rpms from our U5 kernel have been on my web page in my
 sig for *ages*.  All you need to do is download the needed rpms and
 install.

I think there's no way to unpack these without a ppc machine, though.
Is that right?

-- 
MST

Re: [ewg] Re: OFED Aug 13 meeting summary

2007-08-14 Thread Michael S. Tsirkin

 Quoting Stefan Roscher [EMAIL PROTECTED]:
 Subject: Re: [ewg] Re: OFED Aug 13 meeting summary
 
 On Tuesday 14 August 2007 14:06, Tziporet Koren wrote:
  Michael S. Tsirkin wrote:
   1. OFED 1.2.5 (was 1.2.c) is ready for release:
   An issue with ehca: There are patches from kernel 2.6.23 that were 
   inserted by mistake and must be removed before the release
   
  
   There aren't, really. The snapshot generating scripts seem
   to be broken and seem to put code from ofed_kernel branch
   under the 1.2.c name.
 
  Good - so we will release 1.2.5 today
 
 Hi Tziporet,
 
 can we ensure that this patch
 http://lists.openfabrics.org/pipermail/ewg/2007-August/004299.html is applied?
 Without this patch we have a broken ehca build on rhel-4.5.
 
 regards Stefan
 
I pushed a fix for this out in my tree, so Vlad will take it in 1.2.c branch.

-- 
MST

[ewg] Re: ib_local_sa.ko is not created

2007-08-23 Thread Michael S. Tsirkin

IIRC this tree includes new local sa bits from Sean which
are integrated as part of the sa module.

Quoting Erez Zilber [EMAIL PROTECTED]:
Subject: ib_local_sa.ko is not created

Vlad,


I'm trying to build & run ofa_kernel from
git://git.openfabrics.org/~vlad/ofed_kernel.git ofed_kernel


I'm running the following configure cmd:


./configure --with-core-mod --with-ipoib-mod --with-mthca-mod
--with-mlx4-mod --with-iser-mod


However, ib_local_sa.ko is not created after I run `make`:


[EMAIL PROTECTED] ofed_kernel]# ll drivers/infiniband/core/*.ko
-rw-r--r-- 1 root root 467132 Aug 23 2007 drivers/infiniband/core/ib_cm.ko
-rw-r--r-- 1 root root 1239100 Aug 23 2007
drivers/infiniband/core/ib_core.ko
-rw-r--r-- 1 root root 761570 Aug 23 2007 drivers/infiniband/core/ib_mad.ko
-rw-r--r-- 1 root root 744899 Aug 23 2007 drivers/infiniband/core/ib_sa.ko
-rw-r--r-- 1 root root 232824 Aug 23 2007 drivers/infiniband/core/iw_cm.ko


Did I miss something?


Thanks,

-- 



Erez Zilber | 972-9-971-7689

Software Engineer, Storage Solutions Team

Voltaire – _The Grid Backbone_

__

www.voltaire.com http://www.voltaire.com/




-- 
MST

[ewg] Re: ib_local_sa.ko is not created

2007-08-26 Thread Michael S. Tsirkin

 It disagrees about the symbol version because my machine still has the
 original ib_local_sa module that comes with RH4 up4. How can we solve
 this problem?

Reboot the machine.

-- 
MST

[ewg] Re: [PATCH] stop OFED before uninstalling it

2007-08-27 Thread Michael S. Tsirkin

 Quoting Erez Zilber [EMAIL PROTECTED]:
 Subject: Re: [PATCH] stop OFED before uninstalling it
 
 Tziporet Koren wrote:
 
  Erez Zilber wrote:
  stop OFED before uninstalling it
 
  Signed-off-by: Erez Zilber [EMAIL PROTECTED]
  ---
   uninstall.sh |5 +
   1 files changed, 5 insertions(+), 0 deletions(-)
 
  diff --git a/uninstall.sh b/uninstall.sh
  index 177b8a1..89ee3f1 100755
  --- a/uninstall.sh
  +++ b/uninstall.sh
  @@ -110,6 +110,11 @@ uninstall()
   {
   local RC=0
   local OLD_PREFIX=
  +
  +echo Stopping OFED stack
  +echo
  +/etc/init.d/openibd stop
  +
   echo
   echo Removing ${PACKAGE} Software installations
   echo

 
  What would the install do if this fails or the machine hangs?
 
  Tziporet
 
 
 The user will have to stop OFED at some point. If we don't stop OFED
 while uninstalling, he will stop it later (and then the machine may hang).
 
 The motivation for this patch is: if the user installs OFED over an
 older version (while the old version is running), he will eventually
 have a new version of OFED installed with an old loaded version. This
 may lead to strange scenarios. For example: if the user tries to load
 iSER, modprobe will fail because iSER (from the new OFED version) cannot
 use the loaded OFED modules (from the old version). Of course, this can
 happen with any OFED module.
 
 Actually, this fix is related to bug #536. Maybe we should move this
 discussion to bugzilla.

NAK.
This would break e.g. systems which rely on ipoib for connectivity.

iSER failing with clear version conflict message seems like a minor problem.
How about just documenting this?  How about producing a message telling the user
to reboot?

-- 
MST

[ewg] Re: RFC: OFED-1.3-20070823-1130 - first build

2007-08-27 Thread Michael S. Tsirkin

 Quoting Yosef Etigin [EMAIL PROTECTED]:
 Subject: Re: RFC: OFED-1.3-20070823-1130 - first build
 
 Hi Vlad,
 
 I have some comments regarding install.pl.
 Overall, I think it's too long for a perl script.

So ... what's your point?

 1. The first ~1K lines are a database of the existing packages.
It has some unnecessary initializations:
  selected => 0, installed => 0, rpm_exist => 0, rpm_exist32 => 0

I agree here.

 In my opinion, this database could exist as an external XML file,
which is rather easy to parse with perl.

It's hard to see what inventing yet another format would buy us.
Let's keep it simple.

 2. How about doing a ? b : c instead of if (a) { b } else { c } ? 

Looks like a matter of style.

 3. There are some copy-and-paste blocks.. for example, in select_packages():
 
 instead of:
   if ($package eq "mvapich2_conf_impl") {
   $mvapich2_conf_impl = $selected;
   next;
   }
   elsif ...
 write:
   if ($package =~ /^mvapich2_conf_/) {
   $$package = $selected;
   next;
   }
 
same for the stuff in set_compilers()
 
 4. Instead of print RED ..., RESET \n; exit 1, you could do smth like 
 error()
since redirecting this to files causes some mess
 
 5. instead of iterating over arrays and checking conditions you
could use grep, map, and such.

Could. But shouldn't.
Simple loops are much easier to understand.

-- 
MST

Re: [ewg] ib_local_sa.ko is not created

2007-08-30 Thread Michael S. Tsirkin

 Quoting Vladimir Sokolovsky [EMAIL PROTECTED]:
 Subject: Re: [ewg] ib_local_sa.ko is not created
 
 Erez Zilber wrote:
  Vlad,
  
  
  I'm trying to build & run ofa_kernel from
  git://git.openfabrics.org/~vlad/ofed_kernel.git ofed_kernel
  
  
  I'm running the following configure cmd:
  
  
  ./configure --with-core-mod --with-ipoib-mod --with-mthca-mod
  --with-mlx4-mod --with-iser-mod
  
  
  However, ib_local_sa.ko is not created after I run `make`:
  
  
  [EMAIL PROTECTED] ofed_kernel]# ll drivers/infiniband/core/*.ko
  -rw-r--r-- 1 root root 467132 Aug 23 2007 drivers/infiniband/core/ib_cm.ko
  -rw-r--r-- 1 root root 1239100 Aug 23 2007
  drivers/infiniband/core/ib_core.ko
  -rw-r--r-- 1 root root 761570 Aug 23 2007 drivers/infiniband/core/ib_mad.ko
  -rw-r--r-- 1 root root 744899 Aug 23 2007 drivers/infiniband/core/ib_sa.ko
  -rw-r--r-- 1 root root 232824 Aug 23 2007 drivers/infiniband/core/iw_cm.ko
  
  
  Did I miss something?
  
  
  Thanks,
  
 
 Hi Erez,
 You are right, it is not in ofed_kernel yet.

1.2.5 and 1.3 include sean_local_sa_*.patch
patches which implement local sa caching. It just isn't
put in a separate module the way it was in 1.2.

 See the following commit:
 
 commit b054b6c133aa89907ee93e5d105c0d44774e9e6a
 Author: Michael S. Tsirkin [EMAIL PROTECTED]
 Date:   Tue May 29 16:07:56 2007 +0300
 
  Update fixes for kernel 2.6.21-rc3: remove applied patches,
  update patches dma_map_sg.patch and zap_ipoib_5_cm_drain_by_send_wr.patch
  Patch merged_sean_rdma_dev_ofed_1_2.patch is out for now - it mixes
  multiple topics, most of them merged already, except local sa.
  Remember to generate and add in local sa patch at some later point.
 
  Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]
 
 Regards,
 Vladimir

This commit is way old - the patches have since been updated
by Sean and made their way into the 1.2.5 release and the 1.3 tree.

-- 
MST

[ewg] Re: [ofa-general] OFED 1.2.5 - GA release

2007-09-07 Thread Michael S. Tsirkin

 Quoting Arlin Davis [EMAIL PROTECTED]:
 Subject: Re: [ofa-general] OFED 1.2.5 - GA release
 
 
 
 How can I build/install OFED 1.2.5 with ib_local_sa.ko? It seems to 
 build but does not install and I need SA caching options.
 
 
 Can anyone tell me how to get ib_local_sa.ko installed with OFED 1.2.5? 
 We cannot move to OFED 1.2.5 without SA caching options.

ib_local_sa was merged with ib_sa in 1.2.5.
There are no extra modules to load.

-- 
MST

[ewg] Re: [PATCH 0/2] IB/iser: iSCSI iSER fixes for RH4 in OFED 1.3

2007-09-11 Thread Michael S. Tsirkin

 Quoting Erez Zilber [EMAIL PROTECTED]:
 Subject: RE: [PATCH 0/2] IB/iser: iSCSI  iSER fixes for RH4 in OFED 1.3

  Quoting Erez Zilber [EMAIL PROTECTED]:
  Subject: [PATCH 0/2] IB/iser: iSCSI  iSER fixes for RH4 in OFED 1.3

  The following patches fix bugs in open-iscsi over iSER for the RH4 
  backport in OFED 1.3.

  can you pls stick this in a git tree so I can pull?

 No problem. I thought that you can take the patch from the e-mail. Anyway, 
 the git tree is here:

 git://git.openfabrics.org/~erezz/linux-2.6.git ofed_kernel

I can, it's just much more work, and with Vlad not
here I might not find the time. git pull takes several seconds.

-- 
MST

[ewg] Re: [PATCH 0/2] IB/iser: iSCSI iSER fixes for RH4 in OFED 1.3

2007-09-11 Thread Michael S. Tsirkin

 Quoting Erez Zilber [EMAIL PROTECTED]:
 Subject: RE: [PATCH 0/2] IB/iser: iSCSI  iSER fixes for RH4 in OFED 1.3

  Quoting Erez Zilber [EMAIL PROTECTED]:
  Subject: [PATCH 0/2] IB/iser: iSCSI  iSER fixes for RH4 in OFED 1.3

  The following patches fix bugs in open-iscsi over iSER for the RH4 
  backport in OFED 1.3.

  can you pls stick this in a git tree so I can pull?

 No problem. I thought that you can take the patch from the e-mail. Anyway, 
 the git tree is here:

 git://git.openfabrics.org/~erezz/linux-2.6.git ofed_kernel

What about
kernel_patches/backport/2.6.9_U5/iser_cmd_to_2_6_22.patch

given the name, isn't it needed in other kernels up to 2.6.22 too?

-- 
MST

[ewg] Re: [GIT PULL ofed_1_2_c] cxgb3 bug fixes

2007-09-16 Thread Michael S. Tsirkin

Done. I'll push soon.

Quoting Steve Wise [EMAIL PROTECTED]:
Subject: [GIT PULL ofed_1_2_c] cxgb3 bug fixes

Vlad (Michael/Tziporet in Vlad's absence),

Please integrate the following cxgb3 bug fixes into ofed-1.2.5.  All of 
these patches are either in 2.6.23 or merged into Jeff Garzik's upstream 
branch of netdev-2.6 and will go into 2.6.24.  Chelsio recommends we 
update ofed-1.2.5 and ofed-1.3 with all of these fixes.

I'll send another email with the ofed-1.3 changes as they will be 
slightly different.

Please pull the ofed_1_2_c changes from:

git://git.openfabrics.org/~swise/ofed_1_2 ofed_1_2_c

The patch files added to kernel_patches/fixes include:

[EMAIL PROTECTED]:~/git/ofed-1.2.5 stg series
+ 0029-cxgb3-engine-microcode-load
+ 0030-cxgb3-MAC-workaround-update
+ 0031-cxgb3-Update-rx-coalescing-length
+ 0032-cxgb3-SGE-doorbell-overflow-warning
+ 0033-cxgb3-use-immediate-data-for-offload-Tx
+ 0034-cxgb3-Expose-HW-memory-page-info
+ 0035-cxgb3-tighten-checks-on-TID-values
+ 0036-cxgb3-Fatal-error-update
+ 0037-cxgb3-log-adapter-serial-number
+ 0038-cxgb3-Update-internal-memory-management
+ 0039-cxgb3-update-firmware-version
+ 0040-cxgb3-log-and-clear-PEX-errors
+ 0041-cxgb3-remove-false-positive-in-xgmac-workaround
+ 0042-cxgb3-Set-the-CQ_ERR-bit-in-CQ-contexts
+ 0043-cxgb3-CQ-context-operations-time-out-too-soon
+ 0044-cxgb3-Add-T3C-rev
+ 0045-cxgb3-Update-engine-microcode-version
> 0046-cxgb3-driver-version

Steve.

-- 
MST

[ewg] Re: RFC: modify upstream code to make backporting easier

2007-09-16 Thread Michael S. Tsirkin

 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: RFC: modify upstream code to make backporting easier
 
   I wonder whether it's acceptable in cases such as this to add
   a wrapper in upstream code. For example, upstream could have:
   
   #ifndef pci_get_revision
   #define pci_get_revision(dev) ((dev)->revision)
   #endif
 
 My feeling is that this type of wrapper is just obfuscation that makes
 the driver harder to read and maintain.

Note that some people only run
backported drivers, so making it easier to read and maintain
*the backport* is also important.

 If there's a way to make
 backporting easier that also makes the upstream driver better, then
 I'm in favor of it, but this sounds like a bad example to me.

Do you think applying a patch as we do now is the best way to do it then?
Or do you have other ideas on how to make backporting this example better?
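
To make the wrapper idea concrete, here is a small userspace sketch of the
#ifndef pattern quoted above. struct toy_pci_dev and the toy_* names are
hypothetical stand-ins, not kernel API; the only point is that a backport
header may define the accessor first, in which case the fallback is skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device structure with a cached revision field. */
struct toy_pci_dev {
	unsigned char revision;
};

/*
 * A backport header for an older kernel could define toy_get_revision()
 * before this point (for example, to read the value some other way).
 * If nothing did, fall back to the direct field access.
 */
#ifndef toy_get_revision
#define toy_get_revision(dev) ((dev)->revision)
#endif

/* Driver code only ever uses the accessor, never the field directly. */
static unsigned char toy_read_revision(const struct toy_pci_dev *dev)
{
	assert(dev != NULL);
	return toy_get_revision(dev);
}
```

The trade-off raised in the thread still applies: the driver body stays
identical across kernels, at the price of one more level of indirection
for readers of the upstream code.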

-- 
MST

[ewg] Re: building userspace on ppc64 is broken

2007-09-16 Thread Michael S. Tsirkin

 Quoting Yosef Etigin [EMAIL PROTECTED]:
 Subject: building userspace on ppc64 is broken
 
 While building user-space binaries on ppc64, the libs are placed
 in /usr/lib64, but they are built as 32 bit. This happens because
 in ofed 1.2 CFLAGS=-m64 was passed by the environment from the
 install script. What do you think about doing something like this
 in the spec files to solve the problem?
 
 --
 diff --git a/libibverbs.spec.in b/libibverbs.spec.in
 index 459e6f2..8fcdd72 100644
 --- a/libibverbs.spec.in
 +++ b/libibverbs.spec.in
 @@ -47,6 +47,9 @@ displays information about InfiniBand de
  %setup -q -n [EMAIL PROTECTED]@
  
  %build
 +%ifarch ppc64
 +%{expand: %%define optflags %{optflags} -m64}
 +%endif
  %configure
  make %{?_smp_mflags}

Hmm. Roland?
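
As a side note, a quick way to see which ABI a given set of flags actually
produced is to check the pointer width from C. This is only a sketch for
checking the symptom described above; toy_pointer_bits() is a made-up
helper, and it assumes the toolchain can build and run both 32-bit and
64-bit binaries.

```c
#include <assert.h>
#include <limits.h>

/* Width in bits of a pointer for the ABI this file was compiled for. */
static int toy_pointer_bits(void)
{
	int bits = (int)(sizeof(void *) * CHAR_BIT);

	assert(bits > 0);
	return bits;
}
```

Built with -m64 this is expected to report 64; built with the 32-bit
default described in the report it is expected to report 32.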

-- 
MST

[ewg] Re: ofed-1.3 daily build package's content

2007-09-18 Thread Michael S. Tsirkin

 Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
 Subject: ofed-1.3 daily build package's content
 
 Hello Vlad and Michael!
 Just downloaded daily build package OFED-1.3-20070917-0600 and saw
 in SRPMS:
 localhost:/home/nguyen/tmp/OFED-1.3-20070917-0600/SRPMS # ls -l 
 ofa_kernel-1.3-ofed2007091*
 -rw-r--r-- 1 1011 1011 1967453 2007-09-10 15:27 
 ofa_kernel-1.3-ofed20070910.src.rpm
 -rw-r--r-- 1 1011 1011 1960701 2007-09-11 15:02 
 ofa_kernel-1.3-ofed20070911.src.rpm
 -rw-r--r-- 1 1011 1011 1966672 2007-09-12 15:02 
 ofa_kernel-1.3-ofed20070912.src.rpm
 -rw-r--r-- 1 1011 1011 1957624 2007-09-13 15:02 
 ofa_kernel-1.3-ofed20070913.src.rpm
 -rw-r--r-- 1 1011 1011 1963469 2007-09-14 15:02 
 ofa_kernel-1.3-ofed20070914.src.rpm
 -rw-r--r-- 1 1011 1011 1965865 2007-09-15 15:02 
 ofa_kernel-1.3-ofed20070915.src.rpm
 -rw-r--r-- 1 1011 1011 1963044 2007-09-16 15:01 
 ofa_kernel-1.3-ofed20070916.src.rpm
 -rw-r--r-- 1 1011 1011 1959261 2007-09-17 15:01 
 ofa_kernel-1.3-ofed20070917.src.rpm

I see this too:
tar tvzf OFED-1.3-20070917-0600.tgz | grep kernel
-rw-r--r-- vlad/vlad   1967453 2007-09-10 16:27:48 OFED-1.3-20070917-0600/SRPMS/ofa_kernel-1.3-ofed20070910.src.rpm
-rw-r--r-- vlad/vlad   1960701 2007-09-11 16:02:55 OFED-1.3-20070917-0600/SRPMS/ofa_kernel-1.3-ofed20070911.src.rpm
-rw-r--r-- vlad/vlad   1966672 2007-09-12 16:02:32 OFED-1.3-20070917-0600/SRPMS/ofa_kernel-1.3-ofed20070912.src.rpm
-rw-r--r-- vlad/vlad   1957624 2007-09-13 16:02:46 OFED-1.3-20070917-0600/SRPMS/ofa_kernel-1.3-ofed20070913.src.rpm
-rw-r--r-- vlad/vlad   1963469 2007-09-14 16:02:30 OFED-1.3-20070917-0600/SRPMS/ofa_kernel-1.3-ofed20070914.src.rpm
-rw-r--r-- vlad/vlad   1965865 2007-09-15 16:02:32 OFED-1.3-20070917-0600/SRPMS/ofa_kernel-1.3-ofed20070915.src.rpm
-rw-r--r-- vlad/vlad   1963044 2007-09-16 15:01:56 OFED-1.3-20070917-0600/SRPMS/ofa_kernel-1.3-ofed20070916.src.rpm
-rw-r--r-- vlad/vlad   1959261 2007-09-17 15:01:58 OFED-1.3-20070917-0600/SRPMS/ofa_kernel-1.3-ofed20070917.src.rpm

 Is there a reason to include earlier versions of ofa_kernel-1.3? Are they
 needed by the build script?

I don't think so.

-- 
MST
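
One way to spot such leftovers in an unpacked daily build is to list every
ofa_kernel source RPM except the newest one. The following is only a minimal
sketch: it assumes the SRPMS file naming shown above and that the YYYYMMDD
suffix sorts chronologically. The function name stale_srpms is hypothetical
and is not part of the OFED build scripts.

```shell
# stale_srpms DIR: print every ofa_kernel source RPM in DIR except the newest.
# Assumption: names differ only in their ofedYYYYMMDD suffix, so a plain
# byte-wise sort orders them oldest first.
stale_srpms() {
    # Reuse the function's own positional parameters to hold the glob result.
    set -- "$1"/ofa_kernel-*.src.rpm
    # If nothing matched, the pattern is left unexpanded and does not exist.
    [ -e "$1" ] || return 0
    # Sort in the C locale, then drop the last (newest) entry.
    printf '%s\n' "$@" | LC_ALL=C sort | sed '$d'
}
```

For example, `stale_srpms OFED-1.3-20070917-0600/SRPMS` would list the
packages dated before 20070917 for the listing above.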

[ewg] ANNOUNCE orenk taking over mstflint/imgen

2007-09-18 Thread Michael S. Tsirkin

Oren Kladnitsky [EMAIL PROTECTED] is taking over
maintenance of the mstflint and imgen tools from me.
His trees:

git://git.openfabrics.org/~orenk/mstflint.git
git://git.openfabrics.org/~orenk/imgen.git

are, starting now, the authoritative source for these tools.

Oren is the internal maintainer of the Mellanox FW tools (MFT), and he is
now assuming ownership of the OFED tools too.

Thanks,

-- 
MST

[ewg] Re: [PATCH] installer: fix build environment for ppc64

2007-09-18 Thread Michael S. Tsirkin

Will it break the build of 32-bit libraries on ppc64?

Quoting Yosef Etigin [EMAIL PROTECTED]:
Subject: [PATCH] installer: fix build environment for ppc64

On ppc64, binaries are compiled as 32-bit by default unless the -m64
flag is specified. When libs are built for ppc64 they are placed in
/usr/lib64, even though they are actually 32-bit.
This patch forces 64-bit compilation on ppc64.

Signed-off-by: Yosef Etigin [EMAIL PROTECTED]
--

diff --git a/install.pl b/install.pl
index 7965cf4..5ce2345 100755
--- a/install.pl
+++ b/install.pl
@@ -169,6 +169,8 @@ my $mandir  = `rpm --eval '%{_mandir
 chomp $mandir;
 my $sysconfdir  = `rpm --eval '%{_sysconfdir}'`;
 chomp $sysconfdir;
+chomp (my $optflags = `rpm --eval '%{optflags}'`);
+
 my %main_packages = ();
 my @selected_packages = ();
 my @selected_by_user = ();
@@ -2270,7 +2272,7 @@ # Build RPM from source RPM
 sub build_rpm
 {
 my $name = shift @_;
-my $cmd;
+my $cmd = "";
 my $res = 0;
 my $sig = 0;
 my $TMPRPMS;
@@ -2279,7 +2281,10 @@ sub build_rpm
 print "Build $name RPM\n" if ($verbose);
 
 if (not $packages_info{$name}{'rpm_exist'}) {
-$cmd = "rpmbuild --rebuild --define '_topdir $TOPDIR'";
+if ($arch eq "ppc64") {
+$cmd = "CFLAGS='$optflags -m64' CXXFLAGS='$optflags -m64' FFLAGS='$optflags -m64' ";
+}
+$cmd .= "rpmbuild --rebuild --define '_topdir $TOPDIR'";
 $cmd .= " --target $target_cpu";
 
 if ( $parent eq "mvapich") {

-- 
Yossi

-- 
MST
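
The symptom described in the patch (32-bit libraries landing in /usr/lib64)
can be checked by looking at the ELF class byte of a built library. This is
only a minimal sketch, assuming a POSIX od; elf_class is a hypothetical helper
name and is not part of install.pl.

```shell
# elf_class FILE: print 32 or 64 according to the EI_CLASS byte of an ELF file.
elf_class() {
    # Bytes 0-3 of an ELF file are the magic 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || { echo "not-elf"; return 1; }
    # Byte 4 is EI_CLASS: 1 means ELFCLASS32, 2 means ELFCLASS64.
    case $(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n') in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown; return 1 ;;
    esac
}
```

Running it on a library under /usr/lib64 (for instance a libibverbs shared
object; the exact path depends on the installation) should print 64 once the
build really passes -m64.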

Re: [ewg] Re: [PATCH 0/2] IB/iser: iSCSI iSER fixes for RH4 in OFED 1.3

2007-09-19 Thread Michael S. Tsirkin

 Quoting Erez Zilber [EMAIL PROTECTED]:
 Subject: Re: [ewg] Re: [PATCH 0/2] IB/iser: iSCSI iSER fixes for RH4 in OFED 1.3
 
 Erez Zilber wrote:

   What about
   kernel_patches/backport/2.6.9_U5/iser_cmd_to_2_6_22.patch
   given the name, isn't it needed in other kernels up to 2.6.22 too?

  You're right. I've just fixed that in the git tree. I hope it's ok now.

  Erez

 Michael?

Looks ok.

-- 
MST

[ewg] Re: [GIT PULL] ofed-1.2.5 / ofed-1.3 - new libcxgb3 release v1.0.2

2007-09-24 Thread Michael S. Tsirkin

 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: [GIT PULL] ofed-1.2.5 / ofed-1.3 - new libcxgb3 release v1.0.2
 
 Please pull the latest from my libcxgb3 git repos to update the 
 ofed-1.2.5 and ofed-1.3 libcxgb3 release.  This will update to version 
 1.0.2 of libcxgb3 which fixes a doorbell issue on big-endian platforms.
 
 git://git.openfabrics.org/~swise/libcxgb3 ofed_1_2_5

This looks wrong. 1.2.X releases are done from ofed_1_2 branch.
1.2.5 is just a tag. What do you want me to do?

 and
 
 git://git.openfabrics.org/~swise/libcxgb3 ofed_1_3

OK for that one.


-- 
MST

[ewg] Re: [GIT PULL] ofed-1.2.5 / ofed-1.3 - new libcxgb3 release v1.0.2

2007-09-24 Thread Michael S. Tsirkin

 Quoting Michael S. Tsirkin [EMAIL PROTECTED]:
 Subject: Re: [GIT PULL] ofed-1.2.5 / ofed-1.3 - new libcxgb3 release v1.0.2

  Quoting Steve Wise [EMAIL PROTECTED]:
  Subject: [GIT PULL] ofed-1.2.5 / ofed-1.3 - new libcxgb3 release v1.0.2

  Please pull the latest from my libcxgb3 git repos to update the 
  ofed-1.2.5 and ofed-1.3 libcxgb3 release.  This will update to version 
  1.0.2 of libcxgb3 which fixes a doorbell issue on big-endian platforms.

  git://git.openfabrics.org/~swise/libcxgb3 ofed_1_2_5

 This looks wrong. 1.2.X releases are done from ofed_1_2 branch.
 1.2.5 is just a tag. What do you want me to do?

I figured it out. Done.

-- 
MST

[ewg] Re: [GIT PULL] ofed-1.2.5 / ofed-1.3 - new libcxgb3 release v1.0.2

2007-09-24 Thread Michael S. Tsirkin

 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: Re: [GIT PULL] ofed-1.2.5 / ofed-1.3 - new libcxgb3 release v1.0.2

 Michael S. Tsirkin wrote:
 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: [GIT PULL] ofed-1.2.5 / ofed-1.3 - new libcxgb3 release v1.0.2

 Please pull the latest from my libcxgb3 git repos to update the 
 ofed-1.2.5 and ofed-1.3 libcxgb3 release.  This will update to version 
 1.0.2 of libcxgb3 which fixes a doorbell issue on big-endian platforms.

 git://git.openfabrics.org/~swise/libcxgb3 ofed_1_2_5

 Go look at

 http://www.openfabrics.org/git/?p=ofed_1_2_5/libcxgb3.git;a=summary

 It has an ofed_1_2_5 branch.  I believe Vlad set up the build scripts to 
 handle this.

 Yes?

 This looks wrong. 1.2.X releases are done from ofed_1_2 branch.
 1.2.5 is just a tag. What do you want me to do?

 and

 git://git.openfabrics.org/~swise/libcxgb3 ofed_1_3

 OK for that one.

It's OK, done for both.
-- 
MST
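
Whether a name such as ofed_1_2_5 is a branch or a tag in a remote repository
can be read off the output of git ls-remote, which prints one
"<sha><TAB><refname>" line per ref. The following is a minimal sketch that
classifies a name from that output; ref_kind is a hypothetical helper, and it
only looks at exact refs/heads/ and refs/tags/ matches.

```shell
# ref_kind NAME: read `git ls-remote` style lines on stdin and report whether
# NAME exists as a branch, a tag, both ("branch tag"), or neither ("none").
ref_kind() {
    awk -v name="$1" '
        $2 == ("refs/heads/" name) { kind = kind " branch" }
        $2 == ("refs/tags/" name)  { kind = kind " tag" }
        END {
            sub(/^ /, "", kind)
            print (kind == "" ? "none" : kind)
        }
    '
}
```

Typical use would be along the lines of
`git ls-remote git://git.openfabrics.org/~swise/libcxgb3 | ref_kind ofed_1_2_5`.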

[ewg] Re: Please pull libehca.git/libehca ofed_1_3 branch

2007-09-25 Thread Michael S. Tsirkin

 Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
 Subject: Please pull libehca.git/libehca ofed_1_3 branch
 
 Hi Michael and Vlad!
 Please pull from git://git.openfabrics.org/~hnguyen/libehca.git
 branch ofed_1_3 to get the fixes below.

done

-- 
MST

[ewg] Re: [for OFED 1.3 PATCH 2/2] IB/ipoib: enable IGMP for userspace multicast IB apps

2007-09-25 Thread Michael S. Tsirkin

 Quoting Or Gerlitz [EMAIL PROTECTED]:
 Subject: [for OFED 1.3 PATCH 2/2] IB/ipoib: enable IGMP for userspace
 multicast IB apps
 
 Michael,
 
 This patch needs to go to all the directories under kernel_patches/backport
 that contain ipoib_class_device_to_2_6_20.patch. I suggest it be named
 ipoib_class_device_to_2_6_20_umcast.patch

Or,
please create a public git tree that I or Vlad can pull from.
Please remember to run a cross build before requesting a pull.
-- 
MST

[ewg] off list for a while, email address change

2007-09-30 Thread Michael S. Tsirkin

Please note that my email address is changing.
You can contact me at my new address

m dot s dot tsirkin at gmail dot com

(address mangled to confuse spambots, replace dot with . and at with @ to
 get the actual mail address)

Near term, I might not have time for openfabrics related issues,
and might not monitor openfabrics lists.
Please copy me directly if my attention is required.

Here is a list of people at Mellanox you might want to contact:

Oren Kladnitsky [EMAIL PROTECTED] - for firmware, imgen and mstflint
Eli Cohen [EMAIL PROTECTED] - for IPoIB, mlx4 and mthca
Jim Mott [EMAIL PROTECTED] - for SDP
Jack Morgenstein [EMAIL PROTECTED] - for core, mlx4, mthca, libmlx4, libmthca
Vlad Sokolovsky [EMAIL PROTECTED] - for OFED kernel, backports and build
Tziporet Koren [EMAIL PROTECTED] - for OFED release, perftest
Sagi Rotem [EMAIL PROTECTED] - for perftest

Take care,

-- 
MST

Re: [ewg] Mellanox target workaround in SRP

2011-01-12 Thread Michael S. Tsirkin

On Mon, Jan 10, 2011 at 10:51:13AM -0800, Roland Dreier wrote:
 Maybe we can use MST's current email to ask him... Michael, do you have
 any memory of the issue we worked around here?
 
   I have a question regarding the workaround introduced in commit 559ce8f1 of
   the mainline tree:
   
   IB/srp: Work around data corruption bug on Mellanox targets
   
   Data corruption has been seen with Mellanox SRP targets when FMRs
   create a memory region with I/O virtual address != 0.  Add a
   workaround that disables FMR merging for Mellanox targets (OUI 0002c9).
   
   I don't see how this can make a difference to the target -- it sees an
   address and length, and there should be no visible difference to it when
   it gets an FMR versus a direct-mapped region of the same space, right?
   And how is it different than getting a direct or indirect descriptor
   with a similar offset?
   
   I could see there being a bug on the initiator HCA not liking such FMR
   mappings, but then it should be keyed off of the vendor of our HCA and
   not the target.
   
   I'm sure this was tested and shown to fix the problem; I'm just confused
   as to what the problem really was and if this is still relevant. Can
   someone please enlighten me?


I don't recall, unfortunately. Sorry.

-- 
MST
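
The workaround quoted above is keyed off the target's OUI (0002c9). As a rough
illustration of that kind of check, and not the kernel's actual code, the
sketch below tests whether a GUID starts with that OUI. It assumes the GUID is
written as 16 hex digits (optionally with a 0x prefix or colon separators)
with the OUI in the top three bytes; is_mellanox_guid is a hypothetical name.

```shell
# is_mellanox_guid GUID: succeed if the GUID's top three bytes are 00:02:c9.
is_mellanox_guid() {
    # Normalise: drop colons, lowercase hex digits, strip a leading 0x.
    g=$(printf '%s' "$1" | tr -d ':' | tr 'A-F' 'a-f')
    g=${g#0x}
    case $g in
        0002c9*) return 0 ;;
        *)       return 1 ;;
    esac
}
```

For example, `is_mellanox_guid 0x0002c90200402bd4 && echo mellanox` prints
"mellanox"; the GUID value here is made up for the illustration.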

