It seems to me that this is all stemming from the same old fundamental
confusion between a release and a distribution. I think everyone
would be better served by a process where individual maintainers were
responsible for releasing tarballs of their packages, with schedules
coordinated toward an
I think it's easy enough to make the revision of the RPMS be something
like -0.1.2007-07-17.1 or something like that.
OK, so you say just ignore the content and stick a date in there?
Fine, that'll work, and we can cover the RCs this way too I think.
I just meant to add a revision
CA type: === missing
Firmware version:=== missing
Hardware version:=== missing
These need sysfs entries from the mlx4_ib driver, I guess.
___
ewg mailing list
Why under drivers/net rather than drivers/infiniband like all the
other drivers? Does this really need special casing (in libibumad)?
Tziporet is incorrect. There's nothing from the mlx4_core driver
either, and when it is implemented, it should work exactly the same as
all other drivers.
I saw some specific known issues and limitations wrt ConnectX in OFED
1.2c. Is ConnectX officially supported in OFED 1.2c, or will that be OFED
1.3?
The whole point of 1.2.c is to add support for connectx, so yes
connectx is supported in 1.2.c.
- R.
- manage SRC domains
OK, I guess, but why do we need so many functions to create, share,
get shared, put SRC domains?
How about just two functions:
struct ibv_src_domain *ibv_open_src_domain(struct ibv_context *context,
int fd, int
But there are indeed a few cases that look wrong.
yes...
arch/x86_64/kernel/pci-calgary.c: writel(cpu_to_be32(val), target);
eg this almost certainly wants to be
writel(swab32(val), target);
or something equivalent like
__raw_writel(cpu_to_be32(val), target);
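To make the byte-order point above concrete, here is a small userspace model in C. It is only a sketch: the `model_*` helpers are hypothetical stand-ins for the kernel's `swab32()`, `cpu_to_be32()`, `writel()` and `__raw_writel()`, and host endianness is passed in as a parameter so both cases can be compared on one machine. It relies on `writel()` being defined as a little-endian store.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unconditional byte swap, modelled on the kernel's swab32(). */
uint32_t model_swab32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* cpu_to_be32(): no-op on a big-endian host, swap on a little-endian one. */
uint32_t model_cpu_to_be32(uint32_t v, int host_big_endian)
{
    return host_big_endian ? v : model_swab32(v);
}

/* writel(): the value always reaches the device in little-endian order. */
void model_writel(uint32_t v, uint8_t bus[4])
{
    bus[0] = (uint8_t)v;
    bus[1] = (uint8_t)(v >> 8);
    bus[2] = (uint8_t)(v >> 16);
    bus[3] = (uint8_t)(v >> 24);
}

/* __raw_writel(): the value is stored in host-native order, no swapping. */
void model_raw_writel(uint32_t v, int host_big_endian, uint8_t bus[4])
{
    if (host_big_endian) {
        bus[0] = (uint8_t)(v >> 24);
        bus[1] = (uint8_t)(v >> 16);
        bus[2] = (uint8_t)(v >> 8);
        bus[3] = (uint8_t)v;
    } else {
        model_writel(v, bus);
    }
}

/* Returns 1 if the device sees the same bytes from both kinds of host. */
int model_same_on_both_hosts(int variant, uint32_t val)
{
    uint8_t le_host[4], be_host[4];

    for (int be = 0; be <= 1; be++) {
        uint8_t *bus = be ? be_host : le_host;

        if (variant == 0)        /* writel(cpu_to_be32(val)) */
            model_writel(model_cpu_to_be32(val, be), bus);
        else if (variant == 1)   /* writel(swab32(val)) */
            model_writel(model_swab32(val), bus);
        else                     /* __raw_writel(cpu_to_be32(val)) */
            model_raw_writel(model_cpu_to_be32(val, be), be, bus);
    }
    return memcmp(le_host, be_host, 4) == 0;
}

void model_endian_demo(void)
{
    assert(!model_same_on_both_hosts(0, 0x11223344u)); /* host-dependent */
    assert(model_same_on_both_hosts(1, 0x11223344u));  /* always big-endian */
    assert(model_same_on_both_hosts(2, 0x11223344u));  /* always big-endian */
}
```

In this model the first variant hands the device different bytes depending on the host, while the two suggested replacements always produce big-endian bytes.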
git.openfabrics.org/~glenn/libnes.git
git.openfabrics.org/~glenn/ofed_1_2.git
git.openfabrics.org/~glenn/ofascripts.git
git.openfabrics.org/~glenn/ofed_1_2_scripts.git
these aren't actually git URLs. prepending git:// seems to work though.
Please report OFED bugs to the appropriate list -- I have no idea what
mthca patches got stuck into OFED 1.2.5 to cause problems. If you
want to test the latest 2.6.23-rc kernel to make sure things are OK
there then that might be useful.
- R.
The only new library is libmlx4.
Roland - when do you expect this library to be released?
Should be soon I guess. Do you guys (Mellanox) have any feedback
about the state of libmlx4? As far as I'm concerned it's in pretty
good shape, and I just need to set aside time to declare a 1.0
-#define HCA_CAP_MR_PGSIZE_4K 1
-#define HCA_CAP_MR_PGSIZE_64K 2
-#define HCA_CAP_MR_PGSIZE_1M 4
-#define HCA_CAP_MR_PGSIZE_16M 8
+#define HCA_CAP_MR_PGSIZE_4K 0x8000
+#define HCA_CAP_MR_PGSIZE_64K 0x4000
+#define HCA_CAP_MR_PGSIZE_1M 0x2000
+#define
If the rest of this patchset is okay with you, could you apply it and
leave out this one patch? The patchset will apply cleanly without it.
Yes, no problem, I'll drop this patch for now.
- R.
%build
+%ifarch ppc64
+%{expand: %%define optflags %{optflags} -m64}
+%endif
%configure
make %{?_smp_mflags}
Hmm. Roland?
I guess if the OFED build system is breaking the libibverbs build then
putting a patch like this in might be needed to fix things.
obviously OK...applied.
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
I don't really care whose fault it is. If OFED is doing something wrong,
we should fix that, not work around it at library level. But what is wrong?
That's a good question. The original bug report was rather short on details.
Roland, I know you already help with packaging libibverbs for
thanks, applied 1-5
Thanks... I am kind of overloaded trying to handle the last few things
for the 2.6.24 merge window, but I will look at this next week, and I
expect we should be able to merge the driver for 2.6.25 unless there
are unexpected hangups.
+
+EXTRA_CFLAGS += -DNES_MINICM
I don't see anyplace NES_MINICM is used. Delete this line?
+
+obj-$(CONFIG_INFINIBAND_NES) += iw_nes.o
+
+iw_nes-objs := nes.o nes_hw.o nes_nic.o nes_utils.o nes_verbs.o nes_cm.o
+
Also the file has an extra blank line at the beginning and end.
+config INFINIBAND_NES_DEBUG
+bool "Verbose debugging output"
+depends on INFINIBAND_NES
+default n
+---help---
+ This option causes the NetEffect RNIC driver to produce debug
+ messages. Select this if you are developing the driver
+ or trying to
+global:
+ibv_driver_init;
There's no version of libibverbs that ever used the ibv_driver_init
entry point, so you can just kill this.
+openib_driver_init;
In the interest of curtailing the SPAM problem, I would also be in favor
of an only-subscribers-can-post policy. Is there really strong opposition
to this? Thanks.
I would definitely prefer to keep the list open. Making the list
subscribers-only is really annoying to many potential
I don't know if this is possible, but why not just have list
subscribers bypass the SPAM filter? Those people that aren't
subscribers will get the usual filtering applied. Seems to me this
would keep the SPAM situation the same and only inconvenience those
non-subscribers that happen
OK, a couple quick review comments and a process comment too:
- First step in the driver is to kill off a lot of the #ifdefs:
+#ifdef IRQF_SHARED
The upstream driver really shouldn't have compatibility gunk for older
kernels... just make it build against the kernel it's in.
+#ifdef
At the highest level I think this developer summit is suffering from
a lack of a clear goal. (The same could be said about the OpenFabrics
alliance as a whole, but let's not get into that...) I'm supposed to
give a talk about the basics of kernel development and I'm happy to do
so, but that
Thanks Roland. Let me know when you have your branch ready.
OK, I pushed out a neteffect branch at
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git
This has the driver from your git tree plus some compile fixes and
cleanups (added as separate commits, so you can see
+/* TODO: deal with EEPROM endian issues */
This is pretty scary. Is the driver broken on big-endian systems now?
+/*
+Everything you wanted to know about CRC algorithms, but were afraid to ask
+ for fear that errors in your understanding might be detected. Version :
3.
etc
As I said earlier on this thread, the open issues I see with the stateless
offload series are (A) the non interoperable checksum offload patch based on
the IB ICRC sent by Michael (and if it is inter-operable, I'd like to be
educated how) (B) LRO - a pure SW optimization, why it need to
thanks, applied both patches.
But even if there is an error not related to any WR directly,
it is still the first WR that is not sent. I guess NULL could
be used to give slightly more information to the caller but
I don't really expect most application error recovery code to
make the distinction.
Makes sense. I'll
OK, I fixed up ibv_post_send, ibv_post_recv and ibv_post_srq_recv and
pushed out a new libibverbs tree.
thanks, applied.
Arghh... these don't apply to my neteffect branch, so you've lost
the cleanup work that I did (eg trailing whitespace removal,
formatting fixes, etc). I thought we agreed that I would pull the
driver into my tree for merging into 2.6.25 and we would work on it
there.
Anyway. I'll reimport the
Or you said I could submit patches to you. One problem with
only using your tree is that I need to supply updates and backports
for OFED builds. I've cloned Vlad's tree for that.
How you manage OFED is up to you. However, once your driver is in the
upstream tree there is no alternative
OK, I just pushed out a few more small cleanups (running unifdef,
fixing signedness warnings, and fixing a locking bug on an error
path). One question: what is the point of the monkeying with
SPIN_BUG_ON in nes.c?
- R.
#ifdef NES_NAPI
Is #ifdef napi sprinkled throughout the code common for most drivers? Is
there a better way to handle this? (Is this OFED only for backports, or for
upstream?)
Is there any reason why we want the upstream kernel to have both NAPI
and non-NAPI support? If so, then
Here is an issue we have:
struct ibv_context {
struct ibv_device *device;
struct ibv_context_ops ops;
int cmd_fd;
int async_fd;
int num_comp_vectors;
pthread_mutex_t
BTW, sifting through the OFED 1.3 libibverbs tree, I do see that the
commit to add max_xrc_domains to struct ibv_device_attr did break
things by adding the member in the middle of the structure (so that an
app compiled against the old header will see bogus values for
local_ca_ack_delay and
oops, sorry... I see that the very next OFED 1.3 commit reverted that
change, so things aren't as bad as I thought.
Never mind.
- R.
I think the problem is that sizeof struct ibv_context_ops has
changed, so the new driver returns a big struct ibv_context, app
compiled with older header file has a smaller struct ibv_context
and use the old offset to find fields after ops.
Oh crud, you're obviously right. For some
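The offset problem described above can be reproduced with a pair of toy structures. This is a hedged sketch: the `demo_*` types are hypothetical stand-ins, not the real libibverbs definitions, and only show how a member that follows an embedded struct moves when that struct grows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table as seen by an application built with old headers. */
struct demo_ops_v1 {
    int (*query)(void);
    int (*poll)(void);
};

/* Same table after a new entry point was appended. */
struct demo_ops_v2 {
    int (*query)(void);
    int (*poll)(void);
    int (*new_op)(void);
};

/* The ops table is embedded by value, so its size is part of the layout. */
struct demo_context_v1 {
    void *device;
    struct demo_ops_v1 ops;
    int cmd_fd;
};

struct demo_context_v2 {
    void *device;
    struct demo_ops_v2 ops;
    int cmd_fd;
};

/* Compile-time proof that cmd_fd sits at a different offset in v2. */
static_assert(offsetof(struct demo_context_v2, cmd_fd) >
              offsetof(struct demo_context_v1, cmd_fd),
              "growing the embedded ops struct moves every later member");

/* Number of bytes by which cmd_fd moved between the two layouts. */
size_t demo_cmd_fd_shift(void)
{
    return offsetof(struct demo_context_v2, cmd_fd) -
           offsetof(struct demo_context_v1, cmd_fd);
}
```

An application compiled against the v1 layout that is handed a v2 object would read `cmd_fd` from inside the larger `ops` table, which is the kind of breakage being discussed.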
struct ibv_context {
struct ibv_device *device;
struct ibv_context_ops ops;
int cmd_fd;
int async_fd;
int num_comp_vectors;
pthread_mutex_t mutex;
I think in future we will have more such changes, why don't we take
the pain now to make ops a pointer and mark it as verbs 1.2?
The problem is that undoubtedly the changes that require changing the
ABI will require something more than just additional ops, so we'll end up
+ ehca_lock_hcalls = !(cur_cpu_spec->cpu_user_features
+ & PPC_FEATURE_ARCH_2_05);
We already talked about this yesterday, but I still feel that checking the
instruction set of the CPU should not be used to determine whether a specific
I think it needs some more inspection. The msleep in there is only called
for hcalls that return H_IS_LONG_BUSY(). In theory, you can call
ehca_plpar_hcall_norets() from inside an interrupt handler if the
hcall in question never returns long busy.
Fair enough... according to
map_phys_fmr
In fact, we do use hCalls there. Our hardware doesn't actually support FMRs,
so we translate a map FMR into a reallocate PMR, which doesn't work
without hCalls. What's more, the hCalls involved (e.g. H_FREE_RESOURCE)
might well return H_LONG_BUSY, so the whole
thanks, applied.
applied, thanks
Be careful how you forward patches on so that you preserve author and
Signed-off-by information. This patch is fairly trivial so it's no
big deal, but I did write the original patch and it appears in my tree
as:
commit 772bbcb6b5d310bf29a184d280a4d1d8b3350422
Author: Roland Dreier [EMAIL
Another question - How do you like to see multiple patches in a set
where more than one of those patches affects the same file? I've
been creating them where the second patch relies on the first patch
being successfully applied. Correct?
Yes, if you have multiple patches that touch the
Here is the URL for the Q Logic changes to support the new HCA that
Betsy Zeller has discussed with Tziporet. I spoke with Betsy about it
and she asked that I send this to you for you to review and work with.
There are a few compiler warnings and we are working to eliminate them,
but
Normally, the serial numbers for flush requests and flushes
executed should be in sync.
When we decide to flush dirty MRs because there are too many of them, we
wake up the cleanup thread and let it do its stuff. As a side effect, it
increments pool->flush_ser, which leaves it one
Roland, you said that XRC API is ugly, are you going to push it upstream
in its present form?
That's a good question. Since there is no 'present form' for XRC as
far as I can tell, it's hard to make a definitive answer. Certainly I
haven't made up my mind in advance one way or another. In
Well, I can't speak for everyone, but in my opinion if someone wants to run
MPI job so huge that XRC absolutely has to be used to be able to actually
finish it then he should seriously rethink his application design.
But where do you think the crossover is where XRC starts to help MPI?
In
thanks, applied 1-4.
thanks, applied.
The corruption happened when the process that allocated the MRs went
away in the middle of the operation. We would free the MR and invalidate
- and expect the in flight RDMA to error out. RDS does not know who is
doing RDMA to or from a MR at any given time.
OK, I see. Of course this
When I hit a RDMA error (which happens quite frequently now at rds-stress
exit, thanks to the fixed mr pool flushing :) I often see the RDS
shutdown_worker getting stuck (and rmmod hangs). It's waiting for allocated
WRs to disappear.
This usually works, as all WQ entries are flushed
The part on affiliated asynchronous errors says WQ processing is stopped.
This also happens if we're signalling a remote access error to the
other host.
When a RDMA operation errors out because the remote side destroyed the MR,
the RDMA WQE completes with error 10 (remote access
BTW, what kind of HCA are you using for this testing?
A pair of fairly new Mellanox cards.
How new? Is it ConnectX or something older -- ie do you use the
ib_mthca or mlx4_ib driver? If you're using mlx4, then I could
believe there is a firmware bug that leads to lost completions.
-
I guess you mean just implement XRC without allowing multiple
processes to share an XRC domain? That actually seems like a sensible
thing to implement as well...
This is part of the current XRC implementation -- just give -1 as the fd
value in ibv_open_xrc_domain().
I *think*
thanks, applied 1-2
FWIW, I've updated the packages in my Ubuntu PPA; if you're using it,
you should get the packages automatically with a normal update once
the builds complete. I've also updated the packages I've proposed for
inclusion in Debian.
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 950228f..45a2533 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -400,7 +400,6 @@
printk(KERN_DEBUG PFX "Sending CM DREQ failed\n");
this didn't apply to a current kernel tree but I fixed it up by hand.
thanks, applied. one minor request:
Signed-off-by: Chien Tung [EMAIL PROTECTED]
Signed-off-by: Glenn Streiff [EMAIL PROTECTED]
I'm assuming the first signoff line is coming from the actual author
here. If so, for patches like this please include a line like
From: Chien Tung [EMAIL
Thanks, applied. Overall this is very clean (one of the best
formatted patches I've gotten in a while ;), but scripts/checkpatch.pl
does complain:
WARNING: __func__ should be used instead of gcc specific __FUNCTION__
#83: FILE: drivers/infiniband/hw/nes/nes_nic.c:1504:
+
Hello Eli, I have already submitted this patch to mainline. I will follow
up with Roland to get this merged there.
I didn't see the submission... can you resend?
The send completion errors indicate the packet hasn't been sent out
to the wire. It seems the retries you have added induced a little bit of
delay for the packet to be sent out successfully, which might indicate
some flow control or other issues in the device transport layer?
Actually for
I use ibv_get_cq_event to wait for a completion event. I use
ibv_ack_cq_events to acknowledge that event every time. But I find I will
get the same event many times.
Is it the same event, or a new event for the same CQ?
I want to delete the entry from cq after processing it.
FWIW I've updated the Debian and Fedora librdmacm packages as well as
my Ubuntu PPA to librdmacm-1.0.7.
- R.
By the way, the current status of my Debian and Fedora packaging efforts
for userspace code that I use is the following:
libibverbs:
libmthca:
libmlx4:
librdmacm:
Up-to-date packages included in Debian and Fedora.
libipathverbs:
I have Debian packaging
A few quick comments -- I'll look at this in more depth once I've
cleared my backlog of things already submitted.
- This is an awful lot of code for core services. Am I understanding
correctly that this doesn't do anything that a user actually wants
without even more code layered on
+static struct class_device xsigoib_class_dev;
struct class_device is going away in 2.6.26... better to rewrite this to
use struct device.
- R.
This patch adds the Xsigo unified API for IB and CM access used by the
Xsigo virtual (v*) drivers like vnic and vhba.
what's wrong with the verbs/CM API we already have?
+static inline struct xsigo_ib_connect_info *get_connect_info(int handle)
What's the reasoning behind using handles
+unsigned int max_ib_mtu;
I don't see where this is ever set?
- R.
thanks, applied
what about this patch:
http://lists.openfabrics.org/pipermail/general/2008-March/048322.html
Looks mostly OK, I plan to merge it.
...looking closer, what happens if the send queue has less than 16
entries? (set with send_queue_size on module load)
Also maybe I'm missing it but where do you
Also it is very important for us that IPoIB 2 kernel panics will be fixed (
https://bugs.openfabrics.org/show_bug.cgi?id=989,
https://bugs.openfabrics.org/show_bug.cgi?id=985)
Are either of these panics seen with upstream kernels?
If we don't know then this points to a serious problem with
I just saw this bug report today, but we've had similar crashes.
Looks like the problem is that in ipoib_neigh_cleanup() this is
done (no locking):
neigh = *to_ipoib_neigh(n);
then later:
spin_lock_irqsave(&priv->lock, flags);
if (neigh->ah)
Signed-off-by: Stefan Roscher stefan.roscher at de.ibm.com
Kind of an inadequate changelog ;)
Is this a fix or an enhancement or what?
+if (atomic_read(&shca->num_cqs) >= ehca_max_cq) {
+if (atomic_read(&shca->num_qps) >= ehca_max_qp) {
These are racy in the sense that multiple
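The review comment is cut off here, so the following is an assumption about the race being described: several callers can pass a separate "read the counter, compare with the limit" check at the same time and then all increment. A minimal userspace sketch using C11 atomics, with hypothetical `demo_*` names, contrasts that pattern with a combined check-and-increment, which is similar in spirit to the kernel's `atomic_add_unless()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Racy pattern: the limit check and the increment are two separate steps,
 * so two threads can both see count == max - 1 and both take a slot.
 */
bool demo_reserve_racy(atomic_int *count, int max)
{
    if (atomic_load(count) >= max)
        return false;
    atomic_fetch_add(count, 1);
    return true;
}

/*
 * Race-free pattern: only increment if the value we checked is still the
 * current value; otherwise re-check against the limit and try again.
 */
bool demo_reserve_checked(atomic_int *count, int max)
{
    int cur = atomic_load(count);

    while (cur < max) {
        /* On failure, cur is reloaded with the latest value. */
        if (atomic_compare_exchange_weak(count, &cur, cur + 1))
            return true;
    }
    return false;
}

/* Tries `attempts` reservations and returns how many succeeded. */
int demo_reserve_many(atomic_int *count, int max, int attempts)
{
    int granted = 0;

    for (int i = 0; i < attempts; i++)
        if (demo_reserve_checked(count, max))
            granted++;
    assert(atomic_load(count) <= max);
    return granted;
}
```

Run from a single thread both functions behave the same; the difference only shows up under concurrency, where the checked variant can never push the counter past `max`.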
My bad, on the array index idiom. I can redo.
Yes, please do resend without that.
With regard to post patch clean-ups, I recall you telling me
that it was preferred to either front-load or back-load the
cleanups in a patch series.
Yes, that is true.
I generally cleaned-up the
When the child joins the broadcast group reset the mtu to
the real one.
This changelog is a little too short for me to understand what this is
fixing. It seems that child devices are left with a bogus MTU until
they complete their multicast join, is that it?
+priv->dev->mtu =
thanks, makes sense, applied.
fast turnaround too ;)
I think they do but it would require using two fields for flags in the
private data. priv->flags would save flags that relate to the state of
the net device, and, say, priv->cap_flags would save stuff like LRO,
checksum or any other stuff related to capabilities. What do you think?
We could
All looks fine, I applied all three of your patches.
Thanks
Steve -- If you put a tarball (from make dist ;) on openfabrics.org,
I'll update the Debian packages.
we have seen a few other cases where a large tx queue is needed. I
think we should choose a larger default value than the current 64.
maybe yes, maybe no... what are the cases where it is needed?
The send queue is basically acting as a shock absorber for bursty
traffic. If the queue is
I agree, but I want to have a larger buffer to absorb larger peaks. For
example, after applying this patch I tested how many times the net queue
is stopped and woken up when running four streams of netperf, udp, small
packets. When using the default 64 tx queue size it happened 500 times.
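The "shock absorber" argument can be illustrated with a toy model. This is only a sketch under made-up assumptions: the burst size, burst period and drain rate below are invented for illustration and are not the netperf workload or the measurements quoted above. The function counts how often the net queue would be stopped because the send ring is full while packets are still waiting.

```c
#include <assert.h>

/*
 * Toy model: every `period` ticks a burst of `burst` packets arrives, and
 * the hardware completes up to `drain` packets per tick. A "stop" is counted
 * for each tick in which the ring fills up with packets still left to post.
 */
int demo_count_queue_stops(int ring_size, int burst, int period,
                           int drain, int ticks)
{
    int queued = 0;   /* packets currently sitting in the send ring */
    int waiting = 0;  /* packets the stack still wants to post */
    int stops = 0;

    for (int t = 0; t < ticks; t++) {
        if (t % period == 0)
            waiting += burst;

        /* Post as many waiting packets as the ring has room for. */
        int room = ring_size - queued;
        int posted = waiting < room ? waiting : room;
        queued += posted;
        waiting -= posted;

        /* Ring full with work left over: the net queue gets stopped. */
        if (waiting > 0)
            stops++;

        /* Completions free ring entries, which wakes the queue again. */
        queued -= queued < drain ? queued : drain;
    }
    return stops;
}

void demo_queue_stops(void)
{
    /* Same traffic, two ring sizes: the larger ring absorbs each burst. */
    assert(demo_count_queue_stops(128, 100, 10, 16, 100) <
           demo_count_queue_stops(64, 100, 10, 16, 100));
}
```

With these invented parameters a burst of 100 packets overflows a 64-entry ring for three consecutive ticks but fits entirely in a 128-entry ring, so the stop count drops to zero.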
- always do peer2peer and don't let the app choose. This forces
the overhead of p2p mode on all apps, but preserves the API.
How bad is the overhead?
- R.
We are not sure if this should be fixed in the driver or in uverbs itself.
Roland, what's your opinion about this?
Would be nice to be able to fix it in uverbs but I don't see how. In
particular a kernel consumer has to have the same guarantee that no
async events will come in after destroy
I agree, that's why I posted the driver fix first.
Glad we agree :)
I thought about it a little more and really convinced myself that there
is no good way for generic code to handle this race.
So, will you apply it next?
Yes, will apply it shortly.
- R.
So I applied this, but thinking about it further, do you (theoretically
at least) have the same problem with CQ and SRQ async events?
- R.
I'm just trying to define the scope of the issue here... so is there any
conceivable real-life situation where neither a 0B read nor a 0B write
would work, and the connection setup will have to use a 0B send?
i'm not sure what you mean by real-life. For the rnics we have:
nes -
This is a fix for your optimize stamping patch, right?
I'll roll it into that patch.
Increase IPoIB ring sizes to twice the original size to act as
a shock absorber for high traffic peaks.
Looks fine but I would like to include a little more motivation in the
changelog. What type of workload benefits from this change?
thanks, applied.
+if (desc->chip && desc->chip->eoi)
+desc->chip->eoi(irq);
This doesn't seem like the type of thing a device driver should be
doing. Both patches are rather short on changelog text -- what is the
issue here and how does this fix it?
During corner case testing, we noticed that some versions of ehca
do not properly transition to interrupt done in special load situations.
This can be resolved by periodically triggering EOI through H_EOI,
if eqes are pending.
So just to be clear: this is a workaround for a hardware/firmware bug?
Yes it is.
OK, so paulus et al... does it seem like a good approach to call H_EOI
from driver code (given that this driver makes tons of other hcalls)?
How critical is this? Since you said corner case testing I suspect
During corner case testing, we noticed that some versions of ehca
do not properly transition to interrupt done in special load situations.
This can be resolved by periodically triggering EOI through H_EOI,
if eqes are pending.
Signed-off-by: Stefan Roscher [EMAIL PROTECTED]
---
-ipoib_vfree(priv->cm.rx_vmap_srq_ring);
+if (ipoib_cm_has_srq(dev))
+ipoib_vfree(priv->cm.rx_vmap_srq_ring);
+else
+kfree(rx_ring);
What tree is this made against? I can't find any reference to
ipoib_vfree anywhere in the history of IPoIB so of course