[ewg] Re: [ofa-general] Re: RFC OFED-1.3 installation

2007-07-17 Thread Roland Dreier
It seems to me that this is all stemming from the same old fundamental confusion between a release and a distribution. I think everyone would be better served by a process where individual maintainers were responsible for releasing tarballs of their packages, with schedules coordinated toward an

[ewg] Re: [ofa-general] Re: RFC OFED-1.3 installation

2007-07-17 Thread Roland Dreier
I think it's easy enough to make the revision of the RPMS be something like -0.1.2007-07-17.1 or something like that. OK, so you say just ignore the content and stick a date in there? Fine, that'll work, and we can cover the RCs this way too I think. I just meant to add a revision

Re: [ofa-general] Re: [ewg] OFED 1.2.c-9 is available

2007-07-31 Thread Roland Dreier
CA type: === missing Firmware version:=== missing Hardware version:=== missing These need sysfs entries from the mlx4_ib driver, I guess. ___ ewg mailing list

Re: [ofa-general] Re: [ewg] OFED 1.2.c-9 is available

2007-07-31 Thread Roland Dreier
Why under drivers/net rather than drivers/infiniband like all the other drivers? Does this really need special casing (in libibumad)? Tziporet is incorrect. There's nothing from the mlx4_core driver either, and when it is implemented, it should work exactly the same as all other drivers.

[ewg] Re: [ofa-general] ConnectX support

2007-08-01 Thread Roland Dreier
I saw some specific known issues and limitations wrt ConnectX in OFED 1.2c. Is ConnectX officially supported in OFED 1.2c, or will that be OFED 1.3? The whole point of 1.2.c is to add support for connectx, so yes connectx is supported in 1.2.c. - R.

[ewg] Re: RFC: SRC API

2007-08-03 Thread Roland Dreier
- manage SRC domains OK, I guess, but why do we need so many functions to create, share, get shared, put SRC domains? How about just two functions: struct ibv_src_domain *ibv_open_src_domain(struct ibv_context *context, int fd, int

[ewg] Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Roland Dreier
But there are indeed a few cases that look wrong. yes... arch/x86_64/kernel/pci-calgary.c: writel(cpu_to_be32(val), target); eg this almost certainly wants to be writel(swab32(val), target); or something equivalent like __raw_writel(cpu_to_be32(val), target);
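
A minimal userspace sketch of why the quoted line looks wrong, assuming (as on Linux) that writel() always hands the value to the device as little-endian bytes. The sketch_* helpers are stand-ins written for this illustration, not the kernel's implementations, and the host byte order is passed in as a parameter so both cases can be compared on any machine.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's swab32(): reverse the four bytes of v. */
static uint32_t sketch_swab32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Stand-in for cpu_to_be32(): a swap on little-endian hosts, a no-op on big-endian ones. */
static uint32_t sketch_cpu_to_be32(uint32_t v, int host_is_big_endian)
{
    return host_is_big_endian ? v : sketch_swab32(v);
}

/* Model of writel(): the device sees v as little-endian bytes, whatever the host is. */
static void sketch_writel(uint32_t v, uint8_t bus[4])
{
    assert(bus);
    bus[0] = (uint8_t)(v & 0xff);
    bus[1] = (uint8_t)((v >> 8) & 0xff);
    bus[2] = (uint8_t)((v >> 16) & 0xff);
    bus[3] = (uint8_t)((v >> 24) & 0xff);
}
```

In this model, writel(swab32(val)) with val = 0x11223344 puts 11 22 33 44 on the bus for either host, while writel(cpu_to_be32(val)) gives 11 22 33 44 on a little-endian host but 44 33 22 11 on a big-endian one.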

Re: [ewg] [PATCH 0/14] nes: NetEffect 10Gb RNIC Driver

2007-08-08 Thread Roland Dreier
git.openfabrics.org/~glenn/libnes.git git.openfabrics.org/~glenn/ofed_1_2.git git.openfabrics.org/~glenn/ofascripts.git git.openfabrics.org/~glenn/ofed_1_2_scripts.git these aren't actually git URLs. prepending git:// seems to work though.

[ewg] Re: OFED 1.2.5 issue with MSI ??

2007-08-15 Thread Roland Dreier
Please report OFED bugs to the appropriate list -- I have no idea what mthca patches got stuck into OFED 1.2.5 to cause problems. If you want to test the latest 2.6.23-rc kernel to make sure things are OK there then that might be useful. - R.

Re: [ewg] Re: Do you (Redhat) plan to integrate OFED 1.2.5 in the coming update of Redhat?

2007-08-29 Thread Roland Dreier
The only new library is libmlx4. Roland - when do you expect this library to be released? Should be soon I guess. Do you guys (Mellanox) have any feedback about the state of libmlx4? As far as I'm concerned it's in pretty good shape, and I just need to set aside time to declare a 1.0

[ewg] Re: [PATCH] IB/ehca: Make sure user pages are from hugetlb before using MR large pages

2007-09-12 Thread Roland Dreier
-#define HCA_CAP_MR_PGSIZE_4K 1 -#define HCA_CAP_MR_PGSIZE_64K 2 -#define HCA_CAP_MR_PGSIZE_1M 4 -#define HCA_CAP_MR_PGSIZE_16M 8 +#define HCA_CAP_MR_PGSIZE_4K 0x8000 +#define HCA_CAP_MR_PGSIZE_64K 0x4000 +#define HCA_CAP_MR_PGSIZE_1M 0x2000 +#define

[ewg] Re: [PATCH 02/12] IB/ehca: Add 1 is not longer needed because of firmware interface change

2007-09-14 Thread Roland Dreier
If the rest of this patchset is okay with you, could you apply it and leave out this one patch? The patchset will apply cleanly without it. Yes, no problem, I'll drop this patch for now. - R.

[ewg] Re: building userspace on ppc64 is broken

2007-09-17 Thread Roland Dreier
%build +%ifarch ppc64 +%{expand: %%define optflags %{optflags} -m64} +%endif %configure make %{?_smp_mflags} Hmm. Roland? I guess if the OFED build system is breaking the libibverbs build then putting a patch like this in might be needed to fix things.

[ewg] Re: [PATCH 1/3] IB/ehca: Fix large page HW cap defines

2007-09-17 Thread Roland Dreier
obviously OK...applied. ___ ewg mailing list ewg@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg

[ewg] Re: building userspace on ppc64 is broken

2007-09-18 Thread Roland Dreier
I don't really care whose fault it is. If OFED is doing something wrong, we should fix that, not work around it at library level. But what is wrong? That's a good question. The original bug report was rather short on details. Roland, I know you already help with packaging libibverbs for

[ewg] Re: [PATCH 5/5] IB/ehca: Enable large page MRs by default

2007-10-17 Thread Roland Dreier
thanks, applied 1-5

[ewg] Re: [PATCH 1/14 v2] nes: module and device initialization

2007-10-19 Thread Roland Dreier
Thanks... I am kind of overloaded trying to handle the last few things for the 2.6.24 merge window, but I will look at this next week, and I expect we should be able to merge the driver for 2.6.25 unless there are unexpected hangups.

[ewg] Re: [ofa-general] [PATCH 14/14 v2] nes: kernel build infrastructure

2007-10-21 Thread Roland Dreier
+ +EXTRA_CFLAGS += -DNES_MINICM I don't see anyplace NES_MINICM is used. Delete this line? + +obj-$(CONFIG_INFINIBAND_NES) += iw_nes.o + +iw_nes-objs := nes.o nes_hw.o nes_nic.o nes_utils.o nes_verbs.o nes_cm.o + Also the file has an extra blank line at the beginning and end.

[ewg] Re: [PATCH 13/14 v2] nes: kernel build infrastructure

2007-10-21 Thread Roland Dreier
+config INFINIBAND_NES_DEBUG +bool "Verbose debugging output" +depends on INFINIBAND_NES +default n +---help--- + This option causes the NetEffect RNIC driver to produce debug + messages. Select this if you are developing the driver + or trying to

[ewg] Re: [PATCH 1/5 v2] libnes: library init entry points

2007-10-22 Thread Roland Dreier
+global: +ibv_driver_init; There's no version of libibverbs that ever used the ibv_driver_init entry point, so you can just kill this. +openib_driver_init;

Re: [ewg] Re: Dropped OpenFabrics list messages (!)

2007-10-25 Thread Roland Dreier
In the interest of curtailing the SPAM problem, I would also be in favor of an only-subscribers-can-post policy. Is there really strong opposition to this? Thanks. I would definitely prefer to keep the list open. Making the list subscribers-only is really annoying to many potential

Re: [SPAM] - Re: [ewg] Re: Dropped OpenFabrics list messages (!) - Email found in subject

2007-10-25 Thread Roland Dreier
I don't know if this is possible, but why not just have list subscribers bypass the SPAM filter. Those people that aren't subscribers will get the usual filtering applied. Seems to me this would keep the SPAM situation the same and only inconvenience those non-subscribers that happen

[ewg] Re: [PATCH 1/14 v2] nes: module and device initialization

2007-10-26 Thread Roland Dreier
OK, a couple quick review comments and a process comment too: - First step in the driver is to kill off a lot of the #ifdefs: +#ifdef IRQF_SHARED The upstream driver really shouldn't have compatibility gunk for older kernels... just make it build against the kernel it's in. +#ifdef

[ewg] Re: to be discussed at the developer conference

2007-10-30 Thread Roland Dreier
At the highest level I think this developer summit is suffering from a lack of a clear goal. (The same could be said about the OpenFabrics alliance as a whole, but let's not get into that...) I'm supposed to give a talk about the basics of kernel development and I'm happy to do so, but that

[ewg] Re: [PATCH 1/14 v2] nes: module and device initialization

2007-10-30 Thread Roland Dreier
Thanks Roland. Let me know when you have your branch ready. OK, I pushed out a neteffect branch at git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git This has the driver from your git tree plus some compile fixes and cleanups (added as separate commits, so you can see

[ewg] Re: [PATCH 10/14 v2] nes: eeprom and phy routines

2007-10-31 Thread Roland Dreier
+/* TODO: deal with EEPROM endian issues */ This is pretty scary. Is the driver broken on big-endian systems now? +/* +Everything you wanted to know about CRC algorithms, but were afraid to ask + for fear that errors in your understanding might be detected. Version : 3. etc

Re: [ewg] Re: [ofa-general] Feedback on Developer's Summit

2007-11-05 Thread Roland Dreier
As I said earlier on this thread, the open issues I see with the stateless offload series are (A) the non-interoperable checksum offload patch based on the IB ICRC sent by Michael (and if it is interoperable, I'd like to be educated how) (B) LRO - a pure SW optimization, why does it need to

[ewg] Re: [PATCH 2/2] IB/ehca: Fix static rate calculation

2007-11-07 Thread Roland Dreier
thanks, applied both patches.

[ewg] Re: [PATCH] libibverbs - return valid bad_wr from ibv_cmd_post_send()

2007-11-14 Thread Roland Dreier
But even if there is an error not related to any WR directly, it is still the first WR that is not sent. I guess NULL could be used to give slightly more information to the caller but I don't really expect most application error recovery code to make the distinction. Makes sense. I'll

[ewg] Re: [PATCH] libibverbs - return valid bad_wr from ibv_cmd_post_send()

2007-11-20 Thread Roland Dreier
OK, I fixed up ibv_post_send, ibv_post_recv and ibv_post_srq_recv and pushed out a new libibverbs tree.

[ewg] Re: [PATCH] IB/ehca: Fix static rate regression

2007-11-24 Thread Roland Dreier
thanks, applied.

[ewg] Re: [ofa-general] [PATCH 0/6] nes: Cosmetic changes; support virtual WQs and PPC

2007-11-27 Thread Roland Dreier
Arghh... these don't apply to my neteffect branch, so you've lost the cleanup work that I did (eg trailing whitespace removal, formatting fixes, etc). I thought we agreed that I would pull the driver into my tree for merging into 2.6.25 and we would work on it there. Anyway. I'll reimport the

[ewg] Re: [ofa-general] [PATCH 0/6] nes: Cosmetic changes; support virtual WQs and PPC

2007-12-03 Thread Roland Dreier
Or you said I could submit patches to you. One problem with only using your tree is that I need to supply updates and backports for OFED builds. I've cloned Vlad's tree for that. How you manage OFED is up to you. However, once your driver is in the upstream tree there is no alternative

[ewg] Re: [ofa-general] [PATCH 0/6] nes: Cosmetic changes; support virtual WQs and PPC

2007-12-04 Thread Roland Dreier
OK, I just pushed out a few more small cleanups (running unifdef, fixing signedness warnings, and fixing a locking bug on an error path). One question: what is the point of the monkeying with SPIN_BUG_ON in nes.c? - R.

[ewg] Re: [ofa-general] [PATCH 5/5] nes: napi interface fix

2007-12-04 Thread Roland Dreier
#ifdef NES_NAPI Is #ifdef napi sprinkled throughout the code common for most drivers? Is there a better way to handle this? (Is this OFED only for backports, or for upstream?) Is there any reason why we want the upstream kernel to have both NAPI and non-NAPI support? If so, then

[ewg] Re: [ofa-general] OFED 1.3 Beta release is available

2007-12-04 Thread Roland Dreier
Here is an issue we have: struct ibv_context { struct ibv_device *device; struct ibv_context_ops ops; int cmd_fd; int async_fd; int num_comp_vectors; pthread_mutex_t

Re: [ewg] Re: [ofa-general] OFED 1.3 Beta release is available

2007-12-04 Thread Roland Dreier
BTW, sifting through the OFED 1.3 libibverbs tree, I do see that the commit to add max_xrc_domains to struct ibv_device_attr did break things by adding the member in the middle of the structure (so that an app compiled against the old header will see bogus values for local_ca_ack_delay and

Re: [ewg] Re: [ofa-general] OFED 1.3 Beta release is available

2007-12-04 Thread Roland Dreier
oops, sorry... I see that the very next OFED 1.3 commit reverted that change, so things aren't as bad as I thought. Never mind. - R.

[ewg] Re: [ofa-general] OFED 1.3 Beta release is available

2007-12-04 Thread Roland Dreier
I think the problem is that sizeof struct ibv_context_ops has changed, so the new driver returns a big struct ibv_context, app compiled with older header file has a smaller struct ibv_context and use the old offset to find fields after ops. Oh crud, you're obviously right. For some
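
The layout problem described here can be sketched with a few made-up structs. Everything named sketch_* is hypothetical and far smaller than the real libibverbs definitions; the point is only that growing a table embedded by value shifts every field after it, whereas a table held by pointer leaves those offsets alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op tables: v2 appends one entry to v1. */
struct sketch_ops_v1 { int (*poll)(void); int (*post)(void); };
struct sketch_ops_v2 { int (*poll)(void); int (*post)(void); int (*extra)(void); };

/* Context with the op table embedded by value, as in the quoted struct. */
struct sketch_ctx_v1 { void *device; struct sketch_ops_v1 ops; int cmd_fd; };
struct sketch_ctx_v2 { void *device; struct sketch_ops_v2 ops; int cmd_fd; };

/* Alternative layout: the table is reached through a pointer, so its size
 * no longer influences where cmd_fd lives. */
struct sketch_ctx_ptr { void *device; const void *ops; int cmd_fd; };

/* Compile-time check: the larger embedded table pushes cmd_fd further out. */
static_assert(offsetof(struct sketch_ctx_v2, cmd_fd) >
              offsetof(struct sketch_ctx_v1, cmd_fd),
              "growing the embedded table moves every later field");

static size_t sketch_cmd_fd_offset_v1(void)
{
    return offsetof(struct sketch_ctx_v1, cmd_fd);
}

static size_t sketch_cmd_fd_offset_v2(void)
{
    return offsetof(struct sketch_ctx_v2, cmd_fd);
}

static size_t sketch_cmd_fd_offset_ptr(void)
{
    return offsetof(struct sketch_ctx_ptr, cmd_fd);
}
```

An application built against the v1 header reads cmd_fd at the v1 offset, which in a v2 context still falls inside the op table.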

Re: [ewg] Re: [ofa-general] OFED 1.3 Beta release is available

2007-12-05 Thread Roland Dreier
struct ibv_context { struct ibv_device *device; struct ibv_context_ops ops; int cmd_fd; int async_fd; int num_comp_vectors; pthread_mutex_t mutex;

[ewg] Re: [ofa-general] OFED 1.3 Beta release is available

2007-12-05 Thread Roland Dreier
I think in future we will have more such changes, why don't we take the pain now to make ops as a pointer and mark it as verbs 1.2 ? The problem is that undoubtedly the changes that require changing the ABI will require something more than just additional ops, so we'll end up

[ewg] Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5

2007-12-06 Thread Roland Dreier
+ ehca_lock_hcalls = !(cur_cpu_spec->cpu_user_features & PPC_FEATURE_ARCH_2_05); We already talked about this yesterday, but I still feel that checking the instruction set of the CPU should not be used to determine whether a specific

[ewg] Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5

2007-12-09 Thread Roland Dreier
I think it needs some more inspection. The msleep in there is only called for hcalls that return H_IS_LONG_BUSY(). In theory, you can call ehca_plpar_hcall_norets() from inside an interrupt handler if the hcall in question never returns long busy. Fair enough... according to

[ewg] Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5

2007-12-10 Thread Roland Dreier
map_phys_fmr In fact, we do use hCalls there. Our hardware doesn't actually support FMRs, so we translate a map FMR into a reallocate PMR, which doesn't work without hCalls. What's more, the hCalls involved (e.g. H_FREE_RESOURCE) might well return H_LONG_BUSY, so the whole

[ewg] Re: [PATCH] IB/ehca: Return correct #SGEs for SRQ

2007-12-12 Thread Roland Dreier
thanks, applied.

[ewg] Re: [PATCH] IB/ehca: Fix lock flag location, bump version number

2007-12-13 Thread Roland Dreier
applied, thanks

[ewg] Re: [PATCH 3/6] nes: add missing unlock in error path of nes_alloc_fmr()

2007-12-13 Thread Roland Dreier
Be careful how you forward patches on so that you preserve author and Signed-off-by information. This patch is fairly trivial so it's no big deal, but I did write the original patch and it appears in my tree as: commit 772bbcb6b5d310bf29a184d280a4d1d8b3350422 Author: Roland Dreier [EMAIL

Re: [ewg] RE: [PATCH 3/6] nes: add missing unlock in error path of nes_alloc_fmr()

2007-12-14 Thread Roland Dreier
Another question - How do you like to see multiple patches in a set where more than one of those patches affects the same file? I've been creating them where the second patch relies on the first patch being successfully applied. Correct? Yes, if you have multiple patches that touch the

Re: [ewg] Changes to support new Qlogic HCA

2008-01-03 Thread Roland Dreier
Here is the URL for the Q Logic changes to support the new HCA that Betsy Zeller has discussed with Tziporet. I spoke with Betsy about it and she asked that I send this to you for you to review and work with. There are a few compiler warnings and we are working to eliminate them, but

Re: [ewg] [PATCH 1/2] fmr_pool flush serials can get out of sync

2008-01-16 Thread Roland Dreier
Normally, the serial numbers for flush requests and flushes executed should be in sync. When we decide to flush dirty MRs because there are too many of them, we wake up the cleanup thread and let it do its stuff. As a side effect, it increments pool->flush_ser, which leaves it one

Re: [ewg] RE: [ofa-general] OFED Jan 14 meeting summary on RC2readiness

2008-01-16 Thread Roland Dreier
Roland, you said that XRC API is ugly, are you going to push it upstream in its present form? That's a good question. Since there is no 'present form' for XRC as far as I can tell, it's hard to make a definitive answer. Certainly I haven't made up my mind in advance one way or another. In

Re: [ewg] RE: [ofa-general] OFED Jan 14 meeting summary on RC2readiness

2008-01-17 Thread Roland Dreier
Well, I can't speak for everyone, but in my opinion if someone wants to run an MPI job so huge that XRC absolutely has to be used to be able to actually finish it then he should seriously rethink his application design. But where do you think the crossover is where XRC starts to help MPI? In

[ewg] Re: [PATCH 4/4] IB/ehca: Prevent RDMA-related connection failures

2008-01-18 Thread Roland Dreier
thanks, applied 1-4.

Re: [ewg] Re: [PATCH 2/2] fmr_pool_flush didn't flush all MRs

2008-01-18 Thread Roland Dreier
thanks, applied.

Re: [ewg] [PATCH 1/2] fmr_pool flush serials can get out of sync

2008-01-18 Thread Roland Dreier
The corruption happened when the process that allocated the MRs went away in the middle of the operation. We would free the MR and invalidate - and expect the in flight RDMA to error out. RDS does not know who is doing RDMA to or from a MR at any given time. OK, I see. Of course this

Re: [ewg] RDS - Recovering from RDMA errors

2008-01-18 Thread Roland Dreier
When I hit a RDMA error (which happens quite frequently now at rds-stress exit, thanks to the fixed mr pool flushing :) I often see the RDS shutdown_worker getting stuck (and rmmod hangs). It's waiting for allocated WRs to disappear. This usually works, as all WQ entries are flushed

Re: [ewg] RDS - Recovering from RDMA errors

2008-01-20 Thread Roland Dreier
The part on affiliated asynchronous errors says WQ processing is stopped. This also happens if we're signalling a remote access error to the other host. When a RDMA operation errors out because the remote side destroyed the MR, the RDMA WQE completes with error 10 (remote access

Re: [ewg] RDS - Recovering from RDMA errors

2008-01-22 Thread Roland Dreier
BTW, what kind of HCA are you using for this testing? A pair of fairly new Mellanox cards. How new? Is it ConnectX or something older -- ie do you use the ib_mthca or mlx4_ib driver? If you're using mlx4, then I could believe there is a firmware bug that leads to lost completions. -

Re: [ewg] RE: [ofa-general] OFED Jan 14 meeting summary on RC2readiness

2008-01-22 Thread Roland Dreier
I guess you mean just implement XRC without allowing multiple processes to share an XRC domain?  That actually seems like a sensible thing to implement as well... This is part of the current XRC implementation -- just give -1 as the fd value in ibv_open_xrc_domain(). I *think*

[ewg] Re: [PATCH 2/2] IB/ehca: Add PMA support

2008-01-30 Thread Roland Dreier
thanks, applied 1-2

[ewg] Re: [ofa-general] new releases of libibcm and librdmacm libraries

2008-02-05 Thread Roland Dreier
FWIW, I've updated the packages in my Ubuntu PPA; if you're using it, you should get the packages automatically with a normal update once the builds complete. I've also updated the packages I've proposed for inclusion in Debian.

Re: [ewg][PATCH][1/2] SRP multipath failover within 60 seconds,

2008-02-06 Thread Roland Dreier
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 950228f..45a2533 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -400,7 +400,6 @@ printk(KERN_DEBUG PFX "Sending CM DREQ failed\n");

[ewg] Re: [PATCH] IB/ipoib: remove unnecessary allocation

2008-02-14 Thread Roland Dreier
this didn't apply to a current kernel tree but I fixed it up by hand.

[ewg] Re: [PATCH] RDMA/nes: MAC interrupt erroneously masked on ifdown

2008-02-15 Thread Roland Dreier
thanks, applied. one minor request: Signed-off-by: Chien Tung [EMAIL PROTECTED] Signed-off-by: Glenn Streiff [EMAIL PROTECTED] I'm assuming the first signoff line is coming from the actual author here. If so, for patches like this please include a line like From: Chien Tung [EMAIL

[ewg] Re: RDMA/nes: Fix vlan support

2008-02-17 Thread Roland Dreier
Thanks, applied. Overall this is very clean (one of the best formatted patches I've gotten in a while ;), but scripts/checkpatch.pl does complain: WARNING: __func__ should be used instead of gcc specific __FUNCTION__ #83: FILE: drivers/infiniband/hw/nes/nes_nic.c:1504: +

Re: [ewg] Re: [PATCH]IPOIB/CM fix for bug# 906 -OFED-1.3

2008-02-18 Thread Roland Dreier
Hello Eli, I have already submitted this patch to mainline. I will follow up with Roland to get this merged there. I didn't see the submission... can you resend?

[ewg] Re: [ofa-general] Re: [PATCH] IPOIB/CM Increase retry counts for OFED-1.3

2008-02-19 Thread Roland Dreier
The send completion errors indicate the packet hasn't been sent out to the wire. It seems the retries you have added induced a little bit of delay for the packet to be sent out successfully, which might indicate some flow control or other issues in the device transport layer? Actually for

Re: [ewg] how to delete the processed cq from cq channel?

2008-03-18 Thread Roland Dreier
I use ibv_get_cq_event to wait a completion event. I use ibv_ack_cq_events to acknowledge that event every time. But I find I will get the same event many times. Is it the same event, or a new event for the same CQ? I want to delete the entry from cq after processing it.

[ewg] Re: [ANNOUNCE] librdmacm release 1.0.7

2008-03-29 Thread Roland Dreier
FWIW I've updated the Debian and Fedora librdmacm packages as well as my Ubuntu PPA to librdmacm-1.0.7. - R.

Distribution packaging? (was: [ewg] Re: [ANNOUNCE] librdmacm release 1.0.7)

2008-04-01 Thread Roland Dreier
By the way, the current status of my Debian and Fedora packaging efforts for userspace code that I use is the following: libibverbs: libmthca: libmlx4: librdmacm: Up-to-date packages included in Debian and Fedora. libipathverbs: I have Debian packaging

Re: [ewg] [PATCH 0/9] [RFC] Support for Xsigo core services (xscore)

2008-04-04 Thread Roland Dreier
A few quick comments -- I'll look at this in more depth once I've cleared my backlog of things already submitted. - This is an awful lot of code for core services. Am I understanding correctly that this doesn't do anything that a user actually wants without even more code layered on

Re: [ewg] [PATCH 2/9] [RFC] Add sysfs support for xsigo IB

2008-04-04 Thread Roland Dreier
+static struct class_device xsigoib_class_dev; struct class_device is going away in 2.6.26... better to rewrite this to use struct device. - R.

Re: [ewg] [PATCH 1/9] [RFC] Adds the Xsigo unified API for IB and CM access used by the Xsigo virtual (v*) drivers like vnic and vhba

2008-04-04 Thread Roland Dreier
This patch adds the Xsigo unified API for IB and CM access used by the Xsigo virtual (v*) drivers like vnic and vhba. what's wrong with the verbs/CM API we already have? +static inline struct xsigo_ib_connect_info *get_connect_info(int handle) What's the reasoning behind using handles

[ewg] Re: [RFC][1/2] IPoIB UD 4K MTU support

2008-04-04 Thread Roland Dreier
+unsigned int max_ib_mtu; I don't see where this is ever set? - R.

[ewg] Re: [PATCH] IB/ehca: extend query_device() and query_port() to support all values for ibv_devinfo

2008-04-14 Thread Roland Dreier
thanks, applied

Re: [ewg] Re: status of ofed ipoib changes which are not upstream

2008-04-27 Thread Roland Dreier
what about this patch: http://lists.openfabrics.org/pipermail/general/2008-March/048322.html Looks mostly OK, I plan to merge it. ...looking closer, what happens if the send queue has less than 16 entries? (set with send_queue_size on module load) Also maybe I'm missing it but where do you

Re: [ewg] OFED April 21 meeting summary

2008-04-28 Thread Roland Dreier
Also it is very important for us that IPoIB 2 kernel panics will be fixed ( https://bugs.openfabrics.org/show_bug.cgi?id=989, https://bugs.openfabrics.org/show_bug.cgi?id=985) Are either of these panics seen with upstream kernels? If we don't know then this points to a serious problem with

Re: [ofa-general] Re: [ewg] OFED April 21 meeting summary

2008-04-28 Thread Roland Dreier
I just saw this bug report today, but we've had similar crashes. Looks like the problem is that in ipoib_neigh_cleanup() this is done (no locking): neigh = *to_ipoib_neigh(n); then later: spin_lock_irqsave(&priv->lock, flags); if (neigh->ah)

[ewg] Re: [PATCH] IB/ehca: Allocate event queue size depending on max number of CQs and QPs

2008-04-29 Thread Roland Dreier
Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com> Kind of an inadequate changelog ;) Is this a fix or an enhancement or what? +if (atomic_read(&shca->num_cqs) >= ehca_max_cq) { +if (atomic_read(&shca->num_qps) >= ehca_max_qp) { These are racy in the sense that multiple
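
The check-then-act race flagged here can be sketched in portable C11. This is an illustration only: the sketch_* names are hypothetical, and the kernel has its own primitive for this pattern (atomic_add_unless()). The idea is to fold the limit check and the increment into one atomic step, so that two callers cannot both pass the check for the last free slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically take one slot unless the counter has already reached max.
 * Returns true if a slot was reserved, false if the limit was hit. */
static bool sketch_reserve_slot(atomic_int *count, int max)
{
    int cur = atomic_load(count);

    assert(max >= 0);
    while (cur < max) {
        /* On failure, cur is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(count, &cur, cur + 1))
            return true;
    }
    return false;
}

/* Give a previously reserved slot back. */
static void sketch_release_slot(atomic_int *count)
{
    atomic_fetch_sub(count, 1);
}
```

A separate "read the counter, compare, then increment later" sequence lets several threads see the same below-limit value and all proceed; the compare-exchange loop closes that window.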

[ewg] Re: [ PATCH 3/3 ] RDMA/nes SFP+ cleanup

2008-04-29 Thread Roland Dreier
My bad, on the array index idiom. I can redo. Yes, please do resend without that. With regard to post patch clean-ups, I recall you telling me that it was preferred to either front-load or back-load the cleanups in a patch series. Yes, that is true. I generally cleaned-up the

[ewg] Re: [PATCH] IB/ipoib: set child MTU as the parent's

2008-04-29 Thread Roland Dreier
When the child joins the broadcast group reset the mtu to the real one. This changelog is a little too short for me to understand what this is fixing. It seems that child devices are left with a bogus MTU until they complete their multicast join, is that it? +priv->dev->mtu =

[ewg] Re: [REPOST][PATCH] IB/ehca: Allocate event queue size depending on max number of CQs and QPs

2008-04-29 Thread Roland Dreier
thanks, makes sense, applied. fast turnaround too ;)

[ewg] Re: [PATCH] IB/ipoib: set child MTU as the parent's

2008-04-29 Thread Roland Dreier
I think they do but it would require using two fields for flags at the private data. priv->flags would save flags that relate to the state of the net device, and say, priv->cap_flags, to save stuff like LRO, checksum or any other stuff related to capabilities. What do you think? We could

[ewg] Re: [ PATCH 3/3 v2 ] RDMA/nes Formatting cleanup

2008-04-29 Thread Roland Dreier
All looks fine, I applied all three of your patches. Thanks

Re: [ewg] [GIT PULL ofed-1.3.1] libcxgb3 version 1.2.0

2008-04-30 Thread Roland Dreier
Steve -- If you put a tarball (from make dist ;) on openfabrics.org, I'll update the Debian packages.

[ewg] Re: [ofa-general] [PATCH] IB/ipoib: fix net queue lockup

2008-04-30 Thread Roland Dreier
we have seen a few other cases where a large tx queue is needed. I think we should choose a larger default value than the current 64. maybe yes, maybe no... what are the cases where it is needed? The send queue is basically acting as a shock absorber for bursty traffic. If the queue is
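
The "shock absorber" point can be illustrated with a toy model. This is a sketch under simplifying assumptions, not IPoIB's actual transmit path: time advances in ticks, a fixed number of sends complete per tick, the queue is stopped when the ring fills and woken as soon as any slot is free, and packets offered while stopped are simply ignored. All names are hypothetical.

```c
#include <assert.h>

/* Count how often the queue goes from running to stopped.
 * arrivals[t] is the burst offered at tick t; up to drain_per_tick
 * sends complete at the start of each tick. */
static int sketch_count_queue_stops(int ring_size, const int *arrivals,
                                    int nticks, int drain_per_tick)
{
    int in_flight = 0, stopped = 0, stops = 0;

    assert(ring_size > 0 && drain_per_tick > 0);
    for (int t = 0; t < nticks; t++) {
        /* Completions free ring slots; a stopped queue wakes once there is room. */
        in_flight -= (in_flight < drain_per_tick) ? in_flight : drain_per_tick;
        if (stopped && in_flight < ring_size)
            stopped = 0;

        /* Post packets until the burst is consumed or the ring fills. */
        for (int p = 0; p < arrivals[t] && !stopped; p++) {
            in_flight++;
            if (in_flight == ring_size) {
                stopped = 1;
                stops++;
            }
        }
    }
    return stops;
}
```

Feeding the same burst pattern to rings of different sizes shows the trade-off under discussion: a deeper ring rides out bursts with fewer stop/wake cycles, at the cost of more buffering.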

[ewg] Re: [ofa-general] [PATCH] IB/ipoib: fix net queue lockup

2008-05-01 Thread Roland Dreier
I agree, but I want to have a larger buffer to absorb larger peaks. For example, after applying this patch I tested how many times the net queue is stopped and woken up when running four streams of netperf, udp, small packets. When using the default 64 tx queue size it happened 500 times.

[ewg] Re: [PATCH] Request For Comments:

2008-05-06 Thread Roland Dreier
- always do peer2peer and don't let the app choose. This forces the overhead of p2p mode on all apps, but preserves the API. How bad is the overhead? - R.

Re: [ewg] [PATCH] IB/ehca: Protect QP against destroying until all async events for it are handled.

2008-05-07 Thread Roland Dreier
We are not sure if this should be fixed in the driver or in uverbs itself. Roland, what's your opinion about this? Would be nice to be able to fix it in uverbs but I don't see how. In particular a kernel consumer has to have the same guarantee that no async events will come in after destroy

Re: [ewg] [PATCH] IB/ehca: Protect QP against destroying until all async events for it are handled.

2008-05-07 Thread Roland Dreier
I agree, that's why I posted the driver fix first. Glad we agree :) I thought about it a little more and really convinced myself that there is no good way for generic code to handle this race. So, will you apply it next? Yes, will apply it shortly. - R.

Re: [ewg] [PATCH] IB/ehca: Protect QP against destroying until all async events for it are handled.

2008-05-07 Thread Roland Dreier
So I applied this, but thinking about it further, do you (theoretically at least) have the same problem with CQ and SRQ async events? - R.

[ewg] Re: [ofa-general] Re: [PATCH] Request For Comments:

2008-05-07 Thread Roland Dreier
I'm just trying to define the scope of the issue here... so is there any conceivable real-life situation where neither a 0B read nor a 0B write would work, and the connection setup will have to use a 0B send? i'm not sure what you mean by real-life. For the rnics we have: nes -

[ewg] Re: [PATCH] IB/mlx4: Initialize DS field for stamping

2008-06-02 Thread Roland Dreier
This is a fix for your optimize stamping patch, right? I'll roll it into that patch.

[ewg] Re: [PATCH] IB/ipoib: increase ring sizes

2008-06-04 Thread Roland Dreier
Increase IPoIB ring sizes to twice the original size to act as a shock absorber for high traffic peaks. Looks fine but I would like to include a little more motivation in the changelog. What type of workload benefits from this change?

[ewg] Re: IB/ehca: Reject send WRs only for RESET, INIT and RTR state

2008-06-06 Thread Roland Dreier
thanks, applied.

[ewg] Re: [PATCH 2/2] IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts

2008-06-09 Thread Roland Dreier
+if (desc->chip && desc->chip->eoi) +desc->chip->eoi(irq); This doesn't seem like the type of thing a device driver should be doing. Both patches are rather short on changelog text -- what is the issue here and how does this fix it?

[ewg] Re: [PATCH REPOST] IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts

2008-06-10 Thread Roland Dreier
During corner case testing, we noticed that some versions of ehca do not properly transition to interrupt done in special load situations. This can be resolved by periodically triggering EOI through H_EOI, if eqes are pending. So just to be clear: this is a workaround for a

[ewg] Re: [PATCH REPOST] IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts

2008-06-10 Thread Roland Dreier
So just to be clear: this is a workaround for a hardware/firmware bug? Yes it is. OK, so paulus et al... does it seem like a good approach to call H_EOI from driver code (given that this driver makes tons of other hcalls)? How critical is this? Since you said corner case testing I suspect

[ewg] Re: [PATCH REPOST #2] IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts

2008-06-19 Thread Roland Dreier
During corner case testing, we noticed that some versions of ehca do not properly transition to interrupt done in special load situations. This can be resolved by periodically triggering EOI through H_EOI, if eqes are pending. Signed-off-by: Stefan Roscher [EMAIL PROTECTED] ---

Re: [ewg] [PATCH] non srq panic patch

2008-06-25 Thread Roland Dreier
-ipoib_vfree(priv->cm.rx_vmap_srq_ring); +if (ipoib_cm_has_srq(dev)) +ipoib_vfree(priv->cm.rx_vmap_srq_ring); +else +kfree(rx_ring); What tree is this made against? I can't find any reference to ipoib_vfree anywhere in the history of IPoIB so of course
