Re: [ofa-general] ofed1.1 and EL4 2.6.9-67.0.4
Mahmoud Hanafi wrote: I am trying to build ofed1.1 with RedHat EL4 kernel (2.6.9-67.0.4). I get build lots of build errors. Can ofed1.1 be build with this kernel? if so is there a trick that i am missing. Thanks, Mahmoud Hanafi Sr. System Administrator CSC HPC COE Bld. 676 2435 Fifth Street WPAFB, Ohio 45433 (937) 255-1536 Computer Sciences Corporation Registered Office: 2100 East Grand Avenue, El Segundo California 90245, USA Registered in USA No: C-489-59 OFED-1.1 does not support this kernel. You can try OFED-1.2.5.5. Regards, Vladimir ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] ofa_1_3_kernel 20080221-0200 daily build status
This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_3/linux-2.6.git git_branch: ofed_kernel Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-mlx4-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod --with-nes-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.19 Passed on powerpc with linux-2.6.12 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.24 Failed: ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: [patch 5/6] mmu_notifier: Support for drivers with revers maps (f.e. for XPmem)
On Thu, Feb 21, 2008 at 03:20:02PM +1100, Nick Piggin wrote: So why can't you export a device from your xpmem driver, which can be mmap()ed to give out anonymous memory pages to be used for these communication buffers? Because we need to have heap and stack available as well. MPT does not control all the communication buffer areas. I haven't checked, but this is the same problem that IB will have. I believe they are actually allowing any memory region be accessible, but I am not sure of that. Then you should create a driver that the user program can register and unregister regions of their memory with. The driver can do a get_user_pages to get the pages, and then you'd just need to set up some kind of mapping so that userspace can unmap pages / won't leak memory (and an exit_mm notifier I guess). OK. You need to explain this better to me. How would this driver supposedly work? What we have is an MPI library. It gets invoked at process load time to establish its rank-to-rank communication regions. It then turns control over to the processes main(). That is allowed to run until it hits the MPI_Init(argc, argv); The process is then totally under the users control until: MPI_Send(intmessage, m_size, MPI_INT, my_rank+half, tag, MPI_COMM_WORLD); MPI_Recv(intmessage, m_size, MPI_INT, my_rank+half,tag, MPI_COMM_WORLD, status); That is it. That is all our allowed interaction with the users process. Are you saying at the time of the MPI_Send, we should: down_write(current-mm-mmap_sem); Find all the VMAs that describe this region and record their vm_ops structure. Find all currently inserted page table information. Create new VMAs that describe the same regions as before. Insert our special fault handler which merely calls their old fault handler and then exports the page then returns the page to the kernel. Take an extra reference count on the page for each possible remote rank we are exporting this to. That doesn't seem too unreasonable, except when you compare it to how the driver currently works. Remember, this is done from a library which has no insight into what the user has done to its own virtual address space. As a result, each MPI_Send() would result in a system call (or we would need to have a set of callouts for changes to a processes VMAs) which would be a significant increase in communication overhead. Maybe I am missing what you intend to do, but what we need is a means of tracking one processes virtual address space changes so other processes can do direct memory accesses without the need for a system call on each communication event. Because you don't need to swap, you don't need coherency, and you are in control of the areas, then this seems like the best choice. It would allow you to use heap, stack, file-backed, anything. You are missing one point here. The MPI specifications that have been out there for decades do not require the process use a library for allocating the buffer. I realize that is a horrible shortcoming, but that is the world we live in. Even if we could change that spec, we would still need to support the existing specs. As a result, the user can change their virtual address space as they need and still expect communications be cheap. Thanks, Robin ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] [PATCH RFC] ib_mthca: avoid recycling FMR R_Keys too soon
Jack Morgenstein wrote: As long as the underlying mpt index is not played with, there is no requirement that the sequence bits start from 0. Its just sufficient to guarantee that the same (full) key not be allocated twice before performing an unmap/SYNC_TPT. Index: ofed_kernel/drivers/infiniband/hw/mthca/mthca_mr.c === --- ofed_kernel.orig/drivers/infiniband/hw/mthca/mthca_mr.c 2008-02-21 10:32:50.0 +0200 +++ ofed_kernel/drivers/infiniband/hw/mthca/mthca_mr.c 2008-02-21 12:22:54.393777000 +0200 @@ -839,11 +839,6 @@ void mthca_arbel_fmr_unmap(struct mthca_ if (!fmr-maps) return; - key = arbel_key_to_hw_index(fmr-ibmr.lkey); - key = dev-limits.num_mpts - 1; - key = adjust_key(dev, key); - fmr-ibmr.lkey = fmr-ibmr.rkey = arbel_hw_index_to_key(key); - fmr-maps = 0; *(u8 *) fmr-mem.arbel.mpt = MTHCA_MPT_STATUS_SW; == This can be done for mlx4 (mlx4_fmr_unmap) and tavor (mthca_tavor_fmr_unmap) as well. As far as I understand under Sinai you must issue an adjust_key call when the key is about to wraparound, correct? Or. commit 608d8268be392444f825b4fc8fc7c8b509627129 Author: Michael S. Tsirkin [EMAIL PROTECTED] Date: Mon Apr 16 17:04:55 2007 +0300 IB/mthca: Fix data corruption after FMR unmap on Sinai In mthca_arbel_fmr_unmap(), the high bits of the key are masked off. This gets rid of the effect of adjust_key(), which makes sure that bits 3 and 23 of the key are equal when the Sinai throughput optimization is enabled, and so it may happen that an FMR will end up with bits 3 and 23 in the key being different. This causes data corruption, because when enabling the throughput optimization, the driver promises the HCA firmware that bits 3 and 23 of all memory keys will always be equal. Fix by re-applying adjust_key() after masking the key. Thanks to Or Gerlitz for reproducing the problem, and Ariel Shahar for help in debug. Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED] Signed-off-by: Roland Dreier [EMAIL PROTECTED] ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] [PATCH 2.6 4/8] infiniband/hw/nes/nes.c: fix a check-after-use
From: Adrian Bunk [EMAIL PROTECTED] This patch fixes a check-after-use spotted by the Coverity checker. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] Signed-off-by: Glenn Streiff [EMAIL PROTECTED] --- drivers/infiniband/hw/nes/nes.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c index 7f8853b..b2112f5 100644 --- a/drivers/infiniband/hw/nes/nes.c +++ b/drivers/infiniband/hw/nes/nes.c @@ -567,12 +567,12 @@ static int __devinit nes_probe(struct pci_dev *pcidev, const struct pci_device_i /* Init the adapter */ nesdev-nesadapter = nes_init_adapter(nesdev, hw_rev); - nesdev-nesadapter-et_rx_coalesce_usecs_irq = interrupt_mod_interval; if (!nesdev-nesadapter) { printk(KERN_ERR PFX Unable to initialize adapter.\n); ret = -ENOMEM; goto bail5; } + nesdev-nesadapter-et_rx_coalesce_usecs_irq = interrupt_mod_interval; /* nesdev-base_doorbell_index = nesdev-nesadapter-pd_config_base[PCI_FUNC(nesdev-pcidev-devfn)]; */ ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] [PATCH 2.6 1/8] infiniband/hw/nes/nes_verbs.c: address dead code warning in nes_verbs.c
From: Chien Tung [EMAIL PROTECTED] Adrian Bunk found some apparently dead code in nes_verbs.c after a coverity review that really shouldn't have been dead. The function nes_create_cq() was missing the following assignment err = 1; just prior to an iteration that conditionally set err = 0 if a PBL was found for a given virtual CQ. Also noticed we should have been returning -EFAULT on a couple related error paths. Signed-off-by: Chien Tung [EMAIL PROTECTED] Signed-off-by: Glenn Streiff [EMAIL PROTECTED] --- drivers/infiniband/hw/nes/nes_verbs.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c index 4dafbe1..201b95e 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -1327,7 +1327,7 @@ static struct ib_qp *nes_create_qp(struct ib_pd *ibpd, (long long unsigned int)req.user_wqe_buffers); nes_free_resource(nesadapter, nesadapter-allocated_qps, qp_num); kfree(nesqp-allocated_buffer); - return ERR_PTR(-ENOMEM); + return ERR_PTR(-EFAULT); } } @@ -1674,6 +1674,7 @@ static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries, } nes_debug(NES_DBG_CQ, CQ Virtual Address = %08lX, size = %u.\n, (unsigned long)req.user_cq_buffer, entries); + err = 1; list_for_each_entry(nespbl, nes_ucontext-cq_reg_mem_list, list) { if (nespbl-user_base == (unsigned long )req.user_cq_buffer) { list_del(nespbl-list); @@ -1686,7 +1687,7 @@ static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries, if (err) { nes_free_resource(nesadapter, nesadapter-allocated_cqs, cq_num); kfree(nescq); - return ERR_PTR(err); + return ERR_PTR(-EFAULT); } pbl_entries = nespbl-pbl_size 3; ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] [PATCH 2.6 2/8] infiniband/hw/nes/nes_verbs.c: fix off-by-one
From: Adrian Bunk [EMAIL PROTECTED] This patch fixes an off-by-one spotted by the Coverity checker. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] Signed-off-by: Glenn Streiff [EMAIL PROTECTED] --- drivers/infiniband/hw/nes/nes_verbs.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c index 201b95e..692f0d8 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -929,7 +929,7 @@ static struct ib_pd *nes_alloc_pd(struct ib_device *ibdev, NES_MAX_USER_DB_REGIONS, nesucontext-first_free_db); nes_debug(NES_DBG_PD, find_first_zero_biton doorbells returned %u, mapping pd_id %u.\n, nespd-mmap_db_index, nespd-pd_id); - if (nespd-mmap_db_index NES_MAX_USER_DB_REGIONS) { + if (nespd-mmap_db_index = NES_MAX_USER_DB_REGIONS) { nes_debug(NES_DBG_PD, mmap_db_index MAX\n); nes_free_resource(nesadapter, nesadapter-allocated_pds, pd_num); kfree(nespd); ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] [PATCH 2.6 5/8] infiniband/hw/nes/nes_verbs.c: fix use-after-free
Adrian Bunk flagged this check-after-use issue spotted by the Coverity checker. Signed-off-by: Glenn Streiff [EMAIL PROTECTED] --- drivers/infiniband/hw/nes/nes_verbs.c |3 --- 1 files changed, 0 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c index 692f0d8..a651e9d 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -1832,9 +1832,6 @@ static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries, spin_unlock_irqrestore(nesdev-cqp.lock, flags); } } - nes_debug(NES_DBG_CQ, iWARP CQ%u create timeout expired, major code = 0x%04X, -minor code = 0x%04X\n, - nescq-hw_cq.cq_number, cqp_request-major_code, cqp_request-minor_code); if (!context) pci_free_consistent(nesdev-pcidev, nescq-cq_mem_size, mem, nescq-hw_cq.cq_pbase); ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] [PATCH 2.6 6/8] RDMA/nes: Fix rdma connection establishment on big-endian platforms
From: Faisal Latif [EMAIL PROTECTED] With commit ef19454bd437b2ba, behavior of crc32c changes on big-endian platforms. Our algorithm expects previous behavior otherwise we have rdma connection establishment failure on big-endian platforms like ppc64. Applying cpu_to_le32() to value returned by crc32c() to get previous behavior. Signed-off-by: Faisal Latif [EMAIL PROTECTED] Signed-off-by: Glenn Streiff [EMAIL PROTECTED] --- drivers/infiniband/hw/nes/nes.h| 14 ++ drivers/infiniband/hw/nes/nes_cm.c |5 +++-- 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h index fd57e8a..b0d3c52 100644 --- a/drivers/infiniband/hw/nes/nes.h +++ b/drivers/infiniband/hw/nes/nes.h @@ -285,6 +285,20 @@ struct nes_device { }; +static inline u32 get_crc_value(struct nes_v4_quad* nes_quad) +{ + u32 crc_value; + crc_value = crc32c(~0, (void *)nes_quad, sizeof (struct nes_v4_quad)); + + /* + * With commit ef19454bd437b2ba, behavior of crc32c changes on + * big-endian platforms. Our algorithm expects previous behavior + * otherwise we have rdma connection establishment issue on ppc64. + */ + crc_value = cpu_to_le32(crc_value); + return crc_value; +} + static inline void set_wqe_64bit_value(__le32 *wqe_words, u32 index, u64 value) { diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index 6c298aa..1f042d1 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -2320,6 +2320,7 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) struct iw_cm_event cm_event; struct nes_hw_qp_wqe *wqe; struct nes_v4_quad nes_quad; + u32 crc_value; int ret; ibqp = nes_get_qp(cm_id-device, conn_param-qpn); @@ -2436,8 +2437,8 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) nes_quad.TcpPorts[1] = cm_id-local_addr.sin_port; /* Produce hash key */ - nesqp-hte_index = cpu_to_be32( - crc32c(~0, (void *)nes_quad, sizeof(nes_quad)) ^ 0x); + crc_value = get_crc_value(nes_quad); + nesqp-hte_index = cpu_to_be32(crc_value ^ 0x); nes_debug(NES_DBG_CM, HTE Index = 0x%08X, CRC = 0x%08X\n, nesqp-hte_index, nesqp-hte_index adapter-hte_index_mask); ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] [PATCH 2.6 7/8] RDMA/nes: Fix rdma connection establishment on big-endian platforms
From: Faisal Latif [EMAIL PROTECTED] With commit ef19454bd437b2ba, behavior of crc32c changes on big-endian platforms. Our algorithm expects previous behavior otherwise we have rdma connection establishment failure on big-endian platforms like ppc64. Applying cpu_to_le32() to value returned by crc32c() to get previous behavior. Signed-off-by: Faisal Latif [EMAIL PROTECTED] Signed-off-by: Glenn Streiff [EMAIL PROTECTED] --- drivers/infiniband/hw/nes/nes.h| 14 ++ drivers/infiniband/hw/nes/nes_cm.c |5 +++-- 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h index fd57e8a..b0d3c52 100644 --- a/drivers/infiniband/hw/nes/nes.h +++ b/drivers/infiniband/hw/nes/nes.h @@ -285,6 +285,20 @@ struct nes_device { }; +static inline u32 get_crc_value(struct nes_v4_quad* nes_quad) +{ + u32 crc_value; + crc_value = crc32c(~0, (void *)nes_quad, sizeof (struct nes_v4_quad)); + + /* + * With commit ef19454bd437b2ba, behavior of crc32c changes on + * big-endian platforms. Our algorithm expects previous behavior + * otherwise we have rdma connection establishment issue on ppc64. + */ + crc_value = cpu_to_le32(crc_value); + return crc_value; +} + static inline void set_wqe_64bit_value(__le32 *wqe_words, u32 index, u64 value) { diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index 6c298aa..1f042d1 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -2320,6 +2320,7 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) struct iw_cm_event cm_event; struct nes_hw_qp_wqe *wqe; struct nes_v4_quad nes_quad; + u32 crc_value; int ret; ibqp = nes_get_qp(cm_id-device, conn_param-qpn); @@ -2436,8 +2437,8 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) nes_quad.TcpPorts[1] = cm_id-local_addr.sin_port; /* Produce hash key */ - nesqp-hte_index = cpu_to_be32( - crc32c(~0, (void *)nes_quad, sizeof(nes_quad)) ^ 0x); + crc_value = get_crc_value(nes_quad); + nesqp-hte_index = cpu_to_be32(crc_value ^ 0x); nes_debug(NES_DBG_CM, HTE Index = 0x%08X, CRC = 0x%08X\n, nesqp-hte_index, nesqp-hte_index adapter-hte_index_mask); ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] [PATCH 2.6 8/8] RDMA/nes: Fix interrupt moderation low threshold
From: John Lacombe [EMAIL PROTECTED] Interrupt moderation low threshold value was incorrectly triggering, indicating that the threshold should be lowered. The impact was the timer was likely to become 40usecs and get stuck there. The biggest side effect was too many interrupts and nonoptimal performance. Signed-off-by: John Lacombe [EMAIL PROTECTED] Signed-off-by: Glenn Streiff [EMAIL PROTECTED] --- drivers/infiniband/hw/nes/nes_hw.c | 12 drivers/infiniband/hw/nes/nes_hw.h |2 +- 2 files changed, 5 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 7c4c0fb..6b677b5 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -156,15 +156,13 @@ static void nes_nic_tune_timer(struct nes_device *nesdev) spin_lock_irqsave(nesadapter-periodic_timer_lock, flags); - if (shared_timer-cq_count_old cq_count) { - if (cq_count shared_timer-threshold_low) - shared_timer-cq_direction_downward=0; - } - if (shared_timer-cq_count_old = cq_count) + if (shared_timer-cq_count_old = cq_count) + shared_timer-cq_direction_downward = 0; + else shared_timer-cq_direction_downward++; shared_timer-cq_count_old = cq_count; if (shared_timer-cq_direction_downward NES_NIC_CQ_DOWNWARD_TREND) { - if (cq_count = shared_timer-threshold_low) { + if (cq_count = shared_timer-threshold_low (shared_timer-threshold_low 4)) { shared_timer-threshold_low = shared_timer-threshold_low/2; shared_timer-cq_direction_downward=0; nesdev-currcq_count = 0; @@ -1728,7 +1726,6 @@ int nes_napi_isr(struct nes_device *nesdev) nesdev-int_req = ~NES_INT_TIMER; nes_write32(nesdev-regs+NES_INTF_INT_MASK, ~(nesdev-intf_int_req)); nes_write32(nesdev-regs+NES_INT_MASK, ~nesdev-int_req); - nesadapter-tune_timer.timer_in_use_old = 0; } nesdev-deepcq_count = 0; return 1; @@ -1867,7 +1864,6 @@ void nes_dpc(unsigned long param) nesdev-int_req = ~NES_INT_TIMER; nes_write32(nesdev-regs + NES_INTF_INT_MASK, ~(nesdev-intf_int_req)); nes_write32(nesdev-regs+NES_INT_MASK, ~nesdev-int_req); - nesdev-nesadapter-tune_timer.timer_in_use_old = 0; } else { nes_write32(nesdev-regs+NES_INT_MASK, 0x|(~nesdev-int_req)); } diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h index 1e10df5..b7e2844 100644 --- a/drivers/infiniband/hw/nes/nes_hw.h +++ b/drivers/infiniband/hw/nes/nes_hw.h @@ -962,7 +962,7 @@ struct nes_arp_entry { #define DEFAULT_JUMBO_NES_QL_LOW12 #define DEFAULT_JUMBO_NES_QL_TARGET 40 #define DEFAULT_JUMBO_NES_QL_HIGH 128 -#define NES_NIC_CQ_DOWNWARD_TREND 8 +#define NES_NIC_CQ_DOWNWARD_TREND 16 struct nes_hw_tune_timer { //u16 cq_count; ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
RE: [ofa-general] [PATCH 2.6 6/8] RDMA/nes: Fix rdma connectionestablishment on big-endian platforms
You'll notice I have two 6/8 patches. This is the bogus one. No one told me this job required ability to count. Glenn -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of [EMAIL PROTECTED] Sent: Thursday, February 21, 2008 8:30 AM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; general@lists.openfabrics.org; Faisal Latif Subject: [ofa-general] [PATCH 2.6 6/8] RDMA/nes: Fix rdma connectionestablishment on big-endian platforms From: Faisal Latif [EMAIL PROTECTED] With commit ef19454bd437b2ba, behavior of crc32c changes on big-endian platforms. Our algorithm expects previous behavior otherwise we have rdma connection establishment failure on big-endian platforms like ppc64. Applying cpu_to_le32() to value returned by crc32c() to get previous behavior. Signed-off-by: Faisal Latif [EMAIL PROTECTED] Signed-off-by: Glenn Streiff [EMAIL PROTECTED] --- drivers/infiniband/hw/nes/nes.h| 14 ++ drivers/infiniband/hw/nes/nes_cm.c |5 +++-- 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h index fd57e8a..b0d3c52 100644 --- a/drivers/infiniband/hw/nes/nes.h +++ b/drivers/infiniband/hw/nes/nes.h @@ -285,6 +285,20 @@ struct nes_device { }; +static inline u32 get_crc_value(struct nes_v4_quad* nes_quad) +{ + u32 crc_value; + crc_value = crc32c(~0, (void *)nes_quad, sizeof (struct nes_v4_quad)); + + /* + * With commit ef19454bd437b2ba, behavior of crc32c changes on + * big-endian platforms. Our algorithm expects previous behavior + * otherwise we have rdma connection establishment issue on ppc64. + */ + crc_value = cpu_to_le32(crc_value); + return crc_value; +} + static inline void set_wqe_64bit_value(__le32 *wqe_words, u32 index, u64 value) { diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index 6c298aa..1f042d1 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -2320,6 +2320,7 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) struct iw_cm_event cm_event; struct nes_hw_qp_wqe *wqe; struct nes_v4_quad nes_quad; + u32 crc_value; int ret; ibqp = nes_get_qp(cm_id-device, conn_param-qpn); @@ -2436,8 +2437,8 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) nes_quad.TcpPorts[1] = cm_id-local_addr.sin_port; /* Produce hash key */ - nesqp-hte_index = cpu_to_be32( - crc32c(~0, (void *)nes_quad, sizeof(nes_quad)) ^ 0x); + crc_value = get_crc_value(nes_quad); + nesqp-hte_index = cpu_to_be32(crc_value ^ 0x); nes_debug(NES_DBG_CM, HTE Index = 0x%08X, CRC = 0x%08X\n, nesqp-hte_index, nesqp-hte_index adapter-hte_index_mask); ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: [PATCH] mmu notifiers #v6
On Thu, Feb 21, 2008 at 05:54:30AM +0100, Nick Piggin wrote: will send you incremental changes that can be discussed more easily that way (nothing major, mainly style and minor things). I don't need to say you're very welcome ;). I agree: your coherent, non-sleeping mmu notifiers are pretty simple and unintrusive. The sleeping version is fundamentally going to either need to change VM locks, or be non-coherent, so I don't think there is a question of making one solution fit everybody. So the sleeping / xrmap patch should be kept either completely independent, or as an add-on to this one. The need to change the VM locks to fit the sleepable mmu notifier needs, I think is the major reason why the sleeping patch should be a separate config option unless you think the i_mmap_lock will benefit the VM for its own good regardless of the sleepable mmu notifiers. Otherwise we'll end up merging in mainline an API that can only satisfy the needs of the sleeping users that are only interested about anonymous memory. While the basic concept of the mmu notifiers is to cover the whole user visible address space, not just anonymous memory! Furthermore XPMEM users already asked to work on tmpfs/MAP_SHARED too... Originally the trick that I was trying to remove the atomic param, was to defer the invalidate_range after dropping the i_mmap_lock. But clearly in truncate we'll have no more guarantees that nor the vma nor the MM still exists after spin_unlock(i_mmap_lock) is called... So it's simply impossible to call the mmu notifier out of the i_mmap_lock for truncate, and Christoph's patch looks unfixable without altering the VM core locking. Christoph's API one-config-fits-all can't really fit-all, but only the anonymous memory. However if I wear a KVM hat, I cannot care less what is merged as long as .25 will be able to fully swap reliably a virtualized guest OS ;). This is why I'm totally willing to support any decision in favor of anything (including your own patch that would only work for KVM) that can be merged. I will post some suggestions to you when I get a chance. I really want suggestions on Jack's concern about issuing an invalidate per pte entry or per-pte instead of per-range. I'll answer that in a separate email. For KVM my patch is already close to optimal because each single spte invalidate requires a fixed amount of work, but for GRU a large invalidate-range would be more efficient. To address the GRU _valid_ concern, I can create a second version of my patch with range_begin/end instead of invalidate_pages, that still won't support sleeping users like XPMEM but only KVM and GRU. Then it's up to Christoph when he comes back to alter the vm locking so that those calls can sleep too... But that will require a much bigger change and then perhaps xpmem can share the same mmu notifiers when the config option to make the mmu notifier sleepable is enabled. But that part would better be incremental as it's not so obviously safe to merge as the mmu notifier themself. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] There is no cheaper source of original and perfectly working software.
Don't waste time waiting for delivery of your software on a CD. Download and install it immediately. Choose the program you need from more than 270 programs in many languages. Accept this brilliant offer and take the advantage of our free installation consultations. Money back guarantee is available. http://geocities.com/stephenmiddleton91 Check our site for discounts! ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Save 80% on your pills. Discount Code #MyZZp
Hi openib-general, be wise, purchase your meds from the most well-known online store since 1996. http://www.google.com/pagead/iclk?sa=lai=cIFhPwnum=17719adurl=http://igrementsit.com jedidiah duke ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] [PATCH RFC] ib_mthca: avoid recycling FMR R_Keys too soon
On Thursday 21 February 2008 13:42, Or Gerlitz wrote: As far as I understand under Sinai you must issue an adjust_key call when the key is about to wraparound, correct? Or. Actually, its not related to wraparound. The key adjustment is in the mpt-index section only, and does not affect the sequence number section. If we don't re-initialize the key, adjust_key should not be called. - Jack ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] [PATCH RFC] ib_mthca: avoid recycling FMR R_Keys too soon
Jack Morgenstein wrote: On Thursday 21 February 2008 13:42, Or Gerlitz wrote: As far as I understand under Sinai you must issue an adjust_key call when the key is about to wraparound, correct? Actually, its not related to wraparound. The key adjustment is in the mpt-index section only, and does not affect the sequence number section. If we don't re-initialize the key, adjust_key should not be called. Is it possible to never re-initialize the key? if yes, what's the semantics of the M=max_map_per_fmr device attribute? I was thinking that after the fmr was mapped M times, something --has-- to be reinitialized, sorry if this is my misunderstanding, can you clarify that? Or commit d4cb0784fd1ea99ef3d20526811bd5608146fe60 Author: Or Gerlitz [EMAIL PROTECTED] Date: Sat Jun 17 20:37:37 2006 -0700 IB/mthca: Fill in max_map_per_fmr device attribute Report the true max_map_per_fmr value from mthca_query_device(), taking into account the change in FMR remapping introduced by the Sinai performance optimization. Signed-off-by: Or Gerlitz [EMAIL PROTECTED] Signed-off-by: Roland Dreier [EMAIL PROTECTED] diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c index a2eae8a..8f89ba7 100644 --- a/drivers/infiniband/hw/mthca/mthca_provider.c +++ b/drivers/infiniband/hw/mthca/mthca_provider.c @@ -115,6 +115,16 @@ static int mthca_query_device(struct ib_device *ibdev, props-max_mcast_qp_attach = MTHCA_QP_PER_MGM; props-max_total_mcast_qp_attach = props-max_mcast_qp_attach * props-max_mcast_grp; + /* +* If Sinai memory key optimization is being used, then only +* the 8-bit key portion will change. For other HCAs, the +* unused index bits will also be used for FMR remapping. +*/ + if (mdev-mthca_flags MTHCA_FLAG_SINAI_OPT) + props-max_map_per_fmr = 255; + else + props-max_map_per_fmr = + (1 (32 - long_log2(mdev-limits.num_mpts))) - 1; err = 0; out: ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] [2.6 patch] infiniband/hw/nes/nes_verbs.c: fix off-by-one
On Thu, Feb 21, 2008 at 06:39:45AM -0600, Glenn Streiff wrote: No, 51af33e8 was for a similar same bug 400 lines below this bug... Heh, sorry. Glenn -- please review Adrian's patches and let me know which ones are good to apply. I went ahead and created a patch series and attributed Adrian for the patches of his I liked. There were a couple that I tweaked. Wasn't sure if all the hunks would apply nicely after that if we mixed and matched his and mine, hence the series. Hope that's okay. Should I have gotten his ack for the ones I rewrote? The fixes were pretty small so I figured they didn't really need more review. ... Looking at the patches what you did seems OK. But regarding review I have a different criticism directed at Roland: This driver should really have gotten some review before being included in the kernel. Even a simple checkpatch run finds more than 250 stylistic errors (not code bugs but cases where the driver violates the standard code formatting rules of kernel code). And I'm not talking about the 2000 checkpatch warnings that are mostly about too long lines (which should arguably also be fixed). And many more issues that could have been foung during a review. E.g. when you look at 3/8 from this series the code if (!cm_node) return -EINVAL; new_send = kzalloc(sizeof(*new_send), GFP_ATOMIC); if (!new_send) return -1; doesn't look good since the -1 should most likely better be something like -ENOMEM (I haven't checked whether you can immediately change it at this specific place). And these are just comments from someone with zero knowledge about InfiniBand, but I'd expect InfiniBand-specifig bugs might be found before they hit users if an InfiniBand maintainer would review the complete driver. Note that this is not meant as a criticism against Glenn - it's normal that submitted code contains bugs, but a code review can help to cope with this. Glenn cu Adrian -- Is there not promise of rain? Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. Only a promise, Lao Er said. Pearl S. Buck - Dragon Seed ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: [PATCH] mmu notifiers #v6
I really want suggestions on Jack's concern about issuing an invalidate per pte entry or per-pte instead of per-range. I'll answer that in a separate email. For KVM my patch is already close to optimal because each single spte invalidate requires a fixed amount of work, but for GRU a large invalidate-range would be more efficient. To address the GRU _valid_ concern, I can create a second version of my patch with range_begin/end instead of invalidate_pages, that still I don't know how much significance to place on this data, but it is a real data point. I ran the GRU regression test suite on kernels with both types of mmu_notifiers. The kernel/driver using Christoph's patch had 1/7 the number of TLB invalidates as Andrea's patch. This reduction is due to both differences I mentioned yesterday: - different location of callout for address space teardown - range callouts Unfortunately, the current driver does not allow me to quantify which of the differences is most significant. Also, I'll try to post the driver within the next few days. It is still in development but it compiles and can successfully run most workloads on a system simulator. --- jack ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
RE: [ofa-general] post_recv question
I have a question regarding exactly _when_ a posted recv buffer is available for the HW to use: Consider that the post_recv methods usually just program a hw-specific WR in the RQ, then ring a doorbell, then return. There is a delta period between when the app returns from the post_recv call and when the HW actually DMA's the WR and programs up the HW to enable that buffer. (I'm assumming a specific HW design here, but I _think_ most HW behaves this way?). If this is all true, then from the apps point of view, the buffer isn't really available when it returns from post_recv. This can lead to conditions where the app advertises that recv buffer to the peer via some out of band channel, and the peer posts a SEND which arrives _before_ the HW has actually setup the RECV buffer. I'm really not following the question here. When you say that the app advertises the buffer, are you saying that it sends some sort of credit that a receive is posted? I would fully expect the receive buffer to be available to receive data before post_recv returns, but I not sure what race you're referring to. Are you suggesting that this isn't the case? - Sean ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
When you post to a receive queue, the buffer does not belong the application until after it returned to the app. While it belongs to the HW, the hardware may use it to write the contents of send messages targeting the QP - during this time the driver should no use this buffer. When you poll the CQ referenced by the receive queue, you will get a CQE with the work request id of the buffer you posted and then you know the buffer is back in the driver ownership. On Thu, 2008-02-21 at 09:41 -0600, Steve Wise wrote: Hey all: I have a question regarding exactly _when_ a posted recv buffer is available for the HW to use: Consider that the post_recv methods usually just program a hw-specific WR in the RQ, then ring a doorbell, then return. There is a delta period between when the app returns from the post_recv call and when the HW actually DMA's the WR and programs up the HW to enable that buffer. (I'm assumming a specific HW design here, but I _think_ most HW behaves this way?). If this is all true, then from the apps point of view, the buffer isn't really available when it returns from post_recv. This can lead to conditions where the app advertises that recv buffer to the peer via some out of band channel, and the peer posts a SEND which arrives _before_ the HW has actually setup the RECV buffer. Granted, this hole is small, but does it exist for nes, mthca, ehca, and ipath libs/drivers? Or do they _not_ have this issue? Does the IBTA spec discuss this at all? Most importantly, does the IBTA spec and/or the iWARP verbs spec _mandate_ that the buffer is actually available when the post_recv() method returns (I didn't find it in the iWARP spec)? If such a mandate exists, then it would force post_recv() methods to stall and/or somehow know when the HW has completed setting up the recv buffer. This would kill performance IMO and I think no such mandate exists, but I wanted to know what others think. Maybe this isn't an issue with mthca/ehca/ipath/nes? Thanks, Steve. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
Eli Cohen wrote: When you post to a receive queue, the buffer does not belong the application until after it returned to the app. While it belongs to the HW, the hardware may use it to write the contents of send messages targeting the QP - during this time the driver should no use this buffer. When you poll the CQ referenced by the receive queue, you will get a CQE with the work request id of the buffer you posted and then you know the buffer is back in the driver ownership. This has nothing to do with my questions. On Thu, 2008-02-21 at 09:41 -0600, Steve Wise wrote: Hey all: I have a question regarding exactly _when_ a posted recv buffer is available for the HW to use: Consider that the post_recv methods usually just program a hw-specific WR in the RQ, then ring a doorbell, then return. There is a delta period between when the app returns from the post_recv call and when the HW actually DMA's the WR and programs up the HW to enable that buffer. (I'm assumming a specific HW design here, but I _think_ most HW behaves this way?). If this is all true, then from the apps point of view, the buffer isn't really available when it returns from post_recv. This can lead to conditions where the app advertises that recv buffer to the peer via some out of band channel, and the peer posts a SEND which arrives _before_ the HW has actually setup the RECV buffer. Granted, this hole is small, but does it exist for nes, mthca, ehca, and ipath libs/drivers? Or do they _not_ have this issue? Does the IBTA spec discuss this at all? Most importantly, does the IBTA spec and/or the iWARP verbs spec _mandate_ that the buffer is actually available when the post_recv() method returns (I didn't find it in the iWARP spec)? If such a mandate exists, then it would force post_recv() methods to stall and/or somehow know when the HW has completed setting up the recv buffer. This would kill performance IMO and I think no such mandate exists, but I wanted to know what others think. Maybe this isn't an issue with mthca/ehca/ipath/nes? Thanks, Steve. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] [PATCH RFC] ib_mthca: avoid recycling FMR R_Keys too soon
On Thursday 21 February 2008 17:49, Or Gerlitz wrote: Is it possible to never re-initialize the key? if yes, what's the semantics of the M=max_map_per_fmr device attribute? I was thinking that after the fmr was mapped M times, something --has-- to be reinitialized, sorry if this is my misunderstanding, can you clarify that? It does not have to be re-initialized. However, the cache needs to be flushed (SYNC_TPT), so that we do not have the same 32-bit key multiple times in the cache. The something which must be done is to flush the cache. Once the cache is flushed, we again have max_map_per_fmr remap possibilities, and we don't care what the initial sequence value is. However, the index value MUST be the same as it was before. - Jack ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
Sean Hefty wrote: I have a question regarding exactly _when_ a posted recv buffer is available for the HW to use: Consider that the post_recv methods usually just program a hw-specific WR in the RQ, then ring a doorbell, then return. There is a delta period between when the app returns from the post_recv call and when the HW actually DMA's the WR and programs up the HW to enable that buffer. (I'm assumming a specific HW design here, but I _think_ most HW behaves this way?). If this is all true, then from the apps point of view, the buffer isn't really available when it returns from post_recv. This can lead to conditions where the app advertises that recv buffer to the peer via some out of band channel, and the peer posts a SEND which arrives _before_ the HW has actually setup the RECV buffer. I'm really not following the question here. When you say that the app advertises the buffer, are you saying that it sends some sort of credit that a receive is posted? Yes. I would fully expect the receive buffer to be available to receive data before post_recv returns, but I not sure what race you're referring to. Are you suggesting that this isn't the case? That is what I'm suggesting. Here is the timing sequence: t0: app calls post_recv t1: post_recv code builds a hw-specific WR in the hw work queue t2: post_recv code rings a doorbell (write to adapter mem or register) t3: post_recv returns t4: app assumes the buffer is ready t5: device HW dma engine moves the WR to adapter memory t6: device FW prepares the HW RQ entry making the buffer available. Note at time t4, the application thinks its ready, but its really not ready until t6. This clearly is a implementation-specific issue. But I was under the assumption that all the RDMA HW behaves this way. Maybe not? To further complicate things, this race condition is never seen _if_ the application uses the same QP to advertise (send a credit allowing the peer to SEND) the RECV buffer availability. So if the app posts a SEND after the RECV is posted and that SEND allows the peer access to the RECV buffer, then everything is ok. This is due to the fact that the FW/HW will process the SEND only after processing the RECV. If the app uses a different QP to post the SEND advertising the RECV, then the race condition exists allowing the peer to SEND into that RECV buffer before the HW makes it ready. This all assumes a specific design of rdma hw. Maybe nobody else has this issue? Maybe I'm not making sense. :) Steve. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Show your loved one you care, help them quit smoking
Stop killing yourself with cigarettes! Cigarettes are filled with poisonous toxic chemicals that will take years off your life. Take the step towards changing your life now with Live_Free Patches, guaranted to help you stop smoking and give you a new lease on life! I'm happy to report that it worked and that I haven't had a smoke in over 11 months. Thanks for giving me my life back. Scott, CO Take Note!! Special discount prices now in effect, but won't last long! Click here for more information ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
On Thu, 21 Feb 2008, Steve Wise wrote: Hey all: I have a question regarding exactly _when_ a posted recv buffer is available for the HW to use: Consider that the post_recv methods usually just program a hw-specific WR in the RQ, then ring a doorbell, then return. There is a delta period between when the app returns from the post_recv call and when the HW actually DMA's the WR and programs up the HW to enable that buffer. (I'm assumming a specific HW design here, but I _think_ most HW behaves this way?). If this is all true, then from the apps point of view, the buffer isn't really available when it returns from post_recv. This can lead to conditions where the app advertises that recv buffer to the peer via some out of band channel, and the peer posts a SEND which arrives _before_ the HW has actually setup the RECV buffer. Granted, this hole is small, but does it exist for nes, mthca, ehca, and ipath libs/drivers? Or do they _not_ have this issue? Does the IBTA spec discuss this at all? Most importantly, does the IBTA spec and/or the iWARP verbs spec _mandate_ that the buffer is actually available when the post_recv() method returns (I didn't find it in the iWARP spec)? If such a mandate exists, then it would force post_recv() methods to stall and/or somehow know when the HW has completed setting up the recv buffer. This would kill performance IMO and I think no such mandate exists, but I wanted to know what others think. Maybe this isn't an issue with mthca/ehca/ipath/nes? Thanks, Steve. From the RDMA application perspective, the application has to assume that when post_recv() returns, the RECV WR is on the QP's recv queue since there are no APIs for the application to query the availability/eligibility of a RECV. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
On Thursday 21 February 2008 18:34, Steve Wise wrote: This clearly is a implementation-specific issue. Â But I was under the assumption that all the RDMA HW behaves this way. Â Maybe not? For RDMA operations, NO receive WQE needs to be posted. The rdma target is a memory region, with an rkey. The target advertises the rkey and the address, and the source posts an rdma operation using the target data. No completion is generated on the target (if there is no immediate data in the send). - Jack ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] [PATCH RFC] ib_mthca: avoid recycling FMR R_Keys too soon
On Thursday 21 February 2008 12:42, Or Gerlitz wrote: As far as I understand under Sinai you must issue an adjust_key call when the key is about to wraparound, correct? I don't think so. On Arbel, ib_mthca uses the entire upper part of the 32bit word as the sequence counter. Is SINAI_OPT is set, the sequence counter is in bits 25-31 (17-24 seem to be reserved, and bit 3 is mirrored to bit 23 - this is what adjust_key seems to be doing). If SINAI_OPT is not set, the sequence counter is in bits 17-31, and adjust_key is a no-nop. So when the sequence counter overflows, it doesn't spill into any reserved bit. At least that's how I read the code. Olaf -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play [EMAIL PROTECTED] |/ | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] uDAPL libdat2.so version # problem for today's OFED code
Tang, Changqing wrote: Arlin: Here is another question. The /etc/dat.conf is: OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 ib0 0 OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 ib1 0 ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 ib0 0 ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 ib1 0 A simple code just call dat_registry_list_prodivers() to get the list in /etc/dat.conf, and call dat_ia_openv() in a loop of above list. If I compile and link this code with /usr/include/dat2 and libdat2.so, dat_ia_openv() return DAT_SUCCESS for all four entries. You should be using the dat_ia_open and not the dat_ia_openv. You are setting the MAJOR and MINOR versions according to the query and not based on your build so the open always return's SUCCESS. see dat.h for definition: #define dat_ia_open(name, qlen, async_evd, ia) \ dat_ia_openv((name), (qlen), (async_evd), (ia), \ DAT_VERSION_MAJOR, DAT_VERSION_MINOR, \ DAT_THREADSAFE) -arlin ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Eine Empfehlung von Simon Ziller
Hallo Daniel! Ich habe eine super Seite entdeckt, wo man ganz einfach einen Seitensprung Partner finden kann. Ich habe mir gerade mein Passwort angefordert und kann die Seite nur weiterempfehlen. Echt eine super Sache! Schau einfach auch mal vorbei: http://www.onlineseitensprung3.tk/ Viele Grüße Simon Ziller Diese ePost wurde versendet von: Simon Ziller ([EMAIL PROTECTED]) ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] reduce debt with Debt Pros
Get Out of Debt Today. Avoid Bankruptcy. Save Thousands... The Professional Way!! http://blongl.cn/___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
Hello Steve, Here is the timing sequence: t0: app calls post_recv t1: post_recv code builds a hw-specific WR in the hw work queue t2: post_recv code rings a doorbell (write to adapter mem or register) t3: post_recv returns t4: app assumes the buffer is ready t5: device HW dma engine moves the WR to adapter memory t6: device FW prepares the HW RQ entry making the buffer available. Note at time t4, the application thinks its ready, but its really not ready until t6. This clearly is a implementation-specific issue. But I was under the assumption that all the RDMA HW behaves this way. Maybe not? To further complicate things, this race condition is never seen _if_ the application uses the same QP to advertise (send a credit allowing the peer to SEND) the RECV buffer availability. So if the app posts a SEND after the RECV is posted and that SEND allows the peer access to the RECV buffer, then everything is ok. This is due to the fact that the FW/HW will process the SEND only after processing the RECV. If the app uses a different QP to post the SEND advertising the RECV, then the race condition exists allowing the peer to SEND into that RECV buffer before the HW makes it ready. This all assumes a specific design of rdma hw. Maybe nobody else has this issue? Maybe I'm not making sense. :) I think your descriptions here match what Ralph found RNR in IPoIB-CM. Ralph, Does this make sense? Thanks Shirley___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
On Thu, Feb 21, 2008 at 10:34:47AM -0600, Steve Wise wrote: To further complicate things, this race condition is never seen _if_ the application uses the same QP to advertise (send a credit allowing the peer to SEND) the RECV buffer availability. So if the app posts a SEND after the RECV is posted and that SEND allows the peer access to the RECV buffer, then everything is ok. This is due to the fact that the FW/HW will process the SEND only after processing the RECV. If the app uses a different QP to post the SEND advertising the RECV, then the race condition exists allowing the peer to SEND into that RECV buffer before the HW makes it ready. OpenMPI can be configured to send credit updates over different QP. I'll try to stress it next week to see what happens. -- Gleb. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
On Thu, Feb 21, 2008 at 7:41 AM, Steve Wise [EMAIL PROTECTED] wrote: Hey all: I have a question regarding exactly _when_ a posted recv buffer is available for the HW to use: None of the verbs specifications will be explicit about this for the simple reason that none wants to specify exactly what an RQ or SRQ actually is. This is how good specifications are written. But, wherever this queue (RQ) or pool (SRQ) of receive WQEs live, successfully posting a receive WQE should mean exactly that -- it was successfully posted. And if a receive WQE was successfully posted I cannot see a justification for an RDMA device raising a no buffer available subsequent to that point. Now if a buffer was received prior to the recv wqe post completing, the error may have already been raised, and the exception might not be delivered to the host until after the recv wqe call successfully completed. So applications SHOULD NOT be written to assume that they will win a tie, but rather to ensure that the recv WQE is posted *before* the message arrives. Sending the request after posting the recv WQE for the reply should be mor than adequate for that purpose. Specific implementations should take whatever steps are required to ensure that the hardware will not declare no buffer after the user has posted a buffer, whether that is through a doorbell or by rechecking the queue after any check of cached contents comes up empty. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
Ralph Campbell wrote: On Thu, 2008-02-21 at 10:09 -0800, Shirley Ma wrote: Hello Steve, Here is the timing sequence: t0: app calls post_recv t1: post_recv code builds a hw-specific WR in the hw work queue t2: post_recv code rings a doorbell (write to adapter mem or register) t3: post_recv returns t4: app assumes the buffer is ready This is wrong. The HCA has control of the receive buffer until poll_cq() returns a CQE saying the posted buffer is completed (either OK or error). Think about it. The application can do a post_recv() and it could be days or nanoseconds before a packet is sent to that buffer. The application can't assume anything about the contents until the HCA says something is there. Oh, I see. You are saying the application thinks the buffer is available for the HCA to use. t5: device HW dma engine moves the WR to adapter memory t6: device FW prepares the HW RQ entry making the buffer available. Note at time t4, the application thinks its ready, but its really not ready until t6. This clearly is a implementation-specific issue. But I was under the assumption that all the RDMA HW behaves this way. Maybe not? Not all hardware works the same. You can't make assumptions beyond what the library API guarantees without building hardware specific dependencies into your program. I'm asking this from a device driver developer's perspective. I'm not writing an application. I'm trying to understand and define exactly what must be guaranteed by the device/driver up returning from post_recv(). It can even change between different versions of microcode or kernel software for the same HCA. To further complicate things, this race condition is never seen _if_ the application uses the same QP to advertise (send a credit allowing the peer to SEND) the RECV buffer availability. So if the app posts a SEND after the RECV is posted and that SEND allows the peer access to the RECV buffer, then everything is ok. This is due to the fact that the FW/HW will process the SEND only after processing the RECV. If the app uses a different QP to post the SEND advertising the RECV, then the race condition exists allowing the peer to SEND into that RECV buffer before the HW makes it ready. Well, there is no guarantee that the HCA processes the post_recv() before the post_send() even on the same QP. Send and receive are unordered with respect to each other. The fact that it works is an HCA specific implementation artifact. This all assumes a specific design of rdma hw. Maybe nobody else has this issue? Maybe I'm not making sense. :) I think your descriptions here match what Ralph found RNR in IPoIB-CM. Ralph, Does this make sense? Thanks Shirley I think you are making sense. There is an indeterminate race between post_recv() returning to the application and when a packet being received by the HCA might be able to use that buffer. There are no ordering guarantees between messages sent on one QP and another so the application can't easily use a different QP to advertise posted buffers (credits). That is why the IB RC protocol does this for you in band if the RC QP is using a dedicated receive queue but not a shared receive queue. Do you mean the IB RC protocol advertises credits as part of the transport protocol? The problem with shared receive queues is that the application would have to pick an endpoint and tell it there is a buffer available for the endpoint to send to. Obviously, if you have two endpoints, they both can't send to the same receive buffer. ib_ipoib uses shared receive queues and doesn't try to manage posted buffer credits so the RNR NAK issue isn't the same as what Steve is trying to do. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
RE: [ofa-general] post_recv question
I'm asking this from a device driver developer's perspective. I'm not writing an application. I'm trying to understand and define exactly what must be guaranteed by the device/driver up returning from post_recv(). At least from IB's view for post receive (from spec): Control returns to the Consumer immediately after the WQEs have been submitted to the Receive Queue or the SRQ and the HCA has been notified that one or more WQEs are ready to process. - Sean ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
On Thu, 2008-02-21 at 21:31 +0200, Gleb Natapov wrote: On Thu, Feb 21, 2008 at 11:10:24AM -0800, Ralph Campbell wrote: To further complicate things, this race condition is never seen _if_ the application uses the same QP to advertise (send a credit allowing the peer to SEND) the RECV buffer availability. So if the app posts a SEND after the RECV is posted and that SEND allows the peer access to the RECV buffer, then everything is ok. This is due to the fact that the FW/HW will process the SEND only after processing the RECV. If the app uses a different QP to post the SEND advertising the RECV, then the race condition exists allowing the peer to SEND into that RECV buffer before the HW makes it ready. Well, there is no guarantee that the HCA processes the post_recv() before the post_send() even on the same QP. Send and receive are unordered with respect to each other. The fact that it works is an HCA specific implementation artifact. So there is no way to implement SW flow control over Infiniband? How is that IB spec has SW flow control specification for SDP in it then? This all assumes a specific design of rdma hw. Maybe nobody else has this issue? Maybe I'm not making sense. :) I think your descriptions here match what Ralph found RNR in IPoIB-CM. Ralph, Does this make sense? Thanks Shirley I think you are making sense. There is an indeterminate race between post_recv() returning to the application and when a packet being received by the HCA might be able to use that buffer. There are no ordering guarantees between messages sent on one QP and another so the application can't easily use a different QP to advertise posted buffers (credits). If after post_recv() returns it is guarantied that receive buffers are available to HW we don't need ordering guaranties between QPs to successfully implement SW flow control. Right. I was just pointing out that Steve is correct in his assumption that there might be races between post_recv() returning and the HCA being able to use that buffer to receive a packet that was already in flight before the post_recv(). That is why the IB RC protocol does this for you in band if the RC QP is using a dedicated receive queue but not a shared receive queue. What do you mean by that? RNR works for both RC and SRQ QPs. Right. I was referring to the credit returned in the ACK header which allows the remote RC QP endpoint to send a message after a post_recv(). There is no such message level flow control if the RC QP is using a SRQ. -- Gleb. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
On Thu, 2008-02-21 at 13:32 -0600, Steve Wise wrote: I think you are making sense. There is an indeterminate race between post_recv() returning to the application and when a packet being received by the HCA might be able to use that buffer. There are no ordering guarantees between messages sent on one QP and another so the application can't easily use a different QP to advertise posted buffers (credits). That is why the IB RC protocol does this for you in band if the RC QP is using a dedicated receive queue but not a shared receive queue. Do you mean the IB RC protocol advertises credits as part of the transport protocol? Yes. See chapter 9.7.7.2 in Rel 1.2 vol. 1. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
RE: [ofa-general] [2.6 patch] infiniband/hw/nes/nes_verbs.c: fixoff-by-one
Looking at the patches what you did seems OK. But regarding review I have a different criticism directed at Roland: This driver should really have gotten some review before being included in the kernel. Even a simple checkpatch run finds more than 250 stylistic errors (not code bugs but cases where the driver violates the standard code formatting rules of kernel code). And I'm not talking about the 2000 checkpatch warnings that are mostly about too long lines (which should arguably also be fixed). And many more issues that could have been foung during a review. E.g. when you look at 3/8 from this series the code if (!cm_node) return -EINVAL; new_send = kzalloc(sizeof(*new_send), GFP_ATOMIC); if (!new_send) return -1; doesn't look good since the -1 should most likely better be something like -ENOMEM (I haven't checked whether you can immediately change it at this specific place). And these are just comments from someone with zero knowledge about InfiniBand, but I'd expect InfiniBand-specifig bugs might be found before they hit users if an InfiniBand maintainer would review the complete driver. Note that this is not meant as a criticism against Glenn - it's normal that submitted code contains bugs, but a code review can help to cope with this. Glenn cu Adrian Hi, Adrian. Yeah, I agree that the stylistic issues are annoying and I am actually itching to get some of those simples things corrected. Roland has outlined several areas for improvement in the driver (style-wise and substance-wise) and I'm working to address those. I'm learning the ropes here so I expect I'll get faster/better at responding and fixing things like the coverity issues you flagged. I need to pull these tools into my own release process so I'm catching flaws on my side. I want the driver to be worthy. Regards, Glenn ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
On Thu, Feb 21, 2008 at 11:44:07AM -0800, Ralph Campbell wrote: That is why the IB RC protocol does this for you in band if the RC QP is using a dedicated receive queue but not a shared receive queue. What do you mean by that? RNR works for both RC and SRQ QPs. Right. I was referring to the credit returned in the ACK header which allows the remote RC QP endpoint to send a message after a post_recv(). There is no such message level flow control if the RC QP is using a SRQ. Ah, you are talking about that flow control. But the purpose of that flow control is to rate limit a sender in case of receive buffers shortage, not to prevent RNRs completely. -- Gleb. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
Sean Hefty wrote: I'm asking this from a device driver developer's perspective. I'm not writing an application. I'm trying to understand and define exactly what must be guaranteed by the device/driver up returning from post_recv(). At least from IB's view for post receive (from spec): Control returns to the Consumer immediately after the WQEs have been submitted to the Receive Queue or the SRQ and the HCA has been notified that one or more WQEs are ready to process. - Sean See? This implies that the HCA is _not_ necessarily ready to place incoming SENDS into those posted recv buffers... the HCA has been notified. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
OpenMPI can be configured to send credit updates over different QP. I'll try to stress it next week to see what happens. It seems that it would be pretty hard to hit this race in practice. And I don't think mem-free Mellanox hardware has any race -- not positive about Tavor/non-mem-free Arbel. (On IB you need to set RNR retries to 0 also for the missing receive to be detectable even if the race exists) ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
Caitlin Bestler wrote: On Thu, Feb 21, 2008 at 11:40 AM, Sean Hefty [EMAIL PROTECTED] wrote: I'm asking this from a device driver developer's perspective. I'm not writing an application. I'm trying to understand and define exactly what must be guaranteed by the device/driver up returning from post_recv(). At least from IB's view for post receive (from spec): Control returns to the Consumer immediately after the WQEs have been submitted to the Receive Queue or the SRQ and the HCA has been notified that one or more WQEs are ready to process. - Sean Would you agree that if the WQEs have already successfully been submitted to the Receive Queue or the SRQ and the HCA has been notified that the HCA would be incorrect in subsequently raising an error stating that the buffers were not available? iWARP does not convey send credits in the RDMA protocol, but I believe both iWARP and IB are in agreement that declaring no buffer available and causing the reliable connection to be torn down is a serious step. The HCA/RNIC is not free to be sloppy in making this determination. There are other places in both specifications where the RDMA device is given latitude to asynchronously implement a request. For example, it is clear that a window is *not* necessarily bound when the bind call completes. But in all those cases there is an explicit completion to allow the consumer to unambiguously know when it is safe to proceed. the application is never expected to rely on knowledge of specific HCAs or RNICs, or to guess what might be good enough. There are only two feedbacks from posting a Receive WQE: the call completion and the CQE being returned by cq_poll(). There are only two states for the Recv WQE between those two events: available for allocation and allocated. And the application does not need to know about the difference between those two states on a per-WQE basis. If there were a third state, then there would have to be a mechanism to make that information available. There is none, so such a third state must not exist (at least in any observable form). I agree. Its just that the wording above is pretty loose in my mind. But I'm only seeking clarification. Seems the consensus is that when you return from post_recv() the buffer can be assumed to be available for incoming SEND placement... ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
On Thu, Feb 21, 2008 at 12:22:16PM -0800, Roland Dreier wrote: (On IB you need to set RNR retries to 0 also for the missing receive to be detectable even if the race exists) OpenMPI does this for SW flow controlled QPs. -- Gleb. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
RE: [ofa-general] post_recv question
Seems the consensus is that when you return from post_recv() the buffer can be assumed to be available for incoming SEND placement... As you pointed out earlier you have done a PIO. Nothing is said about the 'readiness'. A race between a doorbell and an incoming send packet isn't really meaningful if they happen independently at or very nearly the same time. Either it makes it or it doesn't. This is like a two processor load/store race in memory. The order is really arbitrary if the two operations are independent. On the other hand if you do the doorbell to post the receive buffer and *then* do another doorbell to post a send operation that causes the existence of the buffer to be made known to the other side and *then* the other side sends a message that has to work so the HCA has to complete the one doorbell before it can handle the second. If you notify out of band then you mileage may vary. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] [2.6 patch] infiniband/hw/nes/nes_verbs.c: fix off-by-one
This driver should really have gotten some review before being included in the kernel. Even a simple checkpatch run finds more than 250 stylistic errors (not code bugs but cases where the driver violates the standard code formatting rules of kernel code). Linus has strongly stated that we should merge hardware drivers early, and I agree: although the nes driver clearly needs more work, there's no advantage to users with the hardware in forcing them to wait for 2.6.26 to merge the driver, since they'll just have to patch the grungy code in themselves anyway. And by merging the driver early, we get fixed up for any tree-wide changes and allow janitors to help with the cleanup. (By the way, the code is not that pretty but it a lot closer to upstream style than most driver submissions) And these are just comments from someone with zero knowledge about InfiniBand, but I'd expect InfiniBand-specifig bugs might be found before they hit users if an InfiniBand maintainer would review the complete driver. Just for the record, although this driver is under drivers/infiniband, it is actually for a device that does iWARP/10 Gb ethernet. At some point we may want to rename drivers/infiniband to drivers/rdma, but so far the churn hasn't seemed worth it for what is basically a cosmetic issue. - R. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
On Thu, Feb 21, 2008 at 11:40 AM, Sean Hefty [EMAIL PROTECTED] wrote: I'm asking this from a device driver developer's perspective. I'm not writing an application. I'm trying to understand and define exactly what must be guaranteed by the device/driver up returning from post_recv(). At least from IB's view for post receive (from spec): Control returns to the Consumer immediately after the WQEs have been submitted to the Receive Queue or the SRQ and the HCA has been notified that one or more WQEs are ready to process. - Sean Would you agree that if the WQEs have already successfully been submitted to the Receive Queue or the SRQ and the HCA has been notified that the HCA would be incorrect in subsequently raising an error stating that the buffers were not available? iWARP does not convey send credits in the RDMA protocol, but I believe both iWARP and IB are in agreement that declaring no buffer available and causing the reliable connection to be torn down is a serious step. The HCA/RNIC is not free to be sloppy in making this determination. There are other places in both specifications where the RDMA device is given latitude to asynchronously implement a request. For example, it is clear that a window is *not* necessarily bound when the bind call completes. But in all those cases there is an explicit completion to allow the consumer to unambiguously know when it is safe to proceed. the application is never expected to rely on knowledge of specific HCAs or RNICs, or to guess what might be good enough. There are only two feedbacks from posting a Receive WQE: the call completion and the CQE being returned by cq_poll(). There are only two states for the Recv WQE between those two events: available for allocation and allocated. And the application does not need to know about the difference between those two states on a per-WQE basis. If there were a third state, then there would have to be a mechanism to make that information available. There is none, so such a third state must not exist (at least in any observable form). ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Can we provide a qp_num when creating a QP ?
HI, Roland or other engineers: I have asked this question before. To wire a QP connection, each side need to know peer's side port lid and qp_num. For MPI with many ranks, this is an alltoall exchange. If we can create a QP with provided qp_num, then MPI does not need the qp_num exchange, for a QP-pair between two processes, MPI can figure out peer QP's qp_num. So we can eliminate the third-party channel to exchange information, and speedup the startup time. Curently qp_num is always a return value from the driver. If we can suggest a qp_num when creating a QP, and the qp_num is already used, then IBV can either error out, or pick up another number for app. Thanks. --CQ ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: Merging of completely unreviewed drivers
Is it really intended to merge drivers without _any_ kind of review? This driver even lacks a basic please fix the 250 checkpatch errors [1] and similar low hanging fruits that could easily be spotted and then fixed by the submitter within a short amount of time. Just to be clear, this driver was reviewed. Many issues were found, and many were fixed while others are being worked on. It's a judgement call when to merge things, but in this case given the good engagement from the vendor, I didn't see anything to be gained by delaying the merge. - R. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
On Thu, 2008-02-21 at 12:22 -0800, Roland Dreier wrote: OpenMPI can be configured to send credit updates over different QP. I'll try to stress it next week to see what happens. It seems that it would be pretty hard to hit this race in practice. And I don't think mem-free Mellanox hardware has any race -- not positive about Tavor/non-mem-free Arbel. (On IB you need to set RNR retries to 0 also for the missing receive to be detectable even if the race exists) Wellconsider the case of two adapters on two different pci busses. One is busy one is not. Specifically, the post_recv QP is on an HCA on a busy bus, the post_send (of the credit) is on a QP on an HCA on a dedicated bus. I think we can assume that the ringing of the doorbell is synchronous, i.e. when the processor completes it's write, the card knows there are RQ WQE available in host memory, but whether or not and when the WQE is fetched relative to the processor is asynchronous. The card will have to get on the bus again and read host memory. Meanwhile the processor runs off and posts a send on the other QP on a different HCA of the credit. The peer responds, with a send to the data qp. The receiving adapter knows the WQE is there, but it may not have fetched it yet. The crux of the question is whether or not the adapter MUST fetch the WQE and place the packet, or can it simply drop it. If you say it MUST, then you must have enough buffer to handle worst case delayed placement. If the post guarantee is only within the same QP or affiliated QP (SRQ), then all it must do is ensure that when processing a SQ request AND the associated RQ (SRQ) is empty, that it must fetch outstanding, unread RQ WQE prior to processing the SQ WQE. This allows for the post_recv guarantees without the HCA buffering requirements. I seem to recall that the specs say something about ordering and synchronization between unaffiliated QP and/or between adapters, but the specific reference long ago fell off my LRU list. Tom ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Dear Friend
Dear Friend I did not forgot your past effort and attempts to assist me, now I'm happy to inform you that i have succeeded in getting those funds transferred under the cooperation of a new partner from Paraguay. Now Contact my secretary ask him for ($450.000.) for your compensation his, name is Mr. Morgan Preye E-Mail ([EMAIL PROTECTED] HE will send you the money without any delay. Your information needed to enable him sends the cheque to you because I travel for another investment project. .. 1. FULL NAMES:.. 2. ADDRESS:... 3. TELEPHONE NUMBER:. 4. STATE:... 5. COUNTRY:.. Take care of yourself I hope to meet you soon Regards Barr.Anthony Neil [Esq]. - Yahoo! Answers - Get better answers from someone who knows. Tryit now.___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: Merging of completely unreviewed drivers
Greg KH wrote: On Fri, Feb 22, 2008 at 01:33:03AM +0300, Alexey Dobriyan wrote: Speaking of driver, could authors please comment all those barrier() calls and remove trailing return; at the end of void functions. Why don't you make a patch to checkpatch.pl for those types of things? :) Drat, you beat me to that response. :) ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: Merging of completely unreviewed drivers
On Thu, Feb 21, 2008 at 02:43:15PM -0800, Greg KH wrote: On Fri, Feb 22, 2008 at 01:33:03AM +0300, Alexey Dobriyan wrote: On Thu, Feb 21, 2008 at 01:14:55PM -0800, Linus Torvalds wrote: Quite frankly, I've several times been *this* close (holds up fingers so you can't even see between them) to just remove checkpatch entirely. Agrh! What stopped you?! I'm personally of the opinion that a lot of checkpatch fixes are anything but. That mainly concerns fixing overlong lines (where the fixed version is usually worse than the original), but it's been true for some other warnings too. Speaking of driver, could authors please comment all those barrier() calls and remove trailing return; at the end of void functions. Why don't you make a patch to checkpatch.pl for those types of things? :) Sorry, I'm not touching it with an eigthy six foot pole. :^) ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: Merging of completely unreviewed drivers
On Feb 21 2008 14:43, Greg KH wrote: On Fri, Feb 22, 2008 at 01:33:03AM +0300, Alexey Dobriyan wrote: On Thu, Feb 21, 2008 at 01:14:55PM -0800, Linus Torvalds wrote: Quite frankly, I've several times been *this* close (holds up fingers so you can't even see between them) to just remove checkpatch entirely. Agrh! What stopped you?! I'm personally of the opinion that a lot of checkpatch fixes are anything but. That mainly concerns fixing overlong lines (where the fixed version is usually worse than the original), but it's been true for some other warnings too. Speaking of driver, could authors please comment all those barrier() calls and remove trailing return; at the end of void functions. Why don't you make a patch to checkpatch.pl for those types of things? :) checkpatch would never allow a patch to patch checkpatch. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: Merging of completely unreviewed drivers
Linus Torvalds [EMAIL PROTECTED] writes: I'm personally of the opinion that a lot of checkpatch fixes are anything but. That mainly concerns fixing overlong lines Perhaps we should increase line length limit, 132 should be fine. Especially useful with long printk() lines and long arithmetic expressions. -- Krzysztof Halasa ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: Merging of completely unreviewed drivers
On Thu, Feb 21, 2008 at 05:33:10PM -0500, Jeff Garzik wrote: ... But similarly... I merge drivers long before our SCSI maintainer will, and I value it works above stupid checkpatch warnings. I was not talking about checkpatch warnings. I'm talking about checkpatch errors for code like if ((page_count!=0)(page_count12)-(region-offset(4096-1))=region-length) I have to accept that Linus prefers to have the driver merged first and let janitors make the code readable in subsequent patches, but if GNU indent wasn't unable to properly cope with the fact that this driver has over 2000 lines that are over 80 characters long I'd simply run this driver through scripts/Lindent . Jeff cu Adrian -- Is there not promise of rain? Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. Only a promise, Lao Er said. Pearl S. Buck - Dragon Seed ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: Merging of completely unreviewed drivers
Krzysztof Halasa wrote: Linus Torvalds [EMAIL PROTECTED] writes: I'm personally of the opinion that a lot of checkpatch fixes are anything but. That mainly concerns fixing overlong lines Perhaps we should increase line length limit, 132 should be fine. I think checkpatch is useful, but I've agreed from the beginning that the line length complaint is completely silly. If a driver is full of lines of length 80, that's a problem. If it's just a few, that's more of a developer decision based on the individual line of code. Jeff ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: Merging of completely unreviewed drivers
On Fri, 22 Feb 2008 00:38:14 +0100 Krzysztof Halasa [EMAIL PROTECTED] wrote: Linus Torvalds [EMAIL PROTECTED] writes: I'm personally of the opinion that a lot of checkpatch fixes are anything but. That mainly concerns fixing overlong lines Perhaps we should increase line length limit, 132 should be fine. Especially useful with long printk() lines and long arithmetic expressions. Agreed. The fact I'm having to fix bugs introduced by incorrect printk wrapping confirms that for printk strings at least it is overzealous. I'm all for it complaining about printk(KERN_FOO 90 chars, foo, bar + 37); type bits when the foo, bar should be underneath to be visible but for straight quoted text too long it should not warn and try to get the text folded. Alan ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
Good example, more detailed comments in-line. On Thu, Feb 21, 2008 at 2:47 PM, Tom Tucker [EMAIL PROTECTED] wrote: On Thu, 2008-02-21 at 12:22 -0800, Roland Dreier wrote: OpenMPI can be configured to send credit updates over different QP. I'll try to stress it next week to see what happens. It seems that it would be pretty hard to hit this race in practice. And I don't think mem-free Mellanox hardware has any race -- not positive about Tavor/non-mem-free Arbel. (On IB you need to set RNR retries to 0 also for the missing receive to be detectable even if the race exists) Wellconsider the case of two adapters on two different pci busses. One is busy one is not. Specifically, the post_recv QP is on an HCA on a busy bus, the post_send (of the credit) is on a QP on an HCA on a dedicated bus. I think we can assume that the ringing of the doorbell is synchronous, i.e. when the processor completes it's write, the card knows there are RQ WQE available in host memory, but whether or not and when the WQE is fetched relative to the processor is asynchronous. The card will have to get on the bus again and read host memory. Meanwhile the processor runs off and posts a send on the other QP on a different HCA of the credit. The peer responds, with a send to the data qp. The receiving adapter knows the WQE is there, but it may not have fetched it yet. The crux of the question is whether or not the adapter MUST fetch the WQE and place the packet, or can it simply drop it. If you say it MUST, then you must have enough buffer to handle worst case delayed placement. If the post guarantee is only within the same QP or affiliated QP (SRQ), then all it must do is ensure that when processing a SQ request AND the associated RQ (SRQ) is empty, that it must fetch outstanding, unread RQ WQE prior to processing the SQ WQE. This allows for the post_recv guarantees without the HCA buffering requirements. I disagree. What is required is the adapter MUST NOT take an action based on a buffer not available diagnosis until it is certain that it has considered all WQEs that have been successfully posted by the consumer. Further, it MUST NOT require a further action by the consumer to guarantee that it notices a posted WQE. Particularly in iWARP the application layer is free to implement Send/Recv credits by *any* mechanism desired (the only requirement is that there is one, you might recall that there were extensive discussions on this point regarding unsolicited messages for iSER). The concept that the application MUST provide SOME form of flow control was accepted only grudgingly. So clearly any more specific mechanisms were not the intent of the drafters. So if there are still 1000 Recv WQEs in the SRQ we can allow the adapter a great deal of flexibility in when the 1001st is linked into the data structures. The only real constraint is that it MUST do 1001 successful allocations *before* it triggers any sort of buffer not available error. I'm not recalling the specific language immediately, but I do recall concluding that sub-dividing the SRQ on an RSS-like basis was *not* compliant with the RDMAC specs and that the left-half of the adpater could not declare buffer not found while the right-half of the adapter still had a free buffer. This is of course a major pain if you are trying to team two RDMA adapters to form a single virtual adapter, or even two largely independent ports on the same physical adapter. But the intent of the specifications are very clear: if the consumer has posted 1000 recv WQEs and gotten SUCCESS to each of them, then the adapter MUST allocate all 1000 recv WQEs *before* it can fail an operation because no buffer was available. So there is a difference between must be pushed to the adapter now and must be pushed to the adapter before it is too late. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: Merging of completely unreviewed drivers
Jeff Garzik [EMAIL PROTECTED] writes: If a driver is full of lines of length 80, that's a problem. I'm not sure. We all have more than 80-chars wide displays for years, don't we? The problem is not the number of characters but code which is too complex and which may sometimes have too many levels of indentation. Unfortunately expressing code complexity in terms of line lengths doesn't seem to work at all. The 80-chars limit harms development, it makes the code less readable, sometimes far less readable. I think we should increase length limit to 132 for the whole kernel code. Obviously printk() _output_ etc. should stay at 80. -- Krzysztof Halasa ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] OpenSM Console Ideas?
LLNL uses the remote console feature in OpenSM. We have a need to secure this remote connection with authentication/authorization and encryption (specifically PAM and OpenSSL). I have a working prototype, and would like to formalize it and share/include this with OpenSM. Before I go down this path too far, I would like to solicit ideas from others who use the console. Currently, the console can be used in local, loopback, or remote modes. If security is added, should it replace other modes, or be an additional mode? The intention is to use PAM for the AA framework, and OpenSSL for secure sockets. Are there any serious objections to this implementation plan? The console feature has always been a configuration/command line option, but should the secure console be conditionally compiled/linked as well? (eliminate dependency on the PAM and OpenSSL libs, pam, pam_misc, cryto, ssl). The secure console would require a relatively primitive client application, which I will probably package under opensm, just like osmtest. Make sense? Do you have any other ideas or suggestions for the remote console? -- Timothy A. Meier Computer Scientist ICCD/High Performance Computing 925.422.3341 [EMAIL PROTECTED] ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: Merging of completely unreviewed drivers
On Thu, Feb 21, 2008 at 11:31:44PM +, Alan Cox wrote: On Fri, 22 Feb 2008 00:38:14 +0100 Krzysztof Halasa [EMAIL PROTECTED] wrote: Linus Torvalds [EMAIL PROTECTED] writes: I'm personally of the opinion that a lot of checkpatch fixes are anything but. That mainly concerns fixing overlong lines Perhaps we should increase line length limit, 132 should be fine. Especially useful with long printk() lines and long arithmetic expressions. Agreed. The fact I'm having to fix bugs introduced by incorrect printk wrapping confirms that for printk strings at least it is overzealous. I'm all for it complaining about printk(KERN_FOO 90 chars, foo, bar + 37); type bits when the foo, bar should be underneath to be visible but for straight quoted text too long it should not warn and try to get the text folded. I think it should warn, but people have to be aware of the following: - checkpatch errors are for stuff that really has to be fixed - checkpatch warnings are for stuff that should be looked at - the goal is not 0 checkpatch warnings but readable and bugfree code A nice property of checkpatch is that it encourages to look closer at code like the following (it warns about the volatile): if (!netif_queue_stopped(netdev)) { netif_stop_queue(netdev); barrier(); if ((volatile u16)nesnic-sq_tail)+(nesnic-sq_size*2))-nesnic-sq_head) (nesnic-sq_size - 1)) != 1) { netif_start_queue(netdev); goto sq_no_longer_full; } } Alan cu Adrian -- Is there not promise of rain? Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. Only a promise, Lao Er said. Pearl S. Buck - Dragon Seed ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: Merging of completely unreviewed drivers
Krzysztof Halasa wrote: Jeff Garzik [EMAIL PROTECTED] writes: If a driver is full of lines of length 80, that's a problem. I'm not sure. We all have more than 80-chars wide displays for years, don't we? The Every time this discussion comes up, people point out that it remains highly common to open multiple 80-column terminal windows, making the 80-column limit still highly relevant in modern times. The problem is [...] code which is too complex and which may sometimes have too many levels of indentation. Quite true. Jeff ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] [PATCH] opensm/libvendor/osm_vendor_ibumad.c: Add environment variable control for OSM_UMAD_MAX_PENDING
From b8fb2151b92ddd4a7d2a4cc2ab38a6b34fffc7ab Mon Sep 17 00:00:00 2001 From: Ira K. Weiny [EMAIL PROTECTED] Date: Thu, 21 Feb 2008 09:10:10 -0800 Subject: [PATCH] opensm/libvendor/osm_vendor_ibumad.c: Add environment variable control for OSM_UMAD_MAX_PENDING Signed-off-by: Ira K. Weiny [EMAIL PROTECTED] --- opensm/include/vendor/osm_vendor_ibumad.h |4 ++-- opensm/libvendor/osm_vendor_ibumad.c | 27 ++- 2 files changed, 28 insertions(+), 3 deletions(-) diff --git a/opensm/include/vendor/osm_vendor_ibumad.h b/opensm/include/vendor/osm_vendor_ibumad.h index 84fd21a..3a3f070 100644 --- a/opensm/include/vendor/osm_vendor_ibumad.h +++ b/opensm/include/vendor/osm_vendor_ibumad.h @@ -141,12 +141,12 @@ typedef struct _umad_match { uint32_t version; } umad_match_t; -#define OSM_UMAD_MAX_PENDING 1000 +#define DEFAULT_OSM_UMAD_MAX_PENDING 1000 typedef struct vendor_match_tbl { - umad_match_t tbl[OSM_UMAD_MAX_PENDING]; uint32_t last_version; int max; + umad_match_t *tbl; } vendor_match_tbl_t; typedef struct _osm_vendor { diff --git a/opensm/libvendor/osm_vendor_ibumad.c b/opensm/libvendor/osm_vendor_ibumad.c index 679f06a..f847e61 100644 --- a/opensm/libvendor/osm_vendor_ibumad.c +++ b/opensm/libvendor/osm_vendor_ibumad.c @@ -451,6 +451,7 @@ ib_api_status_t osm_vendor_init(IN osm_vendor_t * const p_vend, IN osm_log_t * const p_log, IN const uint32_t timeout) { + char *max = NULL; int r, n_cas; OSM_LOG_ENTER(p_log); @@ -480,7 +481,31 @@ osm_vendor_init(IN osm_vendor_t * const p_vend, } p_vend-ca_count = n_cas; - p_vend-mtbl.max = OSM_UMAD_MAX_PENDING; + p_vend-mtbl.max = DEFAULT_OSM_UMAD_MAX_PENDING; + + if ((max = getenv(OSM_UMAD_MAX_PENDING)) != NULL) { + int tmp = strtol(max, NULL, 0); + if (tmp 0) + p_vend-mtbl.max = tmp; + else + osm_log(p_vend-p_log, OSM_LOG_ERROR, + osm_vendor_init: Error: + OSM_UMAD_MAX_PENDING=%d is invalid, + tmp); + } + + osm_log(p_vend-p_log, OSM_LOG_INFO, + osm_vendor_init: %d pending umads specified\n, + p_vend-mtbl.max); + + p_vend-mtbl.tbl = calloc(p_vend-mtbl.max, sizeof(*(p_vend-mtbl.tbl))); + if (!p_vend-mtbl.tbl) { + osm_log(p_vend-p_log, OSM_LOG_ERROR, + osm_vendor_init: Error: + failed to allocate vendor match table\n); + r = IB_INSUFFICIENT_MEMORY; + goto Exit; + } Exit: OSM_LOG_EXIT(p_log); -- 1.5.1 0001-opensm-libvendor-osm_vendor_ibumad.c-Add-environmen.patch Description: Binary data ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: Merging of completely unreviewed drivers
On Thu, Feb 21, 2008 at 01:30:37PM -0800, Greg KH wrote: On Thu, Feb 21, 2008 at 11:01:24PM +0200, Adrian Bunk wrote: BTW: Greg, you are Cc'ed for your joke in [3]... [3] http://lkml.org/lkml/2008/2/12/427 That was not a joke, I ment it. Do you have proof that the majority of patches going into the kernel tree are not reviewed by at least 2 people? ... I don't see any way for getting a proof in any direction, but no matter how many SOB lines a patch has my impression is that usually at a maximum the one person who applies a patch reviews it (review as in understands the code in question well and reviews the patch line for line). Sometimes there's even simply noone who could a patch at all, e.g. I'm not sure whether there is anyone at all who would be able to review a patch by Sam fiddling with kbuild internals. How many lines of code get changed in the kernel per day? And we should have for each changed line two people who are both experienced enough in this area of the kernel and who have the time to review this line? Even one of our best maintained subsystems has commits that contain bugs like + if ((!tid_agg_rx-reorder_buf) net_ratelimit()) { + printk(KERN_ERR can not allocate reordering buffer + to tid %d\n, tid); + goto end; + } thanks, greg k-h cu Adrian -- Is there not promise of rain? Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. Only a promise, Lao Er said. Pearl S. Buck - Dragon Seed ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
On Thu, 2008-02-21 at 15:48 -0800, Caitlin Bestler wrote: Good example, more detailed comments in-line. On Thu, Feb 21, 2008 at 2:47 PM, Tom Tucker [EMAIL PROTECTED] wrote: On Thu, 2008-02-21 at 12:22 -0800, Roland Dreier wrote: OpenMPI can be configured to send credit updates over different QP. I'll try to stress it next week to see what happens. It seems that it would be pretty hard to hit this race in practice. And I don't think mem-free Mellanox hardware has any race -- not positive about Tavor/non-mem-free Arbel. (On IB you need to set RNR retries to 0 also for the missing receive to be detectable even if the race exists) Wellconsider the case of two adapters on two different pci busses. One is busy one is not. Specifically, the post_recv QP is on an HCA on a busy bus, the post_send (of the credit) is on a QP on an HCA on a dedicated bus. I think we can assume that the ringing of the doorbell is synchronous, i.e. when the processor completes it's write, the card knows there are RQ WQE available in host memory, but whether or not and when the WQE is fetched relative to the processor is asynchronous. The card will have to get on the bus again and read host memory. Meanwhile the processor runs off and posts a send on the other QP on a different HCA of the credit. The peer responds, with a send to the data qp. The receiving adapter knows the WQE is there, but it may not have fetched it yet. The crux of the question is whether or not the adapter MUST fetch the WQE and place the packet, or can it simply drop it. If you say it MUST, then you must have enough buffer to handle worst case delayed placement. If the post guarantee is only within the same QP or affiliated QP (SRQ), then all it must do is ensure that when processing a SQ request AND the associated RQ (SRQ) is empty, that it must fetch outstanding, unread RQ WQE prior to processing the SQ WQE. This allows for the post_recv guarantees without the HCA buffering requirements. I disagree. What is required is the adapter MUST NOT take an action based on a buffer not available diagnosis until it is certain that it has considered all WQEs that have been successfully posted by the consumer. Ok. So what does the HW do with the packet while it's pondering it's options? It has to put it somewhere. That's my point. You either guarantee that any advertisement of availability can't be issued prior to the buffer being available, or the buffer is synchronously available prior to the advertisement of the credit. Snooping the [s]RQ while processing SQ is a way of delaying the issuance of a credit before the buffer (spec'd in the WQE) is actually known to the adapter. But this only works in the context of a single HBA. Further, it MUST NOT require a further action by the consumer to guarantee that it notices a posted WQE. Agreed. Particularly in iWARP the application layer is free to implement Send/Recv credits by *any* mechanism desired (the only requirement is that there is one, you might recall that there were extensive discussions on this point regarding unsolicited messages for iSER). The concept that the application MUST provide SOME form of flow control was accepted only grudgingly. So clearly any more specific mechanisms were not the intent of the drafters. Yes, but I'm not sure there's any confusion there -- I think this discussion is about how credits can be issued. In particular what does it mean to issue a credit for: - this QP, - another QP on the same HCA - another QP on a different HCA So far, it seems the consensus is that all of the above should work. I'm just not convinced the current implementations guarantee this. So if there are still 1000 Recv WQEs in the SRQ we can allow the adapter a great deal of flexibility in when the 1001st is linked into the data structures. The only real constraint is that it MUST do 1001 successful allocations *before* it triggers any sort of buffer not available error. agreed. I'm not recalling the specific language immediately, but I do recall concluding that sub-dividing the SRQ on an RSS-like basis was *not* compliant with the RDMAC specs and that the left-half of the adpater could not declare buffer not found while the right-half of the adapter still had a free buffer. agreed. This is of course a major pain if you are trying to team two RDMA adapters to form a single virtual adapter, or even two largely independent ports on the same physical adapter. But the intent of the specifications are very clear: if the consumer has posted 1000 recv WQEs and gotten SUCCESS to each of them, then the adapter MUST allocate all 1000 recv WQEs *before* it can fail an operation because no buffer was available. agreed. So there is a difference between must be pushed to the adapter now and must be pushed to the adapter before it is too late. yes. Tom
[ofa-general] Re: Merging of completely unreviewed drivers
Krzysztof Halasa wrote: Linus Torvalds [EMAIL PROTECTED] writes: I'm personally of the opinion that a lot of checkpatch fixes are anything but. That mainly concerns fixing overlong lines Perhaps we should increase line length limit, 132 should be fine. Especially useful with long printk() lines and long arithmetic expressions. Yes; or even longer. 80 characters might have made sense on a screen when the alternative was 80 characters on a punched card, but on a modern computer it's very restrictive. That's especially true with the deep indents that you quickly get in C. Even short lines often need to be split when you put a few tabs in front of them, and that makes comprehension that bit harder, not to mention looks ugly. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] [PATCH] RDMA/nes: Fix cm_event_connected() for big-endian platforms
We recently added portabiliity/helper function get_crc_value() to nes_accept(). This should also be deployed to cm_event_connected. Otherwise rmda connection establishment will fail on big-endian platforms such as ppc64. This remediation was triggered by change near 2.6.23 to lib/crc32.c with commit ef19454bd437b2ba. Prior to the commit we might get the following return value from crc32c() on ppc64: 0xc69c51fd After the commit: 0xfd519cc6 So the helper function does an _le32 on the value so we have good interop between kernels at different rev levels for example. Signed-off-by: Glenn Streiff [EMAIL PROTECTED] --- drivers/infiniband/hw/nes/nes_cm.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index 0c5dd5b..4705dbc 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -2752,6 +2752,7 @@ void cm_event_connected(struct nes_cm_event *event) struct iw_cm_event cm_event; struct nes_hw_qp_wqe *wqe; struct nes_v4_quad nes_quad; + u32 crc_value; int ret; /* get all our handles */ @@ -2829,8 +2830,8 @@ void cm_event_connected(struct nes_cm_event *event) nes_quad.TcpPorts[1] = cm_id-local_addr.sin_port; /* Produce hash key */ - nesqp-hte_index = cpu_to_be32( - crc32c(~0, (void *)nes_quad, sizeof(nes_quad)) ^ 0x); + crc_value = get_crc_value(nes_quad); + nesqp-hte_index = cpu_to_be32(crc_value ^ 0x); nes_debug(NES_DBG_CM, HTE Index = 0x%08X, After CRC = 0x%08X\n, nesqp-hte_index, nesqp-hte_index nesadapter-hte_index_mask); ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: Merging of completely unreviewed drivers
Jeff Garzik [EMAIL PROTECTED] writes: Every time this discussion comes up, people point out that it remains highly common to open multiple 80-column terminal windows, making the 80-column limit still highly relevant in modern times. I guess only because of the limit :-) Raise the limit, terminal windows will follow. I'm using 80-column windows, too. -- Krzysztof Halasa ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: Merging of completely unreviewed drivers
On Fri, Feb 22, 2008 at 12:16:45PM +1030, David Newall wrote: Krzysztof Halasa wrote: Linus Torvalds [EMAIL PROTECTED] writes: I'm personally of the opinion that a lot of checkpatch fixes are anything but. That mainly concerns fixing overlong lines Perhaps we should increase line length limit, 132 should be fine. Especially useful with long printk() lines and long arithmetic expressions. Yes; or even longer. 80 characters might have made sense on a screen when the alternative was 80 characters on a punched card, but on a modern computer it's very restrictive. That's especially true with the deep indents that you quickly get in C ... if your style is lousy. I agree that situation with printks is not normal in that respect and I certainly have no love for the checkpatch nonsense, but pressure to keep the fucking nesting depth low is a Good Thing(tm). ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: Merging of completely unreviewed drivers
On Fri, 22 Feb 2008, Al Viro wrote: ... if your style is lousy. I agree that situation with printks is not normal in that respect and I certainly have no love for the checkpatch nonsense, but pressure to keep the fucking nesting depth low is a Good Thing(tm). I do agree, but that has little to do with line length *directly*. IOW, I'd personally be happier with a checkpatch that calculated complexity and indentation over line length. There is definitely a correlation there: there is no question that complex lines with deep indentation tend to be long. So yes, long lines are correlated with bad code is certainly true to some degree. But sometimes lines are long just because it's a function call with multiple parameters, and it's just three levels indented, and it had a string there too. It may be long, but it's not complex, and keeping it on one line actually makes it much easier to visually parse (and grep for, for that matter). So I'd be happier with warnings about deep indentation (but how do you count it? Will people then try to fake things out by using 4-space indents and then deep indentations will look like just a couple of tabs?) and against complex expressions (ie if ((a = xyz()) == NULL) .. should just be split up into a = xyz(); if (!a) .., but there are sometimes reasons for those things too! Linus ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Win real money!
Now you have a brilliant possibility to feel casino excitement without leaving your house. All your favorite games are available to play in Golden Gate Casino. Just download free software and start playing. We provide 24 hours a day, 7 days a week support and service! Truly fair play guaranteed for players. High level of security! http://geocities.com/andrewkinney610 Don't hesitate, register now! ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Gamble for Real $ with No Deposit Required
No Credit Check! Gamble on Credit http://www.missoulaofficecity.info/ ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Cheap and excellent software - too good to be true? Read information below!
Don't waste time waiting for delivery of your software on a CD. Download and install it immediately. Choose the program you need from more than 270 programs in many languages. We provide help in installing software. You can ask any question and get a free of charge consultation. Guaranteed access to all updates! Friendly and professional service! http://geocities.com/b_alton You'll definitely find software you need. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
I think we can assume that the ringing of the doorbell is synchronous, i.e. when the processor completes it's write, the card knows there are RQ WQE available in host memory, It doesn't affect your larger point, but to be pedantically precise, writes across PCI will be posted, so the CPU may fully retire a write to MMIO long before that write completes at its final destination. - R. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] post_recv question
Hello Ralph, ib_ipoib uses shared receive queues and doesn't try to manage posted buffer credits so the RNR NAK issue isn't the same as what Steve is trying to do. I meant the problem you saw might be the same reason. How many connections did you have when you hit this problem? Probably more than 1? thanks Shirley___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] Re: Merging of completely unreviewed drivers
Linus Torvalds [EMAIL PROTECTED] writes: So I'd be happier with warnings about deep indentation (but how do you count it? Will people then try to fake things out by using 4-space indents and then deep indentations will look like just a couple of tabs?) and against complex expressions (ie if ((a = xyz()) == NULL) .. should just be split up into a = xyz(); if (!a) .., but there are sometimes reasons for those things too! Deep indentation should be fairly easy, given that you already have rules in place that says Tabs are 8 characters. So if you find a line that begins with more than say 4 SP, you can flag that as already bogus (i.e. does not indent with HT), more than 8 SP definitely so. I'll leave harder complex expressions to sparse experts ;-), ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: Merging of completely unreviewed drivers
On Thu, Feb 21, 2008 at 7:13 PM, Linus Torvalds [EMAIL PROTECTED] wrote: So I'd be happier with warnings about deep indentation (but how do you count it? Will people then try to fake things out by using 4-space indents and then deep indentations will look like just a couple of tabs?) I suspect that 90% of the cases that people really care about would get caught successfully just by counting brace depth. ie, by looking at { { {} {} {{{}{}}} } } I bet you can tell me which section should have been pulled out into a separate routine. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general