[ewg] ib_rdma test over chelsio
I am having trouble running the ib_rdma test over Chelsio. The client reports a "pp_client_connect: unexpected CM event 1" error whether or not a server is running. The firmware is 7.4.0. I tried both stack 1.2.5 and ofed-1.4.1; neither seems to work for me. Does anybody have any idea?

Thanks
Shirley
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
Re: [ewg] [PATCH] ipoib: avoid enabling napi when it's already enabled
On Tue, 2008-10-28 at 10:47 +0200, Vladimir Sokolovsky wrote:
> Yossi Etigin wrote:
> > ipoib_open() may be called from ipoib_pkey_poll() after napi has
> > already been enabled, and try to enable it again. This triggers the
> > BUG_ON test in napi_enable().
> >
> > Signed-off-by: Yossi Etigin [EMAIL PROTECTED]
>
> Applied,
>
> Regards,
> Vladimir

The same fix should be submitted to the mainline kernel. I checked the code; the same problem exists there.

Thanks
Shirley
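To illustrate the double-enable problem described above, here is a minimal user-space sketch, assuming a simple "already enabled" flag guards the enable call. The names `model_priv`, `model_open` and `model_stop` are hypothetical stand-ins and are not taken from the actual patch, whose contents are not shown in this thread.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's private state (not ipoib_dev_priv). */
struct model_priv {
	bool napi_enabled;	/* assumed flag tracking whether NAPI is on */
	int  enable_calls;	/* how many times the "real" enable ran */
};

/* Models napi_enable(): running it twice without a disable is the bug. */
static void model_napi_enable(struct model_priv *p)
{
	p->enable_calls++;
}

/* Open path that can be reached both from a normal open and from a P_Key poll. */
static void model_open(struct model_priv *p)
{
	if (p->napi_enabled)
		return;		/* already enabled: do not enable again */
	p->napi_enabled = true;
	model_napi_enable(p);
}

/* Stop path clears the flag so a later open enables NAPI again. */
static void model_stop(struct model_priv *p)
{
	p->napi_enabled = false;
}
```

Calling `model_open()` twice in a row runs the enable step only once; after `model_stop()` the next open enables it again.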
Re: [ewg] [PATCH] ipoib: avoid enabling napi when it's already enabled
We found the same problem during child interface testing on ofed-1.4-rc3. Please help get it fixed in the ofed-1.4 daily build.

Thanks
Shirley
[ewg] Re: [ofa-general] [PATCH v2] IB/ipoib: copy small SKBs in CM mode
Hello Eli,

In this case, how many tx drop packets does ifconfig report? Should ifconfig's tx dropped packets plus successfully transmitted packets be close to the number of packets netperf sent?

That's right.

I am looking at ipoib_cm_handle_tx_wc(); the tx drop counter is not incremented in this situation, so the tx packet count should be close to the number of packets netperf sent.

	void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
	{
		...
		tx_req = &tx->tx_ring[wr_id];

		ib_dma_unmap_single(priv->ca, tx_req->mapping[0], tx_req->skb->len, DMA_TO_DEVICE);

		/* FIXME: is this right? Shouldn't we only increment on success? */
		++dev->stats.tx_packets;
		dev->stats.tx_bytes += tx_req->skb->len;
		...
	}

Any TCP STREAM test results to share here?

TCP won't demonstrate the problem since it uses Nagle's algorithm to aggregate data into full sized packets.

So when hitting this RNR retry, the returned error status was a flush error, so the packets were silently dropped instead of raising a failed CM send event and clearing the interface up flag? Please correct me if I am wrong.

thanks
Shirley
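The FIXME quoted above asks whether tx_packets should only be incremented on success. A minimal user-space sketch of that alternative accounting follows, assuming failed completions are counted as errors instead. The `model_stats` struct and `model_handle_tx_completion()` are hypothetical; this is not what the driver code in this thread does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters, loosely modeled on the net_device stats fields. */
struct model_stats {
	unsigned long tx_packets;
	unsigned long tx_bytes;
	unsigned long tx_errors;
};

/* Count a completed send as transmitted only when the completion succeeded. */
static void model_handle_tx_completion(struct model_stats *s, size_t len,
				       bool success)
{
	if (success) {
		s->tx_packets++;
		s->tx_bytes += len;
	} else {
		s->tx_errors++;	/* e.g. a flushed or RNR-exhausted send */
	}
}
```

With this accounting, the sum of tx_packets and tx_errors would track the number of sends posted, which is the comparison against the netperf packet count discussed above.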
Re: [ewg] Re: IPoIB panics on ipoib_cm_handle_rx_wc()
Hello Or,

We have seen skb_under_panic() in our test environment as well. It is easy to reproduce by turning tcpdump on and off.

thanks
Shirley

On Thu, 2008-04-24 at 14:11 +0300, Or Gerlitz wrote:

> https://bugs.openfabrics.org/show_bug.cgi?id=989
>
> --- Comment #13 from [EMAIL PROTECTED] 2008-04-23 13:23 ---
> I think I found the problem and have a fix. In ipoib_cm_handle_rx_wc(), if
> the byte_len is < SKB_TSHOLD, a new sk_buff is allocated. The sk_buff's
> mac.raw is not initialized. It sends the packet up the stack to
> netif_receive_skb(). In netif_receive_skb(), skb->mac_len is computed by
> subtracting mac.raw from nh.raw. Since mac.raw was not initialized, we get
> a very large number. It eventually leads to a panic in skb_under_panic.
>
> The diff of the fix:
> drivers/infiniband/ulp/ipoib/ipoib_cm.c.pre_kris drivers/infiniband/ulp/ipoib/ipoib_cm.c
> *** drivers/infiniband/ulp/ipoib/ipoib_cm.c.pre_kris	2008-04-23 16:05:26.0 -0400
> --- drivers/infiniband/ulp/ipoib/ipoib_cm.c	2008-04-23 15:16:23.0 -0400
> ***************
> *** 622,627 ****
> --- 622,628 ----
>   			skb_copy_from_linear_data_offset(skb, IPOIB_ENCAP_LEN,
>   							 small_skb->data, dlen);
>   			skb_put(small_skb, dlen);
> + 			skb_reset_mac_header(small_skb);
>   			skb = small_skb;
>   			goto copied;
>   		}

Hi Kris,

Good catch. This code does not exist in the mainline kernel and was added through the ofed 1.3 (non) process, see the patch below. Does this bug hit you for --every-- small packet received in connected mode? If not, can you explain why?

The patch itself, as provided by the ofed sources (kernel_patches/fixes/ipoib_0320_small_skb_copy.patch), is not even documented; I took the change log from the git tree used to store it. This unreviewed, undocumented patch, with a bug that could have been found in review, is yet another good example of why code should not be added through ofed but rather through the mainline cycle, etc. Oh well.

Or.
commit 92557c139fd8329daf1a1bf8beeaa6ae940b055a
Author: Eli Cohen [EMAIL PROTECTED](none)
Date:   Mon Feb 18 21:56:03 2008 +0200

    IB/ipoib: copy small SKBs in CM mode

    CM mode handling of received packets involves iterating through the
    fragments. This is time consuming and in case of small packets it is
    better to allocate a new small skb and copy the data and pass this
    smaller SKB up to the IP stack.

    Signed-off-by: Eli Cohen [EMAIL PROTECTED]

Index: ofa_1_3_dev_kernel/drivers/infiniband/ulp/ipoib/ipoib.h
===================================================================
--- ofa_1_3_dev_kernel.orig/drivers/infiniband/ulp/ipoib/ipoib.h	2008-02-18 19:23:23.0 +0200
+++ ofa_1_3_dev_kernel/drivers/infiniband/ulp/ipoib/ipoib.h	2008-02-18 22:20:48.0 +0200
@@ -99,6 +99,8 @@ enum {
 	MAX_SEND_CQE	  = 16,
 	UD_POST_RCV_COUNT = 16,
 	CM_POST_SRQ_COUNT = 16,
+
+	SKB_TSHOLD	  = 256,
 };
 
 #define	IPOIB_OP_RECV   (1ul << 31)
Index: ofa_1_3_dev_kernel/drivers/infiniband/ulp/ipoib/ipoib_cm.c
===================================================================
--- ofa_1_3_dev_kernel.orig/drivers/infiniband/ulp/ipoib/ipoib_cm.c	2008-02-18 19:23:23.0 +0200
+++ ofa_1_3_dev_kernel/drivers/infiniband/ulp/ipoib/ipoib_cm.c	2008-02-18 22:21:01.0 +0200
@@ -554,6 +554,7 @@ void ipoib_cm_handle_rx_wc(struct net_de
 	u64 mapping[IPOIB_CM_RX_SG];
 	int frags;
 	int has_srq;
+	struct sk_buff *small_skb;
 
 	ipoib_dbg_data(priv, "cm recv completion: id %d, status: %d\n",
 		       wr_id, wc->status);
@@ -608,6 +609,20 @@ void ipoib_cm_handle_rx_wc(struct net_de
 		}
 	}
 
+	if (wc->byte_len < SKB_TSHOLD) {
+		int dlen = wc->byte_len - IPOIB_ENCAP_LEN;
+
+		small_skb = dev_alloc_skb(dlen);
+		if (small_skb) {
+			small_skb->protocol = ((struct ipoib_header *)skb->data)->proto;
+			skb_copy_from_linear_data_offset(skb, IPOIB_ENCAP_LEN,
+							 small_skb->data, dlen);
+			skb_put(small_skb, dlen);
+			skb = small_skb;
+			goto copied;
+		}
+	}
+
 	frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len,
 					      (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE;
 
@@ -634,6 +649,7 @@ void ipoib_cm_handle_rx_wc(struct net_de
 	skb_reset_mac_header(skb);
 	skb_pull(skb, IPOIB_ENCAP_LEN);
 
+copied:
 	dev->last_rx = jiffies;
 	++dev->stats.rx_packets;
 	dev->stats.rx_bytes += skb->len;
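The failure mode described in comment #13 can be modeled in user space. The sketch below assumes, as that comment states, that mac_len is computed as nh.raw minus mac.raw and that a freshly allocated skb has mac.raw unset (modeled here as NULL). `model_skb` and its helpers are hypothetical and only mimic the two pointers involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the three skb pointers relevant to the bug. */
struct model_skb {
	unsigned char *data;	/* start of packet data */
	unsigned char *mac;	/* plays the role of mac.raw */
	unsigned char *nh;	/* plays the role of nh.raw */
};

/* Models skb_reset_mac_header(): point the MAC header at the current data. */
static void model_reset_mac_header(struct model_skb *skb)
{
	skb->mac = skb->data;
}

/* Models the mac_len computation: nh.raw - mac.raw, done on integers here. */
static uintptr_t model_mac_len(const struct model_skb *skb)
{
	return (uintptr_t)skb->nh - (uintptr_t)skb->mac;
}
```

With `mac` left at NULL the computed length is essentially the address of the buffer, a very large number; after `model_reset_mac_header()` it drops to 0 for a packet whose network header starts at `data`.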
[ewg] Re: [PATCH] IPoIB 4K MTU support
Hello Roland,

On Tue, 2008-04-22 at 13:46 -0700, Roland Dreier wrote:
> Thanks, applied with some cleanups as below.

Thanks!

> As an aside, in the case where we need to use a fragment in the receive
> skb, does it make sense to make the initial linear part bigger so the
> TCP and IP headers fit there (and the kernel doesn't have to look into
> the fragment list to handle the packet)?

We can improve this later.

> Also, is there any clean way where a kernel with PAGE_SIZE > 4096 can
> have ud_need_sg evaluate to 0 at compile time, so that all the unneeded
> code can be thrown out by the compiler?
>
> > +	return (IPOIB_UD_BUF_SIZE(ib_mtu) > PAGE_SIZE) ? 1 : 0;
>
> I've never understood this style: it makes no sense to do
> return bool ? 1 : 0; instead of just return bool;

You are right.

> > +static inline void ipoib_ud_dma_unmap_rx(struct ipoib_dev_priv *priv,
> > +					 u64 mapping[IPOIB_UD_RX_SG])
> > +{
> > +	if (ipoib_ud_need_sg(priv->max_ib_mtu)) {
> > +		ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_HEAD_SIZE, DMA_FROM_DEVICE);
> > +		ib_dma_unmap_page(priv->ca, mapping[1], PAGE_SIZE, DMA_FROM_DEVICE);
> > +	} else
> > +		ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_BUF_SIZE(priv->max_ib_mtu), DMA_FROM_DEVICE);
> > +}
> > +
> > +static inline void ipoib_ud_skb_put_frags(struct ipoib_dev_priv *priv,
> > +					  struct sk_buff *skb,
> > +					  unsigned int length)
> > +{
> > +	if (ipoib_ud_need_sg(priv->max_ib_mtu)) {
> > +		skb_frag_t *frag = &skb_shinfo(skb)->frags[0];
> > +		/*
> > +		 * There is only two buffers needed for max_payload = 4K,
> > +		 * first buf size is IPOIB_UD_HEAD_SIZE
> > +		 */
> > +		skb->tail += IPOIB_UD_HEAD_SIZE;
> > +		frag->size = length - IPOIB_UD_HEAD_SIZE;
> > +		skb->data_len += frag->size;
> > +		skb->truesize += frag->size;
> > +		skb->len += length;
> > +	} else
> > +		skb_put(skb, length);
> > +
> > +}
>
> These are pretty big to put in a header file as inlines... I moved them
> to the only .c file where they're used.
>
>  - R.

Right. I should have moved them into the .c file after Or's comment. I forgot.

Thanks.
Shirley
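Roland's style remark above can be shown with a small user-space sketch: return the comparison directly instead of `? 1 : 0`. The page size is passed as a parameter here purely so the sketch can be exercised with different values; in the kernel it would be the compile-time PAGE_SIZE. `MODEL_IB_GRH_BYTES` assumes the 40-byte GRH mentioned in the driver comment, and all `model_` names are hypothetical.

```c
#include <stdbool.h>

/* GRH size, assumed from the "40 byte gap for a GRH" comment in the driver. */
enum { MODEL_IB_GRH_BYTES = 40 };

#define MODEL_UD_BUF_SIZE(ib_mtu)	((ib_mtu) + MODEL_IB_GRH_BYTES)

/* True when MTU plus GRH no longer fits in one page, so two S/G entries are needed. */
static inline bool model_ud_need_sg(unsigned int ib_mtu, unsigned int page_size)
{
	return MODEL_UD_BUF_SIZE(ib_mtu) > page_size;
}
```

For a 4K IB MTU this is true with 4K pages and false with 64K pages; for a 2K MTU it is false with 4K pages.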
[ewg] [PATCH] IPoIB 4K MTU support
Hello Roland,

I recreated the IPoIB 4K MTU patch. The patch below is built against the 2.6.25 kernel for 2.6.26 kernel submission. Please review and integrate it, and let me know if there is any problem.

Thanks
Shirley

This patch enables IPoIB 4K MTU support by using two S/G buffers when PAGE_SIZE is less than or equal to HCA IB MTU size. The first buffer is for IPoIB header + GRH header. The second buffer is IPoIB payload, which is 4K-4.

Signed-off-by: Shirley Ma [EMAIL PROTECTED]
---
 drivers/infiniband/ulp/ipoib/ipoib.h           |   50 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        |   86 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   19 --
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    3 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |   15 -
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c      |    1 +
 6 files changed, 125 insertions(+), 49 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 73b2b17..6a05ead 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -56,11 +56,11 @@
 /* constants */
 
 enum {
-	IPOIB_PACKET_SIZE	  = 2048,
-	IPOIB_BUF_SIZE		  = IPOIB_PACKET_SIZE + IB_GRH_BYTES,
-
 	IPOIB_ENCAP_LEN		  = 4,
 
+	IPOIB_UD_HEAD_SIZE	  = IB_GRH_BYTES + IPOIB_ENCAP_LEN,
+	IPOIB_UD_RX_SG		  = 2, /* max buffer needed for 4K mtu */
+
 	IPOIB_CM_MTU		  = 0x10000 - 0x10, /* padding to align header to 16 */
 	IPOIB_CM_BUF_SIZE	  = IPOIB_CM_MTU  + IPOIB_ENCAP_LEN,
 	IPOIB_CM_HEAD_SIZE	  = IPOIB_CM_BUF_SIZE % PAGE_SIZE,
@@ -139,7 +139,7 @@ struct ipoib_mcast {
 
 struct ipoib_rx_buf {
 	struct sk_buff *skb;
-	u64		mapping;
+	u64		mapping[IPOIB_UD_RX_SG];
 };
 
 struct ipoib_tx_buf {
@@ -294,6 +294,7 @@ struct ipoib_dev_priv {
 
 	unsigned int admin_mtu;
 	unsigned int mcast_mtu;
+	unsigned int max_ib_mtu;
 
 	struct ipoib_rx_buf *rx_ring;
 
@@ -305,6 +306,9 @@ struct ipoib_dev_priv {
 	struct ib_send_wr    tx_wr;
 	unsigned	     tx_outstanding;
 
+	struct ib_recv_wr    rx_wr;
+	struct ib_sge	     rx_sge[IPOIB_UD_RX_SG];
+
 	struct ib_wc ibwc[IPOIB_NUM_WC];
 
 	struct list_head dead_ahs;
@@ -366,6 +370,44 @@ struct ipoib_neigh {
 	struct list_head    list;
 };
 
+#define IPOIB_UD_MTU(ib_mtu)		(ib_mtu - IPOIB_ENCAP_LEN)
+#define IPOIB_UD_BUF_SIZE(ib_mtu)	(ib_mtu + IB_GRH_BYTES)
+
+static inline int ipoib_ud_need_sg(unsigned int ib_mtu)
+{
+	return (IPOIB_UD_BUF_SIZE(ib_mtu) > PAGE_SIZE) ? 1 : 0;
+}
+
+static inline void ipoib_ud_dma_unmap_rx(struct ipoib_dev_priv *priv,
+					 u64 mapping[IPOIB_UD_RX_SG])
+{
+	if (ipoib_ud_need_sg(priv->max_ib_mtu)) {
+		ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_HEAD_SIZE, DMA_FROM_DEVICE);
+		ib_dma_unmap_page(priv->ca, mapping[1], PAGE_SIZE, DMA_FROM_DEVICE);
+	} else
+		ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_BUF_SIZE(priv->max_ib_mtu), DMA_FROM_DEVICE);
+}
+
+static inline void ipoib_ud_skb_put_frags(struct ipoib_dev_priv *priv,
+					  struct sk_buff *skb,
+					  unsigned int length)
+{
+	if (ipoib_ud_need_sg(priv->max_ib_mtu)) {
+		skb_frag_t *frag = &skb_shinfo(skb)->frags[0];
+		/*
+		 * There is only two buffers needed for max_payload = 4K,
+		 * first buf size is IPOIB_UD_HEAD_SIZE
+		 */
+		skb->tail += IPOIB_UD_HEAD_SIZE;
+		frag->size = length - IPOIB_UD_HEAD_SIZE;
+		skb->data_len += frag->size;
+		skb->truesize += frag->size;
+		skb->len += length;
+	} else
+		skb_put(skb, length);
+
+}
+
 /*
  * We stash a pointer to our private neighbour information after our
  * hardware address in neigh->ha.  The ALIGN() expression here makes
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 0205eb7..8b3f1b2 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -92,25 +92,18 @@ void ipoib_free_ah(struct kref *kref)
 static int ipoib_ib_post_receive(struct net_device *dev, int id)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	struct ib_sge list;
-	struct ib_recv_wr param;
 	struct ib_recv_wr *bad_wr;
 	int ret;
 
-	list.addr     = priv->rx_ring[id].mapping;
-	list.length   = IPOIB_BUF_SIZE;
-	list.lkey     = priv->mr->lkey;
+	priv->rx_wr.wr_id	= id | IPOIB_OP_RECV;
+	priv->rx_sge[0].addr	= priv->rx_ring[id].mapping[0];
+	priv->rx_sge[1].addr	= priv->rx_ring[id].mapping[1];
+
 
-	param.next    = NULL
[ewg] Re: [RFC][1/2] IPoIB UD 4K MTU support
On Fri, 2008-04-04 at 15:36 -0700, Roland Dreier wrote:
> > +	unsigned int max_ib_mtu;
>
> I don't see where this is ever set?
>
>  - R.

It is set in ipoib_main.c, in ipoib_add_port():

+	if (!ib_query_port(hca, port, &attr))
+		priv->max_ib_mtu = ib_mtu_enum_to_int(attr.max_mtu);

Thanks
Shirley
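The snippet above converts the port's max_mtu, which is an enumerated value, into a byte count. A user-space model of that conversion is sketched below, assuming the usual InfiniBand encoding where the values 1 through 5 stand for 256 through 4096 bytes. The `model_` names are hypothetical and this is not the kernel's own definition.

```c
/* Assumed InfiniBand MTU encoding: 1 = 256 bytes ... 5 = 4096 bytes. */
enum model_ib_mtu {
	MODEL_IB_MTU_256  = 1,
	MODEL_IB_MTU_512  = 2,
	MODEL_IB_MTU_1024 = 3,
	MODEL_IB_MTU_2048 = 4,
	MODEL_IB_MTU_4096 = 5
};

/* Convert the enumerated MTU to bytes; -1 for an unknown encoding. */
static int model_mtu_enum_to_int(enum model_ib_mtu mtu)
{
	switch (mtu) {
	case MODEL_IB_MTU_256:  return 256;
	case MODEL_IB_MTU_512:  return 512;
	case MODEL_IB_MTU_1024: return 1024;
	case MODEL_IB_MTU_2048: return 2048;
	case MODEL_IB_MTU_4096: return 4096;
	default:                return -1;
	}
}
```

A port reporting the 4096 encoding would therefore give max_ib_mtu = 4096, which is the value the need-S/G check in the patch compares against PAGE_SIZE.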
[ewg] Re: 4K MTU patch to kernel 2.6.26
Hello Tziporet,

Yes, that's what I am working on. I am going on vacation next week, which is why I hesitated to submit this patch this week: I would not be able to respond to review comments in time. If submitting the patch on April 1 is too late, I can submit it by tomorrow. What do you think?

thanks
Shirley
[ewg] [RFC][0/2] IPoIB UD 4K MTU support
Here is a patchset to enable IPoIB UD 4K MTU support for any IB fabric where the max IPoIB payload can be up to 4K. This patchset uses two S/G buffers when IPoIB payload + IB_GRH header size is greater than PAGE_SIZE. The first buffer size is IB_GRH_HEAD + IPOIB_ENCAP_LEN. The second buffer is the data.

Please review it.

Thanks
Shirley
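The buffer split described above can be worked through with numbers. The following user-space sketch assumes a 40-byte GRH and the 4-byte IPoIB encapsulation header, and assumes the second buffer is one full page, as the unmap helper in the patches suggests. `model_rx_layout` and `model_ud_rx_layout()` are hypothetical names used only for this illustration.

```c
/* Assumed sizes: 40-byte GRH, 4-byte IPoIB encapsulation header. */
enum {
	MODEL_GRH_BYTES = 40,
	MODEL_ENCAP_LEN = 4
};

/* Hypothetical description of how one receive is split into S/G entries. */
struct model_rx_layout {
	unsigned int num_sge;	/* 1 or 2 scatter/gather entries */
	unsigned int len[2];	/* byte length of each entry */
};

static struct model_rx_layout model_ud_rx_layout(unsigned int ib_mtu,
						 unsigned int page_size)
{
	/* Default: everything (GRH + MTU) fits in a single linear buffer. */
	struct model_rx_layout l = { 1, { ib_mtu + MODEL_GRH_BYTES, 0 } };

	if (ib_mtu + MODEL_GRH_BYTES > page_size) {
		l.num_sge = 2;
		l.len[0] = MODEL_GRH_BYTES + MODEL_ENCAP_LEN;	/* headers */
		l.len[1] = page_size;				/* payload page */
	}
	return l;
}
```

With a 4096-byte MTU and 4096-byte pages this gives two entries of 44 and 4096 bytes, 4140 bytes in total, which covers the 4136 bytes of GRH plus MTU. With a 2048-byte MTU a single 2088-byte buffer is enough.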
[ewg] [RFC][1/2] IPoIB UD 4K MTU support
This patch defines some parameters and creates a couple of APIs for UD RX S/G to be used later.

Signed-off-by: Shirley Ma [EMAIL PROTECTED]
---
 drivers/infiniband/ulp/ipoib/ipoib.h |   48 ++
 1 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index f9b7caa..73a8fe5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -61,6 +61,10 @@ enum {
 
 	IPOIB_ENCAP_LEN		  = 4,
 
+	IPOIB_UD_MAX_PAYLOAD	  = 4096,
+	IPOIB_UD_HEAD_SIZE	  = IB_GRH_BYTES + IPOIB_ENCAP_LEN,
+	IPOIB_UD_RX_SG		  = (IPOIB_UD_MAX_PAYLOAD + IB_GRH_BYTES) / PAGE_SIZE,
+
 	IPOIB_CM_MTU		  = 0x10000 - 0x10, /* padding to align header to 16 */
 	IPOIB_CM_BUF_SIZE	  = IPOIB_CM_MTU  + IPOIB_ENCAP_LEN,
 	IPOIB_CM_HEAD_SIZE	  = IPOIB_CM_BUF_SIZE % PAGE_SIZE,
@@ -141,6 +145,11 @@ struct ipoib_rx_buf {
 	u64		mapping;
 };
 
+struct ipoib_ud_rx_buf {
+	struct sk_buff *skb;
+	u64		mapping[IPOIB_UD_RX_SG];
+};
+
 struct ipoib_tx_buf {
 	struct sk_buff *skb;
 	u64		mapping[MAX_SKB_FRAGS + 1];
@@ -289,6 +298,7 @@ struct ipoib_dev_priv {
 
 	unsigned int admin_mtu;
 	unsigned int mcast_mtu;
+	unsigned int max_ib_mtu;
 
 	struct ipoib_rx_buf *rx_ring;
 
@@ -359,6 +369,44 @@ struct ipoib_neigh {
 	struct list_head    list;
 };
 
+#define IPOIB_UD_MTU(ib_mtu)		(ib_mtu - IPOIB_ENCAP_LEN)
+#define IPOIB_UD_BUF_SIZE(ib_mtu)	(ib_mtu + IB_GRH_BYTES)
+
+static inline int ipoib_ud_need_sg(unsigned int ib_mtu)
+{
+	return (IPOIB_UD_BUF_SIZE(ib_mtu) > PAGE_SIZE) ? 1 : 0;
+}
+
+static inline void ipoib_ud_dma_unmap_rx(struct ipoib_dev_priv *priv,
+					 u64 mapping[IPOIB_UD_RX_SG])
+{
+	if (ipoib_ud_need_sg(priv->max_ib_mtu)) {
+		ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_HEAD_SIZE, DMA_FROM_DEVICE);
+		ib_dma_unmap_page(priv->ca, mapping[1], PAGE_SIZE, DMA_FROM_DEVICE);
+	} else
+		ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_BUF_SIZE(priv->max_ib_mtu), DMA_FROM_DEVICE);
+}
+
+static inline void ipoib_ud_skb_put_frags(struct ipoib_dev_priv *priv,
+					  struct sk_buff *skb,
+					  unsigned int length)
+{
+	if (ipoib_ud_need_sg(priv->max_ib_mtu)) {
+		skb_frag_t *frag = &skb_shinfo(skb)->frags[0];
+		/*
+		 * There is only two buffers needed for max_payload = 4K,
+		 * first buf size is IPOIB_UD_HEAD_SIZE
+		 */
+		skb->tail += IPOIB_UD_HEAD_SIZE;
+		frag->size = length - IPOIB_UD_HEAD_SIZE;
+		skb->data_len += frag->size;
+		skb->truesize += frag->size;
+		skb->len += length;
+	} else
+		skb_put(skb, length);
+
+}
+
 /*
  * We stash a pointer to our private neighbour information after our
  * hardware address in neigh->ha.  The ALIGN() expression here makes
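The length bookkeeping in ipoib_ud_skb_put_frags() above can be checked with a user-space model. The sketch below mirrors the arithmetic of the patch on a simplified struct. `model_skb` is hypothetical, the head size assumes 40 + 4 bytes, and like the patch it assumes the received length is at least the head size in the S/G case.

```c
/* Assumed head size: 40-byte GRH plus 4-byte IPoIB encapsulation header. */
enum { MODEL_UD_HEAD_SIZE = 44 };

/* Hypothetical model of the skb length fields touched by the helper. */
struct model_skb {
	unsigned int len;	/* total bytes (linear + fragments) */
	unsigned int data_len;	/* bytes held in fragments */
	unsigned int truesize;	/* memory accounted to the skb */
	unsigned int tail;	/* end offset of the linear data */
	unsigned int frag_size;	/* size of the single page fragment */
};

static void model_ud_skb_put_frags(struct model_skb *skb, unsigned int length,
				   int need_sg)
{
	if (need_sg) {
		/* Headers land in the linear part, the rest in the page fragment. */
		skb->tail += MODEL_UD_HEAD_SIZE;
		skb->frag_size = length - MODEL_UD_HEAD_SIZE;
		skb->data_len += skb->frag_size;
		skb->truesize += skb->frag_size;
		skb->len += length;
	} else {
		/* Single linear buffer: equivalent of skb_put(skb, length). */
		skb->tail += length;
		skb->len += length;
	}
}
```

For a 4140-byte receive in S/G mode the model ends with len = 4140, tail = 44 and data_len = 4096, so the linear part holds exactly the GRH and encapsulation header.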
[ewg] [RFC][2/2] IPoIB UD 4K MTU support
This patch enables 4K MTU support for IPoIB UD. I fixed the unnecessary define in the [RFC][1/2] patch, since only 2 buffers are needed. I will integrate any comments on this patchset later and resubmit it. I have touch tested this patch against the branch-2.6.25 git tree.

Signed-off-by: Shirley Ma [EMAIL PROTECTED]
---
 drivers/infiniband/ulp/ipoib/ipoib.h           |   13 +---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        |   86 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   19 --
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    3 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |   15 -
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c      |    1 +
 6 files changed, 83 insertions(+), 54 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 73a8fe5..fcbb618 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -56,14 +56,11 @@
 /* constants */
 
 enum {
-	IPOIB_PACKET_SIZE	  = 2048,
-	IPOIB_BUF_SIZE		  = IPOIB_PACKET_SIZE + IB_GRH_BYTES,
-
 	IPOIB_ENCAP_LEN		  = 4,
 
 	IPOIB_UD_MAX_PAYLOAD	  = 4096,
 	IPOIB_UD_HEAD_SIZE	  = IB_GRH_BYTES + IPOIB_ENCAP_LEN,
-	IPOIB_UD_RX_SG		  = (IPOIB_UD_MAX_PAYLOAD + IB_GRH_BYTES) / PAGE_SIZE,
+	IPOIB_UD_RX_SG		  = 2, /* max buffer needed */
 
 	IPOIB_CM_MTU		  = 0x10000 - 0x10, /* padding to align header to 16 */
 	IPOIB_CM_BUF_SIZE	  = IPOIB_CM_MTU  + IPOIB_ENCAP_LEN,
@@ -142,11 +139,6 @@ struct ipoib_mcast {
 
 struct ipoib_rx_buf {
 	struct sk_buff *skb;
-	u64		mapping;
-};
-
-struct ipoib_ud_rx_buf {
-	struct sk_buff *skb;
 	u64		mapping[IPOIB_UD_RX_SG];
 };
 
@@ -310,6 +302,9 @@ struct ipoib_dev_priv {
 	struct ib_send_wr    tx_wr;
 	unsigned	     tx_outstanding;
 
+	struct ib_recv_wr    rx_wr;
+	struct ib_sge	     rx_sge[IPOIB_UD_RX_SG];
+
 	struct ib_wc ibwc[IPOIB_NUM_WC];
 
 	struct list_head dead_ahs;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_fs.c b/drivers/infiniband/ulp/ipoib/ipoib_fs.c
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 9d3e778..072acc2 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -90,25 +90,18 @@ void ipoib_free_ah(struct kref *kref)
 static int ipoib_ib_post_receive(struct net_device *dev, int id)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	struct ib_sge list;
-	struct ib_recv_wr param;
 	struct ib_recv_wr *bad_wr;
 	int ret;
 
-	list.addr     = priv->rx_ring[id].mapping;
-	list.length   = IPOIB_BUF_SIZE;
-	list.lkey     = priv->mr->lkey;
+	priv->rx_wr.wr_id	= id | IPOIB_OP_RECV;
+	priv->rx_sge[0].addr	= priv->rx_ring[id].mapping[0];
+	priv->rx_sge[1].addr	= priv->rx_ring[id].mapping[1];
+
 
-	param.next    = NULL;
-	param.wr_id   = id | IPOIB_OP_RECV;
-	param.sg_list = &list;
-	param.num_sge = 1;
-
-	ret = ib_post_recv(priv->qp, &param, &bad_wr);
+	ret = ib_post_recv(priv->qp, &priv->rx_wr, &bad_wr);
 	if (unlikely(ret)) {
 		ipoib_warn(priv, "receive failed for buf %d (%d)\n", id, ret);
-		ib_dma_unmap_single(priv->ca, priv->rx_ring[id].mapping,
-				    IPOIB_BUF_SIZE, DMA_FROM_DEVICE);
+		ipoib_ud_dma_unmap_rx(priv, priv->rx_ring[id].mapping);
 		dev_kfree_skb_any(priv->rx_ring[id].skb);
 		priv->rx_ring[id].skb = NULL;
 	}
@@ -116,15 +109,22 @@ static int ipoib_ib_post_receive(struct net_device *dev, int id)
 	return ret;
 }
 
-static int ipoib_alloc_rx_skb(struct net_device *dev, int id)
+static struct sk_buff *ipoib_alloc_rx_skb(struct net_device *dev,
+					  int id,
+					  u64 mapping[IPOIB_UD_RX_SG])
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct sk_buff *skb;
-	u64 addr;
+	int buf_size;
+
+	if (ipoib_ud_need_sg(priv->max_ib_mtu))
+		buf_size = IPOIB_UD_HEAD_SIZE;
+	else
+		buf_size = IPOIB_UD_BUF_SIZE(priv->max_ib_mtu);
 
-	skb = dev_alloc_skb(IPOIB_BUF_SIZE + 4);
-	if (!skb)
-		return -ENOMEM;
+	skb = dev_alloc_skb(buf_size + 4);
+	if (unlikely(!skb))
+		return NULL;
 
 	/*
 	 * IB will leave a 40 byte gap for a GRH and IPoIB adds a 4 byte
@@ -133,17 +133,31 @@ static int ipoib_alloc_rx_skb(struct net_device *dev, int id)
 	 */
 	skb_reserve(skb, 4);
 
-	addr = ib_dma_map_single(priv->ca, skb->data, IPOIB_BUF_SIZE
Re: [ewg] [Fwd: Re: [ofa-general] IPOIB/CM increase retry counts]
I saw cases where a fast sender consumed the TX ring and I solved this by increasing the size of the tx queue. I will try to connect ConnectX with Sinai and see if there are such issues.

Which indicates we really need to fix bug 907.

Thanks
Shirley
Re: [ewg] [PATCH]IPOIB/CM fix for bug# 906 -OFED-1.3
On Wed, 2008-02-13 at 10:04 +0200, Or Gerlitz wrote:
> Also here, does this problem exist in the 2.6.25-rc1 upstream code as
> well? From the change log I don't understand the source of the problem
> (only the symptom of failing to destroy the ipoib/cm rx QP) and the
> solution.
>
> Or.

I believe so. This is not a new problem in the OFED-1.3 release.

Thanks
Shirley
[ewg] Re: IB/ipoib: ipoib_ib_post_receive: infinite loop in error path
Thanks Nam. I will fix it along with the ipoib_sg_skb_put_frags() optimization.

Thanks
Shirley

From: Hoang-Nam Nguyen (hnguyen@linux.vnet.ibm.com)
Date: 02/08/08 07:10 AM
To: [EMAIL PROTECTED], Shirley Ma/Beaverton/[EMAIL PROTECTED]
cc: ewg@lists.openfabrics.org, [EMAIL PROTECTED]
Subject: IB/ipoib: ipoib_ib_post_receive: infinite loop in error path

Hello Eli!

I looked at the ipoib code from ofed-1.3-rc4 and saw the following code snippet in ipoib_ib_post_receive():

	if (++priv->rx_outst == UD_POST_RCV_COUNT) {
		ret = ib_post_recv(priv->qp, priv->rx_wr_draft, &bad_wr);

		if (unlikely(ret)) {
			ipoib_warn(priv, "receive failed for buf %d (%d)\n", id, ret);
			while (bad_wr) {
				id = bad_wr->wr_id & ~IPOIB_OP_RECV;
				ipoib_sg_dma_unmap_rx(priv, priv->rx_ring[i].mapping);
	#1/ipoib_0240_4kmtu.patch: should be priv->rx_ring[id].mapping
				dev_kfree_skb_any(priv->rx_ring[id].skb);
				priv->rx_ring[id].skb = NULL;
	#2/ipoib_0220_ud_post_list.patch: missing iterator forwarding, i.e. bad_wr = bad_wr->next;
			}
		}
		priv->rx_outst = 0;
	}

#1: I've talked with Shirley about this.
#2: I thought I had seen you fix it, but I still see it in rc4 after running the configure script.

Nam
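Both points Nam raises (index the ring by `id`, and advance `bad_wr`) can be seen in a small user-space model of the error path. The sketch below walks a linked list of failed work requests, releases the ring slot named by each wr_id, and moves to the next entry so the loop terminates. The `model_` names and the `ring_busy` array are hypothetical; only the `1 << 31` receive flag is taken from the patches in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Receive flag carried in wr_id, mirroring IPOIB_OP_RECV (1ul << 31). */
#define MODEL_OP_RECV	(1ull << 31)

/* Hypothetical stand-in for a chain of receive work requests. */
struct model_recv_wr {
	uint64_t wr_id;
	struct model_recv_wr *next;
};

/*
 * Release every ring slot referenced by the failed chain.
 * Returns the number of slots released.
 */
static unsigned int model_cleanup_failed(struct model_recv_wr *bad_wr,
					 int *ring_busy, unsigned int ring_size)
{
	unsigned int cleaned = 0;

	while (bad_wr) {
		unsigned int id = (unsigned int)(bad_wr->wr_id & ~MODEL_OP_RECV);

		if (id < ring_size) {
			ring_busy[id] = 0;	/* use id, not a stale index */
			cleaned++;
		}
		bad_wr = bad_wr->next;		/* without this the loop never ends */
	}
	return cleaned;
}
```

Dropping the `bad_wr = bad_wr->next` line reproduces the infinite loop from the subject line, since `bad_wr` would never become NULL.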
[ewg] Re: [PATCH] IB/ipoib - Problem with latest OFED 1.3 build... IPoIB and iPATH
Hello Ralph,

I looked at ehca and mthca; in create_ah(), neither driver checks the dlid the way ipath does here. During port initialization, priv->local_lid is set to 0, which was introduced by ipoib_0190_unsig_udqp.patch in RC4. I will let Eli look at this problem.

	static struct ib_ah *ipath_create_ah(struct ib_pd *pd,
					     struct ib_ah_attr *ah_attr)
	{
		struct ipath_ah *ah;
		struct ib_ah *ret;
		struct ipath_ibdev *dev = to_idev(pd->device);
		unsigned long flags;

		/* A multicast address requires a GRH (see ch. 8.4.1). */
		if (ah_attr->dlid >= IPATH_MULTICAST_LID_BASE &&
		    ah_attr->dlid != IPATH_PERMISSIVE_LID &&
		    !(ah_attr->ah_flags & IB_AH_GRH)) {
			ret = ERR_PTR(-EINVAL);
			goto bail;
		}

		if (ah_attr->dlid == 0) {
			ret = ERR_PTR(-EINVAL);
			goto bail;
		}

Thanks
Shirley
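The two dlid checks quoted above explain why a local_lid of 0 makes the address handle creation fail on ipath. A user-space model of just the validation logic is sketched below. The constant values (0xC000 for the multicast LID base, 0xFFFF for the permissive LID, bit 0 for the GRH flag) are assumptions based on the usual InfiniBand conventions, and the `model_` names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values following the usual InfiniBand LID conventions. */
enum {
	MODEL_MULTICAST_LID_BASE = 0xC000,
	MODEL_PERMISSIVE_LID     = 0xFFFF,
	MODEL_AH_GRH             = 1	/* "address has a GRH" flag bit */
};

/* Hypothetical subset of the address handle attributes. */
struct model_ah_attr {
	uint16_t dlid;
	uint8_t  ah_flags;
};

/* Returns 0 if the attributes pass the two checks, -EINVAL otherwise. */
static int model_check_ah_attr(const struct model_ah_attr *attr)
{
	/* A multicast destination requires a GRH. */
	if (attr->dlid >= MODEL_MULTICAST_LID_BASE &&
	    attr->dlid != MODEL_PERMISSIVE_LID &&
	    !(attr->ah_flags & MODEL_AH_GRH))
		return -EINVAL;

	/* A destination LID of 0 is rejected outright. */
	if (attr->dlid == 0)
		return -EINVAL;

	return 0;
}
```

An attribute with dlid 0, which is what an own-AH built from local_lid = 0 would carry, fails the second check, matching the "failed to create own ah" symptom reported in this thread.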
[ewg] Re: [PATCH] IB/ipoib - Problem with latest OFED 1.3 build... IPoIB and iPATH
Hello Ralph,

This patch looks OK to me. Let's wait for Eli's response.

Thanks
Shirley
Re: [ewg] [Fwd: Re: [ofa-general] Problem with latest OFED 1.3 build... IPoIB and iPATH]
Hello Ralph,

What's the ifconfig ib0 output?

We can reproduce the problem here. We haven't made any ib_ipath driver changes between RC3 and RC4 so some recent patch has broken us. I'm in the process of looking at it.

On Wed, 2008-02-06 at 17:17 -0800, Arlin Davis wrote:

I cannot ifconfig ib0 on ipath using the latest build (ofed20080206).

	ifup ib0
	SIOCSIFFLAGS: Invalid argument
	Failed to bring up ib0.

	ib0: failed to create own ah

	int ipoib_ib_dev_open(struct net_device *dev)
	{
		struct ipoib_dev_priv *priv = netdev_priv(dev);
		int ret;

		if (ib_find_pkey(priv->ca, priv->port, priv->pkey, &priv->pkey_index)) {
			ipoib_warn(priv, "P_Key 0x%04x not found\n", priv->pkey);
			clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
			return -1;
		}
		set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);

		ret = create_own_ah(priv);
		if (ret) {
			priv->own_ah = NULL;
			ipoib_warn(priv, "failed to create own ah\n");
			return -1;
		}

It looks like the ipath driver returns an error from the create_own_ah() call. Are you sure there are no ipath driver changes between RC3 and RC4? On which kernel did you hit this problem? What is the kernel PAGE_SIZE?

thanks
Shirley
Re: [ewg] [Fwd: Re: [ofa-general] Problem with latest OFED 1.3 build... IPoIB and iPATH]
On Thu, 2008-02-07 at 18:16 -0800, Ralph Campbell wrote:
> # cat /etc/*release
> Red Hat Enterprise Linux Server release 5 (Tikanga)
> # uname -r
> 2.6.18-8.el5
>
> 4K PAGE_SIZE

I don't have the ipath driver here; otherwise I could try these out myself. Here are a few suggestions. Could you please try them?

1. Try this with a 64K page size, for example on RHEL5U1, to see whether you have the same issue.
2. Can you put a debug message in ipath_create_ah() to see whether this is a memory allocation failure?
3. How many IB cards are in your system? If you have several, leave just one ipath card in to see whether you still hit this problem.

Thanks
Shirley
Re: [ewg] traffic jittery, send queue full reports from mthca driver
Hello Or,

I found out that if you increase send_queue_size and recv_queue_size, to something like 1K, this problem goes away.

Thanks
Shirley
Re: [ewg] OFED 1.3 rc4 update
On Wed, 2008-02-06 at 18:25 +0200, Tziporet Koren wrote:
> Hi,
> We will have OFED 1.3-rc4 tomorrow after one more night of regression.
> It will include:
> 1. IPoIB: Non-SRQ for CM mode
> 2. IPoIB: 4K MTU
> 3. IPoIB: Small messages improvements
>
> Note that today's latest build will include these features too, if
> someone wants to test it today.
>
> Tziporet

Thanks Tziporet. We will test it right after it's out.

Thanks
Shirley
Re: [ofa-general] Re: [ewg] [UPDATE][PATCH] IPoIB-UD 4K MTU patch against 2.6.24 ofed-1.3-git tree
Hello Tziporet,

The problem was caused by the last check-in of the small UDP performance patch. It changed the receive path completely, and I had less than one day to merge and test my patch with it on both Intel and PPC platforms. My patch was in good, stable shape before that patch; it had passed stress tests on both Intel and PPC.

I tested the new patch all last night. It works well and passes the stress test without any problem.

Regarding Eli's comments, I have sent out my replies. I am sorry for the minor mistake caused by rushing, but I don't see any risk from my test results. Please reconsider this patch for OFED-1.3.

thanks
Shirley

From: Tziporet Koren (tziporet@dev.mellanox.co.il)
Sent by: general-b[EMAIL PROTECTED]sts.openfabrics.org
Date: 02/05/08 08:19 AM
To: [EMAIL PROTECTED]
cc: Eli Cohen [EMAIL PROTECTED], ewg@lists.openfabrics.org, OpenFabrics General [EMAIL PROTECTED]
Subject: [ofa-general] Re: [ewg] [UPDATE][PATCH] IPoIB-UD 4K MTU patch against 2.6.24 ofed-1.3-git tree

Shirley Ma wrote:
> I found that one line ended up outside the for loop when merging this
> patch with the current git tree. This made UD_POST_RCV_COUNT = 16 behave
> wrongly. I have fixed it. This is the updated patch.
>
> Thanks
> Shirley

Hi Shirley,
It seems to me that the 4K MTU patch is not cooked enough for RC4. I appreciate your hard work to push it, but so many changes, possible leaks and not enough time for review and testing mean too high a risk for now.

Tziporet
Re: [ofa-general] Re: [ewg] [UPDATE][PATCH] IPoIB-UD 4K MTU patch against 2.6.24 ofed-1.3-git tree
Hello Tziporet,

On Tue, 2008-02-05 at 18:56 +0200, Tziporet Koren wrote:
> Shirley Ma wrote:
> > Hello Tziporet,
> >
> > The problem was because of the last check in of small UDP performance
> > patch. It changed the receiving path completely. And I only got less
> > than one day to merge/test the patch with that patch on both intel and
> > PPC platform. The patch was in good/stable shape before this patch. It
> > has passed stress test for both intel and PPC platform.
> >
> > I have tested the whole night of the new patch yesterday night. It
> > works well and passes the stress test without any problem.
>
> Which OS have you tested?

The 2.6.24 kernel, and I am going to test the SLES10SP2 kernel. It has passed the overnight stress test for the 2K MTU test suites.

> > Regarding Eli's comments, I have sent out. I am sorry for the minor
> > mistake because of the rushing, but I don't see any risk from my test
> > results. Please reconsider this patch to be in OFED-1.3.
>
> OK - we will do this - we will run one set of our regression with your
> patch now, and also check that it passes compilation on all kernels. If
> both are OK we will take it. I cross fingers for you :-)
>
> Tziporet

I appreciate your, Vlad's and Eli's help here! One line needs to change for backporting: ++priv->stats vs. ++dev->stats. I have not created the backport patch for this.

thanks
Shirley
Re: [ofa-general] Re: [ewg] [UPDATE][PATCH] IPoIB-UD 4K MTU patch against 2.6.24 ofed-1.3-git tree
Tziporet Koren [EMAIL PROTECTED] wrote on 02/05/2008 12:07:28 PM:
> Please test on RHEL 5 too.
> What are your stress tests?

Ok. The stress test is similar to netperf/netserver, but it runs multiple bi-directional streams. I have stressed it with up to 150 streams, duplex, running overnight.

> Please send this backport patch and specify for which kernels it is
> needed.
> Tziporet

Ok. It might be out tonight.

Thanks
Shirley
[ewg] [PATCH] IPoIB-UD 4K MTU patch for RC3 against 2.6.24
Hello Vlad,

Here is the IPoIB 4K MTU patch for the OFED-1.3-RC3 release against the 2.6.24 kernel. I have attached it as well, since my email has some problems. Regarding the backport, a one-line change is needed for priv->stats vs. dev->stats. I don't have the backport patch; if you could help me with that, it would be nice. If this is an issue, I will ask Nam to help out. I have touch-tested the updated patch on mthca with 2K MTU. More tests are under way.

Thanks
Shirley

Signed-off-by: Shirley Ma [EMAIL PROTECTED]
---
 drivers/infiniband/ulp/ipoib/ipoib.h           |   28 +++-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        |  218 +---
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   19 ++-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    3 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |   16 ++-
 5 files changed, 212 insertions(+), 72 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 8eb6aa2..cb3aeab 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -56,11 +56,11 @@
 /* constants */
 enum {
-	IPOIB_PACKET_SIZE	= 2048,
-	IPOIB_BUF_SIZE		= IPOIB_PACKET_SIZE + IB_GRH_BYTES,
-
 	IPOIB_ENCAP_LEN		= 4,
+	IPOIB_UD_HEAD_SIZE	= IB_GRH_BYTES + IPOIB_ENCAP_LEN,
+	IPOIB_UD_RX_SG		= 2, /* for 4K MTU */
+
 	IPOIB_CM_MTU		= 0x10000 - 0x10, /* padding to align header to 16 */
 	IPOIB_CM_BUF_SIZE	= IPOIB_CM_MTU + IPOIB_ENCAP_LEN,
 	IPOIB_CM_HEAD_SIZE	= IPOIB_CM_BUF_SIZE % PAGE_SIZE,
@@ -135,9 +135,9 @@ struct ipoib_mcast {
 	struct net_device *dev;
 };
-struct ipoib_rx_buf {
+struct ipoib_sg_rx_buf {
 	struct sk_buff *skb;
-	u64		mapping;
+	u64		mapping[IPOIB_UD_RX_SG];
 };
 struct ipoib_tx_buf {
@@ -286,7 +286,7 @@ struct ipoib_dev_priv {
 	unsigned int admin_mtu;
 	unsigned int mcast_mtu;
-	struct ipoib_rx_buf *rx_ring;
+	struct ipoib_sg_rx_buf *rx_ring;
 	spinlock_t tx_lock;
 	struct ipoib_tx_buf *tx_ring;
@@ -315,6 +315,9 @@ struct ipoib_dev_priv {
 	struct dentry *mcg_dentry;
 	struct dentry *path_dentry;
 #endif
+	int max_ib_mtu;
+	struct ib_sge rx_sge[IPOIB_UD_RX_SG];
+	struct ib_recv_wr rx_wr;
 };
 struct ipoib_ah {
@@ -355,6 +358,19 @@ struct ipoib_neigh {
 	struct list_head list;
 };
+#define IPOIB_UD_MTU(ib_mtu)		(ib_mtu - IPOIB_ENCAP_LEN)
+#define IPOIB_UD_BUF_SIZE(ib_mtu)	(ib_mtu + IB_GRH_BYTES)
+static inline int ipoib_ud_need_sg(int ib_mtu)
+{
+	return (IPOIB_UD_BUF_SIZE(ib_mtu) > PAGE_SIZE) ? 1 : 0;
+}
+static inline void ipoib_sg_dma_unmap_rx(struct ipoib_dev_priv *priv,
+					 u64 mapping[IPOIB_UD_RX_SG])
+{
+	ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_HEAD_SIZE, DMA_FROM_DEVICE);
+	ib_dma_unmap_single(priv->ca, mapping[1], PAGE_SIZE, DMA_FROM_DEVICE);
+}
+
 /*
  * We stash a pointer to our private neighbour information after our
  * hardware address in neigh->ha. The ALIGN() expression here makes
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 5063dd5..6c9eefe 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -87,32 +87,93 @@ void ipoib_free_ah(struct kref *kref)
 	spin_unlock_irqrestore(&priv->lock, flags);
 }
+/* Adjust length of skb with fragments to match received data */
+static void ipoib_ud_skb_put_frags(struct sk_buff *skb, unsigned int length,
+				   struct sk_buff *toskb)
+{
+	unsigned int size;
+	skb_frag_t *frag = &skb_shinfo(skb)->frags[0];
+
+	/* put header into skb */
+	size = min(length, (unsigned)IPOIB_UD_HEAD_SIZE);
+	skb->tail += size;
+	skb->len += size;
+	length -= size;
+
+	if (length == 0) {
+		/* don't need this page */
+		skb_fill_page_desc(toskb, 0, frag->page, 0, PAGE_SIZE);
+		--skb_shinfo(skb)->nr_frags;
+	} else {
+		size = min(length, (unsigned) PAGE_SIZE);
+		frag->size = size;
+		skb->data_len += size;
+		skb->truesize += size;
+		skb->len += size;
+		length -= size;
+	}
+}
+
+static struct sk_buff *ipoib_sg_alloc_rx_skb(struct net_device *dev,
+					     int id, u64 mapping[IPOIB_UD_RX_SG])
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct page *page;
+	struct sk_buff *skb;
+
+	skb = dev_alloc_skb(IPOIB_UD_HEAD_SIZE);
+
+	if (unlikely(!skb))
+		return NULL;
+
+	mapping[0] = ib_dma_map_single(priv->ca, skb->data, IPOIB_UD_HEAD_SIZE,
+				       DMA_FROM_DEVICE);
+	if (unlikely
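The sizing macros in the patch above decide when the UD receive path needs two scatter/gather entries. The following is a minimal userspace sketch of that arithmetic, not the driver code itself. It assumes 4 KiB pages (MODEL_PAGE_SIZE is a stand-in for the kernel's PAGE_SIZE), and the lower-case function names are illustrative replacements for the patch's macros.

```c
#include <assert.h>

/* IPOIB_ENCAP_LEN comes from the patch; IB_GRH_BYTES is 40 in the kernel headers. */
#define IB_GRH_BYTES     40   /* size of the IB Global Route Header */
#define IPOIB_ENCAP_LEN  4    /* IPoIB encapsulation header */

/* Assumption: 4 KiB pages; the driver uses the real PAGE_SIZE. */
#define MODEL_PAGE_SIZE  4096

/* IP MTU seen by the network stack for a given IB MTU. */
int ipoib_ud_mtu(int ib_mtu)
{
	assert(ib_mtu > 0);
	return ib_mtu - IPOIB_ENCAP_LEN;
}

/* Receive buffer size: the IB payload plus the GRH the HCA prepends. */
int ipoib_ud_buf_size(int ib_mtu)
{
	assert(ib_mtu > 0);
	return ib_mtu + IB_GRH_BYTES;
}

/* Returns 1 when the buffer no longer fits in one page, 0 otherwise. */
int ipoib_ud_need_sg(int ib_mtu)
{
	return ipoib_ud_buf_size(ib_mtu) > MODEL_PAGE_SIZE ? 1 : 0;
}
```

Under these assumptions a 2K IB MTU needs a 2088-byte buffer, which fits in one page, while a 4K IB MTU needs 4136 bytes and therefore takes the scatter/gather path, with an IP MTU of 4092.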
[ewg] Re: [PATCH] IPoIB-UD 4K MTU patch for RC3 against 2.6.24
Hello all, I created the patch and tested it without Eli's patch but with Pradeep's patch. It works OK. Then I created another patch on top of Eli's and Pradeep's patches against today's ofed-1.3 git tree. Ping worked for a while and then stopped. I will try to debug it. We have also found a crash in today's ofed git tree in IPoIB-CM mode. Pradeep has narrowed it down to Eli's patch. Please address it promptly so we can continue our testing. Thanks Shirley
[ewg] [PATCH] IPoIB-UD 4K MTU patch against 2.6.24 ofed-1.3-git tree
Tziporet,

This IPoIB 4K MTU patch is built against today's 2.6.24 OFED-1.3 git tree. The patch was tested successfully before Eli's patch went in; this rebuilt version sits on top of Eli's patch. However, the constant UD_POST_RCV_COUNT, which Eli's patch defines as 16, affects the behavior of this patch. When I define it as 1, everything works OK; if I change the value to 8 or more, the patch does not work well. We have seen a couple of issues since Eli's patch was checked in. So I suggest checking in this patch, and then we can work together to address these issues tomorrow.

In Eli's patch I would suggest using kzalloc() to allocate the 16 ib_sge and ib_recv_wr entries instead of defining them in ipoib_dev_priv, since that might cause memory issues. I am working on the patch now to see whether I get better results.

Vlad,

There will be a one-line change for the backport regarding priv->stats vs. dev->stats. If you have any problem creating the backport patch, let me know and I will ask Nam to help. The attachment is there so you can apply the patch easily; my email might have issues.

Thanks
Shirley

Signed-off-by: Shirley Ma [EMAIL PROTECTED]
---
diff -urpN ofed_1_3_a/drivers/infiniband/ulp/ipoib/ipoib.h ofed_1_3_b/drivers/infiniband/ulp/ipoib/ipoib.h
--- ofed_1_3_a/drivers/infiniband/ulp/ipoib/ipoib.h	2008-02-04 15:45:44.0 -0800
+++ ofed_1_3_b/drivers/infiniband/ulp/ipoib/ipoib.h	2008-02-04 15:40:38.0 -0800
@@ -56,11 +56,11 @@
 /* constants */
 enum {
-	IPOIB_PACKET_SIZE	= 2048,
-	IPOIB_BUF_SIZE		= IPOIB_PACKET_SIZE + IB_GRH_BYTES,
-
 	IPOIB_ENCAP_LEN		= 4,
+	IPOIB_UD_HEAD_SIZE	= IB_GRH_BYTES + IPOIB_ENCAP_LEN,
+	IPOIB_UD_RX_SG		= 2, /* for 4K MTU */
+
 	IPOIB_CM_MTU		= 0x10000 - 0x10, /* padding to align header to 16 */
 	IPOIB_CM_BUF_SIZE	= IPOIB_CM_MTU + IPOIB_ENCAP_LEN,
 	IPOIB_CM_HEAD_SIZE	= IPOIB_CM_BUF_SIZE % PAGE_SIZE,
@@ -141,9 +141,9 @@ struct ipoib_mcast {
 	struct net_device *dev;
 };
-struct ipoib_rx_buf {
+struct ipoib_sg_rx_buf {
 	struct sk_buff *skb;
-	u64		mapping;
+	u64		mapping[IPOIB_UD_RX_SG];
 };
 struct ipoib_tx_buf {
@@ -337,7 +337,7 @@ struct ipoib_dev_priv {
 	struct net_device *dev;
 	struct ib_recv_wr rx_wr_draft[UD_POST_RCV_COUNT];
-	struct ib_sge sglist_draft[UD_POST_RCV_COUNT];
+	struct ib_sge sglist_draft[UD_POST_RCV_COUNT][IPOIB_UD_RX_SG];
 	unsigned int rx_outst;
 	struct napi_struct napi;
@@ -378,7 +378,7 @@ struct ipoib_dev_priv {
 	unsigned int admin_mtu;
 	unsigned int mcast_mtu;
-	struct ipoib_rx_buf *rx_ring;
+	struct ipoib_sg_rx_buf *rx_ring;
 	spinlock_t tx_lock;
 	struct ipoib_tx_buf *tx_ring;
@@ -412,6 +412,7 @@ struct ipoib_dev_priv {
 	struct ipoib_ethtool_st etool;
 	struct timer_list poll_timer;
 	struct ib_ah *own_ah;
+	int max_ib_mtu;
 };
 struct ipoib_ah {
@@ -452,6 +453,19 @@ struct ipoib_neigh {
 	struct list_head list;
 };
+#define IPOIB_UD_MTU(ib_mtu)		(ib_mtu - IPOIB_ENCAP_LEN)
+#define IPOIB_UD_BUF_SIZE(ib_mtu)	(ib_mtu + IB_GRH_BYTES)
+static inline int ipoib_ud_need_sg(int ib_mtu)
+{
+	return (IPOIB_UD_BUF_SIZE(ib_mtu) > PAGE_SIZE) ? 1 : 0;
+}
+static inline void ipoib_sg_dma_unmap_rx(struct ipoib_dev_priv *priv,
+					 u64 mapping[IPOIB_UD_RX_SG])
+{
+	ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_HEAD_SIZE, DMA_FROM_DEVICE);
+	ib_dma_unmap_single(priv->ca, mapping[1], PAGE_SIZE, DMA_FROM_DEVICE);
+}
+
 /*
  * We stash a pointer to our private neighbour information after our
  * hardware address in neigh->ha. The ALIGN() expression here makes
diff -urpN ofed_1_3_a/drivers/infiniband/ulp/ipoib/ipoib_ib.c ofed_1_3_b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
--- ofed_1_3_a/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2008-02-04 15:45:44.0 -0800
+++ ofed_1_3_b/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2008-02-04 15:40:38.0 -0800
@@ -96,14 +96,82 @@ static void clean_pending_receives(struc
 	for (i = 0; i < priv->rx_outst; ++i) {
 		id = priv->rx_wr_draft[i].wr_id & ~IPOIB_OP_RECV;
-		ib_dma_unmap_single(priv->ca, priv->rx_ring[id].mapping,
-				    IPOIB_BUF_SIZE, DMA_FROM_DEVICE);
+		if (ipoib_ud_need_sg(priv->max_ib_mtu))
+			ipoib_sg_dma_unmap_rx(priv,
+					      priv->rx_ring[i].mapping);
+		else
+			ib_dma_unmap_single(priv->ca, priv->rx_ring[id].mapping[0],
+					    IPOIB_UD_BUF_SIZE(priv->max_ib_mtu), DMA_FROM_DEVICE);
 		dev_kfree_skb_any(priv->rx_ring[id].skb);
 		priv->rx_ring[id].skb = NULL;
 	}
 	priv->rx_outst = 0;
 }
+static void ipoib_ud_skb_put_frags(struct
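The kzalloc() suggestion in the message above, allocating the receive work requests and scatter/gather entries separately rather than embedding the arrays in ipoib_dev_priv, could look roughly like the userspace sketch below. It is only an illustration: calloc() stands in for kzalloc(), and struct sge_model, struct recv_wr_model, struct rx_draft and the rx_draft_* helpers are hypothetical names, not part of Eli's patch or of the verbs API.

```c
#include <assert.h>
#include <stdlib.h>

enum { UD_POST_RCV_COUNT = 16, IPOIB_UD_RX_SG = 2 };

/* Hypothetical stand-ins for struct ib_sge and struct ib_recv_wr. */
struct sge_model {
	unsigned long long addr;
	unsigned int length;
	unsigned int lkey;
};

struct recv_wr_model {
	unsigned long long wr_id;
	struct sge_model *sg_list;
	int num_sge;
};

/* Only two pointers live in the private struct; the arrays are on the heap. */
struct rx_draft {
	struct recv_wr_model *wr;
	struct sge_model *sge;
};

/* Allocate zeroed arrays and point each work request at its own SGE pair. */
int rx_draft_alloc(struct rx_draft *d)
{
	int i;

	assert(d != NULL);
	d->wr = calloc(UD_POST_RCV_COUNT, sizeof(*d->wr));
	d->sge = calloc(UD_POST_RCV_COUNT * IPOIB_UD_RX_SG, sizeof(*d->sge));
	if (!d->wr || !d->sge) {
		free(d->wr);
		free(d->sge);
		d->wr = NULL;
		d->sge = NULL;
		return -1;
	}
	for (i = 0; i < UD_POST_RCV_COUNT; ++i) {
		d->wr[i].sg_list = &d->sge[i * IPOIB_UD_RX_SG];
		d->wr[i].num_sge = IPOIB_UD_RX_SG;
	}
	return 0;
}

/* Release both arrays and clear the pointers so a double free is harmless. */
void rx_draft_free(struct rx_draft *d)
{
	assert(d != NULL);
	free(d->wr);
	free(d->sge);
	d->wr = NULL;
	d->sge = NULL;
}
```

The point of the layout is that the private structure keeps a fixed, small size regardless of UD_POST_RCV_COUNT or the number of scatter/gather entries per receive.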
Re: [ewg] Oops with today's OFED 1.3
Eli, please look at this issue ASAP. Without your patch everything works well. Thanks Shirley
Re: [ewg] Re: [ofa-general] Please send all patches for OFED 1.3 rc4 by end of Monday (Feb 4)
Tziporet Koren [EMAIL PROTECTED] wrote on 02/04/2008 08:14:08 AM:

> OK - go ahead and regenerate the patch and we will be able to include it in RC4
> BTW - how did you test it with mthca? It does not support 4K MTU.
> You can test it with ConnectX since it does support 4K MTU (with a special burning configuration).
> Please let me know if you have ConnectX and you wish to test it with 4K MTU
> Tziporet

Thanks Tziporet. I would like to test on ConnectX, but I can't do it right now because the switch connected to the ConnectX is configured for 2K MTU and the test team has other test tasks to finish. I can, however, suggest that the test team include a 4K MTU test as part of their system validation. Please send me the instructions on how to enable it for ConnectX.

Thanks
Shirley
[ewg] [UPDATE][PATCH] IPoIB-UD 4K MTU patch against 2.6.24 ofed-1.3-git tree
I found that one line had ended up outside the for loop when I merged this patch with the current git tree. This caused the failure with UD_POST_RCV_COUNT = 16. I have fixed it; this is the updated patch.

Thanks
Shirley

Signed-off-by: Shirley Ma [EMAIL PROTECTED]
---
diff -urpN ofed_kernel_a/drivers/infiniband/ulp/ipoib/ipoib.h ofed_kernel_b/drivers/infiniband/ulp/ipoib/ipoib.h
--- ofed_kernel_a/drivers/infiniband/ulp/ipoib/ipoib.h	2008-02-04 20:09:18.0 -0800
+++ ofed_kernel_b/drivers/infiniband/ulp/ipoib/ipoib.h	2008-02-04 20:11:26.0 -0800
@@ -56,11 +56,11 @@
 /* constants */
 enum {
-	IPOIB_PACKET_SIZE	= 2048,
-	IPOIB_BUF_SIZE		= IPOIB_PACKET_SIZE + IB_GRH_BYTES,
-
 	IPOIB_ENCAP_LEN		= 4,
+	IPOIB_UD_HEAD_SIZE	= IB_GRH_BYTES + IPOIB_ENCAP_LEN,
+	IPOIB_UD_RX_SG		= 2, /* for 4K MTU */
+
 	IPOIB_CM_MTU		= 0x10000 - 0x10, /* padding to align header to 16 */
 	IPOIB_CM_BUF_SIZE	= IPOIB_CM_MTU + IPOIB_ENCAP_LEN,
 	IPOIB_CM_HEAD_SIZE	= IPOIB_CM_BUF_SIZE % PAGE_SIZE,
@@ -141,9 +141,9 @@ struct ipoib_mcast {
 	struct net_device *dev;
 };
-struct ipoib_rx_buf {
+struct ipoib_sg_rx_buf {
 	struct sk_buff *skb;
-	u64		mapping;
+	u64		mapping[IPOIB_UD_RX_SG];
 };
 struct ipoib_tx_buf {
@@ -337,7 +337,7 @@ struct ipoib_dev_priv {
 	struct net_device *dev;
 	struct ib_recv_wr rx_wr_draft[UD_POST_RCV_COUNT];
-	struct ib_sge sglist_draft[UD_POST_RCV_COUNT];
+	struct ib_sge sglist_draft[UD_POST_RCV_COUNT][IPOIB_UD_RX_SG];
 	unsigned int rx_outst;
 	struct napi_struct napi;
@@ -378,7 +378,7 @@ struct ipoib_dev_priv {
 	unsigned int admin_mtu;
 	unsigned int mcast_mtu;
-	struct ipoib_rx_buf *rx_ring;
+	struct ipoib_sg_rx_buf *rx_ring;
 	spinlock_t tx_lock;
 	struct ipoib_tx_buf *tx_ring;
@@ -412,6 +412,7 @@ struct ipoib_dev_priv {
 	struct ipoib_ethtool_st etool;
 	struct timer_list poll_timer;
 	struct ib_ah *own_ah;
+	int max_ib_mtu;
 };
 struct ipoib_ah {
@@ -452,6 +453,19 @@ struct ipoib_neigh {
 	struct list_head list;
 };
+#define IPOIB_UD_MTU(ib_mtu)		(ib_mtu - IPOIB_ENCAP_LEN)
+#define IPOIB_UD_BUF_SIZE(ib_mtu)	(ib_mtu + IB_GRH_BYTES)
+static inline int ipoib_ud_need_sg(int ib_mtu)
+{
+	return (IPOIB_UD_BUF_SIZE(ib_mtu) > PAGE_SIZE) ? 1 : 0;
+}
+static inline void ipoib_sg_dma_unmap_rx(struct ipoib_dev_priv *priv,
+					 u64 mapping[IPOIB_UD_RX_SG])
+{
+	ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_HEAD_SIZE, DMA_FROM_DEVICE);
+	ib_dma_unmap_single(priv->ca, mapping[1], PAGE_SIZE, DMA_FROM_DEVICE);
+}
+
 /*
  * We stash a pointer to our private neighbour information after our
  * hardware address in neigh->ha. The ALIGN() expression here makes
diff -urpN ofed_kernel_a/drivers/infiniband/ulp/ipoib/ipoib_ib.c ofed_kernel_b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
--- ofed_kernel_a/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2008-02-04 20:09:18.0 -0800
+++ ofed_kernel_b/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2008-02-04 20:11:26.0 -0800
@@ -96,14 +96,82 @@ static void clean_pending_receives(struc
 	for (i = 0; i < priv->rx_outst; ++i) {
 		id = priv->rx_wr_draft[i].wr_id & ~IPOIB_OP_RECV;
-		ib_dma_unmap_single(priv->ca, priv->rx_ring[id].mapping,
-				    IPOIB_BUF_SIZE, DMA_FROM_DEVICE);
+		if (ipoib_ud_need_sg(priv->max_ib_mtu))
+			ipoib_sg_dma_unmap_rx(priv,
+					      priv->rx_ring[i].mapping);
+		else
+			ib_dma_unmap_single(priv->ca, priv->rx_ring[id].mapping[0],
+					    IPOIB_UD_BUF_SIZE(priv->max_ib_mtu), DMA_FROM_DEVICE);
 		dev_kfree_skb_any(priv->rx_ring[id].skb);
 		priv->rx_ring[id].skb = NULL;
 	}
 	priv->rx_outst = 0;
 }
+static void ipoib_ud_skb_put_frags(struct sk_buff *skb, unsigned int length,
+				   struct sk_buff *toskb)
+{
+	unsigned int size;
+	skb_frag_t *frag = &skb_shinfo(skb)->frags[0];
+
+	/* put header into skb */
+	size = min(length, (unsigned)IPOIB_UD_HEAD_SIZE);
+	skb->tail += size;
+	skb->len += size;
+	length -= size;
+
+	if (length == 0) {
+		/* don't need this page */
+		skb_fill_page_desc(toskb, 0, frag->page, 0, PAGE_SIZE);
+		--skb_shinfo(skb)->nr_frags;
+	} else {
+		size = min(length, (unsigned) PAGE_SIZE);
+		frag->size = size;
+		skb->data_len += size;
+		skb->truesize += size;
+		skb->len += size;
+		length -= size;
+	}
+}
+
+static struct sk_buff *ipoib_sg_alloc_rx_skb
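The length bookkeeping in ipoib_ud_skb_put_frags() above can be modelled without any skb machinery. The sketch below is a userspace approximation only: it assumes 4 KiB pages (MODEL_PAGE_SIZE), assumes that exactly one page fragment is attached to the skb before the call, and uses the hypothetical names struct rx_split and ud_split_length(). It reports how many received bytes land in the linear header area and how many in the page fragment.

```c
#include <assert.h>

#define IB_GRH_BYTES        40u
#define IPOIB_ENCAP_LEN     4u
#define IPOIB_UD_HEAD_SIZE  (IB_GRH_BYTES + IPOIB_ENCAP_LEN)
/* Assumption: 4 KiB pages; the driver uses the real PAGE_SIZE. */
#define MODEL_PAGE_SIZE     4096u

/* Hypothetical summary of where the received bytes end up. */
struct rx_split {
	unsigned int head_len;  /* bytes in the linear (header) buffer */
	unsigned int frag_len;  /* bytes in the page fragment */
	int nr_frags;           /* fragments still attached to the skb */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

struct rx_split ud_split_length(unsigned int length)
{
	/* Assumption: one page fragment is attached before the split. */
	struct rx_split s = { 0, 0, 1 };

	assert(length <= IPOIB_UD_HEAD_SIZE + MODEL_PAGE_SIZE);

	/* The first IPOIB_UD_HEAD_SIZE bytes go into the header buffer. */
	s.head_len = min_uint(length, IPOIB_UD_HEAD_SIZE);
	length -= s.head_len;

	if (length == 0)
		s.nr_frags--;   /* the page is not needed for this packet */
	else
		s.frag_len = min_uint(length, MODEL_PAGE_SIZE);

	return s;
}
```

Under these assumptions a full 4K-MTU receive of 4136 bytes (40-byte GRH plus 4096 bytes of payload) is split into 44 bytes in the header buffer and 4092 bytes in the page, while a packet of 44 bytes or fewer releases the page for reuse, which is what the skb_fill_page_desc(toskb, ...) branch in the patch does.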
RE: [ewg] Re: [ofa-general] OFED Jan 28 meeting summary on RC3 readiness
On Wed, 2008-01-30 at 17:10 -0800, Woodruff, Robert J wrote:
> Tziporet wrote,
>> * Delay 1.3 release by a week
>> * Do RC4 next week - Feb 6
>> * Add RC5 on Feb 18 - this will be the GOLD version
>> * GA release on Feb 25
>> All - please reply if this is acceptable
>
> I hate to keep slipping this, but I think it is important to get what RedHat needs into OFED 1.3, so I am not opposed to this. I think, however, that after 1.3 we should discuss our process a bit to try to get a little better at making our original release dates. I think we are getting hit with feature creep, allowing some pretty major changes after the feature freeze date, late in the release cycle. I also think that we need to be a little more careful and selective about what features go into OFED, as it is supposed to be an enterprise release rather than an experimental code release. For the kernel code, I think this means keeping things a little closer to the kernel.org kernel features, and if something is not upstream, pressing to get it upstream (or at least queued for upstream) rather than allowing big patches into OFED that have not had a good review. The way we are working now, if something is getting into OFED, people are less aggressive about getting it upstream. Perhaps we can have a discussion about this at the Sonoma workshop.

In addition, we should talk about how to integrate patches that are queued upstream but not in OFED, like IPoIB noSRQ. There is always a window between an OFED release and a kernel release, and a window between a distro release and an OFED release. Some customers target the OFED release, some customers target the OFED release. How to handle these windows to meet different customers' requirements could also be something to discuss at the Sonoma workshop.

Thanks
Shirley
RE: [ewg] Re: [ofa-general] OFED Jan 28 meeting summary on RC3 readiness
> In addition, we should talk about how to integrate patches that are queued upstream but not in OFED, like IPoIB noSRQ. There is always a window between an OFED release and a kernel release, and a window between a distro release and an OFED release. Some customers target the OFED release, some customers target the OFED release. How to handle these windows to meet different customers' requirements could also be something to discuss at the Sonoma workshop.

Oops, a typo: I meant that some customers target distro releases. From a customer support point of view, it's always better to have OFED releases in the distros.

Thanks
Shirley
Re: [ewg] Re: [ofa-general] Bonding and hw_csum
Hello Eli,

> ipoib_0030_hw_csum.patch has been removed

Would removing this patch cause any errors when applying the rest of the patches? If not, I will remove it for our testing as well.

Thanks
Shirley
Re: [ewg] Re: non SRQ patch for OFED 1.3
> Pradeep,
> We tried to apply this patch for OFED 1.3 and it breaks some of the backports. Please use the makedist script on the ofa server (there is an explanation in the developers' Wiki) and fix this so we can try to apply it.
> Vlad will help you later today too
> Thanks, Tziporet

Thanks Tziporet/Vlad for helping get this into OFED-1.3. A long time ago Sean suggested comparing noSRQ and SRQ performance in a smaller cluster environment. That's an interesting suggestion. We are planning to compare them in OFED-1.3.

Thanks
Shirley
Re: [ewg] Re: [ofa-general] OFED Jan 28 meeting summary on RC3 readiness
Thanks to everyone here. I appreciate your comments and effort. The big challenge for us is how to sync features/blockers between OFED releases and distro releases. Most of our customers prefer distro releases so they can get the same level of support as for other pieces. If OFED could work with the distro releases, there would be fewer problems for both end users and distros. That's just my personal opinion. We are here to support, in a timely manner, any issues found with these patches during the OFED release cycle. Thanks again! Shirley
Re: [ewg] Re: [ofa-general] OFED Jan 28 meeting summary on RC3 readiness
[EMAIL PROTECTED] wrote on 01/30/2008 08:40:10 AM:

> Doug Ledford wrote:
>> Hmmm... I'd like to put my $.02 in here. I don't have any visibility into what drives the OFED schedule, so I have no clue as to why people don't want to slip the schedule for this change. I'm sure you guys have your reasons. However, I also happen to be a consumer of this code, and I know for a fact that no one has gotten my input on this issue. So, the deal is that I'm currently integrating OFED 1.3 into what will be RHEL5.2. The RHEL5.2 freeze date has already passed, but in order to keep what finally goes out from being too stale, I'm being allowed to submit the OFED-1.3-rc1 code prior to freeze, and then update to OFED-1.3 final during our beta test process. What this means is that anything you punt from 1.3 to 1.3.1, you are also punting out of RHEL5.2 and RHEL4.7. So, that being said, there's a whole trickle-down effect with various groups that would really like to be able to use 5.2 out of the box and that may prefer a slip in 1.3 so that this can be part of it instead of punting to 1.3.1. I'm not saying this will change your mind, but I'm sure it wasn't part of the decision process before, so I'm bringing it up.
>
> Thanks for the input (BTW you are welcome to join our weekly meetings and give us feedback online)
> I think it is important to make sure new RH versions will include the best OFED release
> So my suggestion is:
> * Delay 1.3 release by a week
> * Do RC4 next week - Feb 6
> * Add RC5 on Feb 18 - this will be the GOLD version
> * GA release on Feb 25
> All - please reply if this is acceptable
>
>>> 760 major [EMAIL PROTECTED] UDP performance on Rx is lower than Tx - for 1.3.1
>>> 761 major [EMAIL PROTECTED] Poor and jittery UDP performance at small messages - for 1.3.1
>> Ditto for requesting these two be in 1.3. We've already had customers bring up the UDP performance issue in our previous releases.
>
> We will push some fixes of these to RC4 if the above plan is accepted
> Tziporet

Is it also possible to include some delayed features that are planned for a later release? For IPoIB noSRQ, 4K MTU, etc., we already have some customer requests. IPoIB noSRQ is already upstream, but it is not in 2.6.24; it will be in 2.6.25. The 4K MTU patch is under review and has passed our tests. I will post a new version against RC3, and split the patch into several pieces for the 2.6.25 upstream submission.

Thanks
Shirley
[ewg] Re: [ofa-general] Bonding and hw_csum
Hello Tziporet,

> the hw checksum patch was removed from OFED 1.3
> Tziporet

Could you please specify which patch has been removed? I can still see a list of patches under RC3. Here they are:

ipoib_0010_Add-high-dma-support-to-ipoib.patch
ipoib_0020_Add-s-g-support-for-IPOIB.patch
ipoib_0030_hw_csum.patch
ipoib_0040_checksum-offload.patch
ipoib_0050_Add-LSO-support.patch
ipoib_0060_ethtool-support.patch
ipoib_0070_modiy_cq_params.patch
ipoib_0080_broadcast_null.patch
ipoib_0110_set_default_cq_patams.patch

Thanks
Shirley