Re: [PATCH 3.12 34/72] Fix regression in NFSRDMA server
Seems like part of a patch was dropped or there was a merge problem. The code in my tree has all these changes. Very strange. Tom On 11/23/15 7:08 AM, Jiri Slaby wrote: From: Tom Tucker 3.12-stable review patch. If anyone has any objections, please let me know. === commit 7e4359e2611f95a97037e2b6905eab52f28afbeb upstream. The server regression was caused by the addition of rq_next_page (afc59400d6c65bad66d4ad0b2daf879cbff8e23e). There were a few places that were missed with the update of the rq_respages array. Signed-off-by: Tom Tucker Tested-by: Steve Wise Signed-off-by: J. Bruce Fields Signed-off-by: Jiri Slaby --- net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 12 net/sunrpc/xprtrdma/svc_rdma_sendto.c | 1 + 2 files changed, 5 insertions(+), 8 deletions(-) diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c index 0ce75524ed21..8d904e4eef15 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c @@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp, sge_no++; } rqstp->rq_respages = >rq_pages[sge_no]; + rqstp->rq_next_page = rqstp->rq_respages + 1; This was fixed in an older version. /* We should never run out of SGE because the limit is defined to * support the max allowed RPC data length @@ -169,6 +170,7 @@ static int map_read_chunks(struct svcxprt_rdma *xprt, */ head->arg.pages[page_no] = rqstp->rq_arg.pages[page_no]; rqstp->rq_respages = >rq_arg.pages[page_no+1]; + rqstp->rq_next_page = rqstp->rq_respages + 1; byte_count -= sge_bytes; ch_bytes -= sge_bytes; @@ -276,6 +278,7 @@ static int fast_reg_read_chunks(struct svcxprt_rdma *xprt, /* rq_respages points one past arg pages */ rqstp->rq_respages = >rq_arg.pages[page_no]; + rqstp->rq_next_page = rqstp->rq_respages + 1; /* Create the reply and chunk maps */ offset = 0; @@ -520,13 +523,6 @@ next_sge: for (ch_no = 0; >rq_pages[ch_no] < rqstp->rq_respages; ch_no++) rqstp->rq_pages[ch_no] = NULL; - /* -* Detach res pages. If svc_release sees any it will attempt to -* put them. -*/ - while (rqstp->rq_next_page != rqstp->rq_respages) - *(--rqstp->rq_next_page) = NULL; - return err; } @@ -550,7 +546,7 @@ static int rdma_read_complete(struct svc_rqst *rqstp, /* rq_respages starts after the last arg page */ rqstp->rq_respages = >rq_arg.pages[page_no]; - rqstp->rq_next_page = >rq_arg.pages[page_no]; + rqstp->rq_next_page = rqstp->rq_respages + 1; /* Rebuild rq_arg head and tail. */ rqstp->rq_arg.head[0] = head->arg.head[0]; diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index c1d124dc772b..11e90f8c0fc5 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -625,6 +625,7 @@ static int send_reply(struct svcxprt_rdma *rdma, if (page_no+1 >= sge_no) ctxt->sge[page_no+1].length = 0; } + rqstp->rq_next_page = rqstp->rq_respages + 1; BUG_ON(sge_no > rdma->sc_max_sge); memset(_wr, 0, sizeof send_wr); ctxt->wr_op = IB_WR_SEND; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.12 34/72] Fix regression in NFSRDMA server
Seems like part of a patch was dropped or there was a merge problem. The code in my tree has all these changes. Very strange. Tom On 11/23/15 7:08 AM, Jiri Slaby wrote: From: Tom Tucker <t...@ogc.us> 3.12-stable review patch. If anyone has any objections, please let me know. === commit 7e4359e2611f95a97037e2b6905eab52f28afbeb upstream. The server regression was caused by the addition of rq_next_page (afc59400d6c65bad66d4ad0b2daf879cbff8e23e). There were a few places that were missed with the update of the rq_respages array. Signed-off-by: Tom Tucker <t...@ogc.us> Tested-by: Steve Wise <sw...@ogc.us> Signed-off-by: J. Bruce Fields <bfie...@redhat.com> Signed-off-by: Jiri Slaby <jsl...@suse.cz> --- net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 12 net/sunrpc/xprtrdma/svc_rdma_sendto.c | 1 + 2 files changed, 5 insertions(+), 8 deletions(-) diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c index 0ce75524ed21..8d904e4eef15 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c @@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp, sge_no++; } rqstp->rq_respages = >rq_pages[sge_no]; + rqstp->rq_next_page = rqstp->rq_respages + 1; This was fixed in an older version. /* We should never run out of SGE because the limit is defined to * support the max allowed RPC data length @@ -169,6 +170,7 @@ static int map_read_chunks(struct svcxprt_rdma *xprt, */ head->arg.pages[page_no] = rqstp->rq_arg.pages[page_no]; rqstp->rq_respages = >rq_arg.pages[page_no+1]; + rqstp->rq_next_page = rqstp->rq_respages + 1; byte_count -= sge_bytes; ch_bytes -= sge_bytes; @@ -276,6 +278,7 @@ static int fast_reg_read_chunks(struct svcxprt_rdma *xprt, /* rq_respages points one past arg pages */ rqstp->rq_respages = >rq_arg.pages[page_no]; + rqstp->rq_next_page = rqstp->rq_respages + 1; /* Create the reply and chunk maps */ offset = 0; @@ -520,13 +523,6 @@ next_sge: for (ch_no = 0; >rq_pages[ch_no] < rqstp->rq_respages; ch_no++) rqstp->rq_pages[ch_no] = NULL; - /* -* Detach res pages. If svc_release sees any it will attempt to -* put them. -*/ - while (rqstp->rq_next_page != rqstp->rq_respages) - *(--rqstp->rq_next_page) = NULL; - return err; } @@ -550,7 +546,7 @@ static int rdma_read_complete(struct svc_rqst *rqstp, /* rq_respages starts after the last arg page */ rqstp->rq_respages = >rq_arg.pages[page_no]; - rqstp->rq_next_page = >rq_arg.pages[page_no]; + rqstp->rq_next_page = rqstp->rq_respages + 1; /* Rebuild rq_arg head and tail. */ rqstp->rq_arg.head[0] = head->arg.head[0]; diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index c1d124dc772b..11e90f8c0fc5 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -625,6 +625,7 @@ static int send_reply(struct svcxprt_rdma *rdma, if (page_no+1 >= sge_no) ctxt->sge[page_no+1].length = 0; } + rqstp->rq_next_page = rqstp->rq_respages + 1; BUG_ON(sge_no > rdma->sc_max_sge); memset(_wr, 0, sizeof send_wr); ctxt->wr_op = IB_WR_SEND; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 19/27] IB/Verbs: Use management helper cap_iw_cm()
On 4/21/15 2:39 AM, Michael Wang wrote: On 04/20/2015 05:51 PM, Tom Tucker wrote: [snip] int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid); iWARP devices _must_ support the IWCM so cap_iw_cm() is not really useful. Sean suggested to add this helper paired with cap_ib_cm(), may be there are some consideration on maintainability? Me too also prefer this way to make the code more readable ;-) It's more consistent, but not necessarily more readable -- if by readability we mean understanding. If the reader knows how the transports work, then the reader would be confused by the addition of a check that is always true. For the reader that doesn't know, the addition of the check implies that the support is optional, which it is not. The purpose is to make sure folks understand what we really want to check when they reviewing the code :-) and prepared for the further reform which may not rely on technology type any more, for example the device could tell core layer directly what management it required with a bitmask :-) Hi Michael, Thanks for the reply, but my premise was just wrong...I need to review the whole patch, not just a snippet. Thanks, Tom Regards, Michael Wang Tom Regards, Michael Wang -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 19/27] IB/Verbs: Use management helper cap_iw_cm()
On 4/21/15 2:39 AM, Michael Wang wrote: On 04/20/2015 05:51 PM, Tom Tucker wrote: [snip] int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid); iWARP devices _must_ support the IWCM so cap_iw_cm() is not really useful. Sean suggested to add this helper paired with cap_ib_cm(), may be there are some consideration on maintainability? Me too also prefer this way to make the code more readable ;-) It's more consistent, but not necessarily more readable -- if by readability we mean understanding. If the reader knows how the transports work, then the reader would be confused by the addition of a check that is always true. For the reader that doesn't know, the addition of the check implies that the support is optional, which it is not. The purpose is to make sure folks understand what we really want to check when they reviewing the code :-) and prepared for the further reform which may not rely on technology type any more, for example the device could tell core layer directly what management it required with a bitmask :-) Hi Michael, Thanks for the reply, but my premise was just wrong...I need to review the whole patch, not just a snippet. Thanks, Tom Regards, Michael Wang Tom Regards, Michael Wang -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 19/27] IB/Verbs: Use management helper cap_iw_cm()
On 4/20/15 11:19 AM, Jason Gunthorpe wrote: On Mon, Apr 20, 2015 at 10:51:58AM -0500, Tom Tucker wrote: On 4/20/15 10:16 AM, Michael Wang wrote: On 04/20/2015 04:00 PM, Steve Wise wrote: On 4/20/2015 3:40 AM, Michael Wang wrote: [snip] diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 6805e3e..e4999f6 100644 +++ b/include/rdma/ib_verbs.h @@ -1818,6 +1818,21 @@ static inline int cap_ib_cm(struct ib_device *device, u8 port_num) return rdma_ib_or_iboe(device, port_num); } +/** + * cap_iw_cm - Check if the port of device has the capability IWARP + * Communication Manager. + * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return 0 when port of the device don't support IWARP + * Communication Manager. + */ +static inline int cap_iw_cm(struct ib_device *device, u8 port_num) +{ +return rdma_tech_iwarp(device, port_num); +} + int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid); iWARP devices _must_ support the IWCM so cap_iw_cm() is not really useful. Sean suggested to add this helper paired with cap_ib_cm(), may be there are some consideration on maintainability? Me too also prefer this way to make the code more readable ;-) It's more consistent, but not necessarily more readable -- if by readability we mean understanding. If the reader knows how the transports work, then the reader would be confused by the addition of a check that is always true. For the reader that doesn't know, the addition of the check implies that the support is optional, which it is not. No, it says this code is concerned with the unique parts of iWarp related to CM, not the other unique parts of iWarp. The check isn't aways true, it is just always true on iWarp devices. That became the problem with the old way of just saying 'is iWarp' (and others). There are too many differences, the why became lost in many places. There are now too many standards, and several do not have public docs, to keep relying on a mess of 'is standard' tests. You're right Jason, this gets called with the device handle so it's only true for iwarp. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 19/27] IB/Verbs: Use management helper cap_iw_cm()
On 4/20/15 10:16 AM, Michael Wang wrote: On 04/20/2015 04:00 PM, Steve Wise wrote: On 4/20/2015 3:40 AM, Michael Wang wrote: [snip] diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 6805e3e..e4999f6 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1818,6 +1818,21 @@ static inline int cap_ib_cm(struct ib_device *device, u8 port_num) return rdma_ib_or_iboe(device, port_num); } +/** + * cap_iw_cm - Check if the port of device has the capability IWARP + * Communication Manager. + * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return 0 when port of the device don't support IWARP + * Communication Manager. + */ +static inline int cap_iw_cm(struct ib_device *device, u8 port_num) +{ +return rdma_tech_iwarp(device, port_num); +} + int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid); iWARP devices _must_ support the IWCM so cap_iw_cm() is not really useful. Sean suggested to add this helper paired with cap_ib_cm(), may be there are some consideration on maintainability? Me too also prefer this way to make the code more readable ;-) It's more consistent, but not necessarily more readable -- if by readability we mean understanding. If the reader knows how the transports work, then the reader would be confused by the addition of a check that is always true. For the reader that doesn't know, the addition of the check implies that the support is optional, which it is not. Tom Regards, Michael Wang -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 19/27] IB/Verbs: Use management helper cap_iw_cm()
On 4/20/15 10:16 AM, Michael Wang wrote: On 04/20/2015 04:00 PM, Steve Wise wrote: On 4/20/2015 3:40 AM, Michael Wang wrote: [snip] diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 6805e3e..e4999f6 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1818,6 +1818,21 @@ static inline int cap_ib_cm(struct ib_device *device, u8 port_num) return rdma_ib_or_iboe(device, port_num); } +/** + * cap_iw_cm - Check if the port of device has the capability IWARP + * Communication Manager. + * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return 0 when port of the device don't support IWARP + * Communication Manager. + */ +static inline int cap_iw_cm(struct ib_device *device, u8 port_num) +{ +return rdma_tech_iwarp(device, port_num); +} + int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid); iWARP devices _must_ support the IWCM so cap_iw_cm() is not really useful. Sean suggested to add this helper paired with cap_ib_cm(), may be there are some consideration on maintainability? Me too also prefer this way to make the code more readable ;-) It's more consistent, but not necessarily more readable -- if by readability we mean understanding. If the reader knows how the transports work, then the reader would be confused by the addition of a check that is always true. For the reader that doesn't know, the addition of the check implies that the support is optional, which it is not. Tom Regards, Michael Wang -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 19/27] IB/Verbs: Use management helper cap_iw_cm()
On 4/20/15 11:19 AM, Jason Gunthorpe wrote: On Mon, Apr 20, 2015 at 10:51:58AM -0500, Tom Tucker wrote: On 4/20/15 10:16 AM, Michael Wang wrote: On 04/20/2015 04:00 PM, Steve Wise wrote: On 4/20/2015 3:40 AM, Michael Wang wrote: [snip] diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 6805e3e..e4999f6 100644 +++ b/include/rdma/ib_verbs.h @@ -1818,6 +1818,21 @@ static inline int cap_ib_cm(struct ib_device *device, u8 port_num) return rdma_ib_or_iboe(device, port_num); } +/** + * cap_iw_cm - Check if the port of device has the capability IWARP + * Communication Manager. + * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return 0 when port of the device don't support IWARP + * Communication Manager. + */ +static inline int cap_iw_cm(struct ib_device *device, u8 port_num) +{ +return rdma_tech_iwarp(device, port_num); +} + int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid); iWARP devices _must_ support the IWCM so cap_iw_cm() is not really useful. Sean suggested to add this helper paired with cap_ib_cm(), may be there are some consideration on maintainability? Me too also prefer this way to make the code more readable ;-) It's more consistent, but not necessarily more readable -- if by readability we mean understanding. If the reader knows how the transports work, then the reader would be confused by the addition of a check that is always true. For the reader that doesn't know, the addition of the check implies that the support is optional, which it is not. No, it says this code is concerned with the unique parts of iWarp related to CM, not the other unique parts of iWarp. The check isn't aways true, it is just always true on iWarp devices. That became the problem with the old way of just saying 'is iWarp' (and others). There are too many differences, the why became lost in many places. There are now too many standards, and several do not have public docs, to keep relying on a mess of 'is standard' tests. You're right Jason, this gets called with the device handle so it's only true for iwarp. Jason -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 27/27] IB/Verbs: Cleanup rdma_node_get_transport()
On 4/16/15 8:45 AM, Michael Wang wrote: On 04/16/2015 03:42 PM, Hal Rosenstock wrote: On 4/16/2015 9:41 AM, Michael Wang wrote: On 04/16/2015 03:36 PM, Hal Rosenstock wrote: [snip] -EXPORT_SYMBOL(rdma_node_get_transport); - enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_num) { if (device->get_link_layer) diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 262bf44..f9ef479 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -84,9 +84,6 @@ enum rdma_transport_type { RDMA_TRANSPORT_IBOE, }; -__attribute_const__ enum rdma_transport_type -rdma_node_get_transport(enum rdma_node_type node_type); - enum rdma_link_layer { IB_LINK_LAYER_UNSPECIFIED, Is IB_LINK_LAYER_UNSPECIFIED still possible ? Actually it's impossible in kernel at first, all those who implemented the callback won't return UNSPECIFIED, others all have the correct transport type (otherwise BUG()) and won't result UNSPECIFIED :-) Should it be removed from this enum somewhere in this patch series (perhaps early on) ? I don't think it's ever been 'possible.' It's purpose is to catch initialized errors where the transport fails to initialize it's transport type. So for example, provider = calloc(1, sizeof *provider) If 0 is a valid link layer type, then you wouldn't catch these kinds of errors. Tom It was still directly used by helper like ib_modify_qp_is_ok() as indicator, may be better in another following patch to reform those part :-) Regards, Michael Wang -- Hal Regards, Michael Wang IB_LINK_LAYER_INFINIBAND, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 27/27] IB/Verbs: Cleanup rdma_node_get_transport()
On 4/16/15 8:45 AM, Michael Wang wrote: On 04/16/2015 03:42 PM, Hal Rosenstock wrote: On 4/16/2015 9:41 AM, Michael Wang wrote: On 04/16/2015 03:36 PM, Hal Rosenstock wrote: [snip] -EXPORT_SYMBOL(rdma_node_get_transport); - enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_num) { if (device-get_link_layer) diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 262bf44..f9ef479 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -84,9 +84,6 @@ enum rdma_transport_type { RDMA_TRANSPORT_IBOE, }; -__attribute_const__ enum rdma_transport_type -rdma_node_get_transport(enum rdma_node_type node_type); - enum rdma_link_layer { IB_LINK_LAYER_UNSPECIFIED, Is IB_LINK_LAYER_UNSPECIFIED still possible ? Actually it's impossible in kernel at first, all those who implemented the callback won't return UNSPECIFIED, others all have the correct transport type (otherwise BUG()) and won't result UNSPECIFIED :-) Should it be removed from this enum somewhere in this patch series (perhaps early on) ? I don't think it's ever been 'possible.' It's purpose is to catch initialized errors where the transport fails to initialize it's transport type. So for example, provider = calloc(1, sizeof *provider) If 0 is a valid link layer type, then you wouldn't catch these kinds of errors. Tom It was still directly used by helper like ib_modify_qp_is_ok() as indicator, may be better in another following patch to reform those part :-) Regards, Michael Wang -- Hal Regards, Michael Wang IB_LINK_LAYER_INFINIBAND, -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH linux-next v3] SUNRPC: rpcrdma_register_default_external: Dynamically allocate ib_phys_buf
AhOk On 3/11/13 9:53 PM, Tim Gardner wrote: On 03/11/2013 05:02 PM, Tom Tucker wrote: On 3/11/13 4:25 PM, J. Bruce Fields wrote: Looks good to me; wish we could get it tested I will test it. Tim could you please send me a final version that you'd like tested as a single message? I'm a little confused about what you are asking. I think v3 of the patch is the final version (unless you find bugs with it). Would someone (like Tim maybe ... hint hint) look at tearing out all those dead registration strategies? I don't think we need or will ever use bounce-buffers, memory windows, or mlnx fmr. The only two that are used and tested are all-phys and FRMR (the default). Dunno if I'll get time for that. I had a one day window where I could hack out some simple patches. Now I'm back to the usual grindstone. rtg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH linux-next v3] SUNRPC: rpcrdma_register_default_external: Dynamically allocate ib_phys_buf
On 3/11/13 4:25 PM, J. Bruce Fields wrote: On Mon, Mar 11, 2013 at 03:15:08PM -0600, Tim Gardner wrote: rpcrdma_register_default_external() is several frames into the call stack which goes deeper yet. You run the risk of stack corruption by declaring such a large automatic variable, so move the array of 'struct ib_phys_buf' objects into the transport structure 'struct rpcrdma_xprt' (which is dynamically allocated) in order to silence the frame-larger-than warning. Access to each struct rpcrdma_xprt is serialized by XPRT_LOCKED in xprt_reserve_xprt(), so there is no danger of multiple accessors to the array of struct ib_phys_buf objects. net/sunrpc/xprtrdma/verbs.c: In function 'rpcrdma_register_default_external': net/sunrpc/xprtrdma/verbs.c:1774:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=] gcc version 4.6.3 Cc: Trond Myklebust Cc: "J. Bruce Fields" Cc: "David S. Miller" Cc: Tom Tucker Cc: Haggai Eran Cc: Or Gerlitz Cc: Shani Michaeli Cc: linux-...@vger.kernel.org Cc: net...@vger.kernel.org Signed-off-by: Tim Gardner --- v1 - Use kmalloc() to dynamically allocate and free the array of 'struct ib_phys_buf' objects v2 - Move the array of 'struct ib_phys_buf' objects into struct rpcrdma_req and pass this request down through rpcrdma_register_external() and rpcrdma_register_default_external(). This is less overhead then using kmalloc() and requires no extra error checking as the allocation burden is shifted to the transport client. v3 - Move the array of 'struct ib_phys_buf' objects into struct rpcrdma_xprt. Pass a pointer to this transport structure into rpcrdma_register_default_external(). This is less overhead then using kmalloc() and requires no extra error checking as the allocation burden is shifted to the transport client. Looks good to me; wish we could get it tested I will test it. Tim could you please send me a final version that you'd like tested as a single message? Would someone (like Tim maybe ... hint hint) look at tearing out all those dead registration strategies? I don't think we need or will ever use bounce-buffers, memory windows, or mlnx fmr. The only two that are used and tested are all-phys and FRMR (the default). Tom In future if we do decide to also increase the size of that array we may need to allocate it separately from struct rpcrdma_xprt itself, which looks already fairly large without it; on x86_64: $ gdb net/sunrpc/xprtrdma/xprtrdma.ko ... (gdb) p sizeof(struct rpcrdma_xprt) $1 = 2912 But that shouldn't be a big deal to do. --b. net/sunrpc/xprtrdma/verbs.c | 10 ++ net/sunrpc/xprtrdma/xprt_rdma.h |5 - 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 93726560..c7aa3da 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -1730,13 +1730,14 @@ rpcrdma_deregister_memwin_external(struct rpcrdma_mr_seg *seg, } static int -rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg, - int *nsegs, int writing, struct rpcrdma_ia *ia) +rpcrdma_register_default_external(struct rpcrdma_xprt *r_xprt, + struct rpcrdma_mr_seg *seg, int *nsegs, int writing, + struct rpcrdma_ia *ia) { int mem_priv = (writing ? IB_ACCESS_REMOTE_WRITE : IB_ACCESS_REMOTE_READ); struct rpcrdma_mr_seg *seg1 = seg; - struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS]; + struct ib_phys_buf *ipb = r_xprt->ipb; int len, i, rc = 0; if (*nsegs > RPCRDMA_MAX_DATA_SEGS) @@ -1827,7 +1828,8 @@ rpcrdma_register_external(struct rpcrdma_mr_seg *seg, /* Default registration each time */ default: - rc = rpcrdma_register_default_external(seg, , writing, ia); + rc = rpcrdma_register_default_external(r_xprt, seg, , + writing, ia); break; } if (rc) diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h index cc1445d..d7b440f 100644 --- a/net/sunrpc/xprtrdma/xprt_rdma.h +++ b/net/sunrpc/xprtrdma/xprt_rdma.h @@ -269,7 +269,8 @@ struct rpcrdma_stats { * for convenience. This structure need not be visible externally. * * It is allocated and initialized during mount, and released - * during unmount. + * during unmount. Access to this structure is serialized by XPRT_LOCKED + * in xprt_reserve_xprt(). */ struct rpcrdma_xprt { struct rpc_xprt xprt; @@ -279,6 +280,8 @@ struct rpcrdma_xprt { struct rpcrdma_create_data_internal rx_data; struct delayed_work rdma_connect; struct rpcrdma_statsrx_stats; + /* temp work array */ + struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS]; }; #define rpcx_to_rdmax(x) container_of(x, struct rpcrd
Re: [PATCH linux-next v3] SUNRPC: rpcrdma_register_default_external: Dynamically allocate ib_phys_buf
On 3/11/13 4:25 PM, J. Bruce Fields wrote: On Mon, Mar 11, 2013 at 03:15:08PM -0600, Tim Gardner wrote: rpcrdma_register_default_external() is several frames into the call stack which goes deeper yet. You run the risk of stack corruption by declaring such a large automatic variable, so move the array of 'struct ib_phys_buf' objects into the transport structure 'struct rpcrdma_xprt' (which is dynamically allocated) in order to silence the frame-larger-than warning. Access to each struct rpcrdma_xprt is serialized by XPRT_LOCKED in xprt_reserve_xprt(), so there is no danger of multiple accessors to the array of struct ib_phys_buf objects. net/sunrpc/xprtrdma/verbs.c: In function 'rpcrdma_register_default_external': net/sunrpc/xprtrdma/verbs.c:1774:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=] gcc version 4.6.3 Cc: Trond Myklebust trond.mykleb...@netapp.com Cc: J. Bruce Fields bfie...@fieldses.org Cc: David S. Miller da...@davemloft.net Cc: Tom Tucker t...@ogc.us Cc: Haggai Eran hagg...@mellanox.com Cc: Or Gerlitz ogerl...@mellanox.com Cc: Shani Michaeli sha...@mellanox.com Cc: linux-...@vger.kernel.org Cc: net...@vger.kernel.org Signed-off-by: Tim Gardner tim.gard...@canonical.com --- v1 - Use kmalloc() to dynamically allocate and free the array of 'struct ib_phys_buf' objects v2 - Move the array of 'struct ib_phys_buf' objects into struct rpcrdma_req and pass this request down through rpcrdma_register_external() and rpcrdma_register_default_external(). This is less overhead then using kmalloc() and requires no extra error checking as the allocation burden is shifted to the transport client. v3 - Move the array of 'struct ib_phys_buf' objects into struct rpcrdma_xprt. Pass a pointer to this transport structure into rpcrdma_register_default_external(). This is less overhead then using kmalloc() and requires no extra error checking as the allocation burden is shifted to the transport client. Looks good to me; wish we could get it tested I will test it. Tim could you please send me a final version that you'd like tested as a single message? Would someone (like Tim maybe ... hint hint) look at tearing out all those dead registration strategies? I don't think we need or will ever use bounce-buffers, memory windows, or mlnx fmr. The only two that are used and tested are all-phys and FRMR (the default). Tom In future if we do decide to also increase the size of that array we may need to allocate it separately from struct rpcrdma_xprt itself, which looks already fairly large without it; on x86_64: $ gdb net/sunrpc/xprtrdma/xprtrdma.ko ... (gdb) p sizeof(struct rpcrdma_xprt) $1 = 2912 But that shouldn't be a big deal to do. --b. net/sunrpc/xprtrdma/verbs.c | 10 ++ net/sunrpc/xprtrdma/xprt_rdma.h |5 - 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 93726560..c7aa3da 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -1730,13 +1730,14 @@ rpcrdma_deregister_memwin_external(struct rpcrdma_mr_seg *seg, } static int -rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg, - int *nsegs, int writing, struct rpcrdma_ia *ia) +rpcrdma_register_default_external(struct rpcrdma_xprt *r_xprt, + struct rpcrdma_mr_seg *seg, int *nsegs, int writing, + struct rpcrdma_ia *ia) { int mem_priv = (writing ? IB_ACCESS_REMOTE_WRITE : IB_ACCESS_REMOTE_READ); struct rpcrdma_mr_seg *seg1 = seg; - struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS]; + struct ib_phys_buf *ipb = r_xprt-ipb; int len, i, rc = 0; if (*nsegs RPCRDMA_MAX_DATA_SEGS) @@ -1827,7 +1828,8 @@ rpcrdma_register_external(struct rpcrdma_mr_seg *seg, /* Default registration each time */ default: - rc = rpcrdma_register_default_external(seg, nsegs, writing, ia); + rc = rpcrdma_register_default_external(r_xprt, seg, nsegs, + writing, ia); break; } if (rc) diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h index cc1445d..d7b440f 100644 --- a/net/sunrpc/xprtrdma/xprt_rdma.h +++ b/net/sunrpc/xprtrdma/xprt_rdma.h @@ -269,7 +269,8 @@ struct rpcrdma_stats { * for convenience. This structure need not be visible externally. * * It is allocated and initialized during mount, and released - * during unmount. + * during unmount. Access to this structure is serialized by XPRT_LOCKED + * in xprt_reserve_xprt(). */ struct rpcrdma_xprt { struct rpc_xprt xprt; @@ -279,6 +280,8 @@ struct rpcrdma_xprt { struct rpcrdma_create_data_internal rx_data; struct delayed_work rdma_connect; struct rpcrdma_statsrx_stats
Re: [PATCH linux-next v3] SUNRPC: rpcrdma_register_default_external: Dynamically allocate ib_phys_buf
AhOk On 3/11/13 9:53 PM, Tim Gardner wrote: On 03/11/2013 05:02 PM, Tom Tucker wrote: On 3/11/13 4:25 PM, J. Bruce Fields wrote: snip Looks good to me; wish we could get it tested I will test it. Tim could you please send me a final version that you'd like tested as a single message? I'm a little confused about what you are asking. I think v3 of the patch is the final version (unless you find bugs with it). Would someone (like Tim maybe ... hint hint) look at tearing out all those dead registration strategies? I don't think we need or will ever use bounce-buffers, memory windows, or mlnx fmr. The only two that are used and tested are all-phys and FRMR (the default). Dunno if I'll get time for that. I had a one day window where I could hack out some simple patches. Now I'm back to the usual grindstone. rtg -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH linux-next] SUNRPC: rpcrdma_register_default_external: Dynamically allocate ib_phys_buf
This is the result of 2773395b34883fe54418de188733a63bb38e0ad6. Steve might want to weigh in on this since it was done for performance reasons. Tom On 3/10/13 10:39 AM, Tim Gardner wrote: rpcrdma_register_default_external() is several frames into the call stack which goes deeper yet. You run the risk of stack corruption by declaring such a large automatic variable, so dynamically allocate the array of 'struct ib_phys_buf' objects in order to silence the frame-larger-than warning. net/sunrpc/xprtrdma/verbs.c: In function 'rpcrdma_register_default_external': net/sunrpc/xprtrdma/verbs.c:1774:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=] gcc version 4.6.3 Cc: Trond Myklebust Cc: "J. Bruce Fields" Cc: "David S. Miller" Cc: Tom Tucker Cc: Haggai Eran Cc: Or Gerlitz Cc: Shani Michaeli Cc: linux-...@vger.kernel.org Cc: net...@vger.kernel.org Signed-off-by: Tim Gardner --- net/sunrpc/xprtrdma/verbs.c |7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 93726560..0916467 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -1736,9 +1736,13 @@ rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg, int mem_priv = (writing ? IB_ACCESS_REMOTE_WRITE : IB_ACCESS_REMOTE_READ); struct rpcrdma_mr_seg *seg1 = seg; - struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS]; + struct ib_phys_buf *ipb; int len, i, rc = 0; + ipb = kmalloc(sizeof(*ipb) * RPCRDMA_MAX_DATA_SEGS, GFP_KERNEL); + if (!ipb) + return -ENOMEM; + if (*nsegs > RPCRDMA_MAX_DATA_SEGS) *nsegs = RPCRDMA_MAX_DATA_SEGS; for (len = 0, i = 0; i < *nsegs;) { @@ -1770,6 +1774,7 @@ rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg, seg1->mr_len = len; } *nsegs = i; + kfree(ipb); return rc; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH linux-next] SUNRPC: rpcrdma_register_default_external: Dynamically allocate ib_phys_buf
This is the result of 2773395b34883fe54418de188733a63bb38e0ad6. Steve might want to weigh in on this since it was done for performance reasons. Tom On 3/10/13 10:39 AM, Tim Gardner wrote: rpcrdma_register_default_external() is several frames into the call stack which goes deeper yet. You run the risk of stack corruption by declaring such a large automatic variable, so dynamically allocate the array of 'struct ib_phys_buf' objects in order to silence the frame-larger-than warning. net/sunrpc/xprtrdma/verbs.c: In function 'rpcrdma_register_default_external': net/sunrpc/xprtrdma/verbs.c:1774:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=] gcc version 4.6.3 Cc: Trond Myklebust trond.mykleb...@netapp.com Cc: J. Bruce Fields bfie...@fieldses.org Cc: David S. Miller da...@davemloft.net Cc: Tom Tucker t...@ogc.us Cc: Haggai Eran hagg...@mellanox.com Cc: Or Gerlitz ogerl...@mellanox.com Cc: Shani Michaeli sha...@mellanox.com Cc: linux-...@vger.kernel.org Cc: net...@vger.kernel.org Signed-off-by: Tim Gardner tim.gard...@canonical.com --- net/sunrpc/xprtrdma/verbs.c |7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 93726560..0916467 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -1736,9 +1736,13 @@ rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg, int mem_priv = (writing ? IB_ACCESS_REMOTE_WRITE : IB_ACCESS_REMOTE_READ); struct rpcrdma_mr_seg *seg1 = seg; - struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS]; + struct ib_phys_buf *ipb; int len, i, rc = 0; + ipb = kmalloc(sizeof(*ipb) * RPCRDMA_MAX_DATA_SEGS, GFP_KERNEL); + if (!ipb) + return -ENOMEM; + if (*nsegs RPCRDMA_MAX_DATA_SEGS) *nsegs = RPCRDMA_MAX_DATA_SEGS; for (len = 0, i = 0; i *nsegs;) { @@ -1770,6 +1774,7 @@ rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg, seg1-mr_len = len; } *nsegs = i; + kfree(ipb); return rc; } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large)
On Mon, 2008-02-18 at 15:25 -0600, Tom Tucker wrote: > On Mon, 2008-02-18 at 04:58 -0800, Andrew Morton wrote: > > (suitable cc added) > > > > (regression) > > > > On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev <[EMAIL PROTECTED]> > > wrote: > > > > > Hello! > > > > > > After upgrading to 2.6.24 (from .23), we're seeing ALOT > > > of messages like in $subj in dmesg: > > > > > > Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) > > > Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed. > > > Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) > > > Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed. > > > Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) > > > Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed. > > > ... > > > > > > with linux NFS server. The clients are all linux too, mostly 2.6.23 > > > and some 2.6.22. > > > > > > I found the "offending" piece of code in net/sunrpc/svcsock.c, > > > in routine svc_tcp_recvfrom() with condition being: > > > > > >if (svsk->sk_reclen > serv->sv_max_mesg) ... > > The problem might be that the client is setting a bit in the RPC message > length field that is meant to be interpreted and masked off by the > server -- and we're not doing it yet. My bet is that 0x2 is the bit > we're looking for. I'll poke around... Never mind. The way this is supposed to work is that the transport is shut down. The code used to delete the socket directly, but I reorganized this code to just set the XPT_CLOSE bit and let the normal close path handle it when it came back through to retry. I'm not sure exactly what version of the code you have, but it may be that your missing the code from the close path that does this, or it may just not work. As a first shot, can you try this patch and tell me if the messages go away? diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 1d3e5fc..cf6150a 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -920,6 +920,7 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp) err_delete: set_bit(XPT_CLOSE, >sk_xprt.xpt_flags); + svc_delete_xprt(>sk_xprt); return -EAGAIN; error: > > > > > > > This happens after a server reboot. At this point, client(s) are trying > > > to perform some NFS transaction and fail, and server starts generating > > > the above messages - till I do a umount followed by mount on all clients. > > > Before, such situation (nfs server reboot) were handled transparently, > > > ie, there was nothing to do, the mount continued working just fine when > > > the server comes back online. > > > > > > Now, I'm not sure if it's really 2.6.24-specific problem or a userspace > > > problem. Some time ago we also upgraded nfs-kernel-server (Debian) > > > package, and the remount-after-nfs-server-reboot problem started to > > > occur at THAT time (and it is something to worry about as well, I just > > > had no time to deal with it); but the dmesg spamming only appeared > > > with 2.6.24. > > > > > > How to debug the issue further on from this point? > > > > > > > > > - > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > > the body of a message to [EMAIL PROTECTED] > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > - > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large)
On Mon, 2008-02-18 at 04:58 -0800, Andrew Morton wrote: > (suitable cc added) > > (regression) > > On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev <[EMAIL PROTECTED]> wrote: > > > Hello! > > > > After upgrading to 2.6.24 (from .23), we're seeing ALOT > > of messages like in $subj in dmesg: > > > > Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) > > Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed. > > Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) > > Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed. > > Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) > > Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed. > > ... > > > > with linux NFS server. The clients are all linux too, mostly 2.6.23 > > and some 2.6.22. > > > > I found the "offending" piece of code in net/sunrpc/svcsock.c, > > in routine svc_tcp_recvfrom() with condition being: > > > >if (svsk->sk_reclen > serv->sv_max_mesg) ... The problem might be that the client is setting a bit in the RPC message length field that is meant to be interpreted and masked off by the server -- and we're not doing it yet. My bet is that 0x2 is the bit we're looking for. I'll poke around... > > > > This happens after a server reboot. At this point, client(s) are trying > > to perform some NFS transaction and fail, and server starts generating > > the above messages - till I do a umount followed by mount on all clients. > > Before, such situation (nfs server reboot) were handled transparently, > > ie, there was nothing to do, the mount continued working just fine when > > the server comes back online. > > > > Now, I'm not sure if it's really 2.6.24-specific problem or a userspace > > problem. Some time ago we also upgraded nfs-kernel-server (Debian) > > package, and the remount-after-nfs-server-reboot problem started to > > occur at THAT time (and it is something to worry about as well, I just > > had no time to deal with it); but the dmesg spamming only appeared > > with 2.6.24. > > > > How to debug the issue further on from this point? > > > > > - > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large)
On Mon, 2008-02-18 at 04:58 -0800, Andrew Morton wrote: (suitable cc added) (regression) On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev [EMAIL PROTECTED] wrote: Hello! After upgrading to 2.6.24 (from .23), we're seeing ALOT of messages like in $subj in dmesg: Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed. Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed. Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed. ... with linux NFS server. The clients are all linux too, mostly 2.6.23 and some 2.6.22. I found the offending piece of code in net/sunrpc/svcsock.c, in routine svc_tcp_recvfrom() with condition being: if (svsk-sk_reclen serv-sv_max_mesg) ... The problem might be that the client is setting a bit in the RPC message length field that is meant to be interpreted and masked off by the server -- and we're not doing it yet. My bet is that 0x2 is the bit we're looking for. I'll poke around... This happens after a server reboot. At this point, client(s) are trying to perform some NFS transaction and fail, and server starts generating the above messages - till I do a umount followed by mount on all clients. Before, such situation (nfs server reboot) were handled transparently, ie, there was nothing to do, the mount continued working just fine when the server comes back online. Now, I'm not sure if it's really 2.6.24-specific problem or a userspace problem. Some time ago we also upgraded nfs-kernel-server (Debian) package, and the remount-after-nfs-server-reboot problem started to occur at THAT time (and it is something to worry about as well, I just had no time to deal with it); but the dmesg spamming only appeared with 2.6.24. How to debug the issue further on from this point? - To unsubscribe from this list: send the line unsubscribe linux-nfs in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large)
On Mon, 2008-02-18 at 15:25 -0600, Tom Tucker wrote: On Mon, 2008-02-18 at 04:58 -0800, Andrew Morton wrote: (suitable cc added) (regression) On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev [EMAIL PROTECTED] wrote: Hello! After upgrading to 2.6.24 (from .23), we're seeing ALOT of messages like in $subj in dmesg: Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed. Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed. Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed. ... with linux NFS server. The clients are all linux too, mostly 2.6.23 and some 2.6.22. I found the offending piece of code in net/sunrpc/svcsock.c, in routine svc_tcp_recvfrom() with condition being: if (svsk-sk_reclen serv-sv_max_mesg) ... The problem might be that the client is setting a bit in the RPC message length field that is meant to be interpreted and masked off by the server -- and we're not doing it yet. My bet is that 0x2 is the bit we're looking for. I'll poke around... Never mind. The way this is supposed to work is that the transport is shut down. The code used to delete the socket directly, but I reorganized this code to just set the XPT_CLOSE bit and let the normal close path handle it when it came back through to retry. I'm not sure exactly what version of the code you have, but it may be that your missing the code from the close path that does this, or it may just not work. As a first shot, can you try this patch and tell me if the messages go away? diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 1d3e5fc..cf6150a 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -920,6 +920,7 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp) err_delete: set_bit(XPT_CLOSE, svsk-sk_xprt.xpt_flags); + svc_delete_xprt(svsk-sk_xprt); return -EAGAIN; error: This happens after a server reboot. At this point, client(s) are trying to perform some NFS transaction and fail, and server starts generating the above messages - till I do a umount followed by mount on all clients. Before, such situation (nfs server reboot) were handled transparently, ie, there was nothing to do, the mount continued working just fine when the server comes back online. Now, I'm not sure if it's really 2.6.24-specific problem or a userspace problem. Some time ago we also upgraded nfs-kernel-server (Debian) package, and the remount-after-nfs-server-reboot problem started to occur at THAT time (and it is something to worry about as well, I just had no time to deal with it); but the dmesg spamming only appeared with 2.6.24. How to debug the issue further on from this point? - To unsubscribe from this list: send the line unsubscribe linux-nfs in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe linux-nfs in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ofa-general] iommu dma mapping alignment requirements
On Thu, 2007-12-20 at 11:14 -0600, Steve Wise wrote: > Hey Roland (and any iommu/ppc/dma experts out there): > > I'm debugging a data corruption issue that happens on PPC64 systems > running rdma on kernels where the iommu page size is 4KB yet the host > page size is 64KB. This "feature" was added to the PPC64 code recently, > and is in kernel.org from 2.6.23. So if the kernel is built with a 4KB > page size, no problems. If the kernel is prior to 2.6.23 then 64KB page > configs work too. Its just a problem when the iommu page size != host > page size. > > It appears that my problem boils down to a single host page of memory > that is mapped for dma, and the dma address returned by dma_map_sg() is > _not_ 64KB aligned. Here is an example: > > app registers va 0x2d9a3000 len 12288 > ib_umem_get() creates and maps a umem and chunk that looks like (dumping > state from a registered user memory region): > > > umem len 12288 off 12288 pgsz 65536 shift 16 > > chunk 0: nmap 1 nents 1 > > sglist[0] page 0xc0930b08 off 0 len 65536 dma_addr > > 5bff4000 dma_len 65536 > > > > So the kernel maps 1 full page for this MR. But note that the dma > address is 5bff4000 which is 4KB aligned, not 64KB aligned. I > think this is causing grief to the RDMA HW. > > My first question is: Is there an assumption or requirement in linux > that dma_addressess should have the same alignment as the host address > they are mapped to? IE the rdma core is mapping the entire 64KB page, > but the mapping doesn't begin on a 64KB page boundary. > > If this mapping is considered valid, then perhaps the rdma hw is at > fault here. But I'm wondering if this is an PPC/iommu bug. > > BTW: Here is what the Memory Region looks like to the HW: > > > TPT entry: stag idx 0x2e800 key 0xff state VAL type NSMR pdid 0x2 > > perms RW rem_inv_dis 0 addr_type VATO > > bind_enable 1 pg_size 65536 qpid 0x0 pbl_addr 0x003c67c0 > > len 12288 va 2d9a3000 bind_cnt 0 > > PBL: 5bff4000 > > > > Any thoughts? The Ammasso certainly works this way. If you tell it the page size is 64KB, it will ignore bits in the page address that encode 0-65535. > > Steve. > > > ___ > general mailing list > [EMAIL PROTECTED] > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ofa-general] iommu dma mapping alignment requirements
On Thu, 2007-12-20 at 11:14 -0600, Steve Wise wrote: Hey Roland (and any iommu/ppc/dma experts out there): I'm debugging a data corruption issue that happens on PPC64 systems running rdma on kernels where the iommu page size is 4KB yet the host page size is 64KB. This feature was added to the PPC64 code recently, and is in kernel.org from 2.6.23. So if the kernel is built with a 4KB page size, no problems. If the kernel is prior to 2.6.23 then 64KB page configs work too. Its just a problem when the iommu page size != host page size. It appears that my problem boils down to a single host page of memory that is mapped for dma, and the dma address returned by dma_map_sg() is _not_ 64KB aligned. Here is an example: app registers va 0x2d9a3000 len 12288 ib_umem_get() creates and maps a umem and chunk that looks like (dumping state from a registered user memory region): umem len 12288 off 12288 pgsz 65536 shift 16 chunk 0: nmap 1 nents 1 sglist[0] page 0xc0930b08 off 0 len 65536 dma_addr 5bff4000 dma_len 65536 So the kernel maps 1 full page for this MR. But note that the dma address is 5bff4000 which is 4KB aligned, not 64KB aligned. I think this is causing grief to the RDMA HW. My first question is: Is there an assumption or requirement in linux that dma_addressess should have the same alignment as the host address they are mapped to? IE the rdma core is mapping the entire 64KB page, but the mapping doesn't begin on a 64KB page boundary. If this mapping is considered valid, then perhaps the rdma hw is at fault here. But I'm wondering if this is an PPC/iommu bug. BTW: Here is what the Memory Region looks like to the HW: TPT entry: stag idx 0x2e800 key 0xff state VAL type NSMR pdid 0x2 perms RW rem_inv_dis 0 addr_type VATO bind_enable 1 pg_size 65536 qpid 0x0 pbl_addr 0x003c67c0 len 12288 va 2d9a3000 bind_cnt 0 PBL: 5bff4000 Any thoughts? The Ammasso certainly works this way. If you tell it the page size is 64KB, it will ignore bits in the page address that encode 0-65535. Steve. ___ general mailing list [EMAIL PROTECTED] http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On 11/27/07 7:27 PM, "Rusty Russell" <[EMAIL PROTECTED]> wrote: > On Tuesday 27 November 2007 16:35:42 Tom Tucker wrote: >> On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote: >> Explicitly documenting what comprises the kernel API (external, >> supported) and what comprises the kernel implementation (internal, not >> supported). > > But the former is currently an empty set. > Yes, I overstated this. >> - making it obvious to developers when they are binding their >> implementation to a particular kernel release > > See, there's your problem. All interfaces can, and will, change. You're > always binding yourself to a particular release. > Absolutely in the limit. But there are many bits of code that work quite nicely from release to release because they use services that live in the smooth water in the wake of the Linux head. I think defining that smooth water has merit. I also think that it would be nice to limit the scope of module externs to avoid polluting the global namespace. I'm not sure that this particular patch reaches these goals, but it prompted me to comment. > So you're not proposing we mark what's not stable, you're arguing that we > create a subset which is stable. > Well, this is an interesting question. The answer is I think both are important. It would be nice (and arguably necessary long term) to limit the scope of externs. This can be accomplished with name spaces "I want bob's implementation of read." I think it also has value to define interfaces that are considered stable (but not inviolate) to allow developers to make better informed decisions when choosing interfaces. Having this info explicit in the code seems logical to me. > That's an argument we're not (yet) having. > Yeah, maybe I'm off in the weeds on this one... Tom > Cheers, > Rusty. > - > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On Tue, 2007-11-27 at 18:15 +0100, Adrian Bunk wrote: > On Mon, Nov 26, 2007 at 11:35:42PM -0600, Tom Tucker wrote: > > On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote: > >... > > > No. That's the wrong question. What's the real upside? > > > > Explicitly documenting what comprises the kernel API (external, > > supported) and what comprises the kernel implementation (internal, not > > supported). > >... > > There is not, never was, and never will be, any supported external API > of the kernel. Philosophically I understand what you're saying, but in practical terms there is the issue of managing core API like kmalloc. Although kmalloc _could_ change, doing so would be extremely painful. In fact anyone who proposed such a change would have to have a profoundly powerful argument as to why it was necessary. I think this patchset is an attempt to make it easier to identify and review these kinds of interfaces. > > cu > Adrian > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On Tue, 2007-11-27 at 18:15 +0100, Adrian Bunk wrote: On Mon, Nov 26, 2007 at 11:35:42PM -0600, Tom Tucker wrote: On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote: ... No. That's the wrong question. What's the real upside? Explicitly documenting what comprises the kernel API (external, supported) and what comprises the kernel implementation (internal, not supported). ... There is not, never was, and never will be, any supported external API of the kernel. Philosophically I understand what you're saying, but in practical terms there is the issue of managing core API like kmalloc. Although kmalloc _could_ change, doing so would be extremely painful. In fact anyone who proposed such a change would have to have a profoundly powerful argument as to why it was necessary. I think this patchset is an attempt to make it easier to identify and review these kinds of interfaces. cu Adrian - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On 11/27/07 7:27 PM, Rusty Russell [EMAIL PROTECTED] wrote: On Tuesday 27 November 2007 16:35:42 Tom Tucker wrote: On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote: Explicitly documenting what comprises the kernel API (external, supported) and what comprises the kernel implementation (internal, not supported). But the former is currently an empty set. Yes, I overstated this. - making it obvious to developers when they are binding their implementation to a particular kernel release See, there's your problem. All interfaces can, and will, change. You're always binding yourself to a particular release. Absolutely in the limit. But there are many bits of code that work quite nicely from release to release because they use services that live in the smooth water in the wake of the Linux head. I think defining that smooth water has merit. I also think that it would be nice to limit the scope of module externs to avoid polluting the global namespace. I'm not sure that this particular patch reaches these goals, but it prompted me to comment. So you're not proposing we mark what's not stable, you're arguing that we create a subset which is stable. Well, this is an interesting question. The answer is I think both are important. It would be nice (and arguably necessary long term) to limit the scope of externs. This can be accomplished with name spaces I want bob's implementation of read. I think it also has value to define interfaces that are considered stable (but not inviolate) to allow developers to make better informed decisions when choosing interfaces. Having this info explicit in the code seems logical to me. That's an argument we're not (yet) having. Yeah, maybe I'm off in the weeds on this one... Tom Cheers, Rusty. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote: > On Monday 26 November 2007 17:15:44 Roland Dreier wrote: > > > Except C doesn't have namespaces and this mechanism doesn't create them. > > > So this is just complete and utter makework; as I said before, noone's > > > going to confuse all those udp_* functions if they're not in the udp > > > namespace. > > > > I don't understand why you're so opposed to organizing the kernel's > > exported symbols in a more self-documenting way. > > No, I was the one who moved exports near their declarations. That's > organised. I just don't see how this new "organization" will help: oh good, > I won't accidentally use the udp functions any more?!? > > > It seems pretty > > clear to me that having a mechanism that requires modules to make > > explicit which (semi-)internal APIs makes reviewing easier > > Perhaps you've got lots of patches were people are using internal APIs they > shouldn't? > Maybe the issue is "who can tell" since what is external and what is internal is not explicitly defined? > > , makes it > > easier to communicate "please don't use that API" to module authors, > > Well, introduce an EXPORT_SYMBOL_INTERNAL(). It's a lot less code. But > you'd > still need to show that people are having trouble knowing what APIs to use. > > and takes at least a small step towards bringing the kernel's exported > > API under control. > > There is no "exported API" to bring under control. Hmm...apparently, there are those that are struggling... > There are symbols we > expose for the kernel's own use which can be used by external modules at > their own risk. > > > What's the real downside? > > No. That's the wrong question. What's the real upside? Explicitly documenting what comprises the kernel API (external, supported) and what comprises the kernel implementation (internal, not supported). > > Let's not put code in the core because "it doesn't seem to hurt". > agreed. > I'm sure you think there's a real problem, but I'm still waiting for someone > to *show* it to me. Then we can look at solutions. I think the benefits should include: - forcing developers to identify their exports as part of the implementation or as part of the kernel API - making it easier for reviewers to identify when developers are adding to the kernel API and thereby focusing the appropriate level of review to the new function - making it obvious to developers when they are binding their implementation to a particular kernel release > Rusty. > - > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote: On Monday 26 November 2007 17:15:44 Roland Dreier wrote: Except C doesn't have namespaces and this mechanism doesn't create them. So this is just complete and utter makework; as I said before, noone's going to confuse all those udp_* functions if they're not in the udp namespace. I don't understand why you're so opposed to organizing the kernel's exported symbols in a more self-documenting way. No, I was the one who moved exports near their declarations. That's organised. I just don't see how this new organization will help: oh good, I won't accidentally use the udp functions any more?!? It seems pretty clear to me that having a mechanism that requires modules to make explicit which (semi-)internal APIs makes reviewing easier Perhaps you've got lots of patches were people are using internal APIs they shouldn't? Maybe the issue is who can tell since what is external and what is internal is not explicitly defined? , makes it easier to communicate please don't use that API to module authors, Well, introduce an EXPORT_SYMBOL_INTERNAL(). It's a lot less code. But you'd still need to show that people are having trouble knowing what APIs to use. and takes at least a small step towards bringing the kernel's exported API under control. There is no exported API to bring under control. Hmm...apparently, there are those that are struggling... There are symbols we expose for the kernel's own use which can be used by external modules at their own risk. What's the real downside? No. That's the wrong question. What's the real upside? Explicitly documenting what comprises the kernel API (external, supported) and what comprises the kernel implementation (internal, not supported). Let's not put code in the core because it doesn't seem to hurt. agreed. I'm sure you think there's a real problem, but I'm still waiting for someone to *show* it to me. Then we can look at solutions. I think the benefits should include: - forcing developers to identify their exports as part of the implementation or as part of the kernel API - making it easier for reviewers to identify when developers are adding to the kernel API and thereby focusing the appropriate level of review to the new function - making it obvious to developers when they are binding their implementation to a particular kernel release Rusty. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
On Wed, 2007-08-15 at 22:26 -0400, Jeff Garzik wrote: [...snip...] > > I think removing the RDMA stack is the wrong thing to do, and you > > shouldn't just threaten to yank entire subsystems because you don't like > > the technology. Lets keep this constructive, can we? RDMA should get > > the respect of any other technology in Linux. Maybe its a niche in your > > opinion, but come on, there's more RDMA users than say, the sparc64 > > port. Eh? > > It's not about being a niche. It's about creating a maintainable > software net stack that has predictable behavior. Isn't RDMA _part_ of the "software net stack" within Linux? Why isn't making RDMA stable, supportable and maintainable equally as important as any other subsystem? > > Needing to reach out of the RDMA sandbox and reserve net stack resources > away from itself travels a path we've consistently avoided. > > > >> I will NACK any patch that opens up sockets to eat up ports or > >> anything stupid like that. > > > > Got it. > > Ditto for me as well. > > Jeff > > > - > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
On Wed, 2007-08-15 at 22:26 -0400, Jeff Garzik wrote: [...snip...] I think removing the RDMA stack is the wrong thing to do, and you shouldn't just threaten to yank entire subsystems because you don't like the technology. Lets keep this constructive, can we? RDMA should get the respect of any other technology in Linux. Maybe its a niche in your opinion, but come on, there's more RDMA users than say, the sparc64 port. Eh? It's not about being a niche. It's about creating a maintainable software net stack that has predictable behavior. Isn't RDMA _part_ of the software net stack within Linux? Why isn't making RDMA stable, supportable and maintainable equally as important as any other subsystem? Needing to reach out of the RDMA sandbox and reserve net stack resources away from itself travels a path we've consistently avoided. I will NACK any patch that opens up sockets to eat up ports or anything stupid like that. Got it. Ditto for me as well. Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/