Re: [PATCH 3.12 34/72] Fix regression in NFSRDMA server

2015-11-23 Thread Tom Tucker
Seems like part of a patch was dropped or there was a merge problem. The 
code in my tree has all these changes. Very strange.


Tom

On 11/23/15 7:08 AM, Jiri Slaby wrote:

From: Tom Tucker 

3.12-stable review patch.  If anyone has any objections, please let me know.

===

commit 7e4359e2611f95a97037e2b6905eab52f28afbeb upstream.

The server regression was caused by the addition of rq_next_page
(afc59400d6c65bad66d4ad0b2daf879cbff8e23e). There were a few places that
were missed with the update of the rq_respages array.

Signed-off-by: Tom Tucker 
Tested-by: Steve Wise 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Jiri Slaby 
---
  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 12 
  net/sunrpc/xprtrdma/svc_rdma_sendto.c   |  1 +
  2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c 
b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 0ce75524ed21..8d904e4eef15 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
sge_no++;
}
rqstp->rq_respages = >rq_pages[sge_no];
+   rqstp->rq_next_page = rqstp->rq_respages + 1;

This was fixed in an older version.
  
  	/* We should never run out of SGE because the limit is defined to

 * support the max allowed RPC data length
@@ -169,6 +170,7 @@ static int map_read_chunks(struct svcxprt_rdma *xprt,
 */
head->arg.pages[page_no] = rqstp->rq_arg.pages[page_no];
rqstp->rq_respages = >rq_arg.pages[page_no+1];
+   rqstp->rq_next_page = rqstp->rq_respages + 1;
  
  		byte_count -= sge_bytes;

ch_bytes -= sge_bytes;
@@ -276,6 +278,7 @@ static int fast_reg_read_chunks(struct svcxprt_rdma *xprt,
  
  	/* rq_respages points one past arg pages */

rqstp->rq_respages = >rq_arg.pages[page_no];
+   rqstp->rq_next_page = rqstp->rq_respages + 1;
  
  	/* Create the reply and chunk maps */

offset = 0;
@@ -520,13 +523,6 @@ next_sge:
for (ch_no = 0; >rq_pages[ch_no] < rqstp->rq_respages; ch_no++)
rqstp->rq_pages[ch_no] = NULL;
  
-	/*

-* Detach res pages. If svc_release sees any it will attempt to
-* put them.
-*/
-   while (rqstp->rq_next_page != rqstp->rq_respages)
-   *(--rqstp->rq_next_page) = NULL;
-
return err;
  }
  
@@ -550,7 +546,7 @@ static int rdma_read_complete(struct svc_rqst *rqstp,
  
  	/* rq_respages starts after the last arg page */

rqstp->rq_respages = >rq_arg.pages[page_no];
-   rqstp->rq_next_page = >rq_arg.pages[page_no];
+   rqstp->rq_next_page = rqstp->rq_respages + 1;
  
  	/* Rebuild rq_arg head and tail. */

rqstp->rq_arg.head[0] = head->arg.head[0];
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c 
b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index c1d124dc772b..11e90f8c0fc5 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -625,6 +625,7 @@ static int send_reply(struct svcxprt_rdma *rdma,
if (page_no+1 >= sge_no)
ctxt->sge[page_no+1].length = 0;
}
+   rqstp->rq_next_page = rqstp->rq_respages + 1;
BUG_ON(sge_no > rdma->sc_max_sge);
memset(_wr, 0, sizeof send_wr);
ctxt->wr_op = IB_WR_SEND;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.12 34/72] Fix regression in NFSRDMA server

2015-11-23 Thread Tom Tucker
Seems like part of a patch was dropped or there was a merge problem. The 
code in my tree has all these changes. Very strange.


Tom

On 11/23/15 7:08 AM, Jiri Slaby wrote:

From: Tom Tucker <t...@ogc.us>

3.12-stable review patch.  If anyone has any objections, please let me know.

===

commit 7e4359e2611f95a97037e2b6905eab52f28afbeb upstream.

The server regression was caused by the addition of rq_next_page
(afc59400d6c65bad66d4ad0b2daf879cbff8e23e). There were a few places that
were missed with the update of the rq_respages array.

Signed-off-by: Tom Tucker <t...@ogc.us>
Tested-by: Steve Wise <sw...@ogc.us>
Signed-off-by: J. Bruce Fields <bfie...@redhat.com>
Signed-off-by: Jiri Slaby <jsl...@suse.cz>
---
  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 12 
  net/sunrpc/xprtrdma/svc_rdma_sendto.c   |  1 +
  2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c 
b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 0ce75524ed21..8d904e4eef15 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
sge_no++;
}
rqstp->rq_respages = >rq_pages[sge_no];
+   rqstp->rq_next_page = rqstp->rq_respages + 1;

This was fixed in an older version.
  
  	/* We should never run out of SGE because the limit is defined to

 * support the max allowed RPC data length
@@ -169,6 +170,7 @@ static int map_read_chunks(struct svcxprt_rdma *xprt,
 */
head->arg.pages[page_no] = rqstp->rq_arg.pages[page_no];
rqstp->rq_respages = >rq_arg.pages[page_no+1];
+   rqstp->rq_next_page = rqstp->rq_respages + 1;
  
  		byte_count -= sge_bytes;

ch_bytes -= sge_bytes;
@@ -276,6 +278,7 @@ static int fast_reg_read_chunks(struct svcxprt_rdma *xprt,
  
  	/* rq_respages points one past arg pages */

rqstp->rq_respages = >rq_arg.pages[page_no];
+   rqstp->rq_next_page = rqstp->rq_respages + 1;
  
  	/* Create the reply and chunk maps */

offset = 0;
@@ -520,13 +523,6 @@ next_sge:
for (ch_no = 0; >rq_pages[ch_no] < rqstp->rq_respages; ch_no++)
rqstp->rq_pages[ch_no] = NULL;
  
-	/*

-* Detach res pages. If svc_release sees any it will attempt to
-* put them.
-*/
-   while (rqstp->rq_next_page != rqstp->rq_respages)
-   *(--rqstp->rq_next_page) = NULL;
-
return err;
  }
  
@@ -550,7 +546,7 @@ static int rdma_read_complete(struct svc_rqst *rqstp,
  
  	/* rq_respages starts after the last arg page */

rqstp->rq_respages = >rq_arg.pages[page_no];
-   rqstp->rq_next_page = >rq_arg.pages[page_no];
+   rqstp->rq_next_page = rqstp->rq_respages + 1;
  
  	/* Rebuild rq_arg head and tail. */

rqstp->rq_arg.head[0] = head->arg.head[0];
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c 
b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index c1d124dc772b..11e90f8c0fc5 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -625,6 +625,7 @@ static int send_reply(struct svcxprt_rdma *rdma,
if (page_no+1 >= sge_no)
ctxt->sge[page_no+1].length = 0;
}
+   rqstp->rq_next_page = rqstp->rq_respages + 1;
BUG_ON(sge_no > rdma->sc_max_sge);
memset(_wr, 0, sizeof send_wr);
ctxt->wr_op = IB_WR_SEND;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 19/27] IB/Verbs: Use management helper cap_iw_cm()

2015-04-21 Thread Tom Tucker

On 4/21/15 2:39 AM, Michael Wang wrote:


On 04/20/2015 05:51 PM, Tom Tucker wrote:
[snip]

int ib_query_gid(struct ib_device *device,
 u8 port_num, int index, union ib_gid *gid);


iWARP devices _must_ support the IWCM so cap_iw_cm() is not really useful.

Sean suggested to add this helper paired with cap_ib_cm(), may be there are
some consideration on maintainability?

Me too also prefer this way to make the code more readable ;-)

It's more consistent, but not necessarily more readable -- if by readability we 
mean understanding.

If the reader knows how the transports work, then the reader would be confused 
by the addition of a check that is always true. For the reader that doesn't 
know, the addition of the check implies that the support is optional, which it 
is not.

The purpose is to make sure folks understand what we really want to check
when they reviewing the code :-) and prepared for the further reform which may
not rely on technology type any more, for example the device could tell core
layer directly what management it required with a bitmask :-)

Hi Michael,

Thanks for the reply, but my premise was just wrong...I need to review the 
whole patch, not just a snippet.


Thanks,
Tom

Regards,
Michael Wang


Tom


Regards,
Michael Wang

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 19/27] IB/Verbs: Use management helper cap_iw_cm()

2015-04-21 Thread Tom Tucker

On 4/21/15 2:39 AM, Michael Wang wrote:


On 04/20/2015 05:51 PM, Tom Tucker wrote:
[snip]

int ib_query_gid(struct ib_device *device,
 u8 port_num, int index, union ib_gid *gid);


iWARP devices _must_ support the IWCM so cap_iw_cm() is not really useful.

Sean suggested to add this helper paired with cap_ib_cm(), may be there are
some consideration on maintainability?

Me too also prefer this way to make the code more readable ;-)

It's more consistent, but not necessarily more readable -- if by readability we 
mean understanding.

If the reader knows how the transports work, then the reader would be confused 
by the addition of a check that is always true. For the reader that doesn't 
know, the addition of the check implies that the support is optional, which it 
is not.

The purpose is to make sure folks understand what we really want to check
when they reviewing the code :-) and prepared for the further reform which may
not rely on technology type any more, for example the device could tell core
layer directly what management it required with a bitmask :-)

Hi Michael,

Thanks for the reply, but my premise was just wrong...I need to review the 
whole patch, not just a snippet.


Thanks,
Tom

Regards,
Michael Wang


Tom


Regards,
Michael Wang

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 19/27] IB/Verbs: Use management helper cap_iw_cm()

2015-04-20 Thread Tom Tucker

On 4/20/15 11:19 AM, Jason Gunthorpe wrote:

On Mon, Apr 20, 2015 at 10:51:58AM -0500, Tom Tucker wrote:

On 4/20/15 10:16 AM, Michael Wang wrote:

On 04/20/2015 04:00 PM, Steve Wise wrote:

On 4/20/2015 3:40 AM, Michael Wang wrote:

[snip]

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6805e3e..e4999f6 100644
+++ b/include/rdma/ib_verbs.h
@@ -1818,6 +1818,21 @@ static inline int cap_ib_cm(struct ib_device *device, u8 
port_num)
   return rdma_ib_or_iboe(device, port_num);
   }
   +/**
+ * cap_iw_cm - Check if the port of device has the capability IWARP
+ * Communication Manager.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when port of the device don't support IWARP
+ * Communication Manager.
+ */
+static inline int cap_iw_cm(struct ib_device *device, u8 port_num)
+{
+return rdma_tech_iwarp(device, port_num);
+}
+
   int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

iWARP devices _must_ support the IWCM so cap_iw_cm() is not really useful.

Sean suggested to add this helper paired with cap_ib_cm(), may be there are
some consideration on maintainability?

Me too also prefer this way to make the code more readable ;-)

It's more consistent, but not necessarily more readable -- if by
readability we mean understanding.

If the reader knows how the transports work, then the reader would
be confused by the addition of a check that is always true. For the
reader that doesn't know, the addition of the check implies that the
support is optional, which it is not.

No, it says this code is concerned with the unique parts of iWarp
related to CM, not the other unique parts of iWarp. The check isn't
aways true, it is just always true on iWarp devices.

That became the problem with the old way of just saying 'is iWarp'
(and others). There are too many differences, the why became lost in
many places.

There are now too many standards, and several do not have public docs,
to keep relying on a mess of 'is standard' tests.


You're right Jason, this gets called with the device handle so it's only 
true for iwarp.



Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 19/27] IB/Verbs: Use management helper cap_iw_cm()

2015-04-20 Thread Tom Tucker

On 4/20/15 10:16 AM, Michael Wang wrote:

On 04/20/2015 04:00 PM, Steve Wise wrote:

On 4/20/2015 3:40 AM, Michael Wang wrote:

[snip]

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6805e3e..e4999f6 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1818,6 +1818,21 @@ static inline int cap_ib_cm(struct ib_device *device, u8 
port_num)
   return rdma_ib_or_iboe(device, port_num);
   }
   +/**
+ * cap_iw_cm - Check if the port of device has the capability IWARP
+ * Communication Manager.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when port of the device don't support IWARP
+ * Communication Manager.
+ */
+static inline int cap_iw_cm(struct ib_device *device, u8 port_num)
+{
+return rdma_tech_iwarp(device, port_num);
+}
+
   int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);
   

iWARP devices _must_ support the IWCM so cap_iw_cm() is not really useful.

Sean suggested to add this helper paired with cap_ib_cm(), may be there are
some consideration on maintainability?

Me too also prefer this way to make the code more readable ;-)


It's more consistent, but not necessarily more readable -- if by 
readability we mean understanding.


If the reader knows how the transports work, then the reader would be 
confused by the addition of a check that is always true. For the reader 
that doesn't know, the addition of the check implies that the support is 
optional, which it is not.


Tom


Regards,
Michael Wang




--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 19/27] IB/Verbs: Use management helper cap_iw_cm()

2015-04-20 Thread Tom Tucker

On 4/20/15 10:16 AM, Michael Wang wrote:

On 04/20/2015 04:00 PM, Steve Wise wrote:

On 4/20/2015 3:40 AM, Michael Wang wrote:

[snip]

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6805e3e..e4999f6 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1818,6 +1818,21 @@ static inline int cap_ib_cm(struct ib_device *device, u8 
port_num)
   return rdma_ib_or_iboe(device, port_num);
   }
   +/**
+ * cap_iw_cm - Check if the port of device has the capability IWARP
+ * Communication Manager.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when port of the device don't support IWARP
+ * Communication Manager.
+ */
+static inline int cap_iw_cm(struct ib_device *device, u8 port_num)
+{
+return rdma_tech_iwarp(device, port_num);
+}
+
   int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);
   

iWARP devices _must_ support the IWCM so cap_iw_cm() is not really useful.

Sean suggested to add this helper paired with cap_ib_cm(), may be there are
some consideration on maintainability?

Me too also prefer this way to make the code more readable ;-)


It's more consistent, but not necessarily more readable -- if by 
readability we mean understanding.


If the reader knows how the transports work, then the reader would be 
confused by the addition of a check that is always true. For the reader 
that doesn't know, the addition of the check implies that the support is 
optional, which it is not.


Tom


Regards,
Michael Wang




--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 19/27] IB/Verbs: Use management helper cap_iw_cm()

2015-04-20 Thread Tom Tucker

On 4/20/15 11:19 AM, Jason Gunthorpe wrote:

On Mon, Apr 20, 2015 at 10:51:58AM -0500, Tom Tucker wrote:

On 4/20/15 10:16 AM, Michael Wang wrote:

On 04/20/2015 04:00 PM, Steve Wise wrote:

On 4/20/2015 3:40 AM, Michael Wang wrote:

[snip]

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6805e3e..e4999f6 100644
+++ b/include/rdma/ib_verbs.h
@@ -1818,6 +1818,21 @@ static inline int cap_ib_cm(struct ib_device *device, u8 
port_num)
   return rdma_ib_or_iboe(device, port_num);
   }
   +/**
+ * cap_iw_cm - Check if the port of device has the capability IWARP
+ * Communication Manager.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when port of the device don't support IWARP
+ * Communication Manager.
+ */
+static inline int cap_iw_cm(struct ib_device *device, u8 port_num)
+{
+return rdma_tech_iwarp(device, port_num);
+}
+
   int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

iWARP devices _must_ support the IWCM so cap_iw_cm() is not really useful.

Sean suggested to add this helper paired with cap_ib_cm(), may be there are
some consideration on maintainability?

Me too also prefer this way to make the code more readable ;-)

It's more consistent, but not necessarily more readable -- if by
readability we mean understanding.

If the reader knows how the transports work, then the reader would
be confused by the addition of a check that is always true. For the
reader that doesn't know, the addition of the check implies that the
support is optional, which it is not.

No, it says this code is concerned with the unique parts of iWarp
related to CM, not the other unique parts of iWarp. The check isn't
aways true, it is just always true on iWarp devices.

That became the problem with the old way of just saying 'is iWarp'
(and others). There are too many differences, the why became lost in
many places.

There are now too many standards, and several do not have public docs,
to keep relying on a mess of 'is standard' tests.


You're right Jason, this gets called with the device handle so it's only 
true for iwarp.



Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 27/27] IB/Verbs: Cleanup rdma_node_get_transport()

2015-04-16 Thread Tom Tucker

On 4/16/15 8:45 AM, Michael Wang wrote:


On 04/16/2015 03:42 PM, Hal Rosenstock wrote:

On 4/16/2015 9:41 AM, Michael Wang wrote:


On 04/16/2015 03:36 PM, Hal Rosenstock wrote:
[snip]

-EXPORT_SYMBOL(rdma_node_get_transport);
-
  enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 
port_num)
  {
if (device->get_link_layer)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 262bf44..f9ef479 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -84,9 +84,6 @@ enum rdma_transport_type {
RDMA_TRANSPORT_IBOE,
  };
  
-__attribute_const__ enum rdma_transport_type

-rdma_node_get_transport(enum rdma_node_type node_type);
-
  enum rdma_link_layer {
IB_LINK_LAYER_UNSPECIFIED,

Is IB_LINK_LAYER_UNSPECIFIED still possible ?

Actually it's impossible in kernel at first, all those who implemented the 
callback
won't return UNSPECIFIED, others all have the correct transport type (otherwise 
BUG())
and won't result UNSPECIFIED :-)

Should it be removed from this enum somewhere in this patch series
(perhaps early on) ?
I don't think it's ever been 'possible.' It's purpose is to catch 
initialized errors where the transport fails to initialize it's 
transport type. So for example,


provider = calloc(1, sizeof *provider)

If 0 is a valid link layer type, then you wouldn't catch these kinds of 
errors.


Tom

It was still directly used by helper like ib_modify_qp_is_ok() as indicator, 
may be
better in another following patch to reform those part :-)

Regards,
Michael Wang


-- Hal


Regards,
Michael Wang


IB_LINK_LAYER_INFINIBAND,


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 27/27] IB/Verbs: Cleanup rdma_node_get_transport()

2015-04-16 Thread Tom Tucker

On 4/16/15 8:45 AM, Michael Wang wrote:


On 04/16/2015 03:42 PM, Hal Rosenstock wrote:

On 4/16/2015 9:41 AM, Michael Wang wrote:


On 04/16/2015 03:36 PM, Hal Rosenstock wrote:
[snip]

-EXPORT_SYMBOL(rdma_node_get_transport);
-
  enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 
port_num)
  {
if (device-get_link_layer)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 262bf44..f9ef479 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -84,9 +84,6 @@ enum rdma_transport_type {
RDMA_TRANSPORT_IBOE,
  };
  
-__attribute_const__ enum rdma_transport_type

-rdma_node_get_transport(enum rdma_node_type node_type);
-
  enum rdma_link_layer {
IB_LINK_LAYER_UNSPECIFIED,

Is IB_LINK_LAYER_UNSPECIFIED still possible ?

Actually it's impossible in kernel at first, all those who implemented the 
callback
won't return UNSPECIFIED, others all have the correct transport type (otherwise 
BUG())
and won't result UNSPECIFIED :-)

Should it be removed from this enum somewhere in this patch series
(perhaps early on) ?
I don't think it's ever been 'possible.' It's purpose is to catch 
initialized errors where the transport fails to initialize it's 
transport type. So for example,


provider = calloc(1, sizeof *provider)

If 0 is a valid link layer type, then you wouldn't catch these kinds of 
errors.


Tom

It was still directly used by helper like ib_modify_qp_is_ok() as indicator, 
may be
better in another following patch to reform those part :-)

Regards,
Michael Wang


-- Hal


Regards,
Michael Wang


IB_LINK_LAYER_INFINIBAND,


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH linux-next v3] SUNRPC: rpcrdma_register_default_external: Dynamically allocate ib_phys_buf

2013-03-11 Thread Tom Tucker


AhOk

On 3/11/13 9:53 PM, Tim Gardner wrote:

On 03/11/2013 05:02 PM, Tom Tucker wrote:

On 3/11/13 4:25 PM, J. Bruce Fields wrote:





Looks good to me; wish we could get it tested


I will test it. Tim could you please send me a final version that you'd
like tested as a single message?



I'm a little confused about what you are asking. I think v3 of the patch 
is the final version (unless you find bugs with it).



Would someone (like Tim maybe ... hint hint) look at tearing out all
those dead registration strategies? I don't think we need or will ever
use bounce-buffers, memory windows, or mlnx fmr.  The only two that are
used and tested are all-phys and FRMR (the default).



Dunno if I'll get time for that. I had a one day window where I could 
hack out some simple patches. Now I'm back to the usual grindstone.


rtg


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH linux-next v3] SUNRPC: rpcrdma_register_default_external: Dynamically allocate ib_phys_buf

2013-03-11 Thread Tom Tucker

On 3/11/13 4:25 PM, J. Bruce Fields wrote:

On Mon, Mar 11, 2013 at 03:15:08PM -0600, Tim Gardner wrote:

rpcrdma_register_default_external() is several frames into the call stack which
goes deeper yet. You run the risk of stack corruption by declaring such a large
automatic variable, so move the array of 'struct ib_phys_buf' objects into the
transport structure 'struct rpcrdma_xprt' (which is dynamically allocated) in
order to silence the frame-larger-than warning. Access to each struct
rpcrdma_xprt is serialized by XPRT_LOCKED in xprt_reserve_xprt(), so there is
no danger of multiple accessors to the array of struct ib_phys_buf objects.

net/sunrpc/xprtrdma/verbs.c: In function 'rpcrdma_register_default_external':
net/sunrpc/xprtrdma/verbs.c:1774:1: warning: the frame size of 1056 bytes is 
larger than 1024 bytes [-Wframe-larger-than=]

gcc version 4.6.3

Cc: Trond Myklebust 
Cc: "J. Bruce Fields" 
Cc: "David S. Miller" 
Cc: Tom Tucker 
Cc: Haggai Eran 
Cc: Or Gerlitz 
Cc: Shani Michaeli 
Cc: linux-...@vger.kernel.org
Cc: net...@vger.kernel.org
Signed-off-by: Tim Gardner 
---
v1 - Use kmalloc() to dynamically allocate and free the array of 'struct
ib_phys_buf' objects

v2 - Move the array of 'struct ib_phys_buf' objects into struct rpcrdma_req
and pass this request down through rpcrdma_register_external() and
rpcrdma_register_default_external(). This is less overhead then using
kmalloc() and requires no extra error checking as the allocation burden is
shifted to the transport client.

v3 - Move the array of 'struct ib_phys_buf' objects into struct rpcrdma_xprt.
Pass a pointer to this transport structure into 
rpcrdma_register_default_external().
This is less overhead then using kmalloc() and requires no extra error checking
as the allocation burden is shifted to the transport client.

Looks good to me; wish we could get it tested


I will test it. Tim could you please send me a final version that you'd 
like tested as a single message?


Would someone (like Tim maybe ... hint hint) look at tearing out all those 
dead registration strategies? I don't think we need or will ever use 
bounce-buffers, memory windows, or mlnx fmr.  The only two that are used 
and tested are all-phys and FRMR (the default).


Tom


In future if we do decide to also increase the size of that array we may
need to allocate it separately from struct rpcrdma_xprt itself, which
looks already fairly large without it; on x86_64:

$ gdb net/sunrpc/xprtrdma/xprtrdma.ko
...
(gdb) p sizeof(struct rpcrdma_xprt)
$1 = 2912

But that shouldn't be a big deal to do.

--b.


  net/sunrpc/xprtrdma/verbs.c |   10 ++
  net/sunrpc/xprtrdma/xprt_rdma.h |5 -
  2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 93726560..c7aa3da 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1730,13 +1730,14 @@ rpcrdma_deregister_memwin_external(struct 
rpcrdma_mr_seg *seg,
  }
  
  static int

-rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg,
-   int *nsegs, int writing, struct rpcrdma_ia *ia)
+rpcrdma_register_default_external(struct rpcrdma_xprt *r_xprt,
+   struct rpcrdma_mr_seg *seg, int *nsegs, int writing,
+   struct rpcrdma_ia *ia)
  {
int mem_priv = (writing ? IB_ACCESS_REMOTE_WRITE :
  IB_ACCESS_REMOTE_READ);
struct rpcrdma_mr_seg *seg1 = seg;
-   struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS];
+   struct ib_phys_buf *ipb = r_xprt->ipb;
int len, i, rc = 0;
  
  	if (*nsegs > RPCRDMA_MAX_DATA_SEGS)

@@ -1827,7 +1828,8 @@ rpcrdma_register_external(struct rpcrdma_mr_seg *seg,
  
  	/* Default registration each time */

default:
-   rc = rpcrdma_register_default_external(seg, , writing, 
ia);
+   rc = rpcrdma_register_default_external(r_xprt, seg, ,
+   writing, ia);
break;
}
if (rc)
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index cc1445d..d7b440f 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -269,7 +269,8 @@ struct rpcrdma_stats {
   * for convenience. This structure need not be visible externally.
   *
   * It is allocated and initialized during mount, and released
- * during unmount.
+ * during unmount. Access to this structure is serialized by XPRT_LOCKED
+ * in xprt_reserve_xprt().
   */
  struct rpcrdma_xprt {
struct rpc_xprt xprt;
@@ -279,6 +280,8 @@ struct rpcrdma_xprt {
struct rpcrdma_create_data_internal rx_data;
struct delayed_work rdma_connect;
struct rpcrdma_statsrx_stats;
+   /* temp work array */
+   struct ib_phys_buf  ipb[RPCRDMA_MAX_DATA_SEGS];
  };
  
  #define rpcx_to_rdmax(x) container_of(x, struct rpcrd

Re: [PATCH linux-next v3] SUNRPC: rpcrdma_register_default_external: Dynamically allocate ib_phys_buf

2013-03-11 Thread Tom Tucker

On 3/11/13 4:25 PM, J. Bruce Fields wrote:

On Mon, Mar 11, 2013 at 03:15:08PM -0600, Tim Gardner wrote:

rpcrdma_register_default_external() is several frames into the call stack which
goes deeper yet. You run the risk of stack corruption by declaring such a large
automatic variable, so move the array of 'struct ib_phys_buf' objects into the
transport structure 'struct rpcrdma_xprt' (which is dynamically allocated) in
order to silence the frame-larger-than warning. Access to each struct
rpcrdma_xprt is serialized by XPRT_LOCKED in xprt_reserve_xprt(), so there is
no danger of multiple accessors to the array of struct ib_phys_buf objects.

net/sunrpc/xprtrdma/verbs.c: In function 'rpcrdma_register_default_external':
net/sunrpc/xprtrdma/verbs.c:1774:1: warning: the frame size of 1056 bytes is 
larger than 1024 bytes [-Wframe-larger-than=]

gcc version 4.6.3

Cc: Trond Myklebust trond.mykleb...@netapp.com
Cc: J. Bruce Fields bfie...@fieldses.org
Cc: David S. Miller da...@davemloft.net
Cc: Tom Tucker t...@ogc.us
Cc: Haggai Eran hagg...@mellanox.com
Cc: Or Gerlitz ogerl...@mellanox.com
Cc: Shani Michaeli sha...@mellanox.com
Cc: linux-...@vger.kernel.org
Cc: net...@vger.kernel.org
Signed-off-by: Tim Gardner tim.gard...@canonical.com
---
v1 - Use kmalloc() to dynamically allocate and free the array of 'struct
ib_phys_buf' objects

v2 - Move the array of 'struct ib_phys_buf' objects into struct rpcrdma_req
and pass this request down through rpcrdma_register_external() and
rpcrdma_register_default_external(). This is less overhead then using
kmalloc() and requires no extra error checking as the allocation burden is
shifted to the transport client.

v3 - Move the array of 'struct ib_phys_buf' objects into struct rpcrdma_xprt.
Pass a pointer to this transport structure into 
rpcrdma_register_default_external().
This is less overhead then using kmalloc() and requires no extra error checking
as the allocation burden is shifted to the transport client.

Looks good to me; wish we could get it tested


I will test it. Tim could you please send me a final version that you'd 
like tested as a single message?


Would someone (like Tim maybe ... hint hint) look at tearing out all those 
dead registration strategies? I don't think we need or will ever use 
bounce-buffers, memory windows, or mlnx fmr.  The only two that are used 
and tested are all-phys and FRMR (the default).


Tom


In future if we do decide to also increase the size of that array we may
need to allocate it separately from struct rpcrdma_xprt itself, which
looks already fairly large without it; on x86_64:

$ gdb net/sunrpc/xprtrdma/xprtrdma.ko
...
(gdb) p sizeof(struct rpcrdma_xprt)
$1 = 2912

But that shouldn't be a big deal to do.

--b.


  net/sunrpc/xprtrdma/verbs.c |   10 ++
  net/sunrpc/xprtrdma/xprt_rdma.h |5 -
  2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 93726560..c7aa3da 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1730,13 +1730,14 @@ rpcrdma_deregister_memwin_external(struct 
rpcrdma_mr_seg *seg,
  }
  
  static int

-rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg,
-   int *nsegs, int writing, struct rpcrdma_ia *ia)
+rpcrdma_register_default_external(struct rpcrdma_xprt *r_xprt,
+   struct rpcrdma_mr_seg *seg, int *nsegs, int writing,
+   struct rpcrdma_ia *ia)
  {
int mem_priv = (writing ? IB_ACCESS_REMOTE_WRITE :
  IB_ACCESS_REMOTE_READ);
struct rpcrdma_mr_seg *seg1 = seg;
-   struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS];
+   struct ib_phys_buf *ipb = r_xprt-ipb;
int len, i, rc = 0;
  
  	if (*nsegs  RPCRDMA_MAX_DATA_SEGS)

@@ -1827,7 +1828,8 @@ rpcrdma_register_external(struct rpcrdma_mr_seg *seg,
  
  	/* Default registration each time */

default:
-   rc = rpcrdma_register_default_external(seg, nsegs, writing, 
ia);
+   rc = rpcrdma_register_default_external(r_xprt, seg, nsegs,
+   writing, ia);
break;
}
if (rc)
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index cc1445d..d7b440f 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -269,7 +269,8 @@ struct rpcrdma_stats {
   * for convenience. This structure need not be visible externally.
   *
   * It is allocated and initialized during mount, and released
- * during unmount.
+ * during unmount. Access to this structure is serialized by XPRT_LOCKED
+ * in xprt_reserve_xprt().
   */
  struct rpcrdma_xprt {
struct rpc_xprt xprt;
@@ -279,6 +280,8 @@ struct rpcrdma_xprt {
struct rpcrdma_create_data_internal rx_data;
struct delayed_work rdma_connect;
struct rpcrdma_statsrx_stats

Re: [PATCH linux-next v3] SUNRPC: rpcrdma_register_default_external: Dynamically allocate ib_phys_buf

2013-03-11 Thread Tom Tucker


AhOk

On 3/11/13 9:53 PM, Tim Gardner wrote:

On 03/11/2013 05:02 PM, Tom Tucker wrote:

On 3/11/13 4:25 PM, J. Bruce Fields wrote:


snip


Looks good to me; wish we could get it tested


I will test it. Tim could you please send me a final version that you'd
like tested as a single message?



I'm a little confused about what you are asking. I think v3 of the patch 
is the final version (unless you find bugs with it).



Would someone (like Tim maybe ... hint hint) look at tearing out all
those dead registration strategies? I don't think we need or will ever
use bounce-buffers, memory windows, or mlnx fmr.  The only two that are
used and tested are all-phys and FRMR (the default).



Dunno if I'll get time for that. I had a one day window where I could 
hack out some simple patches. Now I'm back to the usual grindstone.


rtg


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH linux-next] SUNRPC: rpcrdma_register_default_external: Dynamically allocate ib_phys_buf

2013-03-10 Thread Tom Tucker


This is the result of 2773395b34883fe54418de188733a63bb38e0ad6. Steve 
might want to weigh in on this since it was done for performance reasons.


Tom

On 3/10/13 10:39 AM, Tim Gardner wrote:

rpcrdma_register_default_external() is several frames into
the call stack which goes deeper yet. You run the risk of stack
corruption by declaring such a large automatic variable,
so dynamically allocate the array of 'struct ib_phys_buf' objects in
order to silence the frame-larger-than warning.

net/sunrpc/xprtrdma/verbs.c: In function 'rpcrdma_register_default_external':
net/sunrpc/xprtrdma/verbs.c:1774:1: warning: the frame size of 1056 bytes is 
larger than 1024 bytes [-Wframe-larger-than=]

gcc version 4.6.3

Cc: Trond Myklebust 
Cc: "J. Bruce Fields" 
Cc: "David S. Miller" 
Cc: Tom Tucker 
Cc: Haggai Eran 
Cc: Or Gerlitz 
Cc: Shani Michaeli 
Cc: linux-...@vger.kernel.org
Cc: net...@vger.kernel.org
Signed-off-by: Tim Gardner 
---
  net/sunrpc/xprtrdma/verbs.c |7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 93726560..0916467 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1736,9 +1736,13 @@ rpcrdma_register_default_external(struct rpcrdma_mr_seg 
*seg,
int mem_priv = (writing ? IB_ACCESS_REMOTE_WRITE :
  IB_ACCESS_REMOTE_READ);
struct rpcrdma_mr_seg *seg1 = seg;
-   struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS];
+   struct ib_phys_buf *ipb;
int len, i, rc = 0;
  
+	ipb = kmalloc(sizeof(*ipb) * RPCRDMA_MAX_DATA_SEGS, GFP_KERNEL);

+   if (!ipb)
+   return -ENOMEM;
+
if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
*nsegs = RPCRDMA_MAX_DATA_SEGS;
for (len = 0, i = 0; i < *nsegs;) {
@@ -1770,6 +1774,7 @@ rpcrdma_register_default_external(struct rpcrdma_mr_seg 
*seg,
seg1->mr_len = len;
}
*nsegs = i;
+   kfree(ipb);
return rc;
  }
  


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH linux-next] SUNRPC: rpcrdma_register_default_external: Dynamically allocate ib_phys_buf

2013-03-10 Thread Tom Tucker


This is the result of 2773395b34883fe54418de188733a63bb38e0ad6. Steve 
might want to weigh in on this since it was done for performance reasons.


Tom

On 3/10/13 10:39 AM, Tim Gardner wrote:

rpcrdma_register_default_external() is several frames into
the call stack which goes deeper yet. You run the risk of stack
corruption by declaring such a large automatic variable,
so dynamically allocate the array of 'struct ib_phys_buf' objects in
order to silence the frame-larger-than warning.

net/sunrpc/xprtrdma/verbs.c: In function 'rpcrdma_register_default_external':
net/sunrpc/xprtrdma/verbs.c:1774:1: warning: the frame size of 1056 bytes is 
larger than 1024 bytes [-Wframe-larger-than=]

gcc version 4.6.3

Cc: Trond Myklebust trond.mykleb...@netapp.com
Cc: J. Bruce Fields bfie...@fieldses.org
Cc: David S. Miller da...@davemloft.net
Cc: Tom Tucker t...@ogc.us
Cc: Haggai Eran hagg...@mellanox.com
Cc: Or Gerlitz ogerl...@mellanox.com
Cc: Shani Michaeli sha...@mellanox.com
Cc: linux-...@vger.kernel.org
Cc: net...@vger.kernel.org
Signed-off-by: Tim Gardner tim.gard...@canonical.com
---
  net/sunrpc/xprtrdma/verbs.c |7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 93726560..0916467 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1736,9 +1736,13 @@ rpcrdma_register_default_external(struct rpcrdma_mr_seg 
*seg,
int mem_priv = (writing ? IB_ACCESS_REMOTE_WRITE :
  IB_ACCESS_REMOTE_READ);
struct rpcrdma_mr_seg *seg1 = seg;
-   struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS];
+   struct ib_phys_buf *ipb;
int len, i, rc = 0;
  
+	ipb = kmalloc(sizeof(*ipb) * RPCRDMA_MAX_DATA_SEGS, GFP_KERNEL);

+   if (!ipb)
+   return -ENOMEM;
+
if (*nsegs  RPCRDMA_MAX_DATA_SEGS)
*nsegs = RPCRDMA_MAX_DATA_SEGS;
for (len = 0, i = 0; i  *nsegs;) {
@@ -1770,6 +1774,7 @@ rpcrdma_register_default_external(struct rpcrdma_mr_seg 
*seg,
seg1-mr_len = len;
}
*nsegs = i;
+   kfree(ipb);
return rc;
  }
  


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large)

2008-02-18 Thread Tom Tucker

On Mon, 2008-02-18 at 15:25 -0600, Tom Tucker wrote:
> On Mon, 2008-02-18 at 04:58 -0800, Andrew Morton wrote:
> > (suitable cc added)
> > 
> > (regression)
> > 
> > On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev <[EMAIL PROTECTED]> 
> > wrote:
> > 
> > > Hello!
> > > 
> > > After upgrading to 2.6.24 (from .23), we're seeing ALOT
> > > of messages like in $subj in dmesg:
> > > 
> > > Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> > > Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed.
> > > Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> > > Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed.
> > > Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> > > Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed.
> > > ...
> > > 
> > > with linux NFS server.  The clients are all linux too, mostly 2.6.23
> > > and some 2.6.22.
> > > 
> > > I found the "offending" piece of code in net/sunrpc/svcsock.c,
> > > in routine svc_tcp_recvfrom() with condition being:
> > > 
> > >if (svsk->sk_reclen > serv->sv_max_mesg) ...
> 
> The problem might be that the client is setting a bit in the RPC message
> length field that is meant to be interpreted and masked off by the
> server -- and we're not doing it yet. My bet is that 0x2 is the bit
> we're looking for. I'll poke around...

Never mind. The way this is supposed to work is that the transport is
shut down. The code used to delete the socket directly, but I
reorganized this code to just set the XPT_CLOSE bit and let the normal
close path handle it when it came back through to retry. I'm not sure
exactly what version of the code you have, but it may be that your
missing the code from the close path that does this, or it may just not
work. As a first shot, can you try this patch and tell me if the
messages go away?


diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 1d3e5fc..cf6150a 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -920,6 +920,7 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
 
  err_delete:
set_bit(XPT_CLOSE, >sk_xprt.xpt_flags);
+   svc_delete_xprt(>sk_xprt);
return -EAGAIN;
 
  error:


> 
> > > 
> > > This happens after a server reboot.  At this point, client(s) are trying
> > > to perform some NFS transaction and fail, and server starts generating
> > > the above messages - till I do a umount followed by mount on all clients.
> > > Before, such situation (nfs server reboot) were handled transparently,
> > > ie, there was nothing to do, the mount continued working just fine when
> > > the server comes back online.
> > > 
> > > Now, I'm not sure if it's really 2.6.24-specific problem or a userspace
> > > problem.  Some time ago we also upgraded nfs-kernel-server (Debian)
> > > package, and the remount-after-nfs-server-reboot problem started to
> > > occur at THAT time (and it is something to worry about as well, I just
> > > had no time to deal with it); but the dmesg spamming only appeared
> > > with 2.6.24.
> > > 
> > > How to debug the issue further on from this point?
> > > 
> > 
> > 
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to [EMAIL PROTECTED]
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large)

2008-02-18 Thread Tom Tucker

On Mon, 2008-02-18 at 04:58 -0800, Andrew Morton wrote:
> (suitable cc added)
> 
> (regression)
> 
> On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev <[EMAIL PROTECTED]> wrote:
> 
> > Hello!
> > 
> > After upgrading to 2.6.24 (from .23), we're seeing ALOT
> > of messages like in $subj in dmesg:
> > 
> > Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> > Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed.
> > Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> > Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed.
> > Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> > Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed.
> > ...
> > 
> > with linux NFS server.  The clients are all linux too, mostly 2.6.23
> > and some 2.6.22.
> > 
> > I found the "offending" piece of code in net/sunrpc/svcsock.c,
> > in routine svc_tcp_recvfrom() with condition being:
> > 
> >if (svsk->sk_reclen > serv->sv_max_mesg) ...

The problem might be that the client is setting a bit in the RPC message
length field that is meant to be interpreted and masked off by the
server -- and we're not doing it yet. My bet is that 0x2 is the bit
we're looking for. I'll poke around...

> > 
> > This happens after a server reboot.  At this point, client(s) are trying
> > to perform some NFS transaction and fail, and server starts generating
> > the above messages - till I do a umount followed by mount on all clients.
> > Before, such situation (nfs server reboot) were handled transparently,
> > ie, there was nothing to do, the mount continued working just fine when
> > the server comes back online.
> > 
> > Now, I'm not sure if it's really 2.6.24-specific problem or a userspace
> > problem.  Some time ago we also upgraded nfs-kernel-server (Debian)
> > package, and the remount-after-nfs-server-reboot problem started to
> > occur at THAT time (and it is something to worry about as well, I just
> > had no time to deal with it); but the dmesg spamming only appeared
> > with 2.6.24.
> > 
> > How to debug the issue further on from this point?
> > 
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large)

2008-02-18 Thread Tom Tucker

On Mon, 2008-02-18 at 04:58 -0800, Andrew Morton wrote:
 (suitable cc added)
 
 (regression)
 
 On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev [EMAIL PROTECTED] wrote:
 
  Hello!
  
  After upgrading to 2.6.24 (from .23), we're seeing ALOT
  of messages like in $subj in dmesg:
  
  Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
  Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed.
  Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
  Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed.
  Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
  Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed.
  ...
  
  with linux NFS server.  The clients are all linux too, mostly 2.6.23
  and some 2.6.22.
  
  I found the offending piece of code in net/sunrpc/svcsock.c,
  in routine svc_tcp_recvfrom() with condition being:
  
 if (svsk-sk_reclen  serv-sv_max_mesg) ...

The problem might be that the client is setting a bit in the RPC message
length field that is meant to be interpreted and masked off by the
server -- and we're not doing it yet. My bet is that 0x2 is the bit
we're looking for. I'll poke around...

  
  This happens after a server reboot.  At this point, client(s) are trying
  to perform some NFS transaction and fail, and server starts generating
  the above messages - till I do a umount followed by mount on all clients.
  Before, such situation (nfs server reboot) were handled transparently,
  ie, there was nothing to do, the mount continued working just fine when
  the server comes back online.
  
  Now, I'm not sure if it's really 2.6.24-specific problem or a userspace
  problem.  Some time ago we also upgraded nfs-kernel-server (Debian)
  package, and the remount-after-nfs-server-reboot problem started to
  occur at THAT time (and it is something to worry about as well, I just
  had no time to deal with it); but the dmesg spamming only appeared
  with 2.6.24.
  
  How to debug the issue further on from this point?
  
 
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-nfs in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large)

2008-02-18 Thread Tom Tucker

On Mon, 2008-02-18 at 15:25 -0600, Tom Tucker wrote:
 On Mon, 2008-02-18 at 04:58 -0800, Andrew Morton wrote:
  (suitable cc added)
  
  (regression)
  
  On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev [EMAIL PROTECTED] 
  wrote:
  
   Hello!
   
   After upgrading to 2.6.24 (from .23), we're seeing ALOT
   of messages like in $subj in dmesg:
   
   Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
   Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed.
   Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
   Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed.
   Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
   Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed.
   ...
   
   with linux NFS server.  The clients are all linux too, mostly 2.6.23
   and some 2.6.22.
   
   I found the offending piece of code in net/sunrpc/svcsock.c,
   in routine svc_tcp_recvfrom() with condition being:
   
  if (svsk-sk_reclen  serv-sv_max_mesg) ...
 
 The problem might be that the client is setting a bit in the RPC message
 length field that is meant to be interpreted and masked off by the
 server -- and we're not doing it yet. My bet is that 0x2 is the bit
 we're looking for. I'll poke around...

Never mind. The way this is supposed to work is that the transport is
shut down. The code used to delete the socket directly, but I
reorganized this code to just set the XPT_CLOSE bit and let the normal
close path handle it when it came back through to retry. I'm not sure
exactly what version of the code you have, but it may be that your
missing the code from the close path that does this, or it may just not
work. As a first shot, can you try this patch and tell me if the
messages go away?


diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 1d3e5fc..cf6150a 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -920,6 +920,7 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
 
  err_delete:
set_bit(XPT_CLOSE, svsk-sk_xprt.xpt_flags);
+   svc_delete_xprt(svsk-sk_xprt);
return -EAGAIN;
 
  error:


 
   
   This happens after a server reboot.  At this point, client(s) are trying
   to perform some NFS transaction and fail, and server starts generating
   the above messages - till I do a umount followed by mount on all clients.
   Before, such situation (nfs server reboot) were handled transparently,
   ie, there was nothing to do, the mount continued working just fine when
   the server comes back online.
   
   Now, I'm not sure if it's really 2.6.24-specific problem or a userspace
   problem.  Some time ago we also upgraded nfs-kernel-server (Debian)
   package, and the remount-after-nfs-server-reboot problem started to
   occur at THAT time (and it is something to worry about as well, I just
   had no time to deal with it); but the dmesg spamming only appeared
   with 2.6.24.
   
   How to debug the issue further on from this point?
   
  
  
  -
  To unsubscribe from this list: send the line unsubscribe linux-nfs in
  the body of a message to [EMAIL PROTECTED]
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-nfs in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ofa-general] iommu dma mapping alignment requirements

2007-12-20 Thread Tom Tucker

On Thu, 2007-12-20 at 11:14 -0600, Steve Wise wrote:
> Hey Roland (and any iommu/ppc/dma experts out there):
> 
> I'm debugging a data corruption issue that happens on PPC64 systems 
> running rdma on kernels where the iommu page size is 4KB yet the host 
> page size is 64KB.  This "feature" was added to the PPC64 code recently, 
> and is in kernel.org from 2.6.23.  So if the kernel is built with a 4KB 
> page size, no problems.  If the kernel is prior to 2.6.23 then 64KB page 
>   configs work too. Its just a problem when the iommu page size != host 
> page size.
> 
> It appears that my problem boils down to a single host page of memory 
> that is mapped for dma, and the dma address returned by dma_map_sg() is 
> _not_ 64KB aligned.  Here is an example:
> 
> app registers va 0x2d9a3000 len 12288
> ib_umem_get() creates and maps a umem and chunk that looks like (dumping 
> state from a registered user memory region):
> 
> > umem len 12288 off 12288 pgsz 65536 shift 16
> > chunk 0: nmap 1 nents 1
> > sglist[0] page 0xc0930b08 off 0 len 65536 dma_addr 
> > 5bff4000 dma_len 65536
> > 
> 
> So the kernel maps 1 full page for this MR.  But note that the dma 
> address is 5bff4000 which is 4KB aligned, not 64KB aligned.  I 
> think this is causing grief to the RDMA HW.
> 
> My first question is: Is there an assumption or requirement in linux 
> that dma_addressess should have the same alignment as the host address 
> they are mapped to?  IE the rdma core is mapping the entire 64KB page, 
> but the mapping doesn't begin on a 64KB page boundary.
> 
> If this mapping is considered valid, then perhaps the rdma hw is at 
> fault here.  But I'm wondering if this is an PPC/iommu bug.
> 
> BTW:  Here is what the Memory Region looks like to the HW:
> 
> > TPT entry:  stag idx 0x2e800 key 0xff state VAL type NSMR pdid 0x2
> > perms RW rem_inv_dis 0 addr_type VATO
> > bind_enable 1 pg_size 65536 qpid 0x0 pbl_addr 0x003c67c0
> > len 12288 va 2d9a3000 bind_cnt 0
> > PBL: 5bff4000
> 
> 
> 
> Any thoughts?

The Ammasso certainly works this way. If you tell it the page size is
64KB, it will ignore bits in the page address that encode 0-65535.

> 
> Steve.
> 
> 
> ___
> general mailing list
> [EMAIL PROTECTED]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ofa-general] iommu dma mapping alignment requirements

2007-12-20 Thread Tom Tucker

On Thu, 2007-12-20 at 11:14 -0600, Steve Wise wrote:
 Hey Roland (and any iommu/ppc/dma experts out there):
 
 I'm debugging a data corruption issue that happens on PPC64 systems 
 running rdma on kernels where the iommu page size is 4KB yet the host 
 page size is 64KB.  This feature was added to the PPC64 code recently, 
 and is in kernel.org from 2.6.23.  So if the kernel is built with a 4KB 
 page size, no problems.  If the kernel is prior to 2.6.23 then 64KB page 
   configs work too. Its just a problem when the iommu page size != host 
 page size.
 
 It appears that my problem boils down to a single host page of memory 
 that is mapped for dma, and the dma address returned by dma_map_sg() is 
 _not_ 64KB aligned.  Here is an example:
 
 app registers va 0x2d9a3000 len 12288
 ib_umem_get() creates and maps a umem and chunk that looks like (dumping 
 state from a registered user memory region):
 
  umem len 12288 off 12288 pgsz 65536 shift 16
  chunk 0: nmap 1 nents 1
  sglist[0] page 0xc0930b08 off 0 len 65536 dma_addr 
  5bff4000 dma_len 65536
  
 
 So the kernel maps 1 full page for this MR.  But note that the dma 
 address is 5bff4000 which is 4KB aligned, not 64KB aligned.  I 
 think this is causing grief to the RDMA HW.
 
 My first question is: Is there an assumption or requirement in linux 
 that dma_addressess should have the same alignment as the host address 
 they are mapped to?  IE the rdma core is mapping the entire 64KB page, 
 but the mapping doesn't begin on a 64KB page boundary.
 
 If this mapping is considered valid, then perhaps the rdma hw is at 
 fault here.  But I'm wondering if this is an PPC/iommu bug.
 
 BTW:  Here is what the Memory Region looks like to the HW:
 
  TPT entry:  stag idx 0x2e800 key 0xff state VAL type NSMR pdid 0x2
  perms RW rem_inv_dis 0 addr_type VATO
  bind_enable 1 pg_size 65536 qpid 0x0 pbl_addr 0x003c67c0
  len 12288 va 2d9a3000 bind_cnt 0
  PBL: 5bff4000
 
 
 
 Any thoughts?

The Ammasso certainly works this way. If you tell it the page size is
64KB, it will ignore bits in the page address that encode 0-65535.

 
 Steve.
 
 
 ___
 general mailing list
 [EMAIL PROTECTED]
 http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
 
 To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-27 Thread Tom Tucker


On 11/27/07 7:27 PM, "Rusty Russell" <[EMAIL PROTECTED]> wrote:

> On Tuesday 27 November 2007 16:35:42 Tom Tucker wrote:
>> On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote:
>> Explicitly documenting what comprises the kernel API (external,
>> supported) and what comprises the kernel implementation (internal, not
>> supported).
> 
> But the former is currently an empty set.
> 

Yes, I overstated this.

>> - making it obvious to developers when they are binding their
>> implementation to a particular kernel release
> 
> See, there's your problem.  All interfaces can, and will, change.  You're
> always binding yourself to a particular release.
> 

Absolutely in the limit. But there are many bits of code that work quite
nicely from release to release because they use services that live in the
smooth water in the wake of the Linux head.

I think defining that smooth water has merit. I also think that it would be
nice to limit the scope of module externs to avoid polluting the global
namespace. I'm not sure that this particular patch reaches these goals, but
it prompted me to comment.

> So you're not proposing we mark what's not stable, you're arguing that we
> create a subset which is stable.
> 

Well, this is an interesting question. The answer is I think both are
important. It would be nice (and arguably necessary long term) to limit the
scope of externs. This can be accomplished with name spaces "I want bob's
implementation of read."

I think it also has value to define interfaces that are considered stable
(but not inviolate) to allow developers to make better informed decisions
when choosing interfaces. Having this info explicit in the code seems
logical to me.

> That's an argument we're not (yet) having.
> 

Yeah, maybe I'm off in the weeds on this one...

Tom

> Cheers,
> Rusty.
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-27 Thread Tom Tucker

On Tue, 2007-11-27 at 18:15 +0100, Adrian Bunk wrote:
> On Mon, Nov 26, 2007 at 11:35:42PM -0600, Tom Tucker wrote:
> > On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote:
> >...
> > > No.  That's the wrong question.  What's the real upside?
> > 
> > Explicitly documenting what comprises the kernel API (external,
> > supported) and what comprises the kernel implementation (internal, not
> > supported).
> >...
> 
> There is not, never was, and never will be, any supported external API 
> of the kernel.

Philosophically I understand what you're saying, but in practical terms
there is the issue of managing core API like kmalloc. Although kmalloc
_could_ change, doing so would be extremely painful. In fact anyone who
proposed such a change would have to have a profoundly powerful argument
as to why it was necessary.

I think this patchset is an attempt to make it easier to identify and
review these kinds of interfaces.

>  
> cu
> Adrian
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-27 Thread Tom Tucker

On Tue, 2007-11-27 at 18:15 +0100, Adrian Bunk wrote:
 On Mon, Nov 26, 2007 at 11:35:42PM -0600, Tom Tucker wrote:
  On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote:
 ...
   No.  That's the wrong question.  What's the real upside?
  
  Explicitly documenting what comprises the kernel API (external,
  supported) and what comprises the kernel implementation (internal, not
  supported).
 ...
 
 There is not, never was, and never will be, any supported external API 
 of the kernel.

Philosophically I understand what you're saying, but in practical terms
there is the issue of managing core API like kmalloc. Although kmalloc
_could_ change, doing so would be extremely painful. In fact anyone who
proposed such a change would have to have a profoundly powerful argument
as to why it was necessary.

I think this patchset is an attempt to make it easier to identify and
review these kinds of interfaces.

  
 cu
 Adrian
 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-27 Thread Tom Tucker


On 11/27/07 7:27 PM, Rusty Russell [EMAIL PROTECTED] wrote:

 On Tuesday 27 November 2007 16:35:42 Tom Tucker wrote:
 On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote:
 Explicitly documenting what comprises the kernel API (external,
 supported) and what comprises the kernel implementation (internal, not
 supported).
 
 But the former is currently an empty set.
 

Yes, I overstated this.

 - making it obvious to developers when they are binding their
 implementation to a particular kernel release
 
 See, there's your problem.  All interfaces can, and will, change.  You're
 always binding yourself to a particular release.
 

Absolutely in the limit. But there are many bits of code that work quite
nicely from release to release because they use services that live in the
smooth water in the wake of the Linux head.

I think defining that smooth water has merit. I also think that it would be
nice to limit the scope of module externs to avoid polluting the global
namespace. I'm not sure that this particular patch reaches these goals, but
it prompted me to comment.

 So you're not proposing we mark what's not stable, you're arguing that we
 create a subset which is stable.
 

Well, this is an interesting question. The answer is I think both are
important. It would be nice (and arguably necessary long term) to limit the
scope of externs. This can be accomplished with name spaces I want bob's
implementation of read.

I think it also has value to define interfaces that are considered stable
(but not inviolate) to allow developers to make better informed decisions
when choosing interfaces. Having this info explicit in the code seems
logical to me.

 That's an argument we're not (yet) having.
 

Yeah, maybe I'm off in the weeds on this one...

Tom

 Cheers,
 Rusty.
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-26 Thread Tom Tucker

On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote:
> On Monday 26 November 2007 17:15:44 Roland Dreier wrote:
> >  > Except C doesn't have namespaces and this mechanism doesn't create them.
> >  >  So this is just complete and utter makework; as I said before, noone's
> >  > going to confuse all those udp_* functions if they're not in the udp
> >  > namespace.
> >
> > I don't understand why you're so opposed to organizing the kernel's
> > exported symbols in a more self-documenting way.
> 
> No, I was the one who moved exports near their declarations.  That's 
> organised.  I just don't see how this new "organization" will help: oh good, 
> I won't accidentally use the udp functions any more?!?
> 
> > It seems pretty   
> > clear to me that having a mechanism that requires modules to make
> > explicit which (semi-)internal APIs makes reviewing easier
> 
> Perhaps you've got lots of patches were people are using internal APIs they 
> shouldn't?
> 

Maybe the issue is "who can tell" since what is external and what is
internal is not explicitly defined?

> > , makes it 
> > easier to communicate "please don't use that API" to module authors,
> 
> Well, introduce an EXPORT_SYMBOL_INTERNAL().  It's a lot less code.  But 
> you'd 
> still need to show that people are having trouble knowing what APIs to use.

> > and takes at least a small step towards bringing the kernel's exported
> > API under control.
> 
> There is no "exported API" to bring under control.  

Hmm...apparently, there are those that are struggling...

> There are symbols we 
> expose for the kernel's own use which can be used by external modules at 
> their own risk.  
> 
> > What's the real downside? 
> 
> No.  That's the wrong question.  What's the real upside?

Explicitly documenting what comprises the kernel API (external,
supported) and what comprises the kernel implementation (internal, not
supported).

> 
> Let's not put code in the core because "it doesn't seem to hurt".
> 

agreed.

> I'm sure you think there's a real problem, but I'm still waiting for someone 
> to *show* it to me.  Then we can look at solutions.

I think the benefits should include:

- forcing developers to identify their exports as part of the
implementation or as part of the kernel API

- making it easier for reviewers to identify when developers are adding
to the kernel API and thereby focusing the appropriate level of review
to the new function

- making it obvious to developers when they are binding their
implementation to a particular kernel release



> Rusty.
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-26 Thread Tom Tucker

On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote:
 On Monday 26 November 2007 17:15:44 Roland Dreier wrote:
Except C doesn't have namespaces and this mechanism doesn't create them.
 So this is just complete and utter makework; as I said before, noone's
going to confuse all those udp_* functions if they're not in the udp
namespace.
 
  I don't understand why you're so opposed to organizing the kernel's
  exported symbols in a more self-documenting way.
 
 No, I was the one who moved exports near their declarations.  That's 
 organised.  I just don't see how this new organization will help: oh good, 
 I won't accidentally use the udp functions any more?!?
 
  It seems pretty   
  clear to me that having a mechanism that requires modules to make
  explicit which (semi-)internal APIs makes reviewing easier
 
 Perhaps you've got lots of patches were people are using internal APIs they 
 shouldn't?
 

Maybe the issue is who can tell since what is external and what is
internal is not explicitly defined?

  , makes it 
  easier to communicate please don't use that API to module authors,
 
 Well, introduce an EXPORT_SYMBOL_INTERNAL().  It's a lot less code.  But 
 you'd 
 still need to show that people are having trouble knowing what APIs to use.

  and takes at least a small step towards bringing the kernel's exported
  API under control.
 
 There is no exported API to bring under control.  

Hmm...apparently, there are those that are struggling...

 There are symbols we 
 expose for the kernel's own use which can be used by external modules at 
 their own risk.  
 
  What's the real downside? 
 
 No.  That's the wrong question.  What's the real upside?

Explicitly documenting what comprises the kernel API (external,
supported) and what comprises the kernel implementation (internal, not
supported).

 
 Let's not put code in the core because it doesn't seem to hurt.
 

agreed.

 I'm sure you think there's a real problem, but I'm still waiting for someone 
 to *show* it to me.  Then we can look at solutions.

I think the benefits should include:

- forcing developers to identify their exports as part of the
implementation or as part of the kernel API

- making it easier for reviewers to identify when developers are adding
to the kernel API and thereby focusing the appropriate level of review
to the new function

- making it obvious to developers when they are binding their
implementation to a particular kernel release



 Rusty.
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-16 Thread Tom Tucker
On Wed, 2007-08-15 at 22:26 -0400, Jeff Garzik wrote:

[...snip...]

> > I think removing the RDMA stack is the wrong thing to do, and you 
> > shouldn't just threaten to yank entire subsystems because you don't like 
> > the technology.  Lets keep this constructive, can we?  RDMA should get 
> > the respect of any other technology in Linux.  Maybe its a niche in your 
> > opinion, but come on, there's more RDMA users than say, the sparc64 
> > port.  Eh?
> 
> It's not about being a niche.  It's about creating a maintainable 
> software net stack that has predictable behavior.

Isn't RDMA _part_ of the "software net stack" within Linux? Why isn't
making RDMA stable, supportable and maintainable equally as important as
any other subsystem? 

> 
> Needing to reach out of the RDMA sandbox and reserve net stack resources 
> away from itself travels a path we've consistently avoided.
> 
> 
> >> I will NACK any patch that opens up sockets to eat up ports or
> >> anything stupid like that.
> > 
> > Got it.
> 
> Ditto for me as well.
> 
>   Jeff
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-16 Thread Tom Tucker
On Wed, 2007-08-15 at 22:26 -0400, Jeff Garzik wrote:

[...snip...]

  I think removing the RDMA stack is the wrong thing to do, and you 
  shouldn't just threaten to yank entire subsystems because you don't like 
  the technology.  Lets keep this constructive, can we?  RDMA should get 
  the respect of any other technology in Linux.  Maybe its a niche in your 
  opinion, but come on, there's more RDMA users than say, the sparc64 
  port.  Eh?
 
 It's not about being a niche.  It's about creating a maintainable 
 software net stack that has predictable behavior.

Isn't RDMA _part_ of the software net stack within Linux? Why isn't
making RDMA stable, supportable and maintainable equally as important as
any other subsystem? 

 
 Needing to reach out of the RDMA sandbox and reserve net stack resources 
 away from itself travels a path we've consistently avoided.
 
 
  I will NACK any patch that opens up sockets to eat up ports or
  anything stupid like that.
  
  Got it.
 
 Ditto for me as well.
 
   Jeff
 
 
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/