Re: [PATCH 5/6] IB core: Fix ib_sg_to_pages()

2015-12-06 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] IB core: Fix ib_sg_to_pages()

2015-12-06 Thread Sagi Grimberg

Looks fine,

Reviewed-by: Sagi Grimberg 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] IB core: Fix ib_sg_to_pages()

2015-12-03 Thread Bart Van Assche
On 12/03/2015 01:18 AM, Christoph Hellwig wrote:
> The patch looks good to me, but while we touch this area, how about
> throwing in a few cosmetic fixes as well?
 
How about the patch below ? In that version of the ib_sg_to_pages() fix
these concerns have been addressed and additionally to more bugs have been 
fixed.



[PATCH] IB core: Fix ib_sg_to_pages()

Fix the code for detecting gaps. A gap occurs not only if the
second or later scatterlist element is not aligned but also if
any scatterlist element other than the last does not end at a
page boundary.

In the code for coalescing contiguous elements, ensure that
mr->length is correct and that last_page_addr is up-to-date.

Ensure that this function returns a negative
error code instead of zero if the first set_page() call fails.

Fixes: commit 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API")
Reported-by: Christoph Hellwig 
---
 drivers/infiniband/core/verbs.c | 43 +
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 043a60e..545906d 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1516,7 +1516,7 @@ EXPORT_SYMBOL(ib_map_mr_sg);
  * @sg_nents:  number of entries in sg
  * @set_page:  driver page assignment function pointer
  *
- * Core service helper for drivers to covert the largest
+ * Core service helper for drivers to convert the largest
  * prefix of given sg list to a page vector. The sg list
  * prefix converted is the prefix that meet the requirements
  * of ib_map_mr_sg.
@@ -1533,7 +1533,7 @@ int ib_sg_to_pages(struct ib_mr *mr,
u64 last_end_dma_addr = 0, last_page_addr = 0;
unsigned int last_page_off = 0;
u64 page_mask = ~((u64)mr->page_size - 1);
-   int i;
+   int i, ret;
 
mr->iova = sg_dma_address(&sgl[0]);
mr->length = 0;
@@ -1544,27 +1544,29 @@ int ib_sg_to_pages(struct ib_mr *mr,
u64 end_dma_addr = dma_addr + dma_len;
u64 page_addr = dma_addr & page_mask;
 
-   if (i && page_addr != dma_addr) {
-   if (last_end_dma_addr != dma_addr) {
-   /* gap */
-   goto done;
-
-   } else if (last_page_off + dma_len <= mr->page_size) {
-   /* chunk this fragment with the last */
-   mr->length += dma_len;
-   last_end_dma_addr += dma_len;
-   last_page_off += dma_len;
-   continue;
-   } else {
-   /* map starting from the next page */
-   page_addr = last_page_addr + mr->page_size;
-   dma_len -= mr->page_size - last_page_off;
-   }
+   /*
+* For the second and later elements, check whether either the
+* end of element i-1 or the start of element i is not aligned
+* on a page boundary.
+*/
+   if (i && (last_page_off != 0 || page_addr != dma_addr)) {
+   /* Stop mapping if there is a gap. */
+   if (last_end_dma_addr != dma_addr)
+   break;
+
+   /*
+* Coalesce this element with the last. If it is small
+* enough just update mr->length. Otherwise start
+* mapping from the next page.
+*/
+   goto next_page;
}
 
do {
-   if (unlikely(set_page(mr, page_addr)))
-   goto done;
+   ret = set_page(mr, page_addr);
+   if (unlikely(ret < 0))
+   return i ? : ret;
+next_page:
page_addr += mr->page_size;
} while (page_addr < end_dma_addr);
 
@@ -1574,7 +1576,6 @@ int ib_sg_to_pages(struct ib_mr *mr,
last_page_off = end_dma_addr & ~page_mask;
}
 
-done:
return i;
 }
 EXPORT_SYMBOL(ib_sg_to_pages);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] IB core: Fix ib_sg_to_pages()

2015-12-03 Thread Christoph Hellwig
> How about the patch below ?

The patch looks good to me, but while we touch this area, how about
throwing in a few cosmetic fixes as well?

> - if (i && page_addr != dma_addr) {
> + if (i && (page_addr != dma_addr || last_page_off != 0)) {
>   if (last_end_dma_addr != dma_addr) {

Wo about we one or two sentences for each of the conditions here?

>   /* gap */
> - goto done;
> -
> + break;
>   } else if (last_page_off + dma_len <= mr->page_size) {
>   /* chunk this fragment with the last */
>   mr->length += dma_len;

It would be great to avoid the else clauses if we already to a
break/continue/goto to make the code flow more clear, e.g.


/*
 * Gap to the previous segment, we'll need to return
 * and use another FR to map the reminder.
 */
if (last_end_dma_addr != dma_addr)
break;

/*
 * See if this segment is contiguous to the
 * previous one and just merge it in that case.
 */
if (last_page_off + dma_len <= mr->page_size) {
last_end_dma_addr += dma_len;
last_page_off += dma_len;
mr->length += dma_len;
continue;
}

/*
 * New page-aligned segment to map:
 */
page_addr = last_page_addr + mr->page_size;
dma_len -= mr->page_size - last_page_off;


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] IB core: Fix ib_sg_to_pages()

2015-12-03 Thread Sagi Grimberg



Hello Sagi,

Hmm ... why would it be unacceptable to return 0 if sg_nents == 0 ?

Regarding which component to modify if mapping the first page fails:
for almost every kernel function I know a negative return value means
failure and a return value >= 0 means success. Hence my proposal to
change the return value of the ib_map_mr_sg() function if mapping the
first page fails.


I'm fine with that.



How about the patch below ?


Looks fine.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] IB core: Fix ib_sg_to_pages()

2015-12-02 Thread Bart Van Assche
On 12/02/2015 01:31 AM, Sagi Grimberg wrote:
> On 01/12/2015 21:10, Bart Van Assche wrote:
>> On 12/01/2015 10:32 AM, Sagi Grimberg wrote:
>> How ib_sg_to_pages() reports to its caller that mapping the first
>> scatterlist element failed is not important to me. I included that
>> change in this patch because I noticed that in the upstream SRP
>> initiator does not consider ib_map_mr_sg() returning zero as an error. I
>> think either ib_sg_to_pages() or ib_map_mr_sg() should be modified such
>> that "no pages have been mapped" is handled as a mapping failure.
>
> I don't either, but the patch introduces inconsistency in a case where
> the caller passed sg_nents=0 (which will return 0 and not -errno).
>
> Would it make better sense to have srp treat 0 return as error? note
> that all other ULPs treat return_val != sg_nents as error.

Hello Sagi,

Hmm ... why would it be unacceptable to return 0 if sg_nents == 0 ?

Regarding which component to modify if mapping the first page fails:
for almost every kernel function I know a negative return value means
failure and a return value >= 0 means success. Hence my proposal to
change the return value of the ib_map_mr_sg() function if mapping the
first page fails.

How about the patch below ?

Bart.



[PATCH] IB core: Fix ib_sg_to_pages()

Fix the code for detecting gaps. A gap occurs not only if the
second or later scatterlist element is not aligned but also if
any scatterlist element other than the last does not end at a
page boundary. Ensure that this function returns a negative
error code instead of zero if the first set_page() call fails.

Fixes: commit 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API")
Reported-by: Christoph Hellwig 
---
 drivers/infiniband/core/verbs.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 043a60e..1972c50 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1516,7 +1516,7 @@ EXPORT_SYMBOL(ib_map_mr_sg);
  * @sg_nents:  number of entries in sg
  * @set_page:  driver page assignment function pointer
  *
- * Core service helper for drivers to covert the largest
+ * Core service helper for drivers to convert the largest
  * prefix of given sg list to a page vector. The sg list
  * prefix converted is the prefix that meet the requirements
  * of ib_map_mr_sg.
@@ -1533,7 +1533,7 @@ int ib_sg_to_pages(struct ib_mr *mr,
u64 last_end_dma_addr = 0, last_page_addr = 0;
unsigned int last_page_off = 0;
u64 page_mask = ~((u64)mr->page_size - 1);
-   int i;
+   int i, ret;
 
mr->iova = sg_dma_address(&sgl[0]);
mr->length = 0;
@@ -1544,11 +1544,10 @@ int ib_sg_to_pages(struct ib_mr *mr,
u64 end_dma_addr = dma_addr + dma_len;
u64 page_addr = dma_addr & page_mask;
 
-   if (i && page_addr != dma_addr) {
+   if (i && (page_addr != dma_addr || last_page_off != 0)) {
if (last_end_dma_addr != dma_addr) {
/* gap */
-   goto done;
-
+   break;
} else if (last_page_off + dma_len <= mr->page_size) {
/* chunk this fragment with the last */
mr->length += dma_len;
@@ -1563,8 +1562,9 @@ int ib_sg_to_pages(struct ib_mr *mr,
}
 
do {
-   if (unlikely(set_page(mr, page_addr)))
-   goto done;
+   ret = set_page(mr, page_addr);
+   if (unlikely(ret < 0))
+   return i ? : ret;
page_addr += mr->page_size;
} while (page_addr < end_dma_addr);
 
@@ -1574,7 +1574,6 @@ int ib_sg_to_pages(struct ib_mr *mr,
last_page_off = end_dma_addr & ~page_mask;
}
 
-done:
return i;
 }
 EXPORT_SYMBOL(ib_sg_to_pages);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] IB core: Fix ib_sg_to_pages()

2015-12-02 Thread Sagi Grimberg



On 01/12/2015 21:10, Bart Van Assche wrote:

On 12/01/2015 10:32 AM, Sagi Grimberg wrote:

Fix the code for detecting gaps and disable the code for chunking.
A gap occurs not only if the second or later scatterlist element
is not aligned but also if any scatterlist element other than the
last does not end at a page boundary. Disable the code for chunking.


So can you explain what went wrong with the code? If ib_sg_to_pages()
supports chunking, then sg elements are allowed not to end at a page
boundary if it is physically contig to the next sg and then the next
is chunked. Care to explain how did ib_sg_to_pages mess up?


With the upstream implementation, if the previous element ends at a page
boundary (last_end_dma_addr == dma_addr) but the current element does
not start at a page boundary (page_addr != dma_addr) then the current
element is mapped. I think this is wrong because this condition
indicates a gap and hence that ib_sg_to_pages() should stop iterating in
this case.


Why is (last_end_dma_addr == dma_addr) implying that the previous
element ends at a page boundary? it tests if the current is contig
to the prev (dma_addr is not necessarily the page address).

But I think I see whats wrong. If the prev sg didn't end aligned to
page we won't get into the above test at all and continue mapping.

Does this make the problem go away?
diff --git a/drivers/infiniband/core/verbs.c 
b/drivers/infiniband/core/verbs.c

index 043a60e..23457fe 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1544,7 +1544,7 @@ int ib_sg_to_pages(struct ib_mr *mr,
u64 end_dma_addr = dma_addr + dma_len;
u64 page_addr = dma_addr & page_mask;

-   if (i && page_addr != dma_addr) {
+   if (i && (page_addr != dma_addr || last_page_off)) {
if (last_end_dma_addr != dma_addr) {
/* gap */
goto done;
--

Note that I don't mind making the chunking code go away if no one wants
it (I think Steve specifically asked for it). I'm just trying to
understand the exact mistake here.


How ib_sg_to_pages() reports to its caller that mapping the first
scatterlist element failed is not important to me. I included that
change in this patch because I noticed that in the upstream SRP
initiator does not consider ib_map_mr_sg() returning zero as an error. I
think either ib_sg_to_pages() or ib_map_mr_sg() should be modified such
that "no pages have been mapped" is handled as a mapping failure.


I don't either, but the patch introduces inconsistency in a case where
the caller passed sg_nents=0 (which will return 0 and not -errno).

Would it make better sense to have srp treat 0 return as error? note
that all other ULPs treat return_val != sg_nents as error.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] IB core: Fix ib_sg_to_pages()

2015-12-01 Thread Bart Van Assche

On 12/01/2015 10:32 AM, Sagi Grimberg wrote:

Fix the code for detecting gaps and disable the code for chunking.
A gap occurs not only if the second or later scatterlist element
is not aligned but also if any scatterlist element other than the
last does not end at a page boundary. Disable the code for chunking.


So can you explain what went wrong with the code? If ib_sg_to_pages()
supports chunking, then sg elements are allowed not to end at a page
boundary if it is physically contig to the next sg and then the next
is chunked. Care to explain how did ib_sg_to_pages mess up?


With the upstream implementation, if the previous element ends at a page 
boundary (last_end_dma_addr == dma_addr) but the current element does 
not start at a page boundary (page_addr != dma_addr) then the current 
element is mapped. I think this is wrong because this condition 
indicates a gap and hence that ib_sg_to_pages() should stop iterating in 
this case.



Ensure that this function returns a negative error code instead of
zero if the first set_page() call fails.


Umm, my recollection was to make ib_map_mr_sg return the largest prefix
mapped. I don't mind a negative error in this case, but isn't zero an
implicit error (given you didn't want to map 0 sg elements)?

If we do agree on this we need to change ib_map_mr_sg documentation
accordingly.


How ib_sg_to_pages() reports to its caller that mapping the first 
scatterlist element failed is not important to me. I included that 
change in this patch because I noticed that in the upstream SRP 
initiator does not consider ib_map_mr_sg() returning zero as an error. I 
think either ib_sg_to_pages() or ib_map_mr_sg() should be modified such 
that "no pages have been mapped" is handled as a mapping failure.


Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] IB core: Fix ib_sg_to_pages()

2015-12-01 Thread Sagi Grimberg

Hi Bart,


Fix the code for detecting gaps and disable the code for chunking.
A gap occurs not only if the second or later scatterlist element
is not aligned but also if any scatterlist element other than the
last does not end at a page boundary. Disable the code for chunking.


So can you explain what went wrong with the code? If ib_sg_to_pages()
supports chunking, then sg elements are allowed not to end at a page
boundary if it is physically contig to the next sg and then the next
is chunked. Care to explain how did ib_sg_to_pages mess up?


Ensure that this function returns a negative error code instead of
zero if the first set_page() call fails.


Umm, my recollection was to make ib_map_mr_sg return the largest prefix
mapped. I don't mind a negative error in this case, but isn't zero an
implicit error (given you didn't want to map 0 sg elements)?

If we do agree on this we need to change ib_map_mr_sg documentation
accordingly.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html