Re: [PATCH 3/3] net: macb: add missing barriers when reading buffers

2018-12-07 Thread Anssi Hannula
On 6.12.2018 16:14, claudiu.bez...@microchip.com wrote:
> Hi Anssi,

Hi!

> On 05.12.2018 16:00, Anssi Hannula wrote:
>> On 5.12.2018 14:37, claudiu.bez...@microchip.com wrote:
>>> On 30.11.2018 20:21, Anssi Hannula wrote:
 When reading buffer descriptors on RX or on TX completion, an
 RX_USED/TX_USED bit is checked first to ensure that the descriptor has
 been populated. However, there are no memory barriers to ensure that the
 data protected by the RX_USED/TX_USED bit is up-to-date with respect to
 that bit.

 Fix that by adding DMA read memory barriers on those paths.

 I did not observe any actual issues caused by these being missing,
 though.

 Tested on a ZynqMP based system.

 Signed-off-by: Anssi Hannula 
 Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
 Cc: Nicolas Ferre 
 ---
  drivers/net/ethernet/cadence/macb_main.c | 20 
  1 file changed, 16 insertions(+), 4 deletions(-)

 diff --git a/drivers/net/ethernet/cadence/macb_main.c 
 b/drivers/net/ethernet/cadence/macb_main.c
 index 430b7a0f5436..c93baa8621d5 100644
 --- a/drivers/net/ethernet/cadence/macb_main.c
 +++ b/drivers/net/ethernet/cadence/macb_main.c
 @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue 
 *queue)
  
/* First, update TX stats if needed */
if (skb) {
 +  /* Ensure all of desc is at least as up-to-date
 +   * as ctrl (TX_USED bit)
 +   */
 +  dma_rmb();
 +
>>> Is this necessary? Wouldn't previous rmb() take care of this? At this time
>>> data specific to this descriptor was completed. The TX descriptors for next
>>> data to be send is updated under a locked spinlock.
>> The previous rmb() is before the TX_USED check, so my understanding is
>> that the following could happen in theory:
> We are using this IP on and ARM architecture, so, with regards to rmb(), I
> understand from [1] that dsb completes when:
> "All explicit memory accesses before this instruction complete.
> All Cache, Branch predictor and TLB maintenance operations before this
> instruction complete."
>
>> 1. rmb().
> According to [1] this should end after all previous instructions (loads,
> stores) ends.
>
>> 2. Reads are reordered so that TX timestamp is read first - no barriers
>> are crossed.
> But, as per [1], no onward instruction will be reached until all
> instruction prior to dsb ends, so, after rmb() all descriptor's members
> should be updated, right?

The descriptor that triggered the TX interrupt should be visible now, yes.
However, the controller may be writing to any other descriptors at the
same time as the loop is processing through them, as there are multiple
TX buffers.

>> 3. HW writes timestamp and sets TX_USED (or they become visible).
> I expect hardware to set TX_USED and timestamp before raising TX complete
> interrupt. If so, there should be no on-flight updates of this descriptor,
> right? Hardware raised a TX_USED bit read interrupt when it reads a
> descriptor like this and hangs TX.

For the first iteration of the loop, that is correct - there should be
no in-flight writes from controller as it already raised the interrupt.
However, the following iterations of the loop process descriptors that
may or may not have the interrupt raised yet, and therefore may still
have in-flight writes.

>> 4. Code checks TX_USED.
>> 5. Code operates on timestamp that is actually garbage.
>>
>> I'm not 100% sure that there isn't some lighter/cleaner way to do this
>> than dma_rmb(), though.
> If you still think this scenario could happen why not calling a dsb in
> gem_ptp_do_timestamp(). I feel like that is a proper place to call it.

OK, I will move it there. Unless we arrive at a conclusion that it is
unnecessary altogether, of course :)

> Moreover, there is bit 32 of desc->ctrl which tells you if a valid
> timestamp was placed in the descriptor. But, again, I expect the timestamp
> and TX_USED to be set by hardware before raising TX complete interrupt.

Yes, but since my concern is that without barriers in between,
desc->ctrl might be read after ts_1/ts_2, so that bit might be seen as
set even though ts_1 is not yet an actual timestamp. And per above, all
this may occur before the TX complete interrupt is raised for the
descriptor in question.

I agree that this TX case seems somewhat unlikely to be triggered in
real life at least at present, though, as the ts_1/ts_2 read is so far
after the desc->ctrl read, and behind function calls, so unlikely to be
reordered before desc->ctrl read.
But the similar RX cases below seem more problematic as the racy loads
in question are right after each other in code.

> [1]
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHGHHIE.html
>

Re: [PATCH 3/3] net: macb: add missing barriers when reading buffers

2018-12-06 Thread Claudiu.Beznea
Hi Anssi,

On 05.12.2018 16:00, Anssi Hannula wrote:
> On 5.12.2018 14:37, claudiu.bez...@microchip.com wrote:
>>
>> On 30.11.2018 20:21, Anssi Hannula wrote:
>>> When reading buffer descriptors on RX or on TX completion, an
>>> RX_USED/TX_USED bit is checked first to ensure that the descriptor has
>>> been populated. However, there are no memory barriers to ensure that the
>>> data protected by the RX_USED/TX_USED bit is up-to-date with respect to
>>> that bit.
>>>
>>> Fix that by adding DMA read memory barriers on those paths.
>>>
>>> I did not observe any actual issues caused by these being missing,
>>> though.
>>>
>>> Tested on a ZynqMP based system.
>>>
>>> Signed-off-by: Anssi Hannula 
>>> Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
>>> Cc: Nicolas Ferre 
>>> ---
>>>  drivers/net/ethernet/cadence/macb_main.c | 20 
>>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c 
>>> b/drivers/net/ethernet/cadence/macb_main.c
>>> index 430b7a0f5436..c93baa8621d5 100644
>>> --- a/drivers/net/ethernet/cadence/macb_main.c
>>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>>> @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue *queue)
>>>  
>>> /* First, update TX stats if needed */
>>> if (skb) {
>>> +   /* Ensure all of desc is at least as up-to-date
>>> +* as ctrl (TX_USED bit)
>>> +*/
>>> +   dma_rmb();
>>> +
>> Is this necessary? Wouldn't previous rmb() take care of this? At this time
>> data specific to this descriptor was completed. The TX descriptors for next
>> data to be send is updated under a locked spinlock.
> 
> The previous rmb() is before the TX_USED check, so my understanding is
> that the following could happen in theory:

We are using this IP on and ARM architecture, so, with regards to rmb(), I
understand from [1] that dsb completes when:
"All explicit memory accesses before this instruction complete.
All Cache, Branch predictor and TLB maintenance operations before this
instruction complete."

> 
> 1. rmb().
According to [1] this should end after all previous instructions (loads,
stores) ends.

> 2. Reads are reordered so that TX timestamp is read first - no barriers
> are crossed.

But, as per [1], no onward instruction will be reached until all
instruction prior to dsb ends, so, after rmb() all descriptor's members
should be updated, right?

> 3. HW writes timestamp and sets TX_USED (or they become visible).

I expect hardware to set TX_USED and timestamp before raising TX complete
interrupt. If so, there should be no on-flight updates of this descriptor,
right? Hardware raised a TX_USED bit read interrupt when it reads a
descriptor like this and hangs TX.

> 4. Code checks TX_USED.
> 5. Code operates on timestamp that is actually garbage.
> 
> I'm not 100% sure that there isn't some lighter/cleaner way to do this
> than dma_rmb(), though.

If you still think this scenario could happen why not calling a dsb in
gem_ptp_do_timestamp(). I feel like that is a proper place to call it.

Moreover, there is bit 32 of desc->ctrl which tells you if a valid
timestamp was placed in the descriptor. But, again, I expect the timestamp
and TX_USED to be set by hardware before raising TX complete interrupt.

[1]
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHGHHIE.html

> 
>>> if (gem_ptp_do_txstamp(queue, skb, desc) == 0) {
>>> /* skb now belongs to timestamp buffer
>>>  * and will be removed later
>>> @@ -1000,12 +1005,16 @@ static int gem_rx(struct macb_queue *queue, int 
>>> budget)
>>> rmb();
>>>  
>>> rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false;
>>> -   addr = macb_get_addr(bp, desc);
>>> -   ctrl = desc->ctrl;
>>>  
>>> if (!rxused)
>>> break;
>>>  
>>> +   /* Ensure other data is at least as up-to-date as rxused */
>>> +   dma_rmb();
>> Same here, wouldn't previous rmb() should do this job?
> 
> The scenario I'm concerned about here (and in the last hunk) is:
> 
> 1. rmb().
> 2. ctrl is read (i.e. ctrl read reordered before addr read).

Same here with regards to [1]. All prior loads, stores should be finished
when dsb ends.

> 3. HW updates to ctrl and addr become visible.
> 4. RX_USED check.
> 5. code operates on garbage ctrl.

If this is happen then the data will be read on next interrupt.

dma_rmb() is a dmb. According to [1]:
"Data Memory Barrier acts as a memory barrier. It ensures that all explicit
memory accesses that appear in program order before the DMB instruction are
observed before any explicit memory accesses that appear in program order
after the DMB instruction. It does not affect the ordering of any other

Re: [PATCH 3/3] net: macb: add missing barriers when reading buffers

2018-12-05 Thread Anssi Hannula
On 5.12.2018 14:37, claudiu.bez...@microchip.com wrote:
>
> On 30.11.2018 20:21, Anssi Hannula wrote:
>> When reading buffer descriptors on RX or on TX completion, an
>> RX_USED/TX_USED bit is checked first to ensure that the descriptor has
>> been populated. However, there are no memory barriers to ensure that the
>> data protected by the RX_USED/TX_USED bit is up-to-date with respect to
>> that bit.
>>
>> Fix that by adding DMA read memory barriers on those paths.
>>
>> I did not observe any actual issues caused by these being missing,
>> though.
>>
>> Tested on a ZynqMP based system.
>>
>> Signed-off-by: Anssi Hannula 
>> Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
>> Cc: Nicolas Ferre 
>> ---
>>  drivers/net/ethernet/cadence/macb_main.c | 20 
>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/cadence/macb_main.c 
>> b/drivers/net/ethernet/cadence/macb_main.c
>> index 430b7a0f5436..c93baa8621d5 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue *queue)
>>  
>>  /* First, update TX stats if needed */
>>  if (skb) {
>> +/* Ensure all of desc is at least as up-to-date
>> + * as ctrl (TX_USED bit)
>> + */
>> +dma_rmb();
>> +
> Is this necessary? Wouldn't previous rmb() take care of this? At this time
> data specific to this descriptor was completed. The TX descriptors for next
> data to be send is updated under a locked spinlock.

The previous rmb() is before the TX_USED check, so my understanding is
that the following could happen in theory:

1. rmb().
2. Reads are reordered so that TX timestamp is read first - no barriers
are crossed.
3. HW writes timestamp and sets TX_USED (or they become visible).
4. Code checks TX_USED.
5. Code operates on timestamp that is actually garbage.

I'm not 100% sure that there isn't some lighter/cleaner way to do this
than dma_rmb(), though.

>>  if (gem_ptp_do_txstamp(queue, skb, desc) == 0) {
>>  /* skb now belongs to timestamp buffer
>>   * and will be removed later
>> @@ -1000,12 +1005,16 @@ static int gem_rx(struct macb_queue *queue, int 
>> budget)
>>  rmb();
>>  
>>  rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false;
>> -addr = macb_get_addr(bp, desc);
>> -ctrl = desc->ctrl;
>>  
>>  if (!rxused)
>>  break;
>>  
>> +/* Ensure other data is at least as up-to-date as rxused */
>> +dma_rmb();
> Same here, wouldn't previous rmb() should do this job?

The scenario I'm concerned about here (and in the last hunk) is:

1. rmb().
2. ctrl is read (i.e. ctrl read reordered before addr read).
3. HW updates to ctrl and addr become visible.
4. RX_USED check.
5. code operates on garbage ctrl.

I think it may be OK to move the earlier rmb() outside the loop so that
there is an rmb() only before and after the RX loop, as I don't at least
immediately see any hard requirement to do it on each loop pass (unlike
the added dma_rmb()). But my intent was to fix issues instead of
optimization so I didn't look too closely into that.

>> +
>> +addr = macb_get_addr(bp, desc);
>> +ctrl = desc->ctrl;
>> +
>>  queue->rx_tail++;
>>  count++;
>>  
>> @@ -1180,11 +1189,14 @@ static int macb_rx(struct macb_queue *queue, int 
>> budget)
>>  /* Make hw descriptor updates visible to CPU */
>>  rmb();
>>  
>> -ctrl = desc->ctrl;
>> -
>>  if (!(desc->addr & MACB_BIT(RX_USED)))
>>  break;
>>  
>> +/* Ensure other data is at least as up-to-date as addr */
>> +dma_rmb();
> Ditto
>
>> +
>> +ctrl = desc->ctrl;
>> +
>>  if (ctrl & MACB_BIT(RX_SOF)) {
>>  if (first_frag != -1)
>>  discard_partial_frame(queue, first_frag, tail);
>>

-- 
Anssi Hannula / Bitwise Oy
+358 503803997



Re: [PATCH 3/3] net: macb: add missing barriers when reading buffers

2018-12-05 Thread Claudiu.Beznea


On 30.11.2018 20:21, Anssi Hannula wrote:
> When reading buffer descriptors on RX or on TX completion, an
> RX_USED/TX_USED bit is checked first to ensure that the descriptor has
> been populated. However, there are no memory barriers to ensure that the
> data protected by the RX_USED/TX_USED bit is up-to-date with respect to
> that bit.
> 
> Fix that by adding DMA read memory barriers on those paths.
> 
> I did not observe any actual issues caused by these being missing,
> though.
> 
> Tested on a ZynqMP based system.
> 
> Signed-off-by: Anssi Hannula 
> Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
> Cc: Nicolas Ferre 
> ---
>  drivers/net/ethernet/cadence/macb_main.c | 20 
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c 
> b/drivers/net/ethernet/cadence/macb_main.c
> index 430b7a0f5436..c93baa8621d5 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue *queue)
>  
>   /* First, update TX stats if needed */
>   if (skb) {
> + /* Ensure all of desc is at least as up-to-date
> +  * as ctrl (TX_USED bit)
> +  */
> + dma_rmb();
> +

Is this necessary? Wouldn't previous rmb() take care of this? At this time
data specific to this descriptor was completed. The TX descriptors for next
data to be send is updated under a locked spinlock.

>   if (gem_ptp_do_txstamp(queue, skb, desc) == 0) {
>   /* skb now belongs to timestamp buffer
>* and will be removed later
> @@ -1000,12 +1005,16 @@ static int gem_rx(struct macb_queue *queue, int 
> budget)
>   rmb();
>  
>   rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false;
> - addr = macb_get_addr(bp, desc);
> - ctrl = desc->ctrl;
>  
>   if (!rxused)
>   break;
>  
> + /* Ensure other data is at least as up-to-date as rxused */
> + dma_rmb();

Same here, wouldn't previous rmb() should do this job?

> +
> + addr = macb_get_addr(bp, desc);
> + ctrl = desc->ctrl;
> +
>   queue->rx_tail++;
>   count++;
>  
> @@ -1180,11 +1189,14 @@ static int macb_rx(struct macb_queue *queue, int 
> budget)
>   /* Make hw descriptor updates visible to CPU */
>   rmb();
>  
> - ctrl = desc->ctrl;
> -
>   if (!(desc->addr & MACB_BIT(RX_USED)))
>   break;
>  
> + /* Ensure other data is at least as up-to-date as addr */
> + dma_rmb();

Ditto

> +
> + ctrl = desc->ctrl;
> +
>   if (ctrl & MACB_BIT(RX_SOF)) {
>   if (first_frag != -1)
>   discard_partial_frame(queue, first_frag, tail);
> 


[PATCH 3/3] net: macb: add missing barriers when reading buffers

2018-11-30 Thread Anssi Hannula
When reading buffer descriptors on RX or on TX completion, an
RX_USED/TX_USED bit is checked first to ensure that the descriptor has
been populated. However, there are no memory barriers to ensure that the
data protected by the RX_USED/TX_USED bit is up-to-date with respect to
that bit.

Fix that by adding DMA read memory barriers on those paths.

I did not observe any actual issues caused by these being missing,
though.

Tested on a ZynqMP based system.

Signed-off-by: Anssi Hannula 
Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
Cc: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb_main.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index 430b7a0f5436..c93baa8621d5 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue *queue)
 
/* First, update TX stats if needed */
if (skb) {
+   /* Ensure all of desc is at least as up-to-date
+* as ctrl (TX_USED bit)
+*/
+   dma_rmb();
+
if (gem_ptp_do_txstamp(queue, skb, desc) == 0) {
/* skb now belongs to timestamp buffer
 * and will be removed later
@@ -1000,12 +1005,16 @@ static int gem_rx(struct macb_queue *queue, int budget)
rmb();
 
rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false;
-   addr = macb_get_addr(bp, desc);
-   ctrl = desc->ctrl;
 
if (!rxused)
break;
 
+   /* Ensure other data is at least as up-to-date as rxused */
+   dma_rmb();
+
+   addr = macb_get_addr(bp, desc);
+   ctrl = desc->ctrl;
+
queue->rx_tail++;
count++;
 
@@ -1180,11 +1189,14 @@ static int macb_rx(struct macb_queue *queue, int budget)
/* Make hw descriptor updates visible to CPU */
rmb();
 
-   ctrl = desc->ctrl;
-
if (!(desc->addr & MACB_BIT(RX_USED)))
break;
 
+   /* Ensure other data is at least as up-to-date as addr */
+   dma_rmb();
+
+   ctrl = desc->ctrl;
+
if (ctrl & MACB_BIT(RX_SOF)) {
if (first_frag != -1)
discard_partial_frame(queue, first_frag, tail);
-- 
2.17.2