Re: RFC on writel and writel_relaxed

2018-03-28 Thread okaya

On 2018-03-28 02:14, Linus Torvalds wrote:
On Tue, Mar 27, 2018 at 5:24 PM, Sinan Kaya  
wrote:


Basically changing it to

dma_buffer->foo = 1;/* WB */
wmb()
writel_relaxed(KICK, DMA_KICK_REGISTER);/* UC */
mmiowb()


Why?

Why not  just remove the wmb(), and keep the barrier in the writel()?


Yes, we want to get there indeed. It is because of some arch not 
implementing writel properly. Maintainers want to play safe.


That is why I asked if IA64 and other well known archs follow the 
strongly ordered rule at this moment like PPC and ARM.


Or should we go and inform every arch about this before yanking wmb()?

Maintainers are afraid of introducing a regression.



The above code makes no sense, and just looks stupid to me. It also
generates pointlessly bad code on x86, so it's bad there too.

   Linus


Re: [PATCH v4 17/17] net: ena: Eliminate duplicate barriers on weakly-ordered archs

2018-03-25 Thread okaya

On 2018-03-25 08:06, Belgazal, Netanel wrote:

I think you should either add a parameter to
ena_com_write_sq_doorbell() or add ena_com_write_sq_doorbell_rel().
Right now, you have unused function.


That is true. I got rid of ena_com_write_sq_doorbell_rel.



On 3/20/18, 4:43 AM, "Sinan Kaya"  wrote:

Code includes barrier() followed by writel(). writel() already has 
a

barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before 
executing the

register write.

Create a new wrapper function with relaxed write operator. Use the 
new

wrapper when a write is following a barrier().

Since code already has an explicit barrier call, changing writel() 
to

writel_relaxed().

Signed-off-by: Sinan Kaya 
---
 drivers/net/ethernet/amazon/ena/ena_com.c |  6 --
 drivers/net/ethernet/amazon/ena/ena_eth_com.h | 22 
--

 drivers/net/ethernet/amazon/ena/ena_netdev.c  |  4 ++--
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c
b/drivers/net/ethernet/amazon/ena/ena_com.c
index bf2de52..b6e628f 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -631,7 +631,8 @@ static u32 ena_com_reg_bar_read32(struct
ena_com_dev *ena_dev, u16 offset)
 */
wmb();

-	writel(mmio_read_reg, ena_dev->reg_bar + 
ENA_REGS_MMIO_REG_READ_OFF);

+   writel_relaxed(mmio_read_reg,
+  ena_dev->reg_bar + ENA_REGS_MMIO_REG_READ_OFF);

for (i = 0; i < timeout; i++) {
if (read_resp->req_id == mmio_read->seq_num)
@@ -1826,7 +1827,8 @@ void ena_com_aenq_intr_handler(struct
ena_com_dev *dev, void *data)

 	/* write the aenq doorbell after all AENQ descriptors were read 
*/

mb();
-	writel((u32)aenq->head, dev->reg_bar + 
ENA_REGS_AENQ_HEAD_DB_OFF);

+   writel_relaxed((u32)aenq->head,
+  dev->reg_bar + ENA_REGS_AENQ_HEAD_DB_OFF);
 }

 int ena_com_dev_reset(struct ena_com_dev *ena_dev,
diff --git a/drivers/net/ethernet/amazon/ena/ena_eth_com.h
b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
index 2f76572..09ef7cd 100644
--- a/drivers/net/ethernet/amazon/ena/ena_eth_com.h
+++ b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
@@ -107,7 +107,8 @@ static inline int
ena_com_sq_empty_space(struct ena_com_io_sq *io_sq)
return io_sq->q_depth - 1 - cnt;
 }

-static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq 
*io_sq)
+static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq 
*io_sq,

+   bool relaxed)
 {
u16 tail;

@@ -116,7 +117,24 @@ static inline int
ena_com_write_sq_doorbell(struct ena_com_io_sq *io_sq)
 	pr_debug("write submission queue doorbell for queue: %d tail: 
%d\n",

 io_sq->qid, tail);

-   writel(tail, io_sq->db_addr);
+   if (relaxed)
+   writel_relaxed(tail, io_sq->db_addr);
+   else
+   writel(tail, io_sq->db_addr);
+
+   return 0;
+}
+
+static inline int ena_com_write_sq_doorbell_rel(struct
ena_com_io_sq *io_sq)
+{
+   u16 tail;
+
+   tail = io_sq->tail;
+
+	pr_debug("write submission queue doorbell for queue: %d tail: 
%d\n",

+io_sq->qid, tail);
+
+   writel_relaxed(tail, io_sq->db_addr);

return 0;
 }
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 6975150..0530201 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -556,7 +556,7 @@ static int ena_refill_rx_bufs(struct ena_ring
*rx_ring, u32 num)
 * issue a doorbell
 */
wmb();
-   ena_com_write_sq_doorbell(rx_ring->ena_com_io_sq);
+   ena_com_write_sq_doorbell(rx_ring->ena_com_io_sq, true);
}

rx_ring->next_to_use = next_to_use;
@@ -2151,7 +2151,7 @@ static netdev_tx_t ena_start_xmit(struct
sk_buff *skb, struct net_device *dev)

if (netif_xmit_stopped(txq) || !skb->xmit_more) {
/* trigger the dma engine */
-   ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq);
+   ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq, false);
u64_stats_update_begin(_ring->syncp);
tx_ring->tx_stats.doorbells++;
u64_stats_update_end(_ring->syncp);
--
2.7.4


Re: [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs

2018-03-23 Thread okaya

On 2018-03-23 19:58, Jeff Kirsher wrote:

On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:

On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya 
wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing
> the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> I did a regex search for wmb() followed by writel() in each drivers
> directory.
> I scrubbed the ones I care about in this series.
>
> I considered "ease of change", "popular usage" and "performance
> critical
> path" as the determining criteria for my filtering.
>
> We used relaxed API heavily on ARM for a long time but
> it did not exist on other architectures. For this reason, relaxed
> architectures have been paying double penalty in order to use the
> common
> drivers.
>
> Now that relaxed API is present on all architectures, we can go and
> scrub
> all drivers to see what needs to change and what can remain.
>
> We start with mostly used ones and hope to increase the coverage over
> time.
> It will take a while to cover all drivers.
>
> Feel free to apply patches individually.

I looked over the set and they seem good.

Reviewed-by: Alexander Duyck 


Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-queue
branch).  I will deal with this series in a day or two, after I have 
dealt

with my driver pull requests.


Sorry, you will have to replace the ones you took from me.



>
> Changes since v6:
> clean up between 2..6 and then make your Alex's changes on 1 and 7
> The mmiowb shouldn't be needed for Rx. Only one CPU will be running
> NAPI for the queue and we will synchronize this with a full writel
> anyway when we re-enable the interrupts.
>
> Sinan Kaya (7):
>   i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
>   ixgbe: eliminate duplicate barriers on weakly-ordered archs
>   igbvf: eliminate duplicate barriers on weakly-ordered archs
>   igb: eliminate duplicate barriers on weakly-ordered archs
>   fm10k: Eliminate duplicate barriers on weakly-ordered archs
>   ixgbevf: keep writel() closer to wmb()
>   ixgbevf: eliminate duplicate barriers on weakly-ordered archs
>
>  drivers/net/ethernet/intel/fm10k/fm10k_main.c |  4 ++--
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 14 ++
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c |  4 ++--
>  drivers/net/ethernet/intel/igb/igb_main.c |  4 ++--
>  drivers/net/ethernet/intel/igbvf/netdev.c |  4 ++--
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  8 
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  |  5 -
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 ---
>  8 files changed, 30 insertions(+), 24 deletions(-)
>
> --
> 2.7.4
>


Re: [PATCH v4 12/17] net: cxgb4/cxgb4vf: Eliminate duplicate barriers on weakly-ordered archs

2018-03-21 Thread okaya

On 2018-03-21 19:03, Casey Leedom wrote:
[[ Appologies for the DUPLICATE email.  I forgot to tell my Mail Agent 
to

   use Plain Text. -- Casey ]]

  I feel very uncomfortable with these proposed changes.  Our team is 
right

in the middle of trying to tease our way through the various platform
implementations of writel(), writel_relaxed(), __raw_writel(), etc. in 
order

to support x86, PowerPC, ARM, etc. with a single code base.  This is
complicated by the somewhat ... "fuzzily defined" semantics and varying
platform implementations of all of these APIs.  (And note that I'm just
picking writel() as an example.)

  Additionally, many of the changes aren't even in fast paths and are 
thus

unneeded for performance.

  Please don't make these changes.  We're trying to get this all sussed 
out.




I was also given the feedback to look at performance critical path only. 
I am in the process of revisiting the patches.


If you can point me to the ones that are important, I can try to limit 
the changes to those only.


If your team wants to do it, I can drop this patch as well.

I think the semantics of write API is clear. What was actually 
implemented is another story.


I can share a few of my findings.

A portable driver needs to do this.

descriptor update in mem
wmb ()
writel_relaxed ()
mmiowb ()

Using __raw_write() is wrong as it can get reordered.

Using wmb()+writel() is also wrong for performance reasons.

If something is unclear, please ask.





Re: [PATCH REPOST v4 5/7] ixgbevf: keep writel() closer to wmb()

2018-03-21 Thread okaya

On 2018-03-21 17:54, David Miller wrote:

From: Jeff Kirsher 
Date: Wed, 21 Mar 2018 14:48:08 -0700


On Wed, 2018-03-21 at 14:56 -0400, Sinan Kaya wrote:

Remove ixgbevf_write_tail() in favor of moving writel() close to
wmb().

Signed-off-by: Sinan Kaya 
Reviewed-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  | 5 -
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ++--
 2 files changed, 2 insertions(+), 7 deletions(-)


This patch fails to compile because there is a call to
ixgbevf_write_tail() which you missed cleaning up.


For a change with delicate side effects, it doesn't create much
confidence if the code does not even compile.

Sinan, please put more care into the changes you are making.


I think the issue is the tree that code is getting tested has 
undelivered code as Alex mentioned.


I was using linux-next 4.16 rc4 for testing.

I will rebase to Jeff's tree.



Thank you.


Re: [PATCH REPOST v4 5/7] ixgbevf: keep writel() closer to wmb()

2018-03-21 Thread okaya

On 2018-03-21 17:48, Jeff Kirsher wrote:

On Wed, 2018-03-21 at 14:56 -0400, Sinan Kaya wrote:

Remove ixgbevf_write_tail() in favor of moving writel() close to
wmb().

Signed-off-by: Sinan Kaya 
Reviewed-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  | 5 -
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ++--
 2 files changed, 2 insertions(+), 7 deletions(-)


This patch fails to compile because there is a call to
ixgbevf_write_tail() which you missed cleaning up.


Hah, I did a compile test but maybe I missed something. I will get v6 of 
this patch only and leave the rest of the series as it is.


Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs

2018-03-14 Thread okaya

On 2018-03-14 01:08, Timur Tabi wrote:

On 3/13/18 10:20 PM, Sinan Kaya wrote:
+/* Assumes caller has executed a write barrier to order memory and 
device

+ * requests.
+ */
  static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 
value)

  {
-   writel(value, ring->tail);
+   writel_relaxed(value, ring->tail);
  }


Why not put the wmb() in this function, or just get rid of the wmb()
in the rest of the file and keep this as writel?  That way, you can
avoid the comment and the risk that comes with it.



Sure, both solutions will work. I want to see what the maintainer 
prefers. I can repost accordingly.


Re: [PATCH 3/7] RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs

2018-03-14 Thread okaya

On 2018-03-14 00:12, Jason Gunthorpe wrote:

On Tue, Mar 13, 2018 at 11:20:24PM -0400, Sinan Kaya wrote:

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing 
the

register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya 
 drivers/infiniband/hw/qedr/verbs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


Sure matches my understanding of writel_relaxed

This is part of a series, should we take just this patch through the
rdma tree? If not:

Acked-by: Jason Gunthorpe 


Feel free to take pieces.




Thanks,
Jason


Re: [PATCH V2] net: ethernet: mellanox: correct page conversion

2016-04-18 Thread okaya

On 2016-04-18 08:12, Christoph Hellwig wrote:

On Sat, Apr 16, 2016 at 06:23:32PM -0400, Sinan Kaya wrote:
Current code is assuming that the address returned by 
dma_alloc_coherent

is a logical address. This is not true on ARM/ARM64 systems.


Can you explain what you mean with a 'logical address' and what actual
issue you're trying to solve?


Vmap call is failing on arm64 systems because dma alloc api already 
returns an address mapped with vmap.


Please see arch/arm64/mm directory.


Re: [PATCH V2] net: ethernet: mellanox: correct page conversion

2016-04-17 Thread okaya

On 2016-04-18 00:00, David Miller wrote:

From: Sinan Kaya 
Date: Sat, 16 Apr 2016 18:23:32 -0400

Current code is assuming that the address returned by 
dma_alloc_coherent
is a logical address. This is not true on ARM/ARM64 systems. This 
patch
replaces dma_alloc_coherent with dma_map_page API. The address 
returned

can later by virtually mapped from the CPU side with vmap API.

Signed-off-by: Sinan Kaya 


You can't do this.

The DMA map page API gives non-coherent mappings, and thus requires
proper flushing.

So a straight conversion like this is never legitimate.


I would agree on proper dma api usage. However, the code is already 
assuming coherent architecture by mapping the cpu pages as page_kernel.


Dma_map_page returns cached buffers and you don't need cache flushes on 
coherent architecture to make the data visible.