RE: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs

2018-03-24 Thread Chopra, Manish
> -----Original Message-----
> From: Sinan Kaya [mailto:ok...@codeaurora.org]
> Sent: Friday, March 23, 2018 10:44 PM
> To: David Miller 
> Cc: net...@vger.kernel.org; ti...@codeaurora.org; sulr...@codeaurora.org;
> linux-arm-...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; Elior,
> Ariel ; Dept-Eng Everest Linux L2  engeverestlinu...@cavium.com>; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-
> ordered archs
> 
> On 3/23/2018 1:04 PM, David Miller wrote:
> > From: Sinan Kaya 
> > Date: Fri, 23 Mar 2018 12:51:47 -0400
> >
> >> It could if txdata->tx_db was not a union. There is a data dependency
> >> between txdata->tx_db.data.prod and txdata->tx_db.raw.
> >>
> >> So, no reordering.
> >
> > I don't see it that way, the code requires that:
> >
> > txdata->tx_db.data.prod += nbd;
> >
> > is visible before the doorbell update.
> >
> > barrier() doesn't provide that.
> >
> > Neither does writel_relaxed().  However plain writel() does.
> 
> Correct for some architectures including ARM but not correct universally.
> 
> writel() just guarantees register read/writes before and after to be ordered
> when HW observes it.
> 
> writel() doesn't guarantee that the memory update is visible to the HW on all
> architectures.
> 
> If you need memory update visibility, that barrier() should have been a
> wmb()
> 
> A correct multi-arch pattern is
> 
> wmb()
> writel_relaxed()
> mmiowb()
> 

Sinan, since you mention using mmiowb() after writel_relaxed() here:
I believe this is not correct for all types of IO-mapped memory, especially
when the IO memory is mapped write-combined (e.g. with ioremap_wc()).
We currently have an issue in our NIC driver (qede) on x86, for which a patch
was sent more than a week ago [still waiting to hear from David on that].
In that case mmiowb() seems to be useless, because we use a write-combined
mapped doorbell and mmiowb() appears to be just a compiler barrier() there.
So, in order to flush the write-combining buffer, we really need
writel_relaxed() followed by a wmb() to synchronize writes among CPU cores.
I think the correct pattern in such cases (for write-combined IO) would be
as below:

wmb();
writel_relaxed();
wmb(); -> to actually flush the write-combined writes
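
To make the sequence above concrete, here is a rough userspace sketch of it.
Everything in it is an assumption for illustration only: wmb() and
writel_relaxed() are stand-ins built on a compiler fence builtin and a volatile
store (not the kernel implementations), "union doorbell" is a hypothetical
layout loosely modelled on txdata->tx_db, and ring_doorbell_wc() is a made-up
helper name. It only shows the order of operations, not real write-combining
behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's wmb(); assumes GCC or Clang. */
#define wmb() __atomic_thread_fence(__ATOMIC_SEQ_CST)

/* Userspace stand-in for writel_relaxed(): a plain volatile 32-bit store. */
static inline void writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

/* Hypothetical doorbell layout: the same 32 bits viewed as fields or raw. */
union doorbell {
	struct {
		uint16_t header;
		uint16_t prod;
	} data;
	uint32_t raw;
};

/*
 * Advance the producer index by nbd and ring the (fake) doorbell register.
 * Returns the raw value that was written.
 */
static uint32_t ring_doorbell_wc(union doorbell *db, uint16_t nbd,
				 volatile uint32_t *reg)
{
	assert(db != NULL && reg != NULL);

	db->data.prod = (uint16_t)(db->data.prod + nbd);

	wmb();				/* memory updates before the doorbell */
	writel_relaxed(db->raw, reg);	/* the doorbell write itself */
	wmb();				/* second barrier, meant to flush the WC buffer */

	return db->raw;
}
```

A caller would pass its doorbell state, the number of buffer descriptors
queued, and the mapped register address, e.g.
ring_doorbell_wc(&db, nbd, doorbell_reg).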

Thanks.


RE: [PATCH v3 11/18] qlcnic: Eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Chopra, Manish
> -----Original Message-----
> From: Sinan Kaya [mailto:ok...@codeaurora.org]
> Sent: Friday, March 16, 2018 9:46 PM
> To: net...@vger.kernel.org; ti...@codeaurora.org; sulr...@codeaurora.org
> Cc: linux-arm-...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> Sinan Kaya <ok...@codeaurora.org>; Patil, Harish <harish.pa...@cavium.com>;
> Chopra, Manish <manish.cho...@cavium.com>; Dept-GE Linux NIC Dev  gelinuxnic...@cavium.com>; linux-kernel@vger.kernel.org
> Subject: [PATCH v3 11/18] qlcnic: Eliminate duplicate barriers on
> weakly-ordered archs
> 
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <ok...@codeaurora.org>
> ---
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> index 46b0372..97c146e7 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> @@ -478,7 +478,7 @@ irqreturn_t qlcnic_83xx_clear_legacy_intr(struct
> qlcnic_adapter *adapter)
>   wmb();
> 
>   /* clear the interrupt trigger control register */
> - writel(0, adapter->isr_int_vec);
> + writel_relaxed(0, adapter->isr_int_vec);
>   intr_val = readl(adapter->isr_int_vec);
>   do {
>   intr_val = readl(adapter->tgt_status_reg);
> --
> 2.7.4

Acked-by: Manish Chopra <manish.cho...@cavium.com>

Thanks.
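
For reference, the "two barriers back to back" reasoning in the commit message
can be sketched with a toy model. This is only an illustration under stated
assumptions: the model_* functions are hypothetical stand-ins that count
barriers, model_writel() assumes an arm64-like architecture where writel()
carries its own barrier, and fake_reg stands in for adapter->isr_int_vec.
None of this is the kernel's implementation.

```c
#include <assert.h>

static int barrier_count;		/* barriers issued so far */
static unsigned int fake_reg = 0xffu;	/* stand-in for the ISR register */

/* Model of wmb(): just count it. */
static void model_wmb(void)
{
	barrier_count++;
}

/* Model of writel_relaxed(): the store only, no barrier. */
static void model_writel_relaxed(unsigned int val, unsigned int *addr)
{
	assert(addr != 0);
	*addr = val;
}

/* Model of writel() on an arch where it implies a barrier before the store. */
static void model_writel(unsigned int val, unsigned int *addr)
{
	model_wmb();
	model_writel_relaxed(val, addr);
}

/* Sequence before the patch: explicit wmb() followed by writel(). */
static int barriers_before_patch(void)
{
	barrier_count = 0;
	model_wmb();
	model_writel(0, &fake_reg);
	return barrier_count;
}

/* Sequence after the patch: explicit wmb() followed by writel_relaxed(). */
static int barriers_after_patch(void)
{
	barrier_count = 0;
	model_wmb();
	model_writel_relaxed(0, &fake_reg);
	return barrier_count;
}
```

In this model the pre-patch sequence issues two barriers before the register
write and the post-patch sequence issues one, while the register ends up
cleared either way.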

