RE: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
> -----Original Message-----
> From: Sinan Kaya [mailto:ok...@codeaurora.org]
> Sent: Friday, March 23, 2018 10:44 PM
> To: David Miller
> Cc: net...@vger.kernel.org; ti...@codeaurora.org; sulr...@codeaurora.org;
> linux-arm-...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; Elior,
> Ariel; Dept-Eng Everest Linux L2 <engeverestlinu...@cavium.com>;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on
> weakly-ordered archs
>
> On 3/23/2018 1:04 PM, David Miller wrote:
> > From: Sinan Kaya
> > Date: Fri, 23 Mar 2018 12:51:47 -0400
> >
> >> It could if txdata->tx_db was not a union. There is a data dependency
> >> between txdata->tx_db.data.prod and txdata->tx_db.raw.
> >>
> >> So, no reordering.
> >
> > I don't see it that way, the code requires that:
> >
> >	txdata->tx_db.data.prod += nbd;
> >
> > is visible before the doorbell update.
> >
> > barrier() doesn't provide that.
> >
> > Neither does writel_relaxed(). However plain writel() does.
>
> Correct for some architectures including ARM but not correct universally.
>
> writel() just guarantees register read/writes before and after to be ordered
> when HW observes it.
>
> writel() doesn't guarantee that the memory update is visible to the HW on all
> architectures.
>
> If you need memory update visibility, that barrier() should have been a
> wmb()
>
> A correct multi-arch pattern is
>
> wmb()
> writel_relaxed()
> mmiowb()
>

Sinan,

Since you mentioned the use of mmiowb() after writel_relaxed() here: I
believe this is not correct for all types of IO-mapped memory, especially
if the IO memory is mapped write-combined (e.g. with ioremap_wc()).

We currently have an issue in our NIC (qede) driver on x86, for which a
patch was sent more than a week ago (still waiting to hear from David on
that). mmiowb() seems to be useless in that case, since we use a
write-combined mapped doorbell and mmiowb() seems to be just a compiler
barrier() there.

So in order to flush the write-combined buffer we really need
writel_relaxed() followed by a wmb() to synchronize writes among CPU cores.
I think the correct pattern in such cases (for write-combined IO) would be:

	wmb();
	writel_relaxed();
	wmb();	-> to actually flush the writes

Thanks.
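
For readers following the thread, the two orderings under discussion can be sketched in userspace. This is only a mock, not driver code: mock_wmb(), mock_writel_relaxed(), mock_mmiowb(), struct mock_txdata and both ring_doorbell_*() helpers are hypothetical stand-ins introduced for illustration. They record the order in which they are called and nothing more; they do not reproduce what the kernel's wmb(), writel_relaxed() or mmiowb() actually do on any architecture or on a write-combined mapping.

```c
/* Userspace mock of the two doorbell orderings discussed above.
 * ASSUMPTION: every mock_* name is a stand-in, not a kernel API. */
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum event { EV_MEM_WRITE, EV_WMB, EV_WRITEL_RELAXED, EV_MMIOWB };

#define MAX_EVENTS 16
static enum event trace[MAX_EVENTS];
static int n_events;

static void record(enum event e)
{
	assert(n_events < MAX_EVENTS);
	trace[n_events++] = e;
}

static void reset_trace(void)
{
	n_events = 0;
}

/* Stand-in for wmb(): a full C11 fence, then log the call. */
static void mock_wmb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
	record(EV_WMB);
}

/* Stand-in for mmiowb(): only logs the call. */
static void mock_mmiowb(void)
{
	record(EV_MMIOWB);
}

/* Stand-in for writel_relaxed(): a plain volatile store, then log. */
static void mock_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
	record(EV_WRITEL_RELAXED);
}

/* Hypothetical miniature of a TX ring: producer index plus doorbell. */
struct mock_txdata {
	uint16_t prod;
	volatile uint32_t doorbell;
};

/* Ordering proposed in the quoted mail: wmb, relaxed write, mmiowb. */
static void ring_doorbell_mmiowb(struct mock_txdata *tx, uint16_t nbd)
{
	tx->prod += nbd;
	record(EV_MEM_WRITE);
	mock_wmb();
	mock_writel_relaxed(tx->prod, &tx->doorbell);
	mock_mmiowb();
}

/* Ordering proposed in the reply for write-combined doorbells:
 * wmb, relaxed write, then a second wmb. */
static void ring_doorbell_wc(struct mock_txdata *tx, uint16_t nbd)
{
	tx->prod += nbd;
	record(EV_MEM_WRITE);
	mock_wmb();
	mock_writel_relaxed(tx->prod, &tx->doorbell);
	mock_wmb();
}
```

Calling either helper on a zero-initialised struct mock_txdata and then inspecting trace[] shows that the two sequences differ only in the final step, which is exactly the point of disagreement in the thread.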
RE: [PATCH v3 11/18] qlcnic: Eliminate duplicate barriers on weakly-ordered archs
> -----Original Message-----
> From: Sinan Kaya [mailto:ok...@codeaurora.org]
> Sent: Friday, March 16, 2018 9:46 PM
> To: net...@vger.kernel.org; ti...@codeaurora.org; sulr...@codeaurora.org
> Cc: linux-arm-...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> Sinan Kaya <ok...@codeaurora.org>; Patil, Harish <harish.pa...@cavium.com>;
> Chopra, Manish <manish.cho...@cavium.com>; Dept-GE Linux NIC Dev
> <gelinuxnic...@cavium.com>; linux-kernel@vger.kernel.org
> Subject: [PATCH v3 11/18] qlcnic: Eliminate duplicate barriers on
> weakly-ordered archs
>
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <ok...@codeaurora.org>
> ---
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> index 46b0372..97c146e7 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> @@ -478,7 +478,7 @@ irqreturn_t qlcnic_83xx_clear_legacy_intr(struct qlcnic_adapter *adapter)
>  	wmb();
>
>  	/* clear the interrupt trigger control register */
> -	writel(0, adapter->isr_int_vec);
> +	writel_relaxed(0, adapter->isr_int_vec);
>  	intr_val = readl(adapter->isr_int_vec);
>  	do {
>  		intr_val = readl(adapter->tgt_status_reg);
> --
> 2.7.4

Acked-by: Manish Chopra <manish.cho...@cavium.com>

Thanks.