Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-19 Thread Meelis Roos

Could
https://lore.kernel.org/linux-mm/20190219123212.29838-1-lar...@axis.com/T/#u
be relevant?


Tried it, still broken.

I wrote:


But my kernel config had memory compaction (that turned on page migration) and
bounce buffers. I do not remember why I found them necessary but I will try
without them. 


First, I found out that both the problematic alphas had memory compaction and
page migration and bounce buffers turned on, and working alphas had them off.

Next, turing off these options makes the problematic alphas work.

--
Meelis Roos 


Re: [PATCH] add delay between port write and port read

2019-02-19 Thread Maciej W. Rozycki
On Tue, 19 Feb 2019, Mikulas Patocka wrote:

> >  As I say, shouldn't the barrier be in `iowriteX' instead?  I don't 
> > suppose these are allowed to be weakly ordered.
> > 
> >   Maciej
> 
> iowriteX is for memory-mapped I/O, out[bwl] is for I/O ports.

 Well, actually `iowriteX' is generic and for MMIO you have `writeX'.  
See lib/iomap.c, the comment at the top and e.g. `iowrite8' there for an 
actual proof.  Alpha may have an oddball implementation, but there you go. 
Drivers will assume they can do `iowriteX' to any kind of I/O resource, 
and ordering must be respected as per Documentation/memory-barriers.txt.

  Maciej


Re: [PATCH] add delay between port write and port read

2019-02-19 Thread Mikulas Patocka



On Tue, 19 Feb 2019, Arnd Bergmann wrote:

> On Tue, Feb 19, 2019 at 2:44 PM Mikulas Patocka  wrote:
> > On Tue, 19 Feb 2019, Mikulas Patocka wrote:
> >
> > > The patches cd0e00c106722eca40b38ebf11cf134c01901086 and
> > > 92d7223a74235054f2aa7227d207d9c57f84dca0 fix a theoretical issue where the
> > > code didn't follow the specification. Unfortunatelly, they also reduce
> > > timing when port write is followed by a port read.
> > >
> > > These reduced timing cause hang on boot on the Avanti platform when
> > > probing serial ports. This patch adds memory barrier after the outb, outw,
> > > outl functions, so that there is delay between port write and subsequent
> > > port read - just like before.
> > >
> > > Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() 
> > > and iowriteX() ordering")
> > > Cc: sta...@vger.kernel.org# v4.17+
> >
> > you can also add:
> >
> > Tested-by: Mikulas Patocka 
> 
> Acked-by: Arnd Bergmann 
> 
> but I notice you are missing Signed-off-by.

I forgot about it. You made that patch, so there should be your signed-off 
too.

Signed-off-by: Mikulas Patocka 

> We clearly need this patch, but I assumed the alpha maintainers would pick
> it up, not me. I merged the original changes since they were 
> cross-architecture,
> but I don't normally take patches for a particular architecture through the
> asm-generic tree (or the soc tree for that matter).
> 
>   Arnd
> 


Re: [PATCH] add delay between port write and port read

2019-02-19 Thread Mikulas Patocka



On Tue, 19 Feb 2019, Maciej W. Rozycki wrote:

> On Tue, 19 Feb 2019, Arnd Bergmann wrote:
> 
> > We clearly need this patch, but I assumed the alpha maintainers would pick
> > it up, not me. I merged the original changes since they were 
> > cross-architecture,
> > but I don't normally take patches for a particular architecture through the
> > asm-generic tree (or the soc tree for that matter).
> 
>  As I say, shouldn't the barrier be in `iowriteX' instead?  I don't 
> suppose these are allowed to be weakly ordered.
> 
>   Maciej

iowriteX is for memory-mapped I/O, out[bwl] is for I/O ports.

If someone finds a problem with memory-mapped device, he can add the 
barier there.

Mikulas


Re: [PATCH] add delay between port write and port read

2019-02-19 Thread Maciej W. Rozycki
On Tue, 19 Feb 2019, Arnd Bergmann wrote:

> We clearly need this patch, but I assumed the alpha maintainers would pick
> it up, not me. I merged the original changes since they were 
> cross-architecture,
> but I don't normally take patches for a particular architecture through the
> asm-generic tree (or the soc tree for that matter).

 As I say, shouldn't the barrier be in `iowriteX' instead?  I don't 
suppose these are allowed to be weakly ordered.

  Maciej


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-19 Thread Matthew Wilcox
On Tue, Feb 19, 2019 at 02:20:26PM +0100, Jan Kara wrote:
> Thanks for information. Yeah, that makes somewhat more sense. Can you ever
> see the failure if you disable CONFIG_TRANSPARENT_HUGEPAGE? Because your
> findings still seem to indicate that there' some problem with page
> migration and Alpha (added MM list to CC).

Could
https://lore.kernel.org/linux-mm/20190219123212.29838-1-lar...@axis.com/T/#u
be relevant?


Re: [PATCH] add delay between port write and port read

2019-02-19 Thread Arnd Bergmann
On Tue, Feb 19, 2019 at 2:44 PM Mikulas Patocka  wrote:
> On Tue, 19 Feb 2019, Mikulas Patocka wrote:
>
> > The patches cd0e00c106722eca40b38ebf11cf134c01901086 and
> > 92d7223a74235054f2aa7227d207d9c57f84dca0 fix a theoretical issue where the
> > code didn't follow the specification. Unfortunatelly, they also reduce
> > timing when port write is followed by a port read.
> >
> > These reduced timing cause hang on boot on the Avanti platform when
> > probing serial ports. This patch adds memory barrier after the outb, outw,
> > outl functions, so that there is delay between port write and subsequent
> > port read - just like before.
> >
> > Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and 
> > iowriteX() ordering")
> > Cc: sta...@vger.kernel.org# v4.17+
>
> you can also add:
>
> Tested-by: Mikulas Patocka 

Acked-by: Arnd Bergmann 

but I notice you are missing Signed-off-by.

We clearly need this patch, but I assumed the alpha maintainers would pick
it up, not me. I merged the original changes since they were cross-architecture,
but I don't normally take patches for a particular architecture through the
asm-generic tree (or the soc tree for that matter).

  Arnd


[PATCH] add delay between port write and port read

2019-02-19 Thread Mikulas Patocka
The patches cd0e00c106722eca40b38ebf11cf134c01901086 and
92d7223a74235054f2aa7227d207d9c57f84dca0 fix a theoretical issue where the
code didn't follow the specification. Unfortunatelly, they also reduce
timing when port write is followed by a port read.

These reduced timing cause hang on boot on the Avanti platform when
probing serial ports. This patch adds memory barrier after the outb, outw,
outl functions, so that there is delay between port write and subsequent
port read - just like before.

Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and 
iowriteX() ordering")
Cc: sta...@vger.kernel.org  # v4.17+

---
 arch/alpha/kernel/io.c |6 ++
 1 file changed, 6 insertions(+)

Index: linux-stable/arch/alpha/kernel/io.c
===
--- linux-stable.orig/arch/alpha/kernel/io.c2019-02-19 14:06:03.0 
+0100
+++ linux-stable/arch/alpha/kernel/io.c 2019-02-19 14:12:29.0 +0100
@@ -78,16 +78,19 @@ u32 inl(unsigned long port)
 void outb(u8 b, unsigned long port)
 {
iowrite8(b, ioport_map(port, 1));
+   mb();
 }
 
 void outw(u16 b, unsigned long port)
 {
iowrite16(b, ioport_map(port, 2));
+   mb();
 }
 
 void outl(u32 b, unsigned long port)
 {
iowrite32(b, ioport_map(port, 4));
+   mb();
 }
 
 EXPORT_SYMBOL(inb);
@@ -336,6 +339,7 @@ void iowrite8_rep(void __iomem *port, co
 void outsb(unsigned long port, const void *src, unsigned long count)
 {
iowrite8_rep(ioport_map(port, 1), src, count);
+   mb();
 }
 
 EXPORT_SYMBOL(iowrite8_rep);
@@ -376,6 +380,7 @@ void iowrite16_rep(void __iomem *port, c
 void outsw(unsigned long port, const void *src, unsigned long count)
 {
iowrite16_rep(ioport_map(port, 2), src, count);
+   mb();
 }
 
 EXPORT_SYMBOL(iowrite16_rep);
@@ -408,6 +413,7 @@ void iowrite32_rep(void __iomem *port, c
 void outsl(unsigned long port, const void *src, unsigned long count)
 {
iowrite32_rep(ioport_map(port, 4), src, count);
+   mb();
 }
 
 EXPORT_SYMBOL(iowrite32_rep);


Re: Alpha Avanti broken by 9ce8654323d69273b4977f76f11c9e2d345ab130

2019-02-19 Thread Mikulas Patocka



On Tue, 21 Aug 2018, Arnd Bergmann wrote:

> On Mon, Aug 20, 2018 at 11:42 PM Mikulas Patocka  wrote:
> > On Mon, 20 Aug 2018, Arnd Bergmann wrote:
> >
> > > On Mon, Aug 20, 2018 at 4:17 PM Mikulas Patocka  
> > > wrote:
> > > > On Sun, 19 Aug 2018, ok...@codeaurora.org wrote:
> > > >
> > > > > +my new email
> > > > >
> > > > > On 2018-08-18 19:03, Arnd Bergmann wrote:
> > > > > > On Sat, Aug 18, 2018 at 12:05 AM Maciej W. Rozycki 
> > > > > > 
> > >
> > > > > I think we need to identify the driver that is failing.
> > > >
> > > > It also may be some timing issue.
> > > >
> > > > I observed that not every kernel with the patch
> > > > 92d7223a74235054f2aa7227d207d9c57f84dca0 fails, some of them get stuck
> > > > only at boot, some get stuck only at shutdown, some not at all. Although
> > > > all the kernels with this patch reverted work.
> > > >
> > > > So the patch may have uncovered some timing problem somewhere.
> > > >
> > > > x86 has the function io_delay that injects delays between I/O accesses 
> > > > for
> > > > hardware that needs it - does alpha have something like this?
> > >
> > > The I/O delay would be very low on my list of possible root causes
> > > for this, hardly any hardware at all relies on it, and all uses I see
> > > are related to outb(), which you've already shown not to be the problem
> > > with my test patch.
> > >
> > The lockup happens somewhere in the function autoconfig in
> > drivers/tty/serial/8250/8250_port.c, but I don't know where exactly
> > because serial console doesn't work while the port is being probed.
> >
> > When I use console on a graphics card, the lockup doesn't happen.
> 
> Ok, this does strongly suggest that it is the outb() operation that I
> suspected after all, I just sent you a wrong patch to test, failing
> to realize that alpha has two implementations of outb, and that the
> extern one is the one that gets used in a defconfig build.
> 
> Could you try again with this patch added in? (Sorry for the whitespace
> damage, you'll have to apply it by hand). Presumably a wmb()
> is sufficient here, but I'm trying to play safe here by restoring the
> barrier that was part of outb() before it broke.
> 
>Arnd

Hi

Will you commit this patch?

The avanti platform is still broken in the kernel 5.0 and I tested that 
this patch fixes it.

Mikulas


> diff --git a/arch/alpha/kernel/io.c b/arch/alpha/kernel/io.c
> index c025a3e5e357..604237fa821f 100644
> --- a/arch/alpha/kernel/io.c
> +++ b/arch/alpha/kernel/io.c
> @@ -78,16 +78,19 @@ u32 inl(unsigned long port)
>  void outb(u8 b, unsigned long port)
>  {
> iowrite8(b, ioport_map(port, 1));
> +   mb();
>  }
> 
>  void outw(u16 b, unsigned long port)
>  {
> iowrite16(b, ioport_map(port, 2));
> +   mb();
>  }
> 
>  void outl(u32 b, unsigned long port)
>  {
> iowrite32(b, ioport_map(port, 4));
> +   mb();
>  }
> 
>  EXPORT_SYMBOL(inb);
> @@ -336,6 +339,7 @@ void iowrite8_rep(void __iomem *port, const void
> *xsrc, unsigned long count)
>  void outsb(unsigned long port, const void *src, unsigned long count)
>  {
> iowrite8_rep(ioport_map(port, 1), src, count);
> +   mb();
>  }
> 
>  EXPORT_SYMBOL(iowrite8_rep);
> @@ -376,6 +380,7 @@ void iowrite16_rep(void __iomem *port, const void
> *src, unsigned long count)
>  void outsw(unsigned long port, const void *src, unsigned long count)
>  {
> iowrite16_rep(ioport_map(port, 2), src, count);
> +   mb();
>  }
> 
>  EXPORT_SYMBOL(iowrite16_rep);
> @@ -408,6 +413,7 @@ void iowrite32_rep(void __iomem *port, const void
> *src, unsigned long count)
>  void outsl(unsigned long port, const void *src, unsigned long count)
>  {
> iowrite32_rep(ioport_map(port, 4), src, count);
> +   mb();
>  }
> 
>  EXPORT_SYMBOL(iowrite32_rep);
> 


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-19 Thread Jan Kara
On Tue 19-02-19 14:17:09, Meelis Roos wrote:
> > > > > The result of the bisection is
> > > > > [88dbcbb3a4847f5e6dfeae952d3105497700c128] blkdev: avoid migration 
> > > > > stalls for blkdev pages
> > > > > 
> > > > > Is that result relevant for the problem or should I continue 
> > > > > bisecting between 4.20.0 and the so far first bad commit?
> > > > 
> > > > Can you try reverting the commit and see if it makes the problem go 
> > > > away?
> > > 
> > > Tried reverting it on top of 5.0.0-rc6-00153-g5ded5871030e and it seems
> > > to make the kernel work - emerge --sync succeeded.
> There is more to it.
> 
> After running 5.0.0-rc6-00153-g5ded5871030e-dirty (with the revert of
> that patch) successfully for Gentoo update, I upgraded the kernel to
> 5.0.0-rc7-00011-gb5372fe5dc84-dirty (todays git + revert of this patch)
> and it broke on rsync again:
> 
> RepoStorageException: command exited with status -6: rsync -a --link-dest 
> /usr/portage --exclude=/distfiles --exclude=/local --exclude=/lost+found 
> --exclude=/packages --exclude /.tmp-unverified-download-quarantine 
> /usr/portage/ /usr/portage/.tmp-unverified-download-quarantine/
> 
> Nothing in dmesg.
> 
> This means the real root reason is somewhere deeper and reverting this
> commit just made it less likely to happen.

Thanks for information. Yeah, that makes somewhat more sense. Can you ever
see the failure if you disable CONFIG_TRANSPARENT_HUGEPAGE? Because your
findings still seem to indicate that there' some problem with page
migration and Alpha (added MM list to CC).

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-19 Thread Meelis Roos

The result of the bisection is
[88dbcbb3a4847f5e6dfeae952d3105497700c128] blkdev: avoid migration stalls for 
blkdev pages

Is that result relevant for the problem or should I continue bisecting between 
4.20.0 and the so far first bad commit?


Can you try reverting the commit and see if it makes the problem go away?


Tried reverting it on top of 5.0.0-rc6-00153-g5ded5871030e and it seems
to make the kernel work - emerge --sync succeeded.

There is more to it.

After running 5.0.0-rc6-00153-g5ded5871030e-dirty (with the revert of that 
patch)
successfully for Gentoo update, I upgraded the kernel to
5.0.0-rc7-00011-gb5372fe5dc84-dirty (todays git + revert of this patch) and it 
broke on rsync again:

RepoStorageException: command exited with status -6: rsync -a --link-dest 
/usr/portage --exclude=/distfiles --exclude=/local --exclude=/lost+found 
--exclude=/packages --exclude /.tmp-unverified-download-quarantine 
/usr/portage/ /usr/portage/.tmp-unverified-download-quarantine/

Nothing in dmesg.

This means the real root reason is somewhere deeper and reverting this commit 
just made
it less likely to happen.

--
Meelis Roos