Re: tty crash in tty_ldisc_receive_buf()

2017-04-07 Thread Benjamin Herrenschmidt
On Fri, 2017-04-07 at 09:03 -0500, Rob Herring wrote:
> Is this with a serial port? There were some changes to serial_core
> close in 4.9.

It's a combo of serial and hvc actually.

Cheers,
Ben.



Re: tty crash in tty_ldisc_receive_buf()

2017-04-07 Thread Benjamin Herrenschmidt
On Fri, 2017-04-07 at 09:03 -0500, Rob Herring wrote:
> Is this with a serial port? There were some changes to serial_core
> close in 4.9.

It's a combo of serial and hvc actually.

Cheers,
Ben.



Re: tty crash in tty_ldisc_receive_buf()

2017-04-07 Thread Rob Herring
On Thu, Apr 6, 2017 at 8:03 PM, Benjamin Herrenschmidt
 wrote:
> On Thu, 2017-04-06 at 08:28 -0500, Rob Herring wrote:
>>
>> Can you try this one [1].
>>
>> Rob
>>
>> [1] https://lkml.org/lkml/2017/3/23/569
>
> Unless I missed something, that patch doesn't change barriers or
> locking, so I don't see how it could fix anything ...

Only because that's a code path that I changed recently and the
locking and ordering appears to be tricky.

> Mind you we haven't quite figured out exactly what's happening here.

Is this with a serial port? There were some changes to serial_core close in 4.9.

Rob


Re: tty crash in tty_ldisc_receive_buf()

2017-04-07 Thread Rob Herring
On Thu, Apr 6, 2017 at 8:03 PM, Benjamin Herrenschmidt
 wrote:
> On Thu, 2017-04-06 at 08:28 -0500, Rob Herring wrote:
>>
>> Can you try this one [1].
>>
>> Rob
>>
>> [1] https://lkml.org/lkml/2017/3/23/569
>
> Unless I missed something, that patch doesn't change barriers or
> locking, so I don't see how it could fix anything ...

Only because that's a code path that I changed recently and the
locking and ordering appears to be tricky.

> Mind you we haven't quite figured out exactly what's happening here.

Is this with a serial port? There were some changes to serial_core close in 4.9.

Rob


Re: tty crash in tty_ldisc_receive_buf()

2017-04-06 Thread Michael Neuling

> > +   /* This probably shouldn't happen, but return 0 data processed */
> > +   if (!ldata)
> > +   return 0;
> > +
> >     while (1) {
> >     /*
> >      * When PARMRK is set, each input char may take up to 3
> > chars
> 
> Maybe your patch should looks like:
> + /* This probably shouldn't happen, but return 0 data processed */
> + if (!ldata) {
> +   up_read(>termios_rwsem);
> + return 0;
> +   }

Oops, nice catch.. Thanks!

That does indeed fix the problem now without the softlockup.  I'm not sure it's
the right fix, but full patch below.

Anyone see a problem with this approach? Am I just papering over a real issue?

> Maybe below patch should work:
> @@ -1668,11 +1668,12 @@ static int
>  n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
>   char *fp, int count, int flow)
>  {
> -   struct n_tty_data *ldata = tty->disc_data;
> +   struct n_tty_data *ldata;
>      int room, n, rcvd = 0, overflow;
> 
> down_read(>termios_rwsem);
> 
> +   ldata = tty->disc_data;

I did try just that alone and it didn't help.

Mikey



>From 75c2a0369450692946ca8cc7ac148a98deaecd2a Mon Sep 17 00:00:00 2001
From: Michael Neuling 
Date: Fri, 7 Apr 2017 11:31:02 +1000
Subject: [PATCH] tty: fix regression in flush_to_ldisc

When reiniting a tty we can end up with:

[  417.514499] Unable to handle kernel paging request for data at address 
0x2260
[  417.515361] Faulting instruction address: 0xc06fad80
cpu 0x15: Vector: 300 (Data Access) at [c0799411f890]
pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0
lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0
sp: c0799411fb10
   msr: 9280b033
   dar: 2260
 dsisr: 4000
  current = 0xc079675d1e00
  paca= 0xcfb0d200   softe: 0irq_happened: 0x01
pid   = 5, comm = kworker/u56:0
Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 
20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT 
2017
enter ? for help
[c0799411fbe0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0
[c0799411fc10] c07009d8 tty_port_default_receive_buf+0x68/0xe0
[c0799411fc50] c06ffce4 flush_to_ldisc+0x114/0x130
[c0799411fca0] c010a0fc process_one_work+0x1ec/0x580
[c0799411fd30] c010a528 worker_thread+0x98/0x5d0
[c0799411fdc0] c011343c kthread+0x16c/0x1b0
[c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74

This is due to a NULL ptr dref of tty->disc_data.

This fixes the issue by moving the disc_data read to after we take the
semaphore, then returning 0 data processed when NULL.

Cc: [4.10+]
Signed-off-by: Michael Neuling 
---
 drivers/tty/n_tty.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index bdf0e6e899..a2a9832a42 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1668,11 +1668,17 @@ static int
 n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 char *fp, int count, int flow)
 {
-   struct n_tty_data *ldata = tty->disc_data;
+   struct n_tty_data *ldata;
int room, n, rcvd = 0, overflow;
 
down_read(>termios_rwsem);
 
+   ldata = tty->disc_data;
+   if (!ldata) {
+   up_read(>termios_rwsem);
+   return 0;
+   }
+
while (1) {
/*
 * When PARMRK is set, each input char may take up to 3 chars
-- 
2.9.3




Re: tty crash in tty_ldisc_receive_buf()

2017-04-06 Thread Michael Neuling

> > +   /* This probably shouldn't happen, but return 0 data processed */
> > +   if (!ldata)
> > +   return 0;
> > +
> >     while (1) {
> >     /*
> >      * When PARMRK is set, each input char may take up to 3
> > chars
> 
> Maybe your patch should looks like:
> + /* This probably shouldn't happen, but return 0 data processed */
> + if (!ldata) {
> +   up_read(>termios_rwsem);
> + return 0;
> +   }

Oops, nice catch.. Thanks!

That does indeed fix the problem now without the softlockup.  I'm not sure it's
the right fix, but full patch below.

Anyone see a problem with this approach? Am I just papering over a real issue?

> Maybe below patch should work:
> @@ -1668,11 +1668,12 @@ static int
>  n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
>   char *fp, int count, int flow)
>  {
> -   struct n_tty_data *ldata = tty->disc_data;
> +   struct n_tty_data *ldata;
>      int room, n, rcvd = 0, overflow;
> 
> down_read(>termios_rwsem);
> 
> +   ldata = tty->disc_data;

I did try just that alone and it didn't help.

Mikey



>From 75c2a0369450692946ca8cc7ac148a98deaecd2a Mon Sep 17 00:00:00 2001
From: Michael Neuling 
Date: Fri, 7 Apr 2017 11:31:02 +1000
Subject: [PATCH] tty: fix regression in flush_to_ldisc

When reiniting a tty we can end up with:

[  417.514499] Unable to handle kernel paging request for data at address 
0x2260
[  417.515361] Faulting instruction address: 0xc06fad80
cpu 0x15: Vector: 300 (Data Access) at [c0799411f890]
pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0
lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0
sp: c0799411fb10
   msr: 9280b033
   dar: 2260
 dsisr: 4000
  current = 0xc079675d1e00
  paca= 0xcfb0d200   softe: 0irq_happened: 0x01
pid   = 5, comm = kworker/u56:0
Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 
20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT 
2017
enter ? for help
[c0799411fbe0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0
[c0799411fc10] c07009d8 tty_port_default_receive_buf+0x68/0xe0
[c0799411fc50] c06ffce4 flush_to_ldisc+0x114/0x130
[c0799411fca0] c010a0fc process_one_work+0x1ec/0x580
[c0799411fd30] c010a528 worker_thread+0x98/0x5d0
[c0799411fdc0] c011343c kthread+0x16c/0x1b0
[c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74

This is due to a NULL ptr dref of tty->disc_data.

This fixes the issue by moving the disc_data read to after we take the
semaphore, then returning 0 data processed when NULL.

Cc: [4.10+]
Signed-off-by: Michael Neuling 
---
 drivers/tty/n_tty.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index bdf0e6e899..a2a9832a42 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1668,11 +1668,17 @@ static int
 n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 char *fp, int count, int flow)
 {
-   struct n_tty_data *ldata = tty->disc_data;
+   struct n_tty_data *ldata;
int room, n, rcvd = 0, overflow;
 
down_read(>termios_rwsem);
 
+   ldata = tty->disc_data;
+   if (!ldata) {
+   up_read(>termios_rwsem);
+   return 0;
+   }
+
while (1) {
/*
 * When PARMRK is set, each input char may take up to 3 chars
-- 
2.9.3




Re: tty crash in tty_ldisc_receive_buf()

2017-04-06 Thread Wang YanQing
On Thu, Apr 06, 2017 at 05:04:41PM +1000, Michael Neuling wrote:
> Hi all,
> 
> We are seeing the following crash (in linux-next but has been around since at
> least v4.10). 
> 
> [  417.514499] Unable to handle kernel paging request for data at address 
> 0x2260
> [  417.515361] Faulting instruction address: 0xc06fad80
> cpu 0x15: Vector: 300 (Data Access) at [c0799411f890]
> pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0
> lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0
> sp: c0799411fb10
>    msr: 9280b033
>    dar: 2260
>  dsisr: 4000
>   current = 0xc079675d1e00
>   paca= 0xcfb0d200 softe: 0    irq_happened: 0x01
> pid   = 5, comm = kworker/u56:0
> Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 
> 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT 
> 2017
> enter ? for help
> [c0799411fbe0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0
> [c0799411fc10] c07009d8 tty_port_default_receive_buf+0x68/0xe0
> [c0799411fc50] c06ffce4 flush_to_ldisc+0x114/0x130
> [c0799411fca0] c010a0fc process_one_work+0x1ec/0x580
> [c0799411fd30] c010a528 worker_thread+0x98/0x5d0
> [c0799411fdc0] c011343c kthread+0x16c/0x1b0
> [c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74
> 
> It seems the null ptr deref is in n_tty_receive_buf_common() where we do:
> 
>   size_t tail = smp_load_acquire(>read_tail);
> 
> ldata is NULL.
> 
> We see this usually on boot but can also see it if we kill a getty attached to
> tty (which is then respawned by systemd).  It seems like we are flushing data 
> to
> a tty at the same time as it's being torn down and restarted.
> 
> I did try the below patch which avoids the crash but locks up one of the 
> CPUs. I
> guess the data never gets flushed if we say nothing is processed.
> 
> This is on powerpc but has also been reported by parisc.
> 
> I'm not at all familiar with the tty layer and looking at the locks, mutexes,
> semaphores and reference counting in there scares the hell out of me. 
> 
> If anyone has an idea, I'm happy to try a patch.
> 
> Regards,
> Mikey
> 
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index bdf0e6e899..99dd757aa4 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -1673,6 +1673,10 @@ n_tty_receive_buf_common(struct tty_struct *tty, const 
> unsigned char *cp,
>  
>   down_read(>termios_rwsem);
>  
> + /* This probably shouldn't happen, but return 0 data processed */
> + if (!ldata)
> + return 0;
> +
>   while (1) {
>   /*
>* When PARMRK is set, each input char may take up to 3 chars

Maybe your patch should looks like:
+   /* This probably shouldn't happen, but return 0 data processed */
+   if (!ldata) {
+   up_read(>termios_rwsem);
+   return 0;
+   }

or

Maybe below patch should work:
@@ -1668,11 +1668,12 @@ static int
 n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
  char *fp, int count, int flow)
 {
-   struct n_tty_data *ldata = tty->disc_data;
+   struct n_tty_data *ldata;
   int room, n, rcvd = 0, overflow;

down_read(>termios_rwsem);

+   ldata = tty->disc_data;


Re: tty crash in tty_ldisc_receive_buf()

2017-04-06 Thread Wang YanQing
On Thu, Apr 06, 2017 at 05:04:41PM +1000, Michael Neuling wrote:
> Hi all,
> 
> We are seeing the following crash (in linux-next but has been around since at
> least v4.10). 
> 
> [  417.514499] Unable to handle kernel paging request for data at address 
> 0x2260
> [  417.515361] Faulting instruction address: 0xc06fad80
> cpu 0x15: Vector: 300 (Data Access) at [c0799411f890]
> pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0
> lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0
> sp: c0799411fb10
>    msr: 9280b033
>    dar: 2260
>  dsisr: 4000
>   current = 0xc079675d1e00
>   paca= 0xcfb0d200 softe: 0    irq_happened: 0x01
> pid   = 5, comm = kworker/u56:0
> Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 
> 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT 
> 2017
> enter ? for help
> [c0799411fbe0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0
> [c0799411fc10] c07009d8 tty_port_default_receive_buf+0x68/0xe0
> [c0799411fc50] c06ffce4 flush_to_ldisc+0x114/0x130
> [c0799411fca0] c010a0fc process_one_work+0x1ec/0x580
> [c0799411fd30] c010a528 worker_thread+0x98/0x5d0
> [c0799411fdc0] c011343c kthread+0x16c/0x1b0
> [c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74
> 
> It seems the null ptr deref is in n_tty_receive_buf_common() where we do:
> 
>   size_t tail = smp_load_acquire(>read_tail);
> 
> ldata is NULL.
> 
> We see this usually on boot but can also see it if we kill a getty attached to
> tty (which is then respawned by systemd).  It seems like we are flushing data 
> to
> a tty at the same time as it's being torn down and restarted.
> 
> I did try the below patch which avoids the crash but locks up one of the 
> CPUs. I
> guess the data never gets flushed if we say nothing is processed.
> 
> This is on powerpc but has also been reported by parisc.
> 
> I'm not at all familiar with the tty layer and looking at the locks, mutexes,
> semaphores and reference counting in there scares the hell out of me. 
> 
> If anyone has an idea, I'm happy to try a patch.
> 
> Regards,
> Mikey
> 
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index bdf0e6e899..99dd757aa4 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -1673,6 +1673,10 @@ n_tty_receive_buf_common(struct tty_struct *tty, const 
> unsigned char *cp,
>  
>   down_read(>termios_rwsem);
>  
> + /* This probably shouldn't happen, but return 0 data processed */
> + if (!ldata)
> + return 0;
> +
>   while (1) {
>   /*
>* When PARMRK is set, each input char may take up to 3 chars

Maybe your patch should looks like:
+   /* This probably shouldn't happen, but return 0 data processed */
+   if (!ldata) {
+   up_read(>termios_rwsem);
+   return 0;
+   }

or

Maybe below patch should work:
@@ -1668,11 +1668,12 @@ static int
 n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
  char *fp, int count, int flow)
 {
-   struct n_tty_data *ldata = tty->disc_data;
+   struct n_tty_data *ldata;
   int room, n, rcvd = 0, overflow;

down_read(>termios_rwsem);

+   ldata = tty->disc_data;


Re: tty crash in tty_ldisc_receive_buf()

2017-04-06 Thread Benjamin Herrenschmidt
On Thu, 2017-04-06 at 08:28 -0500, Rob Herring wrote:
> 
> Can you try this one [1].
> 
> Rob
> 
> [1] https://lkml.org/lkml/2017/3/23/569

Unless I missed something, that patch doesn't change barriers or
locking, so I don't see how it could fix anything ...

Mind you we haven't quite figured out exactly what's happening here.

Cheers,
Ben.



Re: tty crash in tty_ldisc_receive_buf()

2017-04-06 Thread Benjamin Herrenschmidt
On Thu, 2017-04-06 at 08:28 -0500, Rob Herring wrote:
> 
> Can you try this one [1].
> 
> Rob
> 
> [1] https://lkml.org/lkml/2017/3/23/569

Unless I missed something, that patch doesn't change barriers or
locking, so I don't see how it could fix anything ...

Mind you we haven't quite figured out exactly what's happening here.

Cheers,
Ben.



Re: tty crash in tty_ldisc_receive_buf()

2017-04-06 Thread Michael Neuling

> > If anyone has an idea, I'm happy to try a patch.
> 
> Can you try this one [1].

Rob, I'm still hitting it when I apply that on next-20170405. Crash below..

Any other clues?

[  229.422825] Unable to handle kernel paging request for data at address 
0x2260
[  229.423681] Faulting instruction address: 0xc06fad80
cpu 0x13: Vector: 300 (Data Access) at [c0799411f8a0]
pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0
lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0
sp: c0799411fb20
   msr: 9280b033
   dar: 2260
 dsisr: 4000
  current = 0xc079665d1e00
  paca= 0xcfb0be00   softe: 0    irq_happened: 0x01
pid   = 5, comm = kworker/u56:0
Linux version 4.11.0-rc5-next-20170405+ (mikey@bml86) (gcc version 5.4.0
20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #4 SMP Thu Apr 6 19:13:58 CDT
2017
enter ? for help
[c0799411fbf0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0
[c0799411fc20] c07009fc tty_port_default_receive_buf+0x2c/0x40
[c0799411fc40] c0700278 flush_to_ldisc+0x168/0x190
[c0799411fca0] c010a0fc process_one_work+0x1ec/0x580
[c0799411fd30] c010a528 worker_thread+0x98/0x5d0
[c0799411fdc0] c011343c kthread+0x16c/0x1b0
[c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74
13:mon> r
R00 = c06fad5c   R16 = 
R01 = c0799411fb20   R17 = 
R02 = c14b9200   R18 = c07994060ea8
R03 =    R19 = c07994060c78
R04 = c21c2420   R20 = c07994060c20
R05 = c21c2520   R21 = 
R06 = 0100   R22 = 
R07 = 0001   R23 = 0001
R08 =    R24 = 
R09 = c000fc459600   R25 = 
R10 = c000fc459628   R26 = c21c2420
R11 = 000201010005   R27 = c21c2520
R12 = 24002828   R28 = 0100
R13 = cfb0be00   R29 = c000fc459400
R14 = c01132d8   R30 = 0001
R15 = c079941f0ec0   R31 = c000fc459400
pc  = c06fad80 n_tty_receive_buf_common+0xc0/0xbd0
cfar= c0b98e10 down_read+0x70/0x90
lr  = c06fad5c n_tty_receive_buf_common+0x9c/0xbd0
msr = 9280b033   cr  = 24042828
ctr = c06fb890   xer =    trap =  300
dar = 2260   dsisr = 4000


Re: tty crash in tty_ldisc_receive_buf()

2017-04-06 Thread Michael Neuling

> > If anyone has an idea, I'm happy to try a patch.
> 
> Can you try this one [1].

Rob, I'm still hitting it when I apply that on next-20170405. Crash below..

Any other clues?

[  229.422825] Unable to handle kernel paging request for data at address 
0x2260
[  229.423681] Faulting instruction address: 0xc06fad80
cpu 0x13: Vector: 300 (Data Access) at [c0799411f8a0]
pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0
lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0
sp: c0799411fb20
   msr: 9280b033
   dar: 2260
 dsisr: 4000
  current = 0xc079665d1e00
  paca= 0xcfb0be00   softe: 0    irq_happened: 0x01
pid   = 5, comm = kworker/u56:0
Linux version 4.11.0-rc5-next-20170405+ (mikey@bml86) (gcc version 5.4.0
20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #4 SMP Thu Apr 6 19:13:58 CDT
2017
enter ? for help
[c0799411fbf0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0
[c0799411fc20] c07009fc tty_port_default_receive_buf+0x2c/0x40
[c0799411fc40] c0700278 flush_to_ldisc+0x168/0x190
[c0799411fca0] c010a0fc process_one_work+0x1ec/0x580
[c0799411fd30] c010a528 worker_thread+0x98/0x5d0
[c0799411fdc0] c011343c kthread+0x16c/0x1b0
[c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74
13:mon> r
R00 = c06fad5c   R16 = 
R01 = c0799411fb20   R17 = 
R02 = c14b9200   R18 = c07994060ea8
R03 =    R19 = c07994060c78
R04 = c21c2420   R20 = c07994060c20
R05 = c21c2520   R21 = 
R06 = 0100   R22 = 
R07 = 0001   R23 = 0001
R08 =    R24 = 
R09 = c000fc459600   R25 = 
R10 = c000fc459628   R26 = c21c2420
R11 = 000201010005   R27 = c21c2520
R12 = 24002828   R28 = 0100
R13 = cfb0be00   R29 = c000fc459400
R14 = c01132d8   R30 = 0001
R15 = c079941f0ec0   R31 = c000fc459400
pc  = c06fad80 n_tty_receive_buf_common+0xc0/0xbd0
cfar= c0b98e10 down_read+0x70/0x90
lr  = c06fad5c n_tty_receive_buf_common+0x9c/0xbd0
msr = 9280b033   cr  = 24042828
ctr = c06fb890   xer =    trap =  300
dar = 2260   dsisr = 4000


Re: tty crash in tty_ldisc_receive_buf()

2017-04-06 Thread Rob Herring
On Thu, Apr 6, 2017 at 2:04 AM, Michael Neuling  wrote:
> Hi all,
>
> We are seeing the following crash (in linux-next but has been around since at
> least v4.10).
>
> [  417.514499] Unable to handle kernel paging request for data at address 
> 0x2260
> [  417.515361] Faulting instruction address: 0xc06fad80
> cpu 0x15: Vector: 300 (Data Access) at [c0799411f890]
> pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0
> lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0
> sp: c0799411fb10
>msr: 9280b033
>dar: 2260
>  dsisr: 4000
>   current = 0xc079675d1e00
>   paca= 0xcfb0d200   softe: 0irq_happened: 0x01
> pid   = 5, comm = kworker/u56:0
> Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 
> 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT 
> 2017
> enter ? for help
> [c0799411fbe0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0
> [c0799411fc10] c07009d8 tty_port_default_receive_buf+0x68/0xe0
> [c0799411fc50] c06ffce4 flush_to_ldisc+0x114/0x130
> [c0799411fca0] c010a0fc process_one_work+0x1ec/0x580
> [c0799411fd30] c010a528 worker_thread+0x98/0x5d0
> [c0799411fdc0] c011343c kthread+0x16c/0x1b0
> [c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74
>
> It seems the null ptr deref is in n_tty_receive_buf_common() where we do:
>
> size_t tail = smp_load_acquire(>read_tail);
>
> ldata is NULL.
>
> We see this usually on boot but can also see it if we kill a getty attached to
> tty (which is then respawned by systemd).  It seems like we are flushing data 
> to
> a tty at the same time as it's being torn down and restarted.
>
> I did try the below patch which avoids the crash but locks up one of the 
> CPUs. I
> guess the data never gets flushed if we say nothing is processed.
>
> This is on powerpc but has also been reported by parisc.
>
> I'm not at all familiar with the tty layer and looking at the locks, mutexes,
> semaphores and reference counting in there scares the hell out of me.
>
> If anyone has an idea, I'm happy to try a patch.

Can you try this one [1].

Rob

[1] https://lkml.org/lkml/2017/3/23/569


Re: tty crash in tty_ldisc_receive_buf()

2017-04-06 Thread Rob Herring
On Thu, Apr 6, 2017 at 2:04 AM, Michael Neuling  wrote:
> Hi all,
>
> We are seeing the following crash (in linux-next but has been around since at
> least v4.10).
>
> [  417.514499] Unable to handle kernel paging request for data at address 
> 0x2260
> [  417.515361] Faulting instruction address: 0xc06fad80
> cpu 0x15: Vector: 300 (Data Access) at [c0799411f890]
> pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0
> lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0
> sp: c0799411fb10
>msr: 9280b033
>dar: 2260
>  dsisr: 4000
>   current = 0xc079675d1e00
>   paca= 0xcfb0d200   softe: 0irq_happened: 0x01
> pid   = 5, comm = kworker/u56:0
> Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 
> 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT 
> 2017
> enter ? for help
> [c0799411fbe0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0
> [c0799411fc10] c07009d8 tty_port_default_receive_buf+0x68/0xe0
> [c0799411fc50] c06ffce4 flush_to_ldisc+0x114/0x130
> [c0799411fca0] c010a0fc process_one_work+0x1ec/0x580
> [c0799411fd30] c010a528 worker_thread+0x98/0x5d0
> [c0799411fdc0] c011343c kthread+0x16c/0x1b0
> [c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74
>
> It seems the null ptr deref is in n_tty_receive_buf_common() where we do:
>
> size_t tail = smp_load_acquire(>read_tail);
>
> ldata is NULL.
>
> We see this usually on boot but can also see it if we kill a getty attached to
> tty (which is then respawned by systemd).  It seems like we are flushing data 
> to
> a tty at the same time as it's being torn down and restarted.
>
> I did try the below patch which avoids the crash but locks up one of the 
> CPUs. I
> guess the data never gets flushed if we say nothing is processed.
>
> This is on powerpc but has also been reported by parisc.
>
> I'm not at all familiar with the tty layer and looking at the locks, mutexes,
> semaphores and reference counting in there scares the hell out of me.
>
> If anyone has an idea, I'm happy to try a patch.

Can you try this one [1].

Rob

[1] https://lkml.org/lkml/2017/3/23/569


Re: tty crash in tty_ldisc_receive_buf()

2017-04-06 Thread Benjamin Herrenschmidt
On Thu, 2017-04-06 at 17:04 +1000, Michael Neuling wrote:
> We see this usually on boot but can also see it if we kill a getty attached to
> tty (which is then respawned by systemd).  It seems like we are flushing data 
> to
> a tty at the same time as it's being torn down and restarted.
> 
> I did try the below patch which avoids the crash but locks up one of the 
> CPUs. I
> guess the data never gets flushed if we say nothing is processed.
> 
> This is on powerpc but has also been reported by parisc.
> 
> I'm not at all familiar with the tty layer and looking at the locks, mutexes,
> semaphores and reference counting in there scares the hell out of me. 
> 
> If anyone has an idea, I'm happy to try a patch.

Note that we noticed one path that called reinit without the ldisc lock
held for writing, we added that, but it didn't fix the problem.

Cheers,
Ben.



Re: tty crash in tty_ldisc_receive_buf()

2017-04-06 Thread Benjamin Herrenschmidt
On Thu, 2017-04-06 at 17:04 +1000, Michael Neuling wrote:
> We see this usually on boot but can also see it if we kill a getty attached to
> tty (which is then respawned by systemd).  It seems like we are flushing data 
> to
> a tty at the same time as it's being torn down and restarted.
> 
> I did try the below patch which avoids the crash but locks up one of the 
> CPUs. I
> guess the data never gets flushed if we say nothing is processed.
> 
> This is on powerpc but has also been reported by parisc.
> 
> I'm not at all familiar with the tty layer and looking at the locks, mutexes,
> semaphores and reference counting in there scares the hell out of me. 
> 
> If anyone has an idea, I'm happy to try a patch.

Note that we noticed one path that called reinit without the ldisc lock
held for writing, we added that, but it didn't fix the problem.

Cheers,
Ben.



tty crash in tty_ldisc_receive_buf()

2017-04-06 Thread Michael Neuling
Hi all,

We are seeing the following crash (in linux-next but has been around since at
least v4.10). 

[  417.514499] Unable to handle kernel paging request for data at address 
0x2260
[  417.515361] Faulting instruction address: 0xc06fad80
cpu 0x15: Vector: 300 (Data Access) at [c0799411f890]
pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0
lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0
sp: c0799411fb10
   msr: 9280b033
   dar: 2260
 dsisr: 4000
  current = 0xc079675d1e00
  paca= 0xcfb0d200   softe: 0    irq_happened: 0x01
pid   = 5, comm = kworker/u56:0
Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 
20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT 
2017
enter ? for help
[c0799411fbe0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0
[c0799411fc10] c07009d8 tty_port_default_receive_buf+0x68/0xe0
[c0799411fc50] c06ffce4 flush_to_ldisc+0x114/0x130
[c0799411fca0] c010a0fc process_one_work+0x1ec/0x580
[c0799411fd30] c010a528 worker_thread+0x98/0x5d0
[c0799411fdc0] c011343c kthread+0x16c/0x1b0
[c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74

It seems the null ptr deref is in n_tty_receive_buf_common() where we do:

size_t tail = smp_load_acquire(>read_tail);

ldata is NULL.

We see this usually on boot but can also see it if we kill a getty attached to
tty (which is then respawned by systemd).  It seems like we are flushing data to
a tty at the same time as it's being torn down and restarted.

I did try the below patch which avoids the crash but locks up one of the CPUs. I
guess the data never gets flushed if we say nothing is processed.

This is on powerpc but has also been reported by parisc.

I'm not at all familiar with the tty layer and looking at the locks, mutexes,
semaphores and reference counting in there scares the hell out of me. 

If anyone has an idea, I'm happy to try a patch.

Regards,
Mikey

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index bdf0e6e899..99dd757aa4 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1673,6 +1673,10 @@ n_tty_receive_buf_common(struct tty_struct *tty, const 
unsigned char *cp,
 
down_read(>termios_rwsem);
 
+   /* This probably shouldn't happen, but return 0 data processed */
+   if (!ldata)
+   return 0;
+
while (1) {
/*
 * When PARMRK is set, each input char may take up to 3 chars



tty crash in tty_ldisc_receive_buf()

2017-04-06 Thread Michael Neuling
Hi all,

We are seeing the following crash (in linux-next but has been around since at
least v4.10). 

[  417.514499] Unable to handle kernel paging request for data at address 
0x2260
[  417.515361] Faulting instruction address: 0xc06fad80
cpu 0x15: Vector: 300 (Data Access) at [c0799411f890]
pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0
lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0
sp: c0799411fb10
   msr: 9280b033
   dar: 2260
 dsisr: 4000
  current = 0xc079675d1e00
  paca= 0xcfb0d200   softe: 0    irq_happened: 0x01
pid   = 5, comm = kworker/u56:0
Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 
20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT 
2017
enter ? for help
[c0799411fbe0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0
[c0799411fc10] c07009d8 tty_port_default_receive_buf+0x68/0xe0
[c0799411fc50] c06ffce4 flush_to_ldisc+0x114/0x130
[c0799411fca0] c010a0fc process_one_work+0x1ec/0x580
[c0799411fd30] c010a528 worker_thread+0x98/0x5d0
[c0799411fdc0] c011343c kthread+0x16c/0x1b0
[c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74

It seems the null ptr deref is in n_tty_receive_buf_common() where we do:

size_t tail = smp_load_acquire(>read_tail);

ldata is NULL.

We see this usually on boot but can also see it if we kill a getty attached to
tty (which is then respawned by systemd).  It seems like we are flushing data to
a tty at the same time as it's being torn down and restarted.

I did try the below patch which avoids the crash but locks up one of the CPUs. I
guess the data never gets flushed if we say nothing is processed.

This is on powerpc but has also been reported by parisc.

I'm not at all familiar with the tty layer and looking at the locks, mutexes,
semaphores and reference counting in there scares the hell out of me. 

If anyone has an idea, I'm happy to try a patch.

Regards,
Mikey

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index bdf0e6e899..99dd757aa4 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1673,6 +1673,10 @@ n_tty_receive_buf_common(struct tty_struct *tty, const 
unsigned char *cp,
 
down_read(>termios_rwsem);
 
+   /* This probably shouldn't happen, but return 0 data processed */
+   if (!ldata)
+   return 0;
+
while (1) {
/*
 * When PARMRK is set, each input char may take up to 3 chars