Re: tty crash in tty_ldisc_receive_buf()
On Fri, 2017-04-07 at 09:03 -0500, Rob Herring wrote: > Is this with a serial port? There were some changes to serial_core > close in 4.9. It's a combo of serial and hvc actually. Cheers, Ben.
Re: tty crash in tty_ldisc_receive_buf()
On Fri, 2017-04-07 at 09:03 -0500, Rob Herring wrote: > Is this with a serial port? There were some changes to serial_core > close in 4.9. It's a combo of serial and hvc actually. Cheers, Ben.
Re: tty crash in tty_ldisc_receive_buf()
On Thu, Apr 6, 2017 at 8:03 PM, Benjamin Herrenschmidtwrote: > On Thu, 2017-04-06 at 08:28 -0500, Rob Herring wrote: >> >> Can you try this one [1]. >> >> Rob >> >> [1] https://lkml.org/lkml/2017/3/23/569 > > Unless I missed something, that patch doesn't change barriers or > locking, so I don't see how it could fix anything ... Only because that's a code path that I changed recently and the locking and ordering appears to be tricky. > Mind you we haven't quite figured out exactly what's happening here. Is this with a serial port? There were some changes to serial_core close in 4.9. Rob
Re: tty crash in tty_ldisc_receive_buf()
On Thu, Apr 6, 2017 at 8:03 PM, Benjamin Herrenschmidt wrote: > On Thu, 2017-04-06 at 08:28 -0500, Rob Herring wrote: >> >> Can you try this one [1]. >> >> Rob >> >> [1] https://lkml.org/lkml/2017/3/23/569 > > Unless I missed something, that patch doesn't change barriers or > locking, so I don't see how it could fix anything ... Only because that's a code path that I changed recently and the locking and ordering appears to be tricky. > Mind you we haven't quite figured out exactly what's happening here. Is this with a serial port? There were some changes to serial_core close in 4.9. Rob
Re: tty crash in tty_ldisc_receive_buf()
> > + /* This probably shouldn't happen, but return 0 data processed */ > > + if (!ldata) > > + return 0; > > + > > while (1) { > > /* > > * When PARMRK is set, each input char may take up to 3 > > chars > > Maybe your patch should looks like: > + /* This probably shouldn't happen, but return 0 data processed */ > + if (!ldata) { > + up_read(>termios_rwsem); > + return 0; > + } Oops, nice catch.. Thanks! That does indeed fix the problem now without the softlockup. I'm not sure it's the right fix, but full patch below. Anyone see a problem with this approach? Am I just papering over a real issue? > Maybe below patch should work: > @@ -1668,11 +1668,12 @@ static int > n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp, > char *fp, int count, int flow) > { > - struct n_tty_data *ldata = tty->disc_data; > + struct n_tty_data *ldata; > int room, n, rcvd = 0, overflow; > > down_read(>termios_rwsem); > > + ldata = tty->disc_data; I did try just that alone and it didn't help. Mikey >From 75c2a0369450692946ca8cc7ac148a98deaecd2a Mon Sep 17 00:00:00 2001 From: Michael NeulingDate: Fri, 7 Apr 2017 11:31:02 +1000 Subject: [PATCH] tty: fix regression in flush_to_ldisc When reiniting a tty we can end up with: [ 417.514499] Unable to handle kernel paging request for data at address 0x2260 [ 417.515361] Faulting instruction address: 0xc06fad80 cpu 0x15: Vector: 300 (Data Access) at [c0799411f890] pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0 lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0 sp: c0799411fb10 msr: 9280b033 dar: 2260 dsisr: 4000 current = 0xc079675d1e00 paca= 0xcfb0d200 softe: 0irq_happened: 0x01 pid = 5, comm = kworker/u56:0 Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT 2017 enter ? for help [c0799411fbe0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0 [c0799411fc10] c07009d8 tty_port_default_receive_buf+0x68/0xe0 [c0799411fc50] c06ffce4 flush_to_ldisc+0x114/0x130 [c0799411fca0] c010a0fc process_one_work+0x1ec/0x580 [c0799411fd30] c010a528 worker_thread+0x98/0x5d0 [c0799411fdc0] c011343c kthread+0x16c/0x1b0 [c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74 This is due to a NULL ptr dref of tty->disc_data. This fixes the issue by moving the disc_data read to after we take the semaphore, then returning 0 data processed when NULL. Cc: [4.10+] Signed-off-by: Michael Neuling --- drivers/tty/n_tty.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index bdf0e6e899..a2a9832a42 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1668,11 +1668,17 @@ static int n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp, char *fp, int count, int flow) { - struct n_tty_data *ldata = tty->disc_data; + struct n_tty_data *ldata; int room, n, rcvd = 0, overflow; down_read(>termios_rwsem); + ldata = tty->disc_data; + if (!ldata) { + up_read(>termios_rwsem); + return 0; + } + while (1) { /* * When PARMRK is set, each input char may take up to 3 chars -- 2.9.3
Re: tty crash in tty_ldisc_receive_buf()
> > + /* This probably shouldn't happen, but return 0 data processed */ > > + if (!ldata) > > + return 0; > > + > > while (1) { > > /* > > * When PARMRK is set, each input char may take up to 3 > > chars > > Maybe your patch should looks like: > + /* This probably shouldn't happen, but return 0 data processed */ > + if (!ldata) { > + up_read(>termios_rwsem); > + return 0; > + } Oops, nice catch.. Thanks! That does indeed fix the problem now without the softlockup. I'm not sure it's the right fix, but full patch below. Anyone see a problem with this approach? Am I just papering over a real issue? > Maybe below patch should work: > @@ -1668,11 +1668,12 @@ static int > n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp, > char *fp, int count, int flow) > { > - struct n_tty_data *ldata = tty->disc_data; > + struct n_tty_data *ldata; > int room, n, rcvd = 0, overflow; > > down_read(>termios_rwsem); > > + ldata = tty->disc_data; I did try just that alone and it didn't help. Mikey >From 75c2a0369450692946ca8cc7ac148a98deaecd2a Mon Sep 17 00:00:00 2001 From: Michael Neuling Date: Fri, 7 Apr 2017 11:31:02 +1000 Subject: [PATCH] tty: fix regression in flush_to_ldisc When reiniting a tty we can end up with: [ 417.514499] Unable to handle kernel paging request for data at address 0x2260 [ 417.515361] Faulting instruction address: 0xc06fad80 cpu 0x15: Vector: 300 (Data Access) at [c0799411f890] pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0 lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0 sp: c0799411fb10 msr: 9280b033 dar: 2260 dsisr: 4000 current = 0xc079675d1e00 paca= 0xcfb0d200 softe: 0irq_happened: 0x01 pid = 5, comm = kworker/u56:0 Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT 2017 enter ? for help [c0799411fbe0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0 [c0799411fc10] c07009d8 tty_port_default_receive_buf+0x68/0xe0 [c0799411fc50] c06ffce4 flush_to_ldisc+0x114/0x130 [c0799411fca0] c010a0fc process_one_work+0x1ec/0x580 [c0799411fd30] c010a528 worker_thread+0x98/0x5d0 [c0799411fdc0] c011343c kthread+0x16c/0x1b0 [c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74 This is due to a NULL ptr dref of tty->disc_data. This fixes the issue by moving the disc_data read to after we take the semaphore, then returning 0 data processed when NULL. Cc: [4.10+] Signed-off-by: Michael Neuling --- drivers/tty/n_tty.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index bdf0e6e899..a2a9832a42 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1668,11 +1668,17 @@ static int n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp, char *fp, int count, int flow) { - struct n_tty_data *ldata = tty->disc_data; + struct n_tty_data *ldata; int room, n, rcvd = 0, overflow; down_read(>termios_rwsem); + ldata = tty->disc_data; + if (!ldata) { + up_read(>termios_rwsem); + return 0; + } + while (1) { /* * When PARMRK is set, each input char may take up to 3 chars -- 2.9.3
Re: tty crash in tty_ldisc_receive_buf()
On Thu, Apr 06, 2017 at 05:04:41PM +1000, Michael Neuling wrote: > Hi all, > > We are seeing the following crash (in linux-next but has been around since at > least v4.10). > > [ 417.514499] Unable to handle kernel paging request for data at address > 0x2260 > [ 417.515361] Faulting instruction address: 0xc06fad80 > cpu 0x15: Vector: 300 (Data Access) at [c0799411f890] > pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0 > lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0 > sp: c0799411fb10 > msr: 9280b033 > dar: 2260 > dsisr: 4000 > current = 0xc079675d1e00 > paca= 0xcfb0d200 softe: 0 irq_happened: 0x01 > pid = 5, comm = kworker/u56:0 > Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 > 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT > 2017 > enter ? for help > [c0799411fbe0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0 > [c0799411fc10] c07009d8 tty_port_default_receive_buf+0x68/0xe0 > [c0799411fc50] c06ffce4 flush_to_ldisc+0x114/0x130 > [c0799411fca0] c010a0fc process_one_work+0x1ec/0x580 > [c0799411fd30] c010a528 worker_thread+0x98/0x5d0 > [c0799411fdc0] c011343c kthread+0x16c/0x1b0 > [c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74 > > It seems the null ptr deref is in n_tty_receive_buf_common() where we do: > > size_t tail = smp_load_acquire(>read_tail); > > ldata is NULL. > > We see this usually on boot but can also see it if we kill a getty attached to > tty (which is then respawned by systemd). It seems like we are flushing data > to > a tty at the same time as it's being torn down and restarted. > > I did try the below patch which avoids the crash but locks up one of the > CPUs. I > guess the data never gets flushed if we say nothing is processed. > > This is on powerpc but has also been reported by parisc. > > I'm not at all familiar with the tty layer and looking at the locks, mutexes, > semaphores and reference counting in there scares the hell out of me. > > If anyone has an idea, I'm happy to try a patch. > > Regards, > Mikey > > diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c > index bdf0e6e899..99dd757aa4 100644 > --- a/drivers/tty/n_tty.c > +++ b/drivers/tty/n_tty.c > @@ -1673,6 +1673,10 @@ n_tty_receive_buf_common(struct tty_struct *tty, const > unsigned char *cp, > > down_read(>termios_rwsem); > > + /* This probably shouldn't happen, but return 0 data processed */ > + if (!ldata) > + return 0; > + > while (1) { > /* >* When PARMRK is set, each input char may take up to 3 chars Maybe your patch should looks like: + /* This probably shouldn't happen, but return 0 data processed */ + if (!ldata) { + up_read(>termios_rwsem); + return 0; + } or Maybe below patch should work: @@ -1668,11 +1668,12 @@ static int n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp, char *fp, int count, int flow) { - struct n_tty_data *ldata = tty->disc_data; + struct n_tty_data *ldata; int room, n, rcvd = 0, overflow; down_read(>termios_rwsem); + ldata = tty->disc_data;
Re: tty crash in tty_ldisc_receive_buf()
On Thu, Apr 06, 2017 at 05:04:41PM +1000, Michael Neuling wrote: > Hi all, > > We are seeing the following crash (in linux-next but has been around since at > least v4.10). > > [ 417.514499] Unable to handle kernel paging request for data at address > 0x2260 > [ 417.515361] Faulting instruction address: 0xc06fad80 > cpu 0x15: Vector: 300 (Data Access) at [c0799411f890] > pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0 > lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0 > sp: c0799411fb10 > msr: 9280b033 > dar: 2260 > dsisr: 4000 > current = 0xc079675d1e00 > paca= 0xcfb0d200 softe: 0 irq_happened: 0x01 > pid = 5, comm = kworker/u56:0 > Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 > 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT > 2017 > enter ? for help > [c0799411fbe0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0 > [c0799411fc10] c07009d8 tty_port_default_receive_buf+0x68/0xe0 > [c0799411fc50] c06ffce4 flush_to_ldisc+0x114/0x130 > [c0799411fca0] c010a0fc process_one_work+0x1ec/0x580 > [c0799411fd30] c010a528 worker_thread+0x98/0x5d0 > [c0799411fdc0] c011343c kthread+0x16c/0x1b0 > [c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74 > > It seems the null ptr deref is in n_tty_receive_buf_common() where we do: > > size_t tail = smp_load_acquire(>read_tail); > > ldata is NULL. > > We see this usually on boot but can also see it if we kill a getty attached to > tty (which is then respawned by systemd). It seems like we are flushing data > to > a tty at the same time as it's being torn down and restarted. > > I did try the below patch which avoids the crash but locks up one of the > CPUs. I > guess the data never gets flushed if we say nothing is processed. > > This is on powerpc but has also been reported by parisc. > > I'm not at all familiar with the tty layer and looking at the locks, mutexes, > semaphores and reference counting in there scares the hell out of me. > > If anyone has an idea, I'm happy to try a patch. > > Regards, > Mikey > > diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c > index bdf0e6e899..99dd757aa4 100644 > --- a/drivers/tty/n_tty.c > +++ b/drivers/tty/n_tty.c > @@ -1673,6 +1673,10 @@ n_tty_receive_buf_common(struct tty_struct *tty, const > unsigned char *cp, > > down_read(>termios_rwsem); > > + /* This probably shouldn't happen, but return 0 data processed */ > + if (!ldata) > + return 0; > + > while (1) { > /* >* When PARMRK is set, each input char may take up to 3 chars Maybe your patch should looks like: + /* This probably shouldn't happen, but return 0 data processed */ + if (!ldata) { + up_read(>termios_rwsem); + return 0; + } or Maybe below patch should work: @@ -1668,11 +1668,12 @@ static int n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp, char *fp, int count, int flow) { - struct n_tty_data *ldata = tty->disc_data; + struct n_tty_data *ldata; int room, n, rcvd = 0, overflow; down_read(>termios_rwsem); + ldata = tty->disc_data;
Re: tty crash in tty_ldisc_receive_buf()
On Thu, 2017-04-06 at 08:28 -0500, Rob Herring wrote: > > Can you try this one [1]. > > Rob > > [1] https://lkml.org/lkml/2017/3/23/569 Unless I missed something, that patch doesn't change barriers or locking, so I don't see how it could fix anything ... Mind you we haven't quite figured out exactly what's happening here. Cheers, Ben.
Re: tty crash in tty_ldisc_receive_buf()
On Thu, 2017-04-06 at 08:28 -0500, Rob Herring wrote: > > Can you try this one [1]. > > Rob > > [1] https://lkml.org/lkml/2017/3/23/569 Unless I missed something, that patch doesn't change barriers or locking, so I don't see how it could fix anything ... Mind you we haven't quite figured out exactly what's happening here. Cheers, Ben.
Re: tty crash in tty_ldisc_receive_buf()
> > If anyone has an idea, I'm happy to try a patch. > > Can you try this one [1]. Rob, I'm still hitting it when I apply that on next-20170405. Crash below.. Any other clues? [ 229.422825] Unable to handle kernel paging request for data at address 0x2260 [ 229.423681] Faulting instruction address: 0xc06fad80 cpu 0x13: Vector: 300 (Data Access) at [c0799411f8a0] pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0 lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0 sp: c0799411fb20 msr: 9280b033 dar: 2260 dsisr: 4000 current = 0xc079665d1e00 paca= 0xcfb0be00 softe: 0 irq_happened: 0x01 pid = 5, comm = kworker/u56:0 Linux version 4.11.0-rc5-next-20170405+ (mikey@bml86) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #4 SMP Thu Apr 6 19:13:58 CDT 2017 enter ? for help [c0799411fbf0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0 [c0799411fc20] c07009fc tty_port_default_receive_buf+0x2c/0x40 [c0799411fc40] c0700278 flush_to_ldisc+0x168/0x190 [c0799411fca0] c010a0fc process_one_work+0x1ec/0x580 [c0799411fd30] c010a528 worker_thread+0x98/0x5d0 [c0799411fdc0] c011343c kthread+0x16c/0x1b0 [c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74 13:mon> r R00 = c06fad5c R16 = R01 = c0799411fb20 R17 = R02 = c14b9200 R18 = c07994060ea8 R03 = R19 = c07994060c78 R04 = c21c2420 R20 = c07994060c20 R05 = c21c2520 R21 = R06 = 0100 R22 = R07 = 0001 R23 = 0001 R08 = R24 = R09 = c000fc459600 R25 = R10 = c000fc459628 R26 = c21c2420 R11 = 000201010005 R27 = c21c2520 R12 = 24002828 R28 = 0100 R13 = cfb0be00 R29 = c000fc459400 R14 = c01132d8 R30 = 0001 R15 = c079941f0ec0 R31 = c000fc459400 pc = c06fad80 n_tty_receive_buf_common+0xc0/0xbd0 cfar= c0b98e10 down_read+0x70/0x90 lr = c06fad5c n_tty_receive_buf_common+0x9c/0xbd0 msr = 9280b033 cr = 24042828 ctr = c06fb890 xer = trap = 300 dar = 2260 dsisr = 4000
Re: tty crash in tty_ldisc_receive_buf()
> > If anyone has an idea, I'm happy to try a patch. > > Can you try this one [1]. Rob, I'm still hitting it when I apply that on next-20170405. Crash below.. Any other clues? [ 229.422825] Unable to handle kernel paging request for data at address 0x2260 [ 229.423681] Faulting instruction address: 0xc06fad80 cpu 0x13: Vector: 300 (Data Access) at [c0799411f8a0] pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0 lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0 sp: c0799411fb20 msr: 9280b033 dar: 2260 dsisr: 4000 current = 0xc079665d1e00 paca= 0xcfb0be00 softe: 0 irq_happened: 0x01 pid = 5, comm = kworker/u56:0 Linux version 4.11.0-rc5-next-20170405+ (mikey@bml86) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #4 SMP Thu Apr 6 19:13:58 CDT 2017 enter ? for help [c0799411fbf0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0 [c0799411fc20] c07009fc tty_port_default_receive_buf+0x2c/0x40 [c0799411fc40] c0700278 flush_to_ldisc+0x168/0x190 [c0799411fca0] c010a0fc process_one_work+0x1ec/0x580 [c0799411fd30] c010a528 worker_thread+0x98/0x5d0 [c0799411fdc0] c011343c kthread+0x16c/0x1b0 [c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74 13:mon> r R00 = c06fad5c R16 = R01 = c0799411fb20 R17 = R02 = c14b9200 R18 = c07994060ea8 R03 = R19 = c07994060c78 R04 = c21c2420 R20 = c07994060c20 R05 = c21c2520 R21 = R06 = 0100 R22 = R07 = 0001 R23 = 0001 R08 = R24 = R09 = c000fc459600 R25 = R10 = c000fc459628 R26 = c21c2420 R11 = 000201010005 R27 = c21c2520 R12 = 24002828 R28 = 0100 R13 = cfb0be00 R29 = c000fc459400 R14 = c01132d8 R30 = 0001 R15 = c079941f0ec0 R31 = c000fc459400 pc = c06fad80 n_tty_receive_buf_common+0xc0/0xbd0 cfar= c0b98e10 down_read+0x70/0x90 lr = c06fad5c n_tty_receive_buf_common+0x9c/0xbd0 msr = 9280b033 cr = 24042828 ctr = c06fb890 xer = trap = 300 dar = 2260 dsisr = 4000
Re: tty crash in tty_ldisc_receive_buf()
On Thu, Apr 6, 2017 at 2:04 AM, Michael Neulingwrote: > Hi all, > > We are seeing the following crash (in linux-next but has been around since at > least v4.10). > > [ 417.514499] Unable to handle kernel paging request for data at address > 0x2260 > [ 417.515361] Faulting instruction address: 0xc06fad80 > cpu 0x15: Vector: 300 (Data Access) at [c0799411f890] > pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0 > lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0 > sp: c0799411fb10 >msr: 9280b033 >dar: 2260 > dsisr: 4000 > current = 0xc079675d1e00 > paca= 0xcfb0d200 softe: 0irq_happened: 0x01 > pid = 5, comm = kworker/u56:0 > Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 > 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT > 2017 > enter ? for help > [c0799411fbe0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0 > [c0799411fc10] c07009d8 tty_port_default_receive_buf+0x68/0xe0 > [c0799411fc50] c06ffce4 flush_to_ldisc+0x114/0x130 > [c0799411fca0] c010a0fc process_one_work+0x1ec/0x580 > [c0799411fd30] c010a528 worker_thread+0x98/0x5d0 > [c0799411fdc0] c011343c kthread+0x16c/0x1b0 > [c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74 > > It seems the null ptr deref is in n_tty_receive_buf_common() where we do: > > size_t tail = smp_load_acquire(>read_tail); > > ldata is NULL. > > We see this usually on boot but can also see it if we kill a getty attached to > tty (which is then respawned by systemd). It seems like we are flushing data > to > a tty at the same time as it's being torn down and restarted. > > I did try the below patch which avoids the crash but locks up one of the > CPUs. I > guess the data never gets flushed if we say nothing is processed. > > This is on powerpc but has also been reported by parisc. > > I'm not at all familiar with the tty layer and looking at the locks, mutexes, > semaphores and reference counting in there scares the hell out of me. > > If anyone has an idea, I'm happy to try a patch. Can you try this one [1]. Rob [1] https://lkml.org/lkml/2017/3/23/569
Re: tty crash in tty_ldisc_receive_buf()
On Thu, Apr 6, 2017 at 2:04 AM, Michael Neuling wrote: > Hi all, > > We are seeing the following crash (in linux-next but has been around since at > least v4.10). > > [ 417.514499] Unable to handle kernel paging request for data at address > 0x2260 > [ 417.515361] Faulting instruction address: 0xc06fad80 > cpu 0x15: Vector: 300 (Data Access) at [c0799411f890] > pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0 > lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0 > sp: c0799411fb10 >msr: 9280b033 >dar: 2260 > dsisr: 4000 > current = 0xc079675d1e00 > paca= 0xcfb0d200 softe: 0irq_happened: 0x01 > pid = 5, comm = kworker/u56:0 > Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 > 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT > 2017 > enter ? for help > [c0799411fbe0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0 > [c0799411fc10] c07009d8 tty_port_default_receive_buf+0x68/0xe0 > [c0799411fc50] c06ffce4 flush_to_ldisc+0x114/0x130 > [c0799411fca0] c010a0fc process_one_work+0x1ec/0x580 > [c0799411fd30] c010a528 worker_thread+0x98/0x5d0 > [c0799411fdc0] c011343c kthread+0x16c/0x1b0 > [c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74 > > It seems the null ptr deref is in n_tty_receive_buf_common() where we do: > > size_t tail = smp_load_acquire(>read_tail); > > ldata is NULL. > > We see this usually on boot but can also see it if we kill a getty attached to > tty (which is then respawned by systemd). It seems like we are flushing data > to > a tty at the same time as it's being torn down and restarted. > > I did try the below patch which avoids the crash but locks up one of the > CPUs. I > guess the data never gets flushed if we say nothing is processed. > > This is on powerpc but has also been reported by parisc. > > I'm not at all familiar with the tty layer and looking at the locks, mutexes, > semaphores and reference counting in there scares the hell out of me. > > If anyone has an idea, I'm happy to try a patch. Can you try this one [1]. Rob [1] https://lkml.org/lkml/2017/3/23/569
Re: tty crash in tty_ldisc_receive_buf()
On Thu, 2017-04-06 at 17:04 +1000, Michael Neuling wrote: > We see this usually on boot but can also see it if we kill a getty attached to > tty (which is then respawned by systemd). It seems like we are flushing data > to > a tty at the same time as it's being torn down and restarted. > > I did try the below patch which avoids the crash but locks up one of the > CPUs. I > guess the data never gets flushed if we say nothing is processed. > > This is on powerpc but has also been reported by parisc. > > I'm not at all familiar with the tty layer and looking at the locks, mutexes, > semaphores and reference counting in there scares the hell out of me. > > If anyone has an idea, I'm happy to try a patch. Note that we noticed one path that called reinit without the ldisc lock held for writing, we added that, but it didn't fix the problem. Cheers, Ben.
Re: tty crash in tty_ldisc_receive_buf()
On Thu, 2017-04-06 at 17:04 +1000, Michael Neuling wrote: > We see this usually on boot but can also see it if we kill a getty attached to > tty (which is then respawned by systemd). It seems like we are flushing data > to > a tty at the same time as it's being torn down and restarted. > > I did try the below patch which avoids the crash but locks up one of the > CPUs. I > guess the data never gets flushed if we say nothing is processed. > > This is on powerpc but has also been reported by parisc. > > I'm not at all familiar with the tty layer and looking at the locks, mutexes, > semaphores and reference counting in there scares the hell out of me. > > If anyone has an idea, I'm happy to try a patch. Note that we noticed one path that called reinit without the ldisc lock held for writing, we added that, but it didn't fix the problem. Cheers, Ben.
tty crash in tty_ldisc_receive_buf()
Hi all, We are seeing the following crash (in linux-next but has been around since at least v4.10). [ 417.514499] Unable to handle kernel paging request for data at address 0x2260 [ 417.515361] Faulting instruction address: 0xc06fad80 cpu 0x15: Vector: 300 (Data Access) at [c0799411f890] pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0 lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0 sp: c0799411fb10 msr: 9280b033 dar: 2260 dsisr: 4000 current = 0xc079675d1e00 paca= 0xcfb0d200 softe: 0 irq_happened: 0x01 pid = 5, comm = kworker/u56:0 Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT 2017 enter ? for help [c0799411fbe0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0 [c0799411fc10] c07009d8 tty_port_default_receive_buf+0x68/0xe0 [c0799411fc50] c06ffce4 flush_to_ldisc+0x114/0x130 [c0799411fca0] c010a0fc process_one_work+0x1ec/0x580 [c0799411fd30] c010a528 worker_thread+0x98/0x5d0 [c0799411fdc0] c011343c kthread+0x16c/0x1b0 [c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74 It seems the null ptr deref is in n_tty_receive_buf_common() where we do: size_t tail = smp_load_acquire(>read_tail); ldata is NULL. We see this usually on boot but can also see it if we kill a getty attached to tty (which is then respawned by systemd). It seems like we are flushing data to a tty at the same time as it's being torn down and restarted. I did try the below patch which avoids the crash but locks up one of the CPUs. I guess the data never gets flushed if we say nothing is processed. This is on powerpc but has also been reported by parisc. I'm not at all familiar with the tty layer and looking at the locks, mutexes, semaphores and reference counting in there scares the hell out of me. If anyone has an idea, I'm happy to try a patch. Regards, Mikey diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index bdf0e6e899..99dd757aa4 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1673,6 +1673,10 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp, down_read(>termios_rwsem); + /* This probably shouldn't happen, but return 0 data processed */ + if (!ldata) + return 0; + while (1) { /* * When PARMRK is set, each input char may take up to 3 chars
tty crash in tty_ldisc_receive_buf()
Hi all, We are seeing the following crash (in linux-next but has been around since at least v4.10). [ 417.514499] Unable to handle kernel paging request for data at address 0x2260 [ 417.515361] Faulting instruction address: 0xc06fad80 cpu 0x15: Vector: 300 (Data Access) at [c0799411f890] pc: c06fad80: n_tty_receive_buf_common+0xc0/0xbd0 lr: c06fad5c: n_tty_receive_buf_common+0x9c/0xbd0 sp: c0799411fb10 msr: 9280b033 dar: 2260 dsisr: 4000 current = 0xc079675d1e00 paca= 0xcfb0d200 softe: 0 irq_happened: 0x01 pid = 5, comm = kworker/u56:0 Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT 2017 enter ? for help [c0799411fbe0] c06ff968 tty_ldisc_receive_buf+0x48/0xe0 [c0799411fc10] c07009d8 tty_port_default_receive_buf+0x68/0xe0 [c0799411fc50] c06ffce4 flush_to_ldisc+0x114/0x130 [c0799411fca0] c010a0fc process_one_work+0x1ec/0x580 [c0799411fd30] c010a528 worker_thread+0x98/0x5d0 [c0799411fdc0] c011343c kthread+0x16c/0x1b0 [c0799411fe30] c000b4e8 ret_from_kernel_thread+0x5c/0x74 It seems the null ptr deref is in n_tty_receive_buf_common() where we do: size_t tail = smp_load_acquire(>read_tail); ldata is NULL. We see this usually on boot but can also see it if we kill a getty attached to tty (which is then respawned by systemd). It seems like we are flushing data to a tty at the same time as it's being torn down and restarted. I did try the below patch which avoids the crash but locks up one of the CPUs. I guess the data never gets flushed if we say nothing is processed. This is on powerpc but has also been reported by parisc. I'm not at all familiar with the tty layer and looking at the locks, mutexes, semaphores and reference counting in there scares the hell out of me. If anyone has an idea, I'm happy to try a patch. Regards, Mikey diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index bdf0e6e899..99dd757aa4 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1673,6 +1673,10 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp, down_read(>termios_rwsem); + /* This probably shouldn't happen, but return 0 data processed */ + if (!ldata) + return 0; + while (1) { /* * When PARMRK is set, each input char may take up to 3 chars