Re: INFO: rcu detected stall in n_tty_receive_char_special

2018-05-17 Thread syzbot

Hello,

syzbot has tested the proposed patch but the reproducer still triggered  
crash:

INFO: rcu detected stall in corrupted

IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
INFO: rcu_sched self-detected stall on CPU
INFO: rcu_sched detected stalls on CPUs/tasks:
	1-: (124998 ticks this GP) idle=da2/1/4611686018427387906  
softirq=12308/12308 fqs=30969


	1-: (124998 ticks this GP) idle=da2/1/4611686018427387906  
softirq=12308/12308 fqs=30969


 (t=125002 jiffies g=6046 c=6045 q=1179)
(detected by 0, t=125002 jiffies, g=6046, c=6045, q=1179)
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 5134 Comm: syz-executor3 Not tainted 4.17.0-rc5+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

RIP: 0010:rep_nop arch/x86/include/asm/processor.h:667 [inline]
RIP: 0010:cpu_relax arch/x86/include/asm/processor.h:672 [inline]
RIP: 0010:virt_spin_lock arch/x86/include/asm/qspinlock.h:69 [inline]
RIP: 0010:native_queued_spin_lock_slowpath+0x204/0xde0  
kernel/locking/qspinlock.c:305

RSP: 0018:8801db107030 EFLAGS: 0002
RAX:  RBX: ed003b620e1f RCX: 0004
RDX:  RSI: 0004 RDI: 88b911c0
RBP: 8801db1073a0 R08: fbfff1172239 R09: fbfff1172238
R10: fbfff1172238 R11: 88b911c3 R12: 88b911c0
R13: 8801db107378 R14: 0001 R15: dc00
FS:  7fd0bf7b5700() GS:8801db10() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 004d91e4 CR3: 0001c68eb000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 
 pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:674 [inline]
 queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:30 [inline]
 queued_spin_lock include/asm-generic/qspinlock.h:90 [inline]
 do_raw_spin_lock+0x1a7/0x200 kernel/locking/spinlock_debug.c:113
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:117 [inline]
 _raw_spin_lock_irqsave+0x9e/0xc0 kernel/locking/spinlock.c:152
 rcu_dump_cpu_stacks+0x9d/0x1c2 kernel/rcu/tree.c:1373
 print_cpu_stall kernel/rcu/tree.c:1525 [inline]
 check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
 __rcu_pending kernel/rcu/tree.c:3356 [inline]
 rcu_pending kernel/rcu/tree.c:3401 [inline]
 rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
 update_process_times+0x2d/0x70 kernel/time/timer.c:1636
 tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
 tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
 __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
 __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1460
 hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
 smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
 
RIP: 0010:add_echo_byte drivers/tty/n_tty.c:829 [inline]
RIP: 0010:echo_char+0x152/0x340 drivers/tty/n_tty.c:934
RSP: 0018:8801aa1ef720 EFLAGS: 0293 ORIG_RAX: ff13
RAX: 8801a9a5e180 RBX: c90002bb4000 RCX: 837ff667
RDX:  RSI: 837ff672 RDI: 0001
RBP: 8801aa1ef748 R08: 8801a9a5e180 R09: 0001
R10: 8801a9a5e9e0 R11: 8801a9a5e180 R12: 08df
R13:  R14: 0008 R15: c90002bb6268
 n_tty_receive_char_special+0x1451/0x3510 drivers/tty/n_tty.c:1324
 n_tty_receive_buf_fast drivers/tty/n_tty.c:1595 [inline]
 __receive_buf drivers/tty/n_tty.c:1629 [inline]
 n_tty_receive_buf_common+0x21bd/0x2ca0 drivers/tty/n_tty.c:1727
 n_tty_receive_buf+0x30/0x40 drivers/tty/n_tty.c:1756
 tiocsti drivers/tty/tty_io.c:2171 [inline]
 tty_ioctl+0x7e7/0x1870 drivers/tty/tty_io.c:2557
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:500 [inline]
 do_vfs_ioctl+0x1cf/0x16a0 fs/ioctl.c:684
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
 __do_sys_ioctl fs/ioctl.c:708 [inline]
 __se_sys_ioctl fs/ioctl.c:706 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455319
RSP: 002b:7fd0bf7b4c68 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX: 7fd0bf7b56d4 RCX: 00455319
RDX: 20c0 RSI: 5412 RDI: 0004
RBP: 0072bea0 R08:  R09: 
R10:  R11: 0246 R12: 
R13: 02f9 R14: 006f77f8 R15: 
Code: 5c 00 00 00 00 48 83 e7 f8 49 29 fc 41 8d 4c 24 64 c1 e9 03 f3 48 

Re: INFO: rcu detected stall in n_tty_receive_char_special

2018-05-17 Thread syzbot

Hello,

syzbot has tested the proposed patch but the reproducer still triggered  
crash:

INFO: rcu detected stall in corrupted

IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
INFO: rcu_sched self-detected stall on CPU
INFO: rcu_sched detected stalls on CPUs/tasks:
	1-: (124998 ticks this GP) idle=da2/1/4611686018427387906  
softirq=12308/12308 fqs=30969


	1-: (124998 ticks this GP) idle=da2/1/4611686018427387906  
softirq=12308/12308 fqs=30969


 (t=125002 jiffies g=6046 c=6045 q=1179)
(detected by 0, t=125002 jiffies, g=6046, c=6045, q=1179)
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 5134 Comm: syz-executor3 Not tainted 4.17.0-rc5+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

RIP: 0010:rep_nop arch/x86/include/asm/processor.h:667 [inline]
RIP: 0010:cpu_relax arch/x86/include/asm/processor.h:672 [inline]
RIP: 0010:virt_spin_lock arch/x86/include/asm/qspinlock.h:69 [inline]
RIP: 0010:native_queued_spin_lock_slowpath+0x204/0xde0  
kernel/locking/qspinlock.c:305

RSP: 0018:8801db107030 EFLAGS: 0002
RAX:  RBX: ed003b620e1f RCX: 0004
RDX:  RSI: 0004 RDI: 88b911c0
RBP: 8801db1073a0 R08: fbfff1172239 R09: fbfff1172238
R10: fbfff1172238 R11: 88b911c3 R12: 88b911c0
R13: 8801db107378 R14: 0001 R15: dc00
FS:  7fd0bf7b5700() GS:8801db10() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 004d91e4 CR3: 0001c68eb000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 
 pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:674 [inline]
 queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:30 [inline]
 queued_spin_lock include/asm-generic/qspinlock.h:90 [inline]
 do_raw_spin_lock+0x1a7/0x200 kernel/locking/spinlock_debug.c:113
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:117 [inline]
 _raw_spin_lock_irqsave+0x9e/0xc0 kernel/locking/spinlock.c:152
 rcu_dump_cpu_stacks+0x9d/0x1c2 kernel/rcu/tree.c:1373
 print_cpu_stall kernel/rcu/tree.c:1525 [inline]
 check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
 __rcu_pending kernel/rcu/tree.c:3356 [inline]
 rcu_pending kernel/rcu/tree.c:3401 [inline]
 rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
 update_process_times+0x2d/0x70 kernel/time/timer.c:1636
 tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
 tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
 __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
 __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1460
 hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
 smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
 
RIP: 0010:add_echo_byte drivers/tty/n_tty.c:829 [inline]
RIP: 0010:echo_char+0x152/0x340 drivers/tty/n_tty.c:934
RSP: 0018:8801aa1ef720 EFLAGS: 0293 ORIG_RAX: ff13
RAX: 8801a9a5e180 RBX: c90002bb4000 RCX: 837ff667
RDX:  RSI: 837ff672 RDI: 0001
RBP: 8801aa1ef748 R08: 8801a9a5e180 R09: 0001
R10: 8801a9a5e9e0 R11: 8801a9a5e180 R12: 08df
R13:  R14: 0008 R15: c90002bb6268
 n_tty_receive_char_special+0x1451/0x3510 drivers/tty/n_tty.c:1324
 n_tty_receive_buf_fast drivers/tty/n_tty.c:1595 [inline]
 __receive_buf drivers/tty/n_tty.c:1629 [inline]
 n_tty_receive_buf_common+0x21bd/0x2ca0 drivers/tty/n_tty.c:1727
 n_tty_receive_buf+0x30/0x40 drivers/tty/n_tty.c:1756
 tiocsti drivers/tty/tty_io.c:2171 [inline]
 tty_ioctl+0x7e7/0x1870 drivers/tty/tty_io.c:2557
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:500 [inline]
 do_vfs_ioctl+0x1cf/0x16a0 fs/ioctl.c:684
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
 __do_sys_ioctl fs/ioctl.c:708 [inline]
 __se_sys_ioctl fs/ioctl.c:706 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455319
RSP: 002b:7fd0bf7b4c68 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX: 7fd0bf7b56d4 RCX: 00455319
RDX: 20c0 RSI: 5412 RDI: 0004
RBP: 0072bea0 R08:  R09: 
R10:  R11: 0246 R12: 
R13: 02f9 R14: 006f77f8 R15: 
Code: 5c 00 00 00 00 48 83 e7 f8 49 29 fc 41 8d 4c 24 64 c1 e9 03 f3 48 

Re: INFO: rcu detected stall in n_tty_receive_char_special

2018-05-17 Thread Tetsuo Handa
#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master

>From ada7279f7f034c5fd79fc04e1120069ea5f6cef2 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa 
Date: Thu, 17 May 2018 20:42:50 +0900
Subject: [PATCH] n_tty: Access echo_* variables carefully.

syzbot is reporting stalls at __process_echoes() [1]. This is because
since ldata->echo_commit < ldata->echo_tail becomes true for some reason,
the discard loop is serving as almost infinite loop. This patch tries to
avoid falling into ldata->echo_commit < ldata->echo_tail situation by
making access to echo_* variables more carefully.

Since reset_buffer_flags() is called without output_lock held, it should
not touch echo_* variables. And omit a call to reset_buffer_flags() from
n_tty_open() by using vzalloc().

Since add_echo_byte() is called without output_lock held, it needs memory
barrier between storing into echo_buf[] and incrementing echo_head counter.
echo_buf() needs corresponding memory barrier before reading echo_buf[].
Lack of handling the possibility of not-yet-stored multi-byte operation
might be the reason of falling into ldata->echo_commit < ldata->echo_tail
situation, for if I do WARN_ON(ldata->echo_commit == tail + 1) prior to
echo_buf(ldata, tail + 1), the WARN_ON() fires.

The echo_* variables are now updated with output_lock held. Thus, I guess
that ldata->echo_commit < ldata->echo_tail is no longer possible. But to
become more carefully, use ldata->echo_commit > ldata->echo_tail as
the preventive condition to continue the "while" loops.

[1] 
https://syzkaller.appspot.com/bug?id=3d7481a346958d9469bebbeb0537d5f056bdd6e8

Signed-off-by: Tetsuo Handa 
Reported-by: syzbot 
Cc: Peter Hurley 
---
 drivers/tty/n_tty.c | 51 ++-
 1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index cbe98bc..060ef91 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -90,9 +90,6 @@ struct n_tty_data {
size_t read_head;
size_t commit_head;
size_t canon_head;
-   size_t echo_head;
-   size_t echo_commit;
-   size_t echo_mark;
DECLARE_BITMAP(char_map, 256);
 
/* private to n_tty_receive_overrun (single-threaded) */
@@ -119,6 +116,9 @@ struct n_tty_data {
unsigned int column;
unsigned int canon_column;
size_t echo_tail;
+   size_t echo_head;
+   size_t echo_commit;
+   size_t echo_mark;
 
struct mutex atomic_read_lock;
struct mutex output_lock;
@@ -141,6 +141,7 @@ static inline unsigned char *read_buf_addr(struct 
n_tty_data *ldata, size_t i)
 
 static inline unsigned char echo_buf(struct n_tty_data *ldata, size_t i)
 {
+   smp_rmb(); /* Matches smp_wmb() in add_echo_byte(). */
return ldata->echo_buf[i & (N_TTY_BUF_SIZE - 1)];
 }
 
@@ -316,9 +317,7 @@ static inline void put_tty_queue(unsigned char c, struct 
n_tty_data *ldata)
 static void reset_buffer_flags(struct n_tty_data *ldata)
 {
ldata->read_head = ldata->canon_head = ldata->read_tail = 0;
-   ldata->echo_head = ldata->echo_tail = ldata->echo_commit = 0;
ldata->commit_head = 0;
-   ldata->echo_mark = 0;
ldata->line_start = 0;
 
ldata->erasing = 0;
@@ -617,13 +616,22 @@ static size_t __process_echoes(struct tty_struct *tty)
old_space = space = tty_write_room(tty);
 
tail = ldata->echo_tail;
-   while (ldata->echo_commit != tail) {
+   if (WARN_ON(ldata->echo_commit < tail))
+   return 0;
+   while (ldata->echo_commit > tail) {
c = echo_buf(ldata, tail);
if (c == ECHO_OP_START) {
unsigned char op;
int no_space_left = 0;
 
/*
+* Since add_echo_byte() is called without holding
+* output_lock, we might see only portion of multi-byte
+* operation.
+*/
+   if (ldata->echo_commit == tail + 1)
+   goto not_yet_stored;
+   /*
 * If the buffer byte is the start of a multi-byte
 * operation, get the next byte, which is either the
 * op code or a control character value.
@@ -634,6 +642,8 @@ static size_t __process_echoes(struct tty_struct *tty)
unsigned int num_chars, num_bs;
 
case ECHO_OP_ERASE_TAB:
+   if (ldata->echo_commit == tail + 2)
+   goto not_yet_stored;
num_chars = echo_buf(ldata, tail + 2);
 
/*
@@ -728,7 +738,9 @@ 

Re: INFO: rcu detected stall in n_tty_receive_char_special

2018-05-17 Thread Tetsuo Handa
#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master

>From ada7279f7f034c5fd79fc04e1120069ea5f6cef2 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa 
Date: Thu, 17 May 2018 20:42:50 +0900
Subject: [PATCH] n_tty: Access echo_* variables carefully.

syzbot is reporting stalls at __process_echoes() [1]. This is because
since ldata->echo_commit < ldata->echo_tail becomes true for some reason,
the discard loop is serving as almost infinite loop. This patch tries to
avoid falling into ldata->echo_commit < ldata->echo_tail situation by
making access to echo_* variables more carefully.

Since reset_buffer_flags() is called without output_lock held, it should
not touch echo_* variables. And omit a call to reset_buffer_flags() from
n_tty_open() by using vzalloc().

Since add_echo_byte() is called without output_lock held, it needs memory
barrier between storing into echo_buf[] and incrementing echo_head counter.
echo_buf() needs corresponding memory barrier before reading echo_buf[].
Lack of handling the possibility of not-yet-stored multi-byte operation
might be the reason of falling into ldata->echo_commit < ldata->echo_tail
situation, for if I do WARN_ON(ldata->echo_commit == tail + 1) prior to
echo_buf(ldata, tail + 1), the WARN_ON() fires.

The echo_* variables are now updated with output_lock held. Thus, I guess
that ldata->echo_commit < ldata->echo_tail is no longer possible. But to
become more carefully, use ldata->echo_commit > ldata->echo_tail as
the preventive condition to continue the "while" loops.

[1] 
https://syzkaller.appspot.com/bug?id=3d7481a346958d9469bebbeb0537d5f056bdd6e8

Signed-off-by: Tetsuo Handa 
Reported-by: syzbot 
Cc: Peter Hurley 
---
 drivers/tty/n_tty.c | 51 ++-
 1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index cbe98bc..060ef91 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -90,9 +90,6 @@ struct n_tty_data {
size_t read_head;
size_t commit_head;
size_t canon_head;
-   size_t echo_head;
-   size_t echo_commit;
-   size_t echo_mark;
DECLARE_BITMAP(char_map, 256);
 
/* private to n_tty_receive_overrun (single-threaded) */
@@ -119,6 +116,9 @@ struct n_tty_data {
unsigned int column;
unsigned int canon_column;
size_t echo_tail;
+   size_t echo_head;
+   size_t echo_commit;
+   size_t echo_mark;
 
struct mutex atomic_read_lock;
struct mutex output_lock;
@@ -141,6 +141,7 @@ static inline unsigned char *read_buf_addr(struct 
n_tty_data *ldata, size_t i)
 
 static inline unsigned char echo_buf(struct n_tty_data *ldata, size_t i)
 {
+   smp_rmb(); /* Matches smp_wmb() in add_echo_byte(). */
return ldata->echo_buf[i & (N_TTY_BUF_SIZE - 1)];
 }
 
@@ -316,9 +317,7 @@ static inline void put_tty_queue(unsigned char c, struct 
n_tty_data *ldata)
 static void reset_buffer_flags(struct n_tty_data *ldata)
 {
ldata->read_head = ldata->canon_head = ldata->read_tail = 0;
-   ldata->echo_head = ldata->echo_tail = ldata->echo_commit = 0;
ldata->commit_head = 0;
-   ldata->echo_mark = 0;
ldata->line_start = 0;
 
ldata->erasing = 0;
@@ -617,13 +616,22 @@ static size_t __process_echoes(struct tty_struct *tty)
old_space = space = tty_write_room(tty);
 
tail = ldata->echo_tail;
-   while (ldata->echo_commit != tail) {
+   if (WARN_ON(ldata->echo_commit < tail))
+   return 0;
+   while (ldata->echo_commit > tail) {
c = echo_buf(ldata, tail);
if (c == ECHO_OP_START) {
unsigned char op;
int no_space_left = 0;
 
/*
+* Since add_echo_byte() is called without holding
+* output_lock, we might see only portion of multi-byte
+* operation.
+*/
+   if (ldata->echo_commit == tail + 1)
+   goto not_yet_stored;
+   /*
 * If the buffer byte is the start of a multi-byte
 * operation, get the next byte, which is either the
 * op code or a control character value.
@@ -634,6 +642,8 @@ static size_t __process_echoes(struct tty_struct *tty)
unsigned int num_chars, num_bs;
 
case ECHO_OP_ERASE_TAB:
+   if (ldata->echo_commit == tail + 2)
+   goto not_yet_stored;
num_chars = echo_buf(ldata, tail + 2);
 
/*
@@ -728,7 +738,9 @@ static size_t __process_echoes(struct tty_struct *tty)
/* If the echo buffer is nearly full (so that the possibility exists
 * of echo 

Re: INFO: rcu detected stall in n_tty_receive_char_special

2018-04-24 Thread Tetsuo Handa
On 2018/04/24 1:01, Tetsuo Handa wrote:
> Bisection reached to commit e052c6d15c61cc4c ("tty: Use unbound workqueue
> for all input workers") added in 4.4-rc1.
> 
> Reverting that commit on top of 4.4 solved the lockup. But unfortunately,
> reverting that commit on top of 4.16 only helped surviving the test longer
> than vanilla 4.16; still stalls at __process_echoes(). There might be similar
> changes which need to be reverted as well.

I noticed that when stall inside __process_echoes() occurs, it is looping at
discard loop added by commit cbfd0340ae199337 ("n_tty: Process echoes in 
blocks").
Since ldata->echo_commit < ldata->echo_tail is for some reason true upon entry,
the discard loop serves as almost infinite loop.

--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -613,12 +613,21 @@ static size_t __process_echoes(struct tty_struct *tty)
int space, old_space;
size_t tail;
unsigned char c;
+   unsigned long start = jiffies;
 
old_space = space = tty_write_room(tty);
 
tail = ldata->echo_tail;
+   if (ldata->echo_commit < tail)
+   printk("BUG1: ldata->echo_commit=%ld tail=%ld space=%d 
old_space=%d\n",
+  ldata->echo_commit, tail, space, old_space);
while (ldata->echo_commit != tail) {
c = echo_buf(ldata, tail);
+   if (time_after(jiffies, start + 5 * HZ)) {
+   printk("ldata->echo_commit=%ld tail=%ld c=%c space=%d 
old_space=%d\n",
+  ldata->echo_commit, tail, c, space, old_space);
+   start = jiffies;
+   }
if (c == ECHO_OP_START) {
unsigned char op;
int no_space_left = 0;
@@ -729,6 +738,11 @@ static size_t __process_echoes(struct tty_struct *tty)
 * of echo overrun before the next commit), then discard enough
 * data at the tail to prevent a subsequent overrun */
while (ldata->echo_commit - tail >= ECHO_DISCARD_WATERMARK) {
+   if (time_after(jiffies, start + 5 * HZ)) {
+   printk("ldata->echo_commit=%ld ldata->echo_tail=%ld 
tail=%ld space=%d old_space=%d\n",
+  ldata->echo_commit, ldata->echo_tail, tail, 
space, old_space);
+   start = jiffies;
+   }
if (echo_buf(ldata, tail) == ECHO_OP_START) {
if (echo_buf(ldata, tail + 1) == ECHO_OP_ERASE_TAB)
tail += 3;
@@ -739,6 +753,9 @@ static size_t __process_echoes(struct tty_struct *tty)
}
 
ldata->echo_tail = tail;
+   if (ldata->echo_commit < tail)
+   printk("BUG2: ldata->echo_commit=%ld tail=%ld space=%d 
old_space=%d\n",
+  ldata->echo_commit, tail, space, old_space);
return old_space - space;
 }
 


[5.890788] input: PC Speaker as /devices/platform/pcspkr/input/input4
[   27.357324] BUG1: ldata->echo_commit=120 tail=130 space=7936 old_space=7936
[   32.358527] ldata->echo_commit=120 ldata->echo_tail=130 tail=2994833619 
space=0 old_space=7936
[   37.362404] ldata->echo_commit=120 ldata->echo_tail=130 tail=5930715049 
space=0 old_space=7936
[   42.366124] ldata->echo_commit=120 ldata->echo_tail=130 tail=8943875147 
space=0 old_space=7936
[   47.370508] ldata->echo_commit=120 ldata->echo_tail=130 tail=11950164125 
space=0 old_space=7936
[   52.374224] ldata->echo_commit=120 ldata->echo_tail=130 tail=14880807667 
space=0 old_space=7936
[   57.378597] ldata->echo_commit=120 ldata->echo_tail=130 tail=17821338400 
space=0 old_space=7936
[   62.383638] ldata->echo_commit=120 ldata->echo_tail=130 tail=20811717629 
space=0 old_space=7936
[   63.203807] sysrq: SysRq : Resetting

[5.843667] input: PC Speaker as /devices/platform/pcspkr/input/input4
[   28.306346] BUG1: ldata->echo_commit=128 tail=130 space=7936 old_space=7936
[   33.307133] ldata->echo_commit=128 ldata->echo_tail=130 tail=2892537244 
space=0 old_space=7936
[   38.311478] ldata->echo_commit=128 ldata->echo_tail=130 tail=5766457276 
space=0 old_space=7936
[   43.315319] ldata->echo_commit=128 ldata->echo_tail=130 tail=8686538551 
space=0 old_space=7936
[   48.319164] ldata->echo_commit=128 ldata->echo_tail=130 tail=11556458920 
space=0 old_space=7936
[   50.122017] sysrq: SysRq : Resetting

[6.026322] input: PC Speaker as /devices/platform/pcspkr/input/input4
[   38.149864] BUG1: ldata->echo_commit=118 tail=130 space=7936 old_space=7936
[   43.150167] ldata->echo_commit=118 ldata->echo_tail=130 tail=2789490540 
space=0 old_space=7936
[   48.154012] ldata->echo_commit=118 ldata->echo_tail=130 tail=5592467000 
space=0 old_space=7936
[   53.158365] ldata->echo_commit=118 ldata->echo_tail=130 tail=8376906144 
space=0 old_space=7936
[   58.162211] ldata->echo_commit=118 ldata->echo_tail=130 tail=11158984120 
space=0 old_space=7936
[   60.487338] sysrq: SysRq : Resetting

[

Re: INFO: rcu detected stall in n_tty_receive_char_special

2018-04-24 Thread Tetsuo Handa
On 2018/04/24 1:01, Tetsuo Handa wrote:
> Bisection reached to commit e052c6d15c61cc4c ("tty: Use unbound workqueue
> for all input workers") added in 4.4-rc1.
> 
> Reverting that commit on top of 4.4 solved the lockup. But unfortunately,
> reverting that commit on top of 4.16 only helped surviving the test longer
> than vanilla 4.16; still stalls at __process_echoes(). There might be similar
> changes which need to be reverted as well.

I noticed that when stall inside __process_echoes() occurs, it is looping at
discard loop added by commit cbfd0340ae199337 ("n_tty: Process echoes in 
blocks").
Since ldata->echo_commit < ldata->echo_tail is for some reason true upon entry,
the discard loop serves as almost infinite loop.

--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -613,12 +613,21 @@ static size_t __process_echoes(struct tty_struct *tty)
int space, old_space;
size_t tail;
unsigned char c;
+   unsigned long start = jiffies;
 
old_space = space = tty_write_room(tty);
 
tail = ldata->echo_tail;
+   if (ldata->echo_commit < tail)
+   printk("BUG1: ldata->echo_commit=%ld tail=%ld space=%d 
old_space=%d\n",
+  ldata->echo_commit, tail, space, old_space);
while (ldata->echo_commit != tail) {
c = echo_buf(ldata, tail);
+   if (time_after(jiffies, start + 5 * HZ)) {
+   printk("ldata->echo_commit=%ld tail=%ld c=%c space=%d 
old_space=%d\n",
+  ldata->echo_commit, tail, c, space, old_space);
+   start = jiffies;
+   }
if (c == ECHO_OP_START) {
unsigned char op;
int no_space_left = 0;
@@ -729,6 +738,11 @@ static size_t __process_echoes(struct tty_struct *tty)
 * of echo overrun before the next commit), then discard enough
 * data at the tail to prevent a subsequent overrun */
while (ldata->echo_commit - tail >= ECHO_DISCARD_WATERMARK) {
+   if (time_after(jiffies, start + 5 * HZ)) {
+   printk("ldata->echo_commit=%ld ldata->echo_tail=%ld 
tail=%ld space=%d old_space=%d\n",
+  ldata->echo_commit, ldata->echo_tail, tail, 
space, old_space);
+   start = jiffies;
+   }
if (echo_buf(ldata, tail) == ECHO_OP_START) {
if (echo_buf(ldata, tail + 1) == ECHO_OP_ERASE_TAB)
tail += 3;
@@ -739,6 +753,9 @@ static size_t __process_echoes(struct tty_struct *tty)
}
 
ldata->echo_tail = tail;
+   if (ldata->echo_commit < tail)
+   printk("BUG2: ldata->echo_commit=%ld tail=%ld space=%d 
old_space=%d\n",
+  ldata->echo_commit, tail, space, old_space);
return old_space - space;
 }
 


[5.890788] input: PC Speaker as /devices/platform/pcspkr/input/input4
[   27.357324] BUG1: ldata->echo_commit=120 tail=130 space=7936 old_space=7936
[   32.358527] ldata->echo_commit=120 ldata->echo_tail=130 tail=2994833619 
space=0 old_space=7936
[   37.362404] ldata->echo_commit=120 ldata->echo_tail=130 tail=5930715049 
space=0 old_space=7936
[   42.366124] ldata->echo_commit=120 ldata->echo_tail=130 tail=8943875147 
space=0 old_space=7936
[   47.370508] ldata->echo_commit=120 ldata->echo_tail=130 tail=11950164125 
space=0 old_space=7936
[   52.374224] ldata->echo_commit=120 ldata->echo_tail=130 tail=14880807667 
space=0 old_space=7936
[   57.378597] ldata->echo_commit=120 ldata->echo_tail=130 tail=17821338400 
space=0 old_space=7936
[   62.383638] ldata->echo_commit=120 ldata->echo_tail=130 tail=20811717629 
space=0 old_space=7936
[   63.203807] sysrq: SysRq : Resetting

[5.843667] input: PC Speaker as /devices/platform/pcspkr/input/input4
[   28.306346] BUG1: ldata->echo_commit=128 tail=130 space=7936 old_space=7936
[   33.307133] ldata->echo_commit=128 ldata->echo_tail=130 tail=2892537244 
space=0 old_space=7936
[   38.311478] ldata->echo_commit=128 ldata->echo_tail=130 tail=5766457276 
space=0 old_space=7936
[   43.315319] ldata->echo_commit=128 ldata->echo_tail=130 tail=8686538551 
space=0 old_space=7936
[   48.319164] ldata->echo_commit=128 ldata->echo_tail=130 tail=11556458920 
space=0 old_space=7936
[   50.122017] sysrq: SysRq : Resetting

[6.026322] input: PC Speaker as /devices/platform/pcspkr/input/input4
[   38.149864] BUG1: ldata->echo_commit=118 tail=130 space=7936 old_space=7936
[   43.150167] ldata->echo_commit=118 ldata->echo_tail=130 tail=2789490540 
space=0 old_space=7936
[   48.154012] ldata->echo_commit=118 ldata->echo_tail=130 tail=5592467000 
space=0 old_space=7936
[   53.158365] ldata->echo_commit=118 ldata->echo_tail=130 tail=8376906144 
space=0 old_space=7936
[   58.162211] ldata->echo_commit=118 ldata->echo_tail=130 tail=11158984120 
space=0 old_space=7936
[   60.487338] sysrq: SysRq : Resetting

[

Re: INFO: rcu detected stall in n_tty_receive_char_special

2018-04-23 Thread Tetsuo Handa
On 2018/04/22 0:34, Tetsuo Handa wrote:
> Dmitry Vyukov wrote:
>> This looks somewhat similar to "INFO: rcu detected stall in 
>> __process_echoes":
>> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> But I am not sure because stall stacks are somewhat different.
> 
> Yes, this will be the same one which can be reproduced using same reproducer
> provided by syzbot. You can try below one like "while :; do echo -n .; 
> ./a.out; done".
> 
> 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> int main(int argc, char *argv[])
> {
>   int fd[2];
>   unsigned int zero = 0;
>   char buf[128];
>   int ptyno = 0;
>   char c = 0;
>   char data[64] = { };
>   while (1) {
>   fd[0] = open("/dev/ptmx", O_RDWR);
>   ioctl(fd[0], TIOCSPTLCK, );
>   write(fd[0], data, sizeof(data));
>   if (ioctl(fd[0], TIOCGPTN, ))
>   break;
>   sprintf(buf, "/dev/pts/%d", ptyno);
>   fd[1] = open(buf, O_RDONLY);
>   if (fd[1] == EOF)
>   break;
>   ioctl(fd[1], TIOCSTI, );
>   }
>   return 1;
> }
> 

Bisection reached to commit e052c6d15c61cc4c ("tty: Use unbound workqueue
for all input workers") added in 4.4-rc1.

Reverting that commit on top of 4.4 solved the lockup. But unfortunately,
reverting that commit on top of 4.16 only helped surviving the test longer
than vanilla 4.16; still stalls at __process_echoes(). There might be similar
changes which need to be reverted as well.



# bad: [8005c49d9aea74d382f474ce11afbbc7d7130bec] Linux 4.4-rc1
# good: [6a13feb9c82803e2b815eca72fa7a9f5561d7861] Linux 4.3
# good: [64291f7db5bd8150a74ad2036f1037e6a0428df2] Linux 4.2
# good: [b953c0d234bc72e8489d3bf51a276c5c4ec85345] Linux 4.1
# good: [39a8804455fb23f09157341d3ba7db6d7ae6ee76] Linux 4.0
# good: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19
# good: [b2776bf7149bddd1f4161f14f79520f17fc1d71d] Linux 3.18
# good: [bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9] Linux 3.17
# good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
# good: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
# good: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
# good: [d8ec26d7f8287f5788a494f56e8814210f0e64be] Linux 3.13
# good: [5e01dc7b26d9f24f39abace5da98ccbd6a5ceb52] Linux 3.12
git bisect start 'HEAD' 'v4.3' 'v4.2' 'v4.1' 'v4.0' 'v3.19' 'v3.18' 'v3.17' 
'v3.16' 'v3.15' 'v3.14' 'v3.13' 'v3.12' '--' 'drivers/tty/'
# good: [cc32382d9fd22dc8eebba4a245f50417267bda8e] serial: imx: make setup_ufcr 
more useful
git bisect good cc32382d9fd22dc8eebba4a245f50417267bda8e
# good: [32ede4a51754cb62b0d43d91cb7c4e3c57069a9c] serial: tegra: Add helper 
function for handling RX buffer
git bisect good 32ede4a51754cb62b0d43d91cb7c4e3c57069a9c
# good: [4b41b9539a1e9531f942ededfcdcff372317d2e7] tty: Prevent tty teardown 
during tty_write_message()
git bisect good 4b41b9539a1e9531f942ededfcdcff372317d2e7
# bad: [fd0d351de7bbd718bc2b34d5846854831aa2b88c] Merge tag 'tty-4.4-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect bad fd0d351de7bbd718bc2b34d5846854831aa2b88c
# good: [14c79092909a52b6fd6394b6ad5e7756c4f9565e] Merge branch 'parisc-4.3-2' 
of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
git bisect good 14c79092909a52b6fd6394b6ad5e7756c4f9565e
# good: [41ecf1404b34d9975eb97f5005d9e4274eaeb76a] Merge tag 
'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
git bisect good 41ecf1404b34d9975eb97f5005d9e4274eaeb76a
# bad: [e052c6d15c61cc4caff2f06cbca72b183da9f15e] tty: Use unbound workqueue 
for all input workers
git bisect bad e052c6d15c61cc4caff2f06cbca72b183da9f15e
# good: [e176058f0de53c2346734e5254835e0045364001] tty: Abstract tty buffer work
git bisect good e176058f0de53c2346734e5254835e0045364001
# first bad commit: [e052c6d15c61cc4caff2f06cbca72b183da9f15e] tty: Use unbound 
workqueue for all input workers


Re: INFO: rcu detected stall in n_tty_receive_char_special

2018-04-23 Thread Tetsuo Handa
On 2018/04/22 0:34, Tetsuo Handa wrote:
> Dmitry Vyukov wrote:
>> This looks somewhat similar to "INFO: rcu detected stall in 
>> __process_echoes":
>> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> But I am not sure because stall stacks are somewhat different.
> 
> Yes, this will be the same one which can be reproduced using same reproducer
> provided by syzbot. You can try below one like "while :; do echo -n .; 
> ./a.out; done".
> 
> 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> int main(int argc, char *argv[])
> {
>   int fd[2];
>   unsigned int zero = 0;
>   char buf[128];
>   int ptyno = 0;
>   char c = 0;
>   char data[64] = { };
>   while (1) {
>   fd[0] = open("/dev/ptmx", O_RDWR);
>   ioctl(fd[0], TIOCSPTLCK, );
>   write(fd[0], data, sizeof(data));
>   if (ioctl(fd[0], TIOCGPTN, ))
>   break;
>   sprintf(buf, "/dev/pts/%d", ptyno);
>   fd[1] = open(buf, O_RDONLY);
>   if (fd[1] == EOF)
>   break;
>   ioctl(fd[1], TIOCSTI, );
>   }
>   return 1;
> }
> 

Bisection reached to commit e052c6d15c61cc4c ("tty: Use unbound workqueue
for all input workers") added in 4.4-rc1.

Reverting that commit on top of 4.4 solved the lockup. But unfortunately,
reverting that commit on top of 4.16 only helped surviving the test longer
than vanilla 4.16; still stalls at __process_echoes(). There might be similar
changes which need to be reverted as well.



# bad: [8005c49d9aea74d382f474ce11afbbc7d7130bec] Linux 4.4-rc1
# good: [6a13feb9c82803e2b815eca72fa7a9f5561d7861] Linux 4.3
# good: [64291f7db5bd8150a74ad2036f1037e6a0428df2] Linux 4.2
# good: [b953c0d234bc72e8489d3bf51a276c5c4ec85345] Linux 4.1
# good: [39a8804455fb23f09157341d3ba7db6d7ae6ee76] Linux 4.0
# good: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19
# good: [b2776bf7149bddd1f4161f14f79520f17fc1d71d] Linux 3.18
# good: [bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9] Linux 3.17
# good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
# good: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
# good: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
# good: [d8ec26d7f8287f5788a494f56e8814210f0e64be] Linux 3.13
# good: [5e01dc7b26d9f24f39abace5da98ccbd6a5ceb52] Linux 3.12
git bisect start 'HEAD' 'v4.3' 'v4.2' 'v4.1' 'v4.0' 'v3.19' 'v3.18' 'v3.17' 
'v3.16' 'v3.15' 'v3.14' 'v3.13' 'v3.12' '--' 'drivers/tty/'
# good: [cc32382d9fd22dc8eebba4a245f50417267bda8e] serial: imx: make setup_ufcr 
more useful
git bisect good cc32382d9fd22dc8eebba4a245f50417267bda8e
# good: [32ede4a51754cb62b0d43d91cb7c4e3c57069a9c] serial: tegra: Add helper 
function for handling RX buffer
git bisect good 32ede4a51754cb62b0d43d91cb7c4e3c57069a9c
# good: [4b41b9539a1e9531f942ededfcdcff372317d2e7] tty: Prevent tty teardown 
during tty_write_message()
git bisect good 4b41b9539a1e9531f942ededfcdcff372317d2e7
# bad: [fd0d351de7bbd718bc2b34d5846854831aa2b88c] Merge tag 'tty-4.4-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect bad fd0d351de7bbd718bc2b34d5846854831aa2b88c
# good: [14c79092909a52b6fd6394b6ad5e7756c4f9565e] Merge branch 'parisc-4.3-2' 
of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
git bisect good 14c79092909a52b6fd6394b6ad5e7756c4f9565e
# good: [41ecf1404b34d9975eb97f5005d9e4274eaeb76a] Merge tag 
'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
git bisect good 41ecf1404b34d9975eb97f5005d9e4274eaeb76a
# bad: [e052c6d15c61cc4caff2f06cbca72b183da9f15e] tty: Use unbound workqueue 
for all input workers
git bisect bad e052c6d15c61cc4caff2f06cbca72b183da9f15e
# good: [e176058f0de53c2346734e5254835e0045364001] tty: Abstract tty buffer work
git bisect good e176058f0de53c2346734e5254835e0045364001
# first bad commit: [e052c6d15c61cc4caff2f06cbca72b183da9f15e] tty: Use unbound 
workqueue for all input workers


Re: INFO: rcu detected stall in n_tty_receive_char_special

2018-04-21 Thread Tetsuo Handa
Dmitry Vyukov wrote:
> This looks somewhat similar to "INFO: rcu detected stall in __process_echoes":
> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> But I am not sure because stall stacks are somewhat different.

Yes, this will be the same one which can be reproduced using same reproducer
provided by syzbot. You can try below one like "while :; do echo -n .; ./a.out; 
done".


#include 
#include 
#include 
#include 
#include 

int main(int argc, char *argv[])
{
int fd[2];
unsigned int zero = 0;
char buf[128];
int ptyno = 0;
char c = 0;
char data[64] = { };
while (1) {
fd[0] = open("/dev/ptmx", O_RDWR);
ioctl(fd[0], TIOCSPTLCK, );
write(fd[0], data, sizeof(data));
if (ioctl(fd[0], TIOCGPTN, ))
break;
sprintf(buf, "/dev/pts/%d", ptyno);
fd[1] = open(buf, O_RDONLY);
if (fd[1] == EOF)
break;
ioctl(fd[1], TIOCSTI, );
}
return 1;
}


Since tty subsystem is prone to race condition, same reproducer can hit
various different results.

[  524.349009] a.out   R  running task0  4052  1 0x8084
[  524.350879] Call Trace:
[  524.351816]  
[  524.352672]  sched_show_task+0x16b/0x230
[  524.353864]  show_state_filter+0x9d/0x190
[  524.355098]  sysrq_handle_showstate+0x7/0x10
[  524.356342]  __handle_sysrq+0x123/0x200
[  524.357566]  sysrq_filter+0x356/0x390
[  524.358710]  input_to_handler+0x4b/0xf0
[  524.359911]  input_pass_values.part.2+0x1ba/0x260
[  524.361267]  input_handle_event+0x11a/0x4f0
[  524.362504]  input_event+0x45/0x60
[  524.363601]  atkbd_interrupt+0x41a/0x6f0
[  524.364815]  serio_interrupt+0x3b/0x80
[  524.365975]  i8042_interrupt+0x1e8/0x3a0
[  524.367197]  ? _raw_spin_unlock_irq+0x24/0x40
[  524.368456]  __handle_irq_event_percpu+0x5b/0xc0
[  524.369797]  handle_irq_event_percpu+0x1b/0x50
[  524.371102]  handle_irq_event+0x2f/0x50
[  524.372245]  handle_edge_irq+0xcc/0x1a0
[  524.373433]  handle_irq+0x11/0x20
[  524.374519]  do_IRQ+0x52/0x100
[  524.375519]  common_interrupt+0xf/0xf
[  524.376648]  
[  524.377473] RIP: 0010:n_tty_receive_char_special+0x4d1/0xad0
[  524.379032] RSP: 0018:c90006d2bd08 EFLAGS: 0293 ORIG_RAX: 
ffd2
[  524.380957] RAX: c90010f7d000 RBX: c90010f7d000 RCX: 0011c8d443f0
[  524.382783] RDX: 03ef RSI: 880126ce3800 RDI: 
[  524.384644] RBP: 880126ce3800 R08: 0012 R09: 
[  524.386479] R10:  R11: 814184f3 R12: 0012
[  524.388300] R13: 880126ce3800 R14: 000a13a9e288 R15: 0012
[  524.390158]  ? n_tty_receive_buf_common+0x43/0xd00
[  524.391567]  ? n_tty_receive_char_special+0x4ea/0xad0
[  524.393052]  ? n_tty_receive_buf_common+0x43/0xd00
[  524.394464]  n_tty_receive_buf_common+0xa43/0xd00
[  524.395832]  tty_ioctl+0x3e7/0x910
[  524.397041]  ? trace_hardirqs_on_caller+0xe7/0x180
[  524.398512]  do_vfs_ioctl+0x8a/0x650
[  524.399666]  ? syscall_trace_enter+0x19e/0x250
[  524.400977]  ksys_ioctl+0x6b/0x80
[  524.402102]  __x64_sys_ioctl+0x11/0x20
[  524.403327]  do_syscall_64+0x4f/0x1f0
[  524.404494]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  524.660246] 7 locks held by a.out/4052:
[  524.661533]  #0: eb1b6dd2 (>ldisc_sem){}, at: 
tty_ldisc_ref_wait+0x1f/0x50
[  524.663752]  #1: 5920aabb (_tty->termios_rwsem/1){}, at: 
n_tty_receive_buf_common+0x43/0xd00
[  524.666284]  #2: fc85438a (>lock){-.-.}, at: 
serio_interrupt+0x1f/0x80
[  524.668422]  #3: 806da945 (&(>event_lock)->rlock){-.-.}, at: 
input_event+0x32/0x60
[  524.670712]  #4: 5906d89a (rcu_read_lock){}, at: 
input_pass_values.part.2+0x0/0x260
[  524.673017]  #5: 5906d89a (rcu_read_lock){}, at: 
__handle_sysrq+0x0/0x200
[  524.675144]  #6: ed0ed6ec (tasklist_lock){.?.+}, at: 
debug_show_all_locks+0x37/0x190



[  122.170234] a.out   R  running task0  2818  1 0x8084
[  122.172442] Call Trace:
[  122.173402]  
[  122.174271]  sched_show_task+0x16b/0x230
[  122.175512]  show_state_filter+0x9d/0x190
[  122.176693]  sysrq_handle_showstate+0x7/0x10
[  122.177924]  __handle_sysrq+0x123/0x200
[  122.179122]  sysrq_filter+0x356/0x390
[  122.180237]  input_to_handler+0x4b/0xf0
[  122.181385]  input_pass_values.part.2+0x1ba/0x260
[  122.182769]  input_handle_event+0x11a/0x4f0
[  122.184131]  input_event+0x45/0x60
[  122.185191]  atkbd_interrupt+0x41a/0x6f0
[  122.186350]  serio_interrupt+0x3b/0x80
[  122.187502]  i8042_interrupt+0x1e8/0x3a0
[  122.188668]  __handle_irq_event_percpu+0x5b/0xc0
[  122.189966]  handle_irq_event_percpu+0x1b/0x50
[  122.191384]  handle_irq_event+0x2f/0x50
[  122.192522]  

Re: INFO: rcu detected stall in n_tty_receive_char_special

2018-04-21 Thread Tetsuo Handa
Dmitry Vyukov wrote:
> This looks somewhat similar to "INFO: rcu detected stall in __process_echoes":
> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> But I am not sure because stall stacks are somewhat different.

Yes, this will be the same one which can be reproduced using same reproducer
provided by syzbot. You can try below one like "while :; do echo -n .; ./a.out; 
done".


#include 
#include 
#include 
#include 
#include 

int main(int argc, char *argv[])
{
int fd[2];
unsigned int zero = 0;
char buf[128];
int ptyno = 0;
char c = 0;
char data[64] = { };
while (1) {
fd[0] = open("/dev/ptmx", O_RDWR);
ioctl(fd[0], TIOCSPTLCK, );
write(fd[0], data, sizeof(data));
if (ioctl(fd[0], TIOCGPTN, ))
break;
sprintf(buf, "/dev/pts/%d", ptyno);
fd[1] = open(buf, O_RDONLY);
if (fd[1] == EOF)
break;
ioctl(fd[1], TIOCSTI, );
}
return 1;
}


Since tty subsystem is prone to race condition, same reproducer can hit
various different results.

[  524.349009] a.out   R  running task0  4052  1 0x8084
[  524.350879] Call Trace:
[  524.351816]  
[  524.352672]  sched_show_task+0x16b/0x230
[  524.353864]  show_state_filter+0x9d/0x190
[  524.355098]  sysrq_handle_showstate+0x7/0x10
[  524.356342]  __handle_sysrq+0x123/0x200
[  524.357566]  sysrq_filter+0x356/0x390
[  524.358710]  input_to_handler+0x4b/0xf0
[  524.359911]  input_pass_values.part.2+0x1ba/0x260
[  524.361267]  input_handle_event+0x11a/0x4f0
[  524.362504]  input_event+0x45/0x60
[  524.363601]  atkbd_interrupt+0x41a/0x6f0
[  524.364815]  serio_interrupt+0x3b/0x80
[  524.365975]  i8042_interrupt+0x1e8/0x3a0
[  524.367197]  ? _raw_spin_unlock_irq+0x24/0x40
[  524.368456]  __handle_irq_event_percpu+0x5b/0xc0
[  524.369797]  handle_irq_event_percpu+0x1b/0x50
[  524.371102]  handle_irq_event+0x2f/0x50
[  524.372245]  handle_edge_irq+0xcc/0x1a0
[  524.373433]  handle_irq+0x11/0x20
[  524.374519]  do_IRQ+0x52/0x100
[  524.375519]  common_interrupt+0xf/0xf
[  524.376648]  
[  524.377473] RIP: 0010:n_tty_receive_char_special+0x4d1/0xad0
[  524.379032] RSP: 0018:c90006d2bd08 EFLAGS: 0293 ORIG_RAX: 
ffd2
[  524.380957] RAX: c90010f7d000 RBX: c90010f7d000 RCX: 0011c8d443f0
[  524.382783] RDX: 03ef RSI: 880126ce3800 RDI: 
[  524.384644] RBP: 880126ce3800 R08: 0012 R09: 
[  524.386479] R10:  R11: 814184f3 R12: 0012
[  524.388300] R13: 880126ce3800 R14: 000a13a9e288 R15: 0012
[  524.390158]  ? n_tty_receive_buf_common+0x43/0xd00
[  524.391567]  ? n_tty_receive_char_special+0x4ea/0xad0
[  524.393052]  ? n_tty_receive_buf_common+0x43/0xd00
[  524.394464]  n_tty_receive_buf_common+0xa43/0xd00
[  524.395832]  tty_ioctl+0x3e7/0x910
[  524.397041]  ? trace_hardirqs_on_caller+0xe7/0x180
[  524.398512]  do_vfs_ioctl+0x8a/0x650
[  524.399666]  ? syscall_trace_enter+0x19e/0x250
[  524.400977]  ksys_ioctl+0x6b/0x80
[  524.402102]  __x64_sys_ioctl+0x11/0x20
[  524.403327]  do_syscall_64+0x4f/0x1f0
[  524.404494]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  524.660246] 7 locks held by a.out/4052:
[  524.661533]  #0: eb1b6dd2 (>ldisc_sem){}, at: 
tty_ldisc_ref_wait+0x1f/0x50
[  524.663752]  #1: 5920aabb (_tty->termios_rwsem/1){}, at: 
n_tty_receive_buf_common+0x43/0xd00
[  524.666284]  #2: fc85438a (>lock){-.-.}, at: 
serio_interrupt+0x1f/0x80
[  524.668422]  #3: 806da945 (&(>event_lock)->rlock){-.-.}, at: 
input_event+0x32/0x60
[  524.670712]  #4: 5906d89a (rcu_read_lock){}, at: 
input_pass_values.part.2+0x0/0x260
[  524.673017]  #5: 5906d89a (rcu_read_lock){}, at: 
__handle_sysrq+0x0/0x200
[  524.675144]  #6: ed0ed6ec (tasklist_lock){.?.+}, at: 
debug_show_all_locks+0x37/0x190



[  122.170234] a.out   R  running task0  2818  1 0x8084
[  122.172442] Call Trace:
[  122.173402]  
[  122.174271]  sched_show_task+0x16b/0x230
[  122.175512]  show_state_filter+0x9d/0x190
[  122.176693]  sysrq_handle_showstate+0x7/0x10
[  122.177924]  __handle_sysrq+0x123/0x200
[  122.179122]  sysrq_filter+0x356/0x390
[  122.180237]  input_to_handler+0x4b/0xf0
[  122.181385]  input_pass_values.part.2+0x1ba/0x260
[  122.182769]  input_handle_event+0x11a/0x4f0
[  122.184131]  input_event+0x45/0x60
[  122.185191]  atkbd_interrupt+0x41a/0x6f0
[  122.186350]  serio_interrupt+0x3b/0x80
[  122.187502]  i8042_interrupt+0x1e8/0x3a0
[  122.188668]  __handle_irq_event_percpu+0x5b/0xc0
[  122.189966]  handle_irq_event_percpu+0x1b/0x50
[  122.191384]  handle_irq_event+0x2f/0x50
[  122.192522]  

Re: INFO: rcu detected stall in n_tty_receive_char_special

2018-04-15 Thread syzbot

syzbot has found reproducer for the following crash on upstream commit
18b7fd1c93e5204355ddbf2608a097d64df81b88 (Sat Apr 14 15:50:50 2018 +)
Merge branch 'akpm' (patches from Andrew)
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=18df353d7540aa6b5467


So far this crash happened 2 times on upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5283196245639168
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=5978502260064256
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=4993708470566912
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-8852471259444315113

compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+18df353d7540aa6b5...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed.

INFO: rcu_sched self-detected stall on CPU
	1-: (124999 ticks this GP) idle=792/1/4611686018427387906  
softirq=9920/9920 fqs=31243

 (t=125000 jiffies g=4830 c=4829 q=20)
NMI backtrace for cpu 1
CPU: 1 PID: 4467 Comm: syzkaller756606 Not tainted 4.16.0+ #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
 nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
 rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
 print_cpu_stall kernel/rcu/tree.c:1525 [inline]
 check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
 __rcu_pending kernel/rcu/tree.c:3356 [inline]
 rcu_pending kernel/rcu/tree.c:3401 [inline]
 rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
 update_process_times+0x2d/0x70 kernel/time/timer.c:1636
 tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:173
 tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1283
 __run_hrtimer kernel/time/hrtimer.c:1386 [inline]
 __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1448
 hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1506
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
 smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
 
RIP: 0010:echo_char+0x96/0x2e0 drivers/tty/n_tty.c:915
RSP: 0018:8801b079f6f8 EFLAGS: 0293 ORIG_RAX: ff13
RAX: 8801b06ee1c0 RBX: c90001e64000 RCX: 837ab4a7
RDX:  RSI: 837ab4b6 RDI: 0001
RBP: 8801b079f728 R08: 8801b06ee1c0 R09: 0001
R10: 8801b06eea20 R11: 8801b06ee1c0 R12: 000f
R13: 8801b02b0dc0 R14: 00012b1bcc47 R15: 0c47
 n_tty_receive_char_special+0x13b3/0x31c0 drivers/tty/n_tty.c:1306
 n_tty_receive_buf_fast drivers/tty/n_tty.c:1577 [inline]
 __receive_buf drivers/tty/n_tty.c:1611 [inline]
 n_tty_receive_buf_common+0x20ca/0x2c50 drivers/tty/n_tty.c:1709
 n_tty_receive_buf+0x30/0x40 drivers/tty/n_tty.c:1738
 tiocsti drivers/tty/tty_io.c:2171 [inline]
 tty_ioctl+0x7e7/0x1870 drivers/tty/tty_io.c:2557
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:500 [inline]
 do_vfs_ioctl+0x1cf/0x16a0 fs/ioctl.c:684
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
 SYSC_ioctl fs/ioctl.c:708 [inline]
 SyS_ioctl+0x24/0x30 fs/ioctl.c:706
 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x4404b9
RSP: 002b:7ffee94bd7f8 EFLAGS: 0213 ORIG_RAX: 0010
RAX: ffda RBX: 26c0 RCX: 004404b9
RDX: 20c0 RSI: 5412 RDI: 0012
RBP: 6d74702f7665642f R08: 000a R09: 
R10:  R11: 0213 R12: 00401c30
R13: 00401cc0 R14:  R15: 



Re: INFO: rcu detected stall in n_tty_receive_char_special

2018-04-15 Thread syzbot

syzbot has found reproducer for the following crash on upstream commit
18b7fd1c93e5204355ddbf2608a097d64df81b88 (Sat Apr 14 15:50:50 2018 +)
Merge branch 'akpm' (patches from Andrew)
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=18df353d7540aa6b5467


So far this crash happened 2 times on upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5283196245639168
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=5978502260064256
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=4993708470566912
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-8852471259444315113

compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+18df353d7540aa6b5...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed.

INFO: rcu_sched self-detected stall on CPU
	1-: (124999 ticks this GP) idle=792/1/4611686018427387906  
softirq=9920/9920 fqs=31243

 (t=125000 jiffies g=4830 c=4829 q=20)
NMI backtrace for cpu 1
CPU: 1 PID: 4467 Comm: syzkaller756606 Not tainted 4.16.0+ #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
 nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
 rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
 print_cpu_stall kernel/rcu/tree.c:1525 [inline]
 check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
 __rcu_pending kernel/rcu/tree.c:3356 [inline]
 rcu_pending kernel/rcu/tree.c:3401 [inline]
 rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
 update_process_times+0x2d/0x70 kernel/time/timer.c:1636
 tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:173
 tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1283
 __run_hrtimer kernel/time/hrtimer.c:1386 [inline]
 __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1448
 hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1506
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
 smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
 
RIP: 0010:echo_char+0x96/0x2e0 drivers/tty/n_tty.c:915
RSP: 0018:8801b079f6f8 EFLAGS: 0293 ORIG_RAX: ff13
RAX: 8801b06ee1c0 RBX: c90001e64000 RCX: 837ab4a7
RDX:  RSI: 837ab4b6 RDI: 0001
RBP: 8801b079f728 R08: 8801b06ee1c0 R09: 0001
R10: 8801b06eea20 R11: 8801b06ee1c0 R12: 000f
R13: 8801b02b0dc0 R14: 00012b1bcc47 R15: 0c47
 n_tty_receive_char_special+0x13b3/0x31c0 drivers/tty/n_tty.c:1306
 n_tty_receive_buf_fast drivers/tty/n_tty.c:1577 [inline]
 __receive_buf drivers/tty/n_tty.c:1611 [inline]
 n_tty_receive_buf_common+0x20ca/0x2c50 drivers/tty/n_tty.c:1709
 n_tty_receive_buf+0x30/0x40 drivers/tty/n_tty.c:1738
 tiocsti drivers/tty/tty_io.c:2171 [inline]
 tty_ioctl+0x7e7/0x1870 drivers/tty/tty_io.c:2557
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:500 [inline]
 do_vfs_ioctl+0x1cf/0x16a0 fs/ioctl.c:684
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
 SYSC_ioctl fs/ioctl.c:708 [inline]
 SyS_ioctl+0x24/0x30 fs/ioctl.c:706
 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x4404b9
RSP: 002b:7ffee94bd7f8 EFLAGS: 0213 ORIG_RAX: 0010
RAX: ffda RBX: 26c0 RCX: 004404b9
RDX: 20c0 RSI: 5412 RDI: 0012
RBP: 6d74702f7665642f R08: 000a R09: 
R10:  R11: 0213 R12: 00401c30
R13: 00401cc0 R14:  R15: 



Re: INFO: rcu detected stall in n_tty_receive_char_special

2018-04-06 Thread Dmitry Vyukov
On Fri, Apr 6, 2018 at 9:12 AM, syzbot
 wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91 (Sat Mar 31 01:52:36 2018 +)
> kernel.h: Retain constant expression output for max()/min()
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=18df353d7540aa6b5467
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=5836679554269184
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-1647968177339044852
> compiler: gcc (GCC) 8.0.1 20180301 (experimental)
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+18df353d7540aa6b5...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.


This looks somewhat similar to "INFO: rcu detected stall in __process_echoes":
https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
But I am not sure because stall stacks are somewhat different.


> INFO: rcu_sched detected stalls on CPUs/tasks:
> (detected by 1, t=125007 jiffies, g=42488, c=42487, q=11)
> All QSes seen, last rcu_sched kthread activity 125014
> (4295022441-4294897427), jiffies_till_next_fqs=3, root ->qsmask 0x0
> kworker/u4:5R  running task15272  8806  2 0x8008
> Workqueue: events_unbound flush_to_ldisc
> Call Trace:
>  
>  sched_show_task.cold.87+0x27a/0x301 kernel/sched/core.c:5325
>  print_other_cpu_stall.cold.79+0x92f/0x9d2 kernel/rcu/tree.c:1481
>  check_cpu_stall.isra.61+0x706/0xf50 kernel/rcu/tree.c:1599
>  __rcu_pending kernel/rcu/tree.c:3356 [inline]
>  rcu_pending kernel/rcu/tree.c:3401 [inline]
>  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
>  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>  tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:171
>  tick_sched_timer+0x42/0x130 kernel/time/tick-sched.c:1179
>  __run_hrtimer kernel/time/hrtimer.c:1337 [inline]
>  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1399
>  hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1457
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
>  
> RIP: 0010:echo_char+0xae/0x2e0 drivers/tty/n_tty.c:915
> RSP: 0018:8801d33e71e0 EFLAGS: 0a07 ORIG_RAX: ff13
> RAX: dc00 RBX: c90013158000 RCX: 8375b1b7
> RDX: 11003ad87636 RSI: 8375b1c6 RDI: 8801d6c3b1b4
> RBP: 8801d33e7210 R08: 8801cf482540 R09: f5200262b460
> R10: f5200262b460 R11: c9001315a307 R12: 00cb
> R13: 8801d6c3ae00 R14: c240f0bb R15: 00bb
>  n_tty_receive_char_special+0x13b3/0x31c0 drivers/tty/n_tty.c:1306
>  n_tty_receive_buf_fast drivers/tty/n_tty.c:1577 [inline]
>  __receive_buf drivers/tty/n_tty.c:1611 [inline]
>  n_tty_receive_buf_common+0x20ca/0x2c50 drivers/tty/n_tty.c:1709
>  n_tty_receive_buf2+0x33/0x40 drivers/tty/n_tty.c:1744
>  tty_ldisc_receive_buf+0xb0/0x190 drivers/tty/tty_buffer.c:456
>  tty_port_default_receive_buf+0x110/0x170 drivers/tty/tty_port.c:38
>  receive_buf drivers/tty/tty_buffer.c:475 [inline]
>  flush_to_ldisc+0x3e9/0x560 drivers/tty/tty_buffer.c:524
>  process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
>  worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
>  kthread+0x345/0x410 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
> rcu_sched kthread starved for 125626 jiffies! g42488 c42487 f0x2
> RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
> RCU grace-period kthread stack dump:
> rcu_sched   R  running task23592 9  2 0x8000
> Call Trace:
>  context_switch kernel/sched/core.c:2848 [inline]
>  __schedule+0x807/0x1e40 kernel/sched/core.c:3490
>  schedule+0xef/0x430 kernel/sched/core.c:3549
>  schedule_timeout+0x138/0x240 kernel/time/timer.c:1801
>  rcu_gp_kthread+0x6b5/0x1940 kernel/rcu/tree.c:2231
>  kthread+0x345/0x410 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start 

Re: INFO: rcu detected stall in n_tty_receive_char_special

2018-04-06 Thread Dmitry Vyukov
On Fri, Apr 6, 2018 at 9:12 AM, syzbot
 wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91 (Sat Mar 31 01:52:36 2018 +)
> kernel.h: Retain constant expression output for max()/min()
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=18df353d7540aa6b5467
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=5836679554269184
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-1647968177339044852
> compiler: gcc (GCC) 8.0.1 20180301 (experimental)
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+18df353d7540aa6b5...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.


This looks somewhat similar to "INFO: rcu detected stall in __process_echoes":
https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
But I am not sure because stall stacks are somewhat different.


> INFO: rcu_sched detected stalls on CPUs/tasks:
> (detected by 1, t=125007 jiffies, g=42488, c=42487, q=11)
> All QSes seen, last rcu_sched kthread activity 125014
> (4295022441-4294897427), jiffies_till_next_fqs=3, root ->qsmask 0x0
> kworker/u4:5R  running task15272  8806  2 0x8008
> Workqueue: events_unbound flush_to_ldisc
> Call Trace:
>  
>  sched_show_task.cold.87+0x27a/0x301 kernel/sched/core.c:5325
>  print_other_cpu_stall.cold.79+0x92f/0x9d2 kernel/rcu/tree.c:1481
>  check_cpu_stall.isra.61+0x706/0xf50 kernel/rcu/tree.c:1599
>  __rcu_pending kernel/rcu/tree.c:3356 [inline]
>  rcu_pending kernel/rcu/tree.c:3401 [inline]
>  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
>  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>  tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:171
>  tick_sched_timer+0x42/0x130 kernel/time/tick-sched.c:1179
>  __run_hrtimer kernel/time/hrtimer.c:1337 [inline]
>  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1399
>  hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1457
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
>  
> RIP: 0010:echo_char+0xae/0x2e0 drivers/tty/n_tty.c:915
> RSP: 0018:8801d33e71e0 EFLAGS: 0a07 ORIG_RAX: ff13
> RAX: dc00 RBX: c90013158000 RCX: 8375b1b7
> RDX: 11003ad87636 RSI: 8375b1c6 RDI: 8801d6c3b1b4
> RBP: 8801d33e7210 R08: 8801cf482540 R09: f5200262b460
> R10: f5200262b460 R11: c9001315a307 R12: 00cb
> R13: 8801d6c3ae00 R14: c240f0bb R15: 00bb
>  n_tty_receive_char_special+0x13b3/0x31c0 drivers/tty/n_tty.c:1306
>  n_tty_receive_buf_fast drivers/tty/n_tty.c:1577 [inline]
>  __receive_buf drivers/tty/n_tty.c:1611 [inline]
>  n_tty_receive_buf_common+0x20ca/0x2c50 drivers/tty/n_tty.c:1709
>  n_tty_receive_buf2+0x33/0x40 drivers/tty/n_tty.c:1744
>  tty_ldisc_receive_buf+0xb0/0x190 drivers/tty/tty_buffer.c:456
>  tty_port_default_receive_buf+0x110/0x170 drivers/tty/tty_port.c:38
>  receive_buf drivers/tty/tty_buffer.c:475 [inline]
>  flush_to_ldisc+0x3e9/0x560 drivers/tty/tty_buffer.c:524
>  process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
>  worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
>  kthread+0x345/0x410 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
> rcu_sched kthread starved for 125626 jiffies! g42488 c42487 f0x2
> RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
> RCU grace-period kthread stack dump:
> rcu_sched   R  running task23592 9  2 0x8000
> Call Trace:
>  context_switch kernel/sched/core.c:2848 [inline]
>  __schedule+0x807/0x1e40 kernel/sched/core.c:3490
>  schedule+0xef/0x430 kernel/sched/core.c:3549
>  schedule_timeout+0x138/0x240 kernel/time/timer.c:1801
>  rcu_gp_kthread+0x6b5/0x1940 kernel/rcu/tree.c:2231
>  kthread+0x345/0x410 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.
>
> --
>