Re: [PATCH] kernel/hung_task: Add a whitelist and blacklist mechanism.
On 2021/04/19 11:22, 周传高 wrote: > Some SOC vendors' drivers or user-mode processes may be in D state for a long > time, > and normally they do not configure HUNG TASK, so we need to ignore these > tasks if > we use HUNG TASK. Isn't that a sign that the quality of the drivers and user-mode processes is poor? Wait should be killable/interruptible if possible (for better response to e.g. OOM-killer).
Re: [syzbot] unexpected kernel reboot (4)
On 2021/04/17 12:19, Tetsuo Handa wrote: > On 2021/04/15 0:39, Andrey Konovalov wrote: >> On Wed, Apr 14, 2021 at 7:45 AM Dmitry Vyukov wrote: >>> The reproducer connects some USB HID device and communicates with the >>> driver. >>> Previously we observed reboots because HID devices can trigger reboot >>> SYSRQ, but we disable it with "CONFIG_MAGIC_SYSRQ is not set". >>> How else can a USB device reboot the machine? Is it possible to disable it? >>> I don't see any direct includes of in drivers/usb/* >> >> This happens when a keyboard sends the Ctrl+Alt+Del sequence, see >> fn_boot_it()->ctrl_alt_del() in drivers/tty/vt/keyboard.c. >> > > Regarding ctrl_alt_del() problem, doing > > sh -c 'echo 0 > /proc/sys/kernel/ctrl-alt-del; echo $$ > > /proc/sys/kernel/cad_pid' > > as root before start fuzzing might help. > > Also, with the command above, reproducer still triggers suspend operation > which freezes userspace processes. > This could possibly be one of causes for no output / lost connections. Try > disabling freeze/suspend related configs? > I tried # CONFIG_HIBERNATION is not set # CONFIG_SUSPEND is not set # CONFIG_PM is not set but the reproducer still shutdowns the system. The reproducer can be simplified to single threaded. int main(int argc, char *argv[]) { uint64_t r[1] = { 0x }; syscall(__NR_mmap, 0x2000ul, 0x100ul, 7ul, 0x32ul, -1, 0ul); memcpy((void*)0x2200, "\x12\x01\x00\x00\x00\x00\x00\x40\x26\x09\x33\x33\x40\x00\x00\x00\x00" "\x01\x09\x02\x24\x00\x01\x00\x00\x00\x00\x09\x04\x00\x00\x01\x03\x01" "\x00\x00\x09\x21\x00\x00\x00\x01\x22\x01\x00\x09\x05\x81\x03\x08", 50); r[0] = syz_usb_connect(0, 0x36, 0x2200, 0); syz_usb_control_io(r[0], 0, 0); *(uint32_t*)0x2300 = 0x24; *(uint64_t*)0x2304 = 0; *(uint64_t*)0x230c = 0; *(uint64_t*)0x2314 = 0x21c0; *(uint8_t*)0x21c0 = 0; *(uint8_t*)0x21c1 = 0x22; *(uint32_t*)0x21c2 = 5; STORE_BY_BITMASK(uint8_t, , 0x21c6, 3, 0, 2); STORE_BY_BITMASK(uint8_t, , 0x21c6, 0, 2, 2); STORE_BY_BITMASK(uint8_t, , 0x21c6, 0, 4, 4); memcpy((void*)0x21c7, "\x65\xe9\x5e\x9d", 4); *(uint64_t*)0x231c = 0; syz_usb_control_io(r[0], 0x2300, 0); syz_usb_ep_write(r[0], 0, 5, (long) "\x8e\xc5\x44\xbc\x66"); return 0; } Since applying below diff does not help, I don't think that this problem can be avoided by modifying tty code. diff --git a/drivers/tty/vt/keyboard.c b/drivers/tty/vt/keyboard.c index 77638629c562..25e6d52f5de6 100644 --- a/drivers/tty/vt/keyboard.c +++ b/drivers/tty/vt/keyboard.c @@ -323,12 +323,14 @@ int kbd_rate(struct kbd_repeat *rpt) */ static void put_queue(struct vc_data *vc, int ch) { + return; tty_insert_flip_char(>port, ch, 0); tty_schedule_flip(>port); } static void puts_queue(struct vc_data *vc, const char *cp) { + return; tty_insert_flip_string(>port, cp, strlen(cp)); tty_schedule_flip(>port); } @@ -657,7 +659,7 @@ static void k_spec(struct vc_data *vc, unsigned char value, char up_flag) kbd->kbdmode == VC_OFF) && value != KVAL(K_SAK)) return; /* SAK is allowed even in raw mode */ - fn_handler[value](vc); + //fn_handler[value](vc); } static void k_lowercase(struct vc_data *vc, unsigned char value, char up_flag) @@ -1393,6 +1395,7 @@ static void kbd_keycode(unsigned int keycode, int down, bool hw_raw) struct keyboard_notifier_param param = { .vc = vc, .value = keycode, .down = down }; int rc; + return; tty = vc->port.tty; if (tty && (!tty->driver_data)) { @@ -1509,7 +1512,7 @@ static void kbd_keycode(unsigned int keycode, int down, bool hw_raw) if ((raw_mode || kbd->kbdmode == VC_OFF) && type != KT_SPEC && type != KT_SHIFT) return; - (*k_handler[type])(vc, keysym & 0xff, !down); + //(*k_handler[type])(vc, keysym & 0xff, !down); param.ledstate = kbd->ledflagstate; atomic_notifier_call_chain(_notifier_list, KBD_POST_KEYSYM, ); Running the reproducer with dbus-monitor --system > /dev/ttyprintk reports that systemd's shutdown service is requested when syz_usb_ep_write() is called. I can't understand why the command "\x8e\xc5\x44\xbc\x66" leads to this result.
Re: [PATCH] kernel/hung_task: Add a whitelist and blacklist mechanism.
On 2021/04/17 23:13, zhouchuangao wrote: > The main purpose of this patch is to add a whitelist and blacklist > mechanism to the hung task thread. We stopped using the term 'whitelist'/'blacklist' for new code in Linux kernel, and what you are proposing is something like 'ignorelist'/'fatallist'. I think that matching based on comm name is poor, for comm name is subjected to impersonation by malicious user processes. Moreover, speak of syzkaller testing, most of hang task reports are reaction to somebody else consuming too much CPU resources (e.g. printk() flooding, too many pending workqueue requests). Even if some process is in 'ignorelist', it is possible that some problem that should be reported is already happening. Even if some process is in 'fatallist', it is possible that the cause of hang is simply somebody else is consuming too much CPU. By the way, I wish that khungtaskd can report recent top CPU consumers, for it is rare that the cause of hung is locking dependency problems / hardware problems.
Re: [syzbot] unexpected kernel reboot (4)
On 2021/04/15 0:39, Andrey Konovalov wrote: > On Wed, Apr 14, 2021 at 7:45 AM Dmitry Vyukov wrote: >> The reproducer connects some USB HID device and communicates with the driver. >> Previously we observed reboots because HID devices can trigger reboot >> SYSRQ, but we disable it with "CONFIG_MAGIC_SYSRQ is not set". >> How else can a USB device reboot the machine? Is it possible to disable it? >> I don't see any direct includes of in drivers/usb/* > > This happens when a keyboard sends the Ctrl+Alt+Del sequence, see > fn_boot_it()->ctrl_alt_del() in drivers/tty/vt/keyboard.c. > Regarding ctrl_alt_del() problem, doing sh -c 'echo 0 > /proc/sys/kernel/ctrl-alt-del; echo $$ > /proc/sys/kernel/cad_pid' as root before start fuzzing might help. Also, with the command above, reproducer still triggers suspend operation which freezes userspace processes. This could possibly be one of causes for no output / lost connections. Try disabling freeze/suspend related configs? [ 60.881255][ T6280] usb 5-1: new high-speed USB device number 2 using dummy_hcd [ 61.260648][ T6280] usb 5-1: config 0 interface 0 altsetting 0 endpoint 0x81 has an invalid bInterval 0, changing to 7 [ 61.274056][ T6280] usb 5-1: New USB device found, idVendor=0926, idProduct=, bcdDevice= 0.40 [ 61.284700][ T6280] usb 5-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0 [ 61.289556][ T6280] usb 5-1: config 0 descriptor?? [ 61.780871][ T6280] keytouch 0003:0926:.0002: fixing up Keytouch IEC report descriptor [ 61.792015][ T6280] input: HID 0926: as /devices/platform/dummy_hcd.0/usb5/5-1/5-1:0.0/0003:0926:.0002/input/input5 [ 61.871612][ T6280] keytouch 0003:0926:.0002: input,hidraw1: USB HID v0.00 Keyboard [HID 0926:] on usb-dummy_hcd.0-1/input0 [ 62.137706][ T6847] PM: suspend entry (s2idle) [ 62.147914][ T6847] Filesystems sync: 0.007 seconds [ 62.152031][ T6847] Freezing user space processes ... (elapsed 0.003 seconds) done. [ 62.158369][ T6847] OOM killer disabled. [ 62.159673][ T6847] Freezing remaining freezable tasks ... (elapsed 0.003 seconds) done. [ 62.167440][ T6847] vhci_hcd vhci_hcd.15: suspend vhci_hcd [ 62.169569][ T6847] vhci_hcd vhci_hcd.14: suspend vhci_hcd [ 62.171562][ T6847] vhci_hcd vhci_hcd.13: suspend vhci_hcd [ 62.173500][ T6847] vhci_hcd vhci_hcd.12: suspend vhci_hcd [ 62.175740][ T6847] vhci_hcd vhci_hcd.11: suspend vhci_hcd [ 62.177677][ T6847] vhci_hcd vhci_hcd.10: suspend vhci_hcd [ 62.179725][ T6847] vhci_hcd vhci_hcd.9: suspend vhci_hcd [ 62.181602][ T6847] vhci_hcd vhci_hcd.8: suspend vhci_hcd [ 62.183681][ T6847] vhci_hcd vhci_hcd.7: suspend vhci_hcd [ 62.185594][ T6847] vhci_hcd vhci_hcd.6: suspend vhci_hcd [ 62.187552][ T6847] vhci_hcd vhci_hcd.5: suspend vhci_hcd [ 62.189566][ T6847] vhci_hcd vhci_hcd.4: suspend vhci_hcd [ 62.191767][ T6847] vhci_hcd vhci_hcd.3: suspend vhci_hcd [ 62.193657][ T6847] vhci_hcd vhci_hcd.2: suspend vhci_hcd [ 62.195634][ T6847] vhci_hcd vhci_hcd.1: suspend vhci_hcd [ 62.197430][ T6847] vhci_hcd vhci_hcd.0: suspend vhci_hcd [ 62.249881][T8] mptbase: ioc0: pci-suspend: pdev=0x888005495000, slot=:00:10.0, Entering operating state [D0]
Re: [PATCH 1/5] xattr: Complete constify ->name member of "struct xattr"
On 2021/04/15 19:04, Roberto Sassu wrote: > This patch completes commit 9548906b2bb7 ('xattr: Constify ->name member of > "struct xattr"'). It fixes the documentation of the inode_init_security > hook, by removing the xattr name from the objects that are expected to be > allocated by LSMs (only the value is allocated). Also, it removes the > kfree() of name and setting it to NULL in the reiserfs code. Good catch, but well, grep does not find any reiserfs_security_free() callers. Is reiserfs_security_free() a dead code?
[PATCH] ttyprintk: Add TTY hangup callback.
syzbot is reporting hung task due to flood of tty_warn(tty, "%s: tty->count = 1 port count = %d\n", __func__, port->count); message [1], for ioctl(TIOCVHANGUP) prevents tty_port_close() from decrementing port->count due to tty_hung_up_p() == true. -- #include #include #include #include #include int main(int argc, char *argv[]) { int i; int fd[10]; for (i = 0; i < 10; i++) fd[i] = open("/dev/ttyprintk", O_WRONLY); ioctl(fd[0], TIOCVHANGUP); for (i = 0; i < 10; i++) close(fd[i]); close(open("/dev/ttyprintk", O_WRONLY)); return 0; } -- When TTY hangup happens, port->count needs to be reset via "struct tty_operations"->hangup callback. [1] https://syzkaller.appspot.com/bug?id=39ea6caa479af471183997376dc7e90bc7d64a6a Reported-by: syzbot Reported-by: syzbot Tested-by: syzbot Signed-off-by: Tetsuo Handa Fixes: 24b4b67d17c308aa ("add ttyprintk driver") --- drivers/char/ttyprintk.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/char/ttyprintk.c b/drivers/char/ttyprintk.c index 6a0059e508e3..93f5d11c830b 100644 --- a/drivers/char/ttyprintk.c +++ b/drivers/char/ttyprintk.c @@ -158,12 +158,23 @@ static int tpk_ioctl(struct tty_struct *tty, return 0; } +/* + * TTY operations hangup function. + */ +static void tpk_hangup(struct tty_struct *tty) +{ + struct ttyprintk_port *tpkp = tty->driver_data; + + tty_port_hangup(>port); +} + static const struct tty_operations ttyprintk_ops = { .open = tpk_open, .close = tpk_close, .write = tpk_write, .write_room = tpk_write_room, .ioctl = tpk_ioctl, + .hangup = tpk_hangup, }; static const struct tty_port_operations null_ops = { }; -- 2.18.4
Re: [syzbot] unexpected kernel reboot (4)
On 2021/04/15 0:39, Andrey Konovalov wrote: > On Wed, Apr 14, 2021 at 7:45 AM Dmitry Vyukov wrote: >> >> On Tue, Apr 13, 2021 at 11:27 PM syzbot >> wrote: >>> >>> Hello, >>> >>> syzbot found the following issue on: >>> >>> HEAD commit:89698bec Merge tag 'm68knommu-for-v5.12-rc7' of git://git... >>> git tree: upstream >>> console output: https://syzkaller.appspot.com/x/log.txt?x=1243fcfed0 >>> kernel config: https://syzkaller.appspot.com/x/.config?x=b234ddbbe2953747 >>> dashboard link: https://syzkaller.appspot.com/bug?extid=9ce030d4c89856b27619 >>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=173e92fed0 >>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1735da2ed0 >>> >>> IMPORTANT: if you fix the issue, please add the following tag to the commit: >>> Reported-by: syzbot+9ce030d4c89856b27...@syzkaller.appspotmail.com >>> >>> output_len: 0x0e74eb68 >>> kernel_total_size: 0x0f226000 >>> needed_size: 0x0f40 >>> trampoline_32bit: 0x0009d000 >>> Decompressing Linux... Parsing ELF... done. >>> Booting the kernel. >> >> +linux-input >> >> The reproducer connects some USB HID device and communicates with the driver. >> Previously we observed reboots because HID devices can trigger reboot >> SYSRQ, but we disable it with "CONFIG_MAGIC_SYSRQ is not set". >> How else can a USB device reboot the machine? Is it possible to disable it? >> I don't see any direct includes of in drivers/usb/* > > This happens when a keyboard sends the Ctrl+Alt+Del sequence, see > fn_boot_it()->ctrl_alt_del() in drivers/tty/vt/keyboard.c. > > There was a patchset by Tetsuo [1] to suppress this, but I think it > was abandoned. Not abandoned; I'm waiting for you to join the discussion. But for right now I'm trying to merge LOCKDEP's capacity tuning patch for https://github.com/google/syzkaller/pull/2535 in the next merge window. I still believe that ctrl_alt_del() etc. should be initially controlled via kernel config options (despite Linus's objection at https://lore.kernel.org/lkml/CAHk-=wgz=7MGxxX-tmMmdCsKyYJkuyxNc-4uLP=e_eEV=oz...@mail.gmail.com/ ), for we will need several trials and errors (and an effort to avoid kernel size bloating like https://lkml.kernel.org/r/YD57hjaSmsYapHnQ@alley still remains) before we can determine usable units for allow toggling via kernel command line options. > > (This reminds of a somewhat related syzkaller issue: > https://github.com/google/syzkaller/issues/1824; it relies on a > similar reproducer.) > > [1] https://groups.google.com/g/syzkaller/c/7wCmrGlLgm0/m/5yG6HVtbBQAJ >
Re: How to handle concurrent access to /dev/ttyprintk ?
On 2021/04/14 9:45, Tetsuo Handa wrote: > On 2021/04/12 21:04, Greg Kroah-Hartman wrote: >>> Since syzkaller is a fuzzer, syzkaller happily opens /dev/ttyprintk from >>> multiple threads. Should we update syzkaller to use CONFIG_TTY_PRINTK=n ? >> >> Why? Can you not hit the same tty code paths from any other tty driver >> being open multiple times? Why is ttyprintk somehow "special" here? > > I found a simplified reproducer. If we call ioctl(TIOCVHANGUP) on > /dev/ttyprintk , > "struct ttyprintk_port tpk_port".port.count cannot be decremented by > tty_port_close() from tpk_close() due to tty_hung_up_p() == true when > close() is called. As a result, tty->count and port count gets out of sync. > > Then, when /dev/ttyprintk is opened again and then closed without calling > ioctl(TIOCVHANGUP), this message is printed due to tty_hung_up_p() == false. > > If I replace /dev/ttyprintk with /dev/ttyS0 in the reproducer shown above, > this message is not printed. > The difference between /dev/ttyS0 and /dev/ttyprintk is that the former provides uart_hangup() as "struct tty_operations"->hangup while the latter does not provide "struct tty_operations"->hangup . It seems to me that below patch avoids this message, but I'm not familiar with tty code. Is this fix correct? Don't we need to enforce all drivers to provide "struct tty_operations"->hangup in order to reset port count ? diff --git a/drivers/char/ttyprintk.c b/drivers/char/ttyprintk.c index 6a0059e508e3..ff3b9a41b91b 100644 --- a/drivers/char/ttyprintk.c +++ b/drivers/char/ttyprintk.c @@ -158,12 +158,26 @@ static int tpk_ioctl(struct tty_struct *tty, return 0; } +/* + * TTY operations hangup function. + */ +static void tpk_hangup(struct tty_struct *tty) +{ + struct ttyprintk_port *tpkp = tty->driver_data; + unsigned long flags; + + spin_lock_irqsave(>port.lock, flags); + tpkp->port.count = 0; + spin_unlock_irqrestore(>port.lock, flags); +} + static const struct tty_operations ttyprintk_ops = { .open = tpk_open, .close = tpk_close, .write = tpk_write, .write_room = tpk_write_room, .ioctl = tpk_ioctl, + .hangup = tpk_hangup, }; static const struct tty_port_operations null_ops = { };
Re: How to handle concurrent access to /dev/ttyprintk ?
On 2021/04/12 21:04, Greg Kroah-Hartman wrote: >> Since syzkaller is a fuzzer, syzkaller happily opens /dev/ttyprintk from >> multiple threads. Should we update syzkaller to use CONFIG_TTY_PRINTK=n ? > > Why? Can you not hit the same tty code paths from any other tty driver > being open multiple times? Why is ttyprintk somehow "special" here? I found a simplified reproducer. If we call ioctl(TIOCVHANGUP) on /dev/ttyprintk , "struct ttyprintk_port tpk_port".port.count cannot be decremented by tty_port_close() from tpk_close() due to tty_hung_up_p() == true when close() is called. As a result, tty->count and port count gets out of sync. Then, when /dev/ttyprintk is opened again and then closed without calling ioctl(TIOCVHANGUP), this message is printed due to tty_hung_up_p() == false. -- Debug print patch start -- diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index 391bada4cedb..42d54368adac 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -113,7 +113,7 @@ #ifdef TTY_DEBUG_HANGUP # define tty_debug_hangup(tty, f, args...) tty_debug(tty, f, ##args) #else -# define tty_debug_hangup(tty, f, args...) do { } while (0) +# define tty_debug_hangup(tty, f, args...) tty_info(tty, f, ##args) #endif #define TTY_PARANOIA_CHECK 1 @@ -628,6 +628,7 @@ static void __tty_hangup(struct tty_struct *tty, int exit_session) closecount++; __tty_fasync(-1, filp, 0); /* can't block */ filp->f_op = _up_tty_fops; + tty_warn(tty, "Set hung_up_tty_fops on %px\n", filp); } spin_unlock(>files_lock); diff --git a/drivers/tty/tty_port.c b/drivers/tty/tty_port.c index 346d20f4a486..032fc58203ea 100644 --- a/drivers/tty/tty_port.c +++ b/drivers/tty/tty_port.c @@ -561,8 +561,10 @@ int tty_port_close_start(struct tty_port *port, { unsigned long flags; - if (tty_hung_up_p(filp)) + if (tty_hung_up_p(filp)) { + tty_warn(tty, "Skipped by hung_up_tty_fops on %px\n", filp); return 0; + } spin_lock_irqsave(>lock, flags); if (tty->count == 1 && port->count != 1) { -- Debug print patch end -- -- Reproducer start -- #include #include #include #include #include int main(int argc, char *argv[]) { int i; int fd[10]; for (i = 0; i < 10; i++) fd[i] = open("/dev/ttyprintk", O_WRONLY); ioctl(fd[0], TIOCVHANGUP); for (i = 0; i < 10; i++) close(fd[i]); close(open("/dev/ttyprintk", O_WRONLY)); return 0; } -- Reproducer end -- -- Console output start -- [ 31.885820] ttyprintk ttyprintk: opening (count=1) [ 31.885859] ttyprintk ttyprintk: opening (count=2) [ 31.885876] ttyprintk ttyprintk: opening (count=3) [ 31.885892] ttyprintk ttyprintk: opening (count=4) [ 31.885907] ttyprintk ttyprintk: opening (count=5) [ 31.885923] ttyprintk ttyprintk: opening (count=6) [ 31.885939] ttyprintk ttyprintk: opening (count=7) [ 31.885956] ttyprintk ttyprintk: opening (count=8) [ 31.885971] ttyprintk ttyprintk: opening (count=9) [ 31.885988] ttyprintk ttyprintk: opening (count=10) [ 31.885999] ttyprintk ttyprintk: vhangup [ 31.886005] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed1000 [ 31.886009] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed2000 [ 31.886012] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed0c00 [ 31.886016] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed0e00 [ 31.886019] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed2c00 [ 31.886023] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed3a00 [ 31.886026] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed3400 [ 31.886029] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed3c00 [ 31.886033] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed0200 [ 31.886036] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed3e00 [ 31.886055] ttyprintk ttyprintk: releasing (count=10) [ 31.886106] ttyprintk ttyprintk: Skipped by hung_up_tty_fops on 97c38eed3e00 [ 31.886117] ttyprintk ttyprintk: releasing (count=9) [ 31.886121] ttyprintk ttyprintk: Skipped by hung_up_tty_fops on 97c38eed0200 [ 31.886128] ttyprintk ttyprintk: releasing (count=8) [ 31.886131] ttyprintk ttyprintk: Skipped by hung_up_tty_fops on 97c38eed3c00 [ 31.886138] ttyprintk ttyprintk: releasing (count=7) [ 31.886141] ttyprintk ttyprintk: Skipped by hung_up_tty_fops on 97c38eed3400 [ 31.886148] ttyprintk ttyprintk: releasing (count=6) [ 31.886151] ttyprintk ttyprintk: Skipped by hung_up_tty_fops on 97c38eed3a00 [ 31.886158] ttyprintk ttyprintk: releasing (count=5) [ 31.886161] ttyprintk ttyprintk: Skipped by hung_up_tty_fops on 97c38eed2c00 [ 31.886168] ttyprintk ttyprintk: releasing (count=4) [
Re: [syzbot] WARNING in smk_set_cipso (2)
Commit 7ef4c19d245f3dc2 ("smackfs: restrict bytes count in smackfs write functions") missed that count > SMK_CIPSOMAX check applies to only format == SMK_FIXED24_FMT case. Reported-by: syzbot Signed-off-by: Tetsuo Handa --- security/smack/smackfs.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/security/smack/smackfs.c b/security/smack/smackfs.c index 22ded2c26089..1ad7d0d1ea62 100644 --- a/security/smack/smackfs.c +++ b/security/smack/smackfs.c @@ -855,6 +855,8 @@ static ssize_t smk_set_cipso(struct file *file, const char __user *buf, if (format == SMK_FIXED24_FMT && (count < SMK_CIPSOMIN || count > SMK_CIPSOMAX)) return -EINVAL; + if (count > PAGE_SIZE) + return -EINVAL; data = memdup_user_nul(buf, count); if (IS_ERR(data)) -- 2.18.4
Re: How to handle concurrent access to /dev/ttyprintk ?
On 2021/04/12 19:44, Greg Kroah-Hartman wrote: > And trying to "open exclusive only" just does not work, the kernel can > not enforce that at all, sorry. Any driver that you see trying to do > that is trivial to work around in userspace, making the kernel code > pointless. You mean something like below cannot be used? diff --git a/drivers/char/ttyprintk.c b/drivers/char/ttyprintk.c index 6a0059e508e3..57200569918a 100644 --- a/drivers/char/ttyprintk.c +++ b/drivers/char/ttyprintk.c @@ -84,14 +84,26 @@ static int tpk_printk(const unsigned char *buf, int count) return count; } +static DEFINE_MUTEX(open_close_lock); + /* * TTY operations open function. */ static int tpk_open(struct tty_struct *tty, struct file *filp) { - tty->driver_data = _port; + int ret; + + if (mutex_lock_killable(_close_lock)) + return -EINTR; - return tty_port_open(_port.port, tty, filp); + if (tpk_port.port.count) { + ret = -EBUSY; + } else { + tty->driver_data = _port; + ret = tty_port_open(_port.port, tty, filp); + } + mutex_unlock(_close_lock); + return ret; } /* @@ -102,12 +114,14 @@ static void tpk_close(struct tty_struct *tty, struct file *filp) struct ttyprintk_port *tpkp = tty->driver_data; unsigned long flags; + mutex_lock(_close_lock); spin_lock_irqsave(>spinlock, flags); /* flush tpk_printk buffer */ tpk_printk(NULL, 0); spin_unlock_irqrestore(>spinlock, flags); tty_port_close(>port, tty, filp); + mutex_unlock(_close_lock); } /* > Like any tty port, if you have multiple accesses, all bets are off and > hilarity ensues. Just don't do that and expect things to be working > well. Since syzkaller is a fuzzer, syzkaller happily opens /dev/ttyprintk from multiple threads. Should we update syzkaller to use CONFIG_TTY_PRINTK=n ?
How to handle concurrent access to /dev/ttyprintk ?
What is the intended usage of /dev/ttyprintk ? It seems that drivers/char/ttyprintk.c was not designed to be opened by multiple processes. As a result, syzbot can trigger tty_warn() flooding enough to fire khungtaskd warning due to tty_port_close(). Do we need to allow concurrent access to /dev/ttyprintk ? If we can't change /dev/ttyprintk exclusively open()able by only one thread, how to handle concurrent access to /dev/ttyprintk ? On 2021/04/07 23:24, Tetsuo Handa wrote: > On 2021/04/07 22:48, Greg Kroah-Hartman wrote: >>> By the way, as soon as applying this patch, I guess that syzkaller starts >>> generating hung task reports because /dev/ttyprintk can trivially trigger >>> flood of >>> >>> tty_warn(tty, "%s: tty->count = 1 port count = %d\n", __func__, >>>port->count); >>> >>> message, and adding >>> >>> if (strcmp(tty_driver_name(tty), "ttyprintk")) >> >> Odd, how can ttyprintk() generate that mess? > > So far three tests and results: > > https://groups.google.com/g/syzkaller-bugs/c/yRLYijD2tbw/m/WifLgadvAAAJ > https://groups.google.com/g/syzkaller-bugs/c/yRLYijD2tbw/m/w2_MiMmJ > https://groups.google.com/g/syzkaller-bugs/c/yRLYijD2tbw/m/hfsQqSOPAAAJ > > Patch https://syzkaller.appspot.com/x/patch.diff?x=145e4c9ad0 generated > console output https://syzkaller.appspot.com/x/log.txt?x=162f9fced0 . > > Patch https://syzkaller.appspot.com/x/patch.diff?x=14839931d0 did not > flood the console output enough to fire khungtaskd. > > Maybe it is because /dev/ttyprintk can be opened/closed by multiple processes > without serialization? > > Running > > for i in $(seq 1 100); do sleep 1 > /dev/ttyprintk & done > > results in > > tty_port_close_start: tty->count = 1 port count = 100 > > . If tty_port_open() from tpk_open() can do > > spin_lock_irq(>lock); > ++port->count; > spin_unlock_irq(>lock); > > when tty_port_close_start() from tty_port_close() from tpk_close() is doing > > spin_lock_irqsave(>lock, flags); > if (tty->count == 1 && port->count != 1) { > tty_warn(tty, "%s: tty->count = 1 port count = %d\n", __func__, > port->count); > port->count = 1; > } > if (--port->count < 0) { > tty_warn(tty, "%s: bad port count (%d)\n", __func__, > port->count); > port->count = 0; > } > > if (port->count) { > spin_unlock_irqrestore(>lock, flags); > return 0; > } > spin_unlock_irqrestore(>lock, flags); > > , what prevents port->count from getting larger than 1 ? >
[tip: irq/core] kernel: Initialize cpumask before parsing
The following commit has been merged into the irq/core branch of tip: Commit-ID: c5e3a41187ac01425f5ad1abce927905e4ac44e4 Gitweb: https://git.kernel.org/tip/c5e3a41187ac01425f5ad1abce927905e4ac44e4 Author:Tetsuo Handa AuthorDate:Thu, 01 Apr 2021 14:58:23 +09:00 Committer: Thomas Gleixner CommitterDate: Sat, 10 Apr 2021 13:35:54 +02:00 kernel: Initialize cpumask before parsing KMSAN complains that new_value at cpumask_parse_user() from write_irq_affinity() from irq_affinity_proc_write() is uninitialized. [ 148.133411][ T5509] = [ 148.135383][ T5509] BUG: KMSAN: uninit-value in find_next_bit+0x325/0x340 [ 148.137819][ T5509] [ 148.138448][ T5509] Local variable new_value.i@irq_affinity_proc_write created at: [ 148.140768][ T5509] irq_affinity_proc_write+0xc3/0x3d0 [ 148.142298][ T5509] irq_affinity_proc_write+0xc3/0x3d0 [ 148.143823][ T5509] = Since bitmap_parse() from cpumask_parse_user() calls find_next_bit(), any alloc_cpumask_var() + cpumask_parse_user() sequence has possibility that find_next_bit() accesses uninitialized cpu mask variable. Fix this problem by replacing alloc_cpumask_var() with zalloc_cpumask_var(). Signed-off-by: Tetsuo Handa Signed-off-by: Thomas Gleixner Acked-by: Steven Rostedt (VMware) Link: https://lore.kernel.org/r/20210401055823.3929-1-penguin-ker...@i-love.sakura.ne.jp --- kernel/irq/proc.c| 4 ++-- kernel/profile.c | 2 +- kernel/trace/trace.c | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c index 9813878..7c5cd42 100644 --- a/kernel/irq/proc.c +++ b/kernel/irq/proc.c @@ -144,7 +144,7 @@ static ssize_t write_irq_affinity(int type, struct file *file, if (!irq_can_set_affinity_usr(irq) || no_irq_affinity) return -EIO; - if (!alloc_cpumask_var(_value, GFP_KERNEL)) + if (!zalloc_cpumask_var(_value, GFP_KERNEL)) return -ENOMEM; if (type) @@ -238,7 +238,7 @@ static ssize_t default_affinity_write(struct file *file, cpumask_var_t new_value; int err; - if (!alloc_cpumask_var(_value, GFP_KERNEL)) + if (!zalloc_cpumask_var(_value, GFP_KERNEL)) return -ENOMEM; err = cpumask_parse_user(buffer, count, new_value); diff --git a/kernel/profile.c b/kernel/profile.c index 6f69a41..c2ebddb 100644 --- a/kernel/profile.c +++ b/kernel/profile.c @@ -430,7 +430,7 @@ static ssize_t prof_cpu_mask_proc_write(struct file *file, cpumask_var_t new_value; int err; - if (!alloc_cpumask_var(_value, GFP_KERNEL)) + if (!zalloc_cpumask_var(_value, GFP_KERNEL)) return -ENOMEM; err = cpumask_parse_user(buffer, count, new_value); diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index eccb4e1..7607dc8 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4827,7 +4827,7 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf, cpumask_var_t tracing_cpumask_new; int err; - if (!alloc_cpumask_var(_cpumask_new, GFP_KERNEL)) + if (!zalloc_cpumask_var(_cpumask_new, GFP_KERNEL)) return -ENOMEM; err = cpumask_parse_user(ubuf, count, tracing_cpumask_new);
Re: [PATCH 05/13] tty: remove tty_warn()
On 2021/04/08 21:51, Greg Kroah-Hartman wrote: > Remove users of tty_warn() and replace them with calls to dev_warn() > which provides more information about the tty that has the error and > uses the standard formatting logic. Ouch. This series would be good for clean up, but this series might be bad for handling lockdep warning syzbot is reporting. Since tty_warn() is using plain printk(), we can avoid lockdep warning by using printk_deferred(). If we use dev_warn() instead, we need to modify __dev_printk() to use printk_deferred(), which means that all dev_*() users are affected by this change. Also, we need to modify dev_printk_emit()/dev_vprintk_emit() callers to embed loglevel into the format string so that we pass LOGLEVEL_SCHED to vprintk_emit() ... maybe just change from "if (!in_sched)" to "if (!in_sched && !dev_info)" instead ? Also, dev_vprintk_emit() need to start calling defer_console_output() after returning from vprintk_emit() in order to behave like printk_deferred(). I'm not sure whether this change is safe.
Re: [PATCH v2] tty: use printk_deferred() at tty_msg()
On 2021/04/07 22:48, Greg Kroah-Hartman wrote: >> By the way, as soon as applying this patch, I guess that syzkaller starts >> generating hung task reports because /dev/ttyprintk can trivially trigger >> flood of >> >> tty_warn(tty, "%s: tty->count = 1 port count = %d\n", __func__, >>port->count); >> >> message, and adding >> >> if (strcmp(tty_driver_name(tty), "ttyprintk")) > > Odd, how can ttyprintk() generate that mess? So far three tests and results: https://groups.google.com/g/syzkaller-bugs/c/yRLYijD2tbw/m/WifLgadvAAAJ https://groups.google.com/g/syzkaller-bugs/c/yRLYijD2tbw/m/w2_MiMmJ https://groups.google.com/g/syzkaller-bugs/c/yRLYijD2tbw/m/hfsQqSOPAAAJ Patch https://syzkaller.appspot.com/x/patch.diff?x=145e4c9ad0 generated console output https://syzkaller.appspot.com/x/log.txt?x=162f9fced0 . Patch https://syzkaller.appspot.com/x/patch.diff?x=14839931d0 did not flood the console output enough to fire khungtaskd. Maybe it is because /dev/ttyprintk can be opened/closed by multiple processes without serialization? Running for i in $(seq 1 100); do sleep 1 > /dev/ttyprintk & done results in tty_port_close_start: tty->count = 1 port count = 100 . If tty_port_open() from tpk_open() can do spin_lock_irq(>lock); ++port->count; spin_unlock_irq(>lock); when tty_port_close_start() from tty_port_close() from tpk_close() is doing spin_lock_irqsave(>lock, flags); if (tty->count == 1 && port->count != 1) { tty_warn(tty, "%s: tty->count = 1 port count = %d\n", __func__, port->count); port->count = 1; } if (--port->count < 0) { tty_warn(tty, "%s: bad port count (%d)\n", __func__, port->count); port->count = 0; } if (port->count) { spin_unlock_irqrestore(>lock, flags); return 0; } spin_unlock_irqrestore(>lock, flags); , what prevents port->count from getting larger than 1 ?
[PATCH v2] tty: use printk_deferred() at tty_msg()
intk_safe.c:401 printk+0xba/0xed kernel/printk/printk.c:2146 tty_port_close_start.part.0+0x503/0x550 drivers/tty/tty_port.c:569 tty_port_close_start drivers/tty/tty_port.c:641 [inline] tty_port_close+0x46/0x170 drivers/tty/tty_port.c:634 tty_release+0x45e/0x1210 drivers/tty/tty_io.c:1779 __fput+0x288/0x920 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:140 exit_task_work include/linux/task_work.h:30 [inline] do_exit+0xbfc/0x2a60 kernel/exit.c:825 do_group_exit+0x125/0x310 kernel/exit.c:922 get_signal+0x47f/0x2150 kernel/signal.c:2781 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:789 handle_signal_work kernel/entry/common.c:147 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:208 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Chain exists of: console_owner --> _lock_key --> >lock Possible unsafe locking scenario: CPU0CPU1 lock(>lock); lock(_lock_key); lock(>lock); lock(console_owner); *** DEADLOCK *** 3 locks held by syz-executor.4/2155: #0: 88802ce391c0 (>legacy_mutex){+.+.}-{3:3}, at: tty_lock+0xbd/0x120 drivers/tty/tty_mutex.c:19 #1: 90114698 (>lock){-.-.}-{2:2}, at: tty_port_close_start.part.0+0x28/0x550 drivers/tty/tty_port.c:567 #2: 8bf60920 (console_lock){+.+.}-{0:0}, at: vprintk_func+0x8d/0x1e0 kernel/printk/printk_safe.c:401 stack backtrace: CPU: 0 PID: 2155 Comm: syz-executor.4 Not tainted 5.12.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2127 check_prev_add kernel/locking/lockdep.c:2936 [inline] check_prevs_add kernel/locking/lockdep.c:3059 [inline] validate_chain kernel/locking/lockdep.c:3674 [inline] __lock_acquire+0x2b14/0x54c0 kernel/locking/lockdep.c:4900 lock_acquire kernel/locking/lockdep.c:5510 [inline] lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475 console_lock_spinning_enable kernel/printk/printk.c:1714 [inline] console_unlock+0x371/0xc80 kernel/printk/printk.c:2573 vprintk_emit+0x1ca/0x560 kernel/printk/printk.c:2098 vprintk_func+0x8d/0x1e0 kernel/printk/printk_safe.c:401 printk+0xba/0xed kernel/printk/printk.c:2146 tty_port_close_start.part.0+0x503/0x550 drivers/tty/tty_port.c:569 tty_port_close_start drivers/tty/tty_port.c:641 [inline] tty_port_close+0x46/0x170 drivers/tty/tty_port.c:634 tty_release+0x45e/0x1210 drivers/tty/tty_io.c:1779 __fput+0x288/0x920 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:140 exit_task_work include/linux/task_work.h:30 [inline] do_exit+0xbfc/0x2a60 kernel/exit.c:825 do_group_exit+0x125/0x310 kernel/exit.c:922 get_signal+0x47f/0x2150 kernel/signal.c:2781 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:789 handle_signal_work kernel/entry/common.c:147 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:208 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x44/0xae Note that this patch has a small side effect that 5 tty_debug() messages in __proc_set_tty()/tty_release_checks() are enabled, for pr_debug() is conditional while KERN_DEBUG is unconditional. It seems that remaining tty_debug() messages meant to be unconditional without knowing that pr_debug() is conditional. Note that this patch is not complete, for there is same dependency due to memory allocation fault injection at tty_buffer_alloc(). By the way, as soon as applying this patch, I guess that syzkaller starts generating hung task reports because /dev/ttyprintk can trivially trigger flood of tty_warn(tty, "%s: tty->count = 1 port count = %d\n", __func__, port->count); message, and adding if (strcmp(tty_driver_name(tty), "ttyprintk")) immediately before this tty_warn() gave me Tested-by: response. [1] https://syzkaller.appspot.com/bug?id=39ea6caa479af471183997376dc7e90bc7d64a6a Reported-by: syzbot Reported-by: syzbot Signed-off-by: Tetsuo Handa Fixes: b6da31b2c07c46f2 ("tty: Fix data race in tty_insert_flip_string_fixed_flag") Cc: # 4.18+ --- include/linux/t
Re: [PATCH] tty: use printk_safe context at tty_msg()
On 2021/04/07 0:10, Petr Mladek wrote: >> diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c >> index 6d4995a5f318..d59f7873bc49 100644 >> --- a/drivers/tty/tty_buffer.c >> +++ b/drivers/tty/tty_buffer.c >> @@ -156,6 +156,7 @@ static struct tty_buffer *tty_buffer_alloc(struct >> tty_port *port, size_t size) >> { >> struct llist_node *free; >> struct tty_buffer *p; >> +unsigned long flags; >> >> /* Round the buffer size out */ >> size = __ALIGN_MASK(size, TTYB_ALIGN_MASK); >> @@ -172,7 +173,9 @@ static struct tty_buffer *tty_buffer_alloc(struct >> tty_port *port, size_t size) >> have queued and recycle that ? */ >> if (atomic_read(>buf.mem_used) > port->buf.mem_limit) >> return NULL; >> -p = kmalloc(sizeof(struct tty_buffer) + 2 * size, GFP_ATOMIC); >> +printk_safe_enter_irqsave(flags); >> +p = kmalloc(sizeof(struct tty_buffer) + 2 * size, GFP_ATOMIC | >> __GFP_NOWARN); >> +printk_safe_exit_irqrestore(flags); > > I do not see tty_buffer_alloc() anywhere at > https://syzkaller.appspot.com/bug?id=39ea6caa479af471183997376dc7e90bc7d64a6a > > Could you please provide more details why this is needed? Quoting from https://syzkaller.appspot.com/text?tag=CrashLog=131d4ddf80 at https://syzkaller.appspot.com/bug?id=bda1a87bea05a9072003e6447c44b03ca1492b1c : [ 274.992629] FAULT_INJECTION: forcing a failure. [ 274.992629] name failslab, interval 1, probability 0, space 0, times 0 [ 274.994085] CPU: 1 PID: 19187 Comm: syz-executor6 Not tainted 4.17.0+ #89 [ 274.994094] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [ 274.994099] Call Trace: [ 274.994120] dump_stack+0x1b9/0x294 [ 274.994145] ? dump_stack_print_info.cold.2+0x52/0x52 [ 275.270563] should_fail.cold.4+0xa/0x1a [ 275.274639] ? fault_create_debugfs_attr+0x1f0/0x1f0 [ 275.279747] ? graph_lock+0x170/0x170 [ 275.283541] ? check_noncircular+0x20/0x20 [ 275.287788] ? debug_check_no_locks_freed+0x310/0x310 [ 275.292983] ? lock_acquire+0x1dc/0x520 [ 275.296961] ? process_echoes+0xb6/0x170 [ 275.301021] ? check_same_owner+0x320/0x320 [ 275.305347] ? print_usage_bug+0xc0/0xc0 [ 275.309404] ? __might_sleep+0x95/0x190 [ 275.313394] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20 [ 275.318934] __should_failslab+0x124/0x180 [ 275.323168] should_failslab+0x9/0x14 [ 275.326974] __kmalloc+0x63/0x760 [ 275.330423] ? mutex_trylock+0x2a0/0x2a0 [ 275.334479] ? print_usage_bug+0xc0/0xc0 [ 275.338539] ? __tty_buffer_request_room+0x2d2/0x7f0 [ 275.343648] __tty_buffer_request_room+0x2d2/0x7f0 [ 275.348576] ? flush_to_ldisc+0x560/0x560 [ 275.352717] ? lock_acquire+0x1dc/0x520 [ 275.356698] ? pty_write+0xf9/0x1f0 [ 275.360325] ? lock_release+0xa10/0xa10 [ 275.364293] ? kasan_check_read+0x11/0x20 [ 275.368434] ? do_raw_spin_unlock+0x9e/0x2e0 [ 275.372831] ? do_raw_spin_trylock+0x1b0/0x1b0 [ 275.377411] tty_insert_flip_string_fixed_flag+0x8d/0x1f0 [ 275.382942] ? do_raw_spin_lock+0xc1/0x200 [ 275.387174] pty_write+0x12c/0x1f0 [ 275.390720] tty_put_char+0x129/0x150 [ 275.394525] ? dev_match_devt+0x90/0x90 [ 275.398498] ? pty_write_room+0xc9/0xf0 [ 275.402485] ? __sanitizer_cov_trace_switch+0x53/0x90 [ 275.407684] __process_echoes+0x4d9/0x8d0 [ 275.411829] ? _raw_spin_unlock_irqrestore+0x74/0xc0 [ 275.416936] process_echoes+0xfc/0x170 [ 275.420816] n_tty_set_termios+0xb56/0xe80 [ 275.425047] ? n_tty_receive_signal_char+0x120/0x120 [ 275.430148] tty_set_termios+0x7a0/0xac0 [ 275.434201] ? tty_wait_until_sent+0x5b0/0x5b0 [ 275.438782] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20 [ 275.444318] set_termios+0x41e/0x7d0 [ 275.448026] ? tty_perform_flush+0x80/0x80 [ 275.452256] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20 [ 275.457845] tty_mode_ioctl+0x535/0xb50 [ 275.461821] ? set_termios+0x7d0/0x7d0 [ 275.465740] ? check_same_owner+0x320/0x320 [ 275.470059] ? graph_lock+0x170/0x170 [ 275.473857] n_tty_ioctl_helper+0x54/0x3b0 [ 275.478085] n_tty_ioctl+0x54/0x320 [ 275.481711] ? ldsem_down_read+0x37/0x40 [ 275.485760] ? ldsem_down_read+0x37/0x40 [ 275.489836] tty_ioctl+0x5e1/0x1870 [ 275.493454] ? commit_echoes+0x1d0/0x1d0 [ 275.497506] ? tty_vhangup+0x30/0x30 [ 275.501212] ? rcu_is_watching+0x85/0x140 [ 275.505353] ? rcu_report_qs_rnp+0x790/0x790 [ 275.509757] ? __fget+0x40c/0x650 [ 275.513243] ? find_held_lock+0x31/0x1c0 [ 275.517439] ? expand_files.part.8+0x9a0/0x9a0 [ 275.522033] ? kasan_check_write+0x14/0x20 [ 275.526389] ? __mutex_unlock_slowpath+0x180/0x8a0 [ 275.531314] ? wait_for_completion+0x870/0x870 [ 275.535907] ? tty_vhangup+0x30/0x30 [ 275.539616] do_vfs_ioctl+0x1cf/0x16f0 [ 275.543522] ? ioctl_preallocate+0x2e0/0x2e0 [ 275.547934] ? fget_raw+0x20/0x20 [ 275.551554] ? __sb_end_write+0xac/0xe0 [ 275.25] ? __sanitizer_cov_trace_const_cmp1+0x1a/0x20 [ 275.561064] ? fput+0x130/0x1a0 [ 275.564345] ?
Re: [PATCH] tty: use printk_safe context at tty_msg()
On 2021/04/06 16:10, Greg Kroah-Hartman wrote: > On Tue, Apr 06, 2021 at 02:31:43PM +0900, Tetsuo Handa wrote: >> On 2021/04/06 13:51, Jiri Slaby wrote: >>> On 03. 04. 21, 6:14, Tetsuo Handa wrote: >>>> --- a/include/linux/tty.h >>>> +++ b/include/linux/tty.h >>>> @@ -14,6 +14,7 @@ >>>> #include >>>> #include >>>> #include >>>> +#include <../../kernel/printk/internal.h> >>> >>> Including printk's internal header in linux/tty.h doesn't look correct to >>> me. >>> >> >> This is because this patch wants __printk_safe_enter()/__printk_safe_exit() >> without #ifdef'ing CONFIG_PRINTK. > > Then those functions need to be "properly" exported, not placed only in > an "internal.h" file that obviously should not be included from anywhere > like this. > >> Peter and Sergey, what should we do? >> Can we move printk_safe_enter_irqsave()/printk_safe_exit_irqrestore() to >> include/linux/printk.h ? > > Are you sure that is the only way to resolve this? Publishing printk_safe_enter_irqsave() etc. was once proposed at https://lkml.kernel.org/r/20181016050428.17966-3-sergey.senozhat...@gmail.com and Peter Zijlstra did not like such change. But we reconfirmed that "tty_port lock must switch to printk_safe" at https://lkml.kernel.org/r/20190219013254.GA20023@jagdpanzerIV . Two years has elapsed since a completely new printk implementation which would not use printk_safe context was proposed, but printk_safe context is still remaining. Therefore, we might need to tolerate, with a warning that printk_safe_enter_irqsave() etc. are not intended for general use, use of printk_safe context for tty_msg() and tty_buffer_alloc().
Re: [PATCH] tty: use printk_safe context at tty_msg()
On 2021/04/06 13:51, Jiri Slaby wrote: > On 03. 04. 21, 6:14, Tetsuo Handa wrote: >> --- a/include/linux/tty.h >> +++ b/include/linux/tty.h >> @@ -14,6 +14,7 @@ >> #include >> #include >> #include >> +#include <../../kernel/printk/internal.h> > > Including printk's internal header in linux/tty.h doesn't look correct to me. > This is because this patch wants __printk_safe_enter()/__printk_safe_exit() without #ifdef'ing CONFIG_PRINTK. Peter and Sergey, what should we do? Can we move printk_safe_enter_irqsave()/printk_safe_exit_irqrestore() to include/linux/printk.h ?
Re: [PATCH v5] lockdep: Allow tuning tracing capacity constants.
Dmitry, I've just sent this patch to tomoyo-test1.git tree, for finding bugs (which previously could not be found due to lack of this patch) in linux-next.git will be helpful anyway. Since this patch should appear in tomorrow's linux-next.git , please prepare syzkaller for tuning appropriate values (default value + 1 should be OK). On 2021/03/31 19:58, Tetsuo Handa wrote: > Peter, are you there? > > If you keep silence, we will assume that applying this patch is the way to go. > > On 2021/03/20 16:34, Dmitry Vyukov wrote: >> On Mon, Feb 8, 2021 at 11:29 AM Tetsuo Handa >> wrote: >>> >>> Since syzkaller continues various test cases until the kernel crashes, >>> syzkaller tends to examine more locking dependencies than normal systems. >>> As a result, syzbot is reporting that the fuzz testing was terminated >>> due to hitting upper limits lockdep can track [1] [2] [3]. Since analysis >>> via /proc/lockdep* did not show any obvious culprit [4] [5], we have no >>> choice but allow tuning tracing capacity constants. >>> >>> [1] >>> https://syzkaller.appspot.com/bug?id=3d97ba93fb3566000c1c59691ea427370d33ea1b >>> [2] >>> https://syzkaller.appspot.com/bug?id=381cb436fe60dc03d7fd2a092b46d7f09542a72a >>> [3] >>> https://syzkaller.appspot.com/bug?id=a588183ac34c1437fc0785e8f220e88282e5a29f >>> [4] >>> https://lkml.kernel.org/r/4b8f7a57-fa20-47bd-48a0-ae35d860f...@i-love.sakura.ne.jp >>> [5] >>> https://lkml.kernel.org/r/1c351187-253b-2d49-acaf-4563c63ae...@i-love.sakura.ne.jp >>> >>> Reported-by: syzbot >>> Reported-by: syzbot >>> Reported-by: syzbot >>> References: >>> https://lkml.kernel.org/r/1595640639-9310-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp >>> Signed-off-by: Tetsuo Handa >>> Acked-by: Dmitry Vyukov >> >> Peter, ping. >> Please at least provide some feedback. >> This is still the top crasher on syzbot and harms testing of the whole >> kernel. I am periodically thinking of disabling LOCKDEP on syzbot as >> harming more than helping, but so far talking myself out of it because >> it will likely be broken more when we try to re-enable it and I still >> hope for a timely resolution of this issue. >> >
[PATCH] printk: Make multiple inclusion of kernel/printk/internal.h safe
kernel test robot reported that kernel/printk/internal.h is not ready to be #include'd for multiple times, and that vprintk_func() for CONFIG_PRINTK=n should be marked as "static inline". Since "tty: use printk_safe context at tty_msg()" will make kernel/printk/internal.h be #include'd for multiple times, let's fix this problem first. Reported-by: kernel test robot Signed-off-by: Tetsuo Handa Fixes: 099f1c84c0052ec1 ("printk: introduce per-cpu safe_print seq buffer") Cc: # 4.11+ --- kernel/printk/internal.h | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h index 3a8fd491758c..2119c546cba2 100644 --- a/kernel/printk/internal.h +++ b/kernel/printk/internal.h @@ -2,6 +2,10 @@ /* * internal.h - printk internal definitions */ + +#ifndef _KERNEL_PRINTK_INTERNAL_H +#define _KERNEL_PRINTK_INTERNAL_H + #include #ifdef CONFIG_PRINTK @@ -56,7 +60,7 @@ void defer_console_output(void); #else -__printf(1, 0) int vprintk_func(const char *fmt, va_list args) { return 0; } +static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args) { return 0; } /* * In !PRINTK builds we still export logbuf_lock spin_lock, console_sem @@ -72,3 +76,5 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args) { return 0; } static inline void printk_safe_init(void) { } static inline bool printk_percpu_data_ready(void) { return false; } #endif /* CONFIG_PRINTK */ + +#endif -- 2.18.4
[PATCH] tty: use printk_safe context at tty_msg()
syzbot is reporting circular locking dependency due to calling printk() with port lock held [1]. When this problem was reported, we worried whether printk_safe context will remain available in future kernels [2], and then this problem was forgotten. But in order to utilize syzbot's resource for finding other bugs/reproducers by closing this one of top crashers, let's apply a patch which counts on availability of printk_safe context. syzbot is also reporting same dependency due to memory allocation fault injection at tty_buffer_alloc(). Although __GFP_NOWARN cannot prevent memory allocation fault injection from calling printk(), let's use __GFP_NOWARN at tty_buffer_alloc() in addition to using printk_safe context, for generating many lines of messages due to warn_alloc() is annoying. If we want to report it, we can use pr_warn() instead. [1] https://syzkaller.appspot.com/bug?id=39ea6caa479af471183997376dc7e90bc7d64a6a [2] https://lkml.kernel.org/r/20190218054649.GA26686@jagdpanzerIV Reported-by: syzbot Reported-by: syzbot Signed-off-by: Tetsuo Handa Fixes: b6da31b2c07c46f2 ("tty: Fix data race in tty_insert_flip_string_fixed_flag") Cc: # 4.18+ --- drivers/tty/tty_buffer.c | 5 - include/linux/tty.h | 9 - 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c index 6d4995a5f318..d59f7873bc49 100644 --- a/drivers/tty/tty_buffer.c +++ b/drivers/tty/tty_buffer.c @@ -156,6 +156,7 @@ static struct tty_buffer *tty_buffer_alloc(struct tty_port *port, size_t size) { struct llist_node *free; struct tty_buffer *p; + unsigned long flags; /* Round the buffer size out */ size = __ALIGN_MASK(size, TTYB_ALIGN_MASK); @@ -172,7 +173,9 @@ static struct tty_buffer *tty_buffer_alloc(struct tty_port *port, size_t size) have queued and recycle that ? */ if (atomic_read(>buf.mem_used) > port->buf.mem_limit) return NULL; - p = kmalloc(sizeof(struct tty_buffer) + 2 * size, GFP_ATOMIC); + printk_safe_enter_irqsave(flags); + p = kmalloc(sizeof(struct tty_buffer) + 2 * size, GFP_ATOMIC | __GFP_NOWARN); + printk_safe_exit_irqrestore(flags); if (p == NULL) return NULL; diff --git a/include/linux/tty.h b/include/linux/tty.h index 95fc2f100f12..7ae8eb46fec3 100644 --- a/include/linux/tty.h +++ b/include/linux/tty.h @@ -14,6 +14,7 @@ #include #include #include +#include <../../kernel/printk/internal.h> /* @@ -773,7 +774,13 @@ static inline void proc_tty_unregister_driver(struct tty_driver *d) {} #endif #define tty_msg(fn, tty, f, ...) \ - fn("%s %s: " f, tty_driver_name(tty), tty_name(tty), ##__VA_ARGS__) + do {\ + unsigned long flags;\ + \ + printk_safe_enter_irqsave(flags); \ + fn("%s %s: " f, tty_driver_name(tty), tty_name(tty), ##__VA_ARGS__); \ + printk_safe_exit_irqrestore(flags); \ + } while (0) #define tty_debug(tty, f, ...) tty_msg(pr_debug, tty, f, ##__VA_ARGS__) #define tty_info(tty, f, ...) tty_msg(pr_info, tty, f, ##__VA_ARGS__) -- 2.18.4
[PATCH 2/2] misc: vmw_vmci: explicitly initialize vmci_datagram payload
KMSAN complains that vmci_check_host_caps() left the payload part of check_msg uninitialized. = BUG: KMSAN: uninit-value in kmsan_check_memory+0xd/0x10 CPU: 1 PID: 1 Comm: swapper/0 Tainted: GB 5.11.0-rc7+ #4 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020 Call Trace: dump_stack+0x21c/0x280 kmsan_report+0xfb/0x1e0 kmsan_internal_check_memory+0x202/0x520 kmsan_check_memory+0xd/0x10 iowrite8_rep+0x86/0x380 vmci_guest_probe_device+0xf0b/0x1e70 pci_device_probe+0xab3/0xe70 really_probe+0xd16/0x24d0 driver_probe_device+0x29d/0x3a0 device_driver_attach+0x25a/0x490 __driver_attach+0x78c/0x840 bus_for_each_dev+0x210/0x340 driver_attach+0x89/0xb0 bus_add_driver+0x677/0xc40 driver_register+0x485/0x8e0 __pci_register_driver+0x1ff/0x350 vmci_guest_init+0x3e/0x41 vmci_drv_init+0x1d6/0x43f do_one_initcall+0x39c/0x9a0 do_initcall_level+0x1d7/0x259 do_initcalls+0x127/0x1cb do_basic_setup+0x33/0x36 kernel_init_freeable+0x29a/0x3ed kernel_init+0x1f/0x840 ret_from_fork+0x1f/0x30 Uninit was created at: kmsan_internal_poison_shadow+0x5c/0xf0 kmsan_slab_alloc+0x8d/0xe0 kmem_cache_alloc+0x84f/0xe30 vmci_guest_probe_device+0xd11/0x1e70 pci_device_probe+0xab3/0xe70 really_probe+0xd16/0x24d0 driver_probe_device+0x29d/0x3a0 device_driver_attach+0x25a/0x490 __driver_attach+0x78c/0x840 bus_for_each_dev+0x210/0x340 driver_attach+0x89/0xb0 bus_add_driver+0x677/0xc40 driver_register+0x485/0x8e0 __pci_register_driver+0x1ff/0x350 vmci_guest_init+0x3e/0x41 vmci_drv_init+0x1d6/0x43f do_one_initcall+0x39c/0x9a0 do_initcall_level+0x1d7/0x259 do_initcalls+0x127/0x1cb do_basic_setup+0x33/0x36 kernel_init_freeable+0x29a/0x3ed kernel_init+0x1f/0x840 ret_from_fork+0x1f/0x30 Bytes 28-31 of 36 are uninitialized Memory access of size 36 starts at 8881675e5f00 = Signed-off-by: Tetsuo Handa Fixes: 1f166439917b69d3 ("VMCI: guest side driver implementation.") Cc: --- drivers/misc/vmw_vmci/vmci_guest.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/vmw_vmci/vmci_guest.c b/drivers/misc/vmw_vmci/vmci_guest.c index cc8eeb361fcd..1018dc77269d 100644 --- a/drivers/misc/vmw_vmci/vmci_guest.c +++ b/drivers/misc/vmw_vmci/vmci_guest.c @@ -168,7 +168,7 @@ static int vmci_check_host_caps(struct pci_dev *pdev) VMCI_UTIL_NUM_RESOURCES * sizeof(u32); struct vmci_datagram *check_msg; - check_msg = kmalloc(msg_size, GFP_KERNEL); + check_msg = kzalloc(msg_size, GFP_KERNEL); if (!check_msg) { dev_err(>dev, "%s: Insufficient memory\n", __func__); return -ENOMEM; -- 2.18.4
[PATCH 1/2] misc: vmw_vmci: explicitly initialize vmci_notify_bm_set_msg struct
KMSAN complains that the vmci_use_ppn64() == false path in vmci_dbell_register_notification_bitmap() left upper 32bits of bitmap_set_msg.bitmap_ppn64 member uninitialized. = BUG: KMSAN: uninit-value in kmsan_check_memory+0xd/0x10 CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc7+ #4 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020 Call Trace: dump_stack+0x21c/0x280 kmsan_report+0xfb/0x1e0 kmsan_internal_check_memory+0x484/0x520 kmsan_check_memory+0xd/0x10 iowrite8_rep+0x86/0x380 vmci_send_datagram+0x150/0x280 vmci_dbell_register_notification_bitmap+0x133/0x1e0 vmci_guest_probe_device+0xcab/0x1e70 pci_device_probe+0xab3/0xe70 really_probe+0xd16/0x24d0 driver_probe_device+0x29d/0x3a0 device_driver_attach+0x25a/0x490 __driver_attach+0x78c/0x840 bus_for_each_dev+0x210/0x340 driver_attach+0x89/0xb0 bus_add_driver+0x677/0xc40 driver_register+0x485/0x8e0 __pci_register_driver+0x1ff/0x350 vmci_guest_init+0x3e/0x41 vmci_drv_init+0x1d6/0x43f do_one_initcall+0x39c/0x9a0 do_initcall_level+0x1d7/0x259 do_initcalls+0x127/0x1cb do_basic_setup+0x33/0x36 kernel_init_freeable+0x29a/0x3ed kernel_init+0x1f/0x840 ret_from_fork+0x1f/0x30 Local variable bitmap_set_msg@vmci_dbell_register_notification_bitmap created at: vmci_dbell_register_notification_bitmap+0x50/0x1e0 vmci_dbell_register_notification_bitmap+0x50/0x1e0 Bytes 28-31 of 32 are uninitialized Memory access of size 32 starts at 88810098f570 = Signed-off-by: Tetsuo Handa Fixes: 83e2ec765be03e8a ("VMCI: doorbell implementation.") Cc: --- drivers/misc/vmw_vmci/vmci_doorbell.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/vmw_vmci/vmci_doorbell.c b/drivers/misc/vmw_vmci/vmci_doorbell.c index 345addd9306d..fa8a7fce4481 100644 --- a/drivers/misc/vmw_vmci/vmci_doorbell.c +++ b/drivers/misc/vmw_vmci/vmci_doorbell.c @@ -326,7 +326,7 @@ int vmci_dbell_host_context_notify(u32 src_cid, struct vmci_handle handle) bool vmci_dbell_register_notification_bitmap(u64 bitmap_ppn) { int result; - struct vmci_notify_bm_set_msg bitmap_set_msg; + struct vmci_notify_bm_set_msg bitmap_set_msg = { }; bitmap_set_msg.hdr.dst = vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID, VMCI_SET_NOTIFY_BITMAP); -- 2.18.4
Re: [PATCH] misc: vmw_vmci: initialize payload passed to vmci_send_datagram()
On 2021/04/01 15:18, Greg Kroah-Hartman wrote: > On Thu, Apr 01, 2021 at 02:57:47PM +0900, Tetsuo Handa wrote: >> KMSAN complains that the vmci_use_ppn64() == false path in >> vmci_dbell_register_notification_bitmap() left upper 32bits of >> bitmap_set_msg.bitmap_ppn64 member uninitialized. >> >> KMSAN also complains that vmci_check_host_caps() left the payload part >> of check_msg uninitialized. >> > > What commit does this "fix"? Can you resend with a proper "Fixes:" tag > so we know where it needs to be backported to? It seems that this problem exists since the introduction. commit 83e2ec765be03e8a8a07619e65df70b48a1db023 Author: George Zhang Date: Tue Jan 8 15:53:51 2013 -0800 VMCI: doorbell implementation. commit 1f166439917b69d3046e2e49fe923579d9181212 Author: George Zhang Date: Tue Jan 8 15:55:32 2013 -0800 VMCI: guest side driver implementation. But this patch should be safe to backport to as old as possible, for this patch is merely explicitly initializing variables.
[PATCH] kernel: initialize cpumask before parsing
KMSAN complains that new_value at cpumask_parse_user() from write_irq_affinity() from irq_affinity_proc_write() is uninitialized. [ 148.133411][ T5509] = [ 148.135383][ T5509] BUG: KMSAN: uninit-value in find_next_bit+0x325/0x340 [ 148.137819][ T5509] [ 148.138448][ T5509] Local variable new_value.i@irq_affinity_proc_write created at: [ 148.140768][ T5509] irq_affinity_proc_write+0xc3/0x3d0 [ 148.142298][ T5509] irq_affinity_proc_write+0xc3/0x3d0 [ 148.143823][ T5509] = Since bitmap_parse() from cpumask_parse_user() calls find_next_bit(), any alloc_cpumask_var() + cpumask_parse_user() sequence has possibility that find_next_bit() accesses uninitialized cpu mask variable. Fix this problem by replacing alloc_cpumask_var() with zalloc_cpumask_var(). Signed-off-by: Tetsuo Handa --- kernel/irq/proc.c| 4 ++-- kernel/profile.c | 2 +- kernel/trace/trace.c | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c index 98138788cb04..7c5cd42df3b9 100644 --- a/kernel/irq/proc.c +++ b/kernel/irq/proc.c @@ -144,7 +144,7 @@ static ssize_t write_irq_affinity(int type, struct file *file, if (!irq_can_set_affinity_usr(irq) || no_irq_affinity) return -EIO; - if (!alloc_cpumask_var(_value, GFP_KERNEL)) + if (!zalloc_cpumask_var(_value, GFP_KERNEL)) return -ENOMEM; if (type) @@ -238,7 +238,7 @@ static ssize_t default_affinity_write(struct file *file, cpumask_var_t new_value; int err; - if (!alloc_cpumask_var(_value, GFP_KERNEL)) + if (!zalloc_cpumask_var(_value, GFP_KERNEL)) return -ENOMEM; err = cpumask_parse_user(buffer, count, new_value); diff --git a/kernel/profile.c b/kernel/profile.c index 6f69a4195d56..c2ebddb5e974 100644 --- a/kernel/profile.c +++ b/kernel/profile.c @@ -430,7 +430,7 @@ static ssize_t prof_cpu_mask_proc_write(struct file *file, cpumask_var_t new_value; int err; - if (!alloc_cpumask_var(_value, GFP_KERNEL)) + if (!zalloc_cpumask_var(_value, GFP_KERNEL)) return -ENOMEM; err = cpumask_parse_user(buffer, count, new_value); diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index eccb4e1187cc..7607dc8a20b2 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4827,7 +4827,7 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf, cpumask_var_t tracing_cpumask_new; int err; - if (!alloc_cpumask_var(_cpumask_new, GFP_KERNEL)) + if (!zalloc_cpumask_var(_cpumask_new, GFP_KERNEL)) return -ENOMEM; err = cpumask_parse_user(ubuf, count, tracing_cpumask_new); -- 2.18.4
[PATCH] misc: vmw_vmci: initialize payload passed to vmci_send_datagram()
[ 21.549639][T1] ? driver_attach+0xb0/0xb0 [ 21.549639][T1] driver_attach+0x89/0xb0 [ 21.549639][T1] bus_add_driver+0x677/0xc40 [ 21.549639][T1] driver_register+0x485/0x8e0 [ 21.549639][T1] __pci_register_driver+0x1ff/0x350 [ 21.549639][T1] vmci_guest_init+0x3e/0x41 [ 21.549639][T1] vmci_drv_init+0x1d6/0x43f [ 21.549639][T1] do_one_initcall+0x39c/0x9a0 [ 21.549639][T1] ? null_init+0x11dc/0x11dc [ 21.549639][T1] ? kmsan_get_metadata+0x116/0x180 [ 21.549639][T1] ? kmsan_get_shadow_origin_ptr+0x84/0xb0 [ 21.549639][T1] ? null_init+0x11dc/0x11dc [ 21.549639][T1] do_initcall_level+0x1d7/0x259 [ 21.549639][T1] do_initcalls+0x127/0x1cb [ 21.549639][T1] ? cpu_init_udelay+0xcf/0xcf [ 21.549639][T1] ? debug_boot_weak_hash_enable+0x61/0x61 [ 21.549639][T1] do_basic_setup+0x33/0x36 [ 21.549639][T1] kernel_init_freeable+0x29a/0x3ed [ 21.549639][T1] ? rest_init+0x1f0/0x1f0 [ 21.549639][T1] kernel_init+0x1f/0x840 [ 21.549639][T1] ? rest_init+0x1f0/0x1f0 [ 21.549639][T1] ret_from_fork+0x1f/0x30 [ 21.549639][T1] [ 21.549639][T1] Uninit was created at: [ 21.549639][T1] kmsan_internal_poison_shadow+0x5c/0xf0 [ 21.549639][T1] kmsan_slab_alloc+0x8d/0xe0 [ 21.549639][T1] kmem_cache_alloc+0x84f/0xe30 [ 21.549639][T1] vmci_guest_probe_device+0xd11/0x1e70 [ 21.549639][T1] pci_device_probe+0xab3/0xe70 [ 21.549639][T1] really_probe+0xd16/0x24d0 [ 21.549639][T1] driver_probe_device+0x29d/0x3a0 [ 21.549639][T1] device_driver_attach+0x25a/0x490 [ 21.549639][T1] __driver_attach+0x78c/0x840 [ 21.549639][T1] bus_for_each_dev+0x210/0x340 [ 21.549639][T1] driver_attach+0x89/0xb0 [ 21.549639][T1] bus_add_driver+0x677/0xc40 [ 21.549639][T1] driver_register+0x485/0x8e0 [ 21.549639][T1] __pci_register_driver+0x1ff/0x350 [ 21.549639][T1] vmci_guest_init+0x3e/0x41 [ 21.549639][T1] vmci_drv_init+0x1d6/0x43f [ 21.549639][T1] do_one_initcall+0x39c/0x9a0 [ 21.549639][T1] do_initcall_level+0x1d7/0x259 [ 21.549639][T1] do_initcalls+0x127/0x1cb [ 21.549639][T1] do_basic_setup+0x33/0x36 [ 21.549639][T1] kernel_init_freeable+0x29a/0x3ed [ 21.549639][T1] kernel_init+0x1f/0x840 [ 21.549639][T1] ret_from_fork+0x1f/0x30 [ 21.549639][T1] [ 21.549639][T1] Bytes 28-31 of 36 are uninitialized [ 21.549639][T1] Memory access of size 36 starts at 8881675e5f00 [ 21.549639][T1] = [ 21.639830][T1] Guest personality initialized and is active [ 21.642165][T1] VMCI host device registered (name=vmci, major=10, minor=121) Signed-off-by: Tetsuo Handa --- drivers/misc/vmw_vmci/vmci_doorbell.c | 2 +- drivers/misc/vmw_vmci/vmci_guest.c| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/misc/vmw_vmci/vmci_doorbell.c b/drivers/misc/vmw_vmci/vmci_doorbell.c index 345addd9306d..fa8a7fce4481 100644 --- a/drivers/misc/vmw_vmci/vmci_doorbell.c +++ b/drivers/misc/vmw_vmci/vmci_doorbell.c @@ -326,7 +326,7 @@ int vmci_dbell_host_context_notify(u32 src_cid, struct vmci_handle handle) bool vmci_dbell_register_notification_bitmap(u64 bitmap_ppn) { int result; - struct vmci_notify_bm_set_msg bitmap_set_msg; + struct vmci_notify_bm_set_msg bitmap_set_msg = { }; bitmap_set_msg.hdr.dst = vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID, VMCI_SET_NOTIFY_BITMAP); diff --git a/drivers/misc/vmw_vmci/vmci_guest.c b/drivers/misc/vmw_vmci/vmci_guest.c index cc8eeb361fcd..1018dc77269d 100644 --- a/drivers/misc/vmw_vmci/vmci_guest.c +++ b/drivers/misc/vmw_vmci/vmci_guest.c @@ -168,7 +168,7 @@ static int vmci_check_host_caps(struct pci_dev *pdev) VMCI_UTIL_NUM_RESOURCES * sizeof(u32); struct vmci_datagram *check_msg; - check_msg = kmalloc(msg_size, GFP_KERNEL); + check_msg = kzalloc(msg_size, GFP_KERNEL); if (!check_msg) { dev_err(>dev, "%s: Insufficient memory\n", __func__); return -ENOMEM; -- 2.18.4
Re: [PATCH v5] lockdep: Allow tuning tracing capacity constants.
Peter, are you there? If you keep silence, we will assume that applying this patch is the way to go. On 2021/03/20 16:34, Dmitry Vyukov wrote: > On Mon, Feb 8, 2021 at 11:29 AM Tetsuo Handa > wrote: >> >> Since syzkaller continues various test cases until the kernel crashes, >> syzkaller tends to examine more locking dependencies than normal systems. >> As a result, syzbot is reporting that the fuzz testing was terminated >> due to hitting upper limits lockdep can track [1] [2] [3]. Since analysis >> via /proc/lockdep* did not show any obvious culprit [4] [5], we have no >> choice but allow tuning tracing capacity constants. >> >> [1] >> https://syzkaller.appspot.com/bug?id=3d97ba93fb3566000c1c59691ea427370d33ea1b >> [2] >> https://syzkaller.appspot.com/bug?id=381cb436fe60dc03d7fd2a092b46d7f09542a72a >> [3] >> https://syzkaller.appspot.com/bug?id=a588183ac34c1437fc0785e8f220e88282e5a29f >> [4] >> https://lkml.kernel.org/r/4b8f7a57-fa20-47bd-48a0-ae35d860f...@i-love.sakura.ne.jp >> [5] >> https://lkml.kernel.org/r/1c351187-253b-2d49-acaf-4563c63ae...@i-love.sakura.ne.jp >> >> Reported-by: syzbot >> Reported-by: syzbot >> Reported-by: syzbot >> References: >> https://lkml.kernel.org/r/1595640639-9310-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp >> Signed-off-by: Tetsuo Handa >> Acked-by: Dmitry Vyukov > > Peter, ping. > Please at least provide some feedback. > This is still the top crasher on syzbot and harms testing of the whole > kernel. I am periodically thinking of disabling LOCKDEP on syzbot as > harming more than helping, but so far talking myself out of it because > it will likely be broken more when we try to re-enable it and I still > hope for a timely resolution of this issue. >
Re: [RFC PATCH 2/2] integrity: double check iint_cache was initialized
On 2021/03/24 20:10, Mimi Zohar wrote: > On Wed, 2021-03-24 at 19:10 +0900, Tetsuo Handa wrote: >> On 2021/03/24 1:13, Mimi Zohar wrote: >>> On Wed, 2021-03-24 at 00:14 +0900, Tetsuo Handa wrote: >>>> On 2021/03/23 23:47, Mimi Zohar wrote: >>>>> Initially I also questioned making "integrity" an LSM. Perhaps it's >>>>> time to reconsider. For now, it makes sense to just fix the NULL >>>>> pointer dereferencing. >>>> >>>> Do we think calling panic() as "fix the NULL pointer dereferencing" ? >>> >>> Not supplying "integrity" as an "lsm=" option is a user error. There >>> are only two options - allow or deny the caller to proceed. If the >>> user is expecting the integrity subsystem to be properly working, >>> returning a NULL and allowing the system to boot (RFC patch version) >>> does not make sense. Better to fail early. >> >> What does the "user" mean? Those who load the vmlinux? >> Only the "root" user (so called administrators)? >> Any users including other than "root" user? >> >> If the user means those who load the vmlinux, that user is explicitly asking >> for disabling "integrity" for some reason. In that case, it is a bug if >> booting with "integrity" disabled is impossible. >> >> If the user means something other than those who load the vmlinux, >> is there a possibility that that user (especially non "root" users) is >> allowed to try to use "integrity" ? If processes other than global init >> process can try to use "integrity", wouldn't it be a DoS attack vector? >> Please explain in the descripotion why calling panic() does not cause >> DoS attack vector. > > User in this case, is anyone rebooting the system and is intentionally > changing the default values, dropping the "integrity" option on the > boot command line. OK. Then, I expect that the system boots instead of calling panic(). That user is explicitly asking for disabling "integrity" for some reason.
Re: [RFC PATCH 2/2] integrity: double check iint_cache was initialized
On 2021/03/24 1:13, Mimi Zohar wrote: > On Wed, 2021-03-24 at 00:14 +0900, Tetsuo Handa wrote: >> On 2021/03/23 23:47, Mimi Zohar wrote: >>> Initially I also questioned making "integrity" an LSM. Perhaps it's >>> time to reconsider. For now, it makes sense to just fix the NULL >>> pointer dereferencing. >> >> Do we think calling panic() as "fix the NULL pointer dereferencing" ? > > Not supplying "integrity" as an "lsm=" option is a user error. There > are only two options - allow or deny the caller to proceed. If the > user is expecting the integrity subsystem to be properly working, > returning a NULL and allowing the system to boot (RFC patch version) > does not make sense. Better to fail early. What does the "user" mean? Those who load the vmlinux? Only the "root" user (so called administrators)? Any users including other than "root" user? If the user means those who load the vmlinux, that user is explicitly asking for disabling "integrity" for some reason. In that case, it is a bug if booting with "integrity" disabled is impossible. If the user means something other than those who load the vmlinux, is there a possibility that that user (especially non "root" users) is allowed to try to use "integrity" ? If processes other than global init process can try to use "integrity", wouldn't it be a DoS attack vector? Please explain in the descripotion why calling panic() does not cause DoS attack vector.
Re: [RFC PATCH 2/2] integrity: double check iint_cache was initialized
On 2021/03/23 23:47, Mimi Zohar wrote: > Initially I also questioned making "integrity" an LSM. Perhaps it's > time to reconsider. For now, it makes sense to just fix the NULL > pointer dereferencing. Do we think calling panic() as "fix the NULL pointer dereferencing" ?
Re: [RFC PATCH 2/2] integrity: double check iint_cache was initialized
On 2021/03/23 22:37, Tetsuo Handa wrote: > On 2021/03/23 21:09, Mimi Zohar wrote: >> Please take a look at the newer version of this patch. Do you want to >> add any tags? > > Oh, I didn't know that you already posted the newer version. > >> diff --git a/security/integrity/iint.c b/security/integrity/iint.c >> index 1d20003243c3..0ba01847e836 100644 >> --- a/security/integrity/iint.c >> +++ b/security/integrity/iint.c >> @@ -98,6 +98,14 @@ struct integrity_iint_cache *integrity_inode_get(struct >> inode *inode) >> struct rb_node *node, *parent = NULL; >> struct integrity_iint_cache *iint, *test_iint; >> >> +/* >> + * The integrity's "iint_cache" is initialized at security_init(), >> + * unless it is not included in the ordered list of LSMs enabled >> + * on the boot command line. >> + */ >> +if (!iint_cache) >> +panic("%s: lsm=integrity required.\n", __func__); >> + > > This looks strange. If "lsm=" parameter must include "integrity", > it implies that nobody is allowed to disable "integrity" at boot. > Then, why not unconditionally call integrity_iintcache_init() by > not counting on DEFINE_LSM(integrity) declaration? Or, I think below one is also possible. diff --git a/security/integrity/iint.c b/security/integrity/iint.c index 1d20003243c3..37afc5168891 100644 --- a/security/integrity/iint.c +++ b/security/integrity/iint.c @@ -19,6 +19,7 @@ #include #include #include +#include #include "integrity.h" static struct rb_root integrity_iint_tree = RB_ROOT; @@ -85,6 +86,20 @@ static void iint_free(struct integrity_iint_cache *iint) kmem_cache_free(iint_cache, iint); } +static void init_once(void *foo) +{ + struct integrity_iint_cache *iint = foo; + + memset(iint, 0, sizeof(*iint)); + iint->ima_file_status = INTEGRITY_UNKNOWN; + iint->ima_mmap_status = INTEGRITY_UNKNOWN; + iint->ima_bprm_status = INTEGRITY_UNKNOWN; + iint->ima_read_status = INTEGRITY_UNKNOWN; + iint->ima_creds_status = INTEGRITY_UNKNOWN; + iint->evm_status = INTEGRITY_UNKNOWN; + mutex_init(>mutex); +} + /** * integrity_inode_get - find or allocate an iint associated with an inode * @inode: pointer to the inode @@ -102,6 +117,18 @@ struct integrity_iint_cache *integrity_inode_get(struct inode *inode) if (iint) return iint; + if (!iint_cache) { + static DEFINE_MUTEX(lock); + unsigned int flags = memalloc_nofs_save(); + + mutex_lock(); + if (!iint_cache) + iint_cache = kmem_cache_create("iint_cache", + sizeof(struct integrity_iint_cache), + 0, SLAB_PANIC, init_once); + mutex_unlock(); + memalloc_nofs_restore(flags); + } iint = kmem_cache_alloc(iint_cache, GFP_NOFS); if (!iint) return NULL; @@ -150,25 +177,8 @@ void integrity_inode_free(struct inode *inode) iint_free(iint); } -static void init_once(void *foo) -{ - struct integrity_iint_cache *iint = foo; - - memset(iint, 0, sizeof(*iint)); - iint->ima_file_status = INTEGRITY_UNKNOWN; - iint->ima_mmap_status = INTEGRITY_UNKNOWN; - iint->ima_bprm_status = INTEGRITY_UNKNOWN; - iint->ima_read_status = INTEGRITY_UNKNOWN; - iint->ima_creds_status = INTEGRITY_UNKNOWN; - iint->evm_status = INTEGRITY_UNKNOWN; - mutex_init(>mutex); -} - static int __init integrity_iintcache_init(void) { - iint_cache = - kmem_cache_create("iint_cache", sizeof(struct integrity_iint_cache), - 0, SLAB_PANIC, init_once); return 0; } DEFINE_LSM(integrity) = {
Re: [RFC PATCH 2/2] integrity: double check iint_cache was initialized
On 2021/03/23 21:09, Mimi Zohar wrote: > Please take a look at the newer version of this patch. Do you want to > add any tags? Oh, I didn't know that you already posted the newer version. > diff --git a/security/integrity/iint.c b/security/integrity/iint.c > index 1d20003243c3..0ba01847e836 100644 > --- a/security/integrity/iint.c > +++ b/security/integrity/iint.c > @@ -98,6 +98,14 @@ struct integrity_iint_cache *integrity_inode_get(struct > inode *inode) > struct rb_node *node, *parent = NULL; > struct integrity_iint_cache *iint, *test_iint; > > + /* > + * The integrity's "iint_cache" is initialized at security_init(), > + * unless it is not included in the ordered list of LSMs enabled > + * on the boot command line. > + */ > + if (!iint_cache) > + panic("%s: lsm=integrity required.\n", __func__); > + This looks strange. If "lsm=" parameter must include "integrity", it implies that nobody is allowed to disable "integrity" at boot. Then, why not unconditionally call integrity_iintcache_init() by not counting on DEFINE_LSM(integrity) declaration? > iint = integrity_iint_find(inode); > if (iint) > return iint; >
Re: [RFC PATCH 2/2] integrity: double check iint_cache was initialized
On 2021/03/20 5:03, Mimi Zohar wrote: > The integrity's "iint_cache" is initialized at security_init(). Only > after an IMA policy is loaded, which is initialized at late_initcall, > is a file's integrity status stored in the "iint_cache". > > All integrity_inode_get() callers first verify that the IMA policy has > been loaded, before calling it. Yet for some reason, it is still being > called, causing a NULL pointer dereference. > > qemu-system-x86_64 (...snipped...) lsm=smack (...snipped...) Hmm, why are you using lsm=smack instead of security=smack ? Since use of lsm= overrides CONFIG_LSM="lockdown,yama,safesetid,integrity,tomoyo,smack,bpf" settings, only smack is activated, which means that integrity_iintcache_init() will not be called by DEFINE_LSM(integrity) = { .name = "integrity", .init = integrity_iintcache_init, }; declaration. That's the reason iint_cache == NULL when integrity_inode_get() is called.
Re: [RFC PATCH 2/2] integrity: double check iint_cache was initialized
On 2021/03/20 5:03, Mimi Zohar wrote: > The integrity's "iint_cache" is initialized at security_init(). Only > after an IMA policy is loaded, which is initialized at late_initcall, > is a file's integrity status stored in the "iint_cache". > > All integrity_inode_get() callers first verify that the IMA policy has > been loaded, before calling it. Yet for some reason, it is still being > called, causing a NULL pointer dereference. > > As reported by Dmitry Vyukov: > in qemu: > qemu-system-x86_64 -enable-kvm -machine q35,nvdimm -cpu > max,migratable=off -smp 4 -m 4G,slots=4,maxmem=16G-hda > wheezy.img -kernel arch/x86/boot/bzImage -nographic -vga std > -soundhw all -usb -usbdevice tablet -bt hci -bt device:keyboard >-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net > nic,model=virtio-net-pci -object > memory-backend-file,id=pmem1,share=off,mem-path=/dev/zero,size=64M > -device nvdimm,id=nvdimm1,memdev=pmem1 -append "console=ttyS0 > root=/dev/sda earlyprintk=serial rodata=n oops=panic panic_on_warn=1 > panic=86400 lsm=smack numa=fake=2 nopcid dummy_hcd.num=8" -pidfile > vm_pid -m 2G -cpu host > I tried similar command line (without "-enable-kvm" and without "-cpu host" as I'm running from VMware, without "-soundhw all", without "-machine q35,nvdimm" and "-device nvdimm,id=nvdimm1,memdev=pmem1" etc.) on 5.12-rc4. While I was finally able to hit similar crash when I used "-smp 1" instead of "-smp 4", I suspect this is not a integrity module's problem but a memory initialization/corruption problem, for I got various different crashes (INT3) at memory allocation when I was trimming command line options trying to reproduce the same crash. Dmitry, do you get different crashes by changing command line arguments?
[tip: perf/urgent] lockdep: Add a missing initialization hint to the "INFO: Trying to register non-static key" message
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: 3a85969e9d912d5dd85362ee37b5f81266e00e77 Gitweb: https://git.kernel.org/tip/3a85969e9d912d5dd85362ee37b5f81266e00e77 Author:Tetsuo Handa AuthorDate:Sun, 21 Mar 2021 15:49:13 +09:00 Committer: Ingo Molnar CommitterDate: Sun, 21 Mar 2021 11:59:57 +01:00 lockdep: Add a missing initialization hint to the "INFO: Trying to register non-static key" message Since this message is printed when dynamically allocated spinlocks (e.g. kzalloc()) are used without initialization (e.g. spin_lock_init()), suggest to developers to check whether initialization functions for objects were called, before making developers wonder what annotation is missing. [ mingo: Minor tweaks to the message. ] Signed-off-by: Tetsuo Handa Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20210321064913.4619-1-penguin-ker...@i-love.sakura.ne.jp Signed-off-by: Ingo Molnar --- kernel/locking/lockdep.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index c6d0c1d..c30eb88 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -930,7 +930,8 @@ static bool assign_lock_key(struct lockdep_map *lock) /* Debug-check: all keys must be persistent! */ debug_locks_off(); pr_err("INFO: trying to register non-static key.\n"); - pr_err("the code is fine but needs lockdep annotation.\n"); + pr_err("The code is fine but needs lockdep annotation, or maybe\n"); + pr_err("you didn't initialize this object before use?\n"); pr_err("turning off the locking correctness validator.\n"); dump_stack(); return false;
[PATCH] lockdep: add a hint for "INFO: trying to register non-static key." message
Since this message is printed when dynamically allocated spinlocks (e.g. kzalloc()) are used without initialization (e.g. spin_lock_init()), suggest developers to check whether initialization functions for objects are called, before making developers wonder what annotation is missing. Signed-off-by: Tetsuo Handa --- kernel/locking/lockdep.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index c6d0c1dc6253..44c549f5c061 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -931,6 +931,7 @@ static bool assign_lock_key(struct lockdep_map *lock) debug_locks_off(); pr_err("INFO: trying to register non-static key.\n"); pr_err("the code is fine but needs lockdep annotation.\n"); + pr_err("maybe you didn't initialize this object before you use?\n"); pr_err("turning off the locking correctness validator.\n"); dump_stack(); return false; -- 2.18.4
Re: possible deadlock in start_this_handle (2)
Peter, would you answer to 6 questions listed below? Below is a reproducer kernel module (prefixed with "my_" for distinction) for https://syzkaller.appspot.com/bug?id=38c060d5757cbc13fdffd46e80557c645fbe79ba . -- test.c -- #include #include #include <../fs/ext4/ext4.h> static struct lockdep_map my__fs_reclaim_map = STATIC_LOCKDEP_MAP_INIT("my_fs_reclaim", __fs_reclaim_map); static noinline void my_journal_init_common(journal_t *journal) { static struct lock_class_key my_jbd2_trans_commit_key; lockdep_init_map(>j_trans_commit_map, "my_jbd2_handle", _jbd2_trans_commit_key, 0); } static noinline void my_init_once(void *foo) { struct ext4_inode_info *my_ei = (struct ext4_inode_info *) foo; init_rwsem(_ei->xattr_sem); } static int __init lockdep_test_init(void) { journal_t *journal1 = kzalloc(sizeof(*journal1), GFP_KERNEL | __GFP_NOFAIL); struct ext4_inode_info *ei1 = kzalloc(sizeof(*ei1), GFP_KERNEL | __GFP_NOFAIL); struct ext4_inode_info *ei2 = kzalloc(sizeof(*ei2), GFP_KERNEL | __GFP_NOFAIL); my_journal_init_common(journal1); my_init_once(ei1); my_init_once(ei2); rwsem_acquire_read(>j_trans_commit_map, 0, 0, _THIS_IP_); down_write(>xattr_sem); up_write(>xattr_sem); rwsem_release(>j_trans_commit_map, _THIS_IP_); down_write(>xattr_sem); lock_map_acquire(__fs_reclaim_map); lock_map_release(__fs_reclaim_map); up_write(>xattr_sem); lock_map_acquire(__fs_reclaim_map); rwsem_acquire_read(>j_trans_commit_map, 0, 0, _THIS_IP_); rwsem_release(>j_trans_commit_map, _THIS_IP_); lock_map_release(__fs_reclaim_map); return -EINVAL; } module_init(lockdep_test_init); MODULE_LICENSE("GPL"); -- test.c -- -- dmesg -- [ 32.938906][ T2776] test: loading out-of-tree module taints kernel. [ 32.940200][ T2776] [ 32.940240][ T2776] == [ 32.940306][ T2776] WARNING: possible circular locking dependency detected [ 32.940373][ T2776] 5.12.0-rc3+ #846 Tainted: G O [ 32.940434][ T2776] -- [ 32.940500][ T2776] insmod/2776 is trying to acquire lock: [ 32.940557][ T2776] 9f1d438d98e0 (my_jbd2_handle){.+.+}-{0:0}, at: lockdep_test_init+0x0/0x1000 [test] [ 32.940651][ T2776] [ 32.940651][ T2776] but task is already holding lock: [ 32.940719][ T2776] c0631020 (my_fs_reclaim){+.+.}-{0:0}, at: lockdep_test_init+0x0/0x1000 [test] [ 32.940808][ T2776] [ 32.940808][ T2776] which lock already depends on the new lock. [ 32.940808][ T2776] [ 32.940897][ T2776] [ 32.940897][ T2776] the existing dependency chain (in reverse order) is: [ 32.940976][ T2776] [ 32.940976][ T2776] -> #2 (my_fs_reclaim){+.+.}-{0:0}: [ 32.941053][ T2776]lock_acquire+0xb3/0x380 [ 32.941112][ T2776]lockdep_test_init+0xe2/0x1000 [test] [ 32.941176][ T2776]do_one_initcall+0x58/0x2c0 [ 32.941234][ T2776]do_init_module+0x5b/0x210 [ 32.941291][ T2776]load_module+0x1884/0x19a0 [ 32.941347][ T2776]__do_sys_finit_module+0xc1/0x120 [ 32.941408][ T2776]__x64_sys_finit_module+0x15/0x20 [ 32.941469][ T2776]do_syscall_64+0x31/0x40 [ 32.941527][ T2776]entry_SYSCALL_64_after_hwframe+0x44/0xae [ 32.941594][ T2776] [ 32.941594][ T2776] -> #1 (_ei->xattr_sem){+.+.}-{3:3}: [ 32.941673][ T2776]lock_acquire+0xb3/0x380 [ 32.941728][ T2776]down_write+0x52/0x620 [ 32.941783][ T2776]lockdep_test_init+0xa2/0x1000 [test] [ 32.941846][ T2776]do_one_initcall+0x58/0x2c0 [ 32.941904][ T2776]do_init_module+0x5b/0x210 [ 32.941960][ T2776]load_module+0x1884/0x19a0 [ 32.942016][ T2776]__do_sys_finit_module+0xc1/0x120 [ 32.942077][ T2776]__x64_sys_finit_module+0x15/0x20 [ 32.942137][ T2776]do_syscall_64+0x31/0x40 [ 32.942193][ T2776]entry_SYSCALL_64_after_hwframe+0x44/0xae [ 32.942260][ T2776] [ 32.942260][ T2776] -> #0 (my_jbd2_handle){.+.+}-{0:0}: [ 32.942336][ T2776]check_prevs_add+0x16a/0x1040 [ 32.942395][ T2776]__lock_acquire+0x118b/0x1580 [ 32.942453][ T2776]lock_acquire+0xb3/0x380 [ 32.942509][ T2776]lockdep_test_init+0x140/0x1000 [test] [ 32.942574][ T2776]do_one_initcall+0x58/0x2c0 [ 32.942631][ T2776]do_init_module+0x5b/0x210 [ 32.942687][ T2776]load_module+0x1884/0x19a0 [ 32.942743][ T2776]__do_sys_finit_module+0xc1/0x120 [ 32.942803][ T2776]__x64_sys_finit_module+0x15/0x20 [ 32.942915][ T2776]do_syscall_64+0x31/0x40 [ 32.942973][ T2776]entry_SYSCALL_64_after_hwframe+0x44/0xae [ 32.943041][ T2776] [ 32.943041][ T2776] other info that might help us debug this: [ 32.943041][ T2776] [
Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle
On 2021/03/12 0:54, Marco Elver wrote: >> But the more we could have the compiler automatically figure out >> things without needing an explicit tag, it would seem to me that this >> would be better, since manual tagging is going to be more error-prone. > > What you're alluding to here would go much further than a data race > detector ("data race" is still just defined by the memory model). The > wish that there was a static analysis tool that would automatically > understand the "concurrency semantics as intended by the developer" is > something that'd be nice to have, but just doesn't seem realistic. > Because how can a tool tell what the developer intended, without input > from that developer? Input from developers is very important for not only compilers and tools but also allowing bug-explorers to understand what is happening. ext4 currently has possible deadlock in start_this_handle (2) https://syzkaller.appspot.com/bug?id=38c060d5757cbc13fdffd46e80557c645fbe79ba which even maintainers cannot understand what is happening. How can bug-explorers know implicit logic which maintainers believe safe and correct? It is possible that some oversight in implicit logic is the cause of "possible deadlock in start_this_handle (2)". Making implicit assumptions clear helps understanding. Will "KCSAN: data-race in start_this_handle / start_this_handle" be addressed by marking? syzbot is already waiting for "KCSAN: data-race in jbd2_journal_dirty_metadata / jbd2_journal_dirty_metadata" at https://syzkaller.appspot.com/bug?id=5eb10023f53097f003e72c6a7c1a6f14b7c22929 . > > If there's worry marking accesses is error-prone, then that might be a > signal that the concurrency design is too complex (or the developer > hasn't considered all cases). > > For that reason, we need to mark accesses to tell the compiler and > tooling where to expect concurrency, so that 1) the compiler generates > correct code, and 2) tooling such as KCSAN can double-check what the > developer intended is actually what's happening. and 3) bug-explorers can understand what the developers are assuming/missing.
Re: [PATCH 0/6] usbip fixes to crashes found by syzbot
On 2021/03/18 22:13, Shuah Khan wrote: > Please don't review code that isn't sent upstream. This repo you are > looking at is a private branch created just to verify fixes on syzbot. But nobody was able to review this series when sent to ML (except you simply ignored my questions), and this series was merged to upstream and stable kernels despite there was an obvious assignment error which results in NULL pointer dereference. > > I understand the race window you are talking about. I have my way of > working to resolve it. Without an effort to share your understanding/idea and my understanding/idea, nobody can test/review what you call a solution.
Re: [PATCH 0/6] usbip fixes to crashes found by syzbot
On 2021/03/18 0:06, Shuah Khan wrote: > Yes. I haven't sent the patch for that reason. I am trying to test a > solution. I haven't come up with a solution yet. > > Holding event_lock isn't the right solution. I am not going to accept > that. This is a window that gets triggered by syzbot injecting errors > in a sequence. Fixing this should be done taking other moving parts of > the driver into account. What is event_lock ? I questioned you what the event_lock is at https://lkml.kernel.org/r/3dab66dc-2981-bc88-a370-4b3178dfd...@i-love.sakura.ne.jp , but you haven't responded to that post. Also, you need to send https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux.git/commit/?h=usbip_test=f345de0d2e51a20a2a1c30fc22fa1527670d2095 because Greg already sent this regression into upstream and stable kernels.
Re: [PATCH 0/6] usbip fixes to crashes found by syzbot
Shuah, this driver is getting more and more cryptic and buggy. Please explain the strategy for serialization before you write patches. > - Fix attach_store() to check usbip_event_happened() before > waking up threads. No, this helps nothing. > diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c > index c4b4256e5dad3..f0a770adebd97 100644 > --- a/drivers/usb/usbip/vhci_sysfs.c > +++ b/drivers/usb/usbip/vhci_sysfs.c > @@ -418,6 +418,15 @@ static ssize_t attach_store(struct device *dev, struct > device_attribute *attr, > spin_unlock_irqrestore(>lock, flags); > /* end the lock */ > > + if (usbip_event_happened(>ud)) { > + /* > + * something went wrong - event handler shutting > + * the connection and doing reset - bail out > + */ > + dev_err(dev, "Event happended - handler is active\n"); > + return -EAGAIN; > + } > + detach_store() can queue shutdown event as soon as reaching "/* end the lock */" line but attach_store() might be preempted immediately after verifying that usbip_event_happened() was false (i.e. at this location) in order to wait for shutdown event posted by detach_store() to be processed. > wake_up_process(vdev->ud.tcp_rx); > wake_up_process(vdev->ud.tcp_tx); >
Re: [PATCH 0/6] usbip fixes to crashes found by syzbot
On 2021/03/13 9:48, Tetsuo Handa wrote: > On 2021/03/12 14:44, Tetsuo Handa wrote: >> And what you are missing in your [PATCH 4,5,6/6] is >> >> diff --git a/drivers/usb/usbip/vhci_sysfs.c >> b/drivers/usb/usbip/vhci_sysfs.c >> index c4457026d5ad..3c64bd06ab53 100644 >> --- a/drivers/usb/usbip/vhci_sysfs.c >> +++ b/drivers/usb/usbip/vhci_sysfs.c >> @@ -423,6 +423,7 @@ static ssize_t attach_store(struct device *dev, struct >> device_attribute *attr, >> /* end the lock */ >> >> wake_up_process(vdev->ud.tcp_rx); >> + schedule_timeout_uninterruptible(HZ); // Consider being preempted >> here. >> wake_up_process(vdev->ud.tcp_tx); >> >> rh_port_connect(vdev, speed); >> >> . wake_up_process(tcp_tx) can call vhci_shutdown_connection() before >> wake_up_process(tcp_tx) is called. > > wake_up_process(tcp_rx) can call vhci_shutdown_connection() before > wake_up_process(tcp_tx) is called. > >> Since vhci_shutdown_connection() destroys tcp_tx thread and releases tcp_tx >> memory via kthread_stop_put(tcp_tx), >> wake_up_process(tcp_tx) will access already freed memory. Your patch >> converted "NULL pointer dereference caused by >> failing to call kthread_stop_put(tcp_tx)" into "use after free caused by >> succeeding to call kthread_stop_put(tcp_tx)". >> > > And note that > > diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c > index c4457026d5ad..0e1a81d4632c 100644 > --- a/drivers/usb/usbip/vhci_sysfs.c > +++ b/drivers/usb/usbip/vhci_sysfs.c > @@ -422,11 +422,11 @@ static ssize_t attach_store(struct device *dev, > struct device_attribute *attr, > spin_unlock_irqrestore(>lock, flags); > /* end the lock */ > > - wake_up_process(vdev->ud.tcp_rx); > - wake_up_process(vdev->ud.tcp_tx); > - > rh_port_connect(vdev, speed); > > + wake_up_process(vdev->ud.tcp_tx); > + wake_up_process(vdev->ud.tcp_rx); > + > return count; >} >static DEVICE_ATTR_WO(attach); > > is as well not sufficient, for you are still missing Well, since tx thread can as well call usbip_event_add(USBIP_EH_SHUTDOWN), reversing the order of wake_up_process(tcp_tx) and wake_up_process(tcp_rx) will not help. > > diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c > index c4457026d5ad..c958f89a9196 100644 > --- a/drivers/usb/usbip/vhci_sysfs.c > +++ b/drivers/usb/usbip/vhci_sysfs.c > @@ -422,11 +422,13 @@ static ssize_t attach_store(struct device *dev, > struct device_attribute *attr, > spin_unlock_irqrestore(>lock, flags); > /* end the lock */ > > - wake_up_process(vdev->ud.tcp_rx); > - wake_up_process(vdev->ud.tcp_tx); > + schedule_timeout_uninterruptible(HZ); // Consider being preempted > here. > > rh_port_connect(vdev, speed); > > + wake_up_process(vdev->ud.tcp_tx); > + wake_up_process(vdev->ud.tcp_rx); > + > return count; >} >static DEVICE_ATTR_WO(attach); > > because vhci_port_disconnect() from detach_store() can call > usbip_event_add(>ud, VDEV_EVENT_DOWN) > (same use after free bug regarding tcp_tx and tcp_rx) as soon as all shared > states are set up and > spinlocks are released. > > What you had better consider first is how to protect > event_handler()/usbip_sockfd_store()/attach_store()/detach_store() functions > from concurrent calls. Please respond to > https://lkml.kernel.org/r/3dab66dc-2981-bc88-a370-4b3178dfd...@i-love.sakura.ne.jp > before you try to make further changes. > After all, I believe that there is no choice but introduce a mutex for serialization. Greg, please pick up https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux.git/commit/?h=usbip_test=f345de0d2e51a20a2a1c30fc22fa1527670d2095 and below patch. >From e0579aa776e4a3568c06f767c193d2204b64679d Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Sun, 14 Mar 2021 20:24:16 +0900 Subject: [PATCH v5] usb: usbip: serialize attach/detach operations The root problem syzbot has found [1] is that usbip module is not using serialization between attach/detach operations and event_handler(). This results in the following race windows. (1) two userspace processes can perform attach operation on the same device by writing the same content to the same attach interface file (2) one userspace process can perform detach operation on a device by writing to detach interface file while the other use
Re: [PATCH 0/6] usbip fixes to crashes found by syzbot
On 2021/03/12 14:44, Tetsuo Handa wrote: > And what you are missing in your [PATCH 4,5,6/6] is > > diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c > index c4457026d5ad..3c64bd06ab53 100644 > --- a/drivers/usb/usbip/vhci_sysfs.c > +++ b/drivers/usb/usbip/vhci_sysfs.c > @@ -423,6 +423,7 @@ static ssize_t attach_store(struct device *dev, struct > device_attribute *attr, > /* end the lock */ > > wake_up_process(vdev->ud.tcp_rx); > + schedule_timeout_uninterruptible(HZ); // Consider being preempted > here. > wake_up_process(vdev->ud.tcp_tx); > > rh_port_connect(vdev, speed); > > . wake_up_process(tcp_tx) can call vhci_shutdown_connection() before > wake_up_process(tcp_tx) is called. wake_up_process(tcp_rx) can call vhci_shutdown_connection() before wake_up_process(tcp_tx) is called. > Since vhci_shutdown_connection() destroys tcp_tx thread and releases tcp_tx > memory via kthread_stop_put(tcp_tx), > wake_up_process(tcp_tx) will access already freed memory. Your patch > converted "NULL pointer dereference caused by > failing to call kthread_stop_put(tcp_tx)" into "use after free caused by > succeeding to call kthread_stop_put(tcp_tx)". > And note that diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c index c4457026d5ad..0e1a81d4632c 100644 --- a/drivers/usb/usbip/vhci_sysfs.c +++ b/drivers/usb/usbip/vhci_sysfs.c @@ -422,11 +422,11 @@ static ssize_t attach_store(struct device *dev, struct device_attribute *attr, spin_unlock_irqrestore(>lock, flags); /* end the lock */ - wake_up_process(vdev->ud.tcp_rx); - wake_up_process(vdev->ud.tcp_tx); - rh_port_connect(vdev, speed); + wake_up_process(vdev->ud.tcp_tx); + wake_up_process(vdev->ud.tcp_rx); + return count; } static DEVICE_ATTR_WO(attach); is as well not sufficient, for you are still missing diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c index c4457026d5ad..c958f89a9196 100644 --- a/drivers/usb/usbip/vhci_sysfs.c +++ b/drivers/usb/usbip/vhci_sysfs.c @@ -422,11 +422,13 @@ static ssize_t attach_store(struct device *dev, struct device_attribute *attr, spin_unlock_irqrestore(>lock, flags); /* end the lock */ - wake_up_process(vdev->ud.tcp_rx); - wake_up_process(vdev->ud.tcp_tx); + schedule_timeout_uninterruptible(HZ); // Consider being preempted here. rh_port_connect(vdev, speed); + wake_up_process(vdev->ud.tcp_tx); + wake_up_process(vdev->ud.tcp_rx); + return count; } static DEVICE_ATTR_WO(attach); because vhci_port_disconnect() from detach_store() can call usbip_event_add(>ud, VDEV_EVENT_DOWN) (same use after free bug regarding tcp_tx and tcp_rx) as soon as all shared states are set up and spinlocks are released. What you had better consider first is how to protect event_handler()/usbip_sockfd_store()/attach_store()/detach_store() functions from concurrent calls. Please respond to https://lkml.kernel.org/r/3dab66dc-2981-bc88-a370-4b3178dfd...@i-love.sakura.ne.jp before you try to make further changes.
Re: [PATCH 0/6] usbip fixes to crashes found by syzbot
I cloned git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux.git as you are testing changes there. > commit 09e4522c87ff54c655c09f318a68b012eda8eb01 (HEAD -> usbip_test, > origin/usbip_test) > Author: Shuah Khan > Date: Thu Mar 11 11:18:25 2021 -0700 > >usbip: fix vhci races in connection tear down > >- Change vhci_device_reset() to reset tcp_socket and sockfd. > if !in_disconnect How it can happen? vhci_device_reset() can be called only after vhci_shutdown_connection() completed, and vhci_shutdown_connection() from subsequent requests cannot be called until vhci_device_reset() completes. I consider it as a dead code which should be removed by my "[PATCH v4 05/12] usb: usbip: don't reset tcp_socket at vhci_device_reset()". And what you are missing in your [PATCH 4,5,6/6] is diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c index c4457026d5ad..3c64bd06ab53 100644 --- a/drivers/usb/usbip/vhci_sysfs.c +++ b/drivers/usb/usbip/vhci_sysfs.c @@ -423,6 +423,7 @@ static ssize_t attach_store(struct device *dev, struct device_attribute *attr, /* end the lock */ wake_up_process(vdev->ud.tcp_rx); + schedule_timeout_uninterruptible(HZ); // Consider being preempted here. wake_up_process(vdev->ud.tcp_tx); rh_port_connect(vdev, speed); . wake_up_process(tcp_tx) can call vhci_shutdown_connection() before wake_up_process(tcp_tx) is called. Since vhci_shutdown_connection() destroys tcp_tx thread and releases tcp_tx memory via kthread_stop_put(tcp_tx), wake_up_process(tcp_tx) will access already freed memory. Your patch converted "NULL pointer dereference caused by failing to call kthread_stop_put(tcp_tx)" into "use after free caused by succeeding to call kthread_stop_put(tcp_tx)".
Re: [PATCH 0/6] usbip fixes to crashes found by syzbot
On 2021/03/11 21:57, Greg KH wrote: > On Thu, Mar 11, 2021 at 09:34:38PM +0900, Tetsuo Handa wrote: >> On 2021/03/11 3:33, Greg KH wrote: >>> On Sun, Mar 07, 2021 at 08:53:25PM -0700, Shuah Khan wrote: >>>> This patch series fixes the following problems founds in syzbot >>>> fuzzing. >>> >>> Thanks for these, all now queued up. >> >> I send SIGSTOP to >> >> [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf >> [PATCH 5/6] usbip: fix vhci_hcd attach_store() races leading to gpf >> [PATCH 6/6] usbip: fix vudc usbip_sockfd_store races leading to gpf >> >> because these patches merely converted NULL pointer dererefence bug to >> use-after-free bug >> by breaking kthread_get_run() into >> kthread_create()/get_task_struct()/wake_up_process(). > > I'll take follow-on patches to fix that other issue, if it's proven to > be valid. It's nice to fix up NULL dereference issues as soon as > possible :) Not an "other issue". Shuah's [PATCH 4,5,6/6] is failing to fix NULL pointer dereference issue. These patches simply replaces NULL pointer dereference issue (caused by preemption) with use after free issue (caused by exactly same preemption) issue. Shuah has to understand the consequence of calling wake_up_process() on rx thread in order to fix this NULL pointer dereference issue. The only fix we can safely apply now is https://lkml.kernel.org/r/20210205135707.4574-1-penguin-ker...@i-love.sakura.ne.jp . Since I and Shuah agreed that we will remove kthread_get_run(), it is nice to fix up frequently happening -EINTR pointer dereference issue as soon as possible.
Re: [PATCH 0/6] usbip fixes to crashes found by syzbot
On 2021/03/11 3:33, Greg KH wrote: > On Sun, Mar 07, 2021 at 08:53:25PM -0700, Shuah Khan wrote: >> This patch series fixes the following problems founds in syzbot >> fuzzing. > > Thanks for these, all now queued up. I send SIGSTOP to [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf [PATCH 5/6] usbip: fix vhci_hcd attach_store() races leading to gpf [PATCH 6/6] usbip: fix vudc usbip_sockfd_store races leading to gpf because these patches merely converted NULL pointer dererefence bug to use-after-free bug by breaking kthread_get_run() into kthread_create()/get_task_struct()/wake_up_process().
Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
On 2021/03/10 11:07, Shuah Khan wrote: > On 3/9/21 6:02 PM, Tetsuo Handa wrote: >> On 2021/03/10 9:29, Shuah Khan wrote: >>>> It is not a large grain lock. Since event_handler() is exclusively >>>> executed, this lock >>>> does _NOT_ block event_handler() unless attach/detach operations run >>>> concurrently. >>>> >>> >>> event handler queues the events. It shouldn't be blocked by attach >>> and detach. The events could originate for various reasons during >>> the host and vhci operations. I don't like using this lock for >>> attach and detach. >> >> How can attach/detach deadlock event_handler()? >> event_handler() calls e.g. vhci_shutdown_connection() via >> ud->eh_ops.shutdown(ud). >> vhci_shutdown_connection() e.g. waits for termination of tx/rx threads via >> kthread_stop_put(). >> event_handler() is already blocked by detach operation. >> How it can make situation worse to wait for creation of tx/rx threads in >> attach operation? >> > > event_lock shouldn't be held during event ops. usbip_event_add() > uses it to add events. Protecting shutdown path needs a different > approach. I can't understand what you are talking about. event_lock is defined as static DEFINE_SPINLOCK(event_lock); in drivers/usb/usbip/usbip_event.c and usbip_event_add() uses it like spin_lock_irqsave(_lock, flags); spin_unlock_irqrestore(_lock, flags); , but so what? I know event_lock spinlock cannot be taken when calling ud->eh_ops.shutdown(ud); ud->eh_ops.reset(ud); ud->eh_ops.unusable(ud); in event_handler() because e.g. vhci_shutdown_connection() can sleep. What my [PATCH v4 01/12] is suggesting is usbip_event_mutex which is defined as static DEFINE_MUTEX(usbip_event_mutex); in drivers/usb/usbip/usbip_event.c and held when calling ud->eh_ops.shutdown(ud); ud->eh_ops.reset(ud); ud->eh_ops.unusable(ud); in event_handler(). Since event_handler() is executed exclusively via singlethreaded workqueue, "event_handler() holds usbip_event_mutex" itself does not introduce a new lock dependency. My question is, how holding usbip_event_mutex can cause deadlock if usbip_sockfd_store()/attach_store()/detach_store() also hold usbip_event_mutex . kthread_create() is essentially several kmalloc(GFP_KERNEL) where event_handler() cannot interfere unless event_handler() serves as a memory reclaiming function. My question again. What functions might sleep and hold locks other than kthread_create() for tx/rx threads? Your answer to my question (if you identified such dependency) will go into patch shown bottom which replaces my [PATCH v4 01/12]-[PATCH v4 04/12] patches. > > In any case, do you have comments on this patch which doesn't even > touch vhci driver? I'm just replying to your [PATCH 4/6] because all your [PATCH 4/6]-[PATCH 6/6] patches have the same oversight. > > I understand you are identifying additional race condition that > the vhci patches in this series might not fix. That doesn't mean > that these patches aren't valid. > > Do you have any comments specific to the patches in this series? Your [PATCH 5/6] is still racy regarding rh_port_connect() versus rh_port_disconnect(). As soon as you call wake_up_process(), rh_port_disconnect() can be called before rh_port_connect() is called. Since you don't like serializing event_handler() and usbip_sockfd_store()/attach_store()/detach_store() using usbip_event_mutex, my patch shown below which uses attach_detach_lock mutex cannot close such race window. Ideally, wake_up_process() should be deferred to after releasing attach_detach_lock, but please answer to my question first. From: Tetsuo Handa Date: Wed, 10 Mar 2021 18:31:54 +0900 syzbot is reporting a NULL pointer dereference at sock_sendmsg() [1], for lack of serialization between attach_store() and event_handler() causes vhci_shutdown_connection() to observe vdev->ud.tcp_tx == NULL while vdev->ud.tcp_socket != NULL. Please read the reference link for details of this race window. While usbip module exclusively runs event_handler(), usbip module has never thought the possibility that multiple userspace processes can concurrently call usbip_sockfd_store()/attach_store()/detach_store(). As a result, there is a TOCTTOU bug in these functions regarding ud->status value which can result in similar crashes like [1]. Simplest way would be to run all of event_handler()/usbip_sockfd_store()/attach_store()/detach_store() functions exclusively using a global mutex. But Shuah Khan does not want to share a global mutex between these functions, for ...[please fill in this part]... . Therefore, this patch introduces a per-submodule local mutex which closes race window within usbip_sockfd_store() and attach_store()/detach_store(). This loc
Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
On 2021/03/10 9:29, Shuah Khan wrote: >> It is not a large grain lock. Since event_handler() is exclusively executed, >> this lock >> does _NOT_ block event_handler() unless attach/detach operations run >> concurrently. >> >>> > > event handler queues the events. It shouldn't be blocked by attach > and detach. The events could originate for various reasons during > the host and vhci operations. I don't like using this lock for > attach and detach. How can attach/detach deadlock event_handler()? event_handler() calls e.g. vhci_shutdown_connection() via ud->eh_ops.shutdown(ud). vhci_shutdown_connection() e.g. waits for termination of tx/rx threads via kthread_stop_put(). event_handler() is already blocked by detach operation. How it can make situation worse to wait for creation of tx/rx threads in attach operation?
Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
On 2021/03/10 8:52, Shuah Khan wrote: > On 3/9/21 4:40 PM, Tetsuo Handa wrote: >> On 2021/03/10 4:50, Shuah Khan wrote: >>> On 3/9/21 4:04 AM, Tetsuo Handa wrote: >>>> On 2021/03/09 1:27, Shuah Khan wrote: >>>>> Yes. We might need synchronization between events, threads, and shutdown >>>>> in usbip_host side and in connection polling and threads in vhci. >>>>> >>>>> I am also looking at the shutdown sequences closely as well since the >>>>> local state is referenced without usbip_device lock in these paths. >>>>> >>>>> I am approaching these problems as peeling the onion an expression so >>>>> we can limit the changes and take a spot fix approach. We have the >>>>> goal to address these crashes and not introduce regressions. >>>> >>>> I think my [PATCH v4 01/12]-[PATCH v4 06/12] simplify your further changes >>>> without introducing regressions. While ud->lock is held when checking >>>> ud->status, >>>> current attach/detach code is racy about read/update of ud->status . I >>>> think we >>>> can close race in attach/detach code via a simple usbip_event_mutex >>>> serialization. >>>> >>> >>> Do you mean patches 1,2,3,3,4,5,6? >> >> Yes, my 1,2,3,4,5,6. >> >> Since you think that usbip_prepare_threads() does not worth introducing, I'm >> fine with >> replacing my 7,8,9,10,11,12 with your "[PATCH 0/6] usbip fixes to crashes >> found by syzbot". >> > > Using event lock isn't the right approach to solve the race. It is a > large grain lock. I am not looking to replace patches. It is not a large grain lock. Since event_handler() is exclusively executed, this lock does _NOT_ block event_handler() unless attach/detach operations run concurrently. > > I still haven't seen any response from you about if you were able to > verify the fixes I sent in fix the problem you are seeing. I won't be able to verify your fixes, for it is syzbot who is seeing the problem. But I can see that your patch description is wrong because you are ignoring what I'm commenting. Global serialization had better come first. Your patch description depends on global serialization.
Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
On 2021/03/10 4:50, Shuah Khan wrote: > On 3/9/21 4:04 AM, Tetsuo Handa wrote: >> On 2021/03/09 1:27, Shuah Khan wrote: >>> Yes. We might need synchronization between events, threads, and shutdown >>> in usbip_host side and in connection polling and threads in vhci. >>> >>> I am also looking at the shutdown sequences closely as well since the >>> local state is referenced without usbip_device lock in these paths. >>> >>> I am approaching these problems as peeling the onion an expression so >>> we can limit the changes and take a spot fix approach. We have the >>> goal to address these crashes and not introduce regressions. >> >> I think my [PATCH v4 01/12]-[PATCH v4 06/12] simplify your further changes >> without introducing regressions. While ud->lock is held when checking >> ud->status, >> current attach/detach code is racy about read/update of ud->status . I think >> we >> can close race in attach/detach code via a simple usbip_event_mutex >> serialization. >> > > Do you mean patches 1,2,3,3,4,5,6? Yes, my 1,2,3,4,5,6. Since you think that usbip_prepare_threads() does not worth introducing, I'm fine with replacing my 7,8,9,10,11,12 with your "[PATCH 0/6] usbip fixes to crashes found by syzbot".
Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
On 2021/03/09 20:04, Tetsuo Handa wrote: > On 2021/03/09 1:27, Shuah Khan wrote: >> Yes. We might need synchronization between events, threads, and shutdown >> in usbip_host side and in connection polling and threads in vhci. >> >> I am also looking at the shutdown sequences closely as well since the >> local state is referenced without usbip_device lock in these paths. >> >> I am approaching these problems as peeling the onion an expression so >> we can limit the changes and take a spot fix approach. We have the >> goal to address these crashes and not introduce regressions. > > I think my [PATCH v4 01/12]-[PATCH v4 06/12] simplify your further changes > without introducing regressions. While ud->lock is held when checking > ud->status, > current attach/detach code is racy about read/update of ud->status . I think > we > can close race in attach/detach code via a simple usbip_event_mutex > serialization. > Commit 04679b3489e048cd ("Staging: USB/IP: add client driver") says /* * The important thing is that only one context begins cleanup. * This is why error handling and cleanup become simple. * We do not want to consider race condition as possible. */ static void vhci_shutdown_connection(struct usbip_device *ud) but why are we allowing multiple contexts to begin startup? That's where a subtle race window syzbot is reporting happened. This driver has never thought the possibility that multiple userspace processes can concurrently begin startup. Then, it is quite natural that we make startup simple and safe, by enforcing that only one context begins startup. Without serialization between startup/cleanup/event_handler(), "Fix the above problems:" in your changelog cannot become true. First step should be closing time-of-check to time-of-use bug via entire serialization as if "nonpreemptible UP kernel".
Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
On 2021/03/09 1:27, Shuah Khan wrote: > Yes. We might need synchronization between events, threads, and shutdown > in usbip_host side and in connection polling and threads in vhci. > > I am also looking at the shutdown sequences closely as well since the > local state is referenced without usbip_device lock in these paths. > > I am approaching these problems as peeling the onion an expression so > we can limit the changes and take a spot fix approach. We have the > goal to address these crashes and not introduce regressions. I think my [PATCH v4 01/12]-[PATCH v4 06/12] simplify your further changes without introducing regressions. While ud->lock is held when checking ud->status, current attach/detach code is racy about read/update of ud->status . I think we can close race in attach/detach code via a simple usbip_event_mutex serialization.
Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
On 2021/03/08 16:35, Tetsuo Handa wrote: > On 2021/03/08 12:53, Shuah Khan wrote: >> Fix the above problems: >> - Stop using kthread_get_run() macro to create/start threads. >> - Create threads and get task struct reference. >> - Add kthread_create() failure handling and bail out. >> - Hold usbip_device lock to update local and shared states after >> creating rx and tx threads. >> - Update usbip_device status to SDEV_ST_USED. >> - Update usbip_device tcp_socket, sockfd, tcp_rx, and tcp_tx >> - Start threads after usbip_device (tcp_socket, sockfd, tcp_rx, tcp_tx, >> and status) is complete. > > No, the whole usbip_sockfd_store() etc. should be serialized using a mutex, > for two different threads can open same file and write the same content at > the same moment. This results in seeing SDEV_ST_AVAILABLE and creating two > threads and overwiting global variables and setting SDEV_ST_USED and starting > two threads by each of two thread, which will later fail to call > kthread_stop() > on one of two thread because global variables are overwritten. > > kthread_crate() (which involves GFP_KERNEL allocation) can take long time > enough to hit > > usbip_sockfd_store() must perform > > if (sdev->ud.status != SDEV_ST_AVAILABLE) { Oops. This is if (sdev->ud.status == SDEV_ST_AVAILABLE) { of course. > /* misc assignments for attach operation */ > sdev->ud.status = SDEV_ST_USED; > } > > under a lock, or multiple ud->tcp_{tx,rx} are created (which will later > cause a crash like [1]) and refcount on ud->tcp_socket is leaked when > usbip_sockfd_store() is concurrently called. > > problem. That's why my patch introduced usbip_event_mutex lock. > And I think that same serialization is required between "rh_port_connect() from attach_store()" and "rh_port_disconnect() from vhci_shutdown_connection() via usbip_event_add(>ud, VDEV_EVENT_DOWN) from vhci_port_disconnect() from detach_store()", for both vhci_rx_pdu() from vhci_rx_loop() and vhci_port_disconnect() from detach_store() can queue VDEV_EVENT_DOWN event which can be processed without waiting for attach_store() to complete.
Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
On 2021/03/08 12:53, Shuah Khan wrote: > Fix the above problems: > - Stop using kthread_get_run() macro to create/start threads. > - Create threads and get task struct reference. > - Add kthread_create() failure handling and bail out. > - Hold usbip_device lock to update local and shared states after > creating rx and tx threads. > - Update usbip_device status to SDEV_ST_USED. > - Update usbip_device tcp_socket, sockfd, tcp_rx, and tcp_tx > - Start threads after usbip_device (tcp_socket, sockfd, tcp_rx, tcp_tx, > and status) is complete. No, the whole usbip_sockfd_store() etc. should be serialized using a mutex, for two different threads can open same file and write the same content at the same moment. This results in seeing SDEV_ST_AVAILABLE and creating two threads and overwiting global variables and setting SDEV_ST_USED and starting two threads by each of two thread, which will later fail to call kthread_stop() on one of two thread because global variables are overwritten. kthread_crate() (which involves GFP_KERNEL allocation) can take long time enough to hit usbip_sockfd_store() must perform if (sdev->ud.status != SDEV_ST_AVAILABLE) { /* misc assignments for attach operation */ sdev->ud.status = SDEV_ST_USED; } under a lock, or multiple ud->tcp_{tx,rx} are created (which will later cause a crash like [1]) and refcount on ud->tcp_socket is leaked when usbip_sockfd_store() is concurrently called. problem. That's why my patch introduced usbip_event_mutex lock.
Re: [PATCH] security: tomoyo: fix error return code of tomoyo_update_domain()
On 2021/03/06 22:03, Jia-Ju Bai wrote: > When mutex_lock_interruptible() fails, the error return code of > tomoyo_update_domain() is not properly assigned. > To fix this bug, error is assigned with the return value of > mutex_lock_interruptible(), and then error is checked. Thanks for a patch, but this patch is wrong. Since the variable "error" is initialized as int error = is_delete ? -ENOENT : -ENOMEM; at the beginning of this function, unconditionally overwriting this variable with the return code of mutex_lock_interruptible() breaks if (error && !is_delete) { } block of this function. And the caller does not check if the return code is -EINTR instead of -ENOENT or -ENOMEM.
Re: possible deadlock in start_this_handle (2)
On 2021/02/15 23:29, Jan Kara wrote: > On Mon 15-02-21 23:06:15, Tetsuo Handa wrote: >> On 2021/02/15 21:45, Jan Kara wrote: >>> On Sat 13-02-21 23:26:37, Tetsuo Handa wrote: >>>> Excuse me, but it seems to me that nothing prevents >>>> ext4_xattr_set_handle() from reaching ext4_xattr_inode_lookup_create() >>>> without memalloc_nofs_save() when hitting ext4_get_nojournal() path. >>>> Will you explain when ext4_get_nojournal() path is executed? >>> >>> That's a good question but sadly I don't think that's it. >>> ext4_get_nojournal() is called when the filesystem is created without a >>> journal. In that case we also don't acquire jbd2_handle lockdep map. In the >>> syzbot report we can see: >> >> Since syzbot can test filesystem images, syzbot might have tested a >> filesystem >> image created both with and without journal within this boot. > > a) I think that syzbot reboots the VM between executing different tests to > get reproducible conditions. But in theory I agree the test may have > contained one image with and one image without a journal. syzkaller reboots the VM upon a crash. syzkaller can test multiple filesystem images within one boot. https://storage.googleapis.com/syzkaller/cover/ci-qemu-upstream-386.html (this statistic snapshot is volatile) reports that ext4_get_nojournal() is partially covered ( https://github.com/google/syzkaller/blob/master/docs/coverage.md ) by syzkaller. /* Just increment the non-pointer handle value */ static handle_t *ext4_get_nojournal(void) { 86 handle_t *handle = current->journal_info; unsigned long ref_cnt = (unsigned long)handle; BUG_ON(ref_cnt >= EXT4_NOJOURNAL_MAX_REF_COUNT); 86 ref_cnt++; handle = (handle_t *)ref_cnt; current->journal_info = handle; 2006 return handle; } /* Decrement the non-pointer handle value */ static void ext4_put_nojournal(handle_t *handle) { unsigned long ref_cnt = (unsigned long)handle; 85 BUG_ON(ref_cnt == 0); 85 ref_cnt--; handle = (handle_t *)ref_cnt; current->journal_info = handle; } handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line, int type, int blocks, int rsv_blocks, int revoke_creds) { journal_t *journal; int err; 2006 trace_ext4_journal_start(sb, blocks, rsv_blocks, revoke_creds, 2006 _RET_IP_); 2006 err = ext4_journal_check_start(sb); if (err < 0) return ERR_PTR(err); 2005 journal = EXT4_SB(sb)->s_journal; 1969 if (!journal || (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)) 2006 return ext4_get_nojournal(); 1969 return jbd2__journal_start(journal, blocks, rsv_blocks, revoke_creds, GFP_NOFS, type, line); } > > *but* > > b) as I wrote in the email you are replying to, the jbd2_handle key is > private per filesystem. Thus for lockdep to complain about > jbd2_handle->fs_reclaim->jbd2_handle deadlock, those jbd2_handle lockdep > maps must come from the same filesystem. > > *and* > > c) filesystem without journal doesn't use jbd2_handle lockdep map at all so > for such filesystems lockdep creates no dependency for jbd2_handle map. > What about "EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)" case? Does this case happen on filesystem with journal, and can this case be executed by fuzzing a crafted (a sort of erroneous) filesystem with journal, and are the jbd2_handle for calling ext4_get_nojournal() case and the jbd2_handle for calling jbd2__journal_start() case the same? Also, I worry that jbd2__journal_restart() is problematic, for it calls stop_this_handle() (which calls memalloc_nofs_restore()) and then calls start_this_handle() (which fails to call memalloc_nofs_save() if an error occurs). An error from start_this_handle() from journal restart operation might need special handling (i.e. either remount-ro or panic) ?
Re: possible deadlock in start_this_handle (2)
On 2021/02/15 21:45, Jan Kara wrote: > On Sat 13-02-21 23:26:37, Tetsuo Handa wrote: >> Excuse me, but it seems to me that nothing prevents >> ext4_xattr_set_handle() from reaching ext4_xattr_inode_lookup_create() >> without memalloc_nofs_save() when hitting ext4_get_nojournal() path. >> Will you explain when ext4_get_nojournal() path is executed? > > That's a good question but sadly I don't think that's it. > ext4_get_nojournal() is called when the filesystem is created without a > journal. In that case we also don't acquire jbd2_handle lockdep map. In the > syzbot report we can see: Since syzbot can test filesystem images, syzbot might have tested a filesystem image created both with and without journal within this boot. > > kswapd0/2246 is trying to acquire lock: > 888041a988e0 (jbd2_handle){}-{0:0}, at: > start_this_handle+0xf81/0x1380 fs/jbd2/transaction.c:444 > > but task is already holding lock: > 8be892c0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x0/0x30 > mm/page_alloc.c:5195 > > So this filesystem has very clearly been created with a journal. Also the > journal lockdep tracking machinery uses: While locks held by kswapd0/2246 are fs_reclaim, shrinker_rwsem, >s_umount_key#38 and jbd2_handle, isn't the dependency lockdep considers problematic is Chain exists of: jbd2_handle --> >xattr_sem --> fs_reclaim Possible unsafe locking scenario: CPU0CPU1 lock(fs_reclaim); lock(>xattr_sem); lock(fs_reclaim); lock(jbd2_handle); where CPU0 is kswapd/2246 and CPU1 is the case of ext4_get_nojournal() path? If someone has taken jbd2_handle and >xattr_sem in this order, isn't this dependency true? > > rwsem_acquire_read(>j_trans_commit_map, 0, 0, _THIS_IP_); > > so a lockdep key is per-filesystem. Thus it is not possible that lockdep > would combine lock dependencies from two different filesystems. > > But I guess we could narrow the search for this problem by adding WARN_ONs > to ext4_xattr_set_handle() and ext4_xattr_inode_lookup_create() like: > > WARN_ON(ext4_handle_valid(handle) && !(current->flags & PF_MEMALLOC_NOFS)); > > It would narrow down a place in which PF_MEMALLOC_NOFS flag isn't set > properly... At least that seems like the most plausible way forward to me. You can use CONFIG_DEBUG_AID_FOR_SYZBOT for adding such WARN_ONs on linux-next.
[PATCH] pstore: fix warning in pstore_kill_sb()
syzbot is hitting WARN_ON(pstore_sb != sb) at pstore_kill_sb() [1], for the assumption that pstore_sb != NULL is wrong because pstore_fill_super() will not assign pstore_sb = sb when new_inode() for d_make_root() returned NULL (due to memory allocation fault injection). Since mount_single() calls pstore_kill_sb() when pstore_fill_super() failed, pstore_kill_sb() needs to be aware of such failure path. [1] https://syzkaller.appspot.com/bug?id=6abacb8da5137cb47a416f2bef95719ed60508a0 Reported-by: syzbot Signed-off-by: Tetsuo Handa --- fs/pstore/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c index 93a217e4f563..14658b009f1b 100644 --- a/fs/pstore/inode.c +++ b/fs/pstore/inode.c @@ -467,7 +467,7 @@ static struct dentry *pstore_mount(struct file_system_type *fs_type, static void pstore_kill_sb(struct super_block *sb) { mutex_lock(_sb_lock); - WARN_ON(pstore_sb != sb); + WARN_ON(pstore_sb && pstore_sb != sb); kill_litter_super(sb); pstore_sb = NULL; -- 2.18.4
Re: possible deadlock in start_this_handle (2)
On 2021/02/11 19:49, Jan Kara wrote: > This stacktrace should never happen. ext4_xattr_set() starts a transaction. > That internally goes through start_this_handle() which calls: > > handle->saved_alloc_context = memalloc_nofs_save(); > > and we restore the allocation context only in stop_this_handle() when > stopping the handle. And with this fs_reclaim_acquire() should remove > __GFP_FS from the mask and not call __fs_reclaim_acquire(). Excuse me, but it seems to me that nothing prevents ext4_xattr_set_handle() from reaching ext4_xattr_inode_lookup_create() without memalloc_nofs_save() when hitting ext4_get_nojournal() path. Will you explain when ext4_get_nojournal() path is executed? ext4_xattr_set() { handle = ext4_journal_start(inode, EXT4_HT_XATTR, credits) == __ext4_journal_start() { return __ext4_journal_start_sb() { journal = EXT4_SB(sb)->s_journal; if (!journal || (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)) return ext4_get_nojournal(); // Never calls memalloc_nofs_save() despite returning !IS_ERR() value. return jbd2__journal_start(journal, blocks, rsv_blocks, revoke_creds, GFP_NOFS, type, line); // Calls memalloc_nofs_save() when start_this_handle() returns 0. } } } error = ext4_xattr_set_handle(handle, inode, name_index, name, value, value_len, flags); { ext4_write_lock_xattr(inode, _expand); // Grabs >xattr_sem error = ext4_xattr_ibody_set(handle, inode, , ) { error = ext4_xattr_set_entry(i, s, handle, inode, false /* is_block */) { ret = ext4_xattr_inode_lookup_create(handle, inode, i->value, i->value_len, _ea_inode); // Using GFP_KERNEL based on assumption that ext4_journal_start() called memalloc_nofs_save(). } } } }
Re: general protection fault in tomoyo_socket_sendmsg_permission
Greg, will you queue https://lkml.kernel.org/r/20210205135707.4574-1-penguin-ker...@i-love.sakura.ne.jp (which can close a report which is wasting syzbot's resource with 5300+ crashes) for 5.12 ? The change shown below will be too large to test before merge window for 5.12 opens. The patch for fixing "general protection fault in tomoyo_socket_sendmsg_permission" will kill kthread_get_run(). Closing frequently crashing bug now is the better. On 2021/02/11 22:40, Tetsuo Handa wrote: > I guess that we need to serialize attach operation and reset/detach > operations, for > it seems there is a race window that might result in "general protection > fault in > tomoyo_socket_sendmsg_permission". Details follows... Here is untested diff that is expected to be complete. (1) Handle kthread_create() failure (which avoids "KASAN: null-ptr-deref Write in vhci_shutdown_connection") by grouping socket lookup, SOCK_STREAM check and kthread_get_run() into usbip_prepare_threads() function. (2) Serialize usbip_sockfd_store(), detach_store(), attach_store(), usbip_sockfd_store() and ud->eh_ops.shutdown()/ud->eh_ops.reset()/ud->eh_ops.unusable() operations using usbip_store_mutex mutex (which avoids "general protection fault in tomoyo_socket_sendmsg_permission"). (3) Don't reset ud->tcp_socket to NULL in vhci_device_reset(). Since tx_thread/rx_thread depends on ud->tcp_socket != NULL whereas tcp_socket and tx_thread/rx_thread are assigned at the same time, it is never safe to reset only ud->tcp_socket from ud->eh_ops.reset(). And actually nobody is calling ud->eh_ops.reset() without ud->eh_ops.shutdown(). (4) usbip_sockfd_store() must perform {sdev,udc}->ud.status != SDEV_ST_AVAILABLE && {sdev,udc}->ud.status = SDEV_ST_USED exclusively, or multiple tx_thread/rx_thread can be created when concurrently called. Although (2) will already serialize this action, (1) will make it possible to perform within one spinlock section. drivers/usb/usbip/stub_dev.c | 56 ++-- drivers/usb/usbip/usbip_common.c | 55 +++ drivers/usb/usbip/usbip_common.h | 13 drivers/usb/usbip/usbip_event.c | 9 + drivers/usb/usbip/vhci_hcd.c | 7 +--- drivers/usb/usbip/vhci_sysfs.c | 50 drivers/usb/usbip/vudc_sysfs.c | 50 7 files changed, 175 insertions(+), 65 deletions(-) diff --git a/drivers/usb/usbip/stub_dev.c b/drivers/usb/usbip/stub_dev.c index 2305d425e6c9..522d58826049 100644 --- a/drivers/usb/usbip/stub_dev.c +++ b/drivers/usb/usbip/stub_dev.c @@ -39,12 +39,11 @@ static DEVICE_ATTR_RO(usbip_status); * is used to transfer usbip requests by kernel threads. -1 is a magic number * by which usbip connection is finished. */ -static ssize_t usbip_sockfd_store(struct device *dev, struct device_attribute *attr, - const char *buf, size_t count) +static ssize_t __usbip_sockfd_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) { struct stub_device *sdev = dev_get_drvdata(dev); int sockfd = 0; - struct socket *socket; int rv; if (!sdev) { @@ -57,7 +56,12 @@ static ssize_t usbip_sockfd_store(struct device *dev, struct device_attribute *a return -EINVAL; if (sockfd != -1) { - int err; + struct usbip_thread_info uti; + int err = usbip_prepare_threads(, >ud, sockfd, + stub_tx_loop, "stub_tx", stub_rx_loop, "stub_rx"); + + if (err) + return err; dev_info(dev, "stub up\n"); @@ -65,44 +69,46 @@ static ssize_t usbip_sockfd_store(struct device *dev, struct device_attribute *a if (sdev->ud.status != SDEV_ST_AVAILABLE) { dev_err(dev, "not ready\n"); - goto err; + spin_unlock_irq(>ud.lock); + usbip_unprepare_threads(); + return -EINVAL; } - socket = sockfd_lookup(sockfd, ); - if (!socket) - goto err; - - sdev->ud.tcp_socket = socket; + sdev->ud.tcp_socket = uti.tcp_socket; sdev->ud.sockfd = sockfd; - - spin_unlock_irq(>ud.lock); - - sdev->ud.tcp_rx = kthread_get_run(stub_rx_loop, >ud, - "stub_rx"); - sdev->ud.tcp_tx = kthread_get_run(stub_tx_loop, >ud, - "stub_tx"); - - spin_lock_irq(&g
Re: possible deadlock in start_this_handle (2)
On 2021/02/12 22:12, Michal Hocko wrote: > On Fri 12-02-21 21:58:15, Tetsuo Handa wrote: >> On 2021/02/12 21:30, Michal Hocko wrote: >>> On Fri 12-02-21 12:22:07, Matthew Wilcox wrote: >>>> On Fri, Feb 12, 2021 at 08:18:11PM +0900, Tetsuo Handa wrote: >>>>> On 2021/02/12 1:41, Michal Hocko wrote: >>>>>> But I suspect we have drifted away from the original issue. I thought >>>>>> that a simple check would help us narrow down this particular case and >>>>>> somebody messing up from the IRQ context didn't sound like a completely >>>>>> off. >>>>>> >>>>> >>>>> From my experience at >>>>> https://lkml.kernel.org/r/201409192053.ihj35462.jlomosoffvt...@i-love.sakura.ne.jp >>>>> , >>>>> I think we can replace direct PF_* manipulation with macros which do not >>>>> receive "struct task_struct *" argument. >>>>> Since TASK_PFA_TEST()/TASK_PFA_SET()/TASK_PFA_CLEAR() are for >>>>> manipulating PFA_* flags on a remote thread, we can >>>>> define similar ones for manipulating PF_* flags on current thread. Then, >>>>> auditing dangerous users becomes easier. >>>> >>>> No, nobody is manipulating another task's GFP flags. >>> >>> Agreed. And nobody should be manipulating PF flags on remote tasks >>> either. >>> >> >> No. You are misunderstanding. The bug report above is an example of >> manipulating PF flags on remote tasks. > > Could you be more specific? I do not remember there was any theory that > somebody is manipulating flags on a remote task. A very vague theory was > that an interrupt context might be doing that on the _current_ context > but even that is not based on any real evidence. It is a pure > speculation. > Please read the link above. The report is an example of manipulating PF flags on a remote task. You are thinking interrupt context as the only possible culprit, but you should also think concurrent access as the other possible culprit.
Re: possible deadlock in start_this_handle (2)
On 2021/02/12 21:30, Michal Hocko wrote: > On Fri 12-02-21 12:22:07, Matthew Wilcox wrote: >> On Fri, Feb 12, 2021 at 08:18:11PM +0900, Tetsuo Handa wrote: >>> On 2021/02/12 1:41, Michal Hocko wrote: >>>> But I suspect we have drifted away from the original issue. I thought >>>> that a simple check would help us narrow down this particular case and >>>> somebody messing up from the IRQ context didn't sound like a completely >>>> off. >>>> >>> >>> From my experience at >>> https://lkml.kernel.org/r/201409192053.ihj35462.jlomosoffvt...@i-love.sakura.ne.jp >>> , >>> I think we can replace direct PF_* manipulation with macros which do not >>> receive "struct task_struct *" argument. >>> Since TASK_PFA_TEST()/TASK_PFA_SET()/TASK_PFA_CLEAR() are for manipulating >>> PFA_* flags on a remote thread, we can >>> define similar ones for manipulating PF_* flags on current thread. Then, >>> auditing dangerous users becomes easier. >> >> No, nobody is manipulating another task's GFP flags. > > Agreed. And nobody should be manipulating PF flags on remote tasks > either. > No. You are misunderstanding. The bug report above is an example of manipulating PF flags on remote tasks. You say "nobody should", but the reality is "there indeed was". There might be unnoticed others. The point of this proposal is to make it possible to "find such unnoticed users who are manipulating PF flags on remote tasks".
Re: possible deadlock in start_this_handle (2)
On 2021/02/12 1:41, Michal Hocko wrote: > But I suspect we have drifted away from the original issue. I thought > that a simple check would help us narrow down this particular case and > somebody messing up from the IRQ context didn't sound like a completely > off. > From my experience at https://lkml.kernel.org/r/201409192053.ihj35462.jlomosoffvt...@i-love.sakura.ne.jp , I think we can replace direct PF_* manipulation with macros which do not receive "struct task_struct *" argument. Since TASK_PFA_TEST()/TASK_PFA_SET()/TASK_PFA_CLEAR() are for manipulating PFA_* flags on a remote thread, we can define similar ones for manipulating PF_* flags on current thread. Then, auditing dangerous users becomes easier.
Re: general protection fault in tomoyo_socket_sendmsg_permission
On 2021/02/12 11:22, Tetsuo Handa wrote: > On 2021/02/12 10:34, Shuah Khan wrote: >> On 2/10/21 6:14 PM, Tetsuo Handa wrote: >>> (Dropping LSM ML because this is not a TOMOYO's bug.) >>> >>> On 2021/02/11 4:29, Shuah Khan wrote: >>>> This is a good find. I already replied to the thread to send a complete >>>> fix. >>> >>> As I said at >>> https://lkml.kernel.org/r/f8cae6b1-8f84-0e6a-7d9c-fc4aec68f...@i-love.sakura.ne.jp >>> , >>> the as-is patch is effectively a complete fix. And applying the as-is patch >>> should help spending >>> syzbot resources for reproducing "general protection fault in >>> tomoyo_socket_sendmsg_permission" >>> with debug printk() patch applied, which in turn will help you in >>> >>>> Right. I would like to get a clear understanding of how this condition >>>> is triggered. I am not saying this isn't a problem. Understanding how >>>> it is triggered helps find the best fix. >>> >>> part. Therefore, I strongly expect you to apply this version now. >>> >> >> Is there a reproducer for this problem? > > There is no reproducer for "general protection fault in > tomoyo_socket_sendmsg_permission" problem, but > the race condition is explained at > https://lkml.kernel.org/r/676d4518-0faa-9fab-15db-0db8d216d...@i-love.sakura.ne.jp > . > Here is a race window widening patch, and I locally reproduced "general protection fault in tomoyo_socket_sendmsg_permission". diff --git a/drivers/usb/usbip/usbip_common.h b/drivers/usb/usbip/usbip_common.h index 8be857a4fa13..a7c68097aa1d 100644 --- a/drivers/usb/usbip/usbip_common.h +++ b/drivers/usb/usbip/usbip_common.h @@ -286,6 +286,8 @@ struct usbip_device { if (!IS_ERR(__k)) {\ get_task_struct(__k); \ wake_up_process(__k); \ + } else { \ + __k = NULL;\ } \ __k; \ }) diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c index be37aec250c2..93e1271d0f5d 100644 --- a/drivers/usb/usbip/vhci_sysfs.c +++ b/drivers/usb/usbip/vhci_sysfs.c @@ -389,8 +389,12 @@ static ssize_t attach_store(struct device *dev, struct device_attribute *attr, /* end the lock */ vdev->ud.tcp_rx = kthread_get_run(vhci_rx_loop, >ud, "vhci_rx"); - vdev->ud.tcp_tx = kthread_get_run(vhci_tx_loop, >ud, "vhci_tx"); + { + struct task_struct *tx = kthread_get_run(vhci_tx_loop, >ud, "vhci_tx"); + schedule_timeout_uninterruptible(HZ); + vdev->ud.tcp_tx = tx; + } rh_port_connect(vdev, speed); return count; [ 134.880383][ T7879] vhci_hcd vhci_hcd.0: pdev(2) rhport(0) sockfd(6) [ 134.883465][ T7879] vhci_hcd vhci_hcd.0: devid(0) speed(2) speed_str(full-speed) [ 134.904750][ T74] vhci_hcd: vhci_device speed not set [ 134.933053][ T7873] vhci_hcd: connection closed [ 134.933212][ T7870] vhci_hcd: connection closed [ 134.933415][ T4488] vhci_hcd: stop threads [ 134.942970][ T4488] vhci_hcd: release socket [ 134.944949][ T14] vhci_hcd: vhci_device speed not set [ 134.947063][ T4488] vhci_hcd: disconnect device [ 134.951508][ T4488] vhci_hcd: stop threads [ 134.953861][ T4488] vhci_hcd: release socket [ 134.957113][ T4488] vhci_hcd: disconnect device [ 135.014433][ T56] usb 15-1: enqueue for inactive port 0 [ 135.065119][ T7877] vhci_hcd: connection closed [ 135.065205][ T7880] vhci_hcd: connection closed [ 135.065370][ T4480] vhci_hcd: stop threads [ 135.072854][ T4480] vhci_hcd: release socket [ 135.076490][ T4480] vhci_hcd: disconnect device [ 135.079535][ T4480] vhci_hcd: stop threads [ 135.084024][ T4480] vhci_hcd: release socket [ 135.087979][ T4480] vhci_hcd: disconnect device [ 135.134526][ T6820] vhci_hcd: vhci_device speed not set [ 135.314959][ T6821] vhci_hcd: vhci_device speed not set [ 135.621271][ T7884] vhci_hcd vhci_hcd.0: pdev(4) rhport(0) sockfd(3) [ 135.621290][ T7885] vhci_hcd vhci_hcd.0: pdev(0) rhport(0) sockfd(3) [ 135.625072][ T7884] vhci_hcd vhci_hcd.0: devid(0) speed(5) speed_str(super-speed) [ 135.628665][ T7885] vhci_hcd vhci_hcd.0: devid(0) speed(5) speed_str(super-speed) [ 135.630109][ T7887] vhci_hcd vhci_hcd.0: pdev(1) rhport(0) sockfd(3) [ 135.630116][ T7887] vhci_hcd vhci_hcd.0: devid(0) speed(5) speed_str(super-speed) [ 135.672834][ T7895] vhci_hcd
Re: general protection fault in tomoyo_socket_sendmsg_permission
On 2021/02/12 10:34, Shuah Khan wrote: > On 2/10/21 6:14 PM, Tetsuo Handa wrote: >> (Dropping LSM ML because this is not a TOMOYO's bug.) >> >> On 2021/02/11 4:29, Shuah Khan wrote: >>> This is a good find. I already replied to the thread to send a complete >>> fix. >> >> As I said at >> https://lkml.kernel.org/r/f8cae6b1-8f84-0e6a-7d9c-fc4aec68f...@i-love.sakura.ne.jp >> , >> the as-is patch is effectively a complete fix. And applying the as-is patch >> should help spending >> syzbot resources for reproducing "general protection fault in >> tomoyo_socket_sendmsg_permission" >> with debug printk() patch applied, which in turn will help you in >> >>> Right. I would like to get a clear understanding of how this condition >>> is triggered. I am not saying this isn't a problem. Understanding how >>> it is triggered helps find the best fix. >> >> part. Therefore, I strongly expect you to apply this version now. >> > > Is there a reproducer for this problem? There is no reproducer for "general protection fault in tomoyo_socket_sendmsg_permission" problem, but the race condition is explained at https://lkml.kernel.org/r/676d4518-0faa-9fab-15db-0db8d216d...@i-love.sakura.ne.jp .
Re: general protection fault in tomoyo_socket_sendmsg_permission
(Dropping LSM ML because this is not a TOMOYO's bug.) On 2021/02/11 4:29, Shuah Khan wrote: > This is a good find. I already replied to the thread to send a complete > fix. As I said at https://lkml.kernel.org/r/f8cae6b1-8f84-0e6a-7d9c-fc4aec68f...@i-love.sakura.ne.jp , the as-is patch is effectively a complete fix. And applying the as-is patch should help spending syzbot resources for reproducing "general protection fault in tomoyo_socket_sendmsg_permission" with debug printk() patch applied, which in turn will help you in > Right. I would like to get a clear understanding of how this condition > is triggered. I am not saying this isn't a problem. Understanding how > it is triggered helps find the best fix. part. Therefore, I strongly expect you to apply this version now.
Re: general protection fault in tomoyo_socket_sendmsg_permission
On 2021/02/11 3:17, Shuah Khan wrote: > I am looking to understand the syzbot configuration and a reproducer > to be able to debug and fix the problem. How is syzbot triggering the > vhci_hcd attach and detach sequence? I don't know. I'm waiting for syzbot to reproduce the problem on linux-next with https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/usb/usbip?id=f1bdf414e7dd0cbc26460425719fc3ea479947a2 . > > This helps me determine all these fix suggestions that are coming in > are fixes or papering over a real problem. What are these fix suggestions? "general protection fault in tomoyo_socket_sendmsg_permission" is a NULL pointer dereference which can happen if vhci_device_reset() and/or vhci_device_init() (which does vdev->ud.tcp_socket = NULL;) were unexpectedly called. There is no reproducer, and (as far as I know) no fix suggestion. "KASAN: null-ptr-deref Write in vhci_shutdown_connection" is an ERR_PTR(-EINTR) pointer dereference which can happen if kthread_create() was SIGKILLed. There is a reproducer, and https://lkml.kernel.org/r/20210205135707.4574-1-penguin-ker...@i-love.sakura.ne.jp is a fix suggestion.
Re: [PATCH 0/3][RESEND] add support for never printing hashed addresses
On 2021/02/11 1:54, Andy Shevchenko wrote: > On Thu, Feb 11, 2021 at 01:39:41AM +0900, Tetsuo Handa wrote: >> On 2021/02/11 1:18, Steven Rostedt wrote: >>> The point of this exercise is to be able to debug the *same* kernel that >>> someone is having issues with. And this is to facilitate that debugging. >> >> That's too difficult to use. If a problem is not reproducible, we will have >> no choice but always specify "never hash pointers" command line option. If a >> problem is reproducible, we can rebuild that kernel with "never hash >> pointers" >> config option turned on. > > I think what you are targeting is something like dynamic debug approach where > you can choose which prints to enable/disable and what enable/disable in them. What I'm targeting is "zero interaction from kernel command line options" like https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/usb/usbip?id=f1bdf414e7dd0cbc26460425719fc3ea479947a2 . > > In that case you specifically apply a command line option and enable only > files > / lines in the files. While there is boot-config feature for specifying very long kernel command line options, I can't enforce syzkaller users (including syzbot) to switch what to enable/disable via kernel command line options. Let alone defining a kernel command line option for single-purpose debug printk() changes like shown above.
Re: [PATCH 0/3][RESEND] add support for never printing hashed addresses
On 2021/02/11 1:46, Steven Rostedt wrote: > On Thu, 11 Feb 2021 01:39:41 +0900 > Tetsuo Handa wrote: > >> On 2021/02/11 1:18, Steven Rostedt wrote: >>> The point of this exercise is to be able to debug the *same* kernel that >>> someone is having issues with. And this is to facilitate that debugging. >> >> That's too difficult to use. If a problem is not reproducible, we will have >> no choice but always specify "never hash pointers" command line option. If a >> problem is reproducible, we can rebuild that kernel with "never hash >> pointers" >> config option turned on. > > Now the question is, why do you need the unhashed pointer? Because unhashed pointers might give some clue. We can rebuild the same kernel using the same kernel config / compiler etc. and compare unhashed pointers with addresses in System.map / kallsyms files without reproducing the problem. > > Currently, the instruction pointer is what is fine right? You get the > a function and its offset. If there's something that is needed, perhaps we > should look at how to fix that, instead of just unhashing all pointers by > default. I'm not refusing to use kernel command line options. I'm expecting that we can also hardcode using kernel config options. Since boot-time switching via kernel command line options makes the kernel behave differently, less boot-time switching is better for avoiding unexpected problems (e.g. unintended LSM was enabled).
Re: [PATCH 0/3][RESEND] add support for never printing hashed addresses
On 2021/02/11 1:18, Steven Rostedt wrote: > The point of this exercise is to be able to debug the *same* kernel that > someone is having issues with. And this is to facilitate that debugging. That's too difficult to use. If a problem is not reproducible, we will have no choice but always specify "never hash pointers" command line option. If a problem is reproducible, we can rebuild that kernel with "never hash pointers" config option turned on.
Re: [PATCH 0/3][RESEND] add support for never printing hashed addresses
On 2021/02/10 14:18, Timur Tabi wrote: > [accidentally sent from the wrong email address, so resending] > > [The list of email addresses on CC: is getting quite lengthy, > so I hope I've included everyone.] > > Although hashing addresses printed via printk does make the > kernel more secure, it interferes with debugging, especially > with some functions like print_hex_dump() which always uses > hashed addresses. Oh, I was wishing diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 3b53c73580c5..34c7e145ac3c 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -802,7 +802,7 @@ static char *ptr_to_id(char *buf, char *end, const void *ptr, * Print the real pointer value for NULL and error pointers, * as they are not actual addresses. */ - if (IS_ERR_OR_NULL(ptr)) + if (IS_ERR_OR_NULL(ptr) || IS_ENABLED(CONFIG_DEBUG_DONT_HASH_POINTERS)) return pointer_string(buf, end, ptr, spec); /* When debugging early boot use non-cryptographically secure hash. */ change as a kernel config option, for more we try to switch using kernel command line options, more we likely make errors with sharing appropriate kernel command line options (e.g. https://github.com/google/syzkaller/commit/99c64d5c672700d6c0de63d11db25a0678e47a75 ).
[PATCH v5] lockdep: Allow tuning tracing capacity constants.
Since syzkaller continues various test cases until the kernel crashes, syzkaller tends to examine more locking dependencies than normal systems. As a result, syzbot is reporting that the fuzz testing was terminated due to hitting upper limits lockdep can track [1] [2] [3]. Since analysis via /proc/lockdep* did not show any obvious culprit [4] [5], we have no choice but allow tuning tracing capacity constants. [1] https://syzkaller.appspot.com/bug?id=3d97ba93fb3566000c1c59691ea427370d33ea1b [2] https://syzkaller.appspot.com/bug?id=381cb436fe60dc03d7fd2a092b46d7f09542a72a [3] https://syzkaller.appspot.com/bug?id=a588183ac34c1437fc0785e8f220e88282e5a29f [4] https://lkml.kernel.org/r/4b8f7a57-fa20-47bd-48a0-ae35d860f...@i-love.sakura.ne.jp [5] https://lkml.kernel.org/r/1c351187-253b-2d49-acaf-4563c63ae...@i-love.sakura.ne.jp Reported-by: syzbot Reported-by: syzbot Reported-by: syzbot References: https://lkml.kernel.org/r/1595640639-9310-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp Signed-off-by: Tetsuo Handa Acked-by: Dmitry Vyukov --- kernel/locking/lockdep.c | 2 +- kernel/locking/lockdep_internals.h | 8 +++--- lib/Kconfig.debug | 40 ++ 3 files changed, 45 insertions(+), 5 deletions(-) diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index bdaf4829098c..65b3777e8089 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -1391,7 +1391,7 @@ static int add_lock_to_list(struct lock_class *this, /* * For good efficiency of modular, we use power of 2 */ -#define MAX_CIRCULAR_QUEUE_SIZE4096UL +#define MAX_CIRCULAR_QUEUE_SIZE(1UL << CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS) #define CQ_MASK(MAX_CIRCULAR_QUEUE_SIZE-1) /* diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h index de49f9e1c11b..ecb8662e7a4e 100644 --- a/kernel/locking/lockdep_internals.h +++ b/kernel/locking/lockdep_internals.h @@ -99,16 +99,16 @@ static const unsigned long LOCKF_USED_IN_IRQ_READ = #define MAX_STACK_TRACE_ENTRIES262144UL #define STACK_TRACE_HASH_SIZE 8192 #else -#define MAX_LOCKDEP_ENTRIES32768UL +#define MAX_LOCKDEP_ENTRIES(1UL << CONFIG_LOCKDEP_BITS) -#define MAX_LOCKDEP_CHAINS_BITS16 +#define MAX_LOCKDEP_CHAINS_BITSCONFIG_LOCKDEP_CHAINS_BITS /* * Stack-trace: tightly packed array of stack backtrace * addresses. Protected by the hash_lock. */ -#define MAX_STACK_TRACE_ENTRIES524288UL -#define STACK_TRACE_HASH_SIZE 16384 +#define MAX_STACK_TRACE_ENTRIES(1UL << CONFIG_LOCKDEP_STACK_TRACE_BITS) +#define STACK_TRACE_HASH_SIZE (1 << CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS) #endif /* diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 7937265ef879..4cb84b499636 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1332,6 +1332,46 @@ config LOCKDEP config LOCKDEP_SMALL bool +config LOCKDEP_BITS + int "Bitsize for MAX_LOCKDEP_ENTRIES" + depends on LOCKDEP && !LOCKDEP_SMALL + range 10 30 + default 15 + help + Try increasing this value if you hit "BUG: MAX_LOCKDEP_ENTRIES too low!" message. + +config LOCKDEP_CHAINS_BITS + int "Bitsize for MAX_LOCKDEP_CHAINS" + depends on LOCKDEP && !LOCKDEP_SMALL + range 10 30 + default 16 + help + Try increasing this value if you hit "BUG: MAX_LOCKDEP_CHAINS too low!" message. + +config LOCKDEP_STACK_TRACE_BITS + int "Bitsize for MAX_STACK_TRACE_ENTRIES" + depends on LOCKDEP && !LOCKDEP_SMALL + range 10 30 + default 19 + help + Try increasing this value if you hit "BUG: MAX_STACK_TRACE_ENTRIES too low!" message. + +config LOCKDEP_STACK_TRACE_HASH_BITS + int "Bitsize for STACK_TRACE_HASH_SIZE" + depends on LOCKDEP && !LOCKDEP_SMALL + range 10 30 + default 14 + help + Try increasing this value if you need large MAX_STACK_TRACE_ENTRIES. + +config LOCKDEP_CIRCULAR_QUEUE_BITS + int "Bitsize for elements in circular_queue struct" + depends on LOCKDEP + range 10 30 + default 12 + help + Try increasing this value if you hit "lockdep bfs error:-1" warning due to __cq_enqueue() failure. + config DEBUG_LOCKDEP bool "Lock dependency engine debugging" depends on DEBUG_KERNEL && LOCKDEP -- 2.18.4
Re: [PATCH v4 (resend)] lockdep: Allow tuning tracing capacity constants.
Hello, Andrew and Linus. We are stuck because Peter cannot respond. I think it is time to send this patch to linux-next. What do you think? On 2021/01/20 19:18, Dmitry Vyukov wrote: > On Wed, Jan 20, 2021 at 11:12 AM Tetsuo Handa > wrote: >> >> Since syzkaller continues various test cases until the kernel crashes, >> syzkaller tends to examine more locking dependencies than normal systems. >> As a result, syzbot is reporting that the fuzz testing was terminated >> due to hitting upper limits lockdep can track [1] [2] [3]. >> >> Peter Zijlstra does not want to allow tuning these limits via kernel >> config options, for such change discourages thinking. But analysis via >> /proc/lockdep* did not show any obvious culprit [4] [5]. It is possible >> that many hundreds of kn->active lock instances are to some degree >> contributing to these problems, but there is no means to verify whether >> these instances are created for protecting same callback functions. >> Unless Peter provides a way to make these instances per "which callback >> functions the lock instance will call (identified by something like MD5 >> of string representations of callback functions which each lock instance >> will protect)" than plain "serial number", I don't think that we can >> verify the culprit. >> >> [1] >> https://syzkaller.appspot.com/bug?id=3d97ba93fb3566000c1c59691ea427370d33ea1b >> [2] >> https://syzkaller.appspot.com/bug?id=381cb436fe60dc03d7fd2a092b46d7f09542a72a >> [3] >> https://syzkaller.appspot.com/bug?id=a588183ac34c1437fc0785e8f220e88282e5a29f >> [4] >> https://lkml.kernel.org/r/4b8f7a57-fa20-47bd-48a0-ae35d860f...@i-love.sakura.ne.jp >> [5] >> https://lkml.kernel.org/r/1c351187-253b-2d49-acaf-4563c63ae...@i-love.sakura.ne.jp >> >> Reported-by: syzbot >> Reported-by: syzbot >> Reported-by: syzbot >> Signed-off-by: Tetsuo Handa >> Acked-by: Dmitry Vyukov > > Thanks for your persistence! > I still support this. And assessment of lockdep stats on overflow > seems to confirm it's just a very large lock graph triggered by > syzkaller. >
Re: general protection fault in tomoyo_socket_sendmsg_permission
On 2021/01/30 6:18, Shuah Khan wrote: > In this console log: It seems "this console log" refers to https://syzkaller.appspot.com/x/log.txt?x=1045303450 . > > 06:57:50 executing program 1: > socketpair$tipc(0x1e, 0x2, 0x0, &(0x7fc0)={0x}) > sendmsg$BATADV_CMD_GET_TRANSTABLE_LOCAL(r0, > &(0x7f0002c0)={&(0x7f0001c0), 0xc, &(0x7f000280)={0x0, > 0xd0010100}}, 0x0) > > [ 1151.090883][T23361] vhci_hcd vhci_hcd.0: pdev(4) rhport(0) sockfd(4) > [ 1151.097445][T23361] vhci_hcd vhci_hcd.0: devid(0) speed(1) > speed_str(low-speed) > 06:57:50 executing program 0: > r0 = syz_open_dev$binderN(&(0x7f000680)='/dev/binder#\x00', 0x0, 0x0) > ioctl$BINDER_WRITE_READ(r0, 0xc0306201, &(0x7f000cc0)={0x88, 0x0, > &(0x7f000b80)=[@transaction={0x40406300, {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, > 0x0, 0x0, 0x0, 0x0, 0x0}}, @transaction={0x40406300, {0x0, 0x0, 0x0, 0x0, > 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}], 0x0, 0x0, 0x0}) > > [ 1151.164402][T23363] vhci_hcd: connection closed > [ 1151.167346][ T240] vhci_hcd: stop threads > > > [ 1151.178329][T26761] usb 17-1: new low-speed USB device number 2 using > vhci_hcd > > > SK: Looking at the console log, it looks like while connection is being > torn down, Excuse me, but it looks like (what comes here) while connection is being torn down ? I'm not familiar with driver code. > > > [ 1151.181245][ T240] vhci_hcd: release socket > > > Can you share your your test code for this program: > "executing program 1" I don't think program 1 is relevant. I think program 4 06:57:50 executing program 4: r0 = socket$tipc(0x1e, 0x2, 0x0) syz_usbip_server_init(0x1) close_range(r0, 0x, 0x0) which calls syz_usbip_server_init() as with other duplicates is relevant. > > Also your setup? Do you run usbip_host and vhci_hcd both? Who are you referring to with "you/your" ? I'm not running syzkaller in my setup and I don't have test code. I'm just proposing printing more messages in order to confirm the ordering of events and member values in structures.
Re: general protection fault in tomoyo_socket_sendmsg_permission
On 2021/01/30 1:05, Shuah Khan wrote: >> Since "general protection fault in tomoyo_socket_sendmsg_permission" is >> caused by >> unexpectedly resetting ud->tcp_socket to NULL without waiting for tx thread >> to >> terminate, tracing the ordering of events is worth knowing. Even adding >> schedule_timeout_uninterruptible() to before kernel_sendmsg() might help. >> > > What about the duplicate bug information that was in my email? > Did you take a look at that? I was not aware of the duplicate bugs. It is interesting that "KASAN: null-ptr-deref Write in event_handler" says that vdev->ud.tcp_tx became NULL at if (vdev->ud.tcp_tx) { /* this location */ kthread_stop_put(vdev->ud.tcp_tx); vdev->ud.tcp_tx = NULL; } which means that somebody else is unexpectedly resetting vdev->ud.tcp_tx to NULL. If memset() from vhci_device_init() from vhci_start() were unexpectedly called, all of tcp_socket, tcp_rx, tcp_tx etc. becomes NULL which can explain these bugs ? I'm inclined to report not only tcp_socket but also other fields when kernel_sendmsg() detected that tcp_socket is NULL...
Re: general protection fault in tomoyo_socket_sendmsg_permission
On 2021/01/29 4:05, Shuah Khan wrote: > The reason I don't like adding printk's is this is a race condition > and as a result time sensitive. Adding printks in the path will not > help debug this issue. It will make it harder to reproduce the problem. Not always. Adding printk() might make it easier to reproduce the problem. > > I am unable to reproduce the problem using the reproducer and running > multiple instances of the reproducer. Since syzkaller cannot find a reproducer for "general protection fault in tomoyo_socket_sendmsg_permission", and you cannot reproduce other problem using reproducer, trying to obtain some clue via printing messages by asking syzkaller to try debug patch can be very helpful. Since "general protection fault in tomoyo_socket_sendmsg_permission" is caused by unexpectedly resetting ud->tcp_socket to NULL without waiting for tx thread to terminate, tracing the ordering of events is worth knowing. Even adding schedule_timeout_uninterruptible() to before kernel_sendmsg() might help.
Re: [PATCH v2] smackfs: restrict bytes count in smackfs write functions
On 2021/01/28 22:27, Sabyrzhan Tasbolatov wrote: >> Doesn't this change break legitimate requests like >> >> char buffer[2]; >> >> memset(buffer, ' ', sizeof(buffer)); >> memcpy(buffer + sizeof(buffer) - 10, "foo", 3); >> write(fd, buffer, sizeof(buffer)); >> >> ? > > It does, in this case. Then I need to patch another version with > whitespace stripping before, after label. I just followed the same thing > that I see in security/selinux/selinuxfs.c sel_write_enforce() etc. > > It has the same memdup_user_nul() and count >= PAGE_SIZE check prior to that. Since sel_write_enforce() accepts string representation of an integer value, PAGE_SIZE is sufficient. But since smk_write_onlycap() and smk_write_relabel_self() accept list of space-delimited words, you need to prove why PAGE_SIZE does not break userspace in your patch. Also, due to the "too small to fail" memory-allocation rule, memdup_user_nul() for count < PAGE_SIZE * 8 bytes is "never fails with -ENOMEM unless SIGKILLed by the OOM killer". Also, memdup_user_nul() for count >= PAGE_SIZE * (1 << MAX_ORDER) - 1 bytes is "never succeeds". Thus, you can safely add if (count >= PAGE_SIZE * (1 << MAX_ORDER) - 1) return -EINVAL; // or -ENOMEM if you want compatibility to smackfs write functions. But it is a strange requirement that the caller of memdup_user_nul() has to be aware of upper limit in a way that we won't hit /* * There are several places where we assume that the order value is sane * so bail out early if the request is out of bound. */ if (unlikely(order >= MAX_ORDER)) { WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN)); return NULL; } path. memdup_user_nul() side should do if (count >= PAGE_SIZE * (1 << MAX_ORDER) - 1) return -ENOMEM; check and return -ENOMEM if memdup_user_nul() does not want to use __GFP_NOWARN. I still believe that memdup_user_nul() side should be fixed.
Re: [PATCH v2] smackfs: restrict bytes count in smackfs write functions
On 2021/01/28 20:58, Sabyrzhan Tasbolatov wrote: > @@ -2005,6 +2009,9 @@ static ssize_t smk_write_onlycap(struct file *file, > const char __user *buf, > if (!smack_privileged(CAP_MAC_ADMIN)) > return -EPERM; > > + if (count > PAGE_SIZE) > + return -EINVAL; > + > data = memdup_user_nul(buf, count); > if (IS_ERR(data)) > return PTR_ERR(data); > @@ -2740,10 +2754,13 @@ static ssize_t smk_write_relabel_self(struct file > *file, const char __user *buf, > return -EPERM; > > /* > + * No partial write. >* Enough data must be present. >*/ > if (*ppos != 0) > return -EINVAL; > + if (count == 0 || count > PAGE_SIZE) > + return -EINVAL; > > data = memdup_user_nul(buf, count); > if (IS_ERR(data)) > Doesn't this change break legitimate requests like char buffer[2]; memset(buffer, ' ', sizeof(buffer)); memcpy(buffer + sizeof(buffer) - 10, "foo", 3); write(fd, buffer, sizeof(buffer)); ?
Re: general protection fault in tomoyo_socket_sendmsg_permission
On 2020/11/14 2:14, Shuah Khan wrote: > On 11/13/20 5:00 AM, Hillf Danton wrote: >> Thu, 12 Nov 2020 23:21:26 -0800 >>> syzbot found the following issue on: >>> >>> HEAD commit: 9dbc1c03 Merge tag 'xfs-5.10-fixes-3' of git://git.kernel... >>> git tree: upstream >>> console output: https://syzkaller.appspot.com/x/log.txt?x=1045303450 >>> kernel config: https://syzkaller.appspot.com/x/.config?x=1735b7978b1c3721 >>> dashboard link: https://syzkaller.appspot.com/bug?extid=95ce4b142579611ef0a9 >>> compiler: gcc (GCC) 10.1.0-syz 20200507 >>> userspace arch: i386 >>> >>> Unfortunately, I don't have any reproducer for this issue yet. >>> > > I would like to see the reproducer for this. I just can't reproduce > this problem. > >> >> Fix 96c2737716d5 ("usbip: move usbip kernel code out of staging") >> by closing the race between readers and writer of ud->tcp_socket on >> sock shutdown. To do that, add waitqueue for usbip device and make >> writer wait for all readers to go home before releasing the socket. Worrysome part for me is vhci_device_reset() which resets ud->tcp_socket to NULL without waiting for tx thread to terminate, though I don't know if vhci_device_reset() can be called while tx thread is running. I'd like to try below debug printk() patch on linx-next tree, for this bug is reported on linux.git and linux-next.git trees. Which git tree can be used for sending this to-be-removed patch to linux-next.git ? I wish there is a kernel config for fuzzers in linux.git so that every git tree can carry debug printk() patch for syzbot's reports... Subject: [PATCH] usb: usbip: vhci_hcd: add printk() for debug This is linux-next only patch for examining a bug reported at https://syzkaller.appspot.com/bug?id=3e1d941a31361efc4ced2ba8b4af2044d4e43c59 . Signed-off-by: Tetsuo Handa --- drivers/usb/usbip/stub_dev.c | 6 ++ drivers/usb/usbip/vhci_hcd.c | 11 +++ drivers/usb/usbip/vhci_sysfs.c | 4 drivers/usb/usbip/vhci_tx.c| 12 drivers/usb/usbip/vudc_dev.c | 6 ++ 5 files changed, 39 insertions(+) diff --git a/drivers/usb/usbip/stub_dev.c b/drivers/usb/usbip/stub_dev.c index 2305d425e6c9..627f83c0ebc8 100644 --- a/drivers/usb/usbip/stub_dev.c +++ b/drivers/usb/usbip/stub_dev.c @@ -131,10 +131,14 @@ static void stub_shutdown_connection(struct usbip_device *ud) /* 1. stop threads */ if (ud->tcp_rx) { + if (IS_BUILTIN(CONFIG_DEBUG_AID_FOR_SYZBOT)) + pr_info("%s: stop rx %p\n", __func__, ud->tcp_rx); kthread_stop_put(ud->tcp_rx); ud->tcp_rx = NULL; } if (ud->tcp_tx) { + if (IS_BUILTIN(CONFIG_DEBUG_AID_FOR_SYZBOT)) + pr_info("%s: stop tx %p\n", __func__, ud->tcp_tx); kthread_stop_put(ud->tcp_tx); ud->tcp_tx = NULL; } @@ -146,6 +150,8 @@ static void stub_shutdown_connection(struct usbip_device *ud) * not touch NULL socket. */ if (ud->tcp_socket) { + if (IS_BUILTIN(CONFIG_DEBUG_AID_FOR_SYZBOT)) + pr_info("%s: close sock %p\n", __func__, ud->tcp_socket); sockfd_put(ud->tcp_socket); ud->tcp_socket = NULL; ud->sockfd = -1; diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c index 3209b5ddd30c..9e95bf9330f5 100644 --- a/drivers/usb/usbip/vhci_hcd.c +++ b/drivers/usb/usbip/vhci_hcd.c @@ -1016,10 +1016,14 @@ static void vhci_shutdown_connection(struct usbip_device *ud) /* kill threads related to this sdev */ if (vdev->ud.tcp_rx) { + if (IS_BUILTIN(CONFIG_DEBUG_AID_FOR_SYZBOT)) + pr_info("%s: stop rx %p\n", __func__, vdev->ud.tcp_rx); kthread_stop_put(vdev->ud.tcp_rx); vdev->ud.tcp_rx = NULL; } if (vdev->ud.tcp_tx) { + if (IS_BUILTIN(CONFIG_DEBUG_AID_FOR_SYZBOT)) + pr_info("%s: stop tx %p\n", __func__, vdev->ud.tcp_tx); kthread_stop_put(vdev->ud.tcp_tx); vdev->ud.tcp_tx = NULL; } @@ -1027,6 +1031,8 @@ static void vhci_shutdown_connection(struct usbip_device *ud) /* active connection is closed */ if (vdev->ud.tcp_socket) { + if (IS_BUILTIN(CONFIG_DEBUG_AID_FOR_SYZBOT)) + pr_info("%s: close sock %p\n", __func__, ud->tcp_socket); sockfd_put(vdev->ud.tcp_socket); vdev->ud.tcp_socket = NULL; vdev->ud.sockfd = -1; @@ -1074,6 +1080,11 @@ static void vhci_device_reset(s
Re: [PATCH] smackfs: restrict bytes count in smackfs write functions
On 2021/01/26 3:08, Casey Schaufler wrote: > On 1/24/2021 6:36 AM, Sabyrzhan Tasbolatov wrote: >> syzbot found WARNINGs in several smackfs write operations where >> bytes count is passed to memdup_user_nul which exceeds >> GFP MAX_ORDER. Check count size if bigger SMK_LONGLABEL, >> for smk_write_syslog if bigger than PAGE_SIZE - 1. >> >> Reported-by: syzbot+a71a442385a0b2815...@syzkaller.appspotmail.com >> Signed-off-by: Sabyrzhan Tasbolatov > > Thank you for the patch. Unfortunately, SMK_LONGLABEL isn't > the right value in some of these cases. > Since it uses sscanf(), I think that whitespaces must be excluded from upper limit check. I'm proposing adding __GFP_NOWARM on the memdup_user_nul() side at https://lkml.kernel.org/r/20210120103436.11830-1-penguin-ker...@i-love.sakura.ne.jp .
Re: BUG: MAX_LOCKDEP_KEYS too low!
On 2021/01/23 20:26, Alexey Kardashevskiy wrote: > Should not the first 8192 from "grep lock-classes /proc/lockdep_stats" > decrease > after time (it does not), or once it has failed, it is permanent? Since lockdep_unregister_key() becomes a no-op because graph_lock() returns 0 due to debug_locks being changed from 1 to 0 by __debug_locks_off() from debug_locks_off() from debug_locks_off_graph_unlock(), lock-classes value in /proc/lockdep_stats will not decrease after "BUG: MAX_LOCKDEP_KEYS too low!" is printed.
Re: BUG: MAX_LOCKDEP_KEYS too low!
On 2021/01/23 15:35, Alexey Kardashevskiy wrote: > this behaves quite different but still produces the message (i have > show_workqueue_state() right after the bug message): > > > [ 85.803991] BUG: MAX_LOCKDEP_KEYS too low! > [ 85.804338] turning off the locking correctness validator. > [ 85.804474] Showing busy workqueues and worker pools: > [ 85.804620] workqueue events_unbound: flags=0x2 > [ 85.804764] pwq 16: cpus=0-7 flags=0x4 nice=0 active=1/512 refcnt=3 > [ 85.804965] in-flight: 81:bpf_map_free_deferred > [ 85.805229] workqueue events_power_efficient: flags=0x80 > [ 85.805357] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 > [ 85.805558] in-flight: 57:gc_worker > [ 85.805877] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: > 82 24 > [ 85.806147] pool 16: cpus=0-7 flags=0x4 nice=0 hung=69s workers=3 idle: 7 > 251 > ^C[ 100.129747] maxlockdep (5104) used greatest stack depth: 8032 bytes left > > root@le-dbg:~# grep "lock-classes" /proc/lockdep_stats > lock-classes: 8192 [max: 8192] > Right. Hillf's patch can reduce number of active workqueue's worker threads, for only one worker thread can call bpf_map_free_deferred() (which is nice because it avoids bloat of active= and refcnt= fields). But Hillf's patch is not for fixing the cause of "BUG: MAX_LOCKDEP_KEYS too low!" message. Like Dmitry mentioned, bpf syscall allows producing work items faster than bpf_map_free_deferred() can consume. (And a similar problem is observed for network namespaces.) Unless there is a bug that prevents bpf_map_free_deferred() from completing, the classical solution is to put pressure on producers (i.e. slow down bpf syscall side) in a way that consumers (i.e. __bpf_map_put()) will not schedule thousands of backlog "struct bpf_map" works.
BPF: unbounded bpf_map_free_deferred problem
Hello, BPF developers. Alexey Kardashevskiy is reporting that system_wq gets stuck due to flooding of unbounded bpf_map_free_deferred work. Use of WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND workqueue did not solve this problem. Is it possible that a refcount leak somewhere preventing bpf_map_free_deferred from completing? Please see https://lkml.kernel.org/r/CACT4Y+Z+kwPM=WUzJ-e359PWeLLqmF0w4Yxp1spzZ=+j0ek...@mail.gmail.com . On 2021/01/23 7:53, Alexey Kardashevskiy wrote: > > > On 23/01/2021 02:30, Tetsuo Handa wrote: >> On 2021/01/22 22:28, Tetsuo Handa wrote: >>> On 2021/01/22 21:10, Dmitry Vyukov wrote: >>>> On Fri, Jan 22, 2021 at 1:03 PM Alexey Kardashevskiy >>>> wrote: >>>>> >>>>> >>>>> >>>>> On 22/01/2021 21:30, Tetsuo Handa wrote: >>>>>> On 2021/01/22 18:16, Dmitry Vyukov wrote: >>>>>>> The reproducer only does 2 bpf syscalls, so something is slowly leaking >>>>>>> in bpf. >>>>>>> My first suspect would be one of these. Since workqueue is async, it >>>>>>> may cause such slow drain that happens only when tasks are spawned >>>>>>> fast. I don't know if there is a procfs/debugfs introspection file to >>>>>>> monitor workqueue lengths to verify this hypothesis. >>>>>> >>>>>> If you can reproduce locally, you can call show_workqueue_state() >>>>>> (part of SysRq-t) when hitting the limit. >>>>>> >>>>>> --- a/kernel/locking/lockdep.c >>>>>> +++ b/kernel/locking/lockdep.c >>>>>> @@ -1277,6 +1277,7 @@ register_lock_class(struct lockdep_map *lock, >>>>>> unsigned int subclass, int force) >>>>>> >>>>>> print_lockdep_off("BUG: MAX_LOCKDEP_KEYS too low!"); >>>>>> dump_stack(); >>>>>> + show_workqueue_state(); >>>>>> return NULL; >>>>>> } >>>>>> nr_lock_classes++; >>>>>> >>>>> >>>>> >>>>> >>>>> Here is the result: >>>>> https://pastebin.com/rPn0Cytu >>>> >>>> Do you mind posting this publicly? >>>> Yes, it seems that bpf_map_free_deferred is the problem (11138 >>>> outstanding callbacks). >>>> >>> >>> Wow. Horribly stuck queue. I guess BPF wants to try WQ created by >>> >>> alloc_workqueue("bpf_free_wq", WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, >>> 0); >>> >>> rather than system_wq . You can add Tejun Heo for WQ. >>> >>> Anyway, please post your result to ML. > > > > https://pastebin.com/JfrmzguK is with the patch below applied. Seems less > output. Interestingly when I almost hit "send", OOM kicked in and tried > killing a bunch of "maxlockdep" processes (my test case): > > [ 891.037315] [ 31007] 0 31007 281 5 49152 0 > 1000 maxlockdep > [ 891.037540] [ 31009] 0 31009 281 5 49152 0 > 1000 maxlockdep > [ 891.037760] [ 31012] 0 31012 281 5 49152 0 > 1000 maxlockdep > [ 891.037980] [ 31013] 0 31013 281 5 47104 0 > 0 maxlockdep > [ 891.038210] [ 31014] 0 31014 281 5 49152 0 > 1000 maxlockdep > [ 891.038429] [ 31018] 0 31018 281 5 47104 0 > 0 maxlockdep > [ 891.038652] [ 31019] 0 31019 281 5 49152 0 > 1000 maxlockdep > [ 891.038874] [ 31020] 0 31020 281 5 49152 0 > 1000 maxlockdep > [ 891.039095] [ 31021] 0 31021 281 5 49152 0 > 1000 maxlockdep > [ 891.039317] [ 31022] 0 31022 281 5 47104 0 > 0 maxlockdep > > > > > And (re)adding LKML and Tejun as suggested. Thanks, > > >>> >> >> Does this patch (which is only compile tested) reduce number of pending works >> when hitting "BUG: MAX_LOCKDEP_KEYS too low!" ? >> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h >> index 07cb5d15e743..c6c6902090f0 100644 >> --- a/include/linux/bpf.h >> +++ b/include/linux/bpf.h >> @@ -41,6 +41,7 @@ struct bpf_local_storage_map; >> struct kobject; >> struct mem_c
[PATCH v4 (resend)] lockdep: Allow tuning tracing capacity constants.
Since syzkaller continues various test cases until the kernel crashes, syzkaller tends to examine more locking dependencies than normal systems. As a result, syzbot is reporting that the fuzz testing was terminated due to hitting upper limits lockdep can track [1] [2] [3]. Peter Zijlstra does not want to allow tuning these limits via kernel config options, for such change discourages thinking. But analysis via /proc/lockdep* did not show any obvious culprit [4] [5]. It is possible that many hundreds of kn->active lock instances are to some degree contributing to these problems, but there is no means to verify whether these instances are created for protecting same callback functions. Unless Peter provides a way to make these instances per "which callback functions the lock instance will call (identified by something like MD5 of string representations of callback functions which each lock instance will protect)" than plain "serial number", I don't think that we can verify the culprit. [1] https://syzkaller.appspot.com/bug?id=3d97ba93fb3566000c1c59691ea427370d33ea1b [2] https://syzkaller.appspot.com/bug?id=381cb436fe60dc03d7fd2a092b46d7f09542a72a [3] https://syzkaller.appspot.com/bug?id=a588183ac34c1437fc0785e8f220e88282e5a29f [4] https://lkml.kernel.org/r/4b8f7a57-fa20-47bd-48a0-ae35d860f...@i-love.sakura.ne.jp [5] https://lkml.kernel.org/r/1c351187-253b-2d49-acaf-4563c63ae...@i-love.sakura.ne.jp Reported-by: syzbot Reported-by: syzbot Reported-by: syzbot Signed-off-by: Tetsuo Handa Acked-by: Dmitry Vyukov --- kernel/locking/lockdep.c | 2 +- kernel/locking/lockdep_internals.h | 8 +++--- lib/Kconfig.debug | 40 ++ 3 files changed, 45 insertions(+), 5 deletions(-) diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index c1418b47f625..c0553872668a 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -1391,7 +1391,7 @@ static int add_lock_to_list(struct lock_class *this, /* * For good efficiency of modular, we use power of 2 */ -#define MAX_CIRCULAR_QUEUE_SIZE4096UL +#define MAX_CIRCULAR_QUEUE_SIZE(1UL << CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS) #define CQ_MASK(MAX_CIRCULAR_QUEUE_SIZE-1) /* diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h index de49f9e1c11b..ecb8662e7a4e 100644 --- a/kernel/locking/lockdep_internals.h +++ b/kernel/locking/lockdep_internals.h @@ -99,16 +99,16 @@ static const unsigned long LOCKF_USED_IN_IRQ_READ = #define MAX_STACK_TRACE_ENTRIES262144UL #define STACK_TRACE_HASH_SIZE 8192 #else -#define MAX_LOCKDEP_ENTRIES32768UL +#define MAX_LOCKDEP_ENTRIES(1UL << CONFIG_LOCKDEP_BITS) -#define MAX_LOCKDEP_CHAINS_BITS16 +#define MAX_LOCKDEP_CHAINS_BITSCONFIG_LOCKDEP_CHAINS_BITS /* * Stack-trace: tightly packed array of stack backtrace * addresses. Protected by the hash_lock. */ -#define MAX_STACK_TRACE_ENTRIES524288UL -#define STACK_TRACE_HASH_SIZE 16384 +#define MAX_STACK_TRACE_ENTRIES(1UL << CONFIG_LOCKDEP_STACK_TRACE_BITS) +#define STACK_TRACE_HASH_SIZE (1 << CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS) #endif /* diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 7937265ef879..4cb84b499636 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1332,6 +1332,46 @@ config LOCKDEP config LOCKDEP_SMALL bool +config LOCKDEP_BITS + int "Bitsize for MAX_LOCKDEP_ENTRIES" + depends on LOCKDEP && !LOCKDEP_SMALL + range 10 30 + default 15 + help + Try increasing this value if you hit "BUG: MAX_LOCKDEP_ENTRIES too low!" message. + +config LOCKDEP_CHAINS_BITS + int "Bitsize for MAX_LOCKDEP_CHAINS" + depends on LOCKDEP && !LOCKDEP_SMALL + range 10 30 + default 16 + help + Try increasing this value if you hit "BUG: MAX_LOCKDEP_CHAINS too low!" message. + +config LOCKDEP_STACK_TRACE_BITS + int "Bitsize for MAX_STACK_TRACE_ENTRIES" + depends on LOCKDEP && !LOCKDEP_SMALL + range 10 30 + default 19 + help + Try increasing this value if you hit "BUG: MAX_STACK_TRACE_ENTRIES too low!" message. + +config LOCKDEP_STACK_TRACE_HASH_BITS + int "Bitsize for STACK_TRACE_HASH_SIZE" + depends on LOCKDEP && !LOCKDEP_SMALL + range 10 30 + default 14 + help + Try increasing this value if you need large MAX_STACK_TRACE_ENTRIES. + +config LOCKDEP_CIRCULAR_QUEUE_BITS + int "Bitsize for elements in circular_queue struct" + depends on LOCKDEP + range 10 30 + default 12 + help + Try increasing this value if you hit "lockdep bfs error:-1" warning due to __cq_enqueue() failure. + config DEBUG_LOCKDEP bool "Lock dependency engine debugging" depends on DEBUG_KERNEL && LOCKDEP -- 2.18.4
Re: [Patch] fbcon: i want fbcon soft scrollback feature come back
On 2021/01/08 0:48, xuhuijie wrote: > This commit 50145474f6ef(fbcon: remove soft scrollback code) remove soft > scrollback in > fbcon. So the shift+PageDown and shift+PageUp is missing. But PageUp is a > vary important > feature when system panic or reset. I can get log by PageUp before, but now > there is no > way to get. Especially on the server system, we always use bmc to control > computer. > So I hope the community can add this feature back. > You can configure kdump for panic, and netconsole for reset. (I don't know whether PageUp key works after panic...)
Re: Does uaccess_kernel() work for detecting kernel thread?
On 2021/01/05 16:59, Christoph Hellwig wrote: > On Wed, Dec 23, 2020 at 07:11:38PM +0900, Tetsuo Handa wrote: >> due to commit 5e6e9852d6f76e01 ("uaccess: add infrastructure for kernel >> builds with set_fs()") and follow up changes. Don't we need to change this >> "uaccess_kernel()" with "(current->flags & PF_KTHREAD)" ? > > No. The real problem here is that when a this funtion is called Called by "who" ? Called by "a userspace process" ? Called by "a kernel thread" ? Called by "an io_uring service thread" ? > under > set_fs it allows kernel memory access for all user pointers, and due to > the indirection in the playload allows reading or changing kernel > memory. A kthread does not have that issue. If this uaccess_kernel() is intended to reject calling this function from "a userspace process", uaccess_kernel() is failing to reject because uaccess_kernel() is always "false" for x86. If this uaccess_kernel() is intended to reject calling this function from "a kernel thread", uaccess_kernel() is failing to reject because uaccess_kernel() is always "false" for x86. If this uaccess_kernel() is intended to reject calling this function from "an io_uring service thread", uaccess_kernel() is failing to reject because uaccess_kernel() is always "false" for x86. What does uaccess_kernel() in sg_check_file_access() (and uhid_char_write(), ib_safe_file_access(), bpfilter_process_sockopt() etc.) want to check? > >>>> For another example, if uaccess_kernel() is "false" due to CONFIG_SET_FS=n, >>>> isn't TOMOYO unexpectedly checking permissions for socket operations? >>> >>> Can someone explain WTF TOMOYO is even doing there? A security module >>> has absolutely no business checking what context it is called from, but >>> must check the process credentials instead. >>> >> >> TOMOYO distinguishes userspace processes and kernel threads, and grants >> kernel threads implicit permissions to perform socket operations. > > And this is the problem we need to fix. A kernel thread can't just have > implicit permissions only because it is a kernel thread. Think of e.g. > the io_uring service threads. We can use (current->flags & (PF_KTHREAD | PF_IO_WORKER)) == PF_KTHREAD like https://lkml.kernel.org/r/dacfb329-de66-d0cf-dcf9-f030ea137...@schaufler-ca.com does in order to exclude e.g. the io_uring service threads, can't we?
[PATCH v4] lockdep: Allow tuning tracing capacity constants.
Since syzkaller continues various test cases until the kernel crashes, syzkaller tends to examine more locking dependencies than normal systems. As a result, syzbot is reporting that the fuzz testing was terminated due to hitting upper limits lockdep can track [1] [2] [3]. Peter Zijlstra does not want to allow tuning these limits via kernel config options, for such change discourages thinking. But analysis via /proc/lockdep* did not show any obvious culprit [4] [5]. It is possible that many hundreds of kn->active lock instances are to some degree contributing to these problems, but there is no means to verify whether these instances are created for protecting same callback functions. Unless Peter provides a way to make these instances per "which callback functions the lock instance will call (identified by something like MD5 of string representations of callback functions which each lock instance will protect)" than plain "serial number", I don't think that we can verify the culprit. Therefore, despite Peter's objection, I push this patch again. The ball is in Peter court. Peter, please don't ignore. [1] https://syzkaller.appspot.com/bug?id=3d97ba93fb3566000c1c59691ea427370d33ea1b [2] https://syzkaller.appspot.com/bug?id=381cb436fe60dc03d7fd2a092b46d7f09542a72a [3] https://syzkaller.appspot.com/bug?id=a588183ac34c1437fc0785e8f220e88282e5a29f [4] https://lkml.kernel.org/r/4b8f7a57-fa20-47bd-48a0-ae35d860f...@i-love.sakura.ne.jp [5] https://lkml.kernel.org/r/1c351187-253b-2d49-acaf-4563c63ae...@i-love.sakura.ne.jp Reported-by: syzbot Reported-by: syzbot Reported-by: syzbot Signed-off-by: Tetsuo Handa Acked-by: Dmitry Vyukov --- kernel/locking/lockdep.c | 2 +- kernel/locking/lockdep_internals.h | 8 +++--- lib/Kconfig.debug | 40 ++ 3 files changed, 45 insertions(+), 5 deletions(-) diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index c1418b47f625..c0553872668a 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -1391,7 +1391,7 @@ static int add_lock_to_list(struct lock_class *this, /* * For good efficiency of modular, we use power of 2 */ -#define MAX_CIRCULAR_QUEUE_SIZE4096UL +#define MAX_CIRCULAR_QUEUE_SIZE(1UL << CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS) #define CQ_MASK(MAX_CIRCULAR_QUEUE_SIZE-1) /* diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h index de49f9e1c11b..ecb8662e7a4e 100644 --- a/kernel/locking/lockdep_internals.h +++ b/kernel/locking/lockdep_internals.h @@ -99,16 +99,16 @@ static const unsigned long LOCKF_USED_IN_IRQ_READ = #define MAX_STACK_TRACE_ENTRIES262144UL #define STACK_TRACE_HASH_SIZE 8192 #else -#define MAX_LOCKDEP_ENTRIES32768UL +#define MAX_LOCKDEP_ENTRIES(1UL << CONFIG_LOCKDEP_BITS) -#define MAX_LOCKDEP_CHAINS_BITS16 +#define MAX_LOCKDEP_CHAINS_BITSCONFIG_LOCKDEP_CHAINS_BITS /* * Stack-trace: tightly packed array of stack backtrace * addresses. Protected by the hash_lock. */ -#define MAX_STACK_TRACE_ENTRIES524288UL -#define STACK_TRACE_HASH_SIZE 16384 +#define MAX_STACK_TRACE_ENTRIES(1UL << CONFIG_LOCKDEP_STACK_TRACE_BITS) +#define STACK_TRACE_HASH_SIZE (1 << CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS) #endif /* diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index e6e58b26e888..2c23939dc6a6 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1340,6 +1340,46 @@ config LOCKDEP config LOCKDEP_SMALL bool +config LOCKDEP_BITS + int "Bitsize for MAX_LOCKDEP_ENTRIES" + depends on LOCKDEP && !LOCKDEP_SMALL + range 10 30 + default 15 + help + Try increasing this value if you hit "BUG: MAX_LOCKDEP_ENTRIES too low!" message. + +config LOCKDEP_CHAINS_BITS + int "Bitsize for MAX_LOCKDEP_CHAINS" + depends on LOCKDEP && !LOCKDEP_SMALL + range 10 30 + default 16 + help + Try increasing this value if you hit "BUG: MAX_LOCKDEP_CHAINS too low!" message. + +config LOCKDEP_STACK_TRACE_BITS + int "Bitsize for MAX_STACK_TRACE_ENTRIES" + depends on LOCKDEP && !LOCKDEP_SMALL + range 10 30 + default 19 + help + Try increasing this value if you hit "BUG: MAX_STACK_TRACE_ENTRIES too low!" message. + +config LOCKDEP_STACK_TRACE_HASH_BITS + int "Bitsize for STACK_TRACE_HASH_SIZE" + depends on LOCKDEP && !LOCKDEP_SMALL + range 10 30 + default 14 + help + Try increasing this value if you need large MAX_STACK_TRACE_ENTRIES. + +config LOCKDEP_CIRCULAR_QUEUE_BITS + int "Bitsize for elements in circular_queue struct" + depends on LOCKDEP + range 10 30 + default 12 + help +
Re: Does uaccess_kernel() work for detecting kernel thread?
On 2020/12/23 16:53, Christoph Hellwig wrote: > On Tue, Dec 22, 2020 at 11:39:08PM +0900, Tetsuo Handa wrote: >> For example, if uaccess_kernel() is "false" due to CONFIG_SET_FS=n, >> isn't sg_check_file_access() failing to detect kernel context? > > sg_check_file_access does exactly the right thing - fail for all kernel > threads as those can't support the magic it does. My question is, in Linux 5.10, sg_check_file_access() for x86 became static int sg_check_file_access(struct file *filp, const char *caller) { if (filp->f_cred != current_real_cred()) { pr_err_once("%s: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n", caller, task_tgid_vnr(current), current->comm); return -EPERM; } if (0) { pr_err_once("%s: process %d (%s) called from kernel context, this is not allowed.\n", caller, task_tgid_vnr(current), current->comm); return -EACCES; } return 0; } due to commit 5e6e9852d6f76e01 ("uaccess: add infrastructure for kernel builds with set_fs()") and follow up changes. Don't we need to change this "uaccess_kernel()" with "(current->flags & PF_KTHREAD)" ? > >> For another example, if uaccess_kernel() is "false" due to CONFIG_SET_FS=n, >> isn't TOMOYO unexpectedly checking permissions for socket operations? > > Can someone explain WTF TOMOYO is even doing there? A security module > has absolutely no business checking what context it is called from, but > must check the process credentials instead. > TOMOYO distinguishes userspace processes and kernel threads, and grants kernel threads implicit permissions to perform socket operations. Since "uaccess_kernel()" became "0" for x86, TOMOYO is no longer able to grant kernel threads implicit permissions to perform socket operations. Since Eric says "For PF_IO_WORKER kernel threads which are running code on behalf of a user we want to perform the ordinary permission checks.", I think that TOMOYO wants to use "(current->flags & (PF_KTHREAD | PF_IO_WORKER)) == PF_KTHREAD" instead.
Does uaccess_kernel() work for detecting kernel thread?
Commit db68ce10c4f0a27c ("new helper: uaccess_kernel()") replaced segment_eq(get_fs(), KERNEL_DS) with uaccess_kernel(). But uaccess_kernel() became an unconditional "false" for some architectures due to commit 5e6e9852d6f76e01 ("uaccess: add infrastructure for kernel builds with set_fs()") and follow up changes in Linux 5.10. As a result, I guess that uaccess_kernel() can no longer be used as a condition for checking whether current thread is a kernel thread or not. For example, if uaccess_kernel() is "false" due to CONFIG_SET_FS=n, isn't sg_check_file_access() failing to detect kernel context? static int sg_check_file_access(struct file *filp, const char *caller) { if (filp->f_cred != current_real_cred()) { pr_err_once("%s: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n", caller, task_tgid_vnr(current), current->comm); return -EPERM; } if (uaccess_kernel()) { pr_err_once("%s: process %d (%s) called from kernel context, this is not allowed.\n", caller, task_tgid_vnr(current), current->comm); return -EACCES; } return 0; } For another example, if uaccess_kernel() is "false" due to CONFIG_SET_FS=n, isn't TOMOYO unexpectedly checking permissions for socket operations? static bool tomoyo_kernel_service(void) { /* Nothing to do if I am a kernel service. */ return uaccess_kernel(); } static u8 tomoyo_sock_family(struct sock *sk) { u8 family; if (tomoyo_kernel_service()) return 0; family = sk->sk_family; switch (family) { case PF_INET: case PF_INET6: case PF_UNIX: return family; default: return 0; } } Don't we need to replace such usage with something like (current->flags & PF_KTHREAD) ? I don't know about io_uring, but according to https://lkml.kernel.org/r/dacfb329-de66-d0cf-dcf9-f030ea137...@schaufler-ca.com , should (current->flags & (PF_KTHREAD | PF_IO_WORKER)) == PF_KTHREAD be used instead?
Re: WARNING: suspicious RCU usage in modeset_lock
On Wed, Dec 16, 2020 at 5:16 PM Paul E. McKenney wrote: > In my experience, lockdep will indeed complain if an interrupt handler > returns while in an RCU read-side critical section. Can't we add lock status checks into the beginning and the end of interrupt handler functions (e.g. whether "struct task_struct"->lockdep_depth did not change) ?
Re: [PATCH 2/4] hung_task: Replace "did_panic" with is_be_panic()
On 2020/12/18 21:59, Pavel Machek wrote: > On Fri 2020-12-18 19:44:04, Xiaoming Ni wrote: > Plus.. is_being_panic is not really english. "is_paniccing" would be > closer...? Or in_panic() ?
Re: [PATCH v2 00/10] allow unprivileged overlay mounts
On 2020/12/08 1:32, Miklos Szeredi wrote: > A general observation is that overlayfs does not call security_path_*() > hooks on the underlying fs. I don't see this as a problem, because a > simple bind mount done inside a private mount namespace also defeats the > path based security checks. Maybe I'm missing something here, so I'm > interested in comments from AppArmor and Tomoyo developers. Regarding TOMOYO, I don't want overlayfs to call security_path_*() hooks on the underlying fs, but the reason is different. It is not because a simple bind mount done inside a private mount namespace defeats the path based security checks. TOMOYO does want to check what device/filesystem is mounted on which location. But currently TOMOYO is failing to check it due to fsopen()/fsmount()/move_mount() API.
Re: [PATCH v3] lockdep: Allow tuning tracing capacity constants.
I attempted some analysis, but I couldn't find which lock is causing nr_list_entries == 32768. Since "struct lock_list list_entries[MAX_LOCKDEP_ENTRIES]" is marked as "static" variable in lockdep.c , none of /proc/lockdep_stats /proc/lockdep /proc/lockdep_chains can report which lock is responsible for "BUG: MAX_LOCKDEP_ENTRIES too low!" message? lock-classes: 3456 [max: 8192] direct dependencies: 32768 [max: 32768] indirect dependencies: 200953 all direct dependencies: 1213844 dependency chains: 56497 [max: 65536] dependency chain hlocks used: 249022 [max: 327680] dependency chain hlocks lost:0 in-hardirq chains: 88 in-softirq chains:2636 in-process chains: 53773 stack-trace entries:303496 [max: 524288] number of stack traces: 15714 number of stack hash chains: 10103 combined max dependencies:hardirq-safe locks: 57 hardirq-unsafe locks: 2613 softirq-safe locks:381 softirq-unsafe locks: 2202 irq-safe locks:395 irq-unsafe locks: 2613 hardirq-read-safe locks: 4 hardirq-read-unsafe locks: 233 softirq-read-safe locks:21 softirq-read-unsafe locks: 218 irq-read-safe locks:21 irq-read-unsafe locks: 233 uncategorized locks: 491 unused locks:1 max locking depth: 20 max bfs queue depth: 749 debug_locks: 0 zapped classes:224 zapped lock chains:640 large chain blocks: 1 I was able to calculate "lock-classes:" and "indirect dependencies:" from /proc/lockdep , but I failed to calculate "direct dependencies:" from /proc/lockdep . How can we calculate it? grep ^FD: CrashLog-12016d5550 | wc -l 3456 grep -F -- '->' CrashLog-12016d5550 | wc -l 16004 grep '^ ->' CrashLog-12016d5550 | wc -l 14014 grep '^ ->' CrashLog-12016d5550 | sort -u | wc -l 2506 grep ^FD: CrashLog-12016d5550 | awk ' BEGIN { FD=0; BD=0; } { if ($1 == "FD:") { FD=FD+$2; BD=BD+$4; } } END { print FD" "BD; } ' 200953 200856 Top FD entries tend to be long held locks (e.g. workqueue) whereas top BD entries tend to be short held locks (e.g. scheduler and timekeeping). But given that this result is from a VM with only 2 CPUs, I think that this result does not match your Examples of bad annotations is getting every CPU a separate class, that leads to nr_cpus! chains if CPUs arbitrarily nest (nr_cpus^2 if there's only a single nesting level). case. awk ' { if ($1 == "FD:") print $2" "$6$7 } ' CrashLog-12016d5550 | sort -rV | head -n 40 1933 (wq_completion)usb_hub_wq 1932 (work_completion)(>events) 1730 (wq_completion)events 1508 (delayed_fput_work).work 1332 >s_type->i_mutex_key#13 1227 >f_pos_lock 1173 sb_writers#5 1095 (wq_completion)netns 1094 net_cleanup_work 1094 masq_mutex 1092 pernet_ops_rwsem 1046 (work_completion)(>wq) 1046 (work_completion)(>rq) 1045 >mutex/1 1043 tty_mutex 1042 >legacy_mutex 1039 >lock 1035 >legacy_mutex/1 1033 >ldisc_sem 1009 cb_lock 1007 (work_completion)(>work) 1007 nlk_cb_mutex-GENERIC 1006 genl_mutex 985 sb_writers#8 984 sb_writers#13 984 sb_writers#6 979 >mutex 964 (work_completion)(_work->work) 946 (wq_completion)events_power_efficient 943 _nl_types[idx].sem 942 link_ops_rwsem 939 (wq_completion)gid-cache-wq 937 (work_completion)(>work) 937 (work_completion)(_work->work) 936 devices_rwsem 931 kn->active#88 930 nsim_bus_dev_list_lock 926 rdma_nets_rwsem 924 >compat_devs_mutex 922 (wq_completion)ipv6_addrconf awk ' { if ($3 == "BD:") print $4" "$6$7 } ' CrashLog-12016d5550 | sort -urV | head -n 40 1618 pool_lock 1617 _hash[i].lock 1510 >lock 1489 pvclock_gtod_data 1487 tk_core.seq.seqcount 1465 hrtimer_bases.lock 1460 _cpu_ptr(group->pcpu,cpu)->seq 1459 _rq->rt_runtime_lock 1459 >lock/1 1458 _b->rt_runtime_lock 1458 >lock 1457 >delays->lock 1457 >lock/1 1457 >rto_lock 1457 >lock 1457 _rq->removed.lock 1457 _b->lock 1457 per_cpu_ptr(_rstat_cpu_lock,cpu) 1456 >lock 1299 >pi_lock 1266 depot_lock 1114 &s->seqcount 1085 >list_lock 1071 >lock 1039 >wait#13 1039 wq_mayday_lock 1038 >lock 869 quarantine_lock 859 _lock 854 rcu_node_0 827 _state.expedited_wq 812 key#10 808 >deferred_split_queue.split_queue_lock 807 >lru_lock 806 lock#3 804 >refs_lock 798 >private_lock 788 _state.gp_wq 782 _wait_table[i] 778 pgd_lock grep -F -- '->kn->active#' CrashLog-12016d5550 | wc -l 533 grep -F -- '->kn->active#' CrashLog-12016d5550 | sort -u | wc -l 439 FD: 931 BD:1 .+.+: kn->active#88 ->fs_reclaim ->&s->seqcount ->kernfs_open_file_mutex ->nsim_bus_dev_list_lock
Re: [PATCH v3] lockdep: Allow tuning tracing capacity constants.
On Sun, Nov 22, 2020 at 2:56 AM Tetsuo Handa wrote: > Peter, you guessed that the culprit is sysfs at > https://lkml.kernel.org/r/20200916115057.go2...@hirez.programming.kicks-ass.net > , but > syzbot reported at > https://syzkaller.appspot.com/text?tag=MachineInfo=99b8f2b092d9714f > that "BUG: MAX_LOCKDEP_ENTRIES too low!" can occur on a VM with only 2 CPUs. > Is your guess catching the culprit? > > We could improve a few locks, but as a whole we won't be able to afford > keeping > sum of individual dependencies under current threshold. Therefore, allow > lockdep to > tune the capacity and allow syzkaller to dump when reaching the capacity will > be > the way to go. An example dump from a VM with only 2 CPUs ( https://syzkaller.appspot.com/text?tag=CrashLog=12016d5550 ) had a chain with more than kn->active#number 150 entries. kn->active#number is initialized by __kernfs_create_file() but it is not deinitialized (there is no method for erasing lockdep map?) by kernfs_put(). Since syzkaller continues object creation/deletion via fuzz testing until kernel crashes, __kernfs_create_file() are called for so many times? struct kernfs_node *__kernfs_create_file(struct kernfs_node *parent, const char *name, umode_t mode, kuid_t uid, kgid_t gid, loff_t size, const struct kernfs_ops *ops, void *priv, const void *ns, struct lock_class_key *key) { (...snipped...) kn = kernfs_new_node(parent, name, (mode & S_IALLUGO) | S_IFREG, uid, gid, flags); if (!kn) return ERR_PTR(-ENOMEM); kn->attr.ops = ops; kn->attr.size = size; kn->ns = ns; kn->priv = priv; #ifdef CONFIG_DEBUG_LOCK_ALLOC if (key) { lockdep_init_map(>dep_map, "kn->active", key, 0); kn->flags |= KERNFS_LOCKDEP; } #endif (...snipped...) } void kernfs_put(struct kernfs_node *kn) { (...snipped...) kmem_cache_free(kernfs_node_cache, kn); (...snipped...) } FD: 919 BD: 59 +.+.: rtnl_mutex ->fs_reclaim ->depot_lock ->pcpu_alloc_mutex ->>wait#6 ->_hash[i].lock ->gdp_mutex ->>list_lock ->lock#2 ->kernfs_mutex ->bus_type_sem ->sysfs_symlink_target_lock ->&s->seqcount ->>power.lock ->dpm_list_mtx ->uevent_sock_mutex ->>lock ->running_helpers_waitq.lock ->subsys mutex#17 ->dev_hotplug_mutex ->dev_base_lock ->input_pool.lock ->nl_table_lock ->nl_table_wait.lock ->net_rwsem ->>lock ->sysctl_lock ->>lock ->krc.lock ->cpu_hotplug_lock ->kthread_create_lock ->>pi_lock ->_rq->removed.lock ->>wait ->>alloc_lock ->_rwsem ->wq_pool_mutex ->>lock ->rcu_node_0 ->_state.expedited_wq ->>list_lock ->>lock ->pool_lock ->lweventlist_lock ->rtnl_mutex.wait_lock ->proc_subdir_lock ->proc_inum_ida.xa_lock ->>k_lock ->pcpu_lock ->param_lock ->logbuf_lock ->(console_sem).lock ->subsys mutex#52 ->pin_fs_lock ->>s_type->i_mutex_key#3 ->>iflist_mtx ->>mtx ->subsys mutex#53 ->>sec_mtx ->>iflist_mtx#2 ->lock#6 ->net_dm_mutex ->failover_lock ->>lock ->>lock ->>lock ->>mc_lock ->>mca_lock ->>lock ->j1939_netdev_lock ->>lock ->>sock_lock ->_requests_lock ->_pending_beacons_lock ->_indoor_lock ->_xmit_LOOPBACK ->(inetaddr_validator_chain).rwsem ->(inetaddr_chain).rwsem ->_dev->mc_tomb_lock ->>lock ->fib_info_lock ->cbs_list_lock ->taprio_list_lock ->(inet6addr_validator_chain).rwsem ->addrconf_hash_lock ->>lock ->>tb6_lock ->rlock-AF_NETLINK ->_xmit_ETHER ->noop_qdisc.q.lock ->quarantine_lock ->kernfs_mutex.wait_lock ->x25_neigh_list_lock ->>mmap_lock#2 ->free_vmap_area_lock ->vmap_area_lock ->>lock ->>lock ->>hash_lock ->init_mm.page_table_lock ->_netdev_addr_lock_key ->listen_lock ->>consumer_lock ->batched_entropy_u32.lock ->(switchdev_blocking_notif_chain).rwsem ->>hash_lock ->nf_hook_mutex ->console_owner_lock ->console_owner ->_priv->tvlv.handler_list_lock ->_priv->tvlv.container_list_lock ->_priv->softif_vlan_list_lock ->key#14 ->_priv->tt.changes_list_lock ->tk_core.seq.seqcount ->>mutex ->init_lock ->deferred_lock ->target_list_lock ->>lock ->_addr_list_lock_key/1 ->tea
Re: [PATCH] tomoyo: Avoid potential null pointer access
On 2020/11/27 16:17, Zheng Zengkai wrote: > Hello Tetsuo, >> On 2020/11/26 15:33, Zheng Zengkai wrote: >>> As your say, I found the function tomoyo_assign_namespace( ) >>> >>> in security/tomoyo/domain.c has the similar situation, >>> >>> Can I add __GFP_NOWARN for both and remove the null check for _entry_ in >>> tomoyo_assign_namespace( )? >>> >> Good catch. Yes, please send as a patch. >> . > > I have resent a patch, thanks! > Applied to tomoyo-test1.git tree. Thank you. By the way, since some people automatically backport patches with Fixes: tag, I think we don't need to add Fixes: tag for patches like this one.
Re: [PATCH] tomoyo: Avoid potential null pointer access
On 2020/11/26 15:33, Zheng Zengkai wrote: > As your say, I found the function tomoyo_assign_namespace( ) > > in security/tomoyo/domain.c has the similar situation, > > Can I add __GFP_NOWARN for both and remove the null check for _entry_ in > tomoyo_assign_namespace( )? > Good catch. Yes, please send as a patch.
Re: [PATCH] tomoyo: Avoid potential null pointer access
Hello, Zheng. Thank you for a patch, but I won't apply this patch. Expected behavior is that tomoyo_warn_oom() is called if tomoyo_memory_ok() is called with entry == NULL. Adding __GFP_NOWARN might be OK, but returning without tomoyo_warn_oom() is NG. On 2020/11/25 21:10, Zheng Zengkai wrote: > Calls to kzalloc() should be null-checked in order to avoid > any potential failures or unnecessary code execution. > Fix this by adding null checks for _entry_ right after allocation. > > Fixes: 57c2590fb7fd ("TOMOYO: Update profile structure") > Reported-by: Hulk Robot > Signed-off-by: Zheng Zengkai Nacked-by: Tetsuo Handa > --- > security/tomoyo/common.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/security/tomoyo/common.c b/security/tomoyo/common.c > index 4bee32bfe16d..99b4fafcb100 100644 > --- a/security/tomoyo/common.c > +++ b/security/tomoyo/common.c > @@ -499,6 +499,8 @@ static struct tomoyo_profile *tomoyo_assign_profile > if (ptr) > return ptr; > entry = kzalloc(sizeof(*entry), GFP_NOFS); > + if (!entry) > + return NULL; > if (mutex_lock_interruptible(_policy_lock)) > goto out; > ptr = ns->profile_ptr[profile]; >
Re: [PATCH v3] lockdep: Allow tuning tracing capacity constants.
On 2020/11/20 18:27, Dmitry Vyukov wrote: > Peter, so far it looks like just a very large, but normal graph to me. > The cheapest from an engineering point of view solution would be just > to increase the constants. I assume a 2x increase should buy us lots > of time to overflow. > I can think of more elaborate solutions, e.g. using bitmasks to > represent hot leaf and top-level locks. But it will both increase the > resulting code complexity (no uniform representation anymore, all code > will need to deal with different representations) and require some > time investments (that I can't justify for me at least as compared to > just throwing a bit more machine memory at it). And in the end it > won't really reduce the size of the graph. > What do you think? > Yes, I think it is a normal graph; simply syzkaller kernels tend to record a few times more dependencies than my idle kernel (shown bottom). Peter, you guessed that the culprit is sysfs at https://lkml.kernel.org/r/20200916115057.go2...@hirez.programming.kicks-ass.net , but syzbot reported at https://syzkaller.appspot.com/text?tag=MachineInfo=99b8f2b092d9714f that "BUG: MAX_LOCKDEP_ENTRIES too low!" can occur on a VM with only 2 CPUs. Is your guess catching the culprit? We could improve a few locks, but as a whole we won't be able to afford keeping sum of individual dependencies under current threshold. Therefore, allow lockdep to tune the capacity and allow syzkaller to dump when reaching the capacity will be the way to go. # cat /proc/lockdep_stats lock-classes: 1236 [max: 8192] direct dependencies: 9610 [max: 32768] indirect dependencies: 40401 all direct dependencies:174635 dependency chains: 11398 [max: 65536] dependency chain hlocks used:42830 [max: 327680] dependency chain hlocks lost:0 in-hardirq chains: 61 in-softirq chains: 414 in-process chains: 10923 stack-trace entries: 93041 [max: 524288] number of stack traces: 4997 number of stack hash chains: 4292 combined max dependencies: 281074520 hardirq-safe locks: 50 hardirq-unsafe locks: 805 softirq-safe locks:146 softirq-unsafe locks: 722 irq-safe locks:155 irq-unsafe locks: 805 hardirq-read-safe locks: 2 hardirq-read-unsafe locks: 129 softirq-read-safe locks:11 softirq-read-unsafe locks: 123 irq-read-safe locks:11 irq-read-unsafe locks: 129 uncategorized locks: 224 unused locks:0 max locking depth: 15 max bfs queue depth: 215 chain lookup misses: 11664 chain lookup hits:37393935 cyclic checks: 11053 redundant checks:0 redundant links: 0 find-mask forwards checks:1588 find-mask backwards checks: 1779 hardirq on events:17502380 hardirq off events: 17502376 redundant hardirq ons: 0 redundant hardirq offs: 0 softirq on events: 90845 softirq off events: 90845 redundant softirq ons: 0 redundant softirq offs: 0 debug_locks: 1 zapped classes: 0 zapped lock chains: 0 large chain blocks: 1 # awk ' { if ($2 == "OPS:") print $5" "$9 } ' /proc/lockdep | sort -rV | head -n 30 423 (wq_completion)events 405 (wq_completion)events_unbound 393 >f_pos_lock 355 >lock 349 sb_writers#3 342 sb_writers#6 338 >mutex 330 (work_completion)(>work) 330 pernet_ops_rwsem 289 epmutex 288 >mtx 281 tty_mutex 280 >legacy_mutex 273 >legacy_mutex/1 269 >ldisc_sem 268 (wq_completion)ipv6_addrconf 266 (work_completion)(&(>dad_work)->work) 266 (linkwatch_work).work 266 (addr_chk_work).work 266 >atomic_read_lock 265 (work_completion)(>work) 265 rtnl_mutex 263 >atomic_write_lock 262 >lock 261 >termios_rwsem 242 >buf.lock/1 242 kn->active#40 241 _tty->termios_rwsem/1 240 registration_lock 239 >output_lock # awk ' { if ($2 == "OPS:") print $7" "$9 } ' /proc/lockdep | sort -rV | head -n 30 642 pool_lock 641 _hash[i].lock 582 tk_core.seq.seqcount 561 hrtimer_bases.lock 560 _rq->rt_runtime_lock 559 _b->rt_runtime_lock 559 >lock 559 _rq->removed.lock 559 _b->lock 558 >lock 550 >delays->lock 549 >pi_lock 506 >lock 504 >list_lock 432 &s->seqcount 404 >wait#10 401 >lock 369 >lock 330 rcu_node_0 326 &(kretprobe_table_locks[i].lock) 326 pgd_lock 324 >lru_lock 323 lock#5 321 _wait_table[i] 319 ptlock_ptr(page)#2/1 318 ptlock_ptr(page)#2
Re: [PATCH v3] lockdep: Allow tuning tracing capacity constants.
On 2020/11/19 23:30, Dmitry Vyukov wrote: > p.s. it's indeed huge, full log was 11MB, this probably won't be > chewed by syzbot. Is "cat /proc/lockdep*" output written from userspace? Then, we could try "xz -9" + "base64" for recording. > Peter, are these [hex numbers] needed? Could we strip them during > post-processing? At first sight they look like derivatives of the > name. kernel/locking/lockdep.c uses %px (raw address) whereas kernel/locking/lockdep_proc.c uses %p (__ptr_to_hashval() value). I think saving these [hashed hex numbers] is unlikely useful. At least for this testing, we can strip leading part.