from:"Tetsuo Handa"

Re: [PATCH] kernel/hung_task: Add a whitelist and blacklist mechanism.

2021-04-18 Thread Tetsuo Handa

On 2021/04/19 11:22, 周传高 wrote:
> Some SOC vendors' drivers or user-mode processes may be in D state for a long 
> time,
> and normally they do not configure HUNG TASK, so we need to ignore these 
> tasks if
> we use HUNG TASK. 

Isn't that a sign that the quality of the drivers and user-mode processes is 
poor?
Wait should be killable/interruptible if possible (for better response to e.g. 
OOM-killer).

Re: [syzbot] unexpected kernel reboot (4)

2021-04-18 Thread Tetsuo Handa

On 2021/04/17 12:19, Tetsuo Handa wrote:
> On 2021/04/15 0:39, Andrey Konovalov wrote:
>> On Wed, Apr 14, 2021 at 7:45 AM Dmitry Vyukov  wrote:
>>> The reproducer connects some USB HID device and communicates with the 
>>> driver.
>>> Previously we observed reboots because HID devices can trigger reboot
>>> SYSRQ, but we disable it with "CONFIG_MAGIC_SYSRQ is not set".
>>> How else can a USB device reboot the machine? Is it possible to disable it?
>>> I don't see any direct includes of  in drivers/usb/*
>>
>> This happens when a keyboard sends the Ctrl+Alt+Del sequence, see
>> fn_boot_it()->ctrl_alt_del() in drivers/tty/vt/keyboard.c.
>>
> 
> Regarding ctrl_alt_del() problem, doing
> 
>   sh -c 'echo 0 > /proc/sys/kernel/ctrl-alt-del; echo $$ > 
> /proc/sys/kernel/cad_pid'
> 
> as root before start fuzzing might help.
> 
> Also, with the command above, reproducer still triggers suspend operation 
> which freezes userspace processes.
> This could possibly be one of causes for no output / lost connections. Try 
> disabling freeze/suspend related configs?
> 

I tried

  # CONFIG_HIBERNATION is not set
  # CONFIG_SUSPEND is not set
  # CONFIG_PM is not set

but the reproducer still shutdowns the system.

The reproducer can be simplified to single threaded.


int main(int argc, char *argv[])
{
uint64_t r[1] = { 0x };

syscall(__NR_mmap, 0x2000ul, 0x100ul, 7ul, 0x32ul, -1, 0ul);
memcpy((void*)0x2200,
   
"\x12\x01\x00\x00\x00\x00\x00\x40\x26\x09\x33\x33\x40\x00\x00\x00\x00"
   
"\x01\x09\x02\x24\x00\x01\x00\x00\x00\x00\x09\x04\x00\x00\x01\x03\x01"
   
"\x00\x00\x09\x21\x00\x00\x00\x01\x22\x01\x00\x09\x05\x81\x03\x08",
   50);
r[0] = syz_usb_connect(0, 0x36, 0x2200, 0);
syz_usb_control_io(r[0], 0, 0);
*(uint32_t*)0x2300 = 0x24;
*(uint64_t*)0x2304 = 0;
*(uint64_t*)0x230c = 0;
*(uint64_t*)0x2314 = 0x21c0;
*(uint8_t*)0x21c0 = 0;
*(uint8_t*)0x21c1 = 0x22;
*(uint32_t*)0x21c2 = 5;
STORE_BY_BITMASK(uint8_t, , 0x21c6, 3, 0, 2);
STORE_BY_BITMASK(uint8_t, , 0x21c6, 0, 2, 2);
STORE_BY_BITMASK(uint8_t, , 0x21c6, 0, 4, 4);
memcpy((void*)0x21c7, "\x65\xe9\x5e\x9d", 4);
*(uint64_t*)0x231c = 0;
syz_usb_control_io(r[0], 0x2300, 0);
syz_usb_ep_write(r[0], 0, 5, (long) "\x8e\xc5\x44\xbc\x66");
return 0;
}


Since applying below diff does not help, I don't think that
this problem can be avoided by modifying tty code.


diff --git a/drivers/tty/vt/keyboard.c b/drivers/tty/vt/keyboard.c
index 77638629c562..25e6d52f5de6 100644
--- a/drivers/tty/vt/keyboard.c
+++ b/drivers/tty/vt/keyboard.c
@@ -323,12 +323,14 @@ int kbd_rate(struct kbd_repeat *rpt)
  */
 static void put_queue(struct vc_data *vc, int ch)
 {
+   return;
tty_insert_flip_char(>port, ch, 0);
tty_schedule_flip(>port);
 }
 
 static void puts_queue(struct vc_data *vc, const char *cp)
 {
+   return;
tty_insert_flip_string(>port, cp, strlen(cp));
tty_schedule_flip(>port);
 }
@@ -657,7 +659,7 @@ static void k_spec(struct vc_data *vc, unsigned char value, 
char up_flag)
 kbd->kbdmode == VC_OFF) &&
 value != KVAL(K_SAK))
return; /* SAK is allowed even in raw mode */
-   fn_handler[value](vc);
+   //fn_handler[value](vc);
 }
 
 static void k_lowercase(struct vc_data *vc, unsigned char value, char up_flag)
@@ -1393,6 +1395,7 @@ static void kbd_keycode(unsigned int keycode, int down, 
bool hw_raw)
struct keyboard_notifier_param param = { .vc = vc, .value = keycode, 
.down = down };
int rc;
 
+   return;
tty = vc->port.tty;
 
if (tty && (!tty->driver_data)) {
@@ -1509,7 +1512,7 @@ static void kbd_keycode(unsigned int keycode, int down, 
bool hw_raw)
if ((raw_mode || kbd->kbdmode == VC_OFF) && type != KT_SPEC && type != 
KT_SHIFT)
return;
 
-   (*k_handler[type])(vc, keysym & 0xff, !down);
+   //(*k_handler[type])(vc, keysym & 0xff, !down);
 
param.ledstate = kbd->ledflagstate;
atomic_notifier_call_chain(_notifier_list, KBD_POST_KEYSYM, 
);


Running the reproducer with

  dbus-monitor --system > /dev/ttyprintk

reports that systemd's shutdown service is requested when syz_usb_ep_write() is 
called.
I can't understand why the command "\x8e\xc5\x44\xbc\x66" leads to this result.

Re: [PATCH] kernel/hung_task: Add a whitelist and blacklist mechanism.

2021-04-17 Thread Tetsuo Handa

On 2021/04/17 23:13, zhouchuangao wrote:
> The main purpose of this patch is to add a whitelist and blacklist
> mechanism to the hung task thread.

We stopped using the term 'whitelist'/'blacklist' for new code in Linux kernel,
and what you are proposing is something like 'ignorelist'/'fatallist'.

I think that matching based on comm name is poor, for comm name is subjected to
impersonation by malicious user processes.

Moreover, speak of syzkaller testing, most of hang task reports are reaction to
somebody else consuming too much CPU resources (e.g. printk() flooding, too many
pending workqueue requests). Even if some process is in 'ignorelist', it is
possible that some problem that should be reported is already happening. Even if
some process is in 'fatallist', it is possible that the cause of hang is simply
somebody else is consuming too much CPU.

By the way, I wish that khungtaskd can report recent top CPU consumers, for it 
is
rare that the cause of hung is locking dependency problems / hardware problems.

Re: [syzbot] unexpected kernel reboot (4)

2021-04-16 Thread Tetsuo Handa

On 2021/04/15 0:39, Andrey Konovalov wrote:
> On Wed, Apr 14, 2021 at 7:45 AM Dmitry Vyukov  wrote:
>> The reproducer connects some USB HID device and communicates with the driver.
>> Previously we observed reboots because HID devices can trigger reboot
>> SYSRQ, but we disable it with "CONFIG_MAGIC_SYSRQ is not set".
>> How else can a USB device reboot the machine? Is it possible to disable it?
>> I don't see any direct includes of  in drivers/usb/*
> 
> This happens when a keyboard sends the Ctrl+Alt+Del sequence, see
> fn_boot_it()->ctrl_alt_del() in drivers/tty/vt/keyboard.c.
> 

Regarding ctrl_alt_del() problem, doing

  sh -c 'echo 0 > /proc/sys/kernel/ctrl-alt-del; echo $$ > 
/proc/sys/kernel/cad_pid'

as root before start fuzzing might help.

Also, with the command above, reproducer still triggers suspend operation which 
freezes userspace processes.
This could possibly be one of causes for no output / lost connections. Try 
disabling freeze/suspend related configs?

[   60.881255][ T6280] usb 5-1: new high-speed USB device number 2 using 
dummy_hcd
[   61.260648][ T6280] usb 5-1: config 0 interface 0 altsetting 0 endpoint 0x81 
has an invalid bInterval 0, changing to 7
[   61.274056][ T6280] usb 5-1: New USB device found, idVendor=0926, 
idProduct=, bcdDevice= 0.40
[   61.284700][ T6280] usb 5-1: New USB device strings: Mfr=0, Product=0, 
SerialNumber=0
[   61.289556][ T6280] usb 5-1: config 0 descriptor??
[   61.780871][ T6280] keytouch 0003:0926:.0002: fixing up Keytouch IEC 
report descriptor
[   61.792015][ T6280] input: HID 0926: as 
/devices/platform/dummy_hcd.0/usb5/5-1/5-1:0.0/0003:0926:.0002/input/input5
[   61.871612][ T6280] keytouch 0003:0926:.0002: input,hidraw1: USB HID 
v0.00 Keyboard [HID 0926:] on usb-dummy_hcd.0-1/input0
[   62.137706][ T6847] PM: suspend entry (s2idle)
[   62.147914][ T6847] Filesystems sync: 0.007 seconds
[   62.152031][ T6847] Freezing user space processes ... (elapsed 0.003 
seconds) done.
[   62.158369][ T6847] OOM killer disabled.
[   62.159673][ T6847] Freezing remaining freezable tasks ... (elapsed 0.003 
seconds) done.
[   62.167440][ T6847] vhci_hcd vhci_hcd.15: suspend vhci_hcd
[   62.169569][ T6847] vhci_hcd vhci_hcd.14: suspend vhci_hcd
[   62.171562][ T6847] vhci_hcd vhci_hcd.13: suspend vhci_hcd
[   62.173500][ T6847] vhci_hcd vhci_hcd.12: suspend vhci_hcd
[   62.175740][ T6847] vhci_hcd vhci_hcd.11: suspend vhci_hcd
[   62.177677][ T6847] vhci_hcd vhci_hcd.10: suspend vhci_hcd
[   62.179725][ T6847] vhci_hcd vhci_hcd.9: suspend vhci_hcd
[   62.181602][ T6847] vhci_hcd vhci_hcd.8: suspend vhci_hcd
[   62.183681][ T6847] vhci_hcd vhci_hcd.7: suspend vhci_hcd
[   62.185594][ T6847] vhci_hcd vhci_hcd.6: suspend vhci_hcd
[   62.187552][ T6847] vhci_hcd vhci_hcd.5: suspend vhci_hcd
[   62.189566][ T6847] vhci_hcd vhci_hcd.4: suspend vhci_hcd
[   62.191767][ T6847] vhci_hcd vhci_hcd.3: suspend vhci_hcd
[   62.193657][ T6847] vhci_hcd vhci_hcd.2: suspend vhci_hcd
[   62.195634][ T6847] vhci_hcd vhci_hcd.1: suspend vhci_hcd
[   62.197430][ T6847] vhci_hcd vhci_hcd.0: suspend vhci_hcd
[   62.249881][T8] mptbase: ioc0: pci-suspend: pdev=0x888005495000, 
slot=:00:10.0, Entering operating state [D0]

Re: [PATCH 1/5] xattr: Complete constify ->name member of "struct xattr"

2021-04-15 Thread Tetsuo Handa

On 2021/04/15 19:04, Roberto Sassu wrote:
> This patch completes commit 9548906b2bb7 ('xattr: Constify ->name member of
> "struct xattr"'). It fixes the documentation of the inode_init_security
> hook, by removing the xattr name from the objects that are expected to be
> allocated by LSMs (only the value is allocated). Also, it removes the
> kfree() of name and setting it to NULL in the reiserfs code.

Good catch, but well, grep does not find any reiserfs_security_free() callers.
Is reiserfs_security_free() a dead code?

[PATCH] ttyprintk: Add TTY hangup callback.

2021-04-14 Thread Tetsuo Handa

syzbot is reporting hung task due to flood of

  tty_warn(tty, "%s: tty->count = 1 port count = %d\n", __func__,
   port->count);

message [1], for ioctl(TIOCVHANGUP) prevents tty_port_close() from
decrementing port->count due to tty_hung_up_p() == true.

--
#include 
#include 
#include 
#include 
#include 

int main(int argc, char *argv[])
{
int i;
int fd[10];

for (i = 0; i < 10; i++)
fd[i] = open("/dev/ttyprintk", O_WRONLY);
ioctl(fd[0], TIOCVHANGUP);
for (i = 0; i < 10; i++)
close(fd[i]);
close(open("/dev/ttyprintk", O_WRONLY));
return 0;
}
--

When TTY hangup happens, port->count needs to be reset via
"struct tty_operations"->hangup callback.

[1] 
https://syzkaller.appspot.com/bug?id=39ea6caa479af471183997376dc7e90bc7d64a6a

Reported-by: syzbot 
Reported-by: syzbot 
Tested-by: syzbot 
Signed-off-by: Tetsuo Handa 
Fixes: 24b4b67d17c308aa ("add ttyprintk driver")
---
 drivers/char/ttyprintk.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/char/ttyprintk.c b/drivers/char/ttyprintk.c
index 6a0059e508e3..93f5d11c830b 100644
--- a/drivers/char/ttyprintk.c
+++ b/drivers/char/ttyprintk.c
@@ -158,12 +158,23 @@ static int tpk_ioctl(struct tty_struct *tty,
return 0;
 }
 
+/*
+ * TTY operations hangup function.
+ */
+static void tpk_hangup(struct tty_struct *tty)
+{
+   struct ttyprintk_port *tpkp = tty->driver_data;
+
+   tty_port_hangup(>port);
+}
+
 static const struct tty_operations ttyprintk_ops = {
.open = tpk_open,
.close = tpk_close,
.write = tpk_write,
.write_room = tpk_write_room,
.ioctl = tpk_ioctl,
+   .hangup = tpk_hangup,
 };
 
 static const struct tty_port_operations null_ops = { };
-- 
2.18.4

Re: [syzbot] unexpected kernel reboot (4)

2021-04-14 Thread Tetsuo Handa

On 2021/04/15 0:39, Andrey Konovalov wrote:
> On Wed, Apr 14, 2021 at 7:45 AM Dmitry Vyukov  wrote:
>>
>> On Tue, Apr 13, 2021 at 11:27 PM syzbot
>>  wrote:
>>>
>>> Hello,
>>>
>>> syzbot found the following issue on:
>>>
>>> HEAD commit:89698bec Merge tag 'm68knommu-for-v5.12-rc7' of git://git...
>>> git tree:   upstream
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1243fcfed0
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=b234ddbbe2953747
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=9ce030d4c89856b27619
>>> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=173e92fed0
>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1735da2ed0
>>>
>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>> Reported-by: syzbot+9ce030d4c89856b27...@syzkaller.appspotmail.com
>>>
>>> output_len: 0x0e74eb68
>>> kernel_total_size: 0x0f226000
>>> needed_size: 0x0f40
>>> trampoline_32bit: 0x0009d000
>>> Decompressing Linux... Parsing ELF... done.
>>> Booting the kernel.
>>
>> +linux-input
>>
>> The reproducer connects some USB HID device and communicates with the driver.
>> Previously we observed reboots because HID devices can trigger reboot
>> SYSRQ, but we disable it with "CONFIG_MAGIC_SYSRQ is not set".
>> How else can a USB device reboot the machine? Is it possible to disable it?
>> I don't see any direct includes of  in drivers/usb/*
> 
> This happens when a keyboard sends the Ctrl+Alt+Del sequence, see
> fn_boot_it()->ctrl_alt_del() in drivers/tty/vt/keyboard.c.
> 
> There was a patchset by Tetsuo [1] to suppress this, but I think it
> was abandoned.


Not abandoned; I'm waiting for you to join the discussion.
But for right now I'm trying to merge LOCKDEP's capacity tuning patch for
https://github.com/google/syzkaller/pull/2535 in the next merge window.

I still believe that ctrl_alt_del() etc. should be initially controlled
via kernel config options (despite Linus's objection at
https://lore.kernel.org/lkml/CAHk-=wgz=7MGxxX-tmMmdCsKyYJkuyxNc-4uLP=e_eEV=oz...@mail.gmail.com/
 ), for
we will need several trials and errors (and an effort to avoid kernel size 
bloating
like https://lkml.kernel.org/r/YD57hjaSmsYapHnQ@alley still remains) before we 
can
determine usable units for allow toggling via kernel command line options.

> 
> (This reminds of a somewhat related syzkaller issue:
> https://github.com/google/syzkaller/issues/1824; it relies on a
> similar reproducer.)
> 
> [1] https://groups.google.com/g/syzkaller/c/7wCmrGlLgm0/m/5yG6HVtbBQAJ
>

Re: How to handle concurrent access to /dev/ttyprintk ?

2021-04-14 Thread Tetsuo Handa

On 2021/04/14 9:45, Tetsuo Handa wrote:
> On 2021/04/12 21:04, Greg Kroah-Hartman wrote:
>>> Since syzkaller is a fuzzer, syzkaller happily opens /dev/ttyprintk from
>>> multiple threads. Should we update syzkaller to use CONFIG_TTY_PRINTK=n ?
>>
>> Why?  Can you not hit the same tty code paths from any other tty driver
>> being open multiple times?  Why is ttyprintk somehow "special" here?
> 
> I found a simplified reproducer. If we call ioctl(TIOCVHANGUP) on 
> /dev/ttyprintk ,
> "struct ttyprintk_port tpk_port".port.count cannot be decremented by
> tty_port_close() from tpk_close() due to tty_hung_up_p() == true when
> close() is called. As a result, tty->count and port count gets out of sync.
> 
> Then, when /dev/ttyprintk is opened again and then closed without calling
> ioctl(TIOCVHANGUP), this message is printed due to tty_hung_up_p() == false.
> 
> If I replace /dev/ttyprintk with /dev/ttyS0 in the reproducer shown above,
> this message is not printed.
> 

The difference between /dev/ttyS0 and /dev/ttyprintk is that
the former provides uart_hangup() as "struct tty_operations"->hangup
while the latter does not provide "struct tty_operations"->hangup .

It seems to me that below patch avoids this message, but I'm not familiar
with tty code. Is this fix correct? Don't we need to enforce all drivers
to provide "struct tty_operations"->hangup in order to reset port count ?

diff --git a/drivers/char/ttyprintk.c b/drivers/char/ttyprintk.c
index 6a0059e508e3..ff3b9a41b91b 100644
--- a/drivers/char/ttyprintk.c
+++ b/drivers/char/ttyprintk.c
@@ -158,12 +158,26 @@ static int tpk_ioctl(struct tty_struct *tty,
return 0;
 }
 
+/*
+ * TTY operations hangup function.
+ */
+static void tpk_hangup(struct tty_struct *tty)
+{
+   struct ttyprintk_port *tpkp = tty->driver_data;
+   unsigned long flags;
+
+   spin_lock_irqsave(>port.lock, flags);
+   tpkp->port.count = 0;
+   spin_unlock_irqrestore(>port.lock, flags);
+}
+
 static const struct tty_operations ttyprintk_ops = {
.open = tpk_open,
.close = tpk_close,
.write = tpk_write,
.write_room = tpk_write_room,
.ioctl = tpk_ioctl,
+   .hangup = tpk_hangup,
 };
 
 static const struct tty_port_operations null_ops = { };

Re: How to handle concurrent access to /dev/ttyprintk ?

2021-04-13 Thread Tetsuo Handa

On 2021/04/12 21:04, Greg Kroah-Hartman wrote:
>> Since syzkaller is a fuzzer, syzkaller happily opens /dev/ttyprintk from
>> multiple threads. Should we update syzkaller to use CONFIG_TTY_PRINTK=n ?
> 
> Why?  Can you not hit the same tty code paths from any other tty driver
> being open multiple times?  Why is ttyprintk somehow "special" here?

I found a simplified reproducer. If we call ioctl(TIOCVHANGUP) on 
/dev/ttyprintk ,
"struct ttyprintk_port tpk_port".port.count cannot be decremented by
tty_port_close() from tpk_close() due to tty_hung_up_p() == true when
close() is called. As a result, tty->count and port count gets out of sync.

Then, when /dev/ttyprintk is opened again and then closed without calling
ioctl(TIOCVHANGUP), this message is printed due to tty_hung_up_p() == false.

-- Debug print patch start --
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 391bada4cedb..42d54368adac 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -113,7 +113,7 @@
 #ifdef TTY_DEBUG_HANGUP
 # define tty_debug_hangup(tty, f, args...) tty_debug(tty, f, ##args)
 #else
-# define tty_debug_hangup(tty, f, args...) do { } while (0)
+# define tty_debug_hangup(tty, f, args...) tty_info(tty, f, ##args)
 #endif
 
 #define TTY_PARANOIA_CHECK 1
@@ -628,6 +628,7 @@ static void __tty_hangup(struct tty_struct *tty, int 
exit_session)
closecount++;
__tty_fasync(-1, filp, 0);  /* can't block */
filp->f_op = _up_tty_fops;
+   tty_warn(tty, "Set hung_up_tty_fops on %px\n", filp);
}
spin_unlock(>files_lock);
 
diff --git a/drivers/tty/tty_port.c b/drivers/tty/tty_port.c
index 346d20f4a486..032fc58203ea 100644
--- a/drivers/tty/tty_port.c
+++ b/drivers/tty/tty_port.c
@@ -561,8 +561,10 @@ int tty_port_close_start(struct tty_port *port,
 {
unsigned long flags;
 
-   if (tty_hung_up_p(filp))
+   if (tty_hung_up_p(filp)) {
+   tty_warn(tty, "Skipped by hung_up_tty_fops on %px\n", filp);
return 0;
+   }
 
spin_lock_irqsave(>lock, flags);
if (tty->count == 1 && port->count != 1) {
-- Debug print patch end --

-- Reproducer start --
#include 
#include 
#include 
#include 
#include 

int main(int argc, char *argv[])
{
int i;
int fd[10];

for (i = 0; i < 10; i++)
fd[i] = open("/dev/ttyprintk", O_WRONLY);
ioctl(fd[0], TIOCVHANGUP);
for (i = 0; i < 10; i++)
close(fd[i]);
close(open("/dev/ttyprintk", O_WRONLY));
return 0;
}
-- Reproducer end --

-- Console output start --
[   31.885820] ttyprintk ttyprintk: opening (count=1)
[   31.885859] ttyprintk ttyprintk: opening (count=2)
[   31.885876] ttyprintk ttyprintk: opening (count=3)
[   31.885892] ttyprintk ttyprintk: opening (count=4)
[   31.885907] ttyprintk ttyprintk: opening (count=5)
[   31.885923] ttyprintk ttyprintk: opening (count=6)
[   31.885939] ttyprintk ttyprintk: opening (count=7)
[   31.885956] ttyprintk ttyprintk: opening (count=8)
[   31.885971] ttyprintk ttyprintk: opening (count=9)
[   31.885988] ttyprintk ttyprintk: opening (count=10)
[   31.885999] ttyprintk ttyprintk: vhangup
[   31.886005] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed1000
[   31.886009] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed2000
[   31.886012] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed0c00
[   31.886016] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed0e00
[   31.886019] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed2c00
[   31.886023] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed3a00
[   31.886026] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed3400
[   31.886029] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed3c00
[   31.886033] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed0200
[   31.886036] ttyprintk ttyprintk: Set hung_up_tty_fops on 97c38eed3e00
[   31.886055] ttyprintk ttyprintk: releasing (count=10)
[   31.886106] ttyprintk ttyprintk: Skipped by hung_up_tty_fops on 
97c38eed3e00
[   31.886117] ttyprintk ttyprintk: releasing (count=9)
[   31.886121] ttyprintk ttyprintk: Skipped by hung_up_tty_fops on 
97c38eed0200
[   31.886128] ttyprintk ttyprintk: releasing (count=8)
[   31.886131] ttyprintk ttyprintk: Skipped by hung_up_tty_fops on 
97c38eed3c00
[   31.886138] ttyprintk ttyprintk: releasing (count=7)
[   31.886141] ttyprintk ttyprintk: Skipped by hung_up_tty_fops on 
97c38eed3400
[   31.886148] ttyprintk ttyprintk: releasing (count=6)
[   31.886151] ttyprintk ttyprintk: Skipped by hung_up_tty_fops on 
97c38eed3a00
[   31.886158] ttyprintk ttyprintk: releasing (count=5)
[   31.886161] ttyprintk ttyprintk: Skipped by hung_up_tty_fops on 
97c38eed2c00
[   31.886168] ttyprintk ttyprintk: releasing (count=4)
[

Re: [syzbot] WARNING in smk_set_cipso (2)

2021-04-12 Thread Tetsuo Handa

Commit 7ef4c19d245f3dc2 ("smackfs: restrict bytes count in smackfs write
functions") missed that count > SMK_CIPSOMAX check applies to only
format == SMK_FIXED24_FMT case.

Reported-by: syzbot 
Signed-off-by: Tetsuo Handa 
---
 security/smack/smackfs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/security/smack/smackfs.c b/security/smack/smackfs.c
index 22ded2c26089..1ad7d0d1ea62 100644
--- a/security/smack/smackfs.c
+++ b/security/smack/smackfs.c
@@ -855,6 +855,8 @@ static ssize_t smk_set_cipso(struct file *file, const char 
__user *buf,
if (format == SMK_FIXED24_FMT &&
(count < SMK_CIPSOMIN || count > SMK_CIPSOMAX))
return -EINVAL;
+   if (count > PAGE_SIZE)
+   return -EINVAL;
 
data = memdup_user_nul(buf, count);
if (IS_ERR(data))
-- 
2.18.4

Re: How to handle concurrent access to /dev/ttyprintk ?

2021-04-12 Thread Tetsuo Handa

On 2021/04/12 19:44, Greg Kroah-Hartman wrote:
> And trying to "open exclusive only" just does not work, the kernel can
> not enforce that at all, sorry.  Any driver that you see trying to do
> that is trivial to work around in userspace, making the kernel code
> pointless.

You mean something like below cannot be used?

diff --git a/drivers/char/ttyprintk.c b/drivers/char/ttyprintk.c
index 6a0059e508e3..57200569918a 100644
--- a/drivers/char/ttyprintk.c
+++ b/drivers/char/ttyprintk.c
@@ -84,14 +84,26 @@ static int tpk_printk(const unsigned char *buf, int count)
return count;
 }
 
+static DEFINE_MUTEX(open_close_lock);
+
 /*
  * TTY operations open function.
  */
 static int tpk_open(struct tty_struct *tty, struct file *filp)
 {
-   tty->driver_data = _port;
+   int ret;
+
+   if (mutex_lock_killable(_close_lock))
+   return -EINTR;
 
-   return tty_port_open(_port.port, tty, filp);
+   if (tpk_port.port.count) {
+   ret = -EBUSY;
+   } else {
+   tty->driver_data = _port;
+   ret = tty_port_open(_port.port, tty, filp);
+   }
+   mutex_unlock(_close_lock);
+   return ret;
 }
 
 /*
@@ -102,12 +114,14 @@ static void tpk_close(struct tty_struct *tty, struct file 
*filp)
struct ttyprintk_port *tpkp = tty->driver_data;
unsigned long flags;
 
+   mutex_lock(_close_lock);
spin_lock_irqsave(>spinlock, flags);
/* flush tpk_printk buffer */
tpk_printk(NULL, 0);
spin_unlock_irqrestore(>spinlock, flags);
 
tty_port_close(>port, tty, filp);
+   mutex_unlock(_close_lock);
 }
 
 /*

> Like any tty port, if you have multiple accesses, all bets are off and
> hilarity ensues.  Just don't do that and expect things to be working
> well.

Since syzkaller is a fuzzer, syzkaller happily opens /dev/ttyprintk from
multiple threads. Should we update syzkaller to use CONFIG_TTY_PRINTK=n ?

How to handle concurrent access to /dev/ttyprintk ?

2021-04-12 Thread Tetsuo Handa

What is the intended usage of /dev/ttyprintk ?

It seems that drivers/char/ttyprintk.c was not designed to be opened by
multiple processes. As a result, syzbot can trigger tty_warn() flooding
enough to fire khungtaskd warning due to tty_port_close().

Do we need to allow concurrent access to /dev/ttyprintk ?
If we can't change /dev/ttyprintk exclusively open()able by only
one thread, how to handle concurrent access to /dev/ttyprintk ?

On 2021/04/07 23:24, Tetsuo Handa wrote:
> On 2021/04/07 22:48, Greg Kroah-Hartman wrote:
>>> By the way, as soon as applying this patch, I guess that syzkaller starts
>>> generating hung task reports because /dev/ttyprintk can trivially trigger
>>> flood of
>>>
>>>   tty_warn(tty, "%s: tty->count = 1 port count = %d\n", __func__,
>>>port->count);
>>>
>>> message, and adding
>>>
>>>   if (strcmp(tty_driver_name(tty), "ttyprintk"))
>>
>> Odd, how can ttyprintk() generate that mess?
> 
> So far three tests and results:
> 
>   https://groups.google.com/g/syzkaller-bugs/c/yRLYijD2tbw/m/WifLgadvAAAJ
>   https://groups.google.com/g/syzkaller-bugs/c/yRLYijD2tbw/m/w2_MiMmJ
>   https://groups.google.com/g/syzkaller-bugs/c/yRLYijD2tbw/m/hfsQqSOPAAAJ
> 
> Patch https://syzkaller.appspot.com/x/patch.diff?x=145e4c9ad0 generated
> console output https://syzkaller.appspot.com/x/log.txt?x=162f9fced0 .
> 
> Patch https://syzkaller.appspot.com/x/patch.diff?x=14839931d0 did not
> flood the console output enough to fire khungtaskd.
> 
> Maybe it is because /dev/ttyprintk can be opened/closed by multiple processes
> without serialization?
> 
> Running
> 
>   for i in $(seq 1 100); do sleep 1 > /dev/ttyprintk & done
> 
> results in
> 
>   tty_port_close_start: tty->count = 1 port count = 100
> 
> . If tty_port_open() from tpk_open() can do
> 
>   spin_lock_irq(>lock);
>   ++port->count;
>   spin_unlock_irq(>lock);
> 
> when tty_port_close_start() from tty_port_close() from tpk_close() is doing
> 
>   spin_lock_irqsave(>lock, flags);
>   if (tty->count == 1 && port->count != 1) {
> tty_warn(tty, "%s: tty->count = 1 port count = %d\n", __func__,
>  port->count);
> port->count = 1;
>   }
>   if (--port->count < 0) {
> tty_warn(tty, "%s: bad port count (%d)\n", __func__,
>  port->count);
> port->count = 0;
>   }
> 
>   if (port->count) {
> spin_unlock_irqrestore(>lock, flags);
> return 0;
>   }
>   spin_unlock_irqrestore(>lock, flags);
> 
> , what prevents port->count from getting larger than 1 ?
>

[tip: irq/core] kernel: Initialize cpumask before parsing

2021-04-10 Thread tip-bot2 for Tetsuo Handa

The following commit has been merged into the irq/core branch of tip:

Commit-ID: c5e3a41187ac01425f5ad1abce927905e4ac44e4
Gitweb:
https://git.kernel.org/tip/c5e3a41187ac01425f5ad1abce927905e4ac44e4
Author:Tetsuo Handa 
AuthorDate:Thu, 01 Apr 2021 14:58:23 +09:00
Committer: Thomas Gleixner 
CommitterDate: Sat, 10 Apr 2021 13:35:54 +02:00

kernel: Initialize cpumask before parsing

KMSAN complains that new_value at cpumask_parse_user() from
write_irq_affinity() from irq_affinity_proc_write() is uninitialized.

  [  148.133411][ T5509] =
  [  148.135383][ T5509] BUG: KMSAN: uninit-value in find_next_bit+0x325/0x340
  [  148.137819][ T5509]
  [  148.138448][ T5509] Local variable new_value.i@irq_affinity_proc_write 
created at:
  [  148.140768][ T5509]  irq_affinity_proc_write+0xc3/0x3d0
  [  148.142298][ T5509]  irq_affinity_proc_write+0xc3/0x3d0
  [  148.143823][ T5509] =

Since bitmap_parse() from cpumask_parse_user() calls find_next_bit(),
any alloc_cpumask_var() + cpumask_parse_user() sequence has possibility
that find_next_bit() accesses uninitialized cpu mask variable. Fix this
problem by replacing alloc_cpumask_var() with zalloc_cpumask_var().

Signed-off-by: Tetsuo Handa 
Signed-off-by: Thomas Gleixner 
Acked-by: Steven Rostedt (VMware) 
Link: 
https://lore.kernel.org/r/20210401055823.3929-1-penguin-ker...@i-love.sakura.ne.jp

---
 kernel/irq/proc.c| 4 ++--
 kernel/profile.c | 2 +-
 kernel/trace/trace.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index 9813878..7c5cd42 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -144,7 +144,7 @@ static ssize_t write_irq_affinity(int type, struct file 
*file,
if (!irq_can_set_affinity_usr(irq) || no_irq_affinity)
return -EIO;
 
-   if (!alloc_cpumask_var(_value, GFP_KERNEL))
+   if (!zalloc_cpumask_var(_value, GFP_KERNEL))
return -ENOMEM;
 
if (type)
@@ -238,7 +238,7 @@ static ssize_t default_affinity_write(struct file *file,
cpumask_var_t new_value;
int err;
 
-   if (!alloc_cpumask_var(_value, GFP_KERNEL))
+   if (!zalloc_cpumask_var(_value, GFP_KERNEL))
return -ENOMEM;
 
err = cpumask_parse_user(buffer, count, new_value);
diff --git a/kernel/profile.c b/kernel/profile.c
index 6f69a41..c2ebddb 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -430,7 +430,7 @@ static ssize_t prof_cpu_mask_proc_write(struct file *file,
cpumask_var_t new_value;
int err;
 
-   if (!alloc_cpumask_var(_value, GFP_KERNEL))
+   if (!zalloc_cpumask_var(_value, GFP_KERNEL))
return -ENOMEM;
 
err = cpumask_parse_user(buffer, count, new_value);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index eccb4e1..7607dc8 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4827,7 +4827,7 @@ tracing_cpumask_write(struct file *filp, const char 
__user *ubuf,
cpumask_var_t tracing_cpumask_new;
int err;
 
-   if (!alloc_cpumask_var(_cpumask_new, GFP_KERNEL))
+   if (!zalloc_cpumask_var(_cpumask_new, GFP_KERNEL))
return -ENOMEM;
 
err = cpumask_parse_user(ubuf, count, tracing_cpumask_new);

Re: [PATCH 05/13] tty: remove tty_warn()

2021-04-08 Thread Tetsuo Handa

On 2021/04/08 21:51, Greg Kroah-Hartman wrote:
> Remove users of tty_warn() and replace them with calls to dev_warn()
> which provides more information about the tty that has the error and
> uses the standard formatting logic.

Ouch. This series would be good for clean up, but this series might be
bad for handling lockdep warning syzbot is reporting.

Since tty_warn() is using plain printk(), we can avoid lockdep warning by
using printk_deferred(). If we use dev_warn() instead, we need to modify
__dev_printk() to use printk_deferred(), which means that all dev_*() users
are affected by this change.

Also, we need to modify dev_printk_emit()/dev_vprintk_emit() callers to embed
loglevel into the format string so that we pass LOGLEVEL_SCHED to 
vprintk_emit() ...
maybe just change from "if (!in_sched)" to "if (!in_sched && !dev_info)" 
instead ?

Also, dev_vprintk_emit() need to start calling defer_console_output()
after returning from vprintk_emit() in order to behave like printk_deferred().

I'm not sure whether this change is safe.

Re: [PATCH v2] tty: use printk_deferred() at tty_msg()

2021-04-07 Thread Tetsuo Handa

On 2021/04/07 22:48, Greg Kroah-Hartman wrote:
>> By the way, as soon as applying this patch, I guess that syzkaller starts
>> generating hung task reports because /dev/ttyprintk can trivially trigger
>> flood of
>>
>>   tty_warn(tty, "%s: tty->count = 1 port count = %d\n", __func__,
>>port->count);
>>
>> message, and adding
>>
>>   if (strcmp(tty_driver_name(tty), "ttyprintk"))
> 
> Odd, how can ttyprintk() generate that mess?

So far three tests and results:

  https://groups.google.com/g/syzkaller-bugs/c/yRLYijD2tbw/m/WifLgadvAAAJ
  https://groups.google.com/g/syzkaller-bugs/c/yRLYijD2tbw/m/w2_MiMmJ
  https://groups.google.com/g/syzkaller-bugs/c/yRLYijD2tbw/m/hfsQqSOPAAAJ

Patch https://syzkaller.appspot.com/x/patch.diff?x=145e4c9ad0 generated
console output https://syzkaller.appspot.com/x/log.txt?x=162f9fced0 .

Patch https://syzkaller.appspot.com/x/patch.diff?x=14839931d0 did not
flood the console output enough to fire khungtaskd.

Maybe it is because /dev/ttyprintk can be opened/closed by multiple processes
without serialization?

Running

  for i in $(seq 1 100); do sleep 1 > /dev/ttyprintk & done

results in

  tty_port_close_start: tty->count = 1 port count = 100

. If tty_port_open() from tpk_open() can do

  spin_lock_irq(>lock);
  ++port->count;
  spin_unlock_irq(>lock);

when tty_port_close_start() from tty_port_close() from tpk_close() is doing

  spin_lock_irqsave(>lock, flags);
  if (tty->count == 1 && port->count != 1) {
tty_warn(tty, "%s: tty->count = 1 port count = %d\n", __func__,
 port->count);
port->count = 1;
  }
  if (--port->count < 0) {
tty_warn(tty, "%s: bad port count (%d)\n", __func__,
 port->count);
port->count = 0;
  }

  if (port->count) {
spin_unlock_irqrestore(>lock, flags);
return 0;
  }
  spin_unlock_irqrestore(>lock, flags);

, what prevents port->count from getting larger than 1 ?

[PATCH v2] tty: use printk_deferred() at tty_msg()

2021-04-07 Thread Tetsuo Handa

intk_safe.c:401
 printk+0xba/0xed kernel/printk/printk.c:2146
 tty_port_close_start.part.0+0x503/0x550 drivers/tty/tty_port.c:569
 tty_port_close_start drivers/tty/tty_port.c:641 [inline]
 tty_port_close+0x46/0x170 drivers/tty/tty_port.c:634
 tty_release+0x45e/0x1210 drivers/tty/tty_io.c:1779
 __fput+0x288/0x920 fs/file_table.c:280
 task_work_run+0xdd/0x1a0 kernel/task_work.c:140
 exit_task_work include/linux/task_work.h:30 [inline]
 do_exit+0xbfc/0x2a60 kernel/exit.c:825
 do_group_exit+0x125/0x310 kernel/exit.c:922
 get_signal+0x47f/0x2150 kernel/signal.c:2781
 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:789
 handle_signal_work kernel/entry/common.c:147 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:208
 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
 syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x44/0xae

  other info that might help us debug this:

  Chain exists of:
console_owner --> _lock_key --> >lock

   Possible unsafe locking scenario:

 CPU0CPU1
 
lock(>lock);
 lock(_lock_key);
 lock(>lock);
lock(console_owner);

   *** DEADLOCK ***

  3 locks held by syz-executor.4/2155:
   #0: 88802ce391c0 (>legacy_mutex){+.+.}-{3:3}, at: 
tty_lock+0xbd/0x120 drivers/tty/tty_mutex.c:19
   #1: 90114698 (>lock){-.-.}-{2:2}, at: 
tty_port_close_start.part.0+0x28/0x550 drivers/tty/tty_port.c:567
   #2: 8bf60920 (console_lock){+.+.}-{0:0}, at: vprintk_func+0x8d/0x1e0 
kernel/printk/printk_safe.c:401

  stack backtrace:
  CPU: 0 PID: 2155 Comm: syz-executor.4 Not tainted 5.12.0-rc6-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
Google 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:79 [inline]
   dump_stack+0x141/0x1d7 lib/dump_stack.c:120
   check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2127
   check_prev_add kernel/locking/lockdep.c:2936 [inline]
   check_prevs_add kernel/locking/lockdep.c:3059 [inline]
   validate_chain kernel/locking/lockdep.c:3674 [inline]
   __lock_acquire+0x2b14/0x54c0 kernel/locking/lockdep.c:4900
   lock_acquire kernel/locking/lockdep.c:5510 [inline]
   lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
   console_lock_spinning_enable kernel/printk/printk.c:1714 [inline]
   console_unlock+0x371/0xc80 kernel/printk/printk.c:2573
   vprintk_emit+0x1ca/0x560 kernel/printk/printk.c:2098
   vprintk_func+0x8d/0x1e0 kernel/printk/printk_safe.c:401
   printk+0xba/0xed kernel/printk/printk.c:2146
   tty_port_close_start.part.0+0x503/0x550 drivers/tty/tty_port.c:569
   tty_port_close_start drivers/tty/tty_port.c:641 [inline]
   tty_port_close+0x46/0x170 drivers/tty/tty_port.c:634
   tty_release+0x45e/0x1210 drivers/tty/tty_io.c:1779
   __fput+0x288/0x920 fs/file_table.c:280
   task_work_run+0xdd/0x1a0 kernel/task_work.c:140
   exit_task_work include/linux/task_work.h:30 [inline]
   do_exit+0xbfc/0x2a60 kernel/exit.c:825
   do_group_exit+0x125/0x310 kernel/exit.c:922
   get_signal+0x47f/0x2150 kernel/signal.c:2781
   arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:789
   handle_signal_work kernel/entry/common.c:147 [inline]
   exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
   exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:208
   __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
   syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:301
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Note that this patch has a small side effect that 5 tty_debug() messages
in __proc_set_tty()/tty_release_checks() are enabled, for pr_debug() is
conditional while KERN_DEBUG is unconditional. It seems that remaining
tty_debug() messages meant to be unconditional without knowing that
pr_debug() is conditional.

Note that this patch is not complete, for there is same dependency due to
memory allocation fault injection at tty_buffer_alloc().

By the way, as soon as applying this patch, I guess that syzkaller starts
generating hung task reports because /dev/ttyprintk can trivially trigger
flood of

  tty_warn(tty, "%s: tty->count = 1 port count = %d\n", __func__,
   port->count);

message, and adding

  if (strcmp(tty_driver_name(tty), "ttyprintk"))

immediately before this tty_warn() gave me Tested-by: response.

[1] 
https://syzkaller.appspot.com/bug?id=39ea6caa479af471183997376dc7e90bc7d64a6a

Reported-by: syzbot 
Reported-by: syzbot 
Signed-off-by: Tetsuo Handa 
Fixes: b6da31b2c07c46f2 ("tty: Fix data race in 
tty_insert_flip_string_fixed_flag")
Cc:  # 4.18+
---
 include/linux/t

Re: [PATCH] tty: use printk_safe context at tty_msg()

2021-04-06 Thread Tetsuo Handa

On 2021/04/07 0:10, Petr Mladek wrote:
>> diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
>> index 6d4995a5f318..d59f7873bc49 100644
>> --- a/drivers/tty/tty_buffer.c
>> +++ b/drivers/tty/tty_buffer.c
>> @@ -156,6 +156,7 @@ static struct tty_buffer *tty_buffer_alloc(struct 
>> tty_port *port, size_t size)
>>  {
>>  struct llist_node *free;
>>  struct tty_buffer *p;
>> +unsigned long flags;
>>  
>>  /* Round the buffer size out */
>>  size = __ALIGN_MASK(size, TTYB_ALIGN_MASK);
>> @@ -172,7 +173,9 @@ static struct tty_buffer *tty_buffer_alloc(struct 
>> tty_port *port, size_t size)
>> have queued and recycle that ? */
>>  if (atomic_read(>buf.mem_used) > port->buf.mem_limit)
>>  return NULL;
>> -p = kmalloc(sizeof(struct tty_buffer) + 2 * size, GFP_ATOMIC);
>> +printk_safe_enter_irqsave(flags);
>> +p = kmalloc(sizeof(struct tty_buffer) + 2 * size, GFP_ATOMIC | 
>> __GFP_NOWARN);
>> +printk_safe_exit_irqrestore(flags);
> 
> I do not see tty_buffer_alloc() anywhere at
> https://syzkaller.appspot.com/bug?id=39ea6caa479af471183997376dc7e90bc7d64a6a
> 
> Could you please provide more details why this is needed?

Quoting from https://syzkaller.appspot.com/text?tag=CrashLog=131d4ddf80 
at 
https://syzkaller.appspot.com/bug?id=bda1a87bea05a9072003e6447c44b03ca1492b1c :

[  274.992629] FAULT_INJECTION: forcing a failure.
[  274.992629] name failslab, interval 1, probability 0, space 0, times 0
[  274.994085] CPU: 1 PID: 19187 Comm: syz-executor6 Not tainted 4.17.0+ #89
[  274.994094] Hardware name: Google Google Compute Engine/Google Compute 
Engine, BIOS Google 01/01/2011
[  274.994099] Call Trace:
[  274.994120]  dump_stack+0x1b9/0x294
[  274.994145]  ? dump_stack_print_info.cold.2+0x52/0x52
[  275.270563]  should_fail.cold.4+0xa/0x1a
[  275.274639]  ? fault_create_debugfs_attr+0x1f0/0x1f0
[  275.279747]  ? graph_lock+0x170/0x170
[  275.283541]  ? check_noncircular+0x20/0x20
[  275.287788]  ? debug_check_no_locks_freed+0x310/0x310
[  275.292983]  ? lock_acquire+0x1dc/0x520
[  275.296961]  ? process_echoes+0xb6/0x170
[  275.301021]  ? check_same_owner+0x320/0x320
[  275.305347]  ? print_usage_bug+0xc0/0xc0
[  275.309404]  ? __might_sleep+0x95/0x190
[  275.313394]  ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[  275.318934]  __should_failslab+0x124/0x180
[  275.323168]  should_failslab+0x9/0x14
[  275.326974]  __kmalloc+0x63/0x760
[  275.330423]  ? mutex_trylock+0x2a0/0x2a0
[  275.334479]  ? print_usage_bug+0xc0/0xc0
[  275.338539]  ? __tty_buffer_request_room+0x2d2/0x7f0
[  275.343648]  __tty_buffer_request_room+0x2d2/0x7f0
[  275.348576]  ? flush_to_ldisc+0x560/0x560
[  275.352717]  ? lock_acquire+0x1dc/0x520
[  275.356698]  ? pty_write+0xf9/0x1f0
[  275.360325]  ? lock_release+0xa10/0xa10
[  275.364293]  ? kasan_check_read+0x11/0x20
[  275.368434]  ? do_raw_spin_unlock+0x9e/0x2e0
[  275.372831]  ? do_raw_spin_trylock+0x1b0/0x1b0
[  275.377411]  tty_insert_flip_string_fixed_flag+0x8d/0x1f0
[  275.382942]  ? do_raw_spin_lock+0xc1/0x200
[  275.387174]  pty_write+0x12c/0x1f0
[  275.390720]  tty_put_char+0x129/0x150
[  275.394525]  ? dev_match_devt+0x90/0x90
[  275.398498]  ? pty_write_room+0xc9/0xf0
[  275.402485]  ? __sanitizer_cov_trace_switch+0x53/0x90
[  275.407684]  __process_echoes+0x4d9/0x8d0
[  275.411829]  ? _raw_spin_unlock_irqrestore+0x74/0xc0
[  275.416936]  process_echoes+0xfc/0x170
[  275.420816]  n_tty_set_termios+0xb56/0xe80
[  275.425047]  ? n_tty_receive_signal_char+0x120/0x120
[  275.430148]  tty_set_termios+0x7a0/0xac0
[  275.434201]  ? tty_wait_until_sent+0x5b0/0x5b0
[  275.438782]  ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[  275.444318]  set_termios+0x41e/0x7d0
[  275.448026]  ? tty_perform_flush+0x80/0x80
[  275.452256]  ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[  275.457845]  tty_mode_ioctl+0x535/0xb50
[  275.461821]  ? set_termios+0x7d0/0x7d0
[  275.465740]  ? check_same_owner+0x320/0x320
[  275.470059]  ? graph_lock+0x170/0x170
[  275.473857]  n_tty_ioctl_helper+0x54/0x3b0
[  275.478085]  n_tty_ioctl+0x54/0x320
[  275.481711]  ? ldsem_down_read+0x37/0x40
[  275.485760]  ? ldsem_down_read+0x37/0x40
[  275.489836]  tty_ioctl+0x5e1/0x1870
[  275.493454]  ? commit_echoes+0x1d0/0x1d0
[  275.497506]  ? tty_vhangup+0x30/0x30
[  275.501212]  ? rcu_is_watching+0x85/0x140
[  275.505353]  ? rcu_report_qs_rnp+0x790/0x790
[  275.509757]  ? __fget+0x40c/0x650
[  275.513243]  ? find_held_lock+0x31/0x1c0
[  275.517439]  ? expand_files.part.8+0x9a0/0x9a0
[  275.522033]  ? kasan_check_write+0x14/0x20
[  275.526389]  ? __mutex_unlock_slowpath+0x180/0x8a0
[  275.531314]  ? wait_for_completion+0x870/0x870
[  275.535907]  ? tty_vhangup+0x30/0x30
[  275.539616]  do_vfs_ioctl+0x1cf/0x16f0
[  275.543522]  ? ioctl_preallocate+0x2e0/0x2e0
[  275.547934]  ? fget_raw+0x20/0x20
[  275.551554]  ? __sb_end_write+0xac/0xe0
[  275.25]  ? __sanitizer_cov_trace_const_cmp1+0x1a/0x20
[  275.561064]  ? fput+0x130/0x1a0
[  275.564345]  ?

Re: [PATCH] tty: use printk_safe context at tty_msg()

2021-04-06 Thread Tetsuo Handa

On 2021/04/06 16:10, Greg Kroah-Hartman wrote:
> On Tue, Apr 06, 2021 at 02:31:43PM +0900, Tetsuo Handa wrote:
>> On 2021/04/06 13:51, Jiri Slaby wrote:
>>> On 03. 04. 21, 6:14, Tetsuo Handa wrote:
>>>> --- a/include/linux/tty.h
>>>> +++ b/include/linux/tty.h
>>>> @@ -14,6 +14,7 @@
>>>>   #include 
>>>>   #include 
>>>>   #include 
>>>> +#include <../../kernel/printk/internal.h>
>>>
>>> Including printk's internal header in linux/tty.h doesn't look correct to 
>>> me.
>>>
>>
>> This is because this patch wants __printk_safe_enter()/__printk_safe_exit()
>> without #ifdef'ing CONFIG_PRINTK.
> 
> Then those functions need to be "properly" exported, not placed only in
> an "internal.h" file that obviously should not be included from anywhere
> like this.
> 
>> Peter and Sergey, what should we do?
>> Can we move printk_safe_enter_irqsave()/printk_safe_exit_irqrestore() to 
>> include/linux/printk.h ?
> 
> Are you sure that is the only way to resolve this?

Publishing printk_safe_enter_irqsave() etc. was once proposed at
https://lkml.kernel.org/r/20181016050428.17966-3-sergey.senozhat...@gmail.com 
and
Peter Zijlstra did not like such change. But we reconfirmed that "tty_port lock 
must
switch to printk_safe" at 
https://lkml.kernel.org/r/20190219013254.GA20023@jagdpanzerIV .

Two years has elapsed since a completely new printk implementation which would 
not
use printk_safe context was proposed, but printk_safe context is still 
remaining.
Therefore, we might need to tolerate, with a warning that 
printk_safe_enter_irqsave()
etc. are not intended for general use, use of printk_safe context for tty_msg() 
and
tty_buffer_alloc().

Re: [PATCH] tty: use printk_safe context at tty_msg()

2021-04-05 Thread Tetsuo Handa

On 2021/04/06 13:51, Jiri Slaby wrote:
> On 03. 04. 21, 6:14, Tetsuo Handa wrote:
>> --- a/include/linux/tty.h
>> +++ b/include/linux/tty.h
>> @@ -14,6 +14,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include <../../kernel/printk/internal.h>
> 
> Including printk's internal header in linux/tty.h doesn't look correct to me.
> 

This is because this patch wants __printk_safe_enter()/__printk_safe_exit()
without #ifdef'ing CONFIG_PRINTK.

Peter and Sergey, what should we do?
Can we move printk_safe_enter_irqsave()/printk_safe_exit_irqrestore() to 
include/linux/printk.h ?

Re: [PATCH v5] lockdep: Allow tuning tracing capacity constants.

2021-04-05 Thread Tetsuo Handa

Dmitry, I've just sent this patch to tomoyo-test1.git tree, for finding bugs
(which previously could not be found due to lack of this patch) in 
linux-next.git
will be helpful anyway.

Since this patch should appear in tomorrow's linux-next.git , please prepare
syzkaller for tuning appropriate values (default value + 1 should be OK).

On 2021/03/31 19:58, Tetsuo Handa wrote:
> Peter, are you there?
> 
> If you keep silence, we will assume that applying this patch is the way to go.
> 
> On 2021/03/20 16:34, Dmitry Vyukov wrote:
>> On Mon, Feb 8, 2021 at 11:29 AM Tetsuo Handa
>>  wrote:
>>>
>>> Since syzkaller continues various test cases until the kernel crashes,
>>> syzkaller tends to examine more locking dependencies than normal systems.
>>> As a result, syzbot is reporting that the fuzz testing was terminated
>>> due to hitting upper limits lockdep can track [1] [2] [3]. Since analysis
>>> via /proc/lockdep* did not show any obvious culprit [4] [5], we have no
>>> choice but allow tuning tracing capacity constants.
>>>
>>> [1] 
>>> https://syzkaller.appspot.com/bug?id=3d97ba93fb3566000c1c59691ea427370d33ea1b
>>> [2] 
>>> https://syzkaller.appspot.com/bug?id=381cb436fe60dc03d7fd2a092b46d7f09542a72a
>>> [3] 
>>> https://syzkaller.appspot.com/bug?id=a588183ac34c1437fc0785e8f220e88282e5a29f
>>> [4] 
>>> https://lkml.kernel.org/r/4b8f7a57-fa20-47bd-48a0-ae35d860f...@i-love.sakura.ne.jp
>>> [5] 
>>> https://lkml.kernel.org/r/1c351187-253b-2d49-acaf-4563c63ae...@i-love.sakura.ne.jp
>>>
>>> Reported-by: syzbot 
>>> Reported-by: syzbot 
>>> Reported-by: syzbot 
>>> References: 
>>> https://lkml.kernel.org/r/1595640639-9310-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp
>>> Signed-off-by: Tetsuo Handa 
>>> Acked-by: Dmitry Vyukov 
>>
>> Peter, ping.
>> Please at least provide some feedback.
>> This is still the top crasher on syzbot and harms testing of the whole
>> kernel. I am periodically thinking of disabling LOCKDEP on syzbot as
>> harming more than helping, but so far talking myself out of it because
>> it will likely be broken more when we try to re-enable it and I still
>> hope for a timely resolution of this issue.
>>
>

[PATCH] printk: Make multiple inclusion of kernel/printk/internal.h safe

2021-04-03 Thread Tetsuo Handa

kernel test robot reported that kernel/printk/internal.h is not ready
to be #include'd for multiple times, and that vprintk_func() for
CONFIG_PRINTK=n should be marked as "static inline".

Since "tty: use printk_safe context at tty_msg()" will make
kernel/printk/internal.h be #include'd for multiple times, let's fix
this problem first.

Reported-by: kernel test robot 
Signed-off-by: Tetsuo Handa 
Fixes: 099f1c84c0052ec1 ("printk: introduce per-cpu safe_print seq buffer")
Cc:  # 4.11+
---
 kernel/printk/internal.h | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 3a8fd491758c..2119c546cba2 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -2,6 +2,10 @@
 /*
  * internal.h - printk internal definitions
  */
+
+#ifndef _KERNEL_PRINTK_INTERNAL_H
+#define _KERNEL_PRINTK_INTERNAL_H
+
 #include 
 
 #ifdef CONFIG_PRINTK
@@ -56,7 +60,7 @@ void defer_console_output(void);
 
 #else
 
-__printf(1, 0) int vprintk_func(const char *fmt, va_list args) { return 0; }
+static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args) { 
return 0; }
 
 /*
  * In !PRINTK builds we still export logbuf_lock spin_lock, console_sem
@@ -72,3 +76,5 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list 
args) { return 0; }
 static inline void printk_safe_init(void) { }
 static inline bool printk_percpu_data_ready(void) { return false; }
 #endif /* CONFIG_PRINTK */
+
+#endif
-- 
2.18.4

[PATCH] tty: use printk_safe context at tty_msg()

2021-04-02 Thread Tetsuo Handa

syzbot is reporting circular locking dependency due to calling printk()
with port lock held [1]. When this problem was reported, we worried
whether printk_safe context will remain available in future kernels [2],
and then this problem was forgotten. But in order to utilize syzbot's
resource for finding other bugs/reproducers by closing this one of top
crashers, let's apply a patch which counts on availability of printk_safe
context.

syzbot is also reporting same dependency due to memory allocation fault
injection at tty_buffer_alloc(). Although __GFP_NOWARN cannot prevent
memory allocation fault injection from calling printk(), let's use
__GFP_NOWARN at tty_buffer_alloc() in addition to using printk_safe
context, for generating many lines of messages due to warn_alloc() is
annoying. If we want to report it, we can use pr_warn() instead.

[1] 
https://syzkaller.appspot.com/bug?id=39ea6caa479af471183997376dc7e90bc7d64a6a
[2] https://lkml.kernel.org/r/20190218054649.GA26686@jagdpanzerIV

Reported-by: syzbot 
Reported-by: syzbot 
Signed-off-by: Tetsuo Handa 
Fixes: b6da31b2c07c46f2 ("tty: Fix data race in 
tty_insert_flip_string_fixed_flag")
Cc:  # 4.18+
---
 drivers/tty/tty_buffer.c | 5 -
 include/linux/tty.h  | 9 -
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
index 6d4995a5f318..d59f7873bc49 100644
--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -156,6 +156,7 @@ static struct tty_buffer *tty_buffer_alloc(struct tty_port 
*port, size_t size)
 {
struct llist_node *free;
struct tty_buffer *p;
+   unsigned long flags;
 
/* Round the buffer size out */
size = __ALIGN_MASK(size, TTYB_ALIGN_MASK);
@@ -172,7 +173,9 @@ static struct tty_buffer *tty_buffer_alloc(struct tty_port 
*port, size_t size)
   have queued and recycle that ? */
if (atomic_read(>buf.mem_used) > port->buf.mem_limit)
return NULL;
-   p = kmalloc(sizeof(struct tty_buffer) + 2 * size, GFP_ATOMIC);
+   printk_safe_enter_irqsave(flags);
+   p = kmalloc(sizeof(struct tty_buffer) + 2 * size, GFP_ATOMIC | 
__GFP_NOWARN);
+   printk_safe_exit_irqrestore(flags);
if (p == NULL)
return NULL;
 
diff --git a/include/linux/tty.h b/include/linux/tty.h
index 95fc2f100f12..7ae8eb46fec3 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include <../../kernel/printk/internal.h>
 
 
 /*
@@ -773,7 +774,13 @@ static inline void proc_tty_unregister_driver(struct 
tty_driver *d) {}
 #endif
 
 #define tty_msg(fn, tty, f, ...) \
-   fn("%s %s: " f, tty_driver_name(tty), tty_name(tty), ##__VA_ARGS__)
+   do {\
+   unsigned long flags;\
+   \
+   printk_safe_enter_irqsave(flags);   \
+   fn("%s %s: " f, tty_driver_name(tty), tty_name(tty), 
##__VA_ARGS__); \
+   printk_safe_exit_irqrestore(flags); \
+   } while (0)
 
 #define tty_debug(tty, f, ...) tty_msg(pr_debug, tty, f, ##__VA_ARGS__)
 #define tty_info(tty, f, ...)  tty_msg(pr_info, tty, f, ##__VA_ARGS__)
-- 
2.18.4

[PATCH 2/2] misc: vmw_vmci: explicitly initialize vmci_datagram payload

2021-04-02 Thread Tetsuo Handa

KMSAN complains that vmci_check_host_caps() left the payload part of
check_msg uninitialized.

  =
  BUG: KMSAN: uninit-value in kmsan_check_memory+0xd/0x10
  CPU: 1 PID: 1 Comm: swapper/0 Tainted: GB 5.11.0-rc7+ #4
  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference 
Platform, BIOS 6.00 02/27/2020
  Call Trace:
   dump_stack+0x21c/0x280
   kmsan_report+0xfb/0x1e0
   kmsan_internal_check_memory+0x202/0x520
   kmsan_check_memory+0xd/0x10
   iowrite8_rep+0x86/0x380
   vmci_guest_probe_device+0xf0b/0x1e70
   pci_device_probe+0xab3/0xe70
   really_probe+0xd16/0x24d0
   driver_probe_device+0x29d/0x3a0
   device_driver_attach+0x25a/0x490
   __driver_attach+0x78c/0x840
   bus_for_each_dev+0x210/0x340
   driver_attach+0x89/0xb0
   bus_add_driver+0x677/0xc40
   driver_register+0x485/0x8e0
   __pci_register_driver+0x1ff/0x350
   vmci_guest_init+0x3e/0x41
   vmci_drv_init+0x1d6/0x43f
   do_one_initcall+0x39c/0x9a0
   do_initcall_level+0x1d7/0x259
   do_initcalls+0x127/0x1cb
   do_basic_setup+0x33/0x36
   kernel_init_freeable+0x29a/0x3ed
   kernel_init+0x1f/0x840
   ret_from_fork+0x1f/0x30

  Uninit was created at:
   kmsan_internal_poison_shadow+0x5c/0xf0
   kmsan_slab_alloc+0x8d/0xe0
   kmem_cache_alloc+0x84f/0xe30
   vmci_guest_probe_device+0xd11/0x1e70
   pci_device_probe+0xab3/0xe70
   really_probe+0xd16/0x24d0
   driver_probe_device+0x29d/0x3a0
   device_driver_attach+0x25a/0x490
   __driver_attach+0x78c/0x840
   bus_for_each_dev+0x210/0x340
   driver_attach+0x89/0xb0
   bus_add_driver+0x677/0xc40
   driver_register+0x485/0x8e0
   __pci_register_driver+0x1ff/0x350
   vmci_guest_init+0x3e/0x41
   vmci_drv_init+0x1d6/0x43f
   do_one_initcall+0x39c/0x9a0
   do_initcall_level+0x1d7/0x259
   do_initcalls+0x127/0x1cb
   do_basic_setup+0x33/0x36
   kernel_init_freeable+0x29a/0x3ed
   kernel_init+0x1f/0x840
   ret_from_fork+0x1f/0x30

  Bytes 28-31 of 36 are uninitialized
  Memory access of size 36 starts at 8881675e5f00
  =

Signed-off-by: Tetsuo Handa 
Fixes: 1f166439917b69d3 ("VMCI: guest side driver implementation.")
Cc: 
---
 drivers/misc/vmw_vmci/vmci_guest.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/vmw_vmci/vmci_guest.c 
b/drivers/misc/vmw_vmci/vmci_guest.c
index cc8eeb361fcd..1018dc77269d 100644
--- a/drivers/misc/vmw_vmci/vmci_guest.c
+++ b/drivers/misc/vmw_vmci/vmci_guest.c
@@ -168,7 +168,7 @@ static int vmci_check_host_caps(struct pci_dev *pdev)
VMCI_UTIL_NUM_RESOURCES * sizeof(u32);
struct vmci_datagram *check_msg;
 
-   check_msg = kmalloc(msg_size, GFP_KERNEL);
+   check_msg = kzalloc(msg_size, GFP_KERNEL);
if (!check_msg) {
dev_err(>dev, "%s: Insufficient memory\n", __func__);
return -ENOMEM;
-- 
2.18.4

[PATCH 1/2] misc: vmw_vmci: explicitly initialize vmci_notify_bm_set_msg struct

2021-04-02 Thread Tetsuo Handa

KMSAN complains that the vmci_use_ppn64() == false path in
vmci_dbell_register_notification_bitmap() left upper 32bits of
bitmap_set_msg.bitmap_ppn64 member uninitialized.

  =
  BUG: KMSAN: uninit-value in kmsan_check_memory+0xd/0x10
  CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc7+ #4
  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference 
Platform, BIOS 6.00 02/27/2020
  Call Trace:
   dump_stack+0x21c/0x280
   kmsan_report+0xfb/0x1e0
   kmsan_internal_check_memory+0x484/0x520
   kmsan_check_memory+0xd/0x10
   iowrite8_rep+0x86/0x380
   vmci_send_datagram+0x150/0x280
   vmci_dbell_register_notification_bitmap+0x133/0x1e0
   vmci_guest_probe_device+0xcab/0x1e70
   pci_device_probe+0xab3/0xe70
   really_probe+0xd16/0x24d0
   driver_probe_device+0x29d/0x3a0
   device_driver_attach+0x25a/0x490
   __driver_attach+0x78c/0x840
   bus_for_each_dev+0x210/0x340
   driver_attach+0x89/0xb0
   bus_add_driver+0x677/0xc40
   driver_register+0x485/0x8e0
   __pci_register_driver+0x1ff/0x350
   vmci_guest_init+0x3e/0x41
   vmci_drv_init+0x1d6/0x43f
   do_one_initcall+0x39c/0x9a0
   do_initcall_level+0x1d7/0x259
   do_initcalls+0x127/0x1cb
   do_basic_setup+0x33/0x36
   kernel_init_freeable+0x29a/0x3ed
   kernel_init+0x1f/0x840
   ret_from_fork+0x1f/0x30

  Local variable bitmap_set_msg@vmci_dbell_register_notification_bitmap 
created at:
   vmci_dbell_register_notification_bitmap+0x50/0x1e0
   vmci_dbell_register_notification_bitmap+0x50/0x1e0

  Bytes 28-31 of 32 are uninitialized
  Memory access of size 32 starts at 88810098f570
  =

Signed-off-by: Tetsuo Handa 
Fixes: 83e2ec765be03e8a ("VMCI: doorbell implementation.")
Cc: 
---
 drivers/misc/vmw_vmci/vmci_doorbell.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/vmw_vmci/vmci_doorbell.c 
b/drivers/misc/vmw_vmci/vmci_doorbell.c
index 345addd9306d..fa8a7fce4481 100644
--- a/drivers/misc/vmw_vmci/vmci_doorbell.c
+++ b/drivers/misc/vmw_vmci/vmci_doorbell.c
@@ -326,7 +326,7 @@ int vmci_dbell_host_context_notify(u32 src_cid, struct 
vmci_handle handle)
 bool vmci_dbell_register_notification_bitmap(u64 bitmap_ppn)
 {
int result;
-   struct vmci_notify_bm_set_msg bitmap_set_msg;
+   struct vmci_notify_bm_set_msg bitmap_set_msg = { };
 
bitmap_set_msg.hdr.dst = vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID,
  VMCI_SET_NOTIFY_BITMAP);
-- 
2.18.4

Re: [PATCH] misc: vmw_vmci: initialize payload passed to vmci_send_datagram()

2021-04-01 Thread Tetsuo Handa

On 2021/04/01 15:18, Greg Kroah-Hartman wrote:
> On Thu, Apr 01, 2021 at 02:57:47PM +0900, Tetsuo Handa wrote:
>> KMSAN complains that the vmci_use_ppn64() == false path in
>> vmci_dbell_register_notification_bitmap() left upper 32bits of
>> bitmap_set_msg.bitmap_ppn64 member uninitialized.
>>
>> KMSAN also complains that vmci_check_host_caps() left the payload part
>> of check_msg uninitialized.
>>
> 
> What commit does this "fix"?  Can you resend with a proper "Fixes:" tag
> so we know where it needs to be backported to?

It seems that this problem exists since the introduction.

  commit 83e2ec765be03e8a8a07619e65df70b48a1db023
  Author: George Zhang 
  Date:   Tue Jan 8 15:53:51 2013 -0800
  
  VMCI: doorbell implementation.
  
  commit 1f166439917b69d3046e2e49fe923579d9181212
  Author: George Zhang 
  Date:   Tue Jan 8 15:55:32 2013 -0800
  
  VMCI: guest side driver implementation.

But this patch should be safe to backport to as old as possible, for
this patch is merely explicitly initializing variables.

[PATCH] kernel: initialize cpumask before parsing

2021-04-01 Thread Tetsuo Handa

KMSAN complains that new_value at cpumask_parse_user() from
write_irq_affinity() from irq_affinity_proc_write() is uninitialized.

  [  148.133411][ T5509] =
  [  148.135383][ T5509] BUG: KMSAN: uninit-value in find_next_bit+0x325/0x340
  [  148.137819][ T5509]
  [  148.138448][ T5509] Local variable new_value.i@irq_affinity_proc_write 
created at:
  [  148.140768][ T5509]  irq_affinity_proc_write+0xc3/0x3d0
  [  148.142298][ T5509]  irq_affinity_proc_write+0xc3/0x3d0
  [  148.143823][ T5509] =

Since bitmap_parse() from cpumask_parse_user() calls find_next_bit(),
any alloc_cpumask_var() + cpumask_parse_user() sequence has possibility
that find_next_bit() accesses uninitialized cpu mask variable. Fix this
problem by replacing alloc_cpumask_var() with zalloc_cpumask_var().

Signed-off-by: Tetsuo Handa 
---
 kernel/irq/proc.c| 4 ++--
 kernel/profile.c | 2 +-
 kernel/trace/trace.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index 98138788cb04..7c5cd42df3b9 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -144,7 +144,7 @@ static ssize_t write_irq_affinity(int type, struct file 
*file,
if (!irq_can_set_affinity_usr(irq) || no_irq_affinity)
return -EIO;
 
-   if (!alloc_cpumask_var(_value, GFP_KERNEL))
+   if (!zalloc_cpumask_var(_value, GFP_KERNEL))
return -ENOMEM;
 
if (type)
@@ -238,7 +238,7 @@ static ssize_t default_affinity_write(struct file *file,
cpumask_var_t new_value;
int err;
 
-   if (!alloc_cpumask_var(_value, GFP_KERNEL))
+   if (!zalloc_cpumask_var(_value, GFP_KERNEL))
return -ENOMEM;
 
err = cpumask_parse_user(buffer, count, new_value);
diff --git a/kernel/profile.c b/kernel/profile.c
index 6f69a4195d56..c2ebddb5e974 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -430,7 +430,7 @@ static ssize_t prof_cpu_mask_proc_write(struct file *file,
cpumask_var_t new_value;
int err;
 
-   if (!alloc_cpumask_var(_value, GFP_KERNEL))
+   if (!zalloc_cpumask_var(_value, GFP_KERNEL))
return -ENOMEM;
 
err = cpumask_parse_user(buffer, count, new_value);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index eccb4e1187cc..7607dc8a20b2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4827,7 +4827,7 @@ tracing_cpumask_write(struct file *filp, const char 
__user *ubuf,
cpumask_var_t tracing_cpumask_new;
int err;
 
-   if (!alloc_cpumask_var(_cpumask_new, GFP_KERNEL))
+   if (!zalloc_cpumask_var(_cpumask_new, GFP_KERNEL))
return -ENOMEM;
 
err = cpumask_parse_user(ubuf, count, tracing_cpumask_new);
-- 
2.18.4

[PATCH] misc: vmw_vmci: initialize payload passed to vmci_send_datagram()

2021-03-31 Thread Tetsuo Handa

  [   21.549639][T1]  ? driver_attach+0xb0/0xb0
  [   21.549639][T1]  driver_attach+0x89/0xb0
  [   21.549639][T1]  bus_add_driver+0x677/0xc40
  [   21.549639][T1]  driver_register+0x485/0x8e0
  [   21.549639][T1]  __pci_register_driver+0x1ff/0x350
  [   21.549639][T1]  vmci_guest_init+0x3e/0x41
  [   21.549639][T1]  vmci_drv_init+0x1d6/0x43f
  [   21.549639][T1]  do_one_initcall+0x39c/0x9a0
  [   21.549639][T1]  ? null_init+0x11dc/0x11dc
  [   21.549639][T1]  ? kmsan_get_metadata+0x116/0x180
  [   21.549639][T1]  ? kmsan_get_shadow_origin_ptr+0x84/0xb0
  [   21.549639][T1]  ? null_init+0x11dc/0x11dc
  [   21.549639][T1]  do_initcall_level+0x1d7/0x259
  [   21.549639][T1]  do_initcalls+0x127/0x1cb
  [   21.549639][T1]  ? cpu_init_udelay+0xcf/0xcf
  [   21.549639][T1]  ? debug_boot_weak_hash_enable+0x61/0x61
  [   21.549639][T1]  do_basic_setup+0x33/0x36
  [   21.549639][T1]  kernel_init_freeable+0x29a/0x3ed
  [   21.549639][T1]  ? rest_init+0x1f0/0x1f0
  [   21.549639][T1]  kernel_init+0x1f/0x840
  [   21.549639][T1]  ? rest_init+0x1f0/0x1f0
  [   21.549639][T1]  ret_from_fork+0x1f/0x30
  [   21.549639][T1]
  [   21.549639][T1] Uninit was created at:
  [   21.549639][T1]  kmsan_internal_poison_shadow+0x5c/0xf0
  [   21.549639][T1]  kmsan_slab_alloc+0x8d/0xe0
  [   21.549639][T1]  kmem_cache_alloc+0x84f/0xe30
  [   21.549639][T1]  vmci_guest_probe_device+0xd11/0x1e70
  [   21.549639][T1]  pci_device_probe+0xab3/0xe70
  [   21.549639][T1]  really_probe+0xd16/0x24d0
  [   21.549639][T1]  driver_probe_device+0x29d/0x3a0
  [   21.549639][T1]  device_driver_attach+0x25a/0x490
  [   21.549639][T1]  __driver_attach+0x78c/0x840
  [   21.549639][T1]  bus_for_each_dev+0x210/0x340
  [   21.549639][T1]  driver_attach+0x89/0xb0
  [   21.549639][T1]  bus_add_driver+0x677/0xc40
  [   21.549639][T1]  driver_register+0x485/0x8e0
  [   21.549639][T1]  __pci_register_driver+0x1ff/0x350
  [   21.549639][T1]  vmci_guest_init+0x3e/0x41
  [   21.549639][T1]  vmci_drv_init+0x1d6/0x43f
  [   21.549639][T1]  do_one_initcall+0x39c/0x9a0
  [   21.549639][T1]  do_initcall_level+0x1d7/0x259
  [   21.549639][T1]  do_initcalls+0x127/0x1cb
  [   21.549639][T1]  do_basic_setup+0x33/0x36
  [   21.549639][T1]  kernel_init_freeable+0x29a/0x3ed
  [   21.549639][T1]  kernel_init+0x1f/0x840
  [   21.549639][T1]  ret_from_fork+0x1f/0x30
  [   21.549639][T1]
  [   21.549639][T1] Bytes 28-31 of 36 are uninitialized
  [   21.549639][T1] Memory access of size 36 starts at 8881675e5f00
  [   21.549639][T1] =
  [   21.639830][T1] Guest personality initialized and is active
  [   21.642165][T1] VMCI host device registered (name=vmci, major=10, 
minor=121)

Signed-off-by: Tetsuo Handa 
---
 drivers/misc/vmw_vmci/vmci_doorbell.c | 2 +-
 drivers/misc/vmw_vmci/vmci_guest.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/vmw_vmci/vmci_doorbell.c 
b/drivers/misc/vmw_vmci/vmci_doorbell.c
index 345addd9306d..fa8a7fce4481 100644
--- a/drivers/misc/vmw_vmci/vmci_doorbell.c
+++ b/drivers/misc/vmw_vmci/vmci_doorbell.c
@@ -326,7 +326,7 @@ int vmci_dbell_host_context_notify(u32 src_cid, struct 
vmci_handle handle)
 bool vmci_dbell_register_notification_bitmap(u64 bitmap_ppn)
 {
int result;
-   struct vmci_notify_bm_set_msg bitmap_set_msg;
+   struct vmci_notify_bm_set_msg bitmap_set_msg = { };
 
bitmap_set_msg.hdr.dst = vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID,
  VMCI_SET_NOTIFY_BITMAP);
diff --git a/drivers/misc/vmw_vmci/vmci_guest.c 
b/drivers/misc/vmw_vmci/vmci_guest.c
index cc8eeb361fcd..1018dc77269d 100644
--- a/drivers/misc/vmw_vmci/vmci_guest.c
+++ b/drivers/misc/vmw_vmci/vmci_guest.c
@@ -168,7 +168,7 @@ static int vmci_check_host_caps(struct pci_dev *pdev)
VMCI_UTIL_NUM_RESOURCES * sizeof(u32);
struct vmci_datagram *check_msg;
 
-   check_msg = kmalloc(msg_size, GFP_KERNEL);
+   check_msg = kzalloc(msg_size, GFP_KERNEL);
if (!check_msg) {
dev_err(>dev, "%s: Insufficient memory\n", __func__);
return -ENOMEM;
-- 
2.18.4

Re: [PATCH v5] lockdep: Allow tuning tracing capacity constants.

2021-03-31 Thread Tetsuo Handa

Peter, are you there?

If you keep silence, we will assume that applying this patch is the way to go.

On 2021/03/20 16:34, Dmitry Vyukov wrote:
> On Mon, Feb 8, 2021 at 11:29 AM Tetsuo Handa
>  wrote:
>>
>> Since syzkaller continues various test cases until the kernel crashes,
>> syzkaller tends to examine more locking dependencies than normal systems.
>> As a result, syzbot is reporting that the fuzz testing was terminated
>> due to hitting upper limits lockdep can track [1] [2] [3]. Since analysis
>> via /proc/lockdep* did not show any obvious culprit [4] [5], we have no
>> choice but allow tuning tracing capacity constants.
>>
>> [1] 
>> https://syzkaller.appspot.com/bug?id=3d97ba93fb3566000c1c59691ea427370d33ea1b
>> [2] 
>> https://syzkaller.appspot.com/bug?id=381cb436fe60dc03d7fd2a092b46d7f09542a72a
>> [3] 
>> https://syzkaller.appspot.com/bug?id=a588183ac34c1437fc0785e8f220e88282e5a29f
>> [4] 
>> https://lkml.kernel.org/r/4b8f7a57-fa20-47bd-48a0-ae35d860f...@i-love.sakura.ne.jp
>> [5] 
>> https://lkml.kernel.org/r/1c351187-253b-2d49-acaf-4563c63ae...@i-love.sakura.ne.jp
>>
>> Reported-by: syzbot 
>> Reported-by: syzbot 
>> Reported-by: syzbot 
>> References: 
>> https://lkml.kernel.org/r/1595640639-9310-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp
>> Signed-off-by: Tetsuo Handa 
>> Acked-by: Dmitry Vyukov 
> 
> Peter, ping.
> Please at least provide some feedback.
> This is still the top crasher on syzbot and harms testing of the whole
> kernel. I am periodically thinking of disabling LOCKDEP on syzbot as
> harming more than helping, but so far talking myself out of it because
> it will likely be broken more when we try to re-enable it and I still
> hope for a timely resolution of this issue.
>

Re: [RFC PATCH 2/2] integrity: double check iint_cache was initialized

2021-03-24 Thread Tetsuo Handa

On 2021/03/24 20:10, Mimi Zohar wrote:
> On Wed, 2021-03-24 at 19:10 +0900, Tetsuo Handa wrote:
>> On 2021/03/24 1:13, Mimi Zohar wrote:
>>> On Wed, 2021-03-24 at 00:14 +0900, Tetsuo Handa wrote:
>>>> On 2021/03/23 23:47, Mimi Zohar wrote:
>>>>> Initially I also questioned making "integrity" an LSM.  Perhaps it's
>>>>> time to reconsider.   For now, it makes sense to just fix the NULL
>>>>> pointer dereferencing.
>>>>
>>>> Do we think calling panic() as "fix the NULL pointer dereferencing" ?
>>>
>>> Not supplying "integrity" as an "lsm=" option is a user error.  There
>>> are only two options - allow or deny the caller to proceed.   If the
>>> user is expecting the integrity subsystem to be properly working,
>>> returning a NULL and allowing the system to boot (RFC patch version)
>>> does not make sense.   Better to fail early.
>>
>> What does the "user" mean? Those who load the vmlinux?
>> Only the "root" user (so called administrators)?
>> Any users including other than "root" user?
>>
>> If the user means those who load the vmlinux, that user is explicitly asking
>> for disabling "integrity" for some reason. In that case, it is a bug if
>> booting with "integrity" disabled is impossible.
>>
>> If the user means something other than those who load the vmlinux,
>> is there a possibility that that user (especially non "root" users) is
>> allowed to try to use "integrity" ? If processes other than global init
>> process can try to use "integrity", wouldn't it be a DoS attack vector?
>> Please explain in the descripotion why calling panic() does not cause
>> DoS attack vector.
> 
> User in this case, is anyone rebooting the system and is intentionally
> changing the default values, dropping the "integrity" option on the
> boot command line.

OK. Then, I expect that the system boots instead of calling panic().
That user is explicitly asking for disabling "integrity" for some reason.

Re: [RFC PATCH 2/2] integrity: double check iint_cache was initialized

2021-03-24 Thread Tetsuo Handa

On 2021/03/24 1:13, Mimi Zohar wrote:
> On Wed, 2021-03-24 at 00:14 +0900, Tetsuo Handa wrote:
>> On 2021/03/23 23:47, Mimi Zohar wrote:
>>> Initially I also questioned making "integrity" an LSM.  Perhaps it's
>>> time to reconsider.   For now, it makes sense to just fix the NULL
>>> pointer dereferencing.
>>
>> Do we think calling panic() as "fix the NULL pointer dereferencing" ?
> 
> Not supplying "integrity" as an "lsm=" option is a user error.  There
> are only two options - allow or deny the caller to proceed.   If the
> user is expecting the integrity subsystem to be properly working,
> returning a NULL and allowing the system to boot (RFC patch version)
> does not make sense.   Better to fail early.

What does the "user" mean? Those who load the vmlinux?
Only the "root" user (so called administrators)?
Any users including other than "root" user?

If the user means those who load the vmlinux, that user is explicitly asking
for disabling "integrity" for some reason. In that case, it is a bug if
booting with "integrity" disabled is impossible.

If the user means something other than those who load the vmlinux,
is there a possibility that that user (especially non "root" users) is
allowed to try to use "integrity" ? If processes other than global init
process can try to use "integrity", wouldn't it be a DoS attack vector?
Please explain in the descripotion why calling panic() does not cause
DoS attack vector.

Re: [RFC PATCH 2/2] integrity: double check iint_cache was initialized

2021-03-23 Thread Tetsuo Handa

On 2021/03/23 23:47, Mimi Zohar wrote:
> Initially I also questioned making "integrity" an LSM.  Perhaps it's
> time to reconsider.   For now, it makes sense to just fix the NULL
> pointer dereferencing.

Do we think calling panic() as "fix the NULL pointer dereferencing" ?

Re: [RFC PATCH 2/2] integrity: double check iint_cache was initialized

2021-03-23 Thread Tetsuo Handa

On 2021/03/23 22:37, Tetsuo Handa wrote:
> On 2021/03/23 21:09, Mimi Zohar wrote:
>> Please take a look at the newer version of this patch.   Do you want to
>> add any tags?
> 
> Oh, I didn't know that you already posted the newer version.
> 
>> diff --git a/security/integrity/iint.c b/security/integrity/iint.c
>> index 1d20003243c3..0ba01847e836 100644
>> --- a/security/integrity/iint.c
>> +++ b/security/integrity/iint.c
>> @@ -98,6 +98,14 @@ struct integrity_iint_cache *integrity_inode_get(struct 
>> inode *inode)
>>  struct rb_node *node, *parent = NULL;
>>  struct integrity_iint_cache *iint, *test_iint;
>>  
>> +/*
>> + * The integrity's "iint_cache" is initialized at security_init(),
>> + * unless it is not included in the ordered list of LSMs enabled
>> + * on the boot command line.
>> + */
>> +if (!iint_cache)
>> +panic("%s: lsm=integrity required.\n", __func__);
>> +
> 
> This looks strange. If "lsm=" parameter must include "integrity",
> it implies that nobody is allowed to disable "integrity" at boot.
> Then, why not unconditionally call integrity_iintcache_init() by
> not counting on DEFINE_LSM(integrity) declaration?

Or, I think below one is also possible.

diff --git a/security/integrity/iint.c b/security/integrity/iint.c
index 1d20003243c3..37afc5168891 100644
--- a/security/integrity/iint.c
+++ b/security/integrity/iint.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "integrity.h"
 
 static struct rb_root integrity_iint_tree = RB_ROOT;
@@ -85,6 +86,20 @@ static void iint_free(struct integrity_iint_cache *iint)
kmem_cache_free(iint_cache, iint);
 }
 
+static void init_once(void *foo)
+{
+   struct integrity_iint_cache *iint = foo;
+
+   memset(iint, 0, sizeof(*iint));
+   iint->ima_file_status = INTEGRITY_UNKNOWN;
+   iint->ima_mmap_status = INTEGRITY_UNKNOWN;
+   iint->ima_bprm_status = INTEGRITY_UNKNOWN;
+   iint->ima_read_status = INTEGRITY_UNKNOWN;
+   iint->ima_creds_status = INTEGRITY_UNKNOWN;
+   iint->evm_status = INTEGRITY_UNKNOWN;
+   mutex_init(>mutex);
+}
+
 /**
  * integrity_inode_get - find or allocate an iint associated with an inode
  * @inode: pointer to the inode
@@ -102,6 +117,18 @@ struct integrity_iint_cache *integrity_inode_get(struct 
inode *inode)
if (iint)
return iint;
 
+   if (!iint_cache) {
+   static DEFINE_MUTEX(lock);
+   unsigned int flags = memalloc_nofs_save();
+
+   mutex_lock();
+   if (!iint_cache)
+   iint_cache = kmem_cache_create("iint_cache",
+  sizeof(struct 
integrity_iint_cache),
+  0, SLAB_PANIC, 
init_once);
+   mutex_unlock();
+   memalloc_nofs_restore(flags);
+   }
iint = kmem_cache_alloc(iint_cache, GFP_NOFS);
if (!iint)
return NULL;
@@ -150,25 +177,8 @@ void integrity_inode_free(struct inode *inode)
iint_free(iint);
 }
 
-static void init_once(void *foo)
-{
-   struct integrity_iint_cache *iint = foo;
-
-   memset(iint, 0, sizeof(*iint));
-   iint->ima_file_status = INTEGRITY_UNKNOWN;
-   iint->ima_mmap_status = INTEGRITY_UNKNOWN;
-   iint->ima_bprm_status = INTEGRITY_UNKNOWN;
-   iint->ima_read_status = INTEGRITY_UNKNOWN;
-   iint->ima_creds_status = INTEGRITY_UNKNOWN;
-   iint->evm_status = INTEGRITY_UNKNOWN;
-   mutex_init(>mutex);
-}
-
 static int __init integrity_iintcache_init(void)
 {
-   iint_cache =
-   kmem_cache_create("iint_cache", sizeof(struct integrity_iint_cache),
- 0, SLAB_PANIC, init_once);
return 0;
 }
 DEFINE_LSM(integrity) = {

Re: [RFC PATCH 2/2] integrity: double check iint_cache was initialized

2021-03-23 Thread Tetsuo Handa

On 2021/03/23 21:09, Mimi Zohar wrote:
> Please take a look at the newer version of this patch.   Do you want to
> add any tags?

Oh, I didn't know that you already posted the newer version.

> diff --git a/security/integrity/iint.c b/security/integrity/iint.c
> index 1d20003243c3..0ba01847e836 100644
> --- a/security/integrity/iint.c
> +++ b/security/integrity/iint.c
> @@ -98,6 +98,14 @@ struct integrity_iint_cache *integrity_inode_get(struct 
> inode *inode)
>   struct rb_node *node, *parent = NULL;
>   struct integrity_iint_cache *iint, *test_iint;
>  
> + /*
> +  * The integrity's "iint_cache" is initialized at security_init(),
> +  * unless it is not included in the ordered list of LSMs enabled
> +  * on the boot command line.
> +  */
> + if (!iint_cache)
> + panic("%s: lsm=integrity required.\n", __func__);
> +

This looks strange. If "lsm=" parameter must include "integrity",
it implies that nobody is allowed to disable "integrity" at boot.
Then, why not unconditionally call integrity_iintcache_init() by
not counting on DEFINE_LSM(integrity) declaration?

>   iint = integrity_iint_find(inode);
>   if (iint)
>   return iint;
>

Re: [RFC PATCH 2/2] integrity: double check iint_cache was initialized

2021-03-22 Thread Tetsuo Handa

On 2021/03/20 5:03, Mimi Zohar wrote:
> The integrity's "iint_cache" is initialized at security_init().  Only
> after an IMA policy is loaded, which is initialized at late_initcall,
> is a file's integrity status stored in the "iint_cache".
> 
> All integrity_inode_get() callers first verify that the IMA policy has
> been loaded, before calling it.  Yet for some reason, it is still being
> called, causing a NULL pointer dereference.
> 
> qemu-system-x86_64 (...snipped...) lsm=smack (...snipped...)

Hmm, why are you using lsm=smack instead of security=smack ?
Since use of lsm= overrides 
CONFIG_LSM="lockdown,yama,safesetid,integrity,tomoyo,smack,bpf" settings,
only smack is activated, which means that integrity_iintcache_init() will not 
be called by

  DEFINE_LSM(integrity) = {
.name = "integrity",
.init = integrity_iintcache_init,
  };

declaration. That's the reason iint_cache == NULL when integrity_inode_get() is 
called.

Re: [RFC PATCH 2/2] integrity: double check iint_cache was initialized

2021-03-22 Thread Tetsuo Handa

On 2021/03/20 5:03, Mimi Zohar wrote:
> The integrity's "iint_cache" is initialized at security_init().  Only
> after an IMA policy is loaded, which is initialized at late_initcall,
> is a file's integrity status stored in the "iint_cache".
> 
> All integrity_inode_get() callers first verify that the IMA policy has
> been loaded, before calling it.  Yet for some reason, it is still being
> called, causing a NULL pointer dereference.
> 
> As reported by Dmitry Vyukov:
> in qemu:
> qemu-system-x86_64   -enable-kvm -machine q35,nvdimm -cpu
> max,migratable=off -smp 4   -m 4G,slots=4,maxmem=16G-hda
> wheezy.img  -kernel arch/x86/boot/bzImage   -nographic -vga std
>  -soundhw all -usb -usbdevice tablet  -bt hci -bt device:keyboard
>-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net
> nic,model=virtio-net-pci   -object
> memory-backend-file,id=pmem1,share=off,mem-path=/dev/zero,size=64M
>   -device nvdimm,id=nvdimm1,memdev=pmem1  -append "console=ttyS0
> root=/dev/sda earlyprintk=serial rodata=n oops=panic panic_on_warn=1
> panic=86400 lsm=smack numa=fake=2 nopcid dummy_hcd.num=8"   -pidfile
> vm_pid -m 2G -cpu host
> 

I tried similar command line (without "-enable-kvm" and without "-cpu host"
as I'm running from VMware, without "-soundhw all", without "-machine 
q35,nvdimm"
and "-device nvdimm,id=nvdimm1,memdev=pmem1" etc.) on 5.12-rc4. While I was 
finally
able to hit similar crash when I used "-smp 1" instead of "-smp 4", I suspect
this is not a integrity module's problem but a memory initialization/corruption
problem, for I got various different crashes (INT3) at memory allocation when
I was trimming command line options trying to reproduce the same crash.

Dmitry, do you get different crashes by changing command line arguments?

[tip: perf/urgent] lockdep: Add a missing initialization hint to the "INFO: Trying to register non-static key" message

2021-03-21 Thread tip-bot2 for Tetsuo Handa

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID: 3a85969e9d912d5dd85362ee37b5f81266e00e77
Gitweb:
https://git.kernel.org/tip/3a85969e9d912d5dd85362ee37b5f81266e00e77
Author:Tetsuo Handa 
AuthorDate:Sun, 21 Mar 2021 15:49:13 +09:00
Committer: Ingo Molnar 
CommitterDate: Sun, 21 Mar 2021 11:59:57 +01:00

lockdep: Add a missing initialization hint to the "INFO: Trying to register 
non-static key" message

Since this message is printed when dynamically allocated spinlocks (e.g.
kzalloc()) are used without initialization (e.g. spin_lock_init()),
suggest to developers to check whether initialization functions for objects
were called, before making developers wonder what annotation is missing.

[ mingo: Minor tweaks to the message. ]

Signed-off-by: Tetsuo Handa 
Signed-off-by: Ingo Molnar 
Link: 
https://lore.kernel.org/r/20210321064913.4619-1-penguin-ker...@i-love.sakura.ne.jp
Signed-off-by: Ingo Molnar 
---
 kernel/locking/lockdep.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index c6d0c1d..c30eb88 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -930,7 +930,8 @@ static bool assign_lock_key(struct lockdep_map *lock)
/* Debug-check: all keys must be persistent! */
debug_locks_off();
pr_err("INFO: trying to register non-static key.\n");
-   pr_err("the code is fine but needs lockdep annotation.\n");
+   pr_err("The code is fine but needs lockdep annotation, or 
maybe\n");
+   pr_err("you didn't initialize this object before use?\n");
pr_err("turning off the locking correctness validator.\n");
dump_stack();
return false;

[PATCH] lockdep: add a hint for "INFO: trying to register non-static key." message

2021-03-21 Thread Tetsuo Handa

Since this message is printed when dynamically allocated spinlocks (e.g.
kzalloc()) are used without initialization (e.g. spin_lock_init()),
suggest developers to check whether initialization functions for objects
are called, before making developers wonder what annotation is missing.

Signed-off-by: Tetsuo Handa 
---
 kernel/locking/lockdep.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index c6d0c1dc6253..44c549f5c061 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -931,6 +931,7 @@ static bool assign_lock_key(struct lockdep_map *lock)
debug_locks_off();
pr_err("INFO: trying to register non-static key.\n");
pr_err("the code is fine but needs lockdep annotation.\n");
+   pr_err("maybe you didn't initialize this object before you 
use?\n");
pr_err("turning off the locking correctness validator.\n");
dump_stack();
return false;
-- 
2.18.4

Re: possible deadlock in start_this_handle (2)

2021-03-20 Thread Tetsuo Handa

Peter, would you answer to 6 questions listed below?

Below is a reproducer kernel module (prefixed with "my_" for distinction) for
https://syzkaller.appspot.com/bug?id=38c060d5757cbc13fdffd46e80557c645fbe79ba .

-- test.c --
#include 
#include 
#include <../fs/ext4/ext4.h>

static struct lockdep_map my__fs_reclaim_map = 
STATIC_LOCKDEP_MAP_INIT("my_fs_reclaim", __fs_reclaim_map);

static noinline void my_journal_init_common(journal_t *journal)
{
static struct lock_class_key my_jbd2_trans_commit_key;

lockdep_init_map(>j_trans_commit_map, "my_jbd2_handle", 
_jbd2_trans_commit_key, 0);
}

static noinline void my_init_once(void *foo)
{
struct ext4_inode_info *my_ei = (struct ext4_inode_info *) foo;

init_rwsem(_ei->xattr_sem);
}

static int __init lockdep_test_init(void)
{
journal_t *journal1 = kzalloc(sizeof(*journal1), GFP_KERNEL | 
__GFP_NOFAIL);
struct ext4_inode_info *ei1 = kzalloc(sizeof(*ei1), GFP_KERNEL | 
__GFP_NOFAIL);
struct ext4_inode_info *ei2 = kzalloc(sizeof(*ei2), GFP_KERNEL | 
__GFP_NOFAIL);

my_journal_init_common(journal1);
my_init_once(ei1);
my_init_once(ei2);

rwsem_acquire_read(>j_trans_commit_map, 0, 0, _THIS_IP_);
down_write(>xattr_sem);
up_write(>xattr_sem);
rwsem_release(>j_trans_commit_map, _THIS_IP_);

down_write(>xattr_sem);
lock_map_acquire(__fs_reclaim_map);
lock_map_release(__fs_reclaim_map);
up_write(>xattr_sem);

lock_map_acquire(__fs_reclaim_map);
rwsem_acquire_read(>j_trans_commit_map, 0, 0, _THIS_IP_);
rwsem_release(>j_trans_commit_map, _THIS_IP_);
lock_map_release(__fs_reclaim_map);

return -EINVAL;
}

module_init(lockdep_test_init);
MODULE_LICENSE("GPL");
-- test.c --

-- dmesg --
[   32.938906][ T2776] test: loading out-of-tree module taints kernel.
[   32.940200][ T2776]
[   32.940240][ T2776] ==
[   32.940306][ T2776] WARNING: possible circular locking dependency detected
[   32.940373][ T2776] 5.12.0-rc3+ #846 Tainted: G   O
[   32.940434][ T2776] --
[   32.940500][ T2776] insmod/2776 is trying to acquire lock:
[   32.940557][ T2776] 9f1d438d98e0 (my_jbd2_handle){.+.+}-{0:0}, at: 
lockdep_test_init+0x0/0x1000 [test]
[   32.940651][ T2776]
[   32.940651][ T2776] but task is already holding lock:
[   32.940719][ T2776] c0631020 (my_fs_reclaim){+.+.}-{0:0}, at: 
lockdep_test_init+0x0/0x1000 [test]
[   32.940808][ T2776]
[   32.940808][ T2776] which lock already depends on the new lock.
[   32.940808][ T2776]
[   32.940897][ T2776]
[   32.940897][ T2776] the existing dependency chain (in reverse order) is:
[   32.940976][ T2776]
[   32.940976][ T2776] -> #2 (my_fs_reclaim){+.+.}-{0:0}:
[   32.941053][ T2776]lock_acquire+0xb3/0x380
[   32.941112][ T2776]lockdep_test_init+0xe2/0x1000 [test]
[   32.941176][ T2776]do_one_initcall+0x58/0x2c0
[   32.941234][ T2776]do_init_module+0x5b/0x210
[   32.941291][ T2776]load_module+0x1884/0x19a0
[   32.941347][ T2776]__do_sys_finit_module+0xc1/0x120
[   32.941408][ T2776]__x64_sys_finit_module+0x15/0x20
[   32.941469][ T2776]do_syscall_64+0x31/0x40
[   32.941527][ T2776]entry_SYSCALL_64_after_hwframe+0x44/0xae
[   32.941594][ T2776]
[   32.941594][ T2776] -> #1 (_ei->xattr_sem){+.+.}-{3:3}:
[   32.941673][ T2776]lock_acquire+0xb3/0x380
[   32.941728][ T2776]down_write+0x52/0x620
[   32.941783][ T2776]lockdep_test_init+0xa2/0x1000 [test]
[   32.941846][ T2776]do_one_initcall+0x58/0x2c0
[   32.941904][ T2776]do_init_module+0x5b/0x210
[   32.941960][ T2776]load_module+0x1884/0x19a0
[   32.942016][ T2776]__do_sys_finit_module+0xc1/0x120
[   32.942077][ T2776]__x64_sys_finit_module+0x15/0x20
[   32.942137][ T2776]do_syscall_64+0x31/0x40
[   32.942193][ T2776]entry_SYSCALL_64_after_hwframe+0x44/0xae
[   32.942260][ T2776]
[   32.942260][ T2776] -> #0 (my_jbd2_handle){.+.+}-{0:0}:
[   32.942336][ T2776]check_prevs_add+0x16a/0x1040
[   32.942395][ T2776]__lock_acquire+0x118b/0x1580
[   32.942453][ T2776]lock_acquire+0xb3/0x380
[   32.942509][ T2776]lockdep_test_init+0x140/0x1000 [test]
[   32.942574][ T2776]do_one_initcall+0x58/0x2c0
[   32.942631][ T2776]do_init_module+0x5b/0x210
[   32.942687][ T2776]load_module+0x1884/0x19a0
[   32.942743][ T2776]__do_sys_finit_module+0xc1/0x120
[   32.942803][ T2776]__x64_sys_finit_module+0x15/0x20
[   32.942915][ T2776]do_syscall_64+0x31/0x40
[   32.942973][ T2776]entry_SYSCALL_64_after_hwframe+0x44/0xae
[   32.943041][ T2776]
[   32.943041][ T2776] other info that might help us debug this:
[   32.943041][ T2776]
[

Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle

2021-03-19 Thread Tetsuo Handa

On 2021/03/12 0:54, Marco Elver wrote:
>> But the more we could have the compiler automatically figure out
>> things without needing an explicit tag, it would seem to me that this
>> would be better, since manual tagging is going to be more error-prone.
> 
> What you're alluding to here would go much further than a data race
> detector ("data race" is still just defined by the memory model). The
> wish that there was a static analysis tool that would automatically
> understand the "concurrency semantics as intended by the developer" is
> something that'd be nice to have, but just doesn't seem realistic.
> Because how can a tool tell what the developer intended, without input
> from that developer?

Input from developers is very important for not only compilers and tools
but also allowing bug-explorers to understand what is happening.
ext4 currently has

  possible deadlock in start_this_handle (2)
  https://syzkaller.appspot.com/bug?id=38c060d5757cbc13fdffd46e80557c645fbe79ba

which even maintainers cannot understand what is happening.
How can bug-explorers know implicit logic which maintainers believe safe and 
correct?
It is possible that some oversight in implicit logic is the cause of
"possible deadlock in start_this_handle (2)".
Making implicit assumptions clear helps understanding.

Will "KCSAN: data-race in start_this_handle / start_this_handle" be addressed 
by marking?
syzbot is already waiting for
"KCSAN: data-race in jbd2_journal_dirty_metadata / jbd2_journal_dirty_metadata" 
at
https://syzkaller.appspot.com/bug?id=5eb10023f53097f003e72c6a7c1a6f14b7c22929 .

> 
> If there's worry marking accesses is error-prone, then that might be a
> signal that the concurrency design is too complex (or the developer
> hasn't considered all cases).
> 
> For that reason, we need to mark accesses to tell the compiler and
> tooling where to expect concurrency, so that 1) the compiler generates
> correct code, and 2) tooling such as KCSAN can double-check what the
> developer intended is actually what's happening.

and 3) bug-explorers can understand what the developers are assuming/missing.

Re: [PATCH 0/6] usbip fixes to crashes found by syzbot

2021-03-18 Thread Tetsuo Handa

On 2021/03/18 22:13, Shuah Khan wrote:
> Please don't review code that isn't sent upstream. This repo you are
> looking at is a private branch created just to verify fixes on syzbot.

But nobody was able to review this series when sent to ML (except you simply
ignored my questions), and this series was merged to upstream and stable kernels
despite there was an obvious assignment error which results in NULL pointer
dereference.

> 
> I understand the race window you are talking about. I have my way of
> working to resolve it.

Without an effort to share your understanding/idea and my understanding/idea,
nobody can test/review what you call a solution.

Re: [PATCH 0/6] usbip fixes to crashes found by syzbot

2021-03-17 Thread Tetsuo Handa

On 2021/03/18 0:06, Shuah Khan wrote:
> Yes. I haven't sent the patch for that reason. I am trying to test a
> solution. I haven't come up with a solution yet.
> 
> Holding event_lock isn't the right solution. I am not going to accept
> that. This is a window that gets triggered by syzbot injecting errors
> in a sequence. Fixing this should be done taking other moving parts of
> the driver into account.

What is event_lock ? I questioned you what the event_lock is at
https://lkml.kernel.org/r/3dab66dc-2981-bc88-a370-4b3178dfd...@i-love.sakura.ne.jp
 ,
but you haven't responded to that post.

Also, you need to send 
https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux.git/commit/?h=usbip_test=f345de0d2e51a20a2a1c30fc22fa1527670d2095
because Greg already sent this regression into upstream and stable kernels.

Re: [PATCH 0/6] usbip fixes to crashes found by syzbot

2021-03-17 Thread Tetsuo Handa

Shuah, this driver is getting more and more cryptic and buggy.
Please explain the strategy for serialization before you write patches.

> - Fix attach_store() to check usbip_event_happened() before
>   waking up threads.

No, this helps nothing.

> diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
> index c4b4256e5dad3..f0a770adebd97 100644
> --- a/drivers/usb/usbip/vhci_sysfs.c
> +++ b/drivers/usb/usbip/vhci_sysfs.c
> @@ -418,6 +418,15 @@ static ssize_t attach_store(struct device *dev, struct 
> device_attribute *attr,
>   spin_unlock_irqrestore(>lock, flags);
>   /* end the lock */
>  
> + if (usbip_event_happened(>ud)) {
> + /*
> +  * something went wrong - event handler shutting
> +  * the connection and doing reset - bail out
> +  */
> + dev_err(dev, "Event happended - handler is active\n");
> + return -EAGAIN;
> + }
> +

detach_store() can queue shutdown event as soon as reaching "/* end the lock 
*/" line
but attach_store() might be preempted immediately after verifying that
usbip_event_happened() was false (i.e. at this location) in order to wait for
shutdown event posted by detach_store() to be processed.

>   wake_up_process(vdev->ud.tcp_rx);
>   wake_up_process(vdev->ud.tcp_tx);
>

Re: [PATCH 0/6] usbip fixes to crashes found by syzbot

2021-03-14 Thread Tetsuo Handa

On 2021/03/13 9:48, Tetsuo Handa wrote:
> On 2021/03/12 14:44, Tetsuo Handa wrote:
>> And what you are missing in your [PATCH 4,5,6/6] is
>>
>>   diff --git a/drivers/usb/usbip/vhci_sysfs.c 
>> b/drivers/usb/usbip/vhci_sysfs.c
>>   index c4457026d5ad..3c64bd06ab53 100644
>>   --- a/drivers/usb/usbip/vhci_sysfs.c
>>   +++ b/drivers/usb/usbip/vhci_sysfs.c
>>   @@ -423,6 +423,7 @@ static ssize_t attach_store(struct device *dev, struct 
>> device_attribute *attr,
>>   /* end the lock */
>>   
>>   wake_up_process(vdev->ud.tcp_rx);
>>   +   schedule_timeout_uninterruptible(HZ); // Consider being preempted 
>> here.
>>   wake_up_process(vdev->ud.tcp_tx);
>>   
>>   rh_port_connect(vdev, speed);
>>
>> . wake_up_process(tcp_tx) can call vhci_shutdown_connection() before 
>> wake_up_process(tcp_tx) is called.
> 
> wake_up_process(tcp_rx) can call vhci_shutdown_connection() before 
> wake_up_process(tcp_tx) is called.
> 
>> Since vhci_shutdown_connection() destroys tcp_tx thread and releases tcp_tx 
>> memory via kthread_stop_put(tcp_tx),
>> wake_up_process(tcp_tx) will access already freed memory. Your patch 
>> converted "NULL pointer dereference caused by
>> failing to call kthread_stop_put(tcp_tx)" into "use after free caused by 
>> succeeding to call kthread_stop_put(tcp_tx)".
>>
> 
> And note that
> 
>   diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
>   index c4457026d5ad..0e1a81d4632c 100644
>   --- a/drivers/usb/usbip/vhci_sysfs.c
>   +++ b/drivers/usb/usbip/vhci_sysfs.c
>   @@ -422,11 +422,11 @@ static ssize_t attach_store(struct device *dev, 
> struct device_attribute *attr,
>   spin_unlock_irqrestore(>lock, flags);
>   /* end the lock */
>   
>   -   wake_up_process(vdev->ud.tcp_rx);
>   -   wake_up_process(vdev->ud.tcp_tx);
>   -
>   rh_port_connect(vdev, speed);
>   
>   +   wake_up_process(vdev->ud.tcp_tx);
>   +   wake_up_process(vdev->ud.tcp_rx);
>   +
>   return count;
>}
>static DEVICE_ATTR_WO(attach);
> 
> is as well not sufficient, for you are still missing

Well, since tx thread can as well call usbip_event_add(USBIP_EH_SHUTDOWN), 
reversing
the order of wake_up_process(tcp_tx) and wake_up_process(tcp_rx) will not help.

> 
>   diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
>   index c4457026d5ad..c958f89a9196 100644
>   --- a/drivers/usb/usbip/vhci_sysfs.c
>   +++ b/drivers/usb/usbip/vhci_sysfs.c
>   @@ -422,11 +422,13 @@ static ssize_t attach_store(struct device *dev, 
> struct device_attribute *attr,
>   spin_unlock_irqrestore(>lock, flags);
>   /* end the lock */
>   
>   -   wake_up_process(vdev->ud.tcp_rx);
>   -   wake_up_process(vdev->ud.tcp_tx);
>   +   schedule_timeout_uninterruptible(HZ); // Consider being preempted 
> here.
>   
>   rh_port_connect(vdev, speed);
>   
>   +   wake_up_process(vdev->ud.tcp_tx);
>   +   wake_up_process(vdev->ud.tcp_rx);
>   +
>   return count;
>}
>static DEVICE_ATTR_WO(attach);
> 
> because vhci_port_disconnect() from detach_store() can call 
> usbip_event_add(>ud, VDEV_EVENT_DOWN)
> (same use after free bug regarding tcp_tx and tcp_rx) as soon as all shared 
> states are set up and
> spinlocks are released.
> 
> What you had better consider first is how to protect 
> event_handler()/usbip_sockfd_store()/attach_store()/detach_store() functions
>  from concurrent calls. Please respond to 
> https://lkml.kernel.org/r/3dab66dc-2981-bc88-a370-4b3178dfd...@i-love.sakura.ne.jp
> before you try to make further changes.
> 

After all, I believe that there is no choice but introduce a mutex for 
serialization.

Greg, please pick up 
https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux.git/commit/?h=usbip_test=f345de0d2e51a20a2a1c30fc22fa1527670d2095
and below patch.

>From e0579aa776e4a3568c06f767c193d2204b64679d Mon Sep 17 00:00:00 2001
From: Tetsuo Handa 
Date: Sun, 14 Mar 2021 20:24:16 +0900
Subject: [PATCH v5] usb: usbip: serialize attach/detach operations

The root problem syzbot has found [1] is that usbip module is not using
serialization between attach/detach operations and event_handler().
This results in the following race windows.

  (1) two userspace processes can perform attach operation on the same
  device by writing the same content to the same attach interface
  file

  (2) one userspace process can perform detach operation on a device by
  writing to detach interface file while the other use

Re: [PATCH 0/6] usbip fixes to crashes found by syzbot

2021-03-12 Thread Tetsuo Handa

On 2021/03/12 14:44, Tetsuo Handa wrote:
> And what you are missing in your [PATCH 4,5,6/6] is
> 
>   diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
>   index c4457026d5ad..3c64bd06ab53 100644
>   --- a/drivers/usb/usbip/vhci_sysfs.c
>   +++ b/drivers/usb/usbip/vhci_sysfs.c
>   @@ -423,6 +423,7 @@ static ssize_t attach_store(struct device *dev, struct 
> device_attribute *attr,
>   /* end the lock */
>   
>   wake_up_process(vdev->ud.tcp_rx);
>   +   schedule_timeout_uninterruptible(HZ); // Consider being preempted 
> here.
>   wake_up_process(vdev->ud.tcp_tx);
>   
>   rh_port_connect(vdev, speed);
> 
> . wake_up_process(tcp_tx) can call vhci_shutdown_connection() before 
> wake_up_process(tcp_tx) is called.

wake_up_process(tcp_rx) can call vhci_shutdown_connection() before 
wake_up_process(tcp_tx) is called.

> Since vhci_shutdown_connection() destroys tcp_tx thread and releases tcp_tx 
> memory via kthread_stop_put(tcp_tx),
> wake_up_process(tcp_tx) will access already freed memory. Your patch 
> converted "NULL pointer dereference caused by
> failing to call kthread_stop_put(tcp_tx)" into "use after free caused by 
> succeeding to call kthread_stop_put(tcp_tx)".
> 

And note that

  diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
  index c4457026d5ad..0e1a81d4632c 100644
  --- a/drivers/usb/usbip/vhci_sysfs.c
  +++ b/drivers/usb/usbip/vhci_sysfs.c
  @@ -422,11 +422,11 @@ static ssize_t attach_store(struct device *dev, struct 
device_attribute *attr,
  spin_unlock_irqrestore(>lock, flags);
  /* end the lock */
  
  -   wake_up_process(vdev->ud.tcp_rx);
  -   wake_up_process(vdev->ud.tcp_tx);
  -
  rh_port_connect(vdev, speed);
  
  +   wake_up_process(vdev->ud.tcp_tx);
  +   wake_up_process(vdev->ud.tcp_rx);
  +
  return count;
   }
   static DEVICE_ATTR_WO(attach);

is as well not sufficient, for you are still missing

  diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
  index c4457026d5ad..c958f89a9196 100644
  --- a/drivers/usb/usbip/vhci_sysfs.c
  +++ b/drivers/usb/usbip/vhci_sysfs.c
  @@ -422,11 +422,13 @@ static ssize_t attach_store(struct device *dev, struct 
device_attribute *attr,
  spin_unlock_irqrestore(>lock, flags);
  /* end the lock */
  
  -   wake_up_process(vdev->ud.tcp_rx);
  -   wake_up_process(vdev->ud.tcp_tx);
  +   schedule_timeout_uninterruptible(HZ); // Consider being preempted 
here.
  
  rh_port_connect(vdev, speed);
  
  +   wake_up_process(vdev->ud.tcp_tx);
  +   wake_up_process(vdev->ud.tcp_rx);
  +
  return count;
   }
   static DEVICE_ATTR_WO(attach);

because vhci_port_disconnect() from detach_store() can call 
usbip_event_add(>ud, VDEV_EVENT_DOWN)
(same use after free bug regarding tcp_tx and tcp_rx) as soon as all shared 
states are set up and
spinlocks are released.

What you had better consider first is how to protect 
event_handler()/usbip_sockfd_store()/attach_store()/detach_store() functions
 from concurrent calls. Please respond to 
https://lkml.kernel.org/r/3dab66dc-2981-bc88-a370-4b3178dfd...@i-love.sakura.ne.jp
before you try to make further changes.

Re: [PATCH 0/6] usbip fixes to crashes found by syzbot

2021-03-11 Thread Tetsuo Handa

I cloned git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux.git as you 
are testing changes there.

> commit 09e4522c87ff54c655c09f318a68b012eda8eb01 (HEAD -> usbip_test, 
> origin/usbip_test)
> Author: Shuah Khan 
> Date:   Thu Mar 11 11:18:25 2021 -0700
>
>usbip: fix vhci races in connection tear down
>
>- Change vhci_device_reset() to reset tcp_socket and sockfd.
>  if !in_disconnect

How it can happen? vhci_device_reset() can be called only after 
vhci_shutdown_connection()
completed, and vhci_shutdown_connection() from subsequent requests cannot be 
called until
vhci_device_reset() completes. I consider it as a dead code which should be 
removed by
my "[PATCH v4 05/12] usb: usbip: don't reset tcp_socket at vhci_device_reset()".

And what you are missing in your [PATCH 4,5,6/6] is

  diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
  index c4457026d5ad..3c64bd06ab53 100644
  --- a/drivers/usb/usbip/vhci_sysfs.c
  +++ b/drivers/usb/usbip/vhci_sysfs.c
  @@ -423,6 +423,7 @@ static ssize_t attach_store(struct device *dev, struct 
device_attribute *attr,
  /* end the lock */
  
  wake_up_process(vdev->ud.tcp_rx);
  +   schedule_timeout_uninterruptible(HZ); // Consider being preempted 
here.
  wake_up_process(vdev->ud.tcp_tx);
  
  rh_port_connect(vdev, speed);

. wake_up_process(tcp_tx) can call vhci_shutdown_connection() before 
wake_up_process(tcp_tx) is called.
Since vhci_shutdown_connection() destroys tcp_tx thread and releases tcp_tx 
memory via kthread_stop_put(tcp_tx),
wake_up_process(tcp_tx) will access already freed memory. Your patch converted 
"NULL pointer dereference caused by
failing to call kthread_stop_put(tcp_tx)" into "use after free caused by 
succeeding to call kthread_stop_put(tcp_tx)".

Re: [PATCH 0/6] usbip fixes to crashes found by syzbot

2021-03-11 Thread Tetsuo Handa

On 2021/03/11 21:57, Greg KH wrote:
> On Thu, Mar 11, 2021 at 09:34:38PM +0900, Tetsuo Handa wrote:
>> On 2021/03/11 3:33, Greg KH wrote:
>>> On Sun, Mar 07, 2021 at 08:53:25PM -0700, Shuah Khan wrote:
>>>> This patch series fixes the following problems founds in syzbot
>>>> fuzzing.
>>>
>>> Thanks for these, all now queued up.
>>
>> I send SIGSTOP to
>>
>>   [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
>>   [PATCH 5/6] usbip: fix vhci_hcd attach_store() races leading to gpf
>>   [PATCH 6/6] usbip: fix vudc usbip_sockfd_store races leading to gpf
>>
>> because these patches merely converted NULL pointer dererefence bug to 
>> use-after-free bug
>> by breaking kthread_get_run() into 
>> kthread_create()/get_task_struct()/wake_up_process().
> 
> I'll take follow-on patches to fix that other issue, if it's proven to
> be valid.  It's nice to fix up NULL dereference issues as soon as
> possible :)

Not an "other issue". Shuah's [PATCH 4,5,6/6] is failing to fix NULL pointer 
dereference issue.
These patches simply replaces NULL pointer dereference issue (caused by 
preemption) with
use after free issue (caused by exactly same preemption) issue. Shuah has to 
understand
the consequence of calling wake_up_process() on rx thread in order to fix this 
NULL pointer
dereference issue.

The only fix we can safely apply now is
https://lkml.kernel.org/r/20210205135707.4574-1-penguin-ker...@i-love.sakura.ne.jp
 .
Since I and Shuah agreed that we will remove kthread_get_run(), it is nice to 
fix up
frequently happening -EINTR pointer dereference issue as soon as possible.

Re: [PATCH 0/6] usbip fixes to crashes found by syzbot

2021-03-11 Thread Tetsuo Handa

On 2021/03/11 3:33, Greg KH wrote:
> On Sun, Mar 07, 2021 at 08:53:25PM -0700, Shuah Khan wrote:
>> This patch series fixes the following problems founds in syzbot
>> fuzzing.
> 
> Thanks for these, all now queued up.

I send SIGSTOP to

  [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
  [PATCH 5/6] usbip: fix vhci_hcd attach_store() races leading to gpf
  [PATCH 6/6] usbip: fix vudc usbip_sockfd_store races leading to gpf

because these patches merely converted NULL pointer dererefence bug to 
use-after-free bug
by breaking kthread_get_run() into 
kthread_create()/get_task_struct()/wake_up_process().

Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf

2021-03-10 Thread Tetsuo Handa

On 2021/03/10 11:07, Shuah Khan wrote:
> On 3/9/21 6:02 PM, Tetsuo Handa wrote:
>> On 2021/03/10 9:29, Shuah Khan wrote:
>>>> It is not a large grain lock. Since event_handler() is exclusively 
>>>> executed, this lock
>>>> does _NOT_ block event_handler() unless attach/detach operations run 
>>>> concurrently.
>>>>
>>>
>>> event handler queues the events. It shouldn't be blocked by attach
>>> and detach. The events could originate for various reasons during
>>> the host and vhci operations. I don't like using this lock for
>>> attach and detach.
>>
>> How can attach/detach deadlock event_handler()?
>> event_handler() calls e.g. vhci_shutdown_connection() via 
>> ud->eh_ops.shutdown(ud).
>> vhci_shutdown_connection() e.g. waits for termination of tx/rx threads via 
>> kthread_stop_put().
>> event_handler() is already blocked by detach operation.
>> How it can make situation worse to wait for creation of tx/rx threads in 
>> attach operation?
>>
> 
> event_lock shouldn't be held during event ops. usbip_event_add()
> uses it to add events. Protecting shutdown path needs a different
> approach.

I can't understand what you are talking about.

event_lock is defined as

  static DEFINE_SPINLOCK(event_lock);

in drivers/usb/usbip/usbip_event.c and usbip_event_add() uses it like

  spin_lock_irqsave(_lock, flags);
  spin_unlock_irqrestore(_lock, flags);

, but so what? I know event_lock spinlock cannot be taken when calling

  ud->eh_ops.shutdown(ud);
  ud->eh_ops.reset(ud);
  ud->eh_ops.unusable(ud);

in event_handler() because e.g. vhci_shutdown_connection() can sleep.

What my [PATCH v4 01/12] is suggesting is usbip_event_mutex which is defined as

  static DEFINE_MUTEX(usbip_event_mutex);

in drivers/usb/usbip/usbip_event.c and held when calling

  ud->eh_ops.shutdown(ud);
  ud->eh_ops.reset(ud);
  ud->eh_ops.unusable(ud);

in event_handler(). Since event_handler() is executed exclusively via
singlethreaded workqueue, "event_handler() holds usbip_event_mutex" itself
does not introduce a new lock dependency.

My question is, how holding usbip_event_mutex can cause deadlock if
usbip_sockfd_store()/attach_store()/detach_store() also hold usbip_event_mutex .

kthread_create() is essentially several kmalloc(GFP_KERNEL) where 
event_handler()
cannot interfere unless event_handler() serves as a memory reclaiming function.
My question again. What functions might sleep and hold locks other than
kthread_create() for tx/rx threads?

Your answer to my question (if you identified such dependency) will go into
patch shown bottom which replaces my [PATCH v4 01/12]-[PATCH v4 04/12] patches.

> 
> In any case, do you have comments on this patch which doesn't even
> touch vhci driver?

I'm just replying to your [PATCH 4/6] because all your [PATCH 4/6]-[PATCH 6/6]
patches have the same oversight.

> 
> I understand you are identifying additional race condition that
> the vhci patches in this series might not fix. That doesn't mean
> that these patches aren't valid.
> 
> Do you have any comments specific to the patches in this series?

Your [PATCH 5/6] is still racy regarding rh_port_connect() versus 
rh_port_disconnect().
As soon as you call wake_up_process(), rh_port_disconnect() can be called before
rh_port_connect() is called. Since you don't like serializing event_handler() 
and
usbip_sockfd_store()/attach_store()/detach_store() using usbip_event_mutex, my
patch shown below which uses attach_detach_lock mutex cannot close such race 
window.
Ideally, wake_up_process() should be deferred to after releasing 
attach_detach_lock,
but please answer to my question first.

From: Tetsuo Handa 
Date: Wed, 10 Mar 2021 18:31:54 +0900

syzbot is reporting a NULL pointer dereference at sock_sendmsg() [1], for
lack of serialization between attach_store() and event_handler() causes
vhci_shutdown_connection() to observe vdev->ud.tcp_tx == NULL while
vdev->ud.tcp_socket != NULL. Please read the reference link for details of
this race window.

While usbip module exclusively runs event_handler(), usbip module has never
thought the possibility that multiple userspace processes can concurrently
call usbip_sockfd_store()/attach_store()/detach_store(). As a result, there
is a TOCTTOU bug in these functions regarding ud->status value which can
result in similar crashes like [1].

Simplest way would be to run all of
event_handler()/usbip_sockfd_store()/attach_store()/detach_store() functions
exclusively using a global mutex. But Shuah Khan does not want to share a
global mutex between these functions, for ...[please fill in this part]... .

Therefore, this patch introduces a per-submodule local mutex which closes
race window within usbip_sockfd_store() and attach_store()/detach_store().

This loc

Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf

2021-03-09 Thread Tetsuo Handa

On 2021/03/10 9:29, Shuah Khan wrote:
>> It is not a large grain lock. Since event_handler() is exclusively executed, 
>> this lock
>> does _NOT_ block event_handler() unless attach/detach operations run 
>> concurrently.
>>
>>>
> 
> event handler queues the events. It shouldn't be blocked by attach
> and detach. The events could originate for various reasons during
> the host and vhci operations. I don't like using this lock for
> attach and detach.

How can attach/detach deadlock event_handler()?
event_handler() calls e.g. vhci_shutdown_connection() via 
ud->eh_ops.shutdown(ud).
vhci_shutdown_connection() e.g. waits for termination of tx/rx threads via 
kthread_stop_put().
event_handler() is already blocked by detach operation.
How it can make situation worse to wait for creation of tx/rx threads in attach 
operation?

Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf

2021-03-09 Thread Tetsuo Handa

On 2021/03/10 8:52, Shuah Khan wrote:
> On 3/9/21 4:40 PM, Tetsuo Handa wrote:
>> On 2021/03/10 4:50, Shuah Khan wrote:
>>> On 3/9/21 4:04 AM, Tetsuo Handa wrote:
>>>> On 2021/03/09 1:27, Shuah Khan wrote:
>>>>> Yes. We might need synchronization between events, threads, and shutdown
>>>>> in usbip_host side and in connection polling and threads in vhci.
>>>>>
>>>>> I am also looking at the shutdown sequences closely as well since the
>>>>> local state is referenced without usbip_device lock in these paths.
>>>>>
>>>>> I am approaching these problems as peeling the onion an expression so
>>>>> we can limit the changes and take a spot fix approach. We have the
>>>>> goal to address these crashes and not introduce regressions.
>>>>
>>>> I think my [PATCH v4 01/12]-[PATCH v4 06/12] simplify your further changes
>>>> without introducing regressions. While ud->lock is held when checking 
>>>> ud->status,
>>>> current attach/detach code is racy about read/update of ud->status . I 
>>>> think we
>>>> can close race in attach/detach code via a simple usbip_event_mutex 
>>>> serialization.
>>>>
>>>
>>> Do you mean patches 1,2,3,3,4,5,6?
>>
>> Yes, my 1,2,3,4,5,6.
>>
>> Since you think that usbip_prepare_threads() does not worth introducing, I'm 
>> fine with
>> replacing my 7,8,9,10,11,12 with your "[PATCH 0/6] usbip fixes to crashes 
>> found by syzbot".
>>
> 
> Using event lock isn't the right approach to solve the race. It is a
> large grain lock. I am not looking to replace patches.

It is not a large grain lock. Since event_handler() is exclusively executed, 
this lock
does _NOT_ block event_handler() unless attach/detach operations run 
concurrently.

> 
> I still haven't seen any response from you about if you were able to
> verify the fixes I sent in fix the problem you are seeing.

I won't be able to verify your fixes, for it is syzbot who is seeing the 
problem.
But I can see that your patch description is wrong because you are ignoring 
what I'm commenting.

Global serialization had better come first. Your patch description depends on 
global serialization.

Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf

2021-03-09 Thread Tetsuo Handa

On 2021/03/10 4:50, Shuah Khan wrote:
> On 3/9/21 4:04 AM, Tetsuo Handa wrote:
>> On 2021/03/09 1:27, Shuah Khan wrote:
>>> Yes. We might need synchronization between events, threads, and shutdown
>>> in usbip_host side and in connection polling and threads in vhci.
>>>
>>> I am also looking at the shutdown sequences closely as well since the
>>> local state is referenced without usbip_device lock in these paths.
>>>
>>> I am approaching these problems as peeling the onion an expression so
>>> we can limit the changes and take a spot fix approach. We have the
>>> goal to address these crashes and not introduce regressions.
>>
>> I think my [PATCH v4 01/12]-[PATCH v4 06/12] simplify your further changes
>> without introducing regressions. While ud->lock is held when checking 
>> ud->status,
>> current attach/detach code is racy about read/update of ud->status . I think 
>> we
>> can close race in attach/detach code via a simple usbip_event_mutex 
>> serialization.
>>
> 
> Do you mean patches 1,2,3,3,4,5,6?

Yes, my 1,2,3,4,5,6.

Since you think that usbip_prepare_threads() does not worth introducing, I'm 
fine with
replacing my 7,8,9,10,11,12 with your "[PATCH 0/6] usbip fixes to crashes found 
by syzbot".

Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf

2021-03-09 Thread Tetsuo Handa

On 2021/03/09 20:04, Tetsuo Handa wrote:
> On 2021/03/09 1:27, Shuah Khan wrote:
>> Yes. We might need synchronization between events, threads, and shutdown
>> in usbip_host side and in connection polling and threads in vhci.
>>
>> I am also looking at the shutdown sequences closely as well since the
>> local state is referenced without usbip_device lock in these paths.
>>
>> I am approaching these problems as peeling the onion an expression so
>> we can limit the changes and take a spot fix approach. We have the
>> goal to address these crashes and not introduce regressions.
> 
> I think my [PATCH v4 01/12]-[PATCH v4 06/12] simplify your further changes
> without introducing regressions. While ud->lock is held when checking 
> ud->status,
> current attach/detach code is racy about read/update of ud->status . I think 
> we
> can close race in attach/detach code via a simple usbip_event_mutex 
> serialization.
> 

Commit 04679b3489e048cd ("Staging: USB/IP: add client driver") says

  /*
   * The important thing is that only one context begins cleanup.
   * This is why error handling and cleanup become simple.
   * We do not want to consider race condition as possible.
   */
  static void vhci_shutdown_connection(struct usbip_device *ud)

but why are we allowing multiple contexts to begin startup?
That's where a subtle race window syzbot is reporting happened.
This driver has never thought the possibility that multiple userspace
processes can concurrently begin startup. Then, it is quite natural that
we make startup simple and safe, by enforcing that only one context
begins startup.

Without serialization between startup/cleanup/event_handler(),
"Fix the above problems:" in your changelog cannot become true.
First step should be closing time-of-check to time-of-use bug
via entire serialization as if "nonpreemptible UP kernel".

Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf

2021-03-09 Thread Tetsuo Handa

On 2021/03/09 1:27, Shuah Khan wrote:
> Yes. We might need synchronization between events, threads, and shutdown
> in usbip_host side and in connection polling and threads in vhci.
> 
> I am also looking at the shutdown sequences closely as well since the
> local state is referenced without usbip_device lock in these paths.
> 
> I am approaching these problems as peeling the onion an expression so
> we can limit the changes and take a spot fix approach. We have the
> goal to address these crashes and not introduce regressions.

I think my [PATCH v4 01/12]-[PATCH v4 06/12] simplify your further changes
without introducing regressions. While ud->lock is held when checking 
ud->status,
current attach/detach code is racy about read/update of ud->status . I think we
can close race in attach/detach code via a simple usbip_event_mutex 
serialization.

Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf

2021-03-08 Thread Tetsuo Handa

On 2021/03/08 16:35, Tetsuo Handa wrote:
> On 2021/03/08 12:53, Shuah Khan wrote:
>> Fix the above problems:
>> - Stop using kthread_get_run() macro to create/start threads.
>> - Create threads and get task struct reference.
>> - Add kthread_create() failure handling and bail out.
>> - Hold usbip_device lock to update local and shared states after
>>   creating rx and tx threads.
>> - Update usbip_device status to SDEV_ST_USED.
>> - Update usbip_device tcp_socket, sockfd, tcp_rx, and tcp_tx
>> - Start threads after usbip_device (tcp_socket, sockfd, tcp_rx, tcp_tx,
>>   and status) is complete.
> 
> No, the whole usbip_sockfd_store() etc. should be serialized using a mutex,
> for two different threads can open same file and write the same content at
> the same moment. This results in seeing SDEV_ST_AVAILABLE and creating two
> threads and overwiting global variables and setting SDEV_ST_USED and starting
> two threads by each of two thread, which will later fail to call 
> kthread_stop()
> on one of two thread because global variables are overwritten.
> 
> kthread_crate() (which involves GFP_KERNEL allocation) can take long time
> enough to hit
> 
>   usbip_sockfd_store() must perform
> 
>   if (sdev->ud.status != SDEV_ST_AVAILABLE) {

Oops. This is

if (sdev->ud.status == SDEV_ST_AVAILABLE) {

of course.

> /* misc assignments for attach operation */
> sdev->ud.status = SDEV_ST_USED;
>   }
> 
>   under a lock, or multiple ud->tcp_{tx,rx} are created (which will later
>   cause a crash like [1]) and refcount on ud->tcp_socket is leaked when
>   usbip_sockfd_store() is concurrently called.
> 
> problem. That's why my patch introduced usbip_event_mutex lock.
> 

And I think that same serialization is required between "rh_port_connect() from 
attach_store()" and
"rh_port_disconnect() from vhci_shutdown_connection() via 
usbip_event_add(>ud, VDEV_EVENT_DOWN)
 from vhci_port_disconnect() from detach_store()", for both vhci_rx_pdu() from 
vhci_rx_loop() and
vhci_port_disconnect() from detach_store() can queue VDEV_EVENT_DOWN event 
which can be processed
without waiting for attach_store() to complete.

Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf

2021-03-07 Thread Tetsuo Handa

On 2021/03/08 12:53, Shuah Khan wrote:
> Fix the above problems:
> - Stop using kthread_get_run() macro to create/start threads.
> - Create threads and get task struct reference.
> - Add kthread_create() failure handling and bail out.
> - Hold usbip_device lock to update local and shared states after
>   creating rx and tx threads.
> - Update usbip_device status to SDEV_ST_USED.
> - Update usbip_device tcp_socket, sockfd, tcp_rx, and tcp_tx
> - Start threads after usbip_device (tcp_socket, sockfd, tcp_rx, tcp_tx,
>   and status) is complete.

No, the whole usbip_sockfd_store() etc. should be serialized using a mutex,
for two different threads can open same file and write the same content at
the same moment. This results in seeing SDEV_ST_AVAILABLE and creating two
threads and overwiting global variables and setting SDEV_ST_USED and starting
two threads by each of two thread, which will later fail to call kthread_stop()
on one of two thread because global variables are overwritten.

kthread_crate() (which involves GFP_KERNEL allocation) can take long time
enough to hit

  usbip_sockfd_store() must perform

  if (sdev->ud.status != SDEV_ST_AVAILABLE) {
/* misc assignments for attach operation */
sdev->ud.status = SDEV_ST_USED;
  }

  under a lock, or multiple ud->tcp_{tx,rx} are created (which will later
  cause a crash like [1]) and refcount on ud->tcp_socket is leaked when
  usbip_sockfd_store() is concurrently called.

problem. That's why my patch introduced usbip_event_mutex lock.

Re: [PATCH] security: tomoyo: fix error return code of tomoyo_update_domain()

2021-03-06 Thread Tetsuo Handa

On 2021/03/06 22:03, Jia-Ju Bai wrote:
> When mutex_lock_interruptible() fails, the error return code of
> tomoyo_update_domain() is not properly assigned.
> To fix this bug, error is assigned with the return value of
> mutex_lock_interruptible(), and then error is checked.

Thanks for a patch, but this patch is wrong.
Since the variable "error" is initialized as

  int error = is_delete ? -ENOENT : -ENOMEM;

at the beginning of this function, unconditionally overwriting
this variable with the return code of mutex_lock_interruptible() breaks

  if (error && !is_delete) {
  }

block of this function.

And the caller does not check if the return code is -EINTR
instead of -ENOENT or -ENOMEM.

Re: possible deadlock in start_this_handle (2)

2021-02-19 Thread Tetsuo Handa

On 2021/02/15 23:29, Jan Kara wrote:
> On Mon 15-02-21 23:06:15, Tetsuo Handa wrote:
>> On 2021/02/15 21:45, Jan Kara wrote:
>>> On Sat 13-02-21 23:26:37, Tetsuo Handa wrote:
>>>> Excuse me, but it seems to me that nothing prevents
>>>> ext4_xattr_set_handle() from reaching ext4_xattr_inode_lookup_create()
>>>> without memalloc_nofs_save() when hitting ext4_get_nojournal() path.
>>>> Will you explain when ext4_get_nojournal() path is executed?
>>>
>>> That's a good question but sadly I don't think that's it.
>>> ext4_get_nojournal() is called when the filesystem is created without a
>>> journal. In that case we also don't acquire jbd2_handle lockdep map. In the
>>> syzbot report we can see:
>>
>> Since syzbot can test filesystem images, syzbot might have tested a 
>> filesystem
>> image created both with and without journal within this boot.
> 
> a) I think that syzbot reboots the VM between executing different tests to
> get reproducible conditions. But in theory I agree the test may have
> contained one image with and one image without a journal.

syzkaller reboots the VM upon a crash.
syzkaller can test multiple filesystem images within one boot.

https://storage.googleapis.com/syzkaller/cover/ci-qemu-upstream-386.html (this
statistic snapshot is volatile) reports that ext4_get_nojournal() is partially 
covered
( https://github.com/google/syzkaller/blob/master/docs/coverage.md ) by 
syzkaller.

  /* Just increment the non-pointer handle value */
  static handle_t *ext4_get_nojournal(void)
  {
   86 handle_t *handle = current->journal_info;
  unsigned long ref_cnt = (unsigned long)handle;
  
  BUG_ON(ref_cnt >= EXT4_NOJOURNAL_MAX_REF_COUNT);
  
   86 ref_cnt++;
  handle = (handle_t *)ref_cnt;
  
  current->journal_info = handle;
 2006 return handle;
  }
  
  
  /* Decrement the non-pointer handle value */
  static void ext4_put_nojournal(handle_t *handle)
  {
  unsigned long ref_cnt = (unsigned long)handle;
  
   85 BUG_ON(ref_cnt == 0);
  
   85 ref_cnt--;
  handle = (handle_t *)ref_cnt;
  
  current->journal_info = handle;
  }


  handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int 
line,
int type, int blocks, int rsv_blocks,
int revoke_creds)
  {
  journal_t *journal;
  int err;
  
 2006 trace_ext4_journal_start(sb, blocks, rsv_blocks, revoke_creds,
 2006  _RET_IP_);
 2006 err = ext4_journal_check_start(sb);
  if (err < 0)
  return ERR_PTR(err);
  
 2005 journal = EXT4_SB(sb)->s_journal;
 1969 if (!journal || (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY))
 2006 return ext4_get_nojournal();
 1969 return jbd2__journal_start(journal, blocks, rsv_blocks, 
revoke_creds,
 GFP_NOFS, type, line);
  }

> 
> *but*
> 
> b) as I wrote in the email you are replying to, the jbd2_handle key is
> private per filesystem. Thus for lockdep to complain about
> jbd2_handle->fs_reclaim->jbd2_handle deadlock, those jbd2_handle lockdep
> maps must come from the same filesystem.
> 
> *and*
> 
> c) filesystem without journal doesn't use jbd2_handle lockdep map at all so
> for such filesystems lockdep creates no dependency for jbd2_handle map.
> 

What about "EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)" case?
Does this case happen on filesystem with journal, and can this case be executed
by fuzzing a crafted (a sort of erroneous) filesystem with journal, and are
the jbd2_handle for calling ext4_get_nojournal() case and the jbd2_handle for
calling jbd2__journal_start() case the same?

Also, I worry that jbd2__journal_restart() is problematic, for it calls
stop_this_handle() (which calls memalloc_nofs_restore()) and then calls
start_this_handle() (which fails to call memalloc_nofs_save() if an error
occurs). An error from start_this_handle() from journal restart operation
might need special handling (i.e. either remount-ro or panic) ?

Re: possible deadlock in start_this_handle (2)

2021-02-15 Thread Tetsuo Handa

On 2021/02/15 21:45, Jan Kara wrote:
> On Sat 13-02-21 23:26:37, Tetsuo Handa wrote:
>> Excuse me, but it seems to me that nothing prevents
>> ext4_xattr_set_handle() from reaching ext4_xattr_inode_lookup_create()
>> without memalloc_nofs_save() when hitting ext4_get_nojournal() path.
>> Will you explain when ext4_get_nojournal() path is executed?
> 
> That's a good question but sadly I don't think that's it.
> ext4_get_nojournal() is called when the filesystem is created without a
> journal. In that case we also don't acquire jbd2_handle lockdep map. In the
> syzbot report we can see:

Since syzbot can test filesystem images, syzbot might have tested a filesystem
image created both with and without journal within this boot.

> 
> kswapd0/2246 is trying to acquire lock:
> 888041a988e0 (jbd2_handle){}-{0:0}, at: 
> start_this_handle+0xf81/0x1380 fs/jbd2/transaction.c:444
> 
> but task is already holding lock:
> 8be892c0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x0/0x30 
> mm/page_alloc.c:5195
> 
> So this filesystem has very clearly been created with a journal. Also the
> journal lockdep tracking machinery uses:

While locks held by kswapd0/2246 are fs_reclaim, shrinker_rwsem, 
>s_umount_key#38
and jbd2_handle, isn't the dependency lockdep considers problematic is

  Chain exists of:
jbd2_handle --> >xattr_sem --> fs_reclaim

   Possible unsafe locking scenario:

 CPU0CPU1
 
lock(fs_reclaim);
 lock(>xattr_sem);
 lock(fs_reclaim);
lock(jbd2_handle);

where CPU0 is kswapd/2246 and CPU1 is the case of ext4_get_nojournal() path?
If someone has taken jbd2_handle and >xattr_sem in this order, isn't this
dependency true?

> 
> rwsem_acquire_read(>j_trans_commit_map, 0, 0, _THIS_IP_);
> 
> so a lockdep key is per-filesystem. Thus it is not possible that lockdep
> would combine lock dependencies from two different filesystems.
> 
> But I guess we could narrow the search for this problem by adding WARN_ONs
> to ext4_xattr_set_handle() and ext4_xattr_inode_lookup_create() like:
> 
> WARN_ON(ext4_handle_valid(handle) && !(current->flags & PF_MEMALLOC_NOFS));
> 
> It would narrow down a place in which PF_MEMALLOC_NOFS flag isn't set
> properly... At least that seems like the most plausible way forward to me.

You can use CONFIG_DEBUG_AID_FOR_SYZBOT for adding such WARN_ONs on linux-next.

[PATCH] pstore: fix warning in pstore_kill_sb()

2021-02-13 Thread Tetsuo Handa

syzbot is hitting WARN_ON(pstore_sb != sb) at pstore_kill_sb() [1], for the
assumption that pstore_sb != NULL is wrong because pstore_fill_super() will
not assign pstore_sb = sb when new_inode() for d_make_root() returned NULL
(due to memory allocation fault injection).

Since mount_single() calls pstore_kill_sb() when pstore_fill_super()
failed, pstore_kill_sb() needs to be aware of such failure path.

[1] 
https://syzkaller.appspot.com/bug?id=6abacb8da5137cb47a416f2bef95719ed60508a0

Reported-by: syzbot 
Signed-off-by: Tetsuo Handa 
---
 fs/pstore/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
index 93a217e4f563..14658b009f1b 100644
--- a/fs/pstore/inode.c
+++ b/fs/pstore/inode.c
@@ -467,7 +467,7 @@ static struct dentry *pstore_mount(struct file_system_type 
*fs_type,
 static void pstore_kill_sb(struct super_block *sb)
 {
mutex_lock(_sb_lock);
-   WARN_ON(pstore_sb != sb);
+   WARN_ON(pstore_sb && pstore_sb != sb);
 
kill_litter_super(sb);
pstore_sb = NULL;
-- 
2.18.4

Re: possible deadlock in start_this_handle (2)

2021-02-13 Thread Tetsuo Handa

On 2021/02/11 19:49, Jan Kara wrote:
> This stacktrace should never happen. ext4_xattr_set() starts a transaction.
> That internally goes through start_this_handle() which calls:
> 
>   handle->saved_alloc_context = memalloc_nofs_save();
> 
> and we restore the allocation context only in stop_this_handle() when
> stopping the handle. And with this fs_reclaim_acquire() should remove
> __GFP_FS from the mask and not call __fs_reclaim_acquire().

Excuse me, but it seems to me that nothing prevents ext4_xattr_set_handle() 
from reaching
ext4_xattr_inode_lookup_create() without memalloc_nofs_save() when hitting 
ext4_get_nojournal() path.
Will you explain when ext4_get_nojournal() path is executed?

ext4_xattr_set() {
  handle = ext4_journal_start(inode, EXT4_HT_XATTR, credits) == 
__ext4_journal_start() {
  return __ext4_journal_start_sb() {
journal = EXT4_SB(sb)->s_journal;
if (!journal || (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY))
  return ext4_get_nojournal(); // Never calls memalloc_nofs_save() 
despite returning !IS_ERR() value.
return jbd2__journal_start(journal, blocks, rsv_blocks, revoke_creds, 
GFP_NOFS, type, line); // Calls memalloc_nofs_save() when start_this_handle() 
returns 0.
  }
}
  }
  error = ext4_xattr_set_handle(handle, inode, name_index, name, value, 
value_len, flags); {
ext4_write_lock_xattr(inode, _expand); // Grabs >xattr_sem
error = ext4_xattr_ibody_set(handle, inode, , ) {
  error = ext4_xattr_set_entry(i, s, handle, inode, false /* is_block */) {
ret = ext4_xattr_inode_lookup_create(handle, inode, i->value, 
i->value_len, _ea_inode); // Using GFP_KERNEL based on assumption that 
ext4_journal_start() called memalloc_nofs_save().
  }
}
  }
}

Re: general protection fault in tomoyo_socket_sendmsg_permission

2021-02-13 Thread Tetsuo Handa

Greg, will you queue 
https://lkml.kernel.org/r/20210205135707.4574-1-penguin-ker...@i-love.sakura.ne.jp
 (which can
close a report which is wasting syzbot's resource with 5300+ crashes) for 5.12 
? The change shown below will be
too large to test before merge window for 5.12 opens. 

The patch for fixing "general protection fault in 
tomoyo_socket_sendmsg_permission" will kill kthread_get_run().
Closing frequently crashing bug now is the better.

On 2021/02/11 22:40, Tetsuo Handa wrote:
> I guess that we need to serialize attach operation and reset/detach 
> operations, for
> it seems there is a race window that might result in "general protection 
> fault in
> tomoyo_socket_sendmsg_permission". Details follows...

Here is untested diff that is expected to be complete.

(1) Handle kthread_create() failure (which avoids "KASAN: null-ptr-deref Write 
in vhci_shutdown_connection")
by grouping socket lookup, SOCK_STREAM check and kthread_get_run() into 
usbip_prepare_threads() function.

(2) Serialize usbip_sockfd_store(), detach_store(), attach_store(), 
usbip_sockfd_store() and
ud->eh_ops.shutdown()/ud->eh_ops.reset()/ud->eh_ops.unusable() operations 
using usbip_store_mutex mutex
(which avoids "general protection fault in 
tomoyo_socket_sendmsg_permission").

(3) Don't reset ud->tcp_socket to NULL in vhci_device_reset(). Since 
tx_thread/rx_thread depends on
ud->tcp_socket != NULL whereas tcp_socket and tx_thread/rx_thread are 
assigned at the same time,
it is never safe to reset only ud->tcp_socket from ud->eh_ops.reset(). And 
actually nobody is
calling ud->eh_ops.reset() without ud->eh_ops.shutdown().

(4) usbip_sockfd_store() must perform {sdev,udc}->ud.status != 
SDEV_ST_AVAILABLE && {sdev,udc}->ud.status = SDEV_ST_USED
exclusively, or multiple tx_thread/rx_thread can be created when 
concurrently called. Although (2) will already
serialize this action, (1) will make it possible to perform within one 
spinlock section.

 drivers/usb/usbip/stub_dev.c | 56 ++--
 drivers/usb/usbip/usbip_common.c | 55 +++
 drivers/usb/usbip/usbip_common.h | 13 
 drivers/usb/usbip/usbip_event.c  |  9 +
 drivers/usb/usbip/vhci_hcd.c |  7 +---
 drivers/usb/usbip/vhci_sysfs.c   | 50 
 drivers/usb/usbip/vudc_sysfs.c   | 50 
 7 files changed, 175 insertions(+), 65 deletions(-)

diff --git a/drivers/usb/usbip/stub_dev.c b/drivers/usb/usbip/stub_dev.c
index 2305d425e6c9..522d58826049 100644
--- a/drivers/usb/usbip/stub_dev.c
+++ b/drivers/usb/usbip/stub_dev.c
@@ -39,12 +39,11 @@ static DEVICE_ATTR_RO(usbip_status);
  * is used to transfer usbip requests by kernel threads. -1 is a magic number
  * by which usbip connection is finished.
  */
-static ssize_t usbip_sockfd_store(struct device *dev, struct device_attribute 
*attr,
-   const char *buf, size_t count)
+static ssize_t __usbip_sockfd_store(struct device *dev, struct 
device_attribute *attr,
+   const char *buf, size_t count)
 {
struct stub_device *sdev = dev_get_drvdata(dev);
int sockfd = 0;
-   struct socket *socket;
int rv;

if (!sdev) {
@@ -57,7 +56,12 @@ static ssize_t usbip_sockfd_store(struct device *dev, struct 
device_attribute *a
return -EINVAL;

if (sockfd != -1) {
-   int err;
+   struct usbip_thread_info uti;
+   int err = usbip_prepare_threads(, >ud, sockfd,
+   stub_tx_loop, "stub_tx", 
stub_rx_loop, "stub_rx");
+
+   if (err)
+   return err;

dev_info(dev, "stub up\n");

@@ -65,44 +69,46 @@ static ssize_t usbip_sockfd_store(struct device *dev, 
struct device_attribute *a

if (sdev->ud.status != SDEV_ST_AVAILABLE) {
dev_err(dev, "not ready\n");
-   goto err;
+   spin_unlock_irq(>ud.lock);
+   usbip_unprepare_threads();
+   return -EINVAL;
}

-   socket = sockfd_lookup(sockfd, );
-   if (!socket)
-   goto err;
-
-   sdev->ud.tcp_socket = socket;
+   sdev->ud.tcp_socket = uti.tcp_socket;
sdev->ud.sockfd = sockfd;
-
-   spin_unlock_irq(>ud.lock);
-
-   sdev->ud.tcp_rx = kthread_get_run(stub_rx_loop, >ud,
- "stub_rx");
-   sdev->ud.tcp_tx = kthread_get_run(stub_tx_loop, >ud,
- "stub_tx");
-
-   spin_lock_irq(&g

Re: possible deadlock in start_this_handle (2)

2021-02-12 Thread Tetsuo Handa

On 2021/02/12 22:12, Michal Hocko wrote:
> On Fri 12-02-21 21:58:15, Tetsuo Handa wrote:
>> On 2021/02/12 21:30, Michal Hocko wrote:
>>> On Fri 12-02-21 12:22:07, Matthew Wilcox wrote:
>>>> On Fri, Feb 12, 2021 at 08:18:11PM +0900, Tetsuo Handa wrote:
>>>>> On 2021/02/12 1:41, Michal Hocko wrote:
>>>>>> But I suspect we have drifted away from the original issue. I thought
>>>>>> that a simple check would help us narrow down this particular case and
>>>>>> somebody messing up from the IRQ context didn't sound like a completely
>>>>>> off.
>>>>>>
>>>>>
>>>>>  From my experience at 
>>>>> https://lkml.kernel.org/r/201409192053.ihj35462.jlomosoffvt...@i-love.sakura.ne.jp
>>>>>  ,
>>>>> I think we can replace direct PF_* manipulation with macros which do not 
>>>>> receive "struct task_struct *" argument.
>>>>> Since TASK_PFA_TEST()/TASK_PFA_SET()/TASK_PFA_CLEAR() are for 
>>>>> manipulating PFA_* flags on a remote thread, we can
>>>>> define similar ones for manipulating PF_* flags on current thread. Then, 
>>>>> auditing dangerous users becomes easier.
>>>>
>>>> No, nobody is manipulating another task's GFP flags.
>>>
>>> Agreed. And nobody should be manipulating PF flags on remote tasks
>>> either.
>>>
>>
>> No. You are misunderstanding. The bug report above is an example of 
>> manipulating PF flags on remote tasks.
> 
> Could you be more specific? I do not remember there was any theory that
> somebody is manipulating flags on a remote task. A very vague theory was
> that an interrupt context might be doing that on the _current_ context
> but even that is not based on any real evidence. It is a pure
> speculation.
> 

Please read the link above. The report is an example of manipulating PF flags 
on a remote task.
You are thinking interrupt context as the only possible culprit, but you should 
also think
concurrent access as the other possible culprit.

Re: possible deadlock in start_this_handle (2)

2021-02-12 Thread Tetsuo Handa

On 2021/02/12 21:30, Michal Hocko wrote:
> On Fri 12-02-21 12:22:07, Matthew Wilcox wrote:
>> On Fri, Feb 12, 2021 at 08:18:11PM +0900, Tetsuo Handa wrote:
>>> On 2021/02/12 1:41, Michal Hocko wrote:
>>>> But I suspect we have drifted away from the original issue. I thought
>>>> that a simple check would help us narrow down this particular case and
>>>> somebody messing up from the IRQ context didn't sound like a completely
>>>> off.
>>>>
>>>
>>>  From my experience at 
>>> https://lkml.kernel.org/r/201409192053.ihj35462.jlomosoffvt...@i-love.sakura.ne.jp
>>>  ,
>>> I think we can replace direct PF_* manipulation with macros which do not 
>>> receive "struct task_struct *" argument.
>>> Since TASK_PFA_TEST()/TASK_PFA_SET()/TASK_PFA_CLEAR() are for manipulating 
>>> PFA_* flags on a remote thread, we can
>>> define similar ones for manipulating PF_* flags on current thread. Then, 
>>> auditing dangerous users becomes easier.
>>
>> No, nobody is manipulating another task's GFP flags.
> 
> Agreed. And nobody should be manipulating PF flags on remote tasks
> either.
> 

No. You are misunderstanding. The bug report above is an example of 
manipulating PF flags on remote tasks.
You say "nobody should", but the reality is "there indeed was". There might be 
unnoticed others. The point of
this proposal is to make it possible to "find such unnoticed users who are 
manipulating PF flags on remote tasks".

Re: possible deadlock in start_this_handle (2)

2021-02-12 Thread Tetsuo Handa

On 2021/02/12 1:41, Michal Hocko wrote:
> But I suspect we have drifted away from the original issue. I thought
> that a simple check would help us narrow down this particular case and
> somebody messing up from the IRQ context didn't sound like a completely
> off.
> 

 From my experience at 
https://lkml.kernel.org/r/201409192053.ihj35462.jlomosoffvt...@i-love.sakura.ne.jp
 ,
I think we can replace direct PF_* manipulation with macros which do not 
receive "struct task_struct *" argument.
Since TASK_PFA_TEST()/TASK_PFA_SET()/TASK_PFA_CLEAR() are for manipulating 
PFA_* flags on a remote thread, we can
define similar ones for manipulating PF_* flags on current thread. Then, 
auditing dangerous users becomes easier.

Re: general protection fault in tomoyo_socket_sendmsg_permission

2021-02-11 Thread Tetsuo Handa

On 2021/02/12 11:22, Tetsuo Handa wrote:
> On 2021/02/12 10:34, Shuah Khan wrote:
>> On 2/10/21 6:14 PM, Tetsuo Handa wrote:
>>> (Dropping LSM ML because this is not a TOMOYO's bug.)
>>>
>>> On 2021/02/11 4:29, Shuah Khan wrote:
>>>> This is a good find. I already replied to the thread to send a complete
>>>> fix.
>>>
>>> As I said at 
>>> https://lkml.kernel.org/r/f8cae6b1-8f84-0e6a-7d9c-fc4aec68f...@i-love.sakura.ne.jp
>>>  ,
>>> the as-is patch is effectively a complete fix. And applying the as-is patch 
>>> should help spending
>>> syzbot resources for reproducing "general protection fault in 
>>> tomoyo_socket_sendmsg_permission"
>>> with debug printk() patch applied, which in turn will help you in
>>>
>>>> Right. I would like to get a clear understanding of how this condition
>>>> is triggered. I am not saying this isn't a problem. Understanding how
>>>> it is triggered helps find the best fix.
>>>
>>> part. Therefore, I strongly expect you to apply this version now.
>>>
>>
>> Is there a reproducer for this problem?
> 
> There is no reproducer for "general protection fault in 
> tomoyo_socket_sendmsg_permission" problem, but
> the race condition is explained at 
> https://lkml.kernel.org/r/676d4518-0faa-9fab-15db-0db8d216d...@i-love.sakura.ne.jp
>  .
> 

Here is a race window widening patch, and I locally reproduced "general 
protection fault in tomoyo_socket_sendmsg_permission".

diff --git a/drivers/usb/usbip/usbip_common.h b/drivers/usb/usbip/usbip_common.h
index 8be857a4fa13..a7c68097aa1d 100644
--- a/drivers/usb/usbip/usbip_common.h
+++ b/drivers/usb/usbip/usbip_common.h
@@ -286,6 +286,8 @@ struct usbip_device {
if (!IS_ERR(__k)) {\
get_task_struct(__k);  \
wake_up_process(__k);  \
+   } else {   \
+   __k = NULL;\
}  \
__k;   \
 })
diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
index be37aec250c2..93e1271d0f5d 100644
--- a/drivers/usb/usbip/vhci_sysfs.c
+++ b/drivers/usb/usbip/vhci_sysfs.c
@@ -389,8 +389,12 @@ static ssize_t attach_store(struct device *dev, struct 
device_attribute *attr,
/* end the lock */
 
vdev->ud.tcp_rx = kthread_get_run(vhci_rx_loop, >ud, "vhci_rx");
-   vdev->ud.tcp_tx = kthread_get_run(vhci_tx_loop, >ud, "vhci_tx");
+   {
+   struct task_struct *tx = kthread_get_run(vhci_tx_loop, 
>ud, "vhci_tx");
 
+   schedule_timeout_uninterruptible(HZ);
+   vdev->ud.tcp_tx = tx;
+   }
rh_port_connect(vdev, speed);
 
return count;


[  134.880383][ T7879] vhci_hcd vhci_hcd.0: pdev(2) rhport(0) sockfd(6)
[  134.883465][ T7879] vhci_hcd vhci_hcd.0: devid(0) speed(2) 
speed_str(full-speed)
[  134.904750][   T74] vhci_hcd: vhci_device speed not set
[  134.933053][ T7873] vhci_hcd: connection closed
[  134.933212][ T7870] vhci_hcd: connection closed
[  134.933415][ T4488] vhci_hcd: stop threads
[  134.942970][ T4488] vhci_hcd: release socket
[  134.944949][   T14] vhci_hcd: vhci_device speed not set
[  134.947063][ T4488] vhci_hcd: disconnect device
[  134.951508][ T4488] vhci_hcd: stop threads
[  134.953861][ T4488] vhci_hcd: release socket
[  134.957113][ T4488] vhci_hcd: disconnect device
[  135.014433][   T56] usb 15-1: enqueue for inactive port 0
[  135.065119][ T7877] vhci_hcd: connection closed
[  135.065205][ T7880] vhci_hcd: connection closed
[  135.065370][ T4480] vhci_hcd: stop threads
[  135.072854][ T4480] vhci_hcd: release socket
[  135.076490][ T4480] vhci_hcd: disconnect device
[  135.079535][ T4480] vhci_hcd: stop threads
[  135.084024][ T4480] vhci_hcd: release socket
[  135.087979][ T4480] vhci_hcd: disconnect device
[  135.134526][ T6820] vhci_hcd: vhci_device speed not set
[  135.314959][ T6821] vhci_hcd: vhci_device speed not set
[  135.621271][ T7884] vhci_hcd vhci_hcd.0: pdev(4) rhport(0) sockfd(3)
[  135.621290][ T7885] vhci_hcd vhci_hcd.0: pdev(0) rhport(0) sockfd(3)
[  135.625072][ T7884] vhci_hcd vhci_hcd.0: devid(0) speed(5) 
speed_str(super-speed)
[  135.628665][ T7885] vhci_hcd vhci_hcd.0: devid(0) speed(5) 
speed_str(super-speed)
[  135.630109][ T7887] vhci_hcd vhci_hcd.0: pdev(1) rhport(0) sockfd(3)
[  135.630116][ T7887] vhci_hcd vhci_hcd.0: devid(0) speed(5) 
speed_str(super-speed)
[  135.672834][ T7895] vhci_hcd

Re: general protection fault in tomoyo_socket_sendmsg_permission

2021-02-11 Thread Tetsuo Handa

On 2021/02/12 10:34, Shuah Khan wrote:
> On 2/10/21 6:14 PM, Tetsuo Handa wrote:
>> (Dropping LSM ML because this is not a TOMOYO's bug.)
>>
>> On 2021/02/11 4:29, Shuah Khan wrote:
>>> This is a good find. I already replied to the thread to send a complete
>>> fix.
>>
>> As I said at 
>> https://lkml.kernel.org/r/f8cae6b1-8f84-0e6a-7d9c-fc4aec68f...@i-love.sakura.ne.jp
>>  ,
>> the as-is patch is effectively a complete fix. And applying the as-is patch 
>> should help spending
>> syzbot resources for reproducing "general protection fault in 
>> tomoyo_socket_sendmsg_permission"
>> with debug printk() patch applied, which in turn will help you in
>>
>>> Right. I would like to get a clear understanding of how this condition
>>> is triggered. I am not saying this isn't a problem. Understanding how
>>> it is triggered helps find the best fix.
>>
>> part. Therefore, I strongly expect you to apply this version now.
>>
> 
> Is there a reproducer for this problem?

There is no reproducer for "general protection fault in 
tomoyo_socket_sendmsg_permission" problem, but
the race condition is explained at 
https://lkml.kernel.org/r/676d4518-0faa-9fab-15db-0db8d216d...@i-love.sakura.ne.jp
 .

Re: general protection fault in tomoyo_socket_sendmsg_permission

2021-02-10 Thread Tetsuo Handa

(Dropping LSM ML because this is not a TOMOYO's bug.)

On 2021/02/11 4:29, Shuah Khan wrote:
> This is a good find. I already replied to the thread to send a complete
> fix.

As I said at 
https://lkml.kernel.org/r/f8cae6b1-8f84-0e6a-7d9c-fc4aec68f...@i-love.sakura.ne.jp
 ,
the as-is patch is effectively a complete fix. And applying the as-is patch 
should help spending
syzbot resources for reproducing "general protection fault in 
tomoyo_socket_sendmsg_permission"
with debug printk() patch applied, which in turn will help you in

> Right. I would like to get a clear understanding of how this condition
> is triggered. I am not saying this isn't a problem. Understanding how
> it is triggered helps find the best fix.

part. Therefore, I strongly expect you to apply this version now.

Re: general protection fault in tomoyo_socket_sendmsg_permission

2021-02-10 Thread Tetsuo Handa

On 2021/02/11 3:17, Shuah Khan wrote:
> I am looking to understand the syzbot configuration and a reproducer
> to be able to debug and fix the problem. How is syzbot triggering the
> vhci_hcd attach and detach sequence?

I don't know. I'm waiting for syzbot to reproduce the problem on linux-next
with 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/usb/usbip?id=f1bdf414e7dd0cbc26460425719fc3ea479947a2
 .

> 
> This helps me determine all these fix suggestions that are coming in
> are fixes or papering over a real problem.

What are these fix suggestions?

"general protection fault in tomoyo_socket_sendmsg_permission" is a NULL pointer
dereference which can happen if vhci_device_reset() and/or vhci_device_init()
(which does vdev->ud.tcp_socket = NULL;) were unexpectedly called. There is no
reproducer, and (as far as I know) no fix suggestion.

"KASAN: null-ptr-deref Write in vhci_shutdown_connection" is an ERR_PTR(-EINTR)
pointer dereference which can happen if kthread_create() was SIGKILLed. There is
a reproducer, and 
https://lkml.kernel.org/r/20210205135707.4574-1-penguin-ker...@i-love.sakura.ne.jp
is a fix suggestion.

Re: [PATCH 0/3][RESEND] add support for never printing hashed addresses

2021-02-10 Thread Tetsuo Handa

On 2021/02/11 1:54, Andy Shevchenko wrote:
> On Thu, Feb 11, 2021 at 01:39:41AM +0900, Tetsuo Handa wrote:
>> On 2021/02/11 1:18, Steven Rostedt wrote:
>>> The point of this exercise is to be able to debug the *same* kernel that
>>> someone is having issues with. And this is to facilitate that debugging.
>>
>> That's too difficult to use. If a problem is not reproducible, we will have
>> no choice but always specify "never hash pointers" command line option. If a
>> problem is reproducible, we can rebuild that kernel with "never hash 
>> pointers"
>> config option turned on.
> 
> I think what you are targeting is something like dynamic debug approach where
> you can choose which prints to enable/disable and what enable/disable in them.

What I'm targeting is "zero interaction from kernel command line options" like
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/usb/usbip?id=f1bdf414e7dd0cbc26460425719fc3ea479947a2
 .

> 
> In that case you specifically apply a command line option and enable only 
> files
> / lines in the files.

While there is boot-config feature for specifying very long kernel command line
options, I can't enforce syzkaller users (including syzbot) to switch what to
enable/disable via kernel command line options. Let alone defining a kernel
command line option for single-purpose debug printk() changes like shown above.

Re: [PATCH 0/3][RESEND] add support for never printing hashed addresses

2021-02-10 Thread Tetsuo Handa

On 2021/02/11 1:46, Steven Rostedt wrote:
> On Thu, 11 Feb 2021 01:39:41 +0900
> Tetsuo Handa  wrote:
> 
>> On 2021/02/11 1:18, Steven Rostedt wrote:
>>> The point of this exercise is to be able to debug the *same* kernel that
>>> someone is having issues with. And this is to facilitate that debugging.  
>>
>> That's too difficult to use. If a problem is not reproducible, we will have
>> no choice but always specify "never hash pointers" command line option. If a
>> problem is reproducible, we can rebuild that kernel with "never hash 
>> pointers"
>> config option turned on.
> 
> Now the question is, why do you need the unhashed pointer?

Because unhashed pointers might give some clue. We can rebuild the same kernel
using the same kernel config / compiler etc. and compare unhashed pointers with
addresses in System.map / kallsyms files without reproducing the problem.

> 
> Currently, the instruction pointer is what is fine right? You get the
> a function and its offset. If there's something that is needed, perhaps we
> should look at how to fix that, instead of just unhashing all pointers by
> default.

I'm not refusing to use kernel command line options. I'm expecting that we can
also hardcode using kernel config options. Since boot-time switching via kernel
command line options makes the kernel behave differently, less boot-time
switching is better for avoiding unexpected problems (e.g. unintended LSM was
enabled).

Re: [PATCH 0/3][RESEND] add support for never printing hashed addresses

2021-02-10 Thread Tetsuo Handa

On 2021/02/11 1:18, Steven Rostedt wrote:
> The point of this exercise is to be able to debug the *same* kernel that
> someone is having issues with. And this is to facilitate that debugging.

That's too difficult to use. If a problem is not reproducible, we will have
no choice but always specify "never hash pointers" command line option. If a
problem is reproducible, we can rebuild that kernel with "never hash pointers"
config option turned on.

Re: [PATCH 0/3][RESEND] add support for never printing hashed addresses

2021-02-10 Thread Tetsuo Handa

On 2021/02/10 14:18, Timur Tabi wrote:
> [accidentally sent from the wrong email address, so resending]
> 
> [The list of email addresses on CC: is getting quite lengthy,
> so I hope I've included everyone.]
> 
> Although hashing addresses printed via printk does make the
> kernel more secure, it interferes with debugging, especially
> with some functions like print_hex_dump() which always uses
> hashed addresses.

Oh, I was wishing

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 3b53c73580c5..34c7e145ac3c 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -802,7 +802,7 @@ static char *ptr_to_id(char *buf, char *end, const void 
*ptr,
 * Print the real pointer value for NULL and error pointers,
 * as they are not actual addresses.
 */
-   if (IS_ERR_OR_NULL(ptr))
+   if (IS_ERR_OR_NULL(ptr) || IS_ENABLED(CONFIG_DEBUG_DONT_HASH_POINTERS))
return pointer_string(buf, end, ptr, spec);
 
/* When debugging early boot use non-cryptographically secure hash. */

change as a kernel config option, for more we try to switch using kernel 
command line options,
more we likely make errors with sharing appropriate kernel command line options
(e.g. 
https://github.com/google/syzkaller/commit/99c64d5c672700d6c0de63d11db25a0678e47a75
 ).

[PATCH v5] lockdep: Allow tuning tracing capacity constants.

2021-02-08 Thread Tetsuo Handa

Since syzkaller continues various test cases until the kernel crashes,
syzkaller tends to examine more locking dependencies than normal systems.
As a result, syzbot is reporting that the fuzz testing was terminated
due to hitting upper limits lockdep can track [1] [2] [3]. Since analysis
via /proc/lockdep* did not show any obvious culprit [4] [5], we have no
choice but allow tuning tracing capacity constants.

[1] 
https://syzkaller.appspot.com/bug?id=3d97ba93fb3566000c1c59691ea427370d33ea1b
[2] 
https://syzkaller.appspot.com/bug?id=381cb436fe60dc03d7fd2a092b46d7f09542a72a
[3] 
https://syzkaller.appspot.com/bug?id=a588183ac34c1437fc0785e8f220e88282e5a29f
[4] 
https://lkml.kernel.org/r/4b8f7a57-fa20-47bd-48a0-ae35d860f...@i-love.sakura.ne.jp
[5] 
https://lkml.kernel.org/r/1c351187-253b-2d49-acaf-4563c63ae...@i-love.sakura.ne.jp

Reported-by: syzbot 
Reported-by: syzbot 
Reported-by: syzbot 
References: 
https://lkml.kernel.org/r/1595640639-9310-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp
Signed-off-by: Tetsuo Handa 
Acked-by: Dmitry Vyukov 
---
 kernel/locking/lockdep.c   |  2 +-
 kernel/locking/lockdep_internals.h |  8 +++---
 lib/Kconfig.debug  | 40 ++
 3 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index bdaf4829098c..65b3777e8089 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1391,7 +1391,7 @@ static int add_lock_to_list(struct lock_class *this,
 /*
  * For good efficiency of modular, we use power of 2
  */
-#define MAX_CIRCULAR_QUEUE_SIZE4096UL
+#define MAX_CIRCULAR_QUEUE_SIZE(1UL << 
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS)
 #define CQ_MASK(MAX_CIRCULAR_QUEUE_SIZE-1)
 
 /*
diff --git a/kernel/locking/lockdep_internals.h 
b/kernel/locking/lockdep_internals.h
index de49f9e1c11b..ecb8662e7a4e 100644
--- a/kernel/locking/lockdep_internals.h
+++ b/kernel/locking/lockdep_internals.h
@@ -99,16 +99,16 @@ static const unsigned long LOCKF_USED_IN_IRQ_READ =
 #define MAX_STACK_TRACE_ENTRIES262144UL
 #define STACK_TRACE_HASH_SIZE  8192
 #else
-#define MAX_LOCKDEP_ENTRIES32768UL
+#define MAX_LOCKDEP_ENTRIES(1UL << CONFIG_LOCKDEP_BITS)
 
-#define MAX_LOCKDEP_CHAINS_BITS16
+#define MAX_LOCKDEP_CHAINS_BITSCONFIG_LOCKDEP_CHAINS_BITS
 
 /*
  * Stack-trace: tightly packed array of stack backtrace
  * addresses. Protected by the hash_lock.
  */
-#define MAX_STACK_TRACE_ENTRIES524288UL
-#define STACK_TRACE_HASH_SIZE  16384
+#define MAX_STACK_TRACE_ENTRIES(1UL << CONFIG_LOCKDEP_STACK_TRACE_BITS)
+#define STACK_TRACE_HASH_SIZE  (1 << CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS)
 #endif
 
 /*
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 7937265ef879..4cb84b499636 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1332,6 +1332,46 @@ config LOCKDEP
 config LOCKDEP_SMALL
bool
 
+config LOCKDEP_BITS
+   int "Bitsize for MAX_LOCKDEP_ENTRIES"
+   depends on LOCKDEP && !LOCKDEP_SMALL
+   range 10 30
+   default 15
+   help
+ Try increasing this value if you hit "BUG: MAX_LOCKDEP_ENTRIES too 
low!" message.
+
+config LOCKDEP_CHAINS_BITS
+   int "Bitsize for MAX_LOCKDEP_CHAINS"
+   depends on LOCKDEP && !LOCKDEP_SMALL
+   range 10 30
+   default 16
+   help
+ Try increasing this value if you hit "BUG: MAX_LOCKDEP_CHAINS too 
low!" message.
+
+config LOCKDEP_STACK_TRACE_BITS
+   int "Bitsize for MAX_STACK_TRACE_ENTRIES"
+   depends on LOCKDEP && !LOCKDEP_SMALL
+   range 10 30
+   default 19
+   help
+ Try increasing this value if you hit "BUG: MAX_STACK_TRACE_ENTRIES 
too low!" message.
+
+config LOCKDEP_STACK_TRACE_HASH_BITS
+   int "Bitsize for STACK_TRACE_HASH_SIZE"
+   depends on LOCKDEP && !LOCKDEP_SMALL
+   range 10 30
+   default 14
+   help
+ Try increasing this value if you need large MAX_STACK_TRACE_ENTRIES.
+
+config LOCKDEP_CIRCULAR_QUEUE_BITS
+   int "Bitsize for elements in circular_queue struct"
+   depends on LOCKDEP
+   range 10 30
+   default 12
+   help
+ Try increasing this value if you hit "lockdep bfs error:-1" warning 
due to __cq_enqueue() failure.
+
 config DEBUG_LOCKDEP
bool "Lock dependency engine debugging"
depends on DEBUG_KERNEL && LOCKDEP
-- 
2.18.4

Re: [PATCH v4 (resend)] lockdep: Allow tuning tracing capacity constants.

2021-02-01 Thread Tetsuo Handa

Hello, Andrew and Linus.

We are stuck because Peter cannot respond.
I think it is time to send this patch to linux-next. What do you think?

On 2021/01/20 19:18, Dmitry Vyukov wrote:
> On Wed, Jan 20, 2021 at 11:12 AM Tetsuo Handa
>  wrote:
>>
>> Since syzkaller continues various test cases until the kernel crashes,
>> syzkaller tends to examine more locking dependencies than normal systems.
>> As a result, syzbot is reporting that the fuzz testing was terminated
>> due to hitting upper limits lockdep can track [1] [2] [3].
>>
>> Peter Zijlstra does not want to allow tuning these limits via kernel
>> config options, for such change discourages thinking. But analysis via
>> /proc/lockdep* did not show any obvious culprit [4] [5]. It is possible
>> that many hundreds of kn->active lock instances are to some degree
>> contributing to these problems, but there is no means to verify whether
>> these instances are created for protecting same callback functions.
>> Unless Peter provides a way to make these instances per "which callback
>> functions the lock instance will call (identified by something like MD5
>> of string representations of callback functions which each lock instance
>> will protect)" than plain "serial number", I don't think that we can
>> verify the culprit.
>>
>> [1] 
>> https://syzkaller.appspot.com/bug?id=3d97ba93fb3566000c1c59691ea427370d33ea1b
>> [2] 
>> https://syzkaller.appspot.com/bug?id=381cb436fe60dc03d7fd2a092b46d7f09542a72a
>> [3] 
>> https://syzkaller.appspot.com/bug?id=a588183ac34c1437fc0785e8f220e88282e5a29f
>> [4] 
>> https://lkml.kernel.org/r/4b8f7a57-fa20-47bd-48a0-ae35d860f...@i-love.sakura.ne.jp
>> [5] 
>> https://lkml.kernel.org/r/1c351187-253b-2d49-acaf-4563c63ae...@i-love.sakura.ne.jp
>>
>> Reported-by: syzbot 
>> Reported-by: syzbot 
>> Reported-by: syzbot 
>> Signed-off-by: Tetsuo Handa 
>> Acked-by: Dmitry Vyukov 
> 
> Thanks for your persistence!
> I still support this. And assessment of lockdep stats on overflow
> seems to confirm it's just a very large lock graph triggered by
> syzkaller.
>

Re: general protection fault in tomoyo_socket_sendmsg_permission

2021-01-30 Thread Tetsuo Handa

On 2021/01/30 6:18, Shuah Khan wrote:
> In this console log:

It seems "this console log" refers to 
https://syzkaller.appspot.com/x/log.txt?x=1045303450 .

> 
> 06:57:50 executing program 1:
> socketpair$tipc(0x1e, 0x2, 0x0, &(0x7fc0)={0x})
> sendmsg$BATADV_CMD_GET_TRANSTABLE_LOCAL(r0, 
> &(0x7f0002c0)={&(0x7f0001c0), 0xc, &(0x7f000280)={0x0, 
> 0xd0010100}}, 0x0)
> 
> [ 1151.090883][T23361] vhci_hcd vhci_hcd.0: pdev(4) rhport(0) sockfd(4)
> [ 1151.097445][T23361] vhci_hcd vhci_hcd.0: devid(0) speed(1) 
> speed_str(low-speed)
> 06:57:50 executing program 0:
> r0 = syz_open_dev$binderN(&(0x7f000680)='/dev/binder#\x00', 0x0, 0x0)
> ioctl$BINDER_WRITE_READ(r0, 0xc0306201, &(0x7f000cc0)={0x88, 0x0, 
> &(0x7f000b80)=[@transaction={0x40406300, {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 
> 0x0, 0x0, 0x0, 0x0, 0x0}}, @transaction={0x40406300, {0x0, 0x0, 0x0, 0x0, 
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}], 0x0, 0x0, 0x0})
> 
> [ 1151.164402][T23363] vhci_hcd: connection closed
> [ 1151.167346][  T240] vhci_hcd: stop threads
> 
> 
> [ 1151.178329][T26761] usb 17-1: new low-speed USB device number 2 using 
> vhci_hcd
> 
> 
> SK: Looking at the console log, it looks like while connection is being
>     torn down,

Excuse me, but it looks like (what comes here) while connection is being torn 
down ?
I'm not familiar with driver code.

> 
> 
> [ 1151.181245][  T240] vhci_hcd: release socket
> 
> 
> Can you share your your test code for this program:
> "executing program 1"

I don't think program 1 is relevant. I think program 4

  06:57:50 executing program 4:
  r0 = socket$tipc(0x1e, 0x2, 0x0)
  syz_usbip_server_init(0x1)
  close_range(r0, 0x, 0x0)

which calls syz_usbip_server_init() as with other duplicates is relevant.

> 
> Also your setup? Do you run usbip_host and vhci_hcd both?

Who are you referring to with "you/your" ? I'm not running syzkaller in my setup
and I don't have test code.

I'm just proposing printing more messages in order to confirm the ordering of
events and member values in structures.

Re: general protection fault in tomoyo_socket_sendmsg_permission

2021-01-29 Thread Tetsuo Handa

On 2021/01/30 1:05, Shuah Khan wrote:
>> Since "general protection fault in tomoyo_socket_sendmsg_permission" is 
>> caused by
>> unexpectedly resetting ud->tcp_socket to NULL without waiting for tx thread 
>> to
>> terminate, tracing the ordering of events is worth knowing. Even adding
>> schedule_timeout_uninterruptible() to before kernel_sendmsg() might help.
>>
> 
> What about the duplicate bug information that was in my email?
> Did you take a look at that?

I was not aware of the duplicate bugs. It is interesting that
"KASAN: null-ptr-deref Write in event_handler" says that vdev->ud.tcp_tx became 
NULL at

if (vdev->ud.tcp_tx) {

/* this location */

kthread_stop_put(vdev->ud.tcp_tx);
vdev->ud.tcp_tx = NULL;
}

which means that somebody else is unexpectedly resetting vdev->ud.tcp_tx to 
NULL.

If memset() from vhci_device_init() from vhci_start() were unexpectedly called,
all of tcp_socket, tcp_rx, tcp_tx etc. becomes NULL which can explain these 
bugs ?
I'm inclined to report not only tcp_socket but also other fields when 
kernel_sendmsg()
detected that tcp_socket is NULL...

Re: general protection fault in tomoyo_socket_sendmsg_permission

2021-01-28 Thread Tetsuo Handa

On 2021/01/29 4:05, Shuah Khan wrote:
> The reason I don't like adding printk's is this is a race condition
> and as a result time sensitive. Adding printks in the path will not
> help debug this issue. It will make it harder to reproduce the problem.

Not always. Adding printk() might make it easier to reproduce the problem.

> 
> I am unable to reproduce the problem using the reproducer and running 
> multiple instances of the reproducer.

Since syzkaller cannot find a reproducer for "general protection fault in
tomoyo_socket_sendmsg_permission", and you cannot reproduce other problem
using reproducer, trying to obtain some clue via printing messages by asking
syzkaller to try debug patch can be very helpful.

Since "general protection fault in tomoyo_socket_sendmsg_permission" is caused 
by
unexpectedly resetting ud->tcp_socket to NULL without waiting for tx thread to
terminate, tracing the ordering of events is worth knowing. Even adding
schedule_timeout_uninterruptible() to before kernel_sendmsg() might help.

Re: [PATCH v2] smackfs: restrict bytes count in smackfs write functions

2021-01-28 Thread Tetsuo Handa

On 2021/01/28 22:27, Sabyrzhan Tasbolatov wrote:
>> Doesn't this change break legitimate requests like
>>
>>   char buffer[2];
>>
>>   memset(buffer, ' ', sizeof(buffer));
>>   memcpy(buffer + sizeof(buffer) - 10, "foo", 3);
>>   write(fd, buffer, sizeof(buffer));
>>
>> ?
> 
> It does, in this case. Then I need to patch another version with
> whitespace stripping before, after label. I just followed the same thing
> that I see in security/selinux/selinuxfs.c sel_write_enforce() etc.
> 
> It has the same memdup_user_nul() and count >= PAGE_SIZE check prior to that.

Since sel_write_enforce() accepts string representation of an integer value, 
PAGE_SIZE is sufficient.
But since smk_write_onlycap() and smk_write_relabel_self() accept list of 
space-delimited words,
you need to prove why PAGE_SIZE does not break userspace in your patch.

Also, due to the "too small to fail" memory-allocation rule, memdup_user_nul() 
for
count < PAGE_SIZE * 8 bytes is "never fails with -ENOMEM unless SIGKILLed by 
the OOM
killer". Also, memdup_user_nul() for count >= PAGE_SIZE * (1 << MAX_ORDER) - 1 
bytes is
"never succeeds". Thus, you can safely add

if (count >= PAGE_SIZE * (1 << MAX_ORDER) - 1)
return -EINVAL; // or -ENOMEM if you want compatibility

to smackfs write functions. But it is a strange requirement that the caller of
memdup_user_nul() has to be aware of upper limit in a way that we won't hit

/*
 * There are several places where we assume that the order value is sane
 * so bail out early if the request is out of bound.
 */
if (unlikely(order >= MAX_ORDER)) {
WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
return NULL;
}

path. memdup_user_nul() side should do

if (count >= PAGE_SIZE * (1 << MAX_ORDER) - 1)
return -ENOMEM;

check and return -ENOMEM if memdup_user_nul() does not want to use __GFP_NOWARN.
I still believe that memdup_user_nul() side should be fixed.

Re: [PATCH v2] smackfs: restrict bytes count in smackfs write functions

2021-01-28 Thread Tetsuo Handa

On 2021/01/28 20:58, Sabyrzhan Tasbolatov wrote:
> @@ -2005,6 +2009,9 @@ static ssize_t smk_write_onlycap(struct file *file, 
> const char __user *buf,
>   if (!smack_privileged(CAP_MAC_ADMIN))
>   return -EPERM;
>  
> + if (count > PAGE_SIZE)
> + return -EINVAL;
> +
>   data = memdup_user_nul(buf, count);
>   if (IS_ERR(data))
>   return PTR_ERR(data);
> @@ -2740,10 +2754,13 @@ static ssize_t smk_write_relabel_self(struct file 
> *file, const char __user *buf,
>   return -EPERM;
>  
>   /*
> +  * No partial write.
>* Enough data must be present.
>*/
>   if (*ppos != 0)
>   return -EINVAL;
> + if (count == 0 || count > PAGE_SIZE)
> + return -EINVAL;
>  
>   data = memdup_user_nul(buf, count);
>   if (IS_ERR(data))
> 

Doesn't this change break legitimate requests like

  char buffer[2];

  memset(buffer, ' ', sizeof(buffer));
  memcpy(buffer + sizeof(buffer) - 10, "foo", 3);
  write(fd, buffer, sizeof(buffer));

?

Re: general protection fault in tomoyo_socket_sendmsg_permission

2021-01-27 Thread Tetsuo Handa

On 2020/11/14 2:14, Shuah Khan wrote:
> On 11/13/20 5:00 AM, Hillf Danton wrote:
>> Thu, 12 Nov 2020 23:21:26 -0800
>>> syzbot found the following issue on:
>>>
>>> HEAD commit:    9dbc1c03 Merge tag 'xfs-5.10-fixes-3' of git://git.kernel...
>>> git tree:   upstream
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1045303450
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=1735b7978b1c3721
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=95ce4b142579611ef0a9
>>> compiler:   gcc (GCC) 10.1.0-syz 20200507
>>> userspace arch: i386
>>>
>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>
> 
> I would like to see the reproducer for this. I just can't reproduce
> this problem.
> 
>>
>> Fix 96c2737716d5 ("usbip: move usbip kernel code out of staging")
>> by closing the race between readers and writer of ud->tcp_socket on
>> sock shutdown. To do that, add waitqueue for usbip device and make
>> writer wait for all readers to go home before releasing the socket.

Worrysome part for me is vhci_device_reset() which resets ud->tcp_socket to NULL
without waiting for tx thread to terminate, though I don't know if
vhci_device_reset() can be called while tx thread is running.

I'd like to try below debug printk() patch on linx-next tree, for this bug is
reported on linux.git and linux-next.git trees. Which git tree can be used for
sending this to-be-removed patch to linux-next.git ? I wish there is a kernel
config for fuzzers in linux.git so that every git tree can carry debug printk()
patch for syzbot's reports...

Subject: [PATCH] usb: usbip: vhci_hcd: add printk() for debug

This is linux-next only patch for examining a bug reported at
https://syzkaller.appspot.com/bug?id=3e1d941a31361efc4ced2ba8b4af2044d4e43c59 .

Signed-off-by: Tetsuo Handa 
---
 drivers/usb/usbip/stub_dev.c   |  6 ++
 drivers/usb/usbip/vhci_hcd.c   | 11 +++
 drivers/usb/usbip/vhci_sysfs.c |  4 
 drivers/usb/usbip/vhci_tx.c| 12 
 drivers/usb/usbip/vudc_dev.c   |  6 ++
 5 files changed, 39 insertions(+)

diff --git a/drivers/usb/usbip/stub_dev.c b/drivers/usb/usbip/stub_dev.c
index 2305d425e6c9..627f83c0ebc8 100644
--- a/drivers/usb/usbip/stub_dev.c
+++ b/drivers/usb/usbip/stub_dev.c
@@ -131,10 +131,14 @@ static void stub_shutdown_connection(struct usbip_device 
*ud)
 
/* 1. stop threads */
if (ud->tcp_rx) {
+   if (IS_BUILTIN(CONFIG_DEBUG_AID_FOR_SYZBOT))
+   pr_info("%s: stop rx %p\n", __func__, ud->tcp_rx);
kthread_stop_put(ud->tcp_rx);
ud->tcp_rx = NULL;
}
if (ud->tcp_tx) {
+   if (IS_BUILTIN(CONFIG_DEBUG_AID_FOR_SYZBOT))
+   pr_info("%s: stop tx %p\n", __func__, ud->tcp_tx);
kthread_stop_put(ud->tcp_tx);
ud->tcp_tx = NULL;
}
@@ -146,6 +150,8 @@ static void stub_shutdown_connection(struct usbip_device 
*ud)
 * not touch NULL socket.
 */
if (ud->tcp_socket) {
+   if (IS_BUILTIN(CONFIG_DEBUG_AID_FOR_SYZBOT))
+   pr_info("%s: close sock %p\n", __func__, 
ud->tcp_socket);
sockfd_put(ud->tcp_socket);
ud->tcp_socket = NULL;
ud->sockfd = -1;
diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c
index 3209b5ddd30c..9e95bf9330f5 100644
--- a/drivers/usb/usbip/vhci_hcd.c
+++ b/drivers/usb/usbip/vhci_hcd.c
@@ -1016,10 +1016,14 @@ static void vhci_shutdown_connection(struct 
usbip_device *ud)
 
/* kill threads related to this sdev */
if (vdev->ud.tcp_rx) {
+   if (IS_BUILTIN(CONFIG_DEBUG_AID_FOR_SYZBOT))
+   pr_info("%s: stop rx %p\n", __func__, vdev->ud.tcp_rx);
kthread_stop_put(vdev->ud.tcp_rx);
vdev->ud.tcp_rx = NULL;
}
if (vdev->ud.tcp_tx) {
+   if (IS_BUILTIN(CONFIG_DEBUG_AID_FOR_SYZBOT))
+   pr_info("%s: stop tx %p\n", __func__, vdev->ud.tcp_tx);
kthread_stop_put(vdev->ud.tcp_tx);
vdev->ud.tcp_tx = NULL;
}
@@ -1027,6 +1031,8 @@ static void vhci_shutdown_connection(struct usbip_device 
*ud)
 
/* active connection is closed */
if (vdev->ud.tcp_socket) {
+   if (IS_BUILTIN(CONFIG_DEBUG_AID_FOR_SYZBOT))
+   pr_info("%s: close sock %p\n", __func__, 
ud->tcp_socket);
sockfd_put(vdev->ud.tcp_socket);
vdev->ud.tcp_socket = NULL;
vdev->ud.sockfd = -1;
@@ -1074,6 +1080,11 @@ static void vhci_device_reset(s

Re: [PATCH] smackfs: restrict bytes count in smackfs write functions

2021-01-25 Thread Tetsuo Handa

On 2021/01/26 3:08, Casey Schaufler wrote:
> On 1/24/2021 6:36 AM, Sabyrzhan Tasbolatov wrote:
>> syzbot found WARNINGs in several smackfs write operations where
>> bytes count is passed to memdup_user_nul which exceeds
>> GFP MAX_ORDER. Check count size if bigger SMK_LONGLABEL,
>> for smk_write_syslog if bigger than PAGE_SIZE - 1.
>>
>> Reported-by: syzbot+a71a442385a0b2815...@syzkaller.appspotmail.com
>> Signed-off-by: Sabyrzhan Tasbolatov 
> 
> Thank you for the patch. Unfortunately, SMK_LONGLABEL isn't
> the right value in some of these cases. 
> 

Since it uses sscanf(), I think that whitespaces must be excluded from upper 
limit
check. I'm proposing adding __GFP_NOWARM on the memdup_user_nul() side at
https://lkml.kernel.org/r/20210120103436.11830-1-penguin-ker...@i-love.sakura.ne.jp
 .

Re: BUG: MAX_LOCKDEP_KEYS too low!

2021-01-23 Thread Tetsuo Handa

On 2021/01/23 20:26, Alexey Kardashevskiy wrote:
> Should not the first 8192 from "grep lock-classes /proc/lockdep_stats" 
> decrease
> after time (it does not), or once it has failed, it is permanent?

Since lockdep_unregister_key() becomes a no-op because graph_lock() returns 0
due to debug_locks being changed from 1 to 0 by __debug_locks_off() from
debug_locks_off() from debug_locks_off_graph_unlock(), lock-classes value in
/proc/lockdep_stats will not decrease after "BUG: MAX_LOCKDEP_KEYS too low!"
is printed.

Re: BUG: MAX_LOCKDEP_KEYS too low!

2021-01-23 Thread Tetsuo Handa

On 2021/01/23 15:35, Alexey Kardashevskiy wrote:
> this behaves quite different but still produces the message (i have 
> show_workqueue_state() right after the bug message):
> 
> 
> [   85.803991] BUG: MAX_LOCKDEP_KEYS too low!
> [   85.804338] turning off the locking correctness validator.
> [   85.804474] Showing busy workqueues and worker pools:
> [   85.804620] workqueue events_unbound: flags=0x2
> [   85.804764]   pwq 16: cpus=0-7 flags=0x4 nice=0 active=1/512 refcnt=3
> [   85.804965] in-flight: 81:bpf_map_free_deferred
> [   85.805229] workqueue events_power_efficient: flags=0x80
> [   85.805357]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [   85.805558] in-flight: 57:gc_worker
> [   85.805877] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 
> 82 24
> [   85.806147] pool 16: cpus=0-7 flags=0x4 nice=0 hung=69s workers=3 idle: 7 
> 251
> ^C[  100.129747] maxlockdep (5104) used greatest stack depth: 8032 bytes left
> 
> root@le-dbg:~# grep "lock-classes" /proc/lockdep_stats
>  lock-classes: 8192 [max: 8192]
> 

Right. Hillf's patch can reduce number of active workqueue's worker threads, for
only one worker thread can call bpf_map_free_deferred() (which is nice because
it avoids bloat of active= and refcnt= fields). But Hillf's patch is not for
fixing the cause of "BUG: MAX_LOCKDEP_KEYS too low!" message.

Like Dmitry mentioned, bpf syscall allows producing work items faster than
bpf_map_free_deferred() can consume. (And a similar problem is observed for
network namespaces.) Unless there is a bug that prevents bpf_map_free_deferred()
 from completing, the classical solution is to put pressure on producers (i.e.
slow down bpf syscall side) in a way that consumers (i.e. __bpf_map_put())
will not schedule thousands of backlog "struct bpf_map" works.

BPF: unbounded bpf_map_free_deferred problem

2021-01-22 Thread Tetsuo Handa

Hello, BPF developers.

Alexey Kardashevskiy is reporting that system_wq gets stuck due to flooding of
unbounded bpf_map_free_deferred work. Use of WQ_MEM_RECLAIM | WQ_HIGHPRI | 
WQ_UNBOUND
workqueue did not solve this problem. Is it possible that a refcount leak 
somewhere
preventing bpf_map_free_deferred from completing? Please see
https://lkml.kernel.org/r/CACT4Y+Z+kwPM=WUzJ-e359PWeLLqmF0w4Yxp1spzZ=+j0ek...@mail.gmail.com
 .

On 2021/01/23 7:53, Alexey Kardashevskiy wrote:
> 
> 
> On 23/01/2021 02:30, Tetsuo Handa wrote:
>> On 2021/01/22 22:28, Tetsuo Handa wrote:
>>> On 2021/01/22 21:10, Dmitry Vyukov wrote:
>>>> On Fri, Jan 22, 2021 at 1:03 PM Alexey Kardashevskiy  
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 22/01/2021 21:30, Tetsuo Handa wrote:
>>>>>> On 2021/01/22 18:16, Dmitry Vyukov wrote:
>>>>>>> The reproducer only does 2 bpf syscalls, so something is slowly leaking 
>>>>>>> in bpf.
>>>>>>> My first suspect would be one of these. Since workqueue is async, it
>>>>>>> may cause such slow drain that happens only when tasks are spawned
>>>>>>> fast. I don't know if there is a procfs/debugfs introspection file to
>>>>>>> monitor workqueue lengths to verify this hypothesis.
>>>>>>
>>>>>> If you can reproduce locally, you can call show_workqueue_state()
>>>>>> (part of SysRq-t) when hitting the limit.
>>>>>>
>>>>>> --- a/kernel/locking/lockdep.c
>>>>>> +++ b/kernel/locking/lockdep.c
>>>>>> @@ -1277,6 +1277,7 @@ register_lock_class(struct lockdep_map *lock, 
>>>>>> unsigned int subclass, int force)
>>>>>>
>>>>>>   print_lockdep_off("BUG: MAX_LOCKDEP_KEYS too low!");
>>>>>>   dump_stack();
>>>>>> +   show_workqueue_state();
>>>>>>   return NULL;
>>>>>>   }
>>>>>>   nr_lock_classes++;
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Here is the result:
>>>>> https://pastebin.com/rPn0Cytu
>>>>
>>>> Do you mind posting this publicly?
>>>> Yes, it seems that bpf_map_free_deferred is the problem (11138
>>>> outstanding callbacks).
>>>>
>>>
>>> Wow. Horribly stuck queue. I guess BPF wants to try WQ created by
>>>
>>>    alloc_workqueue("bpf_free_wq", WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 
>>> 0);
>>>
>>> rather than system_wq . You can add Tejun Heo  for WQ.
>>>
>>> Anyway, please post your result to ML.
> 
> 
> 
> https://pastebin.com/JfrmzguK is with the patch below applied. Seems less 
> output. Interestingly when I almost hit "send", OOM kicked in and tried 
> killing a bunch of "maxlockdep" processes (my test case):
> 
> [  891.037315] [  31007] 0 31007  281    5    49152    0  
>     1000 maxlockdep
> [  891.037540] [  31009] 0 31009  281    5    49152    0  
>     1000 maxlockdep
> [  891.037760] [  31012] 0 31012  281    5    49152    0  
>     1000 maxlockdep
> [  891.037980] [  31013] 0 31013  281    5    47104    0  
>    0 maxlockdep
> [  891.038210] [  31014] 0 31014  281    5    49152    0  
>     1000 maxlockdep
> [  891.038429] [  31018] 0 31018  281    5    47104    0  
>    0 maxlockdep
> [  891.038652] [  31019] 0 31019  281    5    49152    0  
>     1000 maxlockdep
> [  891.038874] [  31020] 0 31020  281    5    49152    0  
>     1000 maxlockdep
> [  891.039095] [  31021] 0 31021  281    5    49152    0  
>     1000 maxlockdep
> [  891.039317] [  31022] 0 31022  281    5    47104    0  
>    0 maxlockdep
> 
> 
> 
> 
> And (re)adding LKML and Tejun as suggested. Thanks,
> 
> 
>>>
>>
>> Does this patch (which is only compile tested) reduce number of pending works
>> when hitting "BUG: MAX_LOCKDEP_KEYS too low!" ?
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 07cb5d15e743..c6c6902090f0 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -41,6 +41,7 @@ struct bpf_local_storage_map;
>>   struct kobject;
>>   struct mem_c

[PATCH v4 (resend)] lockdep: Allow tuning tracing capacity constants.

2021-01-20 Thread Tetsuo Handa

Since syzkaller continues various test cases until the kernel crashes,
syzkaller tends to examine more locking dependencies than normal systems.
As a result, syzbot is reporting that the fuzz testing was terminated
due to hitting upper limits lockdep can track [1] [2] [3].

Peter Zijlstra does not want to allow tuning these limits via kernel
config options, for such change discourages thinking. But analysis via
/proc/lockdep* did not show any obvious culprit [4] [5]. It is possible
that many hundreds of kn->active lock instances are to some degree
contributing to these problems, but there is no means to verify whether
these instances are created for protecting same callback functions.
Unless Peter provides a way to make these instances per "which callback
functions the lock instance will call (identified by something like MD5
of string representations of callback functions which each lock instance
will protect)" than plain "serial number", I don't think that we can
verify the culprit.

[1] 
https://syzkaller.appspot.com/bug?id=3d97ba93fb3566000c1c59691ea427370d33ea1b
[2] 
https://syzkaller.appspot.com/bug?id=381cb436fe60dc03d7fd2a092b46d7f09542a72a
[3] 
https://syzkaller.appspot.com/bug?id=a588183ac34c1437fc0785e8f220e88282e5a29f
[4] 
https://lkml.kernel.org/r/4b8f7a57-fa20-47bd-48a0-ae35d860f...@i-love.sakura.ne.jp
[5] 
https://lkml.kernel.org/r/1c351187-253b-2d49-acaf-4563c63ae...@i-love.sakura.ne.jp

Reported-by: syzbot 
Reported-by: syzbot 
Reported-by: syzbot 
Signed-off-by: Tetsuo Handa 
Acked-by: Dmitry Vyukov 
---
 kernel/locking/lockdep.c   |  2 +-
 kernel/locking/lockdep_internals.h |  8 +++---
 lib/Kconfig.debug  | 40 ++
 3 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index c1418b47f625..c0553872668a 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1391,7 +1391,7 @@ static int add_lock_to_list(struct lock_class *this,
 /*
  * For good efficiency of modular, we use power of 2
  */
-#define MAX_CIRCULAR_QUEUE_SIZE4096UL
+#define MAX_CIRCULAR_QUEUE_SIZE(1UL << 
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS)
 #define CQ_MASK(MAX_CIRCULAR_QUEUE_SIZE-1)
 
 /*
diff --git a/kernel/locking/lockdep_internals.h 
b/kernel/locking/lockdep_internals.h
index de49f9e1c11b..ecb8662e7a4e 100644
--- a/kernel/locking/lockdep_internals.h
+++ b/kernel/locking/lockdep_internals.h
@@ -99,16 +99,16 @@ static const unsigned long LOCKF_USED_IN_IRQ_READ =
 #define MAX_STACK_TRACE_ENTRIES262144UL
 #define STACK_TRACE_HASH_SIZE  8192
 #else
-#define MAX_LOCKDEP_ENTRIES32768UL
+#define MAX_LOCKDEP_ENTRIES(1UL << CONFIG_LOCKDEP_BITS)
 
-#define MAX_LOCKDEP_CHAINS_BITS16
+#define MAX_LOCKDEP_CHAINS_BITSCONFIG_LOCKDEP_CHAINS_BITS
 
 /*
  * Stack-trace: tightly packed array of stack backtrace
  * addresses. Protected by the hash_lock.
  */
-#define MAX_STACK_TRACE_ENTRIES524288UL
-#define STACK_TRACE_HASH_SIZE  16384
+#define MAX_STACK_TRACE_ENTRIES(1UL << CONFIG_LOCKDEP_STACK_TRACE_BITS)
+#define STACK_TRACE_HASH_SIZE  (1 << CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS)
 #endif
 
 /*
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 7937265ef879..4cb84b499636 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1332,6 +1332,46 @@ config LOCKDEP
 config LOCKDEP_SMALL
bool
 
+config LOCKDEP_BITS
+   int "Bitsize for MAX_LOCKDEP_ENTRIES"
+   depends on LOCKDEP && !LOCKDEP_SMALL
+   range 10 30
+   default 15
+   help
+ Try increasing this value if you hit "BUG: MAX_LOCKDEP_ENTRIES too 
low!" message.
+
+config LOCKDEP_CHAINS_BITS
+   int "Bitsize for MAX_LOCKDEP_CHAINS"
+   depends on LOCKDEP && !LOCKDEP_SMALL
+   range 10 30
+   default 16
+   help
+ Try increasing this value if you hit "BUG: MAX_LOCKDEP_CHAINS too 
low!" message.
+
+config LOCKDEP_STACK_TRACE_BITS
+   int "Bitsize for MAX_STACK_TRACE_ENTRIES"
+   depends on LOCKDEP && !LOCKDEP_SMALL
+   range 10 30
+   default 19
+   help
+ Try increasing this value if you hit "BUG: MAX_STACK_TRACE_ENTRIES 
too low!" message.
+
+config LOCKDEP_STACK_TRACE_HASH_BITS
+   int "Bitsize for STACK_TRACE_HASH_SIZE"
+   depends on LOCKDEP && !LOCKDEP_SMALL
+   range 10 30
+   default 14
+   help
+ Try increasing this value if you need large MAX_STACK_TRACE_ENTRIES.
+
+config LOCKDEP_CIRCULAR_QUEUE_BITS
+   int "Bitsize for elements in circular_queue struct"
+   depends on LOCKDEP
+   range 10 30
+   default 12
+   help
+ Try increasing this value if you hit "lockdep bfs error:-1" warning 
due to __cq_enqueue() failure.
+
 config DEBUG_LOCKDEP
bool "Lock dependency engine debugging"
depends on DEBUG_KERNEL && LOCKDEP
-- 
2.18.4

Re: [Patch] fbcon: i want fbcon soft scrollback feature come back

2021-01-07 Thread Tetsuo Handa

On 2021/01/08 0:48, xuhuijie wrote:
> This commit 50145474f6ef(fbcon: remove soft scrollback code) remove soft 
> scrollback in
> fbcon. So the shift+PageDown and shift+PageUp is missing. But PageUp is a 
> vary important
> feature when system panic or reset. I can get log by PageUp before, but now 
> there is no
> way to get. Especially on the server system, we always use bmc to control 
> computer.
> So I hope the community can add this feature back.
> 

You can configure kdump for panic, and netconsole for reset.
(I don't know whether PageUp key works after panic...)

Re: Does uaccess_kernel() work for detecting kernel thread?

2021-01-05 Thread Tetsuo Handa

On 2021/01/05 16:59, Christoph Hellwig wrote:
> On Wed, Dec 23, 2020 at 07:11:38PM +0900, Tetsuo Handa wrote:
>> due to commit 5e6e9852d6f76e01 ("uaccess: add infrastructure for kernel
>> builds with set_fs()") and follow up changes. Don't we need to change this
>> "uaccess_kernel()" with "(current->flags & PF_KTHREAD)" ?
> 
> No.  The real problem here is that when a this funtion is called

Called by "who" ?
Called by "a userspace process" ?
Called by "a kernel thread" ?
Called by "an io_uring service thread" ?

>  under
> set_fs it allows kernel memory access for all user pointers, and due to
> the indirection in the playload allows reading or changing kernel
> memory.  A kthread does not have that issue.

If this uaccess_kernel() is intended to reject calling this function from
"a userspace process", uaccess_kernel() is failing to reject because
uaccess_kernel() is always "false" for x86.

If this uaccess_kernel() is intended to reject calling this function from
"a kernel thread", uaccess_kernel() is failing to reject because
uaccess_kernel() is always "false" for x86.

If this uaccess_kernel() is intended to reject calling this function from
"an io_uring service thread", uaccess_kernel() is failing to reject because
uaccess_kernel() is always "false" for x86.

What does uaccess_kernel() in sg_check_file_access() (and uhid_char_write(),
ib_safe_file_access(), bpfilter_process_sockopt() etc.) want to check?

> 
>>>> For another example, if uaccess_kernel() is "false" due to CONFIG_SET_FS=n,
>>>> isn't TOMOYO unexpectedly checking permissions for socket operations?
>>>
>>> Can someone explain WTF TOMOYO is even doing there?  A security module
>>> has absolutely no business checking what context it is called from, but
>>> must check the process credentials instead.
>>>
>>
>> TOMOYO distinguishes userspace processes and kernel threads, and grants
>> kernel threads implicit permissions to perform socket operations.
> 
> And this is the problem we need to fix.  A kernel thread can't just have
> implicit permissions only because it is a kernel thread.  Think of e.g.
> the io_uring service threads.

We can use (current->flags & (PF_KTHREAD | PF_IO_WORKER)) == PF_KTHREAD like
https://lkml.kernel.org/r/dacfb329-de66-d0cf-dcf9-f030ea137...@schaufler-ca.com 
does
in order to exclude e.g. the io_uring service threads, can't we?

[PATCH v4] lockdep: Allow tuning tracing capacity constants.

2021-01-01 Thread Tetsuo Handa

Since syzkaller continues various test cases until the kernel crashes,
syzkaller tends to examine more locking dependencies than normal systems.
As a result, syzbot is reporting that the fuzz testing was terminated
due to hitting upper limits lockdep can track [1] [2] [3].

Peter Zijlstra does not want to allow tuning these limits via kernel
config options, for such change discourages thinking. But analysis via
/proc/lockdep* did not show any obvious culprit [4] [5]. It is possible
that many hundreds of kn->active lock instances are to some degree
contributing to these problems, but there is no means to verify whether
these instances are created for protecting same callback functions.
Unless Peter provides a way to make these instances per "which callback
functions the lock instance will call (identified by something like MD5
of string representations of callback functions which each lock instance
will protect)" than plain "serial number", I don't think that we can
verify the culprit.

Therefore, despite Peter's objection, I push this patch again.
The ball is in Peter court. Peter, please don't ignore.

[1] 
https://syzkaller.appspot.com/bug?id=3d97ba93fb3566000c1c59691ea427370d33ea1b
[2] 
https://syzkaller.appspot.com/bug?id=381cb436fe60dc03d7fd2a092b46d7f09542a72a
[3] 
https://syzkaller.appspot.com/bug?id=a588183ac34c1437fc0785e8f220e88282e5a29f
[4] 
https://lkml.kernel.org/r/4b8f7a57-fa20-47bd-48a0-ae35d860f...@i-love.sakura.ne.jp
[5] 
https://lkml.kernel.org/r/1c351187-253b-2d49-acaf-4563c63ae...@i-love.sakura.ne.jp

Reported-by: syzbot 
Reported-by: syzbot 
Reported-by: syzbot 
Signed-off-by: Tetsuo Handa 
Acked-by: Dmitry Vyukov 
---
 kernel/locking/lockdep.c   |  2 +-
 kernel/locking/lockdep_internals.h |  8 +++---
 lib/Kconfig.debug  | 40 ++
 3 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index c1418b47f625..c0553872668a 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1391,7 +1391,7 @@ static int add_lock_to_list(struct lock_class *this,
 /*
  * For good efficiency of modular, we use power of 2
  */
-#define MAX_CIRCULAR_QUEUE_SIZE4096UL
+#define MAX_CIRCULAR_QUEUE_SIZE(1UL << 
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS)
 #define CQ_MASK(MAX_CIRCULAR_QUEUE_SIZE-1)
 
 /*
diff --git a/kernel/locking/lockdep_internals.h 
b/kernel/locking/lockdep_internals.h
index de49f9e1c11b..ecb8662e7a4e 100644
--- a/kernel/locking/lockdep_internals.h
+++ b/kernel/locking/lockdep_internals.h
@@ -99,16 +99,16 @@ static const unsigned long LOCKF_USED_IN_IRQ_READ =
 #define MAX_STACK_TRACE_ENTRIES262144UL
 #define STACK_TRACE_HASH_SIZE  8192
 #else
-#define MAX_LOCKDEP_ENTRIES32768UL
+#define MAX_LOCKDEP_ENTRIES(1UL << CONFIG_LOCKDEP_BITS)
 
-#define MAX_LOCKDEP_CHAINS_BITS16
+#define MAX_LOCKDEP_CHAINS_BITSCONFIG_LOCKDEP_CHAINS_BITS
 
 /*
  * Stack-trace: tightly packed array of stack backtrace
  * addresses. Protected by the hash_lock.
  */
-#define MAX_STACK_TRACE_ENTRIES524288UL
-#define STACK_TRACE_HASH_SIZE  16384
+#define MAX_STACK_TRACE_ENTRIES(1UL << CONFIG_LOCKDEP_STACK_TRACE_BITS)
+#define STACK_TRACE_HASH_SIZE  (1 << CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS)
 #endif
 
 /*
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e6e58b26e888..2c23939dc6a6 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1340,6 +1340,46 @@ config LOCKDEP
 config LOCKDEP_SMALL
bool
 
+config LOCKDEP_BITS
+   int "Bitsize for MAX_LOCKDEP_ENTRIES"
+   depends on LOCKDEP && !LOCKDEP_SMALL
+   range 10 30
+   default 15
+   help
+ Try increasing this value if you hit "BUG: MAX_LOCKDEP_ENTRIES too 
low!" message.
+
+config LOCKDEP_CHAINS_BITS
+   int "Bitsize for MAX_LOCKDEP_CHAINS"
+   depends on LOCKDEP && !LOCKDEP_SMALL
+   range 10 30
+   default 16
+   help
+ Try increasing this value if you hit "BUG: MAX_LOCKDEP_CHAINS too 
low!" message.
+
+config LOCKDEP_STACK_TRACE_BITS
+   int "Bitsize for MAX_STACK_TRACE_ENTRIES"
+   depends on LOCKDEP && !LOCKDEP_SMALL
+   range 10 30
+   default 19
+   help
+ Try increasing this value if you hit "BUG: MAX_STACK_TRACE_ENTRIES 
too low!" message.
+
+config LOCKDEP_STACK_TRACE_HASH_BITS
+   int "Bitsize for STACK_TRACE_HASH_SIZE"
+   depends on LOCKDEP && !LOCKDEP_SMALL
+   range 10 30
+   default 14
+   help
+ Try increasing this value if you need large MAX_STACK_TRACE_ENTRIES.
+
+config LOCKDEP_CIRCULAR_QUEUE_BITS
+   int "Bitsize for elements in circular_queue struct"
+   depends on LOCKDEP
+   range 10 30
+   default 12
+   help
+

Re: Does uaccess_kernel() work for detecting kernel thread?

2020-12-23 Thread Tetsuo Handa

On 2020/12/23 16:53, Christoph Hellwig wrote:
> On Tue, Dec 22, 2020 at 11:39:08PM +0900, Tetsuo Handa wrote:
>> For example, if uaccess_kernel() is "false" due to CONFIG_SET_FS=n,
>> isn't sg_check_file_access() failing to detect kernel context?
> 
> sg_check_file_access does exactly the right thing - fail for all kernel
> threads as those can't support the magic it does.

My question is, in Linux 5.10, sg_check_file_access() for x86 became

static int sg_check_file_access(struct file *filp, const char *caller)
{
if (filp->f_cred != current_real_cred()) {
pr_err_once("%s: process %d (%s) changed security contexts 
after opening file descriptor, this is not allowed.\n",
caller, task_tgid_vnr(current), current->comm);
return -EPERM;
}
if (0) {
pr_err_once("%s: process %d (%s) called from kernel context, 
this is not allowed.\n",
caller, task_tgid_vnr(current), current->comm);
return -EACCES;
}
return 0;
}

due to commit 5e6e9852d6f76e01 ("uaccess: add infrastructure for kernel
builds with set_fs()") and follow up changes. Don't we need to change this
"uaccess_kernel()" with "(current->flags & PF_KTHREAD)" ?

> 
>> For another example, if uaccess_kernel() is "false" due to CONFIG_SET_FS=n,
>> isn't TOMOYO unexpectedly checking permissions for socket operations?
> 
> Can someone explain WTF TOMOYO is even doing there?  A security module
> has absolutely no business checking what context it is called from, but
> must check the process credentials instead.
> 

TOMOYO distinguishes userspace processes and kernel threads, and grants
kernel threads implicit permissions to perform socket operations.
Since "uaccess_kernel()" became "0" for x86, TOMOYO is no longer able to
grant kernel threads implicit permissions to perform socket operations.
Since Eric says "For PF_IO_WORKER kernel threads which are running code on 
behalf
of a user we want to perform the ordinary permission checks.", I think that
TOMOYO wants to use "(current->flags & (PF_KTHREAD | PF_IO_WORKER)) == 
PF_KTHREAD"
instead.

Does uaccess_kernel() work for detecting kernel thread?

2020-12-22 Thread Tetsuo Handa

Commit db68ce10c4f0a27c ("new helper: uaccess_kernel()") replaced 
segment_eq(get_fs(), KERNEL_DS)
with uaccess_kernel(). But uaccess_kernel() became an unconditional "false" for 
some architectures
due to commit 5e6e9852d6f76e01 ("uaccess: add infrastructure for kernel builds 
with set_fs()") and
follow up changes in Linux 5.10. As a result, I guess that uaccess_kernel() can 
no longer be used
as a condition for checking whether current thread is a kernel thread or not.

For example, if uaccess_kernel() is "false" due to CONFIG_SET_FS=n,
isn't sg_check_file_access() failing to detect kernel context?

static int sg_check_file_access(struct file *filp, const char *caller)
{
if (filp->f_cred != current_real_cred()) {
pr_err_once("%s: process %d (%s) changed security contexts 
after opening file descriptor, this is not allowed.\n",
caller, task_tgid_vnr(current), current->comm);
return -EPERM;
}
if (uaccess_kernel()) {
pr_err_once("%s: process %d (%s) called from kernel context, 
this is not allowed.\n",
caller, task_tgid_vnr(current), current->comm);
return -EACCES;
}
return 0;
}

For another example, if uaccess_kernel() is "false" due to CONFIG_SET_FS=n,
isn't TOMOYO unexpectedly checking permissions for socket operations?

static bool tomoyo_kernel_service(void)
{
/* Nothing to do if I am a kernel service. */
return uaccess_kernel();
}

static u8 tomoyo_sock_family(struct sock *sk)
{
u8 family;

if (tomoyo_kernel_service())
return 0;
family = sk->sk_family;
switch (family) {
case PF_INET:
case PF_INET6:
case PF_UNIX:
return family;
default:
return 0;
}
}

Don't we need to replace such usage with something like (current->flags & 
PF_KTHREAD) ?
I don't know about io_uring, but according to
https://lkml.kernel.org/r/dacfb329-de66-d0cf-dcf9-f030ea137...@schaufler-ca.com 
,
should (current->flags & (PF_KTHREAD | PF_IO_WORKER)) == PF_KTHREAD be used 
instead?

Re: WARNING: suspicious RCU usage in modeset_lock

2020-12-18 Thread Tetsuo Handa

On Wed, Dec 16, 2020 at 5:16 PM Paul E. McKenney  wrote:
> In my experience, lockdep will indeed complain if an interrupt handler
> returns while in an RCU read-side critical section.

Can't we add lock status checks into the beginning and the end of interrupt 
handler functions
(e.g. whether "struct task_struct"->lockdep_depth did not change) ?

Re: [PATCH 2/4] hung_task: Replace "did_panic" with is_be_panic()

2020-12-18 Thread Tetsuo Handa

On 2020/12/18 21:59, Pavel Machek wrote:
> On Fri 2020-12-18 19:44:04, Xiaoming Ni wrote:
> Plus.. is_being_panic is not really english. "is_paniccing" would be
> closer...?

Or in_panic() ?

Re: [PATCH v2 00/10] allow unprivileged overlay mounts

2020-12-08 Thread Tetsuo Handa

On 2020/12/08 1:32, Miklos Szeredi wrote:
> A general observation is that overlayfs does not call security_path_*()
> hooks on the underlying fs.  I don't see this as a problem, because a
> simple bind mount done inside a private mount namespace also defeats the
> path based security checks.  Maybe I'm missing something here, so I'm
> interested in comments from AppArmor and Tomoyo developers.

Regarding TOMOYO, I don't want overlayfs to call security_path_*() hooks on the
underlying fs, but the reason is different. It is not because a simple bind 
mount
done inside a private mount namespace defeats the path based security checks.
TOMOYO does want to check what device/filesystem is mounted on which location. 
But
currently TOMOYO is failing to check it due to fsopen()/fsmount()/move_mount() 
API.

Re: [PATCH v3] lockdep: Allow tuning tracing capacity constants.

2020-12-04 Thread Tetsuo Handa

I attempted some analysis, but I couldn't find which lock is causing 
nr_list_entries == 32768.
Since "struct lock_list list_entries[MAX_LOCKDEP_ENTRIES]" is marked as 
"static" variable in lockdep.c ,
none of /proc/lockdep_stats /proc/lockdep /proc/lockdep_chains can report which 
lock is responsible
for "BUG: MAX_LOCKDEP_ENTRIES too low!" message?

 lock-classes: 3456 [max: 8192]
 direct dependencies: 32768 [max: 32768]
 indirect dependencies:  200953
 all direct dependencies:   1213844
 dependency chains:   56497 [max: 65536]
 dependency chain hlocks used:   249022 [max: 327680]
 dependency chain hlocks lost:0
 in-hardirq chains:  88
 in-softirq chains:2636
 in-process chains:   53773
 stack-trace entries:303496 [max: 524288]
 number of stack traces:  15714
 number of stack hash chains: 10103
 combined max dependencies:hardirq-safe locks: 57
 hardirq-unsafe locks: 2613
 softirq-safe locks:381
 softirq-unsafe locks: 2202
 irq-safe locks:395
 irq-unsafe locks: 2613
 hardirq-read-safe locks: 4
 hardirq-read-unsafe locks: 233
 softirq-read-safe locks:21
 softirq-read-unsafe locks: 218
 irq-read-safe locks:21
 irq-read-unsafe locks: 233
 uncategorized locks:   491
 unused locks:1
 max locking depth:  20
 max bfs queue depth:   749
 debug_locks: 0

 zapped classes:224
 zapped lock chains:640
 large chain blocks:  1

I was able to calculate "lock-classes:" and "indirect dependencies:" from 
/proc/lockdep , but
I failed to calculate "direct dependencies:" from /proc/lockdep . How can we 
calculate it?

grep ^FD: CrashLog-12016d5550 | wc -l
3456

grep -F -- '->' CrashLog-12016d5550 | wc -l
16004

grep '^ ->' CrashLog-12016d5550 | wc -l
14014

grep '^ ->' CrashLog-12016d5550 | sort -u | wc -l
2506

grep ^FD: CrashLog-12016d5550 | awk ' BEGIN { FD=0; BD=0; } { if ($1 == 
"FD:") { FD=FD+$2; BD=BD+$4; } } END { print FD" "BD; } '
200953 200856

Top FD entries tend to be long held locks (e.g. workqueue) whereas
top BD entries tend to be short held locks (e.g. scheduler and timekeeping).
But given that this result is from a VM with only 2 CPUs, I think that
this result does not match your

  Examples of bad annotations is getting every CPU a separate class, that
  leads to nr_cpus! chains if CPUs arbitrarily nest (nr_cpus^2 if there's
  only a single nesting level).

case.

awk ' { if ($1 == "FD:") print $2" "$6$7 } ' CrashLog-12016d5550 | sort -rV 
| head -n 40
1933 (wq_completion)usb_hub_wq
1932 (work_completion)(>events)
1730 (wq_completion)events
1508 (delayed_fput_work).work
1332 >s_type->i_mutex_key#13
1227 >f_pos_lock
1173 sb_writers#5
1095 (wq_completion)netns
1094 net_cleanup_work
1094 masq_mutex
1092 pernet_ops_rwsem
1046 (work_completion)(>wq)
1046 (work_completion)(>rq)
1045 >mutex/1
1043 tty_mutex
1042 >legacy_mutex
1039 >lock
1035 >legacy_mutex/1
1033 >ldisc_sem
1009 cb_lock
1007 (work_completion)(>work)
1007 nlk_cb_mutex-GENERIC
1006 genl_mutex
985 sb_writers#8
984 sb_writers#13
984 sb_writers#6
979 >mutex
964 (work_completion)(_work->work)
946 (wq_completion)events_power_efficient
943 _nl_types[idx].sem
942 link_ops_rwsem
939 (wq_completion)gid-cache-wq
937 (work_completion)(>work)
937 (work_completion)(_work->work)
936 devices_rwsem
931 kn->active#88
930 nsim_bus_dev_list_lock
926 rdma_nets_rwsem
924 >compat_devs_mutex
922 (wq_completion)ipv6_addrconf

awk ' { if ($3 == "BD:") print $4" "$6$7 } ' CrashLog-12016d5550 | sort 
-urV | head -n 40
1618 pool_lock
1617 _hash[i].lock
1510 >lock
1489 pvclock_gtod_data
1487 tk_core.seq.seqcount
1465 hrtimer_bases.lock
1460 _cpu_ptr(group->pcpu,cpu)->seq
1459 _rq->rt_runtime_lock
1459 >lock/1
1458 _b->rt_runtime_lock
1458 >lock
1457 >delays->lock
1457 >lock/1
1457 >rto_lock
1457 >lock
1457 _rq->removed.lock
1457 _b->lock
1457 per_cpu_ptr(_rstat_cpu_lock,cpu)
1456 >lock
1299 >pi_lock
1266 depot_lock
1114 &s->seqcount
1085 >list_lock
1071 >lock
1039 >wait#13
1039 wq_mayday_lock
1038 >lock
869 quarantine_lock
859 _lock
854 rcu_node_0
827 _state.expedited_wq
812 key#10
808 >deferred_split_queue.split_queue_lock
807 >lru_lock
806 lock#3
804 >refs_lock
798 >private_lock
788 _state.gp_wq
782 _wait_table[i]
778 pgd_lock

grep -F -- '->kn->active#' CrashLog-12016d5550 | wc -l
533
grep -F -- '->kn->active#' CrashLog-12016d5550 | sort -u | wc -l
439



FD:  931 BD:1 .+.+: kn->active#88
 ->fs_reclaim
 ->&s->seqcount
 ->kernfs_open_file_mutex
 ->nsim_bus_dev_list_lock

Re: [PATCH v3] lockdep: Allow tuning tracing capacity constants.

2020-12-03 Thread Tetsuo Handa

On Sun, Nov 22, 2020 at 2:56 AM Tetsuo Handa 
 wrote:
> Peter, you guessed that the culprit is sysfs at
> https://lkml.kernel.org/r/20200916115057.go2...@hirez.programming.kicks-ass.net
>  , but
> syzbot reported at 
> https://syzkaller.appspot.com/text?tag=MachineInfo=99b8f2b092d9714f
> that "BUG: MAX_LOCKDEP_ENTRIES too low!" can occur on a VM with only 2 CPUs.
> Is your guess catching the culprit?
>
> We could improve a few locks, but as a whole we won't be able to afford 
> keeping
> sum of individual dependencies under current threshold. Therefore, allow 
> lockdep to
> tune the capacity and allow syzkaller to dump when reaching the capacity will 
> be
> the way to go.

An example dump from a VM with only 2 CPUs
( https://syzkaller.appspot.com/text?tag=CrashLog=12016d5550 ) had
a chain with more than kn->active#number 150 entries.

kn->active#number is initialized by __kernfs_create_file() but it is not
deinitialized (there is no method for erasing lockdep map?) by kernfs_put().
Since syzkaller continues object creation/deletion via fuzz testing until
kernel crashes, __kernfs_create_file() are called for so many times?



struct kernfs_node *__kernfs_create_file(struct kernfs_node *parent,
 const char *name,
 umode_t mode, kuid_t uid, kgid_t gid,
 loff_t size,
 const struct kernfs_ops *ops,
 void *priv, const void *ns,
 struct lock_class_key *key)
{
(...snipped...)
kn = kernfs_new_node(parent, name, (mode & S_IALLUGO) | S_IFREG,
 uid, gid, flags);
if (!kn)
return ERR_PTR(-ENOMEM);

kn->attr.ops = ops;
kn->attr.size = size;
kn->ns = ns;
kn->priv = priv;

#ifdef CONFIG_DEBUG_LOCK_ALLOC
if (key) {
lockdep_init_map(>dep_map, "kn->active", key, 0);
kn->flags |= KERNFS_LOCKDEP;
}
#endif
(...snipped...)
}

void kernfs_put(struct kernfs_node *kn)
{
(...snipped...)
kmem_cache_free(kernfs_node_cache, kn);
(...snipped...)
}



FD:  919 BD:   59 +.+.: rtnl_mutex
 ->fs_reclaim
 ->depot_lock
 ->pcpu_alloc_mutex
 ->>wait#6
 ->_hash[i].lock
 ->gdp_mutex
 ->>list_lock
 ->lock#2
 ->kernfs_mutex
 ->bus_type_sem
 ->sysfs_symlink_target_lock
 ->&s->seqcount
 ->>power.lock
 ->dpm_list_mtx
 ->uevent_sock_mutex
 ->>lock
 ->running_helpers_waitq.lock
 ->subsys mutex#17
 ->dev_hotplug_mutex
 ->dev_base_lock
 ->input_pool.lock
 ->nl_table_lock
 ->nl_table_wait.lock
 ->net_rwsem
 ->>lock
 ->sysctl_lock
 ->>lock
 ->krc.lock
 ->cpu_hotplug_lock
 ->kthread_create_lock
 ->>pi_lock
 ->_rq->removed.lock
 ->>wait
 ->>alloc_lock
 ->_rwsem
 ->wq_pool_mutex
 ->>lock
 ->rcu_node_0
 ->_state.expedited_wq
 ->>list_lock
 ->>lock
 ->pool_lock
 ->lweventlist_lock
 ->rtnl_mutex.wait_lock
 ->proc_subdir_lock
 ->proc_inum_ida.xa_lock
 ->>k_lock
 ->pcpu_lock
 ->param_lock
 ->logbuf_lock
 ->(console_sem).lock
 ->subsys mutex#52
 ->pin_fs_lock
 ->>s_type->i_mutex_key#3
 ->>iflist_mtx
 ->>mtx
 ->subsys mutex#53
 ->>sec_mtx
 ->>iflist_mtx#2
 ->lock#6
 ->net_dm_mutex
 ->failover_lock
 ->>lock
 ->>lock
 ->>lock
 ->>mc_lock
 ->>mca_lock
 ->>lock
 ->j1939_netdev_lock
 ->>lock
 ->>sock_lock
 ->_requests_lock
 ->_pending_beacons_lock
 ->_indoor_lock
 ->_xmit_LOOPBACK
 ->(inetaddr_validator_chain).rwsem
 ->(inetaddr_chain).rwsem
 ->_dev->mc_tomb_lock
 ->>lock
 ->fib_info_lock
 ->cbs_list_lock
 ->taprio_list_lock
 ->(inet6addr_validator_chain).rwsem
 ->addrconf_hash_lock
 ->>lock
 ->>tb6_lock
 ->rlock-AF_NETLINK
 ->_xmit_ETHER
 ->noop_qdisc.q.lock
 ->quarantine_lock
 ->kernfs_mutex.wait_lock
 ->x25_neigh_list_lock
 ->>mmap_lock#2
 ->free_vmap_area_lock
 ->vmap_area_lock
 ->>lock
 ->>lock
 ->>hash_lock
 ->init_mm.page_table_lock
 ->_netdev_addr_lock_key
 ->listen_lock
 ->>consumer_lock
 ->batched_entropy_u32.lock
 ->(switchdev_blocking_notif_chain).rwsem
 ->>hash_lock
 ->nf_hook_mutex
 ->console_owner_lock
 ->console_owner
 ->_priv->tvlv.handler_list_lock
 ->_priv->tvlv.container_list_lock
 ->_priv->softif_vlan_list_lock
 ->key#14
 ->_priv->tt.changes_list_lock
 ->tk_core.seq.seqcount
 ->>mutex
 ->init_lock
 ->deferred_lock
 ->target_list_lock
 ->>lock
 ->_addr_list_lock_key/1
 ->tea

Re: [PATCH] tomoyo: Avoid potential null pointer access

2020-11-27 Thread Tetsuo Handa

On 2020/11/27 16:17, Zheng Zengkai wrote:
> Hello Tetsuo,
>> On 2020/11/26 15:33, Zheng Zengkai wrote:
>>> As your say,  I found the function tomoyo_assign_namespace( )
>>>
>>> in security/tomoyo/domain.c has the similar situation,
>>>
>>> Can I add __GFP_NOWARN for both and remove the null check for _entry_ in 
>>> tomoyo_assign_namespace( )?
>>>
>> Good catch. Yes, please send as a patch.
>> .
> 
> I have resent a patch, thanks!
> 

Applied to tomoyo-test1.git tree. Thank you.

By the way, since some people automatically backport patches with Fixes: tag,
I think we don't need to add Fixes: tag for patches like this one.

Re: [PATCH] tomoyo: Avoid potential null pointer access

2020-11-25 Thread Tetsuo Handa

On 2020/11/26 15:33, Zheng Zengkai wrote:
> As your say,  I found the function tomoyo_assign_namespace( )
> 
> in security/tomoyo/domain.c has the similar situation,
> 
> Can I add __GFP_NOWARN for both and remove the null check for _entry_ in 
> tomoyo_assign_namespace( )?
> 

Good catch. Yes, please send as a patch.

Re: [PATCH] tomoyo: Avoid potential null pointer access

2020-11-25 Thread Tetsuo Handa

Hello, Zheng.

Thank you for a patch, but I won't apply this patch.
Expected behavior is that tomoyo_warn_oom() is called
if tomoyo_memory_ok() is called with entry == NULL.

Adding __GFP_NOWARN might be OK, but returning without tomoyo_warn_oom() is NG.

On 2020/11/25 21:10, Zheng Zengkai wrote:
> Calls to kzalloc() should be null-checked in order to avoid
> any potential failures or unnecessary code execution.
> Fix this by adding null checks for _entry_ right after allocation.
> 
> Fixes: 57c2590fb7fd ("TOMOYO: Update profile structure")
> Reported-by: Hulk Robot 
> Signed-off-by: Zheng Zengkai 

Nacked-by: Tetsuo Handa 

> ---
>  security/tomoyo/common.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/security/tomoyo/common.c b/security/tomoyo/common.c
> index 4bee32bfe16d..99b4fafcb100 100644
> --- a/security/tomoyo/common.c
> +++ b/security/tomoyo/common.c
> @@ -499,6 +499,8 @@ static struct tomoyo_profile *tomoyo_assign_profile
>   if (ptr)
>   return ptr;
>   entry = kzalloc(sizeof(*entry), GFP_NOFS);
> + if (!entry)
> + return NULL;
>   if (mutex_lock_interruptible(_policy_lock))
>   goto out;
>   ptr = ns->profile_ptr[profile];
>

Re: [PATCH v3] lockdep: Allow tuning tracing capacity constants.

2020-11-21 Thread Tetsuo Handa

On 2020/11/20 18:27, Dmitry Vyukov wrote:
> Peter, so far it looks like just a very large, but normal graph to me.
> The cheapest from an engineering point of view solution would be just
> to increase the constants. I assume a 2x increase should buy us lots
> of time to overflow.
> I can think of more elaborate solutions, e.g. using bitmasks to
> represent hot leaf and top-level locks. But it will both increase the
> resulting code complexity (no uniform representation anymore, all code
> will need to deal with different representations) and require some
> time investments (that I can't justify for me at least as compared to
> just throwing a bit more machine memory at it). And in the end it
> won't really reduce the size of the graph.
> What do you think?
> 

Yes, I think it is a normal graph; simply syzkaller kernels tend to record
a few times more dependencies than my idle kernel (shown bottom).

Peter, you guessed that the culprit is sysfs at
https://lkml.kernel.org/r/20200916115057.go2...@hirez.programming.kicks-ass.net 
, but
syzbot reported at 
https://syzkaller.appspot.com/text?tag=MachineInfo=99b8f2b092d9714f
that "BUG: MAX_LOCKDEP_ENTRIES too low!" can occur on a VM with only 2 CPUs.
Is your guess catching the culprit?

We could improve a few locks, but as a whole we won't be able to afford keeping
sum of individual dependencies under current threshold. Therefore, allow 
lockdep to
tune the capacity and allow syzkaller to dump when reaching the capacity will be
the way to go.



# cat /proc/lockdep_stats
 lock-classes: 1236 [max: 8192]
 direct dependencies:  9610 [max: 32768]
 indirect dependencies:   40401
 all direct dependencies:174635
 dependency chains:   11398 [max: 65536]
 dependency chain hlocks used:42830 [max: 327680]
 dependency chain hlocks lost:0
 in-hardirq chains:  61
 in-softirq chains: 414
 in-process chains:   10923
 stack-trace entries: 93041 [max: 524288]
 number of stack traces:   4997
 number of stack hash chains:  4292
 combined max dependencies:   281074520
 hardirq-safe locks: 50
 hardirq-unsafe locks:  805
 softirq-safe locks:146
 softirq-unsafe locks:  722
 irq-safe locks:155
 irq-unsafe locks:  805
 hardirq-read-safe locks: 2
 hardirq-read-unsafe locks: 129
 softirq-read-safe locks:11
 softirq-read-unsafe locks: 123
 irq-read-safe locks:11
 irq-read-unsafe locks: 129
 uncategorized locks:   224
 unused locks:0
 max locking depth:  15
 max bfs queue depth:   215
 chain lookup misses: 11664
 chain lookup hits:37393935
 cyclic checks:   11053
 redundant checks:0
 redundant links: 0
 find-mask forwards checks:1588
 find-mask backwards checks:   1779
 hardirq on events:17502380
 hardirq off events:   17502376
 redundant hardirq ons:   0
 redundant hardirq offs:  0
 softirq on events:   90845
 softirq off events:  90845
 redundant softirq ons:   0
 redundant softirq offs:  0
 debug_locks: 1

 zapped classes:  0
 zapped lock chains:  0
 large chain blocks:  1
# awk ' { if ($2 == "OPS:") print $5" "$9 } ' /proc/lockdep | sort -rV | head 
-n 30
423 (wq_completion)events
405 (wq_completion)events_unbound
393 >f_pos_lock
355 >lock
349 sb_writers#3
342 sb_writers#6
338 >mutex
330 (work_completion)(>work)
330 pernet_ops_rwsem
289 epmutex
288 >mtx
281 tty_mutex
280 >legacy_mutex
273 >legacy_mutex/1
269 >ldisc_sem
268 (wq_completion)ipv6_addrconf
266 (work_completion)(&(>dad_work)->work)
266 (linkwatch_work).work
266 (addr_chk_work).work
266 >atomic_read_lock
265 (work_completion)(>work)
265 rtnl_mutex
263 >atomic_write_lock
262 >lock
261 >termios_rwsem
242 >buf.lock/1
242 kn->active#40
241 _tty->termios_rwsem/1
240 registration_lock
239 >output_lock
# awk ' { if ($2 == "OPS:") print $7" "$9 } ' /proc/lockdep | sort -rV | head 
-n 30
642 pool_lock
641 _hash[i].lock
582 tk_core.seq.seqcount
561 hrtimer_bases.lock
560 _rq->rt_runtime_lock
559 _b->rt_runtime_lock
559 >lock
559 _rq->removed.lock
559 _b->lock
558 >lock
550 >delays->lock
549 >pi_lock
506 >lock
504 >list_lock
432 &s->seqcount
404 >wait#10
401 >lock
369 >lock
330 rcu_node_0
326 &(kretprobe_table_locks[i].lock)
326 pgd_lock
324 >lru_lock
323 lock#5
321 _wait_table[i]
319 ptlock_ptr(page)#2/1
318 ptlock_ptr(page)#2

Re: [PATCH v3] lockdep: Allow tuning tracing capacity constants.

2020-11-19 Thread Tetsuo Handa

On 2020/11/19 23:30, Dmitry Vyukov wrote:
> p.s. it's indeed huge, full log was 11MB, this probably won't be
> chewed by syzbot.

Is "cat /proc/lockdep*" output written from userspace?
Then, we could try "xz -9" + "base64" for recording.

> Peter, are these [hex numbers] needed? Could we strip them during
> post-processing? At first sight they look like derivatives of the
> name.

kernel/locking/lockdep.c uses %px (raw address) whereas
kernel/locking/lockdep_proc.c uses %p (__ptr_to_hashval() value).
I think saving these [hashed hex numbers] is unlikely useful.
At least for this testing, we can strip leading  part.

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 3816 matches

Mail list logo