Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Sat, Dec 30, 2017 at 00:30:32 +0800, weiping zhangwrote: 1. Add proper SELINUX policy that give permission to mdadm for debugfs. 2. Split mdadm into 2 part, Firstly, user proccess mdadm trigger a kwork, secondly kwork will create gendisk)and mdadm wait it done, Like following: diff --git a/drivers/md/md.c b/drivers/md/md.c Is that patch ready to be tested? Fedora hasn't built an rc5 kernel yet, probably because a lot of people are off work this week. So I haven't done that test yet.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Fri, Dec 22, 2017 at 08:04:23AM -0600, Bruno Wolff III wrote: > On Fri, Dec 22, 2017 at 21:20:10 +0800, > weiping zhangwrote: > >2017-12-22 12:53 GMT+08:00 Bruno Wolff III : > >>On Thu, Dec 21, 2017 at 17:16:03 -0600, > >> Bruno Wolff III wrote: > >>> > >>> > >>>Enforcing mode alone isn't enough as I tested that one one machine at home > >>>and it didn't trigger the problem. I'll try another machine late tonight. > >> > >> > >>I got the problem to occur on my i686 machine when booting in enforcing > >>mode. This machine uses raid 1 vua mdraid which may or may not be a factor > >>in this problem. The boot log has a trace at the end and might be helpful, > >>so I'm attaching it here. > >Hi Bruno, > >I can reproduce this issue in my QEMU test VM easily, just add an soft > >RAID1, always trigger > >that warning, I'll debug it later. > > Great. When you have a fix, I can test it. This issue can trigger easily in Centos7.3, if meet two factors: 1. SELINUX in enforceing mode 2. mdadm try to create new gendisk. if disable SELINUX or let it in permissive mode, issue disappear. As Jens has revert that commit, it seems boot normally, actually this is no diretor created under /sys/kernel/debug/bdi/, though has no effect on disk workflow. As james said before, "debugfs files should be treated as optional", so kernel give warning here is enough. So there are 2 ways to fix this issue: 1. Add proper SELINUX policy allow mdadm create dir at debugfs 2. mdadm don't create gendisk directly, first mdadm trigger a kwork and wait it done, let kwork create gendisk. A possible change for MD like following: diff --git a/drivers/md/md.c b/drivers/md/md.c index 4e4dee0..86ead5a 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -90,6 +90,7 @@ EXPORT_SYMBOL(md_cluster_mod); static DECLARE_WAIT_QUEUE_HEAD(resync_wait); +static struct workqueue_struct *md_probe_wq; static struct workqueue_struct *md_wq; static struct workqueue_struct *md_misc_wq; @@ -5367,10 +5368,27 @@ static int md_alloc(dev_t dev, char *name) return error; } +static void md_probe_work_fn(struct work_struct *ws) +{ + struct md_probe_work *mpw = container_of(ws, struct md_probe_work, + work); + md_alloc(mpw->dev, NULL); + mpw->done = 1; + wake_up(>wait); +} + static struct kobject *md_probe(dev_t dev, int *part, void *data) { - if (create_on_open) - md_alloc(dev, NULL); + struct md_probe_work mpw; + + if (create_on_open) { + init_waitqueue_head(); + mpw.dev = dev; + mpw.done = 0; + INIT_WORK(, md_probe_work_fn); + queue_work(md_probe_wq, ); + wait_event(mpw.wait, mpw.done); + } return NULL; } @@ -9023,9 +9041,13 @@ static int __init md_init(void) { int ret = -ENOMEM; + md_probe_wq = alloc_workqueue("md_probe", 0, 0); + if (!md_probe_wq) + goto err_wq; + md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0); if (!md_wq) - goto err_wq; + goto err_probe_wq; md_misc_wq = alloc_workqueue("md_misc", 0, 0); if (!md_misc_wq) @@ -9055,6 +9077,8 @@ static int __init md_init(void) destroy_workqueue(md_misc_wq); err_misc_wq: destroy_workqueue(md_wq); +err_probe_wq: + destroy_workqueue(md_probe_wq); err_wq: return ret; } @@ -9311,6 +9335,7 @@ static __exit void md_exit(void) } destroy_workqueue(md_misc_wq); destroy_workqueue(md_wq); + destroy_workqueue(md_probe_wq); } subsys_initcall(md_init); diff --git a/drivers/md/md.h b/drivers/md/md.h index 7d6bcf0..3953896 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -487,6 +487,13 @@ enum recovery_flags { MD_RECOVERY_ERROR, /* sync-action interrupted because io-error */ }; +struct md_probe_work { + struct work_struct work; + wait_queue_head_t wait; + dev_t dev; + int done; +}; + static inline int __must_check mddev_lock(struct mddev *mddev) { return mutex_lock_interruptible(>reconfig_mutex);
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Fri, Dec 22, 2017 at 08:04:23AM -0600, Bruno Wolff III wrote: > On Fri, Dec 22, 2017 at 21:20:10 +0800, > weiping zhangwrote: > >2017-12-22 12:53 GMT+08:00 Bruno Wolff III : > >>On Thu, Dec 21, 2017 at 17:16:03 -0600, > >> Bruno Wolff III wrote: > >>> > >>> > >>>Enforcing mode alone isn't enough as I tested that one one machine at home > >>>and it didn't trigger the problem. I'll try another machine late tonight. > >> > >> > >>I got the problem to occur on my i686 machine when booting in enforcing > >>mode. This machine uses raid 1 vua mdraid which may or may not be a factor > >>in this problem. The boot log has a trace at the end and might be helpful, > >>so I'm attaching it here. > >Hi Bruno, > >I can reproduce this issue in my QEMU test VM easily, just add an soft > >RAID1, always trigger > >that warning, I'll debug it later. > > Great. When you have a fix, I can test it. This issue can trigger easily in Centos7.3 + kernel-4.15-rc3, if meet two factors: 1. SELINUX in enforceing mode 2. mdadm try to create new gendisk. if disable SELINUX or let it in permissive mode, issue disappear. As Jens has revert that commit, it seems boot normally, actually there is no diretory created under /sys/kernel/debug/bdi/, though has no effect on disk workflow. As James said before, "debugfs files should be treated as optional", so kernel give warning here is enough. So, we may solve this issue in two ways: 1. Add proper SELINUX policy that give permission to mdadm for debugfs. 2. Split mdadm into 2 part, Firstly, user proccess mdadm trigger a kwork, secondly kwork will create gendisk)and mdadm wait it done, Like following: diff --git a/drivers/md/md.c b/drivers/md/md.c index 4e4dee0..86ead5a 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -90,6 +90,7 @@ EXPORT_SYMBOL(md_cluster_mod); static DECLARE_WAIT_QUEUE_HEAD(resync_wait); +static struct workqueue_struct *md_probe_wq; static struct workqueue_struct *md_wq; static struct workqueue_struct *md_misc_wq; @@ -5367,10 +5368,27 @@ static int md_alloc(dev_t dev, char *name) return error; } +static void md_probe_work_fn(struct work_struct *ws) +{ + struct md_probe_work *mpw = container_of(ws, struct md_probe_work, + work); + md_alloc(mpw->dev, NULL); + mpw->done = 1; + wake_up(>wait); +} + static struct kobject *md_probe(dev_t dev, int *part, void *data) { - if (create_on_open) - md_alloc(dev, NULL); + struct md_probe_work mpw; + + if (create_on_open) { + init_waitqueue_head(); + mpw.dev = dev; + mpw.done = 0; + INIT_WORK(, md_probe_work_fn); + queue_work(md_probe_wq, ); + wait_event(mpw.wait, mpw.done); + } return NULL; } @@ -9023,9 +9041,13 @@ static int __init md_init(void) { int ret = -ENOMEM; + md_probe_wq = alloc_workqueue("md_probe", 0, 0); + if (!md_probe_wq) + goto err_wq; + md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0); if (!md_wq) - goto err_wq; + goto err_probe_wq; md_misc_wq = alloc_workqueue("md_misc", 0, 0); if (!md_misc_wq) @@ -9055,6 +9077,8 @@ static int __init md_init(void) destroy_workqueue(md_misc_wq); err_misc_wq: destroy_workqueue(md_wq); +err_probe_wq: + destroy_workqueue(md_probe_wq); err_wq: return ret; } @@ -9311,6 +9335,7 @@ static __exit void md_exit(void) } destroy_workqueue(md_misc_wq); destroy_workqueue(md_wq); + destroy_workqueue(md_probe_wq); } subsys_initcall(md_init); diff --git a/drivers/md/md.h b/drivers/md/md.h index 7d6bcf0..3953896 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -487,6 +487,13 @@ enum recovery_flags { MD_RECOVERY_ERROR, /* sync-action interrupted because io-error */ }; +struct md_probe_work { + struct work_struct work; + wait_queue_head_t wait; + dev_t dev; + int done; +}; + static inline int __must_check mddev_lock(struct mddev *mddev) { return mutex_lock_interruptible(>reconfig_mutex);
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Fri, Dec 22, 2017 at 21:20:10 +0800, weiping zhangwrote: 2017-12-22 12:53 GMT+08:00 Bruno Wolff III : On Thu, Dec 21, 2017 at 17:16:03 -0600, Bruno Wolff III wrote: Enforcing mode alone isn't enough as I tested that one one machine at home and it didn't trigger the problem. I'll try another machine late tonight. I got the problem to occur on my i686 machine when booting in enforcing mode. This machine uses raid 1 vua mdraid which may or may not be a factor in this problem. The boot log has a trace at the end and might be helpful, so I'm attaching it here. Hi Bruno, I can reproduce this issue in my QEMU test VM easily, just add an soft RAID1, always trigger that warning, I'll debug it later. Great. When you have a fix, I can test it.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
2017-12-22 12:53 GMT+08:00 Bruno Wolff III: > On Thu, Dec 21, 2017 at 17:16:03 -0600, > Bruno Wolff III wrote: >> >> >> Enforcing mode alone isn't enough as I tested that one one machine at home >> and it didn't trigger the problem. I'll try another machine late tonight. > > > I got the problem to occur on my i686 machine when booting in enforcing > mode. This machine uses raid 1 vua mdraid which may or may not be a factor > in this problem. The boot log has a trace at the end and might be helpful, > so I'm attaching it here. Hi Bruno, I can reproduce this issue in my QEMU test VM easily, just add an soft RAID1, always trigger that warning, I'll debug it later.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Thu, Dec 21, 2017 at 17:16:03 -0600, Bruno Wolff IIIwrote: Enforcing mode alone isn't enough as I tested that one one machine at home and it didn't trigger the problem. I'll try another machine late tonight. I got the problem to occur on my i686 machine when booting in enforcing mode. This machine uses raid 1 vua mdraid which may or may not be a factor in this problem. The boot log has a trace at the end and might be helpful, so I'm attaching it here. -- Logs begin at Sun 2017-09-24 07:43:45 CDT, end at Thu 2017-12-21 22:46:47 CST. -- Dec 21 21:36:32 wolff.to kernel: Linux version 4.15.0-0.rc4.git1.2.fc28.i686 (mockbu...@buildvm-15.phx2.fedoraproject.org) (gcc version 7.2.1 20170915 (Red Hat 7.2.1-4) (GCC)) #1 SMP Tue Dec 19 17:26:41 UTC 2017 Dec 21 21:36:32 wolff.to kernel: x86/fpu: x87 FPU will use FXSAVE Dec 21 21:36:32 wolff.to kernel: e820: BIOS-provided physical RAM map: Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 0x-0x0009cbff] usable Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 0x0009cc00-0x0009] reserved Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 0x000e-0x000f] reserved Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 0x0010-0xbfee] usable Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 0xbfef-0xbfefbfff] ACPI data Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 0xbfefc000-0xbfef] ACPI NVS Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 0xbff0-0xbff7] usable Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 0xbff8-0xbfff] reserved Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 0xfec0-0xfec0] reserved Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 0xfee0-0xfee00fff] reserved Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 0xff80-0xffbf] reserved Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 0xfff0-0x] reserved Dec 21 21:36:32 wolff.to kernel: Notice: NX (Execute Disable) protection missing in CPU! Dec 21 21:36:32 wolff.to kernel: random: fast init done Dec 21 21:36:32 wolff.to kernel: SMBIOS 2.32 present. Dec 21 21:36:32 wolff.to kernel: DMI: Hewlett-Packard hp workstation xw8000/0844, BIOS JQ.W1.19US 04/13/05 Dec 21 21:36:32 wolff.to kernel: e820: update [mem 0x-0x0fff] usable ==> reserved Dec 21 21:36:32 wolff.to kernel: e820: remove [mem 0x000a-0x000f] usable Dec 21 21:36:32 wolff.to kernel: e820: last_pfn = 0xbff80 max_arch_pfn = 0x10 Dec 21 21:36:32 wolff.to kernel: MTRR default type: uncachable Dec 21 21:36:32 wolff.to kernel: MTRR fixed ranges enabled: Dec 21 21:36:32 wolff.to kernel: 0-9 write-back Dec 21 21:36:32 wolff.to kernel: A-B uncachable Dec 21 21:36:32 wolff.to kernel: C-F write-protect Dec 21 21:36:32 wolff.to kernel: MTRR variable ranges enabled: Dec 21 21:36:32 wolff.to kernel: 0 base 0 mask F8000 write-back Dec 21 21:36:32 wolff.to kernel: 1 base 08000 mask FC000 write-back Dec 21 21:36:32 wolff.to kernel: 2 disabled Dec 21 21:36:32 wolff.to kernel: 3 disabled Dec 21 21:36:32 wolff.to kernel: 4 disabled Dec 21 21:36:32 wolff.to kernel: 5 disabled Dec 21 21:36:32 wolff.to kernel: 6 disabled Dec 21 21:36:32 wolff.to kernel: 7 disabled Dec 21 21:36:32 wolff.to kernel: x86/PAT: Configuration [0-7]: WB WC UC- UC WB WC UC- UC Dec 21 21:36:32 wolff.to kernel: found SMP MP-table at [mem 0x000f63a0-0x000f63af] mapped at [(ptrval)] Dec 21 21:36:32 wolff.to kernel: initial memory mapped: [mem 0x-0x0a7f] Dec 21 21:36:32 wolff.to kernel: Base memory trampoline at [(ptrval)] 98000 size 16384 Dec 21 21:36:32 wolff.to kernel: BRK [0x0a53f000, 0x0a53] PGTABLE Dec 21 21:36:32 wolff.to kernel: BRK [0x0a54, 0x0a541fff] PGTABLE Dec 21 21:36:32 wolff.to kernel: BRK [0x0a542000, 0x0a542fff] PGTABLE Dec 21 21:36:32 wolff.to kernel: RAMDISK: [mem 0x36732000-0x37fe] Dec 21 21:36:32 wolff.to kernel: Allocated new RAMDISK: [mem 0x34e74000-0x367318b1] Dec 21 21:36:32 wolff.to kernel: Move RAMDISK from [mem 0x36732000-0x37fef8b1] to [mem 0x34e74000-0x367318b1] Dec 21 21:36:32 wolff.to kernel: ACPI: Early table checksum verification disabled Dec 21 21:36:32 wolff.to kernel: ACPI: RSDP 0x000F6370 14 (v00 PTLTD ) Dec 21 21:36:32 wolff.to kernel: ACPI: RSDT 0xBFEF8D1D 34 (v01 PTLTDRSDT 0604 LTP ) Dec 21 21:36:32 wolff.to kernel: ACPI: FACP 0xBFEFBDB5 74 (v01 INTEL PLACER 0604 PTL 0008) Dec 21 21:36:32 wolff.to kernel: ACPI: DSDT 0xBFEF8D51 003064 (v01 hp silvertn 0604 MSFT 010E) Dec 21 21:36:32 wolff.to kernel: ACPI: FACS 0xBFEFCFC0 40 Dec 21 21:36:32 wolff.to kernel: ACPI: _HP_ 0xBFEFBE29 000113 (v01 HPINVT HPINVENT
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Thu, Dec 21, 2017 at 12:15:31 -0600, Bruno Wolff IIIwrote: One important thing I have just found is that it looks like the problem only happens when booting in enforcing mode. If I boot in permissive mode it does not happen. My home machines are currently set to boot in permissive mode and I'll test this evening to see if I can reproduce the problem if I change them to enforcing mode. If so I'll be able to do lots of testing during my vacation. Enforcing mode alone isn't enough as I tested that one one machine at home and it didn't trigger the problem. I'll try another machine late tonight.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Thu, Dec 21, 2017 at 10:02:15 -0700, Jens Axboewrote: On 12/21/17 9:42 AM, Bruno Wolff III wrote: On Thu, Dec 21, 2017 at 23:48:19 +0800, weiping zhang wrote: output you want. I never saw it for any kernels I compiled myself. Only when I test kernels built by Fedora do I see it. see it every boot ? I don't look every boot. The warning gets scrolled of the screen. Once I see the CPU hang warnings I know the boot is failing. I don't always look at journalctl later to see what's there. I'm going to revert a0747a859ef6 for now, since we're now 8 days into this and no progress has been made on fixing it. One important thing I have just found is that it looks like the problem only happens when booting in enforcing mode. If I boot in permissive mode it does not happen. My home machines are currently set to boot in permissive mode and I'll test this evening to see if I can reproduce the problem if I change them to enforcing mode. If so I'll be able to do lots of testing during my vacation.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On 12/21/17 9:42 AM, Bruno Wolff III wrote: > On Thu, Dec 21, 2017 at 23:48:19 +0800, > weiping zhangwrote: >>> output you want. I never saw it for any kernels I compiled myself. Only when >>> I test kernels built by Fedora do I see it. >> see it every boot ? > > I don't look every boot. The warning gets scrolled of the screen. Once I see > the CPU hang warnings I know the boot is failing. I don't always look > at journalctl later to see what's there. I'm going to revert a0747a859ef6 for now, since we're now 8 days into this and no progress has been made on fixing it. -- Jens Axboe
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Thu, Dec 21, 2017 at 23:48:19 +0800, weiping zhangwrote: output you want. I never saw it for any kernels I compiled myself. Only when I test kernels built by Fedora do I see it. see it every boot ? I don't look every boot. The warning gets scrolled of the screen. Once I see the CPU hang warnings I know the boot is failing. I don't always look at journalctl later to see what's there.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
2017-12-21 23:36 GMT+08:00 Bruno Wolff III: > On Thu, Dec 21, 2017 at 23:31:40 +0800, > weiping zhang wrote: >> >> does every time boot fail can trigger WANRING in device_add_disk ? > > > Not that I see. But the message could scroll off the screen. The boot gets > far enough that systemd copies over dmesg output to permanent storage that I > can see on my next successful boot. That's where I looked for the warning > output you want. I never saw it for any kernels I compiled myself. Only when > I test kernels built by Fedora do I see it. see it every boot ? > I just tried booting to single user and the boot still hangs. > > When I build the kernels, the compiler options are probably a bit different > than when Fedora does. That might affect what happens during boot.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Thu, Dec 21, 2017 at 23:31:40 +0800, weiping zhangwrote: does every time boot fail can trigger WANRING in device_add_disk ? Not that I see. But the message could scroll off the screen. The boot gets far enough that systemd copies over dmesg output to permanent storage that I can see on my next successful boot. That's where I looked for the warning output you want. I never saw it for any kernels I compiled myself. Only when I test kernels built by Fedora do I see it. I just tried booting to single user and the boot still hangs. When I build the kernels, the compiler options are probably a bit different than when Fedora does. That might affect what happens during boot.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Thu, Dec 21, 2017 at 22:01:33 +0800, weiping zhangwrote: Hi, how do you do bisect ?build all kernel commit one by one ? as you did before: https://bugzilla.redhat.com/show_bug.cgi?id=1520982 I just did the one bisect using Linus' tree. After each build, I would do a test boot and see if the boot was normal or if I got errors and an eventual hang before boot. Since then I have used git revert to revert just the problem commit from later kernels (such as v4.15-rc4) and when I do the system boots normally. And when I don't do the revert or just use stock Fedora kernels the problem occurs every time. I also did a couple of tests with Josh Boyer's Fedora kernel tree that has Fedora patches on top of the development kernel. what kernel source code do you use that occur warning at device_add_disk? from fedora or any official release ? if so ,could you provide web link? That was from an offical Fedora kernel. I believe I got it from the nodebug repo, but that kernel should be the same as the one that was normally used for rawhide. It is at https://koji.fedoraproject.org/koji/buildinfo?buildID=1007500 but I don't know how much longer the binaries will stay available in koji. if you use same kernel source code and same .config, why your own build Cann't trigger that warning ? I don't know. The install script may build the initramfs differently. As far as I can tell, if the WARN_ON was triggered, I should have gotten output.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
2017-12-21 21:00 GMT+08:00 Bruno Wolff III: > After today, I won't have physical access to the problem machine until > January 2nd. So if you guys have any testing suggestions I need them soon if > they are to get done before my vacation. > I do plan to try booting to level 1 to see if I can get a login prompt that > might facilitate testing. The lockups do happen fairly late in the boot > process. I never get to X, but maybe it will get far enough for a console > login. > Hi, how do you do bisect ?build all kernel commit one by one ? as you did before: https://bugzilla.redhat.com/show_bug.cgi?id=1520982 what kernel source code do you use that occur warning at device_add_disk? from fedora or any official release ? if so ,could you provide web link? if you use same kernel source code and same .config, why your own build Cann't trigger that warning ?
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
After today, I won't have physical access to the problem machine until January 2nd. So if you guys have any testing suggestions I need them soon if they are to get done before my vacation. I do plan to try booting to level 1 to see if I can get a login prompt that might facilitate testing. The lockups do happen fairly late in the boot process. I never get to X, but maybe it will get far enough for a console login.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Tue, Dec 19, 2017 at 10:24:52 -0800, Shaohua Liwrote: Not sure if this is MD related, but could you please check if this debug patch changes anything? I'm doing a build now. I do use md to mirror disk partitions between two disks. I do that on another machine that doesn't exhibit the problem, but it is i686, not x86_64.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Tue, Dec 19, 2017 at 10:17:43AM -0600, Bruno Wolff III wrote: > On Sun, Dec 17, 2017 at 21:43:50 +0800, > weiping zhangwrote: > > Hi, thanks for testing, I think you first reproduce this issue(got WARNING > > at device_add_disk) by your own build, then add my debug patch. > > The problem is still in rc4. Reverting the commit still fixes the problem. I > tested that warning level messages should appear using lkdtm. While there > could be something weird relating to the WARN_ON macro, more likely there is > something different about the boots with the kernels I build (the exact way > initramfs is built is probably different) and probably that (WARN_ON) code > is not getting executed. Not sure if this is MD related, but could you please check if this debug patch changes anything? diff --git a/drivers/md/md.c b/drivers/md/md.c index 4e4dee0..c365179 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -518,7 +518,6 @@ static void mddev_put(struct mddev *mddev) mddev->ctime == 0 && !mddev->hold_active) { /* Array is not configured at all, and not held active, * so destroy it */ - list_del_init(>all_mddevs); bs = mddev->bio_set; sync_bs = mddev->sync_set; mddev->bio_set = NULL; @@ -5210,6 +5209,10 @@ static void md_free(struct kobject *ko) } percpu_ref_exit(>writes_pending); + spin_lock(_mddevs_lock); + list_del_init(>all_mddevs); + spin_unlock(_mddevs_lock); + kfree(mddev); }
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Sun, Dec 17, 2017 at 21:43:50 +0800, weiping zhangwrote: Hi, thanks for testing, I think you first reproduce this issue(got WARNING at device_add_disk) by your own build, then add my debug patch. The problem is still in rc4. Reverting the commit still fixes the problem. I tested that warning level messages should appear using lkdtm. While there could be something weird relating to the WARN_ON macro, more likely there is something different about the boots with the kernels I build (the exact way initramfs is built is probably different) and probably that (WARN_ON) code is not getting executed.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Sun, Dec 17, 2017 at 21:43:50 +0800, weiping zhangwrote: Hi, thanks for testing, I think you first reproduce this issue(got WARNING at device_add_disk) by your own build, then add my debug patch. I'm going to try testing warnings with a kernel I've built, to try to determine if warnings are working at all for the ones I'm building. However it might be that the WARN_ONs are not being reached for the kernels I've built. If that turns out to be the case, I may not be able to get you both the output from the WARN_ONs and the output from your debugging patch at the same time. My next kernel build isn't going to finish in time to test today.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Sun, Dec 17, 2017 at 21:43:50 +0800, weiping zhangwrote: Hi, thanks for testing, I think you first reproduce this issue(got WARNING at device_add_disk) by your own build, then add my debug patch. No, the first log (that Laura copied) was from the Fedora bug and it was from a Fedora kernel before I started doing the bisect. That warnings are missing from all of my builds, not just the ones from the patches. I didn't spot anything obvious going through the kernel spec file when I looked, but I could easily have missed something. Hopefully it is just a boot default that is different and that I can override it with a kernel boot parameter.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
2017-12-17 0:32 GMT+08:00 Bruno Wolff III: > On Fri, Dec 15, 2017 at 13:51:22 -0600, > Bruno Wolff III wrote: >> >> >> I do not know what is different. Do you have any ideas? Most likely I >> won't be able to test any more kernels until Monday (unless I can use most >> of my most recent build over again very soon). > > > The .config looks like it should be OK. I'll test setting loglevel on boot > in case the default is different than what the config file says. I can't do > that until Monday morning. > > I think it is more likely the the WARN_ON macro code isn't being compiled in > for some reason. I haven't confirmed that, nor have I found anything that > would leave that code out when I do a make, but include it during Fedora > builds. Hi, thanks for testing, I think you first reproduce this issue(got WARNING at device_add_disk) by your own build, then add my debug patch.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Fri, Dec 15, 2017 at 13:51:22 -0600, Bruno Wolff IIIwrote: I do not know what is different. Do you have any ideas? Most likely I won't be able to test any more kernels until Monday (unless I can use most of my most recent build over again very soon). The .config looks like it should be OK. I'll test setting loglevel on boot in case the default is different than what the config file says. I can't do that until Monday morning. I think it is more likely the the WARN_ON macro code isn't being compiled in for some reason. I haven't confirmed that, nor have I found anything that would leave that code out when I do a make, but include it during Fedora builds.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Fri, Dec 15, 2017 at 09:18:56 -0800, Laura Abbottwrote: You can see the trees Fedora produces at https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git which includes the configs (you want to look at the ones withtout - debug) Thanks. I found it a little while ago and am already doing a test build without weiping's test patch to see if that kernel provides what he(?) needs. Doing a rebuild with the test patch will go pretty quickly. So if I get the message with device_add_disk from these kernels, I should be able to get the information this afternoon. If there is some other reason I don't get that when I do the builds, I'm probably not going to be able to figure it out and get a build done before I leave. I don't live close enough to the office that I'm going to want to drive in just to be able to do a reboot test. (And my hardware at home does exhibit the problem.) If you have some other idea about why I might not be seeing the device_add_disk message, I'd be interested in hearing it.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On 12/15/2017 08:30 AM, Bruno Wolff III wrote: On Fri, Dec 15, 2017 at 22:02:20 +0800, weiping zhangwrote: Yes, please help reproduce this issue include my debug patch. Reproduce means we can see WARN_ON in device_add_disk caused by failure of bdi_register_owner. I'm not sure why yet, but I'm only getting the warning message you want with Fedora kernels, not the ones I am building (with or without your test patch). I'll attach a debug config file if you want to look there. But in theory that should be essentially what Fedora is using for theirs. They probably have some out of tree patches they are applying, but I wouldn't expect those to make a difference here. I think they now have a tree somewhere that I can try to build from that has their patches applied to the upstream kernel and if I can find it I will try building it just to test this out. I only have about 6 hours of physical access to the machine exhibiting the problem, and after that I won't be able to do test boots until Monday. You can see the trees Fedora produces at https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git which includes the configs (you want to look at the ones withtout - debug)
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Fri, Dec 15, 2017 at 22:02:20 +0800, weiping zhangwrote: Yes, please help reproduce this issue include my debug patch. Reproduce means we can see WARN_ON in device_add_disk caused by failure of bdi_register_owner. I'm not sure why yet, but I'm only getting the warning message you want with Fedora kernels, not the ones I am building (with or without your test patch). I'll attach a debug config file if you want to look there. But in theory that should be essentially what Fedora is using for theirs. They probably have some out of tree patches they are applying, but I wouldn't expect those to make a difference here. I think they now have a tree somewhere that I can try to build from that has their patches applied to the upstream kernel and if I can find it I will try building it just to test this out. I only have about 6 hours of physical access to the machine exhibiting the problem, and after that I won't be able to do test boots until Monday.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
2017-12-15 19:10 GMT+08:00 Bruno Wolff III: > On Fri, Dec 15, 2017 at 10:04:32 +0800, > weiping zhang wrote: >> >> I just want to know WARN_ON WHAT in device_add_disk, >> if bdi_register_owner return error code, it may fail at any step of >> following: > > > Was that output in the original boot log? I didn't see anything there that > had the string WARN_ON. The first log was from a Fedora kernel. The second Sorry to let you confuse, WARN_ON means we catch log as following: WARNING: CPU: 3 PID: 3486 at block/genhd.c:680 device_add_disk+0x3d9/0x460 > from a kernel I built. I used a Fedora config though. The config was > probably from one of their nodebug kernels, I could build another one using > a config from a debug kernel. Would that likely provide what you are looking > for? Yes, please help reproduce this issue include my debug patch. Reproduce means we can see WARN_ON in device_add_disk caused by failure of bdi_register_owner.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Fri, Dec 15, 2017 at 10:04:32 +0800, weiping zhangwrote: I just want to know WARN_ON WHAT in device_add_disk, if bdi_register_owner return error code, it may fail at any step of following: Was that output in the original boot log? I didn't see anything there that had the string WARN_ON. The first log was from a Fedora kernel. The second from a kernel I built. I used a Fedora config though. The config was probably from one of their nodebug kernels, I could build another one using a config from a debug kernel. Would that likely provide what you are looking for?
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Fri, Dec 15, 2017 at 10:04:32 +0800, weiping zhangwrote: so I want see the WARN_ON as you paste before, also my DEBUG log will help to find which step fail. The previous time also journalctl for output, but maybe I used slightly different options. I'll look and see if it is in the journal for the last bad boot. I can do that from home.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
2017-12-15 9:44 GMT+08:00 Bruno Wolff III: > On Fri, Dec 15, 2017 at 09:22:21 +0800, > weiping zhang wrote: >> >> >> Thanks your testing, but I cann't find WARN_ON in device_add_disk from >> this boot1.log, could you help reproduce that issue? And does this issue >> can be >> triggered at every bootup ? > > > I don't know what you need for the first question. When I am physically at > the machine I can do test reboots. If you have something specific you want > me to try I should be able to. > > Every time I boot with the problem commit, the boot never completes. However > it does seem to get pretty far. I get multiple register dumps every time. > After a while (a few minutes) I reboot to a wrking kernel. > > The output I included is from: journalctl -k -b -1 > If you think it would be better to see more than dmesg output let me know. I just want to know WARN_ON WHAT in device_add_disk, if bdi_register_owner return error code, it may fail at any step of following: bdi_debug_root is NULL bdi->debug_dir is NULL bdi->debug_stats is NULL so I want see the WARN_ON as you paste before, also my DEBUG log will help to find which step fail.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Fri, Dec 15, 2017 at 09:22:21 +0800, weiping zhangwrote: Thanks your testing, but I cann't find WARN_ON in device_add_disk from this boot1.log, could you help reproduce that issue? And does this issue can be triggered at every bootup ? I don't know what you need for the first question. When I am physically at the machine I can do test reboots. If you have something specific you want me to try I should be able to. Every time I boot with the problem commit, the boot never completes. However it does seem to get pretty far. I get multiple register dumps every time. After a while (a few minutes) I reboot to a wrking kernel. The output I included is from: journalctl -k -b -1 If you think it would be better to see more than dmesg output let me know.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
2017-12-14 23:41 GMT+08:00 Bruno Wolff III: > On Thu, Dec 14, 2017 at 18:09:27 +0800, > weiping zhang wrote: >> >> >> It seems something wrong with bdi debugfs register, could you help >> test the forllowing debug patch, I add some debug log, no function >> change, thanks. > > > I applied your patch to d39a01eff9af1045f6e30ff9db40310517c4b45f and there > were some new debug messages in the dmesg output. Hopefully this helps. I > also added the patch and output to the Fedora bug for people following > there. Hi Bruno, Thanks your testing, but I cann't find WARN_ON in device_add_disk from this boot1.log, could you help reproduce that issue? And does this issue can be triggered at every bootup ? -- Thanks weiping
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Thu, Dec 14, 2017 at 18:09:27 +0800, weiping zhangwrote: It seems something wrong with bdi debugfs register, could you help test the forllowing debug patch, I add some debug log, no function change, thanks. I applied your patch to d39a01eff9af1045f6e30ff9db40310517c4b45f and there were some new debug messages in the dmesg output. Hopefully this helps. I also added the patch and output to the Fedora bug for people following there. -- Logs begin at Thu 2017-09-28 16:17:29 CDT, end at Thu 2017-12-14 09:36:50 CST. -- Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: microcode: microcode updated early to revision 0x3a, date = 2017-01-30 Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: Linux version 4.15.0-rc3+ (br...@cerberus.csd.uwm.edu) (gcc version 7.2.1 20170915 (Red Hat 7.2.1-4) (GCC)) #15 SMP Thu Dec 14 09:07:46 CST 2017 Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: Command line: BOOT_IMAGE=/vmlinuz-4.15.0-rc3+ root=/dev/mapper/luks-f5e2d09b-f8a3-487d-9517-abe4fb0eada3 ro rd.md.uuid=7f4fcca0:13b1445f:a91ff455:6bb1ab48 rd.luks.uuid=luks-cc6ee93c-e729-4f78-9baf-0cc5cc8a9ff1 rd.md.uuid=ef18531c:760102fb:7797cbdb:5cf9516f rd.md.uuid=42efe386:0c315f28:f7c61920:ea098f81 rd.luks.uuid=luks-f5e2d09b-f8a3-487d-9517-abe4fb0eada3 LANG=en_US.UTF-8 Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format. Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: e820: BIOS-provided physical RAM map: Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 0x-0x0009e7ff] usable Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 0x0009e800-0x0009] reserved Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 0x000e-0x000f] reserved Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 0x0010-0x998f1fff] usable Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 0x998f2000-0x9a29dfff] reserved Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 0x9a29e000-0x9a2e6fff] ACPI data Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 0x9a2e7000-0x9af43fff] ACPI NVS Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 0x9af44000-0x9b40afff] reserved Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 0x9b40b000-0x9b40bfff] usable Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 0x9b40c000-0x9b419fff] reserved Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 0x9b41a000-0x9cff] usable Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 0xa000-0xafff] reserved Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 0xfed1c000-0xfed1] reserved Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 0xff00-0x] reserved Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 0x0001-0x00085fff] usable Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: NX (Execute Disable) protection: active Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: random: fast init done Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: SMBIOS 2.8 present. Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: DMI: Dell Inc. Precision Tower 5810/0WR1RF, BIOS A07 04/14/2015 Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: e820: update [mem 0x-0x0fff] usable ==> reserved Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: e820: remove [mem 0x000a-0x000f] usable Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: e820: last_pfn = 0x86 max_arch_pfn = 0x4 Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: MTRR default type: write-back Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: MTRR fixed ranges enabled: Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: 0-9 write-back Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: A-B uncachable Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: C-E3FFF write-through Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: E4000-F write-protect Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: MTRR variable ranges enabled: Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: 0 base C000 mask 3FFFC000 uncachable Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: 1 base A000 mask 3FFFE000 uncachable Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: 2 base 0300 mask
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Thu, Dec 14, 2017 at 18:09:27 +0800, weiping zhangwrote: On Thu, Dec 14, 2017 at 02:24:52AM -0600, Bruno Wolff III wrote: On Wed, Dec 13, 2017 at 16:54:17 -0800, Laura Abbott wrote: >Hi, > >Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1520982 >of a boot failure/bug on Linus' master (full bootlog at the bugzilla) I'm available for testing. The problem happens on my x86_64 Dell Workstation, but not an old i386 server or an x86_64 mac hardware based laptop. Hi, It seems something wrong with bdi debugfs register, could you help test the forllowing debug patch, I add some debug log, no function change, thanks. I'll test it this morning. I'll probably have results in about 7 hrs from now.
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Thu, Dec 14, 2017 at 02:24:52AM -0600, Bruno Wolff III wrote: > On Wed, Dec 13, 2017 at 16:54:17 -0800, > Laura Abbottwrote: > >Hi, > > > >Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1520982 > >of a boot failure/bug on Linus' master (full bootlog at the bugzilla) > > I'm available for testing. The problem happens on my x86_64 Dell > Workstation, but not an old i386 server or an x86_64 mac hardware > based laptop. Hi, It seems something wrong with bdi debugfs register, could you help test the forllowing debug patch, I add some debug log, no function change, thanks. >From d2728c07589e8b83115a51e0c629451bff7308db Mon Sep 17 00:00:00 2001 From: weiping zhang Date: Thu, 14 Dec 2017 17:56:22 +0800 Subject: [PATCH] bdi debugfs Signed-off-by: weiping zhang --- mm/backing-dev.c | 28 +--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/mm/backing-dev.c b/mm/backing-dev.c index 84b2dc7..fbbb9a6 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -39,6 +39,10 @@ static struct dentry *bdi_debug_root; static void bdi_debug_init(void) { bdi_debug_root = debugfs_create_dir("bdi", NULL); + if (!bdi_debug_root) + pr_err("DEBUG:bdi_debug_root fail\n"); + else + pr_err("DEBUG:bdi_debug_root success\n"); } static int bdi_debug_stats_show(struct seq_file *m, void *v) @@ -115,18 +119,29 @@ static const struct file_operations bdi_debug_stats_fops = { static int bdi_debug_register(struct backing_dev_info *bdi, const char *name) { - if (!bdi_debug_root) + if (!bdi_debug_root) { + pr_err("DEBUG:dev:%s, bdi_debug_root fail\n", name); return -ENOMEM; + } else { + pr_err("DEBUG:dev:%s, bdi_debug_root success\n", name); + } bdi->debug_dir = debugfs_create_dir(name, bdi_debug_root); - if (!bdi->debug_dir) + if (!bdi->debug_dir) { + pr_err("DEBUG:dev:%s, debug_dir fail\n", name); return -ENOMEM; + } else { + pr_err("DEBUG:dev:%s, debug_dir success\n", name); + } bdi->debug_stats = debugfs_create_file("stats", 0444, bdi->debug_dir, bdi, _debug_stats_fops); if (!bdi->debug_stats) { debugfs_remove(bdi->debug_dir); + pr_err("DEBUG:dev:%s, debug_stats fail\n", name); return -ENOMEM; + } else { + pr_err("DEBUG:dev:%s, debug_stats success\n", name); } return 0; @@ -879,13 +894,20 @@ int bdi_register_va(struct backing_dev_info *bdi, const char *fmt, va_list args) return 0; dev = device_create_vargs(bdi_class, NULL, MKDEV(0, 0), bdi, fmt, args); - if (IS_ERR(dev)) + if (IS_ERR(dev)) { + pr_err("DEBUG: bdi device_create_vargs fail\n"); return PTR_ERR(dev); + } + pr_err("DEBUG: bdi(0x%p) device_create_vargs sucess\n", bdi); if (bdi_debug_register(bdi, dev_name(dev))) { + pr_err("DEBUG: dev:%s, bdi(0x%p) bdi_debug_register fail\n", + dev_name(dev), bdi); device_destroy(bdi_class, dev->devt); return -ENOMEM; } + pr_err("DEBUG: dev:%s, bdi(0x%p) bdi_debug_register success\n", + dev_name(dev), bdi); cgwb_bdi_register(bdi); bdi->dev = dev; -- 2.9.4
Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On Wed, Dec 13, 2017 at 16:54:17 -0800, Laura Abbottwrote: Hi, Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1520982 of a boot failure/bug on Linus' master (full bootlog at the bugzilla) I'm available for testing. The problem happens on my x86_64 Dell Workstation, but not an old i386 server or an x86_64 mac hardware based laptop.
Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
Hi, Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1520982 of a boot failure/bug on Linus' master (full bootlog at the bugzilla) WARNING: CPU: 3 PID: 3486 at block/genhd.c:680 device_add_disk+0x3d9/0x460 Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp qcaux snd_usb_audio snd_usbmidi_lib coretemp floppy(+) snd_rawmidi snd_seq_device cdc_acm kvm_intel kvm irqbypass iTCO_wdt iTCO_vendor_support mei_wdt intel_wmi_thunderbolt intel_cstate intel_uncore intel_rapl_perf dcdbas snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic dell_smm_hwmon i2c_i801 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep lpc_ich mei_me mei wmi shpchp target_core_mod snd_pcm_oss snd_mixer_oss binfmt_misc dm_crypt raid1 radeon i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ttm ghash_clmulni_intel drm e1000e ptp pps_core snd_pcm snd_timer snd soundcore analog gameport joydev CPU: 3 PID: 3486 Comm: mdadm Not tainted 4.15.0-0.rc2.git0.1.fc28.x86_64 #1 Hardware name: Dell Inc. Precision Tower 5810/0WR1RF, BIOS A07 04/14/2015 task: e8461579 task.stack: bfe85ee4 RIP: 0010:device_add_disk+0x3d9/0x460 RSP: 0018:b42783b37b30 EFLAGS: 00010282 RAX: fff4 RBX: 952df829b000 RCX: RDX: RSI: 0001f040 RDI: 01ff RBP: 952df829b070 R08: 952df6bb2d60 R09: 0001820001ff R10: 0001 R11: 1401 R12: R13: 952df829b00c R14: 0009 R15: 952df829b000 FS: 7fd492882740() GS:952e1fd8() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7fd4921a95b0 CR3: 000837ecf001 CR4: 001606e0 Call Trace: ? pm_runtime_init+0xa0/0xc0 md_alloc+0x1a8/0x360 md_probe+0x15/0x20 kobj_lookup+0x100/0x150 ? md_alloc+0x360/0x360 get_gendisk+0x29/0x110 blkdev_get+0x61/0x2f0 ? bd_acquire+0xb0/0xb0 ? bd_acquire+0xb0/0xb0 do_dentry_open+0x1b1/0x2d0 ? security_inode_permission+0x3c/0x50 path_openat+0x602/0x14e0 do_filp_open+0x9b/0x110 ? __check_object_size+0xaf/0x1b0 ? do_sys_open+0x1bd/0x250 do_sys_open+0x1bd/0x250 do_syscall_64+0x61/0x170 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fd492234a5e RSP: 002b:7fff5d59e9f0 EFLAGS: 0246 ORIG_RAX: 0101 RAX: ffda RBX: 4082 RCX: 7fd492234a5e RDX: 4082 RSI: 7fff5d59ea80 RDI: ff9c RBP: 7fff5d59ea80 R08: 7fff5d59ea80 R09: R10: R11: 0246 R12: 0009 R13: 007c R14: 7fff5d59eae0 R15: 7fff5d59eb68 Code: 48 83 c6 10 e8 19 08 f0 ff 85 c0 0f 84 d6 fd ff ff 0f ff e9 cf fd ff ff 80 a3 bc 00 00 00 ef e9 c3 fd ff ff 0f ff e9 d8 fd ff ff <0f> ff e9 ba fe ff ff 31 f6 48 89 df e8 36 ec ff ff 48 85 c0 48 ---[ end trace 9590c1ef4c38eb03 ]--- BUG: unable to handle kernel NULL pointer dereference at 54605537 IP: sysfs_do_create_link_sd.isra.2+0x2f/0xb0 PGD 0 P4D 0 Oops: [#1] SMP Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp qcaux snd_usb_audio snd_usbmidi_lib coretemp floppy(+) snd_rawmidi snd_seq_device cdc_acm kvm_intel kvm irqbypass iTCO_wdt iTCO_vendor_support mei_wdt intel_wmi_thunderbolt intel_cstate intel_uncore intel_rapl_perf dcdbas snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic dell_smm_hwmon i2c_i801 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep lpc_ich mei_me mei wmi shpchp target_core_mod snd_pcm_oss snd_mixer_oss binfmt_misc dm_crypt raid1 radeon i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ttm ghash_clmulni_intel drm e1000e ptp pps_core snd_pcm snd_timer snd soundcore analog gameport joydev CPU: 3 PID: 3486 Comm: mdadm Tainted: GW 4.15.0-0.rc2.git0.1.fc28.x86_64 #1 Hardware name: Dell Inc. Precision Tower 5810/0WR1RF, BIOS A07 04/14/2015 task: e8461579 task.stack: bfe85ee4 RIP: 0010:sysfs_do_create_link_sd.isra.2+0x2f/0xb0 RSP: 0018:b42783b37b00 EFLAGS: 00010246 RAX: RBX: 0040 RCX: 0001 RDX: 0001 RSI: 0040 RDI: bb613b0c RBP: baca3577 R08: 0008 R09: 0008 R10: f9efe0e8ca00 R11: f9efe0d77001 R12: 0001 R13: 952df6f45110 R14: 0009 R15: 952df829b000 FS: 7fd492882740() GS:952e1fd8() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0040 CR3: 000837ecf001 CR4: 001606e0 Call Trace: device_add_disk+0x3b7/0x460 md_alloc+0x1a8/0x360 md_probe+0x15/0x20 kobj_lookup+0x100/0x150 ? md_alloc+0x360/0x360 get_gendisk+0x29/0x110 blkdev_get+0x61/0x2f0 ? bd_acquire+0xb0/0xb0 ? bd_acquire+0xb0/0xb0 do_dentry_open+0x1b1/0x2d0 ? security_inode_permission+0x3c/0x50 path_openat+0x602/0x14e0 do_filp_open+0x9b/0x110 ?