Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-29 Thread Bruno Wolff III

On Sat, Dec 30, 2017 at 00:30:32 +0800,
 weiping zhang  wrote:

1. Add a proper SELinux policy that gives mdadm permission for debugfs.
2. Split the work into two parts: first, the user process mdadm triggers a kwork;
second, the kwork creates the gendisk while mdadm waits for it to finish, like
the following:

diff --git a/drivers/md/md.c b/drivers/md/md.c


Is that patch ready to be tested?

Fedora hasn't built an rc5 kernel yet, probably because a lot of people 
are off work this week. So I haven't done that test yet.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-29 Thread weiping zhang
On Fri, Dec 22, 2017 at 08:04:23AM -0600, Bruno Wolff III wrote:
> On Fri, Dec 22, 2017 at 21:20:10 +0800,
>  weiping zhang  wrote:
> >2017-12-22 12:53 GMT+08:00 Bruno Wolff III :
> >>On Thu, Dec 21, 2017 at 17:16:03 -0600,
> >> Bruno Wolff III  wrote:
> >>>
> >>>
> >>>Enforcing mode alone isn't enough as I tested that on one machine at home
> >>>and it didn't trigger the problem. I'll try another machine late tonight.
> >>
> >>
> >>I got the problem to occur on my i686 machine when booting in enforcing
> >>mode. This machine uses raid 1 via mdraid which may or may not be a factor
> >>in this problem. The boot log has a trace at the end and might be helpful,
> >>so I'm attaching it here.
> >Hi Bruno,
> >I can reproduce this issue easily in my QEMU test VM: just add a software
> >RAID1 and it always triggers that warning. I'll debug it later.
> 
> Great. When you have a fix, I can test it.
This issue triggers easily on CentOS 7.3 when two conditions are met:
1. SELinux is in enforcing mode.
2. mdadm tries to create a new gendisk.

If SELinux is disabled or set to permissive mode, the issue disappears.
Since Jens has reverted that commit, the system seems to boot normally, but
no directory is actually created under /sys/kernel/debug/bdi/, though
that has no effect on the disk workflow.

As James said before, "debugfs files should be treated as optional",
so a kernel warning here is enough.

So there are two ways to fix this issue:
1. Add a proper SELinux policy that allows mdadm to create directories in debugfs.
2. mdadm doesn't create the gendisk directly; instead mdadm triggers a kwork and
waits for it to finish, letting the kwork create the gendisk.
A possible change for MD looks like the following:

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4e4dee0..86ead5a 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -90,6 +90,7 @@
 EXPORT_SYMBOL(md_cluster_mod);
 
 static DECLARE_WAIT_QUEUE_HEAD(resync_wait);
+static struct workqueue_struct *md_probe_wq;
 static struct workqueue_struct *md_wq;
 static struct workqueue_struct *md_misc_wq;
 
@@ -5367,10 +5368,27 @@ static int md_alloc(dev_t dev, char *name)
 	return error;
 }
 
+static void md_probe_work_fn(struct work_struct *ws)
+{
+	struct md_probe_work *mpw = container_of(ws, struct md_probe_work,
+					work);
+	md_alloc(mpw->dev, NULL);
+	mpw->done = 1;
+	wake_up(&mpw->wait);
+}
+
 static struct kobject *md_probe(dev_t dev, int *part, void *data)
 {
-	if (create_on_open)
-		md_alloc(dev, NULL);
+	struct md_probe_work mpw;
+
+	if (create_on_open) {
+		init_waitqueue_head(&mpw.wait);
+		mpw.dev = dev;
+		mpw.done = 0;
+		INIT_WORK(&mpw.work, md_probe_work_fn);
+		queue_work(md_probe_wq, &mpw.work);
+		wait_event(mpw.wait, mpw.done);
+	}
 	return NULL;
 }
 
@@ -9023,9 +9041,13 @@ static int __init md_init(void)
 {
 	int ret = -ENOMEM;
 
+	md_probe_wq = alloc_workqueue("md_probe", 0, 0);
+	if (!md_probe_wq)
+		goto err_wq;
+
 	md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0);
 	if (!md_wq)
-		goto err_wq;
+		goto err_probe_wq;
 
 	md_misc_wq = alloc_workqueue("md_misc", 0, 0);
 	if (!md_misc_wq)
@@ -9055,6 +9077,8 @@ static int __init md_init(void)
 	destroy_workqueue(md_misc_wq);
 err_misc_wq:
 	destroy_workqueue(md_wq);
+err_probe_wq:
+	destroy_workqueue(md_probe_wq);
 err_wq:
 	return ret;
 }
@@ -9311,6 +9335,7 @@ static __exit void md_exit(void)
 	}
 	destroy_workqueue(md_misc_wq);
 	destroy_workqueue(md_wq);
+	destroy_workqueue(md_probe_wq);
 }
 
 subsys_initcall(md_init);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 7d6bcf0..3953896 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -487,6 +487,13 @@ enum recovery_flags {
 	MD_RECOVERY_ERROR,	/* sync-action interrupted because io-error */
 };
 
+struct md_probe_work {
+	struct work_struct work;
+	wait_queue_head_t wait;
+	dev_t dev;
+	int done;
+};
+
 static inline int __must_check mddev_lock(struct mddev *mddev)
 {
 	return mutex_lock_interruptible(&mddev->reconfig_mutex);
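
Outside the kernel, the hand-off in the patch above (queue the work, then
block until a done flag is set) can be modelled with a worker thread and a
condition variable. This is only a minimal user-space sketch, not kernel code:
struct probe_work, fake_alloc() and probe_and_wait() are hypothetical names,
and pthreads stand in for the workqueue and the wait queue. It models the
synchronization only and says nothing about which SELinux context the work
runs in.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct md_probe_work. */
struct probe_work {
	pthread_mutex_t lock;
	pthread_cond_t wait;
	int dev;     /* stand-in for dev_t */
	int done;    /* set by the worker once the allocation has run */
	int result;  /* value returned by the stand-in allocation */
};

/* Stand-in for md_alloc(); it only echoes the device number. */
static int fake_alloc(int dev)
{
	return dev;
}

/* Worker side: run the allocation, mark completion, wake the waiter. */
static void *probe_work_fn(void *arg)
{
	struct probe_work *pw = arg;
	int result;

	assert(pw != NULL);
	result = fake_alloc(pw->dev);

	pthread_mutex_lock(&pw->lock);
	pw->result = result;
	pw->done = 1;
	pthread_cond_signal(&pw->wait);
	pthread_mutex_unlock(&pw->lock);
	return NULL;
}

/* Caller side: hand the work to another thread and block until it is done. */
static int probe_and_wait(int dev)
{
	struct probe_work pw = { .dev = dev, .done = 0, .result = -1 };
	pthread_t worker;

	pthread_mutex_init(&pw.lock, NULL);
	pthread_cond_init(&pw.wait, NULL);

	if (pthread_create(&worker, NULL, probe_work_fn, &pw) != 0) {
		pthread_cond_destroy(&pw.wait);
		pthread_mutex_destroy(&pw.lock);
		return -1;
	}

	pthread_mutex_lock(&pw.lock);
	while (!pw.done)	/* the loop guards against spurious wakeups */
		pthread_cond_wait(&pw.wait, &pw.lock);
	pthread_mutex_unlock(&pw.lock);

	/* pw lives on this stack frame, so join before returning. */
	pthread_join(worker, NULL);
	pthread_cond_destroy(&pw.wait);
	pthread_mutex_destroy(&pw.lock);
	return pw.result;
}
```

Calling probe_and_wait(9) returns only after the worker has stored its result,
which mirrors how md_probe() in the patch returns only after wait_event() has
seen mpw.done set.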


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-29 Thread weiping zhang
On Fri, Dec 22, 2017 at 08:04:23AM -0600, Bruno Wolff III wrote:
> On Fri, Dec 22, 2017 at 21:20:10 +0800,
>  weiping zhang  wrote:
> >2017-12-22 12:53 GMT+08:00 Bruno Wolff III :
> >>On Thu, Dec 21, 2017 at 17:16:03 -0600,
> >> Bruno Wolff III  wrote:
> >>>
> >>>
> >>>Enforcing mode alone isn't enough as I tested that on one machine at home
> >>>and it didn't trigger the problem. I'll try another machine late tonight.
> >>
> >>
> >>I got the problem to occur on my i686 machine when booting in enforcing
> >>mode. This machine uses raid 1 via mdraid which may or may not be a factor
> >>in this problem. The boot log has a trace at the end and might be helpful,
> >>so I'm attaching it here.
> >Hi Bruno,
> >I can reproduce this issue easily in my QEMU test VM: just add a software
> >RAID1 and it always triggers that warning. I'll debug it later.
> 
> Great. When you have a fix, I can test it.
This issue triggers easily on CentOS 7.3 with kernel 4.15-rc3 when two
conditions are met:
1. SELinux is in enforcing mode.
2. mdadm tries to create a new gendisk.

If SELinux is disabled or set to permissive mode, the issue disappears.
Since Jens has reverted that commit, the system seems to boot normally, but
no directory is actually created under /sys/kernel/debug/bdi/, though
that has no effect on the disk workflow.

As James said before, "debugfs files should be treated as optional",
so a kernel warning here is enough.
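
The "treat debugfs as optional" idea can be illustrated with a small
user-space model. This is a sketch under stated assumptions, not the kernel's
code: struct fake_disk, fake_debug_register() and fake_disk_register() are
hypothetical names, and the "denied" flag merely simulates a policy denial
such as the SELinux one discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a disk with an optional debug directory. */
struct fake_disk {
	int has_debug_dir;  /* 1 if the optional debug directory was created */
	int registered;     /* 1 once the disk itself is usable */
};

/* Stand-in for debug directory creation; "denied" simulates a policy denial. */
static int fake_debug_register(struct fake_disk *d, int denied)
{
	if (denied)
		return -1;
	d->has_debug_dir = 1;
	return 0;
}

/* Register the disk; a debug-directory failure is reported but never fatal. */
static int fake_disk_register(struct fake_disk *d, int debug_denied)
{
	assert(d != NULL);
	d->has_debug_dir = 0;
	d->registered = 0;

	if (fake_debug_register(d, debug_denied) != 0)
		fprintf(stderr, "warning: debug directory not created, continuing\n");

	d->registered = 1;
	return 0;
}
```

In this model the disk ends up registered whether or not the debug directory
could be created; only the warning differs.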

So, we may solve this issue in two ways:
1. Add a proper SELinux policy that gives mdadm permission for debugfs.
2. Split the work into two parts: first, the user process mdadm triggers a kwork;
second, the kwork creates the gendisk while mdadm waits for it to finish, like
the following:

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4e4dee0..86ead5a 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -90,6 +90,7 @@
 EXPORT_SYMBOL(md_cluster_mod);
 
 static DECLARE_WAIT_QUEUE_HEAD(resync_wait);
+static struct workqueue_struct *md_probe_wq;
 static struct workqueue_struct *md_wq;
 static struct workqueue_struct *md_misc_wq;
 
@@ -5367,10 +5368,27 @@ static int md_alloc(dev_t dev, char *name)
 	return error;
 }
 
+static void md_probe_work_fn(struct work_struct *ws)
+{
+	struct md_probe_work *mpw = container_of(ws, struct md_probe_work,
+					work);
+	md_alloc(mpw->dev, NULL);
+	mpw->done = 1;
+	wake_up(&mpw->wait);
+}
+
 static struct kobject *md_probe(dev_t dev, int *part, void *data)
 {
-	if (create_on_open)
-		md_alloc(dev, NULL);
+	struct md_probe_work mpw;
+
+	if (create_on_open) {
+		init_waitqueue_head(&mpw.wait);
+		mpw.dev = dev;
+		mpw.done = 0;
+		INIT_WORK(&mpw.work, md_probe_work_fn);
+		queue_work(md_probe_wq, &mpw.work);
+		wait_event(mpw.wait, mpw.done);
+	}
 	return NULL;
 }
 
@@ -9023,9 +9041,13 @@ static int __init md_init(void)
 {
 	int ret = -ENOMEM;
 
+	md_probe_wq = alloc_workqueue("md_probe", 0, 0);
+	if (!md_probe_wq)
+		goto err_wq;
+
 	md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0);
 	if (!md_wq)
-		goto err_wq;
+		goto err_probe_wq;
 
 	md_misc_wq = alloc_workqueue("md_misc", 0, 0);
 	if (!md_misc_wq)
@@ -9055,6 +9077,8 @@ static int __init md_init(void)
 	destroy_workqueue(md_misc_wq);
 err_misc_wq:
 	destroy_workqueue(md_wq);
+err_probe_wq:
+	destroy_workqueue(md_probe_wq);
 err_wq:
 	return ret;
 }
@@ -9311,6 +9335,7 @@ static __exit void md_exit(void)
 	}
 	destroy_workqueue(md_misc_wq);
 	destroy_workqueue(md_wq);
+	destroy_workqueue(md_probe_wq);
 }
 
 subsys_initcall(md_init);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 7d6bcf0..3953896 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -487,6 +487,13 @@ enum recovery_flags {
 	MD_RECOVERY_ERROR,	/* sync-action interrupted because io-error */
 };
 
+struct md_probe_work {
+	struct work_struct work;
+	wait_queue_head_t wait;
+	dev_t dev;
+	int done;
+};
+
 static inline int __must_check mddev_lock(struct mddev *mddev)
 {
 	return mutex_lock_interruptible(&mddev->reconfig_mutex);
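
The md_init() hunks above extend a goto-based unwind chain: each allocation
that fails jumps to a label that releases only what was already allocated, in
reverse order. A minimal user-space sketch of that pattern follows. The names
probe_q, main_q, misc_q, alloc_q(), init_queues() and exit_queues() are
hypothetical, and malloc()/free() stand in for alloc_workqueue() and
destroy_workqueue().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resources standing in for the three workqueues. */
static void *probe_q, *main_q, *misc_q;

/* fail_at: 0 = no failure, 1..3 = pretend the Nth allocation fails. */
static void *alloc_q(int step, int fail_at)
{
	return step == fail_at ? NULL : malloc(1);
}

static int init_queues(int fail_at)
{
	int ret = -1;

	assert(fail_at >= 0 && fail_at <= 3);

	probe_q = alloc_q(1, fail_at);
	if (!probe_q)
		goto err_none;		/* nothing allocated yet */

	main_q = alloc_q(2, fail_at);
	if (!main_q)
		goto err_probe;		/* undo probe_q only */

	misc_q = alloc_q(3, fail_at);
	if (!misc_q)
		goto err_main;		/* undo main_q, then probe_q */

	return 0;

err_main:
	free(main_q);
	main_q = NULL;
err_probe:
	free(probe_q);
	probe_q = NULL;
err_none:
	return ret;
}

/* Tear down in the reverse order of creation, as md_exit() does in the patch. */
static void exit_queues(void)
{
	free(misc_q);
	free(main_q);
	free(probe_q);
	misc_q = main_q = probe_q = NULL;
}
```

Because the new allocation comes first in the patch, its cleanup label has to
sit last in the chain, just before the plain return. That is why the existing
"goto err_wq" for md_wq had to be retargeted to err_probe_wq.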


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-22 Thread Bruno Wolff III

On Fri, Dec 22, 2017 at 21:20:10 +0800,
 weiping zhang  wrote:

2017-12-22 12:53 GMT+08:00 Bruno Wolff III :

On Thu, Dec 21, 2017 at 17:16:03 -0600,
 Bruno Wolff III  wrote:



Enforcing mode alone isn't enough as I tested that on one machine at home
and it didn't trigger the problem. I'll try another machine late tonight.



I got the problem to occur on my i686 machine when booting in enforcing
mode. This machine uses raid 1 via mdraid which may or may not be a factor
in this problem. The boot log has a trace at the end and might be helpful,
so I'm attaching it here.

Hi Bruno,
I can reproduce this issue easily in my QEMU test VM: just add a software
RAID1 and it always triggers that warning. I'll debug it later.


Great. When you have a fix, I can test it.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-22 Thread weiping zhang
2017-12-22 12:53 GMT+08:00 Bruno Wolff III :
> On Thu, Dec 21, 2017 at 17:16:03 -0600,
>  Bruno Wolff III  wrote:
>>
>>
>> Enforcing mode alone isn't enough as I tested that on one machine at home
>> and it didn't trigger the problem. I'll try another machine late tonight.
>
>
> I got the problem to occur on my i686 machine when booting in enforcing
> mode. This machine uses raid 1 via mdraid which may or may not be a factor
> in this problem. The boot log has a trace at the end and might be helpful,
> so I'm attaching it here.
Hi Bruno,
I can reproduce this issue easily in my QEMU test VM: just add a software
RAID1 and it always triggers that warning. I'll debug it later.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-21 Thread Bruno Wolff III

On Thu, Dec 21, 2017 at 17:16:03 -0600,
 Bruno Wolff III  wrote:


Enforcing mode alone isn't enough as I tested that on one machine at 
home and it didn't trigger the problem. I'll try another machine late 
tonight.


I got the problem to occur on my i686 machine when booting in enforcing 
mode. This machine uses raid 1 via mdraid which may or may not be a 
factor in this problem. The boot log has a trace at the end and might be 
helpful, so I'm attaching it here.
-- Logs begin at Sun 2017-09-24 07:43:45 CDT, end at Thu 2017-12-21 22:46:47 
CST. --
Dec 21 21:36:32 wolff.to kernel: Linux version 4.15.0-0.rc4.git1.2.fc28.i686 
(mockbu...@buildvm-15.phx2.fedoraproject.org) (gcc version 7.2.1 20170915 (Red 
Hat 7.2.1-4) (GCC)) #1 SMP Tue Dec 19 17:26:41 UTC 2017
Dec 21 21:36:32 wolff.to kernel: x86/fpu: x87 FPU will use FXSAVE
Dec 21 21:36:32 wolff.to kernel: e820: BIOS-provided physical RAM map:
Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 
0x-0x0009cbff] usable
Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 
0x0009cc00-0x0009] reserved
Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 
0x000e-0x000f] reserved
Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 
0x0010-0xbfee] usable
Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 
0xbfef-0xbfefbfff] ACPI data
Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 
0xbfefc000-0xbfef] ACPI NVS
Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 
0xbff0-0xbff7] usable
Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 
0xbff8-0xbfff] reserved
Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 
0xfec0-0xfec0] reserved
Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 
0xfee0-0xfee00fff] reserved
Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 
0xff80-0xffbf] reserved
Dec 21 21:36:32 wolff.to kernel: BIOS-e820: [mem 
0xfff0-0x] reserved
Dec 21 21:36:32 wolff.to kernel: Notice: NX (Execute Disable) protection 
missing in CPU!
Dec 21 21:36:32 wolff.to kernel: random: fast init done
Dec 21 21:36:32 wolff.to kernel: SMBIOS 2.32 present.
Dec 21 21:36:32 wolff.to kernel: DMI: Hewlett-Packard hp workstation 
xw8000/0844, BIOS JQ.W1.19US  04/13/05
Dec 21 21:36:32 wolff.to kernel: e820: update [mem 0x-0x0fff] 
usable ==> reserved
Dec 21 21:36:32 wolff.to kernel: e820: remove [mem 0x000a-0x000f] usable
Dec 21 21:36:32 wolff.to kernel: e820: last_pfn = 0xbff80 max_arch_pfn = 
0x10
Dec 21 21:36:32 wolff.to kernel: MTRR default type: uncachable
Dec 21 21:36:32 wolff.to kernel: MTRR fixed ranges enabled:
Dec 21 21:36:32 wolff.to kernel:   0-9 write-back
Dec 21 21:36:32 wolff.to kernel:   A-B uncachable
Dec 21 21:36:32 wolff.to kernel:   C-F write-protect
Dec 21 21:36:32 wolff.to kernel: MTRR variable ranges enabled:
Dec 21 21:36:32 wolff.to kernel:   0 base 0 mask F8000 write-back
Dec 21 21:36:32 wolff.to kernel:   1 base 08000 mask FC000 write-back
Dec 21 21:36:32 wolff.to kernel:   2 disabled
Dec 21 21:36:32 wolff.to kernel:   3 disabled
Dec 21 21:36:32 wolff.to kernel:   4 disabled
Dec 21 21:36:32 wolff.to kernel:   5 disabled
Dec 21 21:36:32 wolff.to kernel:   6 disabled
Dec 21 21:36:32 wolff.to kernel:   7 disabled
Dec 21 21:36:32 wolff.to kernel: x86/PAT: Configuration [0-7]: WB  WC  UC- UC  
WB  WC  UC- UC  
Dec 21 21:36:32 wolff.to kernel: found SMP MP-table at [mem 
0x000f63a0-0x000f63af] mapped at [(ptrval)]
Dec 21 21:36:32 wolff.to kernel: initial memory mapped: [mem 
0x-0x0a7f]
Dec 21 21:36:32 wolff.to kernel: Base memory trampoline at [(ptrval)] 98000 
size 16384
Dec 21 21:36:32 wolff.to kernel: BRK [0x0a53f000, 0x0a53] PGTABLE
Dec 21 21:36:32 wolff.to kernel: BRK [0x0a54, 0x0a541fff] PGTABLE
Dec 21 21:36:32 wolff.to kernel: BRK [0x0a542000, 0x0a542fff] PGTABLE
Dec 21 21:36:32 wolff.to kernel: RAMDISK: [mem 0x36732000-0x37fe]
Dec 21 21:36:32 wolff.to kernel: Allocated new RAMDISK: [mem 
0x34e74000-0x367318b1]
Dec 21 21:36:32 wolff.to kernel: Move RAMDISK from [mem 0x36732000-0x37fef8b1] 
to [mem 0x34e74000-0x367318b1]
Dec 21 21:36:32 wolff.to kernel: ACPI: Early table checksum verification 
disabled
Dec 21 21:36:32 wolff.to kernel: ACPI: RSDP 0x000F6370 14 (v00 
PTLTD )
Dec 21 21:36:32 wolff.to kernel: ACPI: RSDT 0xBFEF8D1D 34 (v01 
PTLTDRSDT   0604  LTP )
Dec 21 21:36:32 wolff.to kernel: ACPI: FACP 0xBFEFBDB5 74 (v01 
INTEL  PLACER   0604 PTL  0008)
Dec 21 21:36:32 wolff.to kernel: ACPI: DSDT 0xBFEF8D51 003064 (v01 hp   
  silvertn 0604 MSFT 010E)
Dec 21 21:36:32 wolff.to kernel: ACPI: FACS 0xBFEFCFC0 40
Dec 21 21:36:32 wolff.to kernel: ACPI: _HP_ 0xBFEFBE29 000113 (v01 
HPINVT HPINVENT 

Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-21 Thread Bruno Wolff III

On Thu, Dec 21, 2017 at 12:15:31 -0600,
 Bruno Wolff III  wrote:


One important thing I have just found is that it looks like the 
problem only happens when booting in enforcing mode. If I boot in 
permissive mode it does not happen. My home machines are currently set 
to boot in permissive mode and I'll test this evening to see if I can 
reproduce the problem if I change them to enforcing mode. If so I'll 
be able to do lots of testing during my vacation.


Enforcing mode alone isn't enough as I tested that on one machine at 
home and it didn't trigger the problem. I'll try another machine late 
tonight.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-21 Thread Bruno Wolff III

On Thu, Dec 21, 2017 at 10:02:15 -0700,
 Jens Axboe  wrote:

On 12/21/17 9:42 AM, Bruno Wolff III wrote:

On Thu, Dec 21, 2017 at 23:48:19 +0800,
  weiping zhang  wrote:

output you want. I never saw it for any kernels I compiled myself. Only when
I test kernels built by Fedora do I see it.

Do you see it on every boot?


I don't look every boot. The warning gets scrolled off the screen. Once I see
the CPU hang warnings I know the boot is failing. I don't always look
at journalctl later to see what's there.


I'm going to revert a0747a859ef6 for now, since we're now 8 days into this
and no progress has been made on fixing it.


One important thing I have just found is that it looks like the problem 
only happens when booting in enforcing mode. If I boot in permissive 
mode it does not happen. My home machines are currently set to boot in 
permissive mode and I'll test this evening to see if I can reproduce the 
problem if I change them to enforcing mode. If so I'll be able to do lots 
of testing during my vacation.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-21 Thread Jens Axboe
On 12/21/17 9:42 AM, Bruno Wolff III wrote:
> On Thu, Dec 21, 2017 at 23:48:19 +0800,
>   weiping zhang  wrote:
>>> output you want. I never saw it for any kernels I compiled myself. Only when
>>> I test kernels built by Fedora do I see it.
>> Do you see it on every boot?
> 
> I don't look every boot. The warning gets scrolled off the screen. Once I see 
> the CPU hang warnings I know the boot is failing. I don't always look 
> at journalctl later to see what's there.

I'm going to revert a0747a859ef6 for now, since we're now 8 days into this
and no progress has been made on fixing it.

-- 
Jens Axboe



Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-21 Thread Bruno Wolff III

On Thu, Dec 21, 2017 at 23:48:19 +0800,
 weiping zhang  wrote:

output you want. I never saw it for any kernels I compiled myself. Only when
I test kernels built by Fedora do I see it.

Do you see it on every boot?


I don't look every boot. The warning gets scrolled off the screen. Once I see 
the CPU hang warnings I know the boot is failing. I don't always look 
at journalctl later to see what's there.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-21 Thread weiping zhang
2017-12-21 23:36 GMT+08:00 Bruno Wolff III :
> On Thu, Dec 21, 2017 at 23:31:40 +0800,
>  weiping zhang  wrote:
>>
>> Does every failed boot trigger the WARNING in device_add_disk?
>
>
> Not that I see. But the message could scroll off the screen. The boot gets
> far enough that systemd copies over dmesg output to permanent storage that I
> can see on my next successful boot. That's where I looked for the warning
> output you want. I never saw it for any kernels I compiled myself. Only when
> I test kernels built by Fedora do I see it.
Do you see it on every boot?

> I just tried booting to single user and the boot still hangs.
>
> When I build the kernels, the compiler options are probably a bit different
> than when Fedora does. That might affect what happens during boot.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-21 Thread Bruno Wolff III

On Thu, Dec 21, 2017 at 23:31:40 +0800,
 weiping zhang  wrote:

Does every failed boot trigger the WARNING in device_add_disk?


Not that I see. But the message could scroll off the screen. The boot gets 
far enough that systemd copies over dmesg output to permanent storage that 
I can see on my next successful boot. That's where I looked for the warning 
output you want. I never saw it for any kernels I compiled myself. Only 
when I test kernels built by Fedora do I see it.


I just tried booting to single user and the boot still hangs.

When I build the kernels, the compiler options are probably a bit different 
than when Fedora does. That might affect what happens during boot.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-21 Thread Bruno Wolff III

On Thu, Dec 21, 2017 at 22:01:33 +0800,
 weiping zhang  wrote:

Hi,
How did you do the bisect? Did you build a kernel for each commit, one by one,
as you did before:
https://bugzilla.redhat.com/show_bug.cgi?id=1520982


I just did the one bisect using Linus' tree. After each build, I would do 
a test boot and see if the boot was normal or if I got errors and an 
eventual hang before boot.


Since then I have used git revert to revert just the problem commit from 
later kernels (such as v4.15-rc4) and when I do the system boots normally. 
And when I don't do the revert or just use stock Fedora kernels the problem 
occurs every time.


I also did a couple of tests with Josh Boyer's Fedora kernel tree that 
has Fedora patches on top of the development kernel.



Which kernel source produced the warning at device_add_disk? Was it
from Fedora or an official release? If so, could you provide a web link?


That was from an official Fedora kernel. I believe I got it from the 
nodebug repo, but that kernel should be the same as the one that was 
normally used for rawhide. It is at 
https://koji.fedoraproject.org/koji/buildinfo?buildID=1007500 
but I don't know how much longer the binaries will stay available in koji. 


If you use the same kernel source code and the same .config, why can't your
own build trigger that warning?


I don't know. The install script may build the initramfs differently. As 
far as I can tell, if the WARN_ON was triggered, I should have gotten 
output. 


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-21 Thread weiping zhang
2017-12-21 21:00 GMT+08:00 Bruno Wolff III :
> After today, I won't have physical access to the problem machine until
> January 2nd. So if you guys have any testing suggestions I need them soon if
> they are to get done before my vacation.
> I do plan to try booting to level 1 to see if I can get a login prompt that
> might facilitate testing. The lockups do happen fairly late in the boot
> process. I never get to X, but maybe it will get far enough for a console
> login.
>
Hi,
How did you do the bisect? Did you build a kernel for each commit, one by one,
as you did before:
https://bugzilla.redhat.com/show_bug.cgi?id=1520982

Which kernel source produced the warning at device_add_disk? Was it
from Fedora or an official release? If so, could you provide a web link?

If you use the same kernel source code and the same .config, why can't your
own build trigger that warning?


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-21 Thread Bruno Wolff III
After today, I won't have physical access to the problem machine until 
January 2nd. So if you guys have any testing suggestions I need them soon 
if they are to get done before my vacation.
I do plan to try booting to level 1 to see if I can get a login prompt 
that might facilitate testing. The lockups do happen fairly late in the 
boot process. I never get to X, but maybe it will get far enough for 
a console login.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-19 Thread Bruno Wolff III

On Tue, Dec 19, 2017 at 10:24:52 -0800,
 Shaohua Li  wrote:


Not sure if this is MD related, but could you please check if this debug patch
changes anything?


I'm doing a build now. I do use md to mirror disk partitions between two disks. I do that on another machine that doesn't exhibit the problem, but it is 
i686, not x86_64.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-19 Thread Shaohua Li
On Tue, Dec 19, 2017 at 10:17:43AM -0600, Bruno Wolff III wrote:
> On Sun, Dec 17, 2017 at 21:43:50 +0800,
>  weiping zhang  wrote:
> > Hi, thanks for testing. I had assumed you first reproduced this issue (got
> > the WARNING at device_add_disk) with your own build, then added my debug patch.
> 
> The problem is still in rc4. Reverting the commit still fixes the problem. I
> tested that warning level messages should appear using lkdtm. While there
> could be something weird relating to the WARN_ON macro, more likely there is
> something different about the boots with the kernels I build (the exact way
> initramfs is built is probably different) and probably that (WARN_ON) code
> is not getting executed.

Not sure if this is MD related, but could you please check if this debug patch
changes anything?

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4e4dee0..c365179 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -518,7 +518,6 @@ static void mddev_put(struct mddev *mddev)
 	    mddev->ctime == 0 && !mddev->hold_active) {
 		/* Array is not configured at all, and not held active,
 		 * so destroy it */
-		list_del_init(&mddev->all_mddevs);
 		bs = mddev->bio_set;
 		sync_bs = mddev->sync_set;
 		mddev->bio_set = NULL;
@@ -5210,6 +5209,10 @@ static void md_free(struct kobject *ko)
 	}
 	percpu_ref_exit(&mddev->writes_pending);
 
+	spin_lock(&all_mddevs_lock);
+	list_del_init(&mddev->all_mddevs);
+	spin_unlock(&all_mddevs_lock);
+
 	kfree(mddev);
 }
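
What the debug patch above does mechanically is defer the list removal until
the object is about to be freed, and perform it under the list lock. A small
user-space sketch of that ordering follows. It is an illustration only:
struct node, all_nodes, all_nodes_lock and the helper functions are
hypothetical, and a pthread mutex stands in for the spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct node {
	struct node *prev, *next;
	int id;
};

/* Circular list head plus the lock that protects it. */
static struct node all_nodes = { &all_nodes, &all_nodes, -1 };
static pthread_mutex_t all_nodes_lock = PTHREAD_MUTEX_INITIALIZER;

/* Allocate a node and link it at the head of the shared list. */
static struct node *node_add(int id)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->id = id;

	pthread_mutex_lock(&all_nodes_lock);
	n->next = all_nodes.next;
	n->prev = &all_nodes;
	all_nodes.next->prev = n;
	all_nodes.next = n;
	pthread_mutex_unlock(&all_nodes_lock);
	return n;
}

/* Unlink under the lock, and only then free the memory. */
static void node_free(struct node *n)
{
	assert(n != NULL);

	pthread_mutex_lock(&all_nodes_lock);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;	/* same end state as list_del_init() */
	pthread_mutex_unlock(&all_nodes_lock);

	free(n);
}

/* Walk the list under the lock and count the linked nodes. */
static int node_count(void)
{
	int count = 0;

	pthread_mutex_lock(&all_nodes_lock);
	for (struct node *n = all_nodes.next; n != &all_nodes; n = n->next)
		count++;
	pthread_mutex_unlock(&all_nodes_lock);
	return count;
}
```

With this ordering a node stays visible to list walkers until node_free()
runs, and no walker can see it half-unlinked.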
 


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-19 Thread Bruno Wolff III

On Sun, Dec 17, 2017 at 21:43:50 +0800,
 weiping zhang  wrote:

Hi, thanks for testing. I had assumed you first reproduced this issue (got
the WARNING at device_add_disk) with your own build, then added my debug patch.


The problem is still in rc4. Reverting the commit still fixes the problem. 
I tested that warning level messages should appear using lkdtm. While 
there could be something weird relating to the WARN_ON macro, more likely 
there is something different about the boots with the kernels I build 
(the exact way initramfs is built is probably different) and probably 
that (WARN_ON) code is not getting executed.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-18 Thread Bruno Wolff III

On Sun, Dec 17, 2017 at 21:43:50 +0800,
 weiping zhang  wrote:

Hi, thanks for testing. I had assumed you first reproduced this issue (got
the WARNING at device_add_disk) with your own build, then added my debug patch.


I'm going to try testing warnings with a kernel I've built, to try to 
determine if warnings are working at all for the ones I'm building. However 
it might be that the WARN_ONs are not being reached for the kernels I've 
built. If that turns out to be the case, I may not be able to get you both 
the output from the WARN_ONs and the output from your debugging patch at 
the same time.

My next kernel build isn't going to finish in time to test today.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-17 Thread Bruno Wolff III

On Sun, Dec 17, 2017 at 21:43:50 +0800,
 weiping zhang  wrote:

Hi, thanks for testing. I had assumed you first reproduced this issue (got
the WARNING at device_add_disk) with your own build, then added my debug patch.


No, the first log (that Laura copied) was from the Fedora bug and it was 
from a Fedora kernel before I started doing the bisect. The warnings are 
missing from all of my builds, not just the ones from the patches.


I didn't spot anything obvious going through the kernel spec file when 
I looked, but I could easily have missed something.


Hopefully it is just a boot default that is different and that I can 
override it with a kernel boot parameter.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-17 Thread weiping zhang
2017-12-17 0:32 GMT+08:00 Bruno Wolff III :
> On Fri, Dec 15, 2017 at 13:51:22 -0600,
>  Bruno Wolff III  wrote:
>>
>>
>> I do not know what is different. Do you have any ideas? Most likely I
>> won't be able to test any more kernels until Monday (unless I can use most
>> of my most recent build over again very soon).
>
>
> The .config looks like it should be OK. I'll test setting loglevel on boot
> in case the default is different than what the config file says. I can't do
> that until Monday morning.
>
> I think it is more likely that the WARN_ON macro code isn't being compiled in
> for some reason. I haven't confirmed that, nor have I found anything that
> would leave that code out when I do a make, but include it during Fedora
> builds.
Hi, thanks for testing. I had assumed you first reproduced this issue (got
the WARNING at device_add_disk) with your own build, then added my debug patch.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-16 Thread Bruno Wolff III

On Fri, Dec 15, 2017 at 13:51:22 -0600,
 Bruno Wolff III  wrote:


I do not know what is different. Do you have any ideas? Most likely I 
won't be able to test any more kernels until Monday (unless I can use 
most of my most recent build over again very soon).


The .config looks like it should be OK. I'll test setting loglevel on 
boot in case the default is different than what the config file says. 
I can't do that until Monday morning.


I think it is more likely that the WARN_ON macro code isn't being 
compiled in for some reason. I haven't confirmed that, nor have I found 
anything that would leave that code out when I do a make, but include 
it during Fedora builds.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-15 Thread Bruno Wolff III

On Fri, Dec 15, 2017 at 09:18:56 -0800,
 Laura Abbott  wrote:


You can see the trees Fedora produces at 
https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git
which includes the configs (you want to look at the ones without -debug)


Thanks. I found it a little while ago and am already doing a test build 
without weiping's test patch to see if that kernel provides what he(?) 
needs. Doing a rebuild with the test patch will go pretty quickly. So 
if I get the message with device_add_disk from these kernels, I should 
be able to get the information this afternoon. If there is some other 
reason I don't get that when I do the builds, I'm probably not going to be 
able to figure it out and get a build done before I leave. I don't live 
close enough to the office that I'm going to want to drive in just to 
be able to do a reboot test. (And my hardware at home does exhibit the 
problem.)


If you have some other idea about why I might not be seeing the 
device_add_disk message, I'd be interested in hearing it.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-15 Thread Laura Abbott

On 12/15/2017 08:30 AM, Bruno Wolff III wrote:

On Fri, Dec 15, 2017 at 22:02:20 +0800,
  weiping zhang  wrote:


Yes, please help reproduce this issue with my debug patch included. "Reproduce"
means we see the WARN_ON in device_add_disk caused by a failure of bdi_register_owner.


I'm not sure why yet, but I'm only getting the warning message you want with 
Fedora kernels, not the ones I am building (with or without your test patch). 
I'll attach a debug config file if you want to look there. But in theory that 
should be essentially what Fedora is using for theirs. They probably have some 
out of tree patches they are applying, but I wouldn't expect those to make a 
difference here. I think they now have a tree somewhere that I can try to build 
from that has their patches applied to the upstream kernel and if I can find it 
I will try building it just to test this out.

I only have about 6 hours of physical access to the machine exhibiting the 
problem, and after that I won't be able to do test boots until Monday.



You can see the trees Fedora produces at 
https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git
which includes the configs (you want to look at the ones without -debug)


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-15 Thread Bruno Wolff III

On Fri, Dec 15, 2017 at 22:02:20 +0800,
 weiping zhang  wrote:


Yes, please help reproduce this issue with my debug patch included. "Reproduce"
means we see the WARN_ON in device_add_disk caused by a failure of bdi_register_owner.


I'm not sure why yet, but I'm only getting the warning message you want 
with Fedora kernels, not the ones I am building (with or without your test 
patch). I'll attach a debug config file if you want to look there. But in 
theory that should be essentially what Fedora is using for theirs. They 
probably have some out of tree patches they are applying, but I wouldn't 
expect those to make a difference here. I think they now have a tree 
somewhere that I can try to build from that has their patches applied 
to the upstream kernel and if I can find it I will try building it just 
to test this out.


I only have about 6 hours of physical access to the machine exhibiting 
the problem, and after that I won't be able to do test boots until Monday.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-15 Thread weiping zhang
2017-12-15 19:10 GMT+08:00 Bruno Wolff III :
> On Fri, Dec 15, 2017 at 10:04:32 +0800,
>  weiping zhang  wrote:
>>
> >> I just want to know what the WARN_ON in device_add_disk was triggered by.
> >> If bdi_register_owner returns an error code, it may have failed at any of
> >> the following steps:
>
>
> Was that output in the original boot log? I didn't see anything there that
> had the string WARN_ON. The first log was from a Fedora kernel. The second
Sorry for the confusion. By WARN_ON I mean a log entry like the following:
WARNING: CPU: 3 PID: 3486 at block/genhd.c:680 device_add_disk+0x3d9/0x460

> from a kernel I built. I used a Fedora config though. The config was
> probably from one of their nodebug kernels, I could build another one using
> a config from a debug kernel. Would that likely provide what you are looking
> for?

Yes, please help reproduce this issue with my debug patch applied. By "reproduce"
I mean that we see the WARN_ON in device_add_disk caused by the failure of
bdi_register_owner.
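
For readers unfamiliar with the WARN_ON mentioned above, the following is a minimal userspace sketch of its behaviour, not the kernel's implementation. It assumes only the general contract that a true condition logs a warning with the source location and that execution then continues. The names `warn_on_sketch` and `WARN_ON_SKETCH` are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/*
 * Userspace imitation of a WARN_ON-style check: if the condition is
 * true, report where it fired, then return to the caller instead of
 * aborting. Returns 1 if the warning fired, 0 otherwise.
 */
static int warn_on_sketch(int cond, const char *file, int line,
			  const char *func)
{
	if (cond)
		fprintf(stderr, "WARNING: at %s:%d %s()\n", file, line, func);
	return cond;
}

/* Normalise the condition to 0 or 1 and capture the call site. */
#define WARN_ON_SKETCH(cond) \
	warn_on_sketch(!!(cond), __FILE__, __LINE__, __func__)
```

A caller could write `WARN_ON_SKETCH(retval)` after a registration call: a non-zero return code produces one warning line, and the code after the check still runs.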


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-15 Thread Bruno Wolff III

On Fri, Dec 15, 2017 at 10:04:32 +0800,
 weiping zhang  wrote:

I just want to know what the WARN_ON in device_add_disk was triggered by.
If bdi_register_owner returns an error code, it may have failed at any of the
following steps:


Was that output in the original boot log? I didn't see anything there 
that had the string WARN_ON. The first log was from a Fedora kernel. The 
second from a kernel I built. I used a Fedora config though. The config 
was probably from one of their nodebug kernels, I could build another 
one using a config from a debug kernel. Would that likely provide what 
you are looking for?


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-14 Thread Bruno Wolff III

On Fri, Dec 15, 2017 at 10:04:32 +0800,
 weiping zhang  wrote:


so I want to see the WARN_ON you pasted before; my DEBUG log will also help
to find which step fails.


The previous time I also used journalctl for the output, but maybe I used slightly 
different options. I'll look and see if it is in the journal for the last 
bad boot. I can do that from home.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-14 Thread weiping zhang
2017-12-15 9:44 GMT+08:00 Bruno Wolff III :
> On Fri, Dec 15, 2017 at 09:22:21 +0800,
>  weiping zhang  wrote:
>>
>>
> >> Thanks for testing, but I can't find the WARN_ON in device_add_disk in
> >> this boot1.log. Could you help reproduce that issue? And can this issue
> >> be triggered at every bootup?
>
>
> I don't know what you need for the first question. When I am physically at
> the machine I can do test reboots. If you have something specific you want
> me to try I should be able to.
>
> Every time I boot with the problem commit, the boot never completes. However
> it does seem to get pretty far. I get multiple register dumps every time.
> After a while (a few minutes) I reboot to a working kernel.
>
> The output I included is from: journalctl -k -b -1
> If you think it would be better to see more than dmesg output let me know.
I just want to know what the WARN_ON in device_add_disk was triggered by.
If bdi_register_owner returns an error code, it may have failed at any of the
following steps:

bdi_debug_root is NULL
bdi->debug_dir is NULL
bdi->debug_stats is NULL

so I want to see the WARN_ON you pasted before; my DEBUG log will also help
to find which step fails.
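
The three failure points listed above can be modelled in a small userspace sketch. This is not the kernel code: `struct fake_bdi_debug`, `enum bdi_debug_step`, and both functions are hypothetical names, and a NULL pointer simply stands in for a failed debugfs call. It only illustrates why a single -ENOMEM return code is not enough to tell the steps apart, which is what the DEBUG log is meant to reveal.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical labels for the places where registration can fail. */
enum bdi_debug_step {
	STEP_NONE = 0,		/* every step succeeded */
	STEP_DEBUG_ROOT,	/* bdi_debug_root is NULL */
	STEP_DEBUG_DIR,		/* bdi->debug_dir is NULL */
	STEP_DEBUG_STATS,	/* bdi->debug_stats is NULL */
};

/* Stand-in for the three pointers; NULL models a failed creation. */
struct fake_bdi_debug {
	const void *root;
	const void *dir;
	const void *stats;
};

/*
 * Check the steps in order. All failures return the same -ENOMEM,
 * so the failing step is reported separately through *failed.
 */
static int fake_bdi_debug_register(const struct fake_bdi_debug *d,
				   enum bdi_debug_step *failed)
{
	*failed = STEP_NONE;
	if (!d->root)
		*failed = STEP_DEBUG_ROOT;
	else if (!d->dir)
		*failed = STEP_DEBUG_DIR;
	else if (!d->stats)
		*failed = STEP_DEBUG_STATS;

	return *failed == STEP_NONE ? 0 : -ENOMEM;
}

/* Human-readable description of a step, for log messages. */
static const char *bdi_debug_step_name(enum bdi_debug_step step)
{
	switch (step) {
	case STEP_DEBUG_ROOT:
		return "bdi_debug_root is NULL";
	case STEP_DEBUG_DIR:
		return "bdi->debug_dir is NULL";
	case STEP_DEBUG_STATS:
		return "bdi->debug_stats is NULL";
	default:
		return "no failure";
	}
}
```

In this model every failing case returns the same error code, so only the extra `failed` output (the analogue of the DEBUG messages) identifies the step.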


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-14 Thread Bruno Wolff III

On Fri, Dec 15, 2017 at 09:22:21 +0800,
 weiping zhang  wrote:


Thanks for testing, but I can't find the WARN_ON in device_add_disk in
this boot1.log. Could you help reproduce that issue? And can this issue be
triggered at every bootup?


I don't know what you need for the first question. When I am physically at 
the machine I can do test reboots. If you have something specific you want 
me to try I should be able to.


Every time I boot with the problem commit, the boot never completes. However 
it does seem to get pretty far. I get multiple register dumps every time. 
After a while (a few minutes) I reboot to a working kernel.


The output I included is from: journalctl -k -b -1
If you think it would be better to see more than dmesg output let me know.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-14 Thread weiping zhang
2017-12-14 23:41 GMT+08:00 Bruno Wolff III :
> On Thu, Dec 14, 2017 at 18:09:27 +0800,
>  weiping zhang  wrote:
>>
>>
> >> It seems something is wrong with the bdi debugfs registration. Could you
> >> help test the following debug patch? I added some debug logging, with no
> >> functional change. Thanks.
>
>
> I applied your patch to d39a01eff9af1045f6e30ff9db40310517c4b45f and there
> were some new debug messages in the dmesg output. Hopefully this helps. I
> also added the patch and output to the Fedora bug for people following
> there.

Hi Bruno,

Thanks for testing, but I can't find the WARN_ON in device_add_disk in
this boot1.log. Could you help reproduce that issue? And can this issue be
triggered at every bootup?

--
Thanks
weiping


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-14 Thread Bruno Wolff III

On Thu, Dec 14, 2017 at 18:09:27 +0800,
 weiping zhang  wrote:


It seems something is wrong with the bdi debugfs registration. Could you help
test the following debug patch? I added some debug logging, with no functional
change. Thanks.


I applied your patch to d39a01eff9af1045f6e30ff9db40310517c4b45f and there 
were some new debug messages in the dmesg output. Hopefully this helps. I 
also added the patch and output to the Fedora bug for people following there.
-- Logs begin at Thu 2017-09-28 16:17:29 CDT, end at Thu 2017-12-14 09:36:50 
CST. --
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: microcode: microcode updated early 
to revision 0x3a, date = 2017-01-30
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: Linux version 4.15.0-rc3+ 
(br...@cerberus.csd.uwm.edu) (gcc version 7.2.1 20170915 (Red Hat 7.2.1-4) 
(GCC)) #15 SMP Thu Dec 14 09:07:46 CST 2017
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: Command line: 
BOOT_IMAGE=/vmlinuz-4.15.0-rc3+ 
root=/dev/mapper/luks-f5e2d09b-f8a3-487d-9517-abe4fb0eada3 ro 
rd.md.uuid=7f4fcca0:13b1445f:a91ff455:6bb1ab48 
rd.luks.uuid=luks-cc6ee93c-e729-4f78-9baf-0cc5cc8a9ff1 
rd.md.uuid=ef18531c:760102fb:7797cbdb:5cf9516f 
rd.md.uuid=42efe386:0c315f28:f7c61920:ea098f81 
rd.luks.uuid=luks-f5e2d09b-f8a3-487d-9517-abe4fb0eada3 LANG=en_US.UTF-8
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: x86/fpu: Supporting XSAVE feature 
0x001: 'x87 floating point registers'
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: x86/fpu: Supporting XSAVE feature 
0x002: 'SSE registers'
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: x86/fpu: Supporting XSAVE feature 
0x004: 'AVX registers'
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: x86/fpu: xstate_offset[2]:  576, 
xstate_sizes[2]:  256
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: x86/fpu: Enabled xstate features 
0x7, context size is 832 bytes, using 'standard' format.
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: e820: BIOS-provided physical RAM 
map:
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 
0x-0x0009e7ff] usable
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 
0x0009e800-0x0009] reserved
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 
0x000e-0x000f] reserved
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 
0x0010-0x998f1fff] usable
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 
0x998f2000-0x9a29dfff] reserved
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 
0x9a29e000-0x9a2e6fff] ACPI data
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 
0x9a2e7000-0x9af43fff] ACPI NVS
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 
0x9af44000-0x9b40afff] reserved
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 
0x9b40b000-0x9b40bfff] usable
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 
0x9b40c000-0x9b419fff] reserved
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 
0x9b41a000-0x9cff] usable
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 
0xa000-0xafff] reserved
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 
0xfed1c000-0xfed1] reserved
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 
0xff00-0x] reserved
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: BIOS-e820: [mem 
0x0001-0x00085fff] usable
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: NX (Execute Disable) protection: 
active
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: random: fast init done
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: SMBIOS 2.8 present.
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: DMI: Dell Inc. Precision Tower 
5810/0WR1RF, BIOS A07 04/14/2015
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: e820: update [mem 
0x-0x0fff] usable ==> reserved
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: e820: remove [mem 
0x000a-0x000f] usable
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: e820: last_pfn = 0x86 
max_arch_pfn = 0x4
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: MTRR default type: write-back
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: MTRR fixed ranges enabled:
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel:   0-9 write-back
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel:   A-B uncachable
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel:   C-E3FFF write-through
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel:   E4000-F write-protect
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel: MTRR variable ranges enabled:
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel:   0 base C000 mask 
3FFFC000 uncachable
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel:   1 base A000 mask 
3FFFE000 uncachable
Dec 14 09:17:43 cerberus.csd.uwm.edu kernel:   2 base 0300 mask 

Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-14 Thread Bruno Wolff III

On Thu, Dec 14, 2017 at 18:09:27 +0800,
 weiping zhang  wrote:

On Thu, Dec 14, 2017 at 02:24:52AM -0600, Bruno Wolff III wrote:

On Wed, Dec 13, 2017 at 16:54:17 -0800,
 Laura Abbott  wrote:
>Hi,
>
>Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1520982
>of a boot failure/bug on Linus' master (full bootlog at the bugzilla)

I'm available for testing. The problem happens on my x86_64 Dell
Workstation, but not an old i386 server or an x86_64 mac hardware
based laptop.


Hi,

It seems something is wrong with the bdi debugfs registration. Could you help
test the following debug patch? I added some debug logging, with no functional
change. Thanks.


I'll test it this morning. I'll probably have results in about 7 hrs from now.


Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-14 Thread weiping zhang
On Thu, Dec 14, 2017 at 02:24:52AM -0600, Bruno Wolff III wrote:
> On Wed, Dec 13, 2017 at 16:54:17 -0800,
>  Laura Abbott  wrote:
> >Hi,
> >
> >Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1520982
> >of a boot failure/bug on Linus' master (full bootlog at the bugzilla)
> 
> I'm available for testing. The problem happens on my x86_64 Dell
> Workstation, but not an old i386 server or an x86_64 mac hardware
> based laptop.

Hi,

It seems something is wrong with the bdi debugfs registration. Could you help
test the following debug patch? I added some debug logging, with no functional
change. Thanks.


From d2728c07589e8b83115a51e0c629451bff7308db Mon Sep 17 00:00:00 2001
From: weiping zhang 
Date: Thu, 14 Dec 2017 17:56:22 +0800
Subject: [PATCH] bdi debugfs

Signed-off-by: weiping zhang 
---
 mm/backing-dev.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 84b2dc7..fbbb9a6 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -39,6 +39,10 @@ static struct dentry *bdi_debug_root;
 static void bdi_debug_init(void)
 {
 	bdi_debug_root = debugfs_create_dir("bdi", NULL);
+	if (!bdi_debug_root)
+		pr_err("DEBUG:bdi_debug_root fail\n");
+	else
+		pr_err("DEBUG:bdi_debug_root success\n");
 }
 
 static int bdi_debug_stats_show(struct seq_file *m, void *v)
@@ -115,18 +119,29 @@ static const struct file_operations bdi_debug_stats_fops = {
 
 static int bdi_debug_register(struct backing_dev_info *bdi, const char *name)
 {
-	if (!bdi_debug_root)
+	if (!bdi_debug_root) {
+		pr_err("DEBUG:dev:%s, bdi_debug_root fail\n", name);
 		return -ENOMEM;
+	} else {
+		pr_err("DEBUG:dev:%s, bdi_debug_root success\n", name);
+	}
 
 	bdi->debug_dir = debugfs_create_dir(name, bdi_debug_root);
-	if (!bdi->debug_dir)
+	if (!bdi->debug_dir) {
+		pr_err("DEBUG:dev:%s, debug_dir fail\n", name);
 		return -ENOMEM;
+	} else {
+		pr_err("DEBUG:dev:%s, debug_dir success\n", name);
+	}
 
 	bdi->debug_stats = debugfs_create_file("stats", 0444, bdi->debug_dir,
 					       bdi, &bdi_debug_stats_fops);
 	if (!bdi->debug_stats) {
 		debugfs_remove(bdi->debug_dir);
+		pr_err("DEBUG:dev:%s, debug_stats fail\n", name);
 		return -ENOMEM;
+	} else {
+		pr_err("DEBUG:dev:%s, debug_stats success\n", name);
 	}
 
 	return 0;
@@ -879,13 +894,20 @@ int bdi_register_va(struct backing_dev_info *bdi, const char *fmt, va_list args)
 		return 0;
 
 	dev = device_create_vargs(bdi_class, NULL, MKDEV(0, 0), bdi, fmt, args);
-	if (IS_ERR(dev))
+	if (IS_ERR(dev)) {
+		pr_err("DEBUG: bdi device_create_vargs fail\n");
 		return PTR_ERR(dev);
+	}
+	pr_err("DEBUG: bdi(0x%p) device_create_vargs sucess\n", bdi);
 
 	if (bdi_debug_register(bdi, dev_name(dev))) {
+		pr_err("DEBUG: dev:%s, bdi(0x%p) bdi_debug_register fail\n",
+			dev_name(dev), bdi);
 		device_destroy(bdi_class, dev->devt);
 		return -ENOMEM;
 	}
+	pr_err("DEBUG: dev:%s, bdi(0x%p) bdi_debug_register success\n",
+		dev_name(dev), bdi);
 	cgwb_bdi_register(bdi);
 	bdi->dev = dev;
 
-- 
2.9.4



Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-14 Thread Bruno Wolff III

On Wed, Dec 13, 2017 at 16:54:17 -0800,
 Laura Abbott  wrote:

Hi,

Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1520982
of a boot failure/bug on Linus' master (full bootlog at the bugzilla)


I'm available for testing. The problem happens on my x86_64 Dell Workstation, 
but not an old i386 server or an x86_64 mac hardware based laptop.


Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-13 Thread Laura Abbott

Hi,

Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1520982
of a boot failure/bug on Linus' master (full bootlog at the bugzilla)

WARNING: CPU: 3 PID: 3486 at block/genhd.c:680 device_add_disk+0x3d9/0x460
Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp 
qcaux snd_usb_audio snd_usbmidi_lib coretemp floppy(+) snd_rawmidi 
snd_seq_device cdc_acm kvm_intel kvm irqbypass iTCO_wdt iTCO_vendor_support 
mei_wdt intel_wmi_thunderbolt intel_cstate intel_uncore intel_rapl_perf dcdbas 
snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic dell_smm_hwmon 
i2c_i801 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep lpc_ich mei_me mei 
wmi shpchp target_core_mod snd_pcm_oss snd_mixer_oss binfmt_misc dm_crypt raid1 
radeon i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel 
ttm ghash_clmulni_intel drm e1000e ptp pps_core snd_pcm snd_timer snd soundcore 
analog gameport joydev
CPU: 3 PID: 3486 Comm: mdadm Not tainted 4.15.0-0.rc2.git0.1.fc28.x86_64 #1
Hardware name: Dell Inc. Precision Tower 5810/0WR1RF, BIOS A07 04/14/2015
task: e8461579 task.stack: bfe85ee4
RIP: 0010:device_add_disk+0x3d9/0x460
RSP: 0018:b42783b37b30 EFLAGS: 00010282
RAX: fff4 RBX: 952df829b000 RCX: 
RDX:  RSI: 0001f040 RDI: 01ff
RBP: 952df829b070 R08: 952df6bb2d60 R09: 0001820001ff
R10: 0001 R11: 1401 R12: 
R13: 952df829b00c R14: 0009 R15: 952df829b000
FS:  7fd492882740() GS:952e1fd8() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7fd4921a95b0 CR3: 000837ecf001 CR4: 001606e0
Call Trace:
 ? pm_runtime_init+0xa0/0xc0
 md_alloc+0x1a8/0x360
 md_probe+0x15/0x20
 kobj_lookup+0x100/0x150
 ? md_alloc+0x360/0x360
 get_gendisk+0x29/0x110
 blkdev_get+0x61/0x2f0
 ? bd_acquire+0xb0/0xb0
 ? bd_acquire+0xb0/0xb0
 do_dentry_open+0x1b1/0x2d0
 ? security_inode_permission+0x3c/0x50
 path_openat+0x602/0x14e0
 do_filp_open+0x9b/0x110
 ? __check_object_size+0xaf/0x1b0
 ? do_sys_open+0x1bd/0x250
 do_sys_open+0x1bd/0x250
 do_syscall_64+0x61/0x170
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7fd492234a5e
RSP: 002b:7fff5d59e9f0 EFLAGS: 0246 ORIG_RAX: 0101
RAX: ffda RBX: 4082 RCX: 7fd492234a5e
RDX: 4082 RSI: 7fff5d59ea80 RDI: ff9c
RBP: 7fff5d59ea80 R08: 7fff5d59ea80 R09: 
R10:  R11: 0246 R12: 0009
R13: 007c R14: 7fff5d59eae0 R15: 7fff5d59eb68
Code: 48 83 c6 10 e8 19 08 f0 ff 85 c0 0f 84 d6 fd ff ff 0f ff e9 cf fd ff ff 80 a3 
bc 00 00 00 ef e9 c3 fd ff ff 0f ff e9 d8 fd ff ff <0f> ff e9 ba fe ff ff 31 f6 
48 89 df e8 36 ec ff ff 48 85 c0 48
---[ end trace 9590c1ef4c38eb03 ]---
BUG: unable to handle kernel NULL pointer dereference at 54605537
IP: sysfs_do_create_link_sd.isra.2+0x2f/0xb0
PGD 0 P4D 0
Oops:  [#1] SMP
Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp 
qcaux snd_usb_audio snd_usbmidi_lib coretemp floppy(+) snd_rawmidi 
snd_seq_device cdc_acm kvm_intel kvm irqbypass iTCO_wdt iTCO_vendor_support 
mei_wdt intel_wmi_thunderbolt intel_cstate intel_uncore intel_rapl_perf dcdbas 
snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic dell_smm_hwmon 
i2c_i801 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep lpc_ich mei_me mei 
wmi shpchp target_core_mod snd_pcm_oss snd_mixer_oss binfmt_misc dm_crypt raid1 
radeon i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel 
ttm ghash_clmulni_intel drm e1000e ptp pps_core snd_pcm snd_timer snd soundcore 
analog gameport joydev
CPU: 3 PID: 3486 Comm: mdadm Tainted: GW
4.15.0-0.rc2.git0.1.fc28.x86_64 #1
Hardware name: Dell Inc. Precision Tower 5810/0WR1RF, BIOS A07 04/14/2015
task: e8461579 task.stack: bfe85ee4
RIP: 0010:sysfs_do_create_link_sd.isra.2+0x2f/0xb0
RSP: 0018:b42783b37b00 EFLAGS: 00010246
RAX:  RBX: 0040 RCX: 0001
RDX: 0001 RSI: 0040 RDI: bb613b0c
RBP: baca3577 R08: 0008 R09: 0008
R10: f9efe0e8ca00 R11: f9efe0d77001 R12: 0001
R13: 952df6f45110 R14: 0009 R15: 952df829b000
FS:  7fd492882740() GS:952e1fd8() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0040 CR3: 000837ecf001 CR4: 001606e0
Call Trace:
 device_add_disk+0x3b7/0x460
 md_alloc+0x1a8/0x360
 md_probe+0x15/0x20
 kobj_lookup+0x100/0x150
 ? md_alloc+0x360/0x360
 get_gendisk+0x29/0x110
 blkdev_get+0x61/0x2f0
 ? bd_acquire+0xb0/0xb0
 ? bd_acquire+0xb0/0xb0
 do_dentry_open+0x1b1/0x2d0
 ? security_inode_permission+0x3c/0x50
 path_openat+0x602/0x14e0
 do_filp_open+0x9b/0x110
 ?