Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
On 12/15/2017 08:30 AM, Bruno Wolff III wrote:
> On Fri, Dec 15, 2017 at 22:02:20 +0800, weiping zhang wrote:
>> Yes, please help reproduce this issue include my debug patch. Reproduce means we can see WARN_ON in device_add_disk caused by failure of bdi_register_owner.
>
> I'm not sure why yet, but I'm only getting the warning message you want with Fedora kernels, not the ones I am building (with or without your test patch). I'll attach a debug config file if you want to look there. But in theory that should be essentially what Fedora is using for theirs. They probably have some out of tree patches they are applying, but I wouldn't expect those to make a difference here. I think they now have a tree somewhere that I can try to build from that has their patches applied to the upstream kernel and if I can find it I will try building it just to test this out.
>
> I only have about 6 hours of physical access to the machine exhibiting the problem, and after that I won't be able to do test boots until Monday.

You can see the trees Fedora produces at https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git which includes the configs (you want to look at the ones without -debug)
Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")
Hi,

Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1520982 of a boot failure/bug on Linus' master (full bootlog at the bugzilla)

WARNING: CPU: 3 PID: 3486 at block/genhd.c:680 device_add_disk+0x3d9/0x460
Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp qcaux snd_usb_audio snd_usbmidi_lib coretemp floppy(+) snd_rawmidi snd_seq_device cdc_acm kvm_intel kvm irqbypass iTCO_wdt iTCO_vendor_support mei_wdt intel_wmi_thunderbolt intel_cstate intel_uncore intel_rapl_perf dcdbas snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic dell_smm_hwmon i2c_i801 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep lpc_ich mei_me mei wmi shpchp target_core_mod snd_pcm_oss snd_mixer_oss binfmt_misc dm_crypt raid1 radeon i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ttm ghash_clmulni_intel drm e1000e ptp pps_core snd_pcm snd_timer snd soundcore analog gameport joydev
CPU: 3 PID: 3486 Comm: mdadm Not tainted 4.15.0-0.rc2.git0.1.fc28.x86_64 #1
Hardware name: Dell Inc. Precision Tower 5810/0WR1RF, BIOS A07 04/14/2015
task: e8461579 task.stack: bfe85ee4
RIP: 0010:device_add_disk+0x3d9/0x460
RSP: 0018:b42783b37b30 EFLAGS: 00010282
RAX: fff4 RBX: 952df829b000 RCX:
RDX: RSI: 0001f040 RDI: 01ff
RBP: 952df829b070 R08: 952df6bb2d60 R09: 0001820001ff
R10: 0001 R11: 1401 R12:
R13: 952df829b00c R14: 0009 R15: 952df829b000
FS: 7fd492882740() GS:952e1fd8() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7fd4921a95b0 CR3: 000837ecf001 CR4: 001606e0
Call Trace:
 ? pm_runtime_init+0xa0/0xc0
 md_alloc+0x1a8/0x360
 md_probe+0x15/0x20
 kobj_lookup+0x100/0x150
 ? md_alloc+0x360/0x360
 get_gendisk+0x29/0x110
 blkdev_get+0x61/0x2f0
 ? bd_acquire+0xb0/0xb0
 ? bd_acquire+0xb0/0xb0
 do_dentry_open+0x1b1/0x2d0
 ? security_inode_permission+0x3c/0x50
 path_openat+0x602/0x14e0
 do_filp_open+0x9b/0x110
 ? __check_object_size+0xaf/0x1b0
 ? do_sys_open+0x1bd/0x250
 do_sys_open+0x1bd/0x250
 do_syscall_64+0x61/0x170
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7fd492234a5e
RSP: 002b:7fff5d59e9f0 EFLAGS: 0246 ORIG_RAX: 0101
RAX: ffda RBX: 4082 RCX: 7fd492234a5e
RDX: 4082 RSI: 7fff5d59ea80 RDI: ff9c
RBP: 7fff5d59ea80 R08: 7fff5d59ea80 R09:
R10: R11: 0246 R12: 0009
R13: 007c R14: 7fff5d59eae0 R15: 7fff5d59eb68
Code: 48 83 c6 10 e8 19 08 f0 ff 85 c0 0f 84 d6 fd ff ff 0f ff e9 cf fd ff ff 80 a3 bc 00 00 00 ef e9 c3 fd ff ff 0f ff e9 d8 fd ff ff <0f> ff e9 ba fe ff ff 31 f6 48 89 df e8 36 ec ff ff 48 85 c0 48
---[ end trace 9590c1ef4c38eb03 ]---
BUG: unable to handle kernel NULL pointer dereference at 54605537
IP: sysfs_do_create_link_sd.isra.2+0x2f/0xb0
PGD 0 P4D 0
Oops: [#1] SMP
Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp qcaux snd_usb_audio snd_usbmidi_lib coretemp floppy(+) snd_rawmidi snd_seq_device cdc_acm kvm_intel kvm irqbypass iTCO_wdt iTCO_vendor_support mei_wdt intel_wmi_thunderbolt intel_cstate intel_uncore intel_rapl_perf dcdbas snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic dell_smm_hwmon i2c_i801 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep lpc_ich mei_me mei wmi shpchp target_core_mod snd_pcm_oss snd_mixer_oss binfmt_misc dm_crypt raid1 radeon i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ttm ghash_clmulni_intel drm e1000e ptp pps_core snd_pcm snd_timer snd soundcore analog gameport joydev
CPU: 3 PID: 3486 Comm: mdadm Tainted: G W 4.15.0-0.rc2.git0.1.fc28.x86_64 #1
Hardware name: Dell Inc. Precision Tower 5810/0WR1RF, BIOS A07 04/14/2015
task: e8461579 task.stack: bfe85ee4
RIP: 0010:sysfs_do_create_link_sd.isra.2+0x2f/0xb0
RSP: 0018:b42783b37b00 EFLAGS: 00010246
RAX: RBX: 0040 RCX: 0001
RDX: 0001 RSI: 0040 RDI: bb613b0c
RBP: baca3577 R08: 0008 R09: 0008
R10: f9efe0e8ca00 R11: f9efe0d77001 R12: 0001
R13: 952df6f45110 R14: 0009 R15: 952df829b000
FS: 7fd492882740() GS:952e1fd8() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 0040 CR3: 000837ecf001 CR4: 001606e0
Call Trace:
 device_add_disk+0x3b7/0x460
 md_alloc+0x1a8/0x360
 md_probe+0x15/0x20
 kobj_lookup+0x100/0x150
 ? md_alloc+0x360/0x360
 get_gendisk+0x29/0x110
 blkdev_get+0x61/0x2f0
 ? bd_acquire+0xb0/0xb0
 ? bd_acquire+0xb0/0xb0
 do_dentry_open+0x1b1/0x2d0
 ? security_inode_permission+0x3c/0x50
 path_openat+0x602/0x14e0
 do_filp_open+0x9b/0x110
 ?
Hangs in balance_dirty_pages with arm-32 LPAE + highmem
Hi,

The Fedora arm-32 build VMs have a somewhat long standing problem of hanging when running mkfs.ext4 with a bunch of processes stuck in D state. This has been seen as far back as 4.13 but is still present on 4.14:

sysrq: SysRq : Show Blocked State
  task PC stack pid father
auditd D 0 377 1 0x0020
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (__sys_trace_return+0x0/0x10)
rs:main Q:Reg D 0 441 1 0x
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x4c)
ntpd D 0 1453 1 0x0001
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x4c)
kojid D 0 4616 1 0x
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x4c)
kworker/u8:0 D 0 28525 2 0x
Workqueue: writeback wb_workfn (flush-7:0)
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (io_schedule+0x1c/0x2c)
[] (io_schedule) from [] (wbt_wait+0x21c/0x300)
[] (wbt_wait) from [] (blk_mq_make_request+0xac/0x560)
[] (blk_mq_make_request) from [] (generic_make_request+0xd0/0x214)
[] (generic_make_request) from [] (submit_bio+0x114/0x16c)
[] (submit_bio) from [] (submit_bh_wbc+0x190/0x1a0)
[] (submit_bh_wbc) from [] (__block_write_full_page+0x2e8/0x43c)
[] (__block_write_full_page) from [] (block_write_full_page+0x80/0xec)
[] (block_write_full_page) from [] (__writepage+0x1c/0x4c)
[] (__writepage) from [] (write_cache_pages+0x350/0x3f0)
[] (write_cache_pages) from [] (generic_writepages+0x44/0x60)
[] (generic_writepages) from [] (do_writepages+0x3c/0x74)
[] (do_writepages) from [] (__writeback_single_inode+0xb4/0x404)
[] (__writeback_single_inode) from [] (writeback_sb_inodes+0x258/0x438)
[] (writeback_sb_inodes) from [] (__writeback_inodes_wb+0x6c/0xa8)
[] (__writeback_inodes_wb) from [] (wb_writeback+0x1c4/0x30c)
[] (wb_writeback) from [] (wb_workfn+0x130/0x450)
[] (wb_workfn) from [] (process_one_work+0x254/0x42c)
[]
Re: Hangs in balance_dirty_pages with arm-32 LPAE + highmem
On 02/26/2018 06:28 AM, Michal Hocko wrote:
> On Fri 23-02-18 11:51:41, Laura Abbott wrote:
>> Hi,
>>
>> The Fedora arm-32 build VMs have a somewhat long standing problem of hanging when running mkfs.ext4 with a bunch of processes stuck in D state. This has been seen as far back as 4.13 but is still present on 4.14:
>>
> [...]
>> This looks like everything is blocked on the writeback completing but the writeback has been throttled. According to the infra team, this problem is _not_ seen without LPAE (i.e. only 4G of RAM). I did see https://patchwork.kernel.org/patch/10201593/ but that doesn't seem to quite match since this seems to be completely stuck. Any suggestions to narrow the problem down?
>
> How much dirtyable memory does the system have? We do allow only lowmem to be dirtyable by default on 32b highmem systems. Maybe you have the lowmem mostly consumed by the kernel memory. Have you tried to enable highmem_is_dirtyable?

Setting highmem_is_dirtyable did fix the problem. The infrastructure people seemed satisfied enough with this (and are happy to have the machines back). I'll see if they are willing to run a few more tests to get some more state information.

Thanks,
Laura
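
For anyone wanting to collect the state Michal asked about, here is a rough sketch (not something posted in this thread) that reads the lowmem/highmem split from /proc/meminfo and the current value of the vm.highmem_is_dirtyable sysctl. It assumes a 32-bit kernel built with CONFIG_HIGHMEM, since the LowTotal/LowFree/HighTotal/HighFree fields and the sysctl file only exist there. The helper names are made up for illustration, and the numbers are only a coarse hint: the kernel's own dirtyable-memory accounting also includes file pages and subtracts reserves.

```python
import re
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value as reported}."""
    fields = {}
    for line in text.splitlines():
        # Matches e.g. "LowFree:  76000 kB" and "Active(file):  1234 kB".
        m = re.match(r"(\w+(?:\(\w+\))?):\s+(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def lowmem_summary(fields):
    """Summarize lowmem vs highmem in kB; None if the kernel has no highmem fields."""
    needed = ("LowTotal", "LowFree", "HighTotal", "HighFree")
    if any(name not in fields for name in needed) or fields["LowTotal"] == 0:
        return None
    return {
        "low_total_kb": fields["LowTotal"],
        "low_free_kb": fields["LowFree"],
        "high_total_kb": fields["HighTotal"],
        "high_free_kb": fields["HighFree"],
        "low_free_pct": 100.0 * fields["LowFree"] / fields["LowTotal"],
    }


def read_highmem_is_dirtyable(path="/proc/sys/vm/highmem_is_dirtyable"):
    """Return the sysctl value (0 or 1), or None where the file does not exist."""
    p = Path(path)
    return int(p.read_text().strip()) if p.exists() else None


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        print("lowmem summary:", lowmem_summary(parse_meminfo(meminfo.read_text())))
    print("highmem_is_dirtyable:", read_highmem_is_dirtyable())
```

A small low_free_pct together with highmem_is_dirtyable at 0 would be consistent with the lowmem-exhaustion theory above. Writing 1 to /proc/sys/vm/highmem_is_dirtyable (as root) is the setting that was reported to fix the hang here.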