Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-15 Thread Laura Abbott

On 12/15/2017 08:30 AM, Bruno Wolff III wrote:

On Fri, Dec 15, 2017 at 22:02:20 +0800,
  weiping zhang wrote:


Yes, please help reproduce this issue with my debug patch included. Reproducing it means
we can see the WARN_ON in device_add_disk caused by a failure of bdi_register_owner.


I'm not sure why yet, but I'm only getting the warning message you want with 
Fedora kernels, not the ones I am building (with or without your test patch). 
I'll attach a debug config file if you want to look there. But in theory that 
should be essentially what Fedora is using for theirs. They probably have some 
out of tree patches they are applying, but I wouldn't expect those to make a 
difference here. I think they now have a tree somewhere that I can try to build 
from that has their patches applied to the upstream kernel and if I can find it 
I will try building it just to test this out.

I only have about 6 hours of physical access to the machine exhibiting the 
problem, and after that I won't be able to do test boots until Monday.



You can see the trees Fedora produces at 
https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git
which includes the configs (you want to look at the ones without -debug).


Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-13 Thread Laura Abbott

Hi,

Fedora got a bug report, https://bugzilla.redhat.com/show_bug.cgi?id=1520982,
of a boot failure/bug on Linus' master (the full boot log is in the bugzilla):

WARNING: CPU: 3 PID: 3486 at block/genhd.c:680 device_add_disk+0x3d9/0x460
Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp 
qcaux snd_usb_audio snd_usbmidi_lib coretemp floppy(+) snd_rawmidi 
snd_seq_device cdc_acm kvm_intel kvm irqbypass iTCO_wdt iTCO_vendor_support 
mei_wdt intel_wmi_thunderbolt intel_cstate intel_uncore intel_rapl_perf dcdbas 
snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic dell_smm_hwmon 
i2c_i801 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep lpc_ich mei_me mei 
wmi shpchp target_core_mod snd_pcm_oss snd_mixer_oss binfmt_misc dm_crypt raid1 
radeon i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel 
ttm ghash_clmulni_intel drm e1000e ptp pps_core snd_pcm snd_timer snd soundcore 
analog gameport joydev
CPU: 3 PID: 3486 Comm: mdadm Not tainted 4.15.0-0.rc2.git0.1.fc28.x86_64 #1
Hardware name: Dell Inc. Precision Tower 5810/0WR1RF, BIOS A07 04/14/2015
task: e8461579 task.stack: bfe85ee4
RIP: 0010:device_add_disk+0x3d9/0x460
RSP: 0018:b42783b37b30 EFLAGS: 00010282
RAX: fff4 RBX: 952df829b000 RCX: 
RDX:  RSI: 0001f040 RDI: 01ff
RBP: 952df829b070 R08: 952df6bb2d60 R09: 0001820001ff
R10: 0001 R11: 1401 R12: 
R13: 952df829b00c R14: 0009 R15: 952df829b000
FS:  7fd492882740() GS:952e1fd8() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7fd4921a95b0 CR3: 000837ecf001 CR4: 001606e0
Call Trace:
 ? pm_runtime_init+0xa0/0xc0
 md_alloc+0x1a8/0x360
 md_probe+0x15/0x20
 kobj_lookup+0x100/0x150
 ? md_alloc+0x360/0x360
 get_gendisk+0x29/0x110
 blkdev_get+0x61/0x2f0
 ? bd_acquire+0xb0/0xb0
 ? bd_acquire+0xb0/0xb0
 do_dentry_open+0x1b1/0x2d0
 ? security_inode_permission+0x3c/0x50
 path_openat+0x602/0x14e0
 do_filp_open+0x9b/0x110
 ? __check_object_size+0xaf/0x1b0
 ? do_sys_open+0x1bd/0x250
 do_sys_open+0x1bd/0x250
 do_syscall_64+0x61/0x170
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7fd492234a5e
RSP: 002b:7fff5d59e9f0 EFLAGS: 0246 ORIG_RAX: 0101
RAX: ffda RBX: 4082 RCX: 7fd492234a5e
RDX: 4082 RSI: 7fff5d59ea80 RDI: ff9c
RBP: 7fff5d59ea80 R08: 7fff5d59ea80 R09: 
R10:  R11: 0246 R12: 0009
R13: 007c R14: 7fff5d59eae0 R15: 7fff5d59eb68
Code: 48 83 c6 10 e8 19 08 f0 ff 85 c0 0f 84 d6 fd ff ff 0f ff e9 cf fd ff ff 80 a3 
bc 00 00 00 ef e9 c3 fd ff ff 0f ff e9 d8 fd ff ff <0f> ff e9 ba fe ff ff 31 f6 
48 89 df e8 36 ec ff ff 48 85 c0 48
---[ end trace 9590c1ef4c38eb03 ]---
BUG: unable to handle kernel NULL pointer dereference at 54605537
IP: sysfs_do_create_link_sd.isra.2+0x2f/0xb0
PGD 0 P4D 0
Oops:  [#1] SMP
Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp 
qcaux snd_usb_audio snd_usbmidi_lib coretemp floppy(+) snd_rawmidi 
snd_seq_device cdc_acm kvm_intel kvm irqbypass iTCO_wdt iTCO_vendor_support 
mei_wdt intel_wmi_thunderbolt intel_cstate intel_uncore intel_rapl_perf dcdbas 
snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic dell_smm_hwmon 
i2c_i801 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep lpc_ich mei_me mei 
wmi shpchp target_core_mod snd_pcm_oss snd_mixer_oss binfmt_misc dm_crypt raid1 
radeon i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel 
ttm ghash_clmulni_intel drm e1000e ptp pps_core snd_pcm snd_timer snd soundcore 
analog gameport joydev
CPU: 3 PID: 3486 Comm: mdadm Tainted: G W 4.15.0-0.rc2.git0.1.fc28.x86_64 #1
Hardware name: Dell Inc. Precision Tower 5810/0WR1RF, BIOS A07 04/14/2015
task: e8461579 task.stack: bfe85ee4
RIP: 0010:sysfs_do_create_link_sd.isra.2+0x2f/0xb0
RSP: 0018:b42783b37b00 EFLAGS: 00010246
RAX:  RBX: 0040 RCX: 0001
RDX: 0001 RSI: 0040 RDI: bb613b0c
RBP: baca3577 R08: 0008 R09: 0008
R10: f9efe0e8ca00 R11: f9efe0d77001 R12: 0001
R13: 952df6f45110 R14: 0009 R15: 952df829b000
FS:  7fd492882740() GS:952e1fd8() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0040 CR3: 000837ecf001 CR4: 001606e0
Call Trace:
 device_add_disk+0x3b7/0x460
 md_alloc+0x1a8/0x360
 md_probe+0x15/0x20
 kobj_lookup+0x100/0x150
 ? md_alloc+0x360/0x360
 get_gendisk+0x29/0x110
 blkdev_get+0x61/0x2f0
 ? bd_acquire+0xb0/0xb0
 ? bd_acquire+0xb0/0xb0
 do_dentry_open+0x1b1/0x2d0
 ? security_inode_permission+0x3c/0x50
 path_openat+0x602/0x14e0
 do_filp_open+0x9b/0x110
 ? 

Hangs in balance_dirty_pages with arm-32 LPAE + highmem

2018-02-23 Thread Laura Abbott

Hi,

The Fedora arm-32 build VMs have a somewhat long-standing problem
of hanging when running mkfs.ext4 with a bunch of processes stuck
in D state. This has been seen as far back as 4.13 but is still
present on 4.14:

sysrq: SysRq : Show Blocked State
  taskPC stack   pid father
auditd  D0   377  1 0x0020
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (__sys_trace_return+0x0/0x10)
rs:main Q:Reg   D0   441  1 0x
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x4c)
ntpdD0  1453  1 0x0001
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x4c)
kojid   D0  4616  1 0x
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x4c)
kworker/u8:0D0 28525  2 0x
Workqueue: writeback wb_workfn (flush-7:0)
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (io_schedule+0x1c/0x2c)
[] (io_schedule) from [] (wbt_wait+0x21c/0x300)
[] (wbt_wait) from [] (blk_mq_make_request+0xac/0x560)
[] (blk_mq_make_request) from [] (generic_make_request+0xd0/0x214)
[] (generic_make_request) from [] (submit_bio+0x114/0x16c)
[] (submit_bio) from [] (submit_bh_wbc+0x190/0x1a0)
[] (submit_bh_wbc) from [] (__block_write_full_page+0x2e8/0x43c)
[] (__block_write_full_page) from [] (block_write_full_page+0x80/0xec)
[] (block_write_full_page) from [] (__writepage+0x1c/0x4c)
[] (__writepage) from [] (write_cache_pages+0x350/0x3f0)
[] (write_cache_pages) from [] (generic_writepages+0x44/0x60)
[] (generic_writepages) from [] (do_writepages+0x3c/0x74)
[] (do_writepages) from [] (__writeback_single_inode+0xb4/0x404)
[] (__writeback_single_inode) from [] (writeback_sb_inodes+0x258/0x438)
[] (writeback_sb_inodes) from [] (__writeback_inodes_wb+0x6c/0xa8)
[] (__writeback_inodes_wb) from [] (wb_writeback+0x1c4/0x30c)
[] (wb_writeback) from [] (wb_workfn+0x130/0x450)
[] (wb_workfn) from [] (process_one_work+0x254/0x42c)
[] 

Re: Hangs in balance_dirty_pages with arm-32 LPAE + highmem

2018-03-05 Thread Laura Abbott

On 02/26/2018 06:28 AM, Michal Hocko wrote:

On Fri 23-02-18 11:51:41, Laura Abbott wrote:

Hi,

The Fedora arm-32 build VMs have a somewhat long-standing problem
of hanging when running mkfs.ext4 with a bunch of processes stuck
in D state. This has been seen as far back as 4.13 but is still
present on 4.14:


[...]

This looks like everything is blocked on the writeback completing, but
the writeback has been throttled. According to the infra team, this problem
is _not_ seen without LPAE (i.e. with only 4G of RAM). I did see
https://patchwork.kernel.org/patch/10201593/ but that doesn't quite seem to
match, since this seems to be completely stuck. Any suggestions to
narrow the problem down?


How much dirtyable memory does the system have? By default, we allow only
lowmem to be dirtyable on 32-bit highmem systems. Maybe your lowmem is
mostly consumed by kernel memory. Have you tried enabling
highmem_is_dirtyable?



Setting highmem_is_dirtyable did fix the problem. The infrastructure
people seemed satisfied enough with this (and are happy to have the
machines back). I'll see if they are willing to run a few more tests
to get some more state information.

Thanks,
Laura