42d8c91e for drivers/net/phy/realtek.c causing loss on Banana Pi
Hi, in kernel 5.9, my Banana Pi test system suffers from catastrophic Ethernet packet loss that makes the machine nearly unusable. Reverting bbc4d71d63549bcd003a430de18a72a742d8c91e fixes the issue for me. Please investigate the breakage caused by this commit. I am prepared to help with testing and would also appreciate the fix landing in Greg's stable releases. Greetings Marc -- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: iwl_mvm_add_new_dqa_stream_wk BUG in lib/list_debug.c:56
On Sun, Jun 02, 2019 at 03:48:42PM +0200, Marc Haber wrote: > On Thu, May 30, 2019 at 10:12:57AM +0200, Marc Haber wrote: > > on my primary notebook, a Lenovo X260, with an Intel Wireless 8260 > > (8086:24f3), running Debian unstable, I have started to see network > > hangs since upgrading to kernel 5.1. In this situation, I cannot > > restart Network-Manager (the call just hangs), I can log out of X, but > > the system does not cleanly shut down and I need to Magic SysRq myself > > out of the running system. This happens about once every two days. > > The issue is also present in 5.1.5 and 5.1.6. Almost a month later, 5.1.15 still crashes about twice a day on my notebook. The error message seems pretty clear to me; how can I proceed from there and maybe identify a line number outside of a library? Greetings Marc
Re: iwl_mvm_add_new_dqa_stream_wk BUG in lib/list_debug.c:56
On Fri, Jun 07, 2019 at 10:20:56PM +0200, Yussuf Khalil wrote: > CC'ing iwlwifi maintainers to get some attention for this issue. > > I am experiencing the very same bug on a ThinkPad T480s running 5.1.6 with > Fedora 30. A friend is seeing it on his X1 Carbon 6th Gen, too. Both have an > "Intel Corporation Wireless 8265 / 8275" card according to lspci. I have an older card, 04:00.0 Network controller [0280]: Intel Corporation Wireless 8260 [8086:24f3] (rev 3a), in a ThinkPad X260. > Notably, in all cases I've observed it occurred right after roaming from one > AP to another (though I can't guarantee this isn't a coincidence). I also have multiple access points broadcasting the same SSID in my house, and yes, I often experience these issues when I move from one part of the house to another. However, I have also experienced it in a hotel while using the hotspot offered by my mobile phone, so that was clearly not a roaming situation. Greetings Marc
Re: iwl_mvm_add_new_dqa_stream_wk BUG in lib/list_debug.c:56
On Thu, May 30, 2019 at 10:12:57AM +0200, Marc Haber wrote: > on my primary notebook, a Lenovo X260, with an Intel Wireless 8260 > (8086:24f3), running Debian unstable, I have started to see network > hangs since upgrading to kernel 5.1. In this situation, I cannot > restart Network-Manager (the call just hangs), I can log out of X, but > the system does not cleanly shut down and I need to Magic SysRq myself > out of the running system. This happens about once every two days. The issue is also present in 5.1.5 and 5.1.6. Greetings Marc
iwl_mvm_add_new_dqa_stream_wk BUG in lib/list_debug.c:56
xor uas usb_storage raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel rtsx_pci_sdmmc mmc_core aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ahci libahci psmouse xhci_pci i2c_i801 e1000e libata xhci_hcd rtsx_pci mfd_core scsi_mod usbcore usb_common i915 i2c_algo_bit drm_kms_helper drm i2c_core thermal video
[38179.854652] ---[ end trace fd93637fcde969e6 ]---
[38179.854654] RIP: 0010:compaction_alloc+0x569/0x8c0
[38179.854656] Code: 62 01 00 00 49 be 00 00 00 00 00 16 00 00 eb 72 48 b8 00 00 00 00 00 ea ff ff 49 89 da 49 c1 e2 06 4d 8d 2c 02 4d 85 ed 74 3b <41> 8b 45 30 25 80 00 00 f0 3d 00 00 00 f0 0f 84 f9 00 00 00 41 80
[38179.854657] RSP: 0018:c90001a5f900 EFLAGS: 00010286
[38179.854658] RAX: ea00 RBX: 800ffe00 RCX: 003c
[38179.854659] RDX: 800ffe00 RSI: RDI: 8884417f42c0
[38179.854660] RBP: 8010 R08: R09: 8884417fab80
[38179.854661] R10: 03ff8000 R11: 80122c00 R12: 0020
[38179.854662] R13: ea0003ff8000 R14: 1600 R15: c90001a5fae0
[38179.854664] FS: () GS:88843180() knlGS:
[38179.854665] CS: 0010 DS: ES: CR0: 80050033
[38179.854666] CR2: 7f87c26a5000 CR3: 00033154a006 CR4: 003626f0
[38179.854667] DR0: DR1: DR2:
[38179.854667] DR3: DR6: fffe0ff0 DR7: 0400
Is that a known issue? I currently have this with 5.1.5; are there patches in the queue that may be candidates to stabilize my wireless again? Greetings Marc
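The subject line points at lib/list_debug.c, which holds the kernel's CONFIG_DEBUG_LIST consistency checks: before an entry is unlinked, its neighbours are verified to still point back at it. The following is a minimal userspace sketch of that kind of check, assuming a plain circular doubly linked list. The names `node`, `list_init`, `list_add_after` and `list_del_checked` are hypothetical; this is not the kernel's code and does not claim which check sits at line 56.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical circular doubly linked list node, loosely modelled on the
 * kernel's struct list_head (illustration only). */
struct node {
    struct node *next, *prev;
};

static void list_init(struct node *head) { head->next = head->prev = head; }

/* Insert n directly after head. */
static void list_add_after(struct node *head, struct node *n)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Validate the neighbours before unlinking, in the spirit of
 * CONFIG_DEBUG_LIST: refuse to touch a list that is already inconsistent. */
static bool list_del_checked(struct node *entry)
{
    struct node *prev = entry->prev, *next = entry->next;

    if (prev == NULL || next == NULL) {
        fprintf(stderr, "list_del corruption: entry was already deleted\n");
        return false;
    }
    if (prev->next != entry || next->prev != entry) {
        fprintf(stderr, "list_del corruption: neighbours do not point back\n");
        return false;
    }
    prev->next = next;
    next->prev = prev;
    /* NULL is a stand-in for the kernel's LIST_POISON1/LIST_POISON2 markers. */
    entry->next = entry->prev = NULL;
    return true;
}
```

In the kernel, a failed check is reported from list_debug.c itself (and, with CONFIG_BUG_ON_DATA_CORRUPTION, escalates to a BUG), so the code that handed over the corrupted list is normally found further down in the call trace rather than at the reported line.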
Re: Linux in KVM guest segfaults when hosts runs Linux 5.1
On Tue, May 14, 2019 at 08:51:28AM +0200, Marc Haber wrote: > On Mon, May 13, 2019 at 04:10:35PM +0200, Radim Krčmář wrote: > > 2019-05-12 13:53+0200, Marc Haber: > > > since updating my home desktop machine to kernel 5.1.1, KVM guests > > > started on that machine segfault after booting: > > [...] > > > Any idea short of bisecting? > > > > It has also been spotted by Borislav and the fix [1] should land in the > > next kernel update, thanks for the report. > > 1: https://patchwork.kernel.org/patch/10936271/ > > I can confirm that this patch fixes the segfaults for me. However, it is not yet in Linux 5.1.5. Greetings Marc
Re: Kernel 5.1 breaks UDP checksums for SIP packets
On Mon, May 20, 2019 at 12:28:02PM +0200, Florian Westphal wrote: > Marc Haber wrote: > > when I update my Firewall from Kernel 5.0 to Kernel 5.1, SIP clients > > that connect from the internal network to an external, commercial SIP > > service do not work any more. When I trace beyond the NAT, I see that > > the outgoing SIP packets have incorrect UDP checksums: > > I'm a moron. Can you please try this patch? >
> diff --git a/net/netfilter/nf_nat_helper.c b/net/netfilter/nf_nat_helper.c
> --- a/net/netfilter/nf_nat_helper.c
> +++ b/net/netfilter/nf_nat_helper.c
> @@ -170,7 +170,7 @@ nf_nat_mangle_udp_packet(struct sk_buff *skb,
>  	if (!udph->check && skb->ip_summed != CHECKSUM_PARTIAL)
>  		return true;
>
> -	nf_nat_csum_recalc(skb, nf_ct_l3num(ct), IPPROTO_TCP,
> +	nf_nat_csum_recalc(skb, nf_ct_l3num(ct), IPPROTO_UDP,
>  			   udph, &udph->check, datalen, oldlen);
>
>  	return true;
Thanks for the lightning-fast reaction. The patch indeed fixes the issue for me; everything is back online, and both incoming and outgoing calls work. Can you please funnel that one to Greg for the next stable release? Greetings Marc
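The one-word fix in the patch (IPPROTO_TCP to IPPROTO_UDP) matters because the UDP and TCP checksums also cover a pseudo-header that contains the IP protocol number (17 for UDP, 6 for TCP). Recomputing a UDP checksum with the TCP protocol number therefore produces a value a receiver will reject. A minimal sketch, assuming IPv4 and the RFC 768 / RFC 1071 one's-complement checksum; `l4_checksum` and its helpers are hypothetical names, not the netfilter implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Add 16-bit big-endian words to a 32-bit one's-complement accumulator. */
static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *data, size_t len)
{
    while (len > 1) {
        sum += (uint32_t)data[0] << 8 | data[1];
        data += 2;
        len -= 2;
    }
    if (len)                        /* odd trailing byte, padded with zero */
        sum += (uint32_t)data[0] << 8;
    return sum;
}

/* Fold carries back into 16 bits and take the one's complement. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Checksum over the IPv4 pseudo-header (src, dst, zero, protocol, length)
 * plus the L4 header and payload. The protocol byte is part of the sum,
 * so passing 6 (TCP) instead of 17 (UDP) changes the result.
 * (UDP transmits a computed 0 as 0xffff; that detail is omitted here.) */
static uint16_t l4_checksum(const uint8_t src[4], const uint8_t dst[4],
                            uint8_t proto, const uint8_t *seg, size_t len)
{
    uint32_t sum = 0;
    sum = csum_add_bytes(sum, src, 4);
    sum = csum_add_bytes(sum, dst, 4);
    sum += proto;                   /* word: zero byte, then protocol byte */
    sum += (uint32_t)len;           /* L4 length: header plus payload */
    sum = csum_add_bytes(sum, seg, len);
    return csum_fold(sum);
}
```

A segment carrying a correct checksum sums to zero when verified with protocol 17, but not when the sum is taken with protocol 6, which is the mismatch the patch removes.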
Kernel 5.1 breaks UDP checksums for SIP packets
ke things work again. The obvious candidates are nf_conntrack_sip and nf_nat_sip. nf_nat_sip didn't change between 5.0.13 and 5.1.3, and transplanting 5.0's nf_conntrack_sip onto a 5.1.3 kernel didn't change 5.1.3's faulty behavior. Does anybody have an idea that I could try before bisecting 7074 revisions in roughly 13 steps? Greetings Marc
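The "roughly 13 steps" figure follows from bisection halving the candidate range at each step, so about ceil(log2(n)) test builds are needed for n revisions. A minimal sketch of that upper-bound estimate; `bisect_steps` is a hypothetical helper, and git's own "roughly N steps" message may round slightly differently.

```c
/* Upper-bound estimate of bisection steps for n candidate revisions:
 * the smallest k with 2^k >= n, i.e. ceil(log2(n)). */
static unsigned bisect_steps(unsigned long n)
{
    unsigned k = 0;
    while (n > 1) {
        n = (n + 1) / 2;    /* worst case: the larger half remains */
        k++;
    }
    return k;
}
```

For the 7074 revisions mentioned above, `bisect_steps(7074)` returns 13, since 2^12 = 4096 < 7074 <= 8192 = 2^13.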
Re: Linux 5.1 on APU runs in circles with Call Traces
On Sun, May 12, 2019 at 09:32:03PM +0200, Marc Haber wrote: > I regret to inform you that I have now the third crippling issue in > Linux 5.1, with the fourth one in the process of being isolated. GPIOLIB was missing from my kernel configuration. This was not detected automatically and only resulted in a burst of kernel warnings that scrolled by within a second. Running make oldconfig allowed me to actually see the warnings. Issue solved. Greetings Marc
Re: Linux in KVM guest segfaults when hosts runs Linux 5.1
On Mon, May 13, 2019 at 04:10:35PM +0200, Radim Krčmář wrote: > 2019-05-12 13:53+0200, Marc Haber: > > since updating my home desktop machine to kernel 5.1.1, KVM guests > > started on that machine segfault after booting: > [...] > > Any idea short of bisecting? > > It has also been spotted by Borislav and the fix [1] should land in the > next kernel update, thanks for the report. > 1: https://patchwork.kernel.org/patch/10936271/ I can confirm that this patch fixes the segfaults for me. Greetings Marc
Linux 5.1 on APU runs in circles with Call Traces
Hi, I regret to inform you that I now have the third crippling issue in Linux 5.1, with a fourth one in the process of being isolated. This time, it's a PC Engines APU2 running in circles right after booting:
May 12 20:56:01 prom kernel: CPU: 2 PID: 657 Comm: kworker/2:2 Tainted: G W 5.1.1-zgsrv20080 #5.1.1.20190511.0
May 12 20:56:01 prom kernel: Hardware name: PC Engines apu2/apu2, BIOS 88a4f96 03/11/2016
May 12 20:56:01 prom kernel: Workqueue: events_freezable input_polled_device_work [input_polldev]
May 12 20:56:01 prom kernel: RIP: 0010:gpio_keys_polled_check_state.isra.1+0xa/0x60 [gpio_keys_polled]
May 12 20:56:01 prom kernel: Code: 48 8b 17 48 8b 42 10 48 8b 40 20 48 85 c0 74 09 48 8b 7a 08 e9 f7 fa 6e e1 c3 66 0f 1f 44 00 00 41 54 55 48 89 cd 53 48 89 d3 <0f> 0b 8b 46 18 85 c0 74 20 8d 50 fe 83 fa 01 77 1d 8b 03 85 c0 74
May 12 20:56:01 prom kernel: RSP: 0018:c981fe20 EFLAGS: 00010286
May 12 20:56:01 prom kernel: RAX: RBX: 888117ca6548 RCX: 888117ca654c
May 12 20:56:01 prom kernel: RDX: 888117ca6548 RSI: a03e9040 RDI: 888116043000
May 12 20:56:01 prom kernel: RBP: 888117ca654c R08: 0010 R09:
May 12 20:56:01 prom kernel: R10: 8080808080808080 R11: 0018 R12: 888117ca6538
May 12 20:56:01 prom kernel: R13: 0001 R14: 888117ca6530 R15: 8881161ece40
May 12 20:56:01 prom kernel: FS: () GS:88811ab0() knlGS:
May 12 20:56:01 prom kernel: CS: 0010 DS: ES: CR0: 80050033
May 12 20:56:01 prom kernel: CR2: 55572f98a251 CR3: 000117dd6000 CR4: 000406e0
May 12 20:56:01 prom kernel: Call Trace:
May 12 20:56:01 prom kernel: gpio_keys_polled_poll+0xd0/0x240 [gpio_keys_polled]
May 12 20:56:01 prom kernel: ? __switch_to+0x171/0x410
May 12 20:56:01 prom kernel: ? finish_task_switch+0x6f/0x260
May 12 20:56:01 prom kernel: input_polled_device_work+0x11/0x20 [input_polldev]
May 12 20:56:01 prom kernel: process_one_work+0x171/0x300
May 12 20:56:01 prom kernel: worker_thread+0x2b/0x370
May 12 20:56:01 prom kernel: ? process_one_work+0x300/0x300
May 12 20:56:01 prom kernel: kthread+0x108/0x120
May 12 20:56:01 prom kernel: ? kthread_park+0x80/0x80
May 12 20:56:01 prom kernel: ret_from_fork+0x22/0x40
May 12 20:56:01 prom kernel: ---[ end trace 72a086f2949e1d45 ]---
May 12 20:56:01 prom kernel: leds-gpio leds-gpio: Skipping unavailable LED gpio 0 (apu:green:1)
May 12 20:56:01 prom kernel: leds-gpio leds-gpio: Skipping unavailable LED gpio 0 (apu:green:2)
May 12 20:56:01 prom kernel: leds-gpio leds-gpio: Skipping unavailable LED gpio 0 (apu:green:3)
May 12 20:56:01 prom kernel: WARNING: CPU: 2 PID: 657 at include/linux/gpio/consumer.h:421 gpio_keys_polled_check_state.isra.1+0xa/0x60 [gpio_keys_polled]
May 12 20:56:01 prom kernel: Modules linked in: 8021q crct10dif_pclmul(+) crc32_pclmul leds_gpio ghash_clmulni_intel pcengines_apuv2 gpio_keys_polled aesni_intel input_polldev aes_x86_64 crypto_simd cryptd glue_helper fam15h_power k10temp input_leds sp5100_tco led_class sg ccp pcc_cpufreq acpi_cpufreq bridge stp llc ip_tables x_tables autofs4 ext4 mbcache usbhid jbd2 dm_mod usb_storage sd_mod ehci_pci ehci_hcd xhci_pci xhci_hcd ahci crc32c_intel libahci igb i2c_algo_bit usbcore i2c_piix4 ptp libata i2c_core usb_common pps_core hwmon
This repeats indefinitely, as fast as the serial console can print. Going back to 5.0.13 immediately fixes this. Any idea short of bisecting? I am sorry, but I am running out of time for kernel debugging this month. Greetings Marc
Linux in KVM guest segfaults when hosts runs Linux 5.1
Hi, since updating my home desktop machine to kernel 5.1.1, KVM guests started on that machine segfault after booting:
general protection fault: [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 13 Comm: kworker/0:1 Not tainted 5.0.13-zgsrv20080 #5.0.13.20190505.0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Workqueue: events once_deferred
RIP: 0010:native_read_pmc+0x2/0x10
Code: e2 20 89 3e 48 09 d0 c3 89 f9 89 f0 0f 30 c3 66 0f 1f 84 00 00 00 00 00 89 f0 89 f9 0f 30 31 c0 c3 0f 1f 80 00 00 00 00 89 f9 <0f> 33 48 c1 e2 20 48 09 d0 c3 0f 1f 40 00 0f 20 c0 c3 66 66 2e 0f
RSP: 0018:8881b9a03e50 EFLAGS: 00010083
RAX: 0001 RBX: 8001 RCX:
RDX: 002f RSI: RDI:
RBP: 8881b590e400 R08: 8881b590e400 R09: 0003
R10: e8c05440 R11: R12: 8881b590e5d8
R13: 0010 R14: 8881b590e420 R15: e8c05400
FS: () GS:8881b9a0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f9bcc5c61f8 CR3: 0001b6a24000 CR4: 06f0
Call Trace:
 x86_perf_event_update+0x3b/0x80
 x86_pmu_stop+0x84/0xa0
 x86_pmu_del+0x52/0x160
 event_sched_out.isra.59+0x95/0x190
 group_sched_out.part.61+0x51/0xc0
 ctx_sched_out+0xf2/0x220
 ctx_resched+0xb8/0xc0
 __perf_install_in_context+0x175/0x1f0
 remote_function+0x3e/0x50
 flush_smp_call_function_queue+0x30/0xe0
 smp_call_function_interrupt+0x2f/0x40
 call_function_single_interrupt+0xf/0x20
RIP: 0010:smp_call_function_many+0x1ca/0x230
Code: ee 89 c7 e8 e8 f5 51 00 3b 05 a6 23 db 00 0f 83 b2 fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 20 28 db 81 8b 51 18 83 e2 01 74 0a 90 8b 51 18 83 e2 01 75 f6 eb c8 48 c7 c2 60 17 e9 81 48 89 ee
RSP: 0018:c9cc3d48 EFLAGS: 0202 ORIG_RAX: ff04
RAX: 0001 RBX: 8881b9a21e00 RCX: 8881b9a647c0
RDX: 0001 RSI: RDI: 8881b9a21e08
RBP: 8881b9a21e08 R08: 003e R09:
R10: 81c04584 R11: R12: 81027af0
R13: R14: 0001 R15: 0008
 ? arch_unregister_cpu+0x20/0x20
 ? smp_call_function_many+0x1a8/0x230
 ? inet_ehashfn+0x29/0x100
 ? arch_unregister_cpu+0x20/0x20
 ? inet_ehashfn+0x2a/0x100
 smp_call_function+0x20/0x40
 on_each_cpu+0x18/0x70
 ? inet_ehashfn+0x29/0x100
 ? inet_ehashfn+0x2a/0x100
 text_poke_bp+0x8d/0xda
 __jump_label_transform+0x10d/0x120
 arch_jump_label_transform+0x21/0x30
 __jump_label_update+0x70/0xe0
 static_key_disable_cpuslocked+0x54/0x80
 static_key_disable+0x11/0x20
 once_deferred+0x1a/0x30
 process_one_work+0x171/0x300
 worker_thread+0x2b/0x370
 ? process_one_work+0x300/0x300
 kthread+0x108/0x120
 ? kthread_park+0x80/0x80
 ret_from_fork+0x22/0x40
Modules linked in: input_leds sg led_class virtio_balloon virtio_console qemu_fw_cfg dm_mod virtio_rng ip_tables x_tables autofs4 ext4 mbcache jbd2 fscrypto sr_mod cdrom ata_generic virtio_net net_failover failover virtio_blk virtio_pci i2c_piix4 virtio_ring ata_piix virtio libata i2c_core floppy
---[ end trace 60c8d1a075894c8d ]---
RIP: 0010:native_read_pmc+0x2/0x10
Code: e2 20 89 3e 48 09 d0 c3 89 f9 89 f0 0f 30 c3 66 0f 1f 84 00 00 00 00 00 89 f0 89 f9 0f 30 31 c0 c3 0f 1f 80 00 00 00 00 89 f9 <0f> 33 48 c1 e2 20 48 09 d0 c3 0f 1f 40 00 0f 20 c0 c3 66 66 2e 0f
RSP: 0018:8881b9a03e50 EFLAGS: 00010083
RAX: 0001 RBX: 8001 RCX:
RDX: 002f RSI: RDI:
RBP: 8881b590e400 R08: 8881b590e400 R09: 0003
R10: e8c05440 R11: R12: 8881b590e5d8
R13: 0010 R14: 8881b590e420 R15: e8c05400
FS: () GS:8881b9a0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f9bcc5c61f8 CR3: 0001b6a24000 CR4: 06f0
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
The host seems to be running fine; the KVM guest crash is reproducible. Both host and guest are running Debian unstable with a locally built kernel; the host runs 5.1.1, the guest 5.0.13. The crash also happens when the host is running 5.1.0; going back to 5.0.13 on the host allows the guest to finish booting and run fine. Please note that my 5.1.1 kernel image is not completely broken for KVM: I have updated my APU machine, which runs the firewall and other infrastructure services, and the guests run fine there. The machine in question is an older box with an AMD Phenom(tm) II X6 1090T Processor. I suspect that the issue is related to the Phenom CPU. Any idea short of bisecting? Greetings Marc
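In an oops, the `Code:` line dumps the instruction bytes around the faulting address, and the byte in angle brackets marks where RIP points. In the guest trace, `<0f> 33` is the RDPMC instruction, which fits a fault in native_read_pmc; in the APU trace, `<0f> 0b` is UD2, which x86 kernels use to raise WARN() and BUG(). A minimal sketch that extracts the marked bytes; `faulting_opcode` and its two-entry lookup table are hypothetical illustrations, not a disassembler (the kernel tree's scripts/decodecode is the usual tool for full decoding).

```c
#include <stdio.h>
#include <string.h>

/* Parse the <..>-marked byte and the byte after it from an oops "Code:"
 * line. Returns a mnemonic from a tiny illustrative table, "unknown" for
 * other opcodes, or NULL when the line carries no marker. */
static const char *faulting_opcode(const char *code_line,
                                   unsigned *b0, unsigned *b1)
{
    const char *mark = strchr(code_line, '<');

    if (mark == NULL || sscanf(mark, "<%x> %x", b0, b1) != 2)
        return NULL;
    if (*b0 == 0x0f && *b1 == 0x0b)
        return "ud2 (used by WARN()/BUG())";
    if (*b0 == 0x0f && *b1 == 0x33)
        return "rdpmc (read performance-monitoring counter)";
    return "unknown";
}
```

Fed the two `Code:` lines from this thread, the sketch reports RDPMC for the guest, consistent with a general protection fault while reading a performance counter, and UD2 for the APU, consistent with the WARNING from include/linux/gpio/consumer.h. Multi-byte opcodes in general need a real decoder.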
Re: VMs freezing when host is running 4.14
Hi, after a total of nine weeks of bisecting, broken filesystems, and service outages (thankfully on unimportant systems), 4.15 seems to have fixed the issue. After going to 4.15, the crashes never happened again. They have, however, happened with each and every 4.14 release I tried, which I stopped doing with 4.14.15 on Jan 28. For me, this means that the issue is fixed and that I have just wasted nine weeks of time. For you, this means that you have a crippling, data-eating issue in the current long-term release kernel. I sincerely hope that I never have to lay eyes on any 4.14 kernel again, and that no major distribution will release with this version. Greetings Marc On Mon, Jan 08, 2018 at 10:10:25AM +0100, Marc Haber wrote: > it's been five weeks since I gave you the last information about this > issue. Alas, I don't have a solution yet, only reports: > > - The bisect between 4.13 and 4.14 ended up on a one-character fix in a > comment, so that was a total waste. > - The issue is present in all recent kernels up to 4.15-rc5, I didn't > try any newer 4.15 version yet. > - 4.13-rc4 seems good > - 4.13-rc5 is the earliest kernel that shows the issue. I am at a loss > to understand why a bug introduced during the 4.13 RC phase could > _not_ be present in the 4.13 release but reappear in 4.14. I didn't > try any 4.14 rc versions but suspect that those are all bad as well. > > I will now start bisecting between 4.13-rc4 and 4.13-rc5, which is > "roughly 7 steps"; a kernel is "good" if it survived at least 72 hours > (as I found out that 24 hours might not be long enough). > > I am still open to any suggestions that might help in identifying this > issue which now affects five of my six systems that do KVM > virtualization one way or the other. I have in the meantime experienced > file system corruption and data loss (and do have backups).
> > Greetings > Marc > > On Fri, Dec 01, 2017 at 03:43:58PM +0100, Marc Haber wrote: > > On Wed, Nov 22, 2017 at 04:04:42PM +0100, 王金浦 wrote: > > > +cc kvm > > > > > > 2017-11-22 10:39 GMT+01:00 Marc Haber <mh+linux-ker...@zugschlus.de>: > > > > On Tue, Nov 21, 2017 at 05:18:21PM +0100, Marc Haber wrote: > > > >> On the affected host, VMs freeze at a rate about two or three per day. > > > >> They just stop dead in their tracks, console and serial console become > > > >> unresponsive, ping stops, they don't react to virsh shutdown, only to > > > >> virsh destroy. > > > > > > > > I was able to obtain a log of a VM before it became unresponsive. here > > > > we go: > > > > > > > > Nov 22 08:19:01 weave kernel: double fault: [#1] PREEMPT SMP > > > > Nov 22 08:19:01 weave kernel: Modules linked in: crct10dif_pclmul > > > > crc32_pclmul ghash_clmulni_intel pcbc sg aesni_intel aes_x86_64 > > > > crypto_simd glue_helper cryptd input_leds virtio_balloon virtio_console > > > > led_class qemu_fw_cfg ip_tables x_tables autofs4 ext4 mbcache jbd2 > > > > fscrypto usbhid sr_mod cdrom virtio_blk virtio_net ata_generic > > > > crc32c_intel ehci_pci ehci_hcd usbcore usb_common floppy i2c_piix4 > > > > virtio_pci virtio_ring virtio ata_piix i2c_core libata > > > > Nov 22 08:19:01 weave kernel: CPU: 1 PID: 8795 Comm: debsecan Not > > > > tainted 4.14.1-zgsrv20080 #3 > > > > Nov 22 08:19:01 weave kernel: Hardware name: QEMU Standard PC (i440FX + > > > > PIIX, 1996), BIOS 1.10.2-1 04/01/2014 > > > > Nov 22 08:19:01 weave kernel: task: 88001ef0adc0 task.stack: > > > > c91fc000 > > > > Nov 22 08:19:01 weave kernel: RIP: > > > > 0010:kvm_async_pf_task_wait+0x167/0x200 > > > > Nov 22 08:19:01 weave kernel: RSP: :c91ffa10 EFLAGS: > > > > 0202 > > > > Nov 22 08:19:01 weave kernel: RAX: 88001fd11cc0 RBX: > > > > c91ffa30 RCX: 0002 > > > > Nov 22 08:19:01 weave kernel: RDX: 0140 RSI: > > > > 8173514b RDI: 819bdd80 > > > > Nov 22 08:19:01 weave kernel: RBP: c91ffaa0 R08: > > > > 00193fc0 R09: 
8800 > > > > Nov 22 08:19:01 weave kernel: R10: c91ffac0 R11: > > > > R12: c91ffa40 > > > > Nov 22 08:19:01 weave kernel: R13: 0be8 R14: > > > > 819bdd80 R15: ea193f80 > > > > Nov 22 08:19:01 weave kernel: FS: 7f97e25dd700() > > > > GS:88001fd0() knlGS: > >
Re: VMs freezing when host is running 4.14
Hi, it's been five weeks since I gave you the last information about this issue. Alas, I don't have a solution yet, only reports: - The bisect between 4.13 and 4.14 ended up on a one-character fix in a comment, so that was a total waste. - The issue is present in all recent kernels up to 4.15-rc5, I didn't try any newer 4.15 version yet. - 4.13-rc4 seems good - 4.13-rc5 is the earliest kernel that shows the issue. I am at a loss to understand why a bug introduced during the 4.13 RC phase could _not_ be present in the 4.13 release but reappear in 4.14. I didn't try any 4.14 rc versions but suspect that those are all bad as well. I will now start bisecting between 4.13-rc4 and 4.13-rc5, which is "roughly 7 steps"; a kernel is "good" if it survived at least 72 hours (as I found out that 24 hours might not be long enough). I am still open to any suggestions that might help in identifying this issue which now affects five of my six systems that do KVM virtualization one way or the other. I have in the meantime experienced file system corruption and data loss (and do have backups). Greetings Marc On Fri, Dec 01, 2017 at 03:43:58PM +0100, Marc Haber wrote: > On Wed, Nov 22, 2017 at 04:04:42PM +0100, 王金浦 wrote: > > +cc kvm > > > > 2017-11-22 10:39 GMT+01:00 Marc Haber <mh+linux-ker...@zugschlus.de>: > > > On Tue, Nov 21, 2017 at 05:18:21PM +0100, Marc Haber wrote: > > >> On the affected host, VMs freeze at a rate about two or three per day. > > >> They just stop dead in their tracks, console and serial console become > > >> unresponsive, ping stops, they don't react to virsh shutdown, only to > > >> virsh destroy. > > > > > > I was able to obtain a log of a VM before it became unresponsive.
here > > > we go: > > > > > > Nov 22 08:19:01 weave kernel: double fault: [#1] PREEMPT SMP > > > Nov 22 08:19:01 weave kernel: Modules linked in: crct10dif_pclmul > > > crc32_pclmul ghash_clmulni_intel pcbc sg aesni_intel aes_x86_64 > > > crypto_simd glue_helper cryptd input_leds virtio_balloon virtio_console > > > led_class qemu_fw_cfg ip_tables x_tables autofs4 ext4 mbcache jbd2 > > > fscrypto usbhid sr_mod cdrom virtio_blk virtio_net ata_generic > > > crc32c_intel ehci_pci ehci_hcd usbcore usb_common floppy i2c_piix4 > > > virtio_pci virtio_ring virtio ata_piix i2c_core libata > > > Nov 22 08:19:01 weave kernel: CPU: 1 PID: 8795 Comm: debsecan Not tainted > > > 4.14.1-zgsrv20080 #3 > > > Nov 22 08:19:01 weave kernel: Hardware name: QEMU Standard PC (i440FX + > > > PIIX, 1996), BIOS 1.10.2-1 04/01/2014 > > > Nov 22 08:19:01 weave kernel: task: 88001ef0adc0 task.stack: > > > c91fc000 > > > Nov 22 08:19:01 weave kernel: RIP: 0010:kvm_async_pf_task_wait+0x167/0x200 > > > Nov 22 08:19:01 weave kernel: RSP: :c91ffa10 EFLAGS: 0202 > > > Nov 22 08:19:01 weave kernel: RAX: 88001fd11cc0 RBX: c91ffa30 > > > RCX: 0002 > > > Nov 22 08:19:01 weave kernel: RDX: 0140 RSI: 8173514b > > > RDI: 819bdd80 > > > Nov 22 08:19:01 weave kernel: RBP: c91ffaa0 R08: 00193fc0 > > > R09: 8800 > > > Nov 22 08:19:01 weave kernel: R10: c91ffac0 R11: > > > R12: c91ffa40 > > > Nov 22 08:19:01 weave kernel: R13: 0be8 R14: 819bdd80 > > > R15: ea193f80 > > > Nov 22 08:19:01 weave kernel: FS: 7f97e25dd700() > > > GS:88001fd0() knlGS: > > > Nov 22 08:19:01 weave kernel: CS: 0010 DS: ES: CR0: > > > 80050033 > > > Nov 22 08:19:01 weave kernel: CR2: 00483001 CR3: 15df7000 > > > CR4: 000406e0 > > > Nov 22 08:19:01 weave kernel: Call Trace: > > > Nov 22 08:19:01 weave kernel: do_async_page_fault+0x6b/0x70 > > > Nov 22 08:19:01 weave kernel: ? 
do_async_page_fault+0x6b/0x70 > > > Nov 22 08:19:01 weave kernel: async_page_fault+0x22/0x30 > > > Nov 22 08:19:01 weave kernel: RIP: 0010:clear_page_rep+0x7/0x10 > > > Nov 22 08:19:01 weave kernel: RSP: :c91ffb88 EFLAGS: 00010246 > > > Nov 22 08:19:01 weave kernel: RAX: RBX: 0004 > > > RCX: 0200 > > > Nov 22 08:19:01 weave kernel: RDX: 88001ef0adc0 RSI: 00193f80 > > > RDI: 8800064fe000 > > > Nov 22 08:19:01 weave kernel: RBP: c91ffc50 R08: 00193fc0 > > > R09: 8800 > > > Nov 22 08:19:01 weave kernel: R10: 00
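The rule that a kernel is "good" only if it survived at least 72 hours can be expressed with the exit-status convention of `git bisect run`, where 0 means good, 125 means the revision cannot be tested (skip), and other values from 1 to 127 mean bad. A minimal sketch, assuming a hypothetical helper `soak_verdict` that is given the observed uptime and whether a guest froze; it is not part of any existing tool.

```c
#include <stdbool.h>

/* Exit statuses understood by `git bisect run`. */
enum { BISECT_GOOD = 0, BISECT_BAD = 1, BISECT_SKIP = 125 };

/* Hypothetical soak-test verdict: any freeze marks the kernel bad;
 * surviving the full soak window marks it good; a shorter clean run is
 * inconclusive and is skipped rather than trusted as good. */
static int soak_verdict(double uptime_hours, bool froze, double soak_hours)
{
    if (froze)
        return BISECT_BAD;
    if (uptime_hours >= soak_hours)
        return BISECT_GOOD;
    return BISECT_SKIP;
}
```

Treating a short clean run as "skip" instead of "good" reflects the observation above that 24 hours turned out not to be long enough to trust a kernel.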
Re: VMs freezing when host is running 4.14
Hi, it's been five weeks since I gave you the last information about this issue. Alas, I don't have a solution yet, only reports: - The bisect between 4.13 and 4.14 ended up on a one-character fix in a comment, so that was a total waste. - The issue is present in all recent kernels up to 4.15-rc5, I didn't try any newer 4.15 version yet. - 4.13-rc4 seems good - 4.13-rc5 is the earliest kernel that shows the issue. I am at a loss to understand why a bug introduced during the 4.13 RC phase could _not_ be present in the 4.13 release but reappear in 4.14. I didn't try any 4.14 rc versions but suspect that those are all bad as well. I will now start bisecting between 4.13-rc4 and 4.13-rc5, which is "roughly 7 steps"; a kernel is "good" if it survived at least 72 hours (as I found out that 24 hours might not be long enough). I am still open to any suggestions that might help in identifying this issue which now affects five of my six systems that to KVM virtualization one way or the other. I have in the mean time experienced file system corruption and data loss (and do have backups). Greetings Marc On Fri, Dec 01, 2017 at 03:43:58PM +0100, Marc Haber wrote: > On Wed, Nov 22, 2017 at 04:04:42PM +0100, 王金浦 wrote: > > +cc kvm > > > > 2017-11-22 10:39 GMT+01:00 Marc Haber : > > > On Tue, Nov 21, 2017 at 05:18:21PM +0100, Marc Haber wrote: > > >> On the affected host, VMs freeze at a rate about two or three per day. > > >> They just stop dead in their tracks, console and serial console become > > >> unresponsive, ping stops, they don't react to virsh shutdown, only to > > >> virsh destroy. > > > > > > I was able to obtain a log of a VM before it became unresponsive. 
here > > > we go: > > > > > > Nov 22 08:19:01 weave kernel: double fault: [#1] PREEMPT SMP > > > Nov 22 08:19:01 weave kernel: Modules linked in: crct10dif_pclmul > > > crc32_pclmul ghash_clmulni_intel pcbc sg aesni_intel aes_x86_64 > > > crypto_simd glue_helper cryptd input_leds virtio_balloon virtio_console > > > led_class qemu_fw_cfg ip_tables x_tables autofs4 ext4 mbcache jbd2 > > > fscrypto usbhid sr_mod cdrom virtio_blk virtio_net ata_generic > > > crc32c_intel ehci_pci ehci_hcd usbcore usb_common floppy i2c_piix4 > > > virtio_pci virtio_ring virtio ata_piix i2c_core libata > > > Nov 22 08:19:01 weave kernel: CPU: 1 PID: 8795 Comm: debsecan Not tainted > > > 4.14.1-zgsrv20080 #3 > > > Nov 22 08:19:01 weave kernel: Hardware name: QEMU Standard PC (i440FX + > > > PIIX, 1996), BIOS 1.10.2-1 04/01/2014 > > > Nov 22 08:19:01 weave kernel: task: 88001ef0adc0 task.stack: > > > c91fc000 > > > Nov 22 08:19:01 weave kernel: RIP: 0010:kvm_async_pf_task_wait+0x167/0x200 > > > Nov 22 08:19:01 weave kernel: RSP: :c91ffa10 EFLAGS: 0202 > > > Nov 22 08:19:01 weave kernel: RAX: 88001fd11cc0 RBX: c91ffa30 > > > RCX: 0002 > > > Nov 22 08:19:01 weave kernel: RDX: 0140 RSI: 8173514b > > > RDI: 819bdd80 > > > Nov 22 08:19:01 weave kernel: RBP: c91ffaa0 R08: 00193fc0 > > > R09: 8800 > > > Nov 22 08:19:01 weave kernel: R10: c91ffac0 R11: > > > R12: c91ffa40 > > > Nov 22 08:19:01 weave kernel: R13: 0be8 R14: 819bdd80 > > > R15: ea193f80 > > > Nov 22 08:19:01 weave kernel: FS: 7f97e25dd700() > > > GS:88001fd0() knlGS: > > > Nov 22 08:19:01 weave kernel: CS: 0010 DS: ES: CR0: > > > 80050033 > > > Nov 22 08:19:01 weave kernel: CR2: 00483001 CR3: 15df7000 > > > CR4: 000406e0 > > > Nov 22 08:19:01 weave kernel: Call Trace: > > > Nov 22 08:19:01 weave kernel: do_async_page_fault+0x6b/0x70 > > > Nov 22 08:19:01 weave kernel: ? 
do_async_page_fault+0x6b/0x70 > > > Nov 22 08:19:01 weave kernel: async_page_fault+0x22/0x30 > > > Nov 22 08:19:01 weave kernel: RIP: 0010:clear_page_rep+0x7/0x10 > > > Nov 22 08:19:01 weave kernel: RSP: :c91ffb88 EFLAGS: 00010246 > > > Nov 22 08:19:01 weave kernel: RAX: RBX: 0004 > > > RCX: 0200 > > > Nov 22 08:19:01 weave kernel: RDX: 88001ef0adc0 RSI: 00193f80 > > > RDI: 8800064fe000 > > > Nov 22 08:19:01 weave kernel: RBP: c91ffc50 R08: 00193fc0 > > > R09: 8800 > > > Nov 22 08:19:01 weave kernel: R10: R11: 00
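Each bisect verdict roughly halves the remaining range, so the number of kernels to build and soak grows with the base-2 logarithm of the number of candidate commits, and the wall-clock cost is that count times the soak period. The sketch below models this on a purely linear history (real `git bisect` walks the commit DAG and may need `skip`s); the function names, the commit list, and the `is_bad` predicate are hypothetical.

```python
def bisect_first_bad(commits, is_bad):
    """Binary search for the first bad commit in a linear, oldest-first list.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is bad as well.
    Returns (index_of_first_bad, number_of_kernels_tested).
    """
    good, bad = 0, len(commits) - 1  # invariant: good is good, bad is bad
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1  # one build plus one soak test per step
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return bad, tested


def soak_hours(candidates, hours_per_kernel=72):
    """Worst-case wall-clock cost: ceil(log2(candidates)) verdicts x soak time."""
    steps = (candidates - 1).bit_length()  # integer ceil(log2(n)) for n >= 1
    return steps * hours_per_kernel
```

For instance, 128 candidate commits need at most 7 verdicts, so `soak_hours(128)` gives 504 hours, i.e. three weeks at 72 hours per kernel.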
Re: VMs freezing when host is running 4.14
4.14.3 is still affected. I am still bisecting between 4.13 and 4.14, with 5 steps to go. I define a kernel as "good" if it survived 24 hours on the hosts.

Greetings
Marc

On Wed, Nov 22, 2017 at 04:04:42PM +0100, 王金浦 wrote: > From: 王金浦 <jinpuw...@gmail.com> > Subject: Re: VMs freezing when host is running 4.14 > To: Marc Haber <mh+linux-ker...@zugschlus.de> > Cc: LKML <linux-kernel@vger.kernel.org>, "KVM-ML (k...@vger.kernel.org)" > <k...@vger.kernel.org> > Date: Wed, 22 Nov 2017 16:04:42 +0100 > > +cc kvm > > 2017-11-22 10:39 GMT+01:00 Marc Haber <mh+linux-ker...@zugschlus.de>: > > On Tue, Nov 21, 2017 at 05:18:21PM +0100, Marc Haber wrote: > >> On the affected host, VMs freeze at a rate about two or three per day. > >> They just stop dead in their tracks, console and serial console become > >> unresponsive, ping stops, they don't react to virsh shutdown, only to > >> virsh destroy. 
> > > > I was able to obtain a log of a VM before it became unresponsive. here > > we go: > > > > Nov 22 08:19:01 weave kernel: double fault: [#1] PREEMPT SMP > > Nov 22 08:19:01 weave kernel: Modules linked in: crct10dif_pclmul > > crc32_pclmul ghash_clmulni_intel pcbc sg aesni_intel aes_x86_64 crypto_simd > > glue_helper cryptd input_leds virtio_balloon virtio_console led_class > > qemu_fw_cfg ip_tables x_tables autofs4 ext4 mbcache jbd2 fscrypto usbhid > > sr_mod cdrom virtio_blk virtio_net ata_generic crc32c_intel ehci_pci > > ehci_hcd usbcore usb_common floppy i2c_piix4 virtio_pci virtio_ring virtio > > ata_piix i2c_core libata > > Nov 22 08:19:01 weave kernel: CPU: 1 PID: 8795 Comm: debsecan Not tainted > > 4.14.1-zgsrv20080 #3 > > Nov 22 08:19:01 weave kernel: Hardware name: QEMU Standard PC (i440FX + > > PIIX, 1996), BIOS 1.10.2-1 04/01/2014 > > Nov 22 08:19:01 weave kernel: task: 88001ef0adc0 task.stack: > > c91fc000 > > Nov 22 08:19:01 weave kernel: RIP: 0010:kvm_async_pf_task_wait+0x167/0x200 > > Nov 22 08:19:01 weave kernel: RSP: :c91ffa10 EFLAGS: 0202 > > Nov 22 08:19:01 weave kernel: RAX: 88001fd11cc0 RBX: c91ffa30 > > RCX: 0002 > > Nov 22 08:19:01 weave kernel: RDX: 0140 RSI: 8173514b > > RDI: 819bdd80 > > Nov 22 08:19:01 weave kernel: RBP: c91ffaa0 R08: 00193fc0 > > R09: 8800 > > Nov 22 08:19:01 weave kernel: R10: c91ffac0 R11: > > R12: c91ffa40 > > Nov 22 08:19:01 weave kernel: R13: 0be8 R14: 819bdd80 > > R15: ea193f80 > > Nov 22 08:19:01 weave kernel: FS: 7f97e25dd700() > > GS:88001fd0() knlGS: > > Nov 22 08:19:01 weave kernel: CS: 0010 DS: ES: CR0: > > 80050033 > > Nov 22 08:19:01 weave kernel: CR2: 00483001 CR3: 15df7000 > > CR4: 000406e0 > > Nov 22 08:19:01 weave kernel: Call Trace: > > Nov 22 08:19:01 weave kernel: do_async_page_fault+0x6b/0x70 > > Nov 22 08:19:01 weave kernel: ? do_async_page_fault+0x6b/0x70 > > Nov 22 08:19:01 weave kernel: async_page_fault+0x22/0x30 > > N
Re: VMs freezing when host is running 4.14
On Thu, Nov 23, 2017 at 06:26:36PM +0200, Liran Alon wrote: > If there is no nested guest so no. My fix here probably won't help.

I can confirm that I am not running nested virtualization; the host is running directly on the APU. I also have three other machines that are running flawlessly with 4.14, and another virtualization host, a "real" server with a somewhat dated AMD Opteron 1389, that has the same issue. The machine that first showed the issue is Intel, so this is not a vendor-specific issue.

Greetings
Marc

-- ----- Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: VMs freezing when host is running 4.14
On Wed, Nov 22, 2017 at 05:43:13PM +0100, Radim Krčmář wrote: > 2017-11-22 16:52+0100, Marc Haber: > > On Wed, Nov 22, 2017 at 04:04:42PM +0100, 王金浦 wrote: > > > So all guest kernels are 4.14, or also other older kernel? > > > > Guest kernels are also 4.14, but the issue disappears when the host is > > downgraded to an older kernel. I therefore reckoned that the guest > > kernel doesn't matter, but that was before I saw the trace in the log. > > The two most suspicious patches since 4.13 (which I assume works) are > > 664f8e26b00c ("KVM: X86: Fix loss of exception which has not yet been > injected")

That one does not revert cleanly; the line in question seems to have been removed a bit later. The reject is:

141 [24/5001]mh@fan:~/linux/git/linux ((v4.14.1) %) $ cat arch/x86/kvm/vmx.c.rej
--- arch/x86/kvm/vmx.c
+++ arch/x86/kvm/vmx.c
@@ -2516,7 +2516,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
 struct vcpu_vmx *vmx = to_vmx(vcpu);
 unsigned nr = vcpu->arch.exception.nr;
 bool has_error_code = vcpu->arch.exception.has_error_code;
-bool reinject = vcpu->arch.exception.injected;
+bool reinject = vcpu->arch.exception.reinject;
 u32 error_code = vcpu->arch.exception.error_code;
 u32 intr_info = nr | INTR_INFO_VALID_MASK;

> > and > > 9a6e7c39810e ("KVM: async_pf: Fix #DF due to inject "Page not Present" > and "Page Ready" exceptions simultaneously") > > please try reverting them to see if it helps,

That one reverted cleanly. I am now running the new kernel on the affected machine, and I think that a second machine has started to show the issue as well. Would the revert matter on the host only, or on the guests as well?

Greetings
Marc

-- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
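A `.rej` file holds a hunk that could not be placed: roughly, the tool could not find the hunk's "old" side (its context lines plus its `-` lines) in the target file. Here that old side includes `bool reinject = vcpu->arch.exception.injected;`, so if a later commit changed that line, the reversed hunk has nothing to match. The sketch below is a deliberately simplified model of that check (exact matching only, no fuzz or offset search tolerance as in GNU `patch`); `old_side` and `hunk_applies` are hypothetical helper names.

```python
def old_side(hunk_body):
    """Lines a unified-diff hunk expects to find: context (' ') and removed ('-')."""
    return [line[1:] for line in hunk_body if line[:1] in (" ", "-")]


def hunk_applies(file_lines, hunk_body):
    """Return the first offset where the hunk's old side matches exactly, else None.

    Simplification: real patch tools also try fuzzy matches with fewer
    context lines; this sketch only accepts an exact contiguous match.
    """
    expected = old_side(hunk_body)
    n = len(expected)
    for start in range(len(file_lines) - n + 1):
        if file_lines[start:start + n] == expected:
            return start
    return None
```

A `None` result corresponds to the situation in the reject above: the surrounding context may still be there, but the line the hunk wants to replace is not.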
Re: VMs freezing when host is running 4.14
On Wed, Nov 22, 2017 at 04:04:42PM +0100, 王金浦 wrote: > So all guest kernels are 4.14, or also other older kernel? Guest kernels are also 4.14, but the issue disappears when the host is downgraded to an older kernel. I therefore reckoned that the guest kernel doesn't matter, but that was before I saw the trace in the log. Greetings Marc -- ----- Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: VMs freezing when host is running 4.14
On Tue, Nov 21, 2017 at 05:18:21PM +0100, Marc Haber wrote: > On the affected host, VMs freeze at a rate about two or three per day. > They just stop dead in their tracks, console and serial console become > unresponsive, ping stops, they don't react to virsh shutdown, only to > virsh destroy.

I was able to obtain a log of a VM before it became unresponsive. Here we go:

Nov 22 08:19:01 weave kernel: double fault: [#1] PREEMPT SMP
Nov 22 08:19:01 weave kernel: Modules linked in: crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc sg aesni_intel aes_x86_64 crypto_simd glue_helper cryptd input_leds virtio_balloon virtio_console led_class qemu_fw_cfg ip_tables x_tables autofs4 ext4 mbcache jbd2 fscrypto usbhid sr_mod cdrom virtio_blk virtio_net ata_generic crc32c_intel ehci_pci ehci_hcd usbcore usb_common floppy i2c_piix4 virtio_pci virtio_ring virtio ata_piix i2c_core libata
Nov 22 08:19:01 weave kernel: CPU: 1 PID: 8795 Comm: debsecan Not tainted 4.14.1-zgsrv20080 #3
Nov 22 08:19:01 weave kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Nov 22 08:19:01 weave kernel: task: 88001ef0adc0 task.stack: c91fc000
Nov 22 08:19:01 weave kernel: RIP: 0010:kvm_async_pf_task_wait+0x167/0x200
Nov 22 08:19:01 weave kernel: RSP: :c91ffa10 EFLAGS: 0202
Nov 22 08:19:01 weave kernel: RAX: 88001fd11cc0 RBX: c91ffa30 RCX: 0002
Nov 22 08:19:01 weave kernel: RDX: 0140 RSI: 8173514b RDI: 819bdd80
Nov 22 08:19:01 weave kernel: RBP: c91ffaa0 R08: 00193fc0 R09: 8800
Nov 22 08:19:01 weave kernel: R10: c91ffac0 R11: R12: c91ffa40
Nov 22 08:19:01 weave kernel: R13: 0be8 R14: 819bdd80 R15: ea193f80
Nov 22 08:19:01 weave kernel: FS: 7f97e25dd700() GS:88001fd0() knlGS:
Nov 22 08:19:01 weave kernel: CS: 0010 DS: ES: CR0: 80050033
Nov 22 08:19:01 weave kernel: CR2: 00483001 CR3: 15df7000 CR4: 000406e0
Nov 22 08:19:01 weave kernel: Call Trace:
Nov 22 08:19:01 weave kernel: do_async_page_fault+0x6b/0x70
Nov 22 08:19:01 weave kernel: ? do_async_page_fault+0x6b/0x70
Nov 22 08:19:01 weave kernel: async_page_fault+0x22/0x30
Nov 22 08:19:01 weave kernel: RIP: 0010:clear_page_rep+0x7/0x10
Nov 22 08:19:01 weave kernel: RSP: :c91ffb88 EFLAGS: 00010246
Nov 22 08:19:01 weave kernel: RAX: RBX: 0004 RCX: 0200
Nov 22 08:19:01 weave kernel: RDX: 88001ef0adc0 RSI: 00193f80 RDI: 8800064fe000
Nov 22 08:19:01 weave kernel: RBP: c91ffc50 R08: 00193fc0 R09: 8800
Nov 22 08:19:01 weave kernel: R10: R11: R12: 0020
Nov 22 08:19:01 weave kernel: R13: 88001ffd5500 R14: c91ffce8 R15: ea193f80
Nov 22 08:19:01 weave kernel: ? get_page_from_freelist+0x8c3/0xaf0
Nov 22 08:19:01 weave kernel: ? __mem_cgroup_threshold+0x8a/0x130
Nov 22 08:19:01 weave kernel: ? free_pcppages_bulk+0x3f6/0x410
Nov 22 08:19:01 weave kernel: __alloc_pages_nodemask+0xe4/0xe20
Nov 22 08:19:01 weave kernel: ? free_hot_cold_page_list+0x2b/0x50
Nov 22 08:19:01 weave kernel: ? release_pages+0x2b7/0x360
Nov 22 08:19:01 weave kernel: ? mem_cgroup_commit_charge+0x7a/0x520
Nov 22 08:19:01 weave kernel: ? account_entity_enqueue+0x95/0xc0
Nov 22 08:19:01 weave kernel: alloc_pages_vma+0x7f/0x1e0
Nov 22 08:19:01 weave kernel: __handle_mm_fault+0x9cb/0xf20
Nov 22 08:19:01 weave kernel: handle_mm_fault+0xb2/0x1f0
Nov 22 08:19:01 weave kernel: __do_page_fault+0x1f2/0x440
Nov 22 08:19:01 weave kernel: do_page_fault+0x22/0x30
Nov 22 08:19:01 weave kernel: do_async_page_fault+0x4c/0x70
Nov 22 08:19:01 weave kernel: async_page_fault+0x22/0x30
Nov 22 08:19:01 weave kernel: RIP: 0033:0x56434ef679d8
Nov 22 08:19:01 weave kernel: RSP: 002b:7ffd6b48ad80 EFLAGS: 00010206
Nov 22 08:19:01 weave kernel: RAX: 00eb RBX: 001d RCX: aaab
Nov 22 08:19:01 weave kernel: RDX: 56434f5eb300 RSI: 000f RDI: 56434f3ca6c0
Nov 22 08:19:01 weave kernel: RBP: 00ec R08: 7f97e2453000 R09: 56434f5eb3ea
Nov 22 08:19:01 weave kernel: R10: 56434f5eb3eb R11: 56434f4510a0 R12: 003a
Nov 22 08:19:01 weave kernel: R13: 56434f3ca500 R14: 56434f451240 R15: 7f97e1024750
Nov 22 08:19:01 weave kernel: Code: f7 49 89 9d a0 d1 9b 81 48 89 55 98 4c 8d 63 10 e8 4f 02 53 00 eb 20 48 83 7d 98 00 74 3a e8 21 6e 06 00 80 7d c0 00 74 3f fb f4 66 66 90 66 66 90 e8 7d 6f 06 00 80 7d c0 00 75 da 48 8d b5
Nov 22 08:19:01 weave kernel: RIP: kvm_async_pf_task_wait+0x167/0x200 RSP: c91ffa10
Nov 22 08:19:01 weave kernel: ---[ end trace 4701012ee256be25 ]---

Does that help?

Greetings
Marc

-- --
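The call-trace entries in this log use the kernel's `symbol+offset/size` notation: `kvm_async_pf_task_wait+0x167/0x200` means 0x167 bytes into a function that is 0x200 bytes long. Entries prefixed with `?` are addresses the unwinder found on the stack but could not confirm as part of the actual call chain. The sketch below extracts these entries from log text so they can be handed to a source-line lookup tool such as the kernel's `scripts/faddr2line`; the regular expression and the `parse_trace_entries` name are assumptions of this sketch, not a kernel interface.

```python
import re

# symbol+0xOFFSET/0xSIZE, optionally preceded by "? " (unreliable entry)
TRACE_ENTRY = re.compile(r"(\?\s+)?([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_trace_entries(text):
    """Return (symbol, offset, size, reliable) tuples in order of appearance."""
    entries = []
    for m in TRACE_ENTRY.finditer(text):
        unreliable, symbol, offset, size = m.groups()
        entries.append((symbol, int(offset, 16), int(size, 16), unreliable is None))
    return entries
```

Raw user-space addresses such as `RIP: 0033:0x56434ef679d8` carry no symbol and are ignored; only the reliable kernel entries are usually worth resolving first.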
VMs freezing when host is running 4.14
Hi,

I am running Debian stable with home-built kernels on a number of KVM hosts and an even larger number of KVM VMs. With 4.14, I see an interesting phenomenon on _one_ of my hosts, while all other hosts run fine. All systems are reasonably similar to each other.

On the affected host, VMs freeze at a rate of about two or three per day. They just stop dead in their tracks: console and serial console become unresponsive, ping stops, and they don't react to virsh shutdown, only to virsh destroy. They do, however, consume a noticeable amount of CPU while in this state, up to a full CPU core. What's left in a VM's syslog is unsuspicious, and the host logs don't show anything unusual.

When I start a VM that allocates a lot of memory (like the 8 GB Windows VM that I use for bookkeeping and taxes), two to five of the Linux VMs sometimes freeze within the same second.

The affected host is a Thinkpad T520 with 16 GB of RAM and an Intel(R) Core(TM) i5-2520M that is neither under noticeable load nor under memory pressure (it can afford 8 GB of disk cache).

When I boot the host back into 4.13.11, things are just fine and the machine chugs along painlessly for days. When I boot 4.14, the VM freeze phenomenon usually appears within the first 24 hours.

The other VM hosts (ranging from a small PC Engines APU to a bigger 48 GB server in colocation) run just fine with 4.14 as well.

Any ideas?

Greetings
Marc

-- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: 319554f284dd ("inet: don't use sk_v6_rcv_saddr directly") causes bind port regression
Hi,

those four patches never made it into any 4.13 release:

0001-net-call-sk_reuseport_match-if-we-are-a-reusesock.patch
0001-net-don-t-fast-patch-mismatched-sockets-in-STRICT-mo.patch
0001-net-use-inet6_rcv_saddr-to-compare-sockets.patch
0001-net-set-tb-fast_sk_family.patch

And I have just seen that the first two are not even in 4.14. What does that mean for libvirt users on systems running a 4.14 kernel? The third and fourth patches (0001-net-use-inet6_rcv_saddr-to-compare-sockets.patch and 0001-net-set-tb-fast_sk_family.patch) seem to be in 4.14.

Greetings
Marc

On Mon, Sep 18, 2017 at 10:02:32AM +0200, Marc Haber wrote: > On Sun, Sep 17, 2017 at 09:17:13AM -0400, Cole Robinson wrote: > > On 09/15/2017 01:51 PM, Josef Bacik wrote: > > > Finally got access to a box to run this down myself. This patch on top > > > of the other patches fixes the problem for me, could you verify it works > > > for you? Thanks, > > > > > > > Yup I can confirm that patch fixes things when applied on top of the > > previous 3 patches. Thanks! Please tag those patches for stable releases > > if appropriate, this is affecting a decent amount of libvirt users > > I can also confirm that these four patches fix things for me (on > Debian) as well. Thanks! > > I would love to have this in one of Greg's next 4.13 releases. -- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: 319554f284dd ("inet: don't use sk_v6_rcv_saddr directly") causes bind port regression
On Sun, Sep 17, 2017 at 09:17:13AM -0400, Cole Robinson wrote: > On 09/15/2017 01:51 PM, Josef Bacik wrote: > > Finally got access to a box to run this down myself. This patch on top of > > the other patches fixes the problem for me, could you verify it works for > > you? Thanks, > > > > Yup I can confirm that patch fixes things when applied on top of the > previous 3 patches. Thanks! Please tag those patches for stable releases > if appropriate, this is affecting a decent amount of libvirt users I can also confirm that these four patches fix things for me (on Debian) as well. Thanks! I would love to have this in one of Greg's next 4.13 releases. Greetings Marc -- ----- Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: USB broken on Banana Pi in Linux 4.6 [solved]
Hi, On Mon, May 30, 2016 at 09:02:54PM +0200, Marc Haber wrote: > on my Bananapis, in kernel 4.6 USB does not work. Kernel configuration > is USB-wise identical to 4.5 (grepped for differences in (hci|usb)), > and in 4.6 there is not even /dev/bus/usb. This turned out to be a configuration issue. 4.6 kernels on Banana Pi need CONFIG_AXP20X_POWER for working USB. If that driver is missing, USB fails silently. Thanks for all your help. Greetings Marc -- ----- Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
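A minimal sketch, not from the thread. A grep restricted to (hci|usb) cannot catch a dropped symbol such as CONFIG_AXP20X_POWER, because its name matches neither pattern. Comparing every symbol in the two .config files would have flagged it.

The kernel tree ships scripts/diffconfig for this purpose. The Python below is an independent illustration, not that script. It assumes the usual .config line syntax ("CONFIG_FOO=value" and "# CONFIG_FOO is not set"). The function names parse_config and diff_configs are hypothetical, and the sample values are taken from the config diff posted in this thread.

```python
import re

# Assumed .config syntax: "CONFIG_FOO=value" or "# CONFIG_FOO is not set".
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_config(text):
    """Map each symbol to its value; explicitly unset symbols map to 'n'."""
    symbols = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            symbols[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            symbols[m.group(1)] = "n"
    return symbols


def diff_configs(old_text, new_text):
    """Return sorted (symbol, old, new) tuples; None means the line is absent."""
    old, new = parse_config(old_text), parse_config(new_text)
    changes = []
    for sym in sorted(old.keys() | new.keys()):
        if old.get(sym) != new.get(sym):
            changes.append((sym, old.get(sym), new.get(sym)))
    return changes


if __name__ == "__main__":
    # A few lines from the 4.5 -> 4.6 config diff posted in this thread.
    old = "CONFIG_REGMAP_SPI=m\nCONFIG_AXP20X_POWER=y\nCONFIG_MFD_AXP20X=y\n"
    new = "CONFIG_REGMAP_SPI=y\n# CONFIG_MFD_AXP20X_I2C is not set\n"
    changes = diff_configs(old, new)
    for sym, before, after in changes:
        print(f"{sym}: {before} -> {after}")
    # The (hci|usb) filter used in the thread matches none of these changes.
    usb_only = [c for c in changes if re.search(r"hci|usb", c[0], re.IGNORECASE)]
    print("matches for (hci|usb):", len(usb_only))
```

Treating "# ... is not set" as "n" rather than as absent is a design choice. It separates an option that is explicitly disabled from one whose line is missing from the file altogether. The latter is the case for CONFIG_AXP20X_POWER in the posted diff.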
Re: USB broken on Banana Pi in Linux 4.6
On Sat, Jun 11, 2016 at 02:55:04PM +0200, Marc Haber wrote: > On Tue, Jun 07, 2016 at 10:30:17AM -0700, Greg KH wrote: > > Nothing obvious, can you use 'git bisect' to go from 4.5.0 to 4.6.0 to > > find the offending commit? > > I can. The first round of bisecting let me end up with > c8b710b3e4348119924051551b836c94835331b1 as the first bad commit, > which is wrong, since c8b710b3e4348119924051551b836c94835331b1^ is bad > as well. I am not sure whether things went well since I had to use git > bisect skip twice because the resulting kernel wouldn't boot on the pi. The kernel panic on boot is caused by bugs in the parport part. I worked around these by disabling PARPORT in the kernel configuration. However, a weekend of bisecting just sent me back to commit d85ce830eef6c10d1e9617172dea4681f02b8424, which is a purely cosmetic commit. What totally confuses me is the sheer size of the diff. [8/506]mh@fan:~/linux/debug/linux.bad$ less .git/BISECT_LOG [9/507]mh@fan:~/linux/debug/linux.bad$ git log v4.5..d85ce830eef6c10d1e9617172dea4681f02b8424 | cat d85ce830eef6c10d1e9617172dea4681f02b8424 perf pmu: Fix misleadingly indented assignment (whitespace) [10/508]mh@fan:~/linux/debug/linux.bad$ git diff v4.5..d85ce830eef6c10d1e9617172dea4681f02b8424 | wc -l 811131 [11/509]mh@fan:~/linux/debug/linux.bad$ git show d85ce830eef6c10d1e9617172dea4681f02b8424 | wc -l 14 [12/510]mh@fan:~/linux/debug/linux.bad$ Why do I get an 80+ line diff for a 14 line commit? This can't be correct. Hints? 
Here is the BISECT_LOG: git bisect start # bad: [2dcd0af568b0cf583645c8a317dd12e344b1c72a] Linux 4.6 git bisect bad 2dcd0af568b0cf583645c8a317dd12e344b1c72a # good: [b562e44f507e863c6792946e4e1b1449fbbac85d] Linux 4.5 git bisect good b562e44f507e863c6792946e4e1b1449fbbac85d # bad: [6b5f04b6cf8ebab9a65d9c0026c650bb2538fd0f] Merge branch 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup git bisect bad 6b5f04b6cf8ebab9a65d9c0026c650bb2538fd0f # bad: [96b9b1c95660d4bc5510c5d798d3817ae9f0b391] Merge tag 'tty-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty git bisect bad 96b9b1c95660d4bc5510c5d798d3817ae9f0b391 # bad: [277edbabf6fece057b14fb6db5e3a34e00f42f42] Merge tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm git bisect bad 277edbabf6fece057b14fb6db5e3a34e00f42f42 # bad: [5ca5446ec5ba5e79a6f271cd026bb153d6850fcc] Merge tag 'pinctrl-v4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl git bisect bad 5ca5446ec5ba5e79a6f271cd026bb153d6850fcc # bad: [e71c2c1eeb8de7a083a728c5b7e0b83ed1faf047] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect bad e71c2c1eeb8de7a083a728c5b7e0b83ed1faf047 # bad: [54fbad54ebcde9db9c7459e9e379f2350c25e1f1] perf mem record: Check for memory events support git bisect bad 54fbad54ebcde9db9c7459e9e379f2350c25e1f1 # bad: [598b7c6919c7bbcc1243009721a01bc12275ff3e] perf jit: add source line info support git bisect bad 598b7c6919c7bbcc1243009721a01bc12275ff3e # bad: [3848c23b19e07188bfa15e3d9a2ac27692f2ff3c] perf report: Don't show blank lines if entry has no callchain git bisect bad 3848c23b19e07188bfa15e3d9a2ac27692f2ff3c # bad: [5ac76283b32b116c58e362e99542182ddcfc8262] perf cpumap: Auto initialize cpu__max_{node,cpu} git bisect bad 5ac76283b32b116c58e362e99542182ddcfc8262 # bad: [cfd92dadc5e830268036efb25ff41618f29c3306] perf sort: Provide a way to find out if per-thread bucketing is in 
place git bisect bad cfd92dadc5e830268036efb25ff41618f29c3306 # bad: [3379e0c3effa87d7734fc06277a7023292aadb0c] perf tools: Document the perf sysctls git bisect bad 3379e0c3effa87d7734fc06277a7023292aadb0c # bad: [86a2cf3123bfec118bfb98728d88be0668779b2b] perf stat: Making several helper functions static git bisect bad 86a2cf3123bfec118bfb98728d88be0668779b2b # bad: [403567217d3fa5d4801f820317ada52e5c5f0e53] perf symbols: Do not read symbols/data from device files git bisect bad 403567217d3fa5d4801f820317ada52e5c5f0e53 # bad: [d85ce830eef6c10d1e9617172dea4681f02b8424] perf pmu: Fix misleadingly indented assignment (whitespace) git bisect bad d85ce830eef6c10d1e9617172dea4681f02b8424 # first bad commit: [d85ce830eef6c10d1e9617172dea4681f02b8424] perf pmu: Fix misleadingly indented assignment (whitespace) [13/511]mh@fan:~/linux/debug/linux.bad$ (started with git checkout v4.6, git bisect start, git bisect bad, git bisect good v4.5). Greetings Marc -- ----- Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: USB broken on Banana Pi in Linux 4.6
On Tue, Jun 07, 2016 at 10:30:17AM -0700, Greg KH wrote: > Nothing obvious, can you use 'git bisect' to go from 4.5.0 to 4.6.0 to > find the offending commit? I can. The first round of bisecting let me end up with c8b710b3e4348119924051551b836c94835331b1 as the first bad commit, which is wrong, since c8b710b3e4348119924051551b836c94835331b1^ is bad as well. I am not sure whether things went well since I had to use git bisect skip twice because the resulting kernel wouldn't boot on the pi. A second round of bisecting left me in limbo, since I do not understand this: [45/544]mh@fan[zgchroot kernel64][debian_chroot sid_kernel64]:~/linux/debug/linux$ git bisect reset HEAD is now at 2dcd0af... Linux 4.6 [46/545]mh@fan[zgchroot kernel64][debian_chroot sid_kernel64]:~/linux/debug/linux$ git bisect start [47/546]mh@fan[zgchroot kernel64][debian_chroot sid_kernel64]:~/linux/debug/linux$ git bisect bad [48/547]mh@fan[zgchroot kernel64][debian_chroot sid_kernel64]:~/linux/debug/linux$ git bisect bad c8b710b3e4348119924051551b836c94835331b1^ [49/548]mh@fan[zgchroot kernel64][debian_chroot sid_kernel64]:~/linux/debug/linux$ git bisect good v4.5 Some good revs are not ancestor of the bad rev. git bisect cannot work properly in this case. Maybe you mistook good and bad revs? [50/549]mh@fan[zgchroot kernel64][debian_chroot sid_kernel64]:~/linux/debug/linux$ Do I need to start over completely? Greetings Marc -- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: USB broken on Banana Pi in Linux 4.6
On Fri, Jun 03, 2016 at 08:35:11AM -0700, Greg KH wrote: > On Fri, Jun 03, 2016 at 08:53:58AM +0200, Marc Haber wrote: > > On Mon, May 30, 2016 at 01:47:12PM -0700, Greg KH wrote: > > > On Mon, May 30, 2016 at 09:02:54PM +0200, Marc Haber wrote: > > > > Hi, > > > > > > > > on my Bananapis, in kernel 4.6 USB does not work. Kernel configuration > > > > is USB-wise identical to 4.5 (grepped for differences in (hci|usb)), > > > > and in 4.6 there is not even /dev/bus/usb. > > > > > > > > In kernel 4.6, the message "ohci-platform: OHCI generic platform > > > > driver" is the last one, and "ehci-platform 1c14000.usb: EHCI Host > > > > Controller" is the first one that is missing. > > > > > > > > Is this already a known issue? Or, does a 4.6 kernel need to be > > > > configured differently if you want USB? > > > > > > Are you sure you configured in the correct host controller that you > > > want? Have you done a diff of your .config files to see what you > > > changed? How did you create your 4.6 config? > > > > I used make oldconfig, and I diffed the configs for (hci|usb), and > > there is no difference. > > There might be other things than HCI|USB that are the issue here... Full config diff attached. Hints are appreciated. Greetings Marc -- - Marc Haber | "I don't trust Computers. 
They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421 --- /boot/config-4.5.4-zgbpi-armmp-lpae 2016-05-13 17:02:25.0 +0200 +++ /boot/config-4.6.0-zgbpi-armmp-lpae 2016-05-16 14:18:06.0 +0200 @@ -3 +3 @@ -# Linux/arm 4.5.4 Kernel Configuration +# Linux/arm 4.6.0 Kernel Configuration @@ -173,0 +174,2 @@ +# CONFIG_KALLSYMS_ABSOLUTE_PERCPU is not set +CONFIG_KALLSYMS_BASE_RELATIVE=y @@ -183 +185 @@ -# CONFIG_BPF_SYSCALL is not set +CONFIG_BPF_SYSCALL=y @@ -368,0 +371 @@ +# CONFIG_ARCH_ARTPEC is not set @@ -463 +466,2 @@ -# CONFIG_ARM_KERNMEM_PERMS is not set +CONFIG_DEBUG_RODATA=y +CONFIG_DEBUG_ALIGN_RODATA=y @@ -712 +715,0 @@ -CONFIG_INET_LRO=m @@ -1126 +1128,0 @@ -CONFIG_NETLINK_MMAP=y @@ -1167,0 +1170 @@ +# CONFIG_BT_LEDS is not set @@ -1181,0 +1185 @@ +CONFIG_AF_KCM=m @@ -1207,0 +1212,3 @@ +CONFIG_DST_CACHE=y +CONFIG_NET_DEVLINK=m +CONFIG_MAY_USE_DEVLINK=m @@ -1234 +1241 @@ -CONFIG_REGMAP_SPI=m +CONFIG_REGMAP_SPI=y @@ -1236 +1242,0 @@ -CONFIG_REGMAP_IRQ=y @@ -1246 +1252 @@ -# CONFIG_ARM_CCI500_PMU is not set +# CONFIG_ARM_CCI5xx_PMU is not set @@ -1407,0 +1414 @@ +# CONFIG_PANEL is not set @@ -1440,0 +1448,4 @@ +# VOP Bus Driver +# + +# @@ -1454,0 +1466,4 @@ + +# +# VOP Driver +# @@ -1575,0 +1591 @@ +# CONFIG_MACSEC is not set @@ -1816,0 +1833 @@ +# CONFIG_TOUCHSCREEN_MELFAS_MIP4 is not set @@ -1829 +1845,0 @@ -# CONFIG_TOUCHSCREEN_TS4800 is not set @@ -1856 +1871,0 @@ -# CONFIG_INPUT_AXP20X_PEK is not set @@ -1869,0 +1885 @@ +# CONFIG_RMI4_CORE is not set @@ -1922 +1937,0 @@ -# CONFIG_SERIAL_8250_INGENIC is not set @@ -1951,0 +1967 @@ +# CONFIG_SERIAL_MVEBU_UART is not set @@ -1981,0 +1998 @@ +# CONFIG_I2C_DEMUX_PINCTRL is not set @@ -2031,0 +2049 @@ +# CONFIG_SPI_AXI_SPI_ENGINE is not set @@ -2034,0 +2053 @@ +# CONFIG_SPI_DESIGNWARE is not set @@ -2048 +2066,0 @@ -# CONFIG_SPI_DESIGNWARE is not set @@ -2097 +2115 @@ -CONFIG_PINCTRL_SUNXI_COMMON=y 
+CONFIG_PINCTRL_SUNXI=y @@ -2109,0 +2128 @@ +CONFIG_PINCTRL_SUN8I_H3_R=y @@ -2131,0 +2151 @@ +# CONFIG_GPIO_MPC8XXX is not set @@ -2147,0 +2168 @@ +# CONFIG_GPIO_TPIC2810 is not set @@ -2159,0 +2181 @@ +# CONFIG_GPIO_PISOSR is not set @@ -2195 +2216,0 @@ -CONFIG_AXP20X_POWER=y @@ -2246,0 +2268 @@ +# CONFIG_SENSORS_LTC2990 is not set @@ -2364 +2385,0 @@ -# CONFIG_TS4800_WATCHDOG is not set @@ -2366 +2386,0 @@ -# CONFIG_BCM7038_WDT is not set @@ -2389,0 +2410 @@ +# CONFIG_MFD_ACT8945A is not set @@ -2397 +2418,2 @@ -CONFIG_MFD_AXP20X=y +# CONFIG_MFD_AXP20X_I2C is not set +# CONFIG_MFD_AXP20X_RSB is not set @@ -2454,0 +2477 @@ +# CONFIG_MFD_TPS65086 is not set @@ -2460 +2482,0 @@ -# CONFIG_MFD_TPS65912 is not set @@ -2491 +2512,0 @@ -# CONFIG_REGULATOR_AXP20X is not set @@ -2566,0 +2588,5 @@ +# ACP (Audio CoProcessor) Configuration +# +# CONFIG_DRM_AMD_ACP is not set + +# @@ -2636,0 +2663 @@ +CONFIG_SND_JACK_INPUT_DEV=y @@ -2702,0 +2730 @@ +# CONFIG_SND_SUN4I_SPDIF is not set @@ -2732 +2760,2 @@ -# CONFIG_SND_SOC_PCM179X is not set +# CONFIG_SND_SOC_PCM179X_I2C is not set +# CONFIG_SND_SOC_PCM179X_SPI is not set @@ -2736,0 +2766 @@ +# CONFIG_SND_SOC_RT5616 is not set @@ -2801,0 +2832 @@ +# CONFIG_HID_CMEDIA is not set @@ -3153,0 +3185 @@ +# CONFIG_LEDS_IS31FL32XX is not set @@ -3208 +3239,0 @@ -# CONFIG_R
Re: USB broken on Banana Pi in Linux 4.6
On Mon, May 30, 2016 at 01:47:12PM -0700, Greg KH wrote: > On Mon, May 30, 2016 at 09:02:54PM +0200, Marc Haber wrote: > > Hi, > > > > on my Bananapis, in kernel 4.6 USB does not work. Kernel configuration > > is USB-wise identical to 4.5 (grepped for differences in (hci|usb)), > > and in 4.6 there is not even /dev/bus/usb. > > > > In kernel 4.6, the message "ohci-platform: OHCI generic platform > > driver" is the last one, and "ehci-platform 1c14000.usb: EHCI Host > > Controller" is the first one that is missing. > > > > Is this already a known issue? Or, does a 4.6 kernel need to be > > configured differently if you want USB? > > Are you sure you configured in the correct host controller that you > want? Have you done a diff of your .config files to see what you > changed? How did you create your 4.6 config? I used make oldconfig, and I diffed the configs for (hci|usb), and there is no difference. Greetings Marc -- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
USB broken on Banana Pi in Linux 4.6
Hi, on my Bananapis, in kernel 4.6 USB does not work. Kernel configuration is USB-wise identical to 4.5 (grepped for differences in (hci|usb)), and in 4.6 there is not even /dev/bus/usb. Here is the log excerpt from a 4.5 kernel coming up: May 15 09:30:14 cadencia kernel: [5.307730] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver May 15 09:30:14 cadencia kernel: [5.312891] ehci-platform: EHCI generic platform driver May 15 09:30:14 cadencia kernel: [5.315579] sun4i-ss 1c15000.crypto-engine: no reset control found May 15 09:30:14 cadencia kernel: [5.317303] sun4i-ss 1c15000.crypto-engine: Die ID 0 May 15 09:30:14 cadencia kernel: [5.322742] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver May 15 09:30:14 cadencia kernel: [5.332052] ohci-platform: OHCI generic platform driver May 15 09:30:14 cadencia kernel: [5.360131] axp20x 0-0034: AXP20x variant AXP209 found May 15 09:30:14 cadencia kernel: [5.405989] axp20x 0-0034: AXP20X driver loaded May 15 09:30:14 cadencia kernel: [5.409201] ehci-platform 1c14000.usb: EHCI Host Controller May 15 09:30:14 cadencia kernel: [5.409271] ehci-platform 1c14000.usb: new USB bus registered, assigned bus number 1 May 15 09:30:14 cadencia kernel: [5.409506] ehci-platform 1c14000.usb: irq 29, io mem 0x01c14000 May 15 09:30:14 cadencia kernel: [5.410553] sunxi-wdt 1c20c90.watchdog: Watchdog enabled (timeout=16 sec, nowayout=0) May 15 09:30:14 cadencia kernel: [5.420414] ehci-platform 1c14000.usb: USB 2.0 started, EHCI 1.00 May 15 09:30:14 cadencia kernel: [5.420977] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002 May 15 09:30:14 cadencia kernel: [5.420998] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1 May 15 09:30:14 cadencia kernel: [5.421010] usb usb1: Product: EHCI Host Controller May 15 09:30:14 cadencia kernel: [5.421021] usb usb1: Manufacturer: Linux 4.5.4-zgbpi-armmp-lpae ehci_hcd May 15 09:30:14 cadencia kernel: [5.421033] usb usb1: SerialNumber: 1c14000.usb May 15 09:30:14 
cadencia kernel: [5.422317] hub 1-0:1.0: USB hub found May 15 09:30:14 cadencia kernel: [5.422431] hub 1-0:1.0: 1 port detected May 15 09:30:14 cadencia kernel: [5.423753] ehci-platform 1c1c000.usb: EHCI Host Controller May 15 09:30:14 cadencia kernel: [5.423814] ehci-platform 1c1c000.usb: new USB bus registered, assigned bus number 2 May 15 09:30:14 cadencia kernel: [5.424055] ehci-platform 1c1c000.usb: irq 33, io mem 0x01c1c000 May 15 09:30:14 cadencia kernel: [5.432424] ehci-platform 1c1c000.usb: USB 2.0 started, EHCI 1.00 May 15 09:30:14 cadencia kernel: [5.433089] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002 May 15 09:30:14 cadencia kernel: [5.433110] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1 May 15 09:30:14 cadencia kernel: [5.433122] usb usb2: Product: EHCI Host Controller May 15 09:30:14 cadencia kernel: [5.433133] usb usb2: Manufacturer: Linux 4.5.4-zgbpi-armmp-lpae ehci_hcd May 15 09:30:14 cadencia kernel: [5.433144] usb usb2: SerialNumber: 1c1c000.usb May 15 09:30:14 cadencia kernel: [5.434472] hub 2-0:1.0: USB hub found May 15 09:30:14 cadencia kernel: [5.434595] hub 2-0:1.0: 1 port detected May 15 09:30:14 cadencia kernel: [5.436189] ohci-platform 1c14400.usb: Generic Platform OHCI controller May 15 09:30:14 cadencia kernel: [5.436528] ohci-platform 1c14400.usb: new USB bus registered, assigned bus number 3 May 15 09:30:14 cadencia kernel: [5.436779] ohci-platform 1c14400.usb: irq 30, io mem 0x01c14400 May 15 09:30:14 cadencia kernel: [5.497002] usb usb3: New USB device found, idVendor=1d6b, idProduct=0001 May 15 09:30:14 cadencia kernel: [5.497032] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1 May 15 09:30:14 cadencia kernel: [5.497045] usb usb3: Product: Generic Platform OHCI controller May 15 09:30:14 cadencia kernel: [5.497056] usb usb3: Manufacturer: Linux 4.5.4-zgbpi-armmp-lpae ohci_hcd May 15 09:30:14 cadencia kernel: [5.497068] usb usb3: SerialNumber: 1c14400.usb In kernel 4.6, the 
message "ohci-platform: OHCI generic platform driver" is the last one, and "ehci-platform 1c14000.usb: EHCI Host Controller" is the first one that is missing. Is this already a known issue? Or, does a 4.6 kernel need to be configured differently if you want USB? Greetings Marc -- ----- Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
USB broken on Banana Pi in Linux 4.6
Hi,

on my Banana Pis, USB does not work in kernel 4.6. The kernel configuration is USB-wise identical to 4.5 (I grepped for differences in (hci|usb)), and in 4.6 there is not even /dev/bus/usb.

Here is the log excerpt from a 4.5 kernel coming up:

May 15 09:30:14 cadencia kernel: [5.307730] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
May 15 09:30:14 cadencia kernel: [5.312891] ehci-platform: EHCI generic platform driver
May 15 09:30:14 cadencia kernel: [5.315579] sun4i-ss 1c15000.crypto-engine: no reset control found
May 15 09:30:14 cadencia kernel: [5.317303] sun4i-ss 1c15000.crypto-engine: Die ID 0
May 15 09:30:14 cadencia kernel: [5.322742] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
May 15 09:30:14 cadencia kernel: [5.332052] ohci-platform: OHCI generic platform driver
May 15 09:30:14 cadencia kernel: [5.360131] axp20x 0-0034: AXP20x variant AXP209 found
May 15 09:30:14 cadencia kernel: [5.405989] axp20x 0-0034: AXP20X driver loaded
May 15 09:30:14 cadencia kernel: [5.409201] ehci-platform 1c14000.usb: EHCI Host Controller
May 15 09:30:14 cadencia kernel: [5.409271] ehci-platform 1c14000.usb: new USB bus registered, assigned bus number 1
May 15 09:30:14 cadencia kernel: [5.409506] ehci-platform 1c14000.usb: irq 29, io mem 0x01c14000
May 15 09:30:14 cadencia kernel: [5.410553] sunxi-wdt 1c20c90.watchdog: Watchdog enabled (timeout=16 sec, nowayout=0)
May 15 09:30:14 cadencia kernel: [5.420414] ehci-platform 1c14000.usb: USB 2.0 started, EHCI 1.00
May 15 09:30:14 cadencia kernel: [5.420977] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
May 15 09:30:14 cadencia kernel: [5.420998] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
May 15 09:30:14 cadencia kernel: [5.421010] usb usb1: Product: EHCI Host Controller
May 15 09:30:14 cadencia kernel: [5.421021] usb usb1: Manufacturer: Linux 4.5.4-zgbpi-armmp-lpae ehci_hcd
May 15 09:30:14 cadencia kernel: [5.421033] usb usb1: SerialNumber: 1c14000.usb
May 15 09:30:14 cadencia kernel: [5.422317] hub 1-0:1.0: USB hub found
May 15 09:30:14 cadencia kernel: [5.422431] hub 1-0:1.0: 1 port detected
May 15 09:30:14 cadencia kernel: [5.423753] ehci-platform 1c1c000.usb: EHCI Host Controller
May 15 09:30:14 cadencia kernel: [5.423814] ehci-platform 1c1c000.usb: new USB bus registered, assigned bus number 2
May 15 09:30:14 cadencia kernel: [5.424055] ehci-platform 1c1c000.usb: irq 33, io mem 0x01c1c000
May 15 09:30:14 cadencia kernel: [5.432424] ehci-platform 1c1c000.usb: USB 2.0 started, EHCI 1.00
May 15 09:30:14 cadencia kernel: [5.433089] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
May 15 09:30:14 cadencia kernel: [5.433110] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
May 15 09:30:14 cadencia kernel: [5.433122] usb usb2: Product: EHCI Host Controller
May 15 09:30:14 cadencia kernel: [5.433133] usb usb2: Manufacturer: Linux 4.5.4-zgbpi-armmp-lpae ehci_hcd
May 15 09:30:14 cadencia kernel: [5.433144] usb usb2: SerialNumber: 1c1c000.usb
May 15 09:30:14 cadencia kernel: [5.434472] hub 2-0:1.0: USB hub found
May 15 09:30:14 cadencia kernel: [5.434595] hub 2-0:1.0: 1 port detected
May 15 09:30:14 cadencia kernel: [5.436189] ohci-platform 1c14400.usb: Generic Platform OHCI controller
May 15 09:30:14 cadencia kernel: [5.436528] ohci-platform 1c14400.usb: new USB bus registered, assigned bus number 3
May 15 09:30:14 cadencia kernel: [5.436779] ohci-platform 1c14400.usb: irq 30, io mem 0x01c14400
May 15 09:30:14 cadencia kernel: [5.497002] usb usb3: New USB device found, idVendor=1d6b, idProduct=0001
May 15 09:30:14 cadencia kernel: [5.497032] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
May 15 09:30:14 cadencia kernel: [5.497045] usb usb3: Product: Generic Platform OHCI controller
May 15 09:30:14 cadencia kernel: [5.497056] usb usb3: Manufacturer: Linux 4.5.4-zgbpi-armmp-lpae ohci_hcd
May 15 09:30:14 cadencia kernel: [5.497068] usb usb3: SerialNumber: 1c14400.usb

In kernel 4.6, the message "ohci-platform: OHCI generic platform driver" is the last one that appears, and "ehci-platform 1c14000.usb: EHCI Host Controller" is the first one that is missing.

Is this already a known issue? Or does a 4.6 kernel need to be configured differently to get USB?

Greetings
Marc

-- 
-----
Marc Haber         | "I don't trust Computers. They | Mail address in header
Leimen, Germany    |  lose things."    Winona Ryder | Phone: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
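Narrowing the regression down the way it is done above, by finding the last message that still appears under 4.6 and the first one that goes missing, can be automated once both boot logs are reduced to bare message text. The C sketch below assumes the syslog prefix and the `[timestamp]` field have already been stripped from every line; the function name `first_missing()` is illustrative and not part of any existing tool.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the first line of `good` that does not occur anywhere
 * in `bad`, or -1 if every line of `good` is present.  Lines are compared as
 * exact strings, so timestamps must be stripped beforehand. */
static long first_missing(const char *const *good, size_t n_good,
                          const char *const *bad, size_t n_bad)
{
    for (size_t i = 0; i < n_good; i++) {
        int found = 0;
        for (size_t j = 0; j < n_bad && !found; j++)
            found = strcmp(good[i], bad[j]) == 0;
        if (!found)
            return (long)i;
    }
    return -1;
}
```

With the 4.5 messages as `good` and the 4.6 messages as `bad`, the returned index points at the first driver message that never shows up. The quadratic scan is perfectly adequate for a few hundred boot messages.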
Re: transparent huge pages breaks KVM on AMD.
On Fri, May 13, 2016 at 10:09:52AM +0200, Borislav Petkov wrote:
> Try this one better - it fixes an uninitialized var.

Nosireebob, VMs crash even with this patch in the host as soon as the host has THP enabled.

Greetings
Marc
Re: transparent huge pages breaks KVM on AMD.
On Fri, May 13, 2016 at 10:07:45AM +0200, Borislav Petkov wrote:
> On Fri, May 13, 2016 at 07:23:34AM +0200, Marc Haber wrote:
> > How do I apply this?
>
> I'm attaching it.

The VM crashed twice with this patch applied, with THP==madvise on the host and THP==never in the VM.

Now trying the other patch, assuming that it's intended to be used _instead_ of this one.

Greetings
Marc
Re: transparent huge pages breaks KVM on AMD.
On Fri, May 13, 2016 at 09:35:45AM +0100, Dr. David Alan Gilbert wrote:
> also between 4.4 and 4.5 it did seem worth mentioning as a long shot,
> but it was no more than a long shot.

It was, however, helpful. Instead of trying the runtime control first, I bisected the kernel configuration; had I seen your long shot two weeks earlier, it would have saved me those two weeks of tedious bisecting.

> Try Andrea's fix for (a).

In the works.

Greetings
Marc
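The cost of those two weeks follows from how `git bisect` works: every tested kernel roughly halves the remaining range, so N candidate commits need about ceil(log2 N) test kernels, and in this case each "good" verdict needs a full day of uptime. A small C sketch of that arithmetic; the helper names are illustrative, and the 12,000-commit figure used below is a made-up placeholder, not the real size of the v4.4..v4.5 range.

```c
/* Number of test kernels git bisect needs, in the worst case, to isolate
 * one commit among n candidates: the smallest k with 2^k >= n. */
static unsigned bisect_steps(unsigned long n)
{
    unsigned k = 0;
    unsigned long span = 1;

    while (span < n) {
        span *= 2;
        k++;
    }
    return k;
}

/* Total wall-clock hours if every bisection step needs `hours_per_test`. */
static double bisect_hours(unsigned long n, double hours_per_test)
{
    return bisect_steps(n) * hours_per_test;
}
```

For a hypothetical 12,000 candidate commits, `bisect_steps()` gives 14, and at 24 hours per kernel `bisect_hours()` gives 336 hours, i.e. 14 days.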
Re: transparent huge pages breaks KVM on AMD.
On Fri, May 13, 2016 at 10:09:52AM +0200, Borislav Petkov wrote:
> Try this one better - it fixes an uninitialized var.

Instead, or in addition?

Greetings
Marc
Re: transparent huge pages breaks KVM on AMD.
On Fri, May 13, 2016 at 10:07:45AM +0200, Borislav Petkov wrote:
> On Fri, May 13, 2016 at 07:23:34AM +0200, Marc Haber wrote:
> > How do I apply this?
>
> I'm attaching it.

OK, stupid me: I thought one could simply curl the web page. Too bad list archives keep mangling patches :-(

It now applies to 4.5 as well.

Greetings
Marc
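Archive web pages deliver a patch as HTML, so the downloaded diff can come back with characters such as `&`, `<` and `>` turned into entities, or lost entirely when markup is stripped. Saving the original mail or attachment is the reliable fix; purely as an illustration, the C sketch below undoes the five basic named entities in place. It assumes entity escaping is the only damage (it does nothing about stripped characters, rewrapped lines or numeric references such as `&#39;`), and `decode_basic_entities()` is a hypothetical helper.

```c
#include <string.h>

/* Decode &amp; &lt; &gt; &quot; &apos; in place, in a single pass.
 * Unknown entities are copied through unchanged.
 * Returns the new string length. */
static size_t decode_basic_entities(char *s)
{
    static const struct { const char *name; char ch; } map[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' },
    };
    char *out = s;          /* write position never overtakes read position */
    const char *in = s;

    while (*in) {
        int matched = 0;
        if (*in == '&') {
            for (size_t i = 0; i < sizeof map / sizeof map[0]; i++) {
                size_t len = strlen(map[i].name);
                if (strncmp(in, map[i].name, len) == 0) {
                    *out++ = map[i].ch;
                    in += len;
                    matched = 1;
                    break;
                }
            }
        }
        if (!matched)
            *out++ = *in++;
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

Because decoding is a single pass, `&amp;lt;` correctly becomes `&lt;` rather than `<`.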
Re: transparent huge pages breaks KVM on AMD.
On Thu, May 12, 2016 at 11:42:16PM +0300, Kirill A. Shutemov wrote:
> But I guess it should apply cleanly to v4.5. Or at least without major
> conflicts.

[11/511]mh@fan:~/linux/debug/linux$ curl 'http://marc.info/?l=linux-rdma=146307074800836=2' | patch -p1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 125290 125290 0 9844 0 --:--:-- 0:00:01 --:--:-- 9849
patching file include/linux/mm.h
Hunk #1 succeeded at 456 with fuzz 1 (offset -44 lines).
patching file include/linux/swap.h
Hunk #2 FAILED at 513.
1 out of 2 hunks FAILED -- saving rejects to file include/linux/swap.h.rej
patching file mm/huge_memory.c
Hunk #1 FAILED at 1298.
Hunk #2 FAILED at 2079.
Hunk #3 succeeded at 3340 (offset 117 lines).
2 out of 3 hunks FAILED -- saving rejects to file mm/huge_memory.c.rej
patching file mm/memory.c
Hunk #1 FAILED at 2373.
Hunk #2 succeeded at 2331 with fuzz 2 (offset -56 lines).
Hunk #3 FAILED at 2622.
2 out of 3 hunks FAILED -- saving rejects to file mm/memory.c.rej
patching file mm/swapfile.c
Hunk #1 FAILED at 922.
1 out of 1 hunk FAILED -- saving rejects to file mm/swapfile.c.rej
[12/512]mh@fan:~/linux/debug/linux$

It doesn't, and it doesn't apply to 4.6-rc3 either:

[17/517]mh@fan:~/linux/debug/linux$ git checkout v4.6-rc3
Checking out files: 100% (9945/9945), done.
Previous HEAD position was b562e44... Linux 4.5
HEAD is now at bf16200... Linux 4.6-rc3
[18/518]mh@fan:~/linux/debug/linux$ curl 'http://marc.info/?l=linux-rdma=146307074800836=2' | patch -p1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 125290 125290 0 9692 0 --:--:-- 0:00:01 --:--:-- 9697
patching file include/linux/mm.h
patching file include/linux/swap.h
Hunk #2 FAILED at 513.
1 out of 2 hunks FAILED -- saving rejects to file include/linux/swap.h.rej
patching file mm/huge_memory.c
Hunk #1 FAILED at 1298.
Hunk #2 FAILED at 2079.
Hunk #3 succeeded at 3225 (offset 2 lines).
2 out of 3 hunks FAILED -- saving rejects to file mm/huge_memory.c.rej
patching file mm/memory.c
Hunk #1 FAILED at 2373.
Hunk #2 succeeded at 2354 (offset -33 lines).
Hunk #3 FAILED at 2622.
2 out of 3 hunks FAILED -- saving rejects to file mm/memory.c.rej
patching file mm/swapfile.c
Hunk #1 FAILED at 922.
1 out of 1 hunk FAILED -- saving rejects to file mm/swapfile.c.rej
[19/519]mh@fan:~/linux/debug/linux$

How do I apply this?

Greetings
Marc
Re: transparent huge pages breaks KVM on AMD.
On Thu, May 12, 2016 at 11:24:02PM +0300, Kirill A. Shutemov wrote:
> http://lkml.kernel.org/r/1463070742-18401-1-git-send-email-aarca...@redhat.com

Is this in v4.6-rc7? If so, can I just test v4.6-rc7? If not, would it be a valid approach to first test plain v4.6-rc7 and then patched v4.6-rc7?

Greetings
Marc
transparent huge pages breaks KVM on AMD.
Hi David,

On Sat, Apr 23, 2016 at 07:52:46PM +0100, Dr. David Alan Gilbert wrote:
> Hmm, your problem does sound like bad hardware, but
> If you've got a nice reliable crash, can you try turning transparent huge
> pages off on the host;
>    echo never > /sys/kernel/mm/transparent_hugepage/enabled

I must have missed this hint in the middle of the "your hardware is bad" avalanche that came over me.

I spent two weeks bisecting "good" kernels, because during the repeated reconfigurations transparent huge pages got turned off in the kernel configuration. After running each kernel for 24 hours, I eventually ended up with a working 4.5 kernel. The configuration diff was short and showed transparent huge pages, and, finally, upon re-reading the thread I found your hint.

My result now is that 4.5, 4.5.1 and 4.5.4 reliably corrupt KVM guest memory within the first hour of running under disk load, causing the VM either to stop dead in the water or to read random garbage from disk. Rebooting fixes the VM. This happens as soon as transparent huge pages are turned on in the host. Turning off transparent huge pages with

echo never > /sys/kernel/mm/transparent_hugepage/enabled

fixes the issue even without rebooting the host. Start up the VM again and it works just fine.

Is this an issue in (a) transparent huge pages, (b) KVM or (c) qemu? Where should this issue be forwarded? Or do we just accept it and turn transparent huge pages off?

Greetings
Marc
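The runtime switch lives in `/sys/kernel/mm/transparent_hugepage/enabled`. Reading that file lists every mode with the active one in brackets, for example `always [madvise] never`, and writing `never` to it as root (the `echo` above) switches THP off at runtime. A minimal C sketch that pulls the active mode out of such a line; the function name is hypothetical, and opening and reading the sysfs file (e.g. with `fopen()`/`fgets()`) is left to the caller so the parser can be tested on any machine.

```c
#include <stddef.h>
#include <string.h>

/* Copy the bracketed (active) word from a line such as
 * "always [madvise] never" into `out`.  Returns 0 on success,
 * -1 if no complete [word] is present or it does not fit in `out`. */
static int thp_active_mode(const char *line, char *out, size_t outlen)
{
    const char *open = strchr(line, '[');
    if (!open)
        return -1;

    const char *close = strchr(open + 1, ']');
    if (!close)
        return -1;

    size_t len = (size_t)(close - open - 1);
    if (len + 1 > outlen)           /* room for the terminating NUL */
        return -1;

    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

Checking the host and the guest this way makes it easy to record which combination (for instance `madvise` on the host, `never` in the VM) a given crash happened under.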
Re: [PATCH (net.git) 2/3] Revert "stmmac: Fix 'eth0: No PHY found' regression"
On Wed, Apr 13, 2016 at 05:44:25PM +0200, Marc Haber wrote:
> On Fri, Apr 01, 2016 at 09:07:15AM +0200, Giuseppe Cavallaro wrote:
> > This reverts commit 88f8b1bb41c6208f81b6a480244533ded7b59493.
> > due to problems on GeekBox and Banana Pi M1 board when
> > connected to a real transceiver instead of a switch via
> > fixed-link.
>
> This reversal is still needed in Linux 4.5.1 on Banana Pi.
> Please consider including it in Linux 4.5.2.

This reversal is still needed in Linux 4.5.4 on Banana Pi. Please consider including it in Linux 4.5.5.

Greetings
Marc

>
> >
> > Signed-off-by: Giuseppe Cavallaro <peppe.cavall...@st.com>
> > Cc: Gabriel Fernandez <gabriel.fernan...@linaro.org>
> > Cc: Andreas Färber <afaer...@suse.de>
> > Cc: Frank Schäfer <fschaefer@googlemail.com>
> > Cc: Dinh Nguyen <dinh.li...@gmail.com>
> > Cc: David S. Miller <da...@davemloft.net>
> > ---
> > drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 11 ++-
> > .../net/ethernet/stmicro/stmmac/stmmac_platform.c |9 +
> > include/linux/stmmac.h |1 -
> > 3 files changed, 11 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> > index ea76129..af09ced 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> > @@ -199,12 +199,21 @@ int stmmac_mdio_register(struct net_device *ndev)
> > struct stmmac_priv *priv = netdev_priv(ndev);
> > struct stmmac_mdio_bus_data *mdio_bus_data = priv->plat->mdio_bus_data;
> > int addr, found;
> > - struct device_node *mdio_node = priv->plat->mdio_node;
> > + struct device_node *mdio_node = NULL;
> > + struct device_node *child_node = NULL;
> >
> > if (!mdio_bus_data)
> > return 0;
> >
> > if (IS_ENABLED(CONFIG_OF)) {
> > + for_each_child_of_node(priv->device->of_node, child_node) {
> > + if (of_device_is_compatible(child_node,
> > + "snps,dwmac-mdio")) {
> > + mdio_node = child_node;
> > + break;
> > + }
> > + }
> > +
> > if (mdio_node) {
> > netdev_dbg(ndev, "FOUND MDIO subnode\n");
> > } else {
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> > index dcbd2a1..9cf181f 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> > @@ -146,7 +146,6 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
> > struct device_node *np = pdev->dev.of_node;
> > struct plat_stmmacenet_data *plat;
> > struct stmmac_dma_cfg *dma_cfg;
> > - struct device_node *child_node = NULL;
> >
> > plat = devm_kzalloc(&pdev->dev, sizeof(*plat), GFP_KERNEL);
> > if (!plat)
> > @@ -177,19 +176,13 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
> > plat->phy_node = of_node_get(np);
> > }
> >
> > - for_each_child_of_node(np, child_node)
> > - if (of_device_is_compatible(child_node, "snps,dwmac-mdio")) {
> > - plat->mdio_node = child_node;
> > - break;
> > - }
> > -
> > /* "snps,phy-addr" is not a standard property. Mark it as deprecated
> > * and warn of its use. Remove this when phy node support is added.
> > */
> > if (of_property_read_u32(np, "snps,phy-addr", &plat->phy_addr) == 0)
> > dev_warn(&pdev->dev, "snps,phy-addr property is deprecated\n");
> >
> > - if ((plat->phy_node && !of_phy_is_fixed_link(np)) || !plat->mdio_node)
> > + if ((plat->phy_node && !of_phy_is_fixed_link(np)) || plat->phy_bus_name)
> > plat->mdio_bus_data = NULL;
> > else
> > plat->mdio_bus_data =
> > diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
> > index 4bcf5a6..6e53fa8 100644
> > --- a/include/linux/stmmac.h
> > +++ b/include/linux/stmmac.h
> > @@ -114,7 +114,6 @@ struct plat_stmmacenet_data {
> > int interface;
> > struct stmmac_mdio_bus_data *mdio_bus_data;
> > struct device_node *phy_node;
> > - struct device_node *mdio_node;
Re: Major KVM issues with kernel 4.5 on the host
On Sat, Apr 23, 2016 at 06:04:29PM +0200, Borislav Petkov wrote:
> On Thu, Apr 21, 2016 at 10:04:33PM +0200, Marc Haber wrote:
> > Yes, but there are two symptoms. The VM either suffers file system
> > issues (garbage read from files, or an aborted ext4 journal and
> > following ro remount) or it stops dead in its tracks.
>
> Stops dead? What does that mean exactly? Box is wedged solid and it
> doesn't react to any key presses?

No ping, no reaction on the serial console, no reaction on the virtual console, no syslog entries.

> Because if so, this could really be a DRAM going bad and a correctable
> error turning into an uncorrectable. How old is the DRAM in that box?
> Judging by your CPU, it should be a couple of years...

Uncorrectable errors would still be detected by the ECC hardware, and the box wouldn't run perfectly fine with an "old" kernel.

> > The box reports about one correctable error per week, so I probably
> > have a faulty DIMM, but since the issue only surfaces in VMs while the
> > host system is in perfect working order...
>
> So it could be that correctable error turns into an uncorrectable one at
> some point. But then you should be getting an exception...

Yes, and that would be in the logs.

> > And yes, I am pondering to simply replace the box with an Intel CPU.
>
> Your CPU is fine, from what I've seen so far.

But we still assume that the issue only shows up on older AMD CPUs. Otherwise, I wouldn't be the only one seeing it.

> > I go the way of Debian packages since it is easier to handle the
> > crypto file systems when the machine is booting up.
>
> As long as you're testing the correct bisection kernels...

I am reasonably sure about that, yes.

> > And yes, I think about doing a test reinstall on unencrypted disk to
> > find out whether encryption plays a role, but I currently need the
> > machine too urgently to take it out of service for half a month, and,
> > again, the host system is in perfect working order, it is just VMs
> > that barf.
>
> Yeah, I can't reproduce it here and I have a very similar box to yours
> which is otherwise idle, more or less.
>
> Another fact which points to potentially DIMM going bad...

Do you want me to run memtest for 24 hours?

Greetings
Marc
Re: Major KVM issues with kernel 4.5 on the host
On Thu, Apr 21, 2016 at 06:51:06PM +0200, Borislav Petkov wrote: > On Thu, Apr 21, 2016 at 04:50:05PM +0200, Marc Haber wrote: > > What bothers me is that since I ended up with a "suspect" commit that > > actually results in a "good" kernel (running for 22 hours now), I must > > have said "bad" to an actually "good" kernel, which means that I had > > an unrelated crash or corruption. Is that reasoning correct? > > Hmm, did that "unrelated crash or corruption" have the same symptoms as > the original one? Yes, but there are two symptoms. The VM either suffers file system issues (garbage read from files, or an aborted ext4 journal and following ro remount) or it stops dead in its tracks. > > That one qualified as "good" six days ago. I'll retry, maybe I just > > didn't wait long enough. > > So if the trigger time is varying so much, I'd try to double that to > make sure I'm fairly certain about each commit I'm testing. The longest trigger time I have seen was three hours, I tripled that to nine hours, that probably was not enough. > Also, this is a single box we're talking about, right? And you're sure > it hasn't had any corruption issues so far? It is a single box, and it runs perfectly with kernel 4.4. > I see you have amd64_edac loading, so it must have ECC DIMMs. Have you > had any reports in the past of ECC errors in dmesg? Or other MCEs, > lockups, etc? Can you grep your logs for stuff like "hardware error", > "mce", "edac" etc? Do a case-insensitive search. The box reports about one correctable error per week, so I probably have a faulty DIMM, but since the issue only surfaces in VMs while the host system is in perfect working order... And yes, I am pondering to simply replace the box with an Intel CPU. I see "mce: CPU supports 6 MCE banks" once for each reboot, and about 30 "Machine check events logged" since January. How do I see which events were logged? > > "Trying" means make oldconfig, make deb-pkg in my case right? 
Does it > > matter what I answer to the numerous config questions that keep coming > > up during the oldconfig step? > > What I do is: > > $ git bisect <good|bad> > > to mark the current commit after having tested it. Then I do > > $ yes "" | make oldconfig > > to set the new config options. So you basically select the default for new options. > Then > > $ make -j7 > $ make modules_install install > > and reboot into the new kernel. Kernel name will possibly change each > time so I write down on paper which kernel I'm testing. I go the Debian package route since that makes it easier to handle the encrypted file systems while the machine is booting up. And yes, I am thinking about doing a test reinstall on an unencrypted disk to find out whether encryption plays a role, but I currently need the machine too urgently to take it out of service for half a month, and, again, the host system is in perfect working order; it is just the VMs that barf. > You can verify when booting it by doing: > > $ dmesg | head > [0.00] Linux version 4.6.0-rc2+ (boris@pd) (gcc version 5.3.1 > 20160101 (Debian 5.3.1-5) ) #1 SMP PREEMPT Wed Apr 6 20:22:51 CEST 2016 > ... > > that date at the end of the line and number "#1" should be current. I check the date of the package I am installing and the date stamps of the kernels being installed to /boot. I'm reasonably sure I have that under control. > > Would it help to explicitly mark > > 0e749e54244eec87b2a3cd0a4314e60bc6781115 as good so that the knowledge > > gained during the last week is not completely lost? > > I'd do the whole thing again, just to be sure. > > I know, bisection is very time-consuming :-\ And it is particularly > annoying if it is done on the box I'm normally using daily. ... and if testing a "good" kernel takes a day. > > So I need to git log | grep 46896c73c1a4 and apply the patch again > > each time the commit is found?
> > I think you can let git do that for ya: > > $ git branch --contains 46896c73c1a4 > * (HEAD detached at 46896c73c1a4) > > that lists that the current checked out HEAD contains that commit. If you do > > $ git checkout 46896c73c1a4~1 > > then that "(HEAD detached..." line is not in the list of branches > containing it. And whenever 46896c73c1a4 is present, I need to apply Paolo's patch, right? Greetings Marc -- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
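Borislav's `git branch --contains` check can also be scripted, so that the "is 46896c73c1a4 present?" decision is made the same way at every bisection step. A minimal sketch, assuming it is run from the top of the kernel tree and that the diff from Paolo's mail was saved as /tmp/hunk as in the thread; the function names are hypothetical. It uses `git merge-base --is-ancestor`, which exits 0 when the first commit is an ancestor of (or the same as) the second. Reverting the hunk before marking the step matters because git may refuse to check out the next bisection point over local modifications.

```shell
#!/bin/sh
# Hypothetical helpers for bisecting with a fix-up patch.
# Assumptions: run from the top of the kernel tree; the fix-up diff from
# Paolo's mail has been saved as /tmp/hunk (path as used in the thread).

FIXUP_COMMIT=46896c73c1a4   # "KVM: svm: add support for RDTSCP"
HUNK=/tmp/hunk

# head_contains COMMIT: succeed if COMMIT is HEAD or an ancestor of HEAD,
# i.e. the current bisection point is "past" that commit.
head_contains() {
    git merge-base --is-ancestor "$1" HEAD
}

# apply_fixup: apply the hunk only when the commit is present,
# doing a dry run first as suggested in the thread.
apply_fixup() {
    if head_contains "$FIXUP_COMMIT"; then
        patch -p1 --dry-run -i "$HUNK" && patch -p1 -i "$HUNK"
    else
        echo "HEAD does not contain $FIXUP_COMMIT; not patching"
    fi
}

# revert_fixup: undo the hunk again before "git bisect good|bad" so that
# git can check out the next bisection point on a clean tree.
revert_fixup() {
    if head_contains "$FIXUP_COMMIT"; then
        patch -p1 -R -i "$HUNK"
    fi
}
```

A step would then be: `apply_fixup`, build and test the kernel, `revert_fixup`, and finally `git bisect good` or `git bisect bad`.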
Re: Major KVM issues with kernel 4.5 on the host
On Thu, Apr 21, 2016 at 02:37:11PM +0200, Borislav Petkov wrote: > On Thu, Apr 21, 2016 at 10:39:48AM +0200, Marc Haber wrote: > > Currently, I cannot explain how this has happened, I must have flagged > > an actually good kernel as bad from my understanding of git bisect. > > > > Can you give advice how to continue here? > > Yap, sounds like you marked a bisection step incorrectly, which lead > into the wrong direction. How reliable is your reproducer? Usually, the crash or filesystem corruption happens in the first 15 to 30 minutes. I had one instance run for three hours before it corrupted, so I have raised the run time to nine hours before saying "this kernel is good". What bothers me is that since I ended up with a "suspect" commit that actually results in a "good" kernel (running for 22 hours now), I must have said "bad" to an actually "good" kernel, which means that I had an unrelated crash or corruption. Is that reasoning correct? > Also, do the bisection as Paolo suggested: > > * try 45bdbcfdf241. That one qualified as "good" six days ago. I'll retry, maybe I just didn't wait long enough. "Trying" means make oldconfig, make deb-pkg in my case right? Does it matter what I answer to the numerous config questions that keep coming up during the oldconfig step? > * then do > > $ git bisect start v4.5-rc1 v4.4 > > which marks -rc1 as bad and 4.4 as good. Would it help to explicitly mark 0e749e54244eec87b2a3cd0a4314e60bc6781115 as good so that the knowledge gained during the last week is not completely lost? > While you're doing that bisect, do what Paolo said by applying the diff > here > > https://lkml.kernel.org/r/570eadd2.8030...@redhat.com > > when the bisection point you're at at each step contains > > 46896c73c1a4 ("KVM: svm: add support for RDTSCP") > > You should apply the above hunk by doing > > $ patch -p1 --dry-run -i /tmp/hunk > > If it applies fine, you then apply it > > $ patch -p1 -i /tmp/hunk > > All clear?
So I need to git log | grep 46896c73c1a4 and apply the patch again each time the commit is found? Greetings Marc -- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: Major KVM issues with kernel 4.5 on the host
On Thu, Apr 14, 2016 at 07:22:20AM +0200, Marc Haber wrote: > On Thu, Apr 14, 2016 at 03:16:29AM +0200, Paolo Bonzini wrote: > > On 14/04/2016 00:29, Marc Haber wrote: > > > On Wed, Apr 13, 2016 at 10:36:34PM +0200, Paolo Bonzini wrote: > > >> Didn't help, but a fresh look at the list of 4.5 patches helped. > > >> What the hell was I thinking, I missed write_rdtscp_aux who > > >> obviously uses MSR_TSC_AUX. > > > > > > I applied this patch to 4.5, which didn't go cleanly, I had to do it > > > manually, and there is no change in behavior. Sometimes, the Vm just > > > crashes, but most times the filesystem is remounted ro. > > > > Ok, then I guess bisection is needed. Please first try commit > > 45bdbcfdf241. If it fails, then the bug come together with KVM's merge > > window changes for 4.5-rc1. Please apply the patch I sent here when > > bisection is past 46896c73c1a4dde527c3a3cc43379deeb41985a1 (which means > > that probably that should be the commit you try second; the bisection > > then becomes much easier). > > I have never bisected this deeply. Can you please give more advice, > with which two commits to start? And how do I find out whether I am > "past" a commit? I am als not a git expert, a few command lines would > be appreciated. I have tried bisecting, and finally bisect says that the bad commit is 0e749e54244eec87b2a3cd0a4314e60bc6781115 dax: increase granularity of dax_clear_blocks() operations However, a kernel built after $ git checkout 0e749e54244eec87b2a3cd0a4314e60bc6781115 seems to be fine; at least my VM has been running for 15 hours now. I guess I need to start over again with git bisect good 0e749e54244eec87b2a3cd0a4314e60bc6781115 and git bisect bad v4.5. Currently, I cannot explain how this has happened, I must have flagged an actually good kernel as bad from my understanding of git bisect. Can you give advice how to continue here? Greetings Marc -- - Marc Haber | "I don't trust Computers.
They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
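Instead of restarting from scratch, git can rebuild a bisection with one verdict removed: `git bisect log` prints the recorded steps, and after `git bisect reset`, `git bisect replay` re-applies an edited copy of that log. The git-bisect documentation describes this for correcting a mistakenly marked revision. A minimal sketch, with a hypothetical helper that strips the good/bad line for one suspect commit; which verdict was wrong can still only be established by re-testing.

```shell
#!/bin/sh
# Sketch: redo a bisection from a saved log while discarding one verdict
# suspected to be wrong.  Intended use (the <sha> is a placeholder):
#
#   git bisect log > /tmp/bisect.log
#   drop_verdict /tmp/bisect.log <sha> > /tmp/bisect-fixed.log
#   git bisect reset
#   git bisect replay /tmp/bisect-fixed.log
#
# drop_verdict LOG SHA (hypothetical helper): print LOG without the
# "git bisect good|bad" line for SHA (full or abbreviated hash).
# Comment lines such as "# bad: [...]" are skipped by "git bisect replay",
# so they are left untouched.
drop_verdict() {
    grep -v -E "^git bisect (good|bad) $2" "$1"
}
```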
Re: Major KVM issues with kernel 4.5 on the host
On Thu, Apr 14, 2016 at 07:30:43PM +0200, Paolo Bonzini wrote: > On 14/04/2016 18:47, Marc Haber wrote: > >> > Ok, then I guess bisection is needed. Please first try commit > >> > 45bdbcfdf241. > > I did git checkout 45bdbcfdf241 and built the resulting kernel > > 4.4.0-rc5. This one has now been running for ten hours, which is > > threefold the longest time that a faulty kernel has held before a VM > > experienced corruption. So I guess, that one is fine. > > Interesting, this means it's not a KVM bug. You can ignore my patch > from yesterday (though we'll get it in anyway). > > > Since 4.5.0-rc1 is bad, I guess I do: > > > > git checkout 45bdbcfdf241 > > git bisect start > > git bisect good > > git bisect bad v4.5.0-rc1 > > This is correct but you also want to do > > git bisect good 4.4.0 > git bisect good 4.4.0-rc5 > > so that bisection basically works through the commits in the merge window. So I start over from this:
[47/544]mh@fan:~/linux/debug/linux$ git checkout 45bdbcfdf241
HEAD is now at 45bdbcf... kvm: x86: Fix vmwrite to SECONDARY_VM_EXEC_CONTROL
[48/545]mh@fan:~/linux/debug/linux$ git bisect start
[49/546]mh@fan:~/linux/debug/linux$ git bisect good
[50/547]mh@fan:~/linux/debug/linux$ git bisect bad v4.5-rc1
Bisecting: 5761 revisions left to test after this (roughly 13 steps)
[cbd88cd4c07f9361914ab7fd7e21c9227986fe68] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
[51/548]mh@fan:~/linux/debug/linux$ git bisect good v4.4
Bisecting: 5468 revisions left to test after this (roughly 12 steps)
[f9a03ae123c92c1f45cd2ca88d0f6edd787be78c] Merge tag 'for-f2fs-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
[52/549]mh@fan:~/linux/debug/linux$ git bisect good v4.4-rc5
Bisecting: 5468 revisions left to test after this (roughly 12 steps)
[f9a03ae123c92c1f45cd2ca88d0f6edd787be78c] Merge tag 'for-f2fs-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
[53/550]mh@fan:~/linux/debug/linux$
This is going to take a few days as detecting a "bad" version may take a few hours. Greetings Marc -- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
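The transcript shows git expecting roughly 12 to 13 halving steps; the second `git bisect good v4.4-rc5` left the count at 5468, presumably because v4.4 already contains v4.4-rc5. A back-of-the-envelope sketch of the worst-case wall-clock time, assuming every step needs the full soak period (nine hours is the figure used elsewhere in the thread; a "bad" kernel usually fails within 15 to 30 minutes, so real sessions should be shorter). The helper names are hypothetical, and git's own "roughly N steps" figure uses a slightly different formula, so it can differ from this by one.

```shell
#!/bin/sh
# Rough worst-case duration of a git bisect session (hypothetical helpers).
# Assumption: each step needs the full soak time, i.e. it ends up "good".

# ceil_log2 N: smallest k with 2^k >= N, the number of halvings
# needed to narrow N candidate revisions down to one.
ceil_log2() {
    n=$1
    k=0
    p=1
    while [ "$p" -lt "$n" ]; do
        p=$((p * 2))
        k=$((k + 1))
    done
    echo "$k"
}

# bisect_hours REVISIONS SOAK_HOURS: steps times hours per step.
bisect_hours() {
    steps=$(ceil_log2 "$1")
    echo $((steps * $2))
}
```

For example, `bisect_hours 5761 9` gives 117 hours (13 steps of nine hours), i.e. almost five days if every kernel had to run the full nine hours.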
Re: Major KVM issues with kernel 4.5 on the host
On Thu, Apr 14, 2016 at 03:16:29AM +0200, Paolo Bonzini wrote: > Ok, then I guess bisection is needed. Please first try commit > 45bdbcfdf241. I did git checkout 45bdbcfdf241 and built the resulting kernel 4.4.0-rc5. This one has now been running for ten hours, which is threefold the longest time that a faulty kernel has held before a VM experienced corruption. So I guess, that one is fine. Since 4.5.0-rc1 is bad, I guess I do: git checkout 45bdbcfdf241 git bisect start git bisect good git bisect bad v4.5.0-rc1 right? Greetings Marc -- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: Major KVM issues with kernel 4.5 on the host
On Thu, Apr 14, 2016 at 03:16:29AM +0200, Paolo Bonzini wrote: > Ok, then I guess bisection is needed. Please first try commit > 45bdbcfdf241. That kernel labels itself as "4.4.0-rc5+", is that correct? Greetings Marc -- ----- Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: Major KVM issues with kernel 4.5 on the host
On Thu, Apr 14, 2016 at 03:16:29AM +0200, Paolo Bonzini wrote: > On 14/04/2016 00:29, Marc Haber wrote: > > On Wed, Apr 13, 2016 at 10:36:34PM +0200, Paolo Bonzini wrote: > >> Didn't help, but a fresh look at the list of 4.5 patches helped. > >> What the hell was I thinking, I missed write_rdtscp_aux who > >> obviously uses MSR_TSC_AUX. > > > > I applied this patch to 4.5, which didn't go cleanly, I had to do it > > manually, and there is no change in behavior. Sometimes, the Vm just > > crashes, but most times the filesystem is remounted ro. > > Ok, then I guess bisection is needed. Please first try commit > 45bdbcfdf241. If it fails, then the bug come together with KVM's merge > window changes for 4.5-rc1. Please apply the patch I sent here when > bisection is past 46896c73c1a4dde527c3a3cc43379deeb41985a1 (which means > that probably that should be the commit you try second; the bisection > then becomes much easier). I have never bisected this deeply. Can you please give more advice, with which two commits to start? And how do I find out whether I am "past" a commit? I am als not a git expert, a few command lines would be appreciated. Things have not become any easier last night; 4.5-rc7 ran for more than three hours before it failed :-( Greetings Marc -- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: Major KVM issues with kernel 4.5 on the host
On Wed, Apr 13, 2016 at 10:36:34PM +0200, Paolo Bonzini wrote: > Didn't help, but a fresh look at the list of 4.5 patches helped. > What the hell was I thinking, I missed write_rdtscp_aux who > obviously uses MSR_TSC_AUX. I applied this patch to 4.5, which didn't go cleanly, I had to do it manually, and there is no change in behavior. Sometimes, the Vm just crashes, but most times the filesystem is remounted ro.
[ 84.658968] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27903
[ 84.664877] Aborting journal on device dm-0-8.
[ 84.667992] EXT4-fs (dm-0): Remounting filesystem read-only
[ 84.670972] EXT4-fs error (device dm-0): ext4_journal_check_start:56: Detected aborted journal
[ 84.763331] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27898
[ 84.825412] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27895
[ 84.907959] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27893
[ 84.915187] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27900
[ 84.961062] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27889
[ 84.983700] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27891
[ 98.315538] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #23567: comm aide: deleted inode referenced: 27897
[ 98.323606] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #23567: comm aide: deleted inode referenced: 27904
[ 99.889927] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27892
[ 99.893823] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27901
[ 99.901140] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27890
[ 99.904898] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27896
[ 99.909758] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27899
[ 99.914394] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27894
[ 207.132045] serial8250: too much work for irq4
[ 207.220043] serial8250: too much work for irq4
[ 207.312028] serial8250: too much work for irq4
Greetings Marc -- ----- Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: Major KVM issues with kernel 4.5 on the host
On Wed, Apr 13, 2016 at 10:36:34PM +0200, Paolo Bonzini wrote:
> Didn't help, but a fresh look at the list of 4.5 patches helped.
> What the hell was I thinking, I missed write_rdtscp_aux who
> obviously uses MSR_TSC_AUX.

So do you want me to apply that to 4.5 or 4.5.1 and try it?

Greetings
Marc
Re: Major KVM issues with kernel 4.5 on the host
On Fri, Mar 18, 2016 at 11:01:46AM +0100, Paolo Bonzini wrote:
> On 17/03/2016 19:11, Borislav Petkov wrote:
> > I'm going to try reproducing the issue on a less "important" machine
> > so that bisecting is less painful, but maybe you guys have an idea
> > what's going wrong here.
>
> No idea, sorry. :( Bisecting would be great.

Working on that now.

> I'll also try reproducing and bisecting next week, in the meanwhile
> just having the host dmesg would help a lot.

Attached. I hope the message will get through to the list.

Greetings
Marc

[0.00] Linux version 4.5.1-zgws1 (mh@fan) (gcc version 5.3.1 20160409 (Debian 5.3.1-14) ) #2 SMP Wed Apr 13 06:32:03 UTC 2016
[0.00] Command line: BOOT_IMAGE=/vmlinuz-4.5.1-zgws1 root=/dev/mapper/root ro radeon.modeset=1 splash quiet scsi_mod.scan=sync
[0.00] tseg: 00
[0.00] x86/fpu: Legacy x87 FPU detected.
[0.00] x86/fpu: Using 'lazy' FPU context switches.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009abff] usable
[0.00] BIOS-e820: [mem 0x0009ac00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e2000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xcff7] usable
[0.00] BIOS-e820: [mem 0xcff8-0xcff97fff] ACPI data
[0.00] BIOS-e820: [mem 0xcff98000-0xcffb] ACPI NVS
[0.00] BIOS-e820: [mem 0xcffc-0xcfff] reserved
[0.00] BIOS-e820: [mem 0xffa0-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00062fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.5 present.
[0.00] DMI: System manufacturer System Product Name/M5A88-V EVO, BIOS 1603 10/12/2012
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] AGP: No AGP bridge found
[0.00] e820: last_pfn = 0x63 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00] 0-9 write-back
[0.00] A-E uncachable
[0.00] F-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00] 0 base mask 8000 write-back
[0.00] 1 base 8000 mask C000 write-back
[0.00] 2 base C000 mask F000 write-back
[0.00] 3 disabled
[0.00] 4 disabled
[0.00] 5 disabled
[0.00] 6 disabled
[0.00] 7 disabled
[0.00] TOM2: 00063000 aka 25344M
[0.00] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WC UC- WT
[0.00] e820: update [mem 0xd000-0x] usable ==> reserved
[0.00] e820: last_pfn = 0xcff80 max_arch_pfn = 0x4
[0.00] found SMP MP-table at [mem 0x000ff780-0x000ff78f] mapped at [880ff780]
[0.00] Base memory trampoline at [88094000] 94000 size 24576
[0.00] Using GB pages for direct mapping
[0.00] BRK [0x01a83000, 0x01a83fff] PGTABLE
[0.00] BRK [0x01a84000, 0x01a84fff] PGTABLE
[0.00] BRK [0x01a85000, 0x01a85fff] PGTABLE
[0.00] BRK [0x01a86000, 0x01a86fff] PGTABLE
[0.00] RAMDISK: [mem 0x357bc000-0x36bd5fff]
[0.00] ACPI: Early table checksum verification disabled
[0.00] ACPI: RSDP 0x000FBC30 24 (v02 ACPIAM)
[0.00] ACPI: XSDT 0xCFF80100 54 (v01 101212 XSDT1626 20121012 MSFT 0097)
[0.00] ACPI: FACP 0xCFF80290 F4 (v03 101212 FACP1626 20121012 MSFT 0097)
[0.00] ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20160108/tbfadt-623)
[0.00] ACPI: DSDT 0xCFF80460 00F14D (v01 A1867 A1867001 0001 INTL 20060113)
[0.00] ACPI: FACS 0xCFF98000 40
[0.00] ACPI: FACS 0xCFF98000 40
[0.00] ACPI: APIC 0xCFF80390 8C (v01 101212 APIC1626 20121012 MSFT 0097)
[0.00] ACPI: MCFG 0xCFF80420 3C (v01 101212 OEMMCFG 20121012 MSFT 0097)
[0.00] ACPI: OEMB 0xCFF98040 72 (v01 101212 OEMB1626 20121012 MSFT 0097)
[0.00] ACPI: HPET 0xCFF8F8B0 38 (v01 101212 OEMHPET 20121012 MSFT 0097)
[0.00] ACPI: SSDT 0xCFF8F8F0 000E10 (v01 A M I POWERNOW 0001 AMD 0
Re: Major KVM issues with kernel 4.5 on the host
On Sun, Mar 20, 2016 at 07:58:13PM +0100, Borislav Petkov wrote:
> So I'm not sure what even happens here yet. I haven't seen anything out
> of the ordinary in Marc's dmesg and I wasn't able to reproduce either.
> So would it be good to try with "npt=0"?

Sure, why not. Does npt=0 go on the kernel command line of the host or of the guest? Or is it a KVM option?

Greetings
Marc
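For reference: npt (nested page tables on AMD SVM) is a parameter of the host's kvm_amd module, so it is set on the host rather than inside the guests, either as kvm_amd.npt=0 on the host kernel command line or as a modprobe option for kvm_amd. The sketch below is a hypothetical helper, not part of KVM, that checks what the running host actually uses by reading /sys/module/kvm_amd/parameters/npt. It accepts both the integer ("1"/"0") and boolean ("Y"/"N") spellings, on the assumption that the parameter's type may differ between kernel versions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Interpret a module parameter value as a boolean.
 * Returns 1 (enabled), 0 (disabled) or -1 (unrecognized). */
int parse_param_bool(const char *s)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return -1;
    if (*s == 'Y' || *s == 'y')          /* bool parameters print Y/N */
        return 1;
    if (*s == 'N' || *s == 'n')
        return 0;
    v = strtol(s, &end, 10);             /* int parameters print a number */
    if (end == s)
        return -1;                       /* neither form */
    return v != 0;
}

/* Read a parameter file such as /sys/module/kvm_amd/parameters/npt.
 * Returns -1 if the file is missing (e.g. kvm_amd not loaded) or unreadable. */
int read_npt(const char *path)
{
    char buf[32];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_param_bool(buf);
}
```

Calling read_npt("/sys/module/kvm_amd/parameters/npt") on the host after booting with kvm_amd.npt=0 should then yield 0; a result of -1 means the module is not loaded or the file could not be parsed.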
Re: Major KVM issues with kernel 4.5 on the host
On Sun, Mar 20, 2016 at 02:31:58PM +0100, Borislav Petkov wrote:
> On Sat, Mar 19, 2016 at 01:08:37AM +0100, Marc Haber wrote:
> > Booting Debian Linux, apt-get update, apt-get upgrade, and run aide
> > (which builds checksums for the entire filesystem, a rather disk-bound
> > activity).
>
> So I did that and aide ran a whole init and check all the way through
> and all fine. I don't see anything out of the ordinary in your dmesg
> outputs either.
>
> The next things we should look like is:
>
> * diff .configs - there might be something there

Here we go:

[2/501]mh@fan:~$ diff -u0 /boot/config-4.4.6-zgws1 /boot/config-4.5.1-zgws1
--- /boot/config-4.4.6-zgws1	2016-03-28 15:50:36.0 +0200
+++ /boot/config-4.5.1-zgws1	2016-04-13 08:32:44.0 +0200
@@ -3 +3 @@
-# Linux/x86_64 4.4.6 Kernel Configuration
+# Linux/x86_64 4.5.1 Kernel Configuration
@@ -14 +13,0 @@
-CONFIG_HAVE_LATENCYTOP_SUPPORT=y
@@ -15,0 +15,4 @@
+CONFIG_ARCH_MMAP_RND_BITS_MIN=28
+CONFIG_ARCH_MMAP_RND_BITS_MAX=32
+CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
+CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
@@ -147,7 +149,0 @@
-# CONFIG_CGROUP_DEBUG is not set
-CONFIG_CGROUP_FREEZER=y
-CONFIG_CGROUP_PIDS=y
-CONFIG_CGROUP_DEVICE=y
-CONFIG_CPUSETS=y
-CONFIG_PROC_PID_CPUSET=y
-CONFIG_CGROUP_CPUACCT=y
@@ -158,3 +154,3 @@
-# CONFIG_MEMCG_KMEM is not set
-# CONFIG_CGROUP_HUGETLB is not set
-CONFIG_CGROUP_PERF=y
+CONFIG_BLK_CGROUP=y
+# CONFIG_DEBUG_BLK_CGROUP is not set
+CONFIG_CGROUP_WRITEBACK=y
@@ -165,3 +161,9 @@
-CONFIG_BLK_CGROUP=y
-# CONFIG_DEBUG_BLK_CGROUP is not set
-CONFIG_CGROUP_WRITEBACK=y
+CONFIG_CGROUP_PIDS=y
+CONFIG_CGROUP_FREEZER=y
+# CONFIG_CGROUP_HUGETLB is not set
+CONFIG_CPUSETS=y
+CONFIG_PROC_PID_CPUSET=y
+CONFIG_CGROUP_DEVICE=y
+CONFIG_CGROUP_CPUACCT=y
+CONFIG_CGROUP_PERF=y
+# CONFIG_CGROUP_DEBUG is not set
@@ -254 +255,0 @@
-CONFIG_HAVE_DMA_ATTRS=y
@@ -288,0 +290,4 @@
+CONFIG_HAVE_ARCH_MMAP_RND_BITS=y
+CONFIG_ARCH_MMAP_RND_BITS=28
+CONFIG_HAVE_ARCH_MMAP_RND_COMPAT_BITS=y
+CONFIG_ARCH_MMAP_RND_COMPAT_BITS=8
@@ -377,0 +383 @@
+CONFIG_X86_FAST_FEATURE_TESTS=y
@@ -383 +389 @@
-CONFIG_IOSF_MBI=m
+CONFIG_IOSF_MBI=y
@@ -390,0 +397 @@
+# CONFIG_QUEUED_LOCK_STAT is not set
@@ -769,0 +777 @@
+# CONFIG_VMD is not set
@@ -772,0 +781 @@
+CONFIG_NET_EGRESS=y
@@ -824,0 +834 @@
+# CONFIG_INET_DIAG_DESTROY is not set
@@ -945,0 +956,3 @@
+CONFIG_NF_DUP_NETDEV=m
+CONFIG_NFT_DUP_NETDEV=m
+CONFIG_NFT_FWD_NETDEV=m
@@ -1252,0 +1266 @@
+# CONFIG_6LOWPAN_DEBUGFS is not set
@@ -1344,0 +1359 @@
+CONFIG_SOCK_CGROUP_DATA=y
@@ -1411 +1425,0 @@
-CONFIG_WEXT_SPY=y
@@ -1423,5 +1437 @@
-CONFIG_LIB80211=m
-CONFIG_LIB80211_CRYPT_WEP=m
-CONFIG_LIB80211_CRYPT_CCMP=m
-CONFIG_LIB80211_CRYPT_TKIP=m
-# CONFIG_LIB80211_DEBUG is not set
+# CONFIG_LIB80211 is not set
@@ -1469 +1479,2 @@
-# CONFIG_NFC_ST_NCI is not set
+# CONFIG_NFC_ST_NCI_I2C is not set
+# CONFIG_NFC_ST_NCI_SPI is not set
@@ -1616,2 +1627,2 @@
-CONFIG_PARPORT_PC=m
-CONFIG_PARPORT_SERIAL=m
+CONFIG_PARPORT_PC=y
+CONFIG_PARPORT_SERIAL=y
@@ -1619 +1630 @@
-CONFIG_PARPORT_PC_SUPERIO=y
+# CONFIG_PARPORT_PC_SUPERIO is not set
@@ -1968,0 +1980 @@
+# CONFIG_DM_DEBUG_BLOCK_STACK_TRACING is not set
@@ -1971 +1982,0 @@
-# CONFIG_DM_DEBUG_BLOCK_STACK_TRACING is not set
@@ -2131,0 +2143 @@
+# CONFIG_NET_VENDOR_NETRONOME is not set
@@ -2263,43 +2275,6 @@
-# CONFIG_PCMCIA_RAYCS is not set
-# CONFIG_LIBERTAS_THINFIRM is not set
-# CONFIG_AIRO is not set
-# CONFIG_ATMEL is not set
-# CONFIG_AT76C50X_USB is not set
-# CONFIG_AIRO_CS is not set
-# CONFIG_PCMCIA_WL3501 is not set
-# CONFIG_PRISM54 is not set
-# CONFIG_USB_ZD1201 is not set
-# CONFIG_USB_NET_RNDIS_WLAN is not set
-# CONFIG_ADM8211 is not set
-# CONFIG_RTL8180 is not set
-# CONFIG_RTL8187 is not set
-# CONFIG_MAC80211_HWSIM is not set
-# CONFIG_MWL8K is not set
-# CONFIG_ATH_CARDS is not set
-CONFIG_B43=m
-CONFIG_B43_BCMA=y
-CONFIG_B43_SSB=y
-CONFIG_B43_BUSES_BCMA_AND_SSB=y
-# CONFIG_B43_BUSES_BCMA is not set
-# CONFIG_B43_BUSES_SSB is not set
-CONFIG_B43_PCI_AUTOSELECT=y
-CONFIG_B43_PCICORE_AUTOSELECT=y
-CONFIG_B43_SDIO=y
-CONFIG_B43_BCMA_PIO=y
-CONFIG_B43_PIO=y
-CONFIG_B43_PHY_G=y
-CONFIG_B43_PHY_N=y
-CONFIG_B43_PHY_LP=y
-CONFIG_B43_PHY_HT=y
-CONFIG_B43_LEDS=y
-CONFIG_B43_HWRNG=y
-# CONFIG_B43_DEBUG is not set
-# CONFIG_B43LEGACY is not set
-# CONFIG_BRCMSMAC is not set
-# CONFIG_BRCMFMAC is not set
-CONFIG_HOSTAP=m
-CONFIG_HOSTAP_FIRMWARE=y
-# CONFIG_HOSTAP_FIRMWARE_NVRAM is not set
-CONFIG_HOSTAP_PLX=m
-CONFIG_HOSTAP_PCI=m
-CONFIG_HOSTAP_CS=m
+# CONFIG_WLAN_VENDOR_ADMTEK is not set
+# CONFIG_WLAN_VENDOR_ATH is not set
+# CONFIG_WLAN_VENDOR_ATMEL is not set
+# CONFIG_WLAN_VENDOR_BROADCOM is not set
+# CONFIG_WLAN_VENDOR_CISCO is not set
+CONFIG_WLAN_VENDOR_INTEL=y
@@ -2307,0 +2283,2 @@
+# CONFIG_IWL4965 is not set
+# CONFIG_IWL3945 is not set
@@ -2321,14 +2298,13 @@
-# CONFIG_IWL4965 is not set
-# CONFIG_IWL3945 is not set
-# CONFIG_LIBERTAS is not set
-# CONFIG_HERMES is not set
-# CONFIG_P54_
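Because diff -u0 compares the two files line by line, options that merely moved show up as changes: the CONFIG_CGROUP_* block above, for example, appears both as removed and as added with identical values. Comparing by option name avoids that noise. The sketch below is a hypothetical helper, not a kernel tool (the kernel tree's scripts/diffconfig serves a similar purpose); it assumes the usual .config syntax and records "# CONFIG_FOO is not set" as the value n.

```c
#include <stdio.h>
#include <string.h>

/* One option from a .config file. */
struct opt {
    char name[128];
    char value[256];
};

/* Parse one .config line. "CONFIG_FOO=val" yields (CONFIG_FOO, val) and
 * "# CONFIG_FOO is not set" yields (CONFIG_FOO, n). Returns 1 if the line
 * holds an option, 0 for blank lines and ordinary comments. */
int parse_config_line(const char *line, struct opt *o)
{
    const char *end;
    size_t n;

    if (strncmp(line, "CONFIG_", 7) == 0 && (end = strchr(line, '=')) != NULL) {
        n = (size_t)(end - line);
        if (n >= sizeof o->name)
            return 0;
        memcpy(o->name, line, n);
        o->name[n] = '\0';
        snprintf(o->value, sizeof o->value, "%s", end + 1);
        o->value[strcspn(o->value, "\r\n")] = '\0';  /* drop the newline */
        return 1;
    }
    if (strncmp(line, "# CONFIG_", 9) == 0 &&
        (end = strstr(line, " is not set")) != NULL) {
        n = (size_t)(end - (line + 2));              /* skip the "# " */
        if (n >= sizeof o->name)
            return 0;
        memcpy(o->name, line + 2, n);
        o->name[n] = '\0';
        strcpy(o->value, "n");
        return 1;
    }
    return 0;
}

/* Linear lookup by name; acceptable for a few thousand options. */
static const struct opt *find_opt(const struct opt *opts, int count, const char *name)
{
    for (int i = 0; i < count; i++)
        if (strcmp(opts[i].name, name) == 0)
            return &opts[i];
    return NULL;
}

/* Compare old config a with new config b by name, ignoring order. Prints
 * "-NAME old", "+NAME new" or " NAME old -> new" to out (unless out is
 * NULL) and returns the number of differences. */
int diff_configs(const struct opt *a, int na, const struct opt *b, int nb, FILE *out)
{
    int diffs = 0;

    for (int i = 0; i < na; i++) {
        const struct opt *o = find_opt(b, nb, a[i].name);

        if (o == NULL) {
            diffs++;
            if (out)
                fprintf(out, "-%s %s\n", a[i].name, a[i].value);
        } else if (strcmp(o->value, a[i].value) != 0) {
            diffs++;
            if (out)
                fprintf(out, " %s %s -> %s\n", a[i].name, a[i].value, o->value);
        }
    }
    for (int i = 0; i < nb; i++) {
        if (find_opt(a, na, b[i].name) == NULL) {
            diffs++;
            if (out)
                fprintf(out, "+%s %s\n", b[i].name, b[i].value);
        }
    }
    return diffs;
}

/* Read up to max options from path; returns how many, or -1 on error. */
int load_config(const char *path, struct opt *opts, int max)
{
    char line[512];
    int n = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (n < max && fgets(line, sizeof line, f) != NULL)
        if (parse_config_line(line, &opts[n]))
            n++;
    fclose(f);
    return n;
}
```

A caller would load /boot/config-4.4.6-zgws1 and /boot/config-4.5.1-zgws1 into two static arrays with load_config() and pass both to diff_configs(..., stdout); the moved cgroup options then drop out, while changes such as CONFIG_IOSF_MBI m -> y remain. The lookup is quadratic, which is a deliberate simplification for a sketch.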
Re: [PATCH (net.git) 2/3] Revert "stmmac: Fix 'eth0: No PHY found' regression"
On Fri, Apr 01, 2016 at 09:07:15AM +0200, Giuseppe Cavallaro wrote:
> This reverts commit 88f8b1bb41c6208f81b6a480244533ded7b59493.
> due to problems on GeekBox and Banana Pi M1 board when
> connected to a real transceiver instead of a switch via
> fixed-link.

This reversal is still needed in Linux 4.5.1 on Banana Pi. Please consider including it in Linux 4.5.2.

Greetings
Marc

>
> Signed-off-by: Giuseppe Cavallaro <peppe.cavall...@st.com>
> Cc: Gabriel Fernandez <gabriel.fernan...@linaro.org>
> Cc: Andreas Färber <afaer...@suse.de>
> Cc: Frank Schäfer <fschaefer@googlemail.com>
> Cc: Dinh Nguyen <dinh.li...@gmail.com>
> Cc: David S. Miller <da...@davemloft.net>
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  | 11 ++-
>  .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |  9 +
>  include/linux/stmmac.h                             |  1 -
>  3 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> index ea76129..af09ced 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> @@ -199,12 +199,21 @@ int stmmac_mdio_register(struct net_device *ndev)
>  	struct stmmac_priv *priv = netdev_priv(ndev);
>  	struct stmmac_mdio_bus_data *mdio_bus_data = priv->plat->mdio_bus_data;
>  	int addr, found;
> -	struct device_node *mdio_node = priv->plat->mdio_node;
> +	struct device_node *mdio_node = NULL;
> +	struct device_node *child_node = NULL;
>  
>  	if (!mdio_bus_data)
>  		return 0;
>  
>  	if (IS_ENABLED(CONFIG_OF)) {
> +		for_each_child_of_node(priv->device->of_node, child_node) {
> +			if (of_device_is_compatible(child_node,
> +						    "snps,dwmac-mdio")) {
> +				mdio_node = child_node;
> +				break;
> +			}
> +		}
> +
>  		if (mdio_node) {
>  			netdev_dbg(ndev, "FOUND MDIO subnode\n");
>  		} else {
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> index dcbd2a1..9cf181f 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> @@ -146,7 +146,6 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
>  	struct device_node *np = pdev->dev.of_node;
>  	struct plat_stmmacenet_data *plat;
>  	struct stmmac_dma_cfg *dma_cfg;
> -	struct device_node *child_node = NULL;
>  
>  	plat = devm_kzalloc(&pdev->dev, sizeof(*plat), GFP_KERNEL);
>  	if (!plat)
> @@ -177,19 +176,13 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
>  		plat->phy_node = of_node_get(np);
>  	}
>  
> -	for_each_child_of_node(np, child_node)
> -		if (of_device_is_compatible(child_node, "snps,dwmac-mdio")) {
> -			plat->mdio_node = child_node;
> -			break;
> -		}
> -
>  	/* "snps,phy-addr" is not a standard property. Mark it as deprecated
>  	 * and warn of its use. Remove this when phy node support is added.
>  	 */
>  	if (of_property_read_u32(np, "snps,phy-addr", &plat->phy_addr) == 0)
>  		dev_warn(&pdev->dev, "snps,phy-addr property is deprecated\n");
>  
> -	if ((plat->phy_node && !of_phy_is_fixed_link(np)) || !plat->mdio_node)
> +	if ((plat->phy_node && !of_phy_is_fixed_link(np)) || plat->phy_bus_name)
>  		plat->mdio_bus_data = NULL;
>  	else
>  		plat->mdio_bus_data =
> diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
> index 4bcf5a6..6e53fa8 100644
> --- a/include/linux/stmmac.h
> +++ b/include/linux/stmmac.h
> @@ -114,7 +114,6 @@ struct plat_stmmacenet_data {
>  	int interface;
>  	struct stmmac_mdio_bus_data *mdio_bus_data;
>  	struct device_node *phy_node;
> -	struct device_node *mdio_node;
>  	struct stmmac_dma_cfg *dma_cfg;
>  	int clk_csr;
>  	int has_gmac;
> -- 
> 1.7.4.4
>
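The revert changes which device-tree facts decide whether stmmac_probe_config_dt() leaves mdio_bus_data set, and, as the quoted hunk shows, stmmac_mdio_register() returns early without registering an MDIO bus when it is NULL. The sketch below models only the two conditions visible in the patch, with the DT lookups flattened into hypothetical boolean fields; it does not use the kernel's OF API, and it does not claim which case the Banana Pi device tree falls into.

```c
#include <stdbool.h>

/* Hypothetical flattened view of what the two conditions inspect. */
struct dt_facts {
    bool has_phy_node;      /* plat->phy_node != NULL */
    bool is_fixed_link;     /* of_phy_is_fixed_link(np) */
    bool has_mdio_subnode;  /* a child compatible with "snps,dwmac-mdio" */
    bool has_phy_bus_name;  /* plat->phy_bus_name != NULL */
};

/* With commit 88f8b1bb applied: true if mdio_bus_data stays allocated,
 * i.e. an MDIO bus will be registered. */
bool mdio_bus_with_commit(const struct dt_facts *f)
{
    return !((f->has_phy_node && !f->is_fixed_link) || !f->has_mdio_subnode);
}

/* After the revert: a missing mdio subnode no longer suppresses the bus. */
bool mdio_bus_after_revert(const struct dt_facts *f)
{
    return !((f->has_phy_node && !f->is_fixed_link) || f->has_phy_bus_name);
}
```

The two predicates differ exactly when there is no "snps,dwmac-mdio" child and no phy_bus_name: with the commit no MDIO bus is registered (so PHY probing is skipped), after the revert it is.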
Re: ext4_file_open: Inconsistent encryption contexts (commit ff978b09f973) breaking Docker
On Mon, Mar 14, 2016 at 11:27:35AM +0100, Miklos Szeredi wrote:
> Could you please try the below patch?

I can confirm that I have the issue on kernel 4.5 with Debian's schroot using overlayfs, and that this patch fixes it. It should be included in 4.5.1.

Greetings
Marc
Major KVM issues with kernel 4.5 on the host
Hi,

I have a (semi-productive[1]) system ("host") running Debian unstable. On this system, a few VMs (Debian unstable, Debian testing) ("vm1", "vm2", "vm3") are running. I roll my own kernels from vanilla upstream sources, with no distribution patches.

Since the host was updated to kernel 4.5, the VMs have started acting up. All of them. The range of strangeness begins with "relocation error, system halted" on system startup and includes corrupted data files on disk, filesystems remounted read-only, libraries rejected with "invalid ELF format", and binaries segfaulting all of a sudden.

Downgrading the host to kernel 4.4.5 magically fixed all those issues. Going back to 4.5 makes the issues reappear.

Here, for example, are ext4 errors logged in one of the VMs:

Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #415065: comm aide: deleted inode referenced: 546538
Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #415065: comm aide: deleted inode referenced: 546530
Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4269: inode #546543: comm aide: bad extra_isize (44800 != 256)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4466: inode #546568: comm aide: bogus i_mode (144)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #546548: comm aide: deleted inode referenced: 546564
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #546548: comm aide: deleted inode referenced: 546562
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4269: inode #546563: comm aide: bad extra_isize (6464 != 256)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4466: inode #546561: comm aide: bogus i_mode (0)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4269: inode #546529: comm aide: bad extra_isize (1152 != 256)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_xattr_block_get:297: inode #546359: comm aide: bad block 677784

I'm going to try reproducing the issue on a less "important" machine so that bisecting is less painful, but maybe you guys have an idea what's going wrong here.

Just for the record, kernel 4.5 in the guests and on standalone systems seems to be unproblematic.

Greetings
Marc

[1] my main workstation, running enough services for the local network that disturbances in its operation cause reasonable discomfort, but not the Enterprise kind of "productive"
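Each bisection step here means building a host kernel, rebooting the host and rerunning the failing workload, which is why it hurts on a main workstation; the number of steps, however, grows only with the logarithm of the number of commits between the last good and the first bad host kernel. The sketch below shows that binary search under simplifying assumptions (a linear history, a single good-to-bad transition, a reproducible test). It is not how git bisect is implemented, which also has to cope with merge commits, and all names in it are hypothetical.

```c
#include <stddef.h>

/* Returns nonzero if the kernel built from commit idx is bad.
 * In real life each call is a build, a host reboot and a test run. */
typedef int (*bisect_test)(size_t idx, void *ctx);

/* Find the first bad commit in 0..n-1 (n >= 1), assuming commit n-1 is known
 * bad and the commit before index 0 is known good. *steps counts tests run. */
size_t bisect_first_bad(size_t n, bisect_test is_bad, void *ctx, unsigned *steps)
{
    size_t lo = 0, hi = n - 1;   /* the first bad commit lies in [lo, hi] */

    *steps = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        ++*steps;
        if (is_bad(mid, ctx))
            hi = mid;            /* mid is bad: answer is mid or earlier */
        else
            lo = mid + 1;        /* mid is good: answer comes later */
    }
    return lo;
}

/* Example test: pretend the regression was introduced at *(size_t *)ctx. */
int simulated_regression(size_t idx, void *ctx)
{
    return idx >= *(const size_t *)ctx;
}
```

With 1000 candidate commits the loop needs at most 10 test runs, since each step at least halves the remaining range.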
Major KVM issues with kernel 4.5 on the host
Hi,

I have a (semi-productive[1]) system ("host") running Debian unstable. On this system, a few VMs (Debian unstable, Debian testing) ("vm1", "vm2", "vm3") are running. I roll my own kernels from vanilla upstream sources, with no distribution patches.

Since host was updated to kernel 4.5, the VMs have started acting up. All of them. The range of strangeness begins with "relocation error, system halted" on system startup and continues with corrupted data files on disk, filesystems remounted read-only, libraries rejected with "invalid ELF format", and binaries segfaulting all of a sudden.

Downgrading host to kernel 4.4.5 magically fixed all those issues. Going back to 4.5 makes the issues reappear.

Here, for example, are ext4 fs errors logged in one of the VMs:

Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #415065: comm aide: deleted inode referenced: 546538
Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #415065: comm aide: deleted inode referenced: 546530
Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4269: inode #546543: comm aide: bad extra_isize (44800 != 256)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4466: inode #546568: comm aide: bogus i_mode (144)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #546548: comm aide: deleted inode referenced: 546564
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #546548: comm aide: deleted inode referenced: 546562
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4269: inode #546563: comm aide: bad extra_isize (6464 != 256)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4466: inode #546561: comm aide: bogus i_mode (0)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4269: inode #546529: comm aide: bad extra_isize (1152 != 256)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_xattr_block_get:297: inode #546359: comm aide: bad block 677784

I'm going to try reproducing the issue on a less "important" machine so that bisecting is less painful, but maybe you guys have an idea what's going wrong here.

jftr, kernel 4.5 in guests and on standalone systems seems to be unproblematic.

Greetings
Marc

[1] my main workstation, running enough services for the local network that disturbances in its operation cause reasonable discomfort, but not the Enterprise kind of "productive"

-- 
-----
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
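For triage, it can help to condense a guest log like the one above into counts per reporting ext4 function plus the set of affected inodes. The following is a minimal Python sketch. It assumes the log lines have already been collected as strings and that they follow the exact "EXT4-fs error (device ...): func:line: inode #N: comm X: msg" shape shown in the report. The helper name `summarize` is hypothetical and not part of any tool mentioned in this thread.

```python
import re
from collections import Counter

# Matches the guest-side ext4 error lines quoted in the report, e.g.
# "EXT4-fs error (device dm-0): ext4_iget:4269: inode #546543: comm aide: bad extra_isize (44800 != 256)"
# The exact format is an assumption based on those sample lines.
EXT4_ERR = re.compile(
    r"EXT4-fs error \(device (?P<dev>[^)]+)\): "
    r"(?P<func>\w+):(?P<line>\d+): inode #(?P<inode>\d+): "
    r"comm (?P<comm>[^:]+): (?P<msg>.*)"
)


def summarize(lines):
    """Count ext4 errors per reporting function and collect the affected inodes.

    Lines that do not look like ext4 errors are ignored.
    """
    per_func = Counter()
    inodes = set()
    for line in lines:
        m = EXT4_ERR.search(line)
        if m is None:
            continue  # not an ext4 error line
        per_func[m.group("func")] += 1
        inodes.add(int(m.group("inode")))
    return per_func, inodes
```

A burst of `ext4_iget` and `ext4_lookup` errors spread over many unrelated inodes, as in this log, points more towards on-disk data being corrupted underneath the guest than towards a single damaged file.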
Re: Major KVM issues with kernel 4.5 on the host
Hi Borislav,

On Thu, Mar 17, 2016 at 07:11:28PM +0100, Borislav Petkov wrote:
> Do you have any funky messages in host's dmesg ?

Not that I see.

> Can you upload a full dmesg from both a good and a bad host kernel?

http://q.bofh.de/~mh/stuff/20160317-fan-syslog-kvm-4.4.5
http://q.bofh.de/~mh/stuff/20160317-fan-syslog-kvm-4.5

Hope this helps.

Greetings
Marc
Re: Major KVM issues with kernel 4.5 on the host
Hi Borislav,

On Fri, Mar 18, 2016 at 11:04:29PM +0100, Borislav Petkov wrote:
> On Fri, Mar 18, 2016 at 07:49:29PM +0100, Marc Haber wrote:
> > http://q.bofh.de/~mh/stuff/20160317-fan-syslog-kvm-4.4.5
>
> This one I got.
>
> > http://q.bofh.de/~mh/stuff/20160317-fan-syslog-kvm-4.5
>
> This one doesn't want:
>
> HTTP request sent, awaiting response... 403 Forbidden
> 2016-03-18 22:57:46 ERROR 403: Forbidden.

Idiot me. File permissions fixed.

> Anything special you're doing to cause the host kernel to barf which I
> should do here?

Booting Debian Linux, running apt-get update and apt-get upgrade, and running aide (which builds checksums for the entire filesystem, a rather disk-bound activity).

Greetings
Marc
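The reproduction workload described above is essentially "read and checksum every file". The following Python sketch is a rough stand-in for that access pattern and also works as a crude corruption check: hash every regular file in a tree twice while the filesystem is otherwise idle, then compare the two passes. It assumes nothing writes to the tree between passes. The helpers `checksum_tree` and `changed_files` are hypothetical and are not part of aide, which instead compares against a previously stored database.

```python
import hashlib
import os


def checksum_tree(root):
    """Return {relative_path: sha256_hex} for every regular file under root.

    Symlinks are skipped and os.walk does not descend into symlinked
    directories, so each file is read at most once per pass.
    """
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device nodes, etc.
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large files do not need to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            sums[os.path.relpath(path, root)] = h.hexdigest()
    return sums


def changed_files(before, after):
    """Paths present in both passes whose contents differ (sorted)."""
    return sorted(p for p in before.keys() & after.keys() if before[p] != after[p])
```

On a quiescent guest filesystem, `changed_files(checksum_tree("/usr"), checksum_tree("/usr"))` should return an empty list. Any entry it reports hints at data changing underneath the guest, as with the 4.5 host kernel here.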
probable cause found - 3.14 on Xenserver (non-HVM): "cannot allocate memory")
Hi,

On Sun, Apr 20, 2014 at 10:56:39PM +0200, Marc Haber wrote:
> Linux 3.14, however, does not yet beyond the initramfs state ("cannot
> allocate memory". This also happens when i set cgroup_disable=memory.

After doing a bisect between "good 3.13" and "bad 3.14", I ended up with this commit:

6145cfe394a7f138f6b64491c5663f97dba12450 is the first bad commit
commit 6145cfe394a7f138f6b64491c5663f97dba12450
Author: Kees Cook <keesc...@chromium.org>
Date:   Thu Oct 10 17:18:18 2013 -0700

    x86, kaslr: Raise the maximum virtual address to -1 GiB on x86_64

    On 64-bit, this raises the maximum location to -1 GiB (from -1.5 GiB),
    the upper limit currently, since the kernel fixmap page mappings need
    to be moved to use the other 1 GiB (which would be the theoretical
    limit when building with -mcmodel=kernel).

    Signed-off-by: Kees Cook <keesc...@chromium.org>
    Link: http://lkml.kernel.org/r/1381450698-28710-7-git-send-email-keesc...@chromium.org
    Signed-off-by: H. Peter Anvin <h...@linux.intel.com>

:04 04 a48a6355e3ccd676027319ff520bf953cb07a0bb 7beb1fdd7478b6bec2555a364fbfac29d7a5a3c4 M arch

And indeed, reverting this commit on plain 3.14.1 makes my Xenserver boot just fine. When reverting the patch, I took the liberty of ignoring the REJECT in arch/x86/Kconfig.

I am, however, a bit confused that this patch dates back to October while the breakage only occurred when 3.14 was released.

Greetings
Marc

-- 
-----
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 31958061
Nordisch by Nature |  How to make an American Quilt | Fax: *49 621 31958062
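A bisect like the one above finds the first bad commit among N candidates with only about log2(N) test boots. The following Python sketch models that binary search under simplifying assumptions:

* History is linear (real `git bisect` also handles merge commits).
* The first entry is known good and the last is known bad.
* Once a commit is bad, all later ones stay bad.

The function `first_bad` and its `is_bad` callback are hypothetical stand-ins for "build, boot on Xenserver, check for 'cannot allocate memory'". They are not git's implementation.

```python
def first_bad(commits, is_bad):
    """Binary-search an ordered list of commits for the first bad one.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    badness is monotonic. Returns (first_bad_commit, tests_run).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1  # one build-and-boot cycle
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi], tests
```

For example, with 1000 candidate commits and the regression introduced at index 637, `first_bad(list(range(1000)), lambda c: c >= 637)` returns 637 after at most 10 tests. The search orders commits by their position in history, not by author date, so a commit authored months earlier can still be the first bad one in a later release.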