42d8c91e for drivers/net/phy/realtek.c causing loss on Banana Pi

2020-10-17 Thread Marc Haber
Hi,

in kernel 5.9, my Banana Pi test system suffers from catastrophic
packet loss on the Ethernet that makes the machine nearly unusable.
Reverting bbc4d71d63549bcd003a430de18a72a742d8c91e fixes the issue for
me.

Please investigate the breakage caused by this commit. I am prepared to
help with testing and would appreciate a fix even in Greg's stable
releases.

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: iwl_mvm_add_new_dqa_stream_wk BUG in lib/list_debug.c:56

2019-06-25 Thread Marc Haber
On Sun, Jun 02, 2019 at 03:48:42PM +0200, Marc Haber wrote:
> On Thu, May 30, 2019 at 10:12:57AM +0200, Marc Haber wrote:
> > on my primary notebook, a Lenovo X260, with an Intel Wireless 8260
> > (8086:24f3), running Debian unstable, I have started to see network
> > hangs since upgrading to kernel 5.1. In this situation, I cannot
> > restart Network-Manager (the call just hangs), I can log out of X, but
> > the system does not cleanly shut down and I need to Magic SysRq myself
> > out of the running system. This happens about once every two days.
> 
> The issue is also present in 5.1.5 and 5.1.6.

Almost a month later, 5.1.15 still crashes about twice a day on my
notebook. The error message seems pretty clear to me. How can I go on
from there and maybe identify a line number outside of a library?

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: iwl_mvm_add_new_dqa_stream_wk BUG in lib/list_debug.c:56

2019-06-07 Thread Marc Haber
On Fri, Jun 07, 2019 at 10:20:56PM +0200, Yussuf Khalil wrote:
> CC'ing iwlwifi maintainers to get some attention for this issue.
> 
> I am experiencing the very same bug on a ThinkPad T480s running 5.1.6 with
> Fedora 30. A friend is seeing it on his X1 Carbon 6th Gen, too. Both have an
> "Intel Corporation Wireless 8265 / 8275" card according to lspci.

I have an older 04:00.0 Network controller [0280]: Intel Corporation
Wireless 8260 [8086:24f3] (rev 3a) on a Thinkpad X260.

> Notably, in all cases I've observed it occurred right after roaming from one
> AP to another (though I can't guarantee this isn't a coincidence).

I also have multiple Access Points broadcasting the same SSID in my
house, and yes, I experience those issues often when I move from one
part of the house to another. I have, however, also experienced it in a
hotel when I was using the mobile hotspot offered by my mobile, so that
was clearly not a roaming situation.

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: iwl_mvm_add_new_dqa_stream_wk BUG in lib/list_debug.c:56

2019-06-02 Thread Marc Haber
On Thu, May 30, 2019 at 10:12:57AM +0200, Marc Haber wrote:
> on my primary notebook, a Lenovo X260, with an Intel Wireless 8260
> (8086:24f3), running Debian unstable, I have started to see network
> hangs since upgrading to kernel 5.1. In this situation, I cannot
> restart Network-Manager (the call just hangs), I can log out of X, but
> the system does not cleanly shut down and I need to Magic SysRq myself
> out of the running system. This happens about once every two days.

The issue is also present in 5.1.5 and 5.1.6.

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


iwl_mvm_add_new_dqa_stream_wk BUG in lib/list_debug.c:56

2019-05-30 Thread Marc Haber
 xor 
uas usb_storage raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear 
md_mod sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel 
rtsx_pci_sdmmc mmc_core aesni_intel aes_x86_64 crypto_simd cryptd glue_helper 
ahci libahci psmouse xhci_pci i2c_i801 e1000e libata xhci_hcd rtsx_pci mfd_core 
scsi_mod usbcore usb_common i915 i2c_algo_bit drm_kms_helper drm i2c_core 
thermal video
[38179.854652] ---[ end trace fd93637fcde969e6 ]---
[38179.854654] RIP: 0010:compaction_alloc+0x569/0x8c0
[38179.854656] Code: 62 01 00 00 49 be 00 00 00 00 00 16 00 00 eb 72 48 b8 00 
00 00 00 00 ea ff ff 49 89 da 49 c1 e2 06 4d 8d 2c 02 4d 85 ed 74 3b <41> 8b 45 
30 25 80 00 00 f0 3d 00 00 00 f0 0f 84 f9 00 00 00 41 80
[38179.854657] RSP: 0018:c90001a5f900 EFLAGS: 00010286
[38179.854658] RAX: ea00 RBX: 800ffe00 RCX: 003c
[38179.854659] RDX: 800ffe00 RSI:  RDI: 8884417f42c0
[38179.854660] RBP: 8010 R08:  R09: 8884417fab80
[38179.854661] R10: 03ff8000 R11: 80122c00 R12: 0020
[38179.854662] R13: ea0003ff8000 R14: 1600 R15: c90001a5fae0
[38179.854664] FS:  () GS:88843180() 
knlGS:
[38179.854665] CS:  0010 DS:  ES:  CR0: 80050033
[38179.854666] CR2: 7f87c26a5000 CR3: 00033154a006 CR4: 003626f0
[38179.854667] DR0:  DR1:  DR2: 
[38179.854667] DR3:  DR6: fffe0ff0 DR7: 0400


Is that a known issue? I currently have this with 5.1.5. Are there patches in
the queue that may be candidates to stabilize my wireless again?

Greetings
Marc


-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: Linux in KVM guest segfaults when hosts runs Linux 5.1

2019-05-27 Thread Marc Haber
On Tue, May 14, 2019 at 08:51:28AM +0200, Marc Haber wrote:
> On Mon, May 13, 2019 at 04:10:35PM +0200, Radim Krčmář wrote:
> > 2019-05-12 13:53+0200, Marc Haber:
> > > since updating my home desktop machine to kernel 5.1.1, KVM guests
> > > started on that machine segfault after booting:
> > [...]
> > > Any idea short of bisecting?
> > 
> > It has also been spotted by Borislav and the fix [1] should land in the
> > next kernel update, thanks for the report.
> > 1: https://patchwork.kernel.org/patch/10936271/
> 
> I can confirm that this patch fixes the segfaults for me.

And it is not yet in Linux 5.1.5.

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: Kernel 5.1 breaks UDP checksums for SIP packets

2019-05-20 Thread Marc Haber
On Mon, May 20, 2019 at 12:28:02PM +0200, Florian Westphal wrote:
> Marc Haber  wrote:
> > when I update my Firewall from Kernel 5.0 to Kernel 5.1, SIP clients
> > that connect from the internal network to an external, commercial SIP
> > service do not work any more. When I trace beyond the NAT, I see that
> > the outgoing SIP packets have incorrect UDP checksums:
> 
> I'm a moron.  Can you please try this patch?
> 
> diff --git a/net/netfilter/nf_nat_helper.c b/net/netfilter/nf_nat_helper.c
> --- a/net/netfilter/nf_nat_helper.c
> +++ b/net/netfilter/nf_nat_helper.c
> @@ -170,7 +170,7 @@ nf_nat_mangle_udp_packet(struct sk_buff *skb,
>   if (!udph->check && skb->ip_summed != CHECKSUM_PARTIAL)
>   return true;
>  
> - nf_nat_csum_recalc(skb, nf_ct_l3num(ct), IPPROTO_TCP,
> + nf_nat_csum_recalc(skb, nf_ct_l3num(ct), IPPROTO_UDP,
>  udph, &udph->check, datalen, oldlen);
>  
>   return true;

Thanks for the lightning fast reaction. The patch indeed fixes the issue
for me, everything is online now, incoming and outgoing calls are
possible. Can you funnel that one to Greg please for the next stable
release?
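
To illustrate why the protocol constant in the patch matters: TCP and UDP
checksums both cover a pseudo-header that includes the IP protocol number
(6 for TCP, 17 for UDP), so a UDP checksum recomputed with the TCP number
will not verify at the receiver. The following is a minimal Python sketch of
that arithmetic, not the kernel's implementation; the helper names, the
addresses and the payload are made up for illustration.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over `data` (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def l4_checksum(src: bytes, dst: bytes, proto: int, segment: bytes) -> int:
    """Checksum of an IPv4 pseudo-header plus the TCP/UDP segment."""
    pseudo = src + dst + struct.pack("!BBH", 0, proto, len(segment))
    return internet_checksum(pseudo + segment)


# Hypothetical UDP datagram: ports 5060 -> 5060, checksum field zeroed.
src = bytes([192, 0, 2, 1])
dst = bytes([198, 51, 100, 7])
seg = struct.pack("!HHHH", 5060, 5060, 8 + 4, 0) + b"PING"

good = l4_checksum(src, dst, 17, seg)  # IPPROTO_UDP
bad = l4_checksum(src, dst, 6, seg)    # IPPROTO_TCP, the pre-patch constant
```

Here `good` and `bad` differ only because of the protocol byte in the
pseudo-header, which matches the symptom of outgoing SIP packets carrying
incorrect UDP checksums.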

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Kernel 5.1 breaks UDP checksums for SIP packets

2019-05-20 Thread Marc Haber
ke things work again.

The obvious candidates are nf_conntrack_sip and nf_nat_sip. nf_nat_sip
didn't change between 5.0.13 and 5.1.3, and transplanting 5.0's
nf_conntrack_sip onto a 5.1.3 kernel didn't change 5.1.3's faulty
behavior.

Does anybody have an idea that I could try before bisecting 7074
revisions in roughly 13 steps?
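
The "roughly 13 steps" figure follows from halving the candidate range on
every test. A minimal sketch of that estimate, assuming a linear history
(git's own estimate can differ slightly because real history is a graph);
the function name is made up for illustration:

```python
def bisect_steps(revisions: int) -> int:
    """Worst-case number of kernels to build and test when each test
    halves the remaining range of `revisions` candidates."""
    if revisions < 1:
        raise ValueError("need at least one revision")
    # Smallest n with 2**n >= revisions, computed with integers only.
    return (revisions - 1).bit_length()


steps = bisect_steps(7074)  # 13, since 2**12 = 4096 < 7074 <= 8192 = 2**13
```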

Greetings
Marc


-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: Linux 5.1 on APU runs in circles with Call Traces

2019-05-14 Thread Marc Haber
On Sun, May 12, 2019 at 09:32:03PM +0200, Marc Haber wrote:
> I regret to inform you that I have now the third crippling issue in
> Linux 5.1, with the fourth one in the process of being isolated.

I had GPIOLIB missing in my kernel configuration. That was not
autodetected and resulted just in a bunch of kernel warnings scrolling
by in a second. make oldconfig allowed me to actually see the
warnings. Issue solved.
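
A quick way to catch this kind of thing before booting is to check the
symbol in the generated .config. A minimal sketch, assuming the usual
.config line formats ("CONFIG_FOO=y" and "# CONFIG_FOO is not set"); the
function name and the sample text are made up for illustration:

```python
def config_state(config_text: str, symbol: str) -> str:
    """Return the value of CONFIG_<symbol> ('y', 'm', ...) or 'unset'."""
    name = f"CONFIG_{symbol}"
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(f"{name}="):
            return line.split("=", 1)[1]
        if line == f"# {name} is not set":
            return "unset"
    return "unset"  # symbol not mentioned at all


# Hypothetical excerpt, not taken from the affected machine.
sample = """\
# CONFIG_GPIOLIB is not set
CONFIG_INPUT_POLLDEV=m
"""

gpiolib = config_state(sample, "GPIOLIB")
polldev = config_state(sample, "INPUT_POLLDEV")
```

With a real tree one would pass the contents of the kernel's .config file
instead of `sample`.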

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: Linux in KVM guest segfaults when hosts runs Linux 5.1

2019-05-14 Thread Marc Haber
On Mon, May 13, 2019 at 04:10:35PM +0200, Radim Krčmář wrote:
> 2019-05-12 13:53+0200, Marc Haber:
> > since updating my home desktop machine to kernel 5.1.1, KVM guests
> > started on that machine segfault after booting:
> [...]
> > Any idea short of bisecting?
> 
> It has also been spotted by Borislav and the fix [1] should land in the
> next kernel update, thanks for the report.
> 1: https://patchwork.kernel.org/patch/10936271/

I can confirm that this patch fixes the segfaults for me.

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Linux 5.1 on APU runs in circles with Call Traces

2019-05-12 Thread Marc Haber
Hi,

I regret to inform you that I have now the third crippling issue in
Linux 5.1, with the fourth one in the process of being isolated.

This time, it's a PC Engines APU2 running in circles right after
booting:
May 12 20:56:01 prom kernel: CPU: 2 PID: 657 Comm: kworker/2:2 Tainted: G   
 W 5.1.1-zgsrv20080 #5.1.1.20190511.0
May 12 20:56:01 prom kernel: Hardware name: PC Engines apu2/apu2, BIOS 88a4f96 
03/11/2016
May 12 20:56:01 prom kernel: Workqueue: events_freezable 
input_polled_device_work [input_polldev]
May 12 20:56:01 prom kernel: RIP: 
0010:gpio_keys_polled_check_state.isra.1+0xa/0x60 [gpio_keys_polled]
May 12 20:56:01 prom kernel: Code: 48 8b 17 48 8b 42 10 48 8b 40 20 48 85 c0 74 
09 48 8b 7a 08 e9 f7 fa 6e e1 c3 66 0f 1f 44 00 00 41 54 55 48 89 cd 53 48 89 
d3 <0f> 0b 8b 46 18 85 c0 74 20 8d 50 fe 83 fa 01 77 1d 8b 03 85 c0 74
May 12 20:56:01 prom kernel: RSP: 0018:c981fe20 EFLAGS: 00010286
May 12 20:56:01 prom kernel: RAX:  RBX: 888117ca6548 RCX: 
888117ca654c
May 12 20:56:01 prom kernel: RDX: 888117ca6548 RSI: a03e9040 RDI: 
888116043000
May 12 20:56:01 prom kernel: RBP: 888117ca654c R08: 0010 R09: 

May 12 20:56:01 prom kernel: R10: 8080808080808080 R11: 0018 R12: 
888117ca6538
May 12 20:56:01 prom kernel: R13: 0001 R14: 888117ca6530 R15: 
8881161ece40
May 12 20:56:01 prom kernel: FS:  () 
GS:88811ab0() knlGS:
May 12 20:56:01 prom kernel: CS:  0010 DS:  ES:  CR0: 80050033
May 12 20:56:01 prom kernel: CR2: 55572f98a251 CR3: 000117dd6000 CR4: 
000406e0
May 12 20:56:01 prom kernel: Call Trace:
May 12 20:56:01 prom kernel: gpio_keys_polled_poll+0xd0/0x240 [gpio_keys_polled]
May 12 20:56:01 prom kernel: ? __switch_to+0x171/0x410
May 12 20:56:01 prom kernel: ? finish_task_switch+0x6f/0x260
May 12 20:56:01 prom kernel: input_polled_device_work+0x11/0x20 [input_polldev]
May 12 20:56:01 prom kernel: process_one_work+0x171/0x300
May 12 20:56:01 prom kernel: worker_thread+0x2b/0x370
May 12 20:56:01 prom kernel: ? process_one_work+0x300/0x300
May 12 20:56:01 prom kernel: kthread+0x108/0x120
May 12 20:56:01 prom kernel: ? kthread_park+0x80/0x80
May 12 20:56:01 prom kernel: ret_from_fork+0x22/0x40
May 12 20:56:01 prom kernel: ---[ end trace 72a086f2949e1d45 ]---
May 12 20:56:01 prom kernel: leds-gpio leds-gpio: Skipping unavailable LED gpio 
0 (apu:green:1)
May 12 20:56:01 prom kernel: leds-gpio leds-gpio: Skipping unavailable LED gpio 
0 (apu:green:2)
May 12 20:56:01 prom kernel: leds-gpio leds-gpio: Skipping unavailable LED gpio 
0 (apu:green:3)
May 12 20:56:01 prom kernel: WARNING: CPU: 2 PID: 657 at 
include/linux/gpio/consumer.h:421 
gpio_keys_polled_check_state.isra.1+0xa/0x60 [gpio_keys_polled]
May 12 20:56:01 prom kernel: Modules linked in: 8021q crct10dif_pclmul(+) 
crc32_pclmul leds_gpio ghash_clmulni_intel pcengines_apuv2 
gpio_keys_polled aesni_intel input_polldev aes_x86_64 crypto_simd 
cryptd glue_helper fam15h_power k10temp input_leds sp5100_tco 
led_class sg ccp pcc_cpufreq acpi_cpufreq bridge stp llc 
ip_tables x_tables autofs4 ext4 mbcache usbhid jbd2 
dm_mod usb_storage sd_mod ehci_pci ehci_hcd xhci_pci xhci_hcd ahci 
crc32c_intel libahci igb i2c_algo_bit usbcore i2c_piix4 
ptp libata i2c_core usb_common pps_core hwmon

This thing repeats indefinitely at the speed the serial console is able
to print. Going back to 5.0.13 immediately fixes this.

Any idea short of bisecting? I am sorry, but I am running out of time
for kernel debugging this month.

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Linux in KVM guest segfaults when hosts runs Linux 5.1

2019-05-12 Thread Marc Haber
Hi,

since updating my home desktop machine to kernel 5.1.1, KVM guests
started on that machine segfault after booting:
general protection fault:  [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 13 Comm: kworker/0:1 Not tainted 5.0.13-zgsrv20080 
#5.0.13.20190505.0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Workqueue: events once_deferred
RIP: 0010:native_read_pmc+0x2/0x10
Code: e2 20 89 3e 48 09 d0 c3 89 f9 89 f0 0f 30 c3 66 0f 1f 84 00 00 00 00 00 
89 f0 89 f9 0f 30 31 c0 c3 0f 1f 80 00 00 00 00 89 f9 <0f> 33 48 c1 e2 20 48 09 
d0 c3 0f 1f 40 00 0f 20 c0 c3 66 66 2e 0f
RSP: 0018:8881b9a03e50 EFLAGS: 00010083
RAX: 0001 RBX: 8001 RCX: 
RDX: 002f RSI:  RDI: 
RBP: 8881b590e400 R08: 8881b590e400 R09: 0003
R10: e8c05440 R11:  R12: 8881b590e5d8
R13: 0010 R14: 8881b590e420 R15: e8c05400
FS:  () GS:8881b9a0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f9bcc5c61f8 CR3: 0001b6a24000 CR4: 06f0
Call Trace:
 
 x86_perf_event_update+0x3b/0x80
 x86_pmu_stop+0x84/0xa0
 x86_pmu_del+0x52/0x160
 event_sched_out.isra.59+0x95/0x190
 group_sched_out.part.61+0x51/0xc0
 ctx_sched_out+0xf2/0x220
 ctx_resched+0xb8/0xc0
 __perf_install_in_context+0x175/0x1f0
 remote_function+0x3e/0x50
 flush_smp_call_function_queue+0x30/0xe0
 smp_call_function_interrupt+0x2f/0x40
 call_function_single_interrupt+0xf/0x20
 
RIP: 0010:smp_call_function_many+0x1ca/0x230
Code: ee 89 c7 e8 e8 f5 51 00 3b 05 a6 23 db 00 0f 83 b2 fe ff ff 48 63 d0 48 
8b 0b 48 03 0c d5 20 28 db 81 8b 51 18 83 e2 01 74 0a  90 8b 51 18 83 e2 01 
75 f6 eb c8 48 c7 c2 60 17 e9 81 48 89 ee
RSP: 0018:c9cc3d48 EFLAGS: 0202 ORIG_RAX: ff04
RAX: 0001 RBX: 8881b9a21e00 RCX: 8881b9a647c0
RDX: 0001 RSI:  RDI: 8881b9a21e08
RBP: 8881b9a21e08 R08: 003e R09: 
R10: 81c04584 R11:  R12: 81027af0
R13:  R14: 0001 R15: 0008
 ? arch_unregister_cpu+0x20/0x20
 ? smp_call_function_many+0x1a8/0x230
 ? inet_ehashfn+0x29/0x100
 ? arch_unregister_cpu+0x20/0x20
 ? inet_ehashfn+0x2a/0x100
 smp_call_function+0x20/0x40
 on_each_cpu+0x18/0x70
 ? inet_ehashfn+0x29/0x100
 ? inet_ehashfn+0x2a/0x100
 text_poke_bp+0x8d/0xda
 __jump_label_transform+0x10d/0x120
 arch_jump_label_transform+0x21/0x30
 __jump_label_update+0x70/0xe0
 static_key_disable_cpuslocked+0x54/0x80
 static_key_disable+0x11/0x20
 once_deferred+0x1a/0x30
 process_one_work+0x171/0x300
 worker_thread+0x2b/0x370
 ? process_one_work+0x300/0x300
 kthread+0x108/0x120
 ? kthread_park+0x80/0x80
 ret_from_fork+0x22/0x40
Modules linked in: input_leds sg led_class virtio_balloon virtio_console 
qemu_fw_cfg dm_mod virtio_rng ip_tables x_tables autofs4 ext4 mbcache jbd2 
fscrypto sr_mod cdrom ata_generic virtio_net net_failover failover virtio_blk 
virtio_pci i2c_piix4 virtio_ring ata_piix virtio libata i2c_core floppy
---[ end trace 60c8d1a075894c8d ]---
RIP: 0010:native_read_pmc+0x2/0x10
Code: e2 20 89 3e 48 09 d0 c3 89 f9 89 f0 0f 30 c3 66 0f 1f 84 00 00 00 00 00 
89 f0 89 f9 0f 30 31 c0 c3 0f 1f 80 00 00 00 00 89 f9 <0f> 33 48 c1 e2 20 48 09 
d0 c3 0f 1f 40 00 0f 20 c0 c3 66 66 2e 0f
RSP: 0018:8881b9a03e50 EFLAGS: 00010083
RAX: 0001 RBX: 8001 RCX: 
RDX: 002f RSI:  RDI: 
RBP: 8881b590e400 R08: 8881b590e400 R09: 0003
R10: e8c05440 R11:  R12: 8881b590e5d8
R13: 0010 R14: 8881b590e420 R15: e8c05400
FS:  () GS:8881b9a0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f9bcc5c61f8 CR3: 0001b6a24000 CR4: 06f0
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

The host seems to be running fine, the KVM guest crash is reproducible.
Both host and guest are running Debian unstable with a locally built
kernel; the host runs 5.1.1, the guest 5.0.13. The crash also happens
when the host is running 5.1.0; going back to 5.0.13 with the host
allows the guest to finish bootup and run fine.

Please note that my kernel 5.1.1 image is not fully broken in KVM: I
have updated my APU machine, which runs firewall and other infrastructure
services, and the guests run fine there.

The machine in question is an older box with an AMD Phenom(tm) II X6
1090T Processor. I guess that the issue is related to the Phenom CPU.

Any idea short of bisecting?

Greetings
Marc

-- 
-----
Marc Haber | "I don'

Re: VMs freezing when host is running 4.14

2018-02-11 Thread Marc Haber
Hi,

after a total of nine weeks of bisecting, broken filesystems, and service
outages (thankfully on unimportant systems), 4.15 seems to have fixed the
issue. After going to 4.15, the crashes never happened again.

They have, however, happened with each and every 4.14 release I tried,
which I stopped doing with 4.14.15 on Jan 28.

This means, for me, that the issue is fixed and that I have just wasted
nine weeks of time.

For you, this means that you have a crippling, data-eating issue in the
current long-term release kernel. I do sincerely hope that I never have
to lay my eyes on any 4.14 kernel and hope that no major distribution
will release with this version.

Greetings
Marc


On Mon, Jan 08, 2018 at 10:10:25AM +0100, Marc Haber wrote:
> it's been five weeks since I gave you the last information about this
> issue. Alas, I don't have a solution yet, only reports:
> 
> - The bisect between 4.13 and 4.14 ended up on a one-character fix in a
>   comment, so that was a total waste.
> - The issue is present in all recent kernels up to 4.15-rc5, I didn't
>   try any newer 4.15 version yet.
> - 4.13-rc4 seems good
> - 4.13-rc5 is the earliest kernel that shows the issue. I am at a loss
>   to understand why a bug introduced during the 4.13 RC phase could
>   _not_ be present in the 4.13 release but reappear in 4.14. I didn't
>   try any 4.14 rc versions but suspect that those are all bad as well.
> 
> I will now start bisecting between 4.13-rc4 and 4.13-rc5, which is
> "roughly 7 steps"; a kernel is "good" if it survived at least 72 hours
> (as I found out that 24 hours might not be long enough).
> 
> I am still open to any suggestions that might help in identifying this
> issue which now affects five of my six systems that do KVM
> virtualization one way or the other. I have in the meantime experienced
> file system corruption and data loss (and do have backups).
> 
> Greetings
> Marc
> 
> On Fri, Dec 01, 2017 at 03:43:58PM +0100, Marc Haber wrote:
> > On Wed, Nov 22, 2017 at 04:04:42PM +0100, 王金浦 wrote:
> > > +cc kvm
> > > 
> > > 2017-11-22 10:39 GMT+01:00 Marc Haber <mh+linux-ker...@zugschlus.de>:
> > > > On Tue, Nov 21, 2017 at 05:18:21PM +0100, Marc Haber wrote:
> > > >> On the affected host, VMs freeze at a rate of about two or three per day.
> > > >> They just stop dead in their tracks, console and serial console become
> > > >> unresponsive, ping stops, they don't react to virsh shutdown, only to
> > > >> virsh destroy.
> > > >
> > > > I was able to obtain a log of a VM before it became unresponsive. here
> > > > we go:
> > > >
> > > > Nov 22 08:19:01 weave kernel: double fault:  [#1] PREEMPT SMP
> > > > Nov 22 08:19:01 weave kernel: Modules linked in: crct10dif_pclmul 
> > > > crc32_pclmul ghash_clmulni_intel pcbc sg aesni_intel aes_x86_64 
> > > > crypto_simd glue_helper cryptd input_leds virtio_balloon virtio_console 
> > > > led_class qemu_fw_cfg ip_tables x_tables autofs4 ext4 mbcache jbd2 
> > > > fscrypto usbhid sr_mod cdrom virtio_blk virtio_net ata_generic 
> > > > crc32c_intel ehci_pci ehci_hcd usbcore usb_common floppy i2c_piix4 
> > > > virtio_pci virtio_ring virtio ata_piix i2c_core libata
> > > > Nov 22 08:19:01 weave kernel: CPU: 1 PID: 8795 Comm: debsecan Not 
> > > > tainted 4.14.1-zgsrv20080 #3
> > > > Nov 22 08:19:01 weave kernel: Hardware name: QEMU Standard PC (i440FX + 
> > > > PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> > > > Nov 22 08:19:01 weave kernel: task: 88001ef0adc0 task.stack: 
> > > > c91fc000
> > > > Nov 22 08:19:01 weave kernel: RIP: 
> > > > 0010:kvm_async_pf_task_wait+0x167/0x200
> > > > Nov 22 08:19:01 weave kernel: RSP: :c91ffa10 EFLAGS: 
> > > > 0202
> > > > Nov 22 08:19:01 weave kernel: RAX: 88001fd11cc0 RBX: 
> > > > c91ffa30 RCX: 0002
> > > > Nov 22 08:19:01 weave kernel: RDX: 0140 RSI: 
> > > > 8173514b RDI: 819bdd80
> > > > Nov 22 08:19:01 weave kernel: RBP: c91ffaa0 R08: 
> > > > 00193fc0 R09: 8800
> > > > Nov 22 08:19:01 weave kernel: R10: c91ffac0 R11: 
> > > >  R12: c91ffa40
> > > > Nov 22 08:19:01 weave kernel: R13: 0be8 R14: 
> > > > 819bdd80 R15: ea193f80
> > > > Nov 22 08:19:01 weave kernel: FS:  7f97e25dd700() 
> > > > GS:88001fd0() knlGS:
> >

Re: VMs freezing when host is running 4.14

2018-01-08 Thread Marc Haber
Hi,

it's been five weeks since I gave you the last information about this
issue. Alas, I don't have a solution yet, only reports:

- The bisect between 4.13 and 4.14 ended up on a one-character fix in a
  comment, so that was a total waste.
- The issue is present in all recent kernels up to 4.15-rc5, I didn't
  try any newer 4.15 version yet.
- 4.13-rc4 seems good
- 4.13-rc5 is the earliest kernel that shows the issue. I am at a loss
  to understand why a bug introduced during the 4.13 RC phase could
  _not_ be present in the 4.13 release but reappear in 4.14. I didn't
  try any 4.14 rc versions but suspect that those are all bad as well.

I will now start bisecting between 4.13-rc4 and 4.13-rc5, which is
"roughly 7 steps"; a kernel is "good" if it survived at least 72 hours
(as I found out that 24 hours might not be long enough).
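
For scale, a rough sketch of the time budget this implies. It assumes the
worst case, in which every step has to survive the full 72 hours before it
can be called good (a bad kernel would end its step earlier); the function
name is made up for illustration:

```python
def worst_case_days(steps: int, hours_per_good_run: int) -> float:
    """Upper bound on bisect duration if every step runs the full soak time."""
    return steps * hours_per_good_run / 24


budget = worst_case_days(7, 72)  # 7 * 72 h = 504 h = 21.0 days
```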

I am still open to any suggestions that might help in identifying this
issue which now affects five of my six systems that do KVM
virtualization one way or the other. I have in the meantime experienced
file system corruption and data loss (and do have backups).

Greetings
Marc

On Fri, Dec 01, 2017 at 03:43:58PM +0100, Marc Haber wrote:
> On Wed, Nov 22, 2017 at 04:04:42PM +0100, 王金浦 wrote:
> > +cc kvm
> > 
> > 2017-11-22 10:39 GMT+01:00 Marc Haber <mh+linux-ker...@zugschlus.de>:
> > > On Tue, Nov 21, 2017 at 05:18:21PM +0100, Marc Haber wrote:
> > >> On the affected host, VMs freeze at a rate of about two or three per day.
> > >> They just stop dead in their tracks, console and serial console become
> > >> unresponsive, ping stops, they don't react to virsh shutdown, only to
> > >> virsh destroy.
> > >
> > > I was able to obtain a log of a VM before it became unresponsive. here
> > > we go:
> > >
> > > Nov 22 08:19:01 weave kernel: double fault:  [#1] PREEMPT SMP
> > > Nov 22 08:19:01 weave kernel: Modules linked in: crct10dif_pclmul 
> > > crc32_pclmul ghash_clmulni_intel pcbc sg aesni_intel aes_x86_64 
> > > crypto_simd glue_helper cryptd input_leds virtio_balloon virtio_console 
> > > led_class qemu_fw_cfg ip_tables x_tables autofs4 ext4 mbcache jbd2 
> > > fscrypto usbhid sr_mod cdrom virtio_blk virtio_net ata_generic 
> > > crc32c_intel ehci_pci ehci_hcd usbcore usb_common floppy i2c_piix4 
> > > virtio_pci virtio_ring virtio ata_piix i2c_core libata
> > > Nov 22 08:19:01 weave kernel: CPU: 1 PID: 8795 Comm: debsecan Not tainted 
> > > 4.14.1-zgsrv20080 #3
> > > Nov 22 08:19:01 weave kernel: Hardware name: QEMU Standard PC (i440FX + 
> > > PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> > > Nov 22 08:19:01 weave kernel: task: 88001ef0adc0 task.stack: 
> > > c91fc000
> > > Nov 22 08:19:01 weave kernel: RIP: 0010:kvm_async_pf_task_wait+0x167/0x200
> > > Nov 22 08:19:01 weave kernel: RSP: :c91ffa10 EFLAGS: 0202
> > > Nov 22 08:19:01 weave kernel: RAX: 88001fd11cc0 RBX: c91ffa30 
> > > RCX: 0002
> > > Nov 22 08:19:01 weave kernel: RDX: 0140 RSI: 8173514b 
> > > RDI: 819bdd80
> > > Nov 22 08:19:01 weave kernel: RBP: c91ffaa0 R08: 00193fc0 
> > > R09: 8800
> > > Nov 22 08:19:01 weave kernel: R10: c91ffac0 R11:  
> > > R12: c91ffa40
> > > Nov 22 08:19:01 weave kernel: R13: 0be8 R14: 819bdd80 
> > > R15: ea193f80
> > > Nov 22 08:19:01 weave kernel: FS:  7f97e25dd700() 
> > > GS:88001fd0() knlGS:
> > > Nov 22 08:19:01 weave kernel: CS:  0010 DS:  ES:  CR0: 
> > > 80050033
> > > Nov 22 08:19:01 weave kernel: CR2: 00483001 CR3: 15df7000 
> > > CR4: 000406e0
> > > Nov 22 08:19:01 weave kernel: Call Trace:
> > > Nov 22 08:19:01 weave kernel: do_async_page_fault+0x6b/0x70
> > > Nov 22 08:19:01 weave kernel: ? do_async_page_fault+0x6b/0x70
> > > Nov 22 08:19:01 weave kernel: async_page_fault+0x22/0x30
> > > Nov 22 08:19:01 weave kernel: RIP: 0010:clear_page_rep+0x7/0x10
> > > Nov 22 08:19:01 weave kernel: RSP: :c91ffb88 EFLAGS: 00010246
> > > Nov 22 08:19:01 weave kernel: RAX:  RBX: 0004 
> > > RCX: 0200
> > > Nov 22 08:19:01 weave kernel: RDX: 88001ef0adc0 RSI: 00193f80 
> > > RDI: 8800064fe000
> > > Nov 22 08:19:01 weave kernel: RBP: c91ffc50 R08: 00193fc0 
> > > R09: 8800
> > > Nov 22 08:19:01 weave kernel: R10: 00

Re: VMs freezing when host is running 4.14

2018-01-08 Thread Marc Haber
Hi,

it's been five weeks since I gave you the last information about this
issue. Alas, I don't have a solution yet, only reports:

- The bisect between 4.13 and 4.14 ended up on a one-character fix in a
  comment, so that was a total waste.
- The issue is present in all recent kernels up to 4.15-rc5, I didn't
  try any newer 4.15 version yet.
- 4.13-rc4 seems good
- 4.13-rc5 is the earliest kernel that shows the issue. I am at a loss
  to understand why a bug introduced during the 4.13 RC phase could
  _not_ be present in the 4.13 release but reappear in 4.14. I didn't
  try any 4.14 rc versions but suspect that those are all bad as well.

I will now start bisecting between 4.13-rc4 and 4.13-rc5, which is
"roughly 7 steps"; a kernel is "good" if it survived at least 72 hours
(as I found out that 24 hours might not be long enough).

I am still open to any suggestions that might help in identifying this
issue which now affects five of my six systems that do KVM
virtualization one way or the other. I have in the meantime experienced
file system corruption and data loss (and do have backups).

Greetings
Marc

On Fri, Dec 01, 2017 at 03:43:58PM +0100, Marc Haber wrote:
> On Wed, Nov 22, 2017 at 04:04:42PM +0100, 王金浦 wrote:
> > +cc kvm
> > 
> > 2017-11-22 10:39 GMT+01:00 Marc Haber :
> > > On Tue, Nov 21, 2017 at 05:18:21PM +0100, Marc Haber wrote:
> > >> On the affected host, VMs freeze at a rate about two or three per day.
> > >> They just stop dead in their tracks, console and serial console become
> > >> unresponsive, ping stops, they don't react to virsh shutdown, only to
> > >> virsh destroy.
> > >
> > > I was able to obtain a log of a VM before it became unresponsive. here
> > > we go:
> > >
> > > Nov 22 08:19:01 weave kernel: double fault:  [#1] PREEMPT SMP
> > > Nov 22 08:19:01 weave kernel: Modules linked in: crct10dif_pclmul 
> > > crc32_pclmul ghash_clmulni_intel pcbc sg aesni_intel aes_x86_64 
> > > crypto_simd glue_helper cryptd input_leds virtio_balloon virtio_console 
> > > led_class qemu_fw_cfg ip_tables x_tables autofs4 ext4 mbcache jbd2 
> > > fscrypto usbhid sr_mod cdrom virtio_blk virtio_net ata_generic 
> > > crc32c_intel ehci_pci ehci_hcd usbcore usb_common floppy i2c_piix4 
> > > virtio_pci virtio_ring virtio ata_piix i2c_core libata
> > > Nov 22 08:19:01 weave kernel: CPU: 1 PID: 8795 Comm: debsecan Not tainted 
> > > 4.14.1-zgsrv20080 #3
> > > Nov 22 08:19:01 weave kernel: Hardware name: QEMU Standard PC (i440FX + 
> > > PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> > > Nov 22 08:19:01 weave kernel: task: 88001ef0adc0 task.stack: 
> > > c91fc000
> > > Nov 22 08:19:01 weave kernel: RIP: 0010:kvm_async_pf_task_wait+0x167/0x200
> > > Nov 22 08:19:01 weave kernel: RSP: :c91ffa10 EFLAGS: 0202
> > > Nov 22 08:19:01 weave kernel: RAX: 88001fd11cc0 RBX: c91ffa30 
> > > RCX: 0002
> > > Nov 22 08:19:01 weave kernel: RDX: 0140 RSI: 8173514b 
> > > RDI: 819bdd80
> > > Nov 22 08:19:01 weave kernel: RBP: c91ffaa0 R08: 00193fc0 
> > > R09: 8800
> > > Nov 22 08:19:01 weave kernel: R10: c91ffac0 R11:  
> > > R12: c91ffa40
> > > Nov 22 08:19:01 weave kernel: R13: 0be8 R14: 819bdd80 
> > > R15: ea193f80
> > > Nov 22 08:19:01 weave kernel: FS:  7f97e25dd700() 
> > > GS:88001fd0() knlGS:
> > > Nov 22 08:19:01 weave kernel: CS:  0010 DS:  ES:  CR0: 
> > > 80050033
> > > Nov 22 08:19:01 weave kernel: CR2: 00483001 CR3: 15df7000 
> > > CR4: 000406e0
> > > Nov 22 08:19:01 weave kernel: Call Trace:
> > > Nov 22 08:19:01 weave kernel: do_async_page_fault+0x6b/0x70
> > > Nov 22 08:19:01 weave kernel: ? do_async_page_fault+0x6b/0x70
> > > Nov 22 08:19:01 weave kernel: async_page_fault+0x22/0x30
> > > Nov 22 08:19:01 weave kernel: RIP: 0010:clear_page_rep+0x7/0x10
> > > Nov 22 08:19:01 weave kernel: RSP: :c91ffb88 EFLAGS: 00010246
> > > Nov 22 08:19:01 weave kernel: RAX:  RBX: 0004 
> > > RCX: 0200
> > > Nov 22 08:19:01 weave kernel: RDX: 88001ef0adc0 RSI: 00193f80 
> > > RDI: 8800064fe000
> > > Nov 22 08:19:01 weave kernel: RBP: c91ffc50 R08: 00193fc0 
> > > R09: 8800
> > > Nov 22 08:19:01 weave kernel: R10:  R11: 00

Re: VMs freezing when host is running 4.14

2017-12-01 Thread Marc Haber
4.14.3 is still affected.

I am still bisecting between 4.13 and 4.14, 5 steps to go. I am defining a
kernel as "good" if it survives 24 hours on the hosts.

Greetings
Marc


On Wed, Nov 22, 2017 at 04:04:42PM +0100, 王金浦 wrote:
> From: 王金浦 <jinpuw...@gmail.com>
> Subject: Re: VMs freezing when host is running 4.14
> To: Marc Haber <mh+linux-ker...@zugschlus.de>
> Cc: LKML <linux-kernel@vger.kernel.org>, "KVM-ML (k...@vger.kernel.org)"
>  <k...@vger.kernel.org>
> Date:   Wed, 22 Nov 2017 16:04:42 +0100
> 
> +cc kvm
> 
> 2017-11-22 10:39 GMT+01:00 Marc Haber <mh+linux-ker...@zugschlus.de>:
> > On Tue, Nov 21, 2017 at 05:18:21PM +0100, Marc Haber wrote:
> >> On the affected host, VMs freeze at a rate about two or three per day.
> >> They just stop dead in their tracks, console and serial console become
> >> unresponsive, ping stops, they don't react to virsh shutdown, only to
> >> virsh destroy.
> >
> > I was able to obtain a log of a VM before it became unresponsive. here
> > we go:
> >
> > Nov 22 08:19:01 weave kernel: double fault:  [#1] PREEMPT SMP
> > Nov 22 08:19:01 weave kernel: Modules linked in: crct10dif_pclmul 
> > crc32_pclmul ghash_clmulni_intel pcbc sg aesni_intel aes_x86_64 crypto_simd 
> > glue_helper cryptd input_leds virtio_balloon virtio_console led_class 
> > qemu_fw_cfg ip_tables x_tables autofs4 ext4 mbcache jbd2 fscrypto usbhid 
> > sr_mod cdrom virtio_blk virtio_net ata_generic crc32c_intel ehci_pci 
> > ehci_hcd usbcore usb_common floppy i2c_piix4 virtio_pci virtio_ring virtio 
> > ata_piix i2c_core libata
> > Nov 22 08:19:01 weave kernel: CPU: 1 PID: 8795 Comm: debsecan Not tainted 
> > 4.14.1-zgsrv20080 #3
> > Nov 22 08:19:01 weave kernel: Hardware name: QEMU Standard PC (i440FX + 
> > PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> > Nov 22 08:19:01 weave kernel: task: 88001ef0adc0 task.stack: 
> > c91fc000
> > Nov 22 08:19:01 weave kernel: RIP: 0010:kvm_async_pf_task_wait+0x167/0x200
> > Nov 22 08:19:01 weave kernel: RSP: :c91ffa10 EFLAGS: 0202
> > Nov 22 08:19:01 weave kernel: RAX: 88001fd11cc0 RBX: c91ffa30 
> > RCX: 0002
> > Nov 22 08:19:01 weave kernel: RDX: 0140 RSI: 8173514b 
> > RDI: 819bdd80
> > Nov 22 08:19:01 weave kernel: RBP: c91ffaa0 R08: 00193fc0 
> > R09: 8800
> > Nov 22 08:19:01 weave kernel: R10: c91ffac0 R11:  
> > R12: c91ffa40
> > Nov 22 08:19:01 weave kernel: R13: 0be8 R14: 819bdd80 
> > R15: ea193f80
> > Nov 22 08:19:01 weave kernel: FS:  7f97e25dd700() 
> > GS:88001fd0() knlGS:
> > Nov 22 08:19:01 weave kernel: CS:  0010 DS:  ES:  CR0: 
> > 80050033
> > Nov 22 08:19:01 weave kernel: CR2: 00483001 CR3: 15df7000 
> > CR4: 000406e0
> > Nov 22 08:19:01 weave kernel: Call Trace:
> > Nov 22 08:19:01 weave kernel: do_async_page_fault+0x6b/0x70
> > Nov 22 08:19:01 weave kernel: ? do_async_page_fault+0x6b/0x70
> > Nov 22 08:19:01 weave kernel: async_page_fault+0x22/0x30
> > N

Re: VMs freezing when host is running 4.14

2017-11-24 Thread Marc Haber
On Thu, Nov 23, 2017 at 06:26:36PM +0200, Liran Alon wrote:
> If there is no nested guest so no. My fix here probably won't help.

I can confirm that I am not running nested virt, the host is running
directly on the APU. I also have three other machines that are running
flawlessly with 4.14, and another virtualization host, a "real" server
with a somewhat dated AMD Opteron 1389 that has the same issue. The
machine that first showed the issue is Intel, so this is not a
vendor-specific issue.

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: VMs freezing when host is running 4.14

2017-11-23 Thread Marc Haber
On Wed, Nov 22, 2017 at 05:43:13PM +0100, Radim Krčmář wrote:
> 2017-11-22 16:52+0100, Marc Haber:
> > On Wed, Nov 22, 2017 at 04:04:42PM +0100, 王金浦 wrote:
> > > So all guest kernels are 4.14, or also other older kernel?
> > 
> > Guest kernels are also 4.14, but the issue disappears when the host is
> > downgraded to an older kernel. I therefore reckoned that the guest
> > kernel doesn't matter, but that was before I saw the trace in the log.
> 
> The two most suspicious patches since 4.13 (which I assume works) are
> 
>   664f8e26b00c ("KVM: X86: Fix loss of exception which has not yet been
>   injected")

That one does not revert cleanly; the line in question seems to have
been removed a bit later.

Reject is:
141 [24/5001]mh@fan:~/linux/git/linux ((v4.14.1) %) $ cat arch/x86/kvm/vmx.c.rej
--- arch/x86/kvm/vmx.c
+++ arch/x86/kvm/vmx.c
@@ -2516,7 +2516,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
    struct vcpu_vmx *vmx = to_vmx(vcpu);
    unsigned nr = vcpu->arch.exception.nr;
    bool has_error_code = vcpu->arch.exception.has_error_code;
-   bool reinject = vcpu->arch.exception.injected;
+   bool reinject = vcpu->arch.exception.reinject;
    u32 error_code = vcpu->arch.exception.error_code;
    u32 intr_info = nr | INTR_INFO_VALID_MASK;

> 
> and
> 
>   9a6e7c39810e ("KVM: async_pf: Fix #DF due to inject "Page not Present"
>   and "Page Ready" exceptions simultaneously")
> 
> please try reverting them to see if it helps,

That one reverted cleanly. I am now running the new kernel on the
affected machine, and I think that a second machine has joined the
ranks of the affected.

Would this matter on the host only or on the guests as well?

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: VMs freezing when host is running 4.14

2017-11-22 Thread Marc Haber
On Wed, Nov 22, 2017 at 04:04:42PM +0100, 王金浦 wrote:
> So all guest kernels are 4.14, or also other older kernel?

Guest kernels are also 4.14, but the issue disappears when the host is
downgraded to an older kernel. I therefore reckoned that the guest
kernel doesn't matter, but that was before I saw the trace in the log.

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: VMs freezing when host is running 4.14

2017-11-22 Thread Marc Haber
On Tue, Nov 21, 2017 at 05:18:21PM +0100, Marc Haber wrote:
> On the affected host, VMs freeze at a rate about two or three per day.
> They just stop dead in their tracks, console and serial console become
> unresponsive, ping stops, they don't react to virsh shutdown, only to
> virsh destroy.

I was able to obtain a log of a VM before it became unresponsive. Here
we go:

Nov 22 08:19:01 weave kernel: double fault:  [#1] PREEMPT SMP
Nov 22 08:19:01 weave kernel: Modules linked in: crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel pcbc sg aesni_intel aes_x86_64 crypto_simd glue_helper 
cryptd input_leds virtio_balloon virtio_console led_class qemu_fw_cfg ip_tables 
x_tables autofs4 ext4 mbcache jbd2 fscrypto usbhid sr_mod cdrom virtio_blk 
virtio_net ata_generic crc32c_intel ehci_pci ehci_hcd usbcore usb_common floppy 
i2c_piix4 virtio_pci virtio_ring virtio ata_piix i2c_core libata
Nov 22 08:19:01 weave kernel: CPU: 1 PID: 8795 Comm: debsecan Not tainted 
4.14.1-zgsrv20080 #3
Nov 22 08:19:01 weave kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 
1996), BIOS 1.10.2-1 04/01/2014
Nov 22 08:19:01 weave kernel: task: 88001ef0adc0 task.stack: 
c91fc000
Nov 22 08:19:01 weave kernel: RIP: 0010:kvm_async_pf_task_wait+0x167/0x200
Nov 22 08:19:01 weave kernel: RSP: :c91ffa10 EFLAGS: 0202
Nov 22 08:19:01 weave kernel: RAX: 88001fd11cc0 RBX: c91ffa30 RCX: 
0002
Nov 22 08:19:01 weave kernel: RDX: 0140 RSI: 8173514b RDI: 
819bdd80
Nov 22 08:19:01 weave kernel: RBP: c91ffaa0 R08: 00193fc0 R09: 
8800
Nov 22 08:19:01 weave kernel: R10: c91ffac0 R11:  R12: 
c91ffa40
Nov 22 08:19:01 weave kernel: R13: 0be8 R14: 819bdd80 R15: 
ea193f80
Nov 22 08:19:01 weave kernel: FS:  7f97e25dd700() 
GS:88001fd0() knlGS:
Nov 22 08:19:01 weave kernel: CS:  0010 DS:  ES:  CR0: 80050033
Nov 22 08:19:01 weave kernel: CR2: 00483001 CR3: 15df7000 CR4: 
000406e0
Nov 22 08:19:01 weave kernel: Call Trace:
Nov 22 08:19:01 weave kernel: do_async_page_fault+0x6b/0x70
Nov 22 08:19:01 weave kernel: ? do_async_page_fault+0x6b/0x70
Nov 22 08:19:01 weave kernel: async_page_fault+0x22/0x30
Nov 22 08:19:01 weave kernel: RIP: 0010:clear_page_rep+0x7/0x10
Nov 22 08:19:01 weave kernel: RSP: :c91ffb88 EFLAGS: 00010246
Nov 22 08:19:01 weave kernel: RAX:  RBX: 0004 RCX: 
0200
Nov 22 08:19:01 weave kernel: RDX: 88001ef0adc0 RSI: 00193f80 RDI: 
8800064fe000
Nov 22 08:19:01 weave kernel: RBP: c91ffc50 R08: 00193fc0 R09: 
8800
Nov 22 08:19:01 weave kernel: R10:  R11:  R12: 
0020
Nov 22 08:19:01 weave kernel: R13: 88001ffd5500 R14: c91ffce8 R15: 
ea193f80
Nov 22 08:19:01 weave kernel: ? get_page_from_freelist+0x8c3/0xaf0
Nov 22 08:19:01 weave kernel: ? __mem_cgroup_threshold+0x8a/0x130
Nov 22 08:19:01 weave kernel: ? free_pcppages_bulk+0x3f6/0x410
Nov 22 08:19:01 weave kernel: __alloc_pages_nodemask+0xe4/0xe20
Nov 22 08:19:01 weave kernel: ? free_hot_cold_page_list+0x2b/0x50
Nov 22 08:19:01 weave kernel: ? release_pages+0x2b7/0x360
Nov 22 08:19:01 weave kernel: ? mem_cgroup_commit_charge+0x7a/0x520
Nov 22 08:19:01 weave kernel: ? account_entity_enqueue+0x95/0xc0
Nov 22 08:19:01 weave kernel: alloc_pages_vma+0x7f/0x1e0
Nov 22 08:19:01 weave kernel: __handle_mm_fault+0x9cb/0xf20
Nov 22 08:19:01 weave kernel: handle_mm_fault+0xb2/0x1f0
Nov 22 08:19:01 weave kernel: __do_page_fault+0x1f2/0x440
Nov 22 08:19:01 weave kernel: do_page_fault+0x22/0x30
Nov 22 08:19:01 weave kernel: do_async_page_fault+0x4c/0x70
Nov 22 08:19:01 weave kernel: async_page_fault+0x22/0x30
Nov 22 08:19:01 weave kernel: RIP: 0033:0x56434ef679d8
Nov 22 08:19:01 weave kernel: RSP: 002b:7ffd6b48ad80 EFLAGS: 00010206
Nov 22 08:19:01 weave kernel: RAX: 00eb RBX: 001d RCX: 
aaab
Nov 22 08:19:01 weave kernel: RDX: 56434f5eb300 RSI: 000f RDI: 
56434f3ca6c0
Nov 22 08:19:01 weave kernel: RBP: 00ec R08: 7f97e2453000 R09: 
56434f5eb3ea
Nov 22 08:19:01 weave kernel: R10: 56434f5eb3eb R11: 56434f4510a0 R12: 
003a
Nov 22 08:19:01 weave kernel: R13: 56434f3ca500 R14: 56434f451240 R15: 
7f97e1024750
Nov 22 08:19:01 weave kernel: Code: f7 49 89 9d a0 d1 9b 81 48 89 55 98 4c 8d 
63 10 e8 4f 02 53 00 eb 20 48 83 7d 98 00 74 3a e8 21 6e 06 00 80 7d c0 00 74 
3f fb f4  66 66 90 66 66 90 e8 7d 6f 06 00 80 7d c0 00 75 da 48 8d b5
Nov 22 08:19:01 weave kernel: RIP: kvm_async_pf_task_wait+0x167/0x200 RSP: 
c91ffa10
Nov 22 08:19:01 weave kernel: ---[ end trace 4701012ee256be25 ]---
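
For anyone who wants to compare traces from several frozen VMs, a minimal sketch of pulling the call-trace frames out of syslog lines like the ones above. The regular expression and the names `FRAME`, `parse_frames` and `SAMPLE` are illustrative assumptions, not an existing tool; the sample lines are copied from this log.

```python
import re

# Matches "symbol+0xoff/0xsize" right after the "kernel: " syslog prefix.
# A leading "? " marks a frame the unwinder found on the stack but could
# not verify, so it is reported here as unreliable.
FRAME = re.compile(
    r"kernel: (\? )?([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)$"
)


def parse_frames(lines):
    """Return (symbol, offset, reliable) for each call-trace frame line.

    Register dumps, "RIP:" lines and the "Call Trace:" header do not
    match the pattern and are skipped.
    """
    frames = []
    for line in lines:
        m = FRAME.search(line.strip())
        if m:
            frames.append((m.group(2), int(m.group(3), 16), m.group(1) is None))
    return frames


SAMPLE = """\
Nov 22 08:19:01 weave kernel: Call Trace:
Nov 22 08:19:01 weave kernel: do_async_page_fault+0x6b/0x70
Nov 22 08:19:01 weave kernel: ? do_async_page_fault+0x6b/0x70
Nov 22 08:19:01 weave kernel: async_page_fault+0x22/0x30
Nov 22 08:19:01 weave kernel: RIP: 0010:clear_page_rep+0x7/0x10
""".splitlines()

if __name__ == "__main__":
    for symbol, offset, reliable in parse_frames(SAMPLE):
        print(f"{symbol}+{offset:#x}", "" if reliable else "(unreliable)")
```

The symbol+offset pairs can then be handed to a tool such as scripts/faddr2line from the kernel tree, together with the matching vmlinux, to get file and line numbers.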

Does that help?

Greetings
Marc

-- 
--

VMs freezing when host is running 4.14

2017-11-21 Thread Marc Haber
Hi,

I am running Debian stable with home-built kernels on a number of KVM
hosts and a bigger number of KVM VMs. With 4.14, I have an interesting
phenomenon on _one_ of my hosts, while all other hosts run fine. All
systems are reasonably similar to each other.

On the affected host, VMs freeze at a rate of about two or three per day.
They just stop dead in their tracks: console and serial console become
unresponsive, ping stops, and they don't react to virsh shutdown, only to
virsh destroy. They do, however, take a noticeable part of CPU resources
when they're in this state, up to a full CPU core. What's left in the
syslog of a VM is unsuspicious, and the host logs don't show anything
unusual. When I start a VM that allocates a lot of memory (like the 8
GB Windows VM that I use for bookkeeping and taxes), it happens that two
to five of the Linux VMs freeze in the same second.

The affected host is a Thinkpad T520 with 16 Gig of RAM and an Intel(R)
Core(TM) i5-2520M that is neither under noticeable load nor under memory
pressure (it can afford 8 Gig of disk cache).

When I boot the host back to 4.13.11, things are just fine and the
machine is chugging away painlessly for days. When I boot 4.14, the VM
freeze phenomenon usually appears in the first 24 hours.

The other VM hosts (ranging from a small PC Engines APU to a bigger 48
Gig Server in Housing) run just fine with 4.14 as well.

Any ideas?

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: 319554f284dd ("inet: don't use sk_v6_rcv_saddr directly") causes bind port regression

2017-11-13 Thread Marc Haber
Hi,

those four patches never made it into any 4.13 release.

0001-net-call-sk_reuseport_match-if-we-are-a-reusesock.patch
0001-net-don-t-fast-patch-mismatched-sockets-in-STRICT-mo.patch
0001-net-use-inet6_rcv_saddr-to-compare-sockets.patch
0001-net-set-tb-fast_sk_family.patch

And I have just seen that the first two are not even in 4.14. What does
that mean for libvirt users on systems running a 4.14 kernel?

The third and fourth patch
(0001-net-use-inet6_rcv_saddr-to-compare-sockets.patch and
0001-net-set-tb-fast_sk_family.patch) seem to be in 4.14.

Greetings
Marc

On Mon, Sep 18, 2017 at 10:02:32AM +0200, Marc Haber wrote:
> On Sun, Sep 17, 2017 at 09:17:13AM -0400, Cole Robinson wrote:
> > On 09/15/2017 01:51 PM, Josef Bacik wrote:
> > > Finally got access to a box to run this down myself.  This patch on top 
> > > of the other patches fixes the problem for me, could you verify it works 
> > > for you?  Thanks,
> > > 
> > 
> > Yup I can confirm that patch fixes things when applied on top of the
> > previous 3 patches. Thanks! Please tag those patches for stable releases
> > if appropriate, this is affecting a decent amount of libvirt users
> 
> I can also confirm that these four patches fix things for me (on
> Debian) as well. Thanks!
> 
> I would love to have this in one of Greg's next 4.13 releases.

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: 319554f284dd ("inet: don't use sk_v6_rcv_saddr directly") causes bind port regression

2017-09-18 Thread Marc Haber
On Sun, Sep 17, 2017 at 09:17:13AM -0400, Cole Robinson wrote:
> On 09/15/2017 01:51 PM, Josef Bacik wrote:
> > Finally got access to a box to run this down myself.  This patch on top of 
> > the other patches fixes the problem for me, could you verify it works for 
> > you?  Thanks,
> > 
> 
> Yup I can confirm that patch fixes things when applied on top of the
> previous 3 patches. Thanks! Please tag those patches for stable releases
> if appropriate, this is affecting a decent amount of libvirt users

I can also confirm that these four patches fix things for me (on
Debian) as well. Thanks!

I would love to have this in one of Greg's next 4.13 releases.

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: USB broken on Banana Pi in Linux 4.6 [solved]

2016-06-14 Thread Marc Haber
Hi,

On Mon, May 30, 2016 at 09:02:54PM +0200, Marc Haber wrote:
> on my Bananapis, in kernel 4.6 USB does not work. Kernel configuration
> is USB-wise identical to 4.5 (grepped for differences in (hci|usb)),
> and in 4.6 there is not even /dev/bus/usb.

This turned out to be a configuration issue. 4.6 kernels on Banana Pi
need CONFIG_AXP20X_POWER for working USB. If that driver is missing,
one gets a silent fail.
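
Because the failure is silent, it may help to check the built config for the option before booting. The following is a minimal sketch; the helper name config_enabled and the example path are assumptions for illustration and were not posted in this thread:

```shell
#!/bin/sh
# config_enabled FILE OPTION
# Succeeds if OPTION is built in (=y) or built as a module (=m) in FILE.
# Lines such as "# CONFIG_FOO is not set" do not count as enabled.
config_enabled() {
    grep -Eq "^$2=(y|m)\$" "$1"
}

# Hypothetical usage; the path depends on how the kernel was installed:
#   config_enabled /boot/config-4.6.0-zgbpi-armmp-lpae CONFIG_AXP20X_POWER \
#       || echo "CONFIG_AXP20X_POWER is missing" >&2
```

The check only looks at the config file, so it says nothing about whether the driver actually probes on the board.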

Thanks for all your help.

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: USB broken on Banana Pi in Linux 4.6

2016-06-12 Thread Marc Haber
On Sat, Jun 11, 2016 at 02:55:04PM +0200, Marc Haber wrote:
> On Tue, Jun 07, 2016 at 10:30:17AM -0700, Greg KH wrote:
> > Nothing obvious, can you use 'git bisect' to go from 4.5.0 to 4.6.0 to
> > find the offending commit?
> 
> I can. The first round of bisecting let me end up with
> c8b710b3e4348119924051551b836c94835331b1 as the first bad commit,
> which is wrong, since c8b710b3e4348119924051551b836c94835331b1^ is bad
> as well. I am not sure whether things went well since I had to use git
> bisect skip twice because the resulting kernel wouldn't boot on the pi.

The kernel panic on boot is caused by bugs in the parport part. I
worked around these by disabling PARPORT in the kernel configuration.

However, a weekend of bisecting just sent me back to commit
d85ce830eef6c10d1e9617172dea4681f02b8424, which is a purely cosmetic
commit. What totally confuses me is the sheer size of the diff.

[8/506]mh@fan:~/linux/debug/linux.bad$ less .git/BISECT_LOG
[9/507]mh@fan:~/linux/debug/linux.bad$ git log v4.5..d85ce830eef6c10d1e9617172dea4681f02b8424 | cat
d85ce830eef6c10d1e9617172dea4681f02b8424 perf pmu: Fix misleadingly indented assignment (whitespace)
[10/508]mh@fan:~/linux/debug/linux.bad$ git diff v4.5..d85ce830eef6c10d1e9617172dea4681f02b8424 | wc -l
811131
[11/509]mh@fan:~/linux/debug/linux.bad$ git show d85ce830eef6c10d1e9617172dea4681f02b8424 | wc -l
14
[12/510]mh@fan:~/linux/debug/linux.bad$

Why do I get an 800000+ line diff for a 14 line commit?

This can't be correct. Hints?
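
One possible reading of these numbers, offered as an assumption and not confirmed anywhere in this thread: "git diff A..B" compares the two end-point trees, while "git show B" prints only the change made by B itself. If B sits on a branch that forked off before A, the log range can contain a single commit even though the trees differ a lot. The throwaway repository below is a sketch of that situation; all file, branch and tag names in it are made up:

```shell
#!/bin/sh
# Build a scratch repository in which a one-line commit lives on a branch
# that forked off before a tagged "release", then compare three commands.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.email demo@example.invalid
git config user.name demo

echo base > file
git add file
git commit -q -m "base"
git branch side                  # side forks off before the release

seq 1 200 > big                  # mainline gains a lot of content ...
git add big
git commit -q -m "mainline work"
git tag release                  # ... and is then tagged

git checkout -q side
echo tweak >> file
git commit -q -a -m "small fix"

# $(( ... )) strips any padding that wc may print
log_count=$(( $(git log --oneline release..side | wc -l) ))
diff_lines=$(( $(git diff release..side | wc -l) ))
show_lines=$(( $(git show side | wc -l) ))

echo "commits in release..side: $log_count"
echo "lines in git diff:        $diff_lines"
echo "lines in git show:        $show_lines"
```

In this sketch the log range holds one commit, yet the diff is much longer than the output of git show, because the diff also covers everything the release has that the older fork point lacks.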

Here is the BISECT_LOG:

git bisect start
# bad: [2dcd0af568b0cf583645c8a317dd12e344b1c72a] Linux 4.6
git bisect bad 2dcd0af568b0cf583645c8a317dd12e344b1c72a
# good: [b562e44f507e863c6792946e4e1b1449fbbac85d] Linux 4.5
git bisect good b562e44f507e863c6792946e4e1b1449fbbac85d
# bad: [6b5f04b6cf8ebab9a65d9c0026c650bb2538fd0f] Merge branch 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
git bisect bad 6b5f04b6cf8ebab9a65d9c0026c650bb2538fd0f
# bad: [96b9b1c95660d4bc5510c5d798d3817ae9f0b391] Merge tag 'tty-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect bad 96b9b1c95660d4bc5510c5d798d3817ae9f0b391
# bad: [277edbabf6fece057b14fb6db5e3a34e00f42f42] Merge tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect bad 277edbabf6fece057b14fb6db5e3a34e00f42f42
# bad: [5ca5446ec5ba5e79a6f271cd026bb153d6850fcc] Merge tag 'pinctrl-v4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad 5ca5446ec5ba5e79a6f271cd026bb153d6850fcc
# bad: [e71c2c1eeb8de7a083a728c5b7e0b83ed1faf047] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad e71c2c1eeb8de7a083a728c5b7e0b83ed1faf047
# bad: [54fbad54ebcde9db9c7459e9e379f2350c25e1f1] perf mem record: Check for memory events support
git bisect bad 54fbad54ebcde9db9c7459e9e379f2350c25e1f1
# bad: [598b7c6919c7bbcc1243009721a01bc12275ff3e] perf jit: add source line info support
git bisect bad 598b7c6919c7bbcc1243009721a01bc12275ff3e
# bad: [3848c23b19e07188bfa15e3d9a2ac27692f2ff3c] perf report: Don't show blank lines if entry has no callchain
git bisect bad 3848c23b19e07188bfa15e3d9a2ac27692f2ff3c
# bad: [5ac76283b32b116c58e362e99542182ddcfc8262] perf cpumap: Auto initialize cpu__max_{node,cpu}
git bisect bad 5ac76283b32b116c58e362e99542182ddcfc8262
# bad: [cfd92dadc5e830268036efb25ff41618f29c3306] perf sort: Provide a way to find out if per-thread bucketing is in place
git bisect bad cfd92dadc5e830268036efb25ff41618f29c3306
# bad: [3379e0c3effa87d7734fc06277a7023292aadb0c] perf tools: Document the perf sysctls
git bisect bad 3379e0c3effa87d7734fc06277a7023292aadb0c
# bad: [86a2cf3123bfec118bfb98728d88be0668779b2b] perf stat: Making several helper functions static
git bisect bad 86a2cf3123bfec118bfb98728d88be0668779b2b
# bad: [403567217d3fa5d4801f820317ada52e5c5f0e53] perf symbols: Do not read symbols/data from device files
git bisect bad 403567217d3fa5d4801f820317ada52e5c5f0e53
# bad: [d85ce830eef6c10d1e9617172dea4681f02b8424] perf pmu: Fix misleadingly indented assignment (whitespace)
git bisect bad d85ce830eef6c10d1e9617172dea4681f02b8424
# first bad commit: [d85ce830eef6c10d1e9617172dea4681f02b8424] perf pmu: Fix misleadingly indented assignment (whitespace)
[13/511]mh@fan:~/linux/debug/linux.bad$

(started with git checkout v4.6, git bisect start, git bisect bad, git
bisect good v4.5).

Greetings
Marc


-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: USB broken on Banana Pi in Linux 4.6

2016-06-11 Thread Marc Haber
On Tue, Jun 07, 2016 at 10:30:17AM -0700, Greg KH wrote:
> Nothing obvious, can you use 'git bisect' to go from 4.5.0 to 4.6.0 to
> find the offending commit?

I can. The first round of bisecting let me end up with
c8b710b3e4348119924051551b836c94835331b1 as the first bad commit,
which is wrong, since c8b710b3e4348119924051551b836c94835331b1^ is bad
as well. I am not sure whether things went well since I had to use git
bisect skip twice because the resulting kernel wouldn't boot on the pi.

A second round of bisecting left me in limbo, since I do not
understand this:

[45/544]mh@fan[zgchroot kernel64][debian_chroot sid_kernel64]:~/linux/debug/linux$ git bisect reset
HEAD is now at 2dcd0af... Linux 4.6
[46/545]mh@fan[zgchroot kernel64][debian_chroot sid_kernel64]:~/linux/debug/linux$ git bisect start
[47/546]mh@fan[zgchroot kernel64][debian_chroot sid_kernel64]:~/linux/debug/linux$ git bisect bad
[48/547]mh@fan[zgchroot kernel64][debian_chroot sid_kernel64]:~/linux/debug/linux$ git bisect bad c8b710b3e4348119924051551b836c94835331b1^
[49/548]mh@fan[zgchroot kernel64][debian_chroot sid_kernel64]:~/linux/debug/linux$ git bisect good v4.5
Some good revs are not ancestor of the bad rev.
git bisect cannot work properly in this case.
Maybe you mistook good and bad revs?
[50/549]mh@fan[zgchroot kernel64][debian_chroot sid_kernel64]:~/linux/debug/linux$

Do I need to start over completely?
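
The ancestry relation that the error message refers to can be inspected directly with "git merge-base --is-ancestor" before starting another bisect run. A minimal sketch follows; the wrapper name is_ancestor is an assumption for illustration, and the sketch does not claim to reproduce the exact check that git bisect performs internally:

```shell
#!/bin/sh
# is_ancestor OLDER NEWER
# Succeeds if OLDER is an ancestor of NEWER in the current repository.
is_ancestor() {
    git merge-base --is-ancestor "$1" "$2"
}

# Hypothetical usage inside a kernel tree, checking both directions:
#   bad='c8b710b3e4348119924051551b836c94835331b1^'
#   is_ancestor v4.5 "$bad"  && echo "v4.5 is an ancestor of the bad rev"
#   is_ancestor "$bad" v4.5  && echo "the bad rev is already contained in v4.5"
```

If the second check succeeds, the revision marked bad is already part of the release marked good, which would make the two marks contradict each other.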

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: USB broken on Banana Pi in Linux 4.6

2016-06-07 Thread Marc Haber
On Fri, Jun 03, 2016 at 08:35:11AM -0700, Greg KH wrote:
> On Fri, Jun 03, 2016 at 08:53:58AM +0200, Marc Haber wrote:
> > On Mon, May 30, 2016 at 01:47:12PM -0700, Greg KH wrote:
> > > On Mon, May 30, 2016 at 09:02:54PM +0200, Marc Haber wrote:
> > > > Hi,
> > > > 
> > > > on my Bananapis, in kernel 4.6 USB does not work. Kernel configuration
> > > > is USB-wise identical to 4.5 (grepped for differences in (hci|usb)),
> > > > and in 4.6 there is not even /dev/bus/usb.
> > > > 
> > > > In kernel 4.6, the message "ohci-platform: OHCI generic platform
> > > > driver" is the last one, and "ehci-platform 1c14000.usb: EHCI Host
> > > > Controller" is the first one that is missing.
> > > > 
> > > > Is this already a known issue? Or, does a 4.6 kernel need to be
> > > > configured differently if you want USB?
> > > 
> > > Are you sure you configured in the correct host controller that you
> > > want?  Have you done a diff of your .config files to see what you
> > > changed?  How did you create your 4.6 config?
> > 
> > I used make oldconfig, and I diffed the configs for (hci|usb), and
> > there is no difference.
> 
> There might be other things than HCI|USB that are the issue here...

Full config diff attached. Hints are appreciated.

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
--- /boot/config-4.5.4-zgbpi-armmp-lpae 2016-05-13 17:02:25.0 +0200
+++ /boot/config-4.6.0-zgbpi-armmp-lpae 2016-05-16 14:18:06.0 +0200
@@ -3 +3 @@
-# Linux/arm 4.5.4 Kernel Configuration
+# Linux/arm 4.6.0 Kernel Configuration
@@ -173,0 +174,2 @@
+# CONFIG_KALLSYMS_ABSOLUTE_PERCPU is not set
+CONFIG_KALLSYMS_BASE_RELATIVE=y
@@ -183 +185 @@
-# CONFIG_BPF_SYSCALL is not set
+CONFIG_BPF_SYSCALL=y
@@ -368,0 +371 @@
+# CONFIG_ARCH_ARTPEC is not set
@@ -463 +466,2 @@
-# CONFIG_ARM_KERNMEM_PERMS is not set
+CONFIG_DEBUG_RODATA=y
+CONFIG_DEBUG_ALIGN_RODATA=y
@@ -712 +715,0 @@
-CONFIG_INET_LRO=m
@@ -1126 +1128,0 @@
-CONFIG_NETLINK_MMAP=y
@@ -1167,0 +1170 @@
+# CONFIG_BT_LEDS is not set
@@ -1181,0 +1185 @@
+CONFIG_AF_KCM=m
@@ -1207,0 +1212,3 @@
+CONFIG_DST_CACHE=y
+CONFIG_NET_DEVLINK=m
+CONFIG_MAY_USE_DEVLINK=m
@@ -1234 +1241 @@
-CONFIG_REGMAP_SPI=m
+CONFIG_REGMAP_SPI=y
@@ -1236 +1242,0 @@
-CONFIG_REGMAP_IRQ=y
@@ -1246 +1252 @@
-# CONFIG_ARM_CCI500_PMU is not set
+# CONFIG_ARM_CCI5xx_PMU is not set
@@ -1407,0 +1414 @@
+# CONFIG_PANEL is not set
@@ -1440,0 +1448,4 @@
+# VOP Bus Driver
+#
+
+#
@@ -1454,0 +1466,4 @@
+
+#
+# VOP Driver
+#
@@ -1575,0 +1591 @@
+# CONFIG_MACSEC is not set
@@ -1816,0 +1833 @@
+# CONFIG_TOUCHSCREEN_MELFAS_MIP4 is not set
@@ -1829 +1845,0 @@
-# CONFIG_TOUCHSCREEN_TS4800 is not set
@@ -1856 +1871,0 @@
-# CONFIG_INPUT_AXP20X_PEK is not set
@@ -1869,0 +1885 @@
+# CONFIG_RMI4_CORE is not set
@@ -1922 +1937,0 @@
-# CONFIG_SERIAL_8250_INGENIC is not set
@@ -1951,0 +1967 @@
+# CONFIG_SERIAL_MVEBU_UART is not set
@@ -1981,0 +1998 @@
+# CONFIG_I2C_DEMUX_PINCTRL is not set
@@ -2031,0 +2049 @@
+# CONFIG_SPI_AXI_SPI_ENGINE is not set
@@ -2034,0 +2053 @@
+# CONFIG_SPI_DESIGNWARE is not set
@@ -2048 +2066,0 @@
-# CONFIG_SPI_DESIGNWARE is not set
@@ -2097 +2115 @@
-CONFIG_PINCTRL_SUNXI_COMMON=y
+CONFIG_PINCTRL_SUNXI=y
@@ -2109,0 +2128 @@
+CONFIG_PINCTRL_SUN8I_H3_R=y
@@ -2131,0 +2151 @@
+# CONFIG_GPIO_MPC8XXX is not set
@@ -2147,0 +2168 @@
+# CONFIG_GPIO_TPIC2810 is not set
@@ -2159,0 +2181 @@
+# CONFIG_GPIO_PISOSR is not set
@@ -2195 +2216,0 @@
-CONFIG_AXP20X_POWER=y
@@ -2246,0 +2268 @@
+# CONFIG_SENSORS_LTC2990 is not set
@@ -2364 +2385,0 @@
-# CONFIG_TS4800_WATCHDOG is not set
@@ -2366 +2386,0 @@
-# CONFIG_BCM7038_WDT is not set
@@ -2389,0 +2410 @@
+# CONFIG_MFD_ACT8945A is not set
@@ -2397 +2418,2 @@
-CONFIG_MFD_AXP20X=y
+# CONFIG_MFD_AXP20X_I2C is not set
+# CONFIG_MFD_AXP20X_RSB is not set
@@ -2454,0 +2477 @@
+# CONFIG_MFD_TPS65086 is not set
@@ -2460 +2482,0 @@
-# CONFIG_MFD_TPS65912 is not set
@@ -2491 +2512,0 @@
-# CONFIG_REGULATOR_AXP20X is not set
@@ -2566,0 +2588,5 @@
+# ACP (Audio CoProcessor) Configuration
+#
+# CONFIG_DRM_AMD_ACP is not set
+
+#
@@ -2636,0 +2663 @@
+CONFIG_SND_JACK_INPUT_DEV=y
@@ -2702,0 +2730 @@
+# CONFIG_SND_SUN4I_SPDIF is not set
@@ -2732 +2760,2 @@
-# CONFIG_SND_SOC_PCM179X is not set
+# CONFIG_SND_SOC_PCM179X_I2C is not set
+# CONFIG_SND_SOC_PCM179X_SPI is not set
@@ -2736,0 +2766 @@
+# CONFIG_SND_SOC_RT5616 is not set
@@ -2801,0 +2832 @@
+# CONFIG_HID_CMEDIA is not set
@@ -3153,0 +3185 @@
+# CONFIG_LEDS_IS31FL32XX is not set
@@ -3208 +3239,0 @@
-# CONFIG_R

Re: USB broken on Banana Pi in Linux 4.6

2016-06-03 Thread Marc Haber
On Mon, May 30, 2016 at 01:47:12PM -0700, Greg KH wrote:
> On Mon, May 30, 2016 at 09:02:54PM +0200, Marc Haber wrote:
> > Hi,
> > 
> > on my Bananapis, in kernel 4.6 USB does not work. Kernel configuration
> > is USB-wise identical to 4.5 (grepped for differences in (hci|usb)),
> > and in 4.6 there is not even /dev/bus/usb.
> > 
> > In kernel 4.6, the message "ohci-platform: OHCI generic platform
> > driver" is the last one, and "ehci-platform 1c14000.usb: EHCI Host
> > Controller" is the first one that is missing.
> > 
> > Is this already a known issue? Or, does a 4.6 kernel need to be
> > configured differently if you want USB?
> 
> Are you sure you configured in the correct host controller that you
> want?  Have you done a diff of your .config files to see what you
> changed?  How did you create your 4.6 config?

I used make oldconfig, and I diffed the configs for (hci|usb), and
there is no difference.
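
A grep restricted to (hci|usb) cannot show dependencies that live in other subsystems. As a rough sketch, the helper below lists every option that is enabled in the old config but no longer set to y or m in the new one, regardless of its name. The function name config_dropped is made up for this example:

```shell
#!/bin/sh
# config_dropped OLD NEW
# Print each option that is =y or =m in OLD but is not =y or =m in NEW.
config_dropped() {
    old_syms=$(mktemp)
    new_syms=$(mktemp)
    # Keep only the symbol names of enabled options, sorted for comm(1).
    grep -E '^CONFIG_[A-Za-z0-9_]+=(y|m)$' "$1" | cut -d= -f1 | LC_ALL=C sort -u > "$old_syms"
    grep -E '^CONFIG_[A-Za-z0-9_]+=(y|m)$' "$2" | cut -d= -f1 | LC_ALL=C sort -u > "$new_syms"
    # -23 suppresses lines unique to NEW and lines common to both.
    LC_ALL=C comm -23 "$old_syms" "$new_syms"
    rm -f "$old_syms" "$new_syms"
}

# Hypothetical usage with the two config files named in this thread:
#   config_dropped /boot/config-4.5.4-zgbpi-armmp-lpae \
#                  /boot/config-4.6.0-zgbpi-armmp-lpae
```

Options that make oldconfig silently dropped or renamed show up in this list even when their names contain neither "hci" nor "usb".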

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


USB broken on Banana Pi in Linux 4.6

2016-05-30 Thread Marc Haber
Hi,

on my Bananapis, in kernel 4.6 USB does not work. Kernel configuration
is USB-wise identical to 4.5 (grepped for differences in (hci|usb)),
and in 4.6 there is not even /dev/bus/usb.

Here is the log excerpt from a 4.5 kernel coming up:

May 15 09:30:14 cadencia kernel: [5.307730] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
May 15 09:30:14 cadencia kernel: [5.312891] ehci-platform: EHCI generic platform driver
May 15 09:30:14 cadencia kernel: [5.315579] sun4i-ss 1c15000.crypto-engine: no reset control found
May 15 09:30:14 cadencia kernel: [5.317303] sun4i-ss 1c15000.crypto-engine: Die ID 0
May 15 09:30:14 cadencia kernel: [5.322742] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
May 15 09:30:14 cadencia kernel: [5.332052] ohci-platform: OHCI generic platform driver
May 15 09:30:14 cadencia kernel: [5.360131] axp20x 0-0034: AXP20x variant AXP209 found
May 15 09:30:14 cadencia kernel: [5.405989] axp20x 0-0034: AXP20X driver loaded
May 15 09:30:14 cadencia kernel: [5.409201] ehci-platform 1c14000.usb: EHCI Host Controller
May 15 09:30:14 cadencia kernel: [5.409271] ehci-platform 1c14000.usb: new USB bus registered, assigned bus number 1
May 15 09:30:14 cadencia kernel: [5.409506] ehci-platform 1c14000.usb: irq 29, io mem 0x01c14000
May 15 09:30:14 cadencia kernel: [5.410553] sunxi-wdt 1c20c90.watchdog: Watchdog enabled (timeout=16 sec, nowayout=0)
May 15 09:30:14 cadencia kernel: [5.420414] ehci-platform 1c14000.usb: USB 2.0 started, EHCI 1.00
May 15 09:30:14 cadencia kernel: [5.420977] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
May 15 09:30:14 cadencia kernel: [5.420998] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
May 15 09:30:14 cadencia kernel: [5.421010] usb usb1: Product: EHCI Host Controller
May 15 09:30:14 cadencia kernel: [5.421021] usb usb1: Manufacturer: Linux 4.5.4-zgbpi-armmp-lpae ehci_hcd
May 15 09:30:14 cadencia kernel: [5.421033] usb usb1: SerialNumber: 1c14000.usb
May 15 09:30:14 cadencia kernel: [5.422317] hub 1-0:1.0: USB hub found
May 15 09:30:14 cadencia kernel: [5.422431] hub 1-0:1.0: 1 port detected
May 15 09:30:14 cadencia kernel: [5.423753] ehci-platform 1c1c000.usb: EHCI Host Controller
May 15 09:30:14 cadencia kernel: [5.423814] ehci-platform 1c1c000.usb: new USB bus registered, assigned bus number 2
May 15 09:30:14 cadencia kernel: [5.424055] ehci-platform 1c1c000.usb: irq 33, io mem 0x01c1c000
May 15 09:30:14 cadencia kernel: [5.432424] ehci-platform 1c1c000.usb: USB 2.0 started, EHCI 1.00
May 15 09:30:14 cadencia kernel: [5.433089] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
May 15 09:30:14 cadencia kernel: [5.433110] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
May 15 09:30:14 cadencia kernel: [5.433122] usb usb2: Product: EHCI Host Controller
May 15 09:30:14 cadencia kernel: [5.433133] usb usb2: Manufacturer: Linux 4.5.4-zgbpi-armmp-lpae ehci_hcd
May 15 09:30:14 cadencia kernel: [5.433144] usb usb2: SerialNumber: 1c1c000.usb
May 15 09:30:14 cadencia kernel: [5.434472] hub 2-0:1.0: USB hub found
May 15 09:30:14 cadencia kernel: [5.434595] hub 2-0:1.0: 1 port detected
May 15 09:30:14 cadencia kernel: [5.436189] ohci-platform 1c14400.usb: Generic Platform OHCI controller
May 15 09:30:14 cadencia kernel: [5.436528] ohci-platform 1c14400.usb: new USB bus registered, assigned bus number 3
May 15 09:30:14 cadencia kernel: [5.436779] ohci-platform 1c14400.usb: irq 30, io mem 0x01c14400
May 15 09:30:14 cadencia kernel: [5.497002] usb usb3: New USB device found, idVendor=1d6b, idProduct=0001
May 15 09:30:14 cadencia kernel: [5.497032] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
May 15 09:30:14 cadencia kernel: [5.497045] usb usb3: Product: Generic Platform OHCI controller
May 15 09:30:14 cadencia kernel: [5.497056] usb usb3: Manufacturer: Linux 
4.5.4-zgbpi-armmp-lpae ohci_hcd
May 15 09:30:14 cadencia kernel: [5.497068] usb usb3: SerialNumber: 
1c14400.usb

In kernel 4.6, the message "ohci-platform: OHCI generic platform
driver" is the last one, and "ehci-platform 1c14000.usb: EHCI Host
Controller" is the first one that is missing.
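
Finding the first message that a newer kernel no longer prints can be
automated roughly like this. The sketch assumes both logs are syslog
text with one kernel message per line; the helper names are invented for
this example. It strips the syslog prefix and the timestamp, then walks
the known-good log in order.

```python
import re

# Everything up to and including the "[ seconds.micros]" timestamp.
_PREFIX = re.compile(r"^.*?kernel: \[\s*\d+\.\d+\]\s*")


def messages(log_text):
    """Return the kernel messages of a syslog excerpt without prefix and timestamp."""
    result = []
    for line in log_text.splitlines():
        if "kernel: [" not in line:
            continue
        result.append(_PREFIX.sub("", line).strip())
    return result


def first_missing(good_log, bad_log):
    """Return the first message of good_log that never occurs in bad_log, or None."""
    seen = set(messages(bad_log))
    for message in messages(good_log):
        if message not in seen:
            return message
    return None
```

The comparison is an exact string match, so lines that embed the kernel
version (the "Manufacturer: Linux ..." lines, for instance) will be
reported as missing even when the driver came up; such hits need to be
skipped by hand.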

Is this already a known issue? Or, does a 4.6 kernel need to be
configured differently if you want USB?

Greetings
Marc



-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: transparent huge pages breaks KVM on AMD.

2016-05-14 Thread Marc Haber
On Fri, May 13, 2016 at 10:09:52AM +0200, Borislav Petkov wrote:
> Try this one better - it fixes an uninitialized var.

Nosireebob, VMs crash even with this patch in the host as soon as the
host has THP enabled.

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: transparent huge pages breaks KVM on AMD.

2016-05-13 Thread Marc Haber
On Fri, May 13, 2016 at 10:07:45AM +0200, Borislav Petkov wrote:
> On Fri, May 13, 2016 at 07:23:34AM +0200, Marc Haber wrote:
> > How do I apply this?
> 
> I'm attaching it.

The VM crashed twice with this patch applied, with THP==madvise on the
host and THP==never in the VM.

Now trying the other patch, assuming that it's intended to be used
_instead_ of this one.

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: transparent huge pages breaks KVM on AMD.

2016-05-13 Thread Marc Haber
On Fri, May 13, 2016 at 09:35:45AM +0100, Dr. David Alan Gilbert wrote:
> also between 4.4 and 4.5 it did seem worth mentioning as a long shot,
> but it was no more than a long shot.

It was helpful, however. Had I seen your long shot two weeks earlier,
I would have tried the runtime control first instead of bisecting the
kernel configuration, and saved myself those two weeks of tedious
bisecting.

> Try Andrea's fix for (a).

In the works.

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: transparent huge pages breaks KVM on AMD.

2016-05-13 Thread Marc Haber
On Fri, May 13, 2016 at 10:09:52AM +0200, Borislav Petkov wrote:
> Try this one better - it fixes an uninitialized var.

Instead, or in addition?

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: transparent huge pages breaks KVM on AMD.

2016-05-13 Thread Marc Haber
On Fri, May 13, 2016 at 10:07:45AM +0200, Borislav Petkov wrote:
> On Fri, May 13, 2016 at 07:23:34AM +0200, Marc Haber wrote:
> > How do I apply this?
> 
> I'm attaching it.

Ok, stupid me, I thought that one could simply curl the web page. Too
bad that list archives keep mangling patches :-(

It now applies to 4.5 as well.

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: transparent huge pages breaks KVM on AMD.

2016-05-12 Thread Marc Haber
On Thu, May 12, 2016 at 11:42:16PM +0300, Kirill A. Shutemov wrote:
> But I guess it should apply cleanly to v4.5. Or at least without major
> conflicts.

[11/511]mh@fan:~/linux/debug/linux$ curl 'http://marc.info/?l=linux-rdma&m=146307074800836&w=2' | patch -p1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12529    0 12529    0     0   9844      0 --:--:--  0:00:01 --:--:--  9849
patching file include/linux/mm.h
Hunk #1 succeeded at 456 with fuzz 1 (offset -44 lines).
patching file include/linux/swap.h
Hunk #2 FAILED at 513.
1 out of 2 hunks FAILED -- saving rejects to file include/linux/swap.h.rej
patching file mm/huge_memory.c
Hunk #1 FAILED at 1298.
Hunk #2 FAILED at 2079.
Hunk #3 succeeded at 3340 (offset 117 lines).
2 out of 3 hunks FAILED -- saving rejects to file mm/huge_memory.c.rej
patching file mm/memory.c
Hunk #1 FAILED at 2373.
Hunk #2 succeeded at 2331 with fuzz 2 (offset -56 lines).
Hunk #3 FAILED at 2622.
2 out of 3 hunks FAILED -- saving rejects to file mm/memory.c.rej
patching file mm/swapfile.c
Hunk #1 FAILED at 922.
1 out of 1 hunk FAILED -- saving rejects to file mm/swapfile.c.rej
[12/512]mh@fan:~/linux/debug/linux$

It doesn't, and it doesn't apply to 4.6-rc3 either:

[17/517]mh@fan:~/linux/debug/linux$ git checkout v4.6-rc3
Checking out files: 100% (9945/9945), done.
Previous HEAD position was b562e44... Linux 4.5
HEAD is now at bf16200... Linux 4.6-rc3
[18/518]mh@fan:~/linux/debug/linux$ curl 'http://marc.info/?l=linux-rdma&m=146307074800836&w=2' | patch -p1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12529    0 12529    0     0   9692      0 --:--:--  0:00:01 --:--:--  9697
patching file include/linux/mm.h
patching file include/linux/swap.h
Hunk #2 FAILED at 513.
1 out of 2 hunks FAILED -- saving rejects to file include/linux/swap.h.rej
patching file mm/huge_memory.c
Hunk #1 FAILED at 1298.
Hunk #2 FAILED at 2079.
Hunk #3 succeeded at 3225 (offset 2 lines).
2 out of 3 hunks FAILED -- saving rejects to file mm/huge_memory.c.rej
patching file mm/memory.c
Hunk #1 FAILED at 2373.
Hunk #2 succeeded at 2354 (offset -33 lines).
Hunk #3 FAILED at 2622.
2 out of 3 hunks FAILED -- saving rejects to file mm/memory.c.rej
patching file mm/swapfile.c
Hunk #1 FAILED at 922.
1 out of 1 hunk FAILED -- saving rejects to file mm/swapfile.c.rej
[19/519]mh@fan:~/linux/debug/linux$

How do I apply this?
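
To see at a glance which files need manual attention, the output of
patch can be summarised. The following is a sketch under the assumption
that the text is GNU patch output in the format shown above; the
function name is made up for this example.

```python
import re


def failed_hunks(patch_output):
    """Return {file: [numbers of failed hunks]} parsed from GNU patch output."""
    result = {}
    current = None
    for line in patch_output.splitlines():
        match = re.match(r"patching file (\S+)", line)
        if match:
            current = match.group(1)
            result.setdefault(current, [])
            continue
        match = re.match(r"Hunk #(\d+) FAILED at \d+", line)
        if match and current is not None:
            result[current].append(int(match.group(1)))
    # Files where every hunk applied are left out of the summary.
    return {path: hunks for path, hunks in result.items() if hunks}
```

When nearly every hunk fails against two different trees, as it does
here, the patch input itself is a likely suspect. A later reply in this
thread attributes the failures to the list archive mangling the patch.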

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: transparent huge pages breaks KVM on AMD.

2016-05-12 Thread Marc Haber
On Thu, May 12, 2016 at 11:24:02PM +0300, Kirill A. Shutemov wrote:
> http://lkml.kernel.org/r/1463070742-18401-1-git-send-email-aarca...@redhat.com

Is this in v4.6-rc7?

If so, can I just test v4.6-rc7?

If not, would it be a valid approach to first test plain v4.6-rc7
and then the patched v4.6-rc7?

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


transparent huge pages breaks KVM on AMD.

2016-05-12 Thread Marc Haber
Hi David,

On Sat, Apr 23, 2016 at 07:52:46PM +0100, Dr. David Alan Gilbert wrote:
> Hmm, your problem does sound like bad hardware, but
> If you've got a nice reliable crash, can you try turning transparent huge
> pages off on the host;
>    echo never > /sys/kernel/mm/transparent_hugepage/enabled

I must have missed this hint in the middle of the "your hardware is
bad" avalanche that came over me.

I spent two weeks bisecting and kept ending up with "good" kernels,
because transparent huge pages got turned off in the kernel
configuration during the repeated reconfigurations. After running each
kernel for 24 hours, I eventually ended up with a working 4.5 kernel.
The configuration diff was short and showed transparent huge pages,
and - finally - upon re-reading the thread I found your hint.

My result now is that 4.5, 4.5.1 and 4.5.4 reliably corrupt KVM guest
memory within the first hour of running under disk load, causing
the VM to either drop dead in the water or read randomness from
disk. Rebooting fixes the VM. This happens as soon as transparent huge
pages are turned on in the host.

Turning off transparent huge pages by echo never >
/sys/kernel/mm/transparent_hugepage/enabled fixes the issue even
without rebooting the host. Start up the VM again and it works just
fine.
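
For test notes it helps to record which THP mode was active during each
run. The sysfs file lists all modes and marks the active one with square
brackets, for example "always madvise [never]". A small sketch for
reading it (the function names are chosen for this example only):

```python
import re

THP_ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(sysfs_text):
    """Extract the bracketed (active) mode from the content of the sysfs file."""
    match = re.search(r"\[(\w+)\]", sysfs_text)
    if match is None:
        raise ValueError("no active mode marked in %r" % sysfs_text)
    return match.group(1)


def read_thp_mode(path=THP_ENABLED):
    """Read the active THP mode of the running host."""
    with open(path) as handle:
        return active_thp_mode(handle.read())
```

Calling read_thp_mode() on the host before starting the VM gives the
value to write down next to each test result.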

Is this an issue in (a) transparent huge pages, (b) KVM or (c) qemu?
Where should this issue be forwarded? Or do we just accept it and turn
transparent huge pages off?

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: [PATCH (net.git) 2/3] Revert "stmmac: Fix 'eth0: No PHY found' regression"

2016-05-11 Thread Marc Haber
On Wed, Apr 13, 2016 at 05:44:25PM +0200, Marc Haber wrote:
> On Fri, Apr 01, 2016 at 09:07:15AM +0200, Giuseppe Cavallaro wrote:
> > This reverts commit 88f8b1bb41c6208f81b6a480244533ded7b59493.
> > due to problems on GeekBox and Banana Pi M1 board when
> > connected to a real transceiver instead of a switch via
> > fixed-link.
> 
> This reversal is still needed in Linux 4.5.1 on Banana Pi.
> 
> Please consider including it in Linux 4.5.2.

This reversal is still needed in Linux 4.5.4 on Banana Pi.

Please consider including it in Linux 4.5.5.

Greetings
Marc



> 
> > 
> > Signed-off-by: Giuseppe Cavallaro <peppe.cavall...@st.com>
> > Cc: Gabriel Fernandez <gabriel.fernan...@linaro.org>
> > Cc: Andreas Färber <afaer...@suse.de>
> > Cc: Frank Schäfer <fschaefer@googlemail.com>
> > Cc: Dinh Nguyen <dinh.li...@gmail.com>
> > Cc: David S. Miller <da...@davemloft.net>
> > ---
> >  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  |   11 ++-
> >  .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |9 +
> >  include/linux/stmmac.h |1 -
> >  3 files changed, 11 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> > index ea76129..af09ced 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> > @@ -199,12 +199,21 @@ int stmmac_mdio_register(struct net_device *ndev)
> > struct stmmac_priv *priv = netdev_priv(ndev);
> > struct stmmac_mdio_bus_data *mdio_bus_data = priv->plat->mdio_bus_data;
> > int addr, found;
> > -   struct device_node *mdio_node = priv->plat->mdio_node;
> > +   struct device_node *mdio_node = NULL;
> > +   struct device_node *child_node = NULL;
> >  
> > if (!mdio_bus_data)
> > return 0;
> >  
> > if (IS_ENABLED(CONFIG_OF)) {
> > +   for_each_child_of_node(priv->device->of_node, child_node) {
> > +   if (of_device_is_compatible(child_node,
> > +   "snps,dwmac-mdio")) {
> > +   mdio_node = child_node;
> > +   break;
> > +   }
> > +   }
> > +
> > if (mdio_node) {
> > netdev_dbg(ndev, "FOUND MDIO subnode\n");
> > } else {
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> > index dcbd2a1..9cf181f 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> > @@ -146,7 +146,6 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
> > struct device_node *np = pdev->dev.of_node;
> > struct plat_stmmacenet_data *plat;
> > struct stmmac_dma_cfg *dma_cfg;
> > -   struct device_node *child_node = NULL;
> >  
> > plat = devm_kzalloc(&pdev->dev, sizeof(*plat), GFP_KERNEL);
> > if (!plat)
> > @@ -177,19 +176,13 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
> > plat->phy_node = of_node_get(np);
> > }
> >  
> > -   for_each_child_of_node(np, child_node)
> > -   if (of_device_is_compatible(child_node, "snps,dwmac-mdio")) {
> > -   plat->mdio_node = child_node;
> > -   break;
> > -   }
> > -
> > /* "snps,phy-addr" is not a standard property. Mark it as deprecated
> >  * and warn of its use. Remove this when phy node support is added.
> >  */
> > if (of_property_read_u32(np, "snps,phy-addr", &plat->phy_addr) == 0)
> > dev_warn(&pdev->dev, "snps,phy-addr property is deprecated\n");
> >  
> > -   if ((plat->phy_node && !of_phy_is_fixed_link(np)) || !plat->mdio_node)
> > +   if ((plat->phy_node && !of_phy_is_fixed_link(np)) || plat->phy_bus_name)
> >     plat->mdio_bus_data = NULL;
> > else
> > plat->mdio_bus_data =
> > diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
> > index 4bcf5a6..6e53fa8 100644
> > --- a/include/linux/stmmac.h
> > +++ b/include/linux/stmmac.h
> > @@ -114,7 +114,6 @@ struct plat_stmmacenet_data {
> >     int interface;
> > struct stmmac_mdio_bus_data *mdio_bus_data;
> > struct device_node *phy_node;
> > -   struct device_node *mdio_node;
> > struct stmmac_d


Re: Major KVM issues with kernel 4.5 on the host

2016-04-23 Thread Marc Haber
On Sat, Apr 23, 2016 at 06:04:29PM +0200, Borislav Petkov wrote:
> On Thu, Apr 21, 2016 at 10:04:33PM +0200, Marc Haber wrote:
> > Yes, but there are two symptoms. The VM either suffers file system
> > issues (garbage read from files, or an aborted ext4 journal and
> > following ro remount) or it stops dead in its tracks.
> 
> Stops dead? What does that mean exactly? Box is wedged solid and it
> doesn't react to any key presses?

No ping, no reaction on serial console, no reaction on virtual
console, no syslog entries.

> Because if so, this could really be a DRAM going bad and a correctable
> error turning into an uncorrectable. How old is the DRAM in that box?
> Judging by your CPU, it should be a couple of years...

Uncorrectable errors would still be identified by the ECC hardware,
and the box wouldn't be perfectly fine with an "old" kernel.

> > The box reports about one correctable error per week, so I probably
> > have a faulty DIMM, but since the issue only surfaces in VMs while the
> > host system is in perfect working order...
> 
> So it could be that correctable error turns into an uncorrectable one at
> some point. But then you should be getting an exception...

Yes, that would be in the logs.

> > And yes, I am pondering to simply replace the box with an Intel CPU.
> 
> Your CPU is fine, from what I've seen so far.

But we still postulate that the issue does only show on older AMD
CPUs. Otherwise, I wouldn't be the only one making this experience.

> > I go the way of Debian packages since it is easier to handle the
> > crypto file systems when the machine is booting up.
> 
> As long as you're testing the correct bisection kernels...

I am reasonably sure about that, yes.

> > And yes, I think about doing a test reinstall on unencrypted disk to
> > find out whether encryption plays a role, but I currently need the
> > machine too urgently to take it out of service for half a month, and,
> > again, the host system is in perfect working order, it is just VMs
> > that barf.
> 
> Yeah, I can't reproduce it here and I have a very similar box to yours
> which is otherwise idle, more or less.
> 
> Another fact which points to potentially DIMM going bad...

Do you want me to memtest for 24 hours?

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: Major KVM issues with kernel 4.5 on the host

2016-04-21 Thread Marc Haber
On Thu, Apr 21, 2016 at 06:51:06PM +0200, Borislav Petkov wrote:
> On Thu, Apr 21, 2016 at 04:50:05PM +0200, Marc Haber wrote:
> > What bothers me is that since I ended up with a "suspect" commit that
> > actually results in a "good" kernel (running for 22 hours now), I must
> > have said "bad" to an actually "good" kernel, which means that I had
> > an unrelated crash or corruption. Is that reasoning correct?
> 
> Hmm, did that "unrelated crash or corruption" have the same symptoms as
> the original one?

Yes, but there are two symptoms. The VM either suffers file system
issues (garbage read from files, or an aborted ext4 journal and
following ro remount) or it stops dead in its tracks.


> > That one qualified as "good" six days ago. I'll retry, maybe I just
> > didn't wait long enough.
> 
> So if the trigger time is varying so much, I'd try to double that to
> make sure I'm fairly certain about each commit I'm testing.

The longest trigger time I have seen was three hours, I tripled that
to nine hours, that probably was not enough.

> Also, this is a single box we're talking about, right? And you're sure
> it hasn't had any corruption issues so far?

It is a single box, and it runs perfectly with kernel 4.4.

> I see you have amd64_edac loading, so it must have ECC DIMMs. Have you
> had any reports in the past of ECC errors in dmesg? Or other MCEs,
> lockups, etc? Can you grep your logs for stuff like "hardware error",
> "mce", "edac" etc? Do a case-insensitive search.

The box reports about one correctable error per week, so I probably
have a faulty DIMM, but since the issue only surfaces in VMs while the
host system is in perfect working order...

And yes, I am pondering to simply replace the box with an Intel CPU.

I see "mce: CPU supports 6 MCE banks" once for each reboot, and about
30 "Machine check events logged" since January. How do I see which
events were logged?
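
The case-insensitive search suggested above could be scripted roughly as follows. This is only a sketch: the helper name `scan_hw_errors` and the sample log lines are assumptions for illustration, and on a real system the input would be `dmesg` output or the syslog files, whose location depends on the distribution.

```shell
#!/bin/sh
# Sketch: case-insensitive search for hardware-error hints in log text.
# scan_hw_errors is a hypothetical helper name; it reads log lines on stdin.
scan_hw_errors() {
    # -i: ignore case, -E: extended regular expression with alternation
    grep -iE 'hardware error|mce|edac'
}

# Example with made-up log lines (not real output from the affected box):
printf '%s\n' \
    'mce: CPU supports 6 MCE banks' \
    'EDAC MC0: 1 CE on some DIMM' \
    'eth0: link up' | scan_hw_errors
```

In practice the function would be fed from the kernel log, for example `dmesg | scan_hw_errors`.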

> > "Trying" means make oldconfig, make deb-pkg in my case right? Does it
> > matter what I answer to the numerous config questions that keep coming
> > up during the oldconfig step?
> 
> What I do is:
> 
> $ git bisect <good|bad>
> 
> to mark the current commit after having tested it. Then I do
> 
> $ yes "" | make oldconfig
> 
> to set the new config options.

So you basically select the default for new options.

>  Then
> 
> $ make -j7
> $ make modules_install install
> 
> and reboot into the new kernel. Kernel name will possibly change each
> time so I write down on paper which kernel I'm testing.

I go the way of Debian packages since it is easier to handle the
crypto file systems when the machine is booting up.
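
For the Debian-package route, the per-step build could look like the sketch below. It assumes it is run from the top of the kernel tree; `build_step_debs` is a hypothetical helper name, and the default of 7 jobs is simply taken from the quoted `make -j7`.

```shell
#!/bin/sh
# Sketch: build the currently checked-out bisection step as Debian packages.
# build_step_debs is a hypothetical helper name; run it in the kernel tree.
build_step_debs() {
    jobs=${1:-7}    # -j7 as in the quoted example; adjust to the machine
    # Answer every new config question with its default (empty input).
    yes "" | make oldconfig || return 1
    # deb-pkg is the kernel Makefile target that produces .deb packages.
    make -j"$jobs" deb-pkg
}
```

The resulting packages are then installed with the usual Debian tools, which keeps the initramfs handling for the encrypted file systems unchanged.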

And yes, I think about doing a test reinstall on unencrypted disk to
find out whether encryption plays a role, but I currently need the
machine too urgently to take it out of service for half a month, and,
again, the host system is in perfect working order, it is just VMs
that barf.

>  You can verify when booting it by doing:
> 
> $ dmesg | head
> [    0.000000] Linux version 4.6.0-rc2+ (boris@pd) (gcc version 5.3.1 20160101 (Debian 5.3.1-5) ) #1 SMP PREEMPT Wed Apr 6 20:22:51 CEST 2016
> ...
> 
> that date at the end of the line and number "#1" should be current.

I check the date of the package I am installing and the date stamp of
the kernels being installed to /boot. I'm reasonably sure I have that
under control.

> > Would it help to explicitly mark
> > 0e749e54244eec87b2a3cd0a4314e60bc6781115 as good so that the knowledge
> > gained during the last week is not completely lost?
> 
> I'd do the whole thing again, just to be sure.
> 
> I know, bisection is very time-consuming :-\ And it is particularly
> annoying if it is done on the box I'm normally using daily.

... and if testing a "good" kernel means a day.

> > So I need to git log | grep 46896c73c1a4 and apply the patch again
> > each time the commit is found?
> 
> I think you can let git do that for ya:
> 
> $ git branch --contains 46896c73c1a4
> * (HEAD detached at 46896c73c1a4)
> 
> that lists that the current checked out HEAD contains that commit. If you do
> 
> $ git checkout 46896c73c1a4~1
> 
> then that "(HEAD detached..." line is not in the list of branches
> containing it.

And whenever 46896c73c1a4 is present, I need to apply Paolo's patch,
right?
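
One way to script the "is 46896c73c1a4 present?" check at each bisection step is sketched below. It relies on `git merge-base --is-ancestor`, a standard git command; the helper name `head_contains` is an assumption for illustration.

```shell
#!/bin/sh
# Sketch: report whether the checked-out bisection point contains a commit.
# head_contains is a hypothetical helper name.
head_contains() {
    # "git merge-base --is-ancestor A B" exits 0 when A is an ancestor of B;
    # a commit counts as its own ancestor here.
    git merge-base --is-ancestor "$1" HEAD
}

# Usage inside the kernel tree at each bisection step:
#   if head_contains 46896c73c1a4; then echo "apply the patch"; fi
```

Unlike grepping `git log`, this asks git directly about ancestry and yields an exit status that a build script can branch on.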

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: Major KVM issues with kernel 4.5 on the host

2016-04-21 Thread Marc Haber
On Thu, Apr 21, 2016 at 02:37:11PM +0200, Borislav Petkov wrote:
> On Thu, Apr 21, 2016 at 10:39:48AM +0200, Marc Haber wrote:
> > Currently, I cannot explain how this has happened, I must have flagged
> > an actually good kernel as bad from my understanding of git bisect.
> > 
> > Can you give advice how to continue here?
> 
> Yap, sounds like you marked a bisection step incorrectly, which led
> into the wrong direction. How reliable is your reproducer?

Usually, the crash or filesystem corruption happens in the first 15 to
30 minutes. I have had one instance running three hours before
corrupting, I have therefore upped the run time to nine hours before
saying "this kernel is good".

What bothers me is that since I ended up with a "suspect" commit that
actually results in a "good" kernel (running for 22 hours now), I must
have said "bad" to an actually "good" kernel, which means that I had
an unrelated crash or corruption. Is that reasoning correct?

> Also, do the bisection as Paolo suggested:
> 
> * try 45bdbcfdf241.

That one qualified as "good" six days ago. I'll retry, maybe I just
didn't wait long enough.

"Trying" means make oldconfig, make deb-pkg in my case right? Does it
matter what I answer to the numerous config questions that keep coming
up during the oldconfig step?

> * then do
> 
> $ git bisect start v4.5-rc1 v4.4
> 
> which marks -rc1 as bad and 4.4 as good.

Would it help to explicitly mark
0e749e54244eec87b2a3cd0a4314e60bc6781115 as good so that the knowledge
gained during the last week is not completely lost?

> While you're doing that bisect, do what Paolo said by applying the diff
> here
> 
>   https://lkml.kernel.org/r/570eadd2.8030...@redhat.com
> 
> when the bisection point you're at at each step contains
> 
>   46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> 
> You should apply the above hunk by doing
> 
> $ patch -p1 --dry-run -i /tmp/hunk
> 
> If it applies fine, you then apply it
> 
> $ patch -p1 -i /tmp/hunk
> 
> All clear?

So I need to git log | grep 46896c73c1a4 and apply the patch again
each time the commit is found?
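
The dry-run-then-apply procedure quoted above, plus the cleanup that is needed before the step is marked, could be wrapped as in this sketch. The helper names `try_apply_hunk` and `drop_local_changes` are assumptions; the `patch` invocations are the ones from the quoted mail.

```shell
#!/bin/sh
# Sketch: apply a fix-up hunk only if a dry run succeeds, and undo it again
# before the commit is marked good or bad.
# try_apply_hunk and drop_local_changes are hypothetical helper names.
try_apply_hunk() {
    hunk=$1
    # --dry-run tests the patch without touching any file.
    if patch -p1 --dry-run -i "$hunk" >/dev/null; then
        patch -p1 -i "$hunk"
    else
        echo "hunk does not apply here, building unpatched" >&2
        return 1
    fi
}

drop_local_changes() {
    # Discard modifications to tracked files so that git bisect can
    # check out the next commit without complaining about local changes.
    git reset -q --hard
}

# Usage at one bisection step:
#   try_apply_hunk /tmp/hunk; <build, boot, test>; drop_local_changes
```

The patch has to be applied again after every `git bisect good` or `git bisect bad`, because each of those checks out a fresh tree.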

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: Major KVM issues with kernel 4.5 on the host

2016-04-21 Thread Marc Haber
On Thu, Apr 14, 2016 at 07:22:20AM +0200, Marc Haber wrote:
> On Thu, Apr 14, 2016 at 03:16:29AM +0200, Paolo Bonzini wrote:
> > On 14/04/2016 00:29, Marc Haber wrote:
> > > On Wed, Apr 13, 2016 at 10:36:34PM +0200, Paolo Bonzini wrote:
> > >> Didn't help, but a fresh look at the list of 4.5 patches helped.
> > >> What the hell was I thinking, I missed write_rdtscp_aux which
> > >> obviously uses MSR_TSC_AUX.
> > > 
> > > I applied this patch to 4.5, which didn't go cleanly, I had to do it
> > > manually, and there is no change in behavior. Sometimes, the VM just
> > > crashes, but most times the filesystem is remounted ro.
> > 
> > Ok, then I guess bisection is needed.  Please first try commit
> > 45bdbcfdf241.  If it fails, then the bug comes together with KVM's merge
> > window changes for 4.5-rc1.  Please apply the patch I sent here when
> > bisection is past 46896c73c1a4dde527c3a3cc43379deeb41985a1 (which means
> > that probably that should be the commit you try second; the bisection
> > then becomes much easier).
> 
> I have never bisected this deeply. Can you please give more advice,
> with which two commits to start? And how do I find out whether I am
> "past" a commit? I am also not a git expert, a few command lines would
> be appreciated.

I have tried bisecting, and finally bisect says that the bad commit is
0e749e54244eec87b2a3cd0a4314e60bc6781115 dax: increase granularity of 
dax_clear_blocks() operations

However, a kernel built after
$ git checkout 0e749e54244eec87b2a3cd0a4314e60bc6781115
seems to be fine, at least my VM is running for 15 hours now.

I guess I need to start over again with git bisect good
0e749e54244eec87b2a3cd0a4314e60bc6781115 and git bisect bad v4.5.
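
Starting over between a known-good and a known-bad revision could be scripted as in the sketch below. The helper name `restart_bisect` and the default log path are assumptions; `git bisect log`, `git bisect reset` and `git bisect start <bad> <good>` are standard git commands.

```shell
#!/bin/sh
# Sketch: restart a bisection between a known-good and a known-bad revision,
# keeping a copy of the previous session's marks for later review.
# restart_bisect is a hypothetical helper name.
restart_bisect() {
    good=$1
    bad=$2
    logfile=${3:-/tmp/bisect-previous.log}   # assumed location for the old log
    # Save the old marks if a bisection is in progress; ignore errors otherwise.
    git bisect log > "$logfile" 2>/dev/null || true
    git bisect reset >/dev/null 2>&1 || true
    # "git bisect start <bad> <good>" marks both ends in one command.
    git bisect start "$bad" "$good"
}

# Usage in the kernel tree, with the revisions named in the message:
#   restart_bisect 0e749e54244eec87b2a3cd0a4314e60bc6781115 v4.5
```

The saved log lists every revision that was marked in the earlier run, which gives a starting point for finding the step that was judged wrongly.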

Currently, I cannot explain how this has happened, I must have flagged
an actually good kernel as bad from my understanding of git bisect.

Can you give advice how to continue here?

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: Major KVM issues with kernel 4.5 on the host

2016-04-14 Thread Marc Haber
On Thu, Apr 14, 2016 at 07:30:43PM +0200, Paolo Bonzini wrote:
> On 14/04/2016 18:47, Marc Haber wrote:
> >> > Ok, then I guess bisection is needed.  Please first try commit
> >> > 45bdbcfdf241.
> > I did git checkout 45bdbcfdf241 and built the resulting kernel
> > 4.4.0-rc5. This one has now been running for ten hours, which is
> > threefold the longest time that a faulty kernel has held before a VM
> > experienced corruption. So I guess, that one is fine.
> 
> Interesting, this means it's not a KVM bug.  You can ignore my patch
> from yesterday (though we'll get it in anyway).
> 
> > Since 4.5.0-rc1 is bad, I guess I do:
> > 
> > git checkout 45bdbcfdf241
> > git bisect start
> > git bisect good
> > git bisect bad v4.5-rc1
> 
> This is correct but you also want to do
> 
> git bisect good v4.4
> git bisect good v4.4-rc5
> 
> so that bisection basically works through the commits in the merge window.

So I start over from this:

[47/544]mh@fan:~/linux/debug/linux$ git checkout 45bdbcfdf241
HEAD is now at 45bdbcf... kvm: x86: Fix vmwrite to SECONDARY_VM_EXEC_CONTROL
[48/545]mh@fan:~/linux/debug/linux$ git bisect start
[49/546]mh@fan:~/linux/debug/linux$ git bisect good
[50/547]mh@fan:~/linux/debug/linux$ git bisect bad v4.5-rc1
Bisecting: 5761 revisions left to test after this (roughly 13 steps)
[cbd88cd4c07f9361914ab7fd7e21c9227986fe68] Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
[51/548]mh@fan:~/linux/debug/linux$ git bisect good v4.4
Bisecting: 5468 revisions left to test after this (roughly 12 steps)
[f9a03ae123c92c1f45cd2ca88d0f6edd787be78c] Merge tag 'for-f2fs-4.5' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
[52/549]mh@fan:~/linux/debug/linux$ git bisect good v4.4-rc5
Bisecting: 5468 revisions left to test after this (roughly 12 steps)
[f9a03ae123c92c1f45cd2ca88d0f6edd787be78c] Merge tag 'for-f2fs-4.5' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
[53/550]mh@fan:~/linux/debug/linux$

This is going to take a few days as detecting a "bad" version may take
a few hours.
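
As a rough worked example of "a few days": the git output above estimates about 12 remaining steps, and if each kernel is left running for nine hours before it is called good (the run time mentioned elsewhere in this thread, used here as an assumption), the worst case is 12 × 9 hours. The helper name `bisect_budget_hours` is hypothetical.

```shell
#!/bin/sh
# Sketch: worst-case wall-clock budget for a bisection.
# bisect_budget_hours is a hypothetical helper: steps * hours per tested kernel.
bisect_budget_hours() {
    steps=$1
    hours_per_step=$2
    echo $(( steps * hours_per_step ))
}

# 12 remaining steps (from the git output above) at an assumed 9 hours each:
bisect_budget_hours 12 9    # prints 108
```

108 hours is four and a half days of pure test time, not counting builds and reboots; bad kernels shorten this, since they tend to fail within the first hours.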

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: Major KVM issues with kernel 4.5 on the host

2016-04-14 Thread Marc Haber
On Thu, Apr 14, 2016 at 03:16:29AM +0200, Paolo Bonzini wrote:
> Ok, then I guess bisection is needed.  Please first try commit
> 45bdbcfdf241.

I did git checkout 45bdbcfdf241 and built the resulting kernel
4.4.0-rc5. This one has now been running for ten hours, which is
threefold the longest time that a faulty kernel has held before a VM
experienced corruption. So I guess, that one is fine.

Since 4.5.0-rc1 is bad, I guess I do:

git checkout 45bdbcfdf241
git bisect start
git bisect good
git bisect bad v4.5-rc1

right?

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: Major KVM issues with kernel 4.5 on the host

2016-04-14 Thread Marc Haber
On Thu, Apr 14, 2016 at 03:16:29AM +0200, Paolo Bonzini wrote:
> Ok, then I guess bisection is needed.  Please first try commit
> 45bdbcfdf241.

That kernel labels itself as "4.4.0-rc5+", is that correct?

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: Major KVM issues with kernel 4.5 on the host

2016-04-13 Thread Marc Haber
On Thu, Apr 14, 2016 at 03:16:29AM +0200, Paolo Bonzini wrote:
> On 14/04/2016 00:29, Marc Haber wrote:
> > On Wed, Apr 13, 2016 at 10:36:34PM +0200, Paolo Bonzini wrote:
> >> Didn't help, but a fresh look at the list of 4.5 patches helped.
> >> What the hell was I thinking, I missed write_rdtscp_aux which
> >> obviously uses MSR_TSC_AUX.
> > 
> > I applied this patch to 4.5, which didn't go cleanly, I had to do it
> > manually, and there is no change in behavior. Sometimes, the VM just
> > crashes, but most times the filesystem is remounted ro.
> 
> Ok, then I guess bisection is needed.  Please first try commit
> 45bdbcfdf241.  If it fails, then the bug comes together with KVM's merge
> window changes for 4.5-rc1.  Please apply the patch I sent here when
> bisection is past 46896c73c1a4dde527c3a3cc43379deeb41985a1 (which means
> that probably that should be the commit you try second; the bisection
> then becomes much easier).

I have never bisected this deeply. Can you please give more advice,
with which two commits to start? And how do I find out whether I am
"past" a commit? I am also not a git expert, a few command lines would
be appreciated.

Things have not become any easier this night; 4.5-rc7 ran for more
than three hours before it failed :-(

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
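
The message above asks for a few command lines. What follows is a minimal
sketch of such a session, not something taken from the thread. The helper
names start_bisect and is_past are made up for illustration, and treating
v4.4 as good and v4.5 as bad is an assumption based on the subject line.
Being "past" a commit is read here as "the revision under test contains
that commit".

```shell
# Sketch only: run inside a clone of the mainline kernel tree.

# start_bisect GOOD BAD: begin a bisection between a known-good and a
# known-bad revision (hypothetical helper around "git bisect start").
start_bisect() {
    git bisect start "$2" "$1"
}

# is_past COMMIT: succeeds if COMMIT is an ancestor of (or equal to) the
# revision currently checked out (hypothetical helper).
is_past() {
    git merge-base --is-ancestor "$1" HEAD
}

# A session might then look like this (assumed endpoints, adjust as needed):
#   git checkout 45bdbcfdf241       # build and test this revision first
#   start_bisect v4.4 v4.5          # assumption: v4.4 works, v4.5 fails
#   is_past 46896c73c1a4 && echo "apply the extra patch before building"
#   ... build, boot, run the test ...
#   git bisect good                 # or: git bisect bad
#   git bisect reset                # when finished
```

After each "git bisect good" or "git bisect bad", git checks out the next
revision to test, so the is_past check would be repeated before every build.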


Re: Major KVM issues with kernel 4.5 on the host

2016-04-13 Thread Marc Haber
On Wed, Apr 13, 2016 at 10:36:34PM +0200, Paolo Bonzini wrote:
> Didn't help, but a fresh look at the list of 4.5 patches helped.
> What the hell was I thinking, I missed write_rdtscp_aux who
> obviously uses MSR_TSC_AUX.

I applied this patch to 4.5. It didn't apply cleanly, so I had to do
it manually, and there is no change in behavior. Sometimes the VM just
crashes, but most times the filesystem is remounted read-only.

[   84.658968] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27903
[   84.664877] Aborting journal on device dm-0-8.
[   84.667992] EXT4-fs (dm-0): Remounting filesystem read-only
[   84.670972] EXT4-fs error (device dm-0): ext4_journal_check_start:56: Detected aborted journal
[   84.763331] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27898
[   84.825412] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27895
[   84.907959] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27893
[   84.915187] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27900
[   84.961062] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27889
[   84.983700] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27891
[   98.315538] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #23567: comm aide: deleted inode referenced: 27897
[   98.323606] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #23567: comm aide: deleted inode referenced: 27904
[   99.889927] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27892
[   99.893823] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27901
[   99.901140] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27890
[   99.904898] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27896
[   99.909758] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27899
[   99.914394] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27894
[  207.132045] serial8250: too much work for irq4
[  207.220043] serial8250: too much work for irq4
[  207.312028] serial8250: too much work for irq4


Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: Major KVM issues with kernel 4.5 on the host

2016-04-13 Thread Marc Haber
On Wed, Apr 13, 2016 at 10:36:34PM +0200, Paolo Bonzini wrote:
> Didn't help, but a fresh look at the list of 4.5 patches helped.
> What the hell was I thinking, I missed write_rdtscp_aux who
> obviously uses MSR_TSC_AUX.

So you want me to apply that to 4.5 or 4.5.1 and try that?

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: Major KVM issues with kernel 4.5 on the host

2016-04-13 Thread Marc Haber
On Fri, Mar 18, 2016 at 11:01:46AM +0100, Paolo Bonzini wrote:
> On 17/03/2016 19:11, Borislav Petkov wrote:
> > I'm going to try reproducing the issue on a less "important" machine
> > so that bisecting is less painful, but maybe you guys have an idea
> > what's going wrong here.
> 
> No idea, sorry. :(  Bisecting would be great.

Working on that now.

>   I'll also try reproducing and bisecting next week, in the meanwhile
>   just having the host dmesg would help a lot.

Attached. I hope the message will get through to the list.

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
[0.00] Linux version 4.5.1-zgws1 (mh@fan) (gcc version 5.3.1 20160409 (Debian 5.3.1-14) ) #2 SMP Wed Apr 13 06:32:03 UTC 2016
[0.00] Command line: BOOT_IMAGE=/vmlinuz-4.5.1-zgws1 root=/dev/mapper/root ro radeon.modeset=1 splash quiet scsi_mod.scan=sync
[0.00] tseg: 00
[0.00] x86/fpu: Legacy x87 FPU detected.
[0.00] x86/fpu: Using 'lazy' FPU context switches.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009abff] usable
[0.00] BIOS-e820: [mem 0x0009ac00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e2000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xcff7] usable
[0.00] BIOS-e820: [mem 0xcff8-0xcff97fff] ACPI data
[0.00] BIOS-e820: [mem 0xcff98000-0xcffb] ACPI NVS
[0.00] BIOS-e820: [mem 0xcffc-0xcfff] reserved
[0.00] BIOS-e820: [mem 0xffa0-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00062fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.5 present.
[0.00] DMI: System manufacturer System Product Name/M5A88-V EVO, BIOS 1603 10/12/2012
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] AGP: No AGP bridge found
[0.00] e820: last_pfn = 0x63 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-E uncachable
[0.00]   F-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base  mask 8000 write-back
[0.00]   1 base 8000 mask C000 write-back
[0.00]   2 base C000 mask F000 write-back
[0.00]   3 disabled
[0.00]   4 disabled
[0.00]   5 disabled
[0.00]   6 disabled
[0.00]   7 disabled
[0.00] TOM2: 00063000 aka 25344M
[0.00] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT  
[0.00] e820: update [mem 0xd000-0x] usable ==> reserved
[0.00] e820: last_pfn = 0xcff80 max_arch_pfn = 0x4
[0.00] found SMP MP-table at [mem 0x000ff780-0x000ff78f] mapped at [880ff780]
[0.00] Base memory trampoline at [88094000] 94000 size 24576
[0.00] Using GB pages for direct mapping
[0.00] BRK [0x01a83000, 0x01a83fff] PGTABLE
[0.00] BRK [0x01a84000, 0x01a84fff] PGTABLE
[0.00] BRK [0x01a85000, 0x01a85fff] PGTABLE
[0.00] BRK [0x01a86000, 0x01a86fff] PGTABLE
[0.00] RAMDISK: [mem 0x357bc000-0x36bd5fff]
[0.00] ACPI: Early table checksum verification disabled
[0.00] ACPI: RSDP 0x000FBC30 24 (v02 ACPIAM)
[0.00] ACPI: XSDT 0xCFF80100 54 (v01 101212 XSDT1626 20121012 MSFT 0097)
[0.00] ACPI: FACP 0xCFF80290 F4 (v03 101212 FACP1626 20121012 MSFT 0097)
[0.00] ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20160108/tbfadt-623)
[0.00] ACPI: DSDT 0xCFF80460 00F14D (v01 A1867  A1867001 0001 INTL 20060113)
[0.00] ACPI: FACS 0xCFF98000 40
[0.00] ACPI: FACS 0xCFF98000 40
[0.00] ACPI: APIC 0xCFF80390 8C (v01 101212 APIC1626 20121012 MSFT 0097)
[0.00] ACPI: MCFG 0xCFF80420 3C (v01 101212 OEMMCFG  20121012 MSFT 0097)
[0.00] ACPI: OEMB 0xCFF98040 72 (v01 101212 OEMB1626 20121012 MSFT 0097)
[0.00] ACPI: HPET 0xCFF8F8B0 38 (v01 101212 OEMHPET  20121012 MSFT 0097)
[0.00] ACPI: SSDT 0xCFF8F8F0 000E10 (v01 A M I  POWERNOW 0001 AMD  0

Re: Major KVM issues with kernel 4.5 on the host

2016-04-13 Thread Marc Haber
On Sun, Mar 20, 2016 at 07:58:13PM +0100, Borislav Petkov wrote:
> So I'm not sure what even happens here yet. I haven't seen anything out
> of the ordinary in Marc's dmesg and I wasn't able to reproduce either.
> So would it be good to try with "npt=0"? Sure, why not.

npt=0 goes on the kernel command line of the host or of the guest? Or
is it a KVM option?

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
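
On the question above: npt is a parameter of the kvm_amd kernel module, so it
would be set on the host rather than in the guest. A minimal sketch for
checking and changing it follows, assuming an AMD host as the dmesg in this
thread suggests. The function name npt_status is made up for illustration.

```shell
# npt_status: report the npt setting of the kvm_amd module on this (host)
# machine. Hypothetical helper; it only reads from sysfs.
npt_status() {
    param=/sys/module/kvm_amd/parameters/npt
    if [ -r "$param" ]; then
        echo "kvm_amd npt=$(cat "$param")"
    else
        echo "kvm_amd is not loaded on this machine"
    fi
}

npt_status

# To turn nested paging off (as root on the host, with all guests shut down):
#   modprobe -r kvm_amd && modprobe kvm_amd npt=0
# If kvm_amd is built into the kernel instead, the host can be booted with
# kvm_amd.npt=0 on its kernel command line.
```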


Re: Major KVM issues with kernel 4.5 on the host

2016-04-13 Thread Marc Haber
On Sun, Mar 20, 2016 at 02:31:58PM +0100, Borislav Petkov wrote:
> On Sat, Mar 19, 2016 at 01:08:37AM +0100, Marc Haber wrote:
> > Booting Debian Linux, apt-get update, apt-get upgrade, and run aide
> > (which builds checksums for the entire filesystem, a rather disk-bound
> > activity).
> 
> So I did that and aide ran a whole init and check all the way through
> and all fine. I don't see anything out of the ordinary in your dmesg
> outputs either.
> 
> The next things we should look like is:
> 
> * diff .configs - there might be something there#

Here we go:

[2/501]mh@fan:~$ diff -u0 /boot/config-4.4.6-zgws1 /boot/config-4.5.1-zgws1
--- /boot/config-4.4.6-zgws1	2016-03-28 15:50:36.0 +0200
+++ /boot/config-4.5.1-zgws1	2016-04-13 08:32:44.0 +0200
@@ -3 +3 @@
-# Linux/x86_64 4.4.6 Kernel Configuration
+# Linux/x86_64 4.5.1 Kernel Configuration
@@ -14 +13,0 @@
-CONFIG_HAVE_LATENCYTOP_SUPPORT=y
@@ -15,0 +15,4 @@
+CONFIG_ARCH_MMAP_RND_BITS_MIN=28
+CONFIG_ARCH_MMAP_RND_BITS_MAX=32
+CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
+CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
@@ -147,7 +149,0 @@
-# CONFIG_CGROUP_DEBUG is not set
-CONFIG_CGROUP_FREEZER=y
-CONFIG_CGROUP_PIDS=y
-CONFIG_CGROUP_DEVICE=y
-CONFIG_CPUSETS=y
-CONFIG_PROC_PID_CPUSET=y
-CONFIG_CGROUP_CPUACCT=y
@@ -158,3 +154,3 @@
-# CONFIG_MEMCG_KMEM is not set
-# CONFIG_CGROUP_HUGETLB is not set
-CONFIG_CGROUP_PERF=y
+CONFIG_BLK_CGROUP=y
+# CONFIG_DEBUG_BLK_CGROUP is not set
+CONFIG_CGROUP_WRITEBACK=y
@@ -165,3 +161,9 @@
-CONFIG_BLK_CGROUP=y
-# CONFIG_DEBUG_BLK_CGROUP is not set
-CONFIG_CGROUP_WRITEBACK=y
+CONFIG_CGROUP_PIDS=y
+CONFIG_CGROUP_FREEZER=y
+# CONFIG_CGROUP_HUGETLB is not set
+CONFIG_CPUSETS=y
+CONFIG_PROC_PID_CPUSET=y
+CONFIG_CGROUP_DEVICE=y
+CONFIG_CGROUP_CPUACCT=y
+CONFIG_CGROUP_PERF=y
+# CONFIG_CGROUP_DEBUG is not set
@@ -254 +255,0 @@
-CONFIG_HAVE_DMA_ATTRS=y
@@ -288,0 +290,4 @@
+CONFIG_HAVE_ARCH_MMAP_RND_BITS=y
+CONFIG_ARCH_MMAP_RND_BITS=28
+CONFIG_HAVE_ARCH_MMAP_RND_COMPAT_BITS=y
+CONFIG_ARCH_MMAP_RND_COMPAT_BITS=8
@@ -377,0 +383 @@
+CONFIG_X86_FAST_FEATURE_TESTS=y
@@ -383 +389 @@
-CONFIG_IOSF_MBI=m
+CONFIG_IOSF_MBI=y
@@ -390,0 +397 @@
+# CONFIG_QUEUED_LOCK_STAT is not set
@@ -769,0 +777 @@
+# CONFIG_VMD is not set
@@ -772,0 +781 @@
+CONFIG_NET_EGRESS=y
@@ -824,0 +834 @@
+# CONFIG_INET_DIAG_DESTROY is not set
@@ -945,0 +956,3 @@
+CONFIG_NF_DUP_NETDEV=m
+CONFIG_NFT_DUP_NETDEV=m
+CONFIG_NFT_FWD_NETDEV=m
@@ -1252,0 +1266 @@
+# CONFIG_6LOWPAN_DEBUGFS is not set
@@ -1344,0 +1359 @@
+CONFIG_SOCK_CGROUP_DATA=y
@@ -1411 +1425,0 @@
-CONFIG_WEXT_SPY=y
@@ -1423,5 +1437 @@
-CONFIG_LIB80211=m
-CONFIG_LIB80211_CRYPT_WEP=m
-CONFIG_LIB80211_CRYPT_CCMP=m
-CONFIG_LIB80211_CRYPT_TKIP=m
-# CONFIG_LIB80211_DEBUG is not set
+# CONFIG_LIB80211 is not set
@@ -1469 +1479,2 @@
-# CONFIG_NFC_ST_NCI is not set
+# CONFIG_NFC_ST_NCI_I2C is not set
+# CONFIG_NFC_ST_NCI_SPI is not set
@@ -1616,2 +1627,2 @@
-CONFIG_PARPORT_PC=m
-CONFIG_PARPORT_SERIAL=m
+CONFIG_PARPORT_PC=y
+CONFIG_PARPORT_SERIAL=y
@@ -1619 +1630 @@
-CONFIG_PARPORT_PC_SUPERIO=y
+# CONFIG_PARPORT_PC_SUPERIO is not set
@@ -1968,0 +1980 @@
+# CONFIG_DM_DEBUG_BLOCK_STACK_TRACING is not set
@@ -1971 +1982,0 @@
-# CONFIG_DM_DEBUG_BLOCK_STACK_TRACING is not set
@@ -2131,0 +2143 @@
+# CONFIG_NET_VENDOR_NETRONOME is not set
@@ -2263,43 +2275,6 @@
-# CONFIG_PCMCIA_RAYCS is not set
-# CONFIG_LIBERTAS_THINFIRM is not set
-# CONFIG_AIRO is not set
-# CONFIG_ATMEL is not set
-# CONFIG_AT76C50X_USB is not set
-# CONFIG_AIRO_CS is not set
-# CONFIG_PCMCIA_WL3501 is not set
-# CONFIG_PRISM54 is not set
-# CONFIG_USB_ZD1201 is not set
-# CONFIG_USB_NET_RNDIS_WLAN is not set
-# CONFIG_ADM8211 is not set
-# CONFIG_RTL8180 is not set
-# CONFIG_RTL8187 is not set
-# CONFIG_MAC80211_HWSIM is not set
-# CONFIG_MWL8K is not set
-# CONFIG_ATH_CARDS is not set
-CONFIG_B43=m
-CONFIG_B43_BCMA=y
-CONFIG_B43_SSB=y
-CONFIG_B43_BUSES_BCMA_AND_SSB=y
-# CONFIG_B43_BUSES_BCMA is not set
-# CONFIG_B43_BUSES_SSB is not set
-CONFIG_B43_PCI_AUTOSELECT=y
-CONFIG_B43_PCICORE_AUTOSELECT=y
-CONFIG_B43_SDIO=y
-CONFIG_B43_BCMA_PIO=y
-CONFIG_B43_PIO=y
-CONFIG_B43_PHY_G=y
-CONFIG_B43_PHY_N=y
-CONFIG_B43_PHY_LP=y
-CONFIG_B43_PHY_HT=y
-CONFIG_B43_LEDS=y
-CONFIG_B43_HWRNG=y
-# CONFIG_B43_DEBUG is not set
-# CONFIG_B43LEGACY is not set
-# CONFIG_BRCMSMAC is not set
-# CONFIG_BRCMFMAC is not set
-CONFIG_HOSTAP=m
-CONFIG_HOSTAP_FIRMWARE=y
-# CONFIG_HOSTAP_FIRMWARE_NVRAM is not set
-CONFIG_HOSTAP_PLX=m
-CONFIG_HOSTAP_PCI=m
-CONFIG_HOSTAP_CS=m
+# CONFIG_WLAN_VENDOR_ADMTEK is not set
+# CONFIG_WLAN_VENDOR_ATH is not set
+# CONFIG_WLAN_VENDOR_ATMEL is not set
+# CONFIG_WLAN_VENDOR_BROADCOM is not set
+# CONFIG_WLAN_VENDOR_CISCO is not set
+CONFIG_WLAN_VENDOR_INTEL=y
@@ -2307,0 +2283,2 @@
+# CONFIG_IWL4965 is not set
+# CONFIG_IWL3945 is not set
@@ -2321,14 +2298,13 @@
-# CONFIG_IWL4965 is not set
-# CONFIG_IWL3945 is not set
-# CONFIG_LIBERTAS is not set
-# CONFIG_HERMES is not set
-# CONFIG_P54_
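
Much of the diff above is options that only moved within the file (the cgroup
block, for example). One way to see only real changes is to compare the two
configs after sorting them. The sketch below assumes plain POSIX tools, and
the function name config_delta is made up for illustration. The kernel tree
also ships scripts/diffconfig for the same purpose.

```shell
# config_delta OLD NEW: print the options that differ between two kernel
# configs, ignoring pure reordering. Lines starting with '<' exist only in
# OLD, lines starting with '>' only in NEW. Hypothetical helper.
config_delta() {
    old_sorted=$(mktemp) || return 2
    new_sorted=$(mktemp) || return 2
    # Keep only option lines: "CONFIG_X=..." and "# CONFIG_X is not set".
    grep -E '^(# )?CONFIG_' "$1" | sort > "$old_sorted"
    grep -E '^(# )?CONFIG_' "$2" | sort > "$new_sorted"
    # Drop diff's hunk headers and separators, keep the changed lines.
    diff "$old_sorted" "$new_sorted" | grep -E '^[<>]'
    rm -f "$old_sorted" "$new_sorted"
}

# Usage with the files from this message:
#   config_delta /boot/config-4.4.6-zgws1 /boot/config-4.5.1-zgws1
```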

Re: [PATCH (net.git) 2/3] Revert "stmmac: Fix 'eth0: No PHY found' regression"

2016-04-13 Thread Marc Haber
On Fri, Apr 01, 2016 at 09:07:15AM +0200, Giuseppe Cavallaro wrote:
> This reverts commit 88f8b1bb41c6208f81b6a480244533ded7b59493.
> due to problems on GeekBox and Banana Pi M1 board when
> connected to a real transceiver instead of a switch via
> fixed-link.

This reversal is still needed in Linux 4.5.1 on Banana Pi.

Please consider including it in Linux 4.5.2.

Greetings
Marc

> 
> Signed-off-by: Giuseppe Cavallaro <peppe.cavall...@st.com>
> Cc: Gabriel Fernandez <gabriel.fernan...@linaro.org>
> Cc: Andreas Färber <afaer...@suse.de>
> Cc: Frank Schäfer <fschaefer@googlemail.com>
> Cc: Dinh Nguyen <dinh.li...@gmail.com>
> Cc: David S. Miller <da...@davemloft.net>
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  |   11 ++-
>  .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |9 +
>  include/linux/stmmac.h |1 -
>  3 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c 
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> index ea76129..af09ced 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> @@ -199,12 +199,21 @@ int stmmac_mdio_register(struct net_device *ndev)
>   struct stmmac_priv *priv = netdev_priv(ndev);
>   struct stmmac_mdio_bus_data *mdio_bus_data = priv->plat->mdio_bus_data;
>   int addr, found;
> - struct device_node *mdio_node = priv->plat->mdio_node;
> + struct device_node *mdio_node = NULL;
> + struct device_node *child_node = NULL;
>  
>   if (!mdio_bus_data)
>   return 0;
>  
>   if (IS_ENABLED(CONFIG_OF)) {
> + for_each_child_of_node(priv->device->of_node, child_node) {
> + if (of_device_is_compatible(child_node,
> + "snps,dwmac-mdio")) {
> + mdio_node = child_node;
> + break;
> + }
> + }
> +
>   if (mdio_node) {
>   netdev_dbg(ndev, "FOUND MDIO subnode\n");
>   } else {
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c 
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> index dcbd2a1..9cf181f 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> @@ -146,7 +146,6 @@ stmmac_probe_config_dt(struct platform_device *pdev, 
> const char **mac)
>   struct device_node *np = pdev->dev.of_node;
>   struct plat_stmmacenet_data *plat;
>   struct stmmac_dma_cfg *dma_cfg;
> - struct device_node *child_node = NULL;
>  
>   plat = devm_kzalloc(&pdev->dev, sizeof(*plat), GFP_KERNEL);
>   if (!plat)
> @@ -177,19 +176,13 @@ stmmac_probe_config_dt(struct platform_device *pdev, 
> const char **mac)
>   plat->phy_node = of_node_get(np);
>   }
>  
> - for_each_child_of_node(np, child_node)
> - if (of_device_is_compatible(child_node, "snps,dwmac-mdio")) {
> - plat->mdio_node = child_node;
> - break;
> - }
> -
>   /* "snps,phy-addr" is not a standard property. Mark it as deprecated
>* and warn of its use. Remove this when phy node support is added.
>*/
>   if (of_property_read_u32(np, "snps,phy-addr", &plat->phy_addr) == 0)
>   dev_warn(&pdev->dev, "snps,phy-addr property is deprecated\n");
>  
> - if ((plat->phy_node && !of_phy_is_fixed_link(np)) || !plat->mdio_node)
> + if ((plat->phy_node && !of_phy_is_fixed_link(np)) || plat->phy_bus_name)
>   plat->mdio_bus_data = NULL;
>   else
>   plat->mdio_bus_data =
> diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
> index 4bcf5a6..6e53fa8 100644
> --- a/include/linux/stmmac.h
> +++ b/include/linux/stmmac.h
> @@ -114,7 +114,6 @@ struct plat_stmmacenet_data {
>   int interface;
>   struct stmmac_mdio_bus_data *mdio_bus_data;
>   struct device_node *phy_node;
> - struct device_node *mdio_node;
>   struct stmmac_dma_cfg *dma_cfg;
>   int clk_csr;
>   int has_gmac;
> -- 
> 1.7.4.4
> 

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: ext4_file_open: Inconsistent encryption contexts (commit ff978b09f973) breaking Docker

2016-03-31 Thread Marc Haber
On Mon, Mar 14, 2016 at 11:27:35AM +0100, Miklos Szeredi wrote:
> Could you please try the below patch?

I can confirm that I have the issue on kernel 4.5 with Debian's
schroot using overlayfs, and that this patch fixes it.

It should be in 4.5.1.

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Major KVM issues with kernel 4.5 on the host

2016-03-19 Thread Marc Haber
Hi,

I have a (semi-productive[1]) system ("host") running Debian unstable.
On this system, a few VMs (Debian unstable, Debian testing) ("vm1",
"vm2", "vm3") are running. I roll my own kernels and take vanilla
upstream sources. No distribution patches.

Since host was updated to Kernel 4.5, the VMs have started acting up.
All of them. The range of strangeness begins with "relocation error,
system halted" on system startup, corrupted data files on disk,
filesystems remounted read-only, libraries rejected with "invalid ELF
format", binaries segfaulting all of a sudden. Downgrading host to
kernel 4.4.5 magically fixed all those issues.

Going back to 4.5 makes the issues reappear. Here, for example, are ext4 fs
errors logged in one of the VMs:

Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #415065: comm aide: deleted inode referenced: 546538
Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #415065: comm aide: deleted inode referenced: 546530
Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4269: inode #546543: comm aide: bad extra_isize (44800 != 256)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4466: inode #546568: comm aide: bogus i_mode (144)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #546548: comm aide: deleted inode referenced: 546564
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #546548: comm aide: deleted inode referenced: 546562
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4269: inode #546563: comm aide: bad extra_isize (6464 != 256)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4466: inode #546561: comm aide: bogus i_mode (0)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4269: inode #546529: comm aide: bad extra_isize (1152 != 256)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_xattr_block_get:297: inode #546359: comm aide: bad block 677784
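
For longer logs, errors like these can be tallied per reporting function to see which code paths complain most. What follows is only a minimal Python sketch, assuming the syslog format shown in the excerpt; the name tally_ext4_errors and the sample text are illustrative and not taken from any existing tool.

```python
import re
from collections import Counter

# Matches entries such as
#   EXT4-fs error (device dm-0): ext4_iget:4269: inode #546543: ...
# \s+ also matches a line break, so entries that a mail client wrapped
# onto two lines are still recognised.
EXT4_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<dev>[^)]+)\):\s+"
    r"(?P<func>\w+):(?P<line>\d+):\s+inode #(?P<inode>\d+)"
)


def tally_ext4_errors(log_text):
    """Count EXT4-fs errors per (device, reporting function)."""
    return Counter((m["dev"], m["func"]) for m in EXT4_ERROR.finditer(log_text))


if __name__ == "__main__":
    # Hypothetical sample: two entries copied from the excerpt,
    # the second one wrapped the way a mail client might wrap it.
    sample = (
        "Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): "
        "ext4_lookup:1602: inode #415065: comm aide: deleted inode referenced: 546538\n"
        "Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4269: \n"
        "inode #546543: comm aide: bad extra_isize (44800 != 256)\n"
    )
    for (dev, func), count in tally_ext4_errors(sample).most_common():
        print(f"{dev} {func}: {count}")
```

Applied to the ten entries above, this counts five errors from ext4_iget, four from ext4_lookup and one from ext4_xattr_block_get, all on dm-0.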

I'm going to try reproducing the issue on a less "important" machine
so that bisecting is less painful, but maybe you guys have an idea
what's going wrong here.

jftr, kernel 4.5 in guest and in standalone systems seems to be
unproblematic.

Greetings
Marc


[1] my main workstation, running enough services for the local network
that disturbances in its operation cause reasonable discomfort, but not the
Enterprise kind of "productive"

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: Major KVM issues with kernel 4.5 on the host

2016-03-18 Thread Marc Haber
Hi Borislav,

On Thu, Mar 17, 2016 at 07:11:28PM +0100, Borislav Petkov wrote:
> Do you have any funky messages in host's dmesg ?

Not that I see.

> Can you upload a full dmesg from both a good and a bad host kernel?

http://q.bofh.de/~mh/stuff/20160317-fan-syslog-kvm-4.4.5
http://q.bofh.de/~mh/stuff/20160317-fan-syslog-kvm-4.5

Hope this helps.

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: Major KVM issues with kernel 4.5 on the host

2016-03-18 Thread Marc Haber
Hi Borislav,

On Fri, Mar 18, 2016 at 11:04:29PM +0100, Borislav Petkov wrote:
> On Fri, Mar 18, 2016 at 07:49:29PM +0100, Marc Haber wrote:
> > http://q.bofh.de/~mh/stuff/20160317-fan-syslog-kvm-4.4.5
> 
> This one I got.
> 
> > http://q.bofh.de/~mh/stuff/20160317-fan-syslog-kvm-4.5
> 
> This one doesn't want:
> 
> HTTP request sent, awaiting response... 403 Forbidden
> 2016-03-18 22:57:46 ERROR 403: Forbidden.

Idiot me. File permissions fixed.

> Anything special you're doing to cause the host kernel to barf which I
> should do here?

Booting Debian Linux, apt-get update, apt-get upgrade, and run aide
(which builds checksums for the entire filesystem, a rather disk-bound
activity).

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


probable cause found - 3.14 on Xenserver (non-HVM): "cannot allocate memory")

2014-04-22 Thread Marc Haber
Hi,

On Sun, Apr 20, 2014 at 10:56:39PM +0200, Marc Haber wrote:
> Linux 3.14, however, does not get beyond the initramfs state ("cannot
> allocate memory"). This also happens when I set cgroup_disable=memory.

After doing a bisect between "good 3.13" and "bad 3.14", I ended up
with this commit:

6145cfe394a7f138f6b64491c5663f97dba12450 is the first bad commit
commit 6145cfe394a7f138f6b64491c5663f97dba12450
Author: Kees Cook 
Date:   Thu Oct 10 17:18:18 2013 -0700

x86, kaslr: Raise the maximum virtual address to -1 GiB on x86_64

On 64-bit, this raises the maximum location to -1 GiB (from -1.5 GiB),
the upper limit currently, since the kernel fixmap page mappings need
to be moved to use the other 1 GiB (which would be the theoretical
limit when building with -mcmodel=kernel).

Signed-off-by: Kees Cook 
Link: http://lkml.kernel.org/r/1381450698-28710-7-git-send-email-keesc...@chromium.org
Signed-off-by: H. Peter Anvin 

:040000 040000 a48a6355e3ccd676027319ff520bf953cb07a0bb 7beb1fdd7478b6bec2555a364fbfac29d7a5a3c4 M	arch

and, indeed, reverting this commit on plain 3.14.1 makes my Xenserver
boot just fine. When reverting the patch, I took the liberty of
ignoring the REJECT in arch/x86/Kconfig.

I am, however, a bit confused that this patch dates back to October
while the breakage only occurred when 3.14 was released.

Greetings
Marc

-- 
-----
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."Winona Ryder | Fon: *49 621 31958061
Nordisch by Nature |  How to make an American Quilt | Fax: *49 621 31958062
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

