[qubes-users] Re: AppVms being killed on resume due to clock skew too large

mmoris Sat, 01 Feb 2020 02:28:03 -0800

Same problem again, this time not related to any socket closure.
Apparently related to systemd:
[41911.199732] audit: type=1104 audit(1580516883.707:119): pid=4917 uid=0 
auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_rootok 
acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? 
res=success'
[41920.252871] clocksource: timekeeping watchdog on CPU0: Marking clocksource 
'tsc' as unstable because the skew is too large:
[41920.252927] clocksource: 'xen' wd_now: 2a1620baf67a wd_last: 2a140e3c5f9f 
mask: ffffffffffffffff
[41920.252972] clocksource: 'tsc' cs_now: ffffff88779d4270 cs_last: 
5083a288ea9a mask: ffffffffffffffff
[41920.253013] tsc: Marking TSC unstable due to clocksource watchdog
[41921.161370] audit: type=1100 audit(1580516893.670:120): pid=4955 uid=0 
auid=4294967295 ses=4294967295 msg='op=PAM:authentication grantors=pam_rootok 
acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? 
res=success'
[41921.163039] audit: type=1103 audit(1580516893.672:121): pid=4955 uid=0 
auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_rootok 
acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? 
res=success'
[41921.176874] audit: type=1105 audit(1580516893.686:122): pid=4955 uid=0 
auid=4294967295 ses=4294967295 msg='op=PAM:session_open 
grantors=pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_umask,pam_lastlog 
acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? 
res=success'
[41922.205481] audit: type=1106 audit(1580552389.038:123): pid=4955 uid=0 
auid=4294967295 ses=4294967295 msg='op=PAM:session_close 
grantors=pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_umask,pam_lastlog 
acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? 
res=success'
[41922.205554] audit: type=1104 audit(1580552389.038:124): pid=4955 uid=0 
auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_rootok 
acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? 
res=success'
[41932.321374] systemd[4919]: segfault at 640550f11920 ip 0000640550345cbd sp 
00007ffd40e80440 error 6 in systemd[6405502f6000+b7000]
[41932.321420] Code: 24 28 02 00 00 48 85 c9 74 0f 48 89 81 28 02 00 00 49 8b 
84 24 28 02 00 00 48 85 c0 0f 84 a0 07 00 00 49 8b 94 24 20 02 00 00 <48> 89 90 
20 02 00 00 49 c7 84 24 28 02 00 00 00 00 00 00 49 c7 84
[41932.321515] audit: type=1701 audit(1580552399.156:125): auid=0 uid=0 gid=0 
ses=4 pid=4919 comm="systemd" exe="/usr/lib/systemd/systemd" sig=11 res=1
[41932.336794] audit: type=1130 audit(1580552399.171:126): pid=1 uid=0 
auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-4990-0 
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? 
res=success'
[41932.627105] audit: type=1131 audit(1580552399.456:127): pid=1 uid=0 
auid=4294967295 ses=4294967295 msg='unit=user@0 comm="systemd" 
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[41932.636551] audit: type=1131 audit(1580552399.471:128): pid=1 uid=0 
auid=4294967295 ses=4294967295 msg='unit=user-runtime-dir@0 comm="systemd" 
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[41932.661359] audit: type=1131 audit(1580552399.495:129): pid=1 uid=0 
auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-4990-0 
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? 
res=success'
[41934.482123] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000080
[41934.482143] PGD 0 P4D 0
[41934.482150] Oops: 0000 [#1] SMP PTI
[41934.482159] CPU: 0 PID: 5002 Comm: Compositor Tainted: G O 
4.19.94-1.pvops.qubes.x86_64 #1
[41934.482178] RIP: 0010:mem_cgroup_page_lruvec+0x28/0x50
[41934.482189] Code: 00 00 0f 1f 44 00 00 0f 1f 44 00 00 48 8b 47 38 48 8b 17 
48 85 c0 48 0f 44 05 dc d1 0c 01 48 c1 ea 36 48 8b 84 d0 48 0a 00 00 <48> 3b b0 
80 00 00 00 75 12 f3 c3 48 8d 86 a0 a1 02 00 48 3b b0 80
[41934.482222] RSP: 0018:ffffc900011d3aa8 EFLAGS: 00010046
[41934.482232] RAX: 0000000000000000 RBX: ffffffff82369cc0 RCX: ffffc900011d3ae8
[41934.482246] RDX: 0000000000000000 RSI: ffff8880f9fd5000 RDI: ffffea0002adec00
[41934.482265] RBP: ffff88802f7e6fb8 R08: ffffc900011d3ae8 R09: 000000000001eb39
[41934.482279] R10: 00000000000fa000 R11: ffffffffffffffff R12: ffff8880f9fd5000
[41934.482294] R13: ffffea0002adec00 R14: 0000000000000014 R15: ffff88802f7e7000
[41934.482308] FS: 0000000000000000(0000) GS:ffff8880f5a00000(0000) 
knlGS:0000000000000000
[41934.482323] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41934.482335] CR2: 0000000000000080 CR3: 000000003c9da001 CR4: 00000000003606f0
[41934.482351] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[41934.482365] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[41934.482380] Call Trace:
[41934.482388] release_pages+0x12c/0x4b0
[41934.482397] tlb_flush_mmu_free+0x36/0x50
[41934.482406] unmap_page_range+0x8f0/0xd00
[41934.482415] unmap_vmas+0x4c/0xa0
[41934.482423] exit_mmap+0xb5/0x1a0
[41934.482432] mmput+0x5f/0x140
[41934.482443] flush_old_exec+0x597/0x6c0
[41934.482451] ? load_elf_phdrs+0x97/0xb0
[41934.482460] load_elf_binary+0x3d9/0x1224
[41934.482468] ? get_acl+0x1a/0x100
[41934.482477] search_binary_handler+0xa6/0x1c0
[41934.482487] __do_execve_file.isra.34+0x587/0x7e0
[41934.482498] __x64_sys_execve+0x34/0x40
[41934.482506] do_syscall_64+0x5b/0x190
[41934.482515] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[41934.482526] RIP: 0033:0x7c1fb7d15acb
[41934.482535] Code: Bad RIP value.
[41934.482543] RSP: 002b:00007c1fa7361b18 EFLAGS: 00000246 ORIG_RAX: 
000000000000003b
[41934.482557] RAX: ffffffffffffffda RBX: 00007c1fa7361b40 RCX: 00007c1fb7d15acb
[41934.482572] RDX: 00007c1fa9b5f800 RSI: 00007c1fa7361b20 RDI: 00007c1fb7a22cd0
[41934.482586] RBP: 00007c1fa7361ba0 R08: 00007c1fa7361b38 R09: 00007c1fa7361b60
[41934.482600] R10: 00007c1fa7361b20 R11: 0000000000000246 R12: 00007c1fa7361bd8
[41934.482615] R13: 0000000000000000 R14: 000000005e355001 R15: 00007c1fa7361bf0
[41934.482630] Modules linked in: ip6table_filter ip6_tables xt_conntrack 
ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 
nf_defrag_ipv4 libcrc32c intel_rapl crct10dif_pclmul crc32_pclmul crc32c_intel 
xen_netfront ghash_clmulni_intel intel_rapl_perf pcspkr u2mfn(O) xenfs 
xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn overlay xen_blkfront
[41934.482694] CR2: 0000000000000080
[41934.482703] ---[ end trace f587889938477959 ]---
[41934.482714] RIP: 0010:mem_cgroup_page_lruvec+0x28/0x50
[41934.482724] Code: 00 00 0f 1f 44 00 00 0f 1f 44 00 00 48 8b 47 38 48 8b 17 
48 85 c0 48 0f 44 05 dc d1 0c 01 48 c1 ea 36 48 8b 84 d0 48 0a 00 00 <48> 3b b0 
80 00 00 00 75 12 f3 c3 48 8d 86 a0 a1 02 00 48 3b b0 80
[41934.482756] RSP: 0018:ffffc900011d3aa8 EFLAGS: 00010046
[41934.482766] RAX: 0000000000000000 RBX: ffffffff82369cc0 RCX: ffffc900011d3ae8
[41934.482780] RDX: 0000000000000000 RSI: ffff8880f9fd5000 RDI: ffffea0002adec00
[41934.482794] RBP: ffff88802f7e6fb8 R08: ffffc900011d3ae8 R09: 000000000001eb39
[41934.482808] R10: 00000000000fa000 R11: ffffffffffffffff R12: ffff8880f9fd5000
[41934.482822] R13: ffffea0002adec00 R14: 0000000000000014 R15: ffff88802f7e7000
[41934.482837] FS: 0000000000000000(0000) GS:ffff8880f5a00000(0000) 
knlGS:0000000000000000
[41934.482851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41934.482863] CR2: 00007c1fb7d15aa1 CR3: 000000003c9da001 CR4: 00000000003606f0
[41934.482877] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[41934.482891] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[41934.482905] Kernel panic - not syncing: Fatal exception
[41936.108632] Shutting down cpus with NMI
[41936.108774] Kernel Offset: disabled
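
For anyone trying to read the two clocksource watchdog lines in the log above, here is a minimal sketch of the arithmetic. It assumes the usual masked-difference form `(now - last) & mask`, and it assumes the 'xen' clocksource counts in nanoseconds. Both are my reading of how the watchdog works, not something the log itself states. The helper names (`masked_delta`, `as_signed`, `parse`) are made up for illustration.

```python
import re

# Watchdog lines copied from the log above (re-joined after mail wrapping).
LOG = """\
clocksource: 'xen' wd_now: 2a1620baf67a wd_last: 2a140e3c5f9f mask: ffffffffffffffff
clocksource: 'tsc' cs_now: ffffff88779d4270 cs_last: 5083a288ea9a mask: ffffffffffffffff
"""

PATTERN = re.compile(
    r"clocksource: '(?P<name>\w+)' (?:wd|cs)_now: (?P<now>[0-9a-f]+) "
    r"(?:wd|cs)_last: (?P<last>[0-9a-f]+) mask: (?P<mask>[0-9a-f]+)"
)


def masked_delta(now, last, mask):
    """Counter delta with wraparound, computed as (now - last) & mask."""
    return (now - last) & mask


def as_signed(value, mask):
    """Reinterpret a masked delta as signed (assumes mask is 2**n - 1)."""
    sign_bit = (mask + 1) >> 1
    return value - (mask + 1) if value & sign_bit else value


def parse(log):
    """Map each clocksource name to its signed delta between the two reads."""
    deltas = {}
    for m in PATTERN.finditer(log):
        now, last, mask = (int(m[k], 16) for k in ("now", "last", "mask"))
        deltas[m["name"]] = as_signed(masked_delta(now, last, mask), mask)
    return deltas


deltas = parse(LOG)
for name, delta in deltas.items():
    print(f"{name}: {delta:+d}")
```

Under those assumptions the 'xen' delta comes out at roughly 8.9 seconds, while the 'tsc' delta is negative when read as a signed number. That would be consistent with the TSC value seen by the guest having gone backwards across the suspend, but I can't confirm that is what actually happened.
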
Any idea what might be causing this issue?


Thanks.
January 31, 2020 5:08 PM, [email protected] wrote:
Many thanks for the suggestion!
I'm not using any proprietary modules of any sort. Below are the only modules 
that are loaded in the AppVM that was killed (as you can see, nothing really 
special):

Module                    Size  Used by
fuse                    126976  3
ip6table_filter          16384  1
ip6_tables               32768  1 ip6table_filter
xt_conntrack             16384  2
ipt_MASQUERADE           16384  1
iptable_nat              16384  1
nf_nat_ipv4              16384  2 ipt_MASQUERADE,iptable_nat
nf_nat                   36864  1 nf_nat_ipv4
nf_conntrack            163840  4 xt_conntrack,nf_nat,ipt_MASQUERADE,nf_nat_ipv4
nf_defrag_ipv6           20480  1 nf_conntrack
nf_defrag_ipv4           16384  1 nf_conntrack
libcrc32c                16384  2 nf_conntrack,nf_nat
intel_rapl               24576  0
crct10dif_pclmul         16384  0
crc32_pclmul             16384  0
crc32c_intel             24576  1
ghash_clmulni_intel      16384  0
xen_netfront             32768  0
intel_rapl_perf          16384  0
pcspkr                   16384  0
xenfs                    16384  1
u2mfn                    16384  0
xen_privcmd              24576  17 xenfs
xen_gntdev               24576  1
xen_gntalloc             16384  5
xen_blkback              49152  0
xen_evtchn               16384  6
overlay                 122880  1
xen_blkfront             45056  6

The closure of the socket is probably related to borgmatic (which I'm using 
as my backup mechanism for the AppVMs). But I don't think it's the cause, since 
I have it enabled on only a few machines, and even the ones that are not using 
borgmatic are terminated on resume.

I'm running out of ideas on this. What I did notice, though, is that if the 
resume is done immediately after the suspend, it works fine without any AppVM 
being killed. That seems to point to an issue with the clock (that's the only 
thing that comes to mind, especially given the warning above), but I'm 
not sure if this is the root cause.
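
One rough way to check the clock hypothesis is to compare how far the wall-clock time in the audit records moves against how far the kernel log timestamp moves over the same two lines. The sketch below uses two consecutive audit records taken from the dmesg output at the top of this message (trimmed to the fields that matter). The function names `stamps` and `wall_clock_jump` are hypothetical, and reading the difference as "time the VM spent suspended" is an assumption.

```python
import re

# Two consecutive audit records from the dmesg output at the top of this
# message, re-joined after mail wrapping and trimmed.
LINES = [
    "[41921.176874] audit: type=1105 audit(1580516893.686:122): pid=4955 uid=0",
    "[41922.205481] audit: type=1106 audit(1580552389.038:123): pid=4955 uid=0",
]

STAMP = re.compile(r"^\[\s*(\d+\.\d+)\] audit: type=\d+ audit\((\d+\.\d+):\d+\)")


def stamps(line):
    """Return (kernel_log_timestamp_s, wall_clock_epoch_s) for one audit line."""
    m = STAMP.match(line)
    if m is None:
        raise ValueError(f"not an audit line: {line!r}")
    return float(m.group(1)), float(m.group(2))


def wall_clock_jump(first, second):
    """Wall-clock advance minus kernel-log-timestamp advance, in seconds."""
    log1, wall1 = stamps(first)
    log2, wall2 = stamps(second)
    return (wall2 - wall1) - (log2 - log1)


jump = wall_clock_jump(*LINES)
print(f"wall clock stepped by about {jump:.1f} s ({jump / 3600:.2f} h)")
```

For these two records the wall clock moves forward by several hours while the kernel log timestamp advances by about one second. That fits a long suspend followed by the clock being stepped on resume, and it would also fit with an immediate resume not triggering the problem.
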

Any more suggestions would be really appreciated!

