It was a typo, sorry about that. I'm now replying to the list.

I'm using Fedora 30 and Debian 9; both the templates and the AppVMs use PVH
virtualization. There is nothing fancy in my setup; it is what comes out of
the box.

Do you have any idea, Mike, about what might be causing this?
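One detail that stood out when I compared the timestamps in the log below: between audit events :122 and :123 the kernel uptime advances by about one second, but the wall-clock (epoch) part of the audit stamps jumps by almost ten hours, which fits the suspend window and the clocksource warning. A quick sanity check (plain Python, timestamps copied from the quoted log):

```python
from datetime import datetime, timezone

# Timestamps copied from the quoted dmesg/audit output below.
# audit(EPOCH:SERIAL) carries wall-clock time; the [NNNNN.NNNNNN]
# prefix is kernel uptime (monotonic, stops during suspend).
wall_before = 1580516893.686   # audit event :122
wall_after  = 1580552389.038   # audit event :123
uptime_before = 41921.176874
uptime_after  = 41922.205481

wall_gap = wall_after - wall_before        # ~35495 s, ~9.86 h
uptime_gap = uptime_after - uptime_before  # ~1 s

print(f"wall clock advanced {wall_gap / 3600:.2f} h, "
      f"uptime only {uptime_gap:.2f} s")
print("suspend began around",
      datetime.fromtimestamp(wall_before, tz=timezone.utc).isoformat())
```

So the VM really does sleep through a large wall-clock jump, which is exactly the situation that makes the clocksource watchdog declare the TSC unstable.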

Thanks!


February 1, 2020 1:37 PM, "Mike Keehan" <m...@keehan.net> wrote:

> On 2/1/20 10:27 AM, mmo...@disroot.org wrote:
> 
>> Same problem again, this time not related to any socket closure.
>> Apparently related to systemd:
>> [41911.199732] audit: type=1104 audit(1580516883.707:119): pid=4917 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_rootok acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? res=success'
>> [41920.252871] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
>> [41920.252927] clocksource: 'xen' wd_now: 2a1620baf67a wd_last: 2a140e3c5f9f mask: ffffffffffffffff
>> [41920.252972] clocksource: 'tsc' cs_now: ffffff88779d4270 cs_last: 5083a288ea9a mask: ffffffffffffffff
>> [41920.253013] tsc: Marking TSC unstable due to clocksource watchdog
>> [41921.161370] audit: type=1100 audit(1580516893.670:120): pid=4955 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:authentication grantors=pam_rootok acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? res=success'
>> [41921.163039] audit: type=1103 audit(1580516893.672:121): pid=4955 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_rootok acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? res=success'
>> [41921.176874] audit: type=1105 audit(1580516893.686:122): pid=4955 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:session_open grantors=pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_umask,pam_lastlog acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? res=success'
>> [41922.205481] audit: type=1106 audit(1580552389.038:123): pid=4955 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:session_close grantors=pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_umask,pam_lastlog acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? res=success'
>> [41922.205554] audit: type=1104 audit(1580552389.038:124): pid=4955 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_rootok acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? res=success'
>> *[41932.321374] systemd[4919]: segfault at 640550f11920 ip 0000640550345cbd sp 00007ffd40e80440 error 6 in systemd[6405502f6000+b7000]
>> [41932.321420] Code: 24 28 02 00 00 48 85 c9 74 0f 48 89 81 28 02 00 00 49 8b 84 24 28 02 00 00 48 85 c0 0f 84 a0 07 00 00 49 8b 94 24 20 02 00 00 <48> 89 90 20 02 00 00 49 c7 84 24 28 02 00 00 00 00 00 00 49 c7 84*
>> [41932.321515] audit: type=1701 audit(1580552399.156:125): auid=0 uid=0 gid=0 ses=4 pid=4919 comm="systemd" exe="/usr/lib/systemd/systemd" sig=11 res=1
>> [41932.336794] audit: type=1130 audit(1580552399.171:126): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-4990-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
>> [41932.627105] audit: type=1131 audit(1580552399.456:127): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=user@0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
>> [41932.636551] audit: type=1131 audit(1580552399.471:128): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=user-runtime-dir@0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
>> [41932.661359] audit: type=1131 audit(1580552399.495:129): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-4990-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
>> [41934.482123] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
>> [41934.482143] PGD 0 P4D 0
>> [41934.482150] Oops: 0000 [#1] SMP PTI
>> [41934.482159] CPU: 0 PID: 5002 Comm: Compositor Tainted: G O 4.19.94-1.pvops.qubes.x86_64 #1
>> [41934.482178] RIP: 0010:mem_cgroup_page_lruvec+0x28/0x50
>> [41934.482189] Code: 00 00 0f 1f 44 00 00 0f 1f 44 00 00 48 8b 47 38 48 8b 17 48 85 c0 48 0f 44 05 dc d1 0c 01 48 c1 ea 36 48 8b 84 d0 48 0a 00 00 <48> 3b b0 80 00 00 00 75 12 f3 c3 48 8d 86 a0 a1 02 00 48 3b b0 80
>> [41934.482222] RSP: 0018:ffffc900011d3aa8 EFLAGS: 00010046
>> [41934.482232] RAX: 0000000000000000 RBX: ffffffff82369cc0 RCX: ffffc900011d3ae8
>> [41934.482246] RDX: 0000000000000000 RSI: ffff8880f9fd5000 RDI: ffffea0002adec00
>> [41934.482265] RBP: ffff88802f7e6fb8 R08: ffffc900011d3ae8 R09: 000000000001eb39
>> [41934.482279] R10: 00000000000fa000 R11: ffffffffffffffff R12: ffff8880f9fd5000
>> [41934.482294] R13: ffffea0002adec00 R14: 0000000000000014 R15: ffff88802f7e7000
>> [41934.482308] FS: 0000000000000000(0000) GS:ffff8880f5a00000(0000) knlGS:0000000000000000
>> [41934.482323] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [41934.482335] CR2: 0000000000000080 CR3: 000000003c9da001 CR4: 00000000003606f0
>> [41934.482351] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [41934.482365] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [41934.482380] Call Trace:
>> [41934.482388] release_pages+0x12c/0x4b0
>> [41934.482397] tlb_flush_mmu_free+0x36/0x50
>> [41934.482406] unmap_page_range+0x8f0/0xd00
>> [41934.482415] unmap_vmas+0x4c/0xa0
>> [41934.482423] exit_mmap+0xb5/0x1a0
>> [41934.482432] mmput+0x5f/0x140
>> [41934.482443] flush_old_exec+0x597/0x6c0
>> [41934.482451] ? load_elf_phdrs+0x97/0xb0
>> [41934.482460] load_elf_binary+0x3d9/0x1224
>> [41934.482468] ? get_acl+0x1a/0x100
>> [41934.482477] search_binary_handler+0xa6/0x1c0
>> [41934.482487] __do_execve_file.isra.34+0x587/0x7e0
>> [41934.482498] __x64_sys_execve+0x34/0x40
>> [41934.482506] do_syscall_64+0x5b/0x190
>> [41934.482515] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [41934.482526] RIP: 0033:0x7c1fb7d15acb
>> [41934.482535] Code: Bad RIP value.
>> [41934.482543] RSP: 002b:00007c1fa7361b18 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
>> [41934.482557] RAX: ffffffffffffffda RBX: 00007c1fa7361b40 RCX: 00007c1fb7d15acb
>> [41934.482572] RDX: 00007c1fa9b5f800 RSI: 00007c1fa7361b20 RDI: 00007c1fb7a22cd0
>> [41934.482586] RBP: 00007c1fa7361ba0 R08: 00007c1fa7361b38 R09: 00007c1fa7361b60
>> [41934.482600] R10: 00007c1fa7361b20 R11: 0000000000000246 R12: 00007c1fa7361bd8
>> [41934.482615] R13: 0000000000000000 R14: 000000005e355001 R15: 00007c1fa7361bf0
>> [41934.482630] Modules linked in: ip6table_filter ip6_tables xt_conntrack ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c intel_rapl crct10dif_pclmul crc32_pclmul crc32c_intel xen_netfront ghash_clmulni_intel intel_rapl_perf pcspkr u2mfn(O) xenfs xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn overlay xen_blkfront
>> [41934.482694] CR2: 0000000000000080
>> [41934.482703] ---[ end trace f587889938477959 ]---
>> [41934.482714] RIP: 0010:mem_cgroup_page_lruvec+0x28/0x50
>> [41934.482724] Code: 00 00 0f 1f 44 00 00 0f 1f 44 00 00 48 8b 47 38 48 8b 17 48 85 c0 48 0f 44 05 dc d1 0c 01 48 c1 ea 36 48 8b 84 d0 48 0a 00 00 <48> 3b b0 80 00 00 00 75 12 f3 c3 48 8d 86 a0 a1 02 00 48 3b b0 80
>> [41934.482756] RSP: 0018:ffffc900011d3aa8 EFLAGS: 00010046
>> [41934.482766] RAX: 0000000000000000 RBX: ffffffff82369cc0 RCX: ffffc900011d3ae8
>> [41934.482780] RDX: 0000000000000000 RSI: ffff8880f9fd5000 RDI: ffffea0002adec00
>> [41934.482794] RBP: ffff88802f7e6fb8 R08: ffffc900011d3ae8 R09: 000000000001eb39
>> [41934.482808] R10: 00000000000fa000 R11: ffffffffffffffff R12: ffff8880f9fd5000
>> [41934.482822] R13: ffffea0002adec00 R14: 0000000000000014 R15: ffff88802f7e7000
>> [41934.482837] FS: 0000000000000000(0000) GS:ffff8880f5a00000(0000) knlGS:0000000000000000
>> [41934.482851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [41934.482863] CR2: 00007c1fb7d15aa1 CR3: 000000003c9da001 CR4: 00000000003606f0
>> [41934.482877] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [41934.482891] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [41934.482905] Kernel panic - not syncing: Fatal exception
>> [41936.108632] Shutting down cpus with NMI
>> [41936.108774] Kernel Offset: disabled
>> Any idea what might be causing this issue?
>> Thanks.
>> 
>> January 31, 2020 5:08 PM, mmo...@disroot.org wrote:
>> Many thanks for the suggestion!
>> I'm not using any proprietary modules of any sort; below are the
>> only modules that are loaded in the AppVM that was killed (as you
>> can see, nothing really special):
>> Module Size Used by
>> fuse 126976 3
>> ip6table_filter 16384 1
>> ip6_tables 32768 1 ip6table_filter
>> xt_conntrack 16384 2
>> ipt_MASQUERADE 16384 1
>> iptable_nat 16384 1
>> nf_nat_ipv4 16384 2 ipt_MASQUERADE,iptable_nat
>> nf_nat 36864 1 nf_nat_ipv4
>> nf_conntrack 163840 4 xt_conntrack,nf_nat,ipt_MASQUERADE,nf_nat_ipv4
>> nf_defrag_ipv6 20480 1 nf_conntrack
>> nf_defrag_ipv4 16384 1 nf_conntrack
>> libcrc32c 16384 2 nf_conntrack,nf_nat
>> intel_rapl 24576 0
>> crct10dif_pclmul 16384 0
>> crc32_pclmul 16384 0
>> crc32c_intel 24576 1
>> ghash_clmulni_intel 16384 0
>> xen_netfront 32768 0
>> intel_rapl_perf 16384 0
>> pcspkr 16384 0
>> xenfs 16384 1
>> u2mfn 16384 0
>> xen_privcmd 24576 17 xenfs
>> xen_gntdev 24576 1
>> xen_gntalloc 16384 5
>> xen_blkback 49152 0
>> xen_evtchn 16384 6
>> overlay 122880 1
>> xen_blkfront 45056 6
>> The closure of the socket is probably related to borgmatic (which
>> I'm using as my backup mechanism for the AppVMs). But I don't think
>> it's related to the crashes, since I have it enabled only on a few
>> machines, and even the ones that are not using borgmatic are
>> terminated on resume.
>> I'm running out of ideas on this. What I did notice, though, is that
>> if the resume is done immediately after the suspend, the resume works
>> fine without any AppVM being killed, which seems to indicate perhaps
>> an issue with the clock (that's the only thing that comes to mind,
>> especially given the warning above), but I'm not sure if this is the
>> root cause.
>> Any more suggestions would be really appreciated!
> 
> As this is a different crash, maybe it is memory corruption.
> It might help to know which VM template is crashing,
> and whether there are any VMs that never crash.
> 
> What type of machine are you using?
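To gather that, here is roughly what I plan to run; this is a sketch, and the qvm-ls field syntax is an assumption based on Qubes 4.0 tooling (it may differ on other releases):

```shell
# In dom0 -- list which template backs each qube, so I can correlate
# templates with the VMs that get killed on resume (qvm-ls flags are
# an assumption from Qubes 4.0; adjust for your release):
#   qvm-ls --fields NAME,CLASS,TEMPLATE

# Inside an affected AppVM -- see which clocksource the kernel is
# actually using after the watchdog demoted 'tsc':
cs=/sys/devices/system/clocksource/clocksource0
cat "$cs/current_clocksource" 2>/dev/null || echo "unknown"
cat "$cs/available_clocksource" 2>/dev/null || echo "unknown"
```

In a PVH VM I would expect 'xen' to be the current clocksource once 'tsc' has been marked unstable.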

-- 
You received this message because you are subscribed to the Google Groups 
"qubes-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to qubes-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/qubes-users/0df159c4f85a439e605c17870ba3eaba%40disroot.org.
