Re: svn commit: r333703 - head/sys/vm

Andrew Gallatin Thu, 17 May 2018 11:19:25 -0700

On 05/17/18 14:07, Mark Johnston wrote:

On Thu, May 17, 2018 at 10:07:34AM -0700, Conrad Meyer wrote:

On Wed, May 16, 2018 at 9:27 PM, Mark Johnston <[email protected]> wrote:

Author: markj
Date: Thu May 17 04:27:08 2018
New Revision: 333703
URL: 
https://urldefense.proofpoint.com/v2/url?u=https-3A__svnweb.freebsd.org_changeset_base_333703&d=DwIBAg&c=imBPVzF25OnBgGmVOlcsiEgHoG1i6YHLR0Sj_gZ4adc&r=Ed-falealxPeqc22ehgAUCLh8zlZbibZLSMWJeZro4A&m=6lhtci2MYxtyrK5Ub70QC0DcEiQ77Ry2LTAb6cDtW5A&s=z0SOGvNGORjI-SySfy-aovuyFzy_K5CtCfbNeWbRGLA&e=


Log:
   Fix a race in vm_page_pagequeue_lockptr().

   The value of m->queue must be cached after comparing it with PQ_NONE,
   since it may be concurrently changing.

   Reported by:  glebius


What were the symptoms of this issue?  The test plan in the linked
phabricator revision says:

"Gleb reported seeing panics as a result of the use of a bogus index
into the pagequeue array, and also reported that this patch fixed the
panics."

So an attempt to lock pagequeues[PQ_NONE=255].pq_mutex, which is
either something later in the vm_domain object, or bogus memory?  One
of the mtx asserts trips?


I think it was "mtx_lock() of spin mutex"; I didn't get a lot of
details.

I failed to note in the commit message that this race was introduced in
r332974.


The most common stack was:

panic: mtx_lock() of spin mutex (null) @/data/ocafirmware.alt/FreeBSD/sys/vm/vm_page.c:3344

cpuid = 4
time = 1526415167
KDB: stack backtrace:

db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame0xfffffe158af62380

vpanic() at vpanic+0x1a3/frame 0xfffffe158af623e0
doadump() at doadump/frame 0xfffffe158af62460
__mtx_lock_flags() at __mtx_lock_flags+0x11a/frame 0xfffffe158af624a0
vm_page_dequeue() at vm_page_dequeue+0x8a/frame 0xfffffe158af624e0

vm_page_alloc_domain_after() at vm_page_alloc_domain_after+0x2cb/frame0xfffffe158af62560

vm_page_grab_pages() at vm_page_grab_pages+0x274/frame 0xfffffe158af62610
vn_sendfile() at vn_sendfile+0x83a/frame 0xfffffe158af628e0

[Tue May 15 20:12:48 2018]sys_sendfile() at sys_sendfile+0x119/frame0xfffffe158af62980

amd64_syscall() at amd64_syscall+0x298/frame 0xfffffe158af62ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe158af62ab0



I once saw one like this:


Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff8088bf74
stack pointer           = 0x28:0xfffffe55af7712e0
frame pointer           = 0x28:0xfffffe55af771330
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq446: mlx5_core0)
[Mon May 14 04:45:10 2018]trap number           = 9
panic: general protection fault
cpuid = 0
time = 1526273109
KDB: stack backtrace:

db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame0xfffffe55af770ff0

vpanic() at vpanic+0x1a3/frame 0xfffffe55af771050
panic() at panic+0x43/frame 0xfffffe55af7710b0
trap_fatal() at trap_fatal+0x35f/frame 0xfffffe55af771100
trap() at trap+0x6d/frame 0xfffffe55af771210

[Mon May 14 04:45:10 2018]calltrap() at calltrap+0x8/frame0xfffffe55af771210--- trap 0x9, rip = 0xffffffff8088bf74, rsp = 0xfffffe55af7712e0, rbp =0xfffffe55af771330 ---vm_pqbatch_submit_page() at vm_pqbatch_submit_page+0x144/frame0xfffffe55af771330

sendfile_free_page() at sendfile_free_page+0x10e/frame 0xfffffe55af771360

sendfile_free_mext_pg() at sendfile_free_mext_pg+0xb7/frame0xfffffe55af7713b0

mb_free_ext() at mb_free_ext+0x103/frame 0xfffffe55af7713e0
m_freem() at m_freem+0x48/frame 0xfffffe55af771400
tcp_do_segment() at tcp_do_segment+0x1647/frame 0xfffffe55af771500
tcp_input_with_port() at tcp_input_with_port+0xfcc/frame 0xfffffe55af771650
tcp_input() at tcp_input+0xb/frame 0xfffffe55af771660

[Mon May 14 04:45:10 2018]ip_input() at ip_input+0xe9/frame0xfffffe55af7716c0

netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe55af771710
ether_demux() at ether_demux+0x140/frame 0xfffffe55af771740
ether_nh_input() at ether_nh_input+0x32c/frame 0xfffffe55af7717a0
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe55af7717f0
ether_input() at ether_input+0x26/frame 0xfffffe55af771810
tcp_lro_flush_all() at tcp_lro_flush_all+0xf2/frame 0xfffffe55af771850
mlx5e_rx_cq_comp() at mlx5e_rx_cq_comp+0x5e5/frame 0xfffffe55af771950
mlx5_cq_completion() at mlx5_cq_completion+0x73/frame 0xfffffe55af771990
<...>

Thanks again for fixing it so quickly!

Drew
_______________________________________________
[email protected] mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "[email protected]"

Re: svn commit: r333703 - head/sys/vm

Reply via email to