Re: [osv-dev] Re: NMI crash in memcpy() between memory areas allocated with mmu::map_anon()

Rick Payne Sat, 28 Mar 2020 15:33:00 -0700


With your latest 2 patches, our production box, which was having
problems, has run fine for the last 48 hours. Thanks for working so
hard on fixing it! It has been quite the pain point for us.


Are bugs 784 and 1077 something we should worry about?

Rick

On Thu, 2020-03-26 at 12:50 -0700, Waldek Kozaczuk wrote:
> This was actually caused by a bug in one of the older versions of the
> "mempool: use map_anon() for large allocations or when memory is
> fragmented" patch. It turns out I forgot that object_size() also
> needs to account for map_anon()-based allocations, and to do so
> properly ;-)
> The latest version (version 4) of this patch should work better, and I
> also added a unit test for it. But it still needs to be reviewed.
> 
> On Wednesday, March 25, 2020 at 11:48:52 AM UTC-4, Waldek Kozaczuk
> wrote:
> > This is really related to the "OOM query" thread, but I wanted to
> > send a new email as the other thread has gotten quite long.
> > 
> > In any case, we are troubleshooting an app crash that happens almost
> > immediately after boot, and the stack trace of one of the threads
> > looks like this:
> > 
> > (gdb) bt
> > #0  0x00000000403a7bea in processor::cli_hlt () at arch/x64/processor.hh:247
> > #1  nmi (ef=0xffff80003fa1c068) at arch/x64/exceptions.cc:306
> > #2  <signal handler called>
> > #3  0x00000000403940a3 in memcpy_repmov_ssse3 (dest=0x2000415014c0, src=0x20004e7851d4, n=16) at /usr/include/c++/9/array:185
> > #4  0x0000100001756a5b in ?? ()
> > #5  0x0000000000000000 in ?? ()
> > 
> > Also, this is with the last 2 patches applied to address
> > fragmentation: "[PATCH V2 1/2] mempool: fix a bug in
> > page_range_allocator() when handling worst case O(n) scenario" and
> > "[PATCH V2 2/2] mempool: use map_anon() for large allocations or
> > when memory is fragmented", which make malloc_large() use
> > mmu::map_anon() in certain cases.
> > 
> > As you can tell, memcpy() (specifically memcpy_repmov_ssse3())
> > triggers an NMI (non-maskable interrupt) exception while copying
> > between memory areas allocated with mmu::map_anon() (see
> > dest=0x2000415014c0, src=0x20004e7851d4, n=16). I really have no
> > idea why, but I have a hunch that it happens because the mapping
> > tables are not being refreshed/flushed properly. Possibly the
> > allocation is requested on one CPU and then memcpy() is called on
> > another one, which does not see the mapping yet. Or maybe the TLB
> > needs to be flushed. From a cursory reading, it looks like
> > mmu::map_anon() might be doing that (somewhere downstream), but I
> > am not 100% sure.
> > 
> > Or maybe this NMI is caused by a misaligned memory allocation (I
> > raised a question in my patch about whether it handles alignment
> > properly). Or maybe it is a bug in my patch? Or maybe there is
> > something fundamentally different between memory allocated with
> > map_anon() and allocations backed by contiguous physical memory.
> > 
> > Does anybody have other smart ideas?
> > 
> > Waldek
> 
> -- 
> You received this message because you are subscribed to the Google
> Groups "OSv Development" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to osv-dev+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/osv-dev/899b38b5-aed4-4497-ab83-c161a6b673ea%40googlegroups.com
> .
