Re: Debugging malloc crash in gdb
> Can you share a link to the gdb bug report?

https://sourceware.org/bugzilla/show_bug.cgi?id=29513

So, support reaction time for open source products is usually great, but not always ;-). The report has been unassigned for more than two months and counting.

Kind Regards
Ariel Burbaickij

On Wed, Nov 2, 2022 at 1:38 PM Jon Turney wrote:
> On 19/10/2022 07:20, Ariel Burbaickij wrote:
> > Hello all,
> > I reported it already, of course as it happened to me but alas no reaction so far.
>
> Thanks for doing that.
>
> Can you share a link to the gdb bug report?

--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
Re: Debugging malloc crash in gdb
On 20/10/2022 09:22, David Allsopp wrote:
> On Tue, 18 Oct 2022 at 20:09, Jon Turney wrote:
>> On 18/10/2022 11:35, David Allsopp wrote:
>>> I'm wondering if I may be able to have some pointers for debugging what seems to be an unexpected interaction between mmap/mprotect/munmap and malloc with the OCaml runtime.
>> [...]
>>> /cygdrive/d/a/scallywag/gdb/gdb-11.2-1.x86_64/src/gdb-11.2/gdb/infrun.c:2550: internal-error: void resume_1(gdb_signal): Assertion `pc_in_thread_step_range (pc, tp)' failed.
>
> I'm not sure now which combination of stepping directly into the malloc call, adding 'set cygwin-exceptions on', or switching to gdb 12.1 did it, but either way I was able to get to an invalid memory access in mmap_alloc in malloc.cc. At this point, p was a pointer to the start of the 256M block which had been passed to munmap.
>
> What I then noticed from that is a bug in our code - the mmap'd region was actually 256M+64K but the size passed to munmap was 256M... so the munmap call was not releasing the entire block. Fixing that on the OCaml side fixes the error completely - I don't know whether what we were seeing before counts as a bug in Cygwin's allocator?

That depends.

Is the OCaml code relying on undefined behaviour, which just happens to work elsewhere, but fails on Cygwin? Or is it defined behaviour, which Cygwin doesn't implement correctly?

(It's not unreasonable that Cygwin's memory allocator is more sensitive to some classes of errors than other implementations.)
Re: Debugging malloc crash in gdb
On 19/10/2022 07:20, Ariel Burbaickij wrote:
> Hello all,
> I reported it already, of course as it happened to me but alas no reaction so far.

Thanks for doing that.

Can you share a link to the gdb bug report?
Re: Debugging malloc crash in gdb
Hello David,

congrats on your bug fixing, but gdb is pretty open that it considers this its own bug while running its "inferior", somewhere here:

  if (tp->control.may_range_step)
    {
      /* If we're resuming a thread with the PC out of the step
         range, then we're doing some nested/finer run control
         operation, like stepping the thread out of the dynamic
         linker or the displaced stepping scratch pad.  We
         shouldn't have allowed a range step then.  */
      gdb_assert (pc_in_thread_step_range (pc, tp));
    }

Whatever the logic behind setting may_range_step might be, it is (or should be) decoupled from whatever bugs there may be in allocators of any flavour. So, as I see it, this should be investigated by the gdb maintainers too.

Kind Regards
Ariel Burbaickij

On Thu, Oct 20, 2022 at 10:22 AM David Allsopp wrote:
> On Tue, 18 Oct 2022 at 20:09, Jon Turney wrote:
> >
> > On 18/10/2022 11:35, David Allsopp wrote:
> > > I'm wondering if I may be able to have some pointers for debugging what seems to be an unexpected interaction between mmap/mprotect/munmap and malloc with the OCaml runtime.
> > >
> > > At the moment, I know that we crash in malloc, so my main question is how to go further in gdb. I installed the cygwin-debuginfo package, but all I'm getting is:
> >
> > Firstly, if the crash is inside the cygwin DLL, you must follow the advice in [1], and use 'set cygwin-exceptions on' to tell gdb to stop on an exception inside cygwin itself.
> >
> > [1] https://cygwin.com/faq.html#faq.programming.debugging-cygwin
> >
> > > /cygdrive/d/a/scallywag/gdb/gdb-11.2-1.x86_64/src/gdb-11.2/gdb/infrun.c:2550: internal-error: void resume_1(gdb_signal): Assertion `pc_in_thread_step_range (pc, tp)' failed.
>
> I'm not sure now which combination of stepping directly into the malloc call, adding 'set cygwin-exceptions on', or switching to gdb 12.1 did it, but either way I was able to get to an invalid memory access in mmap_alloc in malloc.cc. At this point, p was a pointer to the start of the 256M block which had been passed to munmap.
>
> What I then noticed from that is a bug in our code - the mmap'd region was actually 256M+64K but the size passed to munmap was 256M... so the munmap call was not releasing the entire block. Fixing that on the OCaml side fixes the error completely - I don't know whether what we were seeing before counts as a bug in Cygwin's allocator?
>
> Many thanks!
>
>
> David
Re: Debugging malloc crash in gdb
On Tue, 18 Oct 2022 at 20:09, Jon Turney wrote:
>
> On 18/10/2022 11:35, David Allsopp wrote:
> > I'm wondering if I may be able to have some pointers for debugging what seems to be an unexpected interaction between mmap/mprotect/munmap and malloc with the OCaml runtime.
> >
> > At the moment, I know that we crash in malloc, so my main question is how to go further in gdb. I installed the cygwin-debuginfo package, but all I'm getting is:
>
> Firstly, if the crash is inside the cygwin DLL, you must follow the advice in [1], and use 'set cygwin-exceptions on' to tell gdb to stop on an exception inside cygwin itself.
>
> [1] https://cygwin.com/faq.html#faq.programming.debugging-cygwin
>
> > /cygdrive/d/a/scallywag/gdb/gdb-11.2-1.x86_64/src/gdb-11.2/gdb/infrun.c:2550: internal-error: void resume_1(gdb_signal): Assertion `pc_in_thread_step_range (pc, tp)' failed.

I'm not sure now which combination of stepping directly into the malloc call, adding 'set cygwin-exceptions on', or switching to gdb 12.1 did it, but either way I was able to get to an invalid memory access in mmap_alloc in malloc.cc. At this point, p was a pointer to the start of the 256M block which had been passed to munmap.

What I then noticed from that is a bug in our code - the mmap'd region was actually 256M+64K but the size passed to munmap was 256M... so the munmap call was not releasing the entire block. Fixing that on the OCaml side fixes the error completely - I don't know whether what we were seeing before counts as a bug in Cygwin's allocator?

Many thanks!


David
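The length mismatch described above (256M+64K mapped, 256M passed to munmap) is the kind of slip that can be avoided by recording the mapped length next to the base address and always unmapping with that recorded value. The following is only a minimal sketch of that idea in C: the `region` type and the `region_map` / `region_unmap` names are hypothetical and are not taken from the OCaml runtime, and the `extra` parameter merely stands in for whatever made the real mapping 64K larger, which the thread does not spell out.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical record of one reservation: base and length travel together. */
typedef struct {
    void  *base;   /* address returned by mmap, or NULL if not mapped */
    size_t len;    /* length actually passed to mmap                  */
} region;

/* Reserve usable + extra bytes of address space with no access rights.
   Returns 0 on success, -1 on failure. */
int region_map(region *r, size_t usable, size_t extra)
{
    size_t total = usable + extra;
    void *p = mmap(NULL, total, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    r->base = p;
    r->len  = total;             /* remember the full length, padding included */
    return 0;
}

/* Release exactly what region_map reserved. The caller never supplies a
   length, so the "256M instead of 256M+64K" mistake cannot be repeated here. */
int region_unmap(region *r)
{
    if (r->base == NULL)
        return 0;                /* nothing mapped */
    if (munmap(r->base, r->len) != 0)
        return -1;
    r->base = NULL;
    r->len  = 0;
    return 0;
}
```

On Linux, calling munmap with a shorter length removes only the pages in that range and leaves the tail mapped. Whether Cygwin is expected to behave the same way is the question left open in this thread, so the sketch simply sidesteps it by never passing a partial length.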
Re: Debugging malloc crash in gdb
Hello all,

I reported it already, of course as it happened to me but alas no reaction so far.

Kind Regards
Ariel Burbaickij

On Tuesday, October 18, 2022, Jon Turney wrote:
> On 18/10/2022 11:35, David Allsopp wrote:
>
>> I'm wondering if I may be able to have some pointers for debugging what seems to be an unexpected interaction between mmap/mprotect/munmap and malloc with the OCaml runtime.
>>
>> At the moment, I know that we crash in malloc, so my main question is how to go further in gdb. I installed the cygwin-debuginfo package, but all I'm getting is:
>
> Firstly, if the crash is inside the cygwin DLL, you must follow the advice in [1], and use 'set cygwin-exceptions on' to tell gdb to stop on an exception inside cygwin itself.
>
> [1] https://cygwin.com/faq.html#faq.programming.debugging-cygwin
>
>> /cygdrive/d/a/scallywag/gdb/gdb-11.2-1.x86_64/src/gdb-11.2/gdb/infrun.c:2550: internal-error: void resume_1(gdb_signal): Assertion `pc_in_thread_step_range (pc, tp)' failed.
>
> This looks similar to the gdb crash reported [2], which I just don't have any time to look into.
>
> [2] https://cygwin.com/pipermail/cygwin/2022-June/251714.html
>
> I'd suggest reporting this as directed in https://www.sourceware.org/gdb/bugs/
>
> (Note that self-service account creation is disabled on the sourceware bugzilla, due to spam problems, so you need to mail overseers as directed there, to request a Sourceware Bugzilla account.)
>
>> The reproduction case is below (it's the OCaml runtime, so it's not exactly minimal, but it seems to be very repeatable to get gdb to the position of the crash).
>>
>> [...]
>>
>> Any assistance to debug this further hugely appreciated!
>
> It might be worth exploring if this gdb crash is seen in older versions of gdb, or with older gcc...
Re: Debugging malloc crash in gdb
On 18/10/2022 11:35, David Allsopp wrote:
> I'm wondering if I may be able to have some pointers for debugging what seems to be an unexpected interaction between mmap/mprotect/munmap and malloc with the OCaml runtime.
>
> At the moment, I know that we crash in malloc, so my main question is how to go further in gdb. I installed the cygwin-debuginfo package, but all I'm getting is:

Firstly, if the crash is inside the cygwin DLL, you must follow the advice in [1], and use 'set cygwin-exceptions on' to tell gdb to stop on an exception inside cygwin itself.

[1] https://cygwin.com/faq.html#faq.programming.debugging-cygwin

> /cygdrive/d/a/scallywag/gdb/gdb-11.2-1.x86_64/src/gdb-11.2/gdb/infrun.c:2550: internal-error: void resume_1(gdb_signal): Assertion `pc_in_thread_step_range (pc, tp)' failed.

This looks similar to the gdb crash reported [2], which I just don't have any time to look into.

[2] https://cygwin.com/pipermail/cygwin/2022-June/251714.html

I'd suggest reporting this as directed in https://www.sourceware.org/gdb/bugs/

(Note that self-service account creation is disabled on the sourceware bugzilla, due to spam problems, so you need to mail overseers as directed there, to request a Sourceware Bugzilla account.)

> The reproduction case is below (it's the OCaml runtime, so it's not exactly minimal, but it seems to be very repeatable to get gdb to the position of the crash).
>
> [...]
>
> Any assistance to debug this further hugely appreciated!

It might be worth exploring if this gdb crash is seen in older versions of gdb, or with older gcc...
Debugging malloc crash in gdb
I'm wondering if I may be able to have some pointers for debugging what seems to be an unexpected interaction between mmap/mprotect/munmap and malloc with the OCaml runtime.

At the moment, I know that we crash in malloc, so my main question is how to go further in gdb. I installed the cygwin-debuginfo package, but all I'm getting is:

/cygdrive/d/a/scallywag/gdb/gdb-11.2-1.x86_64/src/gdb-11.2/gdb/infrun.c:2550: internal-error: void resume_1(gdb_signal): Assertion `pc_in_thread_step_range (pc, tp)' failed.

The reproduction case is below (it's the OCaml runtime, so it's not exactly minimal, but it seems to be very repeatable to get gdb to the position of the crash).

In terms of memory, what OCaml is doing:
- At startup, 256M of address space is reserved (with mmap) for garbage collected minor heaps ("minor arena")
- The first 2M of this is "committed" with mprotect for use by the program's main thread
- The program then instructs the runtime to double the size of the minor arena
- The 2M portion is "decommitted" with mprotect
- The 256M mmap'd region is munmap'd
- A new 512M region of address space is reserved
- The first 4M of this is "committed" with mprotect for use by the program's main thread
- The program performs some assertion checks
- Book-keeping at the end of this causes malloc to be called, which segfaults.

The crashing call to malloc is the first call to malloc since the 256M -> 512M munmap/map dance. If the call to caml_mem_unmap at the end of unreserve_minor_heaps in runtime/domain.c is omitted, then this program succeeds - i.e. malloc does not appear to crash if the 256M region is left mapped. Obviously, I realise this may well be unrelated to what's going wrong.

Any assistance to debug this further hugely appreciated!

Thanks,

David

---

Full repro instructions

Cygwin packages required: gcc-core, make, flexdll

Build:
  git clone https://github.com/dra27/ocaml -b restore-cygwin-break --depth 1
  cd ocaml
  ./configure --disable-native-compiler --disable-debugger --disable-ocamldoc && make -j
  runtime/ocamlrun.exe ./ocamlc.exe -nostdlib -I stdlib testsuite/tests/regression/pr9326/gc_set.ml -o gc_set.byte.exe

Crash:
  runtime/ocamlrun.exe ./gc_set.byte.exe

Debug:
  OCAMLRUNPARAM=v=0x1FFF gdb runtime/ocamlrun.exe
  break caml_gc_get
  run ./gc_set.byte.exe
  continue
  break alloc_generic_table
  continue
  break caml_stat_alloc_noexc
  continue
  step
  step
  step
  *boom*
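For readers who want to poke at the memory sequence without building OCaml, the steps listed in the original post (reserve, commit, decommit, unmap, reserve larger, commit, then malloc) can be approximated in a few lines of C. This is a rough sketch under stated assumptions, not the runtime's actual code: the helper names are hypothetical, "reserve" is modelled as an anonymous PROT_NONE mapping, and "commit" / "decommit" as mprotect calls, which is one reading of the description above. It is not claimed to reproduce the crash.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical helpers; these are not the OCaml runtime's functions. */

/* Reserve len bytes of address space with no access rights. */
static void *arena_reserve(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* "Commit": make the first len bytes readable and writable. */
static int arena_commit(void *base, size_t len)
{
    return mprotect(base, len, PROT_READ | PROT_WRITE);
}

/* "Decommit": revoke access to the first len bytes again. */
static int arena_decommit(void *base, size_t len)
{
    return mprotect(base, len, PROT_NONE);
}

/* Replay the sequence from the post with caller-chosen sizes.
   Returns 0 on success, -1 on the first failing step. Error paths do not
   clean up, which is acceptable for a throwaway experiment only. */
int resize_minor_arena(size_t old_len, size_t old_commit,
                       size_t new_len, size_t new_commit)
{
    assert(old_commit <= old_len && new_commit <= new_len);

    char *arena = arena_reserve(old_len);
    if (arena == NULL) return -1;
    if (arena_commit(arena, old_commit) != 0) return -1;
    arena[0] = 1;                               /* touch committed memory */

    if (arena_decommit(arena, old_commit) != 0) return -1;
    if (munmap(arena, old_len) != 0) return -1; /* same length as reserved */

    arena = arena_reserve(new_len);
    if (arena == NULL) return -1;
    if (arena_commit(arena, new_commit) != 0) return -1;
    arena[0] = 1;

    void *book = malloc(64);                    /* stand-in for book-keeping */
    if (book == NULL) return -1;
    free(book);

    return munmap(arena, new_len);
}
```

With the sizes from the post, the call would be resize_minor_arena with 256M, 2M, 512M and 4M as arguments. Passing a different length to munmap than was reserved is the variation that the follow-up messages in this thread identify as the actual bug on the OCaml side.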