Re: [julia-users] debugging tips for GC?
Hello, On Tuesday, September 13, 2016 at 10:17:30 PM UTC+2, Yichao Yu wrote: > > On Tue, Sep 13, 2016 at 4:11 PM, Yichao Yu> wrote: > >> I'm able to reproduce it in rr and found the issue. >> TL;DR the issue is at >> https://github.com/JuliaGraphics/Cairo.jl/blame/master/src/Cairo.jl#L625, >> where it passes the ownership of a cairo pointer to julia, causing a double >> free. >> > > Which comes from your commit a year ago ;-p > > https://github.com/JuliaGraphics/Cairo.jl/commit/23681dc1270882c964059d23863a88d4276f6fd8 > True. Actually Tim recommended this, i had initially a solution with unsafe_load on single elements. But we both and subsequent editors never recognized the problem. I've filed an PR.
Re: [julia-users] debugging tips for GC?
On Tue, Sep 13, 2016 at 4:11 PM, Yichao Yuwrote: > I'm able to reproduce it in rr and found the issue. > > TL;DR the issue is at https://github.com/JuliaGraphics/Cairo.jl/blame/ > master/src/Cairo.jl#L625, where it passes the ownership of a cairo > pointer to julia, causing a double free. > Which comes from your commit a year ago ;-p https://github.com/JuliaGraphics/Cairo.jl/commit/23681dc1270882c964059d23863a88d4276f6fd8 > > Here's the rough process of my debugging, I'm not really sure how to > summarize it though > > 1. It abort in cairo `cairo_destory_path` so I first compiled a cairo with > debug symbol to make my life easier. (the function is pretty short so > reading the disasm would have worked too) > 2. It is free'ing `path->data` so I added a watchpoint on it `watch -l > path->data` and reverse-continue to find the point of assignment. > 3. Assignment happens in cairo from a valid malloc so path->data isn't > corrupted. > 4. Now it takes some guessing to figure out exactly what's wrong. I'm not > sure how glibc stores it's malloc metadata (would help to know that) so I > tried the naive thing and watch the intptr_t before the malloc result > (that's how julia store the gc metadata) and run forward. None of the > assignment to this location looks suspicious (they are all in glibc and the > first hit isn't free'ing this value) > 5. So now I tried the brute force way,the pointer (`path->data`) I see is > `0x3746950` so I simply did a conditional breakpoint to see when it's > free'd with `br free if $rdi == 0x3746950`. I use rdi to get the first > argument since the glibc I installed doesn't have that detailed debug info. > 6. After a long run (conditional breakpoint is really slow which is why I > didn't use it first) it hits a breakpoint in the julia GC when free'ing an > array. The array has a data pointer the same as the one in question and > that's before the pointer is free'd by cairo so sth is wrong with the > creation of the array. Now simply watch the `a->data` and go back again. > I'm lucky this time, if this didn't work, the next thing to try would be > trying to reduce the code/ run GC more often so that I can afford looking > at the code more carefully instead of just catching events in the debugger. > 7. As expected, it hits `jl_ptr_to_array` and going up a frame it seems > that the caller is supplying a cairo pointer and transfering the ownership, > which is wrong. > > > > On Tue, Sep 13, 2016 at 3:36 PM, Yichao Yu wrote: > >> >> >> On Tue, Sep 13, 2016 at 3:31 PM, Andreas Lobinger >> wrote: >> >>> Hello colleague, >>> >>> On Tuesday, September 13, 2016 at 7:25:38 PM UTC+2, Yichao Yu wrote: On Tue, Sep 13, 2016 at 12:49 PM, Andreas Lobinger wrote: > Hello colleagues, > > i'm trying to find out, why this > ... > fails miserably. I guess, but cannot track it down right now: There is > something wrong in memory management of Cairo.jl that only shows up for > objects that could have been freed long ago and julia and libcairo have > different concepts of invalidation. > > Any blog/receipe/issue that deals with GC debugging? > It's not too different from debugging memory issue in any other program. It usually helps (a lot) to reproduce under rr[1] >>> >>> Many thanks for pointing to this. I was aware it exists but wasn't aware >>> of their progress. >>> >>> Other than that, it strongly depend on the kind of error and I've seen it happens due to almost all parts of the runtime and it's really hard to summarize. >>> >>> What do you mean with "happens due to almost all parts of the runtime" ? >>> >> >> The general procedure is basically catch the failure and try to figure >> out why it got into this states. This means that you generally need to >> trace back where certain value is generated which also usually means that >> you need to trace back through a few layers of code and they might be >> scattered all over the place. >> >> >
Re: [julia-users] debugging tips for GC?
I'm able to reproduce it in rr and found the issue. TL;DR the issue is at https://github.com/JuliaGraphics/Cairo.jl/blame/master/src/Cairo.jl#L625, where it passes the ownership of a cairo pointer to julia, causing a double free. Here's the rough process of my debugging, I'm not really sure how to summarize it though 1. It abort in cairo `cairo_destory_path` so I first compiled a cairo with debug symbol to make my life easier. (the function is pretty short so reading the disasm would have worked too) 2. It is free'ing `path->data` so I added a watchpoint on it `watch -l path->data` and reverse-continue to find the point of assignment. 3. Assignment happens in cairo from a valid malloc so path->data isn't corrupted. 4. Now it takes some guessing to figure out exactly what's wrong. I'm not sure how glibc stores it's malloc metadata (would help to know that) so I tried the naive thing and watch the intptr_t before the malloc result (that's how julia store the gc metadata) and run forward. None of the assignment to this location looks suspicious (they are all in glibc and the first hit isn't free'ing this value) 5. So now I tried the brute force way,the pointer (`path->data`) I see is `0x3746950` so I simply did a conditional breakpoint to see when it's free'd with `br free if $rdi == 0x3746950`. I use rdi to get the first argument since the glibc I installed doesn't have that detailed debug info. 6. After a long run (conditional breakpoint is really slow which is why I didn't use it first) it hits a breakpoint in the julia GC when free'ing an array. The array has a data pointer the same as the one in question and that's before the pointer is free'd by cairo so sth is wrong with the creation of the array. Now simply watch the `a->data` and go back again. I'm lucky this time, if this didn't work, the next thing to try would be trying to reduce the code/ run GC more often so that I can afford looking at the code more carefully instead of just catching events in the debugger. 7. As expected, it hits `jl_ptr_to_array` and going up a frame it seems that the caller is supplying a cairo pointer and transfering the ownership, which is wrong. On Tue, Sep 13, 2016 at 3:36 PM, Yichao Yuwrote: > > > On Tue, Sep 13, 2016 at 3:31 PM, Andreas Lobinger > wrote: > >> Hello colleague, >> >> On Tuesday, September 13, 2016 at 7:25:38 PM UTC+2, Yichao Yu wrote: >>> >>> >>> On Tue, Sep 13, 2016 at 12:49 PM, Andreas Lobinger >>> wrote: >>> Hello colleagues, i'm trying to find out, why this ... >>> fails miserably. I guess, but cannot track it down right now: There is something wrong in memory management of Cairo.jl that only shows up for objects that could have been freed long ago and julia and libcairo have different concepts of invalidation. Any blog/receipe/issue that deals with GC debugging? >>> >>> It's not too different from debugging memory issue in any other program. >>> It usually helps (a lot) to reproduce under rr[1] >>> >> >> Many thanks for pointing to this. I was aware it exists but wasn't aware >> of their progress. >> >> >>> Other than that, it strongly depend on the kind of error and I've seen >>> it happens due to almost all parts of the runtime and it's really hard to >>> summarize. >>> >> >> What do you mean with "happens due to almost all parts of the runtime" ? >> > > The general procedure is basically catch the failure and try to figure out > why it got into this states. This means that you generally need to trace > back where certain value is generated which also usually means that you > need to trace back through a few layers of code and they might be scattered > all over the place. > >
Re: [julia-users] debugging tips for GC?
On Tue, Sep 13, 2016 at 3:31 PM, Andreas Lobingerwrote: > Hello colleague, > > On Tuesday, September 13, 2016 at 7:25:38 PM UTC+2, Yichao Yu wrote: >> >> >> On Tue, Sep 13, 2016 at 12:49 PM, Andreas Lobinger >> wrote: >> >>> Hello colleagues, >>> >>> i'm trying to find out, why this >>> ... >>> >> fails miserably. I guess, but cannot track it down right now: There is >>> something wrong in memory management of Cairo.jl that only shows up for >>> objects that could have been freed long ago and julia and libcairo have >>> different concepts of invalidation. >>> >>> Any blog/receipe/issue that deals with GC debugging? >>> >> >> It's not too different from debugging memory issue in any other program. >> It usually helps (a lot) to reproduce under rr[1] >> > > Many thanks for pointing to this. I was aware it exists but wasn't aware > of their progress. > > >> Other than that, it strongly depend on the kind of error and I've seen it >> happens due to almost all parts of the runtime and it's really hard to >> summarize. >> > > What do you mean with "happens due to almost all parts of the runtime" ? > The general procedure is basically catch the failure and try to figure out why it got into this states. This means that you generally need to trace back where certain value is generated which also usually means that you need to trace back through a few layers of code and they might be scattered all over the place.
Re: [julia-users] debugging tips for GC?
Hello colleague, On Tuesday, September 13, 2016 at 7:25:38 PM UTC+2, Yichao Yu wrote: > > > On Tue, Sep 13, 2016 at 12:49 PM, Andreas Lobinger> wrote: > >> Hello colleagues, >> >> i'm trying to find out, why this >> ... >> > fails miserably. I guess, but cannot track it down right now: There is >> something wrong in memory management of Cairo.jl that only shows up for >> objects that could have been freed long ago and julia and libcairo have >> different concepts of invalidation. >> >> Any blog/receipe/issue that deals with GC debugging? >> > > It's not too different from debugging memory issue in any other program. > It usually helps (a lot) to reproduce under rr[1] > Many thanks for pointing to this. I was aware it exists but wasn't aware of their progress. > Other than that, it strongly depend on the kind of error and I've seen it > happens due to almost all parts of the runtime and it's really hard to > summarize. > What do you mean with "happens due to almost all parts of the runtime" ?
Re: [julia-users] debugging tips for GC?
On Tue, Sep 13, 2016 at 12:49 PM, Andreas Lobingerwrote: > Hello colleagues, > > i'm trying to find out, why this > >_ >_ _ _(_)_ | A fresh approach to technical computing > (_) | (_) (_)| Documentation: http://docs.julialang.org >_ _ _| |_ __ _ | Type "?help" for help. > | | | | | | |/ _` | | > | | |_| | | | (_| | | Version 0.5.0-rc4+0 (2016-09-09 01:43 UTC) > _/ |\__'_|_|_|\__'_| | > |__/ | x86_64-linux-gnu > > julia> Pkg.test("Cairo") > INFO: Testing Cairo > INFO: Cairo tests passed > > julia> Pkg.test("Cairo") > INFO: Testing Cairo > INFO: Cairo tests passed > > julia> Pkg.test("Cairo") > INFO: Testing Cairo > INFO: Cairo tests passed > > julia> Pkg.test("Cairo") > INFO: Testing Cairo > INFO: Cairo tests passed > > julia> Pkg.test("Cairo") > INFO: Testing Cairo > INFO: Cairo tests passed > > julia> Pkg.test("Cairo") > INFO: Testing Cairo > *** Error in `/home/lobi/julia05/usr/bin/julia': free(): invalid pointer: > 0x04013e80 *** > > signal (6): Aborted > while loading /home/lobi/.julia/v0.5/Cairo/samples/sample_imagepattern.jl, > in expression starting on line 29 > raise at /build/eglibc-3GlaMS/eglibc-2.19/signal/../nptl/sysdeps/ > unix/sysv/linux/raise.c:56 > abort at /build/eglibc-3GlaMS/eglibc-2.19/stdlib/abort.c:89 > __libc_message at /build/eglibc-3GlaMS/eglibc-2.19/libio/../sysdeps/posix/ > libc_fatal.c:175 > malloc_printerr at /build/eglibc-3GlaMS/eglibc-2.19/malloc/malloc.c:4996 > [inlined] > _int_free at /build/eglibc-3GlaMS/eglibc-2.19/malloc/malloc.c:3840 > jl_gc_free_array at /home/lobi/julia05/src/gc.c:744 [inlined] > sweep_malloced_arrays at /home/lobi/julia05/src/gc.c:765 [inlined] > gc_sweep_other at /home/lobi/julia05/src/gc.c:1032 [inlined] > _jl_gc_collect at /home/lobi/julia05/src/gc.c:1792 [inlined] > jl_gc_collect at /home/lobi/julia05/src/gc.c:1858 > jl_gc_pool_alloc at /home/lobi/julia05/src/gc.c:828 > jl_gc_alloc_ at /home/lobi/julia05/src/julia_internal.h:148 [inlined] > jl_gc_alloc at /home/lobi/julia05/src/gc.c:1881 > jl_alloc_svec_uninit at /home/lobi/julia05/src/simplevector.c:47 > jl_alloc_svec at /home/lobi/julia05/src/simplevector.c:56 > intersect_tuple at /home/lobi/julia05/src/jltypes.c:601 > jl_type_intersect at /home/lobi/julia05/src/jltypes.c:992 > jl_type_intersection_matching at /home/lobi/julia05/src/jltypes.c:1510 > jl_lookup_match at /home/lobi/julia05/src/typemap.c:376 [inlined] > jl_typemap_intersection_node_visitor at /home/lobi/julia05/src/ > typemap.c:503 > jl_typemap_intersection_visitor at /home/lobi/julia05/src/typemap.c:565 > jl_typemap_intersection_visitor at /home/lobi/julia05/src/typemap.c:556 > ml_matches at /home/lobi/julia05/src/gf.c:2266 [inlined] > jl_matching_methods at /home/lobi/julia05/src/gf.c:2287 > _methods_by_ftype at ./reflection.jl:223 > unknown function (ip: 0x7f00077b1f22) > jl_call_method_internal at /home/lobi/julia05/src/julia_internal.h:189 > [inlined] > jl_apply_generic at /home/lobi/julia05/src/gf.c:1929 > inlineable at ./inference.jl:2496 > unknown function (ip: 0x7f00077c4c92) > > > fails miserably. I guess, but cannot track it down right now: There is > something wrong in memory management of Cairo.jl that only shows up for > objects that could have been freed long ago and julia and libcairo have > different concepts of invalidation. > > Any blog/receipe/issue that deals with GC debugging? > It's not too different from debugging memory issue in any other program. It usually helps (a lot) to reproduce under rr[1] Other than that, it strongly depend on the kind of error and I've seen it happens due to almost all parts of the runtime and it's really hard to summarize. [1] github.com/mozilla/rr > > Wishing a happy day, > Andreas >
[julia-users] debugging tips for GC?
Hello colleagues, i'm trying to find out, why this _ _ _ _(_)_ | A fresh approach to technical computing (_) | (_) (_)| Documentation: http://docs.julialang.org _ _ _| |_ __ _ | Type "?help" for help. | | | | | | |/ _` | | | | |_| | | | (_| | | Version 0.5.0-rc4+0 (2016-09-09 01:43 UTC) _/ |\__'_|_|_|\__'_| | |__/ | x86_64-linux-gnu julia> Pkg.test("Cairo") INFO: Testing Cairo INFO: Cairo tests passed julia> Pkg.test("Cairo") INFO: Testing Cairo INFO: Cairo tests passed julia> Pkg.test("Cairo") INFO: Testing Cairo INFO: Cairo tests passed julia> Pkg.test("Cairo") INFO: Testing Cairo INFO: Cairo tests passed julia> Pkg.test("Cairo") INFO: Testing Cairo INFO: Cairo tests passed julia> Pkg.test("Cairo") INFO: Testing Cairo *** Error in `/home/lobi/julia05/usr/bin/julia': free(): invalid pointer: 0x04013e80 *** signal (6): Aborted while loading /home/lobi/.julia/v0.5/Cairo/samples/sample_imagepattern.jl, in expression starting on line 29 raise at /build/eglibc-3GlaMS/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56 abort at /build/eglibc-3GlaMS/eglibc-2.19/stdlib/abort.c:89 __libc_message at /build/eglibc-3GlaMS/eglibc-2.19/libio/../sysdeps/posix/libc_fatal.c:175 malloc_printerr at /build/eglibc-3GlaMS/eglibc-2.19/malloc/malloc.c:4996 [inlined] _int_free at /build/eglibc-3GlaMS/eglibc-2.19/malloc/malloc.c:3840 jl_gc_free_array at /home/lobi/julia05/src/gc.c:744 [inlined] sweep_malloced_arrays at /home/lobi/julia05/src/gc.c:765 [inlined] gc_sweep_other at /home/lobi/julia05/src/gc.c:1032 [inlined] _jl_gc_collect at /home/lobi/julia05/src/gc.c:1792 [inlined] jl_gc_collect at /home/lobi/julia05/src/gc.c:1858 jl_gc_pool_alloc at /home/lobi/julia05/src/gc.c:828 jl_gc_alloc_ at /home/lobi/julia05/src/julia_internal.h:148 [inlined] jl_gc_alloc at /home/lobi/julia05/src/gc.c:1881 jl_alloc_svec_uninit at /home/lobi/julia05/src/simplevector.c:47 jl_alloc_svec at /home/lobi/julia05/src/simplevector.c:56 intersect_tuple at /home/lobi/julia05/src/jltypes.c:601 jl_type_intersect at /home/lobi/julia05/src/jltypes.c:992 jl_type_intersection_matching at /home/lobi/julia05/src/jltypes.c:1510 jl_lookup_match at /home/lobi/julia05/src/typemap.c:376 [inlined] jl_typemap_intersection_node_visitor at /home/lobi/julia05/src/typemap.c:503 jl_typemap_intersection_visitor at /home/lobi/julia05/src/typemap.c:565 jl_typemap_intersection_visitor at /home/lobi/julia05/src/typemap.c:556 ml_matches at /home/lobi/julia05/src/gf.c:2266 [inlined] jl_matching_methods at /home/lobi/julia05/src/gf.c:2287 _methods_by_ftype at ./reflection.jl:223 unknown function (ip: 0x7f00077b1f22) jl_call_method_internal at /home/lobi/julia05/src/julia_internal.h:189 [inlined] jl_apply_generic at /home/lobi/julia05/src/gf.c:1929 inlineable at ./inference.jl:2496 unknown function (ip: 0x7f00077c4c92) fails miserably. I guess, but cannot track it down right now: There is something wrong in memory management of Cairo.jl that only shows up for objects that could have been freed long ago and julia and libcairo have different concepts of invalidation. Any blog/receipe/issue that deals with GC debugging? Wishing a happy day, Andreas