Re: [julia-users] debugging tips for GC?

2016-09-14 Thread Andreas Lobinger
Hello,

On Tuesday, September 13, 2016 at 10:17:30 PM UTC+2, Yichao Yu wrote:

>
> On Tue, Sep 13, 2016 at 4:11 PM, Yichao Yu  > wrote:
>
>> I'm able to reproduce it in rr and found the issue.
>> TL;DR the issue is at 
>> https://github.com/JuliaGraphics/Cairo.jl/blame/master/src/Cairo.jl#L625, 
>> where it passes the ownership of a cairo pointer to julia, causing a double 
>> free.
>>
>
> Which comes from your commit a year ago ;-p
>
> https://github.com/JuliaGraphics/Cairo.jl/commit/23681dc1270882c964059d23863a88d4276f6fd8
>

True. Actually Tim recommended this, i had initially a solution with 
unsafe_load on single elements. But we both and subsequent editors never 
recognized the problem. I've filed an PR.


Re: [julia-users] debugging tips for GC?

2016-09-13 Thread Yichao Yu
On Tue, Sep 13, 2016 at 4:11 PM, Yichao Yu  wrote:

> I'm able to reproduce it in rr and found the issue.
>
> TL;DR the issue is at https://github.com/JuliaGraphics/Cairo.jl/blame/
> master/src/Cairo.jl#L625, where it passes the ownership of a cairo
> pointer to julia, causing a double free.
>

Which comes from your commit a year ago ;-p
https://github.com/JuliaGraphics/Cairo.jl/commit/23681dc1270882c964059d23863a88d4276f6fd8


>
> Here's the rough process of my debugging, I'm not really sure how to
> summarize it though
>
> 1. It abort in cairo `cairo_destory_path` so I first compiled a cairo with
> debug symbol to make my life easier. (the function is pretty short so
> reading the disasm would have worked too)
> 2. It is free'ing `path->data` so I added a watchpoint on it `watch -l
> path->data` and reverse-continue to find the point of assignment.
> 3. Assignment happens in cairo from a valid malloc so path->data isn't
> corrupted.
> 4. Now it takes some guessing to figure out exactly what's wrong. I'm not
> sure how glibc stores it's malloc metadata (would help to know that) so I
> tried the naive thing and watch the intptr_t before the malloc result
> (that's how julia store the gc metadata) and run forward. None of the
> assignment to this location looks suspicious (they are all in glibc and the
> first hit isn't free'ing this value)
> 5. So now I tried the brute force way,the pointer (`path->data`) I see is
> `0x3746950` so I simply did a conditional breakpoint to see when it's
> free'd with `br free if $rdi == 0x3746950`. I use rdi to get the first
> argument since the glibc I installed doesn't have that detailed debug info.
> 6. After a long run (conditional breakpoint is really slow which is why I
> didn't use it first) it hits a breakpoint in the julia GC when free'ing an
> array. The array has a data pointer the same as the one in question and
> that's before the pointer is free'd by cairo so sth is wrong with the
> creation of the array. Now simply watch the `a->data` and go back again.
> I'm lucky this time, if this didn't work, the next thing to try would be
> trying to reduce the code/ run GC more often so that I can afford looking
> at the code more carefully instead of just catching events in the debugger.
> 7. As expected, it hits `jl_ptr_to_array` and going up a frame it seems
> that the caller is supplying a cairo pointer and transfering the ownership,
> which is wrong.
>
>
>
> On Tue, Sep 13, 2016 at 3:36 PM, Yichao Yu  wrote:
>
>>
>>
>> On Tue, Sep 13, 2016 at 3:31 PM, Andreas Lobinger 
>> wrote:
>>
>>> Hello colleague,
>>>
>>> On Tuesday, September 13, 2016 at 7:25:38 PM UTC+2, Yichao Yu wrote:


 On Tue, Sep 13, 2016 at 12:49 PM, Andreas Lobinger 
 wrote:

> Hello colleagues,
>
> i'm trying to find out, why this
> ...
>
 fails miserably. I guess, but cannot track it down right now: There is
> something wrong in memory management of Cairo.jl that only shows up for
> objects that could have been freed long ago and julia and libcairo have
> different concepts of invalidation.
>
> Any blog/receipe/issue that deals with GC debugging?
>

 It's not too different from debugging memory issue in any other program.
 It usually helps (a lot) to reproduce under rr[1]

>>>
>>> Many thanks for pointing to this. I was aware it exists but wasn't aware
>>> of their progress.
>>>
>>>
 Other than that, it strongly depend on the kind of error and I've seen
 it happens due to almost all parts of the runtime and it's really hard to
 summarize.

>>>
>>> What do you mean with "happens due to almost all parts of the runtime" ?
>>>
>>
>> The general procedure is basically catch the failure and try to figure
>> out why it got into this states. This means that you generally need to
>> trace back where certain value is generated which also usually means that
>> you need to trace back through a few layers of code and they might be
>> scattered all over the place.
>>
>>
>


Re: [julia-users] debugging tips for GC?

2016-09-13 Thread Yichao Yu
I'm able to reproduce it in rr and found the issue.

TL;DR the issue is at
https://github.com/JuliaGraphics/Cairo.jl/blame/master/src/Cairo.jl#L625,
where it passes the ownership of a cairo pointer to julia, causing a double
free.

Here's the rough process of my debugging, I'm not really sure how to
summarize it though

1. It abort in cairo `cairo_destory_path` so I first compiled a cairo with
debug symbol to make my life easier. (the function is pretty short so
reading the disasm would have worked too)
2. It is free'ing `path->data` so I added a watchpoint on it `watch -l
path->data` and reverse-continue to find the point of assignment.
3. Assignment happens in cairo from a valid malloc so path->data isn't
corrupted.
4. Now it takes some guessing to figure out exactly what's wrong. I'm not
sure how glibc stores it's malloc metadata (would help to know that) so I
tried the naive thing and watch the intptr_t before the malloc result
(that's how julia store the gc metadata) and run forward. None of the
assignment to this location looks suspicious (they are all in glibc and the
first hit isn't free'ing this value)
5. So now I tried the brute force way,the pointer (`path->data`) I see is
`0x3746950` so I simply did a conditional breakpoint to see when it's
free'd with `br free if $rdi == 0x3746950`. I use rdi to get the first
argument since the glibc I installed doesn't have that detailed debug info.
6. After a long run (conditional breakpoint is really slow which is why I
didn't use it first) it hits a breakpoint in the julia GC when free'ing an
array. The array has a data pointer the same as the one in question and
that's before the pointer is free'd by cairo so sth is wrong with the
creation of the array. Now simply watch the `a->data` and go back again.
I'm lucky this time, if this didn't work, the next thing to try would be
trying to reduce the code/ run GC more often so that I can afford looking
at the code more carefully instead of just catching events in the debugger.
7. As expected, it hits `jl_ptr_to_array` and going up a frame it seems
that the caller is supplying a cairo pointer and transfering the ownership,
which is wrong.


On Tue, Sep 13, 2016 at 3:36 PM, Yichao Yu  wrote:

>
>
> On Tue, Sep 13, 2016 at 3:31 PM, Andreas Lobinger 
> wrote:
>
>> Hello colleague,
>>
>> On Tuesday, September 13, 2016 at 7:25:38 PM UTC+2, Yichao Yu wrote:
>>>
>>>
>>> On Tue, Sep 13, 2016 at 12:49 PM, Andreas Lobinger 
>>> wrote:
>>>
 Hello colleagues,

 i'm trying to find out, why this
 ...

>>> fails miserably. I guess, but cannot track it down right now: There is
 something wrong in memory management of Cairo.jl that only shows up for
 objects that could have been freed long ago and julia and libcairo have
 different concepts of invalidation.

 Any blog/receipe/issue that deals with GC debugging?

>>>
>>> It's not too different from debugging memory issue in any other program.
>>> It usually helps (a lot) to reproduce under rr[1]
>>>
>>
>> Many thanks for pointing to this. I was aware it exists but wasn't aware
>> of their progress.
>>
>>
>>> Other than that, it strongly depend on the kind of error and I've seen
>>> it happens due to almost all parts of the runtime and it's really hard to
>>> summarize.
>>>
>>
>> What do you mean with "happens due to almost all parts of the runtime" ?
>>
>
> The general procedure is basically catch the failure and try to figure out
> why it got into this states. This means that you generally need to trace
> back where certain value is generated which also usually means that you
> need to trace back through a few layers of code and they might be scattered
> all over the place.
>
>


Re: [julia-users] debugging tips for GC?

2016-09-13 Thread Yichao Yu
On Tue, Sep 13, 2016 at 3:31 PM, Andreas Lobinger 
wrote:

> Hello colleague,
>
> On Tuesday, September 13, 2016 at 7:25:38 PM UTC+2, Yichao Yu wrote:
>>
>>
>> On Tue, Sep 13, 2016 at 12:49 PM, Andreas Lobinger 
>> wrote:
>>
>>> Hello colleagues,
>>>
>>> i'm trying to find out, why this
>>> ...
>>>
>> fails miserably. I guess, but cannot track it down right now: There is
>>> something wrong in memory management of Cairo.jl that only shows up for
>>> objects that could have been freed long ago and julia and libcairo have
>>> different concepts of invalidation.
>>>
>>> Any blog/receipe/issue that deals with GC debugging?
>>>
>>
>> It's not too different from debugging memory issue in any other program.
>> It usually helps (a lot) to reproduce under rr[1]
>>
>
> Many thanks for pointing to this. I was aware it exists but wasn't aware
> of their progress.
>
>
>> Other than that, it strongly depend on the kind of error and I've seen it
>> happens due to almost all parts of the runtime and it's really hard to
>> summarize.
>>
>
> What do you mean with "happens due to almost all parts of the runtime" ?
>

The general procedure is basically catch the failure and try to figure out
why it got into this states. This means that you generally need to trace
back where certain value is generated which also usually means that you
need to trace back through a few layers of code and they might be scattered
all over the place.


Re: [julia-users] debugging tips for GC?

2016-09-13 Thread Andreas Lobinger
Hello colleague,

On Tuesday, September 13, 2016 at 7:25:38 PM UTC+2, Yichao Yu wrote:
>
>
> On Tue, Sep 13, 2016 at 12:49 PM, Andreas Lobinger  > wrote:
>
>> Hello colleagues,
>>
>> i'm trying to find out, why this 
>> ...
>>
> fails miserably. I guess, but cannot track it down right now: There is 
>> something wrong in memory management of Cairo.jl that only shows up for 
>> objects that could have been freed long ago and julia and libcairo have 
>> different concepts of invalidation.
>>
>> Any blog/receipe/issue that deals with GC debugging?
>>
>
> It's not too different from debugging memory issue in any other program.
> It usually helps (a lot) to reproduce under rr[1]
>

Many thanks for pointing to this. I was aware it exists but wasn't aware of 
their progress.
 

> Other than that, it strongly depend on the kind of error and I've seen it 
> happens due to almost all parts of the runtime and it's really hard to 
> summarize.
>

What do you mean with "happens due to almost all parts of the runtime" ?


Re: [julia-users] debugging tips for GC?

2016-09-13 Thread Yichao Yu
On Tue, Sep 13, 2016 at 12:49 PM, Andreas Lobinger 
wrote:

> Hello colleagues,
>
> i'm trying to find out, why this
>
>_
>_   _ _(_)_ |  A fresh approach to technical computing
>   (_) | (_) (_)|  Documentation: http://docs.julialang.org
>_ _   _| |_  __ _   |  Type "?help" for help.
>   | | | | | | |/ _` |  |
>   | | |_| | | | (_| |  |  Version 0.5.0-rc4+0 (2016-09-09 01:43 UTC)
>  _/ |\__'_|_|_|\__'_|  |
> |__/   |  x86_64-linux-gnu
>
> julia> Pkg.test("Cairo")
> INFO: Testing Cairo
> INFO: Cairo tests passed
>
> julia> Pkg.test("Cairo")
> INFO: Testing Cairo
> INFO: Cairo tests passed
>
> julia> Pkg.test("Cairo")
> INFO: Testing Cairo
> INFO: Cairo tests passed
>
> julia> Pkg.test("Cairo")
> INFO: Testing Cairo
> INFO: Cairo tests passed
>
> julia> Pkg.test("Cairo")
> INFO: Testing Cairo
> INFO: Cairo tests passed
>
> julia> Pkg.test("Cairo")
> INFO: Testing Cairo
> *** Error in `/home/lobi/julia05/usr/bin/julia': free(): invalid pointer:
> 0x04013e80 ***
>
> signal (6): Aborted
> while loading /home/lobi/.julia/v0.5/Cairo/samples/sample_imagepattern.jl,
> in expression starting on line 29
> raise at /build/eglibc-3GlaMS/eglibc-2.19/signal/../nptl/sysdeps/
> unix/sysv/linux/raise.c:56
> abort at /build/eglibc-3GlaMS/eglibc-2.19/stdlib/abort.c:89
> __libc_message at /build/eglibc-3GlaMS/eglibc-2.19/libio/../sysdeps/posix/
> libc_fatal.c:175
> malloc_printerr at /build/eglibc-3GlaMS/eglibc-2.19/malloc/malloc.c:4996
> [inlined]
> _int_free at /build/eglibc-3GlaMS/eglibc-2.19/malloc/malloc.c:3840
> jl_gc_free_array at /home/lobi/julia05/src/gc.c:744 [inlined]
> sweep_malloced_arrays at /home/lobi/julia05/src/gc.c:765 [inlined]
> gc_sweep_other at /home/lobi/julia05/src/gc.c:1032 [inlined]
> _jl_gc_collect at /home/lobi/julia05/src/gc.c:1792 [inlined]
> jl_gc_collect at /home/lobi/julia05/src/gc.c:1858
> jl_gc_pool_alloc at /home/lobi/julia05/src/gc.c:828
> jl_gc_alloc_ at /home/lobi/julia05/src/julia_internal.h:148 [inlined]
> jl_gc_alloc at /home/lobi/julia05/src/gc.c:1881
> jl_alloc_svec_uninit at /home/lobi/julia05/src/simplevector.c:47
> jl_alloc_svec at /home/lobi/julia05/src/simplevector.c:56
> intersect_tuple at /home/lobi/julia05/src/jltypes.c:601
> jl_type_intersect at /home/lobi/julia05/src/jltypes.c:992
> jl_type_intersection_matching at /home/lobi/julia05/src/jltypes.c:1510
> jl_lookup_match at /home/lobi/julia05/src/typemap.c:376 [inlined]
> jl_typemap_intersection_node_visitor at /home/lobi/julia05/src/
> typemap.c:503
> jl_typemap_intersection_visitor at /home/lobi/julia05/src/typemap.c:565
> jl_typemap_intersection_visitor at /home/lobi/julia05/src/typemap.c:556
> ml_matches at /home/lobi/julia05/src/gf.c:2266 [inlined]
> jl_matching_methods at /home/lobi/julia05/src/gf.c:2287
> _methods_by_ftype at ./reflection.jl:223
> unknown function (ip: 0x7f00077b1f22)
> jl_call_method_internal at /home/lobi/julia05/src/julia_internal.h:189
> [inlined]
> jl_apply_generic at /home/lobi/julia05/src/gf.c:1929
> inlineable at ./inference.jl:2496
> unknown function (ip: 0x7f00077c4c92)
>
>
> fails miserably. I guess, but cannot track it down right now: There is
> something wrong in memory management of Cairo.jl that only shows up for
> objects that could have been freed long ago and julia and libcairo have
> different concepts of invalidation.
>
> Any blog/receipe/issue that deals with GC debugging?
>

It's not too different from debugging memory issue in any other program.
It usually helps (a lot) to reproduce under rr[1]
Other than that, it strongly depend on the kind of error and I've seen it
happens due to almost all parts of the runtime and it's really hard to
summarize.

[1] github.com/mozilla/rr


>
> Wishing a happy day,
> Andreas
>


[julia-users] debugging tips for GC?

2016-09-13 Thread Andreas Lobinger
Hello colleagues,

i'm trying to find out, why this 

   _
   _   _ _(_)_ |  A fresh approach to technical computing
  (_) | (_) (_)|  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0-rc4+0 (2016-09-09 01:43 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/   |  x86_64-linux-gnu

julia> Pkg.test("Cairo")
INFO: Testing Cairo
INFO: Cairo tests passed

julia> Pkg.test("Cairo")
INFO: Testing Cairo
INFO: Cairo tests passed

julia> Pkg.test("Cairo")
INFO: Testing Cairo
INFO: Cairo tests passed

julia> Pkg.test("Cairo")
INFO: Testing Cairo
INFO: Cairo tests passed

julia> Pkg.test("Cairo")
INFO: Testing Cairo
INFO: Cairo tests passed

julia> Pkg.test("Cairo")
INFO: Testing Cairo
*** Error in `/home/lobi/julia05/usr/bin/julia': free(): invalid pointer: 
0x04013e80 ***

signal (6): Aborted
while loading /home/lobi/.julia/v0.5/Cairo/samples/sample_imagepattern.jl, 
in expression starting on line 29
raise at 
/build/eglibc-3GlaMS/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56
abort at /build/eglibc-3GlaMS/eglibc-2.19/stdlib/abort.c:89
__libc_message at 
/build/eglibc-3GlaMS/eglibc-2.19/libio/../sysdeps/posix/libc_fatal.c:175
malloc_printerr at /build/eglibc-3GlaMS/eglibc-2.19/malloc/malloc.c:4996 
[inlined]
_int_free at /build/eglibc-3GlaMS/eglibc-2.19/malloc/malloc.c:3840
jl_gc_free_array at /home/lobi/julia05/src/gc.c:744 [inlined]
sweep_malloced_arrays at /home/lobi/julia05/src/gc.c:765 [inlined]
gc_sweep_other at /home/lobi/julia05/src/gc.c:1032 [inlined]
_jl_gc_collect at /home/lobi/julia05/src/gc.c:1792 [inlined]
jl_gc_collect at /home/lobi/julia05/src/gc.c:1858
jl_gc_pool_alloc at /home/lobi/julia05/src/gc.c:828
jl_gc_alloc_ at /home/lobi/julia05/src/julia_internal.h:148 [inlined]
jl_gc_alloc at /home/lobi/julia05/src/gc.c:1881
jl_alloc_svec_uninit at /home/lobi/julia05/src/simplevector.c:47
jl_alloc_svec at /home/lobi/julia05/src/simplevector.c:56
intersect_tuple at /home/lobi/julia05/src/jltypes.c:601
jl_type_intersect at /home/lobi/julia05/src/jltypes.c:992
jl_type_intersection_matching at /home/lobi/julia05/src/jltypes.c:1510
jl_lookup_match at /home/lobi/julia05/src/typemap.c:376 [inlined]
jl_typemap_intersection_node_visitor at /home/lobi/julia05/src/typemap.c:503
jl_typemap_intersection_visitor at /home/lobi/julia05/src/typemap.c:565
jl_typemap_intersection_visitor at /home/lobi/julia05/src/typemap.c:556
ml_matches at /home/lobi/julia05/src/gf.c:2266 [inlined]
jl_matching_methods at /home/lobi/julia05/src/gf.c:2287
_methods_by_ftype at ./reflection.jl:223
unknown function (ip: 0x7f00077b1f22)
jl_call_method_internal at /home/lobi/julia05/src/julia_internal.h:189 
[inlined]
jl_apply_generic at /home/lobi/julia05/src/gf.c:1929
inlineable at ./inference.jl:2496
unknown function (ip: 0x7f00077c4c92)


fails miserably. I guess, but cannot track it down right now: There is 
something wrong in memory management of Cairo.jl that only shows up for 
objects that could have been freed long ago and julia and libcairo have 
different concepts of invalidation.

Any blog/receipe/issue that deals with GC debugging?

Wishing a happy day,
Andreas