Re: [go-nuts] Re: Accessing *[]uint64 from assembly - strange memory corruption under heavy load - any ideas?

2019-03-22 Thread 'Keith Randall' via golang-nuts
Your assembly looks ok to me. At least, the sections you've shown us. It 
would help if we could see all of it and/or the whole program.

You might want to try putting a len>0 test in the pop and a len<cap test in the push.
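
Spelled out in Go, the invariants those two tests would enforce look like the
sketch below (a hypothetical model of the pop and push, not code from the
thread); the assembly versions skip both guards on the assumption that they
can never fire.

// pop removes and returns the last element, checking the len>0 invariant.
func pop(s *[]uint64) uint64 {
    n := len(*s)
    if n == 0 {
        panic("pop: empty slice") // the len>0 test suggested above
    }
    v := (*s)[n-1]
    *s = (*s)[:n-1]
    return v
}

// push appends within the existing capacity, checking the len<cap invariant.
func push(s *[]uint64, v uint64) {
    n := len(*s)
    if n == cap(*s) {
        panic("push: slice already at capacity") // the len<cap test suggested above
    }
    *s = (*s)[:n+1]
    (*s)[n] = v
}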

Re: [go-nuts] Re: Accessing *[]uint64 from assembly - strange memory corruption under heavy load - any ideas?

2019-03-22 Thread Ian Lance Taylor
On Fri, Mar 22, 2019 at 10:55 AM Robert Johnstone wrote:
>
> I don't see any memory barriers in your assembly.  If you are modifying the 
> backing array while it is being scanned by the GC, there could be some 
> interaction.  I don't know enough about the GC internals to say more than 
> that.  If you look at when memory barriers are inserted by the Go compiler, 
> it might provide more guidance.

If it's just []uint64, that shouldn't be an issue, as write barriers
are not required for uint64.

You are certainly correct if the assembly is manipulating slices that
contain pointers.

Ian
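
A short illustration of that point (hypothetical functions, not from the
thread): the compiler emits a GC write barrier only for the pointer store,
not for the plain uint64 store.

// storeInt writes a plain integer element; no write barrier is involved.
func storeInt(dst []uint64, i int, v uint64) {
    dst[i] = v
}

// storePtr writes a pointer element; the compiler emits a write barrier
// for this store so the GC can track the pointer.
func storePtr(dst []*uint64, i int, p *uint64) {
    dst[i] = p
}

Building both with go build -gcflags=-S and comparing the output should show
a write-barrier call only in storePtr, which is also one way to follow the
earlier suggestion of checking where the compiler inserts barriers.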




[go-nuts] Re: Accessing *[]uint64 from assembly - strange memory corruption under heavy load - any ideas?

2019-03-22 Thread Robert Johnstone
I don't see any memory barriers in your assembly.  If you are modifying the 
backing array while it is being scanned by the GC, there could be some 
interaction.  I don't know enough about the GC internals to say more than 
that.  If you look at when memory barriers are inserted by the Go compiler, 
it might provide more guidance.



On Friday, 22 March 2019 00:39:34 UTC-4, Tom wrote:
>
> I've been stuck on this for a few days so thought I would ask the brains 
> trust.
>
> *TL;DR:* When I have native amd64 instructions mutating a slice (updating 
> the len and values of a []uint64), I experience spurious & random memory 
> corruption under heavy load (# runnable goroutines > MAXPROCS, doing 
> the same thing continuously), and only when the GC is enabled. Any 
> debugging ideas or things I should look into?
>
> *Background:*
>
> I'm calling into Go assembly with a few pointers to slices (*[]uint64), 
> and that assembly is mutating them (reading/writing values, updating len 
> within capacity). I'm experiencing random memory corruption, but I can only 
> trigger it in the following scenarios:
>
>1. Heavy load - Doing a zillion things at once (specifically running 
>all my test cases in parallel) and maxing out my machine.
>2. Parallelism - A panic due to memory corruption happens faster if 
>--parallel is set higher, and never if not in parallel.
>3. GC - The panic never happens if the GC is disabled (of course, the 
>test process eventually runs out of memory).
>
> The memory corruption varies, but usually results in an element of an 
> unrelated slice being zeroed, the len of an unrelated slice being zeroed, 
> or (less likely) a segfault.
>
> Tested on go1.11.2 and go1.12.1. I can only trigger this if I run all my 
> test cases at once (with --count at 8000 or so & using t.Parallel()). 
> Running things serially or individually yields the correct behaviour.
>
> The assembly in question looks like this:
>
> TEXT ·jitcall(SB),NOSPLIT|NOFRAME,$0-24
> GO_ARGS
> MOVQ asm+0(FP), AX  // Load the address of the assembly 
> section.
> MOVQ stack+8(FP),   R10 // Load the address of the 1st slice.
> MOVQ locals+16(FP), R11 // Load the address of the 2nd slice.
> MOVQ 0(AX), AX  // Dereference pointer to native code.
> JMP AX  // Jump to native code.
>
> And slice manipulation like this (this is a 'pop'):
>
>  MOVQ r13, [r10+8]   // Load the length of the slice.
>  DECQ r13            // Decrement the len (I can guarantee this 
> will never underflow).
>  MOVQ r12, [r10] // Load the 0th element address.
>  LEAQ r12, [r12 + r13*8] // Compute the address of the last element.
>  MOVQ reg, [r12] // Load the element to reg.
>  MOVQ [r10+8], r13   // Write the len back.
>
> or 'push' like this (note: cap is always large enough for any pushes) ...
>
>  MOVQ r12, [r10]  // Load the 0th element address.
>  MOVQ r13, [r10+8]// Load the len.
>  LEAQ r12, [r12 + r13*8]  // Compute the address of the last element 
> + 1.
>  INCQ r13 // Increment the len.
>  MOVQ [r10+8], r13// Save the len.
>  MOVQ [r12],   reg// Write the new element.
>
>
> I acknowledge that calling into code like this is unsupported, but I 
> struggle to understand how such corruption can happen, and having stared at 
> it for a few days, I am frankly stumped. I mean, even if non-cooperative 
> preemption were in these versions of Go, I would expect the GC to abort when 
> it can't find the stack maps for my RIP value. With no GC safe points in my 
> native assembly, I don't see how the GC could interfere (yet the issue 
> disappears with the GC off??).
>
> *Questions:*
>
>    1. Any ideas what I'm doing wrong?
>    2. Any ideas how I can trace this from the application side and also 
>    the runtime side? I've tried schedtrace and the like, but the output didn't 
>    appear useful or correlated to the crashes.
>    3. Any suggestions for assumptions I might have missed and should 
>    write tests / guards for?
>
> Thanks,
> Tom
>
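
A Go-side declaration consistent with the $0-24 frame of the jitcall stub
quoted above would look roughly like the sketch below. The types are
inferred rather than taken from the thread: stack and locals are *[]uint64
as described, and asm is assumed to point at a word holding the native-code
entry address, matching the 0(AX) dereference.

// jitcall is implemented in assembly (the stub above); it loads the two
// slice-header addresses into R10/R11 and jumps into the generated code.
func jitcall(asm *uintptr, stack, locals *[]uint64)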
