Re: [9fans] Go: FP in note handler

2016-02-23 Thread Kenny Lasse Hoff Levinsen

> On 23 Feb 2016, at 18:31, lu...@proxima.alt.za wrote:
> 
>> A proper duffcopy/duffzero/memmove is also an option.
> 
> The adjective "proper" is revealing.  I vote for that.
> 
> Lucio.
> 
> 

It’s a bit out of my usual area of expertise, however. I have no idea what 
benchmark they have been running, either. Any pointers?

Kenny


Re: [9fans] Go: FP in note handler

2016-02-23 Thread lucio
> A proper duffcopy/duffzero/memmove is also an option.

The adjective "proper" is revealing.  I vote for that.

Lucio.




Re: [9fans] Go: FP in note handler

2016-02-23 Thread Kenny Lasse Hoff Levinsen
A proper duffcopy/duffzero/memmove is also an option.

Best regards,
Kenny Levinsen

> On 23. feb. 2016, at 18.02, erik quanstrom  wrote:
> 
>> On Tue Feb 23 07:55:26 PST 2016, kennylevin...@gmail.com wrote:
>> A benchmark was supposedly made of the new duffcopy/duffzero which claimed 
>> significant speedup for larger copies: 
>> https://github.com/golang/go/commit/5cf281a9b791f0f10efd1574934cbb19ea1b33da
>> 
>> I have no clue whether this holds true or not. My intention to reenable 
>> duffcopy and continue to use duffzero is mostly to avoid differences and 
>> ensure that the note handlers are floating point free in the future. Whether 
>> the current form of duffcopy/duffzero is an actual optimization or just 
>> added complexity, I cannot say. A test run in #cat-v out of annoyance 
>> seemed to show that MOVUPS was indeed faster, but I don’t remember the 
>> details.
> 
> that post reports a speedup relative to the original asm, which might not be 
> as good as the best non-sse versions, and it is also for amd64.
> 
> - erik
> 



Re: [9fans] Go: FP in note handler

2016-02-23 Thread erik quanstrom
On Tue Feb 23 07:55:26 PST 2016, kennylevin...@gmail.com wrote:
> A benchmark was supposedly made of the new duffcopy/duffzero which claimed 
> significant speedup for larger copies: 
> https://github.com/golang/go/commit/5cf281a9b791f0f10efd1574934cbb19ea1b33da
> 
> I have no clue whether this holds true or not. My intention to reenable 
> duffcopy and continue to use duffzero is mostly to avoid differences and 
> ensure that the note handlers are floating point free in the future. Whether 
> the current form of duffcopy/duffzero is an actual optimization or just 
> added complexity, I cannot say. A test run in #cat-v out of annoyance seemed 
> to show that MOVUPS was indeed faster, but I don’t remember the details.

that post reports a speedup relative to the original asm, which might not be as 
good as the best non-sse versions, and it is also for amd64.

- erik



Re: [9fans] Go: FP in note handler

2016-02-23 Thread Kenny Lasse Hoff Levinsen
A benchmark was supposedly made of the new duffcopy/duffzero which claimed 
significant speedup for larger copies: 
https://github.com/golang/go/commit/5cf281a9b791f0f10efd1574934cbb19ea1b33da

I have no clue whether this holds true or not. My intention to reenable 
duffcopy and continue to use duffzero is mostly to avoid differences and ensure 
that the note handlers are floating point free in the future. Whether the 
current form of duffcopy/duffzero is an actual optimization or just added 
complexity, I cannot say. A test run in #cat-v out of annoyance seemed to show 
that MOVUPS was indeed faster, but I don’t remember the details.

Best regards,
Kenny Levinsen

> On 23 Feb 2016, at 16:27, erik quanstrom  wrote:
> 
> On Tue Feb 23 02:36:41 PST 2016, kennylevin...@gmail.com wrote:
>> Ah, no - it is not a system-wide adjustment, but adjustment of the plan9 
>> specific runtime.sighandler implementation and everything called by it 
>> directly. Notes that don't exit the process are queued and should run 
>> outside the actual note handler.
>> 
>> I think the "magic" code will be isolated, and might fend off accidental 
>> future use of floating point registers. The magic-ness also only revolves 
>> around avoiding duffzero and duffcopy in some way. I also think that 
>> removing conditionals in the compiler will be a positive thing.
>> 
>> I still do not know the feasibility of my plan, whether it is possible to do 
>> cleanly, or possible at all. Maybe someone smarter than me with knowledge of 
>> the matter could chime in and call me an idiot?
>> 
>> Avoiding duffcopy should be easy with a simple memmove implementation. If 
>> done right, we can also remove the plan9 specific runtime.memmove and only 
>> use the slow memmove in sighandler (the global runtime.memmove is 
>> implemented using MOVUPS, just like duffcopy; duffcopy is used for block 
>> copies by the compiler in some cases, although I must admit I do not yet 
>> know all the cases).
>> 
>> Avoiding duffzero without compiler assistance is a bit more tricky: global 
>> variables, stack space in assembly functions, something like that.
> 
> fwiw, on modern amd64 machines, using the xmm and ymm registers has a benefit 
> only in a narrow range of sizes (384-511 bytes) and a subset of 
> (mis-)alignments that i've forgotten, at least for the exact test setup i 
> used on 3-4 different µarches.  intel claims rep; movs is the 
> (architecturally) fastest way to go.
> 
> i am not sure any of this makes much difference, as it's hard to know what a 
> real-world memory access pattern looks like, and that seems to dominate all 
> but gigantic moves, for which rep; movs is actually no slower than even the 
> trickiest use of ymm registers.
> 
> - erik
> 




Re: [9fans] Go: FP in note handler

2016-02-23 Thread erik quanstrom
On Tue Feb 23 02:36:41 PST 2016, kennylevin...@gmail.com wrote:
> Ah, no - it is not a system-wide adjustment, but adjustment of the plan9 
> specific runtime.sighandler implementation and everything called by it 
> directly. Notes that don't exit the process are queued and should run outside 
> the actual note handler.
> 
> I think the "magic" code will be isolated, and might fend off accidental 
> future use of floating point registers. The magic-ness also only revolves 
> around avoiding duffzero and duffcopy in some way. I also think that 
> removing conditionals in the compiler will be a positive thing.
> 
> I still do not know the feasibility of my plan, whether it is possible to do 
> cleanly, or possible at all. Maybe someone smarter than me with knowledge of 
> the matter could chime in and call me an idiot?
> 
> Avoiding duffcopy should be easy with a simple memmove implementation. If 
> done right, we can also remove the plan9 specific runtime.memmove and only 
> use the slow memmove in sighandler (the global runtime.memmove is 
> implemented using MOVUPS, just like duffcopy; duffcopy is used for block 
> copies by the compiler in some cases, although I must admit I do not yet 
> know all the cases).
> 
> Avoiding duffzero without compiler assistance is a bit more tricky: global 
> variables, stack space in assembly functions, something like that.

fwiw, on modern amd64 machines, using the xmm and ymm registers has a benefit 
only in a narrow range of sizes (384-511 bytes) and a subset of (mis-)alignments 
that i've forgotten, at least for the exact test setup i used on 3-4 different 
µarches.  intel claims rep; movs is the (architecturally) fastest way to go.

i am not sure any of this makes much difference, as it's hard to know what a 
real-world memory access pattern looks like, and that seems to dominate all but 
gigantic moves, for which rep; movs is actually no slower than even the 
trickiest use of ymm registers.

- erik



Re: [9fans] Go: FP in note handler

2016-02-23 Thread Kenny Lasse Hoff Levinsen
Ah, no - it is not a system-wide adjustment, but adjustment of the plan9 
specific runtime.sighandler implementation and everything called by it 
directly. Notes that don't exit the process are queued and should run outside 
the actual note handler.

I think the "magic" code will be isolated, and might fend off accidental future 
use of floating point registers. The magic-ness also only revolves around 
avoiding duffzero and duffcopy in some way. I also think that removing 
conditionals in the compiler will be a positive thing.

I still do not know the feasibility of my plan, whether it is possible to do 
cleanly, or possible at all. Maybe someone smarter than me with knowledge of 
the matter could chime in and call me an idiot?

Avoiding duffcopy should be easy with a simple memmove implementation. If done 
right, we can also remove the plan9 specific runtime.memmove and only use the 
slow memmove in sighandler (the global runtime.memmove is implemented using 
MOVUPS, just like duffcopy; duffcopy is used for block copies by the compiler 
in some cases, although I must admit I do not yet know all the cases).

Avoiding duffzero without compiler assistance is a bit more tricky: global 
variables, stack space in assembly functions, something like that.

Best regards,
Kenny Levinsen

On 23. feb. 2016, at 10.05, lu...@proxima.alt.za wrote:

>> Well, avoiding XMM registers in duffcopy/duffzero is one solution, but
>> I was thinking of working around them entirely in code called from the
>> note handler, so that duffcopy/duffzero can operate as intended on
>> plan9, rather than littering the compiler with OS conditionals.
> 
> Do you think you'll be able to sell that to the Go developers?  You
> ARE talking about a system-wide adjustment and it seems to me that it
> will need constant supervision to be maintained.  Again, I may have
> misunderstood, but it does seem like a maintenance nightmare to me.
> 
> As for:
> 
>> To fix the duffzero, we'd have to fix runtime.goexitsall's buffer
>> usage, but to reenable duffcopy, we'd have to look at the much bigger
>> runtime.sighandler.
> 
> That is undeniable, but to avoid a different type of maintenance
> nightmare, may be the only option.  Although "fixing" duffcopy and
> duffzero would seem a better, if less efficient option.
> 
> Still, it's the opinion of a none-too-well-informed spectator, do not
> let me spoil it for you.  In particular, I'm sure I'm not telling you
> anything you have not already considered.
> 
> Lucio.
> 
> PS: I do think that it is our responsibility to track each and every
> aspect of Go where Plan 9 demands special treatment.  Ideally, this
> means build flags or specially named modules and a commitment from a
> few of us to keep these in sync.  Anything else becomes someone else's
> responsibility and that is risky.
> 



Re: [9fans] Go: FP in note handler

2016-02-23 Thread Kenny Lasse Hoff Levinsen
Well, avoiding XMM registers in duffcopy/duffzero is one solution, but I was 
thinking of working around them entirely in code called from the note handler, 
so that duffcopy/duffzero can operate as intended on plan9, rather than 
littering the compiler with OS conditionals.

It puts some restrictions on the note handling code: no copy(), no make(), not 
even an on-stack var b [n]byte. Because sighandler disables write barriers, we 
can't currently allocate on the heap, meaning that we might need either locked 
global buffers (which can be duffzeroed) or more assembly so we can use 
on-stack buffers (which could be zeroed if we wanted to; they just can't use 
duffzero for it).

To fix the duffzero, we'd have to fix runtime.goexitsall's buffer usage, but to 
reenable duffcopy, we'd have to look at the much bigger runtime.sighandler.

Best regards,
Kenny Levinsen

On 23. feb. 2016, at 08.20, lu...@proxima.alt.za wrote:

>> Duffcopy was disabled on plan9 after the last bug report on the
>> matter, but duffzero was later optimized to use XMM registers, causing
>> goexitsall, which uses an on-stack byte array to make a new note, to
>> call duffzero and trip the "fp in note handler" message.
> 
> I had to re-read this to understand it because you tend to put at
> the end what I would find easier to understand if it was at the
> beginning.  No offence meant, different punctuation would have perhaps
> helped my understanding.
> 
> So, we need a duffcopy and duffzero that do not use XMM registers,
> rather than stop invoking them, if I read your comment correctly?
> 
> I also have an open issue (I see David has offered to look into it
> soon) involving syscalls and their error messages; it seems these are
> all Plan 9-specific issues that could be addressed together.
> 
> I really would like to take a more active role in Go for Plan 9, but I
> can't yet give it the priority I'd like.  Still, I like hearing from
> others who take this to heart.
> 
> Lucio.
> 



[9fans] Go: FP in note handler

2016-02-22 Thread Kenny Lasse Hoff Levinsen
For those interested in the matter, I have opened 
https://github.com/golang/go/issues/14471

I mention potentially reenabling duffcopy by writing some magic note handler 
code that avoids the regular copy and zero optimizations, but I’m not entirely 
sure if that’s a plausible path. If it is, I think it would bring benefits both 
in the performance gained from duffcopy/duffzero and in reducing the chances of 
this happening again. It is, however, slightly annoying to do, as you cannot use 
copy(), make() or even string or byte array literals, as these will trip 
duffcopy and duffzero. Any comments on my silly idea?

Best regards,
Kenny Levinsen

> On 22 Feb 2016, at 18:16, Richard Miller  wrote:
> 
>> The trace of goexitsall still contains FP register access (XORPS and 
>> duffzero, which contains MOVUPS)
> 
> Sorry, in that case I think my patch is not relevant for your issue
> (but it does prevent a deadlock on multiprocessors which you might
> also run into...)
>