Re: [go-nuts] Re: Float32 math and slice arithmetics using SIMD

2016-11-03 Thread 'simon place' via golang-nuts
> > > That's the sort of cheap checks that I had in mind in the very first post > when I talked about "I envisaged a call to CPUID and then some bool tests > along the way to utilise SSE[2-4]/AVX[2] (or NEON on ARM) if available. All > in a static, portable package." Thanks for a good example of
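
The "call to CPUID and then some bool tests" idea quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: cpuFeatures, addFunc, selectAdd and addGeneric are hypothetical names, the flags are passed in by hand rather than read from CPUID, and every branch returns the same pure-Go loop where a real package would return an assembly routine. The point is that the bool tests run once, when the implementation is chosen, and not on every call.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical stand-in for flags that a real package
// would fill in once at startup from a CPUID query.
type cpuFeatures struct {
	HasSSE2 bool
	HasAVX2 bool
}

// addFunc adds a and b element-wise into dst.
// All three slices are assumed to have the same length.
type addFunc func(dst, a, b []float32)

// addGeneric is the portable fallback written in plain Go.
func addGeneric(dst, a, b []float32) {
	for i := range dst {
		dst[i] = a[i] + b[i]
	}
}

// selectAdd runs the bool tests once and returns the chosen implementation
// together with a label. In this sketch every case returns addGeneric;
// the AVX2 and SSE2 cases are placeholders for assembly versions.
func selectAdd(f cpuFeatures) (string, addFunc) {
	switch {
	case f.HasAVX2:
		return "avx2", addGeneric // placeholder for an AVX2 routine
	case f.HasSSE2:
		return "sse2", addGeneric // placeholder for an SSE2 routine
	default:
		return "generic", addGeneric
	}
}

func main() {
	// Pretend the CPU reported SSE2 but not AVX2.
	name, add := selectAdd(cpuFeatures{HasSSE2: true})

	dst := make([]float32, 3)
	add(dst, []float32{1, 2, 3}, []float32{10, 20, 30})
	fmt.Println(name, dst)
}
```

After selection, each call goes through one function value, so the per-call cost of dispatch is an indirect call and not a chain of feature checks.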

Re: [go-nuts] Re: Float32 math and slice arithmetics using SIMD

2016-11-03 Thread Ondrej
On Thursday, 3 November 2016 01:40:29 UTC, Nigel Tao wrote: > > > Another ignorant question from me, but what do you mean exactly by > universal binary? > Apologies for the confusing and nonsensical term. What I meant was a binary that works for a number of CPUs within an architecture, with or

Re: [go-nuts] Re: Float32 math and slice arithmetics using SIMD

2016-11-02 Thread Nigel Tao
On Tue, Nov 1, 2016 at 8:58 PM, Ondrej wrote: > It seems that a universal binary, as Go requires it, would be slow on > dispatch, because there would be too much checking for individual intrinsics > support. Do I understand it correctly, that to overcome this, people

Re: [go-nuts] Re: Float32 math and slice arithmetics using SIMD

2016-11-01 Thread Ondrej
Klaus, that's a great thread, completely missed it. It seems that a universal binary, as Go requires it, would be slow on dispatch, because there would be too much checking for individual intrinsics support. Do I understand it correctly, that to overcome this, people either compile natively

Re: [go-nuts] Re: Float32 math and slice arithmetics using SIMD

2016-10-28 Thread 'simon place' via golang-nuts
> Yes, speeding up an accumulation step, described at > > https://medium.com/@raphlinus/inside-the-fastest-font-renderer-in-the-world-75ae5270c445#.qz8jram0o > > > The generated code are SIMD implementations of very simple Go functions. > > For example, the fixedAccumulateOpSrcSIMD function
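
As a rough picture of what such a "very simple Go function" for the accumulation step might look like, here is a minimal floating-point sketch. It is not the code from golang.org/x/image/vector: the name accumulateSketch is hypothetical, the fixed-point variant mentioned above is not shown, and the scaling by 255 is an assumption made for the example. It keeps a running sum of signed area deltas, takes the absolute value, clamps it to 1, and writes the result as an 8-bit coverage value.

```go
package main

import "fmt"

// accumulateSketch converts per-pixel signed area deltas in src into
// 8-bit coverage values in dst. dst and src are assumed to have the
// same length. Hypothetical, simplified illustration only.
func accumulateSketch(dst []uint8, src []float32) {
	acc := float32(0)
	for i, v := range src {
		acc += v // running (prefix) sum of the deltas

		a := acc
		if a < 0 {
			a = -a // coverage does not depend on winding direction
		}
		if a > 1 {
			a = 1 // clamp to full coverage
		}
		dst[i] = uint8(255 * a) // truncating conversion to 0..255
	}
}

func main() {
	src := []float32{0.5, 0.25, -0.75, 0}
	dst := make([]uint8, len(src))
	accumulateSketch(dst, src)
	fmt.Println(dst)
}
```

Each output depends on the sum of all earlier inputs, so a SIMD version has to compute a prefix sum within each vector and carry the running total from one vector to the next.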

Re: [go-nuts] Re: Float32 math and slice arithmetics using SIMD

2016-10-28 Thread 'simon place' via golang-nuts
> > Take for instance the PSHUFB instruction, which allows a very fast > [16]byte lookup in SSSE3 capable machines. This is helpful in various ways, > but if it isn't available, it will have to commit the XMM register to > memory and do 16 lookups, which is at least an order of magnitude
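
For readers who have not met PSHUFB, the fallback described above (16 separate lookups) can be modelled in plain Go. This is a sketch of the instruction's per-byte behaviour on one 16-byte register, with a hypothetical function name; it is not taken from any package discussed in this thread. Each index byte selects one byte of the table using its low four bits, and an index with the high bit set produces zero.

```go
package main

import "fmt"

// shuffleBytes is a scalar model of PSHUFB on a single 16-byte register:
// out[i] = table[idx[i] & 0x0F], or 0 when the high bit of idx[i] is set.
// Hypothetical helper for illustration.
func shuffleBytes(table, idx [16]byte) [16]byte {
	var out [16]byte
	for i, ix := range idx {
		if ix&0x80 != 0 {
			continue // high bit set: this lane stays zero
		}
		out[i] = table[ix&0x0F]
	}
	return out
}

func main() {
	// Example table: squares of 0..15 (15*15 = 225 still fits in a byte).
	var table [16]byte
	for i := range table {
		table[i] = byte(i * i)
	}

	// Only the first four lanes are set explicitly; the rest are index 0.
	idx := [16]byte{3, 0, 15, 0x80}
	fmt.Println(shuffleBytes(table, idx))
}
```

The loop performs the 16 lookups one at a time, which is the work a single PSHUFB replaces on SSSE3 hardware.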

Re: [go-nuts] Re: Float32 math and slice arithmetics using SIMD

2016-10-28 Thread 'simon place' via golang-nuts
just, Machine Code, may be a less common term now. strictly you might say the 'text' being generated here is assembly, and it becomes m/c 'numbers' after the assembler, but since it's just a one-to-one relationship, there isn't really much of a conceptual difference. On Friday, 28 October 2016

Re: [go-nuts] Re: Float32 math and slice arithmetics using SIMD

2016-10-27 Thread Erwin Driessens
I'd love to see SIMD intrinsics in the Go compiler(s), even if it would mean separate packages for all the architectures. I'm not experienced enough to tell how far one could get with designing a cross-platform set of intrinsics instructions? Using the hardware when it is available, falling

Re: [go-nuts] Re: Float32 math and slice arithmetics using SIMD

2016-10-27 Thread Nigel Tao
On Thu, Oct 27, 2016 at 9:24 AM, 'simon place' via golang-nuts wrote: > the approach i took was to try to minimise the M/C, so; Sorry for the ignorant question, but what does M/C stand for?

Re: [go-nuts] Re: Float32 math and slice arithmetics using SIMD

2016-10-27 Thread Nigel Tao
On Fri, Oct 28, 2016 at 6:54 AM, 'simon place' via golang-nuts wrote: > however, from looking at it, couldn’t find documentation, that code is > specific to speeding up graphics overlays? maybe? (accumulate) Yes, speeding up an accumulation step, described at

Re: [go-nuts] Re: Float32 math and slice arithmetics using SIMD

2016-10-27 Thread 'simon place' via golang-nuts
> Something like that? short answer, Yes. however, from looking at it, couldn’t find documentation, that code is specific to speeding up graphics overlays? maybe? (accumulate) but it’s confusing me that it's using templates, when there seems to only be one template. i was thinking of one,

Re: [go-nuts] Re: Float32 math and slice arithmetics using SIMD

2016-10-27 Thread Seb Binet
Something like that? https://github.com/golang/image/blob/master/vector/gen.go -s On Oct 27, 2016 12:24 AM, "'simon place' via golang-nuts" < golang-nuts@googlegroups.com> wrote: > i was playing with SIMD last year, > > the approach i took was to try to minimise the M/C,

[go-nuts] Re: Float32 math and slice arithmetics using SIMD

2016-10-26 Thread 'simon place' via golang-nuts
i was playing with SIMD last year, the approach i took was to try to minimise the M/C, so; no attempt to support general formula, let people combine the, pre-made, most common/expensive functions, like SIMD designers did, only up the complexity of formula supported and make it x-platform. make