Re: Libdivide ported to D

2017-05-15 Thread Walter Bright via Digitalmars-d-announce

On 5/15/2017 3:51 AM, Jonathan M Davis via Digitalmars-d-announce wrote:

Liran was telling me last year about how the folks at Weka had used this to
speed up the stuff in core.time and std.datetime in their local branch and
wanted me to look into updating the official implementation to use it
(unfortunately, I haven't had the time to spend on D that I would have liked
and haven't managed to look into that yet - though that would require
putting at least some of this in druntime). I confess though that I was
highly confused about the whole thing, because it sounded like this was an
optimization that the compiler already did, and yet the Weka guys were
having to use libdivide some portion of the time. I suppose that it makes
sense though if the issue is that the divisor is not known until runtime.
But unfortunately, my understanding of compiler optimizations like this is
fairly poor.



One can do things like this:

if (divisor == 10)
foreach (i; 1..1000)
result += i / 10;  // compiler generates faster code here
else
 foreach (i; 1..1000)
result += i / divisor;

if one knows in advance popular values of divisor. This sort of thing is already 
done in Phobos when converting numbers <==> strings (optimizing for radix==10).




Re: Libdivide ported to D

2017-05-15 Thread Jonathan M Davis via Digitalmars-d-announce
On Sunday, May 14, 2017 16:20:21 David Nadlinger via Digitalmars-d-announce 
wrote:
> On Sunday, 14 May 2017 at 15:30:19 UTC, Walter Bright wrote:
> > On 5/14/2017 3:39 AM, Tomer Filiba wrote:
> >> Of course it only applies to runtime division -- the compiler
> >> can do the same if
> >> the divisor is known in compile time.
> >
> > I hate to say this, but modern compilers already do this for
> > generated runtime code when dividing by a constant. […]
>
> As Tomer points out, a runtime implementation can still be useful
> if the divisor is only known dynamically (but constant across
> number of operations).

Liran was telling me last year about how the folks at Weka had used this to
speed up the stuff in core.time and std.datetime in their local branch and
wanted me to look into updating the official implementation to use it
(unfortunately, I haven't had the time to spend on D that I would have liked
and haven't managed to look into that yet - though that would require
putting at least some of this in druntime). I confess though that I was
highly confused about the whole thing, because it sounded like this was an
optimization that the compiler already did, and yet the Weka guys were
having to use libdivide some portion of the time. I suppose that it makes
sense though if the issue is that the divisor is not known until runtime.
But unfortunately, my understanding of compiler optimizations like this is
fairly poor.

- Jonathan M Davis




Re: Libdivide ported to D

2017-05-14 Thread David Nadlinger via Digitalmars-d-announce

On Sunday, 14 May 2017 at 15:30:19 UTC, Walter Bright wrote:

On 5/14/2017 3:39 AM, Tomer Filiba wrote:
Of course it only applies to runtime division -- the compiler 
can do the same if

the divisor is known in compile time.


I hate to say this, but modern compilers already do this for 
generated runtime code when dividing by a constant. […]


As Tomer points out, a runtime implementation can still be useful 
if the divisor is only known dynamically (but constant across 
number of operations).


 — David


Re: Libdivide ported to D

2017-05-14 Thread Walter Bright via Digitalmars-d-announce

On 5/14/2017 3:39 AM, Tomer Filiba wrote:

https://code.dlang.org/packages/divide

Libdivide (http://libdivide.com/) allows converting the DIV instruction (in
runtime) to a series of shifts and MULs, which is much more efficient in
execution time. It works by taking a number (the divisor or "denominator") and
doing some preprocessing to it, after which dividing by it can be ~8 times
faster (my own measurements). I use it to divide CPU cycles by the CPU frequency
(i.e., two big ugly numbers) to obtain wall time from it.

Of course it only applies to runtime division -- the compiler can do the same if
the divisor is known in compile time.


I hate to say this, but modern compilers already do this for generated runtime 
code when dividing by a constant. Here's dmd's version:


  https://github.com/dlang/dmd/blob/master/src/ddmd/backend/divcoeff.c


Libdivide ported to D

2017-05-14 Thread Tomer Filiba via Digitalmars-d-announce

https://code.dlang.org/packages/divide

Libdivide (http://libdivide.com/) allows converting the DIV 
instruction (in runtime) to a series of shifts and MULs, which is 
much more efficient in execution time. It works by taking a 
number (the divisor or "denominator") and doing some 
preprocessing to it, after which dividing by it can be ~8 times 
faster (my own measurements). I use it to divide CPU cycles by 
the CPU frequency (i.e., two big ugly numbers) to obtain wall 
time from it.


Of course it only applies to runtime division -- the compiler can 
do the same if the divisor is known in compile time.


Notes:
* It's a header-only library so I ported the code itself
* I tried to keep my port as mechanical as possible; I can't 
really say I know what's going on there

* I only ported the POSIX x86-64 code because that's what I needed
* Signes-ness is a big issue, be sure you use the right variant


-tomer