Re: class templates and static if

2012-02-29 Thread Ali Çehreli

On 02/28/2012 02:12 AM, Dmitry Olshansky wrote:
> On 28.02.2012 2:17, Ali Çehreli wrote:

>> I have played with this optimization recently. (Could be dmd 2.057.) No,
>> dmd did not optimize a straightforward switch statement over a ubyte
>> expression with about two hundred ubyte cases.
>>
> Hate to say it but I see it in real-world code, VM performance almost
> doubled.

That's really good to hear. Perhaps my experiments were either naive, or 
a series of "cmp xxx je xxx" would be faster, say for the ubyte type 
that I've used.


> Mm care to share you experiments?

Here is a program that mixes-in a lookup function that contains a single 
switch statement that covers the entire 256 values of ubyte:


import std.stdio;
import std.conv;

string oneCase(FromT, ToT)(FromT fromExpr, ToT toExpr)
{
return
" case " ~ to!string(fromExpr)
~ ": return " ~ to!string(ToT.init) ~ "; ";
}

string manyCases(FromT, ToT)(size_t count)
{
string result;

ToT fromExpr;
foreach (i; 0 .. count) {
result ~= oneCase(fromExpr, 42);
++fromExpr;
}

return result;
}

string lookUpFunc(FromT, ToT)(size_t count)
{
immutable header = ToT.stringof ~ " lookUp(" ~ FromT.stringof
   ~ " expr) { switch (expr) { ";
immutable cases = manyCases!(FromT, ToT)(count);
immutable footer = " default: return " ~ to!string(ToT.init) ~ "; } }";

return header ~ cases ~ footer;
}

void main()
{
enum funcWithManyCases = lookUpFunc!(ubyte, ubyte)(256);

writeln("Mixing-in: ", funcWithManyCases);
mixin(funcWithManyCases);

assert(lookUp(42) == 0);
}

I compiled with -O and then used obj2asm to see that there are 256 cmp 
xxx jmp xxx instructions.


Ali



Re: class templates and static if

2012-02-28 Thread Dmitry Olshansky

On 28.02.2012 2:17, Ali Çehreli wrote:

On 02/27/2012 02:06 PM, Jonathan M Davis wrote:

 > I'm not saying that dmd doesn't ever optimize switch statements. I'm
just
 > saying that as I understand it, it doesn't always do so (its
performance with
 > ranged case statements isn't great for instance). Odds are that it
_does_
 > optimize straight integer case statements fairly well, because that's
the
 > classic C stuff (and if you've verified that, all the better - I
haven't).


I would say use switch when you have multiple choises over one variable, 
because the compiler *can* do much better job then with if/else chain 
(even for strings, e.g. hashing, same binary-search approach). Because 
the if/else chain fixes the order of comparisons it can't become faster 
at all (or it would be an aggressive optimization).

It's just wrong idea to lose this opportunity with switch.



I have played with this optimization recently. (Could be dmd 2.057.) No,
dmd did not optimize a straightforward switch statement over a ubyte
expression with about two hundred ubyte cases.

Hate to say it but I see it in real-world code, VM performance almost 
doubled. Mm care to share you experiments? Alternatively just dissemble 
any regex sample, seek out mangled trash of _xxx_eval_ function.



The assembly contained conditional jump statements, not a table. And
yes, I did try with -O.

But I am not sure that a lookup table really is an optimization with
modern CPUs. A series of conditional jumps that fit the CPU's cache
could be faster than a table that's outside of the cache. I think
accessing the cache is hundreds of times faster than accessing memory
outside of the cache.


It's all about branch prediction really - hundreds of cmp xxx je xxx 
stall pipelines pretty darn well. Though the effect might depend on the 
code size inside these case branches.

BTW tables should be somewhere close, though I haven't checked this.

--
Dmitry Olshansky


Re: class templates and static if

2012-02-27 Thread Ali Çehreli

On 02/27/2012 02:06 PM, Jonathan M Davis wrote:

> I'm not saying that dmd doesn't ever optimize switch statements. I'm just
> saying that as I understand it, it doesn't always do so (its 
performance with

> ranged case statements isn't great for instance). Odds are that it _does_
> optimize straight integer case statements fairly well, because that's the
> classic C stuff (and if you've verified that, all the better - I 
haven't).


I have played with this optimization recently. (Could be dmd 2.057.) No, 
dmd did not optimize a straightforward switch statement over a ubyte 
expression with about two hundred ubyte cases.


The assembly contained conditional jump statements, not a table. And 
yes, I did try with -O.


But I am not sure that a lookup table really is an optimization with 
modern CPUs. A series of conditional jumps that fit the CPU's cache 
could be faster than a table that's outside of the cache. I think 
accessing the cache is hundreds of times faster than accessing memory 
outside of the cache.


Ali



Re: class templates and static if

2012-02-27 Thread Jonathan M Davis
On Monday, February 27, 2012 23:55:39 Dmitry Olshansky wrote:
> On 27.02.2012 23:18, Jonathan M Davis wrote:
> > On Monday, February 27, 2012 10:55:12 Ali Çehreli wrote:
> >> On 02/27/2012 08:29 AM, Tyler Jameson Little wrote:
> >>> I didn't want to do subclassing, because my parser is a state-machine
> >> 
> >> style
> >> 
> >>> parser, so it's in a big switch. Pretty gross, but I would like it to
> >> 
> >> be as
> >> 
> >>> fast as possible. That's why I thought this model would be so cool,
> >> 
> >> because
> >> 
> >>> I could remove conditions from the generated code, and get rid of a
> >> 
> >> lot of
> >> 
> >>> the conditionals.
> >> 
> >> I am pretty sure switch statements boil down to a sequence conditionals
> >> consisting of equality comparisons.
> >> 
> >> I know that some compilers use optimizations where the comparisons are
> >> converted to a single lookup, but last I checked, dmd does not apply
> >> that optimization. Perhaps because it's not implemented yet, or because
> >> using a table lookup might be slower because of reaching outside of the
> >> cpu cache. (Or another reason that I am not aware of. :))
> > 
> > Yeah. In theory, switch statements can be optimized better than if-else
> > chains, and eventually I'd fully expect that dmd would do that, but right
> > now, I don't think that they really are. You'd have to look at the
> > generated assembly though to be sure though.
> 
> Please stop spreading the misconception that they are going to be
> optimized "probably and eventually". Switch statements over integers are
> much better then if/else chains and were for something like 20 years.
> They do produce nice improvements over a sequence of if/else because
> values are known in advance and are bounded.
> So it goes like this:
[snip]
> And for the record I seen all of that in assembly listings produced by
> dmd, when optimizing VM dispatch in std.regex. I had to revisit opcodes
> layout a bit so that dmd outputs jump table and not a binary-searching
> like stuff in a tight loop.

I'm not saying that dmd doesn't ever optimize switch statements. I'm just 
saying that as I understand it, it doesn't always do so (its performance with 
ranged case statements isn't great for instance). Odds are that it _does_ 
optimize straight integer case statements fairly well, because that's the 
classic C stuff (and if you've verified that, all the better - I haven't). It's 
the new stuff that's less likely to be well-optimized (like ranged case 
statements or case statements with strings). And those probably _will_ be 
better optimized eventually, but as I understand it, they aren't now.

So, if the reason for using a switch statement over and an if-else chain is 
purely for performance (particularly if doing something else than just using 
plain case statements with integers like in C), then you should probably 
profile your code and/or look at the assembly to be sure that you're getting 
the performance gain that you think that you're getting.

- Jonathan M Davis


Re: class templates and static if

2012-02-27 Thread Dmitry Olshansky

On 27.02.2012 23:18, Jonathan M Davis wrote:

On Monday, February 27, 2012 10:55:12 Ali Çehreli wrote:

On 02/27/2012 08:29 AM, Tyler Jameson Little wrote:

I didn't want to do subclassing, because my parser is a state-machine


style


parser, so it's in a big switch. Pretty gross, but I would like it to


be as


fast as possible. That's why I thought this model would be so cool,


because


I could remove conditions from the generated code, and get rid of a


lot of


the conditionals.


I am pretty sure switch statements boil down to a sequence conditionals
consisting of equality comparisons.

I know that some compilers use optimizations where the comparisons are
converted to a single lookup, but last I checked, dmd does not apply
that optimization. Perhaps because it's not implemented yet, or because
using a table lookup might be slower because of reaching outside of the
cpu cache. (Or another reason that I am not aware of. :))


Yeah. In theory, switch statements can be optimized better than if-else
chains, and eventually I'd fully expect that dmd would do that, but right now,
I don't think that they really are. You'd have to look at the generated
assembly though to be sure though.


Please stop spreading the misconception that they are going to be 
optimized "probably and eventually". Switch statements over integers are 
much better then if/else chains and were for something like 20 years. 
They do produce nice improvements over a sequence of if/else because 
values are known in advance and are bounded.

So it goes like this:
1. If all numbers are tightly packed (e.g. almost all of M..N range is 
covered by a case) then it does a jump table. So there is almost no 
comparisons at all (just check it fits the range) and it's just one 
indirect jump.


2.Even failing that it does a binary-search like comparisons to speed 
thing up.


3. Sometimes linear searching is faster then binary (on small number of 
case) and it can estimate this threshold. Same goes for table vs linear 
search and table vs binary search.


4. In fact it even does combination of all of the above, e.g. separate 
range in two by one comparison then use jump tables in both etc.


For instance:

switch(x){
case 0:
...
case 1:
...
case 4:
...
case 9:
... //all cases used up to 40
case 40:

It might do something like (depends on compiler vendor ;) ):

if( x < 9)
{
if(x == 0)
...

else if(x == 1)
...

else if(x == 4)
...
}
else {
if(x <=40)
goto table[x]; //pseudocode
else
goto default;
}

And for the record I seen all of that in assembly listings produced by 
dmd, when optimizing VM dispatch in std.regex. I had to revisit opcodes 
layout a bit so that dmd  outputs jump table and not a binary-searching 
like stuff in a tight loop.



--
Dmitry Olshansky


Re: class templates and static if

2012-02-27 Thread Jonathan M Davis
On Monday, February 27, 2012 10:55:12 Ali Çehreli wrote:
> On 02/27/2012 08:29 AM, Tyler Jameson Little wrote:
> > I didn't want to do subclassing, because my parser is a state-machine
> 
> style
> 
> > parser, so it's in a big switch. Pretty gross, but I would like it to
> 
> be as
> 
> > fast as possible. That's why I thought this model would be so cool,
> 
> because
> 
> > I could remove conditions from the generated code, and get rid of a
> 
> lot of
> 
> > the conditionals.
> 
> I am pretty sure switch statements boil down to a sequence conditionals
> consisting of equality comparisons.
> 
> I know that some compilers use optimizations where the comparisons are
> converted to a single lookup, but last I checked, dmd does not apply
> that optimization. Perhaps because it's not implemented yet, or because
> using a table lookup might be slower because of reaching outside of the
> cpu cache. (Or another reason that I am not aware of. :))

Yeah. In theory, switch statements can be optimized better than if-else 
chains, and eventually I'd fully expect that dmd would do that, but right now, 
I don't think that they really are. You'd have to look at the generated 
assembly though to be sure though.

- Jonathan M Davis


Re: class templates and static if

2012-02-27 Thread Ali Çehreli

On 02/27/2012 08:29 AM, Tyler Jameson Little wrote:

> I didn't want to do subclassing, because my parser is a state-machine 
style
> parser, so it's in a big switch. Pretty gross, but I would like it to 
be as
> fast as possible. That's why I thought this model would be so cool, 
because
> I could remove conditions from the generated code, and get rid of a 
lot of

> the conditionals.

I am pretty sure switch statements boil down to a sequence conditionals 
consisting of equality comparisons.


I know that some compilers use optimizations where the comparisons are 
converted to a single lookup, but last I checked, dmd does not apply 
that optimization. Perhaps because it's not implemented yet, or because 
using a table lookup might be slower because of reaching outside of the 
cpu cache. (Or another reason that I am not aware of. :))


Ali



Re: Re: class templates and static if

2012-02-27 Thread Andrej Mitrovic
On 2/27/12, Tyler Jameson Little  wrote:
> That's why I thought this model would be so cool, because
> I could remove conditions from the generated code, and get rid of a lot of
> the conditionals.

I think Nick Sabalausky talks about this concept (removing
conditionals via compile-time features) in his article
http://www.semitwist.com/articles/EfficientAndFlexible/SinglePage/

Other articles from last year's contest:
http://prowiki.org/wiki4d/wiki.cgi?Article_Contest


Re: Re: class templates and static if

2012-02-27 Thread H. S. Teoh
On Mon, Feb 27, 2012 at 09:29:25AM -0700, Tyler Jameson Little wrote:
> > Parser!(Type.request) h = new Parser!(Type.request)("Hello world")

Even better:

auto h = new Parser!(Type.request)("Hello world");


T

-- 
First Rule of History: History doesn't repeat itself -- historians
merely repeat each other.


Re: Re: class templates and static if

2012-02-27 Thread Tyler Jameson Little
> Parser!(Type.request) h = new Parser!(Type.request)("Hello world")

Wow, I was so close!  I had tried this:

Parser!Type.request h = new Parser!Type.request("Hello world");

but that didn't work. I didn't think about enclosing it in parens!

I didn't want to do subclassing, because my parser is a state-machine style
parser, so it's in a big switch. Pretty gross, but I would like it to be as
fast as possible. That's why I thought this model would be so cool, because
I could remove conditions from the generated code, and get rid of a lot of
the conditionals.

Thanks so much!


Re: class templates and static if

2012-02-26 Thread James Miller
> Also, 'static' on the enum declaration doesn't really do anything. :)

but what if the immutable data changes?! :P

But yeah, D's type inference is reasonably good (not as good as
Haskell's, but nothing is as good as Haskell's), so doing what seems
like the best way is normally the right way around it.

If you have a lot of shared code, it might make more sense to just use
traditional subclassing. Assuming that the shared code is all mostly
Parser stuff then subclassing a Parser type into Request and Response
types makes more sense. You can easily add common functionality to the
base class, then add specific functionality to the individual classes.
It also allows for more reflection, which is easier and probably more
powerful than conditional compilation.

--
James Miller


Re: class templates and static if

2012-02-26 Thread Andrej Mitrovic
The immutable is not necessary, use:
auto h = new Parser!(Type.request)("Hello world");

Otherwise, if you really want to declare the type instead of infering
it you can write:
Parser!(Type.request) h = new Parser!(Type.request)("Hello world");

You can also do:
Parser!(Type.request) h;
h = new typeof(h)("Hello world");

or:
alias Parser!(Type.request) ParseRequest;
ParseRequest h = new ParseRequest("Hello world");

Also, 'static' on the enum declaration doesn't really do anything. :)


class templates and static if

2012-02-26 Thread Tyler Jameson Little
So, here's my code, as it stands currently:

import std.stdio;

static enum Type {
request,
 response
};

class Parser(Type t) {
static if (t == Type.request) {
 string name = "request";
} else {
string name = "response";
 }

string message;

this(string message) {
 this.message = message;
}
}

void main() {
 immutable Type t = Type.request;
Parser!t h = new Parser!t("Hello world");

writefln("%s: %s", h.name, h.message);
}

The general goal is to make a templated class that will only include the
parts that each needs. I would like to keep this all as one class template,
because there is a lot of shared code, and I would like to keep it as
seamless as possible and not have to separate out code into separate
functions.

Anyway, my current approach is a little clunky, because it requires an
immutable to be created just to tell the compiler what type of parser I
need.

I was thinking about having two classes, Request and Response, and have
parser take a normal template (class Parser (T)), but I wanted to restrict
it to just those two classes (for now), so I came up with the enum solution.

Is there a better way to do this?

What I'd ultimately like to do is have some static if surrounding blocks of
code that depends on the input to the template.