Re: class templates and static if
On 02/28/2012 02:12 AM, Dmitry Olshansky wrote: > On 28.02.2012 2:17, Ali Çehreli wrote: >> I have played with this optimization recently. (Could be dmd 2.057.) No, >> dmd did not optimize a straightforward switch statement over a ubyte >> expression with about two hundred ubyte cases. >> > Hate to say it but I see it in real-world code, VM performance almost > doubled. That's really good to hear. Perhaps my experiments were either naive, or a series of "cmp xxx je xxx" would be faster, say for the ubyte type that I've used. > Mm care to share you experiments? Here is a program that mixes-in a lookup function that contains a single switch statement that covers the entire 256 values of ubyte: import std.stdio; import std.conv; string oneCase(FromT, ToT)(FromT fromExpr, ToT toExpr) { return " case " ~ to!string(fromExpr) ~ ": return " ~ to!string(ToT.init) ~ "; "; } string manyCases(FromT, ToT)(size_t count) { string result; ToT fromExpr; foreach (i; 0 .. count) { result ~= oneCase(fromExpr, 42); ++fromExpr; } return result; } string lookUpFunc(FromT, ToT)(size_t count) { immutable header = ToT.stringof ~ " lookUp(" ~ FromT.stringof ~ " expr) { switch (expr) { "; immutable cases = manyCases!(FromT, ToT)(count); immutable footer = " default: return " ~ to!string(ToT.init) ~ "; } }"; return header ~ cases ~ footer; } void main() { enum funcWithManyCases = lookUpFunc!(ubyte, ubyte)(256); writeln("Mixing-in: ", funcWithManyCases); mixin(funcWithManyCases); assert(lookUp(42) == 0); } I compiled with -O and then used obj2asm to see that there are 256 cmp xxx jmp xxx instructions. Ali
Re: class templates and static if
On 28.02.2012 2:17, Ali Çehreli wrote: On 02/27/2012 02:06 PM, Jonathan M Davis wrote: > I'm not saying that dmd doesn't ever optimize switch statements. I'm just > saying that as I understand it, it doesn't always do so (its performance with > ranged case statements isn't great for instance). Odds are that it _does_ > optimize straight integer case statements fairly well, because that's the > classic C stuff (and if you've verified that, all the better - I haven't). I would say use switch when you have multiple choises over one variable, because the compiler *can* do much better job then with if/else chain (even for strings, e.g. hashing, same binary-search approach). Because the if/else chain fixes the order of comparisons it can't become faster at all (or it would be an aggressive optimization). It's just wrong idea to lose this opportunity with switch. I have played with this optimization recently. (Could be dmd 2.057.) No, dmd did not optimize a straightforward switch statement over a ubyte expression with about two hundred ubyte cases. Hate to say it but I see it in real-world code, VM performance almost doubled. Mm care to share you experiments? Alternatively just dissemble any regex sample, seek out mangled trash of _xxx_eval_ function. The assembly contained conditional jump statements, not a table. And yes, I did try with -O. But I am not sure that a lookup table really is an optimization with modern CPUs. A series of conditional jumps that fit the CPU's cache could be faster than a table that's outside of the cache. I think accessing the cache is hundreds of times faster than accessing memory outside of the cache. It's all about branch prediction really - hundreds of cmp xxx je xxx stall pipelines pretty darn well. Though the effect might depend on the code size inside these case branches. BTW tables should be somewhere close, though I haven't checked this. -- Dmitry Olshansky
Re: class templates and static if
On 02/27/2012 02:06 PM, Jonathan M Davis wrote: > I'm not saying that dmd doesn't ever optimize switch statements. I'm just > saying that as I understand it, it doesn't always do so (its performance with > ranged case statements isn't great for instance). Odds are that it _does_ > optimize straight integer case statements fairly well, because that's the > classic C stuff (and if you've verified that, all the better - I haven't). I have played with this optimization recently. (Could be dmd 2.057.) No, dmd did not optimize a straightforward switch statement over a ubyte expression with about two hundred ubyte cases. The assembly contained conditional jump statements, not a table. And yes, I did try with -O. But I am not sure that a lookup table really is an optimization with modern CPUs. A series of conditional jumps that fit the CPU's cache could be faster than a table that's outside of the cache. I think accessing the cache is hundreds of times faster than accessing memory outside of the cache. Ali
Re: class templates and static if
On Monday, February 27, 2012 23:55:39 Dmitry Olshansky wrote: > On 27.02.2012 23:18, Jonathan M Davis wrote: > > On Monday, February 27, 2012 10:55:12 Ali Çehreli wrote: > >> On 02/27/2012 08:29 AM, Tyler Jameson Little wrote: > >>> I didn't want to do subclassing, because my parser is a state-machine > >> > >> style > >> > >>> parser, so it's in a big switch. Pretty gross, but I would like it to > >> > >> be as > >> > >>> fast as possible. That's why I thought this model would be so cool, > >> > >> because > >> > >>> I could remove conditions from the generated code, and get rid of a > >> > >> lot of > >> > >>> the conditionals. > >> > >> I am pretty sure switch statements boil down to a sequence conditionals > >> consisting of equality comparisons. > >> > >> I know that some compilers use optimizations where the comparisons are > >> converted to a single lookup, but last I checked, dmd does not apply > >> that optimization. Perhaps because it's not implemented yet, or because > >> using a table lookup might be slower because of reaching outside of the > >> cpu cache. (Or another reason that I am not aware of. :)) > > > > Yeah. In theory, switch statements can be optimized better than if-else > > chains, and eventually I'd fully expect that dmd would do that, but right > > now, I don't think that they really are. You'd have to look at the > > generated assembly though to be sure though. > > Please stop spreading the misconception that they are going to be > optimized "probably and eventually". Switch statements over integers are > much better then if/else chains and were for something like 20 years. > They do produce nice improvements over a sequence of if/else because > values are known in advance and are bounded. > So it goes like this: [snip] > And for the record I seen all of that in assembly listings produced by > dmd, when optimizing VM dispatch in std.regex. I had to revisit opcodes > layout a bit so that dmd outputs jump table and not a binary-searching > like stuff in a tight loop. I'm not saying that dmd doesn't ever optimize switch statements. I'm just saying that as I understand it, it doesn't always do so (its performance with ranged case statements isn't great for instance). Odds are that it _does_ optimize straight integer case statements fairly well, because that's the classic C stuff (and if you've verified that, all the better - I haven't). It's the new stuff that's less likely to be well-optimized (like ranged case statements or case statements with strings). And those probably _will_ be better optimized eventually, but as I understand it, they aren't now. So, if the reason for using a switch statement over and an if-else chain is purely for performance (particularly if doing something else than just using plain case statements with integers like in C), then you should probably profile your code and/or look at the assembly to be sure that you're getting the performance gain that you think that you're getting. - Jonathan M Davis
Re: class templates and static if
On 27.02.2012 23:18, Jonathan M Davis wrote: On Monday, February 27, 2012 10:55:12 Ali Çehreli wrote: On 02/27/2012 08:29 AM, Tyler Jameson Little wrote: I didn't want to do subclassing, because my parser is a state-machine style parser, so it's in a big switch. Pretty gross, but I would like it to be as fast as possible. That's why I thought this model would be so cool, because I could remove conditions from the generated code, and get rid of a lot of the conditionals. I am pretty sure switch statements boil down to a sequence conditionals consisting of equality comparisons. I know that some compilers use optimizations where the comparisons are converted to a single lookup, but last I checked, dmd does not apply that optimization. Perhaps because it's not implemented yet, or because using a table lookup might be slower because of reaching outside of the cpu cache. (Or another reason that I am not aware of. :)) Yeah. In theory, switch statements can be optimized better than if-else chains, and eventually I'd fully expect that dmd would do that, but right now, I don't think that they really are. You'd have to look at the generated assembly though to be sure though. Please stop spreading the misconception that they are going to be optimized "probably and eventually". Switch statements over integers are much better then if/else chains and were for something like 20 years. They do produce nice improvements over a sequence of if/else because values are known in advance and are bounded. So it goes like this: 1. If all numbers are tightly packed (e.g. almost all of M..N range is covered by a case) then it does a jump table. So there is almost no comparisons at all (just check it fits the range) and it's just one indirect jump. 2.Even failing that it does a binary-search like comparisons to speed thing up. 3. Sometimes linear searching is faster then binary (on small number of case) and it can estimate this threshold. Same goes for table vs linear search and table vs binary search. 4. In fact it even does combination of all of the above, e.g. separate range in two by one comparison then use jump tables in both etc. For instance: switch(x){ case 0: ... case 1: ... case 4: ... case 9: ... //all cases used up to 40 case 40: It might do something like (depends on compiler vendor ;) ): if( x < 9) { if(x == 0) ... else if(x == 1) ... else if(x == 4) ... } else { if(x <=40) goto table[x]; //pseudocode else goto default; } And for the record I seen all of that in assembly listings produced by dmd, when optimizing VM dispatch in std.regex. I had to revisit opcodes layout a bit so that dmd outputs jump table and not a binary-searching like stuff in a tight loop. -- Dmitry Olshansky
Re: class templates and static if
On Monday, February 27, 2012 10:55:12 Ali Çehreli wrote: > On 02/27/2012 08:29 AM, Tyler Jameson Little wrote: > > I didn't want to do subclassing, because my parser is a state-machine > > style > > > parser, so it's in a big switch. Pretty gross, but I would like it to > > be as > > > fast as possible. That's why I thought this model would be so cool, > > because > > > I could remove conditions from the generated code, and get rid of a > > lot of > > > the conditionals. > > I am pretty sure switch statements boil down to a sequence conditionals > consisting of equality comparisons. > > I know that some compilers use optimizations where the comparisons are > converted to a single lookup, but last I checked, dmd does not apply > that optimization. Perhaps because it's not implemented yet, or because > using a table lookup might be slower because of reaching outside of the > cpu cache. (Or another reason that I am not aware of. :)) Yeah. In theory, switch statements can be optimized better than if-else chains, and eventually I'd fully expect that dmd would do that, but right now, I don't think that they really are. You'd have to look at the generated assembly though to be sure though. - Jonathan M Davis
Re: class templates and static if
On 02/27/2012 08:29 AM, Tyler Jameson Little wrote: > I didn't want to do subclassing, because my parser is a state-machine style > parser, so it's in a big switch. Pretty gross, but I would like it to be as > fast as possible. That's why I thought this model would be so cool, because > I could remove conditions from the generated code, and get rid of a lot of > the conditionals. I am pretty sure switch statements boil down to a sequence conditionals consisting of equality comparisons. I know that some compilers use optimizations where the comparisons are converted to a single lookup, but last I checked, dmd does not apply that optimization. Perhaps because it's not implemented yet, or because using a table lookup might be slower because of reaching outside of the cpu cache. (Or another reason that I am not aware of. :)) Ali
Re: Re: class templates and static if
On 2/27/12, Tyler Jameson Little wrote: > That's why I thought this model would be so cool, because > I could remove conditions from the generated code, and get rid of a lot of > the conditionals. I think Nick Sabalausky talks about this concept (removing conditionals via compile-time features) in his article http://www.semitwist.com/articles/EfficientAndFlexible/SinglePage/ Other articles from last year's contest: http://prowiki.org/wiki4d/wiki.cgi?Article_Contest
Re: Re: class templates and static if
On Mon, Feb 27, 2012 at 09:29:25AM -0700, Tyler Jameson Little wrote: > > Parser!(Type.request) h = new Parser!(Type.request)("Hello world") Even better: auto h = new Parser!(Type.request)("Hello world"); T -- First Rule of History: History doesn't repeat itself -- historians merely repeat each other.
Re: Re: class templates and static if
> Parser!(Type.request) h = new Parser!(Type.request)("Hello world") Wow, I was so close! I had tried this: Parser!Type.request h = new Parser!Type.request("Hello world"); but that didn't work. I didn't think about enclosing it in parens! I didn't want to do subclassing, because my parser is a state-machine style parser, so it's in a big switch. Pretty gross, but I would like it to be as fast as possible. That's why I thought this model would be so cool, because I could remove conditions from the generated code, and get rid of a lot of the conditionals. Thanks so much!
Re: class templates and static if
> Also, 'static' on the enum declaration doesn't really do anything. :) but what if the immutable data changes?! :P But yeah, D's type inference is reasonably good (not as good as Haskell's, but nothing is as good as Haskell's), so doing what seems like the best way is normally the right way around it. If you have a lot of shared code, it might make more sense to just use traditional subclassing. Assuming that the shared code is all mostly Parser stuff then subclassing a Parser type into Request and Response types makes more sense. You can easily add common functionality to the base class, then add specific functionality to the individual classes. It also allows for more reflection, which is easier and probably more powerful than conditional compilation. -- James Miller
Re: class templates and static if
The immutable is not necessary, use: auto h = new Parser!(Type.request)("Hello world"); Otherwise, if you really want to declare the type instead of infering it you can write: Parser!(Type.request) h = new Parser!(Type.request)("Hello world"); You can also do: Parser!(Type.request) h; h = new typeof(h)("Hello world"); or: alias Parser!(Type.request) ParseRequest; ParseRequest h = new ParseRequest("Hello world"); Also, 'static' on the enum declaration doesn't really do anything. :)
class templates and static if
So, here's my code, as it stands currently: import std.stdio; static enum Type { request, response }; class Parser(Type t) { static if (t == Type.request) { string name = "request"; } else { string name = "response"; } string message; this(string message) { this.message = message; } } void main() { immutable Type t = Type.request; Parser!t h = new Parser!t("Hello world"); writefln("%s: %s", h.name, h.message); } The general goal is to make a templated class that will only include the parts that each needs. I would like to keep this all as one class template, because there is a lot of shared code, and I would like to keep it as seamless as possible and not have to separate out code into separate functions. Anyway, my current approach is a little clunky, because it requires an immutable to be created just to tell the compiler what type of parser I need. I was thinking about having two classes, Request and Response, and have parser take a normal template (class Parser (T)), but I wanted to restrict it to just those two classes (for now), so I came up with the enum solution. Is there a better way to do this? What I'd ultimately like to do is have some static if surrounding blocks of code that depends on the input to the template.