Re: Compilation times and idiomatic D code
On Monday, 17 July 2017 at 18:42:36 UTC, H. S. Teoh wrote: Besides the symbol problem though, does the template instantiation explosion problem imply as many duplicated function bodies corresponding to the new type symbols? That's a different, though somewhat related, issue. I have some ideas on that front, but for now, Rainer's PR addresses only the problem of symbol length. So the next task after his fix is merged is to tackle the larger problem of how to improve the implementation of templates. [...] Often, this results from templates that contain helper code that's independent of the template parameters, or dependent only on a subset of template parameters. For example:

Yeah, I think this is the second-best optimization for symbol names, after using references for duplicate symbol subnames. One thing you didn't mention is that this also creates a lot of useless duplicates for Voldemort types in template functions that don't depend on (all of) the function's parameters. For instance, looking at the code from the OP's issue:

    auto s(T)(T t)
    {
        struct Result
        {
            void foo() { throw new Exception("2"); }
        }
        return Result();
    }

    void main(string[] args)
    {
        auto x = 1.s.s.s.s.s;
        x.foo;
    }

foo mangles to a super long symbol, even though the actual symbol should be something like

    moduleName.s!(__any_type)(__any_type).Result.foo

Actually, it's a little more complicated than that, because s might have overloads the compiler is not aware of, which means it doesn't trivially know that T doesn't matter to Result. Still, I'm pretty sure there are optimizations to be had there. One possibility would be to allow the coder to give unique subnames to functions. For instance:

    int add.withInts(int n1, int n2)
    float add.withFloats(float n1, float n2)

This would allow the compiler to assume that a function's mangling is unique, without having to worry about overloads and other templates, and therefore skip the unnecessary parameters in the mangling of the function's Voldemort type.
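The duplication is easy to observe with `.mangleof` (a sketch; exact names and lengths depend on the compiler version, so only the relative growth is checked):

```d
auto s(T)(T t)
{
    struct Result
    {
        void foo() { throw new Exception("2"); }
    }
    return Result();
}

void main()
{
    auto x1 = 1.s;
    auto x2 = 1.s.s;
    // Each extra .s wraps the previous Result type in a new
    // instantiation, so the mangled name keeps growing even though
    // Result's body never uses T.
    static assert(typeof(x2).mangleof.length > typeof(x1).mangleof.length);
}
```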
Re: Compilation times and idiomatic D code
On Sat, Jul 15, 2017 at 07:11:54PM +, Enamex via Digitalmars-d wrote: [...]

> All that time I'd assumed that 'symbols' as linkers used them were
> constant length :T

They may have been, once upon a time, many decades ago. :-D But modern linkers support symbols of arbitrary length (well, up to a limit :-D) because of the demands of modern languages that generate long symbols as part of implementing function overloading, etc.

> Just to be clear: 100% of that bloat resides in the symbol table?

Pretty much, yes. Unless your program happens to want to print a string constructed from the type name (very unlikely unless it's purely for debugging purposes).

> So the current proposed remedy is to hash symbols above a length
> threshold?

Rainer's PR (which is now available as a dmd/druntime branch, btw) changes the mangling scheme so that when a symbol contains substrings that are repeated, the repeated instances are replaced with an encoded back-reference. When substrings are long and/or repeated frequently, this leads to a significant reduction in symbol size.

In fact, I just tested the `mangle` branch this morning on my code, and it shows major improvements: my executable sizes are now down to 4MB, about half the size of what I could achieve by type erasure via OO polymorphism. The longest symbol lengths are now about 1000 characters or so, as opposed to 20KB (or, before I applied the horcrux trick, 800KB). Compared to the original 600MB executable sizes before I started tackling this issue, this represents a 99% improvement(!).

Now Rainer's changes are just waiting for various dmd testsuite issues to be ironed out, and it should be available in git master soon. Hooray!

> Besides the symbol problem though, does the template instantiation
> explosion problem imply as many duplicated function bodies
> corresponding to the new type symbols?

That's a different, though somewhat related, issue.
I have some ideas on that front, but for now, Rainer's PR addresses only the problem of symbol length. So the next task after his fix is merged is to tackle the larger problem of how to improve the implementation of templates.

One of the major issues with heavily-templated code is that you could potentially end up with many duplicated function bodies that may in fact be binary-identical, but are stored separately in the executable because from the language's POV, they are distinct functions. (Well, there's also the problem of separate compilation, where you could end up with the *same* function instantiated multiple times in different object files, but for the most part, LTO (link-time optimization) takes care of that: the compiler writes out the duplicated instantiations as "weak symbols", so the linker discards all but the first copy.)

Often, this results from templates that contain helper code that's independent of the template parameters, or dependent only on a subset of template parameters. For example:

    struct S(T, size_t size)
    {
        T method1()          // N.B.: independent of `size`
        {
            return T.init;
        }

        int[size] method2()  // N.B.: independent of `T`
        {
            return int[size].init;
        }

        T[size] method3()    // N.B.: depends on both `T` and `size`
        {
            return T[size].init;
        }
    }

    S!(int, 5) si5;
    S!(int, 6) si6;
    S!(string, 2) ss2;
    S!(string, 4) ss4;

Here, si5.method1 and si6.method1 are identical, because method1 is independent of `size`. However, because they are part of the template S, they have distinct names: `S!(int, 5).method1` and `S!(int, 6).method1`, so they will be duplicated in the generated code. Ideally, the compiler should detect that they are actually identical, and merge them accordingly. A similar situation occurs with method2. For method3, since it depends on both template parameters, a separate version of the function must be generated for each distinct combination of `T` and `size`, obviously.

Of course, this example is contrived, and rather trivial.
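The duplication shows up in the mangled names (a small sketch; `mangleof` output is compiler-specific, so only the distinctness of the two symbols is asserted):

```d
struct S(T, size_t size)
{
    T method1() { return T.init; } // independent of `size`
}

void main()
{
    // method1's body is identical for both instantiations, yet each
    // gets its own symbol, so the object file carries two copies.
    static assert(S!(int, 5).method1.mangleof
               != S!(int, 6).method1.mangleof);
}
```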
A less trivial example shows that there's more to this problem than just detecting which template parameters a method depends on:

    struct List(T)
    {
        struct Node
        {
            Node* next;
            T t;
        }

        Node* removeNext(Node* prev)
        {
            Node* tmp = prev.next;
            prev.next = tmp.next;
            tmp.next = null;
            return tmp;
        }
        ...
    }

Here, removeNext takes Node* as parameter, and since the definition of Node depends on T, it would appear that removeNext also depends on T. However, closer inspection shows that actually it never touches the portion of Node that is dependent on T. Therefore, removeNext will actually be identical across all instantiations of List.
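This kind of merging can be approximated by hand today. A minimal sketch (the `NodeHeader` type and `removeNextImpl` helper are hypothetical names, not from the post) factors the T-independent pointer surgery into a non-template function:

```d
struct NodeHeader
{
    NodeHeader* next;
}

// Non-template helper: compiled exactly once, regardless of T.
NodeHeader* removeNextImpl(NodeHeader* prev)
{
    NodeHeader* tmp = prev.next;
    prev.next = tmp.next;
    tmp.next = null;
    return tmp;
}

struct List(T)
{
    struct Node
    {
        Node* next; // must stay the first field, mirroring NodeHeader
        T t;
    }

    Node* removeNext(Node* prev)
    {
        // Safe only because Node's leading layout matches NodeHeader.
        return cast(Node*) removeNextImpl(cast(NodeHeader*) prev);
    }
}

void main()
{
    List!int.Node third = { null, 3 };
    List!int.Node second = { &third, 2 };
    List!int.Node first = { &second, 1 };

    auto removed = List!int().removeNext(&first);
    assert(removed.t == 2);
    assert(first.next is &third);
}
```

The cast relies on the T-dependent field being laid out after the T-independent one, which is the same observation the post makes about removeNext never touching `t`.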
Re: Compilation times and idiomatic D code
On 7/14/17 7:30 PM, H. S. Teoh via Digitalmars-d wrote: On Fri, Jul 14, 2017 at 03:45:44PM -0700, H. S. Teoh via Digitalmars-d wrote: [...] Here's a further update to the saga of combating ridiculously large symbol sizes. [...] .wrapWithInterface // <--- type erasure happens here [...]

Some further thoughts about type erasure and UFCS chains. Nobody says type erasure *requires* OO -- that's just an artifact of the way things are currently implemented. Consider, for example, your generic UFCS chain:

    auto r = source
        .ufcsFunc1
        .ufcsFunc2
        .ufcsFunc3
        ...
        .ufcsFuncn;

From the POV of outside code, *nobody cares* what the specific types of each stage of the UFCS pipeline are. Only the code that implements ufcsFunc1, ufcsFunc2, etc., needs to know. Furthermore, suppose X is the specific type of the range that's returned by ufcsFunc3, in the middle of the pipeline. What are the chances this exact type is going to be reused again later? Very slim. And if ufcsFunc2, say, takes an alias parameter that's instantiated with a lambda function, you can basically guarantee that this type X will *never* ever be repeated again, outside of this specific UFCS chain.

I think this is slightly flawed thinking. Types can certainly be repeated. Even long chains of UFCS functions may be repeated elsewhere. Therefore, you need to have these characteristics:

1. The compiler can generate the same symbol given the same compile-time parameters. Separate compilation dictates this is a necessity.
2. This CANNOT depend on position inside the module (i.e. no line numbers or global counters). It would be too surprising for it to be a linker error to reorder functions in a module.
3. The mangled symbol should be reversible back to the original symbol name.
4. The symbol should make sense to the viewer.

Note that 3 and 4 are more "nice to have" than "essential", as you can certainly compile, link, and run without ever having to print a symbol name.
Thinking about this, I've had a couple of interesting thoughts. First, where this really matters (i.e. where you get really long symbol names) is with symbols defined inside template functions. I don't need to rehash this, as you all know it. But the definitions also depend on the runtime parameters, which ironically are usually a repeat of the template parameters. Hence the huge bloat. But what if we forgo that requirement of using the runtime parameters? That is:

    // prototypical UFCS wrapper
    auto foo(Args...)(Args args)
    {
        static struct Result { ... }
        return Result(args);
    }

Instead of Result being typed as (e.g.):

    foo!(int, string, bool)(int, string, bool).Result

it really is

    foo!(int, string, bool).Result

What happens? In essence, this is the "horcrux" solution that I came up with. However, there is one case where it doesn't work. And that is when you have foo overloaded only on runtime parameters:

    auto foo(T)(T t) { struct Result { ... } ... }
    auto foo(T)(T t, string s) { struct Result { ... } ... }

If we define Result the way I suggest, then both have the same name (foo!(int).Result) even though they are different structs. If you use horcrux on this right now, it actually triggers a bug, as only the first Result definition is considered (https://issues.dlang.org/show_bug.cgi?id=17653). But I would propose that we just make this an error. That is, you just have to rename the second one Result2, or something else.

1. This would break some code. Not much, but some rare examples. Most UFCS code has one definition of a function, or multiple definitions but with different template parameters.
2. It would solve the exponential problem, as now only the template definition is considered, so the growth of the symbol is linear.
3. The savings from Rainer's patch can still be additive.

Thoughts?

-Steve
Re: Compilation times and idiomatic D code
On Monday, 17 July 2017 at 12:51:37 UTC, jmh530 wrote:
On Saturday, 15 July 2017 at 15:58:12 UTC, Jonathan M Davis wrote:

    int[] a;
    auto b = a.map!(a => a / 2)();
    pragma(msg, typeof(b));

then it prints out

    MapResult!(__lambda1, int[])

If you have

    int[] a;
    auto b = a.map!(a => a / 2)().map!(a => a)();
    pragma(msg, typeof(b));

then it prints out

    MapResult!(__lambda2, MapResult!(__lambda1, int[]))

If you have

    int[] a;
    auto b = a.map!(a => a / 2)().map!(a => a)().filter!(a => a < 7)();
    pragma(msg, typeof(b));

then it prints out

    FilterResult!(__lambda3, MapResult!(__lambda2, MapResult!(__lambda1, int[])))

Is there any way - theoretically - to compute typeof(b) lazily, so that the information is only provided as needed?

typeof(b) is needed as soon as b is declared. So no. There is no point in making it lazy.
Re: Compilation times and idiomatic D code
On Saturday, 15 July 2017 at 15:58:12 UTC, Jonathan M Davis wrote:

    int[] a;
    auto b = a.map!(a => a / 2)();
    pragma(msg, typeof(b));

then it prints out

    MapResult!(__lambda1, int[])

If you have

    int[] a;
    auto b = a.map!(a => a / 2)().map!(a => a)();
    pragma(msg, typeof(b));

then it prints out

    MapResult!(__lambda2, MapResult!(__lambda1, int[]))

If you have

    int[] a;
    auto b = a.map!(a => a / 2)().map!(a => a)().filter!(a => a < 7)();
    pragma(msg, typeof(b));

then it prints out

    FilterResult!(__lambda3, MapResult!(__lambda2, MapResult!(__lambda1, int[])))

Is there any way - theoretically - to compute typeof(b) lazily, so that the information is only provided as needed?
Re: Compilation times and idiomatic D code
On Saturday, 15 July 2017 at 15:58:12 UTC, Jonathan M Davis wrote:
On Saturday, July 15, 2017 11:10:32 Enamex via Digitalmars-d wrote: [...]

The issue has to do with how each invocation of a range-based function tends to result in a new template instantiation, and it's common practice in D to chain a bunch of templated function calls together. For instance, if you have [...] The type is getting progressively longer and more complicated, because when the function template is instantiated, it's instantiated with the type from the previous function call, so it's wrapping the previous type, and you get a new type that has the name of the type of its argument embedded in it. It's like if you keep wrapping a container inside another container. [...]

- Jonathan M Davis

This was quite insightful, thank you. All that time I'd assumed that 'symbols' as linkers used them were constant length :T

Just to be clear: 100% of that bloat resides in the symbol table? So the current proposed remedy is to hash symbols above a length threshold? Besides the symbol problem though, does the template instantiation explosion problem imply as many duplicated function bodies corresponding to the new type symbols?
Re: Compilation times and idiomatic D code
On Saturday, July 15, 2017 11:10:32 Enamex via Digitalmars-d wrote:
> - What type information are being kept because of UFCS chains?
> Doesn't that mechanism simply apply overload resolution then
> choose between the prefix and .method forms as appropriate,
> rewriting the terms?
> Then it's a problem of function invocation. I don't get
> what's happening here still. Does this tie to the Voldemort types
> problem? (=> are many of the functions in the chain returning
> custom types?) It doesn't make sense, especially if, from your
> indirection workaround, it looks like it would work around the
> same but without the bloat problem if we unlinked the chain into
> many intermediate temporary parts. So how is this a type
> information issue?

UFCS is irrelevant to all of this. That just so happens to be how he's calling the functions. You'd get the same issue if you called all of the functions in a chain with the traditional function call syntax rather than treating them as a bunch of member functions. The issue has to do with how each invocation of a range-based function tends to result in a new template instantiation, and it's common practice in D to chain a bunch of templated function calls together.
For instance, if you have

    int[] a;
    auto b = a.map!(a => a / 2)();
    pragma(msg, typeof(b));

then it prints out

    MapResult!(__lambda1, int[])

If you have

    int[] a;
    auto b = a.map!(a => a / 2)().map!(a => a)();
    pragma(msg, typeof(b));

then it prints out

    MapResult!(__lambda2, MapResult!(__lambda1, int[]))

If you have

    int[] a;
    auto b = a.map!(a => a / 2)().map!(a => a)().filter!(a => a < 7)();
    pragma(msg, typeof(b));

then it prints out

    FilterResult!(__lambda3, MapResult!(__lambda2, MapResult!(__lambda1, int[])))

The type is getting progressively longer and more complicated, because when the function template is instantiated, it's instantiated with the type from the previous function call, so it's wrapping the previous type, and you get a new type that has the name of the type of its argument embedded in it. It's like if you keep wrapping a container inside another container:

    Array!int a;
    Array!(Array!int) b;
    Array!(Array!(Array!int)) c;

The first level or two may not be so bad, but pretty quickly, it gets quite ridiculous. The fact that we use auto and/or the name of the template parameter in range-based code completely hides the fact that the type we're operating on has a very long and hideous name. So, for the most part, you don't care, but you do get ridiculously long symbol names underneath the hood, and the compiler and linker have to deal with them, and it becomes a performance problem.

And as bad as this example is, map doesn't actually use Voldemort types. MapResult is declared outside of map as a private struct. The extra information that gets encoded in the symbol name with a Voldemort type makes them _much_ worse, though interestingly enough, the compiler doesn't print all of that out. E.g.

    int[][] a;
    auto b = a.joiner([5]);
    pragma(msg, typeof(b));

just prints

    Result

which doesn't look bad even though things are apparently much uglier underneath the hood.
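The growth Jonathan describes can be checked directly with `.mangleof` (a small sketch; the exact lengths vary by compiler version, so only the relative growth is asserted):

```d
import std.algorithm : filter, map;

void main()
{
    int[] a;
    auto b1 = a.map!(x => x / 2)();
    auto b2 = b1.map!(x => x)();
    auto b3 = b2.filter!(x => x < 7)();

    // Each stage's type embeds its predecessor's type, so the
    // mangled name grows with every link in the chain.
    static assert(typeof(b2).mangleof.length > typeof(b1).mangleof.length);
    static assert(typeof(b3).mangleof.length > typeof(b2).mangleof.length);
}
```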
But the fact that working with ranges results in a whole bunch of chained invocations of function templates tends to result in much larger symbol sizes than you'd get with code that wasn't so generic. And the fact that Voldemort types have been pushed as a great way to encapsulate range types that result from function calls and really shouldn't be used by name makes the situation that much worse. So, idiomatic D code currently results in very large symbol names, and while in many cases you don't notice, some people (like H. S. Teoh) _are_ noticing, and it's becoming a problem. As I understand it, Weka has quite a few problems with this. Fortunately, a fix to condense symbol names should be along shortly, which will help the situation considerably, but in general, we really need to look at how we can make some of these common D idioms do better in terms of symbol size.

- Jonathan M Davis
Re: Compilation times and idiomatic D code
On Saturday, 15 July 2017 at 11:10:32 UTC, Enamex wrote:
On Friday, 14 July 2017 at 22:45:44 UTC, H. S. Teoh wrote: [...] I have some stupid questions: - What does everyone mean when they say 'symbol' here? I'm probably misunderstanding symbols gravely or it's something that DMD in particular handles in a strange way. [...]

A symbol is anything which has an address and can (or could) be picked up by the linker. In other words, a symbol is anything which has (or could have) a name.
Re: Compilation times and idiomatic D code
On Friday, 14 July 2017 at 22:45:44 UTC, H. S. Teoh wrote: Here's a further update to the saga of combating ridiculously large symbol sizes.

So yesterday I wrote a new module that also heavily uses UFCS chains. My initial draft of the module, once I linked it with the main program, particularly with a UFCS chain that has led to the 600MB executable sizes seen above, caused another explosion in symbol size that actually managed to reach 100MB in *one* symbol, triggering a DMD termination complaining about possible infinite template recursion. :-D Funnier still, after temporarily simplifying part of the chain to bring symbol sizes down, I eventually got it below 100MB but ended up with linker segfaults and ELF errors because the symbol was still ridiculously huge. Eventually, this drove me to refactor two Phobos functions that are used heavily in my code, std.range.chunks and std.algorithm.joiner, using the same "horcrux" technique (see Phobos PRs #5610 and #5611). This, together with some further refactoring in my own code, eventually brought things down to the 20MB range of executable sizes.

Then an idea occurred to me: the reason these symbol sizes got so large was that the UFCS chain preserves *all* type information about every component type used to build the final type. So, by necessity, the type name has to somehow encode all of this information in an unambiguous way. Now, arguably, DMD's current mangling scheme is at fault because it contains too many repeating components, but even if you disregard that, the fact remains that if you have 50+ components in your overall UFCS chain, the symbol length cannot be less than 50*n, where n is the average length of a single component's type name, plus some necessary overhead to account for the mangling scheme syntax. Let's say n is on average 20-25 characters; round it up to 35 for mangling syntax, and you're still looking at upwards of 1700+ characters *minimum*.
Coupled with the current O(n^2) / O(n^3) mangling scheme, you easily reach megabytes of symbol length. We can compress the symbols all we want, but there's a limit as to how much compression will help. At the end of the day, you still have to represent those 50+ components *somehow*.

But what if most of this type information is actually *unnecessary*? To use a range example, if all you care about at the end of the chain is that it's a forward range of ubyte, then why even bother with multi-MB symbols encoding type information that's never actually used? Maybe a little type erasure would help, via hiding those 50+ component types behind an opaque runtime OO polymorphic interface. Phobos does have the facilities for this, in the form of the InputRange, ForwardRange, etc., interfaces in std.range.interfaces. In my case, however, part of the chain uses another generic type (a kind of generalization of 2D arrays). But either way, the idea is simple: at the end of the UFCS chain, wrap it in a class object that inherits from a generic interface that encodes only the primitives of the 2D array concept, and the element type. The class object itself, of course, still must retain knowledge of the full-fledged type, but the trick is that if we insert this type erasure in the middle of the chain, then later components don't have to encode the type names of earlier components anymore.

T

I have some stupid questions:

- What does everyone mean when they say 'symbol' here? I'm probably misunderstanding symbols gravely or it's something that DMD in particular handles in a strange way.

- What type information are being kept because of UFCS chains? Doesn't that mechanism simply apply overload resolution then choose between the prefix and .method forms as appropriate, rewriting the terms? Then it's a problem of function invocation. I don't get what's happening here still. Does this tie to the Voldemort types problem?
(=> are many of the functions in the chain returning custom types?) It doesn't make sense, especially if, from your indirection workaround, it looks like it would work around the same but without the bloat problem if we unlinked the chain into many intermediate temporary parts. So how is this a type information issue? Thanks!
Re: Compilation times and idiomatic D code
On Friday, 14 July 2017 at 22:45:44 UTC, H. S. Teoh wrote: Result: 8MB executable size (!). Of course, this comes at a runtime cost of an added level of indirection (now you have to call a virtual function through the interface to integrate with earlier components in the pipeline). But at a 60% reduction in executable size, I think it's a worthwhile tradeoff. :-)

That's a very interesting idea! However, I wouldn't want to insert this into code manually. Maybe simple opt-in flag support in DMD that detects UFCS pipes with X components and then inserts the type-erasure hack could solve most problems for the time being? That way it wouldn't create a WTF moment when explaining D code, and when Stefan or someone else manages to implement something more sophisticated, no code would need an update.
Re: Compilation times and idiomatic D code
On Fri, Jul 14, 2017 at 03:45:44PM -0700, H. S. Teoh via Digitalmars-d wrote: [...]
> Here's a further update to the saga of combating ridiculously large
> symbol sizes.
[...]
> .wrapWithInterface // <--- type erasure happens here
[...]

Some further thoughts about type erasure and UFCS chains. Nobody says type erasure *requires* OO -- that's just an artifact of the way things are currently implemented. Consider, for example, your generic UFCS chain:

    auto r = source
        .ufcsFunc1
        .ufcsFunc2
        .ufcsFunc3
        ...
        .ufcsFuncn;

From the POV of outside code, *nobody cares* what the specific types of each stage of the UFCS pipeline are. Only the code that implements ufcsFunc1, ufcsFunc2, etc., needs to know. Furthermore, suppose X is the specific type of the range that's returned by ufcsFunc3, in the middle of the pipeline. What are the chances this exact type is going to be reused again later? Very slim. And if ufcsFunc2, say, takes an alias parameter that's instantiated with a lambda function, you can basically guarantee that this type X will *never* ever be repeated again, outside of this specific UFCS chain.

And at the end of the chain, typeof(r) really only needs to encode the fact that it implements the API returned by ufcsFuncn, e.g., the range methods, element type, etc. But nobody will care how these API functions are implemented under the hood -- as far as outside code is concerned, typeof(r) might as well be an opaque serial number, and we'll be able to find the instantiated API functions that implement the methods of r. IOW, the explicit types of all intermediate stages of the UFCS chain are irrelevant to the outside world. Since they are template functions (most of the time) anyway, the compiler might as well just glue all their code together, and put it inside functions named like funcName0012345XYZ, where the 0012345XYZ is some random unique string, basically a serial number, that identifies it as a method of r.
For that matter, typeof(r) might as well be ufcsChain0012345XYZ, since nobody outside the pipeline itself will care what its component types are. So basically, the compiler could just automatically type-erase the type of such chains, given that (1) none of the UFCS functions "escape" knowledge about an intermediate type to the outside world, (2) the chain is essentially unique to the current function (esp. if one or more components of the chain has an alias parameter bound to a lambda argument), (3) most UFCS functions return opaque types anyway -- very few (i.e., almost none) Phobos algorithms, for example, allow you to recover the type of the previous component of the chain -- so it's not relevant what the specific types of the previous components are, and (4) therefore the outside world couldn't care less about the specific types that compose typeof(r). It could be as simple as bumping a global counter and suffixing it to some compiler-reserved prefix, to make type names like __ufcsType_0123. This will be far shorter than any symbol compression applied to present-day symbols! To summarize, here's a scheme for handling template expansion in UFCS chains that will produce extremely short symbols, *independently* of how long your UFCS chains are. 
If the compiler sees a UFCS chain with these characteristics:

1) There are ≥ n components, for some arbitrary value of n (say n=5 or n=10, or something like that, so that we don't bother with trivial cases where the full type may actually be important);

2) Each component of the chain is a template function (this one is almost a given);

3) At least one component guarantees a unique instantiation, e.g., an alias parameter bound to a lambda function;

then the compiler will:

a) Generate a unique serial number (package.module + global counter should be good enough);

b) Assign a unique typename to the final return value, constructed using this serial number;

c) Essentially inline all template bodies of previous stages of the pipeline into the last one, using the same serial number to generate names for the aggregated symbols of the final stage.

This way, those ridiculously huge symbols aren't even generated to begin with, thereby alleviating the need for convoluted compression schemes.

T

--
I am not young enough to know everything. -- Oscar Wilde
Re: Compilation times and idiomatic D code
On Friday, 14 July 2017 at 22:45:44 UTC, H. S. Teoh wrote: On Thu, Jul 13, 2017 at 03:27:31PM -0700, H. S. Teoh via Digitalmars-d wrote: [...] [...] Here's a further update to the saga of combating ridiculously large symbol sizes. [...] You will be excited to hear that my template work will fix the described problem as a side effect. Without incurring run-time overhead. Or forcing the usage of OO wrappers. The downside however is that the scheme I have in mind is very complex and might take over a year to implement.
Re: Compilation times and idiomatic D code
On Thu, Jul 13, 2017 at 03:27:31PM -0700, H. S. Teoh via Digitalmars-d wrote:
> On Thu, Jul 13, 2017 at 04:16:50PM -0400, Steven Schveighoffer via
> Digitalmars-d wrote:
> [...]
> > http://www.schveiguy.com/blog/2016/05/have-your-voldemort-types-and-keep-your-disk-space-too/
> [...]
>
> Whoa. I just tried your 'horcrux' trick, and discovered that it
> causes a further reduction in executable size!
>
> Original sizes:
> 38144296 bytes (38.1 MB)
> 61412056 bytes (61.4 MB)
>
> After factoring out Voldemort structs as module-global private structs:
> 8332352 bytes (8.3 MB)
> 10376584 bytes (10.4 MB)
>
> After further refactoring to use the "horcrux" trick:
> 8147392 bytes (8.1 MB)
> 10124136 bytes (10.1 MB)
[...]

Here's a further update to the saga of combating ridiculously large symbol sizes. So yesterday I wrote a new module that also heavily uses UFCS chains. My initial draft of the module, once I linked it with the main program, particularly with a UFCS chain that has led to the 600MB executable sizes seen above, caused another explosion in symbol size that actually managed to reach 100MB in *one* symbol, triggering a DMD termination complaining about possible infinite template recursion. :-D Funnier still, temporarily simplifying part of the chain to bring symbol sizes down, I eventually got it below 100MB but ended up with linker segfaults and ELF errors because the symbol was still ridiculously huge. Eventually, it drove me to refactor two Phobos functions that are used heavily in my code: std.range.chunks and std.algorithm.joiner, using the same "horcrux" technique (see Phobos PRs #5610 and #5611). This, together with some further refactoring in my own code, eventually brought things down to the 20MB range of executable sizes.

Then an idea occurred to me: the reason these symbol sizes got so large was that the UFCS chain preserves *all* type information about every component type used to build the final type.
So, by necessity, the type name has to somehow encode all of this information in an unambiguous way. Now, arguably, DMD's current mangling scheme is at fault because it contains too many repeating components, but even if you disregard that, the fact remains that if you have 50+ components in your overall UFCS chain, the symbol length cannot be less than 50*n, where n is the average length of a single component's type name, plus some necessary overhead to account for the mangling scheme syntax. Let's say n is on average 20-25 characters; round it up to 35 for mangling syntax, and you're still looking at upwards of 1700+ characters *minimum*. Coupled with the current O(n^2) / O(n^3) mangling scheme, you easily reach megabytes of symbol length. We can compress the symbols all we want, but there's a limit as to how much compression will help. At the end of the day, you still have to represent those 50+ components *somehow*.

But what if most of this type information is actually *unnecessary*? To use a range example, if all you care about at the end of the chain is that it's a forward range of ubyte, then why even bother with multi-MB symbols encoding type information that's never actually used? Maybe a little type erasure would help, via hiding those 50+ component types behind an opaque runtime OO polymorphic interface. Phobos does have the facilities for this, in the form of the InputRange, ForwardRange, etc., interfaces in std.range.interfaces. In my case, however, part of the chain uses another generic type (a kind of generalization of 2D arrays). But either way, the idea is simple: at the end of the UFCS chain, wrap it in a class object that inherits from a generic interface that encodes only the primitives of the 2D array concept, and the element type.
The class object itself, of course, still must retain knowledge of the full-fledged type, but the trick is that if we insert this type erasure in the middle of the chain, then later components don't have to encode the type names of earlier components anymore. So, tl;dr:

    // Before:
    auto r = input
        .ufcsFunc1
        .ufcsFunc2
        .map!(e => e.ufcsFunc3.ufcsFunc4 ...)
        ...
        .ufcsFuncn
        .flattenToRange
        .rangeFunc1
        .rangeFunc2
        ...;

Result: 20MB executable size (600MB before the "horcrux" technique was applied).

    // After:
    input.ufcsFunc1
        .ufcsFunc2
        .map!(e => e.ufcsFunc3.ufcsFunc4 ...)
        ...
        .ufcsFuncn
        .wrapWithInterface // <--- type erasure happens here
        .flattenToRange
        .rangeFunc1
        .rangeFunc2
        ...;

Result: 8MB executable size (!). Of course, this comes at a runtime cost of an added level of indirection (now you have to call a virtual function through the interface to integrate with earlier components in the pipeline). But at a 60% reduction in executable size, I think it's a worthwhile tradeoff. :-)
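For plain ranges, the same type-erasure trick can be sketched with Phobos' std.range.interfaces (a simplified 1-D illustration, not the 2D-array interface described in the post):

```d
import std.algorithm : filter, map;
import std.range.interfaces : InputRange, inputRangeObject;

void main()
{
    int[] a = [1, 2, 3, 4];

    // inputRangeObject wraps the deeply nested MapResult/FilterResult
    // type in a class; everything downstream sees only the short
    // InputRange!int interface type.
    InputRange!int r = a.map!(x => x * 2)
                        .filter!(x => x > 2)
                        .inputRangeObject;

    int sum;
    foreach (e; r)
        sum += e;
    assert(sum == 18); // 4 + 6 + 8
}
```

The interface call through `r` is the added level of indirection the post mentions, traded for the shorter downstream symbols.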
Re: Compilation times and idiomatic D code
On Thu, Jul 13, 2017 at 04:16:50PM -0400, Steven Schveighoffer via Digitalmars-d wrote:
[...]
> http://www.schveiguy.com/blog/2016/05/have-your-voldemort-types-and-keep-your-disk-space-too/
>
> I was surprised as well at the reduction. I only noticed the problem
> because when I was debugging my library, the program was taking an
> unreasonably long time to *print a stack trace* (like multiple
> seconds).
[...]

Whoa. I just tried your 'horcrux' trick, and discovered that it causes a further reduction in executable size!

Original sizes:
    38144296 bytes (38.1 MB)
    61412056 bytes (61.4 MB)

After factoring out Voldemort structs as module-global private structs:
    8332352 bytes (8.3 MB)
    10376584 bytes (10.4 MB)

After further refactoring to use the "horcrux" trick:
    8147392 bytes (8.1 MB)
    10124136 bytes (10.1 MB)

While this last reduction is rather modest relative to the total executable sizes, it still represents reductions of about 200KB and 300KB worth of symbols(!), respectively. Given that this code base is a mere 5000+ LOC, I'd expect significant savings in larger template-heavy projects.

My guess as to the reason for this reduction is that now you only have to instantiate one template for each template function call, rather than two:

    private struct MyFuncImpl(Args...) { ... }

    auto myFunc(Args...)(Args args) { return MyFuncImpl!Args(args); }

will instantiate two templates, MyFuncImpl and myFunc, each time with the same argument list; whereas:

    template myFunc(Args...)
    {
        auto myFunc(Args args) { return Impl(args); }
        struct Impl { ... }
    }

only requires a single template instantiation of myFunc, since Impl is instantiated along with the function itself. There's also the slightly shorter names involved (`myFunc!Args.Impl` vs. `MyFuncImpl!Args.MyFuncImpl`), which, given the O(n^2) or O(n^3) symbol length dependency, can quickly add up to non-trivial savings.

T

-- 
Those who don't understand D are condemned to reinvent it, poorly. -- Daniel N
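For reference, a self-contained compilable sketch of the "horcrux" layout described above (the struct contents are hypothetical placeholders):

```d
// "Horcrux" pattern: the eponymous template groups the function with its
// result struct, so a single instantiation of myFunc!Args covers both,
// and the struct's symbol becomes myFunc!Args.Impl rather than
// MyFuncImpl!Args.MyFuncImpl.
template myFunc(Args...)
{
    auto myFunc(Args args) { return Impl(args); }

    struct Impl
    {
        Args args; // whatever state the result type needs (hypothetical)
    }
}

void main()
{
    auto r = myFunc(1, 2.5);
    // The result type is the Impl nested in this one instantiation.
    static assert(is(typeof(r) == myFunc!(int, double).Impl));
}
```

Only the eponymous member is accessible from outside the template, so `Impl` stays as hidden from callers as a Voldemort type would be.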
Re: Compilation times and idiomatic D code
On 7/13/17 3:23 PM, H. S. Teoh via Digitalmars-d wrote:
> Today, I decided to sit down and refactor my code. I've heard tell that
> Voldemort types tend to cause an explosion in symbol size, and today I
> thought it'd be interesting to see just how big of a difference this
> actually makes. I could not believe my eyes. Before the refactoring, my
> executable sizes (I have 2 executables in this codebase) were 37.3MB
> and 60.0MB, respectively. After the refactoring, my executable sizes
> were, respectively, 8.2MB and 10.2MB. That's a 5x and 6x reduction in
> executable size, respectively. Whoa.

Yep.

http://www.schveiguy.com/blog/2016/05/have-your-voldemort-types-and-keep-your-disk-space-too/

I was surprised as well at the reduction. I only noticed the problem because when I was debugging my library, the program was taking an unreasonably long time to *print a stack trace* (like multiple seconds).

-Steve
Re: Compilation times and idiomatic D code
On Wed, Jul 05, 2017 at 09:45:55PM -0600, Jonathan M Davis via Digitalmars-d wrote:
[...]
> In this case, I think that it's more that Voldemort types are biting
> us than that ranges are biting us (though the Voldemort types are
> typically ranges, so in practice, using ranges ends up biting you).
>
> https://issues.dlang.org/show_bug.cgi?id=15831
>
> I think that there's a good argument that Voldemort types aren't
> actually a particularly good idea given these issues, but the work
> that's being done to compress symbols should at least reduce the
> problem.
[...]

Here's the latest update on this bit of drama:

Today, I decided to sit down and refactor my code. I've heard tell that Voldemort types tend to cause an explosion in symbol size, and today I thought it'd be interesting to see just how big of a difference this actually makes.

I could not believe my eyes. Before the refactoring, my executable sizes (I have 2 executables in this codebase) were 37.3MB and 60.0MB, respectively. After the refactoring, my executable sizes were, respectively, 8.2MB and 10.2MB. That's a 5x and 6x reduction in executable size, respectively. Whoa.

And all I did was to move Voldemort structs out into module-level private structs. I.e., this:

    auto myFunc(Args...)(Args args)
    {
        static struct Result { ... }
        return Result(args);
    }

became this:

    private struct MyFuncImpl(Args...) { ... }

    auto myFunc(Args...)(Args args) { return MyFuncImpl!Args(args); }

And you get an instant 5x to 6x reduction in executable size.

Now mind you, the ridiculous symbol size problem still exists; the executable still contains some huge symbols. The difference is that now the longest symbol is "only" 86KB, whereas before the refactoring, the longest symbol was 777KB (!). So the problem is less severe, but still present, and symbol compression is still a very important part of the eventual solution.

T

-- 
Windows: the ultimate triumph of marketing over technology. -- Adrian von Bidder
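A compilable miniature of this refactoring (names and struct contents are hypothetical):

```d
// Module-level private struct replacing the Voldemort type. It is still
// unnameable outside this module, but its mangled name is the short
// MyFuncImpl!Args.MyFuncImpl instead of a name that nests the enclosing
// function's full instantiation (myFunc!Args.myFunc.Result).
private struct MyFuncImpl(Args...)
{
    Args args; // placeholder state
}

auto myFunc(Args...)(Args args)
{
    return MyFuncImpl!Args(args);
}

void main()
{
    auto r = myFunc(1, "a", 2.0);
    static assert(is(typeof(r) == MyFuncImpl!(int, string, double)));
}
```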
Re: Compilation times and idiomatic D code
On Fri, Jul 07, 2017 at 09:32:24AM -0400, Steven Schveighoffer via Digitalmars-d wrote:
> On 7/5/17 5:24 PM, H. S. Teoh via Digitalmars-d wrote:
> > On Wed, Jul 05, 2017 at 09:18:45PM +0000, jmh530 via Digitalmars-d wrote:
> > > On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
[...]
> > > > Rainer Schuetze is quite close to a solution. Which reduces the
> > > > symbol-name bloat significantly.
> > > > See https://github.com/dlang/dmd/pull/5855
[...]
> I'm super-psyched this has moved from "proof of concept" to ready for
> review. Kudos to Rainer for his work on this! Has been a PITA for a
> while:
>
> https://issues.dlang.org/show_bug.cgi?id=15831
> https://forum.dlang.org/post/n96k3g$ka5$1...@digitalmars.com

Yes, kudos to Rainer for making this a (near) reality!

[...]
> I have found that the linker gets REALLY slow when the symbols get
> large. So it's not necessarily the compiler that's slow for this.
[...]

True, I didn't profile the compiler carefully to discern whether it was the compiler that's slow, or the linker. But either way, having smaller symbols will benefit both.

T

-- 
Freedom: (n.) Man's self-given right to be enslaved by his own depravity.
Re: Compilation times and idiomatic D code
On 7/5/17 5:24 PM, H. S. Teoh via Digitalmars-d wrote:
> On Wed, Jul 05, 2017 at 09:18:45PM +0000, jmh530 via Digitalmars-d wrote:
> > On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
> > > Yes there is.
> > > Rainer Schuetze is quite close to a solution. Which reduces the
> > > symbol-name bloat significantly.
> > > See https://github.com/dlang/dmd/pull/5855
>
> That's very nice. Hope we will get this through sooner rather than
> later!

I'm super-psyched this has moved from "proof of concept" to ready for review. Kudos to Rainer for his work on this! Has been a PITA for a while:

https://issues.dlang.org/show_bug.cgi?id=15831
https://forum.dlang.org/post/n96k3g$ka5$1...@digitalmars.com

> In any case, I think the actual compilation times would depend on the
> details of the code. If you're using relatively shallow UFCS chains,
> like Phobos unittests tend to do, probably the compressed symbols
> won't give very much advantage over the cost of computing the
> compression. But if you have heavy usage of UFCS like in my code, this
> should cause significant speedups from not having to operate on 300KB
> large symbols.

I have found that the linker gets REALLY slow when the symbols get large. So it's not necessarily the compiler that's slow for this.

-Steve
Re: Compilation times and idiomatic D code
On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
> On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:
> > I vaguely remember there was talk about compressing symbols when
> > they get too long... is there any hope of seeing this realized in
> > the near future?
>
> Yes there is.
> Rainer Schuetze is quite close to a solution. Which reduces the
> symbol-name bloat significantly.
> See https://github.com/dlang/dmd/pull/5855
>
> There is still a problem with the template system as a whole. Which I
> am working on in my spare time. And which will become my focus after
> newCTFE is done.

Please give consent for the D Foundation to clone you.
Re: Compilation times and idiomatic D code
On Thu, Jul 06, 2017 at 01:32:04PM +0000, Atila Neves via Digitalmars-d wrote:
> On Thursday, 6 July 2017 at 12:00:29 UTC, Jacob Carlborg wrote:
[...]
> > Yeah, it's usually all these D specific compile time features that
> > is slowing down compilation.
> >
> > DWT and Tango are two good examples of large code bases where very
> > few of these features are used, they're written in a more
> > traditional style. They're at least 200k lines of code each and,
> > IIRC, takes around 10 seconds (or less) to compile, for a full
> > build.
>
> IIRC building Tango per package instead of all-at-once got the build
> time down to less than a second.
[...]

Well, obviously D's famed compilation speed must still be applicable *somewhere*, otherwise we'd be hearing loud complaints. :-D My point was that D's compile-time features, which are a big draw to me personally, and also a growing selling point of D, need improvement in this area.

I'm very happy to be pointed to Rainer's PR that implements symbol back-reference compression. Apparently it has successfully compressed the largest symbol generated by the Phobos unittests from 30KB (or something like that) down to about 1100 characters, which, though still on the large side, is much more reasonable than the present situation. I hope this PR will get merged in the near future.

T

-- 
Making non-nullable pointers is just plugging one hole in a cheese grater. -- Walter Bright
Re: Compilation times and idiomatic D code
On Thursday, 6 July 2017 at 12:00:29 UTC, Jacob Carlborg wrote:
> On 2017-07-05 22:12, H. S. Teoh via Digitalmars-d wrote:
[...]
> It's not UFCS per se that causes the problem. If you're using the
> traditional calling syntax it would generate the same symbols.
[...]
> Yeah, it's usually all these D-specific compile-time features that are
> slowing down compilation.
>
> DWT and Tango are two good examples of large code bases where very few
> of these features are used; they're written in a more traditional
> style. They're at least 200k lines of code each and, IIRC, take around
> 10 seconds (or less) to compile, for a full build.

IIRC building Tango per package instead of all-at-once got the build time down to less than a second.

Atila
Re: Compilation times and idiomatic D code
On 2017-07-05 22:12, H. S. Teoh via Digitalmars-d wrote:
> Over time, what is considered "idiomatic D" has changed, and nowadays
> it seems to be leaning heavily towards range-based code with UFCS
> chains using std.algorithm and similar reusable pieces of code.

It's not UFCS per se that causes the problem. If you're using the traditional calling syntax it would generate the same symbols.

> D (well, DMD specifically) is famed for its lightning speed
> compilation times. So this left me wondering why my latest D project,
> a smallish codebase with only ~5000 lines of code, a good part of
> which are comments, takes about 11 seconds to compile.

Yeah, it's usually all these D-specific compile-time features that are slowing down compilation.

DWT and Tango are two good examples of large code bases where very few of these features are used; they're written in a more traditional style. They're at least 200k lines of code each and, IIRC, take around 10 seconds (or less) to compile, for a full build.

-- 
/Jacob Carlborg
Re: Compilation times and idiomatic D code
On Wednesday, July 05, 2017 13:12:40 H. S. Teoh via Digitalmars-d wrote:
[...]
> IOW, the very range-based idiom that has become one of the defining
> characteristics of modern D is negating the selling point of fast
> compilation.
>
> I vaguely remember there was talk about compressing symbols when they
> get too long... is there any hope of seeing this realized in the near
> future?

In this case, I think that it's more that Voldemort types are biting us than that ranges are biting us (though the Voldemort types are typically ranges, so in practice, using ranges ends up biting you).

https://issues.dlang.org/show_bug.cgi?id=15831

I think that there's a good argument that Voldemort types aren't actually a particularly good idea given these issues, but the work that's being done to compress symbols should at least reduce the problem.

- Jonathan M Davis
Re: Compilation times and idiomatic D code
On Wed, Jul 05, 2017 at 09:18:45PM +0000, jmh530 via Digitalmars-d wrote:
> On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
> >
> > Yes there is.
> > Rainer Schuetze is quite close to a solution. Which reduces the
> > symbol-name bloat significantly.
> > See https://github.com/dlang/dmd/pull/5855

That's very nice. Hope we will get this through sooner rather than later!

[...]
> A table in the comments [1] shows a significant reduction in bloat
> when compiling phobos unit tests. However, it shows a slight increase
> in build time. I would have expected a decrease. Any idea why that is?
>
> [1] https://github.com/dlang/dmd/pull/5855#issuecomment-310653542

The same comment points to a refactoring PR by Walter (dmd #6841). Not sure why that PR would interact with this one in this way.

In any case, I think the actual compilation times would depend on the details of the code. If you're using relatively shallow UFCS chains, like Phobos unittests tend to do, the compressed symbols probably won't give very much advantage over the cost of computing the compression. But if you have heavy usage of UFCS like in my code, this should cause significant speedups from not having to operate on 300KB symbols.

T

-- 
Help a man when he is in trouble and he will remember you when he is in trouble again.
Re: Compilation times and idiomatic D code
On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
> Yes there is.
> Rainer Schuetze is quite close to a solution. Which reduces the
> symbol-name bloat significantly.
> See https://github.com/dlang/dmd/pull/5855

A table in the comments [1] shows a significant reduction in bloat when compiling phobos unit tests. However, it shows a slight increase in build time. I would have expected a decrease. Any idea why that is?

[1] https://github.com/dlang/dmd/pull/5855#issuecomment-310653542
Re: Compilation times and idiomatic D code
On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:
> I vaguely remember there was talk about compressing symbols when they
> get too long... is there any hope of seeing this realized in the near
> future?

LDC has an experimental feature replacing long names by their hash; from ldc2 -help:

    ...
    -hash-threshold=<value> - Hash symbol names longer than this threshold (experimental)
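For example, a possible invocation might look like this (the threshold value and file name are arbitrary; the flag is experimental, so behavior may differ between LDC versions):

```shell
# Replace any mangled symbol name longer than 1000 characters with a hash.
# app.d is a placeholder for your own source file.
ldc2 -hash-threshold=1000 app.d
```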
Re: Compilation times and idiomatic D code
On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:
> I vaguely remember there was talk about compressing symbols when they
> get too long... is there any hope of seeing this realized in the near
> future?

Yes there is.
Rainer Schuetze is quite close to a solution. Which reduces the symbol-name bloat significantly.
See https://github.com/dlang/dmd/pull/5855

There is still a problem with the template system as a whole. Which I am working on in my spare time. And which will become my focus after newCTFE is done.
Compilation times and idiomatic D code
Over time, what is considered "idiomatic D" has changed, and nowadays it seems to be leaning heavily towards range-based code with UFCS chains using std.algorithm and similar reusable pieces of code.

D (well, DMD specifically) is famed for its lightning-fast compilation times. So this left me wondering why my latest D project, a smallish codebase with only ~5000 lines of code, a good part of which are comments, takes about 11 seconds to compile.

A first hint is that these meager 5000 lines of code compile to a 600MB executable. Well, large executables have been the plague of D since the beginning, but the reasoning has always been that hello-world examples don't really count, because the language offers the machinery for much more than that, and the idea is that as the code size grows, the "bloat to functionality" ratio decreases. But still... 600MB for 5000 lines of code seems a bit excessive. Especially when stripping symbols cut off about *half* of that size.

Which leads to the discovery, to my horror, that some very, VERY large symbols are being generated, including one that's almost 400KB long. Yes, almost 400KB for just ONE symbol. This particular symbol is the result of a long UFCS chain in the main program, and contains a lot of repeated elements, like myTemplate__lambdaXXX_myTemplateArguments__mapXXX__Result__myTemplateArguments and so on. Each additional member in the UFCS chain causes a repetition of all the previous members' return type names, plus the new type name, causing an O(n^2) explosion in symbol size. Worse yet, because the type name encoded in this monster symbol is a range, you have the same 300+KB of type name repeated for each of the range primitives, and for anything else this type name happens to be a template argument to. There's another related symbol that's 388944 characters long.
Not to mention all the range primitives (along with their similarly huge typenames) of all the smaller types contained within this monster typename.

Given this, it's no surprise that the compiler took 11 seconds to compile a 5000-line program. Just imagine how much time is spent generating these huge symbols, storing them in the symbol table, comparing them in symbol table lookups, writing them to the executable, etc. And we're not even talking about the other smaller, but still huge, symbols that are also present -- 100KB symbols, 50KB symbols, 10KB symbols, etc. And think about the impact of this on the compiler's memory footprint.

IOW, the very range-based idiom that has become one of the defining characteristics of modern D is negating the selling point of fast compilation.

I vaguely remember there was talk about compressing symbols when they get too long... is there any hope of seeing this realized in the near future?

T

-- 
War doesn't prove who's right, just who's left. -- BSD Games' Fortune
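The superlinear symbol growth described in this thread is easy to observe directly with a toy chain (actual lengths vary by compiler version and mangling scheme, so no specific numbers are claimed here):

```d
import std.algorithm : map;
import std.stdio : writeln;

void main()
{
    // Each stage's type embeds the full type of the previous stage as a
    // template argument, so the mangled type name grows with every link
    // in the chain.
    auto a = [1, 2, 3].map!(x => x + 1);
    auto b = a.map!(x => x * 2);
    auto c = b.map!(x => x - 3);

    writeln(typeof(a).mangleof.length);
    writeln(typeof(b).mangleof.length);
    writeln(typeof(c).mangleof.length);
}
```

Each successive length is larger than the last; with dozens of stages (and lambdas capturing context), the growth compounds into the multi-hundred-KB symbols described above.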