Re: Compilation times and idiomatic D code

2017-07-18 Thread Olivier FAURE via Digitalmars-d

On Monday, 17 July 2017 at 18:42:36 UTC, H. S. Teoh wrote:
Besides the symbol problem though, does the template 
instantiation explosion problem imply as many duplicated 
function bodies corresponding to the new type symbols?


That's a different, though somewhat related, issue.  I have 
some ideas on that front, but for now, Rainer's PR addresses 
only the problem of symbol length.  So the next task after his 
fix is merged is to tackle the larger problem of how to improve 
the implementation of templates.


[...]

Often, this results from templates that contain helper code 
that's independent of the template parameters, or dependent 
only on a subset of template parameters. For example:


Yeah, I think this is the second best optimization for symbol 
names, after using references for duplicate symbol subnames.


One thing you didn't mention is that this also creates a lot of 
useless duplicates for Voldemort types in template functions that 
don't depend on (all of) the function's parameters. For instance, 
looking at the code from the OP's issue:


auto s(T)(T t)
{
    struct Result
    {
        void foo() { throw new Exception("2"); }
    }
    return Result();
}

void main(string[] args)
{
    auto x = 1.s.s.s.s.s;
    x.foo;
}

foo mangles to a super long symbol, even though the actual symbol 
should be something like


moduleName.s!(__any_type)(__any_type).Result.foo

Actually, it's a little more complicated than that, because s 
might have overloads the compiler is not aware of, which means it 
doesn't trivially know that T doesn't matter to Result.


Still, pretty sure there are optimizations to be had there.

One possibility would be to allow the coder to give unique 
subnames to functions. For instance


int add.withInts(int n1, int n2)
float add.withFloats(float n1, float n2)

This would allow the compiler to assume that a function's 
mangling is unique, without having to worry about overloads and 
other templates, and therefore skip the unnecessary parameters in 
the mangling of the function's Voldemort type.


Re: Compilation times and idiomatic D code

2017-07-17 Thread H. S. Teoh via Digitalmars-d
On Sat, Jul 15, 2017 at 07:11:54PM +, Enamex via Digitalmars-d wrote:
[...]
> All that time I'd assumed that 'symbols' as linkers used them were
> constant length :T

They may have been once upon a time, many decades ago. :-D  But modern
linkers support symbols of arbitrary length (well, up to a limit :-D)
because of the demands of modern languages that generate long symbols as
part of implementing function overloading, etc..


> Just to be clear: 100% of that bloat resides in the symbol table?

Pretty much, yes.  Unless your program happens to want to print a string
constructed from the type name (very unlikely unless it's purely for
debugging purposes).


> So the current proposed remedy is to hash symbols above a length
> threshold?

Rainer's PR (which is now available as a dmd/druntime branch, btw)
changes the mangling scheme so that when a symbol contains substrings
that are repeated, the repeated instances are replaced with an encoded
back-reference. When substrings are long and/or repeated frequently,
this leads to a significant reduction in symbol size.
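
The idea can be illustrated with a toy sketch (Python, purely for
illustration -- this is not the actual D mangling format, and the
`Q<n>` reference notation here is made up): identifiers that repeat
inside a symbol are replaced with a reference to their first occurrence.

```python
import re

def compress(symbol):
    """Toy back-referencing: replace each repeated identifier with Q<n>,
    where n is the order in which the identifier was first seen."""
    seen = {}
    out = []
    # Split into identifiers, bare numbers, and punctuation runs.
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|\W+", symbol):
        if tok.isidentifier():
            if tok in seen:
                out.append("Q%d" % seen[tok])  # back-reference
                continue
            seen[tok] = len(seen)
        out.append(tok)
    return "".join(out)

sym = "FilterResult!(__lambda3, MapResult!(__lambda2, MapResult!(__lambda1, int[])))"
print(compress(sym))
# -> FilterResult!(__lambda3, MapResult!(__lambda2, Q2!(__lambda1, int[])))
```

The longer and more frequent the repeated substrings, the bigger the
win, which is why heavily nested UFCS chain types benefit so much.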

In fact, I just tested the `mangle` branch this morning on my code, and
it shows major improvements: my executable sizes are now down to 4MB,
about half the size of what I could achieve by type erasure via OO
polymorphism.  The longest symbol lengths are now about 1000 characters
or so, as opposed to 20KB (or, before I applied the horcrux trick,
800KB).  Compared to the original 600MB executable sizes before I
started tackling this issue, this represents a 99% improvement(!).

Now Rainer's changes are just waiting for various dmd testsuite issues
to be ironed out, and it should be available in git master soon. Hooray!


> Besides the symbol problem though, does the template instantiation
> explosion problem imply as many duplicated function bodies
> corresponding to the new type symbols?

That's a different, though somewhat related, issue.  I have some ideas
on that front, but for now, Rainer's PR addresses only the problem of
symbol length.  So the next task after his fix is merged is to tackle
the larger problem of how to improve the implementation of templates.

One of the major issues with heavily-templated code is that you could
potentially end up with many duplicated function bodies that may in fact
be binary-identical, but are stored separately in the executable because
from the language's POV, they are distinct functions.  (Well, there's
also the problem of separate compilation, where you could end up with
the *same* function instantiated multiple times in different object
files, but for the most part, LTO (link-time optimization) takes care of
that: the compiler writes out the duplicated instantiations as "weak
symbols", so the linker discards all but the first copy.)
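
The weak-symbol behavior can be sketched with a toy model (Python,
illustrative only -- real object files and symbol bindings are of course
far more involved; the symbol names below are made up):

```python
def link(object_files):
    """Toy linker: each object file maps symbol -> (binding, body).
    Duplicate weak symbols are discarded; the first copy wins."""
    image = {}
    for objfile in object_files:
        for sym, entry in objfile.items():
            binding, _body = entry
            if sym in image:
                if binding == "weak":
                    continue  # duplicate template instantiation: drop it
                raise ValueError("duplicate strong symbol: " + sym)
            image[sym] = entry
    return image

# Two modules that both happened to instantiate the same template:
obj1 = {"foo!int": ("weak", "codeA"), "main": ("strong", "entry")}
obj2 = {"foo!int": ("weak", "codeA"), "helper": ("strong", "codeB")}
image = link([obj1, obj2])
print(sorted(image))  # only one copy of foo!int survives
```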

Often, this results from templates that contain helper code that's
independent of the template parameters, or dependent only on a subset of
template parameters. For example:

struct S(T, size_t size) {
    T method1() {
        // N.B.: independent of `size`
        return T.init;
    }
    int[size] method2() {
        // N.B.: independent of `T`
        return int[size].init;
    }
    T[size] method3() {
        // N.B.: depends on both `T` and `size`
        return T[size].init;
    }
}

S!(int, 5) si5;
S!(int, 6) si6;
S!(string, 2) ss2;
S!(string, 4) ss4;

Here, si5.method1 and si6.method1 are identical, because method1 is
independent of `size`.  However, because they are part of the template
S, they have distinct names: `S!(int, 5).method1` and `S!(int,
6).method1`, so they will be duplicated in the generated code.  Ideally,
the compiler should detect that they are actually identical, and merge
them accordingly.
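
That kind of merging is essentially what linkers call identical code
folding (ICF). A hypothetical sketch (Python; the symbol names and byte
strings are made up for illustration):

```python
import hashlib

def fold_identical(functions):
    """ICF sketch: symbols whose generated code is byte-identical are
    folded onto one canonical copy."""
    canonical = {}  # body hash -> first symbol seen with that body
    alias = {}      # symbol -> symbol actually emitted for it
    for sym, body in functions.items():
        digest = hashlib.sha256(body).hexdigest()
        alias[sym] = canonical.setdefault(digest, sym)
    return alias

# Made-up bodies: method1 is independent of `size`, so both instantiations
# compile to the same bytes; method2's body differs.
funcs = {
    "S!(int,5).method1": b"\x31\xc0\xc3",
    "S!(int,6).method1": b"\x31\xc0\xc3",
    "S!(int,5).method2": b"\xb8\x05\x00\x00\x00\xc3",
}
alias = fold_identical(funcs)
print(alias["S!(int,6).method1"])  # folded onto S!(int,5).method1
```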

A similar situation occurs with method2. For method3, since it depends
on both template parameters, two versions of the function must be
generated, obviously.

Of course, this example is contrived, and rather trivial.  A less
trivial example shows that there's more to this problem than just
detecting which template parameters a method depends on:

struct List(T) {
    struct Node {
        Node* next;
        T t;
    }
    Node* removeNext(Node* prev) {
        Node* tmp = prev.next;
        prev.next = tmp.next;
        tmp.next = null;
        return tmp;
    }
    ...
}

Here, removeNext takes Node* as parameter, and since the definition of
Node depends on T, it would appear that removeNext also depends on T.
However, closer inspection shows that actually it never touches the
portion of Node that is dependent on T.  Therefore, removeNext will
actually be identical across all instantiations of List.
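
The layout argument can be checked concretely: `next` is the first field
of Node, so its offset is 0 no matter what T is. A quick illustration
using Python's ctypes as a stand-in for the D structs (assuming the same
first-field-at-offset-zero layout rule, which holds in both languages):

```python
import ctypes

def make_node(payload):
    """Lay out List!T's Node for a given payload type T (ctypes stand-in)."""
    class Node(ctypes.Structure):
        pass  # declared incomplete so Node* can refer to itself
    Node._fields_ = [("next", ctypes.POINTER(Node)), ("t", payload)]
    return Node

# `next` sits at offset 0 regardless of T, so code that only touches
# prev.next / tmp.next is identical across instantiations.
for payload in (ctypes.c_int, ctypes.c_double, ctypes.c_char * 32):
    assert make_node(payload).next.offset == 0
```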

Re: Compilation times and idiomatic D code

2017-07-17 Thread Steven Schveighoffer via Digitalmars-d

On 7/14/17 7:30 PM, H. S. Teoh via Digitalmars-d wrote:

On Fri, Jul 14, 2017 at 03:45:44PM -0700, H. S. Teoh via Digitalmars-d wrote:
[...]

Here's a further update to the saga of combating ridiculously large
symbol sizes.

[...]

 .wrapWithInterface // <--- type erasure happens here

[...]

Some further thoughts about type erasure and UFCS chains.

Nobody says type erasure *requires* OO -- that's just an artifact of the
way things are currently implemented.  Consider, for example, your
generic UFCS chain:

auto r = source
.ufcsFunc1
.ufcsFunc2
.ufcsFunc3
...
.ufcsFuncn;


From the POV of outside code, *nobody cares* what the specific types of
each stage of the UFCS pipeline are.  Only the code that implements
ufcsFunc1, ufcsFunc2, etc., need to know.  Furthermore, suppose X is the
specific type of the range that's returned by ufcsFunc3, in the middle
of the pipeline.  What are the chances this exact type is going to be
reused again later?  Very slim.  And if ufcsFunc2, say, takes an alias
parameter that's instantiated with a lambda function, you can basically
guarantee that this type X will *never* ever be repeated again, outside
of this specific UFCS chain.


I think this is slightly flawed thinking.

Types can certainly be repeated. Even long chains of UFCS functions may 
be repeated elsewhere. Therefore, you need to have these characteristics:


1. The compiler can generate the same symbol given the same compile-time 
parameters. Separate compilation dictates this is a necessity.
2. This CANNOT depend on position inside the module (i.e. no line 
numbers or global counters). It would be too surprising for reordering 
functions in a module to cause a linker error.
3. The mangled symbol should be reversible back to the original symbol name.
4. The symbol should make sense to the viewer.

Note that 3 and 4 are more "nice to have" than "essential", as you can 
certainly compile, link, and run without ever having to print a symbol name.


Thinking about this, I've had a couple of interesting thoughts. First, 
the place where this really matters (i.e. where you get really long 
symbol names) is symbols defined inside template functions. I don't 
need to rehash this, as you all know it.


But the definitions also depend on the runtime parameters, which 
ironically are usually a repeat of the template parameters. Hence the 
huge bloat.


But what if we forgo that requirement of using the runtime parameters? 
That is:


// prototypical UFCS wrapper
auto foo(Args...)(Args args)
{
  static struct Result { ... }
  return Result(args);
}

Instead of Result being typed as (e.g.):
  foo!(int, string, bool)(int, string, bool).Result

it really is
  foo!(int, string, bool).Result

What happens? In essence, this is the "horcrux" solution that I came up 
with. However, there is one case where it doesn't work. And that is, 
when you have foo overloaded only on runtime parameters:


auto foo(T)(T t) { struct Result { ... } ... }
auto foo(T)(T t, string s) { struct Result { ... } ... }

If we define Result the way I suggest, then both have the same name 
(foo!(int).Result) even though they are different structs.


If you use horcrux on this right now, it actually is a bug, as only the 
first Result definition is considered 
(https://issues.dlang.org/show_bug.cgi?id=17653)


But I would propose that we just make this an error. That is, you just 
have to rename the second one Result2, or something else.


1. This would break some code. Not much, but some rare examples. Most 
UFCS code has one definition of a function, or multiple definitions, but 
with different template parameters.


2. It would solve the exponential problem, as now only the template 
definition is considered, so the growth of the symbol is linear.


3. The savings from Rainer's patch still can be additive.
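
The exponential-vs-linear difference can be simulated with a toy model
(Python, illustrative only; the `s!(...)` naming below is schematic, not
real mangling): when the Voldemort type's name includes the runtime
parameters, each wrapping level embeds the previous type name twice, so
the length roughly doubles per level; without them, each level adds a
constant amount.

```python
def wrap(inner, with_runtime_params):
    """One UFCS call s(...) wrapping the previous Voldemort type name."""
    if with_runtime_params:
        # foo!(T)(T).Result style: the argument type appears twice
        return "s!(%s)(%s).Result" % (inner, inner)
    # foo!(T).Result style: the argument type appears once
    return "s!(%s).Result" % inner

def chain_length(depth, with_runtime_params):
    t = "int"
    for _ in range(depth):
        t = wrap(t, with_runtime_params)
    return len(t)

print(chain_length(8, True), chain_length(8, False))  # exponential vs linear
```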

Thoughts?

-Steve


Re: Compilation times and idiomatic D code

2017-07-17 Thread Stefan Koch via Digitalmars-d

On Monday, 17 July 2017 at 12:51:37 UTC, jmh530 wrote:
On Saturday, 15 July 2017 at 15:58:12 UTC, Jonathan M Davis 
wrote:


int[] a;
auto b = a.map!(a => a / 2)();
pragma(msg, typeof(b));

then it prints out

MapResult!(__lambda1, int[])

If you have

int[] a;
auto b = a.map!(a => a / 2)().map!(a => a)();
pragma(msg, typeof(b));

then it prints out

MapResult!(__lambda2, MapResult!(__lambda1, int[]))

If you have

int[] a;
auto b = a.map!(a => a / 2)().map!(a => a)().filter!(a => a < 7)();
pragma(msg, typeof(b));

then it prints out

FilterResult!(__lambda3, MapResult!(__lambda2, MapResult!(__lambda1, 
int[])))



Is there any way - theoretically - to compute typeof(b) lazily, 
so that the information is only provided as needed?


typeof(b) is needed as soon as b is declared.
So no.
There is no point in making it lazy.


Re: Compilation times and idiomatic D code

2017-07-17 Thread jmh530 via Digitalmars-d

On Saturday, 15 July 2017 at 15:58:12 UTC, Jonathan M Davis wrote:


int[] a;
auto b = a.map!(a => a / 2)();
pragma(msg, typeof(b));

then it prints out

MapResult!(__lambda1, int[])

If you have

int[] a;
auto b = a.map!(a => a / 2)().map!(a => a)();
pragma(msg, typeof(b));

then it prints out

MapResult!(__lambda2, MapResult!(__lambda1, int[]))

If you have

int[] a;
auto b = a.map!(a => a / 2)().map!(a => a)().filter!(a => a < 7)();
pragma(msg, typeof(b));

then it prints out

FilterResult!(__lambda3, MapResult!(__lambda2, MapResult!(__lambda1, 
int[])))



Is there any way - theoretically - to compute typeof(b) lazily, 
so that the information is only provided as needed?


Re: Compilation times and idiomatic D code

2017-07-15 Thread Enamex via Digitalmars-d

On Saturday, 15 July 2017 at 15:58:12 UTC, Jonathan M Davis wrote:
On Saturday, July 15, 2017 11:10:32 Enamex via Digitalmars-d 
wrote:

[...]
The issue has to do with how each invocation of a range-based 
function tends to result in a new template instantiation, and 
it's common practice in D to chain a bunch of templated 
function calls together. For instance, if you have


[...]

The type is getting progressively longer and more complicated, 
because when the function template is instantiated, it's 
instantiated with the type from the previous function call, so 
it's wrapping the previous type, and you get a new type that 
has the name of the type of its argument embedded in it. It's 
like if you keep wrapping a container inside another container.


[...]

- Jonathan M Davis


This was quite insightful, thank you.

All that time I'd assumed that 'symbols' as linkers used them 
were constant length :T


Just to be clear: 100% of that bloat resides in the symbol table? 
So the current proposed remedy is to hash symbols above a length 
threshold?


Besides the symbol problem though, does the template 
instantiation explosion problem imply as many duplicated function 
bodies corresponding to the new type symbols?


Re: Compilation times and idiomatic D code

2017-07-15 Thread Jonathan M Davis via Digitalmars-d
On Saturday, July 15, 2017 11:10:32 Enamex via Digitalmars-d wrote:
> - What type information are being kept because of UFCS chains?
> Doesn't that mechanism simply apply overload resolution then
> choose between the prefix and .method forms as appropriate,
> rewriting the terms?
>  Then it's a problem of function invocation. I don't get
> what's happening here still. Does this tie to the Voldemort types
> problem? (=> are many of the functions in the chain returning
> custom types?) It doesn't make sense, especially if, from your
> indirection workaround, it looks like it would work around the
> same but without the bloat problem if we unlinked the chain into
> many intermediate temporary parts. So how is this a type
> information issue?

UFCS is irrelevant to all of this. That just so happens to be how he's
calling the functions. You'd get the same issue if you called all of the
functions in a chain with the traditional function call syntax rather than
treating them as a bunch of member functions. The issue has to do with how
each invocation of a range-based function tends to result in a new template
instantiation, and it's common practice in D to chain a bunch of templated
function calls together. For instance, if you have

int[] a;
auto b = a.map!(a => a / 2)();
pragma(msg, typeof(b));

then it prints out

MapResult!(__lambda1, int[])

If you have

int[] a;
auto b = a.map!(a => a / 2)().map!(a => a)();
pragma(msg, typeof(b));

then it prints out

MapResult!(__lambda2, MapResult!(__lambda1, int[]))

If you have

int[] a;
auto b = a.map!(a => a / 2)().map!(a => a)().filter!(a => a < 7)();
pragma(msg, typeof(b));

then it prints out

FilterResult!(__lambda3, MapResult!(__lambda2, MapResult!(__lambda1, 
int[])))

The type is getting progressively longer and more complicated, because when
the function template is instantiated, it's instantiated with the type from
the previous function call, so it's wrapping the previous type, and you get
a new type that has the name of the type of its argument embedded in it.
It's like if you keep wrapping a container inside another container.

Array!int a;
Array!(Array!int) b;
Array!(Array!(Array!int)) c;

The first level or two may not be so bad, but pretty quickly, it gets quite
ridiculous. The fact that we use auto and/or the name of the template
parameter in range-based code completely hides the fact that the type we're
operating on has a very long and hideous name. So, for the most part, you
don't care, but you do get ridiculously long symbol names underneath the
hood, and the compiler and linker have to deal with them, and it becomes a
performance problem. And as bad as this example is, map doesn't actually use
Voldemort types. MapResult is declared outside of map as a private struct.
The extra information that gets encoded in the symbol name with a Voldemort
type makes them _much_ worse, though interestingly enough, the compiler
doesn't print all of that out. e.g.

int[][] a;
auto b = a.joiner([5]);
pragma(msg, typeof(b));

just prints

Result

which doesn't look bad even though things are apparently much uglier
underneath the hood.
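
The progressive wrapping can be made concrete with a small simulation
(Python, purely illustrative) that builds the nested MapResult names the
same way chained map calls do:

```python
def nest(depth):
    """Build the type name produced by `depth` chained map calls over int[]."""
    t = "int[]"
    for i in range(1, depth + 1):
        t = "MapResult!(__lambda%d, %s)" % (i, t)
    return t

# Each level embeds the full previous name, so the name keeps growing
# with the length of the chain.
print(nest(3))
# -> MapResult!(__lambda3, MapResult!(__lambda2, MapResult!(__lambda1, int[])))
```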

But the fact that working with ranges results in a whole bunch of chained
invocations of function templates tends to result in much larger symbol
sizes than you'd get with code that wasn't so generic. And the fact that
Voldemort types have been pushed as a great way to encapsulate the range
types that result from function calls and really shouldn't be used by name
makes the situation that much worse. So, idiomatic D code currently results
in very large symbol names, and while in many cases, you don't notice, some
people (like H.S. Teoh) _are_ noticing, and it's becoming a problem. As I
understand it, Weka has quite a few problems with this. Fortunately, a fix
to condense symbol names should be along shortly, which will help the
situation considerably, but in general, we really need to look at how we can
make some of these common D idioms do better in terms of symbol size.

- Jonathan M Davis



Re: Compilation times and idiomatic D code

2017-07-15 Thread Stefan Koch via Digitalmars-d

On Saturday, 15 July 2017 at 11:10:32 UTC, Enamex wrote:

On Friday, 14 July 2017 at 22:45:44 UTC, H. S. Teoh wrote:

[...]


I have some stupid questions:

- What does everyone mean when they say 'symbol' here? I'm 
probably misunderstanding symbols gravely or it's something 
that DMD in particular handles in a strange way.


[...]


A symbol is anything which has an address and can (or could) be 
picked up by the linker.
In other words, a symbol is anything which has a name (or could 
have a name).


Re: Compilation times and idiomatic D code

2017-07-15 Thread Enamex via Digitalmars-d

On Friday, 14 July 2017 at 22:45:44 UTC, H. S. Teoh wrote:
Here's a further update to the saga of combating ridiculously 
large symbol sizes.


So yesterday I wrote a new module that also heavily uses UFCS 
chains. My initial draft of the module, once I linked it with 
the main program, particularly with a UFCS chain that has led 
to the 600MB executable sizes seen above, caused another 
explosion in symbol size that actually managed to reach 100MB 
in *one* symbol, triggering a DMD termination complaining about 
possible infinite template recursion. :-D  Funnier still, 
temporarily simplifying part of the chain to bring symbol sizes 
down, I eventually got it below 100MB but ended up with linker 
segfaults and ELF errors because the huge symbol was too 
ridiculously huge.


Eventually, it drove me to refactor two Phobos functions that 
are used heavily in my code: std.range.chunks and 
std.algorithm.joiner, using the same "horcrux" technique (see 
Phobos PRs #5610 and #5611). This, together with some further 
refactoring in my own code, eventually brought things down to 
the 20MB range of executable sizes.


Then an idea occurred to me: the reason these symbol sizes got 
so large, was because the UFCS chain preserves *all* type 
information about every component type used to build the final 
type. So, by necessity, the type name has to somehow encode all 
of this information in an unambiguous way.  Now, arguably, 
DMD's current mangling scheme is at fault because it contains 
too many repeating components, but even if you disregard that, 
the fact remains that if you have 50+ components in your 
overall UFCS chain, the symbol length cannot be less than 50*n 
where n is the average length of a single component's type 
name, plus some necessary overhead to account for the mangling 
scheme syntax. Let's say n is on average 20-25 characters, say 
round it up to 35 for mangling syntax, so you're still looking 
at upwards of 1700+ characters *minimum*.  That, coupled with 
the current O(n^2) / O(n^3) mangling scheme, you easily reach 
megabytes of symbol length.  We can compress the symbols all we 
want, but there's a limit as to how much compression will help. 
At the end of the day, you still have to represent those 50+ 
components *somehow*.


But what if most of this type information is actually 
*unnecessary*?  To use a range example, if all you care about 
at the end of the chain is that it's a forward range of ubyte, 
then why even bother with multi-MB symbols encoding type 
information that's never actually used?  Maybe a little 
type-erasure would help, via hiding those 50+ component types 
behind an opaque runtime OO polymorphic interface.  Phobos does 
have the facilities of this, in the form of the InputRange, 
ForwardRange, etc., interfaces in std.range.interfaces.  In my 
case, however, part of the chain uses another generic type (a 
kind of generalization of 2D arrays). But either way, the idea 
is simple: at the end of the UFCS chain, wrap it in a class 
object that inherits from a generic interface that encodes only 
the primitives of the 2D array concept, and the element type. 
The class object itself, of course, still must retain knowledge 
of the full-fledged type, but the trick is that if we insert 
this type erasure in the middle of the chain, then later 
components don't have to encode the type names of earlier 
components anymore.


T


I have some stupid questions:

- What does everyone mean when they say 'symbol' here? I'm 
probably misunderstanding symbols gravely or it's something that 
DMD in particular handles in a strange way.


- What type information are being kept because of UFCS chains? 
Doesn't that mechanism simply apply overload resolution then 
choose between the prefix and .method forms as appropriate, 
rewriting the terms?
Then it's a problem of function invocation. I don't get 
what's happening here still. Does this tie to the Voldemort types 
problem? (=> are many of the functions in the chain returning 
custom types?) It doesn't make sense, especially if, from your 
indirection workaround, it looks like it would work around the 
same but without the bloat problem if we unlinked the chain into 
many intermediate temporary parts. So how is this a type 
information issue?


Thanks!


Re: Compilation times and idiomatic D code

2017-07-14 Thread Seb via Digitalmars-d

On Friday, 14 July 2017 at 22:45:44 UTC, H. S. Teoh wrote:

Result: 8MB executable size (!).

Of course, this comes at a runtime cost of an added level of 
indirection (now you have to call a virtual function through 
the interface to integrate with earlier components in the 
pipeline).  But at a 60% reduction in executable size, I think 
it's a worthwhile tradeoff.  :-)


That's a very interesting idea!
However, I wouldn't want to insert this into code manually; maybe 
simple opt-in flag support in DMD that detects UFCS pipes with X 
components and then inserts the type-erasure hack could solve 
most problems for the time being? That way it wouldn't create a 
WTF moment when explaining D code, and when Stefan or someone else 
manages to implement something more sophisticated, no code would 
need an update.





Re: Compilation times and idiomatic D code

2017-07-14 Thread H. S. Teoh via Digitalmars-d
On Fri, Jul 14, 2017 at 03:45:44PM -0700, H. S. Teoh via Digitalmars-d wrote:
[...]
> Here's a further update to the saga of combating ridiculously large
> symbol sizes.
[...]
>.wrapWithInterface // <--- type erasure happens here
[...]

Some further thoughts about type erasure and UFCS chains.

Nobody says type erasure *requires* OO -- that's just an artifact of the
way things are currently implemented.  Consider, for example, your
generic UFCS chain:

auto r = source
.ufcsFunc1
.ufcsFunc2
.ufcsFunc3
...
.ufcsFuncn;

From the POV of outside code, *nobody cares* what the specific types of
each stage of the UFCS pipeline are.  Only the code that implements
ufcsFunc1, ufcsFunc2, etc., need to know.  Furthermore, suppose X is the
specific type of the range that's returned by ufcsFunc3, in the middle
of the pipeline.  What are the chances this exact type is going to be
reused again later?  Very slim.  And if ufcsFunc2, say, takes an alias
parameter that's instantiated with a lambda function, you can basically
guarantee that this type X will *never* ever be repeated again, outside
of this specific UFCS chain.

And at the end of the chain, typeof(r) really only needs to encode the
fact that it implements the API returned by ufcsFuncn, e.g., the range
methods, element type, etc.. But nobody will care how these API
functions are implemented under the hood -- as far as outside code is
concerned, typeof(r) might as well be an opaque serial number, and we'll
be able to find the instantiated API functions that implement the
methods of r.  IOW, the explicit types of all intermediate stages of the
UFCS chain are irrelevant to the outside world.  Since they are template
functions (most of the time) anyway, the compiler might as well just
glue all their code together, and put it inside functions named like
funcName0012345XYZ where the 0012345XYZ is some random unique string,
basically a serial number, that identifies it as a method of r.

For that matter, typeof(r) might as well be ufcsChain0012345XYZ, since
nobody outside the pipeline itself will care what its component types
are.

So basically, the compiler could just automatically type-erase the type
of such chains, given that (1) none of the UFCS functions "escape"
knowledge about an intermediate type to the outside world, (2) the chain
is essentially unique to the current function (esp. if one or more
components of the chain has an alias parameter bound to a lambda
argument), (3) most UFCS functions return opaque types anyway -- very
few (i.e., almost none) Phobos algorithms, for example, allow you to
recover the type of the previous component of the chain -- so it's not
relevant what the specific types of the previous components are, and (4)
therefore the outside world couldn't care less about the specific types
that compose typeof(r).  It could be as simple as bumping a global
counter and suffixing it to some compiler-reserved prefix, to make type
names like __ufcsType_0123.  This will be far shorter than any
symbol compression applied to present-day symbols!
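
A sketch of the naming side of this idea (Python; the `__ufcsType`
prefix is taken from the post above, everything else -- the function
name, the module argument, the zero-padding -- is made up here):

```python
import itertools

_counter = itertools.count()

def erase_chain_type(module):
    """Hypothetical compiler step: name a UFCS chain's result type with an
    opaque, module-qualified serial number instead of its composed name."""
    return "%s.__ufcsType_%04d" % (module, next(_counter))

a = erase_chain_type("mymod")
b = erase_chain_type("mymod")
print(a, b)  # short, unique names regardless of chain length
```

Note that a counter-based scheme is exactly what Steve's later reply
argues against for separately-compiled code, so treat this purely as an
illustration of how short such names could be.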

To summarize, here's a scheme for handling template expansion in UFCS
chains that will produce extremely short symbols, *independently* of how
long your UFCS chains are.  If the compiler sees a UFCS chain with these
characteristics:

1) There are ≥ n components, for some arbitrary value of n (say n=5 or
n=10, or something like that, so that we don't bother with trivial cases
where the full type may actually be important);

2) Each component of the chain is a template function (this one is
almost a given);

3) At least one component guarantees a unique instantiation, e.g., an
alias parameter bound to a lambda function;

then the compiler will:

a) Generate a unique serial number (package.module + global counter
should be good enough);

b) Assign a unique typename to the final return value, constructed using
this serial number;

c) Essentially inline all template bodies of previous stages of the
pipeline into the last one, using the same serial number to generate
names for the aggregated symbols of the final stage.

This way, those ridiculously huge symbols aren't even generated to begin
with, thereby alleviating the need for convoluted compression schemes.


T

-- 
I am not young enough to know everything. -- Oscar Wilde


Re: Compilation times and idiomatic D code

2017-07-14 Thread Stefan Koch via Digitalmars-d

On Friday, 14 July 2017 at 22:45:44 UTC, H. S. Teoh wrote:
On Thu, Jul 13, 2017 at 03:27:31PM -0700, H. S. Teoh via 
Digitalmars-d wrote:

[...]

[...]

Here's a further update to the saga of combating ridiculously 
large symbol sizes.


[...]


You will be excited to hear that my template work will fix the 
described problem as a side effect.
Without incurring run-time overhead. Or forcing the usage of OO 
wrappers.
The downside however is that the scheme I have in mind is very 
complex and might take over a year to implement.


Re: Compilation times and idiomatic D code

2017-07-14 Thread H. S. Teoh via Digitalmars-d
On Thu, Jul 13, 2017 at 03:27:31PM -0700, H. S. Teoh via Digitalmars-d wrote:
> On Thu, Jul 13, 2017 at 04:16:50PM -0400, Steven Schveighoffer via 
> Digitalmars-d wrote:
> [...]
> > http://www.schveiguy.com/blog/2016/05/have-your-voldemort-types-and-keep-your-disk-space-too/
> [...]
> 
> Whoa.  I just tried your 'horcrux' trick, and discovered that it
> causes a further reduction in executable size!
> 
> Original sizes:
>   38144296 bytes (38.1 MB)
>   61412056 bytes (61.4 MB)
> 
> After factoring out Voldemort structs as module-global private structs:
> 
>8332352 bytes (8.3 MB)
>   10376584 bytes (10.4 MB)
> 
> After further refactoring to use the "horcrux" trick:
>8147392 bytes (8.1 MB)
>   10124136 bytes (10.1 MB)
[...]

Here's a further update to the saga of combating ridiculously large
symbol sizes.

So yesterday I wrote a new module that also heavily uses UFCS chains.
My initial draft of the module, once I linked it with the main program,
particularly with a UFCS chain that has led to the 600MB executable
sizes seen above, caused another explosion in symbol size that actually
managed to reach 100MB in *one* symbol, triggering a DMD termination
complaining about possible infinite template recursion. :-D  Funnier
still, temporarily simplifying part of the chain to bring symbol sizes
down, I eventually got it below 100MB but ended up with linker segfaults
and ELF errors because the huge symbol was too ridiculously huge.

Eventually, it drove me to refactor two Phobos functions that are used
heavily in my code: std.range.chunks and std.algorithm.joiner, using the
same "horcrux" technique (see Phobos PRs #5610 and #5611). This,
together with some further refactoring in my own code, eventually
brought things down to the 20MB range of executable sizes.

Then an idea occurred to me: the reason these symbol sizes got so large,
was because the UFCS chain preserves *all* type information about every
component type used to build the final type. So, by necessity, the type
name has to somehow encode all of this information in an unambiguous
way.  Now, arguably, DMD's current mangling scheme is at fault because
it contains too many repeating components, but even if you disregard
that, the fact remains that if you have 50+ components in your overall
UFCS chain, the symbol length cannot be less than 50*n where n is the
average length of a single component's type name, plus some necessary
overhead to account for the mangling scheme syntax. Let's say n is on
average 20-25 characters, say round it up to 35 for mangling syntax, so
you're still looking at upwards of 1700+ characters *minimum*.  That,
coupled with the current O(n^2) / O(n^3) mangling scheme, you easily
reach megabytes of symbol length.  We can compress the symbols all we
want, but there's a limit as to how much compression will help. At the
end of the day, you still have to represent those 50+ components
*somehow*.

But what if most of this type information is actually *unnecessary*?  To
use a range example, if all you care about at the end of the chain is
that it's a forward range of ubyte, then why even bother with multi-MB
symbols encoding type information that's never actually used?  Maybe a
little type-erasure would help, via hiding those 50+ component types
behind an opaque runtime OO polymorphic interface.  Phobos has the
facilities for this, in the form of the InputRange, ForwardRange, etc.,
interfaces in std.range.interfaces.  In my case, however, part of the
chain uses another generic type (a kind of generalization of 2D arrays).
But either way, the idea is simple: at the end of the UFCS chain, wrap
it in a class object that inherits from a generic interface that encodes
only the primitives of the 2D array concept, and the element type. The
class object itself, of course, still must retain knowledge of the
full-fledged type, but the trick is that if we insert this type erasure
in the middle of the chain, then later components don't have to encode
the type names of earlier components anymore.
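In range terms, the idea looks like this: a sketch using Phobos' std.range.interfaces wrappers (the actual code described above uses an analogous custom 2D-array interface instead).

```d
import std.algorithm : filter, map;
import std.range : iota;
import std.range.interfaces : InputRange, inputRangeObject;

void main()
{
    // Long UFCS chain: the static type encodes every component.
    auto chain = iota(100)
        .map!(a => a * 2)
        .filter!(a => a % 3 == 0)
        .map!(a => a + 1);

    // Erase the chain behind a runtime interface.  Downstream code now
    // only ever sees "InputRange!int", no matter how the chain was built.
    InputRange!int erased = inputRangeObject(chain);

    // Later pipeline stages mangle against the short interface type,
    // not the full upstream chain.
    auto result = erased.filter!(a => a > 10);
    assert(result.front == 13);
}
```

The class returned by `inputRangeObject` still knows the full chain type internally, but symbols generated downstream of the interface no longer repeat it.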

So, tl;dr:

// Before:
auto r = input
 .ufcsFunc1
 .ufcsFunc2
 .map!(e => e.ufcsFunc3.ufcsFunc4 ...)
 ...
 .ufcsFuncn
 .flattenToRange
 .rangeFunc1
 .rangeFunc2
 ...;

Result: 20MB executable size (600MB before the "horcrux" technique was
applied).

// After:
input.ufcsFunc1
 .ufcsFunc2
 .map!(e => e.ufcsFunc3.ufcsFunc4 ...)
 ...
 .ufcsFuncn
 .wrapWithInterface // <--- type erasure happens here
 .flattenToRange
 .rangeFunc1
 .rangeFunc2
 ...;

Result: 8MB executable size (!).

Of course, this comes at a runtime cost of an added level of indirection
(now you have to call a virtual function through the interface to
integrate with earlier components in the pipeline).  But at a 

Re: Compilation times and idiomatic D code

2017-07-13 Thread H. S. Teoh via Digitalmars-d
On Thu, Jul 13, 2017 at 04:16:50PM -0400, Steven Schveighoffer via 
Digitalmars-d wrote:
[...]
> http://www.schveiguy.com/blog/2016/05/have-your-voldemort-types-and-keep-your-disk-space-too/
> 
> I was surprised as well at the reduction. I only noticed the problem
> because when I was debugging my library, the program was taking an
> unreasonably long time to *print a stack trace* (like multiple
> seconds).
[...]

Whoa.  I just tried your 'horcrux' trick, and discovered that it causes
a further reduction in executable size!

Original sizes:
38144296 bytes (38.1 MB)
61412056 bytes (61.4 MB)

After factoring out Voldemort structs as module-global private structs:

 8332352 bytes (8.3 MB)
10376584 bytes (10.4 MB)

After further refactoring to use the "horcrux" trick:
 8147392 bytes (8.1 MB)
10124136 bytes (10.1 MB)

While this last reduction is rather modest relative to the total
executable sizes, it still represents a reduction of 200KB and 300KB
worth of symbols(!) each.  Given that this code base is a mere 5000+
LOC, I'd expect significant savings in larger template-heavy projects.


My guess as to the reason for this reduction is that now you only have
to instantiate one template per template function call, rather than
two:

private struct MyFuncImpl(Args...) { ... }
auto myFunc(Args...)(Args args) { return MyFuncImpl!Args(args); }

will instantiate two templates, MyFuncImpl and myFunc, each time with
the same argument list; whereas:

template myFunc(Args...)
{
auto myFunc(Args args) { return Impl(args); }
struct Impl { ... }
}

only requires a single template instantiation of myFunc, since Impl is
instantiated along with the function itself.

There's also the slightly shorter names involved (`myFunc!Args.Impl` vs.
`MyFuncImpl!Args.MyFuncImpl`), which, given the O(n^2) or O(n^3) symbol
length dependency, can quickly add up to non-trivial amounts of savings.


T

-- 
Those who don't understand D are condemned to reinvent it, poorly. -- Daniel N


Re: Compilation times and idiomatic D code

2017-07-13 Thread Steven Schveighoffer via Digitalmars-d

On 7/13/17 3:23 PM, H. S. Teoh via Digitalmars-d wrote:

Today, I decided to sit down and refactor my code.  I've heard tell that
Voldemort types tend to cause an explosion in symbol size, and today I
thought it'd be interesting to see just how big of a difference this
actually makes.

I could not believe my eyes.

Before the refactoring, my executable sizes (I have 2 executables in
this codebase) were 37.3MB and 60.0MB, respectively.

After the refactoring, my executable sizes were, respectively, 8.2MB and
10.2MB. That's a 5x and 6x reduction in executable size, respectively.
Whoa.


Yep.

http://www.schveiguy.com/blog/2016/05/have-your-voldemort-types-and-keep-your-disk-space-too/

I was surprised as well at the reduction. I only noticed the problem 
because when I was debugging my library, the program was taking an 
unreasonably long time to *print a stack trace* (like multiple seconds).


-Steve


Re: Compilation times and idiomatic D code

2017-07-13 Thread H. S. Teoh via Digitalmars-d
On Wed, Jul 05, 2017 at 09:45:55PM -0600, Jonathan M Davis via Digitalmars-d 
wrote:
[...]
> In this case, I think that it's more that Voldemort types are biting
> us than that ranges are biting us (though the Voldemort types are
> typically ranges, so in practice, using ranges ends up biting you).
> 
> https://issues.dlang.org/show_bug.cgi?id=15831
> 
> I think that there's a good argument that Voldemort types aren't
> actually a particularly good idea given these issues, but the work
> that's being done to compress symbols should at least reduce the
> problem.
[...]

Here's the latest update on this bit of drama:

Today, I decided to sit down and refactor my code.  I've heard tell that
Voldemort types tend to cause an explosion in symbol size, and today I
thought it'd be interesting to see just how big of a difference this
actually makes.

I could not believe my eyes.

Before the refactoring, my executable sizes (I have 2 executables in
this codebase) were 37.3MB and 60.0MB, respectively.

After the refactoring, my executable sizes were, respectively, 8.2MB and
10.2MB. That's a 5x and 6x reduction in executable size, respectively.
Whoa.

And all I did was to move Voldemort structs out into module-level
private structs. I.e., this:

auto myFunc(Args...)(Args args) {
static struct Result {
...
}
return Result(args);
}

became this:

private struct MyFuncImpl(Args...)
{
...
}

auto myFunc(Args...)(Args args) {
return MyFuncImpl!Args(args);
}

And you get an instant 5x to 6x reduction in executable size.

Now mind you, the ridiculous symbol size problem still exists; the
executable still contains some huge symbols.  The difference is that now
the longest symbol is "only" 86KB, whereas before the refactoring, the
longest symbol was 777KB (!).  So the problem is less severe, but still
present.  So symbol compression is still a very important part of the
eventual solution.


T

-- 
Windows: the ultimate triumph of marketing over technology. -- Adrian von Bidder


Re: Compilation times and idiomatic D code

2017-07-07 Thread H. S. Teoh via Digitalmars-d
On Fri, Jul 07, 2017 at 09:32:24AM -0400, Steven Schveighoffer via 
Digitalmars-d wrote:
> On 7/5/17 5:24 PM, H. S. Teoh via Digitalmars-d wrote:
> > On Wed, Jul 05, 2017 at 09:18:45PM +, jmh530 via Digitalmars-d wrote:
> > > On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
[...]
> > > > Rainer Schuetze is quite close to a solution. Which reduces the
> > > > symbol-name bloat significantly.
> > > > See https://github.com/dlang/dmd/pull/5855
[...]
> I'm super-psyched this has moved from "proof of concept" to ready for
> review. Kudos to Rainer for his work on this! Has been a PITA for a
> while:
> 
> https://issues.dlang.org/show_bug.cgi?id=15831
> https://forum.dlang.org/post/n96k3g$ka5$1...@digitalmars.com

Yes, kudos to Rainer for making this a (near) reality!


[...]
> I have found that the linker gets REALLY slow when the symbols get
> large. So it's not necessarily the compiler that's slow for this.
[...]

True, I didn't profile the compiler carefully to discern whether it was
the compiler that's slow, or the linker.  But either way, having smaller
symbols will benefit both.


T

-- 
Freedom: (n.) Man's self-given right to be enslaved by his own depravity.


Re: Compilation times and idiomatic D code

2017-07-07 Thread Steven Schveighoffer via Digitalmars-d

On 7/5/17 5:24 PM, H. S. Teoh via Digitalmars-d wrote:

On Wed, Jul 05, 2017 at 09:18:45PM +, jmh530 via Digitalmars-d wrote:

On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:


Yes there is.
Rainer Schuetze is quite close to a solution. Which reduces the
symbol-name bloat significantly.
See https://github.com/dlang/dmd/pull/5855


That's very nice.  Hope we will get this through sooner rather than
later!


I'm super-psyched this has moved from "proof of concept" to ready for 
review. Kudos to Rainer for his work on this! Has been a PITA for a while:


https://issues.dlang.org/show_bug.cgi?id=15831
https://forum.dlang.org/post/n96k3g$ka5$1...@digitalmars.com


In any case, I think the actual compilation times would depend on the
details of the code.  If you're using relatively shallow UFCS chains,
like Phobos unittests tend to do, probably the compressed symbols won't
give very much advantage over the cost of computing the compression.
But if you have heavy usage of UFCS like in my code, this should cause
significant speedups from not having to operate on 300KB large symbols.


I have found that the linker gets REALLY slow when the symbols get 
large. So it's not necessarily the compiler that's slow for this.


-Steve


Re: Compilation times and idiomatic D code

2017-07-06 Thread John Colvin via Digitalmars-d

On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:

On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:


I vaguely remember there was talk about compressing symbols 
when they get too long... is there any hope of seeing this 
realized in the near future?


Yes there is.
Rainer Schuetze is quite close to a solution. Which reduces the 
symbol-name bloat significantly.

See https://github.com/dlang/dmd/pull/5855


There is still a problem with the template system as a whole.
Which I am working on in my spare time.
And which will become my focus after newCTFE is done.


Please give consent for the D Foundation to clone you.


Re: Compilation times and idiomatic D code

2017-07-06 Thread H. S. Teoh via Digitalmars-d
On Thu, Jul 06, 2017 at 01:32:04PM +, Atila Neves via Digitalmars-d wrote:
> On Thursday, 6 July 2017 at 12:00:29 UTC, Jacob Carlborg wrote:
[...]
> > Yeah, it's usually all these D specific compile time features that
> > is slowing down compilation.
> > 
> > DWT and Tango are two good examples of large code bases where very
> > few of these features are used, they're written in a more
> > traditional style.  They're at least 200k lines of code each and,
> > IIRC, takes around 10 seconds (or less) to compile, for a full
> > build.
> 
> IIRC building Tango per package instead of all-at-once got the build
> time down to less than a second.
[...]

Well, obviously D's famed compilation speed must still be applicable
*somewhere*, otherwise we'd be hearing loud complaints. :-D

My point was that D's compile-time features, which are a big draw to me
personally, and also becoming a selling point of D, need improvement in
this area.

I'm very happy to be pointed to Rainer's PR that implements symbol
backreferencing compression.  Apparently it has successfully compressed
the largest symbol generated by Phobos unittests from 30KB (or something
like that) down to about 1100 characters, which, though still on the
large side, is much more reasonable than the present situation.  I hope
this PR will get merged in the near future.


T

-- 
Making non-nullable pointers is just plugging one hole in a cheese grater. -- 
Walter Bright


Re: Compilation times and idiomatic D code

2017-07-06 Thread Atila Neves via Digitalmars-d

On Thursday, 6 July 2017 at 12:00:29 UTC, Jacob Carlborg wrote:

On 2017-07-05 22:12, H. S. Teoh via Digitalmars-d wrote:

[...]


It's not UFCS per se that causes the problem. If you're using 
the traditional calling syntax it would generate the same 
symbols.



[...]


Yeah, it's usually all these D-specific compile-time features 
that are slowing down compilation.


DWT and Tango are two good examples of large code bases where 
very few of these features are used; they're written in a more 
traditional style. They're at least 200k lines of code each 
and, IIRC, take around 10 seconds (or less) to compile for a 
full build.


IIRC building Tango per package instead of all-at-once got the 
build time down to less than a second.


Atila


Re: Compilation times and idiomatic D code

2017-07-06 Thread Jacob Carlborg via Digitalmars-d

On 2017-07-05 22:12, H. S. Teoh via Digitalmars-d wrote:

Over time, what is considered "idiomatic D" has changed, and nowadays it
seems to be leaning heavily towards range-based code with UFCS chains
using std.algorithm and similar reusable pieces of code.


It's not UFCS per se that causes the problem. If you're using the 
traditional calling syntax it would generate the same symbols.



D (well, DMD specifically) is famed for its lightning speed compilation
times.

So this left me wondering why my latest D project, a smallish codebase
with only ~5000 lines of code, a good part of which are comments, takes
about 11 seconds to compile.


Yeah, it's usually all these D-specific compile-time features that are 
slowing down compilation.


DWT and Tango are two good examples of large code bases where very few 
of these features are used; they're written in a more traditional style. 
They're at least 200k lines of code each and, IIRC, take around 10 
seconds (or less) to compile for a full build.


--
/Jacob Carlborg


Re: Compilation times and idiomatic D code

2017-07-05 Thread Jonathan M Davis via Digitalmars-d
On Wednesday, July 05, 2017 13:12:40 H. S. Teoh via Digitalmars-d wrote:
[...]
> IOW, the very range-based idiom that has become one of the defining
> characteristics of modern D is negating the selling point of fast
> compilation.
>
> I vaguely remember there was talk about compressing symbols when they
> get too long... is there any hope of seeing this realized in the near
> future?

In this case, I think that it's more that Voldemort types are biting us than
that ranges are biting us (though the Voldemort types are typically ranges,
so in practice, using ranges ends up biting you).

https://issues.dlang.org/show_bug.cgi?id=15831

I think that there's a good argument that Voldemort types aren't actually a
particularly good idea given these issues, but the work that's being done to
compress symbols should at least reduce the problem.

- Jonathan M Davis



Re: Compilation times and idiomatic D code

2017-07-05 Thread H. S. Teoh via Digitalmars-d
On Wed, Jul 05, 2017 at 09:18:45PM +, jmh530 via Digitalmars-d wrote:
> On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
> > 
> > Yes there is.
> > Rainer Schuetze is quite close to a solution. Which reduces the
> > symbol-name bloat significantly.
> > See https://github.com/dlang/dmd/pull/5855

That's very nice.  Hope we will get this through sooner rather than
later!


[...]
> A table in the comments [1] shows a significant reduction in bloat
> when compiling phobos unit tests. However, it shows a slight increase
> in build time. I would have expected a decrease. Any idea why that is?
> 
> [1] https://github.com/dlang/dmd/pull/5855#issuecomment-310653542

The same comment points to a refactoring PR by Walter (dmd #6841). Not
sure why that PR would interact with this one in this way.

In any case, I think the actual compilation times would depend on the
details of the code.  If you're using relatively shallow UFCS chains,
like Phobos unittests tend to do, probably the compressed symbols won't
give very much advantage over the cost of computing the compression.
But if you have heavy usage of UFCS like in my code, this should cause
significant speedups from not having to operate on 300KB large symbols.


T

-- 
Help a man when he is in trouble and he will remember you when he is in trouble 
again.


Re: Compilation times and idiomatic D code

2017-07-05 Thread jmh530 via Digitalmars-d

On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:


Yes there is.
Rainer Schuetze is quite close to a solution. Which reduces the 
symbol-name bloat significantly.

See https://github.com/dlang/dmd/pull/5855




A table in the comments [1] shows a significant reduction in 
bloat when compiling phobos unit tests. However, it shows a 
slight increase in build time. I would have expected a decrease. 
Any idea why that is?


[1] https://github.com/dlang/dmd/pull/5855#issuecomment-310653542


Re: Compilation times and idiomatic D code

2017-07-05 Thread kinke via Digitalmars-d

On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:
I vaguely remember there was talk about compressing symbols 
when they get too long... is there any hope of seeing this 
realized in the near future?


LDC has an experimental feature replacing long names by their 
hash; ldc2 -help:

...
  -hash-threshold=<value>  - Hash symbol names longer than this
                             threshold (experimental)
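Usage would look something like this (the threshold value below is an arbitrary example, and the flag is experimental, so its behavior may differ between LDC versions):

```shell
# Replace any symbol name longer than 1000 characters with its hash.
ldc2 -hash-threshold=1000 app.d
```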


Re: Compilation times and idiomatic D code

2017-07-05 Thread Stefan Koch via Digitalmars-d

On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:


I vaguely remember there was talk about compressing symbols 
when they get too long... is there any hope of seeing this 
realized in the near future?


Yes there is.
Rainer Schuetze is quite close to a solution. Which reduces the 
symbol-name bloat significantly.

See https://github.com/dlang/dmd/pull/5855


There is still a problem with the template system as a whole.
Which I am working on in my spare time.
And which will become my focus after newCTFE is done.


Compilation times and idiomatic D code

2017-07-05 Thread H. S. Teoh via Digitalmars-d
Over time, what is considered "idiomatic D" has changed, and nowadays it
seems to be leaning heavily towards range-based code with UFCS chains
using std.algorithm and similar reusable pieces of code.

D (well, DMD specifically) is famed for its lightning speed compilation
times.

So this left me wondering why my latest D project, a smallish codebase
with only ~5000 lines of code, a good part of which are comments, takes
about 11 seconds to compile.

A first hint is that these meager 5000 lines of code compile to a 600MB
executable. Well, large executables have been the plague of D since the
beginning, but the reasoning has always been that hello world examples
don't really count, because the language offers the machinery for much
more than that, and the idea is that as the code size grows, the "bloat
to functionality" ratio decreases.  But still... 600MB for 5000 lines of
code seems a bit excessive. Especially when stripping symbols cut off
about *half* of that size.

Which leads to the discovery, to my horror, that there are some very,
VERY large symbols that are generated. Including one that's nearly
400,000 characters long. Yes, that's almost 400KB just for ONE symbol.  This
particular symbol is the result of a long UFCS chain in the main
program, and contains a lot of repeated elements, like
myTemplate__lambdaXXX_myTemplateArguments__mapXXX__Result__myTemplateArguments
and so on.  Each additional member in the UFCS chain causes a repetition
of all the previous members' return type names, plus the new typename,
causing an O(n^2) explosion in symbol size.

Worse yet, because the typename encoded in this monster symbol is a
range, you have the same 300+KB of typename repeated for each of the
range primitives. And anything else this typename happens to be a
template argument to.  There's another related symbol that's 388944
characters long.  Not to mention all the range primitives (along with
their similarly huge typenames) of all the smaller types contained
within this monster typename.

Given this, it's no surprise that the compiler took 11 seconds to
compile a 5000-line program. Just imagine how much time is spent
generating these huge symbols, storing them in the symbol table,
comparing them in symbol table lookups, writing them to the executable,
etc..  And we're not even talking about the other smaller, but still
huge symbols that are also present -- 100KB symbols, 50KB symbols, 10KB
symbols, etc..  And think about the impact of this on the compiler's
memory footprint.

IOW, the very range-based idiom that has become one of the defining
characteristics of modern D is negating the selling point of fast
compilation.

I vaguely remember there was talk about compressing symbols when they
get too long... is there any hope of seeing this realized in the near
future?


T

-- 
War doesn't prove who's right, just who's left. -- BSD Games' Fortune