Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-16 Thread Dicebot via Digitalmars-d-announce

On Sunday, 15 June 2014 at 21:38:18 UTC, Dmitry Olshansky wrote:

15-Jun-2014 20:21, Dicebot writes:
On Saturday, 14 June 2014 at 16:34:35 UTC, Dmitry Olshansky wrote:
But let's face it - it's a one-time job to get it right in your favorite build tool. Then you have fast and cached (re)builds. Comparatively, the costs of CTFE generation are paid in full during _each_ build.


There is no such thing as a one-time job in programming unless you work alone and abandon any long-term maintenance. As time goes on, any mistake that can possibly happen will inevitably happen.


The frequency of such an event is orders of magnitude smaller. Let's not take arguments to the extreme - then doing anything is futile due to the potential for mistakes it introduces sooner or later.


It is more likely to happen if you change your build scripts more often. And that is exactly what you propose.


I am not going to say it is impractical, just mentioning flaws that make me seek a better solution.


Automation. Dumping the body of ctRegex is manual work after all, including placing it under the right symbol. In the proposed scheme it's just a matter of copy-pasting a pattern after the initial setup has been done.


I think defining regexes in a separate module is even less effort than adding a few lines to the build script ;)
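As a hedged sketch of the layout being discussed (module, symbol, and pattern names here are all made up for illustration), it could look like:

```d
// regexes.d - compiled once into a static library
module regexes;
import std.regex;

// One accessor per pattern: the app refers to each regex through a
// unique symbol, so the expensive ctRegex instantiation happens only
// when this library is (re)built, not on every build of the app.
auto reNumber()
{
    static r = ctRegex!(`[0-9]+`);
    return r;
}

// app.d - imports the module and links against the prebuilt library:
//
//     import regexes, std.regex;
//     void main() { auto hit = matchFirst("abc123", reNumber); }
```

The app's rebuilds then pay only the link cost, which is the whole point of the separate-package idea.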


It is somewhat worse because you don't routinely change external libraries, as opposed to local sources.



But surely we have libraries that are built as separate projects and are external dependencies, right? There is nothing new here except that the D file -> obj -> lib pipeline is changed to generator -> generated D file -> obj file.


Ok, I am probably convinced on this one. Incidentally, I always prefer full-source builds to separating static libraries inside the application itself. When there is enough RAM for dmd, of course :)



Huge mess to maintain. According to my experience dub is terrible at defining any complicated build models. Pretty much anything that is not a single-step compile-them-all approach can only be done via calling an external shell script.


I'm not going to like dub then ;)


It is primarily a source dependency manager, not a build tool. I remember Sonke mentioning that it is intentionally kept simplistic to guarantee no platform-unique features are ever needed.


For anything complicated I'd probably wrap the dub call inside a makefile to prepare all the necessary extra files.



If using external generators is necessary I will take make over anything else :)


Then I understand your point about inevitable mistakes - it's all in the tool.


make is actually pretty good if you don't care about platforms other than Linux. Well, apart from the stupid whitespace sensitivity. But it is incredibly good at defining build systems with chained dependencies.
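For instance, the chained-dependency style being praised here might look like this minimal, hypothetical makefile (file and tool names are invented), where a generator binary produces a D source that is then compiled:

```make
# generator binary -> generated D source -> object file
gen_regex: gen_regex.d
	dmd -of=gen_regex gen_regex.d

generated.d: patterns.txt gen_regex
	./gen_regex patterns.txt > generated.d

generated.o: generated.d
	dmd -c generated.d
```

make rebuilds generated.d only when patterns.txt or the generator itself changes - exactly the caching that per-build CTFE generation lacks.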


What I want to point out is: do not mistake the goal for the means to an end. No matter what we call it, CTFE code generation is just a means to an end, with serious limitations (especially as it stands today, in the real world).


I agree. What I disagree about is the definition of the goal. It is not just generating code, it is generating code in a manner understood by the compiler.


For instance, if the D compiler allowed external tools as plugins (just an example to show the means-vs-ends distinction) with some form of the following construct:


mixin(call_external_tool(args, 3, 14, 15, .92));

it would make any generation totally practical *today*.


But this is exactly the case where language integration gives you nothing over a build-system solution :) If the compiler itself is not aware of how code gets generated from the arguments, there is no real advantage in putting the tool invocation inline.


How long till the C preprocessor works at CTFE? How long till it's practical to do:


mixin(htod(import("some_header.h")));

and have it done optimally fast at CTFE?


Never, but it is not really about being fast or convenient. For htod you don't want just C grammar / preprocessor support; you want it as good as the one in real C compilers.


My answer is: no amount of JITing CTFE and compiler architecture improvements in the foreseeable future will make it better than standalone tool(s), due to the mentioned _fundamental_ limitations.


There are real practical boundaries on where an internal 
interpreter can stay competitive.


I don't see any fundamental practical boundaries. Quality-of-implementation ones - sure. Quite the contrary, I totally see how a better compiler can easily outperform any external tool for most build tasks despite somewhat worse JIT codegen - it has the huge advantage of being able to work on the language's semantic entities and not just files. That allows much smarter caching and dependency tracking, something external tools will never be able to achieve.


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-15 Thread Dicebot via Digitalmars-d-announce

On Saturday, 14 June 2014 at 16:34:35 UTC, Dmitry Olshansky wrote:
Consider something like the REST API generator I described during DConf. There is different code generated in different contexts from the same declarative description - both for server and client. Right now the simple fact that you import the very same module from both gives a solid 100% guarantee that API usage between those two programs stays in sync.


But let's face it - it's a one-time job to get it right in your favorite build tool. Then you have fast and cached (re)builds. Comparatively, the costs of CTFE generation are paid in full during _each_ build.


There is no such thing as a one-time job in programming unless you work alone and abandon any long-term maintenance. As time goes on, any mistake that can possibly happen will inevitably happen.


In your proposed scenario there will be two different generated files imported by server and client respectively. A tiny typo in writing your build script will result in a hard-to-detect run-time bug while the code itself still happily compiles.


Or a link error, if we go a hybrid path where the imported module is emitting declarations/hooks via CTFE to be linked to by the proper generated code. This is something I'm thinking could be a practical solution.


snip


What is the benefit of this approach over simply keeping all ctRegex bodies in a separate package, compiling it as a static library and referring to it from the actual app by its own unique symbol? This is something that does not need any changes in the compiler or Phobos, just a matter of project layout.


It does not work for more complicated cases where you actually need access to the generated sources (generating templates, for example).


You may keep the convenience, but losing guarantees hurts a lot. To be able to verify the static correctness of your program / group of programs, the type system needs to be aware of how generated code relates to the original source.


The build system does it. We have this problem with all external deps anyway (i.e. who verifies that the right version of libXYZ is linked and not some other?)


It is somewhat worse because you don't routinely change external 
libraries, as opposed to local sources.



Huge mess to maintain. According to my experience all build systems are incredibly fragile beasts; trusting them with something that impacts program correctness and won't be detected at compile time is just too dangerous.


Could be, but we have dub, which should be simple and nice.
I had a very positive experience with scons and half-generated sources.


dub is terrible at defining any complicated build models. Pretty much anything that is not a single-step compile-them-all approach can only be done via calling an external shell script. If using external generators is necessary I will take make over anything else :)



snip


tl;dr: I believe that we should improve compiler technology to achieve the same results instead of promoting temporary hacks as the true way to do things. Relying on the build system is likely the most practical solution today, but it is not a solution I am satisfied with and hardly one I can accept as an accomplished target.


An imaginary compiler that continuously runs as a daemon/service, is capable of JIT-ing and provides basic dependency tracking as part of the compilation step should behave as well as any external solution, with much better correctness guarantees and overall user experience out of the box.


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-15 Thread Dmitry Olshansky via Digitalmars-d-announce

15-Jun-2014 20:21, Dicebot writes:

On Saturday, 14 June 2014 at 16:34:35 UTC, Dmitry Olshansky wrote:

But let's face it - it's a one-time job to get it right in your favorite build tool. Then you have fast and cached (re)builds. Comparatively, the costs of CTFE generation are paid in full during _each_ build.


There is no such thing as a one-time job in programming unless you work alone and abandon any long-term maintenance. As time goes on, any mistake that can possibly happen will inevitably happen.


The frequency of such an event is orders of magnitude smaller. Let's not take arguments to the extreme - then doing anything is futile due to the potential for mistakes it introduces sooner or later.



In your proposed scenario there will be two different generated files imported by server and client respectively. A tiny typo in writing your build script will result in a hard-to-detect run-time bug while the code itself still happily compiles.


Or a link error, if we go a hybrid path where the imported module is emitting declarations/hooks via CTFE to be linked to by the proper generated code. This is something I'm thinking could be a practical solution.

snip


What is the benefit of this approach over simply keeping all ctRegex bodies in a separate package, compiling it as a static library and referring to it from the actual app by its own unique symbol? This is something that does not need any changes in the compiler or Phobos, just a matter of project layout.


Automation. Dumping the body of ctRegex is manual work after all, including placing it under the right symbol. In the proposed scheme it's just a matter of copy-pasting a pattern after the initial setup has been done.



It does not work for more complicated cases where you actually need access to the generated sources (generating templates, for example).


Indeed, this is a limitation, and the import of generated source would 
be required.



You may keep the convenience, but losing guarantees hurts a lot. To be able to verify the static correctness of your program / group of programs, the type system needs to be aware of how generated code relates to the original source.


The build system does it. We have this problem with all external deps anyway (i.e. who verifies that the right version of libXYZ is linked and not some other?)


It is somewhat worse because you don't routinely change external
libraries, as opposed to local sources.



But surely we have libraries that are built as separate projects and are external dependencies, right? There is nothing new here except that the D file -> obj -> lib pipeline is changed to generator -> generated D file -> obj file.



Huge mess to maintain. According to my experience all build systems are incredibly fragile beasts; trusting them with something that impacts program correctness and won't be detected at compile time is just too dangerous.


Could be, but we have dub, which should be simple and nice.
I had a very positive experience with scons and half-generated sources.


dub is terrible at defining any complicated build models. Pretty much anything that is not a single-step compile-them-all approach can only be done via calling an external shell script.


I'm not going to like dub then ;)


If using external generators is
necessary I will take make over anything else :)


Then I understand your point about inevitable mistakes - it's all in the tool.



snip


tl;dr: I believe that we should improve compiler technology to achieve the same results instead of promoting temporary hacks as the true way to do things. Relying on the build system is likely the most practical solution today, but it is not a solution I am satisfied with and hardly one I can accept as an accomplished target.
An imaginary compiler that continuously runs as a daemon/service, is capable of JIT-ing and provides basic dependency tracking as part of the compilation step should behave as well as any external solution, with much better correctness guarantees and overall user experience out of the box.


What I want to point out is: do not mistake the goal for the means to an end. No matter what we call it, CTFE code generation is just a means to an end, with serious limitations (especially as it stands today, in the real world).


Seamless integration is not about packing everything into a single compiler invocation:

dmd src/*.d

Generation is generation; as long as it's fast and automatic, it solves the problem(s) metaprogramming was established to solve.


For instance, if the D compiler allowed external tools as plugins (just an example to show the means-vs-ends distinction) with some form of the following construct:


mixin(call_external_tool(args, 3, 14, 15, .92));

it would make any generation totally practical *today*. This was proposed before, and dismissed out of fear of security risks, without ever identifying the proper set of restrictions. After all, we already have textual mixins with the same potential security risk - no problem there.


Let's focus on the fact that this has the benefits of:
- sane debugging of the plug-in (it's just a program with the usual symbols)
- fast, as the 

Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-14 Thread Dicebot via Digitalmars-d-announce

On Thursday, 12 June 2014 at 16:42:38 UTC, Dmitry Olshansky wrote:
It's always nice to ask something on the D NG - so many good answers I can hardly choose whom to reply to ;) So this is kind of a broadcast.


Yes, the answer seems spot on - reflection! But allow me to 
retort.


I'm not talking about a completely stand-alone generator. Just as well, the generator tool could be written in D using the same exact sources as your D program does, including the static introspection and type-awareness. Then the generator itself is a library + an invocation script in D.


The question is specifically about CTFE in this scenario, including not only the obvious shortcomings of the design, but the fundamental ones of compilation inside of compilation. Unlike proper compilation, it has nothing persistent to back it up. It feels backwards - a bit like C++ TMP but, of course, much, much better.



1)

Reflection. It is less of an issue for pure DSL solutions because those don't provide any good reflection capabilities anyway, but other code generation approaches have very similar problems.

By doing all code generation in a separate build step you potentially lose many of the guarantees of keeping various parts of your application in sync.




Use the same sources for the generator. In essence all is the same, just relying on separate runs and linkage, not mixin. The necessary hooks to link to later could indeed be generated with a tiny bit of CTFE.


Yes, deeply embedded stuff might not be that easy. The scope 
and damage is smaller though.



2)

Moving forward. You use the traditional reasoning of a DSL generally being something rare and normally stable. This fits most common DSL usage, but the tight in-language integration D makes possible brings new opportunities for using DSLs and code generation casually all over your program.




Well, I'm biased by heavy-handed ones. Say I have a (no longer) secret plan of doing a next-gen parser generator in D. Needless to say, swaths of non-trivial code generation. I'm all for embedding nicely, but I see very little _practical_ gain in CTFE+mixin here EVEN if CTFE wouldn't suck. See the point above about using the same metadata and types as the user application would.


Consider something like the REST API generator I described during DConf. There is different code generated in different contexts from the same declarative description - both for server and client. Right now the simple fact that you import the very same module from both gives a solid 100% guarantee that API usage between those two programs stays in sync.


In your proposed scenario there will be two different generated files imported by server and client respectively. A tiny typo in writing your build script will result in a hard-to-detect run-time bug while the code itself still happily compiles.


You may keep the convenience, but losing guarantees hurts a lot. To be able to verify the static correctness of your program / group of programs, the type system needs to be aware of how generated code relates to the original source.


Also, this approach does not scale. I can totally imagine you doing it for two or three DSLs in a single program, probably even a dozen. But something like 100+? A huge mess to maintain. According to my experience all build systems are incredibly fragile beasts; trusting them with something that impacts program correctness and won't be detected at compile time is just too dangerous.


I totally expect programming culture to evolve to the point where something like 90% of all application code is generated in a typical project. D has a good base for promoting such a paradigm switch, and reducing any unnecessary mental context switches is very important here.

This was pretty much the point I was trying to make with my DConf talk (and have probably failed :) )


I liked the talk, but you know ... the 4th or 5th talk with CTFE/mixin - I think I might have been distracted :)


More specifically, this bright future of 90%+ concise DSL-driven programs is undermined by a simple truth - no amount of improvement in CTFE would make generators run faster than an optimized standalone tool invocation. The tool (a library written in D) may read D metadata just fine.


I heard D build times are an important part of its adoption so...


Adoption - yes. Production usage - less so (though still 
important). Difference between 1 second and 5 seconds is very 
important. Between 10 seconds and 1 minute - not so much.


JIT will probably be slower than stand-alone generators, but not that much slower.


It might solve most of the _current_ problems, but I foresee fundamental issues from the lack of global state in CTFE that in, say, 10 years from now will look a lot like `#include` in C++.


I hope that 10 years from now we will consider having global state in RTFE a stone-age relict :P


A major one is that there is no way for the compiler to avoid recompiling generated code, as it has no knowledge of how it might have changed since the previous run.


Why can't we merge basic build system functionality akin to 

Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-14 Thread Andrei Alexandrescu via Digitalmars-d-announce

On 6/14/14, 8:05 AM, Dicebot wrote:

Adoption - yes. Production usage - less so (though still important).
Difference between 1 second and 5 seconds is very important. Between 10
seconds and 1 minute - not so much.


Wait, what? -- Andrei


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-14 Thread Dicebot via Digitalmars-d-announce
On Saturday, 14 June 2014 at 15:25:11 UTC, Andrei Alexandrescu wrote:

On 6/14/14, 8:05 AM, Dicebot wrote:
Adoption - yes. Production usage - less so (though still important). Difference between 1 second and 5 seconds is very important. Between 10 seconds and 1 minute - not so much.


Wait, what? -- Andrei


If the build time becomes long enough that it forces you to switch mental context, it matters less exactly how long it takes - you are much more likely to do something else and return to it later. Of course it can also get to the famous C++ hours of build time, which is the next level of inconvenience :)


But a reasonably big and complicated project won't build in 5 seconds anyway (even with a perfect compiler), so eventually pure build time becomes less of a selling point. Still important, but not _that_ important.


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-14 Thread Dmitry Olshansky via Digitalmars-d-announce

14-Jun-2014 19:05, Dicebot writes:

On Thursday, 12 June 2014 at 16:42:38 UTC, Dmitry Olshansky wrote:

[snip]

Well, I'm biased by heavy-handed ones. Say I have a (no longer) secret
plan of doing a next-gen parser generator in D. Needless to say swaths
of non-trivial code generation. I'm all for embedding nicely but I see
very little _practical_ gains in CTFE+mixin here EVEN if CTFE wouldn't
suck. See the point above about using the same metadata and types as
the user application would.


Consider something like the REST API generator I described during DConf. There is different code generated in different contexts from the same declarative description - both for server and client. Right now the simple fact that you import the very same module from both gives a solid 100% guarantee that API usage between those two programs stays in sync.


But let's face it - it's a one-time job to get it right in your favorite build tool. Then you have fast and cached (re)builds. Comparatively, the costs of CTFE generation are paid in full during _each_ build.



In your proposed scenario there will be two different generated files imported by server and client respectively. A tiny typo in writing your build script will result in a hard-to-detect run-time bug while the code itself still happily compiles.


Or a link error, if we go a hybrid path where the imported module is emitting declarations/hooks via CTFE to be linked to by the proper generated code. This is something I'm thinking could be a practical solution.


I.e. currently, to get around wasting cycles again and again:

module a;
import std.regex;

bool verify(string s)
{
    // illustrative pattern; the original one was lost in the archive
    static re = ctRegex!(`\w+`);
    return !match(s, re).empty;
}
//
module b;
import a;

void foo()
{
    ...
    verify("blah");
    ...
}

vs the would-be hybrid approach:

module gen_re;

void main() // or wrap it in a tiny template mixin
{
    generateCtRegex(
        // all patterns
    );
}

module b;
import std.regex;
// notice: no import of a

void foo()
{
    ...
    static re = ctRegex!(...);
    ...
}

and using ctRegex as usual in b, but any miss of the compiled cache would lead to a link error.


In fact it might be the best of both worlds if there is a switch to try full CTFE vs the link-time external option.




You may keep convenience but losing guarantees hurts a lot. To be able
to verify static correctness of your program / group of programs type
system needs to be aware how generated code relates to original source.


The build system does it. We have this problem with all external deps anyway (i.e. who verifies that the right version of libXYZ is linked and not some other?)



Also this approach does not scale. I can totally imagine you doing it
for two or three DSL in single program, probably even dozen. But
something like 100+?


Not everything is suitable, of course. Some stuff is good only inline and on the spot. But it does use the same sources; it may look a lot like this in the case of REST generators:


import everything;

void main()
{
    // iterate over the module's members via compile-time reflection
    foreach (m; __traits(allMembers, everything))
    {
        // ... generate client code from meta-data
    }
}

Waiting for 100+ DSLs compiled in a JIT interpreter that can't optimize a thing (pretty much by definition - or use separate flags for that?) is not going to be fun either.



Huge mess to maintain. According to my experience all build systems are incredibly fragile beasts; trusting them with something that impacts program correctness and won't be detected at compile time is just too dangerous.


Could be, but we have dub which should be simple and nice.
I had very positive experience with scons and half-generated sources.



I heard D build times are an important part of its adoption so...


Adoption - yes. Production usage - less so (though still important).
Difference between 1 second and 5 seconds is very important. Between 10
seconds and 1 minute - not so much.

JIT will probably be slower than stand-alone generators, but not that much slower.


It might solve most of the _current_ problems, but I foresee fundamental issues from the lack of global state in CTFE that in, say, 10 years from now will look a lot like `#include` in C++.


I hope that 10 years from now we will consider having global state in RTFE a stone-age relict :P


Well, no amount of purity dismisses the point that a cache is a cache. 
When I say global in D I mean thread/fiber local.





A major one is that there is no way for the compiler to avoid recompiling generated code, as it has no knowledge of how it might have changed since the previous run.


Why can't we merge basic build-system functionality akin to rdmd into the compiler itself? It makes perfect sense to me, as the build process can benefit a lot from being semantically aware.


I wouldn't cross my fingers, but yes - ideally it would need the powers of a build system, making it that much more complicated. Then it could cache results, including template instantiations, across modules and separate invocations of the tool. It's a distant dream though.


Currently available caching at the level of object files is very coarse 
grained and 

Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-12 Thread Dmitry Olshansky via Digitalmars-d-announce

11-Jun-2014 22:03, Atila Neves writes:

On Tuesday, 10 June 2014 at 19:36:57 UTC, bearophile wrote:

At about 40:42, in the "Thoughts on static regex" part, it is said that "even compile-time printf would be awesome". There is a patch about __ctWrite on GitHub; it should be fixed and merged.

Bye,
bearophile


I wish I'd taken the mic at the end, and 2 days later Adam D. Ruppe said
what I was thinking of saying: unit test and debug the CTFE function at
runtime and then use it at compile-time when it's ready for production.



Yes, that's a starting point - a function working at R-T.


Yes, Dmitry brought up compiler bugs. But if you write a compile-time UT
and it fails, you'll know it wasn't because of your own code because the
run-time ones still pass.


It doesn't help that it's not your fault :)
And with a bit of __ctfe's to work around compiler bugs you won't be so sure of your code anymore.




Maybe there's still a place for something more than pragma(msg), but I'd definitely advocate for the above, at least in the beginning. If anything, easier ways to write compile-time UTs would be, to me, preferable to a compile-time printf.



There is a nice assertCTFEable written by Kenji in Phobos. I think it's our private magic for now, but I see no reason not to expose it somewhere.
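The idea behind such a helper can be sketched roughly like this (a hand-written approximation for illustration; the actual Phobos-internal version differs in details):

```d
// Rough sketch of an assertCTFEable-style helper: run the same
// delegate both in CTFE and at run time, so a CTFE-only failure
// surfaces as a compile error.
void assertCTFEable(alias dg)()
{
    // Forcing the lambda through static assert evaluates it in CTFE.
    static assert({ dg(); return true; }());
    // Also exercise the run-time path.
    dg();
}

unittest
{
    assertCTFEable!({
        assert(1 + 1 == 2);
    })();
}
```

Testing the same code on both paths is exactly the "debug at runtime, use at compile time" workflow discussed above.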



Atila



--
Dmitry Olshansky


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-12 Thread Dmitry Olshansky via Digitalmars-d-announce

12-Jun-2014 03:29, Adam D. Ruppe writes:

On Wednesday, 11 June 2014 at 18:03:06 UTC, Atila Neves wrote:

I wish I'd taken the mic at the end, and 2 days later Adam D. Ruppe
said what I was thinking of saying: unit test and debug the CTFE
function at runtime and then use it at compile-time when it's ready
for production.


Aye. It wasn't long ago that this wasn't really possible because of how
incomplete and buggy CTFE was, you kinda had to do it with special code,
but now so much of the language works, there's a good chance if it works
at runtime it will work at compile time too.

I was really surprised with CTFE a few months ago when I tried to use my
dom.d with it... and it actually worked. That's amazing to me.

But anyway, in general, the ctfe mixin stuff could be replaced with an external code generator, so yeah that's the way I write them now - as a code generator standalone thing, then go back and enum it to actually use. (BTW I also like to generate fairly pretty code, e.g. indented properly, just because it makes it easier to read.)



This is the one thing I'm losing sleep over - what precisely is so good about CTFE code generation in a _practical_ context (a DSL that is quite stable, not just tiny helpers)?


At the end of the day it's a choice between writing a trivial line in your favorite build system (NOT make) and waiting a couple of minutes on each build, hoping the compiler won't hit your system's memory limits.


And these couple of minutes are more like 30 minutes at times. Worse yet, unlike a proper build system it doesn't keep track of actual changes (the same regex patterns get recompiled over and over); at this point seamless integration into the language starts feeling like a joke.


And speaking of seamless integration: just generate a symbol name out of the pattern at CTFE to link to later - at least this much can be done relatively fast. And voila, even the clunky run-time generation is not half-bad at integration.
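A hedged sketch of that idea - deriving a linkable symbol name from the pattern entirely at compile time. The hash and the naming scheme here are invented for illustration:

```d
// CTFE-able FNV-1a hash: stable across builds, good enough to
// derive a name (the hash choice is an assumption).
uint fnv1a(string s)
{
    uint h = 2166136261u;
    foreach (c; s)
        h = (h ^ c) * 16777619u;
    return h;
}

// Lowercase-hex rendering that works in CTFE.
string toHex8(uint v)
{
    enum digits = "0123456789abcdef";
    char[8] buf;
    foreach_reverse (i; 0 .. 8)
    {
        buf[i] = digits[v & 0xF];
        v >>= 4;
    }
    return buf.idup;
}

// A unique, linker-friendly symbol name derived from the pattern,
// computed entirely at compile time.
enum symbolFor(string pattern) = "_d_ctRegex_" ~ toHex8(fnv1a(pattern));

static assert(symbolFor!`[0-9]+`.length == 19);
```

The run-time engine could then be compiled separately under that symbol and resolved at link time, which is the integration point being described.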


Unless things improve dramatically, CTFE code generation + mixin is just our funny, painful toy.


--
Dmitry Olshansky


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-12 Thread bearophile via Digitalmars-d-announce

Dmitry Olshansky:


Unless things improve dramatically CTFE code generation +


An alternative and much faster JITter for LLVM - something like this could make CTFE on LDC2 very quick:

http://llvm.org/devmtg/2014-04/PDFs/LightningTalks/fast-jit-code-generation.pdf

Bye,
bearophile


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-12 Thread Colin via Digitalmars-d-announce

On Thursday, 12 June 2014 at 09:17:45 UTC, Dmitry Olshansky wrote:

12-Jun-2014 03:29, Adam D. Ruppe writes:

On Wednesday, 11 June 2014 at 18:03:06 UTC, Atila Neves wrote:
I wish I'd taken the mic at the end, and 2 days later Adam D. Ruppe said what I was thinking of saying: unit test and debug the CTFE function at runtime and then use it at compile-time when it's ready for production.

Aye. It wasn't long ago that this wasn't really possible because of how incomplete and buggy CTFE was; you kinda had to do it with special code, but now so much of the language works that there's a good chance if it works at runtime it will work at compile time too.

I was really surprised with CTFE a few months ago when I tried to use my dom.d with it... and it actually worked. That's amazing to me.

But anyway, in general, the ctfe mixin stuff could be replaced with an external code generator, so yeah that's the way I write them now - as a code generator standalone thing, then go back and enum it to actually use. (BTW I also like to generate fairly pretty code, e.g. indented properly, just because it makes it easier to read.)



And these couple of minutes are more like 30 minutes at times. Worse yet, unlike a proper build system it doesn't keep track of actual changes (the same regex patterns get recompiled over and over); at this point seamless integration into the language starts feeling like a joke.



Maybe a change to the compiler to write any mixin'd string out to a temporary file (along with some identifier information and the line of code that generated it), and at the next compilation read it back from that file iff the line of code that generated it hasn't changed?


Then there'd be no heavy work for the compiler to do, apart from reading that file into a string.
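That proposal could be sketched roughly as follows; everything here (function names, the cache key, the temp-file layout) is invented for illustration, and nothing like it exists in the compiler today:

```d
import std.digest.crc : crc32Of;
import std.digest : toHexString;
import std.file : exists, readText, tempDir, write;
import std.format : format;
import std.path : baseName, buildPath;

// Hypothetical mixin cache: key on the source location plus a hash of
// the generating expression; reuse the stored expansion iff they match.
string cachedMixin(string file, size_t line, string generatorSrc,
                   string delegate() expand)
{
    auto key = buildPath(tempDir,
        format("%s_%d_%s.mixin", baseName(file), line,
               toHexString(crc32Of(generatorSrc))));
    if (exists(key))
        return readText(key);  // generating line unchanged: reuse result
    auto code = expand();      // otherwise run the heavy expansion once
    write(key, code);
    return code;
}
```

Hashing the generating expression (rather than only the line number) is what makes the "iff the line of code hasn't changed" condition checkable.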


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-12 Thread dennis luehring via Digitalmars-d-announce

On 12.06.2014 11:17, Dmitry Olshansky wrote:

This is the one thing I'm losing sleep over - what precisely is so good about CTFE code generation in a _practical_ context (a DSL that is quite stable, not just tiny helpers)?

At the end of the day it's a choice between writing a trivial line in your favorite build system (NOT make) and waiting a couple of minutes on each build, hoping the compiler won't hit your system's memory limits.

And these couple of minutes are more like 30 minutes at times. Worse yet, unlike a proper build system it doesn't keep track of actual changes (the same regex patterns get recompiled over and over); at this point seamless integration into the language starts feeling like a joke.

And speaking of seamless integration: just generate a symbol name out of
pattern at CTFE to link to later, at least this much can be done
relatively fast. And voila even the clunky run-time generation is not
half-bad at integration.

Unless things improve dramatically CTFE code generation + mixin is just
our funny painful toy.


you should write a big top post about your CTFE experience/problems - it 
is important enough


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-12 Thread Joakim via Digitalmars-d-announce

On Tuesday, 10 June 2014 at 17:19:42 UTC, Dicebot wrote:
On Tuesday, 10 June 2014 at 15:37:11 UTC, Andrei Alexandrescu 
wrote:

Watch, discuss, upvote!

https://news.ycombinator.com/newest

https://twitter.com/D_Programming/status/476386465166135296

https://www.facebook.com/dlang.org/posts/863635576983458

http://www.reddit.com/r/programming/comments/27sjxf/dconf_2014_day_1_talk_4_inside_the_regular/


Andrei


http://youtu.be/hkaOciiP11c


Great talk, just finished watching the YouTube upload.  I zoned 
out during the livestream, as it was late over here and I was 
falling asleep during this fairly technical talk, but now that 
I'm awake, I enjoyed going through it.


I never knew how regular expression engines are implemented; a good 
introduction to the topic and to how D made your approach easier or 
harder.  A model talk for DConf, particularly given the great 
results on the regex-dna benchmark.


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-12 Thread Artur Skawina via Digitalmars-d-announce
On 06/12/14 11:17, Dmitry Olshansky via Digitalmars-d-announce wrote:
 This one thing I'm losing sleep over - what precisely is so good in CTFE 
 code generation in _practical_ context (DSL that is quite stable, not just 
 tiny helpers)?

Language integration; direct access to meta data (such as types, but
also constants).

 By the end of the day it's just about having to write a trivial line in your 
 favorite build system (NOT make) vs having to wait for a couple of minutes 
 each build hoping the compiler won't hit your system's memory limits.

If it really was only about an extra makefile rule then CTFE wouldn't
make much difference; it would just be an explicitly-requested smarter
version of constant folding. But that is not the case.

Simple example: create a function that implements an algorithm
which is derived from some type given to it as input. /Derived/
does not mean that it only contains some conditionally executed
code that depends on some property of that type; it means that
the algorithm itself is determined from the type. With the
external-generator solution you can emit a templated function,
but what you can *not* do is emit code based on meta-data or
CT introspection - because the necessary data simply isn't
available when the external generator runs.
With CTFE you have direct access to all the data and generating
the code becomes almost trivial. It makes a night-and-day type of
difference.
While you could implement a sufficiently-smart-generator that could
handle some subset of the functionality of CTFE, it would be
prohibitively expensive to do so, wouldn't scale and would often be
pointless, if you had to resort to generating code containing mixin
expressions anyway. There's a reason why this isn't done in other
languages that don't have CTFE.
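The kind of type-derived generation described above can be sketched in a few lines of D (`User` and `serialize` are illustrative names, not anything from Phobos): the function's body is assembled at compile time from the fields of the type it receives — metadata an external generator simply cannot see.

```d
import std.conv : to;
import std.traits : FieldNameTuple;

struct User { string name; int age; }

// The algorithm is *derived from the type*: each field of T contributes
// a piece of generated code, assembled via CTFE-evaluated mixins.
string serialize(T)(T value)
{
    string result;
    foreach (field; FieldNameTuple!T)   // compile-time foreach over field names
        result ~= field ~ "=" ~ mixin("value." ~ field).to!string ~ ";";
    return result;
}

void main()
{
    assert(serialize(User("Ann", 30)) == "name=Ann;age=30;");
}
```

Adding a field to `User` automatically changes the generated code, with no external build step involved.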

 Unless things improve dramatically CTFE code generation + mixin is just our 
 funny painful toy.

The code snippets posted here are of course just toy programs.
This does not mean that CTFE and mixins are merely toys, they
enable writing code in ways that just isn't practically possible
in other languages. The fact that there isn't much such publicly
available code is just a function of D's microscopic user base.

Real Programmers write mixins that write mixins.

artur


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-12 Thread Dicebot via Digitalmars-d-announce

On Thursday, 12 June 2014 at 09:17:45 UTC, Dmitry Olshansky wrote:
This one thing I'm losing sleep over - what precisely is so 
good in CTFE code generation in _practical_ context (DSL that 
is quite stable, not just tiny helpers)?


By the end of the day it's just about having to write a trivial 
line in your favorite build system (NOT make) vs having to wait 
for a couple of minutes each build hoping the compiler won't 
hit your system's memory limits.


Oh, this is a very good question :) There are two unrelated 
concerns here:


1)

Reflection. It is less of an issue for pure DSL solutions because 
those don't provide any good reflection capabilities anyway, but 
other code generation approaches have very similar problems.


By doing all code generation in a separate build step you 
potentially lose many of the guarantees of keeping various parts of 
your application in sync.


2)

Moving forward. You use the traditional reasoning of DSLs generally 
being something rare and normally stable. This fits most common 
DSL usage, but the tight in-language integration D makes possible 
brings new opportunities for using DSLs and code generation 
casually all over your program.


I totally expect programming culture to evolve to the point where 
something like 90% of all application code is being generated in a 
typical project. D has a good base for promoting such a paradigm 
switch, and reducing any unnecessary mental context switches is 
very important here.


This was pretty much the point I was trying to make with my DConf 
talk ( and have probably failed :) )


And these couple of minutes are more like 30 minutes at a 
time. Worse yet, unlike a proper build system, it doesn't keep 
track of actual changes (the same regex patterns get recompiled 
over and over); at this point seamless integration into the 
language starts feeling like a joke.


And speaking of seamless integration: just generate a symbol 
name out of pattern at CTFE to link to later, at least this 
much can be done relatively fast. And voila even the clunky 
run-time generation is not half-bad at integration.


Unless things improve dramatically CTFE code generation + mixin 
is just our funny painful toy.


Unfortunately the current implementation of the frontend falls a lot 
behind the language's capabilities. There are no fundamental reasons why 
it can't work with a better compiler. In fact, deadalnix has made a 
very good case for SDC taking over as the next D frontend exactly 
because of things like CTFE JIT.


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-12 Thread Dicebot via Digitalmars-d-announce

On Thursday, 12 June 2014 at 10:40:56 UTC, Colin wrote:
Maybe a change to the compiler to write any mixin'd string out 
to a temporary file (along with some identifier information and 
the line of code that generated it) and at the next compilation 
time try reading it back from that file iff the line of code 
that generated it hasnt changed?


Then, there'd be no heavy work for the compiler to do, apart 
from read that file in to a string.


The compiler can cache the return value of a function that gets called 
from inside a mixin statement (for a given argument set). As CTFE is 
implicitly pure (no global state at compile time), the generated 
code can later simply be re-used for the same argument set.


Re-using it between compiler invocations is trickier because 
it is only legal if the generator function and everything it 
indirectly uses have not changed either. Ignoring this requirement 
can result in nasty build issues that are only fixed by a clean 
build. Too harmful in my opinion.


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-12 Thread Colin via Digitalmars-d-announce

On Thursday, 12 June 2014 at 12:31:09 UTC, Dicebot wrote:

On Thursday, 12 June 2014 at 10:40:56 UTC, Colin wrote:
Maybe a change to the compiler to write any mixin'd string out 
to a temporary file (along with some identifier information 
and the line of code that generated it), and at the next 
compilation time try reading it back from that file iff the 
line of code that generated it hasn't changed?


Then, there'd be no heavy work for the compiler to do, apart 
from reading that file into a string.


The compiler can cache the return value of a function that gets called 
from inside a mixin statement (for a given argument set). As CTFE 
is implicitly pure (no global state at compile time), the 
generated code can later simply be re-used for the same argument set.


Re-using it between compiler invocations is trickier because 
it is only legal if the generator function and everything it 
indirectly uses have not changed either. Ignoring this requirement 
can result in nasty build issues that are only fixed by a clean 
build. Too harmful in my opinion.


Yeah, it's quite dangerous, I agree. I was only thinking of a 
solution to the problem above where a ctRegex is compiled every 
time, whether it was changed or not.


I'm sure there's some way of keeping track of all dependent D 
modules' filenames and, if any of them has changed in the 
chain, recalculating the string mixin.


Only trouble with that is, there'd be a good chunk of checking 
for every mixin, and it would slow the compiler down in normal use 
cases.


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-12 Thread Timon Gehr via Digitalmars-d-announce

On 06/12/2014 02:31 PM, Dicebot wrote:

The compiler can cache the return value of a function that gets called from inside
a mixin statement (for a given argument set). As CTFE is implicitly pure
(no global state at compile time), the generated code can later simply be
re-used for the same argument set.



Re-using it between compiler invocations is trickier because it is
only legal if the generator function and everything it indirectly uses have
not changed either. Ignoring this requirement can result in nasty build
issues that are only fixed by a clean build. Too harmful in my opinion.


Clearly, nirvana is continuous compilation, where the compiler performs 
explicit dependency management at the level of nodes in the syntax tree.


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-12 Thread Dicebot via Digitalmars-d-announce

On Thursday, 12 June 2014 at 12:49:23 UTC, Timon Gehr wrote:

On 06/12/2014 02:31 PM, Dicebot wrote:
The compiler can cache the return value of a function that gets called 
from inside a mixin statement (for a given argument set). As CTFE is 
implicitly pure (no global state at compile time), the generated code 
can later simply be re-used for the same argument set.


Re-using it between compiler invocations is trickier because it is 
only legal if the generator function and everything it indirectly 
uses have not changed either. Ignoring this requirement can result 
in nasty build issues that are only fixed by a clean build. Too 
harmful in my opinion.


Clearly, nirvana is continuous compilation, where the compiler 
performs explicit dependency management at the level of nodes 
in the syntax tree.


Yeah, I was wondering if we can merge some of rdmd's functionality 
into the compiler to speed up rebuilds and do better dependency 
tracking. But I am not sure it can fit nicely into the current 
frontend architecture.


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-12 Thread Adam D. Ruppe via Digitalmars-d-announce

On Thursday, 12 June 2014 at 09:17:45 UTC, Dmitry Olshansky wrote:
This one thing I'm losing sleep over - what precisely is so 
good in CTFE code generation in _practical_ context (DSL that 
is quite stable, not just tiny helpers)?


I've asked this same question before and my answer is mostly the 
same as dicebot: I think reflection is the important bit. Of 
course, even there it is sometimes useful to break it into two 
steps (one just prints the data out kinda like dmd -X then a 
regular program reads it and generates the code), but I find it 
really useful to read D code and generate stuff based on that.


By the end of the day it's just about having to write a trivial 
line in your favorite build system (NOT make)


it is actually pretty trivial in make too...


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-12 Thread Dmitry Olshansky via Digitalmars-d-announce

12-Jun-2014 16:25, Dicebot пишет:

On Thursday, 12 June 2014 at 09:17:45 UTC, Dmitry Olshansky wrote:

This one thing I'm losing sleep over - what precisely is so good in
CTFE code generation in _practical_ context (DSL that is quite stable,
not just tiny helpers)?

By the end of the day it's just about having to write a trivial line in
your favorite build system (NOT make) vs having to wait for a couple
of minutes each build hoping the compiler won't hit your system's
memory limits.


Oh, this is a very good question :) There are two unrelated concerns here:



It's always nice to ask something on the D NG; so many good answers I can 
hardly choose whom to reply to ;) So this is kind of a broadcast.


Yes, the answer seems spot on - reflection! But allow me to retort.

I'm not talking about a completely stand-alone generator. The 
generator tool could just as well be written in D, using the exact 
same sources as your D program does, including the static 
introspection and type-awareness. Then the generator itself is a 
library + an invocation script in D.


The question is specifically about CTFE in this scenario, including not 
only the obvious shortcomings of the design, but fundamental ones of 
compilation inside of compilation. Unlike proper compilation, it has 
nothing persistent to back it up. It feels backwards, a bit like C++ TMP 
but, of course, much, much better.



1)

Reflection. It is less of an issue for pure DSL solutions because those
don't provide any good reflection capabilities anyway, but other code
generation approaches have very similar problems.

By doing all code generation in a separate build step you potentially lose
many of the guarantees of keeping various parts of your application in sync.



Use the same sources for the generator. In essence all is the same, just 
relying on separate runs and linkage, not mixin. Necessary hooks to 
link to later could indeed be generated with a tiny bit of CTFE.


Yes, deeply embedded stuff might not be that easy. The scope and damage 
is smaller though.
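Such a linking hook could look something like this (an illustrative sketch, not std.regex internals): a tiny CTFE function maps a pattern to a predictable symbol name, which an externally generated object file can define and the application can link against.

```d
// Hypothetical: turn a regex pattern into a valid, predictable identifier
// at CTFE; the heavy matcher code behind that symbol would be emitted by
// the external generator and linked in separately.
string symbolFor(string pattern)
{
    string s = "generated_matcher_";
    foreach (ch; pattern)
        s ~= (ch >= 'a' && ch <= 'z') || (ch >= '0' && ch <= '9') ? ch : '_';
    return s;
}

// Works at compile time thanks to CTFE...
static assert(symbolFor("a+b*") == "generated_matcher_a_b_");

void main()
{
    // ...and at runtime with the very same code.
    assert(symbolFor("a+b*") == "generated_matcher_a_b_");
}
```

The CTFE part stays tiny and fast; all the expensive generation happens outside the compiler.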



2)

Moving forward. You use the traditional reasoning of DSLs generally being
something rare and normally stable. This fits most common DSL usage, but
the tight in-language integration D makes possible brings new opportunities
for using DSLs and code generation casually all over your program.



Well, I'm biased by heavy-handed ones. Say I have a (no longer) secret 
plan of doing a next-gen parser generator in D. Needless to say, swaths 
of non-trivial code generation. I'm all for embedding nicely, but I see 
very little _practical_ gain in CTFE+mixin here EVEN if CTFE didn't 
suck. See the point above about using the same metadata and types as the 
user application would.



I totally expect programming culture to evolve to the point where
something like 90% of all application code is being generated in a typical
project. D has a good base for promoting such a paradigm switch, and reducing
any unnecessary mental context switches is very important here.

This was pretty much the point I was trying to make with my DConf talk (
and have probably failed :) )


I liked the talk, but you know ... 4th or 5th talk with CTFE/mixin I 
think I might have been distracted :)


More specifically, this bright future of 90%+ concise DSL-driven programs 
is undermined by a simple truth: no amount of improvement in CTFE 
would make generators run faster than an optimized standalone tool 
invocation. The tool (a library written in D) may read D metadata just fine.


I heard D build times are an important part of its adoption, so...




And these couple of minutes are more like 30 minutes at a time. Worse
yet, unlike a proper build system, it doesn't keep track of actual changes
(the same regex patterns get recompiled over and over); at this point
seamless integration into the language starts feeling like a joke.

And speaking of seamless integration: just generate a symbol name out
of pattern at CTFE to link to later, at least this much can be done
relatively fast. And voila even the clunky run-time generation is not
half-bad at integration.

Unless things improve dramatically CTFE code generation + mixin is
just our funny painful toy.


Unfortunately the current implementation of the frontend falls a lot behind
the language's capabilities. There are no fundamental reasons why it can't
work with a better compiler.


It might solve most of the _current_ problems, but I foresee fundamental 
issues with the lack of global state in CTFE that, say, 10 years from now 
would look a lot like `#include` in C++. A major one: there is no way for 
the compiler to avoid recompiling generated code, as it has no knowledge 
of how it might have changed from the previous run.



In fact, deadalnix has made a very good case for
SDC taking over as the next D frontend exactly because of things like CTFE JIT.


Yeah, we ought to help him!

--
Dmitry Olshansky


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-12 Thread Andrei Alexandrescu via Digitalmars-d-announce

On 6/12/14, 4:04 AM, dennis luehring wrote:

you should write a big top post about your CTFE experience/problems - it
is important enough


yes please


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-12 Thread Atila Neves via Digitalmars-d-announce

On Thursday, 12 June 2014 at 08:42:49 UTC, Dmitry Olshansky wrote:

11-Jun-2014 22:03, Atila Neves пишет:

On Tuesday, 10 June 2014 at 19:36:57 UTC, bearophile wrote:
At about 40:42 in the "Thoughts on static regex" there is written
"even compile-time printf would be awesome". There is a patch about
__ctWrite on GitHub; it should be fixed and merged.

Bye,
bearophile


I wish I'd taken the mic at the end, and 2 days later Adam D. Ruppe said
what I was thinking of saying: unit test and debug the CTFE function at
runtime and then use it at compile-time when it's ready for production.




Yes, that's a starting point - a function working at R-T.

Yes, Dmitry brought up compiler bugs. But if you write a compile-time UT
and it fails, you'll know it wasn't because of your own code because the
run-time ones still pass.


It doesn't help that it's not your fault :)
And with a bit of __ctfe's to work around compiler bugs you 
won't be so sure of your code anymore.




Maybe there's still a place for something more than pragma msg, but I'd
definitely advocate for the above at least in the beginning. If anything,
easier ways to write compile-time UTs would be, to me, preferable to a
compile-time printf.



There is a nice assertCTFEable written by Kenji in Phobos. I 
think it's our private magic for now, but I see no reason not to 
expose it somewhere.
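For reference, the helper is tiny; this is a sketch of the idea (the real Phobos version is package-internal, so treat the body as an approximation): the same delegate is run once through CTFE and once at runtime.

```d
// Run the delegate once through CTFE (forced by static assert) and once
// at runtime; if only one of the two fails, the CTFE engine itself is
// the prime suspect rather than your code.
void assertCTFEable(alias dg)()
{
    static assert({ cast(void) dg(); return true; }());
    cast(void) dg();
}

void main()
{
    assertCTFEable!({
        int[] a = [1, 2, 3];
        assert(a[0] + a[$ - 1] == 4);
    });
}
```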



Atila


It helps; you won't lose time looking at your code and wondering. 
I thought of the __ctfe problem though: that would mean different 
code paths, and what I said wouldn't be valid anymore.


Atila


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-11 Thread Atila Neves via Digitalmars-d-announce

On Tuesday, 10 June 2014 at 19:36:57 UTC, bearophile wrote:
At about 40:42 in the "Thoughts on static regex" there is 
written "even compile-time printf would be awesome". There is a 
patch about __ctWrite on GitHub; it should be fixed and merged.


Bye,
bearophile


I wish I'd taken the mic at the end, and 2 days later Adam D. 
Ruppe said what I was thinking of saying: unit test and debug the 
CTFE function at runtime and then use it at compile-time when 
it's ready for production.


Yes, Dmitry brought up compiler bugs. But if you write a 
compile-time UT and it fails, you'll know it wasn't because of 
your own code because the run-time ones still pass.
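That workflow can be sketched with any CTFE-able function (sumDigits here is just an illustrative helper): the unittest exercises it at runtime, while the static assert forces the very same call through CTFE.

```d
// An ordinary function, debugged as normal runtime code first.
int sumDigits(int n)
{
    int s;
    for (; n > 0; n /= 10)
        s += n % 10;
    return s;
}

// Runtime check: a failure here means the function itself is wrong.
unittest { assert(sumDigits(1234) == 10); }

// Compile-time check: if this fails while the unittest passes,
// the bug is in the CTFE interpreter, not in your code.
static assert(sumDigits(1234) == 10);

void main() {}
```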


Maybe there's still a place for something more than pragma msg, 
but I'd definitely advocate for the above at least in the 
beginning. If anything, easier ways to write compile-time UTs 
would be, to me, preferable to a compile-time printf.


Atila


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-11 Thread Adam D. Ruppe via Digitalmars-d-announce

On Wednesday, 11 June 2014 at 18:03:06 UTC, Atila Neves wrote:
I wish I'd taken the mic at the end, and 2 days later Adam D. 
Ruppe said what I was thinking of saying: unit test and debug 
the CTFE function at runtime and then use it at compile-time 
when it's ready for production.


Aye. It wasn't long ago that this wasn't really possible because 
of how incomplete and buggy CTFE was; you kinda had to do it with 
special code. But now so much of the language works that there's a 
good chance that if it works at runtime it will work at compile 
time too.


I was really surprised with CTFE a few months ago when I tried to 
use my dom.d with it... and it actually worked. That's amazing to 
me.


But anyway, in general, the ctfe mixin stuff could be replaced 
with an external code generator, so yeah, that's the way I write 
them now - as a standalone code generator, then go back and 
enum it to actually use. (BTW I also like to generate fairly 
pretty code, e.g. indented properly, just because it makes it 
easier to read.)
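That generator-then-enum move, roughly (makeStruct and Point are illustrative names): develop the generator as a plain runtime function, then put an enum in front of the call so CTFE runs it, and mixin splices the result in as real code.

```d
// A generator developed and debugged as ordinary runtime code...
string makeStruct(string name, string[] fields)
{
    string code = "struct " ~ name ~ " {\n";
    foreach (f; fields)
        code ~= "    int " ~ f ~ ";\n";   // pretty-printed, indented output
    return code ~ "}\n";
}

// ...then promoted: `enum` forces CTFE, `mixin` compiles the result.
enum src = makeStruct("Point", ["x", "y"]);
mixin(src);

void main()
{
    auto p = Point(1, 2);
    assert(p.x == 1 && p.y == 2);
}
```

The same makeStruct can still be called at runtime, e.g. to dump the generated source for inspection.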


Yes, Dmitry brought up compiler bugs. But if you write a 
compile-time UT and it fails, you'll know it wasn't because of 
your own code because the run-time ones still pass.


Yeah, good point too.


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-10 Thread Dicebot via Digitalmars-d-announce
On Tuesday, 10 June 2014 at 15:37:11 UTC, Andrei Alexandrescu 
wrote:

Watch, discuss, upvote!

https://news.ycombinator.com/newest

https://twitter.com/D_Programming/status/476386465166135296

https://www.facebook.com/dlang.org/posts/863635576983458

http://www.reddit.com/r/programming/comments/27sjxf/dconf_2014_day_1_talk_4_inside_the_regular/


Andrei


http://youtu.be/hkaOciiP11c


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-10 Thread bearophile via Digitalmars-d-announce
At about 40:42 in the "Thoughts on static regex" there is written 
"even compile-time printf would be awesome". There is a patch 
about __ctWrite on GitHub; it should be fixed and merged.


Bye,
bearophile