Re: Templates problem

2016-09-09 Thread deXtoRious via Digitalmars-d-learn

On Friday, 9 September 2016 at 13:32:16 UTC, Russel Winder wrote:

Why write algorithms in C or C++ when you can do it in Chapel?


For the moment, the objective answers to that question seem to be: you 
need GPGPU (especially CUDA, which is vastly more convenient to 
use from C++ than from anything else), you're attached to a C++ 
codebase/libraries (calling C code is trivial, C++ far from it 
generally) or you need the low level flexibility afforded by 
C/C++. Personally, I probably won't consider Chapel for serious 
projects until at least the first point is solved.


Of course, there's also a host of subjective reasons -- 
attachment to more familiar languages, difficulty of selling new 
languages to coworkers, preference for larger communities, etc. 
All of those take time to overcome.


Re: Templates problem

2016-09-08 Thread deXtoRious via Digitalmars-d-learn
On Thursday, 8 September 2016 at 10:20:42 UTC, Russel Winder 
wrote:
On Wed, 2016-09-07 at 20:29 +0000, deXtoRious via 
Digitalmars-d-learn wrote:



[…]
More to the general point of the discussion, I find that most 
scientifically minded users of Python already appreciate some 
of the inherent advantages of lower level statically typed 
languages and would often rather write C/C++ code than descend into 
the likes of Cython. D has considerable advantages over C++ in 
conciseness and template facilities for achieving zero cost 
static polymorphism without descending into utter 
unreadability. Personally, I find myself still forced to write 
most of my non-Julia high performance code in C++ due to the 
available libraries and GPGPU support (especially CUDA), but 
in terms of language properties I'd much rather be writing D.


Or Chapel.


It's very early days for Chapel at the moment, but I don't really 
see it as being remotely comparable to D or even Julia; it's much 
closer to a DSL than a general purpose language. That's by no 
means a bad thing, it seems like it could be a very useful tool 
in a few years, but it's never going to completely substitute for 
the likes of Python, C++ or D even for purely scientific 
programming. I'm also a bit concerned about how limited the 
compile time facilities seem there at the moment, but I guess 
we'll just have to wait and see how it develops over the next 
couple of years.





Re: Templates problem

2016-09-07 Thread deXtoRious via Digitalmars-d-learn
On Wednesday, 7 September 2016 at 20:57:03 UTC, data pulverizer 
wrote:
On Wednesday, 7 September 2016 at 20:29:51 UTC, deXtoRious 
wrote:
On Wednesday, 7 September 2016 at 19:19:23 UTC, data 
pulverizer wrote:
The "One language to rule them all" motif of Julia has hit 
the rocks; one reason is that they now realize that their 
language is being held back because the compiler cannot infer 
certain types, for example: 
http://www.johnmyleswhite.com/notebook/2015/11/28/why-julias-dataframes-are-still-slow/


As an avid user of Julia, I'm going to have to disagree very 
strongly with this statement. The language is progressing very 
nicely and while it doesn't aim to be the best choice for 
every programming task imaginable...


Ahem (http://www.wired.com/2014/02/julia/), I'm not saying that 
the Julia founders approved that title, we all know how the 
press can inflate things, but there was a certain rhetoric that 
Julia was creating something super-special that would change 
everything.


That's just typical press nonsense, and even they quote Bezanson 
saying how Julia isn't at all suited to a whole host of 
applications. Julia certainly has (justifiable, imho, though only 
time will tell) aspirations of being useful in certain areas of 
general computing, not just scientific code, but they are far 
from universal applicability, let alone optimality. If nothing 
else, it's an interesting example of thinking rather far outside 
the usual box of language design, one with demonstrable real 
world applications.




Re: Templates problem

2016-09-07 Thread deXtoRious via Digitalmars-d-learn
On Wednesday, 7 September 2016 at 19:19:23 UTC, data pulverizer 
wrote:
The "One language to rule them all" motif of Julia has hit the 
rocks; one reason is that they now realize that their 
language is being held back because the compiler cannot infer 
certain types, for example: 
http://www.johnmyleswhite.com/notebook/2015/11/28/why-julias-dataframes-are-still-slow/


As an avid user of Julia, I'm going to have to disagree very 
strongly with this statement. The language is progressing very 
nicely and while it doesn't aim to be the best choice for every 
programming task imaginable, it already does an excellent job of 
letting a scientific programmer such as myself do most of my 
workflow in a single language with remarkable performance. 
Furthermore, the article you linked pertains to a simple type 
inference issue, exposed by the design constraints of a 
particular library. While certain design patterns can and often 
do lead to Python-style Julia code with optimal performance, you 
can always get there by manually enforcing type stability at the 
cost of less pretty code.


More to the general point of the discussion, I find that most 
scientifically minded users of Python already appreciate some of 
the inherent advantages of lower level statically typed languages 
and would often rather write C/C++ code than descend into the likes of 
Cython. D has considerable advantages over C++ in conciseness and 
template facilities for achieving zero cost static polymorphism 
without descending into utter unreadability. Personally, I find 
myself still forced to write most of my non-Julia high 
performance code in C++ due to the available libraries and GPGPU 
support (especially CUDA), but in terms of language properties 
I'd much rather be writing D.




Re: Simple performance question from a newcomer

2016-02-24 Thread dextorious via Digitalmars-d-learn

On Wednesday, 24 February 2016 at 03:33:14 UTC, Mike Parker wrote:

On Tuesday, 23 February 2016 at 20:03:30 UTC, dextorious wrote:
For instance, I am still not sure how to make it pass the -O5 
switch to the LDC2 compiler and the impression I got from the 
documentation is that explicit manual switches can only be 
supplied for the DMD compiler.


If you're referring to this:

"Additional flags passed to the D compiler - note that these 
flags are usually specific to the compiler in use, but a set of 
flags is automatically translated from DMD to the selected 
compiler"


My take is that a specific set of flags is automatically 
translated (so you don't need to make a separate dflags entry 
for each compiler you support if you only use those flags), but 
you can pass any compiler-specific flags you need.


That's part of what I'm referring to, yes. There doesn't seem to 
be any documentation on what gets translated and what doesn't.


For the moment, the only way I've found to manually pass specific 
compiler options ("-O5 -singleobj" in my case) is by setting the 
dflags attribute when defining a buildType. However, there 
doesn't seem to be any way to specify different dflags for 
different compilers, so I am forced to introduce separately named 
buildTypes for each compiler. Since I still need to manually 
specify the compiler using the --compiler option when running 
dub, this feels like I'm using a hacky workaround rather than a 
consistently designed CLI. Furthermore, from the documentation, I 
have no idea if what I'm doing is the intended way or just an 
ugly hack around whatever piece of information I've missed.
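
For reference, the dub package-format documentation describes field-name suffixes (such as "-dmd", "-gdc" and "-ldc") that restrict a build setting to one compiler. Assuming those suffixes are also honoured inside a buildTypes entry, a single custom build type could carry the LDC-only switches. The package name "sumbench" and the build type name "release-fast" below are made up for illustration, and this fragment has not been checked against any particular dub version:

```json
{
    "name": "sumbench",
    "buildTypes": {
        "release-fast": {
            "buildOptions": ["releaseMode", "optimize", "inline"],
            "dflags-ldc": ["-O5", "-singleobj"]
        }
    }
}
```

Such a build type would be selected with `dub build --build=release-fast --compiler=ldc2`. With another compiler, the "dflags-ldc" entry should simply be ignored, leaving only the generic buildOptions.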


Re: Simple performance question from a newcomer

2016-02-23 Thread dextorious via Digitalmars-d-learn

On Tuesday, 23 February 2016 at 14:07:22 UTC, Marc Schütz wrote:

On Tuesday, 23 February 2016 at 11:10:40 UTC, ixid wrote:
We really need the standard algorithms to be fast and perhaps 
have separate ones for perfect technical accuracy.




While I agree with most of what you're saying, I don't think we 
should prioritize performance over accuracy or correctness. 
Especially for numerics people, precision is very important, 
and it can make just as bad a first impression if we don't get 
this right. We can however make the note in the documentation 
(which already talks about performance) a bit more prominent: 
http://dlang.org/phobos/std_algorithm_iteration.html#sum


Being new to the language, I certainly make no claims about what 
the Phobos library should do, but coming from a heavy numerics 
background in many languages, I can say that this is the first 
time I've seen a common summation function do anything beyond 
naive summation. Some languages feature more accurate options 
separately, but never as the default, so it did not occur to me 
to specifically check the documentation for something like sum() 
(which is my fault, of course, no issues there). Having the more 
accurate pairwise summation algorithm in the standard library is 
certainly worthwhile for some applications, but I was a bit 
surprised to see it as the default.
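
To make the distinction concrete, here is a minimal sketch of the two strategies being contrasted. It is a simplified illustration, not the Phobos implementation: the function names `naiveSum` and `pairwiseSum` and the cutoff of 8 elements are assumptions chosen for this example.

```d
import std.stdio : writefln;

/// Naive summation: a single running accumulator, left to right.
/// Rounding error can grow roughly linearly with the number of elements.
double naiveSum(const(double)[] xs)
{
    double acc = 0.0;
    foreach (x; xs)
        acc += x;
    return acc;
}

/// Pairwise summation: split the input in half, sum each half
/// recursively, then add the two partial sums. Error growth is
/// roughly logarithmic, at the price of extra bookkeeping.
/// The cutoff (8) is an arbitrary choice for this sketch.
double pairwiseSum(const(double)[] xs)
{
    if (xs.length <= 8)
        return naiveSum(xs);
    immutable mid = xs.length / 2;
    return pairwiseSum(xs[0 .. mid]) + pairwiseSum(xs[mid .. $]);
}

void main()
{
    // One million copies of 0.1, which is not exactly representable
    // in binary floating point, so rounding error accumulates.
    auto xs = new double[](1_000_000);
    xs[] = 0.1;

    writefln("naive:    %.10f", naiveSum(xs));
    writefln("pairwise: %.10f", pairwiseSum(xs));
}
```

The recursion is what costs time relative to a plain loop, which is consistent with the slowdown discussed in this thread when `sum` is applied to a random-access range.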


Re: Simple performance question from a newcomer

2016-02-23 Thread dextorious via Digitalmars-d-learn

On Tuesday, 23 February 2016 at 11:10:40 UTC, ixid wrote:

On Monday, 22 February 2016 at 15:43:23 UTC, dextorious wrote:
I do have to wonder, however, about the default settings of 
dub in this case. Having gone through its documentation, I 
might still not have guessed to try the compiler options you 
provided, thereby losing out on a 2-3x performance 
improvement. What build options did you use in your dub.json 
that it managed to translate to the correct compiler switches?


Your experience is exactly what the D community needs to get 
right. You've come in as an interested user with patience and 
initially D has offered slightly disappointing performance for 
both technical reasons and because of the different compilers. 
You've gotten to the right place in the end but we need point A 
to point B to be a lot smoother and more obvious so more people 
get a good initial impression of D.


Every D user thread seems to go like this: someone starts with 
DMD, they then struggle a little and hopefully get LDC working 
with a list of slightly obscure compiler switches offered. A 
standard algorithm performs disappointingly for somewhat valid 
technical reasons and more clunky alternatives are then 
deployed. We really need the standard algorithms to be fast and 
perhaps have separate ones for perfect technical accuracy.


What are your thoughts on D now? What would have helped you get 
to the right place much faster?


Personally, I think a few aspects of documentation for the 
various compilers, dub and possibly the dlang.org website itself 
could be improved, if accessibility is considered important. For 
instance, just to take my journey with trying out D as an 
example, I can immediately list a few points where I 
misunderstood or failed to find relevant information:


1. While the dlang.org website does a good job presenting the 
three compilers side by side with a short pro/con list for each 
and does mention that DMD produces slower code, I did not at 
first expect the difference to be half an order of magnitude or 
more. In retrospect, after reading the forums and learning about 
how each compiler works, this is quite obvious, but the initial 
impression was misleading.


2. The LDC compiler gave me a few issues during setup, 
particularly on Windows. The binaries supplied are dynamically 
linked against the MSVS2015 runtime (and will fail on any other 
system) and seem to require a full Visual Studio installation. I 
assume there are good reasons for this (though I hope in the 
future a more widely usable version could be made available), but 
the fact itself could be made clearer on the download page (it 
can be found after some searching on the D wiki and the forums).


3. The documentation for the dub package is useful, but somewhat 
difficult to read due to how it is structured and does not seem 
complete. For instance, I am still not sure how to make it pass 
the -O5 switch to the LDC2 compiler and the impression I got from 
the documentation is that explicit manual switches can only be 
supplied for the DMD compiler. It says that when using other 
compilers, the relevant switches are automatically translated to 
appropriate options for GDC/LDC, but no further details are 
supplied and no matter what options I set for the DMD compiler, 
using --compiler=ldc2 only yields -O and not -O5. For the moment, 
I'm compiling my code and managing dependencies manually like I 
would in C++, which is just fine for me personally, but does 
leave a slightly disappointing impression about what is 
apparently considered a semi-official package manager for the D 
language.


Of course, this is just my anecdotal experience and should not be 
taken as major criticism. It may be that I missed something or 
did not do enough research. Certainly, some amount of adjustment 
is to be expected when learning a new language, but there does 
seem to be some room for improvement.


Re: Simple performance question from a newcomer

2016-02-22 Thread dextorious via Digitalmars-d-learn

On Sunday, 21 February 2016 at 16:20:30 UTC, bachmeier wrote:
First, a minor point, the D community is usually pretty careful 
not to frown on a particular coding style (unlike some 
communities) so if you are comfortable writing loops and it 
gives you the fastest code, you should do so.


On the performance issue, you can see this related post about 
performance with reduce:

http://forum.dlang.org/post/mailman.4829.1434623275.7663.digitalmar...@puremagic.com

This was Walter's response:
http://forum.dlang.org/post/mlvb40$1tdf$1...@digitalmars.com

And this shows that LDC flat out does a better job of 
optimization in this case:

http://forum.dlang.org/post/mailman.4899.1434779705.7663.digitalmar...@puremagic.com


While I certainly do not doubt the open-mindedness of the D 
community, it was in part Walter Bright's statement during a 
keynote speech of how "loops are bugs" that motivated me to look 
at D for a fresh approach to writing numerical code. For decades, 
explicit loops have been the only way to attain good performance 
for certain kinds of code in virtually all languages (discounting 
a few quirky high level languages like MATLAB) and the notion 
that this need not be the case is quite attractive to many 
people, myself included.


While the point Walter makes, that there is no mathematical 
reason ranges should be slower than loops and that loops are 
generally easier to get wrong, is certainly true, D is the first 
general purpose language I've ever seen that makes this sentiment 
come close to reality.


Re: Simple performance question from a newcomer

2016-02-22 Thread dextorious via Digitalmars-d-learn
First of all, I am pleasantly surprised by the rapid influx of 
helpful responses. The community here seems quite wonderful. In 
the interests of not cluttering the thread too much, since the 
advice given here has many commonalities, I will only try to 
respond once to each type of suggestion.


On Sunday, 21 February 2016 at 16:29:26 UTC, ZombineDev wrote:
The problem is not with ranges, but with the particular 
algorithm used for summing. If you look at the docs 
(http://dlang.org/phobos-prerelease/std_algorithm_iteration.html#.sum) 
you'll see that if the range has random access, `sum` will use 
the pairwise algorithm. As for the second and third tests, the 
problem is with DMD, which should not be used when measuring 
performance (but only for development, because it has fast 
compile times).

...
According to `dub --verbose`, my command-line was roughly this:
ldc2 -ofapp -release -O5 -singleobj -w source/app.d
../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/internal.d
../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/iteration.d
../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/package.d
../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/selection.d
../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/slice.d


It appears that I cannot use the GDC compiler for this particular 
problem due to it using a comparatively older version of the DMD 
frontend (I understand Mir requires >=2.068), but I did manage to 
get LDC working on my system after a bit of work. Since I've been 
using dub to manage my project, I used the default "release" 
build type. I also tried compiling manually with LDC, using the 
-O5 switch you mentioned. These are the results (I increased the 
iteration count to lessen the noise, the array is now 10000x20, 
each function is run a thousand times):


           DMD        LDC (dub)   LDC (-release -enable-inlining -O5 -w -singleobj)

sumtest1:  12067 ms   6899 ms     1940 ms
sumtest2:   3076 ms   1349 ms      452 ms
sumtest3:   2526 ms    847 ms      434 ms
sumtest4:   5614 ms   1481 ms      452 ms

The sumtest1, 2 and 3 functions are as given in the first post, 
sumtest4 uses the range.reduce!((a, b) => a + b) approach to 
enforce naive summation. Much to my satisfaction, the 
range.reduce version is now exactly as quick as the traditional 
loop and while function inlining isn't quite perfect, the 4% 
performance penalty incurred by the 10_000 function calls (or 
whatever inlined form the function finally takes) is quite 
acceptable.
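
The difference between the two range-based approaches can be sketched as follows. This is an illustration only, not the benchmark code from the first post: the helper name `naiveViaReduce` is hypothetical, and an explicit seed of 0.0 is passed (an addition not in the expression quoted above) so that empty input is handled.

```d
import std.algorithm.iteration : reduce, sum;
import std.stdio : writeln;

/// Plain left-to-right accumulation expressed with reduce.
/// No compensated or pairwise scheme is involved here.
double naiveViaReduce(const(double)[] xs)
{
    return reduce!((a, b) => a + b)(0.0, xs);
}

void main()
{
    double[] xs = [1.0, 2.0, 3.0, 4.0];

    // For a random-access range of floating-point values, sum picks
    // the more accurate (and slower) pairwise algorithm.
    writeln(xs.sum);

    // reduce just applies the lambda element by element.
    writeln(naiveViaReduce(xs));
}
```

Both calls give the same result on such small, exactly representable input. The two only diverge in rounding behaviour and speed on long arrays.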


I do have to wonder, however, about the default settings of dub 
in this case. Having gone through its documentation, I might 
still not have guessed to try the compiler options you provided, 
thereby losing out on a 2-3x performance improvement. What build 
options did you use in your dub.json that it managed to translate 
to the correct compiler switches?


Simple performance question from a newcomer

2016-02-21 Thread dextorious via Digitalmars-d-learn
I've been vaguely aware of D for many years, but the recent 
addition of std.experimental.ndslice finally inspired me to give 
it a try, since my main expertise lies in the domain of 
scientific computing and I primarily use Python/Julia/C++, where 
multidimensional arrays can be handled with a great deal of 
expressiveness and flexibility. Before writing anything serious, 
I wanted to get a sense for the kind of code I would have to 
write to get the best performance for numerical calculations, so 
I wrote a trivial summation benchmark. The following code gave me 
slightly surprising results:


import std.stdio;
import std.array : array, uninitializedArray;
import std.algorithm;
import std.datetime;
import std.range;
import std.experimental.ndslice;

void main() {
    int N = 1000;      // number of rows
    int Q = 20;        // elements per row
    int times = 1_000; // benchmark repetitions
    double[] res1 = uninitializedArray!(double[])(N);
    double[] res2 = uninitializedArray!(double[])(N);
    double[] res3 = uninitializedArray!(double[])(N);
    // N x Q slice over evenly spaced values in [0, 1)
    auto f = iota(0.0, 1.0, 1.0 / Q / N).sliced(N, Q);
    StopWatch sw;
    double t0, t1, t2;
    sw.start();
    foreach (unused; 0..times) {
        for (int i=0; i […]

[…] a + b) instead of f.sum in sumtest1, 
but that yielded even worse performance. I did not try the 
GDC/LDC compilers yet, since they don't seem to be up to date on 
the standard library and don't include the ndslice package last I 
checked.


Now, seeing as how my experience writing D is literally a few 
hours, is there anything I did blatantly wrong? Did I miss any 
optimizations? Most importantly, can the elegant operator 
chaining style be generally made as fast as the explicit loops 
we've all been writing for decades?