Re: foreach - premature optimization vs cultivating good habits

2015-02-01 Thread Laeeth Isharc via Digitalmars-d-learn
Thank you Adam, Bbaz and Ola for the helpful thoughts.  I dumped 
them in a wiki page off the sandbox but needs editing and 
refining.


Re: foreach - premature optimization vs cultivating good habits

2015-01-31 Thread via Digitalmars-d-learn

On Friday, 30 January 2015 at 14:41:11 UTC, Laeeth Isharc wrote:
Thanks, Adam.  That's what I had thought (your first 
paragraph), but something Ola on a different thread confused me 
and made me think I didn't understand it, and I wanted to pin 
it down.


There is always significant optimization effects in long running 
loops:

- SIMD
- cache locality / prefetching

For the former (SIMD) you need to make sure that good code is 
generated either by hand, by using vectorized libraries or by 
auto vectorization.


For the latter (cache) you need to make sure that the prefetcher 
is able to predict or is being told to prefetch explicitly and 
also that the working set is small enough to stay at the faster 
cache levels.


If you want good performance you cannot ignore any of these, and 
you have to design the data structures and algorithms for it. 
Prefetching has to happen maybe 100 instructions before the 
actual load from memory and AVX requires byte alignment and a 
layout that fits the algorithm. On next gen Xeon Skylake I think 
the alignment might go up to 64 byte and you have 512 bits wide 
registers (so you can do 8 64 bit floating point operations in 
parallel per core). The difference between issuing 1-4 ops and 
issuing 8-16 per time unit is noticable...


An of course, the closer your code is to theoretical throughput 
in the CPU, the more critical it becomes to not wait for memory 
loads.


This is also a moving target...


Re: foreach - premature optimization vs cultivating good habits

2015-01-30 Thread Laeeth Isharc via Digitalmars-d-learn

On Friday, 30 January 2015 at 12:55:20 UTC, Adam D. Ruppe wrote:

On Friday, 30 January 2015 at 11:55:16 UTC, Laeeth Isharc wrote:
As I understand it, foreach allocates when a simple C-style 
for using an array index would not.


foreach is just syntax sugar over a for loop. If there's any 
allocations, it is because your code had some, it isn't inherit 
to the loop. The doc definition even lists the translation of 
foreach to for in the case of ranges explicitly:


http://dlang.org/statement.html#ForeachStatement


The most likely allocation would be to a user-defined opApply 
delegate, and you can prevent that by making it opApply(scope 
your_delegate) - the scope word prevents any closure allocation.


Thanks, Adam.  That's what I had thought (your first paragraph), 
but something Ola on a different thread confused me and made me 
think I didn't understand it, and I wanted to pin it down.


The second paragraph is very helpful - appreciate it.


Laeeth.


Re: foreach - premature optimization vs cultivating good habits

2015-01-30 Thread John Colvin via Digitalmars-d-learn

On Friday, 30 January 2015 at 11:55:16 UTC, Laeeth Isharc wrote:

Hi.

The standard advice is not to worry about memory usage and 
execution speed until profiling shows you where the problem is, 
and I respect Knuth greatly as a thinker.


Still, one may learn from others' experience and cultivate good 
habits early.  To say that one should not prematurely optimize 
is not to say that one should not try to avoid cases that tend 
to be really bad, and I would rather learn from others what 
these are then learn only the hard way.


For the time being I am still at early development stage and 
have not yet tested things with the larger data sets I 
anticipate eventually using.  It's cheap to make slightly 
different design decisions early, but much more painful further 
down the line, particularly given my context.


As I understand it, foreach allocates when a simple C-style for 
using an array index would not.  I would like to learn more 
about when this turns particularly expensive, and perhaps I 
could put this up on the wiki if people think it is a good idea.


What exactly does it allocate, and how often, and how large is 
this in relation to the size of the underlying data 
(structs/classes/ranges)?  Are there any cache effects to 
consider?  Happy to go to the source code if you can give me 
some pointers.


Thanks in advice for any thoughts.



Laeeth.


I think you're thinking of java's foreach syntax, which IIRC does 
cause allocations.


Re: foreach - premature optimization vs cultivating good habits

2015-01-30 Thread Adam D. Ruppe via Digitalmars-d-learn

On Friday, 30 January 2015 at 11:55:16 UTC, Laeeth Isharc wrote:
As I understand it, foreach allocates when a simple C-style for 
using an array index would not.


foreach is just syntax sugar over a for loop. If there's any 
allocations, it is because your code had some, it isn't inherit 
to the loop. The doc definition even lists the translation of 
foreach to for in the case of ranges explicitly:


http://dlang.org/statement.html#ForeachStatement


The most likely allocation would be to a user-defined opApply 
delegate, and you can prevent that by making it opApply(scope 
your_delegate) - the scope word prevents any closure allocation.




Re: foreach - premature optimization vs cultivating good habits

2015-01-30 Thread BBaz via Digitalmars-d-learn

On Friday, 30 January 2015 at 11:55:16 UTC, Laeeth Isharc wrote:

Hi.

The standard advice is not to worry about memory usage and 
execution speed until profiling shows you where the problem is, 
and I respect Knuth greatly as a thinker.


Still, one may learn from others' experience and cultivate good 
habits early.  To say that one should not prematurely optimize 
is not to say that one should not try to avoid cases that tend 
to be really bad, and I would rather learn from others what 
these are then learn only the hard way.


For the time being I am still at early development stage and 
have not yet tested things with the larger data sets I 
anticipate eventually using.  It's cheap to make slightly 
different design decisions early, but much more painful further 
down the line, particularly given my context.


As I understand it, foreach allocates when a simple C-style for 
using an array index would not.  I would like to learn more 
about when this turns particularly expensive, and perhaps I 
could put this up on the wiki if people think it is a good idea.


What exactly does it allocate, and how often, and how large is 
this in relation to the size of the underlying data 
(structs/classes/ranges)?  Are there any cache effects to 
consider?  Happy to go to the source code if you can give me 
some pointers.


Thanks in advice for any thoughts.



Laeeth.


foreach() is good for linked list because it uses a delegate so 
it avoids to cross the items from 0 to i at each iteration.


for() is good for arrays because of the pointer arithmetic.

But the D way for foreach'es is to use ranges: popFront etc, 
although i often feel too lazy to use them...


You'll probably get more technical answers...