Hello Erik, Frank,

> In case this grid function is expected to be a performance
> improvement:

It was intended as a performance improvement. The changes make the code faster, though I do not know whether only the sum total of all the changes made to prim2con etc. makes it faster, or whether the sqrtdetg part alone already does.

> I would guess that calculating sqrt(det(g_ij)) takes about 20 to 30
> cycles, if the 3-metric g_ij is in the D1 cache, i.e. if the 3-metric
> is already used in the same loop. Accessing a grid function element
> that is stored in memory (assuming it remains in the L3 cache) costs
> about 50 cycles.

Interesting. I had not realized that a sqrt is actually faster than a memory access (multiplications and additions: yes, of course; divisions: I wouldn't have known). So I guess what one has to do is actually run a test that changes just this one aspect and, whatever the result is, document that in the code. Looking at the code, it might be possible that the positive effect comes from avoiding multiple calls to sqrt for the same argument and/or from passing sqrt(detg) instead of detg to the prim2con and con2prim routines. Some testing seems in order.

> Of course, the details will vary between systems, and will depend on
> which cache level holds the data, and what optimizations the compiler
> can apply to the loop. Don't take these numbers at face value. The
> point here is that, although sqrt may "look expensive", it may well
> be cheaper to re-calculate than to pre-calculate and store it.

OK. I'll test them (or see if I can talk someone else into testing them).

Yours,
Roland
_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users
