Re: [fpc-devel] Nested functions in numlib

2017-04-05 Thread Michael Schnell
Performance-wise, there is also this: for some tasks (such as matrix 
multiplication), parallel calculation could increase performance greatly 
on modern multi-core machines. This can be done, e.g., by using a 
thread pool. (I once wrote a thread pool implementation based on TThread, 
but I suppose there are more "official" sources.)
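
As a rough illustration of the idea (not a real thread pool, and not 
numlib code), the sketch below splits a matrix product into row bands 
and hands each band to its own TThread. The names TRowBandWorker, N and 
WorkerCount are made up for this example; a pool would reuse a fixed set 
of threads instead of creating them per call.

```pascal
program ParMatMul;
{$mode objfpc}
uses
  {$ifdef unix}cthreads,{$endif}  // thread manager, needed on Unix targets
  Classes;

const
  N = 200;          // matrix dimension (hypothetical)
  WorkerCount = 4;  // number of row bands / threads (hypothetical)

type
  TMatrix = array[0..N - 1, 0..N - 1] of Double;

  // Computes rows FFirstRow..FLastRow of C := A * B.
  TRowBandWorker = class(TThread)
  private
    FFirstRow, FLastRow: Integer;
  protected
    procedure Execute; override;
  public
    constructor Create(AFirstRow, ALastRow: Integer);
  end;

var
  A, B, C: TMatrix;

constructor TRowBandWorker.Create(AFirstRow, ALastRow: Integer);
begin
  inherited Create(True);  // created suspended; the caller invokes Start
  FFirstRow := AFirstRow;
  FLastRow := ALastRow;
end;

procedure TRowBandWorker.Execute;
var
  i, j, k: Integer;
  s: Double;
begin
  // Each worker writes only its own rows of C, so no locking is needed.
  for i := FFirstRow to FLastRow do
    for j := 0 to N - 1 do
    begin
      s := 0.0;
      for k := 0 to N - 1 do
        s := s + A[i, k] * B[k, j];
      C[i, j] := s;
    end;
end;

var
  Workers: array[0..WorkerCount - 1] of TRowBandWorker;
  w, i, j: Integer;
  Ok: Boolean;
begin
  // Test data: B is the identity matrix, so C must equal A afterwards.
  for i := 0 to N - 1 do
    for j := 0 to N - 1 do
    begin
      A[i, j] := i + j;
      if i = j then B[i, j] := 1.0 else B[i, j] := 0.0;
    end;

  for w := 0 to WorkerCount - 1 do
  begin
    Workers[w] := TRowBandWorker.Create(w * N div WorkerCount,
                                        (w + 1) * N div WorkerCount - 1);
    Workers[w].Start;
  end;

  for w := 0 to WorkerCount - 1 do
  begin
    Workers[w].WaitFor;  // block until this band is finished
    Workers[w].Free;
  end;

  Ok := True;
  for i := 0 to N - 1 do
    for j := 0 to N - 1 do
      if C[i, j] <> A[i, j] then Ok := False;
  if Ok then WriteLn('result matches') else WriteLn('mismatch');
end.
```

Whether this pays off depends on the matrix size: for small matrices the 
cost of creating and joining threads can outweigh the gain.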


-Michael
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Nested functions in numlib

2017-04-04 Thread Werner Pamler

On 04.04.2017 at 03:28, Marco van de Voort wrote:

> Did you test performance? Repeated access to parent frame in tight loops
> might be suboptimal. Could maybe be helped with some pointer work?


Right, I should have done that before asking...

Here are the results of a test running the original roof1r routine (A), 
the modified one using the nested function (B), and another modified one 
using a non-nested function but calling the version with the nested 
function (C). In each case, several functions are passed to the root 
finder, which is called 5 million times, each call with a (reproducibly) 
different parameter:


f(x) = x
  (A) ORIGINAL version:                         0.656s for 500 runs (check: y = 0.)
  (B) NESTED version:                           0.703s for 500 runs (7%)
  (C) Global function calling nested function:  0.735s for 500 runs (12%)

f(x) = x^2
  (A) ORIGINAL version:                         6.296s for 500 runs (check: y = 0.)
  (B) NESTED version:                           6.313s for 500 runs (0%)
  (C) Global function calling nested function:  6.546s for 500 runs (4%)

f(x) = exp(x)
  (A) ORIGINAL version:                         6.734s for 500 runs (check: y = 0.)
  (B) NESTED version:                           6.703s for 500 runs (0%)
  (C) Global function calling nested function:  6.890s for 500 runs (2%)

f(x) = arcsin(x)
  (A) ORIGINAL version:                         5.718s for 500 runs (check: y = 0.)
  (B) NESTED version:                           5.718s for 500 runs (0%)
  (C) Global function calling nested function:  5.937s for 500 runs (4%)

f(x) = erf(x)
  (A) ORIGINAL version:                         6.391s for 500 runs (check: y = 0.)
  (B) NESTED version:                           6.422s for 500 runs (0%)
  (C) Global function calling nested function:  6.673s for 500 runs (4%)

f(x) = gammaLn(x)
  (A) ORIGINAL version:                        15.260s for 500 runs (check: y = 0.)
  (B) NESTED version:                          15.142s for 500 runs (-1%)
  (C) Global function calling nested function: 15.426s for 500 runs (1%)


I interpret these results as showing that calling variant C causes no 
dramatic slow-down. Variant B (nested function) runs at roughly the same 
speed as the original procedure.
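
For readers who have not seen the three variants, here is a minimal 
sketch of how they relate. It is not the numlib code: the names TFunc, 
TNestedFunc, FindRoot and FindRootNested are invented for this example, 
and a plain bisection stands in for whatever roof1r does internally. It 
assumes a compiler with the nestedprocvars modeswitch.

```pascal
program NestedDemo;
{$mode objfpc}
{$modeswitch nestedprocvars}

type
  // (A) plain function pointer, the style of the original interface
  TFunc = function(x: Double): Double;
  // (B) nested procedural type: can also point to a local function
  //     that reads variables of its enclosing routine
  TNestedFunc = function(x: Double): Double is nested;

// (B) Bisection on [a, b]; assumes f(a) and f(b) have opposite signs.
function FindRootNested(f: TNestedFunc; a, b, tol: Double): Double;
var
  m, fa, fm: Double;
begin
  fa := f(a);
  while (b - a) > tol do
  begin
    m := 0.5 * (a + b);
    fm := f(m);
    if (fa < 0) = (fm < 0) then
    begin
      a := m;      // root lies in the upper half
      fa := fm;
    end
    else
      b := m;      // root lies in the lower half
  end;
  Result := 0.5 * (a + b);
end;

// (C) Old-style entry point kept for existing callers: it wraps the
//     plain pointer in a local function and calls the nested version.
function FindRoot(f: TFunc; a, b, tol: Double): Double;

  function Wrapper(x: Double): Double;
  begin
    Result := f(x);  // f is read from the parent frame on every call
  end;

begin
  Result := FindRootNested(@Wrapper, a, b, tol);
end;

// A global test function: root at the cube root of 2.
function Cube(x: Double): Double;
begin
  Result := x * x * x - 2.0;
end;

// What the nested type makes possible: a parameterised function
// without a global variable.
procedure SolveWithParameter(c: Double);

  function Shifted(x: Double): Double;
  begin
    Result := x * x * x - c;  // c comes from the enclosing procedure
  end;

begin
  WriteLn(FindRootNested(@Shifted, 0.0, 3.0, 1e-9):0:6);
end;

begin
  WriteLn(FindRoot(@Cube, 0.0, 3.0, 1e-9):0:6);  // variant C
  SolveWithParameter(2.0);                       // variant B
end.
```

Both calls solve x^3 = 2 and should print the same value. In variant C 
every evaluation goes through one extra call (Wrapper, then f), which is 
consistent with C being the slowest variant in the timings above.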



Re: [fpc-devel] Nested functions in numlib

2017-04-04 Thread Marco van de Voort
In our previous episode, Marco van de Voort said:
> > Is there a chance that such a patch would be accepted?
> 
> Did you test performance? Repeated access to parent frame in tight loops
> might be suboptimal. Could maybe be helped with some pointer work?

(No, it can't be helped with pointer work: it is a loop around a function
call, not a memory access. Don't reply in the middle of the night, Marco.)


Re: [fpc-devel] Nested functions in numlib

2017-04-03 Thread Marco van de Voort
In our previous episode, Werner Pamler said:
> Is there a chance that such a patch would be accepted?

Did you test performance? Repeated access to parent frame in tight loops
might be suboptimal. Could maybe be helped with some pointer work?