Hi Ed,

On Thu, Jul 07, 2022 at 03:38:23PM +0000, Ed . wrote:
> ...
> A wrinkle in your specified task is that, as mentioned above, simple numerical
> addition is a very memory-bound activity. If the task were a bit more complex
> (making more use of the registers and CPU caches), the upper bound of
> beneficial thread use might well be higher than the 2 threads your results show.


I see. It goes much better if I broadcast, for example, a small matrix
multiplication (as in Eric's case) instead of a simple scalar
multiplication.
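One back-of-the-envelope way to see why the matrix case threads better is to compare arithmetic intensity, i.e. floating-point operations per byte of memory traffic. The sketch below is in Python purely for illustration (it is not PDL code), and its cost model is an assumption: float64 elements, each operand read from or written to main memory exactly once, and an n x n matrix product counted as 2*n**3 flops. The function names are hypothetical.

```python
"""Rough arithmetic-intensity comparison (flops per byte of memory traffic).

Assumed cost model (illustrative only): float64 elements (8 bytes), every
operand moves through main memory exactly once, and an n x n matrix product
is counted as 2*n**3 flops (n**3 multiplies plus roughly n**3 adds).
"""

BYTES_PER_DOUBLE = 8


def scalar_multiply_intensity() -> float:
    # y[i] = a * x[i]: one multiply per element; read x[i], write y[i].
    flops = 1
    bytes_moved = 2 * BYTES_PER_DOUBLE
    return flops / bytes_moved


def small_matmul_intensity(n: int) -> float:
    # C = A @ B for one n x n block in a broadcast batch:
    # read A and B (2*n*n values), write C (n*n values).
    flops = 2 * n ** 3
    bytes_moved = 3 * n * n * BYTES_PER_DOUBLE
    return flops / bytes_moved


if __name__ == "__main__":
    print(f"scalar multiply: {scalar_multiply_intensity():.4f} flop/byte")
    for n in (2, 3, 4, 8):
        print(f"{n}x{n} matmul: {small_matmul_intensity(n):.4f} flop/byte")
```

Under this model the scalar case stays at 1/16 flop/byte however many threads run, while the matrix case grows as n/12. That is consistent with the quoted point: the more work done per byte fetched, the more threads can be kept busy before memory bandwidth becomes the limit.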

> A very contrived example that I've used to explore this while investigating
> the floating-point benchmark stuff
> (https://github.com/Fourmilab/floating_point_benchmarks/pull/1/files) (if you
> could hack it up with your additional set_autopthread_targ, that would be very
> valuable):

I'm not sure I understand what you mean by 'if you hack...'.

> ...

Regards,
Luis


--

                                                                  o
W. Luis Mochán,                      | tel:(52)(777)329-1734     /<(*)
Instituto de Ciencias Físicas, UNAM  | fax:(52)(777)317-5388     `>/   /\
Av. Universidad s/n CP 62210         |                           (*)/\/  \
Cuernavaca, Morelos, México          | moc...@fis.unam.mx   /\_/\__/
GPG: 791EB9EB, C949 3F81 6D9B 1191 9A16  C2DF 5F0A C52B 791E B9EB


_______________________________________________
pdl-general mailing list
pdl-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pdl-general
