[julia-users] Re: multiple processes question

2016-10-12 Thread noel ryan
Thanks a lot, Chris. Your insight has been extremely helpful.




[julia-users] Re: multiple processes question

2016-10-05 Thread Chris Rackauckas
See this blog post.

If your code is perfectly efficient, then yes, the number of processes 
should equal the number of cores (think of something like BLAS, which is 
written with the most efficient threaded algorithms you could imagine). But 
for your simple homework assignment? There will be time lost due to 
inefficiencies. It ends up being much faster to overload the scheduler so 
that, while one process is slow because it is moving data or something like 
that, the scheduler kicks another one in and something is always computing 
on each core. Even though this will cause some cache misses, if your program 
is not perfectly efficient, this will win the tradeoff.

So while in theory you set threads = cores, write efficient code, and this 
choice is best because there are no cache misses... that's generally not 
reality. 
This is the same principle as Amdahl's Law, though that's an iffy 
explanation since normally that law is in the context of efficiency 
measured as what percentage of the program is serial vs parallel. Here the 
efficiency loss is due to the higher-level programming context not being 
100% bare-metal efficient, but it's the same idea. Note that your Monte 
Carlo pi calculation probably is 100% parallel, so it would "look" like 
Amdahl's law type things don't apply, but that's only when you abstract and 
ignore all of the details of computing (caching, data movement, etc.).
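
For reference, the textbook form of Amdahl's law gives the speedup on n 
workers as 1 / ((1 - p) + p / n), where p is the fraction of the program 
that runs in parallel. Here is a minimal sketch of that formula, in Python 
rather than Julia purely for illustration; the function and parameter names 
are my own and not from this thread. It shows why a fully parallel program 
(p = 1) looks exempt on paper: the formula knows nothing about caching or 
data movement.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup predicted by Amdahl's law.

    parallel_fraction: share of the runtime that can run in parallel (0 to 1).
    workers: number of workers sharing the parallel part.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Compare a fully parallel program with one that is 95% parallel.
    for workers in (1, 2, 4, 17):
        ideal = amdahl_speedup(1.0, workers)
        partial = amdahl_speedup(0.95, workers)
        print(f"{workers:2d} workers: p=1.00 -> {ideal:.2f}x, p=0.95 -> {partial:.2f}x")
```

With p = 1 the predicted speedup is just the number of workers, which is the 
idealized picture the tutorials rely on.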

I was taught the same thing, yet if you benchmark consistently, you find 
that it holds only for the most performant, optimal threaded/MPI code. It 
shouldn't be taught anymore: you should just be taught to benchmark your 
code.
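
A minimal sketch of what such a benchmark could look like for the Monte 
Carlo pi test, written in Python with the standard multiprocessing module 
rather than Julia, purely for illustration. The function names, the sample 
count, and the list of process counts are all assumptions made for this 
example, and the timing includes the cost of starting the worker pool.

```python
import random
import time
from multiprocessing import Pool


def count_hits(args):
    """Count random points in the unit square that fall inside the quarter circle."""
    samples, seed = args
    rng = random.Random(seed)  # separate generator for each chunk
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def split_work(total, workers):
    """Split `total` samples into `workers` chunks whose sizes differ by at most one."""
    base, extra = divmod(total, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]


def estimate_pi(total, workers):
    """Estimate pi using `workers` processes; return (estimate, elapsed seconds)."""
    chunks = [(n, seed) for seed, n in enumerate(split_work(total, workers))]
    start = time.perf_counter()
    with Pool(processes=workers) as pool:
        hits = sum(pool.map(count_hits, chunks))
    elapsed = time.perf_counter() - start
    return 4.0 * hits / total, elapsed


if __name__ == "__main__":
    total = 200_000  # hypothetical sample count, far below the 1 billion in the question
    for workers in (1, 2, 4, 8, 16):
        estimate, seconds = estimate_pi(total, workers)
        print(f"{workers:2d} processes: pi ~ {estimate:.4f} in {seconds:.3f} s")
```

Run the sweep several times and on the actual machine before drawing 
conclusions: the fastest process count depends on the hardware, the 
workload, and how much of each run is startup overhead.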

On Wednesday, October 5, 2016 at 6:02:41 AM UTC-7, noel ryan wrote:
>
> I am an undergraduate working on a Julia parallelism project. I have read 
> in quite a few tutorials that to get the best parallel performance I should 
> spawn a number of processes equal to the number of cores in my processor 
> (I am working with 2 cores and 4 threads). However, in a test of processing 
> speed (a Monte Carlo test for pi to 1 billion), my result was that using 17 
> processes calculated the quickest. Adding extra processes above 17 didn't 
> speed up the calculation. Can anyone explain what is happening here?
>
> Any help would be great
>
> Regards,
>
> Noel
>