Re: [julia-users] Embarrassingly parallel workload
I have tried the construction below with no success. In v0.4.3 I end up getting a segmentation fault. In the latest v0.5.0, the run time is 3-4x that of the non-parallelized version, and the array constructed is vastly different from the one constructed by the non-parallelized code. Below is the C++ code that I am essentially trying to emulate:

    void TreeLattice::stepback(Size i, const Array& values,
                               Array& newValues) const {
        #pragma omp parallel for
        for (Size j=0; j<this->impl().size(i); j++) {
            Real value = 0.0;
            for (Size l=0; l<n; l++) {   // n = number of branches (bound garbled in the archive)
                value += this->impl().probability(i,j,l) *
                         values[this->impl().descendant(i,j,l)];
            }
            value *= this->impl().discount(i,j);
            newValues[j] = value;
        }
    }

The calls to probability, descendant, and discount all end up accessing data in other objects, so I tried to prepend those function and type definitions with @everywhere. However, that started me on a long chain of having to eventually wrap each file in my module in @everywhere, and there were still errors complaining about things not being defined. At this point I am really confused as to how to construct what would appear to be a rather simple parallelized for loop that generates the same results as the non-parallelized code. I've pored over both this forum and other resources, and nothing has really worked.

Any help would be appreciated.

Thanks!

Chris

On Thursday, August 20, 2015 at 4:52:52 AM UTC-4, Nils Gudat wrote:
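A hedged sketch of how the OpenMP loop above might map onto a SharedArray. All names not present in the C++ snippet (stepback_parallel, n, nbranches) are hypothetical stand-ins, and probability, descendant, and discount are assumed here to be plain functions already defined @everywhere; in the real code they live behind this->impl():

```julia
# Sketch only: assumes probability, descendant, discount and the loop
# bounds n and nbranches are already available on all workers.
function stepback_parallel(i, values::SharedArray, newvalues::SharedArray,
                           n, nbranches)
    @sync @parallel for j = 1:n
        value = 0.0
        for l = 1:nbranches
            value += probability(i, j, l) * values[descendant(i, j, l)]
        end
        newvalues[j] = value * discount(i, j)
    end
    newvalues
end
```

The @sync is what makes the caller wait for all workers before reading newvalues back.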
Re: [julia-users] Embarrassingly parallel workload
By "construction below", I mean this:

    results = SharedArray(Float64, (m,n))
    @sync @parallel for i = 1:n
        results[:, i] = complicatedfunction(inputs[i])
    end

On Saturday, January 30, 2016 at 2:31:40 PM UTC-5, Christopher Alexander wrote:
Re: [julia-users] Embarrassingly parallel workload
Sebastian, I'm not sure I understand you correctly, but point (1) in your list can usually be taken care of by wrapping all the necessary usings/requires/includes and definitions in an @everywhere begin ... end block.

Julio, as for your original problem, I think Tim's advice about SharedArrays was perfectly reasonable. Without having looked at your problem in detail, I think you should be able to do something like this (and I also think this gets close enough to what Sebastian was talking about, and to Matlab's parfor, unless I'm completely misunderstanding your problem):

    nprocs() == CPU_CORES || addprocs(CPU_CORES - 1)
    results = SharedArray(Float64, (m,n))

    @sync @parallel for i = 1:n
        results[:, i] = complicatedfunction(inputs[i])
    end
Re: [julia-users] Embarrassingly parallel workload
Sebastian,

This talk from JuliaCon 2015 discusses progress on OpenMP-like threading:

Kiran Pamnany and Ranjan Anantharaman, "Multi-threading Julia": http://youtu.be/GvLhseZ4D8M?a

Ryan

On 08/19/2015 02:42 PM, Sebastian Nowozin wrote:
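For the pool-style scheduling described above (a free worker grabs the next iteration), pmap may be closer than @parallel: @parallel statically partitions the range across workers, while pmap feeds inputs to whichever worker is idle. A minimal sketch, with complicatedfunction standing in for the real per-iteration work:

```julia
# Sketch: pmap dispatches one input at a time to the next free worker,
# like the MPI pool pattern; @parallel instead pre-splits the range.
addprocs(3)
@everywhere complicatedfunction(x) = fill(x, 5)  # placeholder work, M = 5

inputs = collect(1.0:4.0)                # N = 4 iterations
cols = pmap(complicatedfunction, inputs) # vector of N columns
results = hcat(cols...)                  # M x N matrix
```

pmap pays a per-call scheduling cost, so it suits loops where each iteration is expensive, which is exactly the case described here.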
Re: [julia-users] Embarrassingly parallel workload
Thanks Ryan for the pointer, this is awesome work, I am looking forward to this becoming part of the Julia release in Q3.

Sebastian

On Thu, Aug 20, 2015 at 3:34 PM, Ryan Cox ryan_...@byu.edu wrote:
Re: [julia-users] Embarrassingly parallel workload
Hi Kristoffer, sorry for the delay and thanks for the code.

What I want to do is very simple: I have an expensive loop for i=1:N such that each iteration is independent and produces a large array of size M. The result of this loop is a matrix of size MxN. I have many CPU cores at my disposal and want to distribute this work. In the past I accomplished that with MPI in Python: https://github.com/juliohm/HUM/blob/master/pyhum/utils.py Whenever a process in the pool is free, it consumes an iteration of the loop.

What exactly is the @parallel macro in Julia doing? How can I modify the code I previously posted in this thread to achieve such an effect?

-Júlio
Re: [julia-users] Embarrassingly parallel workload
Hi Ismael,

MPI is distributed memory; I'm trying to use all the cores in my single workstation with shared memory instead. Thanks for the link anyway.

-Júlio
Re: [julia-users] Embarrassingly parallel workload
There is an MPI wrapper for Julia; I don't know if it'll suit your needs though: https://github.com/JuliaParallel/MPI.jl

On Wednesday, August 19, 2015 at 13:03:59 (UTC-5), Júlio Hoffimann wrote:
Re: [julia-users] Embarrassingly parallel workload
Hi Julio,

I believe this is a very common type of workload, especially in scientific computing. In C++ one can use OpenMP for this type of computation; in Matlab there is parfor. From the user's perspective both just work. In Julia, I have not found an easy and convenient way to do such a computation.

The difficulties I have experienced, trying to do this with distributed arrays and the Julia parallel operations:

1. Having to prepend @everywhere before every import/require so that all parallel processes have all definitions.
2. Working with the distributed arrays API has given me plenty of headaches; it is more like programming with local/global contexts in OpenCL.
3. (I believe this is fixed now.) There were garbage collection issues and crashes on Windows when using distributed arrays.

What would be very convenient is a kind of OpenMP-like parallelism, really anything that would enable us to write simply

    function compute(X::Vector{Float64}, theta)
        n = length(X)
        A = zeros(Float64, n)
        @smp for i = 1:n
            A[i] = embarrassing_parallel(X[i], theta)
        end
        A
    end

where @smp would correspond to #pragma omp parallel for. I know this may be difficult to implement for a language as dynamic as Julia, but it is hard to argue against this simplicity from the user's point of view. As Clang/LLVM now support OpenMP (https://clang-omp.github.io/), one could perhaps recycle the same OpenMP runtime for such lightweight parallelism?

Thanks,
Sebastian

On Wednesday, 19 August 2015 19:03:59 UTC+1, Júlio Hoffimann wrote:
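For the record, the @smp wished for above roughly corresponds to what later shipped in Julia as Threads.@threads (not available in the 0.4-era Julia discussed in this thread). A hedged sketch, with embarrassing_parallel remaining a stand-in for the real work:

```julia
# Sketch: shared-memory threading via Threads.@threads, close in spirit
# to the hypothetical @smp above. Requires a later Julia started with
# the JULIA_NUM_THREADS environment variable set.
function compute(X::Vector{Float64}, theta)
    A = zeros(Float64, length(X))
    Threads.@threads for i = 1:length(X)
        A[i] = embarrassing_parallel(X[i], theta)
    end
    A
end
```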
Re: [julia-users] Embarrassingly parallel workload
Hi Sebastian, thanks for sharing your experience parallelizing Julia code. I used OpenMP in the past too; it was very convenient in my C++ codebase. I remember an initiative called OpenACC that was trying to bring OpenMP and GPU accelerators together; I don't know its current status. It may be of interest to the Julia devs.

-Júlio
Re: [julia-users] Embarrassingly parallel workload
What am I doing wrong in the following code?

    function foo(N; parallel=false)
        if parallel
            nprocs() < CPU_CORES && addprocs(CPU_CORES - nprocs())
        end
        result = SharedArray(Float64, 9, N)
        @parallel for i = 1:N
            sleep(1)
            result[:,i] = rand(3,3)[:]
        end
        result
    end

If I call foo(60, parallel=true), result is all zeros. The expected behavior is a random matrix.

-Júlio
Re: [julia-users] Embarrassingly parallel workload
Something like this?

    @everywhere function fill(A::SharedArray)
        for idx in Base.localindexes(A)
            A[idx] = rand()
        end
    end

    function fill_array(m, n)
        A = SharedArray(Float64, (m, n))
        @sync begin
            for p in procs(A)
                @async remotecall_wait(p, fill, A)
            end
        end
        A
    end

    fill_array(9, 60)

On Tuesday, August 11, 2015 at 12:23:02 AM UTC+2, Júlio Hoffimann wrote:
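Another plausible culprit in the original foo: @parallel without @sync returns immediately, so result is read before the workers have written anything. A minimal sketch of that one-line fix, keeping the original 9xN SharedArray shape (foo_sync is a hypothetical name):

```julia
# Sketch: @sync makes the caller wait for all @parallel workers,
# so the SharedArray is fully written before it is returned.
function foo_sync(N)
    result = SharedArray(Float64, 9, N)
    @sync @parallel for i = 1:N
        result[:, i] = rand(3, 3)[:]
    end
    sdata(result)  # plain Array copy of the shared data
end
```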
[julia-users] Embarrassingly parallel workload
Hi,

Suppose I have a complicated but embarrassingly parallel loop, namely: https://github.com/juliohm/ImageQuilting.jl/blob/master/src/iqsim.jl#L167

How would you dispatch the iterations so that all cores in the client computer are kept busy? Is there any construct in the language for that usual scenario?

-Júlio
Re: [julia-users] Embarrassingly parallel workload
Completely possible with SharedArrays, see http://docs.julialang.org/en/latest/manual/parallel-computing/#shared-arrays

--Tim

On Sunday, August 09, 2015 02:41:42 PM Júlio Hoffimann wrote:
Re: [julia-users] Embarrassingly parallel workload
Consider the following simplified example. There is an algorithm implemented as a function foo(N). This algorithm repeats the same recipe N times in a loop to fill in an array of arrays:

    function foo(N)
        # bunch of auxiliary variables goes here
        # ...
        result = []
        for i=1:N
            # complicated use of auxiliary variables to produce result[i]
            # ...
            push!(result, rand(3,3)) # push result[i] to the vector
        end
        return result
    end

Each iteration is expensive, and we would like to add an option foo(N; parallel=false) to use all available cores. I would start with:

    function foo(N; parallel=false)
        if parallel
            addprocs(CPU_CORES - 1)
        end
        # bunch of auxiliary variables goes here
        # ...
        result = SharedVector(???)
        @parallel for i=1:N
            # complicated use of auxiliary variables to produce result[i]
            # ...
            push!(result, rand(3,3))
        end
        return result
    end

How do I initialize the SharedVector? Is this the correct approach? What about the auxiliary variables, will they be copied to all workers? If yes, can I just replace Arrays by SharedArrays so that they are visible everywhere without copying?

-Júlio
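Since each result[i] above is a fixed-size 3x3 matrix, one sketch (assuming that fixed size, which may not hold in the real algorithm) is to preallocate a 9xN SharedArray and assign columns instead of push!-ing, which does not work across workers, then reshape at the end. foo_sketch is a hypothetical name:

```julia
# Sketch: preallocate instead of push!; each column stores one
# flattened 3x3 result. Assumes the per-iteration output size is fixed.
function foo_sketch(N; parallel=false)
    parallel && nprocs() < CPU_CORES && addprocs(CPU_CORES - nprocs())
    result = SharedArray(Float64, 9, N)
    @sync @parallel for i = 1:N
        result[:, i] = rand(3, 3)[:]
    end
    [reshape(result[:, i], 3, 3) for i = 1:N]  # back to a vector of matrices
end
```

Plain auxiliary variables referenced inside the @parallel body are serialized and copied to each worker, so read-only inputs can stay ordinary Arrays; only data that workers must write back needs to be a SharedArray.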
Re: [julia-users] Embarrassingly parallel workload
Thank you Tim, will check it carefully. -Júlio