Hi Asis,
parallel computing is a delicate task in programming: performance
depends on your hardware architecture on one side and on what your
software asks of it on the other.
1. If the sequential code is faster than the parallel code, check
whether the two versions are actually programmed differently, or
whether they are identical except for an added '#pragma omp'
directive.
2. How many iterations are you running in parallel? Building a
thread pool and initializing parallel processing costs the computer
time. If that overhead is larger than the time you win from executing
the loop's tasks in parallel, you lose in aggregate. Test your
program by repeatedly doubling the input: at what size does the
parallel code become faster than the sequential one? If never, there
must be a serious red flag somewhere in your code. (See the first
sketch after this list.)
3. If your machine has a NUMA architecture (typical for multi-socket
systems) and your parallelized tasks involve many memory accesses,
some of those accesses may be non-local, which is costly. In this
case the usual workaround is explicit page placement, i.e. making
sure the objects processed by a certain CPU reside in the memory
local to that CPU (see the first-touch sketch after this list).
4. With too many cores for too few tasks, you pay high setup costs
(see point 2). Better try fewer threads. Furthermore: use at most as
many threads as you have physical cores. Hyperthreading is nice, but
a real performance jump is only possible with real cores. (See the
thread-settings sketch after this list.)
5. Turn off OpenMP's dynamic adjustment of the thread count! Set the
environment variable OMP_DYNAMIC=false (also shown in the
thread-settings sketch after this list).
6. Look at the workload! If you parallelize the 'wrong' loop, it
often costs more to parallelize something with little computation
than you win from doing it in parallel. Parallelize the parts with
the heavy calculations instead (see the last sketch after this
list). Use a tool to monitor performance and find the big workloads
of your program; this cannot be done by simply looking at the code,
only by also considering the specific hardware structure of your
computer and testing with very simple objects.
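
To make the doubling test from point 2 concrete, here is a minimal,
self-contained sketch; the work() function and the problem sizes are
made up for illustration (compile with e.g. g++ -O2 -fopenmp):

    #include <cstdio>
    #include <cmath>
    #include <omp.h>

    // Hypothetical per-element workload, just something to time.
    static double work(long i) {
        return std::sqrt(static_cast<double>(i) + 1.0);
    }

    int main() {
        // Double the problem size and watch when (if ever) the
        // parallel loop overtakes the serial one.
        for (long n = 1000; n <= 16000000; n *= 2) {
            double sum_s = 0.0, sum_p = 0.0;

            double t0 = omp_get_wtime();
            for (long i = 0; i < n; ++i) sum_s += work(i);
            double t_serial = omp_get_wtime() - t0;

            t0 = omp_get_wtime();
            #pragma omp parallel for reduction(+:sum_p)
            for (long i = 0; i < n; ++i) sum_p += work(i);
            double t_parallel = omp_get_wtime() - t0;

            std::printf("n = %9ld  serial = %.4f s  parallel = %.4f s\n",
                        n, t_serial, t_parallel);
        }
        return 0;
    }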
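
For point 3, assuming the usual first-touch page placement policy
(the Linux default), one mitigation is to initialize the data with
the same parallel loop structure that later processes it, so each
thread's pages land on its own NUMA node. A sketch:

    #include <omp.h>

    int main() {
        const long n = 100000000;
        // Raw allocation: pages are not placed until first touched.
        double* p = new double[n];

        // First touch in parallel with schedule(static): each thread
        // writes "its" chunk first, so those pages are allocated on
        // that thread's NUMA node.
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < n; ++i)
            p[i] = 0.0;

        // Processing the data with the same static schedule keeps
        // each thread on memory local to its own node.
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < n; ++i)
            p[i] = p[i] * 2.0 + 1.0;

        delete[] p;
        return 0;
    }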
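
For points 4 and 5, both settings can also be fixed from inside the
program via the standard OpenMP API calls omp_set_dynamic() and
omp_set_num_threads(); the core count below is just a placeholder:

    #include <cstdio>
    #include <omp.h>

    int main() {
        // Point 5: disable dynamic adjustment of the team size,
        // equivalent to OMP_DYNAMIC=false in the environment.
        omp_set_dynamic(0);

        // Point 4: cap the team at the number of physical cores.
        // Note that omp_get_num_procs() counts hardware threads, so
        // on a hyperthreaded machine you may want half of it.
        int physical_cores = 4;  // placeholder: set your real core count
        omp_set_num_threads(physical_cores);

        #pragma omp parallel
        {
            #pragma omp single
            std::printf("running with %d threads\n",
                        omp_get_num_threads());
        }
        return 0;
    }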
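
And for point 6, a made-up illustration of picking the right loop:
parallelizing a cheap inner loop pays the fork/join overhead on every
outer iteration, while parallelizing the expensive outer loop pays it
only once:

    #include <vector>
    #include <cmath>
    #include <omp.h>

    int main() {
        const int rows = 1000, cols = 1000;
        std::vector<double> m(static_cast<std::size_t>(rows) * cols);

        // Wrong granularity: the parallel region is re-created for
        // every row, and each thread gets only a sliver of cheap work.
        for (int r = 0; r < rows; ++r) {
            #pragma omp parallel for
            for (int c = 0; c < cols; ++c)
                m[r * cols + c] = std::sin(r) * std::cos(c);
        }

        // Better: one parallel region over the outer loop; each
        // thread handles whole rows, so the startup cost is paid once.
        #pragma omp parallel for
        for (int r = 0; r < rows; ++r)
            for (int c = 0; c < cols; ++c)
                m[r * cols + c] = std::sin(r) * std::cos(c);

        return 0;
    }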
For performance tools check out Vampir (http://www.vampir.eu/) or
Scalasca (http://www.scalasca.org/). For debugging check Valgrind's
DRD tool
(http://valgrind.org/docs/manual/drd-manual.html#drd-manual.openmp).
The commercial tools are much better, though. So, if you have access
to them, I would suggest the Intel VTune Analyzer and Intel
Inspector, and, especially for debugging, TotalView.
Hope this helps
Best
Simon
On Mon, 3 Jun 2013 12:44:20 +0200
Asis Hallab <[email protected]> wrote:
Dear Dirk, Simon and Rcpp Experts,
this is a message following up the thread about using OpenMP
directives with Rcpp to construct probability matrices in parallel.
I followed Dirk's hint and implemented the parallel matrix
generation using just C++'s STL and "#pragma omp parallel for" for
the loop carrying the heaviest workload in each iteration, that is,
the generation of a matrix.
Good news: the code compiles and runs without errors.
Bad news: even though the conversion of a large RcppList and its
contained NumericMatrix objects takes only less than half a second,
the parallel code on 10 cores runs approximately 10 times slower
than the serial pure Rcpp implementation.
Serial implementation
user system elapsed
9.657 0.100 9.785
Parallel implementation on 10 cores
user system elapsed
443.095 26.437 100.132
Parallel implementation on 20 cores
user system elapsed
719.173 35.418 85.663
Again: I measured the time required to convert the Rcpp objects, and
it is only half a second. The back conversion I have not even
implemented yet; I just wrap the resulting
std::map< std::string, std::vector< std::vector<double> > >.
Does anyone have an idea what is going on?
The code can be reviewed on github:
https://github.com/asishallab/PhyloFun_Rccp/blob/OpenMP
You'll find very short installation and test run instructions in the
README.textile.
Kind regards and all the best!
_______________________________________________
Rcpp-devel mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel