Hi Asis,

I have cloned the git project and successfully installed it in R. The 
'testCptsRealLive.R' script does not work for me, though: the 'load' command 
has problems reading the binary file. 

Nevertheless, I took a look at your OpenMP code: 

SEXP conditionalProbabilityTables( SEXP uniqueEdgeLengths, SEXP annos, SEXP
    stringifiedAnnotations, SEXP annotsMutationProbTableList, SEXP
    mutTblLengthColIndx, SEXP nThreads ) {
  BEGIN_RCPP

    //std::cout << "1" << "\n";
    NumericVector numberThreads = NumericVector( nThreads );
    //std::cout << "2" << "\n";
    omp_set_num_threads( numberThreads(0) );
    //std::cout << "3" << "\n";

    NumericVector edgeLengths = NumericVector( uniqueEdgeLengths );
    //std::cout << "4" << "\n";
    CharacterVector edgeLengthsAsStrs = as<CharacterVector>( edgeLengths );
    //std::cout << "5" << "\n";
    List cpts = List( 0 );
    /* std::map< std::string, std::vector<double> > cpts; */
    //std::cout << "6" << "\n";

    #pragma omp parallel for private( uniqueEdgeLengths, annos, \
        stringifiedAnnotations, annotsMutationProbTableList, \
        mutTblLengthColIndx, nThreads, edgeLengths ) shared( cpts )
    for ( int i = 0; i < edgeLengths.size(); i++ )   
    {
      NumericVector currBranchLength( 1, edgeLengths( i ) );
      NumericMatrix cpt = conditionalProbabilityTable( currBranchLength, annos,
          stringifiedAnnotations, annotsMutationProbTableList,
          mutTblLengthColIndx );
      cpts.push_back( cpt, std::string( edgeLengthsAsStrs( i ) ) );
      /* cpts[ std::string( edgeLengthsAsStrs( i ) ) ] = as<std::vector<double> 
>( cpt ); */
    }
    return( wrap( cpts ) );

  END_RCPP
}

My comments: 
1) Always write '++i' instead of 'i++'. The post-increment form has to keep a 
copy of the old value, whereas the pre-increment form does not, so pre-increment 
can be faster. For a plain int the compiler will usually optimize the difference 
away, but for iterators it can matter. 
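A minimal illustration with an iterator (not from your code):

#include <vector>

void doubleAll( std::vector<double>& v ) {
  // '++it' advances the iterator in place; 'it++' would first copy the old
  // iterator, return that copy, and only then advance.
  for ( std::vector<double>::iterator it = v.begin(); it != v.end(); ++it ) {
    *it *= 2.0;
  }
}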

2) You are calling the function 'conditionalProbabilityTable' inside your OpenMP 
loop. Have you tested this function on its own, outside the parallel region?

3) The 'cpts' List object is shared in the loop, i.e. all threads can access it, 
and you then write to it not by index or name but via push_back(). Whether that 
is thread-safe I cannot tell you, but I would not rely on it here. It can cause 
undefined behavior, i.e. your program sometimes runs without errors and another 
time you get a segfault. Try to access the list at least by index or name. 
Another possibility: put the push_back call inside a '#pragma omp critical' 
directive, which allows only one thread at a time to access the List (at the 
cost of a longer runtime). 
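Roughly what I mean for the critical-section variant, sketched on your own loop 
(untested; I also dropped the private() clause, since those SEXP arguments are 
only read inside the loop, and note that the other Rcpp calls in the parallel 
region may themselves not be thread-safe):

    #pragma omp parallel for shared( cpts )
    for ( int i = 0; i < edgeLengths.size(); ++i )
    {
      NumericVector currBranchLength( 1, edgeLengths( i ) );
      NumericMatrix cpt = conditionalProbabilityTable( currBranchLength, annos,
          stringifiedAnnotations, annotsMutationProbTableList,
          mutTblLengthColIndx );
      #pragma omp critical
      {
        // only one thread at a time may append to the shared List
        cpts.push_back( cpt, std::string( edgeLengthsAsStrs( i ) ) );
      }
    }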

4) The called function 'conditionalProbabilityTable' contains another for-loop: 

SEXP conditionalProbabilityTable( SEXP branchLength, SEXP annos,
    SEXP stringifiedAnnotations, SEXP annotsMutationProbTables, SEXP
    mutTblLengthColIndx ) {
  BEGIN_RCPP

    List annosLst( annos );
    CharacterVector annosAsStrs( stringifiedAnnotations );
    NumericMatrix cpt = NumericMatrix( annosLst.size(), annosLst.size() );
    cpt.attr( "dimnames" ) = List::create( annosAsStrs, annosAsStrs );

    for ( int i = 0; i < annosLst.size(); i++ ) {
      CharacterVector compositeAnnotation = annosLst( i );
      double compAnnoMutProb = 1.0;
      std::string ua = "unknown";
      std::string caFirst = as<std::string>( compositeAnnotation( 0 ) );
      if ( ua != caFirst ) {
        compAnnoMutProb = mutationProbability( compositeAnnotation,
            branchLength, annotsMutationProbTables, mutTblLengthColIndx )( 0 );
      }
      double mutToOtherAnnosProb = compAnnoMutProb / ( annosLst.size() - 1 );
      NumericVector colmn( annosLst.size(), mutToOtherAnnosProb );
      colmn( i ) = 1.0 - compAnnoMutProb;
      cpt( _, i ) = colmn;
    }

    return( wrap( cpt ) );

  END_RCPP
}

In such cases you should always think about which loop to parallelize. Usually 
you parallelize the loop with the bigger workload; only if that loop has very 
few iterations does it make sense to parallelize the other one instead. Another 
possibility: create nested parallel loops (advanced level). For this you put 
another '#pragma omp parallel for' directive before the inner for-loop, set the 
environment variable OMP_NESTED to true and OMP_MAX_ACTIVE_LEVELS to 2, and 
make sure the threads have enough stack, e.g. OMP_STACKSIZE=1000000. 
Do not expect any improvement from nesting on two cores, though; you need at 
least 8 to see its benefit. A minimal sketch follows below.
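This is what nesting looks like in general (not your functions; it assumes 
nesting has been enabled via the environment variables above, or equivalently 
via omp_set_nested(1) and omp_set_max_active_levels(2)):

#include <omp.h>
#include <vector>

void scaleNested( std::vector< std::vector<double> >& work ) {
  #pragma omp parallel for                 // outer team of threads
  for ( int i = 0; i < (int) work.size(); ++i ) {
    #pragma omp parallel for               // each outer thread gets an inner team
    for ( int j = 0; j < (int) work[ i ].size(); ++j ) {
      work[ i ][ j ] *= 2.0;               // independent writes, indices never intersect
    }
  }
}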

5) Set the environment variable OMP_DYNAMIC=false. With dynamic adjustment 
enabled the runtime may decide on its own to use fewer threads than you 
requested; in most cases that does not help, and the static setting gives better 
and more predictable performance. 
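You can also do this from within the code, right next to your existing 
omp_set_num_threads() call:

    omp_set_dynamic( 0 );                       // same effect as OMP_DYNAMIC=false
    omp_set_num_threads( numberThreads(0) );    // your existing call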

6) The OpenMP library was intended to be used on very simple objects, i.e. 
arrays and vectors. One reason for this is the so-called page placement, where 
the memory chunks used by a certain core are placed in that core's local memory 
(NUMA technology). With very complicated objects it is not always clear how the 
memory can be placed for optimal access. 
So, to cut a long story short, I would always use very simple objects inside an 
OpenMP directive. Since NumericVectors and Armadillo vectors rely on an inner 
array for their memory allocation (e.g. you can initialize both from an already 
existing std::vector to reuse its memory), these objects can still be considered 
very simple and therefore fast. Second, page placement is more easily achieved 
if wanted or needed. Third, write access by index is easy and is thread-safe as 
long as the index ranges of the threads do not intersect, or we only read. 
Run over vectors and matrices in the parallel loop and stack them into a list 
afterwards; a sketch of that pattern follows below. 
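To make this concrete, here is an untested sketch of that pattern; 
'computeColumn' is only a hypothetical placeholder for a computation that 
touches no R API at all:

#include <Rcpp.h>
#include <omp.h>
#include <cmath>
#include <string>
#include <vector>
using namespace Rcpp;

// hypothetical placeholder: pure C++ work, no Rcpp/R calls inside the threads
std::vector<double> computeColumn( double branchLength, int nRows ) {
  return std::vector<double>( nRows, std::exp( -branchLength ) );
}

SEXP cptsViaPlainBuffers( NumericVector edgeLengths,
    CharacterVector edgeLengthsAsStrs, int nRows ) {
  int n = edgeLengths.size();
  std::vector< std::vector<double> > results( n );

  #pragma omp parallel for
  for ( int i = 0; i < n; ++i ) {
    // each thread writes only results[ i ]; the index ranges never intersect
    results[ i ] = computeColumn( edgeLengths( i ), nRows );
  }

  // single-threaded again: stack the plain buffers into an Rcpp List
  List cpts( 0 );
  for ( int i = 0; i < n; ++i ) {
    cpts.push_back( NumericVector( results[ i ].begin(), results[ i ].end() ),
        std::string( edgeLengthsAsStrs( i ) ) );
  }
  return wrap( cpts );
}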

7) It sometimes seems to me that you create a lot of objects. Check whether all 
of these objects are needed. For example, consider this part of 'conditionalProbabilityTable':
colmn( i ) = 1.0 - compAnnoMutProb;
cpt( _, i ) = colmn;
I think you could write these values directly into 'cpt', without the 
intermediate 'colmn' vector?
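Something along these lines, using the variables that are already in your loop 
(untested):

      // fill column i of 'cpt' directly instead of building 'colmn' first
      for ( int j = 0; j < annosLst.size(); ++j ) {
        cpt( j, i ) = mutToOtherAnnosProb;
      }
      cpt( i, i ) = 1.0 - compAnnoMutProb;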

8) Use profiling tools: a very good introduction and overview can be found on 
the website of the HPC cluster in Aachen (to which, by the way, you could get 
access from the University of Bonn), 
http://www.rz.rwth-aachen.de/global/show_document.asp?id=aaaaaaaaaaclexs. 
Use such a tool to identify the big workload in your sequential program and try 
to parallelize that section. Then check again where the next big workload is, 
and so on. I would further suggest simplifying your program whenever you 
encounter undefined behavior; in most cases that helps a lot. 

I hope I could help here. I already saw your new thread on the list, and it 
seems everything is running. So congrats and keep it up!


Best

Simon

P.S. Of course we could meet in Bonn or Cologne (where I am living and working 
now). I am also very interested in what is needed in bioinformatics; maybe we 
could exchange ideas. Let us keep in contact!


On Jun 1, 2013, at 8:53 PM, Asis Hallab <[email protected]> wrote:

> Dear Simon,
> 
> thank you very much for your insights and offer to help.
> 
> I prepared a very small project on github:
> https://github.com/asishallab/PhyloFun_Rccp
> 
> You'll find installation and test instructions as well as references
> to the file containing the OpenMP using C++ code.
> 
> Your help is much appreciated.
> 
> Kind regards!

_______________________________________________
Rcpp-devel mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
