KK, Nice work!!
Looking forward to playing with this some more, and CCing Conrad and Ryan.
Some more comments below.

On 24 July 2014 at 20:25, Qiang Kou wrote:
| RcppMLPACK is almost done, and I really hope it is useful for other people.
| Testing and bug reports are very welcome. Not only the code, also the results.
| Now you can try it from my repo: https://github.com/thirdwing/RcppMLPACK
|
| I am afraid there will be known problems on Windows about the size_t type.
|
| MLPACK is a scalable C++ machine learning library providing an intuitive and
| simple API. It implements a wide array of machine learning methods and uses
| Armadillo as input/output. For more detail about MLPACK, please visit its
| homepage: http://www.mlpack.org/
|
| Since we have Rcpp and RcppArmadillo, which can integrate C++ and Armadillo
| with R seamlessly, RcppMLPACK becomes something very natural. The RcppMLPACK
| package includes the source code from the MLPACK library. Thus users do not
| need to install MLPACK itself in order to use RcppMLPACK.
|
| I use k-means as an example. By using RcppMLPACK, a k-means method can be
| implemented like below. The interface between R and C++ is handled by Rcpp and
| RcppArmadillo.
|
| #include "RcppMLPACK.h"
|
| using namespace mlpack::kmeans;
| using namespace Rcpp;
|
| // [[Rcpp::export]]
| List kmeans(const arma::mat& data, const int& clusters) {
|
|     arma::Col<size_t> assignments;
|
|     // Initialize with the default arguments.
|     KMeans<> k;
|
|     k.Cluster(data, clusters, assignments);
|
|     return List::create(_["clusters"] = clusters,
|                         _["result"]   = assignments);
| }
|
| The inline package provides a complete wrapper around the compilation,
| linking, and loading steps. So all the steps can be done in an R session.
| There is no reason that RcppMLPACK shouldn't support inline compilation.

It also works via sourceCpp() as Rcpp Attributes uses the same plugin:

   R> sourceCpp("/tmp/rcppmlpackEx.cpp")   # saved your code in /tmp/rcppmlpackEx.cpp
   R> data(trees, package="datasets")
   R> kmeans(t(trees), 3)
   KMeans::Cluster(): converged after 9 iterations.
   $clusters
   [1] 3

   $result
         [,1]
    [1,]    2
    [2,]    2
    [3,]    2
   [.... rest of output omitted for brevity ...]

All it takes is to add one line

   // [[Rcpp::depends(RcppMLPACK)]]

in the source code you show above.

| library(inline)
| library(RcppMLPACK)
| code <- '
|   arma::mat data = as<arma::mat>(test);
|   int clusters = as<int>(n);
|   arma::Col<size_t> assignments;
|   mlpack::kmeans::KMeans<> k;
|   k.Cluster(data, clusters, assignments);
|   return List::create(_["clusters"] = clusters,
|                       _["result"]   = assignments);
| '
| mlKmeans <- cxxfunction(signature(test="numeric", n="integer"), code,
|                         plugin="RcppMLPACK")
| data(trees, package="datasets")
| mlKmeans(t(trees), 3)
|
| There is one point we need to pay attention to: Armadillo matrices in MLPACK
| are stored in a column-major format for speed. That means observations are
| stored as columns and dimensions as rows. So when using MLPACK, an additional
| transpose may be needed.
|
| The package also contains a RcppMLPACK.package.skeleton() function for people
| who want to use MLPACK code in their own package. It follows the structure of
| RcppArmadillo.package.skeleton().
|
| library(RcppMLPACK)
| RcppMLPACK.package.skeleton("foobar")
| Creating directories ...
| Creating DESCRIPTION ...
| Creating NAMESPACE ...
| Creating Read-and-delete-me ...
| Saving functions and data ...
| Making help files ...
| Done.
| Further steps are described in './foobar/Read-and-delete-me'.
|
| Adding RcppMLPACK settings
| >> added Imports: Rcpp
| >> added LinkingTo: Rcpp, RcppArmadillo, BH, RcppMLPACK
| >> added useDynLib and importFrom directives to NAMESPACE
| >> added Makevars file with RcppMLPACK settings
| >> added Makevars.win file with RcppMLPACK settings
| >> added example src file using MLPACK classes
| >> invoked Rcpp::compileAttributes to create wrappers
|
| system("ls -R foobar")
| foobar:
| DESCRIPTION  man  NAMESPACE  R  Read-and-delete-me  src
|
| foobar/man:
| foobar-package.Rd
|
| foobar/R:
| RcppExports.R
|
| foobar/src:
| kmeans.cpp  Makevars  Makevars.win  RcppExports.cpp

Nice one too!

| Even without performance testing, we can be fairly sure the C++
| implementation is faster. A small wine data set from the UCI data sets
| repository is used for benchmarking. A script using the rbenchmark package
| is written as below:
|
| suppressMessages(library(rbenchmark))
| res <- benchmark(mlKmeans(t(wine),3),
|                  kmeans(wine,3),
|                  columns=c("test", "replications", "elapsed",
|                            "relative", "user.self", "sys.self"),
|                  order="relative")
|
| For 100 replications, the MLPACK version of k-means (0.028s) is about 33
| times faster than kmeans in R (0.947s). However, we should note that R
| returns more information than the clustering result and that there are many
| more checks in the R function.
|
| There is an important problem in MLPACK: it uses the size_t type heavily.
|
| There will be problems in wrapping this type, since on 64-bit Windows size_t
| is defined as unsigned long long int. No error of this kind was found during
| testing on my Ubuntu machine.

That is a known issue with R insisting on C++ 1998 without the interim
changes. The simplest way around it (in the context of R and CRAN) is to
enable C++11 -- I do so in RcppCNPy and RcppBDT as I need 'long long' in both.

| Testing and bug reports are very welcome. Not only the code, also the results.

Very exciting. I am sure you'll get a ton of good feedback.

Once again, nice work and congratulations.
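
To make the size_t point a bit more concrete: besides switching on C++11 (in a
package that would be CXX_STD = CXX11 in src/Makevars, with sourceCpp() the
// [[Rcpp::plugins(cpp11)]] attribute), one could also copy the size_t labels
into doubles before handing them back to R. Below is a minimal sketch of that
idea in plain C++11, with no Rcpp or MLPACK needed to compile it. The name
labelsToDouble is made up for illustration; it is not part of Rcpp,
RcppArmadillo or MLPACK, and this is not what RcppMLPACK currently does.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (an assumption, not an existing Rcpp/MLPACK function):
// copy size_t cluster labels into doubles, a type R can always represent.
std::vector<double> labelsToDouble(const std::vector<std::size_t>& labels) {
    // A double holds integers exactly only up to 2^53. 'unsigned long long'
    // and the ULL suffix are themselves C++11, which is the crux of the issue.
    const unsigned long long maxExact = 1ULL << 53;

    std::vector<double> out;
    out.reserve(labels.size());
    for (std::size_t v : labels) {
        if (static_cast<unsigned long long>(v) > maxExact)
            throw std::range_error("label too large for an exact double");
        out.push_back(static_cast<double>(v));
    }
    return out;
}
```

For cluster labels the range check will never trigger in practice, but it
keeps the conversion honest should the same helper be used for large indices.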

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel
