Hello! This is to announce that [kmcuda](https://github.com/src-d/kmcuda) has gained native R bindings, and to ask for help with CRAN packaging. kmcuda is my brainchild: an efficient GPGPU (CUDA) library for K-means and K-nn on as much data as fits into memory. It supports running on multiple GPUs simultaneously, the angular distance metric, Yinyang refinement, float16 (well, certainly not from R), and K-means++ and AFK-MC2 initialization. I am thinking about adding Minibatch support in the near future.
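Since K-means++ comes up above, here is a tiny pure-R sketch of that seeding scheme, just to show the idea. This is illustrative only — it is not kmcuda's code, and the data and variable names are made up:

```r
# K-means++ seeding sketch: pick the first center uniformly,
# then pick each next center with probability proportional to the
# squared distance to the nearest already-chosen center (D^2 weighting).
set.seed(77)
X <- matrix(runif(200), ncol = 2)  # 100 toy points in 2-D
k <- 5

# First center: uniform random point
centers <- X[sample(nrow(X), 1), , drop = FALSE]
while (nrow(centers) < k) {
  # Squared distance from each point to its nearest chosen center
  d2 <- apply(X, 1, function(p) min(colSums((t(centers) - p)^2)))
  # Sample the next center proportionally to d2
  centers <- rbind(centers, X[sample(nrow(X), 1, prob = d2), , drop = FALSE])
}
stopifnot(dim(centers) == c(k, 2))
```

kmcuda does this on the GPU (and AFK-MC2 replaces the exact D^2 sampling with a Markov-chain approximation), but the weighting logic is the same.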
Usage example:

dyn.load("libKMCUDA.so")
samples <- replicate(4, runif(16000))
result <- .External("kmeans_cuda", samples, 50, tolerance=0.01,
                    seed=777, verbosity=1)
print(result$centroids)
print(result$assignments[1:10,])

This library only supports Linux and macOS at the moment; a Windows port is welcome. I knew pretty much nothing about R a week ago, so I would be glad to hear your suggestions. Besides, I have never published anything to CRAN, and it will take me some time to design a full package that follows the guidelines and rules. It would be awesome if somebody were willing to help! Packaging CUDA+OpenMP code for R seems to be special fun, and that fun doubles on macOS, where you need a specific combination of two different clang compilers to make it work.

Besides, I have a question which prevents me from sleeping at night: how is R able to support matrices with dimensions larger than INT32_MAX when the only integer type in the C API is int (32-bit signed on Linux)? Even getting the dimensions with INTEGER() automatically leads to an overflow.

--
Best regards,
Vadim Markovtsev
Lead Machine Learning Engineer || source{d} / sourced.tech / Madrid
StackOverflow: 69708/markhor | GitHub: vmarkovtsev | data.world: vmarkovtsev

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help