> On 23 July 2011 at 09:02, ian.fell...@stat.ucla.edu wrote:
> | Hi all,
> |
> | I've just released an R package to CRAN that creates pretty looking word
> | clouds. I think it makes a good minimal example of how to prototype an
> | algorithm in R, and then bring the performance bottleneck down to c++ to
> | improve speed.
>
> Sweet! I am still watching the whole onslaught of new or updated packages
> unfold so I haven't had a chance to even check if there were new Rcpp-using
> packages. So welcome to the club :)
>
> | An example:
> |
> | >install.packages("wordcloud",repos="http://cran.r-project.org",type="source")
> | >library(tm)
> | >data(crude)
> | >crude <- tm_map(crude, removePunctuation)
> | >crude <- tm_map(crude, function(x)removeWords(x,stopwords()))
> | >tdm <- TermDocumentMatrix(crude)
> | >m <- as.matrix(tdm)
> | >v <- sort(rowSums(m),decreasing=TRUE)
> | >d <- data.frame(word = names(v),freq=v
> | + )
> | >library(wordcloud)
> | Loading required package: Rcpp
> | >#using c++ to help layout the words
> | >system.time(wordcloud(d$word,d$freq,scale=c(8,.1),min.freq=0))
> |    user  system elapsed
> |   9.979   0.049   9.878
> | >#using R code to do the same layout
> | >system.time(wordcloud(d$word,d$freq,scale=c(8,.1),min.freq=0,use.r.layout=T))
> |    user  system elapsed
> | 151.919   0.716 146.737
>
> Ok, I'll be lazy now as I could just look at the code, but what type of
> layout operation did you move to C++? Is it a type of sorting / arranging /
> classifying / ... ? Does it rely on other libraries or did you solve it with
> homegrown C++? How many lines?

The layout algorithm takes each word and spirals out from the center of the
plot until it finds a place where the word wouldn't overlap with any words
already plotted. It then plots the word in that place. The check to see
whether the word has any overlaps at a particular point is expensive and
scales poorly. I tried something really smart and clever in R to fix this,
but it turns out that just doing the check in c++ is faster than any
cleverness I could come up with. The function is 24 lines of c++ code
replacing 23 lines of R code.

> And lastly ... given that you also know Java so well: what works well /
> better with Rcpp for you?

Speed. wordcloud was a cute little weekend project, but for my dissertation
work, high performance is a primary concern, so I'm designing it from the
ground up using Rcpp (nothing public yet). Coding in c++ is much slower going
for me, partly due to my lack of experience, and partly due to my inability
to find an IDE with good code completion / syntax error detection. My
understanding is that, due to the extensive use of templates, this is
something I'll just have to live with. I'm open to suggestions though. I'm
currently using Eclipse CDT.

> Cheers, Dirk
>
> --
> Gauss once played himself in a zero-sum game and won $50.
>   -- #11 at http://www.gaussfacts.com

_______________________________________________
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
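
For anyone who wants a concrete picture of the layout loop described above
(spiral out from the center, test each candidate position against the words
already plotted), here is a rough sketch in plain C++. It is not the code from
the wordcloud package. The names Box, overlaps, overlapsAny and placeOnSpiral,
the unit-square plot, and the step sizes are all assumptions made up for this
illustration, and it leaves out the Rcpp glue entirely.

```cpp
#include <cmath>
#include <vector>

// Axis-aligned bounding box of one plotted word (hypothetical type for this sketch).
struct Box {
    double x, y;           // lower-left corner
    double width, height;
};

// True if the two boxes share any interior area (touching edges do not count).
bool overlaps(const Box& a, const Box& b) {
    return a.x < b.x + b.width  && b.x < a.x + a.width &&
           a.y < b.y + b.height && b.y < a.y + a.height;
}

// True if `candidate` overlaps any box already placed. This linear scan runs
// once per spiral step, which is why it is the expensive part of the layout.
bool overlapsAny(const Box& candidate, const std::vector<Box>& placed) {
    for (const Box& b : placed) {
        if (overlaps(candidate, b)) return true;
    }
    return false;
}

// Spiral outward from the center (0.5, 0.5) of a unit-square plot until a box
// of the given size fits inside the plot without overlapping anything in
// `placed`. On success, writes the position to `out` and returns true.
// rStep and thetaStep are illustrative values, not taken from the package.
bool placeOnSpiral(double width, double height,
                   const std::vector<Box>& placed, Box& out,
                   double rStep = 0.0005, double thetaStep = 0.1) {
    const double maxRadius = 0.75;  // past this, every point is outside the plot
    double r = 0.0;
    double theta = 0.0;
    while (r < maxRadius) {
        const double cx = 0.5 + r * std::cos(theta);
        const double cy = 0.5 + r * std::sin(theta);
        const Box candidate{cx - width / 2, cy - height / 2, width, height};
        const bool inside = candidate.x >= 0.0 && candidate.y >= 0.0 &&
                            candidate.x + width <= 1.0 &&
                            candidate.y + height <= 1.0;
        if (inside && !overlapsAny(candidate, placed)) {
            out = candidate;
            return true;
        }
        theta += thetaStep;  // advance around the spiral
        r += rStep;          // and slowly outward
    }
    return false;  // no free spot found inside the plot
}
```

A caller would loop over the words in decreasing order of frequency, call
placeOnSpiral with each word's width and height, and push every box that comes
back onto `placed`. Calling something like this from R through Rcpp would
still need an exported wrapper, which is left out here.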