Hello,

I've committed an Rcpp version of %in%.

For example:

require(Rcpp)
require(microbenchmark)

sourceCpp( code = '
#include <Rcpp.h>
using namespace Rcpp ;

// [[Rcpp::export]]
LogicalVector in_( CharacterVector x, CharacterVector table){
    return in( x, table ) ;
}
' )

`%in++%` <- in_


> c("a", "ad") %in++% letters
[1]  TRUE FALSE

In terms of performance:

> xx <- sample( sample(letters, 15 ), 1000000, replace = TRUE )
> microbenchmark(
+     xx %in% letters,
+     xx %in++%  letters,
+     in_( xx, letters )
+ )
Unit: milliseconds
               expr      min       lq   median       uq      max
1  in_(xx, letters) 12.79488 12.85228 12.88214 15.33067 44.65161
2   xx %in% letters 31.96431 34.43951 34.90381 35.37460 65.68226
3 xx %in++% letters 12.81114 12.86457 12.91557 15.06667 16.20493



The tool here is unordered_set: we don't care where the data is in the table, we just want to know whether it is there.
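
To illustrate the idea, here is a minimal sketch in plain C++ (no Rcpp types) of a membership test built on std::unordered_set. The name in_set is hypothetical and chosen only for this example; this shows the general technique and is not necessarily how the Rcpp sugar function in() is written internally.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of Rcpp): for each element of x, report
// whether it occurs anywhere in table.
std::vector<bool> in_set(const std::vector<std::string>& x,
                         const std::vector<std::string>& table) {
    // Build the hash set once; duplicates in table collapse automatically
    // and the position of each value in table is discarded.
    const std::unordered_set<std::string> lookup(table.begin(), table.end());

    std::vector<bool> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        // count() returns 0 or 1 for a set; each lookup is O(1) on average.
        out[i] = lookup.count(x[i]) > 0;
    }
    return out;
}
```

Building the set costs one pass over table, after which each element of x is answered by a single hash lookup, so the whole call is roughly linear in length(x) + length(table) on average.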

It might be interesting at some point to check alternatives to the standard hashing functions, e.g. play with sparsehash: http://code.google.com/p/sparsehash/

Romain

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

R Graph Gallery: http://gallery.r-enthusiasts.com
`- http://bit.ly/SweN1Z : SuperStorm Sandy

blog:            http://romainfrancois.blog.free.fr
|- http://bit.ly/RE6sYH : OOP with Rcpp modules
`- http://bit.ly/Thw7IK : Rcpp modules more flexible
