Thanks so much, Jonas and Dirk! I just tried it and it works great. Jonas' benchmarking results are awesome too. I hope our discussion will help many other people who will come across this issue some time.

Cheers, Ulrich


On 03/28/2012 05:07 PM, Dirk Eddelbuettel wrote:
On 28 March 2012 at 16:36, Jonas Rauch wrote:
| If I remember correctly, environments use hash tables to lookup names.
|
| In pure R to check if an element L[["key"]] exists I would usually use
|>  "key" %in% names(L)
| which seems to be reasonably fast. `%in%` works in vectors, too
|>  keys %in% L
| returning exactly what Dirks suggestion was supposed to do (you missed an 
"if",
| or alternatively an "or")
|
| `%in%` uses .Internal(match(...)), which computes a temporary hash table as 
far
| as I can see in the source (src/main/unique.c). I am not sure about the
| implementation of the Hash. In particular I don't know if some kind of caching
| is done, because the lookup is much faster than Dirks function (see below) and
| computing a hash table for one lookup seems inefficient. I think it would be
| good to use the internal `match` function of R to do what you want. In
| Rinternals.h match is exposed as
|   SEXP Rf_match(SEXP itable, SEXP ix, int nmatch)
| where nmatch is the value to be returned if there is no match.

Excellent suggestion!

I think we can even return just the index vector, with NA or another filler
for non-matches:

R>  g<- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
+   Rcpp::List lst(ls);
+   Rcpp::CharacterVector nam = lst.names();
+
+   Rcpp::IntegerVector ind(Rf_match(nam,ts,R_NaInt));
+   return ind;
+ ')
R>  g(ll, c("aa", "b", "c"))
[1] NA  2 NA
R>

Dirk

| Below are the results of my little benchmark (see below). First parameter of
| `lookup` is the number elements in L, second is the length of keys, third one
| specifies the probability that a key is in present in the list.
|
| Regards,
| Jonas
|
| ##Benchmark results
|>  lookup(10000, 1000, 0.1)
|                 test elapsed relative
| 2         f(L, keys)   0.641   160.25
| 3         g(L, keys)   0.005     1.25
| 1 keys %in% names(L)   0.004     1.00
|>  lookup(10000, 1000, 1.0)
|                 test elapsed relative
| 2         f(L, keys)   0.643   160.75
| 3         g(L, keys)   0.004     1.00
| 1 keys %in% names(L)   0.004     1.00
|>  lookup(100000, 100, 0.1)
|                 test elapsed relative
| 2         f(L, keys)   0.656      164
| 3         g(L, keys)   0.004        1
| 1 keys %in% names(L)   0.004        1
|>  lookup(100000, 100, 1.0)
|                 test elapsed relative
| 2         f(L, keys)   0.654    163.5
| 3         g(L, keys)   0.004      1.0
| 1 keys %in% names(L)   0.004      1.0
|>  lookup(100000, 1, 0.1)
|                 test elapsed relative
| 2         f(L, keys)   0.647   161.75
| 3         g(L, keys)   0.004     1.00
| 1 keys %in% names(L)   0.005     1.25
|>  lookup(100000, 1, 1.0)
|                 test elapsed relative
| 2         f(L, keys)   0.647   161.75
| 3         g(L, keys)   0.004     1.00
| 1 keys %in% names(L)   0.005     1.25
|
|
| ## Code for benchmarks
| library(inline)
| library(rbenchmark)
| f<- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
|     Rcpp::List lst(ls);
|     Rcpp::CharacterVector nam = lst.names();
|     std::vector<std::string>  nm = Rcpp::as<  std::vector<  std::string>  >
| (nam);
|     int m = nam.size();
|
|     Rcpp::CharacterVector tok(ts);
|     std::vector<std::string>  tk = Rcpp::as<  std::vector<  std::string>  >
| (tok);
|     int n = tok.size();
|
|     Rcpp::LogicalVector   log(n);
|
|     for (int i=0; i<n; i++) {  // look at all tokens
|        log[i] = false;                   // assume
| false, but compare to all names
|        for (int j=0; j<m; j++) {
|                     if(tk[i] == nm[j])
|               log[i] = 1;     // and note equality if we find
| it
|        }
|     }
|     return log;
|  ')
|
| g<- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
|     Rcpp::List lst(ls);
|     Rcpp::CharacterVector nam = lst.names();
|
|         Rcpp::IntegerVector ind(Rf_match(nam,ts,0));
|     Rcpp::LogicalVector   log(ind);
|         return log;
|  ')
|
|
| lookup<-function(nList, nKeys, P) {
|     nList<-10000
|     nKeys<-1000
|     P<-0.1
|     keys<-as.character(floor(runif(nKeys, 1, nList/P+1)))
|     L<-as.list(1:nList)
|     names(L)<-as.character(1:nList)
|     benchmark(keys %in% names(L), f(L, keys), g(L, keys), replications=10,
| columns=c("test", "elapsed", "relative"))
| }
|
| lookup(10000, 1000, 0.1)
| lookup(10000, 1000, 1.0)
| lookup(100000, 100, 0.1)
| lookup(100000, 100, 1.0)
| lookup(100000, 1, 0.1)
| lookup(100000, 1, 1.0)
|
|
_______________________________________________
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

Reply via email to