I'm reposting the following question relating to using data frames in Rcpp - I 
originally put it up on StackOverflow but Dirk directed me to post it here 
instead. I'm interested in whether there's a resolution to this issue, and if 
not, whether there are future plans to resolve it. 

This is my first post on here, so go easy - I'm hoping my query will get a 
better response than on SO!


In the R / Rcpp code shown below (a toy example), I pass the data frame mydf 
across to the Rcpp code (and pick it up as DF), and then count the 
number of age values that exceed 21, and the number of name values that equal 
"Bob" or "Eve". The two answers (4 and 2) are returned as a list, as shown at 
the end of the code. All hopefully self-explanatory.

Here's my question: Rcpp clearly understands DF["name"] and DF["age"] as being 
the columns name and age in DF - that's great. Given that this notation is 
meaningful, what notation can we use to refer directly to the individual 
elements in DF, so that we don't need to generate intermediate vectors (i.e. 
the std::vectors name and age in the code below)? The reason I ask is that in 
practice the input data frame(s) may well have a much, much greater number of 
columns, and it feels unwieldy to have to map each one individually to a vector 
given that the information is clearly already contained within the DF object. 
If we had to do this to use the columns of a data frame in R, there'd be a riot!

I imagine an answer to this question will be valuable to all those who use Rcpp 
for complex tasks where data frames need passing (which is presumably 
everything beyond a certain level of complexity), so I thought I'd map things 
out in detail. Hope the question is clear, and many thanks in advance for your 
help. :)


library(inline)

mydf = data.frame(name=c("Amy","Bob","Cal","Dan","Eve","Fay","Gus"), 
                  age=c(24,17,31,28,19,20,25), stringsAsFactors=FALSE)


testfunc1 = cxxfunction(
    signature(DFin = "data.frame"),
    plugin = "Rcpp",
    body = '
        Rcpp::DataFrame DF(DFin);
        std::vector<std::string> name = 
                         Rcpp::as< std::vector<std::string> >(DF["name"]);
        std::vector<int> age = 
                         Rcpp::as< std::vector<int> >(DF["age"]);
        int n = name.size();
        int counter1 = 0;
        int counter2 = 0;
        for (int i = 0; i < n; i++) {
            if (age[i] > 21) {
                counter1++;
            }
            if ((name[i] == "Bob") || (name[i] == "Eve")) {
                counter2++;
            }
        }
        return(Rcpp::List::create( _["counter1"] = counter1, 
                                   _["counter2"] = counter2 ));
        ')

out = testfunc1(mydf)
print(out)

The output in out is of course:

$counter1
[1] 4

$counter2
[1] 2
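
As a sanity check on those two counts, independent of R and Rcpp, here is a 
minimal standalone C++ sketch of the same loop run on the toy data. The names 
Counts, count_matches and toy_counts are made up for illustration only; they 
are not part of Rcpp or inline, and this is not a suggested answer to the 
question above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for this sketch; not part of Rcpp.
struct Counts {
    int over21;      // number of ages strictly greater than 21
    int bob_or_eve;  // number of names equal to "Bob" or "Eve"
};

// Same counting loop as in testfunc1, on plain std::vector columns.
Counts count_matches(const std::vector<std::string>& name,
                     const std::vector<int>& age) {
    assert(name.size() == age.size());  // both columns must have the same length
    Counts out = {0, 0};
    for (std::size_t i = 0; i < name.size(); ++i) {
        if (age[i] > 21) {
            ++out.over21;
        }
        if (name[i] == "Bob" || name[i] == "Eve") {
            ++out.bob_or_eve;
        }
    }
    return out;
}

// The toy data from mydf, hard-coded so the sketch needs no R session.
Counts toy_counts() {
    const std::vector<std::string> name =
        {"Amy", "Bob", "Cal", "Dan", "Eve", "Fay", "Gus"};
    const std::vector<int> age = {24, 17, 31, 28, 19, 20, 25};
    return count_matches(name, age);
}
```

Calling toy_counts() should give over21 equal to 4 and bob_or_eve equal to 2, 
matching counter1 and counter2 in the R output.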
