On 27/09/13 12:11, sky Xue wrote:
Hello,

I have a list, shown below, whose elements are data frames (please see the
attached file “try.dat”). I want to apply a complicated function to each
row of each data frame; the function returns a single value. For simplicity,
you can assume this function is ma(x), where x is a row of the data frame.

[[1]]
    class_id student_id  1  2
1         1          1  9 14
2         1          2  4  1
3         1          3 10  8
4         1          4  7  7
5         1          5  6 11
6         1          6  1  3
7         1          7 14 10
8         1          8 13 12
9         1          9 12  2
10        1         10  3  9
11        1         11  8  4
12        1         12 11  6
13        1         13  2 13
14        1         14  5  5

[[2]]
    class_id student_id  1  2
15        2          1 11  3
16        2          2  7 10
17        2          3  2  2
18        2          4  6  6
19        2          5 13  8
20        2          6 12 13
21        2          7  8 14
22        2          8  1  9
23        2          9  3  1
24        2         10  4 11
25        2         11  5  4
26        2         12  9 12
27        2         13 10  7
28        2         14 14  5

[[3]]
    class_id student_id  1  2
29        3          1 12  6
30        3          2  1  3
31        3          3  8  2
32        3          4  9 10
33        3          5 11  7
34        3          6 14  4
35        3          7  2 14
36        3          8 13 13
37        3          9  3  8
38        3         10  5 11
39        3         11  4 12
40        3         12  7  1
41        3         13 10  5
42        3         14  6  9

In the real situation the list will be very long and the data frames much
wider. That’s why I want to use Rcpp to improve the speed.

I got stuck at the very beginning: I could not import this list into
Rcpp, let alone the data frames.

I’ve checked the book Seamless R and C++ Integration with Rcpp but found
no example that deals with such a case.

Thank you very much for your support!

Best regards,

Sky

Rcpp has the Rcpp::DataFrame class, which might help you, but it does not do much on its own.

A data.frame is merely a list of vectors of the same size, but of arbitrary types. This makes it difficult to process rows of a data frame.
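To make the layout concrete, here is a minimal plain C++ sketch (no Rcpp) of a numeric-only data frame stored as a list of columns. The names Column, Frame and gather_row are hypothetical and only serve the illustration; the point is that reading one row means picking one element out of every column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a numeric-only data frame:
// one std::vector<double> per column, all of equal length.
using Column = std::vector<double>;
using Frame  = std::vector<Column>;

// Copy row i of `frame` into a new vector: one element from each column.
std::vector<double> gather_row(const Frame& frame, std::size_t i) {
    std::vector<double> row;
    row.reserve(frame.size());
    for (const Column& col : frame) {
        assert(i < col.size());  // the row index must exist in every column
        row.push_back(col[i]);
    }
    return row;
}
```

A column is contiguous in memory, while a row is scattered across all the columns, which is why row-wise processing needs this extra gathering step.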

So you have to do some work to grab a row of a data frame and apply something to it. The code below assumes that you have a data frame that contains only numeric vectors.


#include <Rcpp.h>
using namespace Rcpp;

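// Function applied to each row; sum() stands in for the real computation.
double fun( NumericVector x){
    return sum(x) ;
}

// Copy row i of the data frame (stored as n column vectors) into "row".
void fill_row( NumericVector& row, const std::vector<NumericVector>& vectors, int i, int n){
    for( int j=0; j<n; j++){
        row[j] = vectors[j][i] ;
    }
}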

// [[Rcpp::export]]
NumericVector apply_row_df( DataFrame df ){
    int n = df.size() ;        // number of columns
    int nrows = df.nrows() ;
    // Grab each column once; assumes every column is numeric.
    std::vector<NumericVector> vectors(n) ;
    for( int i=0; i<n; i++) vectors[i] = df[i] ;

    NumericVector row(n) ;     // buffer reused for every row
    NumericVector results(nrows) ;
    for( int i=0; i<nrows; i++){
        fill_row( row, vectors, i, n );
        results[i]=fun(row) ;
    }
    return results ;
}

// [[Rcpp::export]]
List apply_all( List list ){
    return lapply( list, apply_row_df) ;
}

/*** R
df <- data.frame( x = seq(0, 10, .1), y = seq(0, 10, .1), z = seq(0, 10, .1) )
apply_row_df( df )

list_of_df <- rep( list(df), 10 )
apply_all( list_of_df )
*/


The function apply_row_df works on a single data frame. For each row, it first fills the vector "row" with that row's data using the fill_row function, then calls fun on it and stores the returned value in "results". The same "row" vector is reused for every row, so nothing is allocated inside the loop.


apply_all is just a convenience that applies apply_row_df to each item of a list.
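For readers who want to follow the loop structure without an R session, the same logic can be sketched in plain C++. This is only an illustration under stated assumptions: Frame, row_fun, apply_rows and apply_frames are hypothetical names, a frame is modelled as a vector of equally long numeric columns, and row_fun is a plain sum mirroring fun() above.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

using Column = std::vector<double>;
using Frame  = std::vector<Column>;  // hypothetical numeric-only data frame

// Placeholder row function, mirroring fun() above (a plain sum).
double row_fun(const std::vector<double>& row) {
    return std::accumulate(row.begin(), row.end(), 0.0);
}

// Apply row_fun to every row of one frame, reusing a single row buffer.
std::vector<double> apply_rows(const Frame& frame) {
    const std::size_t ncols = frame.size();
    const std::size_t nrows = ncols == 0 ? 0 : frame[0].size();
    std::vector<double> row(ncols);      // filled again for each row
    std::vector<double> results(nrows);
    for (std::size_t i = 0; i < nrows; ++i) {
        for (std::size_t j = 0; j < ncols; ++j) row[j] = frame[j][i];
        results[i] = row_fun(row);
    }
    return results;
}

// Apply apply_rows to each frame of a list, as apply_all does.
std::vector<std::vector<double>> apply_frames(const std::vector<Frame>& frames) {
    std::vector<std::vector<double>> out;
    out.reserve(frames.size());
    for (const Frame& f : frames) out.push_back(apply_rows(f));
    return out;
}
```

Replacing the body of row_fun is the only change needed to plug in a different per-row computation; the two loops stay the same.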

Hope this helps.

Romain


--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

_______________________________________________
Rcpp-devel mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
