On 27/09/13 12:11, sky Xue wrote:
Hello,
I have a list, shown below, whose elements are data frames (please see the
attached file “try.dat”). Now I want to apply a complicated function to
each row of each data frame; the function returns a single value. For simplicity,
you can assume this function is ma(x), where x is a row of the data frame.
[[1]]
   class_id student_id  1  2
1         1          1  9 14
2         1          2  4  1
3         1          3 10  8
4         1          4  7  7
5         1          5  6 11
6         1          6  1  3
7         1          7 14 10
8         1          8 13 12
9         1          9 12  2
10        1         10  3  9
11        1         11  8  4
12        1         12 11  6
13        1         13  2 13
14        1         14  5  5
[[2]]
   class_id student_id  1  2
15        2          1 11  3
16        2          2  7 10
17        2          3  2  2
18        2          4  6  6
19        2          5 13  8
20        2          6 12 13
21        2          7  8 14
22        2          8  1  9
23        2          9  3  1
24        2         10  4 11
25        2         11  5  4
26        2         12  9 12
27        2         13 10  7
28        2         14 14  5
[[3]]
   class_id student_id  1  2
29        3          1 12  6
30        3          2  1  3
31        3          3  8  2
32        3          4  9 10
33        3          5 11  7
34        3          6 14  4
35        3          7  2 14
36        3          8 13 13
37        3          9  3  8
38        3         10  5 11
39        3         11  4 12
40        3         12  7  1
41        3         13 10  5
42        3         14  6  9
In the real situation the list will be very long, and the data frames are much
wider. That’s why I want to use Rcpp to improve the speed.
I got stuck at the very beginning: I failed to import this list into
Rcpp, let alone the data frames inside it.
I’ve checked the book Seamless R and C++ Integration with Rcpp but found
no example that deals with such a case.
Thank you very much for your support!
Best regards,
Sky
Rcpp has the Rcpp::DataFrame class, which might help you, but it does not
do much.
A data.frame is merely a list of vectors that all have the same length but
may be of different types. This makes it awkward to process a data frame
row by row, so you have to do some work to grab a row and apply something
to it. The code below assumes that the data frame contains only numeric
vectors.
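#include <Rcpp.h>
using namespace Rcpp;

// The function applied to each row; here just the sum of the row's values.
// Replace it with the real computation (e.g. ma(x) from the question).
double fun( NumericVector x ){
    return sum(x) ;
}

// Copy element i of each of the n columns into the preallocated vector row.
void fill_row( NumericVector& row, const std::vector<NumericVector>& vectors, int i, int n ){
    for( int j=0; j<n; j++){
        row[j] = vectors[j][i] ;
    }
}

// Apply fun to every row of a single data frame, one result per row.
// [[Rcpp::export]]
NumericVector apply_row_df( DataFrame df ){
    int n = df.size() ;        // number of columns
    int nrows = df.nrows() ;   // number of rows
    // Extract each column once as a NumericVector (numeric columns assumed).
    std::vector<NumericVector> vectors(n) ;
    for( int i=0; i<n; i++) vectors[i] = df[i] ;
    // row is allocated once and refilled for every row of the data frame.
    NumericVector row(n) ;
    NumericVector results(nrows) ;
    for( int i=0; i<nrows; i++){
        fill_row( row, vectors, i, n );
        results[i] = fun(row) ;
    }
    return results ;
}

// Apply apply_row_df to each data frame of the list.
// [[Rcpp::export]]
List apply_all( List list ){
    return lapply( list, apply_row_df) ;
}

// The R chunk below is run by Rcpp::sourceCpp() after compilation.
/*** R
df <- data.frame( x = seq(0, 10, .1), y = seq(0, 10, .1), z = seq(0, 10, .1) )
apply_row_df( df )
list_of_df <- rep( list(df), 10 )
apply_all( list_of_df )
*/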
The function apply_row_df works on a single data frame: for each row, it
first fills the vector "row" with that row's values using the fill_row
function, then calls fun on it and stores the result.
The rest is just looping.
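To make the gather-then-apply pattern easier to see outside R, here is a minimal standalone C++ sketch of the same structure. It is not Rcpp code: the Column/Frame aliases and row_fun are hypothetical stand-ins (a data frame modelled as a vector of equal-length double columns, and the row sum playing the role of fun), so it can be compiled and tested without R.

```cpp
// Standalone sketch of the pattern used by apply_row_df (no R, no Rcpp).
// Assumption: a "data frame" is a vector of equal-length double columns,
// i.e. column-major storage, so a row has to be gathered column by column.
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

using Column = std::vector<double>;
using Frame = std::vector<Column>;  // hypothetical stand-in for a data.frame

// Stand-in for the per-row function; like fun() above, it sums the row.
double row_fun(const std::vector<double>& row) {
    return std::accumulate(row.begin(), row.end(), 0.0);
}

// Copy element i of every column into the reusable buffer `row`.
void fill_row(std::vector<double>& row, const Frame& df, std::size_t i) {
    for (std::size_t j = 0; j < df.size(); ++j) row[j] = df[j][i];
}

// Apply row_fun to each row, returning one value per row.
std::vector<double> apply_row_df(const Frame& df) {
    const std::size_t nrows = df.empty() ? 0 : df[0].size();
    for (const Column& col : df) assert(col.size() == nrows);  // equal-length columns
    std::vector<double> row(df.size());  // allocated once, refilled per row
    std::vector<double> results(nrows);
    for (std::size_t i = 0; i < nrows; ++i) {
        fill_row(row, df, i);
        results[i] = row_fun(row);
    }
    return results;
}
```

For the first three rows of [[1]] from the question, stored as the columns {1, 1, 1}, {1, 2, 3}, {9, 4, 10} and {14, 1, 8}, apply_row_df returns 25, 8 and 22. As in the Rcpp version, the row buffer is reused so no allocation happens inside the loop.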
apply_all is just a convenience that applies apply_row_df to each
item of a list.
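The same idea extends to the list level and to columns of different types, which is what makes row access awkward in the first place. The sketch below is again hypothetical plain C++17, not the Rcpp API: a column is a std::variant holding either an int or a double vector (for example integer ID columns next to numeric ones), each value is converted to double when the row is gathered, and apply_all maps apply_row_df over a std::vector of frames. The row sum again stands in for ma(x).

```cpp
// Hypothetical sketch: list-of-frames mapping with mixed int/double columns.
#include <cassert>
#include <cstddef>
#include <variant>
#include <vector>

using Column = std::variant<std::vector<int>, std::vector<double>>;
using Frame = std::vector<Column>;  // equal-length columns of either type

// Number of elements in a column, whatever its stored type.
std::size_t col_size(const Column& col) {
    return std::visit([](const auto& v) { return v.size(); }, col);
}

// Read element i of a column as double, whatever its stored type.
double at_as_double(const Column& col, std::size_t i) {
    return std::visit([i](const auto& v) { return static_cast<double>(v.at(i)); }, col);
}

// One result per row; the row sum stands in for the real row function.
std::vector<double> apply_row_df(const Frame& df) {
    const std::size_t n = df.empty() ? 0 : col_size(df.front());
    for (const Column& col : df) assert(col_size(col) == n);
    std::vector<double> row(df.size()), results(n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < df.size(); ++j) row[j] = at_as_double(df[j], i);
        double s = 0.0;
        for (double x : row) s += x;
        results[i] = s;
    }
    return results;
}

// Counterpart of apply_all: one result vector per frame in the list.
std::vector<std::vector<double>> apply_all(const std::vector<Frame>& frames) {
    std::vector<std::vector<double>> out;
    out.reserve(frames.size());
    for (const Frame& df : frames) out.push_back(apply_row_df(df));
    return out;
}
```

With the first two rows of [[1]] and [[2]] as two frames, apply_all gives {25, 8} and {17, 21}. Converting per element keeps the columns in their original types, at the cost of a type dispatch on every access.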
Hope this helps.
Romain
--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel