On 07/11/2013 14:43, Romain Francois wrote:
On 07/11/2013 14:30, George Vega Yon wrote:
Romain,

Thanks for your quick response. I've already received that suggestion,
but, besides never having used C++, I wanted to understand first
what I am doing wrong.

For that type of code, it is actually much simpler to learn C++ than it
is to learn the macros and loose typing of the R interface.

Still, would you give me a small example, in R C++, of:

   - Creating a generic vector "L1" of size N
   - Creating a data.frame "D" and increasing its number of rows
   - Storing the data.frame "D" in the first element of "L1"

I would be very grateful if you could do that.

#include <Rcpp.h>
using namespace Rcpp ;

// [[Rcpp::export]]
List example(int N){
     List out(N) ;

     // let's first accumulate data in these two std::vector
     std::vector<double> x ;
     std::vector<int> y ;
     for( int i=0; i<30; i++){
         x.push_back( sqrt( i ) ) ;
         y.push_back( i ) ;
     }

     // Now let's create a data frame
     DataFrame df = DataFrame::create(
         _["x"] = x,
         _["y"] = y
         ) ;

     // storing df as the first element of out
     out[0] = df ;

     return out ;
}

Forgot to mention. You would just put the code above in a .cpp file and call sourceCpp on it.

sourceCpp( "file.cpp" )
example( 3 )

You can also do it like this, acknowledging what a data frame really is
(just a list of vectors):

     List df = List::create(
         _["x"] = x,
         _["y"] = y
         ) ;
     df.attr( "class" ) = "data.frame" ;
     df.attr( "row.names") = IntegerVector::create(
         IntegerVector::get_na(), -30 ) ;


The key thing here is that we accumulate data into std::vector<double>
and std::vector<int> which know how to grow efficiently. Looping around
with SET_LENGTH will allocate and copy data at each iteration of the
loop which will lead to disastrous performance.
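
The cost difference can be made concrete with a minimal sketch in plain C++ (no R headers). It is only an illustration under stated assumptions: both function names are hypothetical, and the second function merely imitates the allocate-and-copy pattern of calling SET_LENGTH inside a loop; it does not call the R API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often the buffer is reallocated while appending n values with
// push_back. std::vector grows its capacity geometrically, so the count
// stays small compared to n.
std::size_t reallocations_with_push_back(std::size_t n) {
    std::vector<double> x;
    std::size_t count = 0;
    std::size_t last_capacity = x.capacity();
    for (std::size_t i = 0; i < n; i++) {
        x.push_back(static_cast<double>(i));
        if (x.capacity() != last_capacity) {  // a reallocation happened
            count++;
            last_capacity = x.capacity();
        }
    }
    assert(x.size() == n);
    return count;
}

// Imitates growing by exactly one element per iteration: every step
// allocates a fresh buffer of the new size and copies the old contents.
// Returns the total number of elements copied.
std::size_t elements_copied_growing_by_one(std::size_t n) {
    std::vector<double> x;
    std::size_t copied = 0;
    for (std::size_t i = 0; i < n; i++) {
        std::vector<double> bigger(x.size() + 1);  // fresh allocation
        for (std::size_t k = 0; k < x.size(); k++) {
            bigger[k] = x[k];
            copied++;
        }
        bigger[x.size()] = static_cast<double>(i);
        x.swap(bigger);
    }
    assert(x.size() == n);
    return copied;
}
```

For n = 1000 the grow-by-one version copies 1000*999/2 = 499500 elements, while push_back triggers only a small number of reallocations (the exact count depends on the growth factor of the standard library in use).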

Romain

Thanks again!

George Vega Yon
+56 9 7 647 2552
http://ggvega.cl


2013/11/7 Romain Francois <rom...@r-enthusiasts.com>:
Hello,

Any particular reason you're not using Rcpp? You would have access to nice
abstractions instead of these MACROS all over the place.

The cost of these abstractions is close to 0.

Looping around and SET_LENGTH is going to be quite expensive. I would urge
you to accumulate data in data structures that know how to grow efficiently,
i.e. a std::vector, and then convert that to an R vector when you're done
with them.

Romain

On 07/11/2013 14:03, George Vega Yon wrote:

Hi!

I didn't want to do this, but I think that this is the easiest way
for you to understand my problem (thanks again for all the comments
that you have made). Here is a copy of the function that I'm working
on. This may be tedious to analyze, so I understand if you don't feel
keen to give it the time. Having dedicated many hours to this (as a new
user of both C and the R C API), I would be very pleased to know what I am
doing wrong here.

G0 is a Nx2 matrix. The first column is a group id (can be shared with
several observations) and the second tells how many individuals are in
that group. This matrix can look something like this:

id_group  nreps
1  3
1  3
1  3
2  1
3  1
4  2
5  1
6  1
4  2
...

L0 is a list of two-column data.frames with different sizes. The first
column (id) holds row indexes (with values 1 to N) and the second column
holds real numbers. L0 can look something like this:
[[1]]
id  lambda
3  0.5
15  0.3
25  0.2
[[2]]
id  lambda
15  0.8
40  0.2
...
[[N]]
id  lambda
80  1

TE0 is an int scalar in {0,1,2}

T0 is a dichotomous vector of length N that can look something like
this
[1] 0 1 0 1 1 1 0 ...
[N] 1

L1 (the expected output) is a modified version of L0, that, for
instance can look something like this (note the rows marked with "*")

[[1]]
id  lambda
3  0.5
*15  0.15 (15 was in the same group as 50, so I added this new row and
divided the value of lambda by two)
25  0.2
*50  0.15
[[2]]
id  lambda
15  0.8
40  0.2
...
[[N]]
id  lambda
*80  0.333 (80 shared group id with 30 and 100, so lambda is divided
by 3)
*30  0.333
*100 0.333
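
A minimal sketch of this transformation for a single list element, written in plain C++ with std::vector instead of the R API. The Row struct, the distribute function and the flat group_id vector (the group of individual k stored at position k-1) are assumptions made for illustration; they are not part of the original function. The group size is obtained by counting members rather than read from the nreps column, and new rows are appended at the end, as in the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One row of an (id, lambda) table; ids are 1-based, as in R.
struct Row {
    int id;
    double lambda;
};

// For every input row whose id shares its group with other individuals,
// divide lambda by the group size and append one row per other member.
std::vector<Row> distribute(const std::vector<Row>& rows,
                            const std::vector<int>& group_id) {
    std::vector<Row> out(rows);  // start from a copy of the input rows
    for (std::size_t l = 0; l < rows.size(); l++) {
        const int id = rows[l].id;
        assert(id >= 1 && static_cast<std::size_t>(id) <= group_id.size());
        const int g = group_id[id - 1];

        // Collect the other members of the same group.
        std::vector<int> others;
        for (std::size_t j = 0; j < group_id.size(); j++) {
            const int candidate = static_cast<int>(j) + 1;
            if (group_id[j] == g && candidate != id) others.push_back(candidate);
        }
        if (others.empty()) continue;  // group of one: row stays unchanged

        // Share lambda equally among all members of the group.
        const double val =
            rows[l].lambda / static_cast<double>(others.size() + 1);
        out[l].lambda = val;
        for (int other : others) out.push_back(Row{other, val});
    }
    return out;
}
```

Because the output grows through push_back, no index arithmetic on a resized buffer is needed, and every write stays within bounds by construction.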

That said, the function is as follows

SEXP distribute_lambdas(
    SEXP G0,  // Group ids (Nx2 matrix). First column: group id,
              // second column: number of elements in the group
    SEXP L0,  // List of N two-column data.frames with different
              // numbers of rows
    SEXP TE0, // Treatment effect (int scalar): ATE(0) ATT(1) ATC(2)
    SEXP T0   // Treat var (bool vector, 0/1, of size N)
)
{

    int i, j, l, m;
    const int *G = INTEGER_POINTER(PROTECT(G0 = AS_INTEGER(G0 )));
    const int *T = INTEGER_POINTER(PROTECT(T0 = AS_INTEGER(T0 )));
    const int *TE= INTEGER_POINTER(PROTECT(TE0= AS_INTEGER(TE0)));
    double *L, val;
    int *I, nlambdas, nreps;

    const int n = length(T0);

    PROTECT_INDEX pin0, pin1;
    SEXP L1;
    PROTECT(L1 = allocVector(VECSXP,n));
    SEXP id, lambda;

    // Fixing size
    for(i=0;i<n;i++)
    {
      SET_VECTOR_ELT(L1, i, allocVector(VECSXP, 2));
    //  SET_VECTOR_ELT(VECTOR_ELT(L1,i), 0, NEW_INTEGER(100));
    //  SET_VECTOR_ELT(VECTOR_ELT(L1,i), 1, NEW_NUMERIC(100));
    }

    // Loop over the list, i.e. over observations
    for(i=0;i<n;i++)
    {

      R_CheckUserInterrupt();

      // Checking if has to be analyzed.
      if (
        ((TE[0] == 1 & !T[i]) | (TE[0] == 2 & T[i])) |
        (length(VECTOR_ELT(L0,i)) != 2)
      )
      {
        SET_VECTOR_ELT(L1,i,R_NilValue);
        continue;
      }

      // Checking how many rows the i-th data.frame has
      nlambdas = length(VECTOR_ELT(VECTOR_ELT(L0,i),0));

      // Pointing to the data.frame's original values
      I = INTEGER_POINTER(AS_INTEGER(PROTECT(VECTOR_ELT(VECTOR_ELT(L0,i),0))));
      L = NUMERIC_POINTER(AS_NUMERIC(PROTECT(VECTOR_ELT(VECTOR_ELT(L0,i),1))));

      // Creating a copy of the pointed values
      PROTECT_WITH_INDEX(id     = duplicate(VECTOR_ELT(VECTOR_ELT(L0,i),0)), &pin0);
      PROTECT_WITH_INDEX(lambda = duplicate(VECTOR_ELT(VECTOR_ELT(L0,i),1)), &pin1);

      // Over the rows of the i-th data.frame
      nreps=0;
      for(l=0;l<nlambdas;l++)
      {
        // If the current lambda id is repeated, i.e. there are more individuals
        // with the same covariates, then enter.
        if (G[n+I[l]-1] > 1)
        {
          /* Changing the length of the object */
          REPROTECT(SET_LENGTH(id,    length(lambda) + G[n+I[l]-1] -1), pin0);
          REPROTECT(SET_LENGTH(lambda,length(lambda) + G[n+I[l]-1] -1), pin1);

          // Getting the new value
          val = L[l]/G[n+I[l] - 1];
          REAL(lambda)[l] = val;

          // Looping over the full set of groups
          m = -1,j = -1;
          while(m < (G[n+I[l]-1] - 1))
          {
            // Looking for individuals in the same group
            if (G[++j] != G[I[l]-1]) continue;

            // If it is the current lambda, then do not assign it
            if (j == (I[l] - 1)) continue;

            INTEGER(id)[length(id) - (G[n+I[l]-1] - 1) + ++m] = j+1;
            REAL(lambda)[length(id) - (G[n+I[l]-1] - 1) + m] = val;
          }

          nreps+=1;
        }
      }

      if (nreps)
      {
        // Replacing elements of the list (modified)
        SET_VECTOR_ELT(VECTOR_ELT(L1, i), 0, duplicate(id));
        SET_VECTOR_ELT(VECTOR_ELT(L1, i), 1, duplicate(lambda));
      }
      else {
        // Setting the list with the old elements
        SET_VECTOR_ELT(VECTOR_ELT(L1, i), 0,
          duplicate(VECTOR_ELT(VECTOR_ELT(L0,i),0)));
        SET_VECTOR_ELT(VECTOR_ELT(L1, i), 1,
          duplicate(VECTOR_ELT(VECTOR_ELT(L0,i),1)));
      }

      // Unprotecting elements
      UNPROTECT(4);
    }

    Rprintf("Exito\n") ;
    UNPROTECT(4);

    return L1;
}

Thanks again in advance.

George Vega Yon
+56 9 7 647 2552
http://ggvega.cl

2013/11/5 George Vega Yon <g.vega...@gmail.com>:

Either way, understanding that it may not be the best way to do it, is
there anything wrong in what I'm doing?
George Vega Yon
+56 9 7 647 2552
http://ggvega.cl


2013/11/5 Gabriel Becker <gmbec...@ucdavis.edu>:

George,

My point is you don't need to create them and then grow them....


for(i=0;i<n;i++)
{
    // Creating the "id" and "lambda" vectors. I do this in every
    // repetition of the loop.

    // ... Some other instructions where I set the value of an integer
    // z, which tells how much the vectors have to grow ...

PROTECT(id=allocVector(INTSXP, 4 +z));
PROTECT(lambda=allocVector(REALSXP, 4 +z));


    // ... some lines where I fill the vectors ...

    // Storing the new vectors at the i-th element of the list
    SET_VECTOR_ELT(VECTOR_ELT(L1, i), 0, duplicate(id));
    SET_VECTOR_ELT(VECTOR_ELT(L1, i), 1, duplicate(lambda));

    // Unprotecting the "id" and "lambda" vectors
    UNPROTECT(2);
}
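
The idea of giving the vectors the right size before filling them can be sketched as a two-pass loop: count first, then allocate once. This is plain C++ rather than the R API, and the function name and the flat group_id input are hypothetical; in the C code above, the computed size is what would be passed to allocVector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns `id` followed by the 1-based ids of the other individuals that
// share its group. group_id[k] is the group of individual k+1.
std::vector<int> ids_with_group_members(int id,
                                        const std::vector<int>& group_id) {
    assert(id >= 1 && static_cast<std::size_t>(id) <= group_id.size());
    const int g = group_id[id - 1];

    // Pass 1: count how many extra slots are needed (the "z" above).
    std::size_t z = 0;
    for (std::size_t j = 0; j < group_id.size(); j++) {
        if (group_id[j] == g && static_cast<int>(j) + 1 != id) z++;
    }

    // Single allocation with the final length; no later resizing.
    std::vector<int> out(1 + z);
    out[0] = id;

    // Pass 2: fill the slots that were counted in pass 1.
    std::size_t k = 1;
    for (std::size_t j = 0; j < group_id.size(); j++) {
        if (group_id[j] == g && static_cast<int>(j) + 1 != id) {
            out[k++] = static_cast<int>(j) + 1;
        }
    }
    assert(k == out.size());
    return out;
}
```

The second pass repeats the test of the first one, which costs an extra scan but avoids both reallocation and the need to reprotect a resized object.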

~G


On Tue, Nov 5, 2013 at 1:56 PM, George Vega Yon <g.vega...@gmail.com>
wrote:


Gabriel,

While the length of L1 (in terms of the number of SEXP elements it stores)
doesn't change, the vectors within L1 do (sorry if I didn't explain
it well before).

The post was about a SEXP object that grows. In my case, every pair of
vectors in L1 (id and lambda) can change lengths, which is why I need
to reprotect them. I populate the i-th element of L1 by creating the
vectors "id" and "lambda", setting the length of these according to
some rule (that's the part where lengths change)... here is a reduced
form of my code:

//////////////////////////////////////// C ////////////////////////////////////////
const int n = length(L0);
SEXP L1;
PROTECT(L1 = allocVector(VECSXP,n));
SEXP id, lambda;

// Fixing size
for(i=0;i<n;i++)
    SET_VECTOR_ELT(L1, i, allocVector(VECSXP, 2));

for(i=0;i<n;i++)
{
    // Creating the "id" and "lambda" vectors. I do this in every
    // repetition of the loop.
    PROTECT_WITH_INDEX(id=allocVector(INTSXP, 4), &ipx0);
    PROTECT_WITH_INDEX(lambda=allocVector(REALSXP, 4), &ipx1);

    // ... Some other instructions where I set the value of an integer
    // z, which tells how much the vectors have to grow ...

    REPROTECT(SET_LENGTH(id,    length(lambda) + z), ipx0);
    REPROTECT(SET_LENGTH(lambda,length(lambda) + z), ipx1);

    // ... some lines where I fill the vectors ...

    // Storing the new vectors at the i-th element of the list
    SET_VECTOR_ELT(VECTOR_ELT(L1, i), 0, duplicate(id));
    SET_VECTOR_ELT(VECTOR_ELT(L1, i), 1, duplicate(lambda));

    // Unprotecting the "id" and "lambda" vectors
    UNPROTECT(2);
}

UNPROTECT(1);

return L1;
//////////////////////////////////////// C ////////////////////////////////////////

I can't set the length from the start because every pair of vectors in
L1 has a different length, which I cannot tell before starting
the loop.

Thanks for your help,

Regards,

George Vega Yon
+56 9 7 647 2552
http://ggvega.cl


2013/11/5 Gabriel Becker <gmbec...@ucdavis.edu>:

George,

I don't see the relevance of the stackoverflow post you linked. In the post,
the author wanted to change the length of an existing "mother list" (matrix,
etc), while you specifically state that the length of L1 will not change.

You say that the child lists (vectors if they are INTSXP/REALSXP) are
variable, but that is not what the linked post was about unless I am
completely missing something.

I can't really say more without knowing the details of how the vectors are
being created and why they cannot just have the right length from the
start.

As for the error, that is a weird one. I imagine it means that a SEXP thinks
that it has a type other than ones defined in Rinternals. I can't speak to
how that could have happened from what you posted though.

Sorry I can't be of more help,
~G



On Mon, Nov 4, 2013 at 8:00 PM, George Vega Yon
<g.vega...@gmail.com>
wrote:


Dear R-devel,

A couple of weeks ago I started to use the R C API for package
development. Without knowing much about C, I've been able to write
some routines successfully... until now.

My problem consists in dynamically creating a list ("L1") of lists
using .Call. The tricky part is that each element of the "mother list"
contains two vectors (INTSXP and REALSXP types) with varying sizes;
sizes that I set while I'm looping over the elements of another list
("L0", the input list). The steps I've followed are:

FIRST: Create the "mother list" of size "n=length(L0)" (doesn't
change), protect it with
    PROTECT(L1=allocVector(VECSXP, length(L0)))
and fill it with lists of length two:
    for(i=0;i<n;i++) SET_VECTOR_ELT(L1,i, allocVector(VECSXP, 2));

then, for each element of the mother list:

    for(i=0;i<n;i++) {

SECOND: By reading this post on Stackoverflow

http://stackoverflow.com/questions/7458364/growing-an-r-matrix-inside-a-c-loop/7458516#7458516

I understood that it was necessary to (1) create the "child lists" and
protect them with PROTECT_WITH_INDEX, and (2) change their size
using SET_LENGTH (Rf_lengthgets) and REPROTECT the lists in order
to tell the GC that the vectors had changed.

THIRD: Once my two vectors are done ("id" and "lambda"), assign them
to the i-th element of the "mother list" L1 using
    SET_VECTOR_ELT(VECTOR_ELT(L1,i), 0, duplicate(id));
    SET_VECTOR_ELT(VECTOR_ELT(L1,i), 1, duplicate(lambda));

and unprotect the elements protected with index: UNPROTECT(2);

}

FOURTH: Unprotect the "mother list" (L1) and return it to R.

With small datasets this works fine, but after trying with bigger ones
R (my code) keeps failing and returning a strange error that I haven't
been able to identify (or find on the web):

    "unimplemented type (29) in 'duplicate'"

This happens right after I try to use the returned list from my
routine (trying to print it or building a data frame).

Does anyone have an idea of what I am doing wrong?

Best regards,

PS: I didn't want to copy the entire function... but if you need it
I can do it.

George Vega Yon
+56 9 7 647 2552
http://ggvega.cl

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





--
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis








--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

