Re: [Rcpp-devel] Using sugar expressions for evaluating deviance residuals

Romain Francois Sun, 15 Aug 2010 01:39:42 -0700

Hi,

Could you provide a minimal example using inline, etc.?

Here are a few hints in the meantime. In the next version of Rcpp, I've added a few things that help generate sugar functions. This is how, for example, the sugar version of choose is currently implemented:


SUGAR_BLOCK_2(choose      , ::Rf_choose     )

The SUGAR_BLOCK_2 macro (perhaps I should prefix it with RCPP_) lives in Rcpp/sugar/SugarBlock.h


You can promote y_log_y to a sugar function like this :

inc <- '
double y_log_y(double y, double mu){
    return (y) ? (y * log(y/mu)) : 0;
}
SUGAR_BLOCK_2( y_log_y , ::y_log_y )
'

fx <- cxxfunction( signature( y = "numeric", mu = "numeric" ), '
        NumericVector res = y_log_y(
                NumericVector(y),
                NumericVector(mu)
        ) ;
        return res ;
', plugin = "Rcpp", includes = inc )
fx( runif(11) , seq(0, 1, .1 ) )


This gives you 3 Rcpp::y_log_y functions :

- one for when y is a double and mu is a NumericVector (or any numeric sugar expression)
- one for when y is a NumericVector and mu is a double
- one for when both are NumericVector.

However, at the moment it does not take care of recycling, so it is up to the user to make sure that y and mu have the same length. I'm currently thinking about how to deal with recycling.
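To make the recycling question concrete, here is a minimal sketch in plain C++ (std::vector rather than Rcpp types, so it does not depend on sugar at all). The name y_log_y_recycled is made up for the illustration; it only shows the usual R convention of indexing each input modulo its own length, not what sugar does or will do.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar helper from this thread: y * log(y / mu), with the limit 0 at y == 0.
inline double y_log_y(double y, double mu) {
    return y ? y * std::log(y / mu) : 0.0;
}

// Hypothetical helper (not part of Rcpp): element-wise y_log_y where the
// shorter input is recycled, as R does for arithmetic on vectors.
std::vector<double> y_log_y_recycled(const std::vector<double>& y,
                                     const std::vector<double>& mu) {
    // As in R, a zero-length input gives a zero-length result.
    if (y.empty() || mu.empty()) return std::vector<double>();

    const std::size_t n = std::max(y.size(), mu.size());
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Index each input modulo its own length.
        out[i] = y_log_y(y[i % y.size()], mu[i % mu.size()]);
    }
    return out;
}
```

The modulo on every access is part of why recycling is not free, which is one reason to think about it before building it into every operator.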




rep is a possibility here, but consider this hint in the TODO file:

    o   not sure rep should be lazy, i.e. rep( x, 4 ) fetches x[i] 4 times,
        maybe we should use LazyVector like in outer to somehow cache the
        result when it is judged expensive to calculate

One other thing about sugar is that it has to do many checks for missing values, so if you called this expression (with the function defined above):


2 * wt * (y_log_y(y, mu) + y_log_y(1.-y, 1.-mu))

(and again, currently binary operators +,*,.. don't take care of recycling)

you would have many checks for missing values. That is fine if you may have missing values, because it will propagate them correctly, but it can slow things down because they are tested for at every step.

If you are sure that y and mu don't contain missing values, then perhaps one thing we can do is embed that information in the data, so that sugar does not have to check for missing values because it just assumes there are none. Most of the code in sugar has a version that ignores missing values, controlled by a template parameter. For example, seq_len creates a sugar expression that we know for sure does not contain missing values.
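To illustrate the template-parameter idea, here is a minimal sketch in plain C++ (std::vector rather than Rcpp types). The names NA_INT and add_vectors are made up for the illustration, and this is not how sugar is actually written; the only fact borrowed from R is that an integer NA is stored as INT_MIN.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// R stores NA_integer_ as INT_MIN; NA_INT is a stand-in name for this sketch.
const int NA_INT = INT_MIN;

// Hypothetical helper: element-wise sum of two integer vectors.
// NA = true  : test every element and propagate missing values.
// NA = false : the caller promises there are no missing values, so the
//              test is a compile-time constant and can be optimised away.
template <bool NA>
std::vector<int> add_vectors(const std::vector<int>& a,
                             const std::vector<int>& b) {
    assert(a.size() == b.size());  // no recycling in this sketch
    std::vector<int> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (NA && (a[i] == NA_INT || b[i] == NA_INT)) {
            out[i] = NA_INT;
        } else {
            out[i] = a[i] + b[i];
        }
    }
    return out;
}
```

Calling add_vectors<false>(a, b) on data that does contain NA_INT would silently give wrong answers, so the flag is only safe when the absence of missing values is known by construction.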

One way perhaps to short-circuit this is to first write the code with doubles and then promote it to sugar:


double resid( double y, double mu, double wt){
        return 2 * wt * (y_log_y(y, mu) + y_log_y(1.-y, 1.-mu)) ;
}

But then you need someone to write SUGAR_BLOCK_3 or write it manually.
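For the "write it manually" route, a minimal sketch in plain C++ could look like the following. It uses std::vector instead of NumericVector so that it stands alone, and resid_all is a made-up name. It assumes the three inputs have the same length and contain no missing values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar helper from this thread.
inline double y_log_y(double y, double mu) {
    return y ? y * std::log(y / mu) : 0.0;
}

// Scalar deviance residual for one binary-response observation.
inline double resid(double y, double mu, double wt) {
    return 2 * wt * (y_log_y(y, mu) + y_log_y(1. - y, 1. - mu));
}

// Hypothetical hand-written loop: one pass over the data, no temporaries
// for the intermediate terms and no per-step missing-value checks.
std::vector<double> resid_all(const std::vector<double>& y,
                              const std::vector<double>& mu,
                              const std::vector<double>& wt) {
    assert(y.size() == mu.size() && y.size() == wt.size());
    std::vector<double> out(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        out[i] = resid(y[i], mu[i], wt[i]);
    }
    return out;
}
```

With y = 1, mu = 0.5 and wt = 1 this gives 2 * log(2), since the second y_log_y term takes the zero branch.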

Another idea would be to have something like NumericVector::import_transform but that would take 3 vector parameters instead of 1.

Sorry if this email is a bit of a mess, I sort of wrote the ideas as they came.
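Nothing like this exists in Rcpp as far as this email is concerned, so the following is only a sketch of the shape such a helper might take, written against std::vector. The name transform3 is made up; it is a three-input analogue of std::transform that applies a scalar function element by element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical three-input transform: out[i] = fun(a[i], b[i], c[i]).
// Assumes equal lengths; recycling is deliberately left out.
template <typename T, typename Fun>
std::vector<T> transform3(const std::vector<T>& a,
                          const std::vector<T>& b,
                          const std::vector<T>& c,
                          Fun fun) {
    assert(a.size() == b.size() && a.size() == c.size());
    std::vector<T> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = fun(a[i], b[i], c[i]);
    }
    return out;
}

// Example scalar function to pass in: x + y * z.
inline double fused_multiply_add_example(double x, double y, double z) {
    return x + y * z;
}
```

A scalar function such as a three-argument resid could be passed as fun in the same way, which keeps the scalar code and the looping code separate.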


Romain

On 14/08/10 20:28, Douglas Bates wrote:

In profiling code for generalized linear mixed models for binary
responses I found that a substantial portion of the execution time was
being spent in evaluating the R functions for inverse links and for
the deviance residuals.  As a result I wrote C code (in
$RSRC/src/library/stats/src/family.c) for some of those.

The way that some research is going I expect that I will soon be in
the position where I need to evaluate such expressions even more
frequently so it is worthwhile to me to tune it up.

In general there will be three NumericVector objects -- y (observed
responses), wt (weights) and eta (linear predictors) -- plus an
IntegerVector ind.  y, wt and ind will all have length n.  eta's
length can be a multiple of n.

The index vector, ind, is a factor with k < n levels so all of its
elements are between 1 and k.

The objective is to apply the inverse link function to eta, producing
the predicted mean response, mu, evaluate the deviance residuals for
y, mu and wt and sum the deviance residuals according to the indices
in ind.

The simplest inverse link is for the logit link

  NumericVector mu = 1./(1. + exp(-eta));

For the deviance residuals I defined a helper function

static R_INLINE
double y_log_y(double y, double mu)
{
     return (y) ? (y * log(y/mu)) : 0;
}

which has the appropriate limiting behavior when y is zero.  Using
that, the deviance residuals can be evaluated as

2 * wt * (y_log_y(y, mu) + y_log_y(1.-y, 1.-mu))

That last expression could have different lengths of y and mu but I
could use rep to extend y, wt and ind to the desired length.

The conditional evaluation in y_log_y is not something that can be
ignored because, in many cases y consists only of 0's and 1's so
either y_log_y(y, mu) or y_log_y(1.-y, 1.-mu) will fail the condition
in the ? expression.

Are there suggestions on how best to structure the calculation to make
it blazingly fast?



--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/bzoWrs : Rcpp svn revision 2000
|- http://bit.ly/b8VNE2 : Rcpp at LondonR, oct 5th
`- http://bit.ly/aAyra4 : highlight 0.2-2
