On Mon, Aug 16, 2010 at 11:47 AM, Romain Francois <rom...@r-enthusiasts.com> wrote: > Le 16/08/10 15:33, Douglas Bates a écrit : >> >> On Mon, Aug 16, 2010 at 8:10 AM, Romain Francois >> <rom...@r-enthusiasts.com> wrote: >>> >>> Nothing really conclusive here ... >>> >>> I've added a preliminary version of SUGAR_BLOCK_3 (only for the case >>> where >>> all inputs are vectors) so that one can write : >>> >>> incl8<- ' >>> inline double y_log_y(double y, double mu) { >>> return y ? y*log(y/mu) : 0.; >>> } >>> inline double yMu(double y, double mu) { >>> return 2. * (y_log_y(y, mu) + y_log_y(1.-y,1.-mu)); >>> } >>> double yEta(double y, double eta, double w) { >>> return w * yMu(y, 1./(1. + exp(-eta))); >>> } >>> SUGAR_BLOCK_3(yeta, ::yEta) >>> ' >>> code8<- ' >>> NumericVector ans = yeta(NumericVector(yy), NumericVector(ee), >>> NumericVector(ww) ); >>> return ans ; >>> ' >>> >>> I get : >>> >>> $ Rscript ylogy.R >>> Le chargement a nécessité le package : methods >>> test replications elapsed user.self >>> 1 fx0(yy, ee, ww) 500 0.953 0.947 >>> 2 fx1(yy, ee, ww) 500 2.576 2.521 >>> 3 fx2(yy, ee, ww) 500 2.887 2.843 >>> 4 fx3(yy, ee, ww) 500 2.898 2.854 >>> 5 fx4(yy, ee, ww) 500 1.273 1.241 >>> 6 fx5(yy, ee, ww) 500 1.502 1.450 >>> 7 fx6(yy, ee, ww) 500 1.408 1.356 >>> 8 fx7(yy, ee, ww) 500 1.303 1.273 >>> 9 fx8(yy, ee, ww) 500 1.452 1.389 >>> >>> So slightly better than fx5 but not as good as fx4 >>> >>> >>> I guess there needs to be some investigation on why fx1, fx2 and fx3 >>> don't >>> perform "well". >> >> To me the interesting question is why fx0 does better than fx4 when my >> guess is that it should not be as fast. Is it possible that the level >> of optimization for the C compiler when compiling R is different from >> the level of optimization when inline calls the C++ compiler? > > Apparently not, with the attached file, which essentially is the code that > lives in family.c, I can actually get better performance than the code in R: > > $ Rscript ylogy9.R > Le chargement a nécessité le package : methods > test replications elapsed user.self > 1 fx0(yy, ee, ww) 500 0.954 0.947 > 2 fx9(yy, ee, ww) 500 0.943 0.936 > > Probably just because I cache the symbols rather than let .Call retrieve > them each time. > > So, is there something in family.c you don't do in fx4 ? > > Is it this perhaps: > > for (i = 0; i < n; i++) { > double etai = reta[i], tmp; > tmp = (etai < MTHRESH) ? DOUBLE_EPS : > ((etai > THRESH) ? INVEPS : exp(etai)); > rans[i] = x_d_opx(tmp); > }
That part is to truncate the range of the eta variable so that the corresponding mu doesn't get to 0 or 1. When you get to the extremes of the interval [0,1] for mu (which here represents a probability) the increment calculation breaks down on a division by zero. These days we skip the check there and apply it later when calculating the variance. > > -- > Romain Francois > Professional R Enthusiast > +33(0) 6 28 91 30 30 > http://romainfrancois.blog.free.fr > |- http://bit.ly/bzoWrs : Rcpp svn revision 2000 > |- http://bit.ly/b8VNE2 : Rcpp at LondonR, oct 5th > `- http://bit.ly/aAyra4 : highlight 0.2-2 > > _______________________________________________ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel