Hello, I was hoping that someone familiar with the implementation details of the loess algorithm might be able to help me resolve some difficulties I am having. I am attempting to reproduce some of the functionality of the loess() function in C++. My primary motivation is that I would like to understand the algorithm in detail.
So far I have managed to create a working port in C++ for the linear (degree=1) case. I use the GNU Scientific Library (GSL) to perform the least squares fitting. The program I wrote can be viewed here: http://pastebin.com/0mhMiZee The test data I am using comes from here: http://www.itl.nist.gov/div898/handbook/pmd/section1/dep/dep144.htm After reading Cleveland 1992 paper: A Package of C and Fortran Routines for Fitting Local Regression Models, I was under the impression that in principle the quadratic (degree=2) case would be straight forward, simply fit y ~ x + I(x^2) instead of y ~ x. I realize that Cleveland goes to some effort to optimize the computation by using kd-trees and interpolation to reduce the calculations, but I believe that the proper options to the R loess.control() function can disable that. When I attempt to fit the quadratic in the simple way I describe above, I get results that are very close to R but well outside of any floating point rounding error and signals something more serious. The test program I have written in C++ for the quadratic case can be viewed here: http://pastebin.com/8LZSFSyg Here is some working R code that performs the quadratic loess fit: x <- c(0.5578196, 2.0217271, 2.5773252, 3.4140288, 4.3014084, 4.7448394, 5.1073781, 6.5411662, 6.7216176, 7.2600583, 8.1335874, 9.1224379, 11.9296663, 12.3797674, 13.2728619, 14.2767453, 15.3731026, 15.6476637, 18.5605355, 18.5866354, 18.7572812) y <- c(18.63654, 103.49646, 150.35391, 190.51031, 208.70115, 213.71135, 228.49353, 233.55387, 234.55054, 223.89225, 227.68339, 223.91982, 168.01999, 164.95750, 152.61107, 160.78742, 168.55567, 152.42658, 221.70702, 222.69040, 243.18828) df <- data.frame(x=x, y=y) y.loess <- loess(y ~ x, span=0.33, degree=2, control=loess.control(surface="direct", statistics="exact", trace.hat="exact"), data=df) y.predict <- predict(y.loess) And the output: > y.predict [1] 17.7138 111.3904 147.5650 190.6561 209.4832 217.5014 225.5488 233.8617 231.7305 226.7886 224.7809 224.1512 [13] 167.6779 162.0389 154.9118 159.9907 161.1376 157.1839 221.0499 223.8055 242.7856 My C++ program for the quadratic case mentioned above produces the following output: X Y SMOOTHED 0.557820 18.636540 17.128316 2.021727 103.496460 113.404389 2.577325 150.353910 145.509245 3.414029 190.510310 188.070299 4.301408 208.701150 209.587060 4.744839 213.711350 217.841596 5.107378 228.493530 223.961597 6.541166 233.553870 232.331387 6.721618 234.550540 231.725151 7.260058 223.892250 229.100556 8.133587 227.683390 225.721553 9.122438 223.919820 222.121049 11.929666 168.019990 167.448082 12.379767 164.957500 162.050603 13.272862 152.611070 158.478596 14.276745 160.787420 157.644046 15.373103 168.555670 161.076443 15.647664 152.426580 161.123333 18.560536 221.707020 225.721022 18.586635 222.690400 226.986782 18.757281 243.188280 235.647504 If you are still reading, thanks! I have tried poking around in the R source code for the R_loess_raw (loessc.c) method, but I find it quite dense and hard to follow... if anyone has some debugging ideas, they would be greatly appreciated. Thank you for your attention, Scott [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.