RE: AI-GEOSTATS: moving averages and trend
Sebastiano, I am struggling to understand why you are interested in doing trend + residual separation. There can be no unique decomposition of a data set into 'trend' and 'residual'; the split is a judgement about which model you feel is most appropriate given your prior beliefs and observations (evidence). The only thing you can do with the model is validate it on out-of-sample data (even as a Bayesian I say this!). So in a sense there is no correct decomposition, and any decomposition is valid (so long as it is correctly implemented - maybe that is your question?). Are some decompositions better than others? Well, yes, they are likely to be, but this largely depends on your data (and the completeness of the overall model).

In terms of your original question about the shape of the kernel, there is no overall theory that I am aware of - different kernels will have different properties in terms of the function classes that they represent (e.g. differentiability, frequency response / characteristic length scales). Kernel families will have different null spaces, which might or might not be important for your specific application and what you want to find out.

I'm not sure if this is terribly helpful ... but I think it is the reality - everything depends on your data and your judgement (prior). Conditional on those you get a model, and you need to validate this model carefully ... then you are OK.

cheers

Dan

---
Dr Dan Cornford
Senior Lecturer, Computer Science and NCRG
Aston University, Birmingham B4 7ET
www: http://wiki.aston.ac.uk/DanCornford/
tel: +44 (0)121 204 3451
mob: 07766344953
---

From: owner-ai-geost...@jrc.ec.europa.eu On Behalf Of seba
Sent: 02 February 2010 08:39
To: Pierre Goovaerts
Cc: ai-geostats@jrc.it
Subject: Re: AI-GEOSTATS: moving averages and trend

Hi Pierre,

I think that for my task factorial kriging is a little too sophisticated (nevertheless, is there any open-source or free implementation of it? I remember that it is implemented in Isatis). I have an exhaustive, regularly spaced data set (i.e. a grid) and I need to calculate locally the spatial variability of the residual surface - or better, the spatial variability of the high-frequency component. Here I'm lucky, because I know exactly what I want to see and what I need to filter out.

In theory, using (overlapping) moving-window averages (though it seems better to use some more complex kernel), one should be able to filter out the short-range variability (characterised, perhaps, by a variogram range within the window size?). Seeing the problem from another perspective, in the case of a perfect sine-wave behaviour I should be able to filter out spatial variability components with wavelengths up to the window size. But maybe there is something flawed in my reasoning, so feedback is appreciated!

Bye
Sebastiano

At 16.27 01/02/2010, you wrote:

Well, Factorial Kriging Analysis allows you to tailor the filtering weights to the spatial patterns in your data. You can use the same filter size but different kriging weights, depending on whether you want to estimate the local or regional scales of variability.

Pierre

2010/2/1 seba sebastiano.trevis...@libero.it

Hi José,

Thank you for the interesting references. I'm going to take a look!

Bye
Sebastiano

At 15.46 01/02/2010, José M. Blanco Moreno wrote:

Hello again,

I am not a mathematician, so I never worried too much about the theoretical reasons. You may be able to find some discussion of this subject in Eubank, R.L. 1999. Nonparametric Regression and Spline Smoothing, 2nd ed. M. Dekker, New York. You may also be interested in searching for work related to (and perhaps citing): Altman, N. 1990. Kernel smoothing of data with correlated errors. Journal of the American Statistical Association, 85: 749-759.

seba wrote:

Hi José,

Thank you for your reply. Indeed, I'm trying to figure out the theoretical reasons for their use.

Bye
Sebas

--
Pierre Goovaerts
Chief Scientist at BioMedware Inc.
3526 W Liberty, Suite 100
Ann Arbor, MI 48103
Voice: (734) 913-1098 (ext. 202)
Fax: (734) 913-2201

Courtesy Associate Professor, University of Florida
Associate Editor, Mathematical Geosciences
Geostatistician, Computer Sciences Corporation

President, PGeostat LLC
710 Ridgemont Lane
Ann Arbor, MI 48103
Voice: (734) 668-9900
Fax: (734) 668-7788
http://goovaerts.pierre.googlepages.com/
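To illustrate the moving-window idea Sebastiano describes above: a Gaussian kernel acts as a low-pass filter on a regular grid, so the residual (data minus the smoothed surface) retains roughly the wavelengths shorter than the window scale, and smoothing the squared residual gives the local short-range variability he is after. The following is a minimal R sketch only; the toy grid, the Gaussian kernel and the window size are illustrative assumptions, not a recommendation.

## Minimal sketch: split a gridded surface into a low-frequency
## "trend" (kernel moving average) and a high-frequency residual.
## Assumes the grid is held as a plain R matrix.

gaussian_kernel <- function(half, sigma) {
  x <- -half:half
  k <- exp(-outer(x^2, x^2, "+") / (2 * sigma^2))
  k / sum(k)                         # normalise so weights sum to 1
}

smooth2d <- function(z, k) {
  half <- (nrow(k) - 1) / 2
  nr <- nrow(z); nc <- ncol(z)
  out <- matrix(NA_real_, nr, nc)    # borders are left as NA
  for (i in (half + 1):(nr - half)) {
    for (j in (half + 1):(nc - half)) {
      out[i, j] <- sum(k * z[(i - half):(i + half), (j - half):(j + half)])
    }
  }
  out
}

## Toy grid: long-wavelength sine "trend" plus short-range noise
set.seed(1)
n  <- 100
xy <- outer(sin(2 * pi * (1:n) / 50), sin(2 * pi * (1:n) / 50))
z  <- xy + matrix(rnorm(n * n, sd = 0.3), n, n)

k        <- gaussian_kernel(half = 7, sigma = 3)   # ~15x15 window
trend    <- smooth2d(z, k)
residual <- z - trend                   # high-frequency component
local_sd <- sqrt(smooth2d(residual^2, k))  # local short-range variability
sd(as.vector(residual), na.rm = TRUE)      # overall residual variability

The kernel choice matters in exactly the way Dan notes: a boxcar (plain moving-average) window has strong sidelobes in its frequency response, while a Gaussian rolls off smoothly, so how sharply the residual is cut off at a given wavelength depends on the kernel, not just the window size.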
RE: AI-GEOSTATS: Interpolation of measures with measurement errors
Enrico, sorry we have caused some problems / confusion. I will reply inline below:

From: Enrico Guastaldi
Sent: 05 October 2009 17:04
To: Cornford, Dan
Cc: ai-geostats@jrc.it; r-sig-...@stat.math.ethz.ch
Subject: Re: AI-GEOSTATS: Interpolation of measures with measurement errors

Dear Dan and dear list members,

I'll try to explain my problem in two steps: first the theory, then the practical application to my case study.

1) Actually my errors are not exactly Gaussian; however, I think I can treat them as Gaussian-like. It seems I have to set the diagonal of the covariance matrix (used for the variogram) to non-zero values - maybe I should put the measurement errors on that diagonal instead of zeros. However, these are not actual variances but ranges derived from the instrumental error. Is it possible to treat them as confidence intervals and use them as the diagonal of the covariance matrix? This in theory.

DC: I think this highlights that when people specify errors they should be as precise as possible (this is what UncertML is designed to help people do). Giving a range is really only useful with a precise definition of what the min and max values mean (5th and 95th percentiles?). In a Bayesian approach to the problem, which I prefer, one is required to specify a probability distribution (two percentiles alone are not enough). This can be quite a challenge, but the imprecise probability models, while often rather attractive on the surface, can lack the ability to support complex analysis and generally can only make weaker statements (note I don't want to start a debate here about Bayesian versus other subjective / imprecise uncertainty frameworks!).

DC: Also, in theory, yes: I might want to put observation errors on the diagonal of the covariance matrix, but I might also want to allow an additional nugget effect to model unresolved variation (i.e. things happening below the measurement separation distance) - I would want to estimate the nugget effect, but fix the known observation errors. We do this in the psgp code.

DC: On the issue of Gaussian errors, well, I think you are going to need to make some distributional assumption here, and the Gaussian one will make life easier. My experience is that if your underlying distribution is symmetric then your mean predictions will not be overly sensitive to the exact distributional form (i.e. small deviations from Gaussian), but if the distribution is significantly skewed then this might be more of an issue. This is in practice, I must say. I would also emphasise that here we are discussing the errors on the observations, not the observations themselves!

But in practice???

2) In practice, since I'm not a programmer, I tried to use the psgp R package to perform the interpolation of my variable with the associated errors. I tried the INTAMAP online service, but I had some problems (perhaps with network traffic) and did not get a result. So I installed psgp and intamap on my computer, in order to perform the calculation locally.
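To make the point about the covariance diagonal concrete, here is a minimal R sketch of Gaussian process / simple kriging prediction written directly in matrix algebra, with the known per-observation error variances plus a nugget on the diagonal. The exponential covariance and every parameter value are illustrative assumptions; this is not the psgp implementation.

## Minimal sketch: GP / simple kriging prediction with known,
## per-observation error variances on the covariance diagonal,
## plus a nugget. All parameter values are illustrative.

exp_cov <- function(d, sill, range) sill * exp(-d / range)

cross_dist <- function(A, B) {       # distances between rows of A and B
  d2 <- outer(rowSums(A^2), rep(1, nrow(B))) +
        outer(rep(1, nrow(A)), rowSums(B^2)) - 2 * A %*% t(B)
  sqrt(pmax(d2, 0))
}

gp_predict <- function(X, y, err_var, Xnew, sill, range, nugget) {
  mu <- mean(y)
  Ky <- exp_cov(as.matrix(dist(X)), sill, range) +
        diag(err_var + nugget)       # known errors + nugget on diagonal
  Ks <- exp_cov(cross_dist(Xnew, X), sill, range)
  a  <- solve(Ky, y - mu)
  W  <- Ks %*% solve(Ky)             # fine for small n; use Cholesky otherwise
  list(mean = as.vector(mu + Ks %*% a),
       var  = pmax(sill - rowSums(W * Ks), 0))  # noise-free field variance
}

## First three rows of Enrico's data, treating ERR_V2_PPM as one
## standard deviation (an interpretation that must be checked!):
X  <- matrix(c(715946.9, 4826440.3,
               722818.6, 4824910.5,
               725514.9, 4815239.5), ncol = 2, byrow = TRUE)
y  <- c(2.28, 2.82, 2.38)
ev <- c(0.14, 0.14, 0.13)^2          # stddev -> variance
gp_predict(X, y, ev, Xnew = matrix(c(720000, 4820000), ncol = 2),
           sill = 0.5, range = 10000, nugget = 0.05)

The key line is the diag(err_var + nugget) term: the known error variances are fixed per observation, while in a real analysis the sill, range and nugget would be estimated, as in the sketch after the next message.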
First ten rows of my dataset are as follows (variable names: X, Y, V2_PPM, ERR_V2_PPM):

715946.900,4826440.340,2.280,0.140
722818.590,4824910.500,2.820,0.140
725514.920,4815239.460,2.380,0.130
722793.930,4810022.240,3.160,0.150
717682.540,4811456.540,3.040,0.140
712376.620,4806677.870,2.730,0.150
716270.140,4801958.660,2.650,0.140
721068.720,4801447.860,2.990,0.150
718812.980,4792920.780,4.450,0.170
722315.960,4788258.960,2.190,0.130
...

Actually I'm not understanding how to process these data to interpolate my variable V2_PPM. The EPSG code is 23032. I think I have to set up my Intamap object as follows (is it correct?):

library(psgp)
coordinates(rock) = ~X+Y
data(grid.enrico)
gridded(grid.enrico) = ~x+y
proj4string(rock) = CRS("+init=epsg:23032")
proj4string(grid.enrico) = CRS("+init=epsg:23032")
# set up intamap object:
obj = createIntamapObject(
  observations = rock,
  predictionLocations = grid.enrico,
  targetCRS = "+init=epsg:23032",
  class = "psgp"
)

However, I did not understand where I can declare the variable containing the measurement errors.

DC: I am not a great R user, I am afraid (I probably should not be sending this to the R-sig list!) but Jon / Remi might be able to provide more advice on how to use psgp from R in practice. I know the WPS rather better!

DC: How big is your data set? If you send it to me, I'd be very interested to try it with the web service, since this is part of our project outcomes!!

Moreover, I do not understand how to continue the process (i.e. experimental variogram, variogram modelling, and finally kriging).

DC: I can say more about how the psgp method works. This is a maximum (marginal) likelihood inference based method, so there is no experimental variogram modelling, rather a fixed covariance function (in the present
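DC's last point - maximum marginal likelihood with a fixed covariance family instead of variogram fitting - can be sketched in a few lines of R. Again this is a toy illustration, not the psgp code: the covariance family, log-parameterisation, starting values and optimiser are all assumptions. The known error variances stay fixed on the diagonal while sill, range and nugget are estimated.

## Toy sketch of maximum (marginal) likelihood inference for a GP
## with fixed, known observation-error variances.
## err_var is held fixed; sill / range / nugget are estimated.

neg_log_marginal <- function(logpar, D, y, err_var) {
  sill   <- exp(logpar[1])          # log-parameterisation keeps
  range  <- exp(logpar[2])          # all parameters positive
  nugget <- exp(logpar[3])
  Ky <- sill * exp(-D / range) + diag(err_var + nugget)
  U  <- chol(Ky)                    # upper-triangular Cholesky factor
  r  <- y - mean(y)
  a  <- backsolve(U, forwardsolve(t(U), r))   # a = Ky^{-1} r
  ## -log p(y) up to a constant: 0.5 * r' Ky^{-1} r + 0.5 * log|Ky|
  0.5 * sum(r * a) + sum(log(diag(U)))
}

fit_ml <- function(X, y, err_var) {
  D   <- as.matrix(dist(X))
  opt <- optim(log(c(var(y), max(D) / 3, 0.1 * var(y))),   # crude starts
               neg_log_marginal, D = D, y = y, err_var = err_var)
  setNames(exp(opt$par), c("sill", "range", "nugget"))
}

## With X, y, ev from the earlier sketch:  fit_ml(X, y, ev)

The fitted parameters would then feed a predictor like gp_predict above; note there is no experimental variogram anywhere in this workflow, exactly as DC describes.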
RE: AI-GEOSTATS: Interpolation of measures with measurement errors
Enrico, we have built an online system to perform this as part of the INTAMAP project - you can try it here: http://intamap.geo.uu.nl/~jon/intamap/tryIntamapj.php

If you paste in observations with Gaussian errors (I assume the +/- means one or two standard deviations - I would check this!) in the form

x, y, value, stddev

then our interpolation method (called psgp, which will shortly be released as an R and C++ library too) will provide a prediction of the mean and variance using a maximum likelihood Gaussian process method. The interface on that web page should let you try out the system very simply, and the associated web site has details of more interactive ways of using the service, or of installing the system on your own machine.

If you want a quick look, I suggest using the OMI NO2 data set, which contains error estimates (this could take a little time depending on the load on the service!). Note the visualisation is still a little beta, so I would not entirely trust the legends! Further details will be added to the web site in the next few weeks!

cheers

Dan

---
Dr Dan Cornford
Senior Lecturer, Computer Science and NCRG
Aston University, Birmingham B4 7ET
www: http://wiki.aston.ac.uk/DanCornford/
tel: +44 (0)121 204 3451
mob: 07766344953
---

From: owner-ai-geost...@jrc.it On Behalf Of Enrico Guastaldi
Sent: 28 September 2009 13:55
To: ai-geostats@jrc.it
Subject: AI-GEOSTATS: Interpolation of measures with measurement errors

Dear list members,

I'm looking for some kind of interpolation for an environmental variable that has been measured together with its measurement errors; for instance, one measurement is 45 ppm +/- 10.7 ppm, another is 10 ppm +/- 3 ppm, and so on. In practice, the measurements and the measurement errors are two independent variables. I could use some kind of kriging; however, I know exactly the magnitude of the error at every sampled location, i.e. the value plus or minus the error given to me by the laboratory.

Could anyone tell me what kind of function I should use to handle this problem? An R package would be nice, of course, but I need to understand the background theory.

Thanks in advance,
Regards,
Enrico Guastaldi
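Dan's caveat about the meaning of the +/- matters numerically: the same quoted half-width implies variances that differ by a factor of four depending on whether it is one or two standard deviations. A small R sketch of preparing such data for the x, y, value, stddev form above; the column names are hypothetical and the coordinates illustrative, with the values taken from Enrico's examples:

## Converting "value +/- half-width" data to x, y, value, stddev.
## The key judgement is what the lab's quoted half-width represents.

obs <- data.frame(x     = c(715946.9, 722818.6),   # illustrative coords
                  y     = c(4826440.3, 4824910.5),
                  value = c(45, 10),
                  half  = c(10.7, 3))               # quoted "+/-"

obs$sd_if_1sd <- obs$half        # if +/- is one standard deviation
obs$sd_if_2sd <- obs$half / 2    # if +/- spans two sds (~95% interval)

## The implied variances differ by a factor of four:
(obs$sd_if_1sd / obs$sd_if_2sd)^2   # = 4 for every observation

write.csv(obs[, c("x", "y", "value", "sd_if_1sd")],
          "obs_for_service.csv", row.names = FALSE)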