[R-sig-Geo] Generating uncertainty maps with Monte-Carlo simulation and large rasters

2016-10-11 Thread Mercedes Román
Dear list members,



I am trying to generate uncertainty maps of available water capacity
predicted with a pedotransfer function (PTF) via Monte-Carlo simulation. I
have raster layers with the mean and SD of the input variables, so I can
sample from their distributions (in this exercise I ignore the correlations
among variables and treat their distributions as univariate). All I want is
to apply Monte-Carlo simulation to each cell of the raster. I found some of
the previous postings on this subject with the mc2d package very useful (
http://r-sig-geo.2731867.n2.nabble.com/parallel-raster-processing-with-calc-and-mc2d-monte-carlo-simulation-td7587901.html#a7587902).
In the end, I found it quicker to use the lhs package to generate a Latin
hypercube sample from a uniform distribution and then obtain the sample for
each of my variables with qnorm. I give an example of my function below.
The PTF is a bit long; you can ignore most of it and just look at the
beginning, where I sample from the distributions, and at the end (the
output layers with the prediction interval limits).



Applied to a small raster it is pretty quick. The problem is that I have to
apply the function to 300 raster stacks of 2,805,374 cells each, and that is
going to take a long time even with parallel computing using clusterR (a
couple of weeks on a computer like mine).
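
For reference, the parallel call I have in mind is roughly the following (a
minimal sketch, using the example objects defined below; the core count is
illustrative):

beginCluster(4)   # illustrative number of cores
out <- clusterR(s, calc, args = list(fun = func_lhs))
endCluster()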



I imagine some of you have dealt with similar problems. Do you have any
advice on sampling from probability distributions in Monte-Carlo
simulations over big raster files? Or any advice on generating uncertainty
maps?



I believe much of the time goes into accessing the data (getValues). So I
tried an alternative version of the function that first extracts the raster
stack values into a data frame, but that did not seem to save time either.
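
Roughly along these lines (a sketch, using the s and func_lhs defined below;
it assumes func_lhs returns one numeric vector of outputs per cell):

vals <- getValues(s)                 # read all the cell values at once
res  <- t(apply(vals, 1, func_lhs))  # one row of outputs per cell
out  <- setValues(brick(raster(s), nl = ncol(res)), res)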



Thank you in advance for all your help.



Kind regards,



Mercedes Roman







library(raster)
library(sp)
library(rgdal)
library(lhs)
library(parallel)
library(doParallel)
library(snow)





### Generate example rasters
r <- raster(nrow=10, ncol=10)
set.seed(23850)

s1  <- setValues(r, runif(n = 100, min = 0, max = 1000))
s2  <- setValues(r, runif(n = 100, min = 0, max = 100))
s3  <- setValues(r, runif(n = 100, min = 0, max = 1000))
s4  <- setValues(r, runif(n = 100, min = 0, max = 20))
s5  <- setValues(r, runif(n = 100, min = 0, max = 300))
s6  <- setValues(r, runif(n = 100, min = 0, max = 20))
s7  <- setValues(r, runif(n = 100, min = 0, max = 200))
s8  <- setValues(r, runif(n = 100, min = 0, max = 20))
s9  <- setValues(r, runif(n = 100, min = 0, max = 14))
s10 <- setValues(r, runif(n = 100, min = 0, max = 1))

s <- stack(s1, s2, s3, s4, s5, s6, s7, s8, s9, s10)
plot(s)



# Write the PTF function, sampling with package lhs
func_lhs <- function(x){

    ### How many simulations do we want?
    simulations <- 1000

    ### Draw a Latin hypercube sample from a set of uniform distributions
    sLHS <- randomLHS(n = simulations, k = 5)

    ### Transform the values into normal distributions with the mean and sd
    ### of each variable

    ### sand, constrained to be between 0 and 1000
    sim_sand <- qnorm(sLHS[,1], mean = x[[1]], sd = x[[2]])
    sim_sand <- ifelse(sim_sand < 0, 0, ifelse(sim_sand > 1000, 1000, sim_sand))

    ### silt, constrained to be between 0 and 1000
    sim_silt <- qnorm(sLHS[,2], mean = x[[3]], sd = x[[4]])
    sim_silt <- ifelse(sim_silt < 0, 0, ifelse(sim_silt > 1000, 1000, sim_silt))

    ### SOC, constrained to be between 0 and 1000
    sim_soc <- qnorm(sLHS[,3], mean = x[[5]], sd = x[[6]])
    sim_soc <- ifelse(sim_soc < 0, 0, ifelse(sim_soc > 1000, 1000, sim_soc))

    ### CEC, constrained to be > 0
    sim_cec <- qnorm(sLHS[,4], mean = x[[7]], sd = x[[8]])
    sim_cec <- ifelse(sim_cec < 0, 0, sim_cec)

    ### pH, constrained to be between 0 and 14
    sim_ph <- qnorm(sLHS[,5], mean = x[[9]], sd = x[[10]])
    sim_ph <- ifelse(sim_ph < 0, 0, ifelse(sim_ph > 14, 14, sim_ph))

    ### Apply the PTF

    ### Because the simulations cannot guarantee that the texture fractions
    ### sum to 1000, we calculate clay as the difference 1000 - (sand + silt)
    sim_clay <- 1000 - (sim_sand + sim_silt)
    ### What if sand and silt sum to more than 1000?
    sim_clay <- ifelse(sim_clay < 0, 0, sim_clay)

    ### The first parameter, SMres, the residual water content (cm3 cm-3),
    ### needs sand content as %; GSM has sand as g/kg, so the 2% threshold
    ### of Toth et al. is 20 g/kg here
    SMres <- ifelse(is.na(sim_sand), NA,
                    ifelse(sim_sand >= 20, 0.041, 0.179))

    ### Now SMsat, the saturated water content (cm3 cm-3)
    ### SMs = 0.5056 - 0.1437 * (1/(OC+1)) + 0.0004152 * Si
    SMsat <- 0.5056 - (0.1437 * (1/((0.1 * sim_soc) + 1))) +
             (0.0004152 * sim_silt * 0.1)

    ### The parameter alpha:
    ### log10(alpha) = -1.3050 - 0.0006123 * Si - 0.009810 * Cl
    ###                + 0.07611 * (1/(OC+1)) - 0.0004508 * Si * Cl
    ###                + 0.03472 * Cl * (1/(OC+1)) - 0.01226 * Si * (1/(OC+1))


Re: [R-sig-Geo] Kriging with uncertain data

2016-10-11 Thread Santiago Beguería Portugués
Hi,

Thank you for your suggestions. I gave gstat's weights argument a try, but
I'm not sure about the results / my implementation.

I have worked out an example with the meuse dataset so I can share it with you:

library(gstat)
library(sp)

data(meuse)
coordinates(meuse) <- ~x+y

data(meuse.grid)
gridded(meuse.grid) = ~x+y

For a start, let's fit a standard universal kriging model:

# model 1: UK

v <- variogram(log(zinc)~sqrt(dist)+x+y, meuse)
var1 <- fit.variogram(v, vgm(1, "Sph", 700, 1))

m1 <- gstat(formula=log(zinc)~sqrt(dist), data=meuse, id='log_zinc',
            model=var1)
predm1 <- predict(m1, meuse.grid)

summary(predm1@data)

Let's now simulate some measurement variances. I will assume the standard
error to be uniformly distributed, with values between 0.1 and 0.2 times the
measured values:

meuse$var <- (meuse$zinc*runif(155,0.1,0.2))^2

plot(meuse$var~meuse$zinc)
plot(log(meuse$var)~log(meuse$zinc))

Let's now fit a weighted UK model, using the inverse of the log-variances as
weights (a rough precision measure, since the model is fitted on log(zinc)):

# model 2: weighted UK

m2 <- update(m1, weights=1/log(meuse$var))
predm2 <- predict(m2, meuse.grid)

If we compare the results of both models, we shall see that the variability
of the predictions has shrunk a bit in model 2, while the kriging variances
have increased:

summary(predm1@data)
summary(predm2@data)

The spatial distribution of the variances has also changed, reflecting the 
spatial distribution of the measurement variances.

All good so far: the uncertainty of the measurements has influenced the
kriging model, and now we obtain slightly less certain kriged values. But
this does not answer my question, since we have not propagated the
measurement uncertainty to the estimated field. You can see this by
comparing the magnitudes of the model 2 kriging variances to the measurement
variances:

summary(log(meuse$var))
summary(predm2@data$log_zinc.var)

Ákos's suggestion, i.e. to krige the measured values and their variances
separately, is a good one. We can still use the weights argument to krige
the observations, and we shall get separate measurement and kriging
variances, which is nice. If we suspect that there is a relation between the
measured values and their errors (as is the case in my simulated example),
we could even use the kriged surface of the measurements as a covariable, I
suppose.
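
Something along these lines, I suppose (a sketch; the initial variogram
parameters are illustrative):

# krige the (log) measurement variances as a separate field
v.err <- variogram(log(var)~1, meuse)
m.err <- fit.variogram(v.err, vgm(1, "Sph", 700, 1))
pred.err <- krige(log(var)~1, meuse, meuse.grid, model = m.err)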

What do you think about that approach? Does it make sense, and is it
statistically correct with regard to the kriging assumptions?

Also, I am a bit worried about the fitted semivariogram model, since I have
found no way to incorporate the error variances of the measurements into it.
There is an "Err" (measurement error) model in the vgm function in gstat,
but as I understand it, it expects a single error variance for the whole
spatial field, i.e. it does not work with spatially varying errors.
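
For illustration, this is the kind of thing I mean (a sketch; the error
variance value is arbitrary):

# a single, constant measurement-error component in the variogram model
m.err2 <- vgm(0.05, "Err", 0, add.to = vgm(1, "Sph", 700, 1))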

Cheers,

Santiago



> On 7 Oct 2016, at 10:37, Edzer Pebesma wrote:
> 
> If the only problem is to krige these data, the solution is pretty
> trivial: add a location-specific value to the nugget; this is what
> Delhomme in 1978 coined as regression kriging [1] (kriging of regressed
> rather than observed values, using estimates + estimation errors).
> 
> An implementation is found in gstat, look up argument "weights" in
> ?gstat; you can use this argument in gstat::krige
> 
> Trickier is to infer the variogram of the underlying, unobserved
> stationary variable from your estimates + estimation errors, in
> particular when these estimation errors are rather large and/or vary
> strongly. Anyone knows a good ref to a paper that tackles that issue?
> 
> 
> [1] Delhomme, J.P. (1978). Kriging in the hydrosciences. Advances in
> Water Resources 1(5): 251-266.
> 
> On 07/10/16 10:27, Santiago Beguería wrote:
>> Dear Ákos,
>> 
>> I was referring to the former: I have data with two values at each location,
>> the measured value and the uncertainty of the measurement. So each
>> observation is in fact a statistical variate, which we can assume is
>> Gaussian distributed. Hence, my two values are the expected value (mean)
>> and the variance of the distribution.
>> 
>> Cheers,
>> 
>> Stg
>> 
>> 
>>> On 7 Oct 2016, at 8:39, Bede-Fazekas Ákos wrote:
>>> 
>>> Dear Santiago,
>>> 
>>> Do you mean you have two values at each location (observed value and
>>> uncertainty)? Or do you have an observed value that is the sum of the real
>>> value and the observation error (uncertainty)? If the latter, then I think
>>> using the gstat::krige() function is straightforward, since the result of
>>> the function contains the variance of the prediction ("Attributes columns
>>> contain prediction and prediction variance";
>>> https://cran.r-project.org/web/packages/gstat/gstat.pdf).
>>> 
>>> HTH,
>>> Ákos Bede-Fazekas
>>> Hungarian Academy of Sciences
>>> 
>>> 
>>> 
>>> 2016.10.06. 11:52 keltezéssel, Santiago Beguería írta:
>>>> Dear R-sig-geo list