Re: [R] Unexpected values obtained when reading in data using ncdf and ncdf4
Hi Dave, Roy and R-users,

Many thanks for your suggestions - in later correspondence Dave suggested that I ask the data provider to run an md5 checksum on the problem files and compare their results against an md5 checksum on my copies of the files. Having done this, I found that the results didn't match (which they should if the files were identical), so some corruption must have occurred during the file transfer. Unfortunately we haven't discovered the source of the problem, but it was very helpful to learn how to compare files and identify the problem, so thanks very much for your help!

Best wishes,
Louise

-----Original Message-----
From: Roy Mendelssohn - NOAA Federal [mailto:roy.mendelss...@noaa.gov]
Sent: 22 April 2016 17:31
To: Louise Mair
Cc: r-help@r-project.org; David W. Pierce
Subject: Re: [R] Unexpected values obtained when reading in data using ncdf and ncdf4

Hi Louise:

If Dave can't figure it out, I can give a look also. A couple of things I would suggest:

1. Don't use the name "data" in the nc_open command; "data" is the name of a base R function, and masking it can cause confusing problems.
2. You are doing calculations to set the start and count values in the ncvar_get commands; print those values out before you make the calls, to make certain they are valid.

HTH,
-Roy

> On Apr 22, 2016, at 8:08 AM, David W. Pierce <dpie...@ucsd.edu> wrote:
>
> On Fri, Apr 22, 2016 at 1:32 AM, Louise Mair <louise.m...@slu.se> wrote:
>
>> Dear R Users,
>>
>> I am encountering a problem when reading nc files into R using the
>> ncdf and ncdf4 libraries. The nc files are too large to attach an
>> example (but if someone is interested in helping out I could send a
>> file privately via an online drive), but the code is basic:
>>
> [...]
>
> Hi Louise,
>
> I'm the author of the ncdf and ncdf4 libraries. What are the details
> -- what operating system are you running on, and what versions of R and the
> netcdf library are you using?
>
> If you make the files available to me I can take a look.
>
> Regards,
>
> --Dave Pierce
>
>> for(i in 1:length(thesenames[,1])){
>>   data <- nc_open(paste(INDIR, thesenames[i,c("wholename")], sep=""),
>>                   write=F)
>>   d.vars <- names(data$var)
>>   d.size <- (data$var[[length(d.vars)]])$size
>>
>>   # Obtaining longitude and latitude values
>>   d.lon <- as.vector(ncvar_get(data, varid="lon", start=c(1,1),
>>                                count=c(d.size[1],d.size[2])))
>>   d.lat <- as.vector(ncvar_get(data, varid="lat", start=c(1,1),
>>                                count=c(d.size[1],d.size[2])))
>>
>>   # Obtaining climate data values
>>   df.clim <- data.frame(rn=seq(1:length(d.lon)))
>>   for(y in 1:d.size[3]){
>>     df.clim[,1+y] <- as.vector(ncvar_get(data, varid=d.vars[length(d.vars)],
>>                                          start=c(1,1,y),
>>                                          count=c(d.size[1],d.size[2],1)))
>>     names(df.clim)[1+y] <- paste("y",y,sep="")
>>   }
>>   tosummarise[,,i] <- as.matrix(df.clim[,-1])
>> }
>>
>> The data are temperature or precipitation, across space and time.
>>
>> For most of the >250 files I have, there are no problems, but for
>> around 8 of these files I get strange values. The data should be
>> within a relatively narrow range, yet I get values such as
>> -8.246508e+07 or 7.659506e+11. The particularly strange part is that
>> these kinds of values occur at regularly spaced intervals across the
>> data, usually within a single time step.
>>
>> I have the same problem (including the exact same strange values)
>> when using ArcMap, yet the data provider assures me that the data
>> look normal when viewed with CDO (Climate Data Operators), and
>> that there are no missing values.
>>
>> I realise this is very difficult to diagnose without the nc files
>> themselves, so my questions are: (1) Has anyone encountered something
>> like this before? (2) Is there something I am failing to specify in
>> the code when reading the files in? (3) Is anyone interested in digging
>> into this and willing to play around with the nc files if I make them
>> available privately?
>>
>> Thanks very much in advance!
>> Louise
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-gui
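The checksum comparison that resolved this thread can be reproduced from within R itself using `tools::md5sum()` (in the base `tools` package). The sketch below fabricates a "good" copy and a "corrupted" copy in a temporary directory purely for illustration; in practice the two paths would be your local .nc file and a freshly re-downloaded copy.

```r
library(tools)

# Create an illustrative file and a byte-identical copy of it
good <- tempfile(fileext = ".nc")
copy <- tempfile(fileext = ".nc")
writeBin(as.raw(1:100), good)
file.copy(good, copy)

# Byte-identical files produce identical MD5 digests
identical_ok <- unname(md5sum(good)) == unname(md5sum(copy))

# Flip one byte to simulate corruption during transfer
bytes <- readBin(copy, "raw", n = 100)
bytes[50] <- as.raw(255)
writeBin(bytes, copy)

# The digests now differ, revealing the corruption
corrupted_detected <- unname(md5sum(good)) != unname(md5sum(copy))
```

The data provider can run `md5sum` (or `md5` on macOS) on their end and send the hex digests for comparison; any difference proves the two copies diverged somewhere in transfer or storage.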
[R] Unexpected values obtained when reading in data using ncdf and ncdf4
Dear R Users,

I am encountering a problem when reading nc files into R using the ncdf and ncdf4 libraries. The nc files are too large to attach an example (but if someone is interested in helping out I could send a file privately via an online drive), but the code is basic:

for(i in 1:length(thesenames[,1])){
  data <- nc_open(paste(INDIR, thesenames[i,c("wholename")], sep=""), write=F)
  d.vars <- names(data$var)
  d.size <- (data$var[[length(d.vars)]])$size

  # Obtaining longitude and latitude values
  d.lon <- as.vector(ncvar_get(data, varid="lon", start=c(1,1),
                               count=c(d.size[1],d.size[2])))
  d.lat <- as.vector(ncvar_get(data, varid="lat", start=c(1,1),
                               count=c(d.size[1],d.size[2])))

  # Obtaining climate data values
  df.clim <- data.frame(rn=seq(1:length(d.lon)))
  for(y in 1:d.size[3]){
    df.clim[,1+y] <- as.vector(ncvar_get(data, varid=d.vars[length(d.vars)],
                                         start=c(1,1,y),
                                         count=c(d.size[1],d.size[2],1)))
    names(df.clim)[1+y] <- paste("y",y,sep="")
  }
  tosummarise[,,i] <- as.matrix(df.clim[,-1])
}

The data are temperature or precipitation, across space and time.

For most of the >250 files I have, there are no problems, but for around 8 of these files I get strange values. The data should be within a relatively narrow range, yet I get values such as -8.246508e+07 or 7.659506e+11. The particularly strange part is that these kinds of values occur at regularly spaced intervals across the data, usually within a single time step.

I have the same problem (including the exact same strange values) when using ArcMap, yet the data provider assures me that the data look normal when viewed with CDO (Climate Data Operators), and that there are no missing values.
I realise this is very difficult to diagnose without the nc files themselves, so my questions are: (1) Has anyone encountered something like this before? (2) Is there something I am failing to specify in the code when reading the files in? (3) Is anyone interested in digging into this and willing to play around with the nc files if I make them available privately?

Thanks very much in advance!
Louise
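One package-free way to characterise the "regularly spaced" bad values is to flag out-of-range entries in the vector returned by `ncvar_get()` and inspect the spacing between their indices; a constant stride suggests systematic corruption rather than scattered bad data. The synthetic vector and the 1e6 threshold below are purely illustrative stand-ins, not values from the original files.

```r
# Sketch: locate implausible values and check whether they recur at a
# fixed stride. "v" stands in for one time step of ncvar_get() output.
set.seed(1)
v <- rnorm(1000, mean = 10, sd = 2)       # plausible climate-like values
v[seq(50, 1000, by = 50)] <- -8.2465e+07  # inject spikes every 50 cells

# Threshold is arbitrary; pick one far outside the physically
# plausible range of your variable
bad <- which(abs(v) > 1e6)

strides <- diff(bad)
regular <- length(unique(strides)) == 1   # TRUE: bad values at a fixed stride
```

If `regular` is TRUE the corruption has a fixed spacing, which (together with a failed md5 comparison) points to damage in transfer or storage rather than a mistake in the reading code.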
[R] Obtaining r-squared values from phylogenetic autoregression in ape
Hello,

I am trying to carry out a phylogenetic autoregression to test whether my data show a phylogenetic signal, but I keep calculating bizarre R-squared values. My script is:

library(ape)
x <- "R:1,GK:1)d:1,(MW:1,G:1)n:1)a:1,SPW:1)a:1,WB:1)b:1,(((SPBF:1,PBF:1)n:1,(HBF:1,SWF:1)c:1)i:1,(PE:1,((P:1,C:1)w:1,(MF:1,HF:1)l:1)b:1)c:1)x:1)a:1,((CHB:1,AD:1)a:1,BA:1,SSB:1)i:1,(HB:1,SB:1)b:1)x:1,SC:1)x:1,((BRH:1,PH:1)e:1,(WLH:1,GH:1)b:1)d:1)x:1,DBF:1)a:1)a:1,((WW:1,(OT:1,SW:1)v:1)b:1,B:1)z:1)a:1,((DS:1,GS:1)j:1,(SSS:1,LS:1,(ES:1,SS:1)a:1)b:1)e:1)a;"
treeX <- read.tree(text=x)
treeX <- compute.brlen(treeX, method = "Grafen")
data <- c(4.854185, 6.008532, 6.221286, 4.369945, 10.044475, 5.801292,
          5.128374, 5.540995, 4.566704, 10.188250, 7.121077, 4.469329,
          4.815972, 7.798617, 5.892205, 4.853027, 5.080509, 7.982360,
          7.518022, 5.675702, 11.989929, 6.760587, 7.433313, 7.906303,
          7.235088, 7.131338, 5.582816, 6.769775, 11.886225, 5.589256,
          5.626147, 4.714369, 6.040151, 9.098583, 5.194043, 8.830687,
          6.231105)
species <- c("AD", "B", "BA", "BRH", "C", "CHB", "DBF", "DS", "ES", "G",
             "GH", "GK", "GS", "HB", "HBF", "HF", "LS", "MF", "MW", "OT",
             "P", "PBF", "PE", "PH", "R", "SB", "SC", "SPBF", "SPW", "SS",
             "SSB", "SSS", "SW", "SWF", "WB", "WLH", "WW")
names(data) <- species
cor.mat <- vcv.phylo(treeX, cor=TRUE)
regr <- compar.cheverud(data, cor.mat)

regr$rhohat
[1] 5.541462

1 - var(regr$residuals)/var(data)
          [,1]
[1,] -1.333095

I don't understand why the autoregression coefficient falls outside the interval -1 to 1, or why the calculation for obtaining an R-squared produces a value that doesn't make sense. Have I made a mistake in the application of this method, or have I misunderstood when it is appropriate to use phylogenetic autoregression?

Thanks for your help.
Louise
[R] Creating polygons from scattered points
Hello,

I have a distribution dataset for species consisting of xy coordinates at the 1km resolution, with only presence data. So a simplified example of a species distribution might be:

y <- c(as.integer(rnorm(200,50,20)), as.integer(rnorm(200,100,30)),
       as.integer(rnorm(100,180,15)))
x <- c(as.integer(rnorm(200,50,20)), as.integer(rnorm(200,100,20)),
       as.integer(rnorm(100,200,15)))
plot(y ~ x)

I would like to create polygons for each species distribution, where if an island is present (as I have tried to show in the example) it would be a separate polygon, and the jagged edges of coastlines etc. are maintained.

I have spent ages trying to find a package that will convert scattered point distributions to polygons, but haven't found anything that works: the functions I have found require the data already to be in a format where the only xy coordinates present are the outline of the polygon. Can anyone please recommend a function I can use here, or suggest a way of extracting the outline points? I have tried this manually but cannot seem to write code that will effectively take account of jagged edges and islands.

Thanks very much for your help,
Louise.
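Since the presences sit on a regular 1 km grid, one package-free approach (a sketch, not necessarily the best tool; contributed spatial packages offer richer options) is to rasterise the points into a presence/absence matrix and let `grDevices::contourLines()` trace the 0/1 boundary: it returns one closed ring per separate patch, so islands come out as separate polygons and grid-cell edges stay jagged. The grid extent and the two blobs of points below are made up for illustration.

```r
# Sketch: trace outline polygons around gridded presence points using
# base R's contourLines(). Two separated blobs -> two rings ("islands").
pts <- rbind(expand.grid(x = 10:20, y = 10:20),   # "mainland" blob
             expand.grid(x = 40:45, y = 40:45))   # "island" blob

# Rasterise the presences onto a regular grid covering the region
gx <- 1:60
gy <- 1:60
z  <- matrix(0, nrow = length(gx), ncol = length(gy))
z[cbind(match(pts$x, gx), match(pts$y, gy))] <- 1

# Contouring at 0.5 separates presence (1) from absence (0);
# each list element is one closed outline with $x and $y coordinates
rings <- contourLines(gx, gy, z, levels = 0.5)
length(rings)   # one ring per separate patch
```

Each element of `rings` can then be drawn with `polygon(ring$x, ring$y)` or fed to a spatial package as polygon coordinates.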
[R] chisq.test vs manual calculation - why are different results produced?
Hello,

I am trying to fit gamma, negative exponential and inverse power functions to a dataset, and then test whether the fit of each curve is good. To do this I have been advised to calculate predicted values for bins of data (I have grouped a continuous range of distances into 1km bins), and then apply a chi-squared test. Example:

data <- data.frame(distance=c(1,2,3,4,5,6,7),
                   observed=c(43,13,10,6,2,1,1),
                   predicted=c(28,18,10,5,3,1,1))
chisq.test(data$observed, data$predicted)

Which gives:

        Pearson's Chi-squared test

data:  data$observed and data$predicted
X-squared = 35, df = 25, p-value = 0.0882

Warning message:
In chisq.test(data$observed, data$predicted) :
  Chi-squared approximation may be incorrect

I understand this is due to having observed/predicted values of less than five; however, I am interested to know firstly why R uses such a large number of degrees of freedom (when by my understanding there should only be 4 df), and secondly whether the following manual calculation is therefore inappropriate:

X2 <- sum(((data$observed - data$predicted)^2)/data$predicted)
1-pchisq(X2,4)
[1] 0.04114223

If chi-squared is unsuitable, what other test can I use to determine whether my observed and predicted data come from the same distribution? The frequently recommended Fisher's test doesn't seem to be any more appropriate, as it requires values of greater than 5 for contingency tables larger than 2 x 2.

Thanks for your help.
Louise
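For reference, `chisq.test()` called with two vectors treats them as two factors and tests independence of their cross-tabulation, which is why the df above balloons to 25. A goodness-of-fit test against the predicted distribution instead passes the expected proportions via the `p` argument. A sketch using counts adapted from the post (the observed vector is padded to seven bins for illustration; `p` must sum to 1, so the predictions are normalised):

```r
observed  <- c(43, 13, 10, 6, 2, 1, 1)
predicted <- c(28, 18, 10, 5, 3, 1, 1)

# Goodness-of-fit: compare observed counts against expected proportions.
# (Still warns about small expected counts, as in the post.)
gof <- chisq.test(x = observed, p = predicted / sum(predicted))
gof$parameter   # df = number of bins - 1 = 6

# The manual statistic from the post, using the predicted counts directly
X2 <- sum((observed - predicted)^2 / predicted)

# chisq.test() does not know that curve parameters were estimated from
# the data; for a fitted curve one would reduce the df by hand, e.g.
# bins - 1 - 2 for a two-parameter curve:
p_adj <- 1 - pchisq(X2, df = length(observed) - 1 - 2)
```

This matches the poster's intuition that the df should be reduced for fitted parameters; `chisq.test()` itself always uses bins minus 1 for a goodness-of-fit call.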
[R] creating categorical frequency tables from continuous data
Hello,

I am working with a dataset which essentially has only one column - a list of distances in metres, accurate to several decimal places, e.g.

distance
1000
6403.124
1000
1414.214
1414.214
1000

I want to organise this into a frequency table, grouping into categories of 0-999, 1000-1999, 2000-2999, etc. I'd also like the rows where there are no data points in that category to contain 0, in order to be able to plot a histogram with a linear x axis, and to statistically analyse differences between datasets.

I have tried table(), which doesn't group the data the way I'd like, and I've also tried cut(), but couldn't make it work. Ideally I'd like the output to look something like this:

distance     frequency
0-999        0
1000-1999    3
2000-2999    0
...

Any suggestions that are an improvement on doing it manually, please?

Thanks in advance!
Louise

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
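The grouping described above can be done with `cut()` followed by `table()`; because `cut()` keeps empty factor levels, zero-count bins appear automatically. A sketch using the six distances from the post (the upper break of 7000 is chosen just to cover this sample; adjust it to your own range):

```r
distance <- c(1000, 6403.124, 1000, 1414.214, 1414.214, 1000)

# Left-closed 1000 m bins: [0,1000), [1000,2000), ...
breaks <- seq(0, 7000, by = 1000)
bins <- cut(distance, breaks = breaks, right = FALSE,
            labels = paste(head(breaks, -1), breaks[-1] - 1, sep = "-"))

# table() counts every level, including the empty ones
freq <- as.data.frame(table(bins))
names(freq) <- c("distance", "frequency")
freq   # one row per bin, with 0 for empty bins
```

`right = FALSE` makes the bins left-closed so that a value of exactly 1000 falls into the 1000-1999 bin, matching the categories in the question.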