Re: [R] Unexpected values obtained when reading in data using ncdf and ncdf4

2016-05-23 Thread Louise Mair
Hi Dave, Roy and R-users,

Many thanks for your suggestions. In later correspondence Dave suggested that 
I ask the data provider to run an MD5 checksum on the problem files and compare 
their results against an MD5 checksum on my copies of the files. Having done 
this, I found that the results didn't match (which they should if the files 
were identical), which indicates that some corruption must have occurred 
during the file transfer.
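
For anyone who wants to repeat this kind of check from within R, the comparison can be sketched with tools::md5sum(), which ships with base R. This is only an illustration: the temporary files below are stand-ins for the real nc files, and in practice one of the checksums would be the string reported by the data provider.

```r
# Stand-in files: an original, an exact copy, and a copy with one byte changed.
original  <- tempfile(fileext = ".bin")
good_copy <- tempfile(fileext = ".bin")
bad_copy  <- tempfile(fileext = ".bin")

writeBin(as.raw(1:100), original)
file.copy(original, good_copy)
writeBin(as.raw(c(1:99, 0)), bad_copy)   # simulates corruption during transfer

# md5sum() returns one 32-character hexadecimal string per file.
sums <- unname(tools::md5sum(c(original, good_copy, bad_copy)))

copy_is_intact    <- identical(sums[1], sums[2])   # identical files, same checksum
copy_is_corrupted <- !identical(sums[1], sums[3])  # a single changed byte alters it
```

A mismatch only says that the two files differ; it does not say where or why.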

Unfortunately we haven't discovered the source of the problem, but it was very 
helpful to learn how to compare files and identify the problem, so thanks very 
much for your help!

Best wishes,
Louise




-----Original Message-----
From: Roy Mendelssohn - NOAA Federal [mailto:roy.mendelss...@noaa.gov] 
Sent: 22 April 2016 17:31
To: Louise Mair
Cc: r-help@r-project.org; David W. Pierce
Subject: Re: [R] Unexpected values obtained when reading in data using ncdf and 
ncdf4

Hi Louise:

If Dave can’t figure it out, I can give a look also.  A couple of things I 
would suggest:

1.  Don’t use the name “data” for the object returned by nc_open; data() is a 
built-in function in R, and reusing the name can cause confusion.

2. You are doing calculations to set the start and count values in the 
ncvar_get commands; print those values out before you make the calls to make 
certain they are valid.
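
The second suggestion could be sketched roughly as follows. The helper check_hyperslab() and the dimension sizes are made up for illustration and are not part of ncdf4; the only ncdf4 behaviour assumed is that indices are 1-based and that a count of -1 means "read to the end of that dimension".

```r
# Hypothetical helper: prints the request and checks it against the
# dimension sizes of the variable (1-based indices, as ncvar_get expects).
check_hyperslab <- function(start, count, dim_size) {
  cat("start:", start, "| count:", count, "| dim sizes:", dim_size, "\n")
  if (length(start) != length(dim_size) || length(count) != length(dim_size)) {
    return(FALSE)
  }
  # In ncdf4, count = -1 means "all values from start to the end".
  count <- ifelse(count == -1, dim_size - start + 1, count)
  all(start >= 1 & count >= 1 & start + count - 1 <= dim_size)
}

d.size <- c(180, 90, 30)   # made-up sizes for lon, lat, time

ok_request  <- check_hyperslab(c(1, 1, 5),  c(d.size[1], d.size[2], 1), d.size)
bad_request <- check_hyperslab(c(1, 1, 31), c(d.size[1], d.size[2], 1), d.size)

# With a real file, the handle would get a name other than `data`, e.g.
# nc <- ncdf4::nc_open(path); ncdf4::ncvar_get(nc, varid, start, count)
```

Here bad_request is FALSE because time step 31 lies beyond the 30 steps assumed for the file.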

HTH,

-Roy

> On Apr 22, 2016, at 8:08 AM, David W. Pierce <dpie...@ucsd.edu> wrote:
> 
> On Fri, Apr 22, 2016 at 1:32 AM, Louise Mair <louise.m...@slu.se> wrote:
> 
>> Dear R Users,
>> 
>> I am encountering a problem when reading nc files into R using the 
>> ncdf and ncdf4 libraries. The nc files are too large to attach an 
>> example (but if someone is interested in helping out I could send a 
>> file privately via an online drive), but the code is basic:
>> 
> [...]
> 
> 
> Hi Louise,
> 
> I'm the author of the ncdf and ncdf4 libraries. What are the details 
> -- what operating system are you running on, what version of R and the 
> netcdf library are you using?
> 
> If you make the files available to me I can take a look.
> 
> Regards,
> 
> --Dave Pierce
> 
>> for(i in 1:length(thesenames[,1])){
>>   data <- nc_open(paste(INDIR, thesenames[i,c("wholename")], sep=""),
>>                   write=F)
>>   d.vars <- names(data$var)
>>   d.size <- (data$var[[length(d.vars)]])$size
>> 
>>   # Obtaining longitude and latitude values
>>   d.lon <- as.vector(ncvar_get(data, varid="lon", start=c(1,1),
>>                                count=c(d.size[1],d.size[2])))
>>   d.lat <- as.vector(ncvar_get(data, varid="lat", start=c(1,1),
>>                                count=c(d.size[1],d.size[2])))
>> 
>>   # Obtaining climate data values
>>   df.clim <- data.frame(rn=seq(1:length(d.lon)))
>>   for(y in 1:d.size[3]){
>>     df.clim[,1+y] <- as.vector(ncvar_get(data,
>>                                          varid=d.vars[length(d.vars)],
>>                                          start=c(1,1,y),
>>                                          count=c(d.size[1],d.size[2],1)))
>>     names(df.clim)[1+y] <- paste("y",y,sep="")
>>   }
>>   tosummarise[,,i] <- as.matrix(df.clim[,-1])
>> }
>> 
>> The data are temperature or precipitation, across space and time.
>> 
>> For most of the >250 files I have, there are no problems, but for 
>> around 8 of these files, I get strange values. The data should be 
>> within a relatively narrow range, yet I get values such as 
>> -8.246508e+07  or 7.659506e+11. The particularly strange part is that 
>> these kinds of values occur at regularly spaced intervals across the 
>> data, usually within a single time step.
>> 
>> I have the same problem (including the exact same strange values) 
>> when using ArcMap, yet the data provider assures me that the data 
>> look normal when using CDO (climate data operators) to view them, and 
>> that there are no missing values.
>> 
>> I realise this is very difficult to diagnose without the nc files 
>> themselves, so my questions are (1) Has anyone encountered something 
>> like this before?, (2) Is there something I am failing to specify in 
>> the code when reading in?, and (3) Is anyone interested in digging 
>> into this and willing to play around with the nc files if I make them 
>> available privately?
>> 
>> Thanks very much in advance!
>> Louise

[R] Unexpected values obtained when reading in data using ncdf and ncdf4

2016-04-22 Thread Louise Mair
Dear R Users,

I am encountering a problem when reading nc files into R using the ncdf and 
ncdf4 libraries. The nc files are too large to attach an example (but if 
someone is interested in helping out I could send a file privately via an 
online drive), but the code is basic:

library(ncdf4)

for(i in 1:length(thesenames[,1])){
   # Open each file read-only
   data <- nc_open(paste(INDIR, thesenames[i,c("wholename")], sep=""),
                   write=FALSE)
   d.vars <- names(data$var)
   # Dimension sizes of the last variable in the file (the climate variable)
   d.size <- (data$var[[length(d.vars)]])$size

   # Obtaining longitude and latitude values
   d.lon <- as.vector(ncvar_get(data, varid="lon", start=c(1,1),
                                count=c(d.size[1],d.size[2])))
   d.lat <- as.vector(ncvar_get(data, varid="lat", start=c(1,1),
                                count=c(d.size[1],d.size[2])))

   # Obtaining climate data values, one column per time step
   df.clim <- data.frame(rn=seq(1:length(d.lon)))
   for(y in 1:d.size[3]){
      df.clim[,1+y] <- as.vector(ncvar_get(data, varid=d.vars[length(d.vars)],
                                           start=c(1,1,y),
                                           count=c(d.size[1],d.size[2],1)))
      names(df.clim)[1+y] <- paste("y",y,sep="")
   }
   tosummarise[,,i] <- as.matrix(df.clim[,-1])
}

The data are temperature or precipitation, across space and time.

For most of the >250 files I have, there are no problems, but for around 8 of 
these files, I get strange values. The data should be within a relatively 
narrow range, yet I get values such as -8.246508e+07  or  7.659506e+11. The 
particularly strange part is that these kinds of values occur at regularly 
spaced intervals across the data, usually within a single time step.
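
One way to make "regularly spaced" concrete is to list the positions of the implausible values and look at the gaps between them. The sketch below uses simulated numbers rather than the real files, and the plausible range of -100 to 100 is an arbitrary stand-in for whatever range suits the variable.

```r
set.seed(1)

# Simulated stand-in for one time step of climate values.
vals <- rnorm(1000, mean = 10, sd = 5)

# Plant implausible values at every 100th position, starting at position 50.
bad_at <- seq(50, 1000, by = 100)
vals[bad_at] <- 7.659506e+11

# Hypothetical plausible range for the variable.
plausible <- c(-100, 100)

idx     <- which(vals < plausible[1] | vals > plausible[2])  # positions of odd values
spacing <- unique(diff(idx))                                 # gaps between them
```

A single value in spacing means the odd values recur at a fixed interval, which may help the data provider relate them to the file layout.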

I have the same problem (including the exact same strange values) when using 
ArcMap, yet the data provider assures me that the data look normal when using 
CDO (climate data operators) to view them, and that there are no missing values.

I realise this is very difficult to diagnose without the nc files themselves, 
so my questions are (1) Has anyone encountered something like this before?, (2) 
Is there something I am failing to specify in the code when reading in?, and 
(3) Is anyone interested in digging into this and willing to play around with 
the nc files if I make them available privately?

Thanks very much in advance!
Louise





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Obtaining r-squared values from phylogenetic autoregression in ape

2012-06-18 Thread Louise Mair
Hello,

I am trying to carry out a phylogenetic autoregression to test whether my
data show a phylogenetic signal, but I keep calculating bizarre R-squared
values.

My script is:

 library(ape)

 x <- "[...]R:1,GK:1)d:1,(MW:1,G:1)n:1)a:1,SPW:1)a:1,WB:1)b:1,(((SPBF:1,PBF:1)n:1,(HBF:1,SWF:1)c:1)i:1,(PE:1,((P:1,C:1)w:1,(MF:1,HF:1)l:1)b:1)c:1)x:1)a:1,((CHB:1,AD:1)a:1,BA:1,SSB:1)i:1,(HB:1,SB:1)b:1)x:1,SC:1)x:1,((BRH:1,PH:1)e:1,(WLH:1,GH:1)b:1)d:1)x:1,DBF:1)a:1)a:1,((WW:1,(OT:1,SW:1)v:1)b:1,B:1)z:1)a:1,((DS:1,GS:1)j:1,(SSS:1,LS:1,(ES:1,SS:1)a:1)b:1)e:1)a;"
 treeX <- read.tree(text=x)
 treeX <- compute.brlen(treeX, method = "Grafen")

 data <- c(4.854185, 6.008532, 6.221286, 4.369945, 10.044475,
           5.801292, 5.128374, 5.540995, 4.566704, 10.188250, 7.121077,
           4.469329, 4.815972, 7.798617, 5.892205, 4.853027, 5.080509,
           7.982360, 7.518022, 5.675702, 11.989929, 6.760587, 7.433313,
           7.906303, 7.235088, 7.131338, 5.582816, 6.769775, 11.886225,
           5.589256, 5.626147, 4.714369, 6.040151, 9.098583, 5.194043,
           8.830687, 6.231105)
 species <- c("AD", "B", "BA", "BRH", "C", "CHB", "DBF",
              "DS", "ES", "G", "GH", "GK", "GS", "HB", "HBF", "HF",
              "LS", "MF", "MW", "OT", "P", "PBF", "PE", "PH", "R",
              "SB", "SC", "SPBF", "SPW", "SS", "SSB", "SSS", "SW", "SWF", "WB",
              "WLH", "WW")
 names(data) <- species

 cor.mat <- vcv.phylo(treeX, cor=TRUE)
 regr <- compar.cheverud(data, cor.mat)

 regr$rhohat
[1] 5.541462

 1 - var(regr$residuals)/var(data)
          [,1]
[1,] -1.333095


I don't understand why the autoregression coefficient falls outside the
interval -1 to 1, or why the calculation for obtaining an R-squared
produces a value that doesn't make sense.
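
On the arithmetic alone, 1 - var(residuals)/var(data) drops below zero whenever the residuals vary more than the data themselves, so a negative value points to a poor or misaligned fit and not to a slip in the formula. The toy numbers below are made up to show this. The second half is a check that may be worth making, on the unconfirmed assumption that the data vector and the matrix are paired by position and not by name.

```r
# Toy data: four named observations and deliberately bad fitted values.
y          <- c(a = 1, b = 2, c = 3, d = 4)
fitted_bad <- c(4, 3, 2, 1)        # runs opposite to y

res <- y - fitted_bad              # residuals: -3, -1, 1, 3
r2  <- 1 - var(res) / var(y)       # var(res) is 4 times var(y), so r2 is -3

# Hypothetical alignment check: put the data in the same order as the
# rows of the weighting matrix before fitting.
W <- diag(4)
rownames(W) <- colnames(W) <- c("c", "a", "d", "b")
y_aligned <- y[rownames(W)]
```

In the script above the species vector is alphabetical while the tree lists its tips in a different order, so indexing data by rownames(cor.mat) would be the analogous step.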

Have I made a mistake in the application of this method, or have I
misunderstood when it is appropriate to use phylogenetic autoregression?

Thanks for your help.

Louise



[R] Creating polygons from scattered points

2012-03-14 Thread Louise Mair
Hello,

I have a distribution dataset for species consisting of xy coordinates at
the 1km resolution, with only presence data. So a simplified example of a
species distribution might be:

y <- rbind(as.integer(rnorm(100,50,20)), as.integer(rnorm(200,100,30)),
as.integer(rnorm(100,180,15)))
x <- rbind(as.integer(rnorm(200,50,20)), as.integer(rnorm(200,100,20)),
as.integer(rnorm(100,200,15)))
plot(y~x)

I would like to create polygons for each species distribution, where if an
island is present (as I have tried to show in the example), it would be a
separate polygon, and the jagged edges of coastlines etc. are maintained. I
have spent ages trying to find a package that will allow me to convert
scattered point distributions to polygons but haven't found anything that
works; the functions I have found require the data already to be in a
format where the only xy coordinates present are the outline of the
polygon.

Can anyone please recommend a function I can use here, or suggest a way of
extracting the outline points? I have tried this manually but cannot seem
to write code that will effectively take account of jagged edges and
islands.
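
As a partial starting point in base R only, points can be split into groups by single-linkage clustering and each group outlined with chull(). This is a rough sketch on simulated points, and it has a clear limitation: chull() gives a convex hull, so islands come out as separate polygons but jagged edges are smoothed away. The distance threshold gap is a made-up value that would need tuning to the data.

```r
set.seed(42)

# Simulated presence points: a "mainland" cluster and a distant "island".
pts <- rbind(
  cbind(x = rnorm(100, 50, 5),  y = rnorm(100, 50, 5)),
  cbind(x = rnorm(100, 200, 5), y = rnorm(100, 180, 5))
)

# Single-linkage clustering: points chain into one group whenever each is
# within `gap` (hypothetical threshold, in km) of another group member.
gap    <- 30
groups <- cutree(hclust(dist(pts), method = "single"), h = gap)

# One convex outline per group; chull() returns the row indices of the hull.
hulls <- lapply(split(as.data.frame(pts), groups),
                function(p) p[chull(p$x, p$y), ])

# plot(pts); for (h in hulls) polygon(h$x, h$y)   # to view the outlines
```

Outlines that follow concave coastlines would need a different technique, such as a concave hull or merging the occupied 1 km grid cells.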

Thanks very much for your help,

Louise.



[R] chisq.test vs manual calculation - why are different results produced?

2012-02-20 Thread Louise Mair
Hello,

I am trying to fit gamma, negative exponential and inverse power functions
to a dataset, and then test whether the fit of each curve is good. To do
this I have been advised to calculate predicted values for bins of data (I
have grouped a continuous range of distances into 1km bins), and then apply
a chi-squared test. Example:

 data <- data.frame(distance=c(1,2,3,4,5,6,7), observed=c(43,13,10,6,2,1,1),
                    predicted=c(28, 18, 10, 5, 3, 1, 1))

 chisq.test(data$observed, data$predicted)

Which gives:

Pearson's Chi-squared test

data:  data$observed and data$predicted
X-squared = 35, df = 25, p-value = 0.0882

Warning message:
In chisq.test(data$observed, data$predicted) :
  Chi-squared approximation may be incorrect

I understand this is due to having observed/predicted values of less than
five, however I am interested to know firstly why R uses such a large
number of degrees of freedom (when by my understanding there should only be
4 df), and secondly whether using the following manual calculation is
therefore inappropriate -

 X2 <- sum(((data$observed - data$predicted)^2)/data$predicted)
 1-pchisq(X2,4)
[1] 0.04114223
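
A possible reading of the 25 degrees of freedom: when chisq.test() is given two vectors, it cross-tabulates them as factors and tests independence, so df is (rows - 1) * (columns - 1) of that table and has nothing to do with the fitted curve. The sketch below assumes the seventh observed count is 1 (that value reproduces both outputs quoted above) and takes the 4 degrees of freedom from the post as given.

```r
observed  <- c(43, 13, 10, 6, 2, 1, 1)   # seventh value assumed to be 1
predicted <- c(28, 18, 10, 5, 3, 1, 1)

# What chisq.test(observed, predicted) actually tests: a contingency table
# of the distinct values in each vector (6 distinct values in each here).
tab      <- table(observed, predicted)
df_table <- (nrow(tab) - 1) * (ncol(tab) - 1)

# Manual goodness-of-fit statistic against the predicted counts.
X2       <- sum((observed - predicted)^2 / predicted)
p_manual <- pchisq(X2, df = 4, lower.tail = FALSE)

# Built-in goodness-of-fit form: counts in x, expected proportions in p.
# rescale.p = TRUE rescales `predicted` to sum to 1; df is bins - 1 here.
gof <- suppressWarnings(chisq.test(x = observed, p = predicted, rescale.p = TRUE))
```

The built-in form rescales the expected counts to the observed total and does not subtract degrees of freedom for fitted parameters, so its result will differ from the manual one.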

If chi-squared is unsuitable, what other test can I use to determine
whether my observed and predicted data come from the same distribution? The
frequently recommended Fisher's test doesn't seem to be any more
appropriate as it requires values of greater than 5 for contingency tables
larger than 2 x 2.

Thanks for your help.

Louise



[R] creating categorical frequency tables from continuous data

2011-01-27 Thread Louise Mair

Hello,

I am working with a dataset which essentially has only one column - a 
list of distances in metres, accurate to several decimal places, e.g.


distance
1000
6403.124
1000
1414.214
1414.214
1000

I want to organise this into a frequency table, grouping into categories 
of 0 - 999,  1000 - 1999, 2000-2999 etc. I'd also like the rows where 
there are no data points in that category to contain 0, in order to be 
able to plot a histogram with a linear x axis, and to statistically 
analyse differences between datasets.


I have tried table(), which doesn't group the data the way I'd like; 
I've also tried cut() but couldn't make it work. Ideally I'd like the 
output to look something like this...


distance    frequency
0-999       0
1000-1999   3
2000-2999   0
...

Any suggestions that are an improvement on doing it manually please?
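
One way this might be done with cut() and table() is sketched below, using the six example distances. The upper limit of 7000 and the label format are choices made for the illustration. With right = FALSE each bin includes its lower bound, so 1000 falls in 1000-1999, and table() keeps empty bins because cut() returns a factor with every level present.

```r
distance <- c(1000, 6403.124, 1000, 1414.214, 1414.214, 1000)

# Bin edges every 1000 m; the upper limit is chosen to cover the data.
breaks <- seq(0, 7000, by = 1000)
labels <- paste(head(breaks, -1), tail(breaks, -1) - 1, sep = "-")

# right = FALSE gives intervals [0, 1000), [1000, 2000), ...
bins <- cut(distance, breaks = breaks, labels = labels, right = FALSE)

# table() counts every factor level, including empty ones.
freq <- as.data.frame(table(bins), responseName = "frequency")
names(freq)[1] <- "distance"
```

For these six values the 1000-1999 row holds 5, because the two distances of 1414.214 fall in that bin alongside the three of 1000.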

Thanks in advance!

Louise
