Re: [R] GLS models - bootstrapping

2007-02-21 Thread Christian Kamenik
Dear Lillian,

I tried to estimate parameters for time series regression using time 
series bootstrapping as described on page 434 in Davison & Hinkley 
(1997) - Bootstrap Methods and Their Application. This approach is based 
on an AR process (ARIMA model) with a regression term (compare also with 
page 414 in Venables & Ripley (2002) - Modern Applied Statistics with S). 
I rewrote the code for R (this comes without any warranty):

library(boot)   # tsboot() and boot.ci() live here

# 'beaver' is assumed to be a data frame with 100 rows and columns temp and activ
fit <- function(data)
{  X <- cbind(rep(1, 100), data$activ)   # design matrix: intercept + activity
   para <- list(X = X, data = data)
   assign("para", para)
   d <- arima(x = para$data$temp, order = c(1, 0, 0), xreg = para$X)
   res <- d$residuals
   res <- res[!is.na(res)]
   list(paras = c(d$model$ar, d$reg.coef, sqrt(d$sigma2)),
        res = res - mean(res),            # centred residuals, resampled later
        fit = X %*% d$reg.coef)
}
beaver.args <- fit(beaver)
white.noise <- function(n.sim, ts) sample(ts, size = n.sim, replace = TRUE)
beaver.gen <- function(ts, n.sim, ran.args)
{  tsb <- ran.args$res
   fit <- ran.args$fit
   coeff <- ran.args$paras
   # new series = fitted regression + sigma * simulated AR(1) noise,
   # with innovations drawn from the centred residuals
   ts$temp <- fit + coeff[4] * arima.sim(model = list(ar = coeff[1]),
                                         n = n.sim, rand.gen = white.noise, ts = tsb)
   ts }
new.beaver <- beaver.gen(beaver, 100, beaver.args)

beaver.fun <- function(ts) fit(ts)$paras
beaver.boot <- tsboot(beaver, beaver.fun, R = 99, sim = "model",
                      n.sim = 100, ran.gen = beaver.gen, ran.args = beaver.args)
names(beaver.boot)
beaver.boot$t0
beaver.boot$t[1:10,]

Maybe there is a more elegant way of doing this. Anyway, boot.ci should 
give you confidence intervals.
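For example, percentile and basic intervals for the coefficient on activ 
(the third element in the c(ar, intercept, activ, sigma) ordering returned 
by fit() above) could be extracted roughly like this (untested sketch):

# CI for the regression coefficient on activ (index 3 in fit()$paras)
boot.ci(beaver.boot, type = c("basic", "perc"), index = 3)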

Let me know how you are doing.

Best, Christian




 From: Lillian Sandeman l.sandeman
 Date: Mon, 2 Oct 2006 13:59:09 +0100 (BST)
 
 Hello,
 
 I have fitted GLS models to time series data. Now I wish to bootstrap
 this data to produce confidence intervals for the model.
 
 However, because this is time series data, normal bootstrapping is not
 applicable. Secondly, 'tsboot' appears to be useful only for AR models -
 and does not seem to be applicable to GLS models.
 
 I have written code in R to randomly sample blocks of the data (as in
 Davison & Hinkley's book, Bootstrap Methods and Their Application) and
 use this resampling to re-run the model, but this does not seem to be the
 correct approach, since the confidence intervals produced do not show the
 underlying pattern (cycles) in the data [even when block length is
 increased, it only picks up a little of this variation].
 
 Any help as to how to proceed with this would be greatly appreciated, as I
 cannot find anything applicable on the R pages. Alternatively, if there
 is another method to proceed with this (other than bootstrapping), I would
 also be happy to try it.
 
 Thank you,
 
 Lillian.



[R] Significance of Principal Coordinates

2005-03-14 Thread Christian Kamenik
Dear all,
I was looking for methods in R that allow assessing the number of 
significant principal coordinates. Unfortunately I was not very 
successful. I expanded my search to the web and Current Contents; 
however, the information I found is very limited.
Therefore, I tried to write code for doing a randomization. I would 
highly appreciate it if somebody could comment on the following approach. I 
am neither a statistician nor an R expert... The data matrix I used has 
72 species (columns) and 167 samples (rows).

Many thanks in advance, Christian
# focus on ~80% of all the eigenvalues
nEigen <- round(ncol(Data) * 0.8)
# Calculate weights for the principal coordinates analysis
Total <- apply(Data, 1, sum)
Weight <- round(Total / max(Total) * 1000)
# Calculate chord distance
library(vegan)
Chord <- vegdist(decostand(Data, "norm"), "euclidean")
# Calculate principal coordinates, including distance matrix row weights
library(ade4)
PCoord.Eigen <- dudi.pco(Chord, row.w = Weight, scannf = FALSE, full = TRUE)$eig[1:nEigen]
# Randomization of the principal coordinates analysis
library(labdsv)
for (i in 1:99) {
  Data.random <- rndtaxa(Data, species = TRUE, plots = TRUE)
  Total.random <- apply(Data.random, 1, sum)
  Weight.random <- round(Total.random / max(Total.random) * 1000)
  Chord.random <- vegdist(decostand(Data.random, "norm"), "euclidean")
  PCoord.Eigen.random <-
    dudi.pco(Chord.random, row.w = Weight.random, scannf = FALSE, full = TRUE)$eig[1:nEigen]
  PCoord.Eigen <- cbind.data.frame(PCoord.Eigen, PCoord.Eigen.random)
}

# Plot scree diagram with the original eigenvalues and the 95% quantiles of
# the eigenvalues from the randomized principal coordinates analyses

plot(c(1:nEigen), PCoord.Eigen[, 1], type = "b")
lines(c(1:nEigen), apply(PCoord.Eigen[, -1], 1, quantile, probs = c(0.95)), col = "red")
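
To turn this scree comparison into a number of 'significant' axes, one 
could count how many observed eigenvalues exceed the corresponding 95% 
quantiles of the randomized eigenvalues (a sketch, not part of the original 
post; Quant95 and nSignif are new names):

# Number of observed eigenvalues exceeding the randomized 95% quantiles
Quant95 <- apply(PCoord.Eigen[, -1], 1, quantile, probs = 0.95)
nSignif <- sum(PCoord.Eigen[, 1] > Quant95)
nSignif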

Christian Kamenik
Institute of Plant Sciences
University of Bern


[R] Gregmisc

2005-03-10 Thread Christian Kamenik
Dear all,
I use R 2.0.1 on Windows XP Professional. When I want to load the 
'gregmisc' library I get the following error message:

Error in library(pkg, character.only = TRUE) : 'gregmisc' is not a 
valid package -- installed < 2.0.0?

Can anybody tell me what's wrong with this package?
Cheers, Christian


Re: [R] glm and percentage data with many zero values

2005-01-29 Thread Christian Kamenik
Dear R users,
I would like to summarize the answers I got to the following question:
I am interested in correctly testing effects of continuous 
environmental variables and ordered factors on bacterial abundance. 
Bacterial abundance is derived from counts and expressed as a percentage. 
My problem is that the abundance data contain many zero values:
Bacteria <- 
c(2.23,0,0.03,0.71,2.34,0,0.2,0.2,0.02,2.07,0.85,0.12,0,0.59,0.02,2.3,0.29,0.39,1.32,0.07,0.52,1.2,0,0.85,1.09,0,0.5,1.4,0.08,0.11,0.05,0.17,0.31,0,0.12,0,0.99,1.11,1.78,0,0,0,2.33,0.07,0.66,1.03,0.15,0.15,0.59,0,0.03,0.16,2.86,0.2,1.66,0.12,0.09,0.01,0,0.82,0.31,0.2,0.48,0.15)

First I tried transforming the data (e.g., logit), but because of the 
zeros I was not satisfied. Next I converted the percentages into 
integer values by round(Bacteria*10) or ceiling(Bacteria*10) and 
calculated a glm with a Poisson error structure; however, I am not very 
happy with this approach because it changes the original percentage 
data substantially (e.g., 0.03 becomes either 0 or 1). The same is true 
for converting the percentages into factors and calculating a 
multinomial or proportional-odds model (anyway, I do not know if this 
would be a meaningful approach).
I was searching the web and the best answer I could get was 
http://www.biostat.wustl.edu/archives/html/s-news/1998-12/msg00010.html, 
in which several persons suggested quasi-likelihood. Would it be 
reasonable to use a glm with quasipoisson? If yes, how can I find the 
appropriate variance function? Any other suggestions?

If you know the totals from which these percentages were derived, 
then transform your Bacteria back to the original observations and fit a 
quasi-Poisson model with log(total) as an offset. That is:

BCount <- round(tot * Bacteria)
glm(BCount ~ x1 + x2 + offset(log(tot)), family = quasipoisson)
cheers, jari oksanen 

I have developed an R library for specifically dealing with positive 
continuous data with exact zeros. For example, rainfall: no rain 
means exactly zero is recorded, but when rain falls, a continuous 
amount is recorded (after suitable rounding).
This library--available on CRAN--is called tweedie. The distributions 
used are Tweedie models, which belong to the EDM family and so 
can be used in generalized linear models. The Tweedie models have 
a variance function V(mu) = mu^p, for p not in the range (0, 1).
For various values of p, we have:
  Value of p    Distribution
  p < 0         Defined over the whole real line
  p = 0         Normal distribution
  0 < p < 1     No distributions exist
  p = 1         Poisson distribution (with phi = 1)
  1 < p < 2     Continuous over positive Y, with positive mass at Y = 0
  p = 2         Gamma distribution
  p > 2         Continuous for positive Y
  p = 3         Inverse Gaussian distribution
Of particular interest are the distributions such that 1 < p < 2, 
which can be seen as a Poisson sum of gamma random variables. They are 
continuous for Y > 0 with a positive probability that Y = 0 exactly. For 
this reason, the Tweedie densities with 1 < p < 2 have been called the 
compound Poisson, compound gamma and the Poisson-gamma distribution.

In your case, percentages with exact zeros may not exactly fall into
this category because of the upper limit of 100%. But provided there are
very few values near 100%, the Tweedie models might be worth a try.
(The data above seem to indicate few values near 100%.)
Get the tweedie package from CRAN, or from
http://www.sci.usq.edu.au/staff/dunn/twhtml/home.html
You will also need the statmod package, also available on CRAN.
All the best.
P.
--
Dr Peter Dunn  (USQ CRICOS No. 00244B)
  Web:http://www.sci.usq.edu.au/staff/dunn
  Email:  dunn @ usq.edu.au
Opinions expressed are mine, not those of USQ.  Obviously...
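
As a minimal sketch of the kind of fit suggested above (x1 is a 
hypothetical predictor; the variance power p is first profiled over 
(1, 2) with tweedie.profile()):

library(statmod)   # tweedie() family for glm()
library(tweedie)   # tweedie.profile() for estimating the power p

# Profile the variance power p over (1, 2); with 1 < p < 2 the model is a
# Poisson sum of gammas: continuous on y > 0 with positive mass at y = 0
prof <- tweedie.profile(Bacteria ~ 1, p.vec = seq(1.1, 1.9, by = 0.1),
                        link.power = 0, do.plot = FALSE)

# Fit the GLM with the estimated power and a log link (link.power = 0)
fit.tw <- glm(Bacteria ~ x1,
              family = tweedie(var.power = prof$p.max, link.power = 0))
summary(fit.tw)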

You might try a ZIP, i.e. zero-inflated Poisson, model. I have not 
used it myself, but I have such data to work on. So if there is anyone who 
can point out how to do this in R - please. There is also a class called 
ZINB or something like that for zero-inflated negative binomial models.

Actually I just went on the web and found a book by Simonoff, Analyzing 
Categorical Data, and there are some examples in it for ZIP et al. 
Look at the examples for sections 4.5 and 5.4:

http://www.stern.nyu.edu/~jsimonof/AnalCatData/Splus/analcatdata.s
http://www.stern.nyu.edu/~jsimonof/AnalCatData/Splus/functions.s
--
Lep pozdrav / With regards,
Gregor GORJANC 
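
Regarding the ZIP question above, a sketch of what such a fit might look 
like using zeroinfl() from the pscl package (a package not mentioned in 
this thread; counts and x1 are hypothetical integer counts and a predictor):

library(pscl)
# Poisson count model with predictor x1; intercept-only zero-inflation part
zip.fit <- zeroinfl(counts ~ x1 | 1, dist = "poisson")
summary(zip.fit)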

The ZIP model can be fitted with Jim Lindsey's function fmr 
from his gnlm library, see:

http://popgen0146uns50.unimaas.nl/~jlindsey/rcode.html
Bendix Carstensen
It turned out that the percentage data were calculated from 
concentrations resulting in positive continuous data with exact zeros. 
The Tweedie models did a fine job.

Many thanks, Christian Kamenik


[R] glm and percentage data with many zero values

2005-01-20 Thread Christian Kamenik
Dear all,
I am interested in correctly testing effects of continuous environmental 
variables and ordered factors on bacterial abundance. Bacterial 
abundance is derived from counts and expressed as a percentage. My problem 
is that the abundance data contain many zero values:
Bacteria <- 
c(2.23,0,0.03,0.71,2.34,0,0.2,0.2,0.02,2.07,0.85,0.12,0,0.59,0.02,2.3,0.29,0.39,1.32,0.07,0.52,1.2,0,0.85,1.09,0,0.5,1.4,0.08,0.11,0.05,0.17,0.31,0,0.12,0,0.99,1.11,1.78,0,0,0,2.33,0.07,0.66,1.03,0.15,0.15,0.59,0,0.03,0.16,2.86,0.2,1.66,0.12,0.09,0.01,0,0.82,0.31,0.2,0.48,0.15)

First I tried transforming the data (e.g., logit) but because of the 
zeros I was not satisfied. Next I converted the percentages into integer 
values by round(Bacteria*10) or ceiling(Bacteria*10) and calculated a 
glm with a Poisson error structure; however, I am not very happy with 
this approach because it changes the original percentage data 
substantially (e.g., 0.03 becomes either 0 or 1). The same is true for 
converting the percentages into factors and calculating a multinomial or 
proportional-odds model (anyway, I do not know if this would be a 
meaningful approach).
I was searching the web and the best answer I could get was 
http://www.biostat.wustl.edu/archives/html/s-news/1998-12/msg00010.html 
in which several persons suggested quasi-likelihood. Would it be 
reasonable to use a glm with quasipoisson? If yes, how can I find the 
appropriate variance function? Any other suggestions?

Many thanks in advance, Christian

Christian Kamenik
Institute of Plant Sciences
University of Bern
Altenbergrain 21
3013 Bern
Switzerland