from:"Christoph Scherber"

[R] Combining two ANOVA outputs of different lengths

2007-08-10 Thread Christoph Scherber

Dear R users,

I have been trying to combine two anova outputs into one single table 
(for later publication). The outputs are of different length, and share 
only some common explanatory variables.

Using merge() or melt() (from the reshape package) did not work out.

Here are the model outputs and what I would like to have:

anova(model1)
 numDF denDF  F-value p-value
(Intercept) 174 0.063446  0.8018
days174 6.613997  0.0121
logdiv  174 1.587983  0.2116
leg 174 4.425843  0.0388

anova(model2)
  numDF denDF   F-value p-value
(Intercept)  173 165.94569  .0001
funcgr   173   7.91999  0.0063
grass173  42.16909  .0001
leg  173   4.72108  0.0330
funcgr:grass 173   8.49068  0.0047

#merge(anova(model1),anova(model2),...)

F-value 1   p-val1  F-value 2   p-value 2
(Intercept) 0.0634460.8018  165.94569   .0001
days6.6139970.0121  NA  NA
logdiv  1.5879830.2116  NA  NA
leg 4.4258430.0388  4.72108 0.033
funcgr  NA  NA  7.91999 0.0063
grass   NA  NA  42.16909.0001
funcgr:grassNA  NA  8.49068 0.0047


I would be glad if someone would have an idea of how to do this in 
principle.

I am using R 2.5.1 on Windows XP.

Thanks very much in advance!

Best wishes
Christoph









-- 
Dr. Christoph Scherber
DNPW, Agroecology
University of Goettingen
Waldweg 26
D-37073 Goettingen
Germany

phone +49(0)551 39 8807
fax   +49(0)551 39 8806

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Selecting all values smaller than X in a dataframe

2007-06-11 Thread Christoph Scherber

Dear R users,

I have a correlation matrix for a dataframe called synth, for which I 
now want to select only those cells that have correlations larger than 
+/-0.6:

synth=data.frame(x=rnorm(10,1),y=rnorm(10,2),z=rnorm(10,0.5))

w=cor(synth,use=pairwise.complete.obs)
w=as.data.frame(w)
w[,sapply(w,abs(w),,0.6)]

The problem is that using sapply with  or  doesn´t seem to work.

How could I solve this problem?

Thank you very much in advance for your help!

Best wishes
Christoph

(I am using R 2.5.0 on Windows XP).



--
Christoph Scherber
DNPW, Agroecology
University of Goettingen
Waldweg 26
D-37073 Goettingen

+49-(0)551-39-8807

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] reading a big file

2007-05-24 Thread Christoph Scherber

Dear Remigijus,

You should change memory allocation in Windows XP, as described in

http://cran.r-project.org/bin/windows/base/rw-FAQ.html#There-seems-to-be-a-limit-on-the-memory-it-uses_0021

Hope this helps.

Best wishes
Christoph


--
Christoph Scherber
DNPW, Agroecology
University of Goettingen
Waldweg 26
D-37073 Goettingen

+49-(0)551-39-8807




Remigijus Lapinskas schrieb:
 Dear All,
 
 I am on WindowsXP with 512 MB of RAM, R 2.4.0, and I want to read in a
 big file mln100.txt. The file is 390MB big, it contains a column of 100 
 millions integers.
 
 mln100=scan(mln100.txt)
 Error: cannot allocate vector of size 512000 Kb
 In addition: Warning messages:
 1: Reached total allocation of 511Mb: see help(memory.size)
 2: Reached total allocation of 511Mb: see help(memory.size)
 
 In fact, I would be quite happy if I could read, say, every tenth 
 integer (line) of the file. Is it possible to do this?
 
 Cheers,
 Rem
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 .


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Chosing a subset of a non-sorted vector

2007-05-21 Thread Christoph Scherber

Dear all,

I have a tricky problem here:

I have a dataframe with biodiversity data in which suplots are a 
repeated sequence from 1 to 4 (1234,1234,...)

Now, I want to randomly pick two subplots each from each diversity level 
(DL).

The problem is that it works up to that point - but if I try to subset 
the whole dataframe, I get stuck:

DL=gl(3,4)
subplot=rep(1:4,3)
diversity.data=data.frame(DL,subplot)


subplot.sampled=NULL
for(i in 1:3)
subplot.sampled=c(subplot.sampled,sort(sample(4,2,replace=F)))

subplot.sampled
[1] 3 4 1 3 1 3
subplot[subplot.sampled]
[1] 3 4 1 3 1 3

## here comes the tricky bit:

diversity.data[subplot.sampled,]
 DL subplot
31   3
41   4
11   1
3.1  1   3
1.1  1   1
3.2  1   3

How can I select those rows of diversity.data that match the exact 
subplots in subplot.sampled?


Thank you very much for your help!

Best wishes,
Christoph

(I am using R 2.4.1 on Windows XP)


##
Christoph Scherber
DNPW, Agroecology
University of Goettingen
Waldweg 26
D-37073 Goettingen

+49-(0)551-39-8807

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] lme4/lmer: P-Values from mcmc samples or chi2-tests?

2007-02-13 Thread Christoph Scherber

Dear R users,

I have now tried out several options of obtaining p-values for 
(quasi)poisson lmer models, including Markov-chain Monte Carlo sampling 
and single-term deletions with subsequent chi-square tests (although I 
am aware that the latter may be problematic).

However, I encountered several problems that can be classified as
(1) the quasipoisson lmer model does not give p-values when summary() is 
called (see below)
(2) dependence on the size of the mcmc sample
(3) lack of correspondence between different p-value estimates.

How can I proceed, left with these uncertainties in the estimations of 
the p-values?

Below is the corresponding R code with some output so that you can see 
it all for your own:

##
m1-lmer(number_pollinators~logpatch+loghab+landscape_diversity+(1|site),primula,poisson,method=ML)
m2-lmer(number_pollinators~logpatch+loghab+landscape_diversity+(1|site),primula,quasipoisson,method=ML)
summary(m1);summary(m2)

#m1: [...]
Fixed effects:
 Estimate Std. Error z value Pr(|z|)
(Intercept) -0.403020.57403 -0.7021  0.48262
logpatch 0.109150.04111  2.6552  0.00793 **
loghab   0.087500.06128  1.4279  0.15331
landscape_diversity  1.023380.40604  2.5204  0.01172 *

#m2: [...] #for the quasipoisson model, no p values are shown?!
Fixed effects:
 Estimate Std. Error t value
(Intercept)  -0.4030 0.6857 -0.5877
logpatch  0.1091 0.0491  2.2228
loghab0.0875 0.0732  1.1954
landscape_diversity   1.0234 0.4850  2.1099

##

anova(m1)
#here, no p-values appear (only in the current version of lme4)

Analysis of Variance Table
 Df  Sum Sq Mean Sq
logpatch 1 11.6246 11.6246
loghab   1  6.0585  6.0585
landscape_diversity  1  6.3024  6.3024

anova(m2)
Analysis of Variance Table
 Df  Sum Sq Mean Sq
logpatch 1 11.6244 11.6244
loghab   1  6.0581  6.0581
landscape_diversity  1  6.3023  6.3023

So I am left with the p-values estimated from the poisson model; 
single-term deletion tests for each of the individual terms lead to 
different p-values; I restrict this here to m2:

##
m2a=update(m2,~.-loghab)
anova(m2,m2a,test=Chi)

Data: primula
Models:
m2a: number_pollinators ~ logpatch + landscape_diversity + (1 | site)
m2: number_pollinators ~ logpatch + loghab + landscape_diversity + (1 | 
site)
 Df AIC BIC  logLik  Chisq Chi Df Pr(Chisq)
m2a  4  84.713  91.850 -38.357
m2   5  84.834  93.755 -37.417 1.8793  1 0.1704

##
m2b=update(m2,~.-logpatch)
anova(m2,m2b,test=Chi)

Data: primula
Models:
m2b: number_pollinators ~ loghab + landscape_diversity + (1 | site)
m2: number_pollinators ~ logpatch + loghab + landscape_diversity +
m2b: (1 | site)
 Df AIC BIC  logLik Chisq Chi Df Pr(Chisq)
m2b  4  90.080  97.217 -41.040
m2   5  84.834  93.755 -37.417 7.246  1   0.007106 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##
m2c=update(m2,~.-landscape_diversity)
anova(m2,m2c,test=Chi)

Data: primula
Models:
m2c: number_pollinators ~ logpatch + loghab + (1 | site)
m2: number_pollinators ~ logpatch + loghab + landscape_diversity +
m2c: (1 | site)
 Df AIC BIC  logLik  Chisq Chi Df Pr(Chisq)
m2c  4  89.036  96.173 -40.518
m2   5  84.834  93.755 -37.417 6.2023  10.01276 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


The p-values from mcmc are:

##
markov1=mcmcsamp(m2,5000)

HPDinterval(markov1)
 lower  upper
(Intercept)  -1.394287660  0.6023229
logpatch  0.031154910  0.1906861
loghab0.002961281  0.2165285
landscape_diversity   0.245623183  1.6442544
log(site.(In))  -41.156007604 -1.6993996
attr(,Probability)
[1] 0.95

##

mcmcpvalue(as.matrix(markov1[,1])) #i.e. the p value for the intercept
[1] 0.3668
  mcmcpvalue(as.matrix(markov1[,2])) #i.e. the p-value for logpatch
[1] 0.004
  mcmcpvalue(as.matrix(markov1[,3])) #i.e. the p-value for loghab
[1] 0.0598
  mcmcpvalue(as.matrix(markov1[,4])) #i.e. the p-value for landscape.div
[1] 0.0074

If one runs the mcmcsamp function for, say, 50,000 runs, the p-values 
are slightly different (not shown here).

##here are the p-values summarized in tabular form:

(Intercept) 0.3668
logpatch0.004
loghab  0.0598
landscape_diversity 0.0074


##and from the single-term deletions:

(Intercept)N.A.
logpatch0.007106
loghab  0.1704
landscape_diversity 0.01276


## Compare this with the p-values from the poisson model:
Fixed effects:
 Estimate Std. Error z value Pr(|z|)
(Intercept) -0.403020.57403 -0.7021  0.48262
logpatch 0.109150.04111  2.6552  0.00793 **
loghab   0.087500.06128  1.4279  0.15331
landscape_diversity  1.023380.40604  2.5204

[R] lmer and estimation of p-values: error with mcmcpvalue()

2007-02-12 Thread Christoph Scherber

Dear all,

I am currently analyzing count data from a hierarchical design, and I?ve 
tried to follow the suggestions for a correct estimation of p-values as 
discusssed at R-Wiki 
(http://wiki.r-project.org/rwiki/doku.php?id=guides:lmer-testss=lme%20and%20aov).

However, I have the problem that my model only consists of parameters 
with just 1 d.f. (intercepts, slopes), so that the mcmcpvalue function 
defined below obviously produces error messages.

How can I proceed in estimating the p-values, then?

I very much acknowledge any suggestions.

Best regards
Christoph.

##
mcmcpvalue - function(samp)
{  std - backsolve(chol(var(samp)),

 cbind(0, t(samp)) - colMeans(samp),
 transpose = TRUE)


sqdist - colSums(std * std)
sum(sqdist[-1]  sqdist[1])/nrow(samp) }

m1-lmer(number_pollinators~logpatch+loghab+landscape_diversity+(1|site),quasipoisson)

Generalized linear mixed model fit using Laplace
Formula: number_pollinators ~ logpatch + loghab + landscape_diversity + 
 (1 | site)
Data: primula
  Family: quasipoisson(log link)
AIC   BIC logLik deviance
  84.83 93.75 -37.4274.83
Random effects:
  Groups   NameVariance Std.Dev.
  site (Intercept) 0.036592 0.19129
  Residual 1.426886 1.19452
number of obs: 44, groups: site, 15

Fixed effects:
 Estimate Std. Error t value
(Intercept)  -0.4030 0.6857 -0.5877
logpatch  0.1091 0.0491  2.2228
loghab0.0875 0.0732  1.1954
landscape_diversity   1.0234 0.4850  2.1099

Correlation of Fixed Effects:
 (Intr) lgptch loghab
logpatch 0.091
loghab  -0.637 -0.121
lndscp_dvrs -0.483 -0.098 -0.348


markov1=mcmcsamp(m1,5000)
HPDinterval(markov1)

mcmcpvalue(as.matrix(markov1)[,1])

Error in colMeans(samp) : 'x' must be an array of at least two dimensions

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] lmer and estimation of p-values: error with mcmcpvalue()

2007-02-12 Thread Christoph Scherber

Dear Henric,

Thanks, now it works; but how reliable are these estimates? Especially 
with p-values close to 0.05 it is of course important that the range of 
the estimates is not too large. I´ve just run several simulations, each 
of which yielding sometimes quite different p-values.

Best wishes
Christoph


##
Dr. rer. nat. Christoph Scherber
DNPW, Agroecology
University of Goettingen
Waldweg 26
D-37073 Goettingen
http://wwwuser.gwdg.de/~uaoe/Agroecology.html
+49-(0)551-39-8807



Henric Nilsson (Public) schrieb:
 Den Må, 2007-02-12, 13:58 skrev Christoph Scherber:
 Dear all,

 I am currently analyzing count data from a hierarchical design, and I?ve
 tried to follow the suggestions for a correct estimation of p-values as
 discusssed at R-Wiki
 (http://wiki.r-project.org/rwiki/doku.php?id=guides:lmer-testss=lme%20and%20aov).

 However, I have the problem that my model only consists of parameters
 with just 1 d.f. (intercepts, slopes), so that the mcmcpvalue function
 defined below obviously produces error messages.

 How can I proceed in estimating the p-values, then?

 I very much acknowledge any suggestions.

 Best regards
 Christoph.

 ##
 mcmcpvalue - function(samp)
 {  std - backsolve(chol(var(samp)),

  cbind(0, t(samp)) - colMeans(samp),
  transpose = TRUE)


 sqdist - colSums(std * std)
 sum(sqdist[-1]  sqdist[1])/nrow(samp) }

 m1-lmer(number_pollinators~logpatch+loghab+landscape_diversity+(1|site),quasipoisson)

 Generalized linear mixed model fit using Laplace
 Formula: number_pollinators ~ logpatch + loghab + landscape_diversity +
  (1 | site)
 Data: primula
   Family: quasipoisson(log link)
 AIC   BIC logLik deviance
   84.83 93.75 -37.4274.83
 Random effects:
   Groups   NameVariance Std.Dev.
   site (Intercept) 0.036592 0.19129
   Residual 1.426886 1.19452
 number of obs: 44, groups: site, 15

 Fixed effects:
  Estimate Std. Error t value
 (Intercept)  -0.4030 0.6857 -0.5877
 logpatch  0.1091 0.0491  2.2228
 loghab0.0875 0.0732  1.1954
 landscape_diversity   1.0234 0.4850  2.1099

 Correlation of Fixed Effects:
  (Intr) lgptch loghab
 logpatch 0.091
 loghab  -0.637 -0.121
 lndscp_dvrs -0.483 -0.098 -0.348


 markov1=mcmcsamp(m1,5000)
 HPDinterval(markov1)

 mcmcpvalue(as.matrix(markov1)[,1])
 
 Try `mcmcpvalue(as.matrix(markov1[,1]))'.
 
 
 HTH,
 Henric
 
 
 
 Error in colMeans(samp) : 'x' must be an array of at least two dimensions

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 
 
 .


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Poisson example from Snedechor Cochran

2006-11-15 Thread Christoph Scherber


Dear R users,

I am trying to reproduce table 7.4.12 (page 131) from Snedechor  
Cochran (eigth edition); the example is counts of weed seeds with a 
fitted Poisson distribution, tested for goodness-of-fit using a Chi-square:


observed=c(3,17,26,16,18,9,3,5,0,1,0,0)
expected=dpois(0:11,lambda=3.020408)*98
chisq.test(observed,p=expected,rescale.p=T)

Now the problem I have is that chisq.test gives me the chi-squared value 
of roughly 8.30 (which is the value given by Snedechor  Cochran), but I 
am wondering why the warning message occurs at the end of the test.


Further, it is not clear to me how these calculations could be done 
using the full dataset of N=98 observations:


observed.full=rep(0:11,c(3,17,26,16,18,9,3,5,0,1,0,0))

What would the correct specification for a chisq.test against a poisson 
distribution look like in this case?


Thanks for your help!

Best wishes
Christoph




##
Chi-squared test for given probabilities

data:  observed
X-squared = 8.2628, df = 8, p-value = 0.4082

Warning message:
Chi-squared approximation may be incorrect in: chisq.test(observed, p = 
expected, rescale.p = T)
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Poisson example from Snedechor Cochran

2006-11-15 Thread Christoph Scherber


Dear R users,

I am trying to reproduce table 7.4.12 (page 131) from Snedechor 
Cochran (eigth edition); the example is counts of weed seeds with a
fitted Poisson distribution, tested for goodness-of-fit using a Chi-square:

observed=c(3,17,26,16,18,9,3,5,0,1,0,0)
expected=dpois(0:11,lambda=3.020408)*98
chisq.test(observed,p=expected,rescale.p=T)

Now the problem I have is that chisq.test gives me the chi-squared value
of roughly 8.30 (which is the value given by Snedechor  Cochran), but I
am wondering why the warning message occurs at the end of the test.

Further, it is not clear to me how these calculations could be done
using the full dataset of N=98 observations:

observed.full=rep(0:11,c(3,17,26,16,18,9,3,5,0,1,0,0))

What would the correct specification for a chisq.test against a poisson
distribution look like in this case?

Thanks for your help!

Best wishes
Christoph




##
Chi-squared test for given probabilities

data:  observed
X-squared = 8.2628, df = 8, p-value = 0.4082

Warning message:
Chi-squared approximation may be incorrect in: chisq.test(observed, p =
expected, rescale.p = T)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Batch printing of existing postscript files with file names included

2006-05-10 Thread Christoph Scherber

Dear R users,

I have created a series of postscript files and I´d like to print them 
with the file name added to the printout. Is there a way of reading 
these files into R (e.g. using rimage after conversion to jpeg), adding 
the file name, and then sending the files to a windows printer?

I have already tried the rimage package, and several image batch 
conversion program, but none was so far able to do the job, unfortunately.

I´m using R 2.3.0 on Windows XP.

Thank you very much for your help!

Best regards
Christoph

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Use of step() with an unbalanced ANOVA model

2006-03-10 Thread Christoph Scherber

Dear all,

Is it dangerous to use step() during model simplification when I have an 
ANOVA design that is unbalanced (i.e. there is order dependence when 
entering the terms into the full model)?

Thanks very much for your help!

Kind regards,
Christoph

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] xyplot question

2005-12-06 Thread Christoph Scherber

Dear R users,

I have a question regarding the use of xyplot in the lattice() package. 
I have two factors (each with two levels), and I´d like to change the 
order of the panels in a 2x2 panel layout from the default alphabetic 
order that R uses based on the names of the factor levels.

My approach is (in principle)

xyplot(y~x|Factor1+Factor2)

Let´s assume, my factor levels for Factor1 are A and B,
and for Factor2 they´re C and D, respectively.

Now the default arrangement of my panels would be (from bottom top left 
to bottom right): BC,CA,BD,AD

What I´d like to have is BD,AC,BC,AD.

Can anyone tell me how to solve this problem easily?

I´ve read that using perm.cond and/or index.cond could solve this 
problem, but couldn´t find an appropriate example, unfortunately...

Thank you very much for your help!

Regards,
Christoph

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Calculation of the median in survfit()

2005-11-18 Thread Christoph Scherber

Dear R users,

Can anyone tell me how the medians in survfit() are computed? I´ve 
looked it up in the source code (print.survfit.s version 4.19 07/09/00), 
but I´m not a programmer...;-)

Especially, I´d like to know what the pfun() function inside 
print.survfit.s() works.

The help file does describe the calculation of the median, but I´d like 
to know if and how this differs from just directly calculating the 
median of the survival times (e.g. using tapply(time,factor,median)).

I would be most grateful for any help. Thanks a lot!
Christoph

###
[I am using R 2.1.1 on WinXP with the latest version of the survival 
package]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Mean survival times

2005-11-17 Thread Christoph Scherber

Dear list,

I have data on insect survival in different cages; these have the 
following structure:

  deathtime status id cageS  F G   L S
1.5  1  1 C1  8  2 1   1 1
1.5  1  2 C1  8  2 1   1 1
   11.5  1  3 C1  8  2 1   1 1
   11.5  1  4 C1  8  2 1   1 1

There are 81 cages and each 20 individuals whose survival was followed 
over time. The columns S,F,G,L and S are experimentally manipulated 
factors thought to have an influence on survival.

Using survfit(Surv(deathtime,status)~cage) gives me the survivorship 
curves for every cage. But what I´d like to have is a mean survivorship 
value for every cage.

Obviously, using tapply (deathtime,cage,mean) gives me mean values, but 
I´d like to have a better estimate of this using a proper statistical 
model. I´ve tried a glm with poisson errors (as suggested in Crawley´s 
book, page 628), but the back-transformed estimates (using status as the 
response variable and deathtime as an offset) were totally unrealistic.

As I´m new to survival analysis, it would be great if anyone could give 
me some hints on what method would be best.

Thanks a lot!
Christoph

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] trellis: positioning of key

2005-11-09 Thread Christoph Scherber

Dear R users,

Using xyplot(), how can I position the key in the *margin* outside the 
plotting area ?

My problem is that the key always overlaps with the x axis labels, no 
matter how I try to specify any of the par() arguments (e.g. oma()).

Many thanks for any suggestions!
Christoph


###
for information, here´s the code I use

par(oma=c(0,0,3,0)) ###this, I think, is what should be changed

xyplot(jitter(height,factor=1) ~ sowndiv | time2,
   groups=treatment, data=caging04051, subscripts=T,
   xlab=list(Diversity,cex=1.5),
   ylab=list(Height,cex=1.5),
   layout=c(3, 1),

   key=list(space=top,border=TRUE,
 columns=1,
 transparent=FALSE,
 points=list(pch=c(1,2),cex=2),
 lines=list(col=c(1,1), lty=c(1,2), lwd=c(1.5,1.5)),
 text=list(levels(treatment), cex=1.2)),

   par.strip.text=list(cex=1.2),
   strip = function(..., strip.names)
   strip.default(..., strip.names=c(FALSE, TRUE),
 style = 1),

   scales=list(  x=list(at=c(1, 2, 4, 8, 16, 60),
   labels=c(1,2,4,8,16,60),log=TRUE,cex=1.4,tck=0.02),
 y=list(tck=0.02)),

   panel=function(x, y, subscripts, groups){
 panel.superpose(x, y, subscripts, groups,cex=1.4)
 which - groups[subscripts]
panel.loess(x[which==C],y[which==C],lty=1)
panel.loess(x[which==H],y[which==H],lty=2)
   })

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] standard errors in summary.lm

2005-11-01 Thread Christoph Scherber

Dear R users,

If I have an aov object, how are the standard errors of the estimates in 
summary.lm calculated?

Using treatment contrasts, I would like to use the estimated differences 
in mean values (intercepts) to calculate the mean values per factor 
level, and for these mean values I´f like to use the model output to 
calculate the standard errors.

Many thanks for your help!
Regards,
Christoph.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] aggregating using several functions

2005-10-24 Thread Christoph Scherber

Dear R users,

I would like to aggregate a data frame using several functions at once 
(e.g., mean plus standard error).

How can I make this work using aggregate()?  The help file says scalar 
functions are needed; can anyone help?

Below is the code for my meanse function which I´d like to use like this:

aggregate(dataframe, list(factorA,factoB),meanse)

Thanks for your help!
Christoph





meanse-function(x,...,na.rm=T){
if(na.rm)
x-x[!is.na(x)]
m-mean(x)
s-sd(x)
l-length(x)
serr-s/sqrt(l)
return(data.frame(list(Mean=m,Std.Error=serr)))
}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] dataframe (matrix) multiplication

2005-10-14 Thread Christoph Scherber

Dear R users,

Suppose I have 2 parts of a dataframe, say

ABCD
2143
3245
2154

(the real dataframe is 160 columns with each 120 rows)

and I want to multiply every element in [,A:B] with every element in [,C:D];
What is the most elegant way to do this?

I´ve been thinking of converting [,A:B] to a matrix, and then 
multiplying it with the inverse of [,C:D]; would that be correct?

The result should look like

E;F
8;3
12;10
10;4

Thanks very much for any suggestions
Christoph

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] lme and ordering of terms

2005-08-29 Thread Christoph Scherber

Dear R users,

When fitting a lme() object (from the nlme library), is it possible to 
test interactions *before* main effects? As I understand, R 
conventionally re-orders all terms such that highest-order interactions 
come last - but I´d like to know if it´s possible (and sensible) to 
change this ordering of terms.

I´ve tried the terms() command (from aov) but I don´t know if something 
similar exists for lme() objects.

Thanks a lot for your help!

Best wishes
Christoph

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] regression with more than one observation per x value

2005-08-16 Thread Christoph Scherber

Dear R users,

How can I do a regression analysis in R where there is more than one 
observation per x value? I tried the example in SokalRohlf (3rd edn., 
1995), page 476 ff., but I somehow couldn´t find a way to partition the 
sums of squares into linear regression, deviations from regression, 
and within-groups SS.

I tried

model1-lm(y~as.numeric(x)+as.factor(x) #with treatment contrasts


but I am sure there´s a better way around it. I would be very happy if 
anyone could give me some suggestions on this.

Best regards
Christoph.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] JGR and R 2.0.X

2005-05-04 Thread Christoph Scherber

Dear all,
I am running Windows XP with several parallel installations of R (2.0.1; 
2.1 and so on). How can I install JGR for the 2.0.1 version? I keep on 
getting error messages when trying to install it.

Best wishes
Christoph
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] apply question

2005-05-02 Thread Christoph Scherber

Dear R users,
I´ve got a simple question but somehow I can´t find the solution:
I have a data frame with columns 1-5 containing one set of integer 
values, and columns 6-10 containing another set of integer values. 
Columns 6-10 contain NA´s at some places.

I now want to calculate
(1) the number of values in each row of columns 6-10 that were NA´s
(2) the sum of all values on columns 1-5 for which there were no missing 
values in the corresponding cells of columns 6-10.

Example: (let´s call the data frame data)
Col1   Col2   Col3   Col4   Col5   Col6   Col7   Col8   Col9   Col10
1  2  5  2  3  NA  5  NA1  4
3  1  4  5  2  6  NA  4 NA 1
The result would then be (for the first row)
(1) There were 2 NA´s in columns 6-10.
(2) The mean of Columns 1-5 was 2+2+3=7 (because there were NA´s in the 
1st and 3rd position in rows 6-10)

So far, I know how to calculate the rowSums for the data.frame, but I 
don´t know how to condition these on the values of columns 6-10

rowSums(data[,1:5]) #that´s straightforward
apply(data[,6:19],1,function(x)sum(is.na(x))) #this also works fine
But I don´t know how to select just the desired values of columns 1-5 
(as described above)

Can anyone help me? Thanks a lot in advance!
Best regards
Christoph
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] glm with poisson errors

2005-04-01 Thread Christoph Scherber

Dear R users,
I have data on percentage leaf area damaged (in classes, e.g. 1%, 5%, 
10%) in plants. My two questions are:

(1) Could I use a glm with poisson errors on these data?
(2) Could I still use this glm with poisson errors after arcsine 
transformation of the data?

Thank you very much for your help!
Best regards
Christoph
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] some help interpreting ANOVA results, please?

2005-02-16 Thread Christoph Scherber

Dear RenE,
Can you explain a bit more how you derive your T.SPart? That´s what I 
think is the tricky part of your analysis.

I would suggest you should try to end up with something like this:
model1-aov(SR~WasSick*Time+Error(Subject/Time)
model2-aov(SR~SC*Time+Error(Subject/Time)
This way it would be like a repeated measures ANOVA, where WasSick (or 
SC) are the primary covariates, and Time is nested within Subject.

I think the correct specification of time is crucial for the whole 
analysis. It´s like in a split-plot ANOVA, where finding the appropriate 
codings for plots of different sizes can sometimes take a very long time.

Regards,
Christoph
0) Subject, the subject identifier
1) physiological recordings, say SR (skin resistance): time series
2) a SessionPart variable (parts R1 and R2, separated in time by a pause)
3) time, T.SPart: normalised per subject and per SessionPart, so twice 0..1
4) a subjective sickness estimate (SC): time series
5) a per-subject classification: WasSick or not (available as a time series, 
but constant in time of course)


RenE J.V. Bertin wrote:
On Sun, 10 Oct 2004 19:55:41 +0200, RenE J.V. Bertin [EMAIL PROTECTED] wrote 
regarding Re:
[R] some help interpreting ANOVA results, please?
I'm would like to come back to a question I posted quite a while ago, 
concerning the analysis of data of an ongoing experiment. I have, for a given 
number of subjects:
0) Subject, the subject identifier
1) physiological recordings, say SR (skin resistance): time series
2) a SessionPart variable (parts R1 and R2, separated in time by a pause)
3) time, T.SPart: normalised per subject and per SessionPart, so twice 0..1
4) a subjective sickness estimate (SC): time series
5) a per-subject classification: WasSick or not (available as a time series, 
but constant in time of course)
I would like to make statements on whether or not sickness (measured by 4 or 5) can be deduced from the physiological recordings, e.g. something like
 

aov( SR ~ WasSick * T.SPart )
   

expecting a significant effect of time (sickness building up), of WasSick, and 
a significant interaction showing that the effect is stronger (or only 
significant) in the WasSick=TRUE subjects. A simple t.test(SR~WasSick) gives a 
significant difference, as well as t.test( SR~ (T.SPart=0.5) ) .
The problem I'm having is that WasSick (and SC) are not independent variables 
properly speaking. So I cannot do
 

aov( SR ~ WasSick * T.SPart + Error(Subject/WasSick*T.SPart) )
   

R would remove WasSick from the Error term, and do the analysis without it, giving a significant T.SPart effect and WasSick:T.SPart interaction (?), both listed under Error: Subject:T.SPart :
Error: Subject:T.SPart
   Df Sum Sq Mean Sq F value   Pr(F)
T.SPart  5  318.263.6   8.336 7.46e-07 ***
WasSick:T.SPart  5  125.525.1   3.289   0.0079 ** 
Residuals  129  984.9 7.6 

There is no trace of a WasSick effect other than in that interaction (of which 
I'm not sure it is truly one).

I have 2 questions at this point:
A) I think one could assimilate WasSick to a grouping variable (like in a clinical stdudy), forgetting it is actually an observation on the subjects. In that case, I could do
 

aov( SR ~ WasSick * T.SPart )
   

which gives me the expected two significant main effects and the significant 
interaction (which agrees with visual inspection of the data).
Is this an acceptable approach/model?
B) Should I contine putting the Subject id in an Error term, e.g.
 

aov( SR ~ WasSick + Error(Subject) )
   

WithOUT this error term, that anova gives a significant effect, confirming the 
t.test mentioned above. If I include the error term, the effect is no longer 
significant.
Is that because the model does not make sense, rather because my data are so non-normal 
that a t.test cannot be used? (?Error has a similar model, and calls it not 
particularly sensible statistically.)
I would really appreciate some more constructive comments!
Thanks,
RenE Bertin
PS: I must add that it has been suggested to try lme. I went over what docs I 
have (help and MASS 4), but these are far to specialistic for me, so I haven't 
gotten anywhere in that direction :(
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] xyplot() question

2005-02-10 Thread Christoph Scherber

Dear R Users,
I have an xyplot() where different plotting symbols are used for 
subgroups (originally used within S-Plus, but hopefully it´s also 
applicable to R users).
How can I fit separate regression lines for every subgroup? So far, I 
can only plot the overall fitted line.

The code looks like this:
trellis.device()
sps-trellis.par.get(superpose.symbol)
sps$pch-1:7
trellis.par.set(superpose.symbol,sps)
spl - trellis.par.get(superpose.line)  
ps$lty - 1:7 
trellis.par.set(superpose.line,spl) 
xyplot(a~b|factor+treatment,
groups=external,data=ownframe,layout=c(2,2),
panel=function(x,y,subscripts,...){panel.superpose(x,y,subscripts,...);panel.lmline(x,y)}) 

Thank you very much for your help!
Regards
Christioph
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] panel plot with log-scaled x-axis

2005-02-08 Thread Christoph Scherber

Dear all,
Can anyone help me plotting a panel plot with log-scaled x axes?
My formula looks like this:
coplot(response~div|treatment+grass,
panel=function(x,y,...){panel.xyplot(x,y,...);panel.lmline(x,y,...)})
And I´d like to have div plotted in log scale.
Thanks very much for your help!
Best regards
Christoph
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] publishing random effects from lme

2005-02-04 Thread Christoph Scherber

Hi Dieter,
Yes, I´ve tried both options. The anova(lme(...)) gives me good results 
for the fixed effects part, but what I´m specifically interested in is 
what to do with the random effects.

I have tried glmmPQL (generalized linear mixed-effects models), which 
did in fact greatly help account for heteroscedasticity, but I can´t do 
model simplification with these models (and they´re still heavily 
debated, as I read from previous postings to R Help.

How would you deal with the random effects part of the models when 
publishing results from lme?

Thanks for your help!
Christoph



###
Here are my original questions once again (with an example below):
1) What is the total variance of the random effects at each level?
(2) How can I test the significance of the variance components?
(3) Is there something like an r squared for the whole model which I 
can state?  ##it seems, there isn´t (as I learned from a previous posting

The data come from an experiment on plant performance with and without 
insecticide, with and without grasses present, and across different 
levels of plant diversity (div).

Thanks for your help!
Christoph.
lme(asin(sqrt(response)) ~ treatment + logb(div + 1, 2) + grass,
random =  ~ 1 | plotcode/treatment, na.action = na.exclude, method = ML)
Linear mixed-effects model fit by maximum likelihood
Data: NULL
  AIC  BIC  logLik
-290.4181 -268.719 152.209
Random effects:
Formula:  ~ 1 | plotcode
  (Intercept)
StdDev:  0.04176364
Formula:  ~ 1 | treatment %in% plotcode
 (Intercept)   Residual
StdDev:  0.08660458 0.00833387
Fixed effects: asin(sqrt(response)) ~ treatment + logb(div + 1, 2) + grass
  Value  Std.Error DF   t-value p-value
  (Intercept)  0.1858065 0.01858581 81  9.997225  .0001
treatment  0.0201384 0.00687832 81  2.927803  0.0044
logb(div + 1, 2) -0.0203301 0.00690074 79 -2.946073  0.0042
grass  0.0428934 0.01802506 79  2.379656  0.0197
Standardized Within-Group Residuals:
 Min  Q1 Med Q3   Max
-0.2033155 -0.05739679 -0.00943737 0.04045958 0.3637217
Number of Observations: 164
Number of Groups:
plotcode ansatz %in% plotcode
82  164


Dieter Menne wrote:
Suppose I have a linear mixed-effects model (from the package nlme) with 
nested random effects (see below); how would I present the results from 
   

the random effects part in a publication?
 

Have you tried anova(lme())?
Your asin(sqrt()) looks a bit like these are percentages of counts. The method 
is still quoted in old books, but has fallen a bit out of favor. Have you 
thought of some glm model instead (http://www.stats.ox.ac.uk/pub/MASS4/)? 

Dieter Menne
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] integration function

2005-02-04 Thread Christoph Scherber

Dear R users,
I have tried to write a function which gives the step-wise integral for 
an exponential function (moving from -3 to 3 in steps of 0.1, where the 
output for every step shall be the integral under the curve of y against x.

However, something seems to be wrong with this function; can anyone 
please help me?

x-seq(-3,3,0.1)
y-exp(x)
integral-function(z,a,b,step){
for(i in (1:((b-a)/step))){
   c-0
   c[i]-integrate(z,lower=a+(i-1)*step,upper=a+i*step)
   print(c$integral)
}}
integral(y,-3,3,0.1)
Best regards
Christoph
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Split-split plot ANOVA

2005-02-03 Thread Christoph Scherber

Hi Mike,
Do you have a schematic drawing of how exactly your treatments were 
applied? In split-plot experiments, it is generally very important to 
clearly define the sequence of plot sizes, because if you don´t do this 
properly, then the output will be confusing. Checking if your degrees of 
freedom at each level are correct should give you a good idea about 
whether you´ve specified the model in the right way.

Generally, I see some problem with your model specification as you seem 
to have two (not one) treatments in some of your subplots.

If I got it right, the sequence of terms should be something like 
Block/Whole.plot/Caging/Competition/Species

at least if it´s a full split-plot.
Can you send me some more details on the design?
Regards,
Christoph

[EMAIL PROTECTED] wrote:
I have been going over and over the examples in MASS 
and the Pinheiro and Bates example, but cannot get my model 
to run correctly with either aov or lme.

Could someone give me a hand with the correct model statement?
   

It would help to see some of the things you have tried already ...
 

First a description of the design.  We are studying 
germination rates for 
various species under a variety of treaments.  This is a 
blocked split-split 
plot design.  The levels and treatments are:

Blocks:  1-6
Whole plot treatment:
  Overstory:  Yes or No
Split plot treatments:
  Caging (to protect against seed predators):  Yes or No
  Herbaceous competition (i.e., grass):  Yes or No
Split-split plot treatment:
  Tree species:  7 kinds
The response variable is Lag, which is a indication of when 
the seeds first germinated.
   

I would try somthing like
lme (fixed= Lag ~ Caging + herbaceous + tree,
data= your.data,
random= ~ 1 | Overstory/split/splitsplit)
Perhaps you want/need to add some interactions as well. Overstory, split and
splitsplit would be factors with specific levels for each of the plots,
split plots and split-split plots, respectively.
Thus what I attempted here is to separate the variables of the hierarchical
design of data gathering (which go into the random effects) and the
treatments (which go into the fixed effects).
The degrees of freedom for the fixed effects are automatically adjusted to
the correct level in the hierarchy.
Did you try that? What did not work out with it?
 

Lastly, I have unbalanced data since some treatment 
combinations never had any germination.
   

In principle, the REML estimates in lme are not effected by unbalanced data.
BUT I do not think that the missing germinations by themselves lead to an
unbalanced data set: I assume it is informative that in some treatment
combinations there was no germination. Thus, your lag there is something
close to infinity (or at least longer than you cared to wait ;-). Thus, I
would argue you have to somehow include these data points as well, otherwise
you can only make a very restricted statement of the kind: if there was
germination, this depended on such and such.
 

Since the data are highly nonnormal, I hope to do a 
permutations test on the F-values for each main effect and 
interaction in order to get my p-values.
   

As these are durations a log transformation of your response might be
enough.
Regards, Lorenz
- 
Lorenz Gygax, Dr. sc. nat.
Centre for proper housing of ruminants and pigs
Swiss Federal Veterinary Office
agroscope FAT Tänikon, CH-8356 Ettenhausen / Switzerland

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] publishing random effects from lme

2005-02-02 Thread Christoph Scherber

Dear all,
Suppose I have a linear mixed-effects model (from the package nlme) with 
nested random effects (see below); how would I present the results from 
the random effects part in a publication?

Specifically, I´d like to know:
(1) What is the total variance of the random effects at each level?
(2) How can I test the significance of the variance components?
(3) Is there something like an r squared for the whole model which I 
can state?

The data come from an experiment on plant performance with and without 
insecticide, with and without grasses present, and across different 
levels of plant diversity (div).

Thanks for your help!
Christoph.
lme(asin(sqrt(response)) ~ treatment + logb(div + 1, 2) + grass,
random =  ~ 1 | plotcode/treatment, na.action = na.exclude, method = ML)
Linear mixed-effects model fit by maximum likelihood
Data: NULL
   AIC  BIC  logLik
 -290.4181 -268.719 152.209
Random effects:
Formula:  ~ 1 | plotcode
   (Intercept)
StdDev:  0.04176364
 Formula:  ~ 1 | treatment %in% plotcode
  (Intercept)   Residual
StdDev:  0.08660458 0.00833387
Fixed effects: asin(sqrt(response)) ~ treatment + logb(div + 1, 2) + grass
   Value  Std.Error DF   t-value p-value
   (Intercept)  0.1858065 0.01858581 81  9.997225  .0001
 treatment  0.0201384 0.00687832 81  2.927803  0.0044
logb(div + 1, 2) -0.0203301 0.00690074 79 -2.946073  0.0042
 grass  0.0428934 0.01802506 79  2.379656  0.0197
Standardized Within-Group Residuals:
  Min  Q1 Med Q3   Max
-0.2033155 -0.05739679 -0.00943737 0.04045958 0.3637217
Number of Observations: 164
Number of Groups:
plotcode ansatz %in% plotcode
 82  164
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] random effects in lme

2005-02-02 Thread Christoph Scherber

Dear all,
Suppose I have a linear mixed-effects model (from the package nlme) with 
nested random effects (see below); how would I present the results from 
the random effects part in a publication?

Specifically, I´d like to know:
(1) What is the total variance of the random effects at each level?
(2) How can I test the significance of the variance components?
(3) Is there something like an r squared for the whole model which I 
can state?

The data come from an experiment on plant performance with and without 
insecticide, with and without grasses present, and across different 
levels of plant diversity (div).

Thanks for your help!
Christoph.
lme(asin(sqrt(response)) ~ treatment + logb(div + 1, 2) + grass,
random =  ~ 1 | plotcode/treatment, na.action = na.exclude, method = ML)
Linear mixed-effects model fit by maximum likelihood
Data: NULL
  AIC  BIC  logLik
-290.4181 -268.719 152.209
Random effects:
Formula:  ~ 1 | plotcode
  (Intercept)
StdDev:  0.04176364
Formula:  ~ 1 | treatment %in% plotcode
 (Intercept)   Residual
StdDev:  0.08660458 0.00833387
Fixed effects: asin(sqrt(response)) ~ treatment + logb(div + 1, 2) + grass
  Value  Std.Error DF   t-value p-value
  (Intercept)  0.1858065 0.01858581 81  9.997225  .0001
treatment  0.0201384 0.00687832 81  2.927803  0.0044
logb(div + 1, 2) -0.0203301 0.00690074 79 -2.946073  0.0042
grass  0.0428934 0.01802506 79  2.379656  0.0197
Standardized Within-Group Residuals:
 Min  Q1 Med Q3   Max
-0.2033155 -0.05739679 -0.00943737 0.04045958 0.3637217
Number of Observations: 164
Number of Groups:
plotcode ansatz %in% plotcode
82  164
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] self-written function

2005-01-28 Thread Christoph Scherber

Hi!
I´m using the function for a data frame, but up to now it only works 
with single vectors, such as

backsin(variable,grouping.factor)
Actually, you´re right, the ... in the function definition is not 
needed...:-)

Regards
Christoph
Erin Hodgess wrote:
Hello Christoph!
I have a question about your question, please:
In your first line of code, 
backsin - function(x,y,...){

why do you have the three dots at the end?
Also, what kind of data sets are you applying this to, please?
Data frames or Matrixes?
Thanks,
Erin
mailto: [EMAIL PROTECTED]
 

Dear all,
I´ve got a simple self-written function to calculate the mean + s.e. 
from arcsine-transformed data:

backsin-function(x,y,...){
backtransf-list()
backtransf$back-((sin(x[x!=NA]))^2)*100
backtransf$mback-tapply(backtransf$back,y[x!=NA],mean)
backtransf$sdback-tapply(backtransf$back,y[x!=NA],stdev)/sqrt(length(y[x!=NA])) 

backtransf
}
I would like to apply this function to whole datasets, such as
tapply(variable,list(A,B,C,D),backsin)
Of course, this doesn´t work with the way in which the backsin() 
function is specified.

Does anyone have suggestions on how I could improve my function?
Regards,
Christoph
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] self-written function

2005-01-28 Thread Christoph Scherber

Hi!
OK, here are some more details on the function: My dataframe consists of 
several columns of categorical variables (lets call them A,B,C) plus a 
column with a response variable y (which is arcsine-square root 
transformed proportions)

I am now trying to write a function that automatically gives me the 
back-transformed mean values + standard errors for y for each column in 
the dataframe. Ideally, this would be something like

tapply(y,list(A,B,C,D),backtransformed.mean)
Here is the correct version of the function:
backsin-function(x,y){
backtransf-list()
back-((sin(x[x!=NA]))^2)*100
backtransf$mback-tapply(back,y[x!=NA],mean)
backtransf$sdback-tapply(back,y[x!=NA],std)/sqrt(length(y[x!=NA]))
backtransf
}
Regards,
Christoph
Petr Pikal wrote:
Hi
I am not sure if anybody gave you a reply, but do you think you gave 
enough detail about your data, what you expect, what you did, what 
was the response and how it differ from expected output and best of 
all ***working*** example?

BTW, what is stdev?
If you wanted to compute standard deviation sd is enough.
Cheers
Petr
On 27 Jan 2005 at 12:20, Christoph Scherber wrote:
 

Dear all,
Ive got a simple self-written function to calculate the mean + s.e.
from arcsine-transformed data:
backsin-function(x,y,...){
backtransf-list()
backtransf$back-((sin(x[x!=NA]))^2)*100
backtransf$mback-tapply(backtransf$back,y[x!=NA],mean)
backtransf$sdback-tapply(backtransf$back,y[x!=NA],std)/sqrt(length(y[x!=NA])) 


backtransf
   

}
I would like to apply this function to whole datasets, such as
tapply(variable,list(A,B,C,D),backsin)
Of course, this doesnt work with the way in which the backsin()
function is specified.
Does anyone have suggestions on how I could improve my function?
Regards,
Christoph
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
   

Petr Pikal
[EMAIL PROTECTED]

 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] self-written function

2005-01-28 Thread Christoph Scherber

Dear Jari,
I had included the test for NA´s just to be sure the function can handle 
datasets with missing values.

Here´s an updated version without this specification:
backsin-function(x,y){
backtransf-list()
back-((sin(x))^2)*100
backtransf$mback-tapply(back,y,mean)
backtransf$sdback-tapply(back,y,sd)/sqrt(length(y))
backtransf
}

Jari Oksanen wrote:
Christoph,
I guess your NA refers to Not Available (missing, NA). In that case,
you should check it with is.na() instead of != NA. See this:
 

x - c(3,4,NA,5)
x != NA
   

[1] TRUE TRUE   NA TRUE
 

!is.na(x)
   

[1]  TRUE  TRUE FALSE  TRUE
No idea if this helps, but it was a problem with the code anyway.
cheers, jari oksanen
On Fri, 2005-01-28 at 11:04 +0100, Christoph Scherber wrote:
 

Hi!
OK, here are some more details on the function: My dataframe consists of 
several columns of categorical variables (let´s call them A,B,C) plus a 
column with a response variable y (which is arcsine-square root 
transformed proportions)

I am now trying to write a function that automatically gives me the 
back-transformed mean values + standard errors for y for each column in 
the dataframe. Ideally, this would be something like

tapply(y,list(A,B,C,D),backtransformed.mean)
Here is the correct version of the function:
backsin-function(x,y){
backtransf-list()
back-((sin(x[x!=NA]))^2)*100
backtransf$mback-tapply(back,y[x!=NA],mean)
backtransf$sdback-tapply(back,y[x!=NA],std)/sqrt(length(y[x!=NA]))
backtransf
}
Regards,
Christoph
Petr Pikal wrote:
   

Hi
I am not sure if anybody gave you a reply, but do you think you gave 
enough detail about your data, what you expect, what you did, what 
was the response and how it differ from expected output and best of 
all ***working*** example?

BTW, what is stdev?
If you wanted to compute standard deviation sd is enough.
Cheers
Petr
On 27 Jan 2005 at 12:20, Christoph Scherber wrote:

 

Dear all,
I´ve got a simple self-written function to calculate the mean + s.e.
   

from arcsine-transformed data:
 

backsin-function(x,y,...){
backtransf-list()
backtransf$back-((sin(x[x!=NA]))^2)*100
backtransf$mback-tapply(backtransf$back,y[x!=NA],mean)
backtransf$sdback-tapply(backtransf$back,y[x!=NA],std)/sqrt(length(y[x!=NA])) 

   

backtransf
  

}
I would like to apply this function to whole datasets, such as
tapply(variable,list(A,B,C,D),backsin)
Of course, this doesn´t work with the way in which the backsin()
function is specified.
Does anyone have suggestions on how I could improve my function?
Regards,
Christoph
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
  

   

Petr Pikal
[EMAIL PROTECTED]


 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
   


 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] self-written function

2005-01-27 Thread Christoph Scherber

Dear all,
I´ve got a simple self-written function to calculate the mean + s.e. 
from arcsine-transformed data:

backsin-function(x,y,...){
backtransf-list()
backtransf$back-((sin(x[x!=NA]))^2)*100
backtransf$mback-tapply(backtransf$back,y[x!=NA],mean)
backtransf$sdback-tapply(backtransf$back,y[x!=NA],stdev)/sqrt(length(y[x!=NA]))
backtransf
}
I would like to apply this function to whole datasets, such as
tapply(variable,list(A,B,C,D),backsin)
Of course, this doesn´t work with the way in which the backsin() 
function is specified.

Does anyone have suggestions on how I could improve my function?
Regards,
Christoph
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] lme and varFunc()

2005-01-26 Thread Christoph Scherber

Dear all,
I am expecting a Poisson error distribution in my lme with 
weights=varFunc().

The weigths= varPower (form= fitted (.)) doesn´t work due to missing 
values in the response:

Problem in lme.formula(fixed = sqrt(nrmainaxes + 0...: Maximum number of iterations reached without convergence. 
Use traceback() to see the call stack

That´s why I´ve used one of my most important explanatory variables as a 
variance covariate:
weigths= varPower (form=~explanatory)
With that, it worked out properly so far.
What would your suggestion in such a case be?
Regards
Christoph

[EMAIL PROTECTED] wrote:
I am currently analyzing a dataset using lme(). The model I 
use has the following structure:

model-lme(response~Covariate+TreatmentA+TreatmentB,
  random=~1|Block/Plot,method=ML)
When I plot the residuals against the fitted values, I see a clear 
positive trend (meaning that the variance increases with the mean).

I tried to solve this issue using weights=varPower(), but it
doesn´t change the residual plot at all.
   

You are aware that you need to use something like 

weigths= varPower (form= fitted (.))
and the plot residuals using e.g.
scatter.smooth (fitted (model), resid (model, type= 'n'))
Maybe the latter should also be ok with the default pearson residuals, but I
am not sure.
Possibly a look into the following would help?
@Book{Pin:00a,
 author ={Pinheiro, Jose C and Bates, Douglas M},
 title = {Mixed-Effects Models in {S} and {S}-{P}{L}{U}{S}},
 publisher = {Springer},
 year =  {2000},
 address =   {New York}
}
 

How would you implement such a positive trend in the variance? I´ve 
tried glmmPQL (which works great with poisson errors), but 
using glmmPQL I can´t do model simplification.
   

Well, what error distribution do you have / do you expect?
Regards, Lorenz
- 
Lorenz Gygax, Dr. sc. nat.
Centre for proper housing of ruminants and pigs
Swiss Federal Veterinary Office

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] lme and varFunc()

2005-01-25 Thread Christoph Scherber

Dear all,
Regarding the lme with varFunc() question I posted a few days ago: I 
have used the following two approaches:

model1-lme(response~Covariate+Block+TreatmentA+TreatmentB,random=~1|Plot/Subplot,method=ML)
model2a-update(model1,weights=varPower(form=~ fitted(.)))
model2b-update(model1,weights=varPower(form=~block))
While model2a produces an error
Problem in .C(mixed_loglik,: subroutine mixed_loglik: Missing values in argument 1 
Use traceback() to see the call stack

Model 2b seems to work fine, now.
I´m not sure why model2a doesn´t work, but using an important explanatory 
variable (block) as a variance covariate seems to do a better job (although I 
don´t really understand why)
Does anyone have an explanation for this?
Regards,
Chris.


Andrew Robinson wrote:
Dear Christoph,
what command are you using to plot the residuals?  If you use the
default residuals it will not reflect the variance model.  If you use
the argument
type=p
then you get the Pearson residuals, which will reflect the weights
model.  Try something like this:
plot(model, resid(., type = p) ~ fitted(.), abline = 0)
I hope that this helps,
Andrew
On Mon, Jan 24, 2005 at 02:28:44PM +0100, Christoph Scherber wrote:
 

Dear R users,
I am currently analyzing a dataset using lme(). The model I use has the 
following structure:

model-lme(response~Covariate+TreatmentA+TreatmentB,random=~1|Block/Plot,method=ML)
When I plot the residuals against the fitted values, I see a clear 
positive trend (meaning that the variance increases with the mean).

I tried to solve this issue using weights=varPower(), but it doesn?t 
change the residual plot at all.

How would you implement such a positive trend in the variance? I?ve 
tried glmmPQL (which works great with poisson errors), but using glmmPQL 
I can?t do model simplification.

Many thanks for your help!
Regards
Chris.
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html
   

 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Box-Cox / data transformation question

2005-01-25 Thread Christoph Scherber

Dear R users,
Is it reasonable to transform data (measurements of plant height) to the 
power of 1/4? I´ve used boxcox(response~A*B) and lambda was close to 0.25.

Regards,
Christoph
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] comparing glmmPQL models

2005-01-24 Thread Christoph Scherber

Dear R users,

Is there a way to compare glmmPQL models differing in their fixed-effects
structure (similar to the ANOVA approach in lme) ?

Thank you very much for your help!
Chris.



This mail was sent through http://webmail.uni-jena.de

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] lme and varFunc()

2005-01-24 Thread Christoph Scherber

Dear R users,
I am currently analyzing a dataset using lme(). The model I use has the 
following structure:

model-lme(response~Covariate+TreatmentA+TreatmentB,random=~1|Block/Plot,method=ML)
When I plot the residuals against the fitted values, I see a clear 
positive trend (meaning that the variance increases with the mean).

I tried to solve this issue using weights=varPower(), but it doesn´t 
change the residual plot at all.

How would you implement such a positive trend in the variance? I´ve 
tried glmmPQL (which works great with poisson errors), but using glmmPQL 
I can´t do model simplification.

Many thanks for your help!
Regards
Chris.
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] mixed effects model:how to include initial conditions

2005-01-21 Thread Christoph Scherber

Dear R users,
I am analyzing a dataset on growth of plants in response to several 
factors. I am using a mixed-effects model of the following structure:

model-lme(growth~block*treatment*factor1*factor2,
random=~1|plot/treatment/initialsize)
I have measured the initial size of the plants (in 2003) and thought it 
might be sensible to include this (random) variation into the random 
effects term of the model.

Is that correct? Or should initialsize rather be included as a 
covariate into the fixed effects term, as in:

alternative-lme(growth~block*initialsize*treatment*factor1*factor2,
random=~1|plot/treatment)
I would very much appreciate any suggestions on how to analyze these 
data correctly.

Best regards
Chris.
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Naming Convention

2005-01-13 Thread Christoph Scherber

Dear Reinhold,

All entries are allowed except price swap or price_swap

Of course it´s most convenient to use short names and small letters for quicker
typing.

Regards
Christoph



Quoting Hafner, Reinhold (Risklab) [EMAIL PROTECTED]:

 I was wondering whether there exists a naming convention for row and column
 names in R data frames and matrices.
 E.g: PriceSwap or PRICESWAP or PRICE.SWAP

 Many thanks.
 Reinhold


   [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html






This mail was sent through http://webmail.uni-jena.de

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Calculate Mean of Column Vectors?

2005-01-11 Thread Christoph Scherber

Dear Thomas,
My suggestion would be as follows:
y - rnorm(3000)
dim(y) - c(3, 1000).
#then create three column vectors out of y using cbind():
w-cbind(y[1,],y[2,],y[3,])
# and calculate the row means
for (i in 1:1000) z[i]-mean(w[i,1:3])
z
Hope I got it right!
Regards,
Christoph

Thomas Hopper wrote:
Hello,
I've got an array defined as y - rnorm(3000), dim(y) - c(3, 1000).
I'd like to produce a 1000-element vector z that is the mean of the 
corresponding elements of y (like z[1,1] - mean(y[1,1], y[2,1], 
y[3,1])), but being new to R, I'm not sure how to do this for all 
elements at once (or, at least, simply). Any help is appreciated.

Thanks,
Tom
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] coercing columns

2005-01-07 Thread Christoph Scherber

Dear all,
I have a data frame that looks like this:
c1   c2   c3
ABC
BCA
AAB
and so on;
I´d like to produce one single vector consisting of the columns c1,c2, 
c3, such that

vector=(A,B,A,B,C,A,C,A,B)
I guess it´s easy to do but I don´t know how...Can anyone help me?
Thanks a lot!
Christoph
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] treatment contrasts and summary.lm

2004-12-02 Thread Christoph Scherber

Dear list members,
I have a 2-factor ANOVA where the summary.lm output looks like this 
(using treatment contrasts):

   Value Std. Error t value Pr(|t|)
 (Intercept)  0.0389  0.0220 1.7695  0.0817
as.factor(Block)1  0.0156  0.0066 2.3597  0.0215
as.factor(Block)2 -0.0018  0.0037-0.4857  0.6289
as.factor(Block)3 -0.0007  0.0026-0.2812  0.7795
  as.factor(AZ)1 -0.0066  0.0076-0.8670  0.3893
  as.factor(AZ)2  0.0064  0.0047 1.3530  0.1810
  as.factor(AZ)3 -0.0015  0.0031-0.4863  0.6284
  as.factor(AZ)4  0.0054  0.0025 2.1499  0.0355
  as.factor(AZ)5  0.0062  0.0037 1.6653  0.1009
Block has 4 levels and AZ has 6 levels. My question now is: What exactly 
do the values for AZ show? I know it´s somehow differences between 
intercepts, but it would be great if someone could tell me more on how 
exactly to interprete the output.

Thanks very much in advance!
Christoph
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Dominant factors in aov?

2004-12-02 Thread Christoph Scherber

Dear Rene,
First of all, note that A,B,C,D, and E need to be declared as factors in 
the beginning, using factor() (but I think you did this already). Also, 
make sure that the data are read into R in the correct way (i.e. . 
separating decimal places).

The reason for the singularities is that B, C and D are not 
independent (in fact, they´re identical in their factor levels, and 
hence in their effect on Y).

For this reason, only the effects of A, B and E can be estimated:
  Df Sum Sq Mean Sq F valuePr(F)   
A3 302286  100762  7.9887  0.002396 **
B1 422869  422869 33.5263 4.683e-05 ***
E3  222817427  0.5888  0.632334   
Residuals   14 176583   12613 

A has 4 levels so there should be 3 d.f. (that´s correct in the table)
B has 2 levels so there is only 1 d.f. (that´s also correct)
E has 4 levels so there should be 3 d.f. (also O.K.)
In total, there are [(n=22)-(3)-(1)-(3)] -1 = 14 residual d.f., so 
that´s OK, too.

Hope this helps,
Christoph

levels(A)
[1] 0250  500  1000
 levels(B)
[1] 1 2
 levels(E)
[1] 1 2 3 4


Rene Eschen wrote:
Hi all,
I'm using R 2.0.1. for Windows to analyze the influence of following factors
on response Y:
A (four levels)
B (three levels)
C (two levels)
D (29 levels) with
E (four replicates)
The dataset looks like this:
A   B   C   D   E   Y
0   1   1   1   1   491.9
0   1   1   1   2   618.7
0   1   1   1   3   448.2
0   1   1   1   4   632.9
250 1   1   1   1   92.4
250 1   1   1   2   117
250 1   1   1   3   35.5
250 1   1   1   4   102.7
500 1   1   1   1   47
500 1   1   1   2   57.4
500 1   1   1   3   6.5
500 1   1   1   4   50.9
10001   1   1   1   0.7
10001   1   1   2   6.2
10001   1   1   3   0.5
10001   1   1   4   1.1
0   2   2   2   1   6
0   2   2   2   2   4.2
0   2   2   2   3   20.3
0   2   2   2   4   3.5
250 2   2   2   1   8.4
250 2   2   2   2   2.8
etc.
If I ask the following: summary(aov(Y~A+B+C+D+E))
R gives me this answer:
  		 Df  Sum Sq Mean Sq  F value Pr(F)
A  		  3 135.602  45.201 310.2166 2e-16 ***
B  		  2   0.553   0.276   1.8976 0.1512
C  		  1   0.281   0.281   1.9264 0.1659
D  		 25  92.848   3.714  25.4890 2e-16 ***
E  		  3   0.231   0.077   0.5279 0.6634
Residuals   411  59.885   0.146   

Can someone explain me why factor C has only 25 Df (in stead of 28, what I
expected), and why this number changes when I leave out factors B or C (but
not A)? Why do factors B and C (but again: not A) not show up in the
calculation if they appear later in the formula than D?
When I ask summary.lm(aov(Y~A+B+C+D+E)), R tells me that three levels of D
were not defined because of singularities (what does this word mean?).
After checking and playing around with the dataset, I find no logical reason
for which levels are not defined. Even if I construct a perfect dataset
(balanced, no missing values) I never get the correct number of Df. 

My other datasets are analyzed as expected using the similar function calls
and similar datasets. Am I doing something wrong here?
Many thanks,
René Eschen.
___
drs. René Eschen
CABI Bioscience Switzerland Centre
1 Rue des Grillons
CH-2800 Delémont
Switzerland
+41 32 421 48 87 (Direct)
+41 32 421 48 70 (Secretary)
+41 32 421 48 71 (Fax)
http://www.unifr.ch/biol/ecology/muellerschaerer/group/eschen/eschen.html
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Dominant factors in aov?

2004-12-02 Thread Christoph Scherber

Dear Rene,
At least from the part of the data.frame attached to your mail, I 
assumed that C,D and E changed in identical ways (but maybe I got this 
wrong).

With your following combination of factors:
A (four levels)
B (three levels)
C (two levels)
D (29 levels) with
E (four replicates)
And assuming independence of the treatment levels, you should get
3 d.f. for A
2 d.f. for B
28 d.f. for D
3 d.f. for E
? residual d.f. (how big is total number of Y values?)
The problem arises if parts of treatments B,D and E are applied to the same 
subjects, e.g.
B   D   E   Y   
1   1   1   400
2   2   2   300
2   2   3   420
2   2   4   350
(etc)
then you immediately run into problems because treatments B and D (in this 
case) change in an identical way, i.e. the variances calculated for each level 
of B and D are the same; this is what causes the ´singularities´. Errors need 
to be independent, otherwise you will have order dependence in your analyses.
i.e. the output of your aov model will change depending on the sequence in 
which the terms A,B,C,D,E are entered.
Did I get this right? It would probably help to see the full dataset
Best wishes
Christoph



Rene Eschen wrote:
Dear Christoph, 

 

The reason for the singularities is that B, C and D are not 
independent (in fact, they´re identical in their factor levels, and 
hence in their effect on Y).
   

I do not understand this. You gave the correct levels for A, B and E, but I
do not see how they are identical. They have different levels and different
codings, or is it because A has the same number of levels as E, and E shares
some of the coding with B?
René Eschen.
---
For this reason, only the effects of A, B and E can be estimated:
  Df Sum Sq Mean Sq F valuePr(F)   
A3 302286  100762  7.9887  0.002396 **
B1 422869  422869 33.5263 4.683e-05 ***
E3  222817427  0.5888  0.632334   
Residuals   14 176583   12613 

A has 4 levels so there should be 3 d.f. (that´s correct in the table)
B has 2 levels so there is only 1 d.f. (that´s also correct)
E has 4 levels so there should be 3 d.f. (also O.K.)
In total, there are [(n=22)-(3)-(1)-(3)] -1 = 14 residual d.f., so 
that´s OK, too.

Hope this helps,
Christoph

levels(A)
[1] 0250  500  1000
 levels(B)
[1] 1 2
 levels(E)
[1] 1 2 3 4


Rene Eschen wrote:
 

Hi all,
I'm using R 2.0.1. for Windows to analyze the influence of following
   

factors
 

on response Y:
A (four levels)
B (three levels)
C (two levels)
D (29 levels) with
E (four replicates)
The dataset looks like this:
A   B   C   D   E   Y
0   1   1   1   1   491.9
0   1   1   1   2   618.7
0   1   1   1   3   448.2
0   1   1   1   4   632.9
250 1   1   1   1   92.4
250 1   1   1   2   117
250 1   1   1   3   35.5
250 1   1   1   4   102.7
500 1   1   1   1   47
500 1   1   1   2   57.4
500 1   1   1   3   6.5
500 1   1   1   4   50.9
10001   1   1   1   0.7
10001   1   1   2   6.2
10001   1   1   3   0.5
10001   1   1   4   1.1
0   2   2   2   1   6
0   2   2   2   2   4.2
0   2   2   2   3   20.3
0   2   2   2   4   3.5
250 2   2   2   1   8.4
250 2   2   2   2   2.8
etc.
If I ask the following: summary(aov(Y~A+B+C+D+E))
R gives me this answer:
 		 Df  Sum Sq Mean Sq  F value Pr(F)
A  		  3 135.602  45.201 310.2166 2e-16 ***
B  		  2   0.553   0.276   1.8976 0.1512
C  		  1   0.281   0.281   1.9264 0.1659
D  		 25  92.848   3.714  25.4890 2e-16 ***
E  		  3   0.231   0.077   0.5279 0.6634
Residuals   411  59.885   0.146   

Can someone explain me why factor C has only 25 Df (in stead of 28, what I
expected), and why this number changes when I leave out factors B or C (but
not A)? Why do factors B and C (but again: not A) not show up in the
calculation if they appear later in the formula than D?
When I ask summary.lm(aov(Y~A+B+C+D+E)), R tells me that three levels of D
were not defined because of singularities (what does this word mean?).
After checking and playing around with the dataset, I find no logical
   

reason
 

for which levels are not defined. Even if I construct a perfect dataset
(balanced, no missing values) I never get the correct number of Df. 

My other datasets are analyzed as expected using the similar function calls
and similar datasets. Am I doing something wrong here?
Many thanks,
René Eschen.
___
drs. René Eschen
CABI Bioscience Switzerland Centre
1 Rue des Grillons
CH-2800 Delémont
Switzerland
+41 32 421 48 87 (Direct)
+41 32 421

[R] integration problem

2004-10-26 Thread Christoph Scherber

Dear R users,
I have spectral data (say, wavelength vs. extinction coefficient) for 
which I´d like to calculate an integral (i.e. the area underneath the 
curve).

Suppose the (artificial) dataset is
lambda  E
1   2
2   4
3   5
4   8
5   1
6   5
7   4
8   9
9   8
10  2

How can I calculate an integral  for these values using R?
Many thanks for any help!
Regards
Christoph
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] error bars

2004-08-21 Thread Christoph Scherber

Dear all,
is there an easy way to create error bars for the following types of plots:
a) barplots
b) interaction plots
Many other statistics packages (e.g. Statistica) offer very nice 
interaction plots with error bars, and I´d like to be able to do the 
same in R.

Best regards
Christoph
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] analysis of life tables

2004-08-15 Thread Christoph Scherber

Dear all,
How can I analyze a life table (e.g. for a cohort of insects) in R?
I have 20 insects in 200 cages with two different treatments, whose 
survival is followed over time, such that, e.g., in one treatment, the 
number of animals surviving is c(20,18,16,12,10,8,4,0), while in the 
other treatment the survival is c(20,20,18,18,16,15,15,14) at 8 
subsequent time intervals.

I would very much appreciate any suggestions on how to analyze such a 
dataset.

Best regards
Christoph.
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] log-linear contrasts

2004-08-09 Thread Christoph Scherber

Dear list members,
suppose I have a one-way ANOVA where the explanatory variable is of 
class ordered(). How can I define, instead of the default linear, 
quadratic, cubic contrasts, a set of log-linear contrasts corresponding 
to logb(c(1,2,4,8,16,60)+1,2) ?

Regards,
Christoph
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] reading data

2004-05-05 Thread Christoph Scherber

Just one small remark:
why don´t you try (C:\\dados10.txt) ? It seems to me that the double 
\\ is important!

Cheers
Chris

Christoph Lange wrote:
(Reply to Margarida Júlia Rodrigues Igreja)
Hello!
 

When i print:
   

a-read.table(file=C:/dados10.txt)
 

The next error appears:
Error in file(file, r) : unable to open connection
In addition: Warning message:
cannot open file `C:/dados10.txt'
Can you help me?
   

Contrary to what Professor Ripley wrote, this file name is of course
totally valid under unixoid systems.
But 'C:' looks like a windows hard drive mounted under a Unix
directory named 'C:'. Usually this is done directly under '/', so just
try:
 a-read.table(file=/C:/dados10.txt)
-cl
 

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] lm(y~x) question: removing NA´s

2004-05-04 Thread Christoph Scherber

Dear all,
I have a data frame with different numbers of NA´s in each column, e.g.:
x   y  
1  2
NA  3
NA  4
4 NA
1 5
NA NA

I now want to do a linear regression on y~x with all the NA´s removed. 
The problem now is that is.na(x) (and is.na(y) obviously gives vectors 
with different lengths. How could I solve this problem?

Thank you very much for any help.
Best regards
Chris
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] more on lm(y~x) question: removing NA´s

2004-05-04 Thread Christoph Scherber

actually, the situation is much more complicated. I am producing 
multiple graphs within a for loop. For some strange reason, the 
plotting routine always stops once lm(y~x) encounters more than one 
missing value (I have marked the important bit with ***):

par(mfrow=c(5,5))
p_seq(3,122,2)
i_0
k_0
number_0
for (i in p) {
  j_foranalysis[93:174,i+1]
  k_foranalysis[93:174,i]  
  df_data.frame(j,k)
  mainlab1_substring(names(foranalysis[i]),2,8)
  mainlab2_; corr.:
  mainlab3_round(cor(j,k,na.method=available),4)
  mainlab4_; excl.Mono:
  mainlab5_round(cor(j[j0.9],k[j0.9],na.method=available),4)
  mainlab_paste(mainlab1,mainlab2,mainlab3,mainlab4,mainlab5)
  plot(k,j,main=mainlab,xlab=% of total biomass,ylab=% of total 
cover,pch=n)
  for (k in 1:length(foranalysis[93:174,i])) 
number[k]_substring(plotcode[foranalysis[k,1]],1,5)
  text(foranalysis[93:174,i],foranalysis[93:174,i+1],number)
**
  model_lm(j~k,na.action=na.exclude])
**
  abline(model)
  abline(0,1,lty=2)
   }

Does anyone have any suggestions on this?
Best regards
Chris.,

Liaw, Andy wrote:
By (`factory') default that's done for you automagically, because
options(na.action) is `na.omit'.
If you really want to do it `by hand', and have the data in a data frame,
you can use something like:
lm(y ~ x, df[complete.cases(df),])
HTH,
Andy
 

From: Christoph Scherber
Dear all,
I have a data frame with different numbers of NA´s in each 
column, e.g.:

x   y  
1  2
NA  3
NA  4
4 NA
1 5
NA NA

I now want to do a linear regression on y~x with all the NA´s 
removed. 
The problem now is that is.na(x) (and is.na(y) obviously 
gives vectors 
with different lengths. How could I solve this problem?

Thank you very much for any help.
Best regards
Chris
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

   


--
Notice:  This e-mail message, together with any attachments, contains information of Merck 
 Co., Inc. (One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its 
affiliates (which may be known outside the United States as Merck Frosst, Merck Sharp  
Dohme or MSD and in Japan as Banyu) that may be confidential, proprietary copyrighted and/or 
legally privileged. It is intended solely for the use of the individual or entity named on 
this message.  If you are not the intended recipient, and have received this message in error, 
please notify us immediately by reply e-mail and then delete it from your system.
--
 

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] RE: more on lm(y~x) question: removing NA´s

2004-05-04 Thread Christoph Scherber

it all works fine (the regression lines fit correctly to the data) as 
long as there are not both missing values in j and k.

What suggestions would you have for this? Or, more precisely, how would 
you create multiple graphs from subsequent columns of a data.frame?


Thomas Lumley wrote:
On Tue, 4 May 2004, Liaw, Andy wrote:
 

1. If your code actually runs, you should upgrade R, and quit using `_' for
assignment... 8-)
2. You seem to have an extraneous `]' after the na.exclude.  Could that be
the problem?
   

More seriously, the for() loop over k will mess up the value of k that you
want to use for lm.
-thomas
 

Andy
   

From: Christoph Scherber
actually, the situation is much more complicated. I am producing
multiple graphs within a for loop. For some strange reason, the
plotting routine always stops once lm(y~x) encounters more than one
missing value (I have marked the important bit with ***):
par(mfrow=c(5,5))
p_seq(3,122,2)
i_0
k_0
number_0
for (i in p) {
  j_foranalysis[93:174,i+1]
  k_foranalysis[93:174,i]
  df_data.frame(j,k)
  mainlab1_substring(names(foranalysis[i]),2,8)
  mainlab2_; corr.:
  mainlab3_round(cor(j,k,na.method=available),4)
  mainlab4_; excl.Mono:
  mainlab5_round(cor(j[j0.9],k[j0.9],na.method=available),4)
  mainlab_paste(mainlab1,mainlab2,mainlab3,mainlab4,mainlab5)
  plot(k,j,main=mainlab,xlab=% of total biomass,ylab=% of total
cover,pch=n)
  for (k in 1:length(foranalysis[93:174,i]))
number[k]_substring(plotcode[foranalysis[k,1]],1,5)
  text(foranalysis[93:174,i],foranalysis[93:174,i+1],number)
**
  model_lm(j~k,na.action=na.exclude])
**
  abline(model)
  abline(0,1,lty=2)
   }
Does anyone have any suggestions on this?
Best regards
Chris.,

Liaw, Andy wrote:
 

By (`factory') default that's done for you automagically, because
options(na.action) is `na.omit'.
If you really want to do it `by hand', and have the data in
   

a data frame,
 

you can use something like:
lm(y ~ x, df[complete.cases(df),])
HTH,
Andy

   

From: Christoph Scherber
Dear all,
I have a data frame with different numbers of NA´s in each
column, e.g.:
x   y
1  2
NA  3
NA  4
4 NA
1 5
NA NA
I now want to do a linear regression on y~x with all the NA´s
removed.
The problem now is that is.na(x) (and is.na(y) obviously
gives vectors
with different lengths. How could I solve this problem?
Thank you very much for any help.
Best regards
Chris
 

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
   

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle
 

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] RE: more on lm(y~x) question: removing NA´s

2004-05-04 Thread Christoph Scherber

Great!!! This works, many thanks!
**
using lsfit(x,y) instead of lm(y~x)  produces a perfectly correct output.
***

Thomas Lumley wrote:
On Tue, 4 May 2004, Christoph Scherber wrote:
 

it all works fine (the regression lines fit correctly to the data) as
long as there are not both missing values in j and k.
   

That's very strange.  The lines
for (k in 1:length(foranalysis[93:174,i]))
number[k]_substring(plotcode[foranalysis[k,1]],1,5)
should set result in k being the scalar value 81 after the loop is over.
In R (unlike S-PLUS), loop indices are just ordinary variables in the
environment where the loop is executed. I'd expect this code to work in
S-PLUS but not in R.
That loop is actually redundant, since substring() is vectorised:
number - substring(plotcode[foranalysis[93:174,1]],1,5)
should work just as well.
It's also strange that you create a data frame df from j and k but don't
use it in the lm() call (or AFAICS anywhere else).
 

What suggestions would you have for this? Or, more precisely, how would
you create multiple graphs from subsequent columns of a data.frame?
   

I'd probably use lsfit. The following is obviously not tested, since I
don't have the data (or even understand fully the data layout).
L - length(93:174)
for(i in p) {
X-foranalysis[93:174, i]
Y-foranalysis[93:174, i+1]
corr-cor(X,Y)
corrtrunc-cor(X[X0.9], Y[X0.9])
mainlab - paste(substring(names(foranalysis[i]), 2, 8),
; corr.:, corr,
;excl.Mono, corrtrunc))
   plot(X,Y,main=mainlab,
xlab=% of total biomass,ylab=% of total cover,pch=n)
number - substring(plotcode[foranalysis[1:L,1]], 1, 5)
text(X, Y, number)
model - lsfit(X,Y)
abline(model)
abline(0, 1, lty=2)
   }
-thomas
 

par(mfrow=c(5,5))
p_seq(3,122,2)
i_0
k_0
number_0
for (i in p) {
 j_foranalysis[93:174,i+1]
 k_foranalysis[93:174,i]
 df_data.frame(j,k)
 mainlab1_substring(names(foranalysis[i]),2,8)
 mainlab2_; corr.:
 mainlab3_round(cor(j,k,na.method=available),4)
 mainlab4_; excl.Mono:
 mainlab5_round(cor(j[j0.9],k[j0.9],na.method=available),4)
 mainlab_paste(mainlab1,mainlab2,mainlab3,mainlab4,mainlab5)
 plot(k,j,main=mainlab,xlab=% of total biomass,ylab=% of total
cover,pch=n)
 for (k in 1:length(foranalysis[93:174,i]))
number[k]_substring(plotcode[foranalysis[k,1]],1,5)
 text(foranalysis[93:174,i],foranalysis[93:174,i+1],number)
**
 model_lm(j~k,na.action=na.exclude])
**
 abline(model)
 abline(0,1,lty=2)
  }
Does anyone have any suggestions on this?
Best regards
Chris.,

Liaw, Andy wrote:

 

By (`factory') default that's done for you automagically, because
options(na.action) is `na.omit'.
If you really want to do it `by hand', and have the data in
   

a data frame,
 

you can use something like:
lm(y ~ x, df[complete.cases(df),])
HTH,
Andy


   

From: Christoph Scherber
Dear all,
I have a data frame with different numbers of NA´s in each
column, e.g.:
x   y
1  2
NA  3
NA  4
4 NA
1 5
NA NA
I now want to do a linear regression on y~x with all the NA´s
removed.
The problem now is that is.na(x) (and is.na(y) obviously
gives vectors
with different lengths. How could I solve this problem?
Thank you very much for any help.
Best regards
Chris

 

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

   

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle

 

   

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle
 

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] quantile regression

2004-03-12 Thread Christoph Scherber

Dear colleagues,

How can I do quantile regression with R?

Best regards
Chris.
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] quantile regression

2004-03-12 Thread Christoph Scherber

OK, Thank you all very much for the help!

Best regards
Chris.


Christoph Scherber wrote:

Dear colleagues,

How can I do quantile regression with R?

Best regards
Chris.
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] binomial errors in split-plot design

2004-03-04 Thread Christoph Scherber

Dear all,

I have proportion data with binomial errors. The problem is that the 
whole experiment was laid out as a split-plot design.

Ideally, what I would like is having a glm with an Error term such as 
glm(y~x+Error(A/B)) but I fear this is not possible. Would using lme be 
an alternative? How could I state the variance structure, then?

I would very much appreciate any suggestions!
Best regards
Christoph
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] binomial errors in split-plot design

2004-03-04 Thread Christoph Scherber

Thanks!

And how can I then plot  a Q-Q line for model checking? qqnorm works 
fine, but I couldn´t find how to use qqline for mixed effects models of 
this type

so far, I have tried (e.g.)

qqnorm(glm1,~resid(.)|TREATMENT)

but I don´t know how to specify qqline for this

the full model is 
glm1_glmmPQL(cbindarea~BLOCK+targetweight+TREATMENT+SOWNDIV+GRASS+LEG+SHERB+THERB,random=~1|PLOTCODE/TREATMENT,family=binomial)

where categorical variables are in capital letters

Best regards,
Chris.


Prof Brian Ripley wrote:

There are several possibilities, including glmmPQL (MASS) and GLMM (lme4).
Be careful with the interpretation, as you don't have the advantages of 
balance that the classical AoV gives you.

On Thu, 4 Mar 2004, Christoph Scherber wrote:

  

I have proportion data with binomial errors. The problem is that the 
whole experiment was laid out as a split-plot design.

Ideally, what I would like is having a glm with an Error term such as 
glm(y~x+Error(A/B)) but I fear this is not possible. Would using lme be 
an alternative? How could I state the variance structure, then?



  


[[alternative HTML version deleted]]

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] interactive graphic s

2004-03-03 Thread Christoph Scherber

Dear Ruben,

Yes, the iplots package works fine! You can download it from 
http://stats.math.uni-augsburg.de/iPlots/index.shtml

Good luck!
Chris.


[EMAIL PROTECTED] wrote:

Hi, i dont understand ¿Graphics in R are interactives or not?, I hear the
the package iplots can do it (zoom, scaling etc), is true that?, Thanks Ruben
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] zoom graphics

2004-03-03 Thread Christoph Scherber

Interactive Plots with zoom options etc. can be performed using the 
iplots library. It´s really very useful and can be downloaded from

http://stats.math.uni-augsburg.de/iPlots/index.shtml

Best regards
Chris
Barry Rowlingson wrote:

[EMAIL PROTECTED] wrote:

Hi, i don't understand how i cant zoom in and zoom out a graphics 
(plots)
exist a package for that? Thanks Ruben

 I dont think you can do it quite like you can zoom in and out in a 
program like 'photoshop'. All you can really do is redraw the plot 
with a different set of X and Y limits.

Example:

# some data
 xy=list(x=rnorm(100),y=rnorm(100))
# default plot shows all the data:
 plot(xy)
# define an interactive 'zoom' function:
zoom=function(){reg=locator(2);plot(reg,type='n')}
# click two points at the corners of the zoom region:
zoom()
# add the points - dont use 'plot' since this resets the axes:
points(xy)
 If you've got a complex plot with lots of things on it, then yes, you 
have to redraw it all again, but then you probably should have put 
that complex plot into a single function!

Barry

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] boxplot notches

2004-03-02 Thread Christoph Scherber

Dear colleagues,

I think it would be a good idea to include a short note in the R 
boxplot() help file, stating exactly how the confidence levels are 
calculated
(the notches are +/- 1.58 IQR/sqrt(n))  - at least as a guidance for 
users not advanced enough to directly interpret the code.

Would this be possible?

Regards,
Christoph.
David James wrote:

Prof Brian Ripley wrote:

On Mon, 1 Mar 2004, Martin Maechler wrote:

TL == Thomas Lumley [EMAIL PROTECTED]
on Mon, 1 Mar 2004 09:54:48 -0800 (PST) writes:

TL On Mon, 1 Mar 2004, Christoph Scherber wrote:
 Dear list members,

 Can anyone tell me how the notches in boxplot(Y~X,notch=T) are
 calculated? What do these notches represent exactly? I´d suppose they
 are Conficence Intervals for the median, but I´ve also been told they
 might show Least Significant Difference (LSD) equivalents.
TL The help page says that
TL  If the notches of two plots do not overlap then
TL the medians are significantly different at the 5 percent level.
TL The only thing wrong with this is that it isn't true.
TL The code says that the notches are +/- 1.58 IQR/sqrt(n),
TL so I think the claimed confidence level holds only for
TL normal distribuitons with small amounts of contamination.
I think John Tukey's idea was that this formula (or just the fact of
using median and quartiles) is still often approximately correct
for quite a few kinds of moderate contaminations...
It may be approximately correct for the width of a CI (and when I 
checked
it was only appproximately correct for a normal), but I would seriously
doubt if it were approximately correct for a significance level of 5%.
Remember how fast the tails of the asymptotic normal distribution 
decay: a
20% error turns 5% into 2%.

BTW, if there is a precise reference for this it would be good to add it
to boxplot.stats.Rd, as the confidence limits are unexplained there.


@article{McGi:Tuke:Lars:1978,
author = {McGill, Robert and Tukey, John W. and Larsen, Wayne A.},
title = {Variations of {B}ox plots},
year = {1978},
journal = {The American Statistician},
volume = {32},
pages = {12--16},
keywords = {Exploratory data analysis; Graphics}
}
@book{Cham:Clev:Klei:Tuke:1983,
author = {Chambers, John M. and Cleveland, William S. and Kleiner, Beat
and Tukey, Paul A.},
title = {Graphical methods for data analysis},
year = {1983},
pages = {395},
publisher = {Wadsworth Publishing Co Inc}
}
--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html


__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] boxplot notches

2004-03-02 Thread Christoph Scherber

In McGill et al. (1978) there´s a description of the calculation as 
follows (p. 16):

The widths [are] computed from the midspread or interquartile range (R) 
of the data (...), and the number of observations (N) for each group. 
The Gaussian-based asymptotic approximation (Kendall and Stuart 1967) of 
the standard deviation s of the median (M) is given by

s=1.25 R/1.35 sqrt(N)

and can be shown to be reasonably broadly applicable to other 
distributions (...)

The notch around each median can then be calculated as

M +- Cs,

where C is a constant. Should one desire a notch indicating 95 percent 
confidence interval about each median, C = 1.96 would be used (...)

It can be shown that C=1.96 would only be appropriate if the standard 
deviations of the two groups were vastly different (...) Thus, the 
notches were computed as

M+-1.7(1.25R/1.35 sqrt(N))

Hope this helps. Best regards
Chris.
REF:
McGill, R; Tukey, JW   Larsen, WA (1978) Variations of Box Plots. The 
American Statistician, Vol.32 No. 1, pp.12-16.
Kendall, MG  Stuart, A (1967): The Advanced Theory of Statistics, 
Vol.1, 2nd ed., Ch14., New York, Hafner Publishing Co.

*

Michael Friendly wrote:



I think John Tukey's idea was that this formula (or just the fact of

using median and quartiles) is still often approximately correct
for quite a few kinds of moderate contaminations...
  


It may be approximately correct for the width of a CI (and when I 
checked it was only appproximately correct for a normal), but I would 
seriously doubt if it were approximately correct for a significance 
level of 5%.
Remember how fast the tails of the asymptotic normal distribution 
decay: a 20% error turns 5% into 2%.

BTW, if there is a precise reference for this it would be good to add it
to boxplot.stats.Rd, as the confidence limits are unexplained there.
 

The factor 1.58 for H-spr/\sqrt{n} comes from the product of three 
approximations going from a 95%
confidence interval for a difference in means, to one for a difference 
in medians, using the H-spr=IQR
instead of the standard deviation:

   H-spr/1.349  \approx \sigma in a N(0,1) dist/n
   \sqrt{ \pi / 2} \approx std error of a median
  1.7 / sqrt{n}  is the average of 1.96 and 1.39=1.96/\sqrt{2}, 
factors for the standard error of the difference
between two means, in the cases where one variance is tiny, 
and where both are equal.

I believe this is explained in

@Article{McGill-etal:78,
 author =   R. McGill and J. W. Tukey and W. Larsen,
 year = 1978,
 title =Variations of Box Plots,
 journal =  TAS,
 volume =   32,
 pages =12--16,
}
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] boxplot notches

2004-03-01 Thread Christoph Scherber

Dear list members,

Can anyone tell me how the notches in boxplot(Y~X,notch=T)  are 
calculated? What do these notches represent exactly? I´d suppose they 
are Conficence Intervals for the median, but I´ve also been told they 
might show Least Significant Difference (LSD) equivalents.

I would very much appreciate any help from you.

Best regards
Chris.
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

65 matches

Mail list logo