[R] plotting dnorm() curves derived from mclust models

2007-03-23 Frederic Jean
Dear all

I have a problem overlaying lines() of the normal distributions
identified by Mclust on a histogram or on an mclust1Dplot. Here is some
sample code to illustrate:

  library(mclust)
  set.seed(22)
  foo <- c(rnorm(400, 10, 2), rnorm(500, 17, 4))
  mcl <- Mclust(foo, G = 2)
  mcl.sd <- sqrt(mcl$parameters$variance$sigmasq)
  mcl.size <- c(length(mcl$classification[mcl$classification == 2]),
                length(mcl$classification[mcl$classification == 1]))
  x <- pretty(c(0:44), 100)

  # My plot of the histogram and lines of the normal distributions
  # SEEMS OK (or am I wrong?) using frequencies:
  histA <- hist(foo, breaks = c(0:44), ylim = c(0, 100))
  lines(x, dnorm(x, mcl$parameters$mean[1], mcl.sd[1]) * mcl.size[1],
        col = 2, lwd = 2)
  lines(x, dnorm(x, mcl$parameters$mean[2], mcl.sd[2]) * mcl.size[2],
        col = 2, lwd = 2)
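Why multiplying by the cluster size works on a count histogram: the expected number of observations from component k in a bin [a, b) is n_k * (pnorm(b) - pnorm(a)), which for narrow bins is close to n_k * binwidth * dnorm(midpoint). With breaks = c(0:44) every bin has width 1, so the binwidth factor drops out. A minimal base-R check, assuming the simulation's own parameters for the first component (400 points, mean 10, sd 2) as stand-ins for the Mclust estimates; the variable names are hypothetical.

```r
# Stand-in parameters (assumption): the true values used to simulate the
# first component, in place of the Mclust estimates.
n_k  <- 400
mu_k <- 10
sd_k <- 2

breaks <- 0:44
width  <- diff(breaks)                  # every bin has width 1
mids   <- head(breaks, -1) + width / 2  # bin midpoints

# Exact expected count per bin vs. the dnorm()-based approximation
expected_exact  <- n_k * diff(pnorm(breaks, mu_k, sd_k))
expected_approx <- n_k * width * dnorm(mids, mu_k, sd_k)

# Largest discrepancy, relative to the tallest expected bar
max(abs(expected_exact - expected_approx)) / max(expected_exact)
# Total expected count over all bins, close to n_k
sum(expected_exact)
```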

  # My plot of the histogram and lines of the normal distributions
  # IS wrong when using prob:
  mclust1Dplot(foo, parameters = mcl$parameters, z = mcl$z, what = "density")
  histA <- hist(foo, breaks = c(0:44), prob = TRUE, add = TRUE)
  lines(x, dnorm(x, mcl$parameters$mean[2], mcl.sd[2]), col = 2, lwd = 2)
  lines(x, dnorm(x, mcl$parameters$mean[1], mcl.sd[1]), col = 2, lwd = 2)

In the second plot, the bell-shaped curves are obviously too high, and it
seems that I am missing something obvious in scaling the dnorm() curves
when building the second plot: I tried different things, such as scaling
dnorm() by the proportion of individuals belonging to clusters 1 and 2
respectively, but with no success.
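On a histogram drawn with prob = TRUE the bars have total area 1, while each unscaled dnorm() curve also has area 1, so two unscaled curves together carry twice the histogram's area. A minimal sketch of the density-scale version, assuming the simulation's parameters (means 10 and 17, sds 2 and 4, proportions 400/900 and 500/900) in place of the fitted values: each component is weighted by its mixing proportion, which Mclust reports in mcl$parameters$pro. The helper comp_density() is hypothetical.

```r
# Stand-in parameters (assumption): the values used to simulate 'foo',
# in place of mcl$parameters$mean, the component sds and mcl$parameters$pro.
mu  <- c(10, 17)
sds <- c(2, 4)
n   <- 900
pro <- c(400, 500) / n   # mixing proportions, summing to 1

# Hypothetical helper: component k on the density scale used by prob = TRUE
comp_density <- function(x, k) pro[k] * dnorm(x, mu[k], sds[k])

# Area under each weighted component, integrating over mean +/- 10 sd
area <- sapply(1:2, function(k)
  integrate(comp_density, mu[k] - 10 * sds[k], mu[k] + 10 * sds[k], k = k)$value)
area        # approximately pro
sum(area)   # approximately 1, the total area of a prob = TRUE histogram

# With unit-width bins, multiplying by n gives back the count-scale curve
x <- pretty(c(0:44), 100)
all.equal(n * comp_density(x, 1), 400 * dnorm(x, mu[1], sds[1]))

# Plotting sketch (not run here):
# lines(x, comp_density(x, 1), col = 2, lwd = 2)
# lines(x, comp_density(x, 2), col = 2, lwd = 2)
```

The last check ties the two plots together: with bins of width 1, the count-scale curve is n times the density-scale curve, and n * pro_k is the cluster size.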

Could someone help me point out my errors?
Many thanks in advance

Fred J.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] significant ANOVA but no distinct groups?

2007-03-02 Frederic Jean
Dear all,

I am studying a dataset using the aov() function.

The independent variable 'cds' is a factor() with 8 levels, and here is
the result of analysing the dependent variable 'rta' with aov():

> summary(aov(rta ~ cds))
            Df  Sum Sq Mean Sq F value  Pr(>F)
cds          7 0.34713 0.04959  2.3807 0.02777
Residuals   92 1.91635 0.02083
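For reference, the F value and p-value in this table follow from the sums of squares and degrees of freedom alone: each mean square is Sum Sq / Df, F is their ratio, and the p-value is the upper tail of an F(7, 92) distribution. A small sketch using the rounded values printed above, so the last digit may differ slightly:

```r
# Values copied from the summary(aov(rta ~ cds)) table above
df_cds <- 7;  ss_cds <- 0.34713
df_res <- 92; ss_res <- 1.91635

ms_cds <- ss_cds / df_cds   # mean square for cds
ms_res <- ss_res / df_res   # residual mean square
F_val  <- ms_cds / ms_res   # F statistic
p_val  <- pf(F_val, df_cds, df_res, lower.tail = FALSE)  # upper-tail p-value

round(c(ms_cds = ms_cds, ms_res = ms_res, F = F_val), 5)
p_val
```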

The dependent variable 'rta' is normally distributed and the variances are
homogeneous.
But when I examine the result with TukeyHSD(), no differences in 'rta'
are found among the groups of 'cds':

> TukeyHSD(aov(rta ~ cds), which = "cds")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = rta ~ cds)

$cds
             diff        lwr        upr     p adj
1-0 -0.1046092796 -0.4331100 0.22389141 0.9751178
2-0  0.0359991860 -0.1371359 0.20913425 0.9980970
3-0  0.0261665235 -0.1348524 0.18718540 0.9996165
4-0  0.0004502442 -0.1805448 0.18144531 1.000
5-0 -0.1438949939 -0.3104752 0.02268526 0.1422670
[...]
7-5  0.0621598639 -0.1027595 0.22707926 0.9386170
7-6  0.0256519274 -0.1757408 0.22704465 0.248

I tried a pairwise.t.test() (Holm correction), which was also unable to
detect differences in 'rta' among the groups of 'cds'.
I've never been confronted with such a situation before: is it just a
problem of power of the /a posteriori/ tests used? Am I missing
something important in basic stats or in R?
How can I highlight the differences among the 'cds' groups detected by aov()?
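A contrived illustration of one way this can happen, using hypothetical, deterministic data rather than the 'cds' data: 8 groups of 12 observations whose means rise in small equal steps (d = 0.184, an arbitrary choice) around within-group deviations of exactly +1 and -1. The overall F test pools the whole spread of means, while Tukey's HSD judges each pair against a wider family-wise critical value, so with these numbers the F test should come out below 0.05 while no adjusted pairwise p-value does. A single one-degree-of-freedom trend contrast, fitted here with lm() on the level codes, picks the pattern up clearly. This is only a sketch of a possible mechanism, not a diagnosis of the 'cds' data.

```r
# Hypothetical, deterministic data (not the 'cds' data):
# 8 groups of 12 observations, group means rising by d per level,
# within-group deviations of exactly +1 and -1.
k <- 8
n <- 12
d <- 0.184
g <- factor(rep(0:(k - 1), each = n))
dev <- rep(c(rep(1, n / 2), rep(-1, n / 2)), times = k)
# as.numeric() on a factor returns the level codes 1..8
y <- (as.numeric(g) - 1) * d + dev

fit   <- aov(y ~ g)
F_val <- summary(fit)[[1]][["F value"]][1]
p_F   <- summary(fit)[[1]][["Pr(>F)"]][1]

# Tukey HSD: adjusted p-values for all 28 pairwise comparisons
p_pairs <- TukeyHSD(fit, which = "g")$g[, "p adj"]

# One-df linear trend across the ordered levels
p_trend <- summary(lm(y ~ as.numeric(g)))$coefficients[2, "Pr(>|t|)"]

c(F = F_val, p_F = p_F, min_p_pairs = min(p_pairs), p_trend = p_trend)
```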

Any help appreciated
Thanks in advance,

Fred J.
