Re: [R-sig-eco] Regression with few observations per factor level

2014-10-23 Thread Jari Oksanen

On 23/10/2014, at 18:17 PM, Gavin Simpson wrote:

 On 22 October 2014 17:24, Chris Howden ch...@trickysolutions.com.au wrote:
 
 A good place to start is by looking at your residuals to determine
 whether the normality assumptions are being met; if not, then some form
 of GLM that correctly models the residuals, or a non-parametric method,
 should be used.
 
 
 Doing that could be very tricky indeed; I defy anyone, without knowledge of
 how the data were generated, to detect departures from normality in such a
 small data set. Try qqnorm(rnorm(4)) a few times and you'll see what I mean.
 
 Second, one usually considers the distribution of the response when fitting
 a GLM, rather than deciding that residuals from an LM are non-Gaussian and
 then moving on. The decision to use a GLM should be motivated directly by
 the data and the question at hand. Perhaps sometimes we can get away with
 fitting the LM, but that usually involves some thought, in which case one
 has probably already thought about the GLM as well.
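Gavin's qqnorm(rnorm(4)) experiment is easy to reproduce; a minimal sketch (the 3x3 layout and the number of repeats are arbitrary choices, not from the original post):

```r
## Draw n = 4 points from a standard normal nine times and Q-Q plot each:
## the panels vary wildly even though normality holds by construction,
## which is the point about judging normality from tiny samples.
op <- par(mfrow = c(3, 3))
for (i in 1:9) qqnorm(rnorm(4), main = paste("Normal sample", i))
par(op)  # restore the previous plotting layout
```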

I agree completely with Gavin. If you have four data points and fit a 
two-parameter linear model, and in addition select a one-parameter exponential 
family distribution (as implied in selecting a GLM family), you don't have many 
degrees of freedom left. I don't think you would get such models accepted in 
many journals. Forget the regression and get more data. Some people suggested 
here that an acceptable model could be possible if your data points are not 
single observations but means of several observations. That is true: then you 
can proceed, but consult a statistician about how to proceed.

Cheers, Jari Oksanen

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Regression with few observations per factor level

2014-10-23 Thread Gavin Simpson
I think there are actually 4 data points per level of some factor (after
seeing some of the other non-threaded emails - why can't people use email
clients that preserve threads?**); but yes, either way this is a small data
set, and trying to decide whether residuals are normal or not is going to be
nigh on impossible.

I like the suggestion that someone made to actually do some simulation to
work out whether you have any power to detect an effect of a given size; it
seems pointless doing the analysis if your conclusion would be "well, I
didn't detect an effect, but I have no power, so I don't even know whether I
should have been able to detect an effect if one were present". You'd be no
worse off than if you hadn't run the analysis or collected the data.
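That simulation-based power check can be sketched in a few lines; the effect size, error SD, and group structure below are illustrative assumptions, not values from the poster's data:

```r
## Estimate power for detecting a treatment effect with 4 groups of 4
## observations and one covariate, by simulating under an assumed
## effect size `delta` and error SD `sd`.
power_sim <- function(nsim = 500, pg = 4, delta = 2, sd = 1) {
  pvals <- replicate(nsim, {
    trt <- factor(rep(paste0("trt", 1:4), each = pg))
    cov <- runif(4 * pg)
    y   <- delta * (as.numeric(trt) - 1) + rnorm(4 * pg, sd = sd)
    anova(lm(y ~ trt + cov))["trt", "Pr(>F)"]
  })
  mean(pvals < 0.05)  # proportion of significant results = estimated power
}
set.seed(42)
power_sim()
```

Rerunning with a smaller `delta` (or larger `sd`) shows how quickly power collapses with only 4 observations per group.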

G

** He says, hoping to heck that GMail preserves the threading information...

On 23 October 2014 14:00, Jari Oksanen jari.oksa...@oulu.fi wrote:






-- 
Gavin Simpson, PhD



Re: [R-sig-eco] Regression with few observations per factor level

2014-10-22 Thread Lars Westerberg
Why not take the opportunity to get to know approximate Bayesian computation 
(ABC) some more? Rasmus Bååth wrote a piece on "Tiny Data" and ABC which might 
suit your problem very well.

http://www.r-bloggers.com/tiny-data-approximate-bayesian-computation-and-the-socks-of-karl-broman/
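For a flavour of what ABC rejection sampling looks like in R, here is a toy sketch; the data values, flat prior, known SD, and crude tolerance are all made-up assumptions (see the linked post for a real worked example):

```r
## ABC rejection: estimate the mean of a normal (known sd = 1) from a
## tiny data set, keeping prior draws whose simulated sample mean lands
## close to the observed one.
set.seed(1)
obs <- c(2.1, 3.4, 2.8, 3.9)               # four made-up observations
prior_draws <- runif(50000, 0, 10)          # flat prior on the mean
sim_means <- sapply(prior_draws,
                    function(mu) mean(rnorm(length(obs), mu, 1)))
posterior <- prior_draws[abs(sim_means - mean(obs)) < 0.1]
c(mean = mean(posterior), sd = sd(posterior))
```

The accepted draws approximate the posterior for the mean; with only four observations the posterior stays wide, which is rather the point.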

Cheers
/Lars

On 2014-10-22 08:19, V. Coudrain wrote:

 With such a small data set, why not simulate some data sets with
 reasonable effect sizes and see how an analysis performs? Krzysztof

Dear Krzysztof,
It is a good idea. Would you know some R functions that are well suited for
this kind of simulation?





Re: [R-sig-eco] Regression with few observations per factor level

2014-10-22 Thread Chris Howden
A good place to start is by looking at your residuals to determine whether
the normality assumptions are being met; if not, then some form of GLM
that correctly models the residuals, or a non-parametric method, should
be used.

But just as important is considering how you intend to use your data and
exactly what they are. Regardless of what the statistics say, if you only
have 4 data points are you really confident in making broad generalisations
from them? And writing a paper with your name on it? Just a couple of points
could change everything, particularly if the scale isn't bounded, so
outliers can have a big impact. If the data points are some form of average
I would be more confident with only 4 of them, but if they are raw values I
would be very cautious about any conclusions you draw.

Another reason I would be cautious of a result based on only 4 data points
is that the p-values may be very poorly estimated. Although not widely
discussed, we often use the central limit theorem (CLT) to assume that the
parameter estimates are normally distributed when calculating the p-value.
(Because parameters can be thought of as weighted averages, the CLT applies
to them.) With only 4 data points we can't invoke the magic of the CLT, and
since there is no way to test whether the parameter estimates are normal,
we take quite a risk in assuming we have accurate p-values at such small
sample sizes.
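That calibration worry can itself be checked by simulation; here is a sketch under one assumed skewed error distribution (exponential) and no treatment effect, so any departure of the rejection rate from 0.05 reflects miscalibration:

```r
## Simulate the realized type-I error rate of the usual t-based p-value
## for a two-group comparison with 4 skewed observations per group.
set.seed(1)
pvals <- replicate(2000, {
  trt <- factor(rep(c("a", "b"), each = 4))
  y   <- rexp(8)   # same skewed distribution in both groups (true null)
  summary(lm(y ~ trt))$coefficients["trtb", "Pr(>|t|)"]
})
mean(pvals < 0.05)   # compare with the nominal 0.05
```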

Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP Commercialisation and
Innovation, Data Analysis, Modelling and Training

(mobile) 0410 689 945
(fax / office)
ch...@trickysolutions.com.au

Disclaimer: The information in this email and any attachments to it are
confidential and may contain legally privileged information. If you are not
the named or intended recipient, please delete this communication and
contact us immediately. Please note you are not authorised to copy,
use or disclose this communication or any attachments without our
consent. Although this email has been checked by anti-virus software,
there is a risk that email messages may be corrupted or infected by
viruses or other
interferences. No responsibility is accepted for such interference. Unless
expressly stated, the views of the writer are not those of the
company. Tricky Solutions always does our best to provide accurate
forecasts and analyses based on the data supplied, however it is
possible that some important predictors were not included in the data
sent to us. Information provided by us should not be solely relied
upon when making decisions and clients should use their own judgement.

On 22 Oct 2014, at 17:20, V. Coudrain v_coudr...@voila.fr wrote:

 With such a small data set, why not simulate some data sets with
 reasonable effect sizes and see how an analysis performs? Krzysztof

 Dear Krzysztof,
 It is a good idea. Would you know some R functions that are well suited
 for this kind of simulation?



Re: [R-sig-eco] Regression with few observations per factor level

2014-10-22 Thread Nicholas Hamilton
Dear All,

Please do not take any offence; I would really like to be removed from this
mailing list. Could someone let me know how this can be done?

Best Regards,

--
Nicholas Hamilton
School of Materials Science and Engineering
University of New South Wales (Australia)
--
www.ggtern.com



Re: [R-sig-eco] Regression with few observations per factor level

2014-10-21 Thread Krzysztof Sakrejda
With such a small data set, why not simulate some data sets with
reasonable effect sizes and see how an analysis performs?  Krzysztof

On Mon, Oct 20, 2014 at 11:53 AM, V. Coudrain v_coudr...@voila.fr wrote:
 Thank you for this helpful thought. So if I get it correctly, it is hopeless
 to try testing an interaction, but we nevertheless may assess whether a
 covariate has an impact, provided it is the same in all treatments.






Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread V. Coudrain
Thank you very much. If I get it right, the CIs get wider, my test has less
power, and the probability of getting a significant relation decreases. What
about the significant coefficients: are they reliable?




 Message du 20/10/14 à 11h30
 De : Roman Luštrik 
 A : V. Coudrain 
 Copie à : r-sig-ecology@r-project.org 
 Objet : Re: [R-sig-eco] Regression with few observations per factor level
 
 I think you can, but the confidence intervals will be rather large due to
 the small number of samples.
 Notice how the standard errors change as the sample size (per group) goes
 from 4 to 30.
 > pg <- 4  # pg = per group
 > my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
 +                             rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
 +                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
 +                     cov = runif(pg*4))  # 4 groups
 > summary(lm(var ~ trt + cov, data = my.df))

 Call:
 lm(formula = var ~ trt + cov, data = my.df)

 Residuals:
      Min       1Q   Median       3Q      Max
 -1.63861 -0.46080  0.03332  0.66380  1.27974

 Coefficients:
             Estimate Std. Error t value Pr(>|t|)
 (Intercept)   1.2345     1.0218   1.208    0.252
 trttrt2      -0.7759     0.8667  -0.895    0.390
 trttrt3       7.8503     0.8308   9.449  1.3e-06 ***
 trttrt4      28.2685     0.9050  31.236  4.3e-12 ***
 cov           1.4027     1.1639   1.205    0.253
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Residual standard error: 1.154 on 11 degrees of freedom
 Multiple R-squared:  0.9932, Adjusted R-squared:  0.9908
 F-statistic: 404.4 on 4 and 11 DF,  p-value: 7.467e-12

 > pg <- 30  # pg = per group
 > my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
 +                             rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
 +                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
 +                     cov = runif(pg*4))  # 4 groups
 > summary(lm(var ~ trt + cov, data = my.df))

 Call:
 lm(formula = var ~ trt + cov, data = my.df)

 Residuals:
     Min      1Q  Median      3Q     Max
 -2.5778 -0.6584 -0.0185  0.6423  3.2077

 Coefficients:
             Estimate Std. Error t value Pr(>|t|)
 (Intercept)  2.76961    0.25232  10.977  < 2e-16 ***
 trttrt2     -1.75490    0.28546  -6.148 1.17e-08 ***
 trttrt3      8.40521    0.28251  29.752  < 2e-16 ***
 trttrt4     27.04095    0.28286  95.599  < 2e-16 ***
 cov          0.05129    0.32523   0.158    0.875
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Residual standard error: 1.094 on 115 degrees of freedom
 Multiple R-squared:  0.9913, Adjusted R-squared:  0.991
 F-statistic:  3269 on 4 and 115 DF,  p-value: < 2.2e-16
 On Mon, Oct 20, 2014 at 10:53 AM, V. Coudrain  wrote:
 Hi, I would like to test the impact of a treatment on some variable using
 regression (e.g. lm(var ~ trt + cov)). However, I only have four
 observations per factor level. Is it still possible to apply a regression
 with such a small sample size? I think that it could be difficult to
 correctly estimate the variance. Do you think that I rather should compute
 a non-parametric test such as Kruskal-Wallis? However, I need to include
 covariables in my models and I am not sure if basic non-parametric tests
 are suitable for this. Thanks for any suggestion.
 
 

 -- 
 In God we trust, all others bring data. 



Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread Martin Weiser
Hi,

coefficients and their p-values are reliable if your data are OK and you
know enough about the process that generated them, so that you can choose
an appropriate model. With 4 points per line, it may be really difficult to
identify bad fit or outliers.

For example: simple linear regression needs constant variance of the
normal distribution from which the residuals are drawn - along the
regression line - to work properly. With 4 points you can hardly
estimate this, but if you know enough about the process that generated
the data, you are safe. If you do not, it is not easy to say
anything about the nature of the process that generated the data.

If you know (or can assume) that there is a simple linear relationship,
you can say "the slope of this relationship is such and such", but if you
want to estimate both the nature of the relationship (A *linearly*
depends on B) and its magnitude (the slope of this relationship),
p-values would not help you much.

Of course, I may be wrong - I am not a statistician, just a user.

Best,
Martin W. 


V. Coudrain wrote on Mon 20. 10. 2014 at 13:37 +0200:
 Thank you very much. If I get it right, the CIs get wider, my test has less
 power, and the probability of getting a significant relation decreases. What
 about the significant coefficients: are they reliable?
 
 
 
 

Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread stephen sefick
You are more or less performing an ANOVA/ANCOVA on your data. As pointed
out earlier, all of the normal-theory regression assumptions apply.
Assuming all of those things are satisfied, then if you have large
confidence intervals and there are significant differences between groups I
don't see why you couldn't correctly infer something about the treatments.
Maybe I am missing something.

Stephen

On Mon, Oct 20, 2014 at 8:43 AM, Martin Weiser weis...@natur.cuni.cz
wrote:


Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread V. Coudrain
Thank you for this helpful thought. So if I get it correctly, it is hopeless
to try testing an interaction, but we nevertheless may assess whether a
covariate has an impact, provided it is the same in all treatments.




 Message du 20/10/14 à 16h46
 De : Elgin Perry 
 A : v_coudr...@voila.fr
 Copie à : 
 Objet : Regression with few observations per factor level
 
 If it is reasonable to assume that the slope of the covariate is the same
 for all treatments, and you have numerous treatments, then you can do this
 by specifying one slope parameter for all treatments, as you gave in your
 example (e.g. lm(var ~ trt + cov)). By combining slope information over
 treatments, you can obtain a reasonably precise estimate. With so few
 observations per treatment, you will not be able to estimate separate
 slopes for each treatment with any degree of precision
 (e.g. lm(var ~ trt + trt:cov)).


Elgin S. Perry, Ph.D.
Statistics Consultant
377 Resolutions Rd.
Colonial Beach, Va.  22443
ph. 410.610.1473
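Elgin's contrast between a common slope and treatment-specific slopes can be made concrete with simulated data (all numbers below are illustrative assumptions, not from the original study):

```r
## With 4 treatments x 4 observations, a common-slope model keeps
## 16 - 5 = 11 residual df, while separate slopes leave only 16 - 8 = 8,
## so each per-treatment slope rests on very little information.
set.seed(1)
pg  <- 4
trt <- factor(rep(paste0("trt", 1:4), each = pg))
cov <- runif(4 * pg)
var <- 2 * as.numeric(trt) + 1.5 * cov + rnorm(4 * pg)
common   <- lm(var ~ trt + cov)      # one pooled slope for cov
separate <- lm(var ~ trt + trt:cov)  # one slope per treatment
c(common = df.residual(common), separate = df.residual(separate))
```

Comparing the standard errors of the slope terms in the two summaries shows how much precision the pooled slope buys.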


Date: Mon, 20 Oct 2014 10:53:41 +0200 (CEST)
From: V. Coudrain  v_coudr...@voila.fr 
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] Regression with few observations per factor level
Message-ID:  2127199056.738451413795221981.JavaMail.www@wwinf7128 
Content-Type: text/plain; charset=UTF-8


Hi, I would like to test the impact of a treatment on some variable using
regression (e.g. lm(var ~ trt + cov)). However, I only have four observations
per factor level. Is it still possible to apply a regression with such a small
sample size? I think that it could be difficult to correctly estimate the
variance. Do you think that I rather should compute a non-parametric test such
as Kruskal-Wallis? However, I need to include covariables in my models and I
am not sure if basic non-parametric tests are suitable for this. Thanks for
any suggestion.


Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread V. Coudrain
Yes, but as I fear, the residuals behave badly as soon as the model gets a
little bit more complex (e.g., with two covariables or an interaction). The
scope for performing an ANCOVA is thus very limited. That's why I was thinking
about a potential non-parametric model. But I do not want to artificially make
my data say something if they cannot.




 Message du 20/10/14 à 16h50
 De : stephen sefick 
 A : Martin Weiser 
 Copie à : V. Coudrain , r-sig-ecology 
 Objet : Re: [R-sig-eco] Regression with few observations per factor level
 
 You are more or less performing an ANOVA/ANCOVA on your data. As pointed out
 earlier, all of the normal-theory regression assumptions apply. Assuming all
 of those things are satisfied, then if you have large confidence intervals
 and there are significant differences between groups I don't see why you
 couldn't correctly infer something about the treatments. Maybe I am missing
 something.
 Stephen 
 On Mon, Oct 20, 2014 at 8:43 AM, Martin Weiser  wrote:
 Hi,
 
 coefficients and their p-values are reliable if your data are OK and you
 know enough about the process that generated them, so you can choose an
 appropriate model. With 4 points per line, it may be really difficult to
 identify bad fit or outliers.
 
 For example: simple linear regression needs constant variance of the
 normal distribution from which residuals are drawn -  along the
 regression line - to work properly.  With 4 points, you can hardly
 estimate this, but if you know enough about the process that generated
 the data, you are safe. If you do not know, it is not easy to say
 anything about the nature of the process that generated the data.
 
 If you know (or can assume) that there is simple linear relationship,
 you can say: slope of this relationship is such and such, but if you
 want to estimate both the nature of the relationship (A *linearly*
 depends on B) and its magnitude (the slope of this relationship
 is ...), p-values would not help you much.
 
 Of course, I may be wrong - I am not a statistician, just a user.
 
 Best,
 Martin W.
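
A quick way to see Martin's point that n = 4 tells you almost nothing about 
normality is the qqnorm(rnorm(4)) experiment Gavin suggests elsewhere in this 
thread. The sketch below is illustrative only (any seed will do; the whole 
point is how much the panels vary):

```r
## Four Q-Q plots of four truly normal draws each: even with no
## departure from normality, the plots look wildly different,
## so a real departure would be essentially undetectable.
set.seed(1)
op <- par(mfrow = c(2, 2))
for (i in 1:4) qqnorm(rnorm(4), main = paste("Normal sample", i))
par(op)
```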
 
 
 V. Coudrain wrote on Mon 20. 10. 2014 at 13:37 +0200:
  Thank you very much. If I get it right, the CIs get wider, my test has less 
  power and the probability of detecting a significant relation decreases. What 
  about the significant coefficients: are they reliable?
 
 
 
 
   Message du 20/10/14 à 11h30
   De : Roman Luštrik
   A : V. Coudrain
   Copie à : r-sig-ecology@r-project.org
   Objet : Re: [R-sig-eco] Regression with few observations per factor level
  
   I think you can, but the confidence intervals will be rather large due to 
   the small number of samples.
   Notice how the standard errors change as the sample size (per group) goes 
   from 4 to 30.
> pg <- 4  # pg = per group
> my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
+                             rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
+                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
+                     cov = runif(pg * 4))  # 4 groups
> summary(lm(var ~ trt + cov, data = my.df))
Call:
lm(formula = var ~ trt + cov, data = my.df)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.63861 -0.46080  0.03332  0.66380  1.27974 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.2345     1.0218   1.208    0.252    
trttrt2      -0.7759     0.8667  -0.895    0.390    
trttrt3       7.8503     0.8308   9.449  1.3e-06 ***
trttrt4      28.2685     0.9050  31.236  4.3e-12 ***
cov           1.4027     1.1639   1.205    0.253    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.154 on 11 degrees of freedom
Multiple R-squared:  0.9932,	Adjusted R-squared:  0.9908 
F-statistic: 404.4 on 4 and 11 DF,  p-value: 7.467e-12
> pg <- 30  # pg = per group
> my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
+                             rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
+                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
+                     cov = runif(pg * 4))  # 4 groups
> summary(lm(var ~ trt + cov, data = my.df))
Call:
lm(formula = var ~ trt + cov, data = my.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5778 -0.6584 -0.0185  0.6423  3.2077 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.76961    0.25232  10.977  < 2e-16 ***
trttrt2     -1.75490    0.28546  -6.148 1.17e-08 ***
trttrt3      8.40521    0.28251  29.752  < 2e-16 ***
trttrt4     27.04095    0.28286  95.599  < 2e-16 ***
cov          0.05129    0.32523   0.158    0.875    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.094 on 115 degrees of freedom
Multiple R-squared:  0.9913,	Adjusted R-squared:  0.991 
F-statistic:  3269 on 4 and 115 DF,  p-value: < 2.2e-16
   On Mon, Oct 20, 2014 at 10:53 AM, V. Coudrain  wrote:
   Hi, I would like to test the impact of a treatment

Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread Baldwin, Jim -FS
Yes, the analysis with a small sample size would be valid (under the assumption 
that the model, both fixed and random effects, is correctly specified), but at 
some point there must be a practical assessment of the desired precision and 
the costs of the consequences of either inadequate estimates or wrong 
acceptance or rejection of hypotheses.  If it were just about the numbers from 
a sample and the resulting P-values, we would only need statisticians and no 
subject-matter experts (which is clearly not the case).

And while I'm soapboxing: situations with low variability require fewer samples 
than situations with high variability.  One can't assess the adequacy of an 
analysis based solely on the sample size.
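
Jim's point can be illustrated with base R's power.t.test(); the numbers 
below are a hypothetical sketch, not from the thread:

```r
## Same effect size in absolute terms (delta = 1), very different
## required sample sizes depending on the residual variability.
power.t.test(delta = 1, sd = 0.5, power = 0.8)$n  # low sd: a handful per group
power.t.test(delta = 1, sd = 2,   power = 0.8)$n  # high sd: many more per group
```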

Jim

Jim Baldwin
Station Statistician
Pacific Southwest Research Station
USDA Forest Service

-Original Message-
From: r-sig-ecology-boun...@r-project.org 
[mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of V. Coudrain
Sent: Monday, October 20, 2014 8:54 AM
To: ElginPerry
Cc: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] Regression with few observations per factor level

Thank you for this helpful thought. So if I get it correctly, it is hopeless to 
try testing an interaction, but we may nevertheless assess whether a covariate 
has an impact, provided its effect is the same in all treatments.




 Message du 20/10/14 à 16h46
 De : Elgin Perry
 A : v_coudr...@voila.fr
 Copie à :
 Objet : Regression with few observations per factor level

 If it is reasonable to assume that the slope of the covariate is the
 same for all treatments and you have numerous treatments, then you can
 do this by specifying one slope parameter for all treatments, as you
 gave in your example (e.g. lm(var ~ trt + cov)).  By combining slope
 information over treatments, you can obtain a reasonably precise
 estimate.  With so few observations per treatment, you will not be
 able to estimate separate slopes for each treatment with any degree of
 precision (e.g. lm(var ~ trt + trt:cov)).
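
A hypothetical sketch of the degrees-of-freedom arithmetic behind this 
(the data are simulated and illustrative, not from the original messages):

```r
## With 4 treatments x 4 observations = 16 data points:
## a common slope fits 5 parameters, separate slopes fit 8.
pg <- 4
d <- data.frame(var = rnorm(pg * 4),
                trt = rep(paste0("trt", 1:4), each = pg),
                cov = runif(pg * 4))
df.residual(lm(var ~ trt + cov, data = d))      # 16 - 5 = 11 residual df
df.residual(lm(var ~ trt + trt:cov, data = d))  # 16 - 8 =  8 residual df
```

Each separate slope is then estimated from only 4 points, which is why its 
standard error is so large.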


Elgin S. Perry, Ph.D.
Statistics Consultant
377 Resolutions Rd.
Colonial Beach, Va.  22443
ph. 410.610.1473


Date: Mon, 20 Oct 2014 10:53:41 +0200 (CEST)
From: V. Coudrain  v_coudr...@voila.fr 
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] Regression with few observations per factor level
Message-ID:  2127199056.738451413795221981.JavaMail.www@wwinf7128 
Content-Type: text/plain; charset=UTF-8


Hi, I would like to test the impact of a treatment on some variable using 
regression (e.g. lm(var ~ trt + cov)).
However I only have four observations per factor level. Is it still possible to 
apply a regression with such a small sample size? I think that it should be 
difficult to correctly estimate the variance. Do you think that I should rather 
compute a non-parametric test such as Kruskal-Wallis? However, I need to include 
covariables in my models and I am not sure if basic non-parametric tests are 
suitable for this. Thanks for any suggestion.





___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology