Re: [R-sig-eco] Regression with few observations per factor level

2014-10-24 Thread Jari Oksanen

On 24/10/2014, at 09:03 AM, V. Coudrain wrote:

> Thank you all for the good discussion. To recentre the debate, in case you
> are interested, my data are actually as follows: 20 sample locations
> distributed across 5 treatments (4 locations per treatment). Each sample
> location has been surveyed for 4 years, so at the end of the experiment I
> will have a grand total of 80 samples. Including Year as an additional
> factor alongside treatment would, however, increase the complexity of the
> model and leave fewer degrees of freedom, even more so if I have to account
> for autocorrelation. If I consider the distribution of the pooled response
> variable (80 points), it does not deviate much from a normal distribution,
> but this is not the case if I consider its distribution within each
> treatment.
> 
Valerie,

This is a nice description of the structure of your data. When you model your 
data, you should use the same structure in your model. If you ignore some 
features of this structure, you should have good reasons for that decision, 
and reaching such a decision first requires analysing the data as they are 
structured. Collapsing these data into, say, five (or four? how?) means does 
not solve any of the problems with this structure -- among other things, means 
ignore the temporal autocorrelation structure. (The temporal autocorrelation 
may be a more important aspect than Year-as-a-factor if you are absolutely 
uninterested in random years.) With averaging, you really lose degrees of 
freedom, and are easily lured into wrong conclusions. If you have five means, 
you can order them in 120 ways (and four means in 24 ways). Two of these are 
perfectly ordered (proportion 1/60 = 0.017 of all permutations of five 
points), and many more are nearly perfectly or "significantly" ordered and 
trick you into thinking that a linear regression would be a good solution. 
With five data points you just can't know.
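
The counting argument above can be checked by brute force. The following is a minimal base-R sketch, not part of the original message: the helper name perms and the cut-off of 0.85 on the absolute Spearman correlation for "nearly perfectly ordered" are assumptions chosen for illustration.

```r
# Enumerate all orderings of a vector (hypothetical helper, base R only).
perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) cbind(v[i], perms(v[-i]))))
}

p5 <- perms(1:5)
n_perm <- nrow(p5)  # number of ways to order five means

# An ordering is "perfect" if it is strictly increasing or strictly decreasing.
is_perfect <- apply(p5, 1, function(r) all(diff(r) > 0) || all(diff(r) < 0))
n_perfect <- sum(is_perfect)

# Rank correlation of each ordering with the sequence 1..5; the 0.85 cut-off
# for "nearly perfect" is an arbitrary illustrative choice and includes the
# perfect orderings themselves.
rho <- apply(p5, 1, function(r) cor(r, 1:5, method = "spearman"))
n_near <- sum(abs(rho) > 0.85)

c(orderings = n_perm, perfect = n_perfect,
  prop_perfect = n_perfect / n_perm, prop_near = n_near / n_perm)
```

Under this arbitrary cut-off, the share of orderings that look strongly monotone purely by chance is several times larger than the share of perfectly ordered ones.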

Cheers, Jari Oksanen

PS. I hope this threading pleases Gav -- this certainly hurts all Outlook users.
>
> > Message of 24/10/14 at 04:37
> > From: "Chris Howden"
> > To: "Gavin Simpson", "Jari Oksanen"
> > Cc: r-sig-ecology@r-project.org, "V. Coudrain"
> > Subject: RE: [R-sig-eco] Regression with few observations per factor level
> >
> I don't think the data set has only 4 data points; it has more than that,
> but some factor being fitted has only 4 data points per level. So it should
> be possible to do various residual checks at the overall model level to
> determine whether the model fits well overall and whether the normality
> assumptions are met overall. However, it would be quite hard to check the
> factor's individual levels to see how well each is fitted, i.e. is this
> level under- or over-fitted, is it a good or bad fit, etc.
>
> I think we need to be very careful about recommending that people consider
> the response and not look at the residuals, though. Some people might take
> this to mean they should look at the observed response rather than consider
> its likely distribution. There are all sorts of reasons why a response may
> not look normal while the residuals do, meaning the normality assumptions
> are met and the model is OK. So if one does decide to start with an LM,
> what's the harm in making a habit of always looking at the residuals and,
> if they aren't normal, going from there?
>
> All that said, I still wouldn't feel comfortable using a model with only 4
> data points per factor level, even if the residuals did look normal.
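
The point that a response can look non-normal while the residuals are fine can be illustrated with simulated data. This is a rough sketch under stated assumptions: two hypothetical treatment groups with means 0 and 10, unit-variance Gaussian noise, and 50 observations per group (deliberately far more than 4, so that the contrast is visible at all).

```r
set.seed(1)  # arbitrary seed, only for reproducibility

n_per_group <- 50  # assumption: much larger than the 4 per level in the thread
trt <- factor(rep(c("a", "b"), each = n_per_group))

# Hypothetical data: Gaussian noise around two well-separated group means.
y <- c(rnorm(n_per_group, mean = 0), rnorm(n_per_group, mean = 10))

fit <- lm(y ~ trt)

# Shapiro-Wilk test on the pooled response (bimodal by construction) and on
# the model residuals (Gaussian by construction).
p_response  <- shapiro.test(y)$p.value
p_residuals <- shapiro.test(residuals(fit))$p.value

c(response = p_response, residuals = p_residuals)
```

The pooled response is strongly bimodal because of the treatment effect, which is exactly what the model accounts for; only the residuals speak to the normality assumption of the linear model.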
>  
> Chris Howden B.Sc. (Hons) GStat.
> Founding Partner
> Data Analysis, Modelling and Training
> Evidence Based Strategy/Policy Development, IP Commercialisation and 
> Innovation
> (mobile) +61 (0) 410 689 945
> (skype) chris.howden
> ch...@trickysolutions.com.au

Re: [R-sig-eco] Regression with few observations per factor level

2014-10-23 Thread Gavin Simpson
I think there are actually 4 data points per level of some factor (after
seeing some of the other non-threaded emails - why can't people use email
clients that preserve threads?**); but yes, either way this is a small data
set and trying to decide whether residuals are normal or not is going to be
nigh on impossible.

I like the suggestion that someone made to actually do some simulation to
work out whether you have any power to detect an effect of a given size; it
seems pointless doing the analysis if your conclusion would be "well, I
didn't detect an effect, but I have no power so I don't even know if I
should have been able to detect an effect if one were present". You'd then
be no worse off than if you hadn't run the analysis or collected the data.

G

** He says, hoping to heck that GMail preserves the threading information...


-- 
Gavin Simpson, PhD

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Regression with few observations per factor level

2014-10-23 Thread Jari Oksanen

On 23/10/2014, at 18:17 PM, Gavin Simpson wrote:

> On 22 October 2014 17:24, Chris Howden  wrote:
> 
>> A good place to start is by looking at your residuals  to determine if
>> the normality assumptions are being met, if not then some form of glm
>> that correctly models the residuals or a non parametric method should
>> be used.
>> 
> 
> Doing that could be very tricky indeed; I defy anyone, without knowledge of
> how the data were generated, to detect departures from normality in such a
> small data set. Try qqnorm(rnorm(4)) a few times and you'll see what I mean.
> 
> Second, one usually considers the distribution of the response when fitting
> a GLM, not decide if residuals from an LM are non-Gaussian then move on.
> The decision to use the GLM should be motivated directly from the data and
> question to hand. Perhaps sometimes we can get away with fitting the LM,
> but that usually involves some thought, in which case one has probably
> already thought about the GLM as well.

I agree completely with Gavin. If you have four data points and fit a 
two-parameter linear model and in addition select a one-parameter exponential 
family distribution (as implied in selecting a GLM family) you don't have many 
degrees of freedom left. I don't think you get such models accepted in many 
journals. Forget the regression and get more data. Some people suggested here 
that an acceptable model could be possible if your data points are not single 
observations but means from several observations. That is true: then you can 
proceed, but consult a statistician on the way to proceed.

Cheers, Jari Oksanen


Re: [R-sig-eco] Regression with few observations per factor level

2014-10-23 Thread Gavin Simpson
On 22 October 2014 17:24, Chris Howden  wrote:

> A good place to start is by looking at your residuals  to determine if
> the normality assumptions are being met, if not then some form of glm
> that correctly models the residuals or a non parametric method should
> be used.
>

Doing that could be very tricky indeed; I defy anyone, without knowledge of
how the data were generated, to detect departures from normality in such a
small data set. Try qqnorm(rnorm(4)) a few times and you'll see what I mean.
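
The difficulty can also be put in numbers. The sketch below is an illustration, not part of the original message: it estimates how often a Shapiro-Wilk test at the 5% level flags samples drawn from a clearly skewed (exponential) distribution. The choice of test, the exponential alternative, the comparison sample size of 50 and the helper name reject_rate are all assumptions.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
n_sim <- 2000

# Proportion of simulated samples of size n, drawn with generator rgen,
# for which the Shapiro-Wilk test rejects normality at the 5% level.
reject_rate <- function(n, rgen) {
  mean(replicate(n_sim, shapiro.test(rgen(n))$p.value < 0.05))
}

rate_normal_4  <- reject_rate(4, rnorm)   # false-alarm rate for truly normal data
rate_skewed_4  <- reject_rate(4, rexp)    # power against skewed data, n = 4
rate_skewed_50 <- reject_rate(50, rexp)   # same alternative, n = 50

round(c(normal_n4 = rate_normal_4, skewed_n4 = rate_skewed_4,
        skewed_n50 = rate_skewed_50), 3)
```

With four observations the rejection rate for exponential data stays low, so a formal test has little more chance than the eye of spotting the departure.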

Second, one usually considers the distribution of the response when fitting
a GLM, rather than deciding that residuals from an LM are non-Gaussian and
then moving on. The decision to use a GLM should be motivated directly by the
data and the question at hand. Perhaps sometimes we can get away with fitting
the LM, but that usually involves some thought, in which case one has
probably already thought about the GLM as well.

G




-- 
Gavin Simpson, PhD



Re: [R-sig-eco] Regression with few observations per factor level

2014-10-22 Thread Nicholas Hamilton
Dear All,

Please do not take any offence, but I would really like to be removed from 
this mailing list. Can someone let me know how this can be done?

Best Regards,

--
Nicholas Hamilton
School of Materials Science and Engineering
University of New South Wales (Australia)
--
www.ggtern.com



Re: [R-sig-eco] Regression with few observations per factor level

2014-10-22 Thread Chris Howden
A good place to start is by looking at your residuals to determine whether
the normality assumptions are being met; if not, then some form of GLM
that correctly models the residuals, or a non-parametric method, should
be used.

But just as important is considering how you intend to use your data and
exactly what they are. Regardless of what the statistics say, if you only
have 4 data points, are you really confident in making broad
generalisations from them? And in writing a paper with your name on it?
Just a couple of data points could change everything, particularly if the
scale isn't bounded, so that outliers can have a big impact. If the data
points are some form of average I would be more confident with only 4 of
them, but if they are raw values I would be very cautious about any
conclusions you draw.

Another reason I would be cautious of a result using only 4 data points is
that the p-values may be very poorly estimated. Although it is not widely
discussed, we often rely on the central limit theorem to assume that
parameter estimates are normally distributed when calculating the p-value.
(Because parameter estimates can be thought of as weighted averages, the
CLT applies to them.) With only 4 data points we can't invoke the magic of
the CLT, and since there is no way to test whether the parameter estimates
are normal we take quite a risk in assuming we have accurate p-values at
small sample sizes.

Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP Commercialisation and
Innovation, Data Analysis, Modelling and Training

(mobile) 0410 689 945
(fax / office)
ch...@trickysolutions.com.au

Disclaimer: The information in this email and any attachments to it are
confidential and may contain legally privileged information. If you are not
the named or intended recipient, please delete this communication and
contact us immediately. Please note you are not authorised to copy,
use or disclose this communication or any attachments without our
consent. Although this email has been checked by anti-virus software,
there is a risk that email messages may be corrupted or infected by
viruses or other
interferences. No responsibility is accepted for such interference. Unless
expressly stated, the views of the writer are not those of the
company. Tricky Solutions always does our best to provide accurate
forecasts and analyses based on the data supplied, however it is
possible that some important predictors were not included in the data
sent to us. Information provided by us should not be solely relied
upon when making decisions and clients should use their own judgement.



[R-sig-eco] Regression with few observations per factor level

2014-10-22 Thread V. Coudrain
Hi Lars,

I came across this blog some days ago. It sounds very interesting, but I know 
almost nothing about Bayesian statistics and I honestly do not know how I 
could apply it to my data. Would you have in mind a reference or tutorial 
that might help?
Cheers


> Why not take the opportunity of getting to know ABC some more? Rasmus
> Bååth wrote a piece on Tiny Data and ABC which might suit your problem
> very well.
>
> http://www.r-bloggers.com/tiny-data-approximate-bayesian-computation-and-the-socks-of-karl-broman/
>
> Cheers
> /Lars



Re: [R-sig-eco] Regression with few observations per factor level

2014-10-22 Thread Lars Westerberg
Why not take the opportunity of getting to know ABC some more? Rasmus 
Bååth wrote a piece on Tiny Data and ABC which might suit your problem 
very well.

http://www.r-bloggers.com/tiny-data-approximate-bayesian-computation-and-the-socks-of-karl-broman/

Cheers
/Lars



[R-sig-eco] Regression with few observations per factor level

2014-10-21 Thread V. Coudrain
> With such a small data set, why not simulate some data sets with reasonable
> effect sizes and see how an analysis performs? Krzysztof

Dear Krzysztof,
It is a good idea. Would you know of some R functions that are well suited to 
this kind of simulation?





Re: [R-sig-eco] Regression with few observations per factor level

2014-10-21 Thread Krzysztof Sakrejda
With such a small data set, why not simulate some data sets with
reasonable effect sizes and see how an analysis performs?  Krzysztof
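
A minimal base-R sketch of such a simulation follows. It is one way the suggestion could be carried out, not a method prescribed in the thread: the two-level treatment, the effect size of 1, the residual standard deviation of 1, the covariate slope of 0.5 and the helper names simulate_once and estimate_power are all hypothetical choices to be replaced by values that are plausible for the real system.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Simulate one data set under assumed parameter values, fit the model from
# the original question, lm(var ~ trt + cov), and return the p-value of the
# treatment coefficient.
simulate_once <- function(n_per_group = 4, effect = 1, sigma = 1, slope = 0.5) {
  trt <- factor(rep(c("control", "treated"), each = n_per_group))
  cov <- runif(2 * n_per_group)
  var <- slope * cov + effect * (trt == "treated") +
    rnorm(2 * n_per_group, sd = sigma)
  fit <- lm(var ~ trt + cov)
  summary(fit)$coefficients["trttreated", "Pr(>|t|)"]
}

# Estimated power: share of simulated data sets with p < alpha.
estimate_power <- function(n_sim = 1000, alpha = 0.05, ...) {
  mean(replicate(n_sim, simulate_once(...)) < alpha)
}

power_small <- estimate_power(n_per_group = 4, effect = 1)
power_large <- estimate_power(n_per_group = 30, effect = 1)
c(n4 = power_small, n30 = power_large)
```

Running the same function over a grid of effect sizes and residual standard deviations shows which effects, if any, a design with four observations per level could realistically detect.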



Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread Baldwin, Jim -FS
Yes, the analysis with a small sample size would be valid (under the assumption 
that the model, both fixed and random effects, is correctly specified), but at 
some point there must be a practical assessment of the desired precision and 
the costs of the consequences of either inadequate estimates or wrongly 
accepting or rejecting hypotheses.  If it were just about the numbers from 
a sample and the resulting P-values, we would only need statisticians and no 
subject-matter experts (which is clearly not the case).

And while I'm soapboxing, situations with low variability require fewer samples 
than situations with high variability.  One can't make assessments of the 
adequacy of an analysis solely on the sample size.
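
As a rough illustration of the variability point, the sketch below uses power.t.test() from the stats package for a simple two-group comparison. The difference of 2 units and the standard deviations of 1 and 2 are hypothetical values, and the two-sample t-test is a simplification of the models discussed in this thread.

```r
# Power with 4 observations per group to detect a difference of 2 units,
# under low and high variability (hypothetical values).
power_low_sd  <- power.t.test(n = 4, delta = 2, sd = 1)$power
power_high_sd <- power.t.test(n = 4, delta = 2, sd = 2)$power

# Sample size per group needed for 80% power in each case.
n_low_sd  <- power.t.test(delta = 2, sd = 1, power = 0.8)$n
n_high_sd <- power.t.test(delta = 2, sd = 2, power = 0.8)$n

round(c(power_low_sd = power_low_sd, power_high_sd = power_high_sd,
        n_low_sd = n_low_sd, n_high_sd = n_high_sd), 2)
```

The same four observations per group give very different power depending on the spread of the data, which is why sample size alone does not settle whether an analysis is adequate.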

Jim

Jim Baldwin
Station Statistician
Pacific Southwest Research Station
USDA Forest Service

-Original Message-
From: r-sig-ecology-boun...@r-project.org 
[mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of V. Coudrain
Sent: Monday, October 20, 2014 8:54 AM
To: ElginPerry
Cc: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] Regression with few observations per factor level

Thank you for this helpful thought. So if I understand correctly, it is 
hopeless to try testing an interaction, but we may nevertheless assess whether 
a covariate has an impact, provided its effect is the same in all treatments.




> Message of 20/10/14 at 16:46
> From: "Elgin Perry"
> To: v_coudr...@voila.fr
> Cc:
> Subject: Regression with few observations per factor level
>
> If it is reasonable to assume that the slope of the covariate is the
> same for all treatments and you have numerous treatments then you can
> do this by specifying one slope parameter for all treatments as you
> gave in your example (e.g. lm(var ~ trt + cov)).  By combining slope
> information over treatments, you can obtain a reasonably precise
> estimate.   With so few observations per treatment, you will not be
> able to estimate separate slopes for each treatment with any degree of
> precision (e.g. lm(var ~ trt + trt:cov))
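
The two model formulas mentioned above can be compared on simulated data. This is a sketch under assumptions: four treatments with four observations each, hypothetical treatment means of 3, 1, 11 and 30, a shared true slope of 2 and unit-variance noise. Only the model formulas come from the thread.

```r
set.seed(2014)  # arbitrary seed, only for reproducibility

pg  <- 4                                         # observations per treatment
trt <- factor(rep(paste0("trt", 1:4), each = pg))
cov <- runif(length(trt))

# Hypothetical truth: treatment-specific intercepts and one shared slope.
var <- c(3, 1, 11, 30)[as.integer(trt)] + 2 * cov + rnorm(length(trt))

common   <- lm(var ~ trt + cov)      # one slope for all treatments
separate <- lm(var ~ trt + trt:cov)  # one slope per treatment

# Residual degrees of freedom: 16 observations minus 5 and 8 coefficients.
df_common   <- df.residual(common)
df_separate <- df.residual(separate)

# Standard errors of the slope estimate(s) in each model.
se_common <- summary(common)$coefficients["cov", "Std. Error"]
coef_sep  <- summary(separate)$coefficients
se_separate <- coef_sep[grep(":cov", rownames(coef_sep)), "Std. Error"]

list(df = c(common = df_common, separate = df_separate),
     se_common = se_common, se_separate = se_separate)
```

Each separate slope rests on only four points, so its standard error will typically be much larger than that of the pooled slope, although the exact values depend on the simulated data.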


Elgin S. Perry, Ph.D.
Statistics Consultant
377 Resolutions Rd.
Colonial Beach, Va.  22443
ph. 410.610.1473


Date: Mon, 20 Oct 2014 10:53:41 +0200 (CEST)
From: "V. Coudrain" < v_coudr...@voila.fr >
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] Regression with few observations per factor level
Message-ID: < 2127199056.738451413795221981.JavaMail.www@wwinf7128 >
Content-Type: text/plain; charset="UTF-8"


Hi, I would like to test the impact of a treatment on some variable using 
regression (e.g. lm(var ~ trt + cov)). However, I only have four observations 
per factor level. Is it still possible to apply a regression with such a small 
sample size? I think that it should be difficult to estimate the variance 
correctly. Do you think that I should rather compute a non-parametric test 
such as Kruskal-Wallis? However, I need to include covariables in my models 
and I am not sure whether basic non-parametric tests are suitable for this. 
Thanks for any suggestions.

This electronic message contains information generated by the USDA solely for 
the intended recipients. Any unauthorized interception of this message or the 
use or disclosure of the information it contains may violate the law and 
subject the violator to civil or criminal penalties. If you believe you have 
received this message in error, please notify the sender and delete the email 
immediately.


Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread V. Coudrain
Yes, but as I fear, the residuals behave badly as soon as the model gets a 
little bit more complex (e.g., with two covariables or an interaction). The 
scope for performing an ANCOVA is thus very limited. That's why I was thinking 
about a potential non-parametric model. But I do not want to artificially make 
my data tell something they cannot.




> Message of 20/10/14 at 16:50
> From: "stephen sefick"
> To: "Martin Weiser"
> Cc: "V. Coudrain", "r-sig-ecology"
> Subject: Re: [R-sig-eco] Regression with few observations per factor level
> 
> You are more or less performing an ANOVA/ANCOVA on your data? As pointed out 
> earlier, all of the normal-theory regression assumptions apply. Assuming all 
> of those are satisfied, then if you have large confidence intervals and 
> there are significant differences between groups, I don't see why you couldn't 
> correctly infer something about the treatments. Maybe I am missing something.
> Stephen
>
> On Mon, Oct 20, 2014 at 8:43 AM, Martin Weiser  wrote:
> Hi,
> 
> coefficients and their p-values are reliable if your data are OK and you
> know enough about the process that generated them to choose an
> appropriate model. With 4 points per line, it may be really difficult to
> identify a bad fit or outliers.
> 
> For example: simple linear regression needs constant variance of the
> normal distribution from which residuals are drawn -  along the
> regression line - to work properly.  With 4 points, you can hardly
> estimate this, but if you know enough about the process that generated
> the data, you are safe. If you do not know, it is not easy to say
> anything about the nature of the process that generated the data.
> 
> If you know (or can assume) that there is simple linear relationship,
> you can say: "slope of this relationship is such and such", but if you
> want to estimate both the nature of the relationship ("A *linearly*
> depends on B") and its magnitude ("the slope of this relationship
> is ..."), p-values would not help you much.
> 
> Of course, I may be wrong - I am not a statistician, just a user.
> 
> Best,
> Martin W.
> 
> 
> V. Coudrain píše v Po 20. 10. 2014 v 13:37 +0200:
> > Thank you very much. If I get it right, the CI get wider, my test has less 
> > power and the probability of getting a significant relation decreases. What 
> > about the significant coefficients, are they reliable?
> >
> >
> >
> >
> > > Message du 20/10/14 à 11h30
> > > De : "Roman Luštrik"
> > > A : "V. Coudrain"
> > > Copie à : "r-sig-ecology@r-project.org"
> > > Objet : Re: [R-sig-eco] Regression with few observations per factor level
> > >
> > > I think you can, but the confidence intervals will be rather large due to 
> > > number of samples.
> > > Notice how standard errors change for sample size (per group) from 4 to 
> > > 30.
> > > > pg <- 4 # pg = per group> my.df <- data.frame(var = c(rnorm(pg, mean = 
> > > > 3), rnorm(pg, mean = 1), rnorm(pg, mean = 11), rnorm(pg, mean = 30)), + 
> > > >                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = 
> > > > pg), +                     cov = runif(pg*4)) # 4 groups> 
> > > > summary(lm(var ~ trt + cov, data = my.df))
> > > Call:lm(formula = var ~ trt + cov, data = my.df)
> > > Residuals:     Min       1Q   Median       3Q      Max -1.63861 -0.46080  
> > > 0.03332  0.66380  1.27974
> > > Coefficients:            Estimate Std. Error t value Pr(>|t|)    
> > > (Intercept)   1.2345     1.0218   1.208    0.252    trttrt2      -0.7759  
> > >    0.8667  -0.895    0.390    trttrt3       7.8503     0.8308   9.449  
> > > 1.3e-06 ***trttrt4      28.2685     0.9050  31.236  4.3e-12 ***cov        
> > >    1.4027     1.1639   1.205    0.253    ---Signif. codes:  0 ‘***’ 0.001 
> > > ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > > Residual standard error: 1.154 on 11 degrees of freedomMultiple 
> > > R-squared:  0.9932,Adjusted R-squared:  0.9908 F-statistic: 404.4 on 4 
> > > and 11 DF,  p-value: 7.467e-12
> > > > > pg <- 30 # pg = per group> my.df <- data.frame(var = c(rnorm(pg, mean 
> > > > > = 3), rnorm(pg, mean = 1), rnorm(pg, mean = 11), rnorm(pg, mean = 
> > > > > 30)), +                     trt = rep(c("trt1", "trt2", "trt3", 
> > > > > "trt4"), each = pg), +                     cov = runif(pg*4)) 

Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread V. Coudrain
Thank you for this helpful thought. So, if I understand correctly, it is hopeless to 
try testing an interaction, but we may nevertheless assess whether a covariate has an 
impact, provided its effect is the same in all treatments.




> Message of 20/10/14 at 16:46
> From: "Elgin Perry" 
> To: v_coudr...@voila.fr
> Cc: 
> Subject: Regression with few observations per factor level
> 
> If it is reasonable to assume that the slope of the covariate is the same for 
> all treatments and you have numerous treatments then you can do this by 
> specifying one slope parameter for all treatments as you gave in your example 
> (e.g. lm(var ~ trt + cov)).  By combining slope information over treatments, 
> you can obtain a reasonably precise estimate.   With so few observations per 
> treatment, you will not be able to estimate separate slopes for each 
> treatment with any degree of precision (e.g. lm(var ~ trt + trt:cov))
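
To make the contrast between the two model formulas concrete, here is a minimal sketch on simulated data. The data, the seed, the group means, and the object names `sim`, `common`, and `separate` are assumptions made for illustration only; they are not from the thread. It fits the common-slope and the separate-slopes models to 5 treatments with 4 observations each and compares them.

```r
# Simulated data only: 5 treatments x 4 observations, with one shared
# covariate slope (true slope = 2). All values here are illustrative.
set.seed(1)
pg  <- 4                                          # observations per treatment
trt <- factor(rep(paste0("trt", 1:5), each = pg))
x   <- runif(length(trt))                         # covariate values
sim <- data.frame(
  var = c(3, 1, 11, 30, 7)[as.integer(trt)] + 2 * x + rnorm(length(trt)),
  trt = trt,
  cov = x
)

common   <- lm(var ~ trt + cov,     data = sim)   # one slope for all treatments
separate <- lm(var ~ trt + trt:cov, data = sim)   # one slope per treatment

# Residual degrees of freedom: 20 - 6 = 14 versus 20 - 10 = 10
df.residual(common)
df.residual(separate)

# Standard errors of the slope estimates under each model
coef(summary(common))["cov", "Std. Error"]
coef(summary(separate))[paste0("trt", levels(trt), ":cov"), "Std. Error"]

# F test: do separate slopes improve on the common slope?
anova(common, separate)
```

Each per-treatment slope in the second model rests on only four points, so its standard error is typically much larger than that of the pooled slope.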


Elgin S. Perry, Ph.D.
Statistics Consultant
377 Resolutions Rd.
Colonial Beach, Va.  22443
ph. 410.610.1473




Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread stephen sefick
You are more or less performing an ANOVA/ANCOVA on your data? As pointed
out earlier, all of the normal theory regression assumptions apply.
Assuming all of those things are satisfied, then if you have large
confidence intervals and there are significant differences between groups, I
don't see why you couldn't correctly infer something about the treatments.
Maybe I am missing something.

Stephen


Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread Martin Weiser
Hi,

Coefficients and their p-values are reliable if your data are OK and you
know enough about the process that generated them to choose an
appropriate model. With 4 points per line, it may be really difficult to
identify a bad fit or outliers.

For example: simple linear regression needs constant variance of the
normal distribution from which residuals are drawn -  along the
regression line - to work properly.  With 4 points, you can hardly
estimate this, but if you know enough about the process that generated
the data, you are safe. If you do not know, it is not easy to say
anything about the nature of the process that generated the data.
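
A small simulation can illustrate how poorly four points pin down a variance. This is only a sketch under assumed conditions (standard normal data with a true SD of 1, an arbitrary seed, and hypothetical object names `sd_n4` and `sd_n30`); it compares the spread of the estimated SD for samples of size 4 and size 30.

```r
# Illustrative simulation: how variable is an SD estimated from 4 points
# compared with one estimated from 30 points? The true SD is 1 in both cases.
set.seed(42)                                  # arbitrary seed
n_sim  <- 2000                                # number of simulated samples
sd_n4  <- replicate(n_sim, sd(rnorm(4)))
sd_n30 <- replicate(n_sim, sd(rnorm(30)))

# Central 95% range of the estimated SD for each sample size
range_n4  <- quantile(sd_n4,  c(0.025, 0.975))
range_n30 <- quantile(sd_n30, c(0.025, 0.975))
range_n4
range_n30
```

The range for n = 4 comes out markedly wider than the range for n = 30, which is why a constant-variance assumption is hard to check within a group of four.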

If you know (or can assume) that there is a simple linear relationship,
you can say: "slope of this relationship is such and such", but if you
want to estimate both the nature of the relationship ("A *linearly*
depends on B") and its magnitude ("the slope of this relationship
is ..."), p-values would not help you much.

Of course, I may be wrong - I am not a statistician, just a user.

Best,
Martin W. 



Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread V. Coudrain
Thank you very much. If I get it right, the CIs get wider, my test has less 
power, and the probability of getting a significant relation decreases. What 
about the significant coefficients: are they reliable?






Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread Roman Luštrik
I think you can, but the confidence intervals will be rather wide because of
the small number of samples.

Notice how the standard errors change as the sample size (per group) goes
from 4 to 30.

> pg <- 4 # pg = per group
> my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
+                             rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
+                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
+                     cov = runif(pg*4)) # 4 groups
> summary(lm(var ~ trt + cov, data = my.df))

Call:
lm(formula = var ~ trt + cov, data = my.df)

Residuals:
     Min       1Q   Median       3Q      Max
-1.63861 -0.46080  0.03332  0.66380  1.27974

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.2345     1.0218   1.208    0.252
trttrt2      -0.7759     0.8667  -0.895    0.390
trttrt3       7.8503     0.8308   9.449  1.3e-06 ***
trttrt4      28.2685     0.9050  31.236  4.3e-12 ***
cov           1.4027     1.1639   1.205    0.253
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.154 on 11 degrees of freedom
Multiple R-squared:  0.9932, Adjusted R-squared:  0.9908
F-statistic: 404.4 on 4 and 11 DF,  p-value: 7.467e-12

>
> pg <- 30 # pg = per group
> my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
+                             rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
+                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
+                     cov = runif(pg*4)) # 4 groups
> summary(lm(var ~ trt + cov, data = my.df))

Call:
lm(formula = var ~ trt + cov, data = my.df)

Residuals:
    Min      1Q  Median      3Q     Max
-2.5778 -0.6584 -0.0185  0.6423  3.2077

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.76961    0.25232  10.977  < 2e-16 ***
trttrt2     -1.75490    0.28546  -6.148 1.17e-08 ***
trttrt3      8.40521    0.28251  29.752  < 2e-16 ***
trttrt4     27.04095    0.28286  95.599  < 2e-16 ***
cov          0.05129    0.32523   0.158    0.875
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.094 on 115 degrees of freedom
Multiple R-squared:  0.9913, Adjusted R-squared:  0.991
F-statistic:  3269 on 4 and 115 DF,  p-value: < 2.2e-16
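
The session above does not set a random seed, so rerunning it will give different numbers. A reproducible variant might look like the sketch below. The helper name `sim_se` and the seed are assumptions added for illustration, not part of the original post; the simulated data follow the same recipe as above.

```r
# Hypothetical helper (not from the original post): simulate one data set
# with `pg` observations per treatment and return the standard errors of
# the coefficients of lm(var ~ trt + cov).
sim_se <- function(pg) {
  my.df <- data.frame(
    var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
            rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
    trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
    cov = runif(pg * 4)
  )
  coef(summary(lm(var ~ trt + cov, data = my.df)))[, "Std. Error"]
}

set.seed(2014)              # arbitrary seed, for reproducibility only
se4  <- sim_se(4)           # 4 observations per group
se30 <- sim_se(30)          # 30 observations per group

# Standard errors side by side, one row per sample size
round(rbind(pg4 = se4, pg30 = se30), 3)

# Ratio of standard errors; on average this should be in the region of
# sqrt(30 / 4), about 2.7, but a single simulated pair can differ noticeably
se4 / se30
```

Wrapping the simulation in a function also makes it easy to repeat the comparison many times, for example with `replicate()`, rather than judging from one draw.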




-- 
In God we trust, all others bring data.



[R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread V. Coudrain
Hi, I would like to test the impact of a treatment on some variable using 
regression (e.g. lm(var ~ trt + cov)).  However, I only have four observations 
per factor level. Is it still possible to apply a regression with such a small 
sample size? I think that it should be difficult to correctly estimate the 
variance. Do you think that I should rather use a non-parametric test such 
as Kruskal-Wallis? However, I need to include covariates in my models and I am 
not sure whether basic non-parametric tests are suitable for this. Thanks for any 
suggestion.
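
For readers comparing the two options in the question, the following sketch runs both on simulated data. The data, the seed, and the object names `dat`, `kw`, and `fit` are assumptions for illustration only. In base R, `kruskal.test()` takes a response and a single grouping factor, so the covariate cannot enter that test, whereas `lm()` accepts it directly.

```r
# Simulated data only: 4 treatments x 4 observations plus one covariate.
set.seed(7)                                   # arbitrary seed
pg  <- 4
dat <- data.frame(
  var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
          rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
  trt = factor(rep(c("trt1", "trt2", "trt3", "trt4"), each = pg)),
  cov = runif(pg * 4)
)

# Rank-based test of the treatment effect; the covariate is ignored
kw <- kruskal.test(var ~ trt, data = dat)
kw$p.value

# Linear model with the covariate; drop1() gives an F test for each term
# adjusted for the other term in the model
fit <- lm(var ~ trt + cov, data = dat)
drop1(fit, test = "F")
```

The simulated group means are far apart here, so both approaches detect the treatment effect; the point of the sketch is only the difference in what each call can include.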