[R-sig-eco] Unbalanced data and random effects
Thank you very much for these explanations. They are quite technical and I am not sure that I understood everything, but I will try to find Gelman and Hill's book to get more insight into shrinkage. I read Zuur's book and, as you said, the topic is not extensively covered.
___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] Unbalanced data and random effects
Dear all, I performed a census of insects at different sites and measured their size. I would like to know whether size is related to an environmental factor. I modelled size as a function of the factor, with site as a random factor to account for within-site variability. However, my data are strongly unbalanced, with some sites having only two individuals and others up to 100. Is having site as a random factor sufficient to deal with this strong imbalance? The residual fit is quite poor, most likely because of the strong differences in variance among sites. Would anybody have some advice? Thank you!
Re: [R-sig-eco] Unbalanced data and random effects
Hi Krzysztof,

Did you have a specific section of Zuur et al.'s book in mind? I've pulled it off my shelf and tried looking up shrinkage, unbalanced design, design, etc. in the index but couldn't find anything relevant. I'm sure it's in there, but it's a rather large book to read in one go!

Chris Howden B.Sc. (Hons) GStat.
Founding Partner
Evidence Based Strategic Development, IP Commercialisation and Innovation, Data Analysis, Modelling and Training
(mobile) 0410 689 945
(skype) chris.howden
ch...@trickysolutions.com.au

Disclaimer: The information in this email and any attachments to it are confidential and may contain legally privileged information. If you are not the named or intended recipient, please delete this communication and contact us immediately. Please note you are not authorised to copy, use or disclose this communication or any attachments without our consent. Although this email has been checked by anti-virus software, there is a risk that email messages may be corrupted or infected by viruses or other interferences. No responsibility is accepted for such interference. Unless expressly stated, the views of the writer are not those of the company. Tricky Solutions always does our best to provide accurate forecasts and analyses based on the data supplied, however it is possible that some important predictors were not included in the data sent to us. Information provided by us should not be solely relied upon when making decisions and clients should use their own judgement.

-Original Message-
From: r-sig-ecology-boun...@r-project.org [mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of Krzysztof Sakrejda
Sent: Thursday, 17 October 2013 12:29 AM
Cc: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] Unbalanced data and random effects

On Wed, Oct 16, 2013 at 6:41 AM, v_coudr...@voila.fr wrote:

Dear all, I performed a census of insects at different sites and measured their size. I would like to know if size is related to an environmental factor. I modelled the size as a function of the factor with site as a random variable to account for within-site variability. However I have strong unbalanced data with some sites having only two individuals and others up to 100. Is having site as a random factor sufficient to deal with this strong data unbalance?

I'm not sure what you mean by "deal with", but reading about shrinkage in random effects models in any decent source would probably be a fine start for you, either here:

http://www.amazon.com/Effects-Extensions-Ecology-Statistics-Biology/dp/0387874577

or here:

http://www.amazon.com/Analysis-Regression-Multilevel-Hierarchical-Models/dp/052168689X

The short answer is that the site effect will shrink toward the average site effect for sites with few individuals.

Krzysztof

The residual fit of the data is quite bad, certainly because of the strong difference in variance among sites. Would anybody have some advice? Thank you!

--
Krzysztof Sakrejda
Organismic and Evolutionary Biology
University of Massachusetts, Amherst
319 Morrill Science Center South
611 N. Pleasant Street
Amherst, MA 01003
work #: 413-325-6555
email: sakre...@cns.umass.edu
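Editorial illustration of the short answer above: one reason estimates for sparsely sampled sites get pulled toward the average is that their raw means are noisy. The sketch below is a minimal Python example, not anything posted in the thread. It assumes a common within-site standard deviation (`within_sd`, a made-up value) and simply evaluates the usual standard error of a mean for site sizes like those described in the question.

```python
import math


def standard_error_of_mean(within_sd, n):
    """Standard error of a site's raw mean for n individuals,
    assuming a common within-site standard deviation."""
    return within_sd / math.sqrt(n)


# Hypothetical within-site standard deviation of insect size (arbitrary units).
within_sd = 1.0

# Site sizes spanning the range mentioned in the question (2 up to 100).
for n in (2, 10, 100):
    se = standard_error_of_mean(within_sd, n)
    print(f"n = {n:3d}: standard error of the site mean = {se:.3f}")
```

Under these assumptions a site with two individuals has a standard error roughly seven times larger than a site with 100, which is why a random effects model leans on the overall mean for the small sites.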
Re: [R-sig-eco] Unbalanced data and random effects
On Wed, Oct 16, 2013 at 6:20 PM, Chris Howden ch...@trickysolutions.com.au wrote:

Hi Krzysztof, Did you have a specific section of Zuur et al.'s book in mind? I've pulled it off my shelf and tried looking up shrinkage, unbalanced design, design, etc. in the index but couldn't find anything relevant. I'm sure it's in there, but it's a rather large book to read in one go!

I was thinking of Mixed Effects Models and Extensions in Ecology with R, but now that I search through Zuur in Google Books there appears to be no mention of either partial pooling or shrinkage (I don't have the book on hand). It's not mentioned in the index either... I recommended Zuur because I know a lot of ecologists use it, and shrinkage is such a basic and useful topic that I expected it to be covered. It's on page 477 in Gelman and Hill.

Since I stuck my foot in it by recommending Zuur without checking, here is the basic idea. If you have a data set with unbalanced group sizes and you just call everything one group, you get a single estimated mean, MU. If you use fixed effects, you estimate one mean per group (mu_1, mu_2, ..., mu_k), and the means for the small groups will be poorly estimated (large standard errors). If you use a random effects model, you estimate one mean per group but you also constrain the group means (mu*_1, mu*_2, ..., mu*_k) to come from a normal distribution (with an estimated mean, MU*, and variance), which has two effects important for interpretation:

1) groups with fewer observations will mostly be represented by the overall mean (mu*_1 is closer to MU* than mu_1 is to MU, and the effect is more extreme for groups with small sample size); and
2) this effect is even more pronounced in groups with large deviations from MU*.

You can get a feel for how much this matters by simulating and fitting some data similar to your data in R (Kery's Introduction to WinBUGS for Ecologists does a lot of this kind of simulation).

The terms used to describe these effects are shrinkage and partial pooling, since complete pooling is what you get when you disregard the divisions. You can also calculate how much pooling is being done directly (Gelman and Hill, pg. 477):

mu*_j = w_j * MU* + (1 - w_j) * mean(observations in group j)

w_j = 1 - (estimated variance of random effect) / (estimated variance of random effect + within-group variance / group size)

where w_j tells you how much that group's estimate is pooled towards the mean. That's the short and sloppy version, but the discussion in Gelman and Hill is good. Sorry for the confusion; maybe somebody else knows for sure where/if Zuur discusses this?

Krzysztof
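Editorial illustration of the pooling formula quoted from Gelman and Hill: the following is a minimal Python sketch that evaluates w_j and the partially pooled mean mu*_j for one small and one large site. All numbers (`overall_mean`, `var_site`, `var_within`, and the site sizes and raw means) are invented for illustration; in practice the variance components would be estimates taken from a fitted mixed model, and the function names are hypothetical helpers rather than part of any library.

```python
def pooling_weight(var_site, var_within, n):
    """Weight w_j on the overall mean MU* for a group of size n:
    w_j = 1 - var_site / (var_site + var_within / n)."""
    return 1.0 - var_site / (var_site + var_within / n)


def partially_pooled_mean(group_mean, overall_mean, var_site, var_within, n):
    """mu*_j = w_j * MU* + (1 - w_j) * (raw mean of group j)."""
    w = pooling_weight(var_site, var_within, n)
    return w * overall_mean + (1.0 - w) * group_mean


# Hypothetical values, not taken from the thread.
overall_mean = 10.0   # MU*: estimated mean of the site-level distribution
var_site = 1.0        # estimated variance of the site random effect
var_within = 4.0      # within-site (residual) variance

# Two sites with the same raw mean but very different sample sizes.
sites = {"small_site": (2, 13.0), "large_site": (100, 13.0)}

for name, (n, raw_mean) in sites.items():
    w = pooling_weight(var_site, var_within, n)
    estimate = partially_pooled_mean(raw_mean, overall_mean, var_site, var_within, n)
    print(f"{name}: n = {n}, w = {w:.3f}, pooled estimate = {estimate:.2f}")
```

With these made-up inputs the two-individual site gets a weight of 2/3 on the overall mean, so its estimate moves from 13 to 11, while the 100-individual site stays close to its raw mean.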
Re: [R-sig-eco] Unbalanced data and random effects
Thanks Krzysztof, Your explanation makes a lot of sense. Chris Howden