Re: [R-sig-eco] [EXTERNAL] Use of geometric mean for geochemical concentrations

2024-01-24 Thread Cade, Brian S
I think I mis-sent this just to Phillip Dixon, so I am reposting it to the list.

Rich:  Just to expand on Phillip Dixon's reply a bit.  You can always estimate 
the median on the log-transformed scale, for example with quantile regression, 
and then back-transform to the original concentration scale without bias or 
loss of information, because the median, like all quantiles, is equivariant to 
nonlinear monotonic transformations such as the logarithm.  And as Phillip 
indicated, the mean estimated on the log-transformed scale, when back-transformed, 
is the geometric mean, which estimates the median on the original scale.  If you 
really require an estimate of the expected value (the mean on the original 
concentration scale), Duan's (1983) smearing estimate is a general nonparametric 
retransformation method that can estimate the mean from an estimated median, and 
it is fairly simple to apply.  If you need an estimate of the median that handles 
below-detection-limit data, quantile regression (quantreg package) has a 
censored-data estimator that can be used.
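
A minimal sketch of these suggestions (hypothetical data and variable names; the
censored-data option is only pointed to, not run):

library(quantreg)

## hypothetical concentrations that are roughly lognormal in a covariate x
set.seed(1)
dat <- data.frame(x = runif(50, 0, 10))
dat$conc <- exp(0.2 + 0.1 * dat$x + rnorm(50, sd = 0.5))

fit.med  <- rq(log(conc) ~ x, tau = 0.5, data = dat)  # median on the log scale
med.orig <- exp(fitted(fit.med))                      # back-transformed median, no bias

## Duan's (1983) smearing estimate of the mean on the original scale:
## multiply the back-transformed fit by the average of the exponentiated residuals
smear     <- mean(exp(residuals(fit.med)))
mean.orig <- med.orig * smear

## For below-detection-limit (left-censored) data, see ?crq in quantreg for the
## censored quantile regression estimators.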

Brian



Brian S. Cade, PhD

U. S. Geological Survey (emeritus)
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov
tel:  970 404-0447


From: R-sig-ecology  on behalf of Rich 
Shepard 
Sent: Monday, January 22, 2024 9:25 AM
To: r-sig-ecology@r-project.org 
Subject: [EXTERNAL] [R-sig-eco] Use of geometric mean for geochemical 
concentrations






A statistical question, not specific to R.

I'm asking for a pointer to a source of definitive descriptions of what
types of data are best summarized by the arithmetic, geometric, and harmonic
means.

As an aquatic ecologist I see regulators apply the geometric mean to
geochemical concentrations rather than using the arithmetic mean. I want to
know whether the geometric mean of a set of chemical concentrations (e.g.,
in mg/L) is an appropriate representation of the expected value. If not, I
want to explain this to non-technical decision-makers; if so, I want to
understand why my assumption is wrong.

TIA,

Rich

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] [EXTERNAL] Re: modeling variation along a gradient

2022-10-27 Thread Cade, Brian S
You might want to consider modeling multiple quantiles in a quantile regression 
(package quantreg) as a more general, highly flexible way of modeling variation 
in a response (y) as some function of predictors (X) without having to specify 
a distributional form for the conditional response.  Lots of options available 
in quantreg package with many examples in the vignettes provided by Roger 
Koenker.
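
A minimal sketch (simulated heteroscedastic data) of what that looks like; the
spread of the estimated quantiles, and how their slopes diverge, is the model of
changing variation:

library(quantreg)

set.seed(42)
x <- runif(200, 0, 10)
y <- 2 + 0.5 * x + rnorm(200, sd = 0.2 + 0.3 * x)   # variance increases with x

fit <- rq(y ~ x, tau = c(0.10, 0.25, 0.50, 0.75, 0.90))
summary(fit)        # slopes differ across quantiles when variation changes along x
plot(summary(fit))  # coefficients (with confidence bands) plotted against tau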

Brian


Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov
tel:  970 226-9326


From: R-sig-ecology  on behalf of 
Stratford, Jeff 
Sent: Thursday, October 27, 2022 6:26 AM
To: Zoltán Botta-Dukát 
Cc: r-sig-ecology@r-project.org 
Subject: [EXTERNAL] Re: [R-sig-eco] modeling variation along a gradient






Hi Zoltan,

The goal is to model variation per se, not the mean (so a function like var
= a + Bx instead of y = a + Bx). Thanks for the suggestion for glmmTMB -
I'll look into it.

Best,

Jeff

Jeffrey A. Stratford, PhD.
Department of Biology and Earth System Sciences
84 W South Street
Wilkes University, PA 18766 USA
https://sites.google.com/a/wilkes.edu/stratford/



On Thu, Oct 27, 2022 at 2:30 AM Zoltán Botta-Dukát <
botta-dukat.zol...@ecolres.hu> wrote:

> Dear Jeff,
>
> I'm not sure whether you want to model the mean while accounting for
> variation, or to model the change of variation along the gradient. Note that
> in most distributions (the Gaussian is an exception) the mean and variance
> are not independent, and often the variance increases with increasing mean. So
> choosing the correct distribution may solve the problem when the aim is
> modelling the mean. If not, or if the aim is modelling the variation itself,
> glmmTMB lets you fit models to both the mean and the variance.
>
> Best wishes,
>
> Zoltan
>
> On 2022. 10. 27. at 2:25, Stratford, Jeff wrote:
> > Hi all,
> >
> > I have several data sets that relate some ecological phenomenon to a
> > gradient (e.g., avian species richness and urbanization, weevil
> prevalence
> > in acorns and latitude - see
> https://www.mdpi.com/1424-2818/13/7/303).
>
> > Most
> > of these data sets have more variation on one end than the other. I have
> > considered quantile regression but this seems inadequate. I would like to
> > model variation per se along a gradient.
> >
> > Is there an R function that models sd or variance along a gradient. I was
> > thinking that it would require a moving window of some sorts.
> >
> > Any suggestions would be appreciated.
> >
> > Thanks,
> >
> > Jeff
> > 
> > Jeffrey A. Stratford, PhD.
> > Department of Biology and Earth System Sciences
> > 84 W South Street
> > Wilkes University, PA 18766 USA
> > https://sites.google.com/a/wilkes.edu/stratford/
> > 
> >
> >
> > ___
> > R-sig-ecology mailing list
> > R-sig-ecology@r-project.org
> > 

Re: [R-sig-eco] [EXTERNAL] Queries on regression analysis

2019-08-08 Thread Cade, Brian
While the previous responders have provided some useful advice, some of it
was a bit misleading.  The linear model for continuous responses does not
automatically assume a normal distribution (of the errors, of which the
residuals are an estimate).  A specific way of estimating the conditional
mean in the linear model assumes a normal distribution of errors.  More
generally, you can estimate the quantiles of the empirical distribution of
the continuous responses with linear quantile regression, which makes no
assumption about a parametric form of the error distribution and naturally
accommodates heterogeneity.  You can use the median estimate (0.50
quantile regression) as a measure of central tendency rather than the
mean.  But, almost always it is more informative to estimate some interval
of quantiles (say 0.10 to 0.90) to adequately characterize how the response
changes with covariates.   More advanced transformation approaches with
quantile regression will allow you to handle proportions (responses bounded
on [0, 1] interval) and discrete counts.
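
A minimal sketch (simulated gall-diameter data along an elevation gradient,
hypothetical values) of estimating an interval of quantiles:

library(quantreg)

## hypothetical data: gall diameter sampled at 250 m steps from 1500 to 2500 m asl
set.seed(8)
gall <- data.frame(elevation = rep(seq(1500, 2500, by = 250), each = 20))
gall$diameter <- rgamma(nrow(gall), shape = 4, rate = 4 / (6 - 0.002 * gall$elevation))

## quantiles 0.10 to 0.90 of diameter as linear functions of elevation
fit <- rq(diameter ~ elevation, tau = seq(0.10, 0.90, by = 0.10), data = gall)
summary(fit, se = "rank")   # rank-based intervals; no parametric error assumption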

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov 
tel:  970 226-9326



On Thu, Aug 8, 2019 at 5:16 AM Chitra Baniya  wrote:

> We have data on "gall_diameter" and "elevation". Our objective is to
> see how gall diameter varies along the elevation gradient. In our case
> the elevation gradient refers to the range between 1500 and 2500 m asl, with
> data collected at every 250 m interval.
> Our data did not follow a normal distribution. Gall_diameter is a continuous
> dependent variable. Can we apply "glm" to see the relationship between
> gall_diameter and elevation in our case?
> How can we decide which distribution (family) fits our data?
> Thank you all in advance for your help.
>
>
> --
>
>
>
>
>
>
> *Chitra Bahadur Baniya, M Sc, M Phil, PhD, Associate Professor, Central
> Department of Botany, Tribhuvan University, Kirtipur, Kathmandu, Nepal*
> orcid.org/-0002-8746-7601
>
>
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] [EXTERNAL] Re: Fitting a GLMM to a percent cover data with glmer or glmmTMB

2018-11-29 Thread Cade, Brian
Beta regression can be used for modeling proportion (or percentage) cover
data, but there are some issues with using it if you have many values of
0.0 or 1.0.  A much more flexible approach that I've used is to use
quantile regression with the proportion response (y) data logit
transformed.  Much easier to deal with 0.0 or 1.0 (boundary values) with
quantiles than with the means being estimated by beta regression.  The
quantreg package in R has what you need, but I think this logit-transformed 
quantile regression approach is also implemented in another package 
(perhaps the Qtools package).  See Bottai et al. (2010.  Logistic quantile 
regression for bounded outcomes.  Statistics in Medicine 29: 309-317.).
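
A minimal sketch (simulated cover proportions) of the logit-transform approach,
nudging exact 0s and 1s slightly inside the interval as in Bottai et al. (2010):

library(quantreg)

set.seed(7)
x <- runif(100)
p <- rbeta(100, 2 + 4 * x, 3)                    # proportion cover on [0, 1]
p[sample(100, 5)] <- 0; p[sample(100, 5)] <- 1   # add some boundary values

eps <- 0.005                                     # small offset for the 0s and 1s
p.adj <- pmin(pmax(p, eps), 1 - eps)

fit <- rq(qlogis(p.adj) ~ x, tau = c(0.25, 0.50, 0.75))
fitted.p <- plogis(fitted(fit))   # back-transformed quantiles on the proportion scale
                                  # (valid because quantiles are equivariant to
                                  #  monotone transformations)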

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov 
tel:  970 226-9326



On Thu, Nov 29, 2018 at 7:24 AM Botta-Dukát Zoltán <
botta-dukat.zol...@okologia.mta.hu> wrote:

> Hi,
>
> I'm sure that the binomial is unsuitable for relative cover. The binomial
> distribution is defined as the number of successes in independent trials, and I
> think this scheme cannot be applied to relative cover or visually
> estimated cover. It matters because both the number of trials and the
> probability of success influence the mean and variance, so both should
> have a meaning that corresponds to terms in this scheme.
>
> Unfortunately, I have no experience with the Tweedie distribution. I am also
> interested in the experience of others! In theory an alternative would be a
> zero-inflated beta distribution (after rescaling percentages to the
> zero-to-one interval). Does someone have experience with it (including its
> availability in R)?
>
> Cheers
>
> Zoltan
>
> On 2018. 11. 28. at 20:47, Vasco Silva wrote:
> > Hi,
> >
> > I am trying to fit a GLMM on percent cover for each species using glmer:
> >
> >> str(cover)
> > 'data.frame': 102 obs. of  114 variables:
> > $ Plot : Factor w/ 10 levels "P1","P10","P2",..: 1 1 1 1 1 3 3 ...
> > $ Sub.plot: Factor w/ 5 levels "S1","S2","S3",..: 1 2 3 4 5 1 2 ...
> > $ Grazing : Factor w/ 2 levels "Fenced","Unfenced": 1 1 1 1 1 1 1  ...
> > $ sp1 : int  0 0 0 1 0 0 1 ...
> > $ sp2 : int  0 0 0 0 0 3 3 ...
> > $ sp3 : int  0 1 0 0 1 3 3 ...
> > $ sp4 : int  1 3 13 3 3 3 0 ...
> > $ sp6 : int  0 0 0 0 0 0 0 ...
> >   ...
> > $ tot  : int  93 65 120 80 138 113 ...
> >
> > sp1.glmm <- glmer (cbind (sp1, tot- sp1) ~ Grazing + (1|Plot),
> data=cover,
> > family=binomial (link ="logit"))
> >
> > However, I wonder if a binomial distribution can be used (proportion of
> > species cover out of total cover) or if I should fit the GLMM with
> > glmmTMB (Tweedie distribution)?
> >
> > I would greatly appreciate it if someone could help me.
> >
> > Cheers.
> >
> > Vasco Silva
> >
> >
> > ___
> > R-sig-ecology mailing list
> > R-sig-ecology@r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] different result for permutest (vegan, R) and permdisp (PERMANOVA/Primer)

2016-11-28 Thread Cade, Brian
Ellen:  Not sure why these differences would occur but note that with two
groups of 3 observations each there are only 6!/(3!3!) = 20 possible
permutations.  Not sure where your 720 came from.  Also, I would not expect
a permutation test for homogeneity of dispersions to be very useful with
such small sample sizes.
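
A quick check of that enumeration in R:

choose(6, 3)                                   # 20 distinct assignments of 6 samples to two groups of 3
factorial(6) / (factorial(3) * factorial(3))   # the same 6!/(3!3!)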

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov 
tel:  970 226-9326


On Wed, Nov 23, 2016 at 5:40 AM, Ellen Pape  wrote:

> Dear all,
>
> As I want to switch for my multivariate data analyses from PRIMER/PERMANOVA
> to R, I am comparing all tests I am doing in the former with those
> performed in the latter program, to see whether results agree.
>
> I performed a test for the homogeneity of multivariate dispersions in both
> programs (R: betadisper + permutest, Primer: PERMDISP, both using
> centroids), and I have very different results, i.e. very different P
> values. The P value of the permutest (in R) is 0.0013; the one of PERMDISP
> is 0.30. I am really puzzled by this difference, as:
>
> - I have used the same pre-treatment for the data, i.e. standardize (divide
> all entries by row totals, with 1 sample per row) and square-root transform
> - I have used Bray-Curtis dissimilarities in both cases; however, in
> Primer/Permanova you calculate a resemblance matrix and thereon you perform
> the PERMDISP. In R you calculate Bray-Curtis distances, which you then
> submit to betadisper and then you can test significance of differences
> using permutations with permutest (if you don't want to use ANOVA)
>
> I have looked at the Bray-Curtis resemblance values in Primer/Permanova and
> I have converted them to distances (Tools-> Dissim), but I get different
> values for Bray-Curtis distances (though they all seem to be higher for
> Primer)! I have pasted the values in Excel so you can have a look for
> yourself. I think this might be the reason for these underlying
> differences?
>
>
> One thing I also saw was that, in the case of my dataset, where I have two
> groups of 3 observations each, permutest had to perform a complete
> enumeration of permutations (= 720 - 1 = 719), but the PERMDISP in
> Primer/PERMANOVA still gave as output that it did  permutations, which
> then seems impossible to me!
>
> Thank you for any advice!
>
> Best regards
> Ellen
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>
>


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] How to obtain P value from Monte Carlo sampling for adonis (permanova)?

2016-11-18 Thread Cade, Brian
Ellen:  If you are running permutation procedures with data that have very
small sample sizes in each group (your two groups of n = 3 each yield only
6!/(3!3!) = M = 20 permutations under Ho), then you just have to live with
the fact that the smallest possible P-value is 1/M (= 0.05 for your
two-group example).  There is nothing magical about P < 0.05 anyway.  The
Monte Carlo resampling approach to obtaining permutation P-values
really is just a method to approximate the exact permutation
P-value (usually used when M is so large that you cannot enumerate it
exactly in reasonable computation time), so you should not rely on it when the
number of permutations M is so small, and especially not just because you
might obtain a P < 1/M.  If you obtain a P-value < 0.05 in your example
using the Monte Carlo resampling procedure, all that indicates is that the
Monte Carlo resampling approach is a poor approximation in this small-sample
situation.  I think it is always preferable to obtain and interpret
the exact permutation distribution if it is easily calculable.  Using a
crummy approximation just because you want P < 0.05 seems unreasonable to
me.
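
A minimal sketch (simulated community data) of enumerating the permutations
yourself and handing them to vegan, along the lines of the quoted advice from
Jari Oksanen below (adonis2 in current vegan; adonis in older versions):

library(vegan)
library(permute)

set.seed(3)
comm <- matrix(rpois(6 * 10, 5), nrow = 6)       # 6 samples x 10 species
grp  <- factor(rep(c("A", "B"), each = 3))

## all 720 orderings of the 6 samples: the 719 returned by allPerms() plus the observed one
perms <- rbind(1:6, allPerms(6))
adonis2(comm ~ grp, permutations = perms, method = "bray")
## the exact P-value cannot be smaller than 1/20 because only 20 distinct
## assignments of samples to the two groups exist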

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov 
tel:  970 226-9326


On Fri, Nov 18, 2016 at 1:39 AM, Ellen Pape  wrote:

> Dear all,
>
> I have recently decided to switch from Permanova/Primer to R, because the
> latter is freeware (and I don't know for how long I will still have a
> license). However, if I cannot seem to resolve my problem (see below), I
> might have to go back to using Primer/Permanova.
>
> If I run pairwise permanova/adonis tests on my data, the number of unique
> permutations is small (I have two groups, each group has 3 observations)
> and the minimum P value I can get is larger than the alpha value I (and
> most people) that I use to determine statistical significance (i.e. 0.05).
> In the manual of the PERMANOVA+ add-on in Primer by Anderson et al. (2008)
>  it is mentioned (page 28 and onwards) that when the number of unique
> permutations is small (<100) than one should preferably interpret the
> Monte-Carlo p value.
>
> Does anyone know how to do this in R?
>
>
> In my internet search I stumbled upon this page:
> http://r.789695.n4.nabble.com/monte-carlo-simulations-in-
> permanova-in-vegan-package-td4714034.html.
> "JAri Oksanen answered here: 2. If you want to do so, you can generate your
> resampling matrices by hand and use that matrix as the argument of
> permutations=. See the documentations (?adonis) which tells how to do so.
> ", but I don't understand how to this..
>
> Thank you very much!
>
> Ellen
>
>
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>
>


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Resource selection- correlation between variables

2016-06-07 Thread Cade, Brian
Teresa:  There probably are no simple shortcuts here - you need to
investigate the correlation structure for each of your possible
comparisons.  You can use the variance inflation factor function vif() in
the car package for glms, which includes an extension for categorical
predictors.  I recommend the VIF over pairwise correlations because it is the
linear correlation among multiple predictors that creates issues.  Note
that really large VIFs (e.g., >10 or so) are likely to indicate instability
in the standard errors of regression coefficient estimates.  Smaller VIFs
(1-5) largely indicate an issue with how to interpret regression coefficients
as partial effects.  VIFs close to 1 indicate no linear correlation.  You
don't necessarily need to eliminate predictor variables from a model just
because there is some multicollinearity, e.g., VIFs in the range 1-5.  You
just need to understand that the interpretation of the regression
coefficient as a partial effect for a unit change in the predictor variable
really needs to be interpreted as a unit change in the part of the
predictor that is not linearly related to the other predictors (see Cade
2015.  Ecology 96:2370-2382).  This, of course, is why it is so wonderful
to have perfectly uncorrelated predictors - the interpretation of partial
effects is simpler. But that is not realistic for most resource selection
analyses.
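
A minimal sketch (simulated used/available data with hypothetical variable
names) of that VIF check:

library(car)

set.seed(11)
n <- 200
rough   <- rnorm(n)
dwater  <- 0.6 * rough + rnorm(n)                     # correlated with roughness
landcov <- factor(sample(c("forest", "shrub", "open"), n, replace = TRUE))
use     <- rbinom(n, 1, plogis(0.5 * rough - 0.3 * dwater))

m <- glm(use ~ rough + dwater + landcov, family = binomial)
vif(m)   # factors get a generalized VIF; values near 1 mean little linear correlation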

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov 
tel:  970 226-9326


On Sat, Jun 4, 2016 at 9:54 AM, Teresa Oliveira  wrote:

> Dear all,
>
> I have a doubt regarding correlation between variables, and I would like to
> hear your opinion on this.
>
> Background:
> I am working with telemetry data of a single species, collected by several
> researchers, from five study areas. I aim to analyse resource selection
> (with resource selection functions, RSF), applying Design II (individual
> locations (used resource units) against study area (available resource
> units)) and Design III (individual locations (used resource units) against
> home range area (available resource units)). I have 13 variables in total:
> 10 binary variables (land cover characteristics) and 3 continuous variables
> (roughness, distance to water and distance to human settlements).
>
> I want to construct models for each study area and also a global model
> including all five study areas, because I want to see if it is possible to
> apply a global model for all areas or if they are very different from each
> other. I'm planning to use a sampling with replacement method to understand
> the effect of each area on the global model.
>
> Question:
> Before starting with RSF, I want to check if my variables are independent,
> and I'm not quite sure how I am supposed to conduct the analyses. Should I
> use all data (from all individuals in all study areas) to test correlation
> between all variables? Or should I conduct different analyses for each
> study area (or even each individual)?
>
> Does anyone have any suggestions?
>
> Thank you very much in advance,
> Teresa
>
>
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Quantile regressions across several predictors

2016-05-26 Thread Cade, Brian
Peter:  Your question is not quite clear to me.  I thought at first you
might be talking about quantile regression but then you mentioned the 50%
quantile (which is not the mean) of the predictor and binning.  So I'm not
sure exactly what you are after. But under the presumption that you might
really be thinking along the lines of quantile regression (which does not
require binning by predictors), I took your example data and ran it through
a linear quantile regression from quantreg package, where quantiles of the
continuous dependent variable are estimated conditional on an additive
effect of the three predictors provided.  Some summary output below for
0.10, 0.25, 0.50, 0.75, and 0.90 quantiles.  Here it looks as if only pred2
has a strong nonzero (negative) effect for the upper quantiles (0.50, 0.75,
and 0.90) of the dependent variable based on 95% confidence intervals not
overlapping zero.  If this is along the lines of what you were thinking
about, then perhaps you can frame your question in a more focused fashion
and I might be able to provide better advice.  There is much more that can
be done with quantile regression. Plotting this sort of summary info is
especially useful.

Brian


example.qr.results <- rq(dependent ~ pred1 + pred2 + pred3, data = example.data,
                         tau = c(0.10, 0.25, 0.50, 0.75, 0.90))
summary(example.qr.results, se = "rank", iid = FALSE, alpha = 0.05)

Call: rq(formula = dependent ~ pred1 + pred2 + pred3, tau = c(0.1,
0.25, 0.5, 0.75, 0.9), data = example.data)

tau: [1] 0.1

Coefficients:
            coefficients    lower bd     upper bd
(Intercept)   1665.53049   -17.44156   2493.10597
pred1            8.81923   -40.77369     53.37269
pred2          -57.39947   -85.39144     23.59046
pred3          -19.74443   -60.76278     61.19992

Call: rq(formula = dependent ~ pred1 + pred2 + pred3, tau = c(0.1,
0.25, 0.5, 0.75, 0.9), data = example.data)

tau: [1] 0.25

Coefficients:
            coefficients    lower bd     upper bd
(Intercept)   1231.52601   821.28092   1935.37219
pred1           -2.25995   -29.68130     30.79243
pred2          -20.83135   -62.10712      3.75916
pred3           -3.51839   -23.45116     13.38838

Call: rq(formula = dependent ~ pred1 + pred2 + pred3, tau = c(0.1,
0.25, 0.5, 0.75, 0.9), data = example.data)

tau: [1] 0.5

Coefficients:
            coefficients    lower bd     upper bd
(Intercept)   1714.10796   729.52807   2553.46234
pred1            2.02560   -39.70704     29.34070
pred2          -41.81862   -81.38048     -4.06101
pred3            2.90515   -18.68419     21.02118

Call: rq(formula = dependent ~ pred1 + pred2 + pred3, tau = c(0.1,
0.25, 0.5, 0.75, 0.9), data = example.data)

tau: [1] 0.75

Coefficients:
            coefficients     lower bd     upper bd
(Intercept)   2118.28691   1186.20556   3496.67829
pred1           17.75399    -38.41521     32.63466
pred2          -62.43047   -113.90480    -15.35846
pred3           10.53731    -41.48255     35.23541

Call: rq(formula = dependent ~ pred1 + pred2 + pred3, tau = c(0.1,
0.25, 0.5, 0.75, 0.9), data = example.data)

tau: [1] 0.9

Coefficients:
            coefficients     lower bd     upper bd
(Intercept)   2855.31941   1631.16351   4217.13007
pred1            1.31388    -71.21536     65.65507
pred2          -77.54635   -106.11297    -30.33534
pred3            1.74284    -63.49143     56.91477
Warning messages:
1: In rq.fit.br(x, y, tau = tau, ci = TRUE, ...) :
  Solution may be nonunique
2: In rq.fit.br(x, y, tau = tau, ci = TRUE, ...) :
  Solution may be nonunique
3: In rq.fit.br(x, y, tau = tau, ci = TRUE, ...) :
  4.22535211267606 percent fis <=0
4: In rq.fit.br(x, y, tau = tau, ci = TRUE, ...) :
  Solution may be nonunique
>




Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov 
tel:  970 226-9326


On Wed, May 25, 2016 at 4:43 PM, peterhouk1 .  wrote:

> Greetings -
>
> I'm wondering if folks might be able to point out the best approach for
> examining the influence of any particular quantile of many predictor
> variables simultaneously?  For instance, the below data show three
> potential predictors of a dependent variable, but in this case, we might
> want to use the 50% quantile (i.e., mean) of each predictor.  I'm wondering
> if there is any standard approach for dealing with multiple predictors,
> that when binned, can no longer be contrasted in a single model.
>
> Thanks for any discussion and guidance,
>
> Peter
>
>
> pred 1 pred 2 pred 3 dependent
> 2 14 4 800.5987
> 2 18 11 414.1341
> 11 15 12 825.5466
> 11 15 12 1143.972
> 11 14 3 904.4725
> 11 18 15 433.1852
> 11 22 14 726.6624
> 11 16 2 1450.15
> 12 20 2 670.4164
> 12 19 7 741.6311
> 12 15 7 1835.707
> 13 18 14 810.5779
> 13 22 5 418.6701
> 13 16 12 1127.189
> 13 20 1 782.0013
> 14 21 4 875.8959
> 14 16 13 1077.747
> 14 11 9 1949.56
> 15 15 14 972.0584
> 16 20 7 1048.716
> 16 11 8 689.4675
> 16 16 11 1523.632
> 16 21 11 816.4746
> 16 14 4 1303.638
> 16 21 13 1270.525
> 16 20 2 

Re: [R-sig-eco] MuMIn package, Inquiry on summary output _ Conditional or Full average models?

2016-02-16 Thread Cade, Brian
You might want to reconsider whether it makes any sense to model average the
individual regression coefficients.  See Cade (2015.  Model averaging and
muddled multimodel inferences.  Ecology 96: 2370-2382).

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov 
tel:  970 226-9326


On Sun, Feb 14, 2016 at 9:07 AM, Daniel Gruner  wrote:

> Dear Laura,
>
> Grueber et al. (2011) discusses the distinctions and the rationale for
> making this choice (p 705-706), citing Burnham & Anderson (2002) and
> Nakagawa and Freckleton (2011).
>
>
> Burnham KP, and DR Anderson (2002). Model Selection and Multimodel
> Inference: a Practical Information-Theoretic Approach. 2nd edition.
> Springer, New York.
>
> Grueber CE, S Nakagawa, RJ Laws, and IG Jamieson (2011). Multimodel
> inference in ecology and evolution: challenges and solutions. Journal of
> Evolutionary Biology 24:699-711.
>
> Nakagawa S, and RP Freckleton (2011). Model averaging, missing data and
> multiple imputation: a case study for behavioural ecology. Behavioral
> Ecology and Sociobiology 65:103-116.
>
>
>
>
>
> On 2/14/2016 10:02 AM, Laura Riggi wrote:
>
>> Dear all,
>>
>> I have a question regarding the output for model averaging in R with
>> MuMin package. In the summary for model averaging two models of coefficient
>> calculations come out: the "full average" and the "conditional (or subset)
>> average" model (example of output below).
>>
>> As explained on the MuMin package pdf:
>> "The 'subset' (or 'conditional') average only averages over the models
>> where the parameter appears. An alternative, the 'full' average assumes
>> that a variable is included in every model, but in some models the
>> corresponding coefficient (and its respective variance) is set to zero.
>> Unlike the 'subset average', it does not have a tendency of biasing the
>> value away from zero. The 'full' average is a type of shrinkage estimator
>> and for variables with a weak relationship to the response they are smaller
>> than 'subset' estimators."
>>
>> However, I cannot find information online concerning the theory behind
>> these different outputs. I am not sure what is the point of having a
>> "conditional" model as it seems to go against the idea of doing a model
>> averaging analysis.
>> Do you know of articles / books that discuss this? When should we use one
>> or the other?
>> Any advice would be appreciated.
>>
>> summary(model.avg(dd, subset = delta < 2))
>>>
>> Call:
>> model.avg.model.selection(object = dd, subset = delta < 2)
>>
>> Component model call:
>> lme.formula(fixed = log(Parasitoi_S1.S2 + 1) ~ <8 unique rhs>, data =
>> data, random = ~1 | Field.x/Site.x, method
>>   = ML, na.action = na.fail)
>>
>> Component models:
>>df  logLik   AICc delta weight
>> 1345   8 -161.74 340.52  0.00   0.22
>> 3457 -162.97 340.74  0.22   0.19
>> 12345  9 -161.26 341.82  1.31   0.11
>> 13456  9 -161.36 342.03  1.51   0.10
>> 2345   8 -162.53 342.10  1.58   0.10
>> 3456   8 -162.54 342.11  1.60   0.10
>> 35 6 -164.76 342.12  1.60   0.10
>> 1457 -163.84 342.47  1.96   0.08
>>
>> Term codes:
>>                    L  OSR2012_X500  OSR2013_X500  Weed.cover  Wood_X500  Weed.cover:Wood_X500
>>                    1             2             3           4          5                     6
>>
>> Model-averaged coefficients:
>> (full average)
>>                        Estimate  Std. Error  Adjusted SE  z value  Pr(>|z|)
>> (Intercept)           2.5693356   0.5295081    0.5337822    4.813   1.5e-06 ***
>> L                    -0.0005893   0.0007720    0.0007756    0.760     0.447
>> OSR2013_X500         -3.7932641   2.1558940    2.3307509    1.627     0.104
>> Weed.cover            0.1331237   0.0915813    0.0922877    1.442     0.149
>> Wood_X500            -5.5516524   3.2502461    3.5300659    1.573     0.116
>> OSR2012_X500          0.4326628   1.3007077    1.3966718    0.310     0.757
>> Weed.cover:Wood_X500  0.1922066   0.6215019    0.6255589    0.307     0.759
>>
>> (conditional average)
>>                        Estimate  Std. Error  Adjusted SE  z value  Pr(>|z|)
>> (Intercept)           2.5693356   0.5295081    0.5337822    4.813   1.5e-06 ***
>> L                    -0.0011489   0.0007205    0.0007279    1.578    0.1145
>> OSR2013_X500         -4.1301091   1.9155698    2.1268744    1.942    0.0522 .
>> Weed.cover            0.1474601   0.0847131    0.0855581    1.724    0.0848 .
>> Wood_X500            -5.5516524   3.2502461    3.5300659    1.573    0.1158
>> OSR2012_X500          2.0508163   2.1681256    2.4346913    0.842    0.3996
>> Weed.cover:Wood_X500  0.9639767   1.0923693    1.1039224    0.873    0.3825
>> ---
>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> Relative variable importance:
>>   Wood_X500 OSR2013_X500 Weed.cover LOSR2012_X500
>> 

Re: [R-sig-eco] AIC in R: back-transforming standardized model parameters (slopes)

2016-01-12 Thread Cade, Brian
Just to echo Bob O'Hara's comment and elaborate a bit more - Don't model
average the regression coefficients, especially if you are considering
models with and without interactions among the predictors. Follow the link
provided by Bob to Cade (2015.  Model averaging and muddled multimodel
inferences. Ecology 96:2370-2382) to see why model-averaged regression
coefficients as conventionally done following Burnham and Anderson provides
meaningless estimates of partial effects in the presence of
multicollinearity and addresses a concept that doesn't exist (model
uncertainty in regression coefficients) when there is no multicollinearity.
  What you perhaps really need to think about is why you want standardized
predictors. Sometimes they are useful and sometimes not.  Here you seem
to be going to a lot of trouble to standardize and then to get back to
unstandardized estimates (Drew and Phil have provided good advice about
recovering estimates on the unstandardized scale), but without any indication of
how standardization is aiding your interpretations.  Note that in general,
when standardizing regression coefficients it would be more consistent with
the algebra of the regression coefficients to actually standardize them by
their partial standard deviations that adjusts for the linear relationship
(multicollinearity) among predictors (see the Cade 2015 paper to see why
this suggestion originally by Bring 1994 works).  This may be less relevant
to your simple one continuous predictor (time) model with an interaction
with two levels of a factor (air or ice), but it wasn't clear what other
models you might be considering.
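
A minimal sketch (simulated data) of the algebra implied by the standardization
equation quoted below, z.time = (time - mean(time))/(2*sd(time)): for a single
rescaled predictor, the raw-scale slope is just the standardized slope divided
by 2*sd(time).

set.seed(5)
time <- rep(c(2, 30.5, 58.5, 93, 120), each = 10)
y <- 3 + 0.02 * time + rnorm(length(time), sd = 0.5)

z.time <- (time - mean(time)) / (2 * sd(time))
b.std  <- coef(lm(y ~ z.time))["z.time"]

b.std / (2 * sd(time))        # recovers the raw-scale slope ...
coef(lm(y ~ time))["time"]    # ... identical to fitting on the original time scale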

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov 
tel:  970 226-9326


On Tue, Jan 12, 2016 at 8:16 AM, Bob O'Hara  wrote:

> On 12/01/16 10:54, Matt Perkins wrote:
>
>> Hi Drew and R-help,
>>
>> Many thanks for your email, and your explanations and suggestions have
>> been very helpful in aiding me to understand this problem. Should you, or
>> anyone else on the helplist have time, perhaps I may elaborate on my
>> problem with a little more detail.
>>
>> I appreciate the pragmatic suggestion to use a non-standardised model to
>> extract ‘real’ slope values that I can report, which is great to hear as I
>> had thought to do this, but wasn’t certain of its legitimacy, so extra
>> support in this direction gives me confidence. However, as I am actually
>> running a number of analyses this approach only works when AIC identifies a
>> clear single best model; in some of these analyses AIC finds good evidence
>> for multiple models in the top model set, such that I use model averaging
>> to produce averaged parameter estimates. Obviously with model averaged
>> parameters, it is not possible to run an alternate non-standardised model.
>> So I need to be able to take a model averaged slope parameter estimate from
>> the averaged model summary and back-transform it to a real slope value.
>>
> Oh well, that makes things much easier - you shouldn't use model averaged
> parameters. They're close to meaningless. :-) (see here, for example:
> http://onlinelibrary.wiley.com/doi/10.1890/14-1639.1/full).
>
> In your case, I don't think you need to worry at all about model
> selection: you are actually interested in the interaction, so you need it
> in the model. Beyond that, just follow Phil Dixon's advice.
>
> Bob
>
>
> The provided equation is useful:
>> z.time = (time - mean(time))/(2*sd(time))
>> However, I’m still a little uncertain how best I should employ it, or how
>> I might relate it to a model averaged slope estimate.
>>
>> Here is a brief worked example of my confusion, (data at the bottom of
>> the page):
>>
>> My analysis
>> I would like to test if treatment (kept in air or ice) affects nitrogen
>> (N) within shrimps over time. I have repeated measures per shrimp (
>> unique.id) that I use as a random factor to account for non-independence
>> within an individual.
>>
>> My global model is a linear mixed model:
>> model1<-lmer( N ~ time* air.ice + (1|unique.id), data=shrimp, REML=FALSE)
>>
>> Standardisation:
>> So presumably, using the above equation I can standardise my “time”
>> variable to create a new standardised time variable for use in the analysis.
>> “time” is a continuous variable but with 5 different values (2, 30.5,
>> 58.5, 93, 120 hr) where the mean = 60.8 and SD  = 43.34 (time data below).
>> So for each value of time:
>> new standardized time value = (old time value – 60.8)/(2*43.34).
>>
>> This produces a new standardised time variable (also below).
>>
>> I’m not sure how air.ice (2 level factor air or ice) is being
>> standardised when I use the default R code with package "arm":
>> stdz.model1<-standardize(model1, standardize.y=FALSE).
>> I believe the default, which I use, leaves this binomial factor unscaled
>> (as a binomial is 

Re: [R-sig-eco] help with censored regression model

2014-07-07 Thread Cade, Brian
Laura:  I think you need to include an interaction of Year and newWater
(i.e., Year:newWater) if you want trends across Year to be able to differ
by newWater categories (the common regression modeling approach of allowing
separate slopes and intercepts among different categorical groups in a
common model).  As you currently have your model parameterized, only the
intercept terms are able to differ among newWater categories (of course,
this assumes that you have newWater in as a design contrast among
categorical factor levels).
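
A minimal sketch of that change, reusing the objects set up in the quoted code
below (gamlss and survival loaded, gen.cens(LOGNO, type = "right") already run);
YEAR * newWater expands to YEAR + newWater + YEAR:newWater:

cfit <- gamlss(Surv(cy, ci) ~ YEAR * newWater, data = data, family = LOGNOrc)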

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov brian_c...@usgs.gov
tel:  970 226-9326



On Thu, Jul 3, 2014 at 2:17 PM, Lee, Laura laura@ncdenr.gov wrote:

 Hello all!

 I am trying to model censored catch-per-unit-effort (CPUE) data. The data
 are censored by trip limits that vary by year. I am using the 'gamlss' and
 'survival' packages. My model covariates are Year and newWater. I need to
 get annual trends in CPUE by waterbody (newWater). After running the model
 and computing the index, I find the trends are the same among all water
 bodies--the magnitude is slightly different. I would not expect the trends
 to be the same among the different areas. I would appreciate any
 assistance. Code is below.

 cy <- with(data, ifelse(NUMBERS >= Limit, Limit, NUMBERS))
 ci <- with(data, ifelse(NUMBERS >= Limit, 0, 1))
 data <- data.frame(data, cy, ci)
 rm(cy, ci)
 gen.cens(LOGNO, type = "right")
 cfit <- gamlss(Surv(cy, ci) ~ YEAR + newWater, data = data, family = LOGNOrc)
 data$y <- exp(fitted(cfit))
 out <- with(data, tapply(y, list(YEAR, newWater), mean))

 Thanks again for your help,

 Laura

 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] probability distribution for zero-inflated, right skewed data

2014-06-16 Thread Cade, Brian
You could try estimating the conditional cumulative distribution function
with quantile regression by estimating a large interval of quantiles (e.g.,
0.01 to 0.99 if your n is large enough).  Quantile regression will readily
handle skewed and heterogeneous responses.  Some finessing required to
check when estimates are above a mass of zeros but this is all doable.
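
A minimal sketch (simulated zero-inflated, right-skewed data): fit a grid of
quantiles; estimates sitting at zero correspond to the point mass of zeros.

library(quantreg)

set.seed(9)
n <- 300
x <- runif(n, 0, 10)
y <- ifelse(runif(n) < 0.3, 0, rlnorm(n, meanlog = 0.2 * x, sdlog = 0.8))

taus <- seq(0.05, 0.95, by = 0.05)
fits <- rq(y ~ x, tau = taus)
round(coef(fits)[, 1:5], 3)   # lower quantiles pinned at zero where zeros dominate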

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov brian_c...@usgs.gov
tel:  970 226-9326



On Mon, Jun 16, 2014 at 5:57 AM, Johannes Björk bjork.johan...@gmail.com
wrote:

 Dear all,

 Im looking into how to fit a GLM model (Im using rjags) with data that are
 heavily right skewed. In addition, some variables also zero-inflated. The
 data are species area distribution measured as total area (km^2) which is
 subsetted into area in tropical zone and area in temperate zone. The
 last two variables contain zeros.

 I have google zero-inflated models... and most that come up is
 zero-inflated negative binomial and zero-inflated negative poisson for
 count data. I reckon I cannot use any of these distributions since my
 variables are not discrete.

 Any pointer to which distribution(s) that might fit this kind of data
 would be much appreciated.

 Attached: Histograms of the data
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Extract residuals from adonis function in vegan package

2014-03-18 Thread Cade, Brian
You ought to be very careful about using residuals from one analysis as the
response variable in another analysis as the inferences about the second
analysis will almost certainly be flawed.  Best to try and do this another
way if at all possible.

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov brian_c...@usgs.gov
tel:  970 226-9326



On Tue, Mar 18, 2014 at 5:53 AM, Alicia Valdés
aliciavaldes1...@gmail.comwrote:

 Hi,

 I am using the adonis function in the vegan package to determine effects of
 different environmental factors in forest plant community composition in
 different regions. I would like to first use adonis to remove the region
 effect, this is, to fit a model like

 adonis_region <- adonis(community ~ region, permutations = 999, method = "bray")

  Where community is a presence/absence data matrix of species in forest
 patches belonging to different regions.

 Then I would like to use the residuals of this model as the response
 variable for some other analyses. I know I could use strata to model region
 as a block variable, but this does not interest me, as afterwards I want to
 perform another kind of analysis where I would like the region effect to be
 removed.

 My problem is that I cannot figure out how to get residual values from the
 adonis model.

 Any hint of some other kind of multivariate analysis that could solve this
 problem is very welcome.

 Thanks in advance,

 --
 Alicia Valdés
 Université de Picardie Jules Verne
 Unité EDYSAN (Ecologie et Dynamique des Systèmes Anthropisés)
 1, rue des Louvels
 F-80037 Amiens Cedex
 France
 tel:+33 322825775
 alicia.val...@u-picardie.fr
 http://www.u-picardie.fr/edysan/



 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology




___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] plotting models with confidence bands from glmer

2014-02-27 Thread Cade, Brian
Travis:  I wonder if you can modify the example from predict.lm to do
something comparable (saw this posting recently) with mixed effects models
from glmer().

?predict.lm

Offers this example, which seems to meet the request

x <- rnorm(15)
y <- x + rnorm(15)
predict(lm(y ~ x))
new <- data.frame(x = seq(-3, 3, 0.5))
predict(lm(y ~ x), new, se.fit = TRUE)
pred.w.plim <- predict(lm(y ~ x), new, interval = "prediction")
pred.w.clim <- predict(lm(y ~ x), new, interval = "confidence")
matplot(new$x, cbind(pred.w.clim, pred.w.plim[,-1]),
        lty = c(1,2,2,3,3), type = "l", ylab = "predicted y")


Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov brian_c...@usgs.gov
tel:  970 226-9326



On Thu, Feb 27, 2014 at 12:48 PM, Travis Belote travis_bel...@tws.orgwrote:

 Hi all,

 I'm wondering if someone can help me figure out how to produce plots of
 model fits that include 95% CI bands for a generalized linear mixed model.
 I'm using glmer in lme4 to run essentially an ANCOVA to investigate a
 three-way interaction between one categorical variable (3 different
 species) and 2 continuous variables on survival probability (0
 or 1) of trees.

 I've found a 3-way interaction between these variables. I have produced a
 stacked graph showing how the survival probabilities for different species
 (3 different lines) vary across a gradient of one of the variables and at 2
 levels of one of the other variables (shown by 2 panels). I used the
 parameter estimates to produce the predicted models as line graphs, but I'd
 like to add a confidence band around the models. I've been looking in Zuur
 et al's Mixed effects models and extensions in ecology in R, but wonder
 if someone has a trick to doing this.

 Thanks for any insights!
 Travis


 Travis Belote, Ph.D.
 Research Ecologist
 The Wilderness Society | Northern Rockies Regional Office
 503 W. Mendenhall, Bozeman, MT 59715
 office: 406.586.1600 x110 | cell: 406.581.3808

 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Mixed effect (intercept) model with AR1 autocorrelation structure

2013-07-18 Thread Cade, Brian
Jonathan:  I wonder if your solution might be as simple as using one of
your model forms but allowing different intercepts for different plots by
modeling them as fixed effects (using a categorical variable for plot).
 This would allow your time series model (whatever specification you use)
to have different average levels.  This approach would require 11 df but it
seems like you might have sufficient n to part with a few more df.
 Remember, there almost always is a fixed effect counterpart to any random
effect specification, they just may use more df.  Indeed, I might model the
entire problem with fixed effects, using a linear time trend, perhaps
autoregressive lags, perhaps sine and cosine terms for the weekly
fluctuations, a categorical predictor for plots, interactions between the
categorical predictor for plots and all the terms related to time to allow
each plot to have different time series and to obtain an average across
plots within treatment, and then a categorical predictor for treatment.
 Just requires sufficient available df.
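
A minimal sketch (simulated data, hypothetical names) of that all-fixed-effects
idea: each plot gets its own intercept plus linear-trend and weekly sine/cosine
terms; with plots nested in treatment, treatment effects would then come from
contrasts among the plot-level estimates.

set.seed(2)
dat <- expand.grid(time = 1:60, plot = factor(1:12))
dat$temp <- 15 + 0.02 * dat$time + 2 * sin(2 * pi * dat$time / 7) + rnorm(nrow(dat))

fit <- lm(temp ~ plot * (time + sin(2 * pi * time / 7) + cos(2 * pi * time / 7)),
          data = dat)
length(coef(fit))   # 12 plots x 4 terms = 48 fixed-effect df, the cost noted above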

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov brian_c...@usgs.gov
tel:  970 226-9326



On Thu, Jul 18, 2013 at 8:34 AM, Henkelman, Jonathan 
jonathan.henkel...@usask.ca wrote:

 Gavin,

 Welcome to the prairies...

 I think this thread is getting off track... a few comments on the
 autocorrelation model, then bringing it back to the question I was trying
 to ask:

 I agree completely that we *must* model the time-series because our
 readings are not independent and hence we are violating the assumptions of
 a standard model.  There is no clear deterministic trend to the temperature
 readings: we are only modelling summer temperature and the single arch of
 the season is lost in weekly stochasticity.  There is also no significant
 linear trend (p-value = 0.3).  As I state in my first paragraph I have
 modelled the summer trend with a gam as an alternative approach (highly
 significant with 9 df).  However, even the basic AR1 model gives cleaner
 residuals.  Also, as I mention in bullet 3, I have an ARMA(2,0) model
 working that does a better job of dealing with the autocorrelation (as
 shown by ACF); I will expand this model once the conceptual problems I
 presented are sorted out.

 The problems I have with modelling the summer temperatures as a
 deterministic effect (fixed effect, gam) are outlined in my previous post:
 it limits our inference (as per Zuur 327, although he is discussing a
 slightly different context); and whatever gam I come up with this year
 won't be consistent with next year, whereas an autocorrelation model will
 be.  I also think it is more ecologically valid. I can see there could be
 different points of view here however and debate is healthy.

 Coming back to what I see as the problem at hand...  I _have_ several
 models that adequately deal with the autocorrelation.  Normalized residuals
 for the AR1 model plotted through time are shown in Figure 2 (see end of
 paragraph)  However, residuals plotted against Plot# are shown in Figure 1.
  I don't have (too much) of a problem with Figure 2 (although variance
 seems to be decreasing with time)... I do have a big problem with figure 1!
  This is what I wanted to address in my new model -- a random intercept
 model to account for the different means (and variances) of the residuals
 in each plot.

 [My plots don't include in the posting so I've put them on the web:
 http://homepage.usask.ca/~jmh842/Figure%201.pdf
 http://homepage.usask.ca/~jmh842/Figure%202.pdf
 ]

 However, this model doesn't work! -- it runs, but gives no random effect
 (see original post for code and results...).  The residuals of this new
 model show an identical pattern to Figure 1.  Hence something is going
 wrong in the computation element within nlme::lme.  This is the question I
 wanted to address with this post.

 I was up examining Zuur last night and found a passing reference to this
 problem on page 160 It may be an option to extend the model with the AR1
 correlation, with a random intercept on nest [a case which would correspond
 exactly to my scenario]. Such a model allows for the compound correlation
 between all observations from the same nest, and temporal correlation
 between observations from the same nest _and_ night.  *But the danger is
 that the random intercept and auto-correlation will fight with each other
 for the same information.*  [* are mine...] I think this is exactly what
 is happening: the two correlation specifications are fighting with each
 other and the mixed intercept is losing.

 I have had many suggestions about other ways to model this dataset.  I
 have been and will continue to pursue these ideas (I'd also like to build a
 simple Baysian model as Zuur does in chapter 23 to deal with a similar
 problem).  However, I have not had anyone comment directly on my two main
 

Re: [R-sig-eco] Comparing variable importance across sites

2013-06-24 Thread Cade, Brian
Chris:  It may make sense to compare relative importance of predictors
across regions.  But it makes NO SENSE to base measures of relative
importance of predictors in your models on AIC weights.  AIC weights apply
to an entire model (1 to many predictors) not to individual predictors
within a model.  This is a flawed procedure advocated by Burnham and
Anderson.  Check out the simulation study by Murray and Connor (2009.
 Ecology 90:348-355). The AIC weights approach to relative importance could
only do anything useful if all your models included a single predictor.
 There are lots of other legitimate approaches to computing relative
importance of predictors within your models (ratio of t-statistics,
standardized coefficients, hierarchical partitioning, variance
decomposition, etc.).  Use those, not AIC weights.
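
A minimal sketch (simulated single-site data, hypothetical names) of two of the
within-model alternatives named above: t-statistic ratios and an R^2 (variance)
decomposition via the relaimpo package.

library(relaimpo)

set.seed(13)
n <- 100
temp <- rnorm(n); prod <- rnorm(n); effort <- rnorm(n)
yield <- 1 + 0.8 * temp + 0.3 * prod + rnorm(n)

fit <- lm(yield ~ temp + prod + effort)
abs(summary(fit)$coefficients[-1, "t value"])   # relative importance via t-ratios
calc.relimp(fit, type = "lmg", rela = TRUE)     # share of R^2 attributed to each predictor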

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov brian_c...@usgs.gov
tel:  970 226-9326



On Mon, Jun 24, 2013 at 3:11 AM, Chris Mcowen chrismco...@gmail.com wrote:

 Dear list:



 I am investigating the drivers of variance in fisheries yield over time,
 across multiple sites (52).My aim is to isolate the drivers of yield in one
 site compared to another, in order to investigate spatial heterogeneity in
 the primary correlates.



 To date I have approached this by calculating the importance of each
 potential correlate (3) using an AIC framework (e.g. using model weights)
 for each site.



 From this analysis I have a matrix of site against variable importance



 To my question. First, does it make sense, from a theoretical perspective,
 to make conclusions by comparing the importance of a variable in one region
 to another, for example, making statements like: In site A temperature was
 the most important variable in explaining changes in fisheries yield whilst
 in site G it was productivity?



 Second, I then clustered the variable importance values and ended up with 3
 distinct clusters, from which I want to say: yield within cluster one is
 primarily driven by temperature, whilst for those sites in cluster 2 it is
 driven by productivity?



 I have got some nice results that make sense from an ecological
 perspective; I just wanted to check my approach.



 Thanks











 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] USGS Mail test

2013-04-23 Thread Cade, Brian
Response back to test.

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov brian_c...@usgs.gov
tel:  970 226-9326



On Tue, Apr 23, 2013 at 9:04 AM, Ouren, Douglas our...@usgs.gov wrote:

 This is only a test to see if USGS server is working correctly.

 --
 Doug Ouren
 USGS Fort Collins Science Center
 2150 Centre Avenue, Bldg. C
 Fort Collins, CO, 80526-8118
 email: our...@usgs.gov
 Phone: 970-226-9476
 Fax  : 970-226-9298
 *

 **


 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[R-sig-eco] Double zeros and distance measure confusions and thoughts

2013-04-19 Thread Cade, Brian
Hello All:  This post was motivated by the earlier posts this week
regarding CCA/NMDS/RDA etc and dissimilarity measures.  I have often
thought that the usual thinking on double zeros for species
abundance/composition comparisons across sites has confused several issues
and seems driven by an unrealistic expectation that we can have a useful
dissimilarity/similarity measure to look across multiple species at
multiple sites that will simultaneously provide useful information on
absolute abundance, composition (relative abundance) and presence/absence.
 I developed the examples below several years ago to demonstrate to myself
that Euclidean distance can always be made to perform usefully as a
dissimilarity/similarity measure with double zero data and that the only
issue is whether you want a dissimilarity/similarity measure that is
sensitive to absolute abundance, relative abundance (composition), or
presence/absence. I believe it is unrealistic to think you can have a
dissimilarity/similarity measure that adequately compares two or all three
attributes simultaneously.  I post these examples just to see what others
think about this.

Example problem with interpretation of double 0's from Legendre and
Legendre (1998:278); 3 sites (rows) with 3 species (col); done first as
absolute abundances, then as relative abundances (composition), both using
Euclidean distances.

> x
     [,1] [,2] [,3]
[1,]    0    1    1
[2,]    1    0    0
[3,]    0    4    4

> dist(x)
         1        2
2 1.732051
3 4.242641 5.744563

Legendre and Legendre (1998) and others argue that site 1 should be closer
to site 3 than site 2 because sites 1 and 3 share two species (2 and 3),
whereas sites 1 and 2 share no species. Hidden in this statement about
sharing really is a compositional type interpretation, i.e., relative
abundance. So convert the species abundances to relative proportions
(compositions),and then using Euclidean distance provides distance measures
that indicate similarity between sites 1 and 3 and the same dissimilarity
between sites 1 and 2, and 2 and 3.

> xc
     [,1] [,2] [,3]
[1,]    0  0.5  0.5
[2,]    1  0.0  0.0
[3,]    0  0.5  0.5
> dist(xc)
         1        2
2 1.224745
3 0.00     1.224745

Notice that no change in distance measure (Euclidean) was required, just a
change from absolute to relative abundances.  I would argue that if you are
interested in absolute abundances, the Euclidean distances in the first
computation above correctly recognize that sites 1 and 2 are closer than
sites 1 and 3, which in turn are slightly closer than sites 2 and 3.  So I
don't believe the use of Euclidean distances is inappropriate for abundance
data, just that if you want a compositional type of interpretation
(proportion of species shared) you ought to convert abundances to relative
proportions prior to using Euclidean distances.
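
As a minimal sketch of that conversion step (the construction below is
mine, not from Legendre and Legendre; prop.table() is base R, and
vegan::decostand(x, "total") would do the same row standardization):

## sites in rows, species in columns (the 3 x 3 example above)
x  <- matrix(c(0, 1, 0,
               1, 0, 4,
               1, 0, 4), nrow = 3)
xc <- prop.table(x, margin = 1)   # row-wise relative abundances (compositions)
dist(x)                           # Euclidean on absolute abundances
dist(xc)                          # Euclidean on compositions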

Similar arguments could be made for the data below from McCune and
Grace (2002), box 6.2, where the differences between sites 1 and 2 and
between sites 1 and 3 were compared with Euclidean versus city-block
distance.  City-block distance says the differences between sites 1 and 2
and between sites 1 and 3 are the same (12), whereas the Euclidean
distances are 9.165 versus 6.0.  Note that converting to relative
abundances yields Euclidean distances of 0.574 and 0.267, which seems
reasonable in a compositional type of interpretation.  The city-block
distance seems to be emphasizing a compositional interpretation based on
presence/absence, i.e., the proportion of species with any abundance > 0.

> x2
     [,1] [,2] [,3] [,4]
[1,]    4    2    0    1
[2,]    5    1    1   10
[3,]    7    5    3    4

> dist(x2)
         1        2
2 9.165151
3 6.000000 7.745967

### Convert to relative proportions

> x2c <- matrix(c(4/7, 5/17, 7/19, 2/7, 1/17, 5/19, 0, 1/17, 3/19, 1/7, 10/17, 4/19),
+               nrow = 3, ncol = 4)
> x2c
          [,1]       [,2]       [,3]      [,4]
[1,] 0.5714286 0.28571429 0.00000000 0.1428571
[2,] 0.2941176 0.05882353 0.05882353 0.5882353
[3,] 0.3684211 0.26315789 0.15789474 0.2105263

> dist(x2c)
          1         2
2 0.5746326
3 0.2668908 0.4469370

So we can convert the species abundances to presence/absence (1/0) data
form and use Euclidean distances.  This now has an interpretation
comparable to the city-block distance, i.e., sites 1 and 2 and sites 1 and
3 differ by the same amount and sites 2 and 3 don't differ.  So there is
really no need to change to a distance measure other than Euclidean; we
just need to have the species measures in absolute abundances, relative
abundances (compositions), or presence/absence depending on the desired
interpretation.  Can different distance measures really be expected to
provide simultaneous interpretations in absolute abundance, relative
abundance, and presence/absence terms?  I don't think so.

> x2d
     [,1] [,2] [,3] [,4]
[1,]    1    1    0    1
[2,]    1    1    1    1
[3,]    1    1    1    1

> dist(x2d)
  1 2
2 1
3 1 0
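
Just to make those conversions explicit, a minimal sketch (my construction,
not from McCune and Grace) of producing x2d and the city-block comparison:

x2  <- matrix(c(4, 5, 7, 2, 1, 5, 0, 1, 3, 1, 10, 4), nrow = 3, ncol = 4)
x2d <- (x2 > 0) * 1                  # 1/0 presence/absence
dist(x2d)                            # Euclidean on presence/absence
dist(x2, method = "manhattan")       # city-block distance on the abundances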


Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center

Re: [R-sig-eco] hurdle model - weight of habitat variables in each component

2013-03-01 Thread Cade, Brian
Joana and any others:  You cannot obtain a valid or useful measure of
relative importance of predictor variables across multiple models by
applying relative AIC weights or using model averaged coefficients unless
all your models included a single predictor (which, of course, is not what
is usually done).  And this applies to hurdle models and any other model.  AIC
(and relative weights) apply to the log likelihood maximized for estimating
a model that may be composed of one to many predictors.  Neither the log
likelihood nor its associated AIC for a model has any ability to
distinguish among the contributions of the individual predictors used in
maximizing the log likelihood, and most useful definitions of relative
importance of predictors within a model require some ability to make that
distinction.
 The best that AIC and relative AIC weights applied to individual
 predictor coefficients can tell you is the relative importance of the
models in which the predictors occurred.  And that is not the same as
relative importance of predictors for most statisticians.  It is quite
possible to have a predictor with little relative importance within a model
that itself has high relative importance, and of course the opposite is
true too.  This AIC weights approach ignores that fact.  Burnham and
Anderson (2002,
2004) have done ecologists a great disservice by suggesting that AIC model
weights can be used to address relative importance of individual predictors
in models that included multiple predictors.  AIC model weights can be used
to assess the relative importance of models (that are combinations of one
to many predictors) but are insufficient to address the relative importance
of individual predictor
variables because they don't recognize the differential contribution of
individual predictors to the likelihood (or equivalently deviance, or sums
of squares).  Indeed, the use of AIC model weights, as employed by most
people, acts as if, for a given model, all predictors contributed
equally to the likelihood and thus get the same weight for being in that
model.  That is a totally unreasonable assumption and never likely in
practice.  AIC weights are based on AIC that are computed from the log
likelihood maximized by all predictors simultaneously.  There is nothing in
the theory behind AIC that suggests you can attribute the log likelihood
equally to all predictor variables in the model.  I'm not sure why Burnham
and Anderson (2002) propagated such a notion as it totally conflicts with
and ignores a large body of statistical literature on methods for assigning
relative importance of predictors within a given statistical model.
 Examples from some accessible statistical and ecological literature
include:

Bring, J.  1994.  How to standardize regression coefficients.  The American
Statistician 48: 209-213.

Chevan, A., and M. Sutherland.  1991.  Hierarchical partitioning.  The
American Statistician 45: 90-96.

Christensen, R.  1992.  Comments on Chevan and Sutherland.  The American
Statistician 46: 74.

Grömping, U.  2007.  Estimators of relative importance in linear regression
based on variance decomposition.  The American Statistician 61: 139-147.

Kruskal, W., and R. Majors.  1989.  Concepts of relative importance in
recent scientific literature.  The American Statistician 43: 2-6.

MacNally, R.  2000.  Regression model-building in conservation biology,
biogeography and ecology:  The distinction between - and reconciliation of
- 'predictive' and 'explanatory' models.  Biodiversity and Conservation 9:
655-671.

Murray, K., and M. M. Conner.  2009.  Methods to quantify variable
importance:  implications for the analysis of noisy ecological data.
Ecology 90:348-355.

The paper by Murray and Conner (2009) is a simulation study that confirms
and states what was obvious to most statisticians - AIC was not designed to
differentiate among contributions of individual predictors within a model
and, thus, is not appropriate for evaluating relative importance of
individual predictors.  Stick to using AIC weights to assess relative
importance of models (note however that if all your models had single
predictors then ranking models would be the same as ranking predictors).
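
As a small illustration of that distinction, here is a sketch with
simulated data (effect sizes and variable names are invented purely for
illustration): summing Akaike weights over models can make a weak predictor
look important simply because it rides along in models that also contain a
strong predictor, whereas a within-model comparison separates their
contributions.

set.seed(1)
n  <- 100
x1 <- rnorm(n)                        # strong predictor
x2 <- rnorm(n)                        # weak predictor
y  <- 2 * x1 + 0.2 * x2 + rnorm(n)

m12 <- lm(y ~ x1 + x2)                # all-subsets models
m1  <- lm(y ~ x1)
m2  <- lm(y ~ x2)
m0  <- lm(y ~ 1)

aics  <- c(m12 = AIC(m12), m1 = AIC(m1), m2 = AIC(m2), m0 = AIC(m0))
delta <- aics - min(aics)
w     <- exp(-delta / 2) / sum(exp(-delta / 2))   # Akaike model weights

## summed weights per predictor: x2 inherits nearly all of its weight from
## the model it shares with x1, not from anything x2 itself contributes
c(x1 = sum(w[c("m12", "m1")]), x2 = sum(w[c("m12", "m2")]))

## a within-model comparison (drop-in-deviance / partial F) tells a
## different story
drop1(m12, test = "F")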

Relative importance is a slippery concept with many interpretations.  But
useful ones for predictor variables within a given regression model
typically are related to contribution to reducing the objective function
used in statistical estimation (minimizing negative log likelihood, sums of
squares, deviance, etc.) and expected change in the response variable given
a unit change in the predictor (these are well discussed in Bring 1994).
In essence, the relative importance of individual predictors would have to
involve the relative importance within a given model, i.e., how much it
contributes to the likelihood that is maximized, or equivalently the
minimization of residual sums of squares or minimization of deviances
(-2 × log likelihood) relative to the other predictors

Re: [R-sig-eco] proportion data with many zeros

2013-02-01 Thread Cade, Brian
For a fully parametric approach, you might want to use a zero-inflated
beta distribution (e.g., as available in the gamlss package), which is
designed for zero-inflated proportions.  Or for a semi-parametric approach,
you could estimate a sequence of quantile regression estimates (e.g., in
package quantreg), where some interval (hopefully not too large) of the
quantiles will be uninformative because it is massed at the zero values.
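
A minimal sketch of both suggestions (variable names, sample sizes, and
effect sizes below are invented for illustration; see the gamlss and
quantreg documentation for the details of each interface):

library(gamlss)      # provides the BEZI (zero-inflated beta) family
library(quantreg)    # provides rq() for quantile regression

set.seed(2)
time <- factor(rep(1:4, each = 25))
mu   <- c(0.10, 0.20, 0.35, 0.50)[as.integer(time)]
y    <- rbeta(100, shape1 = 5 * mu, shape2 = 5 * (1 - mu))
y[rbinom(100, 1, 0.3) == 1] <- 0     # add excess zeros

## fully parametric: mu is the beta mean, nu the probability of an exact zero
fit_bezi <- gamlss(y ~ time, nu.formula = ~ time, family = BEZI)
summary(fit_bezi)

## semi-parametric: quantiles sitting on the mass of zeros will be flat and
## uninformative, the upper quantiles will not
fit_rq <- rq(y ~ time, tau = seq(0.50, 0.95, by = 0.05))
summary(fit_rq)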

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  brian_c...@usgs.gov
tel:  970 226-9326



On Fri, Feb 1, 2013 at 1:30 AM, v_coudr...@voila.fr wrote:

 Dear all, I am trying to test how the proportion of pollen of different
 plants found in the brood cells of a wild bee changes over time. I
 conducted 4 sampling sessions
 (thus time is a factor with 4 levels) and collected several pollen samples
 for each time point (300 pollen grains counted for each sample). I thought
 about applying a
 quasi-binomial glm:

 y = cbind(total pollen - pollen of plant X, pollen of plant X)

 glm(y~time, family=quasibinomial)

 The problem is that I have a lot of zero values, because the pollen of some
 plants only occurred rarely or very clumped in time. I thought about
 applying a zero-inflated
 model, but I have never used it and I am not sure if it is suitable for
 proportion data. Additionally I wondered if I have to consider the fact
 that I don't have the same
 number of pollen samples for each date, which makes my design unbalanced.
 Thank you in advance for advice.

 Best wishes
 Valérie


