Re: [R-sig-eco] proportion data with many zeros

v_coudrain Mon, 04 Feb 2013 04:43:36 -0800

Thank you very much for clarifying this point. My model fit is certainly
poor because, as you say, I am basically looking at zeros.

One point I don't really understand: for one pollen type I have a lot of
pollen collected at time 1, some at time 2, little at time 3 and none at
all at time 4. I get a significant difference between times 1 and 2, but
none between 1 and 3 or between 1 and 4. That seems illogical. Maybe it is
a problem with the residuals after all: they are fairly well balanced for
time points with fitted values > 0, but for time points with no pollen
collected there is no variance at all.

I think that if I had enough data for the non-zero part to look reasonably
continuous, I could use a zero-inflated model. But with only 4 time points,
and a positive part that does not fit a continuous distribution well, it is
difficult. I would probably do better to present the sparse pollen types
descriptively.
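
One possible reason for the non-significant time 1 vs. time 4 contrast is
that the Wald standard error blows up when a group's fitted mean sits near
zero. The Python sketch below illustrates this with a Poisson log-link model
as a stand-in for the Tweedie fit (an assumption, chosen because its group
estimates have a closed form). The counts, the function name
wald_z_log_means and the floor value are made up for illustration; they are
not the data or code from this thread.

```python
import math


def wald_z_log_means(counts_a, counts_b, floor=1e-8):
    """Wald z statistic for the difference in log means of two groups.

    Assumes a Poisson GLM with log link and one categorical predictor,
    where the estimated log mean of a group has variance 1 / sum(counts).
    `floor` (hypothetical) mimics a fit that stops with a tiny positive
    fitted value for an all-zero group instead of reaching minus infinity.
    """
    sum_a = max(sum(counts_a), floor)
    sum_b = max(sum(counts_b), floor)
    mean_a = sum_a / len(counts_a)
    mean_b = sum_b / len(counts_b)
    diff = math.log(mean_b) - math.log(mean_a)   # contrast on the log scale
    se = math.sqrt(1.0 / sum_a + 1.0 / sum_b)    # Wald standard error
    return diff / se


# Made-up counts: plenty at time 1, some at time 2, nothing at time 4.
time1 = [40, 35, 50, 45, 38]
time2 = [12, 9, 15, 10, 11]
time4 = [0, 0, 0, 0, 0]

print("z, time 1 vs 2:", round(wald_z_log_means(time1, time2), 3))
print("z, time 1 vs 4:", round(wald_z_log_means(time1, time4), 5))
```

With these made-up counts the 1 vs. 2 contrast lies far beyond |z| = 1.96,
while the 1 vs. 4 contrast is close to zero even though the raw difference is
larger: the estimate is huge, but its standard error is larger still. A
likelihood-ratio test does not rely on that standard error and may behave
better in this situation.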


Best wishes
Valérie


> Message of 04/02/13, 13:15
> From: "Liz Pryde"
> To: "v_coudr...@voila.fr"
> Cc:
> Subject: Re: [R-sig-eco] proportion data with many zeros
> 
> Hi,
> If you're using a categorical predictor, those QQ plots etc. are pretty
> useless. Just do a residuals vs. fitted values plot and make sure the
> residuals look randomly scattered.
> 
> Is the problem with the smaller pollen types just that they're very low
> across all time points? The algorithm won't converge because you're
> basically looking at zero data, i.e. a vector of zeroes. So you can assume
> that these types are significantly different from the abundant types. This
> has to do with the way ML estimation works; it's a bit complicated.
> Some people suggest using Bayesian methods for this (and it works well),
> but it's far too complicated for the question you're trying to answer.
> 
> The mean-variance relationship is specified by the 'family' argument of the
> GLM call. It is essentially the error structure of your data.
> Liz
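
To make the 'family' point concrete: in a GLM the variance is modelled as
Var(Y) = phi * V(mu), and the family fixes the variance function V. The
Python sketch below lists the textbook variance functions for a few
families. The function name variance_function and the Tweedie power of 1.5
are illustrative assumptions, not values taken from this thread.

```python
def variance_function(family, mu, power=1.5):
    """Return V(mu), the variance function implied by a GLM family.

    The full variance is phi * V(mu), with phi the dispersion parameter.
    `power` is only used for the Tweedie family (1.5 is an arbitrary
    illustrative choice between Poisson-like 1 and gamma-like 2).
    """
    if family == "gaussian":
        return 1.0            # constant variance
    if family == "poisson":
        return mu             # variance equals the mean
    if family == "gamma":
        return mu ** 2        # variance grows with the squared mean
    if family == "tweedie":
        return mu ** power    # variance is a power of the mean
    raise ValueError(f"unknown family: {family}")


# Show how the implied variance changes with the fitted mean.
for mu in (0.0, 0.1, 1.0, 10.0):
    row = {fam: variance_function(fam, mu)
           for fam in ("gaussian", "poisson", "gamma", "tweedie")}
    print(mu, row)
```

Note that for the Tweedie family with a positive power, V(0) = 0, so a time
point whose fitted mean is essentially zero is expected to show essentially
no variance.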
> 
> 
> On 04/02/2013, at 7:55 PM, v_coudr...@voila.fr wrote:
> 
> > I tried Tweedie and again it worked very well for the most abundant
> > pollen types, but when trying to fit the less abundant ones I got the
> > error: "glm.fit: algorithm did not converge".
> > I have the impression that it is hopeless to try fitting a model... But
> > anyway, thank you very much for making me aware of Tweedie. I should
> > still read up on the theoretical background. I just wonder about the
> > residuals. For the pollen types that can be modelled, the QQ-plots don't
> > look very nice, but the residuals are fairly homogeneously distributed.
> > It is difficult to judge how good the fit is, but the results make sense
> > with regard to the raw data.
> > 
> > Valérie
> 


_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
