Re: [R] inverse prediction and Poisson regression
On Fri, 25 Jul 2003, Vincent Philion wrote: Hello and thank you for your interest in this problem. real life data would look like this: x y 0 28 0.03 21 0.1 11 0.3 15 1 5 3 4 101 300 100 0 x y 0 30 0.002530 0.02 25 0.16 25 1.28 10 10.24 0 81.92 0 X Y 0 35 0.00025 23 0.002 14 0.016 6 0.128 5 1.024 3 8.192 2 X Y 0 43 0.00025 35 0.002 20 0.016 16 0.128 11 1.024 6 8.192 0 Where X is dose and Y is response. the relation is linear for log(response) = b log(dose) + intercept Is that log(*mean* response), that is a log link and exponential decay with dose? Response for dose 0 is a control = Ymax. So, What I want is the dose for 50% response. For instance, in example 1: Ymax = 28 (this is also an observation with Poisson error) Once you observe Ymax, Y is no longer Poisson. So I want dose for response = 14 = approx. 0.3 What exactly is Ymax? Is it the response at dose 0? The mean response at dose 0? The largest response? About the only thing I can actually interpret is that you want to fit a curve of mean response vs dose, and find the dose at which the mean response is half of that at dose 0. That one is easy. I think you are confusing response with mean response, and we can't disentangle them for you. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] inverse prediction and Poisson regression
Dear Prof. Ripley M. Philion: First some commentary then questions for Prof. Ripley and M. Philion. COMMENTARY Prof. Ripley said, to fit a curve of mean response vs dose, and find the dose at which the mean response is half of that at dose 0. That one is easy. Unfortunately, it is not obvious to me at the moment. From www.r-project.org - search - R site search - LD50, I found dose.p, described on p. 193, sec. 7.2, of Venables and Ripley (2002) Modern Applied Statistics with S, 4th ed. (Springer). Then I cut the data set down to a size that I could easily play with, and fit Poisson regression: Phytopath - data.frame(x=c(0, 0.03, 0.1), y=c(28, 21, 11)) (fitP100 - glm(y~log(x+0.015), data=Phytopath[rep(1:3, 100),], family=poisson)) Call: glm(formula = y ~ log(x + 0.015), family = poisson, data = Phytopath[rep(1:3, 100), ]) Coefficients: (Intercept) log(x + 0.015) 1.6088 -0.4203 (LD50P100 - dose.p(fitP100, p=14)) Dose SE p = 14: -2.451018 0.04858572 To get a 95% confidence interval from this, in S-Plus 6.1, I did: exp(LD50 + c(-2,0, 2)[EMAIL PROTECTED])-0.015 [1] 0.01762321 0.07120579 0.21279601 R 1.7.1 seemed to require a different syntax, which I couldn't parse in my present insomniac state (3:20 AM in California). QUESTIONS: PROF. RIPLEY: Is this what you said was easy? M. PHILION: Does this provide sufficient information for you to now solve your problem? hope this helps. spencer graves Prof Brian Ripley wrote: On Fri, 25 Jul 2003, Vincent Philion wrote: Hello and thank you for your interest in this problem. real life data would look like this: x y 0 28 0.03 21 0.1 11 0.3 15 1 5 3 4 10 1 30 0 100 0 x y 0 30 0.0025 30 0.02 25 0.16 25 1.28 10 10.24 0 81.92 0 X Y 0 35 0.00025 23 0.002 14 0.016 6 0.128 5 1.024 3 8.192 2 X Y 0 43 0.00025 35 0.002 20 0.016 16 0.128 11 1.024 6 8.192 0 Where X is dose and Y is response. the relation is linear for log(response) = b log(dose) + intercept Is that log(*mean* response), that is a log link and exponential decay with dose? Response for dose 0 is a control = Ymax. So, What I want is the dose for 50% response. For instance, in example 1: Ymax = 28 (this is also an observation with Poisson error) Once you observe Ymax, Y is no longer Poisson. So I want dose for response = 14 = approx. 0.3 What exactly is Ymax? Is it the response at dose 0? The mean response at dose 0? The largest response? About the only thing I can actually interpret is that you want to fit a curve of mean response vs dose, and find the dose at which the mean response is half of that at dose 0. That one is easy. I think you are confusing response with mean response, and we can't disentangle them for you. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] inverse prediction and Poisson regression
Prof Brian Ripley [EMAIL PROTECTED] writes: x y 0 28 0.0321 0.1 11 0.3 15 1 5 3 4 10 1 30 0 100 0 Where X is dose and Y is response. the relation is linear for log(response) = b log(dose) + intercept Is that log(*mean* response), that is a log link and exponential decay with dose? Response for dose 0 is a control = Ymax. So, What I want is the dose for 50% response. For instance, in example 1: Ymax = 28 (this is also an observation with Poisson error) Once you observe Ymax, Y is no longer Poisson. So I want dose for response = 14 = approx. 0.3 What exactly is Ymax? Is it the response at dose 0? The mean response at dose 0? The largest response? About the only thing I can actually interpret is that you want to fit a curve of mean response vs dose, and find the dose at which the mean response is half of that at dose 0. That one is easy. I think you are confusing response with mean response, and we can't disentangle them for you. I don't feel all that confused. Y is Poisson distributed with some mean depending on x. Ymax is a value at X=0, i.e. Poisson distr. with a mean as large as it can be. I think the main confusion here is trying to fit a functional relationship which doesn't extend to X=0. If you extrapolate a log-loglinear relation back to X=0, you get an infinite maximal response if b is negative, so this is going to be inconsistent with a finite Ymax. In some of the data sets I believe you actually do see a leveling off for very small doses. If you insist on this peculiar model, you'd end up with estimating the mean of Ymax by its observed value. Then you can get b and the intercept from the observations with X0, and find your estimate of halving dose by solving log(Ymax/2) = b * log(dose50) + intercept i.e. dose50 = (log(Ymax/2)-intercept)/b. That's a nonlinear function of the estimates, so you'd need (at least) to employ the Delta method to find the approximate variance of the estimate. However, I'd suggest that you should look for a more realistic functional form of the relation, e.g. a logistic curve in log(x) or a Michalis-Menten style inhibition (mean(Y) = ymax/(1+dose/dose50) or variants thereof). These models are not (necessarily) GLMs, but I think you can fit them quite well with gnls() and a suitable variance function. [In fact the ymax/(1+dose/dose50) model is a GLM if you use an inverse link and reparametrize with 1/ymax, 1/(ymax*dose50), but inverse links are not built-in for the poisson() family, so you'd have to modify the code yourself.] -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] inverse prediction and Poisson regression
Hello, and thanks for this. From www.r-project.org - search - R site search - LD50, I found dose.p, described on p. 193, sec. 7.2, of Venables and Ripley (2002) Modern Applied Statistics with S, 4th ed. (Springer). I found the same, but this is for logistic regression I think, not Poisson. This is used to calculate the dose which causes 50% death. I could do it this way by pretending my data is binomial. i/e Y/Ymax = some value between 0 and 1. I could pretend my response data is dead or alive, but I'm not sure this is proper. but maybe it is? Any hints on this? Y and Ymax follows a Poisson distribution. Can I just use Y/Ymax, run a logistic regression and go on with my life??? ;-) (LD50P100 - dose.p(fitP100, p=14)) Dose SE p = 14: -2.451018 0.04858572 Wow, you got the dose.p function to work with poisson regression? but under SPlus only? is it giving the right results? ... my present insomniac state (3:20 AM in California). The same for me!!! ;-) I worked on this 2:00AM EDT thanks and have a nice day, possibly starting with a good coffee!!! M. PHILION: Does this provide sufficient information for you to now solve your problem? If this works under R and is giving the right solution, I owe you BIG TIME ! bye for now, -- Vincent Philion, M.Sc. agr. Phytopathologiste Institut de Recherche et de Développement en Agroenvironnement (IRDA) 3300 Sicotte, St-Hyacinthe Québec J2S 7B8 téléphone: 450-778-6522 poste 233 courriel: [EMAIL PROTECTED] Site internet : www.irda.qc.ca __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] inverse prediction and Poisson regression
Ymax is the maximum observation in your example, and also the observation at zero. I was asking which you meant: if you meant Y at 0 (and I think you do) then it is somewhat misleading notation. You have a set of Poisson random variables Y_x at different values of x. Poisson random variables have a mean (I am using standard statistical terminilogy), so let's call that mu(x). Then you seem to want the value of x such that mu(x) = mu(0)/2 *or* mu(x) = Y_0/2, and I don't know which, except that in your model mu(0) would be infinity, and so the model cannot fit your data (finite values of Y_0 have zero probability). On Fri, 25 Jul 2003, Vincent Philion wrote: Hello sir, answers follow... ... Where X is dose and Y is response. the relation is linear for log(response) = b log(dose) + intercept *** Is that log(*mean* response), that is a log link and exponential decay with dose? I'm not sure I understand what you mean by mean, (no pun intended!) but Y is a biologicial growth. Only one observation for each X. But this observation is from the growth contribution of about 500 individuals, so I guess it is a mean response by design. the log link is for the Poisson regression, so the GLM is response ~ log(dose), (family=poisson) So you have -Inf as the explanatory variable at zero dose? ...Response for dose 0 is a control = Ymax. So, What I want is the dose for 50% response. *** Once you observe Ymax, Y is no longer Poisson. I don't understand this? What do you mean? Please explain. That was understanding Ymax to be the maximum Y, which is what it looks like. ***What exactly is Ymax? Is it the response at dose 0? Correct. it is measured the same way as for any other Y. (It is also the largest response because the dose is always detrimental to growth) The last is not true, given your assumptions, It could have the largest mean response, but 0 is a possible value for Y_0. ***About the only thing I can actually interpret is that you want to fit a curve of mean response vs dose, and find the dose at which the mean response is half of that at dose 0. That's it. that sounds right! How? (Confidence interval on log scale and on real scale, etc) Given that the error on Y is Poisson and not normal ***That one is easy. OK...? Fit a model for the mean response (one that actually can fit your data), and solve the estimated mu(x) = mu()/2 or Y_0/2. That gives you an estimate, and the delta method will give your standard errors. *** I think you are confusing response with mean response, and we can't disentangle them for you. What else is needed? bye for now, -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] inverse prediction and Poisson regression
The Poisson assumption means that Y is a number of independent events from a theoretically infinite population occurring in a specific time or place. The function glm with 'family=poisson' with the default link = log assumes that the logarithm of the mean of Y is a linear model in the explanatory variable. How is Y measured? Is it the number out of 500 who exceed a certain threshold, or is it the average percentage increase in weight of the 500 or what? If it the number out N, with N approximately 500 (and you know N), then you have a logistic regression situation. In that case, section 7.2 in Venables and Ripley (2002) should do what you want. If Y is a percentage increase When dose = 0, log(dose) = (-Inf). Since 0 is a legitimate dose, log(dose) is not acceptable in a model like this. You need a model like Peter suggested. Depending on you purpose, log(dose+0.015) might be sufficiently close to a model like what Peter suggested to answer your question. If not, perhaps this solution will help you find a better solution. I previously was able to get dose.p to work in R, and I just now was able to compute from its output. The following worked in both S-Plus 6.1 and R 1.7.1: LD50P100p - print(LD50P100) Dose SE p = 14: -2.451018 0.04858572 exp(LD50P100p[1,1]+c(-2,0,2)*LD50P100p[1,2])-0.015 [1] 0.06322317 0.07120579 0.08000303 hope this helps. spencer graves Vincent Philion wrote: Hello sir, answers follow... ... Where X is dose and Y is response. the relation is linear for log(response) = b log(dose) + intercept *** Is that log(*mean* response), that is a log link and exponential decay with dose? I'm not sure I understand what you mean by mean, (no pun intended!) but Y is a biologicial growth. Only one observation for each X. But this observation is from the growth contribution of about 500 individuals, so I guess it is a mean response by design. the log link is for the Poisson regression, so the GLM is response ~ log(dose), (family=poisson) ...Response for dose 0 is a control = Ymax. So, What I want is the dose for 50% response. *** Once you observe Ymax, Y is no longer Poisson. I don't understand this? What do you mean? Please explain. ***What exactly is Ymax? Is it the response at dose 0? Correct. it is measured the same way as for any other Y. (It is also the largest response because the dose is always detrimental to growth) ***About the only thing I can actually interpret is that you want to fit a curve of mean response vs dose, and find the dose at which the mean response is half of that at dose 0. That's it. that sounds right! How? (Confidence interval on log scale and on real scale, etc) Given that the error on Y is Poisson and not normal ***That one is easy. OK...? *** I think you are confusing response with mean response, and we can't disentangle them for you. What else is needed? bye for now, __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] inverse prediction and Poisson regression
Hello again, sorry for the notation. Again, I'm just a biologist!!! ;-) But I'm enjoying this problem quite a bit! I'm very grateful for all the input. This is great. On 2003-07-25 08:38:00 -0400 Prof Brian Ripley [EMAIL PROTECTED] wrote: Answers: Ymax is the maximum observation in your example, and also the observation at zero. I was asking which you meant: if you meant Y at 0 (and I think you do) then it is somewhat misleading notation. I will clean up my notation! You have a set of Poisson random variables Y_x at different values of x. Poisson random variables have a mean (I am using standard statistical terminilogy), so let's call that mu(x). Then you seem to want the value of x such that mu(x) = mu(0)/2 *or* mu(x) = Y_0/2, OK, I want x for mu(x) = mu(0)/2. that in your model mu(0) would be infinity, and so the model cannot fit your data (finite values of Y_0 have zero probability). Correct, This is part of the problem! The model does not hold for X = 0. the largest response because the dose is always detrimental to growth) The last is not true, given your assumptions, It could have the largest mean response, but 0 is a possible value for Y_0. Yes, you are right, but then there is no growth, nad no LD50 value, so we reject this sample... Fit a model for the mean response (one that actually can fit your data), and solve the estimated mu(x) = mu()/2 or Y_0/2. That gives you an estimate, and the delta method will give your standard errors. Then you suggest using another model that will account for zero dose, OK. I think I saw something similar in another reply. I need to read it more carefully. -- Vincent Philion, M.Sc. agr. Phytopathologiste Institut de Recherche et de Développement en Agroenvironnement (IRDA) 3300 Sicotte, St-Hyacinthe Québec J2S 7B8 téléphone: 450-778-6522 poste 233 courriel: [EMAIL PROTECTED] Site internet : www.irda.qc.ca __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] inverse prediction and Poisson regression
Vincent: Here is a simple solution using Prof. Bates' non-linear least squares algorithm: Best, Ravi. Phytopath - data.frame(x=c(0, 0.03, 0.1), y=c(28, 21, 11)) Phyto.nls - nls(y ~ Ymax/(1 + x/x50),data=Phytopath,start=list (Ymax=20.0,x50=0.01),trace=T) 404.3058 : 20.00 0.01 15.76932 : 27.96313636 0.04960484 2.043625 : 28.2145584 0.0694645 1.851401 : 28.33886844 0.07198951 1.851231 : 28.34892493 0.07185953 1.851230 : 28.34843670 0.07186804 1.851230 : 28.3484688 0.0718675 summary(Phyto.nls) Formula: y ~ Ymax/(1 + x/x50) Parameters: Estimate Std. Error t value Pr(|t|) Ymax 28.348471.31522 21.554 0.0295 * x50 0.071870.01348 5.332 0.1180 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 1.361 on 1 degrees of freedom Correlation of Parameter Estimates: Ymax x50 -0.6001 - Original Message - From: Vincent Philion [EMAIL PROTECTED] Date: Friday, July 25, 2003 9:25 am Subject: Re: [R] inverse prediction and Poisson regression Hi, ... and good morning! ;-) On 2003-07-25 08:43:35 -0400 Spencer Graves [EMAIL PROTECTED] wrote: The Poisson assumption means that Y is a number of independent events from a theoretically infinite population occurring in a specific time or place. The function glm with 'family=poisson' with the default link = log assumes that the logarithm of the mean of Y is a linear model in the explanatory variable. OK, I think my data can fit that description. How is Y measured? Y is the number of line intercepts which encounters mycelial growth. i/e if mycelia intercepts the line twice, 2 is reported. This follows poisson. If it the number out N, with N approximately 500 (and you know N), then you have a logistic regression situation. No, 500 spores can grow, but there is no real limit on the amount of growth possible, and so no limit on the number of intercepts. So this is why I adopted Poisson, not knowing how complicated my life would become!!! ;-) In that case, section 7.2 in Venables and Ripley (2002) should do what you want. If Y is a percentage increase ... But you may be right, that I'm making this just too complicated and that I should simply look at percentage... Any comments on that? When dose = 0, log(dose) = (-Inf). Since 0 is a legitimate dose, log(dose) is not acceptable in a model like this. You need a model like Peter suggested. OK, I see I will need stronger coffee to tackle this, but I will read this in depth today. Depending on you purpose, log(dose+0.015) might be sufficiently close to a model like what Peter suggested to answer your question. If not, perhaps this solution will help you find a better solution. In other words, cheat and model Y_0 with a small value = log(0.015) ? How would this affect the LD50 value calculated and the confidence intervals? I guess I could try several methods, but how would I go about choosing the right one? Criteria? I previously was able to get dose.p to work in R, and I just now was able to compute from its output. The following worked in both S-Plus 6.1 and R 1.7.1: LD50P100p - print(LD50P100) Dose SE p = 14: -2.451018 0.04858572 exp(LD50P100p[1,1]+c(-2,0,2)*LD50P100p[1,2])-0.015 [1] 0.06322317 0.07120579 0.08000303 OK, I will need to try this (later today). I don't see dose.p in this? again, many thanks, -- Vincent Philion, M.Sc. agr. Phytopathologiste Institut de Recherche et de Dveloppement en Agroenvironnement (IRDA) 3300 Sicotte, St-Hyacinthe Qubec J2S 7B8 tlphone: 450-778-6522 poste 233 courriel: [EMAIL PROTECTED] Site internet : www.irda.qc.ca __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] inverse prediction and Poisson regression
The problem with nls is that it is NOT maximum likelihood for the Poisson distribution. For the Poisson, the standard deviation is the square root of the mean, while nls assumes constant standard deviation. That's why I stayed with glm. The answers should be comparable, but I would prefer the glm answers. spencer graves Ravi Varadhan wrote: Vincent: Here is a simple solution using Prof. Bates' non-linear least squares algorithm: Best, Ravi. Phytopath - data.frame(x=c(0, 0.03, 0.1), y=c(28, 21, 11)) Phyto.nls - nls(y ~ Ymax/(1 + x/x50),data=Phytopath,start=list (Ymax=20.0,x50=0.01),trace=T) 404.3058 : 20.00 0.01 15.76932 : 27.96313636 0.04960484 2.043625 : 28.2145584 0.0694645 1.851401 : 28.33886844 0.07198951 1.851231 : 28.34892493 0.07185953 1.851230 : 28.34843670 0.07186804 1.851230 : 28.3484688 0.0718675 summary(Phyto.nls) Formula: y ~ Ymax/(1 + x/x50) Parameters: Estimate Std. Error t value Pr(|t|) Ymax 28.348471.31522 21.554 0.0295 * x50 0.071870.01348 5.332 0.1180 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 1.361 on 1 degrees of freedom Correlation of Parameter Estimates: Ymax x50 -0.6001 - Original Message - From: Vincent Philion [EMAIL PROTECTED] Date: Friday, July 25, 2003 9:25 am Subject: Re: [R] inverse prediction and Poisson regression Hi, ... and good morning! ;-) On 2003-07-25 08:43:35 -0400 Spencer Graves [EMAIL PROTECTED] wrote: The Poisson assumption means that Y is a number of independent events from a theoretically infinite population occurring in a specific time or place. The function glm with 'family=poisson' with the default link = log assumes that the logarithm of the mean of Y is a linear model in the explanatory variable. OK, I think my data can fit that description. How is Y measured? Y is the number of line intercepts which encounters mycelial growth. i/e if mycelia intercepts the line twice, 2 is reported. This follows poisson. If it the number out N, with N approximately 500 (and you know N), then you have a logistic regression situation. No, 500 spores can grow, but there is no real limit on the amount of growth possible, and so no limit on the number of intercepts. So this is why I adopted Poisson, not knowing how complicated my life would become!!! ;-) In that case, section 7.2 in Venables and Ripley (2002) should do what you want. If Y is a percentage increase ... But you may be right, that I'm making this just too complicated and that I should simply look at percentage... Any comments on that? When dose = 0, log(dose) = (-Inf). Since 0 is a legitimate dose, log(dose) is not acceptable in a model like this. You need a model like Peter suggested. OK, I see I will need stronger coffee to tackle this, but I will read this in depth today. Depending on you purpose, log(dose+0.015) might be sufficiently close to a model like what Peter suggested to answer your question. If not, perhaps this solution will help you find a better solution. In other words, cheat and model Y_0 with a small value = log(0.015) ? How would this affect the LD50 value calculated and the confidence intervals? I guess I could try several methods, but how would I go about choosing the right one? Criteria? I previously was able to get dose.p to work in R, and I just now was able to compute from its output. The following worked in both S-Plus 6.1 and R 1.7.1: LD50P100p - print(LD50P100) Dose SE p = 14: -2.451018 0.04858572 exp(LD50P100p[1,1]+c(-2,0,2)*LD50P100p[1,2])-0.015 [1] 0.06322317 0.07120579 0.08000303 OK, I will need to try this (later today). I don't see dose.p in this? again, many thanks, -- Vincent Philion, M.Sc. agr. Phytopathologiste Institut de Recherche et de Dveloppement en Agroenvironnement (IRDA) 3300 Sicotte, St-Hyacinthe Qubec J2S 7B8 tlphone: 450-778-6522 poste 233 courriel: [EMAIL PROTECTED] Site internet : www.irda.qc.ca __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] inverse prediction and Poisson regression
Hello to all: first and foremost: thank you for all this input. I only discovered about R last week (!) and I think I will dump my SAS license!!! ;-) This is a very dynamic listserve! You R all great! Thank you! I just hope some day I can help out a student the way you did today. I will spend part of the weekend studying the different suggestions in detail. Again, I'm not a stats person, so I will need some time and good coffee to digest all this correctly. (I'm most worried about understanding the nonlin suggestion.) Early next week, I will post a summary of the suggestions and the path I chose to follow. (with proper syntax Professor Ripley, I promise) ;-) Have a nice weekend Best regards, Vincent Philion __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] inverse prediction and Poisson regression
Spencer Graves [EMAIL PROTECTED] writes: The problem with nls is that it is NOT maximum likelihood for the Poisson distribution. For the Poisson, the standard deviation is the square root of the mean, while nls assumes constant standard deviation. That's why I stayed with glm. The answers should be comparable, but I would prefer the glm answers. spencer graves That's why I was suggesting either a variance stabilizing transformation or using gnls() with a weights function. In both cases you need to watch out for scaling of SE's stemming from the fact that the variance is supposed to be known in the Poisson distribution, but the fitting routines estimate a sigma from the residuals anyway. I.e. for instance: Phyto.nls2 - nls(sqrt(y+.5) ~ sqrt(Ymax/(1 + x/x50)+.5),data=Phytopath,start=list(Ymax=20.0,x50=0.01),trace=T) 9.400602 : 20.00 0.01 0.9064723 : 27.21511020 0.03593643 0.06338235 : 28.21654101 0.05995647 0.02616589 : 28.50221759 0.06815302 0.02608587 : 28.54871243 0.06835046 0.02608586 : 28.54960242 0.06834083 0.02608586 : 28.5495605 0.0683413 summary(Phyto.nls2) Formula: sqrt(y + 0.5) ~ sqrt(Ymax/(1 + x/x50) + 0.5) Parameters: Estimate Std. Error t value Pr(|t|) Ymax 28.549561.65113 17.291 0.0368 * x50 0.068340.01264 5.407 0.1164 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 0.1615 on 1 degrees of freedom Correlation of Parameter Estimates: Ymax x50 -0.6833 but here you need to know that the theoretical sd is 0.5, so the Std. errors all need multiplication by 0.5/0.1615. Using gnls() we'd get (if I got the calling sequence right...) Phyto.gnls - gnls(y ~ Ymax/(1 + x/x50), + data=Phytopath,weights=varPower(fixed=.5),start=list(Ymax=20.0,x50=0.01)) summary(Phyto.gnls) Generalized nonlinear least squares fit Model: y ~ Ymax/(1 + x/x50) Data: Phytopath AIC BIClogLik 13.71292 11.00875 -3.856458 Variance function: Structure: Power of variance covariate Formula: ~fitted(.) Parameter estimates: power 0.5 Coefficients: Value Std.Error t-value p-value Ymax 29.393796 2.4828723 11.838626 0.0536 x50 0.063028 0.0125244 5.032376 0.1249 Correlation: Ymax x50 -0.84 Standardized residuals: [1] -0.4894266 0.7621749 -0.4237346 attr(,std) [1] 2.8478142 1.4239071 0.8586483 attr(,label) [1] Standardized residuals Residual standard error: 0.6367906 Degrees of freedom: 3 total; 1 residual and again, it is necessary to adjust the SE's by multiplying with 1/0.6368 It is in fact rather easy to do direct MLE for this kind of model: with(Phytopath, optim(c(28,.7),fn=function(p){Ymax-p[1];x50-p[2]; + -sum(dpois(y,lambda=Ymax/(1 + + x/x50),log=TRUE))}, method=BFGS, hessian=T)) $par [1] 28.55963487 0.06833524 $value [1] 7.21251 $counts function gradient 42 13 $convergence [1] 0 $message NULL $hessian [,1][,2] [1,] 0.073560726.631868 [2,] 6.63186764 1294.792539 Warning message: NaNs produced in: dpois(x, lambda, log) and we can get SE's from the inverse hessian with sqrt(diag(solve(with(Phytopath, + optim(c(28,.7),fn=function(p){Ymax-p[1];x50-p[2]; + -sum(dpois(y,lambda=Ymax/(1 + + x/x50),log=TRUE))}, method=BFGS, hessian=T))$hessian))) [1] 5.02565893 0.03788052 Notice that the variance stabilizing transform seems to do rather better than gnls() compared to true MLE. I'm slightly puzzled by this, but presumably it has to do with the fact that MLE for a Gaussian model with a mean/variance relationship is *not* identical to the iteratively reweighted least squares that glm() et al. are doing. (I wouldn't quite rule out that I've simply made a mistake somewhere, though...) -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] inverse prediction and Poisson regression
Thanks Peter, for the wonderful illustration of various model-fitting options for the non-linear models. Thinking about your comment that the variance stabilizing transform does better than the weighted non-linear least-squars - I am interpreting this to mean that the residual sum of squares is smaller, 0.16 versus 0.64. A possible explanation may be that in the former case the responses are actually smaller because of the square-rooting (range about 3-5), so the residuals are smaller, whereas in the gnls case the response is the original variable (range about 10-30). Does this seem reasonable? Ravi. - Original Message - From: Peter Dalgaard BSA [EMAIL PROTECTED] Date: Friday, July 25, 2003 3:48 pm Subject: Re: [R] inverse prediction and Poisson regression Spencer Graves [EMAIL PROTECTED] writes: The problem with nls is that it is NOT maximum likelihood for the Poisson distribution. For the Poisson, the standard deviation is the square root of the mean, while nls assumes constant standard deviation. That's why I stayed with glm. The answers should be comparable, but I would prefer the glm answers. spencer graves That's why I was suggesting either a variance stabilizing transformation or using gnls() with a weights function. In both cases you need to watch out for scaling of SE's stemming from the fact that the variance is supposed to be known in the Poisson distribution, but the fitting routines estimate a sigma from the residuals anyway. I.e. for instance: Phyto.nls2 - nls(sqrt(y+.5) ~ sqrt(Ymax/(1 + x/x50)+.5),data=Phytopath,start=list(Ymax=20.0,x50=0.01),trace=T) 9.400602 : 20.00 0.01 0.9064723 : 27.21511020 0.03593643 0.06338235 : 28.21654101 0.05995647 0.02616589 : 28.50221759 0.06815302 0.02608587 : 28.54871243 0.06835046 0.02608586 : 28.54960242 0.06834083 0.02608586 : 28.5495605 0.0683413 summary(Phyto.nls2) Formula: sqrt(y + 0.5) ~ sqrt(Ymax/(1 + x/x50) + 0.5) Parameters: Estimate Std. Error t value Pr(|t|) Ymax 28.549561.65113 17.291 0.0368 * x50 0.068340.01264 5.407 0.1164 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 0.1615 on 1 degrees of freedom Correlation of Parameter Estimates: Ymax x50 -0.6833 but here you need to know that the theoretical sd is 0.5, so the Std. errors all need multiplication by 0.5/0.1615. Using gnls() we'd get (if I got the calling sequence right...) Phyto.gnls - gnls(y ~ Ymax/(1 + x/x50), + data=Phytopath,weights=varPower(fixed=.5),start=list (Ymax=20.0,x50=0.01)) summary(Phyto.gnls) Generalized nonlinear least squares fit Model: y ~ Ymax/(1 + x/x50) Data: Phytopath AIC BIClogLik 13.71292 11.00875 -3.856458 Variance function: Structure: Power of variance covariate Formula: ~fitted(.) Parameter estimates: power 0.5 Coefficients: Value Std.Error t-value p-value Ymax 29.393796 2.4828723 11.838626 0.0536 x50 0.063028 0.0125244 5.032376 0.1249 Correlation: Ymax x50 -0.84 Standardized residuals: [1] -0.4894266 0.7621749 -0.4237346 attr(,std) [1] 2.8478142 1.4239071 0.8586483 attr(,label) [1] Standardized residuals Residual standard error: 0.6367906 Degrees of freedom: 3 total; 1 residual and again, it is necessary to adjust the SE's by multiplying with 1/0.6368 It is in fact rather easy to do direct MLE for this kind of model: with(Phytopath, optim(c(28,.7),fn=function(p){Ymax-p[1];x50-p[2]; + -sum(dpois(y,lambda=Ymax/(1 + + x/x50),log=TRUE))}, method=BFGS, hessian=T)) $par [1] 28.55963487 0.06833524 $value [1] 7.21251 $counts function gradient 42 13 $convergence [1] 0 $message NULL $hessian [,1][,2] [1,] 0.073560726.631868 [2,] 6.63186764 1294.792539 Warning message: NaNs produced in: dpois(x, lambda, log) and we can get SE's from the inverse hessian with sqrt(diag(solve(with(Phytopath, + optim(c(28,.7),fn=function(p){Ymax-p[1];x50-p[2]; + -sum(dpois(y,lambda=Ymax/(1 + + x/x50),log=TRUE))}, method=BFGS, hessian=T))$hessian))) [1] 5.02565893 0.03788052 Notice that the variance stabilizing transform seems to do rather better than gnls() compared to true MLE. I'm slightly puzzled by this, but presumably it has to do with the fact that MLE for a Gaussian model with a mean/variance relationship is *not* identical to the iteratively reweighted least squares that glm() et al. are doing. (I wouldn't quite rule out that I've simply made a mistake somewhere, though...) -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch
[R] inverse prediction and Poisson regression
Hello to all, I'm a biologist trying to tackle a fish (Poisson Regression) which is just too big for my modest understanding of stats!!! Here goes... I want to find good literature or proper mathematical procedure to calculate a confidence interval for an inverse prediction of a Poisson regression using R. I'm currently trying to analyse a dose-response experiment. I want to calculate the dose (X) for 50% inhibition of a biological response (Y). My response is a count data that fits a Poisson distribution perfectly. I could make my life easy and calculate: dose response/control response = % of total response... and then use logistic regression, but somehow, that doesn't sound right. Should I just stick to logistic regression and go on with my life? Can I be cured of this paranoia? ;-) I thought a Poisson regression would be more appropriate, but I don't know how to properly calculate the dose equivalent to 50% inhibition. i/e confidence intervals, etc on the X = dose. Basically an inverse prediction problem. By the way, my data is graphically linear for Log(Y) = log(X) where Y is counts and X is dose. I use a Poisson regression to fit my dose-response experiment by EXCLUDING the response for dose = 0, because of log(0) Under R = glm.dose - glm(response[-1] ~ log(dose[-1]),family=poisson()) (that's why you see the dose[-1] term. The first dose in the dose vector is 0. This is really a nice fit. I can obtain a nice slope (B) and intercept (A): log(Y) = B log(x) + A I do have a biological value for dose = 0 from my control. i/e Ymax = some number with a Poisson error again So, what I want is EC50x : Y/Ymax = 0.5 = exp(B log(EC50x) + A) / Ymax exp((log(0.5) + Log(Ymax)) - A)/B) = EC50x That's all fine, except I don't have a clue on how to calculate the confidence intervals of EC50x or even if I can model this inverse prediction with a Poisson regression. In OLS linear regression, fitting X based on Y is not a good idea because of the way OLS calculates the slope and intercept. Is the same problem found in GLM/Poisson regression? Moreover, I also have a Poisson error on Ymax that I would have to consider, right? Help -- Vincent Philion, M.Sc. agr. Phytopathologiste Institut de Recherche et de Développement en Agroenvironnement (IRDA) 3300 Sicotte, St-Hyacinthe Québec J2S 7B8 téléphone: 450-778-6522 poste 233 courriel: [EMAIL PROTECTED] Site internet : www.irda.qc.ca __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] inverse prediction and Poisson regression
1. If you provide a toy data set with, e.g., 5 observations, to accompany your example, it would be much easier for people to try out ideas and then give you a more solid response. 2. Have you tried something like log(dose+0.5) or I(log(dose+0.5)) in your model statement in conjunction with predict or predict.glm on the output from glm? hope this helps. spencer graves Vincent Philion wrote: Hello to all, I'm a biologist trying to tackle a fish (Poisson Regression) which is just too big for my modest understanding of stats!!! Here goes... I want to find good literature or proper mathematical procedure to calculate a confidence interval for an inverse prediction of a Poisson regression using R. I'm currently trying to analyse a dose-response experiment. I want to calculate the dose (X) for 50% inhibition of a biological response (Y). My response is a count data that fits a Poisson distribution perfectly. I could make my life easy and calculate: dose response/control response = % of total response... and then use logistic regression, but somehow, that doesn't sound right. Should I just stick to logistic regression and go on with my life? Can I be cured of this paranoia? ;-) I thought a Poisson regression would be more appropriate, but I don't know how to properly calculate the dose equivalent to 50% inhibition. i/e confidence intervals, etc on the X = dose. Basically an inverse prediction problem. By the way, my data is graphically linear for Log(Y) = log(X) where Y is counts and X is dose. I use a Poisson regression to fit my dose-response experiment by EXCLUDING the response for dose = 0, because of log(0) Under R = glm.dose - glm(response[-1] ~ log(dose[-1]),family=poisson()) (that's why you see the dose[-1] term. The first dose in the dose vector is 0. This is really a nice fit. I can obtain a nice slope (B) and intercept (A): log(Y) = B log(x) + A I do have a biological value for dose = 0 from my control. i/e Ymax = some number with a Poisson error again So, what I want is EC50x : Y/Ymax = 0.5 = exp(B log(EC50x) + A) / Ymax exp((log(0.5) + Log(Ymax)) - A)/B) = EC50x That's all fine, except I don't have a clue on how to calculate the confidence intervals of EC50x or even if I can model this inverse prediction with a Poisson regression. In OLS linear regression, fitting X based on Y is not a good idea because of the way OLS calculates the slope and intercept. Is the same problem found in GLM/Poisson regression? Moreover, I also have a Poisson error on Ymax that I would have to consider, right? Help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help