Re: [R] Error using glm with poisson family and identity link
On Thu, 25 Nov 2004, Peter Dalgaard wrote: I haven't got all that much experience with it, but obviously, the various algorithms for constrained optimization (box- or otherwise) at least allow you to find a proper maximum likelihood estimator. It's harder than it looks (well, my experience is with the log-binomial model, but it should be similar). The constraints are not box constraints, but a more general set of linear constraints, and you either have to find the convex hull of the data or use a constraint for every data point. Also, the algorithm in glm.fit, while not perfect, is a little smarter than a simple IRLS. It uses step-halving to back away from the edge, and when the parameter space is convex it has a reasonable chance of creeping along the boundary to the true MLE. I think better glm fitting is worth pursuing: computational difficulties with the log-binomial model have forced many epidemiologists turn to estimators other than the MLE (or contrasts other than the relative risk), which is a pity. -thomas __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Error using glm with poisson family and identity link
Thomas Lumley [EMAIL PROTECTED] writes: Also, the algorithm in glm.fit, while not perfect, is a little smarter than a simple IRLS. It uses step-halving to back away from the edge, and when the parameter space is convex it has a reasonable chance of creeping along the boundary to the true MLE. Hmm. That wasn't my experience. I had a situation where there was like a (virtual) maximum outside the boundary, and the algorithm would basically stay on the path to that peak, banging its little head into the same point of the wall repeatedly, so to speak. (If you make a little drawing of an elliptical contour intersecting a linear boundary, I'm sure you'll see that this process leads to a point-of-no-progress that can be quite far from the maximum along the edge.) I think better glm fitting is worth pursuing: computational difficulties with the log-binomial model have forced many epidemiologists turn to estimators other than the MLE (or contrasts other than the relative risk), which is a pity. Yup... -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Error using glm with poisson family and identity link
On 29 Nov 2004 18:07:40 +0100, Peter Dalgaard [EMAIL PROTECTED] wrote: I had a situation where there was like a (virtual) maximum outside the boundary, and the algorithm would basically stay on the path to that peak, banging its little head into the same point of the wall repeatedly, so to speak. I love that image: the little optimizer that couldn't. I hope the fortunes maintainer is listening... Duncan Murdoch __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Error using glm with poisson family and identity link
On 26-Nov-04 Prof Brian Ripley wrote: On Thu, 25 Nov 2004, Spencer Graves wrote: I generally avoid constrained optimizers for three reasons: 1. My experience with them has included many cases where the optimizer would stop with an error when testing parameter values that violate the constraints. If I transform the parameter space to remove the constraints, that never happens. The constrained optimizers in R 2.0.1 may not exhibit this behavior, but I have not checked. They do not, and OTOH if the MLE really does lie on the boundary (and here it may well, with one data point fitted with mean zero) transformation will often not find a good solution. Constrained optimization is a hard problem, good methods are very complex and good code to implement them is usually expensive. I've had success with awkward constrained optimisation using the Nelder-Mead procedure -- simply make the region beyond the constraint hot enough for the simplex to recoil on touching it. I once tested it in what I thought might be a tough case by planting trees all over a shallow quadratic bowl. More precisely, many circular regions which were not to be entered. The objective was to locate the minimum of the bowl. The function was defined so as to take a high value inside each circular region. The simplex took a long time to locate the minimum, spending a lot of the time feeling its way slowly round the trunks of the trees, and some of the time trotting downhill over open space, but it did get there. Nelder-Mead is pretty robust. Best wishes to all, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 [NB: New number!] Date: 26-Nov-04 Time: 09:39:08 -- XFMail -- __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Error using glm with poisson family and identity link
Thanks very much to everybody for your replies and comments! Cheers, Federico __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Error using glm with poisson family and identity link
On Thu, 25 Nov 2004, Federico Gherardini wrote: Hi all I'm trying to use the function glm from the MASS package to do the following It's in the stats package. fit. fit - glm(FP ~ rand, data = tab, family = poisson(link = identity), subset = rand = 1) (FP is = 0) but I get the following error Error: no valid set of coefficients has been found:please supply starting values In addition: Warning message: NaNs produced in: log(x) And did you follow the advice in the error message? in contrast if I fit a model without intercept fit - glm(FP ~ rand - 1, data = tab, family = poisson(link = identity), subset = rand = 1) everything goes fine. Now my guess is that the points naturally have a negative intercept so the error is produced because I'm using the poisson distribution for the y and negative values are of course not admitted. Am I right? We don't have your data, but it is plausible. Also if this is the cause, shouldn't the function always try to do the best fit given the parameters? I mean shouldn't it fit a model with intercept 0 anyway and report it as a bad fit? Well, I believe functions should do what they say on the box (and the help page), and not what some user hopes they might do by mind-reading. You do have a suitable set of starting values from the second fit, so why not just follow the rather explicit advice? -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Error using glm with poisson family and identity link
Dear Federico: Why do you use the identity link? That can produce situations with an average of (-2) Poisson defects per unit, for example. That's physical nonsense. Also, it seems essentially to be generating your error message. Also, have you considered the following: fit - glm(FP ~ offset(log(rand)), data = tab, family = poisson, subset = rand = 1) I haven't tried this, but it looks like this model is virtually equivalent to the one you wrote: FP ~ rand-1 with link = identity says estimate 'b' in PoissonMean = b*rand. FP ~ offset(log(rand)) with link = log (the default) says estimate 'a' in log(PoissonMean)-log(rand) = a. If I haven't made an error, then log(b) = a. In more general situations, if you really need the identity link, have you considered searching for good starting values, as Prof. Ripley suggested? You could build up to your final model by estimating simpler models and obtaining trial fits using the default log link? With those results and a little thought, you should be able to obtain reasonable starting values. hope this helps. spencer graves Prof Brian Ripley wrote: On Thu, 25 Nov 2004, Federico Gherardini wrote: Hi all I'm trying to use the function glm from the MASS package to do the following It's in the stats package. fit. fit - glm(FP ~ rand, data = tab, family = poisson(link = identity), subset = rand = 1) (FP is = 0) but I get the following error Error: no valid set of coefficients has been found:please supply starting values In addition: Warning message: NaNs produced in: log(x) And did you follow the advice in the error message? in contrast if I fit a model without intercept fit - glm(FP ~ rand - 1, data = tab, family = poisson(link = identity), subset = rand = 1) everything goes fine. Now my guess is that the points naturally have a negative intercept so the error is produced because I'm using the poisson distribution for the y and negative values are of course not admitted. Am I right? We don't have your data, but it is plausible. Also if this is the cause, shouldn't the function always try to do the best fit given the parameters? I mean shouldn't it fit a model with intercept 0 anyway and report it as a bad fit? Well, I believe functions should do what they say on the box (and the help page), and not what some user hopes they might do by mind-reading. You do have a suitable set of starting values from the second fit, so why not just follow the rather explicit advice? -- Spencer Graves, PhD, Senior Development Engineer O: (408)938-4420; mobile: (408)655-4567 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Error using glm with poisson family and identity link
Spencer Graves [EMAIL PROTECTED] writes: Dear Federico: Why do you use the identity link? That can produce situations with an average of (-2) Poisson defects per unit, for example. That's physical nonsense. So is _not_ using the identity link when the model is manifestly additive on the identity scale. E.g. calibrating differential spectrofluorometry with photon counters recording linear combinations of intensities at different wavelengths. I've bumped into similar situations before (binomial(link=identity), I think it was then) and the glm.fit algorithm could use improvement in dealing with the parameter constraints in these cases. With the standard IRLS algorithm, if the maximum is on the boundary, you basically hit a random point on the boundary and get stuck there with a search direction pointing out of the valid region. -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Error using glm with poisson family and identity link
Hi, Peter: What do you do in such situations? Sundar Dorai-Raj and I have extended glm concepts to models driven by a sum of k independent Poissons, with the a linear model for log(defectRate[i]) for each source (i = 1:k). To handle convergence problems, etc., I think we need to use informative Bayes, but we're not there yet. In any context where things are done more than once [which covers most human activities], informative Bayes seems sensible. A related question comes with data representing the differences between Poisson counts, e.g., with d[i] = X[i]-X[i-1] = the number of new defects added between steps i-1 and i in a manufacturing process. Most of the time, d[i] is nonnegative. However, in some cases, it can be negative, either because of metrology errors in X[i] or because of defect removal between steps i-1 and i. Comments? Best Wishes, Spencer Graves Peter Dalgaard wrote: Spencer Graves [EMAIL PROTECTED] writes: Dear Federico: Why do you use the identity link? That can produce situations with an average of (-2) Poisson defects per unit, for example. That's physical nonsense. So is _not_ using the identity link when the model is manifestly additive on the identity scale. E.g. calibrating differential spectrofluorometry with photon counters recording linear combinations of intensities at different wavelengths. I've bumped into similar situations before (binomial(link=identity), I think it was then) and the glm.fit algorithm could use improvement in dealing with the parameter constraints in these cases. With the standard IRLS algorithm, if the maximum is on the boundary, you basically hit a random point on the boundary and get stuck there with a search direction pointing out of the valid region. -- Spencer Graves, PhD, Senior Development Engineer O: (408)938-4420; mobile: (408)655-4567 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Error using glm with poisson family and identity link
Spencer Graves [EMAIL PROTECTED] writes: Hi, Peter: What do you do in such situations? Sundar Dorai-Raj and I have extended glm concepts to models driven by a sum of k independent Poissons, with the a linear model for log(defectRate[i]) for each source (i = 1:k). To handle convergence problems, etc., I think we need to use informative Bayes, but we're not there yet. In any context where things are done more than once [which covers most human activities], informative Bayes seems sensible. A related question comes with data representing the differences between Poisson counts, e.g., with d[i] = X[i]-X[i-1] = the number of new defects added between steps i-1 and i in a manufacturing process. Most of the time, d[i] is nonnegative. However, in some cases, it can be negative, either because of metrology errors in X[i] or because of defect removal between steps i-1 and i. Comments? I haven't got all that much experience with it, but obviously, the various algorithms for constrained optimization (box- or otherwise) at least allow you to find a proper maximum likelihood estimator. -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Error using glm with poisson family and identity link
Hi, Peter: Thanks for the comment and reply. I generally avoid constrained optimizers for three reasons: 1. My experience with them has included many cases where the optimizer would stop with an error when testing parameter values that violate the constraints. If I transform the parameter space to remove the constraints, that never happens. The constrained optimizers in R 2.0.1 may not exhibit this behavior, but I have not checked. 2. In a few cases, I've plotted the log(likelihood) vs. parameter values using various transformations. When I've done that, I typically found that the most nearly parabolic performance used unconstrained parameterizations. This makes asymptotic normality more useful and increases the accuracy of simple, approximate sequential Bayesian procedures. 3. When I think carefully about a particular application, I often find a rationale for claiming that a certain unconstrained parameterization provides a better description of the application. For example, interest income on investments is essentially additive on the log scale. Similarly, the concept of materiality in Accounting is closer to being constant in log space: One might look for an error of a few Euros in the accounts of a very small business, but in auditing some major government accounts, errors on the order of a few Euros might not be investigated. Also, measurement errors with microvolts are much smaller than with megavolts; expressing the measurements in decibels (i.e., on the log scale) makes the measurement errors more nearly comparable. Thanks again for your comments. Best Wishes, Spencer Graves Peter Dalgaard wrote: Spencer Graves [EMAIL PROTECTED] writes: Hi, Peter: What do you do in such situations? Sundar Dorai-Raj and I have extended glm concepts to models driven by a sum of k independent Poissons, with the a linear model for log(defectRate[i]) for each source (i = 1:k). To handle convergence problems, etc., I think we need to use informative Bayes, but we're not there yet. In any context where things are done more than once [which covers most human activities], informative Bayes seems sensible. A related question comes with data representing the differences between Poisson counts, e.g., with d[i] = X[i]-X[i-1] = the number of new defects added between steps i-1 and i in a manufacturing process. Most of the time, d[i] is nonnegative. However, in some cases, it can be negative, either because of metrology errors in X[i] or because of defect removal between steps i-1 and i. Comments? I haven't got all that much experience with it, but obviously, the various algorithms for constrained optimization (box- or otherwise) at least allow you to find a proper maximum likelihood estimator. -- Spencer Graves, PhD, Senior Development Engineer O: (408)938-4420; mobile: (408)655-4567 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Error using glm with poisson family and identity link
On Thu, 25 Nov 2004, Spencer Graves wrote: I generally avoid constrained optimizers for three reasons: 1. My experience with them has included many cases where the optimizer would stop with an error when testing parameter values that violate the constraints. If I transform the parameter space to remove the constraints, that never happens. The constrained optimizers in R 2.0.1 may not exhibit this behavior, but I have not checked. They do not, and OTOH if the MLE really does lie on the boundary (and here it may well, with one data point fitted with mean zero) transformation will often not find a good solution. Constrained optimization is a hard problem, good methods are very complex and good code to implement them is usually expensive. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html