Re: [R] Error using glm with poisson family and identity link

2004-11-29 Thread Thomas Lumley
On Thu, 25 Nov 2004, Peter Dalgaard wrote:
I haven't got all that much experience with it, but obviously, the
various algorithms for constrained optimization (box- or otherwise) at
least allow you to find a proper maximum likelihood estimator.

It's harder than it looks (well, my experience is with the log-binomial 
model, but it should be similar).  The constraints are not box 
constraints, but a more general set of linear constraints, and you either 
have to find the convex hull of the data or use a constraint for every 
data point.

Also, the algorithm in glm.fit, while not perfect, is a little smarter 
than a simple IRLS. It uses step-halving to back away from the edge, and 
when the parameter space is convex it has a reasonable chance of creeping 
along the boundary to the true MLE.

I think better glm fitting is worth pursuing: computational difficulties 
with the log-binomial model have forced many epidemiologists turn to 
estimators other than the MLE (or contrasts other than the relative risk), 
which is a pity.

-thomas
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Error using glm with poisson family and identity link

2004-11-29 Thread Peter Dalgaard
Thomas Lumley [EMAIL PROTECTED] writes:

 Also, the algorithm in glm.fit, while not perfect, is a little smarter
 than a simple IRLS. It uses step-halving to back away from the edge,
 and when the parameter space is convex it has a reasonable chance of
 creeping along the boundary to the true MLE.

Hmm. That wasn't my experience. I had a situation where there was like
a (virtual) maximum outside the boundary, and the algorithm would
basically stay on the path to that peak, banging its little head
into the same point of the wall repeatedly, so to speak.

(If you make a little drawing of an elliptical contour intersecting a
linear boundary, I'm sure you'll see that this process leads to a
point-of-no-progress that can be quite far from the maximum along the
edge.)

 
 I think better glm fitting is worth pursuing: computational
 difficulties with the log-binomial model have forced many
 epidemiologists turn to estimators other than the MLE (or contrasts
 other than the relative risk), which is a pity.

Yup...

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Error using glm with poisson family and identity link

2004-11-29 Thread Duncan Murdoch
On 29 Nov 2004 18:07:40 +0100, Peter Dalgaard
[EMAIL PROTECTED] wrote:

 I had a situation where there was like
a (virtual) maximum outside the boundary, and the algorithm would
basically stay on the path to that peak, banging its little head
into the same point of the wall repeatedly, so to speak.

I love that image:  the little optimizer that couldn't.

I hope the fortunes maintainer is listening...

Duncan Murdoch

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Error using glm with poisson family and identity link

2004-11-26 Thread Ted Harding
On 26-Nov-04 Prof Brian Ripley wrote:
 On Thu, 25 Nov 2004, Spencer Graves wrote:
 
  I generally avoid constrained optimizers for three reasons: 
  1.  My experience with them has included many cases where the
  optimizer would stop with an error when testing parameter
  values that violate the constraints.  If I transform the
  parameter space to remove the constraints, that never happens. 
  The constrained optimizers in R 2.0.1 may not exhibit 
  this behavior, but I have not checked.
 
 They do not, and OTOH if the MLE really does lie on the boundary
 (and here it may well, with one data point fitted with mean zero)
 transformation will often not find a good solution.
 
 Constrained optimization is a hard problem, good methods are very 
 complex and good code to implement them is usually expensive.

I've had success with awkward constrained optimisation using the
Nelder-Mead procedure -- simply make the region beyond the constraint
hot enough for the simplex to recoil on touching it.

I once tested it in what I thought might be a tough case by
planting trees all over a shallow quadratic bowl. More
precisely, many circular regions which were not to be entered.
The objective was to locate the minimum of the bowl.

The function was defined so as to take a high value inside
each circular region.

The simplex took a long time to locate the minimum, spending
a lot of the time feeling its way slowly round the trunks
of the trees, and some of the time trotting downhill over
open space, but it did get there. Nelder-Mead is pretty robust.

Best wishes to all,
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 26-Nov-04   Time: 09:39:08
-- XFMail --

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Error using glm with poisson family and identity link

2004-11-26 Thread Federico Gherardini
Thanks very much to everybody for your replies and comments!
Cheers,
Federico
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Error using glm with poisson family and identity link

2004-11-25 Thread Prof Brian Ripley
On Thu, 25 Nov 2004, Federico Gherardini wrote:
Hi all
I'm trying to use the function glm from the MASS package to do the following
It's in the stats package.
fit.
fit - glm(FP ~ rand, data = tab, family = poisson(link = identity), subset 
= rand = 1)
(FP is = 0)

but I get the following error
Error: no valid set of coefficients has been found:please supply starting 
values
In addition: Warning message:
NaNs produced in: log(x)
And did you follow the advice in the error message?
in contrast if I fit a model without intercept
fit - glm(FP ~ rand - 1, data = tab, family = poisson(link = identity), 
subset = rand = 1)

everything goes fine.
Now my guess is that the points naturally have a negative intercept so the 
error is produced because I'm using the poisson distribution for the y and 
negative values are of course not admitted. Am I right?
We don't have your data, but it is plausible.
Also if this is the cause, shouldn't the function always try to do the best 
fit given the parameters? I mean shouldn't it fit a model with intercept 0 
anyway and report it as a bad fit?
Well, I believe functions should do what they say on the box (and the help 
page), and not what some user hopes they might do by mind-reading.

You do have a suitable set of starting values from the second fit, so why 
not just follow the rather explicit advice?

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Error using glm with poisson family and identity link

2004-11-25 Thread Spencer Graves
Dear Federico: 

 Why do you use the identity link?  That can produce situations 
with an average of (-2) Poisson defects per unit, for example.  That's 
physical nonsense.  Also, it seems essentially to be generating your 
error message. 

 Also, have you considered the following: 

fit - glm(FP ~ offset(log(rand)), data = tab, family = poisson, subset 
= rand = 1)

 I haven't tried this, but it looks like this model is virtually 
equivalent to the one you wrote: 

 FP ~ rand-1 with link = identity says estimate 'b' in 
PoissonMean = b*rand. 

 FP ~ offset(log(rand)) with link = log (the default) says 
estimate 'a' in log(PoissonMean)-log(rand) = a. 

 If I haven't made an error, then log(b) = a. 

 In more general situations, if you really need the identity 
link, have you considered searching for good starting values, as Prof. 
Ripley suggested?  You could build up to your final model by estimating 
simpler models and obtaining trial fits using the default log link?  
With those results and a little thought, you should be able to obtain 
reasonable starting values. 

 hope this helps. 
 spencer graves

Prof Brian Ripley wrote:
On Thu, 25 Nov 2004, Federico Gherardini wrote:
Hi all
I'm trying to use the function glm from the MASS package to do the 
following

It's in the stats package.
fit.
fit - glm(FP ~ rand, data = tab, family = poisson(link = 
identity), subset = rand = 1)
(FP is = 0)

but I get the following error
Error: no valid set of coefficients has been found:please supply 
starting values
In addition: Warning message:
NaNs produced in: log(x)

And did you follow the advice in the error message?
in contrast if I fit a model without intercept
fit - glm(FP ~ rand - 1, data = tab, family = poisson(link = 
identity), subset = rand = 1)

everything goes fine.
Now my guess is that the points naturally have a negative intercept 
so the error is produced because I'm using the poisson distribution 
for the y and negative values are of course not admitted. Am I right?

We don't have your data, but it is plausible.
Also if this is the cause, shouldn't the function always try to do 
the best fit given the parameters? I mean shouldn't it fit a model 
with intercept 0 anyway and report it as a bad fit?

Well, I believe functions should do what they say on the box (and the 
help page), and not what some user hopes they might do by mind-reading.

You do have a suitable set of starting values from the second fit, so 
why not just follow the rather explicit advice?

--
Spencer Graves, PhD, Senior Development Engineer
O:  (408)938-4420;  mobile:  (408)655-4567
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Error using glm with poisson family and identity link

2004-11-25 Thread Peter Dalgaard
Spencer Graves [EMAIL PROTECTED] writes:

 Dear Federico: Why do you use the identity link?  That can
 produce situations with an average of (-2) Poisson defects per unit,
 for example.  That's physical nonsense. 

So is _not_ using the identity link when the model is manifestly
additive on the identity scale. E.g. calibrating differential
spectrofluorometry with photon counters recording linear combinations
of intensities at different wavelengths.

I've bumped into similar situations before (binomial(link=identity), I
think it was then) and the glm.fit algorithm could use improvement in
dealing with the parameter constraints in these cases. With the
standard IRLS algorithm, if the maximum is on the boundary, you
basically hit a random point on the boundary and get stuck there with
a search direction pointing out of the valid region.
 
-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Error using glm with poisson family and identity link

2004-11-25 Thread Spencer Graves
Hi, Peter: 

 What do you do in such situations? 

 Sundar Dorai-Raj and I have extended glm concepts to models 
driven by a sum of k independent Poissons, with the a linear model for 
log(defectRate[i]) for each source (i = 1:k).  To handle convergence 
problems, etc., I think we need to use informative Bayes, but we're not 
there yet.  In any context where things are done more than once [which 
covers most human activities], informative Bayes seems sensible. 

 A related question comes with data representing the differences 
between Poisson counts, e.g., with d[i] = X[i]-X[i-1] = the number of 
new defects added between steps i-1 and i in a manufacturing process.  
Most of the time, d[i] is nonnegative.  However, in some cases, it can 
be negative, either because of metrology errors in X[i] or because of 
defect removal between steps i-1 and i. 

 Comments?
 Best Wishes,
 Spencer Graves
Peter Dalgaard wrote:
Spencer Graves [EMAIL PROTECTED] writes:
 

Dear Federico: Why do you use the identity link?  That can
produce situations with an average of (-2) Poisson defects per unit,
for example.  That's physical nonsense. 
   

So is _not_ using the identity link when the model is manifestly
additive on the identity scale. E.g. calibrating differential
spectrofluorometry with photon counters recording linear combinations
of intensities at different wavelengths.
I've bumped into similar situations before (binomial(link=identity), I
think it was then) and the glm.fit algorithm could use improvement in
dealing with the parameter constraints in these cases. With the
standard IRLS algorithm, if the maximum is on the boundary, you
basically hit a random point on the boundary and get stuck there with
a search direction pointing out of the valid region.
 

--
Spencer Graves, PhD, Senior Development Engineer
O:  (408)938-4420;  mobile:  (408)655-4567
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Error using glm with poisson family and identity link

2004-11-25 Thread Peter Dalgaard
Spencer Graves [EMAIL PROTECTED] writes:

 Hi, Peter: What do you do in such situations? Sundar Dorai-Raj
 and I have extended glm concepts to models driven by a sum of k
 independent Poissons, with the a linear model for log(defectRate[i])
 for each source (i = 1:k).  To handle convergence problems, etc., I
 think we need to use informative Bayes, but we're not there yet.  In
 any context where things are done more than once [which covers most
 human activities], informative Bayes seems sensible. A related
 question comes with data representing the differences between Poisson
 counts, e.g., with d[i] = X[i]-X[i-1] = the number of new defects
 added between steps i-1 and i in a manufacturing process.  Most of the
 time, d[i] is nonnegative.  However, in some cases, it can be
 negative, either because of metrology errors in X[i] or because of
 defect removal between steps i-1 and i. Comments?

I haven't got all that much experience with it, but obviously, the
various algorithms for constrained optimization (box- or otherwise) at
least allow you to find a proper maximum likelihood estimator.


-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Error using glm with poisson family and identity link

2004-11-25 Thread Spencer Graves
Hi, Peter: 

 Thanks for the comment and reply. 

 I generally avoid constrained optimizers for three reasons: 

 1.  My experience with them has included many cases where the 
optimizer would stop with an error when testing parameter values that 
violate the constraints.  If I transform the parameter space to remove 
the constraints, that never happens.  The constrained optimizers in R 
2.0.1 may not exhibit this behavior, but I have not checked. 

 2.  In a few cases, I've plotted the log(likelihood) vs. parameter 
values using various transformations.  When I've done that, I typically 
found that the most nearly parabolic performance used unconstrained 
parameterizations.  This makes asymptotic normality more useful and 
increases the accuracy of simple, approximate sequential Bayesian 
procedures. 

 3.  When I think carefully about a particular application, I often 
find a rationale for claiming that a certain unconstrained 
parameterization provides a better description of the application.  For 
example, interest income on investments is essentially additive on the 
log scale.  Similarly, the concept of materiality in Accounting is 
closer to being constant in log space:  One might look for an error of a 
few Euros in the accounts of a very small business, but in auditing some 
major government accounts, errors on the order of a few Euros might not 
be investigated.  Also, measurement errors with microvolts are much 
smaller than with megavolts;  expressing the measurements in decibels 
(i.e., on the log scale) makes the measurement errors more nearly 
comparable. 

 Thanks again for your comments. 
 Best Wishes,
 Spencer Graves

Peter Dalgaard wrote:
Spencer Graves [EMAIL PROTECTED] writes:
 

Hi, Peter: What do you do in such situations? Sundar Dorai-Raj
and I have extended glm concepts to models driven by a sum of k
independent Poissons, with the a linear model for log(defectRate[i])
for each source (i = 1:k).  To handle convergence problems, etc., I
think we need to use informative Bayes, but we're not there yet.  In
any context where things are done more than once [which covers most
human activities], informative Bayes seems sensible. A related
question comes with data representing the differences between Poisson
counts, e.g., with d[i] = X[i]-X[i-1] = the number of new defects
added between steps i-1 and i in a manufacturing process.  Most of the
time, d[i] is nonnegative.  However, in some cases, it can be
negative, either because of metrology errors in X[i] or because of
defect removal between steps i-1 and i. Comments?
   

I haven't got all that much experience with it, but obviously, the
various algorithms for constrained optimization (box- or otherwise) at
least allow you to find a proper maximum likelihood estimator.
 

--
Spencer Graves, PhD, Senior Development Engineer
O:  (408)938-4420;  mobile:  (408)655-4567
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Error using glm with poisson family and identity link

2004-11-25 Thread Prof Brian Ripley
On Thu, 25 Nov 2004, Spencer Graves wrote:
I generally avoid constrained optimizers for three reasons: 
1.  My experience with them has included many cases where the optimizer 
would stop with an error when testing parameter values that violate the 
constraints.  If I transform the parameter space to remove the constraints, 
that never happens.  The constrained optimizers in R 2.0.1 may not exhibit 
this behavior, but I have not checked.
They do not, and OTOH if the MLE really does lie on the boundary (and here 
it may well, with one data point fitted with mean zero) transformation 
will often not find a good solution.

Constrained optimization is a hard problem, good methods are very 
complex and good code to implement them is usually expensive.

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html