If the numbers are letter frequencies, I would suggest Poisson regression using "glm"; the default link for the Poisson family is the logarithm, and that should work quite well for you.
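A minimal sketch of that suggestion, using the rank/frequency data posted further down in this thread (x is the rank, y the observed frequency):

```r
# Letter-frequency data quoted later in this thread (rank order).
x <- 1:26
y <- c(37047647, 27083970, 23944887, 22536157, 20133224,
       20088720, 18774883, 18415648, 17103717, 13580739, 12350767,
       8682289, 7496355, 7248810, 7022120, 6396495, 6262477, 6005496,
       5065887, 4594147, 2853307, 2745322, 454572, 448397, 275136, 268771)

# family = poisson uses the log link by default, so this fits
#   log(E[y]) = b0 + b1*x + b2*log(x),
# i.e. E[y] = a * x^k * b^x with a = exp(b0), b = exp(b1), k = b2.
pfit <- glm(y ~ x + log(x), family = poisson)
coef(pfit)
```

Back-transforming with exp() recovers "a" and "b" on the original scale; the log(x) coefficient is "k" directly.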

hope this helps. spencer graves

###################################
Very many thanks for your help.


>> What do these numbers represent?



They are letter frequencies arranged in rank order. (A very big sample that I got off the web for testing, but my own data - rank frequencies of various linguistic entities, including letter frequencies - are likely to be similar.)

Basically, I am testing the goodness of fit of three or four equations:

- the one I posted (Yule's equation)
- Zipf's equation (y = a * b^x, if I remember rightly, but the paper's at
home, so I may be wrong...)
- a parameter-free equation
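If Zipf's equation really has the form y = a * b^x as written above (the poster is unsure of the exact form), it linearizes the same way as the Yule equation: taking logs gives log(y) = log(a) + x*log(b), an ordinary linear fit. A sketch with the posted data:

```r
x <- 1:26
y <- c(37047647, 27083970, 23944887, 22536157, 20133224,
       20088720, 18774883, 18415648, 17103717, 13580739, 12350767,
       8682289, 7496355, 7248810, 7022120, 6396495, 6262477, 6005496,
       5065887, 4594147, 2853307, 2745322, 454572, 448397, 275136, 268771)

# log(y) = log(a) + x*log(b), so a = exp(intercept), b = exp(slope).
zfit <- lm(log(y) ~ x)
a <- exp(coef(zfit)[["(Intercept)"]])
b <- exp(coef(zfit)[["x"]])
c(a = a, b = b)
```

Because the frequencies decrease with rank, the fitted slope is negative and b comes out between 0 and 1.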

Regards,
Andrew Wilson

####################################
     Since x <- 1:26 and your y's are all positive, your model,
ignoring the error term, is mathematically isomorphic to the following:

x <- 1:26
(fit <- lm(y~x+log(x)))

Call: lm(formula = y ~ x + log(x))

Coefficients:
(Intercept)            x       log(x)
  35802074      -371008     -8222922

     With reasonable starting values, I would expect "a" to converge to
roughly exp(35802074), "k" to (-8222922), and "b" to exp(-371008).  With
values of these magnitudes for "a" and "b", the "nls" optimization is
highly ill conditioned.

     What do these numbers represent?  By using "nls" you are assuming
implicitly the following:

     y = a*x^k*b^x + e, where the e's are independent normal errors
with mean 0 and constant variance.

     Meanwhile, the linear model I fit above assumes a different noise
model:

     log(y) = log(a) + k*log(x) + x*log(b) + e, where these e's are
also independent normal, mean 0, constant variance.
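That log-scale model can be fit and back-transformed directly; a sketch using the data from the original post:

```r
x <- 1:26
y <- c(37047647, 27083970, 23944887, 22536157, 20133224,
       20088720, 18774883, 18415648, 17103717, 13580739, 12350767,
       8682289, 7496355, 7248810, 7022120, 6396495, 6262477, 6005496,
       5065887, 4594147, 2853307, 2745322, 454572, 448397, 275136, 268771)

# log(y) = log(a) + k*log(x) + x*log(b) + e
lfit <- lm(log(y) ~ x + log(x))

a <- exp(coef(lfit)[["(Intercept)"]])
b <- exp(coef(lfit)[["x"]])
k <- coef(lfit)[["log(x)"]]
c(a = a, k = k, b = b)
```

On the log scale the coefficients are of moderate size, which is precisely why this fit is numerically well behaved where the raw-scale nls is not.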

     If you have no preference for one noise model over the other, I
suggest you use the linear model I estimated, declare victory and worry
about something else.  If you insist on estimating the multiplicative
model, you should start by dividing y by some number like 1e6 or 1e7 and
consider reparameterizing the problem if that is not adequate.  Have you
consulted a good book on nonlinear regression?  The two references cited
in "?nls" are both excellent:

     Bates, D. M. and Watts, D. G. (1988) _Nonlinear Regression
     Analysis and Its Applications_. Wiley.

     Bates, D. M. and Chambers, J. M. (1992) _Nonlinear models_.
     Chapter 10 of _Statistical Models in S_, eds J. M. Chambers and
     T. J. Hastie. Wadsworth & Brooks/Cole.
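The rescaling advice above, combined with start values taken from the log-scale linear fit rather than guessed, might look like the following sketch. The nls call is wrapped in try() because, even rescaled, an ill-conditioned problem may still fail to converge:

```r
x <- 1:26
y <- c(37047647, 27083970, 23944887, 22536157, 20133224,
       20088720, 18774883, 18415648, 17103717, 13580739, 12350767,
       8682289, 7496355, 7248810, 7022120, 6396495, 6262477, 6005496,
       5065887, 4594147, 2853307, 2745322, 454572, 448397, 275136, 268771)

ys <- y / 1e7   # rescale as suggested; the fitted "a" is then in units of 1e7

# Start values from the linearized model instead of arbitrary guesses:
lfit <- lm(log(ys) ~ x + log(x))
st <- list(a = exp(coef(lfit)[["(Intercept)"]]),
           k = coef(lfit)[["log(x)"]],
           b = exp(coef(lfit)[["x"]]))

nfit <- try(nls(ys ~ a * x^k * b^x, start = st))
```

Starting nls at the linear-model solution is the usual remedy when there is no self-starting model available for the equation being fit.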

hope this helps. spencer graves

Dr Andrew Wilson wrote:

I am trying to fit a rank-frequency distribution with 3 unknowns (a, b
and k) to a set of data.

This is my data set:

y <- c(37047647,27083970,23944887,22536157,20133224,
20088720,18774883,18415648,17103717,13580739,12350767,
8682289,7496355,7248810,7022120,6396495,6262477,6005496,
5065887,4594147,2853307,2745322,454572,448397,275136,268771)

and this is the fit I'm trying to do:

nlsfit <- nls(y ~ a * x^k * b^x, start=list(a=5,k=1,b=3))

(It's a Yule distribution.)

However, I keep getting:

"Error in nls(y ~ a * x^k * b^x, start = list(a = 5, k = 1, b = 3)) : singular gradient"

I guess this has something to do with the parameter start values.

I was wondering, is there a fully automated way of estimating parameters
which doesn't need start values close to the final estimates?  I know
other programs do it, so is it possible in R?

Thanks,
Andrew Wilson

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help



