hope this helps. spencer graves
###################################
Very many thanks for your help.
>> What do these numbers represent?
They are letter frequencies arranged in rank order. (A very big sample that I got off the web for testing, but my own data - rank frequencies of various linguistic entities, including letter frequencies - are likely to be similar.)
Basically, I am testing the goodness of fit of three or four equations:
- the one I posted (Yule's equation)
- Zipf's equation (y = a * b^x, if I remember rightly, but the paper's at home, so I may be wrong...)
- a parameter-free equation
Regards, Andrew Wilson
####################################
Since x <- 1:26 and your y's are all positive, your model, ignoring the error term, is mathematically isomorphic to the following:

x <- 1:26
(fit <- lm(y~x+log(x)))
Call:
lm(formula = y ~ x + log(x))

Coefficients:
(Intercept)            x       log(x)
   35802074      -371008     -8222922
With reasonable starting values, I would expect "a" to converge to roughly exp(35802074), "k" to (-8222922), and "b" to exp(-371008). With values of these magnitudes for "a" and "b", the "nls" optimization is highly ill conditioned.
What do these numbers represent? By using "nls" you are assuming implicitly the following:
y = a*x^k*b^x + e, where the e's are independent normal errors with mean 0 and constant variance.
Meanwhile, the linear model I fit above assumes a different noise model:
log(y) = log(a) + k*log(x) + x*log(b) + e, where these e's are also independent normal, mean 0, constant variance.
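A minimal sketch of that log-scale fit, assuming the y vector from the original post and x taken as the ranks 1:26. The response in the lm call is log(y), so the intercept estimates log(a), the coefficient of log(x) estimates k, and the coefficient of x estimates log(b). The names logfit, cf and est are only illustrative.

```r
# Letter frequencies in rank order, copied from the original post
y <- c(37047647, 27083970, 23944887, 22536157, 20133224,
       20088720, 18774883, 18415648, 17103717, 13580739, 12350767,
       8682289, 7496355, 7248810, 7022120, 6396495, 6262477, 6005496,
       5065887, 4594147, 2853307, 2745322, 454572, 448397, 275136, 268771)
x <- seq_along(y)   # ranks 1..26

# Ordinary least squares for log(y) = log(a) + k*log(x) + x*log(b) + e
logfit <- lm(log(y) ~ x + log(x))
cf <- coef(logfit)

# Back-transform to the parameters of y = a * x^k * b^x
est <- c(a = exp(cf[["(Intercept)"]]),
         k = cf[["log(x)"]],
         b = exp(cf[["x"]]))
print(est)
```

summary(logfit) and plot(x, log(y)) with the fitted line overlaid would be the usual next checks on whether this noise model is reasonable.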
If you have no preference for one noise model over the other, I suggest you use the linear model I estimated, declare victory and worry about something else. If you insist on estimating the multiplicative model, you should start by dividing y by some number like 1e6 or 1e7 and consider reparameterizing the problem if that is not adequate. Have you consulted a good book on nonlinear regression? The two references cited in "?nls" are both excellent:
Bates, D. M. and Watts, D. G. (1988) _Nonlinear Regression Analysis and Its Applications_, Wiley.
Bates, D. M. and Chambers, J. M. (1992) _Nonlinear models._ Chapter 10 of _Statistical Models in S_, eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
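The rescaling suggestion above could be tried along these lines. This is only a sketch under stated assumptions: it uses the y vector from the original post, divides it by 1e6, and takes the nls starting values from a log-scale linear fit rather than from guesses. The names ys, start and nlsfit are illustrative, and the nls call is wrapped in try() because convergence is not guaranteed.

```r
# Letter frequencies in rank order, copied from the original post
y <- c(37047647, 27083970, 23944887, 22536157, 20133224,
       20088720, 18774883, 18415648, 17103717, 13580739, 12350767,
       8682289, 7496355, 7248810, 7022120, 6396495, 6262477, 6005496,
       5065887, 4594147, 2853307, 2745322, 454572, 448397, 275136, 268771)
x <- seq_along(y)   # ranks 1..26
ys <- y / 1e6       # rescale so that "a" is of moderate size

# Starting values from the linear fit on the log scale
cf <- coef(lm(log(ys) ~ x + log(x)))
start <- list(a = exp(cf[["(Intercept)"]]),
              k = cf[["log(x)"]],
              b = exp(cf[["x"]]))

# Additive-error fit of ys = a * x^k * b^x + e; may still fail to converge
nlsfit <- try(nls(ys ~ a * x^k * b^x, start = start), silent = TRUE)
if (inherits(nlsfit, "try-error")) {
  message("nls did not converge from these starting values")
} else {
  print(coef(nlsfit))
}
```

Because only y was rescaled, k and b keep their meaning, and a on the original scale is 1e6 times the fitted value.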
hope this helps. spencer graves
Dr Andrew Wilson wrote:
I am trying to fit a rank-frequency distribution with 3 unknowns (a, b and k) to a set of data.
This is my data set:
y <- c(37047647,27083970,23944887,22536157,20133224, 20088720,18774883,18415648,17103717,13580739,12350767, 8682289,7496355,7248810,7022120,6396495,6262477,6005496, 5065887,4594147,2853307,2745322,454572,448397,275136,268771)
and this is the fit I'm trying to do:
nlsfit <- nls(y ~ a * x^k * b^x, start=list(a=5,k=1,b=3))
(It's a Yule distribution.)
However, I keep getting:
"Error in nls(y ~ a * x^k * b^x, start = list(a = 5, k = 1, b = 3)) : singular gradient"
I guess this has something to do with the parameter start values.
I was wondering, is there a fully automated way of estimating parameters which doesn't need start values close to the final estimates? I know other programs do it, so is it possible in R?
Thanks, Andrew Wilson
______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help