Hello.
I am not certain even how to search the archives for this particular question,
so if there is an obvious answer, please smack me with a large halibut and send
me to the URLs.
I have been experimenting with fitting curves using both maximum likelihood
and maximum spacing estimation techniques. Originally, I wrote
distribution-specific functions in R, which work rather well. As the procedure
is identical for all distributions, other than the actual distribution function
itself, I thought I would try to build a single function that accepted the
distribution as an input and returned the results. Never having played with
calls, formals, and arguments before, I figured I would rip off, cough, cough,
be inspired by the venerable fitdistr in MASS, which accepts distribution
inputs.
After a few hours, I actually got it working decently (although unfinished).
However, I am finding something very weird. At its core, the technique requires
the difference between the value of the cumulative distribution function at
neighboring evaluations. I implement this by running p($DIST) on the vector of
sorted losses (call it SP), creating two new vectors, one c(0, SP) and one
c(SP, 1), and then taking the latter minus the former. If there happen to be
two instances of the same value, unless it is known rounding error, one
substitutes the density at that point for the difference in the cumulative
distribution (which would be 0, as the CDF of two identical values is the
same). So, I run d($DIST), prepend a 0 to make it the same length, and
return a new vector equal to pmax.int(DIFFERENCE, DENSITY), with the idea that
the density is always nonnegative and (I assumed) always less than the
difference in cumulative distributions, so it will only be the max in the case
of DIFFERENCE being truly 0. I then take the negative of the sum of the logs of
the differences, and that is the function passed to optim.
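For concreteness, here is a minimal sketch of the spacing computation described above. This is my own reconstruction, not the poster's actual code; the lognormal and its parameters are arbitrary stand-ins for $DIST:

```r
# Toy sorted losses; note the deliberate tie at 3.4.
x  <- sort(c(1.2, 3.4, 3.4, 7.9))
# p$DIST on the sorted vector (lognormal chosen arbitrarily here).
SP <- plnorm(x, meanlog = 1, sdlog = 0.5)
# n + 1 spacings: neighboring CDF differences, padded with 0 and 1.
DIFFERENCE <- c(SP, 1) - c(0, SP)
# d$DIST with a 0 prepended so the lengths match.
DENSITY <- c(0, dlnorm(x, meanlog = 1, sdlog = 0.5))
# The density substitutes for a spacing wherever it is the larger value.
SP3 <- pmax.int(DIFFERENCE, DENSITY)
# Negative sum of logs: the objective handed to optim.
-sum(log(SP3))
```

The tie makes the third spacing exactly 0, so without the pmax.int step the log would be -Inf.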
What is weird is that when I leave out the density correction (which is safe
over 99% of the time, as the chance of two identical losses is almost 0,
assuming no clustering/capping), I get a very similar result to my
distribution-customized function which calls the proper plnorm or pgenpareto
directly. When I add in the correction, the value is orders of magnitude
higher, which not only affects the fit (slightly) but also affects the goodness
of fit statistics. I have no idea why this happens, although in theory, if the
function is pulling too many density values, it would return a higher value,
as those densities are much closer to 0 and so their negative logs are larger.
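That hypothesis is easy to check directly: even with no ties at all, one can count how often the density exceeds the CDF difference, in which case pmax.int silently substitutes it. This is my own illustrative check, with arbitrary lognormal data and parameters:

```r
# Simulated losses with (almost surely) no ties.
set.seed(42)
x  <- sort(rlnorm(1000, meanlog = 1, sdlog = 0.5))
SP <- plnorm(x, meanlog = 1, sdlog = 0.5)
# Spacings between neighboring CDF evaluations, padded with 0 and 1.
DIFFERENCE <- c(SP, 1) - c(0, SP)
DENSITY    <- c(0, dlnorm(x, meanlog = 1, sdlog = 0.5))
# How many entries would pmax.int replace with the density?
sum(DENSITY > DIFFERENCE)
```

Each spacing averages about 1/(n + 1), so in the body of the distribution the density at a point can easily dwarf the CDF difference across one observation, and the substitution fires far more often than just at ties.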
In the code pasted below, if spacing returns -sum(log(SP2)), it works fine. If
it returns -sum(log(SP3)), it gives strange results.
I do not have the S programming language book (perhaps I should invest in it)
and the online help wasn't that helpful to me, so I would very much appreciate
any responses y'all may have.
Thank you very much,
--Avi
#
Code (Unfinished)
MSEFit <- function(x, distfun, start, ...)
{
    require(MASS); require(actuar)
    Call <- match.call(expand.dots = TRUE)
    if (missing(start))
        start <- NULL
    dots <- names(list(...))
    if (missing(x) || length(x) == 0L || mode(x) != "numeric")
        stop("'x' must be a non-empty numeric vector")
    if (any(!is.finite(x)))
        stop("'x' contains missing or infinite values")
    if (missing(distfun) || !(is.function(distfun) || is.character(distfun)))
        stop("'distfun' must be supplied as a function or name")
    n <- length(x)
    if (is.character(distfun)) {
        distname <- tolower(distfun)
        densfun <- switch(distname,
            exp = dexp, exponential = dexp, gamma = dgamma,
            `log-normal` = dlnorm, lnorm = dlnorm, lognormal = dlnorm,
            weibull = dweibull, pareto = dpareto, loglogistic = dllogis,
            transbeta = dtrbeta, `transformed beta` = dtrbeta,
            burr = dburr, paralogistic = dparalogis,
            genpareto = dgenpareto, generalizedpareto = dgenpareto,
            `generalized pareto` = dgenpareto,
            invburr = dinvburr, `inverse burr` = dinvburr,
            invpareto = dinvpareto, `inverse pareto` = dinvpareto,
            invparalogistic = dinvparalogis,
            `inverse paralogistic` = dinvparalogis,
            transgamma = dtrgamma, `transformed gamma` = dtrgamma,
            invexp = dinvexp, `inverse exponential` = dinvexp,
            invtransgamma = dinvtrgamma,
            `inverse transformed gamma` = dinvtrgamma,
            invgamma = dinvgamma, `inverse gamma` = dinvgamma,
            invweibull = dinvweibull, `inverse weibull` = dinvweibull,
            loggamma = dlgamma, genbeta = dgenbeta,
            `generalized beta` = dgenbeta,
            NULL)
        if (is.null(densfun))
            stop("unsupported distribution")
        distfun <- switch(distname,
            exp = pexp, exponential = pexp, gamma = pgamma,
            `log-normal` = plnorm, lnorm = plnorm, lognormal = plnorm,
            weibull =