Re: [R] Strange behavior when trying to piggyback off of fitdistr

2011-01-11 Thread emorway

Avi, 

Its been nearly a year since you made this post, but I'm curious if you were
able to find a solution to your problem?  Your inquiry is closely related to
a problem I'm having.

Thanks, 
Eric
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Strange-behavior-when-trying-to-piggyback-off-of-fitdistr-tp1012536p3209887.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Strange behavior when trying to piggyback off of fitdistr

2010-01-12 Thread Adler, Avraham
Hello.

I am not certain even how to search the archives for this particular question, 
so if there is an obvious answer, please smack me with a large halibut and send 
me to the URLs.

I have been experimenting with fitting curves by using both maximum likelihood 
and maximum spacing estimation techniques. Originally, I have been writing 
distribution-specific functions in 'R' which work rather well. As the procedure 
is identical for all distributions, other than the actual distribution function 
itself, I thought I would try to build a single function that accepted the 
distribution as an input and returned the results. Never having played with 
calls, formals, and arguments before, I figured I would rip off, cough, cough, 
be inspired by the venerable fitdistr in MASS, which accepts distribution 
inputs.

After a few hours, I actually got it working decently (although unfinished). 
However, I am finding something very weird. At its core, the technique requires 
the difference between the value of the cumulative distribution function at 
neighboring evaluations. I implement this by running p($DIST) on the vector of 
sorted losses (call it SP), creating two new vectors, one c(0, SP) and one 
c(SP, 1), and then taking the latter minus the former. If there happen to be 
two instances of the same value, unless it is known rounding error, one 
substitutes the density at that point for the difference in the cumulative 
distribution (which would be 0, as the CDF of two identical values is the 
same). So, I run d($DIST), add a 0 in front to make it the same length, and 
return a new vector equal to pmax.int(DIFFERECEN, DENSITY), with the idea that 
the density is always 0 and always less than the difference in cumulative 
distributions, so it will only be max in the case of DIFFERENCE being trul!
 y 0. I then take negative the sum of the log of the differences and that is 
the function passed to optim.

What is weird is when I leave out the density correction (which is safe 
99.% of the time as the chances of two identical losses is almost 0 
(assuming no clustering/capping) ), I get a very similar result to my 
distribution-customized function which calls the proper plnorm or pgenpareto 
directly. When I add in the correction, the value is orders of magnitude 
higher, which not only affects the fit (slightly) but also affects the goodness 
of fit statistics. I have no idea why this happens, although in theory, if the 
function is pulling too many density values, it would return a higher value as 
the densities are much closer to 0 so the neg-log is a larger number.

In the code pasted below, if spacing returns -sum(log(SP2), it works fine. If 
it returns -(sum(log(SP3)), it gives strange results.

I do not have the S programming language book (perhaps I should invest in it) 
and the online help wasn't that helpful to me, so I would very much appreciate 
any responses y'all may have.

Thank you very much,

--Avi

#
Code (Unfinished)

MSEFit - function (x, distfun, start, ...)
{
require (MASS); require (actuar);
Call - match.call(expand.dots = TRUE)
if (missing(start))
start - NULL
dots - names(list(...))
if (missing(x) || length(x) == 0L || mode(x) != numeric) 
stop('x' must be a non-empty numeric vector)
if (any(!is.finite(x))) 
stop('x' contains missing or infinite values)
if (missing(distfun) || !(is.function(distfun) || is.character(distfun))) 
stop('density' must be supplied as a function or name)
n - length(x)
if (is.character(distfun)) {
distname - tolower(distfun)
densfun - switch(distname, exp = dexp, exponential = dexp, gamma = 
dgamma,
`log-normal` = dlnorm, lnorm = dlnorm, lognormal = dlnorm, weibull 
= dweibull,
pareto = dpareto, loglogistic = dllogis, transbeta = dtrbeta,
`transformed beta` = dtrbera, burr = dburr, paralogistic = 
dparalogis,
genpareto = dgenpareto, generalizedpareto = dgenpareto,
`generalized pareto` = dgenpareto, invburr = dinvburr, 
`inverse burr` = dinvburr, invpareto = dinvpareto, `inverse pareto` 
= dinvpareto,
invparalogistic = dinvparalogis, `inverse paralogistic` = 
dinvparalogis,
transgamma = dtrgamma, `transformed gamma` = dtrgamma, invexp = 
dinvexp,
`inverse exponential` = dinvexp, invtransgamma = dinvtrgamma,
`inverse transformed gamma` = dinvtrgamma, invgamma = dinvgamma,
`inverse gamma` = dinvgamma, invweibull = dinvweibull, `inverse 
weibull` = dinvweibull,
loggamma = dlgamma, genbeta = dgenbeta, `generalized beta` = 
dgenbeta, NULL)
if (is.null(densfun)) 
stop(unsupported distribution)
distfun - switch(distname, exp = pexp, exponential = pexp, gamma = 
pgamma,
`log-normal` = plnorm, lnorm = plnorm, lognormal = plnorm, weibull 
=