On Oct 12, 2010, at 3:51 AM, Stratos Laskarides wrote:

Dear Madam/Sir

This may be quite a long shot...

By way of introduction, I am a master's student in actuarial science at the
University of Cape Town, and I am doing a project in R on some healthcare cost data. While coding in R I encountered an error message, which I
then googled, but I am still unable to resolve the issue.

I would like to ask whether, and how, it is possible to resolve the problem raised by this error message in R:

    Error: NA/NaN/Inf in foreign function call (arg 1)
    In addition: Warning message:
    step size truncated due to divergence

As background on my data and research problem, I am fitting a gamma regression model to 13 000 rows of insurance claims data, with categorical explanatory variables such as Age Band, Gender, and Region.

Perhaps the problem arises because the data set is too large for the
iteratively reweighted least squares algorithm to converge, in which case I
may need a different type of GLM. Or perhaps the categorical explanatory
variables take too many values (e.g. there are 15 Age Bands and 5 Regions).
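For context on the IRLS question, here is a minimal sketch (in Python, with hypothetical helper names) of what the algorithm does for a gamma GLM with a log link and a single continuous covariate. It assumes the log link; note that R's Gamma() family defaults to the inverse link, under which fitted means can go negative. Each iteration is one least-squares solve on a working response, so for a fixed number of columns the cost per iteration grows roughly linearly with the number of rows, and 13 000 rows is not large for it. The inner loop imitates what R's glm.fit does when the deviance of an updated fit is not finite: it halves the step back toward the previous coefficients, which is when the "step size truncated due to divergence" warning is issued.

```python
import math


def safe_exp(eta):
    """exp() that returns inf instead of raising OverflowError."""
    try:
        return math.exp(eta)
    except OverflowError:
        return math.inf


def gamma_deviance(y, mu):
    """Gamma deviance 2 * sum(-log(y/mu) + (y - mu)/mu); inf if any mu is invalid."""
    total = 0.0
    for yi, mi in zip(y, mu):
        if not (0.0 < mi < math.inf):
            return math.inf
        total += -math.log(yi / mi) + (yi - mi) / mi
    return 2.0 * total


def ols_line(x, z):
    """Ordinary least squares for z = b0 + b1 * x (the 2x2 normal equations)."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b1 = sxz / sxx
    return (mz - b1 * mx, b1)


def fit_gamma_log(x, y, max_iter=25, eps=1e-8):
    """IRLS for a gamma GLM with log link and one covariate (hypothetical helper)."""
    if any(yi <= 0 for yi in y):
        raise ValueError("gamma responses must be strictly positive")

    def means(b):
        return [safe_exp(b[0] + b[1] * xi) for xi in x]

    # Start from mu = y, i.e. eta = log(y), and regress that on x.
    beta = ols_line(x, [math.log(yi) for yi in y])
    dev_old = gamma_deviance(y, means(beta))
    halvings = 0
    for _ in range(max_iter):
        eta = [beta[0] + beta[1] * xi for xi in x]
        mu = [safe_exp(e) for e in eta]
        # Log link: dmu/deta = mu and V(mu) = mu^2, so every working weight
        # is 1 and the working response is z = eta + (y - mu) / mu.
        z = [e + (yi - mi) / mi for e, yi, mi in zip(eta, y, mu)]
        new = ols_line(x, z)
        dev = gamma_deviance(y, means(new))
        inner = 0
        while not math.isfinite(dev):
            # Analogue of "step size truncated due to divergence":
            # move halfway back towards the previous coefficients.
            inner += 1
            if inner > max_iter:
                raise RuntimeError("cannot correct step size")
            halvings += 1
            new = ((new[0] + beta[0]) / 2, (new[1] + beta[1]) / 2)
            dev = gamma_deviance(y, means(new))
        # Same relative-change convergence rule that glm.control() describes.
        converged = abs(dev - dev_old) / (abs(dev) + 0.1) < eps
        beta, dev_old = new, dev
        if converged:
            break
    return beta, dev_old, halvings
```

The convergence test uses the relative change in deviance, so the number of rows affects the work per iteration but not, by itself, whether the iterations settle down; that depends on the data and the model.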

Any insights you could provide would be much appreciated.

You are asking the right questions. Most probably some particular stratum of the categorical variables has a small number of informative events or is pathologically distributed (from the perspective of your model structure). This is especially likely when you enter interaction terms. Tabular investigation may disclose a suspect and point the way to "nail down" the culprit.

What are the descriptive stats on your outcome variable stratified by age and region?
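The stratified check asked about above can be sketched as follows (Python rather than R; the field names age_band, region and cost, and the threshold min_n=30, are assumptions for illustration). In R, table() and tapply() over the two factors would give the same information. Strata with no rows at all never appear in the report, so the second helper lists the empty cells of the full grid (e.g. 15 Age Bands by 5 Regions).

```python
from collections import defaultdict
from statistics import mean, median


def stratum_summary(rows, keys=("age_band", "region"), outcome="cost", min_n=30):
    """Per-stratum descriptive statistics, flagging cells that look too thin to fit."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[outcome])
    report = {}
    for stratum, values in sorted(groups.items()):
        positive = [v for v in values if v > 0]
        report[stratum] = {
            "n": len(values),
            "zeros": len(values) - len(positive),
            "mean": mean(values),
            "median_positive": median(positive) if positive else None,
            "max": max(values),
            "sparse": len(values) < min_n,
        }
    return report


def empty_cells(report, levels_a, levels_b):
    """Combinations of two factors' levels that have no rows at all."""
    return [(a, b) for a in levels_a for b in levels_b if (a, b) not in report]


# Hypothetical rows, only to show the shape of the input.
rows = [
    {"age_band": "20-24", "region": "A", "cost": 0.0},
    {"age_band": "20-24", "region": "A", "cost": 350.0},
    {"age_band": "85+", "region": "E", "cost": 12000.0},
]
for stratum, stats in stratum_summary(rows, min_n=2).items():
    print(stratum, stats)
```

Cells flagged as sparse, or with many zeros or a single extreme value, are the natural suspects to examine first.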

One option that immediately presents itself is modeling age as a continuous variable with a spline representation. I have quite a bit of experience working with actuaries and I do know the dominant analytic strategy is cutting data into discrete categories. However, this is a pretty small dataset and you should be prepared to argue in favor of the more powerful strategy of keeping continuous variables continuous.
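To make the spline suggestion concrete, here is a minimal sketch of a restricted (natural) cubic spline basis in the truncated-power form described by Harrell: the fitted curve is cubic between knots and linear beyond the outer knots. The knot positions and ages below are hypothetical; in practice knots are often placed at quantiles of age. With four knots, age enters the model through three columns instead of the 14 dummy variables needed for 15 age bands. In R, rms::rcs() builds this kind of basis, and splines::ns() builds natural cubic splines using a different (B-spline) parameterisation.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis [x, s_1(x), ..., s_{k-2}(x)] for k knots.

    Each s_j is cubic between knots and linear beyond the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least three knots")

    def cube_plus(u):
        # Truncated cubic (u)_+^3
        return u ** 3 if u > 0 else 0.0

    span = t[k - 1] - t[k - 2]
    scale = (t[-1] - t[0]) ** 2  # normalisation so columns are on the scale of x
    row = [x]
    for j in range(k - 2):
        term = (cube_plus(x - t[j])
                - cube_plus(x - t[k - 2]) * (t[k - 1] - t[j]) / span
                + cube_plus(x - t[k - 1]) * (t[k - 2] - t[j]) / span)
        row.append(term / scale)
    return row


# Hypothetical ages and knots: one design-matrix row per policyholder.
knots = [25, 40, 55, 70]
design = [rcs_basis(age, knots) for age in (22, 37, 58, 81)]
```

The columns of this design matrix would replace the Age Band dummies in the GLM, and the number of knots controls how flexible the age curve is allowed to be.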

Another issue: how are you handling the often statistically pathological zero claims that almost always occur in healthcare claims data? What does plot(density(claims)) look like? A gamma model is going to have real difficulty with the typical sort of health claims distribution. Are you prepared to model using zero-inflated or zero-adjusted models?
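Why zero claims are a problem for a gamma model, sketched in Python: the gamma unit deviance contains log(y/mu), which is undefined at y = 0, and R's glm() will not accept non-positive responses for a Gamma family. One zero-adjusted (hurdle-style) approach splits the outcome into a zero/non-zero part and a positive-amount part. The helper names here are hypothetical, and the sketch only shows the decomposition, not the fitted models (typically a logistic regression for the first part and a gamma GLM on the positive claims for the second).

```python
import math


def gamma_unit_deviance(y, mu):
    """2 * (-log(y/mu) + (y - mu)/mu); returns inf at y <= 0, where log(y/mu) is undefined."""
    if y <= 0:
        return math.inf
    return 2.0 * (-math.log(y / mu) + (y - mu) / mu)


def two_part_summary(claims):
    """Split claims into P(claim == 0) and the mean of the positive claims."""
    positive = [c for c in claims if c > 0]
    p_zero = 1 - len(positive) / len(claims)
    mean_positive = sum(positive) / len(positive) if positive else float("nan")
    # Unconditional mean = P(claim > 0) * E[claim | claim > 0]
    overall_mean = (1 - p_zero) * mean_positive
    return p_zero, mean_positive, overall_mean
```

The identity in the last step is what lets the two parts be modelled separately and recombined into an expected cost per policyholder.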


--
David.


Thank you ever so much.

Kind regards
Stratos Laskarides
South Africa
--

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
