Re: [R] Limitations and scale of R, and performance issues if and when limit reached

Uwe Ligges Sat, 23 Oct 2010 08:43:51 -0700


On 21.10.2010 22:20, Stratos Laskarides wrote:

  Hi there

Thank you for everyone's help in all my previous questions.

By way of intro, I am a masters student in actuarial science at the
University of Cape Town, and I am doing a project in R on some healthcare
cost data. Just for clarity before I embark on further research may I please
ask the following.

I want to take the direction of modelling health insurance claims data with
Tweedie compound Poisson models for over 2 million beneficiaries. I'd also
like to work in a double GLM framework so that the dispersion parameter
captures as much variance as possible. In addition, I'd like these results
to somehow feed into a stochastic model application, which will form part of
a Dynamic Financial Analysis model of a health insurer.

My question is, in light of the above broad overview, how large must data
sets be before R faces any performance problems or issues? In other words
what "scale" can R handle?

Depends on the available memory, the kind of data and the methods you are going to apply.
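
To get a feel for the memory side, one option is to measure a small sample and scale up. This is only a rough sketch: the column names and the simulated values below are made up for illustration, and the estimate covers the raw data frame only. glm() also builds a model frame and a model matrix, so the working memory during a fit will be a multiple of this figure.

```r
# Hypothetical columns; only the order of magnitude matters here.
set.seed(1)
n_sample <- 1e5
sample_df <- data.frame(
  age_band = factor(sample(1:15, n_sample, replace = TRUE)),
  gender   = factor(sample(c("F", "M"), n_sample, replace = TRUE)),
  region   = factor(sample(1:5, n_sample, replace = TRUE)),
  cost     = rgamma(n_sample, shape = 2, rate = 0.01)
)

# Average storage per row of the sample
bytes_per_row <- as.numeric(object.size(sample_df)) / n_sample

# Scale up to 2 million beneficiaries (raw data only, no model overhead)
est_mb <- bytes_per_row * 2e6 / 1024^2
cat(sprintf("about %.1f bytes per row, roughly %.0f MB for 2 million rows\n",
            bytes_per_row, est_mb))
```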


Uwe Ligges

Thanks ever so much once again.

Kind regards
Stratos

  On Tue, Oct 12, 2010 at 11:31 AM, Dennis Murphy<djmu...@gmail.com>  wrote:

Hi:

  On Tue, Oct 12, 2010 at 12:51 AM, Stratos Laskarides<stratl...@gmail.com>  wrote:

  Dear Madam/Sir

This may be quite a long shot...

By way of intro, I am a masters student in actuarial science at the
University of Cape Town, and I am doing a project in R on some healthcare
cost data. During my coding in R I encountered an error message, which I
then googled, but I am still unable to resolve the issue.

I would like to please ask if and how it is possible to resolve the problem
raised by the error message "Error: NA/NaN/Inf in foreign function call
(arg 1) In addition: Warning message: step size truncated due to divergence"
in R?


That error message can arise if division by zero occurs somewhere in the
computation. Try using ftable() or some related function that will print out
your complete table (4-way?) and check whether you have zero frequency in one
or more cells. If there are zero frequencies, that does not necessarily
explain the problem, but it's a reasonable initial hypothesis. Merging some
categories to get enough frequencies per cell may be useful if you do have
zero frequencies, and then try the fit again to see if you get more sensible
results.
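
The zero-cell check suggested above might look something like this. It is a sketch on simulated data: the data frame `claims` and its column names are assumptions, not the poster's actual data.

```r
# Simulated stand-in for the real claims data (hypothetical column names)
set.seed(1)
n <- 200
claims <- data.frame(
  age_band = factor(sample(1:15, n, replace = TRUE), levels = 1:15),
  gender   = factor(sample(c("F", "M"), n, replace = TRUE)),
  region   = factor(sample(1:5, n, replace = TRUE), levels = 1:5)
)

# Cross-tabulate all three factors and print the table in flat form
tab <- xtabs(~ age_band + gender + region, data = claims)
print(ftable(tab))

# Count the empty cells and list the first few empty combinations
n_empty <- sum(tab == 0)
cat(n_empty, "of", length(tab), "cells are empty\n")
empty <- subset(as.data.frame(tab), Freq == 0)
print(head(empty))
```

Any combination listed in `empty` is a candidate for merging with a neighbouring category before refitting.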

When the error is thrown, it can be useful to do

    traceback()

as it recalls the sequence of function calls that led up to the error, but it
helps to have enough R experience to make heads or tails of the output :)
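
traceback() is meant for interactive use right after an error. When the code runs as a script, a comparable record of the call stack can be captured with a calling handler. The following is a minimal sketch of that alternative; the function names `fit_step` and `check_input` are hypothetical stand-ins for whatever code raises the error.

```r
# Hypothetical nested calls that end in an error
fit_step    <- function(x) check_input(x)
check_input <- function(x) stop("NA/NaN/Inf in input")

stack <- NULL
msg <- tryCatch(
  withCallingHandlers(
    fit_step(NA),
    # Runs while the failing frames are still on the stack
    error = function(e) stack <<- sys.calls()
  ),
  # Runs after unwinding; returns the error message as a string
  error = function(e) conditionMessage(e)
)

print(msg)
print(stack)  # includes the calls to fit_step() and check_input()
```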


As for some background on my specific data and research problem at hand, I
am fitting a gamma regression model to 13 000 lines of insurance claims
data, which will be regressed against categorical variables such as Age
Band, Gender, and Region.


The more variables you have in the model, the greater the number of cell
combinations. A 15 x 2 x 5 combination of your three variables, for
example, would generate 150 combinations of the three variables, and it's
entirely possible for a few of those combinations to have small or zero
frequencies. In addition, adding a new variable to the model would at least
double the number of cells, spreading/thinning out the data even more.
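
The arithmetic behind this can be checked in a couple of lines. The fourth factor below is hypothetical and only there to show how quickly the cell count grows.

```r
# Levels per factor as described in the thread
levels_per_factor <- c(age_band = 15, gender = 2, region = 5)
n_cells <- prod(levels_per_factor)      # 15 * 2 * 5 = 150 cells
n_obs   <- 13000

# Average number of claims per cell if the data were perfectly balanced
avg_per_cell <- n_obs / n_cells

# Adding a hypothetical 4-level factor multiplies the cell count by 4
n_cells_4 <- n_cells * 4

# A main-effects-only model needs far fewer coefficients than cells:
# intercept plus (levels - 1) per factor
n_coef_main <- 1 + sum(levels_per_factor - 1)

cat(n_cells, "cells,", round(avg_per_cell, 1), "claims per cell on average;",
    n_cells_4, "cells with a fourth factor;",
    n_coef_main, "main-effects coefficients\n")
```

Real data are rarely balanced, so some cells will hold far fewer observations than the average, and empty cells matter most once interaction terms enter the model.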


Perhaps my problem arises because the data set is too large and the
iteratively reweighted least squares algorithm therefore cannot converge, in
which case I perhaps need another GLM type. Or maybe the categorical
explanatory variables can take on too many values (e.g. there are 15 Age
Bands, 5 Regions).


If your response is continuous and positive valued with a right skewed
distribution, then a Gamma model would appear to be sensible.

The data set is not too large; successful GLMs have been fit with much larger
data sets. Your second hypothesis sounds more plausible, though.
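
For reference, a Gamma fit of this shape might be set up as below. This is a sketch on simulated data, not the poster's model: the data frame, the column names and the simulated cost structure are all assumptions. The log link is also my choice here; the default link for Gamma() in R is the inverse link, which can produce non-positive fitted means during iteration, and switching to the log link is one commonly tried remedy when the fit diverges.

```r
set.seed(42)
n <- 2000
claims <- data.frame(
  age_band = factor(sample(1:15, n, replace = TRUE)),
  gender   = factor(sample(c("F", "M"), n, replace = TRUE)),
  region   = factor(sample(1:5, n, replace = TRUE))
)

# Simulated mean cost rising with age band; purely illustrative
mu <- exp(6 + 0.05 * as.integer(claims$age_band))
claims$cost <- rgamma(n, shape = 2, rate = 2 / mu)

# The Gamma family rejects zero or negative responses, so check first
stopifnot(all(claims$cost > 0))

fit <- glm(cost ~ age_band + gender + region,
           family = Gamma(link = "log"), data = claims)

print(fit$converged)            # did IRLS converge?
print(summary(fit)$dispersion)  # estimated dispersion (about 1/shape here)
```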

HTH,
Dennis


Any insights you could provide would be much appreciated.

Thank you ever so much.

Kind regards
Stratos Laskarides
South Africa


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

