Re: [R] Optimisation and NaN Errors using clm() and clmm()

2013-04-20 Thread Rune Haubo
On 18 April 2013 18:38, Thomas Foxley thomasfox...@aol.com wrote:
 Rune,

 Thank you very much for your response.

 I don't actually have the models that failed to converge from the first
 (glmulti) part as they were not saved with the confidence set. glmulti
 generates thousands of models so it seems reasonable that a few of these may
 not converge.

 The clmm() model I provided was just an example - not all models have 17
 parameters. Only one or two produced errors (the example I gave being one
 of them); perhaps overparameterisation is the root of the
 problem.

 Regarding incomplete data - there are only 103 (of 314) records where I have
 data for every predictor. The number of observations included will obviously
 vary between models: models with fewer predictors will include more
 observations. glmulti acts as a wrapper for another function, meaning (in
 this case) NAs are treated as they would be in clm(). Is there a way around
 this (apart from filling in the missing data)? I believe it's possible to
 limit model complexity in the glmulti call - which may or may not increase
 the number of observations - how would this affect interpretation of the
 results?

Since the likelihood (and hence also AIC-like criteria) depends on the
number of observations, I would make sure that only models with the
same number of observations are compared using model selection
criteria. This means that I would make a data.frame with complete
observations either by just deleting all rows with one or more missing
predictors or by imputing some data points. If one or a couple of
variables are responsible for most of the missing observations, you
could disregard these variables before deleting rows with NAs.
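
A minimal sketch of the complete-case approach described above, using base R only. The data frame `dat` and its columns are made up for illustration; they merely stand in for `database` and its predictors, and the idea of dropping the single worst variable is one possible reading of the advice, not a rule.

```r
# Toy data standing in for 'database' (all values here are hypothetical).
dat <- data.frame(
  dependent   = c(1, 2, 3, 2, 1, 3, 2, 1),
  predictor_1 = c(0.1, NA, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8),
  predictor_2 = c(NA, NA, NA, NA, 1.5, 1.6, NA, 1.8),
  predictor_3 = c(2.1, 2.2, 2.3, NA, 2.5, 2.6, 2.7, 2.8)
)

# Count missing values per column to see which variables cost the most rows.
n_missing <- colSums(is.na(dat))
print(n_missing)

# Option 1: keep only rows that are complete for every variable.
dat_complete <- dat[complete.cases(dat), ]

# Option 2: first drop the variable responsible for most of the missingness,
# then delete the remaining incomplete rows.
worst <- names(which.max(n_missing))
dat_reduced <- dat[, setdiff(names(dat), worst)]
dat_reduced <- dat_reduced[complete.cases(dat_reduced), ]

# Number of rows kept under each option.
print(nrow(dat_complete))
print(nrow(dat_reduced))
```

Whichever data frame is chosen would then be passed as the `data` argument for every candidate model, so that all models are fitted to the same rows.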

As I said, I am no expert in model averaging or glmulti usage, so
there might be better approaches or other opinions on this.

Cheers,
Rune

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Optimisation and NaN Errors using clm() and clmm()

2013-04-18 Thread Thomas Foxley

Rune,

Thank you very much for your response.

I don't actually have the models that failed to converge from the first 
(glmulti) part as they were not saved with the confidence set. glmulti 
generates thousands of models so it seems reasonable that a few of these 
may not converge.


The clmm() model I provided was just an example - not all models have 17 
parameters. Only one or two produced errors (the example I gave being 
one of them); perhaps overparameterisation is the root of 
the problem.


Regarding incomplete data - there are only 103 (of 314) records where I 
have data for every predictor. The number of observations included will 
obviously vary between models: models with fewer predictors will 
include more observations. glmulti acts as a wrapper for another 
function, meaning (in this case) NAs are treated as they would be in 
clm(). Is there a way around this (apart from filling in the missing 
data)? I believe it's possible to limit model complexity in the glmulti 
call - which may or may not increase the number of observations - how 
would this affect interpretation of the results?


Thanks again,

Tom



Re: [R] Optimisation and NaN Errors using clm() and clmm()

2013-04-16 Thread Rune Haubo
On 15 April 2013 13:18, Thomas thomasfox...@aol.com wrote:

 Dear List,

 I am using both the clm() and clmm() functions from the R package 'ordinal'.

 I am fitting an ordinal dependent variable with 5 categories to 9 continuous 
 predictors, all of which have been normalised (mean subtracted then divided 
 by standard deviation), using a probit link function. From this global model 
 I am generating a confidence set of 200 models using clm() and the 'glmulti' 
 R package. This produces these errors:

 > model.2.10 <- glmulti(as.factor(dependent) ~ 
 predictor_1*predictor_2*predictor_3*predictor_4*predictor_5*predictor_6*predictor_7*predictor_8*predictor_9,
  data = database, fitfunc = clm, link = "probit", method = "g", crit = "aicc", 
 confsetsize = 200, marginality = TRUE)
 ...
 After 670 generations:
 Best model: 
 as.factor(dependent)~1+predictor_1+predictor_2+predictor_3+predictor_4+predictor_5+predictor_6+predictor_8+predictor_9+predictor_4:predictor_3+predictor_6:predictor_2+predictor_8:predictor_5+predictor_9:predictor_1+predictor_9:predictor_4+predictor_9:predictor_5+predictor_9:predictor_6
 Crit= 183.716706496392
 Mean crit= 202.022138576506
 Improvements in best and average IC have been below the specified goals.
 Algorithm is declared to have converged.
 Completed.
 There were 24 warnings (use warnings() to see them)
 > warnings()
 Warning messages:
 1: optimization failed: step factor reduced below minimum
 2: optimization failed: step factor reduced below minimum
 3: optimization failed: step factor reduced below minimum
 etc.


 I am then re-fitting each of the 200 models with the clmm() function, with 2 
 random factors (family nested within order). I get this error in a few of the 
 re-fitted models:

 > model.2.glmm.2 <- clmm(as.factor(dependent) ~ 1 + predictor_1 + 
 predictor_2 + predictor_3 + predictor_6 + predictor_7 + predictor_8 + 
 predictor_9 + predictor_6:predictor_2 + predictor_7:predictor_2 + 
 predictor_7:predictor_3 + predictor_8:predictor_2 + predictor_9:predictor_1 + 
 predictor_9:predictor_2 + predictor_9:predictor_3 + predictor_9:predictor_6 + 
 predictor_9:predictor_7 + predictor_9:predictor_8+ (1|order/family), link = 
 "probit", data = database)
 > summary(model.2.glmm.2)
 
 Cumulative Link Mixed Model fitted with the Laplace approximation

 formula: as.factor(dependent) ~ 1 + predictor_1 + predictor_2 + predictor_3 + 
 predictor_6 + predictor_7 + predictor_8 + predictor_9 + 
 predictor_6:predictor_2 + predictor_7:predictor_2 +
 predictor_7:predictor_3 + predictor_8:predictor_2 + predictor_9:predictor_1 + 
 predictor_9:predictor_2 +
 predictor_9:predictor_3 + predictor_9:predictor_6 + predictor_9:predictor_7 + 
 predictor_9:predictor_8 + (1 | order/family)
 data: database

 link   threshold nobs logLik AIC    niter    max.grad cond.H
 probit flexible  103  -65.56 173.13 58(3225) 8.13e-06 4.3e+03

 Random effects:
              Var       Std.Dev
 family:order 7.493e-11 8.656e-06
 order        1.917e-12 1.385e-06
 Number of groups: family:order 12, order 4

 Coefficients:
                         Estimate Std. Error z value Pr(>|z|)
 predictor_1              0.40802    0.78685   0.519   0.6041
 predictor_2              0.02431    0.26570   0.092   0.9271
 predictor_3             -0.84486    0.32056  -2.636   0.0084 **
 predictor_6              0.65392    0.34348   1.904   0.0569 .
 predictor_7              0.71730    0.29596   2.424   0.0154 *
 predictor_8             -1.37692    0.75660  -1.820   0.0688 .
 predictor_9              0.15642    0.28969   0.540   0.5892
 predictor_2:predictor_6 -0.46880    0.18829  -2.490   0.0128 *
 predictor_2:predictor_7  4.97365    0.82692   6.015 1.80e-09 ***
 predictor_3:predictor_7 -1.13192    0.46639  -2.427   0.0152 *
 predictor_2:predictor_8 -5.52913    0.88476  -6.249 4.12e-10 ***
 predictor_1:predictor_9  4.28519         NA      NA       NA
 predictor_2:predictor_9 -0.26558    0.10541  -2.520   0.0117 *
 predictor_3:predictor_9 -1.49790         NA      NA       NA
 predictor_6:predictor_9 -1.31538         NA      NA       NA
 predictor_7:predictor_9 -4.41998         NA      NA       NA
 predictor_8:predictor_9  3.99709         NA      NA       NA
 ---
 Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Threshold coefficients:
     Estimate Std. Error z value
 0|1  -0.2236     0.3072  -0.728
 1|2   1.4229     0.3634   3.915
 (211 observations deleted due to missingness)
 Warning message:
 In sqrt(diag(vc)[1:npar]) : NaNs produced


This warning is due to a (near) singular variance-covariance matrix of
the model parameters, which in turn is due to the fact that the model
converged to a boundary solution: both random effects variance
parameters are zero. If you exclude the random terms and refit the
model with clm, the variance-covariance matrix will probably be well
defined and standard errors can be computed.
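
To illustrate where a warning like `In sqrt(diag(vc)[1:npar]) : NaNs produced` can come from, here is a toy sketch in base R. The matrix `H` is an invented stand-in for a Hessian that is not positive definite; it is not taken from the model above and does not reproduce what clmm() does internally.

```r
# A symmetric matrix that is not positive definite (eigenvalues 3 and -1),
# standing in for a Hessian at a problematic (e.g. boundary) solution.
H <- matrix(c(1, 2,
              2, 1), nrow = 2, byrow = TRUE)
print(eigen(H)$values)

# Inverting it gives a "variance-covariance" matrix whose diagonal entries
# are negative, so they are not valid variances.
vc <- solve(H)
print(diag(vc))

# Square roots of negative "variances" are NaN. R would also issue the
# warning "NaNs produced"; it is suppressed here to keep the output clean.
se <- suppressWarnings(sqrt(diag(vc)))
print(se)
```

Checking `eigen()` of the Hessian (or the `cond.H` value in the summary) is one way to spot such cases before trusting the standard errors.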

Another thing is that you are fitting 17 regression parameters and 2
random effect terms (which in the end do not count) to only 103
observations. I would be worried about overfitting or perhaps even
non-fitting. I think I would also be concerned about the 211
observations that are incomplete, and I would be careful with
automatic model selection/averaging etc. on incomplete data (though I
don't know how/if glmulti actually deals with that).
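
As a rough check on the overfitting concern, the parameter count can be recovered from the quoted summary, since AIC = -2 logLik + 2k. The sketch below uses only the numbers printed above. The AICc formula is the usual small-sample correction and is assumed here; glmulti's own `crit = "aicc"` computation may differ in detail.

```r
# Values read off the summary() output quoted above.
n_obs   <- 103      # nobs
log_lik <- -65.56   # logLik
aic     <- 173.13   # AIC

# AIC = -2 * logLik + 2 * k, so the parameter count k can be recovered.
k <- round((aic + 2 * log_lik) / 2)
print(k)

# Observations available per estimated parameter.
print(n_obs / k)

# Small-sample corrected AIC: AICc = AIC + 2k(k + 1) / (n - k - 1).
aicc <- aic + 2 * k * (k + 1) / (n_obs - k - 1)
print(aicc)
```

The recovered k is consistent with 17 regression coefficients, 2 thresholds and 2 variance parameters, which leaves fewer than five observations per parameter.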


 

[R] Optimisation and NaN Errors using clm() and clmm()

2013-04-15 Thread Thomas

Dear List,

I am using both the clm() and clmm() functions from the R package 
'ordinal'.


I am fitting an ordinal dependent variable with 5 categories to 9 
continuous predictors, all of which have been normalised (mean 
subtracted then divided by standard deviation), using a probit link 
function. From this global model I am generating a confidence set of 200 
models using clm() and the 'glmulti' R package. This produces these errors:


> model.2.10 <- glmulti(as.factor(dependent) ~ 
predictor_1*predictor_2*predictor_3*predictor_4*predictor_5*predictor_6*predictor_7*predictor_8*predictor_9, 
data = database, fitfunc = clm, link = "probit", method = "g", crit = 
"aicc", confsetsize = 200, marginality = TRUE)

...
After 670 generations:
Best model: 
as.factor(dependent)~1+predictor_1+predictor_2+predictor_3+predictor_4+predictor_5+predictor_6+predictor_8+predictor_9+predictor_4:predictor_3+predictor_6:predictor_2+predictor_8:predictor_5+predictor_9:predictor_1+predictor_9:predictor_4+predictor_9:predictor_5+predictor_9:predictor_6

Crit= 183.716706496392
Mean crit= 202.022138576506
Improvements in best and average IC have been below the specified goals.
Algorithm is declared to have converged.
Completed.
There were 24 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: optimization failed: step factor reduced below minimum
2: optimization failed: step factor reduced below minimum
3: optimization failed: step factor reduced below minimum
etc.


I am then re-fitting each of the 200 models with the clmm() function, 
with 2 random factors (family nested within order). I get this error in 
a few of the re-fitted models:


> model.2.glmm.2 <- clmm(as.factor(dependent) ~ 1 + predictor_1 + 
predictor_2 + predictor_3 + predictor_6 + predictor_7 + predictor_8 + 
predictor_9 + predictor_6:predictor_2 + predictor_7:predictor_2 + 
predictor_7:predictor_3 + predictor_8:predictor_2 + 
predictor_9:predictor_1 + predictor_9:predictor_2 + 
predictor_9:predictor_3 + predictor_9:predictor_6 + 
predictor_9:predictor_7 + predictor_9:predictor_8+ (1|order/family), 
link = "probit", data = database)

> summary(model.2.glmm.2)

Cumulative Link Mixed Model fitted with the Laplace approximation

formula: as.factor(dependent) ~ 1 + predictor_1 + predictor_2 + 
predictor_3 + predictor_6 + predictor_7 + predictor_8 + predictor_9 + 
predictor_6:predictor_2 + predictor_7:predictor_2 +
predictor_7:predictor_3 + predictor_8:predictor_2 + 
predictor_9:predictor_1 + predictor_9:predictor_2 +
predictor_9:predictor_3 + predictor_9:predictor_6 + 
predictor_9:predictor_7 + predictor_9:predictor_8 + (1 | order/family)

data: database

link   threshold nobs logLik AIC    niter    max.grad cond.H
probit flexible  103  -65.56 173.13 58(3225) 8.13e-06 4.3e+03

Random effects:
             Var       Std.Dev
family:order 7.493e-11 8.656e-06
order        1.917e-12 1.385e-06
Number of groups: family:order 12, order 4

Coefficients:
                        Estimate Std. Error z value Pr(>|z|)
predictor_1              0.40802    0.78685   0.519   0.6041
predictor_2              0.02431    0.26570   0.092   0.9271
predictor_3             -0.84486    0.32056  -2.636   0.0084 **
predictor_6              0.65392    0.34348   1.904   0.0569 .
predictor_7              0.71730    0.29596   2.424   0.0154 *
predictor_8             -1.37692    0.75660  -1.820   0.0688 .
predictor_9              0.15642    0.28969   0.540   0.5892
predictor_2:predictor_6 -0.46880    0.18829  -2.490   0.0128 *
predictor_2:predictor_7  4.97365    0.82692   6.015 1.80e-09 ***
predictor_3:predictor_7 -1.13192    0.46639  -2.427   0.0152 *
predictor_2:predictor_8 -5.52913    0.88476  -6.249 4.12e-10 ***
predictor_1:predictor_9  4.28519         NA      NA       NA
predictor_2:predictor_9 -0.26558    0.10541  -2.520   0.0117 *
predictor_3:predictor_9 -1.49790         NA      NA       NA
predictor_6:predictor_9 -1.31538         NA      NA       NA
predictor_7:predictor_9 -4.41998         NA      NA       NA
predictor_8:predictor_9  3.99709         NA      NA       NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Threshold coefficients:
    Estimate Std. Error z value
0|1  -0.2236     0.3072  -0.728
1|2   1.4229     0.3634   3.915
(211 observations deleted due to missingness)
Warning message:
In sqrt(diag(vc)[1:npar]) : NaNs produced


I have tried a number of different approaches, each of which has its own 
problems. I have fixed these using various suggestions from online 
forums (e.g. 
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q1/015328.html, 
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q2/016165.html) 
and this is as good as I can get it.


After the first stage (generating the model set with glmulti) I tested 
every model in the confidence set individually - there were no errors - 
but there was clearly a problem during the model selection process. 
Should I be worried?
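
One way to see which individual fits raise warnings, rather than only a count at the end of a run, is to wrap each fit and record its warnings. The sketch below is base R only. `fit_with_warnings` is a hypothetical helper, and `sqrt()` is used as a stand-in for a fitting function so the example runs without any packages. In practice `fit_fun` would be something like clm or clmm, called with a formula and data.

```r
# Call 'fit_fun(...)' and record any warnings instead of losing them.
fit_with_warnings <- function(fit_fun, ...) {
  warns <- character(0)
  fit <- withCallingHandlers(
    fit_fun(...),
    warning = function(w) {
      # Store the message in the enclosing function's 'warns' vector.
      warns <<- c(warns, conditionMessage(w))
      # Resume the computation after recording the warning.
      invokeRestart("muffleWarning")
    }
  )
  list(fit = fit, warnings = warns)
}

# Stand-in "fits": sqrt(-1) raises a warning, sqrt(4) does not.
res_bad  <- fit_with_warnings(sqrt, -1)
res_good <- fit_with_warnings(sqrt, 4)

print(res_bad$warnings)
print(length(res_good$warnings))
```

Applied over a list of formulas with `lapply()`, this would give a per-model record of which fits raised warnings.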


No errors appear in the top 5% of re-fitted models (which are the only 
ones I will be using); however, I am concerned that the errors may be 
indicative of a problem with my approach.


A further worry is that the errors might be removing models that could 
otherwise be included.


Any help would be much appreciated.

Tom
