[R] Linear models over large datasets

2007-08-17 Thread dave fournier
 It's actually only a few lines of code to do this from first principles.
 The coefficients depend only on the cross products X'X and X'y, and you
 can build them up easily by extending this example to read from files or
 a database holding x and y instead of taking them from the arguments.
 Here we process incr rows of the built-in matrix state.x77 at a time,
 accumulating the two cross products, xtx and xty, and regressing
 Income (variable 2) on the other variables:

 mylm <- function(x, y, incr = 25) {
   ## accumulate the cross products X'X and X'y in blocks of incr rows
   start <- xtx <- xty <- 0
   while (start < nrow(x)) {
     idx <- seq(start + 1, min(start + incr, nrow(x)))
     x1  <- cbind(1, x[idx, ])            # add the intercept column
     xtx <- xtx + crossprod(x1)           # running X'X
     xty <- xty + crossprod(x1, y[idx])   # running X'y
     start <- start + incr
   }
   solve(xtx, xty)
 }

 mylm(state.x77[,-2], state.x77[,2])
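
 As a rough sketch of the file-based variant mentioned above (the CSV
 layout, the column name "y", and the chunk size are illustrative
 assumptions, not from the original post), the same two cross products
 can be accumulated from a file read through a connection, incr rows at
 a time:

 mylm_file <- function(path, yname = "y", incr = 10000) {
   con <- file(path, open = "r")
   on.exit(close(con))
   ## the first chunk is read with the header and supplies the column names
   chunk <- read.csv(con, nrows = incr)
   nms <- names(chunk)
   xtx <- xty <- 0
   repeat {
     y  <- chunk[[yname]]
     x1 <- cbind(1, as.matrix(chunk[, setdiff(nms, yname), drop = FALSE]))
     xtx <- xtx + crossprod(x1)
     xty <- xty + crossprod(x1, y)
     ## read the next block; an error or empty result means end of file
     chunk <- tryCatch(read.csv(con, nrows = incr, header = FALSE,
                                col.names = nms),
                       error = function(e) NULL)
     if (is.null(chunk) || nrow(chunk) == 0) break
   }
   solve(xtx, xty)
 }

 ## e.g. mylm_file("big.csv", yname = "Income")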


 On 8/16/07, Alp ATICI alpatici at gmail.com wrote:
  I'd like to fit linear models on very large datasets. My data frames
  are about 200 rows x 200 columns of doubles and I am using a 64-bit
  build of R. I've googled about this extensively and gone over the
  R Data Import/Export guide. My primary issue is that although my data
  represented in ASCII form is 4Gb in size (and therefore much smaller
  in binary), R consumes about 12Gb of virtual memory.
 
  What exactly are my options to improve this? I looked into the biglm
  package, but the problem with it is that it uses the update() function
  and is therefore not transparent (I am using a sophisticated script
  which is hard to modify). I really liked the concept behind the LM
  package here: http://www.econ.uiuc.edu/~roger/research/rq/RMySQL.html
  But it is no longer available. How could one fit linear models to very
  large datasets without loading the entire set into memory, but from a
  file/database (possibly through a connection), using a relatively
  simple modification of standard lm()? Alternatively, how could one
  improve the memory usage of R given a large dataset (by changing some
  default parameters of R or even using on-the-fly compression)? I don't
  mind much higher levels of CPU time being required.
 
  Thank you in advance for your help.
 
 
If your design matrix X is very well behaved this approach may work for
you. Often just doing solve(X'X, X'y) will fail for numerical reasons. The
right way to do it is to factor the matrix X as

   X = A * B

where B is 200 x 200 in your case and A has the same number of rows as X,
with

   A'*A = I  (that is, the columns of A are orthonormal),

so X'*X = B'*B and you use

   solve(B'*B, X'y)

To find A and B you can use modified Gram-Schmidt, which is very easy to
program and works well when you wish to store the columns of X on a hard
disk and just read in a bit at a time. Some people claim that modified
Gram-Schmidt is unstable, but it has always worked well for me.
In any event you can check to ensure that A'*A = I and X = A*B.
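
Here is a minimal in-memory sketch of modified Gram-Schmidt (an
illustration only; the out-of-core version would read blocks of the
columns of X from disk inside these loops):

mgs_qr <- function(X) {
  X <- as.matrix(X)
  p <- ncol(X)
  Q <- X                      # becomes the orthonormal columns (A above)
  R <- matrix(0, p, p)        # upper-triangular factor (B above)
  for (j in seq_len(p)) {
    R[j, j] <- sqrt(sum(Q[, j]^2))
    Q[, j]  <- Q[, j] / R[j, j]
    if (j < p) for (k in (j + 1):p) {
      R[j, k] <- sum(Q[, j] * Q[, k])
      Q[, k]  <- Q[, k] - R[j, k] * Q[, j]
    }
  }
  list(Q = Q, R = R)
}

## e.g. on the state.x77 regression from the earlier message:
x1 <- cbind(1, state.x77[, -2]); y <- state.x77[, 2]
f  <- mgs_qr(x1)
range(crossprod(f$Q) - diag(ncol(x1)))   # check A'A = I up to rounding
backsolve(f$R, crossprod(f$Q, y))        # same coefficients as mylm()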

   Cheers,

Dave

-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com



[R] Using odesolve to produce non-negative solutions

2007-07-03 Thread dave fournier
In case you didn't get this solved:

I have done parameter estimation with models
defined by ODEs where negative solutions are a problem,
and one can only avoid them with great difficulty if the
standard explicit methods for solving the ODE are used. I found that
using implicit methods could be a great help.

For example in the exponential case

 dN/dt = -k*N

the simple finite difference approximation is

 (N_{t+1} - N_t) / h = -k*N_t ,   k >= 0

   or

 N_{t+1} = N_t - k*h*N_t

   and if k*h gets too large, N_{t+1} goes negative and you are in trouble.

   Consider instead the implicit formulation, where the second
   N_t on the RHS is replaced by N_{t+1}, so that one gets

 N_{t+1} = N_t/(1 + k*h)

which is correct for k*h = 0 and stays non-negative as k*h --> infinity.
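
As a quick toy illustration (the rate constant and step size here are
made up), stepping dN/dt = -k*N with a deliberately large step so that
k*h > 1:

k <- 3; h <- 0.5; N0 <- 100; nstep <- 8                 # k*h = 1.5
explicit <- implicit <- numeric(nstep + 1)
explicit[1] <- implicit[1] <- N0
for (t in 1:nstep) {
  explicit[t + 1] <- explicit[t] - k * h * explicit[t]  # flips sign each step
  implicit[t + 1] <- implicit[t] / (1 + k * h)          # decays, stays positive
}
round(cbind(explicit, implicit), 3)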

   For a more complicated example see

http://otter-rsch.com/admodel/cc4.html

   for something I called semi-implicit.
   I hope these ideas will be useful for your problem.


 Cheers,

  Dave





-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com



Re: [R] Nonlinear statistical modeling -- a comparison of R and AD Model Builder

2006-11-26 Thread dave fournier
 simulated.) I would never try to fit a nonlinear model with 100
 parameters to data without carefully examining the data, and
 especially selected subsets of the data, first.  For this the
 flexibility of the S language and tools like lattice graphics that
 were developed in this language are invaluable to me.  The flexibility
 of data manipulation and graphics for interactive exploration of data
 is what attracted me to S in the first place.

 I realize that for many people the area of nonlinear statistical
 modeling is reduced to "Fit this model to these data and don't ask any
 questions.  Just give me parameter estimates and p-values."  If that
 is your situation then it would make sense to use software that gets
 you those estimates as quickly as possible with a minimum of effort.
 I'm just happy that I get to turn down people who ask me to do that.
 I like the fact that I can spend my time asking questions about the
 data and of the data.

snip

-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com



Douglas Bates wrote:
 On 11/24/06, dave fournier [EMAIL PROTECTED] wrote:


Dave
   Did you try supplying gradient information to nlminb?  (I note that
 nlminb is used for the optimization, but I don't see any gradient
 information supplied to it.) I would suspect that supplying gradient
 information would greatly speed up the computation (as you note in
 comments at http://otter-rsch.ca/tresults.htm.)
 
 Actually you should probably ask Norm Olsen these questions.
 I am not proficient in R and am just using his code.
 
 Don't you find it somewhat disingenuous that you publish a comparison
 between the AD Model Builder software that you sell and R - a
 comparison that shows a tremendous advantage for your software - and
 then you write "I am not proficient in R"?
 
 Had you been proficient in R you might have known about the symbolic
 differentiation capabilities, specifically the deriv function, that
 have been part of the S language since the late 1980s.  I believe that
 the 'AD' in AD model builder stands for automatic differentiation,
 which is actually something that John Chambers and I discussed at
 length when we were developing nonlinear modeling methods for S.  In
 the end we went with symbolic differentiation rather than automatic
 differentiation because we felt that symbolic was more flexible.
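 
 For reference, a toy sketch of the deriv() facility mentioned here (the
 decay model, data, and parameter names are made up for illustration):
 
 g <- deriv(~ (y - N0 * exp(-k * t))^2, "k",
            function.arg = c("k", "N0", "t", "y"))
 set.seed(1)
 t  <- 0:10
 y  <- 50 * exp(-0.3 * t) + rnorm(length(t), sd = 0.5)
 N0 <- 50
 obj  <- function(k) sum((y - N0 * exp(-k * t))^2)          # objective
 grad <- function(k) sum(attr(g(k, N0, t, y), "gradient"))  # analytic gradient
 nlminb(start = 0.1, objective = obj, gradient = grad, lower = 0)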
 
 This is not to say that automatic differentiation isn't a perfectly
 legitimate technique.  However, my recollection is that it would have
 required extensive revisions to the arithmetic expression evaluator,
 which is already very tricky code because of the recycling rule and
 the desire to shield users from knowledge of the internal
 representations and such details as whether you are using logical or
 integer or double precision operands or a combination.
 
 If you want to see these details you can, of course, examine the
 source code.  I don't believe we would have the opportunity to examine
 how you implemented automatic differentiation.
 
 I must also agree with Spencer Graves that when I start reading a
 description of a nonlinear model with over 100 parameters, the example
 that you chose, I immediately start thinking of nonlinear mixed
 effects models.  In my experience the only way in which a nonlinear
 model ends up with that number of parameters is through applying an
 underlying model with a low number of parameters to various groups
 within the data.  Table 2 in the Schnute et al. paper to which you
 make reference states that the number of parameters in the model is T
 + A + 5 where T is the number of years of data and A is the number of
 age classes.  To me that looks a lot like a nonlinear mixed effects
 model.
 
 Also, your choice of subject heading for your message seems
 deliberately provocative.  You seem to be implying that you are
 discussing a comparison of AD Model Builder and R on all aspects of
 nonlinear statistical modeling, but you are only discussing one
 comparison on simulated data using a model from the applications area
 for which you wrote AD Model Builder.  Then you follow up by saying "I
 am not proficient in R" and your results for R are from applying code
 that someone else gave you.
 
 It seems that ADMB had a bit of a home-field advantage in this
 particular comparison.
 
 I view nonlinear statistical modeling differently.  I have had a bit
 of experience in the area and I find that I want to examine the data
 carefully, usually through plots, before I embark on fitting
 complicated models.  I like to have some assurance that the model
 makes sense in the context of the data.  (In your example you don't
 need to worry about appropriateness of the model because the data were
 simulated.) I would never try to fit a nonlinear model with 100
 parameters to data without carefully examining the data, and
 especially selected subsets of the data, first.  For this the
 flexibility of the S language

[R] Nonlinear statistical modeling -- a comparison of R and AD Model Builder

2006-11-24 Thread dave fournier

There has recently been some discussion on the list about
AD Model builder and the suitability of R for constructing the
types of models used in fisheries management.

   https://stat.ethz.ch/pipermail/r-help/2006-January/086841.html

   https://stat.ethz.ch/pipermail/r-help/2006-January/086858.html

I think that many R users underestimate the numerical challenges
that some of the typical nonlinear statistical models used in different
fields present. R may not be a suitable platform for developing
such models.

Around 10 years ago John Schnute, Laura Richards, and Norm Olsen
with Canadian federal fisheries undertook an investigation
comparing various statistical modeling packages for a simple
age-structured statistical model of the type commonly used in
fisheries. They compared AD Model Builder, Gauss, Matlab, and
Splus. Unfortunately a working model could not be produced with Splus,
so its times could not be included in the comparison. It is possible
to produce a working model with the present-day version of R, so that
R can now be directly compared with AD Model Builder for this type of model.

I have put the results of the test together with the original
Schnute and Richards paper and the working R and AD Model Builder
codes on Otter's web site

 http://otter-rsch.ca/tresults.htm

The results are that AD Model Builder is roughly 1000 times faster than
R for this problem. ADMB takes about 2 seconds to converge while
R takes over 90 minutes.

This is a simple toy example. Real fisheries models are often hundreds of
times more computationally intensive than this one.

Cheers,

 Dave
~
-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com



Re: [R] Nonlinear statistical modeling -- a comparison of R and AD Model Builder

2006-11-24 Thread dave fournier


   Dave
  Did you try supplying gradient information to nlminb?  (I note that 
nlminb is used for the optimization, but I don't see any gradient 
information supplied to it.) I would suspect that supplying gradient 
information would greatly speed up the computation (as you note in 
comments at http://otter-rsch.ca/tresults.htm.)

Actually you should probably ask Norm Olsen these questions.
I am not proficient in R and am just using his code.

However, I can say that providing derivatives for such a model is a
highly nontrivial exercise. As I said in my posting, the R script and
data are available to anyone who feels that the exercise was not carried
out properly and would like to improve on it. Also, one does not need
to provide derivatives to the AD Model Builder program.

Finally, suppose that you are very good at calculating derivatives and
manage to get them right. Then someone else comes along who wants to
modify the model. Unless they are also very good at calculating
derivatives there will be trouble.

 
  I'm curious -- when you say R may not be a suitable platform for 
development for such models, what aspect of R do you feel is lacking? 
Is it the specific optimization routines available, or is it some other 
more general aspect?

2 seconds vs 90 minutes. For a real problem of this type the timings
would probably be something like 10 minutes vs more than 2,700 minutes.

 
  Also, another optimization algorithm available in R is the L-BFGS-B 
method for optim() in the MASS package.  I've had extremely good 
experiences with using this code in S-PLUS.  It can take box 
constraints, and can use gradient information.  It is my first choice 
for most optimization problems, and I believe it is very widely used. 
Did you try using that optimization routine with this problem?
 
  -- Tony Plate
 
  dave fournier wrote:
  There has recently been some discussion on the list about
  AD Model builder and the suitability of R for constructing the
  types of models used in fisheries management.
 
 https://stat.ethz.ch/pipermail/r-help/2006-January/086841.html
 
 https://stat.ethz.ch/pipermail/r-help/2006-January/086858.html
 
  I think that many R users underestimate the numerical challenges
  that some of the typical nonlinear statistical models used in different
  fields present. R may not be a suitable platform for developing
  such models.
 
  Around 10 years ago John Schnute, Laura Richards, and Norm Olsen
  with Canadian federal fisheries undertook an investigation
  comparing various statistical modeling packages for a simple
  age-structured statistical model of the type commonly used in
  fisheries. They compared AD Model Builder, Gauss, Matlab, and
  Splus. Unfortunately a working model could not be produced with Splus,
  so its times could not be included in the comparison. It is possible
  to produce a working model with the present-day version of R so that
  R can now be directly compared with AD Model Builder for this type 
of model.
 
  I have put the results of the test together with the original
  Schnute and Richards paper and the working R and AD Model Builder
  codes on Otter's web site
 
   http://otter-rsch.ca/tresults.htm
 
  The results are that AD Model Builder is roughly 1000 times faster than
  R for this problem. ADMB takes about 2 seconds to converge while
  R takes over 90 minutes.
 
  This is a simple toy example. Real fisheries models are often hundreds of
  times more computationally intensive than this one.
 
  Cheers,
 
   Dave
  ~
 


-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com




[R] nlme

2006-11-24 Thread dave fournier

-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com



[R] nlme sorry about that

2006-11-24 Thread dave fournier
Sorry, list, I twitched and sent the wrong stuff.
Maybe I've had enough fun for today.

-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com



[R] glmmADMB and the GPL -- formerly-- How to buy R.

2006-05-24 Thread dave fournier
I promise the list that this will be my last posting
regarding this matter.

The package glmmADMB is not, and never was, the property
or a product of Otter Research Ltd. It was simply a program I wrote
using ADMB-RE and gave away. I do this all the time to help various
people. I don't monitor or try to control what they do with it.
The executable nbmm.exe was produced using the ADMB-RE package,
but under our license executables produced using
ADMB are freely distributable. See, for example, the
Coleraine or Stock Synthesis software.

As I stated in my earlier post, I was motivated to
produce software for negative binomial mixed models
by a post of Prof. Ripley's saying that no such good software
existed. I took that to mean anywhere, not just as
a currently existing R package.
I originally intended this to be just a stand-alone
program that anyone could use after they put the data
into the proper format. I soon realized, however, that
R users had a different way of thinking about the methods
for specifying these models. Since I have extremely limited
R skills I asked Anders Nielsen and Hans Skaug if they would be interested
in producing something that could be used more easily by R users.
They were not paid for this and acted simply out of a wish to provide a
service to the R community.
I had no idea how that would go, and at that point my involvement in the
development (except for modifying nbmm.exe to deal with difficult
data) came to an end.

Neither Anders nor Hans is an employee of, or owns any part of, Otter
Research Ltd. I am its sole employee and owner.

When the package was finished I agreed to host it on my web site.

I also put up a forum because I wanted to see how the software worked 
for people.

It appears that Hans put the GPL on the package. He can speak for
himself, but I believe he intended the GPL to apply to the R package,
which he had taken over from Anders, and not to nbmm.exe. Certainly he
would have no authority to bind Otter Research Ltd in this matter.

Now as to flames and lawyers. As I said, Otter Research Ltd is a
one-person company. I have successfully sold software for 16 years
without a lawyer, mostly to the fisheries management community.
I never felt the need for a lawyer until I gave something away for free
to the R community.

I removed the link that Prof. Ripley referred to. If there is another
offending link I will be happy to remove it as well if you tell me where
it is. For those who want to use the software, I will put it up in its
original form, which means that you will need to put your data into
the format that an AD Model Builder program requires. It will still have
the same functionality but unfortunately will be more difficult for R
users to access.

Of course, if Hans or Anders or anyone else wants to, they are welcome to
take nbmm.exe and make it available with their own R package if they
can satisfy whatever the requirements are -- just not on my website.

 Cheers,

  Dave

-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com



[R] glmmADMB and the GPL -- formerly-- How to buy R.

2006-05-23 Thread dave fournier
Dear List,

Some of you have been following the discussion of the GPL and its inclusion
in the glmmADMB package we created for R users. I would like to provide
a bit of background and include an email we received from
Prof. Ripley so that everyone can be aware of how some might use the
GPL to try to force access to proprietary software. I think this is
interesting because many have voiced the opinion that the GPL is benign
and that commercial enterprises who avoid it do so mainly out
of ignorance.

I have noticed two things. First, users of the R-help list appear to rely
largely on the advice of a rather small number of statistical experts.
Second, R users regard R as being more cutting-edge and up to date than
lists devoted to commercial statistical packages like SAS.

For these reasons  I was surprised to see the following post on the web 
in reply to a question on negative binomial mixed models.

 https://stat.ethz.ch/pipermail/r-help/2005-February/066146.html

I thought that this was bad advice, as certainly our ADMB-RE software
could handle this problem easily.  However, one never knows exactly what
sort of data people might use in a particular example that could lead to
difficulties, so I decided to code up a program that R users could test
on this problem.  However, R users are used to a different approach to
model formulation, so it was difficult for the average R user to
access the program. I approached Anders Nielsen, who is both an
experienced ADMB user and R user, and asked him to write an interface in
R which would make the program more accessible to R users. He created a
package, and the whole thing seems to have had some success, with at least
one PhD thesis based on calculations using it.

The R code that Anders wrote is simply an interface which takes the R
specification for the model and outputs a data file in the format that an
ADMB program expects. The ADMB program is a stand-alone exe. The R script
then reads the ADMB output files and presents the results to the user in a
more familiar R format. Now it appears that at some revision someone put a
GPL notice on this package, although Anders states that he did not do so
and is certain that it was not originally included by him. In any event,
the R script is easily extracted from the package by those who know how to
do so, and we have no problem with making the ADMB-RE source to the exe
(the TPL file) available. In fact the original was on our web site but was
modified as we made the program more robust to deal with difficult data
sets.  The compiled TPL file links with our proprietary libraries and we
have no intention of providing the source for these, but that is exactly
what Prof. Ripley seems to be demanding, since he claims that he wants
the program to run on his computer, which it apparently does not do at
present. Prof. Ripley seems to feel that he is a qualified spokesman for
the open source community. I have no idea what the community at large
feels about this.
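
As a generic sketch of that interface pattern (the file names, the
executable name "nbmm", and the output layout below are hypothetical,
not the actual glmmADMB files):

run_admb <- function(y, X, exe = "nbmm", stem = "model") {
  ## 1. write the data as the whitespace-separated text an ADMB program reads
  writeLines(c(length(y), ncol(X),
               paste(y, collapse = " "),
               apply(X, 1, paste, collapse = " ")),
             paste0(stem, ".dat"))
  ## 2. run the stand-alone executable built from the ADMB template
  system2(exe)
  ## 3. read the estimates and standard errors it writes back into R
  read.table(paste0(stem, ".std"), header = TRUE)
}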

What follows is Hans Skaug's post with Prof. Ripley's reply.

  On Mon, 22 May 2006, H. Skaug wrote:
 
   About glmmADMB and GPL:
  
   We were not very cautious when we put in the GPL statement.
   What we wanted to say was that the use of glmmADMB is free, and
   does not require a license for AD Model Builder.
 
  But that is not what you said, and you are legally and morally bound to
  fulfill the promise you made.
 
   Am I correct in interpreting this discussion so that all
   we have to do is to remove the License: GPL statement
   from the DESCRIPTION file (and everywhere else it may occur),
   and there will be no conflict between glmmADMB and the
   rules of the R community?
 
  I have made a request under the GPL. `All' you have to do is to fulfill
  it.
 
   We have temporarily withdrawn glmmADMB until this question has been
   settled.
 
  You can withdraw the package, but it has already been distributed under
  GPL, and those who received it under GPL have the right to redistribute
  it under GPL, including the sources you are obliged to give them.  That's
  part of the `freedom' that GPL gives.
 
   hans
  
  
  
   Brian Ripley wrote:
  
   The issue in the glmmADMB example is not if they were required to
   release it under GPL (my reading from the GPL FAQ is that they
   probably were not, given that communication is between processes and
   the R code is interpreted).
  
   Rather, it is stated to be under GPL _but_ there is no source code
   offer for the executables (and the GPL FAQ says that for anonymous FTP
   it should be downloadable via the same site, and the principles apply
   equally to HTTP sites).  As the executables are not for my normal OS
   and I would like to exercise my freedom to try the GPLed code, I have
   requested the sources from the package maintainer.
  
  
 
  -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied 
Statistics, http://www.stats.ox.ac.uk/~ripley/ University of 

[R] conservative robust estimation in (nonlinear) mixed models

2006-03-24 Thread dave fournier
I believe that Bert's comments are a non sequitur.
I did not and do not propose identifying which components
of the model are contaminated by outliers. What I do propose
is the more or less routine use of conservative robust methods
to replace the normal theory estimators. By definition such estimators
are to be almost as efficient as the normal theory estimators in the 
case where the normal theory applies. One may argue that
conservative robust estimators do not exist for this class of
problems. I think they do, but the obvious way to establish this
claim is to carry out simulations.

Before such simulations can be carried out one must create the
software to do the analysis. So I am proposing to add that to our
R package glmmADMB. Then other R users can carry out their own
simulation analysis to investigate how the method performs.
I think that normal mixtures are better candidates for
conservative robust estimators than, say, Student's t distribution,
but I will try to include both (and perhaps any others that appear
useful).

  Dave


 Bert raised an issue I had overlooked.  Ideally, we would like to be 
 able to specify a different family for the observations and for each 
 random effect, with Student's t and contaminated normal as valid options 
 in both places.
 
 If I were allowed to specify a family (or a robust family) for either 
 observations or for random effects but not both, I think I'd pick the 
 observations.  I don't know, but I wonder if misspecification of the 
 observation distribution might create more problems with estimation and 
 inference than misspecification of the distribution of a random effect. 
   As Bert indicated, there may be identifiability issues here, and the 
 choice of a model may depend on one's hypotheses about the situation 
 being modeled.
 
 spencer graves
 
 Berton Gunter wrote:
 
 Ok, since Spencer has dived in, I'll go public (I made some prior private
 remarks to David because I didn't think they were worth wasting the list's
 bandwidth on. Heck, they may still not be...)
 
 My question: isn't the difficult issue which levels of the (co)variance
 hierarchy get longer-tailed distributions, rather than which distributions
 are used to model long tails? Seems to me that there is an inherent
 identifiability issue here, and even more so with nonlinear models. It's
 easy to construct examples where it all essentially depends on your priors.
 
 Cheers,
 Bert
 
 -- Bert Gunter
 Genentech Non-Clinical Statistics
 South San Francisco, CA
   
  
 
 
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Spencer Graves
Sent: Thursday, March 23, 2006 12:34 PM
To: otter at otter-rsch.com
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] conservative robust estimation in (nonlinear) mixed models

   I know of two fairly common models for robust methods.  One is the
contaminated normal that you mentioned.  The other is Student's t.  A
normal plot of the data or of residuals will often indicate whether the
assumption of normality is plausible or not;  when the plot indicates
problems, it will often also indicate whether a contaminated normal or
Student's t would be better.

   Using Student's t introduces one additional parameter.  A
contaminated normal would introduce 2;  however, in many applications,
the contamination proportion (or its logit) will often be highly
correlated with the ratio of the contamination standard deviation to
that of the central portion of the distribution.  Thus, in some cases,
it's often wise to fix the ratio of the standard deviations and estimate
only the contamination proportion.

   hope this helps.
   spencer graves

dave fournier wrote:


Conservative robust estimation methods do not appear to be
currently available in the standard mixed model methods for R,
where by conservative robust estimation I mean methods which
work almost as well as the methods based on assumptions of
normality when the assumption of normality *IS* satisfied.

We are considering adding such a conservative robust estimation option
for the random effects to our AD Model Builder mixed model package,
glmmADMB, for R, and perhaps extending it to do robust estimation for
linear mixed models at the same time.

An obvious candidate is to assume something like a mixture of
normals. I have tested this in a simple linear mixed model
using 5% contamination with a normal with 3 times the standard
deviation, which seems to be a common assumption. Simulation results
indicate that when the random effects are normally distributed this
estimator is about 3% less efficient, while when the random effects are
contaminated with 5% outliers the estimator is about 23% more efficient,
where by 23% more efficient I mean that one would have to use a sample
size about 23% larger to obtain the same size confidence limits for the
parameters.

Question?

I

[R] conservative robust estimation in (nonlinear) mixed models

2006-03-23 Thread dave fournier

Conservative robust estimation methods do not appear to be
currently available in the standard mixed model methods for R,
where by conservative robust estimation I mean methods which
work almost as well as the methods based on assumptions of
normality when the assumption of normality *IS* satisfied.

We are considering adding such a conservative robust estimation option
for the random effects to our AD Model Builder mixed model package,
glmmADMB, for R, and perhaps extending it to do robust estimation for 
linear mixed models at the same time.

An obvious candidate is to assume something like a mixture of
normals. I have tested this in a simple linear mixed model
using 5% contamination with a normal with 3 times the standard
deviation, which seems to be a common assumption. Simulation results
indicate that when the random effects are normally distributed this
estimator is about 3% less efficient, while when the random effects are
contaminated with 5% outliers the estimator is about 23% more efficient,
where by 23% more efficient I mean that one would have to use a sample
size about 23% larger to obtain the same size confidence limits for the
parameters.
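
As a concrete sketch of the contaminated-normal density being discussed
(the function name dcnorm and its defaults are illustrative, not part of
glmmADMB):

## with probability 1 - eps the effect is N(mean, sd^2), with
## probability eps it comes from the wider N(mean, (k*sd)^2)
dcnorm <- function(x, mean = 0, sd = 1, eps = 0.05, k = 3, log = FALSE) {
  d <- (1 - eps) * dnorm(x, mean, sd) + eps * dnorm(x, mean, k * sd)
  if (log) log(d) else d
}
curve(dcnorm(x), -6, 6)   # visibly heavier tails than dnorm(x)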

Question?

I wonder if there are other distributions, besides a mixture of normals,
which might be preferable. Three things to keep in mind are:

1.)  It should be likelihood based so that the standard likelihood
  based tests are applicable.

2.)  It should work well when the random effects are normally
 distributed, so that things that are already fixed don't get
 broken.

3.)  In order to implement the method efficiently it is necessary to
 be able to produce code for calculating the inverse of the
 cumulative distribution function. This enables one to extend
 methods based on the Laplace approximation for the random
 effects (i.e. the Laplace approximation itself, adaptive
 Gaussian integration, adaptive importance sampling) to the new
 distribution.

  Dave

-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com



[R] singular convergence in glmmPQL

2006-02-27 Thread dave fournier

Hi,

If you are having trouble with convergence in glmmPQL
you might want to try our glmmADMB package for R at

http://otter-rsch.com/admbre/examples/glmmadmb/glmmADMB.html

It is more stable according to some users.

I believe the corresponding commands for
your model are

# paste this after your other commands
  df <- data.frame(y.long, arm.long, id.long)
# load the glmmADMB package
  library(glmmADMB)
# invoke the glmmADMB fitting routine
  re.1 <- glmm.admb(y.long ~ arm.long, random = ~1, group = 'id.long',
                    family = 'binomial', link = 'logit', data = df)

and that should do it.

   Cheers,

   Dave



-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3s3
Canada
http://otter-rsch.com



[R] glmmpql and lmer keep failing

2005-11-01 Thread dave fournier
Hi,

I believe we could extend our upcoming release of our
freely available AD Model Builder negative binomial mixed model

   http://otter-rsch.com/admbre/admbre.html

package for R to include your model.  Writing the model is
simple; it is the interface with R that is a bit more difficult.
If you contact me privately I will take a look at it.

   Cheers,

   Dave

-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3s3
Canada
http://otter-rsch.com




[R] TRAMO-SEATS confusion?

2005-10-16 Thread dave fournier
Hi,

For what it's worth, I had to hack some horrendous old
FORTRAN which I could not come close to understanding.
The main issue was to allow dynamically allocated arrays.
What I did was to run it through f2c to produce C++
code. Then I verified that it still worked. After that I replaced
the pointers with a vector class with optional bounds checking and an
overloaded [] operator, so that all the old code would still compile.
Then I just tracked down all the bounds violations until everything
worked and I was done.

 Dave

-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3s3
Canada
http://otter-rsch.com




[R] Do Users of Nonlinear Mixed Effects Models Know Whether Their Software Really Works?

2005-10-13 Thread dave fournier
)  6.8059  2.6088
 # of obs: 1908, groups: subject, 294
 Estimated scale (compare to 1)  0.9091945
 Fixed effects:
              Estimate Std. Error  z value Pr(>|z|)
 (Intercept) -0.626214   0.264996  -2.3631  0.01812 *
 treat       -0.304660   0.360866  -0.8442  0.39853
 time        -0.346605   0.02      -12.9979  < 2e-16 ***
 sigma        2.608
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 Warning message:
 optim or nlminb returned message ERROR:
 ABNORMAL_TERMINATION_IN_LNSRCH
  in: LMEopt(x = mer, value = cv)


   The R routine correctly identifies the treatment effect as not
   significant. However, the parameter estimates are poor.


   Likelihood Ratio Testing

   Accurate calculation of the log-likelihood value is desirable so that
   hypothesis testing can be carried out using likelihood ratio tests.

   However, as noted above, the use of Gaussian integration is not
   practical for many nonlinear mixed models. We were interested in
   seeing how well the approximate log-likelihood values
   produced by ADMB-RE's Laplace approximation option would perform.

   We consider the alternative model with an extra interaction term
   (b_4) from  Lesaffre et al.

   Here are the results for the Laplace approximation:

 # Number of parameters = 5  log-likelihood = -627.809
 name   value  std dev   P value
 b_1  -2.5233e+00 7.8829e-01 0.002
 b_2  -3.0702e-01 6.8996e-01  0.655
 b_3  -4.0009e-01 4.7059e-02 0.001
 b_4  -1.3726e-01 6.9586e-02  0.044
sigma  4.5783e+00 7.2100e-01

   and here are the exact parameter estimates from 100-point Gaussian
   integration:


 # Number of parameters = 5  log-likelihood = -625.398
 name   value  std dev   P value
  b_1 -1.6183e+00 4.3427e-01 0.001
  b_2 -1.6077e-01 5.8394e-01  0.783
  b_3 -3.9100e-01 4.4380e-02 0.001
  b_4 -1.3679e-01 6.8013e-02  0.044
 sigma 4.0131e+00 3.8044e-01

The log-likelihood differences are 2.01 for the Laplace
approximation and 2.08 for Gaussian integration.
Since the 95% point for the likelihood ratio test is 1.92,
use of either approximation results in acceptance of the interaction
term.
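
For reference, the 1.92 cutoff is half the 95% point of a chi-square
distribution with one degree of freedom; a quick check in R:

qchisq(0.95, df = 1) / 2          # 1.920729, the cutoff quoted above
2 * 2.01 > qchisq(0.95, df = 1)   # Laplace approximation: TRUE, keep b_4
2 * 2.08 > qchisq(0.95, df = 1)   # Gaussian integration:  TRUE, keep b_4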

   Conclusions

   With the exception of the AD Model Builder random effects module, none of
   the packages tested appears to function reliably for this problem.
   SAS NLMIXED was beginning to exhibit symptoms of instability which
   would probably render it unreliable on more difficult problems.  We
   can see no reason for using quasi-likelihoods to fit nonlinear
   mixed models when ADMB-RE can fit the models by maximum likelihood,
   with all the advantages that ensue.

   Note

   We realize that there are many other packages out there. We would
   welcome results for other packages. If we can find a serious
   competitor to AD Model Builder then we could move on to comparing
   the relative performance on more difficult models.

Cheers,

Dave Fournier

-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3s3
Canada
http://otter-rsch.com




[R] Re : Do Users of Nonlinear Mixed Effects Models Know Whether Their Software Really Works?

2005-10-13 Thread dave fournier

Actually, one can download a working version of our software for free,
so anyone can verify these results.
It is restricted by needing a network connection to get permission to
operate and is for evaluation only.  The current version does not have
the Gauss-Hermite integration which I put in to find the true value for
this comparison, but it will soon. In addition we have made our
Poisson and negative binomial zero-inflated mixed model software for R
freely available.

   http://otter-rsch.com/admbre/admbre.html

-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3s3
Canada
http://otter-rsch.com




[R] Fitting a mixed negative binomial model

2005-04-13 Thread dave fournier
 I *think* (but am not sure) that these guys were actually (politely)
advertising a commercial package that they're developing.  But, looking at
the web page, it seems that this module may be freely available -- can't
tell at the moment.

Ben
The software for negative binomial mixed models will be
free, i.e. free as in you can use it without paying anything.
It is built using our
proprietary software.  The idea is to show how our software
is good for building nonlinear statistical models, including
those with random effects.  Turning our stand-alone software
into something that can be called easily from R has been a
bit of a steep learning curve for me, but we are making progress.
So far we have looked at 3 models: the model in Booth et al. (easy);
an overdispersed data set that turned out to probably be
a zero-inflated Poisson (fairly easy, but the negative binomial
is fit only to be rejected in favour of the simpler model); and
what appears to be a true negative binomial (difficult but
doable), for which we are discussing the form of the model with the
person who wishes to analyze it.
A few more data sets would be useful if anyone has
an application, so that we can ensure the robustness of our
software.
Dave




[R] negative binomial glmm's

2005-04-05 Thread dave fournier
Hi,
I guess by now you realize that we have been trying to
promote our NBGLMM in order to show people some of the
capabilities of our random effects software ADMB-RE.
If you want, I can help you analyze your data with
the package we talk about on the R list. All I ask is that,
if it works for you, you write a brief note to the list
telling how it helped you.
 Cheers,
  dave
   Dave Fournier
   Otter Research Ltd
   PO Box 2040, Sidney, B.C.
   Canada  V8L 3S3