[R] Linear models over large datasets
It's actually only a few lines of code to do this from first principles. The coefficients depend only on the cross products X'X and X'y, and you can build them up easily by extending this example to read files or a database holding x and y instead of getting them from the args. Here we process incr rows of the builtin matrix state.x77 at a time, building up the two cross products, xtx and xty, regressing Income (variable 2) on the other variables:

mylm <- function(x, y, incr = 25) {
    start <- xtx <- xty <- 0
    while (start < nrow(x)) {
        idx <- seq(start + 1, min(start + incr, nrow(x)))
        x1 <- cbind(1, x[idx, ])
        xtx <- xtx + crossprod(x1)
        xty <- xty + crossprod(x1, y[idx])
        start <- start + incr
    }
    solve(xtx, xty)
}
mylm(state.x77[, -2], state.x77[, 2])

On 8/16/07, Alp ATICI <alpatici at gmail.com> wrote:

I'd like to fit linear models on very large datasets. My data frames are about 200 rows x 200 columns of doubles and I am using a 64-bit build of R. I've googled about this extensively and went over the R Data Import/Export guide. My primary issue is that although my data represented in ascii form is 4Gb in size (and therefore much smaller in binary), R consumes about 12Gb of virtual memory. What exactly are my options to improve this?

I looked into the biglm package, but the problem with it is that it uses the update() function and is therefore not transparent (I am using a sophisticated script which is hard to modify). I really liked the concept behind the LM package here: http://www.econ.uiuc.edu/~roger/research/rq/RMySQL.html but it is no longer available.

How could one fit linear models to very large datasets without loading the entire set into memory but from a file/database (possibly through a connection), using a relatively simple modification of standard lm()? Alternatively, how could one improve the memory usage of R given a large dataset (by changing some default parameters of R or even using on-the-fly compression)? I don't mind much higher levels of CPU time required.
Thank you in advance for your help.

__ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

If your design matrix X is very well behaved this approach may work for you. Often just doing solve(X'X, X'y) will fail for numerical reasons. The right way to do it is to factor the matrix X as X = A * B, where B is 200x200 in your case and A has the same number of rows as X and 200 columns, with A'*A = I (that is, A has orthonormal columns), so that X'*X = B'*B and you use solve(B'*B, X'*y). To find A and B you can use modified Gram-Schmidt, which is very easy to program and works well when you wish to store the columns of X on a hard disk and just read in a bit at a time. Some people claim that modified Gram-Schmidt is unstable, but it has always worked well for me. In any event you can check to ensure that A'*A = I and X = A*B.

Cheers,
Dave

-- David A. Fournier P.O. Box 2040, Sidney, B.C. V8l 3S3 Canada Phone/FAX 250-655-3364 http://otter-rsch.com
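For reference, the factorization described above can be sketched in a few lines of R. This is a minimal in-memory illustration, not the read-from-disk version Dave suggests; the function names mgs_qr and mgs_lm are invented for this sketch.

```r
# Modified Gram-Schmidt QR: factor X = Q %*% R with t(Q) %*% Q = I, then
# get the least-squares coefficients by back-substitution in R b = Q'y.
mgs_qr <- function(X) {
  n <- ncol(X)
  Q <- X
  R <- matrix(0, n, n)
  for (k in 1:n) {
    R[k, k] <- sqrt(sum(Q[, k]^2))      # normalize column k
    Q[, k] <- Q[, k] / R[k, k]
    if (k < n) {
      for (j in (k + 1):n) {            # orthogonalize remaining columns
        R[k, j] <- sum(Q[, k] * Q[, j])
        Q[, j] <- Q[, j] - R[k, j] * Q[, k]
      }
    }
  }
  list(Q = Q, R = R)
}

mgs_lm <- function(X, y) {
  f <- mgs_qr(X)
  backsolve(f$R, crossprod(f$Q, y))     # solve R b = Q'y
}

# Same regression as mylm() above, checked against lm():
X <- cbind(1, state.x77[, -2])
y <- state.x77[, 2]
mgs_lm(X, y)
```

Because only R'R = X'X is ever solved (and R is triangular), this avoids forming X'X explicitly, which is the numerical point being made above.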
[R] Using odesolve to produce non-negative solutions
If you didn't get this solved: I have done parameter estimation with models defined by ODEs where negative solutions are a problem, and one can only avoid them with great difficulty if the standard explicit methods for solving the ODE are used. I found that using implicit methods could be a great help. For example, in the exponential case

    dN/dt = -k*N

the simple finite difference approximation is

    (N_{t+1} - N_t) / h = -k*N_t,    k >= 0

or N_{t+1} = N_t - k*h*N_t, and if k*h gets too large N_{t+1} goes negative and you are in trouble. Consider instead the implicit formulation, where the N_t on the RHS is replaced by N_{t+1}; one gets

    N_{t+1} = N_t / (1 + k*h)

which is correct for k*h = 0 and stays positive as k*h --> infinity. For a more complicated example see http://otter-rsch.com/admodel/cc4.html for something I called semi-implicit. I hope these ideas will be useful for your problem.

Cheers,
Dave

-- David A. Fournier P.O. Box 2040, Sidney, B.C. V8l 3S3 Canada Phone/FAX 250-655-3364 http://otter-rsch.com
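The difference between the two schemes can be seen in a few lines of R. This is a minimal sketch; the values of k, h, and the number of steps are arbitrary choices for illustration, with k*h deliberately larger than 1.

```r
# Explicit vs implicit Euler for dN/dt = -k*N with a large step k*h = 1.5.
# The explicit step N_{t+1} = N_t - k*h*N_t goes negative; the implicit
# step N_{t+1} = N_t / (1 + k*h) stays positive and decays monotonically.
k <- 3; h <- 0.5; nstep <- 10

explicit <- implicit <- numeric(nstep + 1)
explicit[1] <- implicit[1] <- 1          # N_0 = 1
for (t in 1:nstep) {
  explicit[t + 1] <- explicit[t] - k * h * explicit[t]
  implicit[t + 1] <- implicit[t] / (1 + k * h)
}
range(explicit)   # oscillates through negative values
range(implicit)   # stays in (0, 1]
```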
Re: [R] Nonlinear statistical modeling -- a comparison of R and AD Model Builder
I would never try to fit a nonlinear model with 100 parameters to data without carefully examining the data, and especially selected subsets of the data, first. For this the flexibility of the S language and tools like lattice graphics that were developed in this language are invaluable to me. The flexibility of data manipulation and graphics for interactive exploration of data is what attracted me to S in the first place.

I realize that for many people the area of nonlinear statistical modeling is reduced to "Fit this model to these data and don't ask any questions. Just give me parameter estimates and p-values." If that is your situation then it would make sense to use software that gets you those estimates as quickly as possible with a minimum of effort. I'm just happy that I get to turn down people who ask me to do that. I like the fact that I can spend my time asking questions about the data and of the data.

snip

-- David A. Fournier P.O. Box 2040, Sidney, B.C. V8l 3S3 Canada Phone/FAX 250-655-3364 http://otter-rsch.com

Douglas Bates wrote:

On 11/24/06, dave fournier [EMAIL PROTECTED] wrote:

Dave, Did you try supplying gradient information to nlminb? (I note that nlminb is used for the optimization, but I don't see any gradient information supplied to it.) I would suspect that supplying gradient information would greatly speed up the computation (as you note in comments at http://otter-rsch.ca/tresults.htm.)

Actually you should probably ask Norm Olsen these questions. I am not proficient in R and am just using his code.

Don't you find it somewhat disingenuous that you publish a comparison between the AD Model Builder software that you sell and R -- a comparison that shows a tremendous advantage for your software -- and then you write "I am not proficient in R"? Had you been proficient in R you might have known about the symbolic differentiation capabilities, specifically the deriv function, that have been part of the S language since the late 1980s.
I believe that the 'AD' in AD Model Builder stands for automatic differentiation, which is actually something that John Chambers and I discussed at length when we were developing nonlinear modeling methods for S. In the end we went with symbolic differentiation rather than automatic differentiation because we felt that symbolic was more flexible. This is not to say that automatic differentiation isn't a perfectly legitimate technique. However, my recollection is that it would have required extensive revisions to the arithmetic expression evaluator, which is already very tricky code because of the recycling rule and the desire to shield users from knowledge of the internal representations and such details as whether you are using logical or integer or double precision operands or a combination. If you want to see these details you can, of course, examine the source code. I don't believe we would have the opportunity to examine how you implemented automatic differentiation.

I must also agree with Spencer Graves that when I start reading a description of a nonlinear model with over 100 parameters, the example that you chose, I immediately start thinking of nonlinear mixed effects models. In my experience the only way in which a nonlinear model ends up with that number of parameters is through applying an underlying model with a low number of parameters to various groups within the data. Table 2 in the Schnute et al. paper to which you make reference states that the number of parameters in the model is T + A + 5, where T is the number of years of data and A is the number of age classes. To me that looks a lot like a nonlinear mixed effects model.

Also, your choice of subject heading for your message seems deliberately provocative.
You seem to be implying that you are discussing a comparison of AD Model Builder and R on all aspects of nonlinear statistical modeling, but you are only discussing one comparison on simulated data using a model from the applications area for which you wrote AD Model Builder. Then you follow up by saying "I am not proficient in R", and your results for R are from applying code that someone else gave you. It seems that ADMB had a bit of a home-field advantage in this particular comparison.

I view nonlinear statistical modeling differently. I have had a bit of experience in the area and I find that I want to examine the data carefully, usually through plots, before I embark on fitting complicated models. I like to have some assurance that the model makes sense in the context of the data. (In your example you don't need to worry about appropriateness of the model because the data were simulated.)
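For reference, the deriv function mentioned above works like this. A minimal sketch with an arbitrary two-parameter exponential model chosen only for illustration:

```r
# deriv() symbolically differentiates an expression and returns a function
# whose value carries a "gradient" attribute -- the pieces an optimizer
# such as nlminb() or optim() can use for gradient-based fitting.
g <- deriv(~ a * exp(-b * x), c("a", "b"), function(a, b, x) NULL)

r <- g(a = 2, b = 0.5, x = 1)
as.numeric(r)            # model value, 2 * exp(-0.5)
attr(r, "gradient")      # one column per parameter: d/da and d/db
```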
[R] Nonlinear statistical modeling -- a comparison of R and AD Model Builder
There has recently been some discussion on the list about AD Model Builder and the suitability of R for constructing the types of models used in fisheries management. https://stat.ethz.ch/pipermail/r-help/2006-January/086841.html https://stat.ethz.ch/pipermail/r-help/2006-January/086858.html

I think that many R users underestimate the numerical challenges that some of the typical nonlinear statistical models used in different fields present. R may not be a suitable platform for development of such models. Around 10 years ago John Schnute, Laura Richards, and Norm Olsen with Canadian federal fisheries undertook an investigation comparing various statistical modeling packages for a simple age-structured statistical model of the type commonly used in fisheries. They compared AD Model Builder, Gauss, Matlab, and Splus. Unfortunately a working model could not be produced with Splus, so its times could not be included in the comparison.

It is possible to produce a working model with the present-day version of R, so that R can now be directly compared with AD Model Builder for this type of model. I have put the results of the test together with the original Schnute and Richards paper and the working R and AD Model Builder codes on Otter's web site http://otter-rsch.ca/tresults.htm

The results are that AD Model Builder is roughly 1000 times faster than R for this problem. ADMB takes about 2 seconds to converge while R takes over 90 minutes. This is a simple toy example. Real fisheries models are often hundreds of times more computationally intensive than this one.

Cheers,
Dave

-- David A. Fournier P.O. Box 2040, Sidney, B.C. V8l 3S3 Canada Phone/FAX 250-655-3364 http://otter-rsch.com
Re: [R] Nonlinear statistical modeling -- a comparison of R and AD Model Builder
Dave, Did you try supplying gradient information to nlminb? (I note that nlminb is used for the optimization, but I don't see any gradient information supplied to it.) I would suspect that supplying gradient information would greatly speed up the computation (as you note in comments at http://otter-rsch.ca/tresults.htm.)

Actually you should probably ask Norm Olsen these questions. I am not proficient in R and am just using his code. However, I can say that providing derivatives for such a model is a highly nontrivial exercise. As I said in my posting, the R script and data are available to anyone who feels that the exercise was not carried out properly and would like to improve on it. Also, one does not need to provide derivatives to the AD Model Builder program. Finally, suppose that you are very good at calculating derivatives and manage to get them right. Then someone else comes along who wants to modify the model. Unless they are also very good at calculating derivatives there will be trouble.

I'm curious -- when you say R may not be a suitable platform for development of such models, what aspect of R do you feel is lacking? Is it the specific optimization routines available, or is it some other more general aspect?

2 seconds vs 90 minutes. For a real problem of this type the timings would probably be something like 10 minutes vs more than 2,700 minutes.

Also, another optimization algorithm available in R is the L-BFGS-B method of optim(). I've had extremely good experiences with using this code in S-PLUS. It can take box constraints, and can use gradient information. It is my first choice for most optimization problems, and I believe it is very widely used. Did you try using that optimization routine with this problem? -- Tony Plate

dave fournier wrote: There has recently been some discussion on the list about AD Model Builder and the suitability of R for constructing the types of models used in fisheries management.
https://stat.ethz.ch/pipermail/r-help/2006-January/086841.html https://stat.ethz.ch/pipermail/r-help/2006-January/086858.html

snip

-- David A. Fournier P.O. Box 2040, Sidney, B.C. V8l 3S3 Canada Phone/FAX 250-655-3364 http://otter-rsch.com
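For reference, the interface Tony describes looks like this; optim() lives in base R's stats package. A minimal sketch using the standard Rosenbrock test function with its analytic gradient, not the fisheries model from the thread:

```r
# optim() with method "L-BFGS-B": accepts an analytic gradient (gr) and
# box constraints (lower/upper), as described above.
fn <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2      # Rosenbrock
gr <- function(p) c(-2 * (1 - p[1]) - 400 * p[1] * (p[2] - p[1]^2),
                    200 * (p[2] - p[1]^2))

fit <- optim(c(-1.2, 1), fn, gr, method = "L-BFGS-B",
             lower = c(-5, -5), upper = c(5, 5),
             control = list(maxit = 200))
fit$par   # minimum is at (1, 1)
```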
[R] nlme
-- David A. Fournier P.O. Box 2040, Sidney, B.C. V8l 3S3 Canada Phone/FAX 250-655-3364 http://otter-rsch.com
[R] nlme sorry about that
Sorry, list. I twitched and sent the wrong stuff. Maybe I have had enough fun for today.

-- David A. Fournier P.O. Box 2040, Sidney, B.C. V8l 3S3 Canada Phone/FAX 250-655-3364 http://otter-rsch.com
[R] glmmADMB and the GPL -- formerly-- How to buy R.
I promise the list that this will be my last posting regarding this matter. The package glmmADMB is not and never was the property or a product of Otter Research Ltd. It was simply a program I wrote using ADMB-RE and gave away. I do this all the time to help various people. I don't monitor or try to control what they do with it. The executable nbmm.exe was produced by using the ADMB-RE package, but under our license executables produced using ADMB are freely distributable. See for example the software Coleraine or stock synthesis.

As I stated in my earlier post, I was motivated to produce software for negative binomial mixed models by a post of Prof. Ripley's saying that no such good software existed. I assumed that to mean anywhere, not just as a currently existing R package. I originally intended this to be just a stand-alone program that anyone could use after they put the data into the proper format. I soon realized, however, that R users had a different way of thinking about the methods for specifying these models. Since I have extremely limited R skills, I asked Anders Nielsen and Hans Skaug if they would be interested in producing something that could be used more easily by R users. They were not paid for this and acted simply out of a wish to provide a service to the R community. I had no idea how that would go, and at that point my involvement in the development (except for modifying the nbmm.exe to deal with difficult data) came to an end.

Neither Anders nor Hans is an employee of, or owns any part of, Otter Research Ltd. I am its sole employee/owner. When the package was finished I agreed to host it on my web site. I also put up a forum because I wanted to see how the software worked for people. It appears that Hans put the GPL on the package. He can speak for himself, but I believe he intended the GPL to apply to the R package, which he had taken over from Anders, and not to the nbmm.exe. Certainly he would have no authority to bind Otter Research Ltd in this matter.
Now, as to flames and lawyers. As I said, Otter Research Ltd is a company of one person. I have successfully sold software for 16 years without a lawyer, mostly to the fisheries management community. I never felt the need for a lawyer until I gave something away for free to the R community. I removed the link that Prof. Ripley referred to. If there is another offending link I will be happy to remove it as well if you tell me where it is.

For those who want to use the software, I will put it up in its original form, which means that you will need to put your data into the format that an AD Model Builder program requires. It will still have the same functionality but unfortunately will be more difficult for R users to access. Of course, if Hans or Anders or anyone else wants to, they are welcome to take the nbmm.exe and make it available with their own R package if they can satisfy whatever the requirements are -- just not on my website.

Cheers,
Dave

-- David A. Fournier P.O. Box 2040, Sidney, B.C. V8l 3S3 Canada Phone/FAX 250-655-3364 http://otter-rsch.com
[R] glmmADMB and the GPL -- formerly-- How to buy R.
Dear List, Some of you have been following the discussion of the GPL and its inclusion in the glmmADMB package we created for R users. I would like to provide a bit of background and include an email we received from Prof. Ripley, so that everyone can be aware of how some might use the GPL to try to force access to proprietary software. I think this is interesting because many have voiced the opinion that the GPL is benign and that commercial enterprises who avoid it do so mainly out of ignorance.

I have noticed two things. First, users of the R-help list appear to rely largely on the advice of a rather small number of statistical experts. Second, R users regard R as being more cutting edge and up to date than lists devoted to commercial statistical packages like SAS. For these reasons I was surprised to see the following post on the web in reply to a question on negative binomial mixed models. https://stat.ethz.ch/pipermail/r-help/2005-February/066146.html

I thought that this was bad advice, as certainly our ADMB-RE software could handle this problem easily. However, one never knows exactly what sort of data people might use in a particular example that could lead to difficulties, so I decided to code up a program that R users could test for this problem. However, R users are used to a different approach to model formulation, so it was difficult for the average R user to access the program. I approached Anders Nielsen, who is both an experienced ADMB user and R user, and asked him to write an interface in R which would make the program more accessible to R users. He created a package, and the whole thing seems to have had some success, with at least one PhD thesis based on calculations using it. The R code that Anders wrote is simply an interface which takes the R specification for the model and outputs a data file in the format that an ADMB program expects. The ADMB program is a stand-alone exe.
The R script then reads the ADMB output files and presents the results to the user in a more familiar R format. Now it appears that at some revision someone put a GPL notice on this package, although Anders states that he did not do so, and he is certain that it was not originally included by him. In any event, the R script is easily extracted from the package by those who know how to do so, and we have no problem with making the ADMB-RE source for the exe (the TPL file) available. In fact the original was on our web site, but was modified as we made the program more robust to deal with difficult data sets. The compiled TPL file links with our proprietary libraries, and we have no intention of providing the source for these, but that is exactly what Prof. Ripley seems to be demanding, since he claims that he wants the program to run on his computer, which it apparently does not do at present. Prof. Ripley seems to feel that he is a qualified spokesman for the open source community. I have no idea what the community at large feels about this.

What follows is Hans Skaug's post with Prof. Ripley's reply.

On Mon, 22 May 2006, H. Skaug wrote: About glmmADMB and GPL: We were not very cautious when we put in the GPL statement. What we wanted to say was that the use of glmmADMB is free, and does not require a license for AD Model Builder.

But that is not what you said, and you are legally and morally bound to fulfill the promise you made.

Am I correct in interpreting this discussion so that all we have to do is to remove the "License: GPL" statement from the DESCRIPTION file (and everywhere else it may occur), and there will be no conflict between glmmADMB and the rules of the R community?

I have made a request under the GPL. `All' you have to do is to fulfill it.

We have temporarily withdrawn glmmADMB until this question has been settled.
You can withdraw the package, but it has already been distributed under GPL, and those who received it under GPL have the right to redistribute it under GPL, including the sources you are obliged to give them. That's part of the `freedom' that GPL gives. hans Brian Ripley wrote: The issue in the glmmADMB example is not if they were required to release it under GPL (my reading from the GPL FAQ is that they probably were not, given that communication is between processes and the R code is interpreted). Rather, it is stated to be under GPL _but_ there is no source code offer for the executables (and the GPL FAQ says that for anonymous FTP it should be downloadable via the same site, and the principles apply equally to HTTP sites). As the executables are not for my normal OS and I would like to exercise my freedom to try the GPLed code, I have requested the sources from the package maintainer. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of
[R] conservative robust estimation in (nonlinear) mixed models
I believe that Bert's comments are a non sequitur. I did not and do not propose identifying which components of the model are contaminated by outliers. What I do propose is the more or less routine use of conservative robust methods to replace the normal theory estimators. By definition such estimators are to be almost as efficient as the normal theory estimators in the case where the normal theory applies. One may argue that conservative robust estimators do not exist for this class of problems. I think they do, but the obvious way to establish this claim is to carry out simulations. Before such simulations can be carried out one must create the software to do the analysis. So I am proposing to add that to our R package glmmADMB. Then other R users can carry out their own simulation analysis to investigate how the method performs. I think that normal mixtures are better candidates for conservative robust estimators than say Student's T distribution, but I will try to include both (and perhaps any others that appear useful). Dave Bert raised an issue I had overlooked. Ideally, we would like to be able to specify a different family for the observations and for each random effect, with Student's t and contaminated normal as valid options in both places. If I were allowed to specify a family (or a robust family) for either observations or for random effects but not both, I think I'd pick the observations. I don't know, but I wonder if misspecification of the observation distribution might create more problems with estimation and inference than misspecification of the distribution of a random effect. As Bert indicated, there may be identifiability issues here, and the choice of a model may depend on one's hypotheses about the situation being modeled. spencer graves Berton Gunter wrote: Ok, since Spencer has dived in,I'll go public (I made some prior private remarks to David because I didn't think they were worth wasting the list's bandwidth on. 
Heck, they may still not be...) My question: isn't the difficult issue which levels of the (co)variance hierarchy get longer-tailed distributions, rather than which distributions are used to model long tails? Seems to me that there is an inherent identifiability issue here, and even more so with nonlinear models. It's easy to construct examples where it all essentially depends on your priors. Cheers, Bert -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA

-Original Message- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Spencer Graves Sent: Thursday, March 23, 2006 12:34 PM To: otter at otter-rsch.com Cc: r-help at stat.math.ethz.ch Subject: Re: [R] conservative robust estimation in (nonlinear) mixed models

I know of two fairly common models for robust methods. One is the contaminated normal that you mentioned. The other is Student's t. A normal plot of the data or of residuals will often indicate whether the assumption of normality is plausible or not; when the plot indicates problems, it will often also indicate whether a contaminated normal or Student's t would be better. Using Student's t introduces one additional parameter. A contaminated normal would introduce 2; however, in many applications, the contamination proportion (or its logit) will often be highly correlated with the ratio of the contamination standard deviation to that of the central portion of the distribution. Thus, in some cases, it's often wise to fix the ratio of the standard deviations and estimate only the contamination proportion.

hope this helps. spencer graves

dave fournier wrote: Conservative robust estimation methods do not appear to be currently available in the standard mixed model methods for R, where by conservative robust estimation I mean methods which work almost as well as the methods based on assumptions of normality when the assumption of normality *IS* satisfied.
We are considering adding such a conservative robust estimation option for the random effects to our AD Model Builder mixed model package, glmmADMB, for R, and perhaps extending it to do robust estimation for linear mixed models at the same time. An obvious candidate is to assume something like a mixture of normals. I have tested this in a simple linear mixed model using 5% contamination with a normal with 3 times the standard deviation, which seems to be a common assumption. Simulation results indicate that when the random effects are normally distributed this estimator is about 3% less efficient, while when the random effects are contaminated with 5% outliers the estimator is about 23% more efficient, where by 23% more efficient I mean that one would have to use a sample size about 23% larger to obtain the same size confidence limits for the parameters.

snip
[R] conservative robust estimation in (nonlinear) mixed models
Conservative robust estimation methods do not appear to be currently available in the standard mixed model methods for R, where by conservative robust estimation I mean methods which work almost as well as the methods based on assumptions of normality when the assumption of normality *IS* satisfied. We are considering adding such a conservative robust estimation option for the random effects to our AD Model Builder mixed model package, glmmADMB, for R, and perhaps extending it to do robust estimation for linear mixed models at the same time.

An obvious candidate is to assume something like a mixture of normals. I have tested this in a simple linear mixed model using 5% contamination with a normal with 3 times the standard deviation, which seems to be a common assumption. Simulation results indicate that when the random effects are normally distributed this estimator is about 3% less efficient, while when the random effects are contaminated with 5% outliers the estimator is about 23% more efficient, where by 23% more efficient I mean that one would have to use a sample size about 23% larger to obtain the same size confidence limits for the parameters.

Question: I wonder if there are other distributions besides a mixture of normals which might be preferable. Three things to keep in mind are:

1.) It should be likelihood based so that the standard likelihood based tests are applicable.
2.) It should work well when the random effects are normally distributed, so that things that already work don't get broken.
3.) In order to implement the method efficiently it is necessary to be able to produce code for calculating the inverse of the cumulative distribution function. This enables one to extend methods based on the Laplace approximation for the random effects (i.e. the Laplace approximation itself, adaptive Gaussian integration, adaptive importance sampling) to the new distribution.

Dave

-- David A. Fournier P.O. Box 2040, Sidney, B.C.
V8l 3S3 Canada Phone/FAX 250-655-3364 http://otter-rsch.com
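The contaminated-normal candidate described above can be written down directly. A minimal sketch for i.i.d. data rather than the mixed-model case; dcontam and nll are invented names, and the 5% contamination with 3 times the standard deviation follows the post.

```r
# Contaminated normal: with probability 1 - eps an observation is
# N(mu, sigma^2); with probability eps it is N(mu, (3*sigma)^2).
dcontam <- function(x, mu, sigma, eps = 0.05, scale = 3) {
  (1 - eps) * dnorm(x, mu, sigma) + eps * dnorm(x, mu, scale * sigma)
}

# Negative log-likelihood for an i.i.d. sample (log-sigma parameterization
# keeps sigma positive), usable directly with optim():
nll <- function(par, x) -sum(log(dcontam(x, par[1], exp(par[2]))))

set.seed(1)
x <- c(rnorm(95), rnorm(5, sd = 3))       # 5% outliers
fit <- optim(c(0, 0), nll, x = x)
c(mu = fit$par[1], sigma = exp(fit$par[2]))
```

This is likelihood based, so the standard likelihood-ratio machinery applies, which is the first of the three requirements listed above.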
[R] singular convergence in glmmPQL
Hi, If you are having trouble with convergence in glmmPQL you might want to try our glmmADMB package for R at http://otter-rsch.com/admbre/examples/glmmadmb/glmmADMB.html It is more stable according to some users. I believe the corresponding commands for your model are

# paste this after your other commands
df <- data.frame(y.long, arm.long, id.long)
# load glmmADMB package
library(glmmADMB)
# invoke glmmADMB routine
re.1 <- glmm.admb(y.long ~ arm.long, random = ~1, group = 'id.long',
                  family = 'binomial', link = 'logit', data = df)

and that should do it. Cheers, Dave

-- David A. Fournier P.O. Box 2040, Sidney, B.C. V8l 3s3 Canada http://otter-rsch.com
[R] glmmpql and lmer keep failing
Hi,

I believe we could extend our upcoming release of our freely available AD Model Builder negative binomial mixed model package for R http://otter-rsch.com/admbre/admbre.html to include your model. Writing the model is simple; it is the interface with R that is a bit more difficult. If you contact me privately I will take a look at it.

Cheers,
Dave
--
David A. Fournier P.O. Box 2040, Sidney, B.C. V8L 3S3 Canada http://otter-rsch.com
[R] TRAMO-SEATS confusion?
Hi,

For what it's worth, I had to hack some horrendous old FORTRAN which I could not come close to understanding. The main issue was to allow dynamically allocated arrays. What I did was to run it through f2c to produce C++ code. Then I verified that it still worked. After that I replaced the pointers with a vector class with optional bounds checking via an overloaded [] operator, so that all the old code would still compile. Then I just chased down all the bounds violations until everything worked, and I was done.

Dave
--
David A. Fournier P.O. Box 2040, Sidney, B.C. V8L 3S3 Canada http://otter-rsch.com
[R] Do Users of Nonlinear Mixed Effects Models Know Whether Their Software Really Works?
) 6.8059 2.6088
# of obs: 1908, groups: subject, 294
Estimated scale (compare to 1) 0.9091945
Fixed effects:
            Estimate   Std. Error z value  Pr(>|z|)
(Intercept) -0.626214  0.264996   -2.3631  0.01812 *
treat       -0.304660  0.360866   -0.8442  0.39853
time        -0.346605  0.02      -12.9979  < 2e-16 ***
sigma        2.608
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Warning message:
optim or nlminb returned message ERROR: ABNORMAL_TERMINATION_IN_LNSRCH in: LMEopt(x = mer, value = cv)

The R routine correctly identifies the treatment effect as not significant. However, the parameter estimates are poor.

Likelihood Ratio Testing

Accurate calculation of the log-likelihood value is desirable so that hypothesis testing can be carried out using likelihood ratio tests. However, as noted above, the use of Gaussian integration is not practical for many nonlinear mixed models. We were interested in seeing how well the approximate log-likelihood values produced by ADMB-RE's Laplace approximation option would perform. We consider the alternative model with an extra interaction term (b_4) from Lesaffre et al. Here are the results for the Laplace approximation:

# Number of parameters = 5   log-likelihood = -627.809
name    value         std dev      P value
b_1     -2.5233e+00   7.8829e-01   0.002
b_2     -3.0702e-01   6.8996e-01   0.655
b_3     -4.0009e-01   4.7059e-02   0.001
b_4     -1.3726e-01   6.9586e-02   0.044
sigma    4.5783e+00   7.2100e-01

and the exact parameter estimates by 100 point Gaussian integration:

# Number of parameters = 5   log-likelihood = -625.398
name    value         std dev      P value
b_1     -1.6183e+00   4.3427e-01   0.001
b_2     -1.6077e-01   5.8394e-01   0.783
b_3     -3.9100e-01   4.4380e-02   0.001
b_4     -1.3679e-01   6.8013e-02   0.044
sigma    4.0131e+00   3.8044e-01

The log-likelihood differences are 2.01 for the Laplace approximation and 2.08 for Gaussian integration. Since the 95% point for hypothesis testing is 1.92, use of either model results in acceptance of the interaction term.
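The 1.92 cutoff quoted above is half the 95% chi-squared critical value on 1 degree of freedom (the test statistic is twice the log-likelihood difference). A small R sketch of the test, using the log-likelihood values reported in the text; the reduced model's value is only implied by the quoted 2.08 difference, so treat these numbers as illustrative:

```r
## Likelihood ratio test for the interaction term b_4, using the
## Gaussian integration fit reported above. The reduced model's
## log-likelihood is reconstructed from the quoted difference of 2.08.
ll_full    <- -625.398               # model including b_4
ll_reduced <- ll_full - 2.08         # implied by the reported difference
lrt        <- 2 * (ll_full - ll_reduced)   # chi-squared statistic, 1 df
qchisq(0.95, df = 1) / 2                   # 1.92, the cutoff quoted above
pchisq(lrt, df = 1, lower.tail = FALSE)    # p-value for the interaction
```

Since the log-likelihood difference of 2.08 exceeds 1.92 (equivalently, the statistic 4.16 exceeds 3.84), the interaction term is retained at the 5% level.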
Conclusions

With the exception of the AD Model Builder random effects module, none of the packages tested appear to function reliably for this problem. SAS NLMIXED was beginning to exhibit symptoms of instability which would probably render it unreliable on more difficult problems. We can see no reason for using quasi-likelihoods to fit nonlinear mixed models when ADMB-RE can fit the models by maximum likelihood, with all the advantages that ensue.

Note

We realize that there are many other packages out there. We would welcome results for other packages. If we can find a serious competitor to AD Model Builder then we could move on to comparing the relative performance on more difficult models.

Cheers,
Dave Fournier
--
David A. Fournier P.O. Box 2040, Sidney, B.C. V8L 3S3 Canada http://otter-rsch.com
[R] Re: Do Users of Nonlinear Mixed Effects Models Know Whether Their Software Really Works?
Actually, one can download a working version of our software for free, so anyone can verify these results. It is restricted by needing a network connection to get permission to operate, and is for evaluation only. The current version does not have the Gauss-Hermite integration which I put in to find the true value for this comparison, but it will soon. In addition we have made our Poisson/negative binomial zero-inflated mixed model software for R freely available: http://otter-rsch.com/admbre/admbre.html
--
David A. Fournier P.O. Box 2040, Sidney, B.C. V8L 3S3 Canada http://otter-rsch.com
[R] Fitting a mixed negative binomial model
I *think* (but am not sure) that these guys were actually (politely) advertising a commercial package that they're developing. But, looking at the web page, it seems that this module may be freely available -- can't tell at the moment.

Ben

The software for negative binomial mixed models will be free, i.e. free as in you can use it without paying anything. It is built using our proprietary software. The idea is to show how our software is good for building nonlinear statistical models, including those with random effects. Turning our stand-alone software into something that can be called easily from R has been a bit of a steep learning curve for me, but we are making progress. So far we have looked at 3 models: the model in Booth et al. (easy); an overdispersed data set that probably turned out to be a zero-inflated Poisson (fairly easy, but the negative binomial is fit only to be rejected in favor of the simpler model); and what appears to be a true negative binomial (difficult but doable), for which we are discussing the form of the model with the person who wishes to analyze it. A few more data sets would be useful if anyone has an application, so that we can ensure the robustness of our software.

Dave
[R] negative binomial glmm's
Hi,

I guess by now you realize that we have been trying to promote our NB GLMM in order to show people some of the capabilities of our random effects software ADMB-RE. If you want, I can help you analyze your data with the package we talk about on the R list. All I ask is that if it works for you, you write a brief note to the list telling how it helped you.

Cheers,
Dave

Dave Fournier
Otter Research Ltd
PO Box 2040, Sidney B.C.
Canada V8L 3S3