Re: [R] fast rowCumsums wanted for calculating the cdf
Although I know there is another message in this thread, I am replying to this message to be able to include the whole discussion to this point. Gregor mentioned the possibility of extending the compiled code for cumsum so that it would handle the matrix case. The work by Dirk Eddelbuettel and Romain Francois on developing the Rcpp package makes it surprisingly easy to create compiled code for this task. I am copying the Rcpp-devel list on this in case one of the readers of that list has time to create a sample implementation before I can get to it. It's midterm season and I have to tackle the stack of blue books on my desk before doing fun things like writing code.

On Fri, Oct 15, 2010 at 2:23 AM, Joshua Wiley jwiley.ps...@gmail.com wrote:

Hi,

You might look at Reduce(). It seems faster. I converted the matrix to a list in an incredibly sloppy way (which you should not emulate) because I cannot think of the simple way.

probs <- t(matrix(rep(1:1000), nrow=10)) # matrix with row-wise probabilities
F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs))
F[,1] <- probs[,1,drop=TRUE]
add <- function(x) {Reduce(`+`, x, accumulate = TRUE)}

system.time(F.slow <- t(apply(probs, 1, cumsum)))
   user  system elapsed
 36.758   0.416  42.464

system.time(for (cc in 2:ncol(F)) {
  F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE]
})
   user  system elapsed
  0.980   0.196   1.328

system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4], probs[,5], probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))
   user  system elapsed
  0.420   0.072   0.539

.539 seconds for 10 vectors each with 1 million elements does not seem bad. For 30,000 each, on my system it took .014 seconds, which for 200,000 iterations, I estimated as:

(200000*.014)/60/60
[1] 0.778

(and this is on a stone age system with a single core processor and only 756MB of RAM). It should not be difficult to get the list back to a matrix.
Cheers,
Josh

On Thu, Oct 14, 2010 at 11:51 PM, Gregor mailingl...@gmx.at wrote:

Dear all,

Maybe the easiest solution: Is there anything that speaks against generalizing cumsum from base to cope with matrices (as is done in Matlab)? E.g.:

cumsum(Matrix, 1)

equivalent to

apply(Matrix, 1, cumsum)

The main advantage could be optimized code if the matrix is extremely nonsquare (e.g. 100,000 x 10), but the summation is done over the short side (in this case 10). apply would practically yield a loop over 100,000 elements, whereas vectorization w.r.t. the long side (a loop over 10 elements) provides considerable efficiency gains.

Many regards,
Gregor

On Tue, 12 Oct 2010 10:24:53 +0200 Gregor mailingl...@gmx.at wrote:

Dear all,

I am struggling with a (currently) cost-intensive problem: calculating the (non-normalized) cumulative distribution function, given the (non-normalized) probabilities. Something like:

probs <- t(matrix(rep(1:100), nrow=10)) # matrix with row-wise probabilities
F <- t(apply(probs, 1, cumsum)) # SLOOOW!

One (already faster, but for sure not ideal) solution - thanks to Henrik Bengtsson:

F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs))
F[,1] <- probs[,1,drop=TRUE]
for (cc in 2:ncol(F)) {
  F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE]
}

In my case, probs is a (30,000 x 10) matrix, and I need to iterate this step around 200,000 times, so speed is crucial. I currently can make sure to have no NAs, but in order to extend matrixStats, this could be a nontrivial issue.

Any ideas for speeding up this - probably routine - task?

Thanks in advance,
Gregor

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
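The column-oriented loop that Gregor and Henrik describe vectorizes over the long dimension of the matrix. A minimal sketch of the same idea in Python/NumPy for illustration (np.cumsum(..., axis=1) plays the role of the proposed matrix-aware cumsum):

```python
import numpy as np

def row_cumsums(probs):
    # Loop over the short side (columns), vectorizing over the long side
    # (rows), exactly like the F[,cc] <- F[,cc-1] + probs[,cc] loop in R.
    F = np.empty_like(probs)
    F[:, 0] = probs[:, 0]
    for cc in range(1, probs.shape[1]):
        F[:, cc] = F[:, cc - 1] + probs[:, cc]
    return F

probs = np.arange(1.0, 31.0).reshape(3, 10)  # small stand-in for a 30,000 x 10 matrix
assert np.allclose(row_cumsums(probs), np.cumsum(probs, axis=1))
```

The loop body runs only ncol - 1 times, so almost all the work happens in the vectorized column additions, which is Gregor's point about extremely nonsquare matrices.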
Re: [R] how fit linear model with fixed slope?
On Fri, Oct 22, 2010 at 9:13 AM, Czerminski, Ryszard ryszard.czermin...@astrazeneca.com wrote:

I want to fit a linear model with a fixed slope, e.g. y = x + b (instead of the general y = a*x + b). Is it possible to do with lm()?

Yes. The simplest way is to fit

lm(y - a*x ~ 1)

which will give you the estimate of b, its standard error, etc. An alternative is to use an offset argument or an offset() expression in the model formula

lm(y ~ 1 + offset(a*x))
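The algebra behind the first suggestion can be checked numerically: with the slope fixed at a known value a, fitting y - a*x ~ 1 estimates the intercept as the mean of the adjusted response. A quick sketch in Python with illustrative (not thread-supplied) data:

```python
import numpy as np

rng = np.random.default_rng(0)
a = 2.0                                             # the fixed, known slope
x = rng.normal(size=200)
y = a * x + 1.5 + rng.normal(scale=0.1, size=200)   # true intercept b = 1.5

# lm(y - a*x ~ 1): intercept-only least squares on the offset-adjusted response,
# whose closed-form solution is simply the sample mean
b_hat = np.mean(y - a * x)
```

With the noise scale used here, b_hat lands very close to the true intercept of 1.5.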
Re: [R] Big data (over 2GB) and lmer
On Thu, Oct 21, 2010 at 2:00 PM, Ben Bolker bbol...@gmail.com wrote:

Michal Figurski figurski at mail.med.upenn.edu writes:

I have a data set of roughly 10 million records, 7 columns. It has only about 500MB as a csv, so it fits in the memory. It's painfully slow to do anything with it, but it's possible. I also have another dataset of covariates that I would like to explore - with about 4GB of data... I would like to merge the two datasets and use lmer to build a mixed effects model. Is there a way, for example using 'bigmemory' or 'ff', or any other trick, to enable lmer to work on this data set?

I don't think this will be easy. Do you really need mixed effects for this task? I.e., are your numbers per group sufficiently small that you will benefit from the shrinkage etc. afforded by mixed models? If you have (say) 10,000 individuals per group and 1000 groups, then I would expect you'd get very accurate estimates of the group coefficients, and you could then calculate variances etc. among these estimates. You might get more informed answers on r-sig-mixed-mod...@r-project.org ...

lmer already stores the model matrices and factors related to the random effects as sparse matrices. Depending on the complexity of the model - in particular, whether random effects are defined with respect to more than one grouping factor and, if so, whether those factors are nested or not - storing the Cholesky factor of the random-effects model matrix will be the limiting factor. This object has many slots but only two very large ones, in the sense that they are long vectors. At present vectors accessed or created by R are limited to 2^31 elements because they are indexed by 32-bit integers. So the short answer is: it depends. Simple models may be possible. Complex models will need to wait upon decisions about using wider integer representations in indices.
Re: [R] lme vs. lmer results
On Tue, Oct 26, 2010 at 12:27 PM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote:

Hello, and sorry for asking a question without the data - hope it can still be answered. I've run two things on the same data:

# Using lme:
mix.lme <- lme(DV ~ a+b+c+d+e+f+h+i, random = ~ e+f+h+i | group, data = mydata)

# Using lmer:
mix.lmer <- lmer(DV ~ a+b+c+d+(1|group)+(e|group)+(f|group)+(h|group)+(i|group), data = mydata)

Those models aren't the same, and the model for lmer doesn't make sense. You would need to write the random-effects terms as (0+e|group), etc., because (e|group) is the same as (1+e|group), so you are including (Intercept) random effects for group in each of those 5 terms. To generate the same model as you fit with lme you would use

mix.lmer <- lmer(DV ~ a+b+c+d+(e+f+h+i|group), mydata)

I wouldn't recommend it, though, as this requires estimating all the variance and covariance parameters for the random effects (the lower triangle of a 5 by 5 symmetric matrix, so 15 parameters). Almost certainly the estimated variance-covariance matrix will end up being singular. Unless you are careful you may not notice this.

lme provided an output (fixed effects and random effects coefficients).

lme is not as picky about singularity of the variance-covariance matrix as lmer is.

lmer gave me an error: Error in mer_finalize(ans) : Downdated X'X is not positive definite, 10. I've rerun lmer, but specifying the random effects for 2 fewer predictors. This time it ran and provided an output. (BTW, the random effects of lmer with 2 fewer predictors specified as random were very close to the output of lme.)

Yes, lmer could converge in such a case but the parameter estimates are not meaningful because of the ambiguity described above.

Question: Looks like lmer could not invert the matrix, right?

Well, lmer never tries to invert matrices but it does factor them, and that is where the problem is recognized. However, I think that singularity is a symptom of the problem, not the cause.
But how come lme (which I thought was an earlier version of lmer) COULD invert it?

The computational methods in the two packages are quite different. I think that the methods in lme4 are superior because we have learned a bit in the last 10 years.

Greatly appreciate a clarification!

--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com
Re: [R] Need help with lmer model specification syntax for nested mixed model
On Sun, Oct 31, 2010 at 2:35 AM, Carabiniero ja...@troutnut.com wrote:

I haven't been able to fully make sense of the conflicting online information about whether and how to specify nesting structure for a nested, mixed model. I'll describe my experiment and hopefully somebody who knows lme4 well can help.

We're measuring the fluorescence intensity of brain slices from frogs that have undergone various treatments. We want to use multcomp to look for differences between treatments, while accounting for the variance introduced by the random effects of brain and slice. There are a few measurements per slice, several slices per brain, and several brains per treatment. In the data file, the numbering for slices starts over from 1 for each brain, and the numbering for brains starts over from 1 for each treatment.

This is what I call implicit nesting in the definition of the variables. My general recommendation is to create new variables that reflect the actual structure of the data, as in

mydata <- within(mydata, {
  ubrain <- factor(Treatment:Brain)
  uslice <- factor(Treatment:Brain:Slice)
})

then define the model in terms of these factors, ubrain and uslice, which have the desirable property that each distinct brain has a distinct label.

In other words: Treatment is a fixed effect, brain is a random effect nested in treatment, and slice is a random effect nested in brain. As I understood the documentation, this is the correct specification:

log(Intensity) ~ Treatment + (1|Brain) + (1|Slice)

That will work with ubrain and uslice instead of the implicitly nested Brain and Slice. However, I don't see how lmer understands the correct nesting structure from that. How does it know brain isn't crossed with treatment?

lmer can determine the crossed or nested structure from the data whenever the data reflect the structure. Implicitly nested factors don't reflect the structure of the data and rely on external information to augment the data given.
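The relabeling above turns implicitly nested factors into explicitly nested ones. The same bookkeeping sketched in Python/pandas for illustration, with hypothetical column names mirroring the R example:

```python
import pandas as pd

# Brain numbering restarts at 1 within each treatment (implicit nesting)
df = pd.DataFrame({
    "Treatment": ["A", "A", "B", "B"],
    "Brain":     [1, 2, 1, 2],
})

# Equivalent of ubrain <- factor(Treatment:Brain): give each brain a unique label
df["ubrain"] = df["Treatment"].astype(str) + ":" + df["Brain"].astype(str)

assert df["Brain"].nunique() == 2    # the raw codes suggest only 2 brains...
assert df["ubrain"].nunique() == 4   # ...but there are really 4 distinct brains
```

With distinct labels per brain, any software can infer the nesting directly from the data rather than from external knowledge of how the numbering was assigned.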
The computational methods used in lmer don't depend on whether the grouping factors for the random effects are nested or not. However, they do require that the grouping factors are well-defined.

Here are two other things I tried, and each gave different results:

log(Intensity) ~ Treatment + (1|Slice/Brain/Treatment)
log(Intensity) ~ Treatment + (1|Brain/Treatment) + (1|Slice/Brain)

I'm not sure why these things give different results, or which one (if any) is right. Can anyone help?

I have taken the liberty of cc:ing the R-SIG-Mixed-Models mailing list on this reply and suggest that any follow-ups be on that list.
Re: [R] Using R for Production - Discussion
On Mon, Nov 1, 2010 at 11:04 PM, Santosh Srinivas santosh.srini...@gmail.com wrote:

Hello Group,

This is an open-ended question. I am quite fascinated by the things I can do and the control I have on my activities since I started using R. I basically have been using this for analytical work off my desktop. My experience has been quite good and most issues where I need to investigate and solve are typically more related to data errors, format corruption, etc. ... not necessarily R related. Complementing this with Python gives enough firepower to do lots of production (analytical) activities on the cloud (from my research I see that every innovative technology provider seems to support Python ... Google, Amazon, etc.).

Questions on using R for production activities:

Q1) Does anyone have experience of using R scripts etc. for production-related activities? E.g. serving a computational / analytical / simulation environment from a web portal with the analytical processing done in R. I've seen that most useful things for normal (not rocket science) business (80-20 rule) can be done just as well in R in comparison with tools like SAS, Matlab, etc.

Q2) I haven't tried the processing routines for much larger data sets, assuming size is not a constraint nowadays. I know that I should try out ... but any forewarnings would help. Is something that works for my desktop dataset likely to work when scaled up to a cloud dataset, assuming that I clear out unused objects, avoid infinite loops, etc.? I.e., is there any problem with the fundamental architecture of R itself (like press articles often say)?

Q3) There are big fans of the SAS, Matlab, Mathworks environments out there. Does anyone have a comparison of how R fares? From my experience R is quite neat and low-level ... so overheads should be quite low. Most slowness comes from lack of knowledge (see my code ... like using the wrong structures, functions, loops, etc.) rather than from something wrong with the way R itself is. Perhaps there is no commercial focus on enhancing performance, but my guess is that it is just a matter of time till the community evolves the language to score higher on that too, and perhaps develops documentation to assist users challenged by performance (the ten commandments type of tips).

Q4) You must have heard the latest comment from James Goodnight of SAS: "We haven't noticed that a lot. Most of our companies need industrial strength software that has been tested, put through every possible scenario or failure to make sure everything works correctly." My gut is that random passionate geeks (playing part-time) do better testing than an army of professionals ... (but I've no empirical evidence here). I am not taking a side here (although I appreciate those who do!) ... but looking for objective reasoning.

Regarding performance and size of data sets, I would suggest viewing the presentation that Dirk Eddelbuettel and Romain Francois gave at Google recently. David Smith links to it in his blog at blog.revolutionanalytics.com

One of the advantages of Open Source systems is that people can provide many different kinds of hooks into the code. At present R vector objects use 32-bit signed integers for indexing, which limits the size of an individual vector to 2^31 - 1 elements. There are some methods available for using external storage to bypass this, but they do introduce another level of complexity.
Re: [R] Integrate to 1? (gauss.quad)
I don't know about the statmod package and the gauss.quad function, but generally the definition of Gauss-Hermite quadrature is with respect to the function that is multiplied by exp(-x^2) in the integrand. So your example would reduce to summing the weights.

On Sun, Nov 14, 2010 at 11:18 AM, Doran, Harold hdo...@air.org wrote:

Does anyone see why my code does not integrate to 1?

library(statmod)
mu <- 0
s <- 1
Q <- 5
qq <- gauss.quad(Q, kind='hermite')
sum((1/(s*sqrt(2*pi))) * exp(-((qq$nodes-mu)^2/(2*s^2))) * qq$weights)

### This does what it is supposed to
myNorm <- function(theta) (1/(s*sqrt(2*pi))) * exp(-((theta-mu)^2/(2*s^2)))
integrate(myNorm, -Inf, Inf)
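The point about the weight function can be checked directly: Gauss-Hermite rules approximate integrals of f(x)*exp(-x^2), so summing the weights (f = 1) gives sqrt(pi), and the normal density integrates to 1 only after the change of variables x = mu + sqrt(2)*s*t. A sketch using NumPy's hermgauss for illustration:

```python
import numpy as np

nodes, weights = np.polynomial.hermite.hermgauss(5)

# Integrating the bare weight function exp(-x^2): the weights sum to sqrt(pi)
total = weights.sum()

# Normal density: substitute x = mu + sqrt(2)*s*t so the exp(-t^2) factor
# appears explicitly, leaving (1/sqrt(pi)) * sum(w_i) = 1
mu, s = 0.0, 1.0
integral = total / np.sqrt(np.pi)

# The same substitution evaluates expectations, e.g. E[X^2] = 1 for a
# standard normal (exact here, since a 5-node rule handles degree <= 9)
ex2 = (weights * (mu + np.sqrt(2) * s * nodes) ** 2).sum() / np.sqrt(np.pi)
```

Without the change of variables, plugging the raw nodes into the normal density double-counts the exp(-x^2) factor, which is exactly why the sum in the original post does not come out to 1.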
Re: [R] how to do linear regression when dimension is high
On Tue, Nov 16, 2010 at 10:30 AM, poko2000 quan.poko2...@gmail.com wrote:

Hi, I am a newbie in R. I have data with dim of 20. How to use lm if I want to do regression with the whole design matrix? My y is the first column; the rest are xs. Thanks a lot.

Do you have the data stored in a matrix or in a data frame? If it is in a data frame and the first column is named y then you can use a call like

lm(y ~ ., mydata)

The '.' on the right-hand side of the formula is expanded to a formula incorporating all of the columns in the data frame except for the response. For example

summary(lm(Fertility ~ ., swiss))

Call:
lm(formula = Fertility ~ ., data = swiss)

Residuals:
     Min       1Q   Median       3Q      Max
-15.2743  -5.2617   0.5032   4.1198  15.3213

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      66.91518   10.70604   6.250 1.91e-07
Agriculture      -0.17211    0.07030  -2.448  0.01873
Examination      -0.25801    0.25388  -1.016  0.31546
Education        -0.87094    0.18303  -4.758 2.43e-05
Catholic          0.10412    0.03526   2.953  0.00519
Infant.Mortality  1.07705    0.38172   2.822  0.00734

Residual standard error: 7.165 on 41 degrees of freedom
Multiple R-squared: 0.7067, Adjusted R-squared: 0.671
F-statistic: 19.76 on 5 and 41 DF, p-value: 5.594e-10
Re: [R] plot vs print ??
On Tue, Nov 16, 2010 at 11:09 AM, skan juanp...@gmail.com wrote:

What's the difference between using plot and using print in order to plot a graph? For example, in order to plot the result of a histogram.

Could you give an example? It seems that you are referring to graphics functions in packages such as lattice and ggplot2 that produce objects that must be print'ed or plot'ed before they are rendered on a graphics device. However, we would need to know if you mean the result of histogram() in the lattice package or hist() in the graphics package before we could answer.
Re: [R] glmer, Error: Downdated X'X is not positive definite,49
It is often more effective to send questions about lmer or glmer to the r-sig-mixed-mod...@r-project.org mailing list, which I am cc:ing on this response.

On Tue, Nov 16, 2010 at 3:25 AM, Annika annika.opperb...@jyu.fi wrote:

Dear list,

I am new to this list and I am new to the world of R. Additionally, I am not very firm in statistics either, but have to deal. So here is my problem: I have a dataset (which I attach at the end of the post) with a binomial response variable (alive or not) and three fixed factors (trapping, treat, sex). I do have repeated measures and would like to include one random factor (enclosure). I chose to use the glmer function (and tried lmer as well).

m1 <- glmer(alive ~ treat*sex*trapping*ID + (1|enclosure), family=binomial, data=survadult)

Is ID a factor? In the data you posted it is an integer value but I am guessing it should be a factor. If so, you can't incorporate it in the model because there is only one binary observation per level of ID. If I omit the ID in the fixed-effects part of the formula I can fit the model with glmer.

summary(m1 <- glmer(alive ~ trapping * treat * sex + (1|enclosure), trap, binomial))

Generalized linear mixed model fit by maximum likelihood ['summary.mer']
 Family: binomial
Formula: alive ~ trapping * treat * sex + (1 | enclosure)
   Data: trap
      AIC       BIC    logLik  deviance
 213.2111  334.5440  -73.6056  147.2111

Random effects:
 Groups    Name        Variance Std.Dev.
 enclosure (Intercept) 1.372    1.171
Number of obs: 292, groups: enclosure, 13

Fixed effects:
                                 Estimate Std. Error z value
(Intercept)                      10.61897   45.92452   0.231
trappingtrapping2                 0.03992   65.59766   0.001
trappingtrapping3                -0.01196   64.74665   0.000
trappingtrapping4                -0.74441   71.77010  -0.010
treatm/p                         -0.26605   72.78328  -0.004
treatp/m                         -0.18007   81.63861  -0.002
treatp/p                          0.50290   64.45725   0.008
sexm                             -8.61299   45.92746  -0.188
trappingtrapping2:treatm/p       -0.03657  103.39000   0.000
trappingtrapping3:treatm/p       -0.04916  101.87680   0.000
trappingtrapping4:treatm/p      -10.26432   91.32210  -0.112
trappingtrapping2:treatp/m       -0.04047  115.80425   0.000
trappingtrapping3:treatp/m       -8.59631   93.53334  -0.092
trappingtrapping4:treatp/m       -7.92162   98.52543  -0.080
trappingtrapping2:treatp/p       -0.05942   91.39839  -0.001
trappingtrapping3:treatp/p       -7.83591   78.98418  -0.099
trappingtrapping4:treatp/p       -9.60581   84.83662  -0.113
trappingtrapping2:sexm            0.92624   65.61333   0.014
trappingtrapping3:sexm           -1.45692   64.75568  -0.022
trappingtrapping4:sexm           -0.88035   71.77685  -0.012
treatm/p:sexm                     8.69446   93.57164   0.093
treatp/m:sexm                     8.61957  106.06371   0.081
treatp/p:sexm                     0.86615   64.46538   0.013
trappingtrapping2:treatm/p:sexm  -9.03327  118.95919  -0.076
trappingtrapping3:treatm/p:sexm  -7.51474  117.64003  -0.064
trappingtrapping4:treatm/p:sexm   0.81873  108.62671   0.008
trappingtrapping2:treatp/m:sexm  -9.51400  134.16089  -0.071
trappingtrapping3:treatp/m:sexm   0.48425  115.47998   0.004
trappingtrapping4:treatp/m:sexm  -2.37780  119.56181  -0.020
trappingtrapping2:treatp/p:sexm  -0.93097   91.42477  -0.010
trappingtrapping3:treatp/p:sexm   7.55060   79.00487   0.096
trappingtrapping4:treatp/p:sexm   9.47461   84.85476   0.112

Both give the following error message:

Error in mer_finalize(ans) : Downdated X'X is not positive definite, 49.
Additionally: Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

I get that there is something wrong in my dataframe, but I don't understand what and how I could change it.
If I run a glm without the random factor

m1 <- glm(alive ~ treat*sex*trapping*ID, family=binomial, data=survadult)

it works out really fine, but is the wrong analysis in my opinion. (Does glm actually consider the repeated measures? Does glmer?) Is glmer the right function for me, and how do I deal with the error I get?

Thanks for any help for a non-professional :)
Annika

trapping  treat sex alive  ID enclosure
trapping1 m/m   f       1 400         1
trapping1 m/m   f       1 401         1
trapping1 m/m   f       1 402         1
trapping1 m/m   m       0 403         1
trapping1 m/m   m       1 404         1
trapping1 m/m   m       1 405         1
trapping1 m/p   f       1 406         2
trapping1 m/p   f       1 407         2
trapping1 m/p   f       1 408         2
trapping1 m/p   m       1 409         2
trapping1 m/p   m       1 410         2
Re: [R] Question about GLMER
I am cc:ing the r-sig-mixed-mod...@r-project.org mailing list on this reply as such questions are often answered more quickly on that list.

On Tue, Nov 16, 2010 at 2:00 PM, Daniel Jeske daniel.je...@ucr.edu wrote:

Dear R Help,

I believe the glmer() function in lme4 automatically fits an unstructured covariance matrix for the random effects. Is that true?

The short answer is no. The longer answer is that you will need to be more specific about the form of the data and the model before your question can be answered. "Unstructured variance-covariance" is SAS-speak (and an oxymoron, but I promise not to rant about that). So please give us a context.

If so, do I have an option to somehow ask for a diagonal structured covariance matrix?
Re: [R] Matrix Size
On Wed, Jul 14, 2010 at 4:23 PM, paul s r-project@queuemail.com wrote:

hi - i just started using R as i am trying to figure out how to perform a linear regression on a huge matrix. i am sure this topic has passed through the email list before but i could not find anything in the archives.

i have a matrix that is 2,000,000 x 170,000; the values right now are arbitrary. i try to allocate this on a x86_64 machine with 16G of ram and i get the following:

x <- matrix(0, 2000000, 170000)
Error in matrix(0, 2e+06, 170000) : too many elements specified

R stores matrices and other data objects in memory. A matrix of that size would require

2e+06*170000*8/2^30
[1] 2533.197

gigabytes of memory. Start looking for a machine with at least 5 terabytes of memory (you will need more than one copy of the matrix to be stored) or, probably easier, rethink your problem. Results from a linear regression producing 170,000 coefficient estimates are unlikely to be useful. The model matrix is essentially guaranteed to be rank-deficient.

is R capable of handling data of this size? am i doing it wrong?

cheers
paul
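The memory arithmetic generalizes to any dense matrix; a sketch in Python of the same back-of-the-envelope calculation (8 bytes per double-precision element):

```python
nrow, ncol = 2_000_000, 170_000
bytes_needed = nrow * ncol * 8   # double precision, 8 bytes per element
gib = bytes_needed / 2**30       # matches R's 2e+06*170000*8/2^30
tib = gib / 1024                 # roughly 2.5 TiB for a single copy
```

Since fitting a model typically needs at least one extra working copy of the data, the practical requirement is a multiple of that single-copy figure, which is the basis of the "at least 5 terabytes" remark above.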
Re: [R] Passing the name of a variable to a function
On Wed, Aug 4, 2010 at 2:09 PM, Erik Iverson er...@ccbr.umn.edu wrote:

Hello,

I have a problem which has bitten me occasionally. I often need to prepare graphs for many variables in a data set, but seldom for all, or for any large number of sequential or sequentially named variables. Often I need several graphs for different subsets of the dataset for a given variable. I run into similar problems with other needs besides graphing. What I would like to do is write a function which takes the *name* of a variable, presumably as a character string, from a dataframe, as one argument, and the dataframe as a second argument. For example, where y is to be the name of a variable in a given dataframe d, and the other variables needed, T, M and so on, are to be found in the same dataframe:

pf <- function (y, data, ...) {
  p1 <- xyplot(y~x|T, data)
  p2 <- xyplot(y~x|T, subset(data, M == 2))
  p3 <- xyplot(y~x|T, subset(data, M == 4))
  # print(p1, p2, p3)
}
pf(Score1, data)
pf(Score2, data)

This fails because, of course, Score1, Score2, etc. are not defined, or if you pass them as pf(data$Score2, data), then when you subset the data, data$Score2 is now the wrong shape. I've come up with various inelegant hacks (often with for loops) for getting around this over the last few years, but I can't help feeling that I'm missing something obvious, which I've been too dim to spot.

Depending on your needs (e.g., you use formulas, which can be trickier), I think I often do something like:

# I prefer this; I quote the variable name...
df1 <- data.frame(x = rnorm(100), score1 = rnorm(100),
                  M = sample(c(2, 4), 100, replace = TRUE))
pf <- function (y, data, ...) {
  data$y <- data[[y]]
  xyplot(y~x, subset(data, M == 2))
}
pf("score1", df1)

# as an alternative, use eval/substitute; don't have to quote
pf2 <- function (y, data, ...) {
  data$y <- eval(substitute(y), data)
  xyplot(y~x, subset(data, M == 2))
}

That's okay until you get name collisions with y in the data frame.
I would approach the problem by substituting into the formula and perhaps changing the name y to a hidden name like .y. (The general rule is that a programmer can intentionally use a name starting with "." and expect that it will not conflict with the names chosen by users. Hostile users who use variable names that start with "." get what they deserve.)

eval(substitute(.y ~ x, list(.y = as.name("score1"))))
score1 ~ x

str(eval(substitute(.y ~ x, list(.y = as.name("score1")))))
Class 'formula' length 3 score1 ~ x
  ..- attr(*, ".Environment")=<environment: R_GlobalEnv>

pf2(score1, df1)
Re: [R] Help installation lme4a, Error Message: lme4a is not a valid installed library
On Thu, Aug 5, 2010 at 9:31 AM, rod84 ngueye...@hotmail.com wrote:

Dear R users,

I recently downloaded the library lme4a by svn checkout svn://svn.r-forge.r-project.org/svnroot/lme4. I tried to install the library lme4a by copying the downloaded directory to the location where all the R libraries are saved.

You are copying the source package to the library where the compiled packages are kept, which is why you are getting the error. You will need to install the package from the sources. As a Linux user I find the process straightforward. Windows and Mac OS X users are often taken aback by the complexity of the process. What operating system are you using?

When I try to load the library, I obtain the message

library(lme4a)
Error in library(lme4a) : 'lme4a' is not a valid installed package

R version 2.11.1 (2010-05-31)
Re: [R] lapack in R 2.11.1 (Ubuntu 10.04.1)
On Wed, Sep 15, 2010 at 3:48 PM, Matias Salibian-Barrera msalib...@yahoo.ca wrote:

Hi there,

I'm trying to install the package RcppArmadillo in my R 2.11.1, which I installed and regularly update via Ubuntu's repositories. When I try to install RcppArmadillo from CRAN I get:

install.packages('RcppArmadillo', lib='~/myRlibs')
[...]
g++ -shared -o RcppArmadillo.so RcppArmadillo.o fastLm.o -L/home/matias/myRlibs/Rcpp/lib -lRcpp -Wl,-rpath,/home/matias/myRlibs/Rcpp/lib -llapack -lblas -lgfortran -lm -L/usr/lib/R/lib -lR
/usr/bin/ld: cannot find -llapack

I believe this means I don't have lapack available to link to. Does anybody know how I can fix this? I really only need to have RcppArmadillo running. I'm not a power user. I'm running:

version
               _
platform       i486-pc-linux-gnu
arch           i486
os             linux-gnu
system         i486, linux-gnu
status
major          2
minor          11.1
year           2010
month          05
day            31
svn rev        52157
language       R
version.string R version 2.11.1 (2010-05-31)

and

mat...@thecomputer:~$ cat /etc/issue
Ubuntu 10.04.1 LTS \n \l

Thanks a lot in advance.

Did you compile R or did you install the Ubuntu package from the CRAN archives? Installing the package from CRAN will automatically install the Lapack library. See the instructions at http://cran.us.r-project.org/bin/linux/ubuntu/
Re: [R] lmer() vs. lme() gave different variance component estimates
I haven't had the time to keep up with this discussion, or many of the other discussions on the R-SIG-Mixed-Models email list. I am swamped with other duties at present. It is important to remember that the nlme and lme4 packages take a model specification and provide code to evaluate the deviance. Other code is used to optimize the deviance. In the case of nlme it is the nlminb function and in the case of lme4 it is the minqa function. There are no guarantees for numerical optimization routines. If the problem is ill-conditioned then they can converge to optima that are not the global optimum. Whoever suggested using the verbose option to see the progress of the iterations has the right idea. On Mon, Sep 20, 2010 at 2:50 PM, array chip arrayprof...@yahoo.com wrote: Thank you Peter and Ben for your comments. John - Original Message From: Peter Dalgaard pda...@gmail.com To: array chip arrayprof...@yahoo.com Cc: r-help@r-project.org; r-sig-mixed-mod...@r-project.org Sent: Mon, September 20, 2010 12:28:43 PM Subject: Re: [R] lmer() vs. lme() gave different variance component estimates On 09/20/2010 08:09 PM, array chip wrote: Thank you Peter for your explanation of the relationship between aov and lme. It makes perfect sense. When you said you might have computed the average of all 8 measurements on each animal and computed a 1-way ANOVA for treatment effect, would this be the case for a balanced design, or is it also true for unbalanced data? It is only exactly true for a balanced design, although it can be a practical expedient in nearly-balanced cases, especially if there is a clearly dominant animal variation. In strongly unbalanced data, you get reduced efficiency because animals with less data should be downweighted (not proportionally if there is substantial between-animal variation, though). 
And of course the whole thing relies on the fact that you have individuals nested in treatment (no animals had multiple treatments) Another question is if 1-way ANOVA is equivalent to a mixed model for testing treatment effect, what would be the reason why a mixed model is used? Just to estimate the variance components? If the interest is not in the estimation of variance components, then there is no need to run mixed models to test treatment effects? Not too far off the mark. In more complex cases, there is the advantage that the mixed model helps figure out a sensible analysis for you. And my last question is I am glad to find that glht() from the multcomp package works well with a lmer() fit for multiple comparisons. Given Professor Bates's view that denominator degrees of freedom are not well defined in mixed models, are the results from glht() reasonable/meaningful? If not, will the suggested 1-way ANOVA used together with glht() give us correct post-hoc multiple comparison results? I think Doug's view is that DFs are not _reliably estimated_ with any of the current procedures. In the balanced cases, they are very well defined (well, give or take the issues with negative variances), and I would expect glht() to give meaningful results. Do check the residuals for at least approximate normality, though. Thank you very much! John - Original Message From: Peter Dalgaard pda...@gmail.com To: array chip arrayprof...@yahoo.com Cc: r-help@r-project.org; r-sig-mixed-mod...@r-project.org Sent: Sat, September 18, 2010 1:35:45 AM Subject: Re: [R] lmer() vs. lme() gave different variance component estimates For a nested design, the relation is quite straightforward: The residual MS are the variances of sample means scaled to be comparable with the residuals (so that in the absence of random components, all MS are equal to within the F-ratio variability). 
So to get the id:eye variance component, subtract the Within MS from the id:eye MS and divide by the number of replicates (4 in this case since you have 640 observations on 160 eyes) (14.4 - 0.01875)/4 = 3.59, and similarly, the id variance is the MS for id minus that for id:eye scaled by 8: (42.482-14.4)/8 = 3.51. I.e. it is reproducing the lmer results above, but of course not those from your original post. (Notice, by the way, that if you are only interested in the treatment effect, you might as well have computed the average of all 8 measurements on each animal and computed a 1-way ANOVA). -- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
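The method-of-moments arithmetic above can be checked directly at the R console; the mean squares (42.482, 14.4, 0.01875) and the replicate counts (4 and 8) are the ones quoted in this thread:

```r
## Expected-mean-squares estimates for the nested design:
## 640 observations on 160 eyes (4 replicates per eye),
## 8 measurements per animal (id).
ms.id     <- 42.482     # mean square for id
ms.id.eye <- 14.4       # mean square for id:eye
ms.within <- 0.01875    # residual (Within) mean square

var.id.eye <- (ms.id.eye - ms.within) / 4   # id:eye component, about 3.595
var.id     <- (ms.id - ms.id.eye) / 8       # id component, about 3.510
```

These match the variance components quoted from the lmer fit to two decimal places.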
Re: [R] lmer() vs. lme() gave different variance component estimates
On Tue, Sep 21, 2010 at 1:39 PM, Douglas Bates ba...@stat.wisc.edu wrote: I haven't had the time to keep up with this discussion, or many of the other discussions on the R-SIG-Mixed-Models email list. I am swamped with other duties at present. It is important to remember that the nlme and lme4 packages take a model specification and provide code to evaluate the deviance. Other code is used to optimize the deviance. In the case of nlme it is the nlminb function and in the case of lme4 it is the minqa function. Shouldn't send email when I'm tired. The optimizer function is bobyqa in the minqa package. (Sort of makes you nostalgic for the punched cards and line printers when you see the Fortran names jammed into six characters.) There are no guarantees for numerical optimization routines. If the problem is ill-conditioned then they can converge to optima that are not the global optimum. Whoever suggested using the verbose option to see the progress of the iterations has the right idea. On Mon, Sep 20, 2010 at 2:50 PM, array chip arrayprof...@yahoo.com wrote: Thank you Peter and Ben for your comments. John - Original Message From: Peter Dalgaard pda...@gmail.com To: array chip arrayprof...@yahoo.com Cc: r-help@r-project.org; r-sig-mixed-mod...@r-project.org Sent: Mon, September 20, 2010 12:28:43 PM Subject: Re: [R] lmer() vs. lme() gave different variance component estimates On 09/20/2010 08:09 PM, array chip wrote: Thank you Peter for your explanation of the relationship between aov and lme. It makes perfect sense. When you said you might have computed the average of all 8 measurements on each animal and computed a 1-way ANOVA for treatment effect, would this be the case for a balanced design, or is it also true for unbalanced data? It is only exactly true for a balanced design, although it can be a practical expedient in nearly-balanced cases, especially if there is a clearly dominant animal variation. 
In strongly unbalanced data, you get reduced efficiency because animals with less data should be downweighted (not proportionally if there is substantial between-animal variation, though). And of course the whole thing relies on the fact that you have individuals nested in treatment (no animals had multiple treatments) Another question is if 1-way ANOVA is equivalent to a mixed model for testing treatment effect, what would be the reason why a mixed model is used? Just to estimate the variance components? If the interest is not in the estimation of variance components, then there is no need to run mixed models to test treatment effects? Not too far off the mark. In more complex cases, there is the advantage that the mixed model helps figure out a sensible analysis for you. And my last question is I am glad to find that glht() from the multcomp package works well with a lmer() fit for multiple comparisons. Given Professor Bates's view that denominator degrees of freedom are not well defined in mixed models, are the results from glht() reasonable/meaningful? If not, will the suggested 1-way ANOVA used together with glht() give us correct post-hoc multiple comparison results? I think Doug's view is that DFs are not _reliably estimated_ with any of the current procedures. In the balanced cases, they are very well defined (well, give or take the issues with negative variances), and I would expect glht() to give meaningful results. Do check the residuals for at least approximate normality, though. Thank you very much! John - Original Message From: Peter Dalgaard pda...@gmail.com To: array chip arrayprof...@yahoo.com Cc: r-help@r-project.org; r-sig-mixed-mod...@r-project.org Sent: Sat, September 18, 2010 1:35:45 AM Subject: Re: [R] lmer() vs. 
lme() gave different variance component estimates For a nested design, the relation is quite straightforward: The residual MS are the variances of sample means scaled to be comparable with the residuals (so that in the absence of random components, all MS are equal to within the F-ratio variability). So to get the id:eye variance component, subtract the Within MS from the id:eye MS and divide by the number of replicates (4 in this case since you have 640 observations on 160 eyes) (14.4 - 0.01875)/4 = 3.59, and similarly, the id variance is the MS for id minus that for id:eye scaled by 8: (42.482-14.4)/8 = 3.51. I.e. it is reproducing the lmer results above, but of course not those from your original post. (Notice, by the way, that if you are only interested in the treatment effect, you might as well have computed the average of all 8 measurements on each animal and computed a 1-way ANOVA). -- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
Re: [R] lmer() vs. lme() gave different variance component estimates
On Tue, Sep 21, 2010 at 4:21 PM, Peter Dalgaard pda...@gmail.com wrote: On 09/21/2010 09:02 PM, Douglas Bates wrote: On Tue, Sep 21, 2010 at 1:39 PM, Douglas Bates ba...@stat.wisc.edu wrote: I haven't had the time to keep up with this discussion, or many of the other discussions on the R-SIG-Mixed-Models email list. I am swamped with other duties at present. It's not like I don't know how you feel It is important to remember that the nlme and lme4 packages take a model specification and provide code to evaluate the deviance. Other code is used to optimize the deviance. In the case of nlme it is the nlminb function and in the case of lme4 it is the minqa function. Shouldn't send email when I'm tired. The optimizer function is bobyqa in the minqa package. (Sort of makes you nostalgic for the punched cards and line printers when you see the Fortran names jammed into six characters.) There are no guarantees for numerical optimization routines. If the problem is ill-conditioned then they can converge to optima that are not the global optimum. Right. However, what we have here is a case where I'm pretty damn sure that the likelihood function is unimodal (it's a linear reparametrization of three independent chi-square terms) and has an optimum in the interior of the feasible region. In any case, I'm thoroughly confused about the lme4/lme4a/lme4b subtrees. Which algorithm is used by the current CRAN lme4?? (I have lme4b was a placeholder. It should be ignored now. lme4a is the development version that will become the released version once Martin and I are convinced that it does all the things that lme4 does. Before that can happen I have to get my head above water for a week or two to catch up and it could be well into October before that happens (I have a particularly heavy teaching load this semester). lme4a is cleaner than lme4 but leans heavily on the Rcpp package to achieve that. 
So it uses classes and methods (although in different senses) at both the R and the compiled code levels. lme4a allows for profiling the deviance in a LMM with respect to the parameters. I had thought that the released version of lme4 (the one shown below) allowed for selection of the optimizer but, upon further review (a phrase known to fans of what Americans call football) it turns out that it doesn't. So I misspoke. The released version of lme4 uses nlminb, as shown by the form of the verbose output. The lme4a version of lmer uses bobyqa. It would probably be worthwhile running this test case in that version. sessionInfo() R version 2.11.1 (2010-05-31) i386-redhat-linux-gnu locale: [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 [5] LC_MONETARY=C LC_MESSAGES=en_US.utf8 [7] LC_PAPER=en_US.utf8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] lme4_0.999375-35 Matrix_0.999375-39 lattice_0.18-8 loaded via a namespace (and not attached): [1] grid_2.11.1 nlme_3.1-96 stats4_2.11.1 tcltk_2.11.1 tools_2. ) Whoever suggested using the verbose option to see the progress of the iterations has the right idea. On Mon, Sep 20, 2010 at 2:50 PM, array chip arrayprof...@yahoo.com wrote: Thank you Peter and Ben for your comments. John - Original Message From: Peter Dalgaard pda...@gmail.com To: array chip arrayprof...@yahoo.com Cc: r-help@r-project.org; r-sig-mixed-mod...@r-project.org Sent: Mon, September 20, 2010 12:28:43 PM Subject: Re: [R] lmer() vs. lme() gave different variance component estimates On 09/20/2010 08:09 PM, array chip wrote: Thank you Peter for your explanation of relationship between aov and lme. It makes perfect sense. 
When you said you might have computed the average of all 8 measurements on each animal and computed a 1-way ANOVA for treatment effect, would this be the case for a balanced design, or is it also true for unbalanced data? It is only exactly true for a balanced design, although it can be a practical expedient in nearly-balanced cases, especially if there is a clearly dominant animal variation. In strongly unbalanced data, you get reduced efficiency because animals with less data should be downweighted (not proportionally if there is substantial between-animal variation, though). And of course the whole thing relies on the fact that you have individuals nested in treatment (no animals had multiple treatments) Another question is if 1-way ANOVA is equivalent to a mixed model for testing treatment effect, what would be the reason why a mixed model is used? Just to estimate the variance components? If the interest is not in the estimation of variance components, then there is no need to run mixed models to test treatment effects
Re: [R] Odp: Object Browser
On Mon, Sep 27, 2010 at 3:04 AM, Petr PIKAL petr.pi...@precheza.cz wrote: Hi I noticed that nobody answered your question yet so here is my try. If you want to see what objects are in your environment you can use ls() but its output is only the names of objects. Here is a function I have used for a long time for checking what objects are there, their type, size and possibly rows/columns. You can modify it to give you some more info but usually it is not needed. Regards Petr

ls.objects <- function (pos = 1, pattern, order.by) {
    napply <- function(names, fn) sapply(names, function(x) fn(get(x, pos = pos)))
    names <- ls(pos = pos, pattern = pattern)
    obj.class <- napply(names, function(x) as.character(class(x))[1])
    obj.mode <- napply(names, mode)
    obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
    obj.size <- napply(names, object.size)
    obj.dim <- t(napply(names, function(x) as.numeric(dim(x))[1:2]))
    vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
    obj.dim[vec, 1] <- napply(names, length)[vec]
    out <- data.frame(obj.type, obj.size, obj.dim)
    names(out) <- c("Type", "Size", "Rows", "Columns")
    if (!missing(order.by)) out <- out[order(out[[order.by]]), ]
    out
}

An alternative for the command line interface is ls.str(), which combines ls() and a very brief version of str() as applied to each of the objects.
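A quick illustration of the built-in alternative mentioned above; the object names here are arbitrary examples, and the exact printed layout may vary slightly between R versions:

```r
x <- 1:10
m <- matrix(0, 2, 3)
ls.str()
# Each object is listed with a one-line str() summary, e.g.
#   m :  num [1:2, 1:3] 0 0 0 0 0 0
#   x :  int [1:10] 1 2 3 4 5 6 7 8 9 10
```

ls.str() also accepts a mode argument, so ls.str(mode = "function") restricts the listing to functions.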
[R] Data source for American college football rankings?
An interesting, and topical, example of multivariate data for classroom illustrations is the American college football rankings. Starting at the end of October (or week 8, the 8th week of the football season) a set of rankings called the BCS (Bowl Championship Series) will be published. This is a composite ranking based on two subjective polls, the Harris and USA Today polls, and a trimmed mean of 6 objective scores, the so-called computer rankings. The Harris and USA Today polls are currently available along with several others (the best known and most often quoted is the AP poll but, for complicated reasons, it was replaced in the BCS rankings by the Harris poll). I enjoy using these as classroom examples but I haven't found a web site from which I can download the data directly and am reduced to screen scraping the HTML from popular sites like ESPN. Does anyone know of a site from which one can download the current poll results?
Re: [R] Data source for American college football rankings?
On Tue, Sep 29, 2009 at 7:56 AM, Douglas Bates ba...@stat.wisc.edu wrote: An interesting, and topical, example of multivariate data for classroom illustrations is the American college football rankings. Starting at the end of October (or week 8, the 8th week of the football season) a set of rankings called the BCS (Bowl Championship Series) will be published. This is a composite ranking based on two subjective polls, the Harris and USA Today polls, and a trimmed mean of 6 objective scores, the so-called computer rankings. The Harris and USA Today polls are currently available along with several others (the best known and most often quoted is the AP poll but, for complicated reasons, it was replaced in the BCS rankings by the Harris poll). I enjoy using these as classroom examples but I haven't found a web site from which I can download the data directly and am reduced to screen scraping the HTML from popular sites like ESPN. Does anyone know of a site from which one can download the current poll results? To follow up on my own posting, these data are interesting in part because they are presented as tables in many, many places and almost never presented graphically. Some graphical analysis from the rankings on October 21, 2007 is shown in http://www.stat.wisc.edu/~bates/BCS-2007-10-21.pdf It's not often that an example that many students find interesting can be used to illustrate trimmed means, various correlation measures, scatterplot matrices, parallel coordinate plots, principal components and biplots.
Re: [R] Condition to factor (easy to remember)
On Wed, Sep 30, 2009 at 2:42 PM, Douglas Bates ba...@stat.wisc.edu wrote: On Wed, Sep 30, 2009 at 2:43 AM, Dieter Menne dieter.me...@menne-biomed.de wrote: Dear List, creating factors in a given non-default order is notoriously difficult to explain in a course. Students love the ifelse construct given below best, but I remember some comment from Martin Mächler (?) that ifelse should be banned from courses. Any better idea? Not necessarily short, easy to remember is important. Dieter data = c(1,7,10,50,70) levs = c("Pre","Post") # Typical C-programmer style factor(levs[as.integer(data > 10)+1], levels=levs) # Easiest to understand factor(ifelse(data <= 10, levs[1], levs[2]), levels=levs) Why not factor(data > 10, labels = c("Pre", "Post")) [1] Pre Pre Pre Post Post Levels: Pre Post All you have to remember is that FALSE comes before TRUE. And besides, Frank Harrell will soon be weighing in to tell you why you shouldn't dichotomize in the first place.
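For more than two categories the same idea generalizes via cut(), which labels the intervals directly and so avoids the FALSE-before-TRUE detail entirely. A small sketch using the same data; the breakpoints are illustrative:

```r
data <- c(1, 7, 10, 50, 70)
# Intervals are (-Inf, 10] and (10, Inf], so 10 itself falls in "Pre"
cut(data, breaks = c(-Inf, 10, Inf), labels = c("Pre", "Post"))
# [1] Pre  Pre  Pre  Post Post
# Levels: Pre Post
```

Adding more breakpoints (and labels) gives as many ordered categories as needed.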
Re: [R] Rounding error in seq(...)
On Wed, Sep 30, 2009 at 2:32 PM, Peter Dalgaard p.dalga...@biostat.ku.dk wrote: Martin Batholdy wrote: hum, can you explain that a little more detailed? Perhaps I miss the background knowledge - but it seems just absurd to me. 0.1+0.1+0.1 is 0.3 - there is no rounding involved, is there? why is x <- 0.1 + 0.1 + 0.1 not equal to y <- 0.3 Remember that this is in BINARY arithmetic. It's really not any stranger than the fact that 1/3 + 1/3 != 2/3 in finite accuracy decimal arithmetic (0.3 + 0.3 = 0.6 != 0.7). In an earlier thread on this theme I believe that someone quoted Brian Kernighan as saying 10 times 0.1 is hardly ever 1 but I haven't been able to track down the quote. Can anyone point us to such a quote? It summarizes the situation succinctly.
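The behaviour is easy to demonstrate at the console, and comparing with all.equal(), which allows a numerical tolerance, is the usual remedy:

```r
x <- 0.1 + 0.1 + 0.1
y <- 0.3
x == y                    # FALSE: the binary representations differ slightly
x - y                     # a tiny nonzero difference, about 5.6e-17
all.equal(x, y)           # TRUE: comparison within a numerical tolerance
isTRUE(all.equal(x, y))   # the safe idiom for use inside if()
```

The isTRUE() wrapper matters because all.equal() returns a character description, not FALSE, when the values differ.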
Re: [R] Plotting 1 covariate, 3 factors
I'm not sure if this is exactly what you are looking for but I would generally create an interaction plot using the lattice 'dotplot' with type = c("p", "a") so I get both the original data and the lines joining the averages for the different factor levels. I also prefer the horizontal orientation to the vertical orientation. Combining all these variations produces something like dotplot(f2 ~ y | f1, groups = f3, aspect = 0.2, layout = c(1,2), type = c("p", "a"), pch = 21, strip = FALSE, strip.left = TRUE, auto.key = list(columns = 2, lines = TRUE)) On Wed, Oct 7, 2009 at 11:00 AM, Paul Chatfield p.s.chatfi...@rdg.ac.uk wrote: I'm interested in plotting a y with an x factor as the combination of 2 factors and colour with respect to a third, which the code below does with interaction.plot(). However, this is because I redefine the x to be 1 factor. Is there a way of getting it to plot without redefining it, and ideally to not join up the lines BETWEEN levels a and b, but just join those between after and before for one level of f3. I figure this could be done by manually drawing over blank lines using ?lines but am not sure what the coordinates would be and figured there is probably an easier way where someone has dealt with this before. Any thoughts greatly appreciated, Paul # y <- rnorm(36) f1 <- rep(c("after","before"), 18) f2 <- rep(1:3,12) f3 <- rep(1:2, each=18) ## Define new factor to be f1 and f3 for x axis - clumsy code, but gets it done; ff <- numeric(length(y)) for (i in 1:length(y)) {if (f1[i]=="a" & f3[i]==1) ff[i] <- "1, a" else if (f1[i]=="a" & f3[i]==2) ff[i] <- "2, a" else if (f1[i]=="b" & f3[i]==1) ff[i] <- "1, b" else ff[i] <- "2, b"} ## Plot of interest; interaction.plot(ff,f2,y) -- View this message in context: http://www.nabble.com/Plotting-1-covariate%2C-3-factors-tp25789442p25789442.html Sent from the R help mailing list archive at Nabble.com. 
Re: [R] custom selfStart model works with getInitial but not nls
On Fri, Oct 16, 2009 at 7:09 PM, Michael A. Gilchrist mi...@utk.edu wrote: Hello, I'm having problems creating and using a selfStart model with nlme. Briefly, I've defined the model, a selfStart object, and then combined them to make a selfStart.default model. If I apply getInitial to the selfStart model, I get results. However, if I try using it with nls or nlsList, these routines complain about a lack of initial conditions. If someone could point out what I'm doing wrong, I'd greatly appreciate it. Thanks for providing the code. Could you also provide some data as a test case? Details: ## Nonlinear model I want to fit to the data const.PBMC.tcell.model <- function(B0, t, aL, aN, T0){ Tb0 = B0; x = exp(-log(aL) + log(T0*aL+(-1+exp(t * aL))*Tb0 * aN) - t * aL); return(x); } ##Define selfStart routine const.PBMC.tcell.selfStart <- function(mCall, LHS, data){ t0 = 0; t1 = 24; t2 = 48; ##Get B0 Value B0 = data[1, "B0"]; T0 = mean(data[data$Time==t0, "Count"]); T1 = mean(data[data$Time==t1, "Count"]); T2 = mean(data[data$Time==t2, "Count"]); if(T0 < T2){ ##increase -- doesn't work stop(paste("Error in const.PBMC.tcell.start: T0 < T2 for data: ", data[1, ])); } ##Estimate aL based on exponential decline from t=0 to t=24 aLVal = -(log(T1) - log(T0))/(t1-t0); ##Estimate aNVal based on final value aNVal = aLVal*T2/B0; values = list(aLVal, aNVal, T0); names(values) <- mCall[c("aL", "aN", "T0")]; #mimic syntax used by PB return(values) } ##Now create new model with selfStart attributes const.PBMC.tcell.modelSS <- selfStart(model = const.PBMC.tcell.model, initial=const.PBMC.tcell.selfStart) ##Test routines using getInitial -- This works getInitial(Count ~ const.PBMC.tcell.modelSS(B0, Time, aL, aN, T0), data = tissueData) [1] 0.05720924 $aL [1] 0.05720924 $aN [1] 0.1981895 $T0 [1] 1360.292 ##Now try to use the SS model -- this doesn't work nls(Count ~ const.PBMC.tcell.modelSS(B0, Time, aL, aN, T0), data = tissueData) Error in numericDeriv(form[[3L]], names(ind), env) : Missing value or an infinity 
produced when evaluating the model In addition: Warning message: In nls(Count ~ const.PBMC.tcell.modelSS(B0, Time, aL, aN, T0), data = tissueData) : No starting values specified for some parameters. Initializing 'aL', 'aN', 'T0' to '1.'. Consider specifying 'start' or using a selfStart model
Re: [R] Cairo package, png files within for loop are black?
On Thu, Oct 22, 2009 at 12:46 PM, Douglas M. Hultstrand dmhul...@metstat.com wrote: Hello, I am generating .png images using the Cairo package in a for loop (looping on the number of zones; the number of zones equals the number of plots to create based on different zone data). When I run the R script the .png files are created but they are all black. If I comment out the for loop and force my zones to equal one, the png file is created correctly. Is there an issue with generating .png files within a for loop? Probably not. The behavior you observe is due to a lattice graphics function being called in a loop. When called within a loop you must print or show the result of a lattice graphics function. See FAQ 7.22 Create .png commands within for loop: CairoPNG(paste(t.g), width=800, height=600, pointsize=12, bg="white") xyplot(areasqmi ~ value, x.p, groups=dur, main=(t.n), ylab=expression(Area(mi ^ 2 * )), xlab="Maximum Average Depth of Precipitation (inches)", scales=list(y=list(log=TRUE)), ylim=c(1,100), type='l', auto.key=list(space='right')) dev.off() -- - Douglas M. Hultstrand, MS Senior Hydrometeorologist Metstat, Inc. Windsor, Colorado voice: 970.686.1253 email: dmhul...@metstat.com web: http://www.metstat.com
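The fix described above (FAQ 7.22) is to wrap the lattice call in print() so the trellis object is actually drawn before dev.off() closes the device. A minimal sketch; the file names and simulated data are hypothetical, and the base png() device stands in for CairoPNG():

```r
library(lattice)

for (zone in 1:3) {
  png(sprintf("zone%02d.png", zone), width = 800, height = 600)
  p <- xyplot(y ~ x, data = data.frame(x = 1:10, y = rnorm(10)),
              main = paste("Zone", zone))
  # lattice functions return objects; auto-printing is suppressed
  # inside loops and functions, so print explicitly
  print(p)
  dev.off()
}
```

Without the print() the device is opened and closed with nothing drawn, which is why the files come out blank (or black, depending on the background).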
Re: [R] What happen for Negative binomial link in Lmer fonction?
On Thu, Oct 22, 2009 at 1:39 PM, Ben Bolker bol...@ufl.edu wrote: ROBARDET Emmanuelle wrote: Dear R users, I'm performing some GLMM analyses with a negative binomial family. I already performed such analyses some months ago with the lmer() function but when I tried it today I encountered this problem: Erreur dans famType(glmFit$family) : unknown GLM family: 'Negative Binomial' Does anyone know if the negative binomial family has been removed from this function? I really appreciate any response. Emmanuelle I would be extremely surprised if this worked in the past; to the best of my knowledge the negative binomial family has never been implemented in lmer. I too would be extremely surprised if it had worked in the past, considering that I have never implemented it. I did exchange email with Bill Venables about it and we formulated what seems to be a reasonable approach but it hasn't made it to the top of the To Do list yet. Right now the big push is on code to profile the log-likelihood with respect to the parameters so we can actually get confidence intervals and, the holy grail of mixed-modeling, p-values. One could in principle do a glmmPQL fit with the negative binomial family (with a fixed value of the overdispersion parameter). glmmADMB is another option. Can you say which version etc. you were using??? Follow-ups should probably be sent to r-sig-mixed-mod...@r-project.org -- View this message in context: http://www.nabble.com/What-happen-for-Negative-binomial-link-in-Lmer-fonction--tp26013041p26015140.html Sent from the R help mailing list archive at Nabble.com. 
Re: [R] Automatization of non-linear regression
Alternatively you could install the NRAIA package and collapse the construction of the plot to a call to plotfit as shown in the enclosed. Note that this is a poor fit of a logistic growth model. There are no data values beyond the inflection point so the Asym parameter (the asymptote) is poorly determined and the asymptote and the inflection point on the x axis (the xmid parameter) will be highly correlated. Furthermore the increasing variability is due to differences between plants. You can check with xyplot(severity ~ time, data1, groups = plant, type = c("g", "b")) so a nonlinear mixed-effects model may be more appropriate. See Pinheiro and Bates (Springer, 2000). 2009/10/22 Peter Ehlers ehl...@ucalgary.ca: Joel, The following should answer most of your questions: time <- c(seq(0,10),seq(0,10),seq(0,10)) plant <- c(rep(1,11),rep(2,11),rep(3,11)) severity <- c( 42,51,59,64,76,93,106,125,149,171,199, 40,49,58,72,84,103,122,138,162,187,209, 41,49,57,71,89,112,146,174,218,250,288)/288 # Suggestion: you don't need cbind() to make a dataframe: data1 <- data.frame(time, plant, severity) # Plot severity versus time # you can avoid the awkward ..$.. 
by using with(): with(data1, plot(time, severity, type="n")) with(data1, text(time, severity, plant)) title(main="Graph of severity vs time") # Now you use either the 'getInitial' approach and # transform your parameter estimates to your # preferred ones, or you can use SSlogis for your # model and transform the parameter estimates # afterwards: # Method 1: # ini <- getInitial( severity ~ SSlogis(time, alpha, xmid, scale), data = data1 ) ini <- unname(ini) ##to get rid of names # start vector in terms of alpha, beta, gamma: para0.st <- c( alpha = ini[1], beta = ini[2]/ini[3], gamma = 1/ini[3] ) fit0 <- nls( severity~alpha/(1+exp(beta-gamma*time)), data1, start=para0.st, trace=T ) # logistic curve on the scatter plot co <- coef(fit0) ##get the parameter estimates curve(co[1]/(1+exp(co[2]-co[3]*x)), add=TRUE) # you don't need from, to, since the plot is already # set up and you don't want to restrict the curve; # personally, I prefer defining the function to be plotted: f <- function(x, abc){ alpha <- abc[1] beta <- abc[2] gamma <- abc[3] alpha / (1 + exp(beta - gamma * x)) } # then you can plot the curve with: curve(f(x, coef(fit0)), add = TRUE) # Method 2: # fit2 <- nls(severity ~ SSlogis(time, Asym, xmid, scal), data = data1) co <- unname(coef(fit2)) abc <- c(co[1], co[2]/co[3], 1/co[3]) with(data1, plot(time, severity, type="n")) with(data1, text(time, severity, plant)) title(main="Graph of severity vs time") curve(f(x, abc), add = TRUE) Cheers, -Peter Ehlers Joel Fürstenberg-Hägg wrote: Hi everybody, I'm using the method described here to make a nonlinear regression: http://www.apsnet.org/education/advancedplantpath/topics/Rmodules/Doc1/05_Nonlinear_regression.html ## Input the data that include the variables time, plant ID, and severity time <- c(seq(0,10),seq(0,10),seq(0,10)) plant <- c(rep(1,11),rep(2,11),rep(3,11)) ## Severity represents the number of ## lesions on the leaf surface, standardized ## as a proportion of the maximum severity <- c( + 42,51,59,64,76,93,106,125,149,171,199, 
+ 40,49,58,72,84,103,122,138,162,187,209, + 41,49,57,71,89,112,146,174,218,250,288)/288 data1 - data.frame( + cbind( + time, + plant, + severity + ) + ) ## Plot severity versus time ## to see the relationship between## the two variables for each plant plot( + data1$time, + data1$severity, + xlab=Time, + ylab=Severity, + type=n + ) text( + data1$time, + data1$severity, + data1$plant + ) title(main=Graph of severity vs time) getInitial( + severity ~ SSlogis(time, alpha, xmid, scale), + data = data1 + ) alpha xmid scale 2.212468 12.506960 4.572391 ## Using the initial parameters above, ## fit the data with a logistic curve. para0.st - c( + alpha=2.212, + beta=12.507/4.572, # beta in our model is xmid/scale + gamma=1/4.572 # gamma (or r) is 1/scale + ) fit0 - nls( + severity~alpha/(1+exp(beta-gamma*time)), + data1, + start=para0.st, + trace=T + ) 0.1621433 : 2.212 2.7355643 0.2187227 0.1621427 : 2.2124095 2.7352979 0.2187056 ## Plot to see how the model fits the data; plot the ## logistic curve on a scatter plot plot( + data1$time, + data1$severity, + type=n + ) text( + data1$time, + data1$severity, + data1$plant + ) title(main=Graph of severity vs time) curve( +
Re: [R] glm.fit to use LAPACK instead of LINPACK
On Thu, Oct 22, 2009 at 10:26 AM, Ravi Varadhan rvarad...@jhmi.edu wrote: Ted, LAPACK is newer and is supposed to contain better algorithms than LINPACK. It is not true that LAPACK cannot handle rank-deficient problems. It can. It's not just a question of handling rank deficiency. It's the particular form of pivoting that is used, so that columns associated with the same term stay adjacent. The code that is actually used in glm.fit and lm.fit, called through the Fortran subroutine dqrls, is a modified version of the Linpack dqrdc subroutine. However, I do not know if using LAPACK in glm.fit instead of LINPACK would be faster and/or more memory efficient. The big thing that could be gained is the use of level-3 BLAS. The current code uses only level-1 BLAS, and accelerated BLAS implementations benefit far more from level-3 calls than from level-1 calls. Even so, I doubt that the QR decomposition is the big time sink in glm.fit. Why not profile a representative fit and check? I did profile the glm.fit code a couple of years ago and discovered that a lot of time was being spent evaluating the various family functions, like the inverse link and the variance function, and that was because of calls to pmin and pmax. Before trying to change very tricky Fortran code you owe it to yourself to check what the potential gains would be. - Original Message - From: Ted tchi...@sickkids.ca Date: Thursday, October 22, 2009 10:53 am Subject: Re: [R] glm.fit to use LAPACK instead of LINPACK To: r-help@r-project.org Hi, I understand that glm.fit calls LINPACK Fortran routines instead of LAPACK because it can handle the 'rank deficiency problem'. If my data matrix is not rank deficient, would a glm.fit function which runs on LAPACK be faster? Would it be worthwhile to convert glm.fit to use LAPACK? Has anyone done this already? What is the best way to do this?
I'm looking at very large datasets (thousands of glm calls), and would like to know if it's worth the effort for performance issues. Thanks, Ted - Ted Chiang, Bioinformatics Analyst, Centre for Computational Biology, Hospital for Sick Children, Toronto, 416.813.7028, tchi...@sickkids.ca __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
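Bates's advice to profile a representative fit before touching the Fortran can be tried directly with base R's sampling profiler. The sketch below uses simulated data (the data frame and model are made up for illustration, not Ted's actual problem):

```r
## Profile a representative glm() fit to see where the time goes.
set.seed(1)
n <- 1e5
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 * dat$x1 - 0.25 * dat$x2))

tmp <- tempfile()
Rprof(tmp)                      # start the sampling profiler
for (i in 1:5)                  # repeat the fit so enough samples accrue
  fit <- glm(y ~ x1 + x2, family = binomial, data = dat)
Rprof(NULL)                     # stop profiling

## Time spent by function: compare the least-squares step against the
## family functions (linkinv, variance, pmin/pmax, ...)
head(summaryRprof(tmp)$by.total)
```

If the QR step does not dominate the `by.total` table, swapping LINPACK for LAPACK would buy little.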
Re: [R] inspect s4 methods
On Thu, Oct 22, 2009 at 8:11 PM, Ista Zahn iz...@psych.rochester.edu wrote: Hi everyone, I'm sure this has been asked before, but I could not find the right search terms to find it. If I want to see what summarizing a model fit with lm() does, I can write summary.lm. But what if I want to see what summarizing a model fit with lmer() (lme4 package) does? showMethods(summary) Function: summary (package base) object=ANY object=character (inherited from: object=ANY) object=mer object=sparseMatrix object=summary.mer tells me there is a mer method, but summary.mer gives me no joy. How can I see what summary.mer does? Do I have to inspect the source code? Actually, if you had followed up with the documentation for showMethods you would have found out. showMethods(summary, class = "mer", include = TRUE) More details are given in Uwe's article in issue 4 of the 2006 R News: http://cran.r-project.org/doc/Rnews/Rnews_2006-4.pdf
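The same mechanics can be seen without lme4 installed by defining a toy S4 class (the "acct" class below is invented for illustration; in current R the arguments are spelled classes = and includeDefs =):

```r
library(methods)

## A toy S4 class with its own summary() method, standing in for "mer"
setClass("acct", representation(owner = "character", balance = "numeric"))
setMethod("summary", "acct", function(object, ...) {
  cat("Account of", object@owner, "- balance:", object@balance, "\n")
  invisible(object@balance)
})

## List the methods and print the definition of the one we want
showMethods("summary", classes = "acct", includeDefs = TRUE)

## Or extract the method definition directly as a function object
body(getMethod("summary", "acct"))
```

getMethod() returns the method as an object, so you can inspect or even debug() it, which is often more convenient than reading printed output.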
Re: [R] package lme4
On Sun, Nov 1, 2009 at 9:01 AM, wenjun zheng wjzhen...@gmail.com wrote: Hi R Users, When I use package lme4 for mixed model analysis, I can't distinguish the significant and insignificant variables from all random independent variables. Here is my data and result: Data: Rice <- data.frame(Yield=c(8,7,4,9,7,6,9,8,8,8,7,5,9,9,5,7,7,8,8,8,4,8,6,4,8,8,9), Variety=rep(rep(c("A1","A2","A3"),each=3),3), Stand=rep(c("B1","B2","B3"),9), Block=rep(1:3,each=9)) Rice.lmer <- lmer(Yield ~ (1|Variety) + (1|Stand) + (1|Block) + (1|Variety:Stand), data = Rice) Result: Linear mixed model fit by REML Formula: Yield ~ (1 | Variety) + (1 | Stand) + (1 | Block) + (1 | Variety:Stand) Data: Rice AIC BIC logLik deviance REMLdev 96.25 104.0 -42.12 85.33 84.25 Random effects: Groups Name Variance Std.Dev. Variety:Stand (Intercept) 1.345679 1.16003 Block (Intercept) 0.000000 0.00000 Stand (Intercept) 0.888889 0.94281 Variety (Intercept) 0.024691 0.15714 Residual 0.666667 0.81650 Number of obs: 27, groups: Variety:Stand, 9; Block, 3; Stand, 3; Variety, 3 Fixed effects: Estimate Std. Error t value (Intercept) 7.1852 0.6919 10.38 Can you give me some advice for recognizing the significant variables among the random effects above without doing other calculations? Well, since the estimate of the variance due to Block is zero, that's probably not one of the significant random effects. Why do you want to do this without other calculations? In olden days, when each model fit involved substantial calculations by hand, one did try to avoid fitting multiple models, but now that is not a problem. You can get a hint of which random effects will be significant by looking at their precision in a caterpillar plot and then fit the reduced model and use anova to compare models. See the enclosed transcript. Any suggestions will be appreciated.
Wenjun

R version 2.10.0 Patched (2009-11-01 r50288) Copyright (C) 2009 The R Foundation for Statistical Computing ISBN 3-900051-07-0 R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.

library(lme4)
Loading required package: Matrix
Loading required package: lattice
Rice <- data.frame(Yield = c(8,7,4,9,7,6,9,8,8,8,7,5,9,9,5,7,
+                            7,8,8,8,4,8,6,4,8,8,9),
+                  Variety = gl(3, 3, 27, labels = c("A1","A2","A3")),
+                  Stand = gl(3, 1, 27, labels = c("B1","B2","B3")),
+                  Block = gl(3, 9))
print(fm1 <- lmer(Yield ~ 1 + (1|Block) + (1|Stand) +
+                 (1|Variety) + (1|Variety:Stand), Rice))
Linear mixed model fit by REML
Formula: Yield ~ 1 + (1 | Block) + (1 | Stand) + (1 | Variety) + (1 | Variety:Stand)
   Data: Rice
   AIC   BIC logLik deviance REMLdev
 96.25 104.0 -42.12    85.33   84.25
Random effects:
 Groups        Name        Variance Std.Dev.
 Variety:Stand (Intercept) 1.34568  1.16003
 Variety       (Intercept) 0.02469  0.15713
 Stand         (Intercept) 0.88889  0.94281
 Block         (Intercept) 0.00000  0.00000
 Residual                  0.66667  0.81650
Number of obs: 27, groups: Variety:Stand, 9; Variety, 3; Stand, 3; Block, 3
Fixed effects:
            Estimate Std. Error t value
(Intercept)   7.1852     0.6919   10.38

## Estimate of Block variance is zero, remove that term
print(fm2 <- lmer(Yield ~ 1 + (1|Stand) + (1|Variety) + (1|Variety:Stand),
+                 Rice))
Linear mixed model fit by REML
Formula: Yield ~ 1 + (1 | Stand) + (1 | Variety) + (1 | Variety:Stand)
   Data: Rice
   AIC   BIC logLik deviance REMLdev
 94.25 100.7 -42.12    85.33   84.25
Random effects:
 Groups        Name        Variance Std.Dev.
 Variety:Stand (Intercept) 1.345679 1.16003
 Variety       (Intercept) 0.024692 0.15714
 Stand         (Intercept) 0.888889 0.94281
 Residual                  0.666667 0.81650
Number of obs: 27, groups: Variety:Stand, 9; Variety, 3; Stand, 3
Fixed effects:
            Estimate Std. Error t value
(Intercept)   7.1852     0.6919   10.38

## Very small estimate of variance for Variety
## Check the precision of the random effects
pl2 <-
Re: [R] package lme4
On Tue, Nov 3, 2009 at 9:11 AM, wenjun zheng wjzhen...@gmail.com wrote: Maybe I can calculate a p value by t testing approximately: 1-qnorm(Variance/Std.Dev.) That would be a z test, not a t test, wouldn't it? And it would only be meaningful if the distribution of the estimator is approximately normal, which we know it definitely is not. Estimates of variances have distributions like a chi-square, not like a normal. In particular, the estimators are not symmetrically distributed about the parameter value. But which function can help me to extract the Variance and Std.Dev. values from the results below: The VarCorr extractor produces the estimates of the variances. You would have to write your own functions to determine an approximate standard error of those estimates and I would not advise doing so. Firstly, the code would be rather complicated and secondly the answer would be of very limited usefulness, for the reasons discussed above. print(fm2 <- lmer(Yield ~ 1 + (1|Stand) + (1|Variety) + (1|Variety:Stand), Rice)) Linear mixed model fit by REML Formula: Yield ~ 1 + (1 | Stand) + (1 | Variety) + (1 | Variety:Stand) Data: Rice AIC BIC logLik deviance REMLdev 94.25 100.7 -42.12 85.33 84.25 Random effects: Groups Name Variance Std.Dev. Variety:Stand (Intercept) 1.345679 1.16003 Variety (Intercept) 0.024692 0.15714 Stand (Intercept) 0.888889 0.94281 Residual 0.666667 0.81650 Number of obs: 27, groups: Variety:Stand, 9; Variety, 3; Stand, 3 Fixed effects: Estimate Std. Error t value (Intercept) 7.1852 0.6919 10.38
Re: [R] package lme4
On Tue, Nov 3, 2009 at 8:08 AM, wenjun zheng wjzhen...@gmail.com wrote: Thanks, Douglas. It really helps me a lot, but is there any other way to show whether a random effect is significant in a text file, like a p value or another index? Thanks very much again. Wenjun. Well, there are p-values from the likelihood ratio tests in that transcript I sent. The point of those tests is that a p-value can only be calculated when you know both the null hypothesis and the alternative, which is why those p-values are the result of comparing two nested model fits.
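The nested-model comparison Bates describes can be sketched with nlme (which ships with R) in case lme4 is not installed; the logic is identical with two lmer fits passed to anova(). The Orthodont data and the choice of random terms here are illustrative, not the Rice example:

```r
## Likelihood-ratio test of a random effect by comparing nested fits.
library(nlme)

## Fit by ML so the likelihoods of the two models are comparable
big   <- lme(distance ~ age, data = Orthodont,
             random = ~ age | Subject, method = "ML")
small <- lme(distance ~ age, data = Orthodont,
             random = ~ 1 | Subject, method = "ML")

## anova() gives the likelihood-ratio statistic and its p-value.
## Note: testing a variance on the boundary (zero) makes the
## chi-square reference distribution conservative.
anova(small, big)
```

With lme4 the calls would be, say, anova(fm2, fm1) on the two fits from the transcript above; anova() refits by ML automatically in recent lme4 versions.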
Re: [R] LINEAR MIXED EFFECT
On Thu, Nov 12, 2009 at 10:14 AM, milton ruser milton.ru...@gmail.com wrote: Hi Ana, I am not quite sure if it is the problem, but if you call your data.frame exp, you will mask the exp() function... try using another name for your data.frame. By the way, I suggest you not use attach(). Try something like model <- lme(weight ~ date*diet, data = yourdataframe) good luck and include the random specification in the call to lme, not as a separate assignment. milton On Thu, Nov 12, 2009 at 5:43 AM, ANARPCG a.gouvei...@imperial.ac.uk wrote: Milton's point is dead-on, and I would highly encourage you to give the posting guide a look. That said... you might try na.action = na.omit in your call to... actually, we don't know what function you are using (see first point). Regardless, it sounds like you have missing data and na.action is set to na.fail (i.e., fail if there are any missing data). cheers, Dave milton ruser wrote: Dear Ana Golveia, It is completely impossible for someone to realise what kind of help you need or what is happening. I suggest you have a look at the posting guide, mainly the part about a minimal reproducible example with self-explaining information, etc. Cheers milton On Wed, Nov 11, 2009 at 7:22 AM, ANARPCG a.gouvei...@imperial.ac.uk wrote: CAN ANYONE PLEASE HELP ME WITH THIS i HAVE TO DO A MIXED EFFECT LINEAR MODEL WITH MY DATA DUE TO THE FACT THAT I have pseudoreplication! Although after reading and trying it several times I cannot get around: Error in na.fail.default(list(date = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, : missing values in object I uploaded my data file Thank you so much Kind regards AG http://old.nabble.com/file/p26300394/rawoctobercalciumexperiment2.txt rawoctobercalciumexperiment2.txt -- View this message in context: http://old.nabble.com/LINEAR-MIXED-EFFECT-tp26300394p26300394.html Sent from the R help mailing list archive at Nabble.com.
sorry, I am new to these things. The point is that after trying to run the model, all that happens is:

exp <- read.table(file=file.choose(), header=T)
attach(exp)
names(exp)
[1] group date diet weight thickness length
[7] width liplength lipwidth
exp$diet = factor(exp$diet, levels=c("zeropercent","tenpercent","twentypercent","thirtypercent","fortypercent","cuttleprecent"))
exp = na.omit(exp)
library(nlme)
random = ~date
model <- lme(weight ~ date*diet)
Error in na.fail.default(list(date = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, : missing values in object

I have pseudoreplication due to the fact that the measurements of the replicates have different starting points, so I was advised to do lme. I never used it before and I can't get around with it! The help I wanted from you is to help me understand how to build a runnable model! http://old.nabble.com/file/p26315302/rawoctobercalciumexperiment2.txt rawoctobercalciumexperiment2.txt -- View this message in context: http://old.nabble.com/LINEAR-MIXED-EFFECT-tp26300394p26315302.html Sent from the R help mailing list archive at Nabble.com.
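Putting the advice in this thread together (pass data =, give the random term inside the call, and use na.action = na.omit), a minimal runnable lme fit looks like the sketch below. The data frame is simulated, standing in for the poster's weight/diet measurements:

```r
library(nlme)

## Simulated longitudinal data with a couple of missing responses
set.seed(42)
d <- expand.grid(subject = factor(1:8), date = 1:5)
d$diet <- factor(ifelse(as.integer(d$subject) %% 2 == 0, "high", "low"))
subj_eff <- rnorm(8, sd = 0.5)                       # subject-level shifts
d$weight <- 10 + 0.5 * d$date + (d$diet == "high") +
            subj_eff[as.integer(d$subject)] + rnorm(nrow(d), sd = 0.3)
d$weight[c(3, 17)] <- NA                             # two missing values

## na.action = na.omit avoids the "missing values in object" error
## raised by the default na.fail; the random term goes in the call.
m <- lme(weight ~ date * diet, data = d,
         random = ~ 1 | subject, na.action = na.omit)
summary(m)$tTable
```

No attach() is needed, and nothing named exp is created, so the base function is left alone.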
Re: [R] lme model specification
On Sun, Nov 15, 2009 at 7:19 AM, Green, Gerwyn (greeng6) g.gre...@lancaster.ac.uk wrote: Dear all, this is a question of model specification in lme for which I'd greatly appreciate some guidance. Suppose I have data in long format

gene treatment rep    Y
   1         1   1 4.32
   1         1   2 4.67
   1         1   3 5.09
   .         .   .    .
   1         4   1 3.67
   1         4   2 4.64
   1         4   3 4.87
   .         .   .    .
2000         1   1 5.12
2000         1   2 2.87
2000         1   3 7.23
   .         .   .    .
2000         4   1 2.48
2000         4   2 3.93
2000         4   3 5.17

that is, I have data Y_{gtr} for g (gene) = 1,...,2000, t (treatment) = 1,...,4 and r (replicate) = 1,...,3. I would like to fit the following linear mixed model using lme Y_{gtr} = \mu_{g} + W_{gt} + Z_{gtr} where the \mu_{g}'s are fixed gene effects, the W_{gt} ~ N(0, \sigma^{2}) are gene-treatment interactions, and the residual errors are Z_{gtr} ~ N(0, \tau^{2}). (Yes, I know I'm specifying an interaction between gene and treatment without specifying a treatment main effect! There is good reason for this.) You are going to end up estimating 2000 fixed-effects parameters for gene, which will take up a lot of memory (one copy of the model matrix for the fixed effects will be 24000 by 2000 double-precision numbers, or about 400 MB). You might be able to fit that in lme as lme(Y ~ -1 + factor(gene), data = data, random = ~ 1|gene:treatment) but it will probably take a long time or run out of memory. There is an alternative, which is to use the development branch of the lme4 package that allows for a sparse model matrix for the fixed-effects parameters. Or ask yourself if you really need to model the genes as fixed effects instead of random effects. We have seen situations where users do not want the shrinkage involved with random effects but it is rare. If you want to follow up on the development branch (for which binary packages are not currently available, i.e. you need to compile it yourself) then we can correspond off-list.
I know that specifying model.1 <- lme(Y ~ -1 + factor(gene), data=data, random= ~1|gene/treatment) fits Y_{gtr} = \mu_{g} + U_{g} + W_{gt} + Z_{gtr} with \mu_{g}, W_{gt} and Z_{gtr} as previously, and U_{g} ~ N(0, \gamma^{2}), but I do NOT want to specify a random gene effect. I have scoured Pinheiro and Bates without coming across a parallel example. Any help would be greatly appreciated. Best, Gerwyn Green, School of Health and Medicine, Lancaster University
Re: [R] function which saves an image of a dgtMatrix as png
image applied to a sparseMatrix object uses lattice functions to create the image. As described in R FAQ 7.22, you must use print(image(x)) or show(image(x)) or even plot(image(x)) when a lattice function is called from within another function. On Wed, Apr 28, 2010 at 1:20 PM, Gildas Mazo gildas.m...@curie.fr wrote: Hi, I'm getting crazy: This does work:

library(Matrix)
a1 <- b1 <- c(1,2)
c1 <- rnorm(2)
aDgt <- spMatrix(ncol=3, nrow=3, i=a1, j=b1, x=c1)
png("myImage.png")
image(aDgt)
dev.off()

But this doesn't!!!

f <- function(x){
  png("myImage.png")
  image(x)
  dev.off()
}
f(aDgt)

My image is saved as a text file and contains nothing at all!!! Thanks in advance, Gildas Mazo
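The fix Bates describes, applied to the poster's function, looks like this (Matrix and lattice ship with R; the file-name argument is added here for convenience):

```r
library(Matrix)

a1 <- b1 <- c(1, 2)
c1 <- rnorm(2)
aDgt <- spMatrix(ncol = 3, nrow = 3, i = a1, j = b1, x = c1)

f <- function(x, file = "myImage.png") {
  png(file)
  print(image(x))   # image() on a sparseMatrix returns a lattice object;
                    # inside a function it must be print()ed explicitly,
                    # since auto-printing only happens at top level
  dev.off()
}
f(aDgt)
```

The same rule applies to any lattice or ggplot2 call made inside a function, a loop, or a sourced script.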
Re: [R] error in La.svd Lapack routine 'dgesdd'
Google the name dgesdd to get the documentation, where you will find that the error code indicates that the SVD algorithm failed to converge. Evaluation of the singular values and vectors is done via an iterative optimization, which on some occasions will fail to converge. Frequently this is related to the scaling of the matrix: if some rows or columns are of very large magnitude relative to others, convergence of the optimization can be impeded. Providing a reproducible example of such an error condition will help in diagnosing what is happening. If you wonder why the error message is so enigmatic, it is because the underlying code is Fortran and does not provide much flexibility for informative error trapping. On Tue, May 4, 2010 at 1:24 AM, steven mosher mosherste...@gmail.com wrote: Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd' what resources are there to track down errors like this?
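The scaling point can be illustrated concretely: equilibrate the rows before the decomposition and fold the scale factors back in afterwards. The matrix below is made up; it will not reproduce the convergence failure, only the technique:

```r
## A deliberately ill-scaled matrix: row i is scaled by 10^(4*(i-1)),
## so rows span twelve orders of magnitude
set.seed(1)
X <- matrix(rnorm(20), 4, 5) * 10^(4 * (0:3))

## Equilibrate the rows before the decomposition ...
r <- apply(abs(X), 1, max)
Xs <- X / r                  # each row now has maximum |entry| 1
s <- svd(Xs)

## ... and fold the scaling back in afterwards:
Xrec <- diag(r) %*% (s$u %*% diag(s$d) %*% t(s$v))
isTRUE(all.equal(Xrec, X))   # the factorization still represents X
```

Note that the singular values of Xs are not those of X; the rescaling is a workaround for convergence, not an equivalent decomposition, so use it only when the downstream computation can absorb the row scales.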
Re: [R] need help in understanding R code, and maybe some math
On Sun, May 23, 2010 at 5:09 AM, Peter Ehlers ehl...@ucalgary.ca wrote: On 2010-05-23 0:56, john smith wrote: Hi, I am trying to implement Higham's algorithm for correcting a non-positive-definite covariance matrix. I found this code in R: http://projects.cs.kent.ac.uk/projects/cxxr/trac/browser/trunk/src/library/Recommended/Matrix/R/nearPD.R?rev=637 I managed to understand most of it; the only line I really don't understand is this one: X <- tcrossprod(Q * rep(d[p], each=nrow(Q)), Q) This line is supposed to calculate the matrix product Q*D*Q^T, where Q is an n by m matrix and D is a diagonal m by m matrix. What does this mean? I also don't understand the meaning of a cross product between matrices; I only know it between vectors. In the original S language, on which R is based, the function named crossprod was used for what statisticians view as the cross-product of the columns of a matrix, such as a multivariate data matrix or a model matrix. That is crossprod(X) := X'X This is a special case of the cross-product of the columns of two matrices with the same number of rows crossprod(X, Y) := X'Y The tcrossprod function was introduced more recently to mean the crossprod of the transpose of X. That is tcrossprod(X) := crossprod(t(X)) := X %*% t(X) These definitions are unrelated to the cross product of vectors used in physics and related disciplines. The reason for creating such functions is that these are common operations in statistical computing and it helps to know the special structure (e.g. the result of crossprod(X) or tcrossprod(X) is a symmetric, positive semidefinite matrix). You could have a look at the help page for crossprod, which gives the definitions of crossprod and tcrossprod. Perhaps this will help: Q <- matrix(1:12, ncol=3) v <- rep(1:3, each=nrow(Q)) Q v Q * v (Q * v) %*% t(Q) tcrossprod(Q * v, Q) -Peter Ehlers Thanks, Barisdad.
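Peter's illustration, completed and checked in base R. Multiplying Q element-wise by rep(d, each = nrow(Q)) scales the j-th column of Q by d[j], which is exactly right-multiplication by the diagonal matrix D:

```r
## Verify that tcrossprod(Q * v, Q) computes Q %*% D %*% t(Q)
Q <- matrix(1:12, ncol = 3)      # 4 x 3, standing in for the eigenvectors
d <- 1:3                         # diagonal of D (the d[p] of nearPD)
v <- rep(d, each = nrow(Q))      # column j of Q gets multiplied by d[j]

lhs <- tcrossprod(Q * v, Q)      # the one-liner from nearPD.R
rhs <- Q %*% diag(d) %*% t(Q)    # the textbook form

isTRUE(all.equal(lhs, rhs))
```

The tcrossprod form avoids materializing diag(d) (an m x m matrix that is almost all zeros) and lets R use a symmetric-rank-k BLAS update, which matters when m is large.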
Re: [R] Regression with sparse matricies
As Frank mentioned in his reply, expecting to estimate tens of thousands of fixed-effects parameters in a logistic regression is optimistic. You could start with a generalized linear mixed model instead: library(lme4) fm1 <- glmer(resp ~ 1 + (1|f1) + (1|f2) + (1|f1:f2), mydata, binomial) If you have difficulty with that it might be best to switch the discussion to the r-sig-mixed-mod...@r-project.org mailing list. On Sat, May 22, 2010 at 2:19 PM, Robin Jeffries rjeffr...@ucla.edu wrote: I would like to run a logistic regression on some factor variables (main effects and eventually an interaction) that are very sparse. I have a moderately large dataset, ~100k observations with 1500 factor levels for one variable (X1) and 600 for another (X2), creating ~19000 levels for the interaction (X1:X2). I would like to take advantage of the sparseness in these factors to avoid using glm. Actually, glm is not an option given the size of the design matrix. I have looked through the Matrix package as well as other packages without much help. Is there some option, some modification of glm, some way that it will recognize a sparse matrix and avoid large matrix inversions? -Robin
Re: [R] sparse matrices in lme4
On Mon, May 24, 2010 at 6:24 PM, Robin Jeffries rjeffr...@ucla.edu wrote: I read somewhere (help list, documentation) that lme4 uses sparse matrix technology for the random effects. Yes. That is why there is such a close link between the Matrix and lme4 packages. The sparse matrix methods in the Matrix package are crucial to the lme4 package. I'd like to confirm with others that I can't use a sparse matrix as a fixed effect; I'm getting an Invalid type (S4) error. The development version of the lme4 package, called lme4a and only available from R-forge, has an optional argument sparseX that allows for this. In fact, that option is one of the reasons that the development version has been in development for so long (the programming becomes much more intricate when you must allow for such an option). Some day, and I hope not too far in the future, the lme4a package will be released as lme4. By the way, questions such as this are probably more suitable for the r-sig-mixed-mod...@r-project.org mailing list, which I am cc:ing on this reply.
Re: [R] Best way to preallocate numeric NA array?
On Thu, Nov 26, 2009 at 10:03 AM, Rob Steele freenx.10.robste...@xoxy.net wrote: These are the ways that occur to me.

## This produces a logical vector, which will get converted to a numeric
## vector the first time a number is assigned to it.  That seems
## wasteful.
x <- rep(NA, n)

## This does the conversion ahead of time but it's still creating a
## logical vector first, which seems wasteful.
x <- as.numeric(rep(NA, n))

## This avoids type conversion but still involves two assignments for
## each element in the vector.
x <- numeric(n)
x[] <- NA

## This seems reasonable.
x <- rep(as.numeric(NA), n)

Comments? My intuition would be to go with the third method (allocate a numeric vector then assign NA to its contents) but I haven't tested the difference. In fact, it would be difficult to see differences in, for example, execution time unless n was very large. This brings up a different question, which is: why do you want to consider this? Are you striving for readability, for speed, for low memory footprint, for efficiency in some other way? When we were programming in S on machines with 1 MIPS processors and a couple of megabytes of memory, such considerations were important. I'm not sure they are quite as important now.
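A fifth option worth knowing: the typed constant NA_real_ is a double NA, so rep(NA_real_, n) allocates the numeric vector in one step with no logical-to-double coercion at any point. A quick check and indicative timing:

```r
n <- 1e6

## rep() of the typed NA gives a double vector directly
x <- rep(NA_real_, n)
stopifnot(is.double(x), all(is.na(x)))

## compare: rep(NA, n) gives a *logical* vector that will be
## coerced (i.e. copied) on the first numeric assignment
y <- rep(NA, n)
is.logical(y)

## timings are machine-dependent; the two-pass variant touches
## the memory twice (allocation of zeros, then the NA fill)
system.time(rep(NA_real_, n))
system.time({z <- numeric(n); z[] <- NA})
```

NA_integer_ and NA_character_ serve the same role for the other atomic types.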
Re: [R] Multiple CHOLMOD errors when attempting poisson glmm
On Thu, Dec 24, 2009 at 1:03 PM, postava-davig.m postava-davi...@husky.neu.edu wrote: Hello, I have been attempting to run a Poisson glmm using lme4 for some time now and have had a lot of trouble. I would say 9 times out of 10 I receive the following warning: CHOLMOD warning: %h Error in mer_finalize(ans) : Cholmod error `not positive definite' at file:../Cholesky/t_cholmod_rowfac.c, line 432 That is an (admittedly obscure) indication that the Cholesky factorization of a matrix derived from the random-effects model matrix cannot be performed. My data are counts of microbe colony forming units (CFUs) collected from termite cuticles and the surrounding environment over a 3 year period. I am attempting to analyze the effect of several factors on these counts (termite nest volume, temperature, humidity, light, incubation temperature, habitat, year, sample location, etc.) to determine which account for the variance in microbial communities. These data are observations, so there are many missing values, which may be part of the problem. I've tried many different combinations of variables, and also have tried reducing my data set to remove as many NA's and confounding variables as possible, but I still can't get any models to work consistently.
One most recent attempt had the following output: model1=lmer(totalcfus~habitat*temp*moisture*light+location+(1|habitat/colony/location),family=poisson,control=list(msVerbose=1)) 0: 553377.59: 1.00573 0.620530 0.169516 26.3904 -13.1266 -33.2286 -21.1955 -21.1064 -0.590761 -0.217403 -0.0342272 -0.960593 -0.0962517 0.441626 1.20575 0.718621 0.680580 0.171006 0.403729 0.278822 0.275395 0.00707767 0.0225599 0.0854869 0.0533373 0.0243451 0.00114120 0.000403226 -0.00566960 -0.0143715 -0.00931896 -0.00879323 -0.000753236 -0.00335745 -0.00178054 -0.000788027 -0.000288944 -0.000909455 -0.000839295 -0.000309293 -1.35885e-05 9.76120e-06 3.57035e-05 2.78985e-05 1.01880e-05 CHOLMOD warning: %h Error in mer_finalize(ans) : Cholmod error `not positive definite' at file:../Cholesky/t_cholmod_rowfac.c, line 432 Thank you for including the output from verbose = TRUE. It would also help if you included the output from sessionInfo() so we can see which version of R you are using and which version of the lme4 package you are using. How many observations are used in this fit? As you can see, the number of parameters being fit is very large and encountering singularities is not unexpected. May I suggest that we move this discussion to the r-sig-mixed-mod...@r-project.org mailing list, which I have cc:d on this reply? That list is specifically intended for discussions of this type. I have to admit that I'm at a loss, and have been unable to determine any pattern to when this error message comes up. I'm hoping that someone can help me eek out what the issue is with my data so that I can eventually work out a usable model. Thanks so much, and happy holidays. -- View this message in context: http://n4.nabble.com/Multiple-CHOLMOD-errors-when-attempting-poisson-glmm-tp978573p978573.html Sent from the R help mailing list archive at Nabble.com. 
Re: [R] lmer: how to lmer with no intercept
On Tue, Dec 29, 2009 at 11:46 AM, Julia7 liujul...@yahoo.com wrote: Hello, How do I fit a mixed effect model with no intercept using lmer()? Is the following syntax correct? lmer(y ~ -1+x1+x2+(-1+x1+x2|id), data=dt) Yes. An alternative, which I prefer, is lmer(y ~ 0 + x1 + x2 + (0 + x1 + x2|id), dt)
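The two no-intercept forms are interchangeable; a quick check with lm() (used here so the sketch needs no packages; the data are invented):

```r
set.seed(42)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$y <- 2 * d$x1 - d$x2 + rnorm(20)

## "-1" and "0" both suppress the intercept and give identical fits
f1 <- lm(y ~ -1 + x1 + x2, d)
f2 <- lm(y ~ 0 + x1 + x2, d)
stopifnot(all.equal(coef(f1), coef(f2)))
stopifnot(!("(Intercept)" %in% names(coef(f1))))
```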
[R] Fwd: glm error: cannot correct step size
I forgot to cc: the list on this response. -- Forwarded message -- From: Douglas Bates ba...@stat.wisc.edu Date: Wed, Dec 30, 2009 at 8:05 AM Subject: Re: [R] glm error: cannot correct step size To: John Sorkin jsor...@grecc.umaryland.edu On Wed, Dec 30, 2009 at 7:50 AM, John Sorkin jsor...@grecc.umaryland.edu wrote: R 2.8.1 windows XP I am getting an error message that I don't understand when I try to run GLM. The error only occurs when I have all independent variables in the model. When I drop one independent variable, the model runs fine. Can anyone help me understand what the error means and how I can correct it? Thank you, John fit11 <- glm(AAMTCARE~BMI+BMIsq+SEX+jPHI+jMEDICAID+factor(AgeCat)+ factor(jINDINC)+jMARSTAT+jEDUCATION+factor(jsmokercat)+factor(jrace), data=SimData, family=Gamma(link=log)) Warning: step size truncated due to divergence Error in glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, : inner loop 1; cannot correct step size That error message and the warning indicate that the algorithm (iteratively reweighted least squares or IRLS) for fitting the parameters cannot converge, which can be because the model is over-specified. As you see, when you remove one of the terms you do get parameter estimates and I would guess that, even then, the model will be over-specified. # Drop factor(jrace) and the model runs without a problem. fit11 <- glm(AAMTCARE~BMI+BMIsq+SEX+jPHI+jMEDICAID+factor(AgeCat)+ factor(jINDINC)+jMARSTAT+jEDUCATION+factor(jsmokercat), data=SimData, family=Gamma(link=log)) # Drop jPHI and the model runs without a problem. fit11 <- glm(AAMTCARE~BMI+BMIsq+SEX+jMEDICAID+factor(AgeCat)+ factor(jINDINC)+jMARSTAT+jEDUCATION+factor(jsmokercat)+jrace, data=SimData, family=Gamma(link=log)) Thanks, John John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Confidentiality Statement: This email message, including any attachments, is for th...{{dropped:6}}
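One way to see whether a model is over-specified before glm() stumbles over it is to compare the rank of the model matrix with its column count (a hypothetical two-factor example, not John's data; aliased columns show up as rank < ncol):

```r
## b is an exact copy of a, so b's dummy columns are aliased with a's
d <- data.frame(a = factor(rep(1:3, each = 4)))
d$b <- d$a
X <- model.matrix(~ a + b, d)

ncol(X)                 # 5: intercept plus two dummies per factor
qr(X)$rank              # 3: two columns carry no new information
qr(X)$rank < ncol(X)    # TRUE flags an over-specified model
```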
Re: [R] Piecewise regression in lmer
On Mon, Jan 4, 2010 at 6:24 AM, Walmes Zeviani walmeszevi...@hotmail.com wrote: AD Hayward wrote: Dear all, I'm attempting to use a piecewise regression to model the trajectory of reproductive traits with age in a longitudinal data set using a mixed model framework. The aim is to find three slopes and two points - the slope from low performance in early age to a point of high performance in middle age, the slope (may be 0) of the plateau from the start of high performance to the end of high performance, and the slope of the decline from the end of high performance to the end of life. I've found the segmented package useful, but it cannot be implemented in a mixed model framework. I've also attempted piecewise regression using this formula in lmer: m <- lmer(repro ~ OTHER FIXED EFFECTS + age*(age < 2) + age*(age >= 2 & age < 8) + age*(age >= 8) + (1|id) + (1|yr), data = reproduction, family = binomial, link = logit, GHQ = TRUE) However, this gives the warning: Warning message: In mer_finalize(ans) : gr cannot be computed at initial par (65) which is not apparent if I use just two break points or I implement the model in glm. My question is essentially whether anyone can recommend a method for performing piecewise regression in lmer or another mixed model framework. Any advice would be greatly appreciated. Adam, A segmented linear model, for estimation purposes, is a nonlinear model. It requires an iterative procedure for estimation of fixed effects. You could use nlmer() for this. It appears that Adam is using fixed knot positions, in which case the segmented model is a linear model. He is also using family = binomial so it is a generalized linear mixed model, which does require iterative optimization, but does not require nlmer().
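With the knots fixed at ages 2 and 8, the segmented mean function is linear in its coefficients, which is the point of the reply above. A truncated-line basis makes this explicit (a sketch using the thread's knot positions; the coefficient values are invented):

```r
age <- seq(0, 12, by = 0.25)

## columns: intercept, overall slope, and slope *changes* at each knot
X <- cbind(1, age, pmax(age - 2, 0), pmax(age - 8, 0))

## any coefficient vector yields a continuous piecewise-linear curve
beta <- c(0.1, 0.5, -0.4, -0.3)
mu <- drop(X %*% beta)
stopifnot(ncol(X) == 4, length(mu) == length(age))
```

Because the design matrix is fixed, these basis columns can go straight into an lmer() or glmer() formula as ordinary covariates.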
Re: [R] glm output for binomial family
On Tue, Jun 8, 2010 at 7:10 AM, Enrico Colosimo enrico...@gmail.com wrote: Hello, I am having some trouble running a very simple example. I am running a logistic regression entering the SAME data set in two different forms and getting different values for the deviance residual. Just look at this naive data set: # 1- Entering as a Bernoulli data set y <- c(1,0,1,1,0) x <- c(2,2,5,5,8) ajust1 <- glm(y~x, family=binomial(link=logit)) ajust1 # Coefficients: (Intercept) x 1.3107 -0.2017 Degrees of Freedom: 4 Total (i.e. Null); 3 Residual Null Deviance: 6.73 Residual Deviance: 6.491 AIC: 10.49 # # 2- Entering as a binomial data set # ysim <- c(1,2,0) ynao <- c(1,0,1) x <- c(2,5,8) dados <- cbind(ysim,ynao,x) dados <- as.data.frame(dados) attach(dados) ajust2 <- glm(as.matrix(dados[,c(1,2)])~x, family=binomial, data=dados) summary(ajust2) # Coefficients: (Intercept) x 1.3107 -0.2017 Degrees of Freedom: 2 Total (i.e. Null); 1 Residual Null Deviance: 3.958 Residual Deviance: 3.718 AIC: 9.104 = It seems that there is a problem with the first fitting!!! In what way? Notice that the estimates of the coefficients are the same in the two fits and the difference between the null deviance and the residual deviance is approximately the same. If you are worried about the deviance in the first fit being greater than the deviance in the second fit, it is because of the definition of the deviance used. The deviance for the binomial fit is a shifted version of the deviance from the Bernoulli fit, which is why the null deviance is also reported.
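Re-running the poster's two fits shows the point directly: identical coefficients, and an identical drop from null to residual deviance, even though the deviances themselves differ by a grouping constant:

```r
## Bernoulli form: one row per trial
y <- c(1, 0, 1, 1, 0)
x <- c(2, 2, 5, 5, 8)
fit1 <- glm(y ~ x, family = binomial)

## grouped binomial form of the same data
ysim <- c(1, 2, 0)   # successes at x = 2, 5, 8
ynao <- c(1, 0, 1)   # failures
xg   <- c(2, 5, 8)
fit2 <- glm(cbind(ysim, ynao) ~ xg, family = binomial)

## same coefficients, same likelihood-ratio drop
stopifnot(all.equal(unname(coef(fit1)), unname(coef(fit2)), tolerance = 1e-6))
stopifnot(all.equal(fit1$null.deviance - deviance(fit1),
                    fit2$null.deviance - deviance(fit2), tolerance = 1e-6))
```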
Re: [R] do faster ANOVAS
The lm and aov functions can take a matrix response, allowing you to fit all of the responses for a single attribute simultaneously. On Thu, Jun 10, 2010 at 8:47 AM, melissa mellep...@orange.fr wrote: Dear all R users, I want to realize 800 000 ANOVAS and to store the sums of squares of the effects. Here is an extract of my table data: Product attribute subject rep t1 t2 t3 … t101 P1 A1 S1 R1 1 0 0 … 1 I want to realize 1 ANOVA per timepoint and per attribute; there are 101 timepoints and 8 attributes, so I want to realize 808 ANOVAS. This will be an ANOVA with two factors. Here is one example: aov(t1~Subject*Product, data[data$attribute=="A1",]) I want to store for each ANOVA SSprod, SSsujet, SSerreur, SSinter and SStotal. In fact I want the result in several matrices: SSprod matrix: t1 t2 t3 t4 … t101 A1 ssprod(A1,t1) A2 A3 … A8 So I would like a matrix like that for ssprod, sssujet, sserreur, ssinter and sstotal. And this is for one permutation, and I want to do 1000 permutations. Here is my code: SSmatrixglobal <- function(k){ daten.temp <- data daten.temp$product=permutations[[k]] listmat <- apply(daten.temp[,5:105],2,function(x,y){ tab2 <- as.data.frame(cbind(x,y)) tab.class <- by(tab2[,1:3],tab2[,4],function(x){ f <- formula(paste(names(x)[1],"~",names(x)[2],"*",names(x)[3],sep="")) anovas <- aov(f, data=x) anovas$call$formula <- f s1 <- summary(anovas) qa <- s1[[1]][,2] return(qa) }) return(tab.class) },y=daten.temp[,1:3] ) ar <- array(unlist(listmat),dim=c(length(listmat[[1]][[1]]),length(listmat[[1]]),length(listmat))) l=lapply(1:4,function(i) ar[i,,]) sssujet=l[[1]] ssprod=l[[2]] ssinter=l[[3]] sserreur=l[[4]] ss=rbind(sssujet,ssprod,ssinter,sserreur,sstotal) ss=as.data.frame(ss) sqlSave(channel,ss,"SS1000",append=TRUE) rm(ss,numperm,daten.temp) } system.time(por <- lapply(c(1:1000), SSmatrixglobal)) But it takes about 90 seconds for one permutation, so *1000 — how can I do faster ANOVAS?
Many thanks Best regards Mélissa PS: I think that I can gain a lot of time in the aov function but I don't know how to do
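A minimal sketch of the matrix-response idea from the reply above (simulated data; the dimensions and names are invented): one aov() call fits every timepoint of one attribute, and summary() returns one ANOVA table per response, from which the sums of squares can be pulled:

```r
set.seed(1)
## 4 subjects x 3 products x 2 reps = 24 rows
d <- data.frame(Subject = factor(rep(1:4, each = 6)),
                Product = factor(rep(rep(1:3, each = 2), 4)))
## 5 timepoints as a matrix response
Y <- matrix(rnorm(nrow(d) * 5), nrow = nrow(d),
            dimnames = list(NULL, paste0("t", 1:5)))

fit <- aov(Y ~ Subject * Product, data = d)

## summary() gives a list with one ANOVA table per response column;
## extract the "Sum Sq" column from each
ss <- sapply(summary(fit), function(s) s[[1]][, "Sum Sq"])
rownames(ss) <- c("Subject", "Product", "Subject:Product", "Residuals")
dim(ss)   # effects by timepoints
```

For the full problem, looping this over the 8 attributes replaces 808 aov() calls with 8.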
Re: [R] Compiling R with multi-threaded BLAS math libraries - why not actually ?
On Sat, Jun 12, 2010 at 6:18 AM, Tal Galili tal.gal...@gmail.com wrote: Hello Gabor, Matt, Dirk. Thank you all for clarifying the situation. So if I understand correctly then: 1) Changing the BLAST would require specific BLAST per computer configuration (OS/chipset). It's BLAS (Basic Linear Algebra Subroutines) not BLAST. Normally I wouldn't be picky like this but if you plan to use a search engine you won't find anything helpful under BLAST. 2) The advantage would be available only when doing _lots_ of linear algebra You need to be working with large matrices and doing very specific kinds of operations before the time savings of multiple threads overcomes the communications overhead. In fact, sometimes the accelerated BLAS can slow down numerical linear algebra calculations, such as sparse matrix operations. So I am left wondering for each item: 1) How do you find a better (e.g: more suited) BLAST for your system? (I am sure there are tutorials for that, but if someone here has a recommendation on one - it would be nice) As Dirk has pointed out, it is a simple process. Step 1: Install Ubuntu or some other Debian-based Linux system. Step 2: type sudo apt-get install r-base-core libatlas3gf-base 2) In what situations do we use __lots of linear algebra? For example, I have cases where I performed many linear regressions on a problem, would that be a case the BLAST engine be effecting? Re-read David's posting. The lm and glm functions do not benefit substantially from accelerated BLAS because the underlying computational methods only use level-1 BLAS. (David said they don't use BLAS but that is not quite correct. I posted a follow-up comment describing why lm and glm don't benefit from accelerated BLAS.) I am trying to understand if REvolution emphasis on this is a marketing gimmick, or are they insisting on something that some R users might wish to take into account. 
In which case I would, naturally (for many reasons), prefer to be able to tweak the native R system instead of needing to work with REvolution distribution. As those who, in Duncan Murdoch's phrase, found the situation sufficiently extreme to cause them to read the documentation, would know, descriptions of using accelerated BLAS with R have been in the R administration manual for years. Admittedly it is not a straightforward process but that is because, like so many other things, it needs to be handled differently on each operating system. In fact it is even worse because the procedure can be specific to the operating system and the processor architecture and, sometimes, even the task. Again, re-read David's posting where he says that you probably don't want to combine multiple MKL threads with explicit parallel programming in R using doSMP. David's posting (appropriately) shows very specific examples that benefit greatly from accelerated BLAS. Notice that these examples incorporate very large matrices. The first two examples involve forming chol(crossprod(A)) where A is 1 by 5000. If you have very specific structure in A this calculation might be meaningful. In general, it is meaningless because crossprod(A) is almost certainly singular. (I am vague on the details but perhaps someone who is familiar with the distribution of singular values of matrices can explain the theoretical results. There is a whole field of statistics research dealing with sparsity in the estimation of covariance matrices that attacks exactly this large n, large p rank deficiency problem.) Lastly, following on Matt suggestion, if any has a tutorial on the subject, I'd be more then glad to publish it on r-statistics/r-bloggers. Thanks again to everyone for the detailed replies. 
Best, Tal Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Sat, Jun 12, 2010 at 6:01 AM, Matt Shotwell shotw...@musc.edu wrote: In the case of REvolution R, David mentioned using the Intel MKL, proprietary library which may not be distributed in the way R is distributed. Maybe REvolution has a license to redistribute the library. For the others, I suspect Gabor has the right idea, that the R-core team would rather not keep architecture dependent code in the sources, although there is a very small amount already (`grep -R __asm__`). However, I know using Linux (Debian in particular) it is fairly straightforward to build R with `enhanced' BLAS libraries. The R Administration and Installation manual has a pretty good section on linking with enhanced BLAS and LAPACK libs, including the Intel MKL, if you are willing cough up $399, or swear not to use the library commercially or academically. Maybe a short tutorial using free
Re: [R] Can one get a list of recommended packages?
On Sat, Jun 12, 2010 at 8:37 AM, Dr. David Kirkby david.kir...@onetel.net wrote: R 2.10.1 is used in the Sage maths project. Several recommended packages (Matrix, class, mgcv, nnet, rpart, spatial, and survival) are failing to build on Solaris 10 (SPARC). Have you checked the dependencies for those packages? Some require GNU make. We would like to be able to get a list of the recommended packages for R 2.10.1, but ideally via a call to R, so it is not necessary to update that list every time a new version of R is released. We do not want to access the Internet to get this information. Is there a way in R to list the recommended packages? I'm not sure I understand the logic of this. If you are going to build R then presumably you have the tar.gz file which contains the sources for the recommended packages in the subdirectory src/library/Recommended/. Why not get the list from there? $ cd ~/src/R-devel/src/library/Recommended/ $ ls *.tgz boot.tgz codetools.tgz lattice.tgz mgcv.tgz rpart.tgz class.tgz foreign.tgz MASS.tgz nlme.tgz spatial.tgz cluster.tgz KernSmooth.tgz Matrix.tgz nnet.tgz survival.tgz Better still, is there a way to list the recommended packages which have not been installed, so getting a list of any failures? Again, this seems to be a rather convoluted approach. Why not check why the packages don't install properly?
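That said, the list can also be obtained from within R itself — a sketch (note that `.get_standard_package_names` is an unexported helper in the tools package and could change between R versions):

```r
## recommended packages currently installed
rec_installed <- rownames(installed.packages(priority = "recommended"))

## full set of recommended package names shipped with this version of R
## (unexported tools helper; an internal API, so treat as version-specific)
rec_all <- tools:::.get_standard_package_names()$recommended

## any recommended packages that failed to install
setdiff(rec_all, rec_installed)
```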
Re: [R] Pretty printing progress
On Thu, Jun 17, 2010 at 10:50 AM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote: On Thu, Jun 17, 2010 at 3:33 PM, Doran, Harold hdo...@air.org wrote: I have a function that is an iterative process for estimating some MLEs. I want to print some progress to screen as the process iterates. I would like to try and line things up nicely in the R window, but am not sure the best way to do this. Below is a toy example. Suppose I want the value of 10 to be just below iteration and the value of -1234 to be just below 'Log Likelihood'. Sure you just don't want to use the progress bar functions from the plyr package: ?plyr::create_progress_bar another example of things being in the wrong package If you want to stick with a text display you can use the sprintf function to format the strings that you print cat('Iteration Log Likelihood\n', sprintf("%8d%20g\n", 10, -1234)) Iteration Log Likelihood 10 -1234 I would avoid the tab character as you can't count on the displays of tabs to be consistent. You may also want to change the display of the log-likelihood to be a fixed number of decimal places rather than a general format for floating point numbers, which can switch into the e notation for very large or very small numbers. cat('Iteration Log Likelihood\n', sprintf("%8d%20.4f\n", 10, -1234)) Iteration Log Likelihood 10 -1234.0000 The format of the format strings is another little language to learn but it is a very powerful mechanism.
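For long iterative fits, base R also ships a ready-made text progress bar that can sit alongside the sprintf() columns (a sketch; the log-likelihood update here is a stand-in for a real estimation step):

```r
n_iter <- 50
pb <- txtProgressBar(min = 0, max = n_iter, style = 3)
loglik <- -2000
for (i in seq_len(n_iter)) {
    loglik <- loglik + abs(rnorm(1))   # stand-in for a real iteration
    setTxtProgressBar(pb, i)           # redraws the bar in place
}
close(pb)
## final report with aligned columns, as in the sprintf() approach above
cat(sprintf("%8d%20.4f\n", n_iter, loglik))
```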
Re: [R] how to efficiently compute set unique?
On Mon, Jun 21, 2010 at 8:38 PM, David Winsemius dwinsem...@comcast.net wrote: On Jun 21, 2010, at 9:18 PM, Duncan Murdoch wrote: On 21/06/2010 9:06 PM, G FANG wrote: Hi, I want to get the unique set from a large numeric k by 1 vector, k is in tens of millions when I used the matlab function unique, it takes less than 10 secs but when I tried to use the unique in R with similar CPU and memory, it is not done in minutes I am wondering, am I using the function in the right way? dim(cntxtn) [1] 13584763 1 uniqueCntxt = unique(cntxtn); # this is taking really long What type is cntxtn? If I do that sort of thing on a numeric vector, it's quite fast: x <- sample(10, size=13584763, replace=TRUE) system.time(unique(x)) user system elapsed 3.61 0.14 3.75 If it's a factor, it could be as simple as: levels(cntxtn) # since the work of unique-ification has already been done. Not quite. When you generate a factor, as you do in your example, the levels correspond to the unique values of the original vector. But when you take a subset of a factor the levels are preserved intact, even if some of those levels do not occur in the subset. This is why there are unusual arguments with names like drop.unused.levels in functions like model.frame. It is also a subtle difference in the behavior of factor(x) and as.factor(x) when x is already a factor. ff <- factor(sample.int(200, 1000, replace = TRUE)) ff1 <- ff[1:40] length(levels(ff)) [1] 199 length(levels(ff1)) [1] 199 length(levels(as.factor(ff1))) [1] 199 length(levels(factor(ff1))) [1] 34 x <- factor(sample(10, size=13584763, replace=TRUE)) system.time(levels(x)) user system elapsed 0 0 0 system.time(y <- levels(x)) user system elapsed 0 0 0
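In later versions of R (2.12.0 and up, so after this thread) the same refactoring step has an explicit name, droplevels(); a small sketch of the subsetting behaviour described above:

```r
set.seed(1)
ff  <- factor(sample.int(200, 1000, replace = TRUE))
ff1 <- ff[1:40]

length(levels(ff1))               # subsetting keeps all original levels
length(levels(factor(ff1)))       # re-factoring keeps only observed levels
length(levels(droplevels(ff1)))   # same result, with explicit intent

stopifnot(setequal(levels(factor(ff1)), levels(droplevels(ff1))))
```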
Re: [R] Lattice and Beamer
On Mon, Jun 28, 2010 at 12:28 PM, Doran, Harold hdo...@air.org wrote: Two things I think are some of the best developments in statistics and production are the lattice package and the beamer class for presentation in Latex. One thing I have not become very good at is properly sizing my visuals to look good in a presentation. For instance, I have the following code that creates a nice plot (sorry, cannot provide reproducible data). bwplot(testedgrade~person_measure|gender + ethnicity, pfile, layout=c(2,5), main = 'Distribution of Person Measure by Grade\n Conditional on Gender and Ethnicity (Math)', xlab = 'Grade') Now inside my latex document using the beamer class for presentation I have the following \begin{frame} \frametitle{Distribution of Person Parameters by Grade Conditional on Gender and Ethnicity} \begin{figure}[htb] \centering \fbox{\includegraphics[scale=.3]{personGenEthn.pdf}} \end{figure} \end{frame} I use the scale argument here. I do this totally randomly. I first start with scale=.5. Then, I create the document and look at it. If it seems to fit, I keep it. If it's too big, I resize it until it looks good. There must certainly be a much better way to size these for specific use with latex presentations. Any thoughts? I think we have had this discussion before and I have tried to convince you to use Sweave with beamer and lattice.:-) A big advantage of Sweave is that you have the code that the generates the figures in the LaTeX file and you don't allow the possibility of losing track of PDF files containing the latest versions of figures. In my preamble I have some lines like \SweaveOpts{engine=R,eps=FALSE,pdf=TRUE,width=10,height=6.5,strip.white=all} \SweaveOpts{include=TRUE} \setkeys{Gin}{width=\textwidth} Setting the default height and width of the PDF figure and the inclusion width=\textwidth provides a default scaling that looks good to me. If I want a shorter figure that allows for text on the slide then I set the height to a smaller value. 
A full height version looks like \begin{frame} \frametitle{Plot of inverse canonical link for the Bernoulli distribution} <<BernoulliinvLink,fig=TRUE,echo=FALSE>>= linkinv <- function(eta) 1/(1+exp(-eta)) eta <- seq(-7, 7, len = 701) print(xyplot(linkinv(eta) ~ eta, type = c("g","l"), xlab = expression(eta), ylab = expression(mu == frac(1, 1+exp(-eta))))) @ \end{frame} Harold
Re: [R] lme4
On OS X you need to install the source package for lme4 as the binary package fails one of the tests. We have been unable to reproduce this failure under other operating systems, which makes it hard to debug. On Tue, Jul 6, 2010 at 11:09 AM, Alex Foley alex_foley_...@yahoo.com wrote: Hi, I was trying to install the lme4 package, but got the following errors: install.packages("lme4") Warning in install.packages("lme4") : argument 'lib' is missing: using '/Users/xx/Library/R/2.11/library' Warning message: In getDependencies(pkgs, dependencies, available, lib) : package 'lme4' is not available The session info is: sessionInfo() R version 2.11.1 (2010-05-31) x86_64-apple-darwin9.8.0 locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Matrix_0.999375-39 lattice_0.18-8 loaded via a namespace (and not attached): [1] grid_2.11.1 nlme_3.1-96 stats4_2.11.1 tools_2.11.1 Should I be installing this in a different manner? thanks!
Re: [R] advice/opinion on "<-" vs "=" in teaching R
On Fri, Jan 15, 2010 at 10:00 AM, Ben Bolker bol...@ufl.edu wrote: John Kane jrkrideau at yahoo.ca writes: I've only been using R for about 2.5 years and I'm not all that good, but I vote for <-. I think the deciding factor is in RSiteSearch() and the various manuals. Almost everything I see uses <-. Why introduce = when it is not used normally? It will just confuse the students who are trying to use any of the documentation. Not to mention they might get slammed for bad syntax on the R-help mailing list. :) Those are all good reasons. I have said something similar before (see http://www.mail-archive.com/r-help@r-project.org/msg16904.html), but I tend to use = because it seems to be more intuitive for students, despite being logically confused at a deeper level, and I want to spare them any additional cognitive load when they are first getting introduced to R. I'm not particularly convinced by the "<- is more general and there are some contexts where = doesn't work" argument, because I'm not trying to be absolutely rigorous, nor teach all the possible ins and outs of R syntax. I would be very surprised if any of the examples given actually came up in the course of a first-semester statistics/modeling R course. I teach the idiom summary(fm1 <- lm(y ~ x, mydata)) in my introductory courses. I just want to do what works best for the students -- the problem is deciding on the balance between short term benefit (<- is one more odd thing to get used to) and long term benefit (they will see <- in other contexts, so they might as well get used to it eventually). Ben Bolker
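The assignment-inside-a-call idiom mentioned above is one place where the distinction is forced: with `=`, the same tokens would parse as a named argument rather than an assignment (illustrated with the built-in cars data):

```r
## assigns fm1 in the calling environment *and* passes the fit to summary()
summary(fm1 <- lm(dist ~ speed, data = cars))
stopifnot(exists("fm1"), inherits(fm1, "lm"))

## summary(fm1 = lm(...)) would instead fail, because summary() has no
## argument named 'fm1' and its 'object' argument would be missing
```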
Re: [R] Hierarchical Linear Model using lme4's lmer
On Sat, Jan 16, 2010 at 8:20 AM, Walmes Zeviani walmeszevi...@hotmail.com wrote: Doug, It appears you are mixing the nlme and lme4 formulation types. In the nlme library you type lme(y~x, random=~1|subject) In the lme4 library you type lmer(y~x+(1|subject)) You mixed them. At your disposal. Which is what I tell my wife when I am standing by our sink. Walmes. Doug Adams wrote: Hi, I was wondering: I've got a dataset where I've got student 'project's nested within 'school's, and 'division' (elementary, junior, or senior) at the student project level. (Division is at the student level and not nested within schools because some students are registered as juniors, others as seniors, within the same school.) So schools are random, division is fixed, and the student Score is the outcome variable. This is what I've tried: lmer(data=Age6m, Score ~ division + (1|school), random=~1 | school) Am I on the right track? Thanks everyone, :) Doug Adams MStat Student University of Utah Walmes is correct that this is mixing two formulations of the model. It turns out that the model will be fit correctly anyway. The lmer function has a ... argument which will silently swallow the argument random = ~ 1|school and ignore it. Looks like we should add a check for specification of a random argument and provide a warning if it is present.
Re: [R] Symmetric Matrix classes
On Tue, Jan 19, 2010 at 9:26 AM, Martin Maechler maech...@stat.math.ethz.ch wrote: Scanning for 'Matrix' in old R-help e-mail, I found GA == Gad Abraham gabra...@csse.unimelb.edu.au on Fri, 27 Nov 2009 13:45:00 +1100 writes: GA Hi, GA I'd like to store large covariance matrices using Matrix classes. GA dsyMatrix seems like the right one, but I want to specify just the GA upper/lower triangle and diagonal and not have to instantiate a huge GA n^2 vector just for the sake of having half of it ignored: GA Dumb example: GA M <- new("dsyMatrix", uplo="U", x=rnorm(1e4), Dim=as.integer(c(100, 100))) GA diag(M) <- 1 GA This doesn't work: GA M <- new("dsyMatrix", uplo="U", x=0, Dim=as.integer(c(100, 100))) GA Error in validObject(.Object) : GA invalid class dsyMatrix object: length of x slot != prod(Dim) GA Is there an easier way of doing this? I think you want a dspMatrix (sp == symmetric packed) instead. Before going into details: Is this topic still interesting to those involved almost two months ago? Regards, Martin Also, I fail to understand the advantage of allocating storage for roughly half the elements in the matrix instead of the whole matrix when the matrix is so large. If dense storage of a symmetric matrix of size 2e4 takes up too much memory (approx 3 GB for each copy) it is unlikely that the packed symmetric storage form, using about 1.5 GB per copy, is going to save the day. 8 * (2e4 * 2e4)/2^30 # size in gigabytes for full array [1] 2.980232 4 * (2e4 * 20001)/2^30 # size in gigabytes for sp array [1] 1.490191 If the matrix is sparse, the dsCMatrix class may help. And there is also the issue of what exactly do you want to do with the matrix once you have stored it?
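The arithmetic behind those figures, spelled out for checking (n = 2e4, 8-byte doubles; packed symmetric storage holds the n(n+1)/2 elements of one triangle plus the diagonal):

```r
n <- 2e4

dense_gb  <- 8 * n * n / 2^30            # full n x n matrix of doubles
packed_gb <- 8 * n * (n + 1) / 2 / 2^30  # one triangle plus diagonal

round(c(dense = dense_gb, packed = packed_gb), 6)
#   dense   packed
# 2.980232 1.490191

## packed storage saves just under a factor of two -- not an order of magnitude
stopifnot(abs(dense_gb - 2 * packed_gb) < 0.01)
```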
Re: [R] Hierarchical data sets: which software to use?
On Sun, Jan 31, 2010 at 10:24 PM, Anton du Toit atdutoitrh...@gmail.com wrote: Dear R-helpers, I’m writing for advice on whether I should use R or a different package or language. I’ve looked through the R-help archives, some manuals, and some other sites as well, and I haven’t done too well finding relevant info, hence my question here. I’m working with hierarchical data (in SPSS lingo). That is, for each case (person) I read in three types of (medical) record: 1. demographic data: name, age, sex, address, etc 2. ‘admissions’ data: this generally repeats, so I will have 20 or so variables relating to their first hospital admission, then the same 20 again for their second admission, and so on 3. ‘collections’ data, about 100 variables containing the results of a battery of standard tests. These are administered at intervals and so this is repeating data as well. The number of repetitions varies between cases, so in its one case per line format the data is non-rectangular. At present I have shoehorned all of this into SPSS, with each case on one line. My test database has 2,500 variables and 1,500 cases (or persons), and in SPSS’s *.SAV format is ~4MB. The one I finally work with will be larger again, though likely within one order of magnitude. Down the track, funding permitting, I hope to be working with tens of thousands of cases. Although this may not be helpful for your immediate goal, storing and manipulating data of this size and complexity (and, I expect, cost for collection) really calls for tools like relational databases. A single flat file of 2500 variables by 1500 cases is almost never the best way to organize such data. A normalized representation as a collection of interlinked tables in a relational data base is much more effective and less error prone. 
The widespread use of spreadsheets or SPSS data sets or SAS data sets, which encourage the "single table with a gargantuan number of columns, most of which are missing data in most cases" approach to organization of longitudinal data, is regrettable. For later analysis in R it is better to start with the long form of the data, as opposed to the wide form, even if it means repeating demographic information over several occasions. Using a relational database allows for a long view to be generated without the possibility of inconsistency in the demographics. I am using the descriptions "long" and "wide" in the sense that they are used in the reshape help page. See ?reshape in R. The long view is also called the subject/occasion view in the sense that each row corresponds to one subject on one occasion. Robert Gentleman's book "R Programming for Bioinformatics" provides background on linking R to relational databases. As I said at the beginning, you may not want to undertake the necessary study and effort to reorganize your data for this specific project, but if you do this a lot you may want to consider it. I am wondering if I should keep using SPSS, or try something else. The types of analysis I'll typically have to do will involve comparing measurements at different times, e.g. before/after treatment. I'll also need to compare groups of people, e.g. treatment / no treatment. Regression and factor analyses will doubtless come into it at some point too. So: 1. should I use R or try something else? 2. can anyone advise me on using R with the type of data I've described? Many thanks, Anton du Toit __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
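The long vs wide distinction can be illustrated with base R's reshape(); the data frame and variable names here are invented for the example:

```r
wide <- data.frame(id      = 1:2,
                   sex     = c("F", "M"),   # demographic, constant per person
                   score.1 = c(10, 20),     # test score on occasion 1
                   score.2 = c(11, 21))     # test score on occasion 2

long <- reshape(wide, direction = "long",
                varying = c("score.1", "score.2"),
                v.names = "score", timevar = "occasion", idvar = "id")
long   # one row per subject/occasion; demographics repeat on each row
```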
Re: [R] lmer Error message
On Sat, Feb 6, 2010 at 4:45 AM, Martin Bulla bu...@centrum.cz wrote: Does anybody know what this error message means: Error in object$terms : $ operator not defined for this S4 class The error message means what it says and it doesn't come from lmer; it comes from the drop1 function being applied to a model fit by lmer. You are assuming that you can apply drop1 to an lmer model and you can't. You need to do the modifications of the model formula by hand because an lmer formula contains terms that would not be meaningful for drop1. I have performed the following steps: library(lattice) library(Matrix) library(lme4) inkm inkm$Gamie glm.incm drop1(glm.incm, test="Chisq") Error in object$terms : $ operator not defined for this S4 class Your suggestion would be of great help to me, Martin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
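The "modify the formula by hand" step can be sketched with base R's update() on plain formulas; the formula and term names here are invented. The same update(model, . ~ . - term) idiom also refits an lmer model with one term dropped:

```r
f  <- y ~ x1 + x2 + x3
f2 <- update(f, . ~ . - x1)          # remove x1 by hand
attr(terms(f2), "term.labels")       # c("x2", "x3")
# For a fitted lmer model the analogue is reduced <- update(model, . ~ . - x1),
# followed by anova(model, reduced) to compare the two fits.
```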
Re: [R] R-Help
On Sat, Feb 6, 2010 at 2:46 PM, David Winsemius dwinsem...@comcast.net wrote: On Feb 6, 2010, at 3:29 PM, Ravi Ramaswamy wrote: Hi - I am not familiar with R. Could I ask you a quick question? When I read a file like this, I get an error. Not sure what I am doing wrong. I use a MAC. How do I specify a full path name for a file in R? Or do files have to reside locally? KoreaAuto <- read.table("/Users/ Especially when just starting using R the simplest approach is KoreaAuto <- read.table(file.choose()) which brings up a file chooser panel so you can point and click your way to the desired file. If the file is tab-delimited, as appears to be the case in the file you enclosed, you may want to use read.delim instead of read.table. The read.delim function sets up the defaults for the many optional arguments to read.table specifically for tab-delimited files with a header line of column names as you have shown. I think the opening and closing quotes meant that you supplied an empty string to the file argument. raviramaswamy/Documents/Rutgers/STT 586/HW1 Data.txt") Error: unexpected numeric constant in "KoreaAuto <- read.table(/Users/raviramaswamy/Documents/Rutgers/STT 586" Using single instances of either sort of quote ( " or ' ) on the ends of strings should work. If you drag a file from a Finder window to the R console you should get a fully specified file path and name. Seems like the working directory is getwd() [1] "/Users/raviramaswamy" rd <- read.table(file="/Users/davidwinsemius/Downloads/meminfo.csv", sep=",", header=TRUE) rd time RSS VSZ MEM 1 1 3027932 3141808 4.5 2 2 3028572 3141808 4.5 3 3 3030208 3141808 4.5 4 4 302 3150004 4.5 5 5 3035036 3150004 4.5 You can also shorten the Users/username part to ~ rd <- read.table(file="~/Downloads/meminfo.csv", sep=",", header=TRUE) so I said this and still got an error KoreaAuto <- read.table(/Documents/Rutgers/HW1Data) Error: unexpected '/' in "KoreaAuto <- read.table(/" But using no quotes will definitely not work.
(And that was not a full path name anyway.) Could someone please help me with the correct syntax? Thanks Ravi
Year AO GNP CP OP
01 1974 .0022 183 2322 189
02 1975 .0024 238 2729 206
03 1976 .0027 319 3069 206
04 1977 .0035 408 2763 190
05 1978 .0050 540 2414 199
06 1979 .0064 676 2440 233
07 1980 .0065 785 2430 630
08 1981 .0069 944 2631 740
09 1982 .0078 1036 3155 740
10 1983 .0095 1171 3200 660
David Winsemius, MD Heritage Laboratories West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
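A minimal round-trip sketch using an invented temporary file, showing read.table with a header line; the quoted path string is the key point from the reply above:

```r
# Write a tiny file shaped like the one posted, then read it back.
tf <- tempfile(fileext = ".txt")
writeLines(c("Year AO GNP",
             "1974 .0022 183",
             "1975 .0024 238"), tf)

dat <- read.table(tf, header = TRUE)   # the file argument is a quoted string
dat
```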
Re: [R] Data views (Re: (Another) Bates fortune?)
On Sun, Feb 7, 2010 at 2:40 PM, Emmanuel Charpentier charp...@bacbuc.dyndns.org wrote: Note : this post has been motivated more by the hierarchical data subject than the aside joke of Douglas Bates, but might be of interest to its respondents. Le vendredi 05 février 2010 à 21:56 +0100, Peter Dalgaard a écrit : Peter Ehlers wrote: I vote to 'fortunize' Doug Bates on Hierarchical data sets: which software to use? The widespread use of spreadsheets or SPSS data sets or SAS data sets which encourage the single table with a gargantuan number of columns, most of which are missing data in most cases approach to organization of longitudinal data is regrettable. http://n4.nabble.com/Hierarchical-data-sets-which-software-to-use-td1458477.html#a1470430 Hmm, well, it's not like long format data frames (which I actually think are more common in connection with SAS's PROC MIXED) are much better. Those tend to replicate base data unnecessarily - as if rats change sex with millisecond resolution. [ Note to Achim Zeilis : the rats changing sex with millisecond resolution quote is well worth a nomination to fortune fame ; it seems it is not one already... ] The correct data structure would be a relational database with multiple levels of tables, but, to my knowledge, no statistical software, including R, is prepared to deal with data in that form. I think if you go back to my original reply you will see that my first suggestion was to use an SQL data base. I didn't mention views (in the SQL sense) explicitly but those are a natural construction for organizing longitudinal data. The data can be stored as a set of normalized tables in a data base but extracted as a data frame in the long format. Well, I can think of two exceptions : - BUGS, in its various incarnations (WinBUGS, OpenBUGS, JAGS), does not require its data to come from the same source. For example, while programming a hierarchical model (a. k. a. 
mixed-effect model), individual level variables may come from one source and various group level variables may come from other sources. Quite handy : no previous merge() required. Now, writing (and debugging !) such models in BUGS is another story... - SAS has had this concept of data view for a long time, its most useful incarnation being a data view of an SQL view. Again, this avoids the need to actually merge the datasets (which, AFAICR, is a serious piece of pain in the @$$ in SAS (maybe that's the *real* etymology of the name ?)). This problem has bugged me for a while. I think that the concept of a data view is right (after all, that's one of the core concepts of SQL for a reason...), but that implementing it *cleanly* in R is probably hard work. Using a DBMS for maintaining tables and views and querying them just at the right time does help, but the ability of using these DBMS data without importing them in R is, AFAIK, currently lacking. I think the issue is more than that. Most model-fitting functions in R incorporate a formula/data - model.frame - model.matrix sequence. The symbolic analysis to create the model frame can, I think, be applied to a view and the result stored back in an SQL data base. (I haven't looked at the code for model.frame in a long time but I think it can be applied a row at a time or to chunks of rows.) Some auxiliary information, such as unused factor levels, could be accumulated. The model.matrix function can be a bit more problematic but it too could be applied to chunks when generating dense model matrices, with the summary information from the chunks being accumulated. Updating sparse model matrices and accumulating summary information is potentially more time-consuming because you may need to update the form of the summary matrices as well as the numerical values. Of course in SAS these changes are easier because their least squares calculations are based on accumulating summary matrices from the data a row at a time. 
I think there are three different cases for fitting models based on large model matrices. If you have very large n (number of rows) but small to moderate p (number of columns) in the model matrix then you can work on chunks of rows using dense matrices to accumulate summary results. Large n and large p producing a sparse model matrix could be handled with the sparse.model.matrix function in the Matrix package because in these cases the model frame and the sparse model matrix tend to use smaller amounts of storage. When you have a large n and a large p creating a dense model matrix I think the best course is to buy a machine with a 64-bit processor and a 64-bit operating system and stuff it full of memory. It depends on how large p^2 is compared to np whether you are better off working in chunks of rows or not. Once upon a time, a very old version of RPgSQL (a Bioconductor package) aimed at such a representation : it created objects
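The sparse case Bates mentions can be sketched with Matrix::sparse.model.matrix (Matrix ships with R as a recommended package); the small data frame here is invented:

```r
library(Matrix)   # recommended package, ships with R

dd <- data.frame(g = factor(rep(letters[1:5], each = 3)), x = rnorm(15))
X  <- sparse.model.matrix(~ g + x, data = dd)
dim(X)    # 15 x 6: intercept + 4 treatment contrasts for g + slope for x
class(X)  # a sparse matrix class, so the many zeros in the contrasts cost nothing
```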
Re: [R] lmer
On Tue, Feb 9, 2010 at 4:38 AM, Demirtas, Hakan demir...@uic.edu wrote: Does lmer do three-level mixed-effects models? Yes. What forms of outcome variables can it handle (continuous, survival, binary)? I'd appreciate any help. Continuous outcomes for lmer. glmer can handle binary outcomes. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
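A sketch of what a three-level specification looks like in lmer/glmer syntax; no model is fit and all names are invented, so only base R formula tools are exercised:

```r
# Students within classrooms within schools, written two equivalent ways:
f1 <- y ~ x + (1 | school) + (1 | school:classroom)  # explicit nesting terms
f2 <- y ~ x + (1 | school/classroom)                 # '/' shorthand, same model
# For a binary outcome the same formula goes to glmer(..., family = binomial).
all.vars(f2)
```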
Re: [R] Potential problem with subset !!!
On Fri, Feb 12, 2010 at 9:22 AM, Arnaud Mosnier a.mosn...@gmail.com wrote: Dear useRs, Just a little post to provide the answer to a problem that took me some time to resolve! Hope that reading this will permit others to avoid that error. When using the subset function, writing subset(data, data$columnname == X) or subset(data, columnname == X) does the same thing. Thus, the function considers that an argument name given after the comma (like columnname) is the name of a column of the data frame considered. A problem occurs when other arguments such as X are the names of both a column of the data frame and an object! Here is an example: df <- data.frame(ID = c("a","b","c","b","e"), Other = 1:5) ID <- unique(df$ID) ID ## Now the potential problem !! subset(df, df$ID == ID[4]) ## BE CAREFUL: the subset function uses the column ID of the data.frame ## and NOT the object ID containing the unique values Sorry if it seems obvious for some of you, but hope that others find it useful !! Myself, I think it would be obvious to anyone who had read the documentation, for which the third paragraph is "For data frames, the 'subset' argument works on the rows. Note that 'subset' will be evaluated in the data frame, so columns can be referred to (by name) as variables in the expression (see the examples)." Arnaud __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
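A minimal demonstration of the trap with invented data: inside subset() the name ID resolves to the column, not the workspace object:

```r
df <- data.frame(ID = c("a", "b", "c"), Other = 1:3)
ID <- c("z", "c")                      # workspace object with the same name

# Inside subset() the expression is evaluated in df, so both IDs are the column:
got  <- subset(df, ID == ID[2])$Other  # column's ID[2] is "b", so this is row 2
# Ordinary indexing evaluates ID in the workspace, so ID[2] is "c":
want <- df[df$ID == ID[2], "Other"]    # row 3
c(got, want)
```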
Re: [R] Why double quote is preferred?
On Fri, Feb 12, 2010 at 4:25 PM, blue sky bluesky...@gmail.com wrote: ?'`' shows the following: Single and double quotes delimit character constants. They can be used interchangeably but double quotes are *preferred* (and character constants are printed using double quotes), so single quotes are normally only used to delimit character constants containing double quotes. It is not clear to me why double quote is preferred (I don't think that "character constants are printed using double quotes" should be the reason, in the sense that single quote could be used for printing if we wanted to do so). It seems that double quote and single quote can be used interchangeably, except that single quotes need to be escaped by backslash in single-quoted strings, and double quotes in double-quoted strings. Could somebody explain why double quote is preferred? To avoid confusion for those who are accustomed to programming in the C family of languages (C, C++, Java), where there is a difference in the meaning of single quotes and double quotes. A C programmer reads 'a' as a single character and "a" as a character string consisting of the letter 'a' followed by a null character to terminate the string. In R there is no character data type; there are only character strings. For consistency with other languages it helps if character strings are delimited by double quotes. The single quote version in R is for convenience. On most keyboards you don't need to use the shift key to type a single quote but you do need the shift for a double quote. P.S. I do not plan to get into an extended debate on this issue, Peng. (As others have pointed out, it is considered bad form to disguise your identity on this list.) Your opinions on the design of the language have been noted but if you want the language to be redesigned to your specifications, you will need to fork your own version.
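A small base-R illustration of the points above:

```r
identical('abc', "abc")   # TRUE: the delimiters don't change the string

s1 <- "he said 'hi'"      # double quotes let you embed single quotes freely
s2 <- 'he said "hi"'      # and vice versa
print(s2)                 # displayed with double quotes, escaping as needed
nchar(s1)
```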
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lm function in R
On Sat, Feb 13, 2010 at 7:09 PM, Daniel Malter dan...@umd.edu wrote: It seems to me that your question is more about the econometrics than about R. Any introductory econometrics textbook or compendium on econometrics will cover this, as it is basic. See, for example, Greene 2006 or Wooldridge 2002. Say X is your data matrix, which contains columns for each of the individual variables (x), columns with their interactions, and one column of 1s for the intercept. Let y be your dependent variable. Then, OLS estimates are computed by (X'X)^(-1) X'y Or in R solve(t(X)%*%X)%*%t(X)%*%y But that is not the way that the coefficients are calculated in R. It is the formula that is given in text books but, like most text book formulas, is not suitable for computation, especially with large data sets and complex models. There are many practical ways to calculate the least squares coefficients in large data sets and they all involve decompositions. For a Hadoop environment a scatter-gather approach would be to form Cholesky decompositions of the cross-products of sections of rows of the model matrix then combine the decompositions. Best, Daniel - cuncta stricte discussurus - -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Something Something Sent: Saturday, February 13, 2010 5:04 PM To: Bert Gunter Cc: r-help@r-project.org Subject: Re: [R] lm function in R I tried.. mod <- lm(Y ~ X1*X2*X3, na.action = na.exclude) formula(mod) This produced Y ~ X1 * X2 * X3 When I typed just mod I got: Call: lm(formula = Y ~ X1 * X2 * X3, na.action = na.exclude) Coefficients:
(Intercept)     X11     X21     X31   X11:X21   X11:X31   X21:X31   X11:X21:X31
   177.9245  0.2005  2.4482  3.1216    0.8127  -26.6166   -3.0398       29.6049
I am trying to figure out how R computed all these coefficients.
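Bates's scatter-gather idea (crossproducts accumulated over sections of rows, then a Cholesky-based solve) can be sketched in base R with simulated data. This is an illustration of the approach, not lm()'s actual internals (lm uses a QR decomposition of X itself):

```r
set.seed(1)
n <- 1000; p <- 4
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # model matrix with intercept
y <- drop(X %*% c(2, -1, 0.5, 3)) + rnorm(n)

# "Scatter": accumulate X'X and X'y over chunks of rows
XtX <- matrix(0, p, p)
Xty <- numeric(p)
for (idx in split(seq_len(n), rep(1:10, each = n / 10))) {
  Xc  <- X[idx, , drop = FALSE]
  XtX <- XtX + crossprod(Xc)
  Xty <- Xty + drop(crossprod(Xc, y[idx]))
}

# "Gather": solve the normal equations via the Cholesky factor of X'X
R <- chol(XtX)
beta_hat <- drop(backsolve(R, forwardsolve(t(R), Xty)))

all.equal(beta_hat, unname(coef(lm(y ~ X - 1))))  # agrees with lm()
```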
On Sat, Feb 13, 2010 at 1:30 PM, Bert Gunter gunter.ber...@gene.com wrote: ?formula Bert Gunter Genentech Nonclinical Statistics -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Something Something Sent: Saturday, February 13, 2010 1:24 PM To: Daniel Nordlund Cc: r-help@r-project.org Subject: Re: [R] lm function in R Thanks Dan. Yes that was very helpful. I didn't see the change from '*' to '+'. Seems like when I put * it means an interaction; when I put + it's not an interaction. Is it correct to assume then that... When I put + R evaluates the following equation: Y-hat = b0 + b1X1 + b2X2 + ... + bkXk But when I put * R evaluates the following equation: Y-hat = b0 + b1X1 + b2X2 + ... + bkXk + b12X1X2 + b13X1X3 + ... Is this correct? If it is, then can someone point me to any sources that will explain how the coefficients (such as b0...bk, b12..., b123...) are calculated. I guess one source is the R source code :) but is there any other documentation anywhere? Please let me know. Thanks. On Fri, Feb 12, 2010 at 5:54 PM, Daniel Nordlund djnordl...@verizon.net wrote: -Original Message- From: r-help-boun...@r-project.org [mailto: r-help-boun...@r-project.org] On Behalf Of Something Something Sent: Friday, February 12, 2010 5:28 PM To: Phil Spector; r-help@r-project.org Subject: Re: [R] lm function in R Thanks for the replies everyone. Greatly appreciate it. Some progress, but now I am getting the following values when I don't use as.factor 13.14167 25.11667 28.34167 49.14167 40.39167 66.86667 Is that what you guys get? If you look at Phil's response below, no, that is not what he got. The difference is that you are specifying an interaction, whereas Phil did not (because the equation you initially specified did not include an interaction). Use Y ~ X1 + X2 instead of Y ~ X1*X2 for your formula.
On Fri, Feb 12, 2010 at 5:00 PM, Phil Spector spec...@stat.berkeley.edu wrote: By converting the two variables to factors, you are fitting an entirely different model. Leave out the as.factor stuff and it will work exactly as you want it to.
dat
  V1 V2 V3 V4
1 s1 14  4  1
2 s2 23  4  2
3 s3 30  7  2
4 s4 50  7  4
5 s5 39 10  3
6 s6 67 10  6
names(dat) = c('id','y','x1','x2')
z = lm(y~x1+x2,dat)
predict(z)
1 2 3 4 5 6 15.16667 24.7 27.7 46.7 40.16667 68.7
- Phil Spector, Statistical Computing Facility, Department of Statistics, UC Berkeley, spec...@stat.berkeley.edu
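The difference between + and * can also be seen directly in the columns of the model matrix; a small invented example with two two-level factors:

```r
dd <- data.frame(x1 = factor(c("a", "a", "b", "b")),
                 x2 = factor(c("c", "d", "c", "d")))

colnames(model.matrix(~ x1 + x2, dd))  # "(Intercept)" "x1b" "x2d"
colnames(model.matrix(~ x1 * x2, dd))  # adds the interaction column "x1b:x2d"
```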
Re: [R] False convergence of a glmer model
On Tue, Feb 16, 2010 at 9:05 AM, Shige Song shiges...@gmail.com wrote: Dear All, I am trying to fit a 2-level random intercept logistic regression on a data set of 20,000 cases. The model is specified as the following: m1 <- glmer(inftmort ~ as.factor(cohort) + (1|code), family=binomial, data=d) I got Warning message: In mer_finalize(ans) : false convergence (8) That message means that the optimizer function, nlminb, got stalled. It has converged but the point at which it has converged is not clearly the optimum. In many cases this just indicates that the optimizer is being overly cautious. However, it can also mean that the problem is ill-defined. The fact that the second parameter is -7.46 is likely the problem. A difference in the probability of infant mortality between levels of cohort on the order of -7.5 on the logit scale is huge. Do the estimated probabilities at this value of the parameters make sense? P.S. Questions of this sort may be more readily answered in the R-SIG-mixed-models mailing list.
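Bates's sanity check (do the estimated probabilities make sense?) can be done directly with the inverse logit. A base-R sketch, with -7.46 taken from the reply above and 3% chosen here as an arbitrary illustrative mortality rate:

```r
plogis(-7.46)   # probability implied by -7.46 on the logit scale, roughly 6e-4
qlogis(0.03)    # for comparison, a 3% rate sits around -3.5 on the logit scale
```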
With the verbose=TRUE option, I was able to get the following iteration history: 0: 3456.4146: 1.15161 -3.99068 -0.498790 -0.122116 1: 3361.3370: 1.04044 -4.38172 -0.561756 -0.289991 2: 3303.7986: 1.48296 -4.40741 -0.566208 -0.259730 3: 3147.5537: 1.93037 -5.14388 -0.682530 -0.443006 4: 3123.6900: 2.10192 -5.18784 -0.685558 -0.428320 5: 2988.6287: 2.94890 -6.31023 -0.825286 -0.586282 6: 2958.3364: 3.25396 -6.88256 -0.316988 0.572428 7: 2853.7703: 4.22731 -7.44955 -0.279492 -0.294353 8: 2844.8476: 4.36583 -7.43902 -0.293111 -0.267308 9: 2843.2879: 4.39182 -7.44895 -0.298791 -0.265899 10: 2840.2676: 4.44288 -7.47103 -0.310477 -0.263945 11: 2839.0890: 4.46259 -7.48131 -0.315320 -0.263753 12: 2838.8550: 4.46649 -7.48344 -0.316292 -0.263745 13: 2838.3889: 4.47428 -7.48771 -0.318236 -0.263737 14: 2838.3703: 4.47459 -7.48788 -0.318314 -0.263738 15: 2838.2216: 4.47708 -7.48927 -0.318936 -0.263742 16: 2838.2157: 4.47718 -7.48932 -0.318961 -0.263742 17: 2838.2145: 4.47720 -7.48934 -0.318966 -0.263742 18: 2838.2121: 4.47724 -7.48936 -0.318976 -0.263742 19: 2838.2120: 4.47724 -7.48936 -0.318976 -0.263742 20: 2838.2118: 4.47724 -7.48936 -0.318977 -0.263742 21: 2838.2118: 4.47724 -7.48936 -0.318977 -0.263742 22: 2838.2118: 4.47724 -7.48936 -0.318977 -0.263742 23: 2838.2118: 4.47724 -7.48936 -0.318977 -0.263742 24: 2838.2118: 4.47724 -7.48936 -0.318977 -0.263742 25: 2838.2118: 4.47724 -7.48936 -0.318977 -0.263742 26: 2838.2118: 4.47724 -7.48936 -0.318977 -0.263742 27: 2838.2118: 4.47724 -7.48936 -0.318977 -0.263742 28: 2838.2118: 4.47724 -7.48936 -0.318977 -0.263742 29: 2838.2118: 4.47724 -7.48936 -0.318977 -0.263742 30: 2838.2118: 4.47724 -7.48936 -0.318977 -0.263742 31: 2838.2118: 4.47724 -7.48936 -0.318977 -0.263742 32: 2838.2118: 4.47724 -7.48936 -0.318977 -0.263742 33: 2837.8154: 4.46385 -7.47464 -0.495684 -0.263985 34: 2837.7613: 4.46641 -7.47053 -0.498335 -0.264014 35: 2837.6418: 4.47259 -7.46200 -0.501644 -0.264141 36: 2837.5982: 4.47485 -7.45928 -0.502598 -0.264214 37: 
2837.5850: 4.47537 -7.45882 -0.502848 -0.264237 38: 2837.5307: 4.47674 -7.45848 -0.503216 -0.264313 39: 2837.5014: 4.47725 -7.45875 -0.503273 -0.264344 40: 2837.4955: 4.47735 -7.45881 -0.503284 -0.264350 41: 2837.4944: 4.47738 -7.45882 -0.503286 -0.264351 42: 2837.4941: 4.47738 -7.45882 -0.503287 -0.264351 43: 2837.4936: 4.47739 -7.45883 -0.503288 -0.264352 44: 2837.4935: 4.47739 -7.45883 -0.503288 -0.264352 45: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 46: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 47: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 48: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 49: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 50: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 51: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 52: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 53: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 54: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 55: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 56: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 57: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 58: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 59: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 60: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 61: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 62: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 63: 2837.4931: 4.47740 -7.45883 -0.503289 -0.264352 By the way, the same model can be fitted using Stata using xtlogit and xtmelogit; a simpler model without
Re: [R] lmer - error asMethod(object) : matrix is not symmetric
This is similar to another question on the list today. On Tue, Feb 16, 2010 at 4:39 AM, Luisa Carvalheiro lgcarvalhe...@gmail.com wrote: Dear R users, I am having problems using package lme4. I am trying to analyse the effect of a continuous variable (Dist_NV) on a count data response variable (SR_SUN) using Poisson error distribution. However, when I run the model: summary(lmer((SR_SUN)~Dist_NV + (1|factor(Farm_code)) , family=poisson, REML=FALSE)) 1 error message and 1 warning message show up: in asMethod(object) : matrix is not symmetric [1,2] In addition: Warning message: In mer_finalize(ans) : singular convergence (7) So the first thing to do is to include the optional argument verbose = TRUE in the call to lmer. (Also, REML = FALSE is ignored for Generalized Linear Mixed Models and can be omitted, although there is no harm in including it.) You need to know where the optimizer is taking the parameter values before you can decide why. P.S. Questions like this will probably be more readily answered on the R-SIG-Mixed-Models mailing list. A model including Dist_NV together with other variables runs with no problems. What am I doing wrong? Thank you, Luisa -- Luisa Carvalheiro, PhD Southern African Biodiversity Institute, Kirstenbosch Research Center, Claremont University of Pretoria Postal address - SAWC Pbag X3015 Hoedspruit 1380, South Africa telephone - +27 (0) 790250944 carvalhe...@sanbi.org lgcarvalhe...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lmer - error asMethod(object) : matrix is not symmetric
On Tue, Feb 16, 2010 at 10:54 AM, Luisa Carvalheiro lgcarvalhe...@gmail.com wrote: Dear Douglas, Thank you for your reply. Just some extra info on the dataset: In my case Number of obs is 33, and number of groups of factor(Farm_code) is 12. This is the information on iterations I get: summary(lmer(round(SR_SUN)~Dist_NV + (1|factor(Farm_code)) , family=poisson, verbose =TRUE)) 0: 60.054531: 1.06363 2.14672 -0.000683051 1: 60.054531: 1.06363 2.14672 -0.000683051 Error in asMethod(object) : matrix is not symmetric [1,2] In addition: Warning message: In mer_finalize(ans) : singular convergence (7) When I run a similar model (exp variable Dist_hives) the number of iterations is 11: summary(lmer(round(SR_SUN)~Dist_hives + (1|factor(Farm_code)) , family=poisson, verbose =TRUE)) 0: 61.745238: 0.984732 1.63769 0.000126484 1: 61.648229: 0.984731 1.63769 -2.08637e-05 2: 61.498777: 0.984598 1.63769 4.11867e-05 3: 47.960908: 0.381062 1.63585 6.77029e-05 4: 46.223789: 0.250732 1.66727 8.31854e-05 5: 46.23: 0.250732 1.66727 6.97790e-05 6: 46.216710: 0.250730 1.66727 7.60560e-05 7: 46.168835: 0.230386 1.64883 9.16430e-05 8: 46.165955: 0.228062 1.65658 8.70694e-05 9: 46.165883: 0.228815 1.65737 8.63400e-05 10: 46.165883: 0.228772 1.65734 8.63698e-05 11: 46.165883: 0.228772 1.65734 8.63701e-05 I am very confused with the fact that it runs with Dist_hives and not with Dist_NV. Both variables are distance values, the first having no obvious relation with the response variable and the second (Dist_NV) seems to have a negative effect on SR_SUN. As you say, Dist_hives has very little relationship to the response variable. The two fixed-effects coefficients are the last two parameters in the iteration output (the first parameter is the standard deviation of the random effects). So the slope with respect to Dist_hives for the linear predictor is about 8.637e-05. Either you have very large magnitudes of Dist_hives or that variable does not have much predictive power.
For the second (Dist_NV) variable, the optimization algorithm is not able to make progress from the starting estimates. This may be an indication that the problem is badly scaled. Are the values of Dist_NV very large? If so, you may want to change the unit (say from meters to kilometers) so the values are much smaller. It may also help to use a starting estimate for the standard deviation of the random effects derived from the other model. That is, include start = 0.22 in the call to lmer. Does this information help identify the problem with my data/analysis? Thank you, Luisa On Tue, Feb 16, 2010 at 5:35 PM, Douglas Bates ba...@stat.wisc.edu wrote: This is similar to another question on the list today. On Tue, Feb 16, 2010 at 4:39 AM, Luisa Carvalheiro lgcarvalhe...@gmail.com wrote: Dear R users, I am having problems using package lme4. I am trying to analyse the effect of a continuous variable (Dist_NV) on a count data response variable (SR_SUN) using Poisson error distribution. However, when I run the model: summary(lmer((SR_SUN)~Dist_NV + (1|factor(Farm_code)) , family=poisson, REML=FALSE)) 1 error message and 1 warning message show up: in asMethod(object) : matrix is not symmetric [1,2] In addition: Warning message: In mer_finalize(ans) : singular convergence (7) So the first thing to do is to include the optional argument verbose = TRUE in the call to lmer. (Also, REML = FALSE is ignored for Generalized Linear Mixed Models and can be omitted, although there is no harm in including it.) You need to know where the optimizer is taking the parameter values before you can decide why. P.S. Questions like this will probably be more readily answered on the R-SIG-Mixed-Models mailing list. A model including Dist_NV together with other variables runs with no problems. What am I doing wrong?
Thank you, Luisa -- Luisa Carvalheiro, PhD Southern African Biodiversity Institute, Kirstenbosch Research Center, Claremont University of Pretoria Postal address - SAWC Pbag X3015 Hoedspruit 1380, South Africa telephone - +27 (0) 790250944 carvalhe...@sanbi.org lgcarvalhe...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
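Bates's two suggestions (rescaling the covariate and reusing the converged model's random-effect SD as a starting value) can be sketched together. This is a hypothetical sketch in the lme4 syntax of the time (a family argument to lmer, a scalar start); the data frame farms and its columns are stand-ins, not objects from the thread.

```r
## Hedged sketch: 'farms' is a hypothetical data frame with columns
## SR_SUN, Dist_NV (in meters), and Farm_code.
library(lme4)

farms$Dist_NV_km <- farms$Dist_NV / 1000  # rescale meters -> kilometers

fit <- lmer(round(SR_SUN) ~ Dist_NV_km + (1 | factor(Farm_code)),
            family = poisson, data = farms, verbose = TRUE,
            start = 0.22)  # random-effect SD taken from the model that converged
```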
Re: [R] False convergence of a glmer model
On Tue, Feb 16, 2010 at 9:38 AM, Shige Song shiges...@gmail.com wrote: Hi Doug, Thanks. Next time I will post it to the R-SIG-Mixed-Models mailing list, as you suggested. I have added R-SIG-Mixed-Models to the cc: list. I suggest we drop the cc: to R-help after this message. With respect to your question, the answer is no, these parameters do not make sense. Here is the Stata output from exactly the same model:

. xi: xtlogit inftmort i.cohort, i(code)
i.cohort _Icohort_1-3 (naturally coded; _Icohort_1 omitted)
Fitting comparison model:
Iteration 0: log likelihood = -1754.4476
Iteration 1: log likelihood = -1749.3366
Iteration 2: log likelihood = -1749.2491
Iteration 3: log likelihood = -1749.2491
Fitting full model:
tau = 0.0 log likelihood = -1749.2491
tau = 0.1 log likelihood = -1743.8418
tau = 0.2 log likelihood = -1739.0769
tau = 0.3 log likelihood = -1736.4914
tau = 0.4 log likelihood = -1739.5415
Iteration 0: log likelihood = -1736.4914
Iteration 1: log likelihood = -1722.6629
Iteration 2: log likelihood = -1694.9114
Iteration 3: log likelihood = -1694.6509
Iteration 4: log likelihood = -1694.649
Iteration 5: log likelihood = -1694.649

Random-effects logistic regression; Number of obs = 21694; Group variable: code; Number of groups = 10789; Random effects u_i ~ Gaussian; Obs per group: min = 1, avg = 2.0, max = 9; Wald chi2(2) = 8.05; Log likelihood = -1694.649; Prob > chi2 = 0.0178

Well, the quantities being displayed in the iteration output from glmer are the deviance and the parameter values. This Stata log-likelihood corresponds to a deviance of 3389.3. It is possible that glmer and Stata don't measure the log-likelihood on the same scale but, if not, then the estimates where glmer gets stalled are producing a lower deviance of 2837.49. The reason that glmer is getting stalled is the coefficient of -7.45883 for the intercept. This corresponds to a mean success probability of 0.0005 for cohort 1.
The Stata coefficient estimate of -5.214642 corresponds to a mean success probability of 0.0055 for cohort 1, which is still very, very small. The success probabilities for the other cohorts are going to be even smaller. What is the overall proportion of zeros in the response? The optimization to determine the maximum likelihood estimates of the coefficients is being done on the scale of the linear predictor. When the values of the linear predictor become very small or very large, the fitted values become insensitive to the coefficients. The fact that one program converges and another one doesn't may have more to do with the convergence criterion than with the quality of the fit.

  inftmort   |      Coef.   Std. Err.       z    P>|z|   [95% Conf. Interval]
  _Icohort_2 |  -.5246846   .1850328   -2.84    0.005   -.8873422   -.1620269
  _Icohort_3 |  -.1424331   .140369    -1.01    0.310   -.4175513    .132685
  _cons      |  -5.214642   .1839703  -28.35    0.000   -5.575217   -4.854067
  /lnsig2u   |   .9232684   .1416214                     .6456956    1.200841
  sigma_u    |   1.586665   .1123528                     1.381055    1.822885
  rho        |   .4335015   .0347791                     .3669899    .5024984
  Likelihood-ratio test of rho=0: chibar2(01) = 109.20   Prob >= chibar2 = 0.000

The difference is quite huge, and Stata did not have any difficulties estimating this model, which makes me feel that I might have gotten some very basic specification wrong in my R model... Best, Shige On Tue, Feb 16, 2010 at 10:29 AM, Douglas Bates ba...@stat.wisc.edu wrote: On Tue, Feb 16, 2010 at 9:05 AM, Shige Song shiges...@gmail.com wrote: Dear All, I am trying to fit a 2-level random-intercept logistic regression on a data set of 20,000 cases. The model is specified as the following: m1 <- glmer(inftmort ~ as.factor(cohort) + (1|code), family=binomial, data=d) I got Warning message: In mer_finalize(ans) : false convergence (8) That message means that the optimizer function, nlminb, got stalled. It has converged, but the point at which it has converged is not clearly the optimum.
In many cases this just indicates that the optimizer is being overly cautious. However, it can
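The intercept-to-probability conversions quoted in this thread come from the inverse logit, which in R is plogis(); a quick check of the two values Bates cites:

```r
## Inverse-logit check of the probabilities quoted above.
plogis(-5.214642)  # Stata intercept: about 0.0054
plogis(-7.45883)   # intercept where glmer stalled: about 0.0006
## plogis(eta) is exp(eta) / (1 + exp(eta))
```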
Re: [R] Use of R in clinical trials
On Thu, Feb 18, 2010 at 12:36 PM, Bert Gunter gunter.ber...@gene.com wrote: The key dates are 1938 and 1962. The FDC act of 1938 essentially mandated (demonstration of) safety. The tox testing infrastructure grew from that. At that time, there were no computers, little data, little statistics methodology. Statistics played little role -- as is still mainly the case today for safety. Any safety findings whatever in safety testing raise a flag; statistical significance in the multiple-testing framework is irrelevant. 1962 saw the Kefauver-Harris Amendments that mandated demonstration of efficacy. That was the key. The whole clinical-trial framework and the relevant statistical design and analysis infrastructure flowed from that regulatory requirement. SAS's development soon after was therefore the first direct response to the statistical software needs that resulted. Note also that statistical software was in its infancy at this time: before SAS there was Fortran and COBOL; there was no statistical software. So, as you can see, there essentially was **no** before SAS. (Corrections/additional information welcome!) My recollection is that the BMD programs (which, in a later version, became BMDP) predated SAS and were specifically for BioMeDical analysis. Early statistical software was oriented to application areas: SPSS (Statistical Package for the Social Sciences) was the predominant system used in the social sciences, BMD(P) in biomedical areas, and SAS in agricultural/life-sciences settings. Eventually the more coherent framework and comparative ease of use of SAS (yes, I am saying that with a straight face -- in the days of batch jobs submitted on punched cards with data residing on magnetic tape, there were different standards of ease of use) won over more users in medical fields. Bert Gunter Genentech Nonclinical Biostatistics -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Christopher W.
Ryan Sent: Thursday, February 18, 2010 10:09 AM To: r-help@r-project.org Cc: p.dalga...@biostat.ku.dk Subject: Re: [R] Use of R in clinical trials Pure Food and Drug Act: 1906 FDA: 1930s founding of SAS: early 1970s (from the history websites of SAS and FDA) What did pharmaceutical companies use for data analysis before there was SAS? And was there much angst over the change to SAS from whatever was in use before? Or was there not such emphasis on and need for thorough data analysis back then? --Chris Christopher W. Ryan, MD SUNY Upstate Medical University Clinical Campus at Binghamton 425 Robinson Street, Binghamton, NY 13904 cryanatbinghamtondotedu If you want to build a ship, don't drum up the men to gather wood, divide the work and give orders. Instead, teach them to yearn for the vast and endless sea. [Antoine de St. Exupery] Bert Gunter wrote: DISCLAIMER: This represents my personal view and in no way reflects that of my company. Warning: This is a long harangue that contains no useful information on R. May be wise to delete without reading. -- Sorry folks, I still don't understand your comments. As Cody's original post pointed out, there are a host of factors other than ease of programmability or even quality of results that weigh against any change. To reiterate, all companies have a huge infrastructure of **validated SAS code** that would have to be replaced. This, in itself, would take years and cost tens of millions of dollars at least. Also to reiterate, it's not only statistical/reporting functionality but even more the integration into the existing clinical database systems that would have to be rewritten **and validated**. All this would have to be done while continuing full steam on existing submissions. It is therefore not surprising to me that no pharma company in its right mind even contemplates undertaking such an effort. To put these things into perspective. 
Let's say Pfizer has 200 SAS programmers (it's probably more, as they are a large pharma, but I dunno). If each programmer costs, conservatively, $200K U.S. per year fully loaded, that's $40 million U.S. for SAS programmers. And this is probably a severe underestimate. So the $14M quoted below is chicken feed -- it doesn't even make the radar. To add further perspective, a single (large) pivotal clinical trial can easily cost $250M. A delay in approval due to fooling around trying to shift to a whole new software system could easily cost hundreds of millions to billions if it means a competitor gets to the marketplace first. So, to repeat, SAS costs are chicken feed. Yes, I suppose that the present system institutionalizes mediocrity. How could it be otherwise in any such large-scale enterprise? Continuity, reliability, and robustness are all orders of magnitude more important for both the FDA and pharma to get safe and efficacious drugs to the public. Constantly
Re: [R] Print table in lme
On Sat, Feb 20, 2010 at 4:28 AM, Dieter Menne dieter.me...@menne-biomed.de wrote: Daniel-6 wrote: Hello, I'm trying to add lme results in a table with lm coef results, but as far as I know, estout or xtable cannot support lme objects. I'm new in R and I'll appreciate some helpful comments. I don't know what estout and xtable do, but if you want to extract the core results of lme, the following should give you a starter. For LaTeX output, you could use latex in Hmisc. Dieter

library(nlme)
fm1 = summary(lme(distance ~ age, data = Orthodont))
tabl = fm1$tTable
tabl
str(tabl)

Thanks, Dieter. If lmemod is a fitted lme model then, as Dieter indicates, you want summary(lmemod)$tTable and, for xtable, you can create the xtable object as xtable(summary(lmemod)$tTable). For most other model-fitting functions the sequence coef(summary(mymodel)) works. Unfortunately, in the nlme package we assigned another meaning to the coef method. In retrospect that wasn't the best idea. The coef(summary(foo)) sequence does work for models fit by the lmer function in the lme4 package. In fact, I was just in communication with David Dahl, the author of xtable, and I believe he will add a method to the xtable package. -- View this message in context: http://n4.nabble.com/Print-table-in-lme-tp1562325p1562727.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
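A minimal, self-contained version of the two extraction routes discussed above (the xtable call is left as a comment since it needs the xtable package installed):

```r
library(nlme)
fm1 <- lme(distance ~ age, data = Orthodont)
summary(fm1)$tTable        # lme: coefficient table as a numeric matrix

fm2 <- lm(distance ~ age, data = Orthodont)
coef(summary(fm2))         # the generic route for most other fitters

## With xtable installed, a LaTeX table would be:
## xtable::xtable(summary(fm1)$tTable)
```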
Re: [R] Print table in lme
On Sat, Feb 20, 2010 at 10:27 AM, Dieter Menne dieter.me...@menne-biomed.de wrote: Douglas Bates-2 wrote: On Sat, Feb 20, 2010 at 4:28 AM, Dieter Menne For most other model-fitting functions the sequence coef(summary(mymodel)) works. Unfortunately, in the nlme package we assigned another meaning to the coef method. In retrospect that wasn't the best idea. Good to know. I used to stutter some incomprehensible stuff when students asked me about this. Luckily, the frequency of that question has declined in favor of: why are there no p values in lmer tables? Aha, you have caught me out. Failing to provide p-values was a clever diversionary tactic on my part. -- View this message in context: http://n4.nabble.com/Print-table-in-lme-tp1562325p1562922.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] get problem
On Wed, Feb 24, 2010 at 3:56 PM, Duncan Murdoch murd...@stats.uwo.ca wrote: On 24/02/2010 4:31 PM, Georg Ehret wrote: Dear R community, I do not understand why this does not work...:

betaS$SBP
[1] 0.03274 -0.04216 -0.08986 -0.45980 0.60320 -0.63070 -0.05682 0.20130
t <- c("betaS$SBP")
t
[1] "betaS$SBP"
get(t)
Error in get(t) : object 'betaS$SBP' not found

The problem is that betaS$SBP is an expression that extracts a component of the betaS object; it's not the name of an object. get() only gets single objects, it doesn't evaluate expressions. You need something like eval(parse(text=t)) to get what you want. But if you have read fortune("rethink") you may approach the problem differently. P.S. I agree with Thomas, which may seem strange to anyone who has read the nlme sources. :-) [I am trying to use the variable t in a loop to call many different objects, but the pasting together did not work and this simple example does not either...] Thank you and best regards, Georg. *** Georg Ehret JHU Baltimore [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
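Both routes Duncan mentions, side by side; betaS here is a stand-in list, not the poster's actual object. The "rethink" usually amounts to looping over component names with `[[` instead of building expression strings:

```r
betaS <- list(SBP = c(0.03274, -0.04216, -0.08986))  # stand-in data

nm <- "betaS$SBP"
eval(parse(text = nm))   # works, but is the discouraged route

## The rethought version: index by name rather than parse expressions
for (v in names(betaS)) {
    x <- betaS[[v]]      # fetch the component directly
    ## ... work with x ...
}
```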
Re: [R] Cross Compiling
An alternative to cross-compiling when you have a source package is to use the Win-builder facility at win-builder.R-project.org Thanks to Uwe for providing this facility. I find it much, much easier than trying to cross-compile or to set up a Windows computer for compiling R packages. On 9/14/07, Scott Hyde [EMAIL PROTECTED] wrote: Hello All, I have a Linux computer and do all of my work from it. However, I teach also, which means that many of my students use windows. Hence, I need to create packages that work under windows as well as Linux. I have tried to follow the directions at http://cran.r-project.org/doc/contrib/cross-build.pdf which is the document Building Microsoft Windows Versions of R and R packages under Intel Linux. This has been very helpful. However, the file R_Tcl.zip is no longer available, so I cannot compile R for Windows using the make R command as described in the document. Is it necessary to have the Tcl sources in there? If it is, how should the directions be modified to enable the complete compilation of R? None of my code contains C, Fortran, or any other language. It is just plain R code. I would think that this would be easier to convert over. Is it? I tried the following and it seems to work, but I'd like to know if it is safe. 1. Build package with pre-compiled binary package option R CMD build --binary pkgname 2. convert the resulting tar.gz file to a zip archive. 3. Install it on a windows machine. This process successfully works when I install it on a windows machine, but I have no idea how safe it is. -- * Scott K. Hyde Assistant Professor of Statistics and Mathematics School of Computing Brigham Young University -- Hawaii __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] Accessing the fixed- and random-effects variance-covariance matrices of an nlme model
On 9/26/07, Rob Forsyth [EMAIL PROTECTED] wrote: I would appreciate confirmation that the function vcov(model.nlme) gives the var-cov matrix of the fixed effects in an nlme model. It gives the approximate variance-covariance matrix for the fixed-effects parameters in the model. (Exact variance-covariance matrices are very difficult to evaluate for nonlinear models and even more difficult to evaluate for nonlinear mixed-effects models.) The documentation for the generic function vcov and some of its methods can be accessed by ?vcov.lme Presumably the random-effects var-cov matrix is given by cov(ranef(model.nlme))? No. The BLUPs of the random effects (actually, as Alan James described the situation, for a nonlinear model these are just like the BLUPs (Best Linear Unbiased Predictors) except that they are not linear, and they're not unbiased, and there is no clear sense in which they are best but, other than that, ...) are not guaranteed to have an observed variance-covariance matrix that corresponds to the estimate of the variance-covariance matrix of the random effects. Those estimates are returned by the VarCorr function. See ?VarCorr The implementation of VarCorr in the nlme package is not optimal in that it returns the result as a character matrix, and you need to convert it to a numeric representation if you want to do anything other than print it out. I prefer the implementation in the lme4 package, which returns a list of numeric matrices that are formatted by a specific function (called formatVC but hidden in the lme4 package's namespace) when needed.
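The distinction Bates draws can be seen directly; this is a sketch on the Orthodont data (any grouped data set would do), including the character-to-numeric conversion he mentions for nlme's VarCorr:

```r
library(nlme)
fm <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

vc <- VarCorr(fm)             # character matrix in nlme
as.numeric(vc[, "Variance"])  # convert before computing with it

cov(ranef(fm))  # observed covariance of the BLUPs: NOT the same estimate
```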
Re: [R] fitted values in LMER for the fixed-effects only
On 9/28/07, Anouk Simard [EMAIL PROTECTED] wrote: Hi, I would like to extract the fitted values from a model fit using lmer, but only for the fixed portion of the model and not for the fixed and random portions (e.g. it is the procedure outpm or outp in SAS). I am aware of the function fitted() but I am not sure whether it gives the fitted values for both the fixed and random effects or only the fixed. I looked in the R help and the R list and I haven't found much detail about it except this comment from Douglas Bates in January 2006: Would it help if there were an option in the fitted method to allow for fixed-effects only versus fixed- and random-effects? As you say, because lmer models do not need to be hierarchical it is not obvious what it would mean to include some but not all of the random-effects terms in the fitted values. However, it is easy and unambiguous to define fitted values for the fixed effects only. Up until a few days ago there was an option to do this but then I changed the calculation of the fitted values in an attempt to clean up the code. The calculation of the level = 0 fitted values in the new representation of the fitted model is quite easy. It is fm1@X %*% fixef(fm1) I tried the last formula, but I want to confirm that this is still the proper and best way to get fitted values for the fixed effects using lmer. Yes.
Re: [R] fitted values in LMER for the fixed-effects only
On 9/29/07, Dieter Menne [EMAIL PROTECTED] wrote: On 9/28/07, Anouk Simard Anouk.Simard at bio.ulaval.ca wrote: I would like to extract the fitted values from a model using lmer but only for the fixed portion of the model and not for the fixed and random portion (e.g. it is the procedure outpm or outp in SAS). .. Quoting Douglas Bates bates at stat.wisc.edu The calculation of the level = 0 fitted values in the new representation of the fitted model is quite easy. It is fm1@X %*% fixef(fm1) Douglas Bates replied Yes. Mmm.. If I consider a few hundred messages in this list on the subject, it's considered dirty to use the fm1@X or fm1$X construct. So should we expect that this is final (no longer fitted(), augPred etc.), or is it work in prooogres? This is to indicate that I am rather slow in getting the lme4 package finished? :-) I agree that I am, and no one is more impatient with the progress than I. You are correct that it is considered bad form to reach into a fitted model structure and pull out individual components or slots. My brief answer agreeing that this was still the best way of obtaining these particular values should have emphasized the still. I haven't thought of a good way of specifying this particular form in, say, the fitted method. There is a clean way of specifying the levels of random effects incorporated into the fitted values for models with nested random effects only. However, for crossed or partially crossed random effects this is less clearly defined. Perhaps a better alternative would be to define a method for the model.matrix generic for lmer models. Then the calculation could be written as model.matrix(fm1, type = "fixed") %*% fixef(fm1) using only extractor functions. It may be surprising that sometimes the most difficult part of the design of the code in a package is deciding on the names and arguments of functions or methods. In my experience rushing such decisions often leads to a lot of maintenance headaches.
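For comparison, nlme already exposes fixed-effects-only fitted values through the level argument; the lmer equivalents of that era are shown as comments, since they depended on the package version and are untested here:

```r
## nlme: population-level ("level 0") fitted values are built in.
library(nlme)
fm <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)
head(fitted(fm, level = 0))

## lmer equivalents discussed in the thread (version-dependent, untested):
## fm1@X %*% fixef(fm1)                              # slot-based, "bad form"
## model.matrix(fm1, type = "fixed") %*% fixef(fm1)  # proposed extractor route
```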
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lme (or lmer) question
On 10/5/07, Bert Gunter [EMAIL PROTECTED] wrote: Folks: In the following mixed-effects model, the x and residual variance components are nonidentifiable. Yet lme() (or the equivalent in lmer()) happily produces an answer in which the sum of the variance components is the correct estimate of the single variance parameter. Why? -- i.e. why doesn't lme complain?

x <- 1:50
y <- rnorm(50)
m1 <- lme(y ~ 1, rand = ~1|x)

Because we hadn't anticipated that someone would try to do that when we wrote lme? It's the situation of being unable to make the code foolproof because the fools are so inventive. :-) The lmer function (at least the development version) does throw an error for that example although the error message is less than transparent. Creating a kinder, gentler error message for this case is on the ToDo list.

x <- 1:50
y <- rnorm(50)
lmer(y ~ 1 + (1|x))
Error: length(levels(dm$flist[[1]])) < length(Y) is not TRUE

The fact that you get a more-or-less arbitrary answer is related to the way that the optimization of the log-likelihood (or the REML criterion) is performed in lme.
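Why the answer is arbitrary: with one observation per level of x, the random-intercept variance and the residual variance enter the likelihood only through their sum. A hedged sketch (the fitted split can land anywhere; only the sum is meaningful, and depending on the nlme version the fit may warn or fail to converge):

```r
## One observation per group: only sigma_b^2 + sigma^2 is identifiable.
library(nlme)
set.seed(1)
x <- 1:50
y <- rnorm(50)
m1 <- lme(y ~ 1, random = ~ 1 | x)
VarCorr(m1)  # the two components sum to roughly var(y); the split is arbitrary
```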
Re: [R] estfun df
On 10/8/07, Abdus Sattar [EMAIL PROTECTED] wrote: Hello EVERYONE, I need an URGENT help from you please! How can I see the estfun (empirical estimating function) and df (degrees of freedom) from the following mixed model please? (fm1 <- lmer2(Reaction ~ Days + (Days|Subject), sleepstudy)) You have asked this question several times on this list and on the R-SIG-Mixed-Models list. Repeating the question ad nauseam is unlikely to generate a helpful answer. To the best of my knowledge there is no lmer or lmer2 method for the estfun generic. Others may be willing to give you some references on the difficulty of determining degrees of freedom - I don't want to rehash that issue myself. So, despite your repeated requests, there are no good answers to your questions. You cannot see such results because there are, as far as I know, no functions to provide such results. You are welcome to write your own code to provide them. If that is not suitable you may need to take advantage of our money-back guarantee and remove R from your computer to qualify for a full refund of the purchase price.
Re: [R] estfun df
On 10/8/07, Achim Zeileis [EMAIL PROTECTED] wrote: On Mon, 8 Oct 2007, Abdus Sattar wrote: Hello EVERYONE, I need an URGENT help from you please! This type of request is not considered to be very polite; please have a look at the posting guide. How can I see the estfun (empirical estimating function) I guess (because you are not telling us) that you would like to have an estfun() method for lmer objects. I don't provide one in my sandwich package (where the estfun generic is taken from) and AFAIK there is nothing analogous readily available in lme4... But I guess that you should be able to extract/compute the empirical estimating functions from the fitted lmer object. and df (degrees of freedom) from the following mixed model please? (fm1 <- lmer2(Reaction ~ Days + (Days|Subject), sleepstudy)) When you compute logLik(fm1) it returns a logLik object that has a df attribute: attr(logLik(fm1), "df") That degrees of freedom is a count of the number of parameters estimated in the model. That may be what Abdus Sattar wants but I'm not sure. Many times when people refer to the degrees of freedom for linear mixed models they are thinking about the degrees of freedom for a t statistic or the denominator degrees of freedom for an F statistic related to hypothesis tests on the fixed-effects parameters. That was what I was referring to in my response when I, in effect, said I don't want to talk about it any more.
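The df attribute Zeileis describes is easiest to see on a model where the parameter count is unambiguous; for a simple linear model it is the number of coefficients plus one for the residual standard deviation:

```r
## The "df" attribute of a logLik object counts estimated parameters.
fm <- lm(dist ~ speed, data = cars)
attr(logLik(fm), "df")  # 3: intercept, slope, residual standard deviation
```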
Re: [R] lme4 install trouble
On 10/11/07, David Reitter [EMAIL PROTECTED] wrote: After upgrading to R 2.6.0, I'm having trouble running lmer:

model <- lmer(primed ~ log(dist.time)*role + 1|target.utt, data=data.utts)
Error in UseMethod("as.logical") : no applicable method for "as.logical"

That problem originates in the Matrix package. You need to upgrade the Matrix package to version 0.999375-3 for R-2.6.0. After that the installed version of lme4 will function properly. So I thought I'd upgrade lme4 to the latest version, but unfortunately the compilation fails - perhaps there's a missing #include: I think this might be a version skew with the Matrix package. Those macros are defined in a file in the .../R/library/Matrix/include directory that is added to the include path, but you need a recent version of the Matrix package to get those particular macros.

R CMD INSTALL -l ~/usr/lib/R lme4_0.99875-8.tar.gz
* Installing *source* package 'lme4' ...
** libs
gcc -std=gnu99 -I/home/dreitter/usr/lib/R/include -I/home/dreitter/usr/lib/R/include -I/usr/local/include -I/home/dreitter/usr/lib/R/library/Matrix/include -fpic -g -O2 -c glmer.c -o glmer.o
glmer.c: In function 'internal_Gaussian_deviance':
glmer.c:185: error: 'CHM_SP' undeclared (first use in this function)
glmer.c:185: error: (Each undeclared identifier is reported only once
glmer.c:185: error: for each function it appears in.)
glmer.c:185: error: expected ';' before 'Lm'
glmer.c:186: error: 'CHM_FR' undeclared (first use in this function)
glmer.c:186: error: expected ';' before 'Lcp'
glmer.c:189: error: 'CHM_DN' undeclared (first use in this function)
glmer.c:189: error: expected ';' before 'Ltb'
glmer.c:195: error: 'Lcp' undeclared (first use in this function)
glmer.c:196: error: 'Lm' undeclared (first use in this function)
glmer.c:197: error: 'chb' undeclared (first use in this function)
glmer.c:197: error: 'Ltb' undeclared (first use in this function)
glmer.c: In function 'internal_bhat':
glmer.c:396: error: 'CHM_FR' undeclared (first use in this function)
glmer.c:396: error: expected ';' before 'L'
glmer.c:403: error: 'L' undeclared (first use in this function)
glmer.c: In function 'glmer_MCMCsamp':
glmer.c:657: error: 'CHM_FR' undeclared (first use in this function)
glmer.c:657: error: expected ';' before 'L'
glmer.c:680: error: 'L' undeclared (first use in this function)
make: *** [glmer.o] Error 1
ERROR: compilation failed for package 'lme4'

I've installed Matrix_0.999375-3 and lattice_0.16-5. Any pointers would be appreciated. -- David Reitter ICCS/HCRC, Informatics, University of Edinburgh http://www.david-reitter.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rearrange data columns
On 10/11/07, Martin Ivanov [EMAIL PROTECTED] wrote: Dear R users, I need to do the following. Let

a = 1 2 3    and    b = -1 -2 -3
    4 5 6             -4 -5 -6

be (2x3) matrices. I need to combine the two matrices into a new (2x6) matrix like this:

ab = 1 -1 2 -2 3 -3
     4 -4 5 -5 6 -6

How can this be done in R?

(a <- matrix(1:6, nr = 2))
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
(b <- -a)
     [,1] [,2] [,3]
[1,]   -1   -3   -5
[2,]   -2   -4   -6
(ans <- rbind(a, b))
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
[3,]   -1   -3   -5
[4,]   -2   -4   -6
dim(ans) <- c(2, 6)
ans
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1   -1    3   -3    5   -5
[2,]    2   -2    4   -4    6   -6
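The rbind/dim trick generalizes to any pair of equal-sized matrices; a small wrapper (the function name is my own, not from the thread):

```r
interleave_cols <- function(a, b) {
    stopifnot(identical(dim(a), dim(b)))
    ans <- rbind(a, b)                   # stack: rows of a, then rows of b
    dim(ans) <- c(nrow(a), 2 * ncol(a))  # column-major reshape interleaves
    ans
}

a <- matrix(1:6, nrow = 2)
interleave_cols(a, -a)  # columns of a and -a alternating
```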
Re: [R] reference for logistic regression
On 10/11/07, Douglas Bates [EMAIL PROTECTED] wrote: On 10/11/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Dear list, first accept my apologies for asking a non-R question. Can anyone point me to a good reference on logistic regression? Web or book references would be great. I am interested in the use and interpretation of dummy variables and prediction models. I checked the contributed section of the CRAN homepage but could not find anything (Julian Faraway's Practical Regression and ANOVA using R does not cover logistic regression). The wikipedia article on logistic regression (http://en.wikipedia.org/wiki/Logistic_regression) contains a brief description and some references. Statisticians often consider logistic regression to be an example of a more general class of models called generalized linear models, which is why the R function to fit a logistic regression model is called glm. There is a link in the logistic regression wikipedia article to the generalized linear model article. Whenever you use wikipedia you should be cautious of the quality of the information in the articles. Generally the articles are good as a brief introduction but they can and do contain errors, so you should check important facts and not take them at face value. A person in one of my classes asked about the standard deviation and I suggested that they look at the wikipedia article on the topic. Then I looked at it myself and saw that one of the things mentioned is that the standard deviation of the Cauchy distribution is undefined, which is true, but the reason given is because E[X] is undefined, which is not true. As several people have pointed out to me privately, I'm the one spreading misinformation, not the authors of the wikipedia article. My, apparently faulty, recollection was that E[X] was defined (because the density is symmetric) but E[X^2] was not defined. It looks like I need to review some elementary properties of distributions and integrals.
Note to self: Don't respond to mailing list postings until fully caffeinated. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Basic plot question: Figure 1.1 Pinheiro Bates
On 10/12/07, David Afshartous [EMAIL PROTECTED] wrote: All, Sorry for the overly simplistic question, but I can't seem to remember how to create the basic plot shown in Figure 1.1 of Pinheiro and Bates (2004; p. 4). The y-axis delineates a factor (Rail) while the x-axis displays the distribution of a continuous variable (time) according to each level of the factor. Didn't see it in the archives but perhaps I'm not searching on the correct key words, any help appreciated, David

library(nlme)
Rail
Grouped Data: travel ~ 1 | Rail
   Rail travel
1     1     55
2     1     53
3     1     54
4     2     26
5     2     37
6     2     32
7     3     78
8     3     91
9     3     85
10    4     92
11    4    100
12    4     96
13    5     49
14    5     51
15    5     50
16    6     80
17    6     85
18    6     83

That plot can be reproduced by

library(lattice)
data(Rail, package = "nlme")
dotplot(Rail ~ travel, Rail)

However, this relies on the Rail$Rail factor being ordered by increasing mean travel time, which is fine for the plot but may get in the way of other uses of the data. In a way we only want to assign an order to the levels of the Rail factor for the purposes of the plot. As Deepayan Sarkar mentions in his useR!2007 presentation (http://user2007.org/program/presentations/sarkar.pdf) he has done a considerable amount of upgrading of the lattice package, relative to the original design of Trellis graphics, in part to make the nlme plots easier to create. The version of the Rail data in the MEMSS package removes the ordering on the levels of the Rail factor so it defaults to the original order. Compare

data(Rail, package = "MEMSS")
dotplot(Rail ~ travel, Rail)
dotplot(reorder(Rail, travel) ~ travel, Rail)

The latter plot is the way I would recommend constructing that plot now.
Re: [R] coef se in lme
On 10/15/07, Doran, Harold [EMAIL PROTECTED] wrote: ?vcov The vcov method returns the estimated variance-covariance matrix of the fixed-effects only. I think Irene's question is about the combination of the fixed-effects parameters and the BLUPs of the random effects that is returned by the coef method applied to an lmer object. (You may recall that you were the person who requested such a method in lme4 like the coef method in nlme :-) On the face of it this quantity should be easy to define and evaluate but in fact it is not easy to do so because these are combinations of model parameters (the fixed effects) and unobserved random variables (the random effects). It gets a bit tricky trying to decide what the variance of this combination would be. I think there is a sensible definition, or at least a computationally reasonable definition, but there are still a few slippery points in the argument. Lately I have taken to referring to the estimates of the random effects, what are sometimes called the BLUPs or Best Linear Unbiased Predictors, as the conditional modes of the random effects. That is, they are the values that maximize the density of the random effects given the observed data and the values of the model parameters. For a linear mixed model the conditional distribution of the random effects is multivariate normal so the conditional modes are also the conditional means. Also, we can evaluate the conditional variance-covariance matrix of the random effects up to a scale factor. The next part is where things get a bit hazy for me but I think it makes sense to consider the joint distribution of the estimator of the fixed-effects parameters and the random effects conditional on the data and, possibly, on the variance components. Conditional on the relative variance-covariance of the random effects (i.e. 
the matrix that occurs as the penalty term in the penalized least squares representation of the model) the joint distribution of the fixed-effects estimators and the random effects is multivariate normal with mean and variance-covariance matrix determined from the mixed-model equations. This big (p+q by p+q, where p is the dimension of the fixed effects and q is the dimension of the random effects) variance-covariance matrix could be evaluated and, from that, the variance of any linear combination of components obtained. However, I have my doubts about whether it is the most sensible answer to evaluate. Conditioning on the relative variance-covariance matrix of the random effects is cheating, in a way. It would be like saying we have a known variance, $\sigma^2$, when, in fact, we are using an estimate. The fact that we don't know $\sigma^2$ is what gives rise to the t distributions and F distributions in linear models and we are all trained to pay careful attention to the number of degrees of freedom in that estimate and how it affects our ideas of the precision of the estimates of other model parameters. For mixed models, though, many practitioners are quite comfortable conditioning on the value of some of the variance components but not others. It could turn out that conditioning on the relative variance-covariance of the random effects is not a big deal but I don't know. I haven't examined it in detail and I don't know of others who have. Another approach entirely is to use Markov chain Monte Carlo to examine the joint distribution of the parameters (in the Bayesian sense) and the random effects. If you save the fixed effects and the random effects from the MCMC chain then you can evaluate the linear combination of interest throughout the chain and get an empirical distribution of the quantities returned by coef. This is probably an unsatisfactory answer for Irene who may have wanted something quick and simple.
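In current lme4 one way to sketch that "empirical distribution of the quantities returned by coef" is a parametric bootstrap rather than MCMC; bootMer() is used here as a stand-in for the MCMC approach described above, and the choice of 100 simulations is arbitrary.

```r
library(lme4)
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
## each bootstrap replicate refits the model to simulated data and records the
## per-subject combined intercepts (fixed effect + conditional mode) from coef()
bb <- bootMer(fm, function(m) coef(m)$Subject[, "(Intercept)"],
              nsim = 100, seed = 1)
## empirical standard errors of the 18 combined intercepts
apply(bb$t, 2, sd)
```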
Unfortunately, I don't think there is a quick, simple answer here. I suggest we move this discussion to the R-SIG-Mixed-Models list, which I am cc:ing on this reply. -Original Message- From: [EMAIL PROTECTED] on behalf of Irene Mantzouni Sent: Mon 10/15/2007 3:20 PM To: [EMAIL PROTECTED] Subject: [R] coef se in lme Hi all! How is it possible to estimate standard errors for the coef obtained from lme? Is there something like se.coef() for lmer or what is the analytical solution? Thank you!
[R] Appropriate measure of correlation with 'zero-inflated' data?
I have reached the correlation section in a course that I teach and I hit upon the idea of using data from the weekly Bowl Championship Series (BCS) rankings to illustrate different techniques for assessing correlation. For those not familiar with college football in the United States (where football refers to American football, not what is called soccer here and football in most other countries) I should explain that many, many universities and colleges have football teams but each team only plays 10-15 games per season, so not every team will play every other team. The game is so rough that it is not feasible to play more than one match per week and a national playoff after the regular season is impractical. It would take too long and the players are, in theory, students first and athletes second. In place of a national playoff there are various polls of coaches or sports writers that purport to rank teams nationally. Several analysts also publish computer-based rankings that use complicated formulas based on scores in individual games, strength of the opponent, etc. to rank teams. Rankings from two of the human polls (the Harris poll of sports writers and the USA Today poll of the coaches) and from six of the computer polls are combined to produce the official BCS ranking. The Wikipedia entry for Bowl Championship Series gives the history and evolution of the actual formula that is currently used. This season has been notable for the volatility of those rankings. One is reminded of the biblical prophecy that "The first shall be last and the last shall be first." Another notable feature this year is the extent to which the computer-based rankings and the rankings in the human polls disagree. I enclose a listing of the top 25 teams and the components of the rankings as of last Sunday (2007-10-21). (Almost all college football games are played on Saturdays and the rankings are published on Sundays).
The columns are:
Rec  - won-loss record
Hvot - total number of Harris poll votes
Hp   - proportion of maximum Harris poll votes
HR   - rank in the Harris poll (smaller is better)
Uvot, Up, UR - same for the USA Today poll
Cavg - average score (it's actually a trimmed mean) on computer-based rankings (larger is better)
BCS  - BCS score - the average of Hp, Up and Cavg
Pre  - BCS rank in the previous week

As I understand it, the votes in the Harris and USA Today polls are calculated by asking each voter to list their top 25 teams then awarding 25 points for a team ranked 1, 24 points for a team ranked 2, etc. on each ballot and calculating the total. Apparently there are now 114 Harris poll participants and 60 USA Today poll participants giving maximum possible scores of 2850 and 1500, respectively. The Cavg column is calculated from 6 scores of 0 to 25 (larger is better) dropping the largest and smallest scores. The raw score is out of 100 and the proportion is reported as Cavg. The data frame is available (for a little while) as http://www.stat.wisc.edu/~bates/BCS.rda The raw scores and the rankings from the Harris and USA Today polls are in fairly good agreement but the Cavg scores are very different. Although scatterplots will show this I feel that correlation measures may be thrown off by the large number of zeros in the Cavg scores. What would be the preferred way of measuring correlation in such a case? What would be a good graphical presentation showing the lack of agreement of the various components of the BCS score?

                 Rec  Hvot Uvot Cavg Pre      Hp HR     Up UR    BCS
Ohio St.       (8-0) 2847 1498 0.93   2 0.99895  1 0.9987  1 0.9759
Boston College (7-0) 2676 1412 0.97  23 0.93895  2 0.9413  2 0.9501
LSU            (7-1) 2550 1319 0.96  31 0.89474  3 0.8793  3 0.9114
Arizona St.    (7-0) 2003 1089 0.86  35 0.70281  8 0.7260  7 0.7629
Oregon         (6-1) 2281 1225 0.67   3 0.80035  5 0.8167  5 0.7623
Oklahoma       (7-1) 2521 1306 0.51  32 0.88456  4 0.8707  4 0.7551
West Virginia  (6-1) 2157 1134 0.61  36 0.75684  6 0.7560  6 0.7076
Virginia Tech  (6-1) 1831 1052 0.69   4 0.64246 10 0.7013  9 0.6779
Kansas         (7-0) 1671  911 0.75   6 0.58632 11 0.6073 10 0.6479
South Florida  (6-1) 1627  813 0.81  13 0.57088 12 0.5420 12 0.6410
Florida        (5-2) 1867  906 0.61   8 0.65509  9 0.6040 11 0.6230
USC            (6-1) 2100 1060 0.17   7 0.73684  7 0.7067  8 0.5378
Missouri       (6-1) 1568  790 0.53   9 0.55018 13 0.5267 13 0.5356
Kentucky       (6-2) 1156  604 0.55  34 0.40561 15 0.4027 15 0.4528
Virginia       (7-1)  650  466 0.76  12 0.22807 20 0.3107 18 0.4329
South Carolina (6-2) 1031  474 0.39  33 0.36175 17 0.3160 17 0.3559
Hawaii         (7-0) 1265  617 0.00  11 0.44386 14 0.4113 14 0.2851
Georgia        (5-2)  711  402 0.23  14 0.24947 19 0.2680 19 0.2492
Texas          (6-2) 1054  527 0.00  15 0.36982 16 0.3513 16 0.2404
Michigan       (6-2)  643  325 0.26  18 0.22561 21 0.2167 21 0.2341
California     (5-2)  873  397 0.02   5 0.30632 18 0.2647 20 0.1970
Auburn
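As a hedged illustration of the issue (with made-up data, not the BCS scores themselves), rank-based measures such as Spearman's rho or Kendall's tau are usually less distorted by a point mass of zeros than the Pearson correlation:

```r
set.seed(2)
n <- 25
human    <- runif(n)                          # stand-in for a human-poll score
computer <- ifelse(runif(n) < 0.3, 0,         # ~30% zeros, like the Cavg column
                   pmin(1, human + rnorm(n, sd = 0.1)))
c(pearson  = cor(human, computer),
  spearman = cor(human, computer, method = "spearman"),
  kendall  = cor(human, computer, method = "kendall"))
```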
Re: [R] R routines vs. MATLAB/SPSS Routines
On 10/26/07, Frank Thomas [EMAIL PROTECTED] wrote: ... BTW: Contrary to some ideas both R and SPSS can be programmed and the algorithms for both have been published. So, no matter whether open source or private property you know what you do (if you want). This is off the point of Matt's original question and I apologize for hijacking the thread but where have the algorithms that underlie the SPSS procedures been published? My particular interest is in the methods for the linear mixed models implemented in MIXED in SPSS (and also PROC MIXED in SAS). A person who was quite enthusiastic about the MIXED procedure in SPSS sent me a PDF file about MIXED that I suppose could be considered a description of the algorithms as long as you didn't read it too closely. However, the descriptions are far too vague to use them as a basis for writing code and furthermore they jump back and forth between two or three different representations of the model without tying the different threads together. There is no indication of what representation forms the basis of the code and how the calculations are implemented. Even more alarming, parts of it are flat-out wrong. Even the mixed-model equations as given in this document are wrong, as one would quickly find out if one tried to implement them. The organization is disjointed and generally the language and grammar indicate that it has not been copy-edited carefully. I would not give it a good grade if it were submitted as a project report in my statistical computing course. I have been unable to trace the source of this document. It is definitely a discussion of the computational algorithms in MIXED but I haven't been able to track its original source. In a way I hope it was a preliminary draft or something like that. If SPSS released this version as an official publication it is a sign that they have fallen on hard times.
If one simply needs to have a reference to cite regarding the calculations done in your analysis then a document like this, suitably corrected, might do. I don't know if this is what you meant by saying that the algorithms have been published but if it is then this is not sufficient for the purposes of investigating the computational methods. A few years ago Brian Ripley mentioned in a presentation, I believe to the Royal Statistical Society, the need for a reference implementation of a statistical computing technique. That is, one should provide code that is accessible to, usable by and modifiable by other researchers in the field. (Brian: If you know the presentation to which I am referring, can you provide a reference or URL please?) Perhaps not for students who simply need to apply a few statistical techniques but certainly for researchers in the statistical techniques and computational methods there is a world of difference between a closed-source implementation with a supplementary document describing the alleged computation method and an open-source implementation.
Re: [R] is there a similar function to perform repeated statements as in SAS PROC MIXED?
On 10/24/07, Dimitris Rizopoulos [EMAIL PROTECTED] wrote: you can have a look at the gls() function in package nlme, and specifically at the `weights' and `correlation' arguments; the same arguments are also available in the lme() function. That is certainly a way of fitting a correlation structure directly but I don't think of that as a mixed-effects model. To me a mixed-effects model is a model with both fixed-effects parameters and random effects. It is true that SAS PROC MIXED will fit such a model but I don't think that necessarily makes it a mixed-effects model. I think the SAS formulation of mixed-effects models blurs many distinctions and leads to considerable confusion. At least it confuses me :-) It doesn't help that the terminology used in PROC MIXED is so sloppy as to be nonsensical. Gallon Li uses the SAS terminology of an unstructured variance-covariance matrix. Perhaps I am being pedantic but anyone who was not raised in a SAS-speaking environment would, upon reflection, realize that this is asking for an unstructured, highly-structured matrix. It's an oxymoron. Any definition I have ever seen of the variance-covariance matrix of a random vector requires that the matrix be symmetric (hence square) and at least positive semi-definite, if not positive-definite. Even in the one-dimensional case I would interpret an unstructured variance matrix to be an unconstrained 1 by 1 matrix whose element is required to be non-negative. I have difficulty grasping the concept of an unstructured matrix that is required to satisfy several complicated constraints. If the term had been a general variance-covariance matrix I think I could understand what it was supposed to mean. The REPEATED statement propagates the dangerous misconception that there is a single model which is appropriate for any case of repeated measures.
The approach I would recommend for a person using R is to plot the data, formulate a preliminary model, fit the model, examine the residuals, modify the model if appropriate, rinse and repeat. In the days when SAS was being developed the rinse and repeat part could take several days so it made sense to try a one size fits all model. I don't feel that makes sense any more. Let me describe what I understand the REPEATED model with an unconstrained variance-covariance matrix to mean. If I have it wrong I hope you or others reading this will correct me. Suppose the observations are indexed by subject and occasion. For definiteness, consider the first 5 days (Days 0 to 4) of data from the sleepstudy data in the lme4 package. Convert the Days variable to a factor, which we will call occ for occasion, and incorporate the indicators for the occasion in both the fixed effects and the random effects indexed by subject.

sleep04 <- subset(sleepstudy, Days < 5)
sleep04$occ <- factor(sleep04$Days)
show(fm1 <- lmer(Reaction ~ occ - 1 + (occ - 1 | Subject), sleep04))
Linear mixed-effects model fit by REML
Formula: Reaction ~ occ - 1 + (occ - 1 | Subject)
   Data: sleep04
 AIC BIC logLik MLdeviance REMLdeviance
 808 858   -384      792.7          768
Random effects:
 Groups   Name Variance Std.Dev. Corr
 Subject  occ0  987.132 31.4187
          occ1 1072.423 32.7479  0.769
          occ2  823.516 28.6970  0.493 0.808
          occ3 1464.770 38.2723  0.481 0.769 0.913
          occ4 1764.239 42.0028  0.465 0.673 0.722 0.939
 Residual        45.184  6.7219
Number of obs: 90, groups: Subject, 18
Fixed effects:
     Estimate Std. Error t value
occ0  256.652      7.573   33.89
occ1  264.496      7.880   33.57
occ2  265.362      6.947   38.20
occ3  282.992      9.159   30.90
occ4  288.649     10.026   28.79
Correlation of Fixed Effects:
      occ0  occ1  occ2  occ3
occ1 0.737
occ2 0.470 0.770
occ3 0.464 0.741 0.875
occ4 0.449 0.651 0.694 0.914

This model requires estimation of 15 variance-covariance parameters using data from only 18 subjects.
You can fit such a model to more than 5 occasions from these data but I'm sure the fits are badly overparameterized. Let me emphasize that I am not criticizing you for your perfectly reasonable answer to Gallon Li's question, nor am I criticizing Gallon Li for the question. I am simply expressing my frustration at the sloppy terminology and one size fits all mentality implicit in saying that this model is the repeated measures model. As an example of why I say that the REPEATED statement leads to dangerous misconceptions, I'll give a bit of the story of the PBG data set from the nlme package. The data came from an experiment by a cardiologist who was quite knowledgeable regarding various statistical techniques. He used these data in a tutorial paper about repeated measures analysis published in a cardiology journal. The repeated measures analysis was somewhat disappointing in that it did not show a significant effect for treatment. However a plot of the data, such as Fig. 3.4 in Pinheiro and Bates (2000), which can be reproduced by
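The message breaks off before the plotting code, and the exact call is not recoverable; a plausible sketch, assuming the PBG groupedData object shipped with nlme and that a default grouped-data plot is close to Fig. 3.4, is:

```r
library(nlme)   # provides the PBG groupedData object
## deltaBP against dose, one panel per rabbit; because PBG is a groupedData
## object the plot method picks up the grouping structure, and the inner
## argument distinguishes the two treatments within each panel
plot(PBG, inner = ~ Treatment)
```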
Re: [R] Appropriate measure of correlation with 'zero-inflated' data?
I have created some slides on the Bowl Championship Series data for my class. Copies of just that part are available in http://www.stat.wisc.edu/~bates/BCS.zip Two interesting things about the analysis are that a parallel coordinate plot shows the differences in agreement quite clearly and that the axes on the biplot from a principal components analysis are readily interpretable. It's a great example for those teaching multivariate analysis.
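A minimal sketch of such a parallel coordinate plot with lattice, using stand-in data (the real columns would come from the BCS data frame in BCS.rda; the simulated agreement between the two human polls is an assumption for illustration):

```r
library(lattice)
set.seed(1)
## stand-in for the Hp, Up and Cavg columns of the BCS data frame
polls <- data.frame(Hp = runif(25))
polls$Up   <- pmin(1, pmax(0, polls$Hp + rnorm(25, sd = 0.05)))  # human polls agree
polls$Cavg <- runif(25)                                          # computers disagree
parallelplot(~ polls, horizontal.axis = FALSE)
```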
Re: [R] Confidence Intervals for Random Effect BLUP's
On Nov 9, 2007 1:15 PM, Rick Bilonick [EMAIL PROTECTED] wrote: On Fri, 2007-11-09 at 18:55 +, Prof Brian Ripley wrote: I think Bert's point is important: I picked up a student on it in a case study presentation this week because I could think of three interpretations, none strictly confidence intervals. I think 'tolerance interval' is fairly standard for prediction of a random quantity: see ?predict.lm. I think prediction interval is what is usually used. Regardless, I'm not sure how predict.lm will be of much help because I asked specifically about BLUP's for random effects and the last time I checked lm did not handle mixed effects models. Neither predict.lme nor predict.lmer provides intervals. Here is the code that I included in my original e-mail. My simple question is, will this code correctly compute a prediction interval for each subject's random effect? In particular, will the code handle the bVar slot correctly? Some postings warned about inappropriate access to slots. Here is the code that I asked about in my original e-mail: Regarding the terminology, I prefer to call the quantities that are returned by the ranef extractor the conditional modes of the random effects. If you want to be precise, these are the conditional modes (for a linear mixed model they are also the conditional means) of the random effects B given Y = y, evaluated at the parameter estimates. One can also evaluate the conditional variance-covariance of B given Y = y and hence obtain a prediction interval. These are returned by ranef when the optional argument postVar is TRUE. I think the intervals you want are what are shown in the caterpillar plots. Try

library(lme4)
qqmath(ranef(fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy), postVar = TRUE))

BTW, the reason that I say conditional modes, rather than conditional means, is so the term can apply to generalized linear mixed models, nonlinear mixed models and generalized nonlinear mixed models.
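For the numeric version of what the caterpillar plot draws, the conditional modes and their conditional variances can be pulled out directly (a sketch assuming a recent lme4, where the argument is named condVar but the stored attribute is still "postVar"; the 1.96 multiplier is the usual normal-theory choice):

```r
library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
re  <- ranef(fm1, condVar = TRUE)$Subject
pv  <- attr(re, "postVar")     # 2 x 2 x 18 array of conditional var-cov matrices
se0 <- sqrt(pv[1, 1, ])        # conditional SDs of the intercept effects
## the +/- 1.96 SE intervals drawn by the caterpillar (qqmath) plot
cbind(lower = re[, 1] - 1.96 * se0,
      mode  = re[, 1],
      upper = re[, 1] + 1.96 * se0)
```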
The conditional distribution of B given Y is a continuous multivariate distribution that can be evaluated explicitly for a linear mixed model but is known only up to a constant for the generalized forms of the model. We can still optimize the conditional density of B given Y = y for particular values of the parameters but doing so provides the conditional modes, not the conditional means.

# OrthoFem has all the females from Orthodont from the nlme package
library(lme4)
fm1OrthF. <- lmer(distance ~ age + (age | Subject), data = OrthoFem)
lmer(distance ~ age + (age | Subject), data = OrthoFem)@bVar$Subject[2,2,] *
    (attr(VarCorr(lmer(distance ~ age + (age | Subject), data = OrthoFem)), "sc")^2)[1]
(attr(VarCorr(fm1OrthF.), "sc")^2)[1]
fm1.s <- coef(fm1OrthF.)$Subject
fm1.s.var <- fm1OrthF.@bVar$Subject * (attr(VarCorr(fm1OrthF.), "sc")^2)[1]
fm1.s0.s <- sqrt(fm1.s.var[1,1,])
fm1.s0.a <- sqrt(fm1.s.var[2,2,])
fm1.s[,1] + outer(fm1.s0.s, c(-2,0,2))
fm1.s[,2] + outer(fm1.s0.a, c(-2,0,2))

fm1.s
    (Intercept)       age
F10    14.48493 0.3758608
F09    17.26499 0.3529804
F06    16.77328 0.3986699
F01    16.95609 0.4041058
F05    18.36188 0.3855955
F07    17.28390 0.5193954
F02    16.05461 0.6336191
F08    19.40204 0.3562135
F03    16.35720 0.6727714
F04    19.02380 0.5258971
F11    19.13726 0.6498911

fm1.s[,1]+outer(fm1.s0.s, c(-2,0,2))
          [,1]     [,2]     [,3]
 [1,] 12.21371 14.48493 16.75616
 [2,] 14.99377 17.26499 19.53622
 [3,] 14.50205 16.77328 19.04450
 [4,] 14.68487 16.95609 19.22732
 [5,] 16.09066 18.36188 20.63311
 [6,] 15.01267 17.28390 19.55512
 [7,] 13.78339 16.05461 18.32584
 [8,] 17.13082 19.40204 21.67327
 [9,] 14.08598 16.35720 18.62843
[10,] 16.75257 19.02380 21.29502
[11,] 16.86604 19.13726 21.40849

fm1.s[,2]+outer(fm1.s0.a, c(-2,0,2))
           [,1]      [,2]      [,3]
 [1,] 0.1738325 0.3758608 0.5778890
 [2,] 0.1509522 0.3529804 0.5550087
 [3,] 0.1966417 0.3986699 0.6006982
 [4,] 0.2020775 0.4041058 0.6061340
 [5,] 0.1835672 0.3855955 0.5876237
 [6,] 0.3173671 0.5193954 0.7214236
 [7,] 0.4315909 0.6336191 0.8356474
 [8,] 0.1541852 0.3562135 0.5582417
 [9,] 0.4707432 0.6727714 0.8747997
[10,] 0.3238688 0.5258971 0.7279253
[11,] 0.4478629 0.6498911 0.8519194

This web page describes bVar: http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/lme4/html/lmer-class.html bVar: A list of the diagonal inner blocks (upper triangles only) of the positive-definite matrices on the diagonal of the inverse of ZtZ+Omega. With the appropriate scale factor (and conversion to a symmetric matrix) these are the conditional variance-covariance matrices of the random effects. Rick B.
Re: [R] Using lme (nlme) to find the conditional variance of the random effects
On Nov 13, 2007 1:02 AM, Rick Bilonick [EMAIL PROTECTED] wrote: On Tue, 2007-11-13 at 01:03 -0500, Rick Bilonick wrote: Is there some way to get ranef with postVar=TRUE to show what the variances are, or what the lower and upper bounds are? qqmath makes nice plots but I need to obtain the numerical values. Rick B. I found a way:

attr(ranef(lmer.13, postVar = TRUE)[[2]], "postVar")

But I still don't understand why it's not OK to access the bVar slot directly. Because the class definitions can (and do, in the case of lme4) change. If you were to install the development version of the lme4 package

install.packages("lme4", repos = "http://R-forge.R-project.org")

you would find that the bVar slot doesn't exist any more. However, the ranef extractor with postVar = TRUE still does return the conditional variances of the random effects. Also, the code I originally showed and the results from ranef are very similar with a correlation of 0.9983 (it varies very slightly from subject to subject):

round(data.frame(a = as.numeric(lmer.13@bVar[[2]] * (attr(VarCorr(lmer.13), "sc")^2)[1]),
                 b = as.numeric(attr(ranef(lmer.13, postVar = TRUE)[[2]], "postVar"))), 10)
          a        b
1  5.41e-08 5.44e-08
2  4.77e-08 4.83e-08
3  6.24e-08 6.25e-08
4  4.44e-08 4.52e-08
5  6.50e-08 6.50e-08
6  2.67e-08 2.92e-08
7  5.07e-08 5.12e-08
8  6.43e-08 6.43e-08
9  3.64e-08 3.79e-08
10 4.86e-08 4.92e-08
11 6.33e-08 6.33e-08
12 3.44e-08 3.60e-08
13 4.16e-08 4.26e-08
14 3.69e-08 3.83e-08
15 5.96e-08 5.97e-08
16 6.46e-08 6.46e-08
17 3.28e-08 3.46e-08
18 4.71e-08 4.77e-08
19 5.18e-08 5.22e-08
20 2.81e-08 3.04e-08
21 3.97e-08 4.09e-08
22 5.70e-08 5.72e-08
23 6.06e-08 6.07e-08
24 3.23e-08 3.42e-08
25 4.94e-08 4.99e-08
26 5.35e-08 5.38e-08
27 3.86e-08 3.98e-08
28 6.73e-08 6.73e-08
29 4.68e-08 4.74e-08
30 6.15e-08 6.16e-08
31 4.67e-08 4.74e-08
32 2.04e-08 2.37e-08
33 3.45e-08 3.61e-08
34 6.28e-08 6.29e-08
35 5.53e-08 5.55e-08

Not sure why they are not exactly the same. Rick B.
Re: [R] box plot
2008/1/31 stephen sefick [EMAIL PROTECTED]: the data is attached Thanks. I want to boxplot this data

y <- read.table("afdmmgs.txt", header = TRUE)
boxplot(y)

At this point you should use str(y) to examine the structure of the data.

str(y)
'data.frame': 4 obs. of 12 variables:
 $ X215: num 30002 223630 316722 125081
 $ sc  : Factor w/ 3 levels "10894.12023",..: 1 2 3 3
 $ X202: Factor w/ 4 levels "25572.94416",..: 1 2 3 4
 $ X198: Factor w/ 3 levels "40899.10425",..: 1 2 3 3
 $ hc  : num 9731 13503 459246 9574
 $ X190: num 46442 48592 82827 49852
 $ bc  : Factor w/ 3 levels "150297.1089",..: 2 1 3 3
 $ X185: Factor w/ 4 levels "49759.77869",..: 3 1 2 4
 $ X179: Factor w/ 3 levels "163687.2308",..: 1 2 3 3
 $ X148: num 278803 267594 457433 272564
 $ X119: num 326021 292475 956482 201198
 $ X61 : num 386116 390206 1002466 322183
y$bc
[1] 26617.8358  150297.1089 na          na
Levels: 150297.1089 26617.8358 na

So what has happened is that your lower case 'na' is not recognized as a missing value code. Now go back and try

y <- read.table("/home/bates/Desktop/afdmmgs.txt", header = TRUE, na.strings = "na")
str(y)
'data.frame': 4 obs. of 12 variables:
 $ X215: num 30002 223630 316722 125081
 $ sc  : num 10894 17648 NA NA
 $ X202: num 25573 47692 99062 NA
 $ X198: num 40899 79627 NA NA
 $ hc  : num 9731 13503 459246 9574
 $ X190: num 46442 48592 82827 49852
 $ bc  : num 26618 150297 NA NA
 $ X185: num 80735 49760 66975 NA
 $ X179: num 163687 318565 NA NA
 $ X148: num 278803 267594 457433 272564
 $ X119: num 326021 292475 956482 201198
 $ X61 : num 386116 390206 1002466 322183

and things will work better. Error in oldClass(stats) <- cl : adding class "factor" to an invalid object I get this error when I try and plot something with na in it. Can boxplot just overlook the na and boxplot the values? stephen -- Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods.
We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
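The na.strings fix above can be demonstrated on a tiny self-contained file (the column names and values here are made up; only the na.strings behaviour matters):

```r
tmp <- tempfile()
writeLines(c("hc bc", "9731 26617.8", "13503 na", "9574 na"), tmp)
## without na.strings, "na" is not recognized and bc is read as non-numeric
str(read.table(tmp, header = TRUE))
## declaring the missing-value code gives numeric columns with NAs,
## which boxplot() then drops automatically
y <- read.table(tmp, header = TRUE, na.strings = "na")
str(y)
boxplot(y)
```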
Re: [R] Is it possible with two random effects in lme()?
On Feb 1, 2008 7:30 AM, Falco tinnunculus [EMAIL PROTECTED] wrote: Dear all, Is it possible to have two random effects in lme()?

lmefit1 <- lme(Handling ~ Mass + factor(Prey) + Mass*factor(Prey), random = ~ 1 | Place + Age)

I assume that Place and Age are not nested. If so, it's possible to fit such a model with lme but not easy. It is much better to use the lmer function in the lme4 package. The call would be

lmerfit1 <- lmer(Handling ~ Mass + factor(Prey) + Mass * factor(Prey) + (1 | Place) + (1 | Age))

P.S. It is often better to send such questions to the special interest group mailing list for mixed models, [EMAIL PROTECTED] You may get a faster answer from that list, which is lower traffic. Here I use Place as a random effect, but I also want to add Age as a random effect, since there could be an effect of Age (continuous variable), but I would like to control for it rather than looking at the effect of Age on handling time, since Mass and Prey type are of main interest. Regards Kes,
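A runnable illustration of crossed (non-nested) random effects, using the Penicillin data shipped with lme4 in place of the poster's data:

```r
library(lme4)
## plate and sample are crossed grouping factors: every sample appears on
## every plate, so neither is nested within the other
fm <- lmer(diameter ~ 1 + (1 | plate) + (1 | sample), Penicillin)
VarCorr(fm)
```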
Re: [R] Maximum number of variables allowed in a multiple linear regression model
On Feb 6, 2008 11:28 AM, Tony Plate [EMAIL PROTECTED] wrote: Bert Gunter wrote: I strongly suggest you collaborate with a local statistician. I can think of no circumstance where multiple regression on hundreds of thousands of variables is anything more than a fancy random number generator. That sounds like a challenge! What is the largest regression problem (in terms of numbers of variables) that people have encountered where it made sense to do some sort of linear regression (and gave useful results)? (Including multilevel and Bayesian techniques.) I have fit linear and generalized linear models with hundreds of thousands of coefficients but, of course, with a highly structured model matrix and using sparse matrix techniques. What is called the Rasch model for analysis of item response data (e.g. correct/incorrect responses by students to the items on a multiple-choice test) is a generalized linear model with the students and the items as factors. However, like Bert I would be very dubious of any attempt to fit a linear regression model to 3000 variables that were not generated in a systematic way. Sounds like a massive, computer-fueled fishing expedition (a.k.a. data mining). However, the original poster did say hundreds to thousands, which is smaller than hundreds of thousands. 
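A small simulated sketch of the Rasch-model-as-GLM idea mentioned above (sizes and names are made up; real fits at the scale Bates describes need sparse model matrices, which plain glm() does not exploit):

```r
set.seed(1)
nstud <- 200; nitem <- 30
d <- expand.grid(student = factor(1:nstud), item = factor(1:nitem))
ability    <- rnorm(nstud)      # per-student latent ability
difficulty <- rnorm(nitem)      # per-item difficulty
d$correct  <- rbinom(nrow(d), 1,
                     plogis(ability[d$student] - difficulty[d$item]))
## students and items as factors give a highly structured model matrix
fit <- glm(correct ~ student + item, family = binomial, data = d)
length(coef(fit))   # intercept + 199 student + 29 item contrasts = 229
```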
When I try a regression problem with 3,000 coefficients in R running under Windows XP 64 bit with 8Gb of memory on the machine and the /3Gb option active (i.e., R can get up to 3Gb), R 2.6.1 runs out of memory (apparently trying to duplicate the model matrix): R version 2.6.1 (2007-11-26) Copyright (C) 2007 The R Foundation for Statistical Computing ISBN 3-900051-07-0

m <- 3000
n <- m * 10
x <- matrix(rnorm(n*m), ncol=m, nrow=n,
            dimnames=list(paste("C", 1:n, sep=""), paste("X", 1:m, sep="")))
dim(x)
[1] 30000  3000
k <- sample(m, 10)
y <- rowSums(x[,k]) + 10 * rnorm(n)
fit <- lm.fit(y=y, x=x)
Error: cannot allocate vector of size 686.6 Mb
object.size(x)/2^20
[1] 687.7787
memory.size()
[1] -2022.552

and the Windows process monitor shows the peak memory usage for Rgui.exe at 2,137,923K. But in a 64 bit version of R, I would be surprised if it was not possible to run this (given sufficient memory). However, R easily handles a slightly smaller problem:

m <- 1000      # number of variables
n <- m * 10    # number of rows
k <- sample(m, 10)
x <- matrix(rnorm(n*m), ncol=m, nrow=n,
            dimnames=list(paste("C", 1:n, sep=""), paste("X", 1:m, sep="")))
y <- rowSums(x[,k]) + 10 * rnorm(n)
fit <- lm.fit(y=y, x=x)
# distribution of coefs that should be one vs. zero
round(rbind(one=quantile(fit$coefficients[k]),
            zero=quantile(fit$coefficients[-k])), digits=2)
        0%   25%   50%  75% 100%
one   0.94  0.98  1.04 1.10 1.18
zero -0.30 -0.08 -0.01 0.06 0.29

To echo Bert Gunter's cautions, one must be careful doing ordinary linear regression with large numbers of coefficients. It does seem a little unlikely that there is sufficient data to get useful estimates of three thousand coefficients using linear regression in data managed in Excel (though I guess it could be possible using Excel 12.0, which can handle up to 1 million rows - versions prior to 2008 could handle only 64K rows - see http://en.wikipedia.org/wiki/Microsoft_Excel#Versions ).
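The "686.6 Mb" in the error message is exactly the size of one additional double-precision copy of the model matrix, which a back-of-the-envelope calculation confirms (8 bytes per double):

```r
n <- 30000; m <- 3000
bytes <- n * m * 8   # one double-precision copy of the 30000 x 3000 matrix
bytes / 2^20         # ~686.6 Mb, matching the allocation that failed
```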
So, the suggestion to consult a local statistician is good advice - there may be other more suitable approaches, and if some form of linear regression is an appropriate approach, there are things to do to gain confidence that the results of the linear regression convey useful information. -- Tony Plate -- Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michelle Chu Sent: Tuesday, February 05, 2008 9:00 AM To: R-help@r-project.org Subject: [R] Maximum number of variables allowed in a multiple linear regression model Hi, I would appreciate it if someone could confirm the maximum number of variables allowed in a multiple linear regression model. Currently, I am looking for software with the capacity to handle approximately 3,000 variables. I am using Excel to process the results. Any information on processing a matrix from Excel with hundreds to thousands of variables would be helpful. Best Regards, Michelle
Re: [R] ANOVA and lmer
On Feb 7, 2008 3:43 PM, Eric Imbert [EMAIL PROTECTED] wrote: I am analyzing data from a very simple experiment. I have measured plants of two different colours (yellow and purple) in 9 different populations. So, I have two different factors: a fixed effect (Colour with two levels) and a random one (Population with 9 levels). I first analyzed the data with the aov function (LargS is the response variable):

aov(formula = LargS ~ Col + Error(Col/Pop))

Stratum 1: Col
Terms: Col
Sum of Squares  3.440351
Deg. of Freedom 1
Estimated effects are balanced

Stratum 2: Col:Pop
Terms: Residuals
Sum of Squares  3017.112
Deg. of Freedom 16
Residual standard error: 13.73206

Stratum 3: Within
Terms: Residuals
Sum of Squares  3347.385
Deg. of Freedom 302

To test for the interaction Col*Pop, I used the following F-ratio = (3017/16)/(3347/302) ≈ 17.0. Highly significant! Now, let's go to the analysis performed by lmer. First I fit the linear model without the Col*Pop interaction: m3 = lmer(LargS ~ Col + (1 | Pop)) And next with the interaction: m2 = lmer(LargS ~ Col + (Col | Pop)) I don't think this model provides a comparison that is directly comparable with the test you describe above. There is a model that is intermediate to your m2 and m3 models in terms of the number of parameters. (Actually when I see these results I realize that there is a bug in anova for lmer models, in that the count of the parameters is 1 less than it should be. I forgot to include the residual variance as a parameter to be estimated. However, this will not affect comparisons here because it is only the differences in the counts that matter.) Your model m2 is shown as having 5 parameters, which should be 6 (2 fixed effects, 3 variance components for the random effects, 1 residual variance estimate), while m3 is shown as having 3 parameters (should be 4: 2 f.e. + 1 r.e. var + 1 resid var).
The model LargS ~ Col + (1|Pop) + (1|Col:Pop), which allows for random effects for each population and for each Col:Pop combination, has 5 parameters (currently shown as 4). The random effects for Pop are independent of each other with a constant variance, while the random effects for the Col:Pop combinations are independent of each other and of the Pop random effects, with another constant variance. Thus there are 2 f.e. + 2 r.e. var + 1 resid. var. I think it is important to plot the data first and decide which of these models is indicated by the data. It sounds like the structure of your experiment is similar to the structure of the barley data set in the lattice package. Deepayan Sarkar's forthcoming book Lattice: Multivariate Data Visualization with R (to be published by Springer and due to ship next month) describes very effective ways of plotting such data. You can get a preview at his web site for the book http://lmdvr.r-forge.r-project.org Comparing both models with anova(m2, m3):

   Df     AIC     BIC  logLik  Chisq Chi Df Pr(>Chisq)
m3  3 1710.67 1721.97 -852.33
m2  5 1714.59 1733.43 -852.30 0.0746      2     0.9634

=> Conclusion: the interaction Col*Pop is not significant! I guess I am missing something. Who can help? Eric
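The three nested specifications discussed above can be compared side by side. A minimal sketch on simulated data, assuming a current version of lme4; Col and Pop mirror the thread but the values are invented, so the test statistics will differ from Eric's:

```r
library(lme4)

set.seed(2)
d <- expand.grid(Col = factor(c("yellow", "purple")),
                 Pop = factor(1:9), rep = 1:18)
d$LargS <- 10 + (d$Col == "purple") + rnorm(9)[d$Pop] +
  rnorm(18)[interaction(d$Col, d$Pop)] + rnorm(nrow(d))

m3    <- lmer(LargS ~ Col + (1 | Pop), data = d)                 # no interaction
m_int <- lmer(LargS ~ Col + (1 | Pop) + (1 | Col:Pop), data = d) # intermediate
m2    <- lmer(LargS ~ Col + (Col | Pop), data = d)               # vector-valued

anova(m3, m_int, m2)  # likelihood-ratio comparison of the nested sequence
```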
Re: [R] [OT] good reference for mixed models and EM algorithm
On Feb 10, 2008 2:32 PM, Spencer Graves [EMAIL PROTECTED] wrote: Hi, Erin: Have you looked at Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus (Springer)? As far as I know, Doug Bates has been the leading innovator in this area for the past 20 years. Pinheiro was one of his graduate students. The 'nlme' package was developed by him or under his supervision, and 'lme4' is his current development platform. The ~R\library\scripts subdirectory contains ch01.R, ch02.R, etc., script files to work the examples in the book (where ~R is your R installation directory). There are other good books, but I recommend you start with Pinheiro and Bates. Except that Doug Bates doesn't use the EM algorithm for fitting mixed models any more. The lme4 package previously had an option for starting with EM (actually ECME, which is a variant of EM) iterations but I have since removed it. For large data sets, and especially for models with non-nested random effects, the EM iterations just slowed things down relative to direct optimization of the log-likelihood. Spencer Graves Erin Hodgess wrote: Dear R People: Sorry for the off-topic question. Could someone recommend a good reference for using the EM algorithm on mixed models, please? I've been looking and there are so many of them. Perhaps someone here can narrow things down a bit. Thanks in advance, Sincerely, Erin
Re: [R] Newbie HLM with lme4 questions
On Feb 13, 2008 9:53 AM, Ista Zahn [EMAIL PROTECTED] wrote: Dear R listers, I know I'm breaking the rules by asking a homework related question -- I hope you'll forgive me. I am a social psychology graduate student, and the only one in my department who uses R. I successfully completed my multiple regression and structural equation modeling courses using R (John Fox's car and sem packages were a big help, as was his book). Now I am taking a hierarchical linear modeling course, and am hoping that I can use R for this also. I think that learning to use R for a course that is based on other software doesn't fall within the no homework questions rule. I've searched the list archives, and consulted the one-line version of Pinheiro and Bates (available through my library), but I'm having a great deal of difficulty translating what I'm learning in class into lmer syntax. Wow! A one-line version! I'd like to read that. Must be a really long line. Specifically, the instructor is using HLM 6.0. In this program, one specifies the level one and level two models explicitly, and I'm having trouble understanding what the equivalent of this is in lmer. Most of the examples we cover in class are change models, i.e., we are working with longitudinal data. Specific questions: in HLM 6.0, we build the following model:

Level-1 Model
Y = P0 + P1*(CONFLICT) + P2*(TIMEYRS) + E

Level-2 Model
P0 = B00 + B01*(H0MCITOT) + R0
P1 = B10 + B11*(H0MCITOT) + R1
P2 = B20 + B21*(H0MCITOT) + R2

What determines the level 2 groups? I don't see it in the specification, although I will admit that I have difficulty reading those specifications. To me they confound fixed effects and random effects and end up expressing fixed-effects interactions in obscure and confusing ways. The way I have to understand this is to substitute the P0, P1 and P2 into the first equation, expand the terms, then collect the fixed effects and the random effects.
That would give me

Y = B00 + B01*(H0MCITOT) + B10*(CONFLICT) + B11*(CONFLICT)*(H0MCITOT) + B20*(TIMEYRS) + B21*(TIMEYRS)*(H0MCITOT) + R0 + R1*(CONFLICT) + R2*(TIMEYRS) + E

To me the specification is not complete because I don't know what groups the random effects R0, R1 and R2 are associated with. Also, are R0, R1 and R2 allowed to be correlated within groups or are they required to be independent? Let's say that GRP is the variable that defines the groups that share a common level of the random effects. Then the uncorrelated random effects model is written

Y ~ CONFLICT*H0MCITOT + TIMEYRS*H0MCITOT + (1|GRP) + (CONFLICT-1|GRP) + (TIMEYRS-1|GRP)

and the model with correlated random effects is

Y ~ CONFLICT*H0MCITOT + TIMEYRS*H0MCITOT + (1+CONFLICT+TIMEYRS|GRP)

Can someone explain to me how to represent this in lmer syntax? I've tried e.g., lmer(MAT ~ 1 + CONFLICT + TIMEYRS + (1 + CONFLICT + TIMEYRS | H0MCITOT)) But I don't get the same result. More generally: Should I be using the lme4 package, the nlme package, or something else entirely? Is there any documentation for the lme4 package that is geared more towards novice users? To me the answer to the first question is to use lme4, except that the answer to the second question is that there isn't documentation geared to novice users. To a large degree that is my fault, because I keep changing the package so that previously written introductory documentation is no longer applicable. The lmer function should be faster and more reliable than lme. It allows for fitting generalized linear mixed models and for fitting models with crossed random effects (as in having random effects for subject and for item) or partially crossed random effects (e.g. longitudinal data indexed by students and teachers within schools, where students get exposed to different teachers and may move between schools).
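The expand-and-collect translation above can be checked on fabricated data. A minimal sketch assuming a current version of lme4; GRP, CONFLICT, TIMEYRS and H0MCITOT are the names from the thread, everything else is invented:

```r
library(lme4)

set.seed(3)
n_grp <- 40; n_obs <- 6
d <- data.frame(GRP      = factor(rep(1:n_grp, each = n_obs)),
                CONFLICT = rnorm(n_grp * n_obs),
                TIMEYRS  = rep(0:(n_obs - 1), n_grp),
                H0MCITOT = rep(rnorm(n_grp), each = n_obs))
d$Y <- 1 + d$CONFLICT + 0.5 * d$TIMEYRS + 0.2 * d$H0MCITOT +
  rnorm(n_grp)[d$GRP] + rnorm(nrow(d))

# R0, R1, R2 independent within GRP
f1 <- lmer(Y ~ CONFLICT * H0MCITOT + TIMEYRS * H0MCITOT +
             (1 | GRP) + (CONFLICT - 1 | GRP) + (TIMEYRS - 1 | GRP), data = d)
# R0, R1, R2 allowed to be correlated within GRP
f2 <- lmer(Y ~ CONFLICT * H0MCITOT + TIMEYRS * H0MCITOT +
             (1 + CONFLICT + TIMEYRS | GRP), data = d)
fixef(f1)  # six fixed effects: B00, B10, B20, B01, B11, B21
```

Since the simulated data contain no random slopes, f2 may report a boundary (singular) fit; that is expected here and does not stop the comparison.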
It can handle very large data sets - I showed an example of the analysis of 1.6 million grade point scores for 55,000 students, 8,000 instructors and 100 departments in a message to the MULTILEVEL listserv. Sorry for the long post, and thanks in advance for any help you can offer. --Ista
Re: [R] Cholmod error `matrix not positive definite'
Could you tell us which version of the lme4 package you are using? You can just send the output produced by sessionInfo(). If you can make your data available so we can test it then please do so. If the data set is large you could send it to me in private email and I will make it available on a web site. I think that the development version of the lme4 package, available via

install.packages("lme4", repos = "http://r-forge.r-project.org")

should be resistant to that type of error, but I am willing to be shown otherwise. On Thu, Feb 14, 2008 at 10:36 AM, Martijn Vandegehuchte [EMAIL PROTECTED] wrote: Dear R-users, I'm new to R, so my apologies if this question doesn't make sense. I've tried the following model in lmer, and it works perfectly: model <- lmer(aphids ~ densroot + zone + (1|zone/site), family=quasipoisson) But if I try the exact same model with a different variable, totmas, the model looks as follows: model <- lmer(aphids ~ totmas + zone + (1|zone/site), family=quasipoisson) Totmas is also a continuous variable just like densroot, but in this case I receive a CHOLMOD warning followed by: Error in objective(.par, ...) : Cholmod error `matrix not positive definite' at file ../Supernodal/t_cholmod_super_numeric.c, line 613 Moreover, if I test yet another continuous variable, vitality, to my surprise R just crashes completely. This is a mystery to me, especially because the variables totmas and vitality don't give any problem when I build the exact same models in SAS with proc glimmix... Does someone have experience with this type of problem? Thank you in advance, Martijn.
-- Martijn Vandegehuchte Ghent University Department Biology Terrestrial Ecology Unit K.L.Ledeganckstraat 35 B-9000 Ghent telephone: +32 (0)9/264 50 84 e-mail: [EMAIL PROTECTED] website TEREC: www.ecology.ugent.be/terec
Re: [R] LMER
Could you send us the output of sessionInfo() please, so we can see which version of the lme4 package you are using? In recent versions, especially the development version available as

install.packages("lme4", repos = "http://r-forge.r-project.org")

the PQL algorithm is no longer used. The Laplace approximation is now the default. The adaptive Gauss-Hermite quadrature (AGQ) approximation may be offered in the future. If the documentation indicates that PQL is the default then that is a documentation error. With the currently available implementation of the direct optimization of the Laplace approximation to the log-likelihood for the model there is no purpose in offering PQL. On Thu, Feb 14, 2008 at 6:50 PM, Daniel Malter [EMAIL PROTECTED] wrote: Hi, I run the following models: 1a. lmer(Y~X+(1|Subject),family=binomial(link=logit)) and 1b. lmer(Y~X+(1|Subject),family=binomial(link=logit),method="PQL") Why does 1b produce results different from 1a? The reason why I am asking is that the help states that PQL is the default for GLMMs, and 2. gamm(Y~X,family=binomial(link=logit),random=list(Subject=~1)) The interesting thing about the example below is that gamm is also supposed to fit by PQL. Interestingly, however, the GAMM fit yields roughly the coefficient estimates of 1b, but the significance values of 1a. Any insight would be greatly appreciated.
library(lme4)
library(mgcv)
Y=c(0,1,1,1,1,0,0,0,0,0,1,1,1,1,0,0,0,1,1,1,1)
X=c(1,2,3,4,3,1,0,0,2,3,3,2,4,3,2,1,1,3,4,2,3)
Subject=as.factor(c(1,2,3,4,5,6,7,1,2,3,4,5,6,7,1,2,3,4,5,6,7))
cbind(Y,X,Subject)
r1=lmer(Y~X+(1|Subject),family=binomial(link=logit))
summary(r1)
r2=lmer(Y~X+(1|Subject),family=binomial(link=logit),method="PQL")
summary(r2)
r3=gamm(Y~X,family=binomial(link=logit),random=list(Subject=~1))
summary(r3$gam)

- cuncta stricte discussurus
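In current lme4 the fitting function for GLMMs is glmer rather than lmer with a family argument, and the choice Bates describes is exposed through the nAGQ argument: nAGQ = 1 is the Laplace approximation (the default), and larger values request adaptive Gauss-Hermite quadrature for a model with a single scalar random effect. A minimal sketch on simulated data (all names and values invented):

```r
library(lme4)

set.seed(4)
n <- 300
Subject <- factor(rep(1:30, each = 10))
X <- rnorm(n)
Y <- rbinom(n, 1, plogis(-0.5 + X + rnorm(30)[Subject]))
d <- data.frame(Y, X, Subject)

g1  <- glmer(Y ~ X + (1 | Subject), family = binomial, data = d, nAGQ = 1)
g25 <- glmer(Y ~ X + (1 | Subject), family = binomial, data = d, nAGQ = 25)
cbind(Laplace = fixef(g1), AGQ25 = fixef(g25))  # typically very close
```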
Re: [R] History of R
On Fri, Feb 15, 2008 at 1:53 PM, Kathy Gerber [EMAIL PROTECTED] wrote: Earlier today I sent a question to Frank Harrell as an R developer with whom I am most familiar. He suggested also that I put my questions to the list for additional responses. Next month I'll be giving a talk on R as an example of high quality open source software. I think there is much to learn from R as a high quality extensible product that (at least as far as I can tell) has never been spun or hyped like so many open source fads. The question that intrigues me the most is: why is R as an open source project so incredibly successful while other projects, say for example Octave, don't enjoy that level of success? First and foremost there is the incredible generosity of Ross Ihaka and Robert Gentleman who, after spending an enormous amount of time and effort on development of the initial implementation, did not demand exclusive ownership of their work but allowed others to make changes. I believe Martin Maechler was the first non-Auckland person to get write access to the source code repository, and I'm sure that the good experience of working at a distance with Martin persuaded Ross and Robert to open it up to others. Martin is polite, considerate, meticulous and precise (he is a German-speaking Swiss, so meticulous and precise kind of comes with the territory) and you couldn't ask for a better first experience in sharing something that is very valuable to you with someone whom you may never have met in person. Not everyone has been that pleasant to work with. One of the first things that I did when I joined R-core was to blow up at Kurt and Fritz about something - on Christmas Eve! I'm surprised the group didn't boot me out after that start. When a project is gaining momentum the personalities of the initial developers have a big influence on its success. The R project has been fortunate in that regard.
I have some ideas of course, but I would really like to know your thoughts when you look at R from such a vantage point. Thanks. Kathy Gerber University of Virginia ITC - Research Computing Support
Re: [R] Error 'singular gradient' in nonlinear model fitting
On Feb 15, 2008 11:50 AM, Gabor Grothendieck [EMAIL PROTECTED] wrote: Try regressing log(WEIGHT) against log(TOTAL) using lm and then transform the results into starting values for nls (or possibly that is sufficient and you don't need the nls results). Also, set the trace argument to TRUE in your call to nls. This will cause nls to print the progress of the iterations, including the residual sum of squares and the parameter values. What usually happens with a singular gradient is that one of the parameters, probably beta, is heading to an extreme value. This particular model can be reduced to a one-parameter optimization problem by using the optional "plinear" algorithm:

nls(WEIGHT ~ (TOTAL^beta)/454, start = c(beta = 3),
    data = spottedseatrout2004.female.data, algorithm = "plinear")

On Fri, Feb 15, 2008 at 12:41 PM, HongSheng Liao [EMAIL PROTECTED] wrote: w.age.female.2004 <- nls(WEIGHT ~ (alpha*TOTAL^beta)/454, start=list(alpha=1, beta=3), data=spottedseatrout2004.female.data) I am trying to fit the above model to length-weight data of a fish species (spotted seatrout) by year (1999-2006). Convergence occurred for all the years except 2002 and 2004. In these two years, R shows the error called 'singular gradient' in the model. Could anyone please help me to fix the error? I don't think there are any problems with my data because I can fit the same model to the 2004 data without any problems using SAS. Thank you very much in advance. Hongsheng (Hank) Liao, Ph.D. Lab Manager Center for Quantitative Fisheries Ecology 800 West 46th Street Old Dominion University Norfolk, Virginia 23508 Phone: 757.683.4571 Fax: 757.683.5293
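Gabor's two suggestions (log-log starting values, then the "plinear" algorithm) can be combined on simulated length-weight data. This is a sketch: the true alpha and beta here are invented, and the data frame from the thread is replaced by fabricated vectors:

```r
set.seed(1)
total  <- runif(200, 200, 600)  # hypothetical total lengths
weight <- (2e-6 * total^3.1) / 454 * exp(rnorm(200, sd = 0.1))

# log-log regression supplies a starting value for beta:
# log(454 * WEIGHT) = log(alpha) + beta * log(TOTAL)
beta0 <- unname(coef(lm(log(454 * weight) ~ log(total)))[2])

# with algorithm = "plinear", alpha is profiled out as the linear coefficient
fit <- nls(weight ~ (total^beta) / 454, start = list(beta = beta0),
           algorithm = "plinear")
coef(fit)  # "beta" and ".lin" (the implied alpha)
```

Because alpha enters linearly, the plinear algorithm estimates it by least squares at each iteration, leaving only beta for the nonlinear optimizer, which is much more robust than the two-parameter formulation.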