Re: [R] fast rowCumsums wanted for calculating the cdf

2010-10-15 Thread Douglas Bates
Although I know there is another message in this thread I am replying
to this message to be able to include the whole discussion to this
point.

Gregor mentioned the possibility of extending the compiled code for
cumsum so that it would handle the matrix case.  The work by Dirk
Eddelbuettel and Romain Francois on developing the Rcpp package makes
it surprisingly easy to create compiled code for this task.  I am
copying the Rcpp-devel list on this in case one of the readers of that
list has time to create a sample implementation before I can get to
it.  It's midterm season and I have to tackle the stack of blue books
on my desk before doing fun things like writing code.
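
For readers of the archive, a minimal sketch of what such a compiled
rowCumsums could look like.  It is written with Rcpp's cppFunction()
interface, which postdates this thread; the function name and the test
matrix are illustrative, not part of the discussion.

library(Rcpp)
cppFunction('
NumericMatrix rowCumsums(NumericMatrix x) {
    int nr = x.nrow(), nc = x.ncol();
    NumericMatrix out = clone(x);     // column 0 is already correct
    for (int j = 1; j < nc; j++)      // loop over the short dimension
        for (int i = 0; i < nr; i++)
            out(i, j) = out(i, j - 1) + x(i, j);
    return out;
}')
probs <- matrix(runif(30000 * 10), nrow = 30000)
stopifnot(all.equal(rowCumsums(probs), t(apply(probs, 1, cumsum))))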

On Fri, Oct 15, 2010 at 2:23 AM, Joshua Wiley jwiley.ps...@gmail.com wrote:
 Hi,

 You might look at Reduce().  It seems faster.  I converted the matrix
 to a list in an incredibly sloppy way (which you should not emulate)
 because I cannot think of  the simple way.

 probs <- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise probabilities
 F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
 F[,1] <- probs[,1,drop=TRUE];
 add <- function(x) {Reduce(`+`, x, accumulate = TRUE)}


 system.time(F.slow <- t(apply(probs, 1, cumsum)))
   user  system elapsed
  36.758   0.416  42.464

 system.time(for (cc in 2:ncol(F)) {
 +  F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
 + })
   user  system elapsed
  0.980   0.196   1.328

 system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4], probs[,5], 
 probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))
   user  system elapsed
  0.420   0.072   0.539


 .539 seconds for 10 vectors each with 1 million elements does not seem
 bad.  For 30,000 each, on my system it took .014 seconds, which for
 200,000 iterations, I estimated as:

 (200000*.014)/60/60
 [1] 0.778

 (and this is on a stone age system with a single core processor and
 only 756MB of RAM)

 It should not be difficult to get the list back to a matrix.

 Cheers,

 Josh



 On Thu, Oct 14, 2010 at 11:51 PM, Gregor mailingl...@gmx.at wrote:
 Dear all,

 Maybe the easiest solution: Is there anything that speaks against 
 generalizing
 cumsum from base to cope with matrices (as is done in matlab)? E.g.:

 cumsum(Matrix, 1)
 equivalent to
 apply(Matrix, 1, cumsum)

 The main advantage could be optimized code if the matrix is extremely
 non-square (e.g. 100,000 x 10) but the summation is done over the short
 side (in this case 10). apply would effectively yield a loop over 100,000
 rows, whereas vectorization w.r.t. the long side (a loop over just 10
 columns) provides considerable efficiency gains.

 Many regards,
 Gregor




 On Tue, 12 Oct 2010 10:24:53 +0200
 Gregor mailingl...@gmx.at wrote:

 Dear all,

 I am struggling with a (currently) cost-intensive problem: calculating the
 (non-normalized) cumulative distribution function, given the 
 (non-normalized)
 probabilities. something like:

 probs <- t(matrix(rep(1:100),nrow=10)) # matrix with row-wise probabilities
 F <- t(apply(probs, 1, cumsum)) # SLOOOW!

 One (already faster, but for sure not ideal) solution - thanks to Henrik 
 Bengtsson:

 F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
 F[,1] <- probs[,1,drop=TRUE];
 for (cc in 2:ncol(F)) {
   F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
 }

 In my case, probs is a (30,000 x 10) matrix, and I need to iterate this
 step around 200,000 times, so speed is crucial. I currently can make sure
 to have no NAs, but in order to extend matrixStats, this could be a
 nontrivial issue.

 Any ideas for speeding up this - probably routine - task?

 Thanks in advance,
 Gregor

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Joshua Wiley
 Ph.D. Student, Health Psychology
 University of California, Los Angeles
 http://www.joshuawiley.com/

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how fit linear model with fixed slope?

2010-10-22 Thread Douglas Bates
On Fri, Oct 22, 2010 at 9:13 AM, Czerminski, Ryszard
ryszard.czermin...@astrazeneca.com wrote:
 I want to fit a linear model with fixed slope e.g. y = x + b
 (instead of general: y = a*x + b)

 Is it possible to do with lm()?

Yes.  The simplest way is to fit

lm(y - a*x ~ 1)

which will give you the estimate of b, its standard error, etc.

An alternative is to use an offset argument or an offset() expression
in the model formula

lm(y ~ 1 + offset(a*x))
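
A quick check on made-up data that the two forms agree; the fixed slope
here is a = 1, matching the y = x + b model in the question:

set.seed(42)
a <- 1
x <- 1:20
y <- a * x + 5 + rnorm(20)
coef(lm(y - a*x ~ 1))            # intercept estimate only
coef(lm(y ~ 1 + offset(a*x)))    # same intercept estimate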

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Big data (over 2GB) and lmer

2010-10-23 Thread Douglas Bates
On Thu, Oct 21, 2010 at 2:00 PM, Ben Bolker bbol...@gmail.com wrote:
 Michal Figurski figurski at mail.med.upenn.edu writes:

 I have a data set of roughly 10 million records, 7 columns. It has only
 about 500MB as a csv, so it fits in the memory. It's painfully slow to
 do anything with it, but it's possible. I also have another dataset of
 covariates that I would like to explore - with about 4GB of data...

 I would like to merge the two datasets and use lmer to build a mixed
 effects model. Is there a way, for example using 'bigmemory' or 'ff', or
 any other trick, to enable lmer to work on this data set?

   I don't think this will be easy.

   Do you really need mixed effects for this task?  i.e., are
 your numbers per group sufficiently small that you will benefit
 from the shrinkage etc. afforded by mixed models?  If you have
 (say) 10 individuals per group, 1000 groups, then I would
 expect you'd get very accurate estimates of the group coefficients;
 you could then calculate variances etc. among these estimates.

   You might get more informed answers on r-sig-mixed-mod...@r-project.org ...

lmer already stores the model matrices and factors related to the
random effects as sparse matrices.  Depending on the complexity of the
model - in particular, if random effects are defined with respect to
more than one grouping factor and, if so, if those factors are nested
or not - storing the Cholesky factor of the random effects model
matrix will be the limiting factor.  This object has many slots but
only two very large ones in the sense that they are long vectors.  At
present vectors accessed or created by R are limited to 2^31 elements
because they are indexed by 32-bit integers.

So the short answer is, it depends.  Simple models may be possible.
Complex models will need to wait upon decisions about using wider
integer representations in indices.
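
Ben's two-stage alternative (fit each group separately, then study the
spread of the estimates) can be sketched with lmList(), here on the
sleepstudy data shipped with lme4 as a stand-in for the real data:

library(lme4)
fits <- lmList(Reaction ~ Days | Subject, sleepstudy)  # one lm per group
coef(fits)       # per-group intercepts and slopes
var(coef(fits))  # variances/covariances among the group-wise estimates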

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lme vs. lmer results

2010-10-26 Thread Douglas Bates
On Tue, Oct 26, 2010 at 12:27 PM, Dimitri Liakhovitski
dimitri.liakhovit...@gmail.com wrote:
 Hello,
 and sorry for asking a question without the data - hope it can still
 be answered:

 I've run two things on the same data:

 # Using lme:
 mix.lme <- lme(DV ~ a+b+c+d+e+f+h+i, random = ~ e+f+h+i | group,
 data = mydata)

 # Using lmer
 mix.lmer <- lmer(DV
 ~a+b+c+d+(1|group)+(e|group)+(f|group)+(h|group)+(i|group), data =
 mydata)

Those models aren't the same and the model for lmer doesn't make
sense.  You would need to write the random effects terms as
(0+e|group), etc. because (e|group) is the same as (1 + e|group) so
you are including (Intercept) random effects for group in each of
those 5 terms.
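
The distinction can be seen on any grouped data set; a sketch using the
sleepstudy data from lme4 (not the poster's data):

library(lme4)
## (Days | Subject) is (1 + Days | Subject): correlated intercept and slope
m1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
## slope term without a duplicated intercept: prefix with 0 +
m2 <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject), sleepstudy)
VarCorr(m1)  # one term with a correlation
VarCorr(m2)  # two independent variance components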

To generate the same model as you fit with lme you would use

mix.lmer <- lmer(DV ~ a+b+c+d+(e+f+h+i|group), mydata)

I wouldn't recommend it though as this requires estimating 15
variance and covariance parameters for the random effects.  Almost
certainly the estimated variance-covariance matrix will end up being
singular.  Unless you are careful you may not notice this.

 lme provided an output (fixed effects and random effects coefficients).

lme is not as picky about singularity of the variance-covariance
matrix as lmer is.

 lmer gave me an error: Error in mer_finalize(ans) : Downdated X'X is
 not positive definite, 10.
 I've rerun lmer, but specifying the random effects for 2 fewer
 predictors. This time it ran and provided an output. (BTW, the random
 effects from lmer with 2 fewer predictors specified as random were very
 close to the output of lme).

Yes, lmer could converge in such a case but the parameter estimates
are not meaningful because of the ambiguity described above.

 Question:
 Looks like lmer could not invert the matrix, right?

Well, lmer never tries to invert matrices but it does factor them and
that is where the problem is recognized. However, I think that
singularity is a symptom of the problem, not the cause.

 But how come lme
 (which I thought was an earlier version of lmer) COULD invert it?

The computational methods in the two packages are quite different.  I
think that the methods in lme4 are superior because we have learned a
bit in the last 10 years.


 Greatly appreciate a clarification!


 --
 Dimitri Liakhovitski
 Ninah Consulting
 www.ninah.com

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need help with lmer model specification syntax for nested mixed model

2010-10-31 Thread Douglas Bates
On Sun, Oct 31, 2010 at 2:35 AM, Carabiniero ja...@troutnut.com wrote:

 I haven't been able to fully make sense of the conflicting online information
 about whether and how to specify nesting structure for a nested, mixed
 model.  I'll describe my experiment and hopefully somebody who knows lme4
 well can help.

 We're measuring the fluorescence intensity of brain slices from frogs that
 have undergone various treatments.  We want to use multcomp to look for
 differences between treatments, while accounting for the variance introduced
 by the random effects of brain and slice.  There are a few measurements per
 slice, several slices per brain, and several brains per treatment.  In the
 data file, the numbering for slices starts over from 1 for each brain, and
 the numbering for brains starts over from 1 for each treatment.

This is what I call implicit nesting in the definition of the
variables.  My general recommendation is to create new variables that
reflect the actual structure of the data, as in

mydata <- within(mydata, {
    ubrain <- factor(Treatment:Brain)
    uslice <- factor(Treatment:Brain:Slice)
})

then define the model in terms of these factors, ubrain and uslice,
that have the desirable property that each distinct brain has a
distinct label.

 In other words:  Treatment is a fixed effect, brain is a random effect
 nested in treatment, and slice is a random effect nested in brain.

 As I understood the documentation, this is the correct specification:

 log(Intensity) ~ Treatment + (1|Brain) + (1|Slice)

That will work with ubrain and uslice instead of the implicitly nested
Brain and Slice.
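
That is, roughly (a sketch reusing the names from the post, with mydata
holding the recoded factors from above):

library(lme4)
fm <- lmer(log(Intensity) ~ Treatment + (1 | ubrain) + (1 | uslice),
           data = mydata)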

 However, I don't see how lmer understands the correct nesting structure from
 that.  How does it know brain isn't crossed with treatment?

lmer can determine the crossed or nested structure from the data
whenever the data reflect the structure.  Implicitly nested factors
don't reflect the structure of the data and rely on external
information to augment the data given.

The computational methods used in lmer don't depend on whether the
grouping factors for the random effects are nested or not.  However
they do require that the grouping factors are well-defined.

 Here are two other things I tried, and each gave different results:

 log(Intensity) ~ Treatment + (1|Slice/Brain/Treatment)
 log(Intensity) ~ Treatment + (1|Brain/Treatment) + (1|Slice/Brain)

 I'm not sure why these things give different results, or which one (if any)
 is right.  Can anyone help?

I have taken the liberty of cc:ing the R-SIG-Mixed-Models mailing list
on this reply and suggest that any follow-ups be on that list.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using R for Production - Discussion

2010-11-02 Thread Douglas Bates
On Mon, Nov 1, 2010 at 11:04 PM, Santosh Srinivas
santosh.srini...@gmail.com wrote:
 Hello Group,

 This is an open-ended question.

 Quite fascinated by the things I can do and the control I have on my
 activities since I started using R.
 I basically have been using this for analytical related work off my desktop.
 My experience has been quite good and most issues where I need to
 investigate and solve are typical items more related to data errors, format
 corruption, etc... not necessarily R Related.

 Complementing this with Python gives enough firepower to do lots of
 production (analytical related activities) on the cloud (from my research I
 see that every innovative technology provider seems to support Python ...
 google, amazon, etc).

 Question on using R for Production activities:
 Q1) Does anyone have experience of using R-scripts etc ... for production
 related activities. E.g. serving off a computational/ analytical /
 simulation environment from a webportal with the analytical processing done
 in R.
 I've seen that most useful things for normal (not rocket science) business
 (80-20 rule) can be done just as well in R in comparison with tools like
 SAS, Matlab, etc.

 Q2) I haven't tried the processing routines for much larger data-sets
 assuming size is not a constraint nowadays.
 I know that I should try out ... but any forewarnings would help. Is it
 likely that something that works for my desktop dataset is quite as likely
 to work when scaled up to a cloud dataset?
 Assuming that I do the clearing out of unused objects, not running into
 infinite loops, etc?

 i.e. is there any problem with the fundamental architecture of R itself?
 (like press articles often say)


 Q3) There are big fans of the SAS, Matlab, Mathworks environments out there
 ... does anyone have a comparison of how R fares.
 From my experience R is quite neat and low level ... so overheads should be
 quite low.
 Most slowness comes due to lack of knowledge (see my code ... like using the
 wrong structures, functions, loops, etc.) rather than something wrong with
 the way R itself is.
 Perhaps there is no commercial focus to enhance performance related issues
 but my guess is that it is just matter of time till the community evolves
 the language to score higher on that too.
 And perhaps develops documentation to assist the challenged users with
 performance tips (the "ten commandments" type)

 Q4) You must have heard about the latest comment from James Goodnight of SAS
 ... "We haven't noticed that a lot. Most of our companies need industrial
 strength software that has been tested, put through every possible scenario
 or failure to make sure everything works correctly."
 My gut is that random passionate geeks (playing part-time) do better
 testing than a military of professionals ... (but I've no empirical evidence
 here)

 I am not taking a side here (although I appreciate those who do!) .. but
 looking for an objective reasoning.

Regarding performance and size of data sets I would suggest viewing
the presentation that Dirk Eddelbuettel and Romain Francois gave at
Google recently.  David Smith links to it in his blog at
blog.revolutionanalytics.com

One of the advantages of Open Source systems is that people can
provide many different kinds of hooks into the code.

At present R vector objects use 32-bit signed integers for
indexing, which limits the size of an individual vector to 2^{31}-1.
There are some methods available for using external storage to bypass
this but they do introduce another level of complexity.
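
The limit in question can be checked from R itself (this was the state
of affairs at the time; R 3.0.0 later added long vectors):

.Machine$integer.max   # 2147483647
2^31 - 1               # the same number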

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Integrate to 1? (gauss.quad)

2010-11-14 Thread Douglas Bates
I don't know about the statmod package and the gauss.quad function but
generally the definition of Gauss-Hermite quadrature is with respect
to the function that is multiplied by exp(-x^2) in the integrand.  So
your example would reduce to summing the weights.
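
Concretely (a sketch, not part of the original exchange): the Hermite
rule approximates the integral of f(t) * exp(-t^2), so integrating a
normal density requires the change of variable x = mu + sqrt(2) * s * t,
after which the integrand reduces to the constant 1/sqrt(pi):

library(statmod)
qq <- gauss.quad(5, kind = "hermite")
sum(qq$weights)              # sqrt(pi), not 1
sum(qq$weights / sqrt(pi))   # 1: the rule itself supplies the exp(-t^2)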

On Sun, Nov 14, 2010 at 11:18 AM, Doran, Harold hdo...@air.org wrote:
 Does anyone see why my code does not integrate to 1?

 library(statmod)

 mu <- 0
 s <- 1
 Q <- 5
 qq <- gauss.quad(Q, kind='hermite')
 sum((1/(s*sqrt(2*pi)))  * exp(-((qq$nodes-mu)^2/(2*s^2))) * qq$weights)

 ### This does what it is supposed to
 myNorm <- function(theta) (1/(s*sqrt(2*pi)))  * exp(-((theta-mu)^2/(2*s^2)))
 integrate(myNorm, -Inf, Inf)
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to do linear regression when dimension is high

2010-11-16 Thread Douglas Bates
On Tue, Nov 16, 2010 at 10:30 AM, poko2000 quan.poko2...@gmail.com wrote:

 Hi I am a newbie in R.
 I have data with dim of 20.
 How to use lm if i want to do regression with the whole design matrix? My y
 is the first column.
 the left are xs.
 Thanks a lot.

Do you have the data stored in a matrix or in a data frame?  If it is
in a data frame and the first column is named y then you can use a
call like

lm(y ~ ., mydata)

The '.' on the right hand side of the formula is expanded to a formula
incorporating all of the columns in the data frame except for the
response.  For example

 summary(lm(Fertility ~ ., swiss))

Call:
lm(formula = Fertility ~ ., data = swiss)

Residuals:
     Min       1Q   Median       3Q      Max
-15.2743  -5.2617   0.5032   4.1198  15.3213

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      66.91518   10.70604   6.250 1.91e-07 ***
Agriculture      -0.17211    0.07030  -2.448  0.01873 *
Examination      -0.25801    0.25388  -1.016  0.31546
Education        -0.87094    0.18303  -4.758 2.43e-05 ***
Catholic          0.10412    0.03526   2.953  0.00519 **
Infant.Mortality  1.07705    0.38172   2.822  0.00734 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.165 on 41 degrees of freedom
Multiple R-squared: 0.7067, Adjusted R-squared: 0.671
F-statistic: 19.76 on 5 and 41 DF,  p-value: 5.594e-10

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot vs print ??

2010-11-16 Thread Douglas Bates
On Tue, Nov 16, 2010 at 11:09 AM, skan juanp...@gmail.com wrote:

 What's the difference between using plot and using print in order to
 plot a graph?
 For example in order to plot the result of a histogram.

Could you give an example?

It seems that you are referring to graphics functions in packages such
as lattice and ggplot2 that produce objects that must be print'ed or
plot'ed before they are rendered on a graphics device.  However, we
would need to know if you mean the result of histogram() in the
lattice package or hist() in the graphics package before we could
answer.
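
For the lattice case the distinction looks like this (a sketch; inside a
function or a loop the explicit print is required, while at top level it
happens automatically):

library(lattice)
p <- histogram(~ Sepal.Length, data = iris)  # a "trellis" object; draws nothing
print(p)   # rendering happens here; plot(p) calls the same method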

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glmer, Error: Downdated X'X is not positive definite,49

2010-11-16 Thread Douglas Bates
It is often more effective to send questions about lmer or glmer to
the r-sig-mixed-mod...@r-project.org mailing list, which I am cc:ing
on this response.

On Tue, Nov 16, 2010 at 3:25 AM, Annika annika.opperb...@jyu.fi wrote:

 Dear list,

 I am new to this list and I am new to the world of R. Additionally I am not
 very firm in statistics either but have to deal. So here is my problem:
 I have a dataset (which I attach at the end of the post) with a binomial
 response variable (alive or not) and three fixed factors
 (trapping,treat,sex). I do have repeated measures and would like to include
 one (enclosure) random factor.
 I chose to use the glmer function (and tried the lmer as well).

 m1 <- glmer(alive~treat*sex*trapping*ID+(1|enclosure),family=binomial,data=survadult)

Is ID a factor?  In the data you posted it is an integer value but I
am guessing it should be a factor.  If so, you can't incorporate it in
the model because there is only one binary observation per level of
ID.

If I omit the ID in the fixed-effects part of the formula I can fit
the model with glmer.

 summary(m1 <- glmer(alive ~ trapping * treat * sex + (1|enclosure), trap,
 binomial))
Generalized linear mixed model fit by maximum likelihood ['summary.mer']
 Family: binomial
Formula: alive ~ trapping * treat * sex + (1 | enclosure)
   Data: trap
 AIC  BIC   logLik deviance
213.2111 334.5440 -73.6056 147.2111

Random effects:
 Groups    Name        Variance Std.Dev.
 enclosure (Intercept) 1.372    1.171
Number of obs: 292, groups: enclosure, 13

Fixed effects:
                                  Estimate Std. Error z value
(Intercept)                       10.61897   45.92452   0.231
trappingtrapping2                  0.03992   65.59766   0.001
trappingtrapping3                 -0.01196   64.74665   0.000
trappingtrapping4                 -0.74441   71.77010  -0.010
treatm/p                          -0.26605   72.78328  -0.004
treatp/m                          -0.18007   81.63861  -0.002
treatp/p                           0.50290   64.45725   0.008
sexm                              -8.61299   45.92746  -0.188
trappingtrapping2:treatm/p        -0.03657  103.39000   0.000
trappingtrapping3:treatm/p        -0.04916  101.87680   0.000
trappingtrapping4:treatm/p       -10.26432   91.32210  -0.112
trappingtrapping2:treatp/m        -0.04047  115.80425   0.000
trappingtrapping3:treatp/m        -8.59631   93.53334  -0.092
trappingtrapping4:treatp/m        -7.92162   98.52543  -0.080
trappingtrapping2:treatp/p        -0.05942   91.39839  -0.001
trappingtrapping3:treatp/p        -7.83591   78.98418  -0.099
trappingtrapping4:treatp/p        -9.60581   84.83662  -0.113
trappingtrapping2:sexm             0.92624   65.61333   0.014
trappingtrapping3:sexm            -1.45692   64.75568  -0.022
trappingtrapping4:sexm            -0.88035   71.77685  -0.012
treatm/p:sexm                      8.69446   93.57164   0.093
treatp/m:sexm                      8.61957  106.06371   0.081
treatp/p:sexm                      0.86615   64.46538   0.013
trappingtrapping2:treatm/p:sexm   -9.03327  118.95919  -0.076
trappingtrapping3:treatm/p:sexm   -7.51474  117.64003  -0.064
trappingtrapping4:treatm/p:sexm    0.81873  108.62671   0.008
trappingtrapping2:treatp/m:sexm   -9.51400  134.16089  -0.071
trappingtrapping3:treatp/m:sexm    0.48425  115.47998   0.004
trappingtrapping4:treatp/m:sexm   -2.37780  119.56181  -0.020
trappingtrapping2:treatp/p:sexm   -0.93097   91.42477  -0.010
trappingtrapping3:treatp/p:sexm    7.55060   79.00487   0.096
trappingtrapping4:treatp/p:sexm    9.47461   84.85476   0.112

 Both give the following error message:
       Error in mer_finalize(ans) : Downdated X'X is not positive definite,
 49.
       Additionally: Warning:
       glm.fit: fitted probabilities numerically 0 or 1 occurred
 I get that there is something wrong in my dataframe, but I don't understand
 what and how I could change it.
 If I run a glm without the random factor
        m1 <- glm(alive~treat*sex*trapping*ID,family=binomial,data=survadult)
 it works out really fine, but is the wrong analysis in my opinion. (Does glm
 actually consider the repeated measures? Does glmer?)

 Is glmer the right function for me and how do I deal with the error I get?

 Thanks for any help for a non-professional :)

 Annika


 trapping        treat   sex     alive   ID      enclosure
        trapping1       m/m     f       1       400     1
        trapping1       m/m     f       1       401     1
        trapping1       m/m     f       1       402     1
        trapping1       m/m     m       0       403     1
        trapping1       m/m     m       1       404     1
        trapping1       m/m     m       1       405     1
        trapping1       m/p     f       1       406     2
        trapping1       m/p     f       1       407     2
        trapping1       m/p     f       1       408     2
        trapping1       m/p     m       1       409     2
        trapping1       m/p     m       1       410     2
        

Re: [R] Question about GLMER

2010-11-16 Thread Douglas Bates
I am cc:ing the r-sig-mixed-mod...@r-project.org mailing list on this
reply as such questions are often answered more quickly on that list.

On Tue, Nov 16, 2010 at 2:00 PM, Daniel Jeske daniel.je...@ucr.edu wrote:
 Dear R Help,

 I believe the glmer() function in lme4 automatically fits an
 unstructured covariance matrix for the random effects.
 Is that true?

The short answer is no.

The longer answer is that you will need to be more specific about the
form of the data and the model before your question can be answered.
"Unstructured variance-covariance" is SAS-speak (and an oxymoron, but
I promise not to rant about that).

So please give us a context.

    If so, do I have an option to somehow ask for a
 diagonal structured covariance matrix?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Matrix Size

2010-07-14 Thread Douglas Bates
On Wed, Jul 14, 2010 at 4:23 PM, paul s r-project@queuemail.com wrote:
 hi -

 I just started using R as I am trying to figure out how to perform a linear
 regression on a huge matrix.

 i am sure this topic has passed through the email list before but could not
 find anything in the archives.

 I have a matrix that is 2,000,000 x 170,000; the values right now are
 arbitrary.

 i try to allocate this on a x86_64 machine with 16G of ram and i get the
 following:

 x <- matrix(0, 2000000, 170000);
 Error in matrix(0, 2e+06, 170000) : too many elements specified

R stores matrices and other data objects in memory.  A matrix of that
size would require
 2e+06*170000*8/2^30
[1] 2533.197

gigabytes of memory.  Start looking for a machine with at least 5
terabytes of memory (you will need more than one copy of the matrix to
be stored) or, probably easier, rethink your problem.

Results from a linear regression producing 170,000 coefficient
estimates are unlikely to be useful.  The model matrix is essentially
guaranteed to be rank deficient.
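
The back-of-the-envelope check above is worth wrapping up for reuse
(dense_gib is a made-up helper name, not from the post):

dense_gib <- function(nrow, ncol, bytes = 8) nrow * ncol * bytes / 2^30
dense_gib(2e6, 17e4)   # ~2533 GiB for the matrix described above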



 is R capable of handling data of this size? am i doing it wrong?

 cheers
 paul

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Passing the name of a variable to a function

2010-08-04 Thread Douglas Bates
On Wed, Aug 4, 2010 at 2:09 PM, Erik Iverson er...@ccbr.umn.edu wrote:
 Hello,


 I have a problem which has bitten me occasionally. I often need to
 prepare graphs for many variables in a data set, but seldom for all.
 or for any large number of sequential or sequentially named variables.
 Often I need several graphs for different subsets of the dataset
 for a given variable.  I run into similar problems with other needs
 besides graphing.

  What I would like to do is something like write a function which
 takes the *name* of a variable, presumably as a character string,
 from a dataframe, as one argument, and the dataframe, as a second
 argument.

 For example, where y is to be the name of a variable in a given
 dataframe d, and the other variables needed, T, M and so on, are
 to be found in the same dataframe:

 pf <- function (y,data,...) {
 p1 <- xyplot(y~x|T,data)
 p2 <- xyplot(y~x|T,subset(data,M == 2))
 p3 <- xyplot(y~x|T,subset(data,M == 4))
 #print(p1,p2,p3)
 }
 pf(Score1,data)
 pf(Score2,data)


 This fails, because, of course, Score1, Score2, etc. are not
 defined, or if you pass them as pf(data$Score2,data), then when you
 subset the
 data, data$Score2 is now the wrong shape.  I've come up with various
 inelegant hacks, (often with for loops), for getting around this over
 the
 last few years, but I can't help feeling that I'm missing something
 obvious, which I've been too dim to spot.

 Depending on your needs (e.g., you use formulas, which can be trickier),
 I think I often do something like:

 # I prefer this, I quote the variable name...

 library(lattice)   # needed for xyplot
 df1 <- data.frame(x = rnorm(100),
                   score1 = rnorm(100),
                   M = sample(c(2, 4), 100, replace = TRUE))

 pf <- function (y,data,...) {
   data$y <- data[[y]]
   xyplot(y~x, subset(data, M == 2))
 }

 pf("score1", df1)

 # as an alternative, use eval/substitute, don't have to quote

 pf2 <- function (y,data,...) {
   data$y <- eval(substitute(y), data)
   xyplot(y~x, subset(data, M == 2))
 }

That's okay until you get name collisions with y in the data frame.  I
would approach the problem by substituting into the formula and
perhaps changing the name y to a hidden name like .y (The general rule
is that a programmer can intentionally use a name starting with . and
expect that it will not conflict with the names chosen by users.
Hostile users who use variable names that start with . get what they
deserve.)

 eval(substitute(.y ~ x, list(.y = as.name("score1"))))
score1 ~ x
 str(eval(substitute(.y ~ x, list(.y = as.name("score1")))))
Class 'formula' length 3 score1 ~ x
  ..- attr(*, ".Environment")=<environment: R_GlobalEnv>

 pf2(score1, df1)
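
Putting the pieces together as a complete function (pf3 is a
hypothetical name; df1 is Erik's example data frame from above, with
lattice loaded):

pf3 <- function(y, data, ...) {
    form <- eval(substitute(.y ~ x, list(.y = as.name(y))))
    xyplot(form, data = subset(data, M == 2), ...)
}
pf3("score1", df1)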

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help installation lme4a, Error Message: lme4a is not a valid installed library

2010-08-05 Thread Douglas Bates
On Thu, Aug 5, 2010 at 9:31 AM, rod84 ngueye...@hotmail.com wrote:

 Dear R users,
 I recently downloaded the library lme4a by
 svn checkout svn://svn.r-forge.r-project.org/svnroot/lme4.
 I tried to install the library lme4a by copying the downloaded directory
 into the location where all the R libraries are saved.

You are copying the source package to the library where the compiled
packages are kept, which is why you are getting the error.  You will
need to install the package from the sources.  As a Linux user I find
the process straightforward.  Windows and Mac OS X users are often
taken aback by the complexity of the process.

What operating system are you using?
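
Two standard routes, sketched (the repository URL is R-forge's usual
one; R-forge also serves binaries when a package passes its checks):

## 1. let R build and install directly from the R-forge repository:
install.packages("lme4a", repos = "http://R-Forge.R-project.org")
## 2. or, from the svn checkout, build at the shell:
##      R CMD INSTALL lme4a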

 When I try to load the library, I obtain the message

 library(lme4a)
 Error in library(lme4a) : 'lme4a' is not a valid installed package

 R version 2.11.1 (2010-05-31)

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Help-installation-lme4a-Error-Message-lme4a-is-not-a-valid-installed-library-tp2314939p2314939.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lapack in R 2.11.1 (Ubuntu 10.04.1)

2010-09-15 Thread Douglas Bates
On Wed, Sep 15, 2010 at 3:48 PM, Matias Salibian-Barrera
msalib...@yahoo.ca wrote:
 Hi there,

 I'm trying to install the package RcppArmadillo in my R 2.11.1 which I 
 installed
 and regularly update via Ubuntu's repositories.


 When I try to install RcppArmadillo from CRAN I get:

 install.packages('RcppArmadillo', lib='~/myRlibs')
 [...]
 g++ -shared -o RcppArmadillo.so RcppArmadillo.o fastLm.o
 -L/home/matias/myRlibs/Rcpp/lib -lRcpp  
 -Wl,-rpath,/home/matias/myRlibs/Rcpp/lib
 -llapack -lblas -lgfortran -lm  -L/usr/lib/R/lib -lR


 /usr/bin/ld: cannot find -llapack

 I believe this means I don't have lapack available to link to.

 Does anybody know how I can fix this? I really only need to have RcppArmadillo
 running.


 I'm not a power user. I'm running:

 version
               _
 platform       i486-pc-linux-gnu
 arch           i486
 os             linux-gnu
 system         i486, linux-gnu
 status
 major          2
 minor          11.1
 year           2010
 month          05
 day            31
 svn rev        52157
 language       R
 version.string R version 2.11.1 (2010-05-31)

 and

 mat...@thecomputer:~$ cat /etc/issue
 Ubuntu 10.04.1 LTS \n \l

 Thanks a lot in advance.

Did you compile R or did you install the Ubuntu package from the CRAN
archives?  Installing the package from CRAN will automatically install
the Lapack library.  See the instructions at
http://cran.us.r-project.org/bin/linux/ubuntu/
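
If R did come from the CRAN/Ubuntu repositories, the usual cause of
"cannot find -llapack" is missing development libraries (an assumption
about this particular setup, not a diagnosis from the thread):

## at a shell:
##   sudo apt-get install r-base-dev liblapack-dev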

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lmer() vs. lme() gave different variance component estimates

2010-09-21 Thread Douglas Bates
I haven't had the time to keep up with this discussion, or many of the
other discussions on the R-SIG-Mixed-Models email list.  I am swamped
with other duties at present.

It is important to remember that the nlme and lme4 packages take a
model specification and provide code to evaluate the deviance.  Other
code is used to optimize the deviance.  In the case of nlme it is the
nlminb function and in the case of lme4 it is the minqa function.
There are no guarantees for numerical optimization routines.  If the
problem is ill-conditioned then they can converge to optima that are
not the global optimum.

Whoever suggested using the verbose option to see the progress of the
iterations has the right idea.
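
That suggestion in concrete form (a sketch on the sleepstudy data that
ships with lme4):

library(lme4)
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, verbose = TRUE)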


On Mon, Sep 20, 2010 at 2:50 PM, array chip arrayprof...@yahoo.com wrote:
 Thank you Peter and Ben for your comments.

 John


 - Original Message 
 From: Peter Dalgaard pda...@gmail.com
 To: array chip arrayprof...@yahoo.com
 Cc: r-help@r-project.org; r-sig-mixed-mod...@r-project.org
 Sent: Mon, September 20, 2010 12:28:43 PM
 Subject: Re: [R] lmer() vs. lme() gave different variance component estimates

 On 09/20/2010 08:09 PM, array chip wrote:
 Thank you Peter for your explanation of relationship between aov and lme. It
 makes perfect sense.


 When you said "you might have computed the average of all 8
 measurements on each animal and computed a 1-way ANOVA" for treatment effect,
 would this be the case for balanced design, or is it also true for unbalanced
 data?

 It is only exactly true for a balanced design, although it can be a
 practical expedient in nearly-balanced cases, especially if there is a
 clearly dominant animal variation. In strongly unbalanced data, you get
 reduced efficiency because animals with less data should be downweighted
 (not proportionally if there is substantial between-animal variation,
 though). And of course the whole thing relies on the fact that you have
 individuals nested in treatment (no animals had multiple treatments)


 Another question is if 1-way ANOVA is equivalent to mixed model for testing
 treatment effect, what would be the reason why a mixed model is used? Just
 to estimate the variance components? If the interest is not in the
 estimation of variance components, then there is no need to run mixed
 models to test treatment effects?

 Not too far off the mark. In more complex cases, there is the advantage
 that the mixed model helps figure out a sensible analysis for you.


 And my last question is I am glad to find that glht() from the multcomp
 package works well with a lmer() fit for multiple comparisons. Given
 Professor Bates's view that denominator degrees of freedom are not well
 defined in mixed models, are the results from glht() reasonable/meaningful?
 If not, will the suggested 1-way ANOVA used together with glht() give us
 correct post-hoc multiple comparison results?

 I think Doug's view is that DFs are not _reliably_ estimated with any of
 the current procedures. In the balanced cases, they are very well
 defined (well, give or take the issues with negative variances), and I
 would expect glht() to give meaningful results. Do check the residuals
 for at least approximate normality, though.



 Thank you very much!

 John





 - Original Message 
 From: Peter Dalgaard pda...@gmail.com
 To: array chip arrayprof...@yahoo.com
 Cc: r-help@r-project.org; r-sig-mixed-mod...@r-project.org
 Sent: Sat, September 18, 2010 1:35:45 AM
 Subject: Re: [R] lmer() vs. lme() gave different variance component estimates


 For a nested design, the relation is quite straightforward: The residual
 MS are the variances of sample means scaled to be comparable with the
 residuals (so that in the absence of random components, all
 MS are equal to within the F-ratio variability). So to get the id:eye
 variance component, subtract the Within MS from the id:eye MS and divide
 by the number of replicates (4 in this case since you have 640
 observations on 160 eyes) (14.4 - 0.01875)/4 = 3.59, and similarly, the
 id variance is the MS for id minus that for id:eye scaled by 8:
 (42.482-14.4)/8 = 3.51.

 I.e. it is reproducing the lmer results above, but of course not those
 from your original post.

 (Notice, by the way, that if you are only interested in the treatment
 effect, you might as well have computed the average of all 8
 measurements on each animal and computed a 1-way ANOVA).
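
Peter's variance-component arithmetic above, restated as code (the
numbers are from the quoted aov output):

(14.4   - 0.01875) / 4   # id:eye component, 4 replicates per eye  -> ~3.595
(42.482 - 14.4)    / 8   # id component, 8 measurements per animal -> ~3.510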



 --
 Peter Dalgaard
 Center for Statistics, Copenhagen Business School
 Phone: (+45)38153501
 Email: pd@cbs.dk  Priv: pda...@gmail.com

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] lmer() vs. lme() gave different variance component estimates

2010-09-21 Thread Douglas Bates
On Tue, Sep 21, 2010 at 1:39 PM, Douglas Bates ba...@stat.wisc.edu wrote:
 I haven't had the time to keep up with this discussion, or many of the
 other discussions on the R-SIG-Mixed-Models email list.  I am swamped
 with other duties at present.

 It is important to remember that the nlme and lme4 packages take a
 model specification and provide code to evaluate the deviance.  Other
 code is used to optimize the deviance.  In the case of nlme it is the
 nlminb function and in the case of lme4 it is the minqa function.

Shouldn't send email when I'm tired.  The optimizer function is bobyqa
in the minqa package. (Sort of makes you nostalgic for the punched
cards and line printers when you see the Fortran names jammed into six
characters.)

 There are no guarantees for numerical optimization routines.  If the
 problem is ill-conditioned then they can converge to optima that are
 not the global optimum.

 Whoever suggested using the verbose option to see the progress of the
 iterations has the right idea.


 On Mon, Sep 20, 2010 at 2:50 PM, array chip arrayprof...@yahoo.com wrote:
 Thank you Peter and Ben for your comments.

 John


 - Original Message 
 From: Peter Dalgaard pda...@gmail.com
 To: array chip arrayprof...@yahoo.com
 Cc: r-help@r-project.org; r-sig-mixed-mod...@r-project.org
 Sent: Mon, September 20, 2010 12:28:43 PM
 Subject: Re: [R] lmer() vs. lme() gave different variance component estimates

 On 09/20/2010 08:09 PM, array chip wrote:
 Thank you Peter for your explanation of relationship between aov and lme. It
 makes perfect sense.


 When you said you might have computed the average of all 8
 measurements on each animal and computed a 1-way ANOVA for treatment 
 effect,
 would this be the case for balanced design, or it is also true for 
 unbalanced
 data?

 It is only exactly true for a balanced design, although it can be a
 practical expedient in nearly-balanced cases, especially if there is a
 clearly dominant animal variation. In strongly unbalanced data, you get
 reduced efficiency because animals with less data should be downweighted
 (not proportionally if there is substantial between-animal variation,
 though). And of course the whole thing relies on the fact that you have
 individuals nested in treatment (no animals had multiple treatments)


 Another question is if 1-way ANOVA is equivalent to mixed model for testing
 treatment effect, what would be the reason why a mixed model is used? Just
 to estimate the variance components? If the interest is not in the
 estimation of variance components, then there is no need to run mixed
 models to test treatment effects?

 Not too far off the mark. In more complex cases, there is the advantage
 that the mixed model helps figure out a sensible analysis for you.


 And my last question is I am glad to find that glht() from the multcomp
 package works well with a lmer() fit for multiple comparisons. Given
 Professor Bates's view that denominator degrees of freedom are not well
 defined in mixed models, are the results from glht() reasonable/meaningful?
 If not, will the suggested 1-way ANOVA used together with glht() give us
 correct post-hoc multiple comparison results?

 I think Doug's view is that DFs are not _reliably_ estimated with any of
 the current procedures. In the balanced cases, they are very well
 defined (well, give or take the issues with negative variances), and I
 would expect glht() to give meaningful results. Do check the residuals
 for at least approximate normality, though.



 Thank you very much!

 John





 - Original Message 
 From: Peter Dalgaard pda...@gmail.com
 To: array chip arrayprof...@yahoo.com
 Cc: r-help@r-project.org; r-sig-mixed-mod...@r-project.org
 Sent: Sat, September 18, 2010 1:35:45 AM
 Subject: Re: [R] lmer() vs. lme() gave different variance component 
 estimates


 For a nested design, the relation is quite straightforward: The residual
 MS are the variances of sample means scaled to be comparable with the
 residuals (so that in the absence of random components, all
 MS are equal to within the F-ratio variability). So to get the id:eye
 variance component, subtract the Within MS from the id:eye MS and divide
 by the number of replicates (4 in this case since you have 640
 observations on 160 eyes) (14.4 - 0.01875)/4 = 3.59, and similarly, the
 id variance is the MS for id minus that for id:eye scaled by 8:
 (42.482-14.4)/8 = 3.51.

 I.e. it is reproducing the lmer results above, but of course not those
 from your original post.

 (Notice, by the way, that if you are only interested in the treatment
 effect, you might as well have computed the average of all 8
 measurements on each animal and computed a 1-way ANOVA).



 --
 Peter Dalgaard
 Center for Statistics, Copenhagen Business School
 Phone: (+45)38153501
 Email: pd@cbs.dk  Priv: pda...@gmail.com

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

Re: [R] lmer() vs. lme() gave different variance component estimates

2010-09-21 Thread Douglas Bates
On Tue, Sep 21, 2010 at 4:21 PM, Peter Dalgaard pda...@gmail.com wrote:
 On 09/21/2010 09:02 PM, Douglas Bates wrote:
 On Tue, Sep 21, 2010 at 1:39 PM, Douglas Bates ba...@stat.wisc.edu wrote:
 I haven't had the time to keep up with this discussion, or many of the
 other discussions on the R-SIG-Mixed-Models email list.  I am swamped
 with other duties at present.

 It's not like I don't know how you feel



 It is important to remember that the nlme and lme4 packages take a
 model specification and provide code to evaluate the deviance.  Other
 code is used to optimize the deviance.  In the case of nlme it is the
 nlminb function and in the case of lme4 it is the minqa function.

 Shouldn't send email when I'm tired.  The optimizer function is bobyqa
 in the minqa package. (Sort of makes you nostalgic for the punched
 cards and line printers when you see the Fortran names jammed into six
 characters.)

 There are no guarantees for numerical optimization routines.  If the
 problem is ill-conditioned then they can converge to optima that are
 not the global optimum.

 Right. However, what we have here is a case where I'm pretty damn sure
 that the likelihood function is unimodal (it's a linear
 reparametrization of three independent chi-square terms) and has an
 optimum in the interior of the feasible region.

 In any case, I'm thoroughly confused about the lme4/lme4a/lme4b
 subtrees. Which algorithm is used by the current CRAN lme4?? (I have

lme4b was a placeholder.  It should be ignored now.

lme4a is the development version that will become the released version
once Martin and I are convinced that it does all the things that lme4
does.  Before that can happen I have to get my head above water for a
week or two to catch up and it could be well into October before that
happens (I have a particularly heavy teaching load this semester).

lme4a is cleaner than lme4 but leans heavily on the Rcpp package to
achieve that.  So it uses classes and methods (although in different
senses) at both the R and the compiled code levels.  lme4a allows for
profiling the deviance in a LMM with respect to the parameters.

I had thought that the released version of lme4 (the one shown below)
allowed for selection of the optimizer but, upon further review (a
phrase known to fans of what Americans call football) it turns out
that it doesn't.  So I misspoke.  The released version of lme4 uses
nlminb, as shown by the form of the verbose output.

The lme4a version of lmer uses bobyqa.  It would probably be
worthwhile running this test case in that version.

 sessionInfo()
 R version 2.11.1 (2010-05-31)
 i386-redhat-linux-gnu

 locale:
  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
  [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
  [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
  [7] LC_PAPER=en_US.utf8       LC_NAME=C
  [9] LC_ADDRESS=C              LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] lme4_0.999375-35   Matrix_0.999375-39 lattice_0.18-8

 loaded via a namespace (and not attached):
 [1] grid_2.11.1   nlme_3.1-96   stats4_2.11.1 tcltk_2.11.1  tools_2.11.1

 )




 Whoever suggested using the verbose option to see the progress of the
 iterations has the right idea.


 On Mon, Sep 20, 2010 at 2:50 PM, array chip arrayprof...@yahoo.com wrote:
 Thank you Peter and Ben for your comments.

 John


 - Original Message 
 From: Peter Dalgaard pda...@gmail.com
 To: array chip arrayprof...@yahoo.com
 Cc: r-help@r-project.org; r-sig-mixed-mod...@r-project.org
 Sent: Mon, September 20, 2010 12:28:43 PM
 Subject: Re: [R] lmer() vs. lme() gave different variance component 
 estimates

 On 09/20/2010 08:09 PM, array chip wrote:
 Thank you Peter for your explanation of relationship between aov and lme. 
 It
 makes perfect sense.


 When you said you might have computed the average of all 8
 measurements on each animal and computed a 1-way ANOVA for treatment 
 effect,
 would this be the case for balanced design, or it is also true for 
 unbalanced
 data?

 It is only exactly true for a balanced design, although it can be a
 practical expedient in nearly-balanced cases, especially if there is a
 clearly dominant animal variation. In strongly unbalanced data, you get
 reduced efficiency because animals with less data should be downweighted
 (not proportionally if there is substantial between-animal variation,
 though). And of course the whole thing relies on the fact that you have
 individuals nested in treatment (no animals had multiple treatments)


 Another question is if 1-way ANOVA is equivalent to mixed model for testing
 treatment effect, what would be the reason why a mixed model is used? Just
 to estimate the variance components? If the interest is not in the
 estimation of variance components, then there is no need to run mixed
 models to test treatment effects

Re: [R] Odp: Object Browser

2010-09-28 Thread Douglas Bates
On Mon, Sep 27, 2010 at 3:04 AM, Petr PIKAL petr.pi...@precheza.cz wrote:
 Hi
 I noticed that nobody answered your question yet so here is my try.

 If you want to see what objects are in your environment you can use ls()
 but its output is only the names of objects. Here is a function I have
 used for a long time for checking what objects are there, their type,
 size and possibly rows/columns. You can modify it to give you some more
 info but usually it is not needed.

 Regards
 Petr

 ls.objects <- function (pos = 1, pattern, order.by)
 {
    napply <- function(names, fn) sapply(names, function(x) fn(get(x,
        pos = pos)))
    names <- ls(pos = pos, pattern = pattern)
    obj.class <- napply(names, function(x) as.character(class(x))[1])
    obj.mode <- napply(names, mode)
    obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
    obj.size <- napply(names, object.size)
    obj.dim <- t(napply(names, function(x) as.numeric(dim(x))[1:2]))
    vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
    obj.dim[vec, 1] <- napply(names, length)[vec]
    out <- data.frame(obj.type, obj.size, obj.dim)
    names(out) <- c("Type", "Size", "Rows", "Columns")
    if (!missing(order.by))
        out <- out[order(out[[order.by]]), ]
    out
 }

An alternative for the command line interface is

ls.str()

which combines ls() and a very brief version of str() as applied to
each of the objects.
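
For example:

x <- rnorm(5); df <- data.frame(a = 1:3, b = letters[1:3])
ls.str()   # each object's name followed by a one-line str() summary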

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Data source for American college football rankings?

2009-09-29 Thread Douglas Bates
An interesting, and topical, example of multivariate data for
classroom illustrations are the American college football rankings.
Starting at the end of October (or week 8, the 8th week of the
football season) a set of rankings called the BCS (Bowl Championship
Series) will be published.  This is a composite ranking based on two
subjective polls, the Harris and USA Today polls, and a trimmed mean
of 6 objective scores, the so-called computer rankings.  The Harris
and USA Today polls are currently available along with several others
(the best known and most often quoted is the AP poll but, for
complicated reasons, it was replaced in the BCS rankings by the Harris
poll).

I enjoy using these as classroom examples but I haven't found a web
site from which I can download the data directly and am reduced to
screen scraping the HTML from popular sites like ESPN.  Does anyone
know of a site from which one can download the current poll results?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data source for American college football rankings?

2009-09-29 Thread Douglas Bates
On Tue, Sep 29, 2009 at 7:56 AM, Douglas Bates ba...@stat.wisc.edu wrote:
 An interesting, and topical, example of multivariate data for
 classroom illustrations are the American college football rankings.
 Starting at the end of October (or week 8, the 8th week of the
 football season) a set of rankings called the BCS (Bowl Championship
 Series) will be published.  This is a composite ranking based on two
 subjective polls, the Harris and USA Today polls, and a trimmed mean
 of 6 objective scores, the so-called computer rankings.  The Harris
 and USA Today polls are currently available along with several others
 (the best known and most often quoted is the AP poll but, for
 complicated reasons, it was replaced in the BCS rankings by the Harris
 poll).

 I enjoy using these as classroom examples but I haven't found a web
 site from which I can download the data directly and am reduced to
 screen scraping the HTML from popular sites like ESPN.  Does anyone
 know of a site from which one can download the current poll results?

To follow up on my own posting, these data are interesting in part
because they are presented as tables in many, many places and almost
never presented graphically.  Some graphical analysis from the
rankings on October 21, 2007 is shown in

http://www.stat.wisc.edu/~bates/BCS-2007-10-21.pdf

It's not often that an example that many students find interesting can
be used to illustrate trimmed means, various correlation measures,
scatterplot matrices, parallel coordinate plots, principal components
and biplots.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Condition to factor (easy to remember)

2009-09-30 Thread Douglas Bates
On Wed, Sep 30, 2009 at 2:42 PM, Douglas Bates ba...@stat.wisc.edu wrote:
 On Wed, Sep 30, 2009 at 2:43 AM, Dieter Menne
 dieter.me...@menne-biomed.de wrote:

 Dear List,

 creating factors in a given non-default order is notoriously difficult to
 explain in a course. Students love the ifelse construct given below most,
 but I remember some comment from Martin Mächler (?) that ifelse should be
 banned from courses.

 Any better idea? Not necessarily short, easy to remember is important.

 Dieter

 data = c(1,7,10,50,70)
 levs = c("Pre","Post")

 # Typical C-Programmer style
 factor(levs[as.integer(data > 10)+1], levels=levs)

 # Easiest to understand
 factor(ifelse(data <= 10, levs[1], levs[2]), levels=levs)

 Why not

 factor(data > 10, labels = c("Pre", "Post"))
 [1] Pre  Pre  Pre  Post Post
 Levels: Pre Post

 All you have to remember is that FALSE comes before TRUE.

And besides, Frank Harrell will soon be weighing in to tell you why
you shouldn't dichotomize in the first place.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rounding error in seq(...)

2009-09-30 Thread Douglas Bates
On Wed, Sep 30, 2009 at 2:32 PM, Peter Dalgaardp.dalga...@biostat.ku.dk wrote:
 Martin Batholdy wrote:

 hum,

 can you explain that a little more detailed?
 Perhaps I miss the background knowledge - but it seems just absurd to me.

 0.1+0.1+0.1 is 0.3 - there is no rounding involved, is there?

 why is
 x <- 0.1 + 0.1 + 0.1
 not equal to
 y <- 0.3

 Remember that this is in BINARY arithmetic. It's really not any stranger
 than the fact that 1/3 + 1/3 != 2/3 in finite accuracy decimal arithmetic
 (0.3 + 0.3 = 0.6 != 0.7).

In an earlier thread on this theme I believe that someone quoted Brian
Kernighan as saying "10.0 times 0.1 is hardly ever 1.0" but I haven't been
able to track down the quote.  Can anyone point us to such a quote?
It summarizes the situation succinctly.
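
For anyone who wants to see the effect directly, a minimal illustration
in base R (the exact difference below is whatever double precision
happens to give; all.equal() compares with a tolerance):

x <- 0.1 + 0.1 + 0.1
y <- 0.3
x == y                          ## FALSE
x - y                           ## about 5.6e-17
all.equal(x, y)                 ## TRUE
seq(0, 1, by = 0.1)[4] == 0.3   ## FALSE, the seq() case from this thread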

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plotting 1 covariate, 3 factors

2009-10-07 Thread Douglas Bates
I'm not sure if this is exactly what you are looking for but I would
generally create an interaction plot using the lattice 'dotplot' with
type = c("p","a") so I get both the original data and the lines
joining the averages for the different factor levels.  I also prefer
the horizontal orientation to the vertical orientation.  Combining all
these variations produces something like

dotplot(f2 ~ y | f1, groups = f3, aspect = 0.2, layout = c(1,2),
        type = c("p","a"), pch = 21, strip = FALSE, strip.left = TRUE,
        auto.key = list(columns = 2, lines = TRUE))

On Wed, Oct 7, 2009 at 11:00 AM, Paul Chatfield p.s.chatfi...@rdg.ac.uk wrote:

 I'm interested in plotting a y with an x factor as the combination of 2
 factors and colour with respect to a third, which the code below does with
 interaction.plot().  However, this is because I redefine the x to be 1
 factor.  Is there a way of getting it to plot without redefining it, and
 ideally to not join up the lines BETWEEN levels a and b, but just join those
 between after and before for one level of f3.  I figure this could be done
 by manually drawing over blank lines using ?lines but am not sure what the
 coordinates would be and figured there is probably an easier way where
 someone has dealt with this before.  Any thoughts greatly appreciated,

 Paul

 #

 y <- rnorm(36)
 f1 <- rep(c("after","before"), 18)
 f2 <- rep(1:3, 12)
 f3 <- rep(1:2, each=18)

 ## Define new factor to be f1 and f3 for x axis - clumsy code, but gets it
 done;

 ff <- character(length(y))
 for (i in 1:length(y))
 {if (f1[i]=="after" & f3[i]==1) ff[i] <- "1, a"
 else if(f1[i]=="after" & f3[i]==2) ff[i] <- "2, a"
 else if(f1[i]=="before" & f3[i]==1) ff[i] <- "1, b"
 else ff[i] <- "2, b"}

 ## Plot of interest;

 interaction.plot(ff,f2,y)
 --
 View this message in context: 
 http://www.nabble.com/Plotting-1-covariate%2C-3-factors-tp25789442p25789442.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] custom selfStart model works with getInitial but not nls

2009-10-17 Thread Douglas Bates
On Fri, Oct 16, 2009 at 7:09 PM, Michael A. Gilchrist mi...@utk.edu wrote:
 Hello,

 I'm having problems creating and using a selfStart model with nlme.
  Briefly, I've defined the model, a selfStart object, and then combined them
 to make a selfStart.default model.

 If I apply getInitial to the selfStart model, I get results.  However, if I
 try using it with nls or nlsList, these routines complain about a lack of
 initial conditions.

 If someone could point out what I'm doing wrong, I'd greatly appreciate it.

Thanks for providing the code.  Could you also provide some data as a test case?


 Details:
 ## Nonlinear model I want to fit to the data
 const.PBMC.tcell.model <- function(B0, t, aL, aN, T0){

  Tb0 = B0;

  x = exp(-log(aL) + log(T0*aL+(-1+exp(t * aL))*Tb0 * aN) - t * aL);

  return(x);
 }


 ##Define selfStart routine
 const.PBMC.tcell.selfStart <- function(mCall, LHS, data){

  t0 = 0;
  t1 = 24;
  t2 = 48;

  ##Get B0 Value
  B0 = data[1, "B0"];

  T0 = mean(data[data$Time==t0, "Count"]);
  T1 = mean(data[data$Time==t1, "Count"]);
  T2 = mean(data[data$Time==t2, "Count"]);

  if(T0 < T2){ ##increase -- doesn't work
    stop(paste("Error in const.PBMC.tcell.start: T0 < T2 for data: ", data[1,
 ]));

  }
  ##Estimate aL based on exponential decline from t=0 to t=24
  aLVal = -(log(T1) - log(T0))/(t1-t0);

  ##Estimate aNVal based on final value
  aNVal = aLVal*T2/B0;

  values = list(aLVal, aNVal, T0);
  names(values) <- mCall[c("aL", "aN", "T0")]; #mimic syntax used by PB
  return(values)
 }



 ##Now create new model with selfStart attributes
 const.PBMC.tcell.modelSS <- selfStart(model = const.PBMC.tcell.model,
 initial = const.PBMC.tcell.selfStart)


 ##Test routines using getInitial -- This works

 getInitial(Count ~ const.PBMC.tcell.modelSS(B0, Time,aL, aN, T0), data =
 tissueData)

 $aL
 [1] 0.05720924

 $aN
 [1] 0.1981895

 $T0
 [1] 1360.292

 ##Now try to use the SS model -- this doesn't work

 nls(Count ~ const.PBMC.tcell.modelSS(B0, Time,aL, aN, T0), data =
 tissueData)

 Error in numericDeriv(form[[3L]], names(ind), env) :
  Missing value or an infinity produced when evaluating the model
 In addition: Warning message:
 In nls(Count ~ const.PBMC.tcell.modelSS(B0, Time, aL, aN, T0), data =
 tissueData) :
  No starting values specified for some parameters.
  Initializing 'aL', 'aN', 'T0' to '1.'.
 Consider specifying 'start' or using a selfStart model

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cairo package, png files within for loop are black?

2009-10-22 Thread Douglas Bates
On Thu, Oct 22, 2009 at 12:46 PM, Douglas M. Hultstrand
dmhul...@metstat.com wrote:
 Hello,

 I am generating .png images using the Cairo package in a for loop (looping
 on the number of zones; the number of zones equals the number of plots to
 create based on different zone data).  When I run the R script the .png files
 are created but they are all black.  If I comment out the for loop and force
 my zones to equal one, the png file is created correctly.  Is there an issue
 with generating .png files within a for loop?

Probably not.  The behavior you observe is due to a lattice graphics
function being called in a loop.  When called within a loop you must
print or show the result of a lattice graphics function.  See R FAQ
7.22.
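
A minimal sketch of the corrected loop (the object names x.p, zones and
the column zone are assumptions based on the post; the essential change
is the print() around the xyplot() call):

library(Cairo)
library(lattice)
for (z in zones) {
  CairoPNG(paste("zone_", z, ".png", sep = ""), width = 800, height = 600)
  print(xyplot(areasqmi ~ value, subset(x.p, zone == z), type = "l"))
  dev.off()
}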

 Create .png commands within for loop:

 CairoPNG(paste(t.g), width=800, height=600, pointsize=12, bg="white")
   xyplot(areasqmi ~ value, x.p, groups=dur,
   main=(t.n), ylab=expression(Area(mi^2)),
   xlab="Maximum Average Depth of Precipitation (inches)",
   scales=list(y=list(log=TRUE)), ylim=c(1,100), type='l',
   auto.key=list(space='right'))
 dev.off()

 --
 -
 Douglas M. Hultstrand, MS
 Senior Hydrometeorologist
 Metstat, Inc. Windsor, Colorado
 voice: 970.686.1253
 email: dmhul...@metstat.com
 web: http://www.metstat.com

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] What happen for Negative binomial link in Lmer fonction?

2009-10-22 Thread Douglas Bates
On Thu, Oct 22, 2009 at 1:39 PM, Ben Bolker bol...@ufl.edu wrote:

 ROBARDET Emmanuelle wrote:

 Dear R users,
 I'm performing some GLMMs analysis with a negative binomial link.
 I already performed such analysis some months ago with the lmer() function
 but when I tried it today I encountered this problem:
 Erreur dans famType(glmFit$family) : unknown GLM family: 'Negative
 Binomial'

 Does anyone know if the negative binomial family has been removed from
 this function?
 I really appreciate any response.
 Emmanuelle



 I would be extremely surprised if this worked in the past; to
 the best of my knowledge the negative binomial family has
 never been implemented in lmer.

I too would be extremely surprised if it had worked in the past,
considering that I have never implemented it.

I did exchange email with Bill Venables about it and we formulated
what seems to be a reasonable approach but it hasn't made it to the
top of the To Do list yet.  Right now the big push is on code to
profile the log-likelihood with respect to the parameters so we can
actually get confidence intervals and, the holy grail of
mixed-modeling, p-values.
 One could in principle
 do a glmmPQL fit with the negative binomial family
 (with a fixed value of the overdispersion parameter).
 glmmADMB is another option.
 Can you say which version etc. you were using???

 Follow-ups should probably be sent to r-sig-mixed-mod...@r-project.org 
 --
 View this message in context: 
 http://www.nabble.com/What-happen-for-Negative-binomial-link-in-Lmer-fonction--tp26013041p26015140.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Automatization of non-linear regression

2009-10-22 Thread Douglas Bates
Alternatively you could install the NRAIA package and collapse the
construction of the plot to a call to plotfit as shown in the
enclosed.

Note that this is a poor fit of a logistic growth model.  There are no
data values beyond the inflection point so the Asym parameter (the
asymptote) is poorly determined and the asymptote and the inflection
point on the x axis (the xmid parameter) will be highly correlated.
Furthermore the increasing variability is due to differences between
plants.  You can check with

xyplot(severity ~ time, data1, groups = plant, type = c("g", "b"))

so a nonlinear mixed-effects model may be more appropriate.  See
Pinheiro and Bates (Springer, 2000).
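
A hedged sketch of that shortcut (assuming the data1 from the thread
below; plotfit() is the NRAIA function mentioned above, which plots the
data with the fitted curve overlaid):

library(NRAIA)
fit2 <- nls(severity ~ SSlogis(time, Asym, xmid, scal), data = data1)
plotfit(fit2)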

2009/10/22 Peter Ehlers ehl...@ucalgary.ca:
 Joel,

 The following should answer most of your questions:

  time <- c(seq(0,10), seq(0,10), seq(0,10))
  plant <- c(rep(1,11), rep(2,11), rep(3,11))
  severity <- c(
        42,51,59,64,76,93,106,125,149,171,199,
        40,49,58,72,84,103,122,138,162,187,209,
        41,49,57,71,89,112,146,174,218,250,288)/288

 # Suggestion: you don't need cbind() to make a dataframe:

  data1 <- data.frame(time, plant, severity)

 # Plot severity versus time
 # you can avoid the awkward ..$.. by using with():

  with(data1, plot(time, severity, type="n"))
  with(data1, text(time, severity, plant))
  title(main="Graph of severity vs time")

 # Now you use either the 'getInitial' approach and
 # transform your parameter estimates to your
 # preferred ones, or you can use SSlogis for your
 # model and transform the parameter estimates
 # afterwards:

 # Method 1:
 # 
  ini <- getInitial(
        severity ~ SSlogis(time, alpha, xmid, scale),
        data = data1
  )
  ini <- unname(ini) ##to get rid of names

 # start vector in terms of alpha, beta, gamma:
  para0.st <- c(
         alpha = ini[1],
         beta  = ini[2]/ini[3],
         gamma = 1/ini[3]
  )

  fit0 <- nls(
          severity~alpha/(1+exp(beta-gamma*time)),
          data1,
          start=para0.st,
          trace=T
  )

 # logistic curve on the scatter plot
  co <- coef(fit0)  ##get the parameter estimates
  curve(co[1]/(1+exp(co[2]-co[3]*x)), add=TRUE)

 # you don't need from, to, since the plot is already
 # set up and you don't want to restrict the curve;

 # personally, I prefer defining the function to be plotted:

  f <- function(x, abc){
    alpha <- abc[1]
    beta  <- abc[2]
    gamma <- abc[3]
    alpha / (1 + exp(beta - gamma * x))
  }

 # then you can plot the curve with:

  curve(f(x, coef(fit0)), add = TRUE)


 # Method 2:
 # 
  fit2 <- nls(severity ~ SSlogis(time, Asym, xmid, scal), data = data1)
  co <- unname(coef(fit2))
  abc <- c(co[1], co[2]/co[3], 1/co[3])
  with(data1, plot(time, severity, type="n"))
  with(data1, text(time, severity, plant))
  title(main="Graph of severity vs time")
  curve(f(x, abc), add = TRUE)


 Cheers,
 -Peter Ehlers

 Joel Fürstenberg-Hägg wrote:

 Hi everybody,


 I'm using the method described here to make a linear regression:



 http://www.apsnet.org/education/advancedplantpath/topics/Rmodules/Doc1/05_Nonlinear_regression.html



 ## Input the data that include the variables time, plant ID, and severity
 time <- c(seq(0,10), seq(0,10), seq(0,10))
 plant <- c(rep(1,11), rep(2,11), rep(3,11))

 ## Severity represents the number of
 ## lesions on the leaf surface, standardized
 ## as a proportion of the maximum
 severity <- c(

 +         42,51,59,64,76,93,106,125,149,171,199,
 +         40,49,58,72,84,103,122,138,162,187,209,
 +         41,49,57,71,89,112,146,174,218,250,288)/288

 data1 <- data.frame(

 +         cbind(
 +                 time,
 +                 plant,
 +                 severity
 +         )
 + )

 ## Plot severity versus time ## to see the relationship between## the two
 variables for each plant
 plot(

 +         data1$time,
 +         data1$severity,
 +         xlab="Time",
 +         ylab="Severity",
 +         type="n"
 + )

 text(

 +         data1$time,
 +         data1$severity,
 +         data1$plant
 + )

 title(main="Graph of severity vs time")

 getInitial(

 +         severity ~ SSlogis(time, alpha, xmid, scale),
 +         data = data1
 + )
     alpha      xmid     scale 
  2.212468 12.506960  4.572391 

 ## Using the initial parameters above,
 ## fit the data with a logistic curve.
 para0.st <- c(

 +         alpha=2.212,
 +         beta=12.507/4.572, # beta in our model is xmid/scale
 +         gamma=1/4.572 # gamma (or r) is 1/scale
 + )

 fit0 <- nls(

 +         severity~alpha/(1+exp(beta-gamma*time)),
 +         data1,
 +         start=para0.st,
 +         trace=T
 + )
 0.1621433 :  2.2120000 2.7355643 0.2187227
 0.1621427 :  2.2124095 2.7352979 0.2187056

 ## Plot to see how the model fits the data; plot the
 ## logistic curve on a scatter plot
 plot(

 +         data1$time,
 +         data1$severity,
 +         type="n"
 + )

 text(

 +         data1$time,
 +         data1$severity,
 +         data1$plant
 + )

 title(main="Graph of severity vs time")

 curve(

 +         

Re: [R] glm.fit to use LAPACK instead of LINPACK

2009-10-22 Thread Douglas Bates
On Thu, Oct 22, 2009 at 10:26 AM, Ravi Varadhan rvarad...@jhmi.edu wrote:
 Ted,

 LAPACK is newer and is supposed to contain better algorithms than LINPACK.  
 It is not true that LAPACK cannot handle rank-deficient problems.  It can.

It's not just a question of handling rank-deficiency.  It's the
particular form of pivoting that is used so that columns associated
with the same term stay adjacent.

The code that is actually used in glm.fit and lm.fit, called through
the Fortran subroutine dqrls, is a modified version of the Linpack
dqrdc subroutine.

 However, I do not know if using LAPACK in glm.fit instead of LINPACK would be 
 faster and/or more memory efficient.

The big thing that could be gained is the use of level-3 BLAS.  The
current code uses only level-1 BLAS.  Accelerated BLAS implementations
gain far more from level-3 calls than from level-1 calls.

Even so, I doubt that the QR decomposition is the big time sink in
glm.fit.  Why not profile a representative fit and check?  I did
profile the glm.fit code a couple of years ago and discovered that a
lot of time was being spent evaluating the various family functions
like the inverse link and the variance function and that was because
of calls to pmin and pmax.
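
A sketch of such a profiling run (the model and data names here are
hypothetical; Rprof() and summaryRprof() are standard R):

Rprof("glmfit.out")
fit <- glm(y ~ x1 + x2, family = binomial, data = dat)
Rprof(NULL)
summaryRprof("glmfit.out")$by.self   ## where the time is actually spent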

Before trying to change very tricky Fortran code you owe it to
yourself to check what the potential gains would be.

 - Original Message -
 From: Ted tchi...@sickkids.ca
 Date: Thursday, October 22, 2009 10:53 am
 Subject: Re: [R] glm.fit to use LAPACK instead of LINPACK
 To: r-help@R-project.org r-help@r-project.org


 Hi,

 I understand that the glm.fit calls LINPACK fortran routines instead of
 LAPACK because it can handle the 'rank deficiency problem'.  If my data
 matrix is not rank deficient, would a glm.fit function which runs on
 LAPACK be faster?  Would it be worthwhile to convert glm.fit to use
 LAPACK?  Has anyone done this already??  What is the best way to do this?

 I'm looking at very large datasets (thousands of glm calls), and would
 like to know if it's worth the effort for performance issues.

 Thanks,

 Ted

 -
 Ted Chiang
   Bioinformatics Analyst
   Centre for Computational Biology
   Hospital for Sick Children, Toronto
   416.813.7028
   tchi...@sickkids.ca

 __
 R-help@r-project.org mailing list

 PLEASE do read the posting guide
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inspect s4 methods

2009-10-22 Thread Douglas Bates
On Thu, Oct 22, 2009 at 8:11 PM, Ista Zahn iz...@psych.rochester.edu wrote:
 Hi everyone,
 I'm sure this has been asked before, but I could not find the right
 search terms to find it.

 If I want to see what summarizing a model fit with lm() does, I can
 write summary.lm. But what if I want to see what summarizing a model
 fit with lmer() (lme4 package) does?

 showMethods(summary)
 Function: summary (package base)
 object=ANY
 object=character
    (inherited from: object=ANY)
 object=mer
 object=sparseMatrix
 object=summary.mer

 tells me there is a mer method, but summary.mer gives me no joy.
 How can I see what summary.mer does? Do I have to inspect the source
 code?

Actually if you had followed up with the documentation for showMethods
you would have found out.

showMethods("summary", class = "mer", include = TRUE)

More details are given in Uwe's article in issue 4 of 2006 R News
http://cran.r-project.org/doc/Rnews/Rnews_2006-4.pdf

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] package lme4

2009-11-02 Thread Douglas Bates
On Sun, Nov 1, 2009 at 9:01 AM, wenjun zheng wjzhen...@gmail.com wrote:
 Hi R Users,
     When I use package lme4 for mixed model analysis, I can't distinguish
 the significant and insignificant variables from all random independent
 variables.
     Here is my data and result:
 Data:

  Rice <- data.frame(Yield=c(8,7,4,9,7,6,9,8,8,8,7,5,9,9,5,7,7,8,8,8,4,8,6,4,8,8,9),
                 Variety=rep(rep(c("A1","A2","A3"),each=3),3),
                 Stand=rep(c("B1","B2","B3"),9),
                 Block=rep(1:3,each=9))
    Rice.lmer <- lmer(Yield ~ (1|Variety) + (1|Stand) + (1|Block) +
 (1|Variety:Stand), data = Rice)

 Result:

 Linear mixed model fit by REML
 Formula: Yield ~ (1 | Variety) + (1 | Stand) + (1 | Block) + (1 |
 Variety:Stand)
   Data: Rice
   AIC   BIC logLik deviance REMLdev
  96.25 104.0 -42.12    85.33   84.25
 Random effects:
  Groups        Name        Variance Std.Dev.
  Variety:Stand (Intercept) 1.345679 1.16003
  Block         (Intercept) 0.000000 0.00000
  Stand         (Intercept) 0.888889 0.94281
  Variety       (Intercept) 0.024691 0.15714
  Residual                  0.666667 0.81650
 Number of obs: 27, groups: Variety:Stand, 9; Block, 3; Stand, 3; Variety, 3

 Fixed effects:
            Estimate Std. Error t value
 (Intercept)   7.1852     0.6919   10.38

 Can you give me some advice for recognizing the significant variables among
 random effects above without other  calculating.

Well, since the estimate of the variance due to Block is zero, that's
probably not one of the significant random effects.

Why do you want to do this without other calculations?  In olden days
when each model fit involved substantial calculations by hand one did
try to avoid fitting multiple models but now that is not a problem.
You can get a hint of which random effects will be significant by
looking at their precision in a caterpillar plot and then fit the
reduced model and use anova to compare models.  See the enclosed
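
A compressed sketch of that workflow, using the fits fm1 (with Block) and
fm2 (without) from the enclosed transcript; anova() refits by ML and gives
a likelihood-ratio test, and dotplot() of ranef() draws the caterpillar
plot mentioned above (wrap it in print() inside a script):

anova(fm2, fm1)
dotplot(ranef(fm2, postVar = TRUE))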

    Any suggestions will be appreciated.
 Wenjun

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


R version 2.10.0 Patched (2009-11-01 r50288)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

 library(lme4)
Loading required package: Matrix
Loading required package: lattice
 Rice <- data.frame(Yield = c(8,7,4,9,7,6,9,8,8,8,7,5,9,9,5,7,
+7,8,8,8,4,8,6,4,8,8,9),
+Variety = gl(3, 3, 27, labels = c("A1","A2","A3")),
+Stand = gl(3, 1, 27, labels = c("B1","B2","B3")),
+Block = gl(3, 9))
 print(fm1 <- lmer(Yield ~ 1 + (1|Block) + (1|Stand) +
+   (1|Variety) + (1|Variety:Stand), Rice))
Linear mixed model fit by REML 
Formula: Yield ~ 1 + (1 | Block) + (1 | Stand) + (1 | Variety) + (1 |  
Variety:Stand) 
   Data: Rice 
   AIC   BIC logLik deviance REMLdev
 96.25 104.0 -42.12    85.33   84.25
Random effects:
 GroupsNameVariance Std.Dev.
 Variety:Stand (Intercept) 1.34568  1.16003 
 Variety   (Intercept) 0.02469  0.15713 
 Stand (Intercept) 0.88889  0.94281 
 Block (Intercept) 0.00000  0.00000 
 Residual  0.66667  0.81650 
Number of obs: 27, groups: Variety:Stand, 9; Variety, 3; Stand, 3; Block, 3

Fixed effects:
Estimate Std. Error t value
(Intercept)   7.1852 0.6919   10.38
 ## Estimate of Block variance is zero, remove that term
 print(fm2 <- lmer(Yield ~ 1 + (1|Stand) + (1|Variety) + (1|Variety:Stand),
+   Rice))
Linear mixed model fit by REML 
Formula: Yield ~ 1 + (1 | Stand) + (1 | Variety) + (1 | Variety:Stand) 
   Data: Rice 
   AIC   BIC logLik deviance REMLdev
 94.25 100.7 -42.12    85.33   84.25
Random effects:
 GroupsNameVariance Std.Dev.
 Variety:Stand (Intercept) 1.345679 1.16003 
 Variety   (Intercept) 0.024692 0.15714 
 Stand (Intercept) 0.888889 0.94281 
 Residual  0.666667 0.81650 
Number of obs: 27, groups: Variety:Stand, 9; Variety, 3; Stand, 3

Fixed effects:
Estimate Std. Error t value
(Intercept)   7.1852 0.6919   10.38
 ## Very small estimate of variance for Variety
 ## Check the precision of the random effects
 pl2 <- 

Re: [R] package lme4

2009-11-03 Thread Douglas Bates
On Tue, Nov 3, 2009 at 9:11 AM, wenjun zheng wjzhen...@gmail.com wrote:
 Maybe I can calculate a p value by t testing approximately:
  1-qnorm(Variance/Std.Dev.)

That would be a z test, not a t test, wouldn't it?  And it would only
be meaningful if the distribution of the estimator is approximately
normal, which we know it definitely is not.  Estimates of variances
have distributions like a chi-square, not like a normal.  In
particular, the estimators are not symmetrically distributed about the
parameter value.

 But which function can help me to extract Variance and Std.Dev values from
 the results below:

The VarCorr extractor produces the estimates of the variance.  You
would have to write your own functions to determine an approximate
standard error of those estimates and I would not advise doing so.
Firstly, the code would be rather complicated and secondly the answer
would be of very limited usefulness, for the reasons discussed above.

print(fm2 <- lmer(Yield ~ 1 + (1|Stand) + (1|Variety) +
 (1|Variety:Stand), Rice))
 Linear mixed model fit by REML
 Formula: Yield ~ 1 + (1 | Stand) + (1 | Variety) + (1 | Variety:Stand)
    Data: Rice
    AIC   BIC logLik deviance REMLdev
  94.25 100.7 -42.12    85.33   84.25
 Random effects:
  Groups        Name        Variance Std.Dev.
  Variety:Stand (Intercept) 1.345679 1.16003
  Variety       (Intercept) 0.024692 0.15714
  Stand         (Intercept) 0.888889 0.94281
  Residual                  0.666667 0.81650
 Number of obs: 27, groups: Variety:Stand, 9; Variety, 3; Stand, 3
 Fixed effects:
             Estimate Std. Error t value
 (Intercept)   7.1852     0.6919   10.38

 2009/11/2 Douglas Bates ba...@stat.wisc.edu

 On Sun, Nov 1, 2009 at 9:01 AM, wenjun zheng wjzhen...@gmail.com wrote:
  Hi R Users,
      When I use package lme4 for mixed model analysis, I can't
  distinguish
  the significant and insignificant variables from all random independent
  variables.
      Here is my data and result:
  Data:
 
 
    Rice <- data.frame(Yield=c(8,7,4,9,7,6,9,8,8,8,7,5,9,9,5,7,7,8,8,8,4,8,6,4,8,8,9),
                  Variety=rep(rep(c("A1","A2","A3"),each=3),3),
                  Stand=rep(c("B1","B2","B3"),9),
                  Block=rep(1:3,each=9))
      Rice.lmer <- lmer(Yield ~ (1|Variety) + (1|Stand) + (1|Block) +
   (1|Variety:Stand), data = Rice)
 
  Result:
 
  Linear mixed model fit by REML
  Formula: Yield ~ (1 | Variety) + (1 | Stand) + (1 | Block) + (1 |
  Variety:Stand)
    Data: Rice
    AIC   BIC logLik deviance REMLdev
   96.25 104.0 -42.12    85.33   84.25
  Random effects:
   Groups        Name        Variance Std.Dev.
   Variety:Stand (Intercept) 1.345679 1.16003
   Block         (Intercept) 0.000000 0.00000
   Stand         (Intercept) 0.888889 0.94281
   Variety       (Intercept) 0.024691 0.15714
   Residual                  0.666667 0.81650
  Number of obs: 27, groups: Variety:Stand, 9; Block, 3; Stand, 3;
  Variety, 3

  Fixed effects:
             Estimate Std. Error t value
  (Intercept)   7.1852     0.6919   10.38

  Can you give me some advice for recognizing the significant variables
  among
  random effects above without other  calculating.

 Well, since the estimate of the variance due to Block is zero, that's
 probably not one of the significant random effects.

 Why do you want to do this without other calculations?  In olden days
 when each model fit involved substantial calculations by hand one did
 try to avoid fitting multiple models but now that is not a problem.
 You can get a hint of which random effects will be significant by
 looking at their precision in a caterpillar plot and then fit the
 reduced model and use anova to compare models.  See the enclosed

     Any suggestions will be appreciated.
  Wenjun
 
         [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] package lme4

2009-11-03 Thread Douglas Bates
On Tue, Nov 3, 2009 at 8:08 AM, wenjun zheng wjzhen...@gmail.com wrote:
 Thanks,Douglas,
 It really helps me a lot, but is there any other way if I want to show
 whether a random effect is significant in a text file, like a P value or
 other index.
 Thanks very much again.
 Wenjun.

Well there are p-values from the likelihood ratio tests in that
transcript I sent.

The point of those tests is that a p-value can only be calculated when
you know both the null hypothesis and the alternative, which is why
those p-values are the result of comparing two nested model fits.

 2009/11/2 Douglas Bates ba...@stat.wisc.edu

 On Sun, Nov 1, 2009 at 9:01 AM, wenjun zheng wjzhen...@gmail.com wrote:
  Hi R Users,
      When I use package lme4 for mixed model analysis, I can't
  distinguish
  the significant and insignificant variables from all random independent
  variables.
      Here is my data and result:
  Data:
 
 
    Rice <- data.frame(Yield=c(8,7,4,9,7,6,9,8,8,8,7,5,9,9,5,7,7,8,8,8,4,8,6,4,8,8,9),
                  Variety=rep(rep(c("A1","A2","A3"),each=3),3),
                  Stand=rep(c("B1","B2","B3"),9),
                  Block=rep(1:3,each=9))
      Rice.lmer <- lmer(Yield ~ (1|Variety) + (1|Stand) + (1|Block) +
   (1|Variety:Stand), data = Rice)
 
  Result:
 
  Linear mixed model fit by REML
  Formula: Yield ~ (1 | Variety) + (1 | Stand) + (1 | Block) + (1 |
  Variety:Stand)
    Data: Rice
    AIC   BIC logLik deviance REMLdev
   96.25 104.0 -42.12    85.33   84.25
  Random effects:
   Groups        Name        Variance Std.Dev.
   Variety:Stand (Intercept) 1.345679 1.16003
   Block         (Intercept) 0.000000 0.00000
   Stand         (Intercept) 0.888889 0.94281
   Variety       (Intercept) 0.024691 0.15714
   Residual                  0.666667 0.81650
  Number of obs: 27, groups: Variety:Stand, 9; Block, 3; Stand, 3;
  Variety, 3

  Fixed effects:
             Estimate Std. Error t value
  (Intercept)   7.1852     0.6919   10.38

  Can you give me some advice for recognizing the significant variables
  among
  random effects above without other  calculating.

 Well, since the estimate of the variance due to Block is zero, that's
 probably not one of the significant random effects.

 Why do you want to do this without other calculations?  In olden days
 when each model fit involved substantial calculations by hand one did
 try to avoid fitting multiple models but now that is not a problem.
 You can get a hint of which random effects will be significant by
 looking at their precision in a caterpillar plot and then fit the
 reduced model and use anova to compare models.  See the enclosed

     Any suggestions will be appreciated.
  Wenjun
 
         [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] LINEAR MIXED EFFECT

2009-11-12 Thread Douglas Bates
On Thu, Nov 12, 2009 at 10:14 AM, milton ruser milton.ru...@gmail.com wrote:
 Hi Ana,

 I am not quite sure if it is the problem, but if you call your data.frame
 exp,
 you will mask the exp() function... try using another name for your data.frame.

 By the way, I suggest you not use attach().

 Try something like model <- lme(weight~date*diet, data=yourdataframe)

 good luck

and include the random specification in the call to lme, not as a
separate assignment.
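
Putting those two suggestions together, a hedged sketch (the data frame
and grouping-factor names are placeholders, since the original data set
is not shown here):

library(nlme)
model <- lme(weight ~ date * diet, random = ~ 1 | subject,
             data = yourdataframe, na.action = na.omit)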

 milton

 On Thu, Nov 12, 2009 at 5:43 AM, ANARPCG a.gouvei...@imperial.ac.uk wrote:





 Milton's point is dead-on, and I would highly encourage you to give the
 posting guide a look.

 That said... you might try na.action = na.omit in your call to...
 actually, we don't know what function you are using (see first point).
 Regardless, sounds like you have missing data and na.action is set to
 na.fail (ie, fail if any missing data).

 cheers, Dave


 milton ruser wrote:
 
  Dear Ana Golveia,
 
  It is completely impossible for someone to realise what kind of help you need
 or
  what is happening. I suggest you take a look at the posting guide, mainly
  that part about a minimum reproducible code with self explaining
  information, etc.
 
  Cheers
 
  milton
 
  On Wed, Nov 11, 2009 at 7:22 AM, ANARPCG a.gouvei...@imperial.ac.uk
  wrote:
 
 
  CAN ANYONE PLEASE HELP ME WITH THIS
  i HAVE TO DO A MIXED EFFECT LINEAR MODEL WITH MY DATA DUE TO THE FACT
  THAT
  I
  have pseudoreplication!
  Although after reading and trying it for several times can get around
 due
  to
  Error in na.fail.default(list(date = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
  :
  missing values in object
  I uploaded my data file
  Thank you so much
  Kind regards
  AG
  http://old.nabble.com/file/p26300394/rawoctobercalciumexperiment2.txt
  rawoctobercalciumexperiment2.txt
 
  --
  View this message in context:
  http://old.nabble.com/LINEAR-MIXED-EFFECT-tp26300394p26300394.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html
 http://www.r-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
        [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
  sorry I am new on this things
  the point is after trying to run the model all that happens is:
  exp <- read.table(file=file.choose(), header=T)
  attach(exp)
  names(exp)
  [1] group     date      diet      weight    thickness length
  [7] width     liplength lipwidth
 
 exp$diet=factor(exp$diet,levels=c("zeropercent","tenpercent","twentypercent","thirtypercent","fortypercent","cuttleprecent"))
  exp=na.omit(exp)
  library(nlme)
  random=~date
  model <- lme(weight~date*diet)
  Error in na.fail.default(list(date = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,  :
    missing values in object
 
   I have pseudoreplications due to the fact that the measurements of the
  replicates have different starting points so I was advised to do lme. I
   never used it before and I can't get around with it! the help I wanted
   from you is to help me to understand how to build a runnable model!
 
 
 
 http://old.nabble.com/file/p26315302/rawoctobercalciumexperiment2.txt
 rawoctobercalciumexperiment2.txt
 --
 View this message in context:
 http://old.nabble.com/LINEAR-MIXED-EFFECT-tp26300394p26315302.html
  Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lme model specification

2009-11-15 Thread Douglas Bates
On Sun, Nov 15, 2009 at 7:19 AM, Green, Gerwyn (greeng6)
g.gre...@lancaster.ac.uk wrote:
 Dear all

 this is a question of model specification in lme for which I'd greatly
 appreciate some guidance.

 Suppose I have data in long format

 gene treatment rep Y
 1        1      1  4.32
 1        1      2  4.67
 1        1      3  5.09
 .        .      .    .
 .        .      .    .
 .        .      .    .
 1        4      1   3.67
 1        4      2   4.64
 1        4      3   4.87
 .        .      .    .
 .        .      .    .
 .        .      .    .
 2000     1      1  5.12
 2000     1      2  2.87
 2000     1      3  7.23
 .        .      .    .
 .        .      .    .
 .        .      .    .
 2000     4      1   2.48
 2000     4      2   3.93
 2000     4      3   5.17


 that is, I have data Y_{gtr} for g (gene) = 1,...,2000, t (treatment) =
 1,...,4 and r (replicate) = 1,...,3

 I would like to fit the following linear mixed model using lme

 Y_{gtr} = \mu_{g} +  W_{gt} + Z_{gtr}

 where the \mu_{g}'s are fixed gene effects, W_{gt} ~ N(0, \sigma^{2}) 
 gene-treatment interactions, and residual errors Z_{gtr} ~ N(0,\tau^{2}). 
 (Yes, I know I'm specifying an interaction between gene and treatment without 
 specifying a treatment main effect ! - there is good reason for this)

You are going to end up estimating 2000 fixed-effects parameters for
gene, which will take up a lot of memory (one copy of the model matrix
for the fixed-effects will be 24000 by 2000 double precision numbers
or about 400 MB).  You might be able to fit that in lme as

lme(Y ~ -1 + factor(gene), data = data, random = ~ 1|gene:treatment)

but it will probably take a long time or run out of memory.  There is
an alternative which is to use the development branch of the lme4
package that allows for a sparse model matrix for the fixed-effects
parameters.  Or ask yourself if you really need to model the genes as
fixed effects instead of random effects.  We have seen situations
where users do not want the shrinkage involved with random effects but
it is rare.
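
For reference, a hedged sketch of the two routes just described (lme4a
is the development branch mentioned; its sparseX argument is what keeps
the 24000 by 2000 fixed-effects model matrix sparse):

## route 1: development branch with a sparse fixed-effects model matrix
library(lme4a)   ## compiled from R-forge, as noted above
fm <- lmer(Y ~ -1 + factor(gene) + (1 | gene:treatment),
           data = data, sparseX = TRUE)
## route 2: model gene as a random effect and accept the shrinkage
fm2 <- lmer(Y ~ 1 + (1 | gene) + (1 | gene:treatment), data = data)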

If you want to follow up on the development branch (for which binary
packages are not currently available, i.e. you need to compile it
yourself) then we can correspond off-list.


 I know that specifying

 model.1 <- lme(Y ~ -1 + factor(gene), data=data, random= ~1|gene/treatment)

 fits Y_{gtr} = \mu_{g} +  U_{g} + W_{gt} + Z_{gtr}

 with \mu_{g}, W_{gt}  and Z_{gtr} as previous and U_{g} ~ N(0,\gamma^{2}), 
 but I do NOT want to specify a random gene effect. I have scoured Bates and 
 Pinheiro without coming across a parallel example.


 Any help would be greatly appreciated

 Best


 Gerwyn Green
 School of Health and Medicine
 Lancaster Uinversity

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function which saves an image of a dgtMatrix as png

2010-04-28 Thread Douglas Bates
image applied to a sparseMatrix object uses lattice functions to
create the image.  As described in R FAQ 7.22 you must use

print(image(x))

or

show(image(x))

or even

plot(image(x))

when a lattice function is called from within another function.
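
So a corrected version of the function from the question would be, per
the advice above:

f <- function(x){
  png("myImage.png")
  print(image(x))   ## the explicit print() is the essential change
  dev.off()
}
f(aDgt)
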
On Wed, Apr 28, 2010 at 1:20 PM, Gildas Mazo gildas.m...@curie.fr wrote:
 Hi,

 I'm getting crazy:

 This does work:

 library(Matrix)
 a1 <- b1 <- c(1,2)
 c1 <- rnorm(2)
 aDgt <- spMatrix(ncol=3, nrow=3, i=a1, j=b1, x=c1)
 png("myImage.png")
 image(aDgt)
 dev.off()

 But this doesn't !!!

 f <- function(x){
 png("myImage.png")
 image(x)
 dev.off()
 }
 f(aDgt)

 My image is saved as a text file and contains nothing at all !!!
 Thanks in advance,

 Gildas Mazo

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error in La.svd Lapack routine 'dgesdd'

2010-05-04 Thread Douglas Bates
Google the name dgesdd to get the documentation where you will find
that  the error code indicates that the SVD algorithm failed to
converge.  Evaluation of the singular values and vectors is done via
an iterative optimization and on some occasions will fail to converge.
 Frequently this is related to the scaling of the matrix.  If some
rows or columns are of very large magnitude relative to others, the
convergence of the optimization can be impeded.

Providing a reproducible example of such an error condition will help
in diagnosing what is happening.

If you wonder why the error message is so enigmatic, it is because the
underlying code is Fortran and does not provide much flexibility for
informative error trapping.
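
A hedged illustration of the scaling remedy described above (X is a
hypothetical matrix; scale() simply brings the columns to comparable
magnitude before the decomposition is attempted):

Xs <- scale(X)
sv <- svd(Xs)   ## often converges where svd(X) fails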

On Tue, May 4, 2010 at 1:24 AM, steven mosher mosherste...@gmail.com wrote:
 Error in La.svd(x, nu, nv) : error code 1 from Lapack routine ‘dgesdd’

 what resources are there to track down errors like this

        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] need help in understanding R code, and maybe some math

2010-05-23 Thread Douglas Bates
On Sun, May 23, 2010 at 5:09 AM, Peter Ehlers ehl...@ucalgary.ca wrote:
 On 2010-05-23 0:56, john smith wrote:

 Hi,
 I am trying to implement Higham's algorithm for correcting a non positive
 definite covariance matrix.
 I found this code in R:

 http://projects.cs.kent.ac.uk/projects/cxxr/trac/browser/trunk/src/library/Recommended/Matrix/R/nearPD.R?rev=637

 I managed to understand most of it, the only line I really don't
 understand
 is this one:
  X <- tcrossprod(Q * rep(d[p], each=nrow(Q)), Q)

  This line is supposed to calculate the matrix product Q*D*Q^T, where Q is an n
  by
  m matrix and D is a diagonal matrix. What does this mean?
 I also don't understand the meaning of a cross product between matrices, I
 only know it between vectors.

In the original S language, on which R is based, the function named
crossprod was used for what statisticians view as the cross-product of
the columns of a matrix, such as a multivariate data matrix or a model
matrix.  That is

crossprod(X) := X'X

This is a special case of the cross-product of the columns of two
matrices with the same number of rows

crossprod(X, Y) := X'Y

The tcrossprod function was introduced more recently to mean the
crossprod of the transpose of X.  That is

tcrossprod(X) := crossprod(t(X)) := X %*% t(X)

These definitions are unrelated to the cross-product of vectors used
in Physics and related disciplines.

The reason for creating such functions is that these are common
operations in statistical computing and it helps to know the special
structure (e.g. the result of crossprod(X) or tcrossprod(X) is a
symmetric, positive semidefinite matrix).

 You could have a look at the help page for crossprod which
 gives the definitions of crossprod and tcrossprod.

 Perhaps this will help:

  Q <- matrix(1:12, ncol=3)
  v <- rep(1:3, each=nrow(Q))
 Q
 v
 Q * v
 (Q * v) %*% t(Q)
 tcrossprod(Q * v, Q)

  -Peter Ehlers


 Thanks,
 Barisdad.

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regression with sparse matricies

2010-05-23 Thread Douglas Bates
As Frank mentioned in his reply, expecting to estimate tens of
thousands of fixed-effects parameters in a logistic regression is
optimistic.  You could start with a generalized linear mixed model
instead

library(lme4)
fm1 <- glmer(resp ~ 1 + (1|f1) + (1|f2) + (1|f1:f2), mydata, binomial)

If you have difficulty with that it might be best to switch the
discussion to the r-sig-mixed-mod...@r-project.org mailing list.

On Sat, May 22, 2010 at 2:19 PM, Robin Jeffries rjeffr...@ucla.edu wrote:
 I would like to run a logistic regression on some factor variables (main
 effects and eventually an interaction) that are very sparse. I have a
 moderately large dataset, ~100k observations with 1500 factor levels for one
 variable (x1) and 600 for another (X2), creating ~19000 levels for the
 interaction (X1:X2).

 I would like to take advantage of the sparseness in these factors to avoid
 using GLM. Actually glm is not an option given the size of the design
 matrix.

 I have looked through the Matrix package as well as other packages without
 much help.

 Is there some option, some modification of glm, some way that it will
 recognize a sparse matrix and avoid large matrix inversions?

 -Robin

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sparse matrices in lme4

2010-05-24 Thread Douglas Bates
On Mon, May 24, 2010 at 6:24 PM, Robin Jeffries rjeffr...@ucla.edu wrote:
 I read somewhere (help list, documentation) that the random effects in lme4
 uses sparse matrix technology.

Yes.  That is why there is such a close link between the Matrix and
lme4 packages.  The sparse matrix methods in the Matrix package are
crucial to the lme4 package.

 I'd like to confirm with others that I can't use a sparse matrix as a fixed
 effect? I'm getting an "Invalid type (S4)" error.

The development version of the lme4 package, called lme4a and only
available from R-forge, has an optional argument sparseX that allows
for this.

In fact, that option is one of the reasons that the development
version has been in development for so long (the programming becomes
much more intricate when you must allow for such an option).

Some day, and I hope not too far in the future, the lme4a package will
be released as lme4.

By the way, questions such as this are probably more suitable for the
r-sig-mixed-mod...@r-project.org mailing list, which I am cc:ing on
this reply.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Best way to preallocate numeric NA array?

2009-11-26 Thread Douglas Bates
On Thu, Nov 26, 2009 at 10:03 AM, Rob Steele
freenx.10.robste...@xoxy.net wrote:
 These are the ways that occur to me.

 ## This produces a logical vector, which will get converted to a numeric
 ## vector the first time a number is assigned to it.  That seems
 ## wasteful.
 x <- rep(NA, n)

 ## This does the conversion ahead of time but it's still creating a
 ## logical vector first, which seems wasteful.
 x <- as.numeric(rep(NA, n))

 ## This avoids type conversion but still involves two assignments for
 ## each element in the vector.
 x <- numeric(n)
 x[] <- NA

 ## This seems reasonable.
 x <- rep(as.numeric(NA), n)

 Comments?

My intuition would be to go with the third method (allocate a numeric
vector then assign NA to its contents) but I haven't tested the
differences.  In fact, it would be difficult to see differences in, for
example, execution time unless n was very large.
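
A rough timing sketch of the options above (n is chosen large so that
any differences are visible; exact results are machine-dependent, and
NA_real_ is the numeric NA, equivalent to as.numeric(NA)):

n <- 1e8
system.time(x1 <- rep(NA, n))                  ## logical, converted later
system.time({x3 <- numeric(n); x3[] <- NA})    ## method 3
system.time(x4 <- rep(NA_real_, n))            ## numeric NA directly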

This brings up a different question which is, why do you want to
consider this?  Are you striving for readability, for speed, for low
memory footprint, for efficiency in some other way?  When we were
programming in S on machines with 1 mips processors and a couple of
megabytes of memory, such considerations were important.  I'm not sure
they are quite as important now.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple CHOLMOD errors when attempting poisson glmm

2009-12-25 Thread Douglas Bates
On Thu, Dec 24, 2009 at 1:03 PM, postava-davig.m
postava-davi...@husky.neu.edu wrote:

 Hello,

 I have been attempting to run a poisson glmm using lme4 for some time now
 and have had a lot of trouble.  I would say 9 times out of 10 I receive the
 following warning:

 CHOLMOD warning:  %h
 Error in mer_finalize(ans) :
  Cholmod error `not positive definite' at
 file:../Cholesky/t_cholmod_rowfac.c, line 432

That is an (admittedly obscure) indication that the Cholesky
factorization of a matrix derived from the random-effects model matrix
cannot be performed.

 My data are counts of microbe colony forming units (CFUs) collected from
 termite cuticles and the surrounding environment over a 3 year period.  I am
 attempting to analyze the effect of several factors on these counts (termite
 nest volume, temperature, humidity, light, incubation temperature, habitat,
 year, sample location, etc.) to determine which account for the variance in
 microbial communities.  These data are observations, so there are many
 missing values, which may be part of the problem.  I've tried many
 different combinations of variables, and also have tried reducing my data
 set to remove as many NA's and confounding variables as possible, but I
 still can't get any models to work consistently.  One most recent attempt
 had the following output:

 
model1=lmer(totalcfus~habitat*temp*moisture*light+location+(1|habitat/colony/location),family=poisson,control=list(msVerbose=1))
  0:     553377.59:  1.00573 0.620530 0.169516  26.3904 -13.1266 -33.2286
 -21.1955 -21.1064 -0.590761 -0.217403 -0.0342272 -0.960593 -0.0962517
 0.441626  1.20575 0.718621 0.680580 0.171006 0.403729 0.278822 0.275395
 0.00707767 0.0225599 0.0854869 0.0533373 0.0243451 0.00114120 0.000403226
 -0.00566960 -0.0143715 -0.00931896 -0.00879323 -0.000753236 -0.00335745
 -0.00178054 -0.000788027 -0.000288944 -0.000909455 -0.000839295 -0.000309293
 -1.35885e-05 9.76120e-06 3.57035e-05 2.78985e-05 1.01880e-05
 CHOLMOD warning:  %h
 Error in mer_finalize(ans) :
  Cholmod error `not positive definite' at
 file:../Cholesky/t_cholmod_rowfac.c, line 432

Thank you for including the output from verbose = TRUE.  It would also
help if you included the output from sessionInfo() so we can see which
version of R you are using and which version of the lme4 package you
are using.

How many observations are used in this fit?  As you can see, the
number of parameters being fit is very large and encountering
singularities is not unexpected.

May I suggest that we move this discussion to the
r-sig-mixed-mod...@r-project.org mailing list, which I have cc:d on
this reply?  That list is specifically intended for discussions of
this type.
 I have to admit that I'm at a loss, and have been unable to determine any
 pattern to when this error message comes up.  I'm hoping that someone can
 help me eke out what the issue is with my data so that I can eventually work
 out a usable model.

 Thanks so much, and happy holidays.
 --
 View this message in context: 
 http://n4.nabble.com/Multiple-CHOLMOD-errors-when-attempting-poisson-glmm-tp978573p978573.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lmer: how to lmer with no intercept

2009-12-29 Thread Douglas Bates
On Tue, Dec 29, 2009 at 11:46 AM, Julia7 liujul...@yahoo.com wrote:

 Hello,

 How do I fit a mixed effect model with no intercept using lmer()?
 Is the following syntax correct?

 lmer(y ~ -1+x1+x2+(-1+x1+x2|id), data=dt)

Yes.  An alternative, which I prefer, is

lmer(y ~ 0 + x1 + x2 + (0 + x1 + x2|id), dt)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fwd: glm error: cannot correct step size

2009-12-30 Thread Douglas Bates
I forgot to cc: the list on this response.


-- Forwarded message --
From: Douglas Bates ba...@stat.wisc.edu
Date: Wed, Dec 30, 2009 at 8:05 AM
Subject: Re: [R] glm error: cannot correct step size
To: John Sorkin jsor...@grecc.umaryland.edu


On Wed, Dec 30, 2009 at 7:50 AM, John Sorkin
jsor...@grecc.umaryland.edu wrote:
 R 2.8.1
 windows XP

 I am getting an error message that I don't understand when I try to run GLM. 
 The error only occurs when I have all independent variables in the model. 
 When I drop one independent variable, the model runs fine. Can anyone help me 
 understand what the error means and how I can correct it?
 Thank you,
 John

 fit11 <- glm(AAMTCARE~BMI+BMIsq+SEX+jPHI+jMEDICAID+factor(AgeCat)+
 + 
 factor(jINDINC)+jMARSTAT+jEDUCATION+factor(jsmokercat)+factor(jrace),data=SimData,family=Gamma(link=log))
 Warning: step size truncated due to divergence
 Error in glm.fit(x = X, y = Y, weights = weights, start = start, etastart = 
 etastart,  :
  inner loop 1; cannot correct step size

That error message and the warning indicate that the algorithm
(iteratively reweighted least squares or IRLS) for fitting the
parameters cannot converge, which can be because the model is
over-specified.  As you see, when you remove one of the terms you do get
parameter estimates and I would guess that, even then, the model will
be over-specified.

 # Drop factor(jrace) and model runs without a problem.
 fit11 <- glm(AAMTCARE~BMI+BMIsq+SEX+jPHI+jMEDICAID+factor(AgeCat)+
 + factor(jINDINC)+jMARSTAT+jEDUCATION+factor(jsmokercat)       
 ,data=SimData,family=Gamma(link=log))

 # Drop jPHI and model runs without a problem.
 fit11 <- glm(AAMTCARE~BMI+BMIsq+SEX+    jMEDICAID+factor(AgeCat)+
 + 
 factor(jINDINC)+jMARSTAT+jEDUCATION+factor(jsmokercat)+jrace,data=SimData,family=Gamma(link=log))

 Thanks,
 John


 John David Sorkin M.D., Ph.D.
 Chief, Biostatistics and Informatics
 University of Maryland School of Medicine Division of Gerontology
 Baltimore VA Medical Center
 10 North Greene Street
 GRECC (BT/18/GR)
 Baltimore, MD 21201-1524
 (Phone) 410-605-7119
 (Fax) 410-605-7913 (Please call phone number above prior to faxing)

 Confidentiality Statement:
 This email message, including any attachments, is for th...{{dropped:6}}

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Piecewise regression in lmer

2010-01-04 Thread Douglas Bates
On Mon, Jan 4, 2010 at 6:24 AM, Walmes Zeviani
walmeszevi...@hotmail.com wrote:

 AD Hayward wrote:

 Dear all,

 I'm attempting to use a piecewise regression to model the trajectory
 of reproductive traits with age in a longitudinal data set using a
 mixed model framework. The aim is to find three slopes and two points-
 the slope from low performance in early age to a point of high
 performance in middle age, the slope (may be 0) of the plateau from
 the start of high performance to the end of high performance , and the
 slope of the decline from the end of high performance to the end of
 life.

 I've found the segmented package useful, but it cannot be implemented
 in a mixed model framework. I've also attempted piecewise regression
 using this formula in lmer:

 m <- lmer(repro ~ OTHER FIXED EFFECTS + age*(age < 2) + age*(age >= 2 &
 age < 8) + age*(age >= 8) + (1|id) + (1|yr), data = reproduction,
 family = binomial, link = logit, GHQ = TRUE)

 However, this gives the warning:

 Warning message:
 In mer_finalize(ans) : gr cannot be computed at initial par (65)

 which is not apparent if I use just two break points or I implement
 the model in glm.

 My question is essentially whether anyone can recommend a method for
 performing piecewise regression in lmer or another mixed model
 framework. Any advice would be greatly appreciated.

 Adam,

 A segmented linear model, for estimation purposes, is a nonlinear model. It
 requires an iterative procedure for estimation of fixed effects. You could
 use nlmer() for this.

It appears that Adam is using fixed knot positions, in which case the
segmented model is a linear model.  He is also using family = binomial
so it is a generalized linear mixed model, which does require
iterative optimization, but does not require nlmer().
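
With knots fixed at, say, ages 2 and 8, the segmented trend can also be
written with hinge terms, which keeps the model linear in the
coefficients; a sketch reusing Adam's placeholder for the other terms:

m <- lmer(repro ~ OTHER FIXED EFFECTS + age + pmax(age - 2, 0) +
          pmax(age - 8, 0) + (1|id) + (1|yr), data = reproduction,
          family = binomial)

Each pmax() coefficient is the change in slope at its knot, so the three
segment slopes are recovered by partial sums of the age coefficients.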



Re: [R] glm output for binomial family

2010-06-08 Thread Douglas Bates
On Tue, Jun 8, 2010 at 7:10 AM, Enrico Colosimo enrico...@gmail.com wrote:
 Hello,
  I am having some trouble running a very simple
 example. I am running a logistic regression entering the SAME data set
 in two different forms and getting different values for the deviance residual.

 Just look with this naive data set:

 
 # 1- Entering as a Bernoulli data set
  y <- c(1,0,1,1,0)
  x <- c(2,2,5,5,8)
  ajust1 <- glm(y~x, family=binomial(link=logit))
  ajust1
 #
 Coefficients:
 (Intercept)            x
     1.3107      -0.2017

 Degrees of Freedom: 4 Total (i.e. Null);  3 Residual
 Null Deviance:      6.73
 Residual Deviance: 6.491        AIC: 10.49
 #
 # 2- Entering as Binomial data set
 #
  ysim <- c(1,2,0)
  ynao <- c(1,0,1)
  x <- c(2,5,8)
  dados <- cbind(ysim,ynao,x)
  dados <- as.data.frame(dados)
  attach(dados)
  ajust2 <- glm(as.matrix(dados[,c(1,2)])~x, family=binomial, data=dados)
  summary(ajust2)
 #
 Coefficients:
 (Intercept)            x
     1.3107      -0.2017

 Degrees of Freedom: 2 Total (i.e. Null);  1 Residual
 Null Deviance:      3.958
 Residual Deviance: 3.718        AIC: 9.104
 =

 It seems that there is a problem with the first fitting!!!

In what way?  Notice that the estimates of the coefficients are the
same in the two fits and the difference between the null deviance and
the residual deviance is approximately the same.  If you are worried
about the deviance in the first fit being greater than the deviance in
the second fit it is because of the definition of the deviance used.
The deviance for the binomial fit is a shifted version of the deviance
from the Bernoulli fit, which is why the null deviance is also
reported.
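
A sketch with the toy data above makes this concrete (x is renamed in the
second fit only to keep both objects around):

y <- c(1,0,1,1,0); x <- c(2,2,5,5,8)
fit1 <- glm(y ~ x, family = binomial)
ysim <- c(1,2,0); ynao <- c(1,0,1); x2 <- c(2,5,8)
fit2 <- glm(cbind(ysim, ynao) ~ x2, family = binomial)
cbind(coef(fit1), coef(fit2))           # identical estimates
fit1$null.deviance - deviance(fit1)     # about 0.24
fit2$null.deviance - deviance(fit2)     # about 0.24 as well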



Re: [R] do faster ANOVAS

2010-06-10 Thread Douglas Bates
The lm and aov functions can take a matrix response allowing you to
fit all of the responses for a single attribute simultaneously.
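
For the data described below, a sketch of a single fit per attribute
(assuming the timepoint columns really are named t1 ... t101 and the
factors are named as in the aov() call below):

d1 <- data[data$attribute == "A1", ]
fit <- aov(as.matrix(d1[, paste("t", 1:101, sep = "")]) ~ Subject * Product,
           data = d1)
summary(fit)   # one ANOVA table per timepoint from a single fit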


On Thu, Jun 10, 2010 at 8:47 AM, melissa mellep...@orange.fr wrote:
 Dear all R users,
 I want to realize 800 000 ANOVAS and to store Sum of Squares of the effects. 
 Here is an extract of my table data
 Product attribute subject rep t1 t2 t3 … t101
 P1 A1 S1 R1 1 0 0 … 1
 I want to realize 1 ANOVA per timepoint and per attribute, there are 101 
 timepoints and 8 attributes so I want to realize 808 ANOVAS. This will be an 
 ANOVA with two factors :
 Here is one example:
 aov(t1~Subject*Product, data[data$attribute=="A1",])
 I want to store for each ANOVA SSprod,SSsujet,SSerreur,SSinter and SStotal.
 In fact I want the result in several matrices:
 SSprod matrix:
 T1 t2 t3 t4 … t101
 A1 ssprod(A1,T1)
 A2
 A3
 …
 A8
 So I would like a matrix like that for ssprod, sssujet, sserreur, ssinter and
 sstotal.
 And this is for one permutation, and I want to do 1000 permutations
 Here is my code:
 SSmatrixglobal <- function(k){

 daten.temp <- data
 daten.temp$product <- permutations[[k]]
 listmat <- apply(daten.temp[,5:105], 2, function(x,y){
 tab2 <- as.data.frame(cbind(x,y))
 tab.class <- by(tab2[,1:3], tab2[,4], function(x){
 f <- formula(paste(names(x)[1], "~", names(x)[2], "*", names(x)[3], sep=""))
 anovas <- aov(f, data=x)
 anovas$call$formula <- f
 s1 <- summary(anovas)
 qa <- s1[[1]][,2]
 return(qa)
 })
 return(tab.class)
 }, y=daten.temp[,1:3]
 )
 ar <-
 array(unlist(listmat), dim=c(length(listmat[[1]][[1]]), length(listmat[[1]]), length(listmat)))
 l <- lapply(1:4, function(i) ar[i,,])
 sssujet <- l[[1]]
 ssprod <- l[[2]]
 ssinter <- l[[3]]
 sserreur <- l[[4]]
 ss <- rbind(sssujet, ssprod, ssinter, sserreur, sstotal)
 ss <- as.data.frame(ss)
 sqlSave(channel, ss, "SS1000", append=TRUE)
 rm(ss, numperm, daten.temp)
 }

 system.time(por - lapply(c(1:1000), SSmatrixglobal))

 But it takes about 90 seconds per permutation, times 1000 permutations; how
 can I make the ANOVAs faster?

 Many thanks
 Best regards
 Mélissa

 PS: I think that I can gain a lot of time in the aov function but I don't
 know how to do it.







Re: [R] Compiling R with multi-threaded BLAS math libraries - why not actually ?

2010-06-12 Thread Douglas Bates
On Sat, Jun 12, 2010 at 6:18 AM, Tal Galili tal.gal...@gmail.com wrote:
 Hello Gabor, Matt, Dirk.

 Thank you all for clarifying the situation.

 So if I understand correctly then:
 1) Changing the BLAST would require specific BLAST per computer
 configuration (OS/chipset).

It's BLAS (Basic Linear Algebra Subroutines) not BLAST.  Normally I
wouldn't be picky like this but if you plan to use a search engine you
won't find anything helpful under BLAST.

 2) The advantage would be available only when doing  _lots_ of linear
 algebra

You need to be working with large matrices and doing very specific
kinds of operations before the time savings of multiple threads
overcomes the communications overhead.  In fact, sometimes the
accelerated BLAS can slow down numerical linear algebra calculations,
such as sparse matrix operations.
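
A rough sketch of the kind of timing comparison involved (the matrix size
is arbitrary and the results depend entirely on which BLAS R is linked
against):

n <- 2000
A <- matrix(rnorm(n * n), n, n)
system.time(crossprod(A))      # level-3 BLAS, where threading can pay off
system.time(A %*% rnorm(n))    # level-2, little to gain from threads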

 So I am left wondering for each item:
 1) How do you find a better (e.g: more suited) BLAST for your system? (I
 am sure there are tutorials for that, but if someone here has
 a recommendation on one - it would be nice)

As Dirk has pointed out, it is a simple process.

Step 1: Install Ubuntu or some other Debian-based Linux system.
Step 2: type
sudo apt-get install r-base-core libatlas3gf-base

 2) In what situations do we use __lots of linear algebra?  For example, I
 have cases where I performed many linear regressions on a problem, would
 that be a case the BLAST engine be effecting?

Re-read David's posting.  The lm and glm functions do not benefit
substantially from accelerated BLAS because the underlying
computational methods only use level-1 BLAS. (David said they don't
use BLAS but that is not quite correct.  I posted a follow-up comment
describing why lm and glm don't benefit from accelerated BLAS.)

 I am trying to understand if REvolution's emphasis on this is a
 marketing gimmick, or whether they are insisting on something that some R
 users might wish to take into account.  In which case I would, naturally
 (for many reasons), prefer to be able to tweak the native R system instead
 of needing to work with the REvolution distribution.

As those who, in Duncan Murdoch's phrase, found the situation
sufficiently extreme to cause them to read the documentation, would
know, descriptions of using accelerated BLAS with R have been in the R
administration manual for years.  Admittedly it is not a
straightforward process but that is because, like so many other
things, it needs to be handled differently on each operating system.
In fact it is even worse because the procedure can be specific to the
operating system and the processor architecture and, sometimes, even
the task.  Again, re-read David's posting where he says that you
probably don't want to combine multiple MKL threads with explicit
parallel programming in R using doSMP.

David's posting (appropriately) shows very specific examples that
benefit greatly from accelerated BLAS.   Notice that these examples
incorporate very large matrices.  The first two examples involve
forming chol(crossprod(A)) where A is 10000 by 5000.  If you have very
specific structure in A this calculation might be meaningful.  In
general, it is meaningless because crossprod(A) is almost certainly
singular.  (I am vague on the details but perhaps someone who is
familiar with the distribution of singular values of matrices can
explain the theoretical results.  There is a whole field of statistics
research dealing with sparsity in the estimation of covariance
matrices that attacks exactly this large n, large p rank deficiency
problem.)

 Lastly, following on Matt's suggestion, if anyone has a tutorial on the subject,
 I'd be more than glad to publish it on r-statistics/r-bloggers.

 Thanks again to everyone for the detailed replies.

 Best,
 Tal








 On Sat, Jun 12, 2010 at 6:01 AM, Matt Shotwell shotw...@musc.edu wrote:

 In the case of REvolution R, David mentioned using the Intel MKL,
 proprietary library which may not be distributed in the way R is
 distributed. Maybe REvolution has a license to redistribute the library.
 For the others, I suspect Gabor has the right idea, that the R-core team
 would rather not keep architecture dependent code in the sources,
 although there is a very small amount already (`grep -R __asm__`).

 However, I know using Linux (Debian in particular) it is fairly
 straightforward to build R with `enhanced' BLAS libraries. The R
 Administration and Installation manual has a pretty good section on
 linking with enhanced BLAS and LAPACK libs, including the Intel MKL, if
 you are willing cough up $399, or swear not to use the library
 commercially or academically.

 Maybe a short tutorial using free 

Re: [R] Can one get a list of recommended packages?

2010-06-12 Thread Douglas Bates
On Sat, Jun 12, 2010 at 8:37 AM, Dr. David Kirkby
david.kir...@onetel.net wrote:
 R 2.10.1 is used in the Sage maths project. Several recommended packages
 (Matrix, class, mgcv, nnet, rpart, spatial, and survival) are failing to
 build on Solaris 10 (SPARC).

Have you checked the dependencies for those packages?  Some require GNU make.

 We would like to be able to get a list of the recommended packages for R
 2.10.1, but ideally via a call to R, so it is not necessary to update that
 list every time a new version of R is released. We do not want to access the
 Internet to get this information.

 Is there a way in R to list the recommended packages?

I'm not sure I understand the logic of this.  If you are going to
build R then presumably you have the tar.gz file which contains the
sources for the recommended packages in the subdirectory
src/library/Recommended/. Why not get the list from there?

$ cd ~/src/R-devel/src/library/Recommended/
$ ls *.tgz
boot.tgz codetools.tgz   lattice.tgz  mgcv.tgz  rpart.tgz
class.tgzforeign.tgz MASS.tgz nlme.tgz  spatial.tgz
cluster.tgz  KernSmooth.tgz  Matrix.tgz   nnet.tgz  survival.tgz

 Better still, is there a way to list the recommended packages which have not
 been installed, so getting a list of any failures?

Again, this seems to be a rather convoluted approach.  Why not check
why the packages don't install properly?
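
That said, the names are also available from a running R process; a sketch
(note that .get_standard_package_names() is unexported, internal API, so
treat it accordingly):

rec <- tools:::.get_standard_package_names()$recommended
rec                                            # the recommended packages
setdiff(rec, rownames(installed.packages()))   # recommended but not installed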



Re: [R] Pretty printing progress

2010-06-17 Thread Douglas Bates
On Thu, Jun 17, 2010 at 10:50 AM, Barry Rowlingson
b.rowling...@lancaster.ac.uk wrote:
 On Thu, Jun 17, 2010 at 3:33 PM, Doran, Harold hdo...@air.org wrote:
 I have a function that is an iterative process for estimating some MLEs. I 
 want to print some progress to screen as the process iterates. I would like 
 to try and line things up nicely in the R window, but am not sure the best 
 way to do this. Below is a toy example. Suppose I want the value of 10 to be 
 just below iteration and the value of -1234 to be just below 'Log 
 Likelihood'.

  Sure you just don't want to use the progress bar functions from the
 plyr package:

 ?plyr::create_progress_bar

  another example of things being in the wrong package

If you want to stick with a text display you can use the sprintf
function to format the strings that you print


 cat('Iteration  Log Likelihood\n', sprintf("%8d%20g\n", 10, -1234))
Iteration  Log Likelihood
       10               -1234

I would avoid the tab character as you can't count on the displays of
tabs to be consistent.  You may also want to change the display of the
log-likelihood to be a fixed number of decimal places rather than a
general format for floating point numbers, which can switch into the
e notation for very large or very small numbers.


 cat('Iteration  Log Likelihood\n', sprintf("%8d%20.4f\n", 10, -1234))
Iteration  Log Likelihood
       10          -1234.0000

The format of the format strings is another little language to learn
but it is a very powerful mechanism.



Re: [R] how to efficiently compute set unique?

2010-06-21 Thread Douglas Bates
On Mon, Jun 21, 2010 at 8:38 PM, David Winsemius dwinsem...@comcast.net wrote:

 On Jun 21, 2010, at 9:18 PM, Duncan Murdoch wrote:

 On 21/06/2010 9:06 PM, G FANG wrote:

 Hi,

 I want to get the unique set from a large numeric k by 1 vector, k is
 in tens of millions

 when I used the matlab function unique, it takes less than 10 secs

 but when I tried to use the unique in R with similar CPU and memory,
 it is not done in minutes

 I am wondering, am I using the function in the right way?

 dim(cntxtn)
 [1] 13584763        1
 uniqueCntxt = unique(cntxtn);    # this is taking really long

 What type is cntxtn?  If I do that sort of thing on a numeric vector, it's
 quite fast:

  x <- sample(10, size=13584763, replace=T)
  system.time(unique(x))
  user  system elapsed
  3.61    0.14    3.75

 If it's a factor, it could be as simple as:

 levels(cntxtn)  # since the work of unique-ification has already been
 done.

Not quite.  When you generate a factor, as you do in your example, the
levels correspond to the unique values of the original vector.  But
when you take a subset of a factor the levels are preserved intact,
even if some of those levels do not occur in the subset.  This is why
there are unusual arguments with names like drop.unused.levels in
functions like model.frame.  It is also a subtle difference in the
behavior of factor(x) and as.factor(x) when x is already a factor.

 ff <- factor(sample.int(200, 1000, replace = TRUE))
 ff1 <- ff[1:40]
 length(levels(ff))
[1] 199
 length(levels(ff1))
[1] 199
 length(levels(as.factor(ff1)))
[1] 199
 length(levels(factor(ff1)))
[1] 34

 x <- factor(sample(10, size=13584763, replace=T))
 system.time(levels(x))
   user  system elapsed
      0       0       0
 system.time(y <- levels(x))
   user  system elapsed
      0       0       0



Re: [R] Lattice and Beamer

2010-06-28 Thread Douglas Bates
On Mon, Jun 28, 2010 at 12:28 PM, Doran, Harold hdo...@air.org wrote:
 Two things I think are some of the best developments in statistics and 
 production are the lattice package and the beamer class for presentation in 
 Latex. One thing I have not become very good at is properly sizing my visuals 
 to look good in a presentation.

 For instance, I have the following code that creates a nice plot (sorry, 
 cannot provide reproducible data).

 bwplot(testedgrade~person_measure|gender + ethnicity, pfile, layout=c(2,5),
 main = 'Distribution of Person Measure by Grade\n Conditional on Gender and 
 Ethnicity (Math)',
                xlab = 'Grade')

 Now inside my latex document using the beamer class for presentation I have 
 the following

 \begin{frame}
 \frametitle{Distribution of Person Parameters by Grade Conditional on Gender 
 and Ethnicity}
 \begin{figure}[htb]
 \centering
 \fbox{\includegraphics[scale=.3]{personGenEthn.pdf}}
 \end{figure}
 \end{frame}

 I use the scale argument here. I do this totally randomly. I first start with 
 scale=.5. Then, I create the document and look at it. If it seems to fit, I 
 keep it. If it's too big, I resize it until it looks good. There must 
 certainly be a much better way to size these for specific use with latex 
 presentations.

 Any thoughts?


I think we have had this discussion before and I have tried to
convince you to use Sweave with beamer and lattice. :-)

A big advantage of Sweave is that you have the code that generates
the figures in the LaTeX file and you don't allow the possibility of
losing track of PDF files containing the latest versions of figures.

In my preamble I have some lines like

\SweaveOpts{engine=R,eps=FALSE,pdf=TRUE,width=10,height=6.5,strip.white=all}
\SweaveOpts{include=TRUE}
\setkeys{Gin}{width=\textwidth}

Setting the default height and width of the PDF figure and the
inclusion width=\textwidth provides a default scaling that looks good
to me.  If I want a shorter figure that allows for text on the slide
then I set the height to a smaller value.  A full height version looks
like

\begin{frame}
  \frametitle{Plot of inverse canonical link for the Bernoulli distribution}
<<BernoulliinvLink,fig=TRUE,echo=FALSE>>=
linkinv <- function(eta) 1/(1+exp(-eta))
eta <- seq(-7, 7, len = 701)
print(xyplot(linkinv(eta) ~ eta, type = c("g","l"),
 xlab = expression(eta),
 ylab = expression(mu == frac(1, 1+exp(-eta)))))
@
\end{frame}



 Harold







Re: [R] lme4

2010-07-06 Thread Douglas Bates
On OS X you need to install the source package for lme4 as the binary
package fails one of the tests.  We have been unable to reproduce this
failure under other operating systems, which makes it hard to debug.


On Tue, Jul 6, 2010 at 11:09 AM, Alex Foley alex_foley_...@yahoo.com wrote:
 Hi,

 I was trying to install lme4 package, but got the following errors:

  install.packages("lme4")
  Warning in install.packages("lme4") :
   argument 'lib' is missing: using '/Users/xx/Library/R/2.11/library'
  Warning message:
  In getDependencies(pkgs, dependencies, available, lib) :
   package ‘lme4’ is not available



 The session info is:


 sessionInfo()
 R version 2.11.1 (2010-05-31)
 x86_64-apple-darwin9.8.0

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] Matrix_0.999375-39 lattice_0.18-8

 loaded via a namespace (and not attached):
 [1] grid_2.11.1   nlme_3.1-96   stats4_2.11.1 tools_2.11.1

 Should I be installing this in a different manner?


 thanks!











Re: [R] advice/opinion on "<-" vs "=" in teaching R

2010-01-15 Thread Douglas Bates
On Fri, Jan 15, 2010 at 10:00 AM, Ben Bolker bol...@ufl.edu wrote:
 John Kane jrkrideau at yahoo.ca writes:


 I've only been using R for about 2.5 years and I'm not all that good, but
 I vote for <- .

 I think the deciding factor is in  RSiteSearch() and the various manuals.

 Almost everything I see uses <- .  Why introduce = when it is not used
 normally?  It will just confuse the
 students who are trying to use any of the documentation.

 Not to mention they might slammed for bad syntax
 on the R-help mailing list.  :)


  Those are all good reasons.
  I have said something similar before
 (see http://www.mail-archive.com/r-help@r-project.org/msg16904.html),
 but I tend to use = because it seems to be more intuitive for
 students, despite being logically confused at a deeper level,
 and I want to spare them any additional cognitive load when they
 are first getting introduced to R.
   I'm not particularly convinced by the "<- is more general
 and there are some contexts where = doesn't work" argument, because I'm
 not trying to be absolutely rigorous, nor teach all the possible
 ins and outs of R syntax. I would be very surprised if any of
 the examples given actually came up in the course of a first-semester
 statistics/modeling R course.

I teach the idiom

summary(fm1 <- lm(y ~ x, mydata))

in my introductory courses.

 I just want to do what works best for
 the students -- the problem is deciding on the balance between
 short term benefit (<- is one more odd thing to get used to)
 and long term benefit (they will see <- in other contexts, so
 they might as well get used to it eventually).

  Ben Bolker





Re: [R] Hierarchical Linear Model using lme4's lmer

2010-01-16 Thread Douglas Bates
On Sat, Jan 16, 2010 at 8:20 AM, Walmes Zeviani
walmeszevi...@hotmail.com wrote:

 Doug,

 It appears you are mixing nlme and lme4 formulation type.
 On nlme library you type

 lme(y~x, random=~1|subject)

 On lme4 library you type

 lmer(y~x+(1|subject))

 You mixed them.

 At your disposal.

Which is what I tell my wife when I am standing by our sink.

 Walmes.


 Doug Adams wrote:

 Hi,

 I was wondering:  I've got a dataset where I've got student 'project's
 nested within 'school's, and 'division' (elementary, junior, or
 senior) at the student project level.  (Division is at the student
 level and not nested within schools because some students are
 registered as juniors & others as seniors within the same school.)

 So schools are random, division is fixed, and the student Score is the
 outcome variable.  This is what I've tried:

 lmer(data=Age6m, Score ~ division + (1|school), random=~1 | school)

 Am I on the right track?  Thanks everyone,   :)

 Doug Adams
 MStat Student
 University of Utah

Walmes is correct that this is mixing two formulations of the model.
It turns out that the model will be fit correctly anyway.  The lmer
function has a ... argument which will silently swallow the argument
random = ~ 1|school and ignore it.  Looks like we should add a check
for specification of a random argument and provide a warning if it is
present.



Re: [R] Symmetric Matrix classes

2010-01-19 Thread Douglas Bates
On Tue, Jan 19, 2010 at 9:26 AM, Martin Maechler
maech...@stat.math.ethz.ch wrote:
 Scanning for 'Matrix' in old R-help e-mail, I found

 GA == Gad Abraham gabra...@csse.unimelb.edu.au
     on Fri, 27 Nov 2009 13:45:00 +1100 writes:

    GA Hi,
    GA I'd like to store large covariance matrices using Matrix classes.

    GA dsyMatrix seems like the right one, but I want to specify just the
    GA upper/lower triangle and diagonal and not have to instantiate a huge
    GA n^2 vector just for the sake of having half of it ignored:

    GA Dumb example:
    GA M <- new("dsyMatrix", uplo="U", x=rnorm(1e4), Dim=as.integer(c(100,
 100)))
    GA diag(M) <- 1

    GA This doesn't work:
    GA M <- new("dsyMatrix", uplo="U", x=0, Dim=as.integer(c(100, 100)))
    GA Error in validObject(.Object) :
    GA invalid class "dsyMatrix" object: length of x slot != prod(Dim)

    GA Is there an easier way of doing this?

 I think you want a "dspMatrix" (sp == symmetric packed)
 instead.

 Before going into details: Is this topic still interesting to
 those involved almost two months ago?

 Regards,
 Martin

Also, I fail to understand the advantage of allocating storage for
roughly half the elements in the matrix instead of the whole matrix
when the matrix is so large.  If dense storage of a symmetric matrix
of size 20000 takes up too much memory (approx 3 GB for each copy) it
is unlikely that the packed symmetric storage form, using about 1.5 GB
per copy, is going to save the day.

 8 * (20000 * 20000)/2^30  # size in gigabytes for full array
[1] 2.980232
 4 * (20000 * 20001)/2^30  # size in gigabytes for sp array
[1] 1.490191

If the matrix is sparse, the dsCMatrix class may help.
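
For example, a small sketch with the Matrix package (the symmetric
argument of sparseMatrix may require a reasonably recent version; only
one triangle is supplied and stored):

library(Matrix)
S <- sparseMatrix(i = c(1, 2, 1), j = c(1, 2, 3), x = c(2, 3, 0.5),
                  dims = c(3, 3), symmetric = TRUE)
class(S)   # "dsCMatrix"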

And there is also the issue of what exactly you want to do with the
matrix once you have stored it.



Re: [R] Hierarchical data sets: which software to use?

2010-02-05 Thread Douglas Bates
On Sun, Jan 31, 2010 at 10:24 PM, Anton du Toit atdutoitrh...@gmail.com wrote:
 Dear R-helpers,

 I’m writing for advice on whether I should use R or a different package or
 language. I’ve looked through the R-help archives, some manuals, and some
 other sites as well, and I haven’t done too well finding relevant info,
 hence my question here.

 I’m working with hierarchical data (in SPSS lingo). That is, for each case
 (person) I read in three types of (medical) record:

 1. demographic data: name, age, sex, address, etc

 2. ‘admissions’ data: this generally repeats, so I will have 20 or so
 variables relating to their first hospital admission, then the same 20 again
 for their second admission, and so on

 3. ‘collections’ data, about 100 variables containing the results of a
 battery of standard tests. These are administered at intervals and so this
 is repeating data as well.

 The number of repetitions varies between cases, so in its one case per line
 format the data is non-rectangular.

 At present I have shoehorned all of this into SPSS, with each case on one
 line. My test database has 2,500 variables and 1,500 cases (or persons), and
 in SPSS’s *.SAV format is ~4MB. The one I finally work with will be larger
 again, though likely within one order of magnitude. Down the track, funding
 permitting, I hope to be working with tens of thousands of cases.

Although this may not be helpful for your immediate goal, storing and
manipulating data of this size and complexity (and, I expect, cost for
collection) really calls for tools like relational databases.  A
single flat file of 2500 variables by 1500 cases is almost never the
best way to organize such data.  A normalized representation as a
collection of interlinked tables in a relational data base is much
more effective and less error prone.  The widespread use of
spreadsheets or SPSS data sets or SAS data sets which encourage the
"single table with a gargantuan number of columns, most of which are
missing data in most cases" approach to organization of longitudinal
data is regrettable.

For later analysis in R it is better to start with long form of the
data, as opposed to the wide form, even if it means repeating
demographic information over several occasions.  Using a relational
database allows for a long view to be generated without the
possibility of inconsistency in the demographics.  I am using the
descriptions long and wide in the sense that they are used in the
reshape help page.  See

?reshape

in R.  The long view is also called the subject/occasion view in the
sense that each row corresponds to one subject on one occasion.
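
A toy sketch of the two shapes, with hypothetical column names:

wide <- data.frame(id = 1:2, sex = c("F", "M"),
                   score.1 = c(10, 12), score.2 = c(11, 14))
long <- reshape(wide, direction = "long", idvar = "id",
                varying = c("score.1", "score.2"), v.names = "score",
                timevar = "occasion")
long   # one row per subject per occasion; sex is repeated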

Robert Gentleman's book R Programming for Bioinformatics provides
background on linking R to relational databases.


As I said at the beginning, you may not want to undertake the
necessary study and effort to reorganize your data for this specific
project but if you do this a lot you may want to consider it.

 I am wondering if I should keep using SPSS, or try something else.

 The types of analysis I’ll typically will have to do will involve comparing
 measurements at different times, e.g. before/ after treatment. I’ll also
 need to compare groups of people, e.g. treatment / no treatment. Regression
 and factor analyses will doubtless come into it at some point too.

 So:

 1. should I use R or try something else?

 2. can anyone advise me on using R with the type of data I’ve described?


 Many thanks,

 Anton du Toit








Re: [R] lmer Error message

2010-02-06 Thread Douglas Bates
On Sat, Feb 6, 2010 at 4:45 AM, Martin Bulla bu...@centrum.cz wrote:
 Does anybody know what this error message means: Error in object$terms : $
 operator not defined for this S4 class

The error message means what it says and it doesn't come from lmer, it
comes from the drop1 function being applied to a model fit by lmer.
You are assuming that you can apply drop1 to an lmer model and you
can't.  You need to do the modifications of the model formula by hand
because an lmer formula contains terms that would not be meaningful
for drop1.
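
The by-hand version is just two fits and a likelihood-ratio comparison; a
sketch with hypothetical variable names:

full <- lmer(y ~ x1 + x2 + (1 | g), data = d)
red  <- lmer(y ~ x1      + (1 | g), data = d)
anova(red, full)   # likelihood-ratio test for dropping x2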

 I have performed the following steps:


 library(lattice)

 library(Matrix)

 library(lme4)

 inkm
 inkm$Gamie
 glm.incm
 drop1(glm.incm, test="Ch")
 Error in object$terms :
 $ operator not defined for this S4 class


 Your suggestion would be of great help to me,
 Martin







Re: [R] R-Help

2010-02-07 Thread Douglas Bates
On Sat, Feb 6, 2010 at 2:46 PM, David Winsemius dwinsem...@comcast.net wrote:

 On Feb 6, 2010, at 3:29 PM, Ravi Ramaswamy wrote:

 Hi - I am not familiar with R.  Could I ask you a quick question?

 When I read a file like this, I get an error.  Not sure what I am doing
 wrong.  I use a MAC.  How do I specify a full path name for a file in R?
  Or
 do files have to reside locally?

 KoreaAuto <- read.table(/Users/

Especially when just starting using R the simplest approach is

KoreaAuto <- read.table(file.choose())

which brings up a file chooser panel so you can point and click your
way to the desired file.  If the file is tab-delimited, as appears to
be the case in the file you enclosed, you may want to use read.delim
instead of read.table.  The read.delim function sets up the defaults
for the many optional arguments to read.table specifically for
tab-delimited files with a header line of column names as you have
shown.

 I think the opening and closing quotes meant that you supplied an empty
 string to the file argument.

 raviramaswamy/Documents/Rutgers/STT 586/HW1 Data.txt)
 Error: unexpected numeric constant in "KoreaAuto <-
 read.table(/Users/raviramaswamy/Documents/Rutgers/STT 586"



 Using single instances of either sort of quote (" or ') on the ends of
 strings should work. If you drag a file from a Finder window to the
 R-console you should get a fully specified file path and name.

 Seems like the working directory is

 getwd()

 [1] "/Users/raviramaswamy"

 rd <- read.table(file="/Users/davidwinsemius/Downloads/meminfo.csv",
 sep=",", header=TRUE)
 rd
     time      RSS      VSZ  MEM
 1       1  3027932  3141808  4.5
 2       2  3028572  3141808  4.5
 3       3  3030208  3141808  4.5
 4       4  3020000  3150004  4.5
 5       5  3035036  3150004  4.5

 You can also shorten the /Users/username part to "~"
 rd <- read.table(file="~/Downloads/meminfo.csv", sep=",", header=TRUE)




 so I said this and still got an error

 KoreaAuto <- read.table(/Documents/Rutgers/HW1Data)

 Error: unexpected '/' in "KoreaAuto <- read.table(/"

 But using no quotes will definitely not work. (And that was not a full path
 name anyway.)



 Could someone please help me with the correct syntax?

 Thanks

 Ravi

   Year   AO      GNP          CP   OP
 01    1974 .0022     183          2322 189
 02    1975 .0024     238          2729 206
 03    1976 .0027     319          3069 206
 04    1977 .0035     408          2763 190
 05    1978 .0050     540          2414 199
 06    1979 .0064     676          2440 233
 07    1980 .0065     785          2430 630
 08    1981 .0069     944          2631 740
 09    1982 .0078     1036         3155 740
 10    1983 .0095     1171         3200 660


 David Winsemius, MD
 Heritage Laboratories
 West Hartford, CT





Re: [R] Data views (Re: (Another) Bates fortune?)

2010-02-08 Thread Douglas Bates
On Sun, Feb 7, 2010 at 2:40 PM, Emmanuel Charpentier
charp...@bacbuc.dyndns.org wrote:
 Note : this post has been motivated more by the hierarchical data
 subject than by the aside joke of Douglas Bates, but might be of interest
 to its respondents.

 Le vendredi 05 février 2010 à 21:56 +0100, Peter Dalgaard a écrit :
 Peter Ehlers wrote:
  I vote to 'fortunize' Doug Bates on
 
   Hierarchical data sets: which software to use?
 
  The widespread use of spreadsheets or SPSS data sets or SAS data sets
   which encourage the "single table with a gargantuan number of columns,
   most of which are missing data in most cases" approach to organization
  of longitudinal data is regrettable.
 
  http://n4.nabble.com/Hierarchical-data-sets-which-software-to-use-td1458477.html#a1470430
 
 

 Hmm, well, it's not like long format data frames (which I actually
 think are more common in connection with SAS's PROC MIXED) are much
 better. Those tend to replicate base data unnecessarily - as if rats
 change sex with millisecond resolution.

 [ Note to Achim Zeilis : the rats changing sex with millisecond
 resolution quote is well worth a nomination to fortune fame ; it
 seems it is not one already... ]

                                           The correct data structure
 would be a relational database with multiple levels of tables, but, to
 my knowledge, no statistical software, including R, is prepared to deal
 with data in that form.

I think if you go back to my original reply you will see that my first
suggestion was to use an SQL data base.  I didn't mention views (in
the SQL sense) explicitly but those are a natural construction for
organizing longitudinal data.  The data can be stored as a set of
normalized tables in a data base but extracted as a data frame in the
long format.
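
For concreteness, a sketch of that workflow through the current
DBI/RSQLite interface, with hypothetical tables demog(id, sex) and
visit(id, occasion, score):

library(RSQLite)
con <- dbConnect(SQLite(), "study.db")
dbExecute(con, "CREATE VIEW long_view AS
                SELECT v.id, d.sex, v.occasion, v.score
                FROM visit AS v JOIN demog AS d ON v.id = d.id")
longdf <- dbGetQuery(con, "SELECT * FROM long_view")  # long-format frame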

 Well, I can think of two exceptions :

 - BUGS, in its various incarnations (WinBUGS, OpenBUGS, JAGS), does not
 require its data to come from the same source. For example, while
 programming a hierarchical model (a. k. a. mixed-effect model),
 individual level variables may come from one source and various group
 level variables may come from other sources. Quite handy : no previous
 merge() required. Now, writing (and debugging !) such models in BUGS
 is another story...

 - SAS has had this concept of data view for a long time, its most
 useful incarnation being a data view of an SQL view. Again, this
 avoids the need to actually merge the datasets (which, AFAICR, is a
 serious piece of pain in the @$$ in SAS (maybe that's the *real*
 etymology of the name ?)).

 This problem has bugged me for a while. I think that the concept of a
 data view is right (after all, that's one of the core concepts of SQL
 for a reason...), but that implementing it *cleanly* in R is probably
 hard work. Using a DBMS for maintaining tables and views and querying
 them just at the right time does help, but the ability of using these
 DBMS data without importing them in R is, AFAIK, currently lacking.

I think the issue is more than that.  Most model-fitting functions in
R incorporate a formula/data - model.frame - model.matrix sequence.
The symbolic analysis to create the model frame can, I think, be
applied to a view and the result stored back in an SQL data base.  (I
haven't looked at the code for model.frame in a long time but I think
it can be applied a row at a time or to chunks of rows.)   Some
auxiliary information, such as unused factor levels, could be
accumulated. The model.matrix function
can be a bit more problematic but it too could be applied to chunks
when generating dense model matrices, with the summary information
from the chunks being accumulated.  Updating sparse model matrices and
accumulating summary information is potentially more time-consuming
because you may need to update the form of the summary matrices as
well as the numerical values.

Of course in SAS these changes are easier because their least squares
calculations are based on accumulating summary matrices from the data
a row at a time.

I think there are three different cases for fitting models based on
large model matrices.  If you have very large n (number of rows) but
small to moderate p (number of columns) in the model matrix then you
can work on chunks of rows using dense matrices to accumulate summary
results.  Large n and large p producing a sparse model matrix could be
handled with the sparse.model.matrix function in the Matrix package
because in these cases the model frame and the sparse model matrix
tend to use smaller amounts of storage.  When you have a large n and a
large p creating a dense model matrix I think the best course is to
buy a machine with a 64-bit processor and a 64-bit operating system
and stuff it full of memory.  It depends on how large p^2 is compared
to np whether you are better off working in chunks of rows or not.

 Once upon a time, a very old version of RPgSQL (a Bioconductor package)
 aimed at such a representation: it created objects

Re: [R] lmer

2010-02-09 Thread Douglas Bates
On Tue, Feb 9, 2010 at 4:38 AM, Demirtas, Hakan demir...@uic.edu wrote:

 Does lmer do three-level mixed-effects models?

Yes.

 What forms of outcome
 variables can it handle (continuous, survival, binary)? I'd appreciate any
 help.

Continuous outcomes for lmer.  glmer can handle binary outcomes.



Re: [R] Potential problem with subset !!!

2010-02-12 Thread Douglas Bates
On Fri, Feb 12, 2010 at 9:22 AM, Arnaud Mosnier a.mosn...@gmail.com wrote:
 Dear useRs,

 Just a little post to provide the answer to a problem that took me
 some time to resolve!
 Hope that reading this will help others avoid the same error.

 When using the subset function, writing

 subset (data, data$columnname == X) or subset (data, columnname == X)

 do the same thing.

 Thus, the function considers that a name given after the comma
 (like columnname) is the name of a column of the data frame
 considered.
 A problem occurs when another argument, such as X, is the name of both a
 column of the data frame and an object!

 Here is an example:

 df <- data.frame(ID = c("a","b","c","b","e"), Other = 1:5)
 ID <- unique(df$ID)
 ID

 ## Now the potential problem !!

 subset (df, df$ID == ID[4])

 ## BE CAREFUL: the subset function uses the column ID of the data.frame
 ## and NOT the object ID containing the unique values

 Sorry if it seems obvious to some of you, but I hope that others find
 it useful!

Myself, I think it would be obvious to anyone who had read the
documentation for which the third paragraph is

 For data frames, the ‘subset’ argument works on the rows.  Note
 that ‘subset’ will be evaluated in the data frame, so columns can
 be referred to (by name) as variables in the expression (see the
 examples).
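
For comparison, ordinary bracket indexing evaluates ID in the calling
environment rather than inside df, so this sketch does what the post
expected:

df[df$ID == ID[4], ]   # here ID[4] is the global unique value "e", not df$ID[4]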



 Arnaud





Re: [R] Why double quote is preferred?

2010-02-13 Thread Douglas Bates
On Fri, Feb 12, 2010 at 4:25 PM, blue sky bluesky...@gmail.com wrote:
 ?'`' shows the following:

 Single and double quotes delimit character constants. They can be
 used interchangeably but double quotes are *preferred* (and character
 constants are printed using double quotes), so single quotes are
 normally only used to delimit character constants containing double
 quotes.

 It is not clear to me why double quote is preferred (I don't think
 that "character constants are printed using double quotes" should be
 the reason, in the sense that single quote can be used for printing if
 we want to do so). It seems that double quote and single quote can be
 used interchangeably, except that Single quotes need to be escaped by
 backslash in single-quoted strings, and double quotes in double-quoted
 strings.

 Could somebody explain why double quote is preferred?

To avoid confusion for those who are accustomed to programming in the
C family of languages (C, C++, Java), where there is a difference in
the meaning of single quotes and double quotes.
A C programmer reads 'a' as a single character and "a" as a character
string consisting of the letter 'a' followed by a null character to
terminate the string.  In R there is no character data type, there are
only character strings.  For consistency with other languages it helps
if character strings are delimited by double quotes.  The single quote
version in R is for convenience.  On most keyboards you don't need to
use the shift key to type a single quote but you do need the shift for
a double quote.
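
A two-line illustration:

identical('abc', "abc")   # TRUE: the delimiter does not change the string
'abc'                     # prints as "abc", the preferred delimiter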

P.S. I do not plan to get into an extended debate on this issue, Peng.
 (As others have pointed out, it is considered bad form to disguise
your identity on this list.)  Your opinions on the design of the
language have been noted but if you want the language to be redesigned
to your specifications, you will need to fork your own version.



Re: [R] lm function in R

2010-02-14 Thread Douglas Bates
On Sat, Feb 13, 2010 at 7:09 PM, Daniel Malter dan...@umd.edu wrote:
 It seems to me that your question is more about the econometrics than about
 R. Any introductory econometric textbook or compendium on econometrics will
 cover this as it is a basic. See, for example, Greene 2006 or Wooldridge
 2002.

 Say X is your data matrix, that contains columns for each of the individual
 variables (x), columns with their interactions, and one column of 1s for the
 intercept. Let y be your dependent variable. Then, OLS estimates are
 computed by

 (X'X)^(-1) X'y

 Or in R

 solve(t(X)%*%X)%*%t(X)%*%y

But that is not the way that the coefficients are calculated in R.  It
is the formula that is given in text books but, like most text book
formulas, is not suitable for computation, especially with large data
sets and complex models.

There are many practical ways to calculate the least squares
coefficients in large data sets and they all involve decompositions.
For a Hadoop environment a scatter-gather approach would be to form
Cholesky decompositions of the cross-product of sections of rows of
the model matrix then combine the decompositions.
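
A sketch of that scatter-gather idea in plain R, assuming a hypothetical
model matrix X and response y that can be visited in row chunks (on
Hadoop each chunk would live on a different node):

p <- ncol(X)
XtX <- matrix(0, p, p)
Xty <- numeric(p)
for (idx in split(seq_len(nrow(X)), ceiling(seq_len(nrow(X)) / 1000))) {
  Xc  <- X[idx, , drop = FALSE]
  XtX <- XtX + crossprod(Xc)         # accumulate X'X chunk by chunk
  Xty <- Xty + crossprod(Xc, y[idx]) # and X'y
}
R    <- chol(XtX)                    # one Cholesky factorization at the end
beta <- backsolve(R, forwardsolve(t(R), Xty))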

 Best,
 Daniel


 -
 cuncta stricte discussurus
 -
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of Something Something
 Sent: Saturday, February 13, 2010 5:04 PM
 To: Bert Gunter
 Cc: r-help@r-project.org
 Subject: Re: [R] lm function in R

 I tried..

 mod = lm(Y ~ X1*X2*X3, na.action = na.exclude)
 formula(mod)

 This produced
 Y ~ X1 * X2 * X3


 When I typed just mod I got:

 Call:
 lm(formula = Y ~ X1 * X2 * X3, na.action = na.exclude)

 Coefficients:
 (Intercept)          X11          X21          X31      X11:X21      X11:X31
     X21:X31  X11:X21:X31
   177.9245       0.2005       2.4482       3.1216       0.8127     -26.6166
     -3.0398      29.6049


 I am trying to figure out how R computed all these coefficients.





 On Sat, Feb 13, 2010 at 1:30 PM, Bert Gunter gunter.ber...@gene.com wrote:

 ?formula


 Bert Gunter
 Genentech Nonclinical Statistics

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On
 Behalf Of Something Something
 Sent: Saturday, February 13, 2010 1:24 PM
 To: Daniel Nordlund
 Cc: r-help@r-project.org
 Subject: Re: [R] lm function in R

 Thanks Dan.  Yes that was very helpful.  I didn't see the change from '*'
 to
 '+'.

 Seems like when I put * it means an interaction, and when I put + it's not an
 interaction.

 Is it correct to assume then that...

 When I put + R evaluates the following equation:
 Y-Hat = b0 + b1X1 + b2X2 + ... + bkXk


 But when I put * R evaluates the following equation:
 Y-Hat = b0 + b1X1 + b2X2 + ... + bkXk + b12X1X2 + b13X1X3 + ... + b12...kX1X2...Xk

 Is this correct?  If it is then can someone point me to any sources that
 will explain how the coefficients (such as b0... bk, b12.. , b123..) are
 calculated.  I guess, one source is the R source code :) but is there any
 other documentation anywhere?

 Please let me know.  Thanks.



 On Fri, Feb 12, 2010 at 5:54 PM, Daniel Nordlund
 djnordl...@verizon.netwrote:

   -Original Message-
   From: r-help-boun...@r-project.org [mailto:
 r-help-boun...@r-project.org]
   On Behalf Of Something Something
   Sent: Friday, February 12, 2010 5:28 PM
   To: Phil Spector; r-help@r-project.org
   Subject: Re: [R] lm function in R
  
   Thanks for the replies everyone.  Greatly appreciate it.  Some
 progress,
   but
   now I am getting the following values when I don't use as.factor
  
   13.14167 25.11667 28.34167 49.14167 40.39167 66.86667
  
   Is that what you guys get?
 
 
  If you look at Phil's response below, no, that is not what he got.  The
  difference is that you are specifying an interaction, whereas Phil did
 not
  (because the equation you initially specified did not include an
  interaction.  Use Y ~ X1 + X2 instead of Y ~ X1*X2 for your formula.
 
  
  
   On Fri, Feb 12, 2010 at 5:00 PM, Phil Spector
   spec...@stat.berkeley.eduwrote:
  
By converting the two variables to factors, you are fitting
an entirely different model.  Leave out the as.factor stuff
and it will work exactly as you want it to.
   
 dat
   
 V1 V2 V3 V4
1 s1 14  4  1
2 s2 23  4  2
3 s3 30  7  2
4 s4 50  7  4
5 s5 39 10  3
6 s6 67 10  6
   
names(dat) = c('id','y','x1','x2')
z = lm(y~x1+x2,dat)
predict(z)
   
        1        2        3        4        5        6
 15.16667 24.66667 27.66667 46.66667 40.16667 68.66667
   
   
                                       - Phil Spector
                                        Statistical Computing
 Facility
                                        Department of Statistics
                                        UC Berkeley
                                        spec...@stat.berkeley.edu
   
   

Re: [R] False convergence of a glmer model

2010-02-16 Thread Douglas Bates
On Tue, Feb 16, 2010 at 9:05 AM, Shige Song shiges...@gmail.com wrote:
 Dear All,

 I am trying to fit a 2-level random intercept logistic regression on a
 data set of 20,000 cases.  The model is specified as the following:

  m1 <- glmer(inftmort ~ as.factor(cohort) + (1|code), family=binomial, data=d)

 I got Warning message: In mer_finalize(ans) : false convergence (8)

That message means that the optimizer function, nlminb, got stalled.
It has converged but the point at which it has converged is not
clearly the optimum.  In many cases this just indicates that the
optimizer is being overly cautious.  However, it can also mean that
the problem is ill-defined.

The fact that the second parameter is -7.46 is likely the problem.  A
difference in the probability of infant mortality between levels of
cohort on the order of -7.5 on the logit scale is huge.   Do the
estimated probabilities at this value of the parameters make sense?
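
For instance, a coefficient of -7.49 on the logit scale corresponds to
(a quick sketch):

plogis(-7.49)   # about 0.00056: the fitted probability is essentially zero
exp(-7.49)      # about 0.00056 as an odds ratio, equally extreme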

P.S. Questions of this sort may be more readily answered in the
R-SIG-mixed-models mailing list.

 With the verbose=TRUE option, I was able to get the following
 iteration history:

  0:     3456.4146:  1.15161 -3.99068 -0.498790 -0.122116
  1:     3361.3370:  1.04044 -4.38172 -0.561756 -0.289991
  2:     3303.7986:  1.48296 -4.40741 -0.566208 -0.259730
  3:     3147.5537:  1.93037 -5.14388 -0.682530 -0.443006
  4:     3123.6900:  2.10192 -5.18784 -0.685558 -0.428320
  5:     2988.6287:  2.94890 -6.31023 -0.825286 -0.586282
  6:     2958.3364:  3.25396 -6.88256 -0.316988 0.572428
  7:     2853.7703:  4.22731 -7.44955 -0.279492 -0.294353
  8:     2844.8476:  4.36583 -7.43902 -0.293111 -0.267308
  9:     2843.2879:  4.39182 -7.44895 -0.298791 -0.265899
  10:     2840.2676:  4.44288 -7.47103 -0.310477 -0.263945
  11:     2839.0890:  4.46259 -7.48131 -0.315320 -0.263753
  12:     2838.8550:  4.46649 -7.48344 -0.316292 -0.263745
  13:     2838.3889:  4.47428 -7.48771 -0.318236 -0.263737
  14:     2838.3703:  4.47459 -7.48788 -0.318314 -0.263738
  15:     2838.2216:  4.47708 -7.48927 -0.318936 -0.263742
  16:     2838.2157:  4.47718 -7.48932 -0.318961 -0.263742
  17:     2838.2145:  4.47720 -7.48934 -0.318966 -0.263742
  18:     2838.2121:  4.47724 -7.48936 -0.318976 -0.263742
  19:     2838.2120:  4.47724 -7.48936 -0.318976 -0.263742
  20:     2838.2118:  4.47724 -7.48936 -0.318977 -0.263742
  21:     2838.2118:  4.47724 -7.48936 -0.318977 -0.263742
  22:     2838.2118:  4.47724 -7.48936 -0.318977 -0.263742
  23:     2838.2118:  4.47724 -7.48936 -0.318977 -0.263742
  24:     2838.2118:  4.47724 -7.48936 -0.318977 -0.263742
  25:     2838.2118:  4.47724 -7.48936 -0.318977 -0.263742
  26:     2838.2118:  4.47724 -7.48936 -0.318977 -0.263742
  27:     2838.2118:  4.47724 -7.48936 -0.318977 -0.263742
  28:     2838.2118:  4.47724 -7.48936 -0.318977 -0.263742
  29:     2838.2118:  4.47724 -7.48936 -0.318977 -0.263742
  30:     2838.2118:  4.47724 -7.48936 -0.318977 -0.263742
  31:     2838.2118:  4.47724 -7.48936 -0.318977 -0.263742
  32:     2838.2118:  4.47724 -7.48936 -0.318977 -0.263742
  33:     2837.8154:  4.46385 -7.47464 -0.495684 -0.263985
  34:     2837.7613:  4.46641 -7.47053 -0.498335 -0.264014
  35:     2837.6418:  4.47259 -7.46200 -0.501644 -0.264141
  36:     2837.5982:  4.47485 -7.45928 -0.502598 -0.264214
  37:     2837.5850:  4.47537 -7.45882 -0.502848 -0.264237
  38:     2837.5307:  4.47674 -7.45848 -0.503216 -0.264313
  39:     2837.5014:  4.47725 -7.45875 -0.503273 -0.264344
  40:     2837.4955:  4.47735 -7.45881 -0.503284 -0.264350
  41:     2837.4944:  4.47738 -7.45882 -0.503286 -0.264351
  42:     2837.4941:  4.47738 -7.45882 -0.503287 -0.264351
  43:     2837.4936:  4.47739 -7.45883 -0.503288 -0.264352
  44:     2837.4935:  4.47739 -7.45883 -0.503288 -0.264352
  45:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  46:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  47:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  48:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  49:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  50:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  51:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  52:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  53:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  54:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  55:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  56:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  57:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  58:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  59:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  60:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  61:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  62:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352
  63:     2837.4931:  4.47740 -7.45883 -0.503289 -0.264352

 By the way, the same model can be fitted using Stata using xtlogit and
 xtmelogit; a simpler model without 

Re: [R] lmer - error asMethod(object) : matrix is not symmetric

2010-02-16 Thread Douglas Bates
This is similar to another question on the list today.

On Tue, Feb 16, 2010 at 4:39 AM, Luisa Carvalheiro
lgcarvalhe...@gmail.com wrote:
 Dear R users,

 I  am having problems using package lme4.

 I am trying to analyse the effect of a continuous variable (Dist_NV)
 on a count data response variable (SR_SUN) using Poisson error
 distribution. However, when I run the model:

 summary(lmer((SR_SUN)~Dist_NV + (1|factor(Farm_code)) ,
 family=poisson, REML=FALSE))

 1 error message and 1 warning message show up:

 Error in asMethod(object) : matrix is not symmetric [1,2]
 In addition: Warning message:
 In mer_finalize(ans) : singular convergence (7)

So the first thing to do is to include the optional argument verbose =
TRUE in the call to lmer.  (Also, REML = FALSE is ignored for
Generalized Linear Mixed Models and can be omitted. although there is
no harm in including it.)

You need to know where the optimizer is taking the parameter values
before you can decide why.

P.S. Questions like this will probably be more readily answered on the
R-SIG-Mixed-Models mailing list.

 A model including  Dist_NV together with other variables runs with no 
 problems.
 What am I doing wrong?

 Thank you,

 Luisa


 --
 Luisa Carvalheiro, PhD
 Southern African Biodiversity Institute, Kirstenbosch Research Center, 
 Claremont
  University of Pretoria
 Postal address - SAWC Pbag X3015 Hoedspruit 1380, South Africa
 telephone - +27 (0) 790250944
 carvalhe...@sanbi.org
 lgcarvalhe...@gmail.com

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lmer - error asMethod(object) : matrix is not symmetric

2010-02-16 Thread Douglas Bates
On Tue, Feb 16, 2010 at 10:54 AM, Luisa Carvalheiro
lgcarvalhe...@gmail.com wrote:
 Dear Douglas,

 Thank you for your reply.
 Just some extra info on the dataset: In my case Number of obs is 33,
 and number of groups of factor(Farm_code) is 12.
 This is the information on iterations I get:

 summary(lmer(round(SR_SUN)~Dist_NV + (1|factor(Farm_code)) ,
 family=poisson, verbose =TRUE))
  0:     60.054531:  1.06363  2.14672 -0.000683051
  1:     60.054531:  1.06363  2.14672 -0.000683051
 Error in asMethod(object) : matrix is not symmetric [1,2]
 In addition: Warning message:
 In mer_finalize(ans) : singular convergence (7)

 When I run a similar model (exp variable Dist_hives) the number of
 iterations is 11:

  summary(lmer(round(SR_SUN)~Dist_hives + (1|factor(Farm_code)) ,
 family=poisson, verbose =TRUE))
  0:     61.745238: 0.984732  1.63769 0.000126484
  1:     61.648229: 0.984731  1.63769 -2.08637e-05
  2:     61.498777: 0.984598  1.63769 4.11867e-05
  3:     47.960908: 0.381062  1.63585 6.77029e-05
  4:     46.223789: 0.250732  1.66727 8.31854e-05
  5:     46.23: 0.250732  1.66727 6.97790e-05
  6:     46.216710: 0.250730  1.66727 7.60560e-05
  7:     46.168835: 0.230386  1.64883 9.16430e-05
  8:     46.165955: 0.228062  1.65658 8.70694e-05
  9:     46.165883: 0.228815  1.65737 8.63400e-05
  10:     46.165883: 0.228772  1.65734 8.63698e-05
  11:     46.165883: 0.228772  1.65734 8.63701e-05

 I am very confused with the fact that it runs with Dist_hives and not
 with Dist_NV. Both variables are distance values, the first having no
 obvious relation with the response variable and the second (Dist_NV)
 seems to have a negative effect on SR_SUN.

As you say, Dist_hives has very little relationship to the response
variable.  The two fixed-effects coefficients are the last two
parameters in the iteration output (the first parameter is the
standard deviation of the random effects).  So the slope with respect
to Dist_hives for the linear predictor is about 8.64e-05.  Either you have
very large magnitudes of Dist_hives or that variable does not have
much predictive power.

For the second (Dist_NV) variable, the optimization algorithm is not
able to make progress from the starting estimates.  This may be an
indication that the problem is badly scaled.  Are the values of
Dist_NV very large?  If so, you may want to change the unit (say from
meters to kilometers) so the values are much smaller.

It may also help to use a starting estimate for the standard deviation
of the random effects derived from the other model.  That is, include
start = 0.22 in the call to lmer.
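
Combining the two suggestions in a sketch (the data frame name d is
hypothetical, and Dist_NV is assumed to be in meters):

  d$Dist_NV_km <- d$Dist_NV / 1000   # rescale so the optimizer sees moderate values
  summary(lmer(round(SR_SUN) ~ Dist_NV_km + (1 | factor(Farm_code)),
               family = poisson, verbose = TRUE, data = d,
               start = 0.22))        # sd of the random effects from the other fit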

 Does this information helps identifying the problem with my data/analysis?

 Thank you,

 Luisa




 On Tue, Feb 16, 2010 at 5:35 PM, Douglas Bates ba...@stat.wisc.edu wrote:
 This is similar to another question on the list today.

 On Tue, Feb 16, 2010 at 4:39 AM, Luisa Carvalheiro
 lgcarvalhe...@gmail.com wrote:
 Dear R users,

 I  am having problems using package lme4.

 I am trying to analyse the effect of a continuous variable (Dist_NV)
 on a count data response variable (SR_SUN) using Poisson error
 distribution. However, when I run the model:

 summary(lmer((SR_SUN)~Dist_NV + (1|factor(Farm_code)) ,
 family=poisson, REML=FALSE))

 1 error message and 1 warning message show up:

 Error in asMethod(object) : matrix is not symmetric [1,2]
 In addition: Warning message:
 In mer_finalize(ans) : singular convergence (7)

 So the first thing to do is to include the optional argument verbose =
 TRUE in the call to lmer.  (Also, REML = FALSE is ignored for
 Generalized Linear Mixed Models and can be omitted. although there is
 no harm in including it.)

 You need to know where the optimizer is taking the parameter values
 before you can decide why.

 P.S. Questions like this will probably be more readily answered on the
 R-SIG-Mixed-Models mailing list.

 A model including  Dist_NV together with other variables runs with no 
 problems.
 What am I doing wrong?

 Thank you,

 Luisa


 --
 Luisa Carvalheiro, PhD
 Southern African Biodiversity Institute, Kirstenbosch Research Center, 
 Claremont
  University of Pretoria
 Postal address - SAWC Pbag X3015 Hoedspruit 1380, South Africa
 telephone - +27 (0) 790250944
 carvalhe...@sanbi.org
 lgcarvalhe...@gmail.com

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





 --
 Luisa Carvalheiro, PhD
 Southern African Biodiversity Institute, Kirstenbosch Research Center, 
 Claremont
  University of Pretoria
 Postal address - SAWC Pbag X3015 Hoedspruit 1380, South Africa
 telephone - +27 (0) 790250944
 carvalhe...@sanbi.org
 lgcarvalhe...@gmail.com


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting

Re: [R] False convergence of a glmer model

2010-02-16 Thread Douglas Bates
On Tue, Feb 16, 2010 at 9:38 AM, Shige Song shiges...@gmail.com wrote:
 Hi Doug,

 Thanks. Next time I will post it to the R-SIG-mixed-models mailing
 list, as you suggested.

I have added R-SIG-mixed-models to the cc: list.  I suggest we drop
the cc: to R-help after this message.

 With respect to your question, the answer is no, these parameters do
 not make sense. Here is the Stata output from exactly the same
 model:

 . xi:xtlogit inftmort i.cohort, i(code)
 i.cohort          _Icohort_1-3        (naturally coded; _Icohort_1 omitted)

 Fitting comparison model:

 Iteration 0:   log likelihood = -1754.4476
 Iteration 1:   log likelihood = -1749.3366
 Iteration 2:   log likelihood = -1749.2491
 Iteration 3:   log likelihood = -1749.2491

 Fitting full model:

 tau =  0.0     log likelihood = -1749.2491
 tau =  0.1     log likelihood = -1743.8418
 tau =  0.2     log likelihood = -1739.0769
 tau =  0.3     log likelihood = -1736.4914
 tau =  0.4     log likelihood = -1739.5415

 Iteration 0:   log likelihood = -1736.4914
 Iteration 1:   log likelihood = -1722.6629
 Iteration 2:   log likelihood = -1694.9114
 Iteration 3:   log likelihood = -1694.6509
 Iteration 4:   log likelihood =  -1694.649
 Iteration 5:   log likelihood =  -1694.649

 Random-effects logistic regression              Number of obs      =     21694
 Group variable: code                            Number of groups   =     10789

 Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                               avg =       2.0
                                                               max =         9

                                                Wald chi2(2)       =      8.05
 Log likelihood  =  -1694.649                    Prob > chi2        =    0.0178

Well, the quantities being displayed in the iteration output from
glmer are the deviance and the parameter values.  This Stata
log-likelihood corresponds to a deviance of 3389.3.  It is possible
that glmer and Stata don't measure the log-likelihood on the same
scale but, if they do, then the estimates where glmer gets stalled are
producing a lower deviance of 2837.49.
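
The conversion is just deviance = -2 * log-likelihood:

-2 * (-1694.649)   # 3389.298, the deviance implied by the Stata fit,
                   # vs. 2837.49 where glmer stalls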

The reason that glmer is getting stalled is because of the coefficient
of -7.45883 for the intercept.  This corresponds to a mean success
probability of 0.0005 for cohort 1.  The Stata coefficient estimate of
-5.214642 corresponds to a mean success probability of  0.0055 for
cohort 1, which is still very, very small.  The success probabilities
for the other cohorts are going to be even smaller.  What is the
overall proportion of zeros in the response?
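
Those probabilities are the inverse logit of the intercepts and can be
checked directly:

plogis(-7.45883)    # ~0.00058, glmer's stalled intercept for cohort 1
plogis(-5.214642)   # ~0.0054, the Stata intercept estimate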

The optimization to determine the maximum likelihood estimates of the
coefficients is being done on the scale of the linear predictor.  When
the values of the linear predictor become very small or very large,
the fitted values become insensitive to the coefficients.  The fact
the one program converges and another one doesn't may have more to do
with the convergence criterion than with the quality of the fit.


 ------------------------------------------------------------------------------
     inftmort |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
 -------------+----------------------------------------------------------------
   _Icohort_2 |  -.5246846   .1850328    -2.84   0.005    -.8873422   -.1620269
   _Icohort_3 |  -.1424331    .140369    -1.01   0.310    -.4175513     .132685
        _cons |  -5.214642   .1839703   -28.35   0.000    -5.575217   -4.854067
 -------------+----------------------------------------------------------------
     /lnsig2u |   .9232684   .1416214                      .6456956    1.200841
 -------------+----------------------------------------------------------------
      sigma_u |   1.586665   .1123528                      1.381055    1.822885
          rho |   .4335015   .0347791                      .3669899    .5024984
 ------------------------------------------------------------------------------
 Likelihood-ratio test of rho=0: chibar2(01) =   109.20 Prob >= chibar2 = 0.000

 The difference is quite huge, and Stata did not have any difficulties
 estimating this model, which makes feel that I might get some very
 basic specification wrong in my R model...

 Best,
 Shige

 On Tue, Feb 16, 2010 at 10:29 AM, Douglas Bates ba...@stat.wisc.edu wrote:
 On Tue, Feb 16, 2010 at 9:05 AM, Shige Song shiges...@gmail.com wrote:
 Dear All,

 I am trying to fit a 2-level random intercept logistic regression on a
 data set of 20,000 cases.  The model is specified as the following:

 m1 <- glmer(inftmort ~ as.factor(cohort) + (1|code), family=binomial, 
 data=d)

 I got Warning message: In mer_finalize(ans) : false convergence (8)

 That message means that the optimizer function, nlminb, got stalled.
It has converged but the point at which it has converged is not
 clearly the optimum.  In many cases this just indicates that the
 optimizer is being overly cautious.  However, it can

Re: [R] Use of R in clinical trials

2010-02-18 Thread Douglas Bates
On Thu, Feb 18, 2010 at 12:36 PM, Bert Gunter gunter.ber...@gene.com wrote:
 The key dates are 1938 and 1962. The FDC act of 1938 essentially mandated
 (demonstration of) safety. The tox testing infrastructure grew from that. At
 that time, there were no computers, little data, little statistics
 methodology. Statistics played little role -- as is still mainly the case
 today for safety. Any safety findings whatever in safety testing raise a
 flag; statistical significance in the multiple testing framework is
 irrelevant.

 1962 saw the Kefauver-Harris Amendments that mandated demonstration of
 efficacy. That was the key. The whole clinical trial framework and the
 relevant statistical design and analysis infrastructure flowed from that
 regulatory requirement. SAS's development soon after was therefore the first
 direct response to the statistical software needs that resulted. Note also,
 that statistical software was in its infancy at this time: before SAS there
 was Fortran and COBOL; there was no statistical software.

 So, as you can see, there essentially was **no** "before SAS".

 (Corrections/additional information welcome!)

My recollection is that the BMD programs (which, in a later version,
became BMDP) predated SAS and were specifically for BioMeDical
analysis.  Early statistical software was oriented to applications
areas: SPSS (Statistical Package for the Social Sciences) was the
predominant system used in the social sciences, BMD(P) in biomedical
areas and SAS in agricultural/life sciences settings.  Eventually the
more coherent framework and comparative ease-of-use of SAS (yes, I am
saying that with a straight face - in the days of batch jobs submitted
on punched cards with data residing on magnetic tape, there were
different standards of ease-of-use) won over more users in medical
fields.


 Bert Gunter
 Genentech Nonclinical Biostatistics



 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of Christopher W. Ryan
 Sent: Thursday, February 18, 2010 10:09 AM
 To: r-help@r-project.org
 Cc: p.dalga...@biostat.ku.dk
 Subject: Re: [R] Use of R in clinical trials

 Pure Food and Drug Act: 1906
 FDA: 1930s
 founding of SAS: early 1970s

 (from the history websites of SAS and FDA)

 What did pharmaceutical companies use for data analysis before there was
 SAS? And was there much angst over the change to SAS from whatever was
 in use before?

 Or was there not such emphasis on and need for thorough data analysis
 back then?

 --Chris
 Christopher W. Ryan, MD
 SUNY Upstate Medical University Clinical Campus at Binghamton
 425 Robinson Street, Binghamton, NY  13904
 cryanatbinghamtondotedu

 If you want to build a ship, don't drum up the men to gather wood,
 divide the work and give orders. Instead, teach them to yearn for the
 vast and endless sea.  [Antoine de St. Exupery]

 Bert Gunter wrote:
 DISCLAIMER: This represents my personal view and in no way reflects that
 of
 my company.

 Warning: This is a long harangue that contains no useful information on R.
 May be wise to delete without reading.
 --

 Sorry folks, I still don't understand your comments. As Cody's original
 post
 pointed out, there are a host of factors other than ease of
 programmability
 or even quality of results that weigh against any change. To reiterate,
 all
 companies have a huge infrastructure of **validated SAS code** that would
 have to be replaced. This, in itself, would take years and cost tens of
 millions of dollars at least. Also to reiterate, it's not only
 statistical/reporting functionality but even more the integration into the
 existing clinical database systems that would have to be rewritten **and
 validated**. All this would have to be done while continuing full steam on
 existing submissions. It is therefore not surprising to me that no pharma
 company in its right mind even contemplates undertaking such an effort.

 To put these things into perspective. Let's say Pfizer has 200 SAS
 programmers (it's probably more, as they are a large Pharma, but I dunno).
 If each programmer costs, conservatively, $200K U.S. per year fully
 loaded,
 that's $40 million U.S. for SAS Programmers. And this is probably a severe
 underestimate. So the $14M quoted below is chicken feed -- it doesn't even
 make the radar.

 To add further perspective, a single (large) pivotal clinical trial can
 easily cost $250M . A delay in approval due to fooling around trying to
 shift to a whole new software system could easily cause hundreds of
 million
 to billions if it means a competitor gets to the marketplace first. So, to
 repeat, SAS costs are chicken feed.

 Yes, I suppose that the present system institutionalizes mediocrity. How
 could it be otherwise in any such large scale enterprise? Continuity,
 reliability, and robustness are all orders of magnitude more important for
 both the FDA and Pharma to get safe and efficacious drugs to the public.
 Constantly 

Re: [R] Print table in lme

2010-02-20 Thread Douglas Bates
On Sat, Feb 20, 2010 at 4:28 AM, Dieter Menne
dieter.me...@menne-biomed.de wrote:


 Daniel-6 wrote:

 Hello, I'm trying to add lme results in a table with lm coef results, but
 as
 I know, estout or xtabel cannot  support lme objects.
 I'm a new in R and I'll appreciate some helpful comments.


 I don't know what estout and xtabel do, but if you want to extract the core
 results of lme, the following should give you a starter. For latex output,
 you could use latex in Hmisc.

 Dieter

 library(nlme)
 fm1 = summary(lme(distance ~ age, data = Orthodont))
 tabl = fm1$tTable
 tabl
 str(tabl)

Thanks, Dieter.  If lmemod is a fitted lme model then, as Dieter
indicates, you want

summary(lmemod)$tTable

and, for xtable you can create the xtable object as

xtable(summary(lmemod)$tTable)
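
Put together as a runnable sketch (using the Orthodont data from the
nlme package; xtable's matrix method produces the LaTeX markup):

library(nlme)
library(xtable)
lmemod <- lme(distance ~ age, data = Orthodont)
tab <- summary(lmemod)$tTable   # estimates, std. errors, df, t- and p-values
tab
xtable(tab)                     # LaTeX version of the same table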

For most other model fitting functions the sequence

coef(summary(mymodel))

works.  Unfortunately, in the nlme package we assigned another meaning
to the coef method.  In retrospect that wasn't the best idea.

The coef(summary(foo)) sequence does work for models fit by the lmer
function in the lme4 package.  In fact, I was just in communication
with David Dahl, the author of xtable, and I believe he will add a
method to the xtable package.




 --
 View this message in context: 
 http://n4.nabble.com/Print-table-in-lme-tp1562325p1562727.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Print table in lme

2010-02-20 Thread Douglas Bates
On Sat, Feb 20, 2010 at 10:27 AM, Dieter Menne
dieter.me...@menne-biomed.de wrote:


 Douglas Bates-2 wrote:

 On Sat, Feb 20, 2010 at 4:28 AM, Dieter Menne
 For most other model fitting functions the sequence

 coef(summary(mymodel))

 works.  Unfortunately, in the nlme package we assigned another meaning
 to the coef method.  In retrospect that wasn't the best idea.



 Good to know. I used to stutter some incomprehensible stuff when students
 asked me about this. Luckily, the frequency of that question has declined in
 favor of: why are there no incomprehensible values in lmer tables?

Aha, you have caught me out.  Failing to provide p-values was a clever
diversionary tactic on my part.

 --
 View this message in context: 
 http://n4.nabble.com/Print-table-in-lme-tp1562325p1562922.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] get problem

2010-02-24 Thread Douglas Bates
On Wed, Feb 24, 2010 at 3:56 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:
 On 24/02/2010 4:31 PM, Georg Ehret wrote:

 Dear R communtiy,
   I do not understand why this does not work...:

  betaS$SBP
 [1]  0.03274 -0.04216 -0.08986 -0.45980  0.60320 -0.63070 -0.05682
  0.20130
  t <- c("betaS$SBP")
  t
 [1] "betaS$SBP"
  get(t)
 Error in get(t) : object 'betaS$SBP' not found


 The problem is that betaS$SBP is an expression that extracts a component of
 the betaS object, it's not the name of an object.  get() only gets single
 objects, it doesn't evaluate expressions.  You need something like

 eval(parse(text=t))

 to get what you want.

But if you have read

fortune("rethink")

you may approach the problem differently.
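
A sketch of that rethinking: keep the vectors in a named list and
index by name, so neither get() nor eval(parse()) is needed (the names
and values here are hypothetical):

betaS <- list(SBP = c(0.03274, -0.04216), DBP = c(0.101, -0.202))
for (nm in names(betaS)) {
    x <- betaS[[nm]]              # direct extraction by name
    cat(nm, ": mean =", mean(x), "\n")
}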

P.S. I agree with Thomas, which may seem strange to anyone who has
read the nlme sources. :-)

 [I am trying to use the variable t in a loop to call many different
 objects, but the pasting together did not work and this simple example
 does
 not neither...]

 Thank you and best regards, Georg.
 ***
 Georg Ehret
 JHU
 Baltimore

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cross Compiling

2007-09-14 Thread Douglas Bates
An alternative to cross-compiling when you have a source package is to
use the Win-builder facility at win-builder.R-project.org

Thanks to Uwe for providing this facility.  I find it much, much
easier than trying to cross-compile or to set up a Windows computer
for compiling R packages.

On 9/14/07, Scott Hyde [EMAIL PROTECTED] wrote:
 Hello All,

 I have a Linux computer and do all of my work from it.  However, I
 teach also, which means that many of my students use windows.   Hence,
 I need to create packages that work under windows as well as Linux.  I
 have tried to follow the directions at

 http://cran.r-project.org/doc/contrib/cross-build.pdf

 which is the document "Building Microsoft Windows Versions of R and R
 packages under Intel Linux".  This has been very helpful.  However,
 the file R_Tcl.zip is no longer available, so I cannot compile R for
 Windows using the "make R" command as described in the document.  Is
 it necessary to have the Tcl sources in there?  If it is, how should
 the directions be modified to enable the complete compilation of R?

 None of my code contains C, Fortran, or any other language.  It is
 just plain R code.  I would think that this would be easier to convert
 over.  Is it?  I tried the following and it seems to work, but I'd
 like to know if it is safe.

 1.  Build package with the pre-compiled binary package option "R CMD
 build --binary pkgname"
 2. convert the resulting tar.gz file to a zip archive.
 3. Install it on a windows machine.

 This process successfully works when I install it on a windows
 machine, but I have no idea how safe it is.

 --
 *
 Scott K. Hyde
 Assistant Professor of Statistics and Mathematics
 School of Computing
 Brigham Young University -- Hawaii

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Accessing the fixed- and random-effects variance-covariance matrices of an nlme model

2007-09-27 Thread Douglas Bates
On 9/26/07, Rob Forsyth [EMAIL PROTECTED] wrote:
 I would appreciate confirmation that the function vcov(model.nlme)
 gives the var-cov matrix of the fixed effects in an nlme model.

It gives the approximate variance-covariance matrix for the
fixed-effects parameters in the model.  (Exact variance-covariance
matrices are very difficult to evaluate for nonlinear models and even
more difficult to evaluated for nonlinear mixed-effects models.)

The documentation for the generic function vcov and some of its
methods can be accessed by

?vcov.lme


 Presumably the random-effects var-cov matrix is given by cov(ranef
 (model.nlme)?

No.  The BLUPs of the random effects (actually as Alan James described
the situation, For a nonlinear model these are just like the BLUPs
(Best Linear Unbiased Predictors) except that they are not linear, and
they're not unbiased, and there is no clear sense in which they are
best but, other than that, ...) are not guaranteed to have an
observed variance-covariance matrix that corresponds to the estimate
of the variance-covariance matrix of the random effects.  These
estimates are returned by the VarCorr function.  See ?VarCorr

The implementation of VarCorr in the nlme package is not optimal in
that it returns the result as a character matrix and you need to
convert to a numeric representation if you want to do anything other
than print it out.  I prefer the implementation in the lme4 package
which returns a list of numeric matrices that are formatted by a
specific function (called formatVC but hidden in the lme4 package's
namespace) when needed.
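
For example, a sketch with the Orthodont data from nlme:

library(nlme)
fm <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)
vc <- VarCorr(fm)             # a character matrix, despite the tidy printing
as.numeric(vc[, "Variance"])  # numeric values for further computation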

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fitted values in LMER for the fixed-effects only

2007-09-28 Thread Douglas Bates
On 9/28/07, Anouk Simard [EMAIL PROTECTED] wrote:
 Hi,

 I would like to extract the fitted values from a model using LMER but
 only for the fixed portion of the model and not for the fixed and random
 portion (e.g. the outpm or outp procedures in SAS). I am aware of
 the function fitted() but I am not sure whether it gives the fitted values
 for the fixed and random parts or only the fixed. I looked in the R help and
 the R list and I haven't found many details about it except this
 comment from Douglas Bates in January 2006:

 Would it help if there were an option in the fitted method to allow for
 fixed-effects only versus fixed- and random-effects? As you say, because
 lmer models do not need to be hierarchical it is not obvious what it
 would mean to include some but not all of the random effects terms in
 the fitted values. However, it is easy and unambiguous to define
 fitted values for the fixed-effects only.

 Up until a few days ago there was an option to do this but then I
 changed the calculation of the fitted values in an attempt to clean up
 the code. The calculation of the level = 0 fitted values in the new
 representation of the fitted model is quite easy. It is

 fm1@X %*% fixef(fm1)

 I tried the last formula, but I want to confirm that this is still the
 proper and best way to get fitted values for the fixed effects using lmer.

Yes.
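
For concreteness, a sketch with the sleepstudy data (slot access into
the fitted object, as in the lme4 internals of that era):

library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
fit0 <- fm1@X %*% fixef(fm1)   # fitted values from the fixed effects only
head(fit0)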

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fitted values in LMER for the fixed-effects only

2007-10-02 Thread Douglas Bates
On 9/29/07, Dieter Menne [EMAIL PROTECTED] wrote:
 On 9/28/07, Anouk Simard Anouk.Simard at bio.ulaval.ca wrote:

   I would like to extract the fitted values from a model using LMER but
   only for the fixed portion of the model and not for the fixed and random
   portion (e.g. the outpm or outp procedures in SAS).
 ..
 Quoting Douglas Bates bates at stat.wisc.edu

   The calculation of the level = 0 fitted values in the new
   representation of the fitted model is quite easy. It is
  
    fm1@X %*% fixef(fm1)
  

 Douglas Bates replied

  Yes.

 Mmm.. If I consider a few hundred messages in this list on the subject,
 it's considered dirty to use the fm1@X or fm1$X construct. So
 should we
 expect that this is final (no longer fitted(), augPred etc.), or is it work
 in prooogres?

This is to indicate that I am rather slow in getting the lme4 package
finished?  :-)

I agree that I am and no one is more impatient with the progress than I.

You are correct that it is considered bad form to reach into a fitted
model structure and pull out individual components or slots.  My brief
answer agreeing that this was still the best way of obtaining these
particular values should have emphasized the still.  I haven't
thought of a good way of specifying this particular form in, say, the
fitted method.  There is a clean way of specifying the levels of
random effects incorporated into the fitted values for models with
nested random effects only.  However, for crossed or partially crossed
random effects this is less clearly defined.

Perhaps a better alternative would be to define a method for the
model.matrix generic for lmer models.  Then the calculation could be
written as

model.matrix(fm1, type = "fixed") %*% fixef(fm1)

using only extractor functions.

It may be surprising that sometimes the most difficult part of the
design of the code in a package is deciding on the names and arguments
of functions or methods.  In my experience rushing such decisions
often leads to a lot of maintenance headaches.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lme (or lmer) question

2007-10-05 Thread Douglas Bates
On 10/5/07, Bert Gunter [EMAIL PROTECTED] wrote:
 Folks:

 In the following mixed effect model, the x and residual variance
 components are nonidentifiable. Yet lme() (or the equivalent in lmer())
 happily produces an answer in which the sum of the variance components is
 the correct estimate of the single variance parameter. Why? -- i.e. why
 doesn't lme complain?

 x <- 1:50
 y <- rnorm(50)
 m1 <- lme( y ~ 1, rand = ~1|x)

Because we hadn't anticipated that someone would try to do that when
we wrote lme?  It's the situation of being unable to make the code
foolproof because the fools are so inventive. :-)

The lmer function (at least the development version) does throw an
error for that example although the error message is less than
transparent.  Creating a kinder, gentler error message for this case
is on the ToDo list

 x <- 1:50
 y <- rnorm(50)
 lmer(y ~ 1 + (1|x))
Error: length(levels(dm$flist[[1]])) < length(Y) is not TRUE

The fact that you get a more-or-less arbitrary answer is related to
the way that the optimization of the log-likelihood (or the REML
criterion) is performed in lme.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] estfun d

2007-10-08 Thread Douglas Bates
On 10/8/07, Abdus Sattar [EMAIL PROTECTED] wrote:
 Hello EVERYONE,

 I need an URGENT help from you please!

 How can I see the estfun (empirical estimating function) and df (degrees
 of freedom) from the following mixed-model, please?

 (fm1 <- lmer2(Reaction ~ Days + (Days|Subject), sleepstudy))

You have asked this question several times on this list and on the
R-SIG-Mixed-Models list.  Repeating the question ad nauseam is
unlikely to generate a helpful answer.

To the best of my knowledge there is no lmer or lmer2 method for the
estfun generic.  Others may be willing to give you some references on
the difficulty in determining degrees of freedom issue - I don't want
to rehash it myself.

So, despite your repeated requests, there are no good answers to your
questions.  You cannot see such results because there are, as far as
I know, no functions to provide such results.  You are welcome to
write your own code to provide them.  If that is not suitable you may
need to take advantage of our money-back guarantee and remove R from
your computer to qualify for a full refund of the purchase price.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] estfun df

2007-10-08 Thread Douglas Bates
On 10/8/07, Achim Zeileis [EMAIL PROTECTED] wrote:
 On Mon, 8 Oct 2007, Abdus Sattar wrote:

  Hello EVERYONE,
 
  I need an URGENT help from you please!

  This type of request is not considered very polite; please have a
 look at the posting guide.

  How can I see the estfun (empirical estimating function)

 I guess (because you are not telling us) that you would like to have an
 estfun() method for lmer objects. I don't provide one in my sandwich
 package (where the estfun generic is taken from) and AFAIK there is
 nothing analogous readily available in lme4... But I guess that you
 should be able to extract/compute the empirical estimating functions from
 the fitted lmer object.

  and df (degree of freedom) from the following mixed-model please?

  (fm1 <- lmer2(Reaction ~ Days + (Days|Subject), sleepstudy))

 When you compute
logLik(fm1)
 it returns a logLik object that has a df attribute:
attr(logLik(fm1), "df")

That degrees of freedom is a count of the number of parameters
estimated in the model.  That may be what Abdus Sattar wants but I'm
not sure.  Many times when people refer to the degrees of freedom for
linear mixed models they are thinking about the degrees of freedom for
a t-statistic or the denominator degrees of freedom for an F statistic
related to hypothesis tests on the fixed-effects parameters.  That was
what I was referring to in my response when I, in effect, said "I
don't want to talk about it any more."

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lme4 install trouble

2007-10-11 Thread Douglas Bates
On 10/11/07, David Reitter [EMAIL PROTECTED] wrote:
 After upgrading to R 2.6.0, I'm having trouble running lmer:

 model <- lmer(primed ~ log(dist.time)*role + 1|target.utt,
 data=data.utts)
 Error in UseMethod("as.logical") : no applicable method for "as.logical"

That problem originates in the Matrix package.  You need to upgrade
the Matrix package to version 0.999375-3 for R-2.6.0.  After that the
installed version of lme4 will function properly.

 So I thought I'd upgrade lme4 to the latest version, but
 unfortunately the compilation fails - perhaps there's a missing
 #include:

I think this might be a version skew with the Matrix package.  Those
macros are defined in a file in the .../R/library/Matrix/include
directory that is added to the include path but you need a recent
version of the Matrix package to get those particular macros.

 R CMD INSTALL -l ~/usr/lib/R lme4_0.99875-8.tar.gz
 * Installing *source* package 'lme4' ...
 ** libs
 gcc -std=gnu99 -I/home/dreitter/usr/lib/R/include -I/home/dreitter/
 usr/lib/R/include  -I/usr/local/include -I/home/dreitter/usr/lib/R/
 library/Matrix/include   -fpic  -g -O2 -c glmer.c -o glmer.o
 glmer.c: In function 'internal_Gaussian_deviance':
 glmer.c:185: error: 'CHM_SP' undeclared (first use in this function)
 glmer.c:185: error: (Each undeclared identifier is reported only once
 glmer.c:185: error: for each function it appears in.)
 glmer.c:185: error: expected ';' before 'Lm'
 glmer.c:186: error: 'CHM_FR' undeclared (first use in this function)
 glmer.c:186: error: expected ';' before 'Lcp'
 glmer.c:189: error: 'CHM_DN' undeclared (first use in this function)
 glmer.c:189: error: expected ';' before 'Ltb'
 glmer.c:195: error: 'Lcp' undeclared (first use in this function)
 glmer.c:196: error: 'Lm' undeclared (first use in this function)
 glmer.c:197: error: 'chb' undeclared (first use in this function)
 glmer.c:197: error: 'Ltb' undeclared (first use in this function)
 glmer.c: In function 'internal_bhat':
 glmer.c:396: error: 'CHM_FR' undeclared (first use in this function)
 glmer.c:396: error: expected ';' before 'L'
 glmer.c:403: error: 'L' undeclared (first use in this function)
 glmer.c: In function 'glmer_MCMCsamp':
 glmer.c:657: error: 'CHM_FR' undeclared (first use in this function)
 glmer.c:657: error: expected ';' before 'L'
 glmer.c:680: error: 'L' undeclared (first use in this function)
 make: *** [glmer.o] Error 1
 ERROR: compilation failed for package 'lme4'

 I've installed Matrix_0.999375-3 and lattice_0.16-5.

 Any pointers would be appreciated.


 --
 David Reitter
 ICCS/HCRC, Informatics, University of Edinburgh
 http://www.david-reitter.com

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rearrange data columns

2007-10-11 Thread Douglas Bates
On 10/11/07, Martin Ivanov [EMAIL PROTECTED] wrote:
 Dear R users,
 I need to do the following. Let

  a = ( 1 2 3          b = ( -1 -2 -3
        4 5 6 )              -4 -5 -6 )

 be (2x3) matrices. I need to combine the two matrices into a new (2x6)
 matrix like this:

  ab = ( 1 -1 2 -2 3 -3
         4 -4 5 -5 6 -6 )

  How can this be done in R?

 (a <- matrix(1:6, nr = 2))
      [,1] [,2] [,3]
 [1,]    1    3    5
 [2,]    2    4    6
 (b <- -a)
      [,1] [,2] [,3]
 [1,]   -1   -3   -5
 [2,]   -2   -4   -6
 (ans <- rbind(a, b))
      [,1] [,2] [,3]
 [1,]    1    3    5
 [2,]    2    4    6
 [3,]   -1   -3   -5
 [4,]   -2   -4   -6
 dim(ans) <- c(2, 6)
 ans
      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,]    1   -1    3   -3    5   -5
 [2,]    2   -2    4   -4    6   -6

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reference for logistic regression

2007-10-11 Thread Douglas Bates
On 10/11/07, Douglas Bates [EMAIL PROTECTED] wrote:
 On 10/11/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
  Dear list, first accept my apologies for asking a non-R question.

  Can anyone point me to a good reference on logistic regression? web or
  book references would be great. I am interested in the use and
  interpretation of dummy variables and prediction models.
  I checked the contributed section in the CRAN homepage but could not
  find anything (Julian Faraway's "Practical Regression and ANOVA using
  R" does not cover logistic regression)

 The wikipedia article on logistic regression
 (http://en.wikipedia.org/wiki/Logistic_regression) contains a brief
 description and some references.  Statisticians often consider
 logistic regression to be an example of a more general class of models
 called generalized linear models, which is why the R function to fit a
 logistic regression model is called glm.  There is a link in the
 logistic regression wikipedia article to the generalized linear model
 article.

 Whenever you use wikipedia you should be cautious of the quality of
 the information in the articles.  Generally the articles are good as a
 brief introduction but they can and do contain errors so you should
 check important facts and not take them at face value.  A person in
 one of my classes asked about the standard deviation and I suggested
 that they look at the wikipedia article on the topic.  Then I looked
 at it myself and saw that one of the things mentioned is that the
 standard deviation of the Cauchy distribution is undefined, which is
 true, but the reason given is because E[X] is undefined, which is not
 true.

As several people have pointed out to me privately, I'm the one
spreading misinformation, not the authors of the wikipedia article.
My, apparently faulty, recollection was that E[X] was defined (because
the density is symmetric) but E[X^2] was not defined.  It looks like I
need to review some elementary properties of distributions and
integrals.

Note to self: Don't respond to mailing list postings until fully caffeinated.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Basic plot question: Figure 1.1 Pinheiro Bates

2007-10-12 Thread Douglas Bates
On 10/12/07, David Afshartous [EMAIL PROTECTED] wrote:
 All,
 Sorry for overly simplistic question, but I can't seem to remember how to
 create the basic plot shown in Figure 1.1 of Pinheiro & Bates (2004; p.4).
 The y-axis delineates a factor (Rail) while the x-axis displays the
 distribution of a continuous variable (time) according to each level of the
 factor.  Didn't see it in archives but perhaps I'm not searching on correct
 key words, any help appreciated,
 David

  library(nlme)
  Rail
 Grouped Data: travel ~ 1 | Rail
   Rail travel
1     1     55
2     1     53
3     1     54
4     2     26
5     2     37
6     2     32
7     3     78
8     3     91
9     3     85
10    4     92
11    4    100
12    4     96
13    5     49
14    5     51
15    5     50
16    6     80
17    6     85
18    6     83

That plot can be reproduced by

 library(lattice)
  data(Rail, package = "nlme")
 dotplot(Rail ~ travel, Rail)

However, this relies on the Rail$Rail factor being ordered by
increasing mean travel time, which is fine for the plot but may get in
the way of other uses of the data.  In a way we only want to assign an
order to the levels of the Rail factor for the purposes of the plot.

As Deepayan Sarkar mentions in his useR!2007 presentation
(http://user2007.org/program/presentations/sarkar.pdf) he has done a
considerable amount of upgrading of the lattice package, relative to
the original design of Trellis graphics, in part to make the nlme
plots easier to create.  The version of the Rail data in the MEMSS
package removes the ordering on the levels of the Rail factor so it
defaults to the original order.  Compare

  data(Rail, package = "MEMSS")
 dotplot(Rail ~ travel, Rail)
 dotplot(reorder(Rail, travel) ~ travel, Rail)

The latter plot is the way I would recommend constructing that plot now.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] coef se in lme

2007-10-17 Thread Douglas Bates
On 10/15/07, Doran, Harold [EMAIL PROTECTED] wrote:
 ?vcov

The vcov method returns the estimated variance-covariance matrix of
the fixed-effects only.  I think Irene's question is about the
combination of the fixed-effects parameters and the BLUPs of the
random effects that is returned by the coef method applied to an lmer
object.  (You may recall that you were the person who requested such a
method in lme4 like the coef method in nlme :-)

On the face of it this quantity should be easy to define and evaluate
but in fact it is not easy to do so because these are combinations of
model parameters (the fixed effects) and unobserved random variables
(the random effects).  It gets a bit tricky trying to decide what the
variance of this combination would be.  I think there is a sensible
definition, or at least a computationally reasonable definition, but
there are still a few slippery points in the argument.

Lately I have taken to referring to the estimates of the random
effects, what are sometimes called the BLUPs or Best Linear Unbiased
Predictors, as the conditional modes of the random effects.  That
is, they are the values that maximize the density of the random
effects given the observed data and the values of the model
parameters.  For a linear mixed model the conditional distribution of
the random effects is multivariate normal so the conditional modes are
also the conditional means.  Also, we can evaluate the conditional
variance-covariance matrix of the random effects up to a scale factor.

The next part is where things get a bit hazy for me but I think it
makes sense to consider the joint distribution of the estimator of the
fixed-effects parameters and the random effects conditional on the
data and, possibly, on the variance components.  Conditional on the
relative variance-covariance of the random effects (i.e. the matrix
that occurs as the penalty term in the penalized least squares
representation of the model) the joint distribution of the
fixed-effects estimators and the random effects is multivariate normal
with mean and variance-covariance matrix determined from the
mixed-model equations.

This big (p+q by p+q, where p is the dimension of the fixed effects
and q is the dimension of the random effects) variance-covariance
matrix could be evaluated and, from that, the variance of any linear
combination of components.  However, I have my doubts about whether it
is the most sensible answer to evaluate.  Conditioning on the relative
variance-covariance matrix of the random effects is cheating, in a
way.  It would be like saying we have a known variance, $\sigma^2$
when, in fact, we are using an estimate.  The fact that we don't know
$\sigma^2$ is what gives rise to the t distributions and F
distributions in linear models and we are all trained to pay careful
attention to the number of degrees of freedom in that estimate and how
it affects our ideas of the precision of the estimates of other model
parameters.  For mixed models, though, many practitioners are quite
comfortable conditioning on the value of some of the variance
components but not others.  It could turn out that conditioning on the
relative variance-covariance of the random effects is not a big deal
but I don't know.  I haven't examined it in detail and I don't know of
others who have.

Another approach entirely is to use Markov chain Monte Carlo to
examine the joint distribution of the parameters (in the Bayesian
sense) and the random effects.  If you save the fixed effects and the
random effects from the MCMC chain then you can evaluate the linear
combination of interest throughout the chain and get an empirical
distribution of the quantities returned by coef.

This is probably an unsatisfactory answer for Irene who may have
wanted something quick and simple.  Unfortunately, I don't think there
is a quick, simple answer here.

I suggest we move this discussion to the R-SIG-Mixed-Models list which
I am cc:ing on this reply.

 -Original Message-
 From: [EMAIL PROTECTED] on behalf of Irene Mantzouni
 Sent: Mon 10/15/2007 3:20 PM
 To: [EMAIL PROTECTED]
 Subject: [R] coef se in lme

 Hi all!

 How is it possible to estimate standard errors for coef obtained from lme?
 Is there something like se.coef() for lmer, or what is the analytical solution?

 Thank you!

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list

[R] Appropriate measure of correlation with 'zero-inflated' data?

2007-10-25 Thread Douglas Bates
I have reached the correlation section in a course that I teach and I
hit upon the idea of using data from the weekly Bowl Championship
Series (BCS) rankings to illustrate different techniques for assessing
correlation.

For those not familiar with college football in the United States
(where football refers to American football, not what is called
soccer here and football in most other countries) I should explain
that many, many universities and colleges have football teams but each
team only plays 10-15 games per season, so not every team will play
every other team.  The game is so rough that it is not feasible to
play more than one match per week and a national playoff after the
regular season is impractical.  It would take too long and the players
are, in theory, students first and athletes second.

In place of a national playoff there are various polls of coaches or
sports writers that purport to rank teams nationally.  Several
analysts also publish computer-based rankings that use complicated
formulas based on scores in individual games, strength of the
opponent, etc. to rank teams.

Rankings from two of the human polls (the Harris poll of sports
writers and the USA Today poll of the coaches) and from six of the
computer polls are combined to produce the official BCS ranking.  The
Wikipedia entry for Bowl Championship Series gives the history and
evolution of the actual formula that is currently used.

This season has been notable for the volatility of those rankings.
One is reminded of the biblical prophecy that "The first shall be last
and the last shall be first."

Another notable feature this year is the extent to which the
computer-based rankings and the rankings in the human polls disagree.
I enclose a listing of the top 25 teams and the components of the
rankings as of last Sunday (2007-10-21).  (Almost all college football
games are played on Saturdays and the rankings are published on
Sundays).  The columns are
Rec - won-loss record
Hvot - total number of Harris poll votes
Hp - proportion of maximum Harris poll votes
HR - rank in the Harris poll (smaller is better)
Uvot, Up, UR - same for the USA Today poll
Cavg - Average score (it's actually a trimmed mean) on computer-based
rankings (larger is better)
BCS - BCS score - the average of Hp, Up and Cavg
Pre - BCS rank in the previous week

As I understand it, the votes in the Harris and USA Today polls are
calculated by asking each voter to list their top 25 teams then
awarding 25 points for a team ranked 1, 24 points for a team ranked 2,
etc. on each ballot and calculating the total.  Apparently there are
now 114 Harris poll participants and 60 USA Today poll participants
giving maximum possible scores of 2850 and 1500, respectively.
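
The quoted maxima are just the number of voters times the 25 points
for a first-place vote:

114 * 25   # 2850, maximum possible Harris poll score
 60 * 25   # 1500, maximum possible USA Today poll score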

The Cavg column is calculated from 6 scores of 0 to 25 (larger is
better) dropping the largest and smallest scores.  The raw score is
out of 100 and the proportion is reported as Cavg.
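
A sketch of that calculation for one team's six computer scores (the
values here are hypothetical):

scores <- c(25, 24, 25, 22, 23, 25)   # six computer rankings, each 0 to 25
raw <- sum(sort(scores)[2:5])         # drop the smallest and largest
raw / 100                             # reported as the Cavg proportion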

The data frame is available (for a little while) as
http://www.stat.wisc.edu/~bates/BCS.rda

The raw scores and the rankings from the Harris and USA Today polls
are in fairly good agreement but the Cavg scores are very different.
Although scatterplots will show this, I feel that correlation measures
may be thrown off by the large number of zeros in the Cavg scores.
What would be the preferred way of measuring correlation in such a case?
What would be a good graphical presentation showing the lack of
agreement of the various components of the BCS score?
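
One possibility is to compare product-moment and rank-based measures
(a sketch, assuming the saved data frame loads under the name BCS):

load("BCS.rda")                              # from the URL above
cor(BCS$Hp, BCS$Cavg)                        # Pearson, sensitive to the zeros
cor(BCS$Hp, BCS$Cavg, method = "spearman")   # rank-based alternatives
cor(BCS$Hp, BCS$Cavg, method = "kendall")
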
                  Rec Hvot Uvot Cavg Pre      Hp HR     Up UR    BCS
Ohio St.        (8-0) 2847 1498 0.93   2 0.99895  1 0.9987  1 0.9759
Boston College  (7-0) 2676 1412 0.97  23 0.93895  2 0.9413  2 0.9501
LSU             (7-1) 2550 1319 0.96  31 0.89474  3 0.8793  3 0.9114
Arizona St.     (7-0) 2003 1089 0.86  35 0.70281  8 0.7260  7 0.7629
Oregon          (6-1) 2281 1225 0.67   3 0.80035  5 0.8167  5 0.7623
Oklahoma        (7-1) 2521 1306 0.51  32 0.88456  4 0.8707  4 0.7551
West Virginia   (6-1) 2157 1134 0.61  36 0.75684  6 0.7560  6 0.7076
Virginia Tech   (6-1) 1831 1052 0.69   4 0.64246 10 0.7013  9 0.6779
Kansas          (7-0) 1671  911 0.75   6 0.58632 11 0.6073 10 0.6479
South Florida   (6-1) 1627  813 0.81  13 0.57088 12 0.5420 12 0.6410
Florida         (5-2) 1867  906 0.61   8 0.65509  9 0.6040 11 0.6230
USC             (6-1) 2100 1060 0.17   7 0.73684  7 0.7067  8 0.5378
Missouri        (6-1) 1568  790 0.53   9 0.55018 13 0.5267 13 0.5356
Kentucky        (6-2) 1156  604 0.55  34 0.40561 15 0.4027 15 0.4528
Virginia        (7-1)  650  466 0.76  12 0.22807 20 0.3107 18 0.4329
South Carolina  (6-2) 1031  474 0.39  33 0.36175 17 0.3160 17 0.3559
Hawaii          (7-0) 1265  617 0.00  11 0.44386 14 0.4113 14 0.2851
Georgia         (5-2)  711  402 0.23  14 0.24947 19 0.2680 19 0.2492
Texas           (6-2) 1054  527 0.00  15 0.36982 16 0.3513 16 0.2404
Michigan        (6-2)  643  325 0.26  18 0.22561 21 0.2167 21 0.2341
California      (5-2)  873  397 0.02   5 0.30632 18 0.2647 20 0.1970
Auburn 

Re: [R] R routines vs. MATLAB/SPSS Routines

2007-10-26 Thread Douglas Bates
On 10/26/07, Frank Thomas [EMAIL PROTECTED] wrote:
...
 BTW: Contrary to some ideas both R   SPSS can be programmed and the
 algorithms for both have been published. So, no matter whether open
 source or private property you know what you do (if you want).

This is off the point of Matt's original question and I apologize for
hijacking the thread but where have the algorithms that underlie the
SPSS procedures been published?

My particular interest is in the methods for the linear mixed models
implemented in MIXED in SPSS (and also PROC MIXED in SAS).  A person
who was quite enthusiastic about the MIXED procedure in SPSS sent me a
PDF file about MIXED that I suppose could be considered a description
of the algorithms as long as you didn't read it too closely.  However,
the descriptions are far too vague to use them as a basis for writing
code and furthermore they jump back and forth between two or three
different representations of the model without tying the different
threads together.  There is no indication of what representation forms
the basis of the code and how the calculations are implemented.  Even
more alarming, parts of it are flat-out wrong.  Even the mixed-model
equations as given in this document are wrong, as one would quickly
find out if one tried to implement them.  The organization is
disjointed and generally the language and grammar indicate that it has
not been copy edited carefully.  I would not give it a good grade if
it were submitted as a project report in my statistical computing
course.

I have been unable to trace the source of this document.  It is
definitely a discussion of the computational algorithms in MIXED but I
haven't been able to track its original source.  In a way I hope it
was a preliminary draft or something like that.  If SPSS released this
version as an official publication it is a sign that they have fallen
on hard times.

If one simply needs to have a reference to cite regarding the
calculations done n your analysis then a document like this, suitably
corrected, might do.  I don't know if this is what you meant by saying
that the algorithms have been published but if it is then this is not
sufficient for the purposes of investigating the computational
methods.  A few years ago Brian Ripley mentioned in a presentation, I
believe to the Royal Statistical Society, the need for a reference
implementation of a statistical computing technique.  That is, one
should provide code that is accessible to, usable by and modifiable by
other researchers in the field.  (Brian: If you know the presentation
to which I am referring, can you provide a reference or URL please?)

Perhaps not for students who simply need to apply a few statistical
techniques but certainly for researchers in the statistical techniques
and computational methods there is a world of difference between a
closed-source implementation with a supplementary document describing
the alleged computation method and an open-source implementation.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] is there a similar function to perform repeated statements asin SAS PROC MIXED?

2007-10-28 Thread Douglas Bates
On 10/24/07, Dimitris Rizopoulos [EMAIL PROTECTED] wrote:
 you can have a look at the gls() function in package nlme, and
 specifically at the `weights' and `correlation' arguments; the same
 arguments are also available in the lme() function.

That is certainly a way of fitting a correlation structure directly
but I don't think of that as a mixed-effects model.  To me a
mixed-effects model is a model with both fixed-effects parameters and
random effects.

It is true that SAS PROC MIXED will fit such a model but I don't think
that necessarily makes it a mixed-effects model.  I think the SAS
formulation of mixed-effects models blurs many distinctions and leads
to considerable confusion.  At least it confuses me :-)

It doesn't help that the terminology used in PROC MIXED is so sloppy
as to be nonsensical.  Gallon Li uses the SAS terminology of an
unstructured variance-covariance matrix.  Perhaps I am being
pedantic but anyone who was not raised in a SAS-speaking environment
would, upon reflection, realize that this is asking for an
unstructured, highly-structured matrix.  It's an oxymoron.  Any
definition I have ever seen of the variance-covariance matrix of a
random vector requires that the matrix be symmetric (hence square) and
at least positive semi-definite, if not positive-definite.

Even in the one-dimensional case I would interpret an unstructured
variance matrix to be an unconstrained 1 by 1 matrix whose element is
required to be non-negative.  I have difficulty grasping the concept
of an unstructured matrix that is required to satisfy several
complicated constraints.

If the term had been a general variance-covariance matrix I think I
could understand what it was supposed to mean.

The REPEATED statement propagates the dangerous misconception that
there is a single model which is appropriate for any case of repeated
measures.  The approach I would recommend for a person using R is to
plot the data, formulate a preliminary model, fit the model, examine
the residuals, modify the model if appropriate, rinse and repeat.  In
the days when SAS was being developed the "rinse and repeat" part
could take several days so it made sense to try a "one size fits all"
model.  I don't feel that makes sense any more.

Let me describe what I understand the REPEATED model with an
"unconstrained" variance-covariance matrix to mean.  If I have it
wrong I hope you or others reading this will correct me.

Suppose the observations are indexed by subject and occasion.  For
definiteness, consider the first 5 days (Days 0 to 4) of data from the
sleepstudy data in the lme4 package.  Convert the Days variable to a
factor, which we will call occ for occasion, and incorporate the
indicators for the occasion in both the fixed effects and the random
effects indexed by subject.

 sleep04 <- subset(sleepstudy, Days < 5)
 sleep04$occ <- factor(sleep04$Days)
 show(fm1 <- lmer(Reaction ~ occ - 1 + (occ - 1|Subject), sleep04))
Linear mixed-effects model fit by REML
Formula: Reaction ~ occ - 1 + (occ - 1 | Subject)
   Data: sleep04
 AIC BIC logLik MLdeviance REMLdeviance
 808 858   -384  792.7  768
Random effects:
 Groups   Name Variance  Std.Dev. Corr
 Subject  occ0  987.132  31.4187
          occ1 1072.423  32.7479  0.769
          occ2  823.516  28.6970  0.493 0.808
          occ3 1464.770  38.2723  0.481 0.769 0.913
          occ4 1764.239  42.0028  0.465 0.673 0.722 0.939
 Residual        45.184   6.7219
Number of obs: 90, groups: Subject, 18

Fixed effects:
 Estimate Std. Error t value
occ0  256.652  7.573   33.89
occ1  264.496  7.880   33.57
occ2  265.362  6.947   38.20
occ3  282.992  9.159   30.90
occ4  288.649 10.026   28.79

Correlation of Fixed Effects:
 occ0  occ1  occ2  occ3
occ1 0.737
occ2 0.470 0.770
occ3 0.464 0.741 0.875
occ4 0.449 0.651 0.694 0.914

This model requires estimation of 15 variance-covariance parameters
(a 5 by 5 symmetric matrix has 5*6/2 = 15 free elements) using data
from only 18 subjects.  You can fit such a model to more than 5
occasions from these data but I'm sure the fits are badly
overparameterized.

Let me emphasize that I am not criticizing you for your perfectly
reasonable answer to Gallon Li's question, nor am I criticizing Gallon
Li for the question.  I am simply expressing my frustration at the
sloppy terminology and "one size fits all" mentality implicit in
saying that this model is the repeated measures model.

As an example of why I say that the REPEATED statement leads to
dangerous misconceptions, I'll give a bit of the story of the PBG data
set from the nlme package.   The data came from an experiment by a
cardiologist who was quite knowledgeable regarding various statistical
techniques.  He used these data in a tutorial paper about repeated
measures analysis published in a cardiology journal.  The repeated
measures analysis was somewhat disappointing in that it did not show a
significant effect for treatment.  However a plot of the data, such as
Fig. 3.4 in Pinheiro and Bates (2000), which can be reproduced by


Re: [R] Appropriate measure of correlation with 'zero-inflated' data?

2007-10-29 Thread Douglas Bates
I have created some slides on the Bowl Championship series data for my
class.  Copies of just that part are available in
http://www.stat.wisc.edu/~bates/BCS.zip

Two interesting things about the analysis are that a parallel
coordinate plot shows the differences in agreement quite clearly and
that axes on the biplot from a principal components analysis are
readily interpretable.

It's a great example for those teaching multivariate analysis.



Re: [R] Confidence Intervals for Random Effect BLUP's

2007-11-10 Thread Douglas Bates
On Nov 9, 2007 1:15 PM, Rick Bilonick [EMAIL PROTECTED] wrote:
 On Fri, 2007-11-09 at 18:55 +, Prof Brian Ripley wrote:

  I think Bert's point is important: I picked up a student on it in a case
  study presentation on this week because I could think of three
  interpretations, none strictly confidence intervals.  I think 'tolerance
  interval' is fairly standard for prediction of a random quantity: see
  ?predict.lm.
 

 I think "prediction interval" is what is usually used. Regardless, I'm not
 sure how predict.lm will be of much help because I asked specifically
 about BLUP's for random effects and the last time I checked lm did not
 handle mixed effects models. Neither predict.lme nor predict.lmer
 provides intervals. My simple question is, will this code correctly compute
 a prediction interval for each subject's random effect? In particular, will
 the code handle the bVar slot correctly? Some postings warned about
 inappropriate access to slots. Here is the code that I asked about in my
 original e-mail:

Regarding the terminology, I prefer to call the quantities that are
returned by the ranef extractor the conditional modes of the random
effects. If you want to be precise, these are the conditional modes
(for a linear mixed model they are also the conditional means) of the
random effects B given Y = y, evaluated at the parameter estimates.
One can also evaluate the conditional variance-covariance of B given Y
= y and hence obtain a prediction interval.  These are returned by
ranef when the optional argument postVar is TRUE.  I think the
intervals you want are what are shown in the caterpillar plots.  Try

library(lme4)
qqmath(ranef(fm1 <- lmer(Reaction ~ Days + (Days|Subject),
sleepstudy), postVar = TRUE))
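
A minimal sketch of extracting those interval endpoints numerically
from the "postVar" attribute (the +/- 1.96 multiplier and the
intermediate object names here are illustrative assumptions, not part
of the lme4 API):

library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
re <- ranef(fm1, postVar = TRUE)$Subject   # conditional modes
pv <- attr(re, "postVar")                  # 2 x 2 x 18 conditional var-cov array
se <- sqrt(t(apply(pv, 3, diag)))          # conditional SDs, one row per subject
re - 1.96 * se                             # approximate lower prediction bounds
re + 1.96 * se                             # approximate upper bounds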

BTW, the reason that I say conditional modes, rather than
conditional means, is so the term can apply to generalized linear
mixed models, nonlinear mixed models and generalized nonlinear mixed
models.  The conditional distribution of B given Y is a continuous
multivariate distribution that can be evaluated explicitly for a
linear mixed model but is known only up to a constant for the
generalized forms of the model.  We can still optimize the conditional
density of B given Y = y for particular values of the parameters but
doing so provides the conditional modes, not the conditional means.




 # OrthoFem has all the females from Orthodont from the nlme package


 library(lme4)
 fm1OrthF. <- lmer(distance~age+(age|Subject), data=OrthoFem)

 lmer(distance~age+(age|Subject), data=OrthoFem)@bVar$Subject[2,2,]*
   (attr(VarCorr(lmer(distance~age+(age|
 Subject),data=OrthoFem)),"sc")^2)[1]

 (attr(VarCorr(fm1OrthF.),"sc")^2)[1]

 fm1.s <- coef(fm1OrthF.)$Subject
 fm1.s.var <- fm1OrthF.@bVar$Subject*(attr(VarCorr(fm1OrthF.),"sc")^2)[1]
 fm1.s0.s <- sqrt(fm1.s.var[1,1,])
 fm1.s0.a <- sqrt(fm1.s.var[2,2,])
 fm1.s[,1]+outer(fm1.s0.s, c(-2,0,2))
 fm1.s[,2]+outer(fm1.s0.a, c(-2,0,2))

  fm1.s
 (Intercept)   age
 F1014.48493 0.3758608
 F0917.26499 0.3529804
 F0616.77328 0.3986699
 F0116.95609 0.4041058
 F0518.36188 0.3855955
 F0717.28390 0.5193954
 F0216.05461 0.6336191
 F0819.40204 0.3562135
 F0316.35720 0.6727714
 F0419.02380 0.5258971
 F1119.13726 0.6498911

  fm1.s[,1]+outer(fm1.s0.s, c(-2,0,2))
   [,1] [,2] [,3]
  [1,] 12.21371 14.48493 16.75616
  [2,] 14.99377 17.26499 19.53622
  [3,] 14.50205 16.77328 19.04450
  [4,] 14.68487 16.95609 19.22732
  [5,] 16.09066 18.36188 20.63311
  [6,] 15.01267 17.28390 19.55512
  [7,] 13.78339 16.05461 18.32584
  [8,] 17.13082 19.40204 21.67327
  [9,] 14.08598 16.35720 18.62843
 [10,] 16.75257 19.02380 21.29502
 [11,] 16.86604 19.13726 21.40849

  fm1.s[,2]+outer(fm1.s0.a, c(-2,0,2))
[,1]  [,2]  [,3]
  [1,] 0.1738325 0.3758608 0.5778890
  [2,] 0.1509522 0.3529804 0.5550087
  [3,] 0.1966417 0.3986699 0.6006982
  [4,] 0.2020775 0.4041058 0.6061340
  [5,] 0.1835672 0.3855955 0.5876237
  [6,] 0.3173671 0.5193954 0.7214236
  [7,] 0.4315909 0.6336191 0.8356474
  [8,] 0.1541852 0.3562135 0.5582417
  [9,] 0.4707432 0.6727714 0.8747997
 [10,] 0.3238688 0.5258971 0.7279253
 [11,] 0.4478629 0.6498911 0.8519194


 This web page describes bVar:

 http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/lme4/html/lmer-class.html

 bVar:
 A list of the diagonal inner blocks (upper triangles only) of
 the positive-definite matrices on the diagonal of the inverse of
 ZtZ+Omega. With the appropriate scale factor (and conversion to
 a symmetric matrix) these are the conditional
 variance-covariance matrices of the random effects.

 Rick B.





Re: [R] Using lme (nlme) to find the conditional variance of the random effects

2007-11-13 Thread Douglas Bates
On Nov 13, 2007 1:02 AM, Rick Bilonick [EMAIL PROTECTED] wrote:
 On Tue, 2007-11-13 at 01:03 -0500, Rick Bilonick wrote:

 
  Is there some way to get ranef with postVar=TRUE to show what the
  variances are, or what the lower and upper bounds are? qqmath makes nice
  plots but I need to obtain the numerical values.
 
  Rick B.
 
 I found a way:

 attr(ranef(lmer.13,postVar=TRUE)[[2]],"postVar")

 But I still don't understand why it's not OK to access the bVar slot
 directly.

Because the class definitions can (and do, in the case of lme4)
change.  If you were to install the development version of the lme4
package

install.packages("lme4", repos = "http://R-forge.R-project.org")

you would find that the bVar slot doesn't exist any more.  However,
the ranef extractor with postVar = TRUE still does return the
conditional variances of the random effects.

 Also, the code I originally showed and the results from ranef are very
 similar with a correlation of 0.9983 (it varies very slightly from
 subject to subject):

 
 round(data.frame(a=as.numeric(lmer.13@bVar[[2]]*
   (attr(VarCorr(lmer.13),"sc")^2)[1]),
   b=as.numeric(attr(ranef(lmer.13,postVar=TRUE)[[2]],"postVar"))),10)
            a        b
 1  5.41e-08 5.44e-08
 2  4.77e-08 4.83e-08
 3  6.24e-08 6.25e-08
 4  4.44e-08 4.52e-08
 5  6.50e-08 6.50e-08
 6  2.67e-08 2.92e-08
 7  5.07e-08 5.12e-08
 8  6.43e-08 6.43e-08
 9  3.64e-08 3.79e-08
 10 4.86e-08 4.92e-08
 11 6.33e-08 6.33e-08
 12 3.44e-08 3.60e-08
 13 4.16e-08 4.26e-08
 14 3.69e-08 3.83e-08
 15 5.96e-08 5.97e-08
 16 6.46e-08 6.46e-08
 17 3.28e-08 3.46e-08
 18 4.71e-08 4.77e-08
 19 5.18e-08 5.22e-08
 20 2.81e-08 3.04e-08
 21 3.97e-08 4.09e-08
 22 5.70e-08 5.72e-08
 23 6.06e-08 6.07e-08
 24 3.23e-08 3.42e-08
 25 4.94e-08 4.99e-08
 26 5.35e-08 5.38e-08
 27 3.86e-08 3.98e-08
 28 6.73e-08 6.73e-08
 29 4.68e-08 4.74e-08
 30 6.15e-08 6.16e-08
 31 4.67e-08 4.74e-08
 32 2.04e-08 2.37e-08
 33 3.45e-08 3.61e-08
 34 6.28e-08 6.29e-08
 35 5.53e-08 5.55e-08

 Not sure why they are not exactly the same.


 Rick B.



Re: [R] box plot

2008-01-31 Thread Douglas Bates
2008/1/31 stephen sefick [EMAIL PROTECTED]:
 the data is attached

Thanks.

 I want to boxplot this data

 y <- read.table("afdmmgs.txt", header=T)
 boxplot(y)

At this point you should use

str(y)

to examine the structure of the data.

 str(y)
'data.frame':   4 obs. of  12 variables:
 $ X215: num   30002 223630 316722 125081
 $ sc  : Factor w/ 3 levels 10894.12023,..: 1 2 3 3
 $ X202: Factor w/ 4 levels 25572.94416,..: 1 2 3 4
 $ X198: Factor w/ 3 levels 40899.10425,..: 1 2 3 3
 $ hc  : num9731  13503 459246   9574
 $ X190: num  46442 48592 82827 49852
 $ bc  : Factor w/ 3 levels 150297.1089,..: 2 1 3 3
 $ X185: Factor w/ 4 levels 49759.77869,..: 3 1 2 4
 $ X179: Factor w/ 3 levels 163687.2308,..: 1 2 3 3
 $ X148: num  278803 267594 457433 272564
 $ X119: num  326021 292475 956482 201198
 $ X61 : num   386116  390206 1002466  322183
 y$bc
[1] 26617.8358  150297.1089 na  na
Levels: 150297.1089 26617.8358 na

So what has happened is that your lower case 'na' is not recognized as
a missing value code.  Now go back and try

 y <- read.table("/home/bates/Desktop/afdmmgs.txt", header = TRUE,
                 na.strings = "na")
 str(y)
'data.frame':   4 obs. of  12 variables:
 $ X215: num   30002 223630 316722 125081
 $ sc  : num  10894 17648NANA
 $ X202: num  25573 47692 99062NA
 $ X198: num  40899 79627NANA
 $ hc  : num9731  13503 459246   9574
 $ X190: num  46442 48592 82827 49852
 $ bc  : num   26618 150297 NA NA
 $ X185: num  80735 49760 66975NA
 $ X179: num  163687 318565 NA NA
 $ X148: num  278803 267594 457433 272564
 $ X119: num  326021 292475 956482 201198
 $ X61 : num   386116  390206 1002466  322183

and things will work better.
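
As a minimal illustrative check (assuming the same file and the
na.strings fix above; the las argument is only a cosmetic choice),
boxplot() drops NA values within each numeric column, so once every
column is numeric the plot goes through:

y <- read.table("afdmmgs.txt", header = TRUE, na.strings = "na")
boxplot(y, las = 2)   # one box per column; NAs within a column are ignored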

 Error in oldClass(stats) - cl :
   adding class factor to an invalid object
 I get this error when I try and plot something with na in it

 can boxplot just overlook the na and boxplot the values?

 stephen


 --
 Let's not spend our time and resources thinking about things that are
 so little or so large that all they really do for us is puff us up and
 make us feel like gods.  We are mammals, and have not exhausted the
 annoying little problems of being mammals.

 -K. Mullis



Re: [R] Is it possible with two random effects in lme()?

2008-02-03 Thread Douglas Bates
On Feb 1, 2008 7:30 AM, Falco tinnunculus [EMAIL PROTECTED] wrote:
 Dear all,

 Is it possible with two random effects in lme()?

  lmefit1 <- lme(Handling ~ Mass + factor(Prey) + Mass*factor(Prey), random = ~
 1 |Place+Age)

I assume that Place and Age are not nested.  If so, it's possible to
fit such a model with lme but not easy. It is much better to use the
lmer function in the lme4 package.  The call would be

lmerfit1 <- lmer(Handling ~ Mass + factor(Prey) + Mass * factor(Prey)
+ (1|Place) + (1|Age))

P.S. It is often better to send such questions to the special interest
group mailing list for mixed models, [EMAIL PROTECTED]
 You may get a faster answer from that list, which has lower traffic.

 Here I use Place as random effect, but I also want to add Age as a random
 effect. There could be an effect of Age (continuous variable), but I would
 like to control for it rather than looking at the effect of Age on handling
 time, since Mass and Prey type are of main interest.

 Regards Kes,




Re: [R] Maximum number of variables allowed in a multiple linear regression model

2008-02-06 Thread Douglas Bates
On Feb 6, 2008 11:28 AM, Tony Plate [EMAIL PROTECTED] wrote:
 Bert Gunter wrote:
  I strongly suggest you collaborate with a local statistician. I can think of
  no circumstance where multiple regression on hundreds of thousands of
  variables is anything more than a fancy random number generator.

 That sounds like a challenge!  What is the largest regression problem (in
 terms of numbers of variables) that people have encountered where it made
 sense to do some sort of linear regression (and gave useful results)?
 (Including multilevel and Bayesian techniques.)

I have fit linear and generalized linear models with hundreds of
thousands of coefficients but, of course, with a highly structured
model matrix and using sparse matrix techniques.  What is called the
Rasch model for analysis of item response data (e.g. correct/incorrect
responses by students to the items on a multiple-choice test) is a
generalized linear model with the students and the items as factors.
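
As an illustrative sketch only (the data frame 'resp' and its column
names are hypothetical, and treating students as random effects is one
common variant of the Rasch formulation rather than the pure
fixed-effects GLM described above), using the lmer(..., family=)
syntax seen elsewhere in these threads:

library(lme4)
## one row per student-item pair; binary 'correct', factors 'student' and 'item'
rasch <- lmer(correct ~ 0 + item + (1 | student), data = resp,
              family = binomial)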

However, like Bert I would be very dubious of any attempt to fit a
linear regression model to 3000 variables that were not generated in a
systematic way.  Sounds like a massive, computer-fueled fishing
expedition (a.k.a. data mining).


 However, the original poster did say "hundreds to thousands", which is
 smaller than hundreds of thousands.  When I try a regression problem with
 3,000 coefficients in R running under Windows XP 64 bit with 8Gb of memory
 on the machine and the /3Gb option active (i.e., R can get up to 3Gb), R
 2.6.1 runs out of memory (apparently trying to duplicate the model matrix):

 R version 2.6.1 (2007-11-26)
 Copyright (C) 2007 The R Foundation for Statistical Computing
 ISBN 3-900051-07-0

   m <- 3000
   n <- m * 10
   x <- matrix(rnorm(n*m), ncol=m, nrow=n,
 dimnames=list(paste("C",1:n,sep=""), paste("X",1:m,sep="")))
   dim(x)
 [1] 30000  3000
   k <- sample(m, 10)
   y <- rowSums(x[,k]) + 10 * rnorm(n)
   fit <- lm.fit(y=y, x=x)
 Error: cannot allocate vector of size 686.6 Mb
   object.size(x)/2^20
 [1] 687.7787
   memory.size()
 [1] -2022.552
  
 and the Windows process monitor shows the peak memory usage for Rgui.exe at
 2,137,923K.  But in a 64 bit version of R, I would be surprised if it was
 not possible to run this (given sufficient memory).

 However, R easily handles a slightly smaller problem:
   m <- 1000 # of variables
   n <- m * 10 # of rows
   k <- sample(m, 10)
   x <- matrix(rnorm(n*m), ncol=m, nrow=n,
 dimnames=list(paste("C",1:n,sep=""), paste("X",1:m,sep="")))
   y <- rowSums(x[,k]) + 10 * rnorm(n)
   fit <- lm.fit(y=y, x=x)
   # distribution of coefs that should be one vs zero
   round(rbind(one=quantile(fit$coefficients[k]),
 zero=quantile(fit$coefficients[-k])), digits=2)
  0%   25%   50%  75% 100%
 one   0.94  0.98  1.04 1.10 1.18
 zero -0.30 -0.08 -0.01 0.06 0.29
  

 To echo Bert Gunter's cautions, one must be careful doing ordinary linear
 regression with large numbers of coefficients.  It does seem a little
 unlikely that there is sufficient data to get useful estimates of three
 thousand coefficients using linear regression in data managed in Excel
 (though I guess it could be possible using Excel 12.0, which can handle up
 to 1 million rows - recent versions prior to 2008 could handle only 64K rows
 - see http://en.wikipedia.org/wiki/Microsoft_Excel#Versions ).  So, the
 suggestion to consult a local statistician is good advice - there may be
 other more suitable approaches, and if some form of linear regression is an
 appropriate approach, there are things to do to gain confidence that the
 results of the linear regression convey useful information.

 -- Tony Plate


 
  -- Bert Gunter
  Genentech Nonclinical Statistics
 
  -Original Message-
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
  Behalf Of Michelle Chu
  Sent: Tuesday, February 05, 2008 9:00 AM
  To: R-help@r-project.org
  Subject: [R] Maximum number of variables allowed in a multiple
  linearregression model
 
  Hi,
 
  I appreciate it if someone can confirm the maximum number of variables
  allowed in a multiple linear regression model.  Currently, I am looking for
  a software with the capacity of handling approximately 3,000 variables.  I
  am using Excel to process the results.  Any information for processing a
   matrix from Excel with hundreds to thousands of variables will be helpful.
 
  Best Regards,
  Michelle
 
 

Re: [R] ANOVA and lmer

2008-02-08 Thread Douglas Bates
On Feb 7, 2008 3:43 PM, Eric Imbert [EMAIL PROTECTED] wrote:

 I am analyzing data from a very simple experiment.
 I have measured plants of two different colours (yellow and purple) in 9
 different populations.

 So, I have two different factors : a fixed effect (Colour with two
 levels) and a random one (Population with 9 levels).

 I first analyzed the data with the aov function
 LargS is the variable

 aov(formula = LargS ~ Col + Error(Col/Pop))

 Terms:
 Col
 Sum of Squares  3.440351
 Deg. of Freedom1

 Estimated effects are balanced

 Stratum 2: Col:Pop

 Terms:
 Residuals
 Sum  of Squares   3017.112
 Deg. of Freedom16

 Residual standard error: 13.73206

 Stratum 3: Within

 Terms:
 Residuals
 Sum of Squares   3347.385
 Deg. of Freedom   302

 To test for the interaction Col*Pop, I used the following F-ratio =
 (3017/16)/(3347/302) = 188.6/11.1 = 17. Highly significant!

 Now, let's go to the analysis performed by lmer - First I do the linear
 model without the Col*Pop interaction :
 m3=lmer(LargS ~ Col + (1 | Pop))

 And next with the interaction : m2=lmer(LargS ~ Col + (Col | Pop))

I don't think this model provides a test that is directly comparable
with the one you describe above.

There is a model that is intermediate to your m2 and m3 models in
terms of the number of parameters.  (Actually when I see these results
I realize that there is a bug in anova for lmer models in that the
count of the parameters 1 less than it should be.  I forgot to include
the residual variance as a parameter to be estimated.  However, this
will not affect comparisons here because it is only the differences in
the counts that matter.)

Your model m2 is shown as having 5 parameters, which should be 6 (2
fixed effects, 3 variance components for the random effects, 1
residual variance estimate), while m3 is shown as having 3 parameters
(should be 4: 2 f.e. + 1 r.e. var + 1 resid var).  The model

LargS ~ Col + (1|Pop) + (1|Col:Pop)

which allows for random effects for each population and for each
Col:Pop combination, has 5 parameters (currently shown as 4).  The
random effects for Pop are independent of each other and with a
constant variance while the random effects for the Col:Pop
combinations are independent of each other and of the Pop random
effects and with another constant variance.  Thus there are 2 f.e. + 2
r.e. var + 1 resid. var.
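
A hedged sketch of fitting that intermediate model alongside m2 and m3
for comparison (the data frame name 'dat' is an assumption; the
formulas are the ones discussed above):

library(lme4)
m3 <- lmer(LargS ~ Col + (1 | Pop), data = dat)
m4 <- lmer(LargS ~ Col + (1 | Pop) + (1 | Col:Pop), data = dat)  # intermediate
m2 <- lmer(LargS ~ Col + (Col | Pop), data = dat)
anova(m3, m4, m2)  # likelihood-ratio comparisons of the nested fits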

I think it is important to plot the data first and decide which of
these models is indicated by the data.  It sounds like the structure
of your experiment is similar to the structure of the barley data
set in the lattice package.  Deepayan Sarkar's forthcoming book
Lattice: Multivariate Data Visualization with R (to be published by
Springer and due to ship next month) describes very effective ways of
plotting such data.  You can get a preview at his web site for the
book http://lmdvr.r-forge.r-project.org

 Comparing both models : anova(m2,m3) :

     Df     AIC     BIC  logLik  Chisq Chi Df Pr(>Chisq)
 m3   3 1710.67 1721.97 -852.33
 m2   5 1714.59 1733.43 -852.30 0.0746      2     0.9634

 = Conclusion : the interaction Col*Pop is not significant !

 I guess I am missing something.
 Who can help ?

 Eric


 [[alternative HTML version deleted]]



Re: [R] [OT] good reference for mixed models and EM algorithm

2008-02-11 Thread Douglas Bates
On Feb 10, 2008 2:32 PM, Spencer Graves [EMAIL PROTECTED] wrote:
 Hi, Erin:

   Have you looked at Pinheiro and Bates (2000) Mixed-Effects Models
 in S and S-Plus (Springer)?

   As far as I know, Doug Bates has been the leading innovator in
 this area for the past 20 years.  Pinheiro was one of his graduate
 students.  The 'nlme' package was developed  by him or under his
 supervision, and 'lme4' is his current development platform.  The
 ~R\library\scripts subdirectory contains ch01.R, ch02.R, etc. =
 script files to work the examples in the book (where ~R = your R
 installation directory).  There are other good books, but I recommend
 you start with Pinheiro and Bates.

Except that Doug Bates doesn't use the EM algorithm for fitting mixed
models any more.  The lme4 package previously had an option for
starting with EM (actually ECME, which is a variant of EM) iterations
but I have since removed it.  For large data sets and especially for
models with non-nested random effects, the EM iterations just slowed
things down relative to direct optimization of the log-likelihood.


   Spencer Graves

 Erin Hodgess wrote:
  Dear R People:
 
  Sorry for the off-topic.  Could someone recommend a good reference for
  using the EM algorithm on mixed models, please?
 
  I've been looking and there are so many of them.  Perhaps someone here
  can narrow things down a bit.
 
  Thanks in advance,
  Sincerely,
  Erin
 




Re: [R] Newbie HLM with lme4 questions

2008-02-13 Thread Douglas Bates
On Feb 13, 2008 9:53 AM, Ista Zahn [EMAIL PROTECTED] wrote:
 Dear R listers,
 I know I'm breaking the rules by asking a homework related question--
 I hope you'll forgive me. I am a social psychology graduate student,
 and the only one in my department who uses R. I successfully completed
 my multiple regression and structural equation modeling courses using
 R (John Fox's car and sem packages were a big help, as was his book).
 Now I am taking a hierarchical linear modeling course, and am hoping
 that I can use R for this also.

I think that learning to use R for a course that is based on other
software doesn't fall within the "no homework questions" rule.

 I've searched the list archives, and
 consulted the one-line version of Pinheiro and Bates (available
 through my library), but I'm having a great deal of difficulty
 translating what I'm learning in class into lmer syntax.

Wow! A one-line version! I'd like to read that.  Must be a really long line.

 Specifically,
 the instructor is using HLM 6.0. In this program, one specifies the
 level one and level two models explicitly, and I'm having trouble
 understanding what the equivalent of this is in lmer. Most of the
 examples we cover in class are change models, i.e., we are working with
 longitudinal data.

 Specific questions:

 in HLM 6.0, we build the following model;

 Y = P0 + P1*(CONFLICT) + P2*(TIMEYRS) + E
 Level-2 Model
 P0 = B00 + B01*(H0MCITOT) + R0
 P1 = B10 + B11*(H0MCITOT) + R1
 P2 = B20 + B21*(H0MCITOT) + R2

What determines the level 2 groups?  I don't see it in the
specification although I will admit that I have difficulty reading
those specifications.  To me they confound fixed effects and random
effects and end up expressing fixed-effects interactions in obscure
and confusing ways.

The way I have to understand this is to substitute the P0, P1 and P2
into the first equation, expand the terms then collect the fixed
effects and the random effects.  That would give me

Y = B00 + B01*(H0MCITOT) + B10*(CONFLICT) + B11*(CONFLICT)*(H0MCITOT)
+ B20*(TIMEYRS) + B21*(TIMEYRS)*(H0MCITOT) + R0 + R1*(CONFLICT) +
R2*(TIMEYRS) + E

To me the specification is not complete because I don't know what
groups the random effects R0, R1 and R2 are associated with.  Also,
are R0, R1 and R2 allowed to be correlated within groups or are they
required to be independent?

Let's say that GRP is the variable that defines the groups that share
a common level of the random effects.  Then the uncorrelated random
effects model is written

Y ~ CONFLICT*H0MCITOT + TIMEYRS*H0MCITOT + (1|GRP) + (CONFLICT-1|GRP)
+ (TIMEYRS-1|GRP)

and the model with correlated random effects is

Y ~ CONFLICT*H0MCITOT + TIMEYRS*H0MCITOT + (1+CONFLICT+TIMEYRS|GRP)
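
A sketch wrapping those two formulas in complete calls (the data frame
name 'dat' and the grouping factor GRP are assumptions carried over
from the discussion above):

library(lme4)
## uncorrelated random effects
f1 <- lmer(Y ~ CONFLICT*H0MCITOT + TIMEYRS*H0MCITOT +
             (1|GRP) + (CONFLICT - 1|GRP) + (TIMEYRS - 1|GRP), data = dat)
## correlated random effects
f2 <- lmer(Y ~ CONFLICT*H0MCITOT + TIMEYRS*H0MCITOT +
             (1 + CONFLICT + TIMEYRS|GRP), data = dat)
anova(f1, f2)  # compare the two covariance structures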

 Can someone explain to me how to represent this in lmer syntax? I've
 tried e.g.,

 lmer(MAT ~ 1 + CONFLICT + TIMEYRS + (1 + CONFLICT +
 + TIMEYRS | H0MCITOT))

 But I don't get the same result.

 More generally: Should I be using the lme4 package, the nlme package,
 or something else entirely? Is there any documentation for the lme4
 package that is geared more towards novice users?

To me the answer to the first question is to use lme4 except that the
answer to the second question is that there isn't documentation geared
to novice users.  To a large degree that is my fault because I keep
changing the package so that previously written introductory
documentation is no longer applicable.

The lmer function should be faster and more reliable than lme.  It
allows for fitting generalized linear mixed models and for fitting
models with crossed random effects (as in having random effects for
subject and for item) or partially crossed random effects (e.g.
longitudinal data indexed by students and teachers within schools
where students get exposed to different teachers and may move between
schools).  It can handle very large data sets - I showed an example of
the analysis of 1.6 million grade point scores for 55,000 students,
8,000 instructors and 100 departments in a message to the MULTILEVEL
listserv.
 Sorry for the long post, and thanks in advance for any help you can
 offer.

 --Ista



Re: [R] Cholmod error `matrix not positive definite'

2008-02-14 Thread Douglas Bates
Could you tell us which version of the lme4 package you are using?
You can just send the output produced by

sessionInfo()

If you can make your data available so we can test it then please do
so.  If the data set is large you could send it to me in private email
and I will make it available on a web site.

I think that the development version of the lme4 package, available via

install.packages("lme4", repos = "http://r-forge.r-project.org")

should be resistant to that type of error but I am willing to be shown
otherwise.

On Thu, Feb 14, 2008 at 10:36 AM, Martijn Vandegehuchte
[EMAIL PROTECTED] wrote:
 Dear R-users,

  I'm new to R, so my apologies if this question doesn't make sense.

  I've tried the following model in lmer, and it works perfectly:
  model <- lmer(aphids~densroot+zone+(1|zone/site), family=quasipoisson)

  But if I try the exact same model with a different variable, totmas, the 
 model looks as follows:
  model <- lmer(aphids~totmas+zone+(1|zone/site), family=quasipoisson)

  Totmas is also a continuous variable just like densroot, but in this case I 
 receive the following message:

  CHOLMOD warning: ߆e
  Error in objective(.par, ...) :
   Cholmod error `matrix not positive definite' at 
 file:../Supernodal/t_cholmod_super_numeric.c, line 613

  Moreover, if I test yet another continuous variable vitality, to my 
 surprise R just crashes completely.


  This is a mystery to me, especially because the variables totmas or vitality 
 don't give any problem when I build the exact same models in SAS with proc 
 glimmix...

  Does someone have experience with this type of problem?

  Thank you in advance,

  Martijn.




  --
  Martijn Vandegehuchte
  Ghent University
  Department Biology
  Terrestrial Ecology Unit
  K.L.Ledeganckstraat 35
  B-9000 Ghent
  telephone: +32 (0)9/264 50 84
  e-mail: [EMAIL PROTECTED]

  website TEREC: www.ecology.ugent.be/terec





Re: [R] LMER

2008-02-15 Thread Douglas Bates
Could you send us the output of sessionInfo() please so we can see
which version of the lme4 package you are using?  In recent versions,
especially the development version available as

install.packages("lme4", repos = "http://r-forge.r-project.org")

the PQL algorithm is no longer used.  The Laplace approximation is now
the default.  The adaptive Gauss-Hermite quadrature (AGQ)
approximation may be offered in the future.

If the documentation indicates that PQL is the default then that is a
documentation error.  With the currently available implementation of
the direct optimization of the Laplace approximation to the
log-likelihood for the model there is no purpose in offering PQL.

On Thu, Feb 14, 2008 at 6:50 PM, Daniel Malter [EMAIL PROTECTED] wrote:
 Hi,

  I run the following models:

  1a. lmer(Y~X+(1|Subject),family=binomial(link="logit")) and
  1b. lmer(Y~X+(1|Subject),family=binomial(link="logit"),method="PQL")

  Why does 1b produce results different from 1a? The reason why I am asking is
  that the help states that PQL is the default for GLMMs

  and

  2. gamm(Y~X,family=binomial(link="logit"),random=list(Subject=~1))

  The interesting thing about the example below is that gamm is also supposed
  to fit by PQL. Interestingly, however, the GAMM fit yields roughly the
  coefficient estimates of 1b but the significance values of 1a. Any insight
  would be greatly appreciated.


  library(lme4)
  library(mgcv)

  Y=c(0,1,1,1,1,0,0,0,0,0,1,1,1,1,0,0,0,1,1,1,1)
  X=c(1,2,3,4,3,1,0,0,2,3,3,2,4,3,2,1,1,3,4,2,3)
  Subject=as.factor(c(1,2,3,4,5,6,7,1,2,3,4,5,6,7,1,2,3,4,5,6,7))
  cbind(Y,X,Subject)

  r1=lmer(Y~X+(1|Subject),family=binomial(link="logit"))
  summary(r1)

  r2=lmer(Y~X+(1|Subject),family=binomial(link="logit"),method="PQL")
  summary(r2)

  r3=gamm(Y~X,family=binomial(link="logit"),random=list(Subject=~1))
  summary(r3$gam)



  -
  cuncta stricte discussurus



Re: [R] History of R

2008-02-15 Thread Douglas Bates
On Fri, Feb 15, 2008 at 1:53 PM, Kathy Gerber [EMAIL PROTECTED] wrote:
 Earlier today I sent a question to Frank Harrell as an R developer with
  whom I am most familiar.  He suggested also that I put my questions to
  the list for additional responses.  Next month I'll be giving a talk on
  R as an example of high quality open source software.  I think there is
  much to learn from R as a high quality extensible product that (at least
  as far as I can tell) has never been spun or hyped like so many open
  source fads.

  The question that intrigues me the most is why R as an open source
  project is so incredibly successful while other projects, say for example,
  Octave, don't enjoy that level of success?

First and foremost there is the incredible generosity of Ross Ihaka
and Robert Gentleman who, after spending an enormous amount of time
and effort in development of the initial implementation, did not
demand exclusive ownership of their work but allowed others to make
changes.  I believe Martin Maechler was the first non-Auckland person
to get write access to the source code repository and I'm sure that
the good experience of working at a distance with Martin persuaded R 
R to open it up to others.  Martin is polite, considerate, meticulous
and precise (he is a German-speaking Swiss so meticulous and precise
kind of comes with the territory) and you couldn't ask for a first
experience in sharing something that is very valuable to you with
someone whom you may never have met in person.

Not everyone has been that pleasant to work with.  One of the first
things that I did when I joined R-core was to blow up at Kurt and
Fritz about something - on Christmas Eve!  I'm surprised the group
didn't boot me out after that start.

When a project is gaining momentum the personalities of the initial
developers have a big influence on its success.  The R project has
been fortunate in that regard.

  I have some ideas of course, but I would really like to know your
  thoughts when you look at R from such a vantage point.

  Thanks.
  Kathy Gerber
  University of Virginia
  ITC - Research Computing Support

  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error 'singular gradient' in nonlinear model fitting

2008-02-16 Thread Douglas Bates
On Feb 15, 2008 11:50 AM, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 Try regressing log(WEIGHT) against log(TOTAL) using lm
 and then transform the results into starting values for nls (or
 possibly that is sufficient and you don't need the nls results).

Also, set the trace argument to TRUE in your call to nls.  This will
cause nls to print the progress of the iterations including the
residual sum of squares and the parameter values.  What usually
happens with a singular gradient is that one of the parameters,
probably beta, is heading to an extreme value.

This particular model can be reduced to a one-parameter optimization
problem by using the optional plinear algorithm.

nls(WEIGHT ~ (TOTAL^beta)/454, start = c(beta = 3), data =
spottedseatrout2004.female.data, algorithm = "plinear")
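
A minimal sketch of Gabor's back-transformation idea, assuming the
variable and data-frame names from the original post (since
WEIGHT = alpha*TOTAL^beta/454 implies
log(WEIGHT) = log(alpha/454) + beta*log(TOTAL)):

lmfit <- lm(log(WEIGHT) ~ log(TOTAL), data = spottedseatrout2004.female.data)
start <- list(alpha = 454 * exp(coef(lmfit)[[1]]), beta = coef(lmfit)[[2]])
nls(WEIGHT ~ (alpha * TOTAL^beta)/454, start = start,
    data = spottedseatrout2004.female.data, trace = TRUE)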

 On Fri, Feb 15, 2008 at 12:41 PM, HongSheng Liao [EMAIL PROTECTED] wrote:
 
  w.age.female.2004 <- nls(WEIGHT ~ (alpha*TOTAL^beta)/454,
 start=list(alpha=1, beta=3),
 data=spottedseatrout2004.female.data)
 
  I am trying to fit the above model to length-weight data of a fish species
  (spotted seatrout) by year (1999-2006).  The convergence occurred for all
  the years except 2002 and 2004.  In these two year, R shows the error
  called 'singular gradient' in the model.  Could anyone please help me to
  fix the error?  I don't think there are any problems with my data because I
  can fit the same model to 2004 data without any problems using SAS.  Thank
  you very much in advance.
 
  Hongsheng (Hank) Liao, Ph.D.
  Lab Manager
  Center for Quantitative Fisheries Ecology
  800 West 46th Street
  Old Dominion University
  Norfolk, Virginia 23508
  Phone:757.683.4571
  Fax:757.683.5293
 

