[R] test deviation from a binomial distribution - lack of 50:50
Dear R-users, I have a data set where each observation consists of a number of trials (n.trials) that varies between 5 and 7, 6 being most common. Each trial can take either of two outcomes, success or failure. A dummy data set: n.trials - sample(5:7, 50, replace=T, prob=c(0.2, 0.6, 0.2)) success - rbinom(50, n.trials, p=0.5) failure - n.trials - success I know I could test for a deviation from 50:50 success:failure in one or the other direction using a glm with binomial errors. However, I suspect that in my 'real' data set the outcome 50:50 is underrepresented, not due to a skew in one particular direction, but rather that within each observation there are either many successes or many failures. Although I did not manage to create a dummy data set with these properties, which would be the proper way in R to test for a 'lack of 50:50 outcome' using the simple dummy data above as a starting point? Thanks in advance! Henrik -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] mcmcsamp-CI instead of P-values - references?
Dear all, Given the discussions and issues of d.f., p-values and mcmcsamp-CIs in mixed models, I wonder if anyone could recommend one or two papers (or other citable sources for that sake) that summarizes the arguments for/against P-values/mcCIs. I have followed the discussions on R-help and have learnt a lot (in my naive non-statistician way). With all respect to all great contributions, I may need published articles (if there are any), that could be suitable to use in the Methods-section of a manuscript, to motivate a no-p-values-but-some-nice-mcmcsampCIs-instead-approach! Thanks in advance! Henrik -- Henrik Pärn Department of Biology NTNU 7491 Trondheim Norway +47 735 96282 (office) +47 909 89 255 (mobile) +47 735 96100 (fax) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] equivalent of model.tables for an lm.object?
Dear all, I run a linear model with three significant explanatory variabels x1: a factor with 4 levels x2 and x3: factors with two levels each x4: continuous model - lm(y ~ x1 + x2 * x3 + x4) The data is not perfectly balanced between the different factor-combinations and I use treatment contrasts. With an aov.object, I assume I could have used model.tables(aov.object, type = means, se = TRUE), to get the means and se for all factor combinations. In an lm.object like mine, I calculate the means 'manually' from the Estimates (for sure it could be done with a script, but fair enough). For the standard error of the means, I started out using formulas of a variance of a sum of two variables, but I messed things up with the interaction. Is there a way to calculate the standard error of the means from Estimates and Std.Error (or other information) from the lm.object? Thanks in advance for any advice! Best regards, Henrik -- Henrik Pärn Department of Biology NTNU 7491 Trondheim Norway +47 735 96282 (office) +47 909 89 255 (mobile) +47 735 96100 (fax) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] lmer applied to a wellknown (?) example: summary
Dear all, I wish to thank Christoph Buser and John Wilkinson for their input, and especially John his examples and for for pointing me to the thread 'Doubt about nested aov output' where the rat-example was hiding...: http://search.gmane.org/?query=Doubt+about+nested+aov+outputemail=group=gmane.comp.lang.r.generalsort=revdateDEFAULTOP=andxP=doubt.nested.aov.output.xFILTERS=Gcomp.lang.r.general---A In this great thread you find a description not only on using aov on this nested rat-data, but also how to handle it with lmer, e.g. model specification and coding of grouping levels. Best regards, Henrik -- Henrik Pärn Department of Biology NTNU 7491 Trondheim Norway +47 735 96282 (office) +47 909 89 255 (mobile) +47 735 96100 (fax) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] lmer applied to a wellknown (?) example
Dear all, During my pre-R era I tried (yes, tried) to understand mixed models by working through the 'rat example' in Sokal and Rohlfs Biometry (2000) 3ed p 288-292. The same example was later used by Crawley (2002) in his Statistical Computing p 363-373 and I have seen the same data being used elsewhere in the litterature. Because this example is so thoroughly described, I thought it would be very interesting to analyse it also using lmer and to see how the different approaches and outputs differs - from the more or less manual old-school (?) approach in Sokal, aov in Crawley, and to mixed models by lmer. In the example, three treatments (Treatment) with two rats (Rat) each (i.e six unique rats in total). Three liver preparations (Liver) are taken from each rat (i.e 18 unique liver preparations), and two glycogen readings (Glycogen) are taken from each liver preparation (36 readings). We want to test if treatments has affected the glycogen levels. The readings are nested in preparation and the preparations nested in rats. The data can be found here (or on p. 289 in Sokal): http://www.bio.ic.ac.uk/research/mjcraw/statcomp/data/rats.txt // I was hoping to use the rat example as some kind of reference on my way to understand mixed models and using lmer. However, first I wish someone could check my suggested models! My suggestions: attach(rats) rats$Glycogen - as.numeric(Glycogen) rats$Treatment - as.factor(Treatment) rats$Rat - as.factor(Rat) rats$Liver - as.factor(Liver) str(rats) model1 - lmer(Glycogen ~ Treatment + (1|Liver) + (1|Rat), data=rats) summary(model1) Was that it? I also tried to make the 'liver-in-rat' nesting explicit (as suggested in 'Examples from...') model2 - lmer(Glycogen ~ Treatment + (1|Rat:Liver) + (1|Rat), data=rats) summary(model2) but then the random effects differs from model1. Does the non-unique coding of rats and preparations in the original data set mess things up? Do I need to recode the ids to unique levels... rats$rat2 - as.factor(rep(1:6, each=6)) rats$liver2 - as.factor(rep(1:18, each=2)) str(rats) ...and then: model3 - lmer(Glycogen ~ Treatment + (1|liver2) + (1|rat2), data=rats) # or maybe model3 - lmer(Glycogen ~ Treatment + (1|rat2:liver2) + (1|rat2), data=rats) Can anyone help me to get this right! Thanks in advance! P.S. Thanks to all contributors to lme/lmer topic on the list (yes, I have searched for 'lmer nested'...) and also the examples provided by John Fox' 'Linear mixed models, Appendix to An R and S-PLUS companion...' and Douglas Bates' 'Examples from Multilevel Software...' and R-news 5/1. Very helpful, but as usually I bet I missed something...Sorry. Regards, Henrik -- Henrik Pärn Department of Biology NTNU 7491 Trondheim Norway +47 735 96282 (office) +47 909 89 255 (mobile) +47 735 96100 (fax) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] lmList and missing values
Dear all, I have a question on handling of missing values in lmList. My data set have continuous predictor and response, x and y, and a grouping variable group.id. All these variables have NAs and the data set also has several other variables that also contains NAs. To create the lmList-object seems to work fine: y.list - lmList(y ~ x | group.id, data=mydata, na.action=na.omit) However, when I try to apply functions on the object such as coef or intervals it fails: coef(y.list) Error in !unlist(lapply(coefs, is.null)) : invalid argument type If I beforehand select only the variables (still including missing values) used as arguments in lmList... mydata2 - mydata[ , c(x, y, group)] ...and use mydata2 in lmList as above, coef and intervals works fine. In order to provide a reproducible example, I made a small test data set. This data set contained the same pattern of NAs as in my real data set, i.e. NAs in both used and unused variables. The example turned out to be not very illustrative though, because strange enough 'na.action=na.omit' works just fine on the small test data set so I don't bother to include it here... I have not encountered any problems to apply other functions, such as lm or lme, to my data set. Any idea what causes the error? A related problem seems to have occured before, although it is not clear if different na.action options was tried in this case: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/14293.html Thanks in advance for any help! Best regards, Henrik -- Henrik Pärn Department of Biology NTNU 7491 Trondheim Norway +47 735 96282 (office) +47 909 89 255 (mobile) +47 735 96100 (fax) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] fixed effects following lmer and mcmcsamp - which to present?
Dear all, I am running a mixed model using lmer. In order to obtain CI of individual coefficients I use mcmcsamp. However, I need advice which values that are most appropriate to present in result section of a paper. I have not used mixed models and lmer so much before so my question is probably very naive. However, to avoid to much problems with journal editors and referees addicted to p-values, I would appreciate advice of which values of the output for the fixed factor that would be most appropriate to present in a result section, in order to convince them of the p-value free 'lmer-mcmcsamp'-approach! Using the example from the help page on lmer: fm1 - lmer(Reaction ~ Days + (Days|Subject), sleepstudy) ...I obtain the following for 'Days': summary(fm1) ... Estimate Std. Error t value Days 10.4673 1.54586.771 ...and from mcmcsamp: summary(mcmcsamp(fm1 , n = 1)) 1. Empirical mean and standard deviation for each variable, plus standard error of the mean: Mean SD Naive SE Time-series SE Days 10.4695 1.7354 0.017354 0.015921 2. Quantiles for each variable: 2.5% 25% 50% 75%97.5% Days 7.02279.3395 10.4712 11.5719 13.957 The standard way of presenting coefficients following a 'non-lmer' output is often (beta=..., SE=..., statistic=..., P=...). What would be the best equivalent in a 'lmer-mcmcsamp-context'? (beta=..., CI=...) is a good start I believe. But which beta? And what else? I assume that the a 95% CI in this case would be 7.0227-13.957 (please, do correct me I have completely misunderstood!). But which would be the corresponding beta? 10.4673?, 10.4695? 10.4712? Is the t-value worth presenting or is it 'useless' without corresponding degrees of freedom and P-value? If I present the mcmcsamp-CI, does it make sense to present any of the three SE obtained in the output above? BTW, I have no idea what Naive SE, Time-series SE means. Could not find much in help and pdfs to coda or Matrix, or in Google. Thanks in advance for any advice and hints to help-texts I have missed! Best regards, Henrik __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] paired t-test. Need to rearrange data?
Dear all, I have received some data arranged like this: # original data id - rep(letters[1:6], each=2) time - as.factor(rep(1:2, 6)) y - as.vector(replicate(6, c(rnorm(n=1, mean=1), rnorm(n=1, mean=2 test.data - data.frame(id, time, y) test.data I would like to perform a paired t-test of the y-values at time=1 against those at time=2, with samples paired by their id. Is it necessary to arrange the data in a format like this: # rearranged data id - letters[1:6] y1 - replicate(6, rnorm(n=1, mean=1)) # y-value at time = 1 y2 - replicate(6, rnorm(n=1, mean=2)) # y-value at time = 2 test.data2 - data.frame(id, y1, y2) test.data2 ...in order to perform a paired t-test? t.test(y1, y2, paired=T) If yes, which is the most convenient way to rearrange the data? Or is it possible to apply the paired t-test function to the original data set? And a side question: In my examples, I suppose can I use set.seed to reproduce the 'rnorm-values' created in the 'original data' also in my the 'rearranged data'. Can someone give me a hint of how to apply the same 'seed' to all the rnorms? Thanks a lot in advance! Henrik __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] efficient way to make NAs of empty cells in a factor (or character)
Dear all, I have some csv-files (originating from Excel-files) containing empty cells. In my example file I have four variables of different classes, each with some empty cells in the original csv-file: test - read.csv2(test.csv, dec=.) test id id2 x y 1 a 1 NA 2 b e NA 2.2 3 f 3 3.3 4 c g 4 4.4 class(test$id) [1] factor class(test$id2) [1] factor class(test$x) [1] integer class(test$y) [1] numeric In the help text of read.csv2 you can read 'Blank fields are also considered to be missing values in logical, integer, numeric and complex fields.'. Thus, empty cells in a factor (or a character I assume) is not considered as missing values but an own level: is.na(test$id) [1] FALSE FALSE FALSE FALSE levels(test$id) [1] a b c When I work with my real (larger) dataset I would like to use functions like 'is.na' and '!is.na' on factors. Now I wonder if there is an R alternativ to do 'search (for empty cells) and replace (with NA)' in Excel? I have tried a modification of Uwe Ligges suggestion on missing value posted 2 Aug: is.na(test[test==]) - TRUE ...but it did not work on the data set: Error in [-.data.frame(`*tmp*`, test == , value = c(NA, NA, NA, NA : rhs is the wrong length for indexing by a logical matrix However it worked fine when applied to a single vector: is.na(test$id[test$id==]) - TRUE test$id [1] abNA c Levels: a b c is.na(test$id) [1] FALSE FALSE TRUE FALSE Is there a more efficient way to fill empty cells in all my factors in R or should I just do it in advance in Excel by 'search and replace'? Thanks in advance! -- Henrik Pärn Department of Biology NTNU 7491 Trondheim Norway +47 735 96282 (office) +47 909 89 255 (mobile) +47 735 96100 (fax) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] break axis using plotrix
Dear all, I am trying to plot some data with differing range in y-values with type=b, adding error bars and break the y-axis into two parts, one lower part from 12 to 20, and one upper part from 34 to 40. I have tried to follow the basic ideas from the script provided here by Jim Lemon: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/56487.html My attempt looks like this: ### # x-values x - 1:4 # small y-values with corresponding standard errors meansarr - c(14.9, 18.2, 14.5, 18.3) searr - c(0.47, 1.27, 1.22, 0.49) # large values meanslay - c(36.4, 39.0, 35.3, 38.6) selay - c(0.51, 0.34, 0.57, 0.40) library(plotrix) # plot small values plot(x, meansarr, ylim=c(12, 30), axes=F, type=b, xlab=, ylab=Day) arrows(x, meansarr-searr, x, meansarr+searr, code = 3, angle = 90, length = 0.03) box() # x-axis axis(1, tck=0.01, las=1, at=1:4, labels=c(1998, 1999, 2002, 2003), mgp=c(3, 0.5, 0)) # y-axis axis(2,at=c(12, 14, 16, 18, 20, 24, 26, 28, 30),labels=c(12,14,16,18,20, 34,36,38,40)) # break axis axis.break(2, 22, style=zigzag) # add large values to same plot par(new=TRUE) plot(x, meanslay, ylim=c(30, 40), type=b, xlab=, ylab=Day, axes=F) arrows(x, meanslay-selay, x, meanslay+selay, code = 3, angle = 90, length = 0.03) As you can see, I have problems adding the larger y-values - they end up in the wrong place in the graph. I suppose Jim's warning 'just be careful that the ylim= and labels= arguments match up' is relevant here, but I don't manage to fix it... Can anyone help me to plot the large y-values on the right placeaccording to the y-axis labeling? I am using R 2.3.1 and WinXp. Thanks a lot in advance! Henrik -- Henrik Pärn Department of Biology NTNU 7491 Trondheim Norway +47 735 96282 (office) +47 909 89 255 (mobile) +47 735 96100 (fax) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] more documentation on lmer?
Dear Bill, You might check Faraway's 'Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models' (2006). Without having read the book properly, at least I noticed that package lme4 and the function lmer is used in the examples. Best regards, Henrik Hello. Is there any documentation on the lmer function in the lme4 package beyond what was published in the May 2005 R News (vol.5/1)? As well, has the nonlinear version of lmer appeared yet? Bill Shipley -- Henrik Pärn Department of Biology NTNU 7491 Trondheim Norway +47 735 96282 (office) +47 909 89 255 (mobile) +47 735 96100 (fax) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] dist() between groups of data
Dear all, I have a data set with x and y positions of nests. Each nest can be grouped according to a factor z. I have made a small dummy data set, where the grouping variable z is 0 or 1: x - runif(6) y - runif(6) z - rep(c(0,1), each=3) xyz - cbind(x,y,z) xyz x y z [1,] 0.7176801 0.760944688 0 [2,] 0.2377928 0.080622524 0 [3,] 0.9450770 0.022039470 0 [4,] 0.8725492 0.971996677 1 [5,] 0.6356198 0.001569859 1 [6,] 0.8949670 0.066044377 1 First of all, I suppose that this is the correct way to calculate the Euclidean distance between /all/ nests: xy - cbind(x,y) d.all - dist(xy) However, I would like to obtain a distance matrix for distances between nests belonging to certain groups. For example calculate all distances from nests where z equals something, to nests where z equals something else. Is it best to split the data in two matrices according to the value of z in the two groups to be compared, and then try to follow the suggestions by Gabor Grothendieck* Date:* Wed 07 Jan 2004 - 15:16:37 EST , or are there other ways that can be applied straight to one matrix? Thanks in advance! Henrik -- Henrik Pärn Department of Biology NTNU 7491 Trondheim Norway +47 735 96282 (office) +47 909 89 255 (mobile) +47 735 96100 (fax) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] SciViews-R_0.8-9 Console problem
Dear R users, I successfully installed SciViews the other day. However, when I try to run it now, the command/script window does not appear. Strange. Well, actually I see a tendency to a script window (in the lower part of the sciview window where it is suppose to be) during start up, but when the program is entirely open, the script window is gone. I have tried to uninstall and reinstall latest versions available of R, Sciviews and Tinn-R several times, but it does not solve the problem. I have tried both the 'Detailed installation of Sciviews-R' and 'Manually installing additional R packages'. Another 'new' problem I did not have before is the default size of the Sciviews window - both the upper and lower part 'disappears' outside my screen when starting up. And in the upper 'R-window' two prompts appear with maybe 6-7 lines in between. However, as soon as I hit a key the lower of these to prompt disappears. But still no script editor. I suspect I have missed something fundamental...but what? Any suggestions that can help me is appreciated! Thanks! Henrik PS I have tried to mail to [EMAIL PROTECTED], but the mail just bounces. I have WinXP, PIII, 512 RAM R.Version() $platform [1] i386-pc-mingw32 $arch [1] i386 $os [1] mingw32 $system [1] i386, mingw32 $status [1] $major [1] 2 $minor [1] 2.0 $year [1] 2005 $month [1] 10 $day [1] 06 $svn rev [1] 35749 $language [1] R capabilities(tcltk) tcltk TRUE SciViews version: SciViews-R_0.8-9Setup.exe From the SciViews R console: search() [1] .GlobalEnvpackage:Rcmdr package:car [4] package:svGUI package:svViews package:svIO[7] package:svMiscpackage:R2HTMLpackage:tcltk [10] package:methods package:stats package:graphics [13] package:grDevices package:utils package:datasets [16] TempEnv RcmdrEnv Autoloads [19] package:base -- Henrik Pärn Department of Biology NTNU 7491 Trondheim Norway +47 735 96282 (office) +47 909 89 255 (mobile) +47 735 96100 (fax) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html