Re: [R] Wilcoxon Test and Mean Ratios
On 2012-09-20 21:07, Thomas Lumley wrote: On Fri, Sep 21, 2012 at 6:43 AM, avinash barnwal avinashbarnwal...@gmail.com wrote: Hi, http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test We can clearly see that null hypothesis is median different or not. One way of proving non difference is P(XY) = P(XY) where X and Y are ordered. Avinash. No. Firstly, the Wikipedia link is for the WIlcoxon signed rank test, which is a different test and so is irrelevant. Even if the signed-rank test were the one being discussed, you are still incorrect. The signed rank test is on the median of differences, not the difference in medians. These are not the same, and need not even be in the same direction. Secondly, it is easy to establish that the WIlcoxon rank sum test need not agree with the ordering in medians, just by looking at examples, as Peter showed Thirdly, there is a well-known demonstration originally due to Brad Efron, Efron's non-transitive dice', which implies that the Mann-Whitney U test (which *is* equivalent to the Wilcoxon rank-sum test) need not agree with the ordering given by *any* one-sample summary statistic. In this case, assuming the sample sizes are not too small (which looks plausible given the p-value), the question is what summary the original poster want's to compare: the mean (in which case the t-test is the only option) or some other summary. I'll just chime in here and point towards the Fay and Proschan (2010) paper discussing decision rules, and their assumptions, in the two-sample situation. It's freely available at http://www.i-journals.org/ss/viewarticle.php?id=51 Henric It's not possible to work this out from the distribution of the data, so we need to ask the original poster. With reasonably large sample sizes he can get a permutation test and bootstrap confidence interval for any summary statistic of interest, but for the mean these will just reduce to the t-test. Rank tests (apart from Mood's test for quantiles, which has different problems) can really behave very strangely in the absence of stochastic ordering, because without stochastic ordering there is no non-parametric way to define the direction of difference between two samples. It's important to remember that all the beautiful theory for rank tests was developed under the (much stronger) a location shift model: the distribution can have any shape, but the shape is assumed to be identical in the two groups. Or, as one of my colleagues puts it you don't know whether the treatment raises or lowers the outcome, but you know it doesn't change anything else. Knowledgeable and sensible statisticians who like the Wilcoxon test (Frank Harrell comes to mind) like it because they believe stochastic ordering is a reasonable assumption in the problems they work in, not because they think you can do non-parametric testing in its absence. -thomas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] debug vs regular mode
On 2012-08-10 06:10, Zhang, Peng wrote: Thanks to both for your reply. library(glmulti) testdata = cbind(Y=rnorm(100), data.frame(matrix(rnorm(100*50), ncol = 50))) glmulti(Y~(X1+X2+X3+X4+X5+X6+X7+X8+X9+X10+X11+X12+X13+X14+X15)*X16, data = testdata, level = 2) This is reproducible to get a segmentation fault. There's some information missing here, notably the output from 'sessionInfo()', but I guess this is under GNU/Linux. I can confirm a seg. fault under Fedora FC17 x86_64. Under Windows 7 64-bit, however, we get some further info: library(glmulti) Loading required package: rJava testdata = cbind(Y=rnorm(100), data.frame(matrix(rnorm(100*50), ncol = 50))) glmulti(Y~(X1+X2+X3+X4+X5+X6+X7+X8+X9+X10+X11+X12+X13+X14+X15)*X16, data + = testdata, level = 2) Initialization... Error in .jnew(glmulti/ModelGenerator, y, .jarray(xc), .jarray(xq), : java.lang.ArrayIndexOutOfBoundsException: 15 sessionInfo() R version 2.15.1 Patched (2012-08-06 r60178) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=Swedish_Sweden.1252 LC_CTYPE=Swedish_Sweden.1252 [3] LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C [5] LC_TIME=Swedish_Sweden.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] glmulti_1.0.4 rJava_0.9-3 So, this doesn't seem to be a bug in R and is thus likely to need the attention of the 'glmulti' package's maintainer. HTH, Henric But I have troubles to extract the exact information from this S4 class to make a simpler example because of my limited knowledge on S4 class. The author of the package is busy at the moment, and does not have time to look into it. Peng On 08/09/2012 10:25 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote: On Aug 9, 2012, at 9:14 PM, David Winsemius dwinsem...@comcast.net wrote: On Aug 9, 2012, at 4:56 PM, Zhang, Peng wrote: Dear all, I had a R segmentation fault, and then invoked debug mode and ran step by step. 2. Why does the same function behave differently under debug and regular mode? I cannot help you there. Though a reproducible segfault is certainly worth a bug report if you can do so, in debug or regular modes. You may wish to search stackoverflow for tips on how to make a great reproducible example in R. Michael ** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] debug vs regular mode
On 2012-08-10 15:42, Zhang, Peng wrote: You are right. I am running Arch Linux. However, I obtained a segmentation directly, so didn't know where to find the bug?? library(glmulti) Loading required package: rJava testdata = cbind(Y=rnorm(100), data.frame(matrix(rnorm(100*50), ncol = 50))) glmulti(Y~(X1+X2+X3+X4+X5+X6+X7+X8+X9+X10+X11+X12+X13+X14+X15)*X16, data= testdata, level = 2) Segmentation fault Is this information Error in .jnew(glmulti/ModelGenerator, y, jarray(xc), .jarray(xq), : java.lang.ArrayIndexOutOfBoundsException: 15 only in Windows, or did you see it under Fedora as well? Windows only. Fedora just returned Segmentation fault. //Henric Thank you! Peng BTW: $ uname -a Linux Precision 3.4.7-1-ARCH #1 SMP PREEMPT Sun Jul 29 22:02:56 CEST 2012 x86_64 GNU/Linux sessionInfo() R version 2.15.1 (2012-06-22) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base On 08/10/2012 09:25 AM, Henric (Nilsson) Winell wrote: On 2012-08-10 06:10, Zhang, Peng wrote: Thanks to both for your reply. library(glmulti) testdata = cbind(Y=rnorm(100), data.frame(matrix(rnorm(100*50), ncol = 50))) glmulti(Y~(X1+X2+X3+X4+X5+X6+X7+X8+X9+X10+X11+X12+X13+X14+X15)*X16, data = testdata, level = 2) This is reproducible to get a segmentation fault. There's some information missing here, notably the output from 'sessionInfo()', but I guess this is under GNU/Linux. I can confirm a seg. fault under Fedora FC17 x86_64. Under Windows 7 64-bit, however, we get some further info: library(glmulti) Loading required package: rJava testdata = cbind(Y=rnorm(100), data.frame(matrix(rnorm(100*50), ncol = 50))) glmulti(Y~(X1+X2+X3+X4+X5+X6+X7+X8+X9+X10+X11+X12+X13+X14+X15)*X16, data + = testdata, level = 2) Initialization... Error in .jnew(glmulti/ModelGenerator, y, .jarray(xc), .jarray(xq), : java.lang.ArrayIndexOutOfBoundsException: 15 sessionInfo() R version 2.15.1 Patched (2012-08-06 r60178) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=Swedish_Sweden.1252 LC_CTYPE=Swedish_Sweden.1252 [3] LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C [5] LC_TIME=Swedish_Sweden.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] glmulti_1.0.4 rJava_0.9-3 So, this doesn't seem to be a bug in R and is thus likely to need the attention of the 'glmulti' package's maintainer. HTH, Henric But I have troubles to extract the exact information from this S4 class to make a simpler example because of my limited knowledge on S4 class. The author of the package is busy at the moment, and does not have time to look into it. Peng On 08/09/2012 10:25 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote: On Aug 9, 2012, at 9:14 PM, David Winsemius dwinsem...@comcast.net wrote: On Aug 9, 2012, at 4:56 PM, Zhang, Peng wrote: Dear all, I had a R segmentation fault, and then invoked debug mode and ran step by step. 2. Why does the same function behave differently under debug and regular mode? I cannot help you there. Though a reproducible segfault is certainly worth a bug report if you can do so, in debug or regular modes. You may wish to search stackoverflow for tips on how to make a great reproducible example in R. Michael ** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lme4 t value for 3 levels of fixed factor
On 2012-07-27 05:50, Obermeier Andrew wrote: Hello, I just joined this list today, so am worried about proper protocol, but would like to post a question about lme4. The R-sig-mixed-models list (https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models) may be a better place for questions on lme4 and other packages fitting mixed effects models. In Baayen, Davidson, and Bates (2008), Mixed-effects modeling with crossed random effects for subjects and items, the authors describe steps for a Latin Square Design (p. 402) in which they compare 3 levels of the experimental conditions. I am considering replicating this analysis for my dissertation, I would also like to investigate 3 levels of my factor, but wish to confirm how lme4 derives the t value. It is my understanding that t values can only be used to compare 2 means. For 3 levels, does lme4 do some kind of pairwise comparison? For pairwise comparisons, and other contrasts, take a look at the 'multcomp' package. HTH, Henric __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mann-Whitney by group
On 2012-07-17 05:13, R. Michael Weylandt wrote: On Mon, Jul 16, 2012 at 3:39 PM, Oxenstierna david.chert...@gmail.com wrote: lapply(thing, function(x) x[['p.value']]) --works very well, thank you. Not to be a chore, but I'm interested in comparing the results of wilcox.test--and the methodology we've employed so far--with the results and methodology of wilcox_test (library(coin)). So, I'd like to compare There should not be any differences between the p-values obtained using 'wilcox.test' and 'wilcox_test' in the asymptotic case. However, the latter function allows you to use the exact null distribution even in the presence of ties, or use an Monte Carlo approximation of the exact null distribution. Using the approximately exact null distribution is particularly helpful when the asymptotics doesn't work well, say, large but unbalanced data, and/or the exact computations are too time consuming. groups 5 and 6 across A through H using wilcox_test, and then I'd like to extract the p-values. Going through the same methodology as above, but replacing wilcox.test with wilcox_test has failed, and so has the p.value extraction method: lapply(thing, function(x) x[['p.value']]) . I believe the latter failure has to do with the fact that the coin package has a built-in class and built-in extraction method (pvalue() to extract and class IndependenceTest), but I don't know how to work around it. For example, for a single comparison: wilcox_test(A~Group, Dtb) works fine, and pvalue(wilcox.test(A~Group, Dtb)) extracts the p-value. So, any ideas about how to compare groups 5 and 6 across A through H using wilcox_test? Well, since you're doing multiple tests here (A, C, ..., H vs Group) you should consider adjusting for multiplicity and 'coin' allows you to do that easily and efficiently. A multivariate Wilcoxon rank-sum test can be constructed using set.seed(711109) # for reproducibility it - independence_test(A + C + D + E + F + G + H ~ Group, data = Dtb, ytrafo = function(data) trafo(data, numeric_trafo = rank), distribution = approximate(B = 10)) approximating the exact null distribution using 100,000 Monte Carlo replicates. Step-down adjusted p-values taking account of the dependence structure between the test statistics and possible discreteness in the null distribution are available through (psd - pvalue(it, step-down)) A C D E F G H 5 0.08598 0.08598 0.08598 0.20018 0.08598 0.08598 0.34182 and using the development version of 'coin', available at R-Forge, we can get the unadjusted p-values from (pu - pvalue(it, unadjusted)) A C D E F G H 5 0.02894 0.02894 0.02894 0.11512 0.02894 0.02894 0.34182 If we look at the ratio of step-down adjusted and unadjusted p-values, psd / pu ACDEFG H 5 2.970974 2.970974 2.970974 1.738881 2.970974 2.970974 1 we can see that this type of adjustment is pretty powerful compared to step-down methods like Bonferroni-Holm that doesn't take account of the correlation nor the discreteness p.adjust(pu) / pu A C D E F G H 5 7 7 7 2 7 7 1 HTH, Henric I think there are a few things at play here. 1) coin uses so-called S4 objects, so `[[` style subsetting isn't going to work. The right way is, as you have found to use the pvalue() function. 2) It looks like you need to use the formula intervace for wilcox_test. Unfortunately, this makes things a little more complicated as you'll need to construct the formula programmatically. A one liner looks something like this. lapply(LETTERS[1:8], function(x) pvalue(wilcox_test(as.formula(paste(x, ~ Group )), Dtb))) Where lapply loops over the letters A,B, etc. and makes the string `A ~ Group`, converts it to a formula, passes that to wilcox_test, then gets the pvalue and returns it. In two lines you could do: thing - lapply(LETTERS[1:8], function(x) wilcox_test(as.formula(paste(x, ~ Group)), Dtb)) thing2 - lapply(thing, pvalue) Where thing has all the test result objects, and thing2 collects the pvalues. Hope this helps, Michael __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] permutation test on paired samples
Holger, Thanks for providing a reproducible example. However, since your space key only works sporadically, the below is a little hard to read... ;) On 2012-07-12 20:26, Holger Taschenberger wrote: Hi, I'm trying to run a permutation test on paired samples. First I tried the package exactRankTests: require(exactRankTests) x - c(1.83,0.50,1.62,2.48,1.68,1.88,1.55,3.06,1.30) y - c(0.878,0.647,0.598,2.05,1.06,1.29,1.06,3.14,1.29) The relevant output missing here is wilcox.test(x,y,paired = TRUE,alternative = greater) Wilcoxon signed rank test data: x and y V = 40, p-value = 0.01953 alternative hypothesis: true location shift is greater than 0 perm.test(y,x,paired = TRUE,exact = TRUE,alternative = greater) 1-sample Permutation Test (scores mapped into 1:m using rounded scores) data: y and x T = 41, p-value = 0.003906 alternative hypothesis: true mu is greater than 0 Firstly, you've interchanged the 'x' and 'y' in the second call. Secondly, and more important, the output says that (scores mapped into 1:m using rounded scores). In this case this can easily be avoided, and note the interchange of 'x' and 'y' to match your 'wilcox.test' call, using: yy - 1000 * y xx - 1000 * x perm.test(xx, yy, paired = TRUE, exact = TRUE, + alternative = greater) 1-sample Permutation Test data: xx and yy T = 4114, p-value = 0.01367 alternative hypothesis: true mu is greater than 0 So, now that we've computed the correct p-value, let's see how to obtain this using the 'coin' package: Then I wanted to use the package 'coin': require(coin) x - c(1.83,0.50,1.62,2.48,1.68,1.88,1.55,3.06,1.30) y - c(0.878,0.647,0.598,2.05,1.06,1.29,1.06,3.14,1.29) xydat - data.frame(y = c(y,x),x = gl(2,length(x)),block = factor(rep(1:length(x),2))) The relevant output missing here is wilcoxsign_test(y ~ x | block,data = xydat,alternative = greater,distribution = exact()) Exact Wilcoxon-Signed-Rank Test data: y by x (neg, pos) stratified by block Z = 2.0732, p-value = 0.01953 alternative hypothesis: true mu is greater than 0 oneway_test(y ~ x | block,data = xydat,alternative = greater,distribution = exact()) Exact 2-Sample Permutation Test data: y by x (1, 2) stratified by block Z = -2.1948, p-value = 0.6982 alternative hypothesis: true mu is greater than 0 Using 'oneway_test' in this way does *not* correspond to a paired test. The raw scores version of the Wilcoxon signed-rank test can be constructed using diff - x - y y - as.vector(t(cbind(abs(diff) * (diff 0), +abs(diff) * (diff = 0 x - factor(rep(c(neg, pos), length(diff)), + levels = c(pos, neg)) b - gl(length(diff), 2) oneway_test(y ~ x | b, alternative = greater, distr = exact) Exact 2-Sample Permutation Test data: y by x (pos, neg) stratified by b Z = 2.1948, p-value = 0.01367 alternative hypothesis: true mu is greater than 0 And, as you can see, this is equal to the 'perm.test' result. HTH, Henric While the results of the Wilcoxon test are the same for both packages are the same, those of the permutation test are very different. So, obviously I'm doing something wrong here. Can somebody please help? Thanks a lot, Holger __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ctree - party package multivariate response variables
On 2010-03-09 04:40, valeriano.parravic...@unige.it wrote: Hi, I have a problem with ctree of party package. I have data on distribution of more than one species (about 50 species) and I would like identify the relation of this multivariate object (species distribution) with a number of explanatory variables. rs is the name of my dataframe containing the species (columns from 2 to 51) and the explanatory variables (columns 52 and 53). Rows are my sampling sites. I wrote: species-rs[,2:51] v1-rs[,52] v2-rs[53] tree-ctree(species~v1+v2) It does not work , but when I use the same formula for the univariate case (i.e. a single column - e.g. the total number of species in each samplig sites) it works. I know that ctree can handle multivariate response variables, but I cannot figure out how to do that. The response variables needs to be explicitly specified, e.g. ctree(y1 + y2 ~ x1 + x2) gives you a bivariate response. HTH, Henric Someone can help me? Thank you Valeriano __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to export a function from a package and access it only by specifying the namespace?
On 2009-12-02 16:31, Peng Yu wrote: On Tue, Dec 1, 2009 at 11:27 PM, Sharpie ch...@sharpsteen.net wrote: Peng Yu wrote: Then I try the package 'try.package' in an R session. I'm wondering why neither 'my_test_f' and 'try.package::my_test_f' work. The error message you got below clearly explains this-- you did not export my_test_f in your NAMESPACE file. To access unexported functions, you must use the ':::' operator: try.package:::my_test_f() Peng Yu wrote: Why 'my_test_g' can be accessed with 'try.package::' and without 'try.package::'? Because you exported it in the NAMESPACE file. Peng Yu wrote: Is there a way to make 'my_test_g' accessible only by specifying the namespace 'try.package::'? No. The purpose of the '::' operator is for those cases where multiple packages are loaded that each export a function with the same name. This is known as masking and the last loaded package will contribute the dominant function-- i.e. the function the gets called when the user types functionName() and not packageName::functionName(). The :: operator allows the selection of functions that are masked by the dominant function. If you really want to conceal a function from user-level code, don't export it and it will only be accessible via the ::: operator. Is there a way to list all the functions in a namespace? I tried the following one, but it is not working. showMethods(where=getNamespace('try.package')) No applicable functions You're almost there, and the above approach *is* working but only for S4 methods. Try showMethods(where = e - getNamespace('stats4')) I don't think there's a direct analogue for S3, but it that case you only do ls(e) HTH, Henric __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.