Re: [R] mahalanobis
On 31/05/07, Anup Nandialath [EMAIL PROTECTED] wrote:

> oops, forgot the example. Try this line:
>
>     sqrt(mahalanobis(all, colMeans(predictors), cov(all), FALSE))
>
> now cross check with other software
>
> best
> Anup

Hi, and thanks for the reply Anup. Unfortunately, I had a look at the example before posting, but it wasn't much help... I did some further tests, and in order to get the same results I must run mahalanobis() on the predictors-only dataset, i.e.

    mahalanobis(predictors, colMeans(predictors), cov(predictors))

Now, at first glance it seems a bit strange to me that the influence of these points on a regression is measured without taking the response variable into account (provided that the other statistical software calculates the Mahalanobis distances correctly), but I guess this is something I have to resolve by doing some studying of my own on the Mahalanobis distance... thanks again.

--
yianni

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
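A minimal sketch of the difference (simulated data; all variable names are illustrative). Distances computed on the predictors alone generally differ from distances computed on the full response-plus-predictors matrix, because both the centre and the covariance matrix change. The predictors-only distance is the one tied to regression leverage: with cov() (which uses the n-1 divisor), the hat values of the regression satisfy h_i = 1/n + d_i/(n-1).

```r
set.seed(1)
X1 <- rnorm(30); X2 <- rnorm(30)
Y  <- 2 * X1 - X2 + rnorm(30)
predictors <- cbind(X1, X2)
all <- cbind(Y, X1, X2)

# squared Mahalanobis distances, predictors only vs. full matrix
d.pred <- mahalanobis(predictors, colMeans(predictors), cov(predictors))
d.all  <- mahalanobis(all, colMeans(all), cov(all))

# the predictor-space distances reproduce the regression leverages
h <- hat(predictors)          # hat values for lm(Y ~ X1 + X2)
# h should equal 1/30 + d.pred/29, while d.all is a different quantity
```

This may be why the other package reports predictors-only distances: as an influence diagnostic they measure how unusual a case is in the design space, which does not involve Y.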
[R] mahalanobis
Hi, I am not sure I am using the mahalanobis() function correctly... Suppose I have a response variable Y and predictor variables X1 and X2:

    all <- cbind(Y, X1, X2)
    mahalanobis(all, colMeans(all), cov(all))

However, my results from this are different from the ones I am getting with another statistical package. I was reading that the comparison is with the means of the predictor variables, which led me to think that the above should be transformed into:

    predictors <- cbind(X1, X2)
    mahalanobis(all, colMeans(predictors), cov(all))

But the results are still different. Am I doing something wrong, or have I misunderstood something in the use of mahalanobis()? Thanks.

--
yianni
[R] normality tests
Hi all, apologies for seeking advice on a general stats question. I've run normality tests using 8 different methods:
- Lilliefors
- Shapiro-Wilk
- Robust Jarque-Bera
- Jarque-Bera
- Anderson-Darling
- Pearson chi-square
- Cramer-von Mises
- Shapiro-Francia

All show that the null hypothesis that the data come from a normal distribution cannot be rejected. Great. However, I don't think it looks nice to report the values of 8 different tests in a report. One note is that my sample size is really tiny (fewer than 20 independent cases). Without wanting to start a flame war: is there any advice on which one(s) would be more appropriate to report (along with a Q-Q plot)? Thank you.

Regards,
--
yianni
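For reference, a minimal sketch of one such test plus the graphical check (simulated data of size 18, mirroring the "fewer than 20 cases" situation). shapiro.test() is in base R's stats package; with n this small its power is low, so the Q-Q plot usually carries more of the information.

```r
set.seed(42)
x <- rnorm(18)          # illustrative small sample

sw <- shapiro.test(x)   # Shapiro-Wilk test of normality
sw$p.value              # a large p-value: no evidence against normality

qqnorm(x)               # the Q-Q plot to report alongside the test
qqline(x)
```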
Re: [R] normality tests
On 25/05/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:

> [EMAIL PROTECTED] wrote:
>> Hi all, apologies for seeking advice on a general stats question. I've
>> run normality tests using 8 different methods: Lilliefors, Shapiro-Wilk,
>> Robust Jarque-Bera, Jarque-Bera, Anderson-Darling, Pearson chi-square,
>> Cramer-von Mises, Shapiro-Francia. All show that the null hypothesis
>> that the data come from a normal distribution cannot be rejected. [...]
>
> Wow - I have so many concerns with that approach that it's hard to know
> where to begin. But first of all, why care about normality? Why not use
> distribution-free methods? You should examine the power of the tests for
> n=20. You'll probably find it's not good enough to reach a reliable
> conclusion.

And wouldn't it be even worse if I used non-parametric tests?

> Frank
> --
> Frank E Harrell Jr   Professor and Chair   School of Medicine
>                      Department of Biostatistics   Vanderbilt University

--
yianni
Re: [R] normality tests [Broadcast]
Thank you all for your replies, they have been most useful... Well, in my case I have chosen to do some parametric tests (more precisely, correlation and linear regressions among some variables), so it would be nice if I had an extra bit of support for my decisions... If I understood your replies well, I shouldn't pay so much attention to the normality tests (so it wouldn't matter which one(s) I report), but rather focus on issues such as the power of the test... Thanks again.

On 25/05/07, Lucke, Joseph F [EMAIL PROTECTED] wrote:

> Most standard tests, such as t-tests and ANOVA, are fairly resistant to
> non-normality for significance testing. It's the sample means that have
> to be normal, not the data. The CLT kicks in fairly quickly. Testing
> for normality prior to choosing a test statistic is generally not a
> good idea.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Liaw, Andy
Sent: Friday, May 25, 2007 12:04 PM
To: [EMAIL PROTECTED]; Frank E Harrell Jr
Cc: r-help
Subject: Re: [R] normality tests [Broadcast]

> From: [EMAIL PROTECTED]
>> On 25/05/07, Frank E Harrell Jr wrote:
>>> [original question and list of 8 tests quoted above ...] Wow - I have
>>> so many concerns with that approach that it's hard to know where to
>>> begin. But first of all, why care about normality? Why not use
>>> distribution-free methods? You should examine the power of the tests
>>> for n=20. You'll probably find it's not good enough to reach a
>>> reliable conclusion.
>>
>> And wouldn't it be even worse if I used non-parametric tests?
>
> I believe what Frank meant was that it's probably better to use a
> distribution-free procedure for the real test of interest (if there is
> one), instead of testing for normality and then using a test that
> assumes normality. I guess the question is: what exactly do you want to
> do with the outcome of the normality tests? If those are going to be
> used as the basis for deciding which test(s) to do next, then I concur
> with Frank's reservation. Generally speaking, I do not find
> goodness-of-fit tests for distributions very useful, mostly because
> failure to reject the null is no evidence in favor of the null. It's
> difficult for me to imagine why "there's insufficient evidence to show
> that the data did not come from a normal distribution" would be
> interesting.
>
> Andy

--
yianni
[R] partial correlation function
Hi, after reading the archives I found some methods... I adapted and modified one of them into the following. I think it is correct, after checking and comparing the results with other software, but if someone could have a look and spot any mistakes I would be grateful. Thanks.

pcor3 <- function(x, test = TRUE, p = 0.05, alternative = "two.sided") {
    nvar <- ncol(x)
    ndata <- nrow(x)
    conc <- solve(cor(x))
    resid.sd <- 1 / sqrt(diag(conc))
    pcc <- -sweep(sweep(conc, 1, resid.sd, "*"), 2, resid.sd, "*")
    # colnames(pcc) <- rownames(pcc) <- colnames(x)
    if (test) {
        t.df <- ndata - nvar
        t <- pcc / sqrt((1 - pcc^2) / t.df)
        # pcc <- list(coefs = pcc, sig = t > qt(1 - (p/2), df = t.df))  # original statement
        if (alternative == "two.sided") {
            pcc <- list(coefs = pcc,
                        sig = abs(t) > qt(1 - (p / 2), df = t.df),  # abs(): both tails
                        p.value = 2 * pmin(pt(t, t.df), 1 - pt(t, t.df)))
        } else if (alternative == "greater") {
            pcc <- list(coefs = pcc,
                        sig = t > qt(1 - p, df = t.df),
                        p.value = 1 - pt(t, t.df))
        } else if (alternative == "less") {
            pcc <- list(coefs = pcc,
                        sig = t < qt(p, df = t.df),
                        p.value = pt(t, t.df))  # lower-tail p for "less"
        }
    }
    str <- sprintf("Partial correlation for:")
    print(str, quote = FALSE)
    str <- sprintf("%s", colnames(x))
    print(str, quote = FALSE)
    str <- sprintf("p: %.2f, alternative: %s", p, alternative)
    print(str, quote = FALSE)
    if (test) {
        str <- sprintf("df: %d", t.df)
        print(str, quote = FALSE)
    }
    return(pcc)
}

The function was adapted from the following email: http://tolstoy.newcastle.edu.au/R/help/00a/0518.html

--
yianni
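As a quick standalone sanity check of the matrix-inverse formula used in the function (simulated data; variable names are illustrative): the off-diagonal entries of the rescaled, negated inverse correlation matrix should agree with the "regress out and correlate the residuals" definition of a partial correlation.

```r
set.seed(7)
x <- matrix(rnorm(100 * 3), ncol = 3,
            dimnames = list(NULL, c("a", "b", "c")))

# partial correlations from the inverse correlation matrix,
# the same calculation pcor3() performs
conc <- solve(cor(x))
d <- 1 / sqrt(diag(conc))
pcc <- -sweep(sweep(conc, 1, d, "*"), 2, d, "*")
diag(pcc) <- 1

# pcc["a", "b"] should equal the correlation of the residuals of
# a ~ c with the residuals of b ~ c
```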
[R] part or semi-partial correlation
Is it possible to conduct part (also called semi-partial) correlation with R? help.search() produces no results, and there is also nothing in the archives -- well, one post asking what part correlation is. Just quickly, from Field [Discovering Statistics Using SPSS]:

"When we do a partial correlation between two variables, we control for the effect of a third variable. Specifically, the effect that the third variable has on BOTH variables in the correlation is controlled. In a semi-partial correlation we control for the effect that the third variable has on only one of the variables in the correlation."

Apologies if it is a trivial question. Thanks.

--
yianni
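There seems to be no base-R function for this, but following the definition quoted above it can be sketched in one line (simulated data; variable names are illustrative): the semi-partial correlation of y with x, removing z from x only, is the ordinary correlation of y with the residuals of x ~ z.

```r
set.seed(3)
z <- rnorm(50)
x <- z + rnorm(50)
y <- x + rnorm(50)

# semi-partial (part) correlation: z partialled out of x only
part.cor <- cor(y, resid(lm(x ~ z)))

full.cor <- cor(y, x)   # ordinary correlation, for comparison
```

This matches the textbook formula sr = (r_xy - r_yz * r_xz) / sqrt(1 - r_xz^2).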
[R] partial correlation significance
Hi, among the many (5) methods that I found in the list for doing partial correlation, the following two that I looked at give me different t-values. Does anyone have any clues as to why that is? The source code is below. Thanks.

pcor3 <- function(x, test = TRUE, p = 0.05) {
    nvar <- ncol(x)
    ndata <- nrow(x)
    conc <- solve(cor(x))
    resid.sd <- 1 / sqrt(diag(conc))
    pcc <- -sweep(sweep(conc, 1, resid.sd, "*"), 2, resid.sd, "*")
    # colnames(pcc) <- rownames(pcc) <- colnames(x)
    if (test) {
        t.df <- ndata - nvar
        t <- pcc / sqrt((1 - pcc^2) / t.df)
        print(t)
        pcc <- list(coefs = pcc, sig = t > qt(1 - (p / 2), df = t.df))
    }
    return(pcc)
}

pcor4 <- function(x, y, z) {
    return(cor.test(lm(x ~ z)$resid, lm(y ~ z)$resid))
}
Re: [R] partial correlation significance
On 18/05/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

> Hi, among the many (5) methods that I found in the list for doing
> partial correlation, the following two that I looked at give me
> different t-values. Does anyone have any clues as to why that is? The
> source code is below. Thanks.
>
> pcor3 <- function(x, test = TRUE, p = 0.05) { [as posted above ...] }
>
> pcor4 <- function(x, y, z) {
>     return(cor.test(lm(x ~ z)$resid, lm(y ~ z)$resid))
> }

Just to self-reply to my question, since I found the answer: the difference is in the degrees of freedom. The variable t.df in pcor3 is smaller than the df used for the test in pcor4, and how much smaller it is depends on the number of variables used in the partial correlation.
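The self-reply can be sketched as follows (simulated data; n, x, y, z are illustrative). Both routes produce the same partial correlation r; the t values differ only because pcor3 uses df = n - nvar (here n - 3) while cor.test() on the residuals uses df = n - 2.

```r
set.seed(5)
n <- 40
z <- rnorm(n)
x <- z + rnorm(n)
y <- z + rnorm(n)

# the partial correlation itself is the same under both approaches
r <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))

# pcor3-style t: df = n - nvar = n - 3
t3 <- r / sqrt((1 - r^2) / (n - 3))

# pcor4-style t: cor.test() on the residuals uses df = n - 2
t4 <- unname(cor.test(resid(lm(x ~ z)), resid(lm(y ~ z)))$statistic)

# t3 != t4, although both are functions of the same r
```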
[R] t value two.sided and one.sided
Hi, in a summary(lm(y ~ x)), are the computed t-values (and their p-values) two-sided or one-sided? Judging from some tables, they seem to be two-sided. Is it possible to have them one-sided? If this makes sense... Thanks.
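A minimal sketch (simulated data; variable names are illustrative): the t value itself does not depend on the alternative, only the p-value does. summary(lm) reports the two-sided p (the column is even labelled Pr(>|t|)); a one-sided p can be recovered from the t statistic and the residual df with pt().

```r
set.seed(9)
x <- rnorm(30)
y <- 0.5 * x + rnorm(30)

fit  <- summary(lm(y ~ x))
tval <- fit$coefficients["x", "t value"]
df   <- fit$df[2]                        # residual degrees of freedom
p.two <- fit$coefficients["x", "Pr(>|t|)"]

# one-sided p-values for the slope
p.greater <- pt(tval, df, lower.tail = FALSE)   # H1: slope > 0
p.less    <- pt(tval, df)                       # H1: slope < 0
```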
[R] Error in plot.new() : figure margins too large
Yes, I already had a look at previous posts, but nothing was really helpful to me. The code is:

postscript(filename, horizontal = FALSE, onefile = FALSE, paper = "special",
           bg = "white", family = "ComputerModern", pointsize = 10)
par(mar = c(5, 4, 0, 0) + 0.1)
plot(x.nor, y.nor, xlim = c(3, 6), ylim = c(20, 90), pch = normal.mark)

which gives the error:

Error in plot.new() : figure margins too large

Plotting on the screen, without calling postscript(), works just fine. Any clues? Thanks.
Re: [R] Error in plot.new() : figure margins too large
On 09/05/07, Prof Brian Ripley [EMAIL PROTECTED] wrote:

> On Wed, 9 May 2007, [EMAIL PROTECTED] wrote:
>
>> The code is:
>> postscript(filename, horizontal = FALSE, onefile = FALSE, paper = "special",
>
> You have not set a width or height, so please do your homework.

Thanks a lot for that, and to Phil for replying. Just a minor correction to your post: you have not set a width AND height. Both seem to be required. I had tried with only width, thinking height would be calculated relative to it, but I was still getting the same error.

> --
> Brian D. Ripley, [EMAIL PROTECTED]
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel: +44 1865 272861 (self)
> 1 South Parks Road,                    +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax: +44 1865 272595
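A sketch of the working call (the output file name "plot.ps" and the plotted data are illustrative): with paper = "special", both width and height must be given explicitly, in inches. The family = "ComputerModern" argument from the original post is omitted here, since it requires the Computer Modern fonts to be available.

```r
postscript("plot.ps", width = 6, height = 4,      # both are required
           horizontal = FALSE, onefile = FALSE, paper = "special",
           bg = "white", pointsize = 10)
par(mar = c(5, 4, 0, 0) + 0.1)
plot(rnorm(20), rnorm(20))    # illustrative data
dev.off()                     # close the device to flush the file
```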
[R] plots - scatterplots - find index of a point in a list
Hi, is it possible to find the index of a point in a plot (e.g. a scatterplot) in an easy way? E.g.

x <- 1:5
y <- 1:5
plot(x, y)

On the plot, if I move my cursor on top of a point, or click on it, is it possible to have its index printed, or its exact value? Any clues? Thanks.
Re: [R] plots - scatterplots - find index of a point in a list
On 08/05/07, John Kane [EMAIL PROTECTED] wrote:

> Try ?locator
>
> --- [EMAIL PROTECTED] wrote:
>> Hi, is it possible to find the index of a point in a plot (e.g. a
>> scatterplot) in an easy way? [...]

Thanks. Your tip also led me to another function, ?identify, to add my two cents.
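A minimal sketch of identify() for the original question: clicking near a point labels it and, when you finish (right-click or Esc on most devices), the indices of the chosen points are returned. It needs an interactive graphics device, so the interactive() guard here is only to keep the example safe to source non-interactively.

```r
x <- 1:5
y <- 1:5
plot(x, y)

if (interactive()) {
    idx <- identify(x, y)   # click points, then right-click / Esc to stop
    print(idx)              # indices of the clicked points
}
```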
Re: [R] general question about use of list
On 30/04/07, Y G [EMAIL PROTECTED] wrote:

> Hi, is this list only related to R issues, or does it have a broader
> scope covering questions and discussions about statistics? Is there
> another email list or forum for that?

Well, just to quickly self-reply to my question. From http://www.r-project.org/posting-guide.html:

*Questions about statistics:* The R mailing lists are primarily intended for questions and discussion about the R software. However, questions about statistical methodology are sometimes posted. If the question is well-asked and of interest to someone on the list, it *may* elicit an informative up-to-date answer. See also the Usenet groups sci.stat.consult (applied statistics and consulting) and sci.stat.math (mathematical stat and probability).

*Basic statistics and classroom homework:* R-help is not intended for these.

Thanks.