Re: [R] mahalanobis

2007-06-01 Thread gatemaze
On 31/05/07, Anup Nandialath [EMAIL PROTECTED] wrote:
> oops, forgot the example
>
> try this line
>
> sqrt(mahalanobis(all, colMeans(predictors), cov(all), FALSE))
Hi and thanks for the reply Anup. Unfortunately, I had looked at the
example before posting, but it was not much help... I did some further
tests, and in order to get the same results I must run mahalanobis
with the predictors-only dataset, i.e.
mahalanobis(predictors, colMeans(predictors), cov(predictors)).

Now, at first glance it seems a bit strange to me that the influence
of these points on a regression is measured without taking into
account the response variable (provided that the other stats software
calculates the mahalanobis distances correctly)... but I guess this
is something that I have to resolve by doing some studying of my own
on the mahalanobis distance...
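
For anyone comparing later, a minimal sketch (with made-up data) contrasting
the two calls:

set.seed(1)
X1 <- rnorm(30); X2 <- rnorm(30)
Y  <- 2 * X1 - X2 + rnorm(30)
all        <- cbind(Y, X1, X2)
predictors <- cbind(X1, X2)
## Distances in predictor space only (this is what matched the other software):
d.pred <- mahalanobis(predictors, colMeans(predictors), cov(predictors))
## Distances including the response:
d.all <- mahalanobis(all, colMeans(all), cov(all))
cbind(d.pred, d.all)  # generally different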

thanks again.


> now cross check with other software
>
> best
>
> Anup






-- 
yianni

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] mahalanobis

2007-05-31 Thread gatemaze
Hi, I am not sure I am using the mahalanobis distance function correctly...
Suppose I have a response variable Y and predictor variables X1 and X2:

all <- cbind(Y, X1, X2)
mahalanobis(all, colMeans(all), cov(all))

However, my results from this are different from the ones I am getting
using another statistical software.

I was reading that the comparison is with the means of the predictor
variables, which led me to think that the above should be transformed
into:

predictors <- cbind(X1, X2)
mahalanobis(all, colMeans(predictors), cov(all))

But the results are still different.

Am I doing something wrong or have I misunderstood something in the
use of the function mahalanobis? Thanks.

-- 
yianni



[R] normality tests

2007-05-25 Thread gatemaze
Hi all,

apologies for seeking advice on a general stats question. I've run
normality tests using 8 different methods:
- Lilliefors
- Shapiro-Wilk
- Robust Jarque Bera
- Jarque Bera
- Anderson-Darling
- Pearson chi-square
- Cramer-von Mises
- Shapiro-Francia

All show that the null hypothesis that the data come from a normal
distro cannot be rejected. Great. However, I don't think it looks nice
to report the values of 8 different tests in a report. One note is
that my sample size is really tiny (less than 20 independent cases).
Without wanting to start a flame war, is there any advice on which
one/ones would be more appropriate and should be reported (along with
a Q-Q plot)? Thank you.

Regards,

-- 
yianni



Re: [R] normality tests

2007-05-25 Thread gatemaze
On 25/05/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] wrote:
> > Hi all,
> >
> > apologies for seeking advice on a general stats question. I've run
> > normality tests using 8 different methods:
> > - Lilliefors
> > - Shapiro-Wilk
> > - Robust Jarque Bera
> > - Jarque Bera
> > - Anderson-Darling
> > - Pearson chi-square
> > - Cramer-von Mises
> > - Shapiro-Francia
> >
> > All show that the null hypothesis that the data come from a normal
> > distro cannot be rejected. Great. However, I don't think it looks nice
> > to report the values of 8 different tests in a report. One note is
> > that my sample size is really tiny (less than 20 independent cases).
> > Without wanting to start a flame war, is there any advice on which
> > one/ones would be more appropriate and should be reported (along with
> > a Q-Q plot)? Thank you.
> >
> > Regards,
>
> Wow - I have so many concerns with that approach that it's hard to know
> where to begin.  But first of all, why care about normality?  Why not
> use distribution-free methods?
>
> You should examine the power of the tests for n=20.  You'll probably
> find it's not good enough to reach a reliable conclusion.

And wouldn't it be even worse if I used non-parametric tests?
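
For what it's worth, a small simulation sketch (illustrative numbers only) of
the power check Frank suggests, here for shapiro.test at n = 20 against a
skewed alternative:

set.seed(42)
pvals <- replicate(1000, shapiro.test(rexp(20))$p.value)
mean(pvals < 0.05)  # estimated power against an exponential alternative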


> Frank
>
> --
> Frank E Harrell Jr   Professor and Chair   School of Medicine
>                      Department of Biostatistics   Vanderbilt University



-- 
yianni



Re: [R] normality tests [Broadcast]

2007-05-25 Thread gatemaze
Thank you all for your replies, they have been most useful... well,
in my case I have chosen to do some parametric tests (more precisely,
correlation and linear regression among some variables)... so it
would be nice if I had an extra bit of support for my decisions... If I
understood well from all your replies... I shouldn't pay so much
attention to the normality tests, so it wouldn't matter which one/ones
I use to report... but rather focus on issues such as the power of the
test...

Thanks again.

On 25/05/07, Lucke, Joseph F [EMAIL PROTECTED] wrote:
> Most standard tests, such as t-tests and ANOVA, are fairly resistant to
> non-normality for significance testing. It's the sample means that have
> to be normal, not the data.  The CLT kicks in fairly quickly.  Testing
> for normality prior to choosing a test statistic is generally not a good
> idea.

>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]] On Behalf Of Liaw, Andy
> Sent: Friday, May 25, 2007 12:04 PM
> To: [EMAIL PROTECTED]; Frank E Harrell Jr
> Cc: r-help
> Subject: Re: [R] normality tests [Broadcast]
>
> From: [EMAIL PROTECTED]
> >
> > On 25/05/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
> > > [EMAIL PROTECTED] wrote:
> > > > Hi all,
> > > >
> > > > apologies for seeking advice on a general stats question. I've run
> > > > normality tests using 8 different methods:
> > > > - Lilliefors
> > > > - Shapiro-Wilk
> > > > - Robust Jarque Bera
> > > > - Jarque Bera
> > > > - Anderson-Darling
> > > > - Pearson chi-square
> > > > - Cramer-von Mises
> > > > - Shapiro-Francia
> > > >
> > > > All show that the null hypothesis that the data come from a normal
> > > > distro cannot be rejected. Great. However, I don't think it looks nice
> > > > to report the values of 8 different tests in a report. One note is
> > > > that my sample size is really tiny (less than 20 independent cases).
> > > > Without wanting to start a flame war, is there any advice on which
> > > > one/ones would be more appropriate and should be reported (along with
> > > > a Q-Q plot)? Thank you.
> > > >
> > > > Regards,
> > >
> > > Wow - I have so many concerns with that approach that it's hard to know
> > > where to begin.  But first of all, why care about normality?  Why not
> > > use distribution-free methods?
> > >
> > > You should examine the power of the tests for n=20.  You'll probably
> > > find it's not good enough to reach a reliable conclusion.
> >
> > And wouldn't it be even worse if I used non-parametric tests?
>
> I believe what Frank meant was that it's probably better to use a
> distribution-free procedure to do the real test of interest (if there is
> one) instead of testing for normality, and then use a test that assumes
> normality.
>
> I guess the question is, what exactly do you want to do with the outcome
> of the normality tests?  If those are going to be used as the basis for
> deciding which test(s) to do next, then I concur with Frank's
> reservation.
>
> Generally speaking, I do not find goodness-of-fit tests for distributions
> very useful, mostly for the reason that failure to reject the null is no
> evidence in favor of the null.  It's difficult for me to imagine why
> "there's insufficient evidence to show that the data did not come from a
> normal distribution" would be interesting.
>
> Andy
>
> > > Frank
> > >
> > > --
> > > Frank E Harrell Jr   Professor and Chair   School of Medicine
> > >                      Department of Biostatistics   Vanderbilt University
> >
> > --
> > yianni



-- 
yianni



[R] partial correlation function

2007-05-22 Thread gatemaze
Hi,

after reading the archives I found some methods... I adopted and
modified one of them into the following. I think it is correct after
checking and comparing the results with other software... but if
possible, could someone have a look and spot any mistakes? I would be
grateful. Thanks.

pcor3 <- function (x, test = TRUE, p = 0.05, alternative = "two.sided") {
  nvar <- ncol(x)
  ndata <- nrow(x)
  conc <- solve(cor(x))            # concentration (inverse correlation) matrix
  resid.sd <- 1/sqrt(diag(conc))
  pcc <- -sweep(sweep(conc, 1, resid.sd, "*"), 2, resid.sd, "*")
  #colnames(pcc) <- rownames(pcc) <- colnames(x)
  if (test) {
    t.df <- ndata - nvar
    t <- pcc/sqrt((1 - pcc^2)/t.df)
    #pcc <- list(coefs = pcc, sig = t > qt(1 - (p/2), df = t.df)) # original statement
    if (alternative == "two.sided") {
      pcc <- list(coefs = pcc, sig = abs(t) > qt(1 - (p/2), df = t.df),
                  p.value = 2 * pmin(pt(t, t.df), 1 - pt(t, t.df))) # two.sided
    } else if (alternative == "greater") {
      pcc <- list(coefs = pcc, sig = t > qt(1 - p, df = t.df),
                  p.value = 1 - pt(t, t.df)) # greater
    } else if (alternative == "less") {
      pcc <- list(coefs = pcc, sig = t < qt(p, df = t.df),
                  p.value = pt(t, t.df)) # less
    }
  }
  print("Partial correlation for:", quote = FALSE)
  print(colnames(x), quote = FALSE)
  print(sprintf("p: %.2f, alternative: %s", p, alternative), quote = FALSE)
  if (test) {
    print(sprintf("df: %d", t.df), quote = FALSE)
  }
  return(pcc)
}



The function was adapted from the following email:
http://tolstoy.newcastle.edu.au/R/help/00a/0518.html
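
A quick usage sketch with made-up data (names are illustrative):

set.seed(1)
d <- cbind(X1 = rnorm(50), X2 = rnorm(50), X3 = rnorm(50))
res <- pcor3(d, test = TRUE, p = 0.05, alternative = "two.sided")
res$coefs    # matrix of pairwise partial correlations
res$p.value  # matching p-values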

-- 
yianni



[R] part or semi-partial correlation

2007-05-22 Thread gatemaze
Is it possible to conduct part correlation (also called semi-partial
correlation) with R? help.search produces no results, and there is also
nothing in the archive, well, one post asking what part correlation is.
Just quickly from Field [Discovering statistics using spss]:

"When we do a partial correlation between two variables, we control
for the effect of a third variable. Specifically, the effect that the
third variable has on BOTH variables in the correlation is controlled.
In a semi-partial correlation we control for the effect that the third
variable has on only one of the variables in the correlation."

Apologies if it is a trivial question. Thanks.
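
For the record, a minimal sketch of the residual-based way to get both
quantities (made-up data, following the definition above):

set.seed(1)
z <- rnorm(50); x <- rnorm(50) + z; y <- rnorm(50) + z
## Partial correlation: control z in BOTH x and y
cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
## Semi-partial (part) correlation: control z in only one of them (here x)
cor(resid(lm(x ~ z)), y)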

-- 
yianni



[R] partial correlation significance

2007-05-18 Thread gatemaze
Hi,

among the many (5) methods that I found on the list for doing partial
correlation, the following two that I looked at give me different
t-values. Does anyone have any clues as to why that is? The source code is
below. Thanks.

pcor3 <- function (x, test = TRUE, p = 0.05) {
  nvar <- ncol(x)
  ndata <- nrow(x)
  conc <- solve(cor(x))            # concentration (inverse correlation) matrix
  resid.sd <- 1/sqrt(diag(conc))
  pcc <- -sweep(sweep(conc, 1, resid.sd, "*"), 2, resid.sd, "*")
  #colnames(pcc) <- rownames(pcc) <- colnames(x)
  if (test) {
    t.df <- ndata - nvar
    t <- pcc/sqrt((1 - pcc^2)/t.df)
    print(t)
    pcc <- list(coefs = pcc, sig = t > qt(1 - (p/2), df = t.df))
  }
  return(pcc)
}


pcor4 <- function(x, y, z) {
  return(cor.test(lm(x ~ z)$resid, lm(y ~ z)$resid))
}




Re: [R] partial correlation significance

2007-05-18 Thread gatemaze
On 18/05/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
>
> Hi,
>
> among the many (5) methods that I found on the list for doing partial
> correlation, the following two that I looked at give me different
> t-values. Does anyone have any clues as to why that is? The source code is
> below. Thanks.
>
> pcor3 <- function (x, test = TRUE, p = 0.05) {
>   nvar <- ncol(x)
>   ndata <- nrow(x)
>   conc <- solve(cor(x))
>   resid.sd <- 1/sqrt(diag(conc))
>   pcc <- -sweep(sweep(conc, 1, resid.sd, "*"), 2, resid.sd, "*")
>   #colnames(pcc) <- rownames(pcc) <- colnames(x)
>   if (test) {
>     t.df <- ndata - nvar
>     t <- pcc/sqrt((1 - pcc^2)/t.df)
>     print(t)
>     pcc <- list(coefs = pcc, sig = t > qt(1 - (p/2), df = t.df))
>   }
>   return(pcc)
> }
>
> pcor4 <- function(x, y, z) {
>   return(cor.test(lm(x ~ z)$resid, lm(y ~ z)$resid))
> }



Just to self-reply to my question, since I found the answer. The difference
is in the degrees of freedom: cor.test on the residuals in pcor4 uses n - 2
df, whereas t.df in pcor3 is ndata - nvar, which is smaller, and how much
smaller depends on the number of variables used in the partial correlation.
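
A small illustration (made-up data; with three variables pcor3 uses n - 3 df
while cor.test uses n - 2):

set.seed(1)
n <- 30
z <- rnorm(n); x <- rnorm(n) + z; y <- rnorm(n) + z
r <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
r / sqrt((1 - r^2) / (n - 3))                            # t as in pcor3
cor.test(resid(lm(x ~ z)), resid(lm(y ~ z)))$statistic   # t as in pcor4 (n - 2 df)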




[R] t value two.sided and one.sided

2007-05-10 Thread gatemaze
Hi,

in a
 summary(lm(y ~ x))
are the computed t-values two-sided or one-sided? By looking at some
tables, they seem to be two-sided. Is it possible to have them
one-sided? If this makes sense...
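
A sketch of one way to get one-sided p-values from the reported t statistics
(made-up data; the coefficient t-values follow a t distribution on the
residual df):

set.seed(1)
x <- rnorm(20); y <- x + rnorm(20)
fit <- lm(y ~ x)
tval <- summary(fit)$coefficients[, "t value"]
df <- fit$df.residual
2 * pt(abs(tval), df, lower.tail = FALSE)  # two-sided, as summary() reports
pt(tval, df, lower.tail = FALSE)           # one-sided, H1: coefficient > 0
pt(tval, df)                               # one-sided, H1: coefficient < 0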

Thanks.




[R] Error in plot.new() : figure margins too large

2007-05-09 Thread gatemaze
Yes, I have already had a look at previous posts, but nothing there really
helped me.
The code is:

postscript(filename, horizontal = FALSE, onefile = FALSE, paper = "special",
           bg = "white", family = "ComputerModern", pointsize = 10)
par(mar = c(5, 4, 0, 0) + 0.1)
plot(x.nor, y.nor, xlim = c(3, 6), ylim = c(20, 90), pch = normal.mark)

gives the error
Error in plot.new() : figure margins too large

Plotting on the screen without calling postscript works just fine.

Any clues? Thanks.




Re: [R] Error in plot.new() : figure margins too large

2007-05-09 Thread gatemaze
On 09/05/07, Prof Brian Ripley [EMAIL PROTECTED] wrote:
>
> On Wed, 9 May 2007, [EMAIL PROTECTED] wrote:
>
> > The code is:
> >
> > postscript(filename, horizontal=FALSE, onefile=FALSE, paper="special",
>
> You have not set a width or height, so please do your homework.


Thanks a lot for that, and to Phil for replying. Just a minor correction to
your post: "You have not set a width AND height." Both seem to be required.
I had tried with only width, thinking the height would be calculated relative
to it, but I was still getting the same error.
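
For the archives, a version of the call that works for me once both
dimensions are given (the sizes here are just an example):

postscript(filename, horizontal = FALSE, onefile = FALSE, paper = "special",
           width = 6, height = 4,  # in inches; both needed with paper = "special"
           bg = "white", family = "ComputerModern", pointsize = 10)
par(mar = c(5, 4, 0, 0) + 0.1)
plot(x.nor, y.nor, xlim = c(3, 6), ylim = c(20, 90), pch = normal.mark)
dev.off()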

> > bg="white", family="ComputerModern", pointsize=10);
> > par(mar=c(5, 4, 0, 0) + 0.1);
> > plot(x.nor, y.nor, xlim=c(3,6), ylim=c(20,90), pch=normal.mark);
> >
> > gives error
> > Error in plot.new() : figure margins too large
> >
> > plotting on the screen without calling postscript works just fine.
> >
> > Any clues? Thanks.
>
> --
> Brian D. Ripley,  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel:  +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK  Fax:  +44 1865 272595




[R] plots - scatterplots - find index of a point in a list

2007-05-08 Thread gatemaze
Hi,

is it possible to find the index of a point on a plot (e.g. a scatterplot) in
an easy way?

E.g.
x <- c(1:5); y <- c(1:5)
plot(x, y)

On the plot, if I move my cursor over a point or click on it, is it
possible to have its index or its exact value printed? Any clues?

Thanks.




Re: [R] plots - scatterplots - find index of a point in a list

2007-05-08 Thread gatemaze
On 08/05/07, John Kane [EMAIL PROTECTED] wrote:
>
> Try ?locator


Thanks. Your tip also led me to another function, ?identify, to add my two
cents.
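
A small sketch of identify() doing exactly this (click points on the open
device, then finish with right-click/Esc; the clicked indices are returned):

x <- c(1:5); y <- c(1:5)
plot(x, y)
idx <- identify(x, y)  # labels each clicked point with its index
idx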

> --- [EMAIL PROTECTED] wrote:
>
> > Hi,
> >
> > is it possible to find the index of a point on a plot (e.g. a
> > scatterplot) in an easy way?
> >
> > E.g.
> > x <- c(1:5); y <- c(1:5)
> > plot(x, y)
> >
> > On the plot, if I move my cursor over a point or click on it, is it
> > possible to have its index or its exact value printed? Any clues?
> >
> > Thanks.
 






Re: [R] general question about use of list

2007-04-30 Thread gatemaze
On 30/04/07, Y G [EMAIL PROTECTED] wrote:
>
> Hi,
>
> is this list only related to R issues, or does it have a broader scope
> covering questions and discussions about statistics? Is there any other
> email list or forum for that?


Well, just to quickly self-reply to my question. From
http://www.r-project.org/posting-guide.html:

*Questions about statistics:* The R mailing lists are primarily intended for
questions and discussion about the R software. However, questions about
statistical methodology are sometimes posted. If the question is well-asked
and of interest to someone on the list, it *may* elicit an informative
up-to-date answer. See also the Usenet groups sci.stat.consult (applied
statistics and consulting) and sci.stat.math (mathematical stat and
probability).

*Basic statistics and classroom homework:* R-help is not intended for these.


Thanks.

