Re: [R] Wilcoxon signed rank test and its requirements

2010-06-26 Thread Atte Tenkanen

Greg Snow wrote on 25.6.2010 at 21.55:

 Let me see if I understand.  You actually have the data for the  
 whole population (the entire piece) but you have some pre-defined  
 sections that you want to see if they differ from the population,  
 or more meaningfully they are different from a randomly selected  
 set of measures.  Is that correct?

Exactly.


 If so, since you have the entire population of interest you can  
 create the actual sampling distribution (or a good approximation of  
 it).  Just take random samples from the population of the given  
 size (matching the subset you are interested in) and calculate the  
 means (or other value of interest), probably 10,000 to 1,000,000  
 samples.  Now compare the value from your predefined subset to the  
 set of random values you generated to see if it is in the tail or not.

Thank you! I will do this.

Is this kind of "Monte Carlo" evaluation (?) often used in
statistics? If it is, do you know of any reference for it?

Atte


 -- 
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Atte Tenkanen
 Sent: Thursday, June 24, 2010 11:04 PM
 To: David Winsemius
 Cc: R mailing list
 Subject: Re: [R] Wilcoxon signed rank test and its requirements

 The values come from this kind of process:
 The musical composition is segmented into so-called 'pitch-class
 segments', and these segments are compared with one reference set
 using a distance function. Only some distance values are possible.
 These distance values can be averaged over music bars, which produces
 a smoother distribution, and the 'comparison curve' that illustrates
 the distances from the reference set through a musical piece becomes
 more readable (see e.g. http://users.utu.fi/attenka/with6.jpg ), but
 I would prefer to use the original values.

 Then, I want to pick only some regions from the piece and test
 whether the values in those regions are higher than the mean of all
 values.

 Atte

 On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:

 Is there anything for me?

 There is a lot of data, n=2418, but there are also a lot of ties.
 My sample n≈250-300


 I do not understand why there should be so many ties. You have not
 described the measurement process or units. ( ... although you offer
 a glimpse without much background later.)

 I would like to test whether the mean of the sample differs
 significantly from the population mean.

 Why? What is the purpose of this investigation? Why should the mean
 of a sample be that important?


 The histogram of the population looks like the attached histogram.
 What test should I use? No choices?

 This distribution comes from a musical piece and the values are
 'tonal distances'.

 http://users.utu.fi/attenka/Hist.png

 That picture does not offer much insight into the features of that
 measurement. It appears to have much more structure than I would
 expect for a sample from a smooth unimodal underlying population.

 --
 David.


 Atte

 On 06/24/2010 12:40 PM, David Winsemius wrote:

 On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:

 Thanks. What I have had to ask is:

 How do you test that the data is symmetric enough?
 If it is not, is it OK to use some data transformation?

 when it is said:

 "The Wilcoxon signed rank test does not assume that the data are
 sampled from a Gaussian distribution. However it does assume that
 the data are distributed symmetrically around the median. If the
 distribution is asymmetrical, the P value will not tell you much
 about whether the median is different than the hypothetical value."

 You are being misled. Simply finding a statement on a statistics
 software website, even one as reputable as Graphpad (???), does not
 mean that it is necessarily true. My understanding (confirmed by
 reviewing "Nonparametric statistical methods for complete and
 censored data" by M. M. Desu and Damaraju Raghavarao) is that the
 Wilcoxon signed-rank test does not require that the underlying
 distributions be symmetric. The above quotation is highly inaccurate.


 To add to what David and others have said, look at the kernel that
 the U-statistic associated with the WSR test uses: the indicator (0/1)
 of xi + xj > 0.  So WSR tests H0: p = 0.5 where p = the probability
 that the average of a randomly chosen pair of values is positive.
 [If there are ties this probably needs to be worded as
 P[xi + xj > 0] = P[xi + xj < 0], i neq j.]

 Frank

 --
 Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University
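
The kernel Frank describes above can be made concrete with a short R sketch. It estimates p as the share of pairs (i < j) whose sum is positive and prints the signed-rank p-value for the same data next to it. The helper name pair_prop_positive, the half-weight given to sums of exactly zero, and the toy data are assumptions made for illustration; none of them come from the thread.

```r
# Hypothetical helper: share of pairs i < j with x[i] + x[j] > 0.
# A sum of exactly zero counts as one half (one possible way to treat ties).
pair_prop_positive <- function(x) {
  s <- outer(x, x, "+")          # matrix of all pairwise sums
  s <- s[upper.tri(s)]           # keep each pair with i < j once
  mean((s > 0) + 0.5 * (s == 0))
}

set.seed(1)
x <- rexp(200) - 0.5             # skewed toy data, assumed for illustration
p_hat <- pair_prop_positive(x)
p_hat                            # estimate of p; H0 says p = 0.5
wilcox.test(x, mu = 0)$p.value   # signed-rank test on the same sample
```

Note that the statistic reported by wilcox.test() also includes the i = j terms, so the two numbers are related but not the same quantity.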


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting

Re: [R] Wilcoxon signed rank test and its requirements

2010-06-26 Thread Atte Tenkanen

Greg Snow wrote on 25.6.2010 at 21.55:

 Let me see if I understand.  You actually have the data for the  
 whole population (the entire piece) but you have some pre-defined  
 sections that you want to see if they differ from the population,  
 or more meaningfully they are different from a randomly selected  
 set of measures.  Is that correct?

 If so, since you have the entire population of interest you can  
 create the actual sampling distribution (or a good approximation of  
 it).  Just take random samples from the population of the given  
 size (matching the subset you are interested in) and calculate the  
 means (or other value of interest), probably 10,000 to 1,000,000  
 samples.  Now compare the value from your predefined subset to the  
 set of random values you generated to see if it is in the tail or not.

Just to check, you mean doing it this way:

t.test(sample(POPUL, length(SAMPLE), replace = FALSE),
       mu = mean(SAMPLE), alternative = "less")

Atte



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-26 Thread Atte Tenkanen

Atte Tenkanen wrote on 26.6.2010 at 5.15:


 Greg Snow wrote on 25.6.2010 at 21.55:

 Let me see if I understand.  You actually have the data for the  
 whole population (the entire piece) but you have some pre-defined  
 sections that you want to see if they differ from the population,  
 or more meaningfully they are different from a randomly selected  
 set of measures.  Is that correct?

 If so, since you have the entire population of interest you can  
 create the actual sampling distribution (or a good approximation  
 of it).  Just take random samples from the population of the given  
 size (matching the subset you are interested in) and calculate the  
 means (or other value of interest), probably 10,000 to 1,000,000  
 samples.  Now compare the value from your predefined subset to the  
 set of random values you generated to see if it is in the tail or  
 not.

 Just to check, you mean doing it this way:

 t.test(sample(POPUL, length(SAMPLE), replace = FALSE),
        mu = mean(SAMPLE), alternative = "less")

No, this way:

t.test(POPUL[sample(1:length(POPUL), length(SAMPLE), replace = FALSE)],
       mu = mean(SAMPLE), alternative = "less")

Atte



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-26 Thread Greg Snow
No, I mean something like this, assuming that the iris dataset contains the full 
population and we want to see if setosa have a different mean than the 
population (the null would be that there is no difference in sepal width 
between species, or that species tells nothing about sepal width):


# means of random subsets of size 50 drawn from the whole population
out1 <- replicate( 10, mean(sample(iris$Sepal.Width, 50)) )
# observed mean of the predefined subset (rows 1:50 are setosa)
obs1 <- mean( iris$Sepal.Width[1:50] )

hist(out1, xlim=range(out1,obs1))
abline(v=obs1)

# proportion of random means beyond the observed one
mean( out1 > obs1 )


I don't have a reference (other than a textbook that defines sampling 
distributions).
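
For readers who want to run the procedure at the scale suggested earlier in the thread (10,000 or more random subsets), here is a minimal sketch along the same lines. The seed, the variable names, the add-one adjustment, and the two-sided variant are assumptions added for illustration, not part of the original example.

```r
# Sketch: Monte Carlo tail probability for a predefined subset,
# with iris standing in for the full population as in the example above.
set.seed(42)                                  # assumed, for reproducibility
pop      <- iris$Sepal.Width
sub_vals <- pop[1:50]                         # the predefined subset (setosa)
n_rep    <- 10000

sim_means <- replicate(n_rep, mean(sample(pop, length(sub_vals))))
obs_mean  <- mean(sub_vals)

# One-sided: how often is a random subset's mean at least as large?
# Adding 1 to numerator and denominator avoids reporting exactly zero.
p_upper <- (sum(sim_means >= obs_mean) + 1) / (n_rep + 1)

# Two-sided: distance from the population mean in either direction.
dev_sim <- abs(sim_means - mean(pop))
dev_obs <- abs(obs_mean - mean(pop))
p_two   <- (sum(dev_sim >= dev_obs) + 1) / (n_rep + 1)

c(p_upper = p_upper, p_two = p_two)
```

sample() draws without replacement by default, which matches picking a random set of measures from a fixed piece.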

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-26 Thread Atte Tenkanen
Thanks! The results were similar to what the t.test p-values show (I have
four samples).
Thank you also for using the replicate function, which I didn't know.
Until now I have just used for-loops, which are not so beautiful... I
don't know about the speed. I have to test that.

Atte


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-26 Thread Daniel Malter

Atte, note the similarity between what Greg described and a bootstrap. The
difference from a true bootstrap is that in Greg's version you subsample the
population (or in other instances the data). This is known as the subsampling
bootstrap and is discussed in Politis, Romano, and Wolf (1999).
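
The practical difference can be seen in a small sketch that draws subsets of the same size without replacement (as in Greg's version) and with replacement (as a bootstrap draws). The iris data as a stand-in population, the subset size, and the number of repetitions are assumptions for illustration only.

```r
set.seed(7)                      # assumed, for reproducibility
pop <- iris$Sepal.Width          # stand-in for the full population
m   <- 50                        # subset size
B   <- 5000                      # number of random draws

# Subsampling: m of the N values WITHOUT replacement.
sub_means  <- replicate(B, mean(sample(pop, m, replace = FALSE)))
# Bootstrap-style draw: m values WITH replacement.
boot_means <- replicate(B, mean(sample(pop, m, replace = TRUE)))

c(sd_subsample = sd(sub_means), sd_bootstrap = sd(boot_means))
```

Because each value can be used only once, the means of subsets drawn without replacement vary less, which is the finite-population effect.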

HTH,
Daniel

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Atte Tenkanen
The values come from this kind of process:
The musical composition is segmented into so-called 'pitch-class segments', and 
these segments are compared with one reference set using a distance function. 
Only some distance values are possible. These distance values can be averaged 
over music bars, which produces a smoother distribution, and the 'comparison 
curve' that illustrates the distances from the reference set through a musical 
piece becomes more readable (see e.g. 
http://users.utu.fi/attenka/with6.jpg ), but I would prefer to use the original 
values.

Then, I want to pick only some regions from the piece and test whether the 
values in those regions are higher than the mean of all values. 

Atte



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Atte Tenkanen

BTW. If there is not so weak test that would be suitable for my purpose 
(because of the ties and the shape of the data), could I proceed this way:

It is also worth comparing different samples taken from the data. Since the 
mean and sd of the data are available, could I approximate p-values using a z- or 
t-test, just to compare several different samples?
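
One way to sketch the z-test idea when the whole population is known is shown below. It assumes the subset behaves like a random draw without replacement, so the standard error carries a finite-population correction; predefined regions of a piece may not satisfy that. The helper name z_test_known_pop and the iris stand-in data are hypothetical.

```r
# Hypothetical helper: z-test of a subset mean against a fully known
# finite population, with the finite-population correction.
z_test_known_pop <- function(sample_vals, pop) {
  N <- length(pop)
  n <- length(sample_vals)
  sigma <- sqrt(mean((pop - mean(pop))^2))          # population sd (divisor N)
  se <- sigma / sqrt(n) * sqrt((N - n) / (N - 1))   # sd of the mean, no replacement
  z <- (mean(sample_vals) - mean(pop)) / se
  c(z = z, p_upper = pnorm(z, lower.tail = FALSE))  # upper-tail p-value
}

res <- z_test_known_pop(iris$Sepal.Width[1:50], iris$Sepal.Width)
res
```

The same function can be applied to each of several subsets to put them on a common scale.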

Atte

 On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:
 
  Is there anything for me?
 
  There is a lot of data, n=2418, but there are also a lot of ties.
  My sample n≈250-300
 
 
 I do not understand why there should be so many ties. You have not  
 described the measurement process or units. ( ... although you offer a 
  
 glipmse without much background  later.)
 
  i would like to test, whether the mean of the sample differ  
  significantly from the population mean.
 
 Why? What is the purpose of this investigation? Why should the mean of 
  
 a sample be that important?
 
 
  The histogram of the population looks like in attached histogram,  
  what test should I use? No choices?
 
  This distribution comes from a musical piece and the values are  
  'tonal distances'.
 
  http://users.utu.fi/attenka/Hist.png
 
 That picture does not offer much insidght into the features of that  
 measurement. It appears to have much more structure than I would  
 expect for a sample from a smooth unimodal underlying population.
 
 -- 
 David.
 
 
  Atte
 
  On 06/24/2010 12:40 PM, David Winsemius wrote:
 
  On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:
 
  Thanks. What I have had to ask is that
 
  how do you test that the data is symmetric enough?
  If it is not, is it ok to use some data transformation?
 
  when it is said:
 
  The Wilcoxon signed rank test does not assume that the data are
  sampled from a Gaussian distribution. However it does assume that 
  
  the
  data are distributed symmetrically around the median. If the
  distribution is asymmetrical, the P value will not tell you much  
 
  about
  whether the median is different than the hypothetical value.
 
  You are being misled. Simply finding a statement on a statistics
  software website, even one as reputable as Graphpad (???), does not
  mean that it is necessarily true. My understanding (confirmed by reviewing
  "Nonparametric Statistical Methods for Complete and Censored Data"
  by M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon signed-rank
  test does not require that the underlying distributions be symmetric. The
  above quotation is highly inaccurate.
 
 
  To add to what David and others have said, look at the kernel that the
  U-statistic associated with the WSR test uses: the indicator (0/1) of
  xi + xj > 0.  So WSR tests H0: p = 0.5, where p = the probability that the
  average of a randomly chosen pair of values is positive.  [If there are
  ties this probably needs to be worded as P[xi + xj > 0] = P[xi + xj < 0],
  i neq j.]
 
  Frank
 
  -- 
  Frank E Harrell Jr   Professor and Chairman   School of Medicine
  Department of Biostatistics   Vanderbilt University


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Joris Meys
As a remark on your histogram: use fewer breaks! This histogram tells
you nothing. An interesting function is ?density, e.g.:

x <- rnorm(250)
hist(x, freq = FALSE)
lines(density(x), col = "red")

See also this presentation, a very nice and short introduction to graphics in R:
http://csg.sph.umich.edu/docs/R/graphics-1.pdf

2010/6/25 Atte Tenkanen atte...@utu.fi:
 Is there anything for me?

 There is a lot of data, n=2418, but there are also a lot of ties.
 My sample n≈250-300

You should think about the central limit theorem. Actually, you can
just use a t-test to compare means, as with those sample sizes the
mean is almost certainly normally distributed.

 I would like to test whether the mean of the sample differs significantly 
 from the population mean.

According to probability theory, this will happen in 5% of cases if
you repeat your sampling infinitely often. But as David asked: why on earth
do you want to test that?

cheers
Joris

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Frank E Harrell Jr
The central limit theorem doesn't help.  It just addresses type I error,
not power.

Frank

On 06/25/2010 04:29 AM, Joris Meys wrote:
 As a remark on your histogram: use fewer breaks! This histogram tells
 you nothing. An interesting function is ?density, e.g.:
 
 x <- rnorm(250)
 hist(x, freq = FALSE)
 lines(density(x), col = "red")
 
 See also this presentation, a very nice and short introduction to graphics in R:
 http://csg.sph.umich.edu/docs/R/graphics-1.pdf
 
 2010/6/25 Atte Tenkanen atte...@utu.fi:
 Is there anything for me?

 There is a lot of data, n=2418, but there are also a lot of ties.
 My sample n≈250-300
 
 You should think about the central limit theorem. Actually, you can
 just use a t-test to compare means, as with those sample sizes the
 mean is almost certainly normally distributed.

 I would like to test whether the mean of the sample differs significantly 
 from the population mean.

 According to probability theory, this will happen in 5% of cases if
 you repeat your sampling infinitely often. But as David asked: why on earth
 do you want to test that?
 
 cheers
 Joris
 


-- 
Frank E Harrell Jr   Professor and Chairman   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Joris Meys
2010/6/25 Frank E Harrell Jr f.harr...@vanderbilt.edu:
 The central limit theorem doesn't help.  It just addresses type I error,
 not power.

 Frank

I don't think I stated otherwise. I am aware of the fact that the
Wilcoxon test has an asymptotic relative efficiency greater than 1 compared
to the t-test in the case of skewed distributions. Apologies if I caused
more confusion.

The problem with the Wilcoxon test is twofold, as far as I understand this
data correctly:
- there are quite a lot of ties
- the Wilcoxon test assumes under the null that the distributions are the
same, not only the location. The influence of unequal variances and/or
shapes of the distribution is enhanced in the case of unequal sample
sizes.

The central limit theorem implies that:
- the t-test will do correct inference in the presence of ties
- unequal variances can be taken into account using the modified
t-test, both in the case of equal and unequal sample sizes

For these reasons, I would personally use the t-test for comparing two
samples from the described population. Your mileage may vary.
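
As a minimal sketch of the modified (Welch) t-test mentioned above: the vectors a and b below are simulated stand-ins with many ties and unequal sizes, not the data from this thread. In R, t.test() applies the Welch correction by default (var.equal = FALSE).

```r
# Simulated stand-ins for two samples with many ties (assumption: the
# real data from the thread are not available here).
set.seed(1)
a <- rpois(250, 3)
b <- rpois(300, 4)

# t.test() defaults to var.equal = FALSE, i.e. the Welch version, which
# allows unequal variances and unequal sample sizes.
res <- t.test(a, b)
res$method
res$p.value
```

Whether this is the right comparison still depends on the question being asked, as discussed elsewhere in the thread.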

Cheers
Joris






-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Frank E Harrell Jr
You still are stating the effect of the central limit theorem
incorrectly.  Please see my previous note.

Frank

On 06/25/2010 10:27 AM, Joris Meys wrote:
 2010/6/25 Frank E Harrell Jr f.harr...@vanderbilt.edu:
 The central limit theorem doesn't help.  It just addresses type I error,
 not power.

 Frank
 
 I don't think I stated otherwise. I am aware of the fact that the
 Wilcoxon test has an asymptotic relative efficiency greater than 1 compared
 to the t-test in the case of skewed distributions. Apologies if I caused
 more confusion.
 
 The problem with the Wilcoxon test is twofold, as far as I understand this
 data correctly:
 - there are quite a lot of ties
 - the Wilcoxon test assumes under the null that the distributions are the
 same, not only the location. The influence of unequal variances and/or
 shapes of the distribution is enhanced in the case of unequal sample
 sizes.
 
 The central limit theorem implies that:
 - the t-test will do correct inference in the presence of ties
 - unequal variances can be taken into account using the modified
 t-test, both in the case of equal and unequal sample sizes
 
 For these reasons, I would personally use the t-test for comparing two
 samples from the described population. Your mileage may vary.
 
 Cheers
 Joris
 


 
 
 


-- 
Frank E Harrell Jr   Professor and Chairman   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Daniel Malter

Atte, I would not be surprised if you got lost and confused by the certainly
interesting methodological discussion that has been going on in this thread.

Since the helpers do not seem to converge/agree, I suggest that you use a
different nonparametric approach: the bootstrap. The important thing about
the bootstrap is that you do not have to be concerned with the questions
that have been discussed in this thread.

In the bootstrap you repeatedly draw samples with replacement from your data
and compute the statistic you are interested in (for you this is the mean).
The beauty of this approach is i) that the bootstrap distribution of the mean
is approximately normal and ii) that you can directly compare the
quantiles/confidence intervals of the bootstrap distributions.

Let's say you have x and y, which both come from Poisson distributions with
relatively low means. Note that this resembles your data in that the
distributions are asymmetric and contain a considerable number of ties.

#set seed for random number generation
set.seed(123)

#simulate x and y (these would be your data)
x=rpois(100,3)
y=rpois(100,4)

#plot histograms for x and y
par(mfcol=c(1,2))
hist(x,breaks=length(unique(x)))
hist(y,breaks=length(unique(y))) 


Now we sample with replacement from x and y (i.e., we draw one observation
from x and one from y, and afterwards we put the drawn observation back into
x and y, respectively). For each bootstrap of x and y, respectively, we
sample exactly as many observations as there are in x and y, respectively
(here 100). We then compute the statistic of interest of this bootstrap
(here the mean). We repeat this process many times (here 1000).


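n=1000 #number of bootstraps to draw
x.boot1=numeric(n) #storage for the bootstrapped means of x
y.boot1=numeric(n) #storage for the bootstrapped means of y
for(i in 1:n){
  #resample with replacement, keeping the original sample size
  x.boot1[i]=mean(sample(x,length(x),replace=TRUE))
  y.boot1[i]=mean(sample(y,length(y),replace=TRUE))
}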

Doing this, we obtain the bootstrap distribution of the mean of x and y,
respectively. Note that the bootstrap distribution is approximately normal
and unbiased (the latter automatically because we bootstrap the mean):

par(mfcol=c(1,2))
hist(x.boot1)
hist(y.boot1)

The simple(st) way of comparing these distributions is by checking whether
their confidence intervals overlap or not. You get the 95-percent confidence
intervals by

quantile(x.boot1,probs=c(0.025,0.975))
quantile(y.boot1,probs=c(0.025,0.975))

If they do not overlap, you would conclude that they are significantly
different. In the one-sample case, you would just check whether the value of
interest is within or outside the confidence interval.
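
A minimal sketch of the one-sample case just described. The reference value pop.mean and the simulated x are assumptions for illustration only, not values from the thread.

```r
set.seed(123)
pop.mean <- 3          # hypothetical reference value (e.g. a population mean)
x <- rpois(100, 3)     # stand-in for the observed sample

n.boot <- 1000
# bootstrap distribution of the sample mean
boot.means <- replicate(n.boot, mean(sample(x, length(x), replace = TRUE)))

# 95-percent percentile interval, then check whether it covers pop.mean
ci <- quantile(boot.means, probs = c(0.025, 0.975))
inside <- pop.mean >= ci[1] && pop.mean <= ci[2]
ci
inside
```

If inside is FALSE, the reference value lies outside the interval and would be judged different from the sample mean at roughly the 5-percent level.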

Finally, note that the little loop that we have programmed to draw the
bootstraps is already implemented in an R package. Using the bootstrap
package, you could draw the bootstraps analogously by:

library(bootstrap)
x.boot2=bootstrap(x,nboot=1000,mean)
y.boot2=bootstrap(y,nboot=1000,mean)

The bootstrapped means are then stored in x.boot2$thetastar and
y.boot2$thetastar.

Hope that helps,
Daniel











-- 
View this message in context: 
http://r.789695.n4.nabble.com/Wilcoxon-signed-rank-test-and-its-requirements-tp2266165p2268801.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Greg Snow
Let me see if I understand.  You actually have the data for the whole 
population (the entire piece) but you have some pre-defined sections that you 
want to see if they differ from the population, or more meaningfully they are 
different from a randomly selected set of measures.  Is that correct?

If so, since you have the entire population of interest you can create the 
actual sampling distribution (or a good approximation of it).  Just take random 
samples from the population of the given size (matching the subset you are 
interested in) and calculate the means (or other value of interest), probably 
10,000 to 1,000,000 samples.  Now compare the value from your predefined subset 
to the set of random values you generated to see if it is in the tail or not.
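
A rough sketch of this resampling idea in R. The population vector pop, the section indices idx, and the number of simulations n.sim are illustrative assumptions; the real tonal-distance values are not reproduced here.

```r
set.seed(42)
# stand-in population: 2418 values with many ties (assumption)
pop <- sample(c(0, 0.25, 0.5, 0.75, 1), 2418, replace = TRUE)
idx <- 1001:1250                 # hypothetical predefined section (250 values)
obs <- mean(pop[idx])            # statistic for the section of interest

n.sim <- 10000
# means of random subsets of the same size, drawn without replacement
sim <- replicate(n.sim, mean(sample(pop, length(idx))))

# share of random subsets whose mean is at least as large as the observed
# one; the +1 terms keep the estimate away from exactly zero
p.upper <- (sum(sim >= obs) + 1) / (n.sim + 1)
p.upper
```

A small p.upper would indicate that the section's mean lies in the upper tail of what randomly selected sets of measures produce.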

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Atte Tenkanen
 Sent: Thursday, June 24, 2010 11:04 PM
 To: David Winsemius
 Cc: R mailing list
 Subject: Re: [R] Wilcoxon signed rank test and its requirements
 
 The values come from this kind of process:
 The musical composition is segmented into so-called 'pitch-class
 segments', and these segments are compared with one reference set using a
 distance function. Only some distance values are possible. These
 distance values can be averaged over bars of music, which produces a smoother
 distribution, and the 'comparison curve' that illustrates the distances
 to the reference set through a musical piece becomes more
 readable (see e.g. http://users.utu.fi/attenka/with6.jpg ), but I
 would prefer to use the original values.
 
 Then I want to pick only some regions of the piece and test whether the
 values in those regions are higher than the mean of all
 values.
 
 Atte
 

Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Thomas Lumley

On Thu, 24 Jun 2010, Atte Tenkanen wrote:


On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:


Thanks. What I have had to ask is that

how do you test that the data is symmetric enough?
If it is not, is it ok to use some data transformation?

when it is said:

The Wilcoxon signed rank test does not assume that the data are
sampled from a Gaussian distribution. However it does assume that
the data are distributed symmetrically around the median. If the
distribution is asymmetrical, the P value will not tell you much
about whether the median is different than the hypothetical value.


You are being misled. Simply finding a statement on a statistics
software website, even one as reputable as Graphpad (???), does not
mean that it is necessarily true. My understanding (confirmed by
reviewing "Nonparametric Statistical Methods for Complete and Censored
Data" by M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon
signed-rank test does not require that the underlying distributions be
symmetric. The above quotation is highly inaccurate.

--
David.


Thanks. Unfortunately, I can't follow the reference at all, but I take this to 
mean that I need not worry about the underlying distribution?

Is there any other authoritative reference where it is stated plainly that the test 
"does not require that the underlying distributions be symmetric or normal"?



The statement from GraphPad is correct, but for a different question.  Let me 
expound.

First let us consider means:

If you have paired samples X1.. Xn and Y1..Yn you could ask if the mean of X is 
equal to the mean of Y, or if the mean of (X-Y) is zero.   These are equivalent 
questions, because of the way the mean is defined.   So the paired t-test, 
which answers the first question, and the one-sample t-test, which answers the 
second question, are equivalent.  They have no assumptions (other than 
sufficient sample size for the means to be Normally distributed).


Now, let us consider medians.
If you have paired samples X1.. Xn and Y1..Yn you could ask if the median of X 
is equal to the median of Y, or if the median of (X-Y) is zero.  The first 
question cannot be answered by any standard test (though there are ways to do it). 
The second is answered by the sign test.  They are not at all equivalent: it 
is possible for the median of X to be larger than the median of Y but the 
median of (X-Y) to be negative.   The non-equivalence is true for essentially 
all statistics except for the mean.
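
A small made-up example of that non-equivalence (the numbers are invented purely for illustration): the median of x exceeds the median of y, yet the median of the paired differences is negative, while the two ways of comparing means agree.

```r
# five invented pairs (x[i], y[i])
x <- c(1, 2, 3, 4, 5)
y <- c(2, 2.5, 0, 0, 6)
d <- x - y              # paired differences: -1.0 -0.5 3.0 4.0 -1.0

median(x)               # 3
median(y)               # 2
median(d)               # -0.5, opposite sign to median(x) - median(y)

mean(x) - mean(y)       # 0.9
mean(d)                 # 0.9, the same question for means
```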

Now, let us consider the Wilcoxon signed-rank test.
This can be characterized precisely as a test of the null hypothesis that the 
median pairwise mean of  X-Y is zero. That is, take all n(n-1)/2 pairs of 
(X-Y)s.  Take the mean of each pair to get n(n-1)/2 pairwise means. Take the 
median of these numbers.  The p-value will be 0.5 one-sided or 1.0 two-sided 
when this median pairwise mean is exactly zero.  The median pairwise mean is 
also sometimes known as the Hodges-Lehmann estimator (though this is strictly 
speaking a more general term).
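
As a sketch, the median pairwise mean described above can be computed directly. The vector d is a simulated stand-in for the paired differences X-Y; the names pairs, pair.means and hl are only illustrative.

```r
set.seed(7)
d <- rnorm(20, mean = 0.3)       # stand-in for the paired differences X - Y

pairs <- combn(length(d), 2)     # all n(n-1)/2 index pairs with i < j
pair.means <- (d[pairs[1, ]] + d[pairs[2, ]]) / 2
hl <- median(pair.means)         # median pairwise mean
hl
```

As far as I know, wilcox.test(d, conf.int = TRUE) reports a closely related "(pseudo)median" estimate, which may differ slightly because of how the pairs are formed.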

As David correctly points out, no assumptions are needed for the Wilcoxon signed-rank 
test to be a test of *this* null hypothesis.   The problem is that this may not be the 
null hypothesis you care about.  As GraphPad correctly points out, the P value will 
not tell you much about whether the *median* is different than the hypothetical 
value because the median is not the same as the median pairwise mean.  It is 
entirely possible for the median difference to be positive and the median pairwise mean 
difference to be zero or negative.

If you assume that the distribution of differences X-Y is symmetric, then the 
Wilcoxon signed-rank test also tests the null hypothesis that the median of X-Y 
is zero (and that the mean of X-Y is zero), because these null hypotheses are 
equivalent for a symmetric distribution.  That's what GraphPad is saying.

You could also assume that the distributions X and Y are stochastically 
ordered.  This basically implies that the direction of difference is the same 
no matter what location statistic you use to measure it. If X was before some 
intervention and Y was afterwards you would basically be assuming that the 
intervention is either beneficial for everyone or harmful for everyone (up to 
measurement error). Under this assumption, the signed rank test also tells you 
reliably about differences in medians.

To some extent this is a philosophical issue.  My preference is to know exactly 
what a test is doing and to make these distinctions.   Many other people, 
including reputable experts like Frank Harrell, believe (I think) that 
simplifying assumptions such as stochastic ordering are a pretty good 
approximation in a lot of situations, so it isn't necessary to always make 
these distinctions.


 -thomas

Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.edu   University of Washington, Seattle


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread Atte Tenkanen
PS.

Maybe I can somehow try to transform the data and check it, for example, using the 
skewness function of the timeDate package? 
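
For illustration, a moment-based sample skewness can also be computed by hand, without relying on any package. The helper skew() and the simulated vectors are assumptions for this sketch only.

```r
# moment-based sample skewness: third central moment / sd^3
skew <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

set.seed(1)
x.sym <- rnorm(1000)    # symmetric stand-in, skewness near 0
x.skw <- rexp(1000)     # right-skewed stand-in, clearly positive skewness

skew(x.sym)
skew(x.skw)
```

A value near zero is consistent with symmetry but does not prove it.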

 Thanks. What I have had to ask is that
  
 how do you test that the data is symmetric enough?
 If it is not, is it ok to use some data transformation?
 
 when it is said:
 
 The Wilcoxon signed rank test does not assume that the data are 
 sampled from a Gaussian distribution. However it does assume that the 
 data are distributed symmetrically around the median. If the 
 distribution is asymmetrical, the P value will not tell you much about 
 whether the median is different than the hypothetical value.
 



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread Atte Tenkanen
Thanks. What I have had to ask is that
 
how do you test that the data is symmetric enough?
If it is not, is it ok to use some data transformation?

when it is said:

The Wilcoxon signed rank test does not assume that the data are sampled from a 
Gaussian distribution. However it does assume that the data are distributed 
symmetrically around the median. If the distribution is asymmetrical, the P 
value will not tell you much about whether the median is different than the 
hypothetical value.

 On Wed, Jun 23, 2010 at 10:27 PM, Atte Tenkanen atte...@utu.fi wrote:
  Hi all,
 
  I have a distribution, and take a sample of it. Then I compare that 
 sample with the mean of the population like here in Wilcoxon signed 
 rank test with continuity correction:
 
  wilcox.test(Sample,mu=mean(All), alt="two.sided")
 
         Wilcoxon signed rank test with continuity correction
 
  data:  AlphaNoteOnsetDists
  V = 63855, p-value = 0.0002093
  alternative hypothesis: true location is not equal to 0.4115136
 
  wilcox.test(Sample,mu=mean(All), alt = "greater")
 
         Wilcoxon signed rank test with continuity correction
 
  data:  AlphaNoteOnsetDists
  V = 63855, p-value = 0.0001047
  alternative hypothesis: true location is greater than 0.4115136
 
  What assumptions are needed for the population?
 
 wikipedia says:
 The Wilcoxon signed-rank test is a _non-parametric_ statistical
 hypothesis test for... 
 it also talks about the assumptions.
 
  What can we say according to these results?
  The p-value for "less" is 0.999.
 
 That the p-values for "less" and "greater" seem to sum to one, and that
 the p-value for "greater" is half of that for two-sided. You shouldn't
 ask what we can say. You should ask yourself "What was the question,
 and is this test giving me an answer to that question?"
 
 Cheers
 Joris
 
 -- 
 Joris Meys
 Statistical consultant
 
 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control
 
 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread Joris Meys
One way of looking at it is doing a sign test after subtraction of
the mean. For symmetric data sets the mean equals the median, so you expect
to have about as many values above as below zero after centering. There is a
sign test somewhere in one of the packages, but it's easily done using
binom.test as well:

 set.seed(12345)
 x1 <- rnorm(100)
 x2 <- rpois(100,2)

  binom.test((sum(x1-mean(x1)>0)),length(x1))

Exact binomial test

data:  (sum(x1 - mean(x1) > 0)) and length(x1)
number of successes = 56, number of trials = 100, p-value = 0.2713
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.4571875 0.6591640
sample estimates:
probability of success
  0.56

  binom.test((sum(x2-mean(x2)>0)),length(x2))

Exact binomial test

data:  (sum(x2 - mean(x2) > 0)) and length(x2)
number of successes = 37, number of trials = 100, p-value = 0.01203
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.2755666 0.4723516
sample estimates:
probability of success
  0.37

Cheers
Joris

On Thu, Jun 24, 2010 at 4:16 AM, Atte Tenkanen atte...@utu.fi wrote:
 PS.

 Maybe I can somehow try to transform the data and check it, for example, using 
 the skewness function of the timeDate package?





-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread David Winsemius


On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:


Thanks. What I have had to ask is that

how do you test that the data is symmetric enough?
If it is not, is it ok to use some data transformation?

when it is said:

The Wilcoxon signed rank test does not assume that the data are  
sampled from a Gaussian distribution. However it does assume that  
the data are distributed symmetrically around the median. If the  
distribution is asymmetrical, the P value will not tell you much  
about whether the median is different than the hypothetical value.


You are being misled. Simply finding a statement on a statistics
software website, even one as reputable as Graphpad (???), does not
mean that it is necessarily true. My understanding (confirmed by
reviewing "Nonparametric Statistical Methods for Complete and Censored
Data" by M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon
signed-rank test does not require that the underlying distributions be
symmetric. The above quotation is highly inaccurate.


--
David.






Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread Frank E Harrell Jr

On 06/24/2010 12:40 PM, David Winsemius wrote:


On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:


Thanks. What I have had to ask is that

how do you test that the data is symmetric enough?
If it is not, is it ok to use some data transformation?

when it is said:

The Wilcoxon signed rank test does not assume that the data are
sampled from a Gaussian distribution. However it does assume that the
data are distributed symmetrically around the median. If the
distribution is asymmetrical, the P value will not tell you much about
whether the median is different than the hypothetical value.


You are being misled. Simply finding a statement on a statistics
software website, even one as reputable as Graphpad (???), does not mean
that it is necessarily true. My understanding (confirmed reviewing
Nonparametric statistical methods for complete and censored data by M.
M. Desu, Damaraju Raghavarao, is that the Wilcoxon signed-rank test does
not require that the underlying distributions be symmetric. The above
quotation is highly inaccurate.



To add to what David and others have said, look at the kernel that the
U-statistic associated with the WSR test uses: the indicator (0/1) of
xi + xj > 0.  So WSR tests H0: p = 0.5, where p = the probability that
the average of a randomly chosen pair of values is positive.  [If there
are ties this probably needs to be worded as P[xi + xj > 0] = P[xi + xj
< 0], i neq j.]
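
The pairwise-sum view described above can be checked numerically. The following is only a sketch on made-up data (the sample x, its size, and the seed are assumptions for illustration, not anything from this thread). It counts the positive pairwise sums and compares the count with the V statistic that wilcox.test() reports. Note that V also counts the i = j terms, whereas the probability p above refers to pairs with i neq j.

```r
# Made-up continuous data (no ties, no zeros), tested against mu = 0.
set.seed(1)
x <- rexp(30) - 0.7

# All pairwise sums x_i + x_j, as a matrix.
s <- outer(x, x, "+")

# V counts the positive sums over pairs with i <= j (diagonal included).
v_by_hand  <- sum(s[upper.tri(s, diag = TRUE)] > 0)
v_reported <- unname(wilcox.test(x, mu = 0)$statistic)

# Estimate of p = P[xi + xj > 0] over distinct pairs (i < j) only.
p_hat <- mean(s[upper.tri(s)] > 0)

c(by_hand = v_by_hand, reported = v_reported, p_hat = p_hat)
```

With continuous data the two V values should agree; with ties or zero differences wilcox.test() uses midranks and drops zeros, so this simple count would no longer match exactly.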


Frank

--
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread Atte Tenkanen
 On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:
 
  Thanks. What I have had to ask is that
 
  how do you test that the data is symmetric enough?
  If it is not, is it ok to use some data transformation?
 
  when it is said:
 
  The Wilcoxon signed rank test does not assume that the data are  
  sampled from a Gaussian distribution. However it does assume that  
  the data are distributed symmetrically around the median. If the  
  distribution is asymmetrical, the P value will not tell you much  
  about whether the median is different than the hypothetical value.
 
 You are being misled. Simply finding a statement on a statistics  
 software website, even one as reputable as Graphpad (???), does not  
 mean that it is necessarily true. My understanding (confirmed  
 reviewing Nonparametric statistical methods for complete and censored 
  
 data by M. M. Desu, Damaraju Raghavarao, is that the Wilcoxon signed- 
 
 rank test does not require that the underlying distributions be  
 symmetric. The above quotation is highly inaccurate.
 
 -- 
 David.

Thanks. Unfortunately, I can't follow the reference at all, but do I read
this correctly that I can be carefree as far as the underlying distribution
is concerned?

Is there any other authoritative reference where it is stated plainly that
the test does not require the underlying distributions to be symmetric or
normal?

Atte


 
 


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread Joris Meys
I do agree that one should not rely solely on sources like Wikipedia
and Graphpad, although they contain a lot of valuable information.

That said, it is not too difficult to illustrate why, in the case of
the one-sample signed rank test, the differences should not be too far
from symmetrical. It just needs some reflection on how the
statistic is calculated. If you have an asymmetrical distribution, you
have a lot of small differences with a negative sign and a lot of
large differences with a positive sign if you test against the median
or mean. Hence the sum of ranks for one side will be higher than for
the other, leading eventually to a significant result.

An extreme example :

 set.seed(100)
 y <- rnorm(100,1,2)^2
 wilcox.test(y,mu=median(y))

Wilcoxon signed rank test with continuity correction

data:  y
V = 3240.5, p-value = 0.01396
alternative hypothesis: true location is not equal to 1.829867

 wilcox.test(y,mu=mean(y))

Wilcoxon signed rank test with continuity correction

data:  y
V = 1763, p-value = 0.008837
alternative hypothesis: true location is not equal to 5.137409

Which brings us to the question of what location is actually tested in
the Wilcoxon test. For the measure of location to be the mean (or
median), one has to assume that the distribution of the differences is
roughly symmetrical, which implies your data have to be distributed
somewhat symmetrically. The test is robust against violations of this
implicit assumption, but in more extreme cases skewness does matter.
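
To see how much skewness matters beyond a single draw, one can repeat the experiment many times. The sketch below is an illustration under assumed settings (exponential versus normal data, n = 100, 2000 replicates, a 0.05 cut-off; none of these come from the thread). Each sample is tested against the true median of its generating distribution, so a test of the median alone would be expected to reject in about 5% of replicates.

```r
set.seed(42)
n_rep <- 2000   # number of simulated samples (assumed)
n     <- 100    # sample size, as in the example above

# Share of replicates with p < 0.05 when mu is the TRUE median of rgen.
reject_rate <- function(rgen, true_median) {
  p <- replicate(n_rep, wilcox.test(rgen(n), mu = true_median)$p.value)
  mean(p < 0.05)
}

skewed    <- reject_rate(function(k) rexp(k),  log(2))  # right-skewed
symmetric <- reject_rate(function(k) rnorm(k), 0)       # symmetric

c(skewed = skewed, symmetric = symmetric)
```

If the rejection rate for the skewed case comes out well above the nominal level while the symmetric case stays near it, that supports reading the test as one about the pseudo-median rather than the median when the data are asymmetric.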

Cheers
Joris

On Thu, Jun 24, 2010 at 7:40 PM, David Winsemius dwinsem...@comcast.net wrote:


 You are being misled. Simply finding a statement on a statistics software
 website, even one as reputable as Graphpad (???), does not mean that it is
 necessarily true. My understanding (confirmed reviewing Nonparametric
 statistical methods for complete and censored data by M. M. Desu, Damaraju
 Raghavarao, is that the Wilcoxon signed-rank test does not require that the
 underlying distributions be symmetric. The above quotation is highly
 inaccurate.

 --
 David.



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread David Winsemius


On Jun 24, 2010, at 6:09 PM, Joris Meys wrote:


I do agree that one should not trust solely on sources like wikipedia
and graphpad, although they contain a lot of valuable information.

This said, it is not too difficult to illustrate why, in the case of
the one-sample signed rank test,


That is a key point. I was assuming that you were using the paired  
sample version of the WSRT and I may have been misleading the OP. For  
the one-sample situation, the assumption of symmetry is needed but for  
the paired sampling version of the test, the location shift becomes  
the tested hypothesis, and no assumptions about the form of the  
hypothesis are made except that they be the same. Any consideration of  
median or mean (which will be the same in the case of symmetric  
distributions) gets lost in the paired test case.


--
David.







Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread Joris Meys
On Fri, Jun 25, 2010 at 12:17 AM, David Winsemius
dwinsem...@comcast.net wrote:

 On Jun 24, 2010, at 6:09 PM, Joris Meys wrote:

 I do agree that one should not trust solely on sources like wikipedia
 and graphpad, although they contain a lot of valuable information.

 This said, it is not too difficult to illustrate why, in the case of
 the one-sample signed rank test,

 That is a key point. I was assuming that you were using the paired sample
 version of the WSRT and I may have been misleading the OP. For the
 one-sample situation, the assumption of symmetry is needed but for the
 paired sampling version of the test, the location shift becomes the tested
 hypothesis, and no assumptions about the form of the hypothesis are made
 except that they be the same.

I believe you mean the form of the distributions. The assumption that
the distributions of both samples are the same (or similar, it is a
robust test) implies that the differences x_i - y_i are more or less
symmetrically distributed. The key point here is that we're not talking
about the distribution of the populations/samples (as done in the OP) but
about the distribution of the differences. I may not have been clear
enough on that one.
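
A small sketch of that point, using made-up data (the vectors x and y, their size, and the seed are assumptions for illustration): the paired call to wilcox.test() works on the differences x - y, so it gives the same result as the one-sample test applied to those differences, however skewed x and y themselves are.

```r
set.seed(7)
x <- rexp(40)   # skewed measurements (hypothetical)
y <- rexp(40)   # same distribution as x, so x - y is symmetric about 0

paired     <- wilcox.test(x, y, paired = TRUE)
one_sample <- wilcox.test(x - y, mu = 0)

c(paired_V = unname(paired$statistic),
  one_sample_V = unname(one_sample$statistic),
  paired_p = paired$p.value,
  one_sample_p = one_sample$p.value)
```

Here x and y are generated independently only to keep the example short; the equivalence between the two calls does not depend on that.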

Cheers
Joris






-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread David Winsemius


On Jun 24, 2010, at 6:42 PM, Joris Meys wrote:


On Fri, Jun 25, 2010 at 12:17 AM, David Winsemius
dwinsem...@comcast.net wrote:


On Jun 24, 2010, at 6:09 PM, Joris Meys wrote:

I do agree that one should not trust solely on sources like wikipedia
and graphpad, although they contain a lot of valuable information.

This said, it is not too difficult to illustrate why, in the case of
the one-sample signed rank test,


That is a key point. I was assuming that you were using the paired sample
version of the WSRT and I may have been misleading the OP. For the
one-sample situation, the assumption of symmetry is needed but for the
paired sampling version of the test, the location shift becomes the tested
hypothesis, and no assumptions about the form of the hypothesis are made
except that they be the same.


I believe you mean the form of the distributions. The assumption that
the distributions of both samples are the same (or similar, it is a
robust test) implies that the differences x_i - y_i are more or less
symmetrically distributed. Key point here that we're not talking about
the distribution of the populations/samples (as done in the OP) but
about the distribution of the difference. I may not have been clear
enough on that one.


What I meant about different hypotheses was that in the single-sample
case the H0 was mean (or median) = mu_pop and in the paired two-sample
case the H0 was mean(distr_A_i - distr_B_i) = 0. And yes, I did miss the
OP's point. My apologies.


--
David.






Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread Atte Tenkanen
Is there anything for me?

There is a lot of data, n=2418, but there are also a lot of ties.
My sample n≈250-300

I would like to test whether the mean of the sample differs significantly
from the population mean.

The histogram of the population looks like the attached histogram. What test
should I use? No choices?

This distribution comes from a musical piece and the values are 'tonal 
distances'.

http://users.utu.fi/attenka/Hist.png

Atte




Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread David Winsemius




On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:


Is there anything for me?

There is a lot of data, n=2418, but there are also a lot of ties.
My sample n≈250-300



I do not understand why there should be so many ties. You have not
described the measurement process or units (although you offer a
glimpse, without much background, later).


i would like to test, whether the mean of the sample differ  
significantly from the population mean.


Why? What is the purpose of this investigation? Why should the mean of  
a sample be that important?




The histogram of the population looks like in attached histogram,  
what test should I use? No choices?


This distribution comes from a musical piece and the values are  
'tonal distances'.


http://users.utu.fi/attenka/Hist.png


That picture does not offer much insight into the features of that
measurement. It appears to have much more structure than I would
expect for a sample from a smooth unimodal underlying population.


--
David.







[R] Wilcoxon signed rank test and its requirements

2010-06-23 Thread Atte Tenkanen
Hi all,

I have a distribution, and take a sample of it. Then I compare that sample with 
the mean of the population like here in Wilcoxon signed rank test with 
continuity correction:

 wilcox.test(Sample,mu=mean(All), alt="two.sided")

Wilcoxon signed rank test with continuity correction

data:  AlphaNoteOnsetDists 
V = 63855, p-value = 0.0002093
alternative hypothesis: true location is not equal to 0.4115136 

 wilcox.test(Sample,mu=mean(All), alt = "greater")

Wilcoxon signed rank test with continuity correction

data:  AlphaNoteOnsetDists 
V = 63855, p-value = 0.0001047
alternative hypothesis: true location is greater than 0.4115136 

What assumptions are needed for the population?
What can we say according to these results?
The p-value for "less" is 0.999.

Thanks in advance,

Atte

Atte Tenkanen
University of Turku, Finland
Department of Musicology
+35823335278
http://users.utu.fi/attenka/



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-23 Thread Joris Meys
On Wed, Jun 23, 2010 at 10:27 PM, Atte Tenkanen atte...@utu.fi wrote:
 Hi all,

 I have a distribution, and take a sample of it. Then I compare that sample 
 with the mean of the population like here in Wilcoxon signed rank test with 
 continuity correction:

 wilcox.test(Sample,mu=mean(All), alt="two.sided")

        Wilcoxon signed rank test with continuity correction

 data:  AlphaNoteOnsetDists
 V = 63855, p-value = 0.0002093
 alternative hypothesis: true location is not equal to 0.4115136

 wilcox.test(Sample,mu=mean(All), alt = "greater")

        Wilcoxon signed rank test with continuity correction

 data:  AlphaNoteOnsetDists
 V = 63855, p-value = 0.0001047
 alternative hypothesis: true location is greater than 0.4115136

 What assumptions are needed for the population?

Wikipedia says:
"The Wilcoxon signed-rank test is a _non-parametric_ statistical
hypothesis test for..."
It also talks about the assumptions.

 What can we say according these results?
 p-value for the less is 0.999.

That the p-values for "less" and "greater" seem to sum to one, and that
the p-value for "greater" is half of that for "two-sided". You shouldn't
ask what we can say. You should ask yourself "What was the question,
and is this test giving me an answer to that question?"
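
The arithmetic behind that remark can be reproduced on any sample. Below is a minimal sketch on made-up data (the sample x, its size, and the helper p_value are assumptions for illustration). It uses the normal approximation without continuity correction, where the relations hold exactly; with the continuity correction used in the output above they hold only approximately.

```r
set.seed(3)
x <- rnorm(60, mean = 0.3)   # hypothetical sample, tested against mu = 0

# p-value for a given alternative, normal approximation, no correction.
p_value <- function(alt) {
  wilcox.test(x, mu = 0, alternative = alt,
              exact = FALSE, correct = FALSE)$p.value
}

p_two     <- p_value("two.sided")
p_less    <- p_value("less")
p_greater <- p_value("greater")

c(two.sided = p_two, less = p_less, greater = p_greater)
p_less + p_greater            # the two tails are complementary
2 * min(p_less, p_greater)    # doubling the smaller tail gives the two-sided value
```

So the three p-values carry the same information about the data; which one is relevant depends on the question asked before looking at them.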

Cheers
Joris

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
