Re: [R] Wilcoxon signed rank test and its requirements
Greg Snow wrote on 25.6.2010 at 21:55:

Greg: Let me see if I understand. You actually have the data for the whole population (the entire piece), but you have some pre-defined sections, and you want to see whether they differ from the population or, more meaningfully, whether they differ from a randomly selected set of measures. Is that correct?

Atte: Exactly.

Greg: If so, since you have the entire population of interest, you can create the actual sampling distribution (or a good approximation of it). Just take random samples of the given size from the population (matching the subset you are interested in) and calculate the means (or other value of interest), probably 10,000 to 1,000,000 samples. Now compare the value from your predefined subset to the set of random values you generated to see if it is in the tail or not.

Atte: Thank you! I will do this. Is this kind of Monte Carlo evaluation often used in statistics? If it is, do you know any reference for it?

Atte

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Atte Tenkanen Sent: Thursday, June 24, 2010 11:04 PM To: David Winsemius Cc: R mailing list Subject: Re: [R] Wilcoxon signed rank test and its requirements

The values come from this kind of process: The musical composition is segmented into so-called 'pitch-class segments', and these segments are compared with one reference set using a distance function. Only some distance values are possible. These distance values can be averaged over bars of music, which produces a smoother distribution, and the 'comparison curve' that illustrates the distances from the reference set through a musical piece then becomes more readable (see e.g. http://users.utu.fi/attenka/with6.jpg ), but I would prefer to use the original values. Then I want to pick only some regions from the piece and test whether the values in those regions are higher than the mean of all values.

Atte

On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:

Atte: Is there anything for me? There is a lot of data, n=2418, but there are also a lot of ties. My sample n≈250-300

David: I do not understand why there should be so many ties. You have not described the measurement process or units. ( ... although you offer a glimpse without much background later.)

Atte: I would like to test whether the mean of the sample differs significantly from the population mean.

David: Why? What is the purpose of this investigation? Why should the mean of a sample be that important?

Atte: The histogram of the population looks like the attached histogram. What test should I use? No choices? This distribution comes from a musical piece and the values are 'tonal distances'. http://users.utu.fi/attenka/Hist.png

David: That picture does not offer much insight into the features of that measurement. It appears to have much more structure than I would expect for a sample from a smooth unimodal underlying population.

-- David.

On 06/24/2010 12:40 PM, David Winsemius wrote:
On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:

Atte: Thanks. What I have to ask is: how do you test that the data are symmetric enough? If they are not, is it OK to use some data transformation? I ask because it is said: "The Wilcoxon signed rank test does not assume that the data are sampled from a Gaussian distribution. However it does assume that the data are distributed symmetrically around the median. If the distribution is asymmetrical, the P value will not tell you much about whether the median is different than the hypothetical value."

David: You are being misled. Simply finding a statement on a statistics software website, even one as reputable as Graphpad (???), does not mean that it is necessarily true. My understanding (confirmed by reviewing "Nonparametric Statistical Methods for Complete and Censored Data" by M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon signed-rank test does not require that the underlying distributions be symmetric. The above quotation is highly inaccurate.

Frank: To add to what David and others have said, look at the kernel that the U-statistic associated with the WSR test uses: the indicator (0/1) of xi + xj > 0. So WSR tests H0: p = 0.5, where p = the probability that the average of a randomly chosen pair of values is positive. [If there are ties this probably needs to be worded as P[xi + xj > 0] = P[xi + xj < 0], i neq j.]

Frank

-- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting
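Frank's description of the U-statistic kernel can be made concrete with a few lines of R. This is only a minimal sketch: the vector `x` is made-up illustration data (not the tonal distances from the thread), and the variable names are hypothetical.

```r
# Made-up illustration data (an assumption; not the thread's tonal distances)
x <- c(-1.5, -0.5, 0.25, 1, 2, 3.5)

# All index pairs with i < j, and the corresponding pairwise sums
pairs <- combn(length(x), 2)
pair_sums <- x[pairs[1, ]] + x[pairs[2, ]]

# Sample estimate of p = P[xi + xj > 0], i neq j (the kernel is the 0/1 indicator)
p_hat <- mean(pair_sums > 0)

# The two tail proportions compared in the "ties" wording of the null
p_pos <- mean(pair_sums > 0)
p_neg <- mean(pair_sums < 0)

# With no ties and no zeros, the signed-rank statistic V counts the positive
# sums over pairs i <= j, i.e. the i < j pairs plus the positive x themselves
v_stat <- sum(pair_sums > 0) + sum(x > 0)
v_stat == unname(wilcox.test(x)$statistic)
```

The last line is only a consistency check against `wilcox.test()`; it relies on the data having no ties and no zeros, which is exactly the situation the thread's data do not satisfy.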
Re: [R] Wilcoxon signed rank test and its requirements
Greg Snow wrote on 25.6.2010 at 21:55:

Greg: If so, since you have the entire population of interest, you can create the actual sampling distribution (or a good approximation of it). Just take random samples of the given size from the population (matching the subset you are interested in) and calculate the means (or other value of interest), probably 10,000 to 1,000,000 samples. Now compare the value from your predefined subset to the set of random values you generated to see if it is in the tail or not.

Atte: Let me check, so you mean doing it this way:

    t.test(sample(POPUL, length(SAMPLE), replace = FALSE), mu = mean(SAMPLE), alt = "less")

Atte
Re: [R] Wilcoxon signed rank test and its requirements
Atte Tenkanen wrote on 26.6.2010 at 5:15:

Atte (earlier): Let me check, so you mean doing it this way:

    t.test(sample(POPUL, length(SAMPLE), replace = FALSE), mu = mean(SAMPLE), alt = "less")

Atte: No, this way:

    t.test(POPUL[sample(1:length(POPUL), length(SAMPLE), replace = FALSE)], mu = mean(SAMPLE), alt = "less")

Atte
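The two ways of drawing the random subset that Atte writes out (sampling the values directly, or sampling indices and then subsetting) can be compared side by side. A minimal sketch follows, assuming a made-up stand-in for `POPUL`; the sizes 2418 and 250 are taken from the thread, everything else is hypothetical.

```r
# Made-up stand-in for the population of tonal distances (an assumption)
set.seed(1)
POPUL <- round(rnorm(2418, mean = 5, sd = 2))
n <- 250  # roughly the sample size mentioned in the thread

# Form 1: sample the values directly
set.seed(42)
a <- sample(POPUL, n, replace = FALSE)

# Form 2: sample the indices, then subset
set.seed(42)
b <- POPUL[sample(1:length(POPUL), n, replace = FALSE)]

# Same seed, same draw: both forms select the same elements
identical(a, b)
```

As far as the documented behaviour of `sample()` goes, a vector of length greater than one is sampled by index internally, so the two calls should be interchangeable; the difference only matters when `POPUL` could have length one.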
Re: [R] Wilcoxon signed rank test and its requirements
No, I mean something like this, assuming that the iris dataset contains the full population and we want to see if setosa has a different mean than the population (the null would be that there is no difference in sepal width between species, or that species tells nothing about sepal width):

    out1 <- replicate( 10, mean(sample(iris$Sepal.Width, 50)) )
    obs1 <- mean( iris$Sepal.Width[1:50] )
    hist(out1, xlim=range(out1,obs1))
    abline(v=obs1)
    mean( out1 >= obs1 )

I don't have a reference (other than a textbook that defines sampling distributions).

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

From: Atte Tenkanen [mailto:atte...@utu.fi] Sent: Friday, June 25, 2010 10:08 PM To: Atte Tenkanen Cc: Greg Snow; David Winsemius; R mailing list Subject: Re: [R] Wilcoxon signed rank test and its requirements

Atte Tenkanen wrote on 26.6.2010 at 5:15:

Atte (earlier): Let me check, so you mean doing it this way:

    t.test(sample(POPUL, length(SAMPLE), replace = FALSE), mu = mean(SAMPLE), alt = "less")

Atte: No, this way:

    t.test(POPUL[sample(1:length(POPUL), length(SAMPLE), replace = FALSE)], mu = mean(SAMPLE), alt = "less")

Atte
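Greg's procedure (draw many random subsets of the same size, then see where the observed subset mean falls) can be wrapped in a small helper. This is a sketch under stated assumptions: the function name `mc_tail` and its arguments are hypothetical, iris again plays the role of the full population, and 10,000 replicates is simply the lower end of the range Greg mentions.

```r
# Hypothetical helper: compare the mean of a predefined subset with the
# means of random subsets of the same size drawn without replacement
mc_tail <- function(population, subset_idx, n_rep = 10000) {
  obs <- mean(population[subset_idx])
  n <- length(subset_idx)
  sims <- replicate(n_rep, mean(sample(population, n)))
  list(observed = obs,
       sims = sims,
       p_upper = mean(sims >= obs),  # share of random means at least as large
       p_lower = mean(sims <= obs))  # share of random means at least as small
}

set.seed(2010)
res <- mc_tail(iris$Sepal.Width, 1:50)  # rows 1:50 are the setosa flowers

res$observed
res$p_upper
res$p_lower
```

Reporting both tail proportions leaves the choice of a one-sided or two-sided reading to the analyst; for Atte's question ("are the values in these regions higher than the mean of all values") the upper tail is the relevant one.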
Re: [R] Wilcoxon signed rank test and its requirements
Thanks! The results were similar to what the t.test p-values show (I have four samples). Thank you also for using the replicate function, which I didn't know. Until now I have just used for loops, which are not so beautiful... I don't know about the speed; I will have to test that.

Atte

Greg Snow wrote on 26.6.2010 at 23:30:

Greg: No, I mean something like this, assuming that the iris dataset contains the full population and we want to see if setosa has a different mean than the population (the null would be that there is no difference in sepal width between species, or that species tells nothing about sepal width):

    out1 <- replicate( 10, mean(sample(iris$Sepal.Width, 50)) )
    obs1 <- mean( iris$Sepal.Width[1:50] )
    hist(out1, xlim=range(out1,obs1))
    abline(v=obs1)
    mean( out1 >= obs1 )

I don't have a reference (other than a textbook that defines sampling distributions).
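On the for-loop versus `replicate()` question Atte raises, the two can be checked against each other directly. A minimal sketch, assuming iris as stand-in data and an arbitrary 1,000 repetitions:

```r
pop <- iris$Sepal.Width  # stand-in population (an assumption)
n_rep <- 1000            # arbitrary number of repetitions

# Explicit for loop with a preallocated result vector
set.seed(1)
out_loop <- numeric(n_rep)
for (i in seq_len(n_rep)) {
  out_loop[i] <- mean(sample(pop, 50))
}

# The same computation with replicate()
set.seed(1)
out_rep <- replicate(n_rep, mean(sample(pop, 50)))

# Both consume the random number stream in the same order
all.equal(out_loop, out_rep)

# Timing comparison; results depend on the machine, so none are quoted here
system.time(for (i in seq_len(n_rep)) out_loop[i] <- mean(sample(pop, 50)))
system.time(replicate(n_rep, mean(sample(pop, 50))))
```

Since `replicate()` is a thin wrapper around `sapply()`, the main gain is usually readability rather than speed, but that is worth measuring on the actual data.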
Re: [R] Wilcoxon signed rank test and its requirements
Atte, note the similarity between what Greg described and a bootstrap. The difference from a true bootstrap is that in Greg's version you subsample the population (or, in other instances, the data). This is known as the subsampling bootstrap and is discussed in Politis, Romano, and Wolf (1999). HTH, Daniel -- View this message in context: http://r.789695.n4.nabble.com/Wilcoxon-signed-rank-test-and-its-requirements-tp2266165p2269775.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
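The distinction Daniel draws can be seen in the arguments to `sample()`. The following is only a sketch, assuming iris as stand-in data, an arbitrary subsample size of 50, and an arbitrary 2,000 resamples; it is not taken from Politis, Romano, and Wolf.

```r
pop <- iris$Sepal.Width  # stand-in data (an assumption for illustration)
n_rep <- 2000            # arbitrary number of resamples

set.seed(1)
# Subsampling: draw fewer values than are available, without replacement
sub_means <- replicate(n_rep, mean(sample(pop, 50, replace = FALSE)))

# Classical bootstrap: draw as many values as are available, with replacement
boot_means <- replicate(n_rep, mean(sample(pop, length(pop), replace = TRUE)))

# Spread of the two resampling distributions of the mean
c(subsample = sd(sub_means), bootstrap = sd(boot_means))
```

The two distributions answer different questions: the subsample means describe how a subset of 50 values varies, while the bootstrap means describe the variability of the mean of the full data set.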
Re: [R] Wilcoxon signed rank test and its requirements
The values come from this kind of process: The musical composition is segmented into so-called 'pitch-class segments', and these segments are compared with one reference set using a distance function. Only some distance values are possible. These distance values can be averaged over bars of music, which produces a smoother distribution, and the 'comparison curve' that illustrates the distances from the reference set through a musical piece then becomes more readable (see e.g. http://users.utu.fi/attenka/with6.jpg ), but I would prefer to use the original values. Then I want to pick only some regions from the piece and test whether the values in those regions are higher than the mean of all values.

Atte

On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:

Atte: Is there anything for me? There is a lot of data, n=2418, but there are also a lot of ties. My sample n≈250-300

David: I do not understand why there should be so many ties. You have not described the measurement process or units. ( ... although you offer a glimpse without much background later.)

Atte: I would like to test whether the mean of the sample differs significantly from the population mean.

David: Why? What is the purpose of this investigation? Why should the mean of a sample be that important?

Atte: The histogram of the population looks like the attached histogram. What test should I use? No choices? This distribution comes from a musical piece and the values are 'tonal distances'. http://users.utu.fi/attenka/Hist.png

David: That picture does not offer much insight into the features of that measurement. It appears to have much more structure than I would expect for a sample from a smooth unimodal underlying population.

-- David.
Re: [R] Wilcoxon signed rank test and its requirements
BTW, if there is no test with weak enough requirements to suit my purpose (because of the ties and the shape of the data), could I proceed this way? It is also worth comparing different samples taken from the data. Since the mean and SD of the data are available, could I approximate p-values using a z- or t-test, just to compare several different samples?

Atte
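The z-test approximation Atte asks about can be sketched as follows. The helper `z_from_population` is hypothetical, iris stands in for the full data set, and the calculation treats each subset as a simple random sample with a known population mean and SD, which is an assumption that may not hold for contiguous regions of a musical piece.

```r
# Hypothetical helper: z statistic and p-values for a subset mean, using the
# known population mean and standard deviation
z_from_population <- function(x, pop_mean, pop_sd) {
  se <- pop_sd / sqrt(length(x))
  z <- (mean(x) - pop_mean) / se
  c(z = z,
    p_upper = pnorm(z, lower.tail = FALSE),  # one-sided: subset mean is higher
    p_two_sided = 2 * pnorm(-abs(z)))
}

pop <- iris$Sepal.Width  # stand-in for the full data set (an assumption)
res <- z_from_population(pop[1:50], mean(pop), sd(pop))
res
```

Because the same population mean and SD are used for every subset, the resulting z values are at least comparable across subsets, which seems to be the use Atte has in mind.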
Re: [R] Wilcoxon signed rank test and its requirements
As a remark on your histogram: use fewer breaks! This histogram tells you nothing. An interesting function is ?density, e.g.:

    x <- rnorm(250)
    hist(x, freq = FALSE)
    lines(density(x), col = "red")

See also this presentation, a very nice and short introduction to graphics in R: http://csg.sph.umich.edu/docs/R/graphics-1.pdf

2010/6/25 Atte Tenkanen atte...@utu.fi:

Atte: Is there anything for me? There is a lot of data, n=2418, but there are also a lot of ties. My sample n≈250-300

Joris: You should think about the central limit theorem. Actually, you can just use a t-test to compare means, as with those sample sizes the mean is almost certainly normally distributed.

Atte: I would like to test whether the mean of the sample differs significantly from the population mean.

Joris: According to probability theory, this will happen in 5% of the cases if you repeat your sampling infinitely often. But as David asked: why on earth do you want to test that?

cheers
Joris

-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
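Joris's histogram suggestion, written out as a runnable sketch. The `breaks = 10` value is an arbitrary choice for illustration (R treats it as a suggestion, not an exact count), and `x` is simulated data rather than the tonal distances.

```r
set.seed(1)
x <- rnorm(250)  # simulated data, as in the example in the thread

# freq = FALSE puts the histogram on the density scale so that the
# kernel density estimate can be overlaid on the same axes
h <- hist(x, breaks = 10, freq = FALSE,
          main = "Histogram with density overlay")
lines(density(x), col = "red")

# The bar areas of a density-scale histogram sum to one
sum(h$density * diff(h$breaks))
```

For heavily tied, discrete values like the ones described in the thread, the kernel density curve should be read with some caution, since it smooths over gaps between the possible values.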
Re: [R] Wilcoxon signed rank test and its requirements
The central limit theorem doesn't help. It just addresses type I error, not power.

Frank

On 06/25/2010 04:29 AM, Joris Meys wrote:
> You should think about the central limit theorem. Actually, you can just use a t-test to compare means, as with those sample sizes the mean is almost certainly normally distributed.
Re: [R] Wilcoxon signed rank test and its requirements
2010/6/25 Frank E Harrell Jr f.harr...@vanderbilt.edu:
> The central limit theorem doesn't help. It just addresses type I error, not power.
> Frank

I don't think I stated otherwise. I am aware of the fact that the Wilcoxon has an asymptotic relative efficiency greater than 1 compared to the t-test in the case of skewed distributions. Apologies if I caused more confusion.

The problem with the Wilcoxon is twofold, as far as I understood this data correctly:
- there are quite a few ties
- the Wilcoxon assumes under the null that the distributions are the same, not only the location. The influence of unequal variances and/or shapes of the distribution is enhanced in the case of unequal sample sizes.

The central limit theorem ensures that:
- the t-test will do correct inference in the presence of ties
- unequal variances can be taken into account using the modified t-test, both in the case of equal and unequal sample sizes

For these reasons, I would personally use the t-test for comparing two samples from the described population. Your mileage may vary.

Cheers
Joris
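For readers who want to try this: the "modified t-test" mentioned above is presumably Welch's unequal-variance t-test, which is what t.test() does when var.equal = FALSE (the default). The sketch below only illustrates the call on simulated tied data. The values, sample sizes and probabilities are invented for illustration and are not Atte's data, and the example does not settle the disagreement in this thread about whether the t-test is the better choice.

```r
set.seed(2010)

# Hypothetical stand-in for the tonal distances: only a few distinct
# values are possible, so ties are unavoidable.
possible <- seq(0, 1, by = 0.1)

# Two samples of unequal size and somewhat different shape.
region <- sample(possible, 280, replace = TRUE,
                 prob = c(1, 1, 2, 3, 4, 5, 4, 3, 2, 1, 1))
rest   <- sample(possible, 2138, replace = TRUE)

# Welch's t-test: does not assume equal variances in the two groups.
res <- t.test(region, rest, var.equal = FALSE)
print(res)
```

The degrees of freedom reported in the output are the Welch-Satterthwaite approximation, which is why they are usually not an integer.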
Re: [R] Wilcoxon signed rank test and its requirements
You still are stating the effect of the central limit theorem incorrectly. Please see my previous note.

Frank

On 06/25/2010 10:27 AM, Joris Meys wrote:
> The central limit theory makes that :
> - the t-test will do correct inference in the presence of ties
> - unequal variances can be taken into account using the modified t-test, both in the case of equal and unequal sample sizes
>
> For these reasons, I would personally use the t-test for comparing two samples from the described population. Your mileage may vary.
Re: [R] Wilcoxon signed rank test and its requirements
Atte,

I would not be surprised if you got lost and confused by the certainly interesting methodological discussion that has been going on in this thread. Since the helpers do not seem to converge/agree, I propose that you use a different nonparametric approach: the bootstrap. The important thing about the bootstrap is that you do not have to be concerned with the questions that have been discussed in this thread.

In the bootstrap you repeatedly draw samples with replacement from your data and compute the statistic you are interested in (for you this is the mean). The beauty of this approach is i) that the bootstrap distribution of the mean is approximately normal and ii) that you can directly compare the quantiles/confidence intervals of the bootstrap distributions.

Let's say you have x and y, which both come from Poisson distributions with relatively low means. Note that this resembles your data in that the distributions are asymmetric and contain a considerable number of ties.

    # set seed for random number generation
    set.seed(123)
    # simulate x and y (these would be your data)
    x <- rpois(100, 3)
    y <- rpois(100, 4)
    # plot histograms for x and y
    par(mfcol = c(1, 2))
    hist(x, breaks = length(unique(x)))
    hist(y, breaks = length(unique(y)))

Now we sample with replacement from x and y (i.e., we draw one observation from x and one from y, and afterwards we put the drawn observation back into x and y, respectively). For each bootstrap of x and y, respectively, we sample exactly as many observations as there are in x and y, respectively (here 100). We then compute the statistic of interest of this bootstrap (here the mean). We repeat this process many times (here 1000).

    n <- 1000  # number of bootstraps to draw
    x.boot1 <- numeric(n)
    y.boot1 <- numeric(n)
    for (i in seq_len(n)) {
      x.boot1[i] <- mean(sample(x, length(x), replace = TRUE))
      y.boot1[i] <- mean(sample(y, length(y), replace = TRUE))
    }

Doing this, we obtain the bootstrap distribution of the mean of x and y, respectively. Note that the bootstrap distribution is approximately normally distributed and centered on the sample mean (the latter automatically because we bootstrap the mean):

    par(mfcol = c(1, 2))
    hist(x.boot1)
    hist(y.boot1)

The simple(st) way of comparing these distributions is by checking whether their confidence intervals overlap or not. You get the 95-percent confidence intervals by

    quantile(x.boot1, probs = c(0.025, 0.975))
    quantile(y.boot1, probs = c(0.025, 0.975))

If they do not overlap, you would conclude that they are significantly different. In the one-sample case, you would just check whether the value of interest is within or outside the confidence interval.

Finally, note that the little loop that we have programmed to draw the bootstraps is already implemented in an R package. Using the bootstrap package, you could draw the bootstraps analogously by:

    library(bootstrap)
    x.boot2 <- bootstrap(x, nboot = 1000, mean)
    y.boot2 <- bootstrap(y, nboot = 1000, mean)

The bootstrapped means are then stored in x.boot2$thetastar and y.boot2$thetastar.

Hope that helps,
Daniel
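As a complement to checking whether two intervals overlap, one can bootstrap the difference in means directly and look at a single percentile interval. This is a different variant from the one described in the message above, not Daniel's procedure, and it is only a minimal sketch on the same simulated Poisson data. The names n_boot, diff_boot and excludes_zero are made up for this example.

```r
set.seed(123)

# Simulated data as in the message above (stand-ins for real data).
x <- rpois(100, 3)
y <- rpois(100, 4)

# Resample each group with replacement and record the difference in means.
n_boot <- 2000
diff_boot <- replicate(
  n_boot,
  mean(sample(y, replace = TRUE)) - mean(sample(x, replace = TRUE))
)

# 95% percentile interval for mean(y) - mean(x).
ci <- quantile(diff_boot, probs = c(0.025, 0.975))
ci

# TRUE if the interval lies entirely on one side of zero.
excludes_zero <- unname(ci[1] > 0 || ci[2] < 0)
excludes_zero
```

An interval for the difference that excludes zero is generally a less conservative criterion than requiring the two separate 95% intervals not to overlap.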
Re: [R] Wilcoxon signed rank test and its requirements
Let me see if I understand. You actually have the data for the whole population (the entire piece), but you have some pre-defined sections and you want to see whether they differ from the population or, more meaningfully, whether they differ from a randomly selected set of measures. Is that correct?

If so, since you have the entire population of interest you can create the actual sampling distribution (or a good approximation of it). Just take random samples from the population of the given size (matching the subset you are interested in) and calculate the means (or other value of interest), probably 10,000 to 1,000,000 samples. Now compare the value from your predefined subset to the set of random values you generated to see if it is in the tail or not.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center, Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Atte Tenkanen
Sent: Thursday, June 24, 2010 11:04 PM
To: David Winsemius
Cc: R mailing list
Subject: Re: [R] Wilcoxon signed rank test and its requirements

The values come from this kind of process: The musical composition is segmented into so-called 'pitch-class segments', and these segments are compared with one reference set using a distance function. Only some distance values are possible. These distance values can be averaged over music bars, which produces a smoother distribution, and the 'comparison curve' that illustrates the distances from the reference set through a musical piece becomes more readable (see e.g. http://users.utu.fi/attenka/with6.jpg ), but I would prefer to use the original values. Then, I want to pick only some regions from the piece and test whether the values of those regions are higher than the mean of all values.

Atte

> On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:
> > Is there anything for me? There is a lot of data, n=2418, but there are also a lot of ties. My sample n≈250-300
>
> I do not understand why there should be so many ties. You have not described the measurement process or units. ( ... although you offer a glimpse without much background later.)
>
> > i would like to test, whether the mean of the sample differ significantly from the population mean.
>
> Why? What is the purpose of this investigation? Why should the mean of a sample be that important?
>
> > The histogram of the population looks like in attached histogram, what test should I use? No choices? This distribution comes from a musical piece and the values are 'tonal distances'. http://users.utu.fi/attenka/Hist.png
>
> That picture does not offer much insight into the features of that measurement. It appears to have much more structure than I would expect for a sample from a smooth unimodal underlying population.
>
> --
> David.
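The resampling procedure described above can be sketched in a few lines of R. Everything concrete here is an assumption made for illustration: the simulated population vector, the region_idx positions of the pre-defined section, and the choice of 10,000 random subsets. With real data one would substitute the 2418 observed distances and the indices of the region of interest.

```r
set.seed(42)

# Hypothetical stand-in for the whole piece: 2418 tied distance values.
population <- sample(seq(0, 1, by = 0.1), 2418, replace = TRUE)

# Hypothetical pre-defined region of interest (positions in the piece).
region_idx <- 1001:1280
observed <- mean(population[region_idx])

# Reference distribution: means of random subsets of the same size,
# drawn without replacement from the whole population.
n_sim <- 10000
sim_means <- replicate(n_sim, mean(sample(population, length(region_idx))))

# One-sided Monte Carlo p-value for "the region mean is unusually high".
# The +1 terms count the observed value itself, so p is never exactly 0.
p_upper <- (sum(sim_means >= observed) + 1) / (n_sim + 1)
p_upper
```

One caveat, stated as an assumption rather than a finding: a contiguous region of a piece may contain values that resemble their neighbours, which random subsets do not reproduce. Drawing random contiguous blocks of the same length would be an alternative reference distribution.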
Re: [R] Wilcoxon signed rank test and its requirements
On Thu, 24 Jun 2010, Atte Tenkanen wrote:
> Thanks. Unfortunately, I can't follow the reference at all, but I read this in that way that I can be carefree as far as the underlying distribution is concerned? Is there any other authoritative reference where that is just stated in a way "test does not require that the underlying distributions be symmetric or normal".

The statement from GraphPad is correct, but for a different question. Let me expound.

First let us consider means: If you have paired samples X1..Xn and Y1..Yn you could ask if the mean of X is equal to the mean of Y, or if the mean of (X-Y) is zero. These are equivalent questions, because of the way the mean is defined. So the paired t-test, which answers the first question, and the one-sample t-test, which answers the second question, are equivalent. They have no assumptions (other than sufficient sample size for the means to be Normally distributed).

Now, let us consider medians. If you have paired samples X1..Xn and Y1..Yn you could ask if the median of X is equal to the median of Y, or if the median of (X-Y) is zero. The first question cannot be answered by any standard test (though there are ways to do it). The second is answered by the sign test. They are not at all equivalent: it is possible for the median of X to be larger than the median of Y but the median of (X-Y) to be negative. The non-equivalence is true for essentially all statistics except for the mean.

Now, let us consider the Wilcoxon signed-rank test. This can be characterized precisely as a test of the null hypothesis that the median pairwise mean of X-Y is zero. That is, take all n(n-1)/2 pairs of (X-Y)s. Take the mean of each pair to get n(n-1)/2 pairwise means. Take the median of these numbers. The p-value will be 0.5 one-sided or 1.0 two-sided when this median pairwise mean is exactly zero. The median pairwise mean is also sometimes known as the Hodges-Lehmann estimator (though this is strictly speaking a more general term).

As David correctly points out, no assumptions are needed for the Wilcoxon signed-rank test to be a test of *this* null hypothesis. The problem is that this may not be the null hypothesis you care about. As GraphPad correctly points out, the P value will not tell you much about whether the *median* is different than the hypothetical value, because the median is not the same as the median pairwise mean. It is entirely possible for the median difference to be positive and the median pairwise mean difference to be zero or negative.

If you assume that the distribution of differences X-Y is symmetric, then the Wilcoxon signed-rank test also tests the null hypothesis that the median of X-Y is zero (and that the mean of X-Y is zero), because these null hypotheses are equivalent for a symmetric distribution. That's what GraphPad is saying.

You could also assume that the distributions X and Y are stochastically ordered. This basically implies that the direction of difference is the same no matter what location statistic you use to measure it. If X was before some intervention and Y was afterwards you would basically be assuming that the intervention is either beneficial for everyone or harmful for everyone (up to measurement error). Under this assumption, the signed rank test also tells you reliably about differences in medians.

To some extent this is a philosophical issue. My preference is to know exactly what a test is doing and to make these distinctions. Many other people, including reputable experts like Frank Harrell, believe (I think) that simplifying assumptions such as stochastic ordering are a pretty good approximation in a lot of situations, so it isn't necessary to always make these distinctions.

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
tlum...@u.washington.edu
University of Washington, Seattle
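To make the "median pairwise mean" concrete, here is a small sketch that computes it by hand for a skewed sample and compares it with the ordinary median. The data are simulated and the variable names are invented for this example. Note one difference from the description above: as far as I know, the "(pseudo)median" reported by wilcox.test(..., conf.int = TRUE) uses all n(n+1)/2 pairwise means including each value paired with itself, so the by-hand version below does the same.

```r
set.seed(7)

# Hypothetical paired differences from a right-skewed distribution.
d <- rexp(40) - 0.5

# All n(n+1)/2 pairwise means (d[i] + d[j]) / 2 with i <= j.
pair_means <- outer(d, d, "+") / 2
walsh <- pair_means[upper.tri(pair_means, diag = TRUE)]

pseudo_median <- median(walsh)  # median pairwise mean
plain_median  <- median(d)      # ordinary median

# wilcox.test() reports the median pairwise mean as its "(pseudo)median".
hl <- wilcox.test(d, conf.int = TRUE)$estimate

c(median = plain_median, pseudomedian = pseudo_median, wilcox = unname(hl))
```

For a skewed sample like this one the two location measures generally differ, which is the distinction the message above draws between the null hypothesis the signed-rank test actually addresses and a null hypothesis about the median.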
Re: [R] Wilcoxon signed rank test and its requirements
PS. Maybe I can somehow try to transform the data and check it, for example, using the skewness function of the timeDate package?

> Thanks. What I have had to ask is that how do you test that the data is symmetric enough? If it is not, is it ok to use some data transformation?
Re: [R] Wilcoxon signed rank test and its requirements
Thanks. What I have had to ask is: how do you test that the data is symmetric enough? If it is not, is it ok to use some data transformation? I ask because it is said:

"The Wilcoxon signed rank test does not assume that the data are sampled from a Gaussian distribution. However it does assume that the data are distributed symmetrically around the median. If the distribution is asymmetrical, the P value will not tell you much about whether the median is different than the hypothetical value."

> On Wed, Jun 23, 2010 at 10:27 PM, Atte Tenkanen atte...@utu.fi wrote:
> > Hi all,
> >
> > I have a distribution, and take a sample of it. Then I compare that sample with the mean of the population like here in "Wilcoxon signed rank test with continuity correction":
> >
> >     wilcox.test(Sample, mu=mean(All), alt="two.sided")
> >
> >     Wilcoxon signed rank test with continuity correction
> >     data: AlphaNoteOnsetDists
> >     V = 63855, p-value = 0.0002093
> >     alternative hypothesis: true location is not equal to 0.4115136
> >
> >     wilcox.test(Sample, mu=mean(All), alt="greater")
> >
> >     Wilcoxon signed rank test with continuity correction
> >     data: AlphaNoteOnsetDists
> >     V = 63855, p-value = 0.0001047
> >     alternative hypothesis: true location is greater than 0.4115136
> >
> > What assumptions are needed for the population?
>
> wikipedia says: "The Wilcoxon signed-rank test is a _non-parametric_ statistical hypothesis test for..." it also talks about the assumptions.
>
> > What can we say according these results? p-value for the "less" is 0.999.
>
> That the p-value for less and greater seem to sum up to one, and that the p-value of greater is half of that for two-sided. You shouldn't ask what we can say. You should ask yourself "What was the question and is this test giving me an answer on that question?"
>
> Cheers
> Joris
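The relationship between the three p-values noted in the quoted reply can be checked on simulated data. The sketch below uses invented integer scores with many ties and an arbitrary reference value mu, not the AlphaNoteOnsetDists data from the thread. With this sample size and ties, wilcox.test() uses the normal approximation with continuity correction, as in the output quoted above.

```r
set.seed(1)

# Hypothetical tied sample (integer scores) and a reference value mu.
x  <- sample(1:7, 280, replace = TRUE, prob = c(1, 1, 2, 3, 4, 4, 3))
mu <- 3.5

p_two     <- wilcox.test(x, mu = mu, alternative = "two.sided")$p.value
p_greater <- wilcox.test(x, mu = mu, alternative = "greater")$p.value
p_less    <- wilcox.test(x, mu = mu, alternative = "less")$p.value

c(two.sided = p_two, greater = p_greater, less = p_less)

# When the sample sits above mu, the two-sided p-value is twice the
# "greater" p-value.
all.equal(p_two, 2 * p_greater)
```

The "less" and "greater" p-values add up to slightly more than one rather than exactly one, because the continuity correction is applied in opposite directions for the two one-sided alternatives.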
Re: [R] Wilcoxon signed rank test and its requirements
One way of looking at it is doing a sign test after subtraction of the mean. For symmetrical distributions the mean equals the median, so you expect to have about as many values above as below zero after subtracting the mean. There is a sign test somewhere in one of the packages, but it's easily done using binom.test as well:

    set.seed(12345)
    x1 <- rnorm(100)
    x2 <- rpois(100, 2)

    binom.test(sum(x1 - mean(x1) > 0), length(x1))

    Exact binomial test
    data: (sum(x1 - mean(x1) > 0)) and length(x1)
    number of successes = 56, number of trials = 100, p-value = 0.2713
    alternative hypothesis: true probability of success is not equal to 0.5
    95 percent confidence interval: 0.4571875 0.6591640
    sample estimates: probability of success 0.56

    binom.test(sum(x2 - mean(x2) > 0), length(x2))

    Exact binomial test
    data: (sum(x2 - mean(x2) > 0)) and length(x2)
    number of successes = 37, number of trials = 100, p-value = 0.01203
    alternative hypothesis: true probability of success is not equal to 0.5
    95 percent confidence interval: 0.2755666 0.4723516
    sample estimates: probability of success 0.37

Cheers
Joris

On Thu, Jun 24, 2010 at 4:16 AM, Atte Tenkanen atte...@utu.fi wrote:
> PS. Mayby I can somehow try to transform data and check it, for example, using the skewness-function of timeDate-package?
>
> > Thanks. What I have had to ask is that how do you test that the data is symmetric enough? If it is not, is it ok to use some data transformation?
Re: [R] Wilcoxon signed rank test and its requirements
On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:
> Thanks. What I have had to ask is that how do you test that the data is symmetric enough? If it is not, is it ok to use some data transformation? when it is said:
>
> "The Wilcoxon signed rank test does not assume that the data are sampled from a Gaussian distribution. However it does assume that the data are distributed symmetrically around the median. If the distribution is asymmetrical, the P value will not tell you much about whether the median is different than the hypothetical value."

You are being misled. Simply finding a statement on a statistics software website, even one as reputable as Graphpad (???), does not mean that it is necessarily true. My understanding (confirmed by reviewing "Nonparametric statistical methods for complete and censored data" by M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon signed-rank test does not require that the underlying distributions be symmetric. The above quotation is highly inaccurate.

--
David.
Re: [R] Wilcoxon signed rank test and its requirements
On 06/24/2010 12:40 PM, David Winsemius wrote:
> You are being misled. Simply finding a statement on a statistics software website, even one as reputable as Graphpad (???), does not mean that it is necessarily true. My understanding (confirmed reviewing "Nonparametric statistical methods for complete and censored data" by M. M. Desu, Damaraju Raghavarao) is that the Wilcoxon signed-rank test does not require that the underlying distributions be symmetric. The above quotation is highly inaccurate.

To add to what David and others have said, look at the kernel that the U-statistic associated with the WSR test uses: the indicator (0/1) of xi + xj > 0. So WSR tests H0: p = 0.5, where p = the probability that the average of a randomly chosen pair of values is positive. [If there are ties this probably needs to be worded as P[xi + xj > 0] = P[xi + xj < 0], i neq j.]

Frank
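The kernel described above can be evaluated directly. This is only a sketch on simulated continuous data (no ties, no zeros), and the names pos, p_hat and V_by_hand are invented for the example. It estimates p from pairs with i != j, and also shows that counting the positive sums over pairs with i <= j gives the V statistic that wilcox.test() reports.

```r
set.seed(3)

# Hypothetical continuous sample (no ties, no zeros).
x <- rnorm(30, mean = 0.3)

# Kernel of the U-statistic: indicator that x[i] + x[j] > 0.
pos <- outer(x, x, "+") > 0

# Estimate of p = P(x_i + x_j > 0), using pairs with i < j only.
p_hat <- mean(pos[upper.tri(pos)])

# Counting pairs with i <= j instead gives the signed-rank statistic V.
V_by_hand <- sum(pos[upper.tri(pos, diag = TRUE)])
V <- unname(wilcox.test(x)$statistic)

c(p_hat = p_hat, V_by_hand = V_by_hand, V = V)
```

The agreement between V_by_hand and V holds because a pair sum is positive exactly when the member with the larger absolute value is positive, so each positive observation contributes its rank to the count.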
Re: [R] Wilcoxon signed rank test and its requirements
> On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:
>> Thanks. What I have to ask is: how do you test that the data is symmetric enough? If it is not, is it OK to use some data transformation? I ask because it is said:
>>
>> "The Wilcoxon signed rank test does not assume that the data are sampled from a Gaussian distribution. However it does assume that the data are distributed symmetrically around the median. If the distribution is asymmetrical, the P value will not tell you much about whether the median is different than the hypothetical value."
>
> You are being misled. Simply finding a statement on a statistics software website, even one as reputable as Graphpad (???), does not mean that it is necessarily true. My understanding (confirmed by reviewing "Nonparametric Statistical Methods for Complete and Censored Data" by M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon signed-rank test does not require that the underlying distributions be symmetric. The above quotation is highly inaccurate.
>
> -- David.

Thanks. Unfortunately, I can't follow the reference at all, but I read this to mean that I can be carefree as far as the underlying distribution is concerned? Is there any other authoritative reference that states plainly that the test does not require the underlying distributions to be symmetric or normal?

Atte
Re: [R] Wilcoxon signed rank test and its requirements
I do agree that one should not rely solely on sources like Wikipedia and Graphpad, although they contain a lot of valuable information.

That said, it is not too difficult to illustrate why, in the case of the one-sample signed rank test, the differences should not be too far from symmetrical. It just needs some reflection on how the statistic is calculated. If you have an asymmetrical distribution, you have a lot of small differences with a negative sign and a lot of large differences with a positive sign if you test against the median or mean. Hence the sum of ranks for one side will be higher than for the other, leading eventually to a significant result.

An extreme example:

    set.seed(100)
    y <- rnorm(100,1,2)^2          # squared normal draws: strongly right-skewed
    wilcox.test(y,mu=median(y))    # test against the sample median

        Wilcoxon signed rank test with continuity correction

    data:  y
    V = 3240.5, p-value = 0.01396
    alternative hypothesis: true location is not equal to 1.829867

    wilcox.test(y,mu=mean(y))      # test against the sample mean

        Wilcoxon signed rank test with continuity correction

    data:  y
    V = 1763, p-value = 0.008837
    alternative hypothesis: true location is not equal to 5.137409

Which brings us to the question of what location is actually tested in the Wilcoxon test. For the measure of location to be the mean (or median), one has to assume that the distribution of the differences is rather symmetrical, which implies your data have to be distributed somewhat symmetrically. The test is robust against violations of this implicit assumption, but in more extreme cases skewness does matter.

Cheers
Joris

On Thu, Jun 24, 2010 at 7:40 PM, David Winsemius dwinsem...@comcast.net wrote:
> You are being misled. Simply finding a statement on a statistics software website, even one as reputable as Graphpad (???), does not mean that it is necessarily true. My understanding (confirmed by reviewing "Nonparametric Statistical Methods for Complete and Censored Data" by M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon signed-rank test does not require that the underlying distributions be symmetric. The above quotation is highly inaccurate.
>
> -- David.

--
Joris Meys
Statistical consultant
Ghent University, Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
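The same point can be sketched by simulation. The setup below is an assumption made for illustration and is not from the thread: samples of size 100 from a standard exponential distribution, tested against its true median log(2), next to normal samples tested against their true median 0. If the signed rank test were purely a test of the median, both rejection rates should sit near the nominal 0.05.

```r
set.seed(42)
n_rep <- 1000

# Right-skewed data, tested against the true median of Exp(1), which is log(2)
reject_skewed <- replicate(
  n_rep, wilcox.test(rexp(100), mu = log(2))$p.value < 0.05
)

# Symmetric data, tested against the true median (= mean) of N(0, 1)
reject_symmetric <- replicate(
  n_rep, wilcox.test(rnorm(100), mu = 0)$p.value < 0.05
)

# Share of samples in which H0 is rejected at the 0.05 level
c(skewed = mean(reject_skewed), symmetric = mean(reject_symmetric))
```

The skewed case should reject far more often than the symmetric one even though the hypothesised value is the true median, which is the behaviour described above.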
Re: [R] Wilcoxon signed rank test and its requirements
On Jun 24, 2010, at 6:09 PM, Joris Meys wrote:
> I do agree that one should not rely solely on sources like Wikipedia and Graphpad, although they contain a lot of valuable information. That said, it is not too difficult to illustrate why, in the case of the one-sample signed rank test,

That is a key point. I was assuming that you were using the paired sample version of the WSRT and I may have been misleading the OP. For the one-sample situation, the assumption of symmetry is needed, but for the paired sampling version of the test, the location shift becomes the tested hypothesis, and no assumptions about the form of the hypothesis are made except that they be the same. Any consideration of median or mean (which will be the same in the case of symmetric distributions) gets lost in the paired test case.

-- David.

> the differences should not be too far from symmetrical.
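For reference, a minimal sketch of how the two versions relate in R. The data are simulated and the variable names are illustrative assumptions. The paired call works on the differences x - y, so it gives the same statistic and p-value as the one-sample call on those differences.

```r
set.seed(7)
x <- rexp(40)          # simulated skewed measurements
y <- rexp(40) + 0.2    # same shape, shifted location

paired     <- wilcox.test(x, y, paired = TRUE)
one_sample <- wilcox.test(x - y, mu = 0)

c(paired = paired$p.value, one_sample = one_sample$p.value)
```

Whatever is assumed about symmetry therefore applies to the distribution of the differences, not to x or y separately.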
Re: [R] Wilcoxon signed rank test and its requirements
On Fri, Jun 25, 2010 at 12:17 AM, David Winsemius dwinsem...@comcast.net wrote:
> On Jun 24, 2010, at 6:09 PM, Joris Meys wrote:
>> I do agree that one should not rely solely on sources like Wikipedia and Graphpad, although they contain a lot of valuable information. That said, it is not too difficult to illustrate why, in the case of the one-sample signed rank test,
>
> That is a key point. I was assuming that you were using the paired sample version of the WSRT and I may have been misleading the OP. For the one-sample situation, the assumption of symmetry is needed, but for the paired sampling version of the test, the location shift becomes the tested hypothesis, and no assumptions about the form of the hypothesis are made except that they be the same.

I believe you mean the form of the distributions. The assumption that the distributions of both samples are the same (or similar, it is a robust test) implies that the differences x_i - y_i are more or less symmetrically distributed. The key point here is that we're not talking about the distribution of the populations/samples (as done in the OP) but about the distribution of the differences. I may not have been clear enough on that one.

Cheers
Joris

> Any consideration of median or mean (which will be the same in the case of symmetric distributions) gets lost in the paired test case.
>
> -- David.

--
Joris Meys
Statistical consultant
Ghent University, Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
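A rough simulation of the claim that identical distributions give symmetric differences. The setup is an assumption for illustration only: both members of each pair are drawn independently from the same right-skewed exponential distribution, so x - y and y - x have the same distribution and the differences are symmetric around zero.

```r
set.seed(11)
n_rep <- 1000

# Both samples come from the same skewed distribution, so H0 is true
# and the differences x - y are symmetric around zero.
p_values <- replicate(n_rep, {
  x <- rexp(100)
  y <- rexp(100)
  wilcox.test(x, y, paired = TRUE)$p.value
})

# Share of rejections at the 0.05 level
mean(p_values < 0.05)
```

The rejection rate should stay close to the nominal level despite the skewness of x and y themselves.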
Re: [R] Wilcoxon signed rank test and its requirements
On Jun 24, 2010, at 6:42 PM, Joris Meys wrote:
> On Fri, Jun 25, 2010 at 12:17 AM, David Winsemius dwinsem...@comcast.net wrote:
>> That is a key point. I was assuming that you were using the paired sample version of the WSRT and I may have been misleading the OP. For the one-sample situation, the assumption of symmetry is needed, but for the paired sampling version of the test, the location shift becomes the tested hypothesis, and no assumptions about the form of the hypothesis are made except that they be the same.
>
> I believe you mean the form of the distributions. The assumption that the distributions of both samples are the same (or similar, it is a robust test) implies that the differences x_i - y_i are more or less symmetrically distributed. The key point here is that we're not talking about the distribution of the populations/samples (as done in the OP) but about the distribution of the differences. I may not have been clear enough on that one.

What I meant about different hypotheses was that in the single-sample case the H0 was mean (or median) = mu_pop, and in the paired two-sample case the H0 was mean(distr_A_i - distr_B_i) = 0. And yes, I did miss the OP's point. My apologies.

-- David.
Re: [R] Wilcoxon signed rank test and its requirements
Is there anything for me?

There is a lot of data, n=2418, but there are also a lot of ties. My sample is n≈250-300.

I would like to test whether the mean of the sample differs significantly from the population mean.

The histogram of the population is attached; what test should I use? No choices?

This distribution comes from a musical piece and the values are 'tonal distances'.

http://users.utu.fi/attenka/Hist.png

Atte
Re: [R] Wilcoxon signed rank test and its requirements
On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:
> Is there anything for me? There is a lot of data, n=2418, but there are also a lot of ties. My sample is n≈250-300.

I do not understand why there should be so many ties. You have not described the measurement process or units. ( ... although you offer a glimpse without much background later.)

> I would like to test whether the mean of the sample differs significantly from the population mean.

Why? What is the purpose of this investigation? Why should the mean of a sample be that important?

> The histogram of the population is attached; what test should I use? No choices? This distribution comes from a musical piece and the values are 'tonal distances'.
>
> http://users.utu.fi/attenka/Hist.png

That picture does not offer much insight into the features of that measurement. It appears to have much more structure than I would expect for a sample from a smooth unimodal underlying population.

-- David.
[R] Wilcoxon signed rank test and its requirements
Hi all,

I have a distribution, and take a sample of it. Then I compare that sample with the mean of the population using the Wilcoxon signed rank test with continuity correction:

    wilcox.test(Sample, mu=mean(All), alt="two.sided")

        Wilcoxon signed rank test with continuity correction

    data:  AlphaNoteOnsetDists
    V = 63855, p-value = 0.0002093
    alternative hypothesis: true location is not equal to 0.4115136

    wilcox.test(Sample, mu=mean(All), alt="greater")

        Wilcoxon signed rank test with continuity correction

    data:  AlphaNoteOnsetDists
    V = 63855, p-value = 0.0001047
    alternative hypothesis: true location is greater than 0.4115136

What assumptions are needed for the population?

What can we say according to these results? The p-value for "less" is 0.999.

Thanks in advance,
Atte

Atte Tenkanen
University of Turku, Finland
Department of Musicology
+35823335278
http://users.utu.fi/attenka/
Re: [R] Wilcoxon signed rank test and its requirements
On Wed, Jun 23, 2010 at 10:27 PM, Atte Tenkanen atte...@utu.fi wrote:
> Hi all,
>
> I have a distribution, and take a sample of it. Then I compare that sample with the mean of the population using the Wilcoxon signed rank test with continuity correction:
>
>     wilcox.test(Sample, mu=mean(All), alt="two.sided")
>
>         Wilcoxon signed rank test with continuity correction
>
>     data:  AlphaNoteOnsetDists
>     V = 63855, p-value = 0.0002093
>     alternative hypothesis: true location is not equal to 0.4115136
>
>     wilcox.test(Sample, mu=mean(All), alt="greater")
>
>         Wilcoxon signed rank test with continuity correction
>
>     data:  AlphaNoteOnsetDists
>     V = 63855, p-value = 0.0001047
>     alternative hypothesis: true location is greater than 0.4115136
>
> What assumptions are needed for the population?

Wikipedia says: "The Wilcoxon signed-rank test is a _non-parametric_ statistical hypothesis test for..." It also talks about the assumptions.

> What can we say according to these results? The p-value for "less" is 0.999.

That the p-values for "less" and "greater" seem to sum to one, and that the p-value for "greater" is half of that for two-sided.

You shouldn't ask what we can say. You should ask yourself "What was the question, and is this test giving me an answer to that question?"

Cheers
Joris

--
Joris Meys
Statistical consultant
Ghent University, Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
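The relationship between the three p-values can be checked on any sample. The sketch below uses simulated data, which is an assumption (the original `Sample` and `All` vectors are not available), with n = 60 so that `wilcox.test` uses the normal approximation with continuity correction, as in the output quoted above.

```r
set.seed(3)
x <- rnorm(60, mean = 0.4)   # simulated sample, shifted above mu = 0

p_two     <- wilcox.test(x, mu = 0, alternative = "two.sided")$p.value
p_greater <- wilcox.test(x, mu = 0, alternative = "greater")$p.value
p_less    <- wilcox.test(x, mu = 0, alternative = "less")$p.value

c(two.sided = p_two, greater = p_greater, less = p_less,
  sum_one_sided = p_greater + p_less)
```

The two-sided p-value is twice the smaller one-sided value. The one-sided values add up to slightly more than one rather than exactly one, because the continuity correction shifts the statistic in opposite directions for "less" and "greater".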