Re: [R] Different results on running Wilcoxon Rank Sum test in R and SPSS

John Fox Tue, 19 Jan 2021 15:22:07 -0800

Dear Bharat Rawlley,

What you tried to do appears to be nonsense. That is, you're treating PFD_n and drug_code as if they were scores for two different groups.

I assume that what you really want to do is to treat PFD_n as a vector of scores and drug_code as defining two groups. If that's correct, and with your data read into Data, you can try the following:


------snip ------

> wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE)

        Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -2.000014e+00  5.037654e-05
sample estimates:
difference in location
             -1.000019

Warning messages:
1: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
  cannot compute exact p-value with ties
2: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
  cannot compute exact confidence intervals with ties

------snip ------
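A minimal sketch of the difference between the two ways of calling wilcox.test(). Since only the group-by-score table later in this message is needed for the test, the sketch rebuilds the 48 complete cases from that table; the object names (values, counts0, counts1, right, wrong) are made up for illustration and are not from the original post. The formula call splits PFD_n by drug_code, while the two-vector call compares the PFD_n scores against the 0/1 codes themselves.

```r
# Rebuild the complete cases from the xtabs() table in this thread.
# (In the real data the formula interface drops the NA rows by default.)
values  <- c(0, 16, 18, 19, 20, 22, 24, 25, 26, 27, 28, 30)
counts0 <- c(2, 1, 0, 0, 2, 0, 2, 1, 5, 4, 5, 1)    # drug_code == 0
counts1 <- c(0, 1, 1, 1, 0, 1, 0, 2, 2, 2, 13, 2)   # drug_code == 1
Data <- data.frame(
  PFD_n     = c(rep(values, counts0), rep(values, counts1)),
  drug_code = rep(c(0, 1), c(sum(counts0), sum(counts1)))
)

# Intended analysis: PFD_n is the response, drug_code defines the groups.
right <- wilcox.test(PFD_n ~ drug_code, data = Data, exact = FALSE)

# What the original call did: treat the 0/1 codes as a second sample of
# scores and compare them with PFD_n.
wrong <- wilcox.test(Data$PFD_n, Data$drug_code, exact = FALSE)

right$statistic
right$p.value
wrong$p.value   # extremely small, because nearly every score exceeds 0 or 1
```

The second p-value says only that PFD_n values are larger than the numbers 0 and 1, which is not a comparison between the two drug groups.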

You can request the approximate p-value and confidence interval explicitly, which avoids the warnings, by specifying exact=FALSE:

------snip ------

> wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE, exact=FALSE)

        Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -2.000014e+00  5.037654e-05
sample estimates:
difference in location
             -1.000019

------snip ------
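For readers wondering what "approximate" means here, the following is a sketch of the usual normal approximation to the rank sum statistic with a tie correction and a continuity correction, worked by hand on the complete cases rebuilt from the table in this message. The variable names are illustrative assumptions; the formula is the standard textbook one and is meant as a check on the output above, not as a description of any other program's method.

```r
# Complete cases rebuilt from the xtabs() table in this thread.
values  <- c(0, 16, 18, 19, 20, 22, 24, 25, 26, 27, 28, 30)
counts0 <- c(2, 1, 0, 0, 2, 0, 2, 1, 5, 4, 5, 1)    # drug_code == 0
counts1 <- c(0, 1, 1, 1, 0, 1, 0, 2, 2, 2, 13, 2)   # drug_code == 1
x <- rep(values, counts0)
y <- rep(values, counts1)

n1 <- length(x); n2 <- length(y); N <- n1 + n2
r  <- rank(c(x, y))                       # midranks for tied values

# W as reported by wilcox.test(): rank sum of the first group minus its minimum.
W <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Variance of W under the null, reduced for ties.
tie_sizes <- as.numeric(table(r))
sigma <- sqrt(n1 * n2 / 12 *
              ((N + 1) - sum(tie_sizes^3 - tie_sizes) / (N * (N - 1))))

# Continuity-corrected z statistic and two-sided p-value.
centred <- W - n1 * n2 / 2
z <- (centred - sign(centred) * 0.5) / sigma
p <- 2 * pnorm(-abs(z))

c(W = W, z = z, p = p)
```

With 18 of the 48 observations tied at 28, the tie correction shrinks the variance noticeably, which is one reason the approximation deserves some caution for these data.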

As it turns out, your data are highly discrete and have a lot of ties (see in particular PFD_n = 28):


------snip ------

> xtabs(~ PFD_n + drug_code, data=Data)

     drug_code
PFD_n  0  1
   0   2  0
   16  1  1
   18  0  1
   19  0  1
   20  2  0
   22  0  1
   24  2  0
   25  1  2
   26  5  2
   27  4  2
   28  5 13
   30  1  2

------snip ------

I'm no expert in nonparametric inference, but I doubt whether the approximate p-value will be very accurate for data like these.

I don't know why wilcox.test() (correctly used) and SPSS are giving you slightly different results -- assuming that you're actually doing the same thing in both cases. I couldn't help but notice that most of your data are missing. Are you getting the same value of the test statistic and different p-values, or is the test statistic different as well?
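One way to get a rough sense of how far the approximation might be off is a Monte Carlo permutation test on the ranks, which handles ties without relying on the normal approximation. This is only a sketch under stated assumptions: the data are rebuilt from the table above, and the seed, the number of permutations B, and all variable names are arbitrary choices made for illustration.

```r
# Complete cases rebuilt from the xtabs() table in this thread.
values  <- c(0, 16, 18, 19, 20, 22, 24, 25, 26, 27, 28, 30)
counts0 <- c(2, 1, 0, 0, 2, 0, 2, 1, 5, 4, 5, 1)    # drug_code == 0
counts1 <- c(0, 1, 1, 1, 0, 1, 0, 2, 2, 2, 13, 2)   # drug_code == 1
PFD_n     <- c(rep(values, counts0), rep(values, counts1))
drug_code <- rep(c(0, 1), c(sum(counts0), sum(counts1)))

set.seed(1)                                # arbitrary seed, for reproducibility
r  <- rank(PFD_n)                          # midranks of the pooled scores
n0 <- sum(drug_code == 0)
expected <- n0 * (length(r) + 1) / 2       # null mean of the group-0 rank sum
obs <- sum(r[drug_code == 0])              # observed group-0 rank sum

# Reassign group labels at random and recompute the rank sum each time.
B <- 9999
perm <- replicate(B, sum(sample(r, n0)))

# Two-sided Monte Carlo p-value (the observed arrangement counts as one draw).
p_perm <- (1 + sum(abs(perm - expected) >= abs(obs - expected))) / (B + 1)
p_perm
```

If p_perm and the approximate p-value are close, the approximation is probably adequate for these data; a large gap would suggest reporting the permutation result instead.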
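When comparing test statistics across programs, it may help to know that the same result can be reported as a rank sum or as either of two complementary U statistics. The sketch below shows the arithmetic relating them, starting from W = 197 and the group sizes 23 and 25 taken from the output and table in this message. Which of these quantities another program prints is not something this thread establishes, so treat the labels as assumptions.

```r
W  <- 197          # statistic reported by wilcox.test() for drug_code == 0
n1 <- 23           # complete cases with drug_code == 0
n2 <- 25           # complete cases with drug_code == 1

# Rank sum of each group (midranks over all n1 + n2 observations).
rank_sum_0 <- W + n1 * (n1 + 1) / 2
U_other    <- n1 * n2 - W                  # complementary U statistic
rank_sum_1 <- U_other + n2 * (n2 + 1) / 2

c(W = W, U_other = U_other,
  rank_sum_0 = rank_sum_0, rank_sum_1 = rank_sum_1)

# Sanity check: the two rank sums must add up to the sum of ranks 1..N.
rank_sum_0 + rank_sum_1 == (n1 + n2) * (n1 + n2 + 1) / 2
```

If none of these four numbers appears in the other program's output, the two analyses are probably not being run on the same cases.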


I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-01-19 5:46 a.m., bharat rawlley via R-help wrote:

  Thank you for the reply and suggestion, Michael!
I used dput() and this is the output I can share with you. Simply explained, I 
have 3 columns namely, drug_code, freq4w_n and PFD_n. Each column has 132 
values (including NA). The problem with the Wilcoxon Rank Sum test has been 
described in my first email.
Please do let me know if you need any further clarification from my side! 
Thanks a lot for your time!
structure(list(drug_code = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 
0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 
1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 
1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0), freq4w_n 
= c(1, NA, NA, 0, NA, 4, NA, 10, NA, 0, 6, NA, NA, NA, NA, NA, 10, NA, 0, NA, NA, NA, NA, 0, NA, 0, NA, NA, 
NA, 0, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 12, 0, NA, 1, 2, 1, 2, 2, NA, 28, 0, NA, 4, NA, 1, NA, 
NA, NA, NA, NA, 0, 3, 1, NA, NA, NA, NA, 4, 28, NA, NA, 0, 2, 12, 0, NA, NA, NA, 0, NA, 0, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 3, NA, NA, NA, NA, NA, NA, 6, 1, NA, NA, NA, 0, NA, NA, NA, 0, 0, NA, 0, NA, 2, 8, 3, NA, 
NA, NA, 0, NA, NA, NA, 9, NA, NA, NA, NA, NA, NA, NA, NA), PFD_n = c(27, NA, NA, 28, NA, 26, NA, 20, NA, 30, 
24, NA, NA, NA, NA, NA, 18, NA, 28, NA, NA, NA, NA, 28, NA, 28, NA, NA, NA, 28, NA, 28, NA, NA, NA, NA, NA, 
NA, NA, NA, 28, 28, 16, 28, NA, 27, 26, 27, 26, 26, NA, 0, 30, NA, 24, NA, 27, NA, NA, NA, NA, NA, 28, 25, 
27, NA, NA, NA, NA, 26, 0, NA, NA, 28, 26, 16, 28, NA, NA, NA, 28, NA, 28, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, 25, NA, NA, NA, NA, NA, NA, 22, 27, NA, NA, NA, 28, NA, NA, NA, 28, 28, NA, 28, NA, 26, 20, 25, NA, NA, 
NA, 30, NA, NA, NA, 19, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -132L), class = 
c("tbl_df", "tbl", "data.frame"))

Yours sincerely,
Bharat Rawlley

On Tuesday, 19 January, 2021, 03:53:27 pm IST, Michael Dewey <li...@dewey.myzen.co.uk> wrote:

Unfortunately your data did not come through. Try using dput() and then pasting that into the body of your e-mail message.

On 18/01/2021 17:26, bharat rawlley via R-help wrote:

Hello,
On running the Wilcoxon Rank Sum test in R and SPSS, I am getting the following 
discrepancies which I am unable to explain.
Q1 In the attached data set, I was trying to compare freq4w_n in those with 
drug_code 0 vs 1. SPSS gives a P value 0.031 vs R gives a P value 0.001779.
The code I used in R is as follows -
wilcox.test(freq4w_n, drug_code, conf.int = T)


Q2 Similarly, in the same data set, when trying to compare PFD_n in those with 
drug_code 0 vs 1, SPSS gives a P value 0.038 vs R gives a P value < 2.2e-16.
The code I used in R is as follows -
wilcox.test(PFD_n, drug_code, mu = 0, alternative = "two.sided", correct = 
TRUE, paired = FALSE, conf.int = TRUE)


I have tried searching on Google and watching some YouTube tutorials, but I cannot 
find an answer. Any help will be really appreciated. Thank you!
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
