This is a general usability/UX remark: when there are NaNs in the input, the 
SciPy tests return NaN as both the statistic and the p-value. This behavior is 
inherited from NumPy reduction functions (such as np.sum); however, in the 
context of statistical tests it is very unintuitive, potentially misleading 
(it can easily be read as a non-significant or a highly significant result), 
and can cost some time to debug.
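
For concreteness, a minimal reproduction (scipy.stats.kruskal here; the NaN 
result assumes the current default nan_policy='propagate'):

    import numpy as np
    from scipy import stats

    a = np.array([1.2, 2.3, np.nan, 3.1])
    b = np.array([2.0, 2.5, 2.8, 3.3])

    # The NaN propagates silently into both outputs
    res = stats.kruskal(a, b)
    print(res.statistic, res.pvalue)  # nan nan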

See for example:

 * https://stackoverflow.com/questions/77087907/kruskal-wallis-test-always-gives-nan-values
 * https://github.com/scipy/scipy/issues/20056

A simple solution that I propose: emit a warning when there are NaNs in the 
input, so that the user is immediately alerted to the reason for the observed 
behavior and to how they might proceed with debugging such an issue. A rough 
sketch of the idea follows.
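
Something along these lines (purely illustrative; the wrapper name and message 
wording are hypothetical, not an existing SciPy API):

    import warnings
    import numpy as np
    from scipy import stats

    def kruskal_with_nan_warning(*samples, **kwargs):
        """Hypothetical wrapper showing where the proposed warning would go."""
        if any(np.isnan(np.asarray(s, dtype=float)).any() for s in samples):
            warnings.warn(
                "Input contains NaN; the statistic and p-value will be NaN "
                "under the default nan_policy='propagate'. Remove the NaNs "
                "or pass nan_policy='omit'.",
                RuntimeWarning,
                stacklevel=2,
            )
        return stats.kruskal(*samples, **kwargs)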

I am not sure about it, and have not tested it, but knowing the general 
philosophy of R or Excel, they would simply disregard NaNs (an implicit 
.dropna()) and return the desired statistics to the user. Such a radical 
solution is, I believe, out of scope in Python, but please consider adding at 
least a warning. (The explicit equivalent of that dropping is shown below.)
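
For completeness, here is the explicit equivalent that users can already apply 
themselves; the nan_policy='omit' variant assumes a recent SciPy version in 
which scipy.stats.kruskal accepts that argument:

    import numpy as np
    from scipy import stats

    a = np.array([1.2, 2.3, np.nan, 3.1])
    b = np.array([2.0, 2.5, 2.8, 3.3])

    # Manual, explicit "dropna" per sample
    res = stats.kruskal(a[~np.isnan(a)], b[~np.isnan(b)])

    # Equivalent via the nan_policy machinery, where available
    res_omit = stats.kruskal(a, b, nan_policy='omit')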

Kind Regards,
Mikolaj