Luc <ouaga...@gmail.com> added the comment:

If we are going to fix this, median should behave the way mean and harmonic_mean already do in the statistics library when the data contains missing values, that is, return NaN. That would at least make it consistent with how the rest of the statistics library handles NaNs in the data. Either way, the behavior should be mentioned somewhere in the docs.
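A minimal sketch of the two behaviors being compared, assuming two hypothetical helpers, `median_or_nan` and `median_dropping_nan`. Neither is part of the statistics module; they only illustrate "propagate NaN" versus "drop NaN first" and are not a proposed patch.

```python
import math
import statistics


def median_or_nan(values):
    """Hypothetical helper: return NaN if any value is NaN, else the median."""
    values = list(values)
    # Any NaN poisons the result, matching what statistics.mean
    # returns for the same input.
    if any(math.isnan(v) for v in values):
        return math.nan
    return statistics.median(values)


def median_dropping_nan(values):
    """Hypothetical helper: discard NaNs, then take the median."""
    cleaned = [v for v in values if not math.isnan(v)]
    return statistics.median(cleaned)


data = [75, 90, 85, 92, 95, 80, float("nan")]
print(median_or_nan(data))        # nan
print(median_dropping_nan(data))  # 87.5
```

The first helper leaves the decision about missing values to the caller; the second makes that decision silently, which is convenient but hides the fact that data was discarded.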
import statistics as stats
import numpy as np
import pandas as pd

data = [75, 90, 85, 92, 95, 80, np.nan]

stats.mean(data)
nan
stats.harmonic_mean(data)
nan
stats.stdev(data)
nan

As you can see, when a value is missing, the mean, harmonic mean and sample standard deviation computed with the statistics library all return nan. However, median, median_high and median_low compute an incorrect result when missing values are present in the data. It would be better to return nan and let the user drop (or otherwise resolve) any missing values before computing.

## Another example using a pandas Series
df = pd.DataFrame(data, columns=['data'])
df.head(7)
   data
0  75.0
1  90.0
2  85.0
3  92.0
4  95.0
5  80.0
6   NaN

### Use the statistics library to compute the median of the Series
stats.median(df['data'])
90

## pandas returns the correct median by dropping the missing values
## Now use pandas to compute the median of the Series with the missing value
df['data'].median()
87.5

I have not tested median_grouped in the statistics library, but I will let you know afterwards whether it is affected as well.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33084>
_______________________________________