Warren Weckesser <warren.weckes...@gmail.com> added the comment:

I really don't like the current inconsistent behavior.  I can't help but see it 
as a bug, and I am surprised that folks think it is OK that harmonic_mean([0, 
"foo"]) returns 0.  This is the *statistics* module, and the first line of 
https://docs.python.org/3/library/statistics.html says "This module provides 
functions for calculating mathematical statistics of numeric (Real-valued) 
data".  I understand the desire to maintain the performance benefit of the 
early exit, but I would rather disallow values that are zero than accept the 
nonsensical result of harmonic_mean([0, "foo"]) and the inconsistency between 
that and the result of harmonic_mean(["foo", 0]).  I spend most of my time 
working on numerical-ish code, so my world view might be different than that of 
y'all.

Here's a new proposal.  To continue to accept zero values, and disallow 
negative values consistently (as the function is documented to do), *and* 
preserve the early exit, I propose that the try/except statement in 
harmonic_mean in master be changed to

    try:
        T, total, count = _sum(1/x for x in _fail_neg(data, errmsg))
    except ZeroDivisionError:
        if any(x < 0 for x in data):
            raise StatisticsError(errmsg) from None
        return 0

The change adds just the two lines in the except block.  This doesn't change 
the "happy path" that is followed when all the values are strictly greater than 
0.  The cost of the additional check for any negative values in the input only 
occurs in the (presumably uncommon) case where an input value is 0.  (This 
means, however, that any values before the first 0 are checked twice.)

> In the original design, I prohibited negative numbers because that seems to 
> be what everyone says you should do, but as far as I can tell it is 
> mathematically well defined so long as the sum of positive values doesn't 
> equal the sum of negative values. And in that case, we could return a NAN.

Allowing negative values introduces some mathematical issues.  Most 
importantly, the limit of the harmonic mean as two or more values approach zero 
(in the multivariable limit sense) no longer exists.  Even in the simple case 
of computing the harmonic mean H(x, y) of two values, the limit of H(x, y) as 
(x, y) -> (0, 0) does not exist. H(x, y) is one of those funky functions where 
there are different paths to the origin in the second and fourth quadrants that 
lead to different values of the limit.  Of course, practicality beats purity, 
so the harmonic mean could simply be defined to be 0 whenever any one of the 
inputs is 0.  Then the only issue is the one you mentioned: the handling of the 
case where the denominator of the harmonic mean formula is zero.  In that case, 
the code would return NAN as you suggested, or raise an exception.  Overall, I 
don't think handling negative values is very important, and I'm not sure such a 
change to the function is worthwhile.  On the other hand, I t
 hink it would be better than the current behavior.

> By the way, I've just noticed that harmonic_mean([INF, INF]) raises. Should 
> it return INF instead?

Yes, I expect INF in that case.  Currently it only raises when all the inputs 
are INF.  Include one finite value, and there is no error:

In [67]: harmonic_mean([math.inf, math.inf, 5])
Out[67]: 15.0

The exception could be fixed by checking if total is zero before computing 
n/total (currently computed in the return statement of the function).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue38382>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to