Raymond Hettinger <[email protected]> added the comment:
In general NaNs wreak havoc on any function that uses comparisons:
>>> nan = float('nan')
>>> max(10, nan)
10
>>> max(nan, 10)
nan
It isn't the responsibility of the functions to test for NaNs. Likewise, it
would not make sense to document the NaN behavior in every function that uses a
comparison. That would clutter the docs and put the responsibility in the
wrong place.
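
As a rough illustration of why the argument order matters, the sketch below assumes that a two-argument max() keeps its first argument unless the second compares greater. The helper first_wins_max is a hypothetical stand-in written for this example, not the builtin itself.

```python
import math

nan = float("nan")

# Every ordering comparison involving a NaN is False, in both directions,
# and a NaN does not even compare equal to itself.
comparisons = [nan < 10, nan > 10, 10 < nan, 10 > nan, nan == nan]


def first_wins_max(a, b):
    """Hypothetical two-argument max: keep a unless b compares greater."""
    return b if b > a else a


left = first_wins_max(10, nan)    # nan > 10 is False, so 10 is kept
right = first_wins_max(nan, 10)   # 10 > nan is False, so nan is kept
```

Under that assumption, the NaN "wins" only when it happens to come first, which matches the two results shown above.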
It is the NaN itself that is responsible for the behavior you're seeing. It is
up to the NaN documentation to discuss its effects on downstream code. For
example, sorting isn't consistent when NaNs are present:
>>> sorted([10, nan, 5])
[10, nan, 5]
>>> sorted([5, nan, 10])
[5, nan, 10]
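
Since the responsibility sits with the caller, one possible way to cope is to drop the NaNs before sorting. This is only a sketch of one option, using math.isnan(); whether NaNs should be dropped, replaced, or treated as an error depends on the application.

```python
import math

nan = float("nan")
data = [10, nan, 5]

# Drop NaNs first so that every remaining pair of values is comparable.
clean = sorted(x for x in data if not math.isnan(x))
```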
Also, a NaN is just one of many possible objects that create weird downstream
effects. For example, sets have a partial ordering and would also create odd
results with sorted(), bisect(), max(), min(), etc.
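
A small sketch of the set case, using two disjoint sets chosen purely for illustration:

```python
a, b = {1}, {2}

# Disjoint sets are neither subsets nor supersets of each other, so every
# ordering comparison between them is False even though they are not equal.
unordered = not (a < b) and not (a > b) and a != b

# max() only replaces its current candidate when a later item compares
# greater, so the result depends on argument order, just as with NaN.
first = max(a, b)
second = max(b, a)
```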
The situation with Infinity is similar. It is a special object that has the
unusual property that inf+1==inf, so bisection of cumulative sums will give the
rightmost infinity.
Consider a population 'ABC' and weights of [5, 7, 2]. We have
Element   Weight   Cumulative Weight   Range (half-open interval)
-------   ------   -----------------   --------------------------
A         5        5                   0.0 <= X < 5.0
B         7        12                  5.0 <= X < 12.0
C         2        14                  12.0 <= X < 14.0
          ------
          14
The selector X comes from random() * 14.0, which gives a range of
0.0 <= X < 14.0.
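
The table can be reproduced with a short sketch built on itertools.accumulate() and bisect(). The function name pick and its optional x parameter are made up for this example so the selector can be fixed by hand; this is not necessarily the exact code inside random.choices().

```python
from bisect import bisect
from itertools import accumulate
from random import random


def pick(population, weights, x=None):
    """Sketch of weighted selection by bisecting cumulative weights."""
    cum_weights = list(accumulate(weights))   # [5, 7, 2] -> [5, 12, 14]
    total = cum_weights[-1]
    if x is None:
        x = random() * total                  # 0.0 <= x < total
    # bisect() returns the index of the first cumulative weight greater
    # than x, which is the element whose half-open range contains x.
    return population[bisect(cum_weights, x)]


picks = [pick("ABC", [5, 7, 2], x) for x in (0.0, 4.9, 5.0, 11.9, 12.0, 13.9)]
```

Each boundary value lands in the range that starts at it, as the half-open intervals require.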
Now consider a population 'ABC' and weights of [5, 0, 2]. We have
Element   Weight   Cumulative Weight   Range (half-open interval)
-------   ------   -----------------   --------------------------
A         5        5                   0.0 <= X < 5.0
B         0        5                   5.0 <= X < 5.0
C         2        7                   5.0 <= X < 7.0
          ------
          7
If X is 5.0, we have to choose C because B has a zero selection probability.
So we have to pick the rightmost (bottommost) range that contains 5.0, giving C.
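
A quick check of the "rightmost" rule against the cumulative weights [5, 5, 7] from the table, comparing the two bisection variants in the bisect module:

```python
from bisect import bisect_left, bisect_right

population = "ABC"
cum_weights = [5, 5, 7]   # cumulative sums of the weights [5, 0, 2]
x = 5.0

# bisect_right() skips past every entry equal to x, landing on C.
right = bisect_right(cum_weights, x)

# bisect_left() stops before the first entry equal to x, landing on A,
# whose half-open range 0.0 <= X < 5.0 does not contain 5.0 at all.
left = bisect_left(cum_weights, x)

chosen = population[right]
```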
Now, replace B's weight with float('Inf'):
Element   Weight   Cumulative Weight   Range (half-open interval)
-------   ------   -----------------   --------------------------
A         5        5                   0.0 <= X < 5.0
B         Inf      Inf                 5.0 <= X < Inf
C         2        Inf                 Inf <= X < Inf
          ------
          Inf
Since Inf+2 is Inf and Inf==Inf, the latter two ranges are undifferentiated.
The selector itself is always Inf because "X = random() * inf" always gives
inf. Using the previous rule, we have to choose the rightmost Inf which is C.
This is in fact what choices() does:
>>> choices('ABC', [5, float('Inf'), 2])
['C']
It isn't an error. That is the same behavior that bisect() has when searching
for an infinite value (or any other object that universally compares larger than
anything except itself):
>>> from bisect import bisect
>>> inf = float('inf')
>>> bisect([10, 20, inf], inf)
3
>>> bisect([10, 20, 30], inf)
3
When searching for infinity, you always get the rightmost insertion point even
if the cut points are finite. Accordingly, it makes sense that if one of the
weights is infinite, then the total is infinite, and the selector is infinite, so
you always get the rightmost value.
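
That chain of reasoning can be traced step by step in a standalone sketch. It deliberately avoids calling random.choices(), whose handling of non-finite weights may differ between Python versions. The fixed fraction 0.5 stands in for a random() result, and the hi cap on the search is an assumption about how the index is kept within the population.

```python
from bisect import bisect
from itertools import accumulate

inf = float("inf")
population = "ABC"

cum_weights = list(accumulate([5, inf, 2]))   # [5, inf, inf]
total = cum_weights[-1]                       # inf
x = 0.5 * total                               # a positive fraction of inf is inf

# Assumed cap: search only up to the last index so the result stays valid.
hi = len(population) - 1
index = bisect(cum_weights, x, 0, hi)
chosen = population[index]
```

Without the cap, bisect(cum_weights, x) would return 3, one past the end of the population.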
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue41773>
_______________________________________