[issue20479] Efficiently support weight/frequency mappings in the statistics module

Wolfgang Maier Sun, 02 Feb 2014 14:28:37 -0800

Wolfgang Maier added the comment:

> -----Original Message-----
> From: Steven D'Aprano [mailto:[email protected]]
> Sent: Sunday, February 2, 2014 12:55
> To: [email protected]
> Subject: [issue20479] Efficiently support weight/frequency mappings in the
> statistics module
> 
> 
> Steven D'Aprano added the comment:
> 
> Off the top of my head, I can think of three APIs:
> 
> (1) separate functions, as Nick suggests:
> mean vs weighted_mean, stdev vs weighted_stdev
> 
> (2) treat mappings as implied (value, frequency) pairs
>


(2) is clearly my favourite. (1) may work well if only a small fraction of a 
module's functions need an alternate API.
In the statistics module, however, almost all of the current functions could 
benefit from a way to treat mappings specially.
In such a case, (1) is prone to creating a lot of redundancy.
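To make the contrast with (1) concrete, here is a minimal sketch of option (2): a single function that checks whether it was given a mapping and, if so, reads it as {value: frequency}, so no separate weighted_mean is needed. The name mean_sketch is hypothetical, integer frequencies are assumed, and the sketch returns an exact Fraction without any type coercion.

```python
from collections.abc import Mapping
from fractions import Fraction


def mean_sketch(data):
    """Arithmetic mean; a Mapping is read as {value: frequency} (API option 2).

    Hypothetical illustration only, not statistics.mean.
    Assumes integer frequencies; returns an exact Fraction.
    """
    if isinstance(data, Mapping):
        count = sum(data.values())  # total frequency = number of data points
        total = sum(Fraction(x) * m for x, m in data.items())
    else:
        values = list(data)         # materialize so we can count the items
        count = len(values)
        total = sum(Fraction(x) for x in values)
    if count == 0:
        raise ValueError("mean requires at least one data point")
    return total / count


# The same data expressed both ways gives the same answer.
print(mean_sketch({1: 3, 2: 1}))  # 5/4
print(mean_sketch([1, 1, 1, 2]))  # 5/4
```

One consequence of this design is that a plain dict, which ordinary iteration reads as its keys, takes on a different meaning once the mapping check is in place.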

I do not share Oscar's opinion that

> apart from mode() the implementation of each function on
> map-format data will be completely different from the iterable version
> so you'd want to have it as a separate function at least internally
> anyway.

Consider _sum's current code (docstring omitted for brevity):
def _sum(data, start=0):
    n, d = _exact_ratio(start)
    T = type(start)
    partials = {d: n}  # map {denominator: sum of numerators}
    # Micro-optimizations.
    coerce_types = _coerce_types
    exact_ratio = _exact_ratio
    partials_get = partials.get
    # Add numerators for each denominator, and track the "current" type.
    for x in data:
        T = _coerce_types(T, type(x))
        n, d = exact_ratio(x)
        partials[d] = partials_get(d, 0) + n
    if None in partials:
        assert issubclass(T, (float, Decimal))
        assert not math.isfinite(partials[None])
        return T(partials[None])
    total = Fraction()
    for d, n in sorted(partials.items()):
        total += Fraction(n, d)
    if issubclass(T, int):
        assert total.denominator == 1
        return T(total.numerator)
    if issubclass(T, Decimal):
        return T(total.numerator)/total.denominator
    return T(total)
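The partials dictionary works by grouping exact ratios by denominator. Writing each data point as x = n_x / d_x (the pair returned by _exact_ratio), partials[d] holds the inner sum on the left below; if each value also carries a frequency m_x, only the numerators change, as on the right:

```latex
\sum_{x} x = \sum_{d} \frac{1}{d} \sum_{x:\, d_x = d} n_x,
\qquad
\sum_{x} m_x\, x = \sum_{d} \frac{1}{d} \sum_{x:\, d_x = d} m_x\, n_x
```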

All you'd have to do to treat mappings as proposed here is to add a check for 
whether we are dealing with a mapping and, in that case, instead of the for 
loop:

    for x in data:
        T = _coerce_types(T, type(x))
        n, d = exact_ratio(x)
        partials[d] = partials_get(d, 0) + n

use this:

    for x, m in data.items():
        T = _coerce_types(T, type(x))
        n, d = exact_ratio(x)
        partials[d] = partials_get(d, 0) + n*m

with no other changes needed (though I haven't tested this carefully).
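As a stand-alone illustration of the idea, the following sketch reimplements just the exact-summation core with the mapping branch added. It is not the statistics module's _sum: the name exact_sum is hypothetical, Fraction(x) stands in for the private _exact_ratio helper, type coercion and non-finite values are left out, and it always returns a Fraction. Frequencies are assumed to be integers, because Fraction(n, d) raises TypeError if n*m has become a float.

```python
from collections.abc import Mapping
from fractions import Fraction


def exact_sum(data):
    """Exact sum of data; a Mapping is read as {value: frequency}.

    Hypothetical stand-alone helper, not statistics._sum.
    Assumes finite int/float/Decimal/Fraction values and integer frequencies.
    """
    if isinstance(data, Mapping):
        pairs = data.items()            # (value, frequency) pairs
    else:
        pairs = ((x, 1) for x in data)  # plain iterable: each item counts once
    partials = {}                       # {denominator: sum of weighted numerators}
    for x, m in pairs:
        q = Fraction(x)                 # exact conversion, in place of _exact_ratio
        d = q.denominator
        partials[d] = partials.get(d, 0) + q.numerator * m
    total = Fraction()
    for d, n in sorted(partials.items()):
        total += Fraction(n, d)
    return total


print(exact_sum({1: 3, 2: 2}))        # 7
print(exact_sum([1, 1, 1, 2, 2]))     # 7
print(exact_sum({0.5: 4, 0.25: 2}))   # 5/2
```

Wrapping a plain iterable as frequency-1 pairs lets one loop serve both cases; swapping in a second loop for mappings, as described above, is equivalent.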

Wolfgang

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue20479>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
