On 15/08/13 21:42, Mark Dickinson wrote:
The PEP and code look generally good to me.
I think the API for median and its variants deserves some wider discussion:
the reference implementation has a callable 'median', and variant callables
'median.low', 'median.high', 'median.grouped'. The pattern of attaching
the variant callables as attributes on the main callable is unusual, and
isn't something I've seen elsewhere in the standard library. I'd like to
see some explanation in the PEP for why it's done this way. (There was
already some discussion of this on the issue, but that was more centered
around the implementation than the API.)
I'd propose two alternatives for this: either have separate functions
'median', 'median_low', 'median_high', etc., or have a single function
'median' with a "method" argument that takes a string specifying
computation using a particular method. I don't see a really good reason to
deviate from standard patterns here, and fear that users would find the
current API surprising.
Alexander Belopolsky has convinced me (off-list) that my current implementation
is better changed to a more conservative one of a callable singleton instance
with methods implementing the alternative computations. I'll have something
like:
def _singleton(cls):
return cls()
@_singleton
class median:
def __call__(self, data):
...
def low(self, data):
...
...
In my earlier stats module, I had a single median function that took a argument to choose
between alternatives. I called it "scheme":
median(data, scheme="low")
R uses parameter called "type" to choose between alternate calculations, not
for median as we are discussing, but for quantiles:
quantile(x, probs ... type = 7, ...).
SAS also uses a similar system, but with different numeric codes. I rejected both "type"
and "method" as the parameter name since it would cause confusion with the usual meanings
of those words. I eventually decided against this system for two reasons:
- Each scheme ended up needing to be a separate function, for ease of both implementation
and testing. So I had four private median functions, which I put inside a class to act as
namespace and avoid polluting the main namespace. Then I needed a "master
function" to select which of the methods should be called, with all the additional
testing and documentation that entailed.
- The API doesn't really feel very Pythonic to me. For example, we write:
mystring.rjust(width)
dict.items()
rather than mystring.justify(width, "right") or dict.iterate("items"). So I
think individual methods is a better API, and one which is more familiar to most Python users. The
only innovation (if that's what it is) is to have median a callable object.
As far as having four separate functions, median, median_low, etc., it just
doesn't feel right to me. It puts four slight variations of the same function
into the main namespace, instead of keeping them together in a namespace. Names
like median_low merely simulates a namespace with pseudo-methods separated with
underscores instead of dots, only without the advantages of a real namespace.
(I treat variance and std dev differently, and make the sample and population
forms separate top-level functions rather than methods, simply because they are
so well-known from scientific calculators that it is unthinkable to me to do
differently. Whenever I use numpy, I am surprised all over again that it has
only a single variance function.)
--
Steven
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com