David Mertz wrote:
> Set union, however, has a great deal in common with bitwise-or
On the contrary, in mathematics there's the concept of direct sum of sets, and
the categorical sum, aka disjoint union. They are not union operations, but
similar. They have not named them "direct or" and "catego
On Thu, Dec 26, 2019 at 11:15 PM Marco Sulla via Python-ideas
wrote:
> Chris Angelico wrote:
> > > Subtracting two lists or two strings makes no sense, so the comparison is
> > > unfair.
> > Except that it DOES make sense in some contexts.
>
> Source, please.
Pike v8.1 release 13 running Hilfe v3.5
Hello,
I think it would be nice to introduce an avg method for lists as a built-in
function in Python 3.
To get the average of a list, I need to use some libs (e.g. numpy).
In my opinion, if I can get the sum of a list, I should be able to get the
average in the same way.
For example [python3]:
>>> l = [5, 9, 7,]
...
...
Chris Angelico wrote:
> > Mathematically, the operator is ⊂. "<" operator is used for comparison, and
> > it's vital for sorting. And sorting sets makes no sense.
> Once again, you assert this. Do you have proof that it absolutely
> makes NO SENSE in any context, or just that you don't see value in
Just use `from statistics import mean as avg` (see
https://docs.python.org/3/library/statistics.html#statistics.mean).
Please provide some justification for why you think it's desirable to
make `avg` a builtin, considering that doing so is a backwards-incompatible
change due to the more than li
Thank you Sebastien for your contribution. Maybe I wasn't clear.
My idea is to be able to use an avg function without importing any library.
The reason to propose this evolution is basically,
* If I can do sum(list) and len(list), it would be better to do avg(list)
(since I know sum and len of my
The Python standard library module 'statistics' has a "mean" function.
On Thu, Dec 26, 2019, 08:54 Kemal Diri wrote:
> Hello,
>
> I think it would be nice to introduce an avg method for lists as a
> built-in function in Python 3.
> To get the average of a list, I need to use some libs (e.g. numpy).
>
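For reference, a minimal sketch of the `statistics.mean` route mentioned above, using the list from the original post. The alias `avg` is only an illustrative local name, not something the standard library provides.

```python
from statistics import mean as avg  # `avg` is just a local alias (assumption)

values = [5, 9, 7]

# Same result as the manual sum/len computation for this list.
manual = sum(values) / len(values)
result = avg(values)
print(manual, result)
```

No third-party package is needed for this; `statistics` ships with Python 3.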
So why only mean and not median, which is better for statistics? :D
Seriously, if you want it builtin, add it to PYTHONSTARTUP:
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONSTARTUP
from statistics import mean
This came up in discussion here before, maybe a year ago, I think. There
was a decision not to change the implementation, but that seemed like a
mistake (and the discussion was about broader things).
Anyway, I propose that the obviously broken version of
`statistics.median()` be replaced with a b
Kemal Diri writes:
> In my opinion, if I can get the sum of a list, I should be able to get the
> average in the same way.
And later:
> The reason to propose this evolution is basically,
> * If I can do sum(list) and len(list), it would be better to do
> avg(list) (since I know sum and len of my list),
Oh yeah, I should have thrown in these cases:
>>> statistics.median([1, 2, nan, 3, 4])
nan
>>> statistics.median([2, 3, 4, nan, 1])
4
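One way to see where results like these come from, sketched under the assumption that `statistics.median()` orders its input with `sorted()` (as the thread goes on to discuss): every comparison involving NaN is false, so a sort has no consistent place to put it. Dropping NaNs first gives an answer that does not depend on input order. The helper name `median_skipping_nans` is hypothetical.

```python
import math
import statistics

nan = float("nan")

# NaN compares false against everything, including itself.
print(nan < 1, nan > 1, nan == nan)  # False False False

def median_skipping_nans(data):
    """Hypothetical helper: drop NaNs, then take the ordinary median."""
    return statistics.median(x for x in data if not math.isnan(x))

a = median_skipping_nans([1, 2, nan, 3, 4])
b = median_skipping_nans([2, 3, 4, nan, 1])
print(a, b)
```

Both calls see the same four numbers once the NaN is removed, so they agree regardless of where the NaN sat in the input.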
I admit that this issue *never* hits me in particular since I use NumPy and
Pandas widely, and every time I do statistics it is from those packages (or
statsmodels
On Wed, Dec 25, 2019, at 21:09, python-ideas--- via Python-ideas wrote:
> On the contrary, on sets you can apply union *and* difference. And
> since union seems the exact contrary of difference, it's illogical that
> | is used instead of +.
But sets also support symmetric difference ^, and inter
On 12/26/19 10:31 AM, David Mertz wrote:
This came up in discussion here before, maybe a year ago, I think.
There was a decision not to change the implementation, but that seemed
like a mistake (and the discussion was about broader things).
Anyway, I propose that the obviously broken version
Well, *I* know the implementation. And I know about NaN being neither less
than nor greater than anything else (even itself). And I know the basic
working of Timsort.
But a lot of other folks, especially beginners or casual users, don't know
all that. They do know that fractional numbers are a thin
Stephen J. Turnbull wrote:
> > from statistics import mean
> > sum([1e16,1,1])/3 == 1e16/3# surprise!
> > True
> > mean([1e16,1,1]) == 1e16/3
> > False
> > Regards,
Python 3.9.0a0 (heads/master-dirty:d8ca2354ed, Oct 30 2019, 20:25:01)
[GCC 9.2.1 20190909] on linux
Type "help", "copyright"
Random832 wrote:
> But sets also support symmetric difference ^, and intersection &. All the
> bitwise operators mean the same thing that they do for an integer imagined as
> a set of bit values. The use of - for difference is the odd one out
and?
Well, some days ago I didn't know about the `statistics` module, so I
wrote my own median implementation, which I improved with the help of a
private discussion:
```
import math

def median(it, member=False, sort_fn=sorted, **kwargs):
    if sort_fn is None:
        # Don't sort. Coder must be careful to
> On Dec 26, 2019, at 04:15, Marco Sulla via Python-ideas
> wrote:
>
> Mathematically,
Whenever someone tries to argue that “Mathematically, this doesn’t make sense”
it ends up isomorphic to an argument that they really would have enjoyed one
more semester of math classes as an undergrad but
As I was saying, the issue is that statistics.median can deal with many
types and to have it special case for nan would be awkward. The user
could also have done something like use None values (but this does give
an error).
Perhaps where the test could be done would be in the built in function
On Dec 26, 2019, at 05:54, Kemal Diri wrote:
>
>
> Hello,
>
> I think it would be nice to introduce an avg method for lists as a built-in
> function in Python 3.
> To get the average of a list, I need to use some libs (e.g. numpy).
You don’t need a third party library like numpy; you can use sta
On 12/26/19 1:38 PM, Marco Sulla via Python-ideas wrote:
Well, some days ago I didn't know about the `statistics` module, so I
wrote my own median implementation, which I improved with the help of a
private discussion:
```
import math

def median(it, member=False, sort_fn=sorted, **kwargs):
    if s
On Dec 26, 2019, at 10:58, Richard Damon wrote:
>
> Note, that NaN values are somewhat rare in most programs, I think they can
> only come about by explicitly requesting them (like float("nan") ) or perhaps
> with some of the more advanced math packages
You can get them easily just from math i
The problem is that everyone has a different idea about what a "basic
operation" is. If everything that anyone considered a "basic operation"
was included as a built-in then the builtins would be unusably large. That
is why we have the standard library, so people can easily do "basic
operation
Andrew Barnert wrote:
> > the operator is ⊂. "<" operator is used for
> > comparison, and it's vital for sorting.
> Yes. It’s the defining operation for the partial order in a poset (partially
> ordered set). And when studying posets generically, you always spell the
> operation <.
Nope.
Usuall
Richard Damon wrote:
> Note, this function still has issues with NaN values, unless you change
> to use a sort function different than sorted
Well, yes. Indeed, I wrote that you can change the `sort_fn` parameter.
Anyway, IMHO the best part of this function is that you no longer need the
biased m
On Dec 26, 2019, at 10:53, Andrew Barnert wrote:
>
> You’ve got it backward. Historically, the subset symbol is a C squashed to
> look graphically similar to a < (or actually a reversed version of a reversed
> C squashed to look graphically similar to >), and Russell, who chose that out
> of t
Maybe we can just change the function signature:
statistics.median(it, do_wrong_ass_thing_with_nans=False)
:-)
But yes, the problem is really with sorted(). However, the implementation
of statistics.median() doesn't HAVE TO use sorted(), that's just one
convenient way to do it.
There IS NO righ
IMHO, another sorted function, slower than `sorted()`, should be added.
I don't know exactly how timsort works, but I know that it requires only
`__lt__()`.
So it probably checks, element by element, whether `elem_b < elem_a`, and
probably exchanges them if the result is `1`. If it is `0`, it does nothing.
(Yes, `0` an
Well, Barnert, maybe you didn't understand my irony, so I'll speak more seriously.
This is extremely interesting, but **completely** OT.
We are discussing the operator that could potentially merge two `dict`s in
the future.
I think this is also OT for the mailing list... but I think you could
On 12/26/19 2:10 PM, Andrew Barnert via Python-ideas wrote:
On Dec 26, 2019, at 10:58, Richard Damon wrote:
Note, that NaN values are somewhat rare in most programs, I think they can only come
about by explicitly requesting them (like float("nan") ) or perhaps with some
of the more advanced m
> On Dec 26, 2019, at 11:32, Marco Sulla via Python-ideas
> wrote:
>
> Andrew Barnert wrote:
>>> the operator is ⊂. "<" operator is used for
>>> comparison, and it's vital for sorting.
>> Yes. It’s the defining operation for the partial order in a poset (partially
>> ordered set). And when stud
On 12/26/19 3:14 PM, David Mertz wrote:
Maybe we can just change the function signature:
statistics.median(it, do_wrong_ass_thing_with_nans=False)
:-)
But yes, the problem is really with sorted(). However, the
implementation of statistics.median() doesn't HAVE TO use sorted(),
that's just on
Here is an implementation that:
A. Only relies on '<'
B. Has no special knowledge of NaN
C. Does not use sorted() or .sort() for anything
D. Is Pandas-like in choosing among only comparable items
E. Only picks an element from the original iterator, does not take mean of
two candidates
(I.e. ki
On Thu, Dec 26, 2019 at 4:12 PM Richard Damon
wrote:
> As was pointed out, the statistics module specifically doesn't claim to
> replace more powerful packages, like Numpy, so expecting it to handle
> this level of nuance is beyond its specification.
>
Not being flat-out crazy in its answer isn'
Andrew Barnert wrote:
> I didn’t want to get into that, because I assumed you weren’t going to argue
> that <= makes sense for sets but < doesn’t
So you're talking about **strict** partial ordering. I could spend thousands of
words, but I think Python can speak for me:
```
(venv) marco@buzz:~/s
FWIW, here is a timing:
>>> many_nums = [randint(10, 100) for _ in range(1_000_000)]
>>> %timeit statistics.median_low(many_nums)
87.2 ms ± 654 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit median(many_nums)
282 ms ± 3.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Oh... mine doesn't handle unbalanced duplicate values correctly :-). Maybe
I'll figure out how to fix it. But I'm not really proposing the
implementation anyway, just making the abstract point that we don't need
sorted()
On Thu, Dec 26, 2019 at 4:34 PM David Mertz wrote:
> FWIW, here is a timi
Really the point about partial order was EXACTLY the thread.
If you want to say that floating point numbers are not ordered for exactly
the same reason, and in exactly the same way, as sets... well, I guess you
can die on that hill. Since NaN is an IEEE-854 value, everything you
mention is precis
David Mertz wrote:
> Here is an implementation that:
> A. Only relies on '<'
Well, no. There's an `==` check.
Can you please read these 2 posts of mine?
https://mail.python.org/archives/list/python-ideas@python.org/message/7255SH6LSC266HAGI4SRJGV4JTUMMI4J/
https://mail.python.org/archives/list/py
On Dec 26, 2019, at 12:19, Marco Sulla via Python-ideas
wrote:
>
> IMHO, another sorted function, slower than `sorted()`, should be added.
You can very easily just write a key function that does this for you. In fact,
you can write different key functions for different variations.
For example, if yo
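As one illustration of the key-function approach described above (a sketch, not necessarily the example the message itself goes on to give): the hypothetical key `nans_last` below sorts floats normally and shifts NaNs to the end.

```python
import math

def nans_last(x):
    """Hypothetical key: ordinary floats sort first, NaNs are pushed to the end."""
    # For NaN, return (True, 0.0) so the NaN itself is never compared.
    return (True, 0.0) if math.isnan(x) else (False, x)

data = [2.0, float("nan"), 1.0, 3.0, float("nan"), 0.5]
ordered = sorted(data, key=nans_last)
print(ordered)
```

Swapping `True` and `False` in the key would move the NaNs to the front instead, which is one of the "different variations" a key function makes cheap to express.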
On Fri, Dec 27, 2019 at 9:07 AM Andrew Barnert via Python-ideas
wrote:
>
> You can very easily just write a key function that does this for you. In
> fact, you can write different key functions for different variations.
>
> For example, if you’re specifically sorting floats and want to shift nans
On Dec 26, 2019, at 12:36, Richard Damon wrote:
>
> On 12/26/19 2:10 PM, Andrew Barnert via Python-ideas wrote:
>>> On Dec 26, 2019, at 10:58, Richard Damon wrote:
>>> Note, that NaN values are somewhat rare in most programs, I think they can
>>> only come about by explicitly requesting them (
On Dec 26, 2019, at 14:13, Chris Angelico wrote:
>
> On Fri, Dec 27, 2019 at 9:07 AM Andrew Barnert via Python-ideas
> wrote:
>>
>> You can very easily just write a key function that does this for you. In
>> fact, you can write different key functions for different variations.
>>
>> For exam
David Mertz wrote:
> NaN is an IEEE-854 value, everything you
> mention is precisely identical of floats.
> Is your argument that we need to stop using the '<' operator for floats
> also?!
Nope, but I simply ignored in this case IEEE 754-2019 (which supersedes IEEE
854) and I raised an exception,
On Fri, Dec 27, 2019 at 9:35 AM Andrew Barnert wrote:
>
> On Dec 26, 2019, at 14:13, Chris Angelico wrote:
> >
> > On Fri, Dec 27, 2019 at 9:07 AM Andrew Barnert via Python-ideas
> > wrote:
> >>
> >> You can very easily just write a key function that does this for you. In
> >> fact, you can wr
Andrew Barnert wrote:
> On Dec 26, 2019, at 12:19, Marco Sulla via Python-ideas
> python-ideas@python.org wrote:
> you can get the behavior of your algorithm below:
> @functools.cmp_to_key
> def flip_incomparables_key(a, b):
>     if a < b: return -1
>     if b < a: return 1
>     return 1
^_
> On Dec 26, 2019, at 14:46, Marco Sulla via Python-ideas
> wrote:
>
> David Mertz wrote:
>> NaN is an IEEE-854 value, everything you
>> mention is precisely identical of floats.
>> Is your argument that we need to stop using the '<' operator for floats
>> also?!
>
> Nope, but I simply ignored
On Dec 26, 2019, at 15:27, Marco Sulla via Python-ideas
wrote:
>
> Andrew Barnert wrote:
>> On Dec 26, 2019, at 12:19, Marco Sulla via Python-ideas
>> python-ideas@python.org wrote:
>> you can get the behavior of your algorithm below:
>> @functools.cmp_to_key
>> def flip_incomparables_key(a, b
Nope. flip_incomparables_key does not work, and neither does my key. This one works:
```
import functools

@functools.cmp_to_key
def iliadSort(a, b):
    if a < b:
        res = -1
    elif not b == b:
        res = -1
    else:
        res = 0
    return res

x = float("nan")
y = float("nan")
p
Thanks everyone commenting on this thread. I haven't quite read it all
yet (I will) but I wanted to get a few comments now.
On Thu, Dec 26, 2019 at 10:31:00AM -0500, David Mertz wrote:
> Anyway, I propose that the obviously broken version of
> `statistics.median()` be replaced with a better imp
Andrew Barnert wrote:
> if you’re going to be a pedant, the floats in
> whatever Python you’re using right now are probably 854/754-1985 doubles, not
> 754-2019 binary64s.
Mr. Andrew Barnert,
if being a pedant means adhering to the standard, then yes, I'm a pedant.
> > This is because NaN, IMHO, it's not the
Steven D'Aprano wrote:
> Marco, you don't have to use median_low and median_high if you don't
> like them, but they aren't any worse than any other choice for
> calculating order statistics. All order statistics (apart from min and
> max) require you to sometimes make a choice between returning
FWIW, although no one cares, I "withdraw" my proposed implementation.
While it bugs me that I'm not sure what error I made in dealing with
duplicate values in an iterable, on reflection I think the whole idea is
wrong.
That is, I don't like the weirdness of the behavior of statistics.median.
But w
It's very common to see:
```
for i, x in enumerate(sequence):
    [...]
```
and also to see
```
for i in range(len(sequence)):
    [...]
```
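For comparison, a runnable version of the two idioms above, using an arbitrary example list (the names `pairs` and `indexes` are only illustrative):

```python
sequence = ["a", "b", "c"]

# Index and value together.
pairs = [(i, x) for i, x in enumerate(sequence)]

# Index only.
indexes = [i for i in range(len(sequence))]

print(pairs)
print(indexes)
```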
I propose to introduce for sequences the methods `indexes()` and `entries()`.
They are similar to `keys()` and `items()` for maps. I changed the names
David Mertz wrote:
> So we could get the Pandas-style behavior simply by calling median like so:
> statistics.median((x for x in it if not math.isnan(x)))
This is wrong. Or maybe potentially wrong.
This way you're removing items from the iterable, so you're moving the median.
If the NaNs are not
Yes. Like Pandas does! Like I wrote!
And yes, it is one of three plausible good behaviors that Steven described
well. Which is kinda why I'd still like a named parameter like 'on_nan' to
choose which behavior you want inside the function.
Unfortunately, propagating/poisoning NaN like NumPy does ca
Oh my... Mertz, listen to me, you don't need a parameter. You only need a key
function to pass to `sorted()`.
If you use this key function:
https://mail.python.org/archives/list/python-ideas@python.org/message/M3DEOZLA63Z5OIF6H6ZCEXK36GQMLVVA/
in my median() function:
https://mail.python.org/arch
Excuse me, but extraordinarily I agree with D'Aprano :D
Usually, if you want the first element of an iterable, you just have to do:
```
it = iter(iterable)
first = next(it)
```
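A short sketch of why keeping the iterator around matters: after `next(it)`, the same iterator carries on from the second element. The default argument to `next()` is an addition here, used to avoid `StopIteration` on an empty iterable.

```python
iterable = [10, 20, 30]

it = iter(iterable)
first = next(it, None)   # None if the iterable is empty
rest = list(it)          # the iterator continues after the first element

print(first, rest)
```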
Yes, `first()` is really sexy... but a simple question: where is the iterator?
With the code above, I can continue to
The behavior of your sort function is not any of the desirable options.
Moving NaNs to the end is not the widely used Pandas style of removing them;
I cannot think of any situation where that behavior would be useful... even
though I've read the Iliad.
Other than wastefully creating an eager list,
David Mertz wrote:
> The behavior of your sort function is not any of the desirable options.
> Moving NaNs to the end is not the widely used Pandas style of removing them
...Mertz, you are really hardheaded. I supported **all** the options of your
lovely Pandas, which also supports poisoning, and
On Fri, Dec 27, 2019 at 1:23 PM Marco Sulla via Python-ideas
wrote:
> I propose to introduce for sequences the methods `indexes()` and `entries()`.
>
"Sequence" is a protocol, not a class. Adding a method to sequences
actually means mandating that everything that calls itself a sequence
now has t
On Fri, Dec 27, 2019 at 03:40:10AM -, Marco Sulla via Python-ideas wrote:
> Oh my... Mertz, listen to me, you don't need a parameter. You only
> need a key function to pass to `sorted()`
How do you pass the key function to sorted() without a parameter?
> median(iterable, key=iliadSort)
Wha
On Thu, Dec 26, 2019 at 02:23:42PM -0800, Andrew Barnert via Python-ideas wrote:
> I don’t think that’s true. Surely the median of (-inf, 1, 2, 3, inf,
> inf, inf) is well defined and can only be 3?
It's well-defined, but probably not good statistics. I'm not sure what
measurement you are makin
On Fri, Dec 27, 2019 at 04:32:44AM -, Marco Sulla via Python-ideas wrote:
> Think about this: you have a population of 1 million people. You
> want to take the median of their heart rate. But for some reason, your
> calculations give you some NaNs.
The only reasonable scenario for that i
On Fri, Dec 27, 2019 at 02:03:57AM -, Marco Sulla via Python-ideas wrote:
> Steven D'Aprano wrote:
> > Marco, you don't have to use median_low and median_high if you don't
> > like them, but they aren't any worse than any other choice for
> > calculating order statistics. All order statistics
Forcing NANs to the end is not the right solution.
Consider the median of [NAN, 2, 3, 4, 5]. If you force the NAN to remain
at the start, the median is 3. If you force the NAN to the end of the
list, the median is 4. Your choice to force NANs to the end is
equivalent to introducing a bias towar
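The arithmetic above can be checked with a small sketch. The helper `middle` is hypothetical and simply picks the central element of an already-ordered, odd-length list, which is what a sort-based median does once the NaN position has been forced.

```python
nan = float("nan")

def middle(ordered):
    """Hypothetical helper: central element of an odd-length, pre-ordered list."""
    return ordered[len(ordered) // 2]

nan_first = [nan, 2, 3, 4, 5]
nan_last = [2, 3, 4, 5, nan]

print(middle(nan_first))  # 3
print(middle(nan_last))   # 4
```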