[Python-ideas] Re: A standard library Multiset implementation?

2022-08-17 Thread Raymond Hettinger
> I am often driven to use, for example, set(itertools.permutations(multiset, n))

Try the more-itertools package: 
https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.distinct_permutations

from more_itertools import distinct_permutations
from collections import Counter

c = Counter('abracadabra')
print(list(distinct_permutations(c.elements(), 3)))


> there is little if any solid evidence that they do what they claim to do (

The code in more-itertools is trustworthy.  Also the docs have links to the 
source, so you can just read the code and let it earn your trust.


Raymond ___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/GQQDVWYY5JPGEBL7DY33O4IZGD6ADLDV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Additional LRU cache introspection facilities

2021-01-12 Thread Raymond Hettinger
> I propose a method:
> ...
> returns a dictionary {arg: value} representing the cache. 
> It wouldn't be the cache itself, just a shallow copy 
> of the cache data

I recommend against going down this path.

It exposes (and potentially locks in) implementation details such as
how we distinguish positional arguments, keyword arguments, and
type information (something that has changed more than once).

Also, a shallow copy still leaves plenty of room for meddling
with the contents of the keys, potentially breaking the integrity
of the cache.

Another concern is that we've worked hard to remove potential
deadlocks from the lru_cache.  Hanging on a lock while copying the
whole cache complicates our efforts and risks breaking it as users
exploit the new feature in unpredictable ways.

FWIW, OrderedDict provides methods that make it easy to roll
your own variants of the lru_cache().  It would be better to do that
than to complexify the base implementation in ways that I think
we would regret.
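
For anyone wanting to try that route, here is a minimal sketch (emphatically 
not the stdlib implementation) of an LRU cache built on OrderedDict; locking, 
keyword arguments, and instrumentation are left out:

    from collections import OrderedDict
    from functools import wraps

    def simple_lru_cache(maxsize=128):
        def decorating_function(func):
            cache = OrderedDict()
            @wraps(func)
            def wrapper(*args):
                if args in cache:
                    cache.move_to_end(args)      # mark as most recently used
                    return cache[args]
                result = func(*args)
                cache[args] = result
                if len(cache) > maxsize:
                    cache.popitem(last=False)    # evict least recently used
                return result
            return wrapper
        return decorating_function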

Raymond
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OKQO6GFE4JTEAJR4S454KMMKN6C6CNUZ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Inline Try-Except Clause

2020-08-06 Thread raymond . hettinger
> Have a look at PEP 463, which looks into this in some detail.

I wish this PEP had gained more traction.  Sooner or later, everyone wants an 
expression form of a try/except.

When it comes to expressing "in the event of this exception, I want this 
default",  exception expressions read much more nicely than an equivalent 
try/except block.
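
For example, under the PEP 463 syntax, a lookup with a default would have read:

    value = seq[index] except IndexError: default

instead of:

    try:
        value = seq[index]
    except IndexError:
        value = default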

Also, new syntax would keep the rest of the language clean so that we don't end 
up adding dozens of get() methods, or expanding function signatures with 
default arguments, as with the min() and max() functions.

It would be great if this PEP were to be resurrected.

Raymond
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/S4NAVCQQAXYHXYL5DYOYMTACZ6G6A4SW/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Propouse add context to json module.

2020-08-06 Thread raymond . hettinger
Based on experience with the decimal module, I think this would open a can of 
worms.  To match what decimal does, we would need a Context() object with 
methods for dump, dumps, load, loads.  There would need to be a thread-local or 
contextvar instance accessed by getcontext and setcontext, and perhaps a 
decorator as well.  We would need a few pre-made instances for common cases.  
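
To make the scale of that concrete, here is a rough sketch of what mirroring 
decimal's design would entail (every name below is hypothetical; none of this 
exists in the json module):

    import json
    import threading

    class Context:
        # Hypothetical: bundles the keyword arguments that dumps() accepts.
        def __init__(self, *, indent=None, sort_keys=False):
            self.indent = indent
            self.sort_keys = sort_keys

        def dumps(self, obj):
            return json.dumps(obj, indent=self.indent, sort_keys=self.sort_keys)

    _local = threading.local()

    def getcontext():
        # Hypothetical thread-local current context, like decimal.getcontext().
        if not hasattr(_local, 'context'):
            _local.context = Context()
        return _local.context

    def setcontext(context):
        _local.context = context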

Also, the decimal module was context aware from the outset. For JSON, we have a 
large body of pre-existing client code that was created and tested without the 
concept of a context.  Should existing code use the new context and possibly 
break assumed invariants?  If the existing code had explicit parameters (such 
as indent=4), would the context override the parameter, take a backseat to the 
parameter, or raise an exception?
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KAD3U5HPTC76PCVXBAB4PMKVPDV4P4JC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-03 Thread raymond . hettinger
Ram Rachum wrote:
> I notice that the random.sample function doesn't have a default behavior
> set when you don't specify k. This is fortunate, because we could make
> that behavior just automatically take the length of the first argument. So
> we could do this:
> shuffled_numbers = random.sample(range(10, 10 ** 5))
> What do you think?

This is bad API design.  The most likely user mistake is to omit the *k* 
argument.  We want that to be an error.  It is common to sample from large 
populations, so we don't want the default to do anything terrible — for example, 
you're in a Jupyter notebook, type "sample(range(10_000_000))", and forget to 
enter the sample size.

Also, having *k* default to the population size would be surprisingly 
inconsistent given that choices() has a default k=1.  API design principle: 
don't have unexpectedly different defaults in related functions.

Lastly, in-line shuffling is not the primary use case.  If there 
were a default argument, it should cater to the principal use case.  API 
design principle:  don't do anything weird or unexpected by default.

IMO you're trying too hard to jam a square peg into a round hole. There isn't a 
substantive problem being solved — being explicit by writing "sample(p, 
len(p))" instead of "sample(p)" isn't an undue burden.

Please also consider that we thought about all of this when sample() was first 
created.  The current API is intentional.  As you noted, this suggestion was 
also already rejected on the bug tracker.  So, this thread seems like an 
attempt to second guess that outcome as well as the original design decision.  
If you're going to do something like that, save it for something important :-)


Raymond
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/K4RQTFYD43OHQTSCWC32R2KYFQGXHR36/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-02 Thread raymond . hettinger
Steven D'Aprano wrote:
> This is easily solved with a three-line helper:
>     def shuffled(iterable):
>         ...
> I have implemented this probably a half a dozen times, and I expect 
> others have too.

FWIW, we've already documented a clean way to do it, 
https://docs.python.org/3/library/random.html#random.shuffle , "To shuffle an 
immutable sequence and return a new shuffled list, use sample(x, k=len(x)) 
instead."

>>> from random import sample
>>> data = 'random module'
>>> ''.join(sample(data, len(data)))
'uaemdor odmln'

Given that we already have shuffle() and sample(), I really don't think we need 
a third way to do it.  How about we save API extensions for ideas that add 
genuinely new, useful capabilities.

Raymond
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VVKZU6ABPBYZORXMURCIHBZZRNRREMIS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Augment abc.Set API (support named set methods for dictionary view objects)

2020-06-01 Thread Raymond Hettinger


> On Jun 1, 2020, at 3:32 AM, a...@yert.pink wrote:
> 
> I propose that the `Set` ABC API should be augmented to contain all of the 
> named methods. This would provide consistency in the collections, and enhance 
> the duck typing capabilities of the `Set` abc.

Two thoughts.  First, I believe Guido intentionally omitted the named set 
methods from the ABC — perhaps the reasons are documented in the ABC PEP (PEP 3119).   
Second, most APIs are easily expanded by adding new methods, but ABCs define a 
minimum for other classes to implement. So if we added new methods, it would 
likely break code that was only meeting the existing minimum.


Raymond
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/YX7KJ6XRMF4O4MQYRFP7X5KBD52VTGTC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Equality between some of the indexed collections

2020-05-04 Thread Raymond Hettinger



> On May 3, 2020, at 6:19 PM, Steven D'Aprano  wrote:
> 
>>> `frozenset` and `set` make a counterexample:
>>> 
>>>     >>> frozenset({1}) == {1}
>>>     True
>> 
>> Nice catch! That's really interesting. Is there reasoning behind
>> `frozenset({1}) == {1}` but `[1] != (1,)`, or is it just an accident of
>> history? 
> 
> Conceptually, sets are sets, whether they are mutable or frozen.

Right.  This isn't an accident. It is by design.

Also, some numeric types are specifically designed for cross-type comparison:

 >>> int(3) == float(3) == complex(3, 0)
 True

And in Python 2, by design, str and unicode were comparable:

>>> u'abc' == 'abc'
True

But the general rule is that objects aren't cross-type comparable by default.  
We have to specifically enable that behavior when we think it universally makes 
sense.  The modern trend is to avoid cross-type comparability, as with enums 
and data classes:

>>> from enum import Enum
>>> Furniture = Enum('Furniture', ('table', 'chair', 'couch'))
>>> HTML = Enum('HTML', ('dl', 'ol', 'ul', 'table'))
>>> Furniture.table == HTML.table
False

>>> from dataclasses import make_dataclass
>>> A = make_dataclass('A', 'x')
>>> B = make_dataclass('B', 'x')
>>> A(10) == B(10)
False

Bytes and str are not comparable in Python 3:

>>> b'abc' == 'abc'
False


>> Isn't a tuple essentially just a frozenlist? I know the intended
>> semantics of tuples and lists tend to be different, but I'm not sure that's
>> relevant.


In terms of API, it might look that way.  But in terms of use cases, they are 
less alike:  lists-are-for-looping, tuples-are-for-nonhomogeneous-fields.  Lists 
are like database tables; tuples are like records in the database.   Lists are 
like C arrays; tuples are like structs.

On the balance, I think more harm than good would result from making sequence 
equality not depend on type.  Also when needed, it isn't difficult to be 
explicit that you're converting to a common type to focus on contents:

>>> s = bytes([10, 20, 30])
>>> t = (10, 20, 30)
>>> list(s) == list(t)
True

When you think about it, it makes sense that a user gets to choose whether 
equality is determined by contents or by contents and type.  For some drinkers, 
a can of beer is equal to a bottle of beer; for other drinkers, they aren't 
equal at all ;-)

Lastly, when it comes to containers, they each get to make their own rules 
about what is equal.  Dicts compare on contents regardless of order, but 
OrderedDict requires that the order matches.
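
For example:

>>> from collections import OrderedDict
>>> dict(a=1, b=2) == dict(b=2, a=1)
True
>>> OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
False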


Raymond






___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7WOB36JSIX3ZSG7KFNQ4F563ZKSW32G5/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding a "once" function to functools

2020-04-29 Thread Raymond Hettinger



> On Apr 29, 2020, at 11:15 AM, Tom Forbes  wrote:
> 
> What exactly would the issue be with this:
> 
> ```
> import functools
> from threading import Lock
> 
> def once(func):
>     sentinel = object()
>     cache = sentinel
>     lock = Lock()
> 
>     @functools.wraps(func)
>     def _wrapper():
>         nonlocal cache, lock, sentinel
>         if cache is sentinel:
>             with lock:
>                 if cache is sentinel:
>                     cache = func()
>         return cache
> 
>     return _wrapper
> ```

This recipe is the best variant so far and gives us something concrete to talk 
about :-)

Benefits: Guarantees the wrapped function is not called more than once.
Restrictions:  Only works with zero argument functions.
Risks: Any reentrancy or recursion will result in deadlock.
Limitations: No instrumentation. No ability to reset or clear. Won't work 
across multiple processes.

It would be nice to look at some compelling use cases.  Off hand, I can't think 
of a time when I would have used this decorator.  Also, I have a nagging worry 
that holding a non-reentrant lock across an arbitrary user-defined function 
call is a recipe for deadlocks.  That's why during code reviews we typically 
check every single use of Lock() to see if it should have been an RLock(), 
especially in big systems where GC, __del__, or weakref callbacks can trigger 
running any code at just about any time.
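
To illustrate the reentrancy risk with the recipe above (a contrived sketch; 
don't run it, it hangs):

    @once
    def configure():
        # The recursive call re-enters _wrapper() while the outer call
        # still holds the non-reentrant Lock(), so it blocks forever.
        return configure()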


Raymond










___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/B32VKG5IPHKEL4Y7MP7WMQZXZYYWVT64/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: deque: Allow efficient operations

2020-04-29 Thread Raymond Hettinger

> On Apr 29, 2020, at 12:02 PM, Christopher Barker  wrote:
> 
> On Apr 29, 2020, at 08:33, Christopher Barker  wrote:
> > I've wondered about Linked Lists for a while, but while there are many 
> > versions on PyPi, I can't find one that seems to be mature and maintained. 
> > Which seems to indicate that there isn't much demand for them.
> 
> Isn't much demand for a *generic* linked list. It would probably be a good 
> recipe though -- so users could have a starting point for their custom 
> version.

In case you're interested, the pure python OrderedDict code uses a doubly 
linked list augmented by a dictionary to quickly find individual links. It may 
be worth taking a look.¹ 

The implementation was mostly obvious.  The only trick was to use weakrefs for 
the backlink to avoid creating a reference cycle — the original version just 
lets GC do the clean-up, but users wanted to avoid cycles entirely.
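
The heart of that technique is small enough to sketch here (modeled on, but 
simplified from, the pure python OrderedDict internals):

    from weakref import proxy

    class Link:
        __slots__ = 'prev', 'next', 'key', '__weakref__'

    def new_linked_list():
        # Circular doubly linked list anchored by a sentinel root node.
        root = Link()
        root.prev = root.next = root
        return root

    def append(root, key):
        # Insert a new link just before the sentinel (i.e. at the end).
        # Hard references point forward; the backlink is a weakref proxy,
        # so no reference cycle is created.
        link = Link()
        last = root.prev
        link.prev, link.next, link.key = last, root, key
        last.next = link
        root.prev = proxy(link)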


Raymond


¹ https://github.com/python/cpython/blob/3.8/Lib/collections/__init__.py#L78
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/5ZSDX4FBSZEG3W6CGY6DNDOTLDOK7AQJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding a "once" function to functools

2020-04-28 Thread Raymond Hettinger

> On Apr 26, 2020, at 7:03 AM, Tom Forbes  wrote:
> 
> I would like to suggest adding a simple “once” method to functools. As the 
> name suggests, this would be a decorator that would call the decorated 
> function, cache the result and return it with subsequent calls.

It seems like you would get just about everything you want with one line:

 once = lru_cache(maxsize=None)

which would be used like this:

@once
def welcome():
    len('hello')

> Using lru_cache like this works but it’s not as efficient as it could be - in 
> every case you’re adding lru_cache overhead despite not requiring it.


You're likely imagining more overhead than there actually is.  Used as shown 
above, the lru_cache() is astonishingly small and efficient.  Access time is 
slightly cheaper than writing d[()] where d={(): some_constant}. The 
infinite_lru_cache_wrapper() just makes a single dict lookup and returns the 
value.¹ The lru_cache_make_key() function just increments the refcount on the 
empty args tuple and returns it.²   And because it is a C object, calling it 
will be faster than calling a Python function that just returns a constant, 
"lambda: some_constant".  
This is very, very fast.


Raymond


¹ https://github.com/python/cpython/blob/master/Modules/_functoolsmodule.c#L870
² https://github.com/python/cpython/blob/master/Modules/_functoolsmodule.c#L809 



___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VCWTMH6Z6ADAH5YKRQ6CU4ZIHLLBN4KQ/
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Why operators are useful

2019-03-15 Thread Raymond Hettinger



> On Mar 15, 2019, at 6:49 PM, Chris Angelico  wrote:
> 
> On Sat, Mar 16, 2019 at 12:40 PM Raymond Hettinger
>  wrote:
>> Also, it seems like the efficiency concerns were dismissed with hand-waving. 
>> But usually, copying and updating aren't the desired behavior. When teaching 
>> Python, I like to talk about how the design of the language nudges you 
>> towards fast, clear, correct code.  The principle is that things that are 
>> good for you are put within easy reach. Things that require more thought are 
>> placed a little further away.  That is the usual justification for copy() 
>> and deepcopy() having to be imported rather than being builtins.  Copying is 
>> an obvious thing to do; it is also not usually good for you; so, we have you 
>> do one extra step to get to it.
>> 
> 
> I'm not sure I understand this argument. Are you saying that d1+d2 is
> bad code because it will copy the dictionary, and therefore it
> shouldn't be done? Because the exact same considerations apply to the
> addition of two lists, which already exists in the language. Is it bad
> to add lists together instead of using extend()?

Yes, that exactly.

Consider a table in a database. Usually what people want/need/ought-to-do is an 
SQL UPDATE rather than copy-and-update, which would double the memory 
requirement and be potentially many times slower.  The same applies to Python 
lists. Unless you actually have a requirement for three distinct lists (c = a + 
b), it is almost always better to extend in place.  Adding lists rather than 
extending them is a recipe for poor performance (especially if it occurs in a 
loop):


Raymond



---------------- Performant version ----------------

import socket

# host, port, and request are assumed to be defined elsewhere.
s = socket.socket()
try:
s.connect((host, port))
s.send(request)
blocks = []
while True:
block = s.recv(4096)
if not block:
break
blocks += [block]   # Normally done with append()
page = b''.join(blocks)  
print(page.replace(b'\r\n', b'\n').decode())
finally:
s.close()

---------------- Catastrophic version ----------------

import socket

# Same setup assumptions as above.
s = socket.socket()
try:
s.connect((host, port))
s.send(request)
blocks = []
while True:
block = s.recv(4096)
if not block:
break
blocks = blocks + [block]  # Not good for you.
page = b''.join(blocks)  
print(page.replace(b'\r\n', b'\n').decode())
finally:
s.close()

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Why operators are useful

2019-03-15 Thread Raymond Hettinger



> On Mar 15, 2019, at 12:28 PM, Rhodri James  wrote:
> 
> I suspect this is a bit personal; I had sufficiently evil lecturers in my 
> university Algebra course that I still don't automatically take the 
> commutativity of "+" over a particular group as a given :-)  Nothing is 
> obvious unless you already know it.

We don't design Python for ourselves. We design it for everyday users. Telling 
them that they can assume nothing is an anti-pattern. People do rely quite a 
bit on their intuitions. They also rely on implicit patterns already present in 
the language (i.e. in no other place is + idempotent, in no other place is + a 
destructive rather than concatenative or accumulative operator).  As for 
commutativity, + would be obviously commutative for numeric types and obviously 
noncommutative for sequence concatenation, but for dicts the non-commutativity 
isn't obvious at all. And since the "|" operator is already used for mapping 
views, the + operator for merging would be unexpected.
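
For reference, the mapping views already support the set-style operators:

>>> d1 = {'a': 1, 'b': 2}
>>> d2 = {'b': 3, 'c': 4}
>>> d1.keys() | d2.keys()        # set display order may vary
{'a', 'b', 'c'}
>>> d1.keys() & d2.keys()
{'b'}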

What is missing from the discussion is that we flat out don't need an operator 
for this.  Use of explicit method names, update() or merge(), is already clear 
and already brief.  Also, if we're honest with ourselves, most of us would use 
this less than once a year. So why make a pervasive change for this?

Today, at least one PEP was rejected that had a stronger case than this 
proposal.  We should consider asking why other major languages haven't gone 
down this path. The most likely reasons are 1) insufficient need, 2) the "+" 
operator doesn't make sense, and 3) there are already clean ways to do it.

Also, it seems like the efficiency concerns were dismissed with hand-waving. 
But usually, copying and updating aren't the desired behavior. When teaching 
Python, I like to talk about how the design of the language nudges you towards 
fast, clear, correct code.  The principle is that things that are good for you 
are put within easy reach. Things that require more thought are placed a little 
further away.  That is the usual justification for copy() and deepcopy() having 
to be imported rather than being builtins.  Copying is an obvious thing to do; 
it is also not usually good for you; so, we have you do one extra step to get 
to it.


Raymond


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Why operators are useful

2019-03-15 Thread Raymond Hettinger

> On Mar 15, 2019, at 10:51 AM, Guido van Rossum  wrote:
> 
> The general idea here is that once you've learned this simple notation, 
> equations written using them are easier to *manipulate* than equations 
> written using functional notation -- it is as if our brains grasp the 
> operators using different brain machinery, and this is more efficient.

There is no question that sometimes operators can be easier to manipulate and 
reason about than equivalent methods.  The use of "+" and "*" are a major win 
for numeric and sequence types.

There is also no question that sometimes method names are better than operators 
(otherwise, we wouldn't use method names at all).  APL is an extreme example of 
a rich set of operators being both powerful and opaque.

So, we have to ask whether we're stretching too far from "operators are good" 
to "we need this operator".  Here are some considerations:

Frequency of usage:   Math provides ∑ and ∏ because they are common. It doesn't 
provide a special operator for sqrt(c**2 - b**2) because the latter is less 
fundamental and less common.  To me, f=d.copy() followed by f.update(e) arises 
so rarely that an operator isn't warranted.  The existing code is already 
concise, clear, and rare.

Familiarity:  We know about + because we use it a lot in addition and 
concatenation contexts. However, a symbol like ⊗ is more opaque unless we're 
using it every day for a particular purpose.  To me, the "+" operator implies 
"add/extend" semantics rather than "replace" semantics.  Successive 
applications of "+" are never idempotent unless one operand is an identity 
element.  So for me, "+" isn't familiar for dict merges.  Loosely put, it isn't 
"plus-like".  I think this is why so many other languages decided not use "+" 
for dict merges even when that would have been a trivially easy implementation 
choice.

Obviousness: When working with "+" on numeric types, it is obvious it should be 
commutative. When using "+" with sequence types, it is obvious that 
concatenation is non-commutative. When using "+" for mapping types, it is not 
obvious that it isn't commutative. Likewise, it isn't obvious that "+" is a 
destructive operation for mappings (consider that adding to a log file never 
destroys existing log entries, while updating a dict will overwrite existing 
values).

Harmony: The operators on dict views use "|" but regular dicts would use "+". 
That doesn't seem harmonious.

Impact: When a class in the standard library adds a method or operator, the 
reverberations are felt only locally.  In contrast, the dict API is 
fundamental.  Changing it will reverberate for years. It will be felt in the 
ABCs, typeshed, and every mapping-like object.  IMO such an impactful change 
should only be made if it adds significant new functionality rather than 
providing a slightly shorter spelling of something we already have.



Raymond

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Dict joining using + and +=

2019-03-05 Thread Raymond Hettinger

> On Mar 5, 2019, at 2:13 PM, Greg Ewing  wrote:
> 
> Rhodri James wrote:
>> I have to go and look in the documentation because I expect the union 
>> operator to be '+'.
> 
> Anyone raised on Pascal is likely to find + and * more
> natural. Pascal doesn't have bitwise operators, so it
> re-uses + and * for set operations. I like the economy
> of this arrangement -- it's not as if there's any
> other obvious meaning that + and * could have for sets.

The language SETL (the language of sets) also uses + and * for set operations.¹

For us though, the decision to use | and & is set in stone.  The time for 
debating the decision was 19 years ago.²


Raymond


¹ https://www.linuxjournal.com/article/6805
² https://www.python.org/dev/peps/pep-0218/
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Dict joining using + and +=

2019-03-04 Thread Raymond Hettinger



> On Mar 4, 2019, at 11:24 AM, Guido van Rossum  wrote:
> 
> * Regarding how often this is needed, we know that this is proposed and 
> discussed at length every few years, so I think this will fill a real need.

I'm not sure that conclusion follows from the premise :-)  Some ideas get 
proposed routinely because they are obvious things to propose, not because 
people actually need them.  One hint is that the proposals always have generic 
variable names, "d = d1 + d2", and another is that they are almost never 
accompanied by actual use cases or real code that would be made better. I 
haven't seen anyone in this thread say they would use this more than once a 
year or that their existing code was unclear or inefficient in any way.  The 
lack of dict addition support in other languages (like Java example) is another 
indicator that there isn't a real need -- afaict there is nothing about Python 
that would cause us to have a unique requirement that other languages don't 
have.

FWIW, there are some downsides to the proposal -- it diminishes some of the 
unifying ideas about Python that I typically present on the first day of class:

* One notion is that the APIs nudge users toward good code.  The "copy.copy()" 
function has to be imported -- that minor nuisance is a subtle hint that 
copying isn't good for you.  Likewise for dicts, writing "e=d.copy(); 
e.update(f)" is a minor nuisance that either serves to dissuade people from 
unnecessary copying or at least will make very clear what is happening.  The 
original motivating use case for ChainMap() was to make a copy free replacement 
for excessively slow dict additions in ConfigParser.  Giving a plus-operator to 
mappings is an invitation to writing code that doesn't scale well.

* Another unifying notion is that the star-operator represents repeat addition 
across multiple data types.  It is a nice demo to show that "a * 5 == a + a + a 
+ a + a" where "a" is an int, float, complex, str, bytes, tuple, or list.  
Giving __add__() to dicts breaks this pattern.

* When teaching dunder methods, the usual advice regarding operators is to use 
them only when their meaning is unequivocal; otherwise, have a preference for 
named methods where the method name clarifies what is being done -- don't use 
train+car to mean train.shunt_to_middle(car). For dicts that would mean not 
having the plus-operator implement something that isn't inherently additive (it 
applies replace/overwrite logic instead), that isn't commutative, and that 
isn't linear when applied in succession (d1+d2+d3).

* In the advanced class where C extensions are covered, the organization of the 
slots is shown as a guide to which methods make sense together: tp_as_number, 
tp_as_sequence, and tp_as_mapping.  For dicts to gain the requisite methods, 
they will have to become numbers (in the sense of filling out the tp_as_number 
slots).  That will slow down the abstract methods that search the slot groups, 
skipping over groups marked as NULL.  It also exposes method groups that don't 
typically appear together, blurring their distinction.

* Lastly, there is a vague piece of zen-style advice, "if many things in the 
language have to change to implement idea X, it stops being worth it".   In 
this case, it means that every dict-like API and the related abstract methods 
and typing equivalents would need to grow support for addition in mappings 
(would it even make sense to add shelve objects or os.environ objects 
together?)

That's my two cents worth.  I'm ducking out now (nothing more to offer on the 
subject). Guido's participation in the thread has given it an air of 
inevitability so this post will likely not make a difference.


Raymond


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Dict joining using + and +=

2019-03-02 Thread Raymond Hettinger


> On Mar 1, 2019, at 11:31 AM, Guido van Rossum  wrote:
> 
> There's a compromise solution for this possible. We already do this for 
> Sequence and MutableSequence: Sequence does *not* define __add__, but 
> MutableSequence *does* define __iadd__, and the default implementation just 
> calls self.update(other). I propose the same for Mapping (do nothing) and 
> MutableMapping: make the default __iadd__ implementation call 
> self.update(other).

Usually, it's easy to add methods to classes without creating disruption, but 
ABCs are more problematic.  If MutableMapping grows an __iadd__() method, what 
would that mean for existing classes that register as MutableMapping but don't 
already implement __iadd__?  When "isinstance(m, MutableMapping)" returns True, 
is it a promise that the API is fully implemented? Is this something that mypy 
could, would, or should complain about?

> Anyways, the main reason to prefer d1+d2 over {**d1, **d2} is that the latter 
> is highly non-obvious except if you've already encountered that pattern before

I concur.  The latter is also an eyesore and almost certain to be a stumbling 
block when reading code.

That said, I'm not sure we actually need a short-cut for "d=e.copy(); 
d.update(f)".  Code like this comes-up for me perhaps once a year.  Having a 
plus operator on dicts would likely save me five seconds per year.

If the existing code were in the form of "d=e.copy(); d.update(f); d.update(g); 
d.update(h)", converting it to "d = e + f + g + h" would be a tempting but 
algorithmically poor thing to do (because the behavior is quadratic).  Most 
likely, the right thing to do would be "d = ChainMap(e, f, g, h)" for a 
zero-copy solution or "d = dict(ChainMap(e, f, g, h))" to flatten the result 
without incurring quadratic costs.  Both of those are short and clear.
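
For example:

>>> from collections import ChainMap
>>> e, f, g, h = dict(a=1), dict(b=2), dict(a=3), dict(c=4)
>>> d = ChainMap(e, f, g, h)     # zero-copy; the leftmost map wins
>>> d['a'], d['b'], d['c']
(1, 2, 4)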

Lastly, I'm still bugged by use of the + operator for replace-logic instead of 
additive-logic.  With numbers and lists and Counters, the plus operator creates 
a new object where all the contents of each operand contribute to the result.  
With dicts, some of the contents for the left operand get thrown-away.  This 
doesn't seem like addition to me (IIRC that is also why sets have "|" instead 
of "+").
 

Raymond


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] PEP 8 update on line length

2019-02-24 Thread Raymond Hettinger


> On Feb 22, 2019, at 1:10 PM, Greg Ewing  wrote:

>> “Typesetters hundreds of years ago used less than 80 chars per line, so 
>> that’s what we should do for Python code now” is a pretty weak argument.
> 
> But that's not the entire argument -- the point it is that typesetters
> had the goal of making lines of text readable, which is similar (if not
> quite the same) as the goal of making lines of program code readable.
> It's a lot closer than, for example, the goal of fitting in an
> accountant's spreadsheet.


The issue with referencing typesetter rules is that they were targeted at 
blocks of prose rather than heavily nested hanging indents with non-trivial 
string literals or dotted attribute notation.  Typesetters were also dealing 
with fixed page widths and needed to leave gutter space for binding.

The "rules" aren't comparable at all.  


> I would say it the other way around. Once you've reduced the complexity
> of a line to something a human can handle, *most* of the time 80 chars
> is enough.

That would make sense if we started at column 0; however, if you have to 
prefix your thoughts with something like

'''
class TestRemote(unittest.TestCase):
    def test_heartbeat(self):
        ...
        self.assertIsInstance(...
'''

then the part "a human can handle" starts at column 30.  Then if you 
need good variable names and/or module.function prefixes, there is 
sometimes little left to work with.


Raymond


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] PEP 8 update on line length

2019-02-21 Thread Raymond Hettinger



> On Feb 21, 2019, at 5:06 PM, Chris Barker via Python-ideas 
>  wrote:
> 
>  
> class Frabawidget:
>     ...
>     @wozzle.setter
>     def wozzle(self, woozle):
>         if not (self.min_woozle < woozle < self.max_woozle):
>             raise ValueError(f"Expected woozle to be between 
> {self.min_woozle} and {self.max_woozle}")
>         self._wozzle = normalize(woozle)
> 
> That's 103 chars long -- and very readable. But, is this that much worse?
> 
> class Frabawidget:
>     ...
>     @wozzle.setter
>     def wozzle(self, woozle):
>         if not (self.min_woozle < woozle < self.max_woozle):
>             raise ValueError(f"Expected woozle to be between"
>                              "{self.min_woozle} and {self.max_woozle}")
>         self._wozzle = normalize(woozle)
>  
> (it IS harder to write, that's for sure)

Yes, it's worse.  You introduced two bugs.  First, the space between the two 
fragments was lost.  Second, the f on the second f-string was dropped.  I see 
these kinds of line-wrapping errors frequently.  The bugs are CAUSED by the 
line length rule.

Also, in the case of multi-line templates, there is no way to wrap them without 
getting very far from WYSIWYG:

def run_diagnostics(location, system, test_engineer):
    ...
    if failures:
        print(dedent(f'''\
            There were {num_failures} anomalies detected in the {location}
            {system} at {event_time:%I:%M:%S}.
            These anomalies were classified as {level}.  Further action is
            {'' if escalate else 'not'} recommended.
            '''))
    else:
        print(dedent(f'''\
            A total of {num_test_cases} diagnostics were run in the {location}
            {system} as of {event_time:%I:%M:%S}.
            No anomalies were detected and further action is not required.
            Test signed by {test_engineer.title()}.
            '''))
    ...


Raymond
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] PEP 505: None-aware operators

2018-07-25 Thread Raymond Hettinger


> On Jul 18, 2018, at 10:43 AM, Steve Dower  wrote:
> 
> Possibly this is exactly the wrong time to propose the next big syntax 
> change, since we currently have nobody to declare on it, but since we're 
> likely to argue for a while anyway it probably can't hurt (and maybe this 
> will become the test PEP for whoever takes the reins?).

It probably is the wrong time and probably can hurt (by introducing 
divisiveness when we most need to be focusing on coming together).

This PEP also shares some traits with PEP 572 in that it solves a somewhat 
minor problem with new syntax and grammar changes that affect the look and 
feel of the language in a way that at least some of us (me for example) find to 
be repulsive.

This PEP is one step further away from Python reading like executable 
pseudo-code.  That trait is currently a major draw to the language and I don't 
think it should get tossed away just to mitigate a minor irritant.

We should also consider a moratorium on language changes for a while.  There is 
more going on than just a transition to a post-bdfl world.  The other 
implementations of Python are having a hard time keeping up with our recent, 
ferocious rate of change.  Even among the core developers, most people are not 
fully up to date learning all the new features that have already been added 
(how many of you are competent with typing, data classes, generalized 
unpacking, concurrent futures, async, the scoping rules for exceptions and 
comprehensions, the hundreds of niggling changes in the past few releases, 
__init_subclass__, __set_name__, details of import logic, issues with SSL 
certificates, new collections ABCs, etc.?)  We've been putting major changes in 
faster than anyone can keep up with them.  We really need to take a breath.


Raymond




___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Fwd: collections.Counter should implement fromkeys

2018-06-29 Thread Raymond Hettinger
On Jun 29, 2018, at 5:32 PM, Abe Dillon  wrote:
> 
> Sure, but in Hettinger's own words "whenever you have a constructor war, 
> everyone should get their wish". People that want a counting constructor have 
> that,
> people that want the ability to initialize values don't have that.

Sorry Abe, but you're twisting my words and pushing very hard for a proposal 
that doesn't make sense and isn't necessary.

* Counts initialized to zero:   This isn't necessary.  The whole point of 
counters is that counts default to zero without pre-initialization.

* Counts initialized to one:  This is already done by the regular constructor.  
Use "Counter(keys)" if the keys are known to be unique and "Counter(set(keys))" 
to ignore duplicates.

>>> Counter('abc')
Counter({'a': 1, 'b': 1, 'c': 1})
>>> Counter(set('abbacac'))
Counter({'a': 1, 'b': 1, 'c': 1})

* Counts initialized to some other value:  That would be an unusual thing to do 
but would be easy with the current API.

>>> Counter(dict.fromkeys('abc', 21))
Counter({'a': 21, 'b': 21, 'c': 21})

* Note, the reason that fromkeys() is disabled is that it has nonsensical or 
surprising interpretations:

>>> Counter.fromkeys('aaabbc', 2)   # What should this do that doesn't surprise at least some users?

* That reason is already shown in the source code.

    @classmethod
    def fromkeys(cls, iterable, v=None):
        # There is no equivalent method for counters because setting v=1
        # means that no element can have a count greater than one.
        raise NotImplementedError(
            'Counter.fromkeys() is undefined.  Use Counter(iterable) instead.')

> Obviously, Python breaks SOLID principals successfully all over the place for 
> pragmatic reasons.
> I don't think this is one of those cases.


No amount of citing generic design principles will justify adding an API that 
doesn't make sense.  

Besides, any possible use cases already have reasonable solutions using the 
existing API.  That is likely why no one has ever requested this behavior 
before.

Based on what I've read in this thread, I see nothing that would change the 
long-standing decision not to have a fromkeys() method for collections.Counter. 
 The original reasoning still holds.


Raymond
  
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Have a "j" format option for lists

2018-05-09 Thread Raymond Hettinger
On May 9, 2018, at 7:39 AM, Facundo Batista  wrote:
> 
> This way, I could do:
> 
> >>> authors = ["John", "Mary", "Estela"]
> >>> "Authors: {:, j}".format(authors)
> 'Authors: John, Mary, Estela'
> 
> ...
> 
> What do you think?

That is an inspired idea.  I like it :-)


Raymond
 

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Add "default" kw argument to operator.itemgetter and operator.attrgetter

2018-05-06 Thread Raymond Hettinger

> On May 6, 2018, at 6:00 AM, Steven D'Aprano  wrote:
> 
> On Thu, May 03, 2018 at 04:32:09PM +1000, Steven D'Aprano wrote:
> 
>> Maybe I'm slow today, but I'm having trouble seeing how to write this as 
>> a lambda.
> 
> Yes, I was definitely having a "cannot brain, I have the dumb" day, 
> because it is not that hard to write using lambda. See discussion here:
> 
> https://mail.python.org/pipermail/python-list/2018-May/732795.html
> 
> If anything, the problem is a plethora of choices, where it isn't clear 
> which if any is the best way, or the One Obvious Way

At one time, lambda was the one obvious way.  Later, partial, itemgetter, 
attrgetter, and methodcaller were added to express common patterns for 
key-functions and map().  If needed, the zoo of lambda alternatives could be 
further extended with an rpartial() function that partials from the right.  
That would have helped with Miki's example.  Instead of:

get = attrgetter('foo', None)
return get(args) or get(config) or get(env)

He could've written:

get = rpartial(getattr, 'foo', None)
return get(args) or get(config) or get(env)
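
For reference, functools has no rpartial(); a minimal sketch of the helper 
assumed above:

    def rpartial(func, *fixed):
        # Hypothetical helper: like functools.partial, but the frozen
        # arguments are appended on the right.
        def inner(*args):
            return func(*args, *fixed)
        return inner

    # rpartial(getattr, 'foo', None)(obj) is getattr(obj, 'foo', None)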

If itemgetter and attrgetter only did a single lookup, a default might make 
sense.  However, that doesn't fit well with multiple and/or chained lookups 
where a number of options are possible. (See 
https://bugs.python.org/issue14384#msg316222 for examples and alternatives.)
   

Raymond
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] [offlist] Re: Add "default" kw argument to operator.itemgetter and operator.attrgetter

2018-05-03 Thread Raymond Hettinger

> On May 2, 2018, at 11:32 PM, Steven D'Aprano  wrote:
> 
> Intended by whom?

By me.  I proposed itemgetter() in the first place.  The rationale I gave 
convinced Guido and python-dev to accept it.  I then wrote the code, docs, and 
tests, and have maintained it for over a decade.  So, I have a pretty good idea 
of what it was intended for.


> I think you are being too dismissive of actual use-cases requested by 
> actual users. 

Wow, I don't know what to do with this.  Over the years, I've added a lot of 
things requested by users.  I really don't like the tone you've struck and what 
you've implied about me as developer.  That feels somewhat pushy and 
aggressive.  Why not just give a +1 to things that are a good idea and -1 for 
things we're better off without -- no need for ad hominem comments about the 
person making the post rather than its content -- that feels somewhat 
disrespectful.


> Default values might not have been the primary use 
> considered when the API was first invented, but the fact that people 
> keep asking for this feature should tell us that at least some people 
> have intended uses that are remaining unmet.

When I've seen the request in the past, it was always "it might be nice if 
..." but there were no legitimate use cases presented, just toy examples.   
Also, I'm concerned about increasing the complexity of the itemgetter() API to 
serve an occasional exotic use case rather than keeping it easy to learn and 
remember for the common cases. 


Raymond



___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Add "default" kw argument to operator.itemgetter and operator.attrgetter

2018-05-02 Thread Raymond Hettinger


> On May 2, 2018, at 1:08 AM, Vincent Maillol  wrote:
> 
> Our PEP idea would be to propose adding a global default value for the
> itemgetter and attrgetter methods.

My preference is to not grow that API further.  It has already crept well 
beyond its intended uses.  At some point, we're really better off just using a 
lambda.


Raymond
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] collections.Counter should implement __mul__, __rmul__

2018-04-18 Thread Raymond Hettinger


> On Apr 16, 2018, at 5:43 PM, Tim Peters  wrote:
> 
> BTW, if _`Counter * scalar` is added, we should think more about
> oddball cases.  While everyone knows what _they_ mean by "scalar",
> Python doesn't.

I've started working on an implementation and several choices arise:

1) Reject scalar with a TypeError if scalar is a Counter
2) Reject scalar with a TypeError if scalar is a Mapping
3) Reject scalar with a TypeError if scalar is a Collection
4) Reject scalar with a TypeError if scalar is Sized (has a __len__ method).

I lean toward rejecting all things Sized because _everyone_ knows that scalars 
aren't sized ;-)
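
For concreteness, a sketch of option 4 (exploratory code, not a committed 
design):

    from collections import Counter
    from collections.abc import Sized

    def scaled(counter, scalar):
        # Option 4: reject anything Sized before broadcasting the multiply.
        if isinstance(scalar, Sized):
            raise TypeError('cannot multiply a Counter by a sized object')
        return Counter({elem: count * scalar for elem, count in counter.items()})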


Raymond


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] collections.Counter should implement __mul__, __rmul__

2018-04-15 Thread Raymond Hettinger


> On Apr 15, 2018, at 10:07 PM, Tim Peters  wrote:
> 
> Adding Counter * integer doesn't bother me a bit, but the definition
> of what that should compute isn't obvious.

Any thoughts on Counter * float?   A key use case for what is being proposed is:

c *= 1 / c.total


Raymond

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] collections.Counter should implement __mul__, __rmul__

2018-04-15 Thread Raymond Hettinger


> On Apr 15, 2018, at 9:04 PM, Peter Norvig  wrote:
> 
> it would be a bit weird and disorienting for the arithmetic operators to have 
> two different signatures:
> 
>  Counter += Counter 
>  Counter -= Counter 
>  Counter *= scalar 
>  Counter /= scalar 
> 
> Is it weird and disorienting to have:
> 
>  str += str 
>  str *= int 

Yes, there is a precedent that does seem to have worked out well in practice 
:-)  It isn't exactly parallel because strings aren't containers of numbers, 
they don't have & and |, and there isn't a reason to want a / operation, but it 
does suggest that signature variation might not be problematic.  

BTW, do you just want __mul__ and __rmul__?  If those went in, presumably there 
will be a request to support __imul__ because otherwise c*=3 would still work 
but would be inefficient (that was the rationale for adding inplace variants 
for all the current arithmetic operators). Likewise, presumably someone would 
legitimately want __div__ to support the normalization use case.  Perhaps less 
likely, there would be also be a request for __floordiv__ to allow exactly 
scaled results to stay in the domain of integers.  Which if any of these makes 
sense to you?

Also, any thoughts on the cleanest way to express the computation of a 
chi-squared statistic (for example, to compare observed first digit frequencies 
to the frequencies predicted by Benford's Law)?  This isn't an arbitrary 
question (it came up when a professor first proposed a variant of this idea a 
few years ago).


Raymond
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] collections.Counter should implement __mul__, __rmul__

2018-04-15 Thread Raymond Hettinger

> On Apr 15, 2018, at 7:18 PM, Wes Turner  wrote:
> 
> And I'm done sharing non-pure-python solutions for this problem, I promise

Keep them coming :-)

Thanks for the research.  It helps to remind ourselves that almost none of our 
problems are new :-)


Raymond
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] collections.Counter should implement __mul__, __rmul__

2018-04-15 Thread Raymond Hettinger


> On Apr 15, 2018, at 5:44 PM, Peter Norvig  wrote:
> 
> If you think of a Counter as a multiset, then it should support __or__, not 
> __add__, right?

FWIW, Counter is explicitly documented to support the four multiset-style 
mathematical operations discussed in Knuth TAOCP Volume II section 4.6.3 
exercise 19:

>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d   # add two counters together:  c[x] + d[x]
Counter({'a': 4, 'b': 3})
>>> c - d   # saturating subtraction (keeping only positive counts)
Counter({'a': 2})
>>> c & d   # intersection:  min(c[x], d[x]) 
Counter({'a': 1, 'b': 1})
>>> c | d   # union:  max(c[x], d[x])
Counter({'a': 3, 'b': 2})

The wikipedia article on Multisets lists a further operation, inclusion, that 
is not currently supported:  
https://en.wikipedia.org/wiki/Multiset#Basic_properties_and_operations

> I do think it would have been fine if Counter did not support "+" at all 
> (and/or if Counter was limited to integer values). But  given where we are 
> now, it feels like we should preserve `c + c == 2 * c`. 

The + operation has legitimate use cases (it is perfectly reasonable to want to 
combine the results of two separate counts).  And, as you pointed out, it is what 
we already have and cannot change :-)

So, the API design issue that confronts us is that it would be a bit weird and 
disorienting for the arithmetic operators to have two different signatures:

 Counter += Counter 
 Counter -= Counter 
 Counter *= scalar 
 Counter /= scalar 

Also, we should respect the comments given by others on the tracker issue.  In 
particular, there is a preference to not have an in-place operation and only 
allow a new counter instance to be created.  That will help people avoid data 
structure modality problems:
    c[category] += 1   # Makes sense during the frequency counting or accumulation phase
    c /= c.total       # Convert to a probability mass function
    c[category] += 1   # This code looks correct but no longer makes any sense


> As to the "doesn't really add any new capabilities" argument, that's true, 
> but it is also true for Counter as a whole: it doesn't add much over 
> defaultdict(int), but it is certainly convenient to have a standard way to do 
> what it does.

IIRC, the defaultdict(int) in your first version triggered a bug because the 
model inadvertently changed during the analysis phase rather than being frozen 
after the training phase.  The Counter doesn't suffer from the same issue 
(modifying the dict on a failed lookup).  Also, the Counter class does have a 
few value added features:  Counter(iterable), c.most_common(), c.elements(), 
etc.   But yes, at its heart the counter is mostly just a specialized 
dictionary.  The thought I was trying to express is that suggestions to build 
out the Counter API are a little less compelling when we already have a way to do 
it that is flexible, fast, clear, and standard (i.e. dict comprehensions).


> I agree with your intuition that low level is better. `total` would be 
> useful. If you have total and mul, then as you and others have pointed out, 
> normalize is just c *= 1/c.total.

I fully support adding some functionality for scaling to support probability 
distributions, bayesian update steps, chi-square tests, and whatnot.  The 
people who need convincing are the other respondents on the tracker.  They had 
a strong mental model for the Counter class that is somewhat at odds with this 
proposal.


Raymond



___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] collections.Counter should implement __mul__, __rmul__

2018-04-15 Thread Raymond Hettinger


> On Apr 15, 2018, at 2:05 PM, Peter Norvig  wrote:
> 
> For most types that implement __add__, `x + x` is equal to `2 * x`. 
> 
> ... 
> 
> 
> That is true for all numbers, list, tuple, str, timedelta, etc. -- but not 
> for collections.Counter. I can add two Counters, but I can't multiply one by 
> a scalar. That seems like an oversight. 

If you view the Counter as a sparse associative array of numeric values, it 
does seem like an oversight.  If you view the Counter as a Multiset or Bag, it 
doesn't make sense at all ;-)

From an implementation point of view, Counter is just a kind of dict that has 
a __missing__() method that returns zero.  That makes it trivially easy to 
subclass Counter to add new functionality or just use dictionary 
comprehensions for bulk updates.
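
In other words, the core behavior amounts to little more than this sketch:

    class SimplifiedCounter(dict):
        # The essential trick: failed lookups return zero instead of raising.
        def __missing__(self, key):
            return 0

    c = SimplifiedCounter()
    c['x'] += 1                        # no pre-initialization needed
    assert c['x'] == 1 and c['y'] == 0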

>  
> 
> It would be worthwhile to implement multiplication because, among other 
> reasons, Counters are a nice representation for discrete probability 
> distributions, for which multiplication is an even more fundamental operation 
> than addition.

There is an open issue on this topic.  See:  https://bugs.python.org/issue25478

One stumbling point is that a number of commenters are fiercely opposed to 
non-integer uses of Counter. Also, some of the use cases (such as those found 
in Allen Downey's "Think Stats" and "Think Bayes" books) also need division and 
rescaling to a total (i.e. normalizing the total to 1.0) for a probability mass 
function.

If the idea were to go forward, it still isn't clear whether the correct API 
should be low level (__mul__ and __div__ and a "total" property) or higher 
level (such as a normalize() or rescale() method that produces a new Counter 
instance).  The low level approach has the advantage that it is simple to 
understand and that it feels like a logical extension of the __add__ and 
__sub__ methods.  The downside is that doesn't really add any new capabilities 
(being just short-cuts for a simple dict comprehension or call to c.values()).  
And, it starts to feature creep the Counter class further away from its core 
mission of counting and ventures into the realm of generic sparse arrays with 
numeric values.  There is also a learnability/intelligibility issue in that 
__add__ and __sub__ correspond to "elementwise" operations while __mul__ and 
__div__ would be "scalar broadcast" operations.

Peter, I'm really glad you chimed in.  My advocacy lacked sufficient weight to 
move this idea forward.


Raymond



___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Start argument for itertools.accumulate() [Was: Proposal: A Reduce-Map Comprehension and a "last" builtin]

2018-04-08 Thread Raymond Hettinger


> On Apr 8, 2018, at 6:43 PM, Tim Peters  wrote:
> 
>> My other common case for accumulate() is building cumulative
>> probability distributions from probability mass functions (see the
>> code for random.choice() for example, or typical code for a K-S test).
> 
> So, a question:  why wasn't itertools.accumulate() written to accept
> iterables of _only_ numeric types?  Akin to `sum()`.  I gather from
> one of Nick's messages that it was so restricted in 3.2.  Then why was
> it generalized to allow any 2-argument function?

Prior to 3.2, accumulate() was in the recipes section as pure Python code.  It 
had no particular restriction to numeric types.

I received a number of requests for accumulate() to be promoted to a real 
itertool (fast, tested, documented C code with a stable API).  I agreed and 
accumulate() was added to itertools in 3.2.  It worked with anything supporting 
__add__, including str, bytes, lists, and tuples.  More specifically, 
accumulate_next() called PyNumber_Add() without any particular type restriction.

Subsequently, I got requests to generalize accumulate() to support any arity-2 
function (with operator.mul offered as the motivating example).  Given that 
there were user requests and there were ample precedents in other languages, I 
acquiesced despite having some reservations (if used with a lambda, the 
function call overhead might make accumulate() slower than a plain Python 
for-loop without the function call). So, that generalized API extension went 
into 3.3 and has remained unchanged ever since.
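
For example, pairing it with operator.mul turns a running sum into a running
product:

    from itertools import accumulate
    import operator

    print(list(accumulate([1, 2, 3, 4, 5], operator.mul)))
    # -> [1, 2, 6, 24, 120]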

Afterwards, I was greeted with the sound of crickets.  Either it was nearly 
perfect or no one cared or both ;-)  

It remains one of the least used itertools.


> Given that it was, `sum()` is no longer particularly relevant:  the
> closest thing by far is now `functools.reduce()`, which does support
> an optional `initial` argument.  Which it really should, because it's
> impossible for the implementation to guess a suitable starting value
> for an arbitrary user-supplied dyadic function.
> 
> My example using accumulate() to generate list prefixes got snipped,
> but same thing there:  it's impossible for that snippet to work unless
> an empty list is supplied as the starting value.  And it's impossible
> for the accumulate() implementation to guess that.

Honestly, I couldn't immediately tell what this code was doing:

list(accumulate([8, 4, "k"], lambda x, y: x + [y], first_result=[]))

This may be a case where a person would be better off without accumulate() at 
all.
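
For reference, that call builds the sequence of list prefixes: [], [8], [8, 4],
[8, 4, "k"] (assuming the proposed argument, like Haskell's scanl, emits the
start value first).  A small generator states that intent directly (a sketch):

    def prefixes(iterable):
        'prefixes([8, 4, "k"]) -> [] [8] [8, 4] [8, 4, "k"]'
        prefix = []
        yield prefix
        for x in iterable:
            prefix = prefix + [x]    # build a new list each step; no aliasing
            yield prefix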


> In short, for _general_ use `accumulate()` needs `initial` for exactly
> the same reasons `reduce()` needed it.

The reduce() function had been much derided, so I've had it mentally filed in 
the anti-pattern category.  But yes, there may be wisdom there.


> BTW, the type signatures on the scanl (requires an initial value) and
> scanl1 (does not support an initial value) implementations I pasted
> from Haskell's Standard Prelude give a deeper reason:  without an
> initial value, a list of values of type A can only produce another
> list of values of type A via scanl1.  The dyadic function passed must
> map As to As.  But with an initial value supplied of type B, scanl can
> transform a list of values of type A to a list of values of type B.
> While that may not have been obvious in the list prefix example I
> gave, that was at work:  a list of As was transformed into a list _of_
> lists of As.  That's impossible for scanl1 to do, but easy for scanl.

Thanks for pointing that out.  I hadn't considered that someone might want to 
transform one type into another using accumulate().  That is pretty far from my 
mental model of what accumulate() was intended for.  Also, I'm still not sure 
whether we would want code like that buried in an accumulate() call rather than 
as a regular for-loop where I can see the logic and trace through it with pdb.

As for scanl, I'm not sure what this code means without seeing a Python 
equivalent.

scanl            :: (a -> b -> a) -> a -> [b] -> [a]
scanl f q xs     =  q : (case xs of
                            []   -> []
                            x:xs -> scanl f (f q x) xs)

scanl1           :: (a -> a -> a) -> [a] -> [a]
scanl1 f (x:xs)  =  scanl f x xs
scanl1 _ []      =  []
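
For anyone in the same boat, here is a rough Python translation (a sketch;
like the Haskell original, scanl emits the seed first and scanl1 seeds from
the first element):

    def scanl(f, q, xs):
        # Emit the seed, then each intermediate result of folding f over xs.
        yield q
        for x in xs:
            q = f(q, x)
            yield q

    def scanl1(f, xs):
        # scanl1 f (x:xs) = scanl f x xs; an empty input yields nothing.
        it = iter(xs)
        try:
            x = next(it)
        except StopIteration:
            return iter([])
        return scanl(f, x, it)

    print(list(scanl(lambda a, b: a + b, 0, [1, 2, 3])))    # [0, 1, 3, 6]
    print(list(scanl1(lambda a, b: a + b, [1, 2, 3])))      # [1, 3, 6]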


> Or, in short, someone coming from a typed functional language
> background sees all sorts of things that rarely (if ever) come up in
> number-crunching languages.  Their sensibilities should count too -
> although not much ;-)  They should get _some_ extra consideration in
> this context, though, because `itertools` is one of the first things
> they dig into when they give Python a try.

I concur.


>> and it would have been distracting to even have had the option.
> 
> Distracting for how long?  One second or two? ;-)

Possibly forever.  In my experience, if a person initially frames a problem 
wrong (or perhaps in a hard to solve way), it can 

Re: [Python-ideas] Start argument for itertools.accumulate() [Was: Proposal: A Reduce-Map Comprehension and a "last" builtin]

2018-04-08 Thread Raymond Hettinger

> On Apr 8, 2018, at 12:22 PM, Tim Peters  wrote:
> 
> [Guido]
>> Well if you can get Raymond to agree on that too I suppose you can go ahead.
>> Personally I'm -0 but I don't really write this kind of algorithmic code
>> enough to know what's useful.
> 
> Actually, you do - but you don't _think_ of problems in these terms.
> Neither do I.  For those who do:  consider any program that has state
> and responds to inputs.  When you get a new input, the new state is a
> function of the existing state and the input.

The Bayesian world view isn't much different except they would prefer "prior" 
instead of "initial" or "start" ;-)

my_changing_beliefs = accumulate(stream_of_new_evidence, bayes_rule, prior=what_i_used_to_think)

Though the two analogies are cute, I'm not sure they tell us much.  In running 
programs or Bayesian analysis, we care more about the final result than about 
the accumulation of intermediate results.

My own experience with actually using accumulations in algorithmic code falls 
neatly into two groups.  Many years ago, I used APL extensively in accounting 
work and my recollection is that part of the convenience of "+\" was that the 
sequence length didn't change (so that the various data arrays could 
interoperate with one another).  

My other common case for accumulate() is building cumulative probability 
distributions from probability mass functions (see the code for random.choices() 
for example, or typical code for a K-S test).
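
The pattern looks roughly like this (a sketch in the spirit of weighted random
selection, not actual stdlib code):

    from itertools import accumulate
    from bisect import bisect
    from random import random

    population = ['a', 'b', 'c']
    pmf = [0.2, 0.5, 0.3]            # probability mass function
    cdf = list(accumulate(pmf))      # cumulative distribution: [0.2, 0.7, 1.0]

    # Sample: find where a uniform variate lands in the cumulative distribution.
    print(population[bisect(cdf, random() * cdf[-1])])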

For neither of those use case categories did I ever want an initial value and 
it would have been distracting to even have had the option. For example, when 
doing a discounted cash flow analysis, I was taught to model the various flows 
as a single sequence of up and down arrows rather than thinking of the initial 
balance as a distinct concept.¹

Because of this background, I was surprised to have the question ever come up 
at all (other than the symmetry argument that sum() has "start" so accumulate() 
must as well).

When writing itertools.accumulate(), I started by looking to see what other 
languages had done.  Since accumulate() is primarily a numerical tool, I 
expected that the experience of numeric-centric languages would have something 
to teach us.  My reasoning was that if the need hadn't arisen for APL, R, 
Numpy, Matlab², or Mathematica, perhaps it really was just noise.

My views may be dated though.  Looking at the wheel sieve and the Collatz glide 
record finder, I see something new: a desire to work with lazy, potentially 
infinite accumulations (something that iterators do well but almost never 
arises in the world of fixed-length sequences or cumulative probability 
distributions).

So I had been warming up to the idea, but got concerned that Nick could have 
had such a profoundly different idea about what the code should do.  That 
cooled my interest a bit, especially when thinking about two key questions, 
"Will it create more problems than it solves?" and "Will anyone actually use 
it?".



Raymond







¹ 
http://www.chegg.com/homework-help/questions-and-answers/solve-present-worth-cash-flow-shown-using-three-interest-factors-10-interest-compounded-an-q878034

² https://www.mathworks.com/help/matlab/ref/accumarray.html
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Start argument for itertools.accumulate() [Was: Proposal: A Reduce-Map Comprehension and a "last" builtin]

2018-04-07 Thread Raymond Hettinger


> On Apr 6, 2018, at 9:06 PM, Tim Peters  wrote:
> 
>> 
>> What is this code trying to accomplish?
> 
> It's quite obviously trying to bias the reader against the proposal by
> presenting a senseless example ;-) 

FWIW, the example was not from me. It was provided by the OP on the tracker.  I 
changed the start point from 10 to 6 so that it at least made some sense as the 
continuation of a factorial sequence: 6 24 120


> By sheer coincidence, I happened to write another yesterday.  This is
> from a program looking for the smallest integers that yield new
> records for Collatz sequence lengths.

Nice.  That brings the number of real-world examples up to a total of three 
(Collatz, wheel sieve, and signal processing).  Prior to today, that total was 
only one (which was found after much digging).

> Later:
> 
>    def coll(SHIFT=24):
>        ...
>        from itertools import accumulate, chain, cycle
>        ...
>        LIMIT = 1 << SHIFT
>        ...
>        abc, first, deltas = buildtab(SHIFT, LIMIT)
>        ...
>        for num in accumulate(chain([first], cycle(deltas))):
>            assert num % 3 != 2
> 
> As in Will's code, it would be more readable as:
> 
>        for num in accumulate(cycle(deltas), start=first):

That does read better.  I am curious how you would have written it as a plain 
for-loop before accumulate() was added (part of the argument against reduce() 
was that a plain for-loop would be clearer 99% of the time).
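
(For comparison, one plain-loop spelling -- a sketch that reuses the first and 
deltas names from the snippet above:)

    from itertools import cycle

    num = first                      # seed value from buildtab()
    assert num % 3 != 2
    for delta in cycle(deltas):      # repeat the gap pattern forever
        num += delta
        assert num % 3 != 2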


> That said, if the need came up often, as you noted it's dead easy to
> write a helper function to encapsulate the "head scratcher" part, and
> with no significant loss of efficiency.
> 
> So I'd be -0 overall, _except_ that "chain together a singleton list
> and a cycle" is so obscure on the face of it than I'm not sure most
> programmers who wanted the functionality of `start=` would ever think
> of it.  I'm not sure that I would have, except that I studied Ness's
> wheel sieve code a long time ago and the idea stuck.  So that makes me
> +0.4.

Agreed that the "chain([x], it)" step is obscure.  That's a bit of a bummer -- 
one of the goals for the itertools module was to be a generic toolkit for 
chopping-up, modifying, and splicing iterator streams (sort of a CRISPR for 
iterators).  The docs probably need another recipe to show this pattern:

    from itertools import chain

    def prepend(value, iterator):
        "prepend(1, [2, 3, 4]) -> 1 2 3 4"
        return chain([value], iterator)

Thanks for taking a look at the proposal.  I was -0 when it came up once 
before. Once I saw a use case pop up on this list, I thought it might be worth 
discussing again.



Raymond











___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/